Re: [zones-discuss] Zone Not Starting Properly?

2011-12-14 Thread Derek McEachern
I actually know that..

rpcinfo -p 

 The script is trying to mount an nfs share based on a configured list of
remote nfs filers. In theory the client only has access to one filer and
determines which one by using the rpcinfo command.
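
For illustration only, here is a minimal sketch of that selection step. It
is not the actual script: the filer names and the function names
probe_filer and pick_filer are made up, and the only real command it
relies on is rpcinfo -p. The error text mirrors the message seen in the
milestone log.

```shell
#!/bin/sh
# Hypothetical list of candidate NFS filers; replace with real host names.
FILERS="filer-a filer-b filer-c"

# probe_filer HOST: succeed if the host's portmapper answers.
# Kept as a separate function so it can be swapped out for testing.
probe_filer() {
    rpcinfo -p "$1" >/dev/null 2>&1
}

# pick_filer HOST...: print the first host that answers, or fail.
pick_filer() {
    for host in "$@"; do
        if probe_filer "$host"; then
            echo "$host"
            return 0
        fi
    done
    echo "ERROR: Unable to contact any server" >&2
    return 1
}

# Usage: filer=`pick_filer $FILERS` || exit 1
```

If rpcbind in the zone is not up yet, every probe fails even though a
filer is reachable, which would match the intermittent failure at boot.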

Derek

On Tue, Dec 13, 2011 at 6:55 PM, Edward Pilatowicz <
edward.pilatow...@oracle.com> wrote:

> On Tue, Dec 13, 2011 at 09:44:23AM -0600, Derek McEachern wrote:
> > Thought I would just send an update on this. Thanks for all the
> > suggestions.
> >
> > To get around our particular issue I just added some retry logic to the
> > /etc/init.d/ script. When it runs, if it finds that the operation has
> > failed it pauses for a second and tries again. It will try up to three
> > times before giving up.
> >
>
> it'd be interesting to know what particular operation is failing within
> the script...
>
> ed
>
___
zones-discuss mailing list
zones-discuss@opensolaris.org

Re: [zones-discuss] Zone Not Starting Properly?

2011-12-13 Thread Derek McEachern
Thought I would just send an update on this. Thanks for all the
suggestions.

To get around our particular issue I just added some retry logic to the
/etc/init.d/ script. When it runs, if it finds that the operation has failed
it pauses for a second and tries again. It will try up to three times
before giving up.

Running more tests we were able to see that on some occasions it still
fails on the first attempt but so far has always been successful on the 2nd.
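
A rough sketch of that kind of retry wrapper, assuming a Bourne-compatible
shell. The function name retry and the mount_share placeholder are
illustrative and not taken from the real init script.

```shell
#!/bin/sh
# retry MAX DELAY CMD [ARGS...]
# Run CMD until it succeeds, at most MAX times, sleeping DELAY seconds
# between attempts. Returns 0 on success, 1 if every attempt failed.
retry() {
    max=$1; delay=$2; shift 2
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $max failed: $*" >&2
        attempt=`expr "$attempt" + 1`
        # Only pause if another attempt is coming.
        [ "$attempt" -le "$max" ] && sleep "$delay"
    done
    return 1
}

# Usage (mount_share is a hypothetical function wrapping the real operation):
#   retry 3 1 mount_share || exit 1
```

Three attempts one second apart matches what is described above; both
numbers are arguments, so they are easy to tune.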

Derek

On Thu, Dec 1, 2011 at 3:37 PM, Ian Collins  wrote:

> On 12/ 2/11 10:30 AM, Derek McEachern wrote:
>
>> On Thu, Dec 1, 2011 at 2:48 PM, Ian Collins <i...@ianshome.com> wrote:
>>
>>> On 12/ 2/11 05:39 AM, Derek McEachern wrote:
>>>
>>>> Have a peculiar problem that I haven't seen before.
>>>>
>>>> When starting a system that has about 35 - 40 zones on it
>>>> occasionally we see that one of the zones doesn't come up
>>>> properly. You can log into the zone but none of the /etc/rc3.d
>>>> scripts have been run.
>>>
>>> The same zone, or a random one?
>>>
>>> What happens if you halt one or more zones before rebooting?  Is
>>> there a threshold where the problem begins to occur?
>>
>> Random zone.
>>
>> We've been testing to see if there is a threshold of trying to start too
>> many in parallel but so far we don't see anything.
>>
>> We saw the problem trying to start 3 zones in parallel but it was very
>> intermittent. Like 1 out of every 4 tries at starting all 40 zones we would
>> see 1 failure. We ran some tests starting 10 zones in parallel and so far
>> no errors. Our assumption was that if it was load related moving from 3 to
>> 10 zones we would see problems.
>>
> I have several systems that start 10 or more zones and I've never seen
> any problems.
>
> I agree with the comment elsewhere that you should be using SMF rather
> than rc scripts to start services.
>
> It is also possible to create SMF services with the appropriate
> dependencies to start your zones in the correct order.
>
> --
> Ian.
>
>

Re: [zones-discuss] Zone Not Starting Properly?

2011-12-01 Thread Derek McEachern
We haven't made the jump to zfs yet :-) We do lose some useful features
but haven't spent the time to port our stuff over to use zfs.

On Thu, Dec 1, 2011 at 2:47 PM, Ian Collins  wrote:

> On 12/ 2/11 06:07 AM, Derek McEachern wrote:
>
>> System has 72GB RAM
>> xeon cpu - 2 socket - 4 core - 16 thread
>>
>> zoneroot is on a ufs filesystem on its own drive, separate from OS.
>>
>>
> That (UFS) is a strange choice for a recent Solaris 10 version.  You lose
> the useful zones/ZFS features such as cloning.
>
> --
> Ian.
>
>

Re: [zones-discuss] Zone Not Starting Properly?

2011-12-01 Thread Derek McEachern
I agree, our script could certainly be improved to add logic to check for
these failures and handle them which we will probably end up doing.

Derek

On Thu, Dec 1, 2011 at 2:47 PM, "Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D." <
laot...@gmail.com> wrote:

>  it seems that you could
> 1) improve your rc script to check the other dependencies for apache
> or
> 2) use SMF for apache, which checks the other dependencies
> my 2c
>
>
>
> On 12/1/2011 1:33 PM, Derek McEachern wrote:
>
> Thanks Mike.
>
>  The more I look at this the more I think it is load related. svcs -x only
> shows that the LP print server is not running which I don't think has any
> impact on what I'm seeing.
>
>  As for who not reporting what I would expect, I tracked that down to
> someone installing the GNU tools in /usr/local/bin and then setting the
> default path to reference those before /bin/ :-(
>
>  /bin/who -r shows the zone is at run level 3.
>
>  Looking at /var/svc/log/milestone-multi-user-server:default.log I can
> see that some of the other services have most likely not completed before
> it tries to run the rc scripts. It appears that the /usr filesystem hasn't
> yet been mounted read/write and the appstart script is logging an error
> that indicates rpc services are not completely running.
>
>  Executing legacy init script "/etc/rc3.d/S98apache".
> (30)Read-only file system: httpd: could not open error log file
> /usr/local/apache2/logs/error_log.
> Unable to open logs
> Legacy init script "/etc/rc3.d/S98apache" exited with return code 0.
> Executing legacy init script "/etc/rc3.d/S99appstart".
> ERROR: Unable to contact any server
> Legacy init script "/etc/rc3.d/S99appstart" exited with return code 0.
> [ Dec 1 09:17:13 Method "start" exited with status 0 ]
>
>  We have a process in place that only starts 3 zones at one time so we
> are not doing all 40 at once but it could be that with this hardware even
> trying 3 at a time is too much and we may need to drop to 2.
>
>  Derek
>
>  On Thu, Dec 1, 2011 at 12:07 PM, Mike Gerdts wrote:
>
>> On Thu 01 Dec 2011 at 10:39AM, Derek McEachern wrote:
>> > Have a peculiar problem that I haven't seen before.
>> >
>> > When starting a system that has about 35 - 40 zones on it occasionally
>> > we see that one of the zones doesn't come up properly. You can log into
>> > the zone but none of the /etc/rc3.d scripts have been run.
>> >
>> > /var/adm/messages is completely empty and when running who -r to see the
>> > run level it doesn't report anything.
>>
>>  Take a look at the output of svcs -x.  Most likely you have a service
>> that svc:/milestone/multi-user-server:default depends on (directly or
>> indirectly) that has timed out and as such is in maintenance.  Because
>> the dependency is not satisfied, this milestone doesn't come up so the
>> rc3 scripts are not run.
>>
>> My guess is the timeout is because so many zones are starting at once
>> that the disks are being thrashed.  The resulting I/O backlog slows down
>> the startup of services, which leads to timeouts, which lead to some
>> services failing to even try to start.
>>
>> A google search and a 5 second read suggests that this link may be of
>> help to adjust the timeout of services that require a longer timeout:
>>
>> http://www.runningunix.com/2009/01/changing-timeouts-on-smf-services/
>>
>> --
>> Mike Gerdts
>> Solaris Core OS / Zones
>> http://blogs.oracle.com/zoneszone/
>>
>
>
>
>
>
> --
> Hung-Sheng Tsao Ph D.
> Founder & Principal
> HopBit GridComputing LLC
> cell: 9734950840
> http://laotsao.wordpress.com/
> http://laotsao.blogspot.com/
>
>

Re: [zones-discuss] Zone Not Starting Properly?

2011-12-01 Thread Derek McEachern
Random zone.

We've been testing to see if there is a threshold of trying to start too
many in parallel but so far we don't see anything.

We saw the problem trying to start 3 zones in parallel but it was very
intermittent. Like 1 out of every 4 tries at starting all 40 zones we would
see 1 failure. We ran some tests starting 10 zones in parallel and so far
no errors. Our assumption was that if it was load related moving from 3 to
10 zones we would see problems.

Derek

On Thu, Dec 1, 2011 at 2:48 PM, Ian Collins  wrote:

> On 12/ 2/11 05:39 AM, Derek McEachern wrote:
>
>> Have a peculiar problem that I haven't seen before.
>>
>> When starting a system that has about 35 - 40 zones on it occasionally we
>> see that one of the zones doesn't come up properly. You can log into the
>> zone but none of the /etc/rc3.d scripts have been run.
>>
>>  The same zone, or a random one?
>
> What happens if you halt one or more zones before rebooting?  Is there a
> threshold where the problem begins to occur?
>
> --
> Ian.
>
>

Re: [zones-discuss] Zone Not Starting Properly?

2011-12-01 Thread Derek McEachern
Thanks Mike.

The more I look at this the more I think it is load related. svcs -x only
shows that the LP print server is not running which I don't think has any
impact on what I'm seeing.

As for who not reporting what I would expect, I tracked that down to someone
installing the GNU tools in /usr/local/bin and then setting the default path
to reference those before /bin/ :-(

/bin/who -r shows the zone is at run level 3.

Looking at /var/svc/log/milestone-multi-user-server:default.log I can see
that some of the other services have most likely not completed before it
tries to run the rc scripts. It appears that the /usr filesystem hasn't yet
been mounted read/write and the appstart script is logging an error that
indicates rpc services are not completely running.

Executing legacy init script "/etc/rc3.d/S98apache".
(30)Read-only file system: httpd: could not open error log file
/usr/local/apache2/logs/error_log.
Unable to open logs
Legacy init script "/etc/rc3.d/S98apache" exited with return code 0.
Executing legacy init script "/etc/rc3.d/S99appstart".
ERROR: Unable to contact any server
Legacy init script "/etc/rc3.d/S99appstart" exited with return code 0.
[ Dec 1 09:17:13 Method "start" exited with status 0 ]

We have a process in place that only starts 3 zones at one time so we are
not doing all 40 at once but it could be that with this hardware even
trying 3 at a time is too much and we may need to drop to 2.
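
As a sketch of what starting zones N at a time can look like (this is not
our actual process; start_in_batches and boot_zone are hypothetical names,
and zoneadm -z ... boot is the only real command involved):

```shell
#!/bin/sh
# start_in_batches SIZE CMD ZONE...
# Run "CMD ZONE" for every zone, at most SIZE at a time: each batch is
# started in the background and must finish before the next one begins.
start_in_batches() {
    size=$1; cmd=$2; shift 2
    running=0
    for zone in "$@"; do
        $cmd "$zone" &
        running=`expr "$running" + 1`
        if [ "$running" -ge "$size" ]; then
            wait            # block until the whole batch has finished
            running=0
        fi
    done
    wait                    # pick up a final, partial batch
}

# boot_zone is a hypothetical wrapper. zoneadm boot returns once the zone
# is booting, so a real wrapper would also need to wait for the zone's
# services to come up before returning.
boot_zone() {
    zoneadm -z "$1" boot
}

# Usage: start_in_batches 3 boot_zone zone01 zone02 zone03 zone04
```

Dropping from 3 to 2 is then just a matter of changing the first argument.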

Derek

On Thu, Dec 1, 2011 at 12:07 PM, Mike Gerdts  wrote:

> On Thu 01 Dec 2011 at 10:39AM, Derek McEachern wrote:
> > Have a peculiar problem that I haven't seen before.
> >
> > When starting a system that has about 35 - 40 zones on it occasionally we
> > see that one of the zones doesn't come up properly. You can log into the
> > zone but none of the /etc/rc3.d scripts have been run.
> >
> > /var/adm/messages is completely empty and when running who -r to see the
> > run level it doesn't report anything.
>
> Take a look at the output of svcs -x.  Most likely you have a service
> that svc:/milestone/multi-user-server:default depends on (directly or
> indirectly) that has timed out and as such is in maintenance.  Because
> the dependency is not satisfied, this milestone doesn't come up so the
> rc3 scripts are not run.
>
> My guess is the timeout is because so many zones are starting at once
> that the disks are being thrashed.  The resulting I/O backlog slows down
> the startup of services, which leads to timeouts, which lead to some
> services failing to even try to start.
>
> A google search and a 5 second read suggests that this link may be of
> help to adjust the timeout of services that require a longer timeout:
>
> http://www.runningunix.com/2009/01/changing-timeouts-on-smf-services/
>
> --
> Mike Gerdts
> Solaris Core OS / Zones http://blogs.oracle.com/zoneszone/
>

Re: [zones-discuss] Zone Not Starting Properly?

2011-12-01 Thread Derek McEachern
System has 72GB RAM
xeon cpu - 2 socket - 4 core - 16 thread

zoneroot is on a ufs filesystem on its own drive, separate from OS.

Derek

On Thu, Dec 1, 2011 at 11:01 AM, "Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D." <
laot...@gmail.com> wrote:

>  for 30-40 zones
> how much RAM does the main host have? what kind of CPU, and how many?
> is everything on ZFS? what is the storage/HDD for the zone root?
> regards
>
>
>
> On 12/1/2011 11:39 AM, Derek McEachern wrote:
>
> Have a peculiar problem that I haven't seen before.
>
>  When starting a system that has about 35 - 40 zones on it occasionally
> we see that one of the zones doesn't come up properly. You can log into the
> zone but none of the /etc/rc3.d scripts have been run.
>
>  /var/adm/messages is completely empty and when running who -r to see the
> run level it doesn't report anything.
>
>  # who -r
> run-level Dec 1 09:17 last=
>
>  Anyone else seen anything similar? We are running Solaris 10 update 9.
>
>  Regards,
> Derek
>
>
>
>
> --
> Hung-Sheng Tsao Ph D.
> Founder & Principal
> HopBit GridComputing LLC
> cell: 9734950840
> http://laotsao.wordpress.com/
> http://laotsao.blogspot.com/
>
>

[zones-discuss] Zone Not Starting Properly?

2011-12-01 Thread Derek McEachern
Have a peculiar problem that I haven't seen before.

When starting a system that has about 35 - 40 zones on it occasionally we
see that one of the zones doesn't come up properly. You can log into the
zone but none of the /etc/rc3.d scripts have been run.

/var/adm/messages is completely empty and when running who -r to see the
run level it doesn't report anything.

# who -r
run-level Dec 1 09:17 last=

Anyone else seen anything similar? We are running Solaris 10 update 9.

Regards,
Derek

Re: [zones-discuss] Is it possible to determine from the zone as the global zone is called

2010-08-05 Thread Derek McEachern
One quick method that is mentioned frequently here and one we use very
successfully is to create a readonly lofs to /etc/nodename. We add the
following to all our zonecfgs

add fs
set dir=/etc/GLOBAL
set special=/etc/nodename
set type=lofs
add options [ro, nodevices]
end

so when you're in an ngz you can cat /etc/GLOBAL to get the global host name.
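
A small sketch of how a script inside the zone might read it. The function
name global_zone_name is made up; the default path is the /etc/GLOBAL mount
from the zonecfg stanza above.

```shell
#!/bin/sh
# global_zone_name [FILE]
# Print the global zone's host name as exposed through the read-only
# lofs mount; FILE defaults to /etc/GLOBAL.
global_zone_name() {
    file=${1:-/etc/GLOBAL}
    if [ -r "$file" ]; then
        # /etc/nodename holds a single line with the host name.
        head -n 1 "$file"
    else
        echo "global zone name not available: $file" >&2
        return 1
    fi
}

# Usage: gz=`global_zone_name` || gz=unknown
```

Because the file is a loopback mount of the global zone's /etc/nodename,
it follows the zone if the zone is moved to another host with the same
stanza.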

On Thu, Aug 5, 2010 at 7:00 AM, Richard L. Hamilton wrote:

> > Hi,
> > i'm new here and i have a question:
> > Is it possible to determine from within the zone what the
> > global zone is called?
> >
> > Is there a command in the zone, like zoneadm list,
> > which shows me the name of the global zone?
> >
> > I need it for a script in the zone.
>
> AFAIK, there is no standard way to do that.
>
> Some people create zones with a file containing the
> hostname of the global zone.
>
> Others might put that in oem-banner, or use sneep
> to put it in nvramrc, along with hardware serial numbers
> and such. http://wikis.sun.com/display/sneep/Home
>
> But none of those are a built-in solution.
>
> I like the idea of putting it in nvram better than putting it
> in a file, since if the zone is moved to another server, it should
> then show the new location without having to update a file.
> --
> This message posted from opensolaris.org

Re: [zones-discuss] how dynamic is your zones network configuration?

2010-06-04 Thread Derek McEachern
Never. We haven't ever had the need to change the interface for a zone.

On 6/4/10, Edward Pilatowicz  wrote:
> hey all,
>
> i had a quick question for all the zones users out there.
>
> after you've configured and installed a zone with ip-type=shared (the
> default), how often do you change the network interfaces assigned to
> that zone via zonecfg(1m)?  frequently? infrequently? never?  only when
> moving from testing to production?  etc...
>
> thanks
> ed
>

-- 
Sent from my mobile device


Re: [zones-discuss] vxfs in non-global zone

2010-03-16 Thread Derek McEachern
Thanks for the responses.

We don't plan on running the zone root on vxfs, it will be on ufs.

The VRTSvxfs package installs with parameters
SUNW_PKG_ALLZONES='true'
SUNW_PKG_HOLLOW='true'
SUNW_PKG_THISZONE='false'

so the package content is not delivered to the zone, just the package
information, so that it appears to be installed.

I think we are going to experiment with mounting the vxfs into the zone from
the global.


On Tue, Mar 16, 2010 at 10:26 AM, Henrik Johansson wrote:

> I am away from home and on a mobile device so I'll make it short.
>
> This works fine if you don't ever put the zoneroot on vxfs; if you do you
> will not be able to use all upgrade options and the Veritas supplied scripts
> for live upgrade only work with a vanilla install (no separate LUN for
> zones etc)
>
> Someone mentioned that not all packages were present in the local zone. I
> think that most of the VRTS packages (or too many at least) have PKG_ALLZONES
> set to true, so they must be installed in all zones (to have a supported
> system)
>
> I might be off target, it's late and I'm not supposed to do this on my
> vacation ;)
>
> Henrik
> http://sparcv9.blogspot.com
>

[zones-discuss] vxfs in non-global zone

2010-03-09 Thread Derek McEachern
We have been experimenting with mounting san storage with vxfs filesystems
in an ngz and there appear to be a couple of ways to accomplish this.

The FAQ links to a Symantec doc (
http://sfdoccentral.symantec.com/sf/5.0MP3/solaris/html/vxvm_admin/apbs07.htm)
that does it by adding a device to the zonecfg. This was problematic as our
ngz didn't have the necessary Veritas packages and wasn't able to
mount a vxfs filesystem. I haven't yet been able to determine if this was
because of how the software was installed in the gz or if they are
specifically excluded from the ngz.

It could also be done with an lofs by pointing to an already mounted vxfs in
the gz.
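
A sketch of the lofs variant, assuming the vxfs filesystem is already
mounted somewhere in the gz. The helper name lofs_stanza, the zone name
and both paths are placeholders; the stanza itself is ordinary zonecfg
"add fs" syntax.

```shell
#!/bin/sh
# lofs_stanza ZONE_DIR GLOBAL_DIR
# Print zonecfg commands that loop back GLOBAL_DIR (a file system already
# mounted in the global zone) to ZONE_DIR inside the zone.
lofs_stanza() {
    cat <<EOF
add fs
set dir=$1
set special=$2
set type=lofs
end
EOF
}

# Usage (zone name and paths are placeholders):
#   lofs_stanza /data /vxfs/zone1-data > /tmp/zone1-fs.cfg
#   zonecfg -z zone1 -f /tmp/zone1-fs.cfg
```

With this approach the zone gets the mount back on every boot, provided
the gz has the vxfs filesystem mounted before the zone starts.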

Yet another Symantec document (
http://seer.entsupport.symantec.com/docs/vascont/59.html) shows the file
system from the gz being directly mounted into the ngz.

i.e. from the global zone boot the ngz and then "mount -F vxfs
/dev/vx/dsk// //root/"

It took me a while to get my head around this but it works. The only obvious
problem I can see is that if the ngz reboots it loses its storage. There
doesn't appear to be a way to automatically get it remounted.


Is anyone else doing something similar? If so what has been your
experience/recommendation?

Derek

Re: [zones-discuss] Parameters in /etc/system in the zone

2009-10-21 Thread Derek McEachern
Vladi,

As far as I know there isn't an /etc/system file for a zone. You only have
one kernel which is running in the global zone so there isn't the need for
one in the ngz's.

If you're looking for a list of stuff to set/tune look at the resource
configuration. Here is a link to a Sun doc which has details:
http://docs.sun.com/app/docs/doc/817-1592?l=en

Derek

On Wed, Oct 21, 2009 at 10:38 AM, Yanakiev, Vladimir <
vladimir_yanak...@fanniemae.com> wrote:

>
> Hi, All!
>
> I have a zone question - to my knowledge /etc/system in a zone has very
> little meaning, as these are kernel parameters. But still certain things
> can be set using this file. Is there a list of parameters for
> /etc/system, that work in a zone? I am asking in general, not for
> specific issue...
>
> Thanks!
>
> Vladi
>
> This e-mail and its attachments are confidential and solely for the
> intended addressee(s). Do not share or use them without Fannie Mae's
> approval. If received in error, contact the sender and delete them.

Re: [zones-discuss] NFS zones via file based zpools or lofi/loopback

2009-08-25 Thread Derek McEachern
If zonecfg determines that the target fs type is nfs it will error and not
allow the zone to be configured. You can jump through hoops and do things to
hide that fact from zonecfg so it doesn't believe that the zone root is
running over nfs but I've got to think that it's not going to be well
tested, if tested at all.

Also, the blog you reference has the warning,
*"First let me say this is a workaround hack, we didn't do anything illegal,
and all the interfaces we used are regular Solaris interfaces, however it
comes dangerously close to the "don't do this at home folks" category. So
think twice before you'd use it in a production environment, but it
definitely fun to do."*

I'm not sure that I would feel comfortable running anything production
critical configured this way.

Derek


On Wed, Aug 19, 2009 at 6:03 PM, Henrik Johansson wrote:

> Hello Michael,
> On Aug 19, 2009, at 7:17 PM, Michael Barrett wrote:
>
> Say you create a zpool based on a file that lives on a NFS mount.  Then you
> mount that zpool to a local mount point and give it to your zone to live on.
>  I'm assuming that under the covers this is just another version of this
> loopback method:
>
> http://blogs.sun.com/jph/entry/containers_on_nfs
>
> Is there anyone out there running like this?  Any performance issues that
> jumped out at you?
>
>
>
> Since using files as backing store for a pool is not recommended, putting
> them on NFS will probably not make things better. It will only be as
> reliable as the remote filesystem and NFS implementation together. People
> have lost their pools in far less complex configurations, talk to the ZFS
> people but i doubt they will approve.
>
> I would feel safer with UFS if I were to put a filesystem on files or
> perhaps iSCSI:
> http://blogs.sun.com/JeffV/entry/zoit_solaris_zones_on_iscsi
>
> Regards
>
> Henrik
> http://sparcv9.blogspot.com
>
>

Re: [zones-discuss] Configure a zone through sysidcfg

2009-08-14 Thread Derek McEachern
Does the information in this thread help:

http://www.opensolaris.org/jive/thread.jspa?threadID=108000&tstart=0

On Fri, Aug 14, 2009 at 4:55 PM, v  wrote:

> I created an exclusive IP zone.  Now I want to configure it using sysidcfg
> and avoid the prompts at the initial login.
>
> I created the below sysidcfg file:
>
> timezone=US/Eastern
> system_locale=C
> terminal=xterms
> network_interface=vnic1 {dhcp protocol_ipv6=yes}
> root_password=abc123
> security_policy=none
> name_service=DNS
> nfs4_domain=dynamic
>
> I wanted to copy this file to the zone's etc directory, but there is no
> such directory at this time (I already installed and booted the zone).  I go
> to /export/zones/zone1/root  but the directory is empty.  There is nothing
> in there.  There is no .../zone1/etc either.  So, I created an etc directory
> under root directory, put my sysidcfg file, and logged into the zone.  I
> still got the initial configuration prompts.  Apparently, it didn't look
> at the sysidcfg file.  What am I doing wrong?
>
> Thanks...
> --
> This message posted from opensolaris.org

Re: [zones-discuss] show which zone

2009-07-22 Thread Derek McEachern
Not using ps -efZ.  If you use ps -eo and select the fields you want to see
you can get the full zonename.

From the ps(1) man page

 -Z  Prints the name of the zone with which the process is
     associated under an additional column header, ZONE. The
     ZONE column width is limited to 8 characters. Use ps -eZ
     for a quick way to see information about every process
     now running along with the associated zone name. Use
       ps -eo zone,uid,pid,ppid,time,comm,...
     to see zone names wider than 8 characters.
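
To narrow that output to one zone, something like the following sketch
works as a filter. The function name zone_procs and the zone name are
illustrative; on Solaris 10 nawk may be needed in place of awk for the
-v option.

```shell
#!/bin/sh
# zone_procs ZONE
# Filter "ps -eo zone,pid,comm" style output on stdin, keeping the header
# and the rows whose first column is exactly ZONE.
zone_procs() {
    awk -v zone="$1" 'NR == 1 || $1 == zone'
}

# Usage: ps -eo zone,pid,comm | zone_procs my-long-zone-name-01
```

Matching the whole first column, rather than grepping, keeps zones whose
names share a long common prefix apart.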


On Wed, Jul 22, 2009 at 1:14 PM, Mike DeMarco wrote:

> From the global zone I can do a ps -efZ to see which zone a process is
> running in but it only prints the first 8 characters of the zone name. This
> is a problem with our naming convention on zones as the last characters in
> the name are often the unique ones. Is it possible to expand the name field
> to show more than 8 characters or is there some other command I can use to
> show what zone a pid is running in from the global?
>
> Thanks
> mike
> --
> This message posted from opensolaris.org

Re: [zones-discuss] Zone in down state.

2009-07-13 Thread Derek McEachern
See the following threads:

http://opensolaris.org/jive/thread.jspa?threadID=101438&tstart=30
http://www.opensolaris.org/jive/thread.jspa?threadID=107664&tstart=0

Derek

On Mon, Jul 13, 2009 at 12:23 AM, Ketan  wrote:

> One of my zones is stuck in down state, not able to boot it or halt it ..
> not even detach .. is there any way to recover without rebooting the whole
> system ( global zone ) ?
> --
> This message posted from opensolaris.org

Re: [zones-discuss] Zone Stuck in down state - nfs share

2009-07-10 Thread Derek McEachern
You can look at the following thread where I had a similar problem with a
zone stuck in a shutting down state:

http://opensolaris.org/jive/thread.jspa?threadID=101438&tstart=30


The other thing to look for is processes that might be accessing the ngz
from the gz, using fuser. You can also use pwdx /proc/* | grep slabzone1 to
find processes. If you find any you can see what they are doing and kill them,
then try shutting down the zone again.
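
A slightly stricter version of the pwdx check, as a sketch: it compares
the working directory against the zone root as a path prefix, so zone1
does not also match zone10. The function name pids_under is made up, and
paths containing spaces are not handled.

```shell
#!/bin/sh
# pids_under ROOT
# Read "pwdx" output ("PID:  /current/dir") on stdin and print the PIDs
# whose working directory is ROOT or lies below it.
pids_under() {
    awk -v root="$1" '{
        pid = $1; sub(/:$/, "", pid)
        if ($2 == root || index($2, root "/") == 1) print pid
    }'
}

# Usage from the global zone (the zone root path is a placeholder):
#   pwdx /proc/* 2>/dev/null | pids_under /slabzone1/zonepath/root
```

The resulting PIDs can be handed to ps to see what they are before
deciding whether to kill them.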

Otherwise I haven't found a way to kill the zone short of rebooting the box.


As an aside, I was under the impression that it was not advisable to access
ngz filesystems from the gz. A quick search only seems to point to the gz
possibly doing something nefarious to the ngz but I can't find any technical
reason it shouldn't be done.


On Fri, Jul 10, 2009 at 10:58 AM,  wrote:

> I needed to share out a non-global zones folder via nfs so I did it from
> the global zone like so:
>
> # share /slabzone1/zonepath/root/home
>
> Later I rebooted the zone:
> # zoneadm -z slabzone1 reboot
>
> The reboot command hung and the zone became stuck in the "down" status.  I
> assume this is because of the nfs share,  I tried unsharing it, but that
> didn't help:
>
> # unshare /slabzone1/zonepath/root/home
> nfs unshare: /slabzone1/zonepath/root/home: not shared
>
> My attempts to get the zone to transition to "installed" state have all
> failed.  I assume this is a known issue, is there any way to recover without
> a reboot?
>
> Thanks,
>
> Alex
>

[zones-discuss] zonestats.pl - Proposed change to handle stuck zones.

2009-06-22 Thread Derek McEachern
I have been using the zonestats.pl script for a while and came across an odd
issue. I have a host that has a zone stuck in the "shutting_down" state that
I haven't been able to get cleaned up. When zonestats runs it sees this zone
and tries to zlogin into it, which has the effect of hanging the script.

I made a modification to the script to check the zone status and, if it's not
in a "running" state, skip it.

In the section of code that gathers the zone names:
Current:
#
# Gather list of zones, their status and pool type and association.
if ($DEBUG) { print "/usr/sbin/zoneadm list -v\n"; }
open (NAMES, "/usr/sbin/zoneadm list -v|");
$znum=0;
while (<NAMES>) {
  if (/^\s+(\S+)\s+(\S+)/) {
    if ($1 eq "ID") { next; }
    $znames[$znum++]=$2;
    $zoneid{$2}=$1;
    if ($opt_N) {
      $zlen = length ($znames[$znum-1]);
      $Nmaxznamelen = $zlen > $Nmaxznamelen ? $zlen : $Nmaxznamelen;
    }
  }
}
close NAMES;

Proposed:
#
# Gather list of zones, their status and pool type and association.
if ($DEBUG) { print "/usr/sbin/zoneadm list -v\n"; }
open (NAMES, "/usr/sbin/zoneadm list -v|");
$znum=0;
while (<NAMES>) {
  # Also capture the third column (STATUS).
  if (/^\s+(\S+)\s+(\S+)\s+(\S+)/) {
    if ($1 eq "ID") { next; }
    # Skip zones that are not running; zlogin would hang on them.
    if ($3 ne "running") { next; }
    $znames[$znum++]=$2;
    $zoneid{$2}=$1;
    if ($opt_N) {
      $zlen = length ($znames[$znum-1]);
      $Nmaxznamelen = $zlen > $Nmaxznamelen ? $zlen : $Nmaxznamelen;
    }
  }
}
close NAMES;
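
The same filter can be sketched in shell against the parseable output of
zoneadm list -p, which is colon-separated (id:name:state:path:...). This
is an alternative illustration, not part of zonestats.pl, and the function
name running_zones is made up.

```shell
#!/bin/sh
# running_zones
# Read "zoneadm list -p" output on stdin and print the names of the
# zones whose state field is exactly "running".
running_zones() {
    awk -F: '$3 == "running" { print $2 }'
}

# Usage: zoneadm list -p | running_zones
```

A zone stuck in shutting_down or down is simply left out of the list, so
nothing downstream tries to zlogin into it.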


Regards,
Derek

Re: [zones-discuss] Zone Stuck in a shutting_down state in os2008.11

2009-06-21 Thread Derek McEachern
I guess my wording was a little confusing. What I meant by "nfs mounts in
the ngz from the gz" is that since you can't log into the zone you need to
check for hung nfs mounts from the global zone. The nfs mounts that we have
had hung were from filers.

Derek

On Sun, Jun 21, 2009 at 3:05 PM, Craig Cory  wrote:

> For this reason and others, it is recommended to NOT mount non-global zone
> clients to their own global zone servers. Use lofs for these local mounts.
>
>
> Derek McEachern wrote:
> > I have had the same problem with two zones and using the following two
> > steps I was able to get one of the zones to shut down and the other one I
> > wasn't.
> >
> > First, check for hung nfs mounts for the ngz from the gz.  mount | grep
> > www2.  If you see any umount them from the gz.
> >
> > Next, check for any processes that might be accessing files in the ngz.
> > From the gz you can do something like:
> >
> > #  pwdx /proc/* | grep zone1
> > 17459:  /export/zone/zone1/root
> > 17731:  /export/zone/zone1/root/tmp
> > 18022:  /export/zone/zone1/root/tmp
> >
> > # ps -ef | egrep "18022|17731"
> > root 18022 17731   0 13:15:33 pts/2   0:00 sleep 50
> > root 18064 17745   0 13:17:40 pts/1   0:00 egrep 18022|17731
> > root 17731 17727   0 13:14:18 pts/2   0:00 sh
> >
> >
> > If you find any processes you can try and kill them.
> >
> >
> > After each step try and halt the zone and see if it comes down.
> >
> > If neither of these work the only solution I've heard of is rebooting the
> > host.
> >
> > On Sun, Jun 21, 2009 at 6:25 AM, solarg  wrote:
> >
> >> hello all,
> >> after trying to reboot a zone, it still hangs:
> >> he...@antigone:~# zoneadm -z www2 reboot;zlogin -C www2
> >>
> >> on another terminal, i try to kill the process:
> >> he...@antigone:~# ps -ef|grep www2
> >>root 16432 1   0   Mar 18 ?   0:03 zoneadmd -z www2
> >>root 18864 18860   0 12:18:14 pts/5   0:00 grep www2
> >>root 18809 11676   0 12:09:53 pts/3   0:00 zoneadm -z www2 reboot
> >> he...@antigone:~# kill -9 16432
> >>
> >> and:
> >> he...@antigone:~# zoneadm -z www2 reboot;zlogin -C www2
> >> door_call failed: Interrupted system call
> >> zone 'www2': WARNING: zone is in state 'down', but zoneadmd does not
> >> appear to be available; restarted zoneadmd to recover.
> >> [Connected to zone 'www2' console]
> >> ~^D
> >>
> >> he...@global:~# zoneadm list -cv
> >>  ID NAME STATUS PATH        BRAND IP
> >>  73 www2 down   /zones/www2 ipkg  shared
> >> i also have:
> >> he...@antigone:~# mdb -k
> >> > ::walk zone | ::print zone_t zone_name zone_ref
> >> ...
> >> zone_name = 0xff044b049c80 "www2"
> >> zone_ref = 0x2
> >>
> >> a previous thread said:
> >> "If the refcount is greater than 0x1, it could be:
> >>6272846 User orders zone death; NFS client thumbs nose
> >> "
> >>
> >> he...@antigone:~# ps -ef|grep www2
> >>root 19091 18860   0 13:19:32 pts/5   0:00 zoneadm -z www2 halt
> >>root 19093 1   0 13:19:32 ?   0:00 zoneadmd -z www2
> >>root 19113 11676   0 13:24:30 pts/3   0:00 grep www2
> >> he...@antigone:~# truss -p 19093
> >> /4: door_return(0x, 0, 0x, 0xFE4F0E00, 1007360)
> >> (sleeping...)
> >> /3: zone_destroy(73)(sleeping...)
> >> /1: pollsys(0x08046AD0, 4, 0x, 0x) (sleeping...)
> >> /2: door_unref()(sleeping...)
> >> he...@antigone:~# truss -p 19091
> >> door_call(6, 0x08047590)(sleeping...)
> >>
> >>
> >> Any idea?
> >>
> >> thanks for help,
> >>
> >> gerard
> >> ___
> >> zones-discuss mailing list
> >> zones-discuss@opensolaris.org
> >>
> > ___
> > zones-discuss mailing list
> > zones-discuss@opensolaris.org
>
>
> --
> Craig Cory
>  Senior Instructor :: ExitCertified
>  : Sun Certified System Administrator
>  : Sun Certified Network Administrator
>  : Sun Certified Security Administrator
>  : Veritas Certified Instructor
>
>  8950 Cal Center Drive
>  Bldg 1, Suite 110
>  Sacramento, California  95826
>  [e] craig.c...@exitcertified.com
>  [p] 916.669.3970
>  [f] 916.669.3977
>  [w] WWW.EXITCERTIFIED.COM
> +-+
>   OTTAWA | SACRAMENTO | MONTREAL | LAS VEGAS | QUEBEC CITY | CALGARY
>SAN FRANCISCO | VANCOUVER | REGINA | WINNIPEG | TORONTO
>
>

Re: [zones-discuss] Zone Stuck in a shutting_down state in os2008.11

2009-06-21 Thread Derek McEachern
I have had the same problem with two zones. Using the following two steps
I was able to get one of the zones to shut down, but not the other.

First, check from the gz for hung nfs mounts belonging to the ngz:  mount | grep
www2.  If you see any, umount them from the gz.

Next, check for any processes that might be accessing files in the ngz. From
the gz you can do something like:

#  pwdx /proc/* | grep zone1
17459:  /export/zone/zone1/root
17731:  /export/zone/zone1/root/tmp
18022:  /export/zone/zone1/root/tmp

# ps -ef | egrep "18022|17731"
root 18022 17731   0 13:15:33 pts/2   0:00 sleep 50
root 18064 17745   0 13:17:40 pts/1   0:00 egrep 18022|17731
root 17731 17727   0 13:14:18 pts/2   0:00 sh


If you find any such processes you can try to kill them.


After each step try to halt the zone and see if it comes down.

If neither of these works, the only solution I've heard of is rebooting the
host.
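
The second step can be scripted. This is only a rough sketch, not something
from the thread: zone_cwd_pids is a made-up helper name, and it assumes the
"pid:  path" layout of the pwdx output shown above and paths without spaces.
It prints the PIDs whose working directory is the zone root or below it.

```shell
#!/bin/sh
# zone_cwd_pids ZONEROOT  (hypothetical helper, not a Solaris command)
# Reads pwdx-style "pid:  path" lines on stdin and prints the PIDs whose
# working directory is ZONEROOT itself or anything beneath it.
zone_cwd_pids() {
    awk -v root="$1" '
        {
            pid = $1
            sub(/:$/, "", pid)      # "17459:" becomes "17459"
            path = $2
            # exact match, or root followed by "/" at the start of the path,
            # so that zone1 does not also match zone10
            if (path == root || index(path, root "/") == 1)
                print pid
        }'
}

# Possible use from the gz, with the example path from this message:
#   pwdx /proc/* 2>/dev/null | zone_cwd_pids /export/zone/zone1/root
```

Matching on the root plus a trailing slash avoids the false hits that a plain
grep for "zone1" would give on a zone named zone10.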

On Sun, Jun 21, 2009 at 6:25 AM, solarg  wrote:

> hello all,
> after trying to reboot a zone, it still hangs:
> he...@antigone:~# zoneadm -z www2 reboot;zlogin -C www2
>
> on another terminal, i try to kill the process:
> he...@antigone:~# ps -ef|grep www2
>root 16432 1   0   Mar 18 ?   0:03 zoneadmd -z www2
>root 18864 18860   0 12:18:14 pts/5   0:00 grep www2
>root 18809 11676   0 12:09:53 pts/3   0:00 zoneadm -z www2 reboot
> he...@antigone:~# kill -9 16432
>
> and:
> he...@antigone:~# zoneadm -z www2 reboot;zlogin -C www2
> door_call failed: Interrupted system call
> zone 'www2': WARNING: zone is in state 'down', but zoneadmd does not appear
> to be available; restarted zoneadmd to recover.
> [Connected to zone 'www2' console]
> ~^D
>
> he...@global:~# zoneadm list -cv
>  ID NAME STATUS PATH        BRAND IP
>  73 www2 down   /zones/www2 ipkg  shared
>
> i also have:
> he...@antigone:~# mdb -k
> > ::walk zone | ::print zone_t zone_name zone_ref
> ...
> zone_name = 0xff044b049c80 "www2"
> zone_ref = 0x2
>
> a previous thread said:
> "If the refcount is greater than 0x1, it could be:
>6272846 User orders zone death; NFS client thumbs nose
> "
>
> he...@antigone:~# ps -ef|grep www2
>root 19091 18860   0 13:19:32 pts/5   0:00 zoneadm -z www2 halt
>root 19093 1   0 13:19:32 ?   0:00 zoneadmd -z www2
>root 19113 11676   0 13:24:30 pts/3   0:00 grep www2
> he...@antigone:~# truss -p 19093
> /4: door_return(0x, 0, 0x, 0xFE4F0E00, 1007360)
> (sleeping...)
> /3: zone_destroy(73)(sleeping...)
> /1: pollsys(0x08046AD0, 4, 0x, 0x) (sleeping...)
> /2: door_unref()(sleeping...)
> he...@antigone:~# truss -p 19091
> door_call(6, 0x08047590)(sleeping...)
>
>
> Any idea?
>
> thanks for help,
>
> gerard
> ___
> zones-discuss mailing list
> zones-discuss@opensolaris.org
>

Re: [zones-discuss] df -Z behaviour in global Zone

2009-06-05 Thread Derek McEachern
It was actually another problem we were trying to solve and happened to
notice the df behaviour while debugging.

We had a script in the gz that wanted to mount the nfs share. It did a
simple check to see if the share was already mounted by parsing mount output,
which showed the ngz mount. The script in the gz then tried to access the mount
that was in the ngz, which failed, at which point the script decided to umount
the share since it couldn't get to it.

The umount was successful, which is expected, but the poor people in the ngz
just saw their mount disappear. Whoops.

df doesn't really ignore nfs mounts, since it reports the data correctly if
you don't use the -Z option and if you run it in the ngz. From the gz, the -Z
option works correctly for other fs types in the ngz, just not for nfs.

I would have expected df -Z to just not report on nfs fs's in the ngz
instead of throwing the cryptic "Not owner" error.
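
For what it's worth, the mount check that bit us could be made zone-aware
along these lines. This is a sketch only: gz_has_nfs_mount is a made-up name,
the filer name in the usage comment is hypothetical, and it assumes that all
zonepaths sit under one common directory (such as /zone) and that stdin is in
mnttab layout (special, mount point, fstype, options, time).

```shell
#!/bin/sh
# gz_has_nfs_mount SHARE ZONEBASE  (hypothetical helper)
# Reads mnttab-format lines on stdin and succeeds only if SHARE is
# nfs-mounted at a mount point that is NOT underneath ZONEBASE, i.e. the
# mount belongs to the gz rather than to some ngz.
gz_has_nfs_mount() {
    awk -v share="$1" -v base="$2" '
        $1 == share && $3 == "nfs" && index($2, base "/") != 1 { found = 1 }
        END { exit(found ? 0 : 1) }'
}

# Possible use in a gz script (share name is made up):
#   if gz_has_nfs_mount filer1:/export/servers /zone < /etc/mnttab; then
#       echo "already mounted in the gz"
#   fi
```

With a check like this the gz script would treat the share as not mounted
when only an ngz has it, and so would never get as far as the umount.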

On Fri, Jun 5, 2009 at 4:52 PM, Peter Tribble wrote:

>
>
> That's why df does ignore nfs mounts by default and you have the -Z option to
> force it to go look. I've not yet come across a case where df -Z has told me
> anything useful - what problem are you trying to solve by running df -Z?
>
> --
> -Peter Tribble
> http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
>

Re: [zones-discuss] What is the correct approach to add Java, GlassFish to non-global zones?

2009-06-05 Thread Derek McEachern
It took me a while to find it but in some of my previous searches I came
across something which might help:

http://opensolaris.org/jive/thread.jspa?messageID=378354&tstart=0

There are some links in one of the messages that talk about these packages.

Derek

On Thu, Jun 4, 2009 at 11:49 PM, Kevin Pan  wrote:

> After creating a non-global zone, what is the correct approach to add Java,
> GlassFish, PostgreSQL, SunWebServer to the non-global zone? (use the "pkg
> install" or "pkgadd" command, but how? and where to find the package name?)
> Please provide some instructions or point to the relevant docs. Thanks!!!
> --
> This message posted from opensolaris.org
> ___
> zones-discuss mailing list
> zones-discuss@opensolaris.org
>

[zones-discuss] df -Z behaviour in global Zone

2009-06-05 Thread Derek McEachern
In doing some testing we came across some unexpected behaviour (at least
unexpected to me) of the df -Z command when run in the global zone.

If a non-global zone has an nfs fs mounted, df -Z dumps all kinds of statvfs
errors because, as best I can tell, it can't actually see the fs.  The mount is
in the gz's /etc/mnttab but it can't be accessed from the gz.

df -Z
df: cannot statvfs /zone/zone/root/servers: Not owner

This seems like a bug to me, shouldn't df ignore ngz nfs mounts?

Derek

Re: [zones-discuss] Zone Stuck in a shutting_down state

2009-05-07 Thread Derek McEachern
Steve,

Thanks for this information.

I ran through the commands and this is what I see.

> ::kmem_cache ! grep rnode
a6438008 rnode_cache   0200 00  65670506
a643c008 rnode4_cache  0200 00  9680

When I run the following:
a6438008::walk kmem | ::print rnode_t r_vnode | ::vnode2path

I can see two listed fs's
/zone/zonetest-new/root/opt/xxx//logs.tar
/zone/zonetest-new/root/var/xxx/

But when I run the ::fsinfo command there are no nfs mounted filesystems in
the zonetest-new fs. Both of the fs's listed were nfs mounted when the zone
was up and running.

It really looks like someone was doing something with the logs.tar file at
the time the zone was coming down which probably started all my problems.

I really appreciate all this info.

Thanks,
Derek


On Thu, May 7, 2009 at 12:34 AM, Steve Lawrence wrote:

> Related comments from bug below (X'ed out some paths):
>
> The zone in question clearly has too many references
>
> > 030004a09680::print zone_t zone_ref
> zone_ref = 0t11
>
> Ten too many, to be precise.  So what's holding onto the zone?  Well the
> rnode cache has 5 entries
>
> > ::kmem_cache ! grep rnode
> 030003a1e988 rnode_cache    00  640   572988
> 030003a20988 rnode4_cache   00  9840
> > 030003a1e988::walk kmem | ::print rnode_t r_vnode | ::vnode2path
> /opt/zones/z1/root/
> /opt/zones/z1/root/
> /opt/zones/z1/root/
> /opt/zones/z1/root/
> /opt/zones/z1/root/
>
> even though no nfs filesystems are mounted
>
> > ::fsinfo
>VFSP FS  MOUNT
> 0187f420 ufs /
> 0187f508 devfs   /devices
> 03315780 ctfs/system/contract
> 033156c0 proc/proc
> 03315600 mntfs   /etc/mnttab
> 03315480 tmpfs   /etc/svc/volatile
> 033153c0 objfs   /system/object
> 0300039987c0 namefs  /etc/svc/volatile/repository_door
> 0300039984c0 fd  /dev/fd
> 030003a99e00 ufs /var
> 030003998400 tmpfs   /tmp
> 030003a99680 tmpfs   /var/run
> 030003a98f00 namefs  /var/run/name_service_door
> 030003a98b40 namefs  /var/run/sysevent_channels/syseventd_channel...
> 030003a989c0 namefs  /etc/sysevent/sysevent_door
> 030003a98780 namefs  /etc/sysevent/devfsadm_event_channel/1
> 030003a98540 namefs  /dev/.zone_reg_door
> 030003a983c0 namefs  /dev/.devfsadm_synch_door
> 030003a99380 namefs  /etc/sysevent/piclevent_door
> 0300044b1d80 namefs  /var/run/picld_door
> 030003a99200 ufs /opt
> 0300044b0700 namefs  /var/run/zones/z1.zoneadmd_door
>
> And as apparent from the path, all of those rnodes refer to zone z1 through
> their mntinfo structure
>
> > 030003a1e988::walk kmem | ::print rnode_t r_vnode->v_vfsp->vfs_data |
> ::print mntinfo_t mi_zone | ::zone
>ADDR ID NAME PATH
> 030004a09680  1 z1   /opt/zones/z1/root/
> 030004a09680  1 z1   /opt/zones/z1/root/
> 030004a09680  1 z1   /opt/zones/z1/root/
> 030004a09680  1 z1   /opt/zones/z1/root/
> 030004a09680  1 z1   /opt/zones/z1/root/
>
> So if each of those rnodes has two holds on the zone, then that accounts for
> all of the extra holds exactly.
>
>

Re: [zones-discuss] Zone Stuck in a shutting_down state

2009-05-06 Thread Derek McEachern
I don't believe that I can see the comments since they are not public.

Is that something you can pass along?

On Wed, May 6, 2009 at 5:27 PM, Steve Lawrence wrote:

> >I already tried killing the zoneadmd process and issuing the halt and all
> >it does is start back up the zoneadmd process and hang.  I can't force a
> >crashdump on the system since I can't take the box down.
> >
> >Bug 6272846 makes reference to nfs version 3, (which is the version we are
> >using), and the client apparently leaking rnodes. Is there any way to
> >verify this other than a forced crashdump? I might take a live core of the
> >system and open a case to see if that yields anything.
>
> The zone_ref > 1 means that something in the kernel is holding the zone.
> You should be able to use "mdb -k" on the live system, and issue dcmds similar
> to the comments of 6272846.  No need to force a crashdump or take a live
> crashdump.
>
> -Steve L.
>
> >
> >Derek
> >
> >On Wed, May 6, 2009 at 4:08 PM, Steve Lawrence
> ><stephen.lawre...@sun.com> wrote:
> >
> >  zsched is always unkillable.  It will only exit when instructed to by
> >  zoneadmd.
> >
> >  Is the remaining zone "shutting down", or "down"?  (zoneadm list -v).
> >
> >  What is the ref_count on the zone?
> >
> >  # mdb -k
> >  > ::walk zone | ::print zone_t zone_name zone_ref
> >
> >  If the refcount is greater than 0x1, it could be:
> >     6272846 User orders zone death; NFS client thumbs nose
> >
> >  No workaround for this one.  A crashdump would help investigate a
> >  zone_ref greater than 1.
> >
> >  Is there a zoneadmd process for the given zone?
> >  # pgrep -lf zoneadmd
> >
> >  If so, please provide  truss -p " of this process.  You may also attempt
> >  killing this zoneadmd process (which lives in the global zone), and then
> >  re-attempting "zoneadm -z  halt".
> >
> >  Thanks,
> >
> >  -Steve L.
>

Re: [zones-discuss] Zone Stuck in a shutting_down state

2009-05-06 Thread Derek McEachern
The zone is in a shutting_down state.

The mdb command for this zone returns 0x1a, which is greater than 1.

zone_name = 0xfe86c83d61c0 "zonetest-new"
zone_ref = 0x1a

This is new to me, what is the refcount counting? What should this value be
for the zone to shutdown?
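
As an aside, a quick way to pick the held zones out of that mdb output might
look like the following. It is a sketch under assumptions: held_zones is a
made-up helper, it expects the "zone_name = ..." / "zone_ref = 0x..." line
pairs exactly as printed above, and it only understands hex counts (not mdb's
0t decimal notation).

```shell
#!/bin/sh
# held_zones  (hypothetical helper)
# Reads "zone_name = ADDR "name"" / "zone_ref = 0xNN" pairs on stdin and
# prints "name count" for every zone whose reference count is above 1.
held_zones() {
    name=
    while read -r key eq val rest; do
        case $key in
            zone_name)
                # rest is the quoted zone name; drop the double quotes
                name=$(printf '%s' "$rest" | tr -d '"')
                ;;
            zone_ref)
                ref=$(printf '%d' "$val")   # hex to decimal, e.g. 0x1a
                if [ "$ref" -gt 1 ]; then
                    printf '%s %s\n' "$name" "$ref"
                fi
                ;;
        esac
    done
}

# Possible use (dcmd taken from Steve's message):
#   echo '::walk zone | ::print zone_t zone_name zone_ref' | mdb -k | held_zones
```

Lines that are neither zone_name nor zone_ref are skipped, so the rest of the
mdb output does not need to be trimmed first.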

There is a zoneadmd process running for the zone.  Trussing it and issuing
the halt command I can see it look up the zone, get some attributes and then
send it the shutdown, which is where it hangs.

5364:   psargs: zoneadmd -z zonetest-new
5364/2: door_return(0xFE6CD870, 4096, 0x, 0) (sleeping...)
5364/4: door_return(0x, 0, 0x, 0) (sleeping...)
5364/3: door_unref()(sleeping...)
5364/1: pollsys(0x08046C50, 4, 0x, 0x) (sleeping...)
5364/2: 32.4128 door_return(0xFE6CD870, 4096, 0x, 0)= 0
5364/2: 32.4642 door_ucred(0x08077150)  = 0
5364/2: 32.4769 zone_lookup("zonetest-new")  = 64
5364/2: 32.4770 zone_getattr(64, ZONE_ATTR_STATUS, 0xFE6CD85C, 4) = 4
5364/2: 32.4843 zone_lookup("zonetest-new")  = 64
5364/2: zone_shutdown(64)   (sleeping...)

I already tried killing the zoneadmd process and issuing the halt, and all it
does is start the zoneadmd process back up and hang.  I can't force a
crashdump on the system since I can't take the box down.

Bug 6272846 makes reference to nfs version 3 (which is the version we are
using) and the client apparently leaking rnodes. Is there any way to verify
this other than a forced crashdump? I might take a live core of the system
and open a case to see if that yields anything.

Derek


On Wed, May 6, 2009 at 4:08 PM, Steve Lawrence wrote:

> zsched is always unkillable.  It will only exit when instructed to by
> zoneadmd.
>
> Is the remaining zone "shutting down", or "down"?  (zoneadm list -v).
>
> What is the ref_count on the zone?
>
> # mdb -k
> > ::walk zone | ::print zone_t zone_name zone_ref
>
> If the refcount is greater than 0x1, it could be:
>6272846 User orders zone death; NFS client thumbs nose
>
> No workaround for this one.  A crashdump would help investigate a zone_ref
> greater than 1.
>
> Is there a zoneadmd process for the given zone?
> # pgrep -lf zoneadmd
>
> If so, please provide  truss -p " of this process.  You may also attempt
> killing this zoneadmd process (which lives in the global zone), and then
> re-attempting "zoneadm -z  halt".
>
> Thanks,
>
> -Steve L.
>
>
>

Re: [zones-discuss] Zone Stuck in a shutting_down state

2009-05-05 Thread Derek McEachern
Just following up on the progress and resolution of my stuck zones problem.  I
had two zones stuck in the shutting_down state.

Based on initial feedback I looked in the gz's /etc/mnttab for nfs mounts
that were mounted in the ngz. There were a couple, and I umount'ed them. Then I
was able to find two processes that indicated they were accessing the ngz
filesystem, zsched and svc.configd. I tried trussing svc.configd but was
unable to due to an "unanticipated system error".  I killed svc.configd, ran
the zoneadm halt and the zone successfully shut down.

On the second zone that's stuck I umount'ed the nfs file systems and checked
for processes accessing the ngz filesystem and the only one reported is
zsched. Trying to halt the zone doesn't do anything and from the looks of it
zsched appears to be unkillable. It looks like this zone is here to stay
until I can reboot the box.

Derek

On Tue, Apr 28, 2009 at 10:09 PM, Derek McEachern
wrote:

> There were a bunch of nfs mounts listed in the /etc/mnttab of the global
> zone. I was able to umount them but the zone is still hung up.
>
> I tried killing the zoneadmd process and ran zoneadm halt again and it
> started the zoneadmd back up but it didn't do anything.
>
> Thanks to everyone for their suggestions, looks like I'm going to have to
> wait until I can take the box down for a reboot.
>
> Regards,
> Derek
>
>
> On Tue, Apr 28, 2009 at 9:02 PM, Alexander J. Maidak 
> wrote:
>
>> If it's a hung nfs mount you should be able to see it still mounted in
>> the /etc/mnttab file in the global zone: grep nfs /etc/mnttab. It will be
>> mounted under the zonepath.  You should then be able to do a umount
>> -f / from the global zone and if you're really lucky the
>> zone will finish shutting down.
>>
>> -Alex
>>
>> On Tue, 2009-04-28 at 16:19 -0500, Derek McEachern wrote:
>> > It's possible that it could be nfs mount related since the zone did
>> > have nfs mounted fs's but they should have been umounted prior to
>> > shutting down the zone.  In any event I can no longer get into the
>> > zone to check using zlogin and zlogin -C.
>> >
>> > I tried Bryan's suggestion on looking for processes that might have
>> > open filehandles to files under the zone's filesystem tree but I don't
>> > see that there are any.
>> >
>> > On Tue, Apr 28, 2009 at 3:40 PM, Bryan Allen 
>> > wrote:
>> >
>> > +--
>> > | On 2009-04-28 15:37:22, Derek McEachern wrote:
>> > |
>> > | We were trying to bring down a zone on a S10 U4 system and it ended up stuck
>> > | in the shutting_down state.
>> > |
>> > | ID NAME         STATUS        PATH               BRAND  IP
>> > | 74 zonetest-new shutting_down /zone/zonetest-new native shared
>> > |
>> > |
>> > | The only process I see running is the zoneadmd process
>> > |
>> > | dlet15:/home/derekm/ ps -efZ | grep zonetest-new
>> > |   global root 12680 1   0   Apr 24 ?   0:02 zoneadmd -z zonetest-new
>> >
>> >
>> >
>> > Do any processes (notably shells in the global zones) have an open filehandle
>> > somewhere under the zone's filesystem tree? This can (at least on Sol10) cause
>> > zones to not shut down, since it can't close the FH (I assume, anyway).
>> > --
>> > bda
>> > cyberpunk is dead. long live cyberpunk.
>> > ___
>> > zones-discuss mailing list
>> > zones-discuss@opensolaris.org
>> >
>> > ___
>> > zones-discuss mailing list
>> > zones-discuss@opensolaris.org
>>
>>
>

Re: [zones-discuss] Zone Stuck in a shutting_down state

2009-04-28 Thread Derek McEachern
There were a bunch of nfs mounts listed in the /etc/mnttab of the global
zone. I was able to umount them but the zone is still hung up.

I tried killing the zoneadmd process and ran zoneadm halt again. That
started zoneadmd back up, but it didn't do anything.

Thanks to everyone for their suggestions, looks like I'm going to have to
wait until I can take the box down for a reboot.

Regards,
Derek

On Tue, Apr 28, 2009 at 9:02 PM, Alexander J. Maidak wrote:

> If it's a hung nfs mount you should be able to see it still mounted in
> the /etc/mnttab file in the global zone: grep nfs /etc/mnttab. It will be
> mounted under the zonepath.  You should then be able to do a umount
> -f / from the global zone and if you're really lucky the
> zone will finish shutting down.
>
> -Alex
>
> On Tue, 2009-04-28 at 16:19 -0500, Derek McEachern wrote:
> > It's possible that it could be nfs mount related since the zone did
> > have nfs mounted fs's but they should have been umounted prior to
> > shutting down the zone.  In any event I can no longer get into the
> > zone to check using zlogin and zlogin -C.
> >
> > I tried Bryan's suggestion on looking for processes that might have
> > open filehandles to files under the zone's filesystem tree but I don't
> > see that there are any.
> >
> > On Tue, Apr 28, 2009 at 3:40 PM, Bryan Allen 
> > wrote:
> >
> > +--
> > | On 2009-04-28 15:37:22, Derek McEachern wrote:
> > |
> > | We were trying to bring down a zone on a S10 U4 system and it ended up stuck
> > | in the shutting_down state.
> > |
> > | ID NAME         STATUS        PATH               BRAND  IP
> > | 74 zonetest-new shutting_down /zone/zonetest-new native shared
> > |
> > |
> > | The only process I see running is the zoneadmd process
> > |
> > | dlet15:/home/derekm/ ps -efZ | grep zonetest-new
> > |   global root 12680 1   0   Apr 24 ?   0:02 zoneadmd -z zonetest-new
> >
> >
> >
> > Do any processes (notably shells in the global zones) have an open filehandle
> > somewhere under the zone's filesystem tree? This can (at least on Sol10) cause
> > zones to not shut down, since it can't close the FH (I assume, anyway).
> > --
> > bda
> > cyberpunk is dead. long live cyberpunk.
> > ___
> > zones-discuss mailing list
> > zones-discuss@opensolaris.org
> >
> > ___
> > zones-discuss mailing list
> > zones-discuss@opensolaris.org
>
>

Re: [zones-discuss] Zone Stuck in a shutting_down state

2009-04-28 Thread Derek McEachern
It's possible that it could be nfs mount related since the zone did have nfs
mounted fs's, but they should have been umounted prior to shutting down the
zone.  In any event I can no longer get into the zone to check using zlogin
or zlogin -C.

I tried Bryan's suggestion on looking for processes that might have open
filehandles to files under the zone's filesystem tree but I don't see that
there are any.

On Tue, Apr 28, 2009 at 3:40 PM, Bryan Allen  wrote:

>
> +--
> | On 2009-04-28 15:37:22, Derek McEachern wrote:
> |
> | We were trying to bring down a zone on a S10 U4 system and it ended up stuck
> | in the shutting_down state.
> |
> | ID NAME         STATUS        PATH               BRAND  IP
> | 74 zonetest-new shutting_down /zone/zonetest-new native shared
> |
> |
> | The only process I see running is the zoneadmd process
> |
> | dlet15:/home/derekm/ ps -efZ | grep zonetest-new
> |   global root 12680 1   0   Apr 24 ?   0:02 zoneadmd -z zonetest-new
>
>
> Do any processes (notably shells in the global zones) have an open filehandle
> somewhere under the zone's filesystem tree? This can (at least on Sol10) cause
> zones to not shut down, since it can't close the FH (I assume, anyway).
> --
> bda
> cyberpunk is dead. long live cyberpunk.
> ___
> zones-discuss mailing list
> zones-discuss@opensolaris.org
>

[zones-discuss] Zone Stuck in a shutting_down state

2009-04-28 Thread Derek McEachern
All,

We were trying to bring down a zone on a S10 U4 system and it ended up stuck
in the shutting_down state.

ID NAME         STATUS        PATH               BRAND  IP
74 zonetest-new shutting_down /zone/zonetest-new native shared


The only process I see running is the zoneadmd process

dlet15:/home/derekm/ ps -efZ | grep zonetest-new
  global root 12680 1   0   Apr 24 ?   0:02 zoneadmd -z zonetest-new

Any suggestions on how to get rid of this zone and clean it up? I've tried
booting it but it indicated that the zone was already booted.

There are 15 other zones running on the box so rebooting isn't an option
right now.

Thanks,
Derek

Re: [zones-discuss] zonestat.pl without Resource Pools

2009-02-20 Thread Derek McEachern
Jeff,

Sorry this has taken so long to get to but yes, if I enable the pools and
pools/dynamic services it runs as expected.

Has any work started on a 'real' zonestat yet?

On Tue, Feb 17, 2009 at 9:44 PM, Jeff Victor wrote:

> On Tue, Feb 17, 2009 at 4:09 PM, Derek McEachern
>  wrote:
> > We are in the process of deploying applications into zones and I've been
> > looking at how to monitor what each zone is up to regarding resource usage.
> > I downloaded the zonestat.pl script to play around with and out of the
> > box it didn't actually give me any zone specific information.
> >
> > After poking around the code it turns out it won't break out any zone
> > level details unless resource pooling is enabled. We are deploying our
> > zones without resource restrictions.
>
> This is a known problem with v1.3. I am working on v1.3.1 which will
> fix that problem.
>
> As a temporary workaround: does it work correctly if you enable pools
> and don't configure any?
>
> GZ# svcadm enable pools
> GZ# svcadm enable pools/dynamic
>
>
> > I hacked the script to get around this problem for now but is this a
> > feature we can get added to the baseline?  Jeff, how are changes handled to
> > this script since you appear to be the owner?
>
> To make a contribution to the OpenSolaris community, first you would
> register as a contributor. The other option is to request a specific
> change in behavior, and I will try to get to it promptly.
>
> However, please understand (as the project web pages state) that this
> is a prototype to help us learn what a 'real' zonestat should do. The
> 'real' zonestat would be written in C or D for improved functionality
> and considerably better performance. This Perl script consumes a great
> deal of CPU cycles.
>
>
> --JeffV
>

Re: [zones-discuss] zonepath on NFS

2009-02-19 Thread Derek McEachern
As far as I could tell nfs is not supported.  I believe it will not allow
the zone path to be on a fs type of procfs, mntfs, autofs, nfs, or cachefs.
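
If you want to guard against this in a provisioning script, a check along
these lines might do. It is only a sketch: zonepath_fstype_ok is a made-up
helper, the list is simply the one given above (not taken from the zoneadm
source), and how you obtain the fs type of the zonepath is left to you.

```shell
#!/bin/sh
# zonepath_fstype_ok FSTYPE  (hypothetical helper)
# Succeeds unless FSTYPE is one of the fs types named in this message as
# not allowed for a zonepath.
zonepath_fstype_ok() {
    case $1 in
        procfs|mntfs|autofs|nfs|cachefs) return 1 ;;
        *) return 0 ;;
    esac
}

# Possible use:
#   zonepath_fstype_ok nfs || echo "zonepath fs type not supported"
```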

On Thu, Feb 19, 2009 at 9:59 AM, Brian Kolaci  wrote:

> Hi,
>
> I wanted to check the availability of putting the zonepath
> on NFS.  Is this now supported?  Are there issues with Live Upgrade?
> Any constraints or gotchas?
>
> Thanks,
>
> Brian
> ___
> zones-discuss mailing list
> zones-discuss@opensolaris.org
>

[zones-discuss] zonestat.pl without Resource Pools

2009-02-17 Thread Derek McEachern
We are in the process of deploying applications into zones and I've been
looking at how to monitor what each zone is up to regarding resource usage.
I downloaded the zonestat.pl script to play around with and out of the box
it didn't actually give me any zone specific information.

After poking around the code it turns out it won't break out any zone level
details unless resource pooling is enabled. We are deploying  our zones
without resource restrictions.

I hacked the script to get around this problem for now, but is this a feature
we can get added to the baseline?  Jeff, how are changes to this script
handled, since you appear to be the owner?

Derek