Re: [zones-discuss] Branded zones and external hardware

2010-08-05 Thread Frank Batschulat (Home)

On Thu, 05 Aug 2010 15:03:56 +0200, Joerg Schilling 
joerg.schill...@fokus.fraunhofer.de wrote:

 Frank Batschulat (Home) frank.batschu...@sun.com wrote:

 the problem with exporting the tape device to a NGZ, which although
 not supported can be achieved as you mention,
 is that there's no way to exclusively assign that particular tape device
 to a particular NGZ or to restrict access from the GZ or any other
 NGZ to that same tape device. that might become a problem
 if several different users try to use that tape from different
 NGZs or a NGZ and the GZ; such access may produce a somewhat
 questionable end result, so care must be taken when
 setting up such a configuration.

 Where do you see a difference from many different users trying to access  
 the same tape from the Global Zone?

technically there is no difference here.

but from an administrative point of view there is.
 
the zone administration (the zone's root) is often
delegated to some other person(s) than the one
administering the GZ. the zone's root position
may be fulfilled by an internal or external client
of the entity that administers and owns the GZ and
the corresponding HW itself.

one must just be more aware of the fact that there's no
restricted access to such a tape device than in normal
situations, because it's so easy to forget that
you've given away the tape device to some NGZ
in the past.

---
frankB
___
zones-discuss mailing list
zones-discuss@opensolaris.org

Re: [zones-discuss] whole root not installed ?!?

2010-07-03 Thread Frank Batschulat (Home)

On Sat, 03 Jul 2010 13:36:51 +0200, Daniel Dinu daniel.d...@gmail.com wrote:

 zone1 is installed in /vol1/zone1 and zone2 in /vol1/zone2.
 zone1 was configured as a sparse root zone (I used create command).
 zone2 was configured as a whole root zone (I used create -b command).
 Still, the space used is the same for both zones, as depicted above. Of
 course, I expected zone2 to use more space than zone1 (GB vs. MB).

 Can anybody tell me what I have done wrong? Is there something else I should've
 done, besides using create -b for the whole root zone creation?

nothing wrong, there are no sparse root zones for the ipkg(5) branded zones in
OpenSolaris.
 
---
frankB

[zones-discuss] confusing zone login processes

2010-06-02 Thread Frank Batschulat (Home)

just noticed something strange, perhaps someone has an explanation?

after booting a zone and logging in to it:

osoldev.batschul./export/home/techdocs/solaris_kernel/zones.= pfexec zoneadm -z zone2 boot

osoldev.batschul./export/home/techdocs/solaris_kernel/zones.= ps -eafd -Z | grep login
  global batschul  3821   993   0 07:59:32 pts/3   0:00 grep login
  global root  2301  1750   0 07:43:19 pts/5   0:00 zlogin -C zone2

now login to the zone:

osoldev.batschul./export/home/batschul.= pfexec zlogin zone2
[Connected to zone 'zone2' pts/6]
Last login: Wed Jun  2 07:52:29 on pts/6
Oracle Corporation  SunOS 5.11  snv_140 May 2010

from the NGZ I see:

r...@zone2:~# ps -eafd|grep login
root  3823  3386   0 07:59:39 pts/6   0:00 /usr/bin/login -z global -f root
root  3836  3824   0 08:00:30 pts/6   0:00 grep login

from the GZ I see:

osoldev.batschul./export/home/techdocs/solaris_kernel/zones.= ps -eafd -Z | grep login
  global root  3822   975   0 07:59:39 pts/2   0:00 zlogin zone2
   zone2 root  3823  3822   0 07:59:39 ??  0:00 /usr/bin/login -z global -f root
  global root  2301  1750   0 07:43:19 pts/5   0:00 zlogin -C zone2
  global batschul  3831   993   0 07:59:43 pts/3   0:00 grep login


huh? where does it get that from?

   zone2 root  3823  3822   0 07:59:39 ??  0:00 /usr/bin/login -z global -f root

this only happens when I use pfexec zlogin zone2, it does not
happen when logging in on the console ie. pfexec zlogin -C zone2

thanks
frankB





Re: [zones-discuss] confusing zone login processes

2010-06-02 Thread Frank Batschulat (Home)

On Wed, 02 Jun 2010 09:53:03 +0200, casper@sun.com wrote:


osoldev.batschul./export/home/techdocs/solaris_kernel/zones.= ps -eafd -Z | grep login
  global root  3822   975   0 07:59:39 pts/2   0:00 zlogin zone2
   zone2 root  3823  3822   0 07:59:39 ??  0:00 /usr/bin/login -z global -f root
  global root  2301  1750   0 07:43:19 pts/5   0:00 zlogin -C zone2
  global batschul  3831   993   0 07:59:43 pts/3   0:00 grep login


huh? where does it get that from?

   zone2 root  3823  3822   0 07:59:39 ??  0:00 /usr/bin/login -z global -f root

 I think it's because of auditing enabled by default: it keeps an
 additional copy of login.

hmmm, auditing or accounting ?

osoldev.batschul./export/home/.= pfexec mdb -k
 audit_active/D
audit_active:
audit_active:   1

#define C2AUDIT_UNLOADED 1   /* c2audit module not loaded */

and the accounting mod is also not loaded.

---
frankB




Re: [zones-discuss] Q: is there a way to get a zoneid from kernel (not in user context)?

2010-05-28 Thread Frank Batschulat (Home)

On Fri, 28 May 2010 07:31:27 +0200, eiji@oracle.com wrote:

 I'm wondering if there is a way to get a zoneid from kernel even though
 it's not in user context. If it's possible, this is useful for us to
 activate an HCA port in the exclusive-IP zone at boot time.

 Here's the background info.

 Currently the IP path info is gotten when the driver is attached, then
 an HCA port can be ready for RDSv3, but this way is for the global zone,
 and it doesn't work well for the exclusive-IP zone because the driver cannot
 get the zoneid when it's attached (so far). After all, we have to wait until
 customers run a command for RDSv3 in the zone, but the port should be ready
 at boot time w/o any customers' actions. It'd be better off getting it
 in the driver attach, but I don't know if it's possible.

 If it's not possible to get a zoneid from kernel if it's not in user context,
 then is there any recommended method to get it at boot time? I'm thinking
 maybe by using SMF, we can invoke an appropriate command (like ifconfig) at
 boot time to activate HCAs in the exclusive-IP zones, but if there is a proper
 way for this kind of purpose, that'd be better.

for kernel consumers use:

#include <sys/zone.h>

extern zoneid_t getzoneid(void);

cheers

---
frankB


Re: [zones-discuss] new zones community leaders

2010-05-28 Thread Frank Batschulat (Home)

On Fri, 28 May 2010 01:01:11 +0200, Edward Pilatowicz 
edward.pilatow...@oracle.com wrote:

 hey all,

 i wanted to propose zone community leadership status for the following
 folks:

 Frank Batschulat
 Gary Pennington
 John Levon
 Susan Kamm-Worrell

+1 


Re: [zones-discuss] Q: is there a way to get a zoneid from kernel (not in user context)?

2010-05-28 Thread Frank Batschulat (Home)

On Fri, 28 May 2010 10:43:22 +0200, eiji@oracle.com wrote:

 Hi Frank,

 getzoneid() can return a correct value even if it's called in a taskq thread
 (kernel context) and/or in an interrupt handler (interrupt context)?

I suppose so, look, it's not doing anything earth shattering:

getzoneid(void)
{
	return (curproc->p_zone->zone_id);
}

no locking involved, no allocations done, nothing considered
harmful in an interrupt context or taskq thread.

the only question is which proc your taskq/interrupt thread is bound to.

p0 or zsched ? p0 will always deliver the GLOBAL_ZONEID (zone0)
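
To make the point about process binding concrete, here is a small user-space model of that lookup. It is only a sketch under stated assumptions: zone_t, proc_t and the curproc global are simplified stand-ins invented for illustration (the real kernel structures carry many more fields, and curproc is derived from the running thread), and model_getzoneid() is a hypothetical name chosen so it cannot clash with a real getzoneid().

```c
#include <stdio.h>

/* Simplified stand-ins for the kernel types; NOT the real definitions. */
typedef int zoneid_t;
#define	GLOBAL_ZONEID	0

typedef struct zone {
	zoneid_t zone_id;
} zone_t;

typedef struct proc {
	zone_t *p_zone;		/* zone this process belongs to */
} proc_t;

/* In the kernel curproc comes from the current thread; here it is a plain global. */
static proc_t *curproc;

/* Same shape as the lookup quoted above: no locks, no allocations. */
static zoneid_t
model_getzoneid(void)
{
	return (curproc->p_zone->zone_id);
}

/* Convenience helper (hypothetical): report what the lookup yields for a proc. */
static void
report_zoneid(const char *label, proc_t *p)
{
	curproc = p;
	(void) printf("%s -> zoneid %d\n", label, model_getzoneid());
}
```

Pointing curproc at a process that belongs to the global zone (as p0 does) always yields GLOBAL_ZONEID, no matter which zone the work is being done for; only a process that lives in the non-global zone, such as that zone's zsched, yields that zone's id.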

 Thanks,

 -Eiji


 



-- 
frankB

It is always possible to agglutinate multiple separate problems
into a single complex interdependent solution.
In most cases this is a bad idea.

Re: [zones-discuss] Q: is there a way to get a zoneid from kernel (not in user context)?

2010-05-28 Thread Frank Batschulat (Home)

On Fri, 28 May 2010 13:35:07 +0200, James Carlson carls...@workingcode.com 
wrote:

 On 05/28/10 04:57, Frank Batschulat (Home) wrote:
 On Fri, 28 May 2010 10:43:22 +0200, eiji@oracle.com wrote:

 getzoneid() can return a correct value even if it's called in a taskq thread
 (kernel context) and/or in an interrupt handler (interrupt context)?

 I suppose so, look, it's not doing anything earth shattering:

getzoneid(void)
{
	return (curproc->p_zone->zone_id);
}

 no locking involved, no allocations done, nothing considered
 harmful in an interrupt context or taskq thread.

 the only question is which proc your taskq/interrupt thread is bound to.

 It sounds like we might need more information about what the original
 poster is attempting to do.

 Interrupts themselves aren't features of non-global zones, so they're
 not normally attributed to any particular zone.  In theory, if there
 were devices dedicated to individual zones, you could use the device's
 state structure to find the zoneid associated.

 If you just use getzoneid() in that context, you'll get the zoneid of
 the zone whose thread happens to be pinned down by the interrupt.  In
 other words, it's an arbitrary and almost certainly wrong answer.

 I think something's amiss if you're asking about zoneid outside the
 context of direct system call processing.  The answers there vary quite
 a bit.  For example, with STREAMS, the correct answer is to fetch the
 cred_t attached to the dblk_t, and get the zoneid from the cred_t.

 It's not unusual at all for interrupts and taskqs to do work on behalf
 of many different zones, and for them to need to track this information
 separately.

yepp, my bad! and of course interrupt threads are always bound to p0 when they
are created (see thread_create_intr()).
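
James's STREAMS suggestion, taking the zone from the credential that travels with the data rather than from whatever thread happens to be running, can be sketched in user space roughly as follows. Everything here is an assumption made for illustration: the cred_t and dblk_t below are cut-down stand-ins with a single field each, and msg_zoneid() is a hypothetical helper, not a kernel interface.

```c
#include <stddef.h>

/* Cut-down stand-ins for illustration only; NOT the real kernel types. */
typedef int zoneid_t;
#define	GLOBAL_ZONEID	0

typedef struct cred {
	zoneid_t cr_zoneid;	/* zone of whoever originated the request */
} cred_t;

typedef struct dblk {
	cred_t *db_credp;	/* credential attached to the data block, may be NULL */
} dblk_t;

/*
 * Hypothetical helper: derive the zone from the credential carried with
 * the message. The caller supplies the fallback to use when no credential
 * is attached, since the running thread says nothing reliable here.
 */
static zoneid_t
msg_zoneid(const dblk_t *db, zoneid_t fallback)
{
	if (db == NULL || db->db_credp == NULL)
		return (fallback);
	return (db->db_credp->cr_zoneid);
}
```

The design point is that the zone identity is stored with the work item when it is queued, so an interrupt or taskq thread servicing many zones can still attribute each item correctly.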

---
frankB

Re: [zones-discuss] Q: is there a way to get a zoneid from kernel (not in user context)?

2010-05-28 Thread Frank Batschulat (Home)

On Fri, 28 May 2010 13:35:07 +0200, James Carlson carls...@workingcode.com 
wrote:

 On 05/28/10 04:57, Frank Batschulat (Home) wrote:
 On Fri, 28 May 2010 10:43:22 +0200, eiji@oracle.com wrote:

 getzoneid() can return a correct value even if it's called in a taskq thread
 (kernel context) and/or in an interrupt handler (interrupt context)?

 I suppose so, look, it's not doing anything earth shattering:

getzoneid(void)
{
	return (curproc->p_zone->zone_id);
}

 no locking involved, no allocations done, nothing considered
 harmful in an interrupt context or taskq thread.

 the only question is which proc your taskq/interrupt thread is bound to.

and not only are interrupt threads bound to p0, taskq threads
created using taskq_create() are also bound to p0.

so for taskq that'd be only valid if you'd use taskq_create_proc()
with something not being p0.

---
frankB


Re: [zones-discuss] impossible to attach migrated zone to a new server

2010-05-20 Thread Frank Batschulat (Home)

On Wed, 19 May 2010 20:47:41 +0200, Philippe Bürgisser 
burgis...@rfidcenter.ch wrote:

 I followed the guide 
 (http://docs.sun.com/app/docs/doc/819-2450/gcgnc?l=en&a=view) from sun to 
 move a working zone into a new server with the same configuration.

 I executed those commands :

 tar xf myzone.tar -- to /export/zones/myzone
 zonecfg -z myzone
 create -a /export/zones/myzone

 all was ok

 when I wanted to attach, I got this message
 r...@ns358375:/# zoneadm -z proxiproduits attach -u

try these steps for v2v, that's how it worked for me when I tested
this last time:

source system:
 
batsc...@suizid:~$ zoneadm list -cp
0:global:running:/::ipkg:shared
-:my-zone:installed:/tank/zones/my-zone:629e41d5-7d92-4949-cd8b-ee755143fea0:ipkg:shared
 
batsc...@suizid:~$ pfexec zoneadm -z my-zone detach
 
batsc...@suizid:~$ zoneadm list -cp
0:global:running:/::ipkg:shared
-:my-zone:configured:/tank/zones/my-zone::ipkg:shared
 
batsc...@suizid:~$ cd /tank/zones
 
batsc...@suizid:/tank/zones$ su
Password: 
 
batsc...@suizid:/tank/zones# find my-zone -print | cpio -oP@/ | gzip > my-zone.cpio.gz
891010 blocks
batsc...@suizid:/tank/zones# ls -la my-zone.cpio.gz
-rw-r--r--   1 root root 124913062 Mar 10 14:06 my-zone.cpio.gz
 
target system:
 
osoldev.batschul./.= zoneadm list -cp
0:global:running:/::ipkg:shared
-:my-zone:configured:/tank/zones/my-zone::ipkg:shared

NB: the location of the archive is important as far as I recall, i.e.
it should be in the zonepath's parent directory:
 
osoldev.batschul./.= pfexec zoneadm -z my-zone attach -a /tank/zones/my-zone.cpio.gz -u
Log File: /tmp/my-zone.attach_log.doaGco
Attaching...
 
   Global zone version: ent...@0.5.11,5.11-0.134:20100302T023003Z
   Non-Global zone version: ent...@0.5.11,5.11-0.134:20100302T023003Z
Evaluation: Packages in my-zone are in sync with global zone.
Attach complete.
 
osoldev.batschul./.= zoneadm list -cp
0:global:running:/::ipkg:shared
-:my-zone:installed:/tank/zones/my-zone:53c18130-533f-69ab-c998-f160f0a8ebc3:ipkg:shared
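
If you want to script a check of the state transitions shown above (installed, configured, installed again with a new UUID), the 'zoneadm list -cp' lines split on colons into zoneid, zonename, state, zonepath, uuid, brand and ip-type. Below is a minimal sketch of such a splitter; split_zone_line() and ZFIELDS are names made up here, and the sketch assumes the zonepath itself contains no colon. Note that strtok() would not do, because it silently skips the empty uuid field of a detached zone.

```c
#include <stdio.h>
#include <string.h>

#define	ZFIELDS	7	/* zoneid:zonename:state:zonepath:uuid:brand:ip-type */

/*
 * Split one "zoneadm list -cp" line in place. Empty fields (such as the
 * uuid of a zone in the configured state) are kept as empty strings.
 * Returns the number of fields stored in fields[].
 */
static int
split_zone_line(char *line, char *fields[ZFIELDS])
{
	int n = 0;
	char *p = line;

	line[strcspn(line, "\n")] = '\0';	/* drop a trailing newline */
	while (n < ZFIELDS) {
		fields[n++] = p;
		p = strchr(p, ':');
		if (p == NULL)
			break;
		*p++ = '\0';
	}
	return (n);
}
```

Feeding it the configured-state line from the target system gives seven fields, with fields[2] holding "configured" and fields[4] holding an empty string until the attach assigns a UUID.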
 
hth
frankB


Re: [zones-discuss] renaming zonepath

2010-02-23 Thread Frank Batschulat (Home)

On Sun, 21 Feb 2010 17:33:06 +0100, Anil an...@entic.net wrote:

 r...@vps1:~# zoneadm -z note move /zones/note
 Moving across file systems; copying zonepath /zones/bugs...sh[1]: cd: 
 /zones/bugs: [No such file or directory]
 zoneadm: zone 'note': 'copy' failed with exit code 1.

 The copy failed.
 More information can be found in /var/log/zoneAAA2XaapU

 Cleaning up zonepath /zones/note...The ZFS file system for this zone has been 
 destroyed.

 I believe the zones are not mounted when the zone is not running so the cp 
 fails. Luckily it did not delete the data *phew*.

hmmm...I tested this myself, and I'm not getting the bizarre error messages from sh.

what error messages are logged in the mentioned file '/var/log/zoneAAA2XaapU'?

in general, it'd be helpful to provide the corresponding 'zoneadm list -cp' and
'zfs list -t all' outputs for the zones before and after the failure, as well as
OS version/build info.

the error message suggests we attempted to copy, which is correct if we're
crossing file system boundaries according to the 'move' PSARC case PSARC/2005/711:

snip
The syntax for moving a zone will be:

# zoneadm -z my-zone move /newpath

where /newpath specifies the new zonepath for the zone.  This will
be implemented so that it works both within and across filesystems,
subject to the existing rules for zonepath (e.g. it cannot be on an
NFS mounted filesystem).  When crossing filesystem boundaries the
data will be copied and the original directory will be removed.
Internally the copy will be implemented using cpio with the proper
options to preserve all of the data (ACLs, etc.).  The zone must be
halted while being moved.
snip end
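
The "within or across filesystems" distinction in that description is commonly detected by comparing the device ids that stat(2) reports for two paths. The sketch below shows that general technique only; it is not taken from the zoneadm source, and same_filesystem() is a hypothetical helper name. Since a new zonepath does not exist yet, a caller would pass its parent directory.

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Returns 1 if both paths live on the same file system, 0 if they do
 * not, and -1 if either path cannot be stat'ed.
 */
static int
same_filesystem(const char *a, const char *b)
{
	struct stat sa, sb;

	if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
		return (-1);
	/* Each mounted file system (each ZFS dataset included) has its own st_dev. */
	return (sa.st_dev == sb.st_dev);
}
```

A result of 1 would allow a plain rename, while 0 is the case where the description above calls for a copy followed by removal of the original directory.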

however, contrary to this description and your case, in my tests this just
changes the zfs mountpoint property and does nothing else, even when the move
would cross file system boundaries:

osoldev.root./export/home/batschul.= zoneadm list -cp
0:global:running:/::ipkg:shared
-:zone1:installed:/tank/zones/zone1:caa7e784-dab0-6f77-e202-8cf135714809:ipkg:shared

osoldev.root./export/home/batschul.= zfs list -t all
NAME                       USED  AVAIL  REFER  MOUNTPOINT
tank/zones                 996M   151G    38K  /tank/zones
tank/zones/zone1           996M   151G    36K  /tank/zones/zone1
tank/zones/zone1/ROOT      996M   151G  31.5K  legacy
tank/zones/zone1/ROOT/zbe  996M   151G   996M  legacy

1) moving inside the same zfs dataset tank/zones

osoldev.root./export/home/batschul.= zoneadm -z zone1 move /tank/zones/test

osoldev.root./export/home/batschul.= zoneadm list -cp
0:global:running:/::ipkg:shared
-:zone1:installed:/tank/zones/test:caa7e784-dab0-6f77-e202-8cf135714809:ipkg:shared

osoldev.root./export/home/batschul.= zfs list -t all
NAME                       USED  AVAIL  REFER  MOUNTPOINT
tank/zones                 996M   151G  36.5K  /tank/zones
tank/zones/zone1           996M   151G    36K  /tank/zones/test
tank/zones/zone1/ROOT      996M   151G  31.5K  legacy
tank/zones/zone1/ROOT/zbe  996M   151G   996M  legacy

2) moving to different zfs dataset and different pool, rpool/export/home

osoldev.root./export/home/batschul.= zoneadm -z zone1 move /export/home/test2

osoldev.root./export/home/batschul.= zoneadm list -cp
0:global:running:/::ipkg:shared
-:zone1:installed:/export/home/test2:caa7e784-dab0-6f77-e202-8cf135714809:ipkg:shared

osoldev.root./export/home/batschul.= zfs list -t all
NAME                       USED  AVAIL  REFER  MOUNTPOINT
tank/zones                 996M   151G  36.5K  /tank/zones
tank/zones/zone1           996M   151G    36K  /export/home/test2
tank/zones/zone1/ROOT      996M   151G  31.5K  legacy
tank/zones/zone1/ROOT/zbe  996M   151G   996M  legacy

so there's no actual move of the data happening in case 2), which seems wrong
to me.

even the behavior in 1) looks suspect.

apparently we do have a bug open already that pretty much matches case 1):

6918505 zone move should rename ZFS file system, not change mountpoint
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6918505

but I cannot find anything existing for case 2) and the missing move
action there, where we not only cross file system boundaries but even move
over to a different pool!

so either we'd need to enhance the scope of 6918505, or a new bug
ought to be filed for case 2)

---
frankB


Re: [zones-discuss] Zones on shared storage - a warning

2010-02-23 Thread Frank Batschulat (Home)

update on this one: 

a workaround, if you will, or apparently the more appropriate way to do this, is
to use lofiadm(1M) to create a pseudo block device backed by the file hosted
on NFS and use the created lofi device (e.g. /dev/lofi/1) as the device for
zpool create and all subsequent I/O (this did not produce the strange CKSUM
errors), e.g.:

osoldev.root./export/home/batschul.= mount -F nfs opteron:/pool/zones /nfszone
osoldev.root./export/home/batschul.= mount -v| grep nfs
opteron:/pool/zones on /nfszone type nfs 
remote/read/write/setuid/devices/xattr/dev=9080001 on Tue Feb  9 10:37:00 2010
osoldev.root./export/home/batschul.= nfsstat -m
/nfszone from opteron:/pool/zones
 Flags: 
vers=4,proto=tcp,sec=sys,hard,intr,link,symlink,acl,rsize=1048576,wsize=1048576,retrans=5,timeo=600
 Attr cache:acregmin=3,acregmax=60,acdirmin=30,acdirmax=60

osoldev.root./export/home/batschul.=  mkfile -n 7G /nfszone/remote.file
osoldev.root./export/home/batschul.=  ls -la /nfszone
total 28243534
drwxrwxrwx   2 nobody   nobody 6 Feb  9 09:36 .
drwxr-xr-x  30 batschul other 32 Feb  8 22:24 ..
-rw---   1 nobody   nobody   7516192768 Feb  9 09:36 remote.file

osoldev.root./export/home/batschul.= lofiadm -a /nfszone/remote.file
/dev/lofi/1

osoldev.root./export/home/batschul.= lofiadm
Block Device File   Options
/dev/lofi/1  /nfszone/remote.file   -

osoldev.root./export/home/batschul.= zpool create -m /tank/zones/nfszone nfszone /dev/lofi/1

Feb  9 10:50:35 osoldev zfs: [ID 249136 kern.info] created version 22 pool nfszone using 22

osoldev.root./export/home/batschul.= zpool status -v nfszone
  pool: nfszone
 state: ONLINE
 scrub: none requested
config:

        NAME           STATE     READ WRITE CKSUM
        nfszone        ONLINE       0     0     0
          /dev/lofi/1  ONLINE       0     0     0

---
frankB

Re: [zones-discuss] Error on zoneadm attach -u when going from b132 to b133

2010-02-22 Thread Frank Batschulat (Home)

On Mon, 22 Feb 2010 11:49:46 +0100, Paul van der Zwan paul.vanderz...@sun.com 
wrote:

 I upgraded my system from b132 to b133 this weekend and I got error messages 
 when I ran attach -u to upgrade my zones.
 The second run of the install of updated packages fails.
 In the log I find:

 $ pfexec cat /var/tmp/dns.attach_log.sCaydi
 [Saturday, 20 February 2010 20:57:50 CET] Log File: 
 /var/tmp/dns.attach_log.sCaydi
 [Saturday, 20 February 2010 20:57:52 CET] Attaching...
 [Saturday, 20 February 2010 20:57:52 CET] existing
 [Saturday, 20 February 2010 20:57:52 CET]
 [Saturday, 20 February 2010 20:57:52 CET]   Sanity Check: Passed.  Looks like 
 an OpenSolaris system.

 pkg: 'network/ftp' matches multiple packages
 network/ftp
 service/network/ftp
 'network/dns/bind' matches multiple packages
 service/network/dns/bind
 network/dns/bind
 'network/ssh' matches multiple packages
 network/ssh
 service/network/ssh

 If I run attach -u a second time it attaches without doing anything, or 
 giving an error.

 Are my zones OK or are they partly upgraded ?

I think exactly this issue is listed in the 133 release notes, which state that
running a 2nd attach will work.

if our marvellous opensolaris.org system worked, you could read the 133
release notes here on the indiana discuss alias:

http://opensolaris.org/jive/thread.jspa?threadID=124275

---
frankB


[zones-discuss] headsup: going to build 133 with Zones installed

2010-02-21 Thread Frank Batschulat (Home)

jfyi, if you do 'pkg image-update' from build 132 to build 133 and you
have non-global zones in the installed state, you may run into:

Bug 14668 - pkg directory action does work when there is none
http://defect.opensolaris.org/bz/show_bug.cgi?id=14668

note that you should only hit this if you have installed SUNWscp.

see bugs:
 Bug 14667 - compatibility/ucb delivers /export and /home
 http://defect.opensolaris.org/bz/show_bug.cgi?id=14667
and
 
 http://bugs.opensolaris.org/view_bug.do?bug_id=6928051

workaround_1:
- uninstall the zones before the image-update

workaround_2 (as reported in the bug):
- rebooting into single user, entering maintenance
  mode as root and image-updating from there. 

-

NB: _strongly discouraged_:

workaround_3: 
- detach all of the zones on the system before the image-update.
 
Why is workaround_3 strongly discouraged ?

this is strongly discouraged as no new BE gets created for the installed Zone;
instead the current Zone BE is updated if you do an attach -u following this
image-update.
 
if you want to switch back to a previous build BE later, e.g. 132 (because
there's something in 133 that prevents you from doing what you want), the
Zone's BE will no longer match the global BE, e.g. the GZ BE will be build 132
but the NGZ BE will still be 133.

net result: the Zone is not usable anymore in the previous BE with build 132,
i.e. you cannot downgrade the NGZ BE again.

---
frankB

Re: [zones-discuss] renaming zonepath

2010-02-20 Thread Frank Batschulat (Home)

On Sun, 21 Feb 2010 07:15:10 +0100, Anil an...@entic.net wrote:

 I am trying to rename a zonename (and zonepath) to a new zone. But I get this 
 error:

 # zfs rename zones/bugs zones/note
 cannot rename 'zones/bugs': child dataset with inherited mountpoint is used 
 in a non-global zone

that's correct, zfs rename honours the 'zoned' attribute of child datasets that
are currently in use by an NGZ, i.e. you should see this 'zoned' property on:

osoldev.batschul./export/home/batschul.= pfexec zfs get zoned tank/zones/zone2/ROOT
NAME                   PROPERTY  VALUE  SOURCE
tank/zones/zone2/ROOT  zoned     on     local

 I tried this with the zone in detached mode. Any tips?

zoneadm -z bugs move /zones/note

zoneadm(1M)

 move new_zonepath

 Move the zonepath to new_zonepath. The zone must be
 halted before this subcommand can be used. The
 new_zonepath must be a local file system and normal
 restrictions for zonepath apply.

hth
frankB

[zones-discuss] codereview for 6914152 (zonecfg)

2010-02-19 Thread Frank Batschulat (Home)

May I request 2 code reviewers for the changes for:

6914152 zonecfg fails when less(1M) is missing
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6914152

http://cr.opensolaris.org/~batschul/zpager/

thanks!
frankB

1) old failure when $PAGER was bogus:

osoldev.root./export/home/batschul/tmp.= setenv PAGER foobar

osoldev.root./export/home/batschul/tmp.= zonecfg -z zone2
zonecfg:zone2> info
sh: line 1: foobar: not found

zonecfg:zone2> help
sh: line 1: foobar: not found

zonecfg:zone2> help add
usage:
add resource-type
(global scope)
add property-name property-value
(resource scope)
Add specified resource to configuration.

sh: line 1: foobar: not found

2) new failure mode when $PAGER is bogus:

osoldev.root./export/home/batschul.= setenv PAGER nonsense

osoldev.root./export/home/batschul.= zonecfg -z zone2
zonecfg:zone2> info
Could not stat PAGER nonsense: No such file or directory
zonename: zone2
zonepath: /tank/zones/zone2
brand: ipkg
autoboot: false
bootargs:
pool:
limitpriv:
scheduling-class:
ip-type: shared
hostid:

zonecfg:zone2> help
Could not stat PAGER nonsense: No such file or directory
Commands:

add resource-type
(global scope)
add property-name property-value
(resource scope)
Add specified resource to configuration.
.


Re: [zones-discuss] codereview for 6914152 (zonecfg)

2010-02-19 Thread Frank Batschulat (Home)

On Fri, 19 Feb 2010 15:39:21 +0100, Jerry Jelinek gerald.jeli...@sun.com 
wrote:

 On 02/19/10 06:53, Frank Batschulat (Home) wrote:
 May I request 2 code reviewers for the changes for:

 6914152 zonecfg fails when less(1M) is missing
 http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6914152

 http://cr.opensolaris.org/~batschul/zpager/

 This looks fine to me.  One nit:

 911  5192 The error says "Could not stat PAGER".  This error
  message might be useful to a developer
  but isn't that useful for a sysadmin.  Can you print
  something more meaningful like "PAGER %s does not exist"?

Thanks Jerry, that is indeed a valid concern, I changed it to be:

snip
PAGER /usr/bin/nonsense does not exist (No such file or directory).
snip end

I included the real error string in case of permission errors where the
file does indeed exist and I am now dropping the mysterious stat part.
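
For readers who do not want to open the webrev, the idea behind the reworded message can be sketched like this. It is an illustration only, not the code under review: check_pager() is a hypothetical helper, and the only thing taken from the thread is the message format shown above, with the strerror() text kept in parentheses so that permission problems remain visible.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: returns 0 if the pager path can be stat'ed.
 * Otherwise it writes a sysadmin-friendly message into msg and returns -1.
 */
static int
check_pager(const char *pager, char *msg, size_t msglen)
{
	struct stat sb;
	int err;

	if (stat(pager, &sb) == 0)
		return (0);
	err = errno;	/* save errno before any further library calls */
	(void) snprintf(msg, msglen, "PAGER %s does not exist (%s).",
	    pager, strerror(err));
	return (-1);
}
```

On failure a caller can print the message and then fall back to unpaged output, which matches the new behaviour shown earlier in the thread, where 'info' and 'help' still print their results after the warning.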

updated webrev:

http://cr.opensolaris.org/~batschul/zpager/

cheers
frankB

Re: [zones-discuss] Native Vs. ipkg

2010-02-18 Thread Frank Batschulat (Home)

On Thu, 18 Feb 2010 16:45:01 +0100, Ben ben.lav...@gmail.com wrote:

 What are the differences between native and ipkg zones?
 When I did one of the Solaris 10 admin courses we learnt how zones use parts 
 of the global-zone (for example using global-zone directories like /etc 
 etc...).

That is one of the differences; you are referring to what we called 'sparse
root' zones, which inherit a lot of the system directories like /usr, /lib,
/sbin and /platform via read-only lofs loopback file system mounts.
this was possible with Solaris 10 and the so called 'native' branded zones.

this is different for OpenSolaris ipkg branded zones, which only support
'whole root zones'.

and of course, as already mentioned, the IPS vs. SysV packaging difference.

hth
frankB


Re: [zones-discuss] zoneadm clone -m copy does not really copy on ZFS zonepath

2010-02-18 Thread Frank Batschulat (Home)

On Thu, 18 Feb 2010 19:58:11 +0100, Christine Tran christine.t...@gmail.com 
wrote:

 On Sat, Feb 13, 2010 at 3:10 AM, Frank Batschulat (Home)
 frank.batschu...@sun.com wrote:

 a '-x nodataset' option for 'clone' like in 'install' is unlikely to
 happen; in fact I will remove the '-x nodataset' option for 'install'
 completely soon in OSOL build 135
[...]
 I have created test ipkg type zones on this laptop before, I have not
 done an upgrade but I've allowed Package Manager to update packages as
 far as it's able.  You say you will remove -x nodataset option,
 implying it hasn't been done yet, but here's what happened this

the '-x nodataset' option only applies to the 'native' brand (i.e. Solaris 10,
SX-DE; see native(5)) - it is not available for the 'ipkg' brand, nor is it
available for the solaris8, solaris9 and solaris10 brands.

 morning when I tried to create a new zone.

 r...@fiat~ cat /etc/release
OpenSolaris 2008.11 snv_101b_rc2 X86
Copyright 2008 Sun Microsystems, Inc.  All Rights Reserved.
 Use is subject to license terms.
Assembled 19 November 2008

 r...@fiat~ zonecfg -z pink
 pink: No such zone configured
 Use 'create' to begin configuring a new zone.
 zonecfg:pink> create
 zonecfg:pink> set zonepath=/zone/pink
 zonecfg:pink> add net
 zonecfg:pink:net> set physical=e1000g0
 zonecfg:pink:net> set address=192.168.20.1/24
 zonecfg:pink:net> end
 zonecfg:pink> verify
 zonecfg:pink> commit
 zonecfg:pink> info
 zonename: pink
 zonepath: /zone/pink
 brand: ipkg
 autoboot: false
 bootargs:
 pool:
 limitpriv:
 scheduling-class:
 ip-type: shared
 net:
   address: 192.168.20.1/24
   physical: e1000g0
   defrouter not specified
 zonecfg:pink> exit
 r...@fiat~ zoneadm -z pink install -x nodataset
 Error: no zonepath dataset.

that's one of the problems that may arise, since the
'-x nodataset' option is really handled inside zoneadm.c:install_func()
and not in the brand specific code that is executed later.

zoneadm did honour this option, but the 'ipkg' brand
specific code that is executed _after_ zoneadm.c:install_func()
barfs.

 OK, I will create a dataset:

 r...@fiat~ zfs list
 NAME                      USED  AVAIL  REFER  MOUNTPOINT
 rpool                    26.4G  71.5G    72K  /rpool
 rpool/ROOT               19.8G  71.5G    18K  legacy
 rpool/ROOT/opensolaris   19.8G  71.5G  19.6G  /
 rpool/dump               1.97G  71.5G  1.97G  -
 rpool/export             2.70G  71.5G    19K  /export
 rpool/export/home        2.70G  71.5G    19K  /export/home
 rpool/export/home/ctran  2.70G  71.5G  2.70G  /export/home/ctran
 rpool/swap               1.97G  73.5G  3.81M  -
 r...@fiat~ zfs create rpool/pink
 r...@fiat~ zfs set mountpoint=/zone/pink rpool/pink
 r...@fiat~ zfs list
 NAME                      USED  AVAIL  REFER  MOUNTPOINT
 rpool                    26.4G  71.5G    74K  /rpool
 rpool/ROOT               19.8G  71.5G    18K  legacy
 rpool/ROOT/opensolaris   19.8G  71.5G  19.6G  /
 rpool/dump               1.97G  71.5G  1.97G  -
 rpool/export             2.70G  71.5G    19K  /export
 rpool/export/home        2.70G  71.5G    19K  /export/home
 rpool/export/home/ctran  2.70G  71.5G  2.70G  /export/home/ctran
 rpool/pink                 18K  71.5G    18K  /zone/pink
 rpool/swap               1.97G  73.5G  3.81M  -

 Try to install again

 r...@fiat~ zoneadm -z pink uninstall
 Are you sure you want to uninstall zone pink (y/[n])? y
 cannot open 'rpool/pink/ROOT': dataset does not exist
 Error: no active dataset.
 cannot open 'rpool/pink/ROOT': dataset does not exist
 cannot open 'rpool/pink/ROOT': dataset does not exist
 cannot open 'rpool/pink/ROOT': dataset does not exist
 Error: destroying ZFS dataset.

more follow-up problems, resulting from the fact that
zoneadm honoured the '-x nodataset' option when it really must not.

---
frankB

Re: [zones-discuss] zoneadm hangs after repeated boot/halt use

2010-02-15 Thread Frank Batschulat (Home)

On Sat, 23 Jan 2010 18:47:45 +0100, Frank Batschulat (Home) 
frank.batschu...@sun.com wrote:

 so lets wait for build 132...I'll also look at your dump from your test system
 on monday, but I suspect it'll be the same IP panic...

Hey Glenn, so I finally managed to test the latest ISC with this stress test on
build 132.

I did not encounter any zones problems!

1st run was about 30 hours, then hung, used up all mem
2nd run was about 15 hours, then hung, used up all mem

I filed the following bug for this:

6926454 build 132: 6GB physmem vanished into thin air...

cheers
frankB




Re: [zones-discuss] zoneadm clone -m copy does not really copy on ZFS zonepath

2010-02-13 Thread Frank Batschulat (Home)

On Fri, 12 Feb 2010 23:47:36 +0100, Christine Tran christine.t...@gmail.com
wrote:

Hi, I'm sorry to bug the OpenSolaris for a question that pertains to
S10U8, but I am really stuck.

I am doing a zoneadm clone -m copy, and I do not want a new ZFS
dataset even though my zonepath is on a ZFS filesystem, for
performance reasons particular to how I am using my zones.
Unfortunately, zoneadm clone just ignores the -m copy, and makes me
a new ZFS filesystem anyway; and by the speed with which it finished,
it certainly is a snapshot operation underneath.

I have tested with making the source zone on a separate UFS, have
pre-made a dirname under my ZFS filesystem as the zonepath, nothing
works. I always get a new ZFS filesystem. I see that zoneadm install
has an -x nodataset switch, I need this for zone clone as well. I
have not seen this filed as a bug against S10, is there a work-around
to get the behavior I want?

This is sort of a big deal for our application. We use labeled zones,
a file move within a filesystem has a different performance profile
than a move from one filesystem to another filesystem, even within one
ZFS pool. We are doing tens of thousands of move per minute.

Christine,

the '-m copy' option to 'clone' does not imply that no new zfs dataset
is created.

snip
clone [-m copy] [-s zfs_snapshot] source_zone

Install a zone by copying an existing installed zone.
This subcommand is an alternative way to install the
zone.

-m copy

Force the clone to be a copy, even if a ZFS clone
is possible.
snip end

it changes the method of clone to use 'find/cpio'

http://src.opensolaris.org/source/xref/pkg/on_ips/usr/src/cmd/zoneadm/zoneadm.c#copy_zone

instead of doing it with a zfs snapshot:

http://src.opensolaris.org/source/xref/pkg/on_ips/usr/src/cmd/zoneadm/zfs.c#clone_zfs

however, it also always creates a new zfs dataset; this is intended.

http://src.opensolaris.org/source/xref/pkg/on_ips/usr/src/cmd/zoneadm/zoneadm.c#clone_copy

a '-x nodataset' option for 'clone' like in 'install' is unlikely to
happen; in fact I will soon remove the '-x nodataset' option for 'install'
completely, in OSOL build 135:

PSARC 2010/008 Remove zoneadm install sub-option -x nodataset
http://opensolaris.org/jive/thread.jspa?messageID=448598

your ZFS problem is with 'move', i.e. renaming a file from one dataset to another
while both datasets are in the same pool: it ends up as a copy of the file
because it crosses dataset (i.e. file system) boundaries. there are ZFS RFEs
open to improve that:

6483179 Provide an efficient way to rename a file to another dataset in same zpool
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6483179

6650426 RFE: support link(2) between ZFS filesystems
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6650426
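
To illustrate why a 'move' across datasets turns into a copy, here is a
minimal sketch in Python (not part of zoneadm or ZFS; the function name
move_file is made up for illustration). A rename only works within one file
system; across file systems the call fails with EXDEV and the caller has to
copy the data and delete the original, which is what mv(1) does as well.

```python
import errno
import os
import shutil


def move_file(src, dst, rename=os.rename):
    """Move src to dst and report how it was done: "rename" or "copy"."""
    try:
        # Cheap case: both paths live on the same file system (dataset).
        rename(src, dst)
        return "rename"
    except OSError as exc:
        # EXDEV ("cross-device link") means dst is on another file system,
        # so a plain rename is refused. Anything else is a real error.
        if exc.errno != errno.EXDEV:
            raise
    # Expensive fallback: copy all the data, then remove the original.
    shutil.copy2(src, dst)
    os.unlink(src)
    return "copy"
```

The `rename` parameter exists only so the fallback path can be exercised
without two real file systems. With tens of thousands of moves per minute,
the difference between the two return values is the performance difference
described above.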

---
frankB

Re: [zones-discuss] upgrading and zones

2010-02-06 Thread Frank Batschulat (Home)

On Sat, 06 Feb 2010 10:30:18 +0100, dick hoogendijk d...@nagual.nl wrote:

 Going from OpenSolaris b131 to b132..
 Am I correct in this procedure:
 [1] pkg image-update (with zones attached)
 [2] after reboot to the new BE detach the zones
 [3] zoneadm attach -u zones

precisely

 OR, do I have to detach -BEFORE- the pkg image-update?

_after_ 'pkg image-update' and _after_ the following reboot.

that way, a new BE for the zone is created as well during
'pkg image-update', corresponding to the new GZ BE.

like this (done yesterday):

osoldev.batschul./export/home/batschul.= zoneadm list -cp
0:global:running:/::ipkg:shared
-:zone2:installed:/tank/zones/zone2:04d7381c-9216-eba9-a490-d7c667c5850d:ipkg:shared

osoldev.batschul./export/home/batschul.= zfs list -t all
...
rpool/ROOT/opensolaris-129   33.9M   193G  10.5G  /
rpool/ROOT/opensolaris-130   38.8M   193G  11.3G  /
rpool/ROOT/opensolaris-131   22.1M   193G  11.6G  /
rpool/ROOT/opensolaris-132   18.1G   193G  11.7G  /tmp/tmp7MMWZf
rpool/ROOT/opensolaris-...@install   1.72G  -  3.98G  -
rpool/ROOT/opensolaris-...@2009-12-25-08:57:20   1.28G  -  10.5G  -
rpool/ROOT/opensolaris-...@2010-01-23-12:56:07   1.44G  -  11.3G  -
rpool/ROOT/opensolaris-...@2010-02-05-21:30:12   972M  -  11.6G  -
...
tank/zones/zone2  981M   183G    36K  /tank/zones/zone2
tank/zones/zone2/ROOT 981M   183G  31.5K  legacy
tank/zones/zone2/ROOT/zbe  52.5K   183G   981M  legacy
tank/zones/zone2/ROOT/zbe-1   981M   183G   981M  legacy
tank/zones/zone2/ROOT/zb...@2010-02-05-21:30:17   121K  -   981M  -
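
The order of operations confirmed above can be written down as a small
sketch in Python. It only builds the list of commands, it does not run
anything; the function name upgrade_plan and the reboot placeholder string
are made up for illustration.

```python
def upgrade_plan(zones):
    """Return the ordered steps for updating the GZ and re-attaching zones."""
    steps = [
        # Zones stay attached here, so a matching zone BE is created too.
        "pkg image-update",
        "(reboot into the new BE)",
    ]
    for zone in zones:
        # Only after the reboot: detach, then attach with update.
        steps.append("zoneadm -z %s detach" % zone)
        steps.append("zoneadm -z %s attach -u" % zone)
    return steps


if __name__ == "__main__":
    for step in upgrade_plan(["zone2"]):
        print(step)
```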


Re: [zones-discuss] OpenSolaris zone migration

2010-02-04 Thread Frank Batschulat (Home)

On Thu, 04 Feb 2010 01:11:19 +0100, Ted Ward thomas.w...@sun.com wrote:

 I am trying to migrate a zone on OpenSolaris from one identical system
 to another.  It's going from x86 to sparc, but even when going from x86
 to x86 I get the same error.  Here's the build of both systems

 SunOS hostname 5.11 snv_111b i86pc i386 i86pc Solaris  (source system)
 SunOS hostname 5.11 snv_111b sun4u sparc SUNW,Sun-Blade-100 Solaris
 (target system)

 After creating the zone on zfs per expectations, I detach it it and get
 the typical directory you would expect:

 # ls
 SUNWdetached.xml  dev  root

 I then run the following command to migrate the zone:

 zfs send rpool/tedz...@migrate | ssh u...@hostname pfexec /usr/sbin/zfs
 receive -F rpool/tedz...@migrate

 Everything looks good at that point.  The zfs file system is mounted at
 rpool/tedzone automatically, and so I create a zone configuration to
 match that.  However, when I run the attach I get the following error
 message:

 zoneadm -z tedzone attach
 cannot open 'rpool/tedzone/ROOT': dataset does not exist
 ERROR: The -a, -d or -r option is required when there is no active root
 dataset

 The funny thing here is that the zfs list on the source system doesn't
 mention this zfs file system:

 rpool/tedzone  242M  64.1G  22.5K  /tedzone
 rpool/tedzone/ROOT 242M  64.1G19K  legacy
 rpool/tedzone/ROOT/zbe 242M  64.1G   242M  /tedzone/root

so if the above output is from the source system, it does list
the 'rpool/tedzone/ROOT' dataset, right? I'm not sure why you claim
it does not.

is the above zfs list from the source system before or after doing the
zoneadm detach?

can you show the zfs list output from the target system after the zfs receive
and before the zoneadm attach?

I believe that in build 111b (2009.06), which you are using, the zfs datasets
of a halted/detached zone were not visible in the GZ, and we later changed that
(I'm not sure if that was before or after 111b/2009.06).

---
frankB

Re: [zones-discuss] Downgrading zones on Opensolaris 2009.x ( b131)

2010-01-26 Thread Frank Batschulat (Home)

On Tue, 26 Jan 2010 11:03:11 +0100, Dick Hoogendijk d...@nagual.nl wrote:

 On 25-1-2010 12:30, Paul van der Zwan wrote:
 Unfortunately I am running into bug 6912829 (causes panic on zoneadm halt) quite often.


 Do or don't zones work correctly on OpenSolaris-b131?

if you are using exclusive IP stack zones I'd suggest staying away from build 131
and waiting for build 132, which has the fixes for:

6912829 panic in ipsq_xopq_mp_cleanup/RD due to NULL ill->ill_wq on lo0 during zone shutdown/reboot
6917808 Panic on exclusive IP zone shutdown in ipsq_current_finish
6917809 Exclusive IP zone shutdown causes assertion failure in ire_inactive

so far, to my knowledge and testing, this does not affect shared IP stack zones,
at least not on my system.

---
frankB


Re: [zones-discuss] zoneadm hangs after repeated boot/halt use

2010-01-23 Thread Frank Batschulat (Home)

On Mon, 18 Jan 2010 16:33:38 +0100, Frank Batschulat (Home) 
frank.batschu...@sun.com wrote:

 the bad news is, I'm not getting the dumps, sigh. this is due to bug:

 6911155 kernel dump fails if panic happens in interrupt service routine

 which is fixed in build 131.

 So I will pursue this further once OSOL_131 has been released and this
 system has been upgraded. I finally will have dumps by then.

Hey Glenn, latest news - I'm losing my patience, this isn't fun anymore :(

so I now had the ISC running with OSOL_131 on the test system that reproduced the hangs.

however I can't do more for you here: halting the ISC zone (which has an
exclusive IP stack, etherstubs and vnics) immediately panics the box due to bug:

6912829 panic in ipsq_xopq_mp_cleanup/RD due to NULL ill->ill_wq on lo0 during zone shutdown/reboot

introduced in build 131 and fixed in build 132.

to wit, I'm still not getting a dump either, although 131 contains
the fix for the above-mentioned bug 6911155.

instead of getting a dump timeout (as fixed by 6911155) I'm now
getting 0% done, dump failed, error 5 (ie. EIO).

so lets wait for build 132...I'll also look at your dump from your test system
on monday, but I suspect it'll be the same IP panic...

see you again in 2..4 weeks.

cheers
frankB



Re: [zones-discuss] zcons module failing modunload ?

2010-01-22 Thread Frank Batschulat (Home)

On Thu, 21 Jan 2010 20:25:25 +0100, Edward Pilatowicz 
edward.pilatow...@sun.com wrote:

 well, the module is indeed unloadable:
 ---8---
 edp{ro...@mcescher$ modinfo | grep zcons
 edp{ro...@mcescher$ modload -p drv/zcons
 edp{ro...@mcescher$ modinfo | grep zcons
 301 f880a5b0   1ad0 164   1  zcons (Zone console driver)
 edp{ro...@mcescher$ modunload -i 301
 edp{ro...@mcescher$ modinfo | grep zcons
 edp{ro...@mcescher$
 ---8---

well, it is not ;-)

osoldev.batschul./export/home/batschul.= modinfo|grep zcons
osoldev.batschul./export/home/batschul.= pfexec modload -p drv/zcons
osoldev.batschul./export/home/batschul.= modinfo | grep zcons
275 f875f000   1ad0   0   1  zcons (Zone console driver)
osoldev.batschul./export/home/batschul.= pfexec modunload -i 275
can't unload the module: Device busy
osoldev.batschul./export/home/batschul.= pfexec modunload -i 275
can't unload the module: Device busy
osoldev.batschul./export/home/batschul.= pfexec modunload -i 275
can't unload the module: Device busy

really interesting though - I'm using osol_130/x86.

---
frankB



Re: [zones-discuss] why are zone datasets mounted when no zone is running ?

2010-01-22 Thread Frank Batschulat (Home)

On Fri, 22 Jan 2010 12:57:09 +0100, Frank Batschulat (Home) 
frank.batschu...@sun.com wrote:

 Hiya, I observed that zone datasets are mounted even though no zones are running.

 this strikes me as a bug. aren't they supposed to be mounted only when the zone boots?

 example from build 130:

 osoldev.root./export/home/batschul.= zoneadm list -cp
 0:global:running:/::ipkg:shared
 -:zone2:installed:/tank/zones/zone2:8b538910-6026-4342-b342-e7c69c2c14e8:ipkg:shared

 look what's actually mounted right now:

 osoldev.batschul./export/home/batschul.= mount -v|grep zone
 tank/zones on /tank/zones type zfs read/write/setuid/devices/nonbmand/exec/xattr/atime/dev=21000e on Fri Jan 22 09:02:17 2010
 tank/zones/zone2 on /tank/zones/zone2 type zfs read/write/setuid/devices/nonbmand/exec/xattr/atime/dev=21000f on Fri Jan 22 09:02:17 2010
 tank/zones/zone2/ROOT/zbe on /tank/zones/zone2/root type zfs read/write/setuid/devices/nonbmand/exec/xattr/atime/dev=210010 on Fri Jan 22 09:02:22 2010

 df confirms:

 osoldev.root./export/home/batschul.= df -lkah
 Filesystem                 size   used  avail capacity  Mounted on

 tank/zones/zone2           228G    24K   203G     1%    /tank/zones/zone2
 tank/zones/zone2/ROOT/zbe  228G   511M   203G     1%    /tank/zones/zone2/root

 osoldev.root./export/home/batschul.= zfs list -t all
 NAME                       USED  AVAIL  REFER  MOUNTPOINT

 tank/zones/zone2           511M   203G    24K  /tank/zones/zone2
 tank/zones/zone2/ROOT      511M   203G    21K  legacy
 tank/zones/zone2/ROOT/zbe  511M   203G   511M  legacy

 inspecting the 'canmount' zfs property gives a hint:

 osoldev.root./export/home/batschul.= zfs get canmount tank/zones/zone2
 NAME              PROPERTY  VALUE  SOURCE
 tank/zones/zone2  canmount  on     default
 osoldev.root./export/home/batschul.= zfs get canmount tank/zones/zone2/ROOT
 NAME                   PROPERTY  VALUE  SOURCE
 tank/zones/zone2/ROOT  canmount  on     default
 osoldev.root./export/home/batschul.= zfs get canmount tank/zones/zone2/ROOT/zbe
 NAME                       PROPERTY  VALUE   SOURCE
 tank/zones/zone2/ROOT/zbe  canmount  noauto  local

 so the 'zonepath' dataset has 'canmount=on' and is thus mounted
 by zfs mount -a, shouldn't that be 'canmount=noauto' ?

 the 'zonepath/ROOT' dataset has the same.

 even more interesting is that the 'zonepath/ROOT/zbe' apparently
 has the proper 'canmount=noauto' - yet it is mounted as well after boot.

 am I missing something obvious ?

for the 'zonepath/ROOT' dataset it's also interesting that 'canmount' is set to 'on',
as the real system's /ROOT does have it turned off:

osoldev.batschul./export/home/batschul.= pfexec zfs get canmount
NAME                                            PROPERTY  VALUE   SOURCE
rpool                                           canmount  on      default
rpool/ROOT                                      canmount  off     local
rpool/ROOT/opensolaris-129                      canmount  noauto  local
rpool/ROOT/opensolaris-130                      canmount  noauto  local
rpool/ROOT/opensolaris-...@install              canmount  -       -
rpool/ROOT/opensolaris-...@2009-12-25-08:57:20  canmount  -       -

fwiw, there exists a related bug against Live Upgrade, which should not set
'canmount' to 'on' for the /ROOT dataset in the new BE:

6747122 lucreate should not set canmount to on for zfs root dataset
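
A quick way to spot the datasets that 'zfs mount -a' would pick up is to
filter on canmount=on. The following is only a sketch in Python: the function
name automounted is made up, and it assumes the input was produced with
something like 'zfs get -H -o name,value canmount' (tab-separated, no header);
check zfs(1M) on your build before relying on those options.

```python
def automounted(zfs_get_output):
    """Return the dataset names whose canmount value is "on".

    Expects one "name<TAB>value" pair per line (assumed input format).
    """
    names = []
    for line in zfs_get_output.splitlines():
        fields = line.split("\t")
        # Skip anything that is not a clean name/value pair.
        if len(fields) == 2 and fields[1].strip() == "on":
            names.append(fields[0])
    return names


if __name__ == "__main__":
    sample = (
        "tank/zones/zone2\ton\n"
        "tank/zones/zone2/ROOT\ton\n"
        "tank/zones/zone2/ROOT/zbe\tnoauto\n"
    )
    for name in automounted(sample):
        print(name)
```

On the values shown above this lists the zonepath and zonepath/ROOT datasets
but not the zbe, which matches the canmount settings, though not the fact
that the zbe ends up mounted anyway.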

---
frankB


Re: [zones-discuss] Zones patching issues using attach -u

2010-01-22 Thread Frank Batschulat (Home)

On Fri, 22 Jan 2010 18:30:39 +0100, Gael gael.marti...@gmail.com wrote:

 This is bug:

 6857294 zoneadm attach leads to partially installed packages

 I believe a T patch might be available for the S10 SVr4 packaging code
 if you need it, but I see that the fix has not yet been integrated
 into the nv SVr4 packaging code.  It is scheduled for b124.

 Was that fix ever released ?

Yes, Solaris 10U9 will have it, ONNV/OSOL build 125 has it,
and the following Solaris 10 patches containing the fix for 6857294
have been released officially:

119254-72 (sparc)
119255-72 (x86)

(3 days ago)

---
frankB

Re: [zones-discuss] zoneadm hangs after repeated boot/halt use

2010-01-18 Thread Frank Batschulat (Home)

On Wed, 23 Dec 2009 15:26:17 +0100, Glenn Brunette glenn.brune...@sun.com 
wrote:

 Just verified that something is still wrong in b129, but the problem is
 _not_ with a vanilla configuration.  This time around boot/halt #102,
 the system apparently shutdown/panic'ed?  I was running it overnight
 and came in to a system that had been rebooted.  I did not see any
 problem in the audit log nor in /var/adm/messages.  Any pointers?

 I am running an Immutable Service Container configuration, based upon
 the installation steps at:

 http://kenai.com/projects/isc/pages/OpenSolaris

 Specifically:

 pfexec pkg install SUNWmercurial
 hg clone https://kenai.com/hg/isc~source  isc
 pfexec isc/bin/iscadm.ksh -N 0
 pfexec bootadm update-archive
 pfexec shutdown -g 0 -i 0 -y
 [after reboot]
 zlogin -C isc1
 [wait for zone isc1 to fully complete boot process]

 then run the script that I provided that stops and starts the zone.

 Apparently, there must be something wrong with the interaction of
 components.  In this configuration, we have things like resource
 controls, auditing, IP Filter/IP NAT, and zones all enabled.

 Would it be possible for you to try the steps above on a fresh
 install of 2009.06 or later (b129 is where I am right now).  Also,
 if you have other debugging methods, please let me know.

hey Glenn, the good news is that I have an OSOL_130 system with ISC installed
as you described that reliably reproduces _something_.

That something being the system completely hung when running your script:

batsc...@osol:~# while : ; do echo `date`:ZONE BOOT; pfexec zoneadm -z isc1 boot; sleep 10; echo `date`: ZONE HALT; pfexec zoneadm -z isc1 halt; sleep 10; done

Note, sleep 30 didn't do it: 17 hours running without an issue. however, changing
this to sleep 10, I can reliably hang the system, usually within 5 hours.
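
Since the sleep interval turned out to matter, it helps to have it as a
parameter. Here is a rough Python sketch of the same boot/halt loop; the
function name stress_zone and its arguments are made up for illustration, and
unlike the shell loop it stops at the first zoneadm failure instead of
carrying on.

```python
import subprocess
import time


def stress_zone(zone, cycles, delay, run=subprocess.call, sleep=time.sleep):
    """Boot and halt `zone` up to `cycles` times, pausing `delay` seconds
    after each step. Returns the number of completed boot/halt cycles."""
    done = 0
    for _ in range(cycles):
        for action in ("boot", "halt"):
            print("%s: ZONE %s" % (time.ctime(), action.upper()))
            # Same command line as in the shell loop.
            status = run(["pfexec", "zoneadm", "-z", zone, action])
            if status != 0:
                # Difference from the shell loop: give up on the first error.
                return done
            sleep(delay)
        done += 1
    return done
```

Something like stress_zone("isc1", 10000, 10) would correspond to the
sleep 10 variant; `run` and `sleep` can be replaced with stubs to try the
loop logic on a machine without zones.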

no remote access is possible anymore, and even the local console doesn't work anymore.

F1-A, taking a dump when booted into kmdb, however works.

the bad news is, I'm not getting the dumps, sigh. this is due to bug:

6911155 kernel dump fails if panic happens in interrupt service routine

which is fixed in build 131.

So I will pursue this further once OSOL_131 has been released and this
system has been upgraded. I finally will have dumps by then.

I'll also contact you offline about how you can set up your systems
to capture crash dumps and anything else we might need.

cheers
frankB

Re: [zones-discuss] Zones on shared storage - a warning

2010-01-09 Thread Frank Batschulat (Home)

On Fri, 08 Jan 2010 18:33:06 +0100, Mike Gerdts mger...@gmail.com wrote:

 I've written a dtrace script to get the checksums on Solaris 10.
 Here's what I see with NFSv3 on Solaris 10.

jfyi, I've reproduced it as well using a Solaris 10 Update 8 SB2000 sparc client
and NFSv4.

much like you I also get READ errors along with the CKSUM errors, which
is different from my observation on an ONNV client.

unfortunately your dtrace script did not work for me, i.e. it
did not spit out anything :(

cheers
frankB


Re: [zones-discuss] Zones on shared storage - a warning

2010-01-08 Thread Frank Batschulat (Home)

On Wed, 23 Dec 2009 03:02:47 +0100, Mike Gerdts mger...@gmail.com wrote:

 I've been playing around with zones on NFS a bit and have run into
 what looks to be a pretty bad snag - ZFS keeps seeing read and/or
 checksum errors.  This exists with S10u8 and OpenSolaris dev build
 snv_129.  This is likely a blocker for anything thinking of
 implementing parts of Ed's Zones on Shared Storage:

 http://hub.opensolaris.org/bin/view/Community+Group+zones/zoss

 The OpenSolaris example appears below.  The order of events is:

 1) Create a file on NFS, turn it into a zpool
 2) Configure a zone with the pool as zonepath
 3) Install the zone, verify that the pool is healthy
 4) Boot the zone, observe that the pool is sick
[...]
 r...@soltrain19# zoneadm -z osol boot

 r...@soltrain19# zpool status osol
   pool: osol
  state: DEGRADED
 status: One or more devices has experienced an unrecoverable error.  An
 attempt was made to correct the error.  Applications are unaffected.
 action: Determine if the device needs to be replaced, and clear the errors
 using 'zpool clear' or replace the device with 'zpool replace'.
see: http://www.sun.com/msg/ZFS-8000-9P
  scrub: none requested
 config:

 NAME  STATE READ WRITE CKSUM
 osol  DEGRADED 0 0 0
   /mnt/osolzone/root  DEGRADED 0 0   117  too many errors

 errors: No known data errors

Hey Mike, you're not the only victim of these strange CKSUM errors, I hit
the same during my slightly different testing, where I'm NFS mounting an
entire, pre-existing remote file living in the zpool on the NFS server and use
that to create a zpool and install zones into it.

I've filed today:

6915265 zpools on files (over NFS) accumulate CKSUM errors with no apparent reason

here's the relevant piece worth investigating out of it (leaving out the actual setup etc.).
as in your case, creating the zpool and installing the zone into it still gives
a healthy zpool, but immediately after booting the zone, the zpool served over NFS
accumulated CKSUM errors.

of particular interest are the 'cksum_actual' values as reported by Mike for his
test case here:

http://www.mail-archive.com/zfs-disc...@opensolaris.org/msg33041.html

compared to the 'cksum_actual' values I got in the fmdump error output on my test case/system:

note, the NFS server's zpool that is serving and sharing the file we use is healthy.

zone halted now on my test system, and checking fmdump:

osoldev.batschul./export/home/batschul.= fmdump -eV | grep cksum_actual | sort | uniq -c | sort -n | tail
     2 cksum_actual = 0x4bea1a77300 0xf6decb1097980 0x217874c80a8d9100 0x7cd81ca72df5ccc0
     2 cksum_actual = 0x5c1c805253 0x26fa7270d8d2 0xda52e2079fd74 0x3d2827dd7ee4f21
     6 cksum_actual = 0x28e08467900 0x479d57f76fc80 0x53bca4db5209300 0x983ddbb8c4590e40
*A   6 cksum_actual = 0x348e6117700 0x765aa1a547b80 0xb1d6d98e59c3d00 0x89715e34fbf9cdc0
*B   7 cksum_actual = 0x0 0x0 0x0 0x0
*C  11 cksum_actual = 0x1184cb07d00 0xd2c5aab5fe80 0x69ef5922233f00 0x280934efa6d20f40
*D  14 cksum_actual = 0x175bb95fc00 0x1767673c6fe00 0xfa9df17c835400 0x7e0aef335f0c7f00
*E  17 cksum_actual = 0x2eb772bf800 0x5d8641385fc00 0x7cf15b214fea800 0xd4f1025a8e66fe00
*F  20 cksum_actual = 0xbaddcafe00 0x5dcc54647f00 0x1f82a459c2aa00 0x7f84b11b3fc7f80
*G  25 cksum_actual = 0x5d6ee57f00 0x178a70d27f80 0x3fc19c3a19500 0x82804bc6ebcfc0

osoldev.root./export/home/batschul.= zpool status -v
  pool: nfszone
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        nfszone     DEGRADED     0     0     0
          /nfszone  DEGRADED     0     0   462  too many errors

errors: No known data errors

==

now compare this with Mike's error output as posted here:

http://www.mail-archive.com/zfs-disc...@opensolaris.org/msg33041.html

# fmdump -eV | grep cksum_actual | sort | uniq -c | sort -n | tail

     2 cksum_actual = 0x14c538b06b6 0x2bb571a06ddb0 0x3e05a7c4ac90c62 0x290cbce13fc59dce
*D   3 cksum_actual = 0x175bb95fc00 0x1767673c6fe00 0xfa9df17c835400 0x7e0aef335f0c7f00
*E   3 cksum_actual = 0x2eb772bf800 0x5d8641385fc00 0x7cf15b214fea800 0xd4f1025a8e66fe00
*B   4 cksum_actual = 0x0 0x0 0x0 0x0
     4 cksum_actual = 0x1d32a7b7b00 0x248deaf977d80 0x1e8ea26c8a2e900 0x330107da7c4bcec0
     5 cksum_actual = 0x14b8f7afe6 0x915db8d7f87 0x205dc7979ad73 0x4e0b3a8747b8a8
*C   6 cksum_actual = 0x1184cb07d00 0xd2c5aab5fe80 0x69ef5922233f00 0x280934efa6d20f40
*A   6

Re: [zones-discuss] [zfs-discuss] Zones on shared storage - a warning

2010-01-08 Thread Frank Batschulat (Home)

On Fri, 08 Jan 2010 13:55:13 +0100, Darren J Moffat darr...@opensolaris.org 
wrote:

 Frank Batschulat (Home) wrote:
 This just can't be an accident, there must be some coincidence and thus there's a good chance
 that these CKSUM errors must have a common source, either in ZFS or in NFS ?

 What are you using for on the wire protection with NFS ?  Is it shared
 using krb5i or do you have IPsec configured ?  If not I'd recommend
 trying one of those and see if your symptoms change.

Hey Darren, doing krb5i is certainly a good idea for additional protection in general,
however I have some doubts that NFS OTW corruption would produce the exact same
wrong checksums in 2 totally different setups and networks, as comparing
Mike's and my results showed [see 1].

cheers
frankB

[1]

osoldev.batschul./export/home/batschul.= fmdump -eV | grep cksum_actual | sort | uniq -c | sort -n | tail
     2 cksum_actual = 0x4bea1a77300 0xf6decb1097980 0x217874c80a8d9100 0x7cd81ca72df5ccc0
     2 cksum_actual = 0x5c1c805253 0x26fa7270d8d2 0xda52e2079fd74 0x3d2827dd7ee4f21
     6 cksum_actual = 0x28e08467900 0x479d57f76fc80 0x53bca4db5209300 0x983ddbb8c4590e40
*A   6 cksum_actual = 0x348e6117700 0x765aa1a547b80 0xb1d6d98e59c3d00 0x89715e34fbf9cdc0
*B   7 cksum_actual = 0x0 0x0 0x0 0x0
*C  11 cksum_actual = 0x1184cb07d00 0xd2c5aab5fe80 0x69ef5922233f00 0x280934efa6d20f40
*D  14 cksum_actual = 0x175bb95fc00 0x1767673c6fe00 0xfa9df17c835400 0x7e0aef335f0c7f00
*E  17 cksum_actual = 0x2eb772bf800 0x5d8641385fc00 0x7cf15b214fea800 0xd4f1025a8e66fe00
*F  20 cksum_actual = 0xbaddcafe00 0x5dcc54647f00 0x1f82a459c2aa00 0x7f84b11b3fc7f80
*G  25 cksum_actual = 0x5d6ee57f00 0x178a70d27f80 0x3fc19c3a19500 0x82804bc6ebcfc0

==

now compare this with Mike's error output as posted here:

http://www.mail-archive.com/zfs-disc...@opensolaris.org/msg33041.html

# fmdump -eV | grep cksum_actual | sort | uniq -c | sort -n | tail

     2 cksum_actual = 0x14c538b06b6 0x2bb571a06ddb0 0x3e05a7c4ac90c62 0x290cbce13fc59dce
*D   3 cksum_actual = 0x175bb95fc00 0x1767673c6fe00 0xfa9df17c835400 0x7e0aef335f0c7f00
*E   3 cksum_actual = 0x2eb772bf800 0x5d8641385fc00 0x7cf15b214fea800 0xd4f1025a8e66fe00
*B   4 cksum_actual = 0x0 0x0 0x0 0x0
     4 cksum_actual = 0x1d32a7b7b00 0x248deaf977d80 0x1e8ea26c8a2e900 0x330107da7c4bcec0
     5 cksum_actual = 0x14b8f7afe6 0x915db8d7f87 0x205dc7979ad73 0x4e0b3a8747b8a8
*C   6 cksum_actual = 0x1184cb07d00 0xd2c5aab5fe80 0x69ef5922233f00 0x280934efa6d20f40
*A   6 cksum_actual = 0x348e6117700 0x765aa1a547b80 0xb1d6d98e59c3d00 0x89715e34fbf9cdc0
*F  16 cksum_actual = 0xbaddcafe00 0x5dcc54647f00 0x1f82a459c2aa00 0x7f84b11b3fc7f80
*G  48 cksum_actual = 0x5d6ee57f00 0x178a70d27f80 0x3fc19c3a19500 0x82804bc6ebcfc0

and observe that the values in 'cksum_actual', which eventually cause our CKSUM pool errors
because they mismatch what had been expected, are the SAME! for 2 totally
different client systems and 2 different NFS servers (mine vs. Mike's),
see the entries marked with *A to *G.
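
The *A to *G marking was done by eye; the same comparison can be scripted.
A small sketch in Python, assuming each side has saved its 'fmdump -eV'
output as text (the function names cksum_counts and common_cksums are made
up for illustration):

```python
import re
from collections import Counter

# A cksum_actual entry is the name, "=", and a run of hex words.
CKSUM_RE = re.compile(r"cksum_actual\s*=\s*((?:0x[0-9a-fA-F]+\s*)+)")


def cksum_counts(fmdump_text):
    """Count each distinct cksum_actual tuple in `fmdump -eV` output."""
    counts = Counter()
    for match in CKSUM_RE.finditer(fmdump_text):
        counts[tuple(match.group(1).split())] += 1
    return counts


def common_cksums(text_a, text_b):
    """Return the tuples seen on both systems, mapped to (count_a, count_b)."""
    a, b = cksum_counts(text_a), cksum_counts(text_b)
    return {key: (a[key], b[key]) for key in a.keys() & b.keys()}
```

Feeding it both outputs should return exactly the tuples that occur on both
systems, with the per-system counts side by side.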

Re: [zones-discuss] zoneadm hangs after repeated boot/halt use

2010-01-06 Thread Frank Batschulat (Home)

On Wed, 06 Jan 2010 03:28:35 +0100, Glenn Brunette glenn.brune...@sun.com 
wrote:

 Just an update.  I am still able to get the repeatable hangs, but I am
 still not able to generate a dump.  If anyone has any further ideas as
 to how to troubleshoot this please let me know!

Hey Glenn, you are not forgotten, I shall find some time now after the winter break
to reproduce this with the additional setup you've described.
It's still on my agenda, but a couple of panics crossed my schedule over the last week.

cheers
frankB


Re: [zones-discuss] Add user in a zone - best practice

2009-12-31 Thread Frank Batschulat (Home)

On Thu, 31 Dec 2009 09:02:33 +0100, Jim Klimov jimkli...@cos.ru wrote:

 While I have seen many warnings explicitly noting that an NFS server should
 never be its own client (including sharing global shares to local zones),
 I confess I have failed to find any specific grounds for that.

simplified, the reasons are deadlocks between NFS, the VM subsystem and the
underlying UFS file system.

I'm not so sure that still holds true in the same way if ZFS is the underlying
file system, though I'm not aware of any real stress test of such configurations.
 
---
frankB



Re: [zones-discuss] Installing a specific dev release into a zone?

2009-12-27 Thread Frank Batschulat (Home)

On Sun, 27 Dec 2009 04:55:21 +0100, Tristan Ball 
tristan.b...@leica-microsystems.com wrote:

 I've got a Opensolaris snv_129 VM with which I'm playing around with
 zones.. Initially everything was fine, however since my initial install
 the opensolaris dev repository has updated to release 130, and now I
 can't install new zones, I get:

 t...@osol-test:~#  zoneadm -z z1 install
 A ZFS file system has been created for this zone.
 Publisher: Using opensolaris.org (http://pkg.opensolaris.org/dev/ ).
 Image: Preparing at /data/zones/z1/root.
 Cache: Using /var/pkg/download.
 Sanity Check: Looking for 'entire' incorporation.
 ERROR: Unable to locate the incorporation
 'ent...@0.5.11,5.11-0.129:20091205T134302Z' in the preferred publisher
 'opensolaris.org'.
 Use -P to supply a publisher which contains this package.

 It looks like the zone install process only gathers information on the
 latest version of each package?

 t...@osol-test:~# pkg -R /data/zones/z1/root list -a |grep -i entire
 entire0.5.11-0.130known
 -


 Is there some way to tell zoneadm install to install a specific dev
 release, in this case 129?  Actually, it looks like zoneadm is trying to

maybe there is a specific repo for dev build versions, but I'm not aware of any,
perhaps you may want to ask on the IPS or on-discuss alias for it 

 install that release, but the defaults on the pkg tools it uses to do
 the install is hiding the availability of 129 on the dev package
 servers. Is there a way around this?

 I realise that I could upgrade my global zone to 130, but I'd really
 like to have the option of picking a given release and sticking with it
 for longer than the dev version release cycles!

the non-global zone runs the native OS of the global zone
(except for branded zones like the solaris10(5) branded zone), which is
129 in your case. however, since your publisher defaults to the dev repo, zone install will
pick up 130, which is current for the dev repo. so you have to update the global zone
to run 130 before you can run 130 in the non-global zone.

for possible issues with 130, refer to: 
http://opensolaris.org/jive/thread.jspa?threadID=120631

hth
frankB

Re: [zones-discuss] Webrev for CR 6909222

2009-12-23 Thread Frank Batschulat (Home)

 On Tue, 22 Dec 2009 00:46:00 +0100, Jordan Vaughan jordan.vaug...@sun.com 
 wrote:

 I need someone to review my fix for

 6909222 reboot of system upgraded from 128 to build 129 generated error
 from an s10 zone due to boot-archive

 My webrev is accessible via

 http://cr.opensolaris.org/~flippedb/onnv-s10c

Jordan, we probably should update the s10container dev guide
to point out that we remove $ZONEROOT/boot/solaris/bin/create_ramdisk
and essentially disable boot-archive updates within the s10 branded zone ?

http://hub.opensolaris.org/bin/view/Community+Group+zones/s10brand_dev_guide

there may be ISVs/OEMs that potentially add/change stuff there ?

cheers
frankB

Re: [zones-discuss] Webrev for CR 6782448

2009-12-22 Thread Frank Batschulat (Home)

On Sat, 19 Dec 2009 04:28:52 +0100, Jordan Vaughan jordan.vaug...@sun.com 
wrote:

 I expanded my webrev to include my fix for

 6910339 zonecfg coredumps with badly formed 'select net defrouter'

 I need someone to review my changes.  The webrev is still accessible via

 http://cr.opensolaris.org/~flippedb/onnv-zone2

Hey Jordan, looks good to me, modulo this in zonecfg_lookup_nwif():

 size_t addrspec;/* nonzero if tabptr has IP addr */
 size_t physspec;/* nonzero if tabptr has interface */
+size_t defrouterspec;   /* nonzero if tabptr has def. router */
 
 if (tabptr == NULL)
 return (Z_INVAL);
 
+ * zone_nwif_address, zone_nwif_physical, and zone_nwif_defrouter are
+ * arrays, so no NULL checks are necessary.
  */
 addrspec = strlen(tabptr->zone_nwif_address);
 physspec = strlen(tabptr->zone_nwif_physical);
-assert(addrspec > 0 || physspec > 0);
+defrouterspec = strlen(tabptr->zone_nwif_defrouter);
+assert(addrspec != 0 || physspec != 0 || defrouterspec != 0);
 

so we do consider any of them being 0 a fault given the assert(), fine, but yet
we do check for this again inside the loop:

+    if (physspec != 0 && (fetchprop(cur, DTD_ATTR_PHYSICAL,
+        physical, sizeof (physical)) != Z_OK ||
+        strcmp(tabptr->zone_nwif_physical, physical) != 0))
+            continue;
+    if (addrspec != 0 && (fetchprop(cur, DTD_ATTR_ADDRESS, address,
+        sizeof (address)) != Z_OK ||
+        !zonecfg_same_net_address(tabptr->zone_nwif_address,
+        address)))
+            continue;
+    if (defrouterspec != 0 && (fetchprop(cur, DTD_ATTR_DEFROUTER,
+        address, sizeof (address)) != Z_OK ||
+        !zonecfg_same_net_address(tabptr->zone_nwif_defrouter,
+        address)))
+            continue;

a good argument could probably be made to turn this assert into a real
check and return Z_INVAL for any of those 3 being 0 and get rid of
the checks inside the xml parsing loop ?

cheers
frankB


Re: [zones-discuss] Webrev for CR 6782448

2009-12-22 Thread Frank Batschulat (Home)

On Tue, 22 Dec 2009 14:55:34 +0100, Frank Batschulat (Home) 
frank.batschu...@sun.com wrote:

 a good argument could probably be made to turn this assert into a real
 check and return Z_INVAL for any of those 3 being 0 and get rid of
 the checks inside the xml parsing loop ?

probably rather Z_INSUFFICIENT_SPEC than Z_INVAL though.

Re: [zones-discuss] Webrev for CR 6909222

2009-12-22 Thread Frank Batschulat (Home)

On Tue, 22 Dec 2009 00:46:00 +0100, Jordan Vaughan jordan.vaug...@sun.com 
wrote:

 I need someone to review my fix for

 6909222 reboot of system upgraded from 128 to build 129 generated error
 from an s10 zone due to boot-archive

 My webrev is accessible via

 http://cr.opensolaris.org/~flippedb/onnv-s10c

Jordan, looks good to me.

what about /usr/lib/brand/ipkg/p2v 
and perhaps /usr/lib/brand/ipkg/pkgcreatezone for the ipkg brand ?

and usr/src/lib/brand/native/zone/p2v.ksh 
and usr/src/lib/brand/native/zone/image_install.ksh for the native brand ?

I'd assume that in the future, running an s10u9 update on an s10u8 branded
zone could potentially put back the '/boot/solaris/bin/create_ramdisk' script,
but that'd be taken care of by s10_boot.ksh then.

cheers
frankB



Re: [zones-discuss] Webrev for CR 6782448

2009-12-22 Thread Frank Batschulat (Home)

On Wed, 23 Dec 2009 01:34:59 +0100, Jordan Vaughan jordan.vaug...@sun.com 
wrote:

 http://cr.opensolaris.org/~flippedb/onnv-zone2
[...]
 zone_lookup_nwif() needs the three loop checks.

 I regenerated the webrev.  You'll notice that the assertion was replaced
 by a check that returns Z_INSUFFICIENT_SPEC.

Hey Jordan, thanks for the exhaustive reply. understood. I was ignoring
the fact that without these checks the xml parsing loop would generate
a false alarm for conditions such as:

net:
        address: 10.5.234.15/24
        physical: bge0
        defrouter not specified
zonecfg:mojo> select net address=10.5.234.15/24
select net: No such resource with that id
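
To make the point concrete, here is a minimal sketch of the lookup pattern discussed in this thread: an up-front check that returns an error when no search field is given, plus per-field checks inside the loop that skip any field the caller left empty. All names (struct nwif, lookup_nwif, the LOOKUP_* codes) are hypothetical stand-ins, and plain strcmp() is used in place of zonecfg_same_net_address(); this is not the libzonecfg code from the webrev.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical return codes; the real code uses Z_OK and friends. */
enum { LOOKUP_OK = 0, LOOKUP_NO_MATCH = 1, LOOKUP_INSUFFICIENT_SPEC = 2 };

/* Hypothetical stand-in for a net resource; empty string = not specified. */
struct nwif {
	char address[64];
	char physical[32];
	char defrouter[64];
};

/* Find the first entry in tab[] that matches every field set in want. */
int
lookup_nwif(const struct nwif *tab, size_t n, const struct nwif *want,
    const struct nwif **found)
{
	size_t addrspec, physspec, defrouterspec, i;

	assert(tab != NULL && want != NULL && found != NULL);

	addrspec = strlen(want->address);
	physspec = strlen(want->physical);
	defrouterspec = strlen(want->defrouter);

	/* A real check instead of an assert(): nothing to search on. */
	if (addrspec == 0 && physspec == 0 && defrouterspec == 0)
		return (LOOKUP_INSUFFICIENT_SPEC);

	for (i = 0; i < n; i++) {
		/* Compare only the fields the caller actually specified. */
		if (physspec != 0 &&
		    strcmp(tab[i].physical, want->physical) != 0)
			continue;
		if (addrspec != 0 &&
		    strcmp(tab[i].address, want->address) != 0)
			continue;
		if (defrouterspec != 0 &&
		    strcmp(tab[i].defrouter, want->defrouter) != 0)
			continue;
		*found = &tab[i];
		return (LOOKUP_OK);
	}
	return (LOOKUP_NO_MATCH);
}
```

Without the "!= 0" guards, a select by address alone would compare the empty physical and defrouter strings against the stored values and skip the entry it should have matched.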

lgtm!

cheers
frankB


Re: [zones-discuss] code review for 6911329

2009-12-18 Thread Frank Batschulat (Home)

On Thu, 17 Dec 2009 23:16:19 +0100, Dan Price d...@eng.sun.com wrote:

 On Thu, Dec 17, 2009 at 07:17:50PM +0100, Frank Batschulat (Home) wrote:
  May I have 2 code reviewers for:
 
  6911329 Incorrect code in kstat_delete causes panic
  http://cr.opensolaris.org/~batschul/onnvkstat/
 
  Description
  A colleague was looking into a crash and the reason turned out to be a  
  NULL pointer dereference in kstat_delete():
 
  kstat_delete(kstat_t *ksp)
  {
     kmutex_t *lp;
     ekstat_t *e = (ekstat_t *)ksp;
     zoneid_t zoneid = e->e_zone.zoneid;
     kstat_zone_t *kz;
 
     if (ksp == NULL)
             return;
 
  Note that there is a dereference of 'ksp' [via 'e'] before the check for 
  ksp being NULL.
 
  unfortunately we don't have a dump/stacktrace anymore to inspect who
  called kstat_delete(NULL) and why.

 Do we really think that ksp being NULL is a invalid condition?

Yes, I think we do. kstat_create() is officially documented to return NULL in the
error case, i.e. the usual sequence for a user would be

ksp = kstat_create()
if (ksp != NULL)
        kstat_install()

kstat_delete(9F)
PARAMETERS
 ksp Pointer to a currently  installed  kstat(9S)  structure.

 If it's invalid, then why not add an assertion, so we can root-cause.

absolutely ! webrev updated:

http://cr.opensolaris.org/~batschul/onnvkstat/

 Or has this if (ksp == NULL) been there forever and ever and there
 are drivers abusing it?

nope, it got introduced when Jeff re-wrote the kstat framework in Solaris 9 via:
4460914 kstat implementation has escaped and doesn't scale

 I see a bunch of cmn_err's in kstat_create-- are there log files
 from the machine which might indicate that there was a kstat_create
 which returned NULL?

unfortunately I have nothing at all. However, what we do know is that
the initial kstat_create() can't have returned NULL, because in that
case the following kstat_install() would've already panicked in the face of
the NULL ksp due to the various dereferences it does on ksp.

this leaves 2 possibilities:

1) a bug in the kstat framework itself
2) a kstat user calling kstat_delete() twice or with a NULL ksp.

I'll run tests with the usual big kstat consumers like Zones, UFS, ZFS and NFS
with the debug bits for testing, perhaps I can uncover an offender.

thanks
frankB


[zones-discuss] code review for 6911329

2009-12-17 Thread Frank Batschulat (Home)

May I have 2 code reviewers for:

6911329 Incorrect code in kstat_delete causes panic
http://cr.opensolaris.org/~batschul/onnvkstat/

Description

A colleague was looking into a crash and the reason turned out to be a  NULL 
pointer dereference in kstat_delete():

kstat_delete(kstat_t *ksp)
{
   kmutex_t *lp;
   ekstat_t *e = (ekstat_t *)ksp;
   zoneid_t zoneid = e->e_zone.zoneid;
   kstat_zone_t *kz;

   if (ksp == NULL)
           return;

Note that there is a dereference of 'ksp' [via 'e'] before the check for ksp 
being NULL. 

unfortunately we don't have a dump/stacktrace anymore to inspect who
called kstat_delete(NULL) and why.
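
The ordering problem can be shown with a small sketch that does the NULL check before the first dereference. The types below are hypothetical miniatures (the real kstat_t and ekstat_t are much larger), and kstat_delete_sketch() only illustrates the order of operations; it is not the change in the webrev.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature types; the real kernel structures differ. */
typedef struct kstat {
	int ks_kid;
} kstat_t;

typedef struct ekstat {
	kstat_t e_ks;		/* embedded first, so the cast below is valid */
	int e_zoneid;
} ekstat_t;

/* Return the zone id of the kstat, or -1 if ksp is NULL. */
int
kstat_delete_sketch(kstat_t *ksp)
{
	ekstat_t *e;

	/* The cast relies on the kstat being the first member. */
	assert(offsetof(ekstat_t, e_ks) == 0);

	if (ksp == NULL)	/* check first ... */
		return (-1);

	e = (ekstat_t *)ksp;	/* ... and only then dereference */
	return (e->e_zoneid);
}
```

In the original, the initializer of zoneid reads through 'e' in the declaration block, so the later NULL test can never be reached with a NULL ksp.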

thanks
frankB
 

Re: [zones-discuss] code review for 6495558

2009-12-17 Thread Frank Batschulat (Home)

Hey Ed, Steve, Jordan, Jerry,

I got it in writing from Veritas Engineering that they do not have any heartburn
over using fsck -o p on VxFS inside the zone, and by testing in the lab I also
confirmed that it behaves as expected and similar to UFS:

snip
# uname -a
SunOS lab234 5.10 Generic_139555-08 sun4u sparc sun4u
 
# pkginfo -l VRTSvxfs
   PKGINST:  VRTSvxfs
  NAME:  VERITAS File System
  CATEGORY:  system,utilities
  ARCH:  sparc
   VERSION:  5.0,REV=5.0A55_sol
 
# fsck -F vxfs -o p  /dev/rdsk/c1t14d0s0
/dev/rdsk/c1t14d0s0:file system is clean - log replay is not required
snip end

here's the new webrev for your consideration:

http://cr.opensolaris.org/~batschul/onnv-vplat/
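
For readers without access to the webrev, a rough sketch of running fsck in preen mode from C could look like the following. build_preen_argv() and run_and_wait() are hypothetical helpers written for this illustration (zoneadmd uses its own forkexec()), so treat this as an assumption-laden outline rather than the actual change.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fill argv[] for a non-interactive (preen) fsck of rawdev. */
void
build_preen_argv(const char *rawdev, char *argv[5])
{
	assert(rawdev != NULL);
	argv[0] = "fsck";
	argv[1] = "-o";
	argv[2] = "p";
	argv[3] = (char *)rawdev;
	argv[4] = NULL;
}

/* Run path with argv; return its exit status, or -1 on failure. */
int
run_and_wait(const char *path, char *const argv[])
{
	int status;
	pid_t pid = fork();

	if (pid == -1)
		return (-1);
	if (pid == 0) {
		(void) execv(path, argv);
		_exit(127);	/* only reached if exec failed */
	}
	if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
		return (-1);
	return (WEXITSTATUS(status));
}
```

A caller would build the vector once and treat any nonzero status as "run fsck manually", mirroring the existing error path in dofsck().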

thanks!
frankB

On Tue, 15 Dec 2009 08:37:49 +0100, Frank Batschulat (Home) 
frank.batschu...@sun.com wrote:

 valid point, Ed!

 ignoring the minor detail that my fix should really
 do 'fsck -o p' (new webrev is in progress, thanks Steve for catching my
 ignorance)

 in fact -o p is documented in the generic fsck(1M) man page.

 snip fsck(1M)
  -o specific-options

  p
  Check and  fix  the  file  system  non-interactively
  (preen).  Exit  immediately  if there is a problem
  requiring intervention. This option is  required  to
  enable parallel file system checking.

 snip end

 and VxFS does support it as well, and it has the same net effect as on UFS,
 a log replay without operator intervention:

 http://sfdoccentral.symantec.com/sf/5.0MP3/solaris/manpages/vxfs/man1m/fsck_vxfs.html

 snip
 p
 Allows parallel log replay for several VxFS file systems. Each message from 
 fsck is prefixed with the device name to identify the device. This suboption 
 does not perform a full file system check in parallel; that is still done 
 sequentially on each device, even when multiple devices are specified. This 
 option is compatible only with the -y|Y option (that is, non-interactive full 
 file system check), in which case a log replay is done in parallel on all 
 specified devices. A sequential full file system check is performed on 
 devices where needed.
 snip end

 however the part "compatible only with the -y|Y option" sounds a bit
 ambiguous to me, so I pinged a friend at VRTS to clarify this for me.

 worst case would be to add code differentiating between vxfs and ufs here.

 I'll be back once I have the confirmation.

 thanks!
 frankB

 On Tue, 15 Dec 2009 00:37:52 +0100, Edward Pilatowicz 
 edward.pilatow...@sun.com wrote:

 so just one question.

 the '-p' preen option is only documented in the fsck_ufs(1m) man page,
 and not in fsck(1m).  so i'm wondering whether there are zones which may be
 installed on other filesystems which supply an fsck utility which may
 not support the preen option?  (or perhaps '-p' is defined as something
 else for those versions of fsck?)  specifically vxfs comes to mind since
 i know that some s10 deployments use that.

 ed

 On Fri, Dec 11, 2009 at 02:24:49PM +0100, Frank Batschulat (Home) wrote:
 friends, may I request code review for the earth-shattering fix to:

 6495558 zoneadm -z zone boot should not only check but repair filesystems
 http://cr.opensolaris.org/~batschul/onnv-vplat/

 background:

 Evaluation

 when booting a zone, zoneadm ( ie. vplat.c:dofsck() ) should perform the 
 same tasks as the /usr/sbin/mountall script,
 which does a 'is suitable for mounting' (fsck -m) check first, followed by 
 a preen fsck (fsck -p) if the former failed.

 the obvious quick fix would be to change the code in vplat.c:dofsck()

         argv[0] = "fsck";
         argv[1] = "-m";
         argv[2] = (char *)rawdev;
         argv[3] = NULL;
 
         status = forkexec(zlogp, cmdbuf, argv);
         if (status == 0 || status == -1)
                 return (status);
         zerror(zlogp, B_FALSE, "fsck of '%s' failed with exit status %d; "
             "run fsck manually", rawdev, status);
         return (-1);

 to always just run fsck in preen mode (shouldn't cause any real problem) or 
 fork off a 2nd fsck in preen mode
 if the first fsck -m failed.

 actually the fix will be to just execute fsck in preen mode (fsck -p)
 rather than doing the 'is suitable for mounting' and preen fsck dance.
 if the former fails, the latter will have to be done anyway. the latter,
 however, kind of implies the former.

 thanks!
 --
 frankB


Re: [zones-discuss] zoneadm hangs after repeated boot/halt use

2009-12-16 Thread Frank Batschulat (Home)

And finally, I have had this script run on a real, OSOL build 127 box for a day 
now.

can not reproduce it there either.

So I failed to reproduce this at all using the script on:

- ONNV 129 (zfs root, 1 cpu)
- ONNV 126 (ufs root, 2 cpus)
- OSOL 127 (zfs root, 4 cores)

there must be something special that I am missing.

On Wed, 16 Dec 2009 09:49:25 +0100, Frank Batschulat (Home) 
frank.batschu...@sun.com wrote:

 Glenn, I've not been able to reproduce this on onnv build 126 (it's running 
 for a day now)

 if that script would reproduce 6894901 straight away it should be doing so
 on 126 as well (similar to what you've seen in 127)

 this pose the question if there are either some other details in your
 environment that I don't have or if that script really reliably reproduces 
 6894901

 On Tue, 15 Dec 2009 15:23:06 +0100, Frank Batschulat (Home) 
 frank.batschu...@sun.com wrote:

 Glenn, I've been running this test case now for nearly a day on build 129, 
 could'nt
 reproduce at all. good chance this being indeed fixed by 6894901 in build 
 128.

 I'll also try to reproduce this now on buil 126.

 On Fri, 11 Dec 2009 21:48:52 +0100, Glenn Brunette glenn.brune...@sun.com 
 wrote:

 As part of some Immutable Service Container[1] demonstration that I am
 creating for an event in January.  I have the need to start/stop a zone
 quite a few times (as part of a Self-Cleansing[2] demo).  During the
 course of my testing, I have been able to repeatedly get zoneadm to
 hang.

 Since I am working with a highly customized configuration, I started
 over with a default zone on OpenSolaris (b127) and was able to repeat
 this issue.  To reproduce this problem use the following script after
 creating a zone usual the normal/default steps:

 isc...@osol-isc:~$ while : ; do
   echo `date`: ZONE BOOT
   pfexec zoneadm -z test boot
   sleep 30
   pfexec zoneamd -z test halt
   echo `date`: ZONE HALT
   sleep 10
   done

 This script works just fine for a while, but eventually zoneadm hangs
 (was at pass #90 in my last test).  When this happens, zoneadm is shown
 to be consuming quite a bit of CPU:

 PID USERNAME  SIZE   RSS STATE  PRI NICE  TIME  CPU PROCESS/NLWP

   16598 root   11M 3140K run  10   0:54:49  74% zoneadm/1


 A stack trace of zoneadm shows:

 isc...@osol-isc:~$ pfexec pstack `pgrep zoneadm`
 16082:  zoneadmd -z test
 -  lwp# 1  
 -  lwp# 2  
   feef41c6 door (0, 0, 0, 0, 0, 8)
   feed99f7 door_unref_func (3ed2, fef81000, fe33efe8, f39e) + 67
   f3f3 _thrp_setup (fe5b0a00) + 9b
   f680 _lwp_start (fe5b0a00, 0, 0, 0, 0, 0)
 -  lwp# 3  
   feef420f __door_return () + 2f
 -  lwp# 4  
   feef420f door (0, 0, 0, fe140e00, f5f00, a)
   feed9f57 door_create_func (0, fef81000, fe140fe8, f39e) + 2f
   f3f3 _thrp_setup (fe5b1a00) + 9b
   f680 _lwp_start (fe5b1a00, 0, 0, 0, 0, 0)
 16598:  zoneadm -z test boot
   feef3fc8 door (6, 80476d0, 0, 0, 0, 3)
   feede653 door_call (6, 80476d0, 400, fe3d43f7) + 7b
   fe3d44f0 zonecfg_call_zoneadmd (8047e33, 8047730, 8078448, 1) + 124
   0805792d boot_func (0, 8047d74, 100, 805ff0b) + 1cd
   08060125 main (4, 8047d64, 8047d78, 805570f) + 2b9
   0805576d _start   (4, 8047e28, 8047e30, 8047e33, 8047e38, 0) + 7d


 A stack trace of zoneadmd shows:

 isc...@osol-isc:~$ pfexec pstack `pgrep zoneadmd`
 16082:  zoneadmd -z test
 -  lwp# 1  
 -  lwp# 2  
   feef41c6 door (0, 0, 0, 0, 0, 8)
   feed99f7 door_unref_func (3ed2, fef81000, fe33efe8, f39e) + 67
   f3f3 _thrp_setup (fe5b0a00) + 9b
   f680 _lwp_start (fe5b0a00, 0, 0, 0, 0, 0)
 -  lwp# 3  
   feef4147 __door_ucred (80a37c8, fef81000, fe23e838, feed9cfe) + 27
   feed9d0d door_ucred (fe23f870, 1000, 0, 0) + 32
   08058a88 server   (0, fe23f8f0, 510, 0, 0, 8058a04) + 84
   feef4240 __door_return () + 60
 -  lwp# 4  
   feef420f door (0, 0, 0, fe140e00, f5f00, a)
   feed9f57 door_create_func (0, fef81000, fe140fe8, f39e) + 2f
   f3f3 _thrp_setup (fe5b1a00) + 9b
   f680 _lwp_start (fe5b1a00, 0, 0, 0, 0, 0)

 A truss of zoneadm (-f -vall -wall -tall) shows this looping:

 16598:  door_call(6, 0x080476D0)= 0
 16598:  data_ptr=8047730 data_size=0
 16598:  desc_ptr=0x0 desc_num=0
 16598:  rbuf=0x807F2D8 rsize=4096
 16598:  close(6)= 0
 16598:  mkdir(/var/run/zones, 0700)   Err#17 EEXIST
 16598:  chmod(/var/run/zones, 0700)   = 0
 16598:  open(/var/run/zones/test.zoneadm.lock, O_RDWR|O_CREAT, 0600) = 6
 16598:  fcntl(6, F_SETLKW, 0x08046DC0

Re: [zones-discuss] zoneadm hangs after repeated boot/halt use

2009-12-15 Thread Frank Batschulat (Home)

Glenn, I've been running this test case now for nearly a day on build 129 and
couldn't reproduce it at all. good chance this was indeed fixed by 6894901 in
build 128.

I'll also try to reproduce this now on build 126.

cheers
frankB

On Fri, 11 Dec 2009 21:48:52 +0100, Glenn Brunette glenn.brune...@sun.com 
wrote:


 As part of some Immutable Service Container[1] demonstration that I am
 creating for an event in January.  I have the need to start/stop a zone
 quite a few times (as part of a Self-Cleansing[2] demo).  During the
 course of my testing, I have been able to repeatedly get zoneadm to
 hang.

 Since I am working with a highly customized configuration, I started
 over with a default zone on OpenSolaris (b127) and was able to repeat
 this issue.  To reproduce this problem use the following script after
 creating a zone usual the normal/default steps:

 isc...@osol-isc:~$ while : ; do
   echo `date`: ZONE BOOT
   pfexec zoneadm -z test boot
   sleep 30
   pfexec zoneamd -z test halt
   echo `date`: ZONE HALT
   sleep 10
   done

 This script works just fine for a while, but eventually zoneadm hangs
 (was at pass #90 in my last test).  When this happens, zoneadm is shown
 to be consuming quite a bit of CPU:

 PID USERNAME  SIZE   RSS STATE  PRI NICE  TIME  CPU PROCESS/NLWP

   16598 root   11M 3140K run  10   0:54:49  74% zoneadm/1


 A stack trace of zoneadm shows:

 isc...@osol-isc:~$ pfexec pstack `pgrep zoneadm`
 16082:zoneadmd -z test
 -  lwp# 1  
 -  lwp# 2  
   feef41c6 door (0, 0, 0, 0, 0, 8)
   feed99f7 door_unref_func (3ed2, fef81000, fe33efe8, f39e) + 67
   f3f3 _thrp_setup (fe5b0a00) + 9b
   f680 _lwp_start (fe5b0a00, 0, 0, 0, 0, 0)
 -  lwp# 3  
   feef420f __door_return () + 2f
 -  lwp# 4  
   feef420f door (0, 0, 0, fe140e00, f5f00, a)
   feed9f57 door_create_func (0, fef81000, fe140fe8, f39e) + 2f
   f3f3 _thrp_setup (fe5b1a00) + 9b
   f680 _lwp_start (fe5b1a00, 0, 0, 0, 0, 0)
 16598:zoneadm -z test boot
   feef3fc8 door (6, 80476d0, 0, 0, 0, 3)
   feede653 door_call (6, 80476d0, 400, fe3d43f7) + 7b
   fe3d44f0 zonecfg_call_zoneadmd (8047e33, 8047730, 8078448, 1) + 124
   0805792d boot_func (0, 8047d74, 100, 805ff0b) + 1cd
   08060125 main (4, 8047d64, 8047d78, 805570f) + 2b9
   0805576d _start   (4, 8047e28, 8047e30, 8047e33, 8047e38, 0) + 7d


 A stack trace of zoneadmd shows:

 isc...@osol-isc:~$ pfexec pstack `pgrep zoneadmd`
 16082:zoneadmd -z test
 -  lwp# 1  
 -  lwp# 2  
   feef41c6 door (0, 0, 0, 0, 0, 8)
   feed99f7 door_unref_func (3ed2, fef81000, fe33efe8, f39e) + 67
   f3f3 _thrp_setup (fe5b0a00) + 9b
   f680 _lwp_start (fe5b0a00, 0, 0, 0, 0, 0)
 -  lwp# 3  
   feef4147 __door_ucred (80a37c8, fef81000, fe23e838, feed9cfe) + 27
   feed9d0d door_ucred (fe23f870, 1000, 0, 0) + 32
   08058a88 server   (0, fe23f8f0, 510, 0, 0, 8058a04) + 84
   feef4240 __door_return () + 60
 -  lwp# 4  
   feef420f door (0, 0, 0, fe140e00, f5f00, a)
   feed9f57 door_create_func (0, fef81000, fe140fe8, f39e) + 2f
   f3f3 _thrp_setup (fe5b1a00) + 9b
   f680 _lwp_start (fe5b1a00, 0, 0, 0, 0, 0)


 A truss of zoneadm (-f -vall -wall -tall) shows this looping:

 16598:  door_call(6, 0x080476D0)= 0
 16598:  data_ptr=8047730 data_size=0
 16598:  desc_ptr=0x0 desc_num=0
 16598:  rbuf=0x807F2D8 rsize=4096
 16598:  close(6)= 0
 16598:  mkdir(/var/run/zones, 0700)   Err#17 EEXIST
 16598:  chmod(/var/run/zones, 0700)   = 0
 16598:  open(/var/run/zones/test.zoneadm.lock, O_RDWR|O_CREAT, 0600) = 6
 16598:  fcntl(6, F_SETLKW, 0x08046DC0)  = 0
 16598:  typ=F_WRLCK  whence=SEEK_SET start=0 len=0
 sys=4277003009 pid=6
 16598:  open(/var/run/zones/test.zoneadmd_door, O_RDONLY) = 7
 16598:  door_info(7, 0x08047230)= 0
 16598:  target=16082 proc=0x8058A04 data=0x0
 16598:  attributes=DOOR_UNREF|DOOR_REFUSE_DESC|DOOR_NO_CANCEL
 16598:  uniquifier=26426
 16598:  close(7)= 0
 16598:  close(6)= 0
 16598:  open(/var/run/zones/test.zoneadmd_door, O_RDONLY) = 6
 16082/3:door_return(0x, 0, 0x, 0xFE23FE00,
 1007360) = 0
 16082/3:door_ucred(0x080A37C8)  = 0
 16082/3:euid=0 egid=0
 16082/3:ruid=0 rgid=0
 16082/3:pid=16598 zoneid=0
 16082/3:E: all

Re: [zones-discuss] zoneadm hangs after repeated boot/halt use

2009-12-12 Thread Frank Batschulat (Home)

sounds somewhat similar to 

6773836 zoneadm halt or halting/rebooting a non-global zone hangs the global 
zone

I'll try to reproduce this using your test case and see what I find. please
file a bug if it still happens with 128 and is not fixed by 6894901 as Steve
suggested.

cheers
frankB

On Fri, 11 Dec 2009 21:48:52 +0100, Glenn Brunette glenn.brune...@sun.com 
wrote:


 As part of some Immutable Service Container[1] demonstration that I am
 creating for an event in January.  I have the need to start/stop a zone
 quite a few times (as part of a Self-Cleansing[2] demo).  During the
 course of my testing, I have been able to repeatedly get zoneadm to
 hang.

 Since I am working with a highly customized configuration, I started
 over with a default zone on OpenSolaris (b127) and was able to repeat
 this issue.  To reproduce this problem use the following script after
 creating a zone usual the normal/default steps:

 isc...@osol-isc:~$ while : ; do
   echo `date`: ZONE BOOT
   pfexec zoneadm -z test boot
   sleep 30
   pfexec zoneamd -z test halt
   echo `date`: ZONE HALT
   sleep 10
   done

 This script works just fine for a while, but eventually zoneadm hangs
 (was at pass #90 in my last test).  When this happens, zoneadm is shown
 to be consuming quite a bit of CPU:

 PID USERNAME  SIZE   RSS STATE  PRI NICE  TIME  CPU PROCESS/NLWP

   16598 root   11M 3140K run  10   0:54:49  74% zoneadm/1


 A stack trace of zoneadm shows:

 isc...@osol-isc:~$ pfexec pstack `pgrep zoneadm`
 16082:zoneadmd -z test
 -  lwp# 1  
 -  lwp# 2  
   feef41c6 door (0, 0, 0, 0, 0, 8)
   feed99f7 door_unref_func (3ed2, fef81000, fe33efe8, f39e) + 67
   f3f3 _thrp_setup (fe5b0a00) + 9b
   f680 _lwp_start (fe5b0a00, 0, 0, 0, 0, 0)
 -  lwp# 3  
   feef420f __door_return () + 2f
 -  lwp# 4  
   feef420f door (0, 0, 0, fe140e00, f5f00, a)
   feed9f57 door_create_func (0, fef81000, fe140fe8, f39e) + 2f
   f3f3 _thrp_setup (fe5b1a00) + 9b
   f680 _lwp_start (fe5b1a00, 0, 0, 0, 0, 0)
 16598:zoneadm -z test boot
   feef3fc8 door (6, 80476d0, 0, 0, 0, 3)
   feede653 door_call (6, 80476d0, 400, fe3d43f7) + 7b
   fe3d44f0 zonecfg_call_zoneadmd (8047e33, 8047730, 8078448, 1) + 124
   0805792d boot_func (0, 8047d74, 100, 805ff0b) + 1cd
   08060125 main (4, 8047d64, 8047d78, 805570f) + 2b9
   0805576d _start   (4, 8047e28, 8047e30, 8047e33, 8047e38, 0) + 7d


 A stack trace of zoneadmd shows:

 isc...@osol-isc:~$ pfexec pstack `pgrep zoneadmd`
 16082:zoneadmd -z test
 -  lwp# 1  
 -  lwp# 2  
   feef41c6 door (0, 0, 0, 0, 0, 8)
   feed99f7 door_unref_func (3ed2, fef81000, fe33efe8, f39e) + 67
   f3f3 _thrp_setup (fe5b0a00) + 9b
   f680 _lwp_start (fe5b0a00, 0, 0, 0, 0, 0)
 -  lwp# 3  
   feef4147 __door_ucred (80a37c8, fef81000, fe23e838, feed9cfe) + 27
   feed9d0d door_ucred (fe23f870, 1000, 0, 0) + 32
   08058a88 server   (0, fe23f8f0, 510, 0, 0, 8058a04) + 84
   feef4240 __door_return () + 60
 -  lwp# 4  
   feef420f door (0, 0, 0, fe140e00, f5f00, a)
   feed9f57 door_create_func (0, fef81000, fe140fe8, f39e) + 2f
   f3f3 _thrp_setup (fe5b1a00) + 9b
   f680 _lwp_start (fe5b1a00, 0, 0, 0, 0, 0)


 A truss of zoneadm (-f -vall -wall -tall) shows this looping:

 16598:  door_call(6, 0x080476D0)= 0
 16598:  data_ptr=8047730 data_size=0
 16598:  desc_ptr=0x0 desc_num=0
 16598:  rbuf=0x807F2D8 rsize=4096
 16598:  close(6)= 0
 16598:  mkdir(/var/run/zones, 0700)   Err#17 EEXIST
 16598:  chmod(/var/run/zones, 0700)   = 0
 16598:  open(/var/run/zones/test.zoneadm.lock, O_RDWR|O_CREAT, 0600) = 6
 16598:  fcntl(6, F_SETLKW, 0x08046DC0)  = 0
 16598:  typ=F_WRLCK  whence=SEEK_SET start=0 len=0
 sys=4277003009 pid=6
 16598:  open(/var/run/zones/test.zoneadmd_door, O_RDONLY) = 7
 16598:  door_info(7, 0x08047230)= 0
 16598:  target=16082 proc=0x8058A04 data=0x0
 16598:  attributes=DOOR_UNREF|DOOR_REFUSE_DESC|DOOR_NO_CANCEL
 16598:  uniquifier=26426
 16598:  close(7)= 0
 16598:  close(6)= 0
 16598:  open(/var/run/zones/test.zoneadmd_door, O_RDONLY) = 6
 16082/3:door_return(0x, 0, 0x, 0xFE23FE00,
 1007360) = 0
 16082/3:door_ucred(0x080A37C8)  = 0
 16082/3:euid=0 egid=0
 16082/3:ruid=0 rgid=0
 16082/3:

Re: [zones-discuss] zoneadm hangs after repeated boot/halt use

2009-12-12 Thread Frank Batschulat (Home)

On Sat, 12 Dec 2009 10:06:43 +0100, Frank Batschulat (Home) 
frank.batschu...@sun.com wrote:

 sounds somewhat similar to

 6773836 zoneadm halt or halting/rebooting a non-global zone hangs the global 
 zone

wrong cut+paste, I meant to say:

6734679 zoneadm halt hung during zones test


 I'll try to reproduce this using your test case and see what I find. please 
 file a bug
 if it's still happen with 128 and is not fixed by  6894901 as Steve suggested.

 cheers
 frankB

 On Fri, 11 Dec 2009 21:48:52 +0100, Glenn Brunette glenn.brune...@sun.com 
 wrote:


 As part of some Immutable Service Container[1] demonstration that I am
 creating for an event in January.  I have the need to start/stop a zone
 quite a few times (as part of a Self-Cleansing[2] demo).  During the
 course of my testing, I have been able to repeatedly get zoneadm to
 hang.

 Since I am working with a highly customized configuration, I started
 over with a default zone on OpenSolaris (b127) and was able to repeat
 this issue.  To reproduce this problem use the following script after
 creating a zone usual the normal/default steps:

 isc...@osol-isc:~$ while : ; do
   echo `date`: ZONE BOOT
   pfexec zoneadm -z test boot
   sleep 30
   pfexec zoneamd -z test halt
   echo `date`: ZONE HALT
   sleep 10
   done

 This script works just fine for a while, but eventually zoneadm hangs
 (was at pass #90 in my last test).  When this happens, zoneadm is shown
 to be consuming quite a bit of CPU:

 PID USERNAME  SIZE   RSS STATE  PRI NICE  TIME  CPU PROCESS/NLWP

   16598 root   11M 3140K run  10   0:54:49  74% zoneadm/1


 A stack trace of zoneadm shows:

 isc...@osol-isc:~$ pfexec pstack `pgrep zoneadm`
 16082:   zoneadmd -z test
 -  lwp# 1  
 -  lwp# 2  
   feef41c6 door (0, 0, 0, 0, 0, 8)
   feed99f7 door_unref_func (3ed2, fef81000, fe33efe8, f39e) + 67
   f3f3 _thrp_setup (fe5b0a00) + 9b
   f680 _lwp_start (fe5b0a00, 0, 0, 0, 0, 0)
 -  lwp# 3  
   feef420f __door_return () + 2f
 -  lwp# 4  
   feef420f door (0, 0, 0, fe140e00, f5f00, a)
   feed9f57 door_create_func (0, fef81000, fe140fe8, f39e) + 2f
   f3f3 _thrp_setup (fe5b1a00) + 9b
   f680 _lwp_start (fe5b1a00, 0, 0, 0, 0, 0)
 16598:   zoneadm -z test boot
   feef3fc8 door (6, 80476d0, 0, 0, 0, 3)
   feede653 door_call (6, 80476d0, 400, fe3d43f7) + 7b
   fe3d44f0 zonecfg_call_zoneadmd (8047e33, 8047730, 8078448, 1) + 124
   0805792d boot_func (0, 8047d74, 100, 805ff0b) + 1cd
   08060125 main (4, 8047d64, 8047d78, 805570f) + 2b9
   0805576d _start   (4, 8047e28, 8047e30, 8047e33, 8047e38, 0) + 7d


 A stack trace of zoneadmd shows:

 isc...@osol-isc:~$ pfexec pstack `pgrep zoneadmd`
 16082:   zoneadmd -z test
 -  lwp# 1  
 -  lwp# 2  
   feef41c6 door (0, 0, 0, 0, 0, 8)
   feed99f7 door_unref_func (3ed2, fef81000, fe33efe8, f39e) + 67
   f3f3 _thrp_setup (fe5b0a00) + 9b
   f680 _lwp_start (fe5b0a00, 0, 0, 0, 0, 0)
 -  lwp# 3  
   feef4147 __door_ucred (80a37c8, fef81000, fe23e838, feed9cfe) + 27
   feed9d0d door_ucred (fe23f870, 1000, 0, 0) + 32
   08058a88 server   (0, fe23f8f0, 510, 0, 0, 8058a04) + 84
   feef4240 __door_return () + 60
 -  lwp# 4  
   feef420f door (0, 0, 0, fe140e00, f5f00, a)
   feed9f57 door_create_func (0, fef81000, fe140fe8, f39e) + 2f
   f3f3 _thrp_setup (fe5b1a00) + 9b
   f680 _lwp_start (fe5b1a00, 0, 0, 0, 0, 0)


 A truss of zoneadm (-f -vall -wall -tall) shows this looping:

 16598:  door_call(6, 0x080476D0)= 0
 16598:  data_ptr=8047730 data_size=0
 16598:  desc_ptr=0x0 desc_num=0
 16598:  rbuf=0x807F2D8 rsize=4096
 16598:  close(6)= 0
 16598:  mkdir(/var/run/zones, 0700)   Err#17 EEXIST
 16598:  chmod(/var/run/zones, 0700)   = 0
 16598:  open(/var/run/zones/test.zoneadm.lock, O_RDWR|O_CREAT, 0600) = 6
 16598:  fcntl(6, F_SETLKW, 0x08046DC0)  = 0
 16598:  typ=F_WRLCK  whence=SEEK_SET start=0 len=0
 sys=4277003009 pid=6
 16598:  open(/var/run/zones/test.zoneadmd_door, O_RDONLY) = 7
 16598:  door_info(7, 0x08047230)= 0
 16598:  target=16082 proc=0x8058A04 data=0x0
 16598:  attributes=DOOR_UNREF|DOOR_REFUSE_DESC|DOOR_NO_CANCEL
 16598:  uniquifier=26426
 16598:  close(7)= 0
 16598:  close(6)= 0
 16598:  open(/var/run/zones/test.zoneadmd_door, O_RDONLY) = 6
 16082/3:door_return(0x, 0, 0x

[zones-discuss] code review for 6495558

2009-12-11 Thread Frank Batschulat (Home)

friends, may I request code review for the earth-shattering fix to:

6495558 zoneadm -z zone boot should not only check but repair filesystems
http://cr.opensolaris.org/~batschul/onnv-vplat/

background:

Evaluation

when booting a zone, zoneadm ( ie. vplat.c:dofsck() ) should perform the same 
tasks as the /usr/sbin/mountall script,
which does a 'is suitable for mounting' (fsck -m) check first, followed by a 
preen fsck (fsck -p) if the former failed.

the obvious quick fix would be to change the code in vplat.c:dofsck()

        argv[0] = "fsck";
        argv[1] = "-m";
        argv[2] = (char *)rawdev;
        argv[3] = NULL;

        status = forkexec(zlogp, cmdbuf, argv);
        if (status == 0 || status == -1)
                return (status);
        zerror(zlogp, B_FALSE, "fsck of '%s' failed with exit status %d; "
            "run fsck manually", rawdev, status);
        return (-1);

to always just run fsck in preen mode (shouldn't cause any real problem) or 
fork off a 2nd fsck in preen mode
if the first fsck -m failed.

actually the fix will be to just execute fsck in preen mode (fsck -p) rather
than doing the 'is suitable for mounting' and preen fsck dance. if the former
fails, the latter will have to be done anyway. the latter, however, kind of
implies the former.

thanks!
-- 
frankB

It is always possible to agglutinate multiple separate problems
into a single complex interdependent solution.
In most cases this is a bad idea.

Re: [zones-discuss] Solaris10-Branded Zones Webrev: CR 6882732

2009-12-10 Thread Frank Batschulat (Home)

On Wed, 09 Dec 2009 23:54:05 +0100, Jordan Vaughan jordan.vaug...@sun.com 
wrote:

 I need someone to review my fix for

 6882732 unpacking archive with extended file attributes reports errors

 The webrev is accessible via

 http://cr.opensolaris.org/~flippedb/onnv-s10c

looks good to me.

cheers
---
frankB

[zones-discuss] /var/run/zones not cleaned up ?

2009-12-10 Thread Frank Batschulat (Home)

is it to be expected that after no zoneadm/zoneadmd is running
anymore, /var/run/zones still contains the corresponding lock files ?

(also I looked at the current threadlist of my system and no zone related
 kernel threads are running anymore)

osoldev.root./var/run/zones.= zoneadm list -cp
0:global:running:/::ipkg:shared
-:zone2:configured:/tank/zones/zone2::ipkg:shared
osoldev.root./var/run/zones.= ps -eafd|grep zone
root  2961  2734   0 16:35:06 pts/2   0:00 grep zone
osoldev.root./var/run/zones.= ls -la
total 16
drwx------   2 root root 335 Dec 10 12:23 .
drwxr-xr-x  11 root sys 2423 Dec 10 12:21 ..
-rw-r--r--   1 root root   0 Dec 10 12:23 index.lock
-rw-------   1 root root   0 Dec 10 12:21 zone1.zoneadm.lock
-rw-------   1 root root   0 Dec 10 12:21 zone1.zoneadmd_door

this was after a zone boot/zone halt/zone uninstall/zone delete cycle.

bug, feature ? 
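
One way to tell whether such a leftover file is still locked by anybody, as opposed to merely existing, is to ask the kernel with F_GETLK. The sketch below assumes the file is protected by fcntl() record locks, as truss output of zoneadm posted to this list suggests (open with O_RDWR|O_CREAT followed by fcntl F_SETLKW); lock_is_held() is a hypothetical helper, not part of zoneadm.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Return 1 if another process holds a lock on path, 0 if nobody does,
 * and -1 if the file cannot be opened or queried.
 */
int
lock_is_held(const char *path)
{
	struct flock fl;
	int fd, rc;

	assert(path != NULL);

	if ((fd = open(path, O_RDWR)) == -1)
		return (-1);

	(void) memset(&fl, 0, sizeof (fl));
	fl.l_type = F_WRLCK;	/* ask: could a write lock be taken? */
	fl.l_whence = SEEK_SET;
	fl.l_start = 0;
	fl.l_len = 0;		/* 0 means the whole file */

	rc = fcntl(fd, F_GETLK, &fl);
	(void) close(fd);
	if (rc == -1)
		return (-1);

	/* The kernel sets l_type to F_UNLCK when no conflicting lock exists. */
	return (fl.l_type == F_UNLCK ? 0 : 1);
}
```

fcntl() locks are dropped when the holding process exits, so a zero-length lock file with no holder is only a stale directory entry.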

---
frankB

Re: [zones-discuss] Impossible to install a zone on a new indiana b128a install

2009-12-09 Thread Frank Batschulat (Home)

On Tue, 08 Dec 2009 20:01:19 +0100, Ric Aleshire ric.alesh...@sun.com wrote:

 Vincent Boisard wrote:
 Hi,

 I just installed a new opensolaris dev b128a machine from an iso downloaded 
 from genunix.

 I tried to install a zone but it failed:
 r...@pasiphae:~# zoneadm -z template install
 A ZFS file system has been created for this zone.
Publisher: Using opensolaris.org (http://pkg.opensolaris.org/dev/ ).
Image: Preparing at /zones/template/root.
Cache: Using /var/pkg/download.
 Sanity Check: Looking for 'entire' incorporation.
   Installing: Core System (output follows)
 No updates necessary for this image.
 ERROR: failed to install package

 (it downloaded ~120MB of files into /var/pkg/download)
 The zone is in incomplete state and the zone root fs is mounted

 As it is the first time I'm using opensolaris (I used SXCE before), I have 
 no idea what's going on here.

 This is a bug that is fixed in b129:
 http://defect.opensolaris.org/bz/show_bug.cgi?id=12995

Vincent, if you cannot wait for build 129, there's a workaround:

edit the script: /usr/lib/brand/ipkg/pkgcreatezone

and comment out line 468, which looks like this:

$PKG install --no-refresh --no-index SUNWcs || fail_fatal $f_pkg

once that line is commented out, the zone will install fine.
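
For anyone who prefers to script the workaround, here is a minimal sketch. It matches the line by its content instead of by line number, on the assumption that the number 468 may differ between builds; the function name and the search string are illustrative choices, not part of the brand scripts.

```python
def comment_out(script_text, needle):
    """Prefix every not-yet-commented line containing `needle` with '# '.

    Returns a tuple of (new_text, number_of_lines_changed).
    """
    changed = 0
    out = []
    for line in script_text.splitlines(keepends=True):
        if needle in line and not line.lstrip().startswith("#"):
            out.append("# " + line)
            changed += 1
        else:
            out.append(line)
    return "".join(out), changed


# Example on an in-memory copy of the relevant line.
sample = "$PKG install --no-refresh --no-index SUNWcs || fail_fatal $f_pkg\n"
patched, count = comment_out(sample, "--no-index SUNWcs")
print(count, patched, end="")
```

To apply it for real, read /usr/lib/brand/ipkg/pkgcreatezone, save a copy of the original, and write the patched text back only if exactly one line was changed. Running it a second time changes nothing, since already commented lines are skipped.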

cheers
frankB



___
zones-discuss mailing list
zones-discuss@opensolaris.org

Re: [zones-discuss] solaris 10 branded zone

2009-12-08 Thread Frank Batschulat (Home)

On Tue, 08 Dec 2009 18:31:55 +0100, xx david.sc...@autohandle.com wrote:

 i'm still not doing something right:

 init...@dogpatch:~# pkg install SUNWs10brand
 Creating Plan
 pkg: The following pattern(s) did not match any packages in the current 
 catalog.
 Try relaxing the pattern, refreshing and/or examining the catalogs:
   SUNWs10brand

 init...@dogpatch:~# pkg info -r * 2>&1 | grep brand
   pkg://development/SUNWipkg-brand
 init...@dogpatch:~# pkg info -r * 2>&1 | grep s10
 init...@dogpatch:~# pkg publisher
 PUBLISHER                 TYPE     STATUS   URI
 development (preferred)   origin   online   http://pkg.opensolaris.org/dev/
 extra                     origin   online   https://pkg.sun.com/opensolaris/extra/
 opensolaris.org           origin   online   http://pkg.opensolaris.org/release/
 init...@dogpatch:~#

I think that's because your publisher's local name is not the default.
I did the same today and it worked:

batsc...@osoldev:/usr/lib/brand$ pkg info -r * 2>&1 | grep brand
pkg://opensolaris.org/SUNWipkg-brand
pkg://opensolaris.org/SUNWs10brand

my publisher is this:

batsc...@osoldev:/usr/lib/brand$  pkg publisher
PUBLISHER                     TYPE     STATUS   URI
opensolaris.org (preferred)   origin   online   http://pkg.opensolaris.org/dev/
blastwave                     origin   online   http://blastwave.network.com:1/
osol-contrib                  origin   online   http://pkg.opensolaris.org/contrib/
sunfreeware.com               origin   online   http://pkg.sunfreeware.com:9000/

I believe there were some problems recently when the local name did
not match the official publisher's name.
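
A quick way to compare the two setups is to pull the preferred publisher's local name out of the `pkg publisher` output. This is just a sketch under the assumption that the preferred entry is the line carrying the "(preferred)" marker and that the name is its first token; the function name is made up, and whether a name mismatch is really the cause here is only my guess.

```python
def preferred_publisher(pkg_publisher_output):
    """Return the local name of the publisher marked (preferred), or None."""
    for line in pkg_publisher_output.splitlines():
        if "(preferred)" in line:
            tokens = line.split()
            if tokens and tokens[0] != "(preferred)":
                return tokens[0]
    return None


# The two preferred entries from this thread, both pointing at the dev repo.
failing = "development  (preferred)  origin   online   http://pkg.opensolaris.org/dev/"
working = "opensolaris.org  (preferred)  origin   online   http://pkg.opensolaris.org/dev/"
for text in (failing, working):
    print(preferred_publisher(text))
```

Both entries use the same URI; only the local name differs, which is the difference between the failing and the working setup shown above.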

hth
-- 
frankB

It is always possible to agglutinate multiple separate problems
into a single complex interdependent solution.
In most cases this is a bad idea.
___
zones-discuss mailing list
zones-discuss@opensolaris.org

Re: [zones-discuss] Re: [nfs-discuss] Re: [sysadmin-discuss] NFS server in zones

2007-02-15 Thread Frank Batschulat (Home)

On Thu, 15 Feb 2007 06:19:10 +0100, Mahesh Siddheshwar  
[EMAIL PROTECTED] wrote:



Robert Thurlow wrote:

Glenn Faden wrote:


4) A bug currently prevents a client instance and a server instance
from being safe to use on the same box (apologies, can't quote the
bugid from here).  How likely, in your use case, is it that this will
be a problem, i.e. will your boxes be in the position where a zone
needs data shared from another zone as opposed to a separate server?



This is a must fix. In TX we want to automount between labeled zones  
on the same machine. It seems to work with ZFS. Is the deadlock  
specific to UFS/NFS?


Good question!  I don't expect that it is, but perhaps ZFS's use of the
ARC would insulate it.  Maybe Mahesh would know.


The problem seen in 5065254 and what is seen commonly
in the recent past is mainly due to the interaction between NFS, UFS and
segmap driver**. This scenario, typically, is noticeable only under
heavy load or on systems with a low segmapsize.  Since ZFS does not
use the segmap driver, this particular scenario should be averted.

Currently the loopback mounted configuration is never tested. So I
won't be surprised if we run into other loopings, but with some
effort those should be tractable.

Mahesh

** NFS tries to commit pages, which, on the NFS server
requires UFS to obtain a segmap slot. Before you use the segmap slot,
you need to free/destroy the previous mappings for the segmap slot, which
happens to be a locked NFS page, the lock for which is currently
owned by the commit thread which began this process.

There is one more scenario which I have not seen, but
is theoretically possible -- where writing to a UFS file
requires stealing a dirty NFS page which would in
turn require writing to the server, which requires exclusive
locking of the same UFS file.


even if you leave this segmap issue aside, it is very likely you will
encounter other deadlocks, because this is an attempt to stack file systems
that are not really stackable, and you'd run into
issues similar to 4498652 / 4154394
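
To make the quoted segmap scenario concrete, here is a toy model of it as a wait-for graph. It is only an illustration of the cycle Mahesh describes, not kernel code: the node names are informal labels I made up, and each party is assumed to wait on exactly one other.

```python
def find_cycle(waits_for, start):
    """Follow 'waits for' edges from `start`.

    Returns the cycle as a list that begins and ends with the same node,
    or None if the chain ends without looping back.
    """
    path = []
    position = {}
    node = start
    while node is not None:
        if node in position:
            return path[position[node]:] + [node]
        position[node] = len(path)
        path.append(node)
        node = waits_for.get(node)
    return None


# Informal labels for the parties in the quoted scenario (hypothetical names).
WAITS_FOR = {
    "nfs commit thread": "ufs write on server",   # commit needs the server write
    "ufs write on server": "segmap slot",         # UFS needs a segmap slot
    "segmap slot": "locked nfs page",             # slot reuse must free the old mapping
    "locked nfs page": "nfs commit thread",       # page lock is held by the commit thread
}

cycle = find_cycle(WAITS_FOR, "nfs commit thread")
print(" -> ".join(cycle))
```

The chain closes on the commit thread itself, which is why nothing can make progress once client and server share the same box and the same segmap.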

---
frankB
___
zones-discuss mailing list
zones-discuss@opensolaris.org
