Re: [zones-discuss] Branded zones and external hardware
On Thu, 05 Aug 2010 15:03:56 +0200, Joerg Schilling joerg.schill...@fokus.fraunhofer.de wrote:

> Frank Batschulat (Home) frank.batschu...@sun.com wrote:
>
>> the problem with exporting the tape device to an NGZ, which although not supported can be achieved as you mention, is that there's no way to assign that particular tape device exclusively to a particular NGZ, or to restrict access to that same tape device from the GZ or any other NGZ. that might become a problem if several different users try to use that tape from different NGZs, or from an NGZ and the GZ. such access may produce a somewhat questionable end result, so care must be taken when setting up such a configuration.
>
> Where do you see a difference from many different users trying to access the same tape from the Global Zone?

technically there is no difference here, but from an administrative point of view there is. the zone administration (the zone's root) is often delegated to some other person(s) than the one administering the GZ. the zone's root position may be filled by an internal or external client of the entity that administers and owns the GZ and the corresponding HW itself.

one must just be more aware than in normal situations that there's no restricted access to such a tape device, because it's so easy to forget that you gave the tape device away to some NGZ in the past.

---
frankB
___
zones-discuss mailing list
zones-discuss@opensolaris.org
Re: [zones-discuss] whole root not installed ?!?
On Sat, 03 Jul 2010 13:36:51 +0200, Daniel Dinu daniel.d...@gmail.com wrote:

> zone1 is installed in /vol1/zone1 and zone2 in /vol1/zone2. zone1 was configured as a sparse root zone (I used the create command). zone2 was configured as a whole root zone (I used the create -b command). Still, the space used is the same for both zones, as depicted above. Of course, I expected zone2 to use more space than zone1 (GB vs. MB). Can anybody tell me what I have done wrong? Is there something else I should have done, besides using create -b for the whole root zone creation?

nothing wrong, there are no sparse root zones for the ipkg(5) branded zones in OpenSolaris.

---
frankB
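A quick way to confirm which brand a zone uses is to read it from the parseable zone list. The following is a minimal sketch, assuming the seven colon-separated fields that `zoneadm list -cp` prints elsewhere in this archive (zoneid:zonename:state:zonepath:uuid:brand:ip-type). The helper name `zone_brand` is hypothetical, and zonepaths containing an escaped colon are not handled.

```shell
# Hypothetical helper: print the brand of the named zone.
# Assumes fields zoneid:zonename:state:zonepath:uuid:brand:ip-type.
zone_brand() {
    zoneadm list -cp | awk -F: -v z="$1" '$2 == z { print $6 }'
}
```

For an OpenSolaris zone, `zone_brand zone2` would be expected to print `ipkg`, in which case the zone is always whole root regardless of whether `create` or `create -b` was used.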
[zones-discuss] confusing zone login processes
just noticed something strange, perhaps someone has an explanation?

after booting a zone and logging in to it:

  osoldev.batschul./export/home/techdocs/solaris_kernel/zones.= pfexec zoneadm -z zone2 boot
  osoldev.batschul./export/home/techdocs/solaris_kernel/zones.= ps -eafd -Z | grep login
    global batschul  3821   993  0 07:59:32 pts/3  0:00 grep login
    global     root  2301  1750  0 07:43:19 pts/5  0:00 zlogin -C zone2

now log in to the zone:

  osoldev.batschul./export/home/batschul.= pfexec zlogin zone2
  [Connected to zone 'zone2' pts/6]
  Last login: Wed Jun 2 07:52:29 on pts/6
  Oracle Corporation SunOS 5.11 snv_140 May 2010

from the NGZ I see:

  r...@zone2:~# ps -eafd|grep login
    root  3823  3386  0 07:59:39 pts/6  0:00 /usr/bin/login -z global -f root
    root  3836  3824  0 08:00:30 pts/6  0:00 grep login

from the GZ I see:

  osoldev.batschul./export/home/techdocs/solaris_kernel/zones.= ps -eafd -Z | grep login
    global     root  3822   975  0 07:59:39 pts/2  0:00 zlogin zone2
     zone2     root  3823  3822  0 07:59:39 ??     0:00 /usr/bin/login -z global -f root
    global     root  2301  1750  0 07:43:19 pts/5  0:00 zlogin -C zone2
    global batschul  3831   993  0 07:59:43 pts/3  0:00 grep login

huh? where did it get that from?

     zone2     root  3823  3822  0 07:59:39 ??     0:00 /usr/bin/login -z global -f root

this only happens when I use 'pfexec zlogin zone2'; it does not happen when logging in on the console, i.e. 'pfexec zlogin -C zone2'.

thanks
frankB
Re: [zones-discuss] confusing zone login processes
On Wed, 02 Jun 2010 09:53:03 +0200, casper@sun.com wrote:

>> osoldev.batschul./export/home/techdocs/solaris_kernel/zones.= ps -eafd -Z | grep login
>>   global     root  3822   975  0 07:59:39 pts/2  0:00 zlogin zone2
>>    zone2     root  3823  3822  0 07:59:39 ??     0:00 /usr/bin/login -z global -f root
>>   global     root  2301  1750  0 07:43:19 pts/5  0:00 zlogin -C zone2
>>   global batschul  3831   993  0 07:59:43 pts/3  0:00 grep login
>>
>> huh? where did it get that from?
>>
>>    zone2     root  3823  3822  0 07:59:39 ??     0:00 /usr/bin/login -z global -f root
>
> I think it's because of auditing enabled by default: it keeps an additional copy of login.

hmmm, auditing or accounting?

  osoldev.batschul./export/home/.= pfexec mdb -k
  audit_active/D
  audit_active:
  audit_active:   1

  #define C2AUDIT_UNLOADED 1 /* c2audit module not loaded */

so the c2audit module is not loaded, and the accounting module is also not loaded.

---
frankB
Re: [zones-discuss] Q: is there a way to get a zoneid from kernel (not in user context)?
On Fri, 28 May 2010 07:31:27 +0200, eiji@oracle.com wrote:

> I'm wondering if there is a way to get a zoneid from the kernel even though it's not in user context. If it's possible, this is useful for us to activate an HCA port in the exclusive-IP zone at boot time.
>
> Here's the background info. Currently the IP path info is gotten when the driver is attached, then an HCA port can be ready for RDSv3, but this way is for the global zone, and it doesn't work well for the exclusive-IP zone because the driver cannot get the zoneid when it's attached (so far). After all, we have to wait until customers run a command for RDSv3 in the zone, but the port should be ready at boot time w/o any customers' actions. It'd be better off getting it in the driver attach, but I don't know if it's possible.
>
> If it's not possible to get a zoneid from the kernel if it's not in user context, then is there any recommended method to get it at boot time? I'm thinking maybe by using SMF, we can invoke an appropriate command (like ifconfig) at boot time to activate HCAs in the exclusive-IP zones, but if there is a proper way for this kind of purpose, that'd be better.

for kernel consumers use:

  #include <sys/zone.h>

  extern zoneid_t getzoneid(void);

cheers
---
frankB
Re: [zones-discuss] new zones community leaders
On Fri, 28 May 2010 01:01:11 +0200, Edward Pilatowicz edward.pilatow...@oracle.com wrote:

> hey all,
>
> i wanted to propose zone community leadership status for the following folks:
>
>   Frank Batschulat
>   Gary Pennington
>   John Levon
>   Susan Kamm-Worrell

+1
Re: [zones-discuss] Q: is there a way to get a zoneid from kernel (not in user context)?
On Fri, 28 May 2010 10:43:22 +0200, eiji@oracle.com wrote:

> Hi Frank,
>
> getzoneid() can return a correct value even if it's called in a taskq thread (kernel context) and/or in an interrupt handler (interrupt context)?

I suppose so, look, it's not doing anything earth-shattering:

  getzoneid(void)
  {
          return (curproc->p_zone->zone_id);
  }

no locking involved, no allocations done, nothing considered harmful in an interrupt context or taskq thread. the only question is which proc your taskq/interrupt thread will be bound to: p0 or zsched? p0 will always deliver GLOBAL_ZONEID (zone 0).

> Thanks,
> -Eiji
>
> [...]

--
frankB
It is always possible to agglutinate multiple separate problems into a single complex interdependent solution. In most cases this is a bad idea.
Re: [zones-discuss] Q: is there a way to get a zoneid from kernel (not in user context)?
On Fri, 28 May 2010 13:35:07 +0200, James Carlson carls...@workingcode.com wrote:

> On 05/28/10 04:57, Frank Batschulat (Home) wrote:
>> On Fri, 28 May 2010 10:43:22 +0200, eiji@oracle.com wrote:
>>> getzoneid() can return a correct value even if it's called in a taskq thread (kernel context) and/or in an interrupt handler (interrupt context)?
>>
>> I suppose so, look, it's not doing anything earth-shattering:
>>
>>   getzoneid(void)
>>   {
>>           return (curproc->p_zone->zone_id);
>>   }
>>
>> no locking involved, no allocations done, nothing considered harmful in an interrupt context or taskq thread. the only question is which proc your taskq/interrupt thread will be bound to.
>
> It sounds like we might need more information about what the original poster is attempting to do.
>
> Interrupts themselves aren't features of non-global zones, so they're not normally attributed to any particular zone. In theory, if there were devices dedicated to individual zones, you could use the device's state structure to find the zoneid associated.
>
> If you just use getzoneid() in that context, you'll get the zoneid of the zone whose thread happens to be pinned down by the interrupt. In other words, it's an arbitrary and almost certainly wrong answer.
>
> I think something's amiss if you're asking about zoneid outside the context of direct system call processing. The answers there vary quite a bit. For example, with STREAMS, the correct answer is to fetch the cred_t attached to the dblk_t, and get the zoneid from the cred_t. It's not unusual at all for interrupts and taskqs to do work on behalf of many different zones, and for them to need to track this information separately.

yepp, my bad! and of course interrupt threads are always bound to p0 when they are created (see thread_create_intr()).

---
frankB
Re: [zones-discuss] Q: is there a way to get a zoneid from kernel (not in user context)?
On Fri, 28 May 2010 13:35:07 +0200, James Carlson carls...@workingcode.com wrote:

> On 05/28/10 04:57, Frank Batschulat (Home) wrote:
>> On Fri, 28 May 2010 10:43:22 +0200, eiji@oracle.com wrote:
>>> getzoneid() can return a correct value even if it's called in a taskq thread (kernel context) and/or in an interrupt handler (interrupt context)?
>>
>> I suppose so, look, it's not doing anything earth-shattering:
>>
>>   getzoneid(void)
>>   {
>>           return (curproc->p_zone->zone_id);
>>   }
>>
>> no locking involved, no allocations done, nothing considered harmful in an interrupt context or taskq thread. the only question is which proc your taskq/interrupt thread will be bound to.

and not only are interrupt threads bound to p0, taskq threads created using taskq_create() are also bound to p0. so for a taskq, getzoneid() would only return a useful value if you used taskq_create_proc() with a proc other than p0.

---
frankB
Re: [zones-discuss] impossible to attach migrated zone to a new server
On Wed, 19 May 2010 20:47:41 +0200, Philippe Bürgisser burgis...@rfidcenter.ch wrote:

> I followed the guide (http://docs.sun.com/app/docs/doc/819-2450/gcgnc?l=en&a=view) from Sun to move a working zone to a new server with the same configuration. I executed these commands:
>
>   tar xf myzone.tar -- to /export/zones/myzone
>   zonecfg -z myzone create -a /export/zones/myzone
>
> All was OK. When I wanted to attach, I got this message:
>
>   r...@ns358375:/# zoneadm -z proxiproduits attach -u

try these steps for v2v, that's how it worked for me when I last tested this:

source system:

  batsc...@suizid:~$ zoneadm list -cp
  0:global:running:/::ipkg:shared
  -:my-zone:installed:/tank/zones/my-zone:629e41d5-7d92-4949-cd8b-ee755143fea0:ipkg:shared
  batsc...@suizid:~$ pfexec zoneadm -z my-zone detach
  batsc...@suizid:~$ zoneadm list -cp
  0:global:running:/::ipkg:shared
  -:my-zone:configured:/tank/zones/my-zone::ipkg:shared
  batsc...@suizid:~$ cd /tank/zones
  batsc...@suizid:/tank/zones$ su
  Password:
  batsc...@suizid:/tank/zones# find my-zone -print | cpio -oP@/ | gzip > my-zone.cpio.gz
  891010 blocks
  batsc...@suizid:/tank/zones# ls -la my-zone.cpio.gz
  -rw-r--r-- 1 root root 124913062 Mar 10 14:06 my-zone.cpio.gz

target system:

  osoldev.batschul./.= zoneadm list -cp
  0:global:running:/::ipkg:shared
  -:my-zone:configured:/tank/zones/my-zone::ipkg:shared

NB: the location of the archive here is important as far as I recall, i.e. it shall be in the zoneroot's parent directory:

  osoldev.batschul./.= pfexec zoneadm -z my-zone attach -a /tank/zones/my-zone.cpio.gz -u
  Log File: /tmp/my-zone.attach_log.doaGco
  Attaching...
  Global zone version: ent...@0.5.11,5.11-0.134:20100302T023003Z
  Non-Global zone version: ent...@0.5.11,5.11-0.134:20100302T023003Z
  Evaluation: Packages in my-zone are in sync with global zone.
  Attach complete.
  osoldev.batschul./.= zoneadm list -cp
  0:global:running:/::ipkg:shared
  -:my-zone:installed:/tank/zones/my-zone:53c18130-533f-69ab-c998-f160f0a8ebc3:ipkg:shared

hth
frankB
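In the transcripts above the zone is in the `configured` state both after the detach on the source system and before the attach on the target system, and it returns to `installed` once the attach completes. A minimal sketch of a guard that checks this before attaching follows; the helper names `zone_state` and `can_attach` are hypothetical, and the field layout of `zoneadm list -cp` is assumed from the output shown in this thread.

```shell
# Hypothetical helper: print the state (third field) of the named zone.
zone_state() {
    zoneadm list -cp | awk -F: -v z="$1" '$2 == z { print $3 }'
}

# Hypothetical guard: succeed only if the zone is configured but not
# yet installed, which is the state seen before 'attach' above.
can_attach() {
    [ "$(zone_state "$1")" = "configured" ]
}
```

It could be used as `can_attach my-zone && pfexec zoneadm -z my-zone attach -a /tank/zones/my-zone.cpio.gz -u`, reusing the attach command from the transcript.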
Re: [zones-discuss] renaming zonepath
On Sun, 21 Feb 2010 17:33:06 +0100, Anil an...@entic.net wrote:

> r...@vps1:~# zoneadm -z note move /zones/note
> Moving across file systems; copying zonepath /zones/bugs...sh[1]: cd: /zones/bugs: [No such file or directory]
> zoneadm: zone 'note': 'copy' failed with exit code 1.
> The copy failed.
> More information can be found in /var/log/zoneAAA2XaapU
> Cleaning up zonepath /zones/note...The ZFS file system for this zone has been destroyed.
>
> I believe the zones are not mounted when the zone is not running so the cp fails. Luckily it did not delete the data *phew*.

hmmm... I tested this myself, and I'm not getting the bizarre error messages from sh. what error messages are logged in the mentioned file '/var/log/zoneAAA2XaapU'?

in general, it'd be helpful to provide the corresponding 'zoneadm list -cp' and 'zfs list -t all' outputs for the zones before and after the failure, as well as OS version/build info.

the error message suggests we attempted to copy, which is correct if we're crossing file system boundaries according to the 'move' PSARC case:

PSARC/2005/711

snip
  The syntax for moving a zone will be:

      # zoneadm -z my-zone move /newpath

  where /newpath specifies the new zonepath for the zone. This will be implemented so that it works both within and across filesystems, subject to the existing rules for zonepath (e.g. it cannot be on an NFS mounted filesystem). When crossing filesystem boundaries the data will be copied and the original directory will be removed. Internally the copy will be implemented using cpio with the proper options to preserve all of the data (ACLs, etc.). The zone must be halted while being moved.
snip end

however, contrary to this description and your case, in my tests this just changes the zfs mountpoint property and does nothing else, even when the move would cross file system boundaries:

  osoldev.root./export/home/batschul.= zoneadm list -cp
  0:global:running:/::ipkg:shared
  -:zone1:installed:/tank/zones/zone1:caa7e784-dab0-6f77-e202-8cf135714809:ipkg:shared
  osoldev.root./export/home/batschul.= zfs list -t all
  NAME                       USED  AVAIL  REFER  MOUNTPOINT
  tank/zones                 996M   151G    38K  /tank/zones
  tank/zones/zone1           996M   151G    36K  /tank/zones/zone1
  tank/zones/zone1/ROOT      996M   151G  31.5K  legacy
  tank/zones/zone1/ROOT/zbe  996M   151G   996M  legacy

1) moving inside the same zfs dataset tank/zones

  osoldev.root./export/home/batschul.= zoneadm -z zone1 move /tank/zones/test
  osoldev.root./export/home/batschul.= zoneadm list -cp
  0:global:running:/::ipkg:shared
  -:zone1:installed:/tank/zones/test:caa7e784-dab0-6f77-e202-8cf135714809:ipkg:shared
  osoldev.root./export/home/batschul.= zfs list -t all
  NAME                       USED  AVAIL  REFER  MOUNTPOINT
  tank/zones                 996M   151G  36.5K  /tank/zones
  tank/zones/zone1           996M   151G    36K  /tank/zones/test
  tank/zones/zone1/ROOT      996M   151G  31.5K  legacy
  tank/zones/zone1/ROOT/zbe  996M   151G   996M  legacy

2) moving to a different zfs dataset in a different pool, rpool/export/home

  osoldev.root./export/home/batschul.= zoneadm -z zone1 move /export/home/test2
  osoldev.root./export/home/batschul.= zoneadm list -cp
  0:global:running:/::ipkg:shared
  -:zone1:installed:/export/home/test2:caa7e784-dab0-6f77-e202-8cf135714809:ipkg:shared
  osoldev.root./export/home/batschul.= zfs list -t all
  NAME                       USED  AVAIL  REFER  MOUNTPOINT
  tank/zones                 996M   151G  36.5K  /tank/zones
  tank/zones/zone1           996M   151G    36K  /export/home/test2
  tank/zones/zone1/ROOT      996M   151G  31.5K  legacy
  tank/zones/zone1/ROOT/zbe  996M   151G   996M  legacy

so there's no move in the sense of a real move happening in case 2): the dataset stays in pool tank and only its mountpoint changes. this seems wrong to me. even the behavior in 1) looks suspect.

apparently we do have a bug open already that pretty much matches suspect 1):

  6918505 zone move should rename ZFS file system, not change mountpoint
  http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6918505

but I cannot find anything existing for suspect 2) and the missing move action there, as not only do we cross file system boundaries but we even move over to a different pool! so either we'd need to enhance the scope of 6918505 or a new bug ought to be filed for case 2).

---
frankB
Re: [zones-discuss] Zones on shared storage - a warning
update on this one:

a workaround if you will, or the more appropriate way to do this, is apparently to use lofiadm(1M) to create a pseudo block device backed by the file hosted on NFS, and to use the created lofi device (e.g. /dev/lofi/1) as the device for zpool create and all subsequent I/O (this did not produce the strange CKSUM errors), e.g.:

  osoldev.root./export/home/batschul.= mount -F nfs opteron:/pool/zones /nfszone
  osoldev.root./export/home/batschul.= mount -v| grep nfs
  opteron:/pool/zones on /nfszone type nfs remote/read/write/setuid/devices/xattr/dev=9080001 on Tue Feb 9 10:37:00 2010
  osoldev.root./export/home/batschul.= nfsstat -m
  /nfszone from opteron:/pool/zones
   Flags: vers=4,proto=tcp,sec=sys,hard,intr,link,symlink,acl,rsize=1048576,wsize=1048576,retrans=5,timeo=600
   Attr cache: acregmin=3,acregmax=60,acdirmin=30,acdirmax=60

  osoldev.root./export/home/batschul.= mkfile -n 7G /nfszone/remote.file
  osoldev.root./export/home/batschul.= ls -la /nfszone
  total 28243534
  drwxrwxrwx  2 nobody   nobody          6 Feb 9 09:36 .
  drwxr-xr-x 30 batschul other          32 Feb 8 22:24 ..
  -rw---      1 nobody   nobody 7516192768 Feb 9 09:36 remote.file

  osoldev.root./export/home/batschul.= lofiadm -a /nfszone/remote.file /dev/lofi/1
  osoldev.root./export/home/batschul.= lofiadm
  Block Device   File                   Options
  /dev/lofi/1    /nfszone/remote.file   -

  osoldev.root./export/home/batschul.= zpool create -m /tank/zones/nfszone nfszone /dev/lofi/1
  Feb 9 10:50:35 osoldev zfs: [ID 249136 kern.info] created version 22 pool nfszone using 22

  osoldev.root./export/home/batschul.= zpool status -v nfszone
    pool: nfszone
   state: ONLINE
   scrub: none requested
  config:

          NAME           STATE   READ WRITE CKSUM
          nfszone        ONLINE     0     0     0
            /dev/lofi/1  ONLINE     0     0     0

---
frankB
Re: [zones-discuss] Error on zoneadm attach -u when going from b132 to b133
On Mon, 22 Feb 2010 11:49:46 +0100, Paul van der Zwan paul.vanderz...@sun.com wrote:

> I upgraded my system from b132 to b133 this weekend and I got error messages when I ran attach -u to upgrade my zones. The second run of the install of updated packages fails. In the log I find:
>
>   $ pfexec cat /var/tmp/dns.attach_log.sCaydi
>   [Saturday, 20 February 2010 20:57:50 CET] Log File: /var/tmp/dns.attach_log.sCaydi
>   [Saturday, 20 February 2010 20:57:52 CET] Attaching...
>   [Saturday, 20 February 2010 20:57:52 CET] existing
>   [Saturday, 20 February 2010 20:57:52 CET]
>   [Saturday, 20 February 2010 20:57:52 CET] Sanity Check: Passed. Looks like an OpenSolaris system.
>   pkg: 'network/ftp' matches multiple packages
>           network/ftp
>           service/network/ftp
>   'network/dns/bind' matches multiple packages
>           service/network/dns/bind
>           network/dns/bind
>   'network/ssh' matches multiple packages
>           network/ssh
>           service/network/ssh
>
> If I run attach -u a second time it attaches without doing anything, or giving an error. Are my zones OK or are they partly upgraded?

I think exactly this issue is listed in the 133 release notes, which state that running a 2nd attach will work.

if our marvellous opensolaris.org system were working, you could read the 133 release notes here on the indiana-discuss alias:

  http://opensolaris.org/jive/thread.jspa?threadID=124275

---
frankB
[zones-discuss] headsup: going to build 133 with Zones installed
jfyi, if you do 'pkg image-update' from build 132 to build 133 and you have non-global zones in the installed state, you may run into:

  Bug 14668 - pkg directory action does work when there is none
  http://defect.opensolaris.org/bz/show_bug.cgi?id=14668

note that you should only hit this if you have installed SUNWscp. see bugs:

  Bug 14667 - compatibility/ucb delivers /export and /home
  http://defect.opensolaris.org/bz/show_bug.cgi?id=14667

and

  http://bugs.opensolaris.org/view_bug.do?bug_id=6928051

workaround_1:
- uninstall the zones before the image-update

workaround_2 (as reported in the bug):
- reboot into single user, enter maintenance mode as root and image-update from there.

workaround_3 (NB: _strongly discouraged_):
- detach all of the zones on the system before the image-update.

Why is workaround_3 strongly discouraged?

because no new BE gets created for the installed zone; instead the current zone BE is updated if you do an attach -u following this image-update.

if you want to switch back to a previous build's BE later, e.g. 132 (because there's something in 133 that prevents you from doing what you want), the zone's BE will no longer match the global BE, e.g. the GZ BE will be build 132 but the NGZ BE will still be 133.

net result: the zone is no longer usable in the previous BE with build 132, i.e. you cannot downgrade the NGZ BE again.

---
frankB
Re: [zones-discuss] renaming zonepath
On Sun, 21 Feb 2010 07:15:10 +0100, Anil an...@entic.net wrote:

> I am trying to rename a zonename (and zonepath) to a new zone. But I get this error:
>
>   # zfs rename zones/bugs zones/note
>   cannot rename 'zones/bugs': child dataset with inherited mountpoint is used in a non-global zone

that's correct, zfs rename honours the 'zoned' attribute of child datasets that are currently in use by an NGZ, i.e. you should be seeing this 'zoned' property on:

  osoldev.batschul./export/home/batschul.= pfexec zfs get zoned tank/zones/zone2/ROOT
  NAME                   PROPERTY  VALUE  SOURCE
  tank/zones/zone2/ROOT  zoned     on     local

> I tried this with the zone in detached mode. Any tips?

  zoneadm -z bugs move /zones/note

zoneadm(1M):

  move new_zonepath
      Move the zonepath to new_zonepath. The zone must be halted before this subcommand can be used. The new_zonepath must be a local file system and normal restrictions for zonepath apply.

hth
frankB
[zones-discuss] codereview for 6914152 (zonecfg)
May I request 2 code reviewers for the changes for:

  6914152 zonecfg fails when less(1M) is missing
  http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6914152
  http://cr.opensolaris.org/~batschul/zpager/

thanks!
frankB

1) old failure when $PAGER was bogus:

  osoldev.root./export/home/batschul/tmp.= setenv PAGER foobar
  osoldev.root./export/home/batschul/tmp.= zonecfg -z zone2
  zonecfg:zone2> info
  sh: line 1: foobar: not found
  zonecfg:zone2> help
  sh: line 1: foobar: not found
  zonecfg:zone2> help add
  usage:
          add resource-type
                  (global scope)
          add property-name property-value
                  (resource scope)
          Add specified resource to configuration.
  sh: line 1: foobar: not found

2) new failure mode when $PAGER is bogus:

  osoldev.root./export/home/batschul.= setenv PAGER nonsense
  osoldev.root./export/home/batschul.= zonecfg -z zone2
  zonecfg:zone2> info
  Could not stat PAGER nonsense: No such file or directory
  zonename: zone2
  zonepath: /tank/zones/zone2
  brand: ipkg
  autoboot: false
  bootargs:
  pool:
  limitpriv:
  scheduling-class:
  ip-type: shared
  hostid:
  zonecfg:zone2> help
  Could not stat PAGER nonsense: No such file or directory
  Commands:
          add resource-type
                  (global scope)
          add property-name property-value
                  (resource scope)
          Add specified resource to configuration.
  .
Re: [zones-discuss] codereview for 6914152 (zonecfg)
On Fri, 19 Feb 2010 15:39:21 +0100, Jerry Jelinek gerald.jeli...@sun.com wrote:

> On 02/19/10 06:53, Frank Batschulat (Home) wrote:
>> May I request 2 code reviewers for the changes for:
>>
>>   6914152 zonecfg fails when less(1M) is missing
>>   http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6914152
>>   http://cr.opensolaris.org/~batschul/zpager/
>
> This looks fine to me. One nit:
>
> 911 5192
> The error says "Could not stat PAGER". This error message might be useful to a developer but isn't that useful for a sysadmin. Can you print something more meaningful like "PAGER %s does not exist"?

Thanks Jerry, that is indeed a valid concern. I changed it to be:

snip
  PAGER /usr/bin/nonsense does not exist (No such file or directory).
snip end

I included the real error string to cover permission errors, where the file does indeed exist, and I am now dropping the mysterious 'stat' part.

updated webrev:

  http://cr.opensolaris.org/~batschul/zpager/

cheers
frankB
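The same idea, validating $PAGER before handing it to the shell and falling through to unpaged output otherwise, can be illustrated in a wrapper script. This is only a sketch of the approach and not the zonecfg change under review (that is in the webrev): the function name `check_pager` is hypothetical, it resolves the pager through PATH with `command -v` rather than with stat, and a PAGER value that carries arguments (such as "less -R") is treated as missing.

```shell
# Hypothetical check: return 0 if PAGER is unset/empty or names a
# command that can be found, otherwise print a sysadmin-friendly
# message and return 1 so the caller can print without paging.
check_pager() {
    pager=${PAGER:-}
    if [ -z "$pager" ]; then
        return 0
    fi
    if ! command -v "$pager" >/dev/null 2>&1; then
        echo "PAGER $pager does not exist" >&2
        return 1
    fi
    return 0
}
```

A caller might then use `if check_pager; then some_command | ${PAGER:-cat}; else some_command; fi`, where `some_command` stands for whatever produces the output.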
Re: [zones-discuss] Native Vs. ipkg
On Thu, 18 Feb 2010 16:45:01 +0100, Ben ben.lav...@gmail.com wrote:

> What are the differences between native and ipkg zones?
>
> When I did one of the Solaris 10 admin courses we learnt how zones use parts of the global zone (for example using global-zone directories like /etc etc...).

That is one of the differences. you are referring to what we called 'sparse root' zones, which inherit a lot of the system stuff like /etc, /dev, /usr via lofs loopback file system mounts.

this was possible with Solaris 10 and the so-called 'native' branded zones.

this is different for OpenSolaris ipkg branded zones, which only support 'whole root' zones.

and of course, as already mentioned, there is the IPS vs. SysV packaging difference.

hth
frankB
Re: [zones-discuss] zoneadm clone -m copy does not really copy on ZFS zonepath
On Thu, 18 Feb 2010 19:58:11 +0100, Christine Tran christine.t...@gmail.com wrote:

> On Sat, Feb 13, 2010 at 3:10 AM, Frank Batschulat (Home) frank.batschu...@sun.com wrote:
>> a '-x nodataset' option for 'clone' like in 'install' is unlikely going to happen, in fact I will remove the '-x nodataset' option for 'install' completely soon in OSOL build 135 [...]
>
> I have created test ipkg type zones on this laptop before. I have not done an upgrade but I've allowed Package Manager to update packages as far as it's able. You say you will remove the -x nodataset option, implying it hasn't been done yet, but here's what happened this

the '-x nodataset' option only applies to the 'native' brand (i.e. Solaris 10, SX-DE) (see native(5)). it is not available to the 'ipkg' brand, nor is it available for the solaris8, solaris9 and solaris10 brands.

> morning when I tried to create a new zone.
>
>   r...@fiat~ cat /etc/release
>   OpenSolaris 2008.11 snv_101b_rc2 X86
>   Copyright 2008 Sun Microsystems, Inc. All Rights Reserved.
>   Use is subject to license terms.
>   Assembled 19 November 2008
>
>   r...@fiat~ zonecfg -z pink
>   pink: No such zone configured
>   Use 'create' to begin configuring a new zone.
>   zonecfg:pink> create
>   zonecfg:pink> set zonepath=/zone/pink
>   zonecfg:pink> add net
>   zonecfg:pink:net> set physical=e1000g0
>   zonecfg:pink:net> set address=192.168.20.1/24
>   zonecfg:pink:net> end
>   zonecfg:pink> verify
>   zonecfg:pink> commit
>   zonecfg:pink> info
>   zonename: pink
>   zonepath: /zone/pink
>   brand: ipkg
>   autoboot: false
>   bootargs:
>   pool:
>   limitpriv:
>   scheduling-class:
>   ip-type: shared
>   net:
>           address: 192.168.20.1/24
>           physical: e1000g0
>           defrouter not specified
>   zonecfg:pink> exit
>
>   r...@fiat~ zoneadm -z pink install -x nodataset
>   Error: no zonepath dataset.

that's one of the problems that may arise, since the '-x nodataset' option is really handled inside zoneadm.c:install_func() and not in the brand-specific code that is executed later. zoneadm honoured this option, but the 'ipkg' brand-specific code that is executed _after_ zoneadm.c:install_func() barfs.

> OK, I will create a dataset:
>
>   r...@fiat~ zfs list
>   NAME                      USED  AVAIL  REFER  MOUNTPOINT
>   rpool                    26.4G  71.5G    72K  /rpool
>   rpool/ROOT               19.8G  71.5G    18K  legacy
>   rpool/ROOT/opensolaris   19.8G  71.5G  19.6G  /
>   rpool/dump               1.97G  71.5G  1.97G  -
>   rpool/export             2.70G  71.5G    19K  /export
>   rpool/export/home        2.70G  71.5G    19K  /export/home
>   rpool/export/home/ctran  2.70G  71.5G  2.70G  /export/home/ctran
>   rpool/swap               1.97G  73.5G  3.81M  -
>
>   r...@fiat~ zfs create rpool/pink
>   r...@fiat~ zfs set mountpoint=/zone/pink rpool/pink
>   r...@fiat~ zfs list
>   NAME                      USED  AVAIL  REFER  MOUNTPOINT
>   rpool                    26.4G  71.5G    74K  /rpool
>   rpool/ROOT               19.8G  71.5G    18K  legacy
>   rpool/ROOT/opensolaris   19.8G  71.5G  19.6G  /
>   rpool/dump               1.97G  71.5G  1.97G  -
>   rpool/export             2.70G  71.5G    19K  /export
>   rpool/export/home        2.70G  71.5G    19K  /export/home
>   rpool/export/home/ctran  2.70G  71.5G  2.70G  /export/home/ctran
>   rpool/pink                 18K  71.5G    18K  /zone/pink
>   rpool/swap               1.97G  73.5G  3.81M  -
>
> Try to install again
>
>   r...@fiat~ zoneadm -z pink uninstall
>   Are you sure you want to uninstall zone pink (y/[n])? y
>   cannot open 'rpool/pink/ROOT': dataset does not exist
>   Error: no active dataset.
>   cannot open 'rpool/pink/ROOT': dataset does not exist
>   cannot open 'rpool/pink/ROOT': dataset does not exist
>   cannot open 'rpool/pink/ROOT': dataset does not exist
>   Error: destroying ZFS dataset.

more follow-up problems that result from the fact that zoneadm honoured the '-x nodataset' option when it really must not.

---
frankB
Re: [zones-discuss] zoneadm hangs after repeated boot/halt use
On Sat, 23 Jan 2010 18:47:45 +0100, Frank Batschulat (Home) frank.batschu...@sun.com wrote:

> so let's wait for build 132... I'll also look at your dump from your test system on Monday, but I suspect it'll be the same IP panic...

Hey Glenn, so I finally managed to test the latest ISC with this stress test on build 132. I did not encounter any zones problems!

  1st run was about 30 hours, then hung, used up all mem
  2nd run was about 15 hours, then hung, used up all mem

I filed the following bug for this:

  6926454 build 132: 6GB physmem vanished into thin air...

cheers
frankB
Re: [zones-discuss] zoneadm clone -m copy does not really copy on ZFS zonepath
On Fri, 12 Feb 2010 23:47:36 +0100, Christine Tran christine.t...@gmail.com wrote:

Hi, I'm sorry to bug the OpenSolaris list with a question that pertains to S10U8, but I am really stuck. I am doing a zoneadm clone -m copy, and I do not want a new ZFS dataset even though my zonepath is on a ZFS filesystem, for performance reasons particular to how I am using my zones. Unfortunately, zoneadm clone just ignores the -m copy and makes me a new ZFS filesystem anyway; and by the speed with which it finished, it certainly is a snapshot operation underneath. I have tested with making the source zone on a separate UFS, and have pre-made a dirname under my ZFS filesystem as the zonepath; nothing works. I always get a new ZFS filesystem.

I see that zoneadm install has an -x nodataset switch; I need this for zone clone as well. I have not seen this filed as a bug against S10. Is there a work-around to get the behavior I want? This is sort of a big deal for our application. We use labeled zones, and a file move within a filesystem has a different performance profile than a move from one filesystem to another filesystem, even within one ZFS pool. We are doing tens of thousands of moves per minute.

Christine,

the '-m copy' option to 'clone' does not imply that no new zfs dataset is created.

snip
clone [-m copy] [-s zfs_snapshot] source_zone
    Install a zone by copying an existing installed zone. This subcommand is an alternative way to install the zone.

    -m copy
        Force the clone to be a copy, even if a ZFS clone is possible.
snip end

it changes the method of the clone to use 'find/cpio'

http://src.opensolaris.org/source/xref/pkg/on_ips/usr/src/cmd/zoneadm/zoneadm.c#copy_zone

instead of doing it with a zfs snapshot:

http://src.opensolaris.org/source/xref/pkg/on_ips/usr/src/cmd/zoneadm/zfs.c#clone_zfs

however, it still always creates a new zfs dataset; this is intended.

http://src.opensolaris.org/source/xref/pkg/on_ips/usr/src/cmd/zoneadm/zoneadm.c#clone_copy

a '-x nodataset' option for 'clone' like in 'install' is unlikely to happen. in fact I will soon remove the '-x nodataset' option for 'install' completely, in OSOL build 135:

PSARC 2010/008 Remove zoneadm install sub-option -x nodataset
http://opensolaris.org/jive/thread.jspa?messageID=448598

your ZFS problem is with 'move', i.e. renaming a file from one dataset to another while both datasets are in the same pool ends up as a copy of the file, because it crosses dataset (i.e. file system) boundaries. there are ZFS RFEs open to improve that:

6483179 Provide an efficient way to rename a file to another dataset in same zpool
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6483179

6650426 RFE: support link(2) between ZFS filesystems
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6650426

---
frankB
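For readers wondering why a 'move' between two datasets of the same pool turns into a copy: each ZFS dataset is a separate file system, so rename(2) fails with EXDEV and the caller has to fall back to copy-and-unlink, much like mv(1) does. A minimal C sketch of that fallback follows; move_file() and copy_file() are hypothetical helper names for illustration only, not code from zoneadm or from this thread.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Copy src to dst byte for byte; returns 0 on success, -1 on error. */
static int copy_file(const char *src, const char *dst)
{
    char buf[8192];
    size_t n;
    int rc = 0;
    FILE *in, *out;

    in = fopen(src, "rb");
    if (in == NULL)
        return -1;
    out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            rc = -1;
            break;
        }
    }
    if (ferror(in))
        rc = -1;
    fclose(in);
    if (fclose(out) != 0)
        rc = -1;
    return rc;
}

/*
 * Move src to dst.
 * Returns 0 if a cheap rename(2) was enough (same file system),
 *         1 if the move crossed a file system boundary and had to copy,
 *        -1 on any other error.
 */
int move_file(const char *src, const char *dst)
{
    if (rename(src, dst) == 0)
        return 0;
    if (errno != EXDEV)
        return -1;
    /* Different file system (e.g. another dataset): copy, then unlink. */
    if (copy_file(src, dst) != 0)
        return -1;
    return (unlink(src) == 0) ? 1 : -1;
}
```

A return value of 1 is the expensive path Christine is hitting tens of thousands of times per minute; keeping source and target inside one dataset keeps move_file() on the rename path.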
Re: [zones-discuss] upgrading and zones
On Sat, 06 Feb 2010 10:30:18 +0100, dick hoogendijk d...@nagual.nl wrote:

Going from OpenSolaris b131 to b132.. Am I correct in this procedure:
[1] pkg image-update (with zones attached)
[2] after reboot to the new BE detach the zones
[3] zoneadm attach -u zones

precisely

OR, do I have to detach -BEFORE- the pkg image-update?

_after_ 'pkg image-update' and _after_ the following reboot. that way, a new BE for the zone is created as well during 'pkg image-update', corresponding to the new GZ BE.

like this (done yesterday):

osoldev.batschul./export/home/batschul.= zoneadm list -cp
0:global:running:/::ipkg:shared
-:zone2:installed:/tank/zones/zone2:04d7381c-9216-eba9-a490-d7c667c5850d:ipkg:shared

osoldev.batschul./export/home/batschul.= zfs list -t all
...
rpool/ROOT/opensolaris-129                        33.9M  193G  10.5G  /
rpool/ROOT/opensolaris-130                        38.8M  193G  11.3G  /
rpool/ROOT/opensolaris-131                        22.1M  193G  11.6G  /
rpool/ROOT/opensolaris-132                        18.1G  193G  11.7G  /tmp/tmp7MMWZf
rpool/ROOT/opensolaris-...@install                1.72G     -  3.98G  -
rpool/ROOT/opensolaris-...@2009-12-25-08:57:20    1.28G     -  10.5G  -
rpool/ROOT/opensolaris-...@2010-01-23-12:56:07    1.44G     -  11.3G  -
rpool/ROOT/opensolaris-...@2010-02-05-21:30:12     972M     -  11.6G  -
...
tank/zones/zone2                                   981M  183G    36K  /tank/zones/zone2
tank/zones/zone2/ROOT                              981M  183G  31.5K  legacy
tank/zones/zone2/ROOT/zbe                         52.5K  183G   981M  legacy
tank/zones/zone2/ROOT/zbe-1                        981M  183G   981M  legacy
tank/zones/zone2/ROOT/zb...@2010-02-05-21:30:17    121K     -   981M  -
Re: [zones-discuss] OpenSolaris zone migration
On Thu, 04 Feb 2010 01:11:19 +0100, Ted Ward thomas.w...@sun.com wrote:

I am trying to migrate a zone on OpenSolaris from one identical system to another. It's going from x86 to sparc, but even when going from x86 to x86 I get the same error. Here's the build of both systems:

SunOS hostname 5.11 snv_111b i86pc i386 i86pc Solaris (source system)
SunOS hostname 5.11 snv_111b sun4u sparc SUNW,Sun-Blade-100 Solaris (target system)

After creating the zone on zfs per expectations, I detach it and get the typical directory you would expect:

# ls
SUNWdetached.xml dev root

I then run the following command to migrate the zone:

zfs send rpool/tedz...@migrate | ssh u...@hostname pfexec /usr/sbin/zfs receive -F rpool/tedz...@migrate

Everything looks good at that point. The zfs file system is mounted at rpool/tedzone automatically, and so I create a zone configuration to match that. However, when I run the attach I get the following error message:

zoneadm -z tedzone attach
cannot open 'rpool/tedzone/ROOT': dataset does not exist
ERROR: The -a, -d or -r option is required when there is no active root dataset

The funny thing here is that the zfs list on the source system doesn't mention this zfs file system:

rpool/tedzone           242M  64.1G  22.5K  /tedzone
rpool/tedzone/ROOT      242M  64.1G    19K  legacy
rpool/tedzone/ROOT/zbe  242M  64.1G   242M  /tedzone/root

so if the above output is from the source system, it does list the 'rpool/tedzone/ROOT' dataset, right? I'm not sure why you claim it does not?

is the above zfs list from the source system before doing the zoneadm detach or after?

can you show the zfs list output from the target system after the zfs receive, before zoneadm attach?

I believe in the build 111b (2009.06) you are using, when a zone is halted/detached the corresponding zfs datasets were not visible in the GZ, and we later changed that (I'm not sure if that was before or after 111b/2009.06).

---
frankB
Re: [zones-discuss] Downgrading zones on Opensolaris 2009.x ( b131)
On Tue, 26 Jan 2010 11:03:11 +0100, Dick Hoogendijk d...@nagual.nl wrote:

On 25-1-2010 12:30, Paul van der Zwan wrote:

Unfortunately I am running into bug 6912829 (causes panic on zoneadm halt) quite often.

Do or don't zones work correctly on OpenSolaris-b131?

if you are using exclusive IP stack zones I'd suggest staying away from build 131 and waiting for build 132, which has the fixes for:

6912829 panic in ipsq_xopq_mp_cleanup/RD due to NULL ill->ill_wq on lo0 during zone shutdown/reboot
6917808 Panic on exclusive IP zone shutdown in ipsq_current_finish
6917809 Exclusive IP zone shutdown causes assertion failure in ire_inactive

so far, to my knowledge and testing, this does not affect shared IP stack zones. at least not on my system.

---
frankB
Re: [zones-discuss] zoneadm hangs after repeated boot/halt use
On Mon, 18 Jan 2010 16:33:38 +0100, Frank Batschulat (Home) frank.batschu...@sun.com wrote:

the bad news is, I'm not getting the dumps, sigh. this is due to bug:

6911155 kernel dump fails if panic happens in interrupt service routine

which is fixed in build 131. So I will pursue this further once OSOL_131 has been released and this system has been upgraded. I finally will have dumps by then.

Hey Glenn,

latest news - I'm losing my patience, this isn't fun anymore :(

so I now had the ISC running with OSOL_131 on this test system that reproduced the hangs. however, I can't do more for you here: halting the ISC zone (which has an exclusive IP stack, etherstubs and vnics) immediately panics the box due to bug:

6912829 panic in ipsq_xopq_mp_cleanup/RD due to NULL ill->ill_wq on lo0 during zone shutdown/reboot

introduced in build 131 and fixed in build 132.

to wit, I'm still not getting a dump either, although 131 contains the fix for the above-mentioned bug 6911155. instead of getting a dump timeout (as fixed by 6911155) I'm now getting

0% done, dump failed, error 5

(i.e. EIO).

so let's wait for build 132... I'll also look at your dump from your test system on Monday, but I suspect it'll be the same IP panic...

see you again in 2..4 weeks.

cheers
frankB
Re: [zones-discuss] zcons module failing modunload ?
On Thu, 21 Jan 2010 20:25:25 +0100, Edward Pilatowicz edward.pilatow...@sun.com wrote:

well, the module is indeed unloadable:

---8---
edp{ro...@mcescher$ modinfo | grep zcons
edp{ro...@mcescher$ modload -p drv/zcons
edp{ro...@mcescher$ modinfo | grep zcons
301 f880a5b0 1ad0 164 1 zcons (Zone console driver)
edp{ro...@mcescher$ modunload -i 301
edp{ro...@mcescher$ modinfo | grep zcons
edp{ro...@mcescher$
---8---

well, it is not ;-)

osoldev.batschul./export/home/batschul.= modinfo|grep zcons
osoldev.batschul./export/home/batschul.= pfexec modload -p drv/zcons
osoldev.batschul./export/home/batschul.= modinfo | grep zcons
275 f875f000 1ad0 0 1 zcons (Zone console driver)
osoldev.batschul./export/home/batschul.= pfexec modunload -i 275
can't unload the module: Device busy
osoldev.batschul./export/home/batschul.= pfexec modunload -i 275
can't unload the module: Device busy
osoldev.batschul./export/home/batschul.= pfexec modunload -i 275
can't unload the module: Device busy

really interesting though - I'm using osol_130/x86.

---
frankB
Re: [zones-discuss] why are zone datasets mounted when no zone is running ?
On Fri, 22 Jan 2010 12:57:09 +0100, Frank Batschulat (Home) frank.batschu...@sun.com wrote:

Hiya,

I observed that zone datasets are mounted even though no zones are running. this strikes me as a bug? aren't they supposed to be mounted only when the zone boots?

example from build 130:

osoldev.root./export/home/batschul.= zoneadm list -cp
0:global:running:/::ipkg:shared
-:zone2:installed:/tank/zones/zone2:8b538910-6026-4342-b342-e7c69c2c14e8:ipkg:shared

look what's actually mounted right now:

osoldev.batschul./export/home/batschul.= mount -v|grep zone
tank/zones on /tank/zones type zfs read/write/setuid/devices/nonbmand/exec/xattr/atime/dev=21000e on Fri Jan 22 09:02:17 2010
tank/zones/zone2 on /tank/zones/zone2 type zfs read/write/setuid/devices/nonbmand/exec/xattr/atime/dev=21000f on Fri Jan 22 09:02:17 2010
tank/zones/zone2/ROOT/zbe on /tank/zones/zone2/root type zfs read/write/setuid/devices/nonbmand/exec/xattr/atime/dev=210010 on Fri Jan 22 09:02:22 2010

df confirms:

osoldev.root./export/home/batschul.= df -lkah
Filesystem                 size  used  avail  capacity  Mounted on
tank/zones/zone2           228G   24K   203G        1%  /tank/zones/zone2
tank/zones/zone2/ROOT/zbe  228G  511M   203G        1%  /tank/zones/zone2/root

osoldev.root./export/home/batschul.= zfs list -t all
NAME                       USED  AVAIL  REFER  MOUNTPOINT
tank/zones/zone2           511M   203G    24K  /tank/zones/zone2
tank/zones/zone2/ROOT      511M   203G    21K  legacy
tank/zones/zone2/ROOT/zbe  511M   203G   511M  legacy

inspecting the 'canmount' zfs property gives a hint:

osoldev.root./export/home/batschul.= zfs get canmount tank/zones/zone2
NAME              PROPERTY  VALUE  SOURCE
tank/zones/zone2  canmount  on     default

osoldev.root./export/home/batschul.= zfs get canmount tank/zones/zone2/ROOT
NAME                   PROPERTY  VALUE  SOURCE
tank/zones/zone2/ROOT  canmount  on     default

osoldev.root./export/home/batschul.= zfs get canmount tank/zones/zone2/ROOT/zbe
NAME                       PROPERTY  VALUE   SOURCE
tank/zones/zone2/ROOT/zbe  canmount  noauto  local

so the 'zonepath' dataset has 'canmount=on' and is thus mounted by zfs mount -a. shouldn't that be 'canmount=noauto'? the 'zonepath/ROOT' dataset has the same.

even more interesting is that the 'zonepath/ROOT/zbe' apparently has the proper 'canmount=noauto' - yet it is mounted as well after boot.

am I missing something obvious?

for the 'zonepath/ROOT' dataset it's also interesting that 'canmount' is set to 'on', as the real system's /ROOT does have it turned off:

osoldev.batschul./export/home/batschul.= pfexec zfs get canmount
NAME                                            PROPERTY  VALUE   SOURCE
rpool                                           canmount  on      default
rpool/ROOT                                      canmount  off     local
rpool/ROOT/opensolaris-129                      canmount  noauto  local
rpool/ROOT/opensolaris-130                      canmount  noauto  local
rpool/ROOT/opensolaris-...@install              canmount  -       -
rpool/ROOT/opensolaris-...@2009-12-25-08:57:20  canmount  -       -

fwiw, there is a related Live Upgrade bug saying that it should not set 'canmount' to 'on' for the /ROOT dataset in the new BE:

6747122 lucreate should not set canmount to on for zfs root dataset

---
frankB
Re: [zones-discuss] Zones patching issues using attach -u
On Fri, 22 Jan 2010 18:30:39 +0100, Gael gael.marti...@gmail.com wrote:

This is bug: 6857294 zoneadm attach leads to partially installed packages

I believe a T patch might be available for the S10 SVr4 packaging code if you need it, but I see that the fix has not yet been integrated into the nv SVr4 packaging code. It is scheduled for b124.

Was that fix ever released?

Yes, Solaris 10U9 will have it, ONNV/OSOL build 125 has it. and the following Solaris 10 patches containing the fix for 6857294 were officially released 3 days ago:

119254-72 (sparc)
119255-72 (x86)

---
frankB
Re: [zones-discuss] zoneadm hangs after repeated boot/halt use
On Wed, 23 Dec 2009 15:26:17 +0100, Glenn Brunette glenn.brune...@sun.com wrote:

Just verified that something is still wrong in b129, but the problem is _not_ with a vanilla configuration. This time around boot/halt #102, the system apparently shutdown/panic'ed? I was running it overnight and came in to a system that had been rebooted. I did not see any problem in the audit log nor in /var/adm/messages. Any pointers?

I am running an Immutable Service Container configuration, based upon the installation steps at:

http://kenai.com/projects/isc/pages/OpenSolaris

Specifically:

pfexec pkg install SUNWmercurial
hg clone https://kenai.com/hg/isc~source isc
pfexec isc/bin/iscadm.ksh -N 0
pfexec bootadm update-archive
pfexec shutdown -g 0 -i 0 -y
[after reboot]
zlogin -C isc1
[wait for zone isc1 to fully complete boot process]

then run the script that I provided that stops and starts the zone. Apparently, there must be something wrong with the interaction of components. In this configuration, we have things like resource controls, auditing, IP Filter/IP NAT, and zones all enabled.

Would it be possible for you to try the steps above on a fresh install of 2009.06 or later (b129 is where I am right now)? Also, if you have other debugging methods, please let me know.

hey Glenn,

the good news is that I have an OSOL_130 system with ISC installed as you described that reliably reproduces _something_. That something being the system completely hung when running your script:

batsc...@osol:~# while : ; do echo `date`:ZONE BOOT; pfexec zoneadm -z isc1 boot; sleep 10; echo `date`: ZONE HALT; pfexec zoneadm -z isc1 halt; sleep 10; done

Note, sleep 30 didn't do it: 17 hours running without an issue. however, after changing this to sleep 10 I can reliably hang the system, usually within 5 hours. no remote access is possible anymore and even the local console doesn't respond anymore. F1-A taking a dump when booted into kmdb however works.

the bad news is, I'm not getting the dumps, sigh. this is due to bug:

6911155 kernel dump fails if panic happens in interrupt service routine

which is fixed in build 131. So I will pursue this further once OSOL_131 has been released and this system has been upgraded. I finally will have dumps by then.

I'll also contact you offline about how you can set up your systems to capture crash dumps and anything else we might need.

cheers
frankB
Re: [zones-discuss] Zones on shared storage - a warning
On Fri, 08 Jan 2010 18:33:06 +0100, Mike Gerdts mger...@gmail.com wrote:

I've written a dtrace script to get the checksums on Solaris 10. Here's what I see with NFSv3 on Solaris 10.

jfyi, I've reproduced it as well using a Solaris 10 Update 8 SB2000 sparc client and NFSv4.

much like you I also get READ errors along with the CKSUM errors, which is different from my observation on an ONNV client.

unfortunately your dtrace script did not work for me, i.e. it did not spit out anything :(

cheers
frankB
Re: [zones-discuss] Zones on shared storage - a warning
On Wed, 23 Dec 2009 03:02:47 +0100, Mike Gerdts mger...@gmail.com wrote:

I've been playing around with zones on NFS a bit and have run into what looks to be a pretty bad snag - ZFS keeps seeing read and/or checksum errors. This exists with S10u8 and OpenSolaris dev build snv_129. This is likely a blocker for anyone thinking of implementing parts of Ed's Zones on Shared Storage:

http://hub.opensolaris.org/bin/view/Community+Group+zones/zoss

The OpenSolaris example appears below. The order of events is:

1) Create a file on NFS, turn it into a zpool
2) Configure a zone with the pool as zonepath
3) Install the zone, verify that the pool is healthy
4) Boot the zone, observe that the pool is sick

[...]

r...@soltrain19# zoneadm -z osol boot
r...@soltrain19# zpool status osol
  pool: osol
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

        NAME                  STATE     READ WRITE CKSUM
        osol                  DEGRADED     0     0     0
          /mnt/osolzone/root  DEGRADED     0     0   117  too many errors

errors: No known data errors

Hey Mike,

you're not the only victim of these strange CKSUM errors. I hit the same during my slightly different testing, where I'm NFS mounting an entire, pre-existing remote file living in the zpool on the NFS server and use that to create a zpool and install zones into it.

I've filed today:

6915265 zpools on files (over NFS) accumulate CKSUM errors with no apparent reason

here's the relevant piece worth investigating out of it (leaving out the actual setup etc.). as in your case, creating the zpool and installing the zone into it still gives a healthy zpool, but immediately after booting the zone, the zpool served over NFS accumulated CKSUM errors.

of particular interest are the 'cksum_actual' values as reported by Mike for his test case here:

http://www.mail-archive.com/zfs-disc...@opensolaris.org/msg33041.html

if compared to the 'cksum_actual' values I got in the fmdump error output on my test case/system:

note, the NFS server's zpool that is serving and sharing the file we use is healthy.

zone halted now on my test system, and checking fmdump:

osoldev.batschul./export/home/batschul.= fmdump -eV | grep cksum_actual | sort | uniq -c | sort -n | tail
     2 cksum_actual = 0x4bea1a77300 0xf6decb1097980 0x217874c80a8d9100 0x7cd81ca72df5ccc0
     2 cksum_actual = 0x5c1c805253 0x26fa7270d8d2 0xda52e2079fd74 0x3d2827dd7ee4f21
     6 cksum_actual = 0x28e08467900 0x479d57f76fc80 0x53bca4db5209300 0x983ddbb8c4590e40
*A   6 cksum_actual = 0x348e6117700 0x765aa1a547b80 0xb1d6d98e59c3d00 0x89715e34fbf9cdc0
*B   7 cksum_actual = 0x0 0x0 0x0 0x0
*C  11 cksum_actual = 0x1184cb07d00 0xd2c5aab5fe80 0x69ef5922233f00 0x280934efa6d20f40
*D  14 cksum_actual = 0x175bb95fc00 0x1767673c6fe00 0xfa9df17c835400 0x7e0aef335f0c7f00
*E  17 cksum_actual = 0x2eb772bf800 0x5d8641385fc00 0x7cf15b214fea800 0xd4f1025a8e66fe00
*F  20 cksum_actual = 0xbaddcafe00 0x5dcc54647f00 0x1f82a459c2aa00 0x7f84b11b3fc7f80
*G  25 cksum_actual = 0x5d6ee57f00 0x178a70d27f80 0x3fc19c3a19500 0x82804bc6ebcfc0

osoldev.root./export/home/batschul.= zpool status -v
  pool: nfszone
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        nfszone     DEGRADED     0     0     0
          /nfszone  DEGRADED     0     0   462  too many errors

errors: No known data errors

==

now compare this with Mike's error output as posted here:

http://www.mail-archive.com/zfs-disc...@opensolaris.org/msg33041.html

# fmdump -eV | grep cksum_actual | sort | uniq -c | sort -n | tail
     2 cksum_actual = 0x14c538b06b6 0x2bb571a06ddb0 0x3e05a7c4ac90c62 0x290cbce13fc59dce
*D   3 cksum_actual = 0x175bb95fc00 0x1767673c6fe00 0xfa9df17c835400 0x7e0aef335f0c7f00
*E   3 cksum_actual = 0x2eb772bf800 0x5d8641385fc00 0x7cf15b214fea800 0xd4f1025a8e66fe00
*B   4 cksum_actual = 0x0 0x0 0x0 0x0
     4 cksum_actual = 0x1d32a7b7b00 0x248deaf977d80 0x1e8ea26c8a2e900 0x330107da7c4bcec0
     5 cksum_actual = 0x14b8f7afe6 0x915db8d7f87 0x205dc7979ad73 0x4e0b3a8747b8a8
*C   6 cksum_actual = 0x1184cb07d00 0xd2c5aab5fe80 0x69ef5922233f00 0x280934efa6d20f40
*A   6
Re: [zones-discuss] [zfs-discuss] Zones on shared storage - a warning
On Fri, 08 Jan 2010 13:55:13 +0100, Darren J Moffat darr...@opensolaris.org wrote:

Frank Batschulat (Home) wrote:

This just can't be an accident, there must be some coincidence and thus there's a good chance that these CHKSUM errors must have a common source, either in ZFS or in NFS ?

What are you using for on the wire protection with NFS ? Is it shared using krb5i or do you have IPsec configured ? If not I'd recommend trying one of those and see if your symptoms change.

Hey Darren,

doing krb5i is certainly a good idea for additional protection in general, however I have some doubts that NFS OTW corruption will produce the exact same wrong checksum inside 2 totally different setups and networks, as comparing Mike's and my results showed [see 1].

cheers
frankB

[1]

osoldev.batschul./export/home/batschul.= fmdump -eV | grep cksum_actual | sort | uniq -c | sort -n | tail
     2 cksum_actual = 0x4bea1a77300 0xf6decb1097980 0x217874c80a8d9100 0x7cd81ca72df5ccc0
     2 cksum_actual = 0x5c1c805253 0x26fa7270d8d2 0xda52e2079fd74 0x3d2827dd7ee4f21
     6 cksum_actual = 0x28e08467900 0x479d57f76fc80 0x53bca4db5209300 0x983ddbb8c4590e40
*A   6 cksum_actual = 0x348e6117700 0x765aa1a547b80 0xb1d6d98e59c3d00 0x89715e34fbf9cdc0
*B   7 cksum_actual = 0x0 0x0 0x0 0x0
*C  11 cksum_actual = 0x1184cb07d00 0xd2c5aab5fe80 0x69ef5922233f00 0x280934efa6d20f40
*D  14 cksum_actual = 0x175bb95fc00 0x1767673c6fe00 0xfa9df17c835400 0x7e0aef335f0c7f00
*E  17 cksum_actual = 0x2eb772bf800 0x5d8641385fc00 0x7cf15b214fea800 0xd4f1025a8e66fe00
*F  20 cksum_actual = 0xbaddcafe00 0x5dcc54647f00 0x1f82a459c2aa00 0x7f84b11b3fc7f80
*G  25 cksum_actual = 0x5d6ee57f00 0x178a70d27f80 0x3fc19c3a19500 0x82804bc6ebcfc0

==

now compare this with Mike's error output as posted here:

http://www.mail-archive.com/zfs-disc...@opensolaris.org/msg33041.html

# fmdump -eV | grep cksum_actual | sort | uniq -c | sort -n | tail
     2 cksum_actual = 0x14c538b06b6 0x2bb571a06ddb0 0x3e05a7c4ac90c62 0x290cbce13fc59dce
*D   3 cksum_actual = 0x175bb95fc00 0x1767673c6fe00 0xfa9df17c835400 0x7e0aef335f0c7f00
*E   3 cksum_actual = 0x2eb772bf800 0x5d8641385fc00 0x7cf15b214fea800 0xd4f1025a8e66fe00
*B   4 cksum_actual = 0x0 0x0 0x0 0x0
     4 cksum_actual = 0x1d32a7b7b00 0x248deaf977d80 0x1e8ea26c8a2e900 0x330107da7c4bcec0
     5 cksum_actual = 0x14b8f7afe6 0x915db8d7f87 0x205dc7979ad73 0x4e0b3a8747b8a8
*C   6 cksum_actual = 0x1184cb07d00 0xd2c5aab5fe80 0x69ef5922233f00 0x280934efa6d20f40
*A   6 cksum_actual = 0x348e6117700 0x765aa1a547b80 0xb1d6d98e59c3d00 0x89715e34fbf9cdc0
*F  16 cksum_actual = 0xbaddcafe00 0x5dcc54647f00 0x1f82a459c2aa00 0x7f84b11b3fc7f80
*G  48 cksum_actual = 0x5d6ee57f00 0x178a70d27f80 0x3fc19c3a19500 0x82804bc6ebcfc0

and observe that the 'cksum_actual' values that eventually cause our CKSUM pool errors, because they mismatch what had been expected, are the SAME! for 2 totally different client systems and 2 different NFS servers (mine vs. Mike's), see the entries marked with *A to *G.
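The comparison above was done by eye. As a rough illustration, the same check can be written as a set intersection over the two 'cksum_actual' lists. The sketch below is only that: the tuples are transcribed from the two fmdump outputs quoted in this message, and first_sums, second_sums and count_common() are hypothetical names introduced here, not part of any ZFS or FMA tool.

```c
#include <stddef.h>
#include <string.h>

/* cksum_actual tuples from the first fmdump listing (Frank's system). */
static const char *const first_sums[] = {
    "0x4bea1a77300 0xf6decb1097980 0x217874c80a8d9100 0x7cd81ca72df5ccc0",
    "0x5c1c805253 0x26fa7270d8d2 0xda52e2079fd74 0x3d2827dd7ee4f21",
    "0x28e08467900 0x479d57f76fc80 0x53bca4db5209300 0x983ddbb8c4590e40",
    "0x348e6117700 0x765aa1a547b80 0xb1d6d98e59c3d00 0x89715e34fbf9cdc0",
    "0x0 0x0 0x0 0x0",
    "0x1184cb07d00 0xd2c5aab5fe80 0x69ef5922233f00 0x280934efa6d20f40",
    "0x175bb95fc00 0x1767673c6fe00 0xfa9df17c835400 0x7e0aef335f0c7f00",
    "0x2eb772bf800 0x5d8641385fc00 0x7cf15b214fea800 0xd4f1025a8e66fe00",
    "0xbaddcafe00 0x5dcc54647f00 0x1f82a459c2aa00 0x7f84b11b3fc7f80",
    "0x5d6ee57f00 0x178a70d27f80 0x3fc19c3a19500 0x82804bc6ebcfc0",
};

/* cksum_actual tuples from the second fmdump listing (Mike's system). */
static const char *const second_sums[] = {
    "0x14c538b06b6 0x2bb571a06ddb0 0x3e05a7c4ac90c62 0x290cbce13fc59dce",
    "0x175bb95fc00 0x1767673c6fe00 0xfa9df17c835400 0x7e0aef335f0c7f00",
    "0x2eb772bf800 0x5d8641385fc00 0x7cf15b214fea800 0xd4f1025a8e66fe00",
    "0x0 0x0 0x0 0x0",
    "0x1d32a7b7b00 0x248deaf977d80 0x1e8ea26c8a2e900 0x330107da7c4bcec0",
    "0x14b8f7afe6 0x915db8d7f87 0x205dc7979ad73 0x4e0b3a8747b8a8",
    "0x1184cb07d00 0xd2c5aab5fe80 0x69ef5922233f00 0x280934efa6d20f40",
    "0x348e6117700 0x765aa1a547b80 0xb1d6d98e59c3d00 0x89715e34fbf9cdc0",
    "0xbaddcafe00 0x5dcc54647f00 0x1f82a459c2aa00 0x7f84b11b3fc7f80",
    "0x5d6ee57f00 0x178a70d27f80 0x3fc19c3a19500 0x82804bc6ebcfc0",
};

/* Count how many entries of a[] also occur (as exact strings) in b[]. */
size_t count_common(const char *const *a, size_t na,
    const char *const *b, size_t nb)
{
    size_t i, j, common = 0;

    for (i = 0; i < na; i++) {
        for (j = 0; j < nb; j++) {
            if (strcmp(a[i], b[j]) == 0) {
                common++;
                break;  /* count each entry of a[] at most once */
            }
        }
    }
    return common;
}
```

Calling count_common(first_sums, 10, second_sums, 10) on these lists gives 7, which corresponds to the entries marked *A to *G.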
Re: [zones-discuss] zoneadm hangs after repeated boot/halt use
On Wed, 06 Jan 2010 03:28:35 +0100, Glenn Brunette glenn.brune...@sun.com wrote:

Just an update. I am still able to get the repeatable hangs, but I am still not able to generate a dump. If anyone has any further ideas as to how to troubleshoot this please let me know!

Hey Glenn,

you are not forgotten. I shall find some time now, after the winter break, to reproduce this with the additional setup you've described. It's still on my agenda, but a couple of panics crossed my schedule over the last week.

cheers
frankB
Re: [zones-discuss] Add user in a zone - best practice
On Thu, 31 Dec 2009 09:02:33 +0100, Jim Klimov jimkli...@cos.ru wrote:

While I have seen many warnings explicitly noting that an NFS server should never be its own client (including sharing global shares to local zones), I confess I have failed to find any specific grounds for that.

simplified, the reasons are deadlocks between NFS, the VM subsystem and the underlying UFS file system.

I'm not so sure that still holds true in the same way if ZFS is the underlying file system, though I'm not aware of any real stress test of such configurations.

---
frankB
Re: [zones-discuss] Installing a specific dev release into a zone?
On Sun, 27 Dec 2009 04:55:21 +0100, Tristan Ball tristan.b...@leica-microsystems.com wrote:

I've got an OpenSolaris snv_129 VM on which I'm playing around with zones. Initially everything was fine, however since my initial install the opensolaris dev repository has updated to release 130, and now I can't install new zones. I get:

t...@osol-test:~# zoneadm -z z1 install
A ZFS file system has been created for this zone.
   Publisher: Using opensolaris.org (http://pkg.opensolaris.org/dev/ ).
       Image: Preparing at /data/zones/z1/root.
       Cache: Using /var/pkg/download.
Sanity Check: Looking for 'entire' incorporation.
ERROR: Unable to locate the incorporation 'ent...@0.5.11,5.11-0.129:20091205T134302Z' in the preferred publisher 'opensolaris.org'.
Use -P to supply a publisher which contains this package.

It looks like the zone install process only gathers information on the latest version of each package?

t...@osol-test:~# pkg -R /data/zones/z1/root list -a |grep -i entire
entire  0.5.11-0.130  known  -

Is there some way to tell zoneadm install to install a specific dev release, in this case 129? Actually, it looks like zoneadm is trying to install that release, but the defaults on the pkg tools it uses to do the install are hiding the availability of 129 on the dev package servers. Is there a way around this?

maybe there is a specific repo for dev build versions, but I'm not aware of any. perhaps you may want to ask on the IPS or on-discuss alias for it.

I realise that I could upgrade my global zone to 130, but I'd really like to have the option of picking a given release and sticking with it for longer than the dev version release cycles!

the non-global zone runs the native OS of the global zone (except for branded zones like the solaris10(5) branded zone), which is in your case 129. however, since your publisher defaults to the dev repo, zone install will pick up 130, which is current for the dev repo.

so you have to update the global zone to run 130 before you can run 130 in the non-global zone. for possible issues with 130, refer to:

http://opensolaris.org/jive/thread.jspa?threadID=120631

hth
frankB
Re: [zones-discuss] Webrev for CR 6909222
On Tue, 22 Dec 2009 00:46:00 +0100, Jordan Vaughan jordan.vaug...@sun.com wrote:

I need someone to review my fix for

6909222 reboot of system upgraded from 128 to build 129 generated error from an s10 zone due to boot-archive

My webrev is accessible via http://cr.opensolaris.org/~flippedb/onnv-s10c

Jordan,

we probably should update the s10container dev guide to point out that we remove $ZONEROOT/boot/solaris/bin/create_ramdisk and essentially disable boot archive updates within the s10 branded zone?

http://hub.opensolaris.org/bin/view/Community+Group+zones/s10brand_dev_guide

there may be ISVs/OEMs that potentially add/change stuff there?

cheers
frankB
Re: [zones-discuss] Webrev for CR 6782448
On Sat, 19 Dec 2009 04:28:52 +0100, Jordan Vaughan jordan.vaug...@sun.com wrote:

I expanded my webrev to include my fix for

6910339 zonecfg coredumps with badly formed 'select net defrouter'

I need someone to review my changes. The webrev is still accessible via http://cr.opensolaris.org/~flippedb/onnv-zone2

Hey Jordan,

looks good to me modulo this in zonecfg_lookup_nwif():

 size_t addrspec;        /* nonzero if tabptr has IP addr */
 size_t physspec;        /* nonzero if tabptr has interface */
+size_t defrouterspec;   /* nonzero if tabptr has def. router */

 if (tabptr == NULL)
         return (Z_INVAL);

+ * zone_nwif_address, zone_nwif_physical, and zone_nwif_defrouter are
+ * arrays, so no NULL checks are necessary.
  */
 addrspec = strlen(tabptr->zone_nwif_address);
 physspec = strlen(tabptr->zone_nwif_physical);
-assert(addrspec > 0 || physspec > 0);
+defrouterspec = strlen(tabptr->zone_nwif_defrouter);
+assert(addrspec != 0 || physspec != 0 || defrouterspec != 0);

so we do consider any of them being 0 a fault given the assert(), fine, but yet we do check for this again inside the loop:

+if (physspec != 0 && (fetchprop(cur, DTD_ATTR_PHYSICAL,
+    physical, sizeof (physical)) != Z_OK ||
+    strcmp(tabptr->zone_nwif_physical, physical) != 0))
+        continue;
+if (addrspec != 0 && (fetchprop(cur, DTD_ATTR_ADDRESS, address,
+    sizeof (address)) != Z_OK ||
+    !zonecfg_same_net_address(tabptr->zone_nwif_address,
+    address)))
+        continue;
+if (defrouterspec != 0 && (fetchprop(cur, DTD_ATTR_DEFROUTER,
+    address, sizeof (address)) != Z_OK ||
+    !zonecfg_same_net_address(tabptr->zone_nwif_defrouter,
+    address)))
+        continue;

a good argument could probably be made to turn this assert into a real check and return Z_INVAL for any of those 3 being 0 and get rid of the checks inside the xml parsing loop?

cheers
frankB
Re: [zones-discuss] Webrev for CR 6782448
On Tue, 22 Dec 2009 14:55:34 +0100, Frank Batschulat (Home) frank.batschu...@sun.com wrote:

a good argument could probably be made to turn this assert into a real check and return Z_INVAL for any of those 3 being 0 and get rid of the checks inside the xml parsing loop?

probably rather Z_INSUFFICIENT_SPEC than Z_INVAL though.
Re: [zones-discuss] Webrev for CR 6909222
On Tue, 22 Dec 2009 00:46:00 +0100, Jordan Vaughan jordan.vaug...@sun.com wrote:

I need someone to review my fix for

6909222 reboot of system upgraded from 128 to build 129 generated error from an s10 zone due to boot-archive

My webrev is accessible via http://cr.opensolaris.org/~flippedb/onnv-s10c

Jordan, looks good to me.

what about /usr/lib/brand/ipkg/p2v and perhaps /usr/lib/brand/ipkg/pkgcreatezone for the ipkg brand? and usr/src/lib/brand/native/zone/p2v.ksh and usr/src/lib/brand/native/zone/image_install.ksh for the native brand?

I'd assume that in the future, running an s10u9 update for an s10u8 branded zone could potentially put back the '/boot/solaris/bin/create_ramdisk' script, but that'd be taken care of by the s10_boot.ksh then.

cheers
frankB
Re: [zones-discuss] Webrev for CR 6782448
On Wed, 23 Dec 2009 01:34:59 +0100, Jordan Vaughan jordan.vaug...@sun.com wrote:

http://cr.opensolaris.org/~flippedb/onnv-zone2

[...] zone_lookup_nwif() needs the three loop checks. I regenerated the webrev. You'll notice that the assertion was replaced by a check that returns Z_INSUFFICIENT_SPEC.

Hey Jordan,

thanks for the exhaustive reply. understood. I was ignoring the fact that without these checks the xml parsing loop would generate a false alarm for such conditions:

net:
        address: 10.5.234.15/24
        physical: bge0
        defrouter not specified

zonecfg:mojo> select net address=10.5.234.15/24
select net: No such resource with that id

lgtm!

cheers
frankB
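To illustrate the outcome of this review thread, here is a simplified, self-contained sketch of the lookup logic: an up-front check that rejects an empty specification, followed by a loop that only compares the properties the caller actually filled in. All demo_ names and the numeric return codes are placeholders invented for this sketch; the real code lives in libzonecfg, walks an XML tree with fetchprop(), and compares addresses with zonecfg_same_net_address() rather than strcmp().

```c
#include <stddef.h>
#include <string.h>

/* Placeholder return codes; the real values are defined by libzonecfg. */
enum {
    DEMO_Z_OK = 0,
    DEMO_Z_INSUFFICIENT_SPEC = 1,
    DEMO_Z_NO_RESOURCE_ID = 2
};

/* Simplified stand-in for the nwif table entry: fixed arrays, never NULL. */
struct demo_nwif {
    char address[64];
    char physical[64];
    char defrouter[64];
};

/*
 * Find the first entry in tab[] that matches every property set in spec.
 * Properties left empty in spec are skipped, which is why the per-property
 * checks inside the loop are still needed.
 */
int demo_lookup_nwif(const struct demo_nwif *spec,
    const struct demo_nwif *tab, size_t ntab, size_t *found)
{
    size_t addrspec = strlen(spec->address);
    size_t physspec = strlen(spec->physical);
    size_t defrouterspec = strlen(spec->defrouter);
    size_t i;

    /* A real check instead of assert(): nothing specified, nothing to find. */
    if (addrspec == 0 && physspec == 0 && defrouterspec == 0)
        return (DEMO_Z_INSUFFICIENT_SPEC);

    for (i = 0; i < ntab; i++) {
        if (physspec != 0 &&
            strcmp(spec->physical, tab[i].physical) != 0)
            continue;
        if (addrspec != 0 &&
            strcmp(spec->address, tab[i].address) != 0)
            continue;
        if (defrouterspec != 0 &&
            strcmp(spec->defrouter, tab[i].defrouter) != 0)
            continue;
        *found = i;
        return (DEMO_Z_OK);
    }
    return (DEMO_Z_NO_RESOURCE_ID);
}
```

With a table holding the entry shown above (address 10.5.234.15/24, physical bge0, no defrouter), a spec that sets only the address matches that entry. Without the `!= 0` guards, the empty physical and defrouter fields of the spec would be compared against the stored values and the lookup would wrongly fail.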
Re: [zones-discuss] code review for 6911329
On Thu, 17 Dec 2009 23:16:19 +0100, Dan Price d...@eng.sun.com wrote:

> On Thu, Dec 17, 2009 at 07:17:50PM +0100, Frank Batschulat (Home) wrote:
>> May I have 2 code reviewers for:
>>
>> 6911329 Incorrect code in kstat_delete causes panic
>> http://cr.opensolaris.org/~batschul/onnvkstat/
>>
>> Description
>>
>> A colleague was looking into a crash and the reason turned out to be a NULL pointer dereference in kstat_delete():
>>
>>     kstat_delete(kstat_t *ksp)
>>     {
>>             kmutex_t *lp;
>>             ekstat_t *e = (ekstat_t *)ksp;
>>             zoneid_t zoneid = e->e_zone.zoneid;
>>             kstat_zone_t *kz;
>>
>>             if (ksp == NULL)
>>                     return;
>>
>> Note that there is a dereference of 'ksp' [via 'e'] before the check for ksp being NULL.
>>
>> unfortunately we don't have a dump/stacktrace anymore to inspect who called kstat_delete(NULL) and why.
>
> Do we really think that ksp being NULL is a invalid condition?

Yes, I think we do. kstat_create() is officially documented to return NULL in the error case, ie. the usual sequence for a user would be

    ksp = kstat_create()
    if (ksp != NULL)
            kstat_install()

and kstat_delete(9F) says:

    PARAMETERS
        ksp    Pointer to a currently installed kstat(9S) structure.

> If it's invalid, then why not add an assertion, so we can root-cause.

absolutely! webrev updated: http://cr.opensolaris.org/~batschul/onnvkstat/

> Or has this if (ksp == NULL) been there forever and ever and there are drivers abusing it?

nope, it got introduced when Jeff re-wrote the kstat framework in Solaris 9 via:

4460914 kstat implementation has escaped and doesn't scale

> I see a bunch of cmn_err's in kstat_create-- are there log files from the machine which might indicate that there was a kstat_create which returned NULL?

unfortunately I have nothing at all. However, what we do know is that the initial kstat_create() can't have returned NULL, because in that case the following kstat_install() would already have panicked on the NULL ksp due to the various dereferences it does on ksp.

this leaves 2 possibilities:

1) a bug in the kstat framework itself
2) a kstat user calling kstat_delete() twice or with a NULL ksp.

I'll run tests with the usual big kstat consumers like Zones, UFS, ZFS and NFS with the debug bits, perhaps I can uncover an offender.

thanks
frankB
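To make the ordering problem concrete, here is a minimal user-space sketch of the pattern discussed in this thread: assert in debug builds so the offending caller can be root-caused, keep the NULL check for production, and dereference only afterwards. It is not the code from the webrev. The types are cut-down, hypothetical stand-ins for the kernel ones, `SKETCH_ASSERT` and `SKETCH_DEBUG` are invented for the example, and the function returns the zone id only so that its behaviour can be observed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-ins for the kernel types named in the thread. */
typedef int zoneid_t;
typedef struct kstat { int ks_placeholder; } kstat_t;
typedef struct ekstat {
	kstat_t e_ks;			/* first member, so the cast below is valid */
	struct { zoneid_t zoneid; } e_zone;
} ekstat_t;

/* Debug-only assertion, loosely modelled on a kernel ASSERT(). */
#ifdef SKETCH_DEBUG
#define	SKETCH_ASSERT(x)	assert(x)
#else
#define	SKETCH_ASSERT(x)	((void)0)
#endif

/* Returns the zone id the kstat belonged to, or -1 if ksp was NULL. */
static zoneid_t
kstat_delete_sketch(kstat_t *ksp)
{
	ekstat_t *e;

	SKETCH_ASSERT(ksp != NULL);	/* catch the bad caller on debug bits */
	if (ksp == NULL)		/* stay safe on non-debug bits */
		return (-1);

	e = (ekstat_t *)ksp;		/* dereference only after the check */
	return (e->e_zone.zoneid);
}
```

Built without `SKETCH_DEBUG`, `kstat_delete_sketch(NULL)` returns -1 instead of faulting; built with it, the same call aborts at the assertion and leaves a core that names the caller.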
[zones-discuss] code review for 6911329
May I have 2 code reviewers for:

6911329 Incorrect code in kstat_delete causes panic
http://cr.opensolaris.org/~batschul/onnvkstat/

Description

A colleague was looking into a crash and the reason turned out to be a NULL pointer dereference in kstat_delete():

    kstat_delete(kstat_t *ksp)
    {
            kmutex_t *lp;
            ekstat_t *e = (ekstat_t *)ksp;
            zoneid_t zoneid = e->e_zone.zoneid;
            kstat_zone_t *kz;

            if (ksp == NULL)
                    return;

Note that there is a dereference of 'ksp' [via 'e'] before the check for ksp being NULL.

unfortunately we don't have a dump/stacktrace anymore to inspect who called kstat_delete(NULL) and why.

thanks
frankB
Re: [zones-discuss] code review for 6495558
Hey Ed, Steve, Jordan, Jerry,

I got it in writing from Veritas Engineering that they do not have any heartburn over using fsck -o p on VxFS and inside the zone, and by testing in the lab I also confirmed that it behaves as expected and similar to UFS:

snip

# uname -a
SunOS lab234 5.10 Generic_139555-08 sun4u sparc sun4u
# pkginfo -l VRTSvxfs
PKGINST: VRTSvxfs
NAME: VERITAS File System
CATEGORY: system,utilities
ARCH: sparc
VERSION: 5.0,REV=5.0A55_sol
# fsck -F vxfs -o p /dev/rdsk/c1t14d0s0
/dev/rdsk/c1t14d0s0:file system is clean - log replay is not required

snip end

here's the new webrev for your consideration:

http://cr.opensolaris.org/~batschul/onnv-vplat/

thanks!
frankB

On Tue, 15 Dec 2009 08:37:49 +0100, Frank Batschulat (Home) frank.batschu...@sun.com wrote:

valid point, Ed! ignoring the minor detail that my fix should really do 'fsck -o p' (new webrev is in progress, thanks Steve for catching my ignorance), in fact -o p is documented in the generic fsck(1M) man page.

snip

fsck(1M)

-o specific-options
    p    Check and fix the file system non-interactively (preen). Exit immediately if there is a problem requiring intervention. This option is required to enable parallel file system checking.

snip end

and VxFS does support it as well, and it has the same net effect as on UFS, a log replay without operator intervention:

http://sfdoccentral.symantec.com/sf/5.0MP3/solaris/manpages/vxfs/man1m/fsck_vxfs.html

snip

p    Allows parallel log replay for several VxFS file systems. Each message from fsck is prefixed with the device name to identify the device. This suboption does not perform a full file system check in parallel; that is still done sequentially on each device, even when multiple devices are specified. This option is compatible only with the -y|Y option (that is, non-interactive full file system check), in which case a log replay is done in parallel on all specified devices. A sequential full file system check is performed on devices where needed.

snip end

however, the part 'compatible only with the -y|Y option' sounds a bit ambiguous to me, so I pinged a friend at VRTS to clarify this for me. worst case would be to add code differentiating between vxfs and ufs here. I'll be back once I have the confirmation.

thanks!
frankB

On Tue, 15 Dec 2009 00:37:52 +0100, Edward Pilatowicz edward.pilatow...@sun.com wrote:

so just one question. the '-p' preen option is only documented in the fsck_ufs(1m) man page, and not in fsck(1m). so i'm wondering whether there are zones which may be installed on other filesystems which supply an fsck utility which may not support the preen option? (or perhaps '-p' is defined as something else for those versions of fsck?) specifically vxfs comes to mind since i know that some s10 deployments use that.

ed

On Fri, Dec 11, 2009 at 02:24:49PM +0100, Frank Batschulat (Home) wrote:

friends, may I request code review for the earth-shattering fix to:

6495558 zoneadm -z zone boot should not only check but repair filesystems
http://cr.opensolaris.org/~batschul/onnv-vplat/

background:

Evaluation

when booting a zone, zoneadm (ie. vplat.c:dofsck()) should perform the same tasks as the /usr/sbin/mountall script, which does an 'is suitable for mounting' (fsck -m) check first, followed by a preen fsck (fsck -p) if the former failed.

the obvious quick fix would be to change the code in vplat.c:dofsck()

        argv[0] = "fsck";
        argv[1] = "-m";
        argv[2] = (char *)rawdev;
        argv[3] = NULL;

        status = forkexec(zlogp, cmdbuf, argv);
        if (status == 0 || status == -1)
                return (status);
        zerror(zlogp, B_FALSE, "fsck of '%s' failed with exit status %d; "
            "run fsck manually", rawdev, status);
        return (-1);

to always just run fsck in preen mode (shouldn't cause any real problem) or fork off a 2nd fsck in preen mode if the first fsck -m failed.

actually the fix will be to just execute fsck in preen mode (fsck -p) rather than doing the 'is suitable for mounting' and preen fsck dance. if the former fails, the latter will have to be done anyways. the latter however kind of implies the former.

thanks!
--
frankB
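For illustration, the argument vector for the preen invocation discussed in this thread could be assembled roughly as follows. This is a sketch under stated assumptions, not the code from the webrev: `build_preen_argv` and `FSCK_ARGV_MAX` are hypothetical names, and the optional `-F fstype` argument is included only because the lab test above used `fsck -F vxfs -o p`.

```c
#include <stddef.h>

/* fsck, -F, fstype, -o, p, rawdev, NULL */
#define	FSCK_ARGV_MAX	7

/*
 * Fill argv for: fsck [-F fstype] -o p rawdev
 * fstype may be NULL, in which case no -F option is emitted.
 * Returns the number of arguments written, not counting the NULL terminator.
 */
static int
build_preen_argv(char *argv[FSCK_ARGV_MAX], const char *fstype,
    const char *rawdev)
{
	int i = 0;

	argv[i++] = "fsck";
	if (fstype != NULL) {
		argv[i++] = "-F";
		argv[i++] = (char *)fstype;
	}
	argv[i++] = "-o";	/* generic fsck(1M) option ... */
	argv[i++] = "p";	/* ... with the preen suboption */
	argv[i++] = (char *)rawdev;
	argv[i] = NULL;
	return (i);
}
```

The resulting vector has the same shape as the one handed to forkexec() in the quoted dofsck() fragment, with "-m" replaced by the two arguments "-o" and "p".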
Re: [zones-discuss] zoneadm hangs after repeated boot/halt use
And finally, I have had this script running on a real OSOL build 127 box for a day now and cannot reproduce it there either. So I failed to reproduce this at all using the script on:

- ONNV 129 (zfs root, 1 cpu)
- ONNV 126 (ufs root, 2 cpus)
- OSOL 127 (zfs root, 4 cores)

there must be something special that I am missing.

On Wed, 16 Dec 2009 09:49:25 +0100, Frank Batschulat (Home) frank.batschu...@sun.com wrote:

Glenn, I've not been able to reproduce this on onnv build 126 (it's been running for a day now). if that script reproduced 6894901 straight away, it should do so on 126 as well (similar to what you've seen on 127). this raises the question whether there are some other details in your environment that I don't have, or whether that script really reproduces 6894901 reliably.

On Tue, 15 Dec 2009 15:23:06 +0100, Frank Batschulat (Home) frank.batschu...@sun.com wrote:

Glenn, I've been running this test case now for nearly a day on build 129, couldn't reproduce at all. good chance this is indeed fixed by 6894901 in build 128. I'll also try to reproduce this now on build 126.

On Fri, 11 Dec 2009 21:48:52 +0100, Glenn Brunette glenn.brune...@sun.com wrote:

As part of an Immutable Service Container[1] demonstration that I am creating for an event in January, I need to start/stop a zone quite a few times (as part of a Self-Cleansing[2] demo). During the course of my testing, I have been able to repeatedly get zoneadm to hang. Since I am working with a highly customized configuration, I started over with a default zone on OpenSolaris (b127) and was able to repeat this issue. To reproduce this problem, use the following script after creating a zone using the normal/default steps:

isc...@osol-isc:~$ while : ; do
    echo `date`: ZONE BOOT
    pfexec zoneadm -z test boot
    sleep 30
    pfexec zoneadm -z test halt
    echo `date`: ZONE HALT
    sleep 10
done

This script works just fine for a while, but eventually zoneadm hangs (was at pass #90 in my last test).
Re: [zones-discuss] zoneadm hangs after repeated boot/halt use
Glenn, I've been running this test case now for nearly a day on build 129, couldn't reproduce at all. good chance this is indeed fixed by 6894901 in build 128. I'll also try to reproduce this now on build 126.

cheers
frankB

On Fri, 11 Dec 2009 21:48:52 +0100, Glenn Brunette glenn.brune...@sun.com wrote:

As part of an Immutable Service Container[1] demonstration that I am creating for an event in January, I need to start/stop a zone quite a few times (as part of a Self-Cleansing[2] demo). During the course of my testing, I have been able to repeatedly get zoneadm to hang. Since I am working with a highly customized configuration, I started over with a default zone on OpenSolaris (b127) and was able to repeat this issue. To reproduce this problem, use the following script after creating a zone using the normal/default steps:

isc...@osol-isc:~$ while : ; do
    echo `date`: ZONE BOOT
    pfexec zoneadm -z test boot
    sleep 30
    pfexec zoneadm -z test halt
    echo `date`: ZONE HALT
    sleep 10
done

This script works just fine for a while, but eventually zoneadm hangs (was at pass #90 in my last test).
When this happens, zoneadm is shown to be consuming quite a bit of CPU:

PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
16598 root 11M 3140K run 10 0:54:49 74% zoneadm/1

A stack trace of zoneadm shows:

isc...@osol-isc:~$ pfexec pstack `pgrep zoneadm`
16082: zoneadmd -z test
----- lwp# 1 -----
----- lwp# 2 -----
 feef41c6 door (0, 0, 0, 0, 0, 8)
 feed99f7 door_unref_func (3ed2, fef81000, fe33efe8, f39e) + 67
 f3f3 _thrp_setup (fe5b0a00) + 9b
 f680 _lwp_start (fe5b0a00, 0, 0, 0, 0, 0)
----- lwp# 3 -----
 feef420f __door_return () + 2f
----- lwp# 4 -----
 feef420f door (0, 0, 0, fe140e00, f5f00, a)
 feed9f57 door_create_func (0, fef81000, fe140fe8, f39e) + 2f
 f3f3 _thrp_setup (fe5b1a00) + 9b
 f680 _lwp_start (fe5b1a00, 0, 0, 0, 0, 0)
16598: zoneadm -z test boot
 feef3fc8 door (6, 80476d0, 0, 0, 0, 3)
 feede653 door_call (6, 80476d0, 400, fe3d43f7) + 7b
 fe3d44f0 zonecfg_call_zoneadmd (8047e33, 8047730, 8078448, 1) + 124
 0805792d boot_func (0, 8047d74, 100, 805ff0b) + 1cd
 08060125 main (4, 8047d64, 8047d78, 805570f) + 2b9
 0805576d _start (4, 8047e28, 8047e30, 8047e33, 8047e38, 0) + 7d

A stack trace of zoneadmd shows:

isc...@osol-isc:~$ pfexec pstack `pgrep zoneadmd`
16082: zoneadmd -z test
----- lwp# 1 -----
----- lwp# 2 -----
 feef41c6 door (0, 0, 0, 0, 0, 8)
 feed99f7 door_unref_func (3ed2, fef81000, fe33efe8, f39e) + 67
 f3f3 _thrp_setup (fe5b0a00) + 9b
 f680 _lwp_start (fe5b0a00, 0, 0, 0, 0, 0)
----- lwp# 3 -----
 feef4147 __door_ucred (80a37c8, fef81000, fe23e838, feed9cfe) + 27
 feed9d0d door_ucred (fe23f870, 1000, 0, 0) + 32
 08058a88 server (0, fe23f8f0, 510, 0, 0, 8058a04) + 84
 feef4240 __door_return () + 60
----- lwp# 4 -----
 feef420f door (0, 0, 0, fe140e00, f5f00, a)
 feed9f57 door_create_func (0, fef81000, fe140fe8, f39e) + 2f
 f3f3 _thrp_setup (fe5b1a00) + 9b
 f680 _lwp_start (fe5b1a00, 0, 0, 0, 0, 0)

A truss of zoneadm (-f -vall -wall -tall) shows this looping:

16598: door_call(6, 0x080476D0) = 0
16598:     data_ptr=8047730 data_size=0
16598:     desc_ptr=0x0 desc_num=0
16598:     rbuf=0x807F2D8 rsize=4096
16598: close(6) = 0
16598: mkdir("/var/run/zones", 0700) Err#17 EEXIST
16598: chmod("/var/run/zones", 0700) = 0
16598: open("/var/run/zones/test.zoneadm.lock", O_RDWR|O_CREAT, 0600) = 6
16598: fcntl(6, F_SETLKW, 0x08046DC0) = 0
16598:     typ=F_WRLCK whence=SEEK_SET start=0 len=0 sys=4277003009 pid=6
16598: open("/var/run/zones/test.zoneadmd_door", O_RDONLY) = 7
16598: door_info(7, 0x08047230) = 0
16598:     target=16082 proc=0x8058A04 data=0x0
16598:     attributes=DOOR_UNREF|DOOR_REFUSE_DESC|DOOR_NO_CANCEL
16598:     uniquifier=26426
16598: close(7) = 0
16598: close(6) = 0
16598: open("/var/run/zones/test.zoneadmd_door", O_RDONLY) = 6
16082/3: door_return(0x, 0, 0x, 0xFE23FE00, 1007360) = 0
16082/3: door_ucred(0x080A37C8) = 0
16082/3:     euid=0 egid=0
16082/3:     ruid=0 rgid=0
16082/3:     pid=16598 zoneid=0
16082/3:     E: all
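The truss output shows zoneadm opening test.zoneadm.lock with O_RDWR|O_CREAT and mode 0600, then taking a whole-file write lock with fcntl(F_SETLKW) on every pass of the loop. For readers unfamiliar with that pattern, here is a minimal sketch of it using only standard POSIX calls. The function names are hypothetical, and this is not zoneadm's actual locking code.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Open (creating if needed) a lock file and block until an exclusive
 * write lock covering the whole file is granted.
 * Returns the open descriptor, or -1 on failure.
 */
static int
grab_lock_file(const char *path)
{
	struct flock fl;
	int fd = open(path, O_RDWR | O_CREAT, 0600);

	if (fd < 0)
		return (-1);

	memset(&fl, 0, sizeof (fl));
	fl.l_type = F_WRLCK;	/* matches typ=F_WRLCK in the truss */
	fl.l_whence = SEEK_SET;
	fl.l_start = 0;
	fl.l_len = 0;		/* 0 means "through end of file" */

	/* F_SETLKW waits for a conflicting lock to be released. */
	if (fcntl(fd, F_SETLKW, &fl) == -1) {
		(void) close(fd);
		return (-1);
	}
	return (fd);
}

/* Closing the descriptor releases the lock held through it. */
static void
release_lock_file(int fd)
{
	(void) close(fd);
}
```

In the trace the lock is granted immediately each time (fcntl returns 0), so the loop is not blocked on the lock itself; the sequence simply repeats.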
Re: [zones-discuss] zoneadm hangs after repeated boot/halt use
sounds somewhat similar to

6773836 zoneadm halt or halting/rebooting a non-global zone hangs the global zone

I'll try to reproduce this using your test case and see what I find. please file a bug if it still happens with 128 and is not fixed by 6894901 as Steve suggested.

cheers
frankB

On Fri, 11 Dec 2009 21:48:52 +0100, Glenn Brunette glenn.brune...@sun.com wrote:

As part of an Immutable Service Container[1] demonstration that I am creating for an event in January, I need to start/stop a zone quite a few times (as part of a Self-Cleansing[2] demo). During the course of my testing, I have been able to repeatedly get zoneadm to hang. Since I am working with a highly customized configuration, I started over with a default zone on OpenSolaris (b127) and was able to repeat this issue. To reproduce this problem, use the following script after creating a zone using the normal/default steps:

isc...@osol-isc:~$ while : ; do
    echo `date`: ZONE BOOT
    pfexec zoneadm -z test boot
    sleep 30
    pfexec zoneadm -z test halt
    echo `date`: ZONE HALT
    sleep 10
done

This script works just fine for a while, but eventually zoneadm hangs (was at pass #90 in my last test).
Re: [zones-discuss] zoneadm hangs after repeated boot/halt use
On Sat, 12 Dec 2009 10:06:43 +0100, Frank Batschulat (Home) frank.batschu...@sun.com wrote:

> sounds somewhat similar to
>
> 6773836 zoneadm halt or halting/rebooting a non-global zone hangs the global zone

wrong cut+paste, I meant to say:

6734679 zoneadm halt hung during zones test

> I'll try to reproduce this using your test case and see what I find. please file a bug if it still happens with 128 and is not fixed by 6894901 as Steve suggested.
>
> cheers
> frankB
>
> On Fri, 11 Dec 2009 21:48:52 +0100, Glenn Brunette glenn.brune...@sun.com wrote:
>
>> As part of an Immutable Service Container[1] demonstration that I am creating for an event in January, I need to start/stop a zone quite a few times (as part of a Self-Cleansing[2] demo). During the course of my testing, I have been able to repeatedly get zoneadm to hang. Since I am working with a highly customized configuration, I started over with a default zone on OpenSolaris (b127) and was able to repeat this issue. To reproduce this problem, use the following script after creating a zone using the normal/default steps:
>>
>> isc...@osol-isc:~$ while : ; do
>>     echo `date`: ZONE BOOT
>>     pfexec zoneadm -z test boot
>>     sleep 30
>>     pfexec zoneadm -z test halt
>>     echo `date`: ZONE HALT
>>     sleep 10
>> done
>>
>> This script works just fine for a while, but eventually zoneadm hangs (was at pass #90 in my last test).
[zones-discuss] code review for 6495558
friends, may I request code review for the earth-shattering fix to:

6495558 zoneadm -z zone boot should not only check but repair filesystems
http://cr.opensolaris.org/~batschul/onnv-vplat/

background:

Evaluation

when booting a zone, zoneadm (ie. vplat.c:dofsck()) should perform the same tasks as the /usr/sbin/mountall script, which does an 'is suitable for mounting' (fsck -m) check first, followed by a preen fsck (fsck -p) if the former failed.

the obvious quick fix would be to change the code in vplat.c:dofsck()

        argv[0] = "fsck";
        argv[1] = "-m";
        argv[2] = (char *)rawdev;
        argv[3] = NULL;

        status = forkexec(zlogp, cmdbuf, argv);
        if (status == 0 || status == -1)
                return (status);
        zerror(zlogp, B_FALSE, "fsck of '%s' failed with exit status %d; "
            "run fsck manually", rawdev, status);
        return (-1);

to always just run fsck in preen mode (shouldn't cause any real problem) or fork off a 2nd fsck in preen mode if the first fsck -m failed.

actually the fix will be to just execute fsck in preen mode (fsck -p) rather than doing the 'is suitable for mounting' and preen fsck dance. if the former fails, the latter will have to be done anyways. the latter however kind of implies the former.

thanks!
--
frankB

It is always possible to agglutinate multiple separate problems into a single complex interdependent solution. In most cases this is a bad idea.
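The two alternatives weighed in this message can be contrasted in a small sketch. It abstracts the actual fsck invocation behind a hypothetical runner callback so the control flow can be read and exercised on its own. Nothing here is taken from vplat.c; the `fake_dirty_fs` runner and the status value it returns for "needs checking" are assumptions made purely for the demonstration.

```c
#include <stddef.h>

/* Which kind of fsck run is being requested. */
enum fsck_mode { FSCK_CHECK_ONLY, FSCK_PREEN };

/* Hypothetical runner: returns fsck's exit status, or -1 if it could not run. */
typedef int (*fsck_runner_t)(enum fsck_mode mode, const char *rawdev);

/* mountall-style: check first, preen only when the check reports a problem. */
static int
check_then_preen(fsck_runner_t run, const char *rawdev)
{
	int status = run(FSCK_CHECK_ONLY, rawdev);

	if (status == 0 || status == -1)
		return (status);
	return (run(FSCK_PREEN, rawdev));
}

/* The simplification proposed above: always preen. */
static int
preen_only(fsck_runner_t run, const char *rawdev)
{
	return (run(FSCK_PREEN, rawdev));
}

/* Demo runner: a file system that fails the check but preens cleanly. */
static int fake_calls;

static int
fake_dirty_fs(enum fsck_mode mode, const char *rawdev)
{
	(void) rawdev;
	fake_calls++;
	return (mode == FSCK_CHECK_ONLY ? 32 : 0);	/* 32: assumed "needs checking" */
}
```

Against `fake_dirty_fs` both strategies end with status 0, but `check_then_preen` spends two fsck runs to get there while `preen_only` needs one, which is the argument made above for skipping the separate check.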
Re: [zones-discuss] Solaris10-Branded Zones Webrev: CR 6882732
On Wed, 09 Dec 2009 23:54:05 +0100, Jordan Vaughan jordan.vaug...@sun.com wrote:

> I need someone to review my fix for
>
> 6882732 unpacking archive with extended file attributes reports errors
>
> The webrev is accessible via http://cr.opensolaris.org/~flippedb/onnv-s10c

looks good to me.

cheers
---
frankB
[zones-discuss] /var/run/zones not cleaned up ?
is it to be expected that after no zoneadm/zoneadmd is running anymore, /var/run/zones still contains the corresponding lock files? (I also looked at the current thread list of my system and no zone related kernel threads are running anymore)

osoldev.root./var/run/zones.= zoneadm list -cp
0:global:running:/::ipkg:shared
-:zone2:configured:/tank/zones/zone2::ipkg:shared
osoldev.root./var/run/zones.= ps -eafd|grep zone
root 2961 2734 0 16:35:06 pts/2 0:00 grep zone
osoldev.root./var/run/zones.= ls -la
total 16
drwx------   2 root root  335 Dec 10 12:23 .
drwxr-xr-x  11 root sys  2423 Dec 10 12:21 ..
-rw-r--r--   1 root root    0 Dec 10 12:23 index.lock
-rw-------   1 root root    0 Dec 10 12:21 zone1.zoneadm.lock
-rw-------   1 root root    0 Dec 10 12:21 zone1.zoneadmd_door

this was after a zone boot/zone halt/zone uninstall/zone delete cycle.

bug, feature?

---
frankB
Re: [zones-discuss] Impossible to install a zone on a new indiana b128a install
On Tue, 08 Dec 2009 20:01:19 +0100, Ric Aleshire ric.alesh...@sun.com wrote:

> Vincent Boisard wrote:
>> Hi, I just installed a new opensolaris dev b128a machine from an iso downloaded from genunix. I tried to install a zone but it failed:
>>
>> r...@pasiphae:~# zoneadm -z template install
>> A ZFS file system has been created for this zone.
>> Publisher: Using opensolaris.org (http://pkg.opensolaris.org/dev/ ).
>> Image: Preparing at /zones/template/root.
>> Cache: Using /var/pkg/download.
>> Sanity Check: Looking for 'entire' incorporation.
>> Installing: Core System (output follows)
>> No updates necessary for this image.
>> ERROR: failed to install package
>>
>> (it downloaded ~120MB of files into /var/pkg/download)
>>
>> The zone is in incomplete state and the zone root fs is mounted. As it is the first time I'm using opensolaris (I used SXCE before), I have no idea what's going on here.
>
> This is a bug that is fixed in b129: http://defect.opensolaris.org/bz/show_bug.cgi?id=12995

Vincent, if you cannot wait for build 129, there's a workaround: edit the script /usr/lib/brand/ipkg/pkgcreatezone and comment out line 468, which looks like:

    $PKG install --no-refresh --no-index SUNWcs || fail_fatal $f_pkg

if you comment it out, the zone will install fine.

cheers
frankB
Re: [zones-discuss] solaris 10 branded zone
On Tue, 08 Dec 2009 18:31:55 +0100, xx david.sc...@autohandle.com wrote:

> i'm still not doing something right:
>
> init...@dogpatch:~# pkg install SUNWs10brand
> Creating Plan
> pkg: The following pattern(s) did not match any packages in the current catalog.
> Try relaxing the pattern, refreshing and/or examining the catalogs:
>         SUNWs10brand
> init...@dogpatch:~# pkg info -r * 2>&1 | grep brand
> pkg://development/SUNWipkg-brand
> init...@dogpatch:~# pkg info -r * 2>&1 | grep s10
> init...@dogpatch:~# pkg publisher
> PUBLISHER                  TYPE    STATUS  URI
> development (preferred)    origin  online  http://pkg.opensolaris.org/dev/
> extra                      origin  online  https://pkg.sun.com/opensolaris/extra/
> opensolaris.org            origin  online  http://pkg.opensolaris.org/release/
> init...@dogpatch:~#

I think that's because your publisher's local name is not the default. I did the same today and it worked:

batsc...@osoldev:/usr/lib/brand$ pkg info -r * 2>&1 | grep brand
pkg://opensolaris.org/SUNWipkg-brand
pkg://opensolaris.org/SUNWs10brand

my publisher is this:

batsc...@osoldev:/usr/lib/brand$ pkg publisher
PUBLISHER                      TYPE    STATUS  URI
opensolaris.org (preferred)    origin  online  http://pkg.opensolaris.org/dev/
blastwave                      origin  online  http://blastwave.network.com:1/
osol-contrib                   origin  online  http://pkg.opensolaris.org/contrib/
sunfreeware.com                origin  online  http://pkg.sunfreeware.com:9000/

I believe there were some problems recently when the local name did not match the official publisher's name.

hth
--
frankB
Re: [zones-discuss] Re: [nfs-discuss] Re: [sysadmin-discuss] NFS server in zones
On Thu, 15 Feb 2007 06:19:10 +0100, Mahesh Siddheshwar [EMAIL PROTECTED] wrote:

> Robert Thurlow wrote:
>> Glenn Faden wrote:
>>>> 4) A bug currently prevents a client instance and a server instance from being safe to use on the same box (apologies, can't quote the bugid from here). How likely, in your use case, is it that this will be a problem, i.e. will your boxes be in the position where a zone needs data shared from another zone as opposed to a separate server?
>>>
>>> This is a must fix. In TX we want to automount between labeled zones on the same machine. It seems to work with ZFS. Is the deadlock specific to UFS/NFS?
>>
>> Good question! I don't expect that it is, but perhaps ZFS's use of the ARC would insulate it. Maybe Mahesh would know.
>
> The problem seen in 5065254 and what is seen commonly in the recent past is mainly due to the interaction between NFS, UFS and segmap driver**. This scenario, typically, is noticeable only under heavy load or on systems with a low segmapsize. Since ZFS does not use the segmap driver, this particular scenario should be averted. Currently the loopback mounted configuration is never tested. So I won't be surprised if we run into other loopings, but with some effort those should be tractable.
>
> Mahesh
>
> ** NFS tries to commit pages, which, on the NFS server, requires UFS to obtain a segmap slot. Before you use the segmap slot, you need to free/destroy the previous mappings for the segmap slot, which happens to be a locked NFS page, the lock for which is currently owned by the commit thread which began this process. There is one more scenario which I have not seen, but is theoretically possible -- where writing to a UFS file requires stealing a dirty NFS page which would in turn require writing to the server, which requires exclusive locking of the same UFS file.

even if you leave this segmap issue aside, it is very likely that you will encounter different deadlocks, because this is an attempt to stack file systems that are not really stackable file systems, and you'd run into issues similar to 4498652 / 4154394

---
frankB