[zones-discuss] Move zone from Sun4v to Sun4u
Hi,

I have a customer (IHAC) who is going to move Solaris 10 zones from a T5120 to an M5000 server. Both systems have the same Solaris version and patches. Can we move the zones as is, or do we need to use some special commands?

/Magnus
___
zones-discuss mailing list
zones-discuss@opensolaris.org
Re: [zones-discuss] Move zone from Sun4v to Sun4u
Magnus,

you can migrate a zone between sun4v and sun4u in the following way:

1. Use zoneadm detach to unplug the zone from the system.
2. Move the zoneroot to the other system.
3. Run zoneadm attach -u to attach the zone.

During the attach, the platform-dependent packages and patches are also installed in the zone from the global zone.

Detlef

Magnus Sjolander wrote on 15.12.09 09:18:
> Hi IHAC that's are going to move Solaris 10 zones from T5120 to M5000 server, there is the same Solaris version and patches, can we move it as is, or do we need to use some special commands?
>
> /Magnus
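The three steps Detlef lists can be sketched as a small dry-run script. This is only an illustration under assumptions: the zone name, zonepath, target host name and the tar/scp transport are hypothetical placeholders, and the halt and zonecfg "create -a" steps are not mentioned in the message (a zone has to be halted before it can be detached, and the target system needs a zone configuration before the attach). The script only prints the commands; it does not run them.

```shell
#!/bin/sh
# Dry-run sketch of the detach / move / attach -u sequence.
# ZONE, ZONEPATH and TARGET are hypothetical placeholders.
# Nothing is executed: each function only prints the commands.

ZONE=${ZONE:-myzone}
ZONEPATH=${ZONEPATH:-/zones/myzone}
TARGET=${TARGET:-m5000-host}

# Commands for the source system (the T5120 in this thread).
source_plan() {
    # Assumption: the zone is halted first; detach needs a halted zone.
    echo "zoneadm -z $ZONE halt"
    echo "zoneadm -z $ZONE detach"
    # Any transport would do (tar, cpio, zfs send); tar is just an example.
    echo "cd $ZONEPATH && tar cf /tmp/$ZONE.tar ."
    echo "scp /tmp/$ZONE.tar $TARGET:/tmp/$ZONE.tar"
}

# Commands for the target system (the M5000 in this thread).
target_plan() {
    echo "mkdir -m 700 -p $ZONEPATH"
    echo "cd $ZONEPATH && tar xf /tmp/$ZONE.tar"
    # Assumption: recreate the configuration from the detached zonepath.
    echo "zonecfg -z $ZONE 'create -a $ZONEPATH'"
    # -u updates the zone to match the packages/patches of the new host.
    echo "zoneadm -z $ZONE attach -u"
}

source_plan
target_plan
```

Reviewing the printed plan before running anything by hand keeps the sketch safe to try on a system that has no zones configured.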
Re: [zones-discuss] zoneadm hangs after repeated boot/halt use
Glenn,

I've been running this test case for nearly a day on build 129 and couldn't reproduce it at all. There is a good chance that this was indeed fixed by 6894901 in build 128. I'll also try to reproduce it on build 126.

cheers
frankB

On Fri, 11 Dec 2009 21:48:52 +0100, Glenn Brunette glenn.brune...@sun.com wrote:
> As part of an Immutable Service Container[1] demonstration that I am creating for an event in January, I need to start/stop a zone quite a few times (as part of a Self-Cleansing[2] demo). During the course of my testing, I have been able to repeatedly get zoneadm to hang.
>
> Since I am working with a highly customized configuration, I started over with a default zone on OpenSolaris (b127) and was able to repeat this issue. To reproduce this problem, use the following script after creating a zone using the normal/default steps:
>
> isc...@osol-isc:~$ while : ; do
>     echo `date`: ZONE BOOT
>     pfexec zoneadm -z test boot
>     sleep 30
>     pfexec zoneadm -z test halt
>     echo `date`: ZONE HALT
>     sleep 10
> done
>
> This script works just fine for a while, but eventually zoneadm hangs (it was at pass #90 in my last test). When this happens, zoneadm is shown to be consuming quite a bit of CPU:
>
> PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
> 16598 root 11M 3140K run 10 0:54:49 74% zoneadm/1
>
> A stack trace of zoneadm shows:
>
> isc...@osol-isc:~$ pfexec pstack `pgrep zoneadm`
> 16082: zoneadmd -z test
> - lwp# 1
> - lwp# 2
>  feef41c6 door (0, 0, 0, 0, 0, 8)
>  feed99f7 door_unref_func (3ed2, fef81000, fe33efe8, f39e) + 67
>  f3f3 _thrp_setup (fe5b0a00) + 9b
>  f680 _lwp_start (fe5b0a00, 0, 0, 0, 0, 0)
> - lwp# 3
>  feef420f __door_return () + 2f
> - lwp# 4
>  feef420f door (0, 0, 0, fe140e00, f5f00, a)
>  feed9f57 door_create_func (0, fef81000, fe140fe8, f39e) + 2f
>  f3f3 _thrp_setup (fe5b1a00) + 9b
>  f680 _lwp_start (fe5b1a00, 0, 0, 0, 0, 0)
> 16598: zoneadm -z test boot
>  feef3fc8 door (6, 80476d0, 0, 0, 0, 3)
>  feede653 door_call (6, 80476d0, 400, fe3d43f7) + 7b
>  fe3d44f0 zonecfg_call_zoneadmd (8047e33, 8047730, 8078448, 1) + 124
>  0805792d boot_func (0, 8047d74, 100, 805ff0b) + 1cd
>  08060125 main (4, 8047d64, 8047d78, 805570f) + 2b9
>  0805576d _start (4, 8047e28, 8047e30, 8047e33, 8047e38, 0) + 7d
>
> A stack trace of zoneadmd shows:
>
> isc...@osol-isc:~$ pfexec pstack `pgrep zoneadmd`
> 16082: zoneadmd -z test
> - lwp# 1
> - lwp# 2
>  feef41c6 door (0, 0, 0, 0, 0, 8)
>  feed99f7 door_unref_func (3ed2, fef81000, fe33efe8, f39e) + 67
>  f3f3 _thrp_setup (fe5b0a00) + 9b
>  f680 _lwp_start (fe5b0a00, 0, 0, 0, 0, 0)
> - lwp# 3
>  feef4147 __door_ucred (80a37c8, fef81000, fe23e838, feed9cfe) + 27
>  feed9d0d door_ucred (fe23f870, 1000, 0, 0) + 32
>  08058a88 server (0, fe23f8f0, 510, 0, 0, 8058a04) + 84
>  feef4240 __door_return () + 60
> - lwp# 4
>  feef420f door (0, 0, 0, fe140e00, f5f00, a)
>  feed9f57 door_create_func (0, fef81000, fe140fe8, f39e) + 2f
>  f3f3 _thrp_setup (fe5b1a00) + 9b
>  f680 _lwp_start (fe5b1a00, 0, 0, 0, 0, 0)
>
> A truss of zoneadm (-f -vall -wall -tall) shows this looping:
>
> 16598: door_call(6, 0x080476D0) = 0
> 16598:     data_ptr=8047730 data_size=0
> 16598:     desc_ptr=0x0 desc_num=0
> 16598:     rbuf=0x807F2D8 rsize=4096
> 16598: close(6) = 0
> 16598: mkdir(/var/run/zones, 0700) Err#17 EEXIST
> 16598: chmod(/var/run/zones, 0700) = 0
> 16598: open(/var/run/zones/test.zoneadm.lock, O_RDWR|O_CREAT, 0600) = 6
> 16598: fcntl(6, F_SETLKW, 0x08046DC0) = 0
> 16598:     typ=F_WRLCK whence=SEEK_SET start=0 len=0 sys=4277003009 pid=6
> 16598: open(/var/run/zones/test.zoneadmd_door, O_RDONLY) = 7
> 16598: door_info(7, 0x08047230) = 0
> 16598:     target=16082 proc=0x8058A04 data=0x0
> 16598:     attributes=DOOR_UNREF|DOOR_REFUSE_DESC|DOOR_NO_CANCEL
> 16598:     uniquifier=26426
> 16598: close(7) = 0
> 16598: close(6) = 0
> 16598: open(/var/run/zones/test.zoneadmd_door, O_RDONLY) = 6
> 16082/3: door_return(0x, 0, 0x, 0xFE23FE00, 1007360) = 0
> 16082/3: door_ucred(0x080A37C8) = 0
> 16082/3:     euid=0 egid=0
> 16082/3:     ruid=0 rgid=0
> 16082/3:     pid=16598 zoneid=0
> 16082/3:     E: all
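For anyone re-running the boot/halt test on other builds, a variant of Glenn's loop that counts passes and flags a command that does not return may be handy. The following is a hypothetical sketch, not something posted in the thread: the names ZONEADM, ZONE, LIMIT, run_with_limit and stress_zone are made up for illustration, and the 300-second limit is an arbitrary assumption.

```shell
#!/bin/sh
# Hypothetical variant of the reproduction loop: counts passes and reports
# a command that is still running after LIMIT seconds.
# ZONEADM, ZONE and LIMIT are illustrative names, not from the thread.

ZONEADM=${ZONEADM:-"pfexec zoneadm"}
ZONE=${ZONE:-test}
LIMIT=${LIMIT:-300}

# Run "$@" in the background. Return 0 if it exits within LIMIT seconds,
# 1 if it is still running (a suspected hang). The process is left alive
# on purpose so that pstack or truss can be pointed at it.
run_with_limit() {
    "$@" &
    pid=$!
    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$LIMIT" ]; then
            echo "$(date): pid $pid still running after ${LIMIT}s: $*"
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Boot and halt the zone up to $1 times, stopping at the first suspected hang.
stress_zone() {
    pass=0
    while [ "$pass" -lt "$1" ]; do
        pass=$((pass + 1))
        echo "$(date): pass $pass ZONE BOOT"
        run_with_limit $ZONEADM -z "$ZONE" boot || return 1
        sleep 30
        run_with_limit $ZONEADM -z "$ZONE" halt || return 1
        echo "$(date): pass $pass ZONE HALT"
        sleep 10
    done
}
```

Calling "stress_zone 200" would run up to 200 boot/halt passes; the pass number printed before a "still running" message shows how far the test got before zoneadm stopped returning.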
Re: [zones-discuss] zones code review
Ed,

Thanks for reviewing this. My responses to your comments are in-line.

Edward Pilatowicz wrote:
> On Tue, Dec 15, 2009 at 08:39:12AM -0700, Jerry Jelinek wrote:
>> I have an initial code review for the fix for bug:
>>
>> 6768950 panic[cpu1]/thread=ff084ce0b3e0: syscall_asm_amd64.s:480 lwp ff0756a8cdc0, pcb_rupdate != 0
>>
>> There is a webrev at:
>>
>> http://cr.opensolaris.org/~gjelinek/webrev.6768950/
>>
>> The code changes in the sn1 and solaris10 brands are basically identical. I know there is a lot of common code there but I didn't want to clutter up this bug fix with the unrelated changes necessary to make the code common. I'll be addressing that with a separate fix. My initial testing of these changes looks good but I still need to run more extensive tests.
>
> this looks great. i have some initial comments.
>
> --
> usr/src/lib/brand/{sn1|solaris10}/*_brand/*/*_handler.s:
>
> - could you update the following lines with comments:
>     xchgq CPTRSIZE(%rbp), %rax  /* swap JMP table offset and ret addr */
>     shrq  $4, %rax              /* JMP table offset / JMP size = syscall num */
>     movq  %rax, EH_LOCALS_GREG(REG_RAX)(%rbp) /* save syscall num */

Will do.

> --
> usr/src/uts/i86pc/ml/syscall_asm.s:
>
> - don't you need to update this file as well? have you tested 32-bit kernels?

No, this doesn't need to be updated since this code doesn't touch the user's stack. I have done preliminary testing with 32-bit kernels and the callbacks work correctly with the code as is. That's because the 32-bit code is more like the 64-bit code that handles an interrupt stack, where we already have the right data pushed.

> --
> usr/src/uts/i86pc/ml/syscall_asm_amd64.s
>
> - perhaps you could do the following renames:
>     BRAND_GET_RET_REG -> BRAND_URET_FROM_REG
>     BRAND_GET_RET_STACK -> BRAND_URET_FROM_INTR_STACK

Will do.

> - wrt this code:
>     cmpq $NSYSCALL, %rax  /* is 0 <= syscall <= MAX? */
>     jbe  0f               /* yes, syscall is OK */
>     xorq %rax, %rax       /* no, zero syscall number */
>
>   it's duplicated in every brand callback right after CALLBACK_PROLOGUE(). why not make it part of CALLBACK_PROLOGUE?

Will do.

>   also, if the syscall num is > NSYSCALL, why not just jump to 9: and let the normal syscall path detect and return the error?

OK. I was modeling this on the way lx did it but your suggestion seems better.

> - it seems like there should be a macro for this rough block of code (which calculates the jmp table address):
>     GET_P_BRAND_DATA(%esp, 1, %edx);  /* get p_brand_data ptr */
>     movl SPD_HANDLER(%edx), %edx      /* get p_brand_data->spd_handler ptr */
>     shll $4, %eax
>     addl %eax, %edx                   /* we'll return to our handler */

I'll put one together.

> - prior to these changes V_URET_ADDR wasn't always set, so the different brand syscall callbacks would get the userland return address from their syscall specific locations (registers, interrupt stack, etc). but now since V_URET_ADDR is always set, perhaps the callback handlers could be made more consistent by always getting the value from the stack (ie, via V_URET_ADDR)?
>
> - so following up with the last comment (and getting more into potential commonization work) it seems to me like it might be beneficial to move all the syscall mechanism specific handling code out of the actual brand callbacks and into BRAND_CALLBACK. you've already started doing this by having BRAND_CALLBACK be aware of how to get the userland return address. (prior to that it didn't have any dependency upon the different syscall mechanisms, except when deciding which brand callback to invoke.) continuing down that path we could move all the syscall specific handling code into BRAND_CALLBACK. then each brand would only deliver a single callback which would take one parameter, the syscall number. it would return one value, a userland return address. then BRAND_CALLBACK could handle all the different syscall specific return paths. this would also be beneficial in the future since if a new syscall mechanism was introduced, we wouldn't have to update any actual brand callbacks, just BRAND_CALLBACK. thoughts?

For these last two I agree that there are some good opportunities here, and I was torn between doing a bunch more cleanup on this and deferring that work to the fix for:

6900207 code can be shared between solaris10 and ipkg brands

Since bug 6768950 is serious and I'd like to get the fix done sooner rather than later, I'd like to defer some of these other changes to 6900207. I was about to start on that anyway, so once 6768950 is done I'm going to immediately start work on a bunch of ideas I have for making the code shared and simpler. I was also going to roll a fix for:

6887823 brandz on x86 should ignore %gs on 64-bit kernels

into that same set of cleanup. I definitely agree with your comments here but I'm worried about the fix for 6768950 taking too long.
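The arithmetic behind the quoted assembly (shrq $4 to recover the syscall number from a jump table offset, the cmpq/jbe/xorq range check, and shll $4 plus addl to find the handler entry) can be illustrated outside the kernel. The sketch below only mirrors the arithmetic: the function names are made up, the NSYSCALL value of 256 and the base address in the usage note are assumptions for illustration, and none of this is the actual brand code.

```shell
#!/bin/sh
# Illustration only: mirrors the arithmetic of the quoted assembly.
# JMP_SHIFT=4 corresponds to the 16-byte jump table entries implied by
# "shrq $4" and "shll $4". NSYSCALL is an assumed placeholder value.

JMP_SHIFT=4
NSYSCALL=${NSYSCALL:-256}

# Jump table offset -> syscall number (shrq $4, %rax).
offset_to_syscall() {
    echo $(( $1 >> JMP_SHIFT ))
}

# Range check as in the cmpq/jbe/xorq sequence: a number above NSYSCALL
# is replaced by 0.
clamp_syscall() {
    if [ "$1" -le "$NSYSCALL" ]; then
        echo "$1"
    else
        echo 0
    fi
}

# Handler base + (syscall number << 4) -> address of the table entry
# (shll $4, %eax; addl %eax, %edx).
handler_entry() {
    printf '0x%x\n' $(( $1 + ($2 << JMP_SHIFT) ))
}
```

For example, "offset_to_syscall 48" gives 3, and "handler_entry 4096 3" gives 0x1030 for a hypothetical handler base of 0x1000. Ed's suggestion would replace the clamp-to-zero step with a jump to the normal syscall path, which then reports the error itself.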
Re: [zones-discuss] zones code review
On Tue, Dec 15, 2009 at 01:28:01PM -0700, Jerry Jelinek wrote:
> ...
>> - prior to these changes V_URET_ADDR wasn't always set, so the different brand syscall callbacks would get the userland return address from their syscall specific locations (registers, interrupt stack, etc). but now since V_URET_ADDR is always set, perhaps the callback handlers could be made more consistent by always getting the value from the stack (ie, via V_URET_ADDR)?
>>
>> - so following up with the last comment (and getting more into potential commonization work) it seems to me like it might be beneficial to move all the syscall mechanism specific handling code out of the actual brand callbacks and into BRAND_CALLBACK. you've already started doing this by having BRAND_CALLBACK be aware of how to get the userland return address. (prior to that it didn't have any dependency upon the different syscall mechanisms, except when deciding which brand callback to invoke.) continuing down that path we could move all the syscall specific handling code into BRAND_CALLBACK. then each brand would only deliver a single callback which would take one parameter, the syscall number. it would return one value, a userland return address. then BRAND_CALLBACK could handle all the different syscall specific return paths. this would also be beneficial in the future since if a new syscall mechanism was introduced, we wouldn't have to update any actual brand callbacks, just BRAND_CALLBACK. thoughts?
>
> For these last two I agree that there are some good opportunities here and I was torn between doing a bunch more clean up on this and deferring that work to the fix for:
>
> 6900207 code can be shared between solaris10 and ipkg brands
>
> Since bug 6768950 is serious and I'd like to get the fix done sooner rather than later, I'd like to defer some of these other changes to 6900207. I was about to start on that anyway so once 6768950 is done I'm going to immediately start work on a bunch of ideas I have for making the code shared and simpler. I was also going to roll a fix for:
>
> 6887823 brandz on x86 should ignore %gs on 64-bit kernels
>
> into that same set of cleanup. I definitely agree with your comments here but I'm worried about the fix for 6768950 taking too long.

sounds good.
ed
Re: [zones-discuss] zones code review
On 12/15/09 07:39 AM, Jerry Jelinek wrote:
> I have an initial code review for the fix for bug:
>
> 6768950 panic[cpu1]/thread=ff084ce0b3e0: syscall_asm_amd64.s:480 lwp ff0756a8cdc0, pcb_rupdate != 0
>
> There is a webrev at:
>
> http://cr.opensolaris.org/~gjelinek/webrev.6768950/
>
> The code changes in the sn1 and solaris10 brands are basically identical. I know there is a lot of common code there but I didn't want to clutter up this bug fix with the unrelated changes necessary to make the code common. I'll be addressing that with a separate fix. My initial testing of these changes looks good but I still need to run more extensive tests.
>
> Thanks,
> Jerry

Hi Jerry,

I'll add one question to Ed's suggestions:

--
usr/src/lib/brand/sn1/sn1_brand/amd64/sn1_handler.s

44: Shouldn't this function be named sn1_handler_table?

Jordan