[zones-discuss] Move zone from Sun4v to Sun4u

2009-12-15 Thread Magnus Sjolander
Hi
I have a customer (IHAC) who is going to move Solaris 10 zones from a
T5120 to an M5000 server. Both systems run the same Solaris version and
patches. Can we move the zones as is, or do we need to use some special
commands?


/Magnus




___
zones-discuss mailing list
zones-discuss@opensolaris.org


Re: [zones-discuss] Move zone from Sun4v to Sun4u

2009-12-15 Thread Detlef Drewanz

Magnus,
you can migrate a zone between sun4v and sun4u in the following way:
1. Use zoneadm detach to unplug the zone from the source system.
2. Move the zoneroot to the other system.
3. Run zoneadm attach -u to attach the zone. During that attach, the
platform-dependent packages/patches are also installed from the
global zone into the zone.
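
The three steps above could look roughly like the following dry-run
sketch, which only prints the commands rather than running them. The
zone name "myzone" and the zonepath "/zones/myzone" are made up for
illustration, and the function name is hypothetical. The halt before
the detach and the "zonecfg create -a" step on the target (to create
the configuration from the detached zone) are assumptions beyond the
steps listed here. How the zoneroot is copied is left open.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for a detach/move/attach migration
# instead of executing them. Zone name and zonepath are hypothetical.
print_zone_migration_plan() {
    zone=$1        # e.g. myzone (hypothetical)
    zonepath=$2    # e.g. /zones/myzone (hypothetical)

    # Step 1 (source host, sun4v): halt the zone, then detach it.
    echo "source# zoneadm -z $zone halt"
    echo "source# zoneadm -z $zone detach"

    # Step 2: copy the zonepath to the target host with any tool that
    # preserves ownership and permissions (the choice is left open here).
    echo "# copy $zonepath to the same path on the target host"

    # Step 3 (target host, sun4u): create the zone configuration from the
    # detached zone (assumed extra step), then attach with update.
    echo "target# zonecfg -z $zone create -a $zonepath"
    echo "target# zoneadm -z $zone attach -u"
}

print_zone_migration_plan myzone /zones/myzone
```

Printing the plan first makes it easy to review the sequence before
running anything on either host.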

Detlef

Magnus Sjolander wrote on 15.12.09 09:18:

Hi, I have a customer (IHAC) who is going to move Solaris 10 zones from
a T5120 to an M5000 server. Both systems run the same Solaris version
and patches. Can we move the zones as is, or do we need to use some
special commands?


/Magnus








Re: [zones-discuss] zoneadm hangs after repeated boot/halt use

2009-12-15 Thread Frank Batschulat (Home)
Glenn, I've been running this test case for nearly a day now on build 129
and couldn't reproduce it at all. There is a good chance this was indeed
fixed by 6894901 in build 128.

I'll also try to reproduce this now on build 126.

cheers
frankB

On Fri, 11 Dec 2009 21:48:52 +0100, Glenn Brunette glenn.brune...@sun.com 
wrote:


 As part of an Immutable Service Container[1] demonstration that I am
 creating for an event in January, I need to start/stop a zone quite a
 few times (as part of a Self-Cleansing[2] demo).  During the course of
 my testing, I have been able to repeatedly get zoneadm to hang.

 Since I am working with a highly customized configuration, I started
 over with a default zone on OpenSolaris (b127) and was able to repeat
 this issue.  To reproduce this problem, use the following script after
 creating a zone using the normal/default steps:

 isc...@osol-isc:~$ while : ; do
   echo `date`: ZONE BOOT
   pfexec zoneadm -z test boot
   sleep 30
   pfexec zoneadm -z test halt
   echo `date`: ZONE HALT
   sleep 10
   done
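
 To catch the hang automatically instead of watching the loop, each
 zoneadm call could be wrapped in a watchdog. The following is only a
 sketch under stated assumptions: the run_with_timeout helper, the
 boot_halt_loop function, the 120-second limit, and the pass counter
 are illustrative additions and were not part of the original test.

```shell
#!/bin/sh
# run_with_timeout SECS CMD [ARGS...]
# Hypothetical helper: runs CMD in the background and kills it if it is
# still running after SECS seconds. Returns CMD's exit status, which is
# non-zero if CMD was killed by the watchdog.
run_with_timeout() {
    secs=$1
    shift
    "$@" &
    cmd_pid=$!
    # Watchdog: its output is discarded so it never holds the caller's stdout.
    ( sleep "$secs"; kill "$cmd_pid" ) >/dev/null 2>&1 &
    watchdog_pid=$!
    rc=0
    wait "$cmd_pid" || rc=$?
    # CMD is done (or was killed); the watchdog is no longer needed.
    kill "$watchdog_pid" 2>/dev/null || true
    wait "$watchdog_pid" 2>/dev/null || true
    return "$rc"
}

# boot_halt_loop ZONE
# Repeats boot/halt and reports the pass on which a zoneadm call first
# fails or exceeds the (assumed) 120-second limit.
boot_halt_loop() {
    zone=$1
    pass=0
    while : ; do
        pass=$((pass + 1))
        echo "`date`: pass $pass ZONE BOOT"
        run_with_timeout 120 pfexec zoneadm -z "$zone" boot || break
        sleep 30
        run_with_timeout 120 pfexec zoneadm -z "$zone" halt || break
        echo "`date`: pass $pass ZONE HALT"
        sleep 10
    done
    echo "zoneadm failed or hung on pass $pass" >&2
}

# Usage, on a host where the zone is configured: boot_halt_loop test
```

 The pass number printed at the end gives a data point comparable to
 the "pass #90" observation without anyone having to watch the loop.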

 This script works just fine for a while, but eventually zoneadm hangs
 (was at pass #90 in my last test).  When this happens, zoneadm is shown
 to be consuming quite a bit of CPU:

 PID USERNAME  SIZE   RSS STATE  PRI NICE  TIME  CPU PROCESS/NLWP

   16598 root   11M 3140K run  10   0:54:49  74% zoneadm/1


 A stack trace of zoneadm shows:

 isc...@osol-isc:~$ pfexec pstack `pgrep zoneadm`
 16082:zoneadmd -z test
 -  lwp# 1  
 -  lwp# 2  
   feef41c6 door (0, 0, 0, 0, 0, 8)
   feed99f7 door_unref_func (3ed2, fef81000, fe33efe8, f39e) + 67
   f3f3 _thrp_setup (fe5b0a00) + 9b
   f680 _lwp_start (fe5b0a00, 0, 0, 0, 0, 0)
 -  lwp# 3  
   feef420f __door_return () + 2f
 -  lwp# 4  
   feef420f door (0, 0, 0, fe140e00, f5f00, a)
   feed9f57 door_create_func (0, fef81000, fe140fe8, f39e) + 2f
   f3f3 _thrp_setup (fe5b1a00) + 9b
   f680 _lwp_start (fe5b1a00, 0, 0, 0, 0, 0)
 16598:zoneadm -z test boot
   feef3fc8 door (6, 80476d0, 0, 0, 0, 3)
   feede653 door_call (6, 80476d0, 400, fe3d43f7) + 7b
   fe3d44f0 zonecfg_call_zoneadmd (8047e33, 8047730, 8078448, 1) + 124
   0805792d boot_func (0, 8047d74, 100, 805ff0b) + 1cd
   08060125 main (4, 8047d64, 8047d78, 805570f) + 2b9
   0805576d _start   (4, 8047e28, 8047e30, 8047e33, 8047e38, 0) + 7d


 A stack trace of zoneadmd shows:

 isc...@osol-isc:~$ pfexec pstack `pgrep zoneadmd`
 16082:zoneadmd -z test
 -  lwp# 1  
 -  lwp# 2  
   feef41c6 door (0, 0, 0, 0, 0, 8)
   feed99f7 door_unref_func (3ed2, fef81000, fe33efe8, f39e) + 67
   f3f3 _thrp_setup (fe5b0a00) + 9b
   f680 _lwp_start (fe5b0a00, 0, 0, 0, 0, 0)
 -  lwp# 3  
   feef4147 __door_ucred (80a37c8, fef81000, fe23e838, feed9cfe) + 27
   feed9d0d door_ucred (fe23f870, 1000, 0, 0) + 32
   08058a88 server   (0, fe23f8f0, 510, 0, 0, 8058a04) + 84
   feef4240 __door_return () + 60
 -  lwp# 4  
   feef420f door (0, 0, 0, fe140e00, f5f00, a)
   feed9f57 door_create_func (0, fef81000, fe140fe8, f39e) + 2f
   f3f3 _thrp_setup (fe5b1a00) + 9b
   f680 _lwp_start (fe5b1a00, 0, 0, 0, 0, 0)


 A truss of zoneadm (-f -vall -wall -tall) shows this looping:

 16598:  door_call(6, 0x080476D0)= 0
 16598:  data_ptr=8047730 data_size=0
 16598:  desc_ptr=0x0 desc_num=0
 16598:  rbuf=0x807F2D8 rsize=4096
 16598:  close(6)= 0
 16598:  mkdir(/var/run/zones, 0700)   Err#17 EEXIST
 16598:  chmod(/var/run/zones, 0700)   = 0
 16598:  open(/var/run/zones/test.zoneadm.lock, O_RDWR|O_CREAT, 0600) = 6
 16598:  fcntl(6, F_SETLKW, 0x08046DC0)  = 0
 16598:  typ=F_WRLCK  whence=SEEK_SET start=0 len=0
 sys=4277003009 pid=6
 16598:  open(/var/run/zones/test.zoneadmd_door, O_RDONLY) = 7
 16598:  door_info(7, 0x08047230)= 0
 16598:  target=16082 proc=0x8058A04 data=0x0
 16598:  attributes=DOOR_UNREF|DOOR_REFUSE_DESC|DOOR_NO_CANCEL
 16598:  uniquifier=26426
 16598:  close(7)= 0
 16598:  close(6)= 0
 16598:  open(/var/run/zones/test.zoneadmd_door, O_RDONLY) = 6
 16082/3:door_return(0x, 0, 0x, 0xFE23FE00,
 1007360) = 0
 16082/3:door_ucred(0x080A37C8)  = 0
 16082/3:euid=0 egid=0
 16082/3:ruid=0 rgid=0
 16082/3:pid=16598 zoneid=0
 16082/3:E: all
 

Re: [zones-discuss] zones code review

2009-12-15 Thread Jerry Jelinek

Ed,

Thanks for reviewing this.  My responses to your comments are
in-line.

Edward Pilatowicz wrote:

On Tue, Dec 15, 2009 at 08:39:12AM -0700, Jerry Jelinek wrote:

I have an initial code review for the fix for bug:

6768950 panic[cpu1]/thread=ff084ce0b3e0: syscall_asm_amd64.s:480
lwp ff0756a8cdc0, pcb_rupdate != 0

There is a webrev at:

http://cr.opensolaris.org/~gjelinek/webrev.6768950/

The code changes in the sn1 and solaris10 brands are basically
identical.  I know there is a lot of common code there but I
didn't want to clutter up this bug fix with the unrelated changes
necessary to make the code common.  I'll be addressing that with
a separate fix.

My initial testing of these changes looks good but I still need
to run more extensive tests.



this looks great.  i have some initial comments.

--
usr/src/lib/brand/{sn1|solaris10}/*_brand/*/*_handler.s:

- could you update the following lines with comments:
    xchgq   CPTRSIZE(%rbp), %rax    /* swap JMP table offset and ret addr */
    shrq    $4, %rax                /* JMP table offset / JMP size = syscall num */
    movq    %rax, EH_LOCALS_GREG(REG_RAX)(%rbp) /* save syscall num */


Will do.


--
usr/src/uts/i86pc/ml/syscall_asm.s:

- don't you need to update this file as well?  have you tested 32-bit
  kernels?


No, this doesn't need to be updated since this code doesn't touch the
user's stack.  I have done preliminary testing with 32-bit kernels and
the callbacks work correctly with the code as is.  That's because the
32-bit code is more like the 64-bit code that handles an interrupt stack
where we already have the right data pushed.


--
usr/src/uts/i86pc/ml/syscall_asm_amd64.s

- perhaps you could do the following renames:
    BRAND_GET_RET_REG   -> BRAND_URET_FROM_REG
    BRAND_GET_RET_STACK -> BRAND_URET_FROM_INTR_STACK


Will do.


- wrt this code:
    cmpq    $NSYSCALL, %rax     /* is 0 <= syscall <= MAX?  */
    jbe     0f                  /* yes, syscall is OK */
    xorq    %rax, %rax          /* no, zero syscall number */

  it's duplicated in every brand callback right after
  CALLBACK_PROLOGUE().  why not make it part of CALLBACK_PROLOGUE?


Will do.


  also, if the syscall num is > NSYSCALL, why not just jump to 9: and
  let the normal syscall path detect and return the error?


OK.  I was modeling this on the way lx did it but your suggestion seems
better.


- it seems like there should be a macro for this rough block of code
  (which calculates the jmp table address):

    GET_P_BRAND_DATA(%esp, 1, %edx);    /* get p_brand_data ptr */
    movl    SPD_HANDLER(%edx), %edx     /* get p_brand_data->spd_handler ptr */
    shll    $4, %eax
    addl    %eax, %edx                  /* we'll return to our handler */
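
The shift by 4 in this block and in the handler comments earlier
implies that each jump-table entry is 16 bytes: the kernel side computes
handler base + (syscall number << 4), and the userland handler recovers
the number with offset >> 4. A minimal sketch of that round trip in
shell arithmetic, for illustration only. The function names and the
base address are made up and do not come from the webrev.

```shell
#!/bin/sh
# Illustration only: function names and the base address are hypothetical.
JMP_SHIFT=4    # log2 of the 16-byte jump-table entry size implied by shll/shrq $4

# jmp_entry_addr BASE SYSCALL_NUM
# Address of the jump-table entry for a syscall number (kernel side).
jmp_entry_addr() {
    echo $(( $1 + ($2 << JMP_SHIFT) ))
}

# syscall_from_offset OFFSET
# Syscall number recovered from a jump-table offset (handler side).
syscall_from_offset() {
    echo $(( $1 >> JMP_SHIFT ))
}

base=4096                          # made-up handler base address
addr=$(jmp_entry_addr "$base" 3)   # entry for syscall number 3
offset=$(( addr - base ))
echo "offset=$offset syscall=$(syscall_from_offset "$offset")"
```

Keeping the shift in one named constant mirrors the point of the
requested macro: the entry size is defined once rather than repeated in
every callback.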


I'll put one together.


- prior to these changes V_URET_ADDR wasn't always set, so the different
  brand syscall callbacks would get the userland return address from
  their syscall specific locations (registers, interrupt stack, etc).  but
  now since V_URET_ADDR is always set, perhaps the callback handlers could
  be made more consistent by always getting the value from the stack (ie,
  via V_URET_ADDR)?

- so following up with the last comment (and getting more into potential
  commonization work) it seems to me like it might be beneficial to move
  all the syscall mechanism specific handling code out of the actual brand
  callbacks and into BRAND_CALLBACK.  you've already started doing this by
  having BRAND_CALLBACK be aware of how to get the userland return
  address.  (prior to that it didn't have any dependency upon the
  different syscall mechanisms, except when deciding which brand callback
  to invoke.)  continuing down that path we could move all the syscall
  specific handling code into BRAND_CALLBACK.  then each brand would only
  deliver a single callback which would take one parameter, the syscall
  number.  it would return one value, a userland return address.  then
  BRAND_CALLBACK could handle all the different syscall specific return
  paths.  this would also be beneficial in the future since if a new
  syscall mechanism was introduced, we wouldn't have to update any actual
  brand callbacks, just BRAND_CALLBACK.  thoughts?


For these last two I agree that there are some good opportunities here and
I was torn between doing a bunch more clean up on this and deferring that
work to the fix for:

6900207 code can be shared between solaris10 and ipkg brands

Since bug 6768950 is serious and I'd like to get the fix done sooner
rather than later, I'd like to defer some of these other changes to 6900207.
I was about to start on that anyway so once 6768950 is done I'm going to
immediately start work on a bunch of ideas I have for making the code shared
and simpler.  I was also going to roll a fix for:

6887823 brandz on x86 should ignore %gs on 64-bit kernels

into that same set of cleanup.  I definitely agree with your comments here
but I'm worried about the fix for 6768950 taking too long.

Re: [zones-discuss] zones code review

2009-12-15 Thread Edward Pilatowicz
On Tue, Dec 15, 2009 at 01:28:01PM -0700, Jerry Jelinek wrote:
 Ed,

 Thanks for reviewing this.  My responses to your comments are
 in-line.

 Edward Pilatowicz wrote:
 On Tue, Dec 15, 2009 at 08:39:12AM -0700, Jerry Jelinek wrote:
 I have an initial code review for the fix for bug:
 
 6768950 panic[cpu1]/thread=ff084ce0b3e0: syscall_asm_amd64.s:480
 lwp ff0756a8cdc0, pcb_rupdate != 0
 
 There is a webrev at:
 
 http://cr.opensolaris.org/~gjelinek/webrev.6768950/
 
 The code changes in the sn1 and solaris10 brands are basically
 identical.  I know there is a lot of common code there but I
 didn't want to clutter up this bug fix with the unrelated changes
 necessary to make the code common.  I'll be addressing that with
 a separate fix.
 
 My initial testing of these changes looks good but I still need
 to run more extensive tests.
 
...
 - prior to these changes V_URET_ADDR wasn't always set, so the different
   brand syscall callbacks would get the userland return address from
   their syscall specific locations (registers, interrupt stack, etc).  but
   now since V_URET_ADDR is always set, perhaps the callback handlers could
   be made more consistent by always getting the value from the stack (ie,
   via V_URET_ADDR)?
 
 - so following up with the last comment (and getting more into potential
   commonization work) it seems to me like it might be beneficial to move
   all the syscall mechanism specific handling code out of the actual brand
   callbacks and into BRAND_CALLBACK.  you've already started doing this by
   having BRAND_CALLBACK be aware of how to get the userland return
   address.  (prior to that it didn't have any dependency upon the
   different syscall mechanisms, except when deciding which brand callback
   to invoke.)  continuing down that path we could move all the syscall
   specific handling code into BRAND_CALLBACK.  then each brand would only
   deliver a single callback which would take one parameter, the syscall
   number.  it would return one value, a userland return address.  then
   BRAND_CALLBACK could handle all the different syscall specific return
   paths.  this would also be beneficial in the future since if a new
   syscall mechanism was introduced, we wouldn't have to update any actual
   brand callbacks, just BRAND_CALLBACK.  thoughts?

 For these last two I agree that there are some good opportunities here and
 I was torn between doing a bunch more clean up on this and deferring that
 work to the fix for:

 6900207 code can be shared between solaris10 and ipkg brands

 Since bug 6768950 is serious and I'd like to get the fix done sooner
 rather than later, I'd like to defer some of these other changes to 6900207.
 I was about to start on that anyway so once 6768950 is done I'm going to
 immediately start work on a bunch of ideas I have for making the code shared
 and simpler.  I was also going to roll a fix for:

 6887823 brandz on x86 should ignore %gs on 64-bit kernels

 into that same set of cleanup.  I definitely agree with your comments here
 but I'm worried about the fix for 6768950 taking too long.


sounds good.
ed


Re: [zones-discuss] zones code review

2009-12-15 Thread Jordan Vaughan

On 12/15/09 07:39 AM, Jerry Jelinek wrote:

I have an initial code review for the fix for bug:

6768950 panic[cpu1]/thread=ff084ce0b3e0: syscall_asm_amd64.s:480
lwp ff0756a8cdc0, pcb_rupdate != 0

There is a webrev at:

http://cr.opensolaris.org/~gjelinek/webrev.6768950/

The code changes in the sn1 and solaris10 brands are basically
identical.  I know there is a lot of common code there but I
didn't want to clutter up this bug fix with the unrelated changes
necessary to make the code common.  I'll be addressing that with
a separate fix.

My initial testing of these changes looks good but I still need
to run more extensive tests.

Thanks,
Jerry



Hi Jerry,

I'll add one question to Ed's suggestions:


--
usr/src/lib/brand/sn1/sn1_brand/amd64/sn1_handler.s

44: Shouldn't this function be named sn1_handler_table?


Jordan