Re: [zones-discuss] code review: native brand refactoring
Ed, My responses are in-line. Edward Pilatowicz wrote: On Tue, May 27, 2008 at 08:48:17AM -0600, Jerry Jelinek wrote: I have updated the webrev at: http://cr.opensolaris.org/~gjelinek/webrev/ This includes the changes for the feedback I have received so far. I also added the zlogin.c file to the webrev with two bug fixes. One of these was for a bug I was hitting during testing of these changes and there is a second bug in zlogin that came in which I also fixed. So, at a minimum, it would be good to take a look at that additional file. more comments below. all that's left is sw_support.c, i'll get to that tommorow. ed -- usr/src/cmd/zoneadm/zoneadm.c - isn't it kinda weird to have the brand specific cli options come before the brand name when cloning a zone? I'm not sure that I know which code you are thinking of here, but since these are options, the order doesn't matter. - why are you parsing arguments by hand in stead of using getopt()? this will result in non-standard argument parsing and probably break things that used to work like: zoneadm -z foo clone -mcopy -sexport/zones/[EMAIL PROTECTED] ... (note the lack of spaces between the flag and the option) or: zoneadm -z foo attach -Fu ... I would prefer to use getopt but I don't think there is any way to make getopt work with a mix of known and unknown options. I initially coded this to handle the returned ? with opterr=0 when an option that is not in the argument string is encountered, but getopt is then in an error state and won't process any additional options. So, I had to fall back to using the manual parsing I have now which does handle unknown options, although doesn't handle all of the permutations that getopt could, as you noted. I am not really happy with what is there now, so if you have any suggestions, let me know. - in clone_func(), done: always calls zonecfg_release_lock_file(), but there are paths that can invoke goto done; without first calling zonecfg_grab_lock_file(). Fixed. - in clone_func(), it seems like the is_system_labeled() check should be moved out of here and into the native brand postclone callback. Moved. - why is there no locking around the pre-detach hook? That hasn't changed with this code but if I recall correctly, the idea was that predetach is not making any changes to the zone so we don't lock until actual changes to the state are being made. - in dryrun_get_brand(), use STDIN_FILENO instead of 0. Fixed. - in dryrun_get_brand(), you might want to always read the manifest into a temporary file. with the current code specifying a manifest path of /dev/fd/0 (which is a standard substitute for -) won't work. Done. - in the case of a forced attach why don't we run the brand callback? That is really the definition of forced. We don't do anything when forced except change the state of the zone to installed. - might it simplify zoneadm to just include locking for help request subcommands? It probably does but I didn't want to lock the zone when we don't have to and this is consistent with the original behavior where the built-in help never locked the zone. - in attach_func(), if the force flag is specified it seems like we'll always call zonecfg_release_lock_file(). (Even if we didn't aquire the lock beforehand.) We should be doing this but I think I see the problem you were wondering about. The test for the combination of dry-run and force flags was wrong. I fixed that. - uninstall_func() has lots of error returns that don't call zonecfg_release_lock_file(). It looks like most of those are pre-existing bugs, but I went ahead and fixed them. Thanks for spotting that. Thanks again for all your comments on this code-review, Jerry ___ zones-discuss mailing list zones-discuss@opensolaris.org
Re: [zones-discuss] code review: native brand refactoring
Steve Lawrence wrote: Hey Jerry, Does this address this comment in 6621020: This appears to point out at least one bug in zlogin, namely that it keeps stdout_pipe[1] and stderr_pipe[1] from noninteractive_login() open when returning to the parent. Basically, I think the filer expected to see something like: pipe(stdout); pipe(stderr); if (fork() == 0) { /* in child, close pipe sides read by parent) */ close(stdout[0]); close(stderr[0]); .. write to std*[1]... ... exit(..); } /* in parent, close pipe sides written by child */ close(stdout([1]); close(stderr([1]); ... read from std*[0] ... ... I think the in child part is handled by the closefrom on line 1559, but the parent does not close the sides of the pipes that the child writes to. Steve, I did not make the change discussed above. Let me go back and take another look at that. Thanks, Jerry ___ zones-discuss mailing list zones-discuss@opensolaris.org
Re: [zones-discuss] code review: native brand refactoring
Edward Pilatowicz wrote: - why are you parsing arguments by hand in stead of using getopt()? this will result in non-standard argument parsing and probably break things that used to work like: zoneadm -z foo clone -mcopy -sexport/zones/[EMAIL PROTECTED] ... (note the lack of spaces between the flag and the option) or: zoneadm -z foo attach -Fu ... Ed, Your comment prompted me to go back and take another look at this and now I am not sure what I thought I was seeing before, but it seems like I can make this work, so I will go back to using getopt. Thanks, Jerry ___ zones-discuss mailing list zones-discuss@opensolaris.org
Re: [zones-discuss] code review: native brand refactoring
See also: 6475075 zlogin i/o loop needs work 6263984 zlogin's i/o loop can sometimes drop data on child death On Wed, May 28, 2008 at 11:53:18AM -0500, Nicolas Williams wrote: On Tue, May 27, 2008 at 08:48:17AM -0600, Jerry Jelinek wrote: This includes the changes for the feedback I have received so far. I also added the zlogin.c file to the webrev with two bug fixes. One of these was for a bug I was hitting during testing of these changes and there is a second bug in zlogin that came in which I also fixed. So, at a minimum, it would be good to take a look at that additional file. Why timeout the poll(2) call when you can have a signal handler that writes a byte into a pipe that you could poll() for? That's the standard way (well, one of them) to handle signals + polling. (E.g., we use it in Sun_SSH.) If you do it this way you get more responsiveness and you don't waste time waking up the zlogin process when there's nothing to do (this last is important w.r.t. power consumption). Nico -- ___ zones-discuss mailing list zones-discuss@opensolaris.org
Re: [zones-discuss] code review: native brand refactoring
On Wed, May 28, 2008 at 07:02:32AM -0600, Jerry Jelinek wrote: - why is there no locking around the pre-detach hook? That hasn't changed with this code but if I recall correctly, the idea was that predetach is not making any changes to the zone so we don't lock until actual changes to the state are being made. hm. that differs from the description in the dtd. the dtd says the callback should be carefull making changes since it could be invoked multiple times. it doesn't say anything about being read-only or concurrent invocations. i'd probably just be best to include locking around the pre-detach call. ed ___ zones-discuss mailing list zones-discuss@opensolaris.org
Re: [zones-discuss] code review: native brand refactoring
Nicolas Williams wrote: On Tue, May 27, 2008 at 08:48:17AM -0600, Jerry Jelinek wrote: This includes the changes for the feedback I have received so far. I also added the zlogin.c file to the webrev with two bug fixes. One of these was for a bug I was hitting during testing of these changes and there is a second bug in zlogin that came in which I also fixed. So, at a minimum, it would be good to take a look at that additional file. Why timeout the poll(2) call when you can have a signal handler that writes a byte into a pipe that you could poll() for? That's the standard way (well, one of them) to handle signals + polling. (E.g., we use it in Sun_SSH.) If you do it this way you get more responsiveness and you don't waste time waking up the zlogin process when there's nothing to do (this last is important w.r.t. power consumption). Noco, Thanks for looking at this. I'll take a look at implementing your suggestion. Thanks again, Jerry ___ zones-discuss mailing list zones-discuss@opensolaris.org
Re: [zones-discuss] code review: native brand refactoring
Edward Pilatowicz wrote: On Wed, May 28, 2008 at 07:02:32AM -0600, Jerry Jelinek wrote: - why is there no locking around the pre-detach hook? That hasn't changed with this code but if I recall correctly, the idea was that predetach is not making any changes to the zone so we don't lock until actual changes to the state are being made. hm. that differs from the description in the dtd. the dtd says the callback should be carefull making changes since it could be invoked multiple times. it doesn't say anything about being read-only or concurrent invocations. i'd probably just be best to include locking around the pre-detach call. Ed, OK, I'll move the locking up. Thanks, Jerry ___ zones-discuss mailing list zones-discuss@opensolaris.org
Re: [zones-discuss] code review: native brand refactoring
On Wed, May 28, 2008 at 07:02:32AM -0600, Jerry Jelinek wrote: - in the case of a forced attach why don't we run the brand callback? That is really the definition of forced. We don't do anything when forced except change the state of the zone to installed. hm. that differs from my concept of a forced operation, which i thought always meant, do all the same things you normally do, but ignore any errors encountered along the way. if we do a forced boot, do we invoke the brand boot callback? if you leave it as is, could you document this behavior in the dtd comment? ed ___ zones-discuss mailing list zones-discuss@opensolaris.org
Re: [zones-discuss] code review: native brand refactoring
Edward Pilatowicz wrote: On Wed, May 28, 2008 at 07:02:32AM -0600, Jerry Jelinek wrote: - in the case of a forced attach why don't we run the brand callback? That is really the definition of forced. We don't do anything when forced except change the state of the zone to installed. hm. that differs from my concept of a forced operation, which i thought always meant, do all the same things you normally do, but ignore any errors encountered along the way. if we do a forced boot, do we invoke the brand boot callback? if you leave it as is, could you document this behavior in the dtd comment? Ed, I'll update the dtd since this behavior hasn't changed with this update to the code and I would prefer to keep it as is. Thanks, Jerry ___ zones-discuss mailing list zones-discuss@opensolaris.org
Re: [zones-discuss] code review: native brand refactoring
On Tue, May 27, 2008 at 08:48:17AM -0600, Jerry Jelinek wrote: I have updated the webrev at: http://cr.opensolaris.org/~gjelinek/webrev/ This includes the changes for the feedback I have received so far. I also added the zlogin.c file to the webrev with two bug fixes. One of these was for a bug I was hitting during testing of these changes and there is a second bug in zlogin that came in which I also fixed. So, at a minimum, it would be good to take a look at that additional file. here's my last batch of comments. (for sw_support.c i only looked at the new functions you introduced.) ed -- general - iirc, there is still tons of zulu code in zoneadmd. this code creates the special live upgrade environment, and LU is tied to svr4 packaging, right? so it seems like the zulu code is brand specific and should also be factored out into mount callbacks. (and iirc, the mount operations isn't currenly supported by any branded zones.) thoughts? -- usr/src/lib/brand/native/zone/sw_support.c - in main(), an invalid cmd value will result in returning an uninitialzed err value. makes me wonder, is this code linting properly? - nit: in install_func(), status should be initialized to Z_ERR not 1. - in install_func(), if the snprintf() overflows, you'll pass a a constant error value (Z_ERR) to subproc_status() (which is expecting a waitpid() encoded status value). - hm. attach_func() looks for the -F force flag, but as discussed previously, in the case of forced attach, the attach brand callback is currently not executed. ___ zones-discuss mailing list zones-discuss@opensolaris.org
Re: [zones-discuss] code review: native brand refactoring
Ed, Responses below. Edward Pilatowicz wrote: On Tue, May 27, 2008 at 08:48:17AM -0600, Jerry Jelinek wrote: I have updated the webrev at: http://cr.opensolaris.org/~gjelinek/webrev/ This includes the changes for the feedback I have received so far. I also added the zlogin.c file to the webrev with two bug fixes. One of these was for a bug I was hitting during testing of these changes and there is a second bug in zlogin that came in which I also fixed. So, at a minimum, it would be good to take a look at that additional file. here's my last batch of comments. (for sw_support.c i only looked at the new functions you introduced.) ed -- general - iirc, there is still tons of zulu code in zoneadmd. this code creates the special live upgrade environment, and LU is tied to svr4 packaging, right? so it seems like the zulu code is brand specific and should also be factored out into mount callbacks. (and iirc, the mount operations isn't currenly supported by any branded zones.) thoughts? I don't think there is really any zulu code in zoneadmd but I'll take a pass through to double check. If I see anything like that, I'll fix it up and update the webrev. The mount code is not really svr4 pkg specific either, but is specific to the native brand. I have extensions to mount which would allow us to mount any type of brand with very little change. I added those with the update on attach work. Being able to mount any zone is something I've thought would be handy for a long time now, so I'll take a look at that as I am making the pass through zoneadmd. -- usr/src/lib/brand/native/zone/sw_support.c - in main(), an invalid cmd value will result in returning an uninitialzed err value. makes me wonder, is this code linting properly? I am not getting any lint warning from that, which is surprising. I fixed this. - nit: in install_func(), status should be initialized to Z_ERR not 1. Fixed. - in install_func(), if the snprintf() overflows, you'll pass a a constant error value (Z_ERR) to subproc_status() (which is expecting a waitpid() encoded status value). That was old code that just got moved, but it is broken and I fixed it. - hm. attach_func() looks for the -F force flag, but as discussed previously, in the case of forced attach, the attach brand callback is currently not executed. I duplicated that when I split things up. I have removed it. Thanks again, Jerry ___ zones-discuss mailing list zones-discuss@opensolaris.org
Re: [zones-discuss] code review: native brand refactoring
On Wed, May 28, 2008 at 12:24:07PM -0600, Jerry Jelinek wrote: Ed, Responses below. Edward Pilatowicz wrote: On Tue, May 27, 2008 at 08:48:17AM -0600, Jerry Jelinek wrote: I have updated the webrev at: http://cr.opensolaris.org/~gjelinek/webrev/ This includes the changes for the feedback I have received so far. I also added the zlogin.c file to the webrev with two bug fixes. One of these was for a bug I was hitting during testing of these changes and there is a second bug in zlogin that came in which I also fixed. So, at a minimum, it would be good to take a look at that additional file. here's my last batch of comments. (for sw_support.c i only looked at the new functions you introduced.) ed -- general - iirc, there is still tons of zulu code in zoneadmd. this code creates the special live upgrade environment, and LU is tied to svr4 packaging, right? so it seems like the zulu code is brand specific and should also be factored out into mount callbacks. (and iirc, the mount operations isn't currenly supported by any branded zones.) thoughts? I don't think there is really any zulu code in zoneadmd but I'll take a pass through to double check. If I see anything like that, I'll fix it up and update the webrev. The mount code is not really svr4 pkg specific either, but is specific to the native brand. I have extensions to mount which would allow us to mount any type of brand with very little change. I added those with the update on attach work. Being able to mount any zone is something I've thought would be handy for a long time now, so I'll take a look at that as I am making the pass through zoneadmd. hm. from what i remember mount mode actually puts the zone root at zoneroot/a and lofs mounts stuff from the global zone at zoneroot, all so that the svr4 packaging code can enter the zone to do packaging operations. i thought this dance was necessary because we wanted packaging scripts to execute inside a zone (since for security reasons, we wouldn t want them to be able to muck with stuff outside a zone). but IPS doesn't have packaging scripts. so will this complicated setup really be necessary for IPS? ie, could we do away with the /a mount in mount mode for IPS? i'm asking about this because the /a mount code in zoneadmd has lots of hard coded paths and makes zoneadmd pretty complicated. (just look at mount_filesystems() and build_mounted_post_var().) i've had to modify this code a few times, i've broken it a couple times, and been confused by it all the time... ed ___ zones-discuss mailing list zones-discuss@opensolaris.org
Re: [zones-discuss] code review: native brand refactoring
On Wed, May 28, 2008 at 01:00:08PM -0600, Jerry Jelinek wrote: Edward Pilatowicz wrote: hm. from what i remember mount mode actually puts the zone root at zoneroot/a and lofs mounts stuff from the global zone at zoneroot, all so that the svr4 packaging code can enter the zone to do packaging operations. i thought this dance was necessary because we wanted packaging scripts to execute inside a zone (since for security reasons, we wouldn t want them to be able to muck with stuff outside a zone). but IPS doesn't have packaging scripts. so will this complicated setup really be necessary for IPS? ie, could we do away with the /a mount in mount mode for IPS? i'm asking about this because the /a mount code in zoneadmd has lots of hard coded paths and makes zoneadmd pretty complicated. (just look at mount_filesystems() and build_mounted_post_var().) i've had to modify this code a few times, i've broken it a couple times, and been confused by it all the time... Ed, Yes, this description is correct but not because of the pkging code. We want to be able to enter the zone while running the sw that is installed in the global zone, irrespective of what is inside the non-global zone. All of the global zone sw is mounted read-only in a mounted zone so we can login into the zone and have a safe environment that matches the global zone. The non-global zone is mounted under /a so we can safely operate on it within the zone using the global zones sw. None of this has any relationship to the pkging code in either zone. All the mount is doing is setting up a safe zone that is compatible with the global zone. With this definition, you can use mount to safely do any kind of admin work, not just running pkging code. The problem with the current mount is it lofs mounts the zone's etc and var back into the global zone mount area. This obviously won't work for non-native zones or even older zones that we are migrating to the new host, which is why I had to fix this for update on attach. ah. ok. thanks for clearing that up for me. i assume all that stuff was for zulu/lu and we would be able to rip it out someday... ed ___ zones-discuss mailing list zones-discuss@opensolaris.org