Hi David, Thanks for the valuable comments. -tony
David Bustos wrote: > Quoth Tony Nguyen on Sun, Oct 19, 2008 at 02:49:40AM -0700: >> This is for 6761070 PSARC 2008/580 Solaris host-based firewall >> <http://monaco.sfbay/detail.jsf?cr=6761070>* >> >> *The webrev and the new event message are respectively at: >> >> http://cr.opensolaris.org/~tonyn/firewall/ >> >> http://cr.opensolaris.org/~tonyn/firewall/8000-R4 > > I've only reviewed the mentioned files so far. I'll continue after > you've addressed these comments. > > One general question I have is what the long-term plan is for observing > service changes. Here you're using _scf_notify_wait(), but the > restarter protocol is far better suited for this sort of information. > I presume you're not using that because you don't want to be the > responsible restarter for these services. Which makes me wonder why > you're not modifying the restarters themselves. Otherwise, it seems > that you're working around a hole in the SMF architecture for agents > which want to tie behaviors of the interfaces a service implementation > uses to the state of the service implementation itself. Which I vaugely > recall coming up before, in the form of an agent wanting to observe > state changes in services, but I don't remember the details. > That's a very good point. A generic SMf mechanism for observing and monitoring service changes would be of great help here but we decided that's outside the scope of this project. I also don't recall details of those discussion. However, I think a good approach would be allowing a service to specify events to observe/monitor and corresponding action (methods) to be invoked for those events. I agree the restarter protocol, restarters or librestart, would probably be a more appropriate places for these changes. However, these changes imply: - third-party restarters, in addition to services need modification to use firewall framework - potential synchronization issues between services under different restarters trying to manipulate IPfilter (simultaneous enable/disable of IPfilter). - and in the case of new dedicate restarter, services managed under another restarter wouldn't be able to use the firewall framework. I do, however, agree that we should revisit the long-term plan for observing service changes. > > > cmd/cmd-inet/usr.lib/inetd/inetd.c > 291: Where is this handle destroyed? The caller is responsible for destroying this handle, except when function returns non-zero in which case it should destroy the handle. I added commented and the handle destroy call. > > 341: If you're going to document one return value, please document all > of them. done > > 354: Please add a comment explaining why it's ok to return the same > value when fmri_to_instance() fails and instance_validate_aux_fmri() > fails. done > > 405 > - Doesn't lint complain since you're not casting this return value > to void? > > - Please add a comment explaining why it's ok not to notify anyone > when this fails. > I missed this and two other lint errors. I addressed the other two lint warnings and will use the return value and notify the failure here. > cmd/cmd/inet/usr.lib/inetd/repval.c > 828: If you're going to document one return value, please document all > of them. done > > scf_inst_delete_prop(): Since this function isn't in libscf, please > use a different prefix. How about instance_delete_prop? > > 899: My manpage says scf_entry_destroy() has a void return type. Is > yours different? Umm..I'm not sure that cast was there. You sure we have only one manpage? :^) It's removed. > > cmd/ipf/svc/ipfd.c > 26: Please add a comment explaining what this program is supposed to > do. will do As a general response to most of your concerns about the error handlings in this file. This daemon is started by network/ipfilter which currently is a transient service. I don't speak for the Networking team but my guess is that the team also wants users to be able to manipulate IPfilter outside of SMF (ipf -E|D). Consequently, this daemon, svc.ipfd is started by a transient service and won't get restarted by svc.startd. My goal is to have svc.ipfd run as long as network/ipfilter is online and not interrupt it for reasons outside of its control, svc.configd or libscf interactions. This way, svc.ipfd's availability is consistent with ipmon, the other process started by network/ipfilter. Thus, the scf related errors are not handled as well as they should be. I'll address your concerns as much as I can and would appreciate your suggestions on how we can best achieve such goal. > > 88: I believe commented-out code is frowned upon. > done > 102: Why is it ok to return 0 if scf_handle_bind() fails for reasons > other than SCF_ERROR_NO_RESOURCES? > The return value is meaningless here. Essentially, we try our best and exit the function. It should probably be a void function. > 141: I don't understand what "Restarter events are notified twice" > means, or how it's relevent to this function. > Old code. Removed > 146: Please add an explanation of why it's ok to return false when > there's a memory allocation error. > ok > 157: cstyle says lines should not begin with binary operators. > done > 163: Please add > > assert(max_scf_value_size >= scf_limit(SCF_LIMIT_MAX_NAME_LEN) + 1); > > somewhere before this. I changed the initialization in main() so we wouldn't need to keep checking. max_scf_fmri_size = scf_limit(SCF_LIMIT_MAX_FMRI_LENGTH) + 1; max_scf_value_size = scf_limit(SCF_LIMIT_MAX_VALUE_LENGTH) + 1; assert(max_scf_fmri_size > 0); assert(max_scf_value_size > 0); > 164: What should a user do if he sees this log message? > The problem is most likely in SMF and this daemon is another victim. I don't see what user can do. What do you suggest we can do here? > 197: So I suppose you're relying on the fact that svc.startd removes > all of the values from the property when it's done, eh? I believe > that's an unstable interface. Shouldn't that have been included in > your ARC case? Yes and yes. I'll look into amending an approved ARC case. > 310: I presume you considered launching ipfilter in a separate process > contract? That doesn't buy us much since network/ipfilter is a transient service. Do you see any benefit? > 317: Is there a reason why you launch through the shell? It doesn't > seem that any of the arguments need to allow shell substitions. > You're right. it's unnecessary and should be replaced. > 312: What should a user do if he sees this log message? > Again, I'm not sure how a user can correct such systematic problem? > 318: Why do you use _exit() here rather than exit()? > no reason. changed to exit(). > service_is_rpc(): Please add a comment explaining why it's ok to > return B_FALSE when the property exists with the value "false" and > when there's an error. ok > > 352: Is there a reason you don't use SCF_SNAPSHOT_RUNNING here? > changed. > 359: I didn't see this behavior of preferring inetd/isrpc over > firewall_context/isrpc in the ARC case. Did I miss it? Or does it > not matter? An RPC service has either inetd or firewall_context pg but not both. There isn't a preference as order does not matter. > > 369: Is there a reason you're using scf_value_get_as_string() instead > of scf_value_get_boolean()? With the latter, you wouldn't have to > allocate value. > yes, scf_value_get_boolean() is cleaner, changed. > 405-12: Too much indentation. > fixed > 407: Please add a comment explaining why it's ok to continue of > repository_rebind() failed. > > 414: It's not clear to me that it's ok to retry in such a tight loop > for all errors of _scf_notify_add_pgtype() caught by the default > case. It would help for you to catch the errors explicitly and > abort() on unknown errors. But I'd like to avoid aborting :^( What other options do we have? If there aren't any, I agree it should abort. What do you think? > 418: "for (;;)" is the preferred style for infinite loops in O/N. It > also eliminates the need for "/*CONSTCOND*/". > ah, done. > 420: Note that using _scf_notify_wait() will increase the incidence of > 6531656. We may want to raise its priority to avoid a fire drill. > agreed > 422: Please add a comment explaining why it's ok to ignore errors from > repository_rebind(). > repository_rebind doesn't return any useful error at the moment. > 426-535: Is there a reason this code can't be in a separate function? > No, it's simple enough that I thought it'd be ok in the same function. If you feel strongly about it, I can have it in a separate function. > 436: SCF_ERROR_NOT_BOUND should be impossible since _scf_notify_wait() > succeeded. > ok > 438: I'm not sure it's appropriate to log this message without > explaining what its consequences are. Yes, "Lost repository event" may not be a clear consequence. Do you have other suggestions? > > 443: I believe SCF_ERROR_NOT_SET signifies a bug in libscf or > svc.configd. Shouldn't you abort() if it occurs? If it's a bug in configd or libscf, this daemon should continue to run so that it can process future events when configd or libscf issues are sorted out. > 446: What should a user do if he sees this log message? > The problem is most likely in SMF and this daemon is another victim. I don't see what user can do. What do you suggest we can do here? > 453: SCF_ERROR_NOT_BOUND should be impossible since scf_pg_update() > succeeded. > ok > 458: Please add a comment explaining why it's ok to ignore the other > errors. I added a generic comment explaining that we have minimal scf error handling since most scf errors are outside the control of the daemon, essentially we serve when we can. Thus, rebind if connection is broken and simply continue and wait for future events and hope that SMF can sort itself out. > > 462: This buffer is one character too small. In theory, you should > allocate a buffer of size scf_limit(SCF_LIMIT_MAX_FMRI_LENGTH + 1). Hmm...Would it make more sense if we change the definition to have the right value? An rfe? > > 466: SCF_ERROR_NOT_BOUND should be impossible since > scf_pg_get_parent_instance() succeeded. ok > > 471: Please add a comment explaining why it's ok to ignore the other > errors. see generic comment > 490: SCF_ERROR_NOT_BOUND should be impossible since > scf_instance_to_fmri() succeeded. ok > 495: Please add a comment explaining why it's ok to ignore the other > errors. And why when the running snapshot doesn't exist, we don't > fall back to the current properties like service_is_rpc(). > Commented added and falling back to current properties if we can't get running snapshot for any errors except instance deleted. > 503: What should a user do if he sees this log message? > If update fails, the problem is probably outside of user's control, error in allocating memory, forking, or getting service's state. What do you suggest we can do here? > 512: SCF_ERROR_NOT_BOUND should be impossible since > scf_instance_get_snapshot() succeeded. > agreed > 515: Please add a comment explaining why it's ok to continue for other > errors. > see above > 522: What should a user do if he sees this log message? > same as above. > 527: SCF_ERROR_NOT_BOUND should be impossible since > scf_instance_get_snapshot() succeeded. > agreed > 532: Please add a comment explaining why it's ok to continue on other > errors. > see above generic comment > lib/librestart/common/librestart.c > 1608: If you're going to document one return value, please document > all of them. done > > instance_validate_aux_fmri(): Is there a reason this function is in > this part of the file? What's the more appropriate location? > > 1631: Why doesn't this test val for NULL? it does now. > > 1638: Is there a reason this function doesn't use the running > snapshot? If so, please document the fact that it doesn't in the > function's comment. yes, it should was read from the running snapshot. > > 1642: Since you don't use aux_inst, did you consider using > scf_parse_fmri() instead? I wasn't aware of scf_parse_fmri. It's changed to use scf_parse_fmri(). > > lib/librestart/common/librestart.h > 240 > - It seems that all of the other symbols in this file begin with > "restarter". Is there a reason instance_validate_aux_fmri() > doesn't? how about restarter_inst_validate_aux_fmri? > > - Please add a comment documenting the function. done