Quoth Tony Nguyen on Thu, Oct 23, 2008 at 04:43:04PM -0700:
> David Bustos wrote:
> > Quoth Tony Nguyen on Sun, Oct 19, 2008 at 02:49:40AM -0700:
> >> This is for 6761070 PSARC 2008/580 Solaris host-based firewall 
> >> <http://monaco.sfbay/detail.jsf?cr=6761070>*
> >>
> >> *The webrev and the new event message are respectively at:
> >>
> >> http://cr.opensolaris.org/~tonyn/firewall/
> >>
> >> http://cr.opensolaris.org/~tonyn/firewall/8000-R4
...
> I agree the restarter protocol, restarters or librestart, would probably 
> be a more appropriate places for these changes. However, these changes 
> imply:
> 
> - third-party restarters, in addition to services need modification to 
> use firewall framework
> 
> - potential synchronization issues between services under different 
> restarters trying to manipulate IPfilter (simultaneous enable/disable of 
> IPfilter).
> 
> - and in the case of new dedicate restarter, services managed under 
> another restarter wouldn't be able to use the firewall framework.
> 
> I do, however, agree that we should revisit the long-term plan for 
> observing service changes.

That's true.  There is one entity which can observe and exert control
over all services regardless of restarter, though: svc.startd's graph
engine.  Much like dependencies are evaluated without knowledge of the
service or its restarter, IP Filter rules could be installed.

I'm not sure the difference in functionality, susceptibility to bugs, or
amenability to future changes is small enough that it's ok to proceed
with the current architecture.  What's your judgement?

Another thing that concerns me is the tying of this functionality to the
state of network/ipfilter:default.  Won't we have more flexibility if
this behavior can be turned on or off though a separate command?

> > cmd/cmd-inet/usr.lib/inetd/inetd.c
> >   291: Where is this handle destroyed?
> 
> The caller is responsible for destroying this handle, except when
> function returns non-zero in which case it should destroy the handle. I
> added commented and the handle destroy call.

Ah, yes, you destroy it in inst_reset_aux_fmri().  I missed that.

...
> > cmd/cmd/inet/usr.lib/inetd/repval.c
...
> >   scf_inst_delete_prop(): Since this function isn't in libscf, please
> >     use a different prefix.
> 
> How about instance_delete_prop?

That's fine.

...
> > cmd/ipf/svc/ipfd.c
...
> As a general response to most of your concerns about the error handlings
> in this file. This daemon is started by network/ipfilter which currently 
> is a transient service. I don't speak for the Networking team but my 
> guess is that the team also wants users to be able to manipulate 
> IPfilter outside of SMF (ipf -E|D). Consequently, this daemon, svc.ipfd 
> is started by a transient service and won't get restarted by svc.startd.

Can you elaborate on the connection between those two?  Does ipf -E|D
kill your daemon?

> My goal is to have svc.ipfd run as long as network/ipfilter is online
> and not interrupt it for reasons outside of its control, svc.configd or
> libscf interactions. This way, svc.ipfd's availability is consistent
> with ipmon, the other process started by network/ipfilter. Thus, the scf
> related errors are not handled as well as they should be. I'll address
> your concerns as much as I can and would appreciate your suggestions on
> how we can best achieve such goal.

It's ok to ignore errors.  Specifically when they're expected, or they
have no impact on functionality expected by the user.  When it's not
clear to me that either of these is the case, I've asked you to explain.
And usually when that's required, I think you should also add a comment
so people reading or modifying this code in the future will also
understand.

...
> >   102: Why is it ok to return 0 if scf_handle_bind() fails for reasons
> >     other than SCF_ERROR_NO_RESOURCES?
> 
> The return value is meaningless here. Essentially, we try our best and
> exit the function. It should probably be a void function.

It seems that the return value is used at line 559.  The other uses
assume that the handle is bound.  If scf_handle_bind() fails, we know it
isn't bound, and shouldn't return.  We should either retry, exit, or
return a value and make the callers handle the error appropriately.  And
perhaps emit a message to the user explaining the impact of the error.

...
> >   163: Please add
> > 
> >     assert(max_scf_value_size >= scf_limit(SCF_LIMIT_MAX_NAME_LEN) + 1);
> > 
> >     somewhere before this.
> 
> I changed the initialization in main() so we wouldn't need to keep checking.
> 
> max_scf_fmri_size = scf_limit(SCF_LIMIT_MAX_FMRI_LENGTH) + 1;
> max_scf_value_size = scf_limit(SCF_LIMIT_MAX_VALUE_LENGTH) + 1;
> assert(max_scf_fmri_size > 0);
> assert(max_scf_value_size > 0);

Actually my point was that you're fetching a name into a buffer sized
for a value.  This is currently safe because values can be much longer
than names.  But I don't think that's inherent in the abstractions, so
I think you should assert() that it's the case, so if it changes, we'll
know before a bug about bizarre behavior is filed.

> >   164: What should a user do if he sees this log message?
> 
> The problem is most likely in SMF and this daemon is another victim. I
> don't see what user can do. What do you suggest we can do here?

Well what is the impact of this error?  If there is none, then you
should probably not log a message.

...
> >   310: I presume you considered launching ipfilter in a separate process
> >     contract? 
> 
> That doesn't buy us much since network/ipfilter is a transient service. 
> Do you see any benefit?

I don't recall whether when one child dies in a transient service, all
processes are killed.  If so, then this would be even more important
because SMF won't restart you.  But it should be pretty rare and can be
deferred if you want.

...
> >   312: What should a user do if he sees this log message?
> 
> Again, I'm not sure how a user can correct such systematic problem?

Well if it's because the system is low on memory, then maybe he can add
memory.  If it's because the software is defective, then he can report
a bug.  If it's because the hardware is introducing undetected memory
errors, then he should investigate his hardware.  And perhaps more
importantly, if his firewall is now inconsistent, then he should
probably take steps to correct that.  To the extent that you can,
I think your error message should convey information which helps the
user decide.

...
> >   359: I didn't see this behavior of preferring inetd/isrpc over
> >     firewall_context/isrpc in the ARC case.  Did I miss it?  Or does it
> >     not matter?
> 
> An RPC service has either inetd or firewall_context pg but not both. 
> There isn't a preference as order does not matter.

Oh, I missed that in the ARC case.  Where is it explained?  And will
this be enforced, or is it up to the service author?

...
> >   407: Please add a comment explaining why it's ok to continue of
> >     repository_rebind() failed.

I presume you accepted this comment.

> >   414: It's not clear to me that it's ok to retry in such a tight loop
> >     for all errors of _scf_notify_add_pgtype() caught by the default
> >     case.  It would help for you to catch the errors explicitly and
> >     abort() on unknown errors.
> 
> But I'd like to avoid aborting :^( What other options do we have? If 
> there aren't any, I agree it should abort. What do you think?

You should determine the other errors _scf_notify_add_pgtype() can fail
with.  For errors such as out-of-memory, you should pause before
retrying.  For unknown errors, I would abort() because it means that the
interface is documented incorrectly, or the system is suffering memory
errors, and the core should help us understand which.  If you don't
think a core would be useful, then you could emit an error message
instead.  And if you would rather make it easier for future libscf
coders to introduce new errors, then you can ignore the unknown errors.

...
> >   422: Please add a comment explaining why it's ok to ignore errors from
> >     repository_rebind().
> 
> repository_rebind doesn't return any useful error at the moment.

Why not?  Do you regard returning with h unbound a success?

> >   426-535: Is there a reason this code can't be in a separate function?
> 
> No, it's simple enough that I thought it'd be ok in the same function.
> If you feel strongly about it, I can have it in a separate function.

I think it's too long and has too much indentation to be considered
simple.  Please move it to a separate function.

...
> >   438: I'm not sure it's appropriate to log this message without
> >     explaining what its consequences are.
> 
> Yes, "Lost repository event" may not be a clear consequence. Do you have
> other suggestions?

Well what are the consequences of the error?

> >   443: I believe SCF_ERROR_NOT_SET signifies a bug in libscf or
> >     svc.configd.  Shouldn't you abort() if it occurs?
> 
> If it's a bug in configd or libscf, this daemon should continue to run
> so that it can process future events when configd or libscf issues are
> sorted out.

How can the configd or libscf issues be sorted out?

> >   446: What should a user do if he sees this log message?
> 
> The problem is most likely in SMF and this daemon is another victim. I
> don't see what user can do. What do you suggest we can do here?

Well what errors can lead to this?  What are their impacts?  Do you
think a core would not be useful in diagnosing a problem here?

...
> >   458: Please add a comment explaining why it's ok to ignore the other
> >     errors.
> 
> I added a generic comment explaining that we have minimal scf error
> handling since most scf errors are outside the control of the daemon,
> essentially we serve when we can. Thus, rebind if connection is broken
> and simply continue and wait for future events and hope that SMF can
> sort itself out.

I don't think hope should be part of this algorithm, especially since
it's supposed to enforce security.  Unless this error results in no
impact on functionality, at the least an error message should be issued.

> >   462: This buffer is one character too small.  In theory, you should
> >     allocate a buffer of size scf_limit(SCF_LIMIT_MAX_FMRI_LENGTH + 1).
> 
> Hmm...Would it make more sense if we change the definition to have the
> right value? An rfe?

It does have the right value, as currently defined.  I presume you mean
that you don't think the current definition is as useful as a new
definition.  (The size of the buffer required, rather than the maximum
length of the string.)  You can certainly file an RFE for that.

...
> >   503: What should a user do if he sees this log message?
> 
> If update fails, the problem is probably outside of user's control,
> error in allocating memory, forking, or getting service's state. What do
> you suggest we can do here?

I think sometimes errors allocating memory or getting a service's state
are in the user's control.  I suggest you explain the impact of the
error here, and possibly explain likely causes and their remedies.

...
> > lib/librestart/common/librestart.c
...
> >   instance_validate_aux_fmri(): Is there a reason this function is in
> >     this part of the file?
> 
> What's the more appropriate location?

If there is no reason for it to be there, then move it to the end.  If
there are related functions, then you should move it near them.

...
> > lib/librestart/common/librestart.h
> >   240
> >     - It seems that all of the other symbols in this file begin with
> >       "restarter".  Is there a reason instance_validate_aux_fmri()
> >       doesn't?
> 
> how about restarter_inst_validate_aux_fmri?

Sure.


David

Reply via email to