David,

Thanks for your thorough code review and comments.  Responses inline.  
Let me know if there is any
comment missing my response.

Thanks

Steve

David Bustos wrote:
 > Quoth Steve Peng on Tue, Dec 18, 2007 at 11:07:30AM -0700:
 >
 >> The incremental webrev which incorporate all the comments can be found
 >> under the same location.  The original webrev has been renamed to
 >> 6351623-old for your reference.  Again, send me any comment on those new
 >> changes.
 >>
 >
 > Good work so far.  I found some devils in your details, though:
 >
 > common/svc/repcache_protocol.h
 >   357: This should be "SWITCH(src, dst, flag)".

Updated.

 >
 >   358:
 >     - This should explain what flag means.
 >
 >     - This describes how the operation will be used.  You should at
 >       least precede it with a description of the operation.  Something
 >       like "Moves the main repository database file from src to dst."

Updated.

 >
 >   817: I believe the comment is supposed to correspond to the
 >     enum rep_protocol_requestid constant, which would just be SWITCH in
 >     this case.

Updated.

 >
 >   818-20: Do these need to be indented so much?

Updated.

 >
 > cmd/svc/configd/client.c
 >   1774: Please document the errors that repository_switch() can fail
 >     with.

Updated.

 >
 >   1789: Don't you need the changeid logic that backup_repository() has?
 >     Or is backend_switch() somehow idempotent?  (See 1.1 at the top of
 >     repcache_protocol.h about idempotency.)

Will look into this in detail and get back to you.

 >
 > cmd/svc/configd/backend.c
 >   1105-6: This is a lot of memory.  Can you allocate it on the heap
 >     rather than the stack, so that we can fail gracefully if we're out
 >     of memory, rather than segfaulting?

Good point.  Changed.  We probably should fix the backup code as well.

 >
 >   1109: Can't this be const char *?

Updated.

 >
 >   1118: Can't you just make tmppath PATH_MAX + 7 to avoid this?

Yes, we can avoid this by extending the buf by 7 and it may make sense to
any string which is not pathname related.  Here we are dealing with
the filesystem pathname which in my opinion should be limited and checked
by the system maximum (PATH_MAX).

 >   1122,27: Please include strerror(errno) in these error messages.

Done.

 >
 >   1128: Shouldn't you warn the administrator if unlink() fails?

Warning message is added.

 >
 >   1129: Shouldn't you close(dstfd)?

Yes.

 >
 >   1133-7: Shouldn't this explanation either be before
 >     backend_switch_copy() or before 1148?

It is before 1148, isn't it?  Are you suggesting "right before 1148?"

 >
 >   1133: Shouldn't this explanation of sw_back be either before
 >     backend_switch_copy() or before we use sw_back at 1178?

The reason the comment is here is to clearly explain the copy direction 
and what
src and dst are when the flag is set or unset.  It may not make sense to 
put it
before 1178 since it has nothing to do what the following code section does.
Make sense?

 >
 >   1141: What is this fstat() for?

One of Mike's recommendations is to stat the file right before the copy op.

 >
 >   1142: Please include strerror(errno) in this error message.

Updated.

 >
 >   1143: Shouldn't you warn the administrator if unlink() fails?

Yes. Updated.

 >
 >   1151,63: Please include strerror(errno) in these error messages.

Updated.

 >
 >   1185,96,1206: Please include both paths and strerror(errno) in these
 >     error messages.

Updated.

 >
 >   1204: What's the benefit of moving to trans_db if you're going to
 >     move to dst right away?

It is used as an indicator of the completion of the copying op
(from src to tmppath) so when the system crashes right after the copy and
if this file exists then we know the copy op is completed and the file
can be checked for the possible crash recovery.  If we dont have this
intermidiate step then it will be hard to determin if the 'tmppath' file
has the completion of the previous copy op or not.  Make sense?

I have thought about that as well but decide that this may give us a
better recovery story.

 >
 >   1208,15: Shouldn't you warn the administrator if these unlink()s fail?

Done.

 >
 >   1223: Please explain what the objectives of backend_switch() are, and
 >     maybe how it accomplishes them.

Done.

 >
 >   1227-8: For each error code, please explain why the function failed.


Updated.

 >
 >   1243: backend_lock() cannot fail with those arguments.  Please
 >     assert(result == REP_PROTOCOL_SUCCESS).

Should the same call in backend_create_backup be changed to assert as well?

 >
 >   1248: Why do you fail if be_readonly is set?  Won't this be true when
 >     we try to copy the repository off while the root filesystem is
 >     read-only?  If it is correct, why not invoke backend_lock() with
 >     writing = 1?

That is true when we have the true early import (import with ro root),  
For now,
the copying and switching ops are done with writable root filesystem and the
repository needs to be writable to allow switch back.  Make sense?

 >
 >   1254: src has come straight from the client.  It might be
 >     unterminated.  In client.c:repository_switch() you should either use
 >     strnlen() to detect the situation and return an error, or you should
 >     forcibly terminate the buffer.  Same for dst.

rpr_src and rpr_dst are strlcpy in _scf_repository_switch() which always
null-termitated.  Isn't it?

 >
 >   1256: Why is it ok to fail if src doesn't match?  What should the
 >     client do in that case?

Good point.  I probably should assert it.

 >
 >   1267: I'm confused by this comment.  It says that you're going to "go
 >     ahead switch back to the original repository", but the code just
 >     returns without switching.  Can you explain this for me?

If we are switching back (from tmpfs, sw_back is set) and copying fails
then we actually don't return and continue with switching back to dst which
is the original repository under /etc/svc.

 >
 >   1273: If backend_switch_copy() failed, but sw_back is true, will we
 >     try to sqlite_open(dst) anyway?

Are you refering to line 1278?  Do you see any reason that we shouldn't
sqlite_open?

 >
 >   1276: Since sqlite has a specific "integrity check" function, please
 >     use "sanity check" here.

Done.

 >
 >   1278: If sqlite_open() fails, shouldn't we issue an error message?

Probably should.  Added.

 >
 >   1287: This string looks a lot like one in backend_create().  Should
 >     they be shared with a macro or a common function?

Will be part of a new common function (see response for 1348-57 comment).

 >
 >   1296: If this fails, shouldn't we issue an error message?  And return
 >     something other than REP_PROTOCOL_SUCCESS?

Good point.  I will add an new error msg along with the return code.

 >
 >   1315-7: Can these be const char *?

Updated.

 >
 >   1329s/uses/use/

Updated.

 >
 >   1332: This will be overwritten if stat(trans_db) succeeds.  Why don't
 >     you do it first, and only do stat(fast_db) if it fails?

Actually only one of them can exist when the system crashes.  If 
trans_db exists
then there is not possible for fast_db to exist.  Vice versa.

 >
 >   1343: As for 1276, "integrity check" means something specific in
 >     sqlite, so please change this to "sanity check".

Done.

 >
 >   1371: If sqlite_open() failed, shouldn't we warn the administrator
 >     that there's an invalid repository lying around?  Or that we didn't
 >     use it because it is invalid?

We just unlink it and don't use at all.

 >
 >   1348-57: This looks a lot like the code at 1283-92.  Shouldn't we put
 >     them into a separate function?

A new common interface created.

 >
 >   1366: Shouldn't we warn the administrator if this fails?

We will do the best to recover from crash and if it fails we will go 
with the
default repository so I don't know what else admin can do with the warning
message in this case.  Do you see anything that I may miss?

 >
 >   1370: This shouldn't be necessary if the rename() succeeded, should
 >     it?

Right, if rename or switch succeeds then unlink becomes no-op but if it
fails then the file needs to be unlinked.  Make sense?


 >
 > lib/libscf/inc/libscf.h
 >   562-5: Why did you choose to put these in libscf.h rather than
 >     libscf_priv.h?

Since REPOSITORY_DB is needed in both configd and svcadm implementation so
I move it from configd.h to a place that both implementation can reference
it w/o having configd reference libscf_priv.h since it already reference
libscf.h.

 >
 > lib/libscf/inc/libscf_priv.h
 >   297-301: Why do we need a structure?  Can't _scf_repository_switch()
 >     accept sw_src, sw_dst, and sw_back as separate parameters?

Yes, I was thinking that too but decided to put them in a structure so
in the future if we decide to add more parameters we can do so w/o chaning
the interfaces prototype.

 > lib/libscf/common/lowlevel.c
 >   6859: Please change this to something like "Request svc.configd to
 >     switch repository database files."  Also document what errors the
 >     function can fail with.

Updated.

 >
 >   6881: Blank line, please.

Done.

 >
 >   6884-5: This can be done before the mutex is taken.

Done.

 >
 > cmd/svc/svcadm/svcadm.c
 >   2405,16: Since you're not adding this subcommand to the usage message,
 >     these calls won't be helpful.

I thought that as well when I looked at the backup case.  I will change
that to exit(1) instead of usage().

 >
 >   2408-9,13-14: Why are you encoding the paths into svcadm?  Do you
 >     anticipate needing to switch to repositories other than
 >     FAST_REPOSITORY_DB and REPOSITORY_DB in the future?  If not, this
 >     seems like extra information which is easy to get wrong.  Couldn't
 >     you just set sw_back to 1 or 0, and have svc.configd know what paths
 >     to use?

Correct.  I like to leave door openned for the possible future enhancement.

 >
 >   2428: _NOT_BOUND should be impossible, so you should abort() if it
 >     happens.

Done.

 >
 >   2432: This should be "/* NOTREACHED */".

Done.

 >
 >   2442: Please add a default case which prints that
 >     _scf_repository_switch() returned an unknown error.

Updated.

 >
 > cmd/svc/milestone/manifest-import
 >   449: Please change this to something like "Try moving the repository
 >     to tmpfs.  If it doesn't work, reset doswitch so we won't try
 >     switching back."

Updated.

 >
 >   451-75: I think you could save some lines with
 >
 >       /usr/sbin/svcadm _smf_repository_switch fast
 >       switchret=$?
 >       ...
 >       if [ $switchret -eq 0 ]; then
 >               /usr/sbin/svcadm _smf_repository_switch perm || exit 1
 >       fi

Updated.

 >
 >   468: Can you explain that the existing retry mechanism is SMF
 >     restarting the service?  Something like, "... if switch back fails,
 >     we'll just exit and svc.startd should restart us."

Done.

 >
 >   473: I think you left the dollar sign out of this line.

Done.
 >
 >
 > David
 >




Reply via email to