Quoth Steve Peng on Tue, Dec 18, 2007 at 11:07:30AM -0700: > The incremental webrev which incorporate all the comments can be found > under the same location. The original webrev has been renamed to > 6351623-old for your reference. Again, send me any comment on those new > changes.
Good work so far. I found some devils in your details, though: common/svc/repcache_protocol.h 357: This should be "SWITCH(src, dst, flag)". 358: - This should explain what flag means. - This describes how the operation will be used. You should at least precede it with a description of the operation. Something like "Moves the main repository database file from src to dst." 817: I believe the comment is supposed to correspond to the enum rep_protocol_requestid constant, which would just be SWITCH in this case. 818-20: Do these need to be indented so much? cmd/svc/configd/client.c 1774: Please document the errors that repository_switch() can fail with. 1789: Don't you need the changeid logic that backup_repository() has? Or is backend_switch() somehow idempotent? (See 1.1 at the top of repcache_protocol.h about idempotency.) cmd/svc/configd/backend.c 1105-6: This is a lot of memory. Can you allocate it on the heap rather than the stack, so that we can fail gracefully if we're out of memory, rather than segfaulting? 1109: Can't this be const char *? 1118: Can't you just make tmppath PATH_MAX + 7 to avoid this? 1122,27: Please include strerror(errno) in these error messages. 1128: Shouldn't you warn the administrator if unlink() fails? 1129: Shouldn't you close(dstfd)? 1133-7: Shouldn't this explanation either be before backend_switch_copy() or before 1148? 1133: Shouldn't this explanation of sw_back be either before backend_switch_copy() or before we use sw_back at 1178? 1141: What is this fstat() for? 1142: Please include strerror(errno) in this error message. 1143: Shouldn't you warn the administrator if unlink() fails? 1151,63: Please include strerror(errno) in these error messages. 1185,96,1206: Please include both paths and strerror(errno) in these error messages. 1204: What's the benefit of moving to trans_db if you're going to move to dst right away? 1208,15: Shouldn't you warn the administrator if these unlink()s fail? 1223: Please explain what the objectives of backend_switch() are, and maybe how it accomplishes them. 1227-8: For each error code, please explain why the function failed. 1243: backend_lock() cannot fail with those arguments. Please assert(result == REP_PROTOCOL_SUCCESS). 1248: Why do you fail if be_readonly is set? Won't this be true when we try to copy the repository off while the root filesystem is read-only? If it is correct, why not invoke backend_lock() with writing = 1? 1254: src has come straight from the client. It might be unterminated. In client.c:repository_switch() you should either use strnlen() to detect the situation and return an error, or you should forcibly terminate the buffer. Same for dst. 1256: Why is it ok to fail if src doesn't match? What should the client do in that case? 1267: I'm confused by this comment. It says that you're going to "go ahead switch back to the original repository", but the code just returns without switching. Can you explain this for me? 1273: If backend_switch_copy() failed, but sw_back is true, will we try to sqlite_open(dst) anyway? 1276: Since sqlite has a specific "integrity check" function, please use "sanity check" here. 1278: If sqlite_open() fails, shouldn't we issue an error message? 1287: This string looks a lot like one in backend_create(). Should they be shared with a macro or a common function? 1296: If this fails, shouldn't we issue an error message? And return something other than REP_PROTOCOL_SUCCESS? 1315-7: Can these be const char *? 1329s/uses/use/ 1332: This will be overwritten if stat(trans_db) succeeds. Why don't you do it first, and only do stat(fast_db) if it fails? 1343: As for 1276, "integrity check" means something specific in sqlite, so please change this to "sanity check". 1371: If sqlite_open() failed, shouldn't we warn the administrator that there's an invalid repository lying around? Or that we didn't use it because it is invalid? 1348-57: This looks a lot like the code at 1283-92. Shouldn't we put them into a separate function? 1366: Shouldn't we warn the administrator if this fails? 1370: This shouldn't be necessary if the rename() succeeded, should it? lib/libscf/inc/libscf.h 562-5: Why did you choose to put these in libscf.h rather than libscf_priv.h? lib/libscf/inc/libscf_priv.h 297-301: Why do we need a structure? Can't _scf_repository_switch() accept sw_src, sw_dst, and sw_back as separate parameters? lib/libscf/common/lowlevel.c 6859: Please change this to something like "Request svc.configd to switch repository database files." Also document what errors the function can fail with. 6881: Blank line, please. 6884-5: This can be done before the mutex is taken. cmd/svc/svcadm/svcadm.c 2405,16: Since you're not adding this subcommand to the usage message, these calls won't be helpful. 2408-9,13-14: Why are you encoding the paths into svcadm? Do you anticipate needing to switch to repositories other than FAST_REPOSITORY_DB and REPOSITORY_DB in the future? If not, this seems like extra information which is easy to get wrong. Couldn't you just set sw_back to 1 or 0, and have svc.configd know what paths to use? 2428: _NOT_BOUND should be impossible, so you should abort() if it happens. 2432: This should be "/* NOTREACHED */". 2442: Please add a default case which prints that _scf_repository_switch() returned an unknown error. cmd/svc/milestone/manifest-import 449: Please change this to something like "Try moving the repository to tmpfs. If it doesn't work, reset doswitch so we won't try switching back." 451-75: I think you could save some lines with /usr/sbin/svcadm _smf_repository_switch fast switchret=$? ... if [ $switchret -eq 0 ]; then /usr/sbin/svcadm _smf_repository_switch perm || exit 1 fi 468: Can you explain that the existing retry mechanism is SMF restarting the service? Something like, "... if switch back fails, we'll just exit and svc.startd should restart us." 473: I think you left the dollar sign out of this line. David