David,

Thanks for your comments.  Replies inline.  With regard to idempotency, 
I think the protocol should
be idempotent and handle accordingly.

Thanks

Steve

David Bustos wrote:
> Quoth Steve Peng on Tue, Jan 15, 2008 at 08:16:16PM -0700:
>   
>> David Bustos wrote:
>>     
> ...
>   
>>  > cmd/svc/configd/backend.c
>>  >   1105-6: This is a lot of memory.  Can you allocate it on the heap
>>  >     rather than the stack, so that we can fail gracefully if we're out
>>  >     of memory, rather than segfaulting?
>>
>> Good point.  Changed.  We probably should fix the backup code as well.
>>     
>
> Yes.  Those can be done later, under new bugs.
>   

Changed anyway.

>   
>>  >   1118: Can't you just make tmppath PATH_MAX + 7 to avoid this?
>>
>> Yes, we can avoid this by extending the buf by 7 and it may make sense to
>> any string which is not pathname related.  Here we are dealing with
>> the filesystem pathname which in my opinion should be limited and checked
>> by the system maximum (PATH_MAX).
>>     
>
> Ok.  Please change the description of _FAIL_TRUNCATED in the function
> comment to something like "dst is too long", then.
>   

Done.

>   
>>  >   1133-7: Shouldn't this explanation either be before
>>  >     backend_switch_copy() or before 1148?
>>
>> It is before 1148, isn't it?  Are you suggesting "right before 1148?"
>>     
>
> Yes, between 1147 and 1148.
>
>   
>>  >   1133: Shouldn't this explanation of sw_back be either before
>>  >     backend_switch_copy() or before we use sw_back at 1178?
>>
>> The reason the comment is here is to clearly explain the copy direction 
>> and what src and dst are when the flag is set or unset.  It may not
>> make sense to put it before 1178 since it has nothing to do what the
>> following code section does.  Make sense?
>>     
>
> If you are explaining the meaning of src & dst, then please move it to
> backend_switch_copy()'s function comment.
>   

Now it is the function comment.

>   
>>  >   1141: What is this fstat() for?
>>
>> One of Mike's recommendations is to stat the file right before the copy op.
>>     
>
> I believe he meant for you to verify that the number of bytes that you
> copy matches what fstat() returned.  Your code doesn't use the results
> of fstat() at all.
>   

Ah, ok.  Bytes check is added.

>   
>>  >   1204: What's the benefit of moving to trans_db if you're going to
>>  >     move to dst right away?
>>
>> It is used as an indicator of the completion of the copying op
>> (from src to tmppath) so when the system crashes right after the copy and
>> if this file exists then we know the copy op is completed and the file
>> can be checked for the possible crash recovery.  If we dont have this
>> intermidiate step then it will be hard to determin if the 'tmppath' file
>> has the completion of the previous copy op or not.  Make sense?
>>
>> I have thought about that as well but decide that this may give us a
>> better recovery story.
>>     
>
> Well we can still crash after the copy and before the rename(tmppath,
> trans_db), right?  Then the rename to trans_db doesn't eliminate the
> possibility that we'll have to recopy the database.  So it seems to be
> a performance optimization, to save us time.  I'm concerned that the
> scenario under which it saves us time is too rare to justify the extra
> code and behavioral complexity.
>
> Unless this is about atomicity of rename() across filesystems.  If
> rename(tmppath, dst) isn't atomic because tmppath & dst reside on
> different filesystems but rename(trans_db, dst) is, then the extra
> rename() would be justified.
>   

Rename is an atomic operation and does not across the file system 
boundary.  So in this case, the
tmppath and dst are on the same filesystem.  I am perfect fine with 
eliminating the extra
rename which was in my original thought anyway. :-)

>   
>>  >   1243: backend_lock() cannot fail with those arguments.  Please
>>  >     assert(result == REP_PROTOCOL_SUCCESS).
>>
>> Should the same call in backend_create_backup be changed to assert as well?
>>     
>
> Yes.  Unless you see how backend_lock() can fail.
>   

Ok, changed.

>   
>>  >   1248: Why do you fail if be_readonly is set?  Won't this be true when
>>  >     we try to copy the repository off while the root filesystem is
>>  >     read-only?  If it is correct, why not invoke backend_lock() with
>>  >     writing = 1?
>>
>> That is true when we have the true early import (import with ro root),  
>> For now, the copying and switching ops are done with writable root
>> filesystem and the repository needs to be writable to allow switch
>> back.  Make sense?
>>     
>
> Oh right.  Why not invoke backend_lock() with writing = 1, then?
>   

Changed.

>   
>>  >   1254: src has come straight from the client.  It might be
>>  >     unterminated.  In client.c:repository_switch() you should either use
>>  >     strnlen() to detect the situation and return an error, or you should
>>  >     forcibly terminate the buffer.  Same for dst.
>>
>> rpr_src and rpr_dst are strlcpy in _scf_repository_switch() which always
>> null-termitated.  Isn't it?
>>     
>
> Yes, but that's in libscf.  You can't assume that clients are using your
> libscf; they could be using a hacked version, or could be fabricating
> door calls out of whole cloth.
>   

Ok. 

>   
>>  >   1256: Why is it ok to fail if src doesn't match?  What should the
>>  >     client do in that case?
>>
>> Good point.  I probably should assert it.
>>     
>
> No, then a bug in svcadm could cause svc.configd to crash.  What
> I wonder is if src is really necessary; shouldn't we always switch from
> the current repository?  If so, why should we refuse if the client has
> the wrong path?
>   

What bug can cause configd to crash?  Shouldn't the path (be_path) 
always match the existing be_db?  Is there any
possibility that be_path can refer to a file path which is different 
from what be_db points to?

>   
>>  >   1267: I'm confused by this comment.  It says that you're going to "go
>>  >     ahead switch back to the original repository", but the code just
>>  >     returns without switching.  Can you explain this for me?
>>
>> If we are switching back (from tmpfs, sw_back is set) and copying fails
>> then we actually don't return and continue with switching back to dst which
>> is the original repository under /etc/svc.
>>     
>
> Ok.  I was confused because the comment talks about the case where we
> skip over the goto errout.  Do you think you could rephrase the comment
> to something like "If the copy failed for any reason, return an error,
> unless we're switching back...".
>   

Fair enough.  Updated.

>   
>>  >   1273: If backend_switch_copy() failed, but sw_back is true, will we
>>  >     try to sqlite_open(dst) anyway?
>>
>> Are you refering to line 1278?  Do you see any reason that we shouldn't
>> sqlite_open?
>>     
>
> No, I was just confused from the comment.  I do wonder, though, if we
> really should switch to the file the client gave us, or if we should
> switch back to the file which we opened when whe first started.
>   

In this case, I prefer switching to the original /etc/svc repository 
instead of staying in the tmpfs location.

>   
>>  >   1332: This will be overwritten if stat(trans_db) succeeds.  Why don't
>>  >     you do it first, and only do stat(fast_db) if it fails?
>>
>> Actually only one of them can exist when the system crashes.  If 
>> trans_db exists then there is not possible for fast_db to exist.  Vice
>> versa.
>>     
>
> Ok, you could take advantage of that assumption to reduce the number of
> system calls you make by structuring your code like
>
>       if (stat(trans_db, &sbuf) == 0) {
>               db = trans_db;
>       } else if (stat(fast_db, &sbuf) == 0) {
>               db = fast_db;
>       }
>
> I would tend to verify that assumption and do something if it turned out
> to be false, though.
>   

Right.  Updated.

>   
>>  >   1371: If sqlite_open() failed, shouldn't we warn the administrator
>>  >     that there's an invalid repository lying around?  Or that we didn't
>>  >     use it because it is invalid?
>>
>> We just unlink it and don't use at all.
>>     
>
> Where?  The unlink() at 1370 is only executed if be_db != NULL.
>   

Ah!  Updated.

>   
>>  >   1366: Shouldn't we warn the administrator if this fails?
>>
>> We will do the best to recover from crash and if it fails we will go 
>> with the default repository so I don't know what else admin can do
>> with the warning message in this case.  Do you see anything that I may
>> miss?
>>     
>
> Presumably we're trying to use the repository because we think it may
> have more up-to-date data than the main one.  If we found such
> a repository but didn't use it because rename() failed, it seems to me
> that we should warn the administrator in case he wants to examine it for
> customizations.
>   

Fair enough.  Updated.

>   
>>  >   1370: This shouldn't be necessary if the rename() succeeded, should
>>  >     it?
>>
>> Right, if rename or switch succeeds then unlink becomes no-op but if it
>> fails then the file needs to be unlinked.  Make sense?
>>     
>
> Sure.  I just prefer to structure my code to avoid system calls I know
> will fail.  I suppose usually because I take action if they fail.
> I guess it's not a big deal.
>   

Good.

>   
>>  > lib/libscf/inc/libscf.h
>>  >   562-5: Why did you choose to put these in libscf.h rather than
>>  >     libscf_priv.h?
>>
>> Since REPOSITORY_DB is needed in both configd and svcadm implementation so
>> I move it from configd.h to a place that both implementation can reference
>> it w/o having configd reference libscf_priv.h since it already reference
>> libscf.h.
>>     
>
> Those variables should be considered private, so they should be placed
> in libscf_priv.h, and the relevent configd files should #include
> libscf_priv.h as necessary.
>   

That's what I like to see as well.  Changed.

>   
>>  > lib/libscf/inc/libscf_priv.h
>>  >   297-301: Why do we need a structure?  Can't _scf_repository_switch()
>>  >     accept sw_src, sw_dst, and sw_back as separate parameters?
>>
>> Yes, I was thinking that too but decided to put them in a structure so
>> in the future if we decide to add more parameters we can do so w/o chaning
>> the interfaces prototype.
>>     
>
> Do you know of other places in ON where we have done that?
>   

I think it should be easy to find the code where the interface/function 
can be expanded easily by adding new fields in the
structure passed to the function.  Say in the near future, if we decide 
to pass more than three parameters we can do so by
just adding new members to the structure w/o changing the interface 
prototype.  This can be useful if the interface is called
everywhere.

>   
>>  > cmd/svc/svcadm/svcadm.c
>>     
> ...
>   
>>  >   2408-9,13-14: Why are you encoding the paths into svcadm?  Do you
>>  >     anticipate needing to switch to repositories other than
>>  >     FAST_REPOSITORY_DB and REPOSITORY_DB in the future?  If not, this
>>  >     seems like extra information which is easy to get wrong.  Couldn't
>>  >     you just set sw_back to 1 or 0, and have svc.configd know what paths
>>  >     to use?
>>
>> Correct.  I like to leave door openned for the possible future enhancement.
>>     
>
> Ok.
>
>
> David
>   


Reply via email to