[nfs-discuss] [Fwd: Code review for CR # 6762222]

2008-11-20 Thread Peter Memishian

 > Not much; the xid could be useful, but that only makes sense
 > once it's sent over the wire. And if required (in some very specific
 > case) it can easily be observed by enabling the relevant fbt probes...

OK.

--
meem



[nfs-discuss] [Fwd: Code review for CR # 6762222]

2008-11-20 Thread Peter Memishian

 > >* 1086, 2537, 2709: Seems like this failure should be observable
 > >  somehow -- e.g., DTrace SDT probe or the like.
 >
 > added. thanks.

Thanks.  Is there any additional state that it makes sense to pass into
the DTrace probe point for future analysis?

 > http://cr.opensolaris.org/~maheshvs/676-webrev3/

-- 
meem



[nfs-discuss] [Fwd: Code review for CR # 6762222]

2008-11-19 Thread Tom Haynes
Mahesh Siddheshwar wrote:
> webrev's refreshed, but since it looks like there are proxy caching 
> issues, I've copied it at:
>
> http://cr.opensolaris.org/~maheshvs/676-webrev3/
>
> Mahesh
> ___
> nfs-discuss mailing list
> nfs-discuss at opensolaris.org

What I typically do in this case is delete the older copy of the webrev 
and then copy the new one back in.

I.e., I'm not sure it is a proxy cache as much as it might be the 
software on cr.opensolaris.org...



[nfs-discuss] [Fwd: Code review for CR # 6762222]

2008-11-19 Thread Peter Memishian
 > 
 > Meem, yes I thought of that, and didn't really like the undo work being
 > done later on.  But I reluctantly fell on the side of keeping the canput()
 > and put() as close to each other as possible, out of a mixture of guilt
 > and paranoia.. ;) 
 > Your comments helped me sway the other way. (I also think the QFULL case
 > will be a special case anyway, as in the lockd -> statd interaction,
 > but that's beside the point.) 
 > 
 > I've changed the code now.. and the new webrev's at:
 > 
 > http://cr.opensolaris.org/~maheshvs/676-webrev2/

Looks good.  I haven't looked at the back-off mechanism; I presume there's
something to ensure that the system won't be pounding on this codepath if
clnt_dispatch_send() fails.  A couple nits:

usr/src/uts/common/rpc/clnt_cots.c:

* 400: Could probably remove the whitespace after "static int"
  and remove the need for the linewrap to 401.

* 1086, 2537, 2709: Seems like this failure should be observable
  somehow -- e.g., DTrace SDT probe or the like.

* 1087: Blank line seems needless.
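For illustration, the SDT suggestion in the second nit might look roughly
like this at one of those failure points (a sketch only -- the probe name
and surrounding code are invented here, but DTRACE_PROBE1() from
sys/sdt.h is the standard kernel SDT mechanism, and such a probe would be
enabled from D as sdt:::flow-control):

```c
#include <sys/sdt.h>	/* DTRACE_PROBE*() SDT macros */

	/*
	 * Hypothetical failure path: the downstream queue is flow
	 * controlled, so fire an SDT probe before bailing out, making
	 * the failure observable without any module rebuild.
	 */
	if (!canput(q)) {
		DTRACE_PROBE1(flow__control, queue_t *, q);
		return (RPC_CANTSEND);
	}
```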

--
meem



[nfs-discuss] [Fwd: Code review for CR # 6762222]

2008-11-19 Thread Mahesh Siddheshwar
Peter Memishian wrote:
>  > >  * 1086, 2537, 2709: Seems like this failure should be observable
>  > >somehow -- e.g., DTrace SDT probe or the like.
>  >
>  > added. thanks.
>
> Thanks.  Is there any additional state that it makes sense to pass into
> the DTrace probe point for future analysis?
>
Not much; the xid could be useful, but that only makes sense
once it's sent over the wire. And if required (in some very specific
case) it can easily be observed by enabling the relevant fbt probes...
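(For the record, observing that via fbt would be something like the
following one-liner -- illustrative only, assuming the fbt provider can
instrument clnt_dispatch_send(); arg1 at the return probe is the return
value:)

```
# dtrace -n 'fbt::clnt_dispatch_send:return { trace(arg1); }'
```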

Mahesh



[nfs-discuss] [Fwd: Code review for CR # 6762222]

2008-11-19 Thread Mahesh Siddheshwar
Peter Memishian wrote:
>  > 
>  > Meem, yes I thought of that, and didn't really like the undo work being
>  > done later on.  But I reluctantly fell on the side of keeping the canput()
>  > and put() as close to each other as possible, out of a mixture of guilt
>  > and paranoia.. ;) 
>  > Your comments helped me sway the other way. (I also think the QFULL case
>  > will be a special case anyway, as in the lockd -> statd interaction,
>  > but that's beside the point.) 
>  > 
>  > I've changed the code now.. and the new webrev's at:
>  > 
>  > http://cr.opensolaris.org/~maheshvs/676-webrev2/
>
> Looks good.  I haven't looked at the back-off mechanism; I presume there's
> something to ensure that the system won't be pounding on this codepath if
> clnt_dispatch_send() fails. 
yes, it uses the delay behavior that already exists.
>  A couple nits:
>
> usr/src/uts/common/rpc/clnt_cots.c:
>
>   * 400: Could probably remove the whitespace after "static int"
> and remove the need for the linewrap to 401.
>
nope, that doesn't prevent the linewrap. I've left it as it is.
>   * 1086, 2537, 2709: Seems like this failure should be observable
> somehow -- e.g., DTrace SDT probe or the like.
>
added. thanks.
>   * 1087: Blank line seems needless.
>
removed.

thanks for doing the review, meem!

webrev's refreshed, but since it looks like there are proxy caching 
issues, I've copied it at:

http://cr.opensolaris.org/~maheshvs/676-webrev3/

Mahesh



[nfs-discuss] [Fwd: Code review for CR # 6762222]

2008-11-19 Thread Peter Memishian

 > Need a quick code review for:
 > 
 > 676* lockd burns cpu cycles, nfs pathologically slow
 > 
 > As has been discussed, the immediate approach is to enforce flow
 > control in the RPC client side dispatch routines --
 > clnt_dispatch_send() and clnt_clts_dispatch_send(). In case the
 > downstream is flow controlled, the error recovery is straightforward
 > and you rely on the upper (rfscall) layers to retry the call -- a
 > mechanism that already exists currently to take care of other errors.
 > 
 > webrev:
 > http://cr.opensolaris.org/~maheshvs/676-webrev/

I haven't looked over everything in detail yet, but at least in
clnt_dispatch_send(), it'd be simpler to check canput() earlier in the
function, before you've done anything you'd have to undo later.  This is
OK because canput() is already not atomic with respect to put(), and
it's fine if you go a couple of messages over QFULL.

 > * External links:
 > CR: http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=676
 > Evaluation for the CR: http://cr.opensolaris.org/~maheshvs/676-eval

-- 
meem



[nfs-discuss] [Fwd: Code review for CR # 6762222]

2008-11-19 Thread Mahesh Siddheshwar
Peter Memishian wrote:
>  > Need a quick code review for:
>  > 
>  > 676* lockd burns cpu cycles, nfs pathologically slow
>  > 
>  > As has been discussed, the immediate approach is to enforce flow
>  > control in the RPC client side dispatch routines --
>  > clnt_dispatch_send() and clnt_clts_dispatch_send(). In case the
>  > downstream is flow controlled, the error recovery is straightforward
>  > and you rely on the upper (rfscall) layers to retry the call -- a
>  > mechanism that already exists currently to take care of other errors.
>  > 
>  > webrev:
>  > http://cr.opensolaris.org/~maheshvs/676-webrev/
>
> I haven't looked over everything in detail yet, but at least in
> clnt_dispatch_send(), it'd be simpler to check canput() earlier in the
> function, before you've done anything you'd have to undo later.  This is
> OK because canput() is already not atomic with respect to put(), and
> it's fine if you go a couple of messages over QFULL.

Meem, yes I thought of that, and didn't really like the undo work being
done later on.  But I reluctantly fell on the side of keeping the canput()
and put() as close to each other as possible, out of a mixture of guilt
and paranoia.. ;) 
Your comments helped me sway the other way. (I also think the QFULL case
will be a special case anyway, as in the lockd -> statd interaction,
but that's beside the point.) 

I've changed the code now.. and the new webrev's at:

http://cr.opensolaris.org/~maheshvs/676-webrev2/

Thanks,
Mahesh

>  > * External links:
>  > CR: http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=676
>  > Evaluation for the CR: http://cr.opensolaris.org/~maheshvs/676-eval