[nfs-discuss] Code review request for 6755735
Hello,
thanks for the comments.
> I would suggest adding your analysis (i.e., the Evaluation section of
> the bug report) to code reviews you sent out to OpenSolaris. It
> helps understand why you made changes.
Ok, here is my evaluation as is in the bug report:
-
In /onnv-clone/usr/src/cmd/fs.d/nfs/mount/mount.c, we have
1384 static struct netbuf *
1385 get_the_addr(char *hostname, ulong_t prog, ulong_t vers,
...
1388 {
...
1495 if ((get_pubfh == TRUE) && (vers != NFS_V4)) {
...
1575 if (vers == NFS_VERSION) {
...
1596 } else {
1597 WNL_LOOKUP3args arg;
1598 WNL_LOOKUP3res res;
...
1615 fh3p->fh3_length =
1616 res.WNL_LOOKUP3res_u.res_ok.object.data.data_len;
1617 memcpy(fh3p->fh3_u.data,
1618 &res.WNL_LOOKUP3res_u.res_ok.object.data.data_val,
1619 fh3p->fh3_length);
...
The & at the beginning of 1618 doesn't seem right: the struct is defined as
struct wnl_fh3 {
struct {
u_int data_len;
char *data_val;
} data;
};
The bug is not reproducible on 10, which makes perfect sense, since we don't
have the '&' in s10 source.
-
> I have some questions:
>
> 1) What is the root cause here?
The root cause is a typo that had been done while preparing the
Nevada version of the fix for 6710019.
> Not this line of code, but why was this line of code changed
> in Nevada and not S10?
This line has been changed both in Nevada and S10, but this typo
has been done only in Nevada.
> Or was S10 somehow fixed earlier?
> If so, then something is broken because the fix should have occurred in
> Nevada first.
The fix came into Nevada at the end of September and the S10 fix
had been put back in November. I can only speculate why the typo is
present in Nevada and not in S10---a reasonable scenario is that the
S10 fix had been prepared in order to resolve a customer escalation,
then the Nevada fix was prepared (including the typo) and put back,
and after the soak period the original S10 fix was put back.
> 2) Why does the client have to be snv_100 or greater?
Because this is when the fix for 6710019 was integrated.
> Thanks,
> Tom
Thanks
Petr
[nfs-discuss] Code review request for 6755735
On Wed, 03 Dec 2008 14:55:43 +0100, Petr Skovron wrote: > I would like to get a review of the following fix in nfs in the > userspace mount helper: > > 6755735 mounting public filehandle with NFSv3 got ESTALE > http://cr.opensolaris.org/~xofon/6755735/ > > Steps to reproduce the problem: > > on the server: > root at eschaton:~# mkdir /foo > root at eschaton:~# share -F nfs -o rw,log=stcwnfs,public /foo > > then on the client: > root at vha-v65xd:~# mount -o vers=3,public eschaton.czech:/foo /mnt > nfs mount: mount: /mnt: Stale NFS file handle > > The problem occurs if client is >=snv100. looks good to me. fortunately this nit did not got carried over in the s10u7 backport of 6710019 ;-) --- frankB
[nfs-discuss] Code review request for 6755735
Hello, I would like to get a review of the following fix in nfs in the userspace mount helper: 6755735 mounting public filehandle with NFSv3 got ESTALE http://cr.opensolaris.org/~xofon/6755735/ Steps to reproduce the problem: on the server: root at eschaton:~# mkdir /foo root at eschaton:~# share -F nfs -o rw,log=stcwnfs,public /foo then on the client: root at vha-v65xd:~# mount -o vers=3,public eschaton.czech:/foo /mnt nfs mount: mount: /mnt: Stale NFS file handle The problem occurs if client is >=snv100. Thanks Petr
[nfs-discuss] Code review request for 6755735
Petr Skovron wrote: > Hello, > > thanks for the comments. > > I approve of the change.
[nfs-discuss] Code review request for 6755735
Petr, I would suggest adding your analysis (i.e., the Evaluation section of the bug report) to code reviews you sent out to OpenSolaris. It helps understand why you made changes. I have some questions: 1) What is the root cause here? Not this line of code, but why was this line of code changed in Nevada and not S10? I.e., was there a change after the branch to Nevada? If so, then you should look at that changeset to try and understand what the author was trying to do here. Another way to look at it, is that this mistake may have been done elsewhere. Or was S10 somehow fixed earlier? If so, then something is broken because the fix should have occurred in Nevada first. 2) Why does the client have to be snv_100 or greater? Thanks, Tom
