Re: ffs_truncate3 panics

2018-08-17 Thread Rick Macklem
Just fyi, I have committed r337962 to head, which stops the pNFS service
from creating non-zero length empty files.
Since these files were the ones causing the "ffs_truncate3" panics, the panics
should no longer occur.

I will take a closer look at some point to see if I can spot why the panic()s
occur, and I can easily revert this patch on my test setup if others have
suggested patches to try related to this.

Thanks for your help with this, rick

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: ffs_truncate3 panics

2018-08-12 Thread Rick Macklem
Konstantin Belousov wrote:
[stuff snipped]
>Problem with this buffer is that BX_ALTDATA bit is not set.
>This is the reason why vinvalbuf(V_ALT) skips it.
[more stuff snipped]
>The vnode is exclusively locked. Other thread must not be able to
>instantiate a buffer under us.
That's what I thought, but I wasn't sure that UFS never did anything to the
buffers without the vnode lock.
[more stuff snipped]
>This is the patch that I posted long time ago.  It is obviously related
>to missed BX_ALTDATA.  Can you add this patch to your kernel ?
>
>diff --git a/sys/ufs/ffs/ffs_balloc.c b/sys/ufs/ffs/ffs_balloc.c
>index 552c295753d..6d89a229ea7 100644
>--- a/sys/ufs/ffs/ffs_balloc.c
>+++ b/sys/ufs/ffs/ffs_balloc.c
>@@ -682,8 +682,16 @@ ffs_balloc_ufs2(struct vnode *vp, off_t startoffset, int size,
>ffs_blkpref_ufs2(ip, lbn, (int)lbn,
>&dp->di_extb[0]), osize, nsize, flags,
>cred, &bp);
>-   if (error)
>+   if (error != 0) {
>+   /* getblk does truncation, if needed */
>+   bp = getblk(vp, -1 - lbn, osize, 0, 0,
>+   GB_NOCREAT);
>+   if (bp != NULL) {
>+   bp->b_xflags |= BX_ALTDATA;
>+   brelse(bp);
>+   }
>return (error);
>+   }
>bp->b_xflags |= BX_ALTDATA;
>if (DOINGSOFTDEP(vp))
>softdep_setup_allocext(ip, lbn,
>@@ -699,8 +707,17 @@ ffs_balloc_ufs2(struct vnode *vp, off_t startoffset, int size,
>error = ffs_alloc(ip, lbn,
>   ffs_blkpref_ufs2(ip, lbn, (int)lbn, 
> &dp->di_extb[0]),
>   nsize, flags, cred, &newb);
>-   if (error)
>+   if (error != 0) {
>+   bp = getblk(vp, -1 - lbn, nsize, 0, 0,
>+   GB_NOCREAT);
>+   if (bp != NULL) {
>+   bp->b_xflags |= BX_ALTDATA;
>+   bp->b_flags |= B_RELBUF | B_INVAL;
>+   bp->b_flags &= ~B_ASYNC;
>+   brelse(bp);
>+   }
>return (error);
>+   }
>bp = getblk(vp, -1 - lbn, nsize, 0, 0, gbflags);
>bp->b_blkno = fsbtodb(fs, newb);
>bp->b_xflags |= BX_ALTDATA;

I don't think this patch helped. I still get printf()s with BX_ALTDATA clear in
b_xflags with it applied. I haven't gotten one that would cause the panic yet,
but the patch didn't make the BX_ALTDATA flag get set.

However, I have narrowed down how the ones that cause a panic() occur.
Turns out I was wrong when I said di_size == 0 for all these files.
They don't store any data, but if an application does a truncate(2) with
length > 0, the di_size does get set non-zero.
It is when one of these files hits ffs_truncate() with the extended attribute
buffer on it that the panic() happens.
(Most of them have di_size == 0 and return from the function in the code block
 that starts with "if (ip->i_size == length)" at around line #299, before the
 panic() check.)
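
The truncate(2) behaviour described above can be seen from userland. The
following is only a minimal sketch, not code from the pNFS server: the function
name and the /tmp template are made up for illustration. It creates an empty
scratch file, changes its length without ever writing data, and reports the
size that fstat(2) then returns.

```c
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a scratch file, set its length with
 * ftruncate(2) and return the st_size that fstat(2) reports afterwards.
 * Returns -1 on any error.  No write(2) is ever issued, so the file
 * holds no data of its own even when the reported size is non-zero.
 */
off_t
size_after_truncate(off_t length)
{
	char path[] = "/tmp/trunc_demo.XXXXXX";	/* arbitrary template */
	struct stat sb;
	off_t result = -1;
	int fd;

	fd = mkstemp(path);
	if (fd == -1)
		return (-1);
	if (ftruncate(fd, length) == 0 && fstat(fd, &sb) == 0)
		result = sb.st_size;
	(void)close(fd);
	(void)unlink(path);
	return (result);
}
```

Calling size_after_truncate(4096) should report 4096 even though nothing was
written, which is the "empty but non-zero length" state discussed here.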

rick


Re: ffs_truncate3 panics

2018-08-11 Thread Konstantin Belousov
On Sat, Aug 11, 2018 at 12:05:25PM +, Rick Macklem wrote:
> Konstantin Belousov wrote:
> >On Thu, Aug 09, 2018 at 08:38:50PM +, Rick Macklem wrote:
> >> >BTW, does NFS server use extended attributes ?  What for ?  Can you, 
> >> >please,
> >> >point out the code which does this ?
> >> For the pNFS service, there are two system namespace extended attributes 
> >> for
> >> each file stored on the service.
> >> pnfsd.dsfile - Stores where the data for the file is. Can be displayed by 
> >> the
> >>  pnfsdsfile(8) command.
> >>
> >> pnfsd.dsattr - Cached attributes that change when a file is written (size, 
> >> mtime,
> >> change) so that the MDS doesn't have to do a Getattr on the data server 
> >> for every client Getattr.
> >>
> >
> >My reading of the nfsd code + ffs extattr handling reminds me that you
> >already reported this issue some time ago.  I suspected ufs_balloc() at
> >that time.
> Yes. I had almost forgotten about them, because I have been testing with a
> couple of machines (not big, but amd64 with a few Gbytes of RAM) and they
> never hit the panic(). Recently, I've been using the 256Mbyte i386 and started
> seeing them again.
> 
> >Now I think that the situation with the stray buffers hanging on the
> >queue is legitimate, ffs_extread() might create such buffer and release
> >it to a clean queue, then removal of the file would see inode with no
> >allocated ext blocks but with the buffer.
> >
> >I think the easiest way to handle it is to always flush buffers and pages
> >in the ext attr range, regardless of the number of allocated ext blocks.
> >Patch below was not tested.
> [patch deleted for brevity]
> Well, the above sounds reasonable, but the patch didn't help.
> Here's a small portion of the log a test run last night.
> - First, a couple of things about the printf()s. When they start with 
> "CL=",
>   the printf() is at the start of ffs_truncate(). "" is a static counter 
> of calls to
>   ffs_truncate(), so "same value" indicates same call.
> 
> 
> CL=31816 flags=0xc00 vtyp=1 bodirty=0 boclean=1 diextsiz=320
> buf at 0x429f260
> b_flags = 0x20001020, b_xflags=0x2, b_vflags=0x0
> b_error = 0, b_bufsize = 4096, b_bcount = 4096, b_resid = 0
> b_bufobj = (0xfa3f734), b_data = 0x4c9, b_blkno = -1, b_lblkno = -1, 
> b_dep = 0
> b_kvabase = 0x4c9, b_kvasize = 32768
> 
> CL=34593 flags=0xc00 vtyp=1 bodirty=0 boclean=1 diextsiz=320
> buf at 0x429deb0
> b_flags = 0x20001020, b_xflags=0x2, b_vflags=0x0
> b_error = 0, b_bufsize = 4096, b_bcount = 4096, b_resid = 0
> b_bufobj = (0xfd3da94), b_data = 0x570, b_blkno = -1, b_lblkno = -1, 
> b_dep = 0
> b_kvabase = 0x570, b_kvasize = 32768
> 
> FFST3=34593 vtyp=1 bodirty=0 boclean=1
> buf at 0x429deb0
> b_flags = 0x20001020, b_xflags=0x2, b_vflags=0x0
> b_error = 0, b_bufsize = 4096, b_bcount = 4096, b_resid = 0
> b_bufobj = (0xfd3da94), b_data = 0x570, b_blkno = -1, b_lblkno = -1, 
> b_dep = 0
> b_kvabase = 0x570, b_kvasize = 32768
Problem with this buffer is that BX_ALTDATA bit is not set.
This is the reason why vinvalbuf(V_ALT) skips it.
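
To make the skip concrete, here is a small userland model of the filter, as a
sketch only. The flag values are written from memory of sys/buf.h and
sys/vnode.h and should be checked against the tree in use; flush_skips() is a
hypothetical name, and the condition is modelled on the check in the buffer
flush loop rather than copied from it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values; verify against sys/buf.h and sys/vnode.h. */
#define	BX_VNDIRTY	0x00000001u	/* on vnode dirty list */
#define	BX_VNCLEAN	0x00000002u	/* on vnode clean list */
#define	BX_ALTDATA	0x00000040u	/* holds extended data */
#define	V_ALT		0x0002		/* invalidate only alternate bufs */
#define	V_NORMAL	0x0004		/* invalidate only regular bufs */

/*
 * Hypothetical model: returns true when an invalidation pass called with
 * the given V_* flags would leave a buffer with these b_xflags alone.
 */
bool
flush_skips(int flags, uint32_t b_xflags)
{
	bool is_alt = (b_xflags & BX_ALTDATA) != 0;

	if ((flags & V_NORMAL) != 0 && is_alt)
		return (true);
	if ((flags & V_ALT) != 0 && !is_alt)
		return (true);
	return (false);
}
```

With the b_xflags=0x2 seen in the log (only the clean-list bit under the
assumed values), flush_skips(V_ALT, 0x2) is true, so the buffer survives the
V_ALT pass.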

> 
> So, the first one is what typically happens and there would be no panic().
>  The second/third would be a panic(), since the one that starts with "FFST3"
> is a printf() that replaces the panic() call.
> - Looking at the second/third, the number at the beginning is the same, so it 
> is
>   the same call, but for some reason, between the start of the function and
>   where the ffs_truncate3 panic() test is, di_extsize has been set to 0, but 
> the
>   buffer is still there (or has been re-created there by another thread?).
> 
> Looking at the code, I can't see how this could happen, since there is a 
> vinvalbuf()
> call after the only place in the code that sets di_extsize == 0, from what I 
> can see?
> I am going to add printf()s after the vinvalbuf() calls, to make sure they are
> happening and getting rid of the buffer.
> 
> If another thread could somehow (re)create the buffer concurrently with the
> ffs_truncate() call, that would explain it, I think?
The vnode is exclusively locked. Another thread must not be able to
instantiate a buffer under us.

> 
> Just a wild guess, but I suspect softdep_slowdown() is flipping, due to the 
> small
> size of the machine and this makes the behaviour of ffs_truncate() confusing.

This is the patch that I posted a long time ago.  It is obviously related
to the missing BX_ALTDATA.  Can you add this patch to your kernel?

diff --git a/sys/ufs/ffs/ffs_balloc.c b/sys/ufs/ffs/ffs_balloc.c
index 552c295753d..6d89a229ea7 100644
--- a/sys/ufs/ffs/ffs_balloc.c
+++ b/sys/ufs/ffs/ffs_balloc.c
@@ -682,8 +682,16 @@ ffs_balloc_ufs2(struct vnode *vp, off_t startoffset, int size,
ffs_blkpref_ufs2(ip, lbn, (int)lbn,
&dp->di_extb[0]), osize, nsize, flags,
cred, &bp);
-   if (error)
+   if (erro

Re: ffs_truncate3 panics

2018-08-11 Thread Rick Macklem
Konstantin Belousov wrote:
>On Thu, Aug 09, 2018 at 08:38:50PM +, Rick Macklem wrote:
>> >BTW, does NFS server use extended attributes ?  What for ?  Can you, please,
>> >point out the code which does this ?
>> For the pNFS service, there are two system namespace extended attributes for
>> each file stored on the service.
>> pnfsd.dsfile - Stores where the data for the file is. Can be displayed by the
>>  pnfsdsfile(8) command.
>>
>> pnfsd.dsattr - Cached attributes that change when a file is written (size, 
>> mtime,
>> change) so that the MDS doesn't have to do a Getattr on the data server for 
>> every client Getattr.
>>
>
>My reading of the nfsd code + ffs extattr handling reminds me that you
>already reported this issue some time ago.  I suspected ufs_balloc() at
>that time.
Yes. I had almost forgotten about them, because I have been testing with a
couple of machines (not big, but amd64 with a few Gbytes of RAM) and they
never hit the panic(). Recently, I've been using the 256Mbyte i386 and started
seeing them again.

>Now I think that the situation with the stray buffers hanging on the
>queue is legitimate, ffs_extread() might create such buffer and release
>it to a clean queue, then removal of the file would see inode with no
>allocated ext blocks but with the buffer.
>
>I think the easiest way to handle it is to always flush buffers and pages
>in the ext attr range, regardless of the number of allocated ext blocks.
>Patch below was not tested.
[patch deleted for brevity]
Well, the above sounds reasonable, but the patch didn't help.
Here's a small portion of the log from a test run last night.
- First, a couple of things about the printf()s. When they start with "CL=",
  the printf() is at the start of ffs_truncate(). "" is a static counter of
  calls to ffs_truncate(), so "same value" indicates same call.


CL=31816 flags=0xc00 vtyp=1 bodirty=0 boclean=1 diextsiz=320
buf at 0x429f260
b_flags = 0x20001020, b_xflags=0x2, b_vflags=0x0
b_error = 0, b_bufsize = 4096, b_bcount = 4096, b_resid = 0
b_bufobj = (0xfa3f734), b_data = 0x4c9, b_blkno = -1, b_lblkno = -1, b_dep = 0
b_kvabase = 0x4c9, b_kvasize = 32768

CL=34593 flags=0xc00 vtyp=1 bodirty=0 boclean=1 diextsiz=320
buf at 0x429deb0
b_flags = 0x20001020, b_xflags=0x2, b_vflags=0x0
b_error = 0, b_bufsize = 4096, b_bcount = 4096, b_resid = 0
b_bufobj = (0xfd3da94), b_data = 0x570, b_blkno = -1, b_lblkno = -1, b_dep = 0
b_kvabase = 0x570, b_kvasize = 32768

FFST3=34593 vtyp=1 bodirty=0 boclean=1
buf at 0x429deb0
b_flags = 0x20001020, b_xflags=0x2, b_vflags=0x0
b_error = 0, b_bufsize = 4096, b_bcount = 4096, b_resid = 0
b_bufobj = (0xfd3da94), b_data = 0x570, b_blkno = -1, b_lblkno = -1, b_dep = 0
b_kvabase = 0x570, b_kvasize = 32768

So, the first one is what typically happens and there would be no panic().
 The second/third would be a panic(), since the one that starts with "FFST3"
is a printf() that replaces the panic() call.
- Looking at the second/third, the number at the beginning is the same, so it is
  the same call, but for some reason, between the start of the function and
  where the ffs_truncate3 panic() test is, di_extsize has been set to 0, but the
  buffer is still there (or has been re-created there by another thread?).

Looking at the code, I can't see how this could happen, since there is a
vinvalbuf() call after the only place in the code that sets di_extsize == 0,
from what I can see?
I am going to add printf()s after the vinvalbuf() calls, to make sure they are
happening and getting rid of the buffer.

If another thread could somehow (re)create the buffer concurrently with the
ffs_truncate() call, that would explain it, I think?

Just a wild guess, but I suspect softdep_slowdown() is flipping, due to the
small size of the machine, and this makes the behaviour of ffs_truncate()
confusing.

I'll post again when I have more info.
Thanks for looking at it, rick



Re: ffs_truncate3 panics

2018-08-10 Thread Konstantin Belousov
On Thu, Aug 09, 2018 at 08:38:50PM +, Rick Macklem wrote:
> >BTW, does NFS server use extended attributes ?  What for ?  Can you, please,
> >point out the code which does this ?
> For the pNFS service, there are two system namespace extended attributes for
> each file stored on the service.
> pnfsd.dsfile - Stores where the data for the file is. Can be displayed by the
>  pnfsdsfile(8) command.
> 
> pnfsd.dsattr - Cached attributes that change when a file is written (size, 
> mtime,
> change) so that the MDS doesn't have to do a Getattr on the data server for 
> every client Getattr.
> 

My reading of the nfsd code + ffs extattr handling reminds me that you
already reported this issue some time ago.  I suspected ufs_balloc() at
that time.

Now I think that the situation with the stray buffers hanging on the
queue is legitimate: ffs_extread() might create such a buffer and release
it to a clean queue, and removal of the file would then see an inode with
no allocated ext blocks but with the buffer.

I think the easiest way to handle it is to always flush buffers and pages
in the ext attr range, regardless of the number of allocated ext blocks.
Patch below was not tested.
diff --git a/sys/ufs/ffs/ffs_inode.c b/sys/ufs/ffs/ffs_inode.c
index 3cf58558c18..2ffd861f3b4 100644
--- a/sys/ufs/ffs/ffs_inode.c
+++ b/sys/ufs/ffs/ffs_inode.c
@@ -244,22 +244,19 @@ ffs_truncate(vp, length, flags, cred)
extblocks = btodb(fragroundup(fs, ip->i_din2->di_extsize));
datablocks -= extblocks;
}
-   if ((flags & IO_EXT) && extblocks > 0) {
+   if ((flags & IO_EXT) != 0) {
if (length != 0)
panic("ffs_truncate: partial trunc of extdata");
if (softdeptrunc || journaltrunc) {
if ((flags & IO_NORMAL) == 0)
goto extclean;
needextclean = 1;
-   } else {
-   if ((error = ffs_syncvnode(vp, MNT_WAIT, 0)) != 0)
-   return (error);
+   } else if ((error = ffs_syncvnode(vp, MNT_WAIT, 0)) != 0)
+   return (error);
+   if (extblocks > 0) {
 #ifdef QUOTA
(void) chkdq(ip, -extblocks, NOCRED, 0);
 #endif
-   vinvalbuf(vp, V_ALT, 0, 0);
-   vn_pages_remove(vp,
-   OFF_TO_IDX(lblktosize(fs, -extblocks)), 0);
osize = ip->i_din2->di_extsize;
ip->i_din2->di_blocks -= extblocks;
ip->i_din2->di_extsize = 0;
@@ -278,6 +275,8 @@ ffs_truncate(vp, length, flags, cred)
vp->v_type, NULL, SINGLETON);
}
}
+   vinvalbuf(vp, V_ALT, 0, 0);
+   vn_pages_remove(vp, OFF_TO_IDX(lblktosize(fs, -UFS_NXADDR)), 0);
}
if ((flags & IO_NORMAL) == 0)
return (0);
@@ -631,7 +630,10 @@ ffs_truncate(vp, length, flags, cred)
softdep_journal_freeblocks(ip, cred, length, IO_EXT);
else
softdep_setup_freeblocks(ip, length, IO_EXT);
-   return (ffs_update(vp, waitforupdate));
+   error = ffs_update(vp, waitforupdate);
+   vinvalbuf(vp, V_ALT, 0, 0);
+   vn_pages_remove(vp, OFF_TO_IDX(lblktosize(fs, -UFS_NXADDR)), 0);
+   return (error);
 }
 
 /*


Re: ffs_truncate3 panics

2018-08-10 Thread Rick Macklem
Konstantin Belousov wrote:
>On Thu, Aug 09, 2018 at 08:38:50PM +, Rick Macklem wrote:
>> I did notice that my code locks the vnode first and then calls 
>> vn_start_write()
>>  for the vn_extattr_set() calls, whereas the syscall code locks the vnode 
>> after the vn_start_write() call.
>>
>> Does that matter?
>
>Yes, it matters.  It would cause deadlocks when the corresponding filesystem
>is suspended in parallel with NFSD activities.  vn_start_write() is a lock,
>and the correct lock order is vn_start_write()->vnode lock.
Ok, thanks, I'll work on a patch to fix this LOR.

rick



Re: ffs_truncate3 panics

2018-08-09 Thread Konstantin Belousov
On Thu, Aug 09, 2018 at 08:38:50PM +, Rick Macklem wrote:
> I did notice that my code locks the vnode first and then calls 
> vn_start_write()
>  for the vn_extattr_set() calls, whereas the syscall code locks the vnode 
> after the vn_start_write() call.
> 
> Does that matter?

Yes, it matters.  It would cause deadlocks when the corresponding filesystem
is suspended in parallel with NFSD activities.  vn_start_write() is a lock,
and the correct lock order is vn_start_write()->vnode lock.
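
The ordering rule can be illustrated with a tiny userland rank checker. This is
a sketch under loud assumptions: the ranks, the tracker structure and
tracker_acquire() are all invented for illustration and are not kernel
interfaces. It only models "the lower-ranked lock must be taken first".

```c
#include <stdbool.h>

/* Hypothetical ranks: a lower rank must be acquired before a higher one. */
enum lock_rank {
	RANK_NONE = 0,
	RANK_START_WRITE = 1,	/* stands in for vn_start_write() */
	RANK_VNODE = 2		/* stands in for the vnode lock */
};

/* Records the highest rank a single thread currently holds. */
struct lock_tracker {
	enum lock_rank highest_held;
};

/*
 * Returns true and records the acquisition when taking `rank` respects
 * the order; returns false when it would be a lock order reversal.
 */
bool
tracker_acquire(struct lock_tracker *t, enum lock_rank rank)
{
	if (rank <= t->highest_held)
		return (false);
	t->highest_held = rank;
	return (true);
}
```

Acquiring RANK_START_WRITE and then RANK_VNODE succeeds. Acquiring RANK_VNODE
first makes the later RANK_START_WRITE request fail, which corresponds to the
reversed order described in the question.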


Re: ffs_truncate3 panics

2018-08-09 Thread Rick Macklem
Konstantin Belousov wrote:
[stuff snipped]
>I wrote:
>>
>> I can add printf()s anywhere you suggest, but I'm not sure how you would 
>> catch
>> this case sooner? (For example, I could print out di_extsize at the 
>> beginning of
>> ffs_truncate(), if that would help?)
>May be, add a loop at the beginning of ffs_truncate(), over all buffers
>on both clean and dirty queues, calculating number of buffers with
>b_lblkno < 0 and >= -UFS_NXADDR. Print some diagnostic if such buffer is
>detected but di_extsize is zero.
Ok, I can do that. These failures don't occur that often, so it might take a
while to get one.

>BTW, does NFS server use extended attributes ?  What for ?  Can you, please,
>point out the code which does this ?
For the pNFS service, there are two system namespace extended attributes for
each file stored on the service.
pnfsd.dsfile - Stores where the data for the file is. Can be displayed by the
 pnfsdsfile(8) command.

pnfsd.dsattr - Cached attributes that change when a file is written (size,
 mtime, change) so that the MDS doesn't have to do a Getattr on the data
 server for every client Getattr.


The code is in sys/fs/nfsserver/nfs_nfsdport.c and
sys/fs/nfsserver/nfs_nfsdserv.c. Just grep for vn_extattr to see the code.

I did notice that my code locks the vnode first and then calls vn_start_write()
for the vn_extattr_set() calls, whereas the syscall code locks the vnode after
the vn_start_write() call.

Does that matter?


Thanks, rick



Re: ffs_truncate3 panics

2018-08-09 Thread Rick Macklem
Rodney W. Grimes wrote:
[stuff snipped]
>It should be possible to design a set of VMs using bhyve, xen or one's
>favorite hypervisor/virtualization platform to do "more" pNFS testing.
>If you could provide a rough machine set needed to have a functional
>test bed, and what should be done to "test" for problems, I think it
>should be possible to get you some additional testers.
Well, I've posted this before, but it is easier now, since the code is in head
and the most recent FreeBSD-12 snapshots.

Short version is in "man pnfsserver".
Longer version is at http://people.freebsd.org/~rmacklem/pnfs-planb-setup.xtx

Two VMs would be the minimum, although 3-5 would be a better setup.
(However, you have to set up the service and then mount it and exercise the
 mount point, like you would any other NFS mount. This can't really be automated
 easily, although I typically do a "make" on something like a kernel source tree
 to exercise it.)

rick



Re: ffs_truncate3 panics

2018-08-09 Thread Rodney W. Grimes
> Konstantin Belousov wrote:
> [stuff snipped]
> >> >Can you print the only buffer on the clean queue when the panic occur ?
> >> ffst3 vtyp=1 bodirty=0 boclean=1
> >> buf at 0x428a110
> >> b_flags = 0x20001020, b_xflags=0x2, b_vflags=0x0
> >> b_error = 0, b_bufsize = 4096, b_bcount = 4096, b_resid = 0
> >> b_bufobj = (0xfd8ba94), b_data = 0x517, b_blkno = -1, b_lblkno = -1, 
> >> b_dep = 0
> >> b_kvabase = 0x517, b_kvasize = 32768
> >So the buffer was indeed for extended attrs, and never written to the disk.
> >I am quite interested what was the inode content prior to the truncation,
> >esp. the di_extsize.
> Just in case it wasn't clear, this buffer is on the clean list and not the 
> dirty one.
> (Does this mean it somehow got onto the "clean list" without being written to 
> disk?)
> 
> >Could you try to formulate a way to reproduce the panic so that Peter
> >can recreate it, please ?
> I doubt it. It would require him doing a pNFS setup with multiple systems.
> (At least that is the only way I reproduce it and I sometimes go a week of 
> testing
>  before I see them.)
> It would be great to have more testers for the pNFS server stuff, but I doubt 
> it
> would fit into Peter's setup?

It should be possible to design a set of VMs using bhyve, xen or one's
favorite hypervisor/virtualization platform to do "more" pNFS testing.
If you could provide a rough machine set needed to have a functional
test bed, and what should be done to "test" for problems, I think it
should be possible to get you some additional testers.

Thanks,

> I can add printf()s anywhere you suggest, but I'm not sure how you would catch
> this case sooner? (For example, I could print out di_extsize at the beginning 
> of
> ffs_truncate(), if that would help?)
> 
> rick
> [more stuff snipped]

-- 
Rod Grimes rgri...@freebsd.org


Re: ffs_truncate3 panics

2018-08-09 Thread Konstantin Belousov
On Thu, Aug 09, 2018 at 01:39:20AM +, Rick Macklem wrote:
> Konstantin Belousov wrote:
> [stuff snipped]
> >> >Can you print the only buffer on the clean queue when the panic occur ?
> >> ffst3 vtyp=1 bodirty=0 boclean=1
> >> buf at 0x428a110
> >> b_flags = 0x20001020, b_xflags=0x2, b_vflags=0x0
> >> b_error = 0, b_bufsize = 4096, b_bcount = 4096, b_resid = 0
> >> b_bufobj = (0xfd8ba94), b_data = 0x517, b_blkno = -1, b_lblkno = -1, 
> >> b_dep = 0
> >> b_kvabase = 0x517, b_kvasize = 32768
> >So the buffer was indeed for extended attrs, and never written to the disk.
> >I am quite interested what was the inode content prior to the truncation,
> >esp. the di_extsize.
> Just in case it wasn't clear, this buffer is on the clean list and not the 
> dirty one.
> (Does this mean it somehow got onto the "clean list" without being written to 
> disk?)
> 
> >Could you try to formulate a way to reproduce the panic so that Peter
> >can recreate it, please ?
> I doubt it. It would require him doing a pNFS setup with multiple systems.
> (At least that is the only way I reproduce it and I sometimes go a week of 
> testing
>  before I see them.)
> It would be great to have more testers for the pNFS server stuff, but I doubt 
> it
> would fit into Peter's setup?
> 
> I can add printf()s anywhere you suggest, but I'm not sure how you would catch
> this case sooner? (For example, I could print out di_extsize at the beginning 
> of
> ffs_truncate(), if that would help?)
Maybe add a loop at the beginning of ffs_truncate(), over all buffers
on both clean and dirty queues, counting the buffers with
b_lblkno < 0 and >= -UFS_NXADDR. Print some diagnostic if such a buffer is
detected but di_extsize is zero.
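
The check suggested above can be modelled in userland as follows. This is a
sketch only: struct toy_buf and both function names are hypothetical stand-ins,
the real loop would walk the bufobj clean and dirty lists under the appropriate
lock, and the UFS_NXADDR value of 2 is an assumption to verify against the
tree.

```c
#include <stddef.h>
#include <stdint.h>

#define	UFS_NXADDR	2	/* assumed number of ext block slots in a UFS2 inode */

/* Toy stand-in for struct buf; only the logical block number is modelled. */
struct toy_buf {
	int64_t b_lblkno;
};

/* Count the buffers whose b_lblkno lies in the ext attr range [-UFS_NXADDR, -1]. */
size_t
count_ext_bufs(const struct toy_buf *bufs, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (bufs[i].b_lblkno < 0 && bufs[i].b_lblkno >= -UFS_NXADDR)
			count++;
	}
	return (count);
}

/* Non-zero when an ext attr buffer exists on either queue although di_extsize is 0. */
int
stray_ext_buf_detected(const struct toy_buf *clean, size_t nclean,
    const struct toy_buf *dirty, size_t ndirty, uint32_t di_extsize)
{
	size_t total;

	total = count_ext_bufs(clean, nclean) + count_ext_bufs(dirty, ndirty);
	return (di_extsize == 0 && total > 0);
}
```

A single clean buffer with b_lblkno = -1 and di_extsize == 0, as in the
reported dump, is the case this is meant to flag.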

BTW, does the NFS server use extended attributes?  What for?  Can you please
point out the code which does this?


Re: ffs_truncate3 panics

2018-08-08 Thread Rick Macklem
Konstantin Belousov wrote:
[stuff snipped]
>> >Can you print the only buffer on the clean queue when the panic occur ?
>> ffst3 vtyp=1 bodirty=0 boclean=1
>> buf at 0x428a110
>> b_flags = 0x20001020, b_xflags=0x2, b_vflags=0x0
>> b_error = 0, b_bufsize = 4096, b_bcount = 4096, b_resid = 0
>> b_bufobj = (0xfd8ba94), b_data = 0x517, b_blkno = -1, b_lblkno = -1, 
>> b_dep = 0
>> b_kvabase = 0x517, b_kvasize = 32768
>So the buffer was indeed for extended attrs, and never written to the disk.
>I am quite interested what was the inode content prior to the truncation,
>esp. the di_extsize.
Just in case it wasn't clear, this buffer is on the clean list and not the
dirty one.
(Does this mean it somehow got onto the "clean list" without being written to
disk?)

>Could you try to formulate a way to reproduce the panic so that Peter
>can recreate it, please ?
I doubt it. It would require him doing a pNFS setup with multiple systems.
(At least that is the only way I reproduce it and I sometimes go a week of
 testing before I see them.)
It would be great to have more testers for the pNFS server stuff, but I doubt it
would fit into Peter's setup?

I can add printf()s anywhere you suggest, but I'm not sure how you would catch
this case sooner? (For example, I could print out di_extsize at the beginning of
ffs_truncate(), if that would help?)

rick
[more stuff snipped]


Re: ffs_truncate3 panics

2018-08-08 Thread Konstantin Belousov
On Wed, Aug 08, 2018 at 12:30:54PM +, Rick Macklem wrote:
> Konstantin Belousov wrote:
> >On Tue, Aug 07, 2018 at 12:28:33PM +, Rick Macklem wrote:
> >> Hi,
> >>
> >> During testing of the pNFS server I get an ffs_truncate3 panic every once 
> >> in a while.
> >> A few things that might be relevant:
> >> - Seems to happen more often when soft update journaling is enabled, but 
> >> will
> >>   happen when it is disabled.
> >> - Normally happens when a fairly large subtree of the file system is being 
> >> removed.
> - Oh, and this is an old i386 with 256Mbytes (not one of them new fangled 
> computers,
>where memory is in Gbytes;-)
> 
> >>
> >> These file systems are a bit odd, since all the regular files in them are 
> >> empty but
> >> have extended attributes that are accessed during the subtree removal. (The
> >> extended attributes tell the server where the data files are.)
> >>
> >> I replaced the panic() with a printf() and every time the printf() 
> >> happens...
> >> bo->bo_dirty.bv_cnt == 0 and bo->bo_clean.bv_cnt == 1.
> >> After one of these printf()s, the system continues to run ok. When the file
> >> system is fsck'd after this has occurred, it passes fine and I haven't 
> >> seen and
> >> indication of file system corruption after running with this file system 
> >> for
> >> quite a while after the printf()s first occurred.
> >The lack of corruption is, most likely, because the files are removed.
> >Would the files truncated to zero length and then extended, I am almost
> >sure that a corruption occur.
> >
> >Can you print the only buffer on the clean queue when the panic occur ?
> ffst3 vtyp=1 bodirty=0 boclean=1
> buf at 0x428a110
> b_flags = 0x20001020, b_xflags=0x2, b_vflags=0x0
> b_error = 0, b_bufsize = 4096, b_bcount = 4096, b_resid = 0
> b_bufobj = (0xfd8ba94), b_data = 0x517, b_blkno = -1, b_lblkno = -1, 
> b_dep = 0
> b_kvabase = 0x517, b_kvasize = 32768
So the buffer was indeed for extended attrs, and never written to the disk.
I am quite interested in what the inode content was prior to the truncation,
esp. the di_extsize.
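
For readers wondering why b_lblkno = -1 points at extended attributes: the
ffs_balloc_ufs2() code discussed elsewhere in this thread looks up ext buffers
with getblk(vp, -1 - lbn, ...), so ext block 0 lives at logical block -1. A
minimal sketch of that mapping follows; the helper names are hypothetical.

```c
#include <stdint.h>

/* Ext attr block index -> buffer logical block number (-1 - lbn). */
int64_t
ext_lbn_to_buf_lblkno(int64_t ext_lbn)
{
	return (-1 - ext_lbn);
}

/* Inverse mapping: buffer logical block number -> ext attr block index. */
int64_t
buf_lblkno_to_ext_lbn(int64_t b_lblkno)
{
	return (-1 - b_lblkno);
}
```

Under this mapping the dumped buffer with b_lblkno = -1 corresponds to the
first ext attr block.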

Could you try to formulate a way to reproduce the panic so that Peter
can recreate it, please ?

> 
> >Also, it is interesting to know the initial length of the file.
> Since they are regular files, they are 0 length. (Just inodes with extended 
> attributes.)
> 
> >>
> >> Since the panic() only occurs when "options INVARIANTS" is enabled and I 
> >> don't
> >> see evidence of file system corruption, I'm wondering if this panic() is 
> >> valid and
> >> needed?
> 
> rick


Re: ffs_truncate3 panics

2018-08-08 Thread Conrad Meyer
On Wed, Aug 8, 2018 at 5:30 AM, Rick Macklem wrote:
> - Oh, and this is an old i386 with 256Mbytes (not one of them new fangled 
> computers,
>where memory is in Gbytes;-)

Have you run memtest86+ recently?

Best,
Conrad


Re: ffs_truncate3 panics

2018-08-08 Thread Rick Macklem
Konstantin Belousov wrote:
>On Tue, Aug 07, 2018 at 12:28:33PM +, Rick Macklem wrote:
>> Hi,
>>
>> During testing of the pNFS server I get an ffs_truncate3 panic every once in 
>> a while.
>> A few things that might be relevant:
>> - Seems to happen more often when soft update journaling is enabled, but will
>>   happen when it is disabled.
>> - Normally happens when a fairly large subtree of the file system is being 
>> removed.
- Oh, and this is an old i386 with 256Mbytes (not one of them new fangled
  computers, where memory is in Gbytes;-)

>>
>> These file systems are a bit odd, since all the regular files in them are 
>> empty but
>> have extended attributes that are accessed during the subtree removal. (The
>> extended attributes tell the server where the data files are.)
>>
>> I replaced the panic() with a printf() and every time the printf() happens...
>> bo->bo_dirty.bv_cnt == 0 and bo->bo_clean.bv_cnt == 1.
>> After one of these printf()s, the system continues to run ok. When the file
>> system is fsck'd after this has occurred, it passes fine and I haven't seen 
>> and
>> indication of file system corruption after running with this file system for
>> quite a while after the printf()s first occurred.
>The lack of corruption is, most likely, because the files are removed.
>Would the files truncated to zero length and then extended, I am almost
>sure that a corruption occur.
>
>Can you print the only buffer on the clean queue when the panic occur ?
ffst3 vtyp=1 bodirty=0 boclean=1
buf at 0x428a110
b_flags = 0x20001020, b_xflags=0x2, b_vflags=0x0
b_error = 0, b_bufsize = 4096, b_bcount = 4096, b_resid = 0
b_bufobj = (0xfd8ba94), b_data = 0x517, b_blkno = -1, b_lblkno = -1, b_dep = 0
b_kvabase = 0x517, b_kvasize = 32768

>Also, it is interesting to know the initial length of the file.
Since they are regular files, they are 0 length. (Just inodes with extended
attributes.)

>>
>> Since the panic() only occurs when "options INVARIANTS" is enabled and I 
>> don't
>> see evidence of file system corruption, I'm wondering if this panic() is 
>> valid and
>> needed?

rick


Re: ffs_truncate3 panics

2018-08-07 Thread Konstantin Belousov
On Tue, Aug 07, 2018 at 12:28:33PM +, Rick Macklem wrote:
> Hi,
> 
> During testing of the pNFS server I get an ffs_truncate3 panic every once in 
> a while.
> A few things that might be relevant:
> - Seems to happen more often when soft update journaling is enabled, but will
>   happen when it is disabled.
> - Normally happens when a fairly large subtree of the file system is being 
> removed.
> 
> These file systems are a bit odd, since all the regular files in them are 
> empty but
> have extended attributes that are accessed during the subtree removal. (The
> extended attributes tell the server where the data files are.)
> 
> I replaced the panic() with a printf() and every time the printf() happens...
> bo->bo_dirty.bv_cnt == 0 and bo->bo_clean.bv_cnt == 1.
> After one of these printf()s, the system continues to run ok. When the file
> system is fsck'd after this has occurred, it passes fine and I haven't seen 
> and
> indication of file system corruption after running with this file system for
> quite a while after the printf()s first occurred.
The lack of corruption is, most likely, because the files are removed.
Were the files truncated to zero length and then extended, I am almost
sure that corruption would occur.

Can you print the only buffer on the clean queue when the panic occurs?
Also, it would be interesting to know the initial length of the file.

> 
> Since the panic() only occurs when "options INVARIANTS" is enabled and I don't
> see evidence of file system corruption, I'm wondering if this panic() is 
> valid and
> needed?
> 
> rick