Re: ZFS l2arc broken in 10.3

2016-10-13 Thread Peter

Pete French wrote:


> Ok, that's a bit worrying if true, but I can confirm that l2arc works fine
> under 10.3 on amd64, so what you say about cross-compiling might be true.
> I am taking an interest in this as I have just deployed a lot of machines
> which are going to be relying on l2arc working to get reasonable performance.


Sure, on my amd64 it also works fine. AFAIK such a missing cast is harmless
when compiling for 64-bit, because size_t is then 64 bits wide.


But in the meantime I was pointed to another detail: my source is from the
STABLE branch, and in 10.3-RELEASE the code is different. Obviously there were
recent changes, which explains why the problem has not been detected yet.

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: ZFS l2arc broken in 10.3

2016-10-13 Thread Andriy Gapon
On 12/10/2016 23:18, Peter wrote:
> Maybe this is an effect of cross-compiling i386 on amd64.

Yes, the problem is specific to 32-bit platforms where size_t is 32-bit.

> But anyway, as long as
> i386 is still supported, it should not happen.

Certainly.

> Now, my real concern is: if this really obvious ... made it undetected until
> 10.3, how many other missing typecasts are still in the code??

No need to be dramatic here. That particular piece of code is very new.
I committed it to head in April (r297848) and MFC-ed it even later.
Apparently no one else who uses a 32-bit system with L2ARC configured has had
a chance to run into the bug.

Thank you very much for discovering and analyzing the bug and providing a fix
for it!


-- 
Andriy Gapon


Re: ZFS l2arc broken in 10.3

2016-10-13 Thread Pete French

Ok, that's a bit worrying if true, but I can confirm that l2arc works fine
under 10.3 on amd64, so what you say about cross-compiling might be true.
I am taking an interest in this as I have just deployed a lot of machines
which are going to be relying on l2arc working to get reasonable performance.

-pete.



[fixed] ZFS l2arc broken in 10.3

2016-10-12 Thread Peter
sendbug seems not to work anymore; I end up on websites with marketing babble
and am finally asked to provide a login and password. :(
But my previous mail seems to have come back to me, so it seems I'm still
allowed to post here...


*** sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c.orig	Wed Oct 12 21:07:25 2016
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c	Wed Oct 12 21:46:16 2016
***************
*** 6508,6514 ****
  			 */
  			buf_sz = hdr->b_size;
  			align = (size_t)1 << dev->l2ad_vdev->vdev_ashift;
! 			buf_a_sz = P2ROUNDUP(buf_sz, align);
  
  			if ((write_asize + buf_a_sz) > target_sz) {
  				full = B_TRUE;
--- 6508,6514 ----
  			 */
  			buf_sz = hdr->b_size;
  			align = (size_t)1 << dev->l2ad_vdev->vdev_ashift;
! 			buf_a_sz = P2ROUNDUP_TYPED(buf_sz, align, uint64_t);
  
  			if ((write_asize + buf_a_sz) > target_sz) {
  				full = B_TRUE;
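A minimal standalone reproduction of what this one-line change fixes, as a sketch only. The two macros are local copies written to match the OpenSolaris `<sys/sysmacros.h>` definitions (check them against your own tree). `uint32_t` stands in for the 32-bit `size_t` of i386 so the effect shows up on any host with a 32-bit `int`, and `roundup_broken`/`roundup_fixed` are hypothetical names, not functions from arc.c.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Local copies written to match the OpenSolaris <sys/sysmacros.h>
 * definitions (an assumption; compare with your tree). Both macros
 * require a power-of-two alignment.
 */
#define P2ROUNDUP(x, align)             (-(-(x) & -(align)))
#define P2ROUNDUP_TYPED(x, align, type) (-(-(type)(x) & -(type)(align)))

/*
 * Hypothetical helper reproducing the i386 case: buf_sz is 64-bit, while
 * uint32_t stands in for a 32-bit size_t. -(align) wraps modulo 2^32 and
 * is only zero-extended afterwards for the &, so the upper 32 bits of the
 * rounding mask are lost.
 */
uint64_t
roundup_broken(uint64_t buf_sz, uint32_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (P2ROUNDUP(buf_sz, align));
}

/*
 * Hypothetical helper mirroring the fix: both operands are widened to
 * uint64_t before negation, so the mask keeps all 64 bits.
 */
uint64_t
roundup_fixed(uint64_t buf_sz, uint32_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (P2ROUNDUP_TYPED(buf_sz, align, uint64_t));
}
```

With buf_sz = 1536 and align = 512, `roundup_broken` returns 18446744069414585856, the same value Peter printed, while `roundup_fixed` returns 1536. On amd64, where `size_t` is already 64 bits, both forms agree.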

Re: ZFS l2arc broken in 10.3

2016-10-12 Thread Peter

Details:
After upgrading two machines from 9.3 to 10.3-STABLE, on one of them the
l2arc stays empty (capacity alloc = 0), although it is online and being
accessed. It worked well on 9.3.

I did the following tests:
 * Create a zpool on a stick with two volumes: one filesystem and one
   cache. The cache stays at alloc=0.
   Export it and move it to the other machine: the cache fills
   immediately.
   Move it back: the cache stays at alloc=0.
   -> This rules out all zpool/zfs get/set options, as they should
      travel with the pool.
 * Boot the GENERIC kernel: the l2arc stays at alloc=0.
   -> This rules out all my nonstandard kernel options.
 * Boot in single-user mode: the l2arc stays at alloc=0.
   -> This rules out all /etc/* config files.
 * Delete zpool.cache and reimport the pools: the l2arc stays at alloc=0.
 * Copy the /boot/loader.conf settings to the other machine: the l2arc
   still works there.

I could not think of any remaining place where this could come from
except the kernel code itself.
Looking there, I found these counters incrementing steadily every second:
  kstat.zfs.misc.arcstats.l2_write_buffer_list_iter: 50758
  kstat.zfs.misc.arcstats.l2_write_buffer_list_null_iter: 27121
  kstat.zfs.misc.arcstats.l2_write_buffer_bytes_scanned: 40589375488
But this counter was incrementing as well:
  kstat.zfs.misc.arcstats.l2_write_full: 14604

Then, after adding some printf()s to this code, I saw these values:
buf_sz = hdr->b_size;
align = (size_t)1 << dev->l2ad_vdev->vdev_ashift;
buf_a_sz = P2ROUNDUP(buf_sz, align);
if ((write_asize + buf_a_sz) > target_sz) {
    full = B_TRUE;
    mutex_exit(hash_lock);
    ARCSTAT_BUMP(arcstat_l2_write_full);
    break;
}

buf_sz      = 1536
align       = 512
buf_a_sz    = 18446744069414585856
write_asize = 0
target_sz   = 16777216

where buf_a_sz is obviously off by exactly (2^64 - 2^32); the correct
rounded-up size would be 1536.
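The bad value can be reproduced by hand. This worked arithmetic assumes the OpenSolaris definition `P2ROUNDUP(x, align) = (-(-(x) & -(align)))`, that buf_sz ($x = 1536$) is a 64-bit unsigned type, and that align ($a = 512$) is a 32-bit `size_t` as on i386, so the unary minus on align wraps modulo $2^{32}$ before it is zero-extended for the `&`:

```latex
\begin{aligned}
-a \bmod 2^{32} &= 2^{32} - 512 = \texttt{0xFFFFFE00} \quad (\text{zero-extended: } \texttt{0x00000000FFFFFE00})\\
-x \bmod 2^{64} &= 2^{64} - 1536 = \texttt{0xFFFFFFFFFFFFFA00}\\
(-x) \,\&\, (-a) &= \texttt{0x00000000FFFFFA00} = 2^{32} - 1536\\
-\big((-x) \,\&\, (-a)\big) \bmod 2^{64} &= 2^{64} - 2^{32} + 1536 = 18446744069414585856
\end{aligned}
```

Because the upper 32 bits of the mask are zero, the result is the correct 1536 plus exactly $2^{64} - 2^{32}$. If both operands are widened to 64 bits before negation, the mask becomes $2^{64} - 512$ and the same expression yields 1536.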

Maybe this is an effect of cross-compiling i386 on amd64. But anyway, as
long as i386 is still supported, it should not happen.



Now, my real concern is: if this really obvious ... went undetected
until 10.3, how many other missing typecasts are still in the code??




ZFS l2arc broken in 10.3

2016-10-12 Thread Peter

details to follow