Re: ARC pressured out, how to control/stabilize ? (reformatted to text/plain)

2014-02-21 Thread Andriy Gapon
on 18/02/2014 15:47 Vitalij Satanivskij said the following:
> No checksum errors or any other errors found for now.

Thank you again!
Could you please send me an output of
sysctl kstat | fgrep 'l2'
from a system that has been patched and with a sufficiently long uptime?

-- 
Andriy Gapon
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org



Re: ARC pressured out, how to control/stabilize ? (reformatted to text/plain)

2014-02-21 Thread Vitalij Satanivskij

Dear Andriy,

system uptime is 8 days, 20:26

Output:

kstat.zfs.misc.arcstats.evict_l2_cached: 9771077767680
kstat.zfs.misc.arcstats.evict_l2_eligible: 3844577713152
kstat.zfs.misc.arcstats.evict_l2_ineligible: 8855320643072
kstat.zfs.misc.arcstats.l2_hits: 79824726
kstat.zfs.misc.arcstats.l2_misses: 217864980
kstat.zfs.misc.arcstats.l2_feeds: 760023
kstat.zfs.misc.arcstats.l2_rw_clash: 61903
kstat.zfs.misc.arcstats.l2_read_bytes: 3058416338944
kstat.zfs.misc.arcstats.l2_write_bytes: 2487863166464
kstat.zfs.misc.arcstats.l2_writes_sent: 732146
kstat.zfs.misc.arcstats.l2_writes_done: 732146
kstat.zfs.misc.arcstats.l2_writes_error: 0
kstat.zfs.misc.arcstats.l2_writes_hdr_miss: 51888
kstat.zfs.misc.arcstats.l2_evict_lock_retry: 4416
kstat.zfs.misc.arcstats.l2_evict_reading: 1
kstat.zfs.misc.arcstats.l2_free_on_write: 282867
kstat.zfs.misc.arcstats.l2_cdata_free_on_write: 326028
kstat.zfs.misc.arcstats.l2_abort_lowmem: 1348
kstat.zfs.misc.arcstats.l2_cksum_bad: 0
kstat.zfs.misc.arcstats.l2_io_error: 0
kstat.zfs.misc.arcstats.l2_size: 257940027392
kstat.zfs.misc.arcstats.l2_asize: 108789048832
kstat.zfs.misc.arcstats.l2_hdr_size: 881715600
kstat.zfs.misc.arcstats.l2_compress_successes: 60790954
kstat.zfs.misc.arcstats.l2_compress_zeros: 0
kstat.zfs.misc.arcstats.l2_compress_failures: 1738173
kstat.zfs.misc.arcstats.l2_write_trylock_fail: 1168505250
kstat.zfs.misc.arcstats.l2_write_passed_headroom: 29511803
kstat.zfs.misc.arcstats.l2_write_spa_mismatch: 1307899433
kstat.zfs.misc.arcstats.l2_write_in_l2: 51108634609
kstat.zfs.misc.arcstats.l2_write_io_in_progress: 637
kstat.zfs.misc.arcstats.l2_write_not_cacheable: 100398037509
kstat.zfs.misc.arcstats.l2_write_full: 97839
kstat.zfs.misc.arcstats.l2_write_buffer_iter: 760023
kstat.zfs.misc.arcstats.l2_write_pios: 732146
kstat.zfs.misc.arcstats.l2_write_buffer_bytes_scanned: 4642717602824192
kstat.zfs.misc.arcstats.l2_write_buffer_list_iter: 48013995
kstat.zfs.misc.arcstats.l2_write_buffer_list_null_iter: 80483
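
For readers who want to reduce the raw counters above to a few headline numbers, here is a minimal Python sketch. It is not part of the original exchange. The helper names `parse_kstat` and `l2arc_summary` are hypothetical, and reading `l2_size / l2_asize` as "logical versus allocated size" is an assumption about what those two counters mean.

```python
def parse_kstat(text):
    """Parse 'name: value' lines, as printed by sysctl, into a dict of ints.

    Keys are the last component of the sysctl name (e.g. 'l2_hits').
    """
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if not sep:
            continue  # not a 'name: value' line
        try:
            stats[name.rsplit(".", 1)[-1]] = int(value)
        except ValueError:
            pass  # ignore non-numeric values
    return stats


def l2arc_summary(stats):
    """Derive a few ratios from the raw L2ARC counters."""
    lookups = stats["l2_hits"] + stats["l2_misses"]
    return {
        "hit_ratio": stats["l2_hits"] / lookups,
        # Assumption: l2_size is logical size, l2_asize is allocated size.
        "compress_ratio": stats["l2_size"] / stats["l2_asize"],
        "cksum_bad": stats["l2_cksum_bad"],
        "io_error": stats["l2_io_error"],
    }


# Values copied from the output quoted above.
sample = """\
kstat.zfs.misc.arcstats.l2_hits: 79824726
kstat.zfs.misc.arcstats.l2_misses: 217864980
kstat.zfs.misc.arcstats.l2_cksum_bad: 0
kstat.zfs.misc.arcstats.l2_io_error: 0
kstat.zfs.misc.arcstats.l2_size: 257940027392
kstat.zfs.misc.arcstats.l2_asize: 108789048832
"""

summary = l2arc_summary(parse_kstat(sample))
print(f"L2ARC hit ratio:     {summary['hit_ratio']:.1%}")
print(f"l2_size / l2_asize:  {summary['compress_ratio']:.2f}")
print(f"bad checksums: {summary['cksum_bad']}, I/O errors: {summary['io_error']}")
```

With the figures above this gives a hit ratio of about 26.8% and a size ratio of about 2.37, with zero checksum and I/O errors.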





Re: ARC pressured out, how to control/stabilize ? (reformatted to text/plain)

2014-02-18 Thread Vitalij Satanivskij
Dear Andriy and FreeBSD community,

I have been testing your patch for some time and everything looks OK.

At least after 5 days of operation, no noticeable memory leak has been found.


AG 
AG I've been able to spend some time on this issue.
AG Could you please try the following patch?
AG http://people.freebsd.org/~avg/l2arc-b_tmp_cdata-diag.2.patch
AG It obsoletes all previous patches from me.
AG 


Re: ARC pressured out, how to control/stabilize ? (reformatted to text/plain)

2014-02-18 Thread Andriy Gapon
on 18/02/2014 15:38 Vitalij Satanivskij said the following:
> Dear Andriy and FreeBSD community,
> 
> I have been testing your patch for some time and everything looks OK.
> 
> At least after 5 days of operation, no noticeable memory leak has been found.

Vitalij,

thank you very much for testing!
What about those checksum errors?  Do you see them now?

 AG 
 AG I've been able to spend some time on this issue.
 AG Could you please try the following patch?
 AG http://people.freebsd.org/~avg/l2arc-b_tmp_cdata-diag.2.patch
 AG It obsoletes all previous patches from me.

-- 
Andriy Gapon


Re: ARC pressured out, how to control/stabilize ? (reformatted to text/plain)

2014-02-18 Thread Vitalij Satanivskij
Dear Andriy and FreeBSD community,

No checksum errors or any other errors have been found so far.




Re: ARC pressured out, how to control/stabilize ? (reformatted to text/plain)

2014-02-14 Thread Andriy Gapon
on 11/02/2014 16:38 Vitalij Satanivskij said the following:
> Got the first results from testing L2ARC without compression.
> 
> No memory leak has been seen so far (the system has been running for only
> 20 hours), but zfs-stats says that the L2ARC is degraded.
> 
> Output of zfs-stats -L:
> 
> ZFS Subsystem Report                            Tue Feb 11 16:34:43 2014
> 
> L2 ARC Summary: (DEGRADED)
>         Passed Headroom:                3.81m
>         Tried Lock Failures:            79.52m
>         IO In Progress:                 9
>         Low Memory Aborts:              235
>         Free on Write:                  54.37k
>         Writes While Full:              9.68k
>         R/W Clashes:                    2.82k
>         Bad Checksums:                  211.94k
>         IO Errors:                      0
>         SPA Mismatch:                   58.33m
> 
> L2 ARC Size: (Adaptive)                 243.32  GiB
>         Header Size:            0.36%   895.11  MiB
> 
> L2 ARC Evicts:
>         Lock Retries:                   45
>         Upon Reading:                   0
> 
> L2 ARC Breakdown:                       38.15m
>         Hit Ratio:              17.79%  6.79m
>         Miss Ratio:             82.21%  31.36m
>         Feeds:                          88.88k
> 
> L2 ARC Buffer:
>         Bytes Scanned:                  292.58  TiB
>         Buffer Iterations:              88.88k
>         List Iterations:                5.63m
>         NULL List Iterations:           17.26k
> 
> L2 ARC Writes:
>         Writes Sent: (FAULTED)          77.95k
>           Done Ratio:           100.00% 77.95k
>           Error Ratio:          0.00%   0
> 
> As you can see, we have Bad Checksums: 211.94k and growing,
> 
> and also
>         Writes Sent: (FAULTED)          77.95k
>           Done Ratio:           100.00% 77.95k

I have no clue how this tool summarizes the statistics.  I think that I would
prefer output of vfs.zfs and kstat sysctl hierarchies.

I have no idea what could cause those checksum errors.  This will have to be
investigated separately when I (or someone else) have time.

 
> Another question: please provide the revision number of arc.c against which
> the diff (http://people.freebsd.org/~avg/l2arc-b_tmp_cdata-diag.2.patch) was
> created, because the version in head has some small differences and I need
> to apply the patch manually.

I've just updated the patch in-place.  It is now based on r261726.
Sorry for the previous version which was against my local tree.

-- 
Andriy Gapon


Re: ARC pressured out, how to control/stabilize ? (reformatted to text/plain)

2014-02-11 Thread Andriy Gapon
on 07/02/2014 11:11 Andriy Gapon said the following:
> on 05/02/2014 14:22 Vitalij Satanivskij said the following:
>> Dear Andriy and FreeBSD community,
>>
>> OK, I have got a core dump from the panic.
>>
>> What else do I need to do?
> 
> 
> Vitalij, Vladimir,
> 
> I have been able to reproduce the leak at work, so now I have full access to all
> debugging information that I need.  Thank you for your testing and reports.
> 
> I have reported my observations to OpenZFS developers.  It looks like the author
> of L2ARC compression code is too busy right now to produce a fix.
> Unfortunately, I am not very familiar with the L2ARC code, so I can not promise
> to produce a patch soon.

I've been able to spend some time on this issue.
Could you please try the following patch?
http://people.freebsd.org/~avg/l2arc-b_tmp_cdata-diag.2.patch
It obsoletes all previous patches from me.

-- 
Andriy Gapon


Re: ARC pressured out, how to control/stabilize ? (reformatted to text/plain)

2014-02-11 Thread Vitalij Satanivskij
Dear Andriy and FreeBSD community,


For now I have begun testing the L2 cache without compression (with the patch you
provided in your last messages) in production.

I will test the new patch on the test server first and then, if all is OK, on
one of the production servers.




Re: ARC pressured out, how to control/stabilize ? (reformatted to text/plain)

2014-02-11 Thread Vitalij Satanivskij
Got the first results from testing L2ARC without compression.

No memory leak has been seen so far (the system has been running for only
20 hours), but zfs-stats says that the L2ARC is degraded.

Output of zfs-stats -L:


ZFS Subsystem Report                            Tue Feb 11 16:34:43 2014

L2 ARC Summary: (DEGRADED)
        Passed Headroom:                3.81m
        Tried Lock Failures:            79.52m
        IO In Progress:                 9
        Low Memory Aborts:              235
        Free on Write:                  54.37k
        Writes While Full:              9.68k
        R/W Clashes:                    2.82k
        Bad Checksums:                  211.94k
        IO Errors:                      0
        SPA Mismatch:                   58.33m

L2 ARC Size: (Adaptive)                 243.32  GiB
        Header Size:            0.36%   895.11  MiB

L2 ARC Evicts:
        Lock Retries:                   45
        Upon Reading:                   0

L2 ARC Breakdown:                       38.15m
        Hit Ratio:              17.79%  6.79m
        Miss Ratio:             82.21%  31.36m
        Feeds:                          88.88k

L2 ARC Buffer:
        Bytes Scanned:                  292.58  TiB
        Buffer Iterations:              88.88k
        List Iterations:                5.63m
        NULL List Iterations:           17.26k

L2 ARC Writes:
        Writes Sent: (FAULTED)          77.95k
          Done Ratio:           100.00% 77.95k
          Error Ratio:          0.00%   0



As you can see, we have Bad Checksums: 211.94k and growing,

and also
        Writes Sent: (FAULTED)          77.95k
          Done Ratio:           100.00% 77.95k



Another question: please provide the revision number of arc.c against which
the diff (http://people.freebsd.org/~avg/l2arc-b_tmp_cdata-diag.2.patch) was
created, because the version in head has some small differences and I need
to apply the patch manually.

Thank you.






Re: ARC pressured out, how to control/stabilize ? (reformatted to text/plain)

2014-02-07 Thread Andriy Gapon
on 05/02/2014 14:22 Vitalij Satanivskij said the following:
> Dear Andriy and FreeBSD community,
> 
> OK, I have got a core dump from the panic.
> 
> What else do I need to do?


Vitalij, Vladimir,

I have been able to reproduce the leak at work, so now I have full access to all
debugging information that I need.  Thank you for your testing and reports.

I have reported my observations to OpenZFS developers.  It looks like the author
of L2ARC compression code is too busy right now to produce a fix.
Unfortunately, I am not very familiar with the L2ARC code, so I can not promise
to produce a patch soon.

My recommendation would be to completely disable L2ARC _compression_ (not L2ARC
itself) on your production systems for the time being.
The following patch should do that:

--- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
+++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
@@ -5080,20 +5080,22 @@ l2arc_write_buffers
 		 * ab->b_buf may be invalid by now due to ARC eviction.
 		 */
 		l2hdr = ab->b_l2hdr;
 		l2hdr->b_daddr = dev->l2ad_hand;
 
+#if 0
 		if ((ab->b_flags & ARC_L2COMPRESS) &&
 		    l2hdr->b_asize >= buf_compress_minsz) {
 			if (l2arc_compress_buf(l2hdr)) {
 				/*
 				 * If compression succeeded, enable headroom
 				 * boost on the next scan cycle.
 				 */
 				*headroom_boost = B_TRUE;
 			}
 		}
+#endif
 
 		/*
 		 * Pick up the buffer data we had previously stashed away
 		 * (and now potentially also compressed).
 		 */


-- 
Andriy Gapon


Re: ARC pressured out, how to control/stabilize ? (reformatted to text/plain)

2014-02-05 Thread Vitalij Satanivskij
Dear Andriy and FreeBSD community,

Andriy Gapon wrote:
AG on 04/02/2014 19:10 Vitalij Satanivskij said the following:
AG > Dear Andriy and FreeBSD community,
AG > 
AG > I applied the patch and after a few minutes of operation got a new panic.
AG > 
AG > Screenshot:
AG > 
AG > http://i59.tinypic.com/sfctvc.jpg
AG 
AG Does this happen too early to get a crashdump?
AG Do you have a chance to attach with remote kgdb?

How I reproduce the crash: simply attach the cache device
(zpool add pool cache /dev/gpt/cache0) and run ls -R -la /pool

I will repeat the experiment and try to get a core dump.

About kgdb: the server on which we test the patch is not very critical, so I can
connect via remote IPMI (accessible from the local network) and run commands at
any time, and of course I will try to get a kernel core dump.





Re: ARC pressured out, how to control/stabilize ? (reformatted to text/plain)

2014-02-05 Thread Vitalij Satanivskij
Dear Andriy and FreeBSD community,

OK, I have got a core dump from the panic.

What else do I need to do?




Re: ARC pressured out, how to control/stabilize ? (reformatted to text/plain)

2014-02-04 Thread Vitalij Satanivskij

Dear Andriy and FreeBSD community,

With the patch, the system panics on boot.

After removing the cache device from the pool, the system boots without problems.

After that, the cache device was added again and a kernel panic soon followed.

Screenshot of the panic: http://i61.tinypic.com/30sbx2g.jpg




Re: ARC pressured out, how to control/stabilize ? (reformatted to text/plain)

2014-02-04 Thread Andriy Gapon
on 04/02/2014 12:08 Vitalij Satanivskij said the following:
 
> Dear Andriy and FreeBSD community,
> 
> With the patch, the system panics on boot.
> 
> After removing the cache device from the pool, the system boots without problems.
> 
> After that, the cache device was added again and a kernel panic soon followed.
> 
> Screenshot of the panic: http://i61.tinypic.com/30sbx2g.jpg

I think that my previous patch was wrong.
I've updated it in place:
http://people.freebsd.org/~avg/l2arc-b_tmp_cdata-diag.patch


-- 
Andriy Gapon


Re: ARC pressured out, how to control/stabilize ? (reformatted to text/plain)

2014-02-04 Thread Vitalij Satanivskij
Dear Andriy and FreeBSD community,

I applied the patch and after a few minutes of operation got a new panic.

Screenshot:

http://i59.tinypic.com/sfctvc.jpg





Re: ARC pressured out, how to control/stabilize ? (reformatted to text/plain)

2014-02-04 Thread Andriy Gapon
on 04/02/2014 19:10 Vitalij Satanivskij said the following:
> Dear Andriy and FreeBSD community,
> 
> I applied the patch and after a few minutes of operation got a new panic.
> 
> Screenshot:
> 
> http://i59.tinypic.com/sfctvc.jpg

Does this happen too early to get a crashdump?
Do you have a chance to attach with remote kgdb?


-- 
Andriy Gapon


Re: ARC pressured out, how to control/stabilize ? (reformatted to text/plain)

2014-01-31 Thread Vitalij Satanivskij
Dear Andriy and FreeBSD community,

Building world with the patch failed with the following error:

/usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:4642:13: error: use of undeclared identifier 'l2hdr'
        ASSERT3P(l2hdr->b_tmp_cdata, ==, NULL);
                 ^
/usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/sys/debug.h:125:40: note: expanded from macro 'ASSERT3P'
#define ASSERT3P(x, y, z)   VERIFY3_IMPL(x, y, z, uintptr_t)
                                         ^
/usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/sys/debug.h:109:29: note: expanded from macro 'VERIFY3_IMPL'
const TYPE __left = (TYPE)(LEFT); \
                           ^
1 error generated.
*** Error code 1



Vladimir Sharun wrote:
VS Dear Andriy and FreeBSD community,
VS 
VS L2ARC temporarily turned off by setting secondarycache=none everywhere it was enabled,
VS so no more leak for one particular day.
VS 
VS Here's the top header:
VS last pid: 89916;  load averages:  2.49,  2.91,  2.89    up 5+19:21:42  14:09:12
VS 561 processes: 2 running, 559 sleeping
VS CPU:  5.7% user,  0.0% nice, 14.0% system,  1.0% interrupt, 79.3% idle
VS Mem: 23G Active, 1017M Inact, 98G Wired, 1294M Cache, 3285M Buf, 1997M Free
VS ARC: 69G Total, 3498M MFU, 59G MRU, 53M Anon, 1651M Header, 4696M Other
VS Swap:
VS 
VS Here's the calculated vmstat -z (only allocations exceeding 100*1024^2 bytes are printed):
VS UMA Slabs:  199,915M
VS VM OBJECT:  207,354M
VS 32: 205,558M
VS 64: 901,122M
VS 128:215,211M
VS 256:242,262M
VS 4096:   2316,01M
VS range_seg_cache:205,396M
VS zio_buf_512:1103,31M
VS zio_buf_16384:  15697,9M
VS zio_data_buf_16384: 348,297M
VS zio_data_buf_24576: 129,352M
VS zio_data_buf_32768: 104,375M
VS zio_data_buf_36864: 163,371M
VS zio_data_buf_53248: 100,496M
VS zio_data_buf_57344: 105,93M
VS zio_data_buf_65536: 101,75M
VS zio_data_buf_73728: 111,938M
VS zio_data_buf_90112: 104,414M
VS zio_data_buf_106496:100,242M
VS zio_data_buf_131072:61652,5M
VS dnode_t:3203,98M
VS dmu_buf_impl_t: 797,695M
VS arc_buf_hdr_t:  1498,76M
VS arc_buf_t:  105,802M
VS zfs_znode_cache:352,61M
VS 
VS zio_data_buf_131072 (61652M) + zio_buf_16384 (15698M) = 77350M
VS easily exceeds ARC total (70G)
VS 
VS 
VS Here are the same calculations from exactly the same system, where L2 was disabled before reboot:
VS last pid: 63407;  load averages:  2.35,  2.71,  2.73    up 8+19:42:54  14:17:33
VS 527 processes: 1 running, 526 sleeping
VS CPU:  4.8% user,  0.0% nice,  6.6% system,  1.1% interrupt, 87.4% idle
VS Mem: 21G Active, 1460M Inact, 99G Wired, 1748M Cache, 3308M Buf, 952M Free
VS ARC: 87G Total, 4046M MFU, 76G MRU, 37M Anon, 2026M Header, 4991M Other
VS Swap:
VS 
VS and the vmstat -z filtered:
VS UMA Slabs:  208,004M
VS VM OBJECT:  207,392M
VS 32: 172,831M
VS 64: 752,226M
VS 128:210,024M
VS 256:244,204M
VS 4096:   2249,02M
VS range_seg_cache:245,711M
VS zio_buf_512:1145,25M
VS zio_buf_16384:  15170,1M
VS zio_data_buf_16384: 422,766M
VS zio_data_buf_20480: 120,742M
VS zio_data_buf_24576: 148,641M
VS zio_data_buf_28672: 112,848M
VS zio_data_buf_32768: 117,375M
VS zio_data_buf_36864: 185,379M
VS zio_data_buf_45056: 103,168M
VS zio_data_buf_53248: 105,32M
VS zio_data_buf_57344: 122,828M
VS zio_data_buf_65536: 109,25M
VS zio_data_buf_69632: 100,406M
VS zio_data_buf_73728: 126,844M
VS zio_data_buf_77824: 101,086M
VS zio_data_buf_81920: 100,391M
VS zio_data_buf_86016: 101,391M
VS zio_data_buf_90112: 112,836M
VS zio_data_buf_98304: 100,688M
VS zio_data_buf_102400:106,543M
VS zio_data_buf_106496:108,875M
VS zio_data_buf_131072:63190,5M
VS dnode_t:3437,36M
VS dmu_buf_impl_t: 840,62M
VS arc_buf_hdr_t:  1870,88M
VS arc_buf_t:  114,942M
VS zfs_znode_cache:353,055M
VS 
VS Everything seems within ARC total range.
VS 
VS We will try the attached patch within a few days and will come back with the result.
VS 
VS Thank you for your help.
VS 

Re: ARC pressured out, how to control/stabilize ? (reformatted to text/plain)

2014-01-30 Thread Andriy Gapon
on 28/01/2014 11:28 Vladimir Sharun said the following:
> Dear Andriy and FreeBSD community,
> 
> After applying this patch one of the systems runs fine (disk subsystem load
> low to moderate - 10-20% busy sustained),
> 
> Then I saw this patch was merged to HEAD and we applied it to one of the
> systems with moderate to high disk load: 30-60% busy
> (11.0-CURRENT #7 r261118: Fri Jan 24 17:25:08 EET 2014)
> 
> Within 4 days we are experiencing the same leak(?) as without the patch:
> 
> last pid: 53841;  load averages:  4.47,  4.18,  3.78    up 3+16:37:09  11:24:39
> 543 processes: 6 running, 537 sleeping
> CPU:  8.7% user,  0.0% nice, 14.6% system,  1.4% interrupt, 75.3% idle
> Mem: 22G Active, 1045M Inact, 98G Wired, 1288M Cache, 3284M Buf, 2246M Free
> ARC: 73G Total, 3763M MFU, 62G MRU, 56M Anon, 1887M Header, 4969M Other
> Swap:
> 
> The ARC is populated within 30 minutes under load to the max (90 GB), then
> starts decreasing.
> 
> The delta between Wired and ARC total starts growing from the typical 10-12 GB
> without L2 enabled to 25 GB with L2 enabled and counting (4 hours ago the
> delta was 22 GB).

First, have you checked that the vmstat -z output contains the same anomaly as
in your original report?

If yes, then please try to reproduce the problem with the following debugging
patch:
http://people.freebsd.org/~avg/l2arc-b_tmp_cdata-diag.patch
Please make sure to compile your kernel (and modules) with INVARIANTS.

-- 
Andriy Gapon


Re[2]: ARC pressured out, how to control/stabilize ? (reformatted to text/plain)

2014-01-30 Thread Vladimir Sharun
Dear Andriy and FreeBSD community,

L2ARC was temporarily turned off by setting secondarycache=none everywhere it
was enabled, so there has been no more leak for one particular day.

Here's the top header:
last pid: 89916;  load averages:  2.49,  2.91,  2.89  up 5+19:21:42  14:09:12
561 processes: 2 running, 559 sleeping
CPU:  5.7% user,  0.0% nice, 14.0% system,  1.0% interrupt, 79.3% idle
Mem: 23G Active, 1017M Inact, 98G Wired, 1294M Cache, 3285M Buf, 1997M Free
ARC: 69G Total, 3498M MFU, 59G MRU, 53M Anon, 1651M Header, 4696M Other
Swap:

Here's the calculated vmstat -z (that is, only the allocations that exceed
100*1024^2 bytes are printed):
UMA Slabs:              199,915M
VM OBJECT:              207,354M
32:                     205,558M
64:                     901,122M
128:                    215,211M
256:                    242,262M
4096:                   2316,01M
range_seg_cache:        205,396M
zio_buf_512:            1103,31M
zio_buf_16384:          15697,9M
zio_data_buf_16384:     348,297M
zio_data_buf_24576:     129,352M
zio_data_buf_32768:     104,375M
zio_data_buf_36864:     163,371M
zio_data_buf_53248:     100,496M
zio_data_buf_57344:     105,93M
zio_data_buf_65536:     101,75M
zio_data_buf_73728:     111,938M
zio_data_buf_90112:     104,414M
zio_data_buf_106496:    100,242M
zio_data_buf_131072:    61652,5M
dnode_t:                3203,98M
dmu_buf_impl_t:         797,695M
arc_buf_hdr_t:          1498,76M
arc_buf_t:              105,802M
zfs_znode_cache:        352,61M

zio_data_buf_131072 (61652M) + zio_buf_16384 (15698M) = 77350M
easily exceeds ARC total (70G)
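
The comparison above can be reproduced with a small script. This is only a sketch: it assumes the per-zone totals are given in MiB with a decimal comma, as in the listing above, and it takes the ARC total from the top header (69G). The helper name parse_mib is hypothetical and not part of any tool.

```python
# Sketch: compare the two largest ZFS buffer zones with the ARC total.
# Assumption: sizes look like "61652,5M" (MiB, decimal comma), as in the
# filtered vmstat -z listing; parse_mib is an illustrative helper name.

def parse_mib(text):
    """Turn a string such as '61652,5M' into a float number of MiB."""
    return float(text.rstrip("M").replace(",", "."))

zones = {
    "zio_data_buf_131072": "61652,5M",
    "zio_buf_16384": "15697,9M",
}

total_mib = sum(parse_mib(v) for v in zones.values())
arc_total_mib = 69 * 1024          # "ARC: 69G Total" from the top header
excess_mib = total_mib - arc_total_mib

print("zones: %.1f MiB, ARC: %d MiB, excess: %.1f MiB"
      % (total_mib, arc_total_mib, excess_mib))
```

A positive excess means that these two zones alone hold more memory than the ARC reports.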


Here are the same calculations from exactly the same system, where L2 was
disabled before the reboot:
last pid: 63407;  load averages:  2.35,  2.71,  2.73  up 8+19:42:54  14:17:33
527 processes: 1 running, 526 sleeping
CPU:  4.8% user,  0.0% nice,  6.6% system,  1.1% interrupt, 87.4% idle
Mem: 21G Active, 1460M Inact, 99G Wired, 1748M Cache, 3308M Buf, 952M Free
ARC: 87G Total, 4046M MFU, 76G MRU, 37M Anon, 2026M Header, 4991M Other
Swap:

and the vmstat -z filtered:
UMA Slabs:              208,004M
VM OBJECT:              207,392M
32:                     172,831M
64:                     752,226M
128:                    210,024M
256:                    244,204M
4096:                   2249,02M
range_seg_cache:        245,711M
zio_buf_512:            1145,25M
zio_buf_16384:          15170,1M
zio_data_buf_16384:     422,766M
zio_data_buf_20480:     120,742M
zio_data_buf_24576:     148,641M
zio_data_buf_28672:     112,848M
zio_data_buf_32768:     117,375M
zio_data_buf_36864:     185,379M
zio_data_buf_45056:     103,168M
zio_data_buf_53248:     105,32M
zio_data_buf_57344:     122,828M
zio_data_buf_65536:     109,25M
zio_data_buf_69632:     100,406M
zio_data_buf_73728:     126,844M
zio_data_buf_77824:     101,086M
zio_data_buf_81920:     100,391M
zio_data_buf_86016:     101,391M
zio_data_buf_90112:     112,836M
zio_data_buf_98304:     100,688M
zio_data_buf_102400:    106,543M
zio_data_buf_106496:    108,875M
zio_data_buf_131072:    63190,5M
dnode_t:                3437,36M
dmu_buf_impl_t:         840,62M
arc_buf_hdr_t:          1870,88M
arc_buf_t:              114,942M
zfs_znode_cache:        353,055M

Everything seems within ARC total range.

We will try the attached patch within a few days and will come back with the
result.

Thank you for your help.



Re[2]: ARC pressured out, how to control/stabilize ? (reformatted to text/plain)

2014-01-28 Thread Vladimir Sharun
Dear Andriy and FreeBSD community,

After applying this patch, one of the systems runs fine (disk subsystem load
low to moderate, 10-20% busy sustained).

Then I saw that this patch was merged to HEAD, and we applied it to one of the
systems with moderate to high disk load: 30-60% busy (11.0-CURRENT #7 r261118:
Fri Jan 24 17:25:08 EET 2014).

Within 4 days we were experiencing the same leak(?) as without the patch:

last pid: 53841;  load averages:  4.47,  4.18,  3.78 up 3+16:37:09  11:24:39
543 processes: 6 running, 537 sleeping
CPU:  8.7% user,  0.0% nice, 14.6% system,  1.4% interrupt, 75.3% idle
Mem: 22G Active, 1045M Inact, 98G Wired, 1288M Cache, 3284M Buf, 2246M Free
ARC: 73G Total, 3763M MFU, 62G MRU, 56M Anon, 1887M Header, 4969M Other
Swap:

Under load, the ARC is populated to the max (90Gb) within 30 minutes and then
starts decreasing.

The delta between Wired and ARC total grows from the typical 10-12Gb without
L2 enabled to 25Gb with L2 enabled, and counting (4 hours ago the delta was
22Gb).
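
The delta can be computed directly from the two top header lines. This is a minimal sketch that assumes the header keeps the "<number><unit> <label>" layout shown above; parse_top_line is a hypothetical helper name, not an existing tool.

```python
# Sketch: compute "Wired minus ARC total" from top's Mem: and ARC: lines.
# Assumption: every field looks like "98G Wired" with a K/M/G/T suffix.
import re

_UNITS = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}

def parse_top_line(line):
    """Map labels such as 'Wired' or 'Total' to byte counts."""
    fields = {}
    for number, unit, label in re.findall(r"(\d+)([KMGT])\s+(\w+)", line):
        fields[label] = int(number) * _UNITS[unit]
    return fields

mem = parse_top_line(
    "Mem: 22G Active, 1045M Inact, 98G Wired, 1288M Cache, 3284M Buf, 2246M Free")
arc = parse_top_line(
    "ARC: 73G Total, 3763M MFU, 62G MRU, 56M Anon, 1887M Header, 4969M Other")

delta_gib = (mem["Wired"] - arc["Total"]) / 2**30
print("Wired - ARC = %.0f GiB" % delta_gib)
```

Sampling this value periodically (for example from cron) would show whether the gap keeps growing while L2ARC is enabled.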

L2ARC statistics:

L2 ARC Size: (Adaptive)                 291.63  GiB
        Header Size:            0.25%   734.14  MiB

L2 ARC Evicts:
        Lock Retries:                   682
        Upon Reading:                   0

L2 ARC Breakdown:                       106.56m
        Hit Ratio:              29.04%  30.95m
        Miss Ratio:             70.96%  75.62m
        Feeds:                          317.18k
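
For reference, the hit and miss ratios follow from the hit and miss counts alone. The sketch below uses the rounded counts from the breakdown above (30.95m hits, 75.62m misses); l2_ratios is a hypothetical helper name.

```python
# Sketch: derive L2ARC hit/miss percentages from the raw counters.
# Assumption: the inputs are the rounded values printed in the summary.

def l2_ratios(hits, misses):
    """Return (hit_percent, miss_percent) for the given counters."""
    total = hits + misses
    return 100.0 * hits / total, 100.0 * misses / total

hit_pct, miss_pct = l2_ratios(30.95e6, 75.62e6)
print("hit %.2f%%, miss %.2f%%" % (hit_pct, miss_pct))
```

The same function can be fed the kstat.zfs.misc.arcstats.l2_hits and l2_misses values if exact counters are available.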

So, again, what shall we do to better understand/mitigate the problem further ?

Thank you.



Re: ARC pressured out, how to control/stabilize ? (reformatted to text/plain)

2014-01-15 Thread Vitalij Satanivskij
Dear Andriy and FreeBSD community,

Andriy Gapon wrote:
AG on 14/01/2014 07:27 Vladimir Sharun said the following:
AG  Dear Andriy and FreeBSD community,
AG  
AG  I am not sure if the buffers are leaked somehow or if they are actually 
in use.
AG  It's one of the very few places where data buffers are allocated without
AG  charging ARC.  In all other places it's quite easy to match allocations 
and
AG  deallocations.  But in L2ARC it is not obvious that all buffers get 
freed or
AG  when that happens.
AG  
AG  After one week under load I think we figure out the cause: it's L2ARC. 
AG  Here's the top's header for 7d17h of the runtime:
AG  
AG  last pid: 46409;  load averages:  0.37,  0.62,  0.70 up 7+17:14:01  
07:24:10
AG  173 processes: 1 running, 171 sleeping, 1 zombie
AG  CPU:  2.0% user,  0.0% nice,  3.5% system,  0.4% interrupt, 94.2% idle
AG  Mem: 8714M Active, 14G Inact, 96G Wired, 1929M Cache, 3309M Buf, 3542M 
Free
AG  ARC: 85G Total, 2558M MFU, 77G MRU, 28M Anon, 1446M Header, 4802M Other
AG  
AG  ARC related tunables:
AG  
AG  vm.kmem_size=110G
AG  vfs.zfs.arc_max=90G
AG  vfs.zfs.arc_min=42G
AG  
AG  For more than 7 days of hard runtime the picture clearly shows: 
AG  Wired minus ARC = 11..12Gb, ARC grow and shrinks in 80-87Gb range and the
AG  system runs just fine.
AG  
AG  So what shall we do with L2ARC leakage ?
AG 
AG 
AG Could you please try this patch
AG http://cr.illumos.org/~webrev/skiselkov/3995/illumos-gate.patch ?
AG 

While applying the patch to the current version of arc.c (r260622), I ran into
the following compilation error:

olaris/uts/common/fs/zfs/arc.c -o arc.o
/usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:4628:18:
 error: use of undeclared identifier 'abl2'
trim_map_free(abl2->b_dev->l2ad_vdev, abl2->b_daddr,
              ^
1 error generated.
*** Error code 1


The code is:

if (zio->io_error != 0) {
        /*
         * Error - drop L2ARC entry.
         */
        list_remove(buflist, ab);
        ARCSTAT_INCR(arcstat_l2_asize, -l2hdr->b_asize);
        ab->b_l2hdr = NULL;
        trim_map_free(abl2->b_dev->l2ad_vdev, abl2->b_daddr,
            ab->b_size, 0);
        kmem_free(l2hdr, sizeof (l2arc_buf_hdr_t));
        ARCSTAT_INCR(arcstat_l2_size, -ab->b_size);
}


It looks like this part is a FreeBSD-specific change.
Can somebody help with this part of the code?


Re: ARC pressured out, how to control/stabilize ? (reformatted to text/plain)

2014-01-15 Thread Andriy Gapon
on 15/01/2014 12:28 Vitalij Satanivskij said the following:
 Dear Andriy and FreeBSD community,
 
 Andriy Gapon wrote:
 AG on 14/01/2014 07:27 Vladimir Sharun said the following:
 AG  Dear Andriy and FreeBSD community,
 AG  
 AG  I am not sure if the buffers are leaked somehow or if they are 
 actually in use.
 AG  It's one of the very few places where data buffers are allocated 
 without
 AG  charging ARC.  In all other places it's quite easy to match 
 allocations and
 AG  deallocations.  But in L2ARC it is not obvious that all buffers get 
 freed or
 AG  when that happens.
 AG  
 AG  After one week under load I think we figure out the cause: it's L2ARC. 
 AG  Here's the top's header for 7d17h of the runtime:
 AG  
 AG  last pid: 46409;  load averages:  0.37,  0.62,  0.70 up 7+17:14:01  
 07:24:10
 AG  173 processes: 1 running, 171 sleeping, 1 zombie
 AG  CPU:  2.0% user,  0.0% nice,  3.5% system,  0.4% interrupt, 94.2% idle
 AG  Mem: 8714M Active, 14G Inact, 96G Wired, 1929M Cache, 3309M Buf, 3542M 
 Free
 AG  ARC: 85G Total, 2558M MFU, 77G MRU, 28M Anon, 1446M Header, 4802M Other
 AG  
 AG  ARC related tunables:
 AG  
 AG  vm.kmem_size=110G
 AG  vfs.zfs.arc_max=90G
 AG  vfs.zfs.arc_min=42G
 AG  
 AG  For more than 7 days of hard runtime the picture clearly shows: 
 AG  Wired minus ARC = 11..12Gb, ARC grow and shrinks in 80-87Gb range and 
 the
 AG  system runs just fine.
 AG  
 AG  So what shall we do with L2ARC leakage ?
 AG 
 AG 
 AG Could you please try this patch
 AG http://cr.illumos.org/~webrev/skiselkov/3995/illumos-gate.patch ?
 AG 
 
 While applying the patch to the current version of arc.c (r260622), I ran into
 the following compilation error:
 
 olaris/uts/common/fs/zfs/arc.c -o arc.o
 /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:4628:18:
  error: use of undeclared identifier 'abl2'
 trim_map_free(abl2->b_dev->l2ad_vdev, abl2->b_daddr,
               ^
 1 error generated.
 *** Error code 1
 
 
 The code is:
 
 if (zio->io_error != 0) {
         /*
          * Error - drop L2ARC entry.
          */
         list_remove(buflist, ab);
         ARCSTAT_INCR(arcstat_l2_asize, -l2hdr->b_asize);
         ab->b_l2hdr = NULL;
         trim_map_free(abl2->b_dev->l2ad_vdev, abl2->b_daddr,
             ab->b_size, 0);
         kmem_free(l2hdr, sizeof (l2arc_buf_hdr_t));
         ARCSTAT_INCR(arcstat_l2_size, -ab->b_size);
 }
 
 
 It looks like this part is a FreeBSD-specific change.
 Can somebody help with this part of the code?
 

The first hunk of the patch is renaming of abl2 to l2hdr.

-- 
Andriy Gapon


Re: ARC pressured out, how to control/stabilize ? (reformatted to text/plain)

2014-01-15 Thread Vitalij Satanivskij
Dear Andriy and FreeBSD community,
AG 
AG The first hunk of the patch is renaming of abl2 to l2hdr.
AG 

So is it OK to just change
 trim_map_free(abl2->b_dev->l2ad_vdev, abl2->b_daddr,
     ab->b_size, 0);
to
 trim_map_free(l2hdr->b_dev->l2ad_vdev, l2hdr->b_daddr,
     ab->b_size, 0);
?
OK, thank you. I will try this patch.


Re: ARC pressured out, how to control/stabilize ? (reformatted to text/plain)

2014-01-14 Thread Andriy Gapon
on 14/01/2014 07:27 Vladimir Sharun said the following:
 Dear Andriy and FreeBSD community,
 
 I am not sure if the buffers are leaked somehow or if they are actually in 
 use.
 It's one of the very few places where data buffers are allocated without
 charging ARC.  In all other places it's quite easy to match allocations and
 deallocations.  But in L2ARC it is not obvious that all buffers get freed or
 when that happens.
 
 After one week under load I think we figure out the cause: it's L2ARC. 
 Here's the top's header for 7d17h of the runtime:
 
 last pid: 46409;  load averages:  0.37,  0.62,  0.70 up 7+17:14:01  07:24:10
 173 processes: 1 running, 171 sleeping, 1 zombie
 CPU:  2.0% user,  0.0% nice,  3.5% system,  0.4% interrupt, 94.2% idle
 Mem: 8714M Active, 14G Inact, 96G Wired, 1929M Cache, 3309M Buf, 3542M Free
 ARC: 85G Total, 2558M MFU, 77G MRU, 28M Anon, 1446M Header, 4802M Other
 
 ARC related tunables:
 
 vm.kmem_size=110G
 vfs.zfs.arc_max=90G
 vfs.zfs.arc_min=42G
 
 For more than 7 days of hard runtime the picture clearly shows: 
 Wired minus ARC = 11..12Gb, ARC grow and shrinks in 80-87Gb range and the
 system runs just fine.
 
 So what shall we do with L2ARC leakage ?


Could you please try this patch
http://cr.illumos.org/~webrev/skiselkov/3995/illumos-gate.patch ?

-- 
Andriy Gapon


Re[2]: ARC pressured out, how to control/stabilize ? (reformatted to text/plain)

2014-01-13 Thread Vladimir Sharun
Dear Andriy and FreeBSD community,

 I am not sure if the buffers are leaked somehow or if they are actually in 
 use.
 It's one of the very few places where data buffers are allocated without
 charging ARC.  In all other places it's quite easy to match allocations and
 deallocations.  But in L2ARC it is not obvious that all buffers get freed or
 when that happens.

After one week under load, I think we have figured out the cause: it's L2ARC.
Here's the top header for 7d17h of runtime:

last pid: 46409;  load averages:  0.37,  0.62,  0.70 up 7+17:14:01  07:24:10
173 processes: 1 running, 171 sleeping, 1 zombie
CPU:  2.0% user,  0.0% nice,  3.5% system,  0.4% interrupt, 94.2% idle
Mem: 8714M Active, 14G Inact, 96G Wired, 1929M Cache, 3309M Buf, 3542M Free
ARC: 85G Total, 2558M MFU, 77G MRU, 28M Anon, 1446M Header, 4802M Other

ARC related tunables:

vm.kmem_size=110G
vfs.zfs.arc_max=90G
vfs.zfs.arc_min=42G

For more than 7 days of hard runtime the picture is clear:
Wired minus ARC = 11..12Gb, the ARC grows and shrinks in the 80-87Gb range,
and the system runs just fine.

So what shall we do about the L2ARC leakage?


Re[2]: ARC pressured out, how to control/stabilize ? (reformatted to text/plain)

2014-01-06 Thread Vladimir Sharun
Dear Andriy and FreeBSD community,

I ran this DTrace hook for a few minutes; here's the output for a 15-minute
run:

http://pastebin.com/pKm9kLwa

Does it explain anything?

 


Re: ARC pressured out, how to control/stabilize ? (reformatted to text/plain)

2014-01-06 Thread Andriy Gapon
on 06/01/2014 13:14 Vladimir Sharun said the following:
 Dear Andriy and FreeBSD community,
 
 I got the few minutes run for this dtrace hook; here's the output for 15 
 minutes run:
 
 http://pastebin.com/pKm9kLwa
 
 Does it explain something ?

The following makes me suspect a problem with L2ARC compression code.

  zfs.ko`l2arc_feed_thread+0x7d9
  kernel`fork_exit+0x9a
  kernel`0x8069ad6e
95131

I am not sure if the buffers are leaked somehow or if they are actually in use.
It's one of the very few places where data buffers are allocated without
charging ARC.  In all other places it's quite easy to match allocations and
deallocations.  But in L2ARC it is not obvious that all buffers get freed or
when that happens.

-- 
Andriy Gapon


Re[2]: ARC pressured out, how to control/stabilize ? (reformatted to text/plain)

2014-01-06 Thread Vladimir Sharun
Dear Andriy, FreeBSD community,

Thank you for your suggestion. We will turn off the L2ARC and report whether
the issue persists.
For now, the test server has been rebooted with L2ARC turned off, and according
to the DTrace hook you provided no allocations are made in l2arc_feed_thread.

Suppose we find that the l2arc_feed_thread allocations are the cause: what
should our next step be?

Our feedback will follow within a few days (we can't reproduce it faster).
 

Re: ARC pressured out, how to control/stabilize ? (reformatted to text/plain)

2014-01-05 Thread Andriy Gapon
on 04/01/2014 14:50 Vladimir Sharun said the following:
[snip]
 ARC: 28G Total, 2085M MFU, 20G MRU, 29M Anon, 1858M Header, 3855M Other
[snip]
 ITEM   SIZE  LIMIT USED FREE  REQ FAIL SLEEP
[snip]
 zio_data_buf_131072: 131072,  0,  488217,   9,287155442,   0,   0

I noticed a particular discrepancy between reported ARC usage and sizes of UMA
zones used by ZFS code:

488217 * 131072 = ~59GB right there.
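
The same multiplication can be applied to any row of vmstat -z. The following is a minimal sketch, assuming the "NAME: SIZE, LIMIT, USED, FREE, REQ, FAIL, SLEEP" layout shown above; zone_bytes is a hypothetical helper name, and only the USED items are counted.

```python
# Sketch: memory held by in-use items of one UMA zone, from a vmstat -z row.
# Assumption: fields after the zone name are SIZE, LIMIT, USED, FREE, ...

def zone_bytes(vmstat_line):
    """Return (zone_name, SIZE * USED) for one `vmstat -z` row."""
    name, _, rest = vmstat_line.partition(":")
    fields = [f.strip() for f in rest.split(",")]
    size, used = int(fields[0]), int(fields[2])
    return name.strip(), size * used

line = "zio_data_buf_131072: 131072,  0,  488217,   9,287155442,   0,   0"
name, in_use = zone_bytes(line)
print("%s: %.1f GiB" % (name, in_use / 2**30))
```

Running it over every row and sorting by the result would show which zones dominate wired memory.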

There are several possibilities for this discrepancy:
- bad accounting or reporting of ARC stats
- those 128K buffers being used in a special way and thus not accounted as ARC
- some sort of resource leak

You could try to use DTrace to gather the stacks of all code paths that lead to
allocation of those buffers.  Something like:

fbt::zio_data_buf_alloc:entry
/arg0 == 131072/
{
@[stack()] = count();
}

This could be a start for understanding the issue.

-- 
Andriy Gapon


ARC pressured out, how to control/stabilize ? (reformatted to text/plain)

2014-01-04 Thread Vladimir Sharun
Good day community,

We run a system in production with r259544 on it: 128Gb RAM, with arc_max set
to 72Gb and arc_min set to 42Gb in loader.conf.
The system runs both the applications (which are FS-ops hungry) and 18Tb of
storage on this setup.
From the start, the ARC clearly shows 72Gb of memory consumed, but during the
next few days it is pressured out to even lower than arc_min (after 14 days of
uptime only 28Gb is used by the ARC). The problem is: the less data in the ARC,
the lower the performance of the entire system. For the first two days (while
the ARC slowly decreases to 55-60Gb) the system runs fine, with excellent
performance telemetry (application response time); this holds up to 7-10 days,
when the ARC reaches ~30Gb, and then the performance falls off to an
unacceptable level.

So the question is: how can we find out what takes memory away from the ARC,
and how can we control it?

The top and vmstat output follows (vm.kmem_size is limited to 100Gb; without
the limit, wired reaches approx. 107Gb with the ARC pressured out in the same
way):

last pid: 13273;  load averages:  2.99,  1.55,  1.20   up 14+19:00:35  14:24:55
227 processes: 1 running, 226 sleeping
CPU:  4.6% user,  0.0% nice,  5.9% system,  1.0% interrupt, 88.5% idle
Mem: 667M Active, 30G Inact, 91G Wired, 1783M Cache, 3309M Buf, 933M Free
ARC: 28G Total, 2085M MFU, 20G MRU, 29M Anon, 1858M Header, 3855M Other
Swap:

$ vmstat -z
ITEM   SIZE  LIMIT USED FREE  REQ FAIL SLEEP

UMA Kegs:   384,  0, 208,   2, 208,   0,   0
UMA Zones: 2176,  0, 208,   0, 208,   0,   0
UMA Slabs:   80,  0, 2311694,  204306,718386402,   0,   0
UMA RCntSlabs:   88,  0,9229,  41,   18930,   0,   0
UMA Hash:   256,  0,   3,  12,  79,   0,   0
4 Bucket:32,  0,1785,   21090,1248973183,   0,   0
6 Bucket:48,  0, 185,   12763,148487075,   0,   0
8 Bucket:64,  0,  65,   20643,41333608,   0,   0
12 Bucket:   96,  0,9665,9482,11049020,   0,   0
16 Bucket:  128,  0, 649,4807,11971939,  11,   0
32 Bucket:  256,  0,   12630,   50790,31362215,  33,   0
64 Bucket:  512,  0,   23203,6781,17367241,133,   0
128 Bucket:1024,  0,   95458,   48746,810293851,317422,   0
vmem btag:   56,  0, 1411085,   96316,27776938,21231,   0
VM OBJECT:  256,  0,  771497,  279598,2938160439,   0,   0
RADIX NODE: 144,  0, 2598081, 1326882,4515451399,   0,   0
MAP:240,  0,   3,  61,   3,   0,   0
KMAP ENTRY: 128,  0,  25, 874,  25,   0,   0
MAP ENTRY:  128,  0,   25469,   45149,11668899620,   0,   0
VMSPACE:448,  0, 235, 782,149851562,   0,   0
fakepg: 104,  0,   0,   0,   0,   0,   0
mt_zone:   4112,  0, 263,   0, 263,   0,   0
16:  16,  0, 2812047,   93027,53798617567,   0,   0
32:  32,  0, 9723566, 1887934,12748376945,   0,   0
64:  64,  0,10475477, 6543585,20857570861,   0,   0
128:128,  0, 1848882,  490254,23475247741,   0,   0
256:256,  0,  914734,  702251,12021626624,   0,   0
512:512,  0,1307, 581,10911947626,   0,   0
1024:  1024,  0,   28407, 233,234462675,   0,   0
2048:  2048,  0, 511,2463,2135442383,   0,   0
4096:  4096,  0,  507210,   85487,572913690,   0,   0
uint64 pcpu:  8,  0, 298, 598, 298,   0,   0
SLEEPQUEUE:  80,  0,1666, 721,1669,   0,   0
Files:   80,  0,4779,   10221,3345112793,   0,   0
TURNSTILE:  136,  0,1666, 414,1669,   0,   0
rl_entry:40,  0, 696, 804, 696,   0,   0
umtx pi: 96,  0,   0,   0,   0,   0,   0
MAC labels:  40,  0,   0,   0,   0,   0,   0
PROC:  1208,  0, 256, 473,149840920,   0,   0
THREAD:1168,  0,1514, 151,  783235,   0,   0
cpuset:  72,  0, 791,1134,   11746,   0,   0
audit_record:  1248,  0,   0,   0,   0,   0,   0
sendfile_sync:   64,  0,   0,   0,   0,   0,   0
mbuf_packet:256, 41943045,8248,4127,8662647947,   0,   0
mbuf:   256, 41943045, 314,   13696,64235586097,   0,   0
mbuf_cluster:  2048, 6553600,   12375,  41,   20500,   0,   0
mbuf_jumbo_page:   4096, 3276800, 311,2710,1565741448,   0,   0
mbuf_jumbo_9k: 9216, 970903,   0,   0,   0,   0,   0
mbuf_jumbo_16k:   16384, 546133,