Re: All the memory eaten away by ZFS 'solaris' malloc - on 11.1-R amd64

2018-07-31 Thread Nagy László Zsolt
I have experienced a very similar thing. After upgrading my machine from
11.1-R to 11.2-R, swap fills up to about 66% roughly every 2 days. At
first I thought it was PostgreSQL and lowered the shared_buffers setting,
but that only postponed the problem by another day.

The only thing that has changed is the OS version (11.1-R -> 11.2-R). Here
is the top of the top output:

last pid: 50425;  load averages:  0.19,  0.16,  0.17   up 15+23:02:21  06:18:18
45 processes:  1 running, 43 sleeping, 1 zombie
CPU: % user, % nice, % system, % interrupt, % idle
Mem: 81M Active, 91M Inact, 1577M Laundry, 14G Wired, 226M Free
ARC: 9598M Total, 90M MFU, 8715M MRU, 105K Anon, 199M Header, 594M Other
 8085M Compressed, 15G Uncompressed, 1.84:1 Ratio
Swap: 4096M Total, 3103M Used, 993M Free, 75% Inuse

The ARC value seems to grow for a while, then the system starts to use
swap heavily.

But this might be unrelated, because swap usage does not go above
80% (i.e. the machine does not crash, but it is clearly using swap when
it should not).

  Laszlo



___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: All the memory eaten away by ZFS 'solaris' malloc - on 11.1-R amd64

2018-07-31 Thread Shane Ambler
On 01/08/2018 07:24, Mark Martinec wrote:
> I have now upgraded this host from 11.1-RELEASE-p11 to 11.2-RELEASE
> and the situation has not improved. Also turned off all services.
> ZFS is still leaking memory about 30 MB per hour, until the host
> runs out of memory and swap space and crashes, unless I reboot it
> first every four days.
> 
> Any advice before I try to get rid of that faulted disk with a pool
> (or downgrade to 10.3, which was stable)?
> 
>   Mark
> 
> 
> 2018-07-23 17:12, myself wrote:
>> After upgrading an older AMD host from FreeBSD 10.3 to 11.1-RELEASE-p11
>> (amd64), ZFS is gradually eating up all memory, so that it crashes every
>> few days when the memory is completely exhausted (after swapping heavily
>> for a couple of hours).
>>
>> This machine has only 4 GB of memory. After capping the ZFS ARC
>> at 1.8 GB the machine can now stay up a bit longer, but in four days
>> all the memory is used up. The machine is lightly loaded; it runs
>> a BIND resolver and a lightly used web server, and the ps output
>> does not show any excessive memory use by any process.

When you say all the memory is used up, do you mean the amount of wired
RAM goes higher than about 90% of physical RAM? You can watch the wired
amount in top, or calculate it as vm.stats.vm.v_wire_count * hw.pagesize.
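A minimal Python sketch of that calculation, assuming the values of vm.stats.vm.v_wire_count, hw.pagesize and hw.physmem have already been read (for example with sysctl -n); the helper names and the sample numbers are hypothetical, not taken from this thread.

```python
# Hypothetical helpers: wired memory = vm.stats.vm.v_wire_count * hw.pagesize.


def wired_bytes(v_wire_count: int, pagesize: int) -> int:
    """Wired memory in bytes: number of wired pages times the page size."""
    return v_wire_count * pagesize


def wired_fraction(v_wire_count: int, pagesize: int, physmem: int) -> float:
    """Share of physical memory (hw.physmem, in bytes) that is wired."""
    return wired_bytes(v_wire_count, pagesize) / physmem


if __name__ == "__main__":
    # Illustrative values only: 4 KiB pages on a 4 GiB machine.
    pagesize = 4096
    physmem = 4 * 1024**3
    v_wire_count = 960_000  # hypothetical sample
    frac = wired_fraction(v_wire_count, pagesize, physmem)
    print(f"wired: {wired_bytes(v_wire_count, pagesize) // 1024**2} MiB "
          f"({frac:.0%} of RAM)")
    if frac > 0.90:
        print("wired memory is above ~90% of physical RAM")
```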

The ZFS ARC is marked as wired. There is also vm.max_wired, which limits
how much the kernel can wire; it defaults to 30% of RAM, so about 1.2G for
you. It seems these two wired values don't interact and can add up to
more than physical RAM. I have reported this in bug 229764.
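To put numbers on that for this host, a rough Python sketch that simply adds the two limits: the 1.8 GB ARC cap Mark reported plus a vm.max_wired allowance of 30% of RAM, as stated above. It assumes, per the observation above, that the two are not counted against each other; the function names are hypothetical and this is arithmetic only, not the kernel's actual accounting.

```python
# Worst-case wired budget under the assumption that the ARC limit and the
# vm.max_wired allowance can both be wired at the same time.

GiB = 1024**3


def worst_case_wired(physmem: int, arc_max: int, max_wired_frac: float = 0.30) -> int:
    """ARC limit plus the vm.max_wired allowance (30% of RAM as quoted above)."""
    return arc_max + int(physmem * max_wired_frac)


def headroom(physmem: int, arc_max: int, max_wired_frac: float = 0.30) -> int:
    """Bytes left for everything else; a negative value means overcommitted."""
    return physmem - worst_case_wired(physmem, arc_max, max_wired_frac)


if __name__ == "__main__":
    physmem = 4 * GiB          # the 4 GB host in this thread
    arc_max = int(1.8 * GiB)   # ARC capped at 1.8 GB, as reported
    print(f"worst-case wired: {worst_case_wired(physmem, arc_max) / GiB:.2f} GiB")
    print(f"headroom:         {headroom(physmem, arc_max) / GiB:.2f} GiB")
```

With these inputs the two limits alone account for about 3 GiB of the 4 GiB, leaving roughly 1 GiB for everything else; a larger ARC cap would push the sum past physical RAM.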

Try the patch at
https://reviews.freebsd.org/D7538
It has given me the best ARC-related memory improvements I have seen
since 10.1; I now see the ARC being released instead of swap being used.

-- 
FreeBSD - the place to B...Software Developing

Shane Ambler



Re: All the memory eaten away by ZFS 'solaris' malloc - on 11.1-R amd64

2018-07-31 Thread Mark Johnston
On Tue, Jul 31, 2018 at 11:54:29PM +0200, Mark Martinec wrote:
> I have now upgraded this host from 11.1-RELEASE-p11 to 11.2-RELEASE
> and the situation has not improved. Also turned off all services.
> ZFS is still leaking memory about 30 MB per hour, until the host
> runs out of memory and swap space and crashes, unless I reboot it
> first every four days.
> 
> Any advice before I try to get rid of that faulted disk with a pool
> (or downgrade to 10.3, which was stable)?

If you're able to use dtrace, it would be useful to try tracking
allocations with the solaris tag:

# dtrace -n 'dtmalloc::solaris:malloc {@allocs[stack(), args[3]] =
  count()} dtmalloc::solaris:free {@frees[stack(), args[3]] = count();}'

Try letting that run for one minute, then kill it and paste the output.
Ideally the host will be as close to idle as possible while still
demonstrating the leak.
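Once that output is collected, comparing the two aggregations is mostly bookkeeping. A hedged Python sketch, assuming the @allocs and @frees lines have already been parsed into Counters keyed by (stack, size), with size taken from args[3]; the parsing step and all names here are hypothetical. Because a buffer is usually freed from a different stack than the one that allocated it, the sketch balances counts per size rather than per stack, then lists the malloc stacks for sizes that were allocated more often than freed.

```python
from collections import Counter

# Keys are (stack, size) tuples: stack is the kernel stack as text,
# size is args[3] from the dtmalloc probes, in bytes.


def net_by_size(allocs: Counter, frees: Counter) -> dict[int, int]:
    """Mallocs minus frees per allocation size, ignoring the stack."""
    alloc_sizes: Counter = Counter()
    free_sizes: Counter = Counter()
    for (_stack, size), n in allocs.items():
        alloc_sizes[size] += n
    for (_stack, size), n in frees.items():
        free_sizes[size] += n
    return {size: alloc_sizes[size] - free_sizes[size]
            for size in alloc_sizes.keys() | free_sizes.keys()}


def leak_candidates(allocs: Counter, frees: Counter) -> list[tuple[str, int, int]]:
    """Malloc stacks for sizes with unmatched allocations.

    Returns (stack, size, bytes_allocated_by_that_stack), largest first.
    """
    net = net_by_size(allocs, frees)
    leaking = {size for size, n in net.items() if n > 0}
    cands = [(stack, size, n * size)
             for (stack, size), n in allocs.items() if size in leaking]
    return sorted(cands, key=lambda c: c[2], reverse=True)


if __name__ == "__main__":
    # Made-up counts purely to show the shape of the result.
    allocs = Counter({("stackA", 64): 10, ("stackB", 128): 5})
    frees = Counter({("stackC", 64): 10, ("stackD", 128): 2})
    print(net_by_size(allocs, frees))
    print(leak_candidates(allocs, frees))
```

A one-minute window will always show some in-flight allocations, so a size only stands out if its imbalance is large or persists across several runs.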


Re: All the memory eaten away by ZFS 'solaris' malloc - on 11.1-R amd64

2018-07-31 Thread Mark Martinec

I have now upgraded this host from 11.1-RELEASE-p11 to 11.2-RELEASE
and the situation has not improved. Also turned off all services.
ZFS is still leaking memory about 30 MB per hour, until the host
runs out of memory and swap space and crashes, unless I reboot it
first every four days.

Any advice before I try to get rid of that faulted disk with a pool
(or downgrade to 10.3, which was stable)?

  Mark


2018-07-23 17:12, myself wrote:

After upgrading an older AMD host from FreeBSD 10.3 to 11.1-RELEASE-p11
(amd64), ZFS is gradually eating up all memory, so that it crashes every
few days when the memory is completely exhausted (after swapping heavily
for a couple of hours).

This machine has only 4 GB of memory. After capping the ZFS ARC
at 1.8 GB the machine can now stay up a bit longer, but in four days
all the memory is used up. The machine is lightly loaded; it runs
a BIND resolver and a lightly used web server, and the ps output
does not show any excessive memory use by any process.

During the last run between reboots I ran 'vmstat -m' every second
and logged the results. What caught my eye was the 'solaris' entry,
which seems to account for all of the exhaustion.

The MemUse for the solaris entry starts modestly, e.g. after a few
hours of uptime:

$ vmstat -m
     Type   InUse  MemUse HighUse Requests  Size(s)
  solaris 3141552 225178K       - 12066929  16,32,64,128,256,512,1024,2048,4096,8192,16384,32768

... but this number keeps steadily growing.

After about four days, shortly before a crash, it grew to 2.5 GB,
which gets dangerously close to all the available memory:

  solaris 39359484 2652696K       - 234986296  16,32,64,128,256,512,1024,2048,4096,8192,16384,32768
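Pulling the 'solaris' row out of each logged sample is easy to script. A minimal Python sketch, assuming the column layout shown above (Type, InUse, MemUse, HighUse, Requests, Size(s)) with MemUse printed with a K suffix; the function names are hypothetical.

```python
from typing import Optional

# Multipliers for the unit suffix on MemUse (K as shown above; M/G are an
# assumption, handled defensively).
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(field: str) -> int:
    """Convert a MemUse value such as '225178K' to bytes."""
    suffix = field[-1].upper()
    if suffix in _UNITS:
        return int(field[:-1]) * _UNITS[suffix]
    return int(field)


def parse_vmstat_m(text: str, type_name: str = "solaris") -> Optional[dict]:
    """Pick one malloc type's row out of 'vmstat -m' output.

    Assumes the type name is a single word (true for 'solaris', not for
    every malloc type) and the column order Type InUse MemUse HighUse Requests.
    """
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[0] == type_name:
            return {
                "inuse": int(fields[1]),
                "memuse_bytes": parse_size(fields[2]),
                "requests": int(fields[4]),
            }
    return None


if __name__ == "__main__":
    early = parse_vmstat_m("  solaris 3141552 225178K - 12066929 16,32,64")
    late = parse_vmstat_m("  solaris 39359484 2652696K - 234986296 16,32,64")
    grown = late["memuse_bytes"] - early["memuse_bytes"]
    print(f"MemUse grew by {grown / 1024**2:.0f} MiB")
```

Applied to the two samples above, the MemUse difference is 2427518K, i.e. about 2370 MiB.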

Plotting the 'solaris' MemUse entry against wall-clock time in seconds
shows a steady linear growth of about 25 MB per hour. At fine resolution,
the growth appears as roughly one small step increase every 6 seconds.
All steps are small, but not all are the same size.
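The growth rate and step spacing can be estimated from the logged samples rather than read off a plot. A minimal Python sketch using an ordinary least-squares slope, assuming the samples are available as times in seconds and MemUse values in bytes; the function names are hypothetical.

```python
from typing import Optional


def slope(ts: list[float], ys: list[float]) -> float:
    """Ordinary least-squares slope of ys against ts (bytes per second)."""
    n = len(ts)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    den = sum((t - mean_t) ** 2 for t in ts)
    return num / den


def growth_mb_per_hour(ts: list[float], ys: list[float]) -> float:
    """Slope converted to MB (10**6 bytes) per hour."""
    return slope(ts, ys) * 3600 / 1e6


def mean_step_interval(ts: list[float], ys: list[float]) -> Optional[float]:
    """Average time between samples at which MemUse went up."""
    step_times = [t for t, prev, cur in zip(ts[1:], ys, ys[1:]) if cur > prev]
    if len(step_times) < 2:
        return None
    return (step_times[-1] - step_times[0]) / (len(step_times) - 1)
```

As a derived figure, not a measurement from the thread: 25 MB per hour in steps every ~6 seconds is about 600 steps per hour, or roughly 42 kB per step on average.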

The only thing (in my mind) that distinguishes this host from others
running 11.1 seems to be that one of the two ZFS pools is down because
its disk is broken. This is a scratch data pool, not otherwise in use.
The pool with the OS is healthy.

The syslog shows entries like the following periodically:

Jul 23 16:48:49 xxx ZFS: vdev state changed, pool_guid=15371508659919408885 vdev_guid=11732693005294113354
Jul 23 16:49:09 xxx ZFS: vdev state changed, pool_guid=15371508659919408885 vdev_guid=11732693005294113354
Jul 23 16:55:34 xxx ZFS: vdev state changed, pool_guid=15371508659919408885 vdev_guid=11732693005294113354

The 'zpool status -v' on this pool shows:

  pool: stuff
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-3C
  scan: none requested
config:

        NAME                    STATE     READ WRITE CKSUM
        stuff                   UNAVAIL      0     0     0
          11732693005294113354  UNAVAIL      0     0     0  was /dev/da2



The same machine, with this broken pool, could previously survive
indefinitely under FreeBSD 10.3.

So, could this be the reason for the memory depletion?
Are there any fixes for that? Any more tests you would suggest
before I try to get rid of this pool?

  Mark


Can't upgrade past 10.4-STABLE (interrupt storm?)

2018-07-31 Thread Chris H

Hello,
I've got an older laptop that I attempted to install 12 on, without success.
Well, it installed, but it was unusable. Text typed at the console
frequently doesn't appear on the screen unless I tap one of the arrow
keys, but doing that causes other problems, so I can't really use the
output. :(
I suspected an interrupt storm of some type, but can't really say for
sure. vmstat -i seems to show unusually high counts for irq1: atkbd0,
often about a third of the count for CPU0. I can easily reproduce all of
this during the install process from the install media, so it's easy to test.
This is an i386-based Pentium M. FreeBSD 10.4-STABLE runs like a dream.
But I'm going to need to move forward at some point, and I'd like that
point to be now. :)
Any thoughts on how I might discover /what/ change made between
10.4 and 11* causes this?
It has a trackpad. Should that make any difference?
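One way to check whether the irq1 count really tracks CPU0 is to compare the totals from a few vmstat -i samples. A minimal Python sketch, assuming the usual vmstat -i layout (interrupt, total, rate) and that the CPU0 line is named something like cpu0:timer; the function names, prefixes and sample numbers are hypothetical.

```python
# Interrupt names may contain spaces (e.g. "irq1: atkbd0"), so each row is
# split from the right into name, total and rate.


def parse_vmstat_i(text: str) -> dict[str, int]:
    """Map interrupt name -> total count."""
    totals: dict[str, int] = {}
    for line in text.splitlines():
        parts = line.rsplit(None, 2)
        if len(parts) != 3 or not parts[1].isdigit():
            continue  # header, blank or unexpected line
        name, total, _rate = parts
        totals[name.strip()] = int(total)
    return totals


def ratio(totals: dict[str, int], num_prefix: str, den_prefix: str) -> float:
    """Ratio of the first interrupt totals whose names start with the prefixes."""
    def find(prefix: str) -> int:
        for name, total in totals.items():
            if name.startswith(prefix):
                return total
        raise KeyError(prefix)
    return find(num_prefix) / find(den_prefix)


if __name__ == "__main__":
    sample = """interrupt                          total       rate
irq1: atkbd0                         12000         10
cpu0:timer                           36000         30
Total                                48000         40
"""
    print(ratio(parse_vmstat_i(sample), "irq1:", "cpu0:timer"))
```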

Thanks!

--Chris




Re: mail coredumping on yesterdays STABLE

2018-07-31 Thread Pete French
> Mark Johnston fixed this. See
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=230196

Thanks - I saw the commit go in earlier, but I hadn't checked my email
until now. Sorry I didn't investigate this myself; to be honest I am
perfectly capable of doing so, just very short of time :-(

cheers,

-pete.