from:"Gerhard Wiesinger"


On 27.05.2018 22:31, Florian Fainelli wrote:

Le 05/27/18 à 12:01, Gerhard Wiesinger a écrit :

On 24.05.2018 08:22, Gerhard Wiesinger wrote:

On 24.05.2018 07:29, Gerhard Wiesinger wrote:

After some analysis with Florian (thnx) we found out that the current
implementation is broken:

https://patchwork.ozlabs.org/patch/836538/
https://github.com/torvalds/linux/commit/c499696e7901bda18385ac723b7bd27c3a4af624#diff-a2b6f8d89e18de600e873ac3ac43fa1d


Florians comment:

c499696e7901bda18385ac723b7bd27c3a4af624 ("net: dsa: b53: Stop using
dev->cpu_port incorrectly") since it would result in no longer setting
the CPU port as tagged for a specific VLAN. Easiest way for you right
now is to just revert it, but this needs some more thoughts for a proper
upstream change. I will think about it some more.

Can confirm 4.14.18-200.fc26.armv7hl works, 4.15.x should be broken.

# Kernel 4.14.x ok
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/log/drivers/net/dsa/b53?h=v4.14.43

# Kernel 4.15.x should be NOT ok
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/log/drivers/net/dsa/b53?h=v4.15.18


Kernel 4.14.18-300.fc27.armv7hl works well so far, even with FC28
update. Florian send me a patch to try for 4.16.x

So does my patch make 4.16 work correctly for you now? If so, can I just
submit it and copy you?


I got the  commands below to work with manual script commands.
Afterwards I wrote systemd-networkd config where I've a strage problem
when IPv6 sends a multicast broadcast from another machine to the bridge
this will be sent back via the network interface, but with the source
MAC of the bridge of the other machine. dmesg from the other machine:
[117768.330444] br0: received packet on lan0 with own address as source
address (addr:a0:36:9f:ab:cd:ef, vlan:0)
[117768.334887] br0: received packet on lan0 with own address as source
address (addr:a0:36:9f:ab:cd:ef, vlan:0)
[117768.339281] br0: received packet on lan0 with own address as source
address (addr:a0:36:9f:ab:cd:ef, vlan:0)

And: If I just enter this command after e.g. a systemd-network restart
everything is fine forever:
# Not OK (dmesg message above is triggered on a remote computer, whole
switching network gets unstable, ssh terminals close, packet loss, etc.)
systemctl restart systemd-networkd
# OK again when this command is entered
bridge vlan add dev wan vid 102 pvid untagged

brctl show, ip link, bridge vlan, bridge link commands, etc. look all
the same, also /sys/class/net/br0/bridge, /sys/class/net/br1/bridge
settings

Systemd config correct?
Any ideas?

You should not have eth0.101 and eth0.102 to be enslaved in a bridge at
all, this is what is causing the bridge to be confused. Remember what I
wrote to you before, with the current b53 driver that does not have any
tagging enabled the lanX interfaces and brX interfaces are only used for
control and should not be used for passing any data. The only network
device that will be passing data is eth0, which is why we need to set-up
VLAN interfaces to pop/push the VLAN id accordingly.

I have no idea why manual vs. systemd does not work but you can most
certainly troubleshoot that by comparing the bridge/ip outputs.


So is that then the correct structure?

br1
- lan1 (with VID 101)
- lan2 (with VID 101)
- lan3 (with VID 101)
- lan4 (with VID 101)

brlan
- eth0.101
- wlan0 (currently not active, could be optimized without bridge but for 
future comfort)


br2
- wan (with VID 102) (could be optimized without bridge but for future 
comfort)

- future1

brwan
- eth0.102 (could be optimized without bridge but for future comfort)
- future2

Ad systemd vs. manual config: As I said I didn't find any difference in 
the bridge/ip outputs. As they are broken (see other message) maybe 
something else is broken, too.


Thnx.

Ciao,
Gerhard

Re: B53 DSA switch problem on Banana Pi-R1 on Fedora 26 - systemd-networkd problem


On 27.05.2018 22:31, Florian Fainelli wrote:

Le 05/27/18 à 12:01, Gerhard Wiesinger a écrit :

On 24.05.2018 08:22, Gerhard Wiesinger wrote:

On 24.05.2018 07:29, Gerhard Wiesinger wrote:

After some analysis with Florian (thnx) we found out that the current
implementation is broken:

https://patchwork.ozlabs.org/patch/836538/
https://github.com/torvalds/linux/commit/c499696e7901bda18385ac723b7bd27c3a4af624#diff-a2b6f8d89e18de600e873ac3ac43fa1d


Florians comment:

c499696e7901bda18385ac723b7bd27c3a4af624 ("net: dsa: b53: Stop using
dev->cpu_port incorrectly") since it would result in no longer setting
the CPU port as tagged for a specific VLAN. Easiest way for you right
now is to just revert it, but this needs some more thoughts for a proper
upstream change. I will think about it some more.

Can confirm 4.14.18-200.fc26.armv7hl works, 4.15.x should be broken.

# Kernel 4.14.x ok
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/log/drivers/net/dsa/b53?h=v4.14.43

# Kernel 4.15.x should be NOT ok
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/log/drivers/net/dsa/b53?h=v4.15.18


Kernel 4.14.18-300.fc27.armv7hl works well so far, even with FC28
update. Florian send me a patch to try for 4.16.x

So does my patch make 4.16 work correctly for you now? If so, can I just
submit it and copy you?


I got the  commands below to work with manual script commands.
Afterwards I wrote systemd-networkd config where I've a strage problem
when IPv6 sends a multicast broadcast from another machine to the bridge
this will be sent back via the network interface, but with the source
MAC of the bridge of the other machine. dmesg from the other machine:
[117768.330444] br0: received packet on lan0 with own address as source
address (addr:a0:36:9f:ab:cd:ef, vlan:0)
[117768.334887] br0: received packet on lan0 with own address as source
address (addr:a0:36:9f:ab:cd:ef, vlan:0)
[117768.339281] br0: received packet on lan0 with own address as source
address (addr:a0:36:9f:ab:cd:ef, vlan:0)

And: If I just enter this command after e.g. a systemd-network restart
everything is fine forever:
# Not OK (dmesg message above is triggered on a remote computer, whole
switching network gets unstable, ssh terminals close, packet loss, etc.)
systemctl restart systemd-networkd
# OK again when this command is entered
bridge vlan add dev wan vid 102 pvid untagged

brctl show, ip link, bridge vlan, bridge link commands, etc. look all
the same, also /sys/class/net/br0/bridge, /sys/class/net/br1/bridge
settings

Systemd config correct?
Any ideas?

You should not have eth0.101 and eth0.102 to be enslaved in a bridge at
all, this is what is causing the bridge to be confused. Remember what I
wrote to you before, with the current b53 driver that does not have any
tagging enabled the lanX interfaces and brX interfaces are only used for
control and should not be used for passing any data. The only network
device that will be passing data is eth0, which is why we need to set-up
VLAN interfaces to pop/push the VLAN id accordingly.

I have no idea why manual vs. systemd does not work but you can most
certainly troubleshoot that by comparing the bridge/ip outputs.


So is that then the correct structure?

br1
- lan1 (with VID 101)
- lan2 (with VID 101)
- lan3 (with VID 101)
- lan4 (with VID 101)

brlan
- eth0.101
- wlan0 (currently not active, could be optimized without bridge but for 
future comfort)


br2
- wan (with VID 102) (could be optimized without bridge but for future 
comfort)

- future1

brwan
- eth0.102 (could be optimized without bridge but for future comfort)
- future2

Ad systemd vs. manual config: As I said I didn't find any difference in 
the bridge/ip outputs. As they are broken (see other message) maybe 
something else is broken, too.


Thnx.

Ciao,
Gerhard

Re: B53 DSA switch problem on Banana Pi-R1 on Fedora 26 - systemd-networkd problem


On 27.05.2018 22:35, Florian Fainelli wrote:

Le 05/27/18 à 12:18, Gerhard Wiesinger a écrit :

On 27.05.2018 21:01, Gerhard Wiesinger wrote:

On 24.05.2018 08:22, Gerhard Wiesinger wrote:

On 24.05.2018 07:29, Gerhard Wiesinger wrote:

After some analysis with Florian (thnx) we found out that the
current implementation is broken:

https://patchwork.ozlabs.org/patch/836538/
https://github.com/torvalds/linux/commit/c499696e7901bda18385ac723b7bd27c3a4af624#diff-a2b6f8d89e18de600e873ac3ac43fa1d


Florians comment:

c499696e7901bda18385ac723b7bd27c3a4af624 ("net: dsa: b53: Stop using
dev->cpu_port incorrectly") since it would result in no longer setting
the CPU port as tagged for a specific VLAN. Easiest way for you right
now is to just revert it, but this needs some more thoughts for a
proper
upstream change. I will think about it some more.

Can confirm 4.14.18-200.fc26.armv7hl works, 4.15.x should be broken.

# Kernel 4.14.x ok
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/log/drivers/net/dsa/b53?h=v4.14.43

# Kernel 4.15.x should be NOT ok
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/log/drivers/net/dsa/b53?h=v4.15.18




Forgot to mention: What's also strange is that the VLAN ID is very high:

# 4.14.18-300.fc27.armv7hl, iproute-4.15.0-1.fc28.armv7hl
ip -d link show eth0.101 | grep "vlan protocol"
     vlan protocol 802.1Q id 3069279796 
ip -d link show eth0.102 | grep "vlan protocol"
     vlan protocol 802.1Q id 3068673588 

On older kernels this looks ok: 4.12.8-200.fc25.armv7hl,
iproute-4.11.0-1.fc25.armv7hl:
  ip -d link show eth0.101 | grep "vlan protocol"
     vlan protocol 802.1Q id 101 
ip -d link show eth0.102 | grep "vlan protocol"
     vlan protocol 802.1Q id 102 

Ideas?

That is quite likely a kernel/iproute2 issue, if you configured the
switch through bridge vlan to have the ports in VLAN 101 and VLAN 102
and you do indeed see frames entering eth0 with these VLAN IDs, then
clearly the bridge -> switchdev -> dsa -> b53 part is working just fine
and what you are seeing is some for of kernel header/netlink
incompatibility.


Yes, sniffing on eth0 shows the correct VLAN IDs, e.g. 101.

Yes, my guess is that tools are wrong and have random values on 2 calls 
in different values (e.g. alsopromiscuity ) , see below 


Who can fix it?

BTW: On FC27 same issue with same kernel version, but guess older 
iproute version.


Ciao,

Gerhard


ip -d link show eth0.101

13: eth0.101@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
noqueue master br0 state UP mode DEFAULT group default qlen 1000
    link/ether 02:18:09:ab:cd:ef brd ff:ff:ff:ff:ff:ff promiscuity 
3068661300

    vlan protocol 802.1Q id 3068661300 
    bridge_slave state forwarding priority 32 cost 4 hairpin off guard 
off root_block off fastleave off learning on flood on port_id 0x8005 
port_no 0x5 designa
ted_port 3068661300 designated_cost 3068661300 designated_bridge 
8000.66:5d:a2:ab:cd:ef designated_root 8000.66:5d:a2:ab:cd:ef 
hold_timer    0.00 message_age_tim
er    0.00 forward_delay_timer    0.00 topology_change_ack 3068661300 
config_pending 3068661300 proxy_arp off proxy_arp_wifi off mcast_router 
3068661300 mcast_
fast_leave off mcast_flood on vlan_tunnel off addrgenmode eui64 
numtxqueues 3068661300 numrxqueues 3068661300 gso_max_size 3068661300 
gso_max_segs 3068661300


ip -d link show eth0.101
13: eth0.101@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc 
noqueue master br0 state UP mode DEFAULT group default qlen 1000
    link/ether 02:18:09:ab:cd:ef brd ff:ff:ff:ff:ff:ff promiscuity 
3068735028

    vlan protocol 802.1Q id 3068735028 
    bridge_slave state forwarding priority 32 cost 4 hairpin off guard 
off root_block off fastleave off learning on flood on port_id 0x8005 
port_no 0x5 designa
ted_port 3068735028 designated_cost 3068735028 designated_bridge 
8000.66:5d:ab:cd:ef designated_root 8000.66:5d:a2:ab:cd:ef hold_timer    
0.00 message_age_tim
er    0.00 forward_delay_timer    0.00 topology_change_ack 3068735028 
config_pending 3068735028 proxy_arp off proxy_arp_wifi off mcast_router 
3068735028 mcast_
fast_leave off mcast_flood on vlan_tunnel off addrgenmode eui64 
numtxqueues 3068735028 numrxqueues 3068735028 gso_max_size 3068735028 
gso_max_segs 3068735028

Re: B53 DSA switch problem on Banana Pi-R1 on Fedora 26 - systemd-networkd problem


On 27.05.2018 22:35, Florian Fainelli wrote:

Le 05/27/18 à 12:18, Gerhard Wiesinger a écrit :

On 27.05.2018 21:01, Gerhard Wiesinger wrote:

On 24.05.2018 08:22, Gerhard Wiesinger wrote:

On 24.05.2018 07:29, Gerhard Wiesinger wrote:

After some analysis with Florian (thnx) we found out that the
current implementation is broken:

https://patchwork.ozlabs.org/patch/836538/
https://github.com/torvalds/linux/commit/c499696e7901bda18385ac723b7bd27c3a4af624#diff-a2b6f8d89e18de600e873ac3ac43fa1d


Florians comment:

c499696e7901bda18385ac723b7bd27c3a4af624 ("net: dsa: b53: Stop using
dev->cpu_port incorrectly") since it would result in no longer setting
the CPU port as tagged for a specific VLAN. Easiest way for you right
now is to just revert it, but this needs some more thoughts for a
proper
upstream change. I will think about it some more.

Can confirm 4.14.18-200.fc26.armv7hl works, 4.15.x should be broken.

# Kernel 4.14.x ok
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/log/drivers/net/dsa/b53?h=v4.14.43

# Kernel 4.15.x should be NOT ok
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/log/drivers/net/dsa/b53?h=v4.15.18




Forgot to mention: What's also strange is that the VLAN ID is very high:

# 4.14.18-300.fc27.armv7hl, iproute-4.15.0-1.fc28.armv7hl
ip -d link show eth0.101 | grep "vlan protocol"
     vlan protocol 802.1Q id 3069279796 
ip -d link show eth0.102 | grep "vlan protocol"
     vlan protocol 802.1Q id 3068673588 

On older kernels this looks ok: 4.12.8-200.fc25.armv7hl,
iproute-4.11.0-1.fc25.armv7hl:
  ip -d link show eth0.101 | grep "vlan protocol"
     vlan protocol 802.1Q id 101 
ip -d link show eth0.102 | grep "vlan protocol"
     vlan protocol 802.1Q id 102 

Ideas?

That is quite likely a kernel/iproute2 issue, if you configured the
switch through bridge vlan to have the ports in VLAN 101 and VLAN 102
and you do indeed see frames entering eth0 with these VLAN IDs, then
clearly the bridge -> switchdev -> dsa -> b53 part is working just fine
and what you are seeing is some for of kernel header/netlink
incompatibility.


Yes, sniffing on eth0 shows the correct VLAN IDs, e.g. 101.

Yes, my guess is that tools are wrong and have random values on 2 calls 
in different values (e.g. alsopromiscuity ) , see below 


Who can fix it?

BTW: On FC27 same issue with same kernel version, but guess older 
iproute version.


Ciao,

Gerhard


ip -d link show eth0.101

13: eth0.101@eth0:  mtu 1500 qdisc 
noqueue master br0 state UP mode DEFAULT group default qlen 1000
    link/ether 02:18:09:ab:cd:ef brd ff:ff:ff:ff:ff:ff promiscuity 
3068661300

    vlan protocol 802.1Q id 3068661300 
    bridge_slave state forwarding priority 32 cost 4 hairpin off guard 
off root_block off fastleave off learning on flood on port_id 0x8005 
port_no 0x5 designa
ted_port 3068661300 designated_cost 3068661300 designated_bridge 
8000.66:5d:a2:ab:cd:ef designated_root 8000.66:5d:a2:ab:cd:ef 
hold_timer    0.00 message_age_tim
er    0.00 forward_delay_timer    0.00 topology_change_ack 3068661300 
config_pending 3068661300 proxy_arp off proxy_arp_wifi off mcast_router 
3068661300 mcast_
fast_leave off mcast_flood on vlan_tunnel off addrgenmode eui64 
numtxqueues 3068661300 numrxqueues 3068661300 gso_max_size 3068661300 
gso_max_segs 3068661300


ip -d link show eth0.101
13: eth0.101@eth0:  mtu 1500 qdisc 
noqueue master br0 state UP mode DEFAULT group default qlen 1000
    link/ether 02:18:09:ab:cd:ef brd ff:ff:ff:ff:ff:ff promiscuity 
3068735028

    vlan protocol 802.1Q id 3068735028 
    bridge_slave state forwarding priority 32 cost 4 hairpin off guard 
off root_block off fastleave off learning on flood on port_id 0x8005 
port_no 0x5 designa
ted_port 3068735028 designated_cost 3068735028 designated_bridge 
8000.66:5d:ab:cd:ef designated_root 8000.66:5d:a2:ab:cd:ef hold_timer    
0.00 message_age_tim
er    0.00 forward_delay_timer    0.00 topology_change_ack 3068735028 
config_pending 3068735028 proxy_arp off proxy_arp_wifi off mcast_router 
3068735028 mcast_
fast_leave off mcast_flood on vlan_tunnel off addrgenmode eui64 
numtxqueues 3068735028 numrxqueues 3068735028 gso_max_size 3068735028 
gso_max_segs 3068735028

Re: B53 DSA switch problem on Banana Pi-R1 on Fedora 26 - systemd-networkd problem


On 27.05.2018 21:01, Gerhard Wiesinger wrote:

On 24.05.2018 08:22, Gerhard Wiesinger wrote:

On 24.05.2018 07:29, Gerhard Wiesinger wrote:
After some analysis with Florian (thnx) we found out that the 
current implementation is broken:


https://patchwork.ozlabs.org/patch/836538/
https://github.com/torvalds/linux/commit/c499696e7901bda18385ac723b7bd27c3a4af624#diff-a2b6f8d89e18de600e873ac3ac43fa1d 



Florians comment:

c499696e7901bda18385ac723b7bd27c3a4af624 ("net: dsa: b53: Stop using
dev->cpu_port incorrectly") since it would result in no longer setting
the CPU port as tagged for a specific VLAN. Easiest way for you right
now is to just revert it, but this needs some more thoughts for a 
proper

upstream change. I will think about it some more.


Can confirm 4.14.18-200.fc26.armv7hl works, 4.15.x should be broken.

# Kernel 4.14.x ok
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/log/drivers/net/dsa/b53?h=v4.14.43 


# Kernel 4.15.x should be NOT ok
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/log/drivers/net/dsa/b53?h=v4.15.18 






Forgot to mention: What's also strange is that the VLAN ID is very high:

# 4.14.18-300.fc27.armv7hl, iproute-4.15.0-1.fc28.armv7hl
ip -d link show eth0.101 | grep "vlan protocol"
    vlan protocol 802.1Q id 3069279796 
ip -d link show eth0.102 | grep "vlan protocol"
    vlan protocol 802.1Q id 3068673588 

On older kernels this looks ok: 4.12.8-200.fc25.armv7hl, 
iproute-4.11.0-1.fc25.armv7hl:

 ip -d link show eth0.101 | grep "vlan protocol"
    vlan protocol 802.1Q id 101 
ip -d link show eth0.102 | grep "vlan protocol"
    vlan protocol 802.1Q id 102 

Ideas?

Thank you.

Ciao,
Gerhard

Re: B53 DSA switch problem on Banana Pi-R1 on Fedora 26 - systemd-networkd problem


On 27.05.2018 21:01, Gerhard Wiesinger wrote:

On 24.05.2018 08:22, Gerhard Wiesinger wrote:

On 24.05.2018 07:29, Gerhard Wiesinger wrote:
After some analysis with Florian (thnx) we found out that the 
current implementation is broken:


https://patchwork.ozlabs.org/patch/836538/
https://github.com/torvalds/linux/commit/c499696e7901bda18385ac723b7bd27c3a4af624#diff-a2b6f8d89e18de600e873ac3ac43fa1d 



Florians comment:

c499696e7901bda18385ac723b7bd27c3a4af624 ("net: dsa: b53: Stop using
dev->cpu_port incorrectly") since it would result in no longer setting
the CPU port as tagged for a specific VLAN. Easiest way for you right
now is to just revert it, but this needs some more thoughts for a 
proper

upstream change. I will think about it some more.


Can confirm 4.14.18-200.fc26.armv7hl works, 4.15.x should be broken.

# Kernel 4.14.x ok
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/log/drivers/net/dsa/b53?h=v4.14.43 


# Kernel 4.15.x should be NOT ok
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/log/drivers/net/dsa/b53?h=v4.15.18 






Forgot to mention: What's also strange is that the VLAN ID is very high:

# 4.14.18-300.fc27.armv7hl, iproute-4.15.0-1.fc28.armv7hl
ip -d link show eth0.101 | grep "vlan protocol"
    vlan protocol 802.1Q id 3069279796 
ip -d link show eth0.102 | grep "vlan protocol"
    vlan protocol 802.1Q id 3068673588 

On older kernels this looks ok: 4.12.8-200.fc25.armv7hl, 
iproute-4.11.0-1.fc25.armv7hl:

 ip -d link show eth0.101 | grep "vlan protocol"
    vlan protocol 802.1Q id 101 
ip -d link show eth0.102 | grep "vlan protocol"
    vlan protocol 802.1Q id 102 

Ideas?

Thank you.

Ciao,
Gerhard

Re: B53 DSA switch problem on Banana Pi-R1 on Fedora 26 - systemd-networkd problem


On 24.05.2018 08:22, Gerhard Wiesinger wrote:

On 24.05.2018 07:29, Gerhard Wiesinger wrote:
After some analysis with Florian (thnx) we found out that the current 
implementation is broken:


https://patchwork.ozlabs.org/patch/836538/
https://github.com/torvalds/linux/commit/c499696e7901bda18385ac723b7bd27c3a4af624#diff-a2b6f8d89e18de600e873ac3ac43fa1d 



Florians comment:

c499696e7901bda18385ac723b7bd27c3a4af624 ("net: dsa: b53: Stop using
dev->cpu_port incorrectly") since it would result in no longer setting
the CPU port as tagged for a specific VLAN. Easiest way for you right
now is to just revert it, but this needs some more thoughts for a proper
upstream change. I will think about it some more.


Can confirm 4.14.18-200.fc26.armv7hl works, 4.15.x should be broken.

# Kernel 4.14.x ok
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/log/drivers/net/dsa/b53?h=v4.14.43 


# Kernel 4.15.x should be NOT ok
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/log/drivers/net/dsa/b53?h=v4.15.18 



Kernel 4.14.18-300.fc27.armv7hl works well so far, even with FC28 
update. Florian send me a patch to try for 4.16.x


I got the  commands below to work with manual script commands.
Afterwards I wrote systemd-networkd config where I've a strage problem 
when IPv6 sends a multicast broadcast from another machine to the bridge 
this will be sent back via the network interface, but with the source 
MAC of the bridge of the other machine. dmesg from the other machine:
[117768.330444] br0: received packet on lan0 with own address as source 
address (addr:a0:36:9f:ab:cd:ef, vlan:0)
[117768.334887] br0: received packet on lan0 with own address as source 
address (addr:a0:36:9f:ab:cd:ef, vlan:0)
[117768.339281] br0: received packet on lan0 with own address as source 
address (addr:a0:36:9f:ab:cd:ef, vlan:0)


And: If I just enter this command after e.g. a systemd-network restart 
everything is fine forever:
# Not OK (dmesg message above is triggered on a remote computer, whole 
switching network gets unstable, ssh terminals close, packet loss, etc.)

systemctl restart systemd-networkd
# OK again when this command is entered
bridge vlan add dev wan vid 102 pvid untagged

brctl show, ip link, bridge vlan, bridge link commands, etc. look all 
the same, also /sys/class/net/br0/bridge, /sys/class/net/br1/bridge settings


Systemd config correct?
Any ideas?

Thank you.

Ciao,
Gerhard

brctl show
bridge name bridge id   STP enabled interfaces
br0 8000.665da2abcdef   no  eth0.101
    lan1
    lan2
    lan3
    lan4
br1 8000.9a4557abcdef  no  eth0.102
    wan


bridge vlan show
port    vlan ids
lan2 101 PVID Egress Untagged

lan3 101 PVID Egress Untagged

lan4 101 PVID Egress Untagged

wan  102 PVID Egress Untagged

lan1 101 PVID Egress Untagged

br1 None
br0 None
eth0.102    None
eth0.101    None



OK: manual scripts



ip link add link eth0 name eth0.101 type vlan id 101
ip link set eth0.101 up
ip link add link eth0 name eth0.102 type vlan id 102
ip link set eth0.102 up
ip link add br0 type bridge
ip link set dev br0 type bridge stp_state 0
ip link set lan1 master br0
bridge vlan add dev lan1 vid 101 pvid untagged
ip link set lan1 up
ip link set lan2 master br0
bridge vlan add dev lan2 vid 101 pvid untagged
ip link set lan2 up
ip link set lan3 master br0
bridge vlan add dev lan3 vid 101 pvid untagged
ip link set lan3 up
ip link set lan4 master br0
bridge vlan add dev lan4 vid 101 pvid untagged
ip link set lan4 up
ip link set eth0.101 master br0
ip link set eth0.101 up
ip link set br0 up
ip link add br1 type bridge
ip link set dev br1 type bridge stp_state 0
ip link set wan master br1
bridge vlan add dev wan vid 102 pvid untagged
ip link set wan up
ip link set eth0.102 master br1
ip link set eth0.102 up
ip link set br1 up
ip addr flush dev br0
ip addr add 192.168.0.250/24 dev br0
ip route del default via 192.168.0.1 dev br0
ip route add default via 192.168.0.1 dev br0
ip addr flush dev br1
ip addr add 192.168.1.1/24 dev br1




NOK: after a multicast packe

Re: B53 DSA switch problem on Banana Pi-R1 on Fedora 26 - systemd-networkd problem


On 24.05.2018 08:22, Gerhard Wiesinger wrote:

On 24.05.2018 07:29, Gerhard Wiesinger wrote:
After some analysis with Florian (thnx) we found out that the current 
implementation is broken:


https://patchwork.ozlabs.org/patch/836538/
https://github.com/torvalds/linux/commit/c499696e7901bda18385ac723b7bd27c3a4af624#diff-a2b6f8d89e18de600e873ac3ac43fa1d 



Florians comment:

c499696e7901bda18385ac723b7bd27c3a4af624 ("net: dsa: b53: Stop using
dev->cpu_port incorrectly") since it would result in no longer setting
the CPU port as tagged for a specific VLAN. Easiest way for you right
now is to just revert it, but this needs some more thoughts for a proper
upstream change. I will think about it some more.


Can confirm 4.14.18-200.fc26.armv7hl works, 4.15.x should be broken.

# Kernel 4.14.x ok
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/log/drivers/net/dsa/b53?h=v4.14.43 


# Kernel 4.15.x should be NOT ok
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/log/drivers/net/dsa/b53?h=v4.15.18 



Kernel 4.14.18-300.fc27.armv7hl works well so far, even with FC28 
update. Florian send me a patch to try for 4.16.x


I got the  commands below to work with manual script commands.
Afterwards I wrote systemd-networkd config where I've a strage problem 
when IPv6 sends a multicast broadcast from another machine to the bridge 
this will be sent back via the network interface, but with the source 
MAC of the bridge of the other machine. dmesg from the other machine:
[117768.330444] br0: received packet on lan0 with own address as source 
address (addr:a0:36:9f:ab:cd:ef, vlan:0)
[117768.334887] br0: received packet on lan0 with own address as source 
address (addr:a0:36:9f:ab:cd:ef, vlan:0)
[117768.339281] br0: received packet on lan0 with own address as source 
address (addr:a0:36:9f:ab:cd:ef, vlan:0)


And: If I just enter this command after e.g. a systemd-network restart 
everything is fine forever:
# Not OK (dmesg message above is triggered on a remote computer, whole 
switching network gets unstable, ssh terminals close, packet loss, etc.)

systemctl restart systemd-networkd
# OK again when this command is entered
bridge vlan add dev wan vid 102 pvid untagged

brctl show, ip link, bridge vlan, bridge link commands, etc. look all 
the same, also /sys/class/net/br0/bridge, /sys/class/net/br1/bridge settings


Systemd config correct?
Any ideas?

Thank you.

Ciao,
Gerhard

brctl show
bridge name bridge id   STP enabled interfaces
br0 8000.665da2abcdef   no  eth0.101
    lan1
    lan2
    lan3
    lan4
br1 8000.9a4557abcdef  no  eth0.102
    wan


bridge vlan show
port    vlan ids
lan2 101 PVID Egress Untagged

lan3 101 PVID Egress Untagged

lan4 101 PVID Egress Untagged

wan  102 PVID Egress Untagged

lan1 101 PVID Egress Untagged

br1 None
br0 None
eth0.102    None
eth0.101    None



OK: manual scripts



ip link add link eth0 name eth0.101 type vlan id 101
ip link set eth0.101 up
ip link add link eth0 name eth0.102 type vlan id 102
ip link set eth0.102 up
ip link add br0 type bridge
ip link set dev br0 type bridge stp_state 0
ip link set lan1 master br0
bridge vlan add dev lan1 vid 101 pvid untagged
ip link set lan1 up
ip link set lan2 master br0
bridge vlan add dev lan2 vid 101 pvid untagged
ip link set lan2 up
ip link set lan3 master br0
bridge vlan add dev lan3 vid 101 pvid untagged
ip link set lan3 up
ip link set lan4 master br0
bridge vlan add dev lan4 vid 101 pvid untagged
ip link set lan4 up
ip link set eth0.101 master br0
ip link set eth0.101 up
ip link set br0 up
ip link add br1 type bridge
ip link set dev br1 type bridge stp_state 0
ip link set wan master br1
bridge vlan add dev wan vid 102 pvid untagged
ip link set wan up
ip link set eth0.102 master br1
ip link set eth0.102 up
ip link set br1 up
ip addr flush dev br0
ip addr add 192.168.0.250/24 dev br0
ip route del default via 192.168.0.1 dev br0
ip route add default via 192.168.0.1 dev br0
ip addr flush dev br1
ip addr add 192.168.1.1/24 dev br1




NOK: after a multicast packe

Re: B53 DSA switch problem on Banana Pi-R1 on Fedora 26

2018-05-24 Thread Gerhard Wiesinger


On 24.05.2018 07:29, Gerhard Wiesinger wrote:
After some analysis with Florian (thnx) we found out that the current 
implementation is broken:


https://patchwork.ozlabs.org/patch/836538/
https://github.com/torvalds/linux/commit/c499696e7901bda18385ac723b7bd27c3a4af624#diff-a2b6f8d89e18de600e873ac3ac43fa1d 



Florians comment:

c499696e7901bda18385ac723b7bd27c3a4af624 ("net: dsa: b53: Stop using
dev->cpu_port incorrectly") since it would result in no longer setting
the CPU port as tagged for a specific VLAN. Easiest way for you right
now is to just revert it, but this needs some more thoughts for a proper
upstream change. I will think about it some more.


Can confirm 4.14.18-200.fc26.armv7hl works, 4.15.x should be broken.

# Kernel 4.14.x ok
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/log/drivers/net/dsa/b53?h=v4.14.43
# Kernel 4.15.x should be NOT ok
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/log/drivers/net/dsa/b53?h=v4.15.18

Ciao,
Gerhard

Re: B53 DSA switch problem on Banana Pi-R1 on Fedora 26

2018-05-24 Thread Gerhard Wiesinger


On 24.05.2018 07:29, Gerhard Wiesinger wrote:
After some analysis with Florian (thnx) we found out that the current 
implementation is broken:


https://patchwork.ozlabs.org/patch/836538/
https://github.com/torvalds/linux/commit/c499696e7901bda18385ac723b7bd27c3a4af624#diff-a2b6f8d89e18de600e873ac3ac43fa1d 



Florians comment:

c499696e7901bda18385ac723b7bd27c3a4af624 ("net: dsa: b53: Stop using
dev->cpu_port incorrectly") since it would result in no longer setting
the CPU port as tagged for a specific VLAN. Easiest way for you right
now is to just revert it, but this needs some more thoughts for a proper
upstream change. I will think about it some more.


Can confirm 4.14.18-200.fc26.armv7hl works, 4.15.x should be broken.

# Kernel 4.14.x ok
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/log/drivers/net/dsa/b53?h=v4.14.43
# Kernel 4.15.x should be NOT ok
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/log/drivers/net/dsa/b53?h=v4.15.18

Ciao,
Gerhard

Re: B53 DSA switch problem on Banana Pi-R1 on Fedora 26

After some analysis with Florian (thnx) we found out that the current 
implementation is broken:


https://patchwork.ozlabs.org/patch/836538/
https://github.com/torvalds/linux/commit/c499696e7901bda18385ac723b7bd27c3a4af624#diff-a2b6f8d89e18de600e873ac3ac43fa1d

Florians comment:

c499696e7901bda18385ac723b7bd27c3a4af624 ("net: dsa: b53: Stop using
dev->cpu_port incorrectly") since it would result in no longer setting
the CPU port as tagged for a specific VLAN. Easiest way for you right
now is to just revert it, but this needs some more thoughts for a proper
upstream change. I will think about it some more.

Ciao,
Gerhard

Re: B53 DSA switch problem on Banana Pi-R1 on Fedora 26

After some analysis with Florian (thnx) we found out that the current 
implementation is broken:


https://patchwork.ozlabs.org/patch/836538/
https://github.com/torvalds/linux/commit/c499696e7901bda18385ac723b7bd27c3a4af624#diff-a2b6f8d89e18de600e873ac3ac43fa1d

Florians comment:

c499696e7901bda18385ac723b7bd27c3a4af624 ("net: dsa: b53: Stop using
dev->cpu_port incorrectly") since it would result in no longer setting
the CPU port as tagged for a specific VLAN. Easiest way for you right
now is to just revert it, but this needs some more thoughts for a proper
upstream change. I will think about it some more.

Ciao,
Gerhard

Re: B53 DSA switch problem on Banana Pi-R1 on Fedora 26


On 23.05.2018 19:55, Florian Fainelli wrote:

On 05/23/2018 10:35 AM, Gerhard Wiesinger wrote:

On 23.05.2018 17:28, Florian Fainelli wrote:

And in the future (time plan)?

If you don't care about multicast then you can use those patches:

https://github.com/ffainelli/linux/commit/de055bf5f34e9806463ab2793e0852f5dfc380df


and you have to change the part of drivers/net/dsa/b53/b53_common.c that
returns DSA_TAG_PROTO_NONE for 53125:


diff --git a/drivers/net/dsa/b53/b53_common.c
b/drivers/net/dsa/b53/b53_common.c
index 9f561fe505cb..3c64f026a8ce 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -1557,7 +1557,7 @@ enum dsa_tag_protocol b53_get_tag_protocol(struct
dsa_switch *ds, int port)
   * mode to be turned on which means we need to specifically
manage ARL
   * misses on multicast addresses (TBD).
   */
-   if (is5325(dev) || is5365(dev) || is539x(dev) || is531x5(dev) ||
+   if (is5325(dev) || is5365(dev) || is539x(dev) ||
  !b53_can_enable_brcm_tags(ds, port))
  return DSA_TAG_PROTO_NONE;


That would bring Broadcom tags to the 53125 switch and you would be able
to use the configuration lines from Andrew in that case.

What's the plan here regarding these 2 config option mode (how do you
call them?)?

Broadcom tags is the underlying feature that provides per-port
information about the packets going in and out. Turning on Broadcom tags
requires turning on managed mode which means that the host now has to
manage how MAC addresses are programmed into the switch, it's not rocket
science, but I don't have a good test framework to automate the testing
of those changes yet. If you are willing to help in the testing, I can
certainly give you patches to try.


Yes, patches are welcome.


I mean, will this be a breaking change in the future where config has to
be done in a different way then?

When Broadcom tags are enabled the switch gets usable the way Andrew
expressed it, the only difference that makes on your configuration if
you want e.g: VLAN 101 to be for port 1-4 and VLAN 102 to be for port 5,
is that you no longer create an eth0.101 and eth0.102, but you create
br0.101 and br0.102.


I think documentation (dsa.txt) should provide more examples.




Or will it be configurable via module parameters or /proc or /sys
filesystem options?

We might be able to expose a sysfs attribute which shows the type of
tagging being enabled by a particular switch, that way scripts can
detect which variant: configuring the host controller or the bridge is
required. Would that be acceptable?


Yes, acceptable for me. But what's the long term concept for DSA (and 
also other implementations)?


- "old" mode variant, mode can only be read

- "new" mode variant, mode can only be read

- mode settable/configurable by the user, mode can be read


In general:

OK, thank you for your explainations.


I think DSA (at least with b53) had secveral topics. implementation 
bugs, missing documentation, lack of distribution support (e.g. 
systemd), etc. which were not understood by the users.


So everything which clarifies the topics for DSA in the future is welcome.

BTW: systemd-networkd support for DSA #7478

https://github.com/systemd/systemd/issues/7478


Ciao,

Gerhard

Re: B53 DSA switch problem on Banana Pi-R1 on Fedora 26


On 23.05.2018 19:55, Florian Fainelli wrote:

On 05/23/2018 10:35 AM, Gerhard Wiesinger wrote:

On 23.05.2018 17:28, Florian Fainelli wrote:

And in the future (time plan)?

If you don't care about multicast then you can use those patches:

https://github.com/ffainelli/linux/commit/de055bf5f34e9806463ab2793e0852f5dfc380df


and you have to change the part of drivers/net/dsa/b53/b53_common.c that
returns DSA_TAG_PROTO_NONE for 53125:


diff --git a/drivers/net/dsa/b53/b53_common.c
b/drivers/net/dsa/b53/b53_common.c
index 9f561fe505cb..3c64f026a8ce 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -1557,7 +1557,7 @@ enum dsa_tag_protocol b53_get_tag_protocol(struct
dsa_switch *ds, int port)
   * mode to be turned on which means we need to specifically
manage ARL
   * misses on multicast addresses (TBD).
   */
-   if (is5325(dev) || is5365(dev) || is539x(dev) || is531x5(dev) ||
+   if (is5325(dev) || is5365(dev) || is539x(dev) ||
  !b53_can_enable_brcm_tags(ds, port))
  return DSA_TAG_PROTO_NONE;


That would bring Broadcom tags to the 53125 switch and you would be able
to use the configuration lines from Andrew in that case.

What's the plan here regarding these 2 config option mode (how do you
call them?)?

Broadcom tags is the underlying feature that provides per-port
information about the packets going in and out. Turning on Broadcom tags
requires turning on managed mode which means that the host now has to
manage how MAC addresses are programmed into the switch, it's not rocket
science, but I don't have a good test framework to automate the testing
of those changes yet. If you are willing to help in the testing, I can
certainly give you patches to try.


Yes, patches are welcome.


I mean, will this be a breaking change in the future where config has to
be done in a different way then?

When Broadcom tags are enabled the switch gets usable the way Andrew
expressed it, the only difference that makes on your configuration if
you want e.g: VLAN 101 to be for port 1-4 and VLAN 102 to be for port 5,
is that you no longer create an eth0.101 and eth0.102, but you create
br0.101 and br0.102.


I think documentation (dsa.txt) should provide more examples.




Or will it be configurable via module parameters or /proc or /sys
filesystem options?

We might be able to expose a sysfs attribute which shows the type of
tagging being enabled by a particular switch, that way scripts can
detect which variant: configuring the host controller or the bridge is
required. Would that be acceptable?


Yes, acceptable for me. But what's the long term concept for DSA (and 
also other implementations)?


- "old" mode variant, mode can only be read

- "new" mode variant, mode can only be read

- mode settable/configurable by the user, mode can be read


In general:

OK, thank you for your explainations.


I think DSA (at least with b53) had secveral topics. implementation 
bugs, missing documentation, lack of distribution support (e.g. 
systemd), etc. which were not understood by the users.


So everything which clarifies the topics for DSA in the future is welcome.

BTW: systemd-networkd support for DSA #7478

https://github.com/systemd/systemd/issues/7478


Ciao,

Gerhard

Re: B53 DSA switch problem on Banana Pi-R1 on Fedora 26


On 23.05.2018 19:47, Florian Fainelli wrote:

On 05/23/2018 10:29 AM, Gerhard Wiesinger wrote:

On 23.05.2018 17:50, Florian Fainelli wrote:

On 05/23/2018 08:28 AM, Florian Fainelli wrote:

On 05/22/2018 09:49 PM, Gerhard Wiesinger wrote:

On 22.05.2018 22:42, Florian Fainelli wrote:

On 05/22/2018 01:16 PM, Andrew Lunn wrote:

Planned network structure will be as with 4.7.x kernels:

br0 <=> eth0.101 <=> eth0 (vlan 101 tagged) <=> lan 1-lan4 (vlan 101
untagged pvid)

br1 <=> eth0.102 <=> eth0 (vlan 102 tagged) <=> wan (vlan 102
untagged pvid)

Do you even need these vlans?

Yes, remember, b53 does not currently turn on Broadcom tags, so the
only
way to segregate traffic is to have VLANs for that.


Are you doing this for port separation? To keep lan1-4 traffic
separate from wan? DSA does that by default, no vlan needed.

So you can just do

ip link add name br0 type bridge
ip link set dev br0 up
ip link set dev lan1 master br0
ip link set dev lan2 master br0
ip link set dev lan3 master br0
ip link set dev lan4 master br0

and use interface wan directly, no bridge needed.

That would work once Broadcom tags are turned on which requires
turning
on managed mode, which requires work that I have not been able to get
done :)

Setup with swconfig:

#!/usr/bin/bash


INTERFACE=eth0

# Delete all IP addresses and get link up
ip addr flush dev ${INTERFACE}
ip link set ${INTERFACE} up

# Lamobo R1 aka BPi R1 Routerboard
#
# Speaker | LAN1 | LAN2 | LAN3 | LAN4 || LAN5 | HDMI
# SW-Port |  P2  |  P1  |  P0  |  P4  ||  P3  |
# VLAN    |  11  |  12  |  13  |  14  ||ALL(t)|
#
# Switch-Port P8 - ALL(t) boards internal CPU Port

# Setup switch
swconfig dev ${INTERFACE} set reset 1
swconfig dev ${INTERFACE} set enable_vlan 1
swconfig dev ${INTERFACE} vlan 101 set ports '3 8t'
swconfig dev ${INTERFACE} vlan 102 set ports '4 0 1 2 8t'
swconfig dev ${INTERFACE} set apply 1

How to achieve this setup CURRENTLY with DSA?

Your first email had the right programming sequence, but you did not
answer whether you have CONFIG_BRIDGE_VLAN_FILTERING enabled or not,
which is likely your problem.

Here are some reference configurations that should work:

https://github.com/armbian/build/issues/511#issuecomment-320473246

I know, some comments are from me but none of them worked, therefore on
LKML :-)

I see, maybe you could have started there, that would have saved me a
trip to github to find out the thread.


/boot/config-4.16.7-100.fc26.armv7hl:CONFIG_BRIDGE_VLAN_FILTERING=y

so this can't be the issue, any further ideas?

Yes, remove the "self" from your bridge vlan commands, I don't see that
being necessary.


Same:
[root@bpi ~]# bridge vlan add dev lan1 vid 101 pvid untagged self
RTNETLINK answers: Operation not supported
[root@bpi ~]# bridge vlan add dev lan1 vid 101 pvid untagged
RTNETLINK answers: Operation not supported
[root@bpi ~]# bridge vlan add dev lan1 vid 101
RTNETLINK answers: Operation not supported

Any ideas how to debug further?




On my 2nd Banana Pi-R1 still on Fedora 25 with kernel
4.12.8-200.fc25.armv7hl the commands still work well, but I wanted to
test the upgrade on another one.

/boot/config-4.12.8-200.fc25.armv7hl:CONFIG_BRIDGE_VLAN_FILTERING=y

Is using an upstream or compiled by yourself kernel an option at all? I
have no clue what is in a distribution kernel.


Typically the Fedora kernels work fine (long term experience since 
Fedora Core 1 from 2004 :-) ). I had some custom patches in there in the 
past for external RTC and b53_switch.kernel_4.5+.patch, but otherwise no 
topics. Therefore with upstream DSA support that should be fine then.



Infos can be found here:

https://koji.fedoraproject.org/koji/packageinfo?packageID=8

https://koji.fedoraproject.org/koji/buildinfo?buildID=1078638


Ciao,

Gerhard

Re: B53 DSA switch problem on Banana Pi-R1 on Fedora 26


On 23.05.2018 19:47, Florian Fainelli wrote:

On 05/23/2018 10:29 AM, Gerhard Wiesinger wrote:

On 23.05.2018 17:50, Florian Fainelli wrote:

On 05/23/2018 08:28 AM, Florian Fainelli wrote:

On 05/22/2018 09:49 PM, Gerhard Wiesinger wrote:

On 22.05.2018 22:42, Florian Fainelli wrote:

On 05/22/2018 01:16 PM, Andrew Lunn wrote:

Planned network structure will be as with 4.7.x kernels:

br0 <=> eth0.101 <=> eth0 (vlan 101 tagged) <=> lan 1-lan4 (vlan 101
untagged pvid)

br1 <=> eth0.102 <=> eth0 (vlan 102 tagged) <=> wan (vlan 102
untagged pvid)

Do you even need these vlans?

Yes, remember, b53 does not currently turn on Broadcom tags, so the
only
way to segregate traffic is to have VLANs for that.


Are you doing this for port separation? To keep lan1-4 traffic
separate from wan? DSA does that by default, no vlan needed.

So you can just do

ip link add name br0 type bridge
ip link set dev br0 up
ip link set dev lan1 master br0
ip link set dev lan2 master br0
ip link set dev lan3 master br0
ip link set dev lan4 master br0

and use interface wan directly, no bridge needed.

That would work once Broadcom tags are turned on which requires
turning
on managed mode, which requires work that I have not been able to get
done :)

Setup with swconfig:

#!/usr/bin/bash


INTERFACE=eth0

# Delete all IP addresses and get link up
ip addr flush dev ${INTERFACE}
ip link set ${INTERFACE} up

# Lamobo R1 aka BPi R1 Routerboard
#
# Speaker | LAN1 | LAN2 | LAN3 | LAN4 || LAN5 | HDMI
# SW-Port |  P2  |  P1  |  P0  |  P4  ||  P3  |
# VLAN    |  11  |  12  |  13  |  14  ||ALL(t)|
#
# Switch-Port P8 - ALL(t) boards internal CPU Port

# Setup switch
swconfig dev ${INTERFACE} set reset 1
swconfig dev ${INTERFACE} set enable_vlan 1
swconfig dev ${INTERFACE} vlan 101 set ports '3 8t'
swconfig dev ${INTERFACE} vlan 102 set ports '4 0 1 2 8t'
swconfig dev ${INTERFACE} set apply 1

How to achieve this setup CURRENTLY with DSA?

Your first email had the right programming sequence, but you did not
answer whether you have CONFIG_BRIDGE_VLAN_FILTERING enabled or not,
which is likely your problem.

Here are some reference configurations that should work:

https://github.com/armbian/build/issues/511#issuecomment-320473246

I know, some comments are from me but none of them worked, therefore on
LKML :-)

I see, maybe you could have started there, that would have saved me a
trip to github to find out the thread.


/boot/config-4.16.7-100.fc26.armv7hl:CONFIG_BRIDGE_VLAN_FILTERING=y

so this can't be the issue, any further ideas?

Yes, remove the "self" from your bridge vlan commands, I don't see that
being necessary.


Same:
[root@bpi ~]# bridge vlan add dev lan1 vid 101 pvid untagged self
RTNETLINK answers: Operation not supported
[root@bpi ~]# bridge vlan add dev lan1 vid 101 pvid untagged
RTNETLINK answers: Operation not supported
[root@bpi ~]# bridge vlan add dev lan1 vid 101
RTNETLINK answers: Operation not supported

Any ideas how to debug further?




On my 2nd Banana Pi-R1 still on Fedora 25 with kernel
4.12.8-200.fc25.armv7hl the commands still work well, but I wanted to
test the upgrade on another one.

/boot/config-4.12.8-200.fc25.armv7hl:CONFIG_BRIDGE_VLAN_FILTERING=y

Is using an upstream or compiled by yourself kernel an option at all? I
have no clue what is in a distribution kernel.


Typically the Fedora kernels work fine (long term experience since 
Fedora Core 1 from 2004 :-) ). I had some custom patches in there in the 
past for external RTC and b53_switch.kernel_4.5+.patch, but otherwise no 
topics. Therefore with upstream DSA support that should be fine then.



Infos can be found here:

https://koji.fedoraproject.org/koji/packageinfo?packageID=8

https://koji.fedoraproject.org/koji/buildinfo?buildID=1078638


Ciao,

Gerhard

Re: B53 DSA switch problem on Banana Pi-R1 on Fedora 26


On 23.05.2018 17:28, Florian Fainelli wrote:



And in the future (time plan)?

If you don't care about multicast then you can use those patches:

https://github.com/ffainelli/linux/commit/de055bf5f34e9806463ab2793e0852f5dfc380df

and you have to change the part of drivers/net/dsa/b53/b53_common.c that
returns DSA_TAG_PROTO_NONE for 53125:


diff --git a/drivers/net/dsa/b53/b53_common.c
b/drivers/net/dsa/b53/b53_common.c
index 9f561fe505cb..3c64f026a8ce 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -1557,7 +1557,7 @@ enum dsa_tag_protocol b53_get_tag_protocol(struct
dsa_switch *ds, int port)
  * mode to be turned on which means we need to specifically
manage ARL
  * misses on multicast addresses (TBD).
  */
-   if (is5325(dev) || is5365(dev) || is539x(dev) || is531x5(dev) ||
+   if (is5325(dev) || is5365(dev) || is539x(dev) ||
 !b53_can_enable_brcm_tags(ds, port))
 return DSA_TAG_PROTO_NONE;


That would bring Broadcom tags to the 53125 switch and you would be able
to use the configuration lines from Andrew in that case.


What's the plan here regarding these 2 config option mode (how do you 
call them?)?


I mean, will this be a breaking change in the future where config has to 
be done in a different way then?


Or will it be configurable via module parameters or /proc or /sys 
filesystem options?



Thank you.

Ciao,

Gerhard

Re: B53 DSA switch problem on Banana Pi-R1 on Fedora 26


On 23.05.2018 17:28, Florian Fainelli wrote:



And in the future (time plan)?

If you don't care about multicast then you can use those patches:

https://github.com/ffainelli/linux/commit/de055bf5f34e9806463ab2793e0852f5dfc380df

and you have to change the part of drivers/net/dsa/b53/b53_common.c that
returns DSA_TAG_PROTO_NONE for 53125:


diff --git a/drivers/net/dsa/b53/b53_common.c
b/drivers/net/dsa/b53/b53_common.c
index 9f561fe505cb..3c64f026a8ce 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -1557,7 +1557,7 @@ enum dsa_tag_protocol b53_get_tag_protocol(struct
dsa_switch *ds, int port)
  * mode to be turned on which means we need to specifically
manage ARL
  * misses on multicast addresses (TBD).
  */
-   if (is5325(dev) || is5365(dev) || is539x(dev) || is531x5(dev) ||
+   if (is5325(dev) || is5365(dev) || is539x(dev) ||
 !b53_can_enable_brcm_tags(ds, port))
 return DSA_TAG_PROTO_NONE;


That would bring Broadcom tags to the 53125 switch and you would be able
to use the configuration lines from Andrew in that case.


What's the plan here regarding these 2 config option mode (how do you 
call them?)?


I mean, will this be a breaking change in the future where config has to 
be done in a different way then?


Or will it be configurable via module parameters or /proc or /sys 
filesystem options?



Thank you.

Ciao,

Gerhard

Re: B53 DSA switch problem on Banana Pi-R1 on Fedora 26


On 23.05.2018 17:50, Florian Fainelli wrote:


On 05/23/2018 08:28 AM, Florian Fainelli wrote:


On 05/22/2018 09:49 PM, Gerhard Wiesinger wrote:

On 22.05.2018 22:42, Florian Fainelli wrote:

On 05/22/2018 01:16 PM, Andrew Lunn wrote:

Planned network structure will be as with 4.7.x kernels:

br0 <=> eth0.101 <=> eth0 (vlan 101 tagged) <=> lan 1-lan4 (vlan 101
untagged pvid)

br1 <=> eth0.102 <=> eth0 (vlan 102 tagged) <=> wan (vlan 102
untagged pvid)

Do you even need these vlans?

Yes, remember, b53 does not currently turn on Broadcom tags, so the only
way to segregate traffic is to have VLANs for that.


Are you doing this for port separation? To keep lan1-4 traffic
separate from wan? DSA does that by default, no vlan needed.

So you can just do

ip link add name br0 type bridge
ip link set dev br0 up
ip link set dev lan1 master br0
ip link set dev lan2 master br0
ip link set dev lan3 master br0
ip link set dev lan4 master br0

and use interface wan directly, no bridge needed.

That would work once Broadcom tags are turned on which requires turning
on managed mode, which requires work that I have not been able to get
done :)

Setup with swconfig:

#!/usr/bin/bash


INTERFACE=eth0

# Delete all IP addresses and get link up
ip addr flush dev ${INTERFACE}
ip link set ${INTERFACE} up

# Lamobo R1 aka BPi R1 Routerboard
#
# Speaker | LAN1 | LAN2 | LAN3 | LAN4 || LAN5 | HDMI
# SW-Port |  P2  |  P1  |  P0  |  P4  ||  P3  |
# VLAN    |  11  |  12  |  13  |  14  ||ALL(t)|
#
# Switch-Port P8 - ALL(t) boards internal CPU Port

# Setup switch
swconfig dev ${INTERFACE} set reset 1
swconfig dev ${INTERFACE} set enable_vlan 1
swconfig dev ${INTERFACE} vlan 101 set ports '3 8t'
swconfig dev ${INTERFACE} vlan 102 set ports '4 0 1 2 8t'
swconfig dev ${INTERFACE} set apply 1

How to achieve this setup CURRENTLY with DSA?

Your first email had the right programming sequence, but you did not
answer whether you have CONFIG_BRIDGE_VLAN_FILTERING enabled or not,
which is likely your problem.

Here are some reference configurations that should work:

https://github.com/armbian/build/issues/511#issuecomment-320473246


I know, some comments are from me but none of them worked, therefore on 
LKML :-)


/boot/config-4.16.7-100.fc26.armv7hl:CONFIG_BRIDGE_VLAN_FILTERING=y

so this can't be the issue, any further ideas?

On my 2nd Banana Pi-R1 still on Fedora 25 with kernel 
4.12.8-200.fc25.armv7hl the commands still work well, but I wanted to 
test the upgrade on another one.


/boot/config-4.12.8-200.fc25.armv7hl:CONFIG_BRIDGE_VLAN_FILTERING=y

Thnx.

Ciao,

Gerhard

Re: B53 DSA switch problem on Banana Pi-R1 on Fedora 26


On 23.05.2018 17:50, Florian Fainelli wrote:


On 05/23/2018 08:28 AM, Florian Fainelli wrote:


On 05/22/2018 09:49 PM, Gerhard Wiesinger wrote:

On 22.05.2018 22:42, Florian Fainelli wrote:

On 05/22/2018 01:16 PM, Andrew Lunn wrote:

Planned network structure will be as with 4.7.x kernels:

br0 <=> eth0.101 <=> eth0 (vlan 101 tagged) <=> lan 1-lan4 (vlan 101
untagged pvid)

br1 <=> eth0.102 <=> eth0 (vlan 102 tagged) <=> wan (vlan 102
untagged pvid)

Do you even need these vlans?

Yes, remember, b53 does not currently turn on Broadcom tags, so the only
way to segregate traffic is to have VLANs for that.


Are you doing this for port separation? To keep lan1-4 traffic
separate from wan? DSA does that by default, no vlan needed.

So you can just do

ip link add name br0 type bridge
ip link set dev br0 up
ip link set dev lan1 master br0
ip link set dev lan2 master br0
ip link set dev lan3 master br0
ip link set dev lan4 master br0

and use interface wan directly, no bridge needed.

That would work once Broadcom tags are turned on which requires turning
on managed mode, which requires work that I have not been able to get
done :)

Setup with swconfig:

#!/usr/bin/bash


INTERFACE=eth0

# Delete all IP addresses and get link up
ip addr flush dev ${INTERFACE}
ip link set ${INTERFACE} up

# Lamobo R1 aka BPi R1 Routerboard
#
# Speaker | LAN1 | LAN2 | LAN3 | LAN4 || LAN5 | HDMI
# SW-Port |  P2  |  P1  |  P0  |  P4  ||  P3  |
# VLAN    |  11  |  12  |  13  |  14  ||ALL(t)|
#
# Switch-Port P8 - ALL(t) boards internal CPU Port

# Setup switch
swconfig dev ${INTERFACE} set reset 1
swconfig dev ${INTERFACE} set enable_vlan 1
swconfig dev ${INTERFACE} vlan 101 set ports '3 8t'
swconfig dev ${INTERFACE} vlan 102 set ports '4 0 1 2 8t'
swconfig dev ${INTERFACE} set apply 1

How to achieve this setup CURRENTLY with DSA?

Your first email had the right programming sequence, but you did not
answer whether you have CONFIG_BRIDGE_VLAN_FILTERING enabled or not,
which is likely your problem.

Here are some reference configurations that should work:

https://github.com/armbian/build/issues/511#issuecomment-320473246


I know, some comments are from me but none of them worked, therefore on 
LKML :-)


/boot/config-4.16.7-100.fc26.armv7hl:CONFIG_BRIDGE_VLAN_FILTERING=y

so this can't be the issue, any further ideas?

On my 2nd Banana Pi-R1 still on Fedora 25 with kernel 
4.12.8-200.fc25.armv7hl the commands still work well, but I wanted to 
test the upgrade on another one.


/boot/config-4.12.8-200.fc25.armv7hl:CONFIG_BRIDGE_VLAN_FILTERING=y

Thnx.

Ciao,

Gerhard

Re: B53 DSA switch problem on Banana Pi-R1 on Fedora 26


On 22.05.2018 22:42, Florian Fainelli wrote:

On 05/22/2018 01:16 PM, Andrew Lunn wrote:

Planned network structure will be as with 4.7.x kernels:

br0 <=> eth0.101 <=> eth0 (vlan 101 tagged) <=> lan 1-lan4 (vlan 101
untagged pvid)

br1 <=> eth0.102 <=> eth0 (vlan 102 tagged) <=> wan (vlan 102 untagged pvid)

Do you even need these vlans?

Yes, remember, b53 does not currently turn on Broadcom tags, so the only
way to segregate traffic is to have VLANs for that.


Are you doing this for port separation? To keep lan1-4 traffic
separate from wan? DSA does that by default, no vlan needed.

So you can just do

ip link add name br0 type bridge
ip link set dev br0 up
ip link set dev lan1 master br0
ip link set dev lan2 master br0
ip link set dev lan3 master br0
ip link set dev lan4 master br0

and use interface wan directly, no bridge needed.

That would work once Broadcom tags are turned on which requires turning
on managed mode, which requires work that I have not been able to get
done :)


Setup with swconfig:

#!/usr/bin/bash


INTERFACE=eth0

# Delete all IP addresses and get link up
ip addr flush dev ${INTERFACE}
ip link set ${INTERFACE} up

# Lamobo R1 aka BPi R1 Routerboard
#
# Speaker | LAN1 | LAN2 | LAN3 | LAN4 || LAN5 | HDMI
# SW-Port |  P2  |  P1  |  P0  |  P4  ||  P3  |
# VLAN    |  11  |  12  |  13  |  14  ||ALL(t)|
#
# Switch-Port P8 - ALL(t) boards internal CPU Port

# Setup switch
swconfig dev ${INTERFACE} set reset 1
swconfig dev ${INTERFACE} set enable_vlan 1
swconfig dev ${INTERFACE} vlan 101 set ports '3 8t'
swconfig dev ${INTERFACE} vlan 102 set ports '4 0 1 2 8t'
swconfig dev ${INTERFACE} set apply 1

How to achieve this setup CURRENTLY with DSA?

And in the future (time plan)?

Thank you.

Ciao,
Gerhard

Re: B53 DSA switch problem on Banana Pi-R1 on Fedora 26


On 22.05.2018 22:42, Florian Fainelli wrote:

On 05/22/2018 01:16 PM, Andrew Lunn wrote:

Planned network structure will be as with 4.7.x kernels:

br0 <=> eth0.101 <=> eth0 (vlan 101 tagged) <=> lan 1-lan4 (vlan 101
untagged pvid)

br1 <=> eth0.102 <=> eth0 (vlan 102 tagged) <=> wan (vlan 102 untagged pvid)

Do you even need these vlans?

Yes, remember, b53 does not currently turn on Broadcom tags, so the only
way to segregate traffic is to have VLANs for that.


Are you doing this for port separation? To keep lan1-4 traffic
separate from wan? DSA does that by default, no vlan needed.

So you can just do

ip link add name br0 type bridge
ip link set dev br0 up
ip link set dev lan1 master br0
ip link set dev lan2 master br0
ip link set dev lan3 master br0
ip link set dev lan4 master br0

and use interface wan directly, no bridge needed.

That would work once Broadcom tags are turned on which requires turning
on managed mode, which requires work that I have not been able to get
done :)


Setup with swconfig:

#!/usr/bin/bash


INTERFACE=eth0

# Delete all IP addresses and get link up
ip addr flush dev ${INTERFACE}
ip link set ${INTERFACE} up

# Lamobo R1 aka BPi R1 Routerboard
#
# Speaker | LAN1 | LAN2 | LAN3 | LAN4 || LAN5 | HDMI
# SW-Port |  P2  |  P1  |  P0  |  P4  ||  P3  |
# VLAN    |  11  |  12  |  13  |  14  ||ALL(t)|
#
# Switch-Port P8 - ALL(t) boards internal CPU Port

# Setup switch
swconfig dev ${INTERFACE} set reset 1
swconfig dev ${INTERFACE} set enable_vlan 1
swconfig dev ${INTERFACE} vlan 101 set ports '3 8t'
swconfig dev ${INTERFACE} vlan 102 set ports '4 0 1 2 8t'
swconfig dev ${INTERFACE} set apply 1

How to achieve this setup CURRENTLY with DSA?

And in the future (time plan)?

Thank you.

Ciao,
Gerhard

B53 DSA switch problem on Banana Pi-R1 on Fedora 26


Hello,

I'm trying to get B53 DSA switch working on the Banana Pi-R1 on Fedora 
26 to run (I will upgrade to Fedora 27 and Fedora 28 when networking 
works again). Previously the switch was configured with swconfig without 
any problems.


Kernel: 4.16.7-100.fc26.armv7hl

b53_common: found switch: BCM53125, rev 4

I see all interfaces: lan1 to lan4 and wan.

i get the following error messages:

# master and self, same results

bridge vlan add dev lan1 vid 101 pvid untagged self
RTNETLINK answers: Operation not supported
bridge vlan add dev lan2 vid 101 pvid untagged self
RTNETLINK answers: Operation not supported
bridge vlan add dev lan3 vid 101 pvid untagged self
RTNETLINK answers: Operation not supported
bridge vlan add dev lan4 vid 101 pvid untagged self
RTNETLINK answers: Operation not supported

# No quite sure here regarding CPU interface and VLAN, because this 
changed with some patches, also from dsa.txt


bridge vlan add dev eth0 vid 101 self
RTNETLINK answers: Operation not supported

Planned network structure will be as with 4.7.x kernels:

br0 <=> eth0.101 <=> eth0 (vlan 101 tagged) <=> lan 1-lan4 (vlan 101 
untagged pvid)


br1 <=> eth0.102 <=> eth0 (vlan 102 tagged) <=> wan (vlan 102 untagged pvid)

I think the rest of the config is clear after some research now, but I 
provide details if that one worked well.


If necessary I can provide full commands & logs and further details.

Thank you.

Any ideas?

Ciao,

Gerhard

B53 DSA switch problem on Banana Pi-R1 on Fedora 26


Hello,

I'm trying to get B53 DSA switch working on the Banana Pi-R1 on Fedora 
26 to run (I will upgrade to Fedora 27 and Fedora 28 when networking 
works again). Previously the switch was configured with swconfig without 
any problems.


Kernel: 4.16.7-100.fc26.armv7hl

b53_common: found switch: BCM53125, rev 4

I see all interfaces: lan1 to lan4 and wan.

i get the following error messages:

# master and self, same results

bridge vlan add dev lan1 vid 101 pvid untagged self
RTNETLINK answers: Operation not supported
bridge vlan add dev lan2 vid 101 pvid untagged self
RTNETLINK answers: Operation not supported
bridge vlan add dev lan3 vid 101 pvid untagged self
RTNETLINK answers: Operation not supported
bridge vlan add dev lan4 vid 101 pvid untagged self
RTNETLINK answers: Operation not supported

# No quite sure here regarding CPU interface and VLAN, because this 
changed with some patches, also from dsa.txt


bridge vlan add dev eth0 vid 101 self
RTNETLINK answers: Operation not supported

Planned network structure will be as with 4.7.x kernels:

br0 <=> eth0.101 <=> eth0 (vlan 101 tagged) <=> lan 1-lan4 (vlan 101 
untagged pvid)


br1 <=> eth0.102 <=> eth0 (vlan 102 tagged) <=> wan (vlan 102 untagged pvid)

I think the rest of the config is clear after some research now, but I 
provide details if that one worked well.


If necessary I can provide full commands & logs and further details.

Thank you.

Any ideas?

Ciao,

Gerhard

Re: [Qemu-devel] kvm_intel fails to load on Conroe CPUs running Linux 4.12

2017-09-27 Thread Gerhard Wiesinger


On 15.09.2017 19:07, Paolo Bonzini wrote:

On 15/09/2017 16:43, Gerhard Wiesinger wrote:

On 27.08.2017 20:55, Paolo Bonzini wrote:


Il 27 ago 2017 4:48 PM, "Gerhard Wiesinger" <li...@wiesinger.com
<mailto:li...@wiesinger.com>> ha scritto:

 On 27.08.2017 14 <tel:27.08.2017%2014>:03, Paolo Bonzini wrote:


 We will revert the patch, but 4.13.0 will not have the fix.
 Expect it in later stable kernels (because vacations).


 Thnx. Why will 4.13.0 NOT have the fix?


Because maintainers are on vacation! :-)



Hello Paolo,

Any update on this for 4.12 and 4.13 kernels?

A late fix is better than a wrong fix.  Hope to get to it next week!


Hello Paolo,

Any update? Thnx.

Ciao,
Gerhard

Re: [Qemu-devel] kvm_intel fails to load on Conroe CPUs running Linux 4.12

2017-09-27 Thread Gerhard Wiesinger


On 15.09.2017 19:07, Paolo Bonzini wrote:

On 15/09/2017 16:43, Gerhard Wiesinger wrote:

On 27.08.2017 20:55, Paolo Bonzini wrote:


Il 27 ago 2017 4:48 PM, "Gerhard Wiesinger" mailto:li...@wiesinger.com>> ha scritto:

 On 27.08.2017 14 :03, Paolo Bonzini wrote:


 We will revert the patch, but 4.13.0 will not have the fix.
 Expect it in later stable kernels (because vacations).


 Thnx. Why will 4.13.0 NOT have the fix?


Because maintainers are on vacation! :-)



Hello Paolo,

Any update on this for 4.12 and 4.13 kernels?

A late fix is better than a wrong fix.  Hope to get to it next week!


Hello Paolo,

Any update? Thnx.

Ciao,
Gerhard

Re: [Qemu-devel] kvm_intel fails to load on Conroe CPUs running Linux 4.12


On 27.08.2017 14:03, Paolo Bonzini wrote:

Il 27 ago 2017 9:49 AM, "Gerhard Wiesinger" <li...@wiesinger.com> ha
scritto:

On 17.08.2017 23:14, Gerhard Wiesinger wrote:


On 17.08.2017 22:58, Gerhard Wiesinger wrote:

On 07.08.2017 19:50, Paolo Bonzini wrote:


Not much to say, unfortunately. It's pretty much the same capabilities
as a Prescott/Cedar Mill processor, except that it has MSR bitmaps. It
also lacks FlexPriority compared to the Conroe I had checked.

It's not great that even the revert patch doesn't apply cleanly---this
is *not* necessarily a boring area of the hypervisor...

Given the rarity of your machine I'm currently leaning towards _not_
reverting the change. I'll check another non-Xeon Core 2 tomorrow that
is from December 2008 (IIRC). If that one also lacks vNMI, or if I get
other reports, I suppose I will have to reconsider that.

Hello Paolo,

Can you please revert the patch.

CPU is a Core 2 Extreme QX6700: SL9UL (B3) running VERY stable with ECC
RAM for years now.
https://ark.intel.com/products/28028/Intel-Core2-Extreme-
Processor-QX6700-8M-Cache-2_66-GHz-1066-MHz-FSB?q=Core%
202%20Extreme%20QX6700
https://en.wikipedia.org/wiki/List_of_Intel_Core_2_microprocessors

CPU details below.

Thank you.

Ciao,
Gerhard


Hello Paolo,

Any update on this major issue?


We will revert the patch, but 4.13.0 will not have the fix. Expect it in
later stable kernels (because vacations).

Thnx. Why will 4.13.0 NOT have the fix?

Thnx.

Ciao,
Gerhard

Re: [Qemu-devel] kvm_intel fails to load on Conroe CPUs running Linux 4.12


On 27.08.2017 14:03, Paolo Bonzini wrote:

Il 27 ago 2017 9:49 AM, "Gerhard Wiesinger"  ha
scritto:

On 17.08.2017 23:14, Gerhard Wiesinger wrote:


On 17.08.2017 22:58, Gerhard Wiesinger wrote:

On 07.08.2017 19:50, Paolo Bonzini wrote:


Not much to say, unfortunately. It's pretty much the same capabilities
as a Prescott/Cedar Mill processor, except that it has MSR bitmaps. It
also lacks FlexPriority compared to the Conroe I had checked.

It's not great that even the revert patch doesn't apply cleanly---this
is *not* necessarily a boring area of the hypervisor...

Given the rarity of your machine I'm currently leaning towards _not_
reverting the change. I'll check another non-Xeon Core 2 tomorrow that
is from December 2008 (IIRC). If that one also lacks vNMI, or if I get
other reports, I suppose I will have to reconsider that.

Hello Paolo,

Can you please revert the patch.

CPU is a Core 2 Extreme QX6700: SL9UL (B3) running VERY stable with ECC
RAM for years now.
https://ark.intel.com/products/28028/Intel-Core2-Extreme-
Processor-QX6700-8M-Cache-2_66-GHz-1066-MHz-FSB?q=Core%
202%20Extreme%20QX6700
https://en.wikipedia.org/wiki/List_of_Intel_Core_2_microprocessors

CPU details below.

Thank you.

Ciao,
Gerhard


Hello Paolo,

Any update on this major issue?


We will revert the patch, but 4.13.0 will not have the fix. Expect it in
later stable kernels (because vacations).

Thnx. Why will 4.13.0 NOT have the fix?

Thnx.

Ciao,
Gerhard

Re: [Qemu-devel] kvm_intel fails to load on Conroe CPUs running Linux 4.12


On 17.08.2017 23:14, Gerhard Wiesinger wrote:

On 17.08.2017 22:58, Gerhard Wiesinger wrote:
>
> On 07.08.2017 19:50, Paolo Bonzini wrote:
>
> >Not much to say, unfortunately. It's pretty much the same capabilities
> >as a Prescott/Cedar Mill processor, except that it has MSR bitmaps. It
> >also lacks FlexPriority compared to the Conroe I had checked.
> >
> >It's not great that even the revert patch doesn't apply cleanly---this
> >is *not* necessarily a boring area of the hypervisor...
> >
> >Given the rarity of your machine I'm currently leaning towards _not_
> >reverting the change. I'll check another non-Xeon Core 2 tomorrow that
> >is from December 2008 (IIRC). If that one also lacks vNMI, or if I get
> >other reports, I suppose I will have to reconsider that.

Hello Paolo,

Can you please revert the patch.

CPU is a Core 2 Extreme QX6700: SL9UL (B3) running VERY stable with 
ECC RAM for years now.
https://ark.intel.com/products/28028/Intel-Core2-Extreme-Processor-QX6700-8M-Cache-2_66-GHz-1066-MHz-FSB?q=Core%202%20Extreme%20QX6700 


https://en.wikipedia.org/wiki/List_of_Intel_Core_2_microprocessors

CPU details below.

Thank you.

Ciao,
Gerhard 


Hello Paolo,

Any update on this major issue?

Thnx.

Ciao,
Gerhard

Re: [Qemu-devel] kvm_intel fails to load on Conroe CPUs running Linux 4.12


On 17.08.2017 23:14, Gerhard Wiesinger wrote:

On 17.08.2017 22:58, Gerhard Wiesinger wrote:
>
> On 07.08.2017 19:50, Paolo Bonzini wrote:
>
> >Not much to say, unfortunately. It's pretty much the same capabilities
> >as a Prescott/Cedar Mill processor, except that it has MSR bitmaps. It
> >also lacks FlexPriority compared to the Conroe I had checked.
> >
> >It's not great that even the revert patch doesn't apply cleanly---this
> >is *not* necessarily a boring area of the hypervisor...
> >
> >Given the rarity of your machine I'm currently leaning towards _not_
> >reverting the change. I'll check another non-Xeon Core 2 tomorrow that
> >is from December 2008 (IIRC). If that one also lacks vNMI, or if I get
> >other reports, I suppose I will have to reconsider that.

Hello Paolo,

Can you please revert the patch.

CPU is a Core 2 Extreme QX6700: SL9UL (B3) running VERY stable with 
ECC RAM for years now.
https://ark.intel.com/products/28028/Intel-Core2-Extreme-Processor-QX6700-8M-Cache-2_66-GHz-1066-MHz-FSB?q=Core%202%20Extreme%20QX6700 


https://en.wikipedia.org/wiki/List_of_Intel_Core_2_microprocessors

CPU details below.

Thank you.

Ciao,
Gerhard 


Hello Paolo,

Any update on this major issue?

Thnx.

Ciao,
Gerhard

Re: kvm_intel fails to load on Conroe CPUs running Linux 4.12

2017-08-17 Thread Gerhard Wiesinger


On 17.08.2017 22:58, Gerhard Wiesinger wrote:
>
> On 07.08.2017 19:50, Paolo Bonzini wrote:
>
> >Not much to say, unfortunately. It's pretty much the same capabilities
> >as a Prescott/Cedar Mill processor, except that it has MSR bitmaps. It
> >also lacks FlexPriority compared to the Conroe I had checked.
> >
> >It's not great that even the revert patch doesn't apply cleanly---this
> >is *not* necessarily a boring area of the hypervisor...
> >
> >Given the rarity of your machine I'm currently leaning towards _not_
> >reverting the change. I'll check another non-Xeon Core 2 tomorrow that
> >is from December 2008 (IIRC). If that one also lacks vNMI, or if I get
> >other reports, I suppose I will have to reconsider that.

Hello Paolo,

Can you please revert the patch.

CPU is a Core 2 Extreme QX6700: SL9UL (B3) running VERY stable with ECC 
RAM for years now.

https://ark.intel.com/products/28028/Intel-Core2-Extreme-Processor-QX6700-8M-Cache-2_66-GHz-1066-MHz-FSB?q=Core%202%20Extreme%20QX6700
https://en.wikipedia.org/wiki/List_of_Intel_Core_2_microprocessors

CPU details below.

Thank you.

Ciao,
Gerhard

cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 15
model name  : Intel(R) Core(TM)2 Quad CPU   @ 2.66GHz
stepping    : 7
microcode   : 0x6a
cpu MHz : 1596.000
cache size  : 4096 KB
physical id : 0
siblings    : 4
core id : 0
cpu cores   : 4
apicid  : 0
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 10
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe 
syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl cpuid
aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm 
lahf_lm tpr_shadow dtherm

bugs    :
bogomips    : 5333.45
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

Script output:
Basic VMX Information
  Hex: 0x1a0407
  Revision 7
  VMCS size    1024
  VMCS restricted to 32 bit addresses  no
  Dual-monitor support yes
  VMCS memory type 6
  INS/OUTS instruction information no
  IA32_VMX_TRUE_*_CTLS support no
pin-based controls
  External interrupt exiting   yes
  NMI exiting  yes
  Virtual NMIs no
  Activate VMX-preemption timer    no
  Process posted interrupts    no
primary processor-based controls
  Interrupt window exiting yes
  Use TSC offsetting   yes
  HLT exiting  yes
  INVLPG exiting   yes
  MWAIT exiting    yes
  RDPMC exiting    yes
  RDTSC exiting    yes
  CR3-load exiting forced
  CR3-store exiting    forced
  CR8-load exiting yes
  CR8-store exiting    yes
  Use TPR shadow   yes
  NMI-window exiting   no
  MOV-DR exiting   yes
  Unconditional I/O exiting    yes
  Use I/O bitmaps  yes
  Monitor trap flag    no
  Use MSR bitmaps  yes
  MONITOR exiting  yes
  PAUSE exiting    yes
  Activate secondary control   no
secondary processor-based controls
  Virtualize APIC accesses no
  Enable EPT   no
  Descriptor-table exiting no
  Enable RDTSCP    no
  Virtualize x2APIC mode   no
  Enable VPID  no
  WBINVD exiting   no
  Unrestricted guest   no
  APIC register emulation  no
  Virtual interrupt delivery   no
  PAUSE-loop exiting   no
  RDRAND exiting   no
  Enable INVPCID   no
  Enable VM functions  no
  VMCS shadowing   no
  Enable ENCLS exiting no
  RDSEED exiting   no
  Enable PML   no
  EPT-violation #VE    no
  Conceal non-root operation from PT   no
  Enable XSAVES/XRSTORS    no
  Mode-based execute control (XS/XU)   no
  TSC scaling  no
VM-Exit controls
  Save debug controls  forced
  Host address-space size

Re: kvm_intel fails to load on Conroe CPUs running Linux 4.12

2017-08-17 Thread Gerhard Wiesinger


On 17.08.2017 22:58, Gerhard Wiesinger wrote:
>
> On 07.08.2017 19:50, Paolo Bonzini wrote:
>
> >Not much to say, unfortunately. It's pretty much the same capabilities
> >as a Prescott/Cedar Mill processor, except that it has MSR bitmaps. It
> >also lacks FlexPriority compared to the Conroe I had checked.
> >
> >It's not great that even the revert patch doesn't apply cleanly---this
> >is *not* necessarily a boring area of the hypervisor...
> >
> >Given the rarity of your machine I'm currently leaning towards _not_
> >reverting the change. I'll check another non-Xeon Core 2 tomorrow that
> >is from December 2008 (IIRC). If that one also lacks vNMI, or if I get
> >other reports, I suppose I will have to reconsider that.

Hello Paolo,

Can you please revert the patch.

CPU is a Core 2 Extreme QX6700: SL9UL (B3) running VERY stable with ECC 
RAM for years now.

https://ark.intel.com/products/28028/Intel-Core2-Extreme-Processor-QX6700-8M-Cache-2_66-GHz-1066-MHz-FSB?q=Core%202%20Extreme%20QX6700
https://en.wikipedia.org/wiki/List_of_Intel_Core_2_microprocessors

CPU details below.

Thank you.

Ciao,
Gerhard

cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 15
model name  : Intel(R) Core(TM)2 Quad CPU   @ 2.66GHz
stepping    : 7
microcode   : 0x6a
cpu MHz : 1596.000
cache size  : 4096 KB
physical id : 0
siblings    : 4
core id : 0
cpu cores   : 4
apicid  : 0
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 10
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe 
syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl cpuid
aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm 
lahf_lm tpr_shadow dtherm

bugs    :
bogomips    : 5333.45
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

Script output:
Basic VMX Information
  Hex: 0x1a0407
  Revision 7
  VMCS size    1024
  VMCS restricted to 32 bit addresses  no
  Dual-monitor support yes
  VMCS memory type 6
  INS/OUTS instruction information no
  IA32_VMX_TRUE_*_CTLS support no
pin-based controls
  External interrupt exiting   yes
  NMI exiting  yes
  Virtual NMIs no
  Activate VMX-preemption timer    no
  Process posted interrupts    no
primary processor-based controls
  Interrupt window exiting yes
  Use TSC offsetting   yes
  HLT exiting  yes
  INVLPG exiting   yes
  MWAIT exiting    yes
  RDPMC exiting    yes
  RDTSC exiting    yes
  CR3-load exiting forced
  CR3-store exiting    forced
  CR8-load exiting yes
  CR8-store exiting    yes
  Use TPR shadow   yes
  NMI-window exiting   no
  MOV-DR exiting   yes
  Unconditional I/O exiting    yes
  Use I/O bitmaps  yes
  Monitor trap flag    no
  Use MSR bitmaps  yes
  MONITOR exiting  yes
  PAUSE exiting    yes
  Activate secondary control   no
secondary processor-based controls
  Virtualize APIC accesses no
  Enable EPT   no
  Descriptor-table exiting no
  Enable RDTSCP    no
  Virtualize x2APIC mode   no
  Enable VPID  no
  WBINVD exiting   no
  Unrestricted guest   no
  APIC register emulation  no
  Virtual interrupt delivery   no
  PAUSE-loop exiting   no
  RDRAND exiting   no
  Enable INVPCID   no
  Enable VM functions  no
  VMCS shadowing   no
  Enable ENCLS exiting no
  RDSEED exiting   no
  Enable PML   no
  EPT-violation #VE    no
  Conceal non-root operation from PT   no
  Enable XSAVES/XRSTORS    no
  Mode-based execute control (XS/XU)   no
  TSC scaling  no
VM-Exit controls
  Save debug controls  forced
  Host address-space size

Re: Still OOM problems with 4.9er/4.10er kernels

2017-03-26 Thread Gerhard Wiesinger


On 23.03.2017 09:38, Mike Galbraith wrote:

On Thu, 2017-03-23 at 08:16 +0100, Gerhard Wiesinger wrote:

On 21.03.2017 08:13, Mike Galbraith wrote:

On Tue, 2017-03-21 at 06:59 +0100, Gerhard Wiesinger wrote:


Is this the correct information?

Incomplete, but enough to reiterate cgroup_disable=memory
suggestion.


How to collect complete information?

If Michal wants specifics, I suspect he'll ask.  I posted only to pass
along a speck of information, and offer a test suggestion.. twice.


Still OOM with cgroup_disable=memory, kernel 
4.11.0-0.rc3.git0.2.fc26.x86_64,I set vm.min_free_kbytes = 10240 in 
these tests.

# Full config
grep "vm\." /etc/sysctl.d/*
/etc/sysctl.d/00-dirty_background_ratio.conf:vm.dirty_background_ratio = 3
/etc/sysctl.d/00-dirty_ratio.conf:vm.dirty_ratio = 15
/etc/sysctl.d/00-kernel-vm-min-free-kbyzes.conf:vm.min_free_kbytes = 10240
/etc/sysctl.d/00-overcommit_memory.conf:vm.overcommit_memory = 2
/etc/sysctl.d/00-overcommit_ratio.conf:vm.overcommit_ratio = 80
/etc/sysctl.d/00-swappiness.conf:vm.swappiness=10

[31880.623557] sa1: page allocation stalls for 10942ms, order:0, 
mode:0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null)

[31880.623623] sa1 cpuset=/ mems_allowed=0
[31880.623630] CPU: 1 PID: 17112 Comm: sa1 Not tainted 
4.11.0-0.rc3.git0.2.fc26.x86_64 #1
[31880.623724] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS 1.9.3 04/01/2014

[31880.623819] Call Trace:
[31880.623893]  dump_stack+0x63/0x84
[31880.623971]  warn_alloc+0x10c/0x1b0
[31880.624046]  __alloc_pages_slowpath+0x93d/0xe60
[31880.624142]  ? get_page_from_freelist+0x122/0xbf0
[31880.624225]  ? unmap_region+0xf7/0x130
[31880.624312]  __alloc_pages_nodemask+0x290/0x2b0
[31880.624388]  alloc_pages_vma+0xa0/0x2b0
[31880.624463]  __handle_mm_fault+0x4d0/0x1160
[31880.624550]  handle_mm_fault+0xb3/0x250
[31880.624628]  __do_page_fault+0x23f/0x4c0
[31880.624701]  trace_do_page_fault+0x41/0x120
[31880.624781]  do_async_page_fault+0x51/0xa0
[31880.624866]  async_page_fault+0x28/0x30
[31880.624941] RIP: 0033:0x7f9218d4914f
[31880.625032] RSP: 002b:7ffe0d1376a8 EFLAGS: 00010206
[31880.625153] RAX: 7f9218d2a314 RBX: 7f9218f4e658 RCX: 
7f9218d2a354
[31880.625235] RDX: 05ec RSI:  RDI: 
7f9218d2a314
[31880.625324] RBP: 7ffe0d137950 R08: 7f9218d2a900 R09: 
00027000
[31880.625423] R10: 7ffe0d1376e0 R11: 7f9218d2a900 R12: 
0003
[31880.625505] R13: 7ffe0d137a38 R14: fd01 R15: 
0002

[31880.625688] Mem-Info:
[31880.625762] active_anon:36671 inactive_anon:36711 isolated_anon:88
active_file:1399 inactive_file:1410 isolated_file:0
unevictable:0 dirty:5 writeback:15 unstable:0
slab_reclaimable:3099 slab_unreclaimable:3558
mapped:2037 shmem:3 pagetables:3340 bounce:0
free:2972 free_pcp:102 free_cma:0
[31880.627334] Node 0 active_anon:146684kB inactive_anon:146816kB 
active_file:5596kB inactive_file:5572kB unevictable:0kB 
isolated(anon):368kB isolated(file):0kB mapped:8044kB dirty:20kB 
writeback:136kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 
12kB writeback_tmp:0kB unstable:0kB pages_scanned:82 all_unreclaimable? no
[31880.627606] Node 0 DMA free:1816kB min:440kB low:548kB high:656kB 
active_anon:5636kB inactive_anon:6844kB active_file:132kB 
inactive_file:148kB unevictable:0kB writepending:4kB present:15992kB 
managed:15908kB mlocked:0kB slab_reclaimable:284kB 
slab_unreclaimable:532kB kernel_stack:0kB pagetables:188kB bounce:0kB 
free_pcp:0kB local_pcp:0kB free_cma:0kB

[31880.627883] lowmem_reserve[]: 0 327 327 327 327
[31880.627959] Node 0 DMA32 free:10072kB min:9796kB low:12244kB 
high:14692kB active_anon:141048kB inactive_anon:14kB 
active_file:5432kB inactive_file:5444kB unevictable:0kB 
writepending:152kB present:376688kB managed:353760kB mlocked:0kB 
slab_reclaimable:12112kB slab_unreclaimable:13700kB kernel_stack:2464kB 
pagetables:13172kB bounce:0kB free_pcp:504kB local_pcp:272kB free_cma:0kB

[31880.628334] lowmem_reserve[]: 0 0 0 0 0
[31880.629882] Node 0 DMA: 33*4kB (UME) 24*8kB (UM) 26*16kB (UME) 4*32kB 
(UME) 5*64kB (UME) 1*128kB (E) 2*256kB (M) 0*512kB 0*1024kB 0*2048kB 
0*4096kB = 1828kB
[31880.632255] Node 0 DMA32: 174*4kB (UMEH) 107*8kB (UMEH) 96*16kB 
(UMEH) 59*32kB (UME) 30*64kB (UMEH) 8*128kB (UEH) 8*256kB (UMEH) 1*512kB 
(E) 0*1024kB 0*2048kB 0*4096kB = 10480kB
[31880.634344] Node 0 hugepages_total=0 hugepages_free=0 
hugepages_surp=0 hugepages_size=2048kB

[31880.634346] 7276 total pagecache pages
[31880.635277] 4367 pages in swap cache
[31880.636206] Swap cache stats: add 563, delete 5635551, find 
6573228/8496821

[31880.637145] Free swap  = 973736kB
[31880.638038] Total swap = 2064380kB
[31880.638988] 98170 pages RAM
[31880.640309] 0 pages HighMem/MovableOnly
[31880.641791] 5753 pages reserved
[31880.642908] 0 pages cma reserved
[31880.643978] 0 pages hwpoisoned

Wil

Re: Still OOM problems with 4.9er/4.10er kernels

2017-03-26 Thread Gerhard Wiesinger


On 23.03.2017 09:38, Mike Galbraith wrote:

On Thu, 2017-03-23 at 08:16 +0100, Gerhard Wiesinger wrote:

On 21.03.2017 08:13, Mike Galbraith wrote:

On Tue, 2017-03-21 at 06:59 +0100, Gerhard Wiesinger wrote:


Is this the correct information?

Incomplete, but enough to reiterate cgroup_disable=memory
suggestion.


How to collect complete information?

If Michal wants specifics, I suspect he'll ask.  I posted only to pass
along a speck of information, and offer a test suggestion.. twice.


Still OOM with cgroup_disable=memory, kernel 
4.11.0-0.rc3.git0.2.fc26.x86_64,I set vm.min_free_kbytes = 10240 in 
these tests.

# Full config
grep "vm\." /etc/sysctl.d/*
/etc/sysctl.d/00-dirty_background_ratio.conf:vm.dirty_background_ratio = 3
/etc/sysctl.d/00-dirty_ratio.conf:vm.dirty_ratio = 15
/etc/sysctl.d/00-kernel-vm-min-free-kbyzes.conf:vm.min_free_kbytes = 10240
/etc/sysctl.d/00-overcommit_memory.conf:vm.overcommit_memory = 2
/etc/sysctl.d/00-overcommit_ratio.conf:vm.overcommit_ratio = 80
/etc/sysctl.d/00-swappiness.conf:vm.swappiness=10

[31880.623557] sa1: page allocation stalls for 10942ms, order:0, 
mode:0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null)

[31880.623623] sa1 cpuset=/ mems_allowed=0
[31880.623630] CPU: 1 PID: 17112 Comm: sa1 Not tainted 
4.11.0-0.rc3.git0.2.fc26.x86_64 #1
[31880.623724] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS 1.9.3 04/01/2014

[31880.623819] Call Trace:
[31880.623893]  dump_stack+0x63/0x84
[31880.623971]  warn_alloc+0x10c/0x1b0
[31880.624046]  __alloc_pages_slowpath+0x93d/0xe60
[31880.624142]  ? get_page_from_freelist+0x122/0xbf0
[31880.624225]  ? unmap_region+0xf7/0x130
[31880.624312]  __alloc_pages_nodemask+0x290/0x2b0
[31880.624388]  alloc_pages_vma+0xa0/0x2b0
[31880.624463]  __handle_mm_fault+0x4d0/0x1160
[31880.624550]  handle_mm_fault+0xb3/0x250
[31880.624628]  __do_page_fault+0x23f/0x4c0
[31880.624701]  trace_do_page_fault+0x41/0x120
[31880.624781]  do_async_page_fault+0x51/0xa0
[31880.624866]  async_page_fault+0x28/0x30
[31880.624941] RIP: 0033:0x7f9218d4914f
[31880.625032] RSP: 002b:7ffe0d1376a8 EFLAGS: 00010206
[31880.625153] RAX: 7f9218d2a314 RBX: 7f9218f4e658 RCX: 
7f9218d2a354
[31880.625235] RDX: 05ec RSI:  RDI: 
7f9218d2a314
[31880.625324] RBP: 7ffe0d137950 R08: 7f9218d2a900 R09: 
00027000
[31880.625423] R10: 7ffe0d1376e0 R11: 7f9218d2a900 R12: 
0003
[31880.625505] R13: 7ffe0d137a38 R14: fd01 R15: 
0002

[31880.625688] Mem-Info:
[31880.625762] active_anon:36671 inactive_anon:36711 isolated_anon:88
active_file:1399 inactive_file:1410 isolated_file:0
unevictable:0 dirty:5 writeback:15 unstable:0
slab_reclaimable:3099 slab_unreclaimable:3558
mapped:2037 shmem:3 pagetables:3340 bounce:0
free:2972 free_pcp:102 free_cma:0
[31880.627334] Node 0 active_anon:146684kB inactive_anon:146816kB 
active_file:5596kB inactive_file:5572kB unevictable:0kB 
isolated(anon):368kB isolated(file):0kB mapped:8044kB dirty:20kB 
writeback:136kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 
12kB writeback_tmp:0kB unstable:0kB pages_scanned:82 all_unreclaimable? no
[31880.627606] Node 0 DMA free:1816kB min:440kB low:548kB high:656kB 
active_anon:5636kB inactive_anon:6844kB active_file:132kB 
inactive_file:148kB unevictable:0kB writepending:4kB present:15992kB 
managed:15908kB mlocked:0kB slab_reclaimable:284kB 
slab_unreclaimable:532kB kernel_stack:0kB pagetables:188kB bounce:0kB 
free_pcp:0kB local_pcp:0kB free_cma:0kB

[31880.627883] lowmem_reserve[]: 0 327 327 327 327
[31880.627959] Node 0 DMA32 free:10072kB min:9796kB low:12244kB 
high:14692kB active_anon:141048kB inactive_anon:14kB 
active_file:5432kB inactive_file:5444kB unevictable:0kB 
writepending:152kB present:376688kB managed:353760kB mlocked:0kB 
slab_reclaimable:12112kB slab_unreclaimable:13700kB kernel_stack:2464kB 
pagetables:13172kB bounce:0kB free_pcp:504kB local_pcp:272kB free_cma:0kB

[31880.628334] lowmem_reserve[]: 0 0 0 0 0
[31880.629882] Node 0 DMA: 33*4kB (UME) 24*8kB (UM) 26*16kB (UME) 4*32kB 
(UME) 5*64kB (UME) 1*128kB (E) 2*256kB (M) 0*512kB 0*1024kB 0*2048kB 
0*4096kB = 1828kB
[31880.632255] Node 0 DMA32: 174*4kB (UMEH) 107*8kB (UMEH) 96*16kB 
(UMEH) 59*32kB (UME) 30*64kB (UMEH) 8*128kB (UEH) 8*256kB (UMEH) 1*512kB 
(E) 0*1024kB 0*2048kB 0*4096kB = 10480kB
[31880.634344] Node 0 hugepages_total=0 hugepages_free=0 
hugepages_surp=0 hugepages_size=2048kB

[31880.634346] 7276 total pagecache pages
[31880.635277] 4367 pages in swap cache
[31880.636206] Swap cache stats: add 563, delete 5635551, find 
6573228/8496821

[31880.637145] Free swap  = 973736kB
[31880.638038] Total swap = 2064380kB
[31880.638988] 98170 pages RAM
[31880.640309] 0 pages HighMem/MovableOnly
[31880.641791] 5753 pages reserved
[31880.642908] 0 pages cma reserved
[31880.643978] 0 pages hwpoisoned

Wil

Re: Still OOM problems with 4.9er/4.10er kernels

2017-03-23 Thread Gerhard Wiesinger


On 21.03.2017 08:13, Mike Galbraith wrote:

On Tue, 2017-03-21 at 06:59 +0100, Gerhard Wiesinger wrote:


Is this the correct information?

Incomplete, but enough to reiterate cgroup_disable=memory suggestion.



How to collect complete information?

Thnx.

Ciao,
Gerhard

Re: Still OOM problems with 4.9er/4.10er kernels

2017-03-23 Thread Gerhard Wiesinger


On 21.03.2017 08:13, Mike Galbraith wrote:

On Tue, 2017-03-21 at 06:59 +0100, Gerhard Wiesinger wrote:


Is this the correct information?

Incomplete, but enough to reiterate cgroup_disable=memory suggestion.



How to collect complete information?

Thnx.

Ciao,
Gerhard

Re: Still OOM problems with 4.9er/4.10er kernels

2017-03-21 Thread Gerhard Wiesinger


On 20.03.2017 04:05, Mike Galbraith wrote:

On Sun, 2017-03-19 at 17:02 +0100, Gerhard Wiesinger wrote:


mount | grep cgroup

Just because controllers are mounted doesn't mean they're populated. To
check that, you want to look for directories under the mount points
with a non-empty 'tasks'.  You will find some, but memory cgroup
assignments would likely be most interesting for this thread.  You can
eliminate any diddling there by booting with cgroup_disable=memory.



Is this the correct information?

mount | grep "type cgroup" | cut -f 3 -d ' ' | while read LINE; do echo 
"";echo 
${LINE};ls -l ${LINE}; done


/sys/fs/cgroup/systemd
total 0
-rw-r--r--  1 root root 0 Mar 20 14:31 cgroup.clone_children
-rw-r--r--  1 root root 0 Mar 20 14:31 cgroup.procs
-r--r--r--  1 root root 0 Mar 20 14:31 cgroup.sane_behavior
drwxr-xr-x  2 root root 0 Mar 20 14:31 init.scope
-rw-r--r--  1 root root 0 Mar 20 14:31 notify_on_release
-rw-r--r--  1 root root 0 Mar 20 14:31 release_agent
drwxr-xr-x 60 root root 0 Mar 21 06:50 system.slice
-rw-r--r--  1 root root 0 Mar 20 14:31 tasks
drwxr-xr-x  4 root root 0 Mar 21 06:55 user.slice

/sys/fs/cgroup/net_cls,net_prio
total 0
-rw-r--r-- 1 root root 0 Mar 20 14:31 cgroup.clone_children
-rw-r--r-- 1 root root 0 Mar 20 14:31 cgroup.procs
-r--r--r-- 1 root root 0 Mar 20 14:31 cgroup.sane_behavior
-rw-r--r-- 1 root root 0 Mar 20 14:31 net_cls.classid
-rw-r--r-- 1 root root 0 Mar 20 14:31 net_prio.ifpriomap
-r--r--r-- 1 root root 0 Mar 20 14:31 net_prio.prioidx
-rw-r--r-- 1 root root 0 Mar 20 14:31 notify_on_release
-rw-r--r-- 1 root root 0 Mar 20 14:31 release_agent
-rw-r--r-- 1 root root 0 Mar 20 14:31 tasks

/sys/fs/cgroup/cpu,cpuacct
total 0
-rw-r--r-- 1 root root 0 Mar 20 14:31 cgroup.clone_children
-rw-r--r-- 1 root root 0 Mar 20 14:31 cgroup.procs
-r--r--r-- 1 root root 0 Mar 20 14:31 cgroup.sane_behavior
-r--r--r-- 1 root root 0 Mar 20 14:31 cpuacct.stat
-rw-r--r-- 1 root root 0 Mar 20 14:31 cpuacct.usage
-r--r--r-- 1 root root 0 Mar 20 14:31 cpuacct.usage_all
-r--r--r-- 1 root root 0 Mar 20 14:31 cpuacct.usage_percpu
-r--r--r-- 1 root root 0 Mar 20 14:31 cpuacct.usage_percpu_sys
-r--r--r-- 1 root root 0 Mar 20 14:31 cpuacct.usage_percpu_user
-r--r--r-- 1 root root 0 Mar 20 14:31 cpuacct.usage_sys
-r--r--r-- 1 root root 0 Mar 20 14:31 cpuacct.usage_user
-rw-r--r-- 1 root root 0 Mar 20 14:31 cpu.cfs_period_us
-rw-r--r-- 1 root root 0 Mar 20 14:31 cpu.cfs_quota_us
-rw-r--r-- 1 root root 0 Mar 20 14:31 cpu.shares
-r--r--r-- 1 root root 0 Mar 20 14:31 cpu.stat
drwxr-xr-x 2 root root 0 Mar 20 14:31 init.scope
-rw-r--r-- 1 root root 0 Mar 20 14:31 notify_on_release
-rw-r--r-- 1 root root 0 Mar 20 14:31 release_agent
drwxr-xr-x 2 root root 0 Mar 20 14:31 system.slice
-rw-r--r-- 1 root root 0 Mar 20 14:31 tasks
drwxr-xr-x 4 root root 0 Mar 21 06:55 user.slice

/sys/fs/cgroup/devices
total 0
-rw-r--r--  1 root root 0 Mar 20 14:31 cgroup.clone_children
-rw-r--r--  1 root root 0 Mar 20 14:31 cgroup.procs
-r--r--r--  1 root root 0 Mar 20 14:31 cgroup.sane_behavior
--w---  1 root root 0 Mar 20 14:31 devices.allow
--w---  1 root root 0 Mar 20 14:31 devices.deny
-r--r--r--  1 root root 0 Mar 20 14:31 devices.list
drwxr-xr-x  2 root root 0 Mar 20 14:31 init.scope
-rw-r--r--  1 root root 0 Mar 20 14:31 notify_on_release
-rw-r--r--  1 root root 0 Mar 20 14:31 release_agent
drwxr-xr-x 60 root root 0 Mar 21 06:50 system.slice
-rw-r--r--  1 root root 0 Mar 20 14:31 tasks
drwxr-xr-x  4 root root 0 Mar 21 06:55 user.slice

/sys/fs/cgroup/freezer
total 0
-rw-r--r-- 1 root root 0 Mar 20 14:31 cgroup.clone_children
-rw-r--r-- 1 root root 0 Mar 20 14:31 cgroup.procs
-r--r--r-- 1 root root 0 Mar 20 14:31 cgroup.sane_behavior
-rw-r--r-- 1 root root 0 Mar 20 14:31 notify_on_release
-rw-r--r-- 1 root root 0 Mar 20 14:31 release_agent
-rw-r--r-- 1 root root 0 Mar 20 14:31 tasks
===

Re: Still OOM problems with 4.9er/4.10er kernels

2017-03-21 Thread Gerhard Wiesinger


On 20.03.2017 04:05, Mike Galbraith wrote:

On Sun, 2017-03-19 at 17:02 +0100, Gerhard Wiesinger wrote:


mount | grep cgroup

Just because controllers are mounted doesn't mean they're populated. To
check that, you want to look for directories under the mount points
with a non-empty 'tasks'.  You will find some, but memory cgroup
assignments would likely be most interesting for this thread.  You can
eliminate any diddling there by booting with cgroup_disable=memory.



Is this the correct information?

mount | grep "type cgroup" | cut -f 3 -d ' ' | while read LINE; do echo 
"";echo 
${LINE};ls -l ${LINE}; done


/sys/fs/cgroup/systemd
total 0
-rw-r--r--  1 root root 0 Mar 20 14:31 cgroup.clone_children
-rw-r--r--  1 root root 0 Mar 20 14:31 cgroup.procs
-r--r--r--  1 root root 0 Mar 20 14:31 cgroup.sane_behavior
drwxr-xr-x  2 root root 0 Mar 20 14:31 init.scope
-rw-r--r--  1 root root 0 Mar 20 14:31 notify_on_release
-rw-r--r--  1 root root 0 Mar 20 14:31 release_agent
drwxr-xr-x 60 root root 0 Mar 21 06:50 system.slice
-rw-r--r--  1 root root 0 Mar 20 14:31 tasks
drwxr-xr-x  4 root root 0 Mar 21 06:55 user.slice

/sys/fs/cgroup/net_cls,net_prio
total 0
-rw-r--r-- 1 root root 0 Mar 20 14:31 cgroup.clone_children
-rw-r--r-- 1 root root 0 Mar 20 14:31 cgroup.procs
-r--r--r-- 1 root root 0 Mar 20 14:31 cgroup.sane_behavior
-rw-r--r-- 1 root root 0 Mar 20 14:31 net_cls.classid
-rw-r--r-- 1 root root 0 Mar 20 14:31 net_prio.ifpriomap
-r--r--r-- 1 root root 0 Mar 20 14:31 net_prio.prioidx
-rw-r--r-- 1 root root 0 Mar 20 14:31 notify_on_release
-rw-r--r-- 1 root root 0 Mar 20 14:31 release_agent
-rw-r--r-- 1 root root 0 Mar 20 14:31 tasks

/sys/fs/cgroup/cpu,cpuacct
total 0
-rw-r--r-- 1 root root 0 Mar 20 14:31 cgroup.clone_children
-rw-r--r-- 1 root root 0 Mar 20 14:31 cgroup.procs
-r--r--r-- 1 root root 0 Mar 20 14:31 cgroup.sane_behavior
-r--r--r-- 1 root root 0 Mar 20 14:31 cpuacct.stat
-rw-r--r-- 1 root root 0 Mar 20 14:31 cpuacct.usage
-r--r--r-- 1 root root 0 Mar 20 14:31 cpuacct.usage_all
-r--r--r-- 1 root root 0 Mar 20 14:31 cpuacct.usage_percpu
-r--r--r-- 1 root root 0 Mar 20 14:31 cpuacct.usage_percpu_sys
-r--r--r-- 1 root root 0 Mar 20 14:31 cpuacct.usage_percpu_user
-r--r--r-- 1 root root 0 Mar 20 14:31 cpuacct.usage_sys
-r--r--r-- 1 root root 0 Mar 20 14:31 cpuacct.usage_user
-rw-r--r-- 1 root root 0 Mar 20 14:31 cpu.cfs_period_us
-rw-r--r-- 1 root root 0 Mar 20 14:31 cpu.cfs_quota_us
-rw-r--r-- 1 root root 0 Mar 20 14:31 cpu.shares
-r--r--r-- 1 root root 0 Mar 20 14:31 cpu.stat
drwxr-xr-x 2 root root 0 Mar 20 14:31 init.scope
-rw-r--r-- 1 root root 0 Mar 20 14:31 notify_on_release
-rw-r--r-- 1 root root 0 Mar 20 14:31 release_agent
drwxr-xr-x 2 root root 0 Mar 20 14:31 system.slice
-rw-r--r-- 1 root root 0 Mar 20 14:31 tasks
drwxr-xr-x 4 root root 0 Mar 21 06:55 user.slice

/sys/fs/cgroup/devices
total 0
-rw-r--r--  1 root root 0 Mar 20 14:31 cgroup.clone_children
-rw-r--r--  1 root root 0 Mar 20 14:31 cgroup.procs
-r--r--r--  1 root root 0 Mar 20 14:31 cgroup.sane_behavior
--w---  1 root root 0 Mar 20 14:31 devices.allow
--w---  1 root root 0 Mar 20 14:31 devices.deny
-r--r--r--  1 root root 0 Mar 20 14:31 devices.list
drwxr-xr-x  2 root root 0 Mar 20 14:31 init.scope
-rw-r--r--  1 root root 0 Mar 20 14:31 notify_on_release
-rw-r--r--  1 root root 0 Mar 20 14:31 release_agent
drwxr-xr-x 60 root root 0 Mar 21 06:50 system.slice
-rw-r--r--  1 root root 0 Mar 20 14:31 tasks
drwxr-xr-x  4 root root 0 Mar 21 06:55 user.slice

/sys/fs/cgroup/freezer
total 0
-rw-r--r-- 1 root root 0 Mar 20 14:31 cgroup.clone_children
-rw-r--r-- 1 root root 0 Mar 20 14:31 cgroup.procs
-r--r--r-- 1 root root 0 Mar 20 14:31 cgroup.sane_behavior
-rw-r--r-- 1 root root 0 Mar 20 14:31 notify_on_release
-rw-r--r-- 1 root root 0 Mar 20 14:31 release_agent
-rw-r--r-- 1 root root 0 Mar 20 14:31 tasks
===

Re: Still OOM problems with 4.9er/4.10er kernels


On 19.03.2017 16:18, Michal Hocko wrote:

On Fri 17-03-17 21:08:31, Gerhard Wiesinger wrote:

On 17.03.2017 18:13, Michal Hocko wrote:

On Fri 17-03-17 17:37:48, Gerhard Wiesinger wrote:
[...]

Why does the kernel prefer to swapin/out and not use

a.) the free memory?

It will use all the free memory up to min watermark which is set up
based on min_free_kbytes.

Makes sense, how is /proc/sys/vm/min_free_kbytes default value calculated?

See init_per_zone_wmark_min


b.) the buffer/cache?

the memory reclaim is strongly biased towards page cache and we try to
avoid swapout as much as possible (see get_scan_count).

If I understand it correctly, swapping is preferred over dropping the
cache, right. Can this behaviour be changed to prefer dropping the
cache to some minimum amount?  Is this also configurable in a way?

No, we enforce swapping if the amount of free + file pages are below the
cumulative high watermark.


(As far as I remember e.g. kernel 2.4 dropped the caches well).


There is ~100M memory available but kernel swaps all the time ...

Any ideas?

Kernel: 4.9.14-200.fc25.x86_64

top - 17:33:43 up 28 min,  3 users,  load average: 3.58, 1.67, 0.89
Tasks: 145 total,   4 running, 141 sleeping,   0 stopped,   0 zombie
%Cpu(s): 19.1 us, 56.2 sy,  0.0 ni,  4.3 id, 13.4 wa, 2.0 hi,  0.3 si,  4.7
st
KiB Mem :   230076 total,61508 free,   123472 used,45096 buff/cache

procs ---memory-- ---swap-- -io -system--
--cpu-
  r  b   swpd   free   buff  cache   si   sobibo in   cs us sy id wa st
  3  5 303916  60372328  43864 27828  200 41420   236 6984 11138 11 47  6 
23 14

I am really surprised to see any reclaim at all. 26% of free memory
doesn't sound as if we should do a reclaim at all. Do you have an
unusual configuration of /proc/sys/vm/min_free_kbytes ? Or is there
anything running inside a memory cgroup with a small limit?

nothing special set regarding /proc/sys/vm/min_free_kbytes (default values),
detailed config below. Regarding cgroups, none of I know. How to check (I
guess nothing is set because cg* commands are not available)?

be careful because systemd started to use some controllers. You can
easily check cgroup mount points.


See below.




/proc/sys/vm/min_free_kbytes
45056

So at least 45M will be kept reserved for the system. Your data
indicated you had more memory. How does /proc/zoneinfo look like?
Btw. you seem to be using fc kernel, are there any patches applied on
top of Linus tree? Could you try to retest vanilla kernel?



System looks normally now, FYI (e.g. now permanent swapping)


free
  totalusedfree  shared buff/cache   
available

Mem: 349076  154112   41560 184 153404  148716
Swap:   2064380  831844 1232536

cat /proc/zoneinfo

Node 0, zone  DMA
  per-node stats
  nr_inactive_anon 9543
  nr_active_anon 22105
  nr_inactive_file 9877
  nr_active_file 13416
  nr_unevictable 0
  nr_isolated_anon 0
  nr_isolated_file 0
  nr_pages_scanned 0
  workingset_refault 1926013
  workingset_activate 707166
  workingset_nodereclaim 187276
  nr_anon_pages 11429
  nr_mapped6852
  nr_file_pages 46772
  nr_dirty 1
  nr_writeback 0
  nr_writeback_temp 0
  nr_shmem 46
  nr_shmem_hugepages 0
  nr_shmem_pmdmapped 0
  nr_anon_transparent_hugepages 0
  nr_unstable  0
  nr_vmscan_write 3319047
  nr_vmscan_immediate_reclaim 32363
  nr_dirtied   222115
  nr_written   3537529
  pages free 3110
min  27
low  33
high 39
   node_scanned  0
spanned  4095
present  3998
managed  3977
  nr_free_pages 3110
  nr_zone_inactive_anon 18
  nr_zone_active_anon 3
  nr_zone_inactive_file 51
  nr_zone_active_file 75
  nr_zone_unevictable 0
  nr_zone_write_pending 0
  nr_mlock 0
  nr_slab_reclaimable 214
  nr_slab_unreclaimable 289
  nr_page_table_pages 185
  nr_kernel_stack 16
  nr_bounce0
  nr_zspages   0
  numa_hit 1214071
  numa_miss0
  numa_foreign 0
  numa_interleave 0
  numa_local   1214071
  numa_other   0
  nr_free_cma  0
protection: (0, 306, 306, 306, 306)
  pagesets
cpu: 0
  count: 0
  high:  0
  batch: 1
  vm stats threshold: 4
cpu: 1
  count: 0
  high:  0
  batch: 1
  vm stats threshold: 4
  node_unreclaimable:  0
  start_pfn:   1
  node_inactive_ratio: 0
Node 0, zoneDMA32
  pages free 7921
min  546
low  682
high 818
   node_scanned  0
spanned  94172
present  94172
managed  83292
  nr_free_pages 7921
  nr_zone_inactive_anon 9525
  nr_zone_active_anon 22102
  nr_zone_inactive_file 9826
  nr_zone_active_file 13341

Re: Still OOM problems with 4.9er/4.10er kernels


On 19.03.2017 16:18, Michal Hocko wrote:

On Fri 17-03-17 21:08:31, Gerhard Wiesinger wrote:

On 17.03.2017 18:13, Michal Hocko wrote:

On Fri 17-03-17 17:37:48, Gerhard Wiesinger wrote:
[...]

Why does the kernel prefer to swapin/out and not use

a.) the free memory?

It will use all the free memory up to min watermark which is set up
based on min_free_kbytes.

Makes sense, how is /proc/sys/vm/min_free_kbytes default value calculated?

See init_per_zone_wmark_min


b.) the buffer/cache?

the memory reclaim is strongly biased towards page cache and we try to
avoid swapout as much as possible (see get_scan_count).

If I understand it correctly, swapping is preferred over dropping the
cache, right. Can this behaviour be changed to prefer dropping the
cache to some minimum amount?  Is this also configurable in a way?

No, we enforce swapping if the amount of free + file pages are below the
cumulative high watermark.


(As far as I remember e.g. kernel 2.4 dropped the caches well).


There is ~100M memory available but kernel swaps all the time ...

Any ideas?

Kernel: 4.9.14-200.fc25.x86_64

top - 17:33:43 up 28 min,  3 users,  load average: 3.58, 1.67, 0.89
Tasks: 145 total,   4 running, 141 sleeping,   0 stopped,   0 zombie
%Cpu(s): 19.1 us, 56.2 sy,  0.0 ni,  4.3 id, 13.4 wa, 2.0 hi,  0.3 si,  4.7
st
KiB Mem :   230076 total,61508 free,   123472 used,45096 buff/cache

procs ---memory-- ---swap-- -io -system--
--cpu-
  r  b   swpd   free   buff  cache   si   sobibo in   cs us sy id wa st
  3  5 303916  60372328  43864 27828  200 41420   236 6984 11138 11 47  6 
23 14

I am really surprised to see any reclaim at all. 26% of free memory
doesn't sound as if we should do a reclaim at all. Do you have an
unusual configuration of /proc/sys/vm/min_free_kbytes ? Or is there
anything running inside a memory cgroup with a small limit?

nothing special set regarding /proc/sys/vm/min_free_kbytes (default values),
detailed config below. Regarding cgroups, none of I know. How to check (I
guess nothing is set because cg* commands are not available)?

be careful because systemd started to use some controllers. You can
easily check cgroup mount points.


See below.




/proc/sys/vm/min_free_kbytes
45056

So at least 45M will be kept reserved for the system. Your data
indicated you had more memory. How does /proc/zoneinfo look like?
Btw. you seem to be using fc kernel, are there any patches applied on
top of Linus tree? Could you try to retest vanilla kernel?



System looks normally now, FYI (e.g. now permanent swapping)


free
  totalusedfree  shared buff/cache   
available

Mem: 349076  154112   41560 184 153404  148716
Swap:   2064380  831844 1232536

cat /proc/zoneinfo

Node 0, zone  DMA
  per-node stats
  nr_inactive_anon 9543
  nr_active_anon 22105
  nr_inactive_file 9877
  nr_active_file 13416
  nr_unevictable 0
  nr_isolated_anon 0
  nr_isolated_file 0
  nr_pages_scanned 0
  workingset_refault 1926013
  workingset_activate 707166
  workingset_nodereclaim 187276
  nr_anon_pages 11429
  nr_mapped6852
  nr_file_pages 46772
  nr_dirty 1
  nr_writeback 0
  nr_writeback_temp 0
  nr_shmem 46
  nr_shmem_hugepages 0
  nr_shmem_pmdmapped 0
  nr_anon_transparent_hugepages 0
  nr_unstable  0
  nr_vmscan_write 3319047
  nr_vmscan_immediate_reclaim 32363
  nr_dirtied   222115
  nr_written   3537529
  pages free 3110
min  27
low  33
high 39
   node_scanned  0
spanned  4095
present  3998
managed  3977
  nr_free_pages 3110
  nr_zone_inactive_anon 18
  nr_zone_active_anon 3
  nr_zone_inactive_file 51
  nr_zone_active_file 75
  nr_zone_unevictable 0
  nr_zone_write_pending 0
  nr_mlock 0
  nr_slab_reclaimable 214
  nr_slab_unreclaimable 289
  nr_page_table_pages 185
  nr_kernel_stack 16
  nr_bounce0
  nr_zspages   0
  numa_hit 1214071
  numa_miss0
  numa_foreign 0
  numa_interleave 0
  numa_local   1214071
  numa_other   0
  nr_free_cma  0
protection: (0, 306, 306, 306, 306)
  pagesets
cpu: 0
  count: 0
  high:  0
  batch: 1
  vm stats threshold: 4
cpu: 1
  count: 0
  high:  0
  batch: 1
  vm stats threshold: 4
  node_unreclaimable:  0
  start_pfn:   1
  node_inactive_ratio: 0
Node 0, zoneDMA32
  pages free 7921
min  546
low  682
high 818
   node_scanned  0
spanned  94172
present  94172
managed  83292
  nr_free_pages 7921
  nr_zone_inactive_anon 9525
  nr_zone_active_anon 22102
  nr_zone_inactive_file 9826
  nr_zone_active_file 13341

Re: Still OOM problems with 4.9er/4.10er kernels


On 17.03.2017 21:08, Gerhard Wiesinger wrote:

On 17.03.2017 18:13, Michal Hocko wrote:

On Fri 17-03-17 17:37:48, Gerhard Wiesinger wrote:
[...] 


4.11.0-0.rc2.git4.1.fc27.x86_64

There are also lockups after some runtime hours to 1 day:
Message from syslogd@myserver Mar 19 08:22:33 ...
 kernel:BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 
stuck for 18717s!


Message from syslogd@myserver at Mar 19 08:22:33 ...
 kernel:BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 
stuck for 18078s!


repeated a lot of times 

Ciao,
Gerhard

Re: Still OOM problems with 4.9er/4.10er kernels


On 17.03.2017 21:08, Gerhard Wiesinger wrote:

On 17.03.2017 18:13, Michal Hocko wrote:

On Fri 17-03-17 17:37:48, Gerhard Wiesinger wrote:
[...] 


4.11.0-0.rc2.git4.1.fc27.x86_64

There are also lockups after some runtime hours to 1 day:
Message from syslogd@myserver Mar 19 08:22:33 ...
 kernel:BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 
stuck for 18717s!


Message from syslogd@myserver at Mar 19 08:22:33 ...
 kernel:BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 
stuck for 18078s!


repeated a lot of times 

Ciao,
Gerhard

Re: Still OOM problems with 4.9er/4.10er kernels


On 17.03.2017 18:13, Michal Hocko wrote:

On Fri 17-03-17 17:37:48, Gerhard Wiesinger wrote:
[...]

Why does the kernel prefer to swapin/out and not use

a.) the free memory?

It will use all the free memory up to min watermark which is set up
based on min_free_kbytes.


Makes sense, how is /proc/sys/vm/min_free_kbytes default value calculated?




b.) the buffer/cache?

the memory reclaim is strongly biased towards page cache and we try to
avoid swapout as much as possible (see get_scan_count).


If I understand it correctly, swapping is preferred over dropping the 
cache, right. Can this behaviour be changed to prefer dropping the cache 
to some minimum amount?

Is this also configurable in a way?
(As far as I remember e.g. kernel 2.4 dropped the caches well).

  

There is ~100M memory available but kernel swaps all the time ...

Any ideas?

Kernel: 4.9.14-200.fc25.x86_64

top - 17:33:43 up 28 min,  3 users,  load average: 3.58, 1.67, 0.89
Tasks: 145 total,   4 running, 141 sleeping,   0 stopped,   0 zombie
%Cpu(s): 19.1 us, 56.2 sy,  0.0 ni,  4.3 id, 13.4 wa, 2.0 hi,  0.3 si,  4.7
st
KiB Mem :   230076 total,61508 free,   123472 used,45096 buff/cache

procs ---memory-- ---swap-- -io -system--
--cpu-
  r  b   swpd   free   buff  cache   si   sobibo in   cs us sy id wa st
  3  5 303916  60372328  43864 27828  200 41420   236 6984 11138 11 47  6 
23 14

I am really surprised to see any reclaim at all. 26% of free memory
doesn't sound as if we should do a reclaim at all. Do you have an
unusual configuration of /proc/sys/vm/min_free_kbytes ? Or is there
anything running inside a memory cgroup with a small limit?


nothing special set regarding /proc/sys/vm/min_free_kbytes (default 
values), detailed config below. Regarding cgroups, none of I know. How 
to check (I guess nothing is set because cg* commands are not available)?


cat /etc/sysctl.d/* | grep "^vm"
vm.dirty_background_ratio = 3
vm.dirty_ratio = 15
vm.overcommit_memory = 2
vm.overcommit_ratio = 80
vm.swappiness=10

find /proc/sys/vm -type f -exec echo {} \; -exec cat {} \;
/proc/sys/vm/admin_reserve_kbytes
8192
/proc/sys/vm/block_dump
0
/proc/sys/vm/compact_memory
cat: /proc/sys/vm/compact_memory: Permission denied
/proc/sys/vm/compact_unevictable_allowed
1
/proc/sys/vm/dirty_background_bytes
0
/proc/sys/vm/dirty_background_ratio
3
/proc/sys/vm/dirty_bytes
0
/proc/sys/vm/dirty_expire_centisecs
3000
/proc/sys/vm/dirty_ratio
15
/proc/sys/vm/dirty_writeback_centisecs
500
/proc/sys/vm/dirtytime_expire_seconds
43200
/proc/sys/vm/drop_caches
0
/proc/sys/vm/extfrag_threshold
500
/proc/sys/vm/hugepages_treat_as_movable
0
/proc/sys/vm/hugetlb_shm_group
0
/proc/sys/vm/laptop_mode
0
/proc/sys/vm/legacy_va_layout
0
/proc/sys/vm/lowmem_reserve_ratio
256 256 32  1
/proc/sys/vm/max_map_count
65530
/proc/sys/vm/memory_failure_early_kill
0
/proc/sys/vm/memory_failure_recovery
1
/proc/sys/vm/min_free_kbytes
45056
/proc/sys/vm/min_slab_ratio
5
/proc/sys/vm/min_unmapped_ratio
1
/proc/sys/vm/mmap_min_addr
65536
/proc/sys/vm/mmap_rnd_bits
28
/proc/sys/vm/mmap_rnd_compat_bits
8
/proc/sys/vm/nr_hugepages
0
/proc/sys/vm/nr_hugepages_mempolicy
0
/proc/sys/vm/nr_overcommit_hugepages
0
/proc/sys/vm/nr_pdflush_threads
0
/proc/sys/vm/numa_zonelist_order
default
/proc/sys/vm/oom_dump_tasks
1
/proc/sys/vm/oom_kill_allocating_task
0
/proc/sys/vm/overcommit_kbytes
0
/proc/sys/vm/overcommit_memory
2
/proc/sys/vm/overcommit_ratio
80
/proc/sys/vm/page-cluster
3
/proc/sys/vm/panic_on_oom
0
/proc/sys/vm/percpu_pagelist_fraction
0
/proc/sys/vm/stat_interval
1
/proc/sys/vm/stat_refresh
/proc/sys/vm/swappiness
10
/proc/sys/vm/user_reserve_kbytes
31036
/proc/sys/vm/vfs_cache_pressure
100
/proc/sys/vm/watermark_scale_factor
10
/proc/sys/vm/zone_reclaim_mode
0

Thnx.


Ciao,

Gerhard

Re: Still OOM problems with 4.9er/4.10er kernels


On 17.03.2017 18:13, Michal Hocko wrote:

On Fri 17-03-17 17:37:48, Gerhard Wiesinger wrote:
[...]

Why does the kernel prefer to swapin/out and not use

a.) the free memory?

It will use all the free memory up to min watermark which is set up
based on min_free_kbytes.


Makes sense, how is /proc/sys/vm/min_free_kbytes default value calculated?




b.) the buffer/cache?

the memory reclaim is strongly biased towards page cache and we try to
avoid swapout as much as possible (see get_scan_count).


If I understand it correctly, swapping is preferred over dropping the 
cache, right. Can this behaviour be changed to prefer dropping the cache 
to some minimum amount?

Is this also configurable in a way?
(As far as I remember e.g. kernel 2.4 dropped the caches well).

  

There is ~100M memory available but kernel swaps all the time ...

Any ideas?

Kernel: 4.9.14-200.fc25.x86_64

top - 17:33:43 up 28 min,  3 users,  load average: 3.58, 1.67, 0.89
Tasks: 145 total,   4 running, 141 sleeping,   0 stopped,   0 zombie
%Cpu(s): 19.1 us, 56.2 sy,  0.0 ni,  4.3 id, 13.4 wa, 2.0 hi,  0.3 si,  4.7
st
KiB Mem :   230076 total,61508 free,   123472 used,45096 buff/cache

procs ---memory-- ---swap-- -io -system--
--cpu-
  r  b   swpd   free   buff  cache   si   sobibo in   cs us sy id wa st
  3  5 303916  60372328  43864 27828  200 41420   236 6984 11138 11 47  6 
23 14

I am really surprised to see any reclaim at all. 26% of free memory
doesn't sound as if we should do a reclaim at all. Do you have an
unusual configuration of /proc/sys/vm/min_free_kbytes ? Or is there
anything running inside a memory cgroup with a small limit?


nothing special set regarding /proc/sys/vm/min_free_kbytes (default 
values), detailed config below. Regarding cgroups, none of I know. How 
to check (I guess nothing is set because cg* commands are not available)?


cat /etc/sysctl.d/* | grep "^vm"
vm.dirty_background_ratio = 3
vm.dirty_ratio = 15
vm.overcommit_memory = 2
vm.overcommit_ratio = 80
vm.swappiness=10

find /proc/sys/vm -type f -exec echo {} \; -exec cat {} \;
/proc/sys/vm/admin_reserve_kbytes
8192
/proc/sys/vm/block_dump
0
/proc/sys/vm/compact_memory
cat: /proc/sys/vm/compact_memory: Permission denied
/proc/sys/vm/compact_unevictable_allowed
1
/proc/sys/vm/dirty_background_bytes
0
/proc/sys/vm/dirty_background_ratio
3
/proc/sys/vm/dirty_bytes
0
/proc/sys/vm/dirty_expire_centisecs
3000
/proc/sys/vm/dirty_ratio
15
/proc/sys/vm/dirty_writeback_centisecs
500
/proc/sys/vm/dirtytime_expire_seconds
43200
/proc/sys/vm/drop_caches
0
/proc/sys/vm/extfrag_threshold
500
/proc/sys/vm/hugepages_treat_as_movable
0
/proc/sys/vm/hugetlb_shm_group
0
/proc/sys/vm/laptop_mode
0
/proc/sys/vm/legacy_va_layout
0
/proc/sys/vm/lowmem_reserve_ratio
256 256 32  1
/proc/sys/vm/max_map_count
65530
/proc/sys/vm/memory_failure_early_kill
0
/proc/sys/vm/memory_failure_recovery
1
/proc/sys/vm/min_free_kbytes
45056
/proc/sys/vm/min_slab_ratio
5
/proc/sys/vm/min_unmapped_ratio
1
/proc/sys/vm/mmap_min_addr
65536
/proc/sys/vm/mmap_rnd_bits
28
/proc/sys/vm/mmap_rnd_compat_bits
8
/proc/sys/vm/nr_hugepages
0
/proc/sys/vm/nr_hugepages_mempolicy
0
/proc/sys/vm/nr_overcommit_hugepages
0
/proc/sys/vm/nr_pdflush_threads
0
/proc/sys/vm/numa_zonelist_order
default
/proc/sys/vm/oom_dump_tasks
1
/proc/sys/vm/oom_kill_allocating_task
0
/proc/sys/vm/overcommit_kbytes
0
/proc/sys/vm/overcommit_memory
2
/proc/sys/vm/overcommit_ratio
80
/proc/sys/vm/page-cluster
3
/proc/sys/vm/panic_on_oom
0
/proc/sys/vm/percpu_pagelist_fraction
0
/proc/sys/vm/stat_interval
1
/proc/sys/vm/stat_refresh
/proc/sys/vm/swappiness
10
/proc/sys/vm/user_reserve_kbytes
31036
/proc/sys/vm/vfs_cache_pressure
100
/proc/sys/vm/watermark_scale_factor
10
/proc/sys/vm/zone_reclaim_mode
0

Thnx.


Ciao,

Gerhard

Re: Still OOM problems with 4.9er/4.10er kernels


On 16.03.2017 10:39, Michal Hocko wrote:

On Thu 16-03-17 02:23:18, l...@pengaru.com wrote:

On Thu, Mar 16, 2017 at 10:08:44AM +0100, Michal Hocko wrote:

On Thu 16-03-17 01:47:33, l...@pengaru.com wrote:
[...]

While on the topic of understanding allocation stalls, Philip Freeman recently
mailed linux-kernel with a similar report, and in his case there are plenty of
page cache pages.  It was also a GFP_HIGHUSER_MOVABLE 0-order allocation.

care to point me to the report?

http://lkml.iu.edu/hypermail/linux/kernel/1703.1/06360.html

Thanks. It is gone from my lkml mailbox. Could you CC me (and linux-mm) please?
  
  

I'm no MM expert, but it appears a bit broken for such a low-order allocation
to stall on the order of 10 seconds when there's plenty of reclaimable pages,
in addition to mostly unused and abundant swap space on SSD.

yes this might indeed signal a problem.

Well maybe I missed something obvious that a better informed eye will catch.

Nothing really obvious. There is indeed a lot of anonymous memory to
swap out. Almost no pages on file LRU lists (active_file:759
inactive_file:749) but 158783 total pagecache pages so we have to have a
lot of pages in the swap cache. I would probably have to see more data
to make a full picture.



Why does the kernel prefer to swapin/out and not use

a.) the free memory?

b.) the buffer/cache?

There is ~100M memory available but kernel swaps all the time ...

Any ideas?

Kernel: 4.9.14-200.fc25.x86_64

top - 17:33:43 up 28 min,  3 users,  load average: 3.58, 1.67, 0.89
Tasks: 145 total,   4 running, 141 sleeping,   0 stopped,   0 zombie
%Cpu(s): 19.1 us, 56.2 sy,  0.0 ni,  4.3 id, 13.4 wa, 2.0 hi,  0.3 si,  
4.7 st

KiB Mem :   230076 total,61508 free,   123472 used,45096 buff/cache

procs ---memory-- ---swap-- -io -system-- 
--cpu-
 r  b   swpd   free   buff  cache   si   sobibo in   cs us sy 
id wa st
 3  5 303916  60372328  43864 27828  200 41420   236 6984 11138 11 
47  6 23 14
 5  4 292852  52904756  58584 19600  448 48780   540 8088 10528 18 
61  1  7 13
 3  3 288792  49052   1152  65924 4856  576  9824  1100 4324 5720  7 
18  2 64  8
 2  2 283676  54160716  67604 6332  344 31740   964 3879 5055 12 34 
10 37  7
 3  3 286852  66712216  53136 28064 4832 56532  4920 9175 12625 10 
55 12 14 10
 2  0 299680  62428196  53316 36312 13164 54728 13212 16820 25283  
7 56 18 12  7
 1  1 300756  63220624  58160 17944 1260 24528  1304 5804 9302  3 
22 38 34  3


Thnx.


Ciao,

Gerhard

Re: Still OOM problems with 4.9er/4.10er kernels


On 16.03.2017 10:39, Michal Hocko wrote:

On Thu 16-03-17 02:23:18, l...@pengaru.com wrote:

On Thu, Mar 16, 2017 at 10:08:44AM +0100, Michal Hocko wrote:

On Thu 16-03-17 01:47:33, l...@pengaru.com wrote:
[...]

While on the topic of understanding allocation stalls, Philip Freeman recently
mailed linux-kernel with a similar report, and in his case there are plenty of
page cache pages.  It was also a GFP_HIGHUSER_MOVABLE 0-order allocation.

care to point me to the report?

http://lkml.iu.edu/hypermail/linux/kernel/1703.1/06360.html

Thanks. It is gone from my lkml mailbox. Could you CC me (and linux-mm) please?
  
  

I'm no MM expert, but it appears a bit broken for such a low-order allocation
to stall on the order of 10 seconds when there's plenty of reclaimable pages,
in addition to mostly unused and abundant swap space on SSD.

yes this might indeed signal a problem.

Well maybe I missed something obvious that a better informed eye will catch.

Nothing really obvious. There is indeed a lot of anonymous memory to
swap out. Almost no pages on file LRU lists (active_file:759
inactive_file:749) but 158783 total pagecache pages so we have to have a
lot of pages in the swap cache. I would probably have to see more data
to make a full picture.



Why does the kernel prefer to swapin/out and not use

a.) the free memory?

b.) the buffer/cache?

There is ~100M memory available but kernel swaps all the time ...

Any ideas?

Kernel: 4.9.14-200.fc25.x86_64

top - 17:33:43 up 28 min,  3 users,  load average: 3.58, 1.67, 0.89
Tasks: 145 total,   4 running, 141 sleeping,   0 stopped,   0 zombie
%Cpu(s): 19.1 us, 56.2 sy,  0.0 ni,  4.3 id, 13.4 wa, 2.0 hi,  0.3 si,  
4.7 st

KiB Mem :   230076 total,61508 free,   123472 used,45096 buff/cache

procs ---memory-- ---swap-- -io -system-- 
--cpu-
 r  b   swpd   free   buff  cache   si   sobibo in   cs us sy 
id wa st
 3  5 303916  60372328  43864 27828  200 41420   236 6984 11138 11 
47  6 23 14
 5  4 292852  52904756  58584 19600  448 48780   540 8088 10528 18 
61  1  7 13
 3  3 288792  49052   1152  65924 4856  576  9824  1100 4324 5720  7 
18  2 64  8
 2  2 283676  54160716  67604 6332  344 31740   964 3879 5055 12 34 
10 37  7
 3  3 286852  66712216  53136 28064 4832 56532  4920 9175 12625 10 
55 12 14 10
 2  0 299680  62428196  53316 36312 13164 54728 13212 16820 25283  
7 56 18 12  7
 1  1 300756  63220624  58160 17944 1260 24528  1304 5804 9302  3 
22 38 34  3


Thnx.


Ciao,

Gerhard

Re: Still OOM problems with 4.9er/4.10er kernels

2017-03-16 Thread Gerhard Wiesinger


On 02.03.2017 08:17, Minchan Kim wrote:

Hi Michal,

On Tue, Feb 28, 2017 at 09:12:24AM +0100, Michal Hocko wrote:

On Tue 28-02-17 14:17:23, Minchan Kim wrote:

On Mon, Feb 27, 2017 at 10:44:49AM +0100, Michal Hocko wrote:

On Mon 27-02-17 18:02:36, Minchan Kim wrote:
[...]

>From 9779a1c5d32e2edb64da5cdfcd6f9737b94a247a Mon Sep 17 00:00:00 2001
From: Minchan Kim 
Date: Mon, 27 Feb 2017 17:39:06 +0900
Subject: [PATCH] mm: use up highatomic before OOM kill

Not-Yet-Signed-off-by: Minchan Kim 
---
  mm/page_alloc.c | 14 --
  1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 614cd0397ce3..e073cca4969e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3549,16 +3549,6 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order,
*no_progress_loops = 0;
else
(*no_progress_loops)++;
-
-   /*
-* Make sure we converge to OOM if we cannot make any progress
-* several times in the row.
-*/
-   if (*no_progress_loops > MAX_RECLAIM_RETRIES) {
-   /* Before OOM, exhaust highatomic_reserve */
-   return unreserve_highatomic_pageblock(ac, true);
-   }
-
/*
 * Keep reclaiming pages while there is a chance this will lead
 * somewhere.  If none of the target zones can satisfy our allocation
@@ -3821,6 +3811,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int 
order,
if (read_mems_allowed_retry(cpuset_mems_cookie))
goto retry_cpuset;
  
+	/* Before OOM, exhaust highatomic_reserve */

+   if (unreserve_highatomic_pageblock(ac, true))
+   goto retry;
+

OK, this can help for higher order requests when we do not exhaust all
the retries and fail on compaction but I fail to see how can this help
for order-0 requets which was what happened in this case. I am not
saying this is wrong, though.

The should_reclaim_retry can return false although no_progress_loop is less
than MAX_RECLAIM_RETRIES unless eligible zones has enough reclaimable pages
by the progress_loop.

Yes, sorry I should have been more clear. I was talking about this
particular case where we had a lot of reclaimable pages (a lot of
anonymous with the swap available).

This reports shows two problems. Why we see OOM 1) enough *free* pages and
2) enough *freeable* pages.

I just pointed out 1) and sent the patch to solve it.

About 2), one of my imaginary scenario is inactive anon list is full of
pinned pages so VM can unmap them successfully in shrink_page_list but fail
to free due to increased page refcount. In that case, the page will be added
to inactive anonymous LRU list again without activating so inactive_list_is_low
on anonymous LRU is always false. IOW, there is no deactivation from active 
list.

It's just my picture without no clue. ;-)


With latest kernels (4.11.0-0.rc2.git0.2.fc26.x86_64) I'm having the 
issue that swapping is active all the time after some runtime (~1day).


top - 07:30:17 up 1 day, 19:42,  1 user,  load average: 13.71, 16.98, 15.36
Tasks: 130 total,   2 running, 128 sleeping,   0 stopped, 0 zombie
%Cpu(s): 15.8 us, 33.5 sy,  0.0 ni,  3.9 id, 34.5 wa,  4.9 hi,  1.0 si,  
6.4 st

KiB Mem :   369700 total, 5484 free,   311556 used, 52660 buff/cache
KiB Swap:  2064380 total,  1187684 free,   876696 used. 20340 avail Mem

[root@smtp ~]# vmstat 1
procs ---memory-- ---swap-- -io -system-- 
--cpu-
 r  b   swpd   free   buff  cache   si   sobibo in   cs us sy 
id wa st
 3  1 876280   7132  16536  64840  238  226  1027   258 80   97  2  3 
83 11  1
 0  4 876140   3812  10520  64552 3676  168 11840  1100 2255 2582  7 
13  8 70  3
 0  3 875372   3628   4024  56160 5424   64 10004   476 2157 2580  2 
14  0 83  2
 0  4 875560  24056   2208  56296 9032 2180 39928  2388 4111 4549 10 
32  0 55  3
 2  2 875660   7540   5256  58220 5536 1604 48756  1864 4505 4196 12 
23  5 58  3
 0  3 875264   3664   2120  57596 2304  116 17904   560 2223 1825 15 
15  0 67  3
 0  2 875564   3800588  57856 1340 1068 14780  1184 1390 1364 12 
10  0 77  3
 1  2 875724   3740372  53988 3104  928 16884  1068 1560 1527  3 
12  0 83  3
 0  3 881096   3708532  52220 4604 5872 21004  6104 2752 2259  7 
18  5 67  2


The following commit is included in that version:
commit 710531320af876192d76b2c1f68190a1df941b02
Author: Michal Hocko 
Date:   Wed Feb 22 15:45:58 2017 -0800

mm, vmscan: cleanup lru size claculations

commit fd538803731e50367b7c59ce4ad3454426a3d671 upstream.

But still OOMs:
[157048.030760] clamscan: page allocation stalls for 19405ms, order:0, 
mode:0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null)

[157048.031985] clamscan cpuset=/ mems_allowed=0
[157048.031993] CPU: 1 PID: 9597 Comm: clamscan Not tainted 
4.11.0-0.rc2.git0.2.fc26.x86_64 #1
[157048.033197] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS

Re: Still OOM problems with 4.9er/4.10er kernels

2017-03-16 Thread Gerhard Wiesinger


On 02.03.2017 08:17, Minchan Kim wrote:

Hi Michal,

On Tue, Feb 28, 2017 at 09:12:24AM +0100, Michal Hocko wrote:

On Tue 28-02-17 14:17:23, Minchan Kim wrote:

On Mon, Feb 27, 2017 at 10:44:49AM +0100, Michal Hocko wrote:

On Mon 27-02-17 18:02:36, Minchan Kim wrote:
[...]

>From 9779a1c5d32e2edb64da5cdfcd6f9737b94a247a Mon Sep 17 00:00:00 2001
From: Minchan Kim 
Date: Mon, 27 Feb 2017 17:39:06 +0900
Subject: [PATCH] mm: use up highatomic before OOM kill

Not-Yet-Signed-off-by: Minchan Kim 
---
  mm/page_alloc.c | 14 --
  1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 614cd0397ce3..e073cca4969e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3549,16 +3549,6 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order,
*no_progress_loops = 0;
else
(*no_progress_loops)++;
-
-   /*
-* Make sure we converge to OOM if we cannot make any progress
-* several times in the row.
-*/
-   if (*no_progress_loops > MAX_RECLAIM_RETRIES) {
-   /* Before OOM, exhaust highatomic_reserve */
-   return unreserve_highatomic_pageblock(ac, true);
-   }
-
/*
 * Keep reclaiming pages while there is a chance this will lead
 * somewhere.  If none of the target zones can satisfy our allocation
@@ -3821,6 +3811,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int 
order,
if (read_mems_allowed_retry(cpuset_mems_cookie))
goto retry_cpuset;
  
+	/* Before OOM, exhaust highatomic_reserve */

+   if (unreserve_highatomic_pageblock(ac, true))
+   goto retry;
+

OK, this can help for higher order requests when we do not exhaust all
the retries and fail on compaction but I fail to see how can this help
for order-0 requets which was what happened in this case. I am not
saying this is wrong, though.

The should_reclaim_retry can return false although no_progress_loop is less
than MAX_RECLAIM_RETRIES unless eligible zones has enough reclaimable pages
by the progress_loop.

Yes, sorry I should have been more clear. I was talking about this
particular case where we had a lot of reclaimable pages (a lot of
anonymous with the swap available).

This reports shows two problems. Why we see OOM 1) enough *free* pages and
2) enough *freeable* pages.

I just pointed out 1) and sent the patch to solve it.

About 2), one of my imaginary scenario is inactive anon list is full of
pinned pages so VM can unmap them successfully in shrink_page_list but fail
to free due to increased page refcount. In that case, the page will be added
to inactive anonymous LRU list again without activating so inactive_list_is_low
on anonymous LRU is always false. IOW, there is no deactivation from active 
list.

It's just my picture without no clue. ;-)


With latest kernels (4.11.0-0.rc2.git0.2.fc26.x86_64) I'm having the 
issue that swapping is active all the time after some runtime (~1day).


top - 07:30:17 up 1 day, 19:42,  1 user,  load average: 13.71, 16.98, 15.36
Tasks: 130 total,   2 running, 128 sleeping,   0 stopped, 0 zombie
%Cpu(s): 15.8 us, 33.5 sy,  0.0 ni,  3.9 id, 34.5 wa,  4.9 hi,  1.0 si,  
6.4 st

KiB Mem :   369700 total, 5484 free,   311556 used, 52660 buff/cache
KiB Swap:  2064380 total,  1187684 free,   876696 used. 20340 avail Mem

[root@smtp ~]# vmstat 1
procs ---memory-- ---swap-- -io -system-- 
--cpu-
 r  b   swpd   free   buff  cache   si   sobibo in   cs us sy 
id wa st
 3  1 876280   7132  16536  64840  238  226  1027   258 80   97  2  3 
83 11  1
 0  4 876140   3812  10520  64552 3676  168 11840  1100 2255 2582  7 
13  8 70  3
 0  3 875372   3628   4024  56160 5424   64 10004   476 2157 2580  2 
14  0 83  2
 0  4 875560  24056   2208  56296 9032 2180 39928  2388 4111 4549 10 
32  0 55  3
 2  2 875660   7540   5256  58220 5536 1604 48756  1864 4505 4196 12 
23  5 58  3
 0  3 875264   3664   2120  57596 2304  116 17904   560 2223 1825 15 
15  0 67  3
 0  2 875564   3800588  57856 1340 1068 14780  1184 1390 1364 12 
10  0 77  3
 1  2 875724   3740372  53988 3104  928 16884  1068 1560 1527  3 
12  0 83  3
 0  3 881096   3708532  52220 4604 5872 21004  6104 2752 2259  7 
18  5 67  2


The following commit is included in that version:
commit 710531320af876192d76b2c1f68190a1df941b02
Author: Michal Hocko 
Date:   Wed Feb 22 15:45:58 2017 -0800

mm, vmscan: cleanup lru size claculations

commit fd538803731e50367b7c59ce4ad3454426a3d671 upstream.

But still OOMs:
[157048.030760] clamscan: page allocation stalls for 19405ms, order:0, 
mode:0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null)

[157048.031985] clamscan cpuset=/ mems_allowed=0
[157048.031993] CPU: 1 PID: 9597 Comm: clamscan Not tainted 
4.11.0-0.rc2.git0.2.fc26.x86_64 #1
[157048.033197] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS 1.9.3 04/01/2014

[157048.034382] Call Trace:

Re: Still OOM problems with 4.9er/4.10er kernels

2017-02-27 Thread Gerhard Wiesinger


On 27.02.2017 09:27, Michal Hocko wrote:

On Sun 26-02-17 09:40:42, Gerhard Wiesinger wrote:

On 04.01.2017 10:11, Michal Hocko wrote:

The VM stops working (e.g. not pingable) after around 8h (will be restarted
automatically), happened serveral times.

Had also further OOMs which I sent to Mincham.

Could you post them to the mailing list as well, please?

Still OOMs on dnf update procedure with kernel 4.10: 4.10.0-1.fc26.x86_64 as
well on 4.9.9-200.fc25.x86_64

On 4.10er kernels:

[...]

kernel: Node 0 DMA32 free:5012kB min:2264kB low:2828kB high:3392kB
active_anon:143580kB inactive_anon:143300kB active_file:2576kB
inactive_file:2560kB unevictable:0kB writepending:0kB present:376688kB
managed:353968kB mlocked:0kB slab_reclaimable:13708kB
slab_unreclaimable:18064kB kernel_stack:2352kB pagetables:12888kB bounce:0kB
free_pcp:412kB local_pcp:88kB free_cma:0kB

[...]


On 4.9er kernels:

[...]

kernel: Node 0 DMA32 free:3356kB min:2668kB low:3332kB high:3996kB
active_anon:122148kB inactive_anon:112068kB active_file:81324kB
inactive_file:101972kB unevictable:0kB writepending:4648kB present:507760kB
managed:484384kB mlocked:0kB slab_reclaimable:17660kB
slab_unreclaimable:21404kB kernel_stack:2432kB pagetables:10124kB bounce:0kB
free_pcp:120kB local_pcp:0kB free_cma:0kB

In both cases the amount if free memory is above the min watermark, so
we shouldn't be hitting the oom. We might have somebody freeing memory
after the last attempt, though...

[...]

Should be very easy to reproduce with a low mem VM (e.g. 192MB) under KVM
with ext4 and Fedora 25 and some memory load and updating the VM.

Any further progress?

The linux-next (resp. mmotm tree) has new tracepoints which should help
to tell us more about what is going on here. Could you try to enable
oom/reclaim_retry_zone and vmscan/mm_vmscan_direct_reclaim_{begin,end}


Is this available in this version?

https://koji.fedoraproject.org/koji/buildinfo?buildID=862775

kernel-4.11.0-0.rc0.git5.1.fc26

How to enable?


Thnx.

Ciao,

gerhard

Re: Still OOM problems with 4.9er/4.10er kernels

2017-02-27 Thread Gerhard Wiesinger


On 27.02.2017 09:27, Michal Hocko wrote:

On Sun 26-02-17 09:40:42, Gerhard Wiesinger wrote:

On 04.01.2017 10:11, Michal Hocko wrote:

The VM stops working (e.g. not pingable) after around 8h (will be restarted
automatically), happened serveral times.

Had also further OOMs which I sent to Mincham.

Could you post them to the mailing list as well, please?

Still OOMs on dnf update procedure with kernel 4.10: 4.10.0-1.fc26.x86_64 as
well on 4.9.9-200.fc25.x86_64

On 4.10er kernels:

[...]

kernel: Node 0 DMA32 free:5012kB min:2264kB low:2828kB high:3392kB
active_anon:143580kB inactive_anon:143300kB active_file:2576kB
inactive_file:2560kB unevictable:0kB writepending:0kB present:376688kB
managed:353968kB mlocked:0kB slab_reclaimable:13708kB
slab_unreclaimable:18064kB kernel_stack:2352kB pagetables:12888kB bounce:0kB
free_pcp:412kB local_pcp:88kB free_cma:0kB

[...]


On 4.9er kernels:

[...]

kernel: Node 0 DMA32 free:3356kB min:2668kB low:3332kB high:3996kB
active_anon:122148kB inactive_anon:112068kB active_file:81324kB
inactive_file:101972kB unevictable:0kB writepending:4648kB present:507760kB
managed:484384kB mlocked:0kB slab_reclaimable:17660kB
slab_unreclaimable:21404kB kernel_stack:2432kB pagetables:10124kB bounce:0kB
free_pcp:120kB local_pcp:0kB free_cma:0kB

In both cases the amount if free memory is above the min watermark, so
we shouldn't be hitting the oom. We might have somebody freeing memory
after the last attempt, though...

[...]

Should be very easy to reproduce with a low mem VM (e.g. 192MB) under KVM
with ext4 and Fedora 25 and some memory load and updating the VM.

Any further progress?

The linux-next (resp. mmotm tree) has new tracepoints which should help
to tell us more about what is going on here. Could you try to enable
oom/reclaim_retry_zone and vmscan/mm_vmscan_direct_reclaim_{begin,end}


Is this available in this version?

https://koji.fedoraproject.org/koji/buildinfo?buildID=862775

kernel-4.11.0-0.rc0.git5.1.fc26

How to enable?


Thnx.

Ciao,

gerhard

Re: Still OOM problems with 4.9er/4.10er kernels

2017-02-26 Thread Gerhard Wiesinger


On 04.01.2017 10:11, Michal Hocko wrote:

The VM stops working (e.g. not pingable) after around 8h (will be restarted
automatically), happened serveral times.

Had also further OOMs which I sent to Mincham.

Could you post them to the mailing list as well, please?


Still OOMs on dnf update procedure with kernel 4.10: 
4.10.0-1.fc26.x86_64 as well on 4.9.9-200.fc25.x86_64


On 4.10er kernels:

Free swap  = 1137532kB

cat /etc/sysctl.d/* | grep ^vm
vm.dirty_background_ratio = 3
vm.dirty_ratio = 15
vm.overcommit_memory = 2
vm.overcommit_ratio = 80
vm.swappiness=10

kernel: python invoked oom-killer: 
gfp_mask=0x14201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=0, 
order=0, oom_score_adj=0

kernel: python cpuset=/ mems_allowed=0
kernel: CPU: 1 PID: 813 Comm: python Not tainted 4.10.0-1.fc26.x86_64 #1
kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.9.3 04/01/2014

kernel: Call Trace:
kernel:  dump_stack+0x63/0x84
kernel:  dump_header+0x7b/0x1f6
kernel:  ? do_try_to_free_pages+0x2c5/0x340
kernel:  oom_kill_process+0x202/0x3d0
kernel:  out_of_memory+0x2b7/0x4e0
kernel:  __alloc_pages_slowpath+0x915/0xb80
kernel:  __alloc_pages_nodemask+0x218/0x2d0
kernel:  alloc_pages_current+0x93/0x150
kernel:  __page_cache_alloc+0xcf/0x100
kernel:  filemap_fault+0x39d/0x800
kernel:  ? page_add_file_rmap+0xe5/0x200
kernel:  ? filemap_map_pages+0x2e1/0x4e0
kernel:  ext4_filemap_fault+0x36/0x50
kernel:  __do_fault+0x21/0x110
kernel:  handle_mm_fault+0xdd1/0x1410
kernel:  ? swake_up+0x42/0x50
kernel:  __do_page_fault+0x23f/0x4c0
kernel:  trace_do_page_fault+0x41/0x120
kernel:  do_async_page_fault+0x51/0xa0
kernel:  async_page_fault+0x28/0x30
kernel: RIP: 0033:0x7f0681ad6350
kernel: RSP: 002b:7ffcbdd238d8 EFLAGS: 00010246
kernel: RAX: 7f0681b0f960 RBX:  RCX: 7fff
kernel: RDX:  RSI: 3ff0 RDI: 3ff0
kernel: RBP: 7f067461ab40 R08:  R09: 3ff0
kernel: R10: 556f1c6d8a80 R11: 0001 R12: 7f0676d1a8d0
kernel: R13:  R14: 7f06746168bc R15: 7f0674385910
kernel: Mem-Info:
kernel: active_anon:37423 inactive_anon:37512 isolated_anon:0
 active_file:462 inactive_file:603 isolated_file:0
 unevictable:0 dirty:0 writeback:0 unstable:0
 slab_reclaimable:3538 slab_unreclaimable:4818
 mapped:859 shmem:9 pagetables:3370 bounce:0
 free:1650 free_pcp:103 free_cma:0
kernel: Node 0 active_anon:149380kB inactive_anon:149704kB 
active_file:1848kB inactive_file:3660kB unevictable:0kB 
isolated(anon):128kB isolated(file):0kB mapped:4580kB dirty:0kB 
writeback:380kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 
36kB writeback_tmp:0kB unstable:0kB pages_scanned:352 all_unreclaimable? no
kernel: Node 0 DMA free:1484kB min:104kB low:128kB high:152kB 
active_anon:5660kB inactive_anon:6156kB active_file:56kB 
inactive_file:64kB unevictable:0kB writepending:0kB present:15992kB 
managed:15908kB mlocked:0kB slab_reclaimable:444kB 
slab_unreclaimable:1208kB kernel_stack:32kB pagetables:592kB bounce:0kB 
free_pcp:0kB local_pcp:0kB free_cma:0kB

kernel: lowmem_reserve[]: 0 327 327 327 327
kernel: Node 0 DMA32 free:5012kB min:2264kB low:2828kB high:3392kB 
active_anon:143580kB inactive_anon:143300kB active_file:2576kB 
inactive_file:2560kB unevictable:0kB writepending:0kB present:376688kB 
managed:353968kB mlocked:0kB slab_reclaimable:13708kB 
slab_unreclaimable:18064kB kernel_stack:2352kB pagetables:12888kB 
bounce:0kB free_pcp:412kB local_pcp:88kB free_cma:0kB

kernel: lowmem_reserve[]: 0 0 0 0 0
kernel: Node 0 DMA: 70*4kB (UMEH) 20*8kB (UMEH) 13*16kB (MH) 5*32kB (H) 
4*64kB (H) 2*128kB (H) 1*256kB (H) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 
1576kB
kernel: Node 0 DMA32: 1134*4kB (UMEH) 25*8kB (UMEH) 13*16kB (MH) 7*32kB 
(H) 3*64kB (H) 0*128kB 1*256kB (H) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 
5616kB
kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 
hugepages_size=2048kB

kernel: 6561 total pagecache pages
kernel: 5240 pages in swap cache
kernel: Swap cache stats: add 100078658, delete 100073419, find 
199458343/238460223

kernel: Free swap  = 1137532kB
kernel: Total swap = 2064380kB
kernel: 98170 pages RAM
kernel: 0 pages HighMem/MovableOnly
kernel: 5701 pages reserved
kernel: 0 pages cma reserved
kernel: 0 pages hwpoisoned
kernel: Out of memory: Kill process 11968 (clamscan) score 170 or 
sacrifice child
kernel: Killed process 11968 (clamscan) total-vm:538120kB, 
anon-rss:182220kB, file-rss:464kB, shmem-rss:0kB


On 4.9er kernels:

Free swap  = 1826688kB

cat /etc/sysctl.d/* | grep ^vm
vm.dirty_background_ratio=3
vm.dirty_ratio=15
vm.overcommit_memory=2
vm.overcommit_ratio=80
vm.swappiness=10

kernel: dnf invoked oom-killer: 
gfp_mask=0x24280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=0, 
order=0, oom_score_adj=0

kernel: dnf cpuset=/ mems_allowed=0
kernel: CPU: 0 PID: 20049 Comm: dnf Not tainted 4.9.9-200.fc25.x86_64 #1
kernel:

Re: Still OOM problems with 4.9er/4.10er kernels

2017-02-26 Thread Gerhard Wiesinger


On 04.01.2017 10:11, Michal Hocko wrote:

The VM stops working (e.g. not pingable) after around 8h (will be restarted
automatically), happened serveral times.

Had also further OOMs which I sent to Mincham.

Could you post them to the mailing list as well, please?


Still OOMs on dnf update procedure with kernel 4.10: 
4.10.0-1.fc26.x86_64 as well on 4.9.9-200.fc25.x86_64


On 4.10er kernels:

Free swap  = 1137532kB

cat /etc/sysctl.d/* | grep ^vm
vm.dirty_background_ratio = 3
vm.dirty_ratio = 15
vm.overcommit_memory = 2
vm.overcommit_ratio = 80
vm.swappiness=10

kernel: python invoked oom-killer: 
gfp_mask=0x14201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=0, 
order=0, oom_score_adj=0

kernel: python cpuset=/ mems_allowed=0
kernel: CPU: 1 PID: 813 Comm: python Not tainted 4.10.0-1.fc26.x86_64 #1
kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.9.3 04/01/2014

kernel: Call Trace:
kernel:  dump_stack+0x63/0x84
kernel:  dump_header+0x7b/0x1f6
kernel:  ? do_try_to_free_pages+0x2c5/0x340
kernel:  oom_kill_process+0x202/0x3d0
kernel:  out_of_memory+0x2b7/0x4e0
kernel:  __alloc_pages_slowpath+0x915/0xb80
kernel:  __alloc_pages_nodemask+0x218/0x2d0
kernel:  alloc_pages_current+0x93/0x150
kernel:  __page_cache_alloc+0xcf/0x100
kernel:  filemap_fault+0x39d/0x800
kernel:  ? page_add_file_rmap+0xe5/0x200
kernel:  ? filemap_map_pages+0x2e1/0x4e0
kernel:  ext4_filemap_fault+0x36/0x50
kernel:  __do_fault+0x21/0x110
kernel:  handle_mm_fault+0xdd1/0x1410
kernel:  ? swake_up+0x42/0x50
kernel:  __do_page_fault+0x23f/0x4c0
kernel:  trace_do_page_fault+0x41/0x120
kernel:  do_async_page_fault+0x51/0xa0
kernel:  async_page_fault+0x28/0x30
kernel: RIP: 0033:0x7f0681ad6350
kernel: RSP: 002b:7ffcbdd238d8 EFLAGS: 00010246
kernel: RAX: 7f0681b0f960 RBX:  RCX: 7fff
kernel: RDX:  RSI: 3ff0 RDI: 3ff0
kernel: RBP: 7f067461ab40 R08:  R09: 3ff0
kernel: R10: 556f1c6d8a80 R11: 0001 R12: 7f0676d1a8d0
kernel: R13:  R14: 7f06746168bc R15: 7f0674385910
kernel: Mem-Info:
kernel: active_anon:37423 inactive_anon:37512 isolated_anon:0
 active_file:462 inactive_file:603 isolated_file:0
 unevictable:0 dirty:0 writeback:0 unstable:0
 slab_reclaimable:3538 slab_unreclaimable:4818
 mapped:859 shmem:9 pagetables:3370 bounce:0
 free:1650 free_pcp:103 free_cma:0
kernel: Node 0 active_anon:149380kB inactive_anon:149704kB 
active_file:1848kB inactive_file:3660kB unevictable:0kB 
isolated(anon):128kB isolated(file):0kB mapped:4580kB dirty:0kB 
writeback:380kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 
36kB writeback_tmp:0kB unstable:0kB pages_scanned:352 all_unreclaimable? no
kernel: Node 0 DMA free:1484kB min:104kB low:128kB high:152kB 
active_anon:5660kB inactive_anon:6156kB active_file:56kB 
inactive_file:64kB unevictable:0kB writepending:0kB present:15992kB 
managed:15908kB mlocked:0kB slab_reclaimable:444kB 
slab_unreclaimable:1208kB kernel_stack:32kB pagetables:592kB bounce:0kB 
free_pcp:0kB local_pcp:0kB free_cma:0kB

kernel: lowmem_reserve[]: 0 327 327 327 327
kernel: Node 0 DMA32 free:5012kB min:2264kB low:2828kB high:3392kB 
active_anon:143580kB inactive_anon:143300kB active_file:2576kB 
inactive_file:2560kB unevictable:0kB writepending:0kB present:376688kB 
managed:353968kB mlocked:0kB slab_reclaimable:13708kB 
slab_unreclaimable:18064kB kernel_stack:2352kB pagetables:12888kB 
bounce:0kB free_pcp:412kB local_pcp:88kB free_cma:0kB

kernel: lowmem_reserve[]: 0 0 0 0 0
kernel: Node 0 DMA: 70*4kB (UMEH) 20*8kB (UMEH) 13*16kB (MH) 5*32kB (H) 
4*64kB (H) 2*128kB (H) 1*256kB (H) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 
1576kB
kernel: Node 0 DMA32: 1134*4kB (UMEH) 25*8kB (UMEH) 13*16kB (MH) 7*32kB 
(H) 3*64kB (H) 0*128kB 1*256kB (H) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 
5616kB
kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 
hugepages_size=2048kB

kernel: 6561 total pagecache pages
kernel: 5240 pages in swap cache
kernel: Swap cache stats: add 100078658, delete 100073419, find 
199458343/238460223

kernel: Free swap  = 1137532kB
kernel: Total swap = 2064380kB
kernel: 98170 pages RAM
kernel: 0 pages HighMem/MovableOnly
kernel: 5701 pages reserved
kernel: 0 pages cma reserved
kernel: 0 pages hwpoisoned
kernel: Out of memory: Kill process 11968 (clamscan) score 170 or 
sacrifice child
kernel: Killed process 11968 (clamscan) total-vm:538120kB, 
anon-rss:182220kB, file-rss:464kB, shmem-rss:0kB


On 4.9er kernels:

Free swap  = 1826688kB

cat /etc/sysctl.d/* | grep ^vm
vm.dirty_background_ratio=3
vm.dirty_ratio=15
vm.overcommit_memory=2
vm.overcommit_ratio=80
vm.swappiness=10

kernel: dnf invoked oom-killer: 
gfp_mask=0x24280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=0, 
order=0, oom_score_adj=0

kernel: dnf cpuset=/ mems_allowed=0
kernel: CPU: 0 PID: 20049 Comm: dnf Not tainted 4.9.9-200.fc25.x86_64 #1
kernel:

Re: Still OOM problems with 4.9er kernels

2017-01-04 Thread Gerhard Wiesinger


On 23.12.2016 03:55, Minchan Kim wrote:

On Fri, Dec 09, 2016 at 04:52:07PM +0100, Gerhard Wiesinger wrote:

On 09.12.2016 14:40, Michal Hocko wrote:

On Fri 09-12-16 08:06:25, Gerhard Wiesinger wrote:

Hello,

same with latest kernel rc, dnf still killed with OOM (but sometimes
better).

./update.sh: line 40:  1591 Killed  ${EXE} update ${PARAMS}
(does dnf clean all;dnf update)
Linux database.intern 4.9.0-0.rc8.git2.1.fc26.x86_64 #1 SMP Wed Dec 7
17:53:29 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Updated bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=1314697

Could you post your oom report please?

E.g. a new one with more than one included, first one after boot ...

Just setup a low mem VM under KVM and it is easily triggerable.

Still enough virtual memory available ...

4.9.0-0.rc8.git2.1.fc26.x86_64

[  624.862777] ksoftirqd/0: page allocation failure: order:0,
mode:0x2080020(GFP_ATOMIC)
[  624.863319] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted
4.9.0-0.rc8.git2.1.fc26.x86_64 #1
[  624.863410] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.9.3
[  624.863510]  aa62c007f958 904774e3 90c7dd98

[  624.863923]  aa62c007f9e0 9020e6ea 020800200246
90c7dd98
[  624.864019]  aa62c007f980 96b90010 aa62c007f9f0
aa62c007f9a0
[  624.864998] Call Trace:
[  624.865149]  [] dump_stack+0x86/0xc3
[  624.865347]  [] warn_alloc+0x13a/0x170
[  624.865432]  [] __alloc_pages_slowpath+0x252/0xbb0
[  624.865563]  [] __alloc_pages_nodemask+0x40d/0x4b0
[  624.865675]  [] __alloc_page_frag+0x193/0x200
[  624.866024]  [] __napi_alloc_skb+0x8e/0xf0
[  624.866113]  [] page_to_skb.isra.28+0x5d/0x310
[virtio_net]
[  624.866201]  [] virtnet_receive+0x2db/0x9a0
[virtio_net]
[  624.867378]  [] virtnet_poll+0x1d/0x80 [virtio_net]
[  624.867494]  [] net_rx_action+0x23e/0x470
[  624.867612]  [] __do_softirq+0xcd/0x4b9
[  624.867704]  [] ? smpboot_thread_fn+0x34/0x1f0
[  624.867833]  [] ? smpboot_thread_fn+0x12d/0x1f0
[  624.867924]  [] run_ksoftirqd+0x25/0x80
[  624.868109]  [] smpboot_thread_fn+0x128/0x1f0
[  624.868197]  [] ? sort_range+0x30/0x30
[  624.868596]  [] kthread+0x102/0x120
[  624.868679]  [] ? wait_for_completion+0x110/0x140
[  624.868768]  [] ? kthread_park+0x60/0x60
[  624.868850]  [] ret_from_fork+0x2a/0x40
[  843.528656] httpd (2490) used greatest stack depth: 10304 bytes left
[  878.077750] httpd (2976) used greatest stack depth: 10096 bytes left
[93918.861109] netstat (14579) used greatest stack depth: 9488 bytes left
[94050.874669] kworker/dying (6253) used greatest stack depth: 9008 bytes
left
[95895.765570] kworker/1:1H: page allocation failure: order:0,
mode:0x2280020(GFP_ATOMIC|__GFP_NOTRACK)
[95895.765819] CPU: 1 PID: 440 Comm: kworker/1:1H Not tainted
4.9.0-0.rc8.git2.1.fc26.x86_64 #1
[95895.765911] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.9.3
[95895.766060] Workqueue: kblockd blk_mq_run_work_fn
[95895.766143]  aa62c0257628 904774e3 90c7dd98

[95895.766235]  aa62c02576b0 9020e6ea 022800200046
90c7dd98
[95895.766325]  aa62c0257650 96b90010 aa62c02576c0
aa62c0257670
[95895.766417] Call Trace:
[95895.766502]  [] dump_stack+0x86/0xc3
[95895.766596]  [] warn_alloc+0x13a/0x170
[95895.766681]  [] __alloc_pages_slowpath+0x252/0xbb0
[95895.766767]  [] __alloc_pages_nodemask+0x40d/0x4b0
[95895.766866]  [] alloc_pages_current+0xa1/0x1f0
[95895.766971]  [] ? _raw_spin_unlock+0x27/0x40
[95895.767073]  [] new_slab+0x316/0x7c0
[95895.767160]  [] ___slab_alloc+0x3fb/0x5c0
[95895.772611]  [] ? cpuacct_charge+0xf2/0x1f0
[95895.773406]  [] ? alloc_indirect.isra.11+0x1d/0x50
[virtio_ring]
[95895.774327]  [] ? rcu_read_lock_sched_held+0x45/0x80
[95895.775212]  [] ? alloc_indirect.isra.11+0x1d/0x50
[virtio_ring]
[95895.776155]  [] __slab_alloc+0x51/0x90
[95895.777090]  [] __kmalloc+0x251/0x320
[95895.781502]  [] ? alloc_indirect.isra.11+0x1d/0x50
[virtio_ring]
[95895.782309]  [] alloc_indirect.isra.11+0x1d/0x50
[virtio_ring]
[95895.783334]  [] virtqueue_add_sgs+0x1c3/0x4a0
[virtio_ring]
[95895.784059]  [] ? kvm_sched_clock_read+0x25/0x40
[95895.784742]  [] __virtblk_add_req+0xbc/0x220
[virtio_blk]
[95895.785419]  [] ? debug_lockdep_rcu_enabled+0x1d/0x20
[95895.786086]  [] ? virtio_queue_rq+0x105/0x290
[virtio_blk]
[95895.786750]  [] virtio_queue_rq+0x12d/0x290
[virtio_blk]
[95895.787427]  [] __blk_mq_run_hw_queue+0x26d/0x3b0
[95895.788106]  [] blk_mq_run_work_fn+0x12/0x20
[95895.789065]  [] process_one_work+0x23e/0x6f0
[95895.789741]  [] ? process_one_work+0x1ba/0x6f0
[95895.790444]  [] worker_thread+0x4e/0x490
[95895.791178]  [] ? process_one_work+0x6f0/0x6f0
[95895.791911]  [] ? process_one_work+0x6f0/0x6f0
[95895.792653]  [] ? do_syscall_64+0x6c/0x1f0
[95895.793397]  [] kthread+0x102/0x120
[95895.794212]  [] ? trace_hardirqs_on_caller+0xf5/0x1b0
[95895.794942]  [] ? kthread_park+0x60/0x60
[95895.795689

Re: Still OOM problems with 4.9er kernels

2017-01-04 Thread Gerhard Wiesinger


On 23.12.2016 03:55, Minchan Kim wrote:

On Fri, Dec 09, 2016 at 04:52:07PM +0100, Gerhard Wiesinger wrote:

On 09.12.2016 14:40, Michal Hocko wrote:

On Fri 09-12-16 08:06:25, Gerhard Wiesinger wrote:

Hello,

same with latest kernel rc, dnf still killed with OOM (but sometimes
better).

./update.sh: line 40:  1591 Killed  ${EXE} update ${PARAMS}
(does dnf clean all;dnf update)
Linux database.intern 4.9.0-0.rc8.git2.1.fc26.x86_64 #1 SMP Wed Dec 7
17:53:29 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Updated bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=1314697

Could you post your oom report please?

E.g. a new one with more than one included, first one after boot ...

Just setup a low mem VM under KVM and it is easily triggerable.

Still enough virtual memory available ...

4.9.0-0.rc8.git2.1.fc26.x86_64

[  624.862777] ksoftirqd/0: page allocation failure: order:0,
mode:0x2080020(GFP_ATOMIC)
[  624.863319] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted
4.9.0-0.rc8.git2.1.fc26.x86_64 #1
[  624.863410] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.9.3
[  624.863510]  aa62c007f958 904774e3 90c7dd98

[  624.863923]  aa62c007f9e0 9020e6ea 020800200246
90c7dd98
[  624.864019]  aa62c007f980 96b90010 aa62c007f9f0
aa62c007f9a0
[  624.864998] Call Trace:
[  624.865149]  [] dump_stack+0x86/0xc3
[  624.865347]  [] warn_alloc+0x13a/0x170
[  624.865432]  [] __alloc_pages_slowpath+0x252/0xbb0
[  624.865563]  [] __alloc_pages_nodemask+0x40d/0x4b0
[  624.865675]  [] __alloc_page_frag+0x193/0x200
[  624.866024]  [] __napi_alloc_skb+0x8e/0xf0
[  624.866113]  [] page_to_skb.isra.28+0x5d/0x310
[virtio_net]
[  624.866201]  [] virtnet_receive+0x2db/0x9a0
[virtio_net]
[  624.867378]  [] virtnet_poll+0x1d/0x80 [virtio_net]
[  624.867494]  [] net_rx_action+0x23e/0x470
[  624.867612]  [] __do_softirq+0xcd/0x4b9
[  624.867704]  [] ? smpboot_thread_fn+0x34/0x1f0
[  624.867833]  [] ? smpboot_thread_fn+0x12d/0x1f0
[  624.867924]  [] run_ksoftirqd+0x25/0x80
[  624.868109]  [] smpboot_thread_fn+0x128/0x1f0
[  624.868197]  [] ? sort_range+0x30/0x30
[  624.868596]  [] kthread+0x102/0x120
[  624.868679]  [] ? wait_for_completion+0x110/0x140
[  624.868768]  [] ? kthread_park+0x60/0x60
[  624.868850]  [] ret_from_fork+0x2a/0x40
[  843.528656] httpd (2490) used greatest stack depth: 10304 bytes left
[  878.077750] httpd (2976) used greatest stack depth: 10096 bytes left
[93918.861109] netstat (14579) used greatest stack depth: 9488 bytes left
[94050.874669] kworker/dying (6253) used greatest stack depth: 9008 bytes
left
[95895.765570] kworker/1:1H: page allocation failure: order:0,
mode:0x2280020(GFP_ATOMIC|__GFP_NOTRACK)
[95895.765819] CPU: 1 PID: 440 Comm: kworker/1:1H Not tainted
4.9.0-0.rc8.git2.1.fc26.x86_64 #1
[95895.765911] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.9.3
[95895.766060] Workqueue: kblockd blk_mq_run_work_fn
[95895.766143]  aa62c0257628 904774e3 90c7dd98

[95895.766235]  aa62c02576b0 9020e6ea 022800200046
90c7dd98
[95895.766325]  aa62c0257650 96b90010 aa62c02576c0
aa62c0257670
[95895.766417] Call Trace:
[95895.766502]  [] dump_stack+0x86/0xc3
[95895.766596]  [] warn_alloc+0x13a/0x170
[95895.766681]  [] __alloc_pages_slowpath+0x252/0xbb0
[95895.766767]  [] __alloc_pages_nodemask+0x40d/0x4b0
[95895.766866]  [] alloc_pages_current+0xa1/0x1f0
[95895.766971]  [] ? _raw_spin_unlock+0x27/0x40
[95895.767073]  [] new_slab+0x316/0x7c0
[95895.767160]  [] ___slab_alloc+0x3fb/0x5c0
[95895.772611]  [] ? cpuacct_charge+0xf2/0x1f0
[95895.773406]  [] ? alloc_indirect.isra.11+0x1d/0x50
[virtio_ring]
[95895.774327]  [] ? rcu_read_lock_sched_held+0x45/0x80
[95895.775212]  [] ? alloc_indirect.isra.11+0x1d/0x50
[virtio_ring]
[95895.776155]  [] __slab_alloc+0x51/0x90
[95895.777090]  [] __kmalloc+0x251/0x320
[95895.781502]  [] ? alloc_indirect.isra.11+0x1d/0x50
[virtio_ring]
[95895.782309]  [] alloc_indirect.isra.11+0x1d/0x50
[virtio_ring]
[95895.783334]  [] virtqueue_add_sgs+0x1c3/0x4a0
[virtio_ring]
[95895.784059]  [] ? kvm_sched_clock_read+0x25/0x40
[95895.784742]  [] __virtblk_add_req+0xbc/0x220
[virtio_blk]
[95895.785419]  [] ? debug_lockdep_rcu_enabled+0x1d/0x20
[95895.786086]  [] ? virtio_queue_rq+0x105/0x290
[virtio_blk]
[95895.786750]  [] virtio_queue_rq+0x12d/0x290
[virtio_blk]
[95895.787427]  [] __blk_mq_run_hw_queue+0x26d/0x3b0
[95895.788106]  [] blk_mq_run_work_fn+0x12/0x20
[95895.789065]  [] process_one_work+0x23e/0x6f0
[95895.789741]  [] ? process_one_work+0x1ba/0x6f0
[95895.790444]  [] worker_thread+0x4e/0x490
[95895.791178]  [] ? process_one_work+0x6f0/0x6f0
[95895.791911]  [] ? process_one_work+0x6f0/0x6f0
[95895.792653]  [] ? do_syscall_64+0x6c/0x1f0
[95895.793397]  [] kthread+0x102/0x120
[95895.794212]  [] ? trace_hardirqs_on_caller+0xf5/0x1b0
[95895.794942]  [] ? kthread_park+0x60/0x60
[95895.795689

Re: Still OOM problems with 4.9er kernels

2017-01-01 Thread Gerhard Wiesinger


On 23.12.2016 03:55, Minchan Kim wrote:

On Fri, Dec 09, 2016 at 04:52:07PM +0100, Gerhard Wiesinger wrote:

On 09.12.2016 14:40, Michal Hocko wrote:

On Fri 09-12-16 08:06:25, Gerhard Wiesinger wrote:

Hello,

same with latest kernel rc, dnf still killed with OOM (but sometimes
better).

./update.sh: line 40:  1591 Killed  ${EXE} update ${PARAMS}
(does dnf clean all;dnf update)
Linux database.intern 4.9.0-0.rc8.git2.1.fc26.x86_64 #1 SMP Wed Dec 7
17:53:29 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Updated bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=1314697

Could you post your oom report please?

E.g. a new one with more than one included, first one after boot ...

Just setup a low mem VM under KVM and it is easily triggerable.

Still enough virtual memory available ...

4.9.0-0.rc8.git2.1.fc26.x86_64

[  624.862777] ksoftirqd/0: page allocation failure: order:0,
mode:0x2080020(GFP_ATOMIC)
[  624.863319] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted
4.9.0-0.rc8.git2.1.fc26.x86_64 #1
[  624.863410] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.9.3
[  624.863510]  aa62c007f958 904774e3 90c7dd98

[  624.863923]  aa62c007f9e0 9020e6ea 020800200246
90c7dd98
[  624.864019]  aa62c007f980 96b90010 aa62c007f9f0
aa62c007f9a0
[  624.864998] Call Trace:
[  624.865149]  [] dump_stack+0x86/0xc3
[  624.865347]  [] warn_alloc+0x13a/0x170
[  624.865432]  [] __alloc_pages_slowpath+0x252/0xbb0
[  624.865563]  [] __alloc_pages_nodemask+0x40d/0x4b0
[  624.865675]  [] __alloc_page_frag+0x193/0x200
[  624.866024]  [] __napi_alloc_skb+0x8e/0xf0
[  624.866113]  [] page_to_skb.isra.28+0x5d/0x310
[virtio_net]
[  624.866201]  [] virtnet_receive+0x2db/0x9a0
[virtio_net]
[  624.867378]  [] virtnet_poll+0x1d/0x80 [virtio_net]
[  624.867494]  [] net_rx_action+0x23e/0x470
[  624.867612]  [] __do_softirq+0xcd/0x4b9
[  624.867704]  [] ? smpboot_thread_fn+0x34/0x1f0
[  624.867833]  [] ? smpboot_thread_fn+0x12d/0x1f0
[  624.867924]  [] run_ksoftirqd+0x25/0x80
[  624.868109]  [] smpboot_thread_fn+0x128/0x1f0
[  624.868197]  [] ? sort_range+0x30/0x30
[  624.868596]  [] kthread+0x102/0x120
[  624.868679]  [] ? wait_for_completion+0x110/0x140
[  624.868768]  [] ? kthread_park+0x60/0x60
[  624.868850]  [] ret_from_fork+0x2a/0x40
[  843.528656] httpd (2490) used greatest stack depth: 10304 bytes left
[  878.077750] httpd (2976) used greatest stack depth: 10096 bytes left
[93918.861109] netstat (14579) used greatest stack depth: 9488 bytes left
[94050.874669] kworker/dying (6253) used greatest stack depth: 9008 bytes
left
[95895.765570] kworker/1:1H: page allocation failure: order:0,
mode:0x2280020(GFP_ATOMIC|__GFP_NOTRACK)
[95895.765819] CPU: 1 PID: 440 Comm: kworker/1:1H Not tainted
4.9.0-0.rc8.git2.1.fc26.x86_64 #1
[95895.765911] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.9.3
[95895.766060] Workqueue: kblockd blk_mq_run_work_fn
[95895.766143]  aa62c0257628 904774e3 90c7dd98

[95895.766235]  aa62c02576b0 9020e6ea 022800200046
90c7dd98
[95895.766325]  aa62c0257650 96b90010 aa62c02576c0
aa62c0257670
[95895.766417] Call Trace:
[95895.766502]  [] dump_stack+0x86/0xc3
[95895.766596]  [] warn_alloc+0x13a/0x170
[95895.766681]  [] __alloc_pages_slowpath+0x252/0xbb0
[95895.766767]  [] __alloc_pages_nodemask+0x40d/0x4b0
[95895.766866]  [] alloc_pages_current+0xa1/0x1f0
[95895.766971]  [] ? _raw_spin_unlock+0x27/0x40
[95895.767073]  [] new_slab+0x316/0x7c0
[95895.767160]  [] ___slab_alloc+0x3fb/0x5c0
[95895.772611]  [] ? cpuacct_charge+0xf2/0x1f0
[95895.773406]  [] ? alloc_indirect.isra.11+0x1d/0x50
[virtio_ring]
[95895.774327]  [] ? rcu_read_lock_sched_held+0x45/0x80
[95895.775212]  [] ? alloc_indirect.isra.11+0x1d/0x50
[virtio_ring]
[95895.776155]  [] __slab_alloc+0x51/0x90
[95895.777090]  [] __kmalloc+0x251/0x320
[95895.781502]  [] ? alloc_indirect.isra.11+0x1d/0x50
[virtio_ring]
[95895.782309]  [] alloc_indirect.isra.11+0x1d/0x50
[virtio_ring]
[95895.783334]  [] virtqueue_add_sgs+0x1c3/0x4a0
[virtio_ring]
[95895.784059]  [] ? kvm_sched_clock_read+0x25/0x40
[95895.784742]  [] __virtblk_add_req+0xbc/0x220
[virtio_blk]
[95895.785419]  [] ? debug_lockdep_rcu_enabled+0x1d/0x20
[95895.786086]  [] ? virtio_queue_rq+0x105/0x290
[virtio_blk]
[95895.786750]  [] virtio_queue_rq+0x12d/0x290
[virtio_blk]
[95895.787427]  [] __blk_mq_run_hw_queue+0x26d/0x3b0
[95895.788106]  [] blk_mq_run_work_fn+0x12/0x20
[95895.789065]  [] process_one_work+0x23e/0x6f0
[95895.789741]  [] ? process_one_work+0x1ba/0x6f0
[95895.790444]  [] worker_thread+0x4e/0x490
[95895.791178]  [] ? process_one_work+0x6f0/0x6f0
[95895.791911]  [] ? process_one_work+0x6f0/0x6f0
[95895.792653]  [] ? do_syscall_64+0x6c/0x1f0
[95895.793397]  [] kthread+0x102/0x120
[95895.794212]  [] ? trace_hardirqs_on_caller+0xf5/0x1b0
[95895.794942]  [] ? kthread_park+0x60/0x60
[95895.795689

Re: Still OOM problems with 4.9er kernels

2017-01-01 Thread Gerhard Wiesinger


On 23.12.2016 03:55, Minchan Kim wrote:

On Fri, Dec 09, 2016 at 04:52:07PM +0100, Gerhard Wiesinger wrote:

On 09.12.2016 14:40, Michal Hocko wrote:

On Fri 09-12-16 08:06:25, Gerhard Wiesinger wrote:

Hello,

same with latest kernel rc, dnf still killed with OOM (but sometimes
better).

./update.sh: line 40:  1591 Killed  ${EXE} update ${PARAMS}
(does dnf clean all;dnf update)
Linux database.intern 4.9.0-0.rc8.git2.1.fc26.x86_64 #1 SMP Wed Dec 7
17:53:29 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Updated bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=1314697

Could you post your oom report please?

E.g. a new one with more than one included, first one after boot ...

Just setup a low mem VM under KVM and it is easily triggerable.

Still enough virtual memory available ...

4.9.0-0.rc8.git2.1.fc26.x86_64

[  624.862777] ksoftirqd/0: page allocation failure: order:0,
mode:0x2080020(GFP_ATOMIC)
[  624.863319] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted
4.9.0-0.rc8.git2.1.fc26.x86_64 #1
[  624.863410] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.9.3
[  624.863510]  aa62c007f958 904774e3 90c7dd98

[  624.863923]  aa62c007f9e0 9020e6ea 020800200246
90c7dd98
[  624.864019]  aa62c007f980 96b90010 aa62c007f9f0
aa62c007f9a0
[  624.864998] Call Trace:
[  624.865149]  [] dump_stack+0x86/0xc3
[  624.865347]  [] warn_alloc+0x13a/0x170
[  624.865432]  [] __alloc_pages_slowpath+0x252/0xbb0
[  624.865563]  [] __alloc_pages_nodemask+0x40d/0x4b0
[  624.865675]  [] __alloc_page_frag+0x193/0x200
[  624.866024]  [] __napi_alloc_skb+0x8e/0xf0
[  624.866113]  [] page_to_skb.isra.28+0x5d/0x310
[virtio_net]
[  624.866201]  [] virtnet_receive+0x2db/0x9a0
[virtio_net]
[  624.867378]  [] virtnet_poll+0x1d/0x80 [virtio_net]
[  624.867494]  [] net_rx_action+0x23e/0x470
[  624.867612]  [] __do_softirq+0xcd/0x4b9
[  624.867704]  [] ? smpboot_thread_fn+0x34/0x1f0
[  624.867833]  [] ? smpboot_thread_fn+0x12d/0x1f0
[  624.867924]  [] run_ksoftirqd+0x25/0x80
[  624.868109]  [] smpboot_thread_fn+0x128/0x1f0
[  624.868197]  [] ? sort_range+0x30/0x30
[  624.868596]  [] kthread+0x102/0x120
[  624.868679]  [] ? wait_for_completion+0x110/0x140
[  624.868768]  [] ? kthread_park+0x60/0x60
[  624.868850]  [] ret_from_fork+0x2a/0x40
[  843.528656] httpd (2490) used greatest stack depth: 10304 bytes left
[  878.077750] httpd (2976) used greatest stack depth: 10096 bytes left
[93918.861109] netstat (14579) used greatest stack depth: 9488 bytes left
[94050.874669] kworker/dying (6253) used greatest stack depth: 9008 bytes
left
[95895.765570] kworker/1:1H: page allocation failure: order:0,
mode:0x2280020(GFP_ATOMIC|__GFP_NOTRACK)
[95895.765819] CPU: 1 PID: 440 Comm: kworker/1:1H Not tainted
4.9.0-0.rc8.git2.1.fc26.x86_64 #1
[95895.765911] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.9.3
[95895.766060] Workqueue: kblockd blk_mq_run_work_fn
[95895.766143]  aa62c0257628 904774e3 90c7dd98

[95895.766235]  aa62c02576b0 9020e6ea 022800200046
90c7dd98
[95895.766325]  aa62c0257650 96b90010 aa62c02576c0
aa62c0257670
[95895.766417] Call Trace:
[95895.766502]  [] dump_stack+0x86/0xc3
[95895.766596]  [] warn_alloc+0x13a/0x170
[95895.766681]  [] __alloc_pages_slowpath+0x252/0xbb0
[95895.766767]  [] __alloc_pages_nodemask+0x40d/0x4b0
[95895.766866]  [] alloc_pages_current+0xa1/0x1f0
[95895.766971]  [] ? _raw_spin_unlock+0x27/0x40
[95895.767073]  [] new_slab+0x316/0x7c0
[95895.767160]  [] ___slab_alloc+0x3fb/0x5c0
[95895.772611]  [] ? cpuacct_charge+0xf2/0x1f0
[95895.773406]  [] ? alloc_indirect.isra.11+0x1d/0x50
[virtio_ring]
[95895.774327]  [] ? rcu_read_lock_sched_held+0x45/0x80
[95895.775212]  [] ? alloc_indirect.isra.11+0x1d/0x50
[virtio_ring]
[95895.776155]  [] __slab_alloc+0x51/0x90
[95895.777090]  [] __kmalloc+0x251/0x320
[95895.781502]  [] ? alloc_indirect.isra.11+0x1d/0x50
[virtio_ring]
[95895.782309]  [] alloc_indirect.isra.11+0x1d/0x50
[virtio_ring]
[95895.783334]  [] virtqueue_add_sgs+0x1c3/0x4a0
[virtio_ring]
[95895.784059]  [] ? kvm_sched_clock_read+0x25/0x40
[95895.784742]  [] __virtblk_add_req+0xbc/0x220
[virtio_blk]
[95895.785419]  [] ? debug_lockdep_rcu_enabled+0x1d/0x20
[95895.786086]  [] ? virtio_queue_rq+0x105/0x290
[virtio_blk]
[95895.786750]  [] virtio_queue_rq+0x12d/0x290
[virtio_blk]
[95895.787427]  [] __blk_mq_run_hw_queue+0x26d/0x3b0
[95895.788106]  [] blk_mq_run_work_fn+0x12/0x20
[95895.789065]  [] process_one_work+0x23e/0x6f0
[95895.789741]  [] ? process_one_work+0x1ba/0x6f0
[95895.790444]  [] worker_thread+0x4e/0x490
[95895.791178]  [] ? process_one_work+0x6f0/0x6f0
[95895.791911]  [] ? process_one_work+0x6f0/0x6f0
[95895.792653]  [] ? do_syscall_64+0x6c/0x1f0
[95895.793397]  [] kthread+0x102/0x120
[95895.794212]  [] ? trace_hardirqs_on_caller+0xf5/0x1b0
[95895.794942]  [] ? kthread_park+0x60/0x60
[95895.795689

Re: Still OOM problems with 4.9er kernels

2016-12-10 Thread Gerhard Wiesinger

On 09.12.2016 22:42, Vlastimil Babka wrote:

On 12/09/2016 07:01 PM, Gerhard Wiesinger wrote:

On 09.12.2016 18:30, Michal Hocko wrote:

On Fri 09-12-16 17:58:14, Gerhard Wiesinger wrote:

On 09.12.2016 17:09, Michal Hocko wrote:

[...]

[97883.882611] Mem-Info:
[97883.883747] active_anon:2915 inactive_anon:3376 isolated_anon:0
active_file:3902 inactive_file:3639 isolated_file:0
unevictable:0 dirty:205 writeback:0 unstable:0
slab_reclaimable:9856 slab_unreclaimable:9682
mapped:3722 shmem:59 pagetables:2080 bounce:0
free:748 free_pcp:15 free_cma:0

there is still some page cache which doesn't seem to be neither dirty
nor under writeback. So it should be theoretically reclaimable but for
some reason we cannot seem to reclaim that memory.
There is still some anonymous memory and free swap so we could reclaim
it as well but it all seems pretty down and the memory pressure is
really large

Yes, it might be large on the update situation, but that should be handled
by a virtual memory system by the kernel, right?

Well this is what we try and call it memory reclaim. But if we are not
able to reclaim anything then we eventually have to give up and trigger
the OOM killer.

I'm not familiar with the Linux implementation of the VM system in
detail. But can't you reserve as much memory for the kernel (non
pageable) at least that you can swap everything out (even without
killing a process at least as long there is enough swap available, which
should be in all of my cases)?

We don't have such bulletproof reserves. In this case the amount of
anonymous memory that can be swapped out is relatively low, and either
something is pinning it in memory, or it's being swapped back in quickly.

Now the information that 4.4 made a difference is
interesting. I do not really see any major differences in the reclaim
between 4.3 and 4.4 kernels. The reason might be somewhere else as well.
E.g. some of the subsystem consumes much more memory than before.

Just curious, what kind of filesystem are you using?

I'm using ext4 only with virt-* drivers (storage, network). But it is
definitly a virtual memory allocation/swap usage issue.

Could you try some
additional debugging. Enabling reclaim related tracepoints might tell us
more. The following should tell us more
mount -t tracefs none /trace
echo 1 > /trace/events/vmscan/enable
echo 1 > /trace/events/writeback/writeback_congestion_wait/enable
cat /trace/trace_pipe > trace.log

Collecting /proc/vmstat over time might be helpful as well
mkdir logs
while true
do
cp /proc/vmstat vmstat.$(date +%s)
sleep 1s
done

Activated it. But I think it should be very easy to trigger also on your
side. A very small configured VM with a program running RAM
allocations/writes (I guess you have some testing programs already)
should be sufficient to trigger it. You can also use the attached
program which I used to trigger such situations some years ago. If it
doesn't help try to reduce the available CPU for the VM and also I/O
(e.g. use all CPU/IO on the host or other VMs).

Well it's not really a surprise that if the VM is small enough and
workload large enough, OOM killer will kick in. The exact threshold
might have changed between kernel versions for a number of possible reasons.

IMHO: The OOM killer should NOT kick in even on the highest workloads if
there is swap available.

https://www.spinics.net/lists/linux-mm/msg113665.html

Yeah, but I do think that "oom when you have 156MB free and 7GB
reclaimable, and haven't even tried swapping" counts as obviously
wrong.

So Linus also thinks that trying swapping is a must have. And there always was
enough swap available in my cases. Then it should swap out/swapin all the time
(which worked well in kernel 2.4/2.6 times).

Another topic: Why does the kernel prefer to swap in/swap out instead of
use cache pages/buffers (see vmstat 1 output below)?

BTW: Don't know if you have seen also my original message on the kernel
mailinglist only:

Linus had also OOM problems with 1kB RAM requests and a lot of free RAM
(use a translation service for the german page):
https://lkml.org/lkml/2016/11/30/64
https://marius.bloggt-in-braunschweig.de/2016/11/17/linuxkernel-4-74-8-und-der-oom-killer/
https://www.spinics.net/lists/linux-mm/msg113661.html

Yeah we were involved in the last one. The regressions were about
high-order allocations
though (the 1kB premise turned out to be misinterpretation) and there
were regressions
for those in 4.7/4.8. But yours are order-0.

With kernel 4.7./4.8 it was really reaproduceable at every dnf update.
With 4.9rc8 it has been much much better. So something must have
changed, too.

As far as I understood it the order is 2^order kB pagesize. I don't
think it makes a difference when swap is not used which order the memory
allocation request is.

BTW: What were the commit that introduced

Re: Still OOM problems with 4.9er kernels

2016-12-10 Thread Gerhard Wiesinger

On 09.12.2016 22:42, Vlastimil Babka wrote:

On 12/09/2016 07:01 PM, Gerhard Wiesinger wrote:

On 09.12.2016 18:30, Michal Hocko wrote:

On Fri 09-12-16 17:58:14, Gerhard Wiesinger wrote:

On 09.12.2016 17:09, Michal Hocko wrote:

[...]

Yes, it might be large on the update situation, but that should be handled
by a virtual memory system by the kernel, right?

Well this is what we try and call it memory reclaim. But if we are not
able to reclaim anything then we eventually have to give up and trigger
the OOM killer.

Just curious, what kind of filesystem are you using?

I'm using ext4 only with virt-* drivers (storage, network). But it is
definitly a virtual memory allocation/swap usage issue.

Collecting /proc/vmstat over time might be helpful as well
mkdir logs
while true
do
cp /proc/vmstat vmstat.$(date +%s)
sleep 1s
done

IMHO: The OOM killer should NOT kick in even on the highest workloads if
there is swap available.

https://www.spinics.net/lists/linux-mm/msg113665.html

Yeah, but I do think that "oom when you have 156MB free and 7GB
reclaimable, and haven't even tried swapping" counts as obviously
wrong.

Another topic: Why does the kernel prefer to swap in/swap out instead of
use cache pages/buffers (see vmstat 1 output below)?

BTW: Don't know if you have seen also my original message on the kernel
mailinglist only:

With kernel 4.7./4.8 it was really reaproduceable at every dnf update.
With 4.9rc8 it has been much much better. So something must have
changed, too.

As far as I understood it the order is 2^order kB pagesize. I don't
think it makes a difference when swap is not used which order the memory
allocation request is.

BTW: What were the commit that introduced

Re: Still OOM problems with 4.9er kernels


On 09.12.2016 18:30, Michal Hocko wrote:

On Fri 09-12-16 17:58:14, Gerhard Wiesinger wrote:

On 09.12.2016 17:09, Michal Hocko wrote:

[...]

[97883.882611] Mem-Info:
[97883.883747] active_anon:2915 inactive_anon:3376 isolated_anon:0
  active_file:3902 inactive_file:3639 isolated_file:0
  unevictable:0 dirty:205 writeback:0 unstable:0
  slab_reclaimable:9856 slab_unreclaimable:9682
  mapped:3722 shmem:59 pagetables:2080 bounce:0
  free:748 free_pcp:15 free_cma:0

there is still some page cache which doesn't seem to be neither dirty
nor under writeback. So it should be theoretically reclaimable but for
some reason we cannot seem to reclaim that memory.
There is still some anonymous memory and free swap so we could reclaim
it as well but it all seems pretty down and the memory pressure is
really large

Yes, it might be large on the update situation, but that should be handled
by a virtual memory system by the kernel, right?

Well this is what we try and call it memory reclaim. But if we are not
able to reclaim anything then we eventually have to give up and trigger
the OOM killer.


I'm not familiar with the Linux implementation of the VM system in 
detail. But can't you reserve as much memory for the kernel (non 
pageable) at least that you can swap everything out (even without 
killing a process at least as long there is enough swap available, which 
should be in all of my cases)?




  Now the information that 4.4 made a difference is
interesting. I do not really see any major differences in the reclaim
between 4.3 and 4.4 kernels. The reason might be somewhere else as well.
E.g. some of the subsystem consumes much more memory than before.

Just curious, what kind of filesystem are you using?


I'm using ext4 only with virt-* drivers (storage, network). But it is 
definitly a virtual memory allocation/swap usage issue.



  Could you try some
additional debugging. Enabling reclaim related tracepoints might tell us
more. The following should tell us more
mount -t tracefs none /trace
echo 1 > /trace/events/vmscan/enable
echo 1 > /trace/events/writeback/writeback_congestion_wait/enable
cat /trace/trace_pipe > trace.log

Collecting /proc/vmstat over time might be helpful as well
mkdir logs
while true
do
cp /proc/vmstat vmstat.$(date +%s)
sleep 1s
done


Activated it. But I think it should be very easy to trigger also on your 
side. A very small configured VM with a program running RAM 
allocations/writes (I guess you have some testing programs already) 
should be sufficient to trigger it. You can also use the attached 
program which I used to trigger such situations some years ago. If it 
doesn't help try to reduce the available CPU for the VM and also I/O 
(e.g. use all CPU/IO on the host or other VMs).


BTW: Don't know if you have seen also my original message on the kernel 
mailinglist only:


Linus had also OOM problems with 1kB RAM requests and a lot of free RAM 
(use a translation service for the german page):

https://lkml.org/lkml/2016/11/30/64
https://marius.bloggt-in-braunschweig.de/2016/11/17/linuxkernel-4-74-8-und-der-oom-killer/
https://www.spinics.net/lists/linux-mm/msg113661.html

Thnx.

Ciao,
Gerhard

// mallocsleep.c
#include 
#include 
#include 

typedef unsigned int BOOL;
typedef char* PCHAR;
typedef unsigned int DWORD;
typedef unsigned long DDWORD;

#define FALSE 0
#define TRUE 1

BOOL getlong(PCHAR s, DDWORD* retvalue)
{
  char *eptr;
  long value;

  value=strtoll(s,,0);
  if ((eptr == s)||(*eptr != '\0')) return FALSE;
  if (value < 0) return FALSE;
  *retvalue = value;
  return TRUE;
}

int main(int argc, char* argv[])
{
  unsigned long* p;
  unsigned long size = 16*1024*1024;
  unsigned long size_of = sizeof(*p);
  unsigned long i;
  unsigned long sleep_allocated = 3600;
  unsigned long sleep_freed = 3600;

  if (argc > 1)
  {
if (!getlong(argv[1], ))
{
  printf("Wrong memsize!\n");
  exit(1);
}
  }

  if (argc > 2)
  {
if (!getlong(argv[2], _allocated))
{
  printf("Wrong sleep_allocated time!\n");
  exit(1);
}
  }

  if (argc > 3)
  {
if (!getlong(argv[3], _freed))
{
  printf("Wrong sleep_freed time!\n");
  exit(1);
}
  }

  printf("size=%lu, size_of=%lu\n", size, size_of);
  fflush(stdout);

  p = malloc(size);
  if (!p)
  {
printf("Could not allocate memory!\n");
exit(2);
  }

  printf("malloc done, writing to memory, p=%p ...\n", (void*)p);
  fflush(stdout);

  for(i = 0;i < (size/size_of);i++) p[i]=i;

  printf("writing to memory done, sleeping for %lu seconds ...\n", 
sleep_allocated);

  fflush(stdout);

  sleep(sleep_allocated);

  printf("sleeping done, freeing ...\n");
  fflush(stdout);

  free(p);

  printf("freeing done, sleeping for %lu seconds ...\n", sleep_freed);
  fflush(stdout);

  sleep(sleep_freed);

  printf("sleeping done, exitiing ...\n");
  fflush(stdout);

  exit(0);
  return 0;
}

Re: Still OOM problems with 4.9er kernels


On 09.12.2016 18:30, Michal Hocko wrote:

On Fri 09-12-16 17:58:14, Gerhard Wiesinger wrote:

On 09.12.2016 17:09, Michal Hocko wrote:

[...]

[97883.882611] Mem-Info:
[97883.883747] active_anon:2915 inactive_anon:3376 isolated_anon:0
  active_file:3902 inactive_file:3639 isolated_file:0
  unevictable:0 dirty:205 writeback:0 unstable:0
  slab_reclaimable:9856 slab_unreclaimable:9682
  mapped:3722 shmem:59 pagetables:2080 bounce:0
  free:748 free_pcp:15 free_cma:0

there is still some page cache which doesn't seem to be neither dirty
nor under writeback. So it should be theoretically reclaimable but for
some reason we cannot seem to reclaim that memory.
There is still some anonymous memory and free swap so we could reclaim
it as well but it all seems pretty down and the memory pressure is
really large

Yes, it might be large on the update situation, but that should be handled
by a virtual memory system by the kernel, right?

Well this is what we try and call it memory reclaim. But if we are not
able to reclaim anything then we eventually have to give up and trigger
the OOM killer.


I'm not familiar with the Linux implementation of the VM system in 
detail. But can't you reserve as much memory for the kernel (non 
pageable) at least that you can swap everything out (even without 
killing a process at least as long there is enough swap available, which 
should be in all of my cases)?




  Now the information that 4.4 made a difference is
interesting. I do not really see any major differences in the reclaim
between 4.3 and 4.4 kernels. The reason might be somewhere else as well.
E.g. some of the subsystem consumes much more memory than before.

Just curious, what kind of filesystem are you using?


I'm using ext4 only with virt-* drivers (storage, network). But it is 
definitly a virtual memory allocation/swap usage issue.



  Could you try some
additional debugging. Enabling reclaim related tracepoints might tell us
more. The following should tell us more
mount -t tracefs none /trace
echo 1 > /trace/events/vmscan/enable
echo 1 > /trace/events/writeback/writeback_congestion_wait/enable
cat /trace/trace_pipe > trace.log

Collecting /proc/vmstat over time might be helpful as well
mkdir logs
while true
do
cp /proc/vmstat vmstat.$(date +%s)
sleep 1s
done


Activated it. But I think it should be very easy to trigger also on your 
side. A very small configured VM with a program running RAM 
allocations/writes (I guess you have some testing programs already) 
should be sufficient to trigger it. You can also use the attached 
program which I used to trigger such situations some years ago. If it 
doesn't help try to reduce the available CPU for the VM and also I/O 
(e.g. use all CPU/IO on the host or other VMs).


BTW: Don't know if you have seen also my original message on the kernel 
mailinglist only:


Linus had also OOM problems with 1kB RAM requests and a lot of free RAM 
(use a translation service for the german page):

https://lkml.org/lkml/2016/11/30/64
https://marius.bloggt-in-braunschweig.de/2016/11/17/linuxkernel-4-74-8-und-der-oom-killer/
https://www.spinics.net/lists/linux-mm/msg113661.html

Thnx.

Ciao,
Gerhard

// mallocsleep.c
#include 
#include 
#include 

typedef unsigned int BOOL;
typedef char* PCHAR;
typedef unsigned int DWORD;
typedef unsigned long DDWORD;

#define FALSE 0
#define TRUE 1

BOOL getlong(PCHAR s, DDWORD* retvalue)
{
  char *eptr;
  long value;

  value=strtoll(s,,0);
  if ((eptr == s)||(*eptr != '\0')) return FALSE;
  if (value < 0) return FALSE;
  *retvalue = value;
  return TRUE;
}

int main(int argc, char* argv[])
{
  unsigned long* p;
  unsigned long size = 16*1024*1024;
  unsigned long size_of = sizeof(*p);
  unsigned long i;
  unsigned long sleep_allocated = 3600;
  unsigned long sleep_freed = 3600;

  if (argc > 1)
  {
if (!getlong(argv[1], ))
{
  printf("Wrong memsize!\n");
  exit(1);
}
  }

  if (argc > 2)
  {
if (!getlong(argv[2], _allocated))
{
  printf("Wrong sleep_allocated time!\n");
  exit(1);
}
  }

  if (argc > 3)
  {
if (!getlong(argv[3], _freed))
{
  printf("Wrong sleep_freed time!\n");
  exit(1);
}
  }

  printf("size=%lu, size_of=%lu\n", size, size_of);
  fflush(stdout);

  p = malloc(size);
  if (!p)
  {
printf("Could not allocate memory!\n");
exit(2);
  }

  printf("malloc done, writing to memory, p=%p ...\n", (void*)p);
  fflush(stdout);

  for(i = 0;i < (size/size_of);i++) p[i]=i;

  printf("writing to memory done, sleeping for %lu seconds ...\n", 
sleep_allocated);

  fflush(stdout);

  sleep(sleep_allocated);

  printf("sleeping done, freeing ...\n");
  fflush(stdout);

  free(p);

  printf("freeing done, sleeping for %lu seconds ...\n", sleep_freed);
  fflush(stdout);

  sleep(sleep_freed);

  printf("sleeping done, exitiing ...\n");
  fflush(stdout);

  exit(0);
  return 0;
}

Re: Still OOM problems with 4.9er kernels


On 09.12.2016 17:09, Michal Hocko wrote:

On Fri 09-12-16 16:52:07, Gerhard Wiesinger wrote:

On 09.12.2016 14:40, Michal Hocko wrote:

On Fri 09-12-16 08:06:25, Gerhard Wiesinger wrote:

Hello,

same with latest kernel rc, dnf still killed with OOM (but sometimes
better).

./update.sh: line 40:  1591 Killed  ${EXE} update ${PARAMS}
(does dnf clean all;dnf update)
Linux database.intern 4.9.0-0.rc8.git2.1.fc26.x86_64 #1 SMP Wed Dec 7
17:53:29 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Updated bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=1314697

Could you post your oom report please?

E.g. a new one with more than one included, first one after boot ...

Just setup a low mem VM under KVM and it is easily triggerable.

What is the workload?


just run dnf clean all;dnf update
(and the other tasks running on those machine. The normal load on most 
of these machines is pretty VERY LOW, e.g. running just an apache httpd 
doing nothing or e.g. running samba domain controller doing nothing)


So my setups are low mem VMs so that KVM host has most of the caching 
effects shared.


I'm running this setup since Fedora 17 under kernel-3.3.4-5.fc17.x86_64 
and had NO problems.


Problems started with 4.4.3-300.fc23.x86_64 and got worser in each major 
kernel versions (for upgrades I had even give the VMs temporarilly more 
memory for the upgrade situation).

(from my bug report at
https://bugzilla.redhat.com/show_bug.cgi?id=1314697
Previous kernel version on guest/host was rocket stable. Revert to 
kernel-4.3.5-300.fc23.x86_64 also solved it.)


For completeness the actual kernel parameters on all hosts and VMs.
vm.dirty_background_ratio=3
vm.dirty_ratio=15
vm.overcommit_memory=2
vm.overcommit_ratio=80
vm.swappiness=10

With kernel 4.9.0rc7 or rc8 it was getting better. But still not there 
where it should be (and was already).





Still enough virtual memory available ...

Well, you will always have a lot of virtual memory...


And why is it not used, e.g. swapped and gets into an OOM situation?




4.9.0-0.rc8.git2.1.fc26.x86_64

[  624.862777] ksoftirqd/0: page allocation failure: order:0, 
mode:0x2080020(GFP_ATOMIC)

[...]

[95895.765570] kworker/1:1H: page allocation failure: order:0, 
mode:0x2280020(GFP_ATOMIC|__GFP_NOTRACK)

These are atomic allocation failures and should be recoverable.
[...]


[97883.838418] httpd invoked oom-killer:  
gfp_mask=0x24201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=0, order=0,  
oom_score_adj=0

But this is a real OOM killer invocation because a single page allocation
cannot proceed.

[...]

[97883.882611] Mem-Info:
[97883.883747] active_anon:2915 inactive_anon:3376 isolated_anon:0
 active_file:3902 inactive_file:3639 isolated_file:0
 unevictable:0 dirty:205 writeback:0 unstable:0
 slab_reclaimable:9856 slab_unreclaimable:9682
 mapped:3722 shmem:59 pagetables:2080 bounce:0
 free:748 free_pcp:15 free_cma:0

there is still some page cache which doesn't seem to be neither dirty
nor under writeback. So it should be theoretically reclaimable but for
some reason we cannot seem to reclaim that memory.
There is still some anonymous memory and free swap so we could reclaim
it as well but it all seems pretty down and the memory pressure is
really large


Yes, it might be large on the update situation, but that should be 
handled by a virtual memory system by the kernel, right?





[97883.890766] Node 0 active_anon:11660kB inactive_anon:13504kB
active_file:15608kB inactive_file:14556kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB mapped:14888kB dirty:820kB writeback:0kB shmem:0kB
shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 236kB writeback_tmp:0kB
unstable:0kB pages_scanned:168352 all_unreclaimable? yes

all_unreclaimable also agrees that basically nothing is reclaimable.
That was one of the criterion to hit the OOM killer prior to the rewrite
in 4.6 kernel. So I suspect that older kernels would OOM under your
memory pressure as well.



See comments above.

Thnx.


Ciao,

Gerhard

Re: Still OOM problems with 4.9er kernels


On 09.12.2016 17:09, Michal Hocko wrote:

On Fri 09-12-16 16:52:07, Gerhard Wiesinger wrote:

On 09.12.2016 14:40, Michal Hocko wrote:

On Fri 09-12-16 08:06:25, Gerhard Wiesinger wrote:

Hello,

same with latest kernel rc, dnf still killed with OOM (but sometimes
better).

./update.sh: line 40:  1591 Killed  ${EXE} update ${PARAMS}
(does dnf clean all;dnf update)
Linux database.intern 4.9.0-0.rc8.git2.1.fc26.x86_64 #1 SMP Wed Dec 7
17:53:29 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Updated bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=1314697

Could you post your oom report please?

E.g. a new one with more than one included, first one after boot ...

Just setup a low mem VM under KVM and it is easily triggerable.

What is the workload?


just run dnf clean all;dnf update
(and the other tasks running on those machine. The normal load on most 
of these machines is pretty VERY LOW, e.g. running just an apache httpd 
doing nothing or e.g. running samba domain controller doing nothing)


So my setups are low mem VMs so that KVM host has most of the caching 
effects shared.


I'm running this setup since Fedora 17 under kernel-3.3.4-5.fc17.x86_64 
and had NO problems.


Problems started with 4.4.3-300.fc23.x86_64 and got worser in each major 
kernel versions (for upgrades I had even give the VMs temporarilly more 
memory for the upgrade situation).

(from my bug report at
https://bugzilla.redhat.com/show_bug.cgi?id=1314697
Previous kernel version on guest/host was rocket stable. Revert to 
kernel-4.3.5-300.fc23.x86_64 also solved it.)


For completeness the actual kernel parameters on all hosts and VMs.
vm.dirty_background_ratio=3
vm.dirty_ratio=15
vm.overcommit_memory=2
vm.overcommit_ratio=80
vm.swappiness=10

With kernel 4.9.0rc7 or rc8 it was getting better. But still not there 
where it should be (and was already).





Still enough virtual memory available ...

Well, you will always have a lot of virtual memory...


And why is it not used, e.g. swapped and gets into an OOM situation?




4.9.0-0.rc8.git2.1.fc26.x86_64

[  624.862777] ksoftirqd/0: page allocation failure: order:0, 
mode:0x2080020(GFP_ATOMIC)

[...]

[95895.765570] kworker/1:1H: page allocation failure: order:0, 
mode:0x2280020(GFP_ATOMIC|__GFP_NOTRACK)

These are atomic allocation failures and should be recoverable.
[...]


[97883.838418] httpd invoked oom-killer:  
gfp_mask=0x24201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=0, order=0,  
oom_score_adj=0

But this is a real OOM killer invocation because a single page allocation
cannot proceed.

[...]

[97883.882611] Mem-Info:
[97883.883747] active_anon:2915 inactive_anon:3376 isolated_anon:0
 active_file:3902 inactive_file:3639 isolated_file:0
 unevictable:0 dirty:205 writeback:0 unstable:0
 slab_reclaimable:9856 slab_unreclaimable:9682
 mapped:3722 shmem:59 pagetables:2080 bounce:0
 free:748 free_pcp:15 free_cma:0

there is still some page cache which doesn't seem to be neither dirty
nor under writeback. So it should be theoretically reclaimable but for
some reason we cannot seem to reclaim that memory.
There is still some anonymous memory and free swap so we could reclaim
it as well but it all seems pretty down and the memory pressure is
really large


Yes, it might be large on the update situation, but that should be 
handled by a virtual memory system by the kernel, right?





[97883.890766] Node 0 active_anon:11660kB inactive_anon:13504kB
active_file:15608kB inactive_file:14556kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB mapped:14888kB dirty:820kB writeback:0kB shmem:0kB
shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 236kB writeback_tmp:0kB
unstable:0kB pages_scanned:168352 all_unreclaimable? yes

all_unreclaimable also agrees that basically nothing is reclaimable.
That was one of the criterion to hit the OOM killer prior to the rewrite
in 4.6 kernel. So I suspect that older kernels would OOM under your
memory pressure as well.



See comments above.

Thnx.


Ciao,

Gerhard

Re: Still OOM problems with 4.9er kernels


On 09.12.2016 14:40, Michal Hocko wrote:

On Fri 09-12-16 08:06:25, Gerhard Wiesinger wrote:

Hello,

same with latest kernel rc, dnf still killed with OOM (but sometimes
better).

./update.sh: line 40:  1591 Killed  ${EXE} update ${PARAMS}
(does dnf clean all;dnf update)
Linux database.intern 4.9.0-0.rc8.git2.1.fc26.x86_64 #1 SMP Wed Dec 7
17:53:29 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Updated bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=1314697

Could you post your oom report please?


And another one which ended in a native_safe_halt 

[73366.837826] nmbd: page allocation failure: order:0, 
mode:0x2280030(GFP_ATOMIC|__GFP_RECLAIMABLE|__GFP_NOTRACK)
[73366.837985] CPU: 1 PID: 2005 Comm: nmbd Not tainted 
4.9.0-0.rc8.git2.1.fc26.x86_64 #1
[73366.838075] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS 1.9.3
[73366.838175]  aa4ac059f548 8d4774e3 8dc7dd98 

[73366.838272]  aa4ac059f5d0 8d20e6ea 022800300046 
8dc7dd98
[73366.838364]  aa4ac059f570 9c370010 aa4ac059f5e0 
aa4ac059f590

[73366.838458] Call Trace:
[73366.838590]  [] dump_stack+0x86/0xc3
[73366.838680]  [] warn_alloc+0x13a/0x170
[73366.838762]  [] __alloc_pages_slowpath+0x252/0xbb0
[73366.838846]  [] ? finish_task_switch+0xb0/0x260
[73366.838926]  [] __alloc_pages_nodemask+0x40d/0x4b0
[73366.839007]  [] alloc_pages_current+0xa1/0x1f0
[73366.839088]  [] ? kvm_sched_clock_read+0x25/0x40
[73366.839170]  [] new_slab+0x316/0x7c0
[73366.839245]  [] ___slab_alloc+0x3fb/0x5c0
[73366.839325]  [] ? kvm_sched_clock_read+0x25/0x40
[73366.839409]  [] ? __es_insert_extent+0xb3/0x330
[73366.839501]  [] ? __es_insert_extent+0xb3/0x330
[73366.839583]  [] __slab_alloc+0x51/0x90
[73366.839662]  [] ? __es_insert_extent+0xb3/0x330
[73366.839743]  [] kmem_cache_alloc+0x246/0x2d0
[73366.839822]  [] ? __es_remove_extent+0x56/0x2d0
[73366.839906]  [] __es_insert_extent+0xb3/0x330
[73366.839985]  [] ext4_es_insert_extent+0xee/0x280
[73366.840067]  [] ? ext4_map_blocks+0x2b4/0x5f0
[73366.840147]  [] ext4_map_blocks+0x323/0x5f0
[73366.840225]  [] ? workingset_refault+0x10a/0x220
[73366.840314]  [] ext4_mpage_readpages+0x413/0xa60
[73366.840397]  [] ? __page_cache_alloc+0x146/0x190
[73366.840487]  [] ext4_readpages+0x35/0x40
[73366.840569]  [] __do_page_cache_readahead+0x2bf/0x390
[73366.840651]  [] ? __do_page_cache_readahead+0x16a/0x390
[73366.840735]  [] filemap_fault+0x51b/0x790
[73366.840814]  [] ? ext4_filemap_fault+0x2e/0x50
[73366.840896]  [] ext4_filemap_fault+0x39/0x50
[73366.840976]  [] __do_fault+0x83/0x1d0
[73366.841056]  [] handle_mm_fault+0x11e2/0x17a0
[73366.841138]  [] ? handle_mm_fault+0x5a/0x17a0
[73366.841220]  [] __do_page_fault+0x266/0x520
[73366.841300]  [] trace_do_page_fault+0x58/0x2a0
[73366.841382]  [] do_async_page_fault+0x1a/0xa0
[73366.841464]  [] async_page_fault+0x28/0x30
[73366.842500] Mem-Info:
[73366.843149] active_anon:8677 inactive_anon:8798 isolated_anon:0
active_file:328 inactive_file:317 isolated_file:32
unevictable:0 dirty:0 writeback:2 unstable:0
slab_reclaimable:4968 slab_unreclaimable:9242
mapped:365 shmem:1 pagetables:2690 bounce:0
free:764 free_pcp:41 free_cma:0
[73366.846832] Node 0 active_anon:34708kB inactive_anon:35192kB 
active_file:1312kB inactive_file:1268kB unevictable:0kB 
isolated(anon):0kB isolated(file):128kB mapped:1460kB dirty:0kB 
writeback:8kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 
4kB writeback_tmp:0kB unstable:0kB pages_scanned:32 all_unreclaimable? no
[73366.848711] Node 0 DMA free:1468kB min:172kB low:212kB high:252kB 
active_anon:3216kB inactive_anon:3448kB active_file:40kB 
inactive_file:228kB unevictable:0kB writepending:0kB present:15992kB 
managed:15908kB mlocked:0kB slab_reclaimable:2064kB 
slab_unreclaimable:2960kB kernel_stack:100kB pagetables:1536kB 
bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB

[73366.850769] lowmem_reserve[]: 0 116 116 116 116
[73366.851479] Node 0 DMA32 free:1588kB min:1296kB low:1620kB 
high:1944kB active_anon:31464kB inactive_anon:31740kB active_file:1236kB 
inactive_file:1056kB unevictable:0kB writepending:0kB present:180080kB 
managed:139012kB mlocked:0kB slab_reclaimable:17808kB 
slab_unreclaimable:34008kB kernel_stack:1676kB pagetables:9224kB 
bounce:0kB free_pcp:164kB local_pcp:12kB free_cma:0kB

[73366.853757] lowmem_reserve[]: 0 0 0 0 0
[73366.854544] Node 0 DMA: 13*4kB (H) 13*8kB (H) 17*16kB (H) 12*32kB (H) 
8*64kB (H) 1*128kB (H) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1452kB
[73366.856200] Node 0 DMA32: 70*4kB (UMH) 12*8kB (MH) 12*16kB (H) 2*32kB 
(H) 5*64kB (H) 5*128kB (H) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 
1592kB
[73366.857955] Node 0 hugepages_total=0 hugepages_free=0 
hugepages_surp=0 hugepages_size=2048kB

[73366.857956] 2401 total pagecache pages
[73366.858829] 1741 pages in swap cache
[73366.859721] Swap cache stats: add

Re: Still OOM problems with 4.9er kernels


On 09.12.2016 14:40, Michal Hocko wrote:

On Fri 09-12-16 08:06:25, Gerhard Wiesinger wrote:

Hello,

same with latest kernel rc, dnf still killed with OOM (but sometimes
better).

./update.sh: line 40:  1591 Killed  ${EXE} update ${PARAMS}
(does dnf clean all;dnf update)
Linux database.intern 4.9.0-0.rc8.git2.1.fc26.x86_64 #1 SMP Wed Dec 7
17:53:29 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Updated bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=1314697

Could you post your oom report please?


And another one which ended in a native_safe_halt 

[73366.837826] nmbd: page allocation failure: order:0, 
mode:0x2280030(GFP_ATOMIC|__GFP_RECLAIMABLE|__GFP_NOTRACK)
[73366.837985] CPU: 1 PID: 2005 Comm: nmbd Not tainted 
4.9.0-0.rc8.git2.1.fc26.x86_64 #1
[73366.838075] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS 1.9.3
[73366.838175]  aa4ac059f548 8d4774e3 8dc7dd98 

[73366.838272]  aa4ac059f5d0 8d20e6ea 022800300046 
8dc7dd98
[73366.838364]  aa4ac059f570 9c370010 aa4ac059f5e0 
aa4ac059f590

[73366.838458] Call Trace:
[73366.838590]  [] dump_stack+0x86/0xc3
[73366.838680]  [] warn_alloc+0x13a/0x170
[73366.838762]  [] __alloc_pages_slowpath+0x252/0xbb0
[73366.838846]  [] ? finish_task_switch+0xb0/0x260
[73366.838926]  [] __alloc_pages_nodemask+0x40d/0x4b0
[73366.839007]  [] alloc_pages_current+0xa1/0x1f0
[73366.839088]  [] ? kvm_sched_clock_read+0x25/0x40
[73366.839170]  [] new_slab+0x316/0x7c0
[73366.839245]  [] ___slab_alloc+0x3fb/0x5c0
[73366.839325]  [] ? kvm_sched_clock_read+0x25/0x40
[73366.839409]  [] ? __es_insert_extent+0xb3/0x330
[73366.839501]  [] ? __es_insert_extent+0xb3/0x330
[73366.839583]  [] __slab_alloc+0x51/0x90
[73366.839662]  [] ? __es_insert_extent+0xb3/0x330
[73366.839743]  [] kmem_cache_alloc+0x246/0x2d0
[73366.839822]  [] ? __es_remove_extent+0x56/0x2d0
[73366.839906]  [] __es_insert_extent+0xb3/0x330
[73366.839985]  [] ext4_es_insert_extent+0xee/0x280
[73366.840067]  [] ? ext4_map_blocks+0x2b4/0x5f0
[73366.840147]  [] ext4_map_blocks+0x323/0x5f0
[73366.840225]  [] ? workingset_refault+0x10a/0x220
[73366.840314]  [] ext4_mpage_readpages+0x413/0xa60
[73366.840397]  [] ? __page_cache_alloc+0x146/0x190
[73366.840487]  [] ext4_readpages+0x35/0x40
[73366.840569]  [] __do_page_cache_readahead+0x2bf/0x390
[73366.840651]  [] ? __do_page_cache_readahead+0x16a/0x390
[73366.840735]  [] filemap_fault+0x51b/0x790
[73366.840814]  [] ? ext4_filemap_fault+0x2e/0x50
[73366.840896]  [] ext4_filemap_fault+0x39/0x50
[73366.840976]  [] __do_fault+0x83/0x1d0
[73366.841056]  [] handle_mm_fault+0x11e2/0x17a0
[73366.841138]  [] ? handle_mm_fault+0x5a/0x17a0
[73366.841220]  [] __do_page_fault+0x266/0x520
[73366.841300]  [] trace_do_page_fault+0x58/0x2a0
[73366.841382]  [] do_async_page_fault+0x1a/0xa0
[73366.841464]  [] async_page_fault+0x28/0x30
[73366.842500] Mem-Info:
[73366.843149] active_anon:8677 inactive_anon:8798 isolated_anon:0
active_file:328 inactive_file:317 isolated_file:32
unevictable:0 dirty:0 writeback:2 unstable:0
slab_reclaimable:4968 slab_unreclaimable:9242
mapped:365 shmem:1 pagetables:2690 bounce:0
free:764 free_pcp:41 free_cma:0
[73366.846832] Node 0 active_anon:34708kB inactive_anon:35192kB 
active_file:1312kB inactive_file:1268kB unevictable:0kB 
isolated(anon):0kB isolated(file):128kB mapped:1460kB dirty:0kB 
writeback:8kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 
4kB writeback_tmp:0kB unstable:0kB pages_scanned:32 all_unreclaimable? no
[73366.848711] Node 0 DMA free:1468kB min:172kB low:212kB high:252kB 
active_anon:3216kB inactive_anon:3448kB active_file:40kB 
inactive_file:228kB unevictable:0kB writepending:0kB present:15992kB 
managed:15908kB mlocked:0kB slab_reclaimable:2064kB 
slab_unreclaimable:2960kB kernel_stack:100kB pagetables:1536kB 
bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB

[73366.850769] lowmem_reserve[]: 0 116 116 116 116
[73366.851479] Node 0 DMA32 free:1588kB min:1296kB low:1620kB 
high:1944kB active_anon:31464kB inactive_anon:31740kB active_file:1236kB 
inactive_file:1056kB unevictable:0kB writepending:0kB present:180080kB 
managed:139012kB mlocked:0kB slab_reclaimable:17808kB 
slab_unreclaimable:34008kB kernel_stack:1676kB pagetables:9224kB 
bounce:0kB free_pcp:164kB local_pcp:12kB free_cma:0kB

[73366.853757] lowmem_reserve[]: 0 0 0 0 0
[73366.854544] Node 0 DMA: 13*4kB (H) 13*8kB (H) 17*16kB (H) 12*32kB (H) 
8*64kB (H) 1*128kB (H) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1452kB
[73366.856200] Node 0 DMA32: 70*4kB (UMH) 12*8kB (MH) 12*16kB (H) 2*32kB 
(H) 5*64kB (H) 5*128kB (H) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 
1592kB
[73366.857955] Node 0 hugepages_total=0 hugepages_free=0 
hugepages_surp=0 hugepages_size=2048kB

[73366.857956] 2401 total pagecache pages
[73366.858829] 1741 pages in swap cache
[73366.859721] Swap cache stats: add

Re: Still OOM problems with 4.9er kernels


On 09.12.2016 16:52, Gerhard Wiesinger wrote:

On 09.12.2016 14:40, Michal Hocko wrote:

On Fri 09-12-16 08:06:25, Gerhard Wiesinger wrote:

Hello,

same with latest kernel rc, dnf still killed with OOM (but sometimes
better).

./update.sh: line 40:  1591 Killed  ${EXE} update 
${PARAMS}

(does dnf clean all;dnf update)
Linux database.intern 4.9.0-0.rc8.git2.1.fc26.x86_64 #1 SMP Wed Dec 7
17:53:29 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Updated bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=1314697

Could you post your oom report please?


E.g. a new one with more than one included, first one after boot ...

Just setup a low mem VM under KVM and it is easily triggerable.

Still enough virtual memory available ...

4.9.0-0.rc8.git2.1.fc26.x86_64

[  624.862777] ksoftirqd/0: page allocation failure: order:0, 
mode:0x2080020(GFP_ATOMIC)
[  624.863319] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 
4.9.0-0.rc8.git2.1.fc26.x86_64 #1
[  624.863410] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS 1.9.3
[  624.863510]  aa62c007f958 904774e3 90c7dd98 

[  624.863923]  aa62c007f9e0 9020e6ea 020800200246 
90c7dd98
[  624.864019]  aa62c007f980 96b90010 aa62c007f9f0 
aa62c007f9a0

[  624.864998] Call Trace:
[  624.865149]  [] dump_stack+0x86/0xc3
[  624.865347]  [] warn_alloc+0x13a/0x170
[  624.865432]  [] __alloc_pages_slowpath+0x252/0xbb0
[  624.865563]  [] __alloc_pages_nodemask+0x40d/0x4b0
[  624.865675]  [] __alloc_page_frag+0x193/0x200
[  624.866024]  [] __napi_alloc_skb+0x8e/0xf0
[  624.866113]  [] page_to_skb.isra.28+0x5d/0x310 
[virtio_net]
[  624.866201]  [] virtnet_receive+0x2db/0x9a0 
[virtio_net]

[  624.867378]  [] virtnet_poll+0x1d/0x80 [virtio_net]
[  624.867494]  [] net_rx_action+0x23e/0x470
[  624.867612]  [] __do_softirq+0xcd/0x4b9
[  624.867704]  [] ? smpboot_thread_fn+0x34/0x1f0
[  624.867833]  [] ? smpboot_thread_fn+0x12d/0x1f0
[  624.867924]  [] run_ksoftirqd+0x25/0x80
[  624.868109]  [] smpboot_thread_fn+0x128/0x1f0
[  624.868197]  [] ? sort_range+0x30/0x30
[  624.868596]  [] kthread+0x102/0x120
[  624.868679]  [] ? wait_for_completion+0x110/0x140
[  624.868768]  [] ? kthread_park+0x60/0x60
[  624.868850]  [] ret_from_fork+0x2a/0x40
[  843.528656] httpd (2490) used greatest stack depth: 10304 bytes left
[  878.077750] httpd (2976) used greatest stack depth: 10096 bytes left
[93918.861109] netstat (14579) used greatest stack depth: 9488 bytes left
[94050.874669] kworker/dying (6253) used greatest stack depth: 9008 
bytes left
[95895.765570] kworker/1:1H: page allocation failure: order:0, 
mode:0x2280020(GFP_ATOMIC|__GFP_NOTRACK)
[95895.765819] CPU: 1 PID: 440 Comm: kworker/1:1H Not tainted 
4.9.0-0.rc8.git2.1.fc26.x86_64 #1
[95895.765911] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS 1.9.3

[95895.766060] Workqueue: kblockd blk_mq_run_work_fn
[95895.766143]  aa62c0257628 904774e3 90c7dd98 

[95895.766235]  aa62c02576b0 9020e6ea 022800200046 
90c7dd98
[95895.766325]  aa62c0257650 96b90010 aa62c02576c0 
aa62c0257670

[95895.766417] Call Trace:
[95895.766502]  [] dump_stack+0x86/0xc3
[95895.766596]  [] warn_alloc+0x13a/0x170
[95895.766681]  [] __alloc_pages_slowpath+0x252/0xbb0
[95895.766767]  [] __alloc_pages_nodemask+0x40d/0x4b0
[95895.766866]  [] alloc_pages_current+0xa1/0x1f0
[95895.766971]  [] ? _raw_spin_unlock+0x27/0x40
[95895.767073]  [] new_slab+0x316/0x7c0
[95895.767160]  [] ___slab_alloc+0x3fb/0x5c0
[95895.772611]  [] ? cpuacct_charge+0xf2/0x1f0
[95895.773406]  [] ? 
alloc_indirect.isra.11+0x1d/0x50 [virtio_ring]

[95895.774327]  [] ? rcu_read_lock_sched_held+0x45/0x80
[95895.775212]  [] ? 
alloc_indirect.isra.11+0x1d/0x50 [virtio_ring]

[95895.776155]  [] __slab_alloc+0x51/0x90
[95895.777090]  [] __kmalloc+0x251/0x320
[95895.781502]  [] ? 
alloc_indirect.isra.11+0x1d/0x50 [virtio_ring]
[95895.782309]  [] alloc_indirect.isra.11+0x1d/0x50 
[virtio_ring]
[95895.783334]  [] virtqueue_add_sgs+0x1c3/0x4a0 
[virtio_ring]

[95895.784059]  [] ? kvm_sched_clock_read+0x25/0x40
[95895.784742]  [] __virtblk_add_req+0xbc/0x220 
[virtio_blk]
[95895.785419]  [] ? 
debug_lockdep_rcu_enabled+0x1d/0x20
[95895.786086]  [] ? virtio_queue_rq+0x105/0x290 
[virtio_blk]
[95895.786750]  [] virtio_queue_rq+0x12d/0x290 
[virtio_blk]

[95895.787427]  [] __blk_mq_run_hw_queue+0x26d/0x3b0
[95895.788106]  [] blk_mq_run_work_fn+0x12/0x20
[95895.789065]  [] process_one_work+0x23e/0x6f0
[95895.789741]  [] ? process_one_work+0x1ba/0x6f0
[95895.790444]  [] worker_thread+0x4e/0x490
[95895.791178]  [] ? process_one_work+0x6f0/0x6f0
[95895.791911]  [] ? process_one_work+0x6f0/0x6f0
[95895.792653]  [] ? do_syscall_64+0x6c/0x1f0
[95895.793397]  [] kthread+0x102/0x120
[95895.794212]  [] ? 
trace_hardirqs_on_caller+0xf5/0x1b0

[95895.794942]  [] ? kthread_park+0x60/0x60
[95895.795689]  [] ret_from_fork+0x2a/0x40

Re: Still OOM problems with 4.9er kernels


On 09.12.2016 16:52, Gerhard Wiesinger wrote:

On 09.12.2016 14:40, Michal Hocko wrote:

On Fri 09-12-16 08:06:25, Gerhard Wiesinger wrote:

Hello,

same with latest kernel rc, dnf still killed with OOM (but sometimes
better).

./update.sh: line 40:  1591 Killed  ${EXE} update 
${PARAMS}

(does dnf clean all;dnf update)
Linux database.intern 4.9.0-0.rc8.git2.1.fc26.x86_64 #1 SMP Wed Dec 7
17:53:29 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Updated bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=1314697

Could you post your oom report please?


E.g. a new one with more than one included, first one after boot ...

Just setup a low mem VM under KVM and it is easily triggerable.

Still enough virtual memory available ...

4.9.0-0.rc8.git2.1.fc26.x86_64

[  624.862777] ksoftirqd/0: page allocation failure: order:0, 
mode:0x2080020(GFP_ATOMIC)
[  624.863319] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 
4.9.0-0.rc8.git2.1.fc26.x86_64 #1
[  624.863410] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS 1.9.3
[  624.863510]  aa62c007f958 904774e3 90c7dd98 

[  624.863923]  aa62c007f9e0 9020e6ea 020800200246 
90c7dd98
[  624.864019]  aa62c007f980 96b90010 aa62c007f9f0 
aa62c007f9a0

[  624.864998] Call Trace:
[  624.865149]  [] dump_stack+0x86/0xc3
[  624.865347]  [] warn_alloc+0x13a/0x170
[  624.865432]  [] __alloc_pages_slowpath+0x252/0xbb0
[  624.865563]  [] __alloc_pages_nodemask+0x40d/0x4b0
[  624.865675]  [] __alloc_page_frag+0x193/0x200
[  624.866024]  [] __napi_alloc_skb+0x8e/0xf0
[  624.866113]  [] page_to_skb.isra.28+0x5d/0x310 
[virtio_net]
[  624.866201]  [] virtnet_receive+0x2db/0x9a0 
[virtio_net]

[  624.867378]  [] virtnet_poll+0x1d/0x80 [virtio_net]
[  624.867494]  [] net_rx_action+0x23e/0x470
[  624.867612]  [] __do_softirq+0xcd/0x4b9
[  624.867704]  [] ? smpboot_thread_fn+0x34/0x1f0
[  624.867833]  [] ? smpboot_thread_fn+0x12d/0x1f0
[  624.867924]  [] run_ksoftirqd+0x25/0x80
[  624.868109]  [] smpboot_thread_fn+0x128/0x1f0
[  624.868197]  [] ? sort_range+0x30/0x30
[  624.868596]  [] kthread+0x102/0x120
[  624.868679]  [] ? wait_for_completion+0x110/0x140
[  624.868768]  [] ? kthread_park+0x60/0x60
[  624.868850]  [] ret_from_fork+0x2a/0x40
[  843.528656] httpd (2490) used greatest stack depth: 10304 bytes left
[  878.077750] httpd (2976) used greatest stack depth: 10096 bytes left
[93918.861109] netstat (14579) used greatest stack depth: 9488 bytes left
[94050.874669] kworker/dying (6253) used greatest stack depth: 9008 
bytes left
[95895.765570] kworker/1:1H: page allocation failure: order:0, 
mode:0x2280020(GFP_ATOMIC|__GFP_NOTRACK)
[95895.765819] CPU: 1 PID: 440 Comm: kworker/1:1H Not tainted 
4.9.0-0.rc8.git2.1.fc26.x86_64 #1
[95895.765911] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS 1.9.3

[95895.766060] Workqueue: kblockd blk_mq_run_work_fn
[95895.766143]  aa62c0257628 904774e3 90c7dd98 

[95895.766235]  aa62c02576b0 9020e6ea 022800200046 
90c7dd98
[95895.766325]  aa62c0257650 96b90010 aa62c02576c0 
aa62c0257670

[95895.766417] Call Trace:
[95895.766502]  [] dump_stack+0x86/0xc3
[95895.766596]  [] warn_alloc+0x13a/0x170
[95895.766681]  [] __alloc_pages_slowpath+0x252/0xbb0
[95895.766767]  [] __alloc_pages_nodemask+0x40d/0x4b0
[95895.766866]  [] alloc_pages_current+0xa1/0x1f0
[95895.766971]  [] ? _raw_spin_unlock+0x27/0x40
[95895.767073]  [] new_slab+0x316/0x7c0
[95895.767160]  [] ___slab_alloc+0x3fb/0x5c0
[95895.772611]  [] ? cpuacct_charge+0xf2/0x1f0
[95895.773406]  [] ? 
alloc_indirect.isra.11+0x1d/0x50 [virtio_ring]

[95895.774327]  [] ? rcu_read_lock_sched_held+0x45/0x80
[95895.775212]  [] ? 
alloc_indirect.isra.11+0x1d/0x50 [virtio_ring]

[95895.776155]  [] __slab_alloc+0x51/0x90
[95895.777090]  [] __kmalloc+0x251/0x320
[95895.781502]  [] ? 
alloc_indirect.isra.11+0x1d/0x50 [virtio_ring]
[95895.782309]  [] alloc_indirect.isra.11+0x1d/0x50 
[virtio_ring]
[95895.783334]  [] virtqueue_add_sgs+0x1c3/0x4a0 
[virtio_ring]

[95895.784059]  [] ? kvm_sched_clock_read+0x25/0x40
[95895.784742]  [] __virtblk_add_req+0xbc/0x220 
[virtio_blk]
[95895.785419]  [] ? 
debug_lockdep_rcu_enabled+0x1d/0x20
[95895.786086]  [] ? virtio_queue_rq+0x105/0x290 
[virtio_blk]
[95895.786750]  [] virtio_queue_rq+0x12d/0x290 
[virtio_blk]

[95895.787427]  [] __blk_mq_run_hw_queue+0x26d/0x3b0
[95895.788106]  [] blk_mq_run_work_fn+0x12/0x20
[95895.789065]  [] process_one_work+0x23e/0x6f0
[95895.789741]  [] ? process_one_work+0x1ba/0x6f0
[95895.790444]  [] worker_thread+0x4e/0x490
[95895.791178]  [] ? process_one_work+0x6f0/0x6f0
[95895.791911]  [] ? process_one_work+0x6f0/0x6f0
[95895.792653]  [] ? do_syscall_64+0x6c/0x1f0
[95895.793397]  [] kthread+0x102/0x120
[95895.794212]  [] ? 
trace_hardirqs_on_caller+0xf5/0x1b0

[95895.794942]  [] ? kthread_park+0x60/0x60
[95895.795689]  [] ret_from_fork+0x2a/0x40

Re: Still OOM problems with 4.9er kernels


On 09.12.2016 14:40, Michal Hocko wrote:

On Fri 09-12-16 08:06:25, Gerhard Wiesinger wrote:

Hello,

same with latest kernel rc, dnf still killed with OOM (but sometimes
better).

./update.sh: line 40:  1591 Killed  ${EXE} update ${PARAMS}
(does dnf clean all;dnf update)
Linux database.intern 4.9.0-0.rc8.git2.1.fc26.x86_64 #1 SMP Wed Dec 7
17:53:29 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Updated bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=1314697

Could you post your oom report please?


E.g. a new one with more than one included, first one after boot ...

Just setup a low mem VM under KVM and it is easily triggerable.

Still enough virtual memory available ...

4.9.0-0.rc8.git2.1.fc26.x86_64

[  624.862777] ksoftirqd/0: page allocation failure: order:0, 
mode:0x2080020(GFP_ATOMIC)
[  624.863319] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 
4.9.0-0.rc8.git2.1.fc26.x86_64 #1
[  624.863410] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS 1.9.3
[  624.863510]  aa62c007f958 904774e3 90c7dd98 

[  624.863923]  aa62c007f9e0 9020e6ea 020800200246 
90c7dd98
[  624.864019]  aa62c007f980 96b90010 aa62c007f9f0 
aa62c007f9a0

[  624.864998] Call Trace:
[  624.865149]  [] dump_stack+0x86/0xc3
[  624.865347]  [] warn_alloc+0x13a/0x170
[  624.865432]  [] __alloc_pages_slowpath+0x252/0xbb0
[  624.865563]  [] __alloc_pages_nodemask+0x40d/0x4b0
[  624.865675]  [] __alloc_page_frag+0x193/0x200
[  624.866024]  [] __napi_alloc_skb+0x8e/0xf0
[  624.866113]  [] page_to_skb.isra.28+0x5d/0x310 
[virtio_net]
[  624.866201]  [] virtnet_receive+0x2db/0x9a0 
[virtio_net]

[  624.867378]  [] virtnet_poll+0x1d/0x80 [virtio_net]
[  624.867494]  [] net_rx_action+0x23e/0x470
[  624.867612]  [] __do_softirq+0xcd/0x4b9
[  624.867704]  [] ? smpboot_thread_fn+0x34/0x1f0
[  624.867833]  [] ? smpboot_thread_fn+0x12d/0x1f0
[  624.867924]  [] run_ksoftirqd+0x25/0x80
[  624.868109]  [] smpboot_thread_fn+0x128/0x1f0
[  624.868197]  [] ? sort_range+0x30/0x30
[  624.868596]  [] kthread+0x102/0x120
[  624.868679]  [] ? wait_for_completion+0x110/0x140
[  624.868768]  [] ? kthread_park+0x60/0x60
[  624.868850]  [] ret_from_fork+0x2a/0x40
[  843.528656] httpd (2490) used greatest stack depth: 10304 bytes left
[  878.077750] httpd (2976) used greatest stack depth: 10096 bytes left
[93918.861109] netstat (14579) used greatest stack depth: 9488 bytes left
[94050.874669] kworker/dying (6253) used greatest stack depth: 9008 
bytes left
[95895.765570] kworker/1:1H: page allocation failure: order:0, 
mode:0x2280020(GFP_ATOMIC|__GFP_NOTRACK)
[95895.765819] CPU: 1 PID: 440 Comm: kworker/1:1H Not tainted 
4.9.0-0.rc8.git2.1.fc26.x86_64 #1
[95895.765911] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS 1.9.3

[95895.766060] Workqueue: kblockd blk_mq_run_work_fn
[95895.766143]  aa62c0257628 904774e3 90c7dd98 

[95895.766235]  aa62c02576b0 9020e6ea 022800200046 
90c7dd98
[95895.766325]  aa62c0257650 96b90010 aa62c02576c0 
aa62c0257670

[95895.766417] Call Trace:
[95895.766502]  [] dump_stack+0x86/0xc3
[95895.766596]  [] warn_alloc+0x13a/0x170
[95895.766681]  [] __alloc_pages_slowpath+0x252/0xbb0
[95895.766767]  [] __alloc_pages_nodemask+0x40d/0x4b0
[95895.766866]  [] alloc_pages_current+0xa1/0x1f0
[95895.766971]  [] ? _raw_spin_unlock+0x27/0x40
[95895.767073]  [] new_slab+0x316/0x7c0
[95895.767160]  [] ___slab_alloc+0x3fb/0x5c0
[95895.772611]  [] ? cpuacct_charge+0xf2/0x1f0
[95895.773406]  [] ? alloc_indirect.isra.11+0x1d/0x50 
[virtio_ring]

[95895.774327]  [] ? rcu_read_lock_sched_held+0x45/0x80
[95895.775212]  [] ? alloc_indirect.isra.11+0x1d/0x50 
[virtio_ring]

[95895.776155]  [] __slab_alloc+0x51/0x90
[95895.777090]  [] __kmalloc+0x251/0x320
[95895.781502]  [] ? alloc_indirect.isra.11+0x1d/0x50 
[virtio_ring]
[95895.782309]  [] alloc_indirect.isra.11+0x1d/0x50 
[virtio_ring]
[95895.783334]  [] virtqueue_add_sgs+0x1c3/0x4a0 
[virtio_ring]

[95895.784059]  [] ? kvm_sched_clock_read+0x25/0x40
[95895.784742]  [] __virtblk_add_req+0xbc/0x220 
[virtio_blk]

[95895.785419]  [] ? debug_lockdep_rcu_enabled+0x1d/0x20
[95895.786086]  [] ? virtio_queue_rq+0x105/0x290 
[virtio_blk]
[95895.786750]  [] virtio_queue_rq+0x12d/0x290 
[virtio_blk]

[95895.787427]  [] __blk_mq_run_hw_queue+0x26d/0x3b0
[95895.788106]  [] blk_mq_run_work_fn+0x12/0x20
[95895.789065]  [] process_one_work+0x23e/0x6f0
[95895.789741]  [] ? process_one_work+0x1ba/0x6f0
[95895.790444]  [] worker_thread+0x4e/0x490
[95895.791178]  [] ? process_one_work+0x6f0/0x6f0
[95895.791911]  [] ? process_one_work+0x6f0/0x6f0
[95895.792653]  [] ? do_syscall_64+0x6c/0x1f0
[95895.793397]  [] kthread+0x102/0x120
[95895.794212]  [] ? trace_hardirqs_on_caller+0xf5/0x1b0
[95895.794942]  [] ? kthread_park+0x60/0x60
[95895.795689]  [] ret_from_fork+0x2a/0x40
[95895.796408] Mem-Info:
[95895.797110] active_anon:8800

Re: Still OOM problems with 4.9er kernels


On 09.12.2016 14:40, Michal Hocko wrote:

On Fri 09-12-16 08:06:25, Gerhard Wiesinger wrote:

Hello,

same with latest kernel rc, dnf still killed with OOM (but sometimes
better).

./update.sh: line 40:  1591 Killed  ${EXE} update ${PARAMS}
(does dnf clean all;dnf update)
Linux database.intern 4.9.0-0.rc8.git2.1.fc26.x86_64 #1 SMP Wed Dec 7
17:53:29 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Updated bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=1314697

Could you post your oom report please?


E.g. a new one with more than one included, first one after boot ...

Just setup a low mem VM under KVM and it is easily triggerable.

Still enough virtual memory available ...

4.9.0-0.rc8.git2.1.fc26.x86_64

[  624.862777] ksoftirqd/0: page allocation failure: order:0, 
mode:0x2080020(GFP_ATOMIC)
[  624.863319] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 
4.9.0-0.rc8.git2.1.fc26.x86_64 #1
[  624.863410] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS 1.9.3
[  624.863510]  aa62c007f958 904774e3 90c7dd98 

[  624.863923]  aa62c007f9e0 9020e6ea 020800200246 
90c7dd98
[  624.864019]  aa62c007f980 96b90010 aa62c007f9f0 
aa62c007f9a0

[  624.864998] Call Trace:
[  624.865149]  [] dump_stack+0x86/0xc3
[  624.865347]  [] warn_alloc+0x13a/0x170
[  624.865432]  [] __alloc_pages_slowpath+0x252/0xbb0
[  624.865563]  [] __alloc_pages_nodemask+0x40d/0x4b0
[  624.865675]  [] __alloc_page_frag+0x193/0x200
[  624.866024]  [] __napi_alloc_skb+0x8e/0xf0
[  624.866113]  [] page_to_skb.isra.28+0x5d/0x310 
[virtio_net]
[  624.866201]  [] virtnet_receive+0x2db/0x9a0 
[virtio_net]

[  624.867378]  [] virtnet_poll+0x1d/0x80 [virtio_net]
[  624.867494]  [] net_rx_action+0x23e/0x470
[  624.867612]  [] __do_softirq+0xcd/0x4b9
[  624.867704]  [] ? smpboot_thread_fn+0x34/0x1f0
[  624.867833]  [] ? smpboot_thread_fn+0x12d/0x1f0
[  624.867924]  [] run_ksoftirqd+0x25/0x80
[  624.868109]  [] smpboot_thread_fn+0x128/0x1f0
[  624.868197]  [] ? sort_range+0x30/0x30
[  624.868596]  [] kthread+0x102/0x120
[  624.868679]  [] ? wait_for_completion+0x110/0x140
[  624.868768]  [] ? kthread_park+0x60/0x60
[  624.868850]  [] ret_from_fork+0x2a/0x40
[  843.528656] httpd (2490) used greatest stack depth: 10304 bytes left
[  878.077750] httpd (2976) used greatest stack depth: 10096 bytes left
[93918.861109] netstat (14579) used greatest stack depth: 9488 bytes left
[94050.874669] kworker/dying (6253) used greatest stack depth: 9008 
bytes left
[95895.765570] kworker/1:1H: page allocation failure: order:0, 
mode:0x2280020(GFP_ATOMIC|__GFP_NOTRACK)
[95895.765819] CPU: 1 PID: 440 Comm: kworker/1:1H Not tainted 
4.9.0-0.rc8.git2.1.fc26.x86_64 #1
[95895.765911] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS 1.9.3

[95895.766060] Workqueue: kblockd blk_mq_run_work_fn
[95895.766143]  aa62c0257628 904774e3 90c7dd98 

[95895.766235]  aa62c02576b0 9020e6ea 022800200046 
90c7dd98
[95895.766325]  aa62c0257650 96b90010 aa62c02576c0 
aa62c0257670

[95895.766417] Call Trace:
[95895.766502]  [] dump_stack+0x86/0xc3
[95895.766596]  [] warn_alloc+0x13a/0x170
[95895.766681]  [] __alloc_pages_slowpath+0x252/0xbb0
[95895.766767]  [] __alloc_pages_nodemask+0x40d/0x4b0
[95895.766866]  [] alloc_pages_current+0xa1/0x1f0
[95895.766971]  [] ? _raw_spin_unlock+0x27/0x40
[95895.767073]  [] new_slab+0x316/0x7c0
[95895.767160]  [] ___slab_alloc+0x3fb/0x5c0
[95895.772611]  [] ? cpuacct_charge+0xf2/0x1f0
[95895.773406]  [] ? alloc_indirect.isra.11+0x1d/0x50 
[virtio_ring]

[95895.774327]  [] ? rcu_read_lock_sched_held+0x45/0x80
[95895.775212]  [] ? alloc_indirect.isra.11+0x1d/0x50 
[virtio_ring]

[95895.776155]  [] __slab_alloc+0x51/0x90
[95895.777090]  [] __kmalloc+0x251/0x320
[95895.781502]  [] ? alloc_indirect.isra.11+0x1d/0x50 
[virtio_ring]
[95895.782309]  [] alloc_indirect.isra.11+0x1d/0x50 
[virtio_ring]
[95895.783334]  [] virtqueue_add_sgs+0x1c3/0x4a0 
[virtio_ring]

[95895.784059]  [] ? kvm_sched_clock_read+0x25/0x40
[95895.784742]  [] __virtblk_add_req+0xbc/0x220 
[virtio_blk]

[95895.785419]  [] ? debug_lockdep_rcu_enabled+0x1d/0x20
[95895.786086]  [] ? virtio_queue_rq+0x105/0x290 
[virtio_blk]
[95895.786750]  [] virtio_queue_rq+0x12d/0x290 
[virtio_blk]

[95895.787427]  [] __blk_mq_run_hw_queue+0x26d/0x3b0
[95895.788106]  [] blk_mq_run_work_fn+0x12/0x20
[95895.789065]  [] process_one_work+0x23e/0x6f0
[95895.789741]  [] ? process_one_work+0x1ba/0x6f0
[95895.790444]  [] worker_thread+0x4e/0x490
[95895.791178]  [] ? process_one_work+0x6f0/0x6f0
[95895.791911]  [] ? process_one_work+0x6f0/0x6f0
[95895.792653]  [] ? do_syscall_64+0x6c/0x1f0
[95895.793397]  [] kthread+0x102/0x120
[95895.794212]  [] ? trace_hardirqs_on_caller+0xf5/0x1b0
[95895.794942]  [] ? kthread_park+0x60/0x60
[95895.795689]  [] ret_from_fork+0x2a/0x40
[95895.796408] Mem-Info:
[95895.797110] active_anon:8800

Re: Still OOM problems with 4.9er kernels

2016-12-08 Thread Gerhard Wiesinger


Hello,

same with latest kernel rc, dnf still killed with OOM (but sometimes 
better).


./update.sh: line 40:  1591 Killed  ${EXE} update ${PARAMS}
(does dnf clean all;dnf update)
Linux database.intern 4.9.0-0.rc8.git2.1.fc26.x86_64 #1 SMP Wed Dec 7 
17:53:29 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux


Updated bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=1314697

Any chance to get it fixed in 4.9.0 release?

Ciao,
Gerhard


On 30.11.2016 08:20, Gerhard Wiesinger wrote:

Hello,

See also:
Bug 1314697 - Kernel 4.4.3-300.fc23.x86_64 is not stable inside a KVM VM
https://bugzilla.redhat.com/show_bug.cgi?id=1314697

Ciao,
Gerhard


On 30.11.2016 08:10, Gerhard Wiesinger wrote:

Hello,

I'm having out of memory situations with my "low memory" VMs in KVM 
under Fedora (Kernel 4.7, 4.8 and also before). They started to get 
more and more sensitive to OOM. I recently found the following info:


https://marius.bloggt-in-braunschweig.de/2016/11/17/linuxkernel-4-74-8-und-der-oom-killer/ 


https://www.spinics.net/lists/linux-mm/msg113661.html

Therefore I tried the latest Fedora kernels: 
4.9.0-0.rc6.git2.1.fc26.x86_64


But OOM situation is still very easy to reproduce:

1.) VM with 128-384MB under Fedora 25

2.) Having some processes run without any load (e.g. Apache)

3.) run an update with: dnf clean all; dnf update

4.) dnf python process get's killed


Please make the VM system working again in Kernel 4.9 and to use swap 
again correctly.


Thnx.

Ciao,

Gerhard

Re: Still OOM problems with 4.9er kernels

2016-12-08 Thread Gerhard Wiesinger


Hello,

same with latest kernel rc, dnf still killed with OOM (but sometimes 
better).


./update.sh: line 40:  1591 Killed  ${EXE} update ${PARAMS}
(does dnf clean all;dnf update)
Linux database.intern 4.9.0-0.rc8.git2.1.fc26.x86_64 #1 SMP Wed Dec 7 
17:53:29 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux


Updated bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=1314697

Any chance to get it fixed in 4.9.0 release?

Ciao,
Gerhard


On 30.11.2016 08:20, Gerhard Wiesinger wrote:

Hello,

See also:
Bug 1314697 - Kernel 4.4.3-300.fc23.x86_64 is not stable inside a KVM VM
https://bugzilla.redhat.com/show_bug.cgi?id=1314697

Ciao,
Gerhard


On 30.11.2016 08:10, Gerhard Wiesinger wrote:

Hello,

I'm having out of memory situations with my "low memory" VMs in KVM 
under Fedora (Kernel 4.7, 4.8 and also before). They started to get 
more and more sensitive to OOM. I recently found the following info:


https://marius.bloggt-in-braunschweig.de/2016/11/17/linuxkernel-4-74-8-und-der-oom-killer/ 


https://www.spinics.net/lists/linux-mm/msg113661.html

Therefore I tried the latest Fedora kernels: 
4.9.0-0.rc6.git2.1.fc26.x86_64


But OOM situation is still very easy to reproduce:

1.) VM with 128-384MB under Fedora 25

2.) Having some processes run without any load (e.g. Apache)

3.) run an update with: dnf clean all; dnf update

4.) dnf python process get's killed


Please make the VM system working again in Kernel 4.9 and to use swap 
again correctly.


Thnx.

Ciao,

Gerhard

Another kernel OOPS still not fixed


Hello,

There is another major kernel OOPS which is still not fixed over a year

Bug 1279188 - bind-chroot causes kernel to crash on restart (mount with 
bind option):


https://bugzilla.redhat.com/show_bug.cgi?id=1279188

Can you please fix it.

Thnx.

Ciao,

Gerhard

Another kernel OOPS still not fixed


Hello,

There is another major kernel OOPS which is still not fixed over a year

Bug 1279188 - bind-chroot causes kernel to crash on restart (mount with 
bind option):


https://bugzilla.redhat.com/show_bug.cgi?id=1279188

Can you please fix it.

Thnx.

Ciao,

Gerhard

Still OOM problems with 4.9er kernels


Hello,

I'm having out of memory situations with my "low memory" VMs in KVM 
under Fedora (Kernel 4.7, 4.8 and also before). They started to get more 
and more sensitive to OOM. I recently found the following info:


https://marius.bloggt-in-braunschweig.de/2016/11/17/linuxkernel-4-74-8-und-der-oom-killer/
https://www.spinics.net/lists/linux-mm/msg113661.html

Therefore I tried the latest Fedora kernels: 4.9.0-0.rc6.git2.1.fc26.x86_64

But OOM situation is still very easy to reproduce:

1.) VM with 128-384MB under Fedora 25

2.) Having some processes run without any load (e.g. Apache)

3.) run an update with: dnf clean all; dnf update

4.) dnf python process get's killed


Please make the VM system working again in Kernel 4.9 and to use swap 
again correctly.


Thnx.

Ciao,

Gerhard

Still OOM problems with 4.9er kernels


Hello,

I'm having out of memory situations with my "low memory" VMs in KVM 
under Fedora (Kernel 4.7, 4.8 and also before). They started to get more 
and more sensitive to OOM. I recently found the following info:


https://marius.bloggt-in-braunschweig.de/2016/11/17/linuxkernel-4-74-8-und-der-oom-killer/
https://www.spinics.net/lists/linux-mm/msg113661.html

Therefore I tried the latest Fedora kernels: 4.9.0-0.rc6.git2.1.fc26.x86_64

But OOM situation is still very easy to reproduce:

1.) VM with 128-384MB under Fedora 25

2.) Having some processes run without any load (e.g. Apache)

3.) run an update with: dnf clean all; dnf update

4.) dnf python process get's killed


Please make the VM system working again in Kernel 4.9 and to use swap 
again correctly.


Thnx.

Ciao,

Gerhard

Re: Still OOM problems with 4.9er kernels


Hello,

See also:
Bug 1314697 - Kernel 4.4.3-300.fc23.x86_64 is not stable inside a KVM VM
https://bugzilla.redhat.com/show_bug.cgi?id=1314697

Ciao,
Gerhard


On 30.11.2016 08:10, Gerhard Wiesinger wrote:

Hello,

I'm having out of memory situations with my "low memory" VMs in KVM 
under Fedora (Kernel 4.7, 4.8 and also before). They started to get 
more and more sensitive to OOM. I recently found the following info:


https://marius.bloggt-in-braunschweig.de/2016/11/17/linuxkernel-4-74-8-und-der-oom-killer/ 


https://www.spinics.net/lists/linux-mm/msg113661.html

Therefore I tried the latest Fedora kernels: 
4.9.0-0.rc6.git2.1.fc26.x86_64


But OOM situation is still very easy to reproduce:

1.) VM with 128-384MB under Fedora 25

2.) Having some processes run without any load (e.g. Apache)

3.) run an update with: dnf clean all; dnf update

4.) dnf python process get's killed


Please make the VM system working again in Kernel 4.9 and to use swap 
again correctly.


Thnx.

Ciao,

Gerhard

Re: Still OOM problems with 4.9er kernels


Hello,

See also:
Bug 1314697 - Kernel 4.4.3-300.fc23.x86_64 is not stable inside a KVM VM
https://bugzilla.redhat.com/show_bug.cgi?id=1314697

Ciao,
Gerhard


On 30.11.2016 08:10, Gerhard Wiesinger wrote:

Hello,

I'm having out of memory situations with my "low memory" VMs in KVM 
under Fedora (Kernel 4.7, 4.8 and also before). They started to get 
more and more sensitive to OOM. I recently found the following info:


https://marius.bloggt-in-braunschweig.de/2016/11/17/linuxkernel-4-74-8-und-der-oom-killer/ 


https://www.spinics.net/lists/linux-mm/msg113661.html

Therefore I tried the latest Fedora kernels: 
4.9.0-0.rc6.git2.1.fc26.x86_64


But OOM situation is still very easy to reproduce:

1.) VM with 128-384MB under Fedora 25

2.) Having some processes run without any load (e.g. Apache)

3.) run an update with: dnf clean all; dnf update

4.) dnf python process get's killed


Please make the VM system working again in Kernel 4.9 and to use swap 
again correctly.


Thnx.

Ciao,

Gerhard

Re: Linux 4.2.4

2015-11-09 Thread Gerhard Wiesinger

On 08.11.2015 18:20, Greg KH wrote:

On Sun, Nov 08, 2015 at 02:51:01PM +0100, Gerhard Wiesinger wrote:

On 25.10.2015 17:29, Greg KH wrote:

On Sun, Oct 25, 2015 at 11:48:54AM +0100, Gerhard Wiesinger wrote:

On 25.10.2015 10:46, Willy Tarreau wrote:

ipset *triggered* the problem. The whole stack dump would tell more.

OK, find the stack traces in the bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=1272645

Kernel 4.1.10 triggered also a kernel dump when playing with ipset commands
and IPv6, details in the bug report

Kernel 4.2 seems to me not well tested in the netfilter parts at all
(Bug with already known bugfix
https://lists.debian.org/debian-kernel/2015/10/msg00034.html was
triggered on 2 of 3 of my machines, the new bug on 1 of 1 tested machine).

There's a reason why Greg maintains stable and LTS kernels :-)

Stable kernels don't crash but definiton. :-)

At least triggered 2 kernel panics in 5min, even with 4.1.10 and ipset
commands ...

Does this happen also with Linus's tree? I suggest you ask the
networking developers about this on net...@vger.kernel.org, there's
nothing that I can do on my own about this, sorry.

Patch is now available, see:
[PATCH 0/3] ipset patches for nf
https://marc.info/?l=netfilter-devel=144690007708041=2
https://marc.info/?l=netfilter-devel=144690007808042=2
https://marc.info/?l=netfilter-devel=144690008608043=2
https://marc.info/?l=netfilter-devel=144690007708039=2
[ANNOUNCE] ipset 6.27 released
https://marc.info/?l=netfilter-devel=144690048308099=2

Requires also new userland ipset version.

Please integrate it upstream.

Thanx to Jozsef Kadlecsik for fixing it.

That's great, can you let me know the git commits that end up in Linus's
tree? That's what we need for the stable kernel.

Find the commits here:
https://git.kernel.org/cgit/linux/kernel/git/pablo/nf.git/
https://git.kernel.org/cgit/linux/kernel/git/pablo/nf.git/commit/?id=e75cb467df29a428612c162e6f1451c5c0717091

Don't know exactly the merging processes, so feel free to merge or
contact Pablo.

Ciao,
Gerhard
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Re: Linux 4.2.4

2015-11-09 Thread Gerhard Wiesinger

On 08.11.2015 18:20, Greg KH wrote:

On Sun, Nov 08, 2015 at 02:51:01PM +0100, Gerhard Wiesinger wrote:

On 25.10.2015 17:29, Greg KH wrote:

On Sun, Oct 25, 2015 at 11:48:54AM +0100, Gerhard Wiesinger wrote:

On 25.10.2015 10:46, Willy Tarreau wrote:

ipset *triggered* the problem. The whole stack dump would tell more.

OK, find the stack traces in the bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=1272645

Kernel 4.1.10 triggered also a kernel dump when playing with ipset commands
and IPv6, details in the bug report

There's a reason why Greg maintains stable and LTS kernels :-)

Stable kernels don't crash but definiton. :-)

At least triggered 2 kernel panics in 5min, even with 4.1.10 and ipset
commands ...

Does this happen also with Linus's tree? I suggest you ask the
networking developers about this on net...@vger.kernel.org, there's
nothing that I can do on my own about this, sorry.

Requires also new userland ipset version.

Please integrate it upstream.

Thanx to Jozsef Kadlecsik for fixing it.

That's great, can you let me know the git commits that end up in Linus's
tree? That's what we need for the stable kernel.

Find the commits here:
https://git.kernel.org/cgit/linux/kernel/git/pablo/nf.git/
https://git.kernel.org/cgit/linux/kernel/git/pablo/nf.git/commit/?id=e75cb467df29a428612c162e6f1451c5c0717091

Don't know exactly the merging processes, so feel free to merge or
contact Pablo.

Re: Linux 4.2.4

2015-11-08 Thread Gerhard Wiesinger


On 25.10.2015 17:29, Greg KH wrote:

On Sun, Oct 25, 2015 at 11:48:54AM +0100, Gerhard Wiesinger wrote:

On 25.10.2015 10:46, Willy Tarreau wrote:

ipset *triggered* the problem. The whole stack dump would tell more.

OK, find the stack traces in the bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=1272645

Kernel 4.1.10 triggered also a kernel dump when playing with ipset commands
and IPv6, details in the bug report  


Kernel 4.2 seems to me not well tested in the netfilter parts at all
(Bug with already known bugfix
https://lists.debian.org/debian-kernel/2015/10/msg00034.html was
triggered on 2 of 3 of my machines, the new bug on 1 of 1 tested machine).

There's a reason why Greg maintains stable and LTS kernels :-)

Stable kernels don't crash but definiton. :-)

At least triggered 2 kernel panics in 5min, even with 4.1.10 and ipset
commands ...

Does this happen also with Linus's tree?  I suggest you ask the
networking developers about this on net...@vger.kernel.org, there's
nothing that I can do on my own about this, sorry.


Patch is now available, see:
[PATCH 0/3] ipset patches for nf
https://marc.info/?l=netfilter-devel=144690007708041=2
https://marc.info/?l=netfilter-devel=144690007808042=2
https://marc.info/?l=netfilter-devel=144690008608043=2
https://marc.info/?l=netfilter-devel=144690007708039=2
[ANNOUNCE] ipset 6.27 released
https://marc.info/?l=netfilter-devel=144690048308099=2

Requires also new userland ipset version.

Please integrate it upstream.

Thanx to Jozsef Kadlecsik for fixing it.

Ciao,
Gerhard

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 4.2.4

2015-11-08 Thread Gerhard Wiesinger


On 25.10.2015 17:29, Greg KH wrote:

On Sun, Oct 25, 2015 at 11:48:54AM +0100, Gerhard Wiesinger wrote:

On 25.10.2015 10:46, Willy Tarreau wrote:

ipset *triggered* the problem. The whole stack dump would tell more.

OK, find the stack traces in the bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=1272645

Kernel 4.1.10 triggered also a kernel dump when playing with ipset commands
and IPv6, details in the bug report  


Kernel 4.2 seems to me not well tested in the netfilter parts at all
(Bug with already known bugfix
https://lists.debian.org/debian-kernel/2015/10/msg00034.html was
triggered on 2 of 3 of my machines, the new bug on 1 of 1 tested machine).

There's a reason why Greg maintains stable and LTS kernels :-)

Stable kernels don't crash but definiton. :-)

At least triggered 2 kernel panics in 5min, even with 4.1.10 and ipset
commands ...

Does this happen also with Linus's tree?  I suggest you ask the
networking developers about this on net...@vger.kernel.org, there's
nothing that I can do on my own about this, sorry.


Patch is now available, see:
[PATCH 0/3] ipset patches for nf
https://marc.info/?l=netfilter-devel=144690007708041=2
https://marc.info/?l=netfilter-devel=144690007808042=2
https://marc.info/?l=netfilter-devel=144690008608043=2
https://marc.info/?l=netfilter-devel=144690007708039=2
[ANNOUNCE] ipset 6.27 released
https://marc.info/?l=netfilter-devel=144690048308099=2

Requires also new userland ipset version.

Please integrate it upstream.

Thanx to Jozsef Kadlecsik for fixing it.

Ciao,
Gerhard

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 4.2.4


On 26.10.2015 09:58, Jozsef Kadlecsik wrote:

On Sun, 25 Oct 2015, Gerhard Wiesinger wrote:


Also any idea regarding the second isssue? Or do you think it has the
same root cause?

Looking at your RedHat bugzilla report, the "nf_conntrack: table full,
dropping packet" and "Alignment trap: not handling instruction" are two
unrelated issues and the second one is triggered by the unaligned counter
extension acccess in ipset, I'm investigating. I can't think of any reason
how those issues could be related to each other.


Yes, they are unrelated.
Issue 1: nf_conntrack: table full, dropping packet => Fixed with 4.2.4
Issue 2: Alignment trap: not handling instruction => Happens when ipset 
counters are enabled


Please keep in mind it happens with IPv6 commands.

Currently 4.2.4 without ipset counters runs well.

Ciao,
Gerhard

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 4.2.4


On 25.10.2015 22:53, Jozsef Kadlecsik wrote:

On Sun, 25 Oct 2015, Gerhard Wiesinger wrote:


Any further ideas?

Does it crash without counters? That could narrow down where to look for.




Hello Jozsef,

it doesn't crash i I don't use the counters so far. So there must be a 
bug with the counters.


Any idea for the root cause?

Thnx.

Ciao,
Gerhard

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 4.2.4


On 25.10.2015 22:53, Jozsef Kadlecsik wrote:

On Sun, 25 Oct 2015, Gerhard Wiesinger wrote:


Any further ideas?

Does it crash without counters? That could narrow down where to look for.




Hello Jozsef,

it doesn't crash i I don't use the counters so far. So there must be a 
bug with the counters.


Any idea for the root cause?

Thnx.

Ciao,
Gerhard

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 4.2.4