Bug#907175: 4.9.0-8-686-pae swapon failed invalid argument

2018-10-26 Thread Sven Hartge
On 26.10.18 15:38, William Salmon wrote:

> Ubuntu bug #1788321 says they have fixed this in kernels: 
> 4.4.0-137-generic #163~14.04.1-Ubuntu SMP Mon Sep 24 17:13:32 UTC 2018 i686 
> (above is Ubuntu16/14)
> and 
> 4.15.0-36-generic #39-Ubuntu i686
> 
> This is still an issue with the latest kernel on 32-bit Debian 9:
>   4.9.0-8-686-pae #1 SMP Debian 4.9.110-3+deb9u6 (2018-10-08) 
> gives: swapon: /dev/sdb1: swapon failed: Invalid argument
> 
> so we are stuck on 4.9.0-7 
> 
> Any news on a fix, please ?

Since this bug can only happen on 64-bit-capable CPUs with a physical
address size of 42 or more bits, a valid workaround is to install the
amd64 kernel. (At least this is what I did for my affected systems for
the time being.)

A proper fix would of course be nice.
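
To check whether a machine falls into the class of CPUs described above, the physical address width can be read from /proc/cpuinfo. The following is only a minimal sketch: the function names are made up for illustration, and the threshold of 42 bits is simply the figure quoted in this mail, not a value taken from the kernel source.

```shell
#!/bin/sh
# Print the physical address width (in bits) found in cpuinfo-style text
# read from stdin. Only the first matching CPU entry is reported.
phys_bits() {
    sed -n 's/^address sizes[[:space:]]*:[[:space:]]*\([0-9][0-9]*\) bits physical.*/\1/p' | head -n 1
}

# Succeed if the given width reaches the threshold named in this thread.
# The value 42 is an assumption taken from the mail above.
maybe_affected() {
    [ "$1" -ge 42 ]
}

# Usage on a live x86 Linux system:
#   bits=$(phys_bits < /proc/cpuinfo)
#   maybe_affected "$bits" && echo "possibly affected: $bits bits physical"
```

A positive result only says the CPU is in the affected class; whether swapon actually fails still depends on the installed kernel.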

Regards,
Sven.




signature.asc
Description: OpenPGP digital signature


Bug#864642: Affects vanilla images as well

2017-11-14 Thread Sven Hartge
On 13.11.2017 07:03, Dennis Downs III wrote:

> I experienced this problem running a Minecraft server in a VM. The
> server would stay up until shortly after joining it in game. After <5min
> I'd lose connection. The virtual console would freeze, with no new
> messages on either Linux or Java consoles. VMware resource monitoring
> indicated that the system had stopped completely. Following others'
> example, I changed the NIC type from vmxnet3 to E1000e. This also
> solved the issue for me. I'm a Linux novice but can follow instructions
> well. If there's any more info I can provide, let me know.
> 
> Host:
> Image profile: ESXi-6.5.0-20170104001-standard (VMware, Inc.)
> vSphere HA state: Not configured

This ESXi build is too old. It is 6.5.0a from February 2017, which still
contains the issue.

The problem was fixed in 6.5.0u1 (Build 5969303), released in July 2017.

https://kb.vmware.com/s/article/2149931

Regards,
Sven.





Bug#864642: vmxnet3: Reports suspect GRO implementation on vSphere hosts / one VM crashes

2017-08-11 Thread Sven Hartge
On 10.08.2017 15:09, Andrew Moore wrote:

> Both of those reports were me. I suspect the issue may be isolated to
> the HPE custom implementation of the ESXi 6.5u1 build. I haven't seen
> any similar reports of people using the vanilla 6.5u1 build.

Not surprising. It wouldn't be the first time HPE horribly botched their
ESX custom ISOs. (Which is the prime reason I don't *ever* use custom
vendor ISOs from any vendor in the first place.)

> Interestingly none of the fixes that have been discussed work with this
> build either. This includes disabling the rx-mini buffer (# ethtool -G
>  rx-mini 0) and adding vmxnet3.rev.30 = FALSE to the VMs vmx
> file.

Very strange, indeed.

> The only way I've managed to restore stability is by removing vmxnet3
> out of the equation completely and changing to the e1000 NIC type.

Using a HW version lower than 13 should also help.


Unfortunately, the number of people reporting failure or success is
very small at this time, so no conclusive result can be drawn, I am afraid.

Regards,
Sven.





Bug#864642: vmxnet3: Reports suspect GRO implementation on vSphere hosts / one VM crashes

2017-08-08 Thread Sven Hartge
At 16:22 on 03.08.17, Sven Hartge wrote:
> On 03.08.2017 15:34, Patrick Matthäi wrote:
>> On 16.07.2017 at 23:42, Ben Hutchings wrote:
>>> On Thu, 2017-07-06 at 21:50 +0200, Sven Hartge wrote:
 
>>>>> Could this be https://bugzilla.kernel.org/show_bug.cgi?id=191201 ?
>>> Note that this has been root-caused as a bug in the virtual device, not
>>> the driver.  (Though it would be nice if the driver could work around
>>> it.)
> 
>> I can confirm, that the VMs do not crash anymore with vSphere 6.5 build
>> 5969303 from 27.07.2017, that is why I lowered the severity.
> 
> This is the version from 6.5u1, right?
> 
> Still: Stretch is basically unusable with HW13 on ESX6.5 below Update1.

Hmm. There are discussions on Reddit right now indicating the bug still 
occurs even with the latest ESXi6.5u1 (Build 5969303).

https://www.reddit.com/r/homelab/comments/6s5dh6/debian_9_on_esxi_65u1_complete_lockup/

One of the latest comments on the Kernel Bugzilla shows the same:

https://bugzilla.kernel.org/show_bug.cgi?id=191201#c54

(For me, this is really frustrating right now, since I waited until 
ESX6.5u1 before updating my infrastructure and now it seems I have to push 
this update even farther into the future because of this critical blocker 
bug.)

I really wonder what could be done on the kernel side to avoid the 
problem, since only newer kernels are affected while older ones don't 
show it.

Regards,
Sven.



Bug#864642: vmxnet3: Reports suspect GRO implementation on vSphere hosts / one VM crashes

2017-08-03 Thread Sven Hartge
On 03.08.2017 15:34, Patrick Matthäi wrote:
> On 16.07.2017 at 23:42, Ben Hutchings wrote:
>> On Thu, 2017-07-06 at 21:50 +0200, Sven Hartge wrote:

>>>> Could this be https://bugzilla.kernel.org/show_bug.cgi?id=191201 ?
>> Note that this has been root-caused as a bug in the virtual device, not
>> the driver.  (Though it would be nice if the driver could work around
>> it.)

> I can confirm, that the VMs do not crash anymore with vSphere 6.5 build
> 5969303 from 27.07.2017, that is why I lowered the severity.

This is the version from 6.5u1, right?

Still: Stretch is basically unusable with HW13 on ESX6.5 below Update1.

Regards,
Sven.







Bug#864642: vmxnet3: Reports suspect GRO implementation on vSphere hosts / one VM crashes

2017-07-06 Thread Sven Hartge
Hi!

Could this be https://bugzilla.kernel.org/show_bug.cgi?id=191201 ?

Try the following, from comment 37 
https://bugzilla.kernel.org/show_bug.cgi?id=191201#c37

| In the meantime, suggested workaround:
|  - disable rx data ring: ethtool -G eth? rx-mini 0

Also adding "vmxnet3.rev.30 = FALSE" to the vmx file of the VM seems to 
be needed. https://bugzilla.kernel.org/show_bug.cgi?id=191201#c40
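
The guest-side half of the workaround quoted above can be sketched as a small dry-run helper. This is an illustration only: the function name is hypothetical, the interface name in the usage line (ens192) is just the one seen elsewhere in this bug, and the helper prints the command instead of running it so it can be reviewed first.

```shell
#!/bin/sh
# Print (do not execute) the suggested workaround command for each
# interface name given as an argument.
rx_mini_off_cmds() {
    for ifname in "$@"; do
        printf 'ethtool -G %s rx-mini 0\n' "$ifname"
    done
}

# Usage:
#   rx_mini_off_cmds ens192          # review the command
#   rx_mini_off_cmds ens192 | sh     # apply it (needs root)
# The current ring settings can be inspected with: ethtool -g ens192
```

The vmx setting mentioned above is a host-side change to the VM's configuration file and is not covered by this sketch.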

Also: Which hardware version are you running? It is v10 for me (highest 
for ESX5.5)

Regards,
Sven.



Bug#864642: vmxnet3: Reports suspect GRO implementation on vSphere hosts / one VM crashes

2017-07-06 Thread Sven Hartge
On Mon, 12 Jun 2017 10:02:56 +0200 Patrick Matthäi wrote:

> Since updating the kernel from linux-image-4.9.0-2-amd64 (4.9.18-1) to
> linux-image-4.9.0-3-amd64 (4.9.30-1) all VMs report - just for the
> "primary" interface this:
> 
> TCP: ens192: Driver has suspect GRO implementation, TCP performance may
> be compromised.
> 
> I can't see any performance impact. This happens on all our vSphere 6.0
> and 6.5 hosts (running on HPE ProLiant DL 360 G8 - G9 HW / ProLiant ML
> 350 G9 and so on).

I see the same for my Stretch Test VMs, running on ESXi 5.5 on Dell R720.

I have yet to experience a kernel panic, but those VMs are mostly idle
and don't transfer many bytes via network, so the crash-intensity might
be related to the amount of data transmitted or the peak throughput at
some time.
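
To find out which interfaces of a VM have logged the warning quoted above, the kernel log can be filtered for the message text. A minimal sketch, assuming the message appears on one line in the log exactly as quoted; the function name is made up for illustration.

```shell
#!/bin/sh
# Read kernel log text from stdin and print each interface name that
# logged the "suspect GRO implementation" warning, once per interface.
gro_warned_ifaces() {
    sed -n 's/.*TCP: \([^:]*\): Driver has suspect GRO implementation.*/\1/p' | sort -u
}

# Usage:
#   dmesg | gro_warned_ifaces
```

An empty result only means the warning is not in the log text that was supplied; older messages may already have been rotated out of the kernel ring buffer.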

Regards,
Sven.





Bug#856808: error during mount: first meta block group too large

2017-03-25 Thread Sven Hartge
On Mon, 20 Mar 2017 12:42:32 +0100 Sven Hartge <s...@svenhartge.de> wrote:

> I'd really like to see this fixed in Debian (and the LTS branch 4.9 in
> general), because after doing some test upgrades in clones of some of my
> older systems this problem shows up in 3 of 12 of them, breaking them
> hard after the upgrade.

The needed patch is in the queue for 4.9.18:

Message-ID: <20170324151226.795130...@linuxfoundation.org>

Regards,
Sven.





Bug#856808: error during mount: first meta block group too large

2017-03-20 Thread Sven Hartge
On Sat, 04 Mar 2017 23:34:22 +0100 Sven Hartge <s...@svenhartge.de> wrote:

> This is https://bugzilla.kernel.org/show_bug.cgi?id=194567 and got fixed
> by https://patchwork.ozlabs.org/patch/728066/

The problem is still present in 4.9.16:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/tree/fs/ext4/super.c?h=linux-4.9.y#n3832

but is fixed in 4.10.x:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/tree/fs/ext4/super.c?h=linux-4.10.y#n3847

But the patch from Theodore Ts'o was sent to <sta...@vger.kernel.org>
making me think it would be applied to all stable branches and not just
the latest one. (Am I missing something here?)

I'd really like to see this fixed in Debian (and the LTS branch 4.9 in
general), because after doing some test upgrades in clones of some of my
older systems this problem shows up in 3 of 12 of them, breaking them
hard after the upgrade.

Regards,
Sven.





Bug#856808: error during mount: first meta block group too large

2017-03-04 Thread Sven Hartge
Package: linux-image-4.9.0-2-amd64
Version: 4.9.13-1
Severity: important
Tags: patch upstream

Hi!

After rebooting one of my systems with 4.9.x I got hit by the following
error:

[  309.934171] EXT4-fs (dm-5): mounting ext3 file system using the ext4 subsystem
[  309.934748] EXT4-fs (dm-5): first meta block group too large: 1 (group descriptor block count 1)

Unfortunately for me this is my /var filesystem.

This is https://bugzilla.kernel.org/show_bug.cgi?id=194567 and got fixed
by https://patchwork.ozlabs.org/patch/728066/

Rebuilding the kernel with the patch applied allows my system to mount
and boot again. Please add the fix to the next release, should the fix
not be included in a stable release by then.
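
The effect of the patch attached to this report can be illustrated with plain shell arithmetic. This is a sketch of the comparison only, not kernel code: the function names and the group and descriptor counts in the usage lines are hypothetical, while the values 1 and 1 are the ones printed in the log above.

```shell
#!/bin/sh
# Number of group descriptor blocks, rounded up:
# (group count + descriptors per block - 1) / descriptors per block
db_count() {
    ngroups=$1
    desc_per_block=$2
    echo $(( (ngroups + desc_per_block - 1) / desc_per_block ))
}

# Unpatched check: the mount is refused when first_meta_bg >= db_count.
old_check_rejects() { [ "$1" -ge "$2" ]; }

# Patched check: the mount is refused only when first_meta_bg > db_count.
new_check_rejects() { [ "$1" -gt "$2" ]; }

# Usage with hypothetical sizes (100 groups, 128 descriptors per block):
#   n=$(db_count 100 128)
#   old_check_rejects 1 "$n" && echo "unpatched kernel refuses the mount"
#   new_check_rejects 1 "$n" || echo "patched kernel accepts it"
```

With first_meta_bg equal to the descriptor block count, as in the logged case, only the unpatched comparison rejects the file system.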

Regards,
Sven.

-- System Information:
Debian Release: 9.0
  APT prefers unstable-debug
  APT policy: (500, 'unstable-debug'), (500, 'testing-debug'), (500, 'unstable'), (500, 'testing'), (200, 'experimental'), (1, 'experimental-debug')
Architecture: i386 (x86_64)
Foreign Architectures: amd64

Kernel: Linux 4.8.0-2-amd64 (SMP w/4 CPU cores)
Locale: LANG=de_DE.utf8, LC_CTYPE=de_DE.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages linux-image-4.9.0-2-amd64:amd64 depends on:
ii  initramfs-tools [linux-initramfs-tool]  0.127
ii  kmod                                    24-1
ii  linux-base                              4.5

Versions of packages linux-image-4.9.0-2-amd64:amd64 recommends:
ii  firmware-linux-free  3.4
ii  irqbalance   1.1.0-2.2

Versions of packages linux-image-4.9.0-2-amd64:amd64 suggests:
pn  debian-kernel-handbook  
ii  grub-pc 2.02~beta3-5
pn  linux-doc-4.9   

-- debconf-show failed
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index dde14a7ac6d7..a673558fe5f8 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3860,7 +3860,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
         db_count = (sbi->s_groups_count + EXT4_DESC_PER_BLOCK(sb) - 1) /
                    EXT4_DESC_PER_BLOCK(sb);
         if (ext4_has_feature_meta_bg(sb)) {
-                if (le32_to_cpu(es->s_first_meta_bg) >= db_count) {
+                if (le32_to_cpu(es->s_first_meta_bg) > db_count) {
                         ext4_msg(sb, KERN_WARNING,
                                  "first meta block group too large: %u "
                                  "(group descriptor block count %u)",


Bug#856382: /boot/vmlinuz-4.9.0-2-amd64: external modules fail to load with "disagrees about version of symbol module_layout"

2017-02-28 Thread Sven Hartge
Package: src:linux
Version: 4.9.13-1
Severity: important
File: /boot/vmlinuz-4.9.0-2-amd64

Hi!

After installation of 4.9.13-1 and rebooting, external kernel modules
fail to load:

[   40.801423] vboxdrv: disagrees about version of symbol module_layout
[   44.170949] nvidia: disagrees about version of symbol module_layout
[   44.824807] nvidia: disagrees about version of symbol module_layout
[   45.431563] nvidia: disagrees about version of symbol module_layout
[   46.070232] nvidia: disagrees about version of symbol module_layout
[   46.683785] nvidia: disagrees about version of symbol module_layout

After removing and reinstalling them via dkms the error is gone.
It seems 4.9.13-1 is not as ABI-compatible as it should be.
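
To get a list of the modules that need rebuilding, the kernel log lines shown above can be reduced to unique module names. A minimal sketch, assuming timestamped log lines in the format quoted in this report; the function name is made up for illustration.

```shell
#!/bin/sh
# Read kernel log text from stdin and print each module name that
# reported a module_layout version mismatch, once per module.
# Assumes every line starts with a "[ timestamp ]" prefix as shown above.
mismatched_modules() {
    sed -n 's/.*\] \([^:]*\): disagrees about version of symbol module_layout.*/\1/p' | sort -u
}

# Usage:
#   dmesg | mismatched_modules
# The state of dkms-managed modules can then be compared with: dkms status
```

For the log excerpt in this report the helper yields two names, nvidia and vboxdrv, which matches the modules that were reinstalled via dkms.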

Regards,
Sven.

-- Package-specific info:
** Version:
Linux version 4.9.0-2-amd64 (debian-kernel@lists.debian.org) (gcc version 6.3.0 20170221 (Debian 6.3.0-8) ) #1 SMP Debian 4.9.13-1 (2017-02-27)

** Command line:
BOOT_IMAGE=/boot/vmlinuz-4.9.0-2-amd64 root=/dev/md0 ro nvidia-drm.modeset=1

** Tainted: POE (12289)
 * Proprietary module has been loaded.
 * Out-of-tree module has been loaded.
 * Unsigned module has been loaded.

** Kernel log:
Unable to read kernel log; any relevant messages should be attached

** Model information
sys_vendor: MSI
product_name: MS-7760
product_version: 1.0
chassis_vendor: MSI
chassis_version: 1.0
bios_vendor: American Megatrends Inc.
bios_version: V1.8
board_vendor: MSI
board_name: X79A-GD65 (8D) (MS-7760)
board_version: 1.0

** Loaded modules:
rpcsec_gss_krb5
auth_rpcgss
nfsv4
dns_resolver
nfs
lockd
grace
fscache
fuse
snd_hrtimer
snd_seq_midi
snd_seq_midi_event
snd_rawmidi
snd_seq
snd_seq_device
pci_stub
vboxpci(OE)
vboxnetadp(OE)
vboxnetflt(OE)
cpufreq_powersave
cpufreq_conservative
cpufreq_userspace
vboxdrv(OE)
dm_crypt
binfmt_misc
intel_rapl
pl2303
usbserial
joydev
snd_hda_codec_hdmi
edac_core
snd_hda_codec_realtek
x86_pkg_temp_thermal
intel_powerclamp
snd_hda_codec_generic
iTCO_wdt
crct10dif_pclmul
iTCO_vendor_support
crc32_pclmul
snd_hda_intel
ghash_clmulni_intel
snd_hda_codec
intel_cstate
snd_hda_core
intel_uncore
snd_hwdep
snd_pcm_oss
intel_rapl_perf
serio_raw
sg
pcspkr
snd_mixer_oss
snd_pcm
mei_me
snd_timer
snd
mei
tpm_tis
soundcore
shpchp
lpc_ich
mfd_core
tpm_tis_core
evdev
sch_fq_codel
f71882fg
coretemp
tpm_rng
tpm
rng_core
ecryptfs
sunrpc
cbc
hmac
encrypted_keys
parport_pc
ppdev
lp
parport
ip_tables
x_tables
autofs4
ext4
crc16
jbd2
fscrypto
mbcache
btrfs
raid10
raid456
async_raid6_recov
async_memcpy
async_pq
async_xor
async_tx
xor
raid6_pq
libcrc32c
raid0
multipath
linear
dm_mirror
dm_region_hash
dm_log
nvidia_uvm(POE)
nvidia_drm(POE)
nvidia_modeset(POE)
nvidia(POE)
drm_kms_helper
drm
dm_mod
raid1
md_mod
sr_mod
cdrom
hid_generic
hid_cherry
usbhid
hid
sd_mod
ahci
libahci
firewire_ohci
crc32c_intel
xhci_pci
firewire_core
ehci_pci
e1000e
libata
aesni_intel
xhci_hcd
ehci_hcd
aes_x86_64
glue_helper
lrw
gf128mul
i2c_i801
ablk_helper
usbcore
ptp
mxm_wmi
crc_itu_t
fjes
pps_core
cryptd
scsi_mod
i2c_smbus
psmouse
usb_common
wmi
button

** PCI devices:
00:00.0 Host bridge [0600]: Intel Corporation Xeon E5/Core i7 DMI2 [8086:3c00] (rev 07)
        Subsystem: Micro-Star International Co., Ltd. [MSI] Xeon E5/Core i7 DMI2 [1462:7760]
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- 

00:01.0 PCI bridge [0604]: Intel Corporation Xeon E5/Core i7 IIO PCI Express Root Port 1a [8086:3c02] (rev 07) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- Reset- FastB2B-
        PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: 
        Kernel driver in use: pcieport
        Kernel modules: shpchp

00:02.0 PCI bridge [0604]: Intel Corporation Xeon E5/Core i7 IIO PCI Express Root Port 2a [8086:3c04] (rev 07) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- Reset- FastB2B-
        PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: 
        Kernel driver in use: pcieport
        Kernel modules: shpchp

00:03.0 PCI bridge [0604]: Intel Corporation Xeon E5/Core i7 IIO PCI Express Root Port 3a in PCI Express Mode [8086:3c08] (rev 07) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- Reset- FastB2B-
        PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: 
        Kernel driver in use: pcieport
        Kernel modules: 

Bug#851481: linux-headers-4.9.0-1-amd64:amd64 not installable with 32bit userland

2017-01-15 Thread Sven Hartge
Source: linux
Version: 4.9.2-2
Severity: normal

Hi!

I am using a mixed system with a 64-bit kernel and a 32-bit userland.

With linux-headers-4.8.0-2-amd64 this worked fine: I was able to install
the kernel headers and use dkms to compile out-of-tree modules such as
xtables-addons.

Now that linux-headers-4.9.0-1-amd64 has switched its dependencies to
gcc-6:amd64, I am no longer able to do so.

Trying to install a 64-bit gcc (and related tools) would effectively
uninstall half of my system.

Is running a 64-bit system with a 32-bit userland no longer supported, or
did I miss something obvious?

Regards,
Sven.

-- System Information:
Debian Release: stretch/sid
  APT prefers unstable-debug
  APT policy: (500, 'unstable-debug'), (500, 'testing-debug'), (500, 'unstable'), (500, 'testing'), (200, 'experimental'), (1, 'experimental-debug')
Architecture: i386 (x86_64)
Foreign Architectures: amd64

Kernel: Linux 4.8.0-2-amd64 (SMP w/4 CPU cores)
Locale: LANG=de_DE.utf8, LC_CTYPE=de_DE.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)


Bug#819390: linux-image-4.5.0-trunk-armmp-lpae: Ethernet on Firefly-RK3288

2016-04-29 Thread Sven Hartge
On Sun, 27 Mar 2016 15:13:16 -0700 Vagrant Cascadian wrote:

> When upgrading to linux 4.5-1~exp1, the ethernet no longer responds
> correctly.  Downgrading back to 4.4.6-1 from sid/stretch fixes the issue.

I just want to add I still see this problem on a Cubietruck with
linux-image-4.5.0-1-armmp-lpae 4.5.1-1.

Only way to get the network back is to downgrade to 4.4.6-1.

Related LKML thread: https://lkml.org/lkml/2016/3/15/63

Regards,
Sven.



Bug#778239: Strange GRE packet forwarding slowness in 3.16.7-ckt2-1~bpo70+1

2015-02-18 Thread Sven Hartge
On 18.02.2015 02:25, Ben Hutchings wrote:
> On Wed, 2015-02-18 at 00:14 +0100, Sven Hartge wrote:
>> On Thu, 12 Feb 2015 16:33:09 +0100 Sven Hartge s...@svenhartge.de wrote:
>>
>>> I have encountered a strange slowness on a router/packetfilter system
>>> (Wheezy 7.8 with backported kernel 3.16.7-ckt2-1~bpo70+1) of mine while
>>> forwarding GRE packets.
>>
>> Could this be the fix for this bug:
>>
>> https://lists.ubuntu.com/archives/kernel-team/2014-December/052158.html
>>
>> gre: Set inner mac header in gro complete
>
> I don't know.
>
>> I cannot confirm this myself right now, as the only systems affected are
>> in production and I am not able to set up a lab installation right now.
>
> As that went into 3.16.7-ckt3, it is therefore included in the current
> packages in testing/unstable.

Well, I've got a window for testing later today. I will report back, if
this release fixes the problem I am seeing.

Regards,
Sven.






Bug#778239: Strange GRE packet forwarding slowness in 3.16.7-ckt2-1~bpo70+1

2015-02-18 Thread Sven Hartge
On 18.02.2015 09:17, Sven Hartge wrote:
> On 18.02.2015 02:25, Ben Hutchings wrote:
>> On Wed, 2015-02-18 at 00:14 +0100, Sven Hartge wrote:
>>> On Thu, 12 Feb 2015 16:33:09 +0100 Sven Hartge s...@svenhartge.de wrote:
>>>
>>>> I have encountered a strange slowness on a router/packetfilter system
>>>> (Wheezy 7.8 with backported kernel 3.16.7-ckt2-1~bpo70+1) of mine while
>>>> forwarding GRE packets.
>>>
>>> Could this be the fix for this bug:
>>>
>>> https://lists.ubuntu.com/archives/kernel-team/2014-December/052158.html
>>>
>>> gre: Set inner mac header in gro complete
>>
>> I don't know.
>>
>>> I cannot confirm this myself right now, as the only systems affected are
>>> in production and I am not able to set up a lab installation right now.
>>
>> As that went into 3.16.7-ckt3, it is therefore included in the current
>> packages in testing/unstable.
>
> Well, I've got a window for testing later today. I will report back, if
> this release fixes the problem I am seeing.

Yes, this patch fixes this problem for me. Should I close the bug or do
you want to do it?

Regards,
Sven.





Bug#778239: Strange GRE packet forwarding slowness in 3.16.7-ckt2-1~bpo70+1

2015-02-17 Thread Sven Hartge
On Thu, 12 Feb 2015 16:33:09 +0100 Sven Hartge s...@svenhartge.de wrote:

> I have encountered a strange slowness on a router/packetfilter system
> (Wheezy 7.8 with backported kernel 3.16.7-ckt2-1~bpo70+1) of mine while
> forwarding GRE packets.

Could this be the fix for this bug:

https://lists.ubuntu.com/archives/kernel-team/2014-December/052158.html

gre: Set inner mac header in gro complete

I cannot confirm this myself right now, as the only systems affected are
in production and I am not able to set up a lab installation right now.

Regards,
Sven.





Bug#778239: Strange GRE packet forwarding slowness in 3.16.7-ckt2-1~bpo70+1

2015-02-12 Thread Sven Hartge
Package: src:linux
Version: 3.16.7-ckt2-1~bpo70+1
Severity: normal

Hi!

I have encountered a strange slowness on a router/packetfilter system
(Wheezy 7.8 with backported kernel 3.16.7-ckt2-1~bpo70+1) of mine while
forwarding GRE packets.

The system in question does not terminate the GRE tunnels itself, it
merely forwards those packets from one interface to another.

This problem occurred after a reboot from a 3.12-bpo to the current
3.16.7-ckt2-1~bpo70+1.

This problem reminded me of
http://lists.openwall.net/netdev/2014/07/08/132 and
http://patchwork.ozlabs.org/patch/324953/ and as described on those
mailing-list exchanges, after disabling GSO and GRO on the involved
interfaces, the forwarding speed for GRE packets was back to normal.

What puzzles me is the fact that this bug should have been fixed since
3.14.x and 3.15.x, and sure enough, I can find the code changes from the
patchwork patch in the current kernel code in the Debian package.

But the symptoms are the same as described: with active GRO/GSO the GRE
tunnels max out at about 200kbit/s, after disabling GRO/GSO the possible
bandwidth is only limited by the speed of the interfaces and connections
involved.

The hardware is like this:

14:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)
14:00.1 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)
0e:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)
0e:00.1 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)

ixgbe :14:00.0: Multiqueue Enabled: Rx Queue count = 16, Tx Queue count = 16
ixgbe :14:00.0: PCI Express bandwidth of 32GT/s available
ixgbe :14:00.0: (Speed:5.0GT/s, Width: x8, Encoding Loss:20%)
ixgbe :14:00.0: MAC: 2, PHY: 14, SFP+: 5, PBA No: FF-0FF
ixgbe :14:00.0: 00:10:f3:33:d3:e8
ixgbe :14:00.0: Intel(R) 10 Gigabit Network Connection

I created two LACP bonds (bond0 and bond1) out of two interfaces and on
top of those two bonds there are several VLANs. The GRE packets are
forwarded from a VLAN on bond0 to a VLAN on bond1.

To regain the full bandwidth I disabled GSO and GRO on each of the four
slave interfaces and both bond interfaces.
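
The workaround described above can be sketched as a dry-run helper that prints one ethtool call per interface. This is an illustration only: the function name is hypothetical, and the slave interface names in the usage line (eth0 to eth3) are assumptions, since the report names only bond0 and bond1.

```shell
#!/bin/sh
# Print (do not execute) the command that turns GRO and GSO off for
# each interface name given as an argument.
offload_off_cmds() {
    for ifname in "$@"; do
        printf 'ethtool -K %s gro off gso off\n' "$ifname"
    done
}

# Usage (slave names are assumed, adjust to the real system):
#   offload_off_cmds eth0 eth1 eth2 eth3 bond0 bond1        # review
#   offload_off_cmds eth0 eth1 eth2 eth3 bond0 bond1 | sh   # apply as root
# The current offload state can be inspected with: ethtool -k bond0
```

Settings applied this way do not survive a reboot, so they would have to be repeated from the network configuration to be permanent.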

Regards,
Sven.










Bug#515741: flock() error with ocfs2/dlm

2009-07-21 Thread Sven Hartge
Moritz Muehlenhoff wrote:
> On Tue, Feb 17, 2009 at 03:21:51PM +0100, Sven Hartge wrote:
>> At 11:58 on 17.02.09, Sven Hartge wrote:
>>
>>> Right now I am locally rebuilding the Lenny kernel with the patches from 
>>> 2b83256407687613e906bee93d98a25339128a4d..7b791d68562e4ce5ab57cbacb10a1ad4ee33956e
>>> to see if this solves this problem.
>>
>> OK, applying the whole dlm-fixes-series changes the ABI, so I am building 
>> a test-kernel with only 7b791d68562e4ce5ab57cbacb10a1ad4ee33956e right 
>> now.
>>
>> As this bug is more likely to trigger with more than two nodes, this will 
>> seriously affect larger setups.
>
> What was the result of your tests? Did 
> 7b791d68562e4ce5ab57cbacb10a1ad4ee33956e
> fix the problem?

I'm unfortunately not directly the administrator of the affected systems.

I'm Cc'ing Marc Kowal, who might be able to provide the necessary feedback.

Marc: Please have a look at the bug
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=515741. This is about
the locking problem you experienced with the 4-node eStudy cluster and
ocfs2.

-- 
Regards,
Sven Hartge


  - Service Development, Server Administration -
IT-Services FH Gießen-Friedberg (Gießen site)
   http://www.dvz.fh-giessen.de

Phone: +49 641 309-1291
Fax: +49 641 309-1288




--
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#515741: flock() error with ocfs2/dlm

2009-02-17 Thread Sven Hartge
Package: linux-image-2.6.26-1-amd64
Version: 2.6.26-13
Severity: normal
Tags: patch

Hi!

One of our users was hit by the problems described in the 
thread http://www.mail-archive.com/ocfs2-us...@oss.oracle.com/msg02915.html

Is there any chance to get this investigated and the patch/patches
eventually integrated into an updated kernel for Lenny?

Right now I am locally rebuilding the Lenny kernel with the patches from 
2b83256407687613e906bee93d98a25339128a4d..7b791d68562e4ce5ab57cbacb10a1ad4ee33956e
to see if this solves this problem.

Regards,
Sven.

-- System Information:
Debian Release: 5.0
  APT prefers unstable
  APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: i386 (i686)

Kernel: Linux 2.6.26.3-210
Locale: lang=de...@euro, lc_ctype=de...@euro (charmap=ISO-8859-15)
Shell: /bin/sh linked to /bin/bash






Bug#515741: flock() error with ocfs2/dlm

2009-02-17 Thread Sven Hartge
At 11:58 on 17.02.09, Sven Hartge wrote:

> Right now I am locally rebuilding the Lenny kernel with the patches from 
> 2b83256407687613e906bee93d98a25339128a4d..7b791d68562e4ce5ab57cbacb10a1ad4ee33956e
> to see if this solves this problem.

OK, applying the whole dlm-fixes-series changes the ABI, so I am building 
a test-kernel with only 7b791d68562e4ce5ab57cbacb10a1ad4ee33956e right 
now.

As this bug is more likely to trigger with more than two nodes, this will 
seriously affect larger setups.

Regards,
Sven.






Bug#324583: linux-patch-debian-2.6.12: missing default version numbers in unpatch/apply script break make-kpkg

2005-08-22 Thread Sven Hartge
Package: linux-patch-debian-2.6.12
Version: 2.6.12-5
Severity: important


Somehow the @upstream@ and @version@ macros didn't get expanded in
debian/bin/unpatch and debian/bin/apply which causes

  PATCH_THE_KERNEL=YES make-kpkg clean

to fail with the following message:

/usr/bin/make -f /usr/share/kernel-package/rules unpatch_now
make[2]: Entering directory `/usr/src/linux-source-2.6.12'
for patch in /usr/src/kernel-patches/all/2.6.12/unpatch/debian ; do \
  if test -x  $patch; then\
  if $patch; then \
  echo Removed Patch $patch ;   \
  else \
   echo Patch $patch  failed.;  \
   echo Hit return to Continue;  \
   read ans;   \
  fi;  \
  fi;  \
done
/usr/src/kernel-patches/all/2.6.12/unpatch/debian: line 8: /usr/src/kernel-patches/all//apply/debian: No such file or directory
Patch /usr/src/kernel-patches/all/2.6.12/unpatch/debian  failed.
Hit return to Continue

(Note the missing version between /all/ and /apply/ in the error message.)

Building a kernel 

  PATCH_THE_KERNEL=YES make-kpkg kernel-image

fails with

for patch in /usr/src/kernel-patches/all/2.6.12/apply/debian ; do\
  if test -x  $patch; then\
  if $patch; then \
  echo Patch $patch processed fine; \
  echo $patch  applied_patches;   \
  else \
   echo Patch $patch  failed.;  \
   echo Hit return to Continue;  \
   read ans;   \
  fi;  \
  fi;  \
done
E: Can't patch to nonexistent revision  (wait until 2006)
Patch /usr/src/kernel-patches/all/2.6.12/apply/debian  failed.
Hit return to Continue

Changing line 158 in /usr/src/kernel-patches/all/2.6.12/apply/debian from

  version=${override_version:-}

to

  version=${override_version:-2.6.12-5}

and line 6 in /usr/src/kernel-patches/all/2.6.12/unpatch/debian

  upstream=${override_upstream:-}

to

  upstream=${override_upstream:-2.6.12}

fixes the problem.

Of course the real problem lies somewhere in the build process for this
patch, since @upstream@ and @version@ were expanded to empty strings
instead of the correct values.
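
The failure mode can be reproduced with the shell's default-value expansion alone. This is a minimal sketch with a hypothetical function name: it mimics the generated line by taking the build-time default as an argument, which is empty in the broken package and 2.6.12-5 in the corrected one.

```shell
#!/bin/sh
# Mimic "version=${override_version:-DEFAULT}" from the generated script.
# $1 is the default substituted at package build time; in the broken
# package it is the empty string.
pick_version() {
    default=$1
    echo "${override_version:-$default}"
}

# Usage:
#   unset override_version
#   pick_version ""          # prints an empty line (broken package)
#   pick_version 2.6.12-5    # prints 2.6.12-5 (after the manual fix)
```

With an empty result, any path built from the value loses a component, which is consistent with the /all//apply/ path in the error message above.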


-- System Information:
Debian Release: testing/unstable
  APT prefers unstable
  APT policy: (990, 'unstable')
Architecture: i386 (i686)
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.6.12-251
Locale: [EMAIL PROTECTED], [EMAIL PROTECTED] (charmap=ISO-8859-15)

Versions of packages linux-patch-debian-2.6.12 depends on:
ii  bash        3.0-15   The GNU Bourne Again SHell
ii  bzip2       1.0.2-8  high-quality block-sorting file co
ii  grep-dctrl  2.6.7    Grep Debian package information
ii  patch       2.5.9-2  Apply a diff file to an original

linux-patch-debian-2.6.12 recommends no packages.

-- no debconf information

