Bug#931781: rsync: Buster hangs when rsyncing large (400M) files over ssh. Same hardware works OK with Stretch

2019-07-13 Thread hfvk

Paul Slootman wrote on 13.7.2019 14:49:

Seeing the error messages you are getting, it sounds like there is a
memory shortage, possibly the vmware ballooning driver is failing to
provide sufficient memory in time.

It does not look like rsync is to blame for your problems.


Paul


Indeed, I have investigated this further.

The issue seems to be related to the combination of the
megaraid_sas driver, the RAID controller firmware, and the Linux kernel
I am using. I have now reproduced this issue on both Debian 10 and
Ubuntu 18.04.
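
When reporting a driver/firmware/kernel interaction like this, it helps to record the exact versions involved. The following is a minimal sketch, not taken from the original report: the function name report_storage_stack is hypothetical, and it assumes modinfo is installed and that the driver is built as a module named megaraid_sas.

```shell
# Hypothetical helper: print the running kernel release and the version
# string of the megaraid_sas module, if modinfo can find it.
report_storage_stack() {
    printf 'kernel: %s\n' "$(uname -r)"
    if command -v modinfo >/dev/null 2>&1 &&
       ver=$(modinfo -F version megaraid_sas 2>/dev/null); then
        printf 'megaraid_sas: %s\n' "$ver"
    else
        # modinfo is missing or the module is not present on this machine
        printf 'megaraid_sas: not available\n'
    fi
}

report_storage_stack
```

The controller firmware version is not covered here; it usually has to be read with the vendor's own management tool.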


I will continue the examination and report to the correct forum.

I apologize for the erroneous bug report on rsync. This bug can be 
closed.


Antti



Bug#931781: rsync: Buster hangs when rsyncing large (400M) files over ssh. Same hardware works OK with Stretch

2019-07-12 Thread hfvk

Antti wrote on 10.7.2019 13:45:

Package: rsync
Version: 3.1.3-6
Severity: critical
Justification: breaks unrelated software



-- System Information:
Debian Release: 10.0
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.19.0-5-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8),
LANGUAGE=en_US:en (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages rsync depends on:
ii  base-files           10.3
ii  init-system-helpers  1.56+nmu1
ii  libacl1              2.2.53-4
ii  libattr1             1:2.4.48-4
ii  libc6                2.28-10
ii  libpopt0             1.16-12
ii  lsb-base             10.2019051400

rsync recommends no packages.

Versions of packages rsync suggests:
ii  openssh-client  1:7.9p1-10
ii  openssh-server  1:7.9p1-10

-- no debconf information

I am rsyncing files with the following command:
/usr/bin/rsync --progress -avzse ssh 'user@host:/path/a/' '/path/b/'
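
For readers unfamiliar with the bundled short options, the sketch below spells out what -avzse ssh expands to. The function name show_rsync_cmd is hypothetical, and it only prints the command instead of running it; the source and destination default to the placeholder paths used above.

```shell
# Print (do not run) the rsync invocation with each bundled option split out.
#   --progress  show per-file transfer progress
#   -a          archive mode (recursive, preserves permissions, times, links)
#   -v          verbose output
#   -z          compress file data during the transfer
#   -s          --protect-args: the remote shell does not split arguments
#   -e ssh      use ssh as the remote shell
show_rsync_cmd() {
    src=${1:-user@host:/path/a/}
    dst=${2:-/path/b/}
    set -- /usr/bin/rsync --progress -a -v -z -s -e ssh "$src" "$dst"
    printf '%s\n' "$*"
}

show_rsync_cmd
```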

I ran this command successfully on Debian Stretch before the
upgrade.


I updated to Debian Buster and now the system hangs when syncing large
(400M) files using the above command. I have also tried a clean Buster
installation, but the problem persists.

Initially I suspected a hardware issue, but I tested the disks
with dd (on Buster) and even heavy load did not cause any issues.
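
The exact dd invocation used for that test is not given in the report. As an illustration only, a write test of comparable size could look like the sketch below; the function name dd_write_test, the scratch directory, and the 400 MiB default are assumptions.

```shell
# Hypothetical helper: write COUNT MiB of zeros to a scratch file in DIR,
# force the data to disk, print the resulting size in bytes, and clean up.
dd_write_test() {
    dir=$1
    count=${2:-400}
    file=$(mktemp "$dir/ddtest.XXXXXX") || return 1
    # conv=fsync makes dd flush the file to disk before it exits
    dd if=/dev/zero of="$file" bs=1M count="$count" conv=fsync 2>/dev/null
    wc -c < "$file"
    rm -f "$file"
}

# Example: dd_write_test /path/b        (writes 400 MiB under /path/b)
```

A test like this only exercises sequential writes of zeros, so passing it does not rule out problems that appear under a different I/O pattern.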

This problem seems to be related to rsync and large files. Rsync with
small files works OK.

Disabling AppArmor on Buster does not help.

Symptoms:
After issuing the command, rsync starts OK. Even large files
transfer OK at this point.
After a while, the CPU load on the system goes to 100% and the system
becomes unresponsive.
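
Because the machine becomes unresponsive, it can be useful to log load and memory figures to disk while the transfer runs, so the last samples before the hang survive. This is a rough sketch under assumptions: the function name sample_load and the log path are hypothetical, and it relies on the Linux /proc interface.

```shell
# Hypothetical helper: print one timestamped sample of the load averages
# and the kernel's MemAvailable estimate (in kB).
sample_load() {
    load=$(cut -d' ' -f1-3 /proc/loadavg)
    mem=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
    printf '%s load=%s memavail_kb=%s\n' "$(date +%T)" "$load" "$mem"
}

# Example loop (LOG is an assumed path): append a sample every 5 seconds
# and call sync so the log reaches the disk before a possible hang.
# LOG=/var/tmp/rsync-hang.log
# while :; do sample_load >> "$LOG"; sync; sleep 5; done

sample_load
```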

I am running the system on a VMware ESXi 6.7 U2 host and the system is
using a Broadcom 9440-8i RAID controller.

If I switch from Buster to Stretch, everything works OK.

Typically nothing can be seen in dmesg, but once I was able to
capture the following:

Jul 10 07:27:00 nasunas kernel: BUG: unable to handle kernel paging request at 00f0416baec0
Jul 10 07:27:00 nasunas kernel: PGD 0 P4D 0
Jul 10 07:27:00 nasunas kernel: Oops:  [#1] SMP PTI
Jul 10 07:27:00 nasunas kernel: CPU: 0 PID: 1053 Comm: rsync Not tainted 4.19.0-5-amd64 #1 Debian 4.19.37-5
Jul 10 07:27:00 nasunas kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
Jul 10 07:27:00 nasunas kernel: RIP: 0010:__radix_tree_lookup+0x4b/0xe0
Jul 10 07:27:00 nasunas kernel: Code: 48 83 e0 fe 0f b6 08 48 89 d8 48 d3 e0 48 83 e8 01 48 39 c6 0f 87 9e 00 00 00 49 83 f8 01 74 c9 4d 89 c1 48 89 f0 49 83 e1 fe $
Jul 10 07:27:00 nasunas kernel: RSP: 0018:9cba41893c28 EFLAGS: 00010206
Jul 10 07:27:00 nasunas kernel: RAX: b580 RBX: 0040 RCX:
Jul 10 07:27:00 nasunas kernel: RDX:  RSI: b580 RDI: 88b8ef932420
Jul 10 07:27:00 nasunas kernel: RBP: 9cba41893c40 R08: 00f0416baec1 R09: 00f0416baec0
Jul 10 07:27:00 nasunas kernel: R10: 88b844d82dd8 R11: 0001 R12: 006200ca
Jul 10 07:27:00 nasunas kernel: R13: 88b8ef932418 R14: b580 R15: b8675de0
Jul 10 07:27:00 nasunas kernel: FS:  7fa4bf1d4b80() GS:88b8fba0() knlGS:
Jul 10 07:27:00 nasunas kernel: CS:  0010 DS:  ES:  CR0: 80050033
Jul 10 07:27:00 nasunas kernel: CR2: 00f0416baec0 CR3: 00012ed28005 CR4: 003606f0
Jul 10 07:27:00 nasunas kernel: DR0:  DR1:  DR2:
Jul 10 07:27:00 nasunas kernel: DR3:  DR6: fffe0ff0 DR7: 0400
Jul 10 07:27:00 nasunas kernel: Call Trace:
Jul 10 07:27:00 nasunas kernel:  radix_tree_lookup_slot+0x1e/0x50
Jul 10 07:27:00 nasunas kernel:  find_get_entry+0x19/0xf0
Jul 10 07:27:00 nasunas kernel:  pagecache_get_page+0x30/0x2c0
Jul 10 07:27:00 nasunas kernel:  ? jbd2_journal_stop+0xef/0x3c0 [jbd2]
Jul 10 07:27:00 nasunas kernel:  grab_cache_page_write_begin+0x1f/0x40
Jul 10 07:27:00 nasunas kernel:  ext4_da_write_begin+0xce/0x470 [ext4]
Jul 10 07:27:00 nasunas kernel:  generic_perform_write+0xf4/0x1b0
Jul 10 07:27:00 nasunas kernel:  ? file_update_time+0xed/0x130
Jul 10 07:27:00 nasunas kernel:  __generic_file_write_iter+0xfe/0x1c0
Jul 10 07:27:00 nasunas kernel:  ext4_file_write_iter+0xc6/0x400 [ext4]
Jul 10 07:27:00 nasunas kernel:  new_sync_write+0xfb/0x160
Jul 10 07:27:00 nasunas kernel:  vfs_write+0xa5/0x1a0
Jul 10 07:27:00 nasunas kernel:  ksys_write+0x4f/0xb0
Jul 10 07:27:00 nasunas kernel:  do_syscall_64+0x53/0x110
Jul 10 07:27:00 nasunas kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Jul 10 07:27:00 nasunas kernel: RIP: 0033:0x7fa4bf2c0504
Jul 10 07:27:00 nasunas kernel: Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff
ff ff eb b3 0f 1f 80 00 00 00 00 48 8d 05 f9 61 0d 00 8b 00 85 c0 75
13 b8 01 00 00 00 0f 05