Bug#939170: linux-image-5.2.0-2-amd64: does not suspend completely, locks up

2019-09-05 Thread Moritz Schlarb
Source: linux
Version: 5.2.9-2
Followup-For: Bug #939170

Hi everyone,

I'm seeing the same issue with linux-image-5.2.0-2-amd64=5.2.9-2 on a Lenovo
ThinkPad X1 Carbon 4th Gen.

Best regards,
Moritz



-- System Information:
Debian Release: bullseye/sid
  APT prefers testing
  APT policy: (990, 'testing'), (800, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 5.2.0-2-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled



Bug#898165: Regression in [v2] nfs: Fix ugly referral attributes ?

2018-05-18 Thread Moritz Schlarb
Control: tags -1 + upstream patch
Control: severity -1 grave
Control: summary -1 0
Control: outlook -1 0

3.16.54 introduced a regression by including "nfs: Fix ugly referral
attributes" but not "nfs: Fetch MOUNTED_ON_FILEID when updating an
inode". Please include that other patch as well, so that NFS referrals work again.

The required patch is
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ea96d1ecbe4fcb1df487d99309d3157b4ff5fc02

Best regards,
Moritz

On 17.05.2018 16:15, Chuck Lever wrote:

> Just a shot in the dark: Wondering if v3.16 needs
> 
> commit ea96d1ecbe4fcb1df487d99309d3157b4ff5fc02
> Author: Anna Schumaker 
> AuthorDate: Fri Apr 3 14:35:59 2015 -0400
> Commit: Trond Myklebust 
> CommitDate: Thu Apr 23 14:43:54 2015 -0400
> 
> nfs: Fetch MOUNTED_ON_FILEID when updating an inode



Bug#898165: Regression in [v2] nfs: Fix ugly referral attributes ?

2018-05-17 Thread Moritz Schlarb
Hi Chuck,

On 17.05.2018 16:15, Chuck Lever wrote:

> Just a shot in the dark: Wondering if v3.16 needs
> 
> commit ea96d1ecbe4fcb1df487d99309d3157b4ff5fc02
> Author: Anna Schumaker 
> AuthorDate: Fri Apr 3 14:35:59 2015 -0400
> Commit: Trond Myklebust 
> CommitDate: Thu Apr 23 14:43:54 2015 -0400
> 
> nfs: Fetch MOUNTED_ON_FILEID when updating an inode

Gosh, it seems you're right!
When I take that patch and apply it, the referrals are being followed again!

Thanks for your idea!
Now how do we make sure it gets applied soonish?

Regards,
Moritz



Bug#898165: Regression in [v2] nfs: Fix ugly referral attributes ?

2018-05-17 Thread Moritz Schlarb
Hi everyone,

there might be a regression coming from this patch.
Since it was included in 3.16.54, our clients running a recent 3.16
kernel (such as the one from Debian jessie-security) no longer follow NFS 4.1
referrals (issued by nfs-ganesha).
I have built that exact Debian kernel package with just this patch
reverted and it worked again, so I am fairly confident that this patch
is at least strongly related to the problem.
Pradeep also confirmed the problem happening in 3.16.54 but not in 3.16.51.
Interestingly, this does *not* happen with 4.9 kernels, although the
patch was part of 4.9.80...

I have attached a pcap file from a machine running 3.16.56-1+deb8u1. In it, I
try to log in as a user whose home directory is
/uni-mainz.de/homes/schlarbm (with nfsrefer.zdv.uni-mainz.de:/ mounted on
/uni-mainz.de). That path is referred to
fs02.uni-mainz.de:/vol/ma17/homes/schlarbm, but the client does not follow
the referral.

Please let me know if you need additional information to reproduce or
have suggestions on what we could try.

Best regards,
Moritz

On 05.11.2017 21:45, Chuck Lever wrote:
> Before traversing a referral and performing a mount, the mounted-on
> directory looks strange:
> 
> dr-xr-xr-x. 2 4294967294 4294967294 0 Dec 31  1969 dir.0
> 
> nfs4_get_referral is wiping out any cached attributes with what was
> returned via GETATTR(fs_locations), but the bit mask for that
> operation does not request any file attributes.
> 
> Retrieve owner and timestamp information so that the memcpy in
> nfs4_get_referral fills in more attributes.
> 
> Changes since v1:
> - Don't request attributes that the client unconditionally replaces
> - Request only MOUNTED_ON_FILEID or FILEID attribute, not both
> - encode_fs_locations() doesn't use the third bitmask word
> 
> Fixes: 6b97fd3da1ea ("NFSv4: Follow a referral")
> Suggested-by: Pradeep Thomas 
> Signed-off-by: Chuck Lever 
> Cc: sta...@vger.kernel.org
> ---
>  fs/nfs/nfs4proc.c |   18 ++++++++----------
>  1 file changed, 8 insertions(+), 10 deletions(-)
> 
> I could send this as an incremental, but that just seems to piss
> off distributors, who will just squash them all together anyway.
> 
> 
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 6c61e2b..2662879 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -254,15 +254,12 @@ static int nfs4_map_errors(int err)
>  };
>  
>  const u32 nfs4_fs_locations_bitmap[3] = {
> - FATTR4_WORD0_TYPE
> - | FATTR4_WORD0_CHANGE
> + FATTR4_WORD0_CHANGE
>   | FATTR4_WORD0_SIZE
>   | FATTR4_WORD0_FSID
>   | FATTR4_WORD0_FILEID
>   | FATTR4_WORD0_FS_LOCATIONS,
> - FATTR4_WORD1_MODE
> - | FATTR4_WORD1_NUMLINKS
> - | FATTR4_WORD1_OWNER
> + FATTR4_WORD1_OWNER
>   | FATTR4_WORD1_OWNER_GROUP
>   | FATTR4_WORD1_RAWDEV
>   | FATTR4_WORD1_SPACE_USED
> @@ -6763,9 +6760,7 @@ static int _nfs4_proc_fs_locations(struct rpc_clnt *client, struct inode *dir,
>  struct page *page)
>  {
>   struct nfs_server *server = NFS_SERVER(dir);
> - u32 bitmask[3] = {
> - [0] = FATTR4_WORD0_FSID | FATTR4_WORD0_FS_LOCATIONS,
> - };
> + u32 bitmask[3];
>   struct nfs4_fs_locations_arg args = {
>   .dir_fh = NFS_FH(dir),
>   .name = name,
> @@ -6784,12 +6779,15 @@ static int _nfs4_proc_fs_locations(struct rpc_clnt *client, struct inode *dir,
>  
>   dprintk("%s: start\n", __func__);
>  
> + bitmask[0] = nfs4_fattr_bitmap[0] | FATTR4_WORD0_FS_LOCATIONS;
> + bitmask[1] = nfs4_fattr_bitmap[1];
> +
>   /* Ask for the fileid of the absent filesystem if mounted_on_fileid
>    * is not supported */
>   if (NFS_SERVER(dir)->attr_bitmask[1] & FATTR4_WORD1_MOUNTED_ON_FILEID)
> - bitmask[1] |= FATTR4_WORD1_MOUNTED_ON_FILEID;
> + bitmask[0] &= ~FATTR4_WORD0_FILEID;
>   else
> - bitmask[0] |= FATTR4_WORD0_FILEID;
> + bitmask[1] &= ~FATTR4_WORD1_MOUNTED_ON_FILEID;
>  
>   nfs_fattr_init(&fs_locations->fattr);
>   fs_locations->server = server;
> 
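
For readers following the thread, the FILEID / MOUNTED_ON_FILEID selection in the last hunk of the quoted patch can be tried out in user space. This is only a minimal sketch, not the kernel code: the helper name build_fs_locations_mask() is hypothetical, the base mask stands in for nfs4_fattr_bitmap, and the bit positions are assumed from the NFSv4 attribute numbers (fileid = 20, fs_locations = 24, mounted_on_fileid = 55, i.e. bit 23 of the second word). Unlike the patch, the sketch also zeroes the third word.

```c
#include <stdint.h>

/* Assumed bit positions, derived from the NFSv4 attribute numbers. */
#define WORD0_FILEID            (1u << 20)
#define WORD0_FS_LOCATIONS      (1u << 24)
#define WORD1_MOUNTED_ON_FILEID (1u << 23)

/*
 * Hypothetical helper mirroring the patch: start from a base attribute
 * mask, always request fs_locations, then drop either FILEID or
 * MOUNTED_ON_FILEID so that only one of the two is requested.
 *
 * base         - stand-in for the first two words of nfs4_fattr_bitmap
 * server_word1 - second word of the server's supported-attribute mask
 * out          - resulting request mask (third word is set to 0 here)
 */
static void build_fs_locations_mask(const uint32_t base[2],
                                    uint32_t server_word1,
                                    uint32_t out[3])
{
    out[0] = base[0] | WORD0_FS_LOCATIONS;
    out[1] = base[1];
    out[2] = 0;

    if (server_word1 & WORD1_MOUNTED_ON_FILEID)
        out[0] &= ~WORD0_FILEID;            /* keep MOUNTED_ON_FILEID */
    else
        out[1] &= ~WORD1_MOUNTED_ON_FILEID; /* fall back to FILEID */
}
```

With a base mask that contains both bits, a server advertising MOUNTED_ON_FILEID yields a request without FILEID, and a server that does not advertise it yields a request without MOUNTED_ON_FILEID.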


nfs-referral-broken.pcap
Description: application/vnd.tcpdump.pcap

signature.asc
Description: OpenPGP digital signature


Bug#898165: linux-image-3.16.0-6-amd64: can't mount NFS shares via nfs referrals

2018-05-14 Thread Moritz Schlarb
Hi Pradeep,

thanks for your response.

On 14.05.2018 17:48, Pradeep wrote:
> The patch is for NFS client side bug where it was initializing the
> attributes to zero if NFS4ERR_MOVED is returned in LOOKUP; but referral
> was not followed later. This only happens with NFSv4 server and the
> specific error (NFS4ERR_MOVED). 
> 
> It is not related to nfs-ganesha - it can be reproduced with kernel NFS
> as well.
> 
> Are you seeing any regressions with the patch?

I would think so.
Since that patch arrived in kernel 3.16, the client does not even try to
follow the referral as it did before. When I revert just this specific
patch, it works.

On the referrer server, we use nfs-ganesha 2.4.5-2 with Christoph's
patch for nfs referral
(https://sources.debian.org/src/nfs-ganesha/2.4.5-2%7Ebpo9+1/debian/patches/nfs-ganesha-nfsrefer.patch).
The actual NFS server is a NetApp cluster.

Right now I'm not sure whether this might be a bug in nfs-ganesha (which
may even have been fixed in the meantime), so I thought you might know.

Thanks,
Moritz





Bug#898165: linux-image-3.16.0-6-amd64: can't mount NFS shares via nfs referrals

2018-05-14 Thread Moritz Schlarb
Hello Frank and Pradeep,

I was hoping that you would have some insight on a possible
bug/regression/incompatibility between nfs-ganesha and the Linux kernel
with a specific patch to which you reacted (see below) in
https://marc.info/?l=linux-nfs&m=150998968529002&w=2.

There is no mail about the results of Pradeep's checking whether that
patch is safe for nfs-ganesha on the server side, or whether there were
additional changes needed. Maybe one of you could shed some light on that.

I've created a tracking Debian bug report for our issue:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=898165

Best regards,
Moritz

On 14.05.2018 11:05, Moritz Schlarb wrote:
> Control: tags -1 + patch upstream
> Control: notfound -1 linux/3.16.51-3+deb8u1
> 
> Hi everyone,
> 
> I have identified the upstream commit that introduced this
> bug/regression for us.
> 
> It is c05cefcc72416a37eba5a2b35f0704ed758a9145 "nfs: Fix ugly referral
> attributes"
> (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c05cefcc72416a37eba5a2b35f0704ed758a9145)
> which seems to have been part of upstream 3.16.54.
> 
> I have manually compiled 3.16.56-1+deb8u1 with that patch reversed and I
> can successfully mount my home directory again.
> 
> Regards,
> 

-- 
Moritz Schlarb
Unix-Gruppe | Systembetreuung
Zentrum für Datenverarbeitung
Johannes Gutenberg-Universität Mainz
Raum 01-331 - Tel. +49 6131 39-29441
OpenPGP Fingerprint: DF01 2247 BFC6
5501 AFF2 8445 0C24 B841 C7DD BAAF


Bug#898165: linux-image-3.16.0-6-amd64: can't mount NFS shares via nfs referrals

2018-05-14 Thread Moritz Schlarb
Control: tags -1 + patch upstream
Control: notfound -1 linux/3.16.51-3+deb8u1

Hi everyone,

I have identified the upstream commit that introduced this
bug/regression for us.

It is c05cefcc72416a37eba5a2b35f0704ed758a9145 "nfs: Fix ugly referral
attributes"
(https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c05cefcc72416a37eba5a2b35f0704ed758a9145)
which seems to have been part of upstream 3.16.54.

I have manually compiled 3.16.56-1+deb8u1 with that patch reversed and I
can successfully mount my home directory again.

Regards,
-- 
Moritz Schlarb
Unix-Gruppe | Systembetreuung
Zentrum für Datenverarbeitung
Johannes Gutenberg-Universität Mainz
Raum 01-331 - Tel. +49 6131 39-29441
OpenPGP Fingerprint: DF01 2247 BFC6
5501 AFF2 8445 0C24 B841 C7DD BAAF


Bug#898165: linux-image-3.16.0-6-amd64: can't mount NFS shares via nfs referrals

2018-05-11 Thread Moritz Schlarb
Hello again,

we tried to get some insight into the issue by wiresharking while trying
to mount.

Although the client first receives NFS4ERR_MOVED, then re-queries for
FS_Locations and receives a correct response for the referred
fs_location, it does not go on to mount that fs_location as it
used to.

Regards,
Moritz


Bug#898165: linux-image-3.16.0-6-amd64: can't mount NFS shares via nfs referrals

2018-05-08 Thread Moritz Schlarb
Hi everyone,

we have performed additional tests that led to the conclusion that this
bug already existed in 3.16.0-5-amd64, but not in 3.16.0-4-amd64.
Given that, it must have been caused by one of the changes in
3.16.51-3+deb8u1, of which there are luckily only a few.
I hope it's not fallout from the KPTI patch, so the only other thing that
seems relevant (since we're using Kerberos) would be:

>  * KEYS: add missing permission check for request_key() destination
>    (CVE-2017-17807)

Does that seem valid?

Regards,
-- 
Moritz Schlarb
Unix-Gruppe | Systembetreuung
Zentrum für Datenverarbeitung
Johannes Gutenberg-Universität Mainz
Raum 01-331 - Tel. +49 6131 39-29441
OpenPGP Fingerprint: DF01 2247 BFC6
5501 AFF2 8445 0C24 B841 C7DD BAAF


Bug#898165: linux-image-3.16.0-6-amd64: can't mount NFS shares via nfs referrals

2018-05-08 Thread Moritz Schlarb
Package: src:linux
Version: 3.16.56-1
Severity: important

Control: fixed -1 linux/4.9.88-1~bpo8+1
Control: fixed -1 linux/4.9.88-1

Hello,

after installing the latest stable security kernel version on one of our
NFS clients, that client can no longer mount our user home directories via
our NFS referrer server.

This problem is only similar to
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=850713
but (for us) far more severe, since *this* bug affects home directories.

Our workaround is to install the latest kernel from jessie-backports,
which does not have this problem.

The logs show nothing at the time of login, when the home directory
should be mounted.

Only a few NFS-related patches are listed in the package changelog; if
you could point us to a specific one, we could try to bisect it.

Regards,
Moritz

-- Package-specific info:
** Kernel log: boot messages should be attached

** Model information
sys_vendor: Dell Inc.
product_name: OptiPlex 7010
product_version: 01
chassis_vendor: Dell Inc.
chassis_version: 
bios_vendor: Dell Inc.
bios_version: A28
board_vendor: Dell Inc.
board_name: 0GY6Y8
board_version: A03

** PCI devices:
00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor DRAM Controller [8086:0150] (rev 09)
        Subsystem: Dell Device [1028:0577]
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR-
        Kernel driver in use: ivb_uncore

00:02.0 VGA compatible controller [0300]: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor Graphics Controller [8086:0162] (rev 09) (prog-if 00 [VGA controller])
        Subsystem: Dell Device [1028:0577]
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR-
        Kernel driver in use: i915

00:14.0 USB controller [0c03]: Intel Corporation 7 Series/C210 Series Chipset Family USB xHCI Host Controller [8086:1e31] (rev 04) (prog-if 30 [XHCI])
        Subsystem: Dell Device [1028:0577]
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR-
        Kernel driver in use: xhci_hcd

00:16.0 Communication controller [0780]: Intel Corporation 7 Series/C210 Series Chipset Family MEI Controller #1 [8086:1e3a] (rev 04)
        Subsystem: Dell Device [1028:0577]
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
        Kernel driver in use: mei_me

00:19.0 Ethernet controller [0200]: Intel Corporation 82579LM Gigabit Network Connection [8086:1502] (rev 04)
        Subsystem: Dell Device [1028:052c]
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
        Kernel driver in use: e1000e

00:1a.0 USB controller [0c03]: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #2 [8086:1e2d] (rev 04) (prog-if 20 [EHCI])
        Subsystem: Dell Device [1028:0577]
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR-
        Kernel driver in use: ehci-pci

00:1b.0 Audio device [0403]: Intel Corporation 7 Series/C210 Series Chipset Family High Definition Audio Controller [8086:1e20] (rev 04)
        Subsystem: Dell Device [1028:0577]
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
        Kernel driver in use: snd_hda_intel

00:1d.0 USB controller [0c03]: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #1 [8086:1e26] (rev 04) (prog-if 20 [EHCI])
        Subsystem: Dell Device [1028:0577]
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR-
        Kernel driver in use: ehci-pci

00:1e.0 PCI bridge [0604]: Intel Corporation 82801 PCI Bridge [8086:244e] (rev a4) (prog-if 01 [Subtractive decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort-
        Reset- FastB2B-
        PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities:

00:1f.0 ISA bridge [0601]: Intel Corporation Q77 Express Chipset LPC 

Bug#869939: [Hyper-V] Feature request: pick up PTP Hyper-V timesync source from upstream 4.12

2017-09-08 Thread Moritz Schlarb
Control: found -1 linux/4.9.30-2
Control: found 85 linux/4.9.30-2
Control: block 85 by -1

I strongly second this request!
In stretch-backports, we have 4.12 now, but as we have quite a lot
of Jessie machines, we are hoping for a solution for jessie, too (even
if that means having 4.12 in jessie-backports-sloppy).

FWIW, here is the same request from Josh in Ubuntu:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1676635
which was already closed some time ago.

ATM, Debian is not listed as supporting "Windows Server 2016 Accurate
Time" on
https://docs.microsoft.com/en-us/windows-server/virtualization/hyper-v/supported-debian-virtual-machines-on-hyper-v,
whereas Ubuntu is on
https://docs.microsoft.com/en-us/windows-server/virtualization/hyper-v/supported-ubuntu-virtual-machines-on-hyper-v

Hope there's something that can be done here ;-)

-- 
Moritz Schlarb
Unix-Gruppe | Systembetreuung
Zentrum für Datenverarbeitung
Johannes Gutenberg-Universität Mainz
Raum 01-331 - Tel. +49 6131 39-29441
OpenPGP Fingerprint: DF01 2247 BFC6
5501 AFF2 8445 0C24 B841 C7DD BAAF

Bug#850713: can't mount NFS shares via nfs referrals

2017-04-25 Thread Moritz Schlarb
Control: fixed -1 linux/4.9+79~bpo8+1

Hi everyone,

it seems that this bug has been fixed in the latest version of the
Kernel package.
Out of curiosity, we would really like to know when and where an
appropriate fix was included - we tried finding something in
https://tracker.debian.org/media/packages/l/linux/changelog-4.9.18-1~bpo8+1
but could not find any hint.

Regards,
-- 
Moritz Schlarb
Unix-Gruppe | Systembetreuung
Zentrum für Datenverarbeitung
Johannes Gutenberg-Universität Mainz
Raum 01-321 - Tel. +49 6131 39-29441
OpenPGP Fingerprint: DF01 2247 BFC6
5501 AFF2 8445 0C24 B841 C7DD BAAF


Bug#854444: linux-image-4.9.0-0.bpo.1-amd64-unsigned: System time divergence with HyperV TimeSync protocol version 4

2017-02-07 Thread Moritz Schlarb
Package: linux-image-4.9.0-0.bpo.1-amd64-unsigned
Version: 4.9.2-2~bpo8+1
Severity: important
Tags: upstream

Since using the linux-image-4.9.0-0.bpo.1-amd64 kernel, some of our Jessie
systems running under HyperV virtualization show an enormous time divergence
gradually building up over some hours (see attached graphs from our NTP
monitoring).
The offset between system time and NTP time grows to approx. 10 minutes roughly
every 8 hours (though some machines diverge forwards and some backwards...).
Jumping backwards in time in particular can be critical for various applications.
Additionally, systemd prints "Time has been changed" to the syslog every 5
seconds, which is a little bit annoying.

Upstream, there is a patch under development that drops in-kernel time
adjustments and exposes the TimeSync messages as a PTP device for consumption
by an NTP client: https://lkml.org/lkml/2017/1/30/232.

I don't know whether this classifies as some kind of brokenness on some
hardware (e.g. HyperV 2016)...
A possible workaround of course is to just disable Time Synchronisation in the
Guest Additions setting in HyperV.

Best regards,
Moritz



-- System Information:
Debian Release: 8.7
  APT prefers stable-updates
  APT policy: (700, 'stable-updates'), (700, 'stable'), (60, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 3.16.0-4-amd64 (SMP w/8 CPU cores)
Locale: LANG=en_US.utf8, LC_CTYPE=en_US.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)