Bug#1017720: nfs-common: No such file or directory

2022-09-22 Thread Jason Breitman
The issue also occurs when using the lookupcache=none option along with the 
5.10.X kernel.
I was hoping for this option to succeed and to investigate the performance 
impact, but it is no longer viable.
I believe that I am out of options to try with the 5.10.X kernel.
Please let me know where we stand.
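
The manual test quoted further down in this thread (ls -l piped through grep '?') can also be scripted. What follows is a minimal sketch and not something taken from the thread: the function name list_stale_entries is made up for illustration, and it assumes that a stale entry is a name the directory listing still returns but that can no longer be stat'ed.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print every name in a directory
# that the listing returns but that no longer exists when it is stat'ed.
# On an affected NFS client these are the names `ls -l` shows with '?' fields.
list_stale_entries() {
    dir=$1
    # Plain `ls -A` only needs the directory listing; the per-name checks
    # below perform the stat/lstat calls that fail for stale entries.
    ls -A -- "$dir" | while IFS= read -r name; do
        # -L keeps dangling symlinks from being reported as stale.
        if [ ! -e "$dir/$name" ] && [ ! -L "$dir/$name" ]; then
            printf '%s\n' "$name"
        fi
    done
}
```

Running list_stale_entries /mnt/dir/someOtherDir before and after echo 2 > /proc/sys/vm/drop_caches (as root) would show whether dropping the dentry and inode caches cleared the stale names. File names containing newlines are not handled.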

> -----Original Message-----
> From: Jason Breitman
> Sent: Wednesday, September 21, 2022 1:01 PM
> To: Ben Hutchings ; 1017...@bugs.debian.org
> Subject: RE: Bug#1017720: nfs-common: No such file or directory
> 
> I now know that this behavior also exists in Debian Buster 10.8, and more
> specifically in the 4.19.X kernel, after running stricter testing on more
> servers.
> With the 4.19.X kernel the issue resolves itself immediately following the
> No such file or directory error, whereas the 5.X kernels require me to clear
> the inode and dentry caches by running echo 2 > /proc/sys/vm/drop_caches.
> What further information is required to resolve this issue?
> 
> > -----Original Message-----
> > From: Jason Breitman
> > Sent: Tuesday, September 13, 2022 4:41 PM
> > To: Ben Hutchings ; 1017...@bugs.debian.org
> > Subject: RE: Bug#1017720: nfs-common: No such file or directory
> >
> > I downgraded the nfs-common package which required the downgrade of
> > the libevent packages and am using the 4.19.X kernel.
> > I see the issue running the initial test, but then the issue is gone when
> > running the test a subsequent time.
> >
> > libevent-2.1-6:amd64           2.1.8-stable-4       amd64  Asynchronous event notification library
> > libevent-core-2.1-6:amd64      2.1.8-stable-4       amd64  Asynchronous event notification library (core)
> > libevent-pthreads-2.1-6:amd64  2.1.8-stable-4       amd64  Asynchronous event notification library (pthreads)
> > linux-image-4.19.0-21-amd64    4.19.249-2           amd64  Linux 4.19 for 64-bit PCs (signed)
> > nfs-common                     1:1.3.4-2.5+deb10u1  amd64  NFS support files common to client and server
> >
> > What other packages do I need to downgrade in order to get Debian 11.4 to
> > behave like Debian 10.8?
> > What additional questions can I answer so that we can move forward?
> >
> > > -----Original Message-----
> > > From: Jason Breitman
> > > Sent: Tuesday, September 6, 2022 5:18 PM
> > > To: Ben Hutchings ; 1017...@bugs.debian.org
> > > Subject: RE: Bug#1017720: nfs-common: No such file or directory
> > >
> > > I also see the failure with the kernels below, but the 4.19.X kernel
> > > resolves the issue without dropping caches.
> > > linux-image-4.19.0-14-amd64   4.19.171-2  amd64  Linux 4.19 for 64-bit PCs (signed)
> > > linux-image-4.19.0-21-amd64   4.19.249-2  amd64  Linux 4.19 for 64-bit PCs (signed)
> > >
> > > I see the issue running the initial test, but then the issue is gone when
> > > running the test a subsequent time.
> > > I ran several tests to verify the behavior differences between the 4.19.X
> > > and 5.X kernels.
> > >
> > > -- Test
> > > ls -l /mnt/dir/someOtherDir/* | grep '?'
> > >
> > > -- Error message - the error message is showing files that have been
> > > erased via rsync --delete
> > > ls: cannot access 'filename': No such file or directory
> > > -? ? ???? filename
> > >
> > > > -----Original Message-----
> > > > From: Jason Breitman
> > > > Sent: Friday, September 2, 2022 5:17 PM
> > > > To: Ben Hutchings ; 1017...@bugs.debian.org
> > > > Subject: RE: Bug#1017720: nfs-common: No such file or directory
> > > >
> > > > I have tested with the following kernels and see this issue in each
> > > > case.
> > > >
> > > > linux-image-5.10.0-16-amd64         5.10.127-1         amd64  Linux 5.10 for 64-bit PCs (signed)
> > > > linux-image-5.15.0-0.bpo.3-amd64    5.15.15-2~bpo11+1  amd64  Linux 5.15 for 64-bit PCs (signed)
> > > > linux-image-5.18.0-0.deb11.3-amd64  5.18.14-1~bpo11+1  amd64  Linux 5.18 for 64-bit PCs (signed)
> > > >
> > > > An interesting note is that when using the 5.18 kernel, I had to run
> > > > echo 3 > /proc/sys/vm/drop_caches to resolve the issue.
> > > > echo 2 > /proc/sys/vm/drop_caches did not work as it did on the 5.10 and
> > > > 5.15 kernels.
> > > >
> > > > > -----Original Message-----
> > > > > From: Jason Breitman
> > > > > Sent: Friday, August 26, 2022 3:36 PM
> > > > > To: 'Ben Hutchings' ; '1017...@bugs.debian.org'
> > > > > <1017...@bugs.debian.org>
> > > > > Subject: RE: Bug#1017720: nfs-common: No such file or directory
> > > > >
> > > > > I was able to identify another workaround today which may help you to
> > > > > identify the issue.
> > > > > The 

Bug#1020534: sgx: EPC section 0x50200000-0x55f7ffff (crash)

2022-09-22 Thread Diederik de Haas
On Thursday, 22 September 2022 22:24:31 CEST Diederik de Haas wrote:
> Since kernel 5.18.2-1 I'm getting the following error/message in dmesg

FTR: the 'crash' seems to be confined to the SGX component, not the whole system.
I don't use SGX, so I don't know whether or how this affects its operation.



Bug#1020534: sgx: EPC section 0x50200000-0x55f7ffff (crash)

2022-09-22 Thread Diederik de Haas
Source: linux
Version: 5.18.2-1
Severity: normal

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Since kernel 5.18.2-1 I'm getting the following error/message in dmesg:

[0.465573] DMAR: Intel(R) Virtualization Technology for Directed I/O
[0.465579] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[0.465581] software IO TLB: mapped [mem 0x49be6000-0x4dbe6000] (64MB)
[0.465676] sgx: EPC section 0x50200000-0x55f7ffff
[0.466859] ------------[ cut here ]------------
[0.466863] WARNING: CPU: 1 PID: 55 at arch/x86/kernel/cpu/sgx/main.c:446 ksgxd+0x1b3/0x1c0
[0.466877] Modules linked in:
[0.466881] CPU: 1 PID: 55 Comm: ksgxd Not tainted 5.18.0-1-amd64 #1  Debian 5.18.2-1
[0.466887] Hardware name: LENOVO 20HRCTO1WW/20HRCTO1WW, BIOS N1MET70W (1.55 ) 07/07/2022
[0.466889] RIP: 0010:ksgxd+0x1b3/0x1c0
[0.466896] Code: ff e9 f6 fe ff ff 48 89 df e8 99 dd 0c 00 84 c0 0f 84 c7 fe ff ff 31 ff e8 fa dd 0c 00 84 c0 0f 85 98 fe ff ff e9 b3 fe ff ff <0f> 0b e9 83 fe ff ff e8 a1 d8 90 00 90 0f 1f 44 00 00 41 57 48 c1
[0.466903] RSP: :b24640463ed8 EFLAGS: 00010283
[0.466913] RAX: b246403a1970 RBX: 9104c2753400 RCX: 
[0.466922] RDX: 8000 RSI: b246403a1930 RDI: 
[0.466925] RBP: 9104c1345300 R08: 9104c13456c0 R09: 9104c13456c0
[0.466928] R10:  R11: 0001 R12: b24640073ce0
[0.466930] R13: 9104c2789980 R14: a0a5c1c0 R15: 
[0.466933] FS:  () GS:91085168() knlGS:
[0.466937] CS:  0010 DS:  ES:  CR0: 80050033
[0.466940] CR2:  CR3: 00042d210001 CR4: 003706e0
[0.466943] DR0:  DR1:  DR2: 
[0.466945] DR3:  DR6: fffe0ff0 DR7: 0400
[0.466948] Call Trace:
[0.466952]  
[0.466956]  ? _raw_spin_lock_irqsave+0x24/0x50
[0.466976]  ? _raw_spin_unlock_irqrestore+0x23/0x40
[0.466982]  ? __kthread_parkme+0x36/0x80
[0.466990]  kthread+0xe8/0x110
[0.466994]  ? kthread_complete_and_exit+0x20/0x20
[0.466998]  ret_from_fork+0x22/0x30
[0.467008]  
[0.467010] ---[ end trace  ]---
[0.468117] Initialise system trusted keyrings
[0.468137] Key type blacklist registered
[0.468217] workingset: timestamp_bits=36 max_order=22 bucket_order=0
[0.472438] zbud: loaded


It does NOT occur with version 5.17.3-1 and 5.18-1~exp1, but it does
occur with 5.18.2-1, 5.18.16-1, 5.19.6-1 and I first noticed it with a
self-compiled 6.0-rc6 (custom branch based on Debian kernel's master
branch).

Where it does NOT occur, there is no message containing 'sgx' in dmesg
at all; where it DOES occur, the output appears to be the same apart from
a variation in the line number within ``arch/x86/kernel/cpu/sgx/main.c``.
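
To compare kernels quickly, the check described above could be scripted. This is only a sketch: the function name sgx_warnings is made up here, and the pattern deliberately ignores the line number after main.c, since that is the part that differs between versions.

```shell
#!/bin/sh
# Hypothetical helper (not from the report): read a kernel log on stdin and
# print the WARNING lines raised from the SGX code, whatever the line number.
# Exit status is that of grep: 0 if a warning was found, 1 if none was.
sgx_warnings() {
    grep -E 'WARNING: .*arch/x86/kernel/cpu/sgx/'
}
```

Typical use would be dmesg | sgx_warnings on each kernel under test; no output and a non-zero exit status means no SGX warning was logged.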

There are 5 commits with 'sgx' in their primary commit message:

diederik@prancing-pony:~/dev/kernel.org/linux$ git log --oneline v5.18..v5.18.2 | grep -ci sgx
5

And they are all sequential:

$ git log --oneline 557b6a9ccceeec1ae13a83b4490458b92e064c0e..5aada654649d9bcf6b89d7c0d1ff4b794f9295d3
5aada654649d media: i2c: imx412: Fix reset GPIO polarity
22e83371210d x86/sgx: Ensure no data in PCMD page after truncate
0e1f97633953 x86/sgx: Fix race between reclaimer and page fault handler
69432ff18091 x86/sgx: Obtain backing storage page with enclave mutex held
876053dd7503 x86/sgx: Mark PCMD page as dirty when modifying contents
5ded81f42258 x86/sgx: Disconnect backing page references from dirty status
6ad9dbb202a9 HID: multitouch: add quirks to enable Lenovo X12 trackpoint

But I haven't verified that the issue got introduced with one of them.
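
As a possible aid for narrowing this down, the oldest and newest matching commits can be pulled out of the same git log output. This is a sketch only: the function name matching_range is hypothetical, and it assumes the input is `git log --oneline` output in its default newest-first order.

```shell
#!/bin/sh
# Hypothetical helper (not from the report): read `git log --oneline` output
# (newest commit first) on stdin and print "<oldest> <newest>" for the commits
# whose line matches the case-insensitive pattern given as $1.
# Prints nothing when no commit matches.
matching_range() {
    grep -i -- "$1" |
        awk 'NR == 1 { newest = $1 } { oldest = $1 } END { if (NR) print oldest, newest }'
}
```

For example, git log --oneline v5.18..v5.18.2 | matching_range sgx would print the two abbreviated hashes that bound the SGX commits, which could then serve as a starting point for a git bisect run.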

No idea if it could be relevant, but the SGX related section in my BIOS
has the following settings (on a Lenovo ThinkPad X1 Carbon 5th gen):
Intel (R) SGX Control: Software Controlled
Current State: Enabled

(And an item to 'Change Owner EPOCH')

I haven't changed those settings between reboots with the various
kernels (or ever AFAIR).


- -- System Information:
Debian Release: bookworm/sid
  APT prefers testing
  APT policy: (990, 'testing'), (500, 'stable-security'), (500, 'unstable'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 5.18.0-1-amd64 (SMP w/4 CPU threads; PREEMPT)
Kernel taint flags: TAINT_WARN
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US:en
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

-BEGIN PGP SIGNATURE-

iHUEARYIAB0WIQT1sUPBYsyGmi4usy/XblvOeH7bbgUCYyzEdQAKCRDXblvOeH7b
blRMAQDxcH2fNVpKGoGU6AupSn7uQKSuNI0daaf+XKDCXbYiswD/T7R54kdPqh7L
xIpNsQJijfZ2r6+jVQ+232GODcbCkAs=
=B9if
-END PGP SIGNATURE-



Bug#1019700: Info received (Fwd: Bug#1019700: mmc0: Timeout waiting for hardware cmd interrupt.)

2022-09-22 Thread Hank Barta
I've gone through "git bisect" on the repo and results are at
https://paste.debian.net/1254605/

Each candidate was tested with 6 reboots (including 3 that involved power
cycling.)

There were four stages that did not build and at which I executed 'git
bisect skip' to get a buildable candidate. They are listed in the paste
linked above.

I have notes on the build errors if that would be useful.
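
For anyone repeating this, the per-candidate decision could be written down roughly as follows. This is a sketch under stated assumptions, not the procedure actually used: the function name bisect_verdict and the labels built, ok and fail are made up, and the rule that a single failing boot marks the candidate bad is an assumption (reasonable for an intermittent fault, but not stated above).

```shell
#!/bin/sh
# Hypothetical helper (not from the report): map one bisect candidate to the
# verdict to pass to `git bisect`.
#   $1        "built" if the candidate compiled, anything else otherwise
#   $2...     one "ok" or "fail" per test boot (six boots in the report)
# Assumption: a single failing boot is enough to call the candidate bad.
bisect_verdict() {
    build=$1
    shift
    if [ "$build" != "built" ]; then
        echo skip            # candidates that do not build are skipped
        return
    fi
    for boot in "$@"; do
        if [ "$boot" = "fail" ]; then
            echo bad
            return
        fi
    done
    echo good
}
```

The result would be used as git bisect "$(bisect_verdict built ok ok ok ok ok ok)", which in this case expands to git bisect good.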

-- 
Beautiful Sunny Winfield


Bug#1020504: marked as done (Bug on Debian 11 Bullseye - Kernel 5.19.6-amd64 dont run with Nvidia drivers)

2022-09-22 Thread Debian Bug Tracking System
Your message dated Thu, 22 Sep 2022 13:30:10 +0200
with message-id 
and subject line Re: Bug#1020504: Bug on Debian 11 Bullseye - Kernel 
5.19.6-amd64 dont run with Nvidia drivers
has caused the Debian Bug report #1020504,
regarding Bug on Debian 11 Bullseye - Kernel 5.19.6-amd64 dont run with Nvidia 
drivers
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact ow...@bugs.debian.org
immediately.)


-- 
1020504: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1020504
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems
--- Begin Message ---
Package: linux-image-5.19.0-1-amd64 
Version: 5.19.6-1 
Nvidia drivers version: 515.65.01 & 515.76 from Nvidia web site

Bug Description: 
The installation of the Nvidia drivers is done perfectly from the kernel 
5.10.140-1, but it is impossible to boot then on kernel 5.19.6-1.
If you try to install the drivers from kernel 5.10.140-1, it ends in failure.
The problem occurs with Nvidia drivers 515.65.01 & 515.76 from Nvidia web site. 
This happens whether you choose the option of installing proprietary drivers or 
free drivers.
I am delighted that future Nvidia drivers will soon be included in the Linux 
Kernel, it will require less maintenance from users and it will offer us more 
stability on our systems...
Best regards.

Philippe
--- End Message ---
--- Begin Message ---
Hi Philippe,

On Thu, Sep 22, 2022 at 12:17:30PM +0200, pham...@bluewin.ch wrote:
> Package: linux-image-5.19.0-1-amd64 
> Version: 5.19.6-1 
> Nvidia drivers version: 515.65.01 & 515.76 from Nvidia web site
> 
> Bug Description: 
> The installation of the Nvidia drivers is done perfectly from the
> kernel 5.10.140-1, but it is impossible to boot then on kernel
> 5.19.6-1.
> If you try to install the drivers from kernel 5.10.140-1, it ends in
> failure.
> The problem occurs with Nvidia drivers 515.65.01 & 515.76 from
> Nvidia web site. This happens whether you choose the option of
> installing proprietary drivers or free drivers.
> I am delighted that future Nvidia drivers will soon be included in
> the Linux Kernel, it will require less maintenance from users and it
> will offer us more stability on our systems...
> Best regards.

Without knowing what exactly the failure is, issues affecting the
out-of-tree and proprietary modules from Nvidia should still be handled
by them. I'm thus going to close this bug now.

Regards,
Salvatore
--- End Message ---


Bug#1020504: Bug on Debian 11 Bullseye - Kernel 5.19.6-amd64 dont run with Nvidia drivers

2022-09-22 Thread pham...@bluewin.ch
Package: linux-image-5.19.0-1-amd64 
Version: 5.19.6-1 
Nvidia drivers version: 515.65.01 & 515.76 from Nvidia web site

Bug Description: 
The installation of the Nvidia drivers is done perfectly from the kernel 
5.10.140-1, but it is impossible to boot then on kernel 5.19.6-1.
If you try to install the drivers from kernel 5.10.140-1, it ends in failure.
The problem occurs with Nvidia drivers 515.65.01 & 515.76 from Nvidia web site. 
This happens whether you choose the option of installing proprietary drivers or 
free drivers.
I am delighted that future Nvidia drivers will soon be included in the Linux 
Kernel, it will require less maintenance from users and it will offer us more 
stability on our systems...
Best regards.

Philippe


Re: OCFS2 and GFS2 in -cloud kernel images

2022-09-22 Thread Bastian Blank
Hi Lukas

On Tue, Sep 20, 2022 at 05:18:37PM +0200, Lukas Martini wrote:
> I understand the cloud images are supposed to be stripped down images with
> only the bare essentials for cloud operation.

Even more, they are destined for certain environments.  OpenStack
technically is not one of them, but the images work there in a lot of
cases, because a generic OpenStack is just kvm and virtio, the same as GCE.

And some parts are disabled simply because they are large and of really
uncertain use.

> However I think it's quite unfortunate that the OCFS2 and GFS2 modules are
> also disabled compared to the regular kernel config since I would argue
> those are _especially_ useful in a cloud environment.

Actually I don't think this is true.  Why would you use GFS if your
environment already provides redundant shared file storage for you,
or if you can use a distributed store like ceph, which does not require
huge kernel extensions and is way more resilient?

> For example, OpenStack offers multiattach images that require a shared-disk
> file system like these. I think Amazon AWS added a similar feature recently
> too.

Azure supports it, GCE supports it as a preview.  I did not find
anything about AWS.

> Is there any chance these could be re-enabled for the cloud images, or is
> the official advice to just switch to the regular images where those are
> needed?

I don't think this warrants shipping GFS and/or OCFS.  If you really,
really want it, use the generic image with the full kernel, which is
required for several OpenStack environments anyway.
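
To see which flavour a given machine is running, the kernel config can be inspected. The following is only a sketch: the function name config_state is made up, while CONFIG_OCFS2_FS and CONFIG_GFS2_FS are the kernel options for the two file systems, and Debian kernel packages install their config as /boot/config-<version>.

```shell
#!/bin/sh
# Hypothetical helper (not from this mail): read a kernel config on stdin and
# print the state of the option named in $1: "y", "m", or "n" when the option
# is absent or listed as "# CONFIG_... is not set".
config_state() {
    awk -v opt="$1" -F= '
        $1 == opt { state = $2; found = 1 }
        END { print (found ? state : "n") }
    '
}
```

For example, config_state CONFIG_OCFS2_FS < "/boot/config-$(uname -r)" prints m when the module is built for the running kernel and n when it is disabled.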

Regards,
Bastian

-- 
You're too beautiful to ignore.  Too much woman.
-- Kirk to Yeoman Rand, "The Enemy Within", stardate unknown