Fwd: Also observing #988477

2024-01-18 Thread Elliott Mitchell
I should have Cc'd debian-kernel@lists.debian.org, but failed to do so.
As such now forwarding a copy.  At the very least this involves the
Linux MD-RAID1 functionality, but I am unsure whether this is a Linux
kernel bug or a Xen bug.


Forwarded:

I am also observing #988477 occur.  This machine has an AMD Zen 4
processor.  The first observation was when motherboard/processor was
swapped out, the older motherboard/processor was several generations old.

The pattern which is emerging is Linux MD RAID1 plus recent AMD processor
which has full IOMMU functionality.  The older machine was believed to
have an IOMMU, but the BIOS wasn't creating appropriate ACPI tables
(IVRS) and thus Xen was unable to utilize it.

This seems to be occurring with a small percentage of write operations.
Subsequent read operations appear to be fine.

I am not convinced this is a Xen bug.  I suspect this is instead a bug
in the Linux MD subsystem.  In particular, this could happen if the DMA
interface was designed assuming only a single device would ever access
any page, but the MD RAID1 driver is reusing the same page for both
devices.

IOMMU page release could be handled by marking the page unused in a
device data structure and later removing the entry by sweeping a table.
In such a case, if the MD-RAID1 driver were to redirect the page to
another device between these two steps, the entry for a subsequent device
could be wiped out when trying to invalidate an entry for a prior device.
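
To make the suspected sequence concrete, here is a toy model in Python.
It is purely illustrative: the table layout, the `ToyIommu` class and
its methods are hypothetical and do not reflect how Linux or Xen
actually manage IOMMU mappings.

```python
# Toy model of the suspected race; every name here is hypothetical and
# nothing below reflects real Linux or Xen IOMMU code.

class ToyIommu:
    def __init__(self):
        self.table = {}      # page -> device currently allowed to DMA
        self.pending = []    # pages marked unused, awaiting the sweep

    def map(self, page, device):
        self.table[page] = device

    def unmap_lazy(self, page):
        # Step 1: only note the page as unused; the entry stays for now.
        self.pending.append(page)

    def sweep(self):
        # Step 2: remove entries for every page noted earlier, without
        # checking whether the page has been mapped again since.
        for page in self.pending:
            self.table.pop(page, None)
        self.pending.clear()


def raid1_write(iommu, page):
    """Write one page to both mirror halves, reusing the same page."""
    iommu.map(page, "disk0")
    iommu.unmap_lazy(page)     # disk0 is done with the page
    iommu.map(page, "disk1")   # same page redirected to the second disk
    iommu.sweep()              # late sweep for disk0 wipes disk1's entry
    return iommu.table.get(page)


if __name__ == "__main__":
    # None means disk1 lost its mapping before its write could complete.
    print(raid1_write(ToyIommu(), page=0x1000))
```

In this model the second device ends up with no mapping at all, which
would be one way for a small fraction of writes to one mirror half to
go missing while reads stay fine.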


Anyway, I'm also observing bug #988477.  This could also be a kernel bug.
So far no crashes/confirmed data loss have occurred, but sweeping the
mirror does turn up small numbers of inconsistencies.
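
For anyone wanting to compare numbers, the MD driver reports the result
of a "check" pass through sysfs.  A minimal Python sketch for reading
it follows; the helper name, the `sysfs_root` parameter and the array
name "md0" are assumptions for illustration, only the `md/mismatch_cnt`
file comes from the kernel's md interface.

```python
from pathlib import Path

def read_mismatch_cnt(array="md0", sysfs_root="/sys/block"):
    """Return the mismatch count left by the last check/repair pass.

    The array name "md0" is an assumption; substitute the real device.
    A check pass is started beforehand by writing "check" to the
    array's md/sync_action file.
    """
    path = Path(sysfs_root) / array / "md" / "mismatch_cnt"
    return int(path.read_text().strip())

# Example (needs an existing array): read_mismatch_cnt("md0")
```

A non-zero value after a check on an otherwise idle RAID1 is the kind
of inconsistency described above.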


-- 
(\___(\___(\__  --=> 8-) EHM <=--  __/)___/)___/)
 \BS (| ehem+sig...@m5p.com  PGP 87145445 |)   /
  \_CS\   |  _  -O #include  O-   _  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445




Bug#1049450: New rpc.mountd rejects -N 2 option

2023-08-16 Thread Elliott Mitchell
On Wed, Aug 16, 2023 at 08:57:16AM +0200, Salvatore Bonaccorso wrote:
> 
> On Tue, Aug 15, 2023 at 04:13:59PM -0700, Elliott Mitchell wrote:
> > Package: nfs-kernel-server
> > Version: 1:2.6.2-4
> > 
> > Hopefully SSIA.
> > 
> > `rpc.mountd` has a -N option to disable versions of NFS.
> > 
> > I had been previously using "-N 2", but that is now broken.  The error
> > message was quite non-helpful ("nfsd2" if I recall correctly).  Upon
> > removing "-N 2", luckily NFSv2 didn't get enabled, but this was still
> > annoying to deal with.  At worst using a deprecated setting should merely
> > generate a warning.
> 
> Removal of NFSv2 support was documented with a Debian NEWS entry for
> 1:2.6.1-1~exp1, cf. #1006650.
> 
> nfs-utils (1:2.6.1-1~exp1) unstable; urgency=medium
> 
>   Support for NFSv2 has been removed from nfs-kernel-server.  It was
>   previously disabled by default, but still available.
> 
>  -- Ben Hutchings   Sun, 13 Mar 2022 19:05:02 +0100

Removing NFSv2 support shouldn't invalidate "-N 2".  "-N 2" is supposed
to disable NFSv2 at runtime; as such, removing all NFSv2 support should
merely render "-N 2" 100% redundant and at worst produce a warning.
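
The behaviour being asked for can be sketched in a few lines of Python.
This is not rpc.mountd's actual option handling; `SUPPORTED`, `REMOVED`
and `disable_versions` are hypothetical names, used only to illustrate
that disabling an already-removed version can be a no-op plus a warning.

```python
import warnings

SUPPORTED = {3, 4}   # assumed set of versions still compiled in
REMOVED = {2}        # assumed set of versions dropped from the build

def disable_versions(requested, enabled=None):
    """Return the enabled versions after applying "-N <version>" requests."""
    enabled = set(SUPPORTED if enabled is None else enabled)
    for version in requested:
        if version in REMOVED:
            # Already impossible to enable: redundant, so warn and move on.
            warnings.warn(
                f"NFSv{version} support was removed; -N {version} is redundant"
            )
        elif version in SUPPORTED:
            enabled.discard(version)
        else:
            raise ValueError(f"unknown NFS version: {version}")
    return enabled
```

With this shape, `disable_versions([2])` leaves the enabled set alone
and only emits a warning, while a genuinely unknown version still fails.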





Bug#1049450: New rpc.mountd rejects -N 2 option

2023-08-15 Thread Elliott Mitchell
Package: nfs-kernel-server
Version: 1:2.6.2-4

Hopefully SSIA.

`rpc.mountd` has a -N option to disable versions of NFS.

I had been previously using "-N 2", but that is now broken.  The error
message was quite non-helpful ("nfsd2" if I recall correctly).  Upon
removing "-N 2", luckily NFSv2 didn't get enabled, but this was still
annoying to deal with.  At worst using a deprecated setting should merely
generate a warning.





Bug#1034811: linux: consider CONFIG_HW_RANDOM_VIRTIO=n

2023-04-24 Thread Elliott Mitchell
Package: src:linux
Version: 6.0.3-1~bpo11+1
Severity: wishlist

Looks like someone had the idea of a virtualized HW RNG.  Yet looking at
the kernel source, there isn't a single actual implementation.  Unless
I'm missing something, having CONFIG_HW_RANDOM_VIRTIO simply wastes
processor time during build and enlarges the package for no gain.
Perhaps time for Debian to quit packaging this unused idea?

Looks like on-processor HW RNGs are what are taking over.  Possibly also
the HW RNG from the vTPM implementation.





Bug#1034463: closing 1034463

2023-04-16 Thread Elliott Mitchell
On Sun, Apr 16, 2023 at 07:08:03AM +0200, Salvatore Bonaccorso wrote:
> CONFIG_AGP is built-in in Debian, in particular for:
> 
> debian/config/alpha/config:CONFIG_AGP=y
> debian/config/amd64/config:CONFIG_AGP=y
> debian/config/hppa/config.parisc64:CONFIG_AGP=y
> debian/config/ia64/config:CONFIG_AGP=y
> debian/config/kernelarch-powerpc/config:CONFIG_AGP=y
> debian/config/kernelarch-x86/config:CONFIG_AGP=y

I hadn't checked all architectures, but was well-aware it is built-in
for amd64.  I was suggesting it should change from being built-in to
being a module.

The reason being AGP is very rare on amd64 motherboards.  According to
the handy reference, AGP was starting to disappear just as amd64 hardware
started hitting the market.

I'm unsure where other architectures stand on the issue.  Yet on amd64 it
shouldn't be built-in.





Bug#1034463: linux: consider CONFIG_AGP=m

2023-04-15 Thread Elliott Mitchell
Package: src:linux
Version: 5.10.158+2
Severity: wishlist

Could AGP support be turned into a module for Debian kernels?

I'm tempted to suggest it shouldn't even be built for amd64, but it does
seem reasonable for i686 kernels.  Given this, a module seems to make
sense.





Bug#1009793: linux-source 5.10.106-1 changes block device order

2022-04-17 Thread Elliott Mitchell
Package: src:linux
Version: 5.10.106-1

Between 5.10.103-1 and 5.10.106-1 (image -13) something changed which
reliably causes what used to show as /dev/sda to show as /dev/sdb.  Other
block devices plugged into the SCSI subsystem may have swapped around,
but I've yet to untangle the others.

A few utilities are still sensitive to block device order and this
causes issues for those.  Nothing on the hardware explains this.  The
controller thinks the device has a lower number, and the device should
respond much faster.

The lowest level is the cciss driver.





Bug#991967: (Presently) Not in 5.10 source

2021-12-06 Thread Elliott Mitchell
Having finally gotten to test this, the issue does NOT affect 5.10.70-1.
So far I've only gotten to try reboot, but that went fine.

Might have been an ACPI or Xen mismerge into 4.19.  Alas this may simply
disappear into history.





Bug#996608: linux-source-5.10: Missing dependency: dwarves

2021-10-15 Thread Elliott Mitchell
Package: linux-source-5.10
Version: 5.10.70-1

SSIA.  Debian's 5.10 configuration will NOT build without the "dwarves"
package (`pahole`).  In light of this, some package, likely
linux-source-5.10, should recommend "dwarves".
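
A quick pre-build check along these lines would catch the problem
before a long compile.  It is only a sketch: `missing_build_tools` is a
hypothetical helper, and it assumes that having `pahole` on the search
path is what the build needs from "dwarves".

```python
import shutil

def missing_build_tools(tools=("pahole",), path=None):
    """Return the tools from `tools` that cannot be found on the search path.

    `path` defaults to the PATH environment variable, as with shutil.which.
    """
    return [tool for tool in tools if shutil.which(tool, path=path) is None]
```

An empty list from `missing_build_tools()` means the tool was found;
anything else names what still has to be installed.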





Bug#991967: Simply ACPI powerdown/reset issue?

2021-09-25 Thread Elliott Mitchell
On Tue, Sep 21, 2021 at 06:33:20AM -0400, Chuck Zmudzinski wrote:
> I presume you are suggesting I try booting 4.19.181-1 on the
> current version of Xen-4.14 for bullseye as a dom0. I am not
> inclined to try it until an official Debian developer endorses
> your opinion that the bug I am seeing is distinct
> from #991967, at which point I will report the bug I am
> seeing as a new bug.

Chuck Zmudzinski, you are getting rather close to my threshold for calling
harassment.  You're not /quite/ there, but I'm concerned.


Since the purpose of the bug reports is to find and diagnose bugs, I did
a bit of experimentation and made some observations.

I checked out the Debian Xen source via git.  I got the current
"master" branch which is presently the candidate 4.14.3-1 version,
which includes urgent fixes.  The hash is:
e7a17db0305c8de891b366ad3528e5a43015

On top of this I cherry-picked 3 commits from Xen's main branch:
5a4087004d1adbbb223925f3306db0e5824a2bdc
0f089bbf43ecce6f27576cb548ba4341d0ec46a8
bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b

(these can be retrieved via Xen's gitweb at
https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=<$hash> which is
suitable for the `git am` command)
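
As a convenience, the three URLs can be generated rather than typed.
A small Python sketch, assuming only the URL pattern given above; the
names `GITWEB_PATCH`, `COMMITS` and `patch_url` are made up for this
example.

```python
# URL pattern taken from the gitweb link above; {commit} is the full hash.
GITWEB_PATCH = "https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h={commit}"

# The three cherry-picked commits, in the order listed above.
COMMITS = [
    "5a4087004d1adbbb223925f3306db0e5824a2bdc",
    "0f089bbf43ecce6f27576cb548ba4341d0ec46a8",
    "bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b",
]

def patch_url(commit):
    """Build the gitweb URL which serves `commit` as a patch."""
    return GITWEB_PATCH.format(commit=commit)

if __name__ == "__main__":
    for commit in COMMITS:
        print(patch_url(commit))
```

Each printed URL can be downloaded and the result fed to `git am`.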

With these I built 4.14.3-1 and then tried kernels 4.19.181-1 and
4.19.194-3 (this system is presently mostly on oldstable).  The results
were:

Xen 4.14.3-1 with Linux 4.19.181-1: system reboots were successful

Xen 4.14.3-1 with Linux 4.19.194-3: system reboots hung

Unfortunately I was too quick at installing the rebuilt 4.14.3-1 and I
missed trying the vanilla Debian 4.14.2+25-gb6a8c4f72d-2 with
Linux 4.19.181-1.  I believe this combination would have hung during
reboot.


As such, I believe there are in fact two distinct bugs being observed.
The presence of EITHER of these is sufficient to cause hangs during
powerdown or reboot.

First, some patch originally from Linux's main branch which breaks Xen
reboots was backported somewhere between 4.19.181-1 and 4.19.194-3.  This may
either have been introduced before 5.10 diverged from main, or may also
have been backported to 5.10.  THIS is Debian bug #991967.

Second, the Xen patch 3c428e9ecb1f290689080c11e0c37b793425bef1 which is
valuable to ARM devices breaks reboots and powerdowns on x86.  This is
correctly fixed by 0f089bbf43ecce6f27576cb548ba4341d0ec46a8.  Presently
this has no Debian bug report.


The first is presently unidentified; someone enthusiastic either needs to
read git logs/source code, or bisect and build to find where it got
broken.

For the second we seem to have a fix.  The only question is how many
patches to cherry pick?  bc141e8ca562 is non-urgent as it is merely
superficial and not needed for functionality.  5a4087004d1a is a
workaround for Linux kernel breakage, but how likely are we to see that
fixed in the Linux kernel packages?  The fix is well-contained and needed
for some highly popular ARM devices.





Bug#991967: #991967: Simply ACPI powerdown/reset issue?

2021-09-20 Thread Elliott Mitchell
On Mon, Sep 20, 2021 at 10:23:39PM -0400, Chuck Zmudzinski wrote:
> 
> On 9/20/21 7:39 PM, Diederik de Haas wrote:
> > On dinsdag 21 september 2021 01:15:15 CEST Elliott Mitchell wrote:
> >> Merely having the path is a sufficiently strong indicator for me to
> >> simply wave it past.  I though would suggest Debian should instead
> >> cherry-pick commit 0f089bbf43ecce6f27576cb548ba4341d0ec46a8.
> >>
> >> This is available as a patch at:
> >>
> >> https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=0f089bbf43ecce6f27576cb548ba4341d0ec46a8
> > You probably then also want the following commit, which is a fix on that 
> > patch:
> > https://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b
> >
> > Found that via the following url/query:
> > https://xenbits.xen.org/gitweb/?p=xen.git=search=HEAD=commit=x86%2FACPI
> >
> > I don't know whether others should be used from that as well.
> 
> I tried these two commits (adapted for the xen-4.14 branch) but this
> approach did not fix the bug - with these patches applied the dom0
> did not power down.
> 
> My advice for the Debian Xen Team is to consult with upstream and
> get their advice on whether or not it is advisable for Debian to
> retain the patches from the Xen-4.16 branch that have been
> added to the Debian 4.14 package in an attempt to support
> some arm devices that panic during on an unpatched Xen-4.14.
> If upstream cannot help Debian backport fixes for arm panics
> from Xen-4.16/unstable to Xen-4.14 stable, I think the Debian
> Xen team should remove aggressive patches that really have now
> turned the Debian Xen-4.14 package into a Frankenstein version
> that is a mixture of Xen-4.14 and Xen-4.16, and decide that support
> for those arm devices must wait until Debian gets Xen 4.16 up
> and running on the unstable and hopefully soon, testing distribution.

It is still not established you're running into #991967.  Unless the one
you're pointing towards was backported to the Xen 4.11 packages (which I
doubt) it cannot explain #991967, since at the time 4.11 was in use.

Could be this is a second bug with symptoms similar to #991967.  Now
that a fix for the second bug has been identified, you might try a
4.19.181-1 kernel and see whether that fixes things.





Bug#991967: #991967: Simply ACPI powerdown/reset issue?

2021-09-20 Thread Elliott Mitchell
On Mon, Sep 20, 2021 at 06:29:49PM -0400, Chuck Zmudzinski wrote:
> On 9/20/21 1:43 PM, Chuck Zmudzinski wrote:
> >
> > On 9/20/21 12:27 AM, Elliott Mitchell wrote:
> >> On Sun, Sep 19, 2021 at 01:05:56AM -0400, Chuck Zmudzinski wrote:
> >>
> >>> I suspect the following patch is the culprit for problems
> >>> shutting down on the amd64 architecture:
> >>>
> >>> 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
> >>> This patch does affect amd64 acpi code, and is probably causing
> >>> the problem on my amd64 system, so my build of the xen-4.14
> >>> hypervisor without this patch fixed the problem.
> >> Of the ones listed that is the only one which has any overlap with x86
> >> code.  The next reproduction step is `apt-get source xen &&
> >> patch -p1 -R < 
> >> 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
> >> && dpkg-buildpackage -b`.  Then try with this to confirm that patch
> >> is what does it.
> >>
> >> Thing is that delta is rather small.  I don't have a simulator, but that
> >> is rather small to be the culprit.
> >
> > I just tested the build with
> > patch -p1 -R < 
> > 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
> > applied before building the package and I can confirm that this is the 
> > patch
> > causing the trouble for dom0 poweroff on x86/amd64. Reverting this patch
> > fixes it on my amd64 system. But this would probably break the arm build.
> >
> > I think one possible fix would require modifying
> > 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
> > so it only applies at runtime to the arm architecture. I will try some
> > modifications to the patch instead of removing it, and if I get something
> > that works on amd64 and also might work on arm, I will post it
> > for Elliott to try.
> 
> I have an encouraging result. I found a very simple patch
> to xen/arch/x86/acpi/lib.c that fixes the dom0 poweroff
> bug on my system and it should not affect the arm patches
> at all:
> --
> This patch partially reverts previous patch
> 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
> 
> This hopefully fixes #911976
> 
> --- a/xen/arch/x86/acpi/lib.c   2021-09-20 16:49:08.0 -0400
> +++ b/xen/arch/x86/acpi/lib.c   2021-09-20 16:25:05.572038000 -0400
> @@ -46,10 +46,6 @@
>   if ((phys + size) <= (1 * 1024 * 1024))
>       return __va(phys);
> 
> -   /* No further arch specific implementation after early boot */
> -   if (system_state >= SYS_STATE_boot)
> -       return NULL;
> -
>   offset = phys & (PAGE_SIZE - 1);
>   mapped_size = PAGE_SIZE - offset;
>   set_fixmap(FIX_ACPI_END, phys);
> --
> 
> Can you try this patch to src:xen and see if your
> arm devices are OK with it?

Merely having the path is a sufficiently strong indicator for me to
simply wave it past.  I would, though, suggest Debian should instead
cherry-pick commit 0f089bbf43ecce6f27576cb548ba4341d0ec46a8.

This is available as a patch at:

https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=0f089bbf43ecce6f27576cb548ba4341d0ec46a8


The other commit I would suggest being picked by src:xen is
5a4087004d1adbbb223925f3306db0e5824a2bdc

This is for device-tree funkiness which got added between linux-5.10.0
and linux-5.10.y (if the Debian kernel team wants to maintain a fix in
Debian's kernel source, that works too).

BTW have I mentioned I've become rather skeptical of device-trees being
a usable way of representing hardware information?





Bug#991967: #991967: Simply ACPI powerdown/reset issue?

2021-09-19 Thread Elliott Mitchell
On Sun, Sep 19, 2021 at 01:05:56AM -0400, Chuck Zmudzinski wrote:
> xen hypervisor version: 4.14.2+25-gb6a8c4f72d-2, amd64
> 
> linux kernel version: 5.10.46-4 (the current amd64 kernel
> for bullseye)
> 
> Boot system: EFI, not using secure boot, booting xen
> hypervisor and dom0 bullseye with grub-efi package for
> bullseye, and it boots the xen-4.14-amd64.gz file, not
> the xen-4.14-amd64.efi file.

> I also tested a buster dom0 with the 4.19 series kernel
> on the xen-4.14 hypervisor from bullseye and saw the
> problem, but I did not see the problem with either
> a buster (linux 4.19) or bullseye (linux 5.10) dom0 on
> the xen-4.11 hypervisor, so I think the problem is
> with the Debian version of the xen-4.14 hypervisor,
> not with src:linux.

You're referencing several software versions which are mismatches for
#991967.  #991967 was observed with Xen 4.11 and Linux kernel 4.19.194-3,
but not Linux kernel 4.19.181.

The fact it correlates with a Linux kernel update rather strongly points
to the Linux kernel.  I could believe the situation is partially the
fault of both though.


> I suspect the following patch is the culprit for problems
> shutting down on the amd64 architecture:
> 
> 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch

> This patch does affect amd64 acpi code, and is probably causing
> the problem on my amd64 system, so my build of the xen-4.14
> hypervisor without this patch fixed the problem.

Of the ones listed that is the only one which has any overlap with x86
code.  The next reproduction step is `apt-get source xen &&
patch -p1 -R < 0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
&& dpkg-buildpackage -b`.  Then try with this to confirm that patch
is what does it.

Thing is that delta is rather small.  I don't have a simulator, but that
is rather small to be the culprit.


> I think this bug should be re-classified as a bug in src:xen.

There could be a separate bug in src:xen, but that is not #991967.

> I also would inquire with the Debian Xen Team about why they
> are backporting patches from the upstream xen unstable
> branch into Debian's 4.14 package that is currently shipping
> on Debian stable (bullseye). IMHO, the aforementioned
> patches that are not in the stable 4.14 branch upstream
> should not be included in the xen package for Debian stable.

It was requested since someone trying to have Xen operational on a device
needed those for operation.  Rather a lot of bugfix or very small
standalone feature patches get cherry-picked.


Presently I haven't been convinced this is a Xen bug (though it does
affect Xen installations).

Any chance you've got the tools to build and try a 5.5.0 or 5.10.0 Linux
kernel?  I'm suspecting something got incorrectly backported on the Linux side
(alternatively the Xen project seems a bit poor at keeping needed patches
in Linux).





Bug#991967: #991967: Simply ACPI powerdown/reset issue?

2021-09-19 Thread Elliott Mitchell
On Sun, Sep 19, 2021 at 01:05:56AM -0400, Chuck Zmudzinski wrote:
> On Sat, 11 Sep 2021 13:29:12 +0200 Salvatore Bonaccorso 
>  wrote:
>  >
>  > On Fri, Sep 10, 2021 at 06:47:12PM -0700, Elliott Mitchell wrote:
>  > > An experiment lead to a potential alternative explanation for #991967.
>  > > The issue may be ACPI (non-UEFI) powerdown/reset was broken at
>  > > 4.19.194-3. Presence of Xen on the system may be unrelated.
>  > >
>  > > Failing that, it could be Xen and non-UEFI systems are effected. (Xen
>  > > was tried on a UEFI system and the issue wasn't observed)
>  >
>  > Following up on https://bugs.debian.org/991967#12
>  >
>  > Did you succeeded in bisecting the issue as you seem to have it
>  > reproducible?
> 
> I noticed this bug on bullseye ever since I have been
> running bullseye as a dom0, but my testing indicates
> there is no problem with src:linux but the problem
> appeared in src:xen with the 4.14 version of xen on
> bullseye.
> 
> I ask Elliott if you are only seeing the problem on Debian's
> xen-4.14 hypervisor? Also, which architecture, arm or
> amd64? I only see the problem on the Debian xen-4.14
> hypervisor, and I have only tested on amd64, and I
> have found a fix for my amd64 system which is as
> follows:
> 
> Motherboard: ASRock B85M Pro4, BIOS P2.50 12/11/2015,
> with a Haswell CPU (core i5-4590S)
> 
> xen hypervisor version: 4.14.2+25-gb6a8c4f72d-2, amd64
> 
> linux kernel version: 5.10.46-4 (the current amd64 kernel
> for bullseye)

Nope.  As per the report the problem appeared with kernel 4.19.194-3 and
at the time using Xen 4.11.

The kernel you're listing is rather more recent, which might suggest a
patch which had been backported from 5.x to 4.19.

I could believe a Xen security update being the trigger though (I don't
recall there being one at the right time, but I wouldn't rule it out).


> Boot system: EFI, not using secure boot, booting xen
> hypervisor and dom0 bullseye with grub-efi package for
> bullseye, and it boots the xen-4.14-amd64.gz file, not
> the xen-4.14-amd64.efi file.
> 
> I also tested a buster dom0 with the 4.19 series kernel
> on the xen-4.14 hypervisor from bullseye and saw the
> problem, but I did not see the problem with either
> a buster (linux 4.19) or bullseye (linux 5.10) dom0 on
> the xen-4.11 hypervisor, so I think the problem is
> with the Debian version of the xen-4.14 hypervisor,
> not with src:linux.

Just to make sure, the kernel you were testing was 4.19.194-3?  The
issue didn't manifest with kernels earlier than that.

Could be we're seeing distinct bugs.


> This patch does affect amd64 acpi code, and is probably causing
> the problem on my amd64 system, so my build of the xen-4.14
> hypervisor without this patch fixed the problem.

While that commit modifies the code path the processor takes, the
modified path appears identical.


> I also would inquire with the Debian Xen Team about why they
> are backporting patches from the upstream xen unstable
> branch into Debian's 4.14 package that is currently shipping
> on Debian stable (bullseye). IMHO, the aforementioned
> patches that are not in the stable 4.14 branch upstream
> should not be included in the xen package for Debian stable.

Some people are asking for those.  Those are bugfixes for an extremely
popular device which panics on boot without the patches.


Meanwhile it turned out that between 5.10.0 and 5.10.30 the ARM64 device-trees
were modified in a way which broke Xen 4.14 on ARM64.  The change
violated Linux's own standards for device-trees, yet still appeared in a
stable branch.

In other news, if you see device-trees compared to ACPI tables, they're
not very comparable.  99% of ACPI tables work for all versions of all
OSes.  Any given device-tree is only likely to work for a single version
of a single OS.  While a useful abstraction for portions of kernel code,
device-trees are utter garbage compared to ACPI tables.





Bug#991967: #991967: Simply ACPI powerdown/reset issue?

2021-09-12 Thread Elliott Mitchell
On Sat, Sep 11, 2021 at 01:29:12PM +0200, Salvatore Bonaccorso wrote:
> On Fri, Sep 10, 2021 at 06:47:12PM -0700, Elliott Mitchell wrote:
> > An experiment lead to a potential alternative explanation for #991967.
> > The issue may be ACPI (non-UEFI) powerdown/reset was broken at
> > 4.19.194-3.  Presence of Xen on the system may be unrelated.
> > 
> > Failing that, it could be Xen and non-UEFI systems are effected.  (Xen
> > was tried on a UEFI system and the issue wasn't observed)
> 
> Following up on https://bugs.debian.org/991967#12
> 
> Did you succeeded in bisecting the issue as you seem to have it
> reproducible?

Problem is that is rather a lot of kernel builds, which also means a lot
of downtime...   Right now distribution update seems worthy of greater
attention.
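
To put a rough number on "a lot of kernel builds": bisection needs about
log2(N) builds for N candidate commits.  A small Python sketch;
`bisect_steps` is a hypothetical helper and the commit count used in
the usage note is made up, since the real number between 4.19.181-1 and
4.19.194-3 hasn't been counted here.

```python
def bisect_steps(candidate_commits):
    """Rough worst-case number of builds to isolate one bad commit."""
    if candidate_commits < 1:
        raise ValueError("need at least one candidate commit")
    # Each build halves the remaining range, so this is ceil(log2(n)).
    return (candidate_commits - 1).bit_length()
```

For an assumed 1000 candidate commits, `bisect_steps(1000)` gives 10,
and each of those is a full kernel build plus a reboot of the machine.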

The one notable bit is the one I sent in the last message.  The system
does NOT have UEFI, and a test system with UEFI seemed to have no
problem.





Bug#991967: #991967: Simply ACPI powerdown/reset issue?

2021-09-10 Thread Elliott Mitchell
An experiment led to a potential alternative explanation for #991967.
The issue may be ACPI (non-UEFI) powerdown/reset was broken at
4.19.194-3.  Presence of Xen on the system may be unrelated.

Failing that, it could be Xen and non-UEFI systems are affected.  (Xen
was tried on a UEFI system and the issue wasn't observed)





Bug#991967: linux-src 4.19.194-3 breaks Xen Dom0 powerdown and reboot

2021-08-06 Thread Elliott Mitchell
Package: src:linux
Version: 4.19.194-3
Control: affects -1 src:xen

SSIA.  Previous versions of 4.19 had no issues (4.19.181-1 according to
notes), but this cropped up with 4.19.194-3 (-1 and -2 weren't tested).

When a Xen domain 0 tries to reboot or powerdown the computer, it hangs
with the display off, but the power supply is active.

I'm rebuilding from source, so I imagine this also affects
linux-image-4.19.0-17-amd64.

Seems .194 caused multiple problems for Xen given #990642.





Bug#939633: More severe #939633 for RP4 on 5.8?

2020-11-27 Thread Elliott Mitchell
found 935456 5.9.6-1~bpo10+1
quit

After having spent several hours on kernel compiles and experimenting
with the situation, I'm fairly sure this also applies to
linux-source-5.9.

Odd thing is, when I booted the device using the Tianocore implementation
it came right up with no problems.  I'm getting this odd suspicion
someone deliberately broke the device-trees in Debian's kernel source.
The goal being to force everyone onto the Tianocore/ACPI implementation
and try to kill device-trees.

Right now I think this is conspiracy theory territory, but I'm left
wondering how such a serious bug could hang around so long...





Bug#923814: Does #923814 dup of #906225?

2020-11-25 Thread Elliott Mitchell
On Wed, Nov 25, 2020 at 02:30:30PM -0800, Elliott Mitchell wrote:
> The kernel versions are quite different, but #923814 reads suspiciously
> like it is a duplicate of #906225.

On double-checking, hit the wrong follow-up address.  I was wanting to
advise the maintainers these two looked to potentially be the same bug...





Bug#939633: More severe #939633 for RP4 on 5.8?

2020-11-25 Thread Elliott Mitchell
found 939633 5.8.10-1~bpo10+1
severity 939633 important
merge 935456 939633
quit

I'm left suspecting bugs #935456 and #939633 are in reality a single
bug: Raspberry Pi device trees were garbled during Debian's 5.2 kernel
development.

They appear to remain very garbled, to the point of being pretty well
useless.  I've built a kernel from Debian's 5.8 kernel source and the
device tree binary produced doesn't appear to allow a Raspberry PI 4B
to complete its boot.  Might be USB functionality is operational, but
neither ethernet interface nor display function.

Ironically, the additional ACPI/EFI support DOES function.  This means
the Tianocore image for Raspberry PI 4B works better with the current
source.

I'm unsure whether badly breaking all Raspberry PI variants quite
justifies critical or grave (popular machine, but kernel issues by nature
cause 10x the damage so severities should be somewhat damped).

I certainly hope to see the 5.9 release since that has additional
high-value improvements...





Bug#965049: linux-source-5.6 build issues for ARM64

2020-07-14 Thread Elliott Mitchell
On Tue, Jul 14, 2020 at 08:20:29PM -0700, Elliott Mitchell wrote:
> I'm speculating the build may work if I run the correct rule, but I
> haven't yet identified that.

To make things easier for others, "all" was sufficient.





Bug#965049: linux-source-5.6 build issues for ARM64

2020-07-14 Thread Elliott Mitchell
Package: src:linux
Version: 5.6.14-2~bpo10+1
Severity: important

I'm guessing this is isolated to ARM64 targets as I don't see other
reports.  I'm having difficulty trying to target "bindeb-pkg" with
linux-source-5.6.

During the initial phase the build was terminating quickly, complaining
about missing System.map.  I managed to work around this via
`make vmlinux modules`.  Now I'm at the error
"cp: cannot stat 'arch/arm64/boot/Image.gz': No such file or directory"

I'm speculating the build may work if I run the correct rule, but I
haven't yet identified that.

Kind of feels like all dependencies got lost for ARM64 targets.  This
may not warrant grave severity as some architectures build, but if you're
on ARM64 there is a major problem.





Bug#962254: Umask ignored when mounting NFSv4.2 share of an exported ZFS (with acltype=off) (was: Re: Bug#962254: NFS(v4) broken at 4.19.118-2)

2020-06-15 Thread Elliott Mitchell
On Mon, Jun 15, 2020 at 10:50:35AM -0400, J. Bruce Fields wrote:
> Honestly I don't think I currently have a regression test for this so
> it's possible I could have missed something upstream.  I haven't seen
> any reports, though
> 
> ZFS's ACL implementation is very different from any in-tree
> filesystem's, and given limited time, a filesystem with no prospect of
> going upstream isn't going to get much attention, so, yes, I'd need to
> see a reproducer on xfs or ext4 or something.

Salvatore managed to reproduce it with ext4, yet every prior report
where the filesystem was known involved ZFS.  This seems to suggest one of
two things.

First, could be enabling POSIX ACLs has been very strongly pushed by
other filesystems, while ZFS hasn't pushed them as strongly.

Second, could be a substantial majority of users of NFS are using ZFS.

If the former, this simply means an additional test case is needed.  If
the latter, then any testing of NFS which excludes ZFS is going to have
underwhelming coverage.





Bug#962254: Umask ignored when mounting NFSv4.2 share of an exported ZFS (with acltype=off) (was: Re: Bug#962254: NFS(v4) broken at 4.19.118-2)

2020-06-13 Thread Elliott Mitchell
On Sat, Jun 13, 2020 at 02:54:31PM +0200, Salvatore Bonaccorso wrote:
> indicated this was specifically observed on ZFS on Linux only. Seth
> Arnold's answer seem to be inline with that that the issue is more on
> the ZFS on Linux side and the issue keeps biting people a bit
> unexpectedly. Why does this break with ACL off settings?

I disagree with this assessment.  All of the reporters have been using
ZFS, but this could indicate an absence of testers using other
filesystems.  We need someone with an NFS server which has a 4.15+ kernel
and uses a different filesystem which supports ACLs.

I am doubtful, though, that ACLs are related to the actual problem.  My
impression of what I've read is they're a useful tool to work around the
problem, but not related to the actual cause.


> But there was at least one other (but again without further
> detail/followups) that it was observed on an export from OpenWRT, but
> no specific details here:
> 
> https://bugs.openwrt.org/index.php?do=details_id=2581

This appears to be the same reporter as the Red Hat bug report (comment 3
on the Red Hat report).  This is a report for the server portion of the
reporter's setup.

Analyzing the setup, I disagree with one of the prior assessments of this
report.  This is OpenWRT on x86_64 hardware which would suggest a
high-end router or embedded device.  Such might well have ECC memory and
a processor fast enough to handle ZFS.



Let me add one more data point.  I had been thinking I might need the
additional features in Linux-ZFS 0.7.12.  As such my NFS server had been
running a 4.9 kernel with Debian's ZFS 0.7.12-2+deb10u1~bpo9+1 packages.
Now with the problem manifesting my NFS server is running a 4.19 kernel
with Debian's ZFS 0.7.12-2+deb10u2 packages.

I could well believe the actual root cause is a problem with the
Linux-ZFS implementation.  What manifested the problem though seems to be
in Linux's NFS implementation between 4.9 and 4.15.  That is, Linux-ZFS
implemented /something/ which worked when implemented, but may not have
properly implemented the intended API and was broken by Linux-NFS.





Bug#962254: NFS(v4) broken at 4.19.118-2

2020-06-11 Thread Elliott Mitchell
Bit more experimentation on this issue.

I tried a very small C program meant to create files with fewer
permissions bits set.  This succeeded which strengthens the theory of
the umask getting ignored.

I haven't seen anything hinting whether this is more a client or server
issue.

I can speculate perhaps somewhere between 4.9 and 4.15 the NFS client
code stepped closer to the "proper" 4.2 protocol.  If the corresponding
NFS server code was slow at getting merged, what we're seeing could
happen.

Alternatively someone was trying to get a Linux NFS v4.2 client to work
better with a different NFS v4.2 server, so they fixed Linux's NFS v4.2
client.  Yet they failed to test with Linux's v4.2 server.


This though is speculation.  All I can say is sometime between kernels
4.9 and 4.15, NFS v4.2 got broken.  There are hints this is related to
handling of umask.





Bug#934160: Bug#962254: NFS(v4) broken at 4.19.118-2

2020-06-08 Thread Elliott Mitchell
Control: tags 962254 +security -unreproducible
Control: severity 962254 grave

On Fri, Jun 05, 2020 at 08:36:31PM +0200, Salvatore Bonaccorso wrote:
> This now let some rings bell, the described scenario is very similar
> to what was reported in https://bugs.debian.org/934160
> 
> Respectively
> https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/1779736 and
> https://bugzilla.redhat.com/show_bug.cgi?id=1667761 .

Upon more experimentation I continue to favor this being a kernel bug
(src:linux, bug #962254) and not a bug with nfs-common.

Setting vers=4.1 works around the issue, so this is *strictly* NFSv4.2.

I was able to reproduce this issue on a system with nfs-common
1:1.3.4-2.1 and a 4.19.118-2 kernel.

Based upon what I've observed I believe this requires a recent kernel on
*both* NFS client and NFS server.  An NFS client with 4.9 connecting to an
NFS server with 4.19 does NOT experience this issue.

I suspect my earlier assessment of this appearing between 4.19.98-1 and
4.19.118-2 was erroneous.  I think I was misled by the order of
computers being updated, and an NFS client with 4.19 connecting to an NFS
server with 4.9 also does not experience this issue.

According to
https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/1779736
this bug appeared somewhere between Linux kernels 4.9 and 4.15.

I concur with John Goerzen's assessment of this qualifying as grave due
to its security implications.





Bug#934160: Bug#962254: NFS(v4) broken at 4.19.118-2

2020-06-05 Thread Elliott Mitchell
I've run into a problem which produces the same behavior as bug #934160,
but attributed it elsewhere due to other observations.

What are the version(s) of the Linux kernel being used on your server and
clients?

I've confirmed using a 4.9 kernel on a client instead of a 4.19 kernel
also works around this issue.  In fact one client using a kernel from
4.19.98+1+deb10u1 source doesn't display the issue, but one using
4.19.118+2 source does.

This timeframe though doesn't match when you reported the issue.  Could
be there are several things working together to cause this.

I haven't yet tried using NFS version 4.1 instead of 4.2.





Bug#962254: NFS(v4) broken at 4.19.118-2

2020-06-05 Thread Elliott Mitchell
On Fri, Jun 05, 2020 at 08:36:31PM +0200, Salvatore Bonaccorso wrote:
> This now let some rings bell, the described scenario is very similar
> to what was reported in https://bugs.debian.org/934160
> 
> Respectively
> https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/1779736 and
> https://bugzilla.redhat.com/show_bug.cgi?id=1667761 .

Those do indeed seem similar and could be the same bug, merely
attributed to a different package.  Alternatively this is several bugs
and *all* of them need to be present for the issue to occur.

Seems I'll need to do some checking of the VM with the earlier kernel
and see which updates cause it to break...





Bug#962254: NFS(v4) broken at 4.19.118-2

2020-06-05 Thread Elliott Mitchell
On Fri, Jun 05, 2020 at 08:44:26AM +0200, Salvatore Bonaccorso wrote:
> 
> On Thu, Jun 04, 2020 at 10:16:07PM -0700, Elliott Mitchell wrote:
> > Somewhere between linux-image-4.19.0-8-amd64/4.19.98+1+deb10u1 and
> > linux-image-4.19.0-9-amd64/4.19.118+2 NFS, in particular v4 got broken.
> > Mounting an appropriate filesystem became unreliable, and once mounted
> > behavior is unpredictable.
> > 
> > In particular in the problematic case `umask 022 ; touch foo ; ls -l foo`
> > yields a -rw-rw-rw- file.
> > 
> > This occurs if *both* the server *and* client are on 4.19.118+2.  I have
> > confirmed this does NOT occur if the server is on a 4.9 kernel.  I have
> > also confirmed this does NOT occur if the client is on a 4.9 or
> > 4.19.98+1+deb10u1 kernel.
> 
> I cannot reproduce the described behaviour. Can you give more details
> on your setup?
> 
> How do you export the filesystem?
> What is the underlying filesystem exported?
> How and with which options do clients mount the NFS share?

Presently whole directories are being exported to hosts.  The
filesystem on the server is ZFS.

Client is mounting hard,intr.  Client is using cachefilesd, but that
appears unrelated to the issue.

As this is NFSv4 (v2 and v3 are thoroughly disabled on the server), TCP
is being used.  The port is non-standard.

I'm uncertain I properly tried server on 4.9, client on 4.19.118+2 (could
be this is strictly 4.19.118+2 NFSv4 client code).





Bug#962254: NFS(v4) broken at 4.19.118-2

2020-06-04 Thread Elliott Mitchell
Package: src:linux
Version: 4.19.118+2
Severity: important

Somewhere between linux-image-4.19.0-8-amd64/4.19.98+1+deb10u1 and
linux-image-4.19.0-9-amd64/4.19.118+2 NFS, in particular v4 got broken.
Mounting an appropriate filesystem became unreliable, and once mounted
behavior is unpredictable.

In particular in the problematic case `umask 022 ; touch foo ; ls -l foo`
yields a -rw-rw-rw- file.
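
The check above can be repeated as a small script.  This is a minimal
sketch, not the exact commands used for this report: the helper name
`check_umask` and passing the directory as an argument are assumptions
made for illustration.  Point it at a directory on the NFSv4.2 mount; on
an unaffected filesystem it should report -rw-r--r--.

```shell
#!/bin/sh
# check_umask DIR: create a file in DIR under umask 022 and print the
# permission string ls reports for it.  (Hypothetical helper name.)
check_umask() {
    dir=$1
    # Run in a subshell so the caller's umask is left untouched.
    (
        umask 022
        f="$dir/umask-test.$$"
        rm -f "$f"
        touch "$f" || exit 1
        # The first ten characters of `ls -l` are the type and mode bits.
        ls -l "$f" | cut -c1-10
        rm -f "$f"
    )
}

# Example: try it on a local temporary directory first, then on the
# NFS mount (the affected mounts gave -rw-rw-rw-).
tmp=$(mktemp -d) && check_umask "$tmp" && rmdir "$tmp"
```

Running the same function against a local directory and against the NFS
mount gives a side-by-side comparison without changing the shell's own
umask.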

This occurs if *both* the server *and* client are on 4.19.118+2.  I have
confirmed this does NOT occur if the server is on a 4.9 kernel.  I have
also confirmed this does NOT occur if the client is on a 4.9 or
4.19.98+1+deb10u1 kernel.





Bug#926046: Negotiated or default wsize causes misbehavior

2019-03-30 Thread Elliott Mitchell
Package: nfs-common
Version: 1:1.3.4-2.1

I'm using NFSv4 over TCP at the moment.  If I don't specify rsize and
wsize on the client, either the client negotiates a wsize of 256KB or
defaults to a wsize of 256KB ("wsize=262144").

When dumping large amounts of data (moving 2TB of data around, figure
many 200MB files) onto the server, after a while the mount hangs and then
messages start appearing in the server kernel log:
"[sss.mmm] NFSD: client x testing state ID with incorrect client ID"
After several minutes the mount was recovering, but having an entire
machine locked up for a while is a problem.

During an attempt to revert to using UDP, I discovered that explicitly
setting wsize=8192 fixed the problem (this size is reasonable with UDP if
you've got jumbo-frame support).  I'm guessing either the default is bad
or negotiation is failing to generate a working value.
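
To see which wsize is actually in effect, the mount options can be read
back from /proc/mounts.  A minimal sketch follows; the helper name
`nfs_wsize` is an assumption made for illustration, and the optional file
argument exists only so it can be tried against a saved copy.

```shell
#!/bin/sh
# nfs_wsize [FILE]: for each nfs/nfs4 entry in a /proc/mounts style file,
# print the mount point and the value of its wsize= option.
nfs_wsize() {
    awk '$3 == "nfs" || $3 == "nfs4" {
        n = split($4, opt, ",")
        for (i = 1; i <= n; i++)
            if (opt[i] ~ /^wsize=/)
                print $2, substr(opt[i], 7)
    }' "${1:-/proc/mounts}"
}

# Example usage on a live system (prints nothing without NFS mounts):
#   nfs_wsize
```

Remounting with an explicit option such as wsize=8192 and running this
again shows whether the override took effect.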





Bug#801067: Concurring with #801067

2019-03-30 Thread Elliott Mitchell
I have little option but to agree with Reuben Thomas.  The bottom of
README.Debian.nfsv4 has a date of "Wed, 11 Oct 2006 15:18:03 +0200", more
than 10 years old.

Even for Debian, being in the distribution for 10 years no longer
qualifies as "rather new".  A 2.6 kernel is no longer "recent" in light
of Debian being on 4.9 now.  The lines suggested to be added to
/etc/services on the client are now present in Debian's default
/etc/services file.

Yeah, that file needs a bit of an update or removal...





Bug#524458: Still a problem? NFS version?

2019-03-30 Thread Elliott Mitchell
It has been quite some time since there was last any activity on #524458.
Is this problem still occurring for the submitter?  Might it have been
fixed in one of the update rounds?

If this is still a problem, what version of the NFS protocol is in use?
In theory NFSv2 should be able to handle files under 2GB, but perhaps a
limitation of Linux's NFS client or NFS server made a 176MB file a
problem.  Version 3 of the protocol is widely supported; I'd suggest
moving to version 3 or version 4 if this mount is still on version 2.





Bug#903914: xen_netfront broken in 4.9.110-1

2018-07-16 Thread Elliott Mitchell
Package: linux-source-4.9
Version: 4.9.110-1

Anyone who was using jumbo frames inside a Xen guest was fine with
4.9.88-1+deb9u1, but a problem suddenly showed up with 4.9.110-1.

Discussion of problem:

https://lists.gt.net/xen/devel/519117

Something which acts like a working patch is here:

http://ubuntu.5.x6.nabble.com/Xenial-Regression-SRU-Fix-quot-Cannot-set-MTU-higher-than-1500-in-Xen-instance-quot-td5170202.html





Bug#832629: CONFIG_CGROUPS=n appears to have been broken in updates

2016-07-27 Thread Elliott Mitchell
Package: src:linux
Version: 3.16.7-ckt25-2+deb8u3
Severity: important

Unfortunately I cannot finger the exact version where it happened, but it
appears one of the updates to linux-source-3.16 *broke* builds where
CONFIG_CGROUPS was left unset.  While this may be an unusual
configuration, it certainly *was* working.

I thought updates were supposed to be confined to *security* issues, the
better to avoid breaking other setups confirmed to work.





Bug#696292: Is #696292 a duplicate of #588675?

2016-07-22 Thread Elliott Mitchell
Reads like #696292 might be yet another manifestation of #588675, or
perhaps #588675 is the root cause (or related to the root cause) of
#696292.





Bug#827561: Update 3.2.78 -> 3.2.81 broke builds in fs/fcntl.c

2016-06-17 Thread Elliott Mitchell
Control: tags -1 patch

On Fri, Jun 17, 2016 at 11:28:27PM +0100, Ben Hutchings wrote:
> On Fri, 2016-06-17 at 12:27 -0700, Elliott Mitchell wrote:
> > Package: linux-source-3.2
> > Version: 3.2.81-1
> > Severity: important
> > 
> > SSIA:
> > 
> >   CC  fs/fcntl.o
> > fs/fcntl.c: In function 'setfl':
> > fs/fcntl.c:186:31: error: dereferencing pointer to incomplete type
> > fs/fcntl.c:187:30: error: dereferencing pointer to incomplete type
> > make[2]: *** [fs/fcntl.o] Error 1
> > make[2]: *** Waiting for unfinished jobs
> > 
> > That would be a problem for this update, this hunk of code is new for
> > 3.2.81.  Seems someone forgot a header (I'm not yet sure which).
> 
> This code was added as part of the fix for #627782.  It builds
> successfully in Debian's own configurations.
> 
> It looks like this build failure occurs if CONFIG_MODULES is disabled
> and you should be able to avoid it by enabling that.

Problem is that was a very deliberate choice on the particular computer.
Unusual, but something that *should* work.

I've got a partial patch for general consumption attached.  I'm pretty
sure the changes done for #627782 are buggy.  If someone builds a kernel
with AUFS built into the kernel the test in fcntl.c will fail (the test
only works if AUFS is a module).




--- fcntl.c.orig	2016-06-16 10:03:04.0 -0700
+++ fcntl.c	2016-06-17 17:58:26.0 -0700
@@ -182,10 +182,24 @@
 	 * Since only aufs will implement it, check that the file ops
 	 * are implemented by a version of aufs that does.  (Ugh.)
 	 */
-	if (!error && filp->f_op->owner &&
-	!strcmp(filp->f_op->owner->name, "aufs") &&
-	strstr(filp->f_op->owner->version, "+setfl"))
+#if defined(CONFIG_MODULES)
+#if defined(CONFIG_AUFS_FS)
+#if 0
+#if CONFIG_AUFS_FS == "y"
+#error "CONFIG_AUFS_FS=y is a known problem, see #627782"
+#endif
+#endif
+#define AUFS_UNLIKELY
+#else
+#define AUFS_UNLIKELY unlikely
+#endif
+	if (likely(!error) && filp->f_op->owner &&
+	!AUFS_UNLIKELY(strcmp(filp->f_op->owner->name, "aufs")) &&
+	AUFS_UNLIKELY(strstr(filp->f_op->owner->version, "+setfl")))
 		error = filp->f_op->setfl(filp, arg);
+#elif defined(CONFIG_AUFS_FS)
+#error "CONFIG_MODULES=n && CONFIG_AUFS_FS=y is a known problem, see #627782"
+#endif
 	if (error)
 		return error;
 


Bug#827561: Update 3.2.78 -> 3.2.81 broke builds in fs/fcntl.c

2016-06-17 Thread Elliott Mitchell
Okay, looked through and not quite the problem I thought.  Problem is the
section added to fs/fcntl.c:setfl() depends upon CONFIG_MODULES being
enabled.  Certainly turning off kernel modules isn't all that common, but
it is a configuration that is actively used in some situations.

I also note the added code is only really useful if CONFIG_AUFS_FS is
enabled.





Bug#827561: Update 3.2.78 -> 3.2.81 broke builds in fs/fcntl.c

2016-06-17 Thread Elliott Mitchell
Package: linux-source-3.2
Version: 3.2.81-1
Severity: important

SSIA:

  CC  fs/fcntl.o
fs/fcntl.c: In function 'setfl':
fs/fcntl.c:186:31: error: dereferencing pointer to incomplete type
fs/fcntl.c:187:30: error: dereferencing pointer to incomplete type
make[2]: *** [fs/fcntl.o] Error 1
make[2]: *** Waiting for unfinished jobs

That would be a problem for this update, this hunk of code is new for
3.2.81.  Seems someone forgot a header (I'm not yet sure which).





Bug#588675: Beginnings of Heading Towards a Root Cause of #588675

2016-06-16 Thread Elliott Mitchell
For some time I'd been trying to search for a cause of #588675.  Looks
like I finally searched for the right string (problem is "root" occurs in
many places inside the Linux kernel source).

Looks like the key file is linux/init/do_mounts.c:

Appears the line:
    ROOT_DEV = name_to_dev_t(root_device_name);
inside prepare_namespace() resolves any specified root device into
major/minor.  Later at the end of mount_root(), /dev/root is created with
the appropriate major/minor, but mount_root() never tries to resolve the
major/minor back into a proper device name.

The two spots that I've gotten hints of potentially being able to get
back the proper device name are: Inside do_mount_root(), s->s_id is
"sda1", but I'm a bit worried that may not work in cases with LVM where
the proper result could have been "scsi0/target0/".
The other is that doing bdevname(bdget(ROOT_DEV), char_buffer)
may give something approximating a proper name.
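
For comparison, user space can already map a major/minor pair back to a
name through sysfs.  The sketch below is not the kernel-side fix being
discussed; it only illustrates the kind of reverse lookup that
mount_root() skips.  The helper name `devname_from_devt` is hypothetical,
and it assumes a kernel new enough to provide
/sys/dev/block/MAJOR:MINOR/uevent with a DEVNAME= line.

```shell
#!/bin/sh
# devname_from_devt MAJOR:MINOR [SYSFS_ROOT]: print the /dev path that
# sysfs associates with a block device number.
devname_from_devt() {
    uevent="${2:-/sys}/dev/block/$1/uevent"
    [ -r "$uevent" ] || return 1
    # The uevent file holds KEY=value lines; DEVNAME is relative to /dev.
    name=$(sed -n 's/^DEVNAME=//p' "$uevent")
    [ -n "$name" ] && echo "/dev/$name"
}

# Example usage on a live system, assuming util-linux's mountpoint(1)
# is available to report the device number of the root filesystem:
#   devname_from_devt "$(mountpoint -d /)"
```

The second argument exists only so the function can be pointed at a copy
of a sysfs tree instead of the live one.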

From looking at the current code, I suspect while this behavior may have
first appeared with SCSI devices, it may well have spread to all block
devices other than MTD and UBI (commonly being embedded devices with
memory completely inadequate to hold an initial ramdisk, users of MTD
device roots would have screamed too loudly to ignore).  So I got that
wrong.

If things go well, I may have a patch soon (alas, I'm also having to
fight other issues as well, so that could take a while).





Bug#826999: Kernel build scripts confused by entries for SCSI devices in /proc/mounts

2016-06-16 Thread Elliott Mitchell
Control: reopen 826999

On Sat, Jun 11, 2016 at 10:06:16AM -0700, Elliott Mitchell wrote:
> On Sat, Jun 11, 2016 at 01:15:18PM +0100, Ben Hutchings wrote:
> > I make no judgement about the significance of that bug.  But if you
> > refuse to answer a maintainer's reasonable questions about a report,
> > there is no way to progress and it should be closed.
> 
> That is a perfectly reasonable statement.  Please cite an example of
> such an unanswered question.

Can I take the lack of response to this as an admission that there are no
such unanswered questions?  Should I go further and suggest perhaps you
didn't fully read a prior message?





Bug#826999: Kernel build scripts confused by entries for SCSI devices in /proc/mounts

2016-06-11 Thread Elliott Mitchell
On Sat, Jun 11, 2016 at 01:15:18PM +0100, Ben Hutchings wrote:
> Control: forcemerge 588675 -1
> 
> On Fri, 2016-06-10 at 18:16 -0700, Elliott Mitchell wrote:
> > The kernel build scripts are confused by what the SCSI subsystem produces
> > in /proc/mounts:
> 
> This is not controlled by the SCSI subsystem.
> 
> Unlike you, I've read the relevant source code, and I'm the upstream
> maintainer of initramfs-tools.  I've seen problems like this before.
> When I ask you questions I am not just speculating.

I'm pretty sure the initramfs isn't the problem, though its presence does
manage to work around bug #588675.

> > $ awk '$2 == "/" && $1 != "rootfs"' < /proc/mounts
> > /dev/root / ext3 ro
> > $ 
> > 
> > A kernel build on such a system will panic on boot unless the root
> > filesystem is explicitly passed to the kernel by the bootloader.  While
> > in common configurations bootloaders generally default to telling the
> > kernel what device it should use as root, that has not been documented to
> > be required.
> 
> It seems pretty obvious to me that you have to specify the root device
> somehow.

Indeed, and there are lots of ways to do that.  There is the "rdev"
setting (which I'm pretty sure is what underlies #826999); you or your
bootloader can also specify a device on the kernel's command-line.

> > Since Ben Hutchings thinks #588675 is too insignificant to ever be worthy
> > of a single line of code to fix, this bug now needs to be fixed (along
> > with many other utilities that are broken by #588675).
> 
> I make no judgement about the significance of that bug.  But if you
> refuse to answer a maintainer's reasonable questions about a report,
> there is no way to progress and it should be closed.

That is a perfectly reasonable statement.  Please cite an example of
such an unanswered question.





Bug#826999: Kernel build scripts confused by entries for SCSI devices in /proc/mounts

2016-06-10 Thread Elliott Mitchell
Package: src:linux
Version: 3.2.63-2
Control: found -1 linux/2.6.18
Control: found -1 linux/3.16.7-ckt25-2
Control: found -1 linux/3.16.7-ckt20-1+deb8u3
Control: found -1 linux/3.2.78-1
Control: found -1 linux/3.16.7-ckt11-1+deb8u6~bpo70+1
Control: found -1 linux/2.6.32

The kernel build scripts are confused by what the SCSI subsystem produces
in /proc/mounts:

$ awk '$2 == "/" && $1 != "rootfs"' < /proc/mounts
/dev/root / ext3 ro
$ 

A kernel build on such a system will panic on boot unless the root
filesystem is explicitly passed to the kernel by the bootloader.  While
in common configurations bootloaders generally default to telling the
kernel what device it should use as root, that has not been documented to
be required.

Since Ben Hutchings thinks #588675 is too insignificant to ever be worthy
of a single line of code to fix, this bug now needs to be fixed (along
with many other utilities that are broken by #588675).





Bug#588675: Summary of observations of #588675

2016-06-10 Thread Elliott Mitchell
On Fri, Jun 10, 2016 at 10:19:47PM +0100, Ben Hutchings wrote:
> Are you using LILO?  Are you specifying the root device by name or
> UUID?

I'm quite certain that is completely irrelevant.  Either of those, or
even allowing the rdev field in the kernel image should result in the
device being shown for / in /proc/mounts.  On the most recent boot of
this machine though, "root=/dev/sda1" is in /proc/cmdline, yet the line
in /proc/mounts is "/dev/root / ext3 ..."
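
The mismatch can be checked mechanically.  The following is a rough
sketch, assuming the kernel was given a plain device path in a root=
argument (a UUID= or LABEL= style value would never compare equal); the
helper names `cmdline_root`, `mounts_root` and `report_root` are made up
for illustration.  Each takes file arguments so it can be tried against
saved copies of /proc/cmdline and /proc/mounts.

```shell
#!/bin/sh
# cmdline_root [FILE]: print the value of the last root= argument on the
# kernel command line (empty if none was given).
cmdline_root() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" | sed -n 's/^root=//p' | tail -n 1
}

# mounts_root [FILE]: print the device field of the "/" entry, skipping
# the rootfs placeholder (the same filter used in the reports on this bug).
mounts_root() {
    awk '$2 == "/" && $1 != "rootfs" { print $1 }' "${1:-/proc/mounts}"
}

# report_root CMDLINE MOUNTS: say whether the two agree.
report_root() {
    want=$(cmdline_root "$1")
    have=$(mounts_root "$2")
    if [ "$want" = "$have" ]; then
        echo "match: $have"
    else
        echo "mismatch: cmdline=$want mounts=$have"
    fi
}

# Example usage on a live system:
#   report_root /proc/cmdline /proc/mounts
```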

The only common factor is the SCSI subsystem.





Bug#588675: Summary of observations of #588675

2016-06-10 Thread Elliott Mitchell
Control: retitle 588675 SCSI subsystem loses name of root device on boot
Control: severity 588675 normal
Control: found 588675 3.2.78-1
Control: found 588675 3.16.7-ckt20-1+deb8u3
Control: found 588675 3.16.7-ckt25-2
Control: found 588675 2.6.18

According to the advanced information on the BTS, under severity levels:

   wishlist
          for any feature request, and also for any bugs that are very
          difficult to fix due to major design considerations.

The first condition is untrue, this is definitely a bug.  While the
damage may not be that major, it is pretty widespread.  If the Debian
kernel maintainers were to claim this wasn't a problem, then I would be
forced to report another bug against src:linux since the kernel build
scripts themselves are confused by this behavior!

The second condition requires a judgement call to evaluate, but looking
at things I'm pretty sure it is untrue.  I'm guessing this is simply one
crucial field that needs to be copied by the SCSI subsystem, but is not.
Since many other subsystems manage to copy the value, almost certainly
the change is small.  I'd be surprised if it took more than 4 lines to
fix (two of which being blank and one being a comment).  I will concede
this may need expertise on how /proc/mounts works and the interface
between that and the driver subsystems (alternatively simply looking for
one field which is ignored may be enough), but with that this should be a
simple fix.

Meanwhile the damage from this bug may not be that large, but it is
rather widespread.  I know of 4 reports where this is the root cause and
I imagine there are others I do not know of.  There may also be many
utilities that already work around this bug and hundreds of scripts that
are similarly forced to do so.

This bug has also wasted a great deal of time trying to figure out where
to attribute the issue.  My earliest observations were close to a decade
ago, but I didn't feel confident placing blame anywhere.  Then more
recently I had to spend time building several kernels to confirm the
conditions under which the problem occurred.


Unaffected systems:

This group consists of all systems where the root filesystem is NOT on a
device that directly plugs into the SCSI subsystem.  It does not matter
whether an initial ramdisk is used or not.  This includes systems like:

root on Linux software RAID:
$ awk '$2 == "/" && $1 != "rootfs"' < /proc/mounts
/dev/md0 / ext3 ro 0 0
$ 
I recall this system being in service from around 2.6.5(?) to 2.6.18 or
so.  Even though the immediate driver was the MD subsystem, underlying
this were SCSI devices.  This is long in the past, but I'd already been
observing the bug by then (and wondering where to point the finger).

root on olde IDE devices, on the olde IDE subsystem:
$ awk '$2 == "/" && $1 != "rootfs"' < /proc/mounts
/dev/hda1 / ext3 ro 0 0
$ 
I think this system managed to remain in service into the 2.6.29
timeframe, but is also no longer in service.  This does give an example
of the root filesystem being on a different subsystem though.  Crucially
this is prior to the olde IDE subsystem being retired and the driver for
PATA devices which plugged into the SCSI subsystem coming into service.

root on MTD devices:
$ awk '$2 == "/" && $1 != "rootfs"' < /proc/mounts
/dev/mtdblock4 / jffs2 rw,relatime 0 0
$ 
A very different system here.  Different filesystem and rather different
device.  This one hasn't been tried with kernels earlier than 3.2, but
seems to echo other observations.  This one is in active service and due
to interesting setup allows for testing of some interesting scenarios.

root on BLK_DEV_IDE_PMAC (olde Mac IDE subsystem?):
This is Christian Kujau's report in bug #588675.  I believe
BLK_DEV_IDE_PMAC would be a PowerMac analog of the x86 IDE driver which
had its own subsystem and which didn't plug into the SCSI subsystem.


Affected systems:

This group consists of all systems where the root filesystem is on a
device that directly plugs into the SCSI subsystem and the system
directly mounts that device at boot.  On such systems:

$ awk '$2 == "/" && $1 != "rootfs"' < /proc/mounts
/dev/root /  ro,relatime 0 0
$ 

Most of my systems are running ext3, but Christian Kujau confirmed this
with ext4 and jfs.  Christian Kujau also observed this with the
PATA_MACIO driver, which I believe is a Macintosh equivalent of the x86
PATA driver which plugs into the SCSI subsystem.  I've observed this on
many different systems with devices which plug into the SCSI subsystem,
this includes a 3ware card, SATA disks, USB flash drives and genuine SCSI
disks.


Workaround:

The workaround that bypasses the problem is to initially mount some other
device as root, then pivot_root or such onto the real root.  Using an
initial ramdisk is one example of this.  From the DebWRT project I'm also
aware that booting onto a root on MTD and then doing a pivot_root onto a
USB flash key works around the issue.

$ awk '$2 == "/" && $1 != "rootfs"' < 

Bug#820567: kexec on mipsel partially broken between ckt20 and ckt25

2016-04-10 Thread Elliott Mitchell
On Mon, Apr 11, 2016 at 01:34:56AM +0100, Ben Hutchings wrote:
> On Sun, 2016-04-10 at 14:32 -0700, Elliott Mitchell wrote:
> > On Sun, Apr 10, 2016 at 07:47:28PM +0100, Ben Hutchings wrote:
> > > That in no way contradicts what I said. :-)  When I backport the linux
> > > source package from jessie to wheezy I change it to use gcc-4.6.
> > > 
> > > But the linux-source-X.Y packages (which are a different thing to the
> > > linux source package!) don't specify any particular compiler version.
> > > You can choose that with the CC variable; otherwise the default
> > > compiler (specified by the gcc package) will be used.
> > For this particular mipsel device I was unable to kexec the kernel unless
> > it was built with GCC-4.8.
> 
> I see.
> 
> > If the kernel was built with GCC-4.7 or
> > earlier, I got symptoms identical to the above, messages from the old
> > kernel on the console serial port that it was going away and kexec'd
> > kernel never output any messages.  I could believe this is a funky
> > compiler issue.
> 
> Could it be the kernel image is close to a critical size limit?  The
> kernel typically gets slightly larger with each stable update.  Does
> gcc 4.8 generate a smaller or larger kernel image than older versions?

You win one and lose one.  I tracked down the configuration option that
managed to switch from "y" to "n" (seems my base config had it as "y",
but other options interfered, now it is "m"), that shrank the kernel by
40KB and the resultant kernel was successfully loaded by the 3.3 kernel.

The kernel built with GCC 4.4 was about 2% larger than the GCC 4.8 build,
while a GCC 4.6 build was less than 1% smaller than the GCC 4.8 build.
Neither of these kernels was able to successfully start when kexec'd by
a 3.16 kernel (which *was* able to start the bigger kernel).
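
A rough way to repeat the size comparison above, as a sketch only (the
function name and the image file names in the usage note are hypothetical,
nothing here comes from the bug report itself):

```shell
# size_diff_pct BASE OTHER: print how much OTHER differs in size from BASE,
# as a percentage of BASE (positive means OTHER is larger).
size_diff_pct() {
    base=$(wc -c < "$1") || return 1
    other=$(wc -c < "$2") || return 1
    awk -v b="$base" -v o="$other" \
        'BEGIN { printf "%.1f\n", (o - b) * 100 / b }'
}
```

Something like `size_diff_pct vmlinux-gcc48 vmlinux-gcc44` would then print
how much larger (positive) or smaller (negative) the second image is.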


So this solves the problem this bug was about, it was a size issue.  :-(

Alas I'm expecting it to be a while before I can get the proper solution
in place.


-- 
(\___(\___(\__  --=> 8-) EHM <=--  __/)___/)___/)
 \BS (| ehem+sig...@m5p.com  PGP 87145445 |)   /
  \_CS\   |  _  -O #include  O-   _  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445



Bug#820567: kexec on mipsel partially broken between ckt20 and ckt25

2016-04-10 Thread Elliott Mitchell
On Sun, Apr 10, 2016 at 07:47:28PM +0100, Ben Hutchings wrote:
> On Sun, 2016-04-10 at 11:09 -0700, Elliott Mitchell wrote:
> > On Sun, Apr 10, 2016 at 10:09:38AM +0100, Ben Hutchings wrote:
> > > 
> > > On Sat, 2016-04-09 at 18:31 -0700, Elliott Mitchell wrote:
> > > > 
> > > > Between 3.16.7-ckt20 and 3.16.7-ckt25 the kexec functionality of the
> > > > Linux kernel was damaged.  The system I'm looking at uses a 3.3 kernel
> > > > to load the "real" kernel off a filesystem and kexec into that.  The 3.3
> > > > kernel was able to successfully kexec into a 3.16.7-ckt20 kernel, but
> > > > is unable to kexec into a 3.16.7-ckt25 kernel.  However I found the
> > > > 3.16.7-ckt20 IS able to successfully kexec the 3.16.7-ckt25 kernel.
> > > Surely this is a bug in the built-in (3.3) kernel, not the new one?  If
> > > there's something simple that can be done in the Debian kernel to work
> > > around this, we should do that, but otherwise you're stuck with this.
> > This is certainly a reasonable theory.  Alas, I cannot speak to which of
> > these theories is correct.  All I can say for certain is that something
> > changed between ckt20 and ckt25 which made the 3.3 kernel unable to kexec
> > ckt25.  I'm under the impression as of 3.3 the kexec functionality was
> > supposed to be stable on MIPS, but that could be incorrect.
> > 
> > I should also note, during the failed kexecs I would see the messages
> > from the 3.3 kernel saying the kexec was starting, but never see any
> > messages from the ckt25 kernel.  Unless someone wants to send me a JTAG
> > decoder that is all I can say.
> 
> One of the MIPS porters may be able to help you, but I have no idea
> what to suggest.
> 
> Are you using one of the linux-image packages or building from source
> with your own configuration?  In the latter case, are you sure you used
> the same configuration for old and new kernels?

Building from source.  The .config files used started identical, but
looks like things changed in the Kconfig files which caused 4 items to
switch from 'm' to 'y' (all networking, which shouldn't cause the
observed bug).  There were also some patches derived from OpenWRT's
patches, but those did not change at all.


> > > > Doing a double-kexec does work around the issue, but it means I need to
> > > > hold onto that one magic kernel for the moment...
> > > > 
> > > > In other news, it appears sometime between 3.3 and 3.10 there started
> > > > being a requirement for GCC 4.8 on mipsel.
> > > Packages in jessie must be buildable using compiler versions in jessie.
> > > That means either gcc-4.8 or gcc-4.9.
> > linux-source-3.16 is available as a backport to wheezy, which does not
> > feature gcc-4.8.
> 
> That in no way contradicts what I said. :-)  When I backport the linux
> source package from jessie to wheezy I change it to use gcc-4.6.
> 
> But the linux-source-X.Y packages (which are a different thing to the
> linux source package!) don't specify any particular compiler version.
> You can choose that with the CC variable; otherwise the default
> compiler (specified by the gcc package) will be used.

For this particular mipsel device I was unable to kexec the kernel unless
it was built with GCC-4.8.  If the kernel was built with GCC-4.7 or
earlier, I got symptoms identical to the above, messages from the old
kernel on the console serial port that it was going away and kexec'd
kernel never output any messages.  I could believe this is a funky
compiler issue.


-- 
(\___(\___(\__  --=> 8-) EHM <=--  __/)___/)___/)
 \BS (| ehem+sig...@m5p.com  PGP 87145445 |)   /
  \_CS\   |  _  -O #include  O-   _  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445



Bug#820567: kexec on mipsel partially broken between ckt20 and ckt25

2016-04-10 Thread Elliott Mitchell
On Sun, Apr 10, 2016 at 10:09:38AM +0100, Ben Hutchings wrote:
> On Sat, 2016-04-09 at 18:31 -0700, Elliott Mitchell wrote:
> > Between 3.16.7-ckt20 and 3.16.7-ckt25 the kexec functionality of the
> > Linux kernel was damaged.  The system I'm looking at uses a 3.3 kernel
> > to load the "real" kernel off a filesystem and kexec into that.  The 3.3
> > kernel was able to successfully kexec into a 3.16.7-ckt20 kernel, but
> > is unable to kexec into a 3.16.7-ckt25 kernel.  However I found the
> > 3.16.7-ckt20 IS able to successfully kexec the 3.16.7-ckt25 kernel.
> 
> Surely this is a bug in the built-in (3.3) kernel, not the new one?  If
> there's something simple that can be done in the Debian kernel to work
> around this, we should do that, but otherwise you're stuck with this.

This is certainly a reasonable theory.  Alas, I cannot speak to which of
these theories is correct.  All I can say for certain is that something
changed between ckt20 and ckt25 which made the 3.3 kernel unable to kexec
ckt25.  I'm under the impression as of 3.3 the kexec functionality was
supposed to be stable on MIPS, but that could be incorrect.

I should also note, during the failed kexecs I would see the messages
from the 3.3 kernel saying the kexec was starting, but never see any
messages from the ckt25 kernel.  Unless someone wants to send me a JTAG
decoder that is all I can say.


> > Doing a double-kexec does work around the issue, but it means I need to
> > hold onto that one magic kernel for the moment...
> > 
> > In other news, it appears sometime between 3.3 and 3.10 there started
> > being a requirement for GCC 4.8 on mipsel.
> 
> Packages in jessie must be buildable using compiler versions in jessie.
> That means either gcc-4.8 or gcc-4.9.

linux-source-3.16 is available as a backport to wheezy, which does not
feature gcc-4.8.


-- 
(\___(\___(\__  --=> 8-) EHM <=--  __/)___/)___/)
 \BS (| ehem+sig...@m5p.com  PGP 87145445 |)   /
  \_CS\   |  _  -O #include  O-   _  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445



Bug#820567: kexec on mipsel partially broken between ckt20 and ckt25

2016-04-09 Thread Elliott Mitchell
Package: linux-source-3.16
Version: 3.16.7-ckt25-1~bpo70+1

Between 3.16.7-ckt20 and 3.16.7-ckt25 the kexec functionality of the
Linux kernel was damaged.  The system I'm looking at uses a 3.3 kernel
to load the "real" kernel off a filesystem and kexec into that.  The 3.3
kernel was able to successfully kexec into a 3.16.7-ckt20 kernel, but
is unable to kexec into a 3.16.7-ckt25 kernel.  However I found the
3.16.7-ckt20 IS able to successfully kexec the 3.16.7-ckt25 kernel.

Doing a double-kexec does work around the issue, but it means I need to
hold onto that one magic kernel for the moment...

In other news, it appears sometime between 3.3 and 3.10 there started
being a requirement for GCC 4.8 on mipsel.


-- 
(\___(\___(\__  --=> 8-) EHM <=--  __/)___/)___/)
 \BS (| ehem+sig...@m5p.com  PGP 87145445 |)   /
  \_CS\   |  _  -O #include  O-   _  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445



Bug#588675: / left as /dev/root with non-initrd kernel

2015-12-03 Thread Elliott Mitchell
On Thu, Dec 03, 2015 at 03:11:45PM -0800, Christian Kujau wrote:
> On 12/02/2015 04:30 PM, Elliott Mitchell wrote:
> > You're thinking of the wrong bug.  #588675 is the bug # for /proc/mounts
> > having "/dev/root" listed as the device for the root filesystem.  Your
> 
> Indeed, I think I confused this with #656333 ("Please ignore rootfs in
> df output"), which may be related to this one.

So many bugs, so little time.  :-/   This actually affected many other
utilities as well.  I'm a bit surprised if this is getting fixed in the
kernel, I thought the kernel maintainers had decided "this is the way the
kernel does this, userspace needs to compensate".

> > previous mention indicated you would expect "/dev/sda6" to be there.  I'm
> > guessing prior to wheezy, when you were using BLK_DEV_IDE_PMAC, you would
> > have been seeing "/dev/hda6" listed as the root device?
> 
> Again, I'm afraid I don't remember what I've seen prior to wheezy. The
> system is running 24x7 but I'll try to boot a pre-wheezy image (or
> something with BLK_DEV_IDE_PMAC enabled) the next time this machine is
> rebooted and see if the actual disk or /dev/rootfs is displayed.

Thanks.  I'm pretty sure it will show the actual disk, but another
confirmation will help.


-- 
(\___(\___(\__  --=> 8-) EHM <=--  __/)___/)___/)
 \BS (| ehem+sig...@m5p.com  PGP 87145445 |)   /
  \_CS\   |  _  -O #include  O-   _  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445



Bug#588675: / left as /dev/root with non-initrd kernel

2015-12-02 Thread Elliott Mitchell
Control: found -1 3.16.7-ckt11-1+deb8u6~bpo70+1
Control: found -1 2.6.32

Could you confirm a few things about what you've seen of bug 588675?

Did you observe the behavior prior to Debian wheezy/Linux kernel 3.2?

What type of disk/controller/disk subsystem is on your powerpc system?

From your mention of /dev/sda6 in bug #588675 it is clear as of Debian
wheezy/Linux kernel 3.2 that the disk/controller plugs into the SCSI
subsystem.  I'm pretty sure this is a SCSI subsystem bug.  Since you've
also seen the behavior, I'd like to confirm this has followed the SCSI
subsystem for you as well.


-- 
(\___(\___(\__  --=> 8-) EHM <=--  __/)___/)___/)
 \BS (| ehem+sig...@m5p.com  PGP 87145445 |)   /
  \_CS\   |  _  -O #include  O-   _  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445



Bug#588675: / left as /dev/root with non-initrd kernel

2015-12-02 Thread Elliott Mitchell
On Wed, Dec 02, 2015 at 02:15:18PM -0800, Christian Kujau wrote:
> On 12/02/2015 01:23 PM, Elliott Mitchell wrote:
> > Could you confirm a few things about what you've seen of bug 588675?
> > 
> > Did you observe the behavior prior to Debian wheezy/Linux kernel 3.2?
> > 
> > What type of disk/controller/disk subsystem is on your powerpc system?
> > 
> > From your mention of /dev/sda6 in bug #588675 it is clear as of Debian
> > wheezy/Linux kernel 3.2 that the disk/controller plugs into the SCSI
> > subsystem.  I'm pretty sure this is a SCSI subsystem bug.  Since you've
> > also seen the behavior, I'd like to confirm this has followed the SCSI
> > subsystem for you as well.
> 
> Wow, that's an old bug :-)

Nah, some of my older bug reports would be eligible to drive in some
countries.   :-)

> I had to reinstall the PowerBook with Wheezy due to a disk failure and
> after that I've upgraded from Wheezy to Jessie and the problem is gone now.
> 
> I can't tell if I've seen this prior to Linux 3.2 kernels. If it helps I
> could try to boot an older Debian/wheezy live-cd and see if the rootfs
> comes up twice.
> 
> The disk controller of this PowerBook G4 is:
> 
> 0002:20:0d.0 Unassigned class [ff00]: Apple Inc. UniNorth/Intrepid ATA/100
> 
> I've used BLK_DEV_IDE_PMAC ages ago, but have switched to PATA_MACIO for
> a while now. But again, I can't tell when the double "/" entry occurred
> first, I noticed it only at the time of my bug entry (wheezy 7.3).
> 
> I'm still using a self-compiled kernel but the issue is gone now, at
> least on this system:
> 
> $ uname -r; grep root /proc/mounts
> 4.3.0-11626-g5d50ac7
> /dev/root / jfs rw,nodev,relatime 0 0

You're thinking of the wrong bug.  #588675 is the bug # for /proc/mounts
having "/dev/root" listed as the device for the root filesystem.  Your
previous mention indicated you would expect "/dev/sda6" to be there.  I'm
guessing prior to wheezy, when you were using BLK_DEV_IDE_PMAC, you would
have been seeing "/dev/hda6" listed as the root device?


-- 
(\___(\___(\__  --=> 8-) EHM <=--  __/)___/)___/)
 \BS (| ehem+sig...@m5p.com  PGP 87145445 |)   /
  \_CS\   |  _  -O #include  O-   _  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445



Bug#588675: Narrowing on location of bug #588675

2015-01-04 Thread Elliott Mitchell
reassign 588675 linux-source 2.6.32
retitle 588675 SCSI subsystem forgets root device on boot
found 588675 2.6.18
found 588675 3.2.63-2
submitter 588675 !
quit

I suspect the list of kernel versions with this bug is rather longer, but
I'm merely including some of those I'm certain it does affect.  I suspect
even the latest kernels are affected, but I haven't confirmed this.
Thankfully I finally noticed an ingredient crucial enough to narrow down
the list of causes to a reasonable length.

The bug's manifestation is fairly simple.  On an affected system:

$ head -2 /proc/mounts
rootfs / rootfs rw 0 0
/dev/root / ext3 ro,errors=continue 0 0
$

Whereas on an unaffected system:

$ head -2 /proc/mounts 
rootfs / rootfs rw 0 0
/dev/sda1 / ext3 rw,errors=continue 0 0
$

And another unaffected example:

$ head -2 /proc/mounts
rootfs / rootfs rw 0 0
/dev/mtdblock2 / jffs2 rw,relatime 0 0
$

There are two crucial ingredients for reproducing this bug: the system
must boot directly onto the root device (no initrd) and the root device
must be something that plugs into the SCSI subsystem.
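
To check a given machine for the symptom, something along these lines
could be used.  This is only a sketch built on the awk test shown earlier
in this report; the function names are made up for illustration:

```shell
# root_source [FILE]: print the device column of the entry mounted at "/",
# skipping the "rootfs" pseudo-entry.  FILE defaults to /proc/mounts.
root_source() {
    awk '$2 == "/" && $1 != "rootfs" { print $1 }' "${1:-/proc/mounts}"
}

# root_is_masked [FILE]: succeed when the root entry reads /dev/root
# rather than a real device name.
root_is_masked() {
    [ "$(root_source "$@")" = "/dev/root" ]
}
```

On a running system `root_is_masked && echo affected` would report whether
/proc/mounts shows /dev/root in place of the real device.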

This affects x86, PowerPC, MIPSel and likely other machines.  This
affects systems with SATA main disks (SATA devices go through the SCSI
subsystem) as well as root on USB devices (yes, it *is* possible to get a
kernel to directly boot onto a USB key).

This does NOT affect older kernels when booting onto IDE subsystem disks
(/dev/hd*; with newer kernels IDE disks go through the SCSI subsystem and
are likely affected).  This does not affect systems which initially mount
*any* other device as root, and subsequently chroot onto a SCSI subsystem
device (this explains why initrd systems are unaffected).

While this bug is mostly harmless, it is the root cause behind bugs
620465, 653073, 656333, and may very well have caused other bug reports
I'm unaware of.


-- 
(\___(\___(\__  --= 8-) EHM =--  __/)___/)___/)
 \BS (| ehem+sig...@m5p.com  PGP 87145445 |)   /
  \_CS\   |  _  -O #include stddisclaimer.h O-   _  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 -PGP- 41D1 B375 37D0 8714\_|_/___/5445


-- 
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: https://lists.debian.org/20150105000223.ga50...@scollay.m5p.com



Bug#588675: Narrowing on location of bug #588675

2015-01-04 Thread Elliott Mitchell
On Mon, Jan 05, 2015 at 03:17:28AM +, Ben Hutchings wrote:
> Control: reassign -1 src:linux 3.2.63-2
> Control: retitle -1 / left as /dev/root with non-initrd kernel
> Control: severity -1 wishlist
> Control: tag -1 upstream wontfix
> 
> On Sun, 2015-01-04 at 16:02 -0800, Elliott Mitchell wrote:
> [...]
> > The two crucial ingredients for reproducing this bug, the system must
> > boot directly onto the root device (no initrd) and the root device must
> > be something that plugs into the SCSI subsystem.
> [...]
> > This does NOT affect older kernels when booting onto IDE subsystem disks
> > (/dev/hd* with newer kernels IDE disks go through the SCSI subsystem and
> > are likely affected).  This does not affect systems which initially mount
> > *any* other device as root, and subsequently chroot onto a SCSI subsystem
> > device (this explains why initrd system are unaffected).
> [...]
> 
> I don't see why the driver would matter.  Since at least the beginning
> of git history (2.6.12), when you use the root= parameter to boot
> directly from a block device, the kernel has done:

I'm also surprised about the driver making such a difference, but
observation has demonstrated it clearly does.  Prior to hardware
replacement I'd been using a system with an IDE^WPATA disk which used the
olde IDE subsystem and /dev/hda1 appeared in /proc/mounts.  Note that in
my prior message I mentioned a device with a 3.2 kernel that mounts
/dev/mtdblock2 (without any initrd) as root filesystem, and
/dev/mtdblock2 appears correctly in /proc/mounts.

Since the SCSI subsystem is in common with the observed occurrences, I
must point my finger towards *something* being messed up with the SCSI
subsystem.

> 1. Mount rootfs (which is really either tmpfs or ramfs) at /
> 2. Create directories /dev and /root, and block device /dev/console
> 3. Create block device node /dev/root for the specified block device
> 4. Mount /dev/root at /root
> 5. Move-mount /root to / (hiding the tmpfs/ramfs)
> 
> What *has* changed is that /etc/mtab is now a symlink to /proc/mounts
> and therefore the root device name recorded there is not affected by
> /etc/fstab.

No, that is not where the problem occurs.  Back when I was running on the
system with IDE disk, /etc/mtab was already a symbolic link to
/proc/mounts, yet the issue did not occur.  My first observation of the
problem corresponds with when I'd installed a PCI SATA card in a system
and was using the exact same filesystem image, except for rebuilding the
kernel with SATA support.

> None of this is likely to change, so if you don't want to use an
> initramfs then you'd better create a symlink called /dev/root on your
> root filesystem.

Userspace (I /think/, maybe this too is inside the kernel?) has already
been creating /dev/root for a long time.

While this only causes mild corruption of output, it causes it in *many*
programs.  Either this *kernel* *bug* needs to be fixed inside the
kernel, or I'm going to have to report many bugs against the many programs
which display bad output.
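
To illustrate the sort of compensation each such program would need, here
is a sketch that follows a /dev/root symbolic link when one exists.  The
function name is hypothetical, and it assumes a `readlink` that supports
-f (GNU coreutils does; this is not guaranteed by POSIX).  It does nothing
useful on a system where /dev/root is absent or is not a symlink:

```shell
# real_device NAME: if NAME is a symbolic link (as a /dev/root link would
# be), print the path it resolves to; otherwise print NAME unchanged.
real_device() {
    if [ -L "$1" ]; then
        readlink -f "$1"
    else
        printf '%s\n' "$1"
    fi
}
```

A program reading /proc/mounts would have to pass the device field through
something like `real_device /dev/root` before displaying it.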

Again, a computer with /dev/mtdblock2 as root device, directly mounts
/dev/mtdblock2 just fine and lists /dev/mtdblock2 in /proc/mounts just
fine.  That sounds very much like a (perhaps fairly minor) SCSI subsystem
bug.


-- 
(\___(\___(\__  --= 8-) EHM =--  __/)___/)___/)
 \BS (| ehem+sig...@m5p.com  PGP 87145445 |)   /
  \_CS\   |  _  -O #include stddisclaimer.h O-   _  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 -PGP- 41D1 B375 37D0 8714\_|_/___/5445


Archive: https://lists.debian.org/20150105042259.ga50...@scollay.m5p.com



Is #588675 (/ left as /dev/root) A Kernel Bug?

2014-12-22 Thread Elliott Mitchell
#588675 may not be all that severe, but does cause issues with multiple
packages.  The issue is that sometime between Debian Etch and Debian Lenny
the line for the root filesystem in /proc/mounts stopped listing the
actual device (/dev/sda1) and started listing /dev/root instead.

One crucial ingredient I'm certain of is this requires a kernel that has
the drivers necessary to mount the root filesystem built in and is not
using an initrd.  Both reported encounters with this bug involve kernels
that have been built from source.

I've got little idea of where the actual bug is lurking.  I've got some
suspicion this may be a bug in the SCSI subsystem.  The one machine I've
noticed this *doesn't* occur on is one where it boots with root on a MTD
device.  I'll try to see if I can persuade that machine to boot directly
onto its USB device as root, to see whether or not it is due to being a
MIPS system (grr, emdebian.org being down is a problem).


-- 
(\___(\___(\__  --= 8-) EHM =--  __/)___/)___/)
 \BS (| ehem+sig...@m5p.com  PGP 87145445 |)   /
  \_CS\   |  _  -O #include stddisclaimer.h O-   _  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 -PGP- 41D1 B375 37D0 8714\_|_/___/5445



Archive: https://lists.debian.org/20141223064816.ga8...@scollay.m5p.com



Bug#572406: users Mount Option Broken

2011-03-21 Thread Elliott Mitchell
From: Luk Claes l...@debian.org
> > The users option got broken with the latest release, despite working
> > correctly in 1:1.0.10-6+etch.1 (old stable). Non-root users can mount
> > filesystems listed in /etc/fstab that have users specified, but they
> > will be unable to unmount the filesystem
> > (umount.nfs: You are not permitted to unmount ...).
> 
> > My first thought is someone confused the user (which would need to
> > check who mounted the FS) and users option. Given bug report #501459,
> > part of which sounds like a similar issue with cifs, I'm also wondering
> > if the interface between `mount` and `mount.fstype` got changed
> > slightly.

> Does adding a trailing slash in /etc/fstab fix the issue for you?

Nope, no impact from adding one to either the NFS-server nor the mount
point (nor both at the same time). Also tried `umount` both with and
without a trailing slash in each of those combinations as well. Nothing.


-- 
(\___(\___(\__  --= 8-) EHM =--  __/)___/)___/)
 \BS (| e...@gremlin.m5p.com PGP F6B23DE0 |)   /
  \_CS\   |  _  -O #include stddisclaimer.h O-   _  |   /  _/
2477\___\_|_/DC21 03A0 5D61 985B -PGP- F2BE 6526 ABD2 F6B2\_|_/___/3DE0





Archive: http://lists.debian.org/201103212257.p2lmvzjl001...@m5p.com



Bug#572406: users Mount Option Broken

2011-03-21 Thread Elliott Mitchell
I should also add that I'm now dealing with nfs-common 1:1.2.2-4, and
mount 2.17.2-9 (current stable).


-- 
(\___(\___(\__  --= 8-) EHM =--  __/)___/)___/)
 \BS (| e...@gremlin.m5p.com PGP F6B23DE0 |)   /
  \_CS\   |  _  -O #include stddisclaimer.h O-   _  |   /  _/
2477\___\_|_/DC21 03A0 5D61 985B -PGP- F2BE 6526 ABD2 F6B2\_|_/___/3DE0





Archive: http://lists.debian.org/201103220029.p2m0tahx002...@m5p.com



Bug#589118: `rdev` setting ignored

2010-07-16 Thread Elliott Mitchell
reopen 589118
quit

From: Ben Hutchings b...@decadent.org.uk
> On Thu, 2010-07-15 at 13:48 -0700, Elliott Mitchell wrote:
> > Bzzzt! While the initrd= kernel command-line option and `rdev` kernel
> > settings are not completely orthogonal, they are mostly unrelated.
> 
> You obviously haven't read the code.  I have.

This is in fact true. An unrelated project may cause me to do so.

> > Unlike the kernel command-line, I don't know how the `rdev` (and
> > accompanying) setting is passed along to initial ram disks, but I do know
> > it is (or was).
> 
> It isn't.

Reeeaaally? Sorry to be speculating outside my area of firm knowledge,
but I'm noting that the rdev setting was honored all the way through
Debian 3.1/Sarge, which was a 2.4 kernel. Was the rdev setting really
available to initial ramdisks all the way through 2.4, yet lost with 2.6
kernels?

> > I'm unsure whether Debian 4.0/Etch honored the `rdev`
> > setting, but I am pretty certain initial ram disks generated with Debian
> > 3.1/Sarge did honor the `rdev` setting unless overridden by the root=
> > option.
> 
> That's nice, but this feature isn't coming back.

That sounds suspiciously like wontfix, not done.


-- 
(\___(\___(\__  --= 8-) EHM =--  __/)___/)___/)
 \BS (| e...@gremlin.m5p.com PGP F6B23DE0 |)   /
  \_CS\   |  _  -O #include stddisclaimer.h O-   _  |   /  _/
2477\___\_|_/DC21 03A0 5D61 985B -PGP- F2BE 6526 ABD2 F6B2\_|_/___/3DE0





Archive: http://lists.debian.org/201007170018.o6h0ifmg076...@m5p.com



Bug#589118: `rdev` setting ignored

2010-07-15 Thread Elliott Mitchell
reopen 589118
quit

From: Ben Hutchings b...@decadent.org.uk
> On Wed, 2010-07-14 at 18:11 -0700, Elliott Mitchell wrote:
> > Package: initramfs-tools
> > Version: 0.92o
> > 
> > Subject tells the story. Appears the images generated by initramfs-tools
> > completely ignore the `rdev` setting that was given to the kernel.
> > While 99% of users may be explicitly passing the root device via
> > root=/dev/foo through the bootloader, if that is absent one
> > would think the value from `rdev` would be honored.
> > 
> > (yeah, it's an ancient method, but not officially deprecated)
> 
> If the bootloader passes an initramfs to the kernel, that overrides any
> rdev parameter.  This is nothing to do with the contents of the
> initramfs.

Bzzzt! While the initrd= kernel command-line option and `rdev` kernel
settings are not completely orthogonal, they are mostly unrelated. The
initrd= option overrides the `rdev` setting in the same fashion the
initrd= option overrides the root= and all other kernel command-line
options. Mainly, the initramfs can ignore any and all options and use
ones built in, or it can implement all those options.  It is the root=
option that is directly related to `rdev`.
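
As a sketch of what "implementing the option" inside an initial ram disk
could look like (this is NOT how initramfs-tools does it, and the function
name is invented), a script can scan the kernel command line for root= and
fall back to some default when it is absent.  Here the second argument
merely stands in for a value such as the one `rdev` sets; how that value
would actually be obtained is not shown:

```shell
# pick_root CMDLINE FALLBACK: print the value of the last root= word in
# CMDLINE, or FALLBACK when no root= word is present.
# The body runs in a subshell so "set -f" (no globbing while splitting
# CMDLINE into words) and the variables stay local.
pick_root() (
    set -f
    root=$2
    for word in $1; do
        case $word in
            root=*) root=${word#root=} ;;
        esac
    done
    printf '%s\n' "$root"
)
```

In an initramfs script the first argument would typically be the contents
of /proc/cmdline.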

Unlike the kernel command-line, I don't know how the `rdev` (and
accompanying) setting is passed along to initial ram disks, but I do know
it is (or was). I'm unsure whether Debian 4.0/Etch honored the `rdev`
setting, but I am pretty certain initial ram disks generated with Debian
3.1/Sarge did honor the `rdev` setting unless overridden by the root=
option.


-- 
(\___(\___(\__  --= 8-) EHM =--  __/)___/)___/)
 \BS (| e...@gremlin.m5p.com PGP F6B23DE0 |)   /
  \_CS\   |  _  -O #include stddisclaimer.h O-   _  |   /  _/
2477\___\_|_/DC21 03A0 5D61 985B -PGP- F2BE 6526 ABD2 F6B2\_|_/___/3DE0





Archive: http://lists.debian.org/201007152048.o6fkm7n7071...@m5p.com



Bug#589118: `rdev` setting ignored

2010-07-14 Thread Elliott Mitchell
Package: initramfs-tools
Version: 0.92o

Subject tells the story. Appears the images generated by initramfs-tools
completely ignore the `rdev` setting that was given to the kernel.
While 99% of users may be explicitly passing the root device via
root=/dev/foo through the bootloader, if that is absent one
would think the value from `rdev` would be honored.

(yeah, it's an ancient method, but not officially deprecated)


-- 
(\___(\___(\__  --= 8-) EHM =--  __/)___/)___/)
 \BS (| e...@gremlin.m5p.com PGP F6B23DE0 |)   /
  \_CS\   |  _  -O #include stddisclaimer.h O-   _  |   /  _/
2477\___\_|_/DC21 03A0 5D61 985B -PGP- F2BE 6526 ABD2 F6B2\_|_/___/3DE0





Archive: http://lists.debian.org/201007150111.o6f1byyw068...@m5p.com



Bug#575154: Incorrect assumes existance of /proc/modules

2010-03-23 Thread Elliott Mitchell
Package: initramfs-tools
Version: 0.92o

If the running kernel has had module support removed, you'll get a bunch
of errors of:
grep: /proc/modules: No such file or directory

The one place I found was in /usr/share/initramfs-tools/hook-functions,
the function manual_add_modules(). Looks like you need the -s option
to grep, or else redirect standard error.
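
A minimal sketch of the -s approach follows.  This is not the actual code
from hook-functions; the function name is made up, and the pattern assumes
each line of /proc/modules starts with the module name followed by a
space:

```shell
# module_loaded NAME [FILE]: succeed if NAME appears as a module in FILE
# (default /proc/modules).  -q suppresses normal output; -s suppresses the
# "No such file or directory" message when FILE does not exist.
module_loaded() {
    grep -qs "^$1 " "${2:-/proc/modules}"
}
```

With a kernel lacking module support the call simply fails quietly instead
of printing an error for every module.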


-- 
(\___(\___(\__  --= 8-) EHM =--  __/)___/)___/)
 \BS (| e...@gremlin.m5p.com PGP F6B23DE0 |)   /
  \_CS\   |  _  -O #include stddisclaimer.h O-   _  |   /  _/
2477\___\_|_/DC21 03A0 5D61 985B -PGP- F2BE 6526 ABD2 F6B2\_|_/___/3DE0





Archive: http://lists.debian.org/201003232141.o2nlfwh3061...@m5p.com



Bug#575157: Calling `cpio` can produce error messages when working correctly

2010-03-23 Thread Elliott Mitchell
Package: initramfs-tools
Version: 0.92o
Severity: minor

Thankfully pretty harmless, despite the annoyance:
cpio: ./etc/udev/RCS: Cannot stat: No such file or directory

Looks like `/usr/sbin/mkinitramfs` is the culprit. In this case,
/etc/udev/RCS is a symbolic link to ../RCS
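
One way to spot links like this in a staging directory before handing it
to `cpio` is sketched below.  The function name is hypothetical and this
is not taken from mkinitramfs; it only lists symbolic links whose target
does not exist:

```shell
# dangling_links DIR: print every symbolic link under DIR whose target
# cannot be reached ("test -e" follows the link, so it fails for these).
dangling_links() {
    find "$1" -type l ! -exec test -e {} \; -print
}
```

Such links could then be skipped, or the message from `cpio` could at
least be explained.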


-- 
(\___(\___(\__  --= 8-) EHM =--  __/)___/)___/)
 \BS (| e...@gremlin.m5p.com PGP F6B23DE0 |)   /
  \_CS\   |  _  -O #include stddisclaimer.h O-   _  |   /  _/
2477\___\_|_/DC21 03A0 5D61 985B -PGP- F2BE 6526 ABD2 F6B2\_|_/___/3DE0





Archive: http://lists.debian.org/201003232155.o2nltk4b061...@m5p.com