from:"Rob Landley"

Re: Old platforms never die, was Re: Old platforms: bring out your dead

2021-01-15 Rob Landley

On 1/12/21 6:12 PM, Finn Thain wrote:
> If you're a museum interested in cultural artifacts from decades past, or 
> if you're a business doing data recovery, you're going to need to operate 
> those platforms.

Or if you're camping patent expirations and want to be able to point at prior
art for new hardware development WITHOUT a legal team big enough to have its own
office building.

> Once removed from mainline Linux, a port becomes basically frozen, and may 
> not be compatible with future emulators, which are a moving target. I say 
> that because last year I fixed bugs in Linux/m68k that made it incompatible
> with recent QEMU releases (it was only compatible with old QEMU releases).

Speaking of which, my qemu m68k system has failed to boot ever since commit:

commit f93bfeb55255bddaa16597e187a99ae6131b964a
Author: Finn Thain 
Date:   Sun Jun 28 14:23:12 2020 +1000

macintosh/via-macii: Poll the device most likely to respond

Poll the most recently polled device by default, rather than the lowest
device address that happens to be enabled in autopoll_devs. This improves
input latency. Re-use macii_queue_poll() rather than duplicate that logic.
This eliminates a static struct and function.

It hangs in a cpu-eating loop after "random: crng init done". Miniconfig
attached, the qemu invocation is:

qemu-system-m68k -M q800 -nographic -no-reboot -m 256 -kernel vmlinux \
  -initrd cpio.gz -append "panic=1 HOST=m68k console=ttyS0"

Rob

P.S. This is the toybox "make root" m68k target from
https://github.com/landley/toybox/blob/master/scripts/mkroot.sh#L171 if that's
useful to know. It doesn't get to the root filesystem and the build just creates
that miniconfig and runs it as the comments say...
# make ARCH=m68k allnoconfig KCONFIG_ALLCONFIG=m68k.miniconf
# make ARCH=m68k -j $(nproc)
# boot vmlinux

# CONFIG_EMBEDDED is not set
# architecture independent
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_SCRIPT=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_BLK_DEV=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_RD_GZIP=y
CONFIG_BLK_DEV_LOOP=y
CONFIG_EXT4_FS=y
CONFIG_EXT4_USE_FOR_EXT2=y
CONFIG_VFAT_FS=y
CONFIG_FAT_DEFAULT_UTF8=y
CONFIG_MISC_FILESYSTEMS=y
CONFIG_SQUASHFS=y
CONFIG_SQUASHFS_XATTR=y
CONFIG_SQUASHFS_ZLIB=y
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_UNIX=y
CONFIG_INET=y
CONFIG_IPV6=y
CONFIG_NETDEVICES=y
CONFIG_NET_CORE=y
CONFIG_NETCONSOLE=y
CONFIG_ETHERNET=y
CONFIG_COMPAT_32BIT_TIME=y
CONFIG_EARLY_PRINTK=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y

# architecture specific
CONFIG_MMU=y
CONFIG_M68040=y
CONFIG_M68KFPU_EMU=y
CONFIG_MAC=y
CONFIG_SCSI_MAC_ESP=y
CONFIG_MACINTOSH_DRIVERS=y
CONFIG_ADB=y
CONFIG_ADB_MACII=y
CONFIG_NET_CORE=y
CONFIG_MACSONIC=y
CONFIG_SERIAL_PMACZILOG=y
CONFIG_SERIAL_PMACZILOG_TTYS=y
CONFIG_SERIAL_PMACZILOG_CONSOLE=y
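For reference, the "# architecture specific" block above contains the same symbol list that mkroot.sh keeps for each target in a comma-separated KCONF variable. The m68k entry is quoted in a later message in this archive. The script itself is bash. Purely as an illustration, here is a C sketch of expanding such a list into miniconfig lines. `expand_kconf()` is a hypothetical helper and is not part of toybox.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Expand a comma-separated symbol list such as "MMU,M68040,MAC" into
 * miniconfig lines ("CONFIG_MMU=y\n..."). Empty entries are skipped.
 * Returns the number of bytes written (excluding the NUL), or -1 if
 * out (outsz bytes) is too small to hold the whole result.
 */
static int expand_kconf(const char *list, char *out, size_t outsz)
{
	size_t o = 0;

	assert(outsz > 0);
	out[0] = '\0';
	while (*list) {
		size_t n = strcspn(list, ",");	/* length of this symbol */

		if (n) {
			int w = snprintf(out + o, outsz - o, "CONFIG_%.*s=y\n",
					 (int)n, list);

			if (w < 0 || (size_t)w >= outsz - o)
				return -1;	/* output would be truncated */
			o += (size_t)w;
		}
		list += n;
		if (*list == ',')
			list++;
	}
	return (int)o;
}
```

A file like that is what the comments at the top of the miniconfig feed to `make ARCH=m68k allnoconfig KCONFIG_ALLCONFIG=m68k.miniconf`.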

Re: Old platforms: bring out your dead

2021-01-13 Rob Landley

On 1/13/21 2:21 AM, Geert Uytterhoeven wrote:
> Hi Rob,
> 
> On Wed, Jan 13, 2021 at 8:58 AM Rob Landley  wrote:
>> On 1/12/21 4:46 PM, Linus Walleij wrote:
>>> On Tue, Jan 12, 2021 at 3:45 PM John Paul Adrian Glaubitz
>>>  wrote:
>>>> Yeah, I have the same impression that's the strong commercial interest pushes
>>>> hobbyist use of the Linux kernel a bit down. A lot of these changes feel like
>>>> they're motivated by corporate decisions.
>>>>
>>>> There has to be a healthy balance between hobbyist and commercial use. I understand
>>>> that from a commercial point of view, it doesn't make much sense to run Linux
>>>> on a 30-year-old computer. But it's a hobbyist project for many people and hacking
>>>> Linux stuff for these old machines has a very entertaining and educational factor.
>>>
>>> This is actually one of the most interesting things written in this discussion.
>>>
>>> I have both revamped and deleted subarchitectures in the ARM tree. We
>>> never deleted anyone's pet project *unless* they were clearly unwilling to
>>> work on it (such as simply testing new patches) and agreed that it will
>>> not go on.
>>
>> Another fun aspect of old hardware is it serves as prior art for patents. The
>> j-core hardware implementation schedule has in part been driven by specific
>> patents expiring, as in "we can't do $FEATURE until $DATE".
> 
> Indeed, so that's why the release of j4 is postponed to 2016...
> /me runs date (again).

We renamed it J32 because although the patents have expired the trademarks have
not, and provoking Renesas' lawyers more than necessary seemed gratuitous.

It's actually been feature complete for years now, but we've never ported the
kernel to it. (Rich has been working on a kernel port since New Year's, though.
Jeff Garzik sponsored some engineering time in our 2021 budget to finally get
that done. That port has been our blocker for publishing, because the lab tests
don't guarantee we won't have to change bits of the API in response to
real-world loads.)

>> When I did an sh4 porting contract in 2018 I got that board updated to a
>> current-ish kernel (3 versions back from then-current it hit some intermittent
>> nor flash filesystem corruption that only occurred intermittently under
>> sustained load; had to ship so I backed off one version and never tracked it
>> down). But these days I'm not always on the same continent as my two actual sh4
>> hardware boards, have never gotten my physical sh2 board to boot, and $DAYJOB is
>> all j-core stuff not sh4.
> 
> Which is not upstream, investing in the future?

Alas I'm not in charge of what is cleared for public release. (I complain about
it on the weekly calls from time to time.) We have actual marketing people now
(Mike and Bunga) so I'm not supposed to do the website in raw stylesheet-less
HTML with vi anymore.

Unpublished stuff we _mean_ to publish is a form of technical debt. It
_shouldn't_ be (release early, release often), but Jeff insists on doing
everything in Mercurial, which makes dogfooding our GitHub repos as part of our
normal process darn awkward, and once they're a little out of sync with the rest
of the build, updating them becomes a todo item...

Rob

Re: Old platforms: bring out your dead

2021-01-12 Rob Landley

On 1/12/21 4:46 PM, Linus Walleij wrote:
> On Tue, Jan 12, 2021 at 3:45 PM John Paul Adrian Glaubitz
>  wrote:
> 
>> Yeah, I have the same impression that's the strong commercial interest pushes
>> hobbyist use of the Linux kernel a bit down. A lot of these changes feel like
>> they're motivated by corporate decisions.
>>
>> There has to be a healthy balance between hobbyist and commercial use. I understand
>> that from a commercial point of view, it doesn't make much sense to run Linux
>> on a 30-year-old computer. But it's a hobbyist project for many people and hacking
>> Linux stuff for these old machines has a very entertaining and educational factor.
> 
> This is actually one of the most interesting things written in this discussion.
> 
> I have both revamped and deleted subarchitectures in the ARM tree. We
> never deleted anyone's pet project *unless* they were clearly unwilling to
> work on it (such as simply testing new patches) and agreed that it will
> not go on.

Another fun aspect of old hardware is it serves as prior art for patents. The
j-core hardware implementation schedule has in part been driven by specific
patents expiring, as in "we can't do $FEATURE until $DATE".

It's much easier to nip a patent suit in the bud if you can go "here is
functionally equivalent hardware from the past, dates the specifications were
published, and the specific patents on the technology which have now expired".

We're a little overscheduled and not always _prompt_ about it, but for example
the reason we couldn't do full 6-wire SD 1.0 in the first j-core SoC release
(and had to implement a painfully slow MMC bus instead) is that the patents
hadn't expired yet.

> That said there are three things that people should really be doing if they
> want to keep their pet archs/subarchs around as good community
> members, and they are in essence to:
> 
> 1. Test and review/ack patches that others make

I'm trying. :)

> 2. Migrate existing drivers to newly appeared and
> appropriate subsystems (I think there are some hacky heartbeat LED
> drivers down in arch/* for example) there is also the feature matrix
> core maintainers like and which appears if you type
> Documentation/features/list-arch.sh 
> would be nice if you work on them if you can support them!
> Or at least take a look.

For 3 years in the 1990s SuperH was the best-selling processor in the world,
and that's left the architecture with a bunch of legacy boards that aren't
easily available to us. I regression test a current kernel build under qemu
every month or two, and have portable USB-powered boards for the j-core stuff.

When I did an sh4 porting contract in 2018 I got that board updated to a
current-ish kernel (three versions back from then-current; it hit some NOR flash
filesystem corruption that only occurred intermittently under sustained load,
and since I had to ship, I backed off one version and never tracked it down).
But these days I'm not always on the same continent as my two actual sh4
hardware boards, have never gotten my physical sh2 board to boot, and $DAYJOB is
all j-core stuff, not sh4.

Testing that a basic superh system still builds and boots under qemu and j-core
I can commit to doing regularly. Testing specific hardware devices on boards I
don't regularly use is a lot harder.

> 3. Migrate old systems to use the
>    contemporary hardware descriptions (such as device tree or ACPI)
>    because it makes things so much easier to maintain. Some
>    upfront work, but a great win for everyone. Especially for
>    subsystem maintainers.

We did that one for our SOC. We haven't ported a lot of legacy boards because we
can't easily test most of them.

> And if your arch uses highmem then please get rid of highmem. I'm
> trying to do this a bit right now for ARM let's see how it goes.

I don't believe it does? (We haven't got any configs using it, anyway...)

> I understand that for some maintainers time is a factor and this list
> feels stressful. I'd say relax, but it'd be nice if you have a TODO that
> you cross items off of.

My todo list runneth over. One of our perpetual todo list items is "collate todo
lists"...

> Just my €0.01
> Linus Walleij

Rob

Re: Old platforms: bring out your dead

2021-01-11 Rob Landley

On 1/11/21 8:55 AM, chase rayfield wrote:
> On Mon, Jan 11, 2021 at 3:09 AM John Paul Adrian Glaubitz
>  wrote:
> 
>>
>> I'm not sure I understand the reasoning for doing this. The SPARC architecture
>> isn't going to see any new hardware developments in the future after Oracle
>> let go of most of the SPARC developers. So it's not that we need to make room
>> for new hardware.
>>
> My take is that there *would* be more interest in Sparc sun4m / Sun4d
> from enthusiasts at the very least if it was possible to actually boot
> the bloat hog that is Linux these days in a fully usable configuration
> that probably means some modifications to SILO and Linux required.

You can trim current Linux down a bit; it's just non-obvious how. Unfortunately
there's an "expert" menu and CONFIG_EMBEDDED, and if you touch anything there
are suddenly a hundred extra options in your config with no explanation of what
they do.

At least 50% of what you want is probably disabling the printk strings that
aren't visible at your default verbosity level, but alas you must open Pandora's
box to access those options...

> The problem is as I understand it, SILO only sets up a 16Mb mapping
> (either due to having to assume 4MB minimum dram stick size or due to
> mapping limitations not sure, most of these machines have at least
> 16MB in slot one...these days though that wasn't the case for sun4c),
> loads Linux into it and says good Luck. This isn't enough for a modern
> kernel with any  hardware support built in. So you might for instance
> get a kernel to fit but only if you dropped all of networking support
> etc... I'm guessing the fix for this would be to modify silo to map a
> larger amount in a way that Linux expects so it can remap it as it
> likes, or just have SILO map the full memory as Linux would. Anyway
> that is THE main demotivation for these architectures otherwise
> they have plenty of ram and performance to do basic router/server
> tasks sans SSL.

A lot of people with hardware like this haven't stopped using it, they've just
stopped fighting with kernel upgrades. (Common issue in the embedded world. Not
really a fun thing for security, but )

> This has been the status quo for since the last of the 2.6 series of
> kernels which it was still possible to just barely squeeze a usable
> kernel out of... If someone wanted to take a few hours and fix this
> issue, and keep these architectures around I'd be happy to "buy them a
> round of pizza", though I recognize that many people that work on this
> already have nice jobs, and just don't have time.

My https://github.com/landley/toybox/blob/master/scripts/mkroot.sh ~250 line
bash script generates the simplest kernel configs for a bunch of platforms to
boot qemu to a shell prompt, but you then have to open the "expert" menu and
_disable_ stuff in order to get the size down from there.

> Also Sparc would probably be a good project for someone to extend/test

Sparc has a runtime relocation I've never understood but did manage to break
once, resulting in a long thread to fix:

http://lists.landley.net/pipermail/aboriginal-landley.net/2011-December/001964.html

Between that and the weird save half the stack register thing with function
calls on some sort of "wheel"... there's a _reason_ I haven't been able to talk
Rich into adding support for it to musl.

> Andi Kleen's Linux LTO patch set so we could reduce the kernel binary
> size that way also even if sun4 architectures are dropped, it would
> still be useful for embedded sparc. Also there is a port of Temlib to
> the MiSTer hardware now, 3 cores roughly equivalent to a mid 90s
> machine, at least 128MB ram is possible ( more if a way to map the ARM
> system memory also 1GB is available there, it would have higher
> latency though).
> 
> It is perfectly viable to build Sparc v7 or v8 32bit binaries in a
> chroot on a fast machine also, and I would recommend this if you wish
> to retain sanity rather than attempting cross compiler voodoo, unless
> that is your thing.

It is, sadly, my thing. The above 250 line bash script builds:

aarch64  armv7l  i686        mips    powerpc      s390x  x86_64
armv4l   armv7m  m68k        mips64  powerpc64    sh2eb
armv5l   i486    microblaze  mipsel  powerpc64le  sh4

That's toybox booting to a shell prompt and a linux kernel configured for qemu
for each target. Adding new targets looks something like:

elif [ "$TARGET" == m68k ]; then
  QEMU="m68k -M q800" KARCH=m68k KARGS=ttyS0 VMLINUX=vmlinux
KCONF=MMU,M68040,M68KFPU_EMU,MAC,SCSI_MAC_ESP,MACINTOSH_DRIVERS,ADB,ADB_MACII,NET_CORE,MACSONIC,SERIAL_PMACZILOG,SERIAL_PMACZILOG_TTYS,SERIAL_PMACZILOG_CONSOLE
elif [ "$TARGET" = s390x ]; then
  QEMU="s390x" KARCH=s390 VMLINUX=arch/s390/boot/bzImage
KCONF=MARCH_Z900,PACK_STACK,NET_CORE,VIRTIO_NET,VIRTIO_BLK,SCLP_TTY,SCLP_CONSOLE,SCLP_VT220_TTY,SCLP_VT220_CONSOLE,S390_GUEST

(Well, modulo Thunderbird being unable to indent a line that goes off the
right edge of the screen. The mozilla

Re: Old platforms: bring out your dead

2021-01-11 Rob Landley

On 1/10/21 3:46 PM, Sam Ravnborg wrote:
> Hi all,
>> Hi Arnd!
>>
>> (Please let's have this cross-posted for more visibility. I only learned about this
>> while reading Phoronix news)
>>
>>> I also looked at non-ARM platforms while preparing for my article. Some of
>>> these look like they are no longer actively maintained or used, but I'm not
>>> doing anything about those unless the maintainers would like me to:
>>>
> 
>>> * sparc/sun4m: A patch for removing 32-bit Sun sparc support (not LEON)
>>>is currently under review
>>
>> I don't think this has reached any agreement yet. Multiple people want it to stay.
> 
> None of the people that replied have any real use of the sun4m port,
> they only wanted it to stay because they had some machines or such.
> In other words - people will be sad if we sunset sun4m, but it will not
> hurt anyone as there are no users left.
> 
> I will include the above summary when I post v2 of the patch to sunset
> sun4m and sun4d. Then we will see what we conclude in the end.

I used to regression test it in my cross builds but I switched my
toolchains/userspace from uClibc to musl-libc a couple years back, and musl
never added sparc support.

Rob

Re: ac0e958a00: Kernel_panic-not_syncing:stack-protector:Kernel_stack_is_corrupted_in:run_init_process

2020-11-12 Rob Landley

On 11/12/20 7:49 AM, David Laight wrote:
> From: Rob Landley
>> Sent: 12 November 2020 12:46
>>
>> On 11/12/20 1:11 AM, kernel test robot wrote:
>>>
>>> Greeting,
>>>
>>> FYI, we noticed the following commit (built with gcc-9):
>>
>> Blah, switched from strlcpy to sprintf due to the lack of spaces and didn't
>> adjust the size.
>>
>> (And yes, the compiler's lifetime analysis should free the stack space before
>> the tail call, and I'd assume exec restarts the stack anyway.)

This is why I didn't put anything like that in the first submission. (I knew
better, and did it anyway...)

>> Second-attempt-by: Rob Landley 
>> ---
>>
>>  init/main.c | 15 ++++++++-------
>>  1 file changed, 8 insertions(+), 7 deletions(-)
>>
>> diff --git a/init/main.c b/init/main.c
>> index 130376ec10ba..e92320816ef8 100644
>> --- a/init/main.c
>> +++ b/init/main.c
>> @@ -1328,15 +1328,16 @@ static void __init do_pre_smp_initcalls(void)
>>  static int run_init_process(const char *init_filename)
>>  {
>>  	const char *const *p;
>> +	char buf[512], *s = buf;
>>
>>  	argv_init[0] = init_filename;
>> -	pr_info("Run %s as init process\n", init_filename);
>> -	pr_debug("  with arguments:\n");
>> -	for (p = argv_init; *p; p++)
>> -		pr_debug("%s\n", *p);
>> -	pr_debug("  with environment:\n");
>> -	for (p = envp_init; *p; p++)
>> -		pr_debug("%s\n", *p);
>> +
>> +	for (p = (void *)envp_init; *p; p++)
>> +		s += sprintf(s, "%.*s ", (int)(sizeof(buf)-(s-buf)-2), *p);
>> +	for (p = (void *)argv_init; *p; p++)
>> +		s += sprintf(s, "%.*s ", (int)(sizeof(buf)-(s-buf)-2), *p);
>> +	pr_info("Run init: %s\n", buf);
>> +
> 
> Why not use scnprintf() as:
>   len += scnprintf(buf + len, 256 - len, " %s", *p);

Because what I did worked for me?

The buffer size isn't 256, sizeof() means if the buffer size changes the code
automatically adjusts and the -2 gets constant folded at compile time anyway,
you've proposed switching from a posix function to a kernel-specific function
for no obvious benefit, it's the same number of arguments and other than 2 bytes
in a string constant you've just swapped s-buf for buf+len, I didn't want to dig
into whether passing rdinit= could have an empty environment so the "skip a
space" has no space to skip and thus you skip the null terminator so I just left
one harmlessly trailing on the end, if I wanted to get FANCY I'd measure and
allocate space then free it after printing...

I could go on, but that's about as much bikeshedding as I have the stomach for
right now on two for loops calling print statements, thanks.

> or even:
>   s = buf + sizeof buf;
>   len = sizeof buf;
>   ...
>   len -= scnprintf(s - len, len, " %s", *p);
> 
> and remove the " " before the %s in the final pr_info().

Feel free to submit your own patch...?

I don't really expect this to get merged. It's not like I cc'd any humans. My
latest thingied-by tag is not an approved entry in
Documentation/process/submitting-patches.rst and I didn't go through all 27
steps in Documentation/process/submit-checklist.rst because I'm not part of the
fulltime kernel political clique and I don't bother to fight them anymore. It's
just a small thing that annoyed me and I mostly posted it here so when some
clown sues us for shipping a modified kernel I can cost them more money by
pointing their lawyers at the patch on the list's web archive. (I could do so on
a website I maintain, but then I'd have to track it and dowanna.)

I was only ever involved here as a hobbyist. The Linux Foundation is currently
holding a "conference dedicated to driving collaboration and innovation in
financial services" with "featured speakers" including the Managing Director of
Goldman Sachs, the Founder of the Alliance for Innovative Regulation, the former
CIO of Deutsche Bank, Red Hat's Director of Financial Services Strategy, and
whatever "Open Source Wonk, Azure Office of the CTO, Microsoft" means.

No, I did not make that up, they spammed me about it as part of their perpetual
fundraising strategy to sell for-profit conference tickets:

  https://events.linuxfoundation.org/open-source-strategy-forum/

Rob

Re: ac0e958a00: Kernel_panic-not_syncing:stack-protector:Kernel_stack_is_corrupted_in:run_init_process

2020-11-12 Rob Landley

On 11/12/20 1:11 AM, kernel test robot wrote:
> 
> Greeting,
> 
> FYI, we noticed the following commit (built with gcc-9):

Blah, switched from strlcpy to sprintf due to the lack of spaces and didn't
adjust the size.

(And yes, the compiler's lifetime analysis should free the stack space before
the tail call, and I'd assume exec restarts the stack anyway.)

Second-attempt-by: Rob Landley 
---

 init/main.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/init/main.c b/init/main.c
index 130376ec10ba..e92320816ef8 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1328,15 +1328,16 @@ static void __init do_pre_smp_initcalls(void)
 static int run_init_process(const char *init_filename)
 {
 	const char *const *p;
+	char buf[512], *s = buf;
 
 	argv_init[0] = init_filename;
-	pr_info("Run %s as init process\n", init_filename);
-	pr_debug("  with arguments:\n");
-	for (p = argv_init; *p; p++)
-		pr_debug("%s\n", *p);
-	pr_debug("  with environment:\n");
-	for (p = envp_init; *p; p++)
-		pr_debug("%s\n", *p);
+
+	for (p = (void *)envp_init; *p; p++)
+		s += sprintf(s, "%.*s ", (int)(sizeof(buf)-(s-buf)-2), *p);
+	for (p = (void *)argv_init; *p; p++)
+		s += sprintf(s, "%.*s ", (int)(sizeof(buf)-(s-buf)-2), *p);
+	pr_info("Run init: %s\n", buf);
+
 	return kernel_execve(init_filename, argv_init, envp_init);
 }
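David's scnprintf() suggestion in the reply above comes down to one property. The kernel's scnprintf() returns the number of characters actually stored, not the length the output would have had. That keeps a running `len += ...` inside the buffer even when the output is truncated.

Below is a userspace sketch of the same collate-into-one-line loop. It uses a hypothetical `clamped_printf()` helper built on vsnprintf() to mimic that return value. This is an illustration only, not the kernel code.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for the kernel's scnprintf(): format
 * like vsnprintf(), but return the number of characters actually stored
 * (excluding the NUL) rather than the length the output would have had.
 */
static int clamped_printf(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (size == 0)
		return 0;
	va_start(ap, fmt);
	n = vsnprintf(buf, size, fmt, ap);
	va_end(ap);
	if (n < 0)
		return 0;
	return (size_t)n < size ? n : (int)(size - 1);
}

/*
 * Collate envp then argv into one space-separated line, as the patch
 * does. len never exceeds size - 1, so buf + len always stays in bounds.
 */
static void collate_init_line(char *buf, size_t size,
			      const char *const *envp,
			      const char *const *argv)
{
	const char *const *p;
	size_t len = 0;

	assert(size > 0);
	buf[0] = '\0';
	for (p = envp; *p; p++)
		len += clamped_printf(buf + len, size - len, "%s ", *p);
	for (p = argv; *p; p++)
		len += clamped_printf(buf + len, size - len, "%s ", *p);
}
```

With this approach, an oversized environment only loses the tail of the line. For example, `HOME=/ TERM=linux /init` squeezed into a 10-byte buffer comes out as `HOME=/ TE`.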

[PATCH] Collate "run init" message to one line with prefixed var assignments

2020-11-11 Rob Landley

From: Rob Landley 

Run init: HOME=/ TERM=linux /init

Signed-off-by: Rob Landley 
---

 init/main.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/init/main.c b/init/main.c
index 130376ec10ba..80b06566852b 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1328,15 +1328,16 @@ static void __init do_pre_smp_initcalls(void)
 static int run_init_process(const char *init_filename)
 {
 	const char *const *p;
+	char buf[512], *s = buf;
 
 	argv_init[0] = init_filename;
-	pr_info("Run %s as init process\n", init_filename);
-	pr_debug("  with arguments:\n");
-	for (p = argv_init; *p; p++)
-		pr_debug("%s\n", *p);
-	pr_debug("  with environment:\n");
-	for (p = envp_init; *p; p++)
-		pr_debug("%s\n", *p);
+
+	for (p = (void *)envp_init; *p; p++)
+		s += sprintf(s, "%.*s ", (int)(sizeof(buf)-(s-buf)), *p);
+	for (p = (void *)argv_init; *p; p++)
+		s += sprintf(s, "%.*s ", (int)(sizeof(buf)-(s-buf)), *p);
+	pr_info("Run init: %s\n", buf);
+
 	return kernel_execve(init_filename, argv_init, envp_init);
 }
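In this first version, the precision passed to `%.*s` is the entire remaining space. The format then appends a space, and sprintf() adds a terminating NUL. So once a string is long enough, the call stores two bytes more than remain, which is the size adjustment the follow-up patch makes by subtracting 2.

Here is a small userspace check of the byte count. It relies on standard C snprintf() semantics, and the kernel's own vsprintf() may differ in edge cases. `bytes_stored()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Bytes that sprintf(s, "%.*s ", prec, word) stores, including the
 * terminating NUL. snprintf(NULL, 0, ...) only measures; it writes
 * nothing. In standard C a negative prec is treated as if no precision
 * had been given, so the whole word is counted.
 */
static size_t bytes_stored(int prec, const char *word)
{
	int n = snprintf(NULL, 0, "%.*s ", prec, word);

	assert(n >= 0);
	return (size_t)n + 1;
}
```

Take `prec` equal to the remaining space r and a string at least r bytes long: the call stores r + 2 bytes. Reserving 2 bytes makes it fit.

There is one caveat for this style of bound. If the subtraction ever goes negative, standard C prints the whole string. Whether the kernel's vsprintf() behaves the same way is worth checking before relying on it.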

[PATCH] switch "random: fast init done" message from NOTICE to INFO.

2020-10-24 Rob Landley

From: Rob Landley 

If you set loglevel=4 you get zero kernel boot messages, but at loglevel=5,
on devices that boot to a serial console, this message overwrites the shell
prompt a second after it comes up, and if the prompt is "#" it's easy to
think the boot has hung.

Signed-off-by: Rob Landley 
---

 drivers/char/random.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index d20ba1b104ca..91daf9113204 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -895,7 +895,7 @@ static int crng_fast_load(const char *cp, size_t len)
 	if (crng_init_cnt >= CRNG_INIT_CNT_THRESH) {
 		invalidate_batched_entropy();
 		crng_init = 1;
-		pr_notice("fast init done\n");
+		pr_info("fast init done\n");
 	}
 	return 1;
 }
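Demoting the message from pr_notice() to pr_info() moves it from level 5 (KERN_NOTICE) to level 6 (KERN_INFO). The documented console rule is that a message is printed only when its level is numerically smaller than the console loglevel set by `loglevel=`. Under that rule, the message now needs a console loglevel one step higher before it reaches the serial console.

Below is a minimal sketch of that comparison. The `LVL_*` names and `reaches_console()` are hypothetical. The sketch ignores `ignore_loglevel`. Either way, the message still lands in the kernel log buffer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for printk levels 0-7 (KERN_EMERG..KERN_DEBUG). */
enum {
	LVL_EMERG, LVL_ALERT, LVL_CRIT, LVL_ERR,
	LVL_WARNING, LVL_NOTICE, LVL_INFO, LVL_DEBUG
};

/*
 * Console filter: a message is printed only if its level is numerically
 * smaller than the console loglevel (what loglevel= sets at boot).
 */
static bool reaches_console(int msg_level, int console_loglevel)
{
	return msg_level < console_loglevel;
}
```

After the patch, a console loglevel of 6 still prints notices and warnings, but no longer prints this message.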

Re: [PATCH] sh: fix syscall tracing

2020-09-10 Rob Landley

On 9/10/20 4:55 AM, John Paul Adrian Glaubitz wrote:
> Hi Rich!
> 
> On 9/7/20 7:44 PM, Rich Felker wrote:
>>> Can we still get this merged as a hotfix for 5.9?
>>
>> Yes, fixes for regressions in the same release cycle are in-scope (the
>> whole point of having -rc's). I have at least one other fix that needs
>> to go in too and was just giving it a little time to make sure
>> everything's ok now and that there are no more.
> 
> Let me know if there is anything else left for testing.

Could you also merge the fix for the build break, a la:

> The vmlinux image is a current vanilla Linux kernel using an initramfs 
> filesystem:
> 
>   make ARCH=sh CROSS_COMPILE=sh2eb-linux-muslfdpic- j2_defconfig vmlinux
> 
> And trying to do that in current git dies with:
> 
>   CC  init/version.o
> In file included from ./include/linux/spinlock.h:318,
>  from ./arch/sh/include/asm/smp.h:11,
>  from ./include/linux/smp.h:82,
>  from ./include/linux/lockdep.h:14,
>  from ./include/linux/rcupdate.h:29,
>  from ./include/linux/rculist.h:11,
>  from ./include/linux/pid.h:5,
>  from ./include/linux/sched.h:14,
>  from ./include/linux/utsname.h:6,
>  from init/version.c:14:
> ./include/linux/spinlock_api_smp.h: In function '__raw_spin_trylock':
> ./include/linux/spinlock_api_smp.h:90:3: error: implicit declaration of function 'spin_acquire'; did you mean 'xchg_acquire'? [-Werror=implicit-function-declaration]
>    90 |   spin_acquire(&lock->dep_map, 0, 1, _RET_IP_);
>   |   ^~~~
>   |   xchg_acquire
> ./include/linux/spinlock_api_smp.h:90:21: error: 'raw_spinlock_t' {aka 'struct raw_spinlock'} has no member named 'dep_map'
>    90 |   spin_acquire(&lock->dep_map, 0, 1, _RET_IP_);
>   | ^~
> 
> And so on and so forth for pages. I bisected it to:
> 
> commit 0cd39f4600ed4de859383018eb10f0f724900e1b
> Author: Peter Zijlstra 
> Date:   Thu Aug 6 14:35:11 2020 +0200
> 
> locking/seqlock, headers: Untangle the spaghetti monster

Which I reported to Rich on the 2nd, and he had me test a one-line patch fixing
it (adding an extra #include) on the 3rd, but I just did a fresh pull and the
j2_defconfig build is still broken a week later.

Rob

Re: [PATCH] serial: sh-sci: Make sure status register SCxSR is read in correct sequence

2020-08-16 Rob Landley

On 8/16/20 11:22 AM, Prabhakar Mahadev Lad wrote:
>> FTR, I gave it a try on the SH7751R-based I-O DATA USL-5P aka Landisk:
>> SCIF is affected, and fixed by commit 3dc4db3662366306 ("serial: sh-sci:
>> Make sure status register SCxSR is read in correct sequence").
>>
> Thank you Geert.
> 
> Cheers,
> Prabhakar

Did we ever figure out how to get linux to talk to the _first_ serial port on
the qemu-system-sh4 r2d board? I'm still doing:

  qemu-system-sh4 -M r2d -serial null -serial mon:stdio

Because I can only get a working console on the _second_ serial port. (SCI vs
SCIF I think?)

Rob

Re: [PATCH] sh: Replace HTTP links with HTTPS ones

2020-07-12 Rob Landley

On 7/12/20 6:11 AM, Alexander A. Klimov wrote:
> Rationale:
> Reduces attack surface on kernel devs opening the links for MITM
> as HTTPS traffic is much harder to manipulate.

Trimmed just to the one site without the self-signed certificate: check.

> Deterministic algorithm:
> For each file:
>   If not .svg:
>     For each line:
>       If doesn't contain `\bxmlns\b`:
>         For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
>           If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`:
>             If both the HTTP and HTTPS versions
>             return 200 OK and serve the same content:
>               Replace HTTP with HTTPS.
> 
> Signed-off-by: Alexander A. Klimov 
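Here is a rough C sketch of the per-line step of that algorithm, for illustration only. It is not the script that produced the patch, and `upgrade_links()` and its helpers are hypothetical. It makes three simplifications:

- The `.svg` file filter is left out.
- The network check (both URLs returning 200 OK with the same content) is left out.
- The two exclusion patterns are matched as plain substrings of the link.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Regex "\w": letters, digits, underscore. */
static bool is_word(char c)
{
	return isalnum((unsigned char)c) || c == '_';
}

/* True if s contains word with a non-word character (or edge) each side. */
static bool has_word(const char *s, const char *word)
{
	size_t n = strlen(word);
	const char *p;

	for (p = strstr(s, word); p; p = strstr(p + 1, word))
		if ((p == s || !is_word(p[-1])) && !is_word(p[n]))
			return true;
	return false;
}

/* True if the first len bytes of s contain needle. */
static bool span_contains(const char *s, size_t len, const char *needle)
{
	size_t n = strlen(needle);
	size_t i;

	for (i = 0; i + n <= len; i++)
		if (memcmp(s + i, needle, n) == 0)
			return true;
	return false;
}

/*
 * Copy line into out, rewriting each eligible "http://" link to
 * "https://". Returns false if out (outsz bytes) is too small.
 */
static bool upgrade_links(const char *line, char *out, size_t outsz)
{
	bool skip = has_word(line, "xmlns");
	size_t o = 0;
	const char *p;

	assert(outsz > 0);
	for (p = line; *p; p++) {
		if (!skip && strncmp(p, "http://", 7) == 0 &&
		    (p == line || !is_word(p[-1]))) {
			/* [^# \t\r\n]* after "http://", backed off to end in \w or '/' */
			size_t len = strcspn(p + 7, "# \t\r\n");

			while (len && !is_word(p[6 + len]) && p[6 + len] != '/')
				len--;
			if (len && !span_contains(p, 7 + len, "gnu.org/license") &&
			    !span_contains(p, 7 + len, "mozilla.org/MPL")) {
				if (o + 8 >= outsz)
					return false;
				memcpy(out + o, "https://", 8);
				o += 8;
				p += 6;	/* the loop's p++ steps past "http://" */
				continue;
			}
		}
		if (o + 1 >= outsz)
			return false;
		out[o++] = *p;
	}
	out[o] = '\0';
	return true;
}
```

A real run would still need the per-file `.svg` filter and the HTTP/HTTPS content comparison before trusting any rewrite.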

Acked-by: Rob Landley 

Rob

Re: [PATCH] SUPERH: Replace HTTP links with HTTPS ones

2020-07-12 Rob Landley

On 7/8/20 9:17 PM, Alexander A. Klimov wrote:
> diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
> index 9fc2b010e938..bc91bdb0b665 100644
> --- a/arch/sh/Kconfig
> +++ b/arch/sh/Kconfig
> @@ -74,7 +74,7 @@ config SUPERH
> The SuperH is a RISC processor targeted for use in embedded systems
> and consumer electronics; it was also used in the Sega Dreamcast
> gaming console.  The SuperH port has a home page at
> -   .
> +   .

That's a historical page last edited in 2006 (according to
http://www.linux-sh.org/shwiki/RecentChanges/ anyway) with a self-signed
certificate that pops up a full-page warning in Chrome about the certificate
being invalid, in a wiki that can theoretically be edited by arbitrary third
parties anyway.

Not a huge man-in-the-middle target.

Rob

Re: [PATCH 09/10] sh: don't allow non-coherent DMA for NOMMU

2020-06-27 Rob Landley

On 6/26/20 3:07 AM, Christoph Hellwig wrote:
> The code handling non-coherent DMA depends on being able to remap code
> as non-cached.  But that can't be done without an MMU, so using this
> option on NOMMU builds is broken.

I'm working on a nommu j-core board that's doing DMA behind the OS's back at the
moment, which I have a todo item to teach the kernel about. The DMA does not go
through the cache, there's currently a cache flush before looking at the result
instead.

How should this be wired up after your patch?

Rob

headers_install builds break on a lot of targets?

2020-06-03 Rob Landley

The headers_install_all target got removed last year (commit f3c8d4c7a728;
would someone like to update Documentation/kbuild/headers_install.txt, which
still describes it?)

The musl-libc maintainer is using a forked hand-hacked kernel header package in
his toolchain build project (https://github.com/richfelker/musl-cross-make), and
he said the reason for it is:

  http://lists.landley.net/pipermail/toybox-landley.net/2020-March/011536.html

  Because downloading 100 MB of kernel source and extracting it to a far
  larger tree just to get the headers isn't really fun.

And I thought "that's why headers_install_all existed", noticed the target had
been removed, and tried my hand at a small shell script version:

  for i in $(echo arch/*/ | sed 's@arch/\([^/]*\)/@\1@g')
  do
    echo "$i"
    X="$PWD/fruitbasket/$i"
    mkdir -p "$X"
    make ARCH="$i" distclean defconfig headers_install \
      INSTALL_HDR_PATH="$X" > /dev/null
  done

On the bright side, the resulting fruitbasket.tar.xz is 1.5 megabytes. The
downside is I have no idea how broken the resulting header files are after this
error-fest:

alpha
arc
gcc: error: unrecognized command line option ‘-mmedium-calls’
gcc: error: unrecognized command line option ‘-mno-sdata’; did you mean ‘-fno-stats’?
gcc: error: unrecognized command line option ‘-mmedium-calls’
gcc: error: unrecognized command line option ‘-mno-sdata’; did you mean ‘-fno-stats’?
arm
arm64
c6x
csky
h8300
gcc: error: missing argument to ‘-Wframe-larger-than=’
gcc: error: unrecognized command line option ‘-mint32’; did you mean ‘-fintfc’?
hexagon
gcc: error: unrecognized command line option ‘-G0’
gcc: error: unrecognized command line option ‘-G0’
gcc: error: unrecognized command line option ‘-G0’
ia64
./arch/ia64/scripts/check-segrel.S: Assembler messages:
./arch/ia64/scripts/check-segrel.S:2: Error: unknown pseudo-op: `.rodata'
./arch/ia64/scripts/check-segrel.S:3: Error: no such instruction: `data4 @segrel(start)'
objdump: '/tmp/out17279': No such file
objdump: section '.rodata' mentioned in a -j option, but not found in any input file
./arch/ia64/scripts/toolchain-flags: 20: [: !=: unexpected operator
./arch/ia64/scripts/check-text-align.S: Assembler messages:
./arch/ia64/scripts/check-text-align.S:2: Error: unknown pseudo-op: `.proc'
./arch/ia64/scripts/check-text-align.S:3: Error: unknown pseudo-op: `.prologue'
./arch/ia64/scripts/check-text-align.S:4: Error: unknown pseudo-op: `.save'
./arch/ia64/scripts/check-text-align.S:7: Error: unknown pseudo-op: `.endp'
readelf: Error: '/tmp/out17279': No such file
./arch/ia64/scripts/check-gas-asm.S: Assembler messages:
./arch/ia64/scripts/check-gas-asm.S:1: Error: junk at end of line, first unrecognized character is `['
./arch/ia64/scripts/check-gas-asm.S:2: Error: unknown pseudo-op: `.xdata4'
objdump: '/tmp/out17306.o': No such file
objdump: section '.data' mentioned in a -j option, but not found in any input file
./arch/ia64/scripts/check-gas: 11: [: !=: unexpected operator
./arch/ia64/scripts/check-segrel.S: Assembler messages:
./arch/ia64/scripts/check-segrel.S:2: Error: unknown pseudo-op: `.rodata'
./arch/ia64/scripts/check-segrel.S:3: Error: no such instruction: `data4 @segrel(start)'
objdump: '/tmp/out19677': No such file
objdump: section '.rodata' mentioned in a -j option, but not found in any input file
./arch/ia64/scripts/toolchain-flags: 20: [: !=: unexpected operator
./arch/ia64/scripts/check-text-align.S: Assembler messages:
./arch/ia64/scripts/check-text-align.S:2: Error: unknown pseudo-op: `.proc'
./arch/ia64/scripts/check-text-align.S:3: Error: unknown pseudo-op: `.prologue'
./arch/ia64/scripts/check-text-align.S:4: Error: unknown pseudo-op: `.save'
./arch/ia64/scripts/check-text-align.S:7: Error: unknown pseudo-op: `.endp'
readelf: Error: '/tmp/out19677': No such file
./arch/ia64/scripts/check-gas-asm.S: Assembler messages:
./arch/ia64/scripts/check-gas-asm.S:1: Error: junk at end of line, first unrecognized character is `['
./arch/ia64/scripts/check-gas-asm.S:2: Error: unknown pseudo-op: `.xdata4'
objdump: '/tmp/out19705.o': No such file
objdump: section '.data' mentioned in a -j option, but not found in any input file
./arch/ia64/scripts/check-gas: 11: [: !=: unexpected operator
./arch/ia64/scripts/check-segrel.S: Assembler messages:
./arch/ia64/scripts/check-segrel.S:2: Error: unknown pseudo-op: `.rodata'
./arch/ia64/scripts/check-segrel.S:3: Error: no such instruction: `data4 @segrel(start)'
objdump: '/tmp/out19983': No such file
objdump: section '.rodata' mentioned in a -j option, but not found in any input file
./arch/ia64/scripts/toolchain-flags: 20: [: !=: unexpected operator
./arch/ia64/scripts/check-text-align.S: Assembler messages:
./arch/ia64/scripts/check-text-align.S:2: Error: unknown pseudo-op: `.proc'
./arch/ia64/scripts/check-text-align.S:3: Error: unknown pseudo-op: `.prologue'
./arch/ia64/scripts/check-text-align.S:4: Error: unknown pseudo-op: `.save'

Re: [GIT PULL] sh: remove sh5 support

2020-05-30 Thread Rob Landley

On 5/30/20 3:08 AM, John Paul Adrian Glaubitz wrote:
> On 5/29/20 7:53 PM, Rich Felker wrote:
>> Frustratingly, I _still_ don't have an official tree on kernel.org for
>> the purpose of being the canonical place for linux-next to pull from,
>> due to policies around pgp keys and nobody following up on signing
>> mine. This is all really silly since there are ridiculously many
>> independent channels I could cryptographically validate identity
>> through with vanishing probability that they're all compromised. For
>> the time being I'll reactivate my repo on git.musl-libc.org.
> 
> May I suggest to pick up these patches, for example? There might be
> more I missed, but getting these merged should already help a lot with
> the clean-up of arch/sh.

Does that include the 2 fixes to build with current binutils I made puppy eyes
about last -rc7 (in March)?

https://marc.info/?l=linux-sh&m=158544749818664&w=2

Rob

Re: [GIT PULL] sh: remove sh5 support

2020-05-28 Thread Rob Landley

On 5/28/20 5:14 PM, Rich Felker wrote:
> Aside from that, the open source & open hardware J-core models are
> still active and in development, with the latest release having been
> made this month, and the J32 with MMU nearly complete and pending
> release, contingent mostly on integration and testing with Linux.

J-core's doing stuff:

https://github.com/j-core

https://www.prnewswire.com/news-releases/spacechain-foundation-invests-in-core-semiconductor-to-produce-open-hardware-platform-for-direct-satellite-to-devices-communication-301061761.html

https://www.reddit.com/r/IAmA/comments/gs7qpn/we_are_jeff_garzik_and_jeff_dionne_and_we_are/

And I note that I worked a year-long contract in 2018 porting an existing sh4
Windows CE board to Linux (as a field upgrade to a widely deployed building
control system with a big stock of parts to build replacement units), so
conventional superh isn't exactly dead either.

Rob

Re: [GIT PULL] sh: remove sh5 support

2020-05-28 Thread Rob Landley

On 5/28/20 12:55 AM, John Paul Adrian Glaubitz wrote:
> On 5/28/20 7:46 AM, Christoph Hellwig wrote:
>> [adding Linus]
>>
>> On Thu, May 07, 2020 at 07:35:52AM -0700, Christoph Hellwig wrote:
>>> Any progress on this?  I plan to resend the sh dma-mapping I've been
>>> trying to get upstream for a year again, and they would conflict,
>>> so I could look into rebasing them first.
>>
>> So for years now it has been close to and in the end impossible to
>> provoke sh maintainer action.  At the same point hardware is pretty much
>> long gone for the real commercial variants, and never took off for the
>> open hardware nommu variant.
>>
>> Linus, would you ok with a 5.8 pull request to just kill off arch/sh/?
> 
> We're maintaining SH in Debian so I'm interested in keeping arch/sh, but
> I'm also let down that SH maintainers aren't that active at the moment.
> 
> I do know that Yoshinori Sato has a tree where he takes patches and sends
> PRs from time to time, but I have no idea what is going on.

There are still people who care about the architecture and try to get fixes in:

  https://www.spinics.net/lists/linux-sh/msg56844.html

Alas, I haven't had better luck getting Rich's attention, and I say that as
someone who has his phone number.

I met Sato-san for lunch once years ago, but he lives in Tokyo and English is
not his first language. I was under the impression he became co-maintainer to
show Rich the ropes of maintainership and to answer obscure architectural
questions, not because he was volunteering for significantly more work. Rich was
supposed to be load bearing.

I don't really have the domain expertise to do it myself... :(

Rob

Re: [PATCH v2 7/8] exec: Generic execfd support

2020-05-21 Thread Rob Landley

On 5/21/20 10:28 PM, Eric W. Biederman wrote:
> 
> Rob Landley  writes:
> 
>> On 5/20/20 11:05 AM, Eric W. Biederman wrote:
> 
>> Toybox would _like_ proc mounted, but can't assume it. I'm writing a new
>> bash-compatible shell with nommu support, which means in order to do subshell
>> and background tasks if (!CONFIG_FORK) I need to create a pipe pair, vfork(),
>> have the child exec itself to unblock the parent, and then read the context data
>> that just got discarded through the pipe from the parent. ("Wheee." And you can
>> quote me on that.)
> 
> Do you have clone(CLONE_VM) ?  If my quick skim of the kernel sources is
> correct that should be the same as vfork except without causing the
> parent to wait for you.  Which I think would remove the need to reexec
> yourself.

As with perpetual motion, that only seems like it would work if you don't
understand what's going on.

A nommu system uses physical addresses, not virtual ones, so every process sees
the same addresses. So if I allocate a new block of memory and memcpy the
contents of the old one into the new one, any pointers in the copy point back
into the ORIGINAL block of memory. Trying to adjust the pointers in the copy is
the exact same problem as trying to do garbage collection in C: it's an
AI-complete problem.

Any attempt to "implement a full fork" on nommu hits this problem: copying an
existing mapping to a new address range means any address values in the new
mapping point into the OLD mapping. Things like fdpic fix this up at exec time
(traversing elf tables and relocating), but not at runtime. If you can solve the
"relocate at runtime all addresses within an existing mapping, and all other
mappings that might point to this mapping, including local variables on the
stack that point to a structure member or halfway into a string rather than the
start of an allocation, without adjusting unrelated values coincidentally within
RANGE of a mapping" problem, THEN you can fork on a nommu system.
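To make the copying problem concrete, here is a small C sketch. The `struct state` type, `naive_copy()` and `points_into()` are hypothetical illustrations, not code from toybox or the kernel. It shows what a naive memcpy-based "fork" does in a single flat address space: the copy's interior pointer still aims into the original block.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical process state: a buffer plus a pointer into the middle
 * of it, the way a parser cursor or a stack local might point halfway
 * into a string. */
struct state {
    char buf[32];
    char *cursor;               /* points somewhere inside buf */
};

/* Naive "fork" on a flat (nommu-style) address space: allocate a new
 * block and copy the old one byte for byte. Nothing rewrites cursor,
 * so the copy's cursor still refers to the ORIGINAL buf. */
struct state *naive_copy(const struct state *orig)
{
    struct state *copy = malloc(sizeof *copy);

    if (copy)
        memcpy(copy, orig, sizeof *copy);
    return copy;
}

/* Returns 1 if p lies inside s->buf. Compared as integers because the
 * pointer may belong to a different allocation. */
int points_into(const struct state *s, const char *p)
{
    uintptr_t lo = (uintptr_t)s->buf, hi = lo + sizeof s->buf;

    return (uintptr_t)p >= lo && (uintptr_t)p < hi;
}
```

Nothing in the copied bytes marks `cursor` as a pointer rather than an integer that happens to fall within range, which is why no generic runtime fixup can repair it.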

What vfork() does is pause the parent and have the child continue AS the parent
for a bit (with the system call returning 0). The child starts with all the same
memory mappings the parent has (usually not even a new stack). The child has a
new PID and new resources like its own file descriptor table so close() and
open() don't affect the parent, but if you change a global that's visible to the
parent when it resumes (and often local variables too: don't return from the
function that called vfork() because if you DON'T have a new stack it'll stomp
the return address the parent needs when IT does it). If the child calls
malloc() the parent needs to free it because it's the same heap (because it's
the same mapping of the same physical memory).

Then when the child is ready to discard all those mappings (due to calling
either execve() or _exit(), those are the only two options), the parent resumes
from where it left off with the PID of the child as the system call return value.

The reason the child pauses the parent is so only one process is ever using
those mappings at a given time. Otherwise they're acting like threads without
locking, and usually both are sharing a stack.
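A minimal C sketch of the vfork() rules described above, assuming a Linux userspace with glibc or musl. The helper `run_vfork` and its parameters are hypothetical. The child does nothing except execv() or _exit(), and the parent only resumes once the child has done one of the two, receiving the child's PID as vfork()'s return value.

```c
#define _DEFAULT_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run path (or, if path is NULL, just exit with code) in a vfork()ed
 * child. Returns the child's exit status, or -1 on error. */
int run_vfork(const char *path, char *const argv[], int code)
{
    pid_t pid = vfork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: borrowing the parent's memory (and possibly its stack).
         * Don't return, don't touch globals, only exec or _exit. */
        if (path) {
            execv(path, argv);      /* returns only on failure */
            _exit(127);
        }
        _exit(code);                /* _exit(), not exit(): don't run the
                                       parent's atexit handlers or flush
                                       its stdio buffers */
    }

    /* Parent: resumes here once the child has exec'd or exited. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, `run_vfork(NULL, NULL, 42)` returns 42, and a failed exec comes back as 127.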

P.S. You can use threads _instead_ of fork for some stuff on nommu, but that's
its own can of worms. You still need to vfork() when you do create a child
process you're going to exec, so it doesn't go away, you're just requiring
multiple techniques simultaneously to handle a special case.

P.P.S. vfork() is useful on mmu systems to solve the "don't fork from a thread"
problem. You can vfork() from a thread cheaply and reliably and it only pauses
the one thread you forked from, not every thread in the whole process. If you
fork() from a heavily threaded process you can cause a multi-millisecond latency
spike because even with an mmu the copy on write "keep track of what's shared by
what" generally can't handle the "threads AND processes sharing mappings" case,
so it just gives up and copies it all at fork time, in one go, holding a big
lock while doing so. This causes a large latency spike which vfork() avoids.
(And can cause a large wasteful allocation and memory dirtying which is
immediately freed.)

>>> The file descriptor is stored in mm->exe_file.
>>> Probably the most straight forward implementation is to allow
>>> execveat(AT_EXE_FILE, ...).
>>
>> Cool, that works.
>>
>>> You can look at binfmt_misc for how to reopen an open file descriptor.
>>
>> Added to the todo heap.
> 
> Yes I don't think it would be a lot of code.
> 
> I think you might be better served with clone(CLONE_VM) as it doesn't
> block so you don't need to feed yourself your context over a pipe.

Except that doesn't fix it.

Yes I could use threads instead, but the cure is worse than the disease and the
result is your shell background processes are threads rather than independent
processes (is $$ reporting PID or TID, I really don't want to go there).

> Eric

Rob

Re: [PATCH v2 7/8] exec: Generic execfd support

2020-05-21 Thread Rob Landley

On 5/20/20 11:05 AM, Eric W. Biederman wrote:
> Rob Landley  writes:
> 
>> On 5/18/20 7:33 PM, Eric W. Biederman wrote:
>>>
>>> Most of the support for passing the file descriptor of an executable
>>> to an interpreter already lives in the generic code and in binfmt_elf.
>>> Rework the fields in binfmt_elf that deal with executable file
>>> descriptor passing to make executable file descriptor passing a first
>>> class concept.
>>
>> I was reading this to try to figure out how to do execve(NULL, argv[], envp) to
>> re-exec self after a vfork() in a chroot with no /proc, and hit the most trivial
>> quibble ever:
> 
> We have /proc/self/exe today.

Not when you first enter a container that's just created a new namespace, or
initramfs first launches PID 1 and runs a shell script to set up the environment
and your (subshell) and background& support only has vfork and not fork, or just
plain "somebody did a chroot"...

(Yes a nommu system with range registers can want _security_ without
_address_translation_. Strange but true! I haven't actually sat down to try to
implement nommu containers yet, but I've done worse things on many occasions.
Remember: the S in IoT stands for Security.)

> If I understand you correctly you would
> like to do the equivalent of 'execve("/proc/self/exe", argv[], envp[])'
> without having proc mounted.

Toybox would _like_ proc mounted, but can't assume it. I'm writing a new
bash-compatible shell with nommu support, which means in order to do subshell
and background tasks if (!CONFIG_FORK) I need to create a pipe pair, vfork(),
have the child exec itself to unblock the parent, and then read the context data
that just got discarded through the pipe from the parent. ("Wheee." And you can
quote me on that.)

I've implemented that already
(https://github.com/landley/toybox/blob/0.8.3/toys/pending/sh.c#L674 and reentry
is L2516, yeah it's a work in progress), but "exec self" requires /proc/self/exe
and since I gave up on getting
http://lkml.iu.edu/hypermail/linux/kernel/2005.1/09399.html in (I should
apologize to Randy but I just haven't got the spoons to face
https://landley.net/notes-2017.html#14-09-2017 again; three strikes and the
patch stays out) I need /init to be a shell script to set up an initramfs that's
made by pointing CONFIG_INITRAMFS_SOURCE at a directory that was made without
running the build as root, because there's no /dev/console and you can't mknod
as a non-root user.

Maybe instead of fixing CONFIG_DEVTMPFS_MOUNT to apply to initramfs I could
instead add a CONFIG_INITRAMFS_EXTRA=blah.txt to usr/{Kconfig,Makefile} to
append user-supplied extra lines to the end of the gen_initramfs.sh output and
make a /dev/console that way (kinda like genext2fs and mksquashfs), but getting
that in through the linux-kernel bureaucracy means consulting a 27 step
checklist supplementing the basic 17 step submission procedure (with
bibliographic references) explaining how to fill out the forms, perform the
validation steps, go through the proper channels, and get the appropriate series
of signatures and approvals, and I just haven't got the stomach for it anymore.
I was participating here as a hobbyist. Linux-kernel has aged into a rigid
bureaucracy. It's no fun anymore.

Which means any kernel patch I write I have to forward port regularly, sometimes
for a very long time. Heck, I gave linux-kernel three strikes at miniconfig
fifteen years ago now:

  http://lkml.iu.edu/hypermail/linux/kernel/0511.2/0479.html
  https://lwn.net/Articles/161086/
  https://lkml.org/lkml/2006/7/6/404

And was still maintaining it out of tree a decade later:

  https://landley.net/aboriginal/FAQ.html#dev_miniconfig
  https://github.com/landley/aboriginal/blob/master/more/miniconfig.sh

These days I've moved on to a microconfig format that mostly fits on one line,
ala the KCONF= stuff in toybox's built in:

  https://github.com/landley/toybox/blob/master/scripts/mkroot.sh#L136

For example, the User Mode Linux miniconfig from my ancient
https://landley.net/writing/docs/UML.html would translate to microconfig as:

  BINFMT_ELF,HOSTFS,LBD,BLK_DEV,BLK_DEV_LOOP,STDERR_CONSOLE,UNIX98_PTYS,EXT2_FS

The current kernel also needs "64BIT" because my host toolchain doesn't have the
-m32 headers installed, but then it builds fine ala:

  make ARCH=um allnoconfig KCONFIG_ALLCONFIG=<(echo \
    BINFMT_ELF,HOSTFS,LBD,BLK_DEV,BLK_DEV_LOOP,STDERR_CONSOLE,UNIX98_PTYS,EXT2_FS,64BIT \
    | sed -E 's/([^,]*)(,|$)/CONFIG_\1=y\n/g')
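The sed expression above turns each comma-separated microconfig symbol into a CONFIG_SYMBOL=y miniconfig line. The same expansion can be sketched in C; `expand_microconfig` is a hypothetical helper (not part of toybox's mkroot.sh), and skipping empty entries is a choice made for this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Expand "HOSTFS,EXT2_FS" into "CONFIG_HOSTFS=y\nCONFIG_EXT2_FS=y\n".
 * Writes a NUL-terminated result into out and returns its length, or
 * -1 if out (of size outlen) is too small. Empty entries are skipped. */
int expand_microconfig(const char *in, char *out, size_t outlen)
{
    size_t used = 0;

    while (*in) {
        size_t n = strcspn(in, ",");        /* length of this symbol */

        if (n) {
            size_t need = 7 + n + 3;        /* "CONFIG_" + sym + "=y\n" */

            if (used + need + 1 > outlen)   /* +1 for the final NUL */
                return -1;
            memcpy(out + used, "CONFIG_", 7);
            memcpy(out + used + 7, in, n);
            memcpy(out + used + 7 + n, "=y\n", 3);
            used += need;
        }
        in += n;
        if (*in == ',')
            in++;
    }
    if (used + 1 > outlen)
        return -1;
    out[used] = '\0';
    return (int)used;
}
```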

Of course running the resulting ./linux says:

  Checking PROT_EXEC mmap in /dev/shm...Operation not permitted
  /dev/shm must be not mounted noexec

But *shrug*, Devuan did that, not me. I haven't really used UML since QEMU
started working. Shouldn't the old "create file, map file, delete file" trick
stop flushing the data t

Re: [PATCH v2 7/8] exec: Generic execfd support

2020-05-19 Thread Rob Landley

On 5/18/20 7:33 PM, Eric W. Biederman wrote:
> 
> Most of the support for passing the file descriptor of an executable
> to an interpreter already lives in the generic code and in binfmt_elf.
> Rework the fields in binfmt_elf that deal with executable file
> descriptor passing to make executable file descriptor passing a first
> class concept.

I was reading this to try to figure out how to do execve(NULL, argv[], envp) to
re-exec self after a vfork() in a chroot with no /proc, and hit the most trivial
quibble ever:

> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1323,7 +1323,10 @@ int begin_new_exec(struct linux_binprm * bprm)
>*/
>   set_mm_exe_file(bprm->mm, bprm->file);
>  
> + /* If the binary is not readable than enforce mm->dumpable=0 */

then

Rob

Re: [PATCH v4] Make initramfs honor CONFIG_DEVTMPFS_MOUNT

2020-05-15 Thread Rob Landley

On 5/14/20 11:50 PM, Randy Dunlap wrote:
> Hi Rob,

Um, hi.

> You need to send this patch to some maintainer who could merge it.

The implicit "if" is "you expect the kernel bureaucracy to merge anything Not
Invented Here", and the 3 year gap since the last version is because I stopped:

  https://landley.net/notes-2017.html#14-09-2017

To be honest I didn't think anyone would notice this. They don't usually:

  http://lkml.iu.edu/hypermail/linux/kernel/2002.2/00083.html
  https://www.spinics.net/lists/linux-sh/msg56844.html

It just seems polite to post things that got shipped to customers.

> And it uses the wrong multi-line comment format.

Offending comment removed.

Rob
diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index f046d21..97352d4 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -48,16 +48,10 @@  config DEVTMPFS_MOUNT
 	bool "Automount devtmpfs at /dev, after the kernel mounted the rootfs"
 	depends on DEVTMPFS
 	help
-	  This will instruct the kernel to automatically mount the
-	  devtmpfs filesystem at /dev, directly after the kernel has
-	  mounted the root filesystem. The behavior can be overridden
-	  with the commandline parameter: devtmpfs.mount=0|1.
-	  This option does not affect initramfs based booting, here
-	  the devtmpfs filesystem always needs to be mounted manually
-	  after the rootfs is mounted.
-	  With this option enabled, it allows to bring up a system in
-	  rescue mode with init=/bin/sh, even when the /dev directory
-	  on the rootfs is completely empty.
+	  Automatically mount devtmpfs at /dev on the root filesystem, which
+	  lets the system come up in rescue mode with [rd]init=/bin/sh.
+	  Override with devtmpfs.mount=0 on the commandline. Initramfs can
+	  create a /dev dir as needed, other rootfs needs the mount point.
 
 config STANDALONE
 	bool "Select only drivers that don't need compile-time external firmware"
diff --git a/fs/namespace.c b/fs/namespace.c
index f8893dc..06057d7 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2417,7 +2417,16 @@  static int do_add_mount(struct mount *newmnt, struct path *path, int mnt_flags)
 	err = -EBUSY;
 	if (path->mnt->mnt_sb == newmnt->mnt.mnt_sb &&
 	path->mnt->mnt_root == path->dentry)
+	{
+		if (IS_ENABLED(CONFIG_DEVTMPFS_MOUNT) &&
+		!strcmp(path->mnt->mnt_sb->s_type->name, "devtmpfs"))
+		{
+			printk(KERN_WARNING "Debian bug workaround for devtmpfs overmount.");
+
+			err = 0;
+		}
 		goto unlock;
+	}
 
 	err = -EINVAL;
 	if (d_is_symlink(newmnt->mnt.mnt_root))
diff --git a/init/main.c b/init/main.c
index 0ee9c686..0d8e5ec 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1065,12 +1065,6 @@  static noinline void __init kernel_init_freeable(void)
 
 	do_basic_setup();
 
-	/* Open the /dev/console on the rootfs, this should never fail */
-	if (ksys_open((const char __user *) "/dev/console", O_RDWR, 0) < 0)
-		pr_err("Warning: unable to open an initial console.\n");
-
-	(void) ksys_dup(0);
-	(void) ksys_dup(0);
 	/*
 	 * check if there is an early userspace init.  If yes, let it do all
 	 * the work
@@ -1082,8 +1076,17 @@  static noinline void __init kernel_init_freeable(void)
 			ramdisk_execute_command, 0) != 0) {
 		ramdisk_execute_command = NULL;
 		prepare_namespace();
+	} else if (IS_ENABLED(CONFIG_DEVTMPFS_MOUNT)) {
+		sys_mkdir("/dev", 0755);
+		devtmpfs_mount("/dev");
 	}
 
+	/* Open the /dev/console on the rootfs, this should never fail */
+	if (ksys_open((const char __user *) "/dev/console", O_RDWR, 0) < 0)
+		pr_err("Warning: unable to open an initial console.\n");
+	(void) ksys_dup(0);
+	(void) ksys_dup(0);
+
 	/*
 	 * Ok, we have completed the initial bootup, and
 	 * we're essentially up and running. Get rid of the

[PATCH v4] Make initramfs honor CONFIG_DEVTMPFS_MOUNT

2020-05-14 Thread Rob Landley

FYI I dug up my old https://lkml.org/lkml/2017/9/13/651 and ported it to
current, because I needed it for a thing.

From: Rob Landley 

Make initramfs honor CONFIG_DEVTMPFS_MOUNT, and move
/dev/console open after devtmpfs mount.

Add workaround for Debian bug that was copied by Ubuntu.

Signed-off-by: Rob Landley 
diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index f046d21..97352d4 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -48,16 +48,10 @@  config DEVTMPFS_MOUNT
 	bool "Automount devtmpfs at /dev, after the kernel mounted the rootfs"
 	depends on DEVTMPFS
 	help
-	  This will instruct the kernel to automatically mount the
-	  devtmpfs filesystem at /dev, directly after the kernel has
-	  mounted the root filesystem. The behavior can be overridden
-	  with the commandline parameter: devtmpfs.mount=0|1.
-	  This option does not affect initramfs based booting, here
-	  the devtmpfs filesystem always needs to be mounted manually
-	  after the rootfs is mounted.
-	  With this option enabled, it allows to bring up a system in
-	  rescue mode with init=/bin/sh, even when the /dev directory
-	  on the rootfs is completely empty.
+	  Automatically mount devtmpfs at /dev on the root filesystem, which
+	  lets the system come up in rescue mode with [rd]init=/bin/sh.
+	  Override with devtmpfs.mount=0 on the commandline. Initramfs can
+	  create a /dev dir as needed, other rootfs needs the mount point.
 
 config STANDALONE
 	bool "Select only drivers that don't need compile-time external firmware"
diff --git a/fs/namespace.c b/fs/namespace.c
index f8893dc..06057d7 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2417,7 +2417,21 @@  static int do_add_mount(struct mount *newmnt, struct path *path, int mnt_flags)
 	err = -EBUSY;
 	if (path->mnt->mnt_sb == newmnt->mnt.mnt_sb &&
 	path->mnt->mnt_root == path->dentry)
+	{
+		if (IS_ENABLED(CONFIG_DEVTMPFS_MOUNT) &&
+		!strcmp(path->mnt->mnt_sb->s_type->name, "devtmpfs"))
+		{
+			/* Debian's kernel config enables DEVTMPFS_MOUNT, then
+			   its initramfs setup script tries to mount devtmpfs
+			   again, and if the second mount-over-itself fails
+			   the script overmounts a tmpfs on /dev to hide the
+			   existing contents, then boot fails with empty /dev. */
+			printk(KERN_WARNING "Debian bug workaround for devtmpfs overmount.");
+
+			err = 0;
+		}
 		goto unlock;
+	}
 
 	err = -EINVAL;
 	if (d_is_symlink(newmnt->mnt.mnt_root))
diff --git a/init/main.c b/init/main.c
index 0ee9c686..0d8e5ec 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1065,12 +1065,6 @@  static noinline void __init kernel_init_freeable(void)
 
 	do_basic_setup();
 
-	/* Open the /dev/console on the rootfs, this should never fail */
-	if (ksys_open((const char __user *) "/dev/console", O_RDWR, 0) < 0)
-		pr_err("Warning: unable to open an initial console.\n");
-
-	(void) ksys_dup(0);
-	(void) ksys_dup(0);
 	/*
 	 * check if there is an early userspace init.  If yes, let it do all
 	 * the work
@@ -1082,8 +1076,17 @@  static noinline void __init kernel_init_freeable(void)
 			ramdisk_execute_command, 0) != 0) {
 		ramdisk_execute_command = NULL;
 		prepare_namespace();
+	} else if (IS_ENABLED(CONFIG_DEVTMPFS_MOUNT)) {
+		sys_mkdir("/dev", 0755);
+		devtmpfs_mount("/dev");
 	}
 
+	/* Open the /dev/console on the rootfs, this should never fail */
+	if (ksys_open((const char __user *) "/dev/console", O_RDWR, 0) < 0)
+		pr_err("Warning: unable to open an initial console.\n");
+	(void) ksys_dup(0);
+	(void) ksys_dup(0);
+
 	/*
 	 * Ok, we have completed the initial bootup, and
 	 * we're essentially up and running. Get rid of the

Re: [PATCH 3/5] exec: Remove recursion from search_binary_handler

2020-05-14 Thread Rob Landley

On 5/13/20 4:59 PM, Eric W. Biederman wrote:
> Careful with your terminology.  ELF sections are for .o's For
> executables ELF have segments.  And reading through the code it is the
> program segments that are independently relocatable.

Sorry, I have trouble keeping this stuff straight when it's not in front of me.
(I have a paperback copy of the old "Linkers and Loaders" book and it was the
driest thing I have _ever_ slogged through. Back before the Linux Foundation ate
the FSG I was pushing https://refspecs.linuxbase.org/ to include missing ABI
supplements; I have copies of ones it doesn't have, collected from now long-dead
sites...)

But more recently I've just made puppy eyes at Rich Felker to have him fix this
stuff for me, because I do _not_ retain the terminology here. REL vs RELA vs
PLT, can you have a PLT without a GOT...?

> There is a flag but it is defined per architecture and I don't think one
> of the architectures define it.

They all check for one, but I don't remember there being a #define.

I have a todo item to check more architectures' fdpic binaries, this was from
sh2eb (ala j-core):

  https://github.com/landley/toybox/commit/d61aeaf9e#diff-4442ddbb8949R65

There was the out-of-tree arm fdpic toolchain from the French guys for cortex-m,
and the original frv paper, and in theory blackfin, but nothing they touched ever
got merged upstream anywhere:

In _theory_ you could do fdpic for x86, but as with u-boot for x86 nobody ever
bothers because it's got an x86-only solution. (And then the x86 version of
stuff gets pushed to other platforms because all our device tree files were
GPLed so of course acpi for arm became a thing. Sigh...)

> I looked at ARM and apparently with an MMU ARM turns fdpic binaries into
> PIE executables.  I am not certain why.

Falling back to a more widely tested codepath, I expect. Also maybe it saves 3
registers if all 4 are using the same base register? Map them linearly and it
becomes "single base + offset"? Which of course looses the extra ASLR benefits
the security people wanted, but "undoing what the security people want in the
name of an unmeasurable microbenchmark optimization" is a proud tradition.

Just because the 4 segments are compiled as independently relocatable doesn't
mean they HAVE to be. (You'd think the code would be using different register
numbers to index stuff so you'd STILL be using 4 registers, but I haven't looked
at what arm's doing...)

> The registers passed to the entry point are also different for both
> cases.

From the same machine code chunks? I boggle at what the ld.so fixup is doing then...

> I think it would have been nice if the fdpic support had used a
> different ELF type, instead of a different depending on using a
> different architecture.

This is what you get when a blackfin developer talks to the gnu/binutils developers:

  https://sourceware.org/legacy-ml/binutils/2008-04/msg00350.html

> All that aside the core dumping code looks to be essentially the same
> between binfmt_elf.c and binfmt_elf_fdpic.c.  Do you think people would
> be interested in refactoring binfmt_elf.c and binfmt_elf_fdpic.c so that
> they could share the same core dumping code?

I think merging the two of them together entirely would be a good idea, and
anything that can collapse together I'm happy to regression test on sh2.

I also note that qemu-sh4eb can run these binaries, maybe I can whip up a
qemu-system-sh4eb that runs a nommu fdpic userspace...

[hours later]

Ok, here's me asking Rich Felker a question:

>>> So fdpic binaries run under qemu-sh2eb and there's a qemu-system-sh2eb that
>>> SHOULD also be able to run them under the r2d board emulation, and the kernel
>>> builds fine under the sh2eb compiler but I can't enable fdpic support without
>>> CONFIG_NOMMU, and if I yank that dependency from Kconfig (which only sh2 has,
>>> arm and such do fdpic with or without mmu) the build breaks with:
>>>
>>> /home/landley/toybox/clean/ccc/sh2eb-linux-muslfdpic-cross/bin/sh2eb-linux-muslfdpic-ld:
>>> fs/binfmt_elf_fdpic.o: in function `load_elf_fdpic_binary':
>>> binfmt_elf_fdpic.c:(.text+0x1734): undefined reference to
>>> `elf_fdpic_arch_lay_out_mm'
>>>
>>> The problem is if I switch off CONFIG_MMU in the kernel, buckets of stuff in the
>>> r2d board kernel config changes and suddenly I don't get serial output from the
>>> qemu-system-sh2eb -M r2d boot anymore. Before it was running the kernel but just
>>> failing to run init...

And his response:

>> I don't think qemu-system-sh4eb can boot a nommu kernel. But you don't
>> need to in order to do userspace-only testing. Just build a normal
>> sh4eb kernel. It doesn't need CONFIG_BINFMT_ELF_FDPIC. The normal ELF
>> loader can load FDPIC just fine, because a valid FDPIC ELF file is a
>> valid ELF file, just with more constraints (in same sense a square is
>> a rectangle). The normal ELF loader won't independently float the text
>> and data segments, but that's okay

Re: [PATCH 3/5] exec: Remove recursion from search_binary_handler

2020-05-12 Thread Rob Landley

On 5/12/20 7:20 PM, Linus Torvalds wrote:
> On Tue, May 12, 2020 at 11:46 AM Eric W. Biederman
>  wrote:
>>
>> I am still thinking about this one, but here is where I am at.  At a
>> practical level passing the file descriptor of the script to interpreter
>> seems like something we should encourage in the long term.  It removes
>> races and it is cheaper because then the interpreter does not have to
>> turn around and open the script itself.
> 
> Yeah, I think we should continue to support it, because I think it's
> the right thing to do (and we might just end up having compatibility
> issues if we don't).
...
>> It is possible although unlikely for userspace to find the file
>> descriptor without consulting AT_EXECFD so just to be conservative I
>> think we should install the file descriptor in begin_new_exec even if
>> the next interpreter does not support AT_EXECFD.
> 
> Ack. I think the AT_EXECFD thing is a sign that this isn't internal to
> binfmt_misc, but it also shouldn't be gating this issue. In reality,
> ELF is the only real binary format that matters - the script/misc
> binfmts are just indirection entries - and it supports AT_EXECFD, so
> let's just ignore the theoretical case of "maybe nobody exposes it".

Would this potentially make the re-exec-yourself case easier to do at some
point? (Which nommu needs to do, and /proc/self/exe isn't always available.)

Here's the first time I asked about that:

https://lore.kernel.org/lkml/200612261823.07927@landley.net/

Here's the most recent:

https://lkml.org/lkml/2017/9/5/246

Here's someone else asking and being basically told "chroot isn't a thing":

http://lkml.iu.edu/hypermail/linux/kernel/0906.3/00584.html

(See also "CVE-2019-5736" and the workarounds thereto.)

Rob

P.S. Yes I'm aware it would only work properly with static binaries. Not the
first thing that's true for.

Re: [PATCH 3/5] exec: Remove recursion from search_binary_handler

2020-05-11 Thread Rob Landley

On 5/11/20 9:33 AM, Eric W. Biederman wrote:
> What I do see is that interp_data is just a parameter that is smuggled
> into the call of search binary handler.  And the next binary handler
> needs to be binfmt_elf for it to make much sense, as only binfmt_elf
> (and binfmt_elf_fdpic) deals with BINPRM_FLAGS_EXECFD.

The binfmt_elf_fdpic driver is separate from binfmt_elf for the same reason
ext2/ext3/ext4 used to have 3 drivers: fdpic is really just binfmt_elf with the
4 main sections (text, data, bss, rodata) able to move independently of each
other (each tracked with its own base pointer).

It's kind of -fPIE on steroids, and various security people have sniffed at it
over the years to give ASLR more degrees of freedom on with-MMU systems. Many
moons ago Rich Felker proposed teaching the fdpic loader how to load normal ELF
binaries so there's just the one loader (there's a flag in the ELF header to say
whether the sections are independent or not).

Rob

Re: [PATCH v2 0/5] Fix ELF / FDPIC ELF core dumping, and use mmap_sem properly in there

2020-05-01 Thread Rob Landley

On 5/1/20 1:00 AM, Greg Ungerer wrote:
>> This sounds correct. My understanding of FLAT shared library support
>> is that it's really bad and based on having preassigned slot indices
>> for each library on the system, and a global array per-process to give
>> to data base address for each library. Libraries are compiled to know
>> their own slot numbers so that they just load from fixed_reg[slot_id]
>> to get what's effectively their GOT pointer.

fdpic is to elf what binflt is to a.out, and a.out shared libraries were never
pretty. Or easy.
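The slot scheme described in the quoted text can be modeled in a few lines. Everything here (MAX_SHARED_LIBS, struct process, lib_got()) is hypothetical and only illustrates the indirection. It is not the real binflt or elf2flt ABI, where per the description above the per-process table is reached through a fixed register.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SHARED_LIBS 4  /* hypothetical system-wide slot count */

/* Per-process table the loader fills in: entry N holds the data base
 * address of whichever library was assigned system-wide slot N. */
struct process {
    uintptr_t lib_data_base[MAX_SHARED_LIBS];
};

/* A library compiled for a fixed slot finds its own data (effectively its
 * GOT pointer) by indexing the current process's table with that slot. */
uintptr_t lib_got(const struct process *p, unsigned slot_id)
{
    assert(slot_id < MAX_SHARED_LIBS);
    return p->lib_data_base[slot_id];
}
```

The catch is visible in the signature: slot_id is baked into each library at build time, so every shared library on the system needs its own globally unique slot.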

>> I'm not sure if anybody has actually used this in over a decade. Last
>> time I looked the tooling appeared broken, but in this domain lots of
>> users have forked private tooling that's not publicly available or at
>> least not publicly indexed, so it's hard to say for sure.
> 
> Be at least 12 or 13 years since I last had a working shared library
> build for m68knommu. I have not bothered with it since then, not that I
> even used it much when it worked. Seemed more pain than it was worth.

Shared libraries worked fine with fdpic on sh2 last I checked, it's basically
just ELF PIC with the ability to move the 4 segments (text/rodata/bss/data)
independently of each other. (4 base pointers, no waiting.)

I don't think I've _ever_ used shared binflt libraries. I left myself
breadcrumbs back when I was wrestling with that stuff:

  https://landley.net/notes-2014.html#07-12-2014

But it looks like that last time I touched anything using elf2flt was:

  https://landley.net/notes-2018.html#08-05-2018

And that was just because arm's fdpic support stayed out of tree for years, so I
dug up binflt and gave it another go. (It sucked so much I wound up building
static PIE for cortex-m, taking the efficiency hit, and moving on. Running PIE
binaries on nommu _works_, it's just incredibly inefficient. Since the writeable
and read-only segments of the ELF are all relative to the same single base
pointer, you can't share the read-only parts of the binaries without address
remapping, so if you launch 4 instances of PIE bash on nommu you've loaded 4
copies of the bash text and rodata, and of course none of it can even be
demand faulted. In theory shared libraries _do_ help there, but I hit some ld.so
bug and didn't want to debug a half-assed solution, so I grabbed the big hammer and moved on
until arm fdpic got merged and fixed it _properly_...)
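Back-of-the-envelope arithmetic for the "4 instances of PIE bash" case. The segment sizes used below are invented for illustration (not measured from bash), and the model ignores stack, heap and ld.so.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes of one program's segments, in any consistent unit (KiB here). */
struct seg_sizes {
    size_t text, rodata, data, bss;
};

/* PIE on nommu: one base pointer for everything, so every instance gets a
 * private copy of all four segments. */
size_t pie_nommu_total(struct seg_sizes s, size_t instances)
{
    return instances * (s.text + s.rodata + s.data + s.bss);
}

/* FDPIC: segments move independently, so text and rodata are loaded once
 * and shared; only data and bss are per instance. Assumes instances >= 1. */
size_t fdpic_total(struct seg_sizes s, size_t instances)
{
    return (s.text + s.rodata) + instances * (s.data + s.bss);
}
```

With text 800, rodata 100, data 20 and bss 30 (KiB, made up), four PIE instances cost 3800 KiB versus 1100 KiB when text and rodata are shared; with a single instance the two are equal.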

Rob

P.S. The reason binflt is still around is that bare metal hardware engineers who
are conceptually uncomfortable with software love it, because it's as close to
"objcopy -O binary" as they can get. Meanwhile on j-core we've had an 8k ROM boot
loader that loads vmlinux images and does the ELF relocations for 5 years now, and
ever since the switch to device tree that's been our _only_ way to feed a dtb to the
kernel without statically linking it in, so it's ELF all the way down for us.
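The loading half of such a boot loader fits in a few lines: copy each PT_LOAD segment to its physical address and zero the bss tail. This sketch assumes a 32-bit ELF image already in memory, the same endianness as the loader, and trusted input (no bounds checks); load_base is a made-up parameter mapping physical addresses onto the mem buffer. Relocation processing and the dtb handoff are omitted, and this is not the j-core loader.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <string.h>

/* Copy each PT_LOAD segment of an in-memory 32-bit ELF image to
 * mem + (p_paddr - load_base) and zero-fill p_memsz beyond p_filesz.
 * Returns the entry point, or 0 if the image is not a 32-bit ELF. */
uint32_t elf32_load(const unsigned char *img, unsigned char *mem, uint32_t load_base)
{
    Elf32_Ehdr eh;
    memcpy(&eh, img, sizeof(eh));   /* memcpy avoids alignment assumptions */
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 || eh.e_ident[EI_CLASS] != ELFCLASS32)
        return 0;

    for (unsigned i = 0; i < eh.e_phnum; i++) {
        Elf32_Phdr ph;
        memcpy(&ph, img + eh.e_phoff + i * eh.e_phentsize, sizeof(ph));
        if (ph.p_type != PT_LOAD)
            continue;
        unsigned char *dst = mem + (ph.p_paddr - load_base);
        memcpy(dst, img + ph.p_offset, ph.p_filesz);
        memset(dst + ph.p_filesz, 0, ph.p_memsz - ph.p_filesz);  /* bss */
    }
    return eh.e_entry;
}
```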

Re: [PATCH v2 0/5] Fix ELF / FDPIC ELF core dumping, and use mmap_sem properly in there

2020-04-30 Thread Rob Landley

On 4/30/20 9:51 AM, Rich Felker wrote:
> This sounds correct. My understanding of FLAT shared library support
> is that it's really bad and based on having preassigned slot indices
> for each library on the system, and a global array per-process to give
> to data base address for each library. Libraries are compiled to know
> their own slot numbers so that they just load from fixed_reg[slot_id]
> to get what's effectively their GOT pointer.
> 
> I'm not sure if anybody has actually used this in over a decade. Last
> time I looked the tooling appeared broken, but in this domain lots of
> users have forked private tooling that's not publicly available or at
> least not publicly indexed, so it's hard to say for sure.

Lots of people in this area are also still using 10-year-old tools because
something breaks every time they upgrade.

Heck, nommu support for architectures musl doesn't support yet is _explicitly_
the main thing keeping uClibc alive:

  https://www.openwall.com/lists/musl/2015/05/30/1

Rob

Re: [PATCH 2/5] coredump: Fix handling of partial writes in dump_emit()

2020-04-28 Thread Rob Landley

On 4/27/20 10:35 PM, Linus Torvalds wrote:
> On Mon, Apr 27, 2020 at 8:28 PM Jann Horn  wrote:
>>
>> After a partial write, we have to update the input buffer pointer.
> 
> Interesting. It seems this partial write case never triggers (except
> for actually killing the core-dump).
> 
> Or did you find a case where it actually matters?
> 
> Your fix is obviously correct, but it also makes me go "that function
> clearly never actually worked for partial writes, maybe we shouldn't
> even bother?"

Writes to a local filesystem should never be short unless the disk is full or
there's an I/O error.

Once upon a time this was yet another thing that NFS could break that no other
filesystem would break, but I dunno about now? (I think the page cache collates
it and defers the flush until the error can't be reported back anyway?)
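For contrast with the dump_emit() bug, the usual userspace pattern for coping with short writes. write_all() is an illustrative helper, not the kernel fix; the key line is advancing the buffer pointer by however much write() accepted before retrying.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write all of buf, coping with short writes. Returns 0 on success,
 * -1 on error with errno set by write(). */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;    /* interrupted before writing anything */
            return -1;
        }
        p += n;              /* skip the part that already made it out */
        len -= (size_t)n;
    }
    return 0;
}
```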

Rob

Re: [PATCH v2 7/7] n_tty: Provide an informational line on VSTATUS receipt

2019-08-01 Thread Rob Landley

On 8/1/19 4:20 AM, Greg Kroah-Hartman wrote:
>> SysRq is system-wide, whereas this is per-terminal and only cares about
>> one tty which the status char is pressed at and its foreground pgrp
>> (most likely it's the foreground shell job).
>>
>> I hope this is clear enough.
> 
> It is, yes.  My big objection is the crazy code I point out above, as
> well as the "create a totally new interface when we might be able to use
> an existing one" that you need to convince me is really required :)

It's not a new interface, it's a decades-old BSD interface that our
tcgetattr man page already mentions, which seems to be one of the big things BSD
people miss when using Linux, and which I tried and failed to implement without
kernel support months ago.

I wasn't involved in this kernel patch effort, I got pointed at news coverage
about it by the Android Bionic maintainer:

  http://lists.landley.net/pipermail/toybox-landley.net/2019-June/010536.html

Which is how I wound up cc'd on this thread.

I don't think Android specifically cares about SIGINFO, but they're trying to
support building Android on MacOSX, which means trying to support building it on
FreeBSD, which involves outreach to the BSD community, and they brought up the
lack of ctrl-T and siginfo as a thing they really missed when having to deal
with the Linux command line.

(The fact there _was_ news coverage of the patch for somebody to point me at may
also be an indication of interest floating around out there...)

Rob

Re: [PATCH v2 7/7] n_tty: Provide an informational line on VSTATUS receipt

2019-08-01 Thread Rob Landley

On 7/30/19 11:19 AM, Greg Kroah-Hartman wrote:
> On Tue, Jun 25, 2019 at 07:11:53PM +0300, Arseny Maslennikov wrote:
>> If the three termios local flags isig, icanon, iexten are enabled
>> and the local flag nokerninfo is disabled for a tty governed
>> by the n_tty line discipline, then on receiving the keyboard status
>> character n_tty will generate a status message and write it out to
>> the tty before sending SIGINFO to the tty's foreground process group.
>>
>> This kerninfo line contains information about the current system load
>> as well as some properties of "the most interesting" process in the
>> tty's current foreground process group, namely:
>>  - its PID as seen inside its deepest PID namespace;
>>    * the whole process group ought to be in a single PID namespace,
>>      so this is actually deterministic
>>  - its saved command name truncated to 16 bytes (task_struct::comm);
>>    * at the time of writing TASK_COMM_LEN == 16
>>  - its state and some related bits, procps-style;
>>  - for S and D: its symbolic wait channel, if available; or a short
>>    description for other process states instead;
>>  - its user, system and real rusage time values;
>>  - its resident set size (as well as the high watermark) in kilobytes.
> 
> Why is this really all needed as we have the SysRq handlers that report
> all of this today?

People were lamenting the lack of siginfo in linux back in May, I offered to try
to implement it, several people jumped in to offer suggestions, and it turns out
you can't really do it without kernel support.

https://twitter.com/landley/status/1131764323196522498

>> The "most interesting" process is chosen as follows:
>>  - runnables over everything
>>  - uninterruptibles over everything else
>>  - among 2 runnables pick the biggest utime + stime
>>  - any unresolved ties are decided in favour of greatest PID.
> 
> This does not feel like something that the tty core code should be doing
> at all.

I couldn't figure out how to do it without kernel support when I tried.

http://lists.landley.net/pipermail/toybox-landley.net/2019-May/010461.html
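The quoted selection rules translate fairly directly into a comparison function. This is a userspace sketch over a made-up struct task (state letter, utime, stime, pid); the actual patch walks task_structs inside the kernel and may differ in details.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, pared-down view of a task. state uses procps letters:
 * 'R' runnable, 'D' uninterruptible, anything else ranks lowest. */
struct task {
    char state;
    unsigned long utime, stime;
    int pid;
};

static int state_rank(char s)
{
    return s == 'R' ? 2 : s == 'D' ? 1 : 0;
}

/* Nonzero if a is "more interesting" than b: runnable beats everything,
 * uninterruptible beats the rest, two runnables compare by utime + stime,
 * and any remaining tie goes to the greater PID. */
int more_interesting(const struct task *a, const struct task *b)
{
    int ra = state_rank(a->state), rb = state_rank(b->state);
    if (ra != rb)
        return ra > rb;
    if (ra == 2) {
        unsigned long ta = a->utime + a->stime, tb = b->utime + b->stime;
        if (ta != tb)
            return ta > tb;
    }
    return a->pid > b->pid;
}

/* Pick the most interesting task from a non-empty array. */
const struct task *pick_task(const struct task *t, size_t n)
{
    const struct task *best = &t[0];
    for (size_t i = 1; i < n; i++)
        if (more_interesting(&t[i], best))
            best = &t[i];
    return best;
}
```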

Rob

Re: [PATCH 0/7] TTY Keyboard Status Request

2019-06-10 Thread Rob Landley

On 6/9/19 3:56 PM, Arseny Maslennikov wrote:
> This is similar to SIGWINCH, which is default-ignored as well: if the
> terminal width/height changes (like when a terminal emulator window is
> resized), its foreground pgrp gets a surprise signal as well, and the
> processes that don't care about WINCH (and thus have default
> disposition) do not get confused.
> E.g. 'strace cat' demonstrates this quite clearly.

Once upon a time suspending pipelines with ctrl-z broke stuff all the time due
to zero-length reads being interpreted as EOF. These days I don't see that so
much anymore, I think SA_RESTART is the default now?
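On the SA_RESTART question: a handler installed with sigaction() only gets automatic restart of slow system calls if it asks for SA_RESTART, while glibc's signal() wrapper, in its default configuration, uses BSD semantics, which include it. A minimal sketch of both the explicit flag and the defensive EINTR loop; the helper names are illustrative.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static void on_signal(int sig) { (void)sig; }

/* Install a handler with SA_RESTART so that slow calls such as read() on a
 * pipe or tty are restarted instead of failing with EINTR. */
int install_restarting_handler(int sig)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(sig, &sa, NULL);
}

/* Defensive version for code that doesn't control every handler: retry on
 * EINTR, so that only a 0 return is ever treated as end of file. */
ssize_t read_retry(int fd, void *buf, size_t len)
{
    ssize_t n;
    do
        n = read(fd, buf, len);
    while (n < 0 && errno == EINTR);
    return n;
}
```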

Rob

Re: [PATCH 1/7] signal.h: Define SIGINFO on all architectures

2019-06-10 Thread Rob Landley

On 6/5/19 3:19 AM, Arseny Maslennikov wrote:
> This complementary patch defines SIGINFO as a synonym for SIGPWR
> on every architecture supported by the kernel.
> The particular signal number chosen does not really matter and is only
> required for the related tty functionality to work properly,
> so if it does not suite expectations, any suggestions are warmly
> welcome.

This was the problem I saw last month: 32 bits worth of signal numbers already
defined, gotta alias something.

> SIGPWR looks like a nice candidate for this role, because it is
> defined on every supported arch; it is currently only used to inform
> PID 1 of power failures, and daemons that care about low-level
> events do not tend to have a controlling terminal.

/dev/console isn't a controlling tty so ctrl-T wouldn't send SIGPWR to PID 1 
anyway.

> However, on sparcs SIGPWR is a synonym for SIGLOST, a signal unique
> to that architecture, with a narrow set of intended uses that do not
> combine well with interactively requesting status.
> SIGLOST is not used by any kernel code at the moment.
> I'm not sure there is a more reasonable alternative right now.

The fact it's already _been_ aliased once says it's a good candidate for it. The
easy solution is to not support SIGINFO on sparc until the sparc guys figure out
what to do there and add sparc support in a follow-up patch.
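What a program using the proposed signal might look like. The #define fallback to SIGPWR mirrors the patch under discussion and is an assumption, since mainline Linux headers don't define SIGINFO; install_siginfo_handler() and maybe_report() are illustrative names. Per the quoted patch, on sparc SIGPWR is itself an alias (for SIGLOST).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* BSD has SIGINFO; Linux doesn't. Assumed fallback matching the proposal. */
#ifndef SIGINFO
#define SIGINFO SIGPWR
#endif

static volatile sig_atomic_t status_requested;

static void on_siginfo(int sig)
{
    (void)sig;
    status_requested = 1;   /* only set a flag; report from the main loop */
}

int install_siginfo_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_siginfo;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINFO, &sa, NULL);
}

/* Call periodically from a long-running loop, e.g. a copy loop. */
void maybe_report(unsigned long long bytes_done)
{
    if (status_requested) {
        status_requested = 0;
        fprintf(stderr, "%llu bytes copied\n", bytes_done);
    }
}
```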

Rob

Re: [PATCH 0/7] TTY Keyboard Status Request

2019-06-10 Thread Rob Landley

On 6/5/19 3:18 AM, Arseny Maslennikov wrote:
> This patch series introduces TTY keyboard status request, a feature of
> the n_tty line discipline that reserves a character in struct termios
> (^T by default) and reacts to it by printing a short informational line
> to the terminal and sending a Unix signal to the tty's foreground
> process group. The processes may, in response to the signal, output a
> textual description of what they're doing.

I had a long twitter thread about this with some BSD developers,
https://twitter.com/landley/status/1127148250430152704
asked on the toybox list for opinions,
http://lists.landley.net/pipermail/toybox-landley.net/2019-May/010461.html
and became aware of this patch set when the android bionic maintainer pointed me
at news coverage of it
http://lists.landley.net/pipermail/toybox-landley.net/2019-June/010536.html

So there would appear to at least be interest in the concept.

(The conclusion I came to looking at it last month is that it can't be done
without kernel support, but if such support _does_ arrive I want to add it to
toybox.)

Rob

Re: [PATCH v3 2/2] initramfs: introduce do_readxattrs()

2019-05-22 Thread Rob Landley

On 5/22/19 11:17 AM, h...@zytor.com wrote:
> On May 20, 2019 2:39:46 AM PDT, Roberto Sassu  
> wrote:
>> On 5/18/2019 12:17 AM, Arvind Sankar wrote:
>>> On Fri, May 17, 2019 at 02:47:31PM -0700, H. Peter Anvin wrote:
>>>> On 5/17/19 2:02 PM, Arvind Sankar wrote:
>>>>> On Fri, May 17, 2019 at 01:18:11PM -0700, h...@zytor.com wrote:
>>>>>>
>>>>>> Ok... I just realized this does not work for a modular initramfs,
>>>>>> composed at load time from multiple files, which is a very real
>>>>>> problem. Should be easy enough to deal with: instead of one large file,
>>>>>> use one companion file per source file, perhaps something like
>>>>>> filename..xattrs (suggesting double dots to make it less likely to
>>>>>> conflict with a "real" file.) No leading dot, as it makes it more
>>>>>> likely that archivers will sort them before the file proper.
>>>>> This version of the patch was changed from the previous one exactly
>>>>> to deal with this case -- it allows for the bootloader to load
>>>>> multiple initramfs archives, each with its own .xattr-list file, and
>>>>> to have that work properly.
>>>>> Could you elaborate on the issue that you see?
>>>>>
>>>>
>>>> Well, for one thing, how do you define "cpio archive", each with its
>>>> own .xattr-list file? Second, that would seem to depend on the
>>>> ordering, no, in which case you depend critically on .xattr-list file
>>>> following the files, which most archivers won't do.
>>>>
>>>> Either way it seems cleaner to have this per file; especially if/as it
>>>> can be done without actually mucking up the format.
>>>>
>>>> I need to run, but I'll post a more detailed explanation of what I did
>>>> in a little bit.
>>>>
>>>> -hpa

>>> Not sure what you mean by how do I define it? Each cpio archive will
>>> contain its own .xattr-list file with signatures for the files within
>>> it, that was the idea.
>>>
>>> You need to review the code more closely I think -- it does not depend
>>> on the .xattr-list file following the files to which it applies.
>>>
>>> The code first extracts .xattr-list as though it was a regular file. If
>>> a later dupe shows up (presumably from a second archive, although the
>>> patch will actually allow a second one in the same archive), it will
>>> then process the existing .xattr-list file and apply the attributes
>>> listed within it. It then will proceed to read the second one and
>>> overwrite the first one with it (this is the normal behaviour in the
>>> kernel cpio parser). At the end once all the archives have been
>>> extracted, if there is an .xattr-list file in the rootfs it will be
>>> parsed (it would've been the last one encountered, which hasn't been
>>> parsed yet, just extracted).
>>>
>>> Regarding the idea to use the high 16 bits of the mode field in
>>> the header that's another possibility. It would just require additional
>>> support in the program that actually creates the archive though, which
>>> the current patch doesn't.
>>
>> Yes, for adding signatures for a subset of files, no changes to the ram
>> disk generator are necessary. Everything is done by a custom module. To
>> support a generic use case, it would be necessary to modify the
>> generator to execute getfattr and the awk script after files have been
>> placed in the temporary directory.
>>
>> If I understood the new proposal correctly, it would be task for cpio to
>> read file metadata after the content and create a new record for each
>> file with mode 0x18000, type of metadata encoded in the file name and
>> metadata as file content. I don't know how easy it would be to modify
>> cpio. Probably the amount of changes would be reasonable.

I could make toybox cpio do it in a weekend, and could probably throw a patch at
usr/gen_init_cpio.c while I'm at it. I prototyped something like that a couple
years ago, it's not hard.
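If records were tagged the way the quoted proposal describes (mode 0x18000, i.e. the regular-file type plus bit 16), a reader could tell them apart roughly like this. The constants and cpio_classify() are hypothetical; no cpio standard defines such a record. The newc mode field is 8 hex digits, so the extra bit survives in the archive.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Traditional Unix mode bits as stored in cpio headers (POSIX <cpio.h>
 * spells the regular-file value C_ISREG). */
#define CPIO_S_IFMT   0170000u
#define CPIO_S_IFREG  0100000u

/* Hypothetical metadata marker: bit 16, above the 16 bits a mode_t
 * traditionally uses. 0x18000 == CPIO_META_BIT | CPIO_S_IFREG. */
#define CPIO_META_BIT 0x10000u

enum cpio_kind { CPIO_NORMAL, CPIO_METADATA, CPIO_TRAILER };

/* Classify a newc record by its 32-bit mode field and its name. */
enum cpio_kind cpio_classify(uint32_t mode, const char *name)
{
    if (strcmp(name, "TRAILER!!!") == 0)
        return CPIO_TRAILER;
    if ((mode & CPIO_META_BIT) && (mode & CPIO_S_IFMT) == CPIO_S_IFREG)
        return CPIO_METADATA;
    return CPIO_NORMAL;
}
```

An extractor that masks the mode to 16 bits would see 0x18000 as an ordinary regular file; one that validates the whole field might reject it, which is the open question about existing GNU and BSD cpio raised in this thread.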

The real question is scripts/gen_initramfs_list.sh and the text format it
produces. We can currently generate cpio files with different ownership and
permissions than the host system can represent (when not building as root, on a
filesystem that may not support xattrs or would get unhappy about conflicting
selinux annotations). We work around it by having the metadata represented
textually in the initramfs_list file gen_initramfs_list.sh produces and
gen_init_cpio.c consumes.

xattrs are a terrible idea the Macintosh invented so Finder could remember where
you moved a file's icon in its folder without having to modify the file, and
then things like OS/2 copied it and Windows picked it up from there and went "Of
course, this is a security mechanism!" and... sigh.

This is "data that is not data", it's metadata of unbounded size. It seems like
it should go in gen_initramfs_list.sh but as what, keyword=value pairs that
might have embedded newlines in them? A base64 encoding? Something else?
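One way to answer the embedded-newline question is to base64 each value so every xattr stays on one line of the list file. A minimal RFC 4648 encoder follows; a list-file line such as "xattr path name base64-value" is hypothetical and not an existing gen_init_cpio directive.

```c
#include <assert.h>
#include <stddef.h>

static const char b64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Standard base64 with '=' padding. out must hold 4 * ((len + 2) / 3) + 1
 * bytes. Returns the encoded length (excluding the trailing NUL). */
size_t base64_encode(const unsigned char *in, size_t len, char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < len; i += 3) {
        unsigned v = (unsigned)in[i] << 16;
        if (i + 1 < len) v |= (unsigned)in[i + 1] << 8;
        if (i + 2 < len) v |= in[i + 2];
        out[o++] = b64[(v >> 18) & 63];
        out[o++] = b64[(v >> 12) & 63];
        out[o++] = i + 1 < len ? b64[(v >> 6) & 63] : '=';
        out[o++] = i + 2 < len ? b64[v & 63] : '=';
    }
    out[o] = '\0';
    return o;
}
```

For example, the 3-byte value "a\nb" encodes to "YQpi", with no newline left to confuse a line-oriented parser.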

>> The kernel will behave in a similar way. It will call do_readxattrs() in
>> do_copy() for each file. Since the only difference between

Re: [PATCH v3 2/2] initramfs: introduce do_readxattrs()

2019-05-17 Thread Rob Landley

On 5/17/19 4:41 PM, H. Peter Anvin wrote:
> On 5/17/19 1:18 PM, h...@zytor.com wrote:
>>
>> Ok... I just realized this does not work for a modular initramfs, composed 
>> at load time from multiple files, which is a very real problem. Should be 
>> easy enough to deal with: instead of one large file, use one companion file 
>> per source file, perhaps something like filename..xattrs (suggesting double 
>> dots to make it less likely to conflict with a "real" file.) No leading dot, 
>> as it makes it more likely that archivers will sort them before the file 
>> proper.
>>
>> A side benefit is that the format can be simpler as there is no need to 
>> encode the filename.
>>
>> A technically cleaner solution still, but which would need archiver 
>> modifications, would be to encode the xattrs as an optionally nameless file 
>> (just an empty string) with a new file mode value, immediately following the 
>> original file. The advantage there is that the archiver itself could support 
>> xattrs and other extended metadata (which has been requested elsewhere); the 
>> disadvantage obviously is that that it requires new support in the archiver. 
>> However, at least it ought to be simpler since it is still a higher protocol 
>> level than the cpio archive itself.
>>
>> There's already one special case in cpio, which is the "!!!TRAILER!!!" 
>> filename; although I don't think it is part of the formal spec, to the 
>> extent there is one, I would expect that in practice it is always encoded 
>> with a mode of 0, which incidentally could be used to unbreak the case where 
>> such a filename actually exists. So one way to support such extended 
>> metadata would be to set mode to 0 and use the filename to encode the type 
>> of metadata. I wonder how existing GNU or BSD cpio (the BSD one is better 
>> maintained these days) would deal with reading such a file; it would at 
>> least not be a regression if it just read it still, possibly with warnings. 
>> It could also be possible to use bits 17:16 in the mode, which are 
>> traditionally always zero (mode_t being 16 bits), but I believe are present 
>> in most or all of the cpio formats for historical reasons. It might be 
>> accepted better by existing implementations to use one of these high bits 
>> combined with S_IFREG, I dont know.
>
> 
> Correction: it's just !!!TRAILER!!!.

We documented it as "TRAILER!!!" without leading !!!, and that its purpose is to
flush hardlinks:

  https://www.kernel.org/doc/Documentation/early-userspace/buffer-format.txt

That's what toybox cpio has been producing. Kernel consumes it just fine. Just
checked busybox cpio and that's what they're producing as well...
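For reference, the newc end-of-archive record is small enough to build by hand. A sketch following the layout in buffer-format.txt: a 110-byte ASCII header (6-byte magic plus 13 fields of 8 hex digits), then the NUL-terminated name padded to a multiple of 4 bytes. cpio_newc_trailer() is an illustrative name, and nlink = 1 with every other numeric field zero is this sketch's choice.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the "TRAILER!!!" record into out (at least 124 bytes).
 * Returns the record length including padding. */
size_t cpio_newc_trailer(char *out)
{
    static const char name[] = "TRAILER!!!";
    size_t namesize = sizeof(name);   /* 11: includes the trailing NUL */

    memset(out, 0, 124);
    snprintf(out, 111,
             "070701"
             "%08X%08X%08X%08X"   /* ino, mode, uid, gid */
             "%08X%08X%08X"       /* nlink, mtime, filesize */
             "%08X%08X%08X%08X"   /* devmajor, devminor, rdevmajor, rdevminor */
             "%08X%08X",          /* namesize, check */
             0u, 0u, 0u, 0u,
             1u, 0u, 0u,
             0u, 0u, 0u, 0u,
             (unsigned)namesize, 0u);
    memcpy(out + 110, name, namesize);
    return (110 + namesize + 3) & ~(size_t)3;   /* 121 rounded up to 124 */
}
```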

Rob

Re: [PATCH v3 2/2] initramfs: introduce do_readxattrs()

2019-05-17 Thread Rob Landley

On 5/17/19 3:18 PM, h...@zytor.com wrote:
> Ok... I just realized this does not work for a modular initramfs, composed at 
> load time from multiple files, which is a very real problem. Should be easy 
> enough to deal with: instead of one large file, use one companion file per 
> source file, perhaps something like filename..xattrs (suggesting double dots 
> to make it less likely to conflict with a "real" file.) No leading dot, as it 
> makes it more likely that archivers will sort them before the file proper.
> 
> A side benefit is that the format can be simpler as there is no need to 
> encode the filename.
> 
> A technically cleaner solution still, but which would need archiver 
> modifications, would be to encode the xattrs as an optionally nameless file 
> (just an empty string) with a new file mode value, immediately following the 
> original file. The advantage there is that the archiver itself could support 
> xattrs and other extended metadata (which has been requested elsewhere); the 
> disadvantage obviously is that that it requires new support in the archiver. 
> However, at least it ought to be simpler since it is still a higher protocol 
> level than the cpio archive itself.
> 
> There's already one special case in cpio, which is the "!!!TRAILER!!!" 
> filename; although I don't think it is part of the formal spec, to the extent 
> there is one, I would expect that in practice it is always encoded with a 
> mode of 0, which incidentally could be used to unbreak the case where such a 
> filename actually exists. So one way to support such extended metadata would 
> be to set mode to 0 and use the filename to encode the type of metadata. I 
> wonder how existing GNU or BSD cpio (the BSD one is better maintained these 
> days) would deal with reading such a file; it would at least not be a 
> regression if it just read it still, possibly with warnings. It could also be 
> possible to use bits 17:16 in the mode, which are traditionally always zero 
> (mode_t being 16 bits), but I believe are present in most or all of the cpio 
> formats for historical reasons. It might be accepted better by existing 
> implementations to use one of these high bits combined with S_IFREG, I dont 
> know.
> 

I'll happily modify toybox cpio to understand xattrs (create and extract),
the android guys do a lot with xattrs already. I tapped out of _this_ discussion
from disgust with the proposed encoding.

Rob

Re: [PATCH v2 0/3] initramfs: add support for xattrs in the initial ram disk

2019-05-14 Thread Rob Landley

On 5/14/19 2:18 PM, James Bottomley wrote:
>> I think Rob is right here.  If /init was statically built into the
>> kernel image, it has no more ability to compromise the kernel than
>> anything else in the kernel.  What's the problem here?
> 
> The specific problem is that unless you own the kernel signing key,
> which is really untrue for most distribution consumers because the
> distro owns the key, you cannot build the initrd statically into the
> kernel.  You can take the distro signed kernel, link it with the initrd
> then resign the combination with your key, provided you insert your key
> into the MoK variables as a trusted secure boot key, but the distros
> have been unhappy recommending this as standard practice.
> 
> If our model for security is going to be to link the kernel and the
> initrd statically to give signature protection over the aggregate then
> we need to figure out how to execute this via the distros.  If we
> accept that the split model, where the distro owns and signs the kernel
> but the machine owner builds and is responsible for the initrd, then we
> need to explore split security models like this proposal.

You can have a built-in and an external initrd? The second extracts over the
first? (I know because once upon a time conflicting files would append. It
sounds like the desired behavior here is O_EXCL fail and move on.)

Rob

Re: [PATCH v2 0/3] initramfs: add support for xattrs in the initial ram disk

2019-05-14 Thread Rob Landley

On 5/14/19 6:52 AM, Roberto Sassu wrote:
> On 5/14/2019 8:06 AM, Rob Landley wrote:
>> On 5/13/19 7:47 AM, Roberto Sassu wrote:
>>> On 5/13/2019 11:07 AM, Rob Landley wrote:
>>>>>> Wouldn't the below work even before enforcing signatures on external
>>>>>> initramfs:
>>>>>> 1. Create an embedded initramfs with an /init that does the xattr
>>>>>> parsing/setting. This will be verified as part of the kernel image
>>>>>> signature, so no new code required.
>>>>>> 2. Add a config option/boot parameter to panic the kernel if an external
>>>>>> initramfs attempts to overwrite anything in the embedded initramfs. This
>>>>>> prevents overwriting the embedded /init even if the external initramfs
>>>>>> is unverified.
>>>>>
>>>>> Unfortunately, it wouldn't work. IMA is already initialized and it would
>>>>> verify /init in the embedded initial ram disk.
>>>>
>>>> So you made broken infrastructure that's causing you problems. Sounds
>>>> unfortunate.
>>>
>>> The idea is to be able to verify anything that is accessed, as soon as
>>> rootfs is available, without distinction between embedded or external
>>> initial ram disk.
>>
>> If /init is in the internal one and you can't overwrite files with an external
>> one, all your init has to be is something that applies the xattrs, enables your
>> paranoia mode, and then execs something else.
> 
> Shouldn't file metadata be handled by the same code that extracts the
> content? Instead, file content is extracted by the kernel, and we are
> adding another step to the boot process, to execute a new binary with a
> link to libc.

I haven't made a dynamically linked initramfs in years (except a couple for
testing purposes). But then I don't deploy glibc, so...

> From the perspective of a remote verifier that checks the software
> running on the system, would it be easier to check less than 150 lines
> of code, or a CPIO image containing a binary + libc?

https://github.com/torvalds/linux/blob/master/tools/include/nolibc/nolibc.h

(I have a todo item to add sh4 and m68k and ppc and such sections to that, but
see "I've needed to resubmit
http://lkml.iu.edu/hypermail/linux/kernel/1709.1/03561.html for a couple years
now but it works for me locally and dealing with linux-kernel is just no fun
anymore"...)

>> You can totally use initramfs for lots of purposes simultaneously.
> 
> Yes, I agree. However, adding an initramfs to initialize another
> initramfs when you can simply extract file content and metadata with the
> same parser, this for me it is difficult to justify.

You just said it's simpler to modify the kernel than do a thing you can already
do in userspace. You realize that, right?

>>>>> The only reason why
>>>>> opening .xattr-list works is that IMA is not yet initialized
>>>>> (late_initcall vs rootfs_initcall).
>>>>
>>>>> Launching init before enabling ima is bad because... you didn't think of it?
>>>
>>> No, because /init can potentially compromise the integrity of the
>>> system.
>>
>> Which isn't a problem if it was statically linked in the kernel, or if your
>> external cpio.gz was signed. You want a signed binary but don't want the
>> signature _in_ the binary...
> 
> It is not just for binaries. How you would deal with arbitrary file
> formats?

I'm confused, are you saying that /init can/should be an arbitrary file format,
or that a cpio statically linked into the kernel can't contain files in
arbitrary formats?

>> Which is why there's a cpio in the kernel and an external cpio loaded via the
>> old initrd mechanism and BOTH files wind up in the cpio and there's a way to
>> make it O_EXCL so it can't overwrite, and then the /init binary inside the
>> kernel's cpio can do any other weird verification you need to do before anything
>> else gets a chance to run so why are you having ring 0 kernel code read a file
>> out of the filesystem and act upon it?
> 
> The CPIO parser already invokes many system calls.

The one in the kernel doesn't call system calls, no. Once userspace is running
it can do what it likes. The one statically linked into the kernel was set up by
the same people who built the kernel; if you're letting arbitrary kernels run on
your system it's kinda over already from a security context?

>> If it's in the file's contents you get uniform behavior regardless of the
>> filesystem used. And "mandatory access controls do that" is b

Re: [PATCH v2 0/3] initramfs: add support for xattrs in the initial ram disk

2019-05-14 Thread Rob Landley

On 5/13/19 5:09 PM, Mimi Zohar wrote:
>> Ok, but wouldn't my idea still work? Leave the default compiled-in
>> policy set to not appraise initramfs. The embedded /init sets all the
>> xattrs, changes the policy to appraise tmpfs, and then exec's the real
>> init? Then everything except the embedded /init and the file with the
>> xattrs will be appraised, and the embedded /init was verified as part of
>> the kernel image signature. The only additional kernel change needed
>> then is to add a config option to the kernel to disallow overwriting the
>> embedded initramfs (or at least the embedded /init).
> 
> Yes and no.  The current IMA design allows a builtin policy to be
> specified on the boot command line ("ima_policy="), so that it exists
> from boot, and allows it to be replaced once with a custom policy.
>  After that, assuming that CONFIG_IMA_WRITE_POLICY is configured,
> additional rules may be appended.  As your embedded /init solution
> already replaces the builtin policy, the IMA policy couldn't currently
> be replaced a second time with a custom policy based on LSM labels.

So the design assumption you're changing other code to work around, in that
instance, is that the policy can only be replaced once, rather than having a "finalize"
option when it's set, making it immutable from then on.
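The replace-then-finalize alternative is easy to model. struct policy and its functions are hypothetical and say nothing about IMA's real interface; the point is that with a finalize step both an embedded /init and the real init can install a policy, which a replace-once rule does not allow.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical policy object: replaceable until finalized, then immutable. */
struct policy {
    char rules[128];
    bool finalized;
};

/* Returns 0 on success, -1 if the policy has already been finalized. */
int policy_replace(struct policy *p, const char *rules)
{
    if (p->finalized)
        return -1;
    strncpy(p->rules, rules, sizeof(p->rules) - 1);
    p->rules[sizeof(p->rules) - 1] = '\0';
    return 0;
}

void policy_finalize(struct policy *p)
{
    p->finalized = true;
}
```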

Rob

Re: [PATCH v2 0/3] initramfs: add support for xattrs in the initial ram disk

2019-05-14 Thread Rob Landley

On 5/13/19 7:47 AM, Roberto Sassu wrote:
> On 5/13/2019 11:07 AM, Rob Landley wrote:
>>>> Wouldn't the below work even before enforcing signatures on external
>>>> initramfs:
>>>> 1. Create an embedded initramfs with an /init that does the xattr
>>>> parsing/setting. This will be verified as part of the kernel image
>>>> signature, so no new code required.
>>>> 2. Add a config option/boot parameter to panic the kernel if an external
>>>> initramfs attempts to overwrite anything in the embedded initramfs. This
>>>> prevents overwriting the embedded /init even if the external initramfs
>>>> is unverified.
>>>
>>> Unfortunately, it wouldn't work. IMA is already initialized and it would
>>> verify /init in the embedded initial ram disk.
>>
>> So you made broken infrastructure that's causing you problems. Sounds
>> unfortunate.
> 
> The idea is to be able to verify anything that is accessed, as soon as
> rootfs is available, without distinction between embedded or external
> initial ram disk.

If /init is in the internal one and you can't overwrite files with an external
one, all your init has to be is something that applies the xattrs, enables your
paranoia mode, and then execs something else.

Heck, I do that sort of set up in shell scripts all the time. Running the shell
script as PID 1 and then having it exec the "real init" binary at the end:

https://github.com/landley/mkroot/blob/83def3cbae21/mkroot.sh#L205

If your first init binary is in the initramfs statically linked into the kernel
image, and the cpio code is doing open(O_EXCL), then it's as verified as any
other kernel code and runs "securely" until it decides to run something else.
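Here is a minimal sketch of the no-overwrite extraction rule, assuming a plain POSIX environment. extract_excl() is a hypothetical helper, not the kernel's initramfs extractor. It relies only on open() with O_CREAT|O_EXCL failing with EEXIST when the path already exists, so a later archive cannot replace a file that an earlier one placed there.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create and fill a file only if it doesn't exist yet.
 * Returns 0 on success or a negative errno value (-EEXIST if present). */
static int extract_excl(const char *path, const void *data, size_t len)
{
    assert(path != NULL);
    /* O_EXCL: fail with EEXIST rather than truncate an existing file. */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0)
        return -errno;
    ssize_t n = write(fd, data, len);
    int err = (n < 0) ? -errno : 0;   /* save errno before close() */
    if (close(fd) < 0 && err == 0)
        err = -errno;
    if (err == 0 && (size_t)n != len)
        err = -EIO;                   /* treat a short write as an error */
    return err;
}
```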

> Also, requiring an embedded initramfs for xattrs would be an issue for
> systems that use it for other purposes.

I'm the guy who wrote the initmpfs code. (And has pending patches to improve it
that will probably never go upstream because I'm a hobbyist and dealing with the
linux-kernel clique is the opposite of fun. I'm only in this conversation
because I was cc'd.)

You can totally use initramfs for lots of purposes simultaneously.

>>> The only reason why
>>> opening .xattr-list works is that IMA is not yet initialized
>>> (late_initcall vs rootfs_initcall).
>>
>> Launching init before enabling ima is bad because... you didn't think of it?
> 
> No, because /init can potentially compromise the integrity of the
> system.

Which isn't a problem if it was statically linked in the kernel, or if your
external cpio.gz was signed. You want a signed binary but don't want the
signature _in_ the binary...

>>> Allowing a kernel with integrity enforcement to parse the CPIO image
>>> without verifying it first is the weak point.
>>
>> If you don't verify the CPIO image then in theory it could have anything in it,
>> yes. You seem to believe that signing individual files is more secure than
>> signing the archive. This is certainly a point of view.
> 
> As I wrote above, signing the CPIO image would be more secure, if this
> option is available. However, a disadvantage would be that you have to
> sign the CPIO image every time a file changes.

Which is why there's a cpio in the kernel and an external cpio loaded via the
old initrd mechanism and BOTH files wind up in the cpio and there's a way to
make it O_EXCL so it can't overwrite, and then the /init binary inside the
kernel's cpio can do any other weird verification you need to do before anything
else gets a chance to run so why are you having ring 0 kernel code read a file
out of the filesystem and act upon it?

(Heck, you can mv /newinit /init before the exec /init so the file isn't on the
system anymore by the time the other stuff gets to run...)

>>> However, extracted files
>>> are not used, and before they are used they are verified. At the time
>>> they are verified, they (included /init) must already have a signature
>>> or otherwise access would be denied.
>>
>> You build infrastructure that works a certain way, the rest of the system
>> doesn't fit your assumptions, so you need to change the rest of the system to
>> fit your assumptions.
> 
> Requiring file metadata to make decisions seems reasonable. Also
> mandatory access controls do that. The objective of this patch set is to
> have uniform behavior regardless of the filesystem used.

If it's in the file's contents you get uniform behavior regardless of the
filesystem used. And "mandatory access controls do that" is basically restating
what _I_ said in the paragraph above.

The "infrastructure you have that works a certain way" is called "mandatory
access control".

Re: [PATCH v2 0/3] initramfs: add support for xattrs in the initial ram disk

2019-05-13 Thread Rob Landley

On 5/13/19 2:49 AM, Roberto Sassu wrote:
> On 5/12/2019 9:43 PM, Arvind Sankar wrote:
>> On Sun, May 12, 2019 at 05:05:48PM +0000, Rob Landley wrote:
>>> On 5/12/19 7:52 AM, Mimi Zohar wrote:
>>>> On Sun, 2019-05-12 at 11:17 +0200, Dominik Brodowski wrote:
>>>>> On Thu, May 09, 2019 at 01:24:17PM +0200, Roberto Sassu wrote:
>>>>>> This proposal consists in marshaling pathnames and xattrs in a file called
>>>>>> .xattr-list. They are unmarshaled by the CPIO parser after all files have
>>>>>> been extracted.
>>>>>
>>>>> Couldn't this parsing of the .xattr-list file and the setting of the xattrs
>>>>> be done equivalently by the initramfs' /init? Why is kernel involvement
>>>>> actually required here?
>>>>
>>>> It's too late.  The /init itself should be signed and verified.
>>>
>>> If the initramfs cpio.gz image was signed and verified by the extractor, how is
>>> the init in it _not_ verified?
>>>
>>> Rob
>>
>> Wouldn't the below work even before enforcing signatures on external
>> initramfs:
>> 1. Create an embedded initramfs with an /init that does the xattr
>> parsing/setting. This will be verified as part of the kernel image
>> signature, so no new code required.
>> 2. Add a config option/boot parameter to panic the kernel if an external
>> initramfs attempts to overwrite anything in the embedded initramfs. This
>> prevents overwriting the embedded /init even if the external initramfs
>> is unverified.
> 
> Unfortunately, it wouldn't work. IMA is already initialized and it would
> verify /init in the embedded initial ram disk.

So you made broken infrastructure that's causing you problems. Sounds 
unfortunate.

> The only reason why
> opening .xattr-list works is that IMA is not yet initialized
> (late_initcall vs rootfs_initcall).

Launching init before enabling ima is bad because... you didn't think of it?

> Allowing a kernel with integrity enforcement to parse the CPIO image
> without verifying it first is the weak point.

If you don't verify the CPIO image then in theory it could have anything in it,
yes. You seem to believe that signing individual files is more secure than
signing the archive. This is certainly a point of view.

> However, extracted files
> are not used, and before they are used they are verified. At the time
> they are verified, they (included /init) must already have a signature
> or otherwise access would be denied.

You build infrastructure that works a certain way, the rest of the system
doesn't fit your assumptions, so you need to change the rest of the system to
fit your assumptions.

> This scheme relies on the ability of the kernel to not be corrupted in
> the event it parses a malformed CPIO image.

I'm unaware of any buffer overruns or wild pointer traversals in the cpio
extraction code. You can fill up all physical memory with initramfs and lock the
system hard, though.

It still only parses them at boot time before launching PID 1, right? So you
have a local physical exploit and you're trying to prevent people from working
around your Xbox copy protection without a mod chip?

> Mimi suggested to use
> digital signatures to prevent this issue, but it cannot be used in all
> scenarios, since conventional systems generate the initial ram disk
> locally.

So you use a proprietary init binary you can't rebuild from source, and put it
in a cpio where /dev/urandom is a file with known contents? Clearly, not
exploitable at all. (And we update the initramfs.cpio but not the kernel because
clearly keeping the kernel up to date is less important to security...)

Whatever happened to https://lwn.net/Articles/532778/ ? Modules are signed
in-band in the file, but you need xattrs for some reason?

> Roberto

Rob

Re: [PATCH v2 0/3] initramfs: add support for xattrs in the initial ram disk

2019-05-12 Thread Rob Landley

On 5/12/19 7:52 AM, Mimi Zohar wrote:
> On Sun, 2019-05-12 at 11:17 +0200, Dominik Brodowski wrote:
>> On Thu, May 09, 2019 at 01:24:17PM +0200, Roberto Sassu wrote:
>>> This proposal consists in marshaling pathnames and xattrs in a file called
>>> .xattr-list. They are unmarshaled by the CPIO parser after all files have
>>> been extracted.
>>
>> Couldn't this parsing of the .xattr-list file and the setting of the xattrs
>> be done equivalently by the initramfs' /init? Why is kernel involvement
>> actually required here?
> 
> It's too late.  The /init itself should be signed and verified.

If the initramfs cpio.gz image was signed and verified by the extractor, how is
the init in it _not_ verified?

Rob

Re: [PATCH v2 0/3] initramfs: add support for xattrs in the initial ram disk

2019-05-11 Thread Rob Landley

On 5/11/19 11:04 PM, Rob Landley wrote:
> P.P.S. Sadly, if you want an actually standardized standard format where
> implementations adhere to the standard: IETF RFC 1991 was published in 1996 and

Nope, darn it, checked my notes and that wasn't it. I thought zip had an RFC,
it's just zlib, deflate, and gzip, and that's not the number of any of them.

I still think sticking with a lightly modified cpio makes the most sense;
just... in-band signalling that _doesn't_ solve the y2038 problem or the file
size limit, or address sparse files, seems kinda silly.
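To make the size and time limits concrete, here is a sketch that parses the fixed 110-byte "newc" header: the magic 070701 followed by thirteen 8-digit hex fields. The enum and function names are illustrative. Only the field order and widths come from the format. Every field is 8 hex digits, so the file size and the mtime are both capped at 32 bits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The 13 fixed fields of a "newc" cpio header, in on-disk order. */
enum { F_INO, F_MODE, F_UID, F_GID, F_NLINK, F_MTIME, F_FILESIZE,
       F_DEVMAJOR, F_DEVMINOR, F_RDEVMAJOR, F_RDEVMINOR, F_NAMESIZE,
       F_CHECK, F_COUNT };

#define NEWC_HDR_LEN 110   /* 6 magic bytes + 13 * 8 hex digits */

/* Parse one 8-digit hex field; returns -1 on a non-hex character. */
static int64_t hex8(const char *s)
{
    uint32_t v = 0;
    for (int i = 0; i < 8; i++) {
        char c = s[i];
        int d;
        if (c >= '0' && c <= '9') d = c - '0';
        else if (c >= 'a' && c <= 'f') d = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') d = c - 'A' + 10;
        else return -1;
        v = (v << 4) | (uint32_t)d;    /* at most 32 bits: the format's limit */
    }
    return v;
}

/* Fill out[F_COUNT] from a newc header (the crc variant 070702 is not
 * accepted here). Returns 0, or -1 if the header is malformed. */
static int parse_newc(const char *hdr, uint32_t out[F_COUNT])
{
    assert(hdr != NULL && out != NULL);
    if (memcmp(hdr, "070701", 6) != 0)
        return -1;
    for (int i = 0; i < F_COUNT; i++) {
        int64_t v = hex8(hdr + 6 + 8 * i);
        if (v < 0)
            return -1;
        out[i] = (uint32_t)v;
    }
    return 0;
}
```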

Rob

Re: [PATCH v2 0/3] initramfs: add support for xattrs in the initial ram disk

2019-05-11 Thread Rob Landley

On 5/11/19 5:44 PM, Andy Lutomirski wrote:
> I read some of those emails.  ISTM that adding TAR support should be
> seriously considered.  Sure, it's baroque, but it's very, very well
> supported, and it does exactly what we need.

Which means you now have two parsers supported in parallel forevermore, and are
reversing the design decision initially made when this went in without new info.

Also, I just did a tar implementation for toybox: It took me a month to debug it
(_not_ starting from scratch but from a submission), I only just added sparse
file support (because something in the android build was generating a sparse
file), there are historical tarballs I know it won't extract (I'm just testing
against what the current one produces with the default flags), and I haven't
even started on xattr support yet.

Instead I was experimenting with corner cases like "S records replace the
prefix[] field starting at byte 386 with an offset/length pair array, but
prefix[] starts at 345, do those first 41 bytes still function as a prefix and
is there any circumstance under which existing tar binaries will populate them?
Also, why does every instance of an S record generated by gnu/tar end with a
gratuitous length zero segment?"
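You can check the offsets in that question with offsetof(). This sketch assumes the POSIX ustar header layout and the old GNU header layout described in GNU tar's Tar-Internals documentation. The struct names are illustrative. Every member is a char array, so on common ABIs the structs have no padding and the offsets match the on-disk bytes. prefix[] begins at 345 and the GNU sparse map at 386, which leaves 41 bytes in between.

```c
#include <assert.h>
#include <stddef.h>

/* POSIX ustar header: only char members, so offsetof() == on-disk offset. */
struct ustar_header {
    char name[100], mode[8], uid[8], gid[8], size[12], mtime[12];
    char chksum[8], typeflag, linkname[100], magic[6], version[2];
    char uname[32], gname[32], devmajor[8], devminor[8];
    char prefix[155];              /* bytes 345..499 */
    char pad[12];                  /* pads the block to 512 bytes */
};

/* Old GNU header: same first 345 bytes; the area ustar calls prefix[]
 * holds atime/ctime and then the sparse map instead. */
struct gnu_sparse { char offset[12], numbytes[12]; };
struct gnu_header {
    char name[100], mode[8], uid[8], gid[8], size[12], mtime[12];
    char chksum[8], typeflag, linkname[100], magic[6], version[2];
    char uname[32], gname[32], devmajor[8], devminor[8];
    char atime[12], ctime[12], offset[12], longnames[4], unused;
    struct gnu_sparse sparse[4];   /* starts at byte 386 */
    char isextended, realsize[12], pad[17];
};

static_assert(sizeof(struct ustar_header) == 512, "ustar block is 512 bytes");
static_assert(sizeof(struct gnu_header) == 512, "GNU block is 512 bytes");
```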

"cpio -H newc" is a _much_ simpler format. And posix no longer specifies
_either_ format usefully, hasn't for years. From toybox tar's header comment:

 * For the command, see
 *   http://pubs.opengroup.org/onlinepubs/007908799/xcu/tar.html
 * For the modern file format, see
 *   http://pubs.opengroup.org/onlinepubs/9699919799/utilities/pax.html#tag_20_92_13_06
 *   https://en.wikipedia.org/wiki/Tar_(computing)#File_format
 *   https://www.gnu.org/software/tar/manual/html_node/Tar-Internals.html

And no, that isn't _enough_ information, you still have to "tar | hd" a lot and
squint. (There's no current spec, it's pieced together from multiple sources
because posix abdicated responsibility for this to Jorg Schilling.)

Rob

P.S. Yes that gnu/dammit page starts with a "this will be deleted" comment which
according to archive.org has been there for over a dozen years.

P.P.S. Sadly, if you want an actually standardized standard format where
implementations adhere to the standard: IETF RFC 1991 was published in 1996 and
remains compatible with files and archivers in service. Or we could stick with
cpio and make minor changes to it, since we have to remain backwards compatible
with it _anyway_.

Re: [PATCH v2 0/3] initramfs: add support for xattrs in the initial ram disk

2019-05-10 Thread Rob Landley

On 5/10/19 6:49 AM, Mimi Zohar wrote:
> On Fri, 2019-05-10 at 08:56 +0200, Roberto Sassu wrote:
>> On 5/9/2019 8:34 PM, Rob Landley wrote:
>>> On 5/9/19 6:24 AM, Roberto Sassu wrote:
> 
>>>> The difference with another proposal
>>>> (https://lore.kernel.org/patchwork/cover/888071/) is that xattrs can be
>>>> included in an image without changing the image format, as opposed to
>>>> defining a new one. As seen from the discussion, if a new format has to be
>>>> defined, it should fix the issues of the existing format, which requires
>>>> more time.
>>>
>>> So you've explicitly chosen _not_ to address Y2038 while you're there.
>>
>> Can you be more specific?
> 
> Right, this patch set avoids incrementing the CPIO magic number and
> the resulting changes required (eg. increasing the timestamp field
> size), by including a file with the security xattrs in the CPIO.  In
> either case, including the security xattrs in the initramfs header or
> as a separate file, the initramfs, itself, needs to be signed.

The /init binary in the initramfs runs as root and launches all other processes
on the system. Presumably it can write any xattrs it wants to, and doesn't need
any extra permissions granted to it to do so. But as soon as you start putting
xattrs on _other_ files within the initramfs that are _not_ necessarily running
as PID 1, _that's_ when the need to sign the initramfs comes in?

Presumably the signing occurs on the gzipped file. How does that affect the cpio
parsing _after_ it's decompressed? Why would that be part of _this_ patch?

Rob

Re: [PATCH v2 0/3] initramfs: add support for xattrs in the initial ram disk

2019-05-09 Thread Rob Landley

On 5/9/19 6:24 AM, Roberto Sassu wrote:
> This patch set aims at solving the following use case: appraise files from
> the initial ram disk. To do that, IMA checks the signature/hash from the
> security.ima xattr. Unfortunately, this use case cannot be implemented
> currently, as the CPIO format does not support xattrs.
> 
> This proposal consists in marshaling pathnames and xattrs in a file called
> .xattr-list. They are unmarshaled by the CPIO parser after all files have
> been extracted.

So it's in-band signalling that has a higher peak memory requirement.

> The difference with another proposal
> (https://lore.kernel.org/patchwork/cover/888071/) is that xattrs can be
> included in an image without changing the image format, as opposed to
> defining a new one. As seen from the discussion, if a new format has to be
> defined, it should fix the issues of the existing format, which requires
> more time.

So you've explicitly chosen _not_ to address Y2038 while you're there.

Rob

Re: Commit 594cc251fdd0 (user_access_begin does access_ok) made arch/sh stop booting on qemu.

2019-01-31 Thread Rob Landley

On 1/31/19 2:30 AM, Linus Torvalds wrote:
> See
> 
>
>   https://lore.kernel.org/lkml/CAHk-=wihe4dnhkpe4oghwwmy23jntuufqagwtgcjjxyovyj...@mail.gmail.com/
> 
> for an explanation of the SH bug.
> 
> But Guenter Roeck confirmed that my patch fixed it on SH for him. Is there
> something else going on in your configuration?

That fixed it for me.

Thanks,

Rob

Commit 594cc251fdd0 (user_access_begin does access_ok) made arch/sh stop booting on qemu.

2019-01-30 Thread Rob Landley

That's what I bisected it to, anyway. Commit before that boots to a shell prompt
under qemu-system-sh4 (built from today's git), after produces no console output
(no boot messages, no nothin').

Rob
# make ARCH=sh allnoconfig KCONFIG_ALLCONFIG=sh4.miniconf
# make ARCH=sh -j $(nproc)
# boot arch/sh/boot/zImage


CONFIG_CPU_SUBTYPE_SH7751R=y
CONFIG_MMU=y
CONFIG_MEMORY_START=0x0c000000
CONFIG_VSYSCALL=y
CONFIG_SH_FPU=y
CONFIG_SH_RTS7751R2D=y
CONFIG_RTS7751R2D_PLUS=y
CONFIG_SERIAL_SH_SCI=y
CONFIG_SERIAL_SH_SCI_CONSOLE=y

CONFIG_PCI=y
CONFIG_NET_VENDOR_REALTEK=y
CONFIG_8139CP=y

CONFIG_PCI=y
CONFIG_BLK_DEV_SD=y
CONFIG_ATA=y
CONFIG_ATA_SFF=y
CONFIG_ATA_BMDMA=y
CONFIG_PATA_PLATFORM=y

#CONFIG_SPI=y
#CONFIG_SPI_SH_SCI=y
#CONFIG_MFD_SM501=y

#CONFIG_RTC_CLASS=y
#CONFIG_RTC_DRV_R9701=y
#CONFIG_RTC_DRV_SH=y
#CONFIG_RTC_HCTOSYS=y


# CONFIG_EMBEDDED is not set
CONFIG_EARLY_PRINTK=y
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_SCRIPT=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y

CONFIG_BLK_DEV=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_RD_GZIP=y

CONFIG_BLK_DEV_LOOP=y
CONFIG_EXT4_FS=y
CONFIG_EXT4_USE_FOR_EXT2=y
CONFIG_VFAT_FS=y
CONFIG_FAT_DEFAULT_UTF8=y
CONFIG_MISC_FILESYSTEMS=y
CONFIG_SQUASHFS=y
CONFIG_SQUASHFS_XATTR=y
CONFIG_SQUASHFS_ZLIB=y
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y

CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_UNIX=y
CONFIG_INET=y
CONFIG_IPV6=y
CONFIG_NETDEVICES=y
#CONFIG_NET_CORE=y
#CONFIG_NETCONSOLE=y
CONFIG_ETHERNET=y


Re: [PATCH 2/2] sh: generate uapi header and syscall table header files

2019-01-11 Thread Rob Landley

On 1/10/19 5:54 PM, Guenter Roeck wrote:
> On Wed, Jan 02, 2019 at 09:07:25PM +0530, Firoz Khan wrote:
>> Unified system call table generation script must be run to
>> generate unistd_32.h and syscall_table.h files. This patch
>> will have changes which will invokes the script.
>>
>> This patch will generate unistd_32.h and syscall_table.h
>> files by the syscall table generation script invoked by
>> sh/Makefile and the generated files against the removed
>> files must be identical.
>>
>> The generated uapi header file will be included in uapi/-
>> asm/unistd.h and generated system call table header file
>> will be included by kernel/syscall_32.S file.
>>
>> Signed-off-by: Firoz Khan 
> 
> Have you tested this patch ?

I tested it at one point, but not recently. (It was before 4.20 came out...)

Rob

Re: dma_declare_coherent_memory on main memory

2018-12-08 Thread Rob Landley

On 12/7/18 9:34 AM, Christoph Hellwig wrote:
> Hi all,
> 
> the ARM imx27/31 ports and various sh boards use
> dma_declare_coherent_memory on main memory taken from the memblock
> allocator.
> 
> Is there any good reason these couldn't be switched to CMA areas?
> Getting rid of these magic dma_declare_coherent_memory area would
> help making the dma allocator a lot simpler.

Not that I know of?

Rob

Re: [RFC][PATCH] fs: set xattrs in initramfs from regular files

2018-11-26 Thread Rob Landley

On 11/26/18 6:56 AM, Roberto Sassu wrote:
> On 11/23/2018 9:21 PM, Rob Landley wrote:
>>> The aim of this patch is to provide the same functionality without
>>> introducing a new format. The value of xattrs is placed in regular files
>>> having the same file name as the files xattrs are added to, plus a
>>> separator and the xattr name (.xattr-).
>>
>> I think you're solving the wrong problem, but that's just my opinion.
> 
> Instead of iterating over rootfs, would it be better to detect files
> with extended attributes (from the file name) when the cpio image is
> parsed by the kernel,

Huh, I thought at first glance that's what the new approach _was_ doing.

In-band signaling in the archive is ugly, still requires new tools to create it,
doesn't address the y2038 issue... (Although we could cheat, treat the timestamp
field as unsigned, and buy another few decades.)

But doing that in the filesystem _after_ you extract the archive is beyond ugly.
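The "treat it as unsigned" cheat buys roughly 68 more years. Here is a sketch, assuming a POSIX system with a 64-bit time_t, that converts the largest value of a 32-bit seconds field to a UTC year, reading the field both ways. last_year() is an illustrative helper.

```c
#define _POSIX_C_SOURCE 200809L   /* for gmtime_r() */
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* UTC year reached by a 32-bit seconds-since-1970 field at its maximum,
 * read as signed (INT32_MAX) or unsigned (UINT32_MAX). */
static int last_year(int unsigned_field)
{
    assert(sizeof(time_t) >= 8);  /* UINT32_MAX must fit in time_t */
    time_t t = unsigned_field ? (time_t)UINT32_MAX : (time_t)INT32_MAX;
    struct tm tm;
    if (!gmtime_r(&t, &tm))
        return -1;
    return tm.tm_year + 1900;
}
```

last_year(0) gives 2038 (January 19) and last_year(1) gives 2106 (February 7).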

> and call sys_lsetxattr() in do_copy()? This part
> can be turned on by introducing a new type in the existing format (if
> possible).
> 
> The impact of this alternative is very low, and LSMs/IMA would be able,
> with minimum effort, to enforce policies on files in the initial ram
> disk.

The cpio extension isn't a big deal, I was pondering doing it myself in toybox
(and submitting a kernel patch to consume the output) before Mimi approached me.
(I did the initmpfs stuff already, I've stomped around in this area before.) I
just didn't because Mimi sent their patch first, so I waited for that to work
its way through.

Unfortunately, it's simple enough that there was a bit of bikeshedding. (You can
store time in milliseconds as a 64 bit number without worrying about the range,
but if you do it as nanoseconds you need two fields, but people spoke up and
said "but if you don't store the nanoseconds the rounding causes spurious time
differences between filesystems and it confuses our build system about what
has and hasn't changed"...)
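The range arithmetic behind that trade-off fits in a small C sketch. range_years() is an illustrative helper that uses 365.2425-day years. A signed 64-bit counter covers about 292 million years on each side of the epoch in milliseconds, but only about 292 years in nanoseconds. That is why nanosecond timestamps usually end up as a seconds field plus a nanoseconds field, as in struct timespec.

```c
#include <assert.h>
#include <stdint.h>

/* Approximate span, in years on each side of the epoch, of a signed
 * 64-bit counter ticking ticks_per_sec times per second. */
static double range_years(double ticks_per_sec)
{
    assert(ticks_per_sec > 0);
    const double secs_per_year = 365.2425 * 24 * 60 * 60;
    return (double)INT64_MAX / ticks_per_sec / secs_per_year;
}
```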

The new in-band signaling proposal is, at best, a hack. (Filename lengths are
capped at 255 in the VFS, can you strip the xattrs on a long filename by having
the extension fail to create in the filesystem? Or do you have an arbitrary max
length on filenames because you need to reserve room for the extension?)
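The naming concern can be made concrete. This sketch assumes the sidecar file is named "<name>.xattr-<xattr name>" and that a single path component is capped at 255 bytes. sidecar_fits() is a hypothetical helper. For security.ima, any name longer than 236 bytes leaves no room for its sidecar.

```c
#include <assert.h>
#include <string.h>

#define NAME_MAX_VFS 255   /* assumed per-component limit, like NAME_MAX */

/* Hypothetical check for "<name>.xattr-<xattr>": returns 1 if the sidecar
 * name still fits in one path component, 0 if creating it would fail. */
static int sidecar_fits(const char *name, const char *xattr)
{
    assert(name != NULL && xattr != NULL);
    size_t len = strlen(name) + strlen(".xattr-") + strlen(xattr);
    return len <= NAME_MAX_VFS;
}
```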

Rob

Re: [RFC][PATCH] fs: set xattrs in initramfs from regular files

2018-11-23 Thread Rob Landley

On 11/22/18 9:49 AM, Roberto Sassu wrote:
> Although rootfs (tmpfs) supports xattrs, they are not set due to the
> limitation of the cpio format. A new format called 'newcx' was proposed to
> overcome this limitation.

I got email about that format the day before you posted this, by the way.

> However, it looks like that adding a new format is not simple: 15 kernel
> patches; user space tools must support the new format; mistakes made in the
> past should be avoided; it is unclear whether the kernel should switch from
> cpio to tar.

The kernel _can't_ switch from cpio to tar without breaking backwards
compatibility, it could only add tar as a second format it supported (remember
cpio images can be sideloaded so a new rootfs can be used with an existing
initramfs, plus existing build systems generate them and would still need to if
they wanted to keep supporting older kernels), and then once you've got two
formats somebody will propose zip support, and let's just not go there please.

The changes to the userspace tools are trivial (I say that as the maintainer of
toybox, which has a cpio). The argument was about things like 64 bit timestamps
(y2038 problem), nanosecond support, sparse files, etc. And I think the argument
had largely died down?

Keep in mind the squashfs guy spent 5 years trying to get his filesystem merged
(https://lwn.net/Articles/563578/), I spent several years trying to get my perl
removal patch merged (and only work up the enthusiasm to resubmit
http://lists.busybox.net/pipermail/buildroot/2015-March/123385.html
https://patchwork.kernel.org/patch/9193529/ https://lkml.org/lkml/2017/9/13/651
about once a year because dealing with linux-kernel is just no fun for hobbyists
anymore).

> The aim of this patch is to provide the same functionality without
> introducing a new format. The value of xattrs is placed in regular files
> having the same file name as the files xattrs are added to, plus a
> separator and the xattr name (.xattr-).

I think you're solving the wrong problem, but that's just my opinion.

Rob

Re: [PATCH v3 0/3] sh: system call table generation support

2018-11-19 Thread Rob Landley

On 11/19/18 2:08 AM, Geert Uytterhoeven wrote:
> On Mon, Nov 19, 2018 at 6:26 AM Rob Landley  wrote:
>> WARNING: CPU: 0 PID: 1 at mm/slub.c:2448 ___slab_alloc.constprop.34+0x196/0x288
> 
> https://patchwork.kernel.org/patch/10549883/

Given that Sato-san is co-maintainer of arch/sh and that was posted in July, why
is it not upstream yet?

Rob

Re: [PATCH v3 0/3] sh: system call table generation support

2018-11-18 Thread Rob Landley

On 11/13/18 10:32 PM, Firoz Khan wrote:
> The purpose of this patch series is, we can easily
> add/modify/delete system call table support by cha-
> nging entry in syscall.tbl file instead of manually
> changing many files. The other goal is to unify the 
> system call table generation support implementation 
> across all the architectures. 

I applied the patch in https://github.com/landley/mkroot and the result booted
under qemu-system-sh4, seems to work fine. Network's fine, it can read a block
device, etc.

Acked-and-or-tested-by: Rob Landley 

I assume that this is just git du jour and not your patch:

WARNING: CPU: 0 PID: 1 at mm/slub.c:2448 ___slab_alloc.constprop.34+0x196/0x288

CPU: 0 PID: 1 Comm: swapper Not tainted 4.20.0-rc3 #1
PC is at ___slab_alloc.constprop.34+0x196/0x288
PR is at __slab_alloc.constprop.33+0x2a/0x4c
PC  : 8c09d09a SP  : 8f829ea0 SR  : 400080f0
TEA : c0001240
R0  : 8c09cf04 R1  : 8c01cbec R2  :  R3  : 
R4  : 8f8020a0 R5  : 006080c0 R6  : 8c01d74a R7  : 8fff5180
R8  : 8c011a40 R9  : 8fff5180 R10 : 8f8020a0 R11 : 8000
R12 : 8c01d74a R13 : 006080c0 R14 : 8f80211c
MACH: 008e MACL: 0ae4849d GBR :  PR  : 8c09d1b6

Call trace:
 [<(ptrval)>] arch_local_irq_restore+0x0/0x24
 [<(ptrval)>] __slab_alloc.constprop.33+0x2a/0x4c
 [<(ptrval)>] arch_local_save_flags+0x0/0x8
 [<(ptrval)>] arch_local_irq_restore+0x0/0x24
 [<(ptrval)>] mm_init.isra.6+0xca/0x120
 [<(ptrval)>] kmem_cache_alloc+0x9a/0xf4
 [<(ptrval)>] mm_init.isra.6+0xca/0x120
 [<(ptrval)>] arch_local_irq_restore+0x0/0x24
 [<(ptrval)>] kmem_cache_alloc+0x9a/0xf4
 [<(ptrval)>] mm_alloc+0xe/0x48
 [<(ptrval)>] mm_init.isra.6+0xca/0x120
 [<(ptrval)>] memset+0x0/0x8c
 [<(ptrval)>] __do_execve_file+0x1de/0x574
 [<(ptrval)>] getname_kernel+0x1e/0xc8
 [<(ptrval)>] kmem_cache_alloc+0x0/0xf4
 [<(ptrval)>] do_execve+0x16/0x24
 [<(ptrval)>] arch_local_save_flags+0x0/0x8
 [<(ptrval)>] arch_local_irq_restore+0x0/0x24
 [<(ptrval)>] printk+0x0/0x24
 [<(ptrval)>] kernel_init+0x34/0xec
 [<(ptrval)>] ret_from_kernel_thread+0xc/0x14
 [<(ptrval)>] schedule_tail+0x0/0x58
 [<(ptrval)>] kernel_init+0x0/0xec

---[ end trace 6e84d1e05051e55d ]---

The CONFIG_UNWINDER_ORC bug is back.

2018-11-04 Thread Rob Landley

Do "make defconfig" on x86-64, fire up menuconfig and select the frame pointer
unwinder (under kernel hacking -> choose unwinder) and:

$ make
Makefile:966: *** "Cannot generate ORC metadata for CONFIG_UNWINDER_ORC=y,
please install libelf-dev, libelf-devel or elfutils-libelf-devel".  Stop.

Confirming:

$ grep UNWINDER .config
# CONFIG_UNWINDER_ORC is not set
CONFIG_UNWINDER_FRAME_POINTER=y

Rob

devinet.c inet_abc_len() breaking ifconfig?

2018-10-15 Thread Rob Landley

Dave Taht pointed out to me that this doesn't work in toybox:

  $ ifconfig eth0 242.2.0.1 netmask 255.255.255.0 broadcast 242.2.0.255
  ifconfig: ioctl 8916: Invalid argument

Because of this in the kernel:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ipv4/devinet.c#n940

What is the inet_abc_len() function _for_? When's the last time "Class A, Class
B, and Class C" IPv4 addresses actually _existed_ as a thing? (Somewhere around
1995?)
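For reference, ioctl 8916 (hex) is SIOCSIFADDR, and 242.2.0.1 falls in the old 240/4 "Class E" range. Below is a userspace approximation of the classful lookup. It assumes the kernel of that era rejects SIOCSIFADDR with EINVAL whenever this lookup returns a negative value. The function name abc_len() is a hypothetical stand-in, not the kernel's code.

```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h>

/*
 * Approximate classful prefix length for an IPv4 address given in
 * network byte order: 0.x.x.x -> 0, Class A -> 8, Class B -> 16,
 * Class C -> 24, anything else (multicast 224/4, "Class E" 240/4) -> -1.
 */
static int abc_len(uint32_t addr_be)
{
	uint32_t h = ntohl(addr_be);

	if ((h & 0xff000000u) == 0)
		return 0;			/* "zeronet" */
	if ((h & 0x80000000u) == 0)
		return 8;			/* 0xxx: Class A */
	if ((h & 0xc0000000u) == 0x80000000u)
		return 16;			/* 10xx: Class B */
	if ((h & 0xe0000000u) == 0xc0000000u)
		return 24;			/* 110x: Class C */
	return -1;				/* 1110 multicast, 1111 Class E */
}
```

Under that assumption, abc_len(inet_addr("242.2.0.1")) yields -1, which would explain the "Invalid argument" even though the netmask and broadcast supplied on the command line are perfectly sensible.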

He suggested I switch to the netlink API, which doesn't enforce this, but... why
is the kernel enforcing it for one api but not the other?

Rob

Re: [RESEND PATCH] init/Kconfig: Use short unix-style option instead of --longname

2018-08-08 Thread Rob Landley

On 08/07/2018 11:06 PM, Masahiro Yamada wrote:
> From: Rob Landley 
> 
> Avoids warning messages with the latest release of toybox, which never
> bothered to implement the --longopts nothing was using.
> 
> Signed-off-by: Rob Landley 
> Signed-off-by: Masahiro Yamada 
> ---
> 
> This was sent to the trivial ML some time ago,
> but not applied yet.
> 
> I will apply this to kbuild tree for v4.19

Thank you.

(I have a pile of small kernel fixes but haven't had time+energy to follow up on
them recently.)

Rob

Re: [PATCH] Fix platform data in leds-pca955x.c

2018-07-04 Thread Rob Landley

On 07/04/2018 12:04 PM, Andy Shevchenko wrote:
> On Wed, Jul 4, 2018 at 8:00 PM, Andy Shevchenko
>  wrote:
>> On Wed, Jul 4, 2018 at 3:46 AM, Rob Landley  wrote:
> 
>> For now, you can switch to unified device properties API (basically
>> un-ifdef pca955x_pdata_of_init() and replacing of_* by device_* or
>> fwnode_* compatible calls) and providing a static table of built-in
>> device properties in the platform code in question.
>> (see include/linux/property.h, for example users of
>> PROPERTY_ENTRY_U*() macros, like arch/arm/mach-pxa/raumfeld.c)
> 
> Taking into consideration that device is enumerated by i2c core, which
> is being aware of device properties (1), better example might be
> drivers/platform/x86/intel_cht_int33fe.c

This file doesn't include the word "LED".

  $ grep -i led drivers/platform/x86/intel_cht_int33fe.c
  $

Examining it... this is an ACPI driver, Intel's Not-Invented-Here proprietary
device tree.

So I should convert an sh7760 board to ACPI? How would this fix the problem
where the driver's probe function expects a structure as input that is locally
defined, instead of the generic structure from linux/leds.h it used to accept?

If we feed the probe function NULL platform data _and_ don't have device tree
enabled, doesn't it error out?
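The NULL-platform-data question comes down to the fallback order in probe. Here is a simplified userspace model of that order. It assumes the driver calls dev_get_platdata() first, falls back to the OF helper only when that returns NULL, and that the helper's !CONFIG_OF stub returns -ENODEV. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the platform data the probe consumes. */
struct led_pdata {
	int num_leds;
};

/* Models the OF helper: without device tree support it can only fail. */
static int pdata_of_init(int have_of, int bits, struct led_pdata *out)
{
	if (!have_of)
		return -ENODEV;		/* the !CONFIG_OF stub */
	out->num_leds = bits;		/* pretend DT described every output */
	return 0;
}

/* Models probe: use board platform data if present, else try the OF path. */
static int probe(const struct led_pdata *platdata, int have_of, int bits)
{
	struct led_pdata of_data;
	const struct led_pdata *pdata = platdata;

	if (!pdata) {
		int err = pdata_of_init(have_of, bits, &of_data);

		if (err)
			return err;
		pdata = &of_data;
	}
	if (pdata->num_leds != bits)
		return -ENODEV;		/* the num_leds != chip->bits check */
	return 0;
}
```

Under those assumptions, a board with neither platform data nor device tree gets -ENODEV and registers no LEDs.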

Rob

Re: [PATCH] Fix platform data in leds-pca955x.c

2018-07-04 Thread Rob Landley

On 07/04/2018 12:00 PM, Andy Shevchenko wrote:
> On Wed, Jul 4, 2018 at 3:46 AM, Rob Landley  wrote:
>> I have some questions about recent changes to leds-pca955x.c since 4.13:
>>
>> How is non-of platform data supposed to work now? Commit ed1f4b9676a8 switched
>> struct led_platform_data *pdata in the _probe() function to a locally defined
>> structure that platform data can't provide because it's not in any header it
>> can #include.
>>
>> This is disguised by dev_get_platdata() returning a void *, so changing the type
>> of pdata the returned pointer is assigned to didn't require a new typecast;
>> instead, existing board definitions still compile but quietly break at runtime.
>> (Specifically, the SH7760 board I use at work broke in the pdata->num_leds !=
>> chip->bits sanity check, and then userspace sees an empty /sys/class/leds and I
>> started digging because "huh"?)
>>
>> Why did the type change, anyway? The generic led_platform_data it was
>> using before has all the fields the device tree's actually initializing, at
>> least if you use flags for the new gpio stuff.
>>
>> Commit 561099a1a2e9 added CONFIG_LEDS_PCA955X_GPIO, but the initialization
>> code adds gpio logic outside this config symbol: probe only calls
>> devm_led_classdev_register() within a case statement that depends on setting the
>> right "this is not GPIO" value.
>>
>> The "GPIO" indicator could have been a flag in the existing LED structure's
>> flags field, but instead of a bit it's #defining three symbols. The
>> PCA955X_TYPE_* macros with the new type constants only exist in the device tree
>> header. Strangely, the old default "this is an LED" value isn't zero; zero is
>> PCA955X_TYPE_NONE which is unused (never set anywhere in the tree), and would
>> cause the LED to be skipped: you have to set a field platform data can't
>> access, using a macro platform data probably doesn't have, in order for
>> devm_led_classdev_register() to get called on that LED at all. Why?
>>
>> This is especially odd since if you did want to skip an LED, there was already a
>> way to indicate that: by giving it an empty string as a name. (It doesn't seem
>> to have come up, but it's the obvious way to do it.) Except commit 390c97dc6e34
>> deals with that by writing the index number as a string to the platform data
>> struct. Leaving aside "why did you do that?", isn't the platform data supposed to
>> be in a read only section when it's actual platform data? And since the probe
>> function then immediately copies the data into another structure, can't we
>> just fill out the other one directly without overwriting our arguments?
>>
>> As for the lifetime rules, the local pca955x_led (writeable copy initialized from
>> the read-only platform data) had the name[] array locally living in the
>> struct, but the device tree commits added char *default_trigger pointing to
>> external memory. Is there a reason this is now inconsistent?
>>
>> Here's the patch I whipped up at work today (applied to v4.14) that undid enough
>> of this to make the driver work again with platform data on the board we ship:
> 
> No platform data, please.

Why?

Platform data is what this was using before the device tree migration broke it,
without regression testing any existing boards using the driver. I just put it
back to working with the existing structures defined in the existing board file,
which is as straightforward "undoing the regression" as I can.

I'm happy to migrate the whole thing to device tree, but that's bigger than
fixing an LED driver regression, and too big a change for this product release.

> So, we have two options here:
> - use Unified Device Properties API
> - introduce something like LED_LOOKUP_TABLE (see analogue with GPIO /
> regulator / PWM)
> 
> Looking at the platform data LED framework provides I don't see for
> now a necessity of creating lookup tables (though it might be better
> choice in long term).

I... don't see that necessity either?

The data structure the driver needed in 4.13 contained all the information
needed to run the device. The new data structure this driver created locally in
the C file had no obvious reason to exist, and didn't have visibility outside the
driver file despite being the new input format the driver expected. How was that
thought through?

The new OF probe is allocating a temporary structure to copy data into from the
fdt, then feeding the intermediate structure to a probe function that


[PATCH] Fix platform data in leds-pca955x.c

2018-07-03 Thread Rob Landley

I have some questions about recent changes to leds-pca955x.c since 4.13:

How is non-of platform data supposed to work now? Commit ed1f4b9676a8 switched
struct led_platform_data *pdata in the _probe() function to a locally defined
structure that platform data can't provide because it's not in any header it
can #include.

This is disguised by dev_get_platdata() returning a void *, so changing the type
of pdata the returned pointer is assigned to didn't require a new typecast;
instead, existing board definitions still compile but quietly break at runtime.
(Specifically, the SH7760 board I use at work broke in the pdata->num_leds !=
chip->bits sanity check, and then userspace sees an empty /sys/class/leds and I
started digging because "huh"?)

Why did the type change, anyway? The generic led_platform_data it was
using before has all the fields the device tree's actually initializing, at
least if you use flags for the new gpio stuff.

Commit 561099a1a2e9 added CONFIG_LEDS_PCA955X_GPIO, but the initialization
code adds gpio logic outside this config symbol: probe only calls
devm_led_classdev_register() within a case statement that depends on setting the
right "this is not GPIO" value.

The "GPIO" indicator could have been a flag in the existing LED structure's
flags field, but instead of a bit it's #defining three symbols. The
PCA955X_TYPE_* macros with the new type constants only exist in the device tree
header. Strangely, the old default "this is an LED" value isn't zero; zero is
PCA955X_TYPE_NONE which is unused (never set anywhere in the tree), and would
cause the LED to be skipped: you have to set a field platform data can't
access, using a macro platform data probably doesn't have, in order for
devm_led_classdev_register() to get called on that LED at all. Why?
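The practical difference between a type enum whose zero means "skip" and a flag bit whose zero means "plain LED" shows up with zero-initialized board data. Below is a minimal sketch. It assumes the device tree header's PCA955X_TYPE_NONE/LED/GPIO values are 0/1/2. LED_FLAG_IS_GPIO and struct led_desc are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values from the device tree header: zero means "skip". */
enum pca955x_type {
	PCA955X_TYPE_NONE = 0,
	PCA955X_TYPE_LED = 1,
	PCA955X_TYPE_GPIO = 2,
};

/* Hypothetical flag bit: leaving it clear (the default) means "plain LED". */
#define LED_FLAG_IS_GPIO (1 << 0)

struct led_desc {
	const char *name;
	int type;	/* consulted by the enum scheme */
	int flags;	/* consulted by the flag-bit scheme */
};

/* Enum scheme: only an explicit PCA955X_TYPE_LED gets registered. */
static bool registers_as_led_enum(const struct led_desc *d)
{
	return d->type == PCA955X_TYPE_LED;
}

/* Flag scheme: anything not explicitly marked as GPIO is an LED. */
static bool registers_as_led_flag(const struct led_desc *d)
{
	return !(d->flags & LED_FLAG_IS_GPIO);
}
```

An existing board entry that only sets .name is silently skipped under the enum scheme but still registered under the flag scheme.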

This is especially odd since if you did want to skip an LED, there was already a
way to indicate that: by giving it an empty string as a name. (It doesn't seem
to have come up, but it's the obvious way to do it.) Except commit 390c97dc6e34
deals with that by writing the index number as a string to the platform data
struct. Leaving aside "why did you do that?", isn't the platform data supposed to
be in a read only section when it's actual platform data? And since the probe
function then immediately copies the data into another structure, can't we
just fill out the other one directly without overwriting our arguments?

As for the lifetime rules, the local pca955x_led (writeable copy initialized from
the read-only platform data) had the name[] array locally living in the
struct, but the device tree commits added char *default_trigger pointing to
external memory. Is there a reason this is now inconsistent?

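Here is a minimal sketch of the alternative described above. An empty name means "skip this output", and everything else is copied into driver-owned state while the (possibly read-only) platform data stays untouched. The led_info/led_platform_data layouts mirror include/linux/leds.h. struct drv_led, copy_leds() and the skip rule are hypothetical illustrations, not the driver's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Userspace mirror of the generic structures in linux/leds.h. */
struct led_info {
	const char *name;
	const char *default_trigger;
	int flags;
};

struct led_platform_data {
	int num_leds;
	struct led_info *leds;
};

/* Hypothetical, simplified writable per-LED driver state. */
struct drv_led {
	int led_num;
	char name[32];			/* copied: owned by the driver */
	const char *default_trigger;	/* pointer: borrows caller memory */
};

/*
 * Copy platform data into driver state, skipping NULL or empty names.
 * Never writes through pdata. Returns the number of LEDs copied, or
 * -ENODEV if the LED count does not match the chip's outputs.
 */
static int copy_leds(const struct led_platform_data *pdata, int bits,
		     struct drv_led *out)
{
	int i, n = 0;

	if (pdata->num_leds != bits)
		return -ENODEV;

	for (i = 0; i < pdata->num_leds; i++) {
		const struct led_info *li = &pdata->leds[i];

		if (!li->name || !li->name[0])
			continue;	/* empty name: leave this output unused */
		out[n].led_num = i;
		snprintf(out[n].name, sizeof(out[n].name), "%s", li->name);
		out[n].default_trigger = li->default_trigger;
		n++;
	}
	return n;
}
```

The comments on struct drv_led mark the lifetime mismatch raised above: the name is copied, while the trigger still points into the caller's memory.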
Here's the patch I whipped up at work today (applied to v4.14) that undid enough
of this to make the driver work again with platform data on the board we ship:


From: Rob Landley 

The LED driver changes that went into 4.14 to add device tree support broke
non-device-tree support.

Signed-off-by: Rob Landley 

 leds-pca955x.c |   46 +++---
 1 file changed, 19 insertions(+), 27 deletions(-)

diff --git a/drivers/leds/leds-pca955x.c b/drivers/leds/leds-pca955x.c
index 9057291..c1df4f1 100644
--- a/drivers/leds/leds-pca955x.c
+++ b/drivers/leds/leds-pca955x.c
@@ -134,11 +134,6 @@ struct pca955x_led {
 	const char		*default_trigger;
 };
 
-struct pca955x_platform_data {
-	struct pca955x_led	*leds;
-	int			num_leds;
-};
-
 /* 8 bits per input register */
 static inline int pca95xx_num_input_regs(int bits)
 {
@@ -367,24 +362,25 @@ static int pca955x_gpio_direction_output(struct gpio_chip *gc,
 #endif /* CONFIG_LEDS_PCA955X_GPIO */
 
 #if IS_ENABLED(CONFIG_OF)
-static struct pca955x_platform_data *
+static struct led_platform_data *
 pca955x_pdata_of_init(struct i2c_client *client, struct pca955x_chipdef *chip)
 {
 	struct device_node *np = client->dev.of_node;
 	struct device_node *child;
-	struct pca955x_platform_data *pdata;
+	struct led_platform_data *pdata;
 	int count;
 
 	count = of_get_child_count(np);
 	if (!count || count > chip->bits)
 		return ERR_PTR(-ENODEV);
 
+	/* Never freed, can be called multiple times with insmod/rmmod */
 	pdata = devm_kzalloc(&client->dev, sizeof(*pdata), GFP_KERNEL);
 	if (!pdata)
 		return ERR_PTR(-ENOMEM);
 
 	pdata->leds = devm_kzalloc(&client->dev,
-				   sizeof(struct pca955x_led) * chip->bits,
+				   sizeof(struct led_info) * chip->bits,
 				   GFP_KERNEL);
 	if (!pdata->leds)
 		return ERR_PTR(-ENOMEM);
@@ -401,11 +397,10 @@ pca955x_pdata_of_init(struct i2c_client *client, struct pca955x_chipdef *chip)
 	if (of

Re: [PATCH] kbuild: add machine size to CHEKCFLAGS

2018-05-30 Thread Rob Landley

On 05/30/2018 05:00 PM, Andreas Färber wrote:
> What about the architectures not touched by your patch that previously
> had no -m32/-m64? (arc, c6x, h8300, hexagon, m68k, microblaze, nds32,
> nios2, openrisc, powerpc, riscv, s390, sh, unicore32, xtensa)
> 
> You forgot to CC them on this patch.

A) He cc'd arch/sh on the previous patch earlier today, to which I replied:

https://marc.info/?l=linux-sh=152769132515226=2

B) Every change to common infrastructure should cc: every arch? Really? So like
filesystem changes and stuff too?

> Have you really checked that all their toolchains support the -m32/-m64
> flags you newly introduce for them? Apart from non-biarch architectures,
> I'm thinking of 31-bit s390 as a corner case where !64 != 32.

1) Last I heard Linux implements LP64:
   http://www.unix.org/whitepapers/64bit.html

2) it's unlikely to be worse than it was before the patch,

3) last I checked https://github.com/landley/mkroot boots to an s390 shell
prompt under qemu, although I haven't tried building with this patch. (And you
may still need to add HOST_EXTRA='lex yacc bison flex' to the command line
unless they've re-added the _shipped versions like the old kconfig had...) Point
is, shouldn't be too hard to test it. Presumably that's why we have an -rc1 and
then 6 more -rc versions each release...

Rob

Re: [PATCH] sh: pass machine size to sparse

2018-05-30 Thread Rob Landley

On 05/28/2018 11:40 AM, Luc Van Oostenryck wrote:
> By default, sparse assumes a 64bit machine when compiled on x86-64
> and 32bit when compiled on anything else.
> 
> This can of course create all sort of problems, like issuing false
> warnings like: 'shift too big (32) for type unsigned long', or
> worse, to not emit legitimate warnings.
> 
> Fix this by passing to sparse the appropriate -m32/-m64 flag

$ ${CROSS_COMPILE}gcc -E -dM - < /dev/null | grep __SIZEOF_LONG__
#define __SIZEOF_LONG__ 8

You can ask the compiler, you don't need to redundantly add this to every
architecture's Makefile.
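GCC and Clang predefine __SIZEOF_LONG__, so any C code built with the target compiler already has the answer. The sketch below maps that macro to sparse's machine-size flag. SPARSE_MACHINE_FLAG and sparse_machine_flag() are hypothetical names. In kbuild the equivalent would be derived from the `$(CC) -E -dM` output shown above rather than compiled into a program.

```c
#include <assert.h>
#include <string.h>

/* Derive sparse's -m32/-m64 from the compiler's own idea of sizeof(long). */
#if !defined(__SIZEOF_LONG__)
#error "compiler does not predefine __SIZEOF_LONG__"
#elif __SIZEOF_LONG__ == 8
#define SPARSE_MACHINE_FLAG "-m64"
#elif __SIZEOF_LONG__ == 4
#define SPARSE_MACHINE_FLAG "-m32"
#else
#error "unexpected sizeof(long)"
#endif

static const char *sparse_machine_flag(void)
{
	return SPARSE_MACHINE_FLAG;
}
```

Because the value comes from the cross compiler itself, no per-architecture Makefile needs to restate it.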

Rob

Re: [J-core] [PATCH v5 00/22] sh: LANDISK and R2Dplus convert to device tree

2018-05-07 Thread Rob Landley

On 05/07/2018 10:55 AM, Rich Felker wrote:
> On Mon, May 07, 2018 at 10:28:37AM -0500, Rob Landley wrote:
>>
>>
>> On 05/07/2018 09:45 AM, Rich Felker wrote:
>> (You can usually configure/build uboot in a couple different ways, with a
>> brain-dead built in shell or with busybox hush glued into it. Depends on how big
>> you want the image to be. Not sure how much of that is upstream and how much is
>> vendor forks I've used, though. Been a while.)
> 
> This sounds like a pain,

Well yeah. It's u-boot.

> but none of it seems relevant to the setup
> we're using. This U-Boot variant does not install on flash or use

That's the full dance for getting it installed on a board it's not already
running on. Usually somebody else sets it up and you inherit one with a "tftp
download" script and another "boot from persistent storage" script and you
mostly just the command line to swap the autoboot variable to point to the right
one of the two.

> flash; it runs from disk in place of LILO or another MBR-based
> bootloader. I'm just trying to understand where/how the binary blobs
> are installed on the disk so I can reproduce that when making new disk
> images with my kernel and filesystem.

The point of the tftp boot is quick reboot cycles during development, not having
to install the kernel you're booting on target each time. But as long as you're
not replacing u-boot and have a u-boot console you can fall back on an alternate
kernel from disk. (It's not really designed to give you a menu though, it gives
you a command line. You can have the kernel name to load in its own variable and
"set kernelname 'walrus.img'; run hdboot" though.)

There are probably more elegant ways to use this tool. I learned how to hammer
it in and get the lid off, and went on to other things...

> Rich

Rob

Re: [J-core] [PATCH v5 00/22] sh: LANDISK and R2Dplus convert to device tree

2018-05-07 Thread Rob Landley

On 05/07/2018 09:45 AM, Rich Felker wrote:
> On Mon, May 07, 2018 at 01:00:17PM +0200, John Paul Adrian Glaubitz wrote:
>> On 05/07/2018 03:40 AM, Yoshinori Sato wrote:
>>>> @Yoshinori:
>>>>
>>>> Did the HDL-160U LANDISK device you have use u-boot by default or
>>>> did you convert it from lilo?
>>>
>>> Yes.
>>> Replace sh-lilo's second stage with u-boot.
>>> With this method it is unnecessary to rewrite Flash for boot.
>>
>> Great, thank you. I will give it a try on my USL-5P and write down
>> the individual steps once I have figured it out.
> 
> Please let me know once you figure it out. I haven't made much
> progress yet and it would be really helpful to have some simple
> directions/outline of what to do, especially one that's not in terms
> of black box tools ("run this command") but how it all works (where
> the different bootloader components live when installed -- MBRs?
> partition boot records? files in a filesystem (who interprets it?)?
> etc.)

U-boot 101. The workflow you want is usually:

1) get u-boot to load and run on the board, with serial console and a basic
knowledge of where the DRAM is. (This often involves fighting with dram refresh
init, often by convincing u-boot NOT to do it because your stage 1 bootloader
already did, which involves a rolled up newspaper and a lot of swearing because
it ASSUMES. Oh it assumes. Or sometimes there's an sram->dram relocation which
means somewhere, there's a magic linker script you will learn to hate. Well,
Rich might be comfortable with that area, I still stub my toes there a lot.)

2) Getting u-boot reading/writing a flash area it can store its environment
variables in, so they can persist. (It's a driver.)

3) get u-boot talking to the network card, with either dhcp or static IP.
(Another driver, and some magic environment variables the driver consumes.)

4) tftp fetch an ELF kernel (or uimage if you must) into DRAM starting at a
known address. (This is a u-boot command line command. You'll need a tftp server
set up on another machine for it to fetch from.)

5) tftp fetch any other data (initrd.cpio.gz, board.dtb). (Same command,
different parameters.)

6) boot the kernel with all that gorp (a big long command line command) which
will need a kernel command line (generally stored in another persistent
environment variable).

7) make a "go" script that does all that in one command. There's a command to
run an environment variable's contents as a set of semicolon-separated command
line commands (that's how u-boot implements scripts), and there's a magic
environment variable whose contents get run on startup (bootup? startup? I
forget, it's in the source and docs and a buncha examples out there). It's
cleaner to have the magic one do "run $othervar" rather than putting a lot of
plumbing in the magic one. And you will totally want a "wait 3 seconds for a key
to be pressed and do a shell prompt if it is" header on that or you have to
reflash the bootloader to get your shell prompt back, which is sad.

8) Once you've got tftp working, there's a copy command to copy flash memory to
dram, and a corresponding "write to flash from dram" command with dram start
address and flash start address and length arguments. This is how the boot
without tftp is implemented in u-boot, and how updating the saved image it
auto-boots to if you don't press a key is implemented.
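A sketch of steps 3 through 7 as U-Boot console commands. In mainline U-Boot the variable run automatically at startup is `bootcmd`, `bootdelay` sets the press-a-key countdown before it runs, and `tftpboot` leaves the size of the file it just fetched in `${filesize}`. Everything else here is a hypothetical placeholder: IP addresses, load addresses, console name, file names, and the `netboot` script name. The `addr:size` initrd form of `bootm` assumes a raw (non-uImage) initramfs and raw-initrd support in your U-Boot build. Whether a device tree is passed depends on the architecture and build.

```shell
# U-Boot console commands, not a host shell script. All values hypothetical.

# Step 3: network settings the network driver consumes (or just run "dhcp").
setenv ipaddr 192.168.1.50
setenv serverip 192.168.1.1

# Steps 4-6: fetch kernel and device tree, then the initramfs last so
# ${filesize} still holds its size when bootm runs.
setenv kernel_addr 0x8c800000
setenv fdt_addr 0x8d000000
setenv initrd_addr 0x8d100000
setenv bootargs 'console=ttySC1,115200'
setenv netboot 'tftpboot ${kernel_addr} uImage; tftpboot ${fdt_addr} board.dtb; tftpboot ${initrd_addr} initrd.cpio.gz; bootm ${kernel_addr} ${initrd_addr}:${filesize} ${fdt_addr}'

# Step 7: keep the autorun variable a one-line "run" of the real script,
# with a 3 second window to press a key and get the prompt instead.
setenv bootcmd 'run netboot'
setenv bootdelay 3
saveenv
```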

(You can usually configure/build uboot in a couple different ways, with a
brain-dead built in shell or with busybox hush glued into it. Depends on how big
you want the image to be. Not sure how much of that is upstream and how much is
vendor forks I've used, though. Been a while.)

> Rich

Rob

Re: [J-core] [PATCH v5 00/22] sh: LANDISK and R2Dplus convert to device tree

2018-05-07 Thread Rob Landley

On 05/07/2018 09:43 AM, Rich Felker wrote:
> On Mon, May 07, 2018 at 08:40:35AM -0500, Rob Landley wrote:
>> On 05/07/2018 06:00 AM, John Paul Adrian Glaubitz wrote:
>>> I have been able to boot my own kernel on my USL-5P device, but
>>> I could never get it to detect the IDE controller. Do I need
>>> an additional patch for that?
>>
>> On a related note, is there a list of boards anywhere? I'm working on a 7760
>> system at $DAYJOB, Rich has a landisk which according to
>> https://www.openbsd.org/landisk.html is an SH7751R, and Sato-san says that
>> QEMU's -r2d emulates that too? ("RTS7751R2Dplus is QEMU-SH4 target. So easy
>> trying.")
>>
>> What other boards do we need to convert to device tree? arch/sh/boards has 15 C
>> files and 19 subdirectories, but I dunno the status of any of them...
> 
> I think asking "what we need to convert" is at least slightly
> mis-framed. Once the basics for device tree support are in place
> (basically patches 06-09), which boards are supported by device tree
> is mostly a matter of (1) whether the hardware drivers you want to use
> have bindings and use modern kernel interfaces, and (2) someone
> writing the dts files.

(3) being able to test the result on real hardware.

We can _add_ device tree support without that, but can we remove the old board
files without it?

> I don't mind holding off a little bit on removal of the legacy board
> file support if it's hard to get enough hardware working right away
> with device tree, but I do want to move towards getting rid of it as
> soon as we can, since it's a large volume of code cutting into my
> ability to have a good maintainer-level understanding of the arch/sh
> tree and has a lot of crufty, unmaintained parallel infrastructure
> duplicating stuff that can be done in cleaner and more modern ways
> (see the threads on early platform device stuff, rtc drivers, etc.).

The process may include a deprecation of hardware nobody has anymore, with a call
for testers, for a year or so before deleting stuff. (And then the old stuff's
in git if somebody finds a board and wants to fish it out.)

Also, I'd really like QEMU support to act as a first class board. At least 256
megs of ram (so you can do native compiles on it), serial support that works
(enabling the FIFO broke it because they don't implement the '15 bits of silence
triggers a flush timer' part, so data gets stranded in the buffer until enough
comes in to fill it the rest of the way which is a pain to type at when it's a
serial console), multiple hard drives, and so on.

I'd be fine with virtio, but there are no virtio devices on that target I've
noticed yet, although maybe I just haven't figured out how to enable it...

> Rich

Rob

Re: [J-core] [PATCH v5 00/22] sh: LANDISK and R2Dplus convert to device tree

2018-05-07 Thread Rob Landley

On 05/07/2018 06:00 AM, John Paul Adrian Glaubitz wrote:
> I have been able to boot my own kernel on my USL-5P device, but
> I could never get it to detect the IDE controller. Do I need
> an additional patch for that?

On a related note, is there a list of boards anywhere? I'm working on a 7760
system at $DAYJOB, Rich has a landisk which according to
https://www.openbsd.org/landisk.html is an SH7751R, and Sato-san says that
QEMU's -r2d emulates that too? ("RTS7751R2Dplus is QEMU-SH4 target. So easy
trying.")

What other boards do we need to convert to device tree? arch/sh/boards has 15 C
files and 19 subdirectories, but I dunno the status of any of them...

Rob

Re: [PATCH] Replace unnecessary perl with sed, printf, and the shell $(( )) operator.

2018-04-16 Thread Rob Landley

On 04/16/2018 07:09 AM, Russell King - ARM Linux wrote:
> On Wed, Apr 11, 2018 at 08:38:37PM -0500, Rob Landley wrote:
>> You can build a kernel in a cross compiling environment that doesn't have perl
>> in the $PATH. Commit 429f7a062e3b broke that for 32 bit arm. Fix it.
...
> This looks more complicated than necessary, and therefore less readable.
> What's wrong with:
> 
> KBSS_SZ := $(shell echo $$(($$($(CROSS_COMPILE)nm $(obj)/../../../../vmlinux | \
>   sed -n -e 's/^\([^ ]*\) B __bss_start$$/-0x\1/p' \
>  -e 's/^\([^ ]*\) B __bss_stop$$/+0x\1/p') ))
> 
> The sed command produces output such as:
> 
> -0xc0955e58
> +0xc10b0f9c
> 
> which the shell is then able to evaluate and produce a decimal number.
> This seems to work fine with both bash and dash.
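A standalone sketch of the sed-plus-`$(( ))` trick quoted above, as a plain POSIX shell function. Instead of a real vmlinux, it is fed canned `nm` lines using the two example addresses. In the Makefile every `$` is doubled to escape it from make. Here the shell sees single `$`. The function name `kbss_size` is hypothetical.

```shell
#!/bin/sh
# kbss_size (hypothetical name): read nm output on stdin, print the BSS
# size in decimal. sed rewrites the two symbol lines as "-0x<start>" and
# "+0x<stop>" and drops every other line; $(( )) then evaluates the
# resulting two-line expression "-start +stop".
kbss_size() {
    sum=$(sed -n -e 's/^\([^ ]*\) B __bss_start$/-0x\1/p' \
                 -e 's/^\([^ ]*\) B __bss_stop$/+0x\1/p')
    echo $(( $sum ))
}

# Canned input standing in for "${CROSS_COMPILE}nm vmlinux".
printf '%s\n' 'c0955e58 B __bss_start' 'c10b0f9c B __bss_stop' | kbss_size
# prints 7713092
```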

Yes, that is better.

Acked-by: Rob Landley <r...@landley.net>

Thanks,

Rob

[PATCH] Replace unnecessary perl with sed, printf, and the shell $(( )) operator.

2018-04-11 Thread Rob Landley

You can build a kernel in a cross compiling environment that doesn't have perl
in the $PATH. Commit 429f7a062e3b broke that for 32 bit arm. Fix it.

Signed-off-by: Rob Landley <r...@landley.net>
---

 arch/arm/boot/compressed/Makefile | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
index 45a6b9b..33ebeb2 100644
--- a/arch/arm/boot/compressed/Makefile
+++ b/arch/arm/boot/compressed/Makefile
@@ -117,11 +117,10 @@ ccflags-y := -fpic -mno-single-pic-base -fno-builtin -I$(obj)
 asflags-y := -DZIMAGE
 
 # Supply kernel BSS size to the decompressor via a linker symbol.
-KBSS_SZ = $(shell $(CROSS_COMPILE)nm $(obj)/../../../../vmlinux | \
-   perl -e 'while (<>) { \
-   $$bss_start=hex($$1) if /^([[:xdigit:]]+) B __bss_start$$/; \
-   $$bss_end=hex($$1) if /^([[:xdigit:]]+) B __bss_stop$$/; \
-   }; printf "%d\n", $$bss_end - $$bss_start;')
+KBSS_SZ := $(shell echo $$(($$(printf '%d+%d' $$( \
+   $(CROSS_COMPILE)nm $(obj)/../../../../vmlinux | \
+   sed -n -e 's/^\([^ ]*\) B __bss_start$$/-0x\1/p' \
+  -e 's/^\([^ ]*\) B __bss_stop$$/0x\1/p') ) )) )
 LDFLAGS_vmlinux = --defsym _kernel_bss_size=$(KBSS_SZ)
 # Supply ZRELADDR to the decompressor via a linker symbol.
 ifneq ($(CONFIG_AUTO_ZRELADDR),y)
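A sketch of the replacement KBSS_SZ logic outside make. sed negates only the start address. Word splitting hands both hex values to `printf '%d+%d'`, which converts them to decimal before `$(( ))` sees a single line such as `-3231014488+3238727580`. The input is canned `nm` output standing in for a real vmlinux, and the function name is hypothetical.

```shell
#!/bin/sh
# kbss_size_printf (hypothetical name): same job as the perl loop the
# patch removes, using only sed, printf and shell arithmetic.
kbss_size_printf() {
    syms=$(sed -n -e 's/^\([^ ]*\) B __bss_start$/-0x\1/p' \
                  -e 's/^\([^ ]*\) B __bss_stop$/0x\1/p')
    # Unquoted $syms: word splitting passes the two values as two args.
    # printf accepts 0x-prefixed (and negated) integers for %d.
    expr=$(printf '%d+%d' $syms)
    echo $(( $expr ))
}

printf '%s\n' 'c0955e58 B __bss_start' 'c10b0f9c B __bss_stop' | kbss_size_printf
# prints 7713092
```

Because the minus sign travels with the start address, the result does not depend on which symbol `nm` lists first: `3238727580+-3231014488` evaluates to the same value.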

Re: [RFC PATCH v2 0/2] Randomization of address chosen by mmap.

2018-03-27 Thread Rob Landley

On 03/23/2018 02:06 PM, Matthew Wilcox wrote:
> On Fri, Mar 23, 2018 at 02:00:24PM -0400, Rich Felker wrote:
>> On Fri, Mar 23, 2018 at 05:48:06AM -0700, Matthew Wilcox wrote:
>>> On Thu, Mar 22, 2018 at 07:36:36PM +0300, Ilya Smith wrote:
>>>> Current implementation doesn't randomize address returned by mmap.
>>>> All the entropy ends with choosing mmap_base_addr at the process
>>>> creation. After that mmap build very predictable layout of address
>>>> space. It allows to bypass ASLR in many cases. This patch make
>>>> randomization of address on any mmap call.
>>>
>>> Why should this be done in the kernel rather than libc?  libc is perfectly
>>> capable of specifying random numbers in the first argument of mmap.
>>
>> Generally libc does not have a view of the current vm maps, and thus
>> in passing "random numbers", they would have to be uniform across the
>> whole vm space and thus non-uniform once the kernel rounds up to avoid
>> existing mappings.
> 
> I'm aware that you're the musl author, but glibc somehow manages to
> provide etext, edata and end, demonstrating that it does know where at
> least some of the memory map lies.

You can parse /proc/self/maps, but it's really expensive and disgusting.
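To make the cost concrete, here is a sketch of what that parsing involves, written as a shell function over text in /proc/self/maps format. It reads every mapping, converts each hex range, and compares neighbours to find free gaps. A libc would have to do the equivalent in C before each randomized mmap(). In a multithreaded process the answer can already be stale by the time the mmap() runs. The function name is hypothetical. It skips 16-digit addresses with the top bit set (such as x86-64's [vsyscall] page) because they overflow the shell's signed arithmetic.

```shell
#!/bin/sh
# largest_gap (hypothetical name): read /proc/<pid>/maps-style lines on
# stdin and print the largest unmapped gap, in bytes, between consecutive
# mappings. Each line starts with "start-end" in lowercase hex.
largest_gap() {
    prev_end= best=0
    while read -r range _rest; do
        start=${range%-*} end=${range#*-}
        # Skip 16-digit addresses with the top bit set (e.g. [vsyscall]);
        # they don't fit in signed 64-bit shell arithmetic.
        case $start in
            [89a-f]???????????????) continue ;;
        esac
        if [ -n "$prev_end" ]; then
            gap=$(( 0x$start - 0x$prev_end ))
            if [ "$gap" -gt "$best" ]; then best=$gap; fi
        fi
        prev_end=$end
    done
    echo "$best"
}

# Usage: inspect the shell's own address space, where /proc exists.
if [ -r /proc/self/maps ]; then largest_gap < /proc/self/maps; fi
```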

Rob

Re: [PATCH v3 15/15] selinux: delay sid population for rootfs till init is complete

2018-03-07 Thread Rob Landley

On 02/20/2018 12:56 PM, Stephen Smalley wrote:
> On Fri, 2018-02-16 at 20:33 +, Taras Kondratiuk wrote:
>> From: Victor Kamensky 
>>
>> With initramfs cpio format that supports extended attributes
>> we need to skip sid population on sys_lsetxattr call from
>> initramfs for rootfs if security server is not initialized yet.
>>
>> Otherwise callback in selinux_inode_post_setxattr will try to
>> translate give security.selinux label into sid context and since
>> security server is not available yet inode will receive default
>> sid (typically kernel_t). Note that in the same time proper
>> label will be stored in inode xattrs. Later, since inode sid
>> would be already populated system will never look back at
>> actual xattrs. But if we skip sid population for rootfs and
>> we have policy that direct use of xattrs for rootfs, proper
>> sid will be filled in from extended attributes one node is
>> accessed and server is initialized.
>>
>> Note new DELAYAFTERINIT_MNT super block flag is introduced
>> to only mark rootfs for such behavior. For other types of
>> tmpfs original logic is still used.
> 
> (cc selinux maintainers)
> 
> Wondering if we shouldn't just do this always, for all filesystem
> types.  Also, I think this should likely also be done in
> selinux_inode_setsecurity() for consistency.

I don't understand what selinux thinks it's doing here.

Initramfs is special because it's populated early, ideally early enough drivers
can load their firmware out of it. This is guaranteed to be before any processes
have launched, before any other filesystems have been mounted. I'm surprised
selinux is trying to do anything this early because A) what is there for it to
do, B) where did it get a ruleset?

This isn't really a mount flag, this is a "the selinux subsystem isn't
functionally initialized yet" flag. We haven't launched init. In a modular
system the module probably isn't loaded. There are no processes, and the only
files anywhere are the ones we're in the process of extracting. What's there
for selinux to do?

When a filesystem is mounted, none of these cached selinux "we already looked at
the xattrs" inode fields are populated yet, correct? It can figure that out when
something accesses the file and do it then, so the point is _not_ doing this now
and thus not caching the wrong info. That's what the mount flag is doing,
telling selinux "not yet". So why does selinux not already _know_ "not yet"?

Why doesn't load_policy flush the cache of the old default contexts? What
happens if you mount an ext2 root and then init reads a dozen files before it
gets to the load_policy? Do those files have bad default contexts
forever now?

Where does the selinux ruleset come from during the cpio extract? Was it
hardwired into the driver? It certainly didn't come out of a file, and it wasn't
a process that loaded it. Why is selinux trying to evaluate and cache the
security context of files before it has any rules? (It has xattr annotations,
but they have no _meaning_ without rules...?)

Confused,

Rob

Re: [PATCH v3 01/15] Documentation: add newcx initramfs format description

2018-02-17 Thread Rob Landley

On 02/16/2018 06:00 PM, h...@zytor.com wrote:
> Introducing new, incompatible data formats is an inherently *very*
> costly operation; unfortunately many engineers don't seem to have a good grip
> of just *how* expensive it is (see "silly embedded nonsense hacks", "too
> little, too soon".)

So your argument is we should use the _existing_ cpio format that supports xattrs?

You keep bringing up the embedded world as a thing you don't understand and is
thus bad. I remember when you dismissed "I would like to constrain my
cross-compiling dependencies to a minimal set" as a... what did you call it, a
silly academic exercise? (Googles...)

https://lkml.org/lkml/2008/2/15/548

> Cpio itself is a great horror show of just how bad this gets:

That's not what you said last time?

http://lkml.iu.edu/hypermail/linux/kernel/0112.2/1540.html

> Introducing a new incompatible data format without strong justification

Here's you suggesting a new format when initramfs first went in, because you
disliked _both_ tar and cpio:

http://lkml.iu.edu/hypermail/linux/kernel/0112.2/1587.html

Seriously, there is a "why cpio rather than tar" section of
https://www.kernel.org/doc/Documentation/filesystems/ramfs-rootfs-initramfs.txt
with links to the messages. (www.uwsg became lkml in the links, I should submit
a patch fixing that, it redirected 6 months ago...)

We've _had_ this argument already. You are not bringing up _new_ arguments.

This patch set is because people want xattrs in initramfs. I still don't
personally understand why they want this, but they do. We need to still support
the existing file format for the foreseeable future, and we might as well fix
y2038 while we're there (treating it as unsigned buys us a lot of decades, but
as long as we're bumping the version number anyway...).
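The "lots of decades" claim can be checked with shell arithmetic. This assumes the newc header's mtime field, which is 8 hex ASCII digits (32 bits), and an average Gregorian year of 365.2425 days. It approximates the rollover year only and is not exact date math. The function name is hypothetical.

```shell
#!/bin/sh
# Average Gregorian year in seconds: 365.2425 * 86400.
SECS_PER_YEAR=31556952

# last_year (hypothetical name): approximate calendar year reached
# $1 seconds after the 1970 epoch.
last_year() {
    echo $(( 1970 + $1 / $SECS_PER_YEAR ))
}

echo "signed 32-bit mtime runs out in:   $(last_year $(( 0x7fffffff )))"
echo "unsigned 32-bit mtime runs out in: $(last_year $(( 0xffffffff )))"
# prints 2038 and 2106 respectively
```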

Otherwise it tries to be the minimal set of changes to get us there. (My first
stab at this was dealing with sparse files, but runs of zeroes gzip pretty well
and tmpfs could always make itself sparse after the fact...)

> Doing it under the non-justification of expedience ("oh, we can share most
> of the code") is aggravated engineering malpractice.

Coming from the guy who added perl as a build dependency to every project he
maintained simultaneously (the linux kernel, your bootloader, klibc), that seems
a lot more like an opinion than an objective metric.

> It is entirely possible that the modern posix tar/pax format is too complex

In the link above you declared it too complex in 2001. Partly because the gnu
tar and pax formats aren't really the same format.

> to be practical in this case – that would be justifying a new format.  But
> then you are taking the fundamental cost of breakage, and then the new format
> definitely should not be replicating known defects of another format and
> without at least some thought about how to avoid it in the future.

Didn't Linus flame more than one developer for ripping things out and replacing
them with a new untested thing rather than leaving a trail of breadcrumbs from a
known working thing to another known working thing? Or has the right way to do
it changed since the 2.5 development cycle?

Strangely, the poor souls suffering under the burden of cpio to use initramfs
today haven't been screaming out their agony in a detectable way. (They're mad
the kernel doesn't give better feedback about why init failed to launch and it
either panicked or fell through to the fallback root=. My patch to make
devtmpfs_mount work for initramfs was trying to fix the "you pointed the kernel
at a root filesystem directory which it cpio'd up, but there was no /dev/console
in it, so your init has no stdin/stdout/stderr and dies immediately" problem.
The recent thread about "please don't add a third knob to make initramfs be
tmpfs instead of ramfs" was another corner case of that.) And I have half an
INITRAMFS_VERBOSE patch around here somewhere to printk() a lot more status
(and I need to update the initramfs documentation I wrote to help people have
an easier time using it...)

But that's not about archive format. That's kernel userspace bringup being
persnickety. The silent majority you speak for on this archive format issue is
pretty darn silent.

Was this recorded as a problem for you before somebody suggested changing it? I
tend to be public about https://twitter.com/landley/status/964620648050982912
and collect links to other people's concerns when I notice...

Or is this just your opinion?

Rob

Re: [PATCH v3 01/15] Documentation: add newcx initramfs format description

2018-02-16 Thread Rob Landley

On 02/16/2018 02:59 PM, H. Peter Anvin wrote:
> On 02/16/18 12:33, Taras Kondratiuk wrote:
>> Many of the Linux security/integrity features are dependent on file
>> metadata, stored as extended attributes (xattrs), for making decisions.
>> These features need to be initialized during initcall and enabled as
>> early as possible for complete security coverage.
>>
>> Initramfs (tmpfs) supports xattrs, but newc CPIO archive format does not
>> support including them into the archive.
>>
>> This patch describes "extended" newc format (newcx) that is based on
>> newc and has following changes:
>> - extended attributes support
>> - increased size of filesize to support files >4GB
>> - increased mtime field size to have 64 bits of seconds and added a
>>   field for nanoseconds
>> - removed unused checksum field
>>
> 
> If you are going to implement a new, non-backwards-compatible format,
> you shouldn't replicate the mistakes of the current format.  Specifically:

So rather than make minimal changes to the existing format and continue to
support the existing format (sharing as much code as possible), you recommend
gratuitous aesthetic changes?

> 1. The use of ASCII-encoded fixed-length numbers is an idiotic legacy
> from an era before there were any portable way of dealing with numbers
> with prespecified endianness.

It lets encoders and decoders easily share code with the existing cpio format,
which we still intend to be able to read and write.
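
For reference, "ASCII-encoded fixed-length numbers" in newc means a 6-byte "070701" magic followed by 13 fields of exactly 8 hex digits each (110 bytes total), which is also where the 4GB file size cap comes from. A minimal Python sketch of the header encoder; the field order follows the kernel's early-userspace buffer-format documentation, but the helper names are made up for illustration. A format that only widens some fields could plausibly reuse the same per-field helper:

```python
NEWC_MAGIC = "070701"
# Field order after the magic; each is 8 ASCII hex digits.
NEWC_FIELDS = ("ino", "mode", "uid", "gid", "nlink", "mtime", "filesize",
               "devmajor", "devminor", "rdevmajor", "rdevminor",
               "namesize", "check")

def hex8(value: int) -> str:
    """Encode one field as exactly 8 hex digits, so the maximum is 0xFFFFFFFF."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError(f"{value:#x} does not fit in 8 hex digits")
    return f"{value:08X}"

def newc_header(name: str, **fields: int) -> bytes:
    """Build the fixed 110-byte newc header; unspecified fields default to 0."""
    fields["namesize"] = len(name.encode()) + 1  # namesize counts the trailing NUL
    body = "".join(hex8(fields.get(f, 0)) for f in NEWC_FIELDS)
    return (NEWC_MAGIC + body).encode("ascii")

hdr = newc_header("init", mode=0o100755, nlink=1, filesize=1234)
print(len(hdr))  # 110
```

Asking for a 5GB filesize raises ValueError, which is the ">4GB" limitation the newcx patch wants to lift.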

> If you are going to use ASCII, make them
> delimited so that they don't have fixed limits, or just use binary.

When it's gzipped this accomplishes what? (Other than being gratuitously
different from the previous iteration?)

> The cpio header isn't fixed size, so that argument goes away, in fact
> the only way to determine the end of the header is to scan forward.
> 
> 2. Alignment sensitivity!  Because there is no header length
> information, the above scan tells you where the header ends, but there
> is padding before the data, and the size of that padding is only defined
> by alignment.

Again, these are minimal changes to the existing cpio format. You're complaining
about _cpio_, and that the new stuff isn't _different_ enough from it.

> 3. Inband encoding of EOF: if you actually have a filename "TRAILER!!!"
> you have problems.

Been there, done that:

http://lkml.iu.edu/hypermail/linux/kernel/1801.3/01791.html
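
Point 3 in practice: newc has no out-of-band end marker, so a naive reader walks entries until it meets one named TRAILER!!!. A minimal Python sketch (the helpers are hypothetical, and mode/uid/etc. are dummy values) showing that a genuine file with that name hides everything after it:

```python
from typing import List

TRAILER = "TRAILER!!!"

def newc_entry(name: str, data: bytes = b"", mode: int = 0o100644) -> bytes:
    """Encode one newc entry: header, NUL-terminated name, pad, data, pad."""
    namesize = len(name.encode()) + 1
    fields = [0, mode, 0, 0, 1, 0, len(data), 0, 0, 0, 0, namesize, 0]
    out = b"070701" + b"".join(b"%08X" % v for v in fields)
    out += name.encode() + b"\0"
    out += b"\0" * (-len(out) % 4) + data
    return out + b"\0" * (-len(out) % 4)

def names_until_trailer(archive: bytes) -> List[str]:
    """List entry names, stopping at the first one named TRAILER!!!."""
    names, off = [], 0
    while off + 110 <= len(archive):
        filesize = int(archive[off + 54:off + 62], 16)
        namesize = int(archive[off + 94:off + 102], 16)
        name = archive[off + 110:off + 110 + namesize - 1].decode()
        if name == TRAILER:  # in-band EOF
            break
        names.append(name)
        data_start = off + 110 + namesize
        data_start += -data_start % 4
        off = data_start + filesize
        off += -off % 4
    return names

archive = (newc_entry("init", b"#!/bin/sh\n", 0o100755)
           + newc_entry(TRAILER, b"just a file that happens to have this name")
           + newc_entry("etc/passwd", b"root:x:0:0::/root:/bin/sh\n")
           + newc_entry(TRAILER, mode=0))
print(names_until_trailer(archive))  # ['init']
```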

> But first, before you define a whole new format for which no tools exist
> (you will have to work with the maintainers of the GNU tools to add
> support)

No, he's been working with the maintainer of toybox to add support (for about a
year now), which gets him the Android command line. And the kernel has its own
built-in tool to generate cpio images anyway.

Why would anyone care what the GNU project thinks?

> you should see how complex it would be to support the POSIX
> tar/pax format,

That argument was had (at length) when initramfs went in over a decade ago.
There are links in Documentation/filesystems/ramfs-rootfs-initramfs.txt to the
mailing list entries about it.

> which already has all the features you are seeking, and
> by now is well-supported.

So... tar wasn't well-supported 15 years ago? (Hasn't the kernel source always
been distributed via tarball back since 0.01?)

You're suggesting having a whole second codepath that shares no code with the
existing cpio extractor. Are you suggesting abandoning support for the existing
initramfs.cpio.gz file format?

Rob

tmpfs and brickability (size=50% default considered naive).

2018-02-10 Thread Rob Landley

If you have two default tmpfs instances on your box (hi buildroot!) and
they're world-writable and a normal user goes "cat /dev/zero >
/run/fillit; cat /dev/zero > /tmp/fillit" the system then goes "sh:
can't fork" trying to call rm on those files, because they each default
to 50% of total system memory, no matter how many instances there are.
It only stops writing when memory allocation fails in the kernel.

I'm not quite sure how to fix that. I want to say "change the default to
be 50% of what's _left_" (and if you size= manually you get to keep the
pieces) but define "what's left"? Other things are using some memory
already, the 50% _was_ the amount it's safe for tmpfs to use. (Once upon
a time the logic was that 50% of memory can go to disk cache. I think
that's gotten a bit more complicated since then? No idea what the
current rules are.)
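
A toy comparison of today's default with the "50% of what's left" idea. This is a hypothetical policy sketch, not kernel code: it only tracks what earlier tmpfs instances have been promised and ignores everything else using memory, which is exactly the "define what's left" problem above.

```python
GIB = 1 << 30

def default_limits_today(total_ram: int, instances: int) -> list:
    """Each tmpfs mounted without size= gets half of total RAM, independently."""
    return [total_ram // 2] * instances

def default_limits_half_of_remaining(total_ram: int, instances: int) -> list:
    """Hypothetical: each new instance gets half of what isn't already promised."""
    limits, promised = [], 0
    for _ in range(instances):
        limit = (total_ram - promised) // 2
        limits.append(limit)
        promised += limit
    return limits

ram = 4 * GIB
print(sum(default_limits_today(ram, 2)) / ram)              # 1.0
print(sum(default_limits_half_of_remaining(ram, 2)) / ram)  # 0.75
```

With two default instances (/run and /tmp), today's rule lets them jointly claim all of RAM; the halving rule never reaches 100% no matter how many instances are mounted.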

This is related to the guy who wanted initramfs to be ramfs instead of
tmpfs because he had a cpio.gz that extracted to >50% of system memory
so it extracted fine with ramfs but hit the limit and failed with tmpfs.
That sounds like a tmpfs rootfs (i.e. initmpfs) should start with a
limit of 90% and then scale it _down_ to 50% after extracting the cpio.
(I wonder what happens if you -o remount the size= limit to smaller than
the filesystem currently holds? Hmmm...)
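
The "start at 90%, scale down to 50% after extracting" idea as a toy model. Everything here is hypothetical: TmpfsModel and extract_initramfs are made-up names, the sizes are arbitrary units, and refusing to shrink below current usage is an assumption of the model, not a claim about what a real remount does. The policy sidesteps that open question by clamping the final limit to what the archive already occupies.

```python
import errno

class TmpfsModel:
    """Toy stand-in for a size-limited tmpfs mount; not kernel code."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0

    def write(self, nbytes: int) -> None:
        if self.used + nbytes > self.limit:
            raise OSError(errno.ENOSPC, "No space left on device")
        self.used += nbytes

    def remount_size(self, new_limit: int) -> None:
        # Assumption for the model: shrinking below current usage is refused.
        if new_limit < self.used:
            raise OSError(errno.EINVAL, "size= smaller than current usage")
        self.limit = new_limit

def extract_initramfs(total_ram: int, image_bytes: int,
                      start_pct: int = 90, final_pct: int = 50) -> TmpfsModel:
    """Hypothetical policy: generous limit while extracting, then scale down."""
    fs = TmpfsModel(total_ram * start_pct // 100)
    fs.write(image_bytes)  # stands in for extracting the cpio
    # Never ask for less than the archive already occupies.
    fs.remount_size(max(total_ram * final_pct // 100, fs.used))
    return fs

fs = extract_initramfs(total_ram=1000, image_bytes=600)
print(fs.used, fs.limit)  # 600 600
```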

Rob

P.S. Yes I need to pipe rootflags= through so you can specify the size
on the command line. I've got 3 or 4 initramfs/tmpfs plates spinning and
haven't had a chance to work on them all merge window. I'll try to get
some patches ready by next weekend, if they miss the release they miss
the release...

Re: [RFC PATCH] rootfs: force mounting rootfs as tmpfs

2018-02-01 Thread Rob Landley

On 02/01/2018 04:41 PM, Taras Kondratiuk wrote:
> Quoting Mimi Zohar (2018-02-01 13:51:52)
>> On Thu, 2018-02-01 at 11:09 -0600, Rob Landley wrote:
>>> On 02/01/2018 09:55 AM, Mimi Zohar wrote:
>>>> On Thu, 2018-02-01 at 09:20 -0600, Rob Landley wrote:
>>>>
>>>>>> With your patch and specifying "root=tmpfs", dracut is complaining:
>>>>>>
>>>>>> dracut: FATAL: Don't know how to handle 'root=tmpfs'
>>>>>> dracut: refusing to continue
>>>>>
>>>>> [googles]... I do not understand why this package exists.
>>>>>
>>>>> If you're switching to another root filesystem, using a tool that
>>>>> wikipedia[citation needed] says has no purpose but to switch to another
>>>>> root filesystem, (so let's reproduce the kernel infrastructure in
>>>>> userspace while leaving it the kernel too)... why do you need initramfs
>>>>> to be tmpfs? You're using it for half a second, then discarding it,
>>>>> what's the point of it being tmpfs?
>>>>
>>>> Unlike the kernel image which is signed by the distros, the initramfs
>>>> doesn't come signed, because it is built on the target system.  Even
>>>> if the initramfs did come signed, it is beneficial to measure and
>>>> appraise the individual files in the initramfs.
>>>
>>> You can still shoot yourself in the foot with tmpfs. People mount a /run
>>> and a /tmp and then as a normal user you can go
>>> https://twitter.com/landley/status/959103235305951233 and maybe the
>>> default should be a little more clever there...
>>>
>>> I'll throw it on the todo heap. :)
>>>
>>>>> Sigh. If people are ok with having rootfs just be tmpfs whenever tmpfs
>>>>> is configured in, even when you're then going to overmount it with
>>>>> something else like you're doing, let's just _remove_ the test. If it
>>>>> can be tmpfs, have it be tmpfs.
>>>>
>>>> Very much appreciated!
>>>
>>> Not yet tested, but something like the attached? (Sorry for the
>>> half-finished doc changes in there, I'm at work and have a 5 minute
>>> break. I can test properly this evening if you don't get to it...)
>>
>> Yes, rootfs is being mounted as tmpfs.
> 
> I don't think you can unconditionally replace ramfs with initramfs by
> default. Their behavior is different in some cases (e.g. pivot_root vs
> switch_root)

Both are switch_root, you can't pivot_root off of either one. (Yes, I
hit that bug and reported it, and they fixed it, back in the day...
http://lists.busybox.net/pipermail/busybox/2006-March/053529.html )

> and it can break many systems that expect ramfs by default.

The use case I told Mimi about off-list (since they stopped cc:ing the
list in one of their replies but the conversation continued) was the guy
who was extracting an initramfs bigger than 50% of system memory, which
worked with initramfs but failed with initmpfs. A quick google didn't
find the original message but it resulted in this blog entry from the
affected party:

http://www.lightofdawn.org/blog/?viewDetailed=00128

I.e., yeah, I know, I need to redo these patches tonight.

Rob

Re: [RFC PATCH] rootfs: force mounting rootfs as tmpfs

2018-02-01 Thread Rob Landley

On 01/31/2018 10:22 PM, Mimi Zohar wrote:
> On Wed, 2018-01-31 at 21:03 -0500, Arvind Sankar wrote:
>> On Wed, Jan 31, 2018 at 05:48:20PM -0600, Rob Landley wrote:
>>> On 01/31/2018 04:07 PM, Mimi Zohar wrote:
>>>> On Wed, 2018-01-31 at 13:32 -0600, Rob Landley wrote:
>>>>> (The old "I configured in tmpfs and am using rootfs but I want that
>>>>> rootfs to be ramfs, not tmpfs" code doesn't seem to be a real-world
>>>>> concern, does it?)
>>>>> it?)
>>>>
>>>> I must be missing something.  Which systems don't specify "root=" on
>>>> the boot command line.
>>>
>>> Any system using initrd or initramfs?
>>>
>>
>> Don't a lot of initramfs setups use root= to tell the initramfs which
>> actual root file system to switch to after early boot?
> 
> With your patch and specifying "root=tmpfs", dracut is complaining:
> 
> dracut: FATAL: Don't know how to handle 'root=tmpfs'
> dracut: refusing to continue

"The kernel can't break this buggy userspace package."

"The kernel must give access to a new feature to this buggy userspace
package".

I think kernel policy asks you to pick one, but I could be wrong...

Rob
