Re: [PATCH 18/18] arm64: apple: Add initial Mac Mini 2020 (M1) devicetree

2021-02-08 Thread Hector Martin 'marcan'
n the bootloader. Will be gone for v2.



+   clk24: clk24 {


Just "clock". Node names should be generic.


Really? Almost every other device tree uses unique clock node names.


+   compatible = "fixed-clock";
+   #clock-cells = <0>;
+   clock-frequency = <24000000>;
+   clock-output-names = "clk24";


What clock is it? Part of board or SoC? Isn't it a work-around for
missing clock drivers?


The clock topology isn't entirely known yet; I'm submitting this as an 
initial bring-up patchset and indeed there should be a clockchip driver 
in the future. The UART driver wants a clock to be able to calculate 
baud rates. I figured we can get away with a fixed-clock for now while 
that part of the SoC gets figured out.
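
For context, this is all the UART driver really needs from that node (a
minimal sketch assuming the generic clk API, not the actual samsung_tty
code; the helper name is illustrative):

#include <linux/clk.h>
#include <linux/device.h>
#include <linux/err.h>
#include <linux/kernel.h>

static unsigned int uart_baud_divisor(struct device *dev, unsigned int baud)
{
	struct clk *clk = devm_clk_get(dev, NULL);	/* resolves to the fixed-clock */

	if (IS_ERR(clk))
		return 0;

	/* Typical 16x-oversampling divisor; clk_get_rate() returns 24000000 here */
	return DIV_ROUND_CLOSEST(clk_get_rate(clk), 16 * baud);
}

Once a real clockchip driver exists, only the clock node changes; the
consumer-side math stays the same.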


Ack on all the other comments, will fix for v2.

Thanks for the review!

--
Hector Martin "marcan" (mar...@marcan.st)
Public Key: https://mrcn.st/pub


Re: [PATCH 15/18] irqchip/apple-aic: Add support for the Apple Interrupt Controller

2021-02-08 Thread Hector Martin 'marcan'

On 08/02/2021 19.29, Arnd Bergmann wrote:

On Mon, Feb 8, 2021 at 10:25 AM Marc Zyngier  wrote:

On Thu, 04 Feb 2021 20:39:48 +0000, Hector Martin  wrote:



+{
+ return readl(ic->base + reg);


Please consider using the _relaxed accessors, as I don't think any of
these interacts with memory (apart from IPIs, of course).


MSI interrupts require serializing with DMA, so at the minimum I think there
needs to be something that ensures that DMA from device into memory
has completed before delivering the completion interrupt to a driver. This
may already be implied when the AIC is entered, but this is hard to know
without actual hardware specs.



I don't think this can be implied in any case: if IRQ A fires, the CPU 
speculates its way through AIC into the IRQ B handler (which reads 
DMA'd memory), and then IRQ B fires with higher priority and *that* is 
what ends up being returned from the event register first, the 
execution will commit with an ordering violation.


I'm pretty sure we need *some* level of explicit synchronization between 
reading the event register and actually delivering IRQs downstream. 
Using _relaxed might be okay, but we'd still need something where the 
isb() currently is in aic_handle_irq (though I admit I don't have a 
perfect picture of the memory ordering subtleties involved here yet).
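
To make the concern concrete, a hedged sketch (the dispatch details,
register offset, and hwirq encoding here are illustrative, not the
posted driver):

#include <linux/io.h>
#include <linux/irq.h>
#include <linux/irqdesc.h>
#include <linux/irqdomain.h>

#define AIC_EVENT	0x2004	/* assumed event register offset */

struct aic_irq_chip {
	void __iomem *base;
};

static u32 aic_ic_read(struct aic_irq_chip *ic, u32 reg)
{
	return readl_relaxed(ic->base + reg);	/* relaxed: no implied DMA ordering */
}

static void aic_dispatch(struct aic_irq_chip *ic, struct irq_domain *domain)
{
	u32 event = aic_ic_read(ic, AIC_EVENT);

	/*
	 * Explicitly order the event-register read against loads in the
	 * handler we are about to invoke, so an MSI completion handler
	 * cannot observe stale DMA'd memory; this is the role the isb()
	 * currently plays in aic_handle_irq.
	 */
	dma_rmb();

	generic_handle_domain_irq(domain, event & 0xffff);	/* assumed encoding */
}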


Incidentally, just from the races and problems I've run into with 
trivial tests in m1n1, these CPUs seem to be *very* eager to speculate 
and I suspect they will help uncover race conditions in Linux...


--
Hector Martin "marcan" (mar...@marcan.st)
Public Key: https://mrcn.st/pub


Re: [PATCH 10/18] arm64: Introduce FIQ support

2021-02-07 Thread Hector Martin 'marcan'

On 07/02/2021 21.25, Arnd Bergmann wrote:

On Sun, Feb 7, 2021 at 9:36 AM Hector Martin 'marcan'  wrote:

On 07/02/2021 01.22, Arnd Bergmann wrote:

* In the fiq handler code, check if normal interrupts were enabled
when the fiq hit. Normally they are enabled, so just proceed to
handle the timer and ipi directly

* if irq was disabled, defer the handling by doing a self-ipi
through the aic's ipi method, and handle it from there
when dealing with the next interrupt once interrupts get
enabled.

This would be similar to the soft-disable feature on powerpc, which
never actually turns off interrupts from regular kernel code but
just checks a flag in local_irq_enable that gets set when a
hardirq happened.


Case #2 seems messy. In AIC, we'd have to either:

* Disable FIQs, and hope that doesn't confuse any save/restore code
going on, then set a flag and check it in *both* the IRQ and FIQ path
since either might trigger depending on what happens next, or
* Mask the relevant timer, which we'd then need to make sure does not
confuse the timer code (Unmask it again when we fire the interrupt? But
what if the timer code intended to mask it in the interim?)


I'm not quite following here. The IRQ should be disabled the entire time
while handling that self-IPI and the timer top half code, so if we get
another FIQ while handling the timer from the IRQ path, it will lead
to either yet another self-IPI, or it will be ignored if the previous timer
event has not been acked yet. I would expect both cases to be
race-free here; the only time the FIQ needs to be disabled is
while actually handling the FIQ. Did I miss something?


FIQs are level-triggered, and there are only two* ways of masking them 
(that we know of): in the timer, or DAIF. That means that if we get a 
FIQ, we *must* do one of two things: either mask it in the timer 
register, or mask FIQs entirely. If we do neither of these, we get a FIQ 
storm.


So if a timer FIQ fires while IRQs are disabled, and we can't call into 
the timer code (because IRQs were disabled, so we need to defer handling 
via the IPI), the only other options are to either poke the timer mask 
bit directly, or to mask FIQs. Neither seems particularly correct.


* An exception seems to be non-HV timer interrupts firing while we have 
a VM guest running (HCR_EL2.TGE=0). This causes a single FIQ, and no 
more, which suggests there is a mask bit for guest timer FIQs somewhere 
that gets automatically set when the FIQ is delivered to the CPU core. 
I've yet to find where this bit lives; I'll be doing a brute-force sweep 
of system register space soon to see if I can find it, and whether there 
is anything else useful near it.
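
To illustrate, the deferral variant would look roughly like this (hedged
pseudocode; none of these helpers exist in the posted series, they just
name the steps):

static void aic_handle_fiq(struct pt_regs *regs)
{
	if (!irqs_disabled()) {
		/* Normal case: IRQs were enabled, handle the timer inline */
		handle_arch_timer();
		return;
	}

	/*
	 * IRQs were masked, so the level-triggered FIQ source must be
	 * quiesced before returning, or it fires again immediately:
	 */
	mask_timer_fiq();	/* option 1: poke CNTx_CTL_ELx.IMASK behind the timer code's back */
	/* option 2 would be setting DAIF.F and hoping save/restore code copes */

	aic_send_self_ipi();	/* replay via the IRQ path once IRQs are re-enabled */
}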



Plus I don't know if the vector entry code and other scaffolding between
the vector and the AIC driver would be happy with, effectively,
recursive interrupts. This could work with a carefully controlled path
to make sure it doesn't break things, but I'm not so sure about the
current "just point FIQ and IRQ to the same place" approach here.


If we do what I described above, the FIQ and IRQ entry would have
to be separate and only arrive in the same code path when calling
into drivers/clocksource/arm_arch_timer.c. It's not recursive there
because that part is only called when IRQ is disabled, and no IRQ
is being executed while the FIQ hits.


Right, that's what I'm saying: we can't re-use the IRQ handler like Marc 
proposed, because I don't think it expects to be called reentrantly; 
we'd have to have a separate FIQ entry, but since it can be called with 
IRQs enabled and handle the FIQ in-line, it also needs to be able to 
*conditionally* behave like a normal IRQ handler. This level of 
complexity seems somewhat dubious, just to avoid keeping the FIQ mask 
bit in sync. That's not just AIC code any more; it needs a bespoke FIQ 
vector and logic to decide whether IRQs are masked (call AIC to 
self-IPI without doing the usual IRQ processing) or unmasked (go 
through regular IRQ accounting and behave like an IRQ).


Perhaps I'm misunderstanding what you're proposing here or how this 
would work :)


--
Hector Martin "marcan" (mar...@marcan.st)
Public Key: https://mrcn.st/pub


Re: [PATCH 05/18] tty: serial: samsung_tty: add support for Apple UARTs

2021-02-07 Thread Hector Martin 'marcan'

On 07/02/2021 18.12, Hector Martin 'marcan' wrote:

On 06/02/2021 22.15, Marc Zyngier wrote:

The default should be IRQ_NONE, otherwise the kernel cannot detect a
screaming spurious interrupt.


Good point, and this needs fixing in s3c64xx_serial_handle_irq too then
(which is what I based mine off of).


+   ret = request_irq(port->irq, apple_serial_handle_irq, IRQF_SHARED,
+ s3c24xx_serial_portname(port), ourport);


Why IRQF_SHARED? Do you expect any other device sharing the same line
with this UART?


This also came from s3c64xx_serial_startup and... now I wonder why that
one needs it. Maybe on some SoCs it does get shared? Certainly not for
discrete rx/tx irq chips (and indeed those don't set the flag)...

CCing Thomas, who added the S3C64xx support (and should probably review
this patch); is there a reason for IRQF_SHARED there? NB: v1 breaks the
build on arm or with CONFIG_PM_SLEEP, those will be fixed for v2.


Seems Thomas does not work for Linaro any more :)

CCing Krzysztof instead, who is the Samsung arch maintainer.



Either way, certainly not for Apple SoCs; I'll get rid of IRQF_SHARED
for v2.


--
Hector Martin "marcan" (mar...@marcan.st)
Public Key: https://mrcn.st/pub


Re: [PATCH 11/18] arm64: Kconfig: Require FIQ support for ARCH_APPLE

2021-02-07 Thread Hector Martin 'marcan'

On 07/02/2021 00.46, Marc Zyngier wrote:

  config ARCH_APPLE
bool "Apple Silicon SoC family"
select GENERIC_IRQ_CHIP
+   select ARM64_FIQ_SUPPORT


Ah, this is what I was expecting in the previous patch. I guess the
initial ARCH_APPLE patch could be moved down the line and add all the
dependencies in one go.


I was trying to introduce the Kconfig before the code that depends on 
it; is it kosher to have it in the other order, looking for CONFIG_ 
defines that don't exist yet?


Though in this case the only user earlier in the series is the Samsung 
stuff, which doesn't care about FIQs, so I can just sort things as 
FIQ->ARCH_APPLE->samsung->AIC...


I'm not sure about AIC vs. ARCH_APPLE though. Right now the pattern is 
that AIC depends on ARCH_APPLE and also defaults to that. But then you 
can build with ARCH_APPLE and AIC disabled if you so choose, which does 
result in a broken system on these machines. AIC should build without 
ARCH_APPLE (as long as we're on ARM64), so we could reverse that.


--
Hector Martin "marcan" (mar...@marcan.st)
Public Key: https://mrcn.st/pub


Re: [PATCH 05/18] tty: serial: samsung_tty: add support for Apple UARTs

2021-02-07 Thread Hector Martin 'marcan'

On 06/02/2021 22.15, Marc Zyngier wrote:

-static int s3c24xx_serial_has_interrupt_mask(struct uart_port *port)
+static int s3c24xx_irq_type(struct uart_port *port)
  {
-   return to_ourport(port)->info->type == PORT_S3C6400;
+   switch (to_ourport(port)->info->type) {
+   case PORT_S3C6400:
+   return IRQ_S3C6400;
+   case PORT_APPLE:
+   return IRQ_APPLE;
+   default:
+   return IRQ_DISCRETE;
+   }
+


nit: For ease of reviewing, it'd be good if you could split this patch
with introducing the S3C6400 and "discrete" support initially, and
only then add the new stuff.


Good idea, will do for v2.


+   if (s3c24xx_irq_type(port) == IRQ_APPLE)
+   s3c24xx_serial_tx_chars(NO_IRQ, ourport);


Instead of directly calling into the handler (which has its own
problems, see below), could you just tickle the interrupt status
register to make an interrupt pending and trigger an actual interrupt?
I have no idea whether the HW supports this kind of trick though.


I thought of that, but I tried really hard to find such a feature with 
no success. The best I can do is unmask and trigger the *RX* timeout 
interrupt which will eventually fire but... this doesn't work so well in 
practice. There is no way to trigger IRQ flags directly (as those bits 
are write-1-to-clear).



-   spin_lock_irqsave(&port->lock, flags);
+   /* Only lock if called from IRQ context */
+   if (irq != NO_IRQ)
+   spin_lock_irqsave(&port->lock, flags);


Isn't that actually dangerous? What prevents the interrupt from firing
right in the middle of this sequence and create havoc when called from
enable_tx_pio()? I fail to see what you gain with sidestepping the
locking.


The callpath here is:

uart_start -> __uart_start -> (uart_ops.start_tx) s3c24xx_serial_start_tx
-> s3c24xx_serial_start_tx_pio -> enable_tx_pio -> s3c24xx_serial_tx_chars


And uart_start takes the uart_port lock. None of the serial functions 
take the lock because the serial core already does, but the IRQ handler 
obviously needs to, *if* it was actually invoked as an IRQ handler.
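
In other words, a simplified sketch of the pattern (FIFO handling elided):

static irqreturn_t s3c24xx_serial_tx_chars(int irq, void *id)
{
	struct s3c24xx_uart_port *ourport = id;
	struct uart_port *port = &ourport->port;
	unsigned long flags = 0;

	/* The uart core already holds port->lock across uart_ops.start_tx */
	if (irq != NO_IRQ)
		spin_lock_irqsave(&port->lock, flags);

	/* ... fill the TX FIFO ... */

	if (irq != NO_IRQ)
		spin_unlock_irqrestore(&port->lock, flags);

	return IRQ_HANDLED;
}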



The default should be IRQ_NONE, otherwise the kernel cannot detect a
screaming spurious interrupt.


Good point, and this needs fixing in s3c64xx_serial_handle_irq too then 
(which is what I based mine off of).
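
The shape of the fix would be something like this (a sketch modelled on
the existing s3c64xx handler, not the actual v2 code):

static irqreturn_t apple_serial_handle_irq(int irq, void *id)
{
	struct s3c24xx_uart_port *ourport = id;
	u32 pend = rd_regl(&ourport->port, S3C64XX_UINTP);
	irqreturn_t ret = IRQ_NONE;

	if (pend & S3C64XX_UINTM_RXD_MSK)
		ret = s3c24xx_serial_rx_chars(irq, id);
	if (pend & S3C64XX_UINTM_TXD_MSK)
		ret = s3c24xx_serial_tx_chars(irq, id);

	/* Defaulting to IRQ_NONE lets the core catch a screaming interrupt */
	return ret;
}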



+   ret = request_irq(port->irq, apple_serial_handle_irq, IRQF_SHARED,
+ s3c24xx_serial_portname(port), ourport);


Why IRQF_SHARED? Do you expect any other device sharing the same line
with this UART?


This also came from s3c64xx_serial_startup and... now I wonder why that 
one needs it. Maybe on some SoCs it does get shared? Certainly not for 
discrete rx/tx irq chips (and indeed those don't set the flag)...


CCing Thomas, who added the S3C64xx support (and should probably review 
this patch); is there a reason for IRQF_SHARED there? NB: v1 breaks the 
build on arm or with CONFIG_PM_SLEEP, those will be fixed for v2.


Either way, certainly not for Apple SoCs; I'll get rid of IRQF_SHARED 
for v2.



diff --git a/include/uapi/linux/serial_core.h b/include/uapi/linux/serial_core.h
index 62c22045fe65..59d102b674db 100644
--- a/include/uapi/linux/serial_core.h
+++ b/include/uapi/linux/serial_core.h
@@ -277,4 +277,7 @@
  /* Freescale LINFlexD UART */
  #define PORT_LINFLEXUART  122
  
+/* Apple Silicon (M1/T8103) UART (Samsung variant) */

+#define PORT_APPLE 123
+


Do you actually need a new port type here? Looking at the driver
itself, it is mainly used to work out the IRQ model. Maybe introducing
a new irq_type field in the port structure would be better than
exposing this to userspace (which should see something that is exactly
the same as a S3C UART).


Well... every S3C variant already has its own port type here.

#define PORT_S3C2410	55
#define PORT_S3C2440	61
#define PORT_S3C2400	67
#define PORT_S3C2412	73
#define PORT_S3C6400	84

If we don't introduce a new one, which one should we pretend to be? :)

I agree that it might make sense to merge all of these into one, though; 
I don't know what the original reason for splitting them out is. But now 
that they're part of the userspace API, this might not be a good idea. 
Though, unsurprisingly, some googling suggests there are zero users of 
these defines in userspace.


--
Hector Martin "marcan" (mar...@marcan.st)
Public Key: https://mrcn.st/pub


Re: [PATCH 10/18] arm64: Introduce FIQ support

2021-02-07 Thread Hector Martin 'marcan'

On 07/02/2021 00.37, Marc Zyngier wrote:

See my digression in patch 8. I really wonder what the benefit is to
treat FIQ independently of IRQ, and we might as well generalise
this. We could always panic on getting a FIQ on platforms that don't
expect one.

It'd be good to rope in the other interested parties (Mark for the
early entry code, James for RAS and SError handling).


CCing Mark and James: TL;DR: what do you think about unconditionally 
keeping DAIF.I == DAIF.F? Would this break other platforms with spurious 
FIQs, or conversely mask FIQs when we don't want to in some cases? The 
FIQ vector would remain a panic except on platforms that require using 
it, via an alternatives patch.



kernel_ventry   1, sync // Synchronous EL1h
kernel_ventry   1, irq  // IRQ EL1h
-   kernel_ventry   1, fiq_invalid  // FIQ EL1h
+   // FIQ EL1h
+   kernel_ventry   1, fiq_invalid, 64, irq, ARM64_NEEDS_FIQ


It could be better to create a set of first class FIQ handlers rather
than this alternative target macro. I quickly hacked this instead,
which I find more readable.


I think I ended up with the macro change to keep it 1:1 with IRQ, vs. a 
separate branch... but I didn't think of the fallthrough-with-nop trick; 
neat. It is definitely more readable. Are you OK with me pulling this 
patch in for v2, with your name on it?



-   kernel_ventry   0, fiq_invalid_compat, 32   // FIQ 32-bit EL0
+   kernel_ventry   0, fiq, 32  // FIQ 32-bit EL0


fiq_compat here, right?

--
Hector Martin "marcan" (mar...@marcan.st)
Public Key: https://mrcn.st/pub


Re: [PATCH 10/18] arm64: Introduce FIQ support

2021-02-07 Thread Hector Martin 'marcan'

On 07/02/2021 01.22, Arnd Bergmann wrote:

* In the fiq handler code, check if normal interrupts were enabled
   when the fiq hit. Normally they are enabled, so just proceed to
   handle the timer and ipi directly

* if irq was disabled, defer the handling by doing a self-ipi
   through the aic's ipi method, and handle it from there
   when dealing with the next interrupt once interrupts get
   enabled.

This would be similar to the soft-disable feature on powerpc, which
never actually turns off interrupts from regular kernel code but
just checks a flag in local_irq_enable that gets set when a
hardirq happened.


Case #2 seems messy. In AIC, we'd have to either:

* Disable FIQs, and hope that doesn't confuse any save/restore code 
going on, then set a flag and check it in *both* the IRQ and FIQ path 
since either might trigger depending on what happens next, or
* Mask the relevant timer, which we'd then need to make sure does not 
confuse the timer code (Unmask it again when we fire the interrupt? But 
what if the timer code intended to mask it in the interim?)


Neither sounds particularly clean to me... if we had FIQ status masking 
registers this would be more reasonable, but I'm not sure I'd want the 
AIC driver to mess with either DAIF or the timer registers. It's bad 
enough that it has to read the latter already (and I hope I can find a 
better way of doing that...).


Plus I don't know if the vector entry code and other scaffolding between 
the vector and the AIC driver would be happy with, effectively, 
recursive interrupts. This could work with a carefully controlled path 
to make sure it doesn't break things, but I'm not so sure about the 
current "just point FIQ and IRQ to the same place" approach here.


--
Hector Martin "marcan" (mar...@marcan.st)
Public Key: https://mrcn.st/pub


Re: [PATCH 08/18] arm64: cpufeature: Add a feature for FIQ support

2021-02-07 Thread Hector Martin 'marcan'

On 06/02/2021 22.58, Marc Zyngier wrote:

Hector Martin  wrote:

+static void cpu_sync_irq_to_fiq(struct arm64_cpu_capabilities const *cap)
+{
+   u64 daif = read_sysreg(daif);
+
+   /*
+* By this point in the boot process IRQs are likely masked and FIQs
+* aren't, so we need to sync things to avoid spurious early FIQs.
+*/
+
+   if (daif & PSR_I_BIT)
+   daif |= PSR_F_BIT;
+   else
+   daif &= ~PSR_F_BIT;
+
+   write_sysreg(daif, daif);


Could this happen too late? If, as explained above, we can get a FIQ
until we mask it here, what prevents something (a timer?) from kicking
and creating havoc just before the sync?


Nothing, other than timers not being enabled this early (hopefully the 
bootloader doesn't leave a rogue timer running for us...)



If the answer is "nothing", then it probably means that the default
behaviour should be to treat PSTATE.I and PSTATE.F as containing the
same value at all times, and not just as an afterthought when we
detect that we're on a CPU type or another.


I thought of this too. Originally I thought PSTATE.F was always set on 
other systems, and thus unmasking FIQs could cause problems if there is 
a pending rogue FIQ for some reason. However, while writing this patch I 
realized that as part of normal process state changes we already unmask 
FIQs anyway (see DAIF_PROCCTX).


Thus, in fact, this patch actually changes things (when the cpufeature 
is set) to mask FIQs in some cases where they currently aren't, and 
conversely to unmask them in some cases where they currently are. But 
the fact that FIQ masking is somewhat inconsistent to begin with 
suggests that we should be able to mess with it without causing breakage 
for other systems.


So I guess in this case it would be legitimate to just make I==F on 
every system, and if something breaks it should be fixed by making 
whatever is causing a rogue FIQ not do that, right?


That would leave the vector switcheroo as the only thing the cpufeature 
does, which would certainly simplify a lot of the patch.



This could expand into enabling Group-0 interrupts with GICv3 on
systems that have a single security state (such as virtual machines),
though I don't really see a good use case for it.


I could see having a separate vector path opening up the door for 
performance hacks for very specific use cases that want really low 
latency for *one* thing (e.g. the mess the Raspberry Pi folks do to work 
around that braindead USB controller's performance issues), though I 
imagine there would have to be very compelling reasons to develop a 
framework to do this sanely upstream.


Incidentally, I have a personal interest in real-time performance 
(especially audio); once the dust settles and we have a workable kernel 
for normal use I do hope to spend some time taking a deep dive into 
latencies and finding RT-unfriendly code, but that's pretty far off 
right now. Maybe PREEMPT_RT will even be merged by then :-) (I hope that 
without SMM to screw things up on these machines they might make very 
nice RT-capable boxes...)


--
Hector Martin "marcan" (mar...@marcan.st)
Public Key: https://mrcn.st/pub


Re: [PATCH 04/18] arm64: Kconfig: Introduce CONFIG_ARCH_APPLE

2021-02-07 Thread Hector Martin 'marcan'

On 06/02/2021 22.17, Marc Zyngier wrote:

+config ARCH_APPLE
+   bool "Apple Silicon SoC family"
+   select GENERIC_IRQ_CHIP


nit: This is better selected by the interrupt controller that relies
on the generic irqchip infrastructure.


Ack, changed for v2.

--
Hector Martin "marcan" (mar...@marcan.st)
Public Key: https://mrcn.st/pub


Re: [PATCH v6 05/21] arm64: Initialise as nVHE before switching to VHE

2021-02-05 Thread Hector Martin 'marcan'

On 01/02/2021 20.56, Marc Zyngier wrote:

As we are aiming to be able to control whether we enable VHE or
not, let's always drop down to EL1 first, and only then upgrade
to VHE if at all possible.

This means that if the kernel is booted at EL2, we always start
with a nVHE init, drop to EL1 to initialise the kernel, and
only then upgrade the kernel EL to EL2 if possible (the process
is obviously shortened for secondary CPUs).


Unfortunately, this is going to break on Apple SoCs, where it turns out 
HCR_EL2.E2H is hard-wired to 1 - there is no nVHE mode. :(


>>> mrs(HCR_EL2) & (1<<34)
0x400000000
>>> msr(HCR_EL2, mrs(HCR_EL2) & ~(1<<34))
>>> mrs(HCR_EL2) & (1<<34)
0x400000000

--
Hector Martin "marcan" (mar...@marcan.st)
Public Key: https://mrcn.st/pub


Re: [PATCH 00/18] Apple M1 SoC platform bring-up

2021-02-05 Thread Hector Martin 'marcan'

On 05/02/2021 05.39, Hector Martin wrote:

This series brings up initial support for the Apple M1 SoC, used in the
2020 Mac Mini, MacBook Pro, and MacBook Air models.


Forgot to CC: a few folks involved in the previous related thread, 
sorry! Adding them here, hope everyone got the series via the MLs.


v2 will be CCed to everyone else too.

--
Hector Martin "marcan" (mar...@marcan.st)
Public Key: https://mrcn.st/pub


Re: [PATCH 15/18] irqchip/apple-aic: Add support for the Apple Interrupt Controller

2021-02-05 Thread Hector Martin 'marcan'

On 05/02/2021 11.27, kernel test robot wrote:

config: arc-allyesconfig (attached as .config)


This is never going to build on !ARM64 since it uses ARM64 registers, so 
removing COMPILE_TEST for v2.


--
Hector Martin "marcan" (mar...@marcan.st)
Public Key: https://mrcn.st/pub


Re: [PATCH 05/18] tty: serial: samsung_tty: add support for Apple UARTs

2021-02-05 Thread Hector Martin 'marcan'

On 05/02/2021 08.55, kernel test robot wrote:

drivers/tty/serial/samsung_tty.c:60: warning: "NO_IRQ" redefined

   60 | #define NO_IRQ -1


Turns out arm (32) defines NO_IRQ. Replaced with NOT_IN_IRQ for v2.

--
Hector Martin "marcan" (mar...@marcan.st)
Public Key: https://mrcn.st/pub


Re: [PATCH 15/18] irqchip/apple-aic: Add support for the Apple Interrupt Controller

2021-02-04 Thread Hector Martin 'marcan'

On 05/02/2021 08.04, Arnd Bergmann wrote:

On Thu, Feb 4, 2021 at 11:06 PM Hector Martin 'marcan'  wrote:

If we split it up again, one of the two still needs to be the root,
decide whether what fired is an IRQ or FIQ, and dispatch accordingly. Or
we could have three nodes and have one root handler dispatch to IRQ and
FIQ nodes, but that sounds like overkill... (?)


Maybe I'm misreading the low-level entry code, but my impression
was that the fiq and irq exception vectors could just be pointed to
two different root drivers from the code in kernel_ventry


Certainly, but we'd have to introduce a fiq handler global and duplicate 
the handler code; this is what was done in the previous submission, but 
I seem to recall someone (Marc?) mentioned it would be cleaner to just 
merge them into the single IRQ path and discriminate in the irqchip, 
which is what I did here.


I can certainly go with either solution; I don't have a strong 
preference here.


Advantages of split path:

* More orthogonal

Advantages of merged path:

* Minimizes common vector changes needed for a single platform
* Keeps FIQ/IRQ code common, so FIQs are less likely to be accidentally 
broken by people not testing on Apple platforms.


Unclear:

* Performance. Split path runs less code, merged path has lower icache 
pressure.



Are you proposing just having different drivers/nodes in the same file,
or implementing these as separate drivers in separate files?


I was thinking of separate driver files.


That's what I previously had then :)

If this is the way to go I can certainly go back to that.


I looked at other architectures, and found that at least powerpc
and sparc64 have a really minimal timer tick, with their timer_interrupt()
function getting called directly from the exception vector, and
doing a minimum of accounting (irq_enter(), statistics, ...) manually.

It's a different question if we want to do that, or if there should always
be an irqchip for consistency.


I think the issue here is that those platforms presumably have *one* 
timer hard wired to a specific exception vector (e.g. on PowerPC that's 
the decrementer). So, that setup is shared by all implementations in 
that platform.


But on ARM64, the architectural timer is supposed to go through an 
irqchip (GIC in normal platforms), it's just that here it ended up 
hard-wired to FIQ - though not alone, since fast IPIs are also there, so 
we can't treat it as a strict "timer vector" either.


So even if we could do this for Apple SoCs, it would be a non-standard 
setup, since every other ARM64 platform puts the timer behind an 
irqchip. Therefore, I think it makes sense to always go through an 
irqchip, rather than introduce a bypass for these SoCs.


Also worth noting that we have at least two functional hardware timers 
here (not sure if there are more; we run with HCR_EL2.E2H=1 in m1n1, 
which maps the EL2 timer onto the EL1 timer, and I'm not yet sure 
whether setting that to 0 will expose extra HV timers) wired to the same 
FIQ. I confirmed that both the virtual and physical timers function 
independently in m1n1.


I did confirm there are no secure timers, which is expected given that 
there is no EL3 on these chips.



Benchmarking would at least help understand why there are two.


Well, they call them "Fast IPIs" so *presumably* they are faster, but 
we'll see :)



I don't think we have to pay too much attention to preparing the
code design for it, we can always change it when needed. However,
anything that impacts the DT binding here would have to be designed
to not get in the way of adding it later.


I think this shouldn't pose much of a problem, since IPIs aren't exposed 
in the DT anyway. As long as we decide how we're handling IRQs vs FIQs 
(one or two nodes/drivers), then either of them could take 
responsibility for handling IPIs depending on the platform. We should 
probably just add a "fast-ipi" property to both nodes on platforms that 
support that, so that the drivers can make the decision based on it. Or 
perhaps that should be done with different compatibles?


--
Hector Martin "marcan" (mar...@marcan.st)
Public Key: https://mrcn.st/pub


Re: [PATCH 18/18] arm64: apple: Add initial Mac Mini 2020 (M1) devicetree

2021-02-04 Thread Hector Martin 'marcan'

On 05/02/2021 08.08, Arnd Bergmann wrote:

On Thu, Feb 4, 2021 at 10:44 PM Hector Martin 'marcan'  wrote:

On 05/02/2021 06.29, Arnd Bergmann wrote:

On Thu, Feb 4, 2021 at 9:39 PM Hector Martin  wrote:

We tend to split the dts file into one file per SoC and one for the
specific board. I guess in this case the split can be slightly different,
but it does feel better to be prepared for sharing a lot of the contents
between the different products.

In most cases, you'd want the 'aliases' and 'chosen' nodes to be
in the board specific file.


I thought about that, but wasn't sure if splitting it up at this early
stage made much sense since I'm not sure what the split should be, given
all supported hardware is the same for all 3 released devices.

I'm happy to throw the aliases/chosen nodes into board specific files if
you think that's a good starting point. Perhaps /memory too? Those
properties are filled in/patched by the bootloader anyway...


Yes, I think that would help make it more consistent with other
platforms even if we don't care too much here.


Ack, I'll split it up for v2.


We don't really have overlays in the kernel sources (yet), though it
is something that keeps coming up. For the moment, I'd just
assume you can have one .dts file for each thing you want to
support and keep the shared bits in .dtsi files.


No problem. We'll experiment with overlays in m1n1 and see how that goes.

One thing I wanted to ask: is there some kind of "experimental" policy 
for DT bindings? At early platform bring-up stages it seems like it 
could be valuable to allow for breaking DT changes while we flesh out 
the details (this is especially true of a reverse engineered platform 
like this, where we don't have knowledge of all the hardware details a 
priori). The dozen or so users we might have at this stage obviously 
won't complain too much :)


--
Hector Martin "marcan" (mar...@marcan.st)
Public Key: https://mrcn.st/pub


Re: [PATCH 15/18] irqchip/apple-aic: Add support for the Apple Interrupt Controller

2021-02-04 Thread Hector Martin 'marcan'
 into a state where a handler is run while
it should be masked.


That's a good point. We could filter with the SPSR_ELx mask bits here.

Though the FIQ support patch tries pretty hard to keep the mask bits in 
sync after early boot, so this concern might be somewhat academic. I'm 
happy to implement it if you think it might help though.


--
Hector Martin "marcan" (mar...@marcan.st)
Public Key: https://mrcn.st/pub


Re: [PATCH 18/18] arm64: apple: Add initial Mac Mini 2020 (M1) devicetree

2021-02-04 Thread Hector Martin 'marcan'

On 05/02/2021 06.29, Arnd Bergmann wrote:

On Thu, Feb 4, 2021 at 9:39 PM Hector Martin  wrote:


+/ {
+   model = "Apple Mac Mini M1 2020";
+   compatible = "AAPL,j274", "AAPL,m1", "AAPL,arm-platform";
+   #address-cells = <2>;
+   #size-cells = <2>;
+
+   chosen {
+   #address-cells = <2>;
+   #size-cells = <2>;
+   ranges;
+
+   bootargs = "earlycon";
+   stdout-path = "serial0:150";
+
+   framebuffer0: framebuffer@0 {
+   compatible = "AAPL,simple-framebuffer", 
"simple-framebuffer";
+   reg = <0 0 0 0>; // To be filled by loader
+   // Format properties will be added by loader
+   status = "disabled";
+   };
+   };
+
+   memory@800000000 {
+   device_type = "memory";
+   reg = <0 0 0 0>; // To be filled by loader
+   };
+
+   aliases {
+   serial0 = &serial0;
+   };


We tend to split the dts file into one file per SoC and one for the
specific board. I guess in this case the split can be slightly different,
but it does feel better to be prepared for sharing a lot of the contents
between the different products.

In most cases, you'd want the 'aliases' and 'chosen' nodes to be
in the board specific file.


I thought about that, but wasn't sure if splitting it up at this early 
stage made much sense since I'm not sure what the split should be, given 
all supported hardware is the same for all 3 released devices.


I'm happy to throw the aliases/chosen nodes into board specific files if 
you think that's a good starting point. Perhaps /memory too? Those 
properties are filled in/patched by the bootloader anyway...


There are also DT overlays; I was wondering if we could use those to 
keep the hierarchy and avoid having many duplicate trees in a 
hypothetical bootloader that embeds support for a large set of hardware, 
having it construct the final devicetree on the fly from SoC + a board 
overlay (and possibly further levels); but I'm not sure how that ties in 
with the device trees that live in the Linux tree. Do you have any 
pointers about this?


For reference, this is our current DT patching code in m1n1:

https://github.com/AsahiLinux/m1n1/blob/main/src/kboot.c

Eventually we're going to build some kind of tooling to automate diffing 
Apple device trees and importing changes/new devices into our own, 
though it will probably be quite a while until that is relevant; at this 
stage hand-maintaining them is perfectly fine (in any case this wouldn't 
be fully automated, so in the end our trees will still be organized 
however we want).



+   cpus {
+   #address-cells = <2>;
+   #size-cells = <0>;
+
+   cpu0: cpu@0 {
+   compatible = "AAPL,icestorm";
+   device_type = "cpu";
+   reg = <0x0 0x0>;
+   enable-method = "spin-table";
+   cpu-release-addr = <0 0>; // To be filled by loader
+   };


Did you see the discussion on the #armlinux channel about the possibility
of moving the cpu-enable method to PSCI based on a UEFI runtime
interface?

There are a few open questions about what that would look like in the
end, but Ard has come up with a prototype for the kernel side of it
(obviously untested), which would interface either into the UEFI side
of u-boot, or a simple already-instantiated version that could be
kept inside of m1n1 and stay resident in memory.

I would like to see that model get adopted here eventually. If
we manage to get the other patches ready for an initial merge in
v5.12, we can probably start out with spin-table and move to that
in a following release though.


I saw it go by but need to review it again; I've been missing too much 
sleep this week :) thanks for the reminder.


I think we might want to start with spin-table for now, given that there 
are no kernel changes needed anyway, but I'm happy to take the prototype 
for a spin (:)) and try implementing it in m1n1.


I do think it's valuable for whatever we do, at this stage, to not 
require u-boot; having that be an integral part of the boot chain is 
perfectly fine in the future but right now it helps to have a simple 
boot chain while we work out the early bring-up, and while u-boot grows 
the required support.


--
Hector Martin "marcan" (mar...@marcan.st)
Public Key: https://mrcn.st/pub


Re: [PATCH 07/18] tty: serial: samsung_tty: enable for ARCH_APPLE

2021-02-04 Thread Hector Martin 'marcan'

On 05/02/2021 06.16, Arnd Bergmann wrote:

On Thu, Feb 4, 2021 at 9:39 PM Hector Martin  wrote:



  config SERIAL_SAMSUNG
 tristate "Samsung SoC serial support"
-   depends on PLAT_SAMSUNG || ARCH_S5PV210 || ARCH_EXYNOS || COMPILE_TEST
+   depends on PLAT_SAMSUNG || ARCH_S5PV210 || ARCH_EXYNOS || COMPILE_TEST || ARCH_APPLE


By convention, please keep "|| COMPILE_TEST" last in the list.


Ack, good catch! Fixed for v2.

--
Hector Martin "marcan" (mar...@marcan.st)
Public Key: https://mrcn.st/pub


Re: [RFC PATCH 4/7] irqchip/apple-aic: Add support for Apple AIC

2021-01-21 Thread Hector Martin 'marcan'

On 20/01/2021 22.27, Mohamed Mediouni wrote:

+   irq_domain_set_info(d, virq, hw, &apple_aic_irq_chip,
+   d->host_data, handle_level_irq, NULL, NULL);


The AIC automatically masks IRQs on reason fetch, which means the 
handle_level_irq flow is redundant. Using the fasteoi flow, as in [1], 
should be more efficient.


[1] https://github.com/AsahiLinux/linux/commit/d4cb18c93

--
Hector Martin "marcan" (mar...@marcan.st)
Public Key: https://mrcn.st/pub


Re: [RFC PATCH 4/7] irqchip/apple-aic: Add support for Apple AIC

2021-01-21 Thread Hector Martin 'marcan'

On 21/01/2021 19.37, Arnd Bergmann wrote:

On Thu, Jan 21, 2021 at 10:48 AM Linus Walleij  wrote:


However weird it may seem, Apple is not in the file
Documentation/devicetree/bindings/vendor-prefixes.yaml


Since Apple are already using both the "AAPL" and the "apple"
prefix themselves, I have a bad feeling about reusing either of
them for defining the devicetree.org bindings that we add to
linux/Documentation/devicetree/bindings. The question is: if
not "apple", what else should we use here?


This ties into the larger question of how we should handle devicetrees 
in general on these platforms.


The two extremes are:

1) Have the bootloader outright convert ADT to FDT and make Linux 
support the entirety of Apple's devicetree structure, or


2) Maintain our own devicetrees and ignore Apple's entirely

My feeling is that 1) is a non-starter, because Linux ARM device trees 
and Apple ARM device trees have seen independent evolution from the 
PowerPC era, and many details are completely different. Plus conversion 
is non-trivial, because the endianness is different and the format is 
too ambiguous to do programmatically without complex logic.


On the other hand, cranking out devicetrees by hand for every device 
variant that Apple puts out is a waste of time.


Obviously at the bare minimum the bootloader will need to move some 
dynamic information from the ADT to the FDT, but that can be a very 
specific set of properties (memory layout, MAC addresses, etc).


My current thinking is that we should write offline, automated tooling 
to parse, diff, and automatically convert portions of Apple devicetrees 
into Linux ones. Then we can more easily maintain our own, but still 
ultimately have humans decide what goes into the Linux device trees.


It's worth noting that AIUI Apple does not consider their devicetree 
layout to be stable, and it may change any time. On M1 devices, the 
devicetree is provided as part of the iBoot2 firmware bundle, which 
means it changes from one macOS version to the next (this is paired with 
the Darwin kernel itself, and they are upgraded as a unit). It includes 
placeholder values that iBoot2 then replaces with data from NOR before 
handing control over to the kernel. My goal for our long-term project 
[1] is to keep up with iBoot2 updates so that we do not have to instruct 
users to dig up old macOS versions.


Quick TL;DR on how these things boot:
- Boot ROM boots
- iBoot1 (system firmware) in NOR flash, which looks for a bootable OS in 
internal storage (only!) in the form of an APFS container+volume, and 
then boots
- iBoot2 (OS loader), which loads a bunch of firmware blobs and the 
devicetree off of storage, customizes it with system data from NOR, and 
then loads a wrapped Mach-O file containing

- A Darwin kernel, or in our case a Linux bootloader which then boots
- A standard arm64 Linux blob

The boot ROM is ROM. iBoot1 only ever rolls forward (downgrades are 
impossible). iBoot2 downgrades are possible, but Apple has already proven 
they can break this, intentionally or not, at least in betas (the macOS 
11.2 Beta2 iBoot1 cannot boot the Beta1 iBoot2). The secure boot chain 
goes all the way up to the Mach-O kernel load; that is the first point 
where we can change boot policy to load anything we want (with user 
consent).


[1] https://asahilinux.org/

--
Hector Martin "marcan" (mar...@marcan.st)
Public Key: https://mrcn.st/pub


Re: [RFC PATCH 7/7] irqchip/apple-aic: add SMP support to the Apple AIC driver.

2021-01-21 Thread Hector Martin 'marcan'

On 21/01/2021 22.00, Arnd Bergmann wrote:

Ok, makes sense. Does fast-ipi mean you cannot use the other mode at
all, or is it just faster as implied by the name? If so, how much faster?


The other mode still works, as the AIC IPI registers are still there on 
M1 and work properly. I don't have performance numbers though.


--
Hector Martin "marcan" (mar...@marcan.st)
Public Key: https://mrcn.st/pub


Re: [RFC PATCH 2/7] arm64: kernel: Add a WFI hook.

2021-01-21 Thread Hector Martin 'marcan'

I already mentioned this privately, but for the benefit of the ML:

On 21/01/2021 09.48, Mohamed Mediouni wrote:

If we explicitly use the hardware override registers for this, then
we'll be unable to use the deep sleep support provided by wfi on
Apple SoCs later on without touching Apple-specific MSRs.

Our current policy is to avoid having those modified inside the kernel
at all costs, considering it to be a job for the bootloader instead.


I don't think there is any particular reason to do this; the bootloader 
should be responsible for setting up all the chicken bits that make the 
CPU work properly, including doing so for all SMP cores before entering 
the kernel, but that's not the same thing as power management bits.


It seems entirely reasonable to me to configure WFI as clockgate only 
(so it keeps state), not touch this part of kernel code at all, and then 
in the cpuidle driver (which can come later) just do:


- switch to powerdown mode
- save state
- wfi
- restore state
- switch to clockgate mode
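
A hedged sketch of that enter path (the apple_* helpers and the
sleep-mode accessor are hypothetical placeholders, not a real API):

static int apple_deep_idle_enter(struct cpuidle_device *dev,
				 struct cpuidle_driver *drv, int index)
{
	apple_set_wfi_mode(WFI_MODE_POWERDOWN);	/* hypothetical MSR write */
	apple_save_cpu_state();			/* hypothetical */

	cpu_do_idle();				/* architectural wfi */

	apple_restore_cpu_state();		/* hypothetical */
	apple_set_wfi_mode(WFI_MODE_CLOCKGATE);	/* back to state-retaining mode */

	return index;
}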

--
Hector Martin "marcan" (mar...@marcan.st)
Public Key: https://mrcn.st/pub


Re: [PATCH v2] usb: serial: Repair FTDI FT232R bricked eeprom

2020-09-09 Thread Hector Martin &quot;marcan"



On September 10, 2020 12:46:20 PM GMT+09:00, James Hilliard 
 wrote:
>On Wed, Sep 9, 2020 at 9:17 PM Hector Martin "marcan"
> wrote:
>>
>>
>>
>> On September 10, 2020 12:02:34 PM GMT+09:00, Oliver Neukum
> wrote:
>> >Am Mittwoch, den 09.09.2020, 13:34 -0600 schrieb James Hilliard:
>> >> This patch detects and reverses the effects of the malicious FTDI
>> >> Windows driver version 2.12.00(FTDIgate).
>> >
>> >Hi,
>> >
>> >this raises questions.
>> >Should we do this unconditionally without asking?
>> >Does this belong into kernel space?
>>
>> I agree; this is very cute, but does it really need to be an
>automatic Linux feature? Presumably someone looking to fix a bricked
>FTDI chip can just run my script, and those who just want to use those
>chips with Linux already can since the driver binds to the zero PID.
>Well for one your script is not easily usable with embedded platforms
>like mine where I ran into this issue, I have no python2 interpreter
>available in my production builds.

Surely you can port the exact same algorithm to plain userspace C, as you did 
to kernel space C :)

>>
>> I am deeply amused by the idea of Linux automatically fixing problems
>caused by malicious Windows drivers, but thinking objectively, I'm not
>sure if that's the right thing to do.
>From my understanding Linux fixing up hardware issues caused
>by faulty/weird Windows drivers isn't exactly unusual.

I'm not aware of any instances like this where nonvolatile memory is modified. 
At most you'll get things like resetting devices that a previous windows warm 
boot misconfigured, I think?

>>
>> >
>> >> +static int ftdi_repair_brick(struct usb_serial_port *port)
>> >> +{
>> >> +struct ftdi_private *priv = usb_get_serial_port_data(port);
>> >> +int orig_latency;
>> >> +int rv;
>> >> +u16 *eeprom_data;
>> >> +u16 checksum;
>> >> +int eeprom_size;
>> >> +int result;
>> >> +
>> >> +switch (priv->chip_type) {
>> >> +case FT232RL:
>> >> +eeprom_size = 0x40;
>> >> +break;
>> >> +default:
>> >> +/* Unsupported for brick repair */
>> >> +return 0;
>> >> +}
>> >> +
>> >> +/* Latency timer needs to be 0x77 to unlock EEPROM
>programming */
>> >> +   if (priv->latency != 0x77) {
>> >> +orig_latency = priv->latency;
>> >> +priv->latency = 0x77;
>> >> +rv = write_latency_timer(port);
>> >> +priv->latency = orig_latency;
>> >> +if (rv < 0)
>> >> +return -EIO;
>> >> +}
>> >
>> >Do you really want to change this without returning to the original?
>> >
>> >   Regards
>> >   Oliver
>>
>> --
>> Hector Martin "marcan" (hec...@marcansoft.com)
>> Public key: https://mrcn.st/pub

-- 
Hector Martin "marcan" (hec...@marcansoft.com)
Public key: https://mrcn.st/pub


Re: [PATCH v2] usb: serial: Repair FTDI FT232R bricked eeprom

2020-09-09 Thread Hector Martin &quot;marcan"



On September 10, 2020 12:40:59 PM GMT+09:00, James Hilliard 
 wrote:
>On Wed, Sep 9, 2020 at 9:02 PM Oliver Neukum  wrote:
>>
>> Am Mittwoch, den 09.09.2020, 13:34 -0600 schrieb James Hilliard:
>> > This patch detects and reverses the effects of the malicious FTDI
>> > Windows driver version 2.12.00(FTDIgate).
>>
>> Hi,
>>
>> this raises questions.
>> Should we do this unconditionally without asking?
>Well I think since we can reliably detect devices that have been
>bricked by the malicious windows driver it's fine. I was careful to
>ensure that this will bail out and not try to change anything unless
>all
>conditions match this specific brick attack.
>> Does this belong into kernel space?
>This seemed to be by far the simplest option for embedded systems
>that need to automatically detect and repair the bricked eeproms.
>
>People seem to have plugged a bunch of counterfeit FTDI Arduino's
>that normally attach to an embedded Linux host into windows for
>some reason for a kiosk platform of mine which messed up the
>userspace detection/mappings. This seemed like the best way to
>avoid this being a support issue requiring manual unbricking
>procedures.

If you need to update the kernel on this platform anyway to use this feature, 
why not just introduce a userspace script to fix the bricked units instead?

Needing this autofixed seems like somewhat of a niche use case better served by 
a script on platforms where it might be a problem, rather than upstream kernel 
code.

>>
>> > +static int ftdi_repair_brick(struct usb_serial_port *port)
>> > +{
>> > + struct ftdi_private *priv = usb_get_serial_port_data(port);
>> > + int orig_latency;
>> > + int rv;
>> > + u16 *eeprom_data;
>> > + u16 checksum;
>> > + int eeprom_size;
>> > + int result;
>> > +
>> > + switch (priv->chip_type) {
>> > + case FT232RL:
>> > + eeprom_size = 0x40;
>> > + break;
>> > + default:
>> > + /* Unsupported for brick repair */
>> > + return 0;
>> > + }
>> > +
>> > + /* Latency timer needs to be 0x77 to unlock EEPROM
>programming */
>> > + if (priv->latency != 0x77) {
>> > + orig_latency = priv->latency;
>> > + priv->latency = 0x77;
>> > + rv = write_latency_timer(port);
>> > + priv->latency = orig_latency;
>> > + if (rv < 0)
>> > + return -EIO;
>> > + }
>>
>> Do you really want to change this without returning to the original?
>> @@ -2255,6 +2364,12 @@ static int ftdi_sio_port_probe(struct
>usb_serial_port *port)
>> ftdi_set_max_packet_size(port);
>> if (read_latency_timer(port) < 0)
>> priv->latency = 16;
>> +   vendor_id =
>le16_to_cpu(port->serial->dev->descriptor.idVendor);
>> +   product_id =
>le16_to_cpu(port->serial->dev->descriptor.idProduct);
>> +   if (vendor_id == FTDI_VID &&
>> +   product_id == FTDI_BRICK_PID &&
>> +   priv->chip_type == FT232RL)
>> +   ftdi_repair_brick(port);
>> write_latency_timer(port);
>It should get restored here.
>> create_sysfs_attrs(port);
>>
>>
>> Regards
>> Oliver
>>

-- 
Hector Martin "marcan" (hec...@marcansoft.com)
Public key: https://mrcn.st/pub


Re: [PATCH v2] usb: serial: Repair FTDI FT232R bricked eeprom

2020-09-09 Thread Hector Martin &quot;marcan"



On September 10, 2020 12:02:34 PM GMT+09:00, Oliver Neukum  
wrote:
>Am Mittwoch, den 09.09.2020, 13:34 -0600 schrieb James Hilliard:
>> This patch detects and reverses the effects of the malicious FTDI
>> Windows driver version 2.12.00(FTDIgate).
>
>Hi,
>
>this raises questions.
>Should we do this unconditionally without asking?
>Does this belong into kernel space?

I agree; this is very cute, but does it really need to be an automatic Linux 
feature? Presumably someone looking to fix a bricked FTDI chip can just run my 
script, and those who just want to use those chips with Linux already can since 
the driver binds to the zero PID.

I am deeply amused by the idea of Linux automatically fixing problems caused by 
malicious Windows drivers, but thinking objectively, I'm not sure if that's the 
right thing to do.

>
>> +static int ftdi_repair_brick(struct usb_serial_port *port)
>> +{
>> +struct ftdi_private *priv = usb_get_serial_port_data(port);
>> +int orig_latency;
>> +int rv;
>> +u16 *eeprom_data;
>> +u16 checksum;
>> +int eeprom_size;
>> +int result;
>> +
>> +switch (priv->chip_type) {
>> +case FT232RL:
>> +eeprom_size = 0x40;
>> +break;
>> +default:
>> +/* Unsupported for brick repair */
>> +return 0;
>> +}
>> +
>> +/* Latency timer needs to be 0x77 to unlock EEPROM programming */
>> +if (priv->latency != 0x77) {
>> +orig_latency = priv->latency;
>> +priv->latency = 0x77;
>> +rv = write_latency_timer(port);
>> +priv->latency = orig_latency;
>> +    if (rv < 0)
>> +return -EIO;
>> +}
>
>Do you really want to change this without returning to the original?
>
>   Regards
>   Oliver

-- 
Hector Martin "marcan" (hec...@marcansoft.com)
Public key: https://mrcn.st/pub


Re: [PATCH] firewire-ohci: work around oversized DMA reads on JMicron controllers

2018-01-10 Thread Hector Martin 'marcan'
On 2017-11-13 06:05, Stefan Richter wrote:
> Thanks Hector for the troubleshooting and for the patch.
> Thanks Clemens for the review.
> 
> It's been a while since I last reviewed and tested kernel patches, and
> also my main FireWire equipped PC is currently tied up in work for which
> reboots aren't desirable.  But I am updating a long unused secondary
> FireWire'd PC right now and give the patch some testing this week.  (This
> one even has a JMicron controller, but not an IOMMU.)
> 
Hi Stefan, did you ever get around to testing the patch? It doesn't seem
to have made it into any trees or percolated up to mainline yet.

-- 
Hector Martin "marcan" (mar...@marcan.st)
Public Key: https://mrcn.st/pub


Re: [PATCH] [v2] x86/doc: add PTI description

2018-01-04 Thread Hector Martin 'marcan'
On 2018-01-05 09:24, Dave Hansen wrote:
> + Not specifying this option nothing is equivalent to
> + pti=auto.

-nothing

> +Page Table Isolation (pti, previously known as KAISER[1]) is a
> +countermeasure against attacks on kernel address information such
> +as the "Meltdown" approach[2].

It's not really just address information, but any data. Maybe "attacks
that leak kernel memory"?

> +To avoid leaking address information, we create an new, independent

Same issue here. Also an -> a.

> +copy of the page tables which are used only when running userspace

are -> is. The copy is singular.

> +applications.  When the kernel is entered via syscalls, interrupts or
> +exceptions, page tables are switched to the full "kernel" copy.  When

"the page tables".

> +crippled by setting the NX bit in the top level.  This ensures
> +that if a kernel->user CR3 switch is missed that userspace will
> +crash immediately upon executing its first instruction.

"that userspace" -> "then userspace"

-- 
Hector Martin "marcan" (mar...@marcan.st)
Public Key: https://mrcn.st/pub


Re: Bricked x86 CPU with software?

2018-01-04 Thread Hector Martin 'marcan'
On 2018-01-05 10:21, Tim Mouraveiko wrote:
>> On Thu 2018-01-04 14:13:56, Tim Mouraveiko wrote:
>> Actually... I don't think your code works. That's why I'm curious. But
>> if it works, its rather a big news... and I'm sure Intel and cloud
>> providers are going to be interested.
>>
> 
> I first discovered this issue over a year ago, quite by accident. I changed 
> the code I was 
> working on so as not to kill the CPU (as that is not what I was trying to). 
> We made Intel aware 
> of it. They didn't care much, one of their personnel suggesting that they 
> already knew about it 
> (whether this is true or not I couldn't say). It popped up again later, so I 
> had to fix the code 
> again. It could be a buggy implementation of a certain x86 functionality, but 
> I left it at that 
> because I had better things to do with my time.
> 
> Now this news came up about meltdown and spectre and I was curious if anyone 
> else had 
> experienced a dead CPU by software, too. Meltdown and spectre are undeniably 
> a problem, 
> but the magnitude and practicality of it is questionable.
> 
> I suspect that what I discovered is either a kill switch, an unintentional 
> flaw that was 
> implemented at the time the original feature was built into x86 functionality 
> and kept 
> propagating through successive generations of processors, or could well be 
> that I have a 
> very destructive and targeted solar flare that is after my CPUs. So, I 
> figured I would put the 
> question out there, to see if anyone else had a similar experience. Putting 
> the solar flare idea 
> aside, I can't conclusively say whether it is a flaw or a feature. Both 
> options are supported at 
> this time by my observations of the CPU behavior.
> 

If you made Intel aware of the issue a year ago, and they weren't
interested, then the responsible thing to do is disclose the problem
publicly. This is a security issue (if trusted code can brick a CPU,
it's an issue for bare metal hosting providers; if untrusted code can
brick a CPU, it's a *huge* issue for every cloud provider and many, many
others who run code in various sandboxes). If the vendor is not
receptive to coordinated disclosure, the only option is public
disclosure to at least make people aware of the problem and allow for
mitigations to be developed, if possible.

Personally, I would be very interested in seeing such code. We've seen
several ways to brick nonvolatile firmware (writable BIOSes, bad CMOS
data, etc.), but bricking a CPU is a first. The only way that can happen
is either blowing a kill fuse, or causing actual hardware damage, since
CPUs have no nonvolatile memory other than fuses. Either way this would
be a very interesting result.

-- 
Hector Martin "marcan" (mar...@marcan.st)
Public Key: https://mrcn.st/pub


Re: [kernel-hardening] Re: vDSO maximum stack usage, stack probes, and -fstack-check

2017-11-11 Thread Hector Martin 'marcan'
On 2017-11-12 13:21, Andy Lutomirski wrote:
>> Only because Go is not C and is not compiled like this. If all the code
>> is GCC-compiled C code and built with -fstack-check, it should always
>> probe stack pages in order except for potentially the second page in the
>> stack, which may be touched after the third page (but hopefully your
>> stack is at least two pages long to begin with).
> 
> If you're generating code to improve stack overflow, assuming that you
> have at least two pages left seems like an *awful* assumption to make.

If all the code is compiled with the option, then it guarantees you have
at least two pages left, as long as you have at least two pages when the
program/thread starts.

> Gentoo Hardened should seriously consider turning it back off.  Do you
> happen to know what exactly Gentoo does to cause the vdso to get build
> with -fstack-check?  I'll write a patch to either fail the build or to
> force it off.

So you're saying Gentoo Hardened should turn off an exploit mitigation
that, though imperfect, works in the vast majority of cases and seems to
have caused a grand total of one [1] bug in a package so far (two if you
count the one I found)? That seems completely silly to me. I'm sure once
GCC 8 is out with dedicated stack clash protection they'll switch to
that, but in the meantime -fstack-check seems like a perfectly
reasonable idea.

Anyway, they just add it to the default specs for gcc. You can turn it
back off with -fstack-check=no.

You still haven't addressed the important question, though: should vDSO
be *documented* to consume a certain maximum amount of stack space? If
not, this whole thing is pointless since vDSO would be fine as it is. If
yes, then -fstack-check=no isn't the only thing you have to worry about;
more options to limit stack frame size, and perhaps code changes to
inline everything, might be appropriate.

[1] https://bugs.gentoo.org/637152

-- 
Hector Martin "marcan" (mar...@marcan.st)
Public Key: https://mrcn.st/pub


Re: [kernel-hardening] Re: vDSO maximum stack usage, stack probes, and -fstack-check

2017-11-10 Thread Hector Martin 'marcan'
On 2017-11-11 07:04, Andy Lutomirski wrote:
>> On Nov 10, 2017, at 8:36 AM, Hector Martin 'marcan' <mar...@marcan.st> wrote:
>>
>>> On 2017-11-11 01:02, Hector Martin 'marcan' wrote:
>>> Not entirely sure what's going on here.
>>
>> Actually, if you think about it, it doesn't matter that it skips the
>> first page, since it's probing one page more. That just means the caller
>> will have probed the previous page. So ultimately you're just probing
>> ahead of where you need to, but that should be OK.
>>
> 
> The whole point is to touch the stack pages in order.  Also, I see no
> guarantee that the function would touch the intermediate page before
> clobbering the probed page.  You're seeing exactly that behavior, in
> fact.

Only because Go is not C and is not compiled like this. If all the code
is GCC-compiled C code and built with -fstack-check, it should always
probe stack pages in order except for potentially the second page in the
stack, which may be touched after the third page (but hopefully your
stack is at least two pages long to begin with).
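
To make that ordering concrete, here's a quick hand-written sketch of
the invariant (mine, not GCC's actual codegen; it assumes 4 KiB pages
and a downward-growing stack, with alloca standing in for a large
frame):

#include <alloca.h>
#include <stddef.h>
#include <stdio.h>

#define PAGE 4096

int main(void)
{
    size_t frame = 3 * PAGE;        /* a large, initially untouched frame */
    char *p = alloca(frame);

    /* Touch one byte per page, highest address first, i.e. in stack
     * growth order. An undersized stack then faults on its guard page
     * before any page beyond it can be written. */
    for (size_t off = 0; off < frame; off += PAGE)
        ((volatile char *)p)[frame - 1 - off] = 0;

    puts("frame probed in order");
    return 0;
}

If every function maintains that invariant, a callee can assume its
caller never skipped past the guard page, which is where the two-page
guarantee above comes from.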

AIUI -fstack-check was not intended for stack clash protection (the
latter isn't even in a GCC release yet), but in most circumstances it
seems to me like it's an effective mitigation if all code is compiled
with it. Qualys mentioned it as such in their advisory. This is probably
why Gentoo Hardened enables it by default globally in their toolchain.

-- 
Hector Martin "marcan" (mar...@marcan.st)
Public Key: https://mrcn.st/pub


Re: [kernel-hardening] Re: vDSO maximum stack usage, stack probes, and -fstack-check

2017-11-10 Thread Hector Martin 'marcan'
On 2017-11-11 01:02, Hector Martin 'marcan' wrote:
> Not entirely sure what's going on here.

Actually, if you think about it, it doesn't matter that it skips the
first page, since it's probing one page more. That just means the caller
will have probed the previous page. So ultimately you're just probing
ahead of where you need to, but that should be OK.

-- 
Hector Martin "marcan" (mar...@marcan.st)
Public Key: https://mrcn.st/pub


Re: [kernel-hardening] Re: vDSO maximum stack usage, stack probes, and -fstack-check

2017-11-10 Thread Hector Martin 'marcan'
On 2017-11-10 23:57, Andy Lutomirski wrote:
> This code is so wrong I don't even no where to start.  Seriously, sub,
> orq, add?  How about just orq with an offset?  How about a *load*
> instead of a store?

Stores should be cheaper than loads (since they don't stall), but
apparently the rationale for using orq is:

gcc/config/i386/i386.md: ;; Use IOR for stack probes, this is shorter.

Saves bytes I guess? Though being read-modify-write it probably hurts
performance; I don't know what real CPUs would do with it.

I suspect the sub, add is there to guarantee that the stack pointer is
actually below the probed location. IIRC the x86-64 ABI specifies a
128-byte redzone that you can freely mess with; going beyond that would
require actually changing the stack pointer.
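
For what it's worth, the red zone behavior is easy to see from C. A
small sketch (mine; it assumes the SysV x86-64 ABI, and "gcc -O1 -S"
to inspect the output):

/* A leaf function whose locals fit in 128 bytes typically gets no
 * "sub %rsp" at all: the array below lives at negative offsets from
 * %rsp, inside the red zone. A probe at -0x1020(%rsp) would be far
 * outside that area, which fits the theory that the sub/add pair is
 * there to put %rsp below the probed word. */
long sum_leaf(const long *in)
{
    volatile long tmp[8];           /* 64 bytes: well within the red zone */
    for (int i = 0; i < 8; i++)
        tmp[i] = in[i];
    long s = 0;
    for (int i = 0; i < 8; i++)
        s += tmp[i];
    return s;
}

(Writing below %rsp-128 without moving %rsp would be unsafe anyway,
since the kernel only spares the 128-byte red zone when it pushes a
signal frame.)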

> But stepping back even further, an offset > 4096 is just bogus.
> That's big enough to skip right over the guard page.

The code (gcc/config/i386/i386.c) says:

  /* We skip the probe for the first interval + a small dope of 4 words
 and probe that many bytes past the specified size to maintain a
 protection area at the botton of the stack.  */

Not entirely sure what's going on here.

OTOH I'm not sure why it's probing at all, since AIUI it only needs to
probe for stack frames >4k to begin with.

> Anyway, my recollection is that GCC's stack check code is busted until
> much newer gcc versions.  I suppose we could try to make the kernel
> fail to build at all on a broken configuration like this.

Well, the original point still stands. Even if what GCC is doing is
stupid here, it's not illegal (it's just eating stack space), and the
kernel still currently makes no guarantees about that. So I think the
conversation regarding vDSO stack usage guarantees is still worth having.

-- 
Hector Martin "marcan" (mar...@marcan.st)
Public Key: https://mrcn.st/pub


vDSO maximum stack usage, stack probes, and -fstack-check

2017-11-10 Thread Hector Martin 'marcan'
As far as I know, the vDSO specs (both Documentation/ABI/stable/vdso and
`man 7 vdso`) make no mention of how much stack the vDSO functions are
allowed to use. They just say "the usual C ABI", which makes no guarantees.

It turns out that Go has been assuming that those functions use less
than 104 bytes of stack space, because it calls them directly on its
tiny stack allocations with no guard pages or other hardware overflow
protection [1]. On most systems, this is fine.

However, on my system the stars aligned and turned it into a
nondeterministic crash. I use Gentoo Hardened, which builds its
toolchain with -fstack-check on by default. It turns out that with the
combination of GCC 6.4.0, -fstack-protect, linux-4.13.9-gentoo, and
CONFIG_OPTIMIZE_INLINING=n, gcc decides to *not* inline vread_tsc (it's
not marked inline, so it's perfectly within its right not to do that,
though for some reason it does inline when CONFIG_OPTIMIZE_INLINING=y
even though that nominally gives it greater freedom *not* to inline
things marked inline). That turns __vdso_clock_gettime and
__vdso_gettimeofday into non-leaf functions, and GCC then inserts a
stack probe (full objdump at [2]):

0030 <__vdso_clock_gettime>:
  30:   48 89 e5                push   %rbp
  31:   48 89 e5                mov    %rsp,%rbp
  34:   48 81 ec 20 10 00 00    sub    $0x1020,%rsp
  3b:   48 83 0c 24 00          orq    $0x0,(%rsp)
  40:   48 81 c4 20 10 00 00    add    $0x1020,%rsp

That silently overflows the Go stack. "orq 0" does nothing as long as
the page is mapped, but it's not atomic. It turns out that sometimes
(pretty often on my box) that races another thread accessing the same
location and corrupts memory.
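
If anyone wants to watch that race without involving Go, here's a
standalone sketch (mine; it assumes x86-64 with GCC-style inline asm,
built with "gcc -O2 -pthread", and it may need a few runs on a
multicore box to trip):

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

static volatile uint64_t word;  /* stands in for the word under Go's stack */
static volatile int done;

/* Mimics the probe: a non-atomic read-modify-write of someone else's
 * memory. Without a lock prefix, orq is a separate load and store. */
static void *prober(void *arg)
{
    (void)arg;
    while (!done)
        __asm__ volatile("orq $0x0, %0" : "+m"(word));
    return NULL;
}

/* The legitimate owner: only ever writes increasing values. */
static void *writer(void *arg)
{
    (void)arg;
    for (uint64_t i = 1; i <= 100000000ULL; i++)
        word = i;
    done = 1;
    return NULL;
}

int main(void)
{
    pthread_t p, w;
    uint64_t last = 0, lost = 0;

    pthread_create(&p, NULL, prober, NULL);
    pthread_create(&w, NULL, writer, NULL);

    /* With a single increasing writer, any decrease seen here can only
     * be a stale value stored back by the "probe". */
    while (!done) {
        uint64_t v = word;
        if (v < last)
            lost++;
        last = v;
    }
    pthread_join(p, NULL);
    pthread_join(w, NULL);
    printf("stale store-backs observed: %llu\n", (unsigned long long)lost);
    return 0;
}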

The stack probe sounds unnecessary, since it only calls vread_tsc and
that can't ever skip over more than a page of stack. In fact I don't
even know why it does the probe; I thought the point of stack probes was
to poke the stack on allocations >4K to ensure the guard page isn't
skipped, but none of these functions use more than a few bytes of stack
space. Nonetheless, none of this is wrong per se; the current vDSO spec
makes no guarantees about stack usage.

The question is, should it? Should the vDSO spec set a hard limit on
stack consumption that userspace can rely on, and perhaps inline
everything and/or disable -fstack-check to avoid the stack probes?

[1] https://github.com/golang/go/issues/20427#issuecomment-343255844
[2] https://marcan.st/paste/HCVuLG6T.txt

-- 
Hector Martin "marcan" (mar...@marcan.st)
Public Key: https://mrcn.st/pub


Re: [PATCH v1 1/3] Add the latent_entropy gcc plugin

2016-05-30 Thread Hector Martin "marcan"
On 2016-05-31 00:40, Kees Cook wrote:
> On Sun, May 29, 2016 at 8:46 PM, Hector Martin "marcan"
> <mar...@marcan.st> wrote:
>> On 2016-05-30 11:16, Kees Cook wrote:
>>> On Sun, May 29, 2016 at 10:59 AM, Hector Martin <mar...@marcan.st> wrote:
>>>> On Mon, May 23, 2016 at 3:15 PM, Emese Revfy <re.em...@gmail.com> wrote:
>>>>> +/*
>>>>> + * Copyright 2012-2016 by the PaX Team <pagee...@freemail.hu>
>>>>> + * Copyright 2016 by Emese Revfy <re.em...@gmail.com>
>>>>> + * Licensed under the GPL v2
>>>>> + *
>>>>> + * Note: the choice of the license means that the compilation process is
>>>>> + *   NOT 'eligible' as defined by gcc's library exception to the GPL v3,
>>>>> + *   but for the kernel it doesn't matter since it doesn't link against
>>>>> + *   any of the gcc libraries
>>>>> + *
>>>>> + * gcc plugin to help generate a little bit of entropy from program state,
>>>>> + * used throughout the uptime of the kernel
>>>>
>>>> The "Note" seems misleading. Since this is a GCC plugin, and directly
>>>> uses GCC's internal interfaces, doesn't that make it a derived work of
>>>> GCC, and thus, require that it be licensed under GPLv3 instead of GPLv2
>>>> (which is incompatible)?
>>>>
>>>> AFAIK this is how the GPLv3 works in this context, and the GCC exception
>>>> doesn't change that because it only applies to libgcc and friends (and
>>>> does not weaken the default effects of the GPL over the rest of GCC). My
>>>> understanding is that the whole "eligible compilation" licensing hack
>>>> was designed to hinder non-linking proprietary compilation passes that
>>>> operate over data files containing an internal GCC representation, but
>>>> plain old loaded plugins still need to be GPLv3 regardless of whether
>>>> you link the end result to libgcc or not.
>>>>
>>>> (Also, don't some arches link against libgcc, further complicating this?
>>>> Trying to use this compiler plugin with those arches would wind up with
>>>> non-redistributable kernels, this time due to the exception.)
>>>
>>> IANAL. My interpretation is that plugins can be whatever license they
>>> want to be, and if they declare that they're GPL-compatible (which
>>> GPLv2 is), then the produced output can be under whatever license it
>>> wants. Regardless, things are clearly following the intended purposes:
>>> the plugin is free software, used to help gcc compile free software,
>>> so no weird proprietary source, steps, or outputs, which is what the
>>> gcc folks were trying to make sure continued to happen.
>>
>> The first problem isn't the output, it's the plugin itself. The FSF is
>> clear on their interpretation that plugins need to be licensed under a
>> compatible license under this kind of scenario:
>>
>> http://www.gnu.org/licenses/gpl-faq.en.html#GPLAndPlugins
>>
>> Since the FSF is the copyright owner for GCC, I'd say that makes a
>> pretty clear case that the plugin needs to be GPLv3 (they don't specify
>> v2 vs. v3 in that FAQ, but since the licenses are incompatible, GPLv2 is
>> as good as proprietary in the eyes of GPLv3).
> 
> "If the program dynamically links plug-ins, and they make function
> calls to each other and share data structures, we believe they form a
> single program, which must be treated as an extension of both the main
> program and the plug-ins. This means you must license the plug-in
> under the GPL or a GPL-compatible free software license and distribute
> it with source code in a GPL-compliant way."
> 
> I would point out "we believe" followed up with "GPL-compatible" (not
> GPLv3-compatible, and the FAQ does distinguish between them in other
> places). And in
> http://www.gnu.org/licenses/gpl-faq.html#WhatDoesCompatMean they
> appear to be all-inclusive about GPL versions.

That GPLv2 and GPLv3 are not compatible is pretty well established:
http://www.gnu.org/licenses/gpl-faq.en.html#v2v3Compatibility

Ultimately, though, the license is what matters. And the GPLv3 license
is pretty clear:

> For example, Corresponding Source includes interface definition files
> associated with source files for the work

Thus, GCC internal interfaces are covered by the GPLv3.

> To “modify” a work means to copy from or adapt all or part of the work
> in a fashion requiring copyright permission, other than the making of an
> exact copy. The resulting work is called a “modified version”

Re: [PATCH v1 1/3] Add the latent_entropy gcc plugin

2016-05-29 Thread Hector Martin "marcan"
On 2016-05-30 11:16, Kees Cook wrote:
> On Sun, May 29, 2016 at 10:59 AM, Hector Martin <mar...@marcan.st> wrote:
>> On Mon, May 23, 2016 at 3:15 PM, Emese Revfy <re.em...@gmail.com> wrote:
>>> +/*
>>> + * Copyright 2012-2016 by the PaX Team <pagee...@freemail.hu>
>>> + * Copyright 2016 by Emese Revfy <re.em...@gmail.com>
>>> + * Licensed under the GPL v2
>>> + *
>>> + * Note: the choice of the license means that the compilation process is
>>> + *   NOT 'eligible' as defined by gcc's library exception to the GPL v3,
>>> + *   but for the kernel it doesn't matter since it doesn't link against
>>> + *   any of the gcc libraries
>>> + *
>>> + * gcc plugin to help generate a little bit of entropy from program state,
>>> + * used throughout the uptime of the kernel
>>
>> The "Note" seems misleading. Since this is a GCC plugin, and directly
>> uses GCC's internal interfaces, doesn't that make it a derived work of
>> GCC, and thus, require that it be licensed under GPLv3 instead of GPLv2
>> (which is incompatible)?
>>
>> AFAIK this is how the GPLv3 works in this context, and the GCC exception
>> doesn't change that because it only applies to libgcc and friends (and
>> does not weaken the default effects of the GPL over the rest of GCC). My
>> understanding is that the whole "eligible compilation" licensing hack
>> was designed to hinder non-linking proprietary compilation passes that
>> operate over data files containing an internal GCC representation, but
>> plain old loaded plugins still need to be GPLv3 regardless of whether
>> you link the end result to libgcc or not.
>>
>> (Also, don't some arches link against libgcc, further complicating this?
>> Trying to use this compiler plugin with those arches would wind up with
>> non-redistributable kernels, this time due to the exception.)
> 
> IANAL. My interpretation is that plugins can be whatever license they
> want to be, and if they declare that they're GPL-compatible (which
> GPLv2 is), then the produced output can be under whatever license it
> wants. Regardless, things are clearly following the intended purposes:
> the plugin is free software, used to help gcc compile free software,
> so no weird proprietary source, steps, or outputs, which is what the
> gcc folks were trying to make sure continued to happen.

The first problem isn't the output, it's the plugin itself. The FSF is
clear on their interpretation that plugins need to be licensed under a
compatible license under this kind of scenario:

http://www.gnu.org/licenses/gpl-faq.en.html#GPLAndPlugins

Since the FSF is the copyright owner for GCC, I'd say that makes a
pretty clear case that the plugin needs to be GPLv3 (they don't specify
v2 vs. v3 in that FAQ, but since the licenses are incompatible, GPLv2 is
as good as proprietary in the eyes of GPLv3).

As for the output, unfortunately, GPLv2 is not "GPL-compatible". This is
defined here:

http://www.gnu.org/licenses/gcc-exception-3.1.en.html

> "GPL-compatible Software" is software whose conditions of propagation,
> modification and use would permit combination with GCC in accord with
> the license of GCC.

It's one of those unfortunate cases where a term defined in the legalese
doesn't match the obvious interpretation, but in this context,
"GPL-compatioble" means "GPLv3-compatible", and GPLv2 isn't - so the
plugin combined with GCC does not for an "Eligible Compilation Process",
and that means that if it is used to compile software that links with
libgcc, that software has to be GPLv3-compatible - which the kernel
isn't. AFAICT, that makes any kernels built for arches which link libgcc
and which use this plugin non-redistributable. There is no issue with
the source code form, as the kernel does not explicitly invoke libgcc
interfaces, but the binary would contain a mix of GPLv2-only and
GPLv3-only-exception-does-not-apply code.

IANAL, but regardless of whether this is all free software anyway,
license compliance is important. If all free software were to be treated
equally, we wouldn't have choices like GPLv2 vs GPLv3 vs BSD vs MIT to
begin with.

-- 
Hector Martin "marcan" (mar...@marcan.st)
Public Key: http://www.marcansoft.com/marcan.asc

