from:"Peter Hurley"

Re: console issue since 3.6, console=ttyS1 hangs

2016-11-04 Thread Peter Hurley

On Fri, Nov 4, 2016 at 3:33 PM, Nathan Zimmer <nzim...@sgi.com> wrote:
> On Thu, Nov 03, 2016 at 06:25:46PM -0600, Peter Hurley wrote:
>> On Wed, Nov 2, 2016 at 9:29 AM, Nathan Zimmer <nzim...@sgi.com> wrote:
>> > On Mon, Oct 31, 2016 at 08:55:49PM -0600, Peter Hurley wrote:
>> >> On Mon, Oct 31, 2016 at 2:27 PM, Sean Young <s...@mess.org> wrote:
>> >> > On Sun, Oct 30, 2016 at 10:33:02AM -0500, Nathan wrote:
>> >> >> I think this should be PNP0501 instead of PNP0c02.
>> >> >> Once I alter that then when I boot the serial comes up on irq 3. 
>> >> >> However it
>> >> >> still hangs.
>> >> >> I'll keep digging.
>> >> >
>> >> > Well that's that theory out of the window. I'm not sure where to look 
>> >> > now,
>> >> > I would start by enabling as many as possible of the "kernel hacking" 
>> >> > config
>> >> > options and see if anything gets caught.
>> >> >
>> >> > Looking at your earlier messages, you have a collection of percpu 
>> >> > allocation
>> >> > failures. That might be worth resolving before anything else.
>> >>
>> >> Hi Nathan,
>> >>
>> >> Couple of questions:
>> >> 1. Was login over serial console setup and working on SLES 11?  or was
>> >> the 'console=ttyS1' only for debug output?
>> >> I ask because console output doesn't use IRQs; iow, maybe the serial
>> >> port w/ driver never actually worked.
>> >> 2. Can you post dmesg for the SLES 11 setup?  That would show if there
>> >> were probe errors even on that.
>> >>
>> >> An alternative that should be equivalent to your previous setup is to
>> >> build w/ CONFIG_SERIAL_8250_PNP=n
>> >> Seems like your ACPI BIOS is buggy, but also that something else is using 
>> >> IRQ 3?
>> >>
>> >> Regards,
>> >> Peter Hurley
>> >
>> >
>> >
>> > 1) Yes I can confirm I used it to login sometimes.
>> >
>> > I built with CONFIG_SERIAL_8250_PNP=n and that seemed to work better, in 
>> > that the system did not hang.
>> > However I couldn't login on the serial and got these error messages, I 
>> > suspect I broke something while trying different permutations.
>> >
>> > gdm[5206]: WARNING: GdmDisplay: display lasted 0.136636 seconds
>> > gdm[5206]: WARNING: GdmDisplay: display lasted 0.180955 seconds
>> > gdm[5206]: WARNING: GdmDisplay: display lasted 0.161415 seconds
>> > gdm[5206]: WARNING: GdmLocalDisplayFactory: maximum number of X display 
>> > failures reached: check X server log for errors
>> >
>> > It did boot all the way though.
>> >
>> > 2) attached log
>>
>> So I'm confused where this leaves us.
>>
>> In your OP, you claim to have gotten it working with a partial revert
>> of commit 835d844d1a28 (but you didn't attach the partial revert so no
>> one knows what you did); however, my suggestion should have been
>> equivalent.
>
> I apologize, if I was unclear.  Your suggestion of CONFIG_SERIAL_8250_PNP=n 
> did successfully boot and provide messages
> across the console, and yes is basically equivelent to the revert.

Ok, so the partial revert didn't get the login working then?

> Those warnings I just noticed in the dmesg and they weren't there before.
>
>>
>> Note that you have the serial port disabled in BIOS; that's why you're
>> getting the probe error for PNP.
>
> Now when you say its diabled in bios, how can I be sure and double check that?

Well, the ACPI BIOS is reporting it as disabled. Even the SLES11 log says:

[2.136899] pnp 00:04: Plug and Play ACPI device, IDs PNP0501 (disabled)


> These bios screens do not have any mention of PNP settings.
> I am getting output over the console (via ipmi) until the boot hangs.

Yeah, probably the device actually decodes io address access anyway,
but in the disabled state probably has not routed IRQ.

I have no idea how to help you with the bios, sorry.

Regards,
Peter Hurley

Re: console issue since 3.6, console=ttyS1 hangs

2016-11-04 Thread Peter Hurley

On Fri, Nov 4, 2016 at 3:33 PM, Nathan Zimmer  wrote:
> On Thu, Nov 03, 2016 at 06:25:46PM -0600, Peter Hurley wrote:
>> On Wed, Nov 2, 2016 at 9:29 AM, Nathan Zimmer  wrote:
>> > On Mon, Oct 31, 2016 at 08:55:49PM -0600, Peter Hurley wrote:
>> >> On Mon, Oct 31, 2016 at 2:27 PM, Sean Young  wrote:
>> >> > On Sun, Oct 30, 2016 at 10:33:02AM -0500, Nathan wrote:
>> >> >> I think this should be PNP0501 instead of PNP0c02.
>> >> >> Once I alter that then when I boot the serial comes up on irq 3. 
>> >> >> However it
>> >> >> still hangs.
>> >> >> I'll keep digging.
>> >> >
>> >> > Well that's that theory out of the window. I'm not sure where to look 
>> >> > now,
>> >> > I would start by enabling as many as possible of the "kernel hacking" 
>> >> > config
>> >> > options and see if anything gets caught.
>> >> >
>> >> > Looking at your earlier messages, you have a collection of percpu 
>> >> > allocation
>> >> > failures. That might be worth resolving before anything else.
>> >>
>> >> Hi Nathan,
>> >>
>> >> Couple of questions:
>> >> 1. Was login over serial console setup and working on SLES 11?  or was
>> >> the 'console=ttyS1' only for debug output?
>> >> I ask because console output doesn't use IRQs; iow, maybe the serial
>> >> port w/ driver never actually worked.
>> >> 2. Can you post dmesg for the SLES 11 setup?  That would show if there
>> >> were probe errors even on that.
>> >>
>> >> An alternative that should be equivalent to your previous setup is to
>> >> build w/ CONFIG_SERIAL_8250_PNP=n
>> >> Seems like your ACPI BIOS is buggy, but also that something else is using 
>> >> IRQ 3?
>> >>
>> >> Regards,
>> >> Peter Hurley
>> >
>> >
>> >
>> > 1) Yes I can confirm I used it to login sometimes.
>> >
>> > I built with CONFIG_SERIAL_8250_PNP=n and that seemed to work better, in 
>> > that the system did not hang.
>> > However I couldn't login on the serial and got these error messages, I 
>> > suspect I broke something while trying different permutations.
>> >
>> > gdm[5206]: WARNING: GdmDisplay: display lasted 0.136636 seconds
>> > gdm[5206]: WARNING: GdmDisplay: display lasted 0.180955 seconds
>> > gdm[5206]: WARNING: GdmDisplay: display lasted 0.161415 seconds
>> > gdm[5206]: WARNING: GdmLocalDisplayFactory: maximum number of X display 
>> > failures reached: check X server log for errors
>> >
>> > It did boot all the way though.
>> >
>> > 2) attached log
>>
>> So I'm confused where this leaves us.
>>
>> In your OP, you claim to have gotten it working with a partial revert
>> of commit 835d844d1a28 (but you didn't attach the partial revert so no
>> one knows what you did); however, my suggestion should have been
>> equivalent.
>
> I apologize, if I was unclear.  Your suggestion of CONFIG_SERIAL_8250_PNP=n 
> did successfully boot and provide messages
> across the console, and yes is basically equivelent to the revert.

Ok, so the partial revert didn't get the login working then?

> Those warnings I just noticed in the dmesg and they weren't there before.
>
>>
>> Note that you have the serial port disabled in BIOS; that's why you're
>> getting the probe error for PNP.
>
> Now when you say its diabled in bios, how can I be sure and double check that?

Well, the ACPI BIOS is reporting it as disabled. Even the SLES11 log says:

[2.136899] pnp 00:04: Plug and Play ACPI device, IDs PNP0501 (disabled)


> These bios screens do not have any mention of PNP settings.
> I am getting output over the console (via ipmi) until the boot hangs.

Yeah, probably the device actually decodes io address access anyway,
but in the disabled state probably has not routed IRQ.

I have no idea how to help you with the bios, sorry.

Regards,
Peter Hurley

Re: more hangs in the tty layer

2016-11-03 Thread Peter Hurley

truct: http://paste.ubuntu.com/23273973/
>
> To me this seems like a lost wakeup event, especially that I've
> seen patches flying around fixing such issues. Unfortunately I
> cannot reliably reproduce this and booting a production machine
> with lockdep would be problematic.

Too bad because lockdep would definitively tell you what process is
holding the ldsem for this tty.
If you paste *all the tasks*, I can try to see if I recognize the task
stack that is holding the ldsem.

Regards,
Peter Hurley

Re: more hangs in the tty layer

2016-11-03 Thread Peter Hurley

u.com/23273973/
>
> To me this seems like a lost wakeup event, especially that I've
> seen patches flying around fixing such issues. Unfortunately I
> cannot reliably reproduce this and booting a production machine
> with lockdep would be problematic.

Too bad because lockdep would definitively tell you what process is
holding the ldsem for this tty.
If you paste *all the tasks*, I can try to see if I recognize the task
stack that is holding the ldsem.

Regards,
Peter Hurley

Re: console issue since 3.6, console=ttyS1 hangs

2016-11-03 Thread Peter Hurley

On Wed, Nov 2, 2016 at 9:29 AM, Nathan Zimmer <nzim...@sgi.com> wrote:
> On Mon, Oct 31, 2016 at 08:55:49PM -0600, Peter Hurley wrote:
>> On Mon, Oct 31, 2016 at 2:27 PM, Sean Young <s...@mess.org> wrote:
>> > On Sun, Oct 30, 2016 at 10:33:02AM -0500, Nathan wrote:
>> >> I think this should be PNP0501 instead of PNP0c02.
>> >> Once I alter that then when I boot the serial comes up on irq 3. However 
>> >> it
>> >> still hangs.
>> >> I'll keep digging.
>> >
>> > Well that's that theory out of the window. I'm not sure where to look now,
>> > I would start by enabling as many as possible of the "kernel hacking" 
>> > config
>> > options and see if anything gets caught.
>> >
>> > Looking at your earlier messages, you have a collection of percpu 
>> > allocation
>> > failures. That might be worth resolving before anything else.
>>
>> Hi Nathan,
>>
>> Couple of questions:
>> 1. Was login over serial console setup and working on SLES 11?  or was
>> the 'console=ttyS1' only for debug output?
>> I ask because console output doesn't use IRQs; iow, maybe the serial
>> port w/ driver never actually worked.
>> 2. Can you post dmesg for the SLES 11 setup?  That would show if there
>> were probe errors even on that.
>>
>> An alternative that should be equivalent to your previous setup is to
>> build w/ CONFIG_SERIAL_8250_PNP=n
>> Seems like your ACPI BIOS is buggy, but also that something else is using 
>> IRQ 3?
>>
>> Regards,
>> Peter Hurley
>
>
>
> 1) Yes I can confirm I used it to login sometimes.
>
> I built with CONFIG_SERIAL_8250_PNP=n and that seemed to work better, in that 
> the system did not hang.
> However I couldn't login on the serial and got these error messages, I 
> suspect I broke something while trying different permutations.
>
> gdm[5206]: WARNING: GdmDisplay: display lasted 0.136636 seconds
> gdm[5206]: WARNING: GdmDisplay: display lasted 0.180955 seconds
> gdm[5206]: WARNING: GdmDisplay: display lasted 0.161415 seconds
> gdm[5206]: WARNING: GdmLocalDisplayFactory: maximum number of X display 
> failures reached: check X server log for errors
>
> It did boot all the way though.
>
> 2) attached log

So I'm confused where this leaves us.

In your OP, you claim to have gotten it working with a partial revert
of commit 835d844d1a28 (but you didn't attach the partial revert so no
one knows what you did); however, my suggestion should have been
equivalent.

Note that you have the serial port disabled in BIOS; that's why you're
getting the probe error for PNP.

Regards,
Peter Hurley

Re: console issue since 3.6, console=ttyS1 hangs

2016-11-03 Thread Peter Hurley

On Wed, Nov 2, 2016 at 9:29 AM, Nathan Zimmer  wrote:
> On Mon, Oct 31, 2016 at 08:55:49PM -0600, Peter Hurley wrote:
>> On Mon, Oct 31, 2016 at 2:27 PM, Sean Young  wrote:
>> > On Sun, Oct 30, 2016 at 10:33:02AM -0500, Nathan wrote:
>> >> I think this should be PNP0501 instead of PNP0c02.
>> >> Once I alter that then when I boot the serial comes up on irq 3. However 
>> >> it
>> >> still hangs.
>> >> I'll keep digging.
>> >
>> > Well that's that theory out of the window. I'm not sure where to look now,
>> > I would start by enabling as many as possible of the "kernel hacking" 
>> > config
>> > options and see if anything gets caught.
>> >
>> > Looking at your earlier messages, you have a collection of percpu 
>> > allocation
>> > failures. That might be worth resolving before anything else.
>>
>> Hi Nathan,
>>
>> Couple of questions:
>> 1. Was login over serial console setup and working on SLES 11?  or was
>> the 'console=ttyS1' only for debug output?
>> I ask because console output doesn't use IRQs; iow, maybe the serial
>> port w/ driver never actually worked.
>> 2. Can you post dmesg for the SLES 11 setup?  That would show if there
>> were probe errors even on that.
>>
>> An alternative that should be equivalent to your previous setup is to
>> build w/ CONFIG_SERIAL_8250_PNP=n
>> Seems like your ACPI BIOS is buggy, but also that something else is using 
>> IRQ 3?
>>
>> Regards,
>> Peter Hurley
>
>
>
> 1) Yes I can confirm I used it to login sometimes.
>
> I built with CONFIG_SERIAL_8250_PNP=n and that seemed to work better, in that 
> the system did not hang.
> However I couldn't login on the serial and got these error messages, I 
> suspect I broke something while trying different permutations.
>
> gdm[5206]: WARNING: GdmDisplay: display lasted 0.136636 seconds
> gdm[5206]: WARNING: GdmDisplay: display lasted 0.180955 seconds
> gdm[5206]: WARNING: GdmDisplay: display lasted 0.161415 seconds
> gdm[5206]: WARNING: GdmLocalDisplayFactory: maximum number of X display 
> failures reached: check X server log for errors
>
> It did boot all the way though.
>
> 2) attached log

So I'm confused where this leaves us.

In your OP, you claim to have gotten it working with a partial revert
of commit 835d844d1a28 (but you didn't attach the partial revert so no
one knows what you did); however, my suggestion should have been
equivalent.

Note that you have the serial port disabled in BIOS; that's why you're
getting the probe error for PNP.

Regards,
Peter Hurley

Re: console issue since 3.6, console=ttyS1 hangs

2016-10-31 Thread Peter Hurley

On Mon, Oct 31, 2016 at 2:27 PM, Sean Young <s...@mess.org> wrote:
> On Sun, Oct 30, 2016 at 10:33:02AM -0500, Nathan wrote:
>> I think this should be PNP0501 instead of PNP0c02.
>> Once I alter that then when I boot the serial comes up on irq 3. However it
>> still hangs.
>> I'll keep digging.
>
> Well that's that theory out of the window. I'm not sure where to look now,
> I would start by enabling as many as possible of the "kernel hacking" config
> options and see if anything gets caught.
>
> Looking at your earlier messages, you have a collection of percpu allocation
> failures. That might be worth resolving before anything else.

Hi Nathan,

Couple of questions:
1. Was login over serial console setup and working on SLES 11?  or was
the 'console=ttyS1' only for debug output?
I ask because console output doesn't use IRQs; iow, maybe the serial
port w/ driver never actually worked.
2. Can you post dmesg for the SLES 11 setup?  That would show if there
were probe errors even on that.

An alternative that should be equivalent to your previous setup is to
build w/ CONFIG_SERIAL_8250_PNP=n
Seems like your ACPI BIOS is buggy, but also that something else is using IRQ 3?

Regards,
Peter Hurley

Re: console issue since 3.6, console=ttyS1 hangs

2016-10-31 Thread Peter Hurley

On Mon, Oct 31, 2016 at 2:27 PM, Sean Young  wrote:
> On Sun, Oct 30, 2016 at 10:33:02AM -0500, Nathan wrote:
>> I think this should be PNP0501 instead of PNP0c02.
>> Once I alter that then when I boot the serial comes up on irq 3. However it
>> still hangs.
>> I'll keep digging.
>
> Well that's that theory out of the window. I'm not sure where to look now,
> I would start by enabling as many as possible of the "kernel hacking" config
> options and see if anything gets caught.
>
> Looking at your earlier messages, you have a collection of percpu allocation
> failures. That might be worth resolving before anything else.

Hi Nathan,

Couple of questions:
1. Was login over serial console setup and working on SLES 11?  or was
the 'console=ttyS1' only for debug output?
I ask because console output doesn't use IRQs; iow, maybe the serial
port w/ driver never actually worked.
2. Can you post dmesg for the SLES 11 setup?  That would show if there
were probe errors even on that.

An alternative that should be equivalent to your previous setup is to
build w/ CONFIG_SERIAL_8250_PNP=n
Seems like your ACPI BIOS is buggy, but also that something else is using IRQ 3?

Regards,
Peter Hurley

Re: [RFC PATCH 0/6] UART slave devices using serio

2016-10-31 Thread Peter Hurley

[ this time w/o the html ]

On Tue, Oct 25, 2016 at 4:02 PM, Rob Herring <r...@kernel.org> wrote:
> > Maybe you can try to find some minutes at the Kernel Summit to talk
> > about this?
>
> Still waiting for my invite...
>
> But I will be at Plumbers if folks want to discuss this.

Hey Rob,

I'm here so let's make time to discuss this.

Regards,
Peter Hurley

Re: [RFC PATCH 0/6] UART slave devices using serio

2016-10-31 Thread Peter Hurley

[ this time w/o the html ]

On Tue, Oct 25, 2016 at 4:02 PM, Rob Herring  wrote:
> > Maybe you can try to find some minutes at the Kernel Summit to talk
> > about this?
>
> Still waiting for my invite...
>
> But I will be at Plumbers if folks want to discuss this.

Hey Rob,

I'm here so let's make time to discuss this.

Regards,
Peter Hurley

Re: [PATCH v8 4/4] serial: pl011: add console matching function

2016-06-22 Thread Peter Hurley



> On Jun 22, 2016, at 5:18 AM, Yury Norov <yno...@caviumnetworks.com> wrote:
> 
>> On Fri, May 20, 2016 at 04:03:23PM +0300, Aleksey Makarov wrote:
>> This patch adds function pl011_console_match() that implements
>> method match of struct console.  It allows to match consoles against
>> data specified in a string, for example taken from command line or
>> compiled by ACPI SPCR table handler.
>> 
>> Signed-off-by: Aleksey Makarov <aleksey.maka...@linaro.org>
>> Reviewed-by: Peter Hurley <pe...@hurleysoftware.com>
>> ---
>> drivers/tty/serial/amba-pl011.c | 56 
>> +
>> 1 file changed, 56 insertions(+)
>> 
>> diff --git a/drivers/tty/serial/amba-pl011.c 
>> b/drivers/tty/serial/amba-pl011.c
>> index a2aa655..388edc8 100644
>> --- a/drivers/tty/serial/amba-pl011.c
>> +++ b/drivers/tty/serial/amba-pl011.c
>> @@ -2288,12 +2288,68 @@ static int __init pl011_console_setup(struct console 
>> *co, char *options)
>>return uart_set_options(>port, co, baud, parity, bits, flow);
>> }
>> 
>> +/**
>> + *pl011_console_match - non-standard console matching
>> + *@co:  registering console
>> + *@name:  name from console command line
>> + *@idx:  index from console command line
>> + *@options: ptr to option string from console command line
>> + *
>> + *Only attempts to match console command lines of the form:
>> + *console=pl011,mmio|mmio32,[,]
>> + *console=pl011,0x[,]
>> + *This form is used to register an initial earlycon boot console and
>> + *replace it with the amba_console at pl011 driver init.
>> + *
>> + *Performs console setup for a match (as required by interface)
>> + *If no  are specified, then assume the h/w is already setup.
>> + *
>> + *Returns 0 if console matches; otherwise non-zero to use default 
>> matching
>> + */
>> +static int __init pl011_console_match(struct console *co, char *name, int 
>> idx,
>> +  char *options)
>> +{
>> +char match[] = "pl011";/* pl011-specific earlycon name */
>> +unsigned char iotype;
>> +unsigned long addr;
>> +int i;
>> +
>> +if (strncmp(name, match, 5) != 0)
>> +return -ENODEV;
>> +
>> +if (uart_parse_earlycon(options, , , ))
>> +return -ENODEV;
>> +
>> +/* try to match the port specified on the command line */
>> +for (i = 0; i < ARRAY_SIZE(amba_ports); i++) {
>> +struct uart_port *port;
>> +
>> +if (!amba_ports[i])
>> +continue;
>> +
>> +port = _ports[i]->port;
>> +
>> +if (iotype != UPIO_MEM && iotype != UPIO_MEM32)
>> +continue;
> 
> So it looks like iotype is constant inside the loop, and UPIO_MEM
> and UPIO_MEM32 too, of course. It means you can move this check out of
> cycle and avoid ports traversing at all in specific case. Am I wrong?
> 

No you're not wrong but I'd prefer if we don't use assumptions like that in 
port enumeration.


>> +
>> +if (port->mapbase != addr)
>> +continue;
>> +
>> +co->index = i;
>> +port->cons = co;
>> +return pl011_console_setup(co, options);
>> +}
>> +
>> +return -ENODEV;
>> +}
>> +
>> static struct uart_driver amba_reg;
>> static struct console amba_console = {
>>.name= "ttyAMA",
>>.write= pl011_console_write,
>>.device= uart_console_device,
>>.setup= pl011_console_setup,
>> +.match= pl011_console_match,
>>.flags= CON_PRINTBUFFER,
>>.index= -1,
>>.data= _reg,
>> -- 
>> 2.8.2

Re: [PATCH v8 4/4] serial: pl011: add console matching function

2016-06-22 Thread Peter Hurley



> On Jun 22, 2016, at 5:18 AM, Yury Norov  wrote:
> 
>> On Fri, May 20, 2016 at 04:03:23PM +0300, Aleksey Makarov wrote:
>> This patch adds function pl011_console_match() that implements
>> method match of struct console.  It allows to match consoles against
>> data specified in a string, for example taken from command line or
>> compiled by ACPI SPCR table handler.
>> 
>> Signed-off-by: Aleksey Makarov 
>> Reviewed-by: Peter Hurley 
>> ---
>> drivers/tty/serial/amba-pl011.c | 56 
>> +
>> 1 file changed, 56 insertions(+)
>> 
>> diff --git a/drivers/tty/serial/amba-pl011.c 
>> b/drivers/tty/serial/amba-pl011.c
>> index a2aa655..388edc8 100644
>> --- a/drivers/tty/serial/amba-pl011.c
>> +++ b/drivers/tty/serial/amba-pl011.c
>> @@ -2288,12 +2288,68 @@ static int __init pl011_console_setup(struct console 
>> *co, char *options)
>>return uart_set_options(>port, co, baud, parity, bits, flow);
>> }
>> 
>> +/**
>> + *pl011_console_match - non-standard console matching
>> + *@co:  registering console
>> + *@name:  name from console command line
>> + *@idx:  index from console command line
>> + *@options: ptr to option string from console command line
>> + *
>> + *Only attempts to match console command lines of the form:
>> + *console=pl011,mmio|mmio32,[,]
>> + *console=pl011,0x[,]
>> + *This form is used to register an initial earlycon boot console and
>> + *replace it with the amba_console at pl011 driver init.
>> + *
>> + *Performs console setup for a match (as required by interface)
>> + *If no  are specified, then assume the h/w is already setup.
>> + *
>> + *Returns 0 if console matches; otherwise non-zero to use default 
>> matching
>> + */
>> +static int __init pl011_console_match(struct console *co, char *name, int 
>> idx,
>> +  char *options)
>> +{
>> +char match[] = "pl011";/* pl011-specific earlycon name */
>> +unsigned char iotype;
>> +unsigned long addr;
>> +int i;
>> +
>> +if (strncmp(name, match, 5) != 0)
>> +return -ENODEV;
>> +
>> +if (uart_parse_earlycon(options, , , ))
>> +return -ENODEV;
>> +
>> +/* try to match the port specified on the command line */
>> +for (i = 0; i < ARRAY_SIZE(amba_ports); i++) {
>> +struct uart_port *port;
>> +
>> +if (!amba_ports[i])
>> +continue;
>> +
>> +port = _ports[i]->port;
>> +
>> +if (iotype != UPIO_MEM && iotype != UPIO_MEM32)
>> +continue;
> 
> So it looks like iotype is constant inside the loop, and UPIO_MEM
> and UPIO_MEM32 too, of course. It means you can move this check out of
> cycle and avoid ports traversing at all in specific case. Am I wrong?
> 

No you're not wrong but I'd prefer if we don't use assumptions like that in 
port enumeration.


>> +
>> +if (port->mapbase != addr)
>> +continue;
>> +
>> +co->index = i;
>> +port->cons = co;
>> +return pl011_console_setup(co, options);
>> +}
>> +
>> +return -ENODEV;
>> +}
>> +
>> static struct uart_driver amba_reg;
>> static struct console amba_console = {
>>.name= "ttyAMA",
>>.write= pl011_console_write,
>>.device= uart_console_device,
>>.setup= pl011_console_setup,
>> +.match= pl011_console_match,
>>.flags= CON_PRINTBUFFER,
>>.index= -1,
>>.data= _reg,
>> -- 
>> 2.8.2

Re: [RFC] serial: 8250: fix regression in 8250 uart driver

2016-06-13 Thread Peter Hurley

On 06/13/2016 01:13 PM, Dinh Nguyen wrote:
> Hi Andy,
> 
> I saw that you have discovered that commit ec5a11a91eec ("serial: 8250:
> Validate dmaengine rx chan meets requirements") introduced a regression
> in the 8250 uart driver. For SoCFPGA platform, I am seeing this error:
> 
> [5.541751] ttyS0 - failed to request DMA
> 
> Reverting the commit ec5a11a91eec removes the error.

The error is that your dmaengine driver doesn't support the functionality
required by the 8250 driver.

Regards,
Peter Hurley

> I saw that you started the discussion, but I didn't see that a patch was
> included[1].
> 
> The following patch seems to fix the error, but I'm not sure if it's the
> same fix that you had in mind.
> 
> Thanks,
> Dinh
> 
> [1] http://marc.info/?l=linux-serial=146254187602862=2
> 
> ---8<
> diff --git a/drivers/tty/serial/8250/8250_dma.c
> b/drivers/tty/serial/8250/8250_dma.c
> index 7f33d1c..847a203 100644
> --- a/drivers/tty/serial/8250/8250_dma.c
> +++ b/drivers/tty/serial/8250/8250_dma.c
> @@ -176,7 +176,7 @@ int serial8250_request_dma(struct uart_8250_port *p)
> ret = dma_get_slave_caps(dma->rxchan, );
> if (ret)
> goto release_rx;
> -   if (!caps.cmd_pause || !caps.cmd_terminate ||
> +   if ((!caps.cmd_pause || !caps.cmd_terminate) &&
> caps.residue_granularity ==
> DMA_RESIDUE_GRANULARITY_DESCRIPTOR) {
> ret = -EINVAL;
> goto release_rx;
>

Re: [RFC] serial: 8250: fix regression in 8250 uart driver

2016-06-13 Thread Peter Hurley

On 06/13/2016 01:13 PM, Dinh Nguyen wrote:
> Hi Andy,
> 
> I saw that you have discovered that commit ec5a11a91eec ("serial: 8250:
> Validate dmaengine rx chan meets requirements") introduced a regression
> in the 8250 uart driver. For SoCFPGA platform, I am seeing this error:
> 
> [5.541751] ttyS0 - failed to request DMA
> 
> Reverting the commit ec5a11a91eec removes the error.

The error is that your dmaengine driver doesn't support the functionality
required by the 8250 driver.

Regards,
Peter Hurley

> I saw that you started the discussion, but I didn't see that a patch was
> included[1].
> 
> The following patch seems to fix the error, but I'm not sure if it's the
> same fix that you had in mind.
> 
> Thanks,
> Dinh
> 
> [1] http://marc.info/?l=linux-serial=146254187602862=2
> 
> ---8<
> diff --git a/drivers/tty/serial/8250/8250_dma.c
> b/drivers/tty/serial/8250/8250_dma.c
> index 7f33d1c..847a203 100644
> --- a/drivers/tty/serial/8250/8250_dma.c
> +++ b/drivers/tty/serial/8250/8250_dma.c
> @@ -176,7 +176,7 @@ int serial8250_request_dma(struct uart_8250_port *p)
> ret = dma_get_slave_caps(dma->rxchan, );
> if (ret)
> goto release_rx;
> -   if (!caps.cmd_pause || !caps.cmd_terminate ||
> +   if ((!caps.cmd_pause || !caps.cmd_terminate) &&
> caps.residue_granularity ==
> DMA_RESIDUE_GRANULARITY_DESCRIPTOR) {
> ret = -EINVAL;
> goto release_rx;
>

Re: [PATCH] serial: earlycon: stop abusing console::index

2016-06-08 Thread Peter Hurley

On 06/03/2016 08:19 AM, Mark Rutland wrote:
> Commit cda64e6824026575 ("serial: earlycon: Fixup earlycon console name
> and index") added code to decompose an earlycon driver name into a
> string prefix and numeric suffix, and place this into console::name and
> console::index, such that we'd get a pretty name out of the core console
> code, which requires a name and index. This is simply to give a
> pretty/useful earlycon driver name in the log messages, and other than
> logging the name and index hold no significance.

The idea was to keep the console_cmdline[] array consistent with the
registered consoles, even though earlycon uses its own matching scheme.
Least surprise and all that.


> However, this is broken for drivers such as "pl011", where the "011"
> suffix gets stripped off, then subsequently restored, printed as a
> decimal, erroneously giving "pl11" in log messages.

Yeah, I saw that.

> Additionally, for
> earlycon drivers without a numeric suffix, "0" is added regardless. Thus
> the code is broken in some cases, and generally inconsistent.

I would like to continue with some kind of naming scheme that preserves
both the command line parameters (uart,uart8250,pl011,etc.) and uniquely
identifies which uart driver is the earlycon.

The current scheme could be fixed easily enough (by just using a single digit).
Or using a separator, ie. uart/0, pl011/0, etc.

Regards,
Peter Hurley


> Instead, this patch changes the earlycon code to consistently register
> "earlycon0", but ensures that the earlycon driver name is logged at
> earlycon_init time. This is obvious, consistent, and sufficient to
> provide the required information to the user.
> 
> With this in place, amongst the first messages from the kernel, the user
> will see something like:
> 
> [0.00] earlycon: earlycon0 (pl011) at MMIO 0x7ff8 
> (options '')
> [0.00] bootconsole [earlycon0] enabled
> 
> Signed-off-by: Mark Rutland <mark.rutl...@arm.com>
> Cc: Greg Kroah-Hartman <gre...@linuxfoundation.org>
> Cc: Jiri Slaby <jsl...@suse.com>
> Cc: Leif Lindholm <leif.lindh...@arm.com>
> Cc: Peter Hurley <pe...@hurleysoftware.com>
> Cc: Rob Herring <r...@kernel.org>
> Cc: Will Deacon <will.dea...@arm.com>
> Cc: linux-ser...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  drivers/tty/serial/earlycon.c | 19 ---
>  1 file changed, 4 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/tty/serial/earlycon.c b/drivers/tty/serial/earlycon.c
> index 067783f..2b6622a 100644
> --- a/drivers/tty/serial/earlycon.c
> +++ b/drivers/tty/serial/earlycon.c
> @@ -29,7 +29,7 @@
>  #include 
>  
>  static struct console early_con = {
> - .name = "uart", /* fixed up at earlycon registration */
> + .name = "earlycon",
>   .flags =CON_PRINTBUFFER | CON_BOOT,
>   .index =0,
>  };
> @@ -60,24 +60,13 @@ static void __init earlycon_init(struct earlycon_device 
> *device,
>  {
>   struct console *earlycon = device->con;
>   struct uart_port *port = >port;
> - const char *s;
> - size_t len;
> -
> - /* scan backwards from end of string for first non-numeral */
> - for (s = name + strlen(name);
> -  s > name && s[-1] >= '0' && s[-1] <= '9';
> -  s--)
> - ;
> - if (*s)
> - earlycon->index = simple_strtoul(s, NULL, 10);
> - len = s - name;
> - strlcpy(earlycon->name, name, min(len + 1, sizeof(earlycon->name)));
> +
>   earlycon->data = _console_dev;
>  
>   if (port->iotype == UPIO_MEM || port->iotype == UPIO_MEM16 ||
>   port->iotype == UPIO_MEM32 || port->iotype == UPIO_MEM32BE)
> - pr_info("%s%d at MMIO%s %pa (options '%s')\n",
> - earlycon->name, earlycon->index,
> + pr_info("%s%d (%s) at MMIO%s %pa (options '%s')\n",
> + earlycon->name, earlycon->index, name,
>   (port->iotype == UPIO_MEM) ? "" :
>   (port->iotype == UPIO_MEM16) ? "16" :
>   (port->iotype == UPIO_MEM32) ? "32" : "32be",
>

Re: [PATCH] serial: earlycon: stop abusing console::index

2016-06-08 Thread Peter Hurley

On 06/03/2016 08:19 AM, Mark Rutland wrote:
> Commit cda64e6824026575 ("serial: earlycon: Fixup earlycon console name
> and index") added code to decompose an earlycon driver name into a
> string prefix and numeric suffix, and place this into console::name and
> console::index, such that we'd get a pretty name out of the core console
> code, which requires a name and index. This is simply to give a
> pretty/useful earlycon driver name in the log messages, and other than
> logging the name and index hold no significance.

The idea was to keep the console_cmdline[] array consistent with the
registered consoles, even though earlycon uses its own matching scheme.
Least surprise and all that.


> However, this is broken for drivers such as "pl011", where the "011"
> suffix gets stripped off, then subsequently restored, printed as a
> decimal, erroneously giving "pl11" in log messages.

Yeah, I saw that.

> Additionally, for
> earlycon drivers without a numeric suffix, "0" is added regardless. Thus
> the code is broken in some cases, and generally inconsistent.

I would like to continue with some kind of naming scheme that preserves
both the command line parameters (uart,uart8250,pl011,etc.) and uniquely
identifies which uart driver is the earlycon.

The current scheme could be fixed easily enough (by just using a single digit).
Or using a separator, ie. uart/0, pl011/0, etc.

Regards,
Peter Hurley


> Instead, this patch changes the earlycon code to consistently register
> "earlycon0", but ensures that the earlycon driver name is logged at
> earlycon_init time. This is obvious, consistent, and sufficient to
> provide the required information to the user.
> 
> With this in place, amongst the first messages from the kernel, the user
> will see something like:
> 
> [0.00] earlycon: earlycon0 (pl011) at MMIO 0x7ff8 
> (options '')
> [0.00] bootconsole [earlycon0] enabled
> 
> Signed-off-by: Mark Rutland 
> Cc: Greg Kroah-Hartman 
> Cc: Jiri Slaby 
> Cc: Leif Lindholm 
> Cc: Peter Hurley 
> Cc: Rob Herring 
> Cc: Will Deacon 
> Cc: linux-ser...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  drivers/tty/serial/earlycon.c | 19 ---
>  1 file changed, 4 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/tty/serial/earlycon.c b/drivers/tty/serial/earlycon.c
> index 067783f..2b6622a 100644
> --- a/drivers/tty/serial/earlycon.c
> +++ b/drivers/tty/serial/earlycon.c
> @@ -29,7 +29,7 @@
>  #include 
>  
>  static struct console early_con = {
> - .name = "uart", /* fixed up at earlycon registration */
> + .name = "earlycon",
>   .flags =CON_PRINTBUFFER | CON_BOOT,
>   .index =0,
>  };
> @@ -60,24 +60,13 @@ static void __init earlycon_init(struct earlycon_device 
> *device,
>  {
>   struct console *earlycon = device->con;
>   struct uart_port *port = >port;
> - const char *s;
> - size_t len;
> -
> - /* scan backwards from end of string for first non-numeral */
> - for (s = name + strlen(name);
> -  s > name && s[-1] >= '0' && s[-1] <= '9';
> -  s--)
> - ;
> - if (*s)
> - earlycon->index = simple_strtoul(s, NULL, 10);
> - len = s - name;
> - strlcpy(earlycon->name, name, min(len + 1, sizeof(earlycon->name)));
> +
>   earlycon->data = _console_dev;
>  
>   if (port->iotype == UPIO_MEM || port->iotype == UPIO_MEM16 ||
>   port->iotype == UPIO_MEM32 || port->iotype == UPIO_MEM32BE)
> - pr_info("%s%d at MMIO%s %pa (options '%s')\n",
> - earlycon->name, earlycon->index,
> + pr_info("%s%d (%s) at MMIO%s %pa (options '%s')\n",
> + earlycon->name, earlycon->index, name,
>   (port->iotype == UPIO_MEM) ? "" :
>   (port->iotype == UPIO_MEM16) ? "16" :
>   (port->iotype == UPIO_MEM32) ? "32" : "32be",
>

Re: [PATCH] serial_core: Change UART PM state to OFF on failure

2016-06-08 Thread Peter Hurley

Hi Pramod,

On 05/06/2016 02:46 AM, Pramod Gurav wrote:
> uart_change_pm is used to turn on the UART controller resources and
> change UART's PM status. On failure to allocate pages the controller
> be left in ON state. This will change the state to OFF on failure.
> 
> Signed-off-by: Pramod Gurav <pramod.gu...@linaro.org>
> ---
>  drivers/tty/serial/serial_core.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/tty/serial/serial_core.c 
> b/drivers/tty/serial/serial_core.c
> index 62fe368..58af2e9 100644
> --- a/drivers/tty/serial/serial_core.c
> +++ b/drivers/tty/serial/serial_core.c
> @@ -156,9 +156,10 @@ static int uart_port_startup(struct tty_struct *tty, 
> struct uart_state *state,
>   if (!state->xmit.buf) {
>   /* This is protected by the per port mutex */
>   page = get_zeroed_page(GFP_KERNEL);
> - if (!page)
> + if (!page) {

if (!uart_console(uport))

Otherwise, you'll be powering off the console.

Just out of curiosity, did you actually hit this error?

Regards,
Peter Hurley

> + uart_change_pm(state, UART_PM_STATE_OFF);
>   return -ENOMEM;
> -
> + }
>   state->xmit.buf = (unsigned char *) page;
>   uart_circ_clear(>xmit);
>   }
>

Re: [PATCH] serial_core: Change UART PM state to OFF on failure

2016-06-08 Thread Peter Hurley

Hi Pramod,

On 05/06/2016 02:46 AM, Pramod Gurav wrote:
> uart_change_pm is used to turn on the UART controller resources and
> change UART's PM status. On failure to allocate pages the controller
> be left in ON state. This will change the state to OFF on failure.
> 
> Signed-off-by: Pramod Gurav 
> ---
>  drivers/tty/serial/serial_core.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/tty/serial/serial_core.c 
> b/drivers/tty/serial/serial_core.c
> index 62fe368..58af2e9 100644
> --- a/drivers/tty/serial/serial_core.c
> +++ b/drivers/tty/serial/serial_core.c
> @@ -156,9 +156,10 @@ static int uart_port_startup(struct tty_struct *tty, 
> struct uart_state *state,
>   if (!state->xmit.buf) {
>   /* This is protected by the per port mutex */
>   page = get_zeroed_page(GFP_KERNEL);
> - if (!page)
> + if (!page) {

if (!uart_console(uport))

Otherwise, you'll be powering off the console.

Just out of curiosity, did you actually hit this error?

Regards,
Peter Hurley

> + uart_change_pm(state, UART_PM_STATE_OFF);
>   return -ENOMEM;
> -
> + }
>   state->xmit.buf = (unsigned char *) page;
>   uart_circ_clear(>xmit);
>   }
>

Re: crosstool builds - https://www.kernel.org/pub/tools/crosstool/

2016-05-23 Thread Peter Hurley

Hi Jon,

On 05/19/2016 08:48 PM, Jon Masters wrote:
> Hi Tony,
> 
> I'm wondering whether you know of anyone who has picked up where you
> left off, building cross toolchains for various and sundry alternative
> architectures? I wanted to download an IA64 compiler recently (really)
> and noticed that the most recent builds were against GCC 4.6.3.

Someone's been making them.

https://www.kernel.org/pub/tools/crosstool/files/bin/x86_64/4.9.0/

arm64
alpha
arm = gnueabi
parisc = hppa
parisc64 = hppa64
mips
mips64
powerpc
powerpc64
sparc
sparc64
s390
...

Regards,
Peter Hurley

Re: crosstool builds - https://www.kernel.org/pub/tools/crosstool/

2016-05-23 Thread Peter Hurley

Hi Jon,

On 05/19/2016 08:48 PM, Jon Masters wrote:
> Hi Tony,
> 
> I'm wondering whether you know of anyone who has picked up where you
> left off, building cross toolchains for various and sundry alternative
> architectures? I wanted to download an IA64 compiler recently (really)
> and noticed that the most recent builds were against GCC 4.6.3.

Someone's been making them.

https://www.kernel.org/pub/tools/crosstool/files/bin/x86_64/4.9.0/

arm64
alpha
arm = gnueabi
parisc = hppa
parisc64 = hppa64
mips
mips64
powerpc
powerpc64
sparc
sparc64
s390
...

Regards,
Peter Hurley

Re: [PATCH v4 2/5] locking/rwsem: Protect all writes to owner by WRITE_ONCE

2016-05-21 Thread Peter Hurley

On 05/18/2016 12:58 PM, Jason Low wrote:
> On Wed, 2016-05-18 at 14:29 -0400, Waiman Long wrote:
>> On 05/18/2016 01:21 PM, Jason Low wrote:
>>> On Wed, 2016-05-18 at 07:04 -0700, Davidlohr Bueso wrote:
>>>> On Tue, 17 May 2016, Waiman Long wrote:
>>>>
>>>>> Without using WRITE_ONCE(), the compiler can potentially break a
>>>>> write into multiple smaller ones (store tearing). So a read from the
>>>>> same data by another task concurrently may return a partial result.
>>>>> This can result in a kernel crash if the data is a memory address
>>>>> that is being dereferenced.
>>>>>
>>>>> This patch changes all write to rwsem->owner to use WRITE_ONCE()
>>>>> to make sure that store tearing will not happen. READ_ONCE() may
>>>>> not be needed for rwsem->owner as long as the value is only used for
>>>>> comparison and not dereferencing.
>>> It might be okay to leave out READ_ONCE() for reading rwsem->owner, but
>>> couldn't we include it to at least document that we're performing a
>>> "special" lockless read?
>>>
>>
>> Using READ_ONCE() does have a bit of cost as it limits compiler 
>> optimization. If we changes all access to rwsem->owner to READ_ONCE() 
>> and WRITE_ONCE(), we may as well change its type to volatile and be done 
>> with.
> 
> Right, although there are still places like the init function where
> WRITE_ONCE isn't necessary.

Which doesn't cost anything either.


>> I am not against doing that, but it feels a bit over-reach for me. 
>> On the other hand, we may define a do-nothing macro that designates the 
>> owner as a special variable for documentation purpose, but don't need 
>> protection at that particular call site.
> 
> It should be fine to use the standard READ_ONCE here, even if it's just
> for documentation, as it's probably not going to cost anything in
> practice. It would be better to avoid adding any special macros for this
> which may just add more complexity.

See, I don't understand this line of reasoning at all.

I read this as "it's ok to be non-optimal here where were spinning CPU
time but not ok to be non-optimal generally elsewhere where it's
way less important like at init time".

And by the way, it's not just "here" but _everywhere_.
What about reading ->on_cpu locklessly?

Sure it's a bool, but doesn't the "we need to document lockless access"
argument equally apply here?

Regards,
Peter Hurley

Re: [PATCH v4 2/5] locking/rwsem: Protect all writes to owner by WRITE_ONCE

2016-05-21 Thread Peter Hurley

On 05/18/2016 12:58 PM, Jason Low wrote:
> On Wed, 2016-05-18 at 14:29 -0400, Waiman Long wrote:
>> On 05/18/2016 01:21 PM, Jason Low wrote:
>>> On Wed, 2016-05-18 at 07:04 -0700, Davidlohr Bueso wrote:
>>>> On Tue, 17 May 2016, Waiman Long wrote:
>>>>
>>>>> Without using WRITE_ONCE(), the compiler can potentially break a
>>>>> write into multiple smaller ones (store tearing). So a read from the
>>>>> same data by another task concurrently may return a partial result.
>>>>> This can result in a kernel crash if the data is a memory address
>>>>> that is being dereferenced.
>>>>>
>>>>> This patch changes all write to rwsem->owner to use WRITE_ONCE()
>>>>> to make sure that store tearing will not happen. READ_ONCE() may
>>>>> not be needed for rwsem->owner as long as the value is only used for
>>>>> comparison and not dereferencing.
>>> It might be okay to leave out READ_ONCE() for reading rwsem->owner, but
>>> couldn't we include it to at least document that we're performing a
>>> "special" lockless read?
>>>
>>
>> Using READ_ONCE() does have a bit of cost as it limits compiler 
>> optimization. If we changes all access to rwsem->owner to READ_ONCE() 
>> and WRITE_ONCE(), we may as well change its type to volatile and be done 
>> with.
> 
> Right, although there are still places like the init function where
> WRITE_ONCE isn't necessary.

Which doesn't cost anything either.


>> I am not against doing that, but it feels a bit over-reach for me. 
>> On the other hand, we may define a do-nothing macro that designates the 
>> owner as a special variable for documentation purpose, but don't need 
>> protection at that particular call site.
> 
> It should be fine to use the standard READ_ONCE here, even if it's just
> for documentation, as it's probably not going to cost anything in
> practice. It would be better to avoid adding any special macros for this
> which may just add more complexity.

See, I don't understand this line of reasoning at all.

I read this as "it's ok to be non-optimal here where were spinning CPU
time but not ok to be non-optimal generally elsewhere where it's
way less important like at init time".

And by the way, it's not just "here" but _everywhere_.
What about reading ->on_cpu locklessly?

Sure it's a bool, but doesn't the "we need to document lockless access"
argument equally apply here?

Regards,
Peter Hurley

Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field

2016-05-17 Thread Peter Hurley

Hi Paul,

You can disregard this as I think we're talking about
the same things with the other email thread.

Regards,
Peter Hurley


On 05/17/2016 12:46 PM, Peter Hurley wrote:
> On 05/16/2016 10:22 AM, Paul E. McKenney wrote:
>> On Mon, May 16, 2016 at 07:17:42AM -0700, Peter Hurley wrote:
>>> On 05/16/2016 05:17 AM, Paul E. McKenney wrote:
>>>> On Mon, May 16, 2016 at 01:09:48PM +0200, Peter Zijlstra wrote:
>>>>> On Fri, May 13, 2016 at 10:58:05AM -0700, Peter Hurley wrote:
>>>>>>> Note that barrier() and READ_ONCE() have overlapping but not identical
>>>>>>> results and the combined use actually makes sense here.
>>>>>>>
>>>>>>> Yes, a barrier() anywhere in the loop will force a reload of the
>>>>>>> variable, _however_ it doesn't force that reload to not suffer from
>>>>>>> load tearing.
>>>>>>>
>>>>>>> Using volatile also forces a reload, but also ensures the load cannot
>>>>>>> be torn IFF it is of machine word side and naturally aligned.
>>>>>>>
>>>>>>> So while the READ_ONCE() here is pointless for forcing the reload;
>>>>>>> that's already ensured, we still need to make sure the load isn't torn.
>>>>>>
>>>>>> If load tearing a naturally aligned pointer is a real code generation
>>>>>> possibility then the rcu list code is broken too (which loads ->next
>>>>>> directly; cf. list_for_each_entry_rcu() & 
>>>>>> list_for_each_entry_lockless()).
>>>>>>
>>>>>> For 4.4, Paul added READ_ONCE() checks for list_empty() et al, but iirc
>>>>>> that had to do with control dependencies and not load tearing.
>>>>>
>>>>> Well, Paul is the one who started the whole load/store tearing thing, so
>>>>> I suppose he knows what he's doing.
>>>>
>>>> That had to do with suppressing false positives for one of Dmitry
>>>> Vjukov's concurrency checkers.  I suspect that Peter Hurley is right
>>>> that continued use of that checker would identify other places needing
>>>> READ_ONCE(), but from what I understand that is on hold pending a formal
>>>> definition of the Linux-kernel memory model.  (KCC and Dmitry (CCed)
>>>> can correct my if I am confused on this point.)
>>>>
>>>>> That said; its a fairly recent as things go so lots of code hasn't been
>>>>> updated yet, and its also a very unlikely thing for a compiler to do;
>>>>> since it mostly doesn't make sense to emit multiple instructions where
>>>>> one will do, so its not a very high priority thing either.
>>>>>
>>>>> But from what I understand, the compiler is free to emit all kinds of
>>>>> nonsense for !volatile loads/stores.
>>>>
>>>> That is quite true.  :-/
>>>>
>>>>>> OTOH, this patch might actually produce store-tearing:
>>>>>>
>>>>>> +static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
>>>>>> +{
>>>>>> +/*
>>>>>> + * We check the owner value first to make sure that we will only
>>>>>> + * do a write to the rwsem cacheline when it is really necessary
>>>>>> + * to minimize cacheline contention.
>>>>>> + */
>>>>>> +if (sem->owner != RWSEM_READER_OWNED)
>>>>>> +sem->owner = RWSEM_READER_OWNED;
>>>>>> +}
>>>>>
>>>>> Correct; which is why we should always use {READ,WRITE}_ONCE() for
>>>>> anything that is used locklessly.
>>>>
>>>> Completely agreed.  Improve readability of code by flagging lockless
>>>> shared-memory accesses, help checkers better find bugs, and prevent the
>>>> occasional compiler mischief!
>>>
>>> I think this would be a mistake for 3 reasons:
>>>
>>> 1. If READ_ONCE()/WRITE_ONCE() is necessary to prevent load/store tearing
>>>of any normally-atomic type (char/int/long/void*), then _every_ access
>>>would require READ_ONCE()/WRITE_ONCE(), thus eliminating any possibility
>>>of compiler optimization (eg. eliding redundant loads) where it would
>>>otherwise be possible.
>>
>> The point about eliding redundant loads is a good one, at least

Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field

2016-05-17 Thread Peter Hurley

Hi Paul,

You can disregard this as I think we're talking about
the same things with the other email thread.

Regards,
Peter Hurley


On 05/17/2016 12:46 PM, Peter Hurley wrote:
> On 05/16/2016 10:22 AM, Paul E. McKenney wrote:
>> On Mon, May 16, 2016 at 07:17:42AM -0700, Peter Hurley wrote:
>>> On 05/16/2016 05:17 AM, Paul E. McKenney wrote:
>>>> On Mon, May 16, 2016 at 01:09:48PM +0200, Peter Zijlstra wrote:
>>>>> On Fri, May 13, 2016 at 10:58:05AM -0700, Peter Hurley wrote:
>>>>>>> Note that barrier() and READ_ONCE() have overlapping but not identical
>>>>>>> results and the combined use actually makes sense here.
>>>>>>>
>>>>>>> Yes, a barrier() anywhere in the loop will force a reload of the
>>>>>>> variable, _however_ it doesn't force that reload to not suffer from
>>>>>>> load tearing.
>>>>>>>
>>>>>>> Using volatile also forces a reload, but also ensures the load cannot
>>>>>>> be torn IFF it is of machine word side and naturally aligned.
>>>>>>>
>>>>>>> So while the READ_ONCE() here is pointless for forcing the reload;
>>>>>>> that's already ensured, we still need to make sure the load isn't torn.
>>>>>>
>>>>>> If load tearing a naturally aligned pointer is a real code generation
>>>>>> possibility then the rcu list code is broken too (which loads ->next
>>>>>> directly; cf. list_for_each_entry_rcu() & 
>>>>>> list_for_each_entry_lockless()).
>>>>>>
>>>>>> For 4.4, Paul added READ_ONCE() checks for list_empty() et al, but iirc
>>>>>> that had to do with control dependencies and not load tearing.
>>>>>
>>>>> Well, Paul is the one who started the whole load/store tearing thing, so
>>>>> I suppose he knows what he's doing.
>>>>
>>>> That had to do with suppressing false positives for one of Dmitry
>>>> Vjukov's concurrency checkers.  I suspect that Peter Hurley is right
>>>> that continued use of that checker would identify other places needing
>>>> READ_ONCE(), but from what I understand that is on hold pending a formal
>>>> definition of the Linux-kernel memory model.  (KCC and Dmitry (CCed)
>>>> can correct my if I am confused on this point.)
>>>>
>>>>> That said; its a fairly recent as things go so lots of code hasn't been
>>>>> updated yet, and its also a very unlikely thing for a compiler to do;
>>>>> since it mostly doesn't make sense to emit multiple instructions where
>>>>> one will do, so its not a very high priority thing either.
>>>>>
>>>>> But from what I understand, the compiler is free to emit all kinds of
>>>>> nonsense for !volatile loads/stores.
>>>>
>>>> That is quite true.  :-/
>>>>
>>>>>> OTOH, this patch might actually produce store-tearing:
>>>>>>
>>>>>> +static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
>>>>>> +{
>>>>>> +/*
>>>>>> + * We check the owner value first to make sure that we will only
>>>>>> + * do a write to the rwsem cacheline when it is really necessary
>>>>>> + * to minimize cacheline contention.
>>>>>> + */
>>>>>> +if (sem->owner != RWSEM_READER_OWNED)
>>>>>> +sem->owner = RWSEM_READER_OWNED;
>>>>>> +}
>>>>>
>>>>> Correct; which is why we should always use {READ,WRITE}_ONCE() for
>>>>> anything that is used locklessly.
>>>>
>>>> Completely agreed.  Improve readability of code by flagging lockless
>>>> shared-memory accesses, help checkers better find bugs, and prevent the
>>>> occasional compiler mischief!
>>>
>>> I think this would be a mistake for 3 reasons:
>>>
>>> 1. If READ_ONCE()/WRITE_ONCE() is necessary to prevent load/store tearing
>>>of any normally-atomic type (char/int/long/void*), then _every_ access
>>>would require READ_ONCE()/WRITE_ONCE(), thus eliminating any possibility
>>>of compiler optimization (eg. eliding redundant loads) where it would
>>>otherwise be possible.
>>
>> The point about eliding redundant loads is a good one, at least

Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field

2016-05-17 Thread Peter Hurley

On 05/16/2016 10:22 AM, Paul E. McKenney wrote:
> On Mon, May 16, 2016 at 07:17:42AM -0700, Peter Hurley wrote:
>> On 05/16/2016 05:17 AM, Paul E. McKenney wrote:
>>> On Mon, May 16, 2016 at 01:09:48PM +0200, Peter Zijlstra wrote:
>>>> On Fri, May 13, 2016 at 10:58:05AM -0700, Peter Hurley wrote:
>>>>>> Note that barrier() and READ_ONCE() have overlapping but not identical
>>>>>> results and the combined use actually makes sense here.
>>>>>>
>>>>>> Yes, a barrier() anywhere in the loop will force a reload of the
>>>>>> variable, _however_ it doesn't force that reload to not suffer from
>>>>>> load tearing.
>>>>>>
>>>>>> Using volatile also forces a reload, but also ensures the load cannot
>>>>>> be torn IFF it is of machine word side and naturally aligned.
>>>>>>
>>>>>> So while the READ_ONCE() here is pointless for forcing the reload;
>>>>>> that's already ensured, we still need to make sure the load isn't torn.
>>>>>
>>>>> If load tearing a naturally aligned pointer is a real code generation
>>>>> possibility then the rcu list code is broken too (which loads ->next
>>>>> directly; cf. list_for_each_entry_rcu() & list_for_each_entry_lockless()).
>>>>>
>>>>> For 4.4, Paul added READ_ONCE() checks for list_empty() et al, but iirc
>>>>> that had to do with control dependencies and not load tearing.
>>>>
>>>> Well, Paul is the one who started the whole load/store tearing thing, so
>>>> I suppose he knows what he's doing.
>>>
>>> That had to do with suppressing false positives for one of Dmitry
>>> Vjukov's concurrency checkers.  I suspect that Peter Hurley is right
>>> that continued use of that checker would identify other places needing
>>> READ_ONCE(), but from what I understand that is on hold pending a formal
>>> definition of the Linux-kernel memory model.  (KCC and Dmitry (CCed)
>>> can correct my if I am confused on this point.)
>>>
>>>> That said; its a fairly recent as things go so lots of code hasn't been
>>>> updated yet, and its also a very unlikely thing for a compiler to do;
>>>> since it mostly doesn't make sense to emit multiple instructions where
>>>> one will do, so its not a very high priority thing either.
>>>>
>>>> But from what I understand, the compiler is free to emit all kinds of
>>>> nonsense for !volatile loads/stores.
>>>
>>> That is quite true.  :-/
>>>
>>>>> OTOH, this patch might actually produce store-tearing:
>>>>>
>>>>> +static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
>>>>> +{
>>>>> + /*
>>>>> +  * We check the owner value first to make sure that we will only
>>>>> +  * do a write to the rwsem cacheline when it is really necessary
>>>>> +  * to minimize cacheline contention.
>>>>> +  */
>>>>> + if (sem->owner != RWSEM_READER_OWNED)
>>>>> + sem->owner = RWSEM_READER_OWNED;
>>>>> +}
>>>>
>>>> Correct; which is why we should always use {READ,WRITE}_ONCE() for
>>>> anything that is used locklessly.
>>>
>>> Completely agreed.  Improve readability of code by flagging lockless
>>> shared-memory accesses, help checkers better find bugs, and prevent the
>>> occasional compiler mischief!
>>
>> I think this would be a mistake for 3 reasons:
>>
>> 1. If READ_ONCE()/WRITE_ONCE() is necessary to prevent load/store tearing
>>of any normally-atomic type (char/int/long/void*), then _every_ access
>>would require READ_ONCE()/WRITE_ONCE(), thus eliminating any possibility
>>of compiler optimization (eg. eliding redundant loads) where it would
>>otherwise be possible.
> 
> The point about eliding redundant loads is a good one, at least in those
> cases where it is a reasonable optimization.  Should we ever get to a
> point where we no longer use pre-C11 compilers, those use cases could
> potentially use memory_order_relaxed loads.  Preferably wrappered in
> something that can be typed with fewer characters.  And it could of course
> lead to an interesting discussion of what use cases would be required
> to justify this change, but what else is new?

I believe lockless access is quite widespread in the kernel, and this
use was based on the previous a

Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field

2016-05-17 Thread Peter Hurley

On 05/16/2016 10:22 AM, Paul E. McKenney wrote:
> On Mon, May 16, 2016 at 07:17:42AM -0700, Peter Hurley wrote:
>> On 05/16/2016 05:17 AM, Paul E. McKenney wrote:
>>> On Mon, May 16, 2016 at 01:09:48PM +0200, Peter Zijlstra wrote:
>>>> On Fri, May 13, 2016 at 10:58:05AM -0700, Peter Hurley wrote:
>>>>>> Note that barrier() and READ_ONCE() have overlapping but not identical
>>>>>> results and the combined use actually makes sense here.
>>>>>>
>>>>>> Yes, a barrier() anywhere in the loop will force a reload of the
>>>>>> variable, _however_ it doesn't force that reload to not suffer from
>>>>>> load tearing.
>>>>>>
>>>>>> Using volatile also forces a reload, but also ensures the load cannot
>>>>>> be torn IFF it is of machine word side and naturally aligned.
>>>>>>
>>>>>> So while the READ_ONCE() here is pointless for forcing the reload;
>>>>>> that's already ensured, we still need to make sure the load isn't torn.
>>>>>
>>>>> If load tearing a naturally aligned pointer is a real code generation
>>>>> possibility then the rcu list code is broken too (which loads ->next
>>>>> directly; cf. list_for_each_entry_rcu() & list_for_each_entry_lockless()).
>>>>>
>>>>> For 4.4, Paul added READ_ONCE() checks for list_empty() et al, but iirc
>>>>> that had to do with control dependencies and not load tearing.
>>>>
>>>> Well, Paul is the one who started the whole load/store tearing thing, so
>>>> I suppose he knows what he's doing.
>>>
>>> That had to do with suppressing false positives for one of Dmitry
>>> Vjukov's concurrency checkers.  I suspect that Peter Hurley is right
>>> that continued use of that checker would identify other places needing
>>> READ_ONCE(), but from what I understand that is on hold pending a formal
>>> definition of the Linux-kernel memory model.  (KCC and Dmitry (CCed)
>>> can correct my if I am confused on this point.)
>>>
>>>> That said; its a fairly recent as things go so lots of code hasn't been
>>>> updated yet, and its also a very unlikely thing for a compiler to do;
>>>> since it mostly doesn't make sense to emit multiple instructions where
>>>> one will do, so its not a very high priority thing either.
>>>>
>>>> But from what I understand, the compiler is free to emit all kinds of
>>>> nonsense for !volatile loads/stores.
>>>
>>> That is quite true.  :-/
>>>
>>>>> OTOH, this patch might actually produce store-tearing:
>>>>>
>>>>> +static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
>>>>> +{
>>>>> + /*
>>>>> +  * We check the owner value first to make sure that we will only
>>>>> +  * do a write to the rwsem cacheline when it is really necessary
>>>>> +  * to minimize cacheline contention.
>>>>> +  */
>>>>> + if (sem->owner != RWSEM_READER_OWNED)
>>>>> + sem->owner = RWSEM_READER_OWNED;
>>>>> +}
>>>>
>>>> Correct; which is why we should always use {READ,WRITE}_ONCE() for
>>>> anything that is used locklessly.
>>>
>>> Completely agreed.  Improve readability of code by flagging lockless
>>> shared-memory accesses, help checkers better find bugs, and prevent the
>>> occasional compiler mischief!
>>
>> I think this would be a mistake for 3 reasons:
>>
>> 1. If READ_ONCE()/WRITE_ONCE() is necessary to prevent load/store tearing
>>of any normally-atomic type (char/int/long/void*), then _every_ access
>>would require READ_ONCE()/WRITE_ONCE(), thus eliminating any possibility
>>of compiler optimization (eg. eliding redundant loads) where it would
>>otherwise be possible.
> 
> The point about eliding redundant loads is a good one, at least in those
> cases where it is a reasonable optimization.  Should we ever get to a
> point where we no longer use pre-C11 compilers, those use cases could
> potentially use memory_order_relaxed loads.  Preferably wrappered in
> something that can be typed with fewer characters.  And it could of course
> lead to an interesting discussion of what use cases would be required
> to justify this change, but what else is new?

I believe lockless access is quite widespread in the kernel, and this
use was based on the previous a

Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field

2016-05-17 Thread Peter Hurley

On 05/16/2016 10:50 AM, Peter Zijlstra wrote:
> On Mon, May 16, 2016 at 07:17:42AM -0700, Peter Hurley wrote:
> 
>>>> Correct; which is why we should always use {READ,WRITE}_ONCE() for
>>>> anything that is used locklessly.
>>>
>>> Completely agreed.  Improve readability of code by flagging lockless
>>> shared-memory accesses, help checkers better find bugs, and prevent the
>>> occasional compiler mischief!
>>
>> I think this would be a mistake for 3 reasons:
>>
>> 1. If READ_ONCE()/WRITE_ONCE() is necessary to prevent load/store tearing
>>of any normally-atomic type (char/int/long/void*), then _every_ access
>>would require READ_ONCE()/WRITE_ONCE(), thus eliminating any possibility
>>of compiler optimization (eg. eliding redundant loads) where it would
>>otherwise be possible.
> 
> Should not really be a problem I think; you already have to be very
> careful when doing lockless stuff.

I think you may not understand my point here.

Consider normal list processing.

Paul added a READ_ONCE() load of head->next to list_empty().
That's because list_empty() is being used locklessly in lots of places,
not just for RCU lists.

Ok, so the load won't be torn. But the store to ->next can, so Paul
added WRITE_ONCE() to some but not all those list primitives too, even
though those aren't being used locklessly.

Note that the compiler _cannot_ optimize those stores even though
they're being used with locks held; IOW, exactly the situation that
volatile-consider-harmful rails against.

Similarly for the load in list_empty() even if it's used with locks held.

As Paul pointed out in his reply, there is no way to address this right now.

What's really needed is separate, not overloaded semantics:

1. "*If* you load or store this value, do it with single memory access, but
feel free to optimize (elide load/defer store/hoist load/whatever)."

2. "No, seriously, load/store this value regardless of what you think you
know." == READ_ONCE/WRITE_ONCE

If we start adding READ_ONCE/WRITE_ONCE to every lockless load/store,
worse code will be generated even though the vast majority of current use
is safe.

>> 2. Makes a mess of otherwise readable code.
> 
> We have to disagree here; I think code with {READ,WRITE}_ONCE() is more
> readable, as their presence is a clear indication something special is
> up.

I think you and Paul may wildly underestimate the scale of lockless
use in the kernel not currently annotated by READ_ONCE()/WRITE_ONCE().

Even in primitives designed for lockless concurrency like
rwsem and seqlock and sched/core, much less higher order code like
ipc/msg.c and vfs.

The majority of this lockless use is safe if the assumption
that loads/stores are performed atomically for char/short/int/long/void*
hold; in other words usage #1 above.

>> 3. Error-prone; ie., easy to overlook in review.
> 
> lockless stuff is error prone by nature; what's your point?

That lockless use is very common, even outside core functionality, and
needlessly splattering READ_ONCE/WRITE_ONCE when the existing code
generation is already safe, will not make it better. And that it will be
no safer by adding READ_ONCE/WRITE_ONCE because plenty of accesses will
be overlooked, like the stores to sem->owner are now.

>> There is no practical difference between _always_ using 
>> READ_ONCE()/WRITE_ONCE()
>> (to prevent tearing) and declaring the field volatile.
> 
> There is, declaring the field volatile doesn't make it stand out in the
> code while reading at all; it also doesn't allow you to omit the
> volatile qualifier in places where it doesn't matter.
> 
> The whole; variables aren't volatile, accesses are, thing is still very
> much in effect.

I'm not really seriously arguing for volatile declarations; I'm pointing
out that READ_ONCE()/WRITE_ONCE() has overloaded meaning that is
counter-productive.

For example, you point out that it doesn't allow you to omit the
volatile qualifier but yet we're adding it to underlying primitives
that may or may not be used locklessly.

Guaranteed list_empty() and list_add() are used way more than INIT_LIST_HEAD().

>> So we've come full-circle from volatile-considered-harmful.
> 
> I don't think so; the cases that document talks about are still very
> much relevant and volatile should not be used for that.

The example below is clearly wrong now.

"Another situation where one might be tempted to use volatile is
when the processor is busy-waiting on the value of a variable.  The 
right
way to perform a busy wait is:

while (my_variable != what_i_want)
cpu_relax();

The cpu_relax() call can lower CPU power consumption or yield to a
hyperthreaded twin processor; it

Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field

2016-05-17 Thread Peter Hurley

On 05/16/2016 10:50 AM, Peter Zijlstra wrote:
> On Mon, May 16, 2016 at 07:17:42AM -0700, Peter Hurley wrote:
> 
>>>> Correct; which is why we should always use {READ,WRITE}_ONCE() for
>>>> anything that is used locklessly.
>>>
>>> Completely agreed.  Improve readability of code by flagging lockless
>>> shared-memory accesses, help checkers better find bugs, and prevent the
>>> occasional compiler mischief!
>>
>> I think this would be a mistake for 3 reasons:
>>
>> 1. If READ_ONCE()/WRITE_ONCE() is necessary to prevent load/store tearing
>>of any normally-atomic type (char/int/long/void*), then _every_ access
>>would require READ_ONCE()/WRITE_ONCE(), thus eliminating any possibility
>>of compiler optimization (eg. eliding redundant loads) where it would
>>otherwise be possible.
> 
> Should not really be a problem I think; you already have to be very
> careful when doing lockless stuff.

I think you may not understand my point here.

Consider normal list processing.

Paul added a READ_ONCE() load of head->next to list_empty().
That's because list_empty() is being used locklessly in lots of places,
not just for RCU lists.

Ok, so the load won't be torn. But the store to ->next can, so Paul
added WRITE_ONCE() to some but not all those list primitives too, even
though those aren't being used locklessly.

Note that the compiler _cannot_ optimize those stores even though
they're being used with locks held; IOW, exactly the situation that
volatile-consider-harmful rails against.

Similarly for the load in list_empty() even if it's used with locks held.

As Paul pointed out in his reply, there is no way to address this right now.

What's really needed is separate, not overloaded semantics:

1. "*If* you load or store this value, do it with single memory access, but
feel free to optimize (elide load/defer store/hoist load/whatever)."

2. "No, seriously, load/store this value regardless of what you think you
know." == READ_ONCE/WRITE_ONCE

If we start adding READ_ONCE/WRITE_ONCE to every lockless load/store,
worse code will be generated even though the vast majority of current use
is safe.

>> 2. Makes a mess of otherwise readable code.
> 
> We have to disagree here; I think code with {READ,WRITE}_ONCE() is more
> readable, as their presence is a clear indication something special is
> up.

I think you and Paul may wildly underestimate the scale of lockless
use in the kernel not currently annotated by READ_ONCE()/WRITE_ONCE().

Even in primitives designed for lockless concurrency like
rwsem and seqlock and sched/core, much less higher order code like
ipc/msg.c and vfs.

The majority of this lockless use is safe if the assumption
that loads/stores are performed atomically for char/short/int/long/void*
hold; in other words usage #1 above.

>> 3. Error-prone; ie., easy to overlook in review.
> 
> lockless stuff is error prone by nature; what's your point?

That lockless use is very common, even outside core functionality, and
needlessly splattering READ_ONCE/WRITE_ONCE when the existing code
generation is already safe, will not make it better. And that it will be
no safer by adding READ_ONCE/WRITE_ONCE because plenty of accesses will
be overlooked, like the stores to sem->owner are now.

>> There is no practical difference between _always_ using 
>> READ_ONCE()/WRITE_ONCE()
>> (to prevent tearing) and declaring the field volatile.
> 
> There is, declaring the field volatile doesn't make it stand out in the
> code while reading at all; it also doesn't allow you to omit the
> volatile qualifier in places where it doesn't matter.
> 
> The whole; variables aren't volatile, accesses are, thing is still very
> much in effect.

I'm not really seriously arguing for volatile declarations; I'm pointing
out that READ_ONCE()/WRITE_ONCE() has overloaded meaning that is
counter-productive.

For example, you point out that it doesn't allow you to omit the
volatile qualifier but yet we're adding it to underlying primitives
that may or may not be used locklessly.

Guaranteed list_empty() and list_add() are used way more than INIT_LIST_HEAD().

>> So we've come full-circle from volatile-considered-harmful.
> 
> I don't think so; the cases that document talks about are still very
> much relevant and volatile should not be used for that.

The example below is clearly wrong now.

"Another situation where one might be tempted to use volatile is
when the processor is busy-waiting on the value of a variable.  The 
right
way to perform a busy wait is:

while (my_variable != what_i_want)
cpu_relax();

The cpu_relax() call can lower CPU power consumption or yield to a
hyperthreaded twin processor; it

Re: [PATCH 2/2] serial: 8250_mid: Read RX buffer on RX DMA timeout for DNV

2016-05-17 Thread Peter Hurley

On 05/13/2016 04:27 AM, Andy Shevchenko wrote:
> On Fri, 2016-05-13 at 18:15 +0800, kbuild test robot wrote:
>> Hi,
>>
>> [auto build test ERROR on next-20160513]
>> [cannot apply to tty/tty-testing usb/usb-testing v4.6-rc7 v4.6-rc6
>> v4.6-rc5 v4.6-rc7]
>> [if your patch is applied to the wrong git tree, please drop us a note
>> to help improving the system]
>>
>> url:https://github.com/0day-ci/linux/commits/Chuah-Kim-Tatt/Fix-DN
>> V-HSUART-RX-DMA-timeout-interrupt-issue/20160513-162046
>> config: i386-randconfig-s0-201619 (attached as .config)
>> compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430
>> reproduce:
>> # save the attached .config to linux build tree
>> make ARCH=i386 
>>
>> Note: the linux-review/Chuah-Kim-Tatt/Fix-DNV-HSUART-RX-DMA-timeout-
>> interrupt-issue/20160513-162046 HEAD
>> 0354112aa9821bec8d278ad06b3d543724f5291d builds fine.
>>   It only hurts bisectibility.
>>
>> All errors (new ones prefixed by >>):
> 
> Peter, what happened to your DMA series in the linux-next? Did I miss
> any discussion related?

Sorry, I don't understand. What is the problem?

And where is this patch coming from? I didn't see it on linux-serial.

Regards,
Peter Hurley


>>>>
>>>> ERROR: "serial8250_rx_dma_flush"
>>>> [drivers/tty/serial/8250/8250_mid.ko] undefined!
>> ---
>> 0-DAY kernel test infrastructureOpen Source Technology
>> Center
>> https://lists.01.org/pipermail/kbuild-all   Intel
>> Corporation
>

Re: [PATCH 2/2] serial: 8250_mid: Read RX buffer on RX DMA timeout for DNV

2016-05-17 Thread Peter Hurley

On 05/13/2016 04:27 AM, Andy Shevchenko wrote:
> On Fri, 2016-05-13 at 18:15 +0800, kbuild test robot wrote:
>> Hi,
>>
>> [auto build test ERROR on next-20160513]
>> [cannot apply to tty/tty-testing usb/usb-testing v4.6-rc7 v4.6-rc6
>> v4.6-rc5 v4.6-rc7]
>> [if your patch is applied to the wrong git tree, please drop us a note
>> to help improving the system]
>>
>> url:https://github.com/0day-ci/linux/commits/Chuah-Kim-Tatt/Fix-DN
>> V-HSUART-RX-DMA-timeout-interrupt-issue/20160513-162046
>> config: i386-randconfig-s0-201619 (attached as .config)
>> compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430
>> reproduce:
>> # save the attached .config to linux build tree
>> make ARCH=i386 
>>
>> Note: the linux-review/Chuah-Kim-Tatt/Fix-DNV-HSUART-RX-DMA-timeout-
>> interrupt-issue/20160513-162046 HEAD
>> 0354112aa9821bec8d278ad06b3d543724f5291d builds fine.
>>   It only hurts bisectibility.
>>
>> All errors (new ones prefixed by >>):
> 
> Peter, what happened to your DMA series in the linux-next? Did I miss
> any discussion related?

Sorry, I don't understand. What is the problem?

And where is this patch coming from? I didn't see it on linux-serial.

Regards,
Peter Hurley


>>>>
>>>> ERROR: "serial8250_rx_dma_flush"
>>>> [drivers/tty/serial/8250/8250_mid.ko] undefined!
>> ---
>> 0-DAY kernel test infrastructureOpen Source Technology
>> Center
>> https://lists.01.org/pipermail/kbuild-all   Intel
>> Corporation
>

Re: tty crash in Linux 4.6

2016-05-17 Thread Peter Hurley

On 05/17/2016 08:57 AM, Peter Hurley wrote:
> On 05/16/2016 04:36 PM, Peter Hurley wrote:
>> > Hi Mikulas,
>> > 
>> > On 05/16/2016 01:12 PM, Mikulas Patocka wrote:
>>> >> Hi
>>> >>
>>> >> In the kernel 4.6 I get crashes in the tty layer. I can reproduce the 
>>> >> crash by logging into the machine with ssh and typing before the prompt 
>>> >> appears.
>> > 
>> > Thanks for the report.
>> > I tried to reproduce this a number of times on different machines
>> > with no luck.
>
> I was able to reproduce this crash with a test jig.
> The patch below fixed it, but I'm testing a better patch now, which
> I'll get to you asap.

--- >% ---
Subject: [PATCH] tty: Fix ldisc crash on reopened tty

If the tty has been hungup, the ldisc instance may have been destroyed.
Continued input to the tty will be ignored as long as the ldisc instance
is not visible to the flush_to_ldisc kworker. However, when the tty
is reopened and a new ldisc instance is created, the flush_to_ldisc
kworker can obtain an ldisc reference before the new ldisc is
completely initialized. This will likely crash:

 BUG: unable to handle kernel paging request at 2260
 IP: [] n_tty_receive_buf_common+0x6d/0xb80
 PGD 2ab581067 PUD 290c11067 PMD 0
 Oops:  [#1] PREEMPT SMP
 Modules linked in: nls_iso8859_1 ip6table_filter [.]
 CPU: 2 PID: 103 Comm: kworker/u16:1 Not tainted 4.6.0-rc7+wip-xeon+debug 
#rc7+wip
 Hardware name: Dell Inc. Precision WorkStation T5400  /0RW203, BIOS A11 
04/30/2012
 Workqueue: events_unbound flush_to_ldisc
 task: 8802ad16d100 ti: 8802ad31c000 task.ti: 8802ad31c000
 RIP: 0010:[]  [] 
n_tty_receive_buf_common+0x6d/0xb80
 RSP: 0018:8802ad31fc70  EFLAGS: 00010296
 RAX:  RBX: 8802aaddd800 RCX: 0001
 RDX:  RSI: 810db48f RDI: 0246
 RBP: 8802ad31fd08 R08:  R09: 0001
 R10: 8802aadddb28 R11: 0001 R12: 8800ba6da808
 R13: 8802ad18be80 R14: 8800ba6da858 R15: 8800ba6da800
 FS:  () GS:8802b0a0() knlGS:
 CS:  0010 DS:  ES:  CR0: 80050033
 CR2: 2260 CR3: 00028ee5d000 CR4: 06e0
 Stack:
  81531219 8802aadddab8 8802aae0 8802aa78
  0001 8800ba6da858 8800ba6da860 8802ad31fd30
  81885f78 81531219  0002
 Call Trace:
  [] ? flush_to_ldisc+0x49/0xd0
  [] ? mutex_lock_nested+0x2c8/0x430
  [] ? flush_to_ldisc+0x49/0xd0
  [] n_tty_receive_buf2+0x14/0x20
  [] tty_ldisc_receive_buf+0x22/0x50
  [] flush_to_ldisc+0xbe/0xd0
  [] process_one_work+0x1ed/0x6e0
  [] ? process_one_work+0x16f/0x6e0
  [] worker_thread+0x4e/0x490
  [] ? process_one_work+0x6e0/0x6e0
  [] kthread+0xf2/0x110
  [] ? preempt_count_sub+0x4c/0x80
  [] ret_from_fork+0x22/0x50
  [] ? kthread_create_on_node+0x220/0x220
 Code: ff ff e8 27 a0 35 00 48 8d 83 78 05 00 00 c7 45 c0 00 00 00 00 48 89 45 
80 48
   8d 83 e0 05 00 00 48 89 85 78 ff ff ff 48 8b 45 b8 <48> 8b b8 60 22 00 
00 48
   8b 30 89 f8 8b 8b 88 04 00 00 29 f0 8d
 RIP  [] n_tty_receive_buf_common+0x6d/0xb80
  RSP 
 CR2: 2260

Ensure the kworker cannot obtain the ldisc reference until the new ldisc
is completely initialized.

Fixes: 892d1fa7eaae ("tty: Destroy ldisc instance on hangup")
Reported-by: Mikulas Patocka <mpato...@redhat.com>
Signed-off-by: Peter Hurley <pe...@hurleysoftware.com>
---
 drivers/tty/tty_ldisc.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index cdd063f..bda0c85 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -669,16 +669,17 @@ int tty_ldisc_reinit(struct tty_struct *tty, int disc)
tty_ldisc_put(tty->ldisc);
}
 
-   /* switch the line discipline */
-   tty->ldisc = ld;
tty_set_termios_ldisc(tty, disc);
-   retval = tty_ldisc_open(tty, tty->ldisc);
+   retval = tty_ldisc_open(tty, ld);
if (retval) {
if (!WARN_ON(disc == N_TTY)) {
-   tty_ldisc_put(tty->ldisc);
-   tty->ldisc = NULL;
+   tty_ldisc_put(ld);
+   ld = NULL;
}
}
+
+   /* switch the line discipline */
+   smp_store_release(>ldisc, ld);
return retval;
 }
 
-- 
2.8.2

Re: tty crash in Linux 4.6

2016-05-17 Thread Peter Hurley

On 05/17/2016 08:57 AM, Peter Hurley wrote:
> On 05/16/2016 04:36 PM, Peter Hurley wrote:
>> > Hi Mikulas,
>> > 
>> > On 05/16/2016 01:12 PM, Mikulas Patocka wrote:
>>> >> Hi
>>> >>
>>> >> In the kernel 4.6 I get crashes in the tty layer. I can reproduce the 
>>> >> crash by logging into the machine with ssh and typing before the prompt 
>>> >> appears.
>> > 
>> > Thanks for the report.
>> > I tried to reproduce this a number of times on different machines
>> > with no luck.
>
> I was able to reproduce this crash with a test jig.
> The patch below fixed it, but I'm testing a better patch now, which
> I'll get to you asap.

--- >% ---
Subject: [PATCH] tty: Fix ldisc crash on reopened tty

If the tty has been hungup, the ldisc instance may have been destroyed.
Continued input to the tty will be ignored as long as the ldisc instance
is not visible to the flush_to_ldisc kworker. However, when the tty
is reopened and a new ldisc instance is created, the flush_to_ldisc
kworker can obtain an ldisc reference before the new ldisc is
completely initialized. This will likely crash:

 BUG: unable to handle kernel paging request at 2260
 IP: [] n_tty_receive_buf_common+0x6d/0xb80
 PGD 2ab581067 PUD 290c11067 PMD 0
 Oops:  [#1] PREEMPT SMP
 Modules linked in: nls_iso8859_1 ip6table_filter [.]
 CPU: 2 PID: 103 Comm: kworker/u16:1 Not tainted 4.6.0-rc7+wip-xeon+debug 
#rc7+wip
 Hardware name: Dell Inc. Precision WorkStation T5400  /0RW203, BIOS A11 
04/30/2012
 Workqueue: events_unbound flush_to_ldisc
 task: 8802ad16d100 ti: 8802ad31c000 task.ti: 8802ad31c000
 RIP: 0010:[]  [] 
n_tty_receive_buf_common+0x6d/0xb80
 RSP: 0018:8802ad31fc70  EFLAGS: 00010296
 RAX:  RBX: 8802aaddd800 RCX: 0001
 RDX:  RSI: 810db48f RDI: 0246
 RBP: 8802ad31fd08 R08:  R09: 0001
 R10: 8802aadddb28 R11: 0001 R12: 8800ba6da808
 R13: 8802ad18be80 R14: 8800ba6da858 R15: 8800ba6da800
 FS:  () GS:8802b0a0() knlGS:
 CS:  0010 DS:  ES:  CR0: 80050033
 CR2: 2260 CR3: 00028ee5d000 CR4: 06e0
 Stack:
  81531219 8802aadddab8 8802aae0 8802aa78
  0001 8800ba6da858 8800ba6da860 8802ad31fd30
  81885f78 81531219  0002
 Call Trace:
  [] ? flush_to_ldisc+0x49/0xd0
  [] ? mutex_lock_nested+0x2c8/0x430
  [] ? flush_to_ldisc+0x49/0xd0
  [] n_tty_receive_buf2+0x14/0x20
  [] tty_ldisc_receive_buf+0x22/0x50
  [] flush_to_ldisc+0xbe/0xd0
  [] process_one_work+0x1ed/0x6e0
  [] ? process_one_work+0x16f/0x6e0
  [] worker_thread+0x4e/0x490
  [] ? process_one_work+0x6e0/0x6e0
  [] kthread+0xf2/0x110
  [] ? preempt_count_sub+0x4c/0x80
  [] ret_from_fork+0x22/0x50
  [] ? kthread_create_on_node+0x220/0x220
 Code: ff ff e8 27 a0 35 00 48 8d 83 78 05 00 00 c7 45 c0 00 00 00 00 48 89 45 
80 48
   8d 83 e0 05 00 00 48 89 85 78 ff ff ff 48 8b 45 b8 <48> 8b b8 60 22 00 
00 48
   8b 30 89 f8 8b 8b 88 04 00 00 29 f0 8d
 RIP  [] n_tty_receive_buf_common+0x6d/0xb80
  RSP 
 CR2: 2260

Ensure the kworker cannot obtain the ldisc reference until the new ldisc
is completely initialized.

Fixes: 892d1fa7eaae ("tty: Destroy ldisc instance on hangup")
Reported-by: Mikulas Patocka 
Signed-off-by: Peter Hurley 
---
 drivers/tty/tty_ldisc.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index cdd063f..bda0c85 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -669,16 +669,17 @@ int tty_ldisc_reinit(struct tty_struct *tty, int disc)
tty_ldisc_put(tty->ldisc);
}
 
-   /* switch the line discipline */
-   tty->ldisc = ld;
tty_set_termios_ldisc(tty, disc);
-   retval = tty_ldisc_open(tty, tty->ldisc);
+   retval = tty_ldisc_open(tty, ld);
if (retval) {
if (!WARN_ON(disc == N_TTY)) {
-   tty_ldisc_put(tty->ldisc);
-   tty->ldisc = NULL;
+   tty_ldisc_put(ld);
+   ld = NULL;
}
}
+
+   /* switch the line discipline */
+   smp_store_release(>ldisc, ld);
return retval;
 }
 
-- 
2.8.2

Re: [PATCH v3 4/4] locking/rwsem: Streamline the rwsem_optimistic_spin() code

2016-05-17 Thread Peter Hurley

On 05/12/2016 03:56 PM, Waiman Long wrote:
> This patch moves the owner loading and checking code entirely inside of
> rwsem_spin_on_owner() to simplify the logic of rwsem_optimistic_spin()
> loop.

Thanks for this.

Reviewed-by: Peter Hurley <pe...@hurleysoftware.com>

Re: [PATCH v3 4/4] locking/rwsem: Streamline the rwsem_optimistic_spin() code

2016-05-17 Thread Peter Hurley

On 05/12/2016 03:56 PM, Waiman Long wrote:
> This patch moves the owner loading and checking code entirely inside of
> rwsem_spin_on_owner() to simplify the logic of rwsem_optimistic_spin()
> loop.

Thanks for this.

Reviewed-by: Peter Hurley

Re: [PATCH v3 2/4] locking/rwsem: Don't wake up one's own task

2016-05-17 Thread Peter Hurley

On 05/12/2016 03:56 PM, Waiman Long wrote:
> As rwsem_down_read_failed() will queue itself and potentially call
> __rwsem_do_wake(sem, RWSEM_WAKE_ANY), it is possible that a reader
> will try to wake up its own task. This patch adds a check to make
> sure that this won't happen.

Although there's no particular harm in the current code, this at
least spells out this condition is normal (ie., when a failed reader
wakes itself while waking the other waiting readers).

Reviewed-by: Peter Hurley <pe...@hurleysoftware.com>

Re: [PATCH v3 2/4] locking/rwsem: Don't wake up one's own task

2016-05-17 Thread Peter Hurley

On 05/12/2016 03:56 PM, Waiman Long wrote:
> As rwsem_down_read_failed() will queue itself and potentially call
> __rwsem_do_wake(sem, RWSEM_WAKE_ANY), it is possible that a reader
> will try to wake up its own task. This patch adds a check to make
> sure that this won't happen.

Although there's no particular harm in the current code, this at
least spells out this condition is normal (ie., when a failed reader
wakes itself while waking the other waiting readers).

Reviewed-by: Peter Hurley

Re: [PATCH v3 3/4] locking/rwsem: Improve reader wakeup code

2016-05-17 Thread Peter Hurley

On 05/12/2016 03:56 PM, Waiman Long wrote:
> In __rwsem_do_wake(), the reader wakeup code will assume a writer
> has stolen the lock if the active reader/writer count is not 0.
> However, this is not as reliable an indicator as the original
> "< RWSEM_WAITING_BIAS" check. If another reader is present, the code
> will still break out and exit even if the writer is gone. This patch
> changes it to check the same "< RWSEM_WAITING_BIAS" condition to
> reduce the chance of false positive.

Nice.

Reviewed-by: Peter Hurley <pe...@hurleysoftware.com>

Re: [PATCH v3 3/4] locking/rwsem: Improve reader wakeup code

2016-05-17 Thread Peter Hurley

On 05/12/2016 03:56 PM, Waiman Long wrote:
> In __rwsem_do_wake(), the reader wakeup code will assume a writer
> has stolen the lock if the active reader/writer count is not 0.
> However, this is not as reliable an indicator as the original
> "< RWSEM_WAITING_BIAS" check. If another reader is present, the code
> will still break out and exit even if the writer is gone. This patch
> changes it to check the same "< RWSEM_WAITING_BIAS" condition to
> reduce the chance of false positive.

Nice.

Reviewed-by: Peter Hurley

Re: tty crash in Linux 4.6

2016-05-17 Thread Peter Hurley

On 05/16/2016 04:36 PM, Peter Hurley wrote:
> Hi Mikulas,
> 
> On 05/16/2016 01:12 PM, Mikulas Patocka wrote:
>> Hi
>>
>> In the kernel 4.6 I get crashes in the tty layer. I can reproduce the 
>> crash by logging into the machine with ssh and typing before the prompt 
>> appears.
> 
> Thanks for the report.
> I tried to reproduce this a number of times on different machines
> with no luck.

I was able to reproduce this crash with a test jig.
The patch below fixed it, but I'm testing a better patch now, which
I'll get to you asap.

Regards,
Peter Hurley


>> The crash is caused by the pointer tty->disc_data being NULL in the 
>> function n_tty_receive_buf_common. The crash happens on the statement 
>> smp_load_acquire(>read_tail).
>>
>> Bisecting shows that the crashes are caused by the patch 
>> 892d1fa7eaaed9d3c04954cb140c34ebc3393932 ("tty: Destroy ldisc instance on 
>> hangup").
> 
> 
> Can you try the test patch below?
> 
> Regards,
> Peter Hurley
> 
> 
>> Kernel Fault: Code=15 regs=7d9e0720 (Addr=2260)
>> CPU: 0 PID: 3319 Comm: kworker/u8:0 Not tainted 4.6.0 #1
>> Workqueue: events_unbound flush_to_ldisc
>> task: 7c25ea80 ti: 7d9e task.ti: 7d9e
>>
>>  YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
>> PSW: 1100 Not tainted
>> r00-03  0804000f 4076cd10 40475fb4 7f761800
>> r04-07  40749510 0001 7f761800 7d9e0490
>> r08-11  7e722890  7da4ec00 7f763823
>> r12-15   7fc08ea8 7fc08c78 4080e080
>> r16-19  7fc08c00 0001  2260
>> r20-23  7f7618b0 7c25ea80 0001 0001
>> r24-27   080f 7f7618ac 40749510
>> r28-31  0001 7d9e0840 7d9e0720 0001
>> sr00-03  086c8800   086c8800
>> sr04-07     
>>
>> IASQ:   IAOQ: 40475fd4 
>> 40475fd8
>>  IIR: 0e6c00d5ISR:   IOR: 2260
>>  CPU:0   CR30: 7d9e CR31: ff87e7ffbc9e
>>  ORIG_R28: 4080a180
>>  IAOQ[0]: n_tty_receive_buf_common+0xb4/0xbe0
>>  IAOQ[1]: n_tty_receive_buf_common+0xb8/0xbe0
>>  RP(r2): n_tty_receive_buf_common+0x94/0xbe0
>> Backtrace:
>>  [<40476b14>] n_tty_receive_buf2+0x14/0x20
>>  [<4047a208>] tty_ldisc_receive_buf+0x30/0x90
>>  [<4047a544>] flush_to_ldisc+0x144/0x1c8
>>  [<402556bc>] process_one_work+0x1b4/0x460
>>  [<40255bbc>] worker_thread+0x1e4/0x5e0
>>  [<4025d454>] kthread+0x134/0x168
> 
> --- >% ---
> diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
> index 68947f6..f271832 100644
> --- a/drivers/tty/tty_ldisc.c
> +++ b/drivers/tty/tty_ldisc.c
> @@ -653,7 +653,7 @@ static void tty_reset_termios(struct tty_struct *tty)
>   *   Returns 0 if successful, otherwise error code < 0
>   */
>  
> -int tty_ldisc_reinit(struct tty_struct *tty, int disc)
> +static int __tty_ldisc_reinit(struct tty_struct *tty, int disc)
>  {
>   struct tty_ldisc *ld;
>   int retval;
> @@ -682,6 +682,16 @@ int tty_ldisc_reinit(struct tty_struct *tty, int disc)
>   return retval;
>  }
>  
> +int tty_ldisc_reinit(struct tty_struct *tty, int disc)
> +{
> + int retval;
> +
> + tty_ldisc_lock(tty, MAX_SCHEDULE_TIMEOUT);
> + retval = __tty_ldisc_reinit(tty, disc);
> + tty_ldisc_unlock(tty);
> + return retval;
> +}
> +
>  /**
>   *   tty_ldisc_hangup-   hangup ldisc reset
>   *   @tty: tty being hung up
> @@ -732,8 +742,8 @@ void tty_ldisc_hangup(struct tty_struct *tty, bool reinit)
>  
>   if (tty->ldisc) {
>   if (reinit) {
> - if (tty_ldisc_reinit(tty, tty->termios.c_line) < 0)
> - tty_ldisc_reinit(tty, N_TTY);
> + if (__tty_ldisc_reinit(tty, tty->termios.c_line) < 0)
> + __tty_ldisc_reinit(tty, N_TTY);
>   } else
>   tty_ldisc_kill(tty);
>   }
>

Re: tty crash in Linux 4.6

2016-05-17 Thread Peter Hurley

On 05/16/2016 04:36 PM, Peter Hurley wrote:
> Hi Mikulas,
> 
> On 05/16/2016 01:12 PM, Mikulas Patocka wrote:
>> Hi
>>
>> In the kernel 4.6 I get crashes in the tty layer. I can reproduce the 
>> crash by logging into the machine with ssh and typing before the prompt 
>> appears.
> 
> Thanks for the report.
> I tried to reproduce this a number of times on different machines
> with no luck.

I was able to reproduce this crash with a test jig.
The patch below fixed it, but I'm testing a better patch now, which
I'll get to you asap.

Regards,
Peter Hurley


>> The crash is caused by the pointer tty->disc_data being NULL in the 
>> function n_tty_receive_buf_common. The crash happens on the statement 
>> smp_load_acquire(>read_tail).
>>
>> Bisecting shows that the crashes are caused by the patch 
>> 892d1fa7eaaed9d3c04954cb140c34ebc3393932 ("tty: Destroy ldisc instance on 
>> hangup").
> 
> 
> Can you try the test patch below?
> 
> Regards,
> Peter Hurley
> 
> 
>> Kernel Fault: Code=15 regs=7d9e0720 (Addr=2260)
>> CPU: 0 PID: 3319 Comm: kworker/u8:0 Not tainted 4.6.0 #1
>> Workqueue: events_unbound flush_to_ldisc
>> task: 7c25ea80 ti: 7d9e task.ti: 7d9e
>>
>>  YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
>> PSW: 1100 Not tainted
>> r00-03  0804000f 4076cd10 40475fb4 7f761800
>> r04-07  40749510 0001 7f761800 7d9e0490
>> r08-11  7e722890  7da4ec00 7f763823
>> r12-15   7fc08ea8 7fc08c78 4080e080
>> r16-19  7fc08c00 0001  2260
>> r20-23  7f7618b0 7c25ea80 0001 0001
>> r24-27   080f 7f7618ac 40749510
>> r28-31  0001 7d9e0840 7d9e0720 0001
>> sr00-03  086c8800   086c8800
>> sr04-07     
>>
>> IASQ:   IAOQ: 40475fd4 
>> 40475fd8
>>  IIR: 0e6c00d5ISR:   IOR: 2260
>>  CPU:0   CR30: 7d9e CR31: ff87e7ffbc9e
>>  ORIG_R28: 4080a180
>>  IAOQ[0]: n_tty_receive_buf_common+0xb4/0xbe0
>>  IAOQ[1]: n_tty_receive_buf_common+0xb8/0xbe0
>>  RP(r2): n_tty_receive_buf_common+0x94/0xbe0
>> Backtrace:
>>  [<40476b14>] n_tty_receive_buf2+0x14/0x20
>>  [<4047a208>] tty_ldisc_receive_buf+0x30/0x90
>>  [<4047a544>] flush_to_ldisc+0x144/0x1c8
>>  [<402556bc>] process_one_work+0x1b4/0x460
>>  [<40255bbc>] worker_thread+0x1e4/0x5e0
>>  [<4025d454>] kthread+0x134/0x168
> 
> --- >% ---
> diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
> index 68947f6..f271832 100644
> --- a/drivers/tty/tty_ldisc.c
> +++ b/drivers/tty/tty_ldisc.c
> @@ -653,7 +653,7 @@ static void tty_reset_termios(struct tty_struct *tty)
>   *   Returns 0 if successful, otherwise error code < 0
>   */
>  
> -int tty_ldisc_reinit(struct tty_struct *tty, int disc)
> +static int __tty_ldisc_reinit(struct tty_struct *tty, int disc)
>  {
>   struct tty_ldisc *ld;
>   int retval;
> @@ -682,6 +682,16 @@ int tty_ldisc_reinit(struct tty_struct *tty, int disc)
>   return retval;
>  }
>  
> +int tty_ldisc_reinit(struct tty_struct *tty, int disc)
> +{
> + int retval;
> +
> + tty_ldisc_lock(tty, MAX_SCHEDULE_TIMEOUT);
> + retval = __tty_ldisc_reinit(tty, disc);
> + tty_ldisc_unlock(tty);
> + return retval;
> +}
> +
>  /**
>   *   tty_ldisc_hangup-   hangup ldisc reset
>   *   @tty: tty being hung up
> @@ -732,8 +742,8 @@ void tty_ldisc_hangup(struct tty_struct *tty, bool reinit)
>  
>   if (tty->ldisc) {
>   if (reinit) {
> - if (tty_ldisc_reinit(tty, tty->termios.c_line) < 0)
> - tty_ldisc_reinit(tty, N_TTY);
> + if (__tty_ldisc_reinit(tty, tty->termios.c_line) < 0)
> + __tty_ldisc_reinit(tty, N_TTY);
>   } else
>   tty_ldisc_kill(tty);
>   }
>

Re: ast: cursor flashing softlockups

2016-05-17 Thread Peter Hurley

[ +to Scot Doyle ]

Scot, please take a look at this soft lockup.

Regards,
Peter Hurley


Hi Ming,

On 05/17/2016 02:12 AM, Ming Lei wrote:
> Hi,
> 
> On Tue, May 17, 2016 at 4:07 AM, Dann Frazier
> <dann.fraz...@canonical.com> wrote:
>> Hi,
>>  I'm observing a soft lockup issue w/ the ASPEED controller on an
>> arm64 server platform. This was originally seen on Ubuntu's 4.4
>> kernel, but it is reproducible w/ vanilla 4.6-rc7 as well.
>>
>> [   32.792656] NMI watchdog: BUG: soft lockup - CPU#38 stuck for 22s!
>> [swapper/38:0]
>>
>> I observe this just once each time I boot into debian-installer (I'm
>> using a serial console, but the ast module gets loaded during
>> startup).
> 
> I have figured out that it is caused by 'mod_timer(timer, jiffies)' and
> 'ops->cur_blink_jiffies' is observed as zero in cursor_timer_handler()
> when the issue happened.

Thanks for tracking this down.

This softlockup looks to be caused by:

commit 27a4c827c34ac4256a190cc9d24607f953c1c459
Author: Scot Doyle <lkm...@scotdoyle.com>
Date:   Thu Mar 26 13:56:38 2015 +

fbcon: use the cursor blink interval provided by vt

vt now provides a cursor blink interval via vc_data. Use this
interval instead of the currently hardcoded 200 msecs. Store it in
fbcon_ops to avoid locking the console in cursor_timer_handler().

Signed-off-by: Scot Doyle <lkm...@scotdoyle.com>
Acked-by: Pavel Machek <pa...@ucw.cz>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

and

commit bd63364caa8df38bad2b25b11b2a1b849475cce5
Author: Scot Doyle <lkm...@scotdoyle.com>
Date:   Thu Mar 26 13:54:39 2015 +

vt: add cursor blink interval escape sequence

Add an escape sequence to specify the current console's cursor blink
interval. The interval is specified as a number of milliseconds 
until
the next cursor display state toggle, from 50 to 65535. 
/proc/loadavg
did not show a difference with a one msec interval, but the lower
bound is set to 50 msecs since slower hardware wasn't tested.

Store the interval in the vc_data structure for later access by 
fbcon,
initializing the value to fbcon's current hardcoded value of 200 
msecs.

Signed-off-by: Scot Doyle <lkm...@scotdoyle.com>
Acked-by: Pavel Machek <pa...@ucw.cz>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>



> Looks it is a real fbcon/vt issue, see following:
> 
> fbcon_init()
> <-.con_init
>   <-visual_init()
> 
> reset_terminal()
> <-vc_init()
> 
> vc->vc_cur_blink_ms is just set in reset_terminal() from vc_init() path,
> and ops->cur_blink_jiffies is figured out from vc->vc_cur_blink_ms
> in fbcon_init().
> 
> And visual_init() is always run before vc_init(), so ops->cur_blink_jiffies
> is initialized as zero and cause the soft lockup issue finally.
> 
> Thanks,
> Ming
> 
>>
>> perf shows that the CPU caught by the NMI is typically in code
>> updating the cursor timer:
>>
>> -   16.92%  swapper  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
>>- _raw_spin_unlock_irqrestore
>>   + 16.87% mod_timer
>>   + 0.05% cursor_timer_handler
>> -   12.15%  swapper  [kernel.kallsyms]  [k] queue_work_on
>>- queue_work_on
>>   + 12.00% cursor_timer_handler
>>   + 0.15% call_timer_fn
>> +   10.98%  swapper  [kernel.kallsyms]  [k] run_timer_softirq
>> -2.23%  swapper  [kernel.kallsyms]  [k] mod_timer
>>- mod_timer
>>   + 1.97% cursor_timer_handler
>>   + 0.26% call_timer_fn
>>
>> During the same period, I can see that another CPU is actively
>> executing the timer function:
>>
>> -   42.18%  kworker/u96:2  [kernel.kallsyms]  [k] ww_mutex_unlock
>>- ww_mutex_unlock
>>   - 40.70% ast_dirty_update
>>ast_imageblit
>>soft_cursor
>>bit_cursor
>>fb_flashcursor
>>process_one_work
>>worker_thread
>>kthread
>>ret_from_fork
>>   + 1.48% ast_imageblit
>> -   40.15%  kworker/u96:2  [kernel.kallsyms]  [k] __memcpy_toio
>>- __memcpy_toio
>>   + 31.54% ast_dirty_update
>>   + 8.61% ast_imageblit
>>
>> Using the graph function tracer on fb_flashcursor(), I see that
>> ast_dirty_update usually takes around 60 us, in which it makes 16
>> calls to __memcpy_toio(). However, there is always one instance on
>> every boot of the installer where ast_dirty_update() takes ~98 *ms* to
>> complete, during which it makes 743 calls to __memcpy_toio(). While
>> that  doesn't directly account for the full 22s, I do wonder if that
>> maybe a smoking gun.
>>
>> fyi, this is being tracked at: https://bugs.launchpad.net/bugs/1574814
>>
>>   -dann

Re: ast: cursor flashing softlockups

2016-05-17 Thread Peter Hurley

[ +to Scot Doyle ]

Scot, please take a look at this soft lockup.

Regards,
Peter Hurley


Hi Ming,

On 05/17/2016 02:12 AM, Ming Lei wrote:
> Hi,
> 
> On Tue, May 17, 2016 at 4:07 AM, Dann Frazier
>  wrote:
>> Hi,
>>  I'm observing a soft lockup issue w/ the ASPEED controller on an
>> arm64 server platform. This was originally seen on Ubuntu's 4.4
>> kernel, but it is reproducible w/ vanilla 4.6-rc7 as well.
>>
>> [   32.792656] NMI watchdog: BUG: soft lockup - CPU#38 stuck for 22s!
>> [swapper/38:0]
>>
>> I observe this just once each time I boot into debian-installer (I'm
>> using a serial console, but the ast module gets loaded during
>> startup).
> 
> I have figured out that it is caused by 'mod_timer(timer, jiffies)' and
> 'ops->cur_blink_jiffies' is observed as zero in cursor_timer_handler()
> when the issue happened.

Thanks for tracking this down.

This softlockup looks to be caused by:

commit 27a4c827c34ac4256a190cc9d24607f953c1c459
Author: Scot Doyle 
Date:   Thu Mar 26 13:56:38 2015 +

fbcon: use the cursor blink interval provided by vt

vt now provides a cursor blink interval via vc_data. Use this
interval instead of the currently hardcoded 200 msecs. Store it in
fbcon_ops to avoid locking the console in cursor_timer_handler().

Signed-off-by: Scot Doyle 
Acked-by: Pavel Machek 
Signed-off-by: Greg Kroah-Hartman 

and

commit bd63364caa8df38bad2b25b11b2a1b849475cce5
Author: Scot Doyle 
Date:   Thu Mar 26 13:54:39 2015 +

vt: add cursor blink interval escape sequence

Add an escape sequence to specify the current console's cursor blink
interval. The interval is specified as a number of milliseconds 
until
the next cursor display state toggle, from 50 to 65535. 
/proc/loadavg
did not show a difference with a one msec interval, but the lower
bound is set to 50 msecs since slower hardware wasn't tested.

Store the interval in the vc_data structure for later access by 
fbcon,
initializing the value to fbcon's current hardcoded value of 200 
msecs.

Signed-off-by: Scot Doyle 
Acked-by: Pavel Machek 
Signed-off-by: Greg Kroah-Hartman 



> Looks it is a real fbcon/vt issue, see following:
> 
> fbcon_init()
> <-.con_init
>   <-visual_init()
> 
> reset_terminal()
> <-vc_init()
> 
> vc->vc_cur_blink_ms is just set in reset_terminal() from vc_init() path,
> and ops->cur_blink_jiffies is figured out from vc->vc_cur_blink_ms
> in fbcon_init().
> 
> And visual_init() is always run before vc_init(), so ops->cur_blink_jiffies
> is initialized as zero and cause the soft lockup issue finally.
> 
> Thanks,
> Ming
> 
>>
>> perf shows that the CPU caught by the NMI is typically in code
>> updating the cursor timer:
>>
>> -   16.92%  swapper  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
>>- _raw_spin_unlock_irqrestore
>>   + 16.87% mod_timer
>>   + 0.05% cursor_timer_handler
>> -   12.15%  swapper  [kernel.kallsyms]  [k] queue_work_on
>>- queue_work_on
>>   + 12.00% cursor_timer_handler
>>   + 0.15% call_timer_fn
>> +   10.98%  swapper  [kernel.kallsyms]  [k] run_timer_softirq
>> -2.23%  swapper  [kernel.kallsyms]  [k] mod_timer
>>- mod_timer
>>   + 1.97% cursor_timer_handler
>>   + 0.26% call_timer_fn
>>
>> During the same period, I can see that another CPU is actively
>> executing the timer function:
>>
>> -   42.18%  kworker/u96:2  [kernel.kallsyms]  [k] ww_mutex_unlock
>>- ww_mutex_unlock
>>   - 40.70% ast_dirty_update
>>ast_imageblit
>>soft_cursor
>>bit_cursor
>>fb_flashcursor
>>process_one_work
>>worker_thread
>>kthread
>>ret_from_fork
>>   + 1.48% ast_imageblit
>> -   40.15%  kworker/u96:2  [kernel.kallsyms]  [k] __memcpy_toio
>>- __memcpy_toio
>>   + 31.54% ast_dirty_update
>>   + 8.61% ast_imageblit
>>
>> Using the graph function tracer on fb_flashcursor(), I see that
>> ast_dirty_update usually takes around 60 us, in which it makes 16
>> calls to __memcpy_toio(). However, there is always one instance on
>> every boot of the installer where ast_dirty_update() takes ~98 *ms* to
>> complete, during which it makes 743 calls to __memcpy_toio(). While
>> that  doesn't directly account for the full 22s, I do wonder if that
>> maybe a smoking gun.
>>
>> fyi, this is being tracked at: https://bugs.launchpad.net/bugs/1574814
>>
>>   -dann

Re: tty crash in Linux 4.6

2016-05-16 Thread Peter Hurley

Hi Mikulas,

On 05/16/2016 01:12 PM, Mikulas Patocka wrote:
> Hi
> 
> In the kernel 4.6 I get crashes in the tty layer. I can reproduce the 
> crash by logging into the machine with ssh and typing before the prompt 
> appears.

Thanks for the report.
I tried to reproduce this a number of times on different machines
with no luck.


> The crash is caused by the pointer tty->disc_data being NULL in the 
> function n_tty_receive_buf_common. The crash happens on the statement 
> smp_load_acquire(>read_tail).
> 
> Bisecting shows that the crashes are caused by the patch 
> 892d1fa7eaaed9d3c04954cb140c34ebc3393932 ("tty: Destroy ldisc instance on 
> hangup").


Can you try the test patch below?

Regards,
Peter Hurley


> Kernel Fault: Code=15 regs=7d9e0720 (Addr=2260)
> CPU: 0 PID: 3319 Comm: kworker/u8:0 Not tainted 4.6.0 #1
> Workqueue: events_unbound flush_to_ldisc
> task: 7c25ea80 ti: 7d9e task.ti: 7d9e
> 
>  YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
> PSW: 1100 Not tainted
> r00-03  0804000f 4076cd10 40475fb4 7f761800
> r04-07  40749510 0001 7f761800 7d9e0490
> r08-11  7e722890  7da4ec00 7f763823
> r12-15   7fc08ea8 7fc08c78 4080e080
> r16-19  7fc08c00 0001  2260
> r20-23  7f7618b0 7c25ea80 0001 0001
> r24-27   080f 7f7618ac 40749510
> r28-31  0001 7d9e0840 7d9e0720 0001
> sr00-03  086c8800   086c8800
> sr04-07     
> 
> IASQ:   IAOQ: 40475fd4 
> 40475fd8
>  IIR: 0e6c00d5ISR:   IOR: 2260
>  CPU:0   CR30: 7d9e CR31: ff87e7ffbc9e
>  ORIG_R28: 4080a180
>  IAOQ[0]: n_tty_receive_buf_common+0xb4/0xbe0
>  IAOQ[1]: n_tty_receive_buf_common+0xb8/0xbe0
>  RP(r2): n_tty_receive_buf_common+0x94/0xbe0
> Backtrace:
>  [<40476b14>] n_tty_receive_buf2+0x14/0x20
>  [<4047a208>] tty_ldisc_receive_buf+0x30/0x90
>  [<4047a544>] flush_to_ldisc+0x144/0x1c8
>  [<402556bc>] process_one_work+0x1b4/0x460
>  [<40255bbc>] worker_thread+0x1e4/0x5e0
>  [<4025d454>] kthread+0x134/0x168

--- >% ---
diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index 68947f6..f271832 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -653,7 +653,7 @@ static void tty_reset_termios(struct tty_struct *tty)
  * Returns 0 if successful, otherwise error code < 0
  */
 
-int tty_ldisc_reinit(struct tty_struct *tty, int disc)
+static int __tty_ldisc_reinit(struct tty_struct *tty, int disc)
 {
struct tty_ldisc *ld;
int retval;
@@ -682,6 +682,16 @@ int tty_ldisc_reinit(struct tty_struct *tty, int disc)
return retval;
 }
 
+int tty_ldisc_reinit(struct tty_struct *tty, int disc)
+{
+   int retval;
+
+   tty_ldisc_lock(tty, MAX_SCHEDULE_TIMEOUT);
+   retval = __tty_ldisc_reinit(tty, disc);
+   tty_ldisc_unlock(tty);
+   return retval;
+}
+
 /**
  * tty_ldisc_hangup-   hangup ldisc reset
  * @tty: tty being hung up
@@ -732,8 +742,8 @@ void tty_ldisc_hangup(struct tty_struct *tty, bool reinit)
 
if (tty->ldisc) {
if (reinit) {
-   if (tty_ldisc_reinit(tty, tty->termios.c_line) < 0)
-   tty_ldisc_reinit(tty, N_TTY);
+   if (__tty_ldisc_reinit(tty, tty->termios.c_line) < 0)
+   __tty_ldisc_reinit(tty, N_TTY);
} else
tty_ldisc_kill(tty);
}

Re: tty crash in Linux 4.6

2016-05-16 Thread Peter Hurley

Hi Mikulas,

On 05/16/2016 01:12 PM, Mikulas Patocka wrote:
> Hi
> 
> In the kernel 4.6 I get crashes in the tty layer. I can reproduce the 
> crash by logging into the machine with ssh and typing before the prompt 
> appears.

Thanks for the report.
I tried to reproduce this a number of times on different machines
with no luck.


> The crash is caused by the pointer tty->disc_data being NULL in the 
> function n_tty_receive_buf_common. The crash happens on the statement 
> smp_load_acquire(>read_tail).
> 
> Bisecting shows that the crashes are caused by the patch 
> 892d1fa7eaaed9d3c04954cb140c34ebc3393932 ("tty: Destroy ldisc instance on 
> hangup").


Can you try the test patch below?

Regards,
Peter Hurley


> Kernel Fault: Code=15 regs=7d9e0720 (Addr=2260)
> CPU: 0 PID: 3319 Comm: kworker/u8:0 Not tainted 4.6.0 #1
> Workqueue: events_unbound flush_to_ldisc
> task: 7c25ea80 ti: 7d9e task.ti: 7d9e
> 
>  YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
> PSW: 1100 Not tainted
> r00-03  0804000f 4076cd10 40475fb4 7f761800
> r04-07  40749510 0001 7f761800 7d9e0490
> r08-11  7e722890  7da4ec00 7f763823
> r12-15   7fc08ea8 7fc08c78 4080e080
> r16-19  7fc08c00 0001  2260
> r20-23  7f7618b0 7c25ea80 0001 0001
> r24-27   080f 7f7618ac 40749510
> r28-31  0001 7d9e0840 7d9e0720 0001
> sr00-03  086c8800   086c8800
> sr04-07     
> 
> IASQ:   IAOQ: 40475fd4 
> 40475fd8
>  IIR: 0e6c00d5ISR:   IOR: 2260
>  CPU:0   CR30: 7d9e CR31: ff87e7ffbc9e
>  ORIG_R28: 4080a180
>  IAOQ[0]: n_tty_receive_buf_common+0xb4/0xbe0
>  IAOQ[1]: n_tty_receive_buf_common+0xb8/0xbe0
>  RP(r2): n_tty_receive_buf_common+0x94/0xbe0
> Backtrace:
>  [<40476b14>] n_tty_receive_buf2+0x14/0x20
>  [<4047a208>] tty_ldisc_receive_buf+0x30/0x90
>  [<4047a544>] flush_to_ldisc+0x144/0x1c8
>  [<402556bc>] process_one_work+0x1b4/0x460
>  [<40255bbc>] worker_thread+0x1e4/0x5e0
>  [<4025d454>] kthread+0x134/0x168

--- >% ---
diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index 68947f6..f271832 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -653,7 +653,7 @@ static void tty_reset_termios(struct tty_struct *tty)
  * Returns 0 if successful, otherwise error code < 0
  */
 
-int tty_ldisc_reinit(struct tty_struct *tty, int disc)
+static int __tty_ldisc_reinit(struct tty_struct *tty, int disc)
 {
struct tty_ldisc *ld;
int retval;
@@ -682,6 +682,16 @@ int tty_ldisc_reinit(struct tty_struct *tty, int disc)
return retval;
 }
 
+int tty_ldisc_reinit(struct tty_struct *tty, int disc)
+{
+   int retval;
+
+   tty_ldisc_lock(tty, MAX_SCHEDULE_TIMEOUT);
+   retval = __tty_ldisc_reinit(tty, disc);
+   tty_ldisc_unlock(tty);
+   return retval;
+}
+
 /**
  * tty_ldisc_hangup-   hangup ldisc reset
  * @tty: tty being hung up
@@ -732,8 +742,8 @@ void tty_ldisc_hangup(struct tty_struct *tty, bool reinit)
 
if (tty->ldisc) {
if (reinit) {
-   if (tty_ldisc_reinit(tty, tty->termios.c_line) < 0)
-   tty_ldisc_reinit(tty, N_TTY);
+   if (__tty_ldisc_reinit(tty, tty->termios.c_line) < 0)
+   __tty_ldisc_reinit(tty, N_TTY);
} else
tty_ldisc_kill(tty);
}

Re: [PATCH 1/1] tty/serial: to support 8250 earlycon can be enabled independently

2016-05-16 Thread Peter Hurley

On 05/16/2016 04:35 AM, Zhen Lei wrote:
> Sometimes, we may only use SSH to login, and build 8250 uart driver as a
> ko(insmod if needed). But the earlycon may still be necessary, because
> the kernel boot process may take a long time. It's not good to display
> nothing but ask people to wait patiently.

I'm confused; you want the possibility of earlycon but _not_ a normal
serial console?

This configuration is unsafe because nothing prevents the 8250 driver
and 8250 earlycon from concurrently accessing the hardware.


> In addition, the 8250.ko can not be worked if we have not opened any
> other serial drivers, because SERIAL_CORE would not be selected.

I don't understand what this means.

Regards,
Peter Hurley


> Signed-off-by: Zhen Lei <thunder.leiz...@huawei.com>
> ---
>  drivers/tty/serial/8250/Kconfig  | 9 +++--
>  drivers/tty/serial/8250/Makefile | 1 -
>  drivers/tty/serial/Makefile  | 1 +
>  3 files changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/tty/serial/8250/Kconfig b/drivers/tty/serial/8250/Kconfig
> index 4d7cb9c..2992f0a 100644
> --- a/drivers/tty/serial/8250/Kconfig
> +++ b/drivers/tty/serial/8250/Kconfig
> @@ -3,6 +3,12 @@
>  # you somehow have an implicit or explicit dependency on SERIAL_8250.
>  #
> 
> +config SERIAL_8250_EARLYCON
> + bool "Early console using 8250"
> + select SERIAL_CORE
> + select SERIAL_CORE_CONSOLE
> + select SERIAL_EARLYCON
> +
>  config SERIAL_8250
>   tristate "8250/16550 and compatible serial support"
>   select SERIAL_CORE
> @@ -60,8 +66,7 @@ config SERIAL_8250_PNP
>  config SERIAL_8250_CONSOLE
>   bool "Console on 8250/16550 and compatible serial port"
>   depends on SERIAL_8250=y
> - select SERIAL_CORE_CONSOLE
> - select SERIAL_EARLYCON
> + select SERIAL_8250_EARLYCON
>   ---help---
> If you say Y here, it will be possible to use a serial port as the
> system console (the system console is the device which receives all
> diff --git a/drivers/tty/serial/8250/Makefile 
> b/drivers/tty/serial/8250/Makefile
> index c9a2d6e..1f24c74 100644
> --- a/drivers/tty/serial/8250/Makefile
> +++ b/drivers/tty/serial/8250/Makefile
> @@ -13,7 +13,6 @@ obj-$(CONFIG_SERIAL_8250_HP300) += 8250_hp300.o
>  obj-$(CONFIG_SERIAL_8250_CS) += serial_cs.o
>  obj-$(CONFIG_SERIAL_8250_ACORN)  += 8250_acorn.o
>  obj-$(CONFIG_SERIAL_8250_BCM2835AUX) += 8250_bcm2835aux.o
> -obj-$(CONFIG_SERIAL_8250_CONSOLE)+= 8250_early.o
>  obj-$(CONFIG_SERIAL_8250_FOURPORT)   += 8250_fourport.o
>  obj-$(CONFIG_SERIAL_8250_ACCENT) += 8250_accent.o
>  obj-$(CONFIG_SERIAL_8250_BOCA)   += 8250_boca.o
> diff --git a/drivers/tty/serial/Makefile b/drivers/tty/serial/Makefile
> index 8c261ad..cd84181 100644
> --- a/drivers/tty/serial/Makefile
> +++ b/drivers/tty/serial/Makefile
> @@ -19,6 +19,7 @@ obj-$(CONFIG_SERIAL_SUNSAB) += sunsab.o
> 
>  # Now bring in any enabled 8250/16450/16550 type drivers.
>  obj-$(CONFIG_SERIAL_8250) += 8250/
> +obj-$(CONFIG_SERIAL_8250_EARLYCON) += 8250/8250_early.o
> 
>  obj-$(CONFIG_SERIAL_AMBA_PL010) += amba-pl010.o
>  obj-$(CONFIG_SERIAL_AMBA_PL011) += amba-pl011.o
> --
> 2.5.0
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-serial" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

Re: [PATCH 1/1] tty/serial: to support 8250 earlycon can be enabled independently

2016-05-16 Thread Peter Hurley

On 05/16/2016 04:35 AM, Zhen Lei wrote:
> Sometimes, we may only use SSH to login, and build 8250 uart driver as a
> ko(insmod if needed). But the earlycon may still be necessary, because
> the kernel boot process may take a long time. It's not good to display
> nothing but ask people to wait patiently.

I'm confused; you want the possibility of earlycon but _not_ a normal
serial console?

This configuration is unsafe because nothing prevents the 8250 driver
and 8250 earlycon from concurrently accessing the hardware.


> In addition, the 8250.ko can not be worked if we have not opened any
> other serial drivers, because SERIAL_CORE would not be selected.

I don't understand what this means.

Regards,
Peter Hurley


> Signed-off-by: Zhen Lei 
> ---
>  drivers/tty/serial/8250/Kconfig  | 9 +++--
>  drivers/tty/serial/8250/Makefile | 1 -
>  drivers/tty/serial/Makefile  | 1 +
>  3 files changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/tty/serial/8250/Kconfig b/drivers/tty/serial/8250/Kconfig
> index 4d7cb9c..2992f0a 100644
> --- a/drivers/tty/serial/8250/Kconfig
> +++ b/drivers/tty/serial/8250/Kconfig
> @@ -3,6 +3,12 @@
>  # you somehow have an implicit or explicit dependency on SERIAL_8250.
>  #
> 
> +config SERIAL_8250_EARLYCON
> + bool "Early console using 8250"
> + select SERIAL_CORE
> + select SERIAL_CORE_CONSOLE
> + select SERIAL_EARLYCON
> +
>  config SERIAL_8250
>   tristate "8250/16550 and compatible serial support"
>   select SERIAL_CORE
> @@ -60,8 +66,7 @@ config SERIAL_8250_PNP
>  config SERIAL_8250_CONSOLE
>   bool "Console on 8250/16550 and compatible serial port"
>   depends on SERIAL_8250=y
> - select SERIAL_CORE_CONSOLE
> - select SERIAL_EARLYCON
> + select SERIAL_8250_EARLYCON
>   ---help---
> If you say Y here, it will be possible to use a serial port as the
> system console (the system console is the device which receives all
> diff --git a/drivers/tty/serial/8250/Makefile 
> b/drivers/tty/serial/8250/Makefile
> index c9a2d6e..1f24c74 100644
> --- a/drivers/tty/serial/8250/Makefile
> +++ b/drivers/tty/serial/8250/Makefile
> @@ -13,7 +13,6 @@ obj-$(CONFIG_SERIAL_8250_HP300) += 8250_hp300.o
>  obj-$(CONFIG_SERIAL_8250_CS) += serial_cs.o
>  obj-$(CONFIG_SERIAL_8250_ACORN)  += 8250_acorn.o
>  obj-$(CONFIG_SERIAL_8250_BCM2835AUX) += 8250_bcm2835aux.o
> -obj-$(CONFIG_SERIAL_8250_CONSOLE)+= 8250_early.o
>  obj-$(CONFIG_SERIAL_8250_FOURPORT)   += 8250_fourport.o
>  obj-$(CONFIG_SERIAL_8250_ACCENT) += 8250_accent.o
>  obj-$(CONFIG_SERIAL_8250_BOCA)   += 8250_boca.o
> diff --git a/drivers/tty/serial/Makefile b/drivers/tty/serial/Makefile
> index 8c261ad..cd84181 100644
> --- a/drivers/tty/serial/Makefile
> +++ b/drivers/tty/serial/Makefile
> @@ -19,6 +19,7 @@ obj-$(CONFIG_SERIAL_SUNSAB) += sunsab.o
> 
>  # Now bring in any enabled 8250/16450/16550 type drivers.
>  obj-$(CONFIG_SERIAL_8250) += 8250/
> +obj-$(CONFIG_SERIAL_8250_EARLYCON) += 8250/8250_early.o
> 
>  obj-$(CONFIG_SERIAL_AMBA_PL010) += amba-pl010.o
>  obj-$(CONFIG_SERIAL_AMBA_PL011) += amba-pl011.o
> --
> 2.5.0
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-serial" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field

2016-05-16 Thread Peter Hurley

On 05/16/2016 05:17 AM, Paul E. McKenney wrote:
> On Mon, May 16, 2016 at 01:09:48PM +0200, Peter Zijlstra wrote:
>> On Fri, May 13, 2016 at 10:58:05AM -0700, Peter Hurley wrote:
>>>> Note that barrier() and READ_ONCE() have overlapping but not identical
>>>> results and the combined use actually makes sense here.
>>>>
>>>> Yes, a barrier() anywhere in the loop will force a reload of the
>>>> variable, _however_ it doesn't force that reload to not suffer from
>>>> load tearing.
>>>>
>>>> Using volatile also forces a reload, but also ensures the load cannot
>>>> be torn IFF it is of machine word side and naturally aligned.
>>>>
>>>> So while the READ_ONCE() here is pointless for forcing the reload;
>>>> that's already ensured, we still need to make sure the load isn't torn.
>>>
>>> If load tearing a naturally aligned pointer is a real code generation
>>> possibility then the rcu list code is broken too (which loads ->next
>>> directly; cf. list_for_each_entry_rcu() & list_for_each_entry_lockless()).
>>>
>>> For 4.4, Paul added READ_ONCE() checks for list_empty() et al, but iirc
>>> that had to do with control dependencies and not load tearing.
>>
>> Well, Paul is the one who started the whole load/store tearing thing, so
>> I suppose he knows what he's doing.
> 
> That had to do with suppressing false positives for one of Dmitry
> Vjukov's concurrency checkers.  I suspect that Peter Hurley is right
> that continued use of that checker would identify other places needing
> READ_ONCE(), but from what I understand that is on hold pending a formal
> definition of the Linux-kernel memory model.  (KCC and Dmitry (CCed)
> can correct my if I am confused on this point.)
> 
>> That said; its a fairly recent as things go so lots of code hasn't been
>> updated yet, and its also a very unlikely thing for a compiler to do;
>> since it mostly doesn't make sense to emit multiple instructions where
>> one will do, so its not a very high priority thing either.
>>
>> But from what I understand, the compiler is free to emit all kinds of
>> nonsense for !volatile loads/stores.
> 
> That is quite true.  :-/
> 
>>> OTOH, this patch might actually produce store-tearing:
>>>
>>> +static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
>>> +{
>>> +   /*
>>> +* We check the owner value first to make sure that we will only
>>> +* do a write to the rwsem cacheline when it is really necessary
>>> +* to minimize cacheline contention.
>>> +*/
>>> +   if (sem->owner != RWSEM_READER_OWNED)
>>> +   sem->owner = RWSEM_READER_OWNED;
>>> +}
>>
>> Correct; which is why we should always use {READ,WRITE}_ONCE() for
>> anything that is used locklessly.
> 
> Completely agreed.  Improve readability of code by flagging lockless
> shared-memory accesses, help checkers better find bugs, and prevent the
> occasional compiler mischief!

I think this would be a mistake for 3 reasons:

1. If READ_ONCE()/WRITE_ONCE() is necessary to prevent load/store tearing
   of any normally-atomic type (char/int/long/void*), then _every_ access
   would require READ_ONCE()/WRITE_ONCE(), thus eliminating any possibility
   of compiler optimization (eg. eliding redundant loads) where it would
   otherwise be possible.

2. Makes a mess of otherwise readable code.

3. Error-prone; ie., easy to overlook in review.

There is no practical difference between _always_ using READ_ONCE()/WRITE_ONCE()
(to prevent tearing) and declaring the field volatile.

So we've come full-circle from volatile-considered-harmful.

Regards,
Peter Hurley

Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field

2016-05-16 Thread Peter Hurley

On 05/16/2016 05:17 AM, Paul E. McKenney wrote:
> On Mon, May 16, 2016 at 01:09:48PM +0200, Peter Zijlstra wrote:
>> On Fri, May 13, 2016 at 10:58:05AM -0700, Peter Hurley wrote:
>>>> Note that barrier() and READ_ONCE() have overlapping but not identical
>>>> results and the combined use actually makes sense here.
>>>>
>>>> Yes, a barrier() anywhere in the loop will force a reload of the
>>>> variable, _however_ it doesn't force that reload to not suffer from
>>>> load tearing.
>>>>
>>>> Using volatile also forces a reload, but also ensures the load cannot
>>>> be torn IFF it is of machine word side and naturally aligned.
>>>>
>>>> So while the READ_ONCE() here is pointless for forcing the reload;
>>>> that's already ensured, we still need to make sure the load isn't torn.
>>>
>>> If load tearing a naturally aligned pointer is a real code generation
>>> possibility then the rcu list code is broken too (which loads ->next
>>> directly; cf. list_for_each_entry_rcu() & list_for_each_entry_lockless()).
>>>
>>> For 4.4, Paul added READ_ONCE() checks for list_empty() et al, but iirc
>>> that had to do with control dependencies and not load tearing.
>>
>> Well, Paul is the one who started the whole load/store tearing thing, so
>> I suppose he knows what he's doing.
> 
> That had to do with suppressing false positives for one of Dmitry
> Vjukov's concurrency checkers.  I suspect that Peter Hurley is right
> that continued use of that checker would identify other places needing
> READ_ONCE(), but from what I understand that is on hold pending a formal
> definition of the Linux-kernel memory model.  (KCC and Dmitry (CCed)
> can correct my if I am confused on this point.)
> 
>> That said; its a fairly recent as things go so lots of code hasn't been
>> updated yet, and its also a very unlikely thing for a compiler to do;
>> since it mostly doesn't make sense to emit multiple instructions where
>> one will do, so its not a very high priority thing either.
>>
>> But from what I understand, the compiler is free to emit all kinds of
>> nonsense for !volatile loads/stores.
> 
> That is quite true.  :-/
> 
>>> OTOH, this patch might actually produce store-tearing:
>>>
>>> +static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
>>> +{
>>> +   /*
>>> +* We check the owner value first to make sure that we will only
>>> +* do a write to the rwsem cacheline when it is really necessary
>>> +* to minimize cacheline contention.
>>> +*/
>>> +   if (sem->owner != RWSEM_READER_OWNED)
>>> +   sem->owner = RWSEM_READER_OWNED;
>>> +}
>>
>> Correct; which is why we should always use {READ,WRITE}_ONCE() for
>> anything that is used locklessly.
> 
> Completely agreed.  Improve readability of code by flagging lockless
> shared-memory accesses, help checkers better find bugs, and prevent the
> occasional compiler mischief!

I think this would be a mistake for 3 reasons:

1. If READ_ONCE()/WRITE_ONCE() is necessary to prevent load/store tearing
   of any normally-atomic type (char/int/long/void*), then _every_ access
   would require READ_ONCE()/WRITE_ONCE(), thus eliminating any possibility
   of compiler optimization (eg. eliding redundant loads) where it would
   otherwise be possible.

2. Makes a mess of otherwise readable code.

3. Error-prone; ie., easy to overlook in review.

There is no practical difference between _always_ using READ_ONCE()/WRITE_ONCE()
(to prevent tearing) and declaring the field volatile.

So we've come full-circle from volatile-considered-harmful.

Regards,
Peter Hurley

Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field

2016-05-13 Thread Peter Hurley

On 05/13/2016 08:07 AM, Peter Zijlstra wrote:
> On Wed, May 11, 2016 at 03:04:20PM -0700, Peter Hurley wrote:
>>> +   return !rwsem_is_reader_owned(READ_ONCE(sem->owner));
>>
>> It doesn't make sense to force reload sem->owner here; if sem->owner
>> is not being reloaded then the loop above will execute forever.
>>
>> Arguably, this check should be bumped out to the optimistic spin and
>> reload/check the owner there?
>>
> 
> Note that barrier() and READ_ONCE() have overlapping but not identical
> results and the combined use actually makes sense here.
> 
> Yes, a barrier() anywhere in the loop will force a reload of the
> variable, _however_ it doesn't force that reload to not suffer from
> load tearing.
> 
> Using volatile also forces a reload, but also ensures the load cannot
> be torn IFF it is of machine word side and naturally aligned.
> 
> So while the READ_ONCE() here is pointless for forcing the reload;
> that's already ensured, we still need to make sure the load isn't torn.

If load tearing a naturally aligned pointer is a real code generation
possibility then the rcu list code is broken too (which loads ->next
directly; cf. list_for_each_entry_rcu() & list_for_each_entry_lockless()).

For 4.4, Paul added READ_ONCE() checks for list_empty() et al, but iirc
that had to do with control dependencies and not load tearing.

OTOH, this patch might actually produce store-tearing:

+static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
+{
+   /*
+* We check the owner value first to make sure that we will only
+* do a write to the rwsem cacheline when it is really necessary
+* to minimize cacheline contention.
+*/
+   if (sem->owner != RWSEM_READER_OWNED)
+   sem->owner = RWSEM_READER_OWNED;
+}


Regards,
Peter Hurley

Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field

2016-05-13 Thread Peter Hurley

On 05/13/2016 08:07 AM, Peter Zijlstra wrote:
> On Wed, May 11, 2016 at 03:04:20PM -0700, Peter Hurley wrote:
>>> +   return !rwsem_is_reader_owned(READ_ONCE(sem->owner));
>>
>> It doesn't make sense to force reload sem->owner here; if sem->owner
>> is not being reloaded then the loop above will execute forever.
>>
>> Arguably, this check should be bumped out to the optimistic spin and
>> reload/check the owner there?
>>
> 
> Note that barrier() and READ_ONCE() have overlapping but not identical
> results and the combined use actually makes sense here.
> 
> Yes, a barrier() anywhere in the loop will force a reload of the
> variable, _however_ it doesn't force that reload to not suffer from
> load tearing.
> 
> Using volatile also forces a reload, but also ensures the load cannot
> be torn IFF it is of machine word side and naturally aligned.
> 
> So while the READ_ONCE() here is pointless for forcing the reload;
> that's already ensured, we still need to make sure the load isn't torn.

If load tearing a naturally aligned pointer is a real code generation
possibility then the rcu list code is broken too (which loads ->next
directly; cf. list_for_each_entry_rcu() & list_for_each_entry_lockless()).

For 4.4, Paul added READ_ONCE() checks for list_empty() et al, but iirc
that had to do with control dependencies and not load tearing.

OTOH, this patch might actually produce store-tearing:

+static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
+{
+   /*
+* We check the owner value first to make sure that we will only
+* do a write to the rwsem cacheline when it is really necessary
+* to minimize cacheline contention.
+*/
+   if (sem->owner != RWSEM_READER_OWNED)
+   sem->owner = RWSEM_READER_OWNED;
+}


Regards,
Peter Hurley

Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field

2016-05-12 Thread Peter Hurley

On 05/12/2016 01:15 PM, Waiman Long wrote:
> On 05/11/2016 06:04 PM, Peter Hurley wrote:

[...]

>>
>>> @@ -328,8 +329,6 @@ done:
>>>   static noinline
>>>   bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct 
>>> *owner)
>>>   {
>>> -long count;
>>> -
>>>   rcu_read_lock();
>>>   while (sem->owner == owner) {
>>>   /*
>>> @@ -350,16 +349,11 @@ bool rwsem_spin_on_owner(struct rw_semaphore *sem, 
>>> struct task_struct *owner)
>>>   }
>>>   rcu_read_unlock();
>>>
>>> -if (READ_ONCE(sem->owner))
>>> -return true; /* new owner, continue spinning */
>>> -
>>>   /*
>>> - * When the owner is not set, the lock could be free or
>>> - * held by readers. Check the counter to verify the
>>> - * state.
>>> + * If there is a new owner or the owner is not set, we continue
>>> + * spinning.
>>>*/
>>> -count = READ_ONCE(sem->count);
>>> -return (count == 0 || count == RWSEM_WAITING_BIAS);
>>> +return !rwsem_is_reader_owned(READ_ONCE(sem->owner));
>> It doesn't make sense to force reload sem->owner here; if sem->owner
>> is not being reloaded then the loop above will execute forever.
> 
> I agree that we don't actually need to use READ_ONCE() here for sem->owner as 
> the barrier() call will force the reloading. It is more like a habit to use 
> it for public variable or we will have to think a bit harder to make sure 
> that we are doing the right thing.

I look at that code and I think "what am I missing that it needs to reload at 
exit"
Extra cruft is just as misleading.

What matters while spinning on owner is that the reload is forced as part of the
loop, not at exit time. The whole point of rwsem_spin_on_owner() is to spin
until the owner _changes_; there's no need to get a more recent owner value
than the one that caused the loop to break.


>> Arguably, this check should be bumped out to the optimistic spin and
>> reload/check the owner there?
>>
>> Or better yet; don't pass the owner in as a parameter at all, but
>> instead snapshot the owner and check its ownership on entry.
> 
> That will make the main optimistic spinning loop more complex.

??

Simpler.

while (rwsem_spin_on_owner(sem)) {
if (rwsem_try_write_lock_unqueued(sem)) {
taken = true;
break;
}

if (!sem->owner && (need_resched() || rt_task(current)))
break;

cpu_relax_lowlatency();
}




bool rwsem_spin_on_owner(struct rw_semaphore *sem)
{
struct task_struct *owner = READ_ONCE(sem->owner);

if (!rwsem_is_writer_owned(owner))
return false;

rcu_read_lock();
while (sem->owner == owner) {

}
rcu_read_unlock();

return !rwsem_is_reader_owned(sem->owner);
}


>>
>> Because see below...
>>
>>>   }
>>>
>>>   static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
>>> @@ -378,7 +372,8 @@ static bool rwsem_optimistic_spin(struct rw_semaphore 
>>> *sem)
>>>
>>>   while (true) {
>>>   owner = READ_ONCE(sem->owner);
>>> -if (owner&&  !rwsem_spin_on_owner(sem, owner))
>>> +if (rwsem_is_writer_owned(owner)&&
>>> +   !rwsem_spin_on_owner(sem, owner))
>>>   break;
>>>
>>>   /* wait_lock will be acquired if write_lock is obtained */
>>> @@ -391,9 +386,11 @@ static bool rwsem_optimistic_spin(struct rw_semaphore 
>>> *sem)
>>>* When there's no owner, we might have preempted between the
>>>* owner acquiring the lock and setting the owner field. If
>>>* we're an RT task that will live-lock because we won't let
>>> - * the owner complete.
>>> + * the owner complete. We also quit if the lock is owned by
>>> + * readers.
>>>*/
>>> -if (!owner&&  (need_resched() || rt_task(current)))
>>> +if (rwsem_is_reader_owned(owner) ||
>>> +   (!owner&&  (need_resched() || rt_task(current
>> This is using the stale owner value that was cached before spinning on the 
>> owner;
>> That can't be right.
> 
> The code is going to loop back and reload the new owner value anyway. It is 
> just a bit of additional latency. I will move the is_reader check up after 
> loading sem->owner to clear any confusion.

Well, why do it at all then?  Just before attempting the lock, 
rwsem_spin_on_owner()
determined that the owner was not a reader. If a reader became the new owner 
after
that, the loop will discover that the lock is no longer writer owned anyway.

Either way, the check here should be to sem->owner (ie., like I wrote above),
but there's no need to force reload here either.

Regards,
Peter Hurley

Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field

2016-05-12 Thread Peter Hurley

On 05/12/2016 01:15 PM, Waiman Long wrote:
> On 05/11/2016 06:04 PM, Peter Hurley wrote:

[...]

>>
>>> @@ -328,8 +329,6 @@ done:
>>>   static noinline
>>>   bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct 
>>> *owner)
>>>   {
>>> -long count;
>>> -
>>>   rcu_read_lock();
>>>   while (sem->owner == owner) {
>>>   /*
>>> @@ -350,16 +349,11 @@ bool rwsem_spin_on_owner(struct rw_semaphore *sem, 
>>> struct task_struct *owner)
>>>   }
>>>   rcu_read_unlock();
>>>
>>> -if (READ_ONCE(sem->owner))
>>> -return true; /* new owner, continue spinning */
>>> -
>>>   /*
>>> - * When the owner is not set, the lock could be free or
>>> - * held by readers. Check the counter to verify the
>>> - * state.
>>> + * If there is a new owner or the owner is not set, we continue
>>> + * spinning.
>>>*/
>>> -count = READ_ONCE(sem->count);
>>> -return (count == 0 || count == RWSEM_WAITING_BIAS);
>>> +return !rwsem_is_reader_owned(READ_ONCE(sem->owner));
>> It doesn't make sense to force reload sem->owner here; if sem->owner
>> is not being reloaded then the loop above will execute forever.
> 
> I agree that we don't actually need to use READ_ONCE() here for sem->owner as 
> the barrier() call will force the reloading. It is more like a habit to use 
> it for public variable or we will have to think a bit harder to make sure 
> that we are doing the right thing.

I look at that code and I think "what am I missing that it needs to reload at 
exit"
Extra cruft is just as misleading.

What matters while spinning on owner is that the reload is forced as part of the
loop, not at exit time. The whole point of rwsem_spin_on_owner() is to spin
until the owner _changes_; there's no need to get a more recent owner value
than the one that caused the loop to break.


>> Arguably, this check should be bumped out to the optimistic spin and
>> reload/check the owner there?
>>
>> Or better yet; don't pass the owner in as a parameter at all, but
>> instead snapshot the owner and check its ownership on entry.
> 
> That will make the main optimistic spinning loop more complex.

??

Simpler.

while (rwsem_spin_on_owner(sem)) {
if (rwsem_try_write_lock_unqueued(sem)) {
taken = true;
break;
}

if (!sem->owner && (need_resched() || rt_task(current)))
break;

cpu_relax_lowlatency();
}




bool rwsem_spin_on_owner(struct rw_semaphore *sem)
{
struct task_struct *owner = READ_ONCE(sem->owner);

if (!rwsem_is_writer_owned(owner))
return false;

rcu_read_lock();
while (sem->owner == owner) {

}
rcu_read_unlock();

return !rwsem_is_reader_owned(sem->owner);
}


>>
>> Because see below...
>>
>>>   }
>>>
>>>   static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
>>> @@ -378,7 +372,8 @@ static bool rwsem_optimistic_spin(struct rw_semaphore 
>>> *sem)
>>>
>>>   while (true) {
>>>   owner = READ_ONCE(sem->owner);
>>> -if (owner&&  !rwsem_spin_on_owner(sem, owner))
>>> +if (rwsem_is_writer_owned(owner)&&
>>> +   !rwsem_spin_on_owner(sem, owner))
>>>   break;
>>>
>>>   /* wait_lock will be acquired if write_lock is obtained */
>>> @@ -391,9 +386,11 @@ static bool rwsem_optimistic_spin(struct rw_semaphore 
>>> *sem)
>>>* When there's no owner, we might have preempted between the
>>>* owner acquiring the lock and setting the owner field. If
>>>* we're an RT task that will live-lock because we won't let
>>> - * the owner complete.
>>> + * the owner complete. We also quit if the lock is owned by
>>> + * readers.
>>>*/
>>> -if (!owner&&  (need_resched() || rt_task(current)))
>>> +if (rwsem_is_reader_owned(owner) ||
>>> +   (!owner&&  (need_resched() || rt_task(current
>> This is using the stale owner value that was cached before spinning on the 
>> owner;
>> That can't be right.
> 
> The code is going to loop back and reload the new owner value anyway. It is 
> just a bit of additional latency. I will move the is_reader check up after 
> loading sem->owner to clear any confusion.

Well, why do it at all then?  Just before attempting the lock, 
rwsem_spin_on_owner()
determined that the owner was not a reader. If a reader became the new owner 
after
that, the loop will discover that the lock is no longer writer owned anyway.

Either way, the check here should be to sem->owner (ie., like I wrote above),
but there's no need to force reload here either.

Regards,
Peter Hurley

Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field

2016-05-11 Thread Peter Hurley

long count;
> -
>   rcu_read_lock();
>   while (sem->owner == owner) {
>   /*
> @@ -350,16 +349,11 @@ bool rwsem_spin_on_owner(struct rw_semaphore *sem, 
> struct task_struct *owner)
>   }
>   rcu_read_unlock();
>  
> - if (READ_ONCE(sem->owner))
> - return true; /* new owner, continue spinning */
> -
>   /*
> -  * When the owner is not set, the lock could be free or
> -  * held by readers. Check the counter to verify the
> -  * state.
> +  * If there is a new owner or the owner is not set, we continue
> +  * spinning.
>*/
> - count = READ_ONCE(sem->count);
> - return (count == 0 || count == RWSEM_WAITING_BIAS);
> + return !rwsem_is_reader_owned(READ_ONCE(sem->owner));

It doesn't make sense to force reload sem->owner here; if sem->owner
is not being reloaded then the loop above will execute forever.

Arguably, this check should be bumped out to the optimistic spin and
reload/check the owner there?

Or better yet; don't pass the owner in as a parameter at all, but
instead snapshot the owner and check its ownership on entry.

Because see below...

>  }
>  
>  static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
> @@ -378,7 +372,8 @@ static bool rwsem_optimistic_spin(struct rw_semaphore 
> *sem)
>  
>   while (true) {
>   owner = READ_ONCE(sem->owner);
> - if (owner && !rwsem_spin_on_owner(sem, owner))
> + if (rwsem_is_writer_owned(owner) &&
> +!rwsem_spin_on_owner(sem, owner))
>   break;
>  
>   /* wait_lock will be acquired if write_lock is obtained */
> @@ -391,9 +386,11 @@ static bool rwsem_optimistic_spin(struct rw_semaphore 
> *sem)
>* When there's no owner, we might have preempted between the
>* owner acquiring the lock and setting the owner field. If
>* we're an RT task that will live-lock because we won't let
> -  * the owner complete.
> +  * the owner complete. We also quit if the lock is owned by
> +  * readers.
>*/
> - if (!owner && (need_resched() || rt_task(current)))
> + if (rwsem_is_reader_owned(owner) ||
> +(!owner && (need_resched() || rt_task(current

This is using the stale owner value that was cached before spinning on the 
owner;
That can't be right.

Regards,
Peter Hurley

>   break;
>  
>   /*
> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
> index c817216..5838f56 100644
> --- a/kernel/locking/rwsem.c
> +++ b/kernel/locking/rwsem.c
> @@ -22,6 +22,7 @@ void __sched down_read(struct rw_semaphore *sem)
>   rwsem_acquire_read(>dep_map, 0, 0, _RET_IP_);
>  
>   LOCK_CONTENDED(sem, __down_read_trylock, __down_read);
> + rwsem_set_reader_owned(sem);
>  }
>  
>  EXPORT_SYMBOL(down_read);
> @@ -33,8 +34,10 @@ int down_read_trylock(struct rw_semaphore *sem)
>  {
>   int ret = __down_read_trylock(sem);
>  
> - if (ret == 1)
> + if (ret == 1) {
>   rwsem_acquire_read(>dep_map, 0, 1, _RET_IP_);
> + rwsem_set_reader_owned(sem);
> + }
>   return ret;
>  }
>  
> @@ -124,7 +127,7 @@ void downgrade_write(struct rw_semaphore *sem)
>* lockdep: a downgraded write will live on as a write
>* dependency.
>*/
> - rwsem_clear_owner(sem);
> + rwsem_set_reader_owned(sem);
>   __downgrade_write(sem);
>  }
>  
> @@ -138,6 +141,7 @@ void down_read_nested(struct rw_semaphore *sem, int 
> subclass)
>   rwsem_acquire_read(>dep_map, subclass, 0, _RET_IP_);
>  
>   LOCK_CONTENDED(sem, __down_read_trylock, __down_read);
> + rwsem_set_reader_owned(sem);
>  }
>  
>  EXPORT_SYMBOL(down_read_nested);
> diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
> index 870ed9a..d7fea18 100644
> --- a/kernel/locking/rwsem.h
> +++ b/kernel/locking/rwsem.h
> @@ -1,3 +1,20 @@
> +/*
> + * The owner field of the rw_semaphore structure will be set to
> + * RWSEM_READ_OWNED when a reader grabs the lock. A writer will clear
> + * the owner field when it unlocks. A reader, on the other hand, will
> + * not touch the owner field when it unlocks.
> + *
> + * In essence, the owner field now has the following 3 states:
> + *  1) 0
> + * - lock is free or the owner hasn't set the field yet
> + *  2) RWSEM_READER_OWNED
> + * - lock is currently or previously owned by readers (lock is free
> + *   or not set by owner yet)
> + *  3) Other non-zero value
> + *

Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field

2016-05-11 Thread Peter Hurley

> -
>   rcu_read_lock();
>   while (sem->owner == owner) {
>   /*
> @@ -350,16 +349,11 @@ bool rwsem_spin_on_owner(struct rw_semaphore *sem, 
> struct task_struct *owner)
>   }
>   rcu_read_unlock();
>  
> - if (READ_ONCE(sem->owner))
> - return true; /* new owner, continue spinning */
> -
>   /*
> -  * When the owner is not set, the lock could be free or
> -  * held by readers. Check the counter to verify the
> -  * state.
> +  * If there is a new owner or the owner is not set, we continue
> +  * spinning.
>*/
> - count = READ_ONCE(sem->count);
> - return (count == 0 || count == RWSEM_WAITING_BIAS);
> + return !rwsem_is_reader_owned(READ_ONCE(sem->owner));

It doesn't make sense to force reload sem->owner here; if sem->owner
is not being reloaded then the loop above will execute forever.

Arguably, this check should be bumped out to the optimistic spin and
reload/check the owner there?

Or better yet; don't pass the owner in as a parameter at all, but
instead snapshot the owner and check its ownership on entry.

Because see below...

>  }
>  
>  static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
> @@ -378,7 +372,8 @@ static bool rwsem_optimistic_spin(struct rw_semaphore 
> *sem)
>  
>   while (true) {
>   owner = READ_ONCE(sem->owner);
> - if (owner && !rwsem_spin_on_owner(sem, owner))
> + if (rwsem_is_writer_owned(owner) &&
> +!rwsem_spin_on_owner(sem, owner))
>   break;
>  
>   /* wait_lock will be acquired if write_lock is obtained */
> @@ -391,9 +386,11 @@ static bool rwsem_optimistic_spin(struct rw_semaphore 
> *sem)
>* When there's no owner, we might have preempted between the
>* owner acquiring the lock and setting the owner field. If
>* we're an RT task that will live-lock because we won't let
> -  * the owner complete.
> +  * the owner complete. We also quit if the lock is owned by
> +  * readers.
>*/
> - if (!owner && (need_resched() || rt_task(current)))
> + if (rwsem_is_reader_owned(owner) ||
> +(!owner && (need_resched() || rt_task(current

This is using the stale owner value that was cached before spinning on the 
owner;
That can't be right.

Regards,
Peter Hurley

>   break;
>  
>   /*
> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
> index c817216..5838f56 100644
> --- a/kernel/locking/rwsem.c
> +++ b/kernel/locking/rwsem.c
> @@ -22,6 +22,7 @@ void __sched down_read(struct rw_semaphore *sem)
>   rwsem_acquire_read(>dep_map, 0, 0, _RET_IP_);
>  
>   LOCK_CONTENDED(sem, __down_read_trylock, __down_read);
> + rwsem_set_reader_owned(sem);
>  }
>  
>  EXPORT_SYMBOL(down_read);
> @@ -33,8 +34,10 @@ int down_read_trylock(struct rw_semaphore *sem)
>  {
>   int ret = __down_read_trylock(sem);
>  
> - if (ret == 1)
> + if (ret == 1) {
>   rwsem_acquire_read(>dep_map, 0, 1, _RET_IP_);
> + rwsem_set_reader_owned(sem);
> + }
>   return ret;
>  }
>  
> @@ -124,7 +127,7 @@ void downgrade_write(struct rw_semaphore *sem)
>* lockdep: a downgraded write will live on as a write
>* dependency.
>*/
> - rwsem_clear_owner(sem);
> + rwsem_set_reader_owned(sem);
>   __downgrade_write(sem);
>  }
>  
> @@ -138,6 +141,7 @@ void down_read_nested(struct rw_semaphore *sem, int 
> subclass)
>   rwsem_acquire_read(>dep_map, subclass, 0, _RET_IP_);
>  
>   LOCK_CONTENDED(sem, __down_read_trylock, __down_read);
> + rwsem_set_reader_owned(sem);
>  }
>  
>  EXPORT_SYMBOL(down_read_nested);
> diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
> index 870ed9a..d7fea18 100644
> --- a/kernel/locking/rwsem.h
> +++ b/kernel/locking/rwsem.h
> @@ -1,3 +1,20 @@
> +/*
> + * The owner field of the rw_semaphore structure will be set to
> + * RWSEM_READ_OWNED when a reader grabs the lock. A writer will clear
> + * the owner field when it unlocks. A reader, on the other hand, will
> + * not touch the owner field when it unlocks.
> + *
> + * In essence, the owner field now has the following 3 states:
> + *  1) 0
> + * - lock is free or the owner hasn't set the field yet
> + *  2) RWSEM_READER_OWNED
> + * - lock is currently or previously owned by readers (lock is free
> + *   or not set by owner yet)
> + *  3) Other non-zero value
> + * - a writer owns the lock
> + *

Re: [PATCH 1/4] locking/rwsem: Avoid stale ->count for rwsem_down_write_failed()

2016-05-08 Thread Peter Hurley

On 05/08/2016 09:56 PM, Davidlohr Bueso wrote:
> The field is obviously updated w.o the lock and needs a READ_ONCE
> while waiting for lock holder(s) to go away, just like we do with
> all other ->count accesses.

This isn't actually fixing a bug because it's passed through
several full barriers which will force reloading from sem->count.

I think the patch is ok if you want it just for consistency anyway,
but please change $subject and changelog.

Regards,
Peter Hurley


> Signed-off-by: Davidlohr Bueso <dbu...@suse.de>
> ---
>  kernel/locking/rwsem-xadd.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> index df4dcb883b50..7d62772600cf 100644
> --- a/kernel/locking/rwsem-xadd.c
> +++ b/kernel/locking/rwsem-xadd.c
> @@ -494,7 +494,7 @@ __rwsem_down_write_failed_common(struct rw_semaphore 
> *sem, int state)
>   }
>   schedule();
>   set_current_state(state);
> - } while ((count = sem->count) & RWSEM_ACTIVE_MASK);
> + } while ((count = READ_ONCE(sem->count)) & RWSEM_ACTIVE_MASK);
>  
>   raw_spin_lock_irq(>wait_lock);
>   }
>

Re: [PATCH 1/4] locking/rwsem: Avoid stale ->count for rwsem_down_write_failed()

2016-05-08 Thread Peter Hurley

On 05/08/2016 09:56 PM, Davidlohr Bueso wrote:
> The field is obviously updated w.o the lock and needs a READ_ONCE
> while waiting for lock holder(s) to go away, just like we do with
> all other ->count accesses.

This isn't actually fixing a bug because it's passed through
several full barriers which will force reloading from sem->count.

I think the patch is ok if you want it just for consistency anyway,
but please change $subject and changelog.

Regards,
Peter Hurley


> Signed-off-by: Davidlohr Bueso 
> ---
>  kernel/locking/rwsem-xadd.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> index df4dcb883b50..7d62772600cf 100644
> --- a/kernel/locking/rwsem-xadd.c
> +++ b/kernel/locking/rwsem-xadd.c
> @@ -494,7 +494,7 @@ __rwsem_down_write_failed_common(struct rw_semaphore 
> *sem, int state)
>   }
>   schedule();
>   set_current_state(state);
> - } while ((count = sem->count) & RWSEM_ACTIVE_MASK);
> + } while ((count = READ_ONCE(sem->count)) & RWSEM_ACTIVE_MASK);
>  
>   raw_spin_lock_irq(>wait_lock);
>   }
>

Re: [PATCH 0/1] devpts: Removing the need for pt_chown

2016-05-06 Thread Peter Hurley

On 05/06/2016 12:35 PM, Greg KH wrote:
> On Fri, May 06, 2016 at 02:04:12PM -0500, Eric W. Biederman wrote:
>>
>> Greg,
>>
>> Could you please apply the following patch to tty-next so it can be
>> merged into 4.7-rc1.
>>
>> We have had a long series of discussions and in the last iteration we
>> finally converged on a set of semantics that does not break userspace
>> and also makes the code simpler.
> 
> Did everyone agree?  I didn't think so, but the thread got long and
> messy.  And then Linus did some work on this as well.
> 
> How does this play with what Linus proposed?  I think only portions of
> his original changes are merged, and there are still outstanding parts,
> right?

Linus committed his proposed pty changes which sits in -rc6 and not
in tty-next (which is based on -rc5):

commit 8ead9dd54716d1e05e129959f702fcc1786f82b4
Author: Linus Torvalds <torva...@linux-foundation.org>
Date:   Mon Apr 25 20:04:08 2016 -0700

devpts: more pty driver interface cleanups

This is more prep-work for the upcoming pty changes.  Still just code
cleanup with no actual semantic changes.

This removes a bunch pointless complexity by just having the slave pty
side remember the dentry associated with the devpts slave rather than
the inode.  That allows us to remove all the "look up the dentry" code
for when we want to remove it again.

Linus's changes look good to me and I've been running them cherry-picked on
my private tty-next testing tree since.

When Greg picks up -rc6 (not sure he was going to do that pre-merge window?),
I'd also like to push the devpts_mutex locking down into fs/devpts/inode.c,
but I'd be willing to do that later, if it's going to get in the way.

Regards,
Peter Hurley

Re: [PATCH 0/1] devpts: Removing the need for pt_chown

2016-05-06 Thread Peter Hurley

On 05/06/2016 12:35 PM, Greg KH wrote:
> On Fri, May 06, 2016 at 02:04:12PM -0500, Eric W. Biederman wrote:
>>
>> Greg,
>>
>> Could you please apply the following patch to tty-next so it can be
>> merged into 4.7-rc1.
>>
>> We have had a long series of discussions and in the last iteration we
>> finally converged on a set of semantics that does not break userspace
>> and also makes the code simpler.
> 
> Did everyone agree?  I didn't think so, but the thread got long and
> messy.  And then Linus did some work on this as well.
> 
> How does this play with what Linus proposed?  I think only portions of
> his original changes are merged, and there are still outstanding parts,
> right?

Linus committed his proposed pty changes which sits in -rc6 and not
in tty-next (which is based on -rc5):

commit 8ead9dd54716d1e05e129959f702fcc1786f82b4
Author: Linus Torvalds 
Date:   Mon Apr 25 20:04:08 2016 -0700

devpts: more pty driver interface cleanups

This is more prep-work for the upcoming pty changes.  Still just code
cleanup with no actual semantic changes.

This removes a bunch pointless complexity by just having the slave pty
side remember the dentry associated with the devpts slave rather than
the inode.  That allows us to remove all the "look up the dentry" code
for when we want to remove it again.

Linus's changes look good to me and I've been running them cherry-picked on
my private tty-next testing tree since.

When Greg picks up -rc6 (not sure he was going to do that pre-merge window?),
I'd also like to push the devpts_mutex locking down into fs/devpts/inode.c,
but I'd be willing to do that later, if it's going to get in the way.

Regards,
Peter Hurley

Re: [BUG] drivers/tty: read() on a noncanonical blocking tty randomly fails when VMIN > received >= buf

2016-05-05 Thread Peter Hurley

On 05/05/2016 03:08 AM, One Thousand Gnomes wrote:
> On Wed, 4 May 2016 16:07:44 -0700
> Peter Hurley <pe...@hurleysoftware.com> wrote:
> 
>> Hi Julio,
>>
>> On 05/04/2016 04:00 PM, Julio Guerra wrote:
>>> Hi,
>>>
>>> When a tty (here a slave pty) is set in noncanonical input and blocking 
>>> read modes, a read() randomly blocks when:
>>> "VMIN > kernel received >= user buffer size > 0".
>>>
>>> The standard says that read() should block until VMIN bytes are received 
>>> [1][2]. Whether this is an implementation defined case not really specified 
>>> by POSIX or not, it should not behave randomly (otherwise it really should 
>>> be documented in termios manpage).  
>>
>> This is not a bug.
>>
>> >From the termios(3) man page:  
>>
>>* MIN > 0; TIME == 0: read(2) blocks until the lesser of MIN bytes or 
>> the number of bytes requested are avail‐
>>  able, and returns the lesser of these two values.
> 
> The standard says
> 
>   Case B: MIN>0, TIME=0
> 
>   In case B, since the value of TIME is zero, the timer plays no
>   role and only MIN is significant. A pending read shall not be
>   satisfied until MIN bytes are received (that is, the pending read
>   shall block until MIN bytes are received), or a signal is
>   received. A program that uses case B to read record-based
>   terminal I/O may block indefinitely in the read operation.
> 
> That is if you do 
> 
> 
>   read(fd, buf, 3)
> 
> and MIN is 5, the read should not return until there are 5 bytes in the
> queue. The following code is guaranteed to work reliably by the standard
> with TIME 0 MIN 5 (ignoring signals for the moment)
> 
> 
>   read(fd, buf, 3);
>   fcntl(fd, F_SETFL, FNDELAY);
>   assert(read(fd, buf, 2) == 2);
> 
> Historically this behaviour was useful for things like block transfer
> protocols, especially with offloaded serial processing.
> 
> So actually I think we do have a bug, the behaviuour is not standards
> compliant, and the man page documents the erroneous behaviour.

I disagree; I think SUSv4 fails to address this degenerate condition at all.
For example, SUSv4 specifically states that there is no precedence of
MIN/TIME with O_NONBLOCK. IOW, the standard does _not_ guarantee that your
code fragment above won't block on the subsequent read anyway since it fails
to meet the new MIN 5 watermark.

But I have no problem fixing a bona fide regression; what's broken?

Re: [BUG] drivers/tty: read() on a noncanonical blocking tty randomly fails when VMIN > received >= buf

2016-05-05 Thread Peter Hurley

On 05/05/2016 03:08 AM, One Thousand Gnomes wrote:
> On Wed, 4 May 2016 16:07:44 -0700
> Peter Hurley  wrote:
> 
>> Hi Julio,
>>
>> On 05/04/2016 04:00 PM, Julio Guerra wrote:
>>> Hi,
>>>
>>> When a tty (here a slave pty) is set in noncanonical input and blocking 
>>> read modes, a read() randomly blocks when:
>>> "VMIN > kernel received >= user buffer size > 0".
>>>
>>> The standard says that read() should block until VMIN bytes are received 
>>> [1][2]. Whether this is an implementation defined case not really specified 
>>> by POSIX or not, it should not behave randomly (otherwise it really should 
>>> be documented in termios manpage).  
>>
>> This is not a bug.
>>
>> >From the termios(3) man page:  
>>
>>* MIN > 0; TIME == 0: read(2) blocks until the lesser of MIN bytes or 
>> the number of bytes requested are avail‐
>>  able, and returns the lesser of these two values.
> 
> The standard says
> 
>   Case B: MIN>0, TIME=0
> 
>   In case B, since the value of TIME is zero, the timer plays no
>   role and only MIN is significant. A pending read shall not be
>   satisfied until MIN bytes are received (that is, the pending read
>   shall block until MIN bytes are received), or a signal is
>   received. A program that uses case B to read record-based
>   terminal I/O may block indefinitely in the read operation.
> 
> That is if you do 
> 
> 
>   read(fd, buf, 3)
> 
> and MIN is 5, the read should not return until there are 5 bytes in the
> queue. The following code is guaranteed to work reliably by the standard
> with TIME 0 MIN 5 (ignoring signals for the moment)
> 
> 
>   read(fd, buf, 3);
>   fcntl(fd, F_SETFL, FNDELAY);
>   assert(read(fd, buf, 2) == 2);
> 
> Historically this behaviour was useful for things like block transfer
> protocols, especially with offloaded serial processing.
> 
> So actually I think we do have a bug, the behaviuour is not standards
> compliant, and the man page documents the erroneous behaviour.

I disagree; I think SUSv4 fails to address this degenerate condition at all.
For example, SUSv4 specifically states that there is no precedence of
MIN/TIME with O_NONBLOCK. IOW, the standard does _not_ guarantee that your
code fragment above won't block on the subsequent read anyway since it fails
to meet the new MIN 5 watermark.

But I have no problem fixing a bona fide regression; what's broken?

Re: [BUG] drivers/tty: read() on a noncanonical blocking tty randomly fails when VMIN > received >= buf

2016-05-04 Thread Peter Hurley

On 05/04/2016 04:27 PM, Julio Guerra wrote:
>>> When a tty (here a slave pty) is set in noncanonical input and blocking 
>>> read modes, a read() randomly blocks when:
>>> "VMIN > kernel received >= user buffer size > 0".
>>>
>>> The standard says that read() should block until VMIN bytes are received 
>>> [1][2]. Whether this is an implementation defined case not really specified 
>>> by POSIX or not, it should not behave randomly (otherwise it really should 
>>> be documented in termios manpage).
>>
>> This is not a bug.
>>
>> From the termios(3) man page:
>>
>>* MIN > 0; TIME == 0: read(2) blocks until the lesser of MIN bytes or 
>> the number of bytes requested are avail‐
>>  able, and returns the lesser of these two values.
>>
> 
> This does not appear in my man...
> 
> Anyway, how do you explain the random behavior then?

A long standing bug in this read mode allows the asynchronous input
processing thread to race with the read() thread and become confused
about how much data remains.

I fixed this in 4.6; when I run your test on 4.6, it consistently
returns the full user buffer.

Regards,
Peter Hurley

Re: [BUG] drivers/tty: read() on a noncanonical blocking tty randomly fails when VMIN > received >= buf

2016-05-04 Thread Peter Hurley

On 05/04/2016 04:27 PM, Julio Guerra wrote:
>>> When a tty (here a slave pty) is set in noncanonical input and blocking 
>>> read modes, a read() randomly blocks when:
>>> "VMIN > kernel received >= user buffer size > 0".
>>>
>>> The standard says that read() should block until VMIN bytes are received 
>>> [1][2]. Whether this is an implementation defined case not really specified 
>>> by POSIX or not, it should not behave randomly (otherwise it really should 
>>> be documented in termios manpage).
>>
>> This is not a bug.
>>
>> From the termios(3) man page:
>>
>>* MIN > 0; TIME == 0: read(2) blocks until the lesser of MIN bytes or 
>> the number of bytes requested are avail‐
>>  able, and returns the lesser of these two values.
>>
> 
> This does not appear in my man...
> 
> Anyway, how do you explain the random behavior then?

A long standing bug in this read mode allows the asynchronous input
processing thread to race with the read() thread and become confused
about how much data remains.

I fixed this in 4.6; when I run your test on 4.6, it consistently
returns the full user buffer.

Regards,
Peter Hurley

Re: [BUG] drivers/tty: read() on a noncanonical blocking tty randomly fails when VMIN > received >= buf

2016-05-04 Thread Peter Hurley

Hi Julio,

On 05/04/2016 04:00 PM, Julio Guerra wrote:
> Hi,
> 
> When a tty (here a slave pty) is set in noncanonical input and blocking read 
> modes, a read() randomly blocks when:
> "VMIN > kernel received >= user buffer size > 0".
> 
> The standard says that read() should block until VMIN bytes are received 
> [1][2]. Whether this is an implementation defined case not really specified 
> by POSIX or not, it should not behave randomly (otherwise it really should be 
> documented in termios manpage).

This is not a bug.

>From the termios(3) man page:

   * MIN > 0; TIME == 0: read(2) blocks until the lesser of MIN bytes or 
the number of bytes requested are avail‐
 able, and returns the lesser of these two values.

Regards,
Peter Hurley


> I isolated it in the following example (with VMIN = 5, received = 4, user 
> buffer = 3):
> https://gist.github.com/Julio-Guerra/b3fdefab281403073607d81cabcea04a
> 
> Since it is random, you will need run it several times to observe both cases. 
> When correctly behaving, it should block in the read() for ever (C-c to kill 
> it). When incorrectly behaving, it does not block and reads into the user 
> buffer what is present in the kernel receive buffer (string "any").
> 
> Example where it does not block 3 times out of 4:
>  > $ ./a.out
>  > res=3 buf=any
>  > $ ./a.out
>  > res=3 buf=any
>  > $ ./a.out
>  > ^C⏎
> 
> Linux version 4.5.1-1-ARCH (builduser@tobias) (gcc version 5.3.0 (GCC) ) #1 
> SMP PREEMPT Thu Apr 14 19:19:32 CEST 2016
> 
> [1] "Canonical and noncanonical mode", man termios
> [2] http://www.gnu.org/software/libc/manual/html_node/Noncanonical-Input.html
>

Re: [BUG] drivers/tty: read() on a noncanonical blocking tty randomly fails when VMIN > received >= buf

2016-05-04 Thread Peter Hurley

Hi Julio,

On 05/04/2016 04:00 PM, Julio Guerra wrote:
> Hi,
> 
> When a tty (here a slave pty) is set in noncanonical input and blocking read 
> modes, a read() randomly blocks when:
> "VMIN > kernel received >= user buffer size > 0".
> 
> The standard says that read() should block until VMIN bytes are received 
> [1][2]. Whether this is an implementation defined case not really specified 
> by POSIX or not, it should not behave randomly (otherwise it really should be 
> documented in termios manpage).

This is not a bug.

>From the termios(3) man page:

   * MIN > 0; TIME == 0: read(2) blocks until the lesser of MIN bytes or 
the number of bytes requested are avail‐
 able, and returns the lesser of these two values.

Regards,
Peter Hurley


> I isolated it in the following example (with VMIN = 5, received = 4, user 
> buffer = 3):
> https://gist.github.com/Julio-Guerra/b3fdefab281403073607d81cabcea04a
> 
> Since it is random, you will need run it several times to observe both cases. 
> When correctly behaving, it should block in the read() for ever (C-c to kill 
> it). When incorrectly behaving, it does not block and reads into the user 
> buffer what is present in the kernel receive buffer (string "any").
> 
> Example where it does not block 3 times out of 4:
>  > $ ./a.out
>  > res=3 buf=any
>  > $ ./a.out
>  > res=3 buf=any
>  > $ ./a.out
>  > ^C⏎
> 
> Linux version 4.5.1-1-ARCH (builduser@tobias) (gcc version 5.3.0 (GCC) ) #1 
> SMP PREEMPT Thu Apr 14 19:19:32 CEST 2016
> 
> [1] "Canonical and noncanonical mode", man termios
> [2] http://www.gnu.org/software/libc/manual/html_node/Noncanonical-Input.html
>

Re: [serial] 13a7238eea: WARNING: CPU: 0 PID: 1 at drivers/tty/serial/serial_core.c:99 uart_change_pm+0x35/0x162

2016-05-03 Thread Peter Hurley

Hi,

On 05/03/2016 08:13 PM, kernel test robot wrote:
> 
> FYI, we noticed the following commit:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git tty-testing
> commit 13a7238eeab5718bf968e2a835205ba659a38a77 ("serial: core: Prevent 
> unsafe uart port access, part 3")

This should be fixed on most recent tty-testing and tty-next branches.

Please re-test on last tty-testing branch including commit 
7da4b8b7378790dd1e4af1bb7522863127fa1438
("serial: core: Fix port mutex assert if lockdep disabled")

Regards,
Peter Hurley


> on test machine: vm-vp-quantal-x86_64: 2 threads qemu-system-x86_64 
> -enable-kvm with 360M memory
> 
> caused below changes:
> 
> 
> [7.249323] [ cut here ]
> [7.249323] [ cut here ]
> [7.250172] WARNING: CPU: 0 PID: 1 at drivers/tty/serial/serial_core.c:99 
> uart_change_pm+0x35/0x162
> [7.250172] WARNING: CPU: 0 PID: 1 at drivers/tty/serial/serial_core.c:99 
> uart_change_pm+0x35/0x162
> [7.252110] Modules linked in:
> [7.252110] Modules linked in:
> 
> [7.252679] CPU: 0 PID: 1 Comm: swapper/0 Tainted: GB   
> 4.6.0-rc5-00015-g13a7238 #1
> [7.252679] CPU: 0 PID: 1 Comm: swapper/0 Tainted: GB   
> 4.6.0-rc5-00015-g13a7238 #1
> [7.254232] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> Debian-1.8.2-1 04/01/2014
> [7.254232] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> Debian-1.8.2-1 04/01/2014
> [7.255806]  
> [7.255806]   88000f42f668 88000f42f668 
> 815a2566 815a2566 0286 0286
> 
> [7.257180]  41b58ab3
> [7.257180]  41b58ab3 81f66067 81f66067 
> 815a240c 815a240c 81f7647d 81f7647d
> 
> [7.258601]  81238abf
> [7.258601]  81238abf 0042f6a8 0042f6a8 
> ed0001e85ecb ed0001e85ecb  
> 
> [7.259973] Call Trace:
> [7.259973] Call Trace:
> [7.260422]  [] dump_stack+0x15a/0x1de
> [7.260422]  [] dump_stack+0x15a/0x1de
> [7.261374]  [] ? _atomic_dec_and_lock+0xaf/0xaf
> [7.261374]  [] ? _atomic_dec_and_lock+0xaf/0xaf
> [7.262470]  [] ? is_module_text_address+0x1f/0x1f
> [7.262470]  [] ? is_module_text_address+0x1f/0x1f
> [7.263595]  [] ? uart_change_pm+0x35/0x162
> [7.263595]  [] ? uart_change_pm+0x35/0x162
> [7.264623]  [] __warn+0x16e/0x189
> [7.264623]  [] __warn+0x16e/0x189
> [7.265523]  [] warn_slowpath_null+0x18/0x1a
> [7.265523]  [] warn_slowpath_null+0x18/0x1a
> [7.266561]  [] uart_change_pm+0x35/0x162
> [7.266561]  [] uart_change_pm+0x35/0x162
> [7.267553]  [] uart_add_one_port+0xb36/0x1008
> [7.267553]  [] uart_add_one_port+0xb36/0x1008
> [7.268637]  [] ? uart_unregister_driver+0x150/0x150
> [7.268637]  [] ? uart_unregister_driver+0x150/0x150
> [7.269786]  [] ? mutex_trylock+0x4eb/0x4eb
> [7.269786]  [] ? mutex_trylock+0x4eb/0x4eb
> [7.270838]  [] 
> serial8250_register_8250_port+0xf68/0x14e5
> [7.270838]  [] 
> serial8250_register_8250_port+0xf68/0x14e5
> [7.272116]  [] serial_pnp_probe+0x2d8/0x5f1
> [7.272116]  [] serial_pnp_probe+0x2d8/0x5f1
> [7.273146]  [] ? check_name+0x6e/0x6e
> [7.273146]  [] ? check_name+0x6e/0x6e
> [7.274125]  [] ? check_name+0x6e/0x6e
> [7.274125]  [] ? check_name+0x6e/0x6e
> [7.275076]  [] pnp_device_probe+0x1c1/0x1dc
> [7.275076]  [] pnp_device_probe+0x1c1/0x1dc
> [7.276109]  [] driver_probe_device+0x325/0x63c
> [7.276109]  [] driver_probe_device+0x325/0x63c
> [7.277191]  [] ? driver_probe_device+0x63c/0x63c
> [7.277191]  [] ? driver_probe_device+0x63c/0x63c
> [7.278299]  [] __driver_attach+0x138/0x179
> [7.278299]  [] __driver_attach+0x138/0x179
> [7.279316]  [] bus_for_each_dev+0x10d/0x143
> [7.279316]  [] bus_for_each_dev+0x10d/0x143
> [7.280372]  [] ? next_device+0x52/0x52
> [7.280372]  [] ? next_device+0x52/0x52
> [7.281334]  [] driver_attach+0x50/0x53
> [7.281334]  [] driver_attach+0x50/0x53
> [7.282302]  [] bus_add_driver+0x31d/0x53e
> [7.282302]  [] bus_add_driver+0x31d/0x53e
> [7.283346]  [] driver_register+0x24b/0x2c3
> [7.283346]  [] driver_register+0x24b/0x2c3
> [7.284370]  [] pnp_register_driver+0x98/0x9f
> [7.284370]  [] pnp_register_driver+0x98/0x9f
> [7.285420]  [] ? univ8250_console_init+0x28/0x28
> [7.285420]  [] ? univ8250_console_init+0x28/0x28
> [7.286579]  [] serial8250_pnp_init+0x10/0x12
> [7.286579]  [] serial8250

Re: [serial] 13a7238eea: WARNING: CPU: 0 PID: 1 at drivers/tty/serial/serial_core.c:99 uart_change_pm+0x35/0x162

2016-05-03 Thread Peter Hurley

Hi,

On 05/03/2016 08:13 PM, kernel test robot wrote:
> 
> FYI, we noticed the following commit:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git tty-testing
> commit 13a7238eeab5718bf968e2a835205ba659a38a77 ("serial: core: Prevent 
> unsafe uart port access, part 3")

This should be fixed on most recent tty-testing and tty-next branches.

Please re-test on last tty-testing branch including commit 
7da4b8b7378790dd1e4af1bb7522863127fa1438
("serial: core: Fix port mutex assert if lockdep disabled")

Regards,
Peter Hurley


> on test machine: vm-vp-quantal-x86_64: 2 threads qemu-system-x86_64 
> -enable-kvm with 360M memory
> 
> caused below changes:
> 
> 
> [7.249323] [ cut here ]
> [7.249323] [ cut here ]
> [7.250172] WARNING: CPU: 0 PID: 1 at drivers/tty/serial/serial_core.c:99 
> uart_change_pm+0x35/0x162
> [7.250172] WARNING: CPU: 0 PID: 1 at drivers/tty/serial/serial_core.c:99 
> uart_change_pm+0x35/0x162
> [7.252110] Modules linked in:
> [7.252110] Modules linked in:
> 
> [7.252679] CPU: 0 PID: 1 Comm: swapper/0 Tainted: GB   
> 4.6.0-rc5-00015-g13a7238 #1
> [7.252679] CPU: 0 PID: 1 Comm: swapper/0 Tainted: GB   
> 4.6.0-rc5-00015-g13a7238 #1
> [7.254232] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> Debian-1.8.2-1 04/01/2014
> [7.254232] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> Debian-1.8.2-1 04/01/2014
> [7.255806]  
> [7.255806]   88000f42f668 88000f42f668 
> 815a2566 815a2566 0286 0286
> 
> [7.257180]  41b58ab3
> [7.257180]  41b58ab3 81f66067 81f66067 
> 815a240c 815a240c 81f7647d 81f7647d
> 
> [7.258601]  81238abf
> [7.258601]  81238abf 0042f6a8 0042f6a8 
> ed0001e85ecb ed0001e85ecb  
> 
> [7.259973] Call Trace:
> [7.259973] Call Trace:
> [7.260422]  [] dump_stack+0x15a/0x1de
> [7.260422]  [] dump_stack+0x15a/0x1de
> [7.261374]  [] ? _atomic_dec_and_lock+0xaf/0xaf
> [7.261374]  [] ? _atomic_dec_and_lock+0xaf/0xaf
> [7.262470]  [] ? is_module_text_address+0x1f/0x1f
> [7.262470]  [] ? is_module_text_address+0x1f/0x1f
> [7.263595]  [] ? uart_change_pm+0x35/0x162
> [7.263595]  [] ? uart_change_pm+0x35/0x162
> [7.264623]  [] __warn+0x16e/0x189
> [7.264623]  [] __warn+0x16e/0x189
> [7.265523]  [] warn_slowpath_null+0x18/0x1a
> [7.265523]  [] warn_slowpath_null+0x18/0x1a
> [7.266561]  [] uart_change_pm+0x35/0x162
> [7.266561]  [] uart_change_pm+0x35/0x162
> [7.267553]  [] uart_add_one_port+0xb36/0x1008
> [7.267553]  [] uart_add_one_port+0xb36/0x1008
> [7.268637]  [] ? uart_unregister_driver+0x150/0x150
> [7.268637]  [] ? uart_unregister_driver+0x150/0x150
> [7.269786]  [] ? mutex_trylock+0x4eb/0x4eb
> [7.269786]  [] ? mutex_trylock+0x4eb/0x4eb
> [7.270838]  [] 
> serial8250_register_8250_port+0xf68/0x14e5
> [7.270838]  [] 
> serial8250_register_8250_port+0xf68/0x14e5
> [7.272116]  [] serial_pnp_probe+0x2d8/0x5f1
> [7.272116]  [] serial_pnp_probe+0x2d8/0x5f1
> [7.273146]  [] ? check_name+0x6e/0x6e
> [7.273146]  [] ? check_name+0x6e/0x6e
> [7.274125]  [] ? check_name+0x6e/0x6e
> [7.274125]  [] ? check_name+0x6e/0x6e
> [7.275076]  [] pnp_device_probe+0x1c1/0x1dc
> [7.275076]  [] pnp_device_probe+0x1c1/0x1dc
> [7.276109]  [] driver_probe_device+0x325/0x63c
> [7.276109]  [] driver_probe_device+0x325/0x63c
> [7.277191]  [] ? driver_probe_device+0x63c/0x63c
> [7.277191]  [] ? driver_probe_device+0x63c/0x63c
> [7.278299]  [] __driver_attach+0x138/0x179
> [7.278299]  [] __driver_attach+0x138/0x179
> [7.279316]  [] bus_for_each_dev+0x10d/0x143
> [7.279316]  [] bus_for_each_dev+0x10d/0x143
> [7.280372]  [] ? next_device+0x52/0x52
> [7.280372]  [] ? next_device+0x52/0x52
> [7.281334]  [] driver_attach+0x50/0x53
> [7.281334]  [] driver_attach+0x50/0x53
> [7.282302]  [] bus_add_driver+0x31d/0x53e
> [7.282302]  [] bus_add_driver+0x31d/0x53e
> [7.283346]  [] driver_register+0x24b/0x2c3
> [7.283346]  [] driver_register+0x24b/0x2c3
> [7.284370]  [] pnp_register_driver+0x98/0x9f
> [7.284370]  [] pnp_register_driver+0x98/0x9f
> [7.285420]  [] ? univ8250_console_init+0x28/0x28
> [7.285420]  [] ? univ8250_console_init+0x28/0x28
> [7.286579]  [] serial8250_pnp_init+0x10/0x12
> [7.286579]  [] serial8250

Re: [PATCH v2 0/4] Patches to allow consistent mmc / mmcblk numbering w/ device tree

2016-04-29 Thread Peter Hurley

On 04/29/2016 05:03 PM, Doug Anderson wrote:
> Hi,
> 
> On Fri, Apr 29, 2016 at 4:58 PM, Peter Hurley <pe...@hurleysoftware.com> 
> wrote:
> 
> On 04/29/2016 04:01 PM, Doug Anderson wrote:
> > * serial allows numbering devices by alias.
> 
> Which is in fact a total nightmare.
> 
> While stable device order is mandatory in serial because of
> console command line parameters and existing userspace expectations,
> it is the number one barrier to providing a shared ttyS namespace
> for mixed uart platforms.
> 
> Stable device order has a very real (and often unforeseen) maintenance
> burden.
> 
> 
> Interesting. I wonder if these burdens are unique to serial or shared
> by all the other subsystems that allow ordering? Maybe this is all
> because of legacy reasons?

Well, the specific issue is certainly unique to serial.
But what I was suggesting is that 5 years from now, these patches
could be the "legacy reasons" in mmc.

FWIW, there is already a defacto expectation by boot configurations in the
field that a given mmc block device is stable across boots. The reality
is that 10's of kernel command lines look like:

root=/dev/mmcblk0p2

This was a recent regression fixed by Ulf in commit 9aaf3437aa72
("mmc: block: Use the mmc host device index as the mmcblk device index")

> Note that for MMC devices we wouldn't be asserting that _every_
> device has a stable order. Only those known at boot.
> 
> 
> For example, I noticed these patches are strictly for DT.
> I'm assuming that's because guaranteeing stable device order for
> mmc in general is more difficult; eg., by performing minimal scan
> serially and more expensive portion of the probe async.
> 
> 
> Presumably if there were another system similar to DT where we knew
> all the possible built-in devices right at boot then we could
> extend.
> 
> ...but yes, obviously this wouldn't be a sane thing to do for
> anything probable like PCI or USB.

SCSI does. Here's what Linus said:

"So the *simple* model is to just scan the devices minimally serially,
and allocate the names at that point (so the names are reliable
between boots for the same hardware configuration). And then do the
more expensive device setup asynchronously (ie querying device
information, spinning up disks, whatever - things that can take
anything from milliseonds to several seconds, because they are doing
actual IO). So you'd do some very basic (and _often_ fairly quick)
operations serially, but then try to do the expensive parts
concurrently.

The SCSI layer actually goes a bit further than that: it has a fairly
asynchronous scanning thing, but it does allocate the actual host
device nodes serially, and then it even has an ordered list of
"scanning_hosts" that is used to complete the scanning in-order, so
that the sysfs devices show up in the right order even if things
actually got scanned out-of-order. So scans that finished early will
wait for other scans that are for "earlier" devices, and you end up
with what *looks* ordered to the outside, even if internally it was
all done out-of-order.
"

Regards,
Peter Hurley

Re: [PATCH v2 0/4] Patches to allow consistent mmc / mmcblk numbering w/ device tree

2016-04-29 Thread Peter Hurley

On 04/29/2016 05:03 PM, Doug Anderson wrote:
> Hi,
> 
> On Fri, Apr 29, 2016 at 4:58 PM, Peter Hurley  
> wrote:
> 
> On 04/29/2016 04:01 PM, Doug Anderson wrote:
> > * serial allows numbering devices by alias.
> 
> Which is in fact a total nightmare.
> 
> While stable device order is mandatory in serial because of
> console command line parameters and existing userspace expectations,
> it is the number one barrier to providing a shared ttyS namespace
> for mixed uart platforms.
> 
> Stable device order has a very real (and often unforeseen) maintenance
> burden.
> 
> 
> Interesting. I wonder if these burdens are unique to serial or shared
> by all the other subsystems that allow ordering? Maybe this is all
> because of legacy reasons?

Well, the specific issue is certainly unique to serial.
But what I was suggesting is that 5 years from now, these patches
could be the "legacy reasons" in mmc.

FWIW, there is already a defacto expectation by boot configurations in the
field that a given mmc block device is stable across boots. The reality
is that 10's of kernel command lines look like:

root=/dev/mmcblk0p2

This was a recent regression fixed by Ulf in commit 9aaf3437aa72
("mmc: block: Use the mmc host device index as the mmcblk device index")

> Note that for MMC devices we wouldn't be asserting that _every_
> device has a stable order. Only those known at boot.
> 
> 
> For example, I noticed these patches are strictly for DT.
> I'm assuming that's because guaranteeing stable device order for
> mmc in general is more difficult; eg., by performing minimal scan
> serially and more expensive portion of the probe async.
> 
> 
> Presumably if there were another system similar to DT where we knew
> all the possible built-in devices right at boot then we could
> extend.
> 
> ...but yes, obviously this wouldn't be a sane thing to do for
> anything probable like PCI or USB.

SCSI does. Here's what Linus said:

"So the *simple* model is to just scan the devices minimally serially,
and allocate the names at that point (so the names are reliable
between boots for the same hardware configuration). And then do the
more expensive device setup asynchronously (ie querying device
information, spinning up disks, whatever - things that can take
anything from milliseonds to several seconds, because they are doing
actual IO). So you'd do some very basic (and _often_ fairly quick)
operations serially, but then try to do the expensive parts
concurrently.

The SCSI layer actually goes a bit further than that: it has a fairly
asynchronous scanning thing, but it does allocate the actual host
device nodes serially, and then it even has an ordered list of
"scanning_hosts" that is used to complete the scanning in-order, so
that the sysfs devices show up in the right order even if things
actually got scanned out-of-order. So scans that finished early will
wait for other scans that are for "earlier" devices, and you end up
with what *looks* ordered to the outside, even if internally it was
all done out-of-order.
"

Regards,
Peter Hurley

Re: [PATCH v2 0/4] Patches to allow consistent mmc / mmcblk numbering w/ device tree

2016-04-29 Thread Peter Hurley

On 04/29/2016 04:01 PM, Doug Anderson wrote:
> * serial allows numbering devices by alias.

Which is in fact a total nightmare.

While stable device order is mandatory in serial because of
console command line parameters and existing userspace expectations,
it is the number one barrier to providing a shared ttyS namespace
for mixed uart platforms.

Stable device order has a very real (and often unforeseen) maintenance
burden.

For example, I noticed these patches are strictly for DT.
I'm assuming that's because guaranteeing stable device order for
mmc in general is more difficult; eg., by performing minimal scan
serially and more expensive portion of the probe async.

Regards,
Peter Hurley

Re: [PATCH v2 0/4] Patches to allow consistent mmc / mmcblk numbering w/ device tree

2016-04-29 Thread Peter Hurley

On 04/29/2016 04:01 PM, Doug Anderson wrote:
> * serial allows numbering devices by alias.

Which is in fact a total nightmare.

While stable device order is mandatory in serial because of
console command line parameters and existing userspace expectations,
it is the number one barrier to providing a shared ttyS namespace
for mixed uart platforms.

Stable device order has a very real (and often unforeseen) maintenance
burden.

For example, I noticed these patches are strictly for DT.
I'm assuming that's because guaranteeing stable device order for
mmc in general is more difficult; eg., by performing minimal scan
serially and more expensive portion of the probe async.

Regards,
Peter Hurley

Re: [PATCH v2 08/11] serial: sh-sci: Correct pin initialization on (H)SCIF

2016-04-29 Thread Peter Hurley

On 04/29/2016 05:58 AM, Geert Uytterhoeven wrote:
> Correct pin initialization on (H)SCIF:
>   - RTS must be deasserted (it's active low),
>   - SCK must be an input, as it may be used as the optional external
> clock input.
> 
> Initial pin configuration must always be done:
>   - Regardless of the presence of dedicated RTS and CTS pins: if the
> register exists, the RTS/CTS bits exist, too,
>   - Regardless of hardware flow control being enabled or not: RTS must
> be deasserted.

This is always setting RTS active; why?

Normally you want to program RTS only in response to ->set_mctrl()
from serial core.  For example, this will set RTS active even though
baud might be set to B0.


> Signed-off-by: Geert Uytterhoeven 
> ---
> v2:
>   - New.
> ---
>  drivers/tty/serial/sh-sci.c | 23 ---
>  1 file changed, 8 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/tty/serial/sh-sci.c b/drivers/tty/serial/sh-sci.c
> index ce7bd165929ea078..c46999f20917964e 100644
> --- a/drivers/tty/serial/sh-sci.c
> +++ b/drivers/tty/serial/sh-sci.c
> @@ -712,21 +712,14 @@ static void sci_init_pins(struct uart_port *port, 
> unsigned int cflag)
>   return;
>   }
>  
> - /*
> -  * For the generic path SCSPTR is necessary. Bail out if that's
> -  * unavailable, too.
> -  */
> - if (!sci_getreg(port, SCSPTR)->size)
> - return;
> -
> - if ((s->cfg->capabilities & SCIx_HAVE_RTSCTS) &&
> - ((!(cflag & CRTSCTS {
> - unsigned short status;
> -
> - status = serial_port_in(port, SCSPTR);
> - status &= ~SCSPTR_CTSIO;
> - status |= SCSPTR_RTSIO;
> - serial_port_out(port, SCSPTR, status); /* Set RTS = 1 */
> + if (sci_getreg(port, SCSPTR)->size) {
> + u16 status = serial_port_in(port, SCSPTR);
> +
> + /* RTS# is output, driven 1 */
> + status |= SCSPTR_RTSIO | SCSPTR_RTSDT;
> + /* CTS# and SCK are inputs */
> + status &= ~(SCSPTR_CTSIO | SCSPTR_SCKIO);
> + serial_port_out(port, SCSPTR, status);
>   }
>  }
>  
>

Re: [PATCH v2 08/11] serial: sh-sci: Correct pin initialization on (H)SCIF

2016-04-29 Thread Peter Hurley

On 04/29/2016 05:58 AM, Geert Uytterhoeven wrote:
> Correct pin initialization on (H)SCIF:
>   - RTS must be deasserted (it's active low),
>   - SCK must be an input, as it may be used as the optional external
> clock input.
> 
> Initial pin configuration must always be done:
>   - Regardless of the presence of dedicated RTS and CTS pins: if the
> register exists, the RTS/CTS bits exist, too,
>   - Regardless of hardware flow control being enabled or not: RTS must
> be deasserted.

This is always setting RTS active; why?

Normally you want to program RTS only in response to ->set_mctrl()
from serial core.  For example, this will set RTS active even though
baud might be set to B0.


> Signed-off-by: Geert Uytterhoeven 
> ---
> v2:
>   - New.
> ---
>  drivers/tty/serial/sh-sci.c | 23 ---
>  1 file changed, 8 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/tty/serial/sh-sci.c b/drivers/tty/serial/sh-sci.c
> index ce7bd165929ea078..c46999f20917964e 100644
> --- a/drivers/tty/serial/sh-sci.c
> +++ b/drivers/tty/serial/sh-sci.c
> @@ -712,21 +712,14 @@ static void sci_init_pins(struct uart_port *port, 
> unsigned int cflag)
>   return;
>   }
>  
> - /*
> -  * For the generic path SCSPTR is necessary. Bail out if that's
> -  * unavailable, too.
> -  */
> - if (!sci_getreg(port, SCSPTR)->size)
> - return;
> -
> - if ((s->cfg->capabilities & SCIx_HAVE_RTSCTS) &&
> - ((!(cflag & CRTSCTS {
> - unsigned short status;
> -
> - status = serial_port_in(port, SCSPTR);
> - status &= ~SCSPTR_CTSIO;
> - status |= SCSPTR_RTSIO;
> - serial_port_out(port, SCSPTR, status); /* Set RTS = 1 */
> + if (sci_getreg(port, SCSPTR)->size) {
> + u16 status = serial_port_in(port, SCSPTR);
> +
> + /* RTS# is output, driven 1 */
> + status |= SCSPTR_RTSIO | SCSPTR_RTSDT;
> + /* CTS# and SCK are inputs */
> + status &= ~(SCSPTR_CTSIO | SCSPTR_SCKIO);
> + serial_port_out(port, SCSPTR, status);
>   }
>  }
>  
>

Re: [PATCH v2 07/11] serial: sh-sci: Add more Serial Port Control/Data Register documentation

2016-04-29 Thread Peter Hurley

On 04/29/2016 05:58 AM, Geert Uytterhoeven wrote:
> Improve documentation for the SCIFA/SCIFB Serial Port Control and Data
> Registers:
>   - State clearly that the RTS and CTS lines are active-low,
>   - Document the bits related to the serial port's SCK, RXD, and TXD
> pins.

Reviewed-by: Peter Hurley <pe...@hurleysoftware.com>

Re: [PATCH v2 07/11] serial: sh-sci: Add more Serial Port Control/Data Register documentation

2016-04-29 Thread Peter Hurley

On 04/29/2016 05:58 AM, Geert Uytterhoeven wrote:
> Improve documentation for the SCIFA/SCIFB Serial Port Control and Data
> Registers:
>   - State clearly that the RTS and CTS lines are active-low,
>   - Document the bits related to the serial port's SCK, RXD, and TXD
> pins.

Reviewed-by: Peter Hurley

Re: [PATCH v2 06/11] serial: sh-sci: Add more Serial Port Register documentation

2016-04-29 Thread Peter Hurley

On 04/29/2016 05:58 AM, Geert Uytterhoeven wrote:
> Improve documentation for the (H)SCIF Serial Port Register:
>   - Make it clear the RTS and CTS lines are active-low,
>   - Document the bits related to the serial port's clock pin.

Reviewed-by: Peter Hurley <pe...@hurleysoftware.com>

Re: [PATCH v2 06/11] serial: sh-sci: Add more Serial Port Register documentation

2016-04-29 Thread Peter Hurley

On 04/29/2016 05:58 AM, Geert Uytterhoeven wrote:
> Improve documentation for the (H)SCIF Serial Port Register:
>   - Make it clear the RTS and CTS lines are active-low,
>   - Document the bits related to the serial port's clock pin.

Reviewed-by: Peter Hurley

Re: [PATCH v2 05/11] serial: sh-sci: Do not open-code sci_getreg()

2016-04-29 Thread Peter Hurley

On 04/29/2016 05:58 AM, Geert Uytterhoeven wrote:
> Replace open-coded variants of sci_getreg() by function calls, and drop
> intermediate variables where appropriate.

Reviewed-by: Peter Hurley <pe...@hurleysoftware.com>

Re: [PATCH v2 04/11] serial: sh-sci: Add support for GPIO-controlled modem lines

2016-04-29 Thread Peter Hurley

On 04/29/2016 05:58 AM, Geert Uytterhoeven wrote:
> Enhance the Renesas SCI UART driver to add support for GPIO-controlled
> modem lines (CTS, DSR, DCD, RNG, RTS, DTR), using the serial_mctrl_gpio
> helpers.
> 
> GPIO-controlled modem lines can be used when dedicated modem lines are
> not available. Invalid configurations specifying both GPIO RTS/CTS and
> dedicated RTS/CTS are rejected.

Reviewed-by: Peter Hurley <pe...@hurleysoftware.com>

Re: [PATCH v2 05/11] serial: sh-sci: Do not open-code sci_getreg()

2016-04-29 Thread Peter Hurley

On 04/29/2016 05:58 AM, Geert Uytterhoeven wrote:
> Replace open-coded variants of sci_getreg() by function calls, and drop
> intermediate variables where appropriate.

Reviewed-by: Peter Hurley

Re: [PATCH v2 04/11] serial: sh-sci: Add support for GPIO-controlled modem lines

2016-04-29 Thread Peter Hurley

On 04/29/2016 05:58 AM, Geert Uytterhoeven wrote:
> Enhance the Renesas SCI UART driver to add support for GPIO-controlled
> modem lines (CTS, DSR, DCD, RNG, RTS, DTR), using the serial_mctrl_gpio
> helpers.
> 
> GPIO-controlled modem lines can be used when dedicated modem lines are
> not available. Invalid configurations specifying both GPIO RTS/CTS and
> dedicated RTS/CTS are rejected.

Reviewed-by: Peter Hurley

Re: [PATCH v2 03/11] serial: sh-sci: Always set TIOCM_CTS in .get_mctrl() callback

2016-04-29 Thread Peter Hurley

On 04/29/2016 05:58 AM, Geert Uytterhoeven wrote:
> Documentation/serial/driver clearly states:
> 
> If the port does not support CTS, DCD or DSR, the driver should
> indicate that the signal is permanently active.
> 
> Hence always set TIOCM_CTS, as we currently don't look at the CTS
> hardware line state at all.
> 
> FWIW, this fixes the transmit path when hardware-assisted flow control
> is enabled, and userspace enables CRTSCTS.
> The receive path is still broken, as RTS is never asserted.

Reviewed-by: Peter Hurley <pe...@hurleysoftware.com>

Re: [PATCH v2 03/11] serial: sh-sci: Always set TIOCM_CTS in .get_mctrl() callback

2016-04-29 Thread Peter Hurley

On 04/29/2016 05:58 AM, Geert Uytterhoeven wrote:
> Documentation/serial/driver clearly states:
> 
> If the port does not support CTS, DCD or DSR, the driver should
> indicate that the signal is permanently active.
> 
> Hence always set TIOCM_CTS, as we currently don't look at the CTS
> hardware line state at all.
> 
> FWIW, this fixes the transmit path when hardware-assisted flow control
> is enabled, and userspace enables CRTSCTS.
> The receive path is still broken, as RTS is never asserted.

Reviewed-by: Peter Hurley

Re: [PATCH] 8250: Hypervisors always export working 16550A UARTs.

2016-04-28 Thread Peter Hurley

On 04/28/2016 03:18 PM, Richard W.M. Jones wrote:
> [This is an opinionated patch, mainly for discussion.]
> 
> I'm trying to reduce the time taken in the kernel in initcalls, with
> my aim being to reduce the current ~700ms spent in initcalls before
> userspace, down to something like 100ms.  All times on my Broadwell-U
> laptop, under virtualization.  The purpose of this is to be able to
> launch VMs around containers with minimal overhead, like Intel Clear
> Containers, but using standard distro kernels and qemu.
> 
> Currently the kernel spends 25ms inspecting the UART that we passed to
> it from qemu to find out whether it's an 8250/16550/16550A perhaps
> with a non-working FIFO or other quirks.  Well, it isn't -- it's a
> working emulated 16550A, with a FIFO and no quirks, and if it isn't,
> we should fix qemu.

I'm sure all of this delay is sizing the fifo:

static int size_fifo(struct uart_8250_port *up)
{
...
for (count = 0; count < 256; count++)
serial_out(up, UART_TX, count);
===>mdelay(20);/* FIXME - schedule_timeout */
for (count = 0; (serial_in(up, UART_LSR) & UART_LSR_DR) &&
 (count < 256); count++)
serial_in(up, UART_RX);
...

return count;
}

> So the patch detects if we're running virtualized (perhaps it should
> only check for qemu/KVM?) and if so, shortcuts the tests.

Yeah, sorry, that's not going to fly.

I'm assuming x86, yes?

There's are multiple ways already supported by this driver to specify
the port based on the platform:

1. Register "serial8250" platform device.
   Set UPF_FIXED_TYPE in ::flags and PORT_16550A in ::type
2. early_serial_setup()
   Same as above, but must be before console_init()
3. Don't enumerate PNP0501 ACPI devices
   These are going to be probed dynamically
4. Add ACPI/PNP custom device
   Fixing the port type isn't supported now but could be
5. Modify the SERIAL_PORT_DFNS
   Fixing the port type isn't supported now either but could be

Regards,
Peter Hurley

Re: [PATCH] 8250: Hypervisors always export working 16550A UARTs.

2016-04-28 Thread Peter Hurley

On 04/28/2016 03:18 PM, Richard W.M. Jones wrote:
> [This is an opinionated patch, mainly for discussion.]
> 
> I'm trying to reduce the time taken in the kernel in initcalls, with
> my aim being to reduce the current ~700ms spent in initcalls before
> userspace, down to something like 100ms.  All times on my Broadwell-U
> laptop, under virtualization.  The purpose of this is to be able to
> launch VMs around containers with minimal overhead, like Intel Clear
> Containers, but using standard distro kernels and qemu.
> 
> Currently the kernel spends 25ms inspecting the UART that we passed to
> it from qemu to find out whether it's an 8250/16550/16550A perhaps
> with a non-working FIFO or other quirks.  Well, it isn't -- it's a
> working emulated 16550A, with a FIFO and no quirks, and if it isn't,
> we should fix qemu.

I'm sure all of this delay is sizing the fifo:

static int size_fifo(struct uart_8250_port *up)
{
...
for (count = 0; count < 256; count++)
serial_out(up, UART_TX, count);
===>mdelay(20);/* FIXME - schedule_timeout */
for (count = 0; (serial_in(up, UART_LSR) & UART_LSR_DR) &&
 (count < 256); count++)
serial_in(up, UART_RX);
...

return count;
}

> So the patch detects if we're running virtualized (perhaps it should
> only check for qemu/KVM?) and if so, shortcuts the tests.

Yeah, sorry, that's not going to fly.

I'm assuming x86, yes?

There's are multiple ways already supported by this driver to specify
the port based on the platform:

1. Register "serial8250" platform device.
   Set UPF_FIXED_TYPE in ::flags and PORT_16550A in ::type
2. early_serial_setup()
   Same as above, but must be before console_init()
3. Don't enumerate PNP0501 ACPI devices
   These are going to be probed dynamically
4. Add ACPI/PNP custom device
   Fixing the port type isn't supported now but could be
5. Modify the SERIAL_PORT_DFNS
   Fixing the port type isn't supported now either but could be

Regards,
Peter Hurley

Re: [PATCH V4] audit: add tty field to LOGIN event

2016-04-28 Thread Peter Hurley

On 04/28/2016 12:28 PM, Richard Guy Briggs wrote:
> On 16/04/27, Peter Hurley wrote:
>> On 04/27/2016 06:31 PM, Richard Guy Briggs wrote:
>>> On 16/04/22, Peter Hurley wrote:
>>>> On 04/21/2016 11:14 AM, Richard Guy Briggs wrote:
>>>>> The tty field was missing from AUDIT_LOGIN events.
>>>>>
>>>>> Refactor code to create a new function audit_get_tty(), using it to
>>>>> replace the call in audit_log_task_info() and to add it to
>>>>> audit_log_set_loginuid().  Lock and bump the kref to protect it, adding
>>>>> audit_put_tty() alias to decrement it.
>>>>>
>>>>> Signed-off-by: Richard Guy Briggs <r...@redhat.com>
>>>>> ---
>>>>>
>>>>> V4: Add missing prototype for audit_put_tty() when audit syscall is not
>>>>> enabled (MIPS).
>>>>>
>>>>> V3: Introduce audit_put_tty() alias to decrement kref.
>>>>>
>>>>> V2: Use kref to protect tty signal struct while in use.
>>>>>
>>>>> ---
>>>>>
>>>>>  include/linux/audit.h |   24 
>>>>>  kernel/audit.c|   18 +-
>>>>>  kernel/auditsc.c  |8 ++--
>>>>>  3 files changed, 35 insertions(+), 15 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/audit.h b/include/linux/audit.h
>>>>> index b40ed5d..32cdafb 100644
>>>>> --- a/include/linux/audit.h
>>>>> +++ b/include/linux/audit.h
>>>>> @@ -26,6 +26,7 @@
>>>>>  #include 
>>>>>  #include 
>>>>>  #include 
>>>>> +#include 
>>>>>  
>>>>>  #define AUDIT_INO_UNSET ((unsigned long)-1)
>>>>>  #define AUDIT_DEV_UNSET ((dev_t)-1)
>>>>> @@ -343,6 +344,23 @@ static inline unsigned int 
>>>>> audit_get_sessionid(struct task_struct *tsk)
>>>>>   return tsk->sessionid;
>>>>>  }
>>>>>  
>>>>> +static inline struct tty_struct *audit_get_tty(struct task_struct *tsk)
>>>>> +{
>>>>> + struct tty_struct *tty = NULL;
>>>>> + unsigned long flags;
>>>>> +
>>>>> + spin_lock_irqsave(>sighand->siglock, flags);
>>>>> + if (tsk->signal)
>>>>> + tty = tty_kref_get(tsk->signal->tty);
>>>>> + spin_unlock_irqrestore(>sighand->siglock, flags);
>>>>
>>>>
>>>> Not that I'm objecting because I get that you're just refactoring
>>>> existing code, but I thought I'd point out some stuff.
>>>>
>>>> 1. There's no need to check if signal_struct is NULL (ie. tsk->signal)
>>>>because if it is, this will blow up trying to dereference the
>>>>sighand_struct (ie tsk->sighand).
>>>
>>> Ok.  This logic goes back 10 years and one month less two days. (45d9bb0e)
>>>
>>>> 2. The existing usage is always tsk==current
>>>
>>> My understanding is that when it is called via:
>>>
>>> copy_process()
>>> audit_free()
>>> __audit_free()
>>> audit_log_exit()
>>> audit_log_task_info()
>>>
>>> then tsk != current.
>>
>> While it's true that tsk != current here, everything relevant to tty
>> in task_struct is the same because the nascent task is not even half-done.
>> So tsk->sighand == current->sighand, tsk->signal == current->signal etc.
> 
> I agree this is true except in the case of !CLONE_SIGHAND, if it fails
> after copy_sighand() or copy_signal() then it would be null and would
> get freed before audit_free() is called.  By the time tty gets copied
> from current in this case, it is past the point of failure in
> copy_process().

Oh, right.


>> If you're uncomfortable with pass-through execution like that, then the
>> simple solution is:
>>
>>  struct tty_struct *tty = NULL;
>>
>>  /* tsk != current when copy_process() failed */
>>  if (tsk == current)
>>  tty = get_current_tty();
>>
>> because tty_kref_put(tty) accepts NULL tty and (obviously) so does
>> tty_name(tty).
> 
> Given the circumstances above, this appears reasonable to me at first
> look.

Ok.

I'll spend more analysis time before I actually submit a pa

Re: [PATCH V4] audit: add tty field to LOGIN event

2016-04-28 Thread Peter Hurley

On 04/28/2016 12:28 PM, Richard Guy Briggs wrote:
> On 16/04/27, Peter Hurley wrote:
>> On 04/27/2016 06:31 PM, Richard Guy Briggs wrote:
>>> On 16/04/22, Peter Hurley wrote:
>>>> On 04/21/2016 11:14 AM, Richard Guy Briggs wrote:
>>>>> The tty field was missing from AUDIT_LOGIN events.
>>>>>
>>>>> Refactor code to create a new function audit_get_tty(), using it to
>>>>> replace the call in audit_log_task_info() and to add it to
>>>>> audit_log_set_loginuid().  Lock and bump the kref to protect it, adding
>>>>> audit_put_tty() alias to decrement it.
>>>>>
>>>>> Signed-off-by: Richard Guy Briggs 
>>>>> ---
>>>>>
>>>>> V4: Add missing prototype for audit_put_tty() when audit syscall is not
>>>>> enabled (MIPS).
>>>>>
>>>>> V3: Introduce audit_put_tty() alias to decrement kref.
>>>>>
>>>>> V2: Use kref to protect tty signal struct while in use.
>>>>>
>>>>> ---
>>>>>
>>>>>  include/linux/audit.h |   24 
>>>>>  kernel/audit.c|   18 +-
>>>>>  kernel/auditsc.c  |8 ++--
>>>>>  3 files changed, 35 insertions(+), 15 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/audit.h b/include/linux/audit.h
>>>>> index b40ed5d..32cdafb 100644
>>>>> --- a/include/linux/audit.h
>>>>> +++ b/include/linux/audit.h
>>>>> @@ -26,6 +26,7 @@
>>>>>  #include 
>>>>>  #include 
>>>>>  #include 
>>>>> +#include 
>>>>>  
>>>>>  #define AUDIT_INO_UNSET ((unsigned long)-1)
>>>>>  #define AUDIT_DEV_UNSET ((dev_t)-1)
>>>>> @@ -343,6 +344,23 @@ static inline unsigned int 
>>>>> audit_get_sessionid(struct task_struct *tsk)
>>>>>   return tsk->sessionid;
>>>>>  }
>>>>>  
>>>>> +static inline struct tty_struct *audit_get_tty(struct task_struct *tsk)
>>>>> +{
>>>>> + struct tty_struct *tty = NULL;
>>>>> + unsigned long flags;
>>>>> +
>>>>> + spin_lock_irqsave(>sighand->siglock, flags);
>>>>> + if (tsk->signal)
>>>>> + tty = tty_kref_get(tsk->signal->tty);
>>>>> + spin_unlock_irqrestore(>sighand->siglock, flags);
>>>>
>>>>
>>>> Not that I'm objecting because I get that you're just refactoring
>>>> existing code, but I thought I'd point out some stuff.
>>>>
>>>> 1. There's no need to check if signal_struct is NULL (ie. tsk->signal)
>>>>because if it is, this will blow up trying to dereference the
>>>>sighand_struct (ie tsk->sighand).
>>>
>>> Ok.  This logic goes back 10 years and one month less two days. (45d9bb0e)
>>>
>>>> 2. The existing usage is always tsk==current
>>>
>>> My understanding is that when it is called via:
>>>
>>> copy_process()
>>> audit_free()
>>> __audit_free()
>>> audit_log_exit()
>>> audit_log_task_info()
>>>
>>> then tsk != current.
>>
>> While it's true that tsk != current here, everything relevant to tty
>> in task_struct is the same because the nascent task is not even half-done.
>> So tsk->sighand == current->sighand, tsk->signal == current->signal etc.
> 
> I agree this is true except in the case of !CLONE_SIGHAND, if it fails
> after copy_sighand() or copy_signal() then it would be null and would
> get freed before audit_free() is called.  By the time tty gets copied
> from current in this case, it is past the point of failure in
> copy_process().

Oh, right.


>> If you're uncomfortable with pass-through execution like that, then the
>> simple solution is:
>>
>>  struct tty_struct *tty = NULL;
>>
>>  /* tsk != current when copy_process() failed */
>>  if (tsk == current)
>>  tty = get_current_tty();
>>
>> because tty_kref_put(tty) accepts NULL tty and (obviously) so does
>> tty_name(tty).
> 
> Given the circumstances above, this appears reasonable to me at first
> look.

Ok.

I'll spend more analysis time before I actually submit a patch for this.


>>>  This appears

Re: [PATCH V4] audit: add tty field to LOGIN event

2016-04-27 Thread Peter Hurley

On 04/27/2016 06:31 PM, Richard Guy Briggs wrote:
> On 16/04/22, Peter Hurley wrote:
>> On 04/21/2016 11:14 AM, Richard Guy Briggs wrote:
>>> The tty field was missing from AUDIT_LOGIN events.
>>>
>>> Refactor code to create a new function audit_get_tty(), using it to
>>> replace the call in audit_log_task_info() and to add it to
>>> audit_log_set_loginuid().  Lock and bump the kref to protect it, adding
>>> audit_put_tty() alias to decrement it.
>>>
>>> Signed-off-by: Richard Guy Briggs <r...@redhat.com>
>>> ---
>>>
>>> V4: Add missing prototype for audit_put_tty() when audit syscall is not
>>> enabled (MIPS).
>>>
>>> V3: Introduce audit_put_tty() alias to decrement kref.
>>>
>>> V2: Use kref to protect tty signal struct while in use.
>>>
>>> ---
>>>
>>>  include/linux/audit.h |   24 
>>>  kernel/audit.c|   18 +-
>>>  kernel/auditsc.c  |8 ++--
>>>  3 files changed, 35 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/include/linux/audit.h b/include/linux/audit.h
>>> index b40ed5d..32cdafb 100644
>>> --- a/include/linux/audit.h
>>> +++ b/include/linux/audit.h
>>> @@ -26,6 +26,7 @@
>>>  #include 
>>>  #include 
>>>  #include 
>>> +#include 
>>>  
>>>  #define AUDIT_INO_UNSET ((unsigned long)-1)
>>>  #define AUDIT_DEV_UNSET ((dev_t)-1)
>>> @@ -343,6 +344,23 @@ static inline unsigned int audit_get_sessionid(struct 
>>> task_struct *tsk)
>>> return tsk->sessionid;
>>>  }
>>>  
>>> +static inline struct tty_struct *audit_get_tty(struct task_struct *tsk)
>>> +{
>>> +   struct tty_struct *tty = NULL;
>>> +   unsigned long flags;
>>> +
>>> +   spin_lock_irqsave(>sighand->siglock, flags);
>>> +   if (tsk->signal)
>>> +   tty = tty_kref_get(tsk->signal->tty);
>>> +   spin_unlock_irqrestore(>sighand->siglock, flags);
>>
>>
>> Not that I'm objecting because I get that you're just refactoring
>> existing code, but I thought I'd point out some stuff.
>>
>> 1. There's no need to check if signal_struct is NULL (ie. tsk->signal)
>>because if it is, this will blow up trying to dereference the
>>sighand_struct (ie tsk->sighand).
> 
> Ok.  This logic goes back 10 years and one month less two days. (45d9bb0e)
> 
>> 2. The existing usage is always tsk==current
> 
> My understanding is that when it is called via:
> 
>   copy_process()
>   audit_free()
>   __audit_free()
>   audit_log_exit()
>   audit_log_task_info()
> 
> then tsk != current.

While it's true that tsk != current here, everything relevant to tty
in task_struct is the same because the nascent task is not even half-done.
So tsk->sighand == current->sighand, tsk->signal == current->signal etc.

If you're uncomfortable with pass-through execution like that, then the
simple solution is:


struct tty_struct *tty = NULL;

/* tsk != current when copy_process() failed */
if (tsk == current)
tty = get_current_tty();


because tty_kref_put(tty) accepts NULL tty and (obviously) so does
tty_name(tty).


Regards,
Peter Hurley


>  This appears to be the only case which appears to
> force lugging around tsk.  This is noted in that commit referenced
> above.
> 
>> 3. If the idea is to make this invulnerable to tsk being gone, then
>>the usage is unsafe anyway.
>>
>>
>> So ultimately (but not necessarily for this patch) I'd prefer that either
>> a. audit use existing tty api instead of open-coding, or
>> b. add any tty api functions required.
> 
> This latter option did cross my mind...
> 
>> Peter Hurley
>>
>>> +   return tty;
>>> +}
>>> +
>>> +static inline void audit_put_tty(struct tty_struct *tty)
>>> +{
>>> +   tty_kref_put(tty);
>>> +}
>>> +
>>>  extern void __audit_ipc_obj(struct kern_ipc_perm *ipcp);
>>>  extern void __audit_ipc_set_perm(unsigned long qbytes, uid_t uid, gid_t 
>>> gid, umode_t mode);
>>>  extern void __audit_bprm(struct linux_binprm *bprm);
>>> @@ -500,6 +518,12 @@ static inline unsigned int audit_get_sessionid(struct 
>>> task_struct *tsk)
>>>  {
>>> return -1;
>>>  }
>>> +static inline struct t

Re: [PATCH V4] audit: add tty field to LOGIN event

2016-04-27 Thread Peter Hurley

On 04/27/2016 06:31 PM, Richard Guy Briggs wrote:
> On 16/04/22, Peter Hurley wrote:
>> On 04/21/2016 11:14 AM, Richard Guy Briggs wrote:
>>> The tty field was missing from AUDIT_LOGIN events.
>>>
>>> Refactor code to create a new function audit_get_tty(), using it to
>>> replace the call in audit_log_task_info() and to add it to
>>> audit_log_set_loginuid().  Lock and bump the kref to protect it, adding
>>> audit_put_tty() alias to decrement it.
>>>
>>> Signed-off-by: Richard Guy Briggs 
>>> ---
>>>
>>> V4: Add missing prototype for audit_put_tty() when audit syscall is not
>>> enabled (MIPS).
>>>
>>> V3: Introduce audit_put_tty() alias to decrement kref.
>>>
>>> V2: Use kref to protect tty signal struct while in use.
>>>
>>> ---
>>>
>>>  include/linux/audit.h |   24 
>>>  kernel/audit.c|   18 +-
>>>  kernel/auditsc.c  |8 ++--
>>>  3 files changed, 35 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/include/linux/audit.h b/include/linux/audit.h
>>> index b40ed5d..32cdafb 100644
>>> --- a/include/linux/audit.h
>>> +++ b/include/linux/audit.h
>>> @@ -26,6 +26,7 @@
>>>  #include 
>>>  #include 
>>>  #include 
>>> +#include 
>>>  
>>>  #define AUDIT_INO_UNSET ((unsigned long)-1)
>>>  #define AUDIT_DEV_UNSET ((dev_t)-1)
>>> @@ -343,6 +344,23 @@ static inline unsigned int audit_get_sessionid(struct 
>>> task_struct *tsk)
>>> return tsk->sessionid;
>>>  }
>>>  
>>> +static inline struct tty_struct *audit_get_tty(struct task_struct *tsk)
>>> +{
>>> +   struct tty_struct *tty = NULL;
>>> +   unsigned long flags;
>>> +
>>> +   spin_lock_irqsave(>sighand->siglock, flags);
>>> +   if (tsk->signal)
>>> +   tty = tty_kref_get(tsk->signal->tty);
>>> +   spin_unlock_irqrestore(>sighand->siglock, flags);
>>
>>
>> Not that I'm objecting because I get that you're just refactoring
>> existing code, but I thought I'd point out some stuff.
>>
>> 1. There's no need to check if signal_struct is NULL (ie. tsk->signal)
>>because if it is, this will blow up trying to dereference the
>>sighand_struct (ie tsk->sighand).
> 
> Ok.  This logic goes back 10 years and one month less two days. (45d9bb0e)
> 
>> 2. The existing usage is always tsk==current
> 
> My understanding is that when it is called via:
> 
>   copy_process()
>   audit_free()
>   __audit_free()
>   audit_log_exit()
>   audit_log_task_info()
> 
> then tsk != current.

While it's true that tsk != current here, everything relevant to tty
in task_struct is the same because the nascent task is not even half-done.
So tsk->sighand == current->sighand, tsk->signal == current->signal etc.

If you're uncomfortable with pass-through execution like that, then the
simple solution is:


struct tty_struct *tty = NULL;

/* tsk != current when copy_process() failed */
if (tsk == current)
tty = get_current_tty();


because tty_kref_put(tty) accepts NULL tty and (obviously) so does
tty_name(tty).


Regards,
Peter Hurley


>  This appears to be the only case which appears to
> force lugging around tsk.  This is noted in that commit referenced
> above.
> 
>> 3. If the idea is to make this invulnerable to tsk being gone, then
>>the usage is unsafe anyway.
>>
>>
>> So ultimately (but not necessarily for this patch) I'd prefer that either
>> a. audit use existing tty api instead of open-coding, or
>> b. add any tty api functions required.
> 
> This latter option did cross my mind...
> 
>> Peter Hurley
>>
>>> +   return tty;
>>> +}
>>> +
>>> +static inline void audit_put_tty(struct tty_struct *tty)
>>> +{
>>> +   tty_kref_put(tty);
>>> +}
>>> +
>>>  extern void __audit_ipc_obj(struct kern_ipc_perm *ipcp);
>>>  extern void __audit_ipc_set_perm(unsigned long qbytes, uid_t uid, gid_t 
>>> gid, umode_t mode);
>>>  extern void __audit_bprm(struct linux_binprm *bprm);
>>> @@ -500,6 +518,12 @@ static inline unsigned int audit_get_sessionid(struct 
>>> task_struct *tsk)
>>>  {
>>> return -1;
>>>  }
>>> +static inline struct tty_struct *audit_get_tty(str

Re: [PATCH] tty: provide tty_name() even without CONFIG_TTY

2016-04-27 Thread Peter Hurley

On 04/27/2016 10:24 AM, Arnd Bergmann wrote:
> On Wednesday 27 April 2016 12:20:02 Paul Moore wrote:
>>> diff --git a/include/linux/tty.h b/include/linux/tty.h
>>> index 3b09f235db66..17b247c94440 100644
>>> --- a/include/linux/tty.h
>>> +++ b/include/linux/tty.h
>>> @@ -371,6 +371,7 @@ extern void proc_clear_tty(struct task_struct *p);
>>>  extern struct tty_struct *get_current_tty(void);
>>>  /* tty_io.c */
>>>  extern int __init tty_init(void);
>>> +extern const char *tty_name(const struct tty_struct *tty);
>>>  #else
>>>  static inline void console_init(void)
>>>  { }
>>> @@ -391,6 +392,8 @@ static inline struct tty_struct *get_current_tty(void)
>>>  /* tty_io.c */
>>>  static inline int __init tty_init(void)
>>>  { return 0; }
>>> +static inline const char *tty_name(const struct tty_struct *tty)
>>> +{ return "(none)"; }
>>>  #endif
>>
>> As it currently stands tty_name() returns "NULL tty" when the passed
>> tty_struct is NULL while this patch returns "(none)" in the case of
>> CONFIG_TTY=n; it seems like some consistency might be good, yes?  Or
>> do you think there is value in differentiating between the two cases?
>>
>> From an audit point of view, we would prefer if both were "(none)".
> 
> Right, I noticed that the audit code prints "(none)" here while the
> tty code prints "NULL tty", and that meant I could not make it behave
> the same way as all the existing code. I picked "(none)" because
> in case of CONFIG_TTY being disabled that is more logical: it's
> not a NULL pointer because something went wrong, but instead the
> pointer doesn't matter and we know there is no tty.

Apologies for not having foreseen this in the review.
Arnd's solution looks good to me.

Re: [PATCH] tty: provide tty_name() even without CONFIG_TTY

2016-04-27 Thread Peter Hurley

On 04/27/2016 10:24 AM, Arnd Bergmann wrote:
> On Wednesday 27 April 2016 12:20:02 Paul Moore wrote:
>>> diff --git a/include/linux/tty.h b/include/linux/tty.h
>>> index 3b09f235db66..17b247c94440 100644
>>> --- a/include/linux/tty.h
>>> +++ b/include/linux/tty.h
>>> @@ -371,6 +371,7 @@ extern void proc_clear_tty(struct task_struct *p);
>>>  extern struct tty_struct *get_current_tty(void);
>>>  /* tty_io.c */
>>>  extern int __init tty_init(void);
>>> +extern const char *tty_name(const struct tty_struct *tty);
>>>  #else
>>>  static inline void console_init(void)
>>>  { }
>>> @@ -391,6 +392,8 @@ static inline struct tty_struct *get_current_tty(void)
>>>  /* tty_io.c */
>>>  static inline int __init tty_init(void)
>>>  { return 0; }
>>> +static inline const char *tty_name(const struct tty_struct *tty)
>>> +{ return "(none)"; }
>>>  #endif
>>
>> As it currently stands tty_name() returns "NULL tty" when the passed
>> tty_struct is NULL while this patch returns "(none)" in the case of
>> CONFIG_TTY=n; it seems like some consistency might be good, yes?  Or
>> do you think there is value in differentiating between the two cases?
>>
>> From an audit point of view, we would prefer if both were "(none)".
> 
> Right, I noticed that the audit code prints "(none)" here while the
> tty code prints "NULL tty", and that meant I could not make it behave
> the same way as all the existing code. I picked "(none)" because
> in case of CONFIG_TTY being disabled that is more logical: it's
> not a NULL pointer because something went wrong, but instead the
> pointer doesn't matter and we know there is no tty.

Apologies for not having foreseen this in the review.
Arnd's solution looks good to me.

Re: [PATCH V4] audit: add tty field to LOGIN event

2016-04-26 Thread Peter Hurley

On 04/26/2016 03:34 PM, Paul Moore wrote:
> On Fri, Apr 22, 2016 at 1:16 PM, Peter Hurley <pe...@hurleysoftware.com> 
> wrote:
>> On 04/21/2016 11:14 AM, Richard Guy Briggs wrote:
>>> diff --git a/include/linux/audit.h b/include/linux/audit.h
>>> index b40ed5d..32cdafb 100644
>>> --- a/include/linux/audit.h
>>> +++ b/include/linux/audit.h
>>> @@ -26,6 +26,7 @@
>>>  #include 
>>>  #include 
>>>  #include 
>>> +#include 
>>>
>>>  #define AUDIT_INO_UNSET ((unsigned long)-1)
>>>  #define AUDIT_DEV_UNSET ((dev_t)-1)
>>> @@ -343,6 +344,23 @@ static inline unsigned int audit_get_sessionid(struct 
>>> task_struct *tsk)
>>>   return tsk->sessionid;
>>>  }
>>>
>>> +static inline struct tty_struct *audit_get_tty(struct task_struct *tsk)
>>> +{
>>> + struct tty_struct *tty = NULL;
>>> + unsigned long flags;
>>> +
>>> + spin_lock_irqsave(>sighand->siglock, flags);
>>> + if (tsk->signal)
>>> + tty = tty_kref_get(tsk->signal->tty);
>>> + spin_unlock_irqrestore(>sighand->siglock, flags);
> 
> I just merged Richard's patch, if nothing else it is better than it
> was.  However, I would like to talk about improving things, see below.
> 
>> Not that I'm objecting because I get that you're just refactoring
>> existing code, but I thought I'd point out some stuff.
>>
>> 1. There's no need to check if signal_struct is NULL (ie. tsk->signal)
>>because if it is, this will blow up trying to dereference the
>>sighand_struct (ie tsk->sighand).
>>
>> 2. The existing usage is always tsk==current
> 
> Yep, there is only one caller I found that even works on task_structs
> other than current (see audit_log_exit() via audit_free()), although
> even then when it ends up calling into audit_log_task_info() tsk
> should always be current.
> 
> I've got a patch compiling now to get rid of passing around current as
> a a task_struct argument, assuming nothing blows up in testing I'll
> post/merge it.
> 
>> 3. If the idea is to make this invulnerable to tsk being gone, then
>>the usage is unsafe anyway.
> 
> I don't think that is our concern here.
> 
>> So ultimately (but not necessarily for this patch) I'd prefer that either
>> a. audit use existing tty api instead of open-coding, or
>> b. add any tty api functions required.
> 
> I'm open to suggestions, care to elaborate on either option?

So b) is only necessary if the answer to 3) was yes or if tsk != current.
Otherwise, the new audit_get_tty() is equivalent to get_current_tty()
which is the exported tty core interface for the identical operation.

I was only suggesting b) if get_current_tty() wasn't going to be
sufficient.

> Feel free to elaborate by patch too ;)

I can do that.

Regards,
Peter Hurley

Re: [PATCH V4] audit: add tty field to LOGIN event

2016-04-26 Thread Peter Hurley

On 04/26/2016 03:34 PM, Paul Moore wrote:
> On Fri, Apr 22, 2016 at 1:16 PM, Peter Hurley  
> wrote:
>> On 04/21/2016 11:14 AM, Richard Guy Briggs wrote:
>>> diff --git a/include/linux/audit.h b/include/linux/audit.h
>>> index b40ed5d..32cdafb 100644
>>> --- a/include/linux/audit.h
>>> +++ b/include/linux/audit.h
>>> @@ -26,6 +26,7 @@
>>>  #include 
>>>  #include 
>>>  #include 
>>> +#include 
>>>
>>>  #define AUDIT_INO_UNSET ((unsigned long)-1)
>>>  #define AUDIT_DEV_UNSET ((dev_t)-1)
>>> @@ -343,6 +344,23 @@ static inline unsigned int audit_get_sessionid(struct 
>>> task_struct *tsk)
>>>   return tsk->sessionid;
>>>  }
>>>
>>> +static inline struct tty_struct *audit_get_tty(struct task_struct *tsk)
>>> +{
>>> + struct tty_struct *tty = NULL;
>>> + unsigned long flags;
>>> +
>>> + spin_lock_irqsave(>sighand->siglock, flags);
>>> + if (tsk->signal)
>>> + tty = tty_kref_get(tsk->signal->tty);
>>> + spin_unlock_irqrestore(>sighand->siglock, flags);
> 
> I just merged Richard's patch, if nothing else it is better than it
> was.  However, I would like to talk about improving things, see below.
> 
>> Not that I'm objecting because I get that you're just refactoring
>> existing code, but I thought I'd point out some stuff.
>>
>> 1. There's no need to check if signal_struct is NULL (ie. tsk->signal)
>>because if it is, this will blow up trying to dereference the
>>sighand_struct (ie tsk->sighand).
>>
>> 2. The existing usage is always tsk==current
> 
> Yep, there is only one caller I found that even works on task_structs
> other than current (see audit_log_exit() via audit_free()), although
> even then when it ends up calling into audit_log_task_info() tsk
> should always be current.
> 
> I've got a patch compiling now to get rid of passing around current as
> a a task_struct argument, assuming nothing blows up in testing I'll
> post/merge it.
> 
>> 3. If the idea is to make this invulnerable to tsk being gone, then
>>the usage is unsafe anyway.
> 
> I don't think that is our concern here.
> 
>> So ultimately (but not necessarily for this patch) I'd prefer that either
>> a. audit use existing tty api instead of open-coding, or
>> b. add any tty api functions required.
> 
> I'm open to suggestions, care to elaborate on either option?

So b) is only necessary if the answer to 3) was yes or if tsk != current.
Otherwise, the new audit_get_tty() is equivalent to get_current_tty()
which is the exported tty core interface for the identical operation.

I was only suggesting b) if get_current_tty() wasn't going to be
sufficient.

> Feel free to elaborate by patch too ;)

I can do that.

Regards,
Peter Hurley

Re: [PATCH 1/1] workqueue: fix ghost PENDING flag while doing MQ IO

2016-04-26 Thread Peter Hurley

On 04/26/2016 04:15 AM, Roman Pen wrote:
> The bug in a workqueue leads to a stalled IO request in MQ ctx->rq_list
> with the following backtrace:
> 
> [  601.347452] INFO: task kworker/u129:5:1636 blocked for more than 120 
> seconds.
> [  601.347574]   Tainted: G   O4.4.5-1-storage+ #6
> [  601.347651] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
> this message.
> [  601.348142] kworker/u129:5  D 880803077988 0  1636  2 
> 0x
> [  601.348519] Workqueue: ibnbd_server_fileio_wq 
> ibnbd_dev_file_submit_io_worker [ibnbd_server]
> [  601.348999]  880803077988 88080466b900 8808033f9c80 
> 880803078000
> [  601.349662]  880807c95000 7fff 815b0920 
> 880803077ad0
> [  601.350333]  8808030779a0 815b01d5  
> 880803077a38
> [  601.350965] Call Trace:
> [  601.351203]  [] ? bit_wait+0x60/0x60
> [  601.351444]  [] schedule+0x35/0x80
> [  601.351709]  [] schedule_timeout+0x192/0x230
> [  601.351958]  [] ? blk_flush_plug_list+0xc7/0x220
> [  601.352208]  [] ? ktime_get+0x37/0xa0
> [  601.352446]  [] ? bit_wait+0x60/0x60
> [  601.352688]  [] io_schedule_timeout+0xa4/0x110
> [  601.352951]  [] ? _raw_spin_unlock_irqrestore+0xe/0x10
> [  601.353196]  [] bit_wait_io+0x1b/0x70
> [  601.353440]  [] __wait_on_bit+0x5d/0x90
> [  601.353689]  [] wait_on_page_bit+0xc0/0xd0
> [  601.353958]  [] ? autoremove_wake_function+0x40/0x40
> [  601.354200]  [] __filemap_fdatawait_range+0xe4/0x140
> [  601.354441]  [] filemap_fdatawait_range+0x14/0x30
> [  601.354688]  [] filemap_write_and_wait_range+0x3f/0x70
> [  601.354932]  [] blkdev_fsync+0x1b/0x50
> [  601.355193]  [] vfs_fsync_range+0x49/0xa0
> [  601.355432]  [] blkdev_write_iter+0xca/0x100
> [  601.355679]  [] __vfs_write+0xaa/0xe0
> [  601.355925]  [] vfs_write+0xa9/0x1a0
> [  601.356164]  [] kernel_write+0x38/0x50
> 
> The underlying device is a null_blk, with default parameters:
> 
>   queue_mode= MQ
>   submit_queues = 1
> 
> Verification that nullb0 has something inflight:
> 
> root@pserver8:~# cat /sys/block/nullb0/inflight
>01
> root@pserver8:~# find /sys/block/nullb0/mq/0/cpu* -name rq_list -print -exec 
> cat {} \;
> ...
> /sys/block/nullb0/mq/0/cpu2/rq_list
> CTX pending:
> 8838038e2400
> ...
> 
> During debug it became clear that stalled request is always inserted in
> the rq_list from the following path:
> 
>save_stack_trace_tsk + 34
>blk_mq_insert_requests + 231
>blk_mq_flush_plug_list + 281
>blk_flush_plug_list + 199
>wait_on_page_bit + 192
>__filemap_fdatawait_range + 228
>filemap_fdatawait_range + 20
>filemap_write_and_wait_range + 63
>blkdev_fsync + 27
>vfs_fsync_range + 73
>blkdev_write_iter + 202
>__vfs_write + 170
>vfs_write + 169
>kernel_write + 56
> 
> So blk_flush_plug_list() was called with from_schedule == true.
> 
> If from_schedule is true, that means that finally blk_mq_insert_requests()
> offloads execution of __blk_mq_run_hw_queue() and uses kblockd workqueue,
> i.e. it calls kblockd_schedule_delayed_work_on().
> 
> That means, that we race with another CPU, which is about to execute
> __blk_mq_run_hw_queue() work.
> 
> Further debugging shows the following traces from different CPUs:
> 
>   CPU#0  CPU#1
>   -- ---
>   reqeust A inserted
>   STORE hctx->ctx_map[0] bit marked
>   kblockd_schedule...() returns 1
>   
>  request B inserted
>  STORE hctx->ctx_map[1] bit marked
>  kblockd_schedule...() returns 0
>   *** WORK PENDING bit is cleared ***
>   flush_busy_ctxs() is executed, but
>   bit 1, set by CPU#1, is not observed
> 
> As a result request B pended forever.
> 
> This behaviour can be explained by speculative LOAD of hctx->ctx_map on
> CPU#0, which is reordered with clear of PENDING bit and executed _before_
> actual STORE of bit 1 on CPU#1.
> 
> The proper fix is an explicit full barrier , which guarantees
> that clear of PENDING bit is to be executed before all possible
> speculative LOADS or STORES inside actual work function.
> 
> Signed-off-by: Roman Pen 
> Cc: Gioh Kim 
> Cc: Michael Wang 
> Cc: Tejun Heo 
> Cc: Jens Axboe 
> Cc: linux-bl...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: sta...@vger.kernel.org
> ---
>  kernel/workqueue.c | 29 +
>  1 file changed, 29 insertions(+)
> 
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 450c21f..0ec0594 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -649,6 +649,35 @@ static void set_work_pool_and_clear_pending(struct 
> work_struct *work,
>*/
>   smp_wmb();
>

Re: [PATCH 1/1] workqueue: fix ghost PENDING flag while doing MQ IO

2016-04-26 Thread Peter Hurley

On 04/26/2016 04:15 AM, Roman Pen wrote:
> The bug in a workqueue leads to a stalled IO request in MQ ctx->rq_list
> with the following backtrace:
> 
> [  601.347452] INFO: task kworker/u129:5:1636 blocked for more than 120 
> seconds.
> [  601.347574]   Tainted: G   O4.4.5-1-storage+ #6
> [  601.347651] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
> this message.
> [  601.348142] kworker/u129:5  D 880803077988 0  1636  2 
> 0x
> [  601.348519] Workqueue: ibnbd_server_fileio_wq 
> ibnbd_dev_file_submit_io_worker [ibnbd_server]
> [  601.348999]  880803077988 88080466b900 8808033f9c80 
> 880803078000
> [  601.349662]  880807c95000 7fff 815b0920 
> 880803077ad0
> [  601.350333]  8808030779a0 815b01d5  
> 880803077a38
> [  601.350965] Call Trace:
> [  601.351203]  [] ? bit_wait+0x60/0x60
> [  601.351444]  [] schedule+0x35/0x80
> [  601.351709]  [] schedule_timeout+0x192/0x230
> [  601.351958]  [] ? blk_flush_plug_list+0xc7/0x220
> [  601.352208]  [] ? ktime_get+0x37/0xa0
> [  601.352446]  [] ? bit_wait+0x60/0x60
> [  601.352688]  [] io_schedule_timeout+0xa4/0x110
> [  601.352951]  [] ? _raw_spin_unlock_irqrestore+0xe/0x10
> [  601.353196]  [] bit_wait_io+0x1b/0x70
> [  601.353440]  [] __wait_on_bit+0x5d/0x90
> [  601.353689]  [] wait_on_page_bit+0xc0/0xd0
> [  601.353958]  [] ? autoremove_wake_function+0x40/0x40
> [  601.354200]  [] __filemap_fdatawait_range+0xe4/0x140
> [  601.354441]  [] filemap_fdatawait_range+0x14/0x30
> [  601.354688]  [] filemap_write_and_wait_range+0x3f/0x70
> [  601.354932]  [] blkdev_fsync+0x1b/0x50
> [  601.355193]  [] vfs_fsync_range+0x49/0xa0
> [  601.355432]  [] blkdev_write_iter+0xca/0x100
> [  601.355679]  [] __vfs_write+0xaa/0xe0
> [  601.355925]  [] vfs_write+0xa9/0x1a0
> [  601.356164]  [] kernel_write+0x38/0x50
> 
> The underlying device is a null_blk, with default parameters:
> 
>   queue_mode= MQ
>   submit_queues = 1
> 
> Verification that nullb0 has something inflight:
> 
> root@pserver8:~# cat /sys/block/nullb0/inflight
>01
> root@pserver8:~# find /sys/block/nullb0/mq/0/cpu* -name rq_list -print -exec 
> cat {} \;
> ...
> /sys/block/nullb0/mq/0/cpu2/rq_list
> CTX pending:
> 8838038e2400
> ...
> 
> During debug it became clear that stalled request is always inserted in
> the rq_list from the following path:
> 
>save_stack_trace_tsk + 34
>blk_mq_insert_requests + 231
>blk_mq_flush_plug_list + 281
>blk_flush_plug_list + 199
>wait_on_page_bit + 192
>__filemap_fdatawait_range + 228
>filemap_fdatawait_range + 20
>filemap_write_and_wait_range + 63
>blkdev_fsync + 27
>vfs_fsync_range + 73
>blkdev_write_iter + 202
>__vfs_write + 170
>vfs_write + 169
>kernel_write + 56
> 
> So blk_flush_plug_list() was called with from_schedule == true.
> 
> If from_schedule is true, that means that finally blk_mq_insert_requests()
> offloads execution of __blk_mq_run_hw_queue() and uses kblockd workqueue,
> i.e. it calls kblockd_schedule_delayed_work_on().
> 
> That means, that we race with another CPU, which is about to execute
> __blk_mq_run_hw_queue() work.
> 
> Further debugging shows the following traces from different CPUs:
> 
>   CPU#0  CPU#1
>   -- ---
>   reqeust A inserted
>   STORE hctx->ctx_map[0] bit marked
>   kblockd_schedule...() returns 1
>   
>  request B inserted
>  STORE hctx->ctx_map[1] bit marked
>  kblockd_schedule...() returns 0
>   *** WORK PENDING bit is cleared ***
>   flush_busy_ctxs() is executed, but
>   bit 1, set by CPU#1, is not observed
> 
> As a result request B pended forever.
> 
> This behaviour can be explained by speculative LOAD of hctx->ctx_map on
> CPU#0, which is reordered with clear of PENDING bit and executed _before_
> actual STORE of bit 1 on CPU#1.
> 
> The proper fix is an explicit full barrier , which guarantees
> that clear of PENDING bit is to be executed before all possible
> speculative LOADS or STORES inside actual work function.
> 
> Signed-off-by: Roman Pen 
> Cc: Gioh Kim 
> Cc: Michael Wang 
> Cc: Tejun Heo 
> Cc: Jens Axboe 
> Cc: linux-bl...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: sta...@vger.kernel.org
> ---
>  kernel/workqueue.c | 29 +
>  1 file changed, 29 insertions(+)
> 
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 450c21f..0ec0594 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -649,6 +649,35 @@ static void set_work_pool_and_clear_pending(struct 
> work_struct *work,
>*/
>   smp_wmb();
>   set_work_data(work, (unsigned long)pool_id << WORK_OFFQ_POOL_SHIFT, 0);
> + /*
> +  * The following mb guarantees

Re: [PATCH 1/1] [RFC] workqueue: fix ghost PENDING flag while doing MQ IO

2016-04-26 Thread Peter Hurley

On 04/26/2016 10:45 AM, Tejun Heo wrote:
> As long as what's happening is clearly documented, I think either is
> fine.  I'm gonna go with Roman's mb patch for -stable fix but think
> it'd be nice to have a separate patch to consolidate the paths which
> clear PENDING and make them use xchg.  If you can spin up a patch for
> that, I'd be happy to apply it to wq/for-3.7.

Ok.

Re: [PATCH 1/1] [RFC] workqueue: fix ghost PENDING flag while doing MQ IO

2016-04-26 Thread Peter Hurley

On 04/26/2016 10:45 AM, Tejun Heo wrote:
> As long as what's happening is clearly documented, I think either is
> fine.  I'm gonna go with Roman's mb patch for -stable fix but think
> it'd be nice to have a separate patch to consolidate the paths which
> clear PENDING and make them use xchg.  If you can spin up a patch for
> that, I'd be happy to apply it to wq/for-3.7.

Ok.

Re: [PATCH 1/1] [RFC] workqueue: fix ghost PENDING flag while doing MQ IO

2016-04-26 Thread Peter Hurley

Hi Tejun,

On 04/26/2016 08:15 AM, Tejun Heo wrote:
> Hello, Peter.
> 
> On Mon, Apr 25, 2016 at 06:22:01PM -0700, Peter Hurley wrote:
>> This is the same bug I wrote about 2 yrs ago (but with the wrong fix).
>>
>> http://lkml.iu.edu/hypermail/linux/kernel/1402.2/04697.html
>>
>> Unfortunately I didn't have a reproducer at all :/
> 
> Ah, bummer.
> 
>> The atomic_long_xchg() patch has several benefits over the naked barrier:
>>
>> 1. set_work_pool_and_clear_pending() has the same requirements as
>>clear_work_data(); note that both require write barrier before and
>>full barrier after.
> 
> clear_work_data() is only used by __cancel_work_timer() and there's no
> following execution or anything where rescheduling memory loads can
> cause any issue.

I meant this block:

clear_work_data(work);

/*
 * Paired with prepare_to_wait() above so that either
 * waitqueue_active() is visible here or !work_is_canceling() is
 * visible there.
 */
smp_mb();

>> 2. xchg() et al imply full barrier before and full barrier after.
>>
>> 3. The naked barriers could be removed, while improving efficiency.
>>On x86, mov + mfence => xchg
> 
> It's unlikely to make any measureable difference.  Is xchg() actually
> cheaper than store + rmb?

store + mfence (full barrier), yes. Roughly 2x faster.

https://lkml.org/lkml/2015/11/2/607

>> 4. Maybe fixes other hidden bugs.
>>For example, I'm wondering if reordering with set_work_pwq/list_add_tail
>>would be a problem; ie., what if work is visible on the worklist _before_
>>data is initialized by set_work_pwq()?
> 
> Worklist is always accessed under the pool lock.

I see a lot of list_empty() checks while not holding pool lock, but I get
what you mean: that work visible on the worklist isn't actually inspected
without holding the pool lock.

> The barrier comes
> into play only because we're using bare PENDING bit for
> synchronization.

I see that it's only transitions of PENDING that need full barriers but it's
not clear from the semantics the conditions under which pending is already
set. If performance isn't the concern there, maybe those uses (where PENDING
is expected to already be set) should check and warn?

> I'm not necessarily against making all clearings of
> PENDING to be followed by a rmb or use xhcg.  Reasons 2-4 are pretty
> weak tho.

I agree 2 and 3 are not the best reasons.
Actually, it looks that I'm in the minority anyway, and that style-wise,
naked barrier is preferred.

Regards,
Peter Hurley

Re: [PATCH 1/1] [RFC] workqueue: fix ghost PENDING flag while doing MQ IO

2016-04-26 Thread Peter Hurley

Hi Tejun,

On 04/26/2016 08:15 AM, Tejun Heo wrote:
> Hello, Peter.
> 
> On Mon, Apr 25, 2016 at 06:22:01PM -0700, Peter Hurley wrote:
>> This is the same bug I wrote about 2 yrs ago (but with the wrong fix).
>>
>> http://lkml.iu.edu/hypermail/linux/kernel/1402.2/04697.html
>>
>> Unfortunately I didn't have a reproducer at all :/
> 
> Ah, bummer.
> 
>> The atomic_long_xchg() patch has several benefits over the naked barrier:
>>
>> 1. set_work_pool_and_clear_pending() has the same requirements as
>>clear_work_data(); note that both require write barrier before and
>>full barrier after.
> 
> clear_work_data() is only used by __cancel_work_timer() and there's no
> following execution or anything where rescheduling memory loads can
> cause any issue.

I meant this block:

clear_work_data(work);

/*
 * Paired with prepare_to_wait() above so that either
 * waitqueue_active() is visible here or !work_is_canceling() is
 * visible there.
 */
smp_mb();

>> 2. xchg() et al imply full barrier before and full barrier after.
>>
>> 3. The naked barriers could be removed, while improving efficiency.
>>On x86, mov + mfence => xchg
> 
> It's unlikely to make any measureable difference.  Is xchg() actually
> cheaper than store + rmb?

store + mfence (full barrier), yes. Roughly 2x faster.

https://lkml.org/lkml/2015/11/2/607

>> 4. Maybe fixes other hidden bugs.
>>For example, I'm wondering if reordering with set_work_pwq/list_add_tail
>>would be a problem; ie., what if work is visible on the worklist _before_
>>data is initialized by set_work_pwq()?
> 
> Worklist is always accessed under the pool lock.

I see a lot of list_empty() checks while not holding pool lock, but I get
what you mean: that work visible on the worklist isn't actually inspected
without holding the pool lock.

> The barrier comes
> into play only because we're using bare PENDING bit for
> synchronization.

I see that it's only transitions of PENDING that need full barriers but it's
not clear from the semantics the conditions under which pending is already
set. If performance isn't the concern there, maybe those uses (where PENDING
is expected to already be set) should check and warn?

> I'm not necessarily against making all clearings of
> PENDING to be followed by a rmb or use xhcg.  Reasons 2-4 are pretty
> weak tho.

I agree 2 and 3 are not the best reasons.
Actually, it looks that I'm in the minority anyway, and that style-wise,
naked barrier is preferred.

Regards,
Peter Hurley

Re: [PATCH 1/1] [RFC] workqueue: fix ghost PENDING flag while doing MQ IO

2016-04-25 Thread Peter Hurley

On 04/25/2016 08:48 AM, Tejun Heo wrote:
> Hello, Roman.
> 
> On Mon, Apr 25, 2016 at 05:22:51PM +0200, Roman Pen wrote:
> ...
>>   CPU#6CPU#2
>>   reqeust 884000343600 inserted
>>   hctx marked as pended
>>   kblockd_schedule...() returns 1
>>   
>>   *** WORK_STRUCT_PENDING_BIT is cleared ***
>>   flush_busy_ctxs() is executed
>>reqeust 884000343cc0 inserted
>>hctx marked as pended
>>kblockd_schedule...() returns 0
>>
>>WTF?
>>
>> As a result 884000343cc0 request pended forever.
>>
>> According to the trace output I see that another CPU _always_ observes
>> WORK_STRUCT_PENDING_BIT as set for that hctx->run_work, even it was
>> cleared on another CPU.
>>
>> Checking the workqueue.c code I see that clearing the bit is nothing
>> more, but atomic_long_set(), which is  instruction. This
>> function:
>>
>>   static inline void set_work_data()
>>
>> In attempt to "fix" the mystery I replaced atomic_long_set() call with
>> atomic_long_xchg(), which is  instruction.
>>
>> The problem has gone.
>>
>> For me it looks that test_and_set_bit() ( instruction) does
>> not require flush of all CPU caches, which can be dirty after executing
>> of  on another CPU.  But  really updates the memory and the
>> following execution of  observes that bit was cleared.
>>
>> As a conculusion I can say, that I am lucky enough and can reproduce
>> this bug in several minutes on a specific load (I tried many other
>> simple loads using fio, even using btrecord/btreplay, no success).
>> And that easy reproduction on a specific load gives me some freedom
>> to test and then to be sure, that problem has gone.
> 
> Heh, excellent debugging.  I wonder how old this bug is.

This is the same bug I wrote about 2 yrs ago (but with the wrong fix).

http://lkml.iu.edu/hypermail/linux/kernel/1402.2/04697.html

Unfortunately I didn't have a reproducer at all :/


> cc'ing David
> Howells who ISTR to have reported a similar issue.  The root problem
> here, I think, is that PENDING is used to synchronize between
> different queueing instances but we don't have proper memory barrier
> after it.
> 
>   A   B
> 
>   queue (test_and_sets PENDING)
>   dispatch (clears PENDING)
>   execute queue (test_and_sets PENDING)
> 
> So, for B, the guarantee must be that either A starts executing after
> B's test_and_set or B's test_and_set succeeds; however, as we don't
> have any memory barrier between dispatch and execute, there's nothing
> preventing the processor from scheduling some memory fetch operations
> from the execute stage before the clearing of PENDING - ie. A might
> not see what B has done prior to queue even if B's test_and_set fails
> indicating that A should.  Can you please test whether the following
> patch fixes the issue?
> 
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 2232ae3..8ec2b5e 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -666,6 +666,7 @@ static void set_work_pool_and_clear_pending(struct 
> work_struct *work,
>*/
>   smp_wmb();
>   set_work_data(work, (unsigned long)pool_id << WORK_OFFQ_POOL_SHIFT, 0);
> + smp_mb();
>  }

The atomic_long_xchg() patch has several benefits over the naked barrier:

1. set_work_pool_and_clear_pending() has the same requirements as
   clear_work_data(); note that both require write barrier before and
   full barrier after.

2. xchg() et al imply full barrier before and full barrier after.

3. The naked barriers could be removed, while improving efficiency.
   On x86, mov + mfence => xchg

4. Maybe fixes other hidden bugs.
   For example, I'm wondering if reordering with set_work_pwq/list_add_tail
   would be a problem; ie., what if work is visible on the worklist _before_
   data is initialized by set_work_pwq()?

Regards,
Peter Hurley

Re: [PATCH 1/1] [RFC] workqueue: fix ghost PENDING flag while doing MQ IO

2016-04-25 Thread Peter Hurley

On 04/25/2016 08:48 AM, Tejun Heo wrote:
> Hello, Roman.
> 
> On Mon, Apr 25, 2016 at 05:22:51PM +0200, Roman Pen wrote:
> ...
>>   CPU#6CPU#2
>>   reqeust 884000343600 inserted
>>   hctx marked as pended
>>   kblockd_schedule...() returns 1
>>   
>>   *** WORK_STRUCT_PENDING_BIT is cleared ***
>>   flush_busy_ctxs() is executed
>>reqeust 884000343cc0 inserted
>>hctx marked as pended
>>kblockd_schedule...() returns 0
>>
>>WTF?
>>
>> As a result 884000343cc0 request pended forever.
>>
>> According to the trace output I see that another CPU _always_ observes
>> WORK_STRUCT_PENDING_BIT as set for that hctx->run_work, even it was
>> cleared on another CPU.
>>
>> Checking the workqueue.c code I see that clearing the bit is nothing
>> more, but atomic_long_set(), which is  instruction. This
>> function:
>>
>>   static inline void set_work_data()
>>
>> In attempt to "fix" the mystery I replaced atomic_long_set() call with
>> atomic_long_xchg(), which is  instruction.
>>
>> The problem has gone.
>>
>> For me it looks that test_and_set_bit() ( instruction) does
>> not require flush of all CPU caches, which can be dirty after executing
>> of  on another CPU.  But  really updates the memory and the
>> following execution of  observes that bit was cleared.
>>
>> As a conculusion I can say, that I am lucky enough and can reproduce
>> this bug in several minutes on a specific load (I tried many other
>> simple loads using fio, even using btrecord/btreplay, no success).
>> And that easy reproduction on a specific load gives me some freedom
>> to test and then to be sure, that problem has gone.
> 
> Heh, excellent debugging.  I wonder how old this bug is.

This is the same bug I wrote about 2 yrs ago (but with the wrong fix).

http://lkml.iu.edu/hypermail/linux/kernel/1402.2/04697.html

Unfortunately I didn't have a reproducer at all :/


> cc'ing David
> Howells who ISTR to have reported a similar issue.  The root problem
> here, I think, is that PENDING is used to synchronize between
> different queueing instances but we don't have proper memory barrier
> after it.
> 
>   A   B
> 
>   queue (test_and_sets PENDING)
>   dispatch (clears PENDING)
>   execute queue (test_and_sets PENDING)
> 
> So, for B, the guarantee must be that either A starts executing after
> B's test_and_set or B's test_and_set succeeds; however, as we don't
> have any memory barrier between dispatch and execute, there's nothing
> preventing the processor from scheduling some memory fetch operations
> from the execute stage before the clearing of PENDING - ie. A might
> not see what B has done prior to queue even if B's test_and_set fails
> indicating that A should.  Can you please test whether the following
> patch fixes the issue?
> 
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 2232ae3..8ec2b5e 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -666,6 +666,7 @@ static void set_work_pool_and_clear_pending(struct 
> work_struct *work,
>*/
>   smp_wmb();
>   set_work_data(work, (unsigned long)pool_id << WORK_OFFQ_POOL_SHIFT, 0);
> + smp_mb();
>  }

The atomic_long_xchg() patch has several benefits over the naked barrier:

1. set_work_pool_and_clear_pending() has the same requirements as
   clear_work_data(); note that both require write barrier before and
   full barrier after.

2. xchg() et al imply full barrier before and full barrier after.

3. The naked barriers could be removed, while improving efficiency.
   On x86, mov + mfence => xchg

4. Maybe fixes other hidden bugs.
   For example, I'm wondering if reordering with set_work_pwq/list_add_tail
   would be a problem; ie., what if work is visible on the worklist _before_
   data is initialized by set_work_pwq()?

Regards,
Peter Hurley

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 4882 matches

Mail list logo