Re: [PATCH] qla2xxx: rewrite code to avoid hitting gcc bug 70646

2016-04-15 Thread James Bottomley
On Fri, 2016-04-15 at 23:09 +0200, Denys Vlasenko wrote:
> On 04/15/2016 09:05 PM, James Bottomley wrote:
> > On Fri, 2016-04-15 at 20:56 +0200, Denys Vlasenko wrote:
> > > On 04/15/2016 04:40 PM, James Bottomley wrote:
> > > > On Fri, 2016-04-15 at 12:36 +0200, Denys Vlasenko wrote:
> > > > > More info here:
> > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646
> > > > 
> > > > This bug is under investigation, so I'd rather not alter code
> > > > for a
> > > > gcc
> > > > bug until we know if we can supply options to fix it rather
> > > > than
> > > > changing code.
> > > 
> > > 
> > > Background. The bug exists in gcc for 2 years, but it is rather
> > > hard to trigger, so nobody noticed.
> > 
> > We know this ... linux-scsi is on the cc for the other thread on
> > this.
> > 
> > > Unfortunately for kernel, these two commits landed in Linus tree
> > > in March 16 and 17:
> > > 
> > > 
> > > On 04/13/2016 05:36 AM, Josh Poimboeuf wrote:
> > > > It occurs with the combination of the following two recent
> > > > commits:
> > > > 
> > > > - bc27fb68aaad ("include/uapi/linux/byteorder, swab: force
> > > > inlining
> > > > of some byteswap operations")
> > > > - ef3fb2422ffe ("scsi: fc: use get/put_unaligned64 for wwn
> > > > access")
> > > 
> > > 
> > > and now *many* users of qla2x00 and new-ish gcc are going to
> > > very much notice it, as their kernels will start crashing
> > > reliably.
> > > 
> > > The commits can be reverted, sure, but they per se do not contain
> > > anything unusual. They, together with not very typical construct
> > > in qla2x00_get_host_fabric_name, one
> > > which boils down to "swab64p(constant_array_of_8_bytes)",
> > > just happen to nudge gcc in a right way to finally trigger the
> > > bug.
> > > 
> > > So I came with another idea how to forestall the imminent deluge
> > > of
> > > qla2x00 oops reports - this patch.
> > 
> > There are actually a raft of checkers that run the upstream code
> > which
> > aren't seeing any problem; likely because the code is harder to
> > trigger
> > than you think.  So, lets wait until the resolution of the other
> > thread
> > before we panic, especially since we're only at -rc3.
> 
> I'm not panicking, James.
> 
> By sending a workaround, I just want to make sure that *other people*
> won't be forced to fix up a problem which surfaced because of *my* 
> patch.

Look, if gcc really proves to be intractable, I think what should
happen is revert the triggering patch, which is

commit e3bde9568d992c5f985e6e30731a5f9f9bef7b13
Author: Denys Vlasenko 
Date:   Thu Mar 17 14:22:47 2016 -0700

include/linux/unaligned: force inlining of byteswap operations

But, as I've said a couple of times now, there are no bug reports from
the testers about qla2xxx (yet) so we can afford to wait a bit and see
if there's some other resolution that doesn't involve changing kernel
code to work around a local gcc bug.

James

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] qla2xxx: rewrite code to avoid hitting gcc bug 70646

2016-04-15 Thread James Bottomley
On Fri, 2016-04-15 at 15:02 -0500, Josh Poimboeuf wrote:
> On Fri, Apr 15, 2016 at 12:05:26PM -0700, James Bottomley wrote:
> > On Fri, 2016-04-15 at 20:56 +0200, Denys Vlasenko wrote:
> > > and now *many* users of qla2x00 and new-ish gcc are going to
> > > very much notice it, as their kernels will start crashing
> > > reliably.
> > > 
> > > The commits can be reverted, sure, but they per se do not contain
> > > anything unusual. They, together with not very typical construct
> > > in qla2x00_get_host_fabric_name, one
> > > which boils down to "swab64p(constant_array_of_8_bytes)",
> > > just happen to nudge gcc in a right way to finally trigger the
> > > bug.
> > > 
> > > So I came with another idea how to forestall the imminent deluge
> > > of
> > > qla2x00 oops reports - this patch.
> > 
> > There are actually a raft of checkers that run the upstream code
> > which
> > aren't seeing any problem; likely because the code is harder to
> > trigger
> > than you think.  So, lets wait until the resolution of the other
> > thread
> > before we panic, especially since we're only at -rc3.
> 
> Regardless of the outcome of the gcc bug, it seems kind of silly to
> byteswap a constant value of 0x.
> 
>   uint8_t node_name[WWN_SIZE] = { 0xFF, 0xFF, 0xFF, 0xFF, \
>   0xFF, 0xFF, 0xFF, 0xFF};
>   u64 fabric_name = wwn_to_u64(node_name);
> 
> Similar to what Denys suggested, it can just be:
> 
>   u64 fabric_name = -1;
> or
>   u64 fabric_name = 0x;
> 
> Wouldn't that be an improvement to the code regardless?

"Improvement" would be in the eye of the beholder.  Semantically it
would be wrong because we're initialising a CPU local representation of
a big endian structure, so we *should* use the conversion.

James

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] qla2xxx: rewrite code to avoid hitting gcc bug 70646

2016-04-15 Thread Denys Vlasenko
On 04/15/2016 09:05 PM, James Bottomley wrote:
> On Fri, 2016-04-15 at 20:56 +0200, Denys Vlasenko wrote:
>> On 04/15/2016 04:40 PM, James Bottomley wrote:
>>> On Fri, 2016-04-15 at 12:36 +0200, Denys Vlasenko wrote:
 More info here:
 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646
>>>
>>> This bug is under investigation, so I'd rather not alter code for a
>>> gcc
>>> bug until we know if we can supply options to fix it rather than
>>> changing code.
>>
>>
>> Background. The bug exists in gcc for 2 years, but it is rather
>> hard to trigger, so nobody noticed.
> 
> We know this ... linux-scsi is on the cc for the other thread on this.
> 
>> Unfortunately for kernel, these two commits landed in Linus tree
>> in March 16 and 17:
>>
>>
>> On 04/13/2016 05:36 AM, Josh Poimboeuf wrote:
>>> It occurs with the combination of the following two recent commits:
>>>
>>> - bc27fb68aaad ("include/uapi/linux/byteorder, swab: force inlining
>>> of some byteswap operations")
>>> - ef3fb2422ffe ("scsi: fc: use get/put_unaligned64 for wwn access")
>>
>>
>> and now *many* users of qla2x00 and new-ish gcc are going to
>> very much notice it, as their kernels will start crashing reliably.
>>
>> The commits can be reverted, sure, but they per se do not contain
>> anything unusual. They, together with not very typical construct
>> in qla2x00_get_host_fabric_name, one
>> which boils down to "swab64p(constant_array_of_8_bytes)",
>> just happen to nudge gcc in a right way to finally trigger the bug.
>>
>> So I came with another idea how to forestall the imminent deluge of
>> qla2x00 oops reports - this patch.
> 
> There are actually a raft of checkers that run the upstream code which
> aren't seeing any problem; likely because the code is harder to trigger
> than you think.  So, lets wait until the resolution of the other thread
> before we panic, especially since we're only at -rc3.

I'm not panicking, James.

By sending a workaround, I just want to make sure that *other people*
won't be forced to fix up a problem which surfaced because of *my* patch.


I'm afraid "harder to trigger than you think" is not true.
It is nearly trivial to trigger it now.
I just tried the following on a freshly installed Fedora 21 machine:

$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
$ cd linux
$ make defconfig
$ sed '/SCSI_FC_ATTRS/d;/SCSI_LOWLEVEL/d' -i .config
$ make oldconfig# answer "yes" to everything
$ nice make -j22
$ objdump -dr drivers/scsi/qla2xxx/qla_attr.o | grep -A10 
qla2x00_get_host_fabric_name

1540 :
1540:   55  push   %rbp
1541:   48 89 e5mov%rsp,%rbp
1544:   66 66 66 2e 0f 1f 84data32 data32 nopw %cs:0x0(%rax,%rax,1)
154b:   00 00 00 00 00

1550 :
1550:   55  push   %rbp
1551:   48 89 e5mov%rsp,%rbp
1554:   53  push   %rbx
1555:   48 89 d3mov%rdx,%rbx

See?
I'm sure Fedora 22, 23 and 24 will also do that.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] qla2xxx: rewrite code to avoid hitting gcc bug 70646

2016-04-15 Thread Josh Poimboeuf
On Fri, Apr 15, 2016 at 12:05:26PM -0700, James Bottomley wrote:
> On Fri, 2016-04-15 at 20:56 +0200, Denys Vlasenko wrote:
> > and now *many* users of qla2x00 and new-ish gcc are going to
> > very much notice it, as their kernels will start crashing reliably.
> > 
> > The commits can be reverted, sure, but they per se do not contain
> > anything unusual. They, together with not very typical construct
> > in qla2x00_get_host_fabric_name, one
> > which boils down to "swab64p(constant_array_of_8_bytes)",
> > just happen to nudge gcc in a right way to finally trigger the bug.
> > 
> > So I came with another idea how to forestall the imminent deluge of
> > qla2x00 oops reports - this patch.
> 
> There are actually a raft of checkers that run the upstream code which
> aren't seeing any problem; likely because the code is harder to trigger
> than you think.  So, lets wait until the resolution of the other thread
> before we panic, especially since we're only at -rc3.

Regardless of the outcome of the gcc bug, it seems kind of silly to
byteswap a constant value of 0x.

uint8_t node_name[WWN_SIZE] = { 0xFF, 0xFF, 0xFF, 0xFF, \
0xFF, 0xFF, 0xFF, 0xFF};
u64 fabric_name = wwn_to_u64(node_name);

Similar to what Denys suggested, it can just be:

u64 fabric_name = -1;
or
u64 fabric_name = 0x;

Wouldn't that be an improvement to the code regardless?

-- 
Josh
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] qla2xxx: rewrite code to avoid hitting gcc bug 70646

2016-04-15 Thread James Bottomley
On Fri, 2016-04-15 at 20:56 +0200, Denys Vlasenko wrote:
> On 04/15/2016 04:40 PM, James Bottomley wrote:
> > On Fri, 2016-04-15 at 12:36 +0200, Denys Vlasenko wrote:
> > > More info here:
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646
> > 
> > This bug is under investigation, so I'd rather not alter code for a
> > gcc
> > bug until we know if we can supply options to fix it rather than
> > changing code.
> 
> 
> Background. The bug exists in gcc for 2 years, but it is rather
> hard to trigger, so nobody noticed.

We know this ... linux-scsi is on the cc for the other thread on this.

> Unfortunately for kernel, these two commits landed in Linus tree
> in March 16 and 17:
> 
> 
> On 04/13/2016 05:36 AM, Josh Poimboeuf wrote:
> > It occurs with the combination of the following two recent commits:
> > 
> > - bc27fb68aaad ("include/uapi/linux/byteorder, swab: force inlining
> > of some byteswap operations")
> > - ef3fb2422ffe ("scsi: fc: use get/put_unaligned64 for wwn access")
> 
> 
> and now *many* users of qla2x00 and new-ish gcc are going to
> very much notice it, as their kernels will start crashing reliably.
> 
> The commits can be reverted, sure, but they per se do not contain
> anything unusual. They, together with not very typical construct
> in qla2x00_get_host_fabric_name, one
> which boils down to "swab64p(constant_array_of_8_bytes)",
> just happen to nudge gcc in a right way to finally trigger the bug.
> 
> So I came with another idea how to forestall the imminent deluge of
> qla2x00 oops reports - this patch.

There are actually a raft of checkers that run the upstream code which
aren't seeing any problem; likely because the code is harder to trigger
than you think.  So, lets wait until the resolution of the other thread
before we panic, especially since we're only at -rc3.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] qla2xxx: rewrite code to avoid hitting gcc bug 70646

2016-04-15 Thread Denys Vlasenko
On 04/15/2016 04:40 PM, James Bottomley wrote:
> On Fri, 2016-04-15 at 12:36 +0200, Denys Vlasenko wrote:
>> More info here:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646
> 
> This bug is under investigation, so I'd rather not alter code for a gcc
> bug until we know if we can supply options to fix it rather than
> changing code.


Background. The bug exists in gcc for 2 years, but it is rather
hard to trigger, so nobody noticed.

Unfortunately for kernel, these two commits landed in Linus tree
in March 16 and 17:


On 04/13/2016 05:36 AM, Josh Poimboeuf wrote:
> It occurs with the combination of the following two recent commits:
>
> - bc27fb68aaad ("include/uapi/linux/byteorder, swab: force inlining of some 
> byteswap operations")
> - ef3fb2422ffe ("scsi: fc: use get/put_unaligned64 for wwn access")


and now *many* users of qla2x00 and new-ish gcc are going to
very much notice it, as their kernels will start crashing reliably.

The commits can be reverted, sure, but they per se do not contain
anything unusual. They, together with not very typical construct
in qla2x00_get_host_fabric_name, one
which boils down to "swab64p(constant_array_of_8_bytes)",
just happen to nudge gcc in a right way to finally trigger the bug.

So I came with another idea how to forestall the imminent deluge of
qla2x00 oops reports - this patch.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] qla2xxx: rewrite code to avoid hitting gcc bug 70646

2016-04-15 Thread James Bottomley
On Fri, 2016-04-15 at 12:36 +0200, Denys Vlasenko wrote:
> More info here:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646

This bug is under investigation, so I'd rather not alter code for a gcc
bug until we know if we can supply options to fix it rather than
changing code.

James


> Signed-off-by: Denys Vlasenko 
> CC: Himanshu Madhani 
> CC: James Bottomley 
> CC: qla2xxx-upstr...@qlogic.com
> CC: Josh Poimboeuf 
> CC: linux-scsi@vger.kernel.org
> CC: linux-ker...@vger.kernel.org
> ---
>  drivers/scsi/qla2xxx/qla_attr.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_attr.c
> b/drivers/scsi/qla2xxx/qla_attr.c
> index 4dc06a13..2dd9c72 100644
> --- a/drivers/scsi/qla2xxx/qla_attr.c
> +++ b/drivers/scsi/qla2xxx/qla_attr.c
> @@ -2003,9 +2003,14 @@ static void
>  qla2x00_get_host_fabric_name(struct Scsi_Host *shost)
>  {
>   scsi_qla_host_t *vha = shost_priv(shost);
> + /*
> +  * This can trigger gcc 4.9/5.3 bug.
> +  * See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646
>   uint8_t node_name[WWN_SIZE] = { 0xFF, 0xFF, 0xFF, 0xFF, \
>   0xFF, 0xFF, 0xFF, 0xFF};
>   u64 fabric_name = wwn_to_u64(node_name);
> +  */
> + u64 fabric_name = (u64)(s64)-1; /* the same as above */
>  
>   if (vha->device_flags & SWITCH_FOUND)
>   fabric_name = wwn_to_u64(vha->fabric_node_name);

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html