Re: Regression 4.17-rc1: SSD doesn’t properly resume causing system hang (NULL pointer dereference)

2018-04-25 Thread Paul Menzel

Dear Bart,


On 04/25/18 14:26, Bart Van Assche wrote:

On Wed, 2018-04-25 at 07:37 +0200, Paul Menzel wrote:

Am 24.04.2018 um 23:17 schrieb Bart Van Assche:

On Tue, 2018-04-24 at 23:04 +0200, Paul Menzel wrote:

I applied your change, and rebuilt the Linux kernel. Unfortunately, it
looks like it didn’t make a difference.


In that case I don't know what is causing the failure. Can you run a bisect
to determine which commit introduced this regression?


With `scsi_mod.use_blk_mq=n` the system resumes fine, so, for reasons
unknown to me, that Kconfig option got selected in my Linux kernel
configuration. I remember having similar issues when this was enabled by
default in Linux 4.13-rc?, so it was just a configuration problem and
not a regression. Unfortunately, the Linux configuration files are not
under version control, so I cannot check, but it is probably my fault.

Sorry for the noise, and please tell me what I can do to get the option
working on this old device.
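
A minimal sketch of how the effective scsi-mq setting could be worked out from a kernel configuration and a kernel command line. The option name `CONFIG_SCSI_MQ_DEFAULT` and the parameter `scsi_mod.use_blk_mq` are taken from this thread; the function name and the idea of feeding it the text of `/proc/cmdline` and of a `/boot/config-*` file are assumptions for illustration only.

```python
import re


def scsi_mq_enabled(config_text, cmdline):
    """Return True/False for the effective scsi-mq setting, or None if unknown.

    Hypothetical helper: config_text is the content of a kernel .config,
    cmdline is the kernel command line as a single string.
    """
    # A boot parameter, if present, overrides the compile-time default.
    # If it is given more than once, the last occurrence is used here.
    values = re.findall(r"(?:^|\s)scsi_mod\.use_blk_mq=(\S+)", cmdline)
    if values:
        value = values[-1]
        if value in ("y", "Y", "1"):
            return True
        if value in ("n", "N", "0"):
            return False
        return None  # unrecognized value, do not guess

    # Otherwise fall back to the Kconfig default.
    for line in config_text.splitlines():
        line = line.strip()
        if line == "CONFIG_SCSI_MQ_DEFAULT=y":
            return True
        if line == "# CONFIG_SCSI_MQ_DEFAULT is not set":
            return False
    return None
```

On a running system the same information should also be readable from `/sys/module/scsi_mod/parameters/use_blk_mq`, assuming the kernel exposes that parameter.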



Did the same system boot fine with a previous kernel with scsi-mq enabled?


No, as far as I know it never worked, see thread *[Regression 4.13-rc1] 
Resume does not work on Lenovo X60t* [1].



Anyway, we would like to know the root cause so that this NULL
pointer dereference can be fixed. There are plans to remove the
legacy block layer in the not too distant future.


I’ll be happy to test proposed changes.


Kind regards,

Paul


PS: Your mailer also changed *doesn’t* to *doesn* in the subject line.


[1] https://www.spinics.net/lists/linux-scsi/msg111457.html





Re: Regression 4.17-rc1: SSD doesn properly resume causing system hang (NULL pointer dereference)

2018-04-24 Thread Paul Menzel

Dear Bart,


Am 24.04.2018 um 23:17 schrieb Bart Van Assche:

On Tue, 2018-04-24 at 23:04 +0200, Paul Menzel wrote:

I applied your change, and rebuilt the Linux kernel. Unfortunately, it
looks like it didn’t make a difference.


In that case I don't know what is causing the failure. Can you run a bisect
to determine which commit introduced this regression?


With `scsi_mod.use_blk_mq=n` the system resumes fine, so, for reasons
unknown to me, that Kconfig option got selected in my Linux kernel
configuration. I remember having similar issues when this was enabled by
default in Linux 4.13-rc?, so it was just a configuration problem and
not a regression. Unfortunately, the Linux configuration files are not
under version control, so I cannot check, but it is probably my fault.


Sorry for the noise, and please tell me what I can do to get the option
working on this old device.



Kind regards,

Paul


Re: Regression 4.17-rc1: SSD doesn properly resume causing system hang (NULL pointer dereference)

2018-04-24 Thread Paul Menzel

Dear Bart,


On 04/24/18 19:31, Bart Van Assche wrote:

On Tue, 2018-04-24 at 19:10 +0200, Paul Menzel wrote:

Please find the configuration file attached. The log only has
`initcall_debug no_console_suspend` added.


What I was looking for in the .config is the following:
CONFIG_SCSI_MQ_DEFAULT=y

Can you also provide the disassembly output for blk_set_runtime_active,
e.g. by loading vmlinux into gdb and by running the command "disas
blk_set_runtime_active"?
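
A small sketch of how that disassembly could be produced non-interactively, so that the output can be saved to a file and attached unwrapped. It relies on gdb's `-batch` and `-ex` options; the helper names and the default `vmlinux` path are assumptions, and the image has to be the one the failing kernel was built from.

```python
import subprocess


def gdb_disas_argv(symbol, vmlinux="vmlinux"):
    """Build the gdb command line that disassembles one function and exits."""
    return ["gdb", "-batch", "-ex", f"disas {symbol}", vmlinux]


def disassemble(symbol, vmlinux="vmlinux"):
    """Run gdb and return the disassembly as text (requires gdb to be installed)."""
    result = subprocess.run(
        gdb_disas_argv(symbol, vmlinux),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, `disassemble("blk_set_runtime_active")` would return the dump as a single string that can be written to a text file.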


Here it is, pasted as a quotation, as otherwise Thunderbird would wrap the
lines.



(gdb) disas blk_set_runtime_active
Dump of assembler code for function blk_set_runtime_active:
   0xc1518610 <+0>:   call   0xc106ac9c <__fentry__>
   0xc1518615 <+5>:   push   %ebp
   0xc1518616 <+6>:   mov    %esp,%ebp
   0xc1518618 <+8>:   sub    $0x14,%esp
   0xc151861b <+11>:  mov    %ebx,-0xc(%ebp)
   0xc151861e <+14>:  mov    %eax,%ebx
   0xc1518620 <+16>:  mov    %gs:0x14,%eax
   0xc1518626 <+22>:  mov    %eax,-0x10(%ebp)
   0xc1518629 <+25>:  xor    %eax,%eax
   0xc151862b <+27>:  test   %ebx,%ebx
   0xc151862d <+29>:  mov    %esi,-0x8(%ebp)
   0xc1518630 <+32>:  mov    %edi,-0x4(%ebp)
   0xc1518633 <+35>:  je     0xc15186b3 <blk_set_runtime_active+163>
   0xc1518635 <+37>:  mov    0xfc(%ebx),%eax
   0xc151863b <+43>:  call   0xc1a4b920 <_raw_spin_lock_irq>
   0xc1518640 <+48>:  mov    0x150(%ebx),%esi
   0xc1518646 <+54>:  xor    %eax,%eax
   0xc1518648 <+56>:  mov    0xc1ca7d20,%edi
   0xc151864e <+62>:  mov    %eax,0x154(%ebx)
   0xc1518654 <+68>:  cmp    $0xff0c,%esi
   0xc151865a <+74>:  mov    %edi,-0x14(%ebp)
   0xc151865d <+77>:  je     0xc15186a5 <blk_set_runtime_active+149>
   0xc151865f <+79>:  mov    %edi,0xf4(%esi)
   0xc1518665 <+85>:  mov    $0x9,%edx
   0xc151866a <+90>:  mov    0x150(%ebx),%eax
   0xc1518670 <+96>:  call   0xc175ab80 <__pm_runtime_suspend>
   0xc1518675 <+101>: mov    0xfc(%ebx),%eax
   0xc151867b <+107>: call   *0xc1ce2918
   0xc1518681 <+113>: call   *0xc1ce2888
   0xc1518687 <+119>: mov    -0x10(%ebp),%eax
   0xc151868a <+122>: xor    %gs:0x14,%eax
   0xc1518691 <+129>: jne    0xc15186a0 <blk_set_runtime_active+144>
   0xc1518693 <+131>: mov    -0xc(%ebp),%ebx
   0xc1518696 <+134>: mov    -0x8(%ebp),%esi
   0xc1518699 <+137>: mov    -0x4(%ebp),%edi
   0xc151869c <+140>: mov    %ebp,%esp
   0xc151869e <+142>: pop    %ebp
   0xc151869f <+143>: ret
   0xc15186a0 <+144>: call   0xc108c6c0 <__stack_chk_fail>
   0xc15186a5 <+149>: xor    %edx,%edx
   0xc15186a7 <+151>: mov    $0xc1ee14b4,%eax
   0xc15186ac <+156>: call   0xc15bb7f0 <__ubsan_handle_type_mismatch>
   0xc15186b1 <+161>: jmp    0xc151865f <blk_set_runtime_active+79>
   0xc15186b3 <+163>: xor    %edx,%edx
   0xc15186b5 <+165>: mov    $0xc1ee14cc,%eax
   0xc15186ba <+170>: call   0xc15bb7f0 <__ubsan_handle_type_mismatch>
   0xc15186bf <+175>: jmp    0xc1518635 <blk_set_runtime_active+37>
End of assembler dump.



Kind regards,

Paul


PS: By the way, your mailer stripped the full names from my first message,
and replaced the “names” with the email addresses.






Re: aacraid: Difference in `/sys` between 4.14.13 and out of tree driver 55022

2018-03-20 Thread Paul Menzel

Dear Raghava,


On 01/15/18 21:22, Paul Menzel wrote:


Am 18.12.2017 um 19:09 schrieb Raghava Aditya Renukunta:


-Original Message- From: Paul Menzel
[mailto:pmen...@molgen.mpg.de] Sent: Saturday, December 16, 2017
1:39 AM


[…]


Am 17.02.2017 um 20:29 schrieb Raghava Aditya Renukunta:


[…]

Searching the vendor Web site, there is *Linux Driver Source 
1.2.1-53005* available for download [1].


The latest upstream driver version is 50740. We will be reaching
version 53005 in a couple of patch sets (~3).

http://git.kernel.org/cgit/linux/kernel/git/jejb/scsi.git/commit/?id=96f6a6134766de0d42a98c7758736dde16e0add5 


Thank you for the details. In our infrastructure we only want to
use LTS Linux kernels, and the latest is 4.14. So right now, Linux
4.14.6 includes version 50834 [2], which is the same version
currently in Linus’ master branch (4.15-rc3). Is that safe to use
with async mode, or are you aware of problems, so that we should always
use the latest out-of-tree driver, which is at version 55022 and
can be downloaded from the Microsemi server [3]?


Well, at this point I am in the process of creating a patch set that
solves a kdump regression issue (should be out before the new year).
Other than that, the upstream driver is pretty much up to date. If
kdump support is a must for you, I would recommend that 55022 be
used.


We tried Linux 4.14.13, and noticed a difference. My colleague commented
as below [4].


The problem is still present in Linux 4.14.23.


Here is the location info of the missing sysfs parts,
in short form, whereas the `device` part is missing:
```
# ls -la /sys/class/enclosure/7:0:80:0/Disk001
drwxr-xr-x  3 root system    0 Jan 10 13:08 .
drwxr-xr-x 19 root system    0 Jan 10 13:07 ..
-rw-r--r--  1 root system 4096 Jan 11 12:56 active
lrwxrwxrwx  1 root system    0 Jan 10 13:08 device -> ../../../../../../../port-7:1/end_device-7:1/target7:0:65/7:0:65:0
-rw-r--r--  1 root system 4096 Jan 11 12:56 fault
-rw-r--r--  1 root system 4096 Jan 11 12:56 locate
drwxr-xr-x  2 root system    0 Jan 11 12:56 power
-rw-r--r--  1 root system 4096 Jan 11 12:56 power_status
-r--r--r--  1 root system 4096 Jan 11 12:56 slot
-rw-r--r--  1 root system 4096 Jan 11 12:56 status
-r--r--r--  1 root system 4096 Jan 11 12:56 type
-rw-r--r--  1 root system 4096 Jan 11 12:56 uevent
```
The true location would be:
```
/sys/devices/pci:00/:00:03.0/:04:00.0/host7/port-7:16/end_device-7:16/target7:0:80/7:0:80:0/enclosure/7:0:80:0/Disk001
```
Could you point me to a commit that brings the in-tree driver on par with
the out-of-tree driver?


It’d be great if you could point us to the relevant source showing how the
device link is created.



Kind regards,

Paul



[1] https://storage.microsemi.com/en-us/speed/raid/aac/linux/aacraid-linux-src-1_2_1-53005_tgz.php
[2] https://elixir.free-electrons.com/linux/v4.14.6/source/drivers/scsi/aacraid/aacraid.h#L100
[3] https://storage.microsemi.com/en-us/downloads/linux_source/linux_source_code/productid=aha-1000-8e=microsemi+adaptec+hba+1000-8e.php
[4] https://github.molgen.mpg.de/mariux64/bee-files/pull/571#issuecomment-4468






aacraid: Difference in `/sys` between 4.14.13 and out of tree driver 55022 (was: Driver version for PMC Adaptec HBA in Linux and from vendor)

2018-01-15 Thread Paul Menzel

Dear Raghava,


Am 18.12.2017 um 19:09 schrieb Raghava Aditya Renukunta:


-Original Message- From: Paul Menzel
[mailto:pmen...@molgen.mpg.de] Sent: Saturday, December 16, 2017
1:39 AM


[…]


Am 17.02.2017 um 20:29 schrieb Raghava Aditya Renukunta:


[…]

Searching the vendor Web site, there is *Linux Driver Source 
1.2.1-53005* available for download [1].


The latest upstream driver version is 50740. We will be reaching
version 53005 in a couple of patch sets (~3).

http://git.kernel.org/cgit/linux/kernel/git/jejb/scsi.git/commit/?id=96f6a6134766de0d42a98c7758736dde16e0add5


Thank you for the details. In our infrastructure we only want to
use LTS Linux kernels, and the latest is 4.14. So right now, Linux
4.14.6 includes version 50834 [2], which is the same version
currently in Linus’ master branch (4.15-rc3). Is that safe to use
with async mode, or are you aware of problems, so that we should always
use the latest out-of-tree driver, which is at version 55022 and
can be downloaded from the Microsemi server [3]?


Well, at this point I am in the process of creating a patch set that
solves a kdump regression issue (should be out before the new year).
Other than that, the upstream driver is pretty much up to date. If
kdump support is a must for you, I would recommend that 55022 be
used.


We tried Linux 4.14.13, and noticed a difference. My colleague commented
as below [4].



Here is the location info of the missing sysfs parts,
in short form, whereas the `device` part is missing:
```
# ls -la /sys/class/enclosure/7:0:80:0/Disk001
drwxr-xr-x  3 root system    0 Jan 10 13:08 .
drwxr-xr-x 19 root system    0 Jan 10 13:07 ..
-rw-r--r--  1 root system 4096 Jan 11 12:56 active
lrwxrwxrwx  1 root system    0 Jan 10 13:08 device -> ../../../../../../../port-7:1/end_device-7:1/target7:0:65/7:0:65:0
-rw-r--r--  1 root system 4096 Jan 11 12:56 fault
-rw-r--r--  1 root system 4096 Jan 11 12:56 locate
drwxr-xr-x  2 root system    0 Jan 11 12:56 power
-rw-r--r--  1 root system 4096 Jan 11 12:56 power_status
-r--r--r--  1 root system 4096 Jan 11 12:56 slot
-rw-r--r--  1 root system 4096 Jan 11 12:56 status
-r--r--r--  1 root system 4096 Jan 11 12:56 type
-rw-r--r--  1 root system 4096 Jan 11 12:56 uevent
```
The true location would be:
```
/sys/devices/pci:00/:00:03.0/:04:00.0/host7/port-7:16/end_device-7:16/target7:0:80/7:0:80:0/enclosure/7:0:80:0/Disk001
```
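
To compare the two drivers systematically, a sketch like the following could list every enclosure component that lacks a `device` link. The function name is hypothetical, and treating any subdirectory that contains a `slot` attribute as a component is an assumption based only on the listing above, not on the enclosure class documentation.

```python
import os


def enclosure_slots_without_device(enclosure_root="/sys/class/enclosure"):
    """Return paths of enclosure components that have no `device` symlink."""
    missing = []
    if not os.path.isdir(enclosure_root):
        return missing
    for enclosure in sorted(os.listdir(enclosure_root)):
        enclosure_dir = os.path.join(enclosure_root, enclosure)
        if not os.path.isdir(enclosure_dir):
            continue
        for component in sorted(os.listdir(enclosure_dir)):
            component_dir = os.path.join(enclosure_dir, component)
            # Assumption: a component is a directory with a `slot` attribute.
            if not os.path.isdir(component_dir):
                continue
            if not os.path.exists(os.path.join(component_dir, "slot")):
                continue
            if not os.path.islink(os.path.join(component_dir, "device")):
                missing.append(component_dir)
    return missing
```

Running it once under each driver and comparing the two lists would show which components are affected.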
Could you point me to a commit that brings the in-tree driver on par with
the out-of-tree driver?


[…]


Kind regards,

Paul


[1] https://storage.microsemi.com/en-us/speed/raid/aac/linux/aacraid-linux-src-1_2_1-53005_tgz.php
[2] https://elixir.free-electrons.com/linux/v4.14.6/source/drivers/scsi/aacraid/aacraid.h#L100
[3] https://storage.microsemi.com/en-us/downloads/linux_source/linux_source_code/productid=aha-1000-8e=microsemi+adaptec+hba+1000-8e.php
[4] https://github.molgen.mpg.de/mariux64/bee-files/pull/571#issuecomment-4468


Re: Driver version for PMC Adaptec HBA in Linux and from vendor

2017-12-19 Thread Paul Menzel

Dear Raghava Aditya,


Thank you for your answer.

Am 18.12.2017 um 19:09 schrieb Raghava Aditya Renukunta:


-Original Message-
From: Paul Menzel [mailto:pmen...@molgen.mpg.de]
Sent: Saturday, December 16, 2017 1:39 AM
To: Raghava Aditya Renukunta
<raghavaaditya.renuku...@microsemi.com>; dl-esc-Aacraid Linux Driver
<aacr...@microsemi.com>
Cc: linux-scsi@vger.kernel.org; it+linux-s...@vger.kernel.org
Subject: Re: Driver version for PMC Adaptec HBA in Linux and from vendor



Am 17.02.2017 um 20:29 schrieb Raghava Aditya Renukunta:


Using a PMC Adaptec HBA 1000-8e with latest Linux, it only initializes
in sync mode, instead of async mode.


The patches that enable async mode in HBA 1000-8e have been included in
James Bottomley's linux-scsi branch and are on track to be included in
Linux 4.11.

https://git.kernel.org/cgit/linux/kernel/git/jejb/scsi.git/


```
$ git describe --tag
v4.10-rc8-47-g0722f57bf
$ dmesg
[   21.359635] Adaptec aacraid driver 1.2-1[41066]-ms
[   21.360017] aacraid :04:00.0: can't disable ASPM; OS doesn't have ASPM control
[   21.363987] AAC0: Async. mode not supported by current driver, sync. mode enforced.
[   21.363987] Please update driver to get full performance.
[   21.364949] AAC0: kernel 1.2-0[0] Nov  5 2015
[   21.365275] AAC0: monitor 0.0-0[0]
[   21.371382] AAC0: bios 0.13-209[32000]
[   21.371711] AAC0: serial 10F447
[   21.372035] AAC0: Non-DASD support enabled.
[   21.372360] AAC0: 64bit support enabled.
[   21.372688] AAC0: 64 Bit DAC enabled
[…]
$ git grep 'AAC_DRIVER_BUILD 41066'
drivers/scsi/aacraid/aacraid.h:# define AAC_DRIVER_BUILD 41066
```
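
The comparison done by hand above (build number in the boot log versus the one in `aacraid.h`) could be sketched as follows. The two helper names are hypothetical; the patterns are modelled only on the two lines quoted in this message and may need adjusting for other driver versions.

```python
import re


def build_from_header(header_text):
    """Extract the AAC_DRIVER_BUILD number from the text of aacraid.h."""
    match = re.search(
        r"^\s*#\s*define\s+AAC_DRIVER_BUILD\s+(\d+)", header_text, re.MULTILINE
    )
    return int(match.group(1)) if match else None


def build_from_dmesg(dmesg_text):
    """Extract the build number from the 'Adaptec aacraid driver' boot message."""
    match = re.search(r"Adaptec aacraid driver \S+?\[(\d+)\]", dmesg_text)
    return int(match.group(1)) if match else None
```

With the output quoted above, both helpers yield 41066, which can then be compared against the vendor release number, for example `build_from_header(text) < 53005`.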

Searching the vendor Web site, there is *Linux Driver Source
1.2.1-53005* available for download [1].


The latest upstream driver version is 50740. We will be reaching version 53005
in a couple of patch sets (~3).


http://git.kernel.org/cgit/linux/kernel/git/jejb/scsi.git/commit/?id=96f6a6134766de0d42a98c7758736dde16e0add5

Thank you for the details. In our infrastructure we only want to use LTS
Linux kernels, and the latest is 4.14. So right now, Linux 4.14.6
includes version 50834 [2], which is the same version currently in Linus’
master branch (4.15-rc3). Is that safe to use with async mode, or are
you aware of problems, so that we should always use the latest out-of-tree
driver, which is at version 55022 and can be downloaded from the Microsemi
server [3]?


Well, at this point I am in the process of creating a patch set that solves a
kdump regression issue (should be out before the new year). Other than that, the
upstream driver is pretty much up to date. If kdump support is a must for you,
I would recommend that 55022 be used.


From your answer, the state of async support is unclear to me. Could you
please clarify whether it is supported in 4.14.x? (What source line do I need
to check?)



How does the upstream process work? Is there a git repository somewhere
from Microsemi? Are the patches already up for review? (I didn’t find them.)


We try to push out patch sets to kernel.org for every major driver release we
make. Usually they go into the subcomponent maintainer's branch (linux-scsi),
which is then pushed out to Linus when the merge window opens (currently the
merge window for 4.10 is closed, barring fixes). So Linux version 4.11 should
have full async support and more for HBA1000-8e.

We do not maintain a git repository unfortunately, but we do release the
source code for every release, as you indicated.

For further reference, the patches are sent out on the SCSI mailing list,
linux-scsi@vger.kernel.org; the archive is here: http://marc.info/?l=linux-scsi=1=2 .

Hope I cleared up your doubts. Please do reach out if you have other concerns 
or questions.


Yes, thank you for your elaborate answer, which cleared up a lot of my
doubts. We would be even more satisfied if you moved your development
fully to the Linux kernel tree, so that it always carries the latest
driver. If we can help with that by contacting certain people, please
tell us.


We would love to, but we have lots of customers who are on older kernel
versions (2.6.32, 3.10.0 etc.), and it becomes almost impossible for us to fully
move our development to the Linux kernel tree and support our customers at the
same time. Hopefully we will start being up to date with the upstream kernel
in the coming months. Hope that answered your questions.


I understand, but wouldn’t it make more sense to adopt the model used
for the Linux Long Term Support (LTS) series: develop against the
latest Linux kernel, and then backport the corresponding patches?


Maybe you should talk to Red Hat and SUSE? I guess those are the systems
you have to support. Probably you already talk to them.



Kind regards,

Paul



[1] https://storage.microsemi.com/en-us/speed/raid/aac/linux/aacraid-linux-src-1_2_1-53005_tgz.php
[2] https://elixir.free-electrons.com/linux/v4.14.6/source/drivers/scsi/aacraid/aacr

Re: Driver version for PMC Adaptec HBA in Linux and from vendor

2017-12-16 Thread Paul Menzel

[Corrected email address.]

Am 16.12.2017 um 10:39 schrieb Paul Menzel:

Dear Aditya,


Am 17.02.2017 um 20:29 schrieb Raghava Aditya Renukunta:


Using a PMC Adaptec HBA 1000-8e with latest Linux, it only initializes
in sync mode, instead of async mode.


The patches that enable async mode in HBA 1000-8e have been included
in James Bottomley's linux-scsi branch and are on track to be included
in Linux 4.11.

https://git.kernel.org/cgit/linux/kernel/git/jejb/scsi.git/


```
$ git describe --tag
v4.10-rc8-47-g0722f57bf
$ dmesg
[   21.359635] Adaptec aacraid driver 1.2-1[41066]-ms
[   21.360017] aacraid :04:00.0: can't disable ASPM; OS doesn't have ASPM control
[   21.363987] AAC0: Async. mode not supported by current driver, sync. mode enforced.
[   21.363987] Please update driver to get full performance.
[   21.364949] AAC0: kernel 1.2-0[0] Nov  5 2015
[   21.365275] AAC0: monitor 0.0-0[0]
[   21.371382] AAC0: bios 0.13-209[32000]
[   21.371711] AAC0: serial 10F447
[   21.372035] AAC0: Non-DASD support enabled.
[   21.372360] AAC0: 64bit support enabled.
[   21.372688] AAC0: 64 Bit DAC enabled
[…]
$ git grep 'AAC_DRIVER_BUILD 41066'
drivers/scsi/aacraid/aacraid.h:# define AAC_DRIVER_BUILD 41066
```

Searching the vendor Web site, there is *Linux Driver Source
1.2.1-53005* available for download [1].


The latest upstream driver version is 50740. We will be reaching
version 53005 in a couple of patch sets (~3).


http://git.kernel.org/cgit/linux/kernel/git/jejb/scsi.git/commit/?id=96f6a6134766de0d42a98c7758736dde16e0add5



Thank you for the details. In our infrastructure we only want to use LTS
Linux kernels, and the latest is 4.14. So right now, Linux 4.14.6
includes version 50834 [2], which is the same version currently in Linus’
master branch (4.15-rc3). Is that safe to use with async mode, or are
you aware of problems, so that we should always use the latest out-of-tree
driver, which is at version 55022 and can be downloaded from the Microsemi
server [3]?



How does the upstream process work? Is there a git repository somewhere
from Microsemi? Are the patches already up for review? (I didn’t find 
them.)


We try to push out patch sets to kernel.org for every major driver
release we make. Usually they go into the subcomponent maintainer's
branch (linux-scsi), which is then pushed out to Linus when the merge
window opens (currently the merge window for 4.10 is closed, barring
fixes). So Linux version 4.11 should have full async support and more
for HBA1000-8e.

We do not maintain a git repository unfortunately, but we do release
the source code for every release, as you indicated.

For further reference, the patches are sent out on the SCSI mailing
list, linux-scsi@vger.kernel.org; the archive is here:
http://marc.info/?l=linux-scsi=1=2 .

Hope I cleared up your doubts. Please do reach out if you have other 
concerns or questions.


Yes, thank you for your elaborate answer, which cleared up a lot of my 
doubts. We would be even more satisfied if you moved your development 
fully to the Linux kernel tree, so that it always carries the latest 
driver. If we can help with that by contacting certain people, please 
tell us.



Kind regards,

Paul



[1] https://storage.microsemi.com/en-us/speed/raid/aac/linux/aacraid-linux-src-1_2_1-53005_tgz.php
[2] https://elixir.free-electrons.com/linux/v4.14.6/source/drivers/scsi/aacraid/aacraid.h#L100
[3] https://storage.microsemi.com/en-us/downloads/linux_source/linux_source_code/productid=aha-1000-8e=microsemi+adaptec+hba+1000-8e.php


Re: Driver version for PMC Adaptec HBA in Linux and from vendor

2017-12-16 Thread Paul Menzel

Dear Aditya,


Am 17.02.2017 um 20:29 schrieb Raghava Aditya Renukunta:


Using a PMC Adaptec HBA 1000-8e with latest Linux, it only initializes
in sync mode, instead of async mode.


The patches that enable async mode in HBA 1000-8e have been included in
James Bottomley's linux-scsi branch and are on track to be included in
Linux 4.11.

https://git.kernel.org/cgit/linux/kernel/git/jejb/scsi.git/


```
$ git describe --tag
v4.10-rc8-47-g0722f57bf
$ dmesg
[   21.359635] Adaptec aacraid driver 1.2-1[41066]-ms
[   21.360017] aacraid :04:00.0: can't disable ASPM; OS doesn't have ASPM control
[   21.363987] AAC0: Async. mode not supported by current driver, sync. mode enforced.
[   21.363987] Please update driver to get full performance.
[   21.364949] AAC0: kernel 1.2-0[0] Nov  5 2015
[   21.365275] AAC0: monitor 0.0-0[0]
[   21.371382] AAC0: bios 0.13-209[32000]
[   21.371711] AAC0: serial 10F447
[   21.372035] AAC0: Non-DASD support enabled.
[   21.372360] AAC0: 64bit support enabled.
[   21.372688] AAC0: 64 Bit DAC enabled
[…]
$ git grep 'AAC_DRIVER_BUILD 41066'
drivers/scsi/aacraid/aacraid.h:# define AAC_DRIVER_BUILD 41066
```

Searching the vendor Web site, there is *Linux Driver Source
1.2.1-53005* available for download [1].


The latest upstream driver version is 50740. We will be reaching version 53005
in a couple of patch sets (~3).

http://git.kernel.org/cgit/linux/kernel/git/jejb/scsi.git/commit/?id=96f6a6134766de0d42a98c7758736dde16e0add5


Thank you for the details. In our infrastructure we only want to use LTS
Linux kernels, and the latest is 4.14. So right now, Linux 4.14.6
includes version 50834 [2], which is the same version currently in Linus’
master branch (4.15-rc3). Is that safe to use with async mode, or are
you aware of problems, so that we should always use the latest out-of-tree
driver, which is at version 55022 and can be downloaded from the Microsemi
server [3]?



How does the upstream process work? Is there a git repository somewhere
from Microsemi? Are the patches already up for review? (I didn’t find them.)


We try to push out patch sets to kernel.org for every major driver release we
make. Usually they go into the subcomponent maintainer's branch (linux-scsi),
which is then pushed out to Linus when the merge window opens (currently the
merge window for 4.10 is closed, barring fixes). So Linux version 4.11 should
have full async support and more for HBA1000-8e.

We do not maintain a git repository unfortunately, but we do release the
source code for every release, as you indicated.

For further reference, the patches are sent out on the SCSI mailing list,
linux-scsi@vger.kernel.org; the archive is here: http://marc.info/?l=linux-scsi=1=2 .

Hope I cleared up your doubts. Please do reach out if you have other concerns 
or questions.


Yes, thank you for your elaborate answer, which cleared up a lot of my 
doubts. We would be even more satisfied if you moved your development 
fully to the Linux kernel tree, so that it always carries the latest 
driver. If we can help with that by contacting certain people, please 
tell us.



Kind regards,

Paul



[1] https://storage.microsemi.com/en-us/speed/raid/aac/linux/aacraid-linux-src-1_2_1-53005_tgz.php
[2] https://elixir.free-electrons.com/linux/v4.14.6/source/drivers/scsi/aacraid/aacraid.h#L100
[3] https://storage.microsemi.com/en-us/downloads/linux_source/linux_source_code/productid=aha-1000-8e=microsemi+adaptec+hba+1000-8e.php


Driver version for PMC Adaptec HBA in Linux and from vendor

2017-12-04 Thread Paul Menzel

Dear Raghava, dear Linux folks,


In evaluating HBA extension cards, one of our key requirements is easy
maintenance, especially when upgrading the firmware.


You provide the utility `arcconf` [1], which can be used for such tasks 
directly on the command line.


Unfortunately, we can’t find the source code for this application, which 
is something we’d like to have when executing programs with root privileges.


It’d be great to have something similar to flashrom [2], or the source
of your program.


Do you know why the source of this utility is not published
under a free license?


Who can be contacted to discuss this issue further?


Kind regards,

Paul


[1] http://download.adaptec.com/raid/storage_manager/arcconf_v2_05_22932.zip
[2] https://www.flashrom.org/





Re: [Regression 4.13-rc1] Resume does not work on Lenovo X60t

2017-08-29 Thread Paul Menzel

Dear Christoph,


On 08/23/17 14:15, Paul Menzel wrote:


On 08/23/17 13:48, Christoph Hellwig wrote:

Are you running with blk-mq enabled?  Also this never
occurred with 4.12, right?  Were you also running with or
without blk-mq for scsi there?


To my knowledge, I am using the defaults from Debian 9. I’ll check in 
one week, as I am away from the system.


It looks like I was using blk-mq, as it was the default up to commit 
cbe7dfa26eee (Revert "scsi: default to scsi-mq"). So with Linux 4.13-rc7 
and disabling blk-mq for SCSI, the system is functional again after resume.



Kind regards,

Paul


Re: [Regression 4.13-rc1] Resume does not work on Lenovo X60t

2017-08-23 Thread Paul Menzel

Dear Christoph,


On 08/23/17 13:48, Christoph Hellwig wrote:

Are you running with blk-mq enabled?  Also this never
occurred with 4.12, right?  Were you also running with or
without blk-mq for scsi there?


To my knowledge, I am using the defaults from Debian 9. I’ll check in 
one week, as I am away from the system.



Kind regards,

Paul


Re: [Regression 4.13-rc1] Resume does not work on Lenovo X60t

2017-08-22 Thread Paul Menzel

Dear Christoph,


On 2017-08-21 20:41, Christoph Hellwig wrote:


with 4.13-rc6 we're not using blk-mq by default any more, do you
still see the issue with that one?


Yes, I still see it with commit 6470812e2226 (Merge
git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc).

```
00.831: [  575.945132] BUG: unable to handle kernel NULL pointer dereference at 00f4
00.830: [  575.948009] IP: blk_set_runtime_active+0x27/0x60
00.830: [  575.948009] *pde = 
00.831: [  575.948009]
00.831: [  575.948009] Oops: 0002 [#1] SMP
00.831: [  575.948009] Modules linked in: joydev wacom_w8001 serport cpufreq_powersave cpufreq_conservative cpufreq_userspace binfmt_misc iTCO_wdt iTCO_vendor_support arc4 coretemp snd_hda_codec_analog snd_hda_codec_generic iwl3945 snd_hda_intel pcmcia iwlegacy snd_hda_codec kvm mac80211 snd_hda_core irqbypass yenta_socket snd_pcsp lpc_ich snd_hwdep thinkpad_acpi pcmcia_rsrc mfd_core serio_raw snd_pcm sg pcmcia_core nvram cfg80211 snd_timer rng_core snd rfkill battery soundcore shpchp evdev ac acpi_cpufreq parport_pc ppdev lp parport ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb cbc algif_skcipher af_alg dm_crypt dm_mod sr_mod cdrom sd_mod ata_generic ahci libahci sdhci_pci firewire_ohci ata_piix sdhci firewire_core libata e1000e i2c_i801 psmouse mmc_core crc_itu_t ptp scsi_mod i915 pps_core
00.831: [  575.948009]  ehci_pci video button uhci_hcd i2c_algo_bit ehci_hcd drm_kms_helper thermal usbcore syscopyarea sysfillrect sysimgblt fb_sys_fops drm
00.831: [  575.948009] CPU: 0 PID: 1126 Comm: kworker/u4:36 Not tainted 4.13.0-rc6+ #110
00.831: [  575.948009] Hardware name: LENOVO 636338U/636338U, BIOS CBET4000 TIMELESS 01/01/1970
00.831: [  575.948009] Workqueue: events_unbound async_run_entry_fn
00.831: [  575.948009] task: f2ed8bc0 task.stack: f2ecc000
00.831: [  575.948009] EIP: blk_set_runtime_active+0x27/0x60
00.831: [  575.948009] EFLAGS: 00010046 CPU: 0
00.831: [  575.948009] EAX:  EBX: f5f3f820 ECX: f5f3f918 EDX: 00010d7b
00.831: [  575.948009] ESI: f8ac3cc0 EDI: 0010 EBP: 0010 ESP: f2ecdea4
00.831: [  575.948009]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
00.831: [  575.948009] CR0: 80050033 CR2: 00f4 CR3: 0e3a9000 CR4: 06d0
00.831: [  575.948009] Call Trace:
00.831: [  575.948009]  ? scsi_bus_resume_common+0x6e/0x110 [scsi_mod]
00.831: [  575.948009]  ? dpm_run_callback+0x4f/0x150
00.831: [  575.948009]  ? wait_for_completion+0x29/0x140
00.831: [  575.948009]  ? scsi_bus_thaw+0x10/0x10 [scsi_mod]
00.831: [  575.948009]  ? device_resume+0x8e/0x180
00.831: [  575.948009]  ? async_resume+0x1b/0x40
00.831: [  575.948009]  ? async_run_entry_fn+0x3f/0x1a0
00.831: [  575.948009]  ? process_one_work+0x136/0x310
00.831: [  575.948009]  ? worker_thread+0x39/0x3b0
00.831: [  575.948009]  ? kthread+0xd7/0x110
00.831: [  575.948009]  ? process_one_work+0x310/0x310
00.831: [  575.948009]  ? kthread_create_on_node+0x30/0x30
00.831: [  575.948009]  ? ret_from_fork+0x19/0x24
00.831: [  575.948009] Code: 8d 74 26 00 3e 8d 74 26 00 53 89 c3 8b 80 fc 00 00 00 e8 2d 48 32 00 31 c0 8b 15 20 9e 24 ce 89 83 54 01 00 00 8b 83 50 01 00 00 <89> 90 f4 00 00 00 ba 09 00 00 00 8b 83 50 01 00 00 e8 f3 f2 16
00.831: [  575.948009] EIP: blk_set_runtime_active+0x27/0x60 SS:ESP: 0068:f2ecdea4
00.831: [  575.948009] CR2: 00f4
00.831: [  575.948009] ---[ end trace b3f1ac10115418ab ]---
00.831: [  576.195662] pciehp :00:1c.0:pcie004: Timeout on hotplug command 0x1038 (issued 574920 msec ago)
00.831: [  576.204847] pciehp :00:1c.0:pcie004: Device :01:00.0 already exists at :01:00, cannot hot-add
00.832: [  576.214460] pciehp :00:1c.0:pcie004: Cannot add device at :01:00
00.834: [  576.223117] atkbd serio0: Spurious ACK on isa0060/serio0. Some program might be trying to access hardware directly.
00.834: [  576.233968] ata1.00: configured for UDMA/33
00.927: [  576.328159] pciehp :00:1c.0:pcie004: Device :01:00.0 already exists at :01:00, cannot hot-add
00.929: [  576.340348] pciehp :00:1c.0:pcie004: Cannot add device at :01:00
01.002: [  576.420139] usb 5-6: reset high-speed USB device number 2 using ehci-pci
01.372: [  576.796072] firewire_core :05:00.1: rediscovered device fw0
03.010: [  578.440083] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
05.274: [  580.710027] ata3.00: ATA Identify Device Log not supported
05.276: [  580.718136] ata3.00: Security Log not supported
05.279: [  580.725856] ata3.00: ATA Identify Device Log not supported
05.282: [  580.733887] ata3.00: Security Log not supported
05.284: [  580.740838] ata3.00: configured for UDMA/100
```
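
As a rough aid for reading such a trace, the faulting function and offset can be pulled out of the `EIP:` line and turned into a gdb `list` command for a `vmlinux` built with debug information. The marked bytes `<89> 90 f4 00 00 00` in the `Code:` line decode to a 32-bit store at offset 0xf4 from EAX, which is consistent with the reported CR2 value. The helper names below are hypothetical, and the pattern is modelled only on the lines in this log.

```python
import re


def faulting_location(oops_text):
    """Return (symbol, offset, size) from the first IP/EIP/RIP line, or None."""
    match = re.search(
        r"\b(?:EIP|RIP|IP): (?:[0-9a-f]{4}:)?([\w.]+)\+(0x[0-9a-f]+)/(0x[0-9a-f]+)",
        oops_text,
    )
    if match is None:
        return None
    return match.group(1), int(match.group(2), 16), int(match.group(3), 16)


def gdb_list_command(symbol, offset):
    """Build the gdb command that shows the source line for symbol+offset."""
    return f"list *({symbol}+{offset:#x})"
```

For the trace above this gives `list *(blk_set_runtime_active+0x27)`, which only maps to the right source line if the `vmlinux` matches the kernel that produced the oops.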


Kind regards,

Paul


Re: [Regression 4.13-rc1] Resume does not work on Lenovo X60t

2017-08-18 Thread Paul Menzel

Dear Christoph,


On 08/06/17 20:06, Paul Menzel wrote:


On 2017-08-05 11:30, Christoph Hellwig wrote:

On Thu, Aug 03, 2017 at 07:42:15PM +0200, Paul Menzel wrote:


Since the merge window opened for Linux 4.13, I am unable to resume
from ACPI S3 suspend on a Lenovo X60t. The graphics comes back, but I am
unable to enter anything, and the system seems to be hung. Magic SysRq keys
still work though, but powering the system off doesn’t work. The power
button also does not work.


Please find the stack trace with Linux 4.13-rc3 captured over the serial
console below.


Is this really -rc3?  rc3 has a commit to disable block runtime pm
for blk-mq, which is now the default for scsi.  So with -rc1 we've
seen similar reports, but rc3 would be odd and suggest we have further
problems.


Yes, this was 4.13-rc3. Rebuilding the Linux kernel from commit 0fdd951c 
(Merge tag 'media/v4.13-2' of 
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media) shows 
the same behavior.


Just an update: this is still present in Linux 4.13-rc5+, that is, at 
commit 04d49f3638d0 (Merge tag 'drm-fixes-for-v4.13-rc6' of 
git://people.freedesktop.org/~airlied/linux).



Kind regards,

Paul


Re: [Regression 4.13-rc1] Resume does not work on Lenovo X60t

2017-08-06 Thread Paul Menzel

Dear Christoph,


On 2017-08-05 11:30, Christoph Hellwig wrote:

On Thu, Aug 03, 2017 at 07:42:15PM +0200, Paul Menzel wrote:


Since the merge window opened for Linux 4.13, I am unable to resume
from ACPI S3 suspend on a Lenovo X60t. The graphics come back, but I am
unable to enter anything, and the system seems to be hung. Magic SysRq keys
still work, but powering the system off doesn’t work. The power button
also does not work.


Please find the stack trace with Linux 4.13-rc3 captured over the serial
console below.


Is this really -rc3?  rc3 has a commit to disable block runtime pm
for blk-mq, which is now the default for scsi.  So with -rc1 we've
seen similar reports, but rc3 would be odd and suggest we have further
problems.


Yes, this was 4.13-rc3. Rebuilding the Linux kernel from commit 0fdd951c 
(Merge tag 'media/v4.13-2' of 
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media) shows 
the same behavior.



Kind regards,

Paul



[Regression 4.13-rc1] Resume does not work on Lenovo X60t

2017-08-03 Thread Paul Menzel

Dear Linux folks,


Since the merge window opened for Linux 4.13, I am unable to resume 
from ACPI S3 suspend on a Lenovo X60t. The graphics come back, but I am 
unable to enter anything, and the system seems to be hung. Magic SysRq 
keys still work, but powering the system off doesn’t work. The 
power button also does not work.


Please find the stack trace with Linux 4.13-rc3 captured over the serial 
console below.


> ```

46.417: [   58.148083] ata6: port disabled--ignoring
46.417: [   58.148243] BUG: unable to handle kernel NULL pointer dereference at 
00f4
46.417: [   58.148252] IP: blk_set_runtime_active+0x27/0x60
46.417: [   58.148253] *pde =  
46.417: [   58.148254] 
46.417: [   58.148256] Oops: 0002 [#1] SMP

46.418: [   58.148258] Modules linked in: cpufreq_powersave 
cpufreq_conservative cpufreq_userspace joydev wacom_w8001 serport binfmt_misc 
iTCO_wdt iTCO_vendor_support coretemp kvm snd_hda_codec_analog 
snd_hda_codec_generic arc4 irqbypass pcmcia snd_pcsp thinkpad_acpi serio_raw 
snd_hda_intel snd_hda_codec yenta_socket lpc_ich iwl3945 mfd_core pcmcia_rsrc 
snd_hda_core iwlegacy snd_hwdep pcmcia_core snd_pcm mac80211 sg rng_core nvram 
cfg80211 snd_timer snd soundcore rfkill evdev battery ac shpchp acpi_cpufreq 
parport_pc ppdev lp parport ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 
fscrypto ecb cbc algif_skcipher af_alg dm_crypt dm_mod sr_mod cdrom sd_mod 
ata_generic psmouse i915 i2c_i801 sdhci_pci ahci ata_piix ehci_pci uhci_hcd 
libahci firewire_ohci sdhci libata firewire_core ehci_hcd mmc_core e1000e 
crc_itu_t
46.416: [   58.148310]  scsi_mod ptp usbcore video pps_core button i2c_algo_bit 
drm_kms_helper thermal syscopyarea sysfillrect sysimgblt fb_sys_fops drm
46.416: [   58.148322] CPU: 0 PID: 808 Comm: kworker/u4:38 Not tainted 
4.13.0-rc3+ #94
46.416: [   58.148323] Hardware name: LENOVO 636338U/636338U, BIOS CBET4000 
TIMELESS 01/01/1970
46.416: [   58.148328] Workqueue: events_unbound async_run_entry_fn
46.416: [   58.148330] task: f2900180 task.stack: f2902000
46.416: [   58.148333] EIP: blk_set_runtime_active+0x27/0x60
46.416: [   58.148334] EFLAGS: 00010046 CPU: 0
46.416: [   58.148335] EAX:  EBX: f5f3c628 ECX: f5f3c720 EDX: 13c5
46.416: [   58.148337] ESI: f87a5cc0 EDI: 0010 EBP: 0010 ESP: f2903ea4
46.416: [   58.148338]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
46.416: [   58.148340] CR0: 80050033 CR2: 00f4 CR3: 363b4000 CR4: 06d0
46.416: [   58.148342] Call Trace:
46.416: [   58.148361]  ? scsi_bus_resume_common+0x6e/0x110 [scsi_mod]
46.416: [   58.148366]  ? dpm_run_callback+0x4f/0x150
46.416: [   58.148369]  ? wait_for_completion+0x29/0x140
46.416: [   58.148381]  ? scsi_bus_thaw+0x10/0x10 [scsi_mod]
46.416: [   58.148384]  ? device_resume+0x8e/0x180
46.416: [   58.148387]  ? async_resume+0x1b/0x40
46.416: [   58.148389]  ? async_run_entry_fn+0x3f/0x1a0
46.416: [   58.148392]  ? process_one_work+0x136/0x310
46.416: [   58.148394]  ? worker_thread+0x39/0x3b0
46.416: [   58.148396]  ? kthread+0xd7/0x110
46.416: [   58.148398]  ? process_one_work+0x310/0x310
46.416: [   58.148400]  ? kthread_create_on_node+0x30/0x30
46.416: [   58.148403]  ? ret_from_fork+0x19/0x24
46.416: [   58.148404] Code: 8d 74 26 00 3e 8d 74 26 00 53 89 c3 8b 80 fc 00 00 00 e8 
5d 43 32 00 31 c0 8b 15 20 7e 64 cf 89 83 54 01 00 00 8b 83 50 01 00 00 <89> 90 
f4 00 00 00 ba 09 00 00 00 8b 83 50 01 00 00 e8 e3 ef 16
46.416: [   58.148437] EIP: blk_set_runtime_active+0x27/0x60 SS:ESP: 
0068:f2903ea4
46.416: [   58.148438] CR2: 00f4
46.416: [   58.148441] ---[ end trace 529e3022b2906e41 ]---

> ```
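For readers of the trace: the byte marked `<89>` in the `Code:` line is the start of the faulting instruction. The sketch below decodes only that one instruction form by hand; it is not a general disassembler, and `decode_mov_store` is a hypothetical helper name. The register dump shows no digits for EAX in this copy, so a zero base register is an assumption here. Under that assumption, the displacement alone gives the faulting address, which agrees with the `f4` reported in CR2.

```python
import struct

# 32-bit general-purpose registers in x86 ModRM encoding order.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_store(insn: bytes):
    """Decode `mov r32, disp32(r32)`: opcode 0x89, ModRM mod=10, no SIB byte."""
    if insn[0] != 0x89:
        raise ValueError("not a MOV r/m32, r32 instruction")
    modrm = insn[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b10 or rm == 0b100:
        raise ValueError("only the disp32, no-SIB form is handled here")
    # The 32-bit displacement follows the ModRM byte, little-endian.
    (disp,) = struct.unpack("<i", insn[2:6])
    return REG32[reg], REG32[rm], disp


# Bytes from the Code: line, starting at the marked byte.
src, base, disp = decode_mov_store(bytes.fromhex("8990f4000000"))

# Assumption: the base register was NULL, so the address is the displacement.
fault_if_base_null = disp & 0xFFFFFFFF
print(f"mov %{src}, {disp:#x}(%{base}) -> fault at {fault_if_base_null:#x}")
```

Read this way, the oops is a store through a NULL structure pointer at offset `0xf4`, not a jump to a bad address.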

Please find the full log attached. I don’t know why the Linux kernel 
messages at the beginning are transferred at the wrong baud rate.



Kind regards,

Paul



Driver version for PMC Adaptec HBA in Linux and from vendor

2017-02-17 Thread Paul Menzel

Dear Raghava, dear Linux folks,


Using a PMC Adaptec HBA 1000-8e with latest Linux, it only initializes 
in sync mode, instead of async mode.


```
$ git describe --tag
v4.10-rc8-47-g0722f57bf
$ dmesg
[   21.359635] Adaptec aacraid driver 1.2-1[41066]-ms
[   21.360017] aacraid :04:00.0: can't disable ASPM; OS doesn't have ASPM control
[   21.363987] AAC0: Async. mode not supported by current driver, sync. mode enforced.
[   21.363987] Please update driver to get full performance.
[   21.364949] AAC0: kernel 1.2-0[0] Nov  5 2015
[   21.365275] AAC0: monitor 0.0-0[0]
[   21.371382] AAC0: bios 0.13-209[32000]
[   21.371711] AAC0: serial 10F447
[   21.372035] AAC0: Non-DASD support enabled.
[   21.372360] AAC0: 64bit support enabled.
[   21.372688] AAC0: 64 Bit DAC enabled
[…]
$ git grep 'AAC_DRIVER_BUILD 41066'
drivers/scsi/aacraid/aacraid.h:# define AAC_DRIVER_BUILD 41066
```

Searching the vendor Web site, there is *Linux Driver Source 
1.2.1-53005* available for download [1].


How does the upstream process work? Is there a git repository somewhere 
from Microsemi? Are the patches already up for review? (I didn’t find them.)


The answers would be very helpful for our evaluation of the device.


Kind regards,

Paul


[1] 
https://storage.microsemi.com/en-us/speed/raid/aac/linux/aacraid-linux-src-1_2_1-53005_tgz.php


Re: Ordering problems with 3ware controller

2016-11-17 Thread Paul Menzel

Dear Linux folks,


On 11/16/16 22:24, Donald Buczek wrote:

On 10.11.2016 14:59, Martin K. Petersen wrote:

"Paul" == Paul Menzel <pmen...@molgen.mpg.de> writes:



Linux does not provide device discovery ordering guarantees. You need
to fix your scripts to use UUIDs, filesystem labels, or DM devices to
get stable naming.

Paul> Indeed. But it worked for several years, so that *something* must
Paul> have changed that the ordering of the result of `getdents64` is
Paul> different now.

Could be changes in the PCI or platform code that causes things to be
enumerated differently. Whatever it is, it has nothing to do with the
3ware drivers themselves since they have been dormant for a long time.



Right. We further tracked it down. In fact it's not a matter of driver
initialization order but of the way sysfs/kernfs hashes its object names
and thereby defines the order of names returned by getdents64 calls. In
fs/kernfs/dir.c the names are inserted into a red-black tree ordered by
the hashes over their names (and possibly namespace pointer, which in
our case is zero).

I've walked the rbtrees of the kernfs_node structs from
/sys/class/scsi_host showing their addresses, the hash values and the
names in a 4.4.27 system:

root:cu:/home/buczek/autofs/# ./peek-3w

88046d847640 : 11bf1ddd : host0
88046c56d3e8 : 11bf1e8d : host1
88046c571c58 : 11bf1f3d : host2
88046c572550 : 11bf1fed : host3
88046c577dc0 : 11bf209d : host4
88046a4bbaf0 : 11bf214d : host5

As can be seen, in 4.4 the hash algorithm happened to produce increasing
hash values for names like "host0","host1","host2",... In 4.8.6 the hash
values seem to be more random:

root:gynaekophobie:/home/buczek/autofs/# ./peek-3w

88041df9a7f8 : 074af64b : host0
88081db40528 : 1009cd9b : host9
88041d3fba50 : 1c512bfb : host7
88181d19c000 : 28988a5b : host5
88041df5a780 : 34dfe8bb : host3
88041d3f5e10 : 4127471b : host1
88041ccbd258 : 562d7ccb : host8
88201cd5f960 : 6274db2b : host6
88141e2d0ca8 : 6ebc398b : host4
88041df599d8 : 7b0397eb : host2

The relevant commit is 703b5fa  which includes


The commit message summary is *fs/dcache.c: Save one 32-bit multiply in 
dcache lookup*.



 static inline unsigned long end_name_hash(unsigned long hash)
 {
-   return (unsigned int)hash;
+   return __hash_32((unsigned int)hash);
 }

__hash_32 is a multiplication by 0x61C88647 ( hash.h )

And this exactly is the difference between the hash value of "host0" on
the 4.4 and the 4.8 system:

  DB<2> x sprintf '%x',0x11bf1ddd*0x61C88647
0  '6c750ef074af64b'

The bug, of course, is in the userspace tool tw_cli which wrongly
assumes that the names would be returned in the "right" order by getdents.
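The arithmetic above can be replayed for all six names listed for the 4.4.27 system. This is a minimal sketch, not kernel code: `rehash` is a hypothetical helper, and the final mask to 31 bits is an assumption (the 4.8.6 values printed above for host3 and host4 only match once the top bit is dropped, which would fit kernfs reserving part of the hash range).

```python
GOLDEN_RATIO_32 = 0x61C88647  # multiplier quoted above for __hash_32

# Hash values as printed for the 4.4.27 system above.
hash_4_4 = {
    "host0": 0x11BF1DDD,
    "host1": 0x11BF1E8D,
    "host2": 0x11BF1F3D,
    "host3": 0x11BF1FED,
    "host4": 0x11BF209D,
    "host5": 0x11BF214D,
}


def rehash(h: int) -> int:
    """32-bit multiply by the constant, then keep the low 31 bits (assumed)."""
    return ((h * GOLDEN_RATIO_32) & 0xFFFFFFFF) & 0x7FFFFFFF


hash_4_8 = {name: rehash(h) for name, h in hash_4_4.items()}

# The red-black tree is walked in hash order, so sorting by hash
# reproduces the order in which the names are returned.
order_4_4 = sorted(hash_4_4, key=hash_4_4.get)
order_4_8 = sorted(hash_4_8, key=hash_4_8.get)

for name in order_4_8:
    print(f"{hash_4_8[name]:08x} : {name}")
```

For these six names the multiplied values agree with the 4.8.6 listing, and sorting by them yields host0, host5, host3, host1, host4, host2, the same relative order as in that listing.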


Nice analysis.

Unfortunately, I can’t find the discussion of the patch on the Linux 
kernel mailing list.


Searching for the summary only brings up *screen rotation flipped in 
4.8-rc* [1].



As a dirty workaround, I've created a new wrapper, which uses ptrace to
pause the program on return from SYS_getdents64 and sorts the values
returned from the system call in the memory of the target process.



I append the source of the wrapper.



Kind regards,

Paul


[1] https://lkml.org/lkml/2016/8/30/739
"screen rotation flipped in 4.8-rc"
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Delivery Status Notification for linuxr...@lsi.com

2016-11-10 Thread Paul Menzel

Dear Martin,


On 11/10/16 15:07, Martin K. Petersen wrote:

"Paul" == Paul Menzel <pmen...@molgen.mpg.de> writes:



Paul> Probably you know it already, but the listed email address of the
Paul> 3WARE SCSI drivers maintainer linuxr...@lsi.com doesn’t work (for
Paul> me).

Ownership of these products is now with Broadcom. To my knowledge the
3ware product lines have been discontinued.


Indeed. I forgot to actually state the intent of my message.

What should happen to the entry in the file `MAINTAINERS`?


Kind regards,

Paul


Delivery Status Notification for linuxr...@lsi.com

2016-11-10 Thread Paul Menzel

Dear Linux folks,


Probably you know it already, but the listed email address of the 3WARE 
SCSI drivers maintainer linuxr...@lsi.com doesn’t work (for me).


Please see the attached message.


Kind regards,

Paul
--- Begin Message ---
This is an automatically generated Delivery Status Notification

THIS IS A WARNING MESSAGE ONLY.

YOU DO NOT NEED TO RESEND YOUR MESSAGE.

Delivery to the following recipient has been delayed:

 linuxr...@lsi.com

Message will be retried for 5 more day(s)

Technical details of temporary failure: 
The recipient server did not accept our requests to connect. Learn more at 
https://support.google.com/mail/answer/7720 
[192.19.192.224 192.19.192.224: timed out]

- Original message -

To: linux-scsi@vger.kernel.org
Cc: Adam Radford <linuxr...@lsi.com>
From: Paul Menzel <pmen...@molgen.mpg.de>
Subject: Ordering problems with 3ware controller
Message-ID: <a41b4bab-edb7-34ab-eb76-7ff4d6e3f...@molgen.mpg.de>
Date: Tue, 8 Nov 2016 11:07:37 +0100

Dear Linux SCSI folks,


Updating from Linux 4.4.X to Linux 4.8.4, we noticed that the 3ware 
devices under `/dev` – `/dev/twa0`, `/dev/twa1`, … – map to the 
controllers differently.

This unfortunately breaks quite a lot of our scripts, as we depend on 
the fact that the first controller is also in the front.

> $ dmesg | grep 3ware
> [   14.509238] 3ware 9000 Storage Controller device driver for Linux 
> v2.26.02.014.
> [   14.824274] scsi host8: 3ware 9000 Storage Controller
> [   14.824537] 3w-9xxx: scsi8: Found a 3ware 9000 Storage Controller at 
> 0xd020, IRQ: 17.
> [   15.508310] scsi host9: 3ware 9000 Storage Controller
> [   15.508569] 3w-9xxx: scsi9: Found a 3ware 9000 Storage Controller at 
> 0xda10, IRQ: 17.

Tracing `tw_cli`, it looks like the ordering of the devices in 
`/sys/class/scsi_host` might have changed, as `getdents64` seems to 
determine the order in which `/dev/twaX` is created.

> $ find /sys/class/scsi_host/ -ls
>  6033  0 drwxr-xr-x   2  root system  0 Nov  8 10:58 
> /sys/class/scsi_host/
> 23125  0 lrwxrwxrwx   1  root system  0 Oct 27 17:41 
> /sys/class/scsi_host/host0 -> 
> ../../devices/pci:00/:00:07.0/ata1/host0/scsi_host/host0
> 29893  0 lrwxrwxrwx   1  root system  0 Oct 27 18:03 
> /sys/class/scsi_host/host9 -> 
> ../../devices/pci:80/:80:0e.0/:90:00.0/host9/scsi_host/host9
> 23878  0 lrwxrwxrwx   1  root system  0 Oct 27 17:41 
> /sys/class/scsi_host/host7 -> 
> ../../devices/pci:80/:80:08.0/ata8/host7/scsi_host/host7
> 23640  0 lrwxrwxrwx   1  root system  0 Oct 27 17:41 
> /sys/class/scsi_host/host5 -> 
> ../../devices/pci:80/:80:07.0/ata6/host5/scsi_host/host5
> 23402  0 lrwxrwxrwx   1  root system  0 Oct 27 17:41 
> /sys/class/scsi_host/host3 -> 
> ../../devices/pci:00/:00:08.0/ata4/host3/scsi_host/host3
> 23164  0 lrwxrwxrwx   1  root system  0 Oct 27 17:41 
> /sys/class/scsi_host/host1 -> ../.

- Message truncated -

--- End Message ---


Re: Ordering problems with 3ware controller

2016-11-09 Thread Paul Menzel

Dear Martin,


On 11/09/16 00:45, Martin K. Petersen wrote:

"Paul" == Paul Menzel <pmen...@molgen.mpg.de> writes:



Paul> Updating from Linux 4.4.X to Linux 4.8.4, we noticed that the
Paul> 3ware devices under `/dev` – `/dev/twa0`, `/dev/twa1`, … – map to
Paul> the controllers differently.

Paul> This unfortunately breaks quite a lot of our scripts, as we depend
Paul> on the fact that the first controller is also in the front.

It's not the 3ware drivers since they have not been updated in a long
time (since way before 4.4).


Yes, that’s what made me wonder too.


Linux does not provide device discovery ordering guarantees. You need to
fix your scripts to use UUIDs, filesystem labels, or DM devices to get
stable naming.


Indeed. But it worked for several years, so *something* must have 
changed for the ordering of the `getdents64` result to be different now.


Fixing the scripts is unfortunately not that easy, as `tw_cli` is a 
proprietary tool [1], and we do not have the sources. It does a `readdir()`.



open("/proc/scsi/3w-9xxx", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = -1 ENOENT (No such file or directory)
open("/sys/class/scsi_host", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = 3
fstat(3, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
fcntl(3, F_SETFD, FD_CLOEXEC)   = 0
getdents64(3, /* 12 entries */, 4096)   = 368
stat("/sys/class/scsi_host/host0/stats", 0x7fffafd05290) = -1 ENOENT (No such file or directory)
stat("/sys/class/scsi_host/host9/stats", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
open("/sys/class/scsi_host/host9/stats", O_RDONLY) = 4
read(4, "3w-9xxx Driver v", 16) = 16
close(4)= 0
open("/dev/twa0", O_RDWR)   = 4
close(4)= 0
stat("/sys/class/scsi_host/host7/stats", 0x7fffafd05290) = -1 ENOENT (No such file or directory)
stat("/sys/class/scsi_host/host5/stats", 0x7fffafd05290) = -1 ENOENT (No such file or directory)
stat("/sys/class/scsi_host/host3/stats", 0x7fffafd05290) = -1 ENOENT (No such file or directory)
stat("/sys/class/scsi_host/host1/stats", 0x7fffafd05290) = -1 ENOENT (No such file or directory)
stat("/sys/class/scsi_host/host8/stats", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
open("/sys/class/scsi_host/host8/stats", O_RDONLY) = 4
read(4, "3w-9xxx Driver v", 16) = 16
close(4)= 0
open("/dev/twa1", O_RDWR)   = 4
close(4)= 0
stat("/sys/class/scsi_host/host6/stats", 0x7fffafd05290) = -1 ENOENT (No such file or directory)
stat("/sys/class/scsi_host/host4/stats", 0x7fffafd05290) = -1 ENOENT (No such file or directory)
stat("/sys/class/scsi_host/host2/stats", 0x7fffafd05290) = -1 ENOENT (No such file or directory)
getdents64(3, /* 0 entries */, 4096)= 0
close(3)= 0
open("/proc/devices", O_RDONLY) = 3


Please find attached a wrapper from my colleague, which uses mount 
namespaces to ensure the ordering that `tw_cli` expects.



Kind regards,

Paul


[1] https://wiki.hetzner.de/index.php/3Ware_RAID_Controller/en
#! /usr/bin/perl
use strict;
use warnings;

# Sort "hostN" names numerically by N; fall back to string comparison
# when either name does not match.
sub sort_host {
    my ($n1,$n2);
    ($n1)=$a=~/^host(\d+)$/ and ($n2)=$b=~/^host(\d+)$/ and return $n1 <=> $n2;
    return $a cmp $b;
}


our $SYS_unshare=272;      # /usr/include/asm/unistd_64.h
our $CLONE_NEWNS=0x20000;  # /usr/include/linux/sched.h

my $pid=fork;
defined $pid or die "$!\n";
unless ($pid) {
    # Child: read the host names before the directory is shadowed below.
    opendir my $d,"/sys/class/scsi_host" or die "/sys/class/scsi_host: $!\n";
    my @names=sort sort_host grep !/^\.\.?$/,readdir $d;

    # Private mount namespace, so the mounts below stay local to this process.
    syscall($SYS_unshare,$CLONE_NEWNS) and die "$!\n";
    -d '/tmp/sysfs' or mkdir("/tmp/sysfs") or die "/tmp/sysfs: $!\n";
    system 'mount','-tsysfs','BLA','/tmp/sysfs' and exit 1;
    system 'mount','-ttmpfs','BLA','/sys/class/scsi_host' and exit 1;

    # Recreate the entries as symlinks on the tmpfs, in reverse sorted order.
    for my $name (reverse @names) {
        symlink("/tmp/sysfs/class/scsi_host/$name","/sys/class/scsi_host/$name") or die "/sys/class/scsi_host/$name: $!\n";
    }
    exec '/root/bin/tw_cli.exe',@ARGV;
    die "$!\n";
}
wait;
$? and exit 1;
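For scripts that read `/sys/class/scsi_host` themselves, the same ordering can be had without any namespace tricks by sorting explicitly. The following is a rough Python rendering of the `sort_host` comparator, fed with the directory order seen in the strace output above. `host_key` is a hypothetical name, and unlike the Perl version it places names that are not of the form `hostN` after all `hostN` names.

```python
import re


def host_key(name: str):
    """Order "hostN" names by the integer N; any other name sorts afterwards."""
    m = re.fullmatch(r"host(\d+)", name)
    return (0, int(m.group(1)), "") if m else (1, 0, name)


# Directory order as returned by getdents64 in the strace output above.
readdir_order = ["host0", "host9", "host7", "host5", "host3",
                 "host1", "host8", "host6", "host4", "host2"]

sorted_names = sorted(readdir_order, key=host_key)
print(sorted_names)

# A plain string sort would put "host10" before "host2"; the numeric key does not.
two_digit = sorted(["host10", "host2"], key=host_key)
```

This only helps for code under our control; `tw_cli` itself still sees whatever order the kernel returns, which is why the wrapper exists.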


Re: Ordering problems with 3ware controller

2016-11-08 Thread Paul Menzel

Dear Linux SCSI folks,


On 11/08/16 11:07, Paul Menzel wrote:


Updating from Linux 4.4.X to Linux 4.8.4, we noticed that the 3ware
devices under `/dev` – `/dev/twa0`, `/dev/twa1`, … – map to the
controllers differently.

This unfortunately breaks quite a lot of our scripts, as we depend on
the fact that the first controller is also in the front.


$ dmesg | grep 3ware
[   14.509238] 3ware 9000 Storage Controller device driver for Linux
v2.26.02.014.
[   14.824274] scsi host8: 3ware 9000 Storage Controller
[   14.824537] 3w-9xxx: scsi8: Found a 3ware 9000 Storage Controller
at 0xd020, IRQ: 17.
[   15.508310] scsi host9: 3ware 9000 Storage Controller
[   15.508569] 3w-9xxx: scsi9: Found a 3ware 9000 Storage Controller
at 0xda10, IRQ: 17.


Tracing `tw_cli`, it looks like the ordering of the devices in
`/sys/class/scsi_host` might have changed, as `getdents64` seems to
determine the order in which `/dev/twaX` is created.


$ find /sys/class/scsi_host/ -ls
 6033  0 drwxr-xr-x   2  root system  0 Nov  8
10:58 /sys/class/scsi_host/
23125  0 lrwxrwxrwx   1  root system  0 Oct 27
17:41 /sys/class/scsi_host/host0 ->
../../devices/pci:00/:00:07.0/ata1/host0/scsi_host/host0
29893  0 lrwxrwxrwx   1  root system  0 Oct 27
18:03 /sys/class/scsi_host/host9 ->
../../devices/pci:80/:80:0e.0/:90:00.0/host9/scsi_host/host9
23878  0 lrwxrwxrwx   1  root system  0 Oct 27
17:41 /sys/class/scsi_host/host7 ->
../../devices/pci:80/:80:08.0/ata8/host7/scsi_host/host7
23640  0 lrwxrwxrwx   1  root system  0 Oct 27
17:41 /sys/class/scsi_host/host5 ->
../../devices/pci:80/:80:07.0/ata6/host5/scsi_host/host5
23402  0 lrwxrwxrwx   1  root system  0 Oct 27
17:41 /sys/class/scsi_host/host3 ->
../../devices/pci:00/:00:08.0/ata4/host3/scsi_host/host3
23164  0 lrwxrwxrwx   1  root system  0 Oct 27
17:41 /sys/class/scsi_host/host1 ->
../../devices/pci:00/:00:07.0/ata2/host1/scsi_host/host1
29851  0 lrwxrwxrwx   1  root system  0 Oct 27
18:03 /sys/class/scsi_host/host8 ->
../../devices/pci:00/:00:0e.0/:05:00.0/host8/scsi_host/host8
23839  0 lrwxrwxrwx   1  root system  0 Oct 27
17:41 /sys/class/scsi_host/host6 ->
../../devices/pci:80/:80:08.0/ata7/host6/scsi_host/host6
23601  0 lrwxrwxrwx   1  root system  0 Oct 27
17:41 /sys/class/scsi_host/host4 ->
../../devices/pci:80/:80:07.0/ata5/host4/scsi_host/host4
23363  0 lrwxrwxrwx   1  root system  0 Oct 27
17:41 /sys/class/scsi_host/host2 ->
../../devices/pci:00/:00:08.0/ata3/host2/scsi_host/host2
$ sudo -i tw_cli show

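Ctl   Model         (V)Ports  Drives  Units  NotOpt  RRate  VRate  BBU
------------------------------------------------------------------------
c8    9650SE-8LPML  8         8       1      0       5      1      OK
c9    9690SA-8E     0         0       0      0       5      1      OK

Enclosure  Slots  Drives  Fans  TSUnits  PSUnits  Alarms
--------------------------------------------------------
/c9/e0     16     0       3     1        2        1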


So in this case `c8` is mapped to `/dev/twa1`, and `c9` to `/dev/twa0`.

As we do not know of a way to use `tw_cli` to find the correct mapping,
or of another place to look it up, we rely on the implicit ordering,
which – according to my colleagues – has worked for over 15 years [1].


Here is the excerpt from the manual page for smartctl [2].

> --- end of manual page excerpt ---

3ware,N - [FreeBSD and Linux only] the device consists of one or more ATA
disks connected to a 3ware RAID controller. The non-negative integer N (in
the range from 0 to 127 inclusive) denotes which disk on the controller is
monitored. Use syntax such as:
smartctl -a -d 3ware,2 /dev/sda
smartctl -a -d 3ware,0 /dev/twe0
smartctl -a -d 3ware,1 /dev/twa0
smartctl -a -d 3ware,1 /dev/twl0
The first two forms, which refer to devices /dev/sda-z and /dev/twe0-15,
may be used with 3ware series 6000, 7000, and 8000 series controllers that
use the 3w-xxxx driver. Note that the /dev/sda-z form is deprecated
starting with the Linux 2.6 kernel series and may not be supported by the
Linux kernel in the near future. The final form, which refers to devices
/dev/twa0-15, must be used with 3ware 9000 series controllers, which use
the 3w-9xxx driver.

The devices /dev/twl0-15 must be used with the 3ware/LSI 9750 series
controllers which use the 3w-sas driver.

Note that if the special character device nodes /dev/twl?, /dev/twa? and
/dev/twe? do not exist, or exist with the incorrect major or minor numbers,
smartctl will recreate them on the fly. Typically /dev/twa0 refers to the
first 9000-series controller, /dev/twa1 refers to the second 9000 series
controller, and so on. The /dev/twl

Ordering problems with 3ware controller

2016-11-08 Thread Paul Menzel

Dear Linux SCSI folks,


Updating from Linux 4.4.X to Linux 4.8.4, we noticed that the 3ware 
devices under `/dev` – `/dev/twa0`, `/dev/twa1`, … – map to the 
controllers differently.


This unfortunately breaks quite a lot of our scripts, as we depend on 
the fact that the first controller is also in the front.



$ dmesg | grep 3ware
[   14.509238] 3ware 9000 Storage Controller device driver for Linux 
v2.26.02.014.
[   14.824274] scsi host8: 3ware 9000 Storage Controller
[   14.824537] 3w-9xxx: scsi8: Found a 3ware 9000 Storage Controller at 
0xd020, IRQ: 17.
[   15.508310] scsi host9: 3ware 9000 Storage Controller
[   15.508569] 3w-9xxx: scsi9: Found a 3ware 9000 Storage Controller at 
0xda10, IRQ: 17.


Tracing `tw_cli`, it looks like the ordering of the devices in 
`/sys/class/scsi_host` might have changed, as `getdents64` seems to 
determine the order in which `/dev/twaX` is created.



$ find /sys/class/scsi_host/ -ls
 6033  0 drwxr-xr-x   2  root system  0 Nov  8 10:58 
/sys/class/scsi_host/
23125  0 lrwxrwxrwx   1  root system  0 Oct 27 17:41 
/sys/class/scsi_host/host0 -> 
../../devices/pci:00/:00:07.0/ata1/host0/scsi_host/host0
29893  0 lrwxrwxrwx   1  root system  0 Oct 27 18:03 
/sys/class/scsi_host/host9 -> 
../../devices/pci:80/:80:0e.0/:90:00.0/host9/scsi_host/host9
23878  0 lrwxrwxrwx   1  root system  0 Oct 27 17:41 
/sys/class/scsi_host/host7 -> 
../../devices/pci:80/:80:08.0/ata8/host7/scsi_host/host7
23640  0 lrwxrwxrwx   1  root system  0 Oct 27 17:41 
/sys/class/scsi_host/host5 -> 
../../devices/pci:80/:80:07.0/ata6/host5/scsi_host/host5
23402  0 lrwxrwxrwx   1  root system  0 Oct 27 17:41 
/sys/class/scsi_host/host3 -> 
../../devices/pci:00/:00:08.0/ata4/host3/scsi_host/host3
23164  0 lrwxrwxrwx   1  root system  0 Oct 27 17:41 
/sys/class/scsi_host/host1 -> 
../../devices/pci:00/:00:07.0/ata2/host1/scsi_host/host1
29851  0 lrwxrwxrwx   1  root system  0 Oct 27 18:03 
/sys/class/scsi_host/host8 -> 
../../devices/pci:00/:00:0e.0/:05:00.0/host8/scsi_host/host8
23839  0 lrwxrwxrwx   1  root system  0 Oct 27 17:41 
/sys/class/scsi_host/host6 -> 
../../devices/pci:80/:80:08.0/ata7/host6/scsi_host/host6
23601  0 lrwxrwxrwx   1  root system  0 Oct 27 17:41 
/sys/class/scsi_host/host4 -> 
../../devices/pci:80/:80:07.0/ata5/host4/scsi_host/host4
23363  0 lrwxrwxrwx   1  root system  0 Oct 27 17:41 
/sys/class/scsi_host/host2 -> 
../../devices/pci:00/:00:08.0/ata3/host2/scsi_host/host2
$ sudo -i tw_cli show

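Ctl   Model         (V)Ports  Drives  Units  NotOpt  RRate  VRate  BBU
------------------------------------------------------------------------
c8    9650SE-8LPML  8         8       1      0       5      1      OK
c9    9690SA-8E     0         0       0      0       5      1      OK

Enclosure  Slots  Drives  Fans  TSUnits  PSUnits  Alarms
--------------------------------------------------------
/c9/e0     16     0       3     1        2        1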


So in this case `c8` is mapped to `/dev/twa1`, and `c9` to `/dev/twa0`.
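To make the suspected mechanism concrete, here is a small sketch of how the mapping would come out if device nodes were handed out in directory-listing order. This is an inference about `tw_cli`, which is proprietary; `assign_twa` is a hypothetical helper, the listing order is taken from the `find` output above, and the set of 3ware hosts from the `dmesg` output.

```python
def assign_twa(dir_order, is_3ware):
    """Hand out /dev/twaN to 3ware hosts in the order the listing is walked."""
    mapping = {}
    for name in dir_order:
        if is_3ware(name):
            mapping[name] = f"/dev/twa{len(mapping)}"
    return mapping


# The two 3w-9xxx hosts reported by dmesg above.
three_ware = {"host8", "host9"}

# Order of the entries in the find listing above (Linux 4.8.4).
listing_4_8 = ["host0", "host9", "host7", "host5", "host3",
               "host1", "host8", "host6", "host4", "host2"]

# Numerically sorted order, which is what our scripts implicitly expect.
numeric = sorted(listing_4_8, key=lambda n: int(n[4:]))

observed = assign_twa(listing_4_8, three_ware.__contains__)
expected = assign_twa(numeric, three_ware.__contains__)
print("observed:", observed)
print("expected:", expected)
```

With the listing order from 4.8.4, host9 is encountered first and gets `/dev/twa0`, which matches the mapping described above; with a numerically sorted listing, host8 would come first.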

As we do not know of a way to use `tw_cli` to find the correct mapping, 
or of another place to look it up, we rely on the implicit ordering, 
which – according to my colleagues – has worked for over 15 years [1].


Do you know of a way to get the mapping “over an API”, so that we 
don’t have to rely on the implicit ordering?


Otherwise, do you know why the ordering has changed, and can this be 
reverted?



Kind regards,

Paul Menzel


[1] 
https://www.thomas-krenn.com/de/wiki/Smartmontools_mit_3ware_RAID_Controller

(German)


Re: NULL pointer dereference: IP: [] sr_runtime_suspend+0xc/0x20 [sr_mod]

2015-10-31 Thread Paul Menzel
Control: notfound -1 3.19-1~exp1
Control: found -1 4.2.5-1


On Tuesday, 20.10.2015, 02:39 +0100, Ben Hutchings wrote:
> On Fri, 2015-10-16 at 09:54 +0200, Paul Menzel wrote:
> [...]
> > > BUG: unable to handle kernel NULL pointer dereference at 0014
> > > IP: [] sr_runtime_suspend+0xc/0x20 [sr_mod]
> > > *pdpt = 3696e001 *pde = 00
> > > Oops:  [#1] SMP
> > > Modules linked in: sd_mod(+) sr_mod(+) cdrom ata_generic ohci_pci ahci 
> > > libahci pata_amd firewire_ohci firewire_core crc_itu_t forcedeth libata 
> > > scsi_mod ohci_hcd ehci_pci ehci_hcd usbcore usb_common fan thermal 
> > > thermal_sys floppy(+)
> > > CPU: 1 PID: 73 Comm: systemd-udevd Not tainted 4.2.0-1-686-pae #1 Debian 
> > > 4.2.3-1
> > > Hardware name: Packard Bell imedia S3210/WMCP78M, BIOS P01-B2 11/06/2009
> > > task: f68dd040 ti: f6988000 task.ti: f6988000
> > > EIP: 0060:[] EFLAGS: 00010246 CPU: 1
> > > EIP is at sr_runtime_suspend+0xc/0x20 [sr_mod]
> > > EAX:  EBX: f6a30cd8 ECX: f6c03d2c EDX: 
> > > ESI:  EDI: f828e100 EBP: f6989ba8 ESP: f6989b88
> > >  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> > > CR0: 8005003b CR2: 0014 CR3: 3696d780 CR4: 06f0
> > > Stack:
> > >  af83346c3  0001 fff5 f6a7d150 f6a30cd8 f6a30d3c 
> > >  f6989bbc c1390cb7 f6a30cd8 f8334660  f6989bd0 c1390d0f f6a30cd8
> > >  f8334660  f6989c0c c13916cb f694a614 f68dd040  0008
> > > Call Trace:
> > >  […] ? scsi_runtime_suspend+0x63/0xa0 [scsi_mod]
> > >  […] ? __rpm_callback+0x27/0x60
> > > […]
> [...]
> > Ben Hutchings asked me to test the patch below to get more debug
> > information.
> [...]
> 
> Well, that didn't help much.  Paul hit another oops, this time in
> sd_mod but again apparently related to runtime PM.  My patch only
> touched sr_mod.
> 
> This time he sent photos of the complete oops; see
> <https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=801925;filename=20151020_005.jpg;att=4;msg=15>
> and
> <https://bugs.debian.org/cgi-bin/bugreport.cgi?filename=20151020_006.jpg;bug=801925;att=3;msg=15>

After backing up my data, I tested a little more: with Linux
3.19 the drive is detected and the system boots.

Does anything stand out that changed in this area between Linux 3.19 and
4.1?


Thanks

Paul


Re: NULL pointer dereference: IP: [] sr_runtime_suspend+0xc/0x20 [sr_mod]

2015-10-16 Thread Paul Menzel
Package: linux-image-4.2.0-1-686-pae
Version: 4.2.3-2
Severity: important


Dear Linux SCSI folks,


please don’t include the address sub...@bugs.debian.org in your reply.


Am Freitag, den 16.10.2015, 03:05 +0200 schrieb Paul Menzel:

> I am using Debian Sid/unstable with Linux 4.2.3-1. After upgrading
> systemd from 227-1 to 227-2 [1] along with other packages, the system
> no longer starts: the /dev/md1 device does not seem to be found, and I
> am dropped into a shell from the initramfs (BusyBox).
> 
> As I only have wireless LAN and no serial or USB debug capabilities,
> and mounting a USB storage device did not work, I manually copied the
> beginning of the Oops.
> 
> ```
> BUG: unable to handle kernel NULL pointer dereference at 0014
> IP: [] sr_runtime_suspend+0xc/0x20 [sr_mod]
> *pdpt = 3696e001 *pde = 00
> Oops:  [#1] SMP
> Modules linked in: sd_mod(+) sr_mod(+) cdrom ata_generic ohci_pci ahci 
> libahci pata_amd firewire_ohci firewire_core crc_itu_t forcedeth libata 
> scsi_mod ohci_hcd ehci_pci ehci_hcd usbcore usb_common fan thermal 
> thermal_sys floppy(+)
> CPU: 1 PID: 73 Comm: systemd-udevd Not tainted 4.2.0-1-686-pae #1 Debian 
> 4.2.3-1
> Hardware name: Packard Bell imedia S3210/WMCP78M, BIOS P01-B2 11/06/2009
> task: f68dd040 ti: f6988000 task.ti: f6988000
> EIP: 0060:[] EFLAGS: 00010246 CPU: 1
> EIP is at sr_runtime_suspend+0xc/0x20 [sr_mod]
> EAX:  EBX: f6a30cd8 ECX: f6c03d2c EDX: 
> ESI:  EDI: f828e100 EBP: f6989ba8 ESP: f6989b88
>  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> CR0: 8005003b CR2: 0014 CR3: 3696d780 CR4: 06f0
> Stack:
>  af83346c3  0001 fff5 f6a7d150 f6a30cd8 f6a30d3c 
>  f6989bbc c1390cb7 f6a30cd8 f8334660  f6989bd0 c1390d0f f6a30cd8
>  f8334660  f6989c0c c13916cb f694a614 f68dd040  0008
> Call Trace:
>  […] ? scsi_runtime_suspend+0x63/0xa0 [scsi_mod]
>  […] ? __rpm_callback+0x27/0x60
> […]
> ```
> 
> I also tried booting Linux 4.1, and it fails the same way.
> 
> Is this a known problem, and has it been fixed in the meantime? It’d be
> great if you could help me get the system to boot again. Please tell me
> if you need more information to debug this issue, and I’ll do my best
> to get it.

Ben Hutchings asked me to test the patch below to get more debug
information.

```
diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index 8bd54a6..dd5b5b2 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -144,6 +144,12 @@ static int sr_runtime_suspend(struct device *dev)
 {
 	struct scsi_cd *cd = dev_get_drvdata(dev);
 
+	if (WARN_ON(!cd)) {
+		pr_info("%s: cd == NULL; power.usage_count = %d\n",
+			__func__, atomic_read(&dev->power.usage_count));
+		return 0;
+	}
+
 	if (cd->media_present)
 		return -EBUSY;
 	else
@@ -652,7 +658,13 @@ static int sr_probe(struct device *dev)
 	struct scsi_cd *cd;
 	int minor, error;
 
-	scsi_autopm_get_device(sdev);
+	error = scsi_autopm_get_device(sdev);
+	if (error) {
+		pr_err("%s: scsi_autopm_get_device returned %d\n",
+		       __func__, error);
+		return error;
+	}
+
 	error = -ENODEV;
 	if (sdev->type != TYPE_ROM && sdev->type != TYPE_WORM)
 		goto fail;
@@ -719,6 +731,9 @@ static int sr_probe(struct device *dev)
 	if (register_cdrom(&cd->cdi))
 		goto fail_put;
 
+	pr_info("%s: power.usage_count = %d\n",
+		__func__, atomic_read(&dev->power.usage_count));
+
 	/*
 	 * Initialize block layer runtime PM stuffs before the
 	 * periodic event checking request gets started in add_disk.
```
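
For anyone who wants to see the effect of the first hunk outside the
kernel, here is a minimal user-space sketch. It assumes hypothetical
stand-ins (`mock_device`, `mock_cd`, `mock_runtime_suspend`) that are not
kernel APIs; it only shows that a NULL driver-data pointer is caught
before `media_present` is read.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct scsi_cd. */
struct mock_cd {
	unsigned int media_present;
};

/* Hypothetical stand-in for struct device; drvdata plays the role of
 * the pointer that dev_get_drvdata() would return. */
struct mock_device {
	void *drvdata;
};

/* Same shape as the patched callback: return early instead of
 * dereferencing a NULL driver-data pointer. */
int mock_runtime_suspend(const struct mock_device *dev)
{
	const struct mock_cd *cd = dev->drvdata;

	if (cd == NULL)
		return 0;	/* the debug patch warns and returns 0 here */

	if (cd->media_present)
		return -EBUSY;
	return 0;
}
```

With `drvdata` set to NULL the function returns 0 rather than faulting;
with a `mock_cd` whose `media_present` is non-zero it returns `-EBUSY`.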

I’ll try that as soon as a spare drive has arrived, to which I can copy
the data as a backup.

More thoughts are welcome, especially on whether that error suggests a
failing drive.


Thanks,

Paul


> [1] http://metadata.ftp-master.debian.org/changelogs//main/s/systemd/systemd_227-2_changelog


Re: NULL pointer dereference: IP: [] sr_runtime_suspend+0xc/0x20 [sr_mod]

2015-10-16 Thread Paul Menzel
Dear Linux SCSI folks,


Am Freitag, den 16.10.2015, 09:54 +0200 schrieb Paul Menzel:
> Package: linux-image-4.2.0-1-686-pae
> Version: 4.2.3-2
> Severity: important

> please don’t include the address sub...@bugs.debian.org in your reply.

this issue is now also tracked in the Debian Bug Tracking System [2] and
has the number #801925 [3]. Please keep that address in CC.

> Am Freitag, den 16.10.2015, 03:05 +0200 schrieb Paul Menzel:
> 
> > I am using Debian Sid/unstable with Linux 4.2.3-1. After upgrading
> > systemd from 227-1 to 227-2 [1] along with other packages, the system
> > no longer starts: the /dev/md1 device does not seem to be found, and I
> > am dropped into a shell from the initramfs (BusyBox).
> > 
> > As I only have wireless LAN and no serial or USB debug capabilities,
> > and mounting a USB storage device did not work, I manually copied the
> > beginning of the Oops.
> > 
> > ```
> > BUG: unable to handle kernel NULL pointer dereference at 0014
> > IP: [] sr_runtime_suspend+0xc/0x20 [sr_mod]
> > *pdpt = 3696e001 *pde = 00
> > Oops:  [#1] SMP
> > Modules linked in: sd_mod(+) sr_mod(+) cdrom ata_generic ohci_pci ahci 
> > libahci pata_amd firewire_ohci firewire_core crc_itu_t forcedeth libata 
> > scsi_mod ohci_hcd ehci_pci ehci_hcd usbcore usb_common fan thermal 
> > thermal_sys floppy(+)
> > CPU: 1 PID: 73 Comm: systemd-udevd Not tainted 4.2.0-1-686-pae #1 Debian 
> > 4.2.3-1
> > Hardware name: Packard Bell imedia S3210/WMCP78M, BIOS P01-B2 11/06/2009
> > task: f68dd040 ti: f6988000 task.ti: f6988000
> > EIP: 0060:[] EFLAGS: 00010246 CPU: 1
> > EIP is at sr_runtime_suspend+0xc/0x20 [sr_mod]
> > EAX:  EBX: f6a30cd8 ECX: f6c03d2c EDX: 
> > ESI:  EDI: f828e100 EBP: f6989ba8 ESP: f6989b88
> >  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> > CR0: 8005003b CR2: 0014 CR3: 3696d780 CR4: 06f0
> > Stack:
> >  af83346c3  0001 fff5 f6a7d150 f6a30cd8 f6a30d3c 
> >  f6989bbc c1390cb7 f6a30cd8 f8334660  f6989bd0 c1390d0f f6a30cd8
> >  f8334660  f6989c0c c13916cb f694a614 f68dd040  0008
> > Call Trace:
> >  […] ? scsi_runtime_suspend+0x63/0xa0 [scsi_mod]
> >  […] ? __rpm_callback+0x27/0x60
> > […]
> > ```
> > 
> > I also tried booting Linux 4.1, and it fails the same way.
> > 
> > Is this a known problem, and has it been fixed in the meantime? It’d be
> > great if you could help me get the system to boot again. Please tell me
> > if you need more information to debug this issue, and I’ll do my best
> > to get it.
> 
> Ben Hutchings asked me to test the patch below to get more debug
> information.
> 
> ```
> diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
> index 8bd54a6..dd5b5b2 100644
> --- a/drivers/scsi/sr.c
> +++ b/drivers/scsi/sr.c
> @@ -144,6 +144,12 @@ static int sr_runtime_suspend(struct device *dev)
>  {
>  	struct scsi_cd *cd = dev_get_drvdata(dev);
>  
> +	if (WARN_ON(!cd)) {
> +		pr_info("%s: cd == NULL; power.usage_count = %d\n",
> +			__func__, atomic_read(&dev->power.usage_count));
> +		return 0;
> +	}
> +
>  	if (cd->media_present)
>  		return -EBUSY;
>  	else
> @@ -652,7 +658,13 @@ static int sr_probe(struct device *dev)
>  	struct scsi_cd *cd;
>  	int minor, error;
>  
> -	scsi_autopm_get_device(sdev);
> +	error = scsi_autopm_get_device(sdev);
> +	if (error) {
> +		pr_err("%s: scsi_autopm_get_device returned %d\n",
> +		       __func__, error);
> +		return error;
> +	}
> +
>  	error = -ENODEV;
>  	if (sdev->type != TYPE_ROM && sdev->type != TYPE_WORM)
>  		goto fail;
> @@ -719,6 +731,9 @@ static int sr_probe(struct device *dev)
>  	if (register_cdrom(&cd->cdi))
>  		goto fail_put;
>  
> +	pr_info("%s: power.usage_count = %d\n",
> +		__func__, atomic_read(&dev->power.usage_count));
> +
>  	/*
>  	 * Initialize block layer runtime PM stuffs before the
>  	 * periodic event checking request gets started in add_disk.
> ```
> 
> I’ll try that as soon as a spare drive has arrived, to which I can copy
> the data as a backup.
> 
> More thoughts are welcome, especially on whether that error suggests a
> failing drive.


Thanks,

Paul


> > [1] http://metadata.ftp-master.debian.org/changelogs//main/s/systemd/systemd_227-2_changelog
[2] https://www.debian.org/Bugs/
[3] https://bugs.debian.org/801925


NULL pointer dereference: IP: [] sr_runtime_suspend+0xc/0x20 [sr_mod]

2015-10-15 Thread Paul Menzel
Dear Linux SCSI folks,


I am using Debian Sid/unstable with Linux 4.2.3-1. After upgrading
systemd from 227-1 to 227-2 [1] along with other packages, the system
no longer starts: the /dev/md1 device does not seem to be found, and I
am dropped into a shell from the initramfs (BusyBox).

As I only have wireless LAN and no serial or USB debug capabilities,
and mounting a USB storage device did not work, I manually copied the
beginning of the Oops.

```
BUG: unable to handle kernel NULL pointer dereference at 0014
IP: [] sr_runtime_suspend+0xc/0x20 [sr_mod]
*pdpt = 3696e001 *pde = 00
Oops:  [#1] SMP
Modules linked in: sd_mod(+) sr_mod(+) cdrom ata_generic ohci_pci ahci libahci 
pata_amd firewire_ohci firewire_core crc_itu_t forcedeth libata scsi_mod 
ohci_hcd ehci_pci ehci_hcd usbcore usb_common fan thermal thermal_sys floppy(+)
CPU: 1 PID: 73 Comm: systemd-udevd Not tainted 4.2.0-1-686-pae #1 Debian 4.2.3-1
Hardware name: Packard Bell imedia S3210/WMCP78M, BIOS P01-B2 11/06/2009
task: f68dd040 ti: f6988000 task.ti: f6988000
EIP: 0060:[] EFLAGS: 00010246 CPU: 1
EIP is at sr_runtime_suspend+0xc/0x20 [sr_mod]
EAX:  EBX: f6a30cd8 ECX: f6c03d2c EDX: 
ESI:  EDI: f828e100 EBP: f6989ba8 ESP: f6989b88
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
CR0: 8005003b CR2: 0014 CR3: 3696d780 CR4: 06f0
Stack:
 af83346c3  0001 fff5 f6a7d150 f6a30cd8 f6a30d3c 
 f6989bbc c1390cb7 f6a30cd8 f8334660  f6989bd0 c1390d0f f6a30cd8
 f8334660  f6989c0c c13916cb f694a614 f68dd040  0008
Call Trace:
 […] ? scsi_runtime_suspend+0x63/0xa0 [scsi_mod]
 […] ? __rpm_callback+0x27/0x60
[…]
```
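
A note on reading the trace: the faulting address (CR2) is 0014, not 0,
which would be consistent with a member at offset 0x14 being read
through a NULL structure pointer. The sketch below illustrates that
arithmetic in user space. The structure `example_cd` and the helper
`flags_address` are hypothetical and do not claim to match the real
layout of `struct scsi_cd`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: five 32-bit members followed by a flags word. */
struct example_cd {
	uint32_t word0;
	uint32_t word1;
	uint32_t word2;
	uint32_t word3;
	uint32_t word4;
	uint32_t flags;	/* would hold a bit such as media_present */
};

/* Address touched by a read of ->flags when the structure starts at
 * `base`; with base == 0 (a NULL pointer) this is just the offset. */
uintptr_t flags_address(uintptr_t base)
{
	return base + offsetof(struct example_cd, flags);
}
```

Under this assumed layout, `flags_address(0)` evaluates to 0x14, the
same value the oops reports as the faulting address.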

I also tried booting Linux 4.1, and it fails the same way.

Is this a known problem, and has it been fixed in the meantime? It’d be
great if you could help me get the system to boot again. Please tell me
if you need more information to debug this issue, and I’ll do my best
to get it.


Thanks,

Paul


[1] http://metadata.ftp-master.debian.org/changelogs//main/s/systemd/systemd_227-2_changelog



[PATCH 1/3] Documentation: scsi.txt: Remove unused abbreviation lk

2012-08-14 Thread Paul Menzel
From: Paul Menzel paulepan...@users.sourceforge.net
Date: Tue, 14 Aug 2012 11:48:04 +0200

»lk« is not used anywhere in the document.

Signed-off-by: Paul Menzel paulepan...@users.sourceforge.net
---
 Documentation/scsi/scsi.txt |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/scsi/scsi.txt b/Documentation/scsi/scsi.txt
index 3d99d38..45b9c25 100644
--- a/Documentation/scsi/scsi.txt
+++ b/Documentation/scsi/scsi.txt
@@ -1,7 +1,7 @@
 SCSI subsystem documentation
 
 The Linux Documentation Project (LDP) maintains a document describing
-the SCSI subsystem in the Linux kernel (lk) 2.4 series. See:
+the SCSI subsystem in the Linux kernel 2.4 series. See:
 http://www.tldp.org/HOWTO/SCSI-2.4-HOWTO . The LDP has single
 and multiple page HTML renderings as well as postscript and pdf.
 It can also be found at:
-- 
1.7.10.4




[PATCH 2/3] Documentation/scsi/scsi.txt: Clean up typography and fix grammar

2012-08-14 Thread Paul Menzel
From: Paul Menzel paulepan...@users.sourceforge.net
Date: Tue, 14 Aug 2012 11:59:31 +0200

1. Consistently use SCSI and Linux.
2. Use two spaces between sentences.
3. Remove trailing white space.

Signed-off-by: Paul Menzel paulepan...@users.sourceforge.net
---
 Documentation/scsi/scsi.txt |   30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/Documentation/scsi/scsi.txt b/Documentation/scsi/scsi.txt
index 45b9c25..56afe6c 100644
--- a/Documentation/scsi/scsi.txt
+++ b/Documentation/scsi/scsi.txt
@@ -9,33 +9,33 @@ http://web.archive.org/web/*/http://www.torque.net/scsi/SCSI-2.4-HOWTO
 
 Notes on using modules in the SCSI subsystem
 
-The scsi support in the linux kernel can be modularized in a number of 
+The SCSI support in the Linux kernel can be modularized in a number of
 different ways depending upon the needs of the end user.  To understand
 your options, we should first define a few terms.
 
-The scsi-core (also known as the mid level) contains the core of scsi 
-support.  Without it you can do nothing with any of the other scsi drivers.
-The scsi core support can be a module (scsi_mod.o), or it can be built into
-the kernel. If the core is a module, it must be the first scsi module 
-loaded, and if you unload the modules, it will have to be the last one 
+The scsi-core (also known as the mid level) contains the core of SCSI
+support.  Without it you can do nothing with any of the other SCSI drivers.
+The SCSI core support can be a module (scsi_mod.o), or it can be built into
+the kernel. If the core is a module, it must be the first SCSI module
+loaded, and if you unload the modules, it will have to be the last one
 unloaded.  In practice the modprobe and rmmod commands (and autoclean)
 will enforce the correct ordering of loading and unloading modules in
 the SCSI subsystem.
 
-The individual upper and lower level drivers can be loaded in any order 
-once the scsi core is present in the kernel (either compiled in or loaded
+The individual upper and lower level drivers can be loaded in any order
+once the SCSI core is present in the kernel (either compiled in or loaded
 as a module).  The disk driver (sd_mod.o), cdrom driver (sr_mod.o),
-tape driver ** (st.o) and scsi generics driver (sg.o) represent the upper 
-level drivers to support the various assorted devices which can be 
-controlled.  You can for example load the tape driver to use the tape drive, 
+tape driver ** (st.o) and SCSI generics driver (sg.o) represent the upper
+level drivers to support the various assorted devices which can be
+controlled.  You can for example load the tape driver to use the tape drive,
 and then unload it once you have no further need for the driver (and release
 the associated memory).
 
 The lower level drivers are the ones that support the individual cards that
-are supported for the hardware platform that you are running under. Those
-individual cards are often called Host Bus Adapters (HBAs). For example the
-aic7xxx.o driver is used to control all recent SCSI controller cards from 
-Adaptec. Almost all lower level drivers can be built either as modules or 
+are supported for the hardware platform that you are running under.  Those
+individual cards are often called Host Bus Adapters (HBAs).  For example the
+aic7xxx.o driver is used to control all recent SCSI controller cards from
+Adaptec.  Almost all lower level drivers can be built either as modules or
 built into the kernel.
 
 
-- 
1.7.10.4




[PATCH 3/3] Documentation/scsi/scsi.txt: Remove wrong superfluous word »built«

2012-08-14 Thread Paul Menzel
From: Paul Menzel paulepan...@users.sourceforge.net
Date: Tue, 14 Aug 2012 12:01:51 +0200


Signed-off-by: Paul Menzel paulepan...@users.sourceforge.net
---
I am sending this as a separate patch because I am not a native speaker.
But I am fairly sure it is correct, because »either« comes after the
first »built«.

 Documentation/scsi/scsi.txt |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/scsi/scsi.txt b/Documentation/scsi/scsi.txt
index 56afe6c..71ab560 100644
--- a/Documentation/scsi/scsi.txt
+++ b/Documentation/scsi/scsi.txt
@@ -36,7 +36,7 @@ are supported for the hardware platform that you are running under.  Those
 individual cards are often called Host Bus Adapters (HBAs).  For example the
 aic7xxx.o driver is used to control all recent SCSI controller cards from
 Adaptec.  Almost all lower level drivers can be built either as modules or
-built into the kernel.
+into the kernel.
 
 
 ** There is a variant of the st driver for controlling OnStream tape
-- 
1.7.10.4




[PATCH] drivers/scsi/Kconfig: Remove reference to non-existent howtos

2012-08-14 Thread Paul Menzel
From: Paul Menzel paulepan...@users.sourceforge.net
Date: Tue, 14 Aug 2012 12:22:43 +0200

Searching for »scsi« at http://www.tldp.org/HOWTO/html_single/ I only
found »SCSI-2.4-HOWTO« [1].

The Linux 2.4 SCSI subsystem HOWTO

[1] http://www.tldp.org/HOWTO/html_single/SCSI-2.4-HOWTO/

Signed-off-by: Paul Menzel paulepan...@users.sourceforge.net
---
 drivers/scsi/Kconfig |5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index 74bf1aa..d717116 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -74,10 +74,7 @@ config BLK_DEV_SD
  If you want to use SCSI hard disks, Fibre Channel disks,
  Serial ATA (SATA) or Parallel ATA (PATA) hard disks,
  USB storage or the SCSI or parallel port version of
- the IOMEGA ZIP drive, say Y and read the SCSI-HOWTO,
- the Disk-HOWTO and the Multi-Disk-HOWTO, available from
- http://www.tldp.org/docs.html#howto. This is NOT for SCSI
- CD-ROMs.
+ the IOMEGA ZIP drive, say Y. This is NOT for SCSI CD-ROMs.
 
  To compile this driver as a module, choose M here and read
  file:Documentation/scsi/scsi.txt.
-- 
1.7.10.4

