Re: Regression 4.17-rc1: SSD doesn’t properly resume causing system hang (NULL pointer dereference)
Dear Bart,

On 04/25/18 14:26, Bart Van Assche wrote:
> On Wed, 2018-04-25 at 07:37 +0200, Paul Menzel wrote:
>> On 24.04.2018 at 23:17, Bart Van Assche wrote:
>>> On Tue, 2018-04-24 at 23:04 +0200, Paul Menzel wrote:
>>>> I applied your change, and rebuilt the Linux kernel. Unfortunately, it looks like it didn’t make a difference.
>>> In that case I don't know what is causing the failure. Can you run a bisect to determine which commit introduced this regression?
>> With `scsi_mod.use_blk_mq=n` the system resumes fine, so, for reasons unknown to me, that Kconfig option got selected in my Linux kernel configuration. I remember having similar issues when this was enabled by default in Linux 4.13-rc?, so it was just a configuration problem and not a regression. Unfortunately, the Linux configuration files are not under version control, so I cannot check, but it is probably my fault. Sorry for the noise, and please tell me what I can do to get the option working on this old device.
> Did the same system boot fine with a previous kernel with scsi-mq enabled?

No, as far as I know it never worked, see thread *[Regression 4.13-rc1] Resume does not work on Lenovo X60t* [1].

> Anyway, we would like to know the root cause so that this NULL pointer dereference can be fixed. There are plans to remove the legacy block layer in the not too distant future.

I’ll be happy to test proposed changes.

Kind regards,

Paul

PS: Your mailer also changed *doesn’t* to *doesn* in the subject line.

[1] https://www.spinics.net/lists/linux-scsi/msg111457.html
Re: Regression 4.17-rc1: SSD doesn properly resume causing system hang (NULL pointer dereference)
Dear Bart,

On 24.04.2018 at 23:17, Bart Van Assche wrote:
> On Tue, 2018-04-24 at 23:04 +0200, Paul Menzel wrote:
>> I applied your change, and rebuilt the Linux kernel. Unfortunately, it looks like it didn’t make a difference.
> In that case I don't know what is causing the failure. Can you run a bisect to determine which commit introduced this regression?

With `scsi_mod.use_blk_mq=n` the system resumes fine, so, for reasons unknown to me, that Kconfig option got selected in my Linux kernel configuration. I remember having similar issues when this was enabled by default in Linux 4.13-rc?, so it was just a configuration problem and not a regression. Unfortunately, the Linux configuration files are not under version control, so I cannot check, but it is probably my fault.

Sorry for the noise, and please tell me what I can do to get the option working on this old device.

Kind regards,

Paul
Re: Regression 4.17-rc1: SSD doesn properly resume causing system hang (NULL pointer dereference)
Dear Bart,

On 04/24/18 19:31, Bart Van Assche wrote:
> On Tue, 2018-04-24 at 19:10 +0200, Paul Menzel wrote:
>> Please find the configuration file attached. The log only has `initcall_debug no_console_suspend` added.
> What I was looking for in the .config is the following: CONFIG_SCSI_MQ_DEFAULT=y
> Can you also provide the disassembly output for blk_set_runtime_active, e.g. by loading vmlinux into gdb and by running the command "disas blk_set_runtime_active"?

Here it is, pasted as citation, as otherwise Thunderbird would wrap the line.

> (gdb) disas blk_set_runtime_active
> Dump of assembler code for function blk_set_runtime_active:
>    0xc1518610 <+0>:     call   0xc106ac9c <__fentry__>
>    0xc1518615 <+5>:     push   %ebp
>    0xc1518616 <+6>:     mov    %esp,%ebp
>    0xc1518618 <+8>:     sub    $0x14,%esp
>    0xc151861b <+11>:    mov    %ebx,-0xc(%ebp)
>    0xc151861e <+14>:    mov    %eax,%ebx
>    0xc1518620 <+16>:    mov    %gs:0x14,%eax
>    0xc1518626 <+22>:    mov    %eax,-0x10(%ebp)
>    0xc1518629 <+25>:    xor    %eax,%eax
>    0xc151862b <+27>:    test   %ebx,%ebx
>    0xc151862d <+29>:    mov    %esi,-0x8(%ebp)
>    0xc1518630 <+32>:    mov    %edi,-0x4(%ebp)
>    0xc1518633 <+35>:    je     0xc15186b3 <blk_set_runtime_active+163>
>    0xc1518635 <+37>:    mov    0xfc(%ebx),%eax
>    0xc151863b <+43>:    call   0xc1a4b920 <_raw_spin_lock_irq>
>    0xc1518640 <+48>:    mov    0x150(%ebx),%esi
>    0xc1518646 <+54>:    xor    %eax,%eax
>    0xc1518648 <+56>:    mov    0xc1ca7d20,%edi
>    0xc151864e <+62>:    mov    %eax,0x154(%ebx)
>    0xc1518654 <+68>:    cmp    $0xff0c,%esi
>    0xc151865a <+74>:    mov    %edi,-0x14(%ebp)
>    0xc151865d <+77>:    je     0xc15186a5 <blk_set_runtime_active+149>
>    0xc151865f <+79>:    mov    %edi,0xf4(%esi)
>    0xc1518665 <+85>:    mov    $0x9,%edx
>    0xc151866a <+90>:    mov    0x150(%ebx),%eax
>    0xc1518670 <+96>:    call   0xc175ab80 <__pm_runtime_suspend>
>    0xc1518675 <+101>:   mov    0xfc(%ebx),%eax
>    0xc151867b <+107>:   call   *0xc1ce2918
>    0xc1518681 <+113>:   call   *0xc1ce2888
>    0xc1518687 <+119>:   mov    -0x10(%ebp),%eax
>    0xc151868a <+122>:   xor    %gs:0x14,%eax
>    0xc1518691 <+129>:   jne    0xc15186a0 <blk_set_runtime_active+144>
>    0xc1518693 <+131>:   mov    -0xc(%ebp),%ebx
>    0xc1518696 <+134>:   mov    -0x8(%ebp),%esi
>    0xc1518699 <+137>:   mov    -0x4(%ebp),%edi
>    0xc151869c <+140>:   mov    %ebp,%esp
>    0xc151869e <+142>:   pop    %ebp
>    0xc151869f <+143>:   ret
>    0xc15186a0 <+144>:   call   0xc108c6c0 <__stack_chk_fail>
>    0xc15186a5 <+149>:   xor    %edx,%edx
>    0xc15186a7 <+151>:   mov    $0xc1ee14b4,%eax
>    0xc15186ac <+156>:   call   0xc15bb7f0 <__ubsan_handle_type_mismatch>
>    0xc15186b1 <+161>:   jmp    0xc151865f <blk_set_runtime_active+79>
>    0xc15186b3 <+163>:   xor    %edx,%edx
>    0xc15186b5 <+165>:   mov    $0xc1ee14cc,%eax
>    0xc15186ba <+170>:   call   0xc15bb7f0 <__ubsan_handle_type_mismatch>
>    0xc15186bf <+175>:   jmp    0xc1518635 <blk_set_runtime_active+37>
> End of assembler dump.

Kind regards,

Paul

PS: By the way, your mailer stripped the full names of my first message, and replaced the “names” with the email address.
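As a side note to the `.config` question above, the check for `CONFIG_SCSI_MQ_DEFAULT` can be scripted. The following is only a minimal sketch: the helper name `kconfig_value` and the three-line sample text are made up for illustration and are not taken from the attached configuration file.

```python
import re


def kconfig_value(config_text, option):
    """Return the value of a Kconfig option from .config text, or None if unset."""
    # Set options look like "CONFIG_FOO=y"; unset ones appear only as
    # "# CONFIG_FOO is not set" comments, which this pattern does not match.
    pattern = re.compile(r"^%s=(.*)$" % re.escape(option), re.MULTILINE)
    match = pattern.search(config_text)
    return match.group(1) if match else None


# Hypothetical excerpt, for illustration only.
sample = """\
CONFIG_SCSI=y
# CONFIG_SCSI_NETLINK is not set
CONFIG_SCSI_MQ_DEFAULT=y
"""

print(kconfig_value(sample, "CONFIG_SCSI_MQ_DEFAULT"))
```

To use it on a real build tree, the text of the `.config` file would be read and passed in place of `sample`.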
Re: aacraid: Difference in `/sys` between 4.14.13 and out of tree driver 55022
Dear Raghava,

On 01/15/18 21:22, Paul Menzel wrote:
> On 18.12.2017 at 19:09, Raghava Aditya Renukunta wrote:
>>> -----Original Message-----
>>> From: Paul Menzel [mailto:pmen...@molgen.mpg.de]
>>> Sent: Saturday, December 16, 2017 1:39 AM
>>> […]
>>> On 17.02.2017 at 20:29, Raghava Aditya Renukunta wrote:
>>>> […]
>>>>> Searching the vendor Web site, there is *Linux Driver Source 1.2.1-53005* available for download [1].
>>>> The latest upstream driver version is 50740. We will be reaching version 53005 in a couple of patch sets (~3).
>>>> http://git.kernel.org/cgit/linux/kernel/git/jejb/scsi.git/commit/?id=96f6a6134766de0d42a98c7758736dde16e0add5
>>> Thank you for the details. In our infrastructure we only want to use LTS Linux kernels, and the latest is 4.14. So right now, Linux 4.14.6 includes version 50834 [2], which is the same version currently in Linus’ master branch (4.15-rc3). Is that safe to use with async mode, or are you aware of problems, and should we always use the latest out of tree driver, which is at version 55022 and can be downloaded from the Microsemi server [3]?
>> Well, at this point I am in the process of creating a patch set that solves a kdump regression issue (should be out before the new year); other than that, the upstream driver is pretty much up to date. If kdump support is a must for you, I would recommend that 55022 be used.
> We tried Linux 4.14.13, and noticed a difference. My colleague commented as below [4].

The problem is still present in Linux 4.14.23.

> Here is the location of the missing sysfs parts via the short path, where the `device` part is missing:
> ```
> # ls -la /sys/class/enclosure/7:0:80:0/Disk001
> drwxr-xr-x  3 root system    0 Jan 10 13:08 .
> drwxr-xr-x 19 root system    0 Jan 10 13:07 ..
> -rw-r--r--  1 root system 4096 Jan 11 12:56 active
> lrwxrwxrwx  1 root system    0 Jan 10 13:08 device -> ../../../../../../../port-7:1/end_device-7:1/target7:0:65/7:0:65:0
> -rw-r--r--  1 root system 4096 Jan 11 12:56 fault
> -rw-r--r--  1 root system 4096 Jan 11 12:56 locate
> drwxr-xr-x  2 root system    0 Jan 11 12:56 power
> -rw-r--r--  1 root system 4096 Jan 11 12:56 power_status
> -r--r--r--  1 root system 4096 Jan 11 12:56 slot
> -rw-r--r--  1 root system 4096 Jan 11 12:56 status
> -r--r--r--  1 root system 4096 Jan 11 12:56 type
> -rw-r--r--  1 root system 4096 Jan 11 12:56 uevent
> ```
> The true location would be:
> ```
> /sys/devices/pci:00/:00:03.0/:04:00.0/host7/port-7:16/end_device-7:16/target7:0:80/7:0:80:0/enclosure/7:0:80:0/Disk001
> ```
> Could you point me to a commit bringing the in tree driver on par with the out of tree driver?

It’d be great if you could point us to the relevant source showing how the device link can be created.

Kind regards,

Paul

[1] https://storage.microsemi.com/en-us/speed/raid/aac/linux/aacraid-linux-src-1_2_1-53005_tgz.php
[2] https://elixir.free-electrons.com/linux/v4.14.6/source/drivers/scsi/aacraid/aacraid.h#L100
[3] https://storage.microsemi.com/en-us/downloads/linux_source/linux_source_code/productid=aha-1000-8e=microsemi+adaptec+hba+1000-8e.php
[4] https://github.molgen.mpg.de/mariux64/bee-files/pull/571#issuecomment-4468
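For comparing the in tree and out of tree drivers, the presence of the `device` link can be checked for all components of an enclosure at once. This is a rough sketch under assumptions: the function name is invented here, and it treats every subdirectory that carries a `slot` attribute (as in the listing above) as an enclosure component.

```python
import os


def components_without_device_link(enclosure_dir):
    """Return names of enclosure components that lack a `device` symlink."""
    missing = []
    for name in sorted(os.listdir(enclosure_dir)):
        component = os.path.join(enclosure_dir, name)
        # Assumption: component directories (e.g. Disk001) have a `slot`
        # attribute file; other entries such as `power` do not.
        if not os.path.isfile(os.path.join(component, "slot")):
            continue
        if not os.path.islink(os.path.join(component, "device")):
            missing.append(name)
    return missing
```

Calling it with a path such as `/sys/class/enclosure/7:0:80:0` under each driver version would list the components whose link is absent.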
aacraid: Difference in `/sys` between 4.14.13 and out of tree driver 55022 (was: Driver version for PMC Adaptec HBA in Linux and from vendor)
Dear Raghava,

On 18.12.2017 at 19:09, Raghava Aditya Renukunta wrote:
>> -----Original Message-----
>> From: Paul Menzel [mailto:pmen...@molgen.mpg.de]
>> Sent: Saturday, December 16, 2017 1:39 AM
>> […]
>> On 17.02.2017 at 20:29, Raghava Aditya Renukunta wrote:
>>> […]
>>>> Searching the vendor Web site, there is *Linux Driver Source 1.2.1-53005* available for download [1].
>>> The latest upstream driver version is 50740. We will be reaching version 53005 in a couple of patch sets (~3).
>>> http://git.kernel.org/cgit/linux/kernel/git/jejb/scsi.git/commit/?id=96f6a6134766de0d42a98c7758736dde16e0add5
>> Thank you for the details. In our infrastructure we only want to use LTS Linux kernels, and the latest is 4.14. So right now, Linux 4.14.6 includes version 50834 [2], which is the same version currently in Linus’ master branch (4.15-rc3). Is that safe to use with async mode, or are you aware of problems, and should we always use the latest out of tree driver, which is at version 55022 and can be downloaded from the Microsemi server [3]?
> Well, at this point I am in the process of creating a patch set that solves a kdump regression issue (should be out before the new year); other than that, the upstream driver is pretty much up to date. If kdump support is a must for you, I would recommend that 55022 be used.

We tried Linux 4.14.13, and noticed a difference. My colleague commented as below [4].

> Here is the location of the missing sysfs parts via the short path, where the `device` part is missing:
> ```
> # ls -la /sys/class/enclosure/7:0:80:0/Disk001
> drwxr-xr-x  3 root system    0 Jan 10 13:08 .
> drwxr-xr-x 19 root system    0 Jan 10 13:07 ..
> -rw-r--r--  1 root system 4096 Jan 11 12:56 active
> lrwxrwxrwx  1 root system    0 Jan 10 13:08 device -> ../../../../../../../port-7:1/end_device-7:1/target7:0:65/7:0:65:0
> -rw-r--r--  1 root system 4096 Jan 11 12:56 fault
> -rw-r--r--  1 root system 4096 Jan 11 12:56 locate
> drwxr-xr-x  2 root system    0 Jan 11 12:56 power
> -rw-r--r--  1 root system 4096 Jan 11 12:56 power_status
> -r--r--r--  1 root system 4096 Jan 11 12:56 slot
> -rw-r--r--  1 root system 4096 Jan 11 12:56 status
> -r--r--r--  1 root system 4096 Jan 11 12:56 type
> -rw-r--r--  1 root system 4096 Jan 11 12:56 uevent
> ```
> The true location would be:
> ```
> /sys/devices/pci:00/:00:03.0/:04:00.0/host7/port-7:16/end_device-7:16/target7:0:80/7:0:80:0/enclosure/7:0:80:0/Disk001
> ```

Could you point me to a commit bringing the in tree driver on par with the out of tree driver?

[…]

Kind regards,

Paul

[1] https://storage.microsemi.com/en-us/speed/raid/aac/linux/aacraid-linux-src-1_2_1-53005_tgz.php
[2] https://elixir.free-electrons.com/linux/v4.14.6/source/drivers/scsi/aacraid/aacraid.h#L100
[3] https://storage.microsemi.com/en-us/downloads/linux_source/linux_source_code/productid=aha-1000-8e=microsemi+adaptec+hba+1000-8e.php
[4] https://github.molgen.mpg.de/mariux64/bee-files/pull/571#issuecomment-4468
Re: Driver version for PMC Adaptec HBA in Linux and from vendor
Dear Raghava Aditya, Thank you for your answer. Am 18.12.2017 um 19:09 schrieb Raghava Aditya Renukunta: -Original Message- From: Paul Menzel [mailto:pmen...@molgen.mpg.de] Sent: Saturday, December 16, 2017 1:39 AM To: Raghava Aditya Renukunta <raghavaaditya.renuku...@microsemi.com>; dl-esc-Aacraid Linux Driver <aacr...@microsemi.com> Cc: linux-scsi@vger.kernel.org; it+linux-s...@vger.kernel.org Subject: Re: Driver version for PMC Adaptec HBA in Linux and from vendor Am 17.02.2017 um 20:29 schrieb Raghava Aditya Renukunta: Using a PMC Adaptec HBA 1000-8e with latest Linux, it only initializes in sync mode, instead of async mode. The patches that enable async mode in HBA 1000-8e, have been included in the James Bottomley's linux-scsi Branch and are on track be Included into Linux 4.11. https://git.kernel.org/cgit/linux/kernel/git/jejb/scsi.git/ ``` $ git describe --tag v4.10-rc8-47-g0722f57bf $ dmesg [ 21.359635] Adaptec aacraid driver 1.2-1[41066]-ms [ 21.360017] aacraid :04:00.0: can't disable ASPM; OS doesn't have ASPM control [ 21.363987] AAC0: Async. mode not supported by current driver, sync. mode enforced. [ 21.363987] Please update driver to get full performance. [ 21.364949] AAC0: kernel 1.2-0[0] Nov 5 2015 [ 21.365275] AAC0: monitor 0.0-0[0] [ 21.371382] AAC0: bios 0.13-209[32000] [ 21.371711] AAC0: serial 10F447 [ 21.372035] AAC0: Non-DASD support enabled. [ 21.372360] AAC0: 64bit support enabled. [ 21.372688] AAC0: 64 Bit DAC enabled […] $ git grep 'AAC_DRIVER_BUILD 41066' drivers/scsi/aacraid/aacraid.h:# define AAC_DRIVER_BUILD 41066 ``` Searching the vendor Web site, there is *Linux Driver Source 1.2.1-53005* available for download [1]. The latest upstream driver version is 50740. We will be reaching version 53005 in couple of patch sets ( ~ 3). http://git.kernel.org/cgit/linux/kernel/git/jejb/scsi.git/commit/?id=96f6a613 4766de0d42a98c7758736dde16e0add5 Thank you for the details. 
At our infrastructure we only want to use LTS Linux kernels, and the latest in 4.14. So right now, Linux 4.14.6 includes version 50834 [1], which is the same version currently in Linus master branch (4.15-rc3). Is that save to use with async mode, or are you aware of problems and we should always use the latest out of tree driver, which is at version 55022 and can be download from the Microsemi server [3]. Well at this point I am in the process of creating a patch set that solves a kdump regression issue(Should be out before the new year), other than that the upstream driver is pretty much up to date. If kdump support is a must for you I would recommend that 55022 be used. From your answer the state of async support is unclear to me. Could you please clarify, if that’s support in 4.14.x? (What source line do I need to check?) How does the upstream process work? Is there a git repository somewhere from Microsemi? Are the patches already up for review? (I didn’t find them.) We try to push out patch sets to kernel.org for every major driver release we make. Usually they go into the sub component maintainers branch (linux-scsi ) , which is then pushed out to Linus when the merge window for opens (currently the merge window for 4.10 is closed , barring fixes). So Linux version 4.11 should have full async support and more for HBA1000-8e. We do not maintain a git repository unfortunately, but we do release the >>> source code for every release as you indicated. For further reference the patches are sent out in the scsi mailing list linux-scsi@vger.kernel.org , the archive is here http://marc.info/?l=linux-scsi=1=2 . Hope I cleared up your doubts. Please do reach out if you have other concerns or questions. Yes, thank you for your elaborate answer, which cleared up a lot of my doubts. We would be even more satisfied if you moved your development fully to the Linux kernel tree, so that it always carries the latest driver. 
If we can help with that by contacting certain people, please tell us. We would love to, but we have lots of customers who are on the older kernel versions 2.6.32, 3.10.0 etc and It becomes almost impossible for us to fully move our development to the Linux kernel tree and support our customers at the same time. Hopefully we will start being up to date with the upstream kernel in the coming months. Hope that answered your questions. I understand, but doesn’t it make more sense to adapt the model like done for Linux Long Term Support (LTS) series to develop against the latest Linux kernel, and then backport the corresponding patches? Maybe you should talk to Red Hat and SUSE? I guess that’s the systems you have to support. Probably you already talk to them. Kind regards, Paul [1] https://storage.microsemi.com/en-us/speed/raid/aac/linux/aacraid-linux-src-1_2_1-53005_tgz.php [2] https://elixir.free-electrons.com/linux/v4.14.6/source/drivers/scsi/aacraid/aacr
Re: Driver version for PMC Adaptec HBA in Linux and from vendor
[Corrected email address.] Am 16.12.2017 um 10:39 schrieb Paul Menzel: Dear Aditya, Am 17.02.2017 um 20:29 schrieb Raghava Aditya Renukunta: Using a PMC Adaptec HBA 1000-8e with latest Linux, it only initializes in sync mode, instead of async mode. The patches that enable async mode in HBA 1000-8e, have been included in the James Bottomley's linux-scsi Branch and are on track be Included into Linux 4.11. https://git.kernel.org/cgit/linux/kernel/git/jejb/scsi.git/ ``` $ git describe --tag v4.10-rc8-47-g0722f57bf $ dmesg [ 21.359635] Adaptec aacraid driver 1.2-1[41066]-ms [ 21.360017] aacraid :04:00.0: can't disable ASPM; OS doesn't have ASPM control [ 21.363987] AAC0: Async. mode not supported by current driver, sync. mode enforced. [ 21.363987] Please update driver to get full performance. [ 21.364949] AAC0: kernel 1.2-0[0] Nov 5 2015 [ 21.365275] AAC0: monitor 0.0-0[0] [ 21.371382] AAC0: bios 0.13-209[32000] [ 21.371711] AAC0: serial 10F447 [ 21.372035] AAC0: Non-DASD support enabled. [ 21.372360] AAC0: 64bit support enabled. [ 21.372688] AAC0: 64 Bit DAC enabled […] $ git grep 'AAC_DRIVER_BUILD 41066' drivers/scsi/aacraid/aacraid.h:# define AAC_DRIVER_BUILD 41066 ``` Searching the vendor Web site, there is *Linux Driver Source 1.2.1-53005* available for download [1]. The latest upstream driver version is 50740. We will be reaching version 53005 in couple of patch sets ( ~ 3). http://git.kernel.org/cgit/linux/kernel/git/jejb/scsi.git/commit/?id=96f6a6134766de0d42a98c7758736dde16e0add5 Thank you for the details. At our infrastructure we only want to use LTS Linux kernels, and the latest in 4.14. So right now, Linux 4.14.6 includes version 50834 [1], which is the same version currently in Linus master branch (4.15-rc3). Is that save to use with async mode, or are you aware of problems and we should always use the latest out of tree driver, which is at version 55022 and can be download from the Microsemi server [3]. How does the upstream process work? 
Is there a git repository somewhere from Microsemi? Are the patches already up for review? (I didn’t find them.) We try to push out patch sets to kernel.org for every major driver release we make. Usually they go into the sub component maintainers branch (linux-scsi ) , which is then pushed out to Linus when the merge window for opens (currently the merge window for 4.10 is closed , barring fixes). So Linux version 4.11 should have full async support and more for HBA1000-8e. We do not maintain a git repository unfortunately, but we do release the source code for every release as you indicated. For further reference the patches are sent out in the scsi mailing list linux-scsi@vger.kernel.org , the archive is here http://marc.info/?l=linux-scsi=1=2 . Hope I cleared up your doubts. Please do reach out if you have other concerns or questions. Yes, thank you for your elaborate answer, which cleared up a lot of my doubts. We would be even more satisfied if you moved your development fully to the Linux kernel tree, so that it always carries the latest driver. If we can help with that by contacting certain people, please tell us. Kind regards, Paul [1] https://storage.microsemi.com/en-us/speed/raid/aac/linux/aacraid-linux-src-1_2_1-53005_tgz.php [2] https://elixir.free-electrons.com/linux/v4.14.6/source/drivers/scsi/aacraid/aacraid.h#L100 [3] https://storage.microsemi.com/en-us/downloads/linux_source/linux_source_code/productid=aha-1000-8e=microsemi+adaptec+hba+1000-8e.php
Re: Driver version for PMC Adaptec HBA in Linux and from vendor
Dear Aditya, Am 17.02.2017 um 20:29 schrieb Raghava Aditya Renukunta: Using a PMC Adaptec HBA 1000-8e with latest Linux, it only initializes in sync mode, instead of async mode. The patches that enable async mode in HBA 1000-8e, have been included in the James Bottomley's linux-scsi Branch and are on track be Included into Linux 4.11. https://git.kernel.org/cgit/linux/kernel/git/jejb/scsi.git/ ``` $ git describe --tag v4.10-rc8-47-g0722f57bf $ dmesg [ 21.359635] Adaptec aacraid driver 1.2-1[41066]-ms [ 21.360017] aacraid :04:00.0: can't disable ASPM; OS doesn't have ASPM control [ 21.363987] AAC0: Async. mode not supported by current driver, sync. mode enforced. [ 21.363987] Please update driver to get full performance. [ 21.364949] AAC0: kernel 1.2-0[0] Nov 5 2015 [ 21.365275] AAC0: monitor 0.0-0[0] [ 21.371382] AAC0: bios 0.13-209[32000] [ 21.371711] AAC0: serial 10F447 [ 21.372035] AAC0: Non-DASD support enabled. [ 21.372360] AAC0: 64bit support enabled. [ 21.372688] AAC0: 64 Bit DAC enabled […] $ git grep 'AAC_DRIVER_BUILD 41066' drivers/scsi/aacraid/aacraid.h:# define AAC_DRIVER_BUILD 41066 ``` Searching the vendor Web site, there is *Linux Driver Source 1.2.1-53005* available for download [1]. The latest upstream driver version is 50740. We will be reaching version 53005 in couple of patch sets ( ~ 3). http://git.kernel.org/cgit/linux/kernel/git/jejb/scsi.git/commit/?id=96f6a6134766de0d42a98c7758736dde16e0add5 Thank you for the details. At our infrastructure we only want to use LTS Linux kernels, and the latest in 4.14. So right now, Linux 4.14.6 includes version 50834 [1], which is the same version currently in Linus master branch (4.15-rc3). Is that save to use with async mode, or are you aware of problems and we should always use the latest out of tree driver, which is at version 55022 and can be download from the Microsemi server [3]. How does the upstream process work? Is there a git repository somewhere from Microsemi? 
Are the patches already up for review? (I didn’t find them.) We try to push out patch sets to kernel.org for every major driver release we make. Usually they go into the sub component maintainers branch (linux-scsi ) , which is then pushed out to Linus when the merge window for opens (currently the merge window for 4.10 is closed , barring fixes). So Linux version 4.11 should have full async support and more for HBA1000-8e. We do not maintain a git repository unfortunately, but we do release the source code for every release as you indicated. For further reference the patches are sent out in the scsi mailing list linux-scsi@vger.kernel.org , the archive is here http://marc.info/?l=linux-scsi=1=2 . Hope I cleared up your doubts. Please do reach out if you have other concerns or questions. Yes, thank you for your elaborate answer, which cleared up a lot of my doubts. We would be even more satisfied if you moved your development fully to the Linux kernel tree, so that it always carries the latest driver. If we can help with that by contacting certain people, please tell us. Kind regards, Paul [1] https://storage.microsemi.com/en-us/speed/raid/aac/linux/aacraid-linux-src-1_2_1-53005_tgz.php [2] https://elixir.free-electrons.com/linux/v4.14.6/source/drivers/scsi/aacraid/aacraid.h#L100 [3] https://storage.microsemi.com/en-us/downloads/linux_source/linux_source_code/productid=aha-1000-8e=microsemi+adaptec+hba+1000-8e.php
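The `git grep 'AAC_DRIVER_BUILD 41066'` check quoted above can also be done programmatically when comparing several kernel trees against the vendor release numbers. A minimal sketch follows; the function name is an invention for this note, and the only input assumed is the text of `drivers/scsi/aacraid/aacraid.h`.

```python
import re


def aac_driver_build(header_text):
    """Extract the AAC_DRIVER_BUILD number from aacraid.h text, or None."""
    # Matches lines of the form "# define AAC_DRIVER_BUILD 41066".
    match = re.search(
        r"^\s*#\s*define\s+AAC_DRIVER_BUILD\s+(\d+)", header_text, re.MULTILINE
    )
    return int(match.group(1)) if match else None


# The define line as shown in the git grep output above.
print(aac_driver_build("# define AAC_DRIVER_BUILD 41066"))
```

The returned integer can then be compared directly with a vendor build number such as 53005 or 55022.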
Driver version for PMC Adaptec HBA in Linux and from vendor
Dear Raghava, dear Linux folks,

When evaluating HBA extension cards, one of our key requirements is easy maintenance, especially when upgrading the firmware. You provide the utility `arcconf` [1], which can be used for such tasks directly on the command line.

Unfortunately, we can’t find the source code for this application, which is something we’d like to have when executing programs with root privileges. It’d be great to have something similar to flashrom [2], or the source of your program.

Do you know the reasons why the source of this utility is not published under a free license? Who can be contacted to discuss this issue further?

Kind regards,

Paul

[1] http://download.adaptec.com/raid/storage_manager/arcconf_v2_05_22932.zip
[2] https://www.flashrom.org/
Re: [Regression 4.13-rc1] Resume does not work on Lenovo X60t
Dear Christoph,

On 08/23/17 14:15, Paul Menzel wrote:
> On 08/23/17 13:48, Christoph Hellwig wrote:
>> Are you running with blk-mq enabled? Also this never occurred with 4.12, right? Were you also running with or without blk-mq for scsi there?
> To my knowledge, I am using the defaults from Debian 9. I’ll check in one week, as I am away from the system.

It looks like I was using blk-mq, as it was the default up to commit cbe7dfa26eee (Revert "scsi: default to scsi-mq"). So with Linux 4.13-rc7 and disabling blk-mq for SCSI, the system is functional again after resume.

Kind regards,

Paul
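To answer the "are you running with blk-mq enabled?" question without guessing from distribution defaults, the running value of the `use_blk_mq` parameter of `scsi_mod` can be read from sysfs. The sketch below assumes the usual `/sys/module/<module>/parameters/<name>` layout and a kernel that still has this parameter; the function name is made up for this note.

```python
from pathlib import Path

# Standard sysfs location for module parameters; the file is absent on
# kernels that do not have the use_blk_mq parameter.
PARAM = Path("/sys/module/scsi_mod/parameters/use_blk_mq")


def scsi_uses_blk_mq(param_path=PARAM):
    """Return True for Y, False for N, or None if the value is unavailable."""
    try:
        value = Path(param_path).read_text().strip()
    except OSError:
        return None
    return {"Y": True, "N": False}.get(value)


if __name__ == "__main__":
    print(scsi_uses_blk_mq())
```

A result of `None` only means the file could not be read, so it should not be taken as evidence for either I/O path.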
Re: [Regression 4.13-rc1] Resume does not work on Lenovo X60t
Dear Christoph,

On 08/23/17 13:48, Christoph Hellwig wrote:
> Are you running with blk-mq enabled? Also this never occurred with 4.12, right? Were you also running with or without blk-mq for scsi there?

To my knowledge, I am using the defaults from Debian 9. I’ll check in one week, as I am away from the system.

Kind regards,

Paul
Re: [Regression 4.13-rc1] Resume does not work on Lenovo X60t
Dear Christoph, On 2017-08-21 20:41, Christoph Hellwig wrote: with 4.13-rc6 we're not using blk-mq by default any more, do you still see the issue with that one? Yes, I do see it this commit 6470812e2226 (Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc). ``` 00.831: [ 575.945132] BUG: unable to handle kernel NULL pointer dereference at 00f4 00.830: [ 575.948009] IP: blk_set_runtime_active+0x27/0x60 00.830: [ 575.948009] *pde = 00.831: [ 575.948009] 00.831: [ 575.948009] Oops: 0002 [#1] SMP 00.831: [ 575.948009] Modules linked in: joydev wacom_w8001 serport cpufreq_powersave cpufreq_conservative cpufreq_userspace binfmt_misc iTCO_wdt iTCO_vendor_support arc4 coretemp snd_hda_codec_analog snd_hda_codec_generic iwl3945 snd_hda_intel pcmcia iwlegacy snd_hda_codec kvm mac80211 snd_hda_core irqbypass yenta_socket snd_pcsp lpc_ich snd_hwdep thinkpad_acpi pcmcia_rsrc mfd_core serio_raw snd_pcm sg pcmcia_core nvram cfg80211 snd_timer rng_core snd rfkill battery soundcore shpchp evdev ac acpi_cpufreq parport_pc ppdev lp parport ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb cbc algif_skcipher af_alg dm_crypt dm_mod sr_mod cdrom sd_mod ata_generic ahci libahci sdhci_pci firewire_ohci ata_piix sdhci firewire_core libata e1000e i2c_i801 psmouse mmc_core crc_itu_t ptp scsi_mod i915 pps_core 00.831: [ 575.948009] ehci_pci video button uhci_hcd i2c_algo_bit ehci_hcd drm_kms_helper thermal usbcore syscopyarea sysfillrect sysimgblt fb_sys_fops drm 00.831: [ 575.948009] CPU: 0 PID: 1126 Comm: kworker/u4:36 Not tainted 4.13.0-rc6+ #110 00.831: [ 575.948009] Hardware name: LENOVO 636338U/636338U, BIOS CBET4000 TIMELESS 01/01/1970 00.831: [ 575.948009] Workqueue: events_unbound async_run_entry_fn 00.831: [ 575.948009] task: f2ed8bc0 task.stack: f2ecc000 00.831: [ 575.948009] EIP: blk_set_runtime_active+0x27/0x60 00.831: [ 575.948009] EFLAGS: 00010046 CPU: 0 00.831: [ 575.948009] EAX: EBX: f5f3f820 ECX: f5f3f918 EDX: 00010d7b 00.831: [ 575.948009] ESI: 
f8ac3cc0 EDI: 0010 EBP: 0010 ESP: f2ecdea4 00.831: [ 575.948009] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 00.831: [ 575.948009] CR0: 80050033 CR2: 00f4 CR3: 0e3a9000 CR4: 06d0 00.831: [ 575.948009] Call Trace: 00.831: [ 575.948009] ? scsi_bus_resume_common+0x6e/0x110 [scsi_mod] 00.831: [ 575.948009] ? dpm_run_callback+0x4f/0x150 00.831: [ 575.948009] ? wait_for_completion+0x29/0x140 00.831: [ 575.948009] ? scsi_bus_thaw+0x10/0x10 [scsi_mod] 00.831: [ 575.948009] ? device_resume+0x8e/0x180 00.831: [ 575.948009] ? async_resume+0x1b/0x40 00.831: [ 575.948009] ? async_run_entry_fn+0x3f/0x1a0 00.831: [ 575.948009] ? process_one_work+0x136/0x310 00.831: [ 575.948009] ? worker_thread+0x39/0x3b0 00.831: [ 575.948009] ? kthread+0xd7/0x110 00.831: [ 575.948009] ? process_one_work+0x310/0x310 00.831: [ 575.948009] ? kthread_create_on_node+0x30/0x30 00.831: [ 575.948009] ? ret_from_fork+0x19/0x24 00.831: [ 575.948009] Code: 8d 74 26 00 3e 8d 74 26 00 53 89 c3 8b 80 fc 00 00 00 e8 2d 48 32 00 31 c0 8b 15 20 9e 24 ce 89 83 54 01 00 00 8b 83 50 01 00 00 <89> 90 f4 00 00 00 ba 09 00 00 00 8b 83 50 01 00 00 e8 f3 f2 16 00.831: [ 575.948009] EIP: blk_set_runtime_active+0x27/0x60 SS:ESP: 0068:f2ecdea4 00.831: [ 575.948009] CR2: 00f4 00.831: [ 575.948009] ---[ end trace b3f1ac10115418ab ]--- 00.831: [ 576.195662] pciehp :00:1c.0:pcie004: Timeout on hotplug command 0x1038 (issued 574920 msec ago) 00.831: [ 576.204847] pciehp :00:1c.0:pcie004: Device :01:00.0 already exists at :01:00, cannot hot-add 00.832: [ 576.214460] pciehp :00:1c.0:pcie004: Cannot add device at :01:00 00.834: [ 576.223117] atkbd serio0: Spurious ACK on isa0060/serio0. Some program might be trying to access hardware directly. 
00.834: [ 576.233968] ata1.00: configured for UDMA/33 00.927: [ 576.328159] pciehp :00:1c.0:pcie004: Device :01:00.0 already exists at :01:00, cannot hot-add 00.929: [ 576.340348] pciehp :00:1c.0:pcie004: Cannot add device at :01:00 01.002: [ 576.420139] usb 5-6: reset high-speed USB device number 2 using ehci-pci 01.372: [ 576.796072] firewire_core :05:00.1: rediscovered device fw0 03.010: [ 578.440083] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300) 05.274: [ 580.710027] ata3.00: ATA Identify Device Log not supported 05.276: [ 580.718136] ata3.00: Security Log not supported 05.279: [ 580.725856] ata3.00: ATA Identify Device Log not supported 05.282: [ 580.733887] ata3.00: Security Log not supported 05.284: [ 580.740838] ata3.00: configured for UDMA/100 ``` Kind regards, Paul
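The `Code:` line of the oops above can be related to the reported fault address by decoding the instruction at the marked byte. What follows is only a sketch: it handles a single 32-bit encoding (opcode `0x89` with a 32-bit displacement and no SIB byte), the helper names are invented here, and a real disassembler should be preferred for anything beyond this one case.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def faulting_bytes(code_line):
    """Return the byte values from the <..>-marked faulting byte onward."""
    tokens = code_line.split()
    start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    return [int(t.strip("<>"), 16) for t in tokens[start:]]


def decode_mov_store(insn):
    """Decode `mov %reg,disp32(%base)` only; raise for anything else."""
    opcode, modrm = insn[0], insn[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    # mod == 2 selects a 32-bit displacement; rm == 4 would mean a SIB byte.
    if opcode != 0x89 or mod != 2 or rm == 4:
        raise ValueError("encoding not handled by this sketch")
    # Small positive displacements only; sign handling is left out.
    disp = int.from_bytes(bytes(insn[2:6]), "little")
    return "mov %%%s,0x%x(%%%s)" % (REG32[reg], disp, REG32[rm])


# Bytes copied from the Code: line of the trace above.
code = (
    "8d 74 26 00 3e 8d 74 26 00 53 89 c3 8b 80 fc 00 00 00 e8 2d 48 32 00 "
    "31 c0 8b 15 20 9e 24 ce 89 83 54 01 00 00 8b 83 50 01 00 00 "
    "<89> 90 f4 00 00 00 ba 09 00 00 00 8b 83 50 01 00 00 e8 f3 f2 16"
)

print(decode_mov_store(faulting_bytes(code)))
```

The decoded store targets offset 0xf4 from the base register, which lines up with the `CR2: 00f4` value in the trace if that register held zero, i.e. a write through a NULL pointer.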
Re: [Regression 4.13-rc1] Resume does not work on Lenovo X60t
Dear Christoph,

On 08/06/17 20:06, Paul Menzel wrote:
> On 2017-08-05 11:30, Christoph Hellwig wrote:
>> On Thu, Aug 03, 2017 at 07:42:15PM +0200, Paul Menzel wrote:
>>> Since the merge window opened for Linux 4.13, I am unable to resume from ACPI S3 suspend on a Lenovo X60t. The graphics come back, but I am unable to enter anything, and the system seems to be hung. Magic SysRq keys still work though, but powering the system off doesn’t work. The power button also does not work. Please find the stack trace with Linux 4.13-rc3 captured over the serial console below.
>> Is this really -rc3? rc3 has a commit to disable block runtime pm for blk-mq, which is now the default for scsi. So with -rc1 we've seen similar reports, but rc3 would be odd and suggest we have further problems.
> Yes, this was 4.13-rc3. Rebuilding the Linux kernel from commit 0fdd951c (Merge tag 'media/v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media) shows the same behavior.

Just an update: this is still present in Linux 4.13-rc5+, that is, commit 04d49f3638d0 (Merge tag 'drm-fixes-for-v4.13-rc6' of git://people.freedesktop.org/~airlied/linux).

Kind regards,

Paul
Re: [Regression 4.13-rc1] Resume does not work on Lenovo X60t
Dear Christoph,

On 2017-08-05 11:30, Christoph Hellwig wrote:
> On Thu, Aug 03, 2017 at 07:42:15PM +0200, Paul Menzel wrote:
>> Since the merge window opened for Linux 4.13, I am unable to resume from ACPI S3 suspend on a Lenovo X60t. The graphics come back, but I am unable to enter anything, and the system seems to be hung. Magic SysRq keys still work though, but powering the system off doesn’t work. The power button also does not work. Please find the stack trace with Linux 4.13-rc3 captured over the serial console below.
> Is this really -rc3? rc3 has a commit to disable block runtime pm for blk-mq, which is now the default for scsi. So with -rc1 we've seen similar reports, but rc3 would be odd and suggest we have further problems.

Yes, this was 4.13-rc3. Rebuilding the Linux kernel from commit 0fdd951c (Merge tag 'media/v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media) shows the same behavior.

Kind regards,

Paul
[Regression 4.13-rc1] Resume does not work on Lenovo X60t
Dear Linux folks,

Since the merge window opened for Linux 4.13, I am unable to resume from ACPI S3 suspend on a Lenovo X60t. The graphics come back, but I am unable to enter anything, and the system seems to be hung. Magic SysRq keys still work, but powering the system off doesn’t work. The power button also does not work.

Please find the stack trace with Linux 4.13-rc3 captured over the serial console below.

```
46.417: [ 58.148083] ata6: port disabled--ignoring
46.417: [ 58.148243] BUG: unable to handle kernel NULL pointer dereference at 00f4
46.417: [ 58.148252] IP: blk_set_runtime_active+0x27/0x60
46.417: [ 58.148253] *pde =
46.417: [ 58.148254]
46.417: [ 58.148256] Oops: 0002 [#1] SMP
46.418: [ 58.148258] Modules linked in: cpufreq_powersave cpufreq_conservative cpufreq_userspace joydev wacom_w8001 serport binfmt_misc iTCO_wdt iTCO_vendor_support coretemp kvm snd_hda_codec_analog snd_hda_codec_generic arc4 irqbypass pcmcia snd_pcsp thinkpad_acpi serio_raw snd_hda_intel snd_hda_codec yenta_socket lpc_ich iwl3945 mfd_core pcmcia_rsrc snd_hda_core iwlegacy snd_hwdep pcmcia_core snd_pcm mac80211 sg rng_core nvram cfg80211 snd_timer snd soundcore rfkill evdev battery ac shpchp acpi_cpufreq parport_pc ppdev lp parport ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb cbc algif_skcipher af_alg dm_crypt dm_mod sr_mod cdrom sd_mod ata_generic psmouse i915 i2c_i801 sdhci_pci ahci ata_piix ehci_pci uhci_hcd libahci firewire_ohci sdhci libata firewire_core ehci_hcd mmc_core e1000e crc_itu_t
46.416: [ 58.148310] scsi_mod ptp usbcore video pps_core button i2c_algo_bit drm_kms_helper thermal syscopyarea sysfillrect sysimgblt fb_sys_fops drm
46.416: [ 58.148322] CPU: 0 PID: 808 Comm: kworker/u4:38 Not tainted 4.13.0-rc3+ #94
46.416: [ 58.148323] Hardware name: LENOVO 636338U/636338U, BIOS CBET4000 TIMELESS 01/01/1970
46.416: [ 58.148328] Workqueue: events_unbound async_run_entry_fn
46.416: [ 58.148330] task: f2900180 task.stack: f2902000
46.416: [ 58.148333] EIP: blk_set_runtime_active+0x27/0x60
46.416: [ 58.148334] EFLAGS: 00010046 CPU: 0
46.416: [ 58.148335] EAX: EBX: f5f3c628 ECX: f5f3c720 EDX: 13c5
46.416: [ 58.148337] ESI: f87a5cc0 EDI: 0010 EBP: 0010 ESP: f2903ea4
46.416: [ 58.148338] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
46.416: [ 58.148340] CR0: 80050033 CR2: 00f4 CR3: 363b4000 CR4: 06d0
46.416: [ 58.148342] Call Trace:
46.416: [ 58.148361] ? scsi_bus_resume_common+0x6e/0x110 [scsi_mod]
46.416: [ 58.148366] ? dpm_run_callback+0x4f/0x150
46.416: [ 58.148369] ? wait_for_completion+0x29/0x140
46.416: [ 58.148381] ? scsi_bus_thaw+0x10/0x10 [scsi_mod]
46.416: [ 58.148384] ? device_resume+0x8e/0x180
46.416: [ 58.148387] ? async_resume+0x1b/0x40
46.416: [ 58.148389] ? async_run_entry_fn+0x3f/0x1a0
46.416: [ 58.148392] ? process_one_work+0x136/0x310
46.416: [ 58.148394] ? worker_thread+0x39/0x3b0
46.416: [ 58.148396] ? kthread+0xd7/0x110
46.416: [ 58.148398] ? process_one_work+0x310/0x310
46.416: [ 58.148400] ? kthread_create_on_node+0x30/0x30
46.416: [ 58.148403] ? ret_from_fork+0x19/0x24
46.416: [ 58.148404] Code: 8d 74 26 00 3e 8d 74 26 00 53 89 c3 8b 80 fc 00 00 00 e8 5d 43 32 00 31 c0 8b 15 20 7e 64 cf 89 83 54 01 00 00 8b 83 50 01 00 00 <89> 90 f4 00 00 00 ba 09 00 00 00 8b 83 50 01 00 00 e8 e3 ef 16
46.416: [ 58.148437] EIP: blk_set_runtime_active+0x27/0x60 SS:ESP: 0068:f2903ea4
46.416: [ 58.148438] CR2: 00f4
46.416: [ 58.148441] ---[ end trace 529e3022b2906e41 ]---
```

Please find the full log attached. I don’t know why the Linux kernel messages at the beginning are transferred at the wrong baud rate.

Kind regards,
Paul

=== Thu Aug 3 08:52:39 2017 (adjust=1041.7us)
[Start of the attached serial log: unreadable bytes received at the wrong baud rate]
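The `Code:` line of the oops in this message can be cross-checked against the reported fault address. The following is only a minimal sketch, assuming 32-bit protected mode (the kernel is an i686 build) and handling just the one instruction form that appears at the marked byte; the function name `decode_faulting_mov` is made up for this illustration.

```python
import struct

# Byte string copied from the "Code:" line of the oops; the byte in <...>
# is where the faulting instruction starts.
CODE = (
    "8d 74 26 00 3e 8d 74 26 00 53 89 c3 8b 80 fc 00 00 00 e8 5d 43 32 00 "
    "31 c0 8b 15 20 7e 64 cf 89 83 54 01 00 00 8b 83 50 01 00 00 "
    "<89> 90 f4 00 00 00 ba 09 00 00 00 8b 83 50 01 00 00 e8 e3 ef 16"
)

# Register numbering used by the ModRM byte for 32-bit operands.
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_faulting_mov(code):
    """Decode 'mov r32, disp32(r32)' (opcode 0x89, ModRM mod=10, no SIB)."""
    tokens = code.split()
    start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    insn = bytes(int(t.strip("<>"), 16) for t in tokens[start:start + 6])
    opcode, modrm = insn[0], insn[1]
    if opcode != 0x89 or modrm >> 6 != 0b10 or modrm & 7 == 0b100:
        raise ValueError("not a plain 'mov r32, disp32(r32)'")
    src = REGS[(modrm >> 3) & 7]   # register being stored
    base = REGS[modrm & 7]         # register holding the base pointer
    (disp,) = struct.unpack("<i", insn[2:6])  # little-endian displacement
    return src, base, disp


src, base, disp = decode_faulting_mov(CODE)
print(f"mov %{src}, {disp:#x}(%{base})")
```

The decoded displacement equals the CR2 value `00f4` in the log, which fits a store through a NULL base pointer in EAX (the EAX field in the archived log is empty, so a zero value is an assumption here).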
Driver version for PMC Adaptec HBA in Linux and from vendor
Dear Raghava, dear Linux folks,

With the latest Linux, a PMC Adaptec HBA 1000-8e only initializes in sync mode instead of async mode.

```
$ git describe --tag
v4.10-rc8-47-g0722f57bf
$ dmesg
[ 21.359635] Adaptec aacraid driver 1.2-1[41066]-ms
[ 21.360017] aacraid :04:00.0: can't disable ASPM; OS doesn't have ASPM control
[ 21.363987] AAC0: Async. mode not supported by current driver, sync. mode enforced.
[ 21.363987] Please update driver to get full performance.
[ 21.364949] AAC0: kernel 1.2-0[0] Nov 5 2015
[ 21.365275] AAC0: monitor 0.0-0[0]
[ 21.371382] AAC0: bios 0.13-209[32000]
[ 21.371711] AAC0: serial 10F447
[ 21.372035] AAC0: Non-DASD support enabled.
[ 21.372360] AAC0: 64bit support enabled.
[ 21.372688] AAC0: 64 Bit DAC enabled
[…]
$ git grep 'AAC_DRIVER_BUILD 41066'
drivers/scsi/aacraid/aacraid.h:# define AAC_DRIVER_BUILD 41066
```

Searching the vendor Web site, there is *Linux Driver Source 1.2.1-53005* available for download [1].

How does the upstream process work? Is there a git repository from Microsemi somewhere? Are the patches already up for review? (I didn’t find them.)

The answers would be very helpful for our evaluation of the device.

Kind regards,
Paul

[1] https://storage.microsemi.com/en-us/speed/raid/aac/linux/aacraid-linux-src-1_2_1-53005_tgz.php
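For comparing the in-tree driver with the vendor download, the build numbers can be pulled out of the two version strings quoted in this message. This is only a rough sketch: the helper name `aacraid_build` is hypothetical, and treating a higher build number as "newer" is an assumption, not something the vendor documents here.

```python
import re


def aacraid_build(version):
    """Return the build number from an aacraid version string.

    Handles the two shapes seen in this message:
    '1.2-1[41066]-ms' (dmesg banner) and '1.2.1-53005' (vendor download).
    """
    match = re.search(r"\[(\d+)\]", version) or re.search(r"-(\d+)$", version)
    if match is None:
        raise ValueError(f"no build number found in {version!r}")
    return int(match.group(1))


in_tree = aacraid_build("1.2-1[41066]-ms")
vendor = aacraid_build("1.2.1-53005")
print(in_tree, vendor, vendor > in_tree)
```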
Re: Ordering problems with 3ware controller
Dear Linux folks,

On 11/16/16 22:24, Donald Buczek wrote:
> On 10.11.2016 14:59, Martin K. Petersen wrote:
>> "Paul" == Paul Menzel <pmen...@molgen.mpg.de> writes:
>>
>> Linux does not provide device discovery ordering guarantees. You need to fix your scripts to use UUIDs, filesystem labels, or DM devices to get stable naming.
>>
>> Paul> Indeed. But it worked for several years, so *something* must
>> Paul> have changed for the ordering of the result of `getdents64` to
>> Paul> be different now.
>>
>> Could be changes in the PCI or platform code that cause things to be enumerated differently. Whatever it is, it has nothing to do with the 3ware drivers themselves since they have been dormant for a long time.
>
> Right. We tracked it down further. In fact, it's not a matter of driver initialization order but of the way sysfs/kernfs hashes its object names and thereby defines the order of names returned by getdents64 calls.
>
> In fs/kernfs/dir.c the names are inserted into a red-black tree ordered by the hashes over their names (and possibly the namespace pointer, which in our case is zero).
>
> I've walked the rbtrees of the kernfs_node structs from /sys/class/scsi_host, showing their addresses, the hash values and the names on a 4.4.27 system:
>
> root:cu:/home/buczek/autofs/# ./peek-3w
> 88046d847640 : 11bf1ddd : host0
> 88046c56d3e8 : 11bf1e8d : host1
> 88046c571c58 : 11bf1f3d : host2
> 88046c572550 : 11bf1fed : host3
> 88046c577dc0 : 11bf209d : host4
> 88046a4bbaf0 : 11bf214d : host5
>
> As can be seen, in 4.4 the hash algorithm happened to produce increasing hash values for names like "host0", "host1", "host2", ...
>
> In 4.8.6 the hash values seem to be more random:
>
> root:gynaekophobie:/home/buczek/autofs/# ./peek-3w
> 88041df9a7f8 : 074af64b : host0
> 88081db40528 : 1009cd9b : host9
> 88041d3fba50 : 1c512bfb : host7
> 88181d19c000 : 28988a5b : host5
> 88041df5a780 : 34dfe8bb : host3
> 88041d3f5e10 : 4127471b : host1
> 88041ccbd258 : 562d7ccb : host8
> 88201cd5f960 : 6274db2b : host6
> 88141e2d0ca8 : 6ebc398b : host4
> 88041df599d8 : 7b0397eb : host2
>
> The relevant commit is 703b5fa which includes

The commit message summary is *fs/dcache.c: Save one 32-bit multiply in dcache lookup*.

> static inline unsigned long end_name_hash(unsigned long hash)
> {
> -	return (unsigned int)hash;
> +	return __hash_32((unsigned int)hash);
> }
>
> __hash_32 is a multiplication by 0x61C88647 ( hash.h )
>
> And this is exactly the difference between the hash value of "host0" on the 4.4 and the 4.8 system:
>
> DB<2> x sprintf '%x',0x11bf1ddd*0x61C88647
> 0 '6c750ef074af64b'
>
> The bug, of course, is in the userspace tool tw_cli which wrongly assumes that the names would be returned in the "right" order by getdents.

Nice analysis. Unfortunately, I can’t find the discussion of the patch on the Linux kernel mailing list. Searching for the summary only brings up *screen rotation flipped in 4.8-rc* [1].

> As a dirty workaround, I've created a new wrapper, which uses ptrace to pause the program on return from SYS_getdents64 and sorts the values returned from the system call in the memory of the target process.
>
> I append the source of the wrapper.

Kind regards,
Paul

[1] https://lkml.org/lkml/2016/8/30/739 "screen rotation flipped in 4.8-rc"

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
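The hash arithmetic from Donald's analysis can be replayed outside the kernel. The sketch below takes the six hash values listed for the 4.4.27 system, applies the multiplication by 0x61C88647 truncated to 32 bits, and sorts by the result. Keeping only the low 31 bits afterwards is an assumption of this sketch, chosen because it reproduces the values listed for 4.8.6; the function name `hash_4_8` is made up for illustration.

```python
GOLDEN_RATIO_32 = 0x61C88647  # multiplier named in the analysis (hash.h)

# Hash values reported by ./peek-3w on the 4.4.27 system.
hashes_4_4 = {
    "host0": 0x11BF1DDD,
    "host1": 0x11BF1E8D,
    "host2": 0x11BF1F3D,
    "host3": 0x11BF1FED,
    "host4": 0x11BF209D,
    "host5": 0x11BF214D,
}


def hash_4_8(hash_4_4):
    """Apply the extra 32-bit multiply introduced by commit 703b5fa."""
    product = (hash_4_4 * GOLDEN_RATIO_32) & 0xFFFFFFFF  # 32-bit wraparound
    return product & 0x7FFFFFFF  # assumption: only 31 bits are kept


hashes_4_8 = {name: hash_4_8(h) for name, h in hashes_4_4.items()}

# The rbtree is ordered by hash, so sorting by hash gives the readdir order.
order_4_4 = sorted(hashes_4_4, key=hashes_4_4.get)
order_4_8 = sorted(hashes_4_8, key=hashes_4_8.get)

for name in order_4_8:
    print(f"{hashes_4_8[name]:08x} : {name}")
```

Restricted to host0 through host5, the resulting order is the same relative order that the 4.8.6 listing shows, which is why a tool that numbers controllers in `getdents64` order sees them shuffled.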
Re: Delivery Status Notification for linuxr...@lsi.com
Dear Martin,

On 11/10/16 15:07, Martin K. Petersen wrote:
> "Paul" == Paul Menzel <pmen...@molgen.mpg.de> writes:
>
> Paul> Probably you know it already, but the listed email address of the
> Paul> 3WARE SCSI drivers maintainer linuxr...@lsi.com doesn’t work (for
> Paul> me).
>
> Ownership of these products is now with Broadcom. To my knowledge the 3ware product lines have been discontinued.

Indeed. I forgot to actually state the intent of my message. What should happen to the entry in the file `MAINTAINERS`?

Kind regards,
Paul
Delivery Status Notification for linuxr...@lsi.com
Dear Linux folks, Probably you know it already, but the listed email address of the 3WARE SCSI drivers maintainer linuxr...@lsi.com doesn’t work (for me). Please see the attached message. Kind regards, Paul --- Begin Message --- This is an automatically generated Delivery Status Notification THIS IS A WARNING MESSAGE ONLY. YOU DO NOT NEED TO RESEND YOUR MESSAGE. Delivery to the following recipient has been delayed: linuxr...@lsi.com Message will be retried for 5 more day(s) Technical details of temporary failure: The recipient server did not accept our requests to connect. Learn more at https://support.google.com/mail/answer/7720 [192.19.192.224 192.19.192.224: timed out] - Original message - X-Gm-Message-State: ABUngveFBksx92G0BZX5qMdBuCDHDG4xuI0c1GPn8OQmdZmNS3ZMR9/TFIpcevk2OOorMUNMld3vQugDvWxMGFOcWZveSLRhDyWLWqRReAKzrCIwHLeIr+9x3z44bqKAnr2A3oQ= X-Received: by 10.28.209.67 with SMTP id i64mr12975034wmg.48.1478599659285; Tue, 08 Nov 2016 02:07:39 -0800 (PST) X-Received: by 10.28.209.67 with SMTP id i64mr12975009wmg.48.1478599659097; Tue, 08 Nov 2016 02:07:39 -0800 (PST) Return-Path: <pmen...@molgen.mpg.de> Received: from mx1.molgen.mpg.de (mx1.molgen.mpg.de. 
[141.14.17.9]) by mx.google.com with ESMTPS id w203si15677261wmg.41.2016.11.08.02.07.38 for <linuxr...@lsi.com> (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 08 Nov 2016 02:07:38 -0800 (PST) Received-SPF: pass (google.com: domain of pmen...@molgen.mpg.de designates 141.14.17.9 as permitted sender) client-ip=141.14.17.9; Authentication-Results: mx.google.com; spf=pass (google.com: domain of pmen...@molgen.mpg.de designates 141.14.17.9 as permitted sender) smtp.mailfrom=pmen...@molgen.mpg.de Received: from keineahnung.molgen.mpg.de (keineahnung.molgen.mpg.de [141.14.17.193]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) (Authenticated sender: pmenzel) by mx.molgen.mpg.de (Postfix) with ESMTPSA id 08BAC20128247D; Tue, 8 Nov 2016 11:07:38 +0100 (CET) To: linux-scsi@vger.kernel.org Cc: Adam Radford <linuxr...@lsi.com> From: Paul Menzel <pmen...@molgen.mpg.de> Subject: Ordering problems with 3ware controller Message-ID: <a41b4bab-edb7-34ab-eb76-7ff4d6e3f...@molgen.mpg.de> Date: Tue, 8 Nov 2016 11:07:37 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit X-Gm-Spam: 0 X-Gm-Phishy: 0 Dear Linux SCSI folks, Updating from Linux 4.4.X to Linux 4.8.4, we noticed that the 3ware devices under `/dev` – `/dev/twa0`, `/dev/twa1`, … – map to the controllers differently. This unfortunately breaks quite a lot of our scripts, as we depend on the fact that the first controller is also in the front. > $ dmesg | grep 3ware > [ 14.509238] 3ware 9000 Storage Controller device driver for Linux > v2.26.02.014. > [ 14.824274] scsi host8: 3ware 9000 Storage Controller > [ 14.824537] 3w-9xxx: scsi8: Found a 3ware 9000 Storage Controller at > 0xd020, IRQ: 17. 
> [ 15.508310] scsi host9: 3ware 9000 Storage Controller > [ 15.508569] 3w-9xxx: scsi9: Found a 3ware 9000 Storage Controller at > 0xda10, IRQ: 17. Tracing `twi_cli` it looks like the ordering of the devices in `/sys/class/scsi_host` might have changed, as `getdents64` seems to be used for the ordering of creating `/dev/twaX`. > $ find /sys/class/scsi_host/ -ls > 6033 0 drwxr-xr-x 2 root system 0 Nov 8 10:58 > /sys/class/scsi_host/ > 23125 0 lrwxrwxrwx 1 root system 0 Oct 27 17:41 > /sys/class/scsi_host/host0 -> > ../../devices/pci:00/:00:07.0/ata1/host0/scsi_host/host0 > 29893 0 lrwxrwxrwx 1 root system 0 Oct 27 18:03 > /sys/class/scsi_host/host9 -> > ../../devices/pci:80/:80:0e.0/:90:00.0/host9/scsi_host/host9 > 23878 0 lrwxrwxrwx 1 root system 0 Oct 27 17:41 > /sys/class/scsi_host/host7 -> > ../../devices/pci:80/:80:08.0/ata8/host7/scsi_host/host7 > 23640 0 lrwxrwxrwx 1 root system 0 Oct 27 17:41 > /sys/class/scsi_host/host5 -> > ../../devices/pci:80/:80:07.0/ata6/host5/scsi_host/host5 > 23402 0 lrwxrwxrwx 1 root system 0 Oct 27 17:41 > /sys/class/scsi_host/host3 -> > ../../devices/pci:00/:00:08.0/ata4/host3/scsi_host/host3 > 23164 0 lrwxrwxrwx 1 root system 0 Oct 27 17:41 > /sys/class/scsi_host/host1 -> ../. - Message truncated - --- End Message ---
Re: Ordering problems with 3ware controller
Dear Martin,

On 11/09/16 00:45, Martin K. Petersen wrote:
> "Paul" == Paul Menzel <pmen...@molgen.mpg.de> writes:
>
> Paul> Updating from Linux 4.4.X to Linux 4.8.4, we noticed that the
> Paul> 3ware devices under `/dev` – `/dev/twa0`, `/dev/twa1`, … – map to
> Paul> the controllers differently.
>
> Paul> This unfortunately breaks quite a lot of our scripts, as we depend
> Paul> on the fact that the first controller is also in the front.
>
> It's not the 3ware drivers since they have not been updated in a long time (since way before 4.4).

Yes, that’s what made me wonder too.

> Linux does not provide device discovery ordering guarantees. You need to fix your scripts to use UUIDs, filesystem labels, or DM devices to get stable naming.

Indeed. But it worked for several years, so *something* must have changed for the ordering of the result of `getdents64` to be different now.

Fixing the scripts is unfortunately not that easy, as `tw_cli` is a proprietary tool [1], and we do not have the sources. It does a `readdir()`.

open("/proc/scsi/3w-9xxx", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = -1 ENOENT (No such file or directory)
open("/sys/class/scsi_host", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = 3
fstat(3, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
fcntl(3, F_SETFD, FD_CLOEXEC) = 0
getdents64(3, /* 12 entries */, 4096) = 368
stat("/sys/class/scsi_host/host0/stats", 0x7fffafd05290) = -1 ENOENT (No such file or directory)
stat("/sys/class/scsi_host/host9/stats", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
open("/sys/class/scsi_host/host9/stats", O_RDONLY) = 4
read(4, "3w-9xxx Driver v", 16) = 16
close(4) = 0
open("/dev/twa0", O_RDWR) = 4
close(4) = 0
stat("/sys/class/scsi_host/host7/stats", 0x7fffafd05290) = -1 ENOENT (No such file or directory)
stat("/sys/class/scsi_host/host5/stats", 0x7fffafd05290) = -1 ENOENT (No such file or directory)
stat("/sys/class/scsi_host/host3/stats", 0x7fffafd05290) = -1 ENOENT (No such file or directory)
stat("/sys/class/scsi_host/host1/stats", 0x7fffafd05290) = -1 ENOENT (No such file or directory)
stat("/sys/class/scsi_host/host8/stats", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
open("/sys/class/scsi_host/host8/stats", O_RDONLY) = 4
read(4, "3w-9xxx Driver v", 16) = 16
close(4) = 0
open("/dev/twa1", O_RDWR) = 4
close(4) = 0
stat("/sys/class/scsi_host/host6/stats", 0x7fffafd05290) = -1 ENOENT (No such file or directory)
stat("/sys/class/scsi_host/host4/stats", 0x7fffafd05290) = -1 ENOENT (No such file or directory)
stat("/sys/class/scsi_host/host2/stats", 0x7fffafd05290) = -1 ENOENT (No such file or directory)
getdents64(3, /* 0 entries */, 4096) = 0
close(3) = 0
open("/proc/devices", O_RDONLY) = 3

Please find attached a wrapper from my colleague, which uses mount namespaces to ensure the ordering that `tw_cli` expects.

Kind regards,
Paul

[1] https://wiki.hetzner.de/index.php/3Ware_RAID_Controller/en

#! /usr/bin/perl
use strict;
use warnings;

# Compare "hostN" names numerically by N; fall back to string comparison.
sub sort_host {
    my ($n1,$n2);
    ($n1)=$a=~/^host(\d+)$/ and ($n2)=$b=~/^host(\d+)$/ and return $n1 <=> $n2;
    return $a cmp $b;
}

our $SYS_unshare=272;      # /usr/include/asm/unistd_64.h
our $CLONE_NEWNS=0x20000;  # /usr/include/linux/sched.h

my $pid=fork;
defined $pid or die "$!\n";
unless ($pid) {
    opendir my $d,"/sys/class/scsi_host" or die "/sys/class/scsi_host: $!\n";
    my @names=sort sort_host grep !/^\.\.?$/,readdir $d;
    # Private mount namespace: the mounts below are only visible to tw_cli.
    syscall($SYS_unshare,$CLONE_NEWNS) and die "$!\n";
    -d '/tmp/sysfs' or mkdir("/tmp/sysfs") or die "/tmp/sysfs: $!\n";
    # Keep the real sysfs reachable, then cover scsi_host with an empty tmpfs.
    system 'mount','-tsysfs','BLA','/tmp/sysfs' and exit 1;
    system 'mount','-ttmpfs','BLA','/sys/class/scsi_host' and exit 1;
    # Recreate the entries as symlinks, created in reverse sorted order.
    for my $name (reverse @names) {
        symlink("/tmp/sysfs/class/scsi_host/$name","/sys/class/scsi_host/$name") or die "/sys/class/scsi_host/$name: $!\n";
    }
    exec '/root/bin/tw_cli.exe',@ARGV;
    die "$!\n";
}
wait;
$? and exit 1;
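For scripts that only need to know which SCSI hosts belong to the 3ware driver, the probing visible in the strace output above can be imitated without relying on directory order. The following is a minimal sketch, not what `tw_cli` actually does internally: it reads each `hostN/stats` file, checks for the `3w-9xxx Driver v` prefix seen in the trace, and sorts the host names numerically. The names `host_key` and `find_3ware_hosts` are hypothetical, and the demo runs against a temporary directory rather than the real sysfs.

```python
import os
import re
import tempfile

SIGNATURE = b"3w-9xxx Driver v"  # first bytes tw_cli reads from hostN/stats


def host_key(name):
    """Sort key: 'hostN' entries numerically by N, anything else afterwards."""
    match = re.fullmatch(r"host(\d+)", name)
    return (0, int(match.group(1)), name) if match else (1, 0, name)


def find_3ware_hosts(root="/sys/class/scsi_host"):
    """Return the 3w-9xxx hosts under root in numeric order."""
    hosts = []
    for name in sorted(os.listdir(root), key=host_key):
        try:
            with open(os.path.join(root, name, "stats"), "rb") as stats:
                if stats.read(len(SIGNATURE)) == SIGNATURE:
                    hosts.append(name)
        except OSError:
            continue  # no stats file: not a 3w-9xxx host
    return hosts


# Demo on a fake directory tree; host10 is invented to show numeric sorting.
with tempfile.TemporaryDirectory() as fake_root:
    for n in (0, 9, 7, 10, 8):
        os.mkdir(os.path.join(fake_root, f"host{n}"))
    for n in (9, 10, 8):
        with open(os.path.join(fake_root, f"host{n}", "stats"), "wb") as f:
            f.write(SIGNATURE + b"...")
    demo = find_3ware_hosts(fake_root)

print(demo)
```

Whether the position in the returned list corresponds to the `/dev/twaN` index is an assumption that matches the behavior we relied on before, not something this sketch can verify.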
Re: Ordering problems with 3ware controller
Dear Linux SCSI folks, On 11/08/16 11:07, Paul Menzel wrote: Updating from Linux 4.4.X to Linux 4.8.4, we noticed that the 3ware devices under `/dev` – `/dev/twa0`, `/dev/twa1`, … – map to the controllers differently. This unfortunately breaks quite a lot of our scripts, as we depend on the fact that the first controller is also in the front. $ dmesg | grep 3ware [ 14.509238] 3ware 9000 Storage Controller device driver for Linux v2.26.02.014. [ 14.824274] scsi host8: 3ware 9000 Storage Controller [ 14.824537] 3w-9xxx: scsi8: Found a 3ware 9000 Storage Controller at 0xd020, IRQ: 17. [ 15.508310] scsi host9: 3ware 9000 Storage Controller [ 15.508569] 3w-9xxx: scsi9: Found a 3ware 9000 Storage Controller at 0xda10, IRQ: 17. Tracing `twi_cli` it looks like the ordering of the devices in `/sys/class/scsi_host` might have changed, as `getdents64` seems to be used for the ordering of creating `/dev/twaX`. $ find /sys/class/scsi_host/ -ls 6033 0 drwxr-xr-x 2 root system 0 Nov 8 10:58 /sys/class/scsi_host/ 23125 0 lrwxrwxrwx 1 root system 0 Oct 27 17:41 /sys/class/scsi_host/host0 -> ../../devices/pci:00/:00:07.0/ata1/host0/scsi_host/host0 29893 0 lrwxrwxrwx 1 root system 0 Oct 27 18:03 /sys/class/scsi_host/host9 -> ../../devices/pci:80/:80:0e.0/:90:00.0/host9/scsi_host/host9 23878 0 lrwxrwxrwx 1 root system 0 Oct 27 17:41 /sys/class/scsi_host/host7 -> ../../devices/pci:80/:80:08.0/ata8/host7/scsi_host/host7 23640 0 lrwxrwxrwx 1 root system 0 Oct 27 17:41 /sys/class/scsi_host/host5 -> ../../devices/pci:80/:80:07.0/ata6/host5/scsi_host/host5 23402 0 lrwxrwxrwx 1 root system 0 Oct 27 17:41 /sys/class/scsi_host/host3 -> ../../devices/pci:00/:00:08.0/ata4/host3/scsi_host/host3 23164 0 lrwxrwxrwx 1 root system 0 Oct 27 17:41 /sys/class/scsi_host/host1 -> ../../devices/pci:00/:00:07.0/ata2/host1/scsi_host/host1 29851 0 lrwxrwxrwx 1 root system 0 Oct 27 18:03 /sys/class/scsi_host/host8 -> ../../devices/pci:00/:00:0e.0/:05:00.0/host8/scsi_host/host8 23839 0 lrwxrwxrwx 1 root system 
0 Oct 27 17:41 /sys/class/scsi_host/host6 -> ../../devices/pci:80/:80:08.0/ata7/host6/scsi_host/host6 23601 0 lrwxrwxrwx 1 root system 0 Oct 27 17:41 /sys/class/scsi_host/host4 -> ../../devices/pci:80/:80:07.0/ata5/host4/scsi_host/host4 23363 0 lrwxrwxrwx 1 root system 0 Oct 27 17:41 /sys/class/scsi_host/host2 -> ../../devices/pci:00/:00:08.0/ata3/host2/scsi_host/host2 $ sudo -i tw_cli show Ctl Model(V)Ports Drives Units NotOpt RRate VRate BBU c89650SE-8LPML 8 81 0 5 1 OK c99690SA-8E0 00 0 5 1 OK Enclosure Slots Drives Fans TSUnits PSUnits Alarms -- /c9/e016 0 3 121 So in this case `c8` is mapped to `/dev/twa1`, and `c9` to `/dev/twa0`. As we do not know of a way, to use `tw_cli` to find the correct mapping, or another place, we rely on the implicit ordering, which – according to my colleagues – has worked for over 15 years [1]. Here is the excerpt from the manual page for smartctl [2]. > --- end of manual page excerpt --- 3ware,N - [FreeBSD and Linux only] the device consists of one or more ATA disks con‐ nected to a 3ware RAID controller. The non-negative integer N (in the range from 0 to 127 inclusive) denotes which disk on the controller is monitored. Use syntax such as: smartctl -a -d 3ware,2 /dev/sda smartctl -a -d 3ware,0 /dev/twe0 smartctl -a -d 3ware,1 /dev/twa0 smartctl -a -d 3ware,1 /dev/twl0 The first two forms, which refer to devices /dev/sda-z and /dev/twe0-15, may be used with 3ware series 6000, 7000, and 8000 series controllers that use the 3x- driver. Note that the /dev/sda-z form is deprecated starting with the Linux 2.6 kernel series and may not be supported by the Linux kernel in the near future. The final form, which refers to devices /dev/twa0-15, must be used with 3ware 9000 series controllers, which use the 3w-9xxx driver. The devices /dev/twl0-15 must be used with the 3ware/LSI 9750 series controllers which use the 3w-sas driver. Note that if the special character device nodes /dev/twl?, /dev/twa? and /dev/twe? 
do not exist, or exist with the incorrect major or minor numbers, smartctl will recreate them on the fly. Typically /dev/twa0 refers to the first 9000-series controller, /dev/twa1 refers to the second 9000 series controller, and so on. The /dev/twl
Ordering problems with 3ware controller
Dear Linux SCSI folks,

Updating from Linux 4.4.X to Linux 4.8.4, we noticed that the 3ware devices under `/dev` – `/dev/twa0`, `/dev/twa1`, … – map to the controllers differently.

This unfortunately breaks quite a lot of our scripts, as we depend on the fact that the first controller is also in the front.

$ dmesg | grep 3ware
[ 14.509238] 3ware 9000 Storage Controller device driver for Linux v2.26.02.014.
[ 14.824274] scsi host8: 3ware 9000 Storage Controller
[ 14.824537] 3w-9xxx: scsi8: Found a 3ware 9000 Storage Controller at 0xd020, IRQ: 17.
[ 15.508310] scsi host9: 3ware 9000 Storage Controller
[ 15.508569] 3w-9xxx: scsi9: Found a 3ware 9000 Storage Controller at 0xda10, IRQ: 17.

Tracing `tw_cli`, it looks like the ordering of the devices in `/sys/class/scsi_host` might have changed, as `getdents64` seems to determine the order in which `/dev/twaX` is created.

$ find /sys/class/scsi_host/ -ls
6033 0 drwxr-xr-x 2 root system 0 Nov 8 10:58 /sys/class/scsi_host/
23125 0 lrwxrwxrwx 1 root system 0 Oct 27 17:41 /sys/class/scsi_host/host0 -> ../../devices/pci:00/:00:07.0/ata1/host0/scsi_host/host0
29893 0 lrwxrwxrwx 1 root system 0 Oct 27 18:03 /sys/class/scsi_host/host9 -> ../../devices/pci:80/:80:0e.0/:90:00.0/host9/scsi_host/host9
23878 0 lrwxrwxrwx 1 root system 0 Oct 27 17:41 /sys/class/scsi_host/host7 -> ../../devices/pci:80/:80:08.0/ata8/host7/scsi_host/host7
23640 0 lrwxrwxrwx 1 root system 0 Oct 27 17:41 /sys/class/scsi_host/host5 -> ../../devices/pci:80/:80:07.0/ata6/host5/scsi_host/host5
23402 0 lrwxrwxrwx 1 root system 0 Oct 27 17:41 /sys/class/scsi_host/host3 -> ../../devices/pci:00/:00:08.0/ata4/host3/scsi_host/host3
23164 0 lrwxrwxrwx 1 root system 0 Oct 27 17:41 /sys/class/scsi_host/host1 -> ../../devices/pci:00/:00:07.0/ata2/host1/scsi_host/host1
29851 0 lrwxrwxrwx 1 root system 0 Oct 27 18:03 /sys/class/scsi_host/host8 -> ../../devices/pci:00/:00:0e.0/:05:00.0/host8/scsi_host/host8
23839 0 lrwxrwxrwx 1 root system 0 Oct 27 17:41 /sys/class/scsi_host/host6 -> ../../devices/pci:80/:80:08.0/ata7/host6/scsi_host/host6
23601 0 lrwxrwxrwx 1 root system 0 Oct 27 17:41 /sys/class/scsi_host/host4 -> ../../devices/pci:80/:80:07.0/ata5/host4/scsi_host/host4
23363 0 lrwxrwxrwx 1 root system 0 Oct 27 17:41 /sys/class/scsi_host/host2 -> ../../devices/pci:00/:00:08.0/ata3/host2/scsi_host/host2

$ sudo -i tw_cli show
Ctl  Model         (V)Ports  Drives  Units  NotOpt  RRate  VRate  BBU
c8   9650SE-8LPML  8         8       1      0       5      1      OK
c9   9690SA-8E     0         0       0      0       5      1      OK

Enclosure  Slots  Drives  Fans  TSUnits  PSUnits  Alarms
--
/c9/e0     16     0       3     1        2        1

So in this case `c8` is mapped to `/dev/twa1`, and `c9` to `/dev/twa0`.

As we do not know of a way to find the correct mapping, either with `tw_cli` or from some other place, we rely on the implicit ordering, which – according to my colleagues – has worked for over 15 years [1].

Do you know of a way to get the mapping “over an API”, so we don’t have to rely on the implicit ordering? Otherwise, do you know why the ordering has changed, and can this be reverted?

Kind regards,
Paul Menzel

[1] https://www.thomas-krenn.com/de/wiki/Smartmontools_mit_3ware_RAID_Controller (German)
Re: NULL pointer dereference: IP: [] sr_runtime_suspend+0xc/0x20 [sr_mod]
Control: notfound -1 3.19-1~exp1 Control: found -1 4.2.5-1 Am Dienstag, den 20.10.2015, 02:39 +0100 schrieb Ben Hutchings: > On Fri, 2015-10-16 at 09:54 +0200, Paul Menzel wrote: > [...] > > > BUG: unable to handle kernel NULL pointer dereference at 0014 > > > IP: [] sr_runtime_suspend+0xc/0x20 [sr_mod] > > > *pdpt = 3696e001 *pde = 00 > > > Oops: [#1] SMB > > > Modules linked in: sd_mod(+) sr_mod(+) cdrom ata_generic ohci_pci ahci > > > libahci pata_amd firwire_ohci firewire_core crc_iti_t forcedeth libata > > > scsi_mod ohci_hcd ehci_pci ehci_hcd usbcore usb_common fan thermal > > > thermal_sys floppy(+) > > > CPU: 1 PID: 73 Comm: systemd-udevd Not tainted 4.2.0-1-686-pae #1 Debian > > > 4.2.3-1 > > > Hardware name: Packard Bell imedia S3210/WMCP78M, BIOs P01-B2 11/06/2009 > > > task: f68dd040 ti: f6988000 task.ti: f6988000 > > > EIP: 0060:[] EFLAGS: 00010246 CPU: 1 > > > EIP is at sr_runtime_suspend+0xc/0x20 [sr_mod] > > > EAX: EBX: f6a30cd8 ECX: f6c03d2c EDX: > > > ESI: EDI: f828e100 EBP: f6989ba8 ESP: f6989b88 > > > DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 > > > CR0: 8005003b CR2: 0014 CR3: 3696d780 CR4: 06f0 > > > Stack: > > > af83346c3 0001 fff5 f6a7d150 f6a30cd8 f6a30d3c > > > f6989bbc c1390cb7 f6a30cd8 f8334660 f6989bd0 c1390d0f f6a30cd8 > > > f8334660 f6989c0c c13916cb f694a614 f68dd040 0008 > > > Call Trace: > > > […] ? scsi_runtime_suspend+0x63/0xa0 [scsi_mod] > > > […] ? __rpm_callback+0x27/0x60 > > > […] > [...] > > Ben Hutchings asked me to test the patch below to get more debug > > information. > [...] > > Well, that didn't help much. Paul hit another oops, this time in > sd_mod but again apparently related to runtime PM. My patch only > touched sr_mod. 
>
> This time he sent photos of the complete oops; see
> <https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=801925;filename=20151020_005.jpg;att=4;msg=15>
> and
> <https://bugs.debian.org/cgi-bin/bugreport.cgi?filename=20151020_006.jpg;bug=801925;att=3;msg=15>

After backing up my data, I tested a little bit more, and with Linux 3.19 the drive is detected and the system boots.

Does anything stand out that changed in this area between Linux 3.19 and 4.1?

Thanks

Paul

--
go~mus | Visitor management
▶ November 18 to 20, 2015 // Messe Köln, booth D054
Visit us at EXPONATEC and get to know the visitor management software used by leading museum associations in Europe.
More information about go~mus is available at https://www.gomus.de
~
GPG key: 33623E9B
Fingerprint = 0EB1 649D 4361 D04F 3C70 6F71 4DD7 BF75 3362 3E9B
Giant Monkey Software Engineering GmbH
Brunnenstr. 7D
10119 Berlin Mitte
Managing directors: Adrian Fuhrmann, Lion Vollnhals and Paul Menzel
VAT ID: DE281524720
HRB 139495 B, Amtsgericht Charlottenburg
Re: NULL pointer dereference: IP: [] sr_runtime_suspend+0xc/0x20 [sr_mod]
Package: linux-image-4.2.0-1-686-pae
Version: 4.2.3-2
Severity: important

Dear Linux SCSI folks,

please don’t include the address sub...@bugs.debian.org in your reply.

On Friday, 16.10.2015 at 03:05 +0200, Paul Menzel wrote:

> using Debian Sid/unstable with Linux 4.2.3-1, after upgrading systemd
> from 227-1 to 227-2 [1] along with other packages, the system doesn’t
> start up anymore: the /dev/md1 device doesn’t seem to be found, and I
> am dropped into a shell from the initramfs (BusyBox).
>
> Only having wireless LAN and no serial or USB debug capabilities, and
> since mounting a USB storage device did not work, I manually copied
> the beginning of the Oops.
>
> ```
> BUG: unable to handle kernel NULL pointer dereference at 0014
> IP: [] sr_runtime_suspend+0xc/0x20 [sr_mod]
> *pdpt = 3696e001 *pde = 00
> Oops: [#1] SMB
> Modules linked in: sd_mod(+) sr_mod(+) cdrom ata_generic ohci_pci ahci
> libahci pata_amd firwire_ohci firewire_core crc_iti_t forcedeth libata
> scsi_mod ohci_hcd ehci_pci ehci_hcd usbcore usb_common fan thermal
> thermal_sys floppy(+)
> CPU: 1 PID: 73 Comm: systemd-udevd Not tainted 4.2.0-1-686-pae #1 Debian
> 4.2.3-1
> Hardware name: Packard Bell imedia S3210/WMCP78M, BIOs P01-B2 11/06/2009
> task: f68dd040 ti: f6988000 task.ti: f6988000
> EIP: 0060:[] EFLAGS: 00010246 CPU: 1
> EIP is at sr_runtime_suspend+0xc/0x20 [sr_mod]
> EAX: EBX: f6a30cd8 ECX: f6c03d2c EDX:
> ESI: EDI: f828e100 EBP: f6989ba8 ESP: f6989b88
> DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> CR0: 8005003b CR2: 0014 CR3: 3696d780 CR4: 06f0
> Stack:
> af83346c3 0001 fff5 f6a7d150 f6a30cd8 f6a30d3c
> f6989bbc c1390cb7 f6a30cd8 f8334660 f6989bd0 c1390d0f f6a30cd8
> f8334660 f6989c0c c13916cb f694a614 f68dd040 0008
> Call Trace:
> […] ? scsi_runtime_suspend+0x63/0xa0 [scsi_mod]
> […] ? __rpm_callback+0x27/0x60
> […]
> ```
>
> I also tried to boot with Linux 4.1, and it fails the same way.
>
> Is that a known problem, and has it been fixed in the meantime? It’d be
> great if you helped me get the system to boot again. Please tell me
> if you need more information to debug this issue and I’ll do my best to
> get it.

Ben Hutchings asked me to test the patch below to get more debug
information.

```
diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index 8bd54a6..dd5b5b2 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -144,6 +144,12 @@ static int sr_runtime_suspend(struct device *dev)
 {
 	struct scsi_cd *cd = dev_get_drvdata(dev);
 
+	if (WARN_ON(!cd)) {
+		pr_info("%s: cd == NULL; power.usage_count = %d\n",
+			__func__, atomic_read(&dev->power.usage_count));
+		return 0;
+	}
+
 	if (cd->media_present)
 		return -EBUSY;
 	else
@@ -652,7 +658,13 @@ static int sr_probe(struct device *dev)
 	struct scsi_cd *cd;
 	int minor, error;
 
-	scsi_autopm_get_device(sdev);
+	error = scsi_autopm_get_device(sdev);
+	if (error) {
+		pr_err("%s: scsi_autopm_get_device returned %d\n",
+		       __func__, error);
+		return error;
+	}
+
 	error = -ENODEV;
 	if (sdev->type != TYPE_ROM && sdev->type != TYPE_WORM)
 		goto fail;
@@ -719,6 +731,9 @@ static int sr_probe(struct device *dev)
 	if (register_cdrom(&cd->cdi))
 		goto fail_put;
 
+	pr_info("%s: power.usage_count = %d\n",
+		__func__, atomic_read(&dev->power.usage_count));
+
 	/*
 	 * Initialize block layer runtime PM stuffs before the
 	 * periodic event checking request gets started in add_disk.
```

I’ll try that as soon as a spare drive has arrived to which I can copy
the data as a backup.

More thoughts are welcome, especially on whether that error suggests a
failing drive.


Thanks,

Paul


> [1]
> http://metadata.ftp-master.debian.org/changelogs//main/s/systemd/systemd_227-2_changelog

-- 
GPG key: 33623E9B
Fingerprint = 0EB1 649D 4361 D04F 3C70 6F71 4DD7 BF75 3362 3E9B

Giant Monkey Software Engineering GmbH
Brunnenstr. 7D
10119 Berlin Mitte

Managing directors: Adrian Fuhrmann, Lion Vollnhals and Paul Menzel
VAT ID (USt-IdNr.): DE281524720
HRB 139495 B Amtsgericht Charlottenburg
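The guard that the debug patch adds to sr_runtime_suspend() can be tried outside the kernel. The following is a minimal userspace sketch, not the kernel code: `struct mock_cd`, `struct mock_device` and `mock_sr_runtime_suspend()` are hypothetical stand-ins, `fprintf()` replaces `pr_info()`, and `WARN_ON()` is left out. It assumes, as in the kernel function, that a present medium yields -EBUSY and everything else yields 0.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct scsi_cd; only the field used here. */
struct mock_cd {
	int media_present;
};

/* Hypothetical stand-in for struct device. */
struct mock_device {
	void *driver_data;	/* what dev_get_drvdata() would return */
	int usage_count;	/* stand-in for power.usage_count */
};

/*
 * Mirrors the guarded suspend callback from the debug patch: if no
 * driver data has been attached yet, log the usage count and report
 * success instead of dereferencing a NULL pointer.
 */
static int mock_sr_runtime_suspend(struct mock_device *dev)
{
	struct mock_cd *cd;

	assert(dev != NULL);
	cd = dev->driver_data;

	if (!cd) {
		fprintf(stderr, "%s: cd == NULL; usage_count = %d\n",
			__func__, dev->usage_count);
		return 0;
	}

	return cd->media_present ? -EBUSY : 0;
}
```

Calling `mock_sr_runtime_suspend()` on a device whose `driver_data` is still NULL returns 0 and prints the counter, which is the information the patch is meant to collect; without the guard, the same call would fault on the read of `media_present`.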
Re: NULL pointer dereference: IP: [] sr_runtime_suspend+0xc/0x20 [sr_mod]
Dear Linux SCSI folks,

On Friday, 16.10.2015 at 09:54 +0200, Paul Menzel wrote:
> Package: linux-image-4.2.0-1-686-pae
> Version: 4.2.3-2
> Severity: important

> please don’t include the address sub...@bugs.debian.org in your reply.

this issue is now also tracked in the Debian Bug Tracking System [2] and
has the number #801925 [3]. Please keep that address in CC.

> On Friday, 16.10.2015 at 03:05 +0200, Paul Menzel wrote:
>
> > using Debian Sid/unstable with Linux 4.2.3-1, after upgrading systemd
> > from 227-1 to 227-2 [1] along with other packages, the system doesn’t
> > start up anymore: the /dev/md1 device doesn’t seem to be found, and I
> > am dropped into a shell from the initramfs (BusyBox).
> >
> > Only having wireless LAN and no serial or USB debug capabilities, and
> > since mounting a USB storage device did not work, I manually copied
> > the beginning of the Oops.
> >
> > ```
> > BUG: unable to handle kernel NULL pointer dereference at 0014
> > IP: [] sr_runtime_suspend+0xc/0x20 [sr_mod]
> > *pdpt = 3696e001 *pde = 00
> > Oops: [#1] SMB
> > Modules linked in: sd_mod(+) sr_mod(+) cdrom ata_generic ohci_pci ahci
> > libahci pata_amd firwire_ohci firewire_core crc_iti_t forcedeth libata
> > scsi_mod ohci_hcd ehci_pci ehci_hcd usbcore usb_common fan thermal
> > thermal_sys floppy(+)
> > CPU: 1 PID: 73 Comm: systemd-udevd Not tainted 4.2.0-1-686-pae #1 Debian
> > 4.2.3-1
> > Hardware name: Packard Bell imedia S3210/WMCP78M, BIOs P01-B2 11/06/2009
> > task: f68dd040 ti: f6988000 task.ti: f6988000
> > EIP: 0060:[] EFLAGS: 00010246 CPU: 1
> > EIP is at sr_runtime_suspend+0xc/0x20 [sr_mod]
> > EAX: EBX: f6a30cd8 ECX: f6c03d2c EDX:
> > ESI: EDI: f828e100 EBP: f6989ba8 ESP: f6989b88
> > DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> > CR0: 8005003b CR2: 0014 CR3: 3696d780 CR4: 06f0
> > Stack:
> > af83346c3 0001 fff5 f6a7d150 f6a30cd8 f6a30d3c
> > f6989bbc c1390cb7 f6a30cd8 f8334660 f6989bd0 c1390d0f f6a30cd8
> > f8334660 f6989c0c c13916cb f694a614 f68dd040 0008
> > Call Trace:
> > […] ? scsi_runtime_suspend+0x63/0xa0 [scsi_mod]
> > […] ? __rpm_callback+0x27/0x60
> > […]
> > ```
> >
> > I also tried to boot with Linux 4.1, and it fails the same way.
> >
> > Is that a known problem, and has it been fixed in the meantime? It’d be
> > great if you helped me get the system to boot again. Please tell me
> > if you need more information to debug this issue and I’ll do my best to
> > get it.
>
> Ben Hutchings asked me to test the patch below to get more debug
> information.
>
> ```
> diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
> index 8bd54a6..dd5b5b2 100644
> --- a/drivers/scsi/sr.c
> +++ b/drivers/scsi/sr.c
> @@ -144,6 +144,12 @@ static int sr_runtime_suspend(struct device *dev)
>  {
>  	struct scsi_cd *cd = dev_get_drvdata(dev);
>  
> +	if (WARN_ON(!cd)) {
> +		pr_info("%s: cd == NULL; power.usage_count = %d\n",
> +			__func__, atomic_read(&dev->power.usage_count));
> +		return 0;
> +	}
> +
>  	if (cd->media_present)
>  		return -EBUSY;
>  	else
> @@ -652,7 +658,13 @@ static int sr_probe(struct device *dev)
>  	struct scsi_cd *cd;
>  	int minor, error;
>  
> -	scsi_autopm_get_device(sdev);
> +	error = scsi_autopm_get_device(sdev);
> +	if (error) {
> +		pr_err("%s: scsi_autopm_get_device returned %d\n",
> +		       __func__, error);
> +		return error;
> +	}
> +
>  	error = -ENODEV;
>  	if (sdev->type != TYPE_ROM && sdev->type != TYPE_WORM)
>  		goto fail;
> @@ -719,6 +731,9 @@ static int sr_probe(struct device *dev)
>  	if (register_cdrom(&cd->cdi))
>  		goto fail_put;
>  
> +	pr_info("%s: power.usage_count = %d\n",
> +		__func__, atomic_read(&dev->power.usage_count));
> +
>  	/*
>  	 * Initialize block layer runtime PM stuffs before the
>  	 * periodic event checking request gets started in add_disk.
> ```
>
> I’ll try that as soon as a spare drive has arrived to which I can copy
> the data as a backup.
>
> More thoughts are welcome, especially on whether that error suggests a
> failing drive.


Thanks,

Paul


> > [1]
> > http://metadata.ftp-master.debian.org/changelogs//main/s/systemd/systemd_227-2_changelog
[2] https://www.debian.org/Bugs/
[3] https://bugs.debian.org/801925
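The quoted debug patch prints power.usage_count because the runtime PM core is not expected to invoke a suspend callback while that counter is above zero, and scsi_autopm_get_device() in sr_probe() is meant to raise it for the duration of probing. The sketch below models only that one rule in userspace. It is a deliberate simplification under stated assumptions: `struct mock_pm` and the `mock_*` helpers are hypothetical names, and the real runtime PM core checks further conditions that are not modelled here.

```c
#include <assert.h>

/* Hypothetical stand-in for the runtime PM state of one device. */
struct mock_pm {
	int usage_count;
};

/* Models scsi_autopm_get_device(): take a reference, blocking suspend. */
static void mock_autopm_get(struct mock_pm *pm)
{
	pm->usage_count++;
}

/* Models scsi_autopm_put_device(): drop the reference again. */
static void mock_autopm_put(struct mock_pm *pm)
{
	assert(pm->usage_count > 0);
	pm->usage_count--;
}

/* Simplified rule: suspend may only be attempted with no references held. */
static int mock_suspend_allowed(const struct mock_pm *pm)
{
	return pm->usage_count == 0;
}
```

In this model, a suspend attempt between `mock_autopm_get()` and `mock_autopm_put()` is refused. If the patched kernel logs a usage count of zero together with `cd == NULL`, that would suggest the reference was not held when the callback ran, which is what the added `pr_err()` and `pr_info()` calls are there to show.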
NULL pointer dereference: IP: [] sr_runtime_suspend+0xc/0x20 [sr_mod]
Dear Linux SCSI folks,

using Debian Sid/unstable with Linux 4.2.3-1, after upgrading systemd
from 227-1 to 227-2 [1] along with other packages, the system doesn’t
start up anymore: the /dev/md1 device doesn’t seem to be found, and I
am dropped into a shell from the initramfs (BusyBox).

Only having wireless LAN and no serial or USB debug capabilities, and
since mounting a USB storage device did not work, I manually copied
the beginning of the Oops.

```
BUG: unable to handle kernel NULL pointer dereference at 0014
IP: [] sr_runtime_suspend+0xc/0x20 [sr_mod]
*pdpt = 3696e001 *pde = 00
Oops: [#1] SMB
Modules linked in: sd_mod(+) sr_mod(+) cdrom ata_generic ohci_pci ahci
libahci pata_amd firwire_ohci firewire_core crc_iti_t forcedeth libata
scsi_mod ohci_hcd ehci_pci ehci_hcd usbcore usb_common fan thermal
thermal_sys floppy(+)
CPU: 1 PID: 73 Comm: systemd-udevd Not tainted 4.2.0-1-686-pae #1 Debian
4.2.3-1
Hardware name: Packard Bell imedia S3210/WMCP78M, BIOs P01-B2 11/06/2009
task: f68dd040 ti: f6988000 task.ti: f6988000
EIP: 0060:[] EFLAGS: 00010246 CPU: 1
EIP is at sr_runtime_suspend+0xc/0x20 [sr_mod]
EAX: EBX: f6a30cd8 ECX: f6c03d2c EDX:
ESI: EDI: f828e100 EBP: f6989ba8 ESP: f6989b88
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
CR0: 8005003b CR2: 0014 CR3: 3696d780 CR4: 06f0
Stack:
af83346c3 0001 fff5 f6a7d150 f6a30cd8 f6a30d3c
f6989bbc c1390cb7 f6a30cd8 f8334660 f6989bd0 c1390d0f f6a30cd8
f8334660 f6989c0c c13916cb f694a614 f68dd040 0008
Call Trace:
[…] ? scsi_runtime_suspend+0x63/0xa0 [scsi_mod]
[…] ? __rpm_callback+0x27/0x60
[…]
```

I also tried to boot with Linux 4.1, and it fails the same way.

Is that a known problem, and has it been fixed in the meantime? It’d be
great if you helped me get the system to boot again. Please tell me
if you need more information to debug this issue and I’ll do my best to
get it.


Thanks,

Paul


[1] http://metadata.ftp-master.debian.org/changelogs//main/s/systemd/systemd_227-2_changelog
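The Oops reports a fault address of 0014 (also in CR2) inside sr_runtime_suspend(). That is consistent with reading a structure member at offset 0x14 through a NULL base pointer, since the accessed address is the base plus the member offset. The sketch below illustrates only this arithmetic. `struct mock_cd` is a hypothetical layout chosen so that `media_present` lands at offset 0x14; the layout of the real `struct scsi_cd` on this 32-bit build is an assumption, not something the report states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout: five 4-byte fields (20 bytes = 0x14) precede
 * the member that the suspend callback reads first.
 */
struct mock_cd {
	uint32_t earlier_fields[5];
	uint8_t media_present;
};

/* Address the CPU would touch when reading ->media_present via `base`. */
static uintptr_t member_access_address(uintptr_t base)
{
	return base + offsetof(struct mock_cd, media_present);
}

/* Sanity check of the assumed layout; aborts if the offset is not 0x14. */
static void check_assumed_layout(void)
{
	assert(offsetof(struct mock_cd, media_present) == 0x14);
}
```

With a base of 0 (a NULL pointer), `member_access_address()` yields 0x14, matching the reported address. Under the stated assumption, this points at missing driver data rather than at a corrupted pointer.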
[PATCH 1/3] Documentation: scsi.txt: Remove unused abbreviation lk
From: Paul Menzel <paulepan...@users.sourceforge.net>
Date: Tue, 14 Aug 2012 11:48:04 +0200

»lk« is not used anywhere in the document.

Signed-off-by: Paul Menzel <paulepan...@users.sourceforge.net>
---
 Documentation/scsi/scsi.txt |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/scsi/scsi.txt b/Documentation/scsi/scsi.txt
index 3d99d38..45b9c25 100644
--- a/Documentation/scsi/scsi.txt
+++ b/Documentation/scsi/scsi.txt
@@ -1,7 +1,7 @@
 SCSI subsystem documentation
 The Linux Documentation Project (LDP) maintains a document describing
-the SCSI subsystem in the Linux kernel (lk) 2.4 series. See:
+the SCSI subsystem in the Linux kernel 2.4 series. See:
 http://www.tldp.org/HOWTO/SCSI-2.4-HOWTO . The LDP has single
 and multiple page HTML renderings as well as postscript and pdf.
 It can also be found at:
-- 
1.7.10.4
[PATCH 2/3] Documentation/scsi/scsi.txt: Clean up typography and fix grammar
From: Paul Menzel <paulepan...@users.sourceforge.net>
Date: Tue, 14 Aug 2012 11:59:31 +0200

1. Consistently use SCSI and Linux.
2. Use two spaces between sentences.
3. Remove trailing white space.

Signed-off-by: Paul Menzel <paulepan...@users.sourceforge.net>
---
 Documentation/scsi/scsi.txt |   30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/Documentation/scsi/scsi.txt b/Documentation/scsi/scsi.txt
index 45b9c25..56afe6c 100644
--- a/Documentation/scsi/scsi.txt
+++ b/Documentation/scsi/scsi.txt
@@ -9,33 +9,33 @@ http://web.archive.org/web/*/http://www.torque.net/scsi/SCSI-2.4-HOWTO
 
 Notes on using modules in the SCSI subsystem
-The scsi support in the linux kernel can be modularized in a number of
+The SCSI support in the Linux kernel can be modularized in a number of
 different ways depending upon the needs of the end user. To understand
 your options, we should first define a few terms.
 
-The scsi-core (also known as the mid level) contains the core of scsi
-support. Without it you can do nothing with any of the other scsi drivers.
-The scsi core support can be a module (scsi_mod.o), or it can be built into
-the kernel. If the core is a module, it must be the first scsi module
-loaded, and if you unload the modules, it will have to be the last one
+The scsi-core (also known as the mid level) contains the core of SCSI
+support. Without it you can do nothing with any of the other SCSI drivers.
+The SCSI core support can be a module (scsi_mod.o), or it can be built into
+the kernel. If the core is a module, it must be the first SCSI module
+loaded, and if you unload the modules, it will have to be the last one
 unloaded. In practice the modprobe and rmmod commands (and autoclean)
 will enforce the correct ordering of loading and unloading modules in
 the SCSI subsystem.
 
-The individual upper and lower level drivers can be loaded in any order
-once the scsi core is present in the kernel (either compiled in or loaded
+The individual upper and lower level drivers can be loaded in any order
+once the SCSI core is present in the kernel (either compiled in or loaded
 as a module). The disk driver (sd_mod.o), cdrom driver (sr_mod.o),
-tape driver ** (st.o) and scsi generics driver (sg.o) represent the upper
-level drivers to support the various assorted devices which can be
-controlled. You can for example load the tape driver to use the tape drive,
+tape driver ** (st.o) and SCSI generics driver (sg.o) represent the upper
+level drivers to support the various assorted devices which can be
+controlled. You can for example load the tape driver to use the tape drive,
 and then unload it once you have no further need for the driver (and
 release the associated memory).
 
 The lower level drivers are the ones that support the individual cards that
-are supported for the hardware platform that you are running under. Those
-individual cards are often called Host Bus Adapters (HBAs). For example the
-aic7xxx.o driver is used to control all recent SCSI controller cards from
-Adaptec. Almost all lower level drivers can be built either as modules or
+are supported for the hardware platform that you are running under. Those
+individual cards are often called Host Bus Adapters (HBAs). For example the
+aic7xxx.o driver is used to control all recent SCSI controller cards from
+Adaptec. Almost all lower level drivers can be built either as modules or
 built into the kernel.
-- 
1.7.10.4
[PATCH 3/3] Documentation/scsi/scsi.txt: Remove wrong superfluous word »built«
From: Paul Menzel <paulepan...@users.sourceforge.net>
Date: Tue, 14 Aug 2012 12:01:51 +0200

Signed-off-by: Paul Menzel <paulepan...@users.sourceforge.net>
---
I am sending this as a separate patch because I am not a native speaker.
But I am fairly sure the second »built« is superfluous, because »either«
comes after the first »built«.

 Documentation/scsi/scsi.txt |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/scsi/scsi.txt b/Documentation/scsi/scsi.txt
index 56afe6c..71ab560 100644
--- a/Documentation/scsi/scsi.txt
+++ b/Documentation/scsi/scsi.txt
@@ -36,7 +36,7 @@ are supported for the hardware platform that you are running under. Those
 individual cards are often called Host Bus Adapters (HBAs). For example the
 aic7xxx.o driver is used to control all recent SCSI controller cards from
 Adaptec. Almost all lower level drivers can be built either as modules or
-built into the kernel.
+into the kernel.
 
 ** There is a variant of the st driver for controlling OnStream tape
-- 
1.7.10.4
[PATCH] drivers/scsi/Kconfig: Remove reference to non-existent howtos
From: Paul Menzel <paulepan...@users.sourceforge.net>
Date: Tue, 14 Aug 2012 12:22:43 +0200

Searching for »scsi« at http://www.tldp.org/HOWTO/html_single/ I only
found »SCSI-2.4-HOWTO« [1].

    The Linux 2.4 SCSI subsystem HOWTO

[1] http://www.tldp.org/HOWTO/html_single/SCSI-2.4-HOWTO/

Signed-off-by: Paul Menzel <paulepan...@users.sourceforge.net>
---
 drivers/scsi/Kconfig |    5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index 74bf1aa..d717116 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -74,10 +74,7 @@ config BLK_DEV_SD
 	  If you want to use SCSI hard disks, Fibre Channel disks,
 	  Serial ATA (SATA) or Parallel ATA (PATA) hard disks,
 	  USB storage or the SCSI or parallel port version of
-	  the IOMEGA ZIP drive, say Y and read the SCSI-HOWTO,
-	  the Disk-HOWTO and the Multi-Disk-HOWTO, available from
-	  http://www.tldp.org/docs.html#howto. This is NOT for SCSI
-	  CD-ROMs.
+	  the IOMEGA ZIP drive, say Y. This is NOT for SCSI CD-ROMs.
 
 	  To compile this driver as a module, choose M here and read
 	  <file:Documentation/scsi/scsi.txt>.
-- 
1.7.10.4