Re: [PATCH] libsas: Disable asynchronous aborts for SATA devices
On Tue, 2018-01-09 at 16:43 +0100, Hannes Reinecke wrote:
> Handling CD-ROM devices from libsas is decidedly odd, as libata
> relies on SCSI EH to be started to figure out that no medium is
> present.
> So we cannot do asynchronous aborts for SATA devices.

The box boots fine with this change, thanks!

Tested-by: Yves-Alexis Perez

--
Yves-Alexis
Re: [PATCHv2] libsas: Check for completed commands before calling lldd_abort_task()
On Tue, 2018-01-09 at 10:30 +0100, Hannes Reinecke wrote:
> Can you try to boot the stock 4.15 kernel (without any patches) with
> scsi_mod.scsi_logging_level=9411
> on the kernel commandline and send me the output?
> I really would like to see which command fails.
> THX.

Here it is:

[ 204.960508] sd 0:0:1:0: [sdb] tag#0 Send: scmd 0x6f2d047e
[ 204.960510] sd 0:0:0:0: [sda] tag#1 Send: scmd 0x9840a325
[ 204.960511] sd 0:0:1:0: [sdb] tag#0 CDB: ATA command pass through(16) 85 06 20 00 05 00 fe 00 00 00 00 00 00 40 ef 00
[ 204.960512] sd 0:0:0:0: [sda] tag#1 CDB: ATA command pass through(16) 85 06 20 00 05 00 fe 00 00 00 00 00 00 40 ef 00
[ 204.960583] sd 0:0:1:0: [sdb] tag#0 Done: TIMEOUT_ERROR Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[ 204.960586] sd 0:0:1:0: [sdb] tag#0 CDB: ATA command pass through(16) 85 06 20 00 05 00 fe 00 00 00 00 00 00 40 ef 00
[ 204.983681] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 204.983690] IP: isci_task_abort_task+0x30/0x4b0 [isci]
[ 204.983691] PGD 0 P4D 0
[ 204.983693] Oops: [#1] SMP PTI
[ 204.983695] Modules linked in: snd_hwdep(+) snd_hda_core snd_pcm_oss snd_mixer_oss iTCO_wdt snd_pcm iTCO_vendor_support mei_me snd_timer snd lpc_ich shpchp sg mei joydev evdev mfd_core soundcore dcdbas serio_raw intel_rapl_perf(+) iptable_filter(+) nls_cp437 nls_utf8 vfat fat coretemp dell_smm_hwmon parport_pc ppdev lp parport ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto algif_skcipher af_alg dm_crypt dm_mod tpm_rng rng_core uhci_hcd pcrypt sr_mod cdrom sd_mod hid_lenovo usbhid hid nouveau crct10dif_pclmul crc32_pclmul video isci crc32c_intel mxm_wmi ahci xhci_pci libsas ehci_pci ghash_clmulni_intel wmi libahci scsi_transport_sas xhci_hcd ehci_hcd i2c_algo_bit pcbc drm_kms_helper libata e1000e ttm aesni_intel aes_x86_64 crypto_simd cryptd psmouse ptp glue_helper i2c_i801 usbcore scsi_mod
[ 204.983738] pps_core drm button
[ 204.983741] CPU: 11 PID: 262 Comm: kworker/u64:5 Not tainted 4.15.0-rc7 #2
[ 204.983742] Hardware name: Dell Inc. Precision T5600/0Y56T3, BIOS A09 05/03/2013
[ 204.983750] Workqueue: scsi_tmf_0 scmd_eh_abort_handler [scsi_mod]
[ 204.983755] RIP: 0010:isci_task_abort_task+0x30/0x4b0 [isci]
[ 204.983756] RSP: 0018:b0504152fcd8 EFLAGS: 00010246
[ 204.983757] RAX: RBX: 8a8f842d79a8 RCX:
[ 204.983758] RDX: 8a8f80f7f800 RSI: RDI:
[ 204.983759] RBP: 8a8f842d7948 R08: 8a8f870e1f60 R09:
[ 204.983759] R10: R11: 0f27 R12: 0008
[ 204.983760] R13: R14: 8a8f80b51540 R15:
[ 204.983761] FS: () GS:8a8f870c() knlGS:
[ 204.983774] CS: 0010 DS: ES: CR0: 80050033
[ 204.983775] CR2: CR3: 000200a0a005 CR4: 000606e0
[ 204.983775] Call Trace:
[ 204.983781] ? sched_clock+0x5/0x10
[ 204.983784] ? sched_clock_cpu+0xc/0xb0
[ 204.983787] ? sched_clock+0x5/0x10
[ 204.983788] ? sched_clock+0x5/0x10
[ 204.983790] ? sched_clock_cpu+0xc/0xb0
[ 204.983791] ? pick_next_task_fair+0x4de/0x5f0
[ 204.983794] ? __switch_to+0xa2/0x450
[ 204.983795] ? put_prev_entity+0x1e/0xe0
[ 204.983799] sas_eh_abort_handler+0x2f/0x50 [libsas]
[ 204.983805] scmd_eh_abort_handler+0x56/0x210 [scsi_mod]
[ 204.983809] process_one_work+0x188/0x380
[ 204.983811] worker_thread+0x2e/0x390
[ 204.983812] ? process_one_work+0x380/0x380
[ 204.983814] kthread+0x111/0x130
[ 204.983815] ? kthread_create_worker_on_cpu+0x70/0x70
[ 204.983819] ret_from_fork+0x1f/0x30
[ 204.983820] Code: 41 57 41 56 49 89 ff 41 55 41 54 4d 8d 67 08 55 53 48 81 ec 68 01 00 00 65 48 8b 04 25 28 00 00 00 48 89 84 24 60 01 00 00 31 c0 <48> 8b 07 c7 44 24 08 00 00 00 00 c7 44 24 10 00 00 00 00 48 8b
[ 204.983843] RIP: isci_task_abort_task+0x30/0x4b0 [isci] RSP: b0504152fcd8
[ 204.983843] CR2:
[ 204.983845] ---[ end trace c55806a9bed49dc4 ]---

I have the full log from boot (2.2M) but I'm unsure how much private data
appears in it, so I'd rather not send it to a mailing list. If you need it
I'll send it privately.

Regards,
--
Yves-Alexis
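Since the point of the log above is to see which command fails, it may help to decode the CDB that times out. The following is a minimal sketch, assuming the ATA PASS-THROUGH (16) byte layout from the SAT standard; the struct and function names are made up for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Selected fields of an ATA PASS-THROUGH (16) CDB (hypothetical struct). */
struct ata_pt16 {
    uint8_t protocol;  /* byte 1, bits 4:1 (3 = non-data) */
    uint8_t extend;    /* byte 1, bit 0 */
    uint8_t ck_cond;   /* byte 2, bit 5 */
    uint8_t features;  /* byte 4, FEATURES (7:0) */
    uint8_t count;     /* byte 6, COUNT (7:0) */
    uint8_t device;    /* byte 13 */
    uint8_t command;   /* byte 14, ATA command opcode */
};

/* Returns 0 on success, -1 if the opcode is not 0x85. */
int ata_pt16_decode(const uint8_t cdb[16], struct ata_pt16 *out)
{
    assert(cdb != NULL && out != NULL);
    if (cdb[0] != 0x85)
        return -1;
    out->protocol = (cdb[1] >> 1) & 0x0f;
    out->extend   = cdb[1] & 0x01;
    out->ck_cond  = (cdb[2] >> 5) & 0x01;
    out->features = cdb[4];
    out->count    = cdb[6];
    out->device   = cdb[13];
    out->command  = cdb[14];
    return 0;
}
```

Fed the bytes from the log (85 06 20 00 05 00 fe 00 00 00 00 00 00 40 ef 00), this yields protocol 3 (non-data), command 0xef, features 0x05 and count 0xfe. In the ATA command set 0xef is SET FEATURES and subcommand 0x05 enables advanced power management, so the command that times out looks like an APM setting being applied to both disks; that reading is an interpretation of the bytes, not something stated in the thread.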
Re: [PATCHv2] libsas: Check for completed commands before calling lldd_abort_task()
On Mon, 2018-01-08 at 13:04 +0100, Hannes Reinecke wrote:
> The abort handler might be racing with command completion, so the
> task might already be NULL by the time the abort handler is called.

Hi,

I tried the patch on top of 4.15-rc7 (and without the revert of 90965761). I
don't have the NULL pointer deref anymore, but the box times out after SCSI
init. It's not completely dead, but I don't have any meaningful log from the
kernel.

Regards,
--
Yves-Alexis
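The idea in the quoted patch description, checking whether the command has already completed before handing it to the low-level abort routine, can be modelled in user space. This is only a sketch under stated assumptions: every type and function name below is hypothetical, it is not the libsas code or the actual patch, and a real handler would also need locking to close the race rather than just a NULL test.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the error-handler return codes. */
enum eh_result { EH_SUCCESS, EH_FAILED };

/* Hypothetical task and command objects. */
struct task { int id; int aborted; };
struct cmd  { struct task *task; /* NULL once the command has completed */ };

/* Hypothetical low-level abort; it dereferences the task unconditionally. */
int lld_abort_task(struct task *t)
{
    t->aborted = 1;
    return 0;
}

/* Abort handler that bails out early when completion already cleared the task. */
enum eh_result abort_handler(struct cmd *c)
{
    assert(c != NULL);
    struct task *t = c->task;  /* read the pointer once */
    if (t == NULL)
        return EH_SUCCESS;     /* nothing left to abort */
    return lld_abort_task(t) == 0 ? EH_SUCCESS : EH_FAILED;
}
```

Without the early return, a completed command would reach lld_abort_task() with a NULL task, which is the shape of the oops reported in this thread.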
Oops: NULL pointer dereference - RIP: isci_task_abort_task+0x30/0x3e0 [isci]
Hi,

since kernel 4.11 (sorry it took so long to report) I have a box failing to
boot with a NULL pointer dereference (the box is stuck there afterwards).

The bug has also been reported to the Debian BTS
(https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=882414) and a suggestion to
revert 90965761 has been made. I can confirm it fixes the boot issue.

I don't have the complete stack trace at hand but there's an example in the
Debian bug.

The machine is a Dell Precision T5600 with the following SATA controllers:

00:1f.2 SATA controller: Intel Corporation C600/X79 series chipset 6-Port SATA AHCI Controller (rev 05)
05:00.0 Serial Attached SCSI controller: Intel Corporation C602 chipset 4-Port SATA Storage Control Unit (rev 05)

If you need more information or need me to test something, please ask.

Regards,
--
Yves-Alexis
Re: Page allocation failure (order 7) in UAS code
On ven., 2016-03-04 at 08:18 +0100, Hans de Goede wrote:
> Thanks for testing, there shouldn't be any side-effects, I'll turn this into
> a proper patch, add a:
>
> Reported-and-tested-by: Yves-Alexis Perez
>
> line to the commit msg and submit this upstream.

Thanks, I guess it can also be CC: stable@ since it started in 4.4.

Regards,
--
Yves-Alexis
Re: Page allocation failure (order 7) in UAS code
On mar., 2016-03-01 at 11:49 +0100, Hans de Goede wrote:
> Hi,
>
> On 01-03-16 10:42, Yves-Alexis Perez wrote:
> > Hi,
> >
> > [sorry if this is not the right point for reporting bugs, I took the email
> > addresses from MAINTAINERS but please point me to the correct place if
> > needed]
> >
> > I have an external USB drive (Samsung M3), which apparently uses the UAS
> > code.
> > Starting with 4.4 (from Debian sid, I could retry with vanilla if needed), I
> > can't mount the drive anymore after a while (few hours/days uptime). Just
> > plugging the disk, I get page allocation failure in kernel logs:
>
> Can you try building a kernel with the following line in
> drivers/usb/storage/uas.c :
>
> .can_queue = 65536, /* Is there a limit on the _host_ ? */
>
> (around line 815) Replaced with
>
> .can_queue = MAX_CMNDS,
>
> That should help as MAX_CMNDS is 256, so claiming that we can queue more
> is not helpful, and that likely is what is causing this quite high order
> alloc.

After a few days, it seems that it does work fine, although I can't say
anything about side effects.

Regards,
--
Yves-Alexis
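The link between can_queue and the "order 7" failure can be checked with a little arithmetic. The sketch below assumes that the failing allocation (the trace in the original report goes through init_tag_map and __kmalloc) is an array with one pointer per queueable command, that pointers are 8 bytes, and that pages are 4 KiB; the helper names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_BYTES 4096u  /* assumption: 4 KiB pages (x86-64) */
#define PTR_SIZE_BYTES  8u     /* assumption: 64-bit pointers */

/* Size of a per-tag pointer array for a given queue depth. */
size_t tag_array_bytes(unsigned int depth)
{
    return (size_t)depth * PTR_SIZE_BYTES;
}

/* Smallest order such that (PAGE_SIZE_BYTES << order) >= bytes. */
unsigned int alloc_order(size_t bytes)
{
    unsigned int order = 0;
    size_t span = PAGE_SIZE_BYTES;

    assert(bytes > 0);
    while (span < bytes) {
        span <<= 1;
        order++;
    }
    return order;
}
```

With can_queue = 65536 the array is 65536 * 8 = 524288 bytes, i.e. 128 contiguous pages or order 7, which matches the failure message. With MAX_CMNDS = 256 it is 2048 bytes and fits in a single page (order 0), which is consistent with the problem no longer showing up after the change.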
Re: Page allocation failure (order 7) in UAS code
On mar., 2016-03-01 at 11:49 +0100, Hans de Goede wrote:
> Can you try building a kernel with the following line in
> drivers/usb/storage/uas.c :
>
> .can_queue = 65536, /* Is there a limit on the _host_ ? */
>
> (around line 815) Replaced with
>
> .can_queue = MAX_CMNDS,
>
> That should help as MAX_CMNDS is 256, so claiming that we can queue more
> is not helpful, and that likely is what is causing this quite high order
> alloc.

Hi,

I've rebuilt a 4.4.3 with the above patch. I'll report back in a few hours on
whether I can still reproduce the failure.

Regards,
--
Yves-Alexis
Page allocation failure (order 7) in UAS code
Hi,

[sorry if this is not the right point for reporting bugs, I took the email
addresses from MAINTAINERS but please point me to the correct place if needed]

I have an external USB drive (Samsung M3), which apparently uses the UAS code.
Starting with 4.4 (from Debian sid, I could retry with vanilla if needed), I
can't mount the drive anymore after a while (few hours/days uptime). Just
plugging the disk, I get page allocation failure in kernel logs:

Feb 22 10:57:56 scapa kernel: [26557.268872] usb 2-2: new SuperSpeed USB device number 3 using xhci_hcd
Feb 22 10:57:56 scapa kernel: [26557.285884] usb 2-2: New USB device found, idVendor=04e8, idProduct=61b6
Feb 22 10:57:56 scapa kernel: [26557.285888] usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
Feb 22 10:57:56 scapa kernel: [26557.285891] usb 2-2: Product: Samsung M3 Portable
Feb 22 10:57:56 scapa kernel: [26557.285893] usb 2-2: Manufacturer: Samsung M3 Portable
Feb 22 10:57:56 scapa kernel: [26557.285895] usb 2-2: SerialNumber: 826C7DFB1367
Feb 22 10:57:56 scapa kernel: [26557.293693] scsi host4: uas
Feb 22 10:57:56 scapa kernel: [26557.293707] kworker/0:0: page allocation failure: order:7, mode:0x2204020
Feb 22 10:57:56 scapa kernel: [26557.293711] CPU: 0 PID: 11015 Comm: kworker/0:0 Tainted: GW 4.4.0-1-amd64 #1 Debian 4.4.2-3~a.test
Feb 22 10:57:56 scapa kernel: [26557.293713] Hardware name: LENOVO 20CMCTO1WW/20CMCTO1WW, BIOS N10ET41W (1.20 ) 01/19/2016
Feb 22 10:57:56 scapa kernel: [26557.293724] Workqueue: usb_hub_wq hub_event [usbcore]
Feb 22 10:57:56 scapa kernel: [26557.293727] c0207f69 812e7679 02204020
Feb 22 10:57:56 scapa kernel: [26557.293730] 8116b23a 0001c0207f69 88024dffbb20
Feb 22 10:57:56 scapa kernel: [26557.293733] 810b475e 0001020c8220 c0207f69 0046
Feb 22 10:57:56 scapa kernel: [26557.293736] Call Trace:
Feb 22 10:57:56 scapa kernel: [26557.293741] [] ? dump_stack+0x40/0x57
Feb 22 10:57:56 scapa kernel: [26557.293745] [] ? warn_alloc_failed+0xfa/0x150
Feb 22 10:57:56 scapa kernel: [26557.293749] [] ? __wake_up_common+0x4e/0x90
Feb 22 10:57:56 scapa kernel: [26557.293752] [] ? __alloc_pages_nodemask+0x306/0xb70
Feb 22 10:57:56 scapa kernel: [26557.293757] [] ? kmem_getpages+0x51/0xf0
Feb 22 10:57:56 scapa kernel: [26557.293759] [] ? fallback_alloc+0x145/0x1f0
Feb 22 10:57:56 scapa kernel: [26557.293765] [] ? init_tag_map+0x38/0xb0
Feb 22 10:57:56 scapa kernel: [26557.293768] [] ? __kmalloc+0x17f/0x1c0
Feb 22 10:57:56 scapa kernel: [26557.293771] [] ? init_tag_map+0x38/0xb0
Feb 22 10:57:56 scapa kernel: [26557.293774] [] ? __blk_queue_init_tags+0x40/0x80
Feb 22 10:57:56 scapa kernel: [26557.293781] [] ? scsi_add_host_with_dma+0x7b/0x2f0 [scsi_mod]
Feb 22 10:57:56 scapa kernel: [26557.293785] [] ? uas_probe+0x41a/0x560 [uas]
Feb 22 10:57:56 scapa kernel: [26557.293796] [] ? usb_probe_interface+0x1b3/0x300 [usbcore]
Feb 22 10:57:56 scapa kernel: [26557.293801] [] ? driver_probe_device+0x212/0x480
Feb 22 10:57:56 scapa kernel: [26557.293803] [] ? __driver_attach+0x80/0x80
Feb 22 10:57:56 scapa kernel: [26557.293806] [] ? bus_for_each_drv+0x62/0xb0
Feb 22 10:57:56 scapa kernel: [26557.293809] [] ? __device_attach+0xd8/0x160
Feb 22 10:57:56 scapa kernel: [26557.293811] [] ? bus_probe_device+0x87/0xa0
Feb 22 10:57:56 scapa kernel: [26557.293814] [] ? device_add+0x3f5/0x660
Feb 22 10:57:56 scapa kernel: [26557.293823] [] ? usb_set_configuration+0x51b/0x8f0 [usbcore]
Feb 22 10:57:56 scapa kernel: [26557.293835] [] ? generic_probe+0x28/0x80 [usbcore]
Feb 22 10:57:56 scapa kernel: [26557.293838] [] ? driver_probe_device+0x212/0x480
Feb 22 10:57:56 scapa kernel: [26557.293840] [] ? __driver_attach+0x80/0x80
Feb 22 10:57:56 scapa kernel: [26557.293842] [] ? bus_for_each_drv+0x62/0xb0
Feb 22 10:57:56 scapa kernel: [26557.293845] [] ? __device_attach+0xd8/0x160
Feb 22 10:57:56 scapa kernel: [26557.293847] [] ? bus_probe_device+0x87/0xa0
Feb 22 10:57:56 scapa kernel: [26557.293850] [] ? device_add+0x3f5/0x660
Feb 22 10:57:56 scapa kernel: [26557.293857] [] ? usb_new_device+0x265/0x490 [usbcore]
Feb 22 10:57:56 scapa kernel: [26557.293864] [] ? hub_event+0xfb3/0x14f0 [usbcore]
Feb 22 10:57:56 scapa kernel: [26557.293868] [] ? process_one_work+0x19f/0x3d0
Feb 22 10:57:56 scapa kernel: [26557.293871] [] ? worker_thread+0x4d/0x450
Feb 22 10:57:56 scapa kernel: [26557.293873] [] ? process_one_work+0x3d0/0x3d0
Feb 22 10:57:56 scapa kernel: [26557.293876] [] ? kthread+0xcd/0xf0
Feb 22 10:57:56 scapa kernel: [26557.293879] [] ? kthread_create_on_node+0x190/0x190
Feb 22 10:57:56 scapa kernel: [26557.293883] [] ? ret_from_fork+0x3f/0x70
Feb 22 10:57:56 scapa kernel: [26557.293886] [] ? kthread_create_on_node+0x190/0x190
Feb 22 10:57:56 scapa kernel: [26557.293888] Mem-Info:
Feb 22 10:57:56 scapa kernel: [26557.293892] active_anon:1