Re: [PATCH] libsas: Disable asynchronous aborts for SATA devices

2018-01-10 Thread Yves-Alexis Perez
On Tue, 2018-01-09 at 16:43 +0100, Hannes Reinecke wrote:
> Handling CD-ROM devices from libsas is decidedly odd, as libata
> relies on SCSI EH to be started to figure out that no medium is
> present.
> So we cannot do asynchronous aborts for SATA devices.

The box boots fine with this change, thanks!

Tested-by: Yves-Alexis Perez 
-- 
Yves-Alexis

signature.asc
Description: This is a digitally signed message part


Re: [PATCHv2] libsas: Check for completed commands before calling lldd_abort_task()

2018-01-09 Thread Yves-Alexis Perez
On Tue, 2018-01-09 at 10:30 +0100, Hannes Reinecke wrote:
> Can you try to boot the stock 4.15 kernel (without any patches) with
> scsi_mod.scsi_logging_level=9411
> on the kernel command line and send me the output?
> I really would like to see which command fails.
> THX.
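
For reference, scsi_logging_level is a packed bitmask in which each logging
facility owns a 3-bit verbosity field. A minimal sketch of how 9411 decodes,
assuming the shift values below match drivers/scsi/scsi_logging.h in the kernel
under test (scsi_log_field() is a hypothetical helper, not a kernel function):

```c
#include <assert.h>

/* Bit offsets of the 3-bit fields (assumed to match scsi_logging.h). */
enum {
	SCSI_LOG_ERROR_SHIFT      = 0,
	SCSI_LOG_TIMEOUT_SHIFT    = 3,
	SCSI_LOG_SCAN_SHIFT       = 6,
	SCSI_LOG_MLQUEUE_SHIFT    = 9,
	SCSI_LOG_MLCOMPLETE_SHIFT = 12,
	SCSI_LOG_LLQUEUE_SHIFT    = 15,
	SCSI_LOG_LLCOMPLETE_SHIFT = 18,
	SCSI_LOG_HLQUEUE_SHIFT    = 21,
	SCSI_LOG_HLCOMPLETE_SHIFT = 24,
	SCSI_LOG_IOCTL_SHIFT      = 27
};

/* Hypothetical helper: extract one 3-bit verbosity field (0..7). */
unsigned int scsi_log_field(unsigned int level, unsigned int shift)
{
	assert(shift <= SCSI_LOG_IOCTL_SHIFT);
	return (level >> shift) & 0x7u;
}
```

With these shifts, 9411 (octal 22303) decodes to error=3, scan=3, mlqueue=2 and
mlcomplete=2, with every other field zero.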

Here it is:

[  204.960508] sd 0:0:1:0: [sdb] tag#0 Send: scmd 0x6f2d047e
[  204.960510] sd 0:0:0:0: [sda] tag#1 Send: scmd 0x9840a325
[  204.960511] sd 0:0:1:0: [sdb] tag#0 CDB: ATA command pass through(16) 85 06 
20 00 05 00 fe 00 00 00 00 00 00 40 ef 00
[  204.960512] sd 0:0:0:0: [sda] tag#1 CDB: ATA command pass through(16) 85 06 
20 00 05 00 fe 00 00 00 00 00 00 40 ef 00
[  204.960583] sd 0:0:1:0: [sdb] tag#0 Done: TIMEOUT_ERROR Result: 
hostbyte=DID_OK driverbyte=DRIVER_OK
[  204.960586] sd 0:0:1:0: [sdb] tag#0 CDB: ATA command pass through(16) 85 06 
20 00 05 00 fe 00 00 00 00 00 00 40 ef 00
[  204.983681] BUG: unable to handle kernel NULL pointer dereference at 
  (null)
[  204.983690] IP: isci_task_abort_task+0x30/0x4b0 [isci]
[  204.983691] PGD 0 P4D 0 
[  204.983693] Oops:  [#1] SMP PTI
[  204.983695] Modules linked in: snd_hwdep(+) snd_hda_core snd_pcm_oss 
snd_mixer_oss iTCO_wdt snd_pcm iTCO_vendor_support mei_me snd_timer snd lpc_ich 
shpchp sg mei joydev evdev mfd_core soundcore dcdbas serio_raw 
intel_rapl_perf(+) iptable_filter(+) nls_cp437 nls_utf8 vfat fat coretemp 
dell_smm_hwmon parport_pc ppdev lp parport ip_tables x_tables autofs4 ext4 
crc16 mbcache jbd2 fscrypto algif_skcipher af_alg dm_crypt dm_mod tpm_rng 
rng_core uhci_hcd pcrypt sr_mod cdrom sd_mod hid_lenovo usbhid hid nouveau 
crct10dif_pclmul crc32_pclmul video isci crc32c_intel mxm_wmi ahci xhci_pci 
libsas ehci_pci ghash_clmulni_intel wmi libahci scsi_transport_sas xhci_hcd 
ehci_hcd i2c_algo_bit pcbc drm_kms_helper libata e1000e ttm aesni_intel 
aes_x86_64 crypto_simd cryptd psmouse ptp glue_helper i2c_i801 usbcore scsi_mod
[  204.983738]  pps_core drm button
[  204.983741] CPU: 11 PID: 262 Comm: kworker/u64:5 Not tainted 4.15.0-rc7 #2
[  204.983742] Hardware name: Dell Inc. Precision T5600/0Y56T3, BIOS A09 
05/03/2013
[  204.983750] Workqueue: scsi_tmf_0 scmd_eh_abort_handler [scsi_mod]
[  204.983755] RIP: 0010:isci_task_abort_task+0x30/0x4b0 [isci]
[  204.983756] RSP: 0018:b0504152fcd8 EFLAGS: 00010246
[  204.983757] RAX:  RBX: 8a8f842d79a8 RCX: 
[  204.983758] RDX: 8a8f80f7f800 RSI:  RDI: 
[  204.983759] RBP: 8a8f842d7948 R08: 8a8f870e1f60 R09: 
[  204.983759] R10:  R11: 0f27 R12: 0008
[  204.983760] R13:  R14: 8a8f80b51540 R15: 
[  204.983761] FS:  () GS:8a8f870c() 
knlGS:
[  204.983774] CS:  0010 DS:  ES:  CR0: 80050033
[  204.983775] CR2:  CR3: 000200a0a005 CR4: 000606e0
[  204.983775] Call Trace:
[  204.983781]  ? sched_clock+0x5/0x10
[  204.983784]  ? sched_clock_cpu+0xc/0xb0
[  204.983787]  ? sched_clock+0x5/0x10
[  204.983788]  ? sched_clock+0x5/0x10
[  204.983790]  ? sched_clock_cpu+0xc/0xb0
[  204.983791]  ? pick_next_task_fair+0x4de/0x5f0
[  204.983794]  ? __switch_to+0xa2/0x450
[  204.983795]  ? put_prev_entity+0x1e/0xe0
[  204.983799]  sas_eh_abort_handler+0x2f/0x50 [libsas]
[  204.983805]  scmd_eh_abort_handler+0x56/0x210 [scsi_mod]
[  204.983809]  process_one_work+0x188/0x380
[  204.983811]  worker_thread+0x2e/0x390
[  204.983812]  ? process_one_work+0x380/0x380
[  204.983814]  kthread+0x111/0x130
[  204.983815]  ? kthread_create_worker_on_cpu+0x70/0x70
[  204.983819]  ret_from_fork+0x1f/0x30
[  204.983820] Code: 41 57 41 56 49 89 ff 41 55 41 54 4d 8d 67 08 55 53 48 81 
ec 68 01 00 00 65 48 8b 04 25 28 00 00 00 48 89 84 24 60 01 00 00 31 c0 <48> 8b 
07 c7 44 24 08 00 00 00 00 c7 44 24 10 00 00 00 00 48 8b 
[  204.983843] RIP: isci_task_abort_task+0x30/0x4b0 [isci] RSP: b0504152fcd8
[  204.983843] CR2: 
[  204.983845] ---[ end trace c55806a9bed49dc4 ]---

I have the full log from boot (2.2M) but I'm unsure how much private data
appears in it so I'd rather not send it to a mailing list. If you need it I'll
send it privately.

Regards,
-- 
Yves-Alexis



Re: [PATCHv2] libsas: Check for completed commands before calling lldd_abort_task()

2018-01-08 Thread Yves-Alexis Perez
On Mon, 2018-01-08 at 13:04 +0100, Hannes Reinecke wrote:
> The abort handler might be racing with command completion, so the
> task might already be NULL by the time the abort handler is called.
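
The guard described above can be illustrated in user space. This is only a
sketch under stated assumptions, not the patch itself: every type and function
name here is hypothetical, the return value chosen for the "already completed"
case is an arbitrary choice for the demo, and the locking a real handler would
need against the completion path is left out.

```c
#include <assert.h>
#include <stddef.h>

enum { EH_SUCCESS = 0, EH_FAILED = 1 };      /* hypothetical result codes */

struct task { int id; };
struct cmd { struct task *task; };           /* set to NULL once completed */
struct lldd_ops { int (*abort_task)(struct task *task); };

static int demo_abort_calls;                 /* counts calls into the stub */

/* Stub low-level abort: dereferences the task, as a real driver would. */
int demo_abort_task(struct task *task)
{
	demo_abort_calls++;
	return task->id >= 0 ? EH_SUCCESS : EH_FAILED;
}

/* Abort handler that tolerates a command which has already completed. */
int abort_handler(const struct lldd_ops *ops, struct cmd *cmd)
{
	struct task *task;

	assert(ops != NULL && cmd != NULL);
	task = cmd->task;                    /* read the pointer only once */
	if (task == NULL)
		return EH_SUCCESS;           /* nothing left to abort */
	if (ops->abort_task == NULL)
		return EH_FAILED;
	return ops->abort_task(task);
}
```

Calling abort_handler() on a command whose task is NULL returns without ever
entering the low-level abort routine, which is the dereference the check is
meant to avoid.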

Hi,

I tried the patch on top of 4.15-rc7 (and without the revert of 90965761). I
don't get the NULL pointer dereference anymore, but the box times out after
SCSI init. It's not completely dead, but I don't get any meaningful log from
the kernel.

Regards,
-- 
Yves-Alexis



Oops: NULL pointer dereference - RIP: isci_task_abort_task+0x30/0x3e0 [isci]

2018-01-05 Thread Yves-Alexis Perez
Hi,

since kernel 4.11 (sorry it took so long to report) I have a box failing to
boot with a NULL pointer dereference (the box is stuck there afterwards).

The bug has also been reported to the Debian BTS
(https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=882414) and a suggestion to
revert 90965761 has been made. I can confirm it fixes the boot issue.

I don't have the complete stack trace at hand but there's an example in the
Debian bug. The machine is a Dell Precision T5600 with the following SATA
controllers:

00:1f.2 SATA controller: Intel Corporation C600/X79 series chipset 6-Port SATA
AHCI Controller (rev 05)
05:00.0 Serial Attached SCSI controller: Intel Corporation C602 chipset 4-Port 
SATA Storage Control Unit (rev 05)

If you need more information or need me to test something, please ask.

Regards,
-- 
Yves-Alexis



Re: Page allocation failure (order 7) in UAS code

2016-03-04 Thread Yves-Alexis Perez
On ven., 2016-03-04 at 08:18 +0100, Hans de Goede wrote:
> Thanks for testing, there shouldn't be any side-effects, I'll turn this into
> a proper patch, add a:
> 
> Reported-and-tested-by: Yves-Alexis Perez 
> 
> line to the commit msg and submit this upstream.
> 
> 
Thanks,

I guess it can also be CC: stable@ since it started in 4.4.

Regards,
-- 
Yves-Alexis



Re: Page allocation failure (order 7) in UAS code

2016-03-03 Thread Yves-Alexis Perez
On mar., 2016-03-01 at 11:49 +0100, Hans de Goede wrote:
> Hi,
> 
> On 01-03-16 10:42, Yves-Alexis Perez wrote:
> > 
> > Hi,
> > 
> > [sorry if this is not the right point for reporting bugs, I took the email
> > addresses from MAINTAINERS but please point me to the correct place if
> > needed]
> > 
> > I have an external USB drive (Samsung M3), which apparently uses the UAS
> > code. Starting with 4.4 (from Debian sid, I could retry with vanilla if
> > needed), I can't mount the drive anymore after a while (few hours/days
> > uptime). Just plugging the disk, I get page allocation failure in kernel
> > logs:
> Can you try building a kernel with the following line in
> drivers/usb/storage/uas.c :
> 
>  .can_queue = 65536, /* Is there a limit on the _host_ ? */
> 
> (around line 815) Replaced with
> 
>  .can_queue = MAX_CMNDS,
> 
> That should help as MAX_CMNDS is 256, so claiming that we can queue more
> is not helpful, and that likely is what is causing this quite high order
> alloc.
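
The arithmetic behind that suggestion can be checked with a short sketch. It
assumes 4 KiB pages and 8-byte pointers (x86-64), and it assumes the failing
allocation is a per-tag pointer array sized by can_queue, which is inferred
from the init_tag_map and __kmalloc frames in the trace rather than confirmed.
Both function names are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for an array with one pointer per tag (assumed layout). */
size_t tag_index_bytes(size_t depth, size_t ptr_size)
{
	return depth * ptr_size;
}

/* Smallest order such that (page_size << order) covers the request. */
unsigned int alloc_order(size_t bytes, size_t page_size)
{
	unsigned int order = 0;
	size_t span = page_size;

	assert(page_size > 0);
	while (span < bytes) {
		span <<= 1;
		order++;
	}
	return order;
}
```

Under those assumptions, can_queue = 65536 needs 65536 * 8 = 524288 bytes, i.e.
128 contiguous pages or order 7, matching the "order:7" in the failure message,
while can_queue = 256 needs 2048 bytes and fits in a single page (order 0).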

After a few days, it seems to work fine, although I can't say
anything about side effects.

Regards,
-- 
Yves-Alexis





Re: Page allocation failure (order 7) in UAS code

2016-03-01 Thread Yves-Alexis Perez
On mar., 2016-03-01 at 11:49 +0100, Hans de Goede wrote:
> Can you try building a kernel with the following line in
> drivers/usb/storage/uas.c :
> 
>  .can_queue = 65536, /* Is there a limit on the _host_ ? */
> 
> (around line 815) Replaced with
> 
>  .can_queue = MAX_CMNDS,
> 
> That should help as MAX_CMNDS is 256, so claiming that we can queue more
> is not helpful, and that likely is what is causing this quite high order
> alloc.

Hi,

I've rebuilt 4.4.3 with the above patch. I'll report back in a few hours on
whether I can still reproduce the failure.

Regards,
-- 
Yves-Alexis





Page allocation failure (order 7) in UAS code

2016-03-01 Thread Yves-Alexis Perez
Hi,

[sorry if this is not the right point for reporting bugs, I took the email
addresses from MAINTAINERS but please point me to the correct place if needed]

I have an external USB drive (Samsung M3), which apparently uses the UAS code.
Starting with 4.4 (from Debian sid, I could retry with vanilla if needed), I
can't mount the drive anymore after a while (a few hours/days of uptime). Just
plugging in the disk, I get a page allocation failure in the kernel logs:

Feb 22 10:57:56 scapa kernel: [26557.268872] usb 2-2: new SuperSpeed USB device 
number 3 using xhci_hcd
Feb 22 10:57:56 scapa kernel: [26557.285884] usb 2-2: New USB device found, 
idVendor=04e8, idProduct=61b6
Feb 22 10:57:56 scapa kernel: [26557.285888] usb 2-2: New USB device strings: 
Mfr=1, Product=2, SerialNumber=3
Feb 22 10:57:56 scapa kernel: [26557.285891] usb 2-2: Product: Samsung M3 
Portable
Feb 22 10:57:56 scapa kernel: [26557.285893] usb 2-2: Manufacturer: Samsung M3 
Portable
Feb 22 10:57:56 scapa kernel: [26557.285895] usb 2-2: SerialNumber: 
826C7DFB1367
Feb 22 10:57:56 scapa kernel: [26557.293693] scsi host4: uas
Feb 22 10:57:56 scapa kernel: [26557.293707] kworker/0:0: page allocation 
failure: order:7, mode:0x2204020
Feb 22 10:57:56 scapa kernel: [26557.293711] CPU: 0 PID: 11015 Comm: 
kworker/0:0 Tainted: GW   4.4.0-1-amd64 #1 Debian 4.4.2-3~a.test
Feb 22 10:57:56 scapa kernel: [26557.293713] Hardware name: LENOVO 
20CMCTO1WW/20CMCTO1WW, BIOS N10ET41W (1.20 ) 01/19/2016
Feb 22 10:57:56 scapa kernel: [26557.293724] Workqueue: usb_hub_wq hub_event 
[usbcore]
Feb 22 10:57:56 scapa kernel: [26557.293727]   c0207f69 
812e7679 02204020
Feb 22 10:57:56 scapa kernel: [26557.293730]  8116b23a 0001c0207f69 
 88024dffbb20
Feb 22 10:57:56 scapa kernel: [26557.293733]  810b475e 0001020c8220 
c0207f69 0046
Feb 22 10:57:56 scapa kernel: [26557.293736] Call Trace:
Feb 22 10:57:56 scapa kernel: [26557.293741]  [] ? 
dump_stack+0x40/0x57
Feb 22 10:57:56 scapa kernel: [26557.293745]  [] ? 
warn_alloc_failed+0xfa/0x150
Feb 22 10:57:56 scapa kernel: [26557.293749]  [] ? 
__wake_up_common+0x4e/0x90
Feb 22 10:57:56 scapa kernel: [26557.293752]  [] ? 
__alloc_pages_nodemask+0x306/0xb70
Feb 22 10:57:56 scapa kernel: [26557.293757]  [] ? 
kmem_getpages+0x51/0xf0
Feb 22 10:57:56 scapa kernel: [26557.293759]  [] ? 
fallback_alloc+0x145/0x1f0
Feb 22 10:57:56 scapa kernel: [26557.293765]  [] ? 
init_tag_map+0x38/0xb0
Feb 22 10:57:56 scapa kernel: [26557.293768]  [] ? 
__kmalloc+0x17f/0x1c0
Feb 22 10:57:56 scapa kernel: [26557.293771]  [] ? 
init_tag_map+0x38/0xb0
Feb 22 10:57:56 scapa kernel: [26557.293774]  [] ? 
__blk_queue_init_tags+0x40/0x80
Feb 22 10:57:56 scapa kernel: [26557.293781]  [] ? 
scsi_add_host_with_dma+0x7b/0x2f0 [scsi_mod]
Feb 22 10:57:56 scapa kernel: [26557.293785]  [] ? 
uas_probe+0x41a/0x560 [uas]
Feb 22 10:57:56 scapa kernel: [26557.293796]  [] ? 
usb_probe_interface+0x1b3/0x300 [usbcore]
Feb 22 10:57:56 scapa kernel: [26557.293801]  [] ? 
driver_probe_device+0x212/0x480
Feb 22 10:57:56 scapa kernel: [26557.293803]  [] ? 
__driver_attach+0x80/0x80
Feb 22 10:57:56 scapa kernel: [26557.293806]  [] ? 
bus_for_each_drv+0x62/0xb0
Feb 22 10:57:56 scapa kernel: [26557.293809]  [] ? 
__device_attach+0xd8/0x160
Feb 22 10:57:56 scapa kernel: [26557.293811]  [] ? 
bus_probe_device+0x87/0xa0
Feb 22 10:57:56 scapa kernel: [26557.293814]  [] ? 
device_add+0x3f5/0x660
Feb 22 10:57:56 scapa kernel: [26557.293823]  [] ? 
usb_set_configuration+0x51b/0x8f0 [usbcore]
Feb 22 10:57:56 scapa kernel: [26557.293835]  [] ? 
generic_probe+0x28/0x80 [usbcore]
Feb 22 10:57:56 scapa kernel: [26557.293838]  [] ? 
driver_probe_device+0x212/0x480
Feb 22 10:57:56 scapa kernel: [26557.293840]  [] ? 
__driver_attach+0x80/0x80
Feb 22 10:57:56 scapa kernel: [26557.293842]  [] ? 
bus_for_each_drv+0x62/0xb0
Feb 22 10:57:56 scapa kernel: [26557.293845]  [] ? 
__device_attach+0xd8/0x160
Feb 22 10:57:56 scapa kernel: [26557.293847]  [] ? 
bus_probe_device+0x87/0xa0
Feb 22 10:57:56 scapa kernel: [26557.293850]  [] ? 
device_add+0x3f5/0x660
Feb 22 10:57:56 scapa kernel: [26557.293857]  [] ? 
usb_new_device+0x265/0x490 [usbcore]
Feb 22 10:57:56 scapa kernel: [26557.293864]  [] ? 
hub_event+0xfb3/0x14f0 [usbcore]
Feb 22 10:57:56 scapa kernel: [26557.293868]  [] ? 
process_one_work+0x19f/0x3d0
Feb 22 10:57:56 scapa kernel: [26557.293871]  [] ? 
worker_thread+0x4d/0x450
Feb 22 10:57:56 scapa kernel: [26557.293873]  [] ? 
process_one_work+0x3d0/0x3d0
Feb 22 10:57:56 scapa kernel: [26557.293876]  [] ? 
kthread+0xcd/0xf0
Feb 22 10:57:56 scapa kernel: [26557.293879]  [] ? 
kthread_create_on_node+0x190/0x190
Feb 22 10:57:56 scapa kernel: [26557.293883]  [] ? 
ret_from_fork+0x3f/0x70
Feb 22 10:57:56 scapa kernel: [26557.293886]  [] ? 
kthread_create_on_node+0x190/0x190
Feb 22 10:57:56 scapa kernel: [26557.293888] Mem-Info:
Feb 22 10:57:56 scapa kernel: [26557.293892] active_anon:1