Re: Reply: Re: [PATCH v2] scsi: Introduce sdev_printk_ratelimited to throttle frequent printk
On (04/02/18 13:14), wen.yan...@zte.com.cn wrote:
> >> It's true that this print for the same device is useless. But it's
> >> useful for different devices. Is it possible to limit the print only
> >> for the same device?
> >
> > In our scene, it's just for the same device (q->queuedata), Thanks.

Yes, what Jason meant was that the rate limit struct is shared by
different call sites - including scsi_request_fn() from different
devices. If device1->scsi_request_fn()->sdev_printk_ratelimited()
triggers the printk rate limit, then messages from
device2->scsi_request_fn()->sdev_printk_ratelimited() may be lost
entirely, unless you have enough of them.

-ss
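To illustrate the shared-state problem discussed above, here is a minimal userspace model of a printk-style rate limiter. It only mirrors the idea behind the kernel's struct ratelimit_state, not its actual API; the struct fields, tick-based clock, and both helper functions are invented for this sketch. A single state shared by every caller (what a file-static ratelimit inside sdev_printk_ratelimited() would give you) lets a chatty device eat the whole budget, so a quieter device's message is dropped; per-device state does not have that problem.

```c
#include <stdbool.h>

/* Minimal userspace model of a rate limiter: allow at most `burst`
 * messages per `interval` ticks.  Not the kernel API. */
struct ratelimit_state {
	int interval;	/* window length in ticks */
	int burst;	/* messages allowed per window */
	int begin;	/* tick when the current window opened */
	int printed;	/* messages emitted in the current window */
};

static bool ratelimit_allow(struct ratelimit_state *rs, int now)
{
	if (now - rs->begin >= rs->interval) {	/* open a new window */
		rs->begin = now;
		rs->printed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	return false;
}

/* One state shared by all devices: device 1 spams ticks 0..4 and
 * exhausts the budget, so device 2's single message at tick 5 is lost. */
static int messages_from_dev2_shared(void)
{
	struct ratelimit_state shared = { .interval = 10, .burst = 3 };
	int dev2 = 0;

	for (int tick = 0; tick < 5; tick++)	/* device 1 spams */
		ratelimit_allow(&shared, tick);
	if (ratelimit_allow(&shared, 5))	/* device 2, once */
		dev2++;
	return dev2;
}

/* Per-device state: device 2 keeps its own budget and gets through. */
static int messages_from_dev2_private(void)
{
	struct ratelimit_state dev1_rs = { .interval = 10, .burst = 3 };
	struct ratelimit_state dev2_rs = { .interval = 10, .burst = 3 };
	int dev2 = 0;

	for (int tick = 0; tick < 5; tick++)
		ratelimit_allow(&dev1_rs, tick);
	if (ratelimit_allow(&dev2_rs, 5))
		dev2++;
	return dev2;
}
```

With the shared state, device 2 emits nothing; with private state it emits its one message - which is exactly the loss mode described for a shared ratelimit struct.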
Re: [PATCH v2] scsi: Introduce sdev_printk_ratelimited to throttle frequent printk
Hello,

On (04/02/18 09:58), Wen Yang wrote:
> There would be so many same lines printed by frequent printk if one
> disk went wrong, like,
> [  546.185242] sd 0:1:0:0: rejecting I/O to offline device
> [  546.185258] sd 0:1:0:0: rejecting I/O to offline device
> [  546.185280] sd 0:1:0:0: rejecting I/O to offline device
> [  546.185307] sd 0:1:0:0: rejecting I/O to offline device
> [  546.185334] sd 0:1:0:0: rejecting I/O to offline device
> [  546.185364] sd 0:1:0:0: rejecting I/O to offline device
> [  546.185390] sd 0:1:0:0: rejecting I/O to offline device
> [  546.185410] sd 0:1:0:0: rejecting I/O to offline device
> For slow serial console, the frequent printk may be blocked for a
> long time, and if any spin_lock has been acquired before the printk
> like in scsi_request_fn, watchdog could be triggered.

Did you test the patch? Rate limiting does not completely remove
printk calls. printk is still there, in a loop under spin_lock. A big
enough I/O request queue can cause the same lockup problems.

-ss
Re: Reply: Re: [PATCH] scsi: Replace sdev_printk with printk_deferred to avoid
On (03/28/18 10:29), wen.yan...@zte.com.cn wrote:
> Hello Bart,
>
> We have a detailed discussion of the problem.
> Sergey Senozhatsky, Petr and many people have made a lot of efforts
> about it.
> Please see this link:
> https://bugzilla.kernel.org/show_bug.cgi?id=199003
>
> 1, Petr suggests that it should be modified in this way:
> IMHO, printing the same message so many times is useless. Therefore
> some throttling would make sense. If we want to keep sdev_printk(),

The thing with ratelimiting is that - yes, we do fewer printks, but we
still do them under the queue spin_lock. So I was thinking about
something like below [a quick-n-dirty workaround]

---
 drivers/scsi/scsi_lib.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 0dfec0dedd5e..6c930fbdd24c 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1825,9 +1825,13 @@ static void scsi_request_fn(struct request_queue *q)
 			break;
 
 		if (unlikely(!scsi_device_online(sdev))) {
+			scsi_kill_request(req, q);
+			spin_unlock_irq(q->queue_lock);
+
 			sdev_printk(KERN_ERR, sdev,
 				    "rejecting I/O to offline device\n");
-			scsi_kill_request(req, q);
+
+			spin_lock_irq(q->queue_lock);
 			continue;
 		}
Re: [Bug 199003] console stalled, cause Hard LOCKUP.
I'll Cc blockdev

On (03/27/18 08:36), bugzilla-dae...@bugzilla.kernel.org wrote:
>
> --- Comment #17 from sergey.senozhatsky.w...@gmail.com ---
>
> On (03/26/18 13:05), bugzilla-dae...@bugzilla.kernel.org wrote:
> > > Therefore the serial console is actually pretty fast. It seems
> > > that the deadline 10ms-per-character is not in the game here.
> >
> > As the name suggests this is dmesg - content of logbuf. We can't
> > tell anything about serial console speed from it.
>
> Grrr, you are right. It would be interesting to see the output from
> the serial port as well.
>
> Anyway, it does not change the fact that printing so many same lines
> is useless. The throttling still would make sense and probably would
> solve the problem.

You are right.

Looking at the backtraces
(https://bugzilla.kernel.org/attachment.cgi?id=274953&action=edit)
there *probably* was just one CPU doing all printk-s and all printouts.
And there was one CPU waiting for that printing CPU to unlock the
queue spin_lock.

The printing CPU was looping in scsi_request_fn(), picking up requests
and calling sdev_printk() for each of them, because the device was
offline. Given that the serial console is not very fast, that we
called the serial console under the queue spin_lock, and the number of
printks called, it was enough to lock up the CPU which was spinning on
the queue spin_lock, and to hard lockup the system.

scsi_request_fn() does unlock the queue lock later, but not in that
!scsi_device_online(sdev) error case.

scsi_request_fn()
{
	for (;;) {
		int rtn;
		/*
		 * get next queueable request.  We do this early to make sure
		 * that the request is fully prepared even if we cannot
		 * accept it.
		 */
		req = blk_peek_request(q);
		if (!req)
			break;

		if (unlikely(!scsi_device_online(sdev))) {
			sdev_printk(KERN_ERR, sdev,
				    "rejecting I/O to offline device\n");
			scsi_kill_request(req, q);
			continue;
			^^^ still under spinlock
		}
	}

I'd probably just unlock/lock the queue lock before `continue', rather
than ratelimit printk-s. Dunno.
James, Martin, what do you think? -ss
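The unlock-before-printing idea can be sketched with a small userspace model. Everything here is invented for illustration (the queue struct, the tick-free lock stubs, the drain functions); it only demonstrates the shape of the two loops: in the problematic one every slow console write happens with the queue lock held, in the suggested one the lock is dropped around the print and retaken before touching the queue again.

```c
#include <stdbool.h>

/* Toy model of a request queue protected by one lock. */
struct queue {
	int pending;	/* requests waiting on the queue */
	bool locked;	/* is the queue lock currently held? */
};

static int printed_under_lock;	/* how often we "printed" while locked */

static void lock(struct queue *q)   { q->locked = true; }
static void unlock(struct queue *q) { q->locked = false; }

/* Stand-in for a slow serial-console sdev_printk(). */
static void slow_printk(struct queue *q)
{
	if (q->locked)
		printed_under_lock++;
}

/* The problematic shape: a printk per dead request, lock held
 * throughout - the whole queue drains under the spinlock. */
static int drain_offline_locked(struct queue *q)
{
	printed_under_lock = 0;
	lock(q);
	while (q->pending > 0) {
		q->pending--;		/* scsi_kill_request() */
		slow_printk(q);		/* still under the queue lock */
	}
	unlock(q);
	return printed_under_lock;
}

/* The suggested shape: kill the request, then drop the lock around
 * the slow console write, and retake it before the next iteration. */
static int drain_offline_unlocked(struct queue *q)
{
	printed_under_lock = 0;
	lock(q);
	while (q->pending > 0) {
		q->pending--;		/* kill the request first */
		unlock(q);
		slow_printk(q);		/* console I/O, lock dropped */
		lock(q);		/* retake before touching the queue */
	}
	unlock(q);
	return printed_under_lock;
}
```

With four pending requests, the first loop does all four slow prints under the lock; the second does none, which is why it cannot starve another CPU spinning on the same lock.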
Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
Hello Peter,

On (08/30/17 10:47), Peter Zijlstra wrote:
[..]
> On Wed, Aug 30, 2017 at 10:42:07AM +0200, Peter Zijlstra wrote:
> >
> > So the overhead looks to be spread out over all sorts, which makes
> > it harder to find and fix.
> >
> > stack unwinding is done lots and is fairly expensive, I've not yet
> > checked if crossrelease does too much of that.
>
> Aah, we do an unconditional stack unwind for every __lock_acquire()
> now. It keeps a trace in the xhlocks[].
>
> Does the below cure most of that overhead?

thanks. mostly yes. the kernel is not so dramatically slower now.
it still seems to be a bit slower, which is expected I suppose, but
overall it's much better

	real	1m35.209s
	user	4m12.467s
	sys	0m49.457s

// approx 10 seconds slower.

-ss
Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
Hi,

On (08/30/17 14:43), Byungchul Park wrote:
[..]
> > notably slower than earlier 4.13 linux-next. (e.g. scrolling in vim
> > is irritatingly slow)
>
> To Ingo,
>
> I cannot decide if we have to roll back CONFIG_LOCKDEP_CROSSRELEASE
> dependency on CONFIG_PROVE_LOCKING in Kconfig. With them enabled,
> lockdep detection becomes strong but has performance impact. But,
> it's anyway a debug option so IMHO we don't have to take care of the
> performance impact. Please let me know your decision.

well, I expected it :)

I've been running lockdep enabled kernels for years, and was OK with
the performance. but now it's just too much and I'm looking at
disabling lockdep.

a more relevant test -- compilation of a relatively small project

	LOCKDEP -CROSSRELEASE -COMPLETIONS
	real	1m23.722s
	user	4m11.300s
	sys	0m49.386s

	LOCKDEP +CROSSRELEASE +COMPLETIONS
	real	2m9.969s
	user	4m15.458s
	sys	2m3.594s

you don't want to know how much time it now takes to recompile the
kernel ;)

-ss
Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
On (08/23/17 09:03), Byungchul Park wrote:
[..]
> > Byungchul, did you add the crosslock checks to lockdep? Can you
> > have a look at the above report? That report namely doesn't make
> > sense to me.
>
> The report is talking about the following lockup:
>
> A work in a worker			A task work on exit to user
> ------------------			---------------------------
> mutex_lock(&bdev->bd_mutex)
>					mutex_lock(&bdev->bd_mutex)
> blk_execute_rq()
>    wait_for_completion_io_timeout()
>					complete()
>
[..]
> To Peterz,
>
> Anyway I wanted to avoid lockdep reports in the case using a timeout
> interface. Do you think it's still worth reporting the kind of
> lockup? I'm ok if you do.

Byungchul, a quick question: have you measured the performance impact?
somehow my linux-next is notably slower than earlier 4.13 linux-next.
(e.g. scrolling in vim is irritatingly slow)

`time dmesg' shows some difference, but probably that's not a good
test.

		!LOCKDEP	LOCKDEP		LOCKDEP -CROSSRELEASE -COMPLETIONS
	real	0m0.661s	0m2.290s	0m1.920s
	user	0m0.010s	0m0.105s	0m0.000s
	sys	0m0.636s	0m2.224s	0m1.888s

anyone else "sees"/"can confirm" the slow down? it gets back to "usual
normal" when I disable CROSSRELEASE and COMPLETIONS.

---
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index b19c491cbc4e..cdc30ef81c5e 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1091,8 +1091,6 @@ config PROVE_LOCKING
 	select DEBUG_MUTEXES
 	select DEBUG_RT_MUTEXES if RT_MUTEXES
 	select DEBUG_LOCK_ALLOC
-	select LOCKDEP_CROSSRELEASE
-	select LOCKDEP_COMPLETIONS
 	select TRACE_IRQFLAGS
 	default n
 	help
---

-ss
Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
Hi,

On (08/24/17 12:39), Boqun Feng wrote:
> On Wed, Aug 23, 2017 at 02:55:17PM +0900, Sergey Senozhatsky wrote:
> > On (08/23/17 13:35), Boqun Feng wrote:
> > > > KERN_CONT and "\n" should not be together. "\n" flushes the
> > > > cont buffer immediately.
> > >
> > > Hmm.. Not quite familiar with printk() stuffs, but I could see
> > > several usages of printk(KERN_CONT "...\n") in kernel.
> > >
> > > Did a bit research myself, and I now think the inappropriate use
> > > is to use a KERN_CONT printk *after* another printk ending with
> > > a "\n".
> >
> > ah... I didn't check __print_lock_name(): it leaves an unflushed
> > cont buffer upon return. sorry, your code is correct.
>
> So that means printk(KERN_CONT "..."); + printk(KERN_CONT "...\n")
> is a correct usage, right?

well, yes. with one precondition - there should be no printk-s from
other CPUs/tasks in between

	printk(KERN_CONT "..."); + printk(KERN_CONT "...\n")
	^^^ here we can have a preliminary flush and a broken cont line.

but it's been this way forever.

-ss
Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
On (08/23/17 13:35), Boqun Feng wrote:
> > KERN_CONT and "\n" should not be together. "\n" flushes the cont
> > buffer immediately.
>
> Hmm.. Not quite familiar with printk() stuffs, but I could see several
> usages of printk(KERN_CONT "...\n") in kernel.
>
> Did a bit research myself, and I now think the inappropriate use is to
> use a KERN_CONT printk *after* another printk ending with a "\n".

ah... I didn't check __print_lock_name(): it leaves an unflushed cont
buffer upon return. sorry, your code is correct.

-ss

> > >  		printk("\n *** DEADLOCK ***\n\n");
> > > +	} else if (cross_lock(src->instance)) {
> > > +		printk(" Possible unsafe locking scenario by crosslock:\n\n");
> > > +		printk("       CPU0                    CPU1\n");
> > > +		printk("       ----                    ----\n");
> > > +		printk("  lock(");
> > > +		__print_lock_name(target);
> > > +		printk(KERN_CONT ");\n");
> > > +		printk("  lock(");
> > > +		__print_lock_name(source);
> > > +		printk(KERN_CONT ");\n");
> > > +		printk("  lock(");
> > > +		__print_lock_name(parent == source ? target : parent);
> > > +		printk(KERN_CONT ");\n");
> > > +		printk("  unlock(");
> > > +		__print_lock_name(source);
> > > +		printk(KERN_CONT ");\n");
> > > +		printk("\n *** DEADLOCK ***\n\n");
> > >  	} else {
> > >  		printk(" Possible unsafe locking scenario:\n\n");
> > >  		printk("       CPU0                    CPU1\n");
> > > --
> > > 2.14.1
Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
On (08/23/17 13:35), Boqun Feng wrote:
[..]
> > >  		printk(KERN_CONT ");\n");
> >
> > KERN_CONT and "\n" should not be together. "\n" flushes the cont
> > buffer immediately.
>
> Hmm.. Not quite familiar with printk() stuffs, but I could see several
> usages of printk(KERN_CONT "...\n") in kernel.
>
> Did a bit research myself, and I now think the inappropriate use is to
> use a KERN_CONT printk *after* another printk ending with a "\n". Am I
> missing some recent changes or rules of KERN_CONT?

it has been this way for quite some time (if not always). LOG_NEWLINE
results in cont_flush(), which log_store()-s the content of the
KERN_CONT buffer.

if we see that the supplied message has no \n then we store it in a
dedicated buffer (the cont buffer)

	if (!(lflags & LOG_NEWLINE))
		return cont_add();
	return log_store();

we flush that buffer (move its content to the kernel log buffer) when
we receive a message with a \n, or when a printk() from another
task/context interrupts the current cont line and, thus, forces us to
flush.

-ss
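The cont-buffer behaviour described above can be modelled in a few lines of userspace C. This is only a sketch of the idea, not the kernel's implementation: the buffer size, the helper names (cont_printk, interrupting_printk) and the flat log counters are all invented here. Fragments without a trailing '\n' accumulate in the cont buffer; a '\n' flushes the joined line, and a message from another context forces a preliminary flush that breaks the cont line in two.

```c
#include <string.h>
#include <stdbool.h>

#define CONT_MAX 256

static char cont[CONT_MAX];		/* the pending cont line */
static char last_logged[CONT_MAX];	/* last line stored in the "log" */
static int  logged_lines;		/* how many lines were stored */

static void log_store(const char *msg)
{
	strncpy(last_logged, msg, CONT_MAX - 1);
	last_logged[CONT_MAX - 1] = '\0';
	logged_lines++;
}

static void cont_flush(void)
{
	if (cont[0]) {
		log_store(cont);
		cont[0] = '\0';
	}
}

/* printk(KERN_CONT ...)-style append: buffer until a '\n' arrives. */
static void cont_printk(const char *msg)
{
	size_t len = strlen(msg);
	bool newline = len && msg[len - 1] == '\n';

	strncat(cont, msg, CONT_MAX - strlen(cont) - 1);
	if (newline)
		cont_flush();	/* "\n" flushes immediately */
}

/* A printk from another task/context: forces a preliminary flush,
 * so a half-built cont line is emitted broken. */
static void interrupting_printk(const char *msg)
{
	cont_flush();
	log_store(msg);
}
```

Feeding it `"lock("`, `"some_lock"`, `");\n"` stores exactly one joined line; interleaving another context's message after `"lock("` flushes that fragment early - the broken-cont-line case the thread is about.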
Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
On (08/23/17 12:38), Boqun Feng wrote:
[..]
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index 642fb5362507..a3709e15f609 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -1156,6 +1156,23 @@ print_circular_lock_scenario(struct held_lock *src,
>  		__print_lock_name(target);
>  		printk(KERN_CONT ");\n");

KERN_CONT and "\n" should not be together. "\n" flushes the cont
buffer immediately.

-ss

>  		printk("\n *** DEADLOCK ***\n\n");
> +	} else if (cross_lock(src->instance)) {
> +		printk(" Possible unsafe locking scenario by crosslock:\n\n");
> +		printk("       CPU0                    CPU1\n");
> +		printk("       ----                    ----\n");
> +		printk("  lock(");
> +		__print_lock_name(target);
> +		printk(KERN_CONT ");\n");
> +		printk("  lock(");
> +		__print_lock_name(source);
> +		printk(KERN_CONT ");\n");
> +		printk("  lock(");
> +		__print_lock_name(parent == source ? target : parent);
> +		printk(KERN_CONT ");\n");
> +		printk("  unlock(");
> +		__print_lock_name(source);
> +		printk(KERN_CONT ");\n");
> +		printk("\n *** DEADLOCK ***\n\n");
>  	} else {
>  		printk(" Possible unsafe locking scenario:\n\n");
>  		printk("       CPU0                    CPU1\n");
> --
> 2.14.1
Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
On (08/23/17 09:03), Byungchul Park wrote:
[..]

aha, ok

> The report is talking about the following lockup:
>
> A work in a worker			A task work on exit to user
> ------------------			---------------------------
> mutex_lock(&bdev->bd_mutex)
>					mutex_lock(&bdev->bd_mutex)
> blk_execute_rq()
>    wait_for_completion_io_timeout()
>					complete()
>
> Is this impossible?

I was really confused how this "unlock" may lead to a deadlock

> > > other info that might help us debug this:
> > >
> > >  Possible unsafe locking scenario by crosslock:
> > >
> > >        CPU0                    CPU1
> > >        ----                    ----
> > >   lock(&bdev->bd_mutex);
> > >                                lock((complete)#2);
> > >                                lock(&bdev->bd_mutex);
> > >   unlock((complete)#2);

any chance the report can be improved? mention timeout, etc?
// well, if this functionality will stay.

p.s. Bart Van Assche, thanks for Cc-ing Park Byungchul. I was really
sure I didn't enable the cross-release, but apparently I was wrong:

	CONFIG_LOCKDEP_CROSSRELEASE=y
	CONFIG_LOCKDEP_COMPLETIONS=y

-ss
possible circular locking dependency detected [was: linux-next: Tree for Aug 22]
Hello,

======================================================
WARNING: possible circular locking dependency detected
4.13.0-rc6-next-20170822-dbg-00020-g39758ed8aae0-dirty #1746 Not tainted
------------------------------------------------------
fsck.ext4/148 is trying to acquire lock:
 (&bdev->bd_mutex){+.+.}, at: [] __blkdev_put+0x33/0x190

but now in release context of a crosslock acquired at the following:
 ((complete)#2){+.+.}, at: [] blk_execute_rq+0xbb/0xda

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 ((complete)#2){+.+.}:
       lock_acquire+0x176/0x19e
       __wait_for_common+0x50/0x1e3
       blk_execute_rq+0xbb/0xda
       scsi_execute+0xc3/0x17d [scsi_mod]
       sd_revalidate_disk+0x112/0x1549 [sd_mod]
       rescan_partitions+0x48/0x2c4
       __blkdev_get+0x14b/0x37c
       blkdev_get+0x191/0x2c0
       device_add_disk+0x2b4/0x3e5
       sd_probe_async+0xf8/0x17e [sd_mod]
       async_run_entry_fn+0x34/0xe0
       process_one_work+0x2af/0x4d1
       worker_thread+0x19a/0x24f
       kthread+0x133/0x13b
       ret_from_fork+0x27/0x40

-> #0 (&bdev->bd_mutex){+.+.}:
       __blkdev_put+0x33/0x190
       blkdev_close+0x24/0x27
       __fput+0xee/0x18a
       task_work_run+0x79/0xa0
       prepare_exit_to_usermode+0x9b/0xb5

other info that might help us debug this:

 Possible unsafe locking scenario by crosslock:

       CPU0                    CPU1
       ----                    ----
  lock(&bdev->bd_mutex);
                               lock((complete)#2);
                               lock(&bdev->bd_mutex);
  unlock((complete)#2);

 *** DEADLOCK ***

4 locks held by fsck.ext4/148:
 #0:  (&bdev->bd_mutex){+.+.}, at: [] __blkdev_put+0x33/0x190
 #1:  (rcu_read_lock){}, at: [] rcu_lock_acquire+0x0/0x20
 #2:  (&(>lock)->rlock){-.-.}, at: [] ata_scsi_queuecmd+0x23/0x74 [libata]
 #3:  (>wait#14){-...}, at: [] complete+0x18/0x50

stack backtrace:
CPU: 1 PID: 148 Comm: fsck.ext4 Not tainted 4.13.0-rc6-next-20170822-dbg-00020-g39758ed8aae0-dirty #1746
Call Trace:
 dump_stack+0x67/0x8e
 print_circular_bug+0x2a1/0x2af
 ? zap_class+0xc5/0xc5
 check_prev_add+0x76/0x20d
 ? __lock_acquire+0xc27/0xcc8
 lock_commit_crosslock+0x327/0x35e
 complete+0x24/0x50
 scsi_end_request+0x8d/0x176 [scsi_mod]
 scsi_io_completion+0x1be/0x423 [scsi_mod]
 __blk_mq_complete_request+0x112/0x131
 ata_scsi_simulate+0x212/0x218 [libata]
 __ata_scsi_queuecmd+0x1be/0x1de [libata]
 ata_scsi_queuecmd+0x41/0x74 [libata]
 scsi_dispatch_cmd+0x194/0x2af [scsi_mod]
 scsi_queue_rq+0x1e0/0x26f [scsi_mod]
 blk_mq_dispatch_rq_list+0x193/0x2a7
 ? _raw_spin_unlock+0x2e/0x40
 blk_mq_sched_dispatch_requests+0x132/0x176
 __blk_mq_run_hw_queue+0x59/0xc5
 __blk_mq_delay_run_hw_queue+0x5f/0xc1
 blk_mq_flush_plug_list+0xfc/0x10b
 blk_flush_plug_list+0xc6/0x1eb
 blk_finish_plug+0x25/0x32
 generic_writepages+0x56/0x63
 do_writepages+0x36/0x70
 __filemap_fdatawrite_range+0x59/0x5f
 filemap_write_and_wait+0x19/0x4f
 __blkdev_put+0x5f/0x190
 blkdev_close+0x24/0x27
 __fput+0xee/0x18a
 task_work_run+0x79/0xa0
 prepare_exit_to_usermode+0x9b/0xb5
 entry_SYSCALL_64_fastpath+0xab/0xad
RIP: 0033:0x7ff5755a2f74
RSP: 002b:7ffe46fce038 EFLAGS: 0246 ORIG_RAX: 0003
RAX:  RBX: 555ddeddded0 RCX: 7ff5755a2f74
RDX: 1000 RSI: 555ddede2580 RDI: 0004
RBP:  R08: 555ddede2580 R09: 555ddedde080
R10: 00010800 R11: 0246 R12: 555ddedddfa0
R13: 7ff576523680 R14: 1000 R15: 555ddeddc2b0

-ss
Re: [-next] BUG_ON in scsi_target_destroy()
Hello,

On (04/13/16 08:14), James Bottomley wrote:
[..]
> How about good grief no! A device with multiple targets will get its
> lists screwed with this
>
> The STARGET_REMOVE state you added only applies to the case we're
> trying to kill a target. In the natural operation case, which is what
> everyone else is running into, we will try to remove a running target
> when it has no more scsi devices left on it. So the correct patch
> should be to make the BUG_ON see this:

works for me.

Reported-and-tested-by: Sergey Senozhatsky <sergey.senozhat...@gmail.com>

-ss

> James
>
> ---
>
> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> index 27df7e7..e0a78f5 100644
> --- a/drivers/scsi/scsi_scan.c
> +++ b/drivers/scsi/scsi_scan.c
> @@ -319,8 +319,7 @@ static void scsi_target_destroy(struct scsi_target *starget)
>  	struct Scsi_Host *shost = dev_to_shost(dev->parent);
>  	unsigned long flags;
>
> -	BUG_ON(starget->state != STARGET_REMOVE &&
> -	       starget->state != STARGET_CREATED);
> +	BUG_ON(starget->state == STARGET_DEL);
>  	starget->state = STARGET_DEL;
>  	transport_destroy_device(dev);
>  	spin_lock_irqsave(shost->host_lock, flags);

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [-next] BUG_ON in scsi_target_destroy()
Hi,

On (04/13/16 10:41), Johannes Thumshirn wrote:
> Hi Sergey, Xiong,
>
> Can you try below patch?

it panics my system. first it warn_on-s in lib/kobject.c:244, then
NULL dereferences

	do_scan_async
	__scsi_remove_device
	scsi_target_reap
	device_del
	dpm_sysfs_remove
	sysfs_unmerge_group
	kernfs_find_and_get_ns
	kernfs_find_ns

-ss
[-next] BUG_ON in scsi_target_destroy()
Hello,

commit 7b106f2de6938c31ce5e9c86bc70ad3904666b96
Author: Johannes Thumshirn
Date:   Tue Apr 5 11:50:44 2016 +0200

    scsi: Add intermediate STARGET_REMOVE state to scsi_target_state

BUG_ON()s (next-20160411) each time I remove a usb flash

[   49.561600] [] scsi_target_destroy+0x5a/0xcb [scsi_mod]
[   49.561607] [] scsi_target_reap+0x4a/0x4f [scsi_mod]
[   49.561613] [] __scsi_remove_device+0xc3/0xd0 [scsi_mod]
[   49.561619] [] scsi_forget_host+0x52/0x63 [scsi_mod]
[   49.561623] [] scsi_remove_host+0x8c/0x102 [scsi_mod]
[   49.561627] [] usb_stor_disconnect+0x6b/0xab [usb_storage]
[   49.561634] [] usb_unbind_interface+0x77/0x1ca [usbcore]
[   49.561636] [] __device_release_driver+0x9d/0x121
[   49.561638] [] device_release_driver+0x23/0x30
[   49.561639] [] bus_remove_device+0xfb/0x10e
[   49.561641] [] device_del+0x164/0x1e6
[   49.561648] [] ? remove_intf_ep_devs+0x3b/0x48 [usbcore]
[   49.561655] [] usb_disable_device+0x84/0x1a5 [usbcore]
[   49.561661] [] usb_disconnect+0x94/0x19f [usbcore]
[   49.561667] [] hub_event+0x5c1/0xdea [usbcore]
[   49.561670] [] process_one_work+0x1dc/0x37f
[   49.561672] [] worker_thread+0x282/0x36d
[   49.561673] [] ? rescuer_thread+0x2ae/0x2ae
[   49.561675] [] kthread+0xd2/0xda
[   49.561678] [] ret_from_fork+0x22/0x40
[   49.561679] [] ? kthread_worker_fn+0x13e/0x13e

-ss
Re: [PATCH] st: convert DRIVER_ATTR macros to DRIVER_ATTR_RO
On (06/24/15 06:10), Seymour, Shane M wrote:
[..]
>  /* The sysfs driver interface. Read-only at the moment */
> -static ssize_t st_try_direct_io_show(struct device_driver *ddp, char *buf)
> +static ssize_t try_direct_io_show(struct device_driver *ddp, char *buf)
>  {
> -	return snprintf(buf, PAGE_SIZE, "%d\n", try_direct_io);
> +	return sprintf(buf, "%d\n", try_direct_io);
>  }

a nitpick, per Documentation/filesystems/sysfs.txt :

: - show() should always use scnprintf().
:

-ss

> -static DRIVER_ATTR(try_direct_io, S_IRUGO, st_try_direct_io_show, NULL);
> +static DRIVER_ATTR_RO(try_direct_io);
>
> -static ssize_t st_fixed_buffer_size_show(struct device_driver *ddp, char *buf)
> +static ssize_t fixed_buffer_size_show(struct device_driver *ddp, char *buf)
>  {
> -	return snprintf(buf, PAGE_SIZE, "%d\n", st_fixed_buffer_size);
> +	return sprintf(buf, "%d\n", st_fixed_buffer_size);
>  }
> -static DRIVER_ATTR(fixed_buffer_size, S_IRUGO, st_fixed_buffer_size_show, NULL);
> +static DRIVER_ATTR_RO(fixed_buffer_size);
>
> -static ssize_t st_max_sg_segs_show(struct device_driver *ddp, char *buf)
> +static ssize_t max_sg_segs_show(struct device_driver *ddp, char *buf)
>  {
> -	return snprintf(buf, PAGE_SIZE, "%d\n", st_max_sg_segs);
> +	return sprintf(buf, "%d\n", st_max_sg_segs);
>  }
> -static DRIVER_ATTR(max_sg_segs, S_IRUGO, st_max_sg_segs_show, NULL);
> +static DRIVER_ATTR_RO(max_sg_segs);
>
> -static ssize_t st_version_show(struct device_driver *ddd, char *buf)
> +static ssize_t version_show(struct device_driver *ddd, char *buf)
>  {
> -	return snprintf(buf, PAGE_SIZE, "[%s]\n", verstr);
> +	return sprintf(buf, "[%s]\n", verstr);
>  }
> -static DRIVER_ATTR(version, S_IRUGO, st_version_show, NULL);
> +static DRIVER_ATTR_RO(version);
>
>  static struct attribute *st_drv_attrs[] = {
>  	driver_attr_try_direct_io.attr,
Re: [PATCH] st: convert DRIVER_ATTR macros to DRIVER_ATTR_RO
On (06/24/15 08:10), Greg KH (gre...@linuxfoundation.org) wrote:
> On Wed, Jun 24, 2015 at 03:25:57PM +0900, Sergey Senozhatsky wrote:
> > On (06/24/15 06:10), Seymour, Shane M wrote:
> > [..]
> > >  /* The sysfs driver interface. Read-only at the moment */
> > > -static ssize_t st_try_direct_io_show(struct device_driver *ddp, char *buf)
> > > +static ssize_t try_direct_io_show(struct device_driver *ddp, char *buf)
> > >  {
> > > -	return snprintf(buf, PAGE_SIZE, "%d\n", try_direct_io);
> > > +	return sprintf(buf, "%d\n", try_direct_io);
> > >  }
> >
> > a nitpick, per Documentation/filesystems/sysfs.txt :
> >
> > : - show() should always use scnprintf().
> > :
>
> That should be rewritten to say, don't use snprintf(), but
> scnprintf(), if you want to. Otherwise sprintf() should be fine as
> you obviously are only returning a single value to userspace

Sure, that was just a nitpick. For '%d' it's totally fine, I agree.
It was more of a 'do we strictly obey the rules' thing.

-ss
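The snprintf()/scnprintf() distinction behind the nitpick above is easy to show in userspace. Kernel scnprintf() behaves like snprintf() except that it returns the number of characters actually written into the buffer, never the would-be length; the userspace re-implementation and the two demo helpers below are illustrative sketches, not the kernel's code. The difference only matters when output is truncated - which is why sprintf() of a single "%d" into a PAGE_SIZE sysfs buffer is fine, but a show() that returned snprintf()'s value for a large string could claim bytes it never produced.

```c
#include <stdarg.h>
#include <stdio.h>

/* Userspace model of the kernel's scnprintf(): like snprintf(), but
 * returns the count actually written to buf (excluding the NUL). */
static int scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int i;

	va_start(args, fmt);
	i = vsnprintf(buf, size, fmt, args);
	va_end(args);

	if (size == 0)
		return 0;
	/* vsnprintf() returns the would-be length; clamp it. */
	return (size_t)i < size ? i : (int)(size - 1);
}

/* Ten characters into an 8-byte buffer: snprintf() reports the
 * would-be length (10), scnprintf() what really landed (7). */
static int snp_demo(void)
{
	char buf[8];
	return snprintf(buf, sizeof(buf), "%s", "0123456789");
}

static int scn_demo(void)
{
	char buf[8];
	return scnprintf(buf, sizeof(buf), "%s", "0123456789");
}
```

For an untruncated "%d" both functions return the same value, which matches the point that sprintf() is fine for a single integer.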
[linux-next] scsi_attach_vpd() warning at mm/page_alloc.c:2497
Hello,

[    1.971778] [ cut here ]
[    1.971960] WARNING: CPU: 1 PID: 6 at mm/page_alloc.c:2497 __alloc_pages_nodemask+0x1b9/0x693()
[    1.972246] Modules linked in: sd_mod ahci
[    1.972604] CPU: 1 PID: 6 Comm: kworker/u8:0 Not tainted 3.14.0-rc8-next-20140327-dbg-dirty #202
[    1.972890] Hardware name: Acer Aspire 5741G/Aspire 5741G, BIOS V1.20 02/08/2011
[    1.973182] Workqueue: events_unbound async_run_entry_fn
[    1.973417]  8801534d5af0 813ad822
[    1.973994]  8801534d5b28 8103c03c 810ad4e5 001000d0
[    1.974529]  81656fc8 81656fc0 8801534d5b38
[    1.975125] Call Trace:
[    1.975306] [813ad822] dump_stack+0x4e/0x7a
[    1.975488] [8103c03c] warn_slowpath_common+0x75/0x8e
[    1.975670] [810ad4e5] ? __alloc_pages_nodemask+0x1b9/0x693
[    1.975853] [8103c0f4] warn_slowpath_null+0x15/0x17
[    1.976035] [810ad4e5] __alloc_pages_nodemask+0x1b9/0x693
[    1.976220] [81074220] ? console_unlock+0x2dd/0x2fa
[    1.976404] [812a08be] ? scsi_execute_req_flags+0x9c/0xb3
[    1.976587] [810ad9d1] __get_free_pages+0x12/0x3f
[    1.976771] [810d5c66] __kmalloc+0x37/0x112
[    1.976951] [8129b5f4] scsi_attach_vpd+0x41/0x1a8
[    1.977133] [812a3d81] scsi_probe_and_add_lun+0x8ec/0xa21
[    1.977317] [812a40f8] __scsi_add_device+0xce/0x10a
[    1.977501] [812b109a] ata_scsi_scan_host+0x60/0x142
[    1.977683] [812ad6e9] async_port_probe+0x45/0x4a
[    1.977873] [81057e18] async_run_entry_fn+0x5a/0x110
[    1.978061] [8104ee52] process_one_work+0x1c9/0x2e9
[    1.978243] [8104f3ec] worker_thread+0x1d3/0x2bd
[    1.978424] [8104f219] ? rescuer_thread+0x27d/0x27d
[    1.978606] [81053ab4] kthread+0xd6/0xde
[    1.978786] [810539de] ? kthread_create_on_node+0x162/0x162
[    1.978970] [813b2efc] ret_from_fork+0x7c/0xb0
[    1.979151] [810539de] ? kthread_create_on_node+0x162/0x162
[    1.979334] ---[ end trace ca7c5bd74b0dca21 ]---

scsi_attach_vpd() (at the pg0 label) attempts to allocate first 255
and later 134217730 bytes.
---

a general question: scsi_attach_vpd() is sure that scsi_vpd_inquiry()
returns `ret < 0' in case of an error and `ret >= 0' otherwise:

	Returns size of the vpd page on success or a negative error number.

while this is not exactly true. scsi_vpd_inquiry() indeed can return
-EINVAL, but usually it returns the
scsi_execute_req()->scsi_execute_req_flags()->scsi_execute() status,
which has a different error indication and can be, e.g. for a failed
blk_get_request() in scsi_execute() or a failed sense kzalloc() in
scsi_execute_req_flags(), `DRIVER_ERROR << 24'.

-ss
Re: [linux-next] scsi_attach_vpd() warning at mm/page_alloc.c:2497
On (03/27/14 07:29), James Bottomley wrote:
> On Thu, 2014-03-27 at 13:19 +0300, Sergey Senozhatsky wrote:
> > Hello,
> >
> > [    1.971778] [ cut here ]
> > [    1.971960] WARNING: CPU: 1 PID: 6 at mm/page_alloc.c:2497 __alloc_pages_nodemask+0x1b9/0x693()
[..]
> > [    1.979334] ---[ end trace ca7c5bd74b0dca21 ]---
> >
> > scsi_attach_vpd() (at the pg0 label) attempts to allocate first 255
> > and later 134217730 bytes.
>
> What device is this? can you do a
>
>	sg_inq -H -p 0 /dev/sda
>
> and email the results to see if there's a code error or the device
> really is sending data that's insane.

$ sudo sg_inq -H -p 0 /dev/sda
VPD INQUIRY, page code=0x00:
 00     00 00 00 07 00 80 83 89  b0 b1 b2

... an acer aspire 5741G laptop

[    1.593890] ahci :00:1f.2: version 3.0
[    1.594108] ahci :00:1f.2: irq 43 for MSI/MSI-X
[    1.594169] ahci :00:1f.2: AHCI 0001.0300 32 slots 4 ports 3 Gbps 0x3 impl SATA mode
[    1.594454] ahci :00:1f.2: flags: 64bit ncq sntf stag pm led clo pio slum part ems apst
[    1.602452] scsi0 : ahci
[    1.602853] scsi1 : ahci
[    1.603574] scsi2 : ahci
[    1.603939] scsi3 : ahci
[    1.604185] ata1: SATA max UDMA/133 abar m2048@0xb4105000 port 0xb4105100 irq 43
[    1.604481] ata2: SATA max UDMA/133 abar m2048@0xb4105000 port 0xb4105180 irq 43
[    1.604765] ata3: DUMMY
[    1.604925] ata4: DUMMY
[    1.921966] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    1.922179] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[    1.924100] ata1.00: ATA-8: WDC WD6400BEVT-22A0RT0, 01.01A01, max UDMA/133
[    1.924280] ata1.00: 1250263728 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
[    1.926662] ata1.00: configured for UDMA/133
[    1.927126] scsi 0:0:0:0: Direct-Access     ATA      WDC WD6400BEVT-2 01.0 PQ: 0 ANSI: 5
[    1.929700] sd 0:0:0:0: [sda] 1250263728 512-byte logical blocks: (640 GB/596 GiB)
[    1.930393] sd 0:0:0:0: [sda] Write Protect is off
[    1.930579] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[    1.930822] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    1.938973] ata2.00: ATAPI: TSSTcorp CDDVDW TS-L633C, AC01, max UDMA/100
[    1.939154] ata2.00: applying bridge limits
[    1.957617] ata2.00: configured for UDMA/100
[    1.963032] scsi 1:0:0:0: CD-ROM    TSSTcorp CDDVDW TS-L633C  AC01 PQ: 0 ANSI: 5
[    1.971778] [ cut here ]
[    1.971960] WARNING: CPU: 1 PID: 6 at mm/page_alloc.c:2497 __alloc_pages_nodemask+0x1b9/0x693()
[    1.972246] Modules linked in: sd_mod ahci
[    1.972604] CPU: 1 PID: 6 Comm: kworker/u8:0 Not tainted 3.14.0-rc8-next-20140327-dbg-dirty #202
[    1.972890] Hardware name: Acer Aspire 5741G/Aspire 5741G, BIOS V1.20 02/08/2011
[    1.973182] Workqueue: events_unbound async_run_entry_fn
[    1.973417]  8801534d5af0
Re: [linux-next] scsi_attach_vpd() warning at mm/page_alloc.c:2497
On (03/27/14 07:29), James Bottomley wrote:
> On Thu, 2014-03-27 at 13:19 +0300, Sergey Senozhatsky wrote:
> > Hello,
> >
> > [    1.971778] [ cut here ]
> > [    1.971960] WARNING: CPU: 1 PID: 6 at mm/page_alloc.c:2497 __alloc_pages_nodemask+0x1b9/0x693()
[..]
> > [    1.979334] ---[ end trace ca7c5bd74b0dca21 ]---
> >
> > scsi_attach_vpd() (at the pg0 label) attempts to allocate first 255
> > and later 134217730 bytes.
>
> What device is this? can you do a
>
>	sg_inq -H -p 0 /dev/sda
>
> and email the results to see if there's a code error or the device
> really is sending data that's insane.
>
> > ---
> >
> > a general question: scsi_attach_vpd() is sure that scsi_vpd_inquiry()
> > returns `ret < 0' in case of an error and `ret >= 0' otherwise:
> >
> >	Returns size of the vpd page on success or a negative error number.
> >
> > while this is not exactly true. scsi_vpd_inquiry() indeed can return
> > -EINVAL, but usually it returns the
> > scsi_execute_req()->scsi_execute_req_flags()->scsi_execute() status,
> > which has a different error indication and can be, e.g. for a failed
> > blk_get_request() in scsi_execute() or a failed sense kzalloc() in
> > scsi_execute_req_flags(), `DRIVER_ERROR << 24'.
>
> Yes, that looks to be the problem. Can you test this patch?

NO mm/page_alloc.c warnings observed with the patch applied.

-ss

> James
>
> ---
>
> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> index 3241749..c4d632c 100644
> --- a/drivers/scsi/scsi.c
> +++ b/drivers/scsi/scsi.c
> @@ -952,7 +952,7 @@ static int scsi_vpd_inquiry(struct scsi_device *sdev, unsigned char *buffer,
>  	result = scsi_execute_req(sdev, cmd, DMA_FROM_DEVICE, buffer,
>  				  len, NULL, 30 * HZ, 3, NULL);
>  	if (result)
> -		return result;
> +		return -EIO;
>
>  	/* Sanity check that we got the page back that we asked for */
>  	if (buffer[1] != page)
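The bug pattern and the fix can be modelled outside the kernel. This sketch is illustrative only: the stub, the two vpd_inquiry_* helpers, and the reuse of the kernel's DRIVER_ERROR value (0x04, shifted into the driver-status byte) are assumptions made for the demo. The point is that a caller expecting "negative errno or page length" will misread a large positive packed status as a length - exactly how 255 bytes turned into a ~128 MB allocation - while mapping any nonzero status to -EIO keeps the contract.

```c
#include <errno.h>

#define DRIVER_ERROR 0x04	/* kernel's driver-status value, assumed here */

/* Stand-in for scsi_execute_req(): on failure it returns a packed
 * status word, not a negative errno. */
static int scsi_execute_stub(int fail)
{
	return fail ? (DRIVER_ERROR << 24) : 0;
}

/* Before the fix: the raw status leaks out of the "negative errno or
 * length" interface and looks like a huge page length. */
static int vpd_inquiry_buggy(int fail)
{
	int result = scsi_execute_stub(fail);
	if (result)
		return result;	/* 0x04000000 == 67108864, read as a length */
	return 255;		/* pretend-size of the VPD page */
}

/* After the fix in the patch above: any nonzero execute status is
 * collapsed into a clean -EIO, so callers can test `ret < 0'. */
static int vpd_inquiry_fixed(int fail)
{
	int result = scsi_execute_stub(fail);
	if (result)
		return -EIO;
	return 255;
}
```

The buggy variant returns 67108864 on failure - positive, so a caller treating it as a buffer size would attempt an absurd allocation; the fixed variant returns -EIO, which every `ret < 0` check catches.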