Re: 答复: Re: [PATCH v2] scsi: Introduce sdev_printk_ratelimited to throttlefrequent printk

2018-04-01 Thread Sergey Senozhatsky
On (04/02/18 13:14), wen.yan...@zte.com.cn wrote:
> 
>> It's true that this print for the same device is useless. But it's
>> useful for different devices. Is it possible to limit the print only
>> for the same device?
> 
>In our scene, it's  just for the same device (q->queuedata), Thanks.

Yes, what Jason meant was that rate limit struct is shared by different call
sites - including scsi_request_fn() from different devices.

If device1->scsi_request_fn()->sdev_printk_ratelimited() causes printk rate
limit, then messages from device2->scsi_request_fn()->sdev_printk_ratelimited()
may be lost entirely, unless you have enough of them.

-ss


Re: [PATCH v2] scsi: Introduce sdev_printk_ratelimited to throttle frequent printk

2018-04-01 Thread Sergey Senozhatsky
Hello,

On (04/02/18 09:58), Wen Yang wrote:
> There would be so many same lines printed by frequent printk if one 
> disk went wrong, like,
> [  546.185242] sd 0:1:0:0: rejecting I/O to offline device
> [  546.185258] sd 0:1:0:0: rejecting I/O to offline device
> [  546.185280] sd 0:1:0:0: rejecting I/O to offline device
> [  546.185307] sd 0:1:0:0: rejecting I/O to offline device
> [  546.185334] sd 0:1:0:0: rejecting I/O to offline device
> [  546.185364] sd 0:1:0:0: rejecting I/O to offline device
> [  546.185390] sd 0:1:0:0: rejecting I/O to offline device
> [  546.185410] sd 0:1:0:0: rejecting I/O to offline device
> For slow serial console, the frequent printk may be blocked for a 
> long time, and if any spin_lock has been acquired before the printk 
> like in scsi_request_fn, watchdog could be triggered.

Did you test the patch? Rate limiting does not completely remove
printk calls. printk is still there, in a loop under spin_lock.
A big enough I/O request queue can cause the same lockup problems.

-ss


Re: 答复: Re: [PATCH] scsi: Replace sdev_printk with printk_deferred to avoid

2018-03-27 Thread Sergey Senozhatsky
On (03/28/18 10:29), wen.yan...@zte.com.cn wrote:
>Hello Bart,
>
>We have a detailed discussion of the problem.
>Sergey Senozhatsky, Petr and many people have made a lot of efforts about
>it.
>Please see this link:
>https://bugzilla.kernel.org/show_bug.cgi?id=199003
>
>1, Petr suggests that it should be modified in this way:
>IMHO, printing the same message so many times is useless. Therefore
>some throttling would make sense. If we want to keep sdev_printk(),

The thing with retelimiting is that - yes, we do less printks but we still
do them under queue spin_lock.

So I was thinking about something like below [a quick-n-dirty workaround]

---

 drivers/scsi/scsi_lib.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 0dfec0dedd5e..6c930fbdd24c 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1825,9 +1825,13 @@ static void scsi_request_fn(struct request_queue *q)
break;
 
if (unlikely(!scsi_device_online(sdev))) {
+   scsi_kill_request(req, q);
+   spin_unlock_irq(q->queue_lock);
+
sdev_printk(KERN_ERR, sdev,
"rejecting I/O to offline device\n");
-   scsi_kill_request(req, q);
+
+   spin_lock_irq(q->queue_lock);
continue;
}
 


Re: [Bug 199003] console stalled, cause Hard LOCKUP.

2018-03-27 Thread Sergey Senozhatsky
I'll Cc blockdev

On (03/27/18 08:36), bugzilla-dae...@bugzilla.kernel.org wrote:
> > --- Comment #17 from sergey.senozhatsky.w...@gmail.com ---
> > On (03/26/18 13:05), bugzilla-dae...@bugzilla.kernel.org wrote:
> > > Therefore the serial console is actually pretty fast. It seems that the
> > > deadline
> > > 10ms-per-character is not in the game here.
> > 
> > As the name suggests this is dmesg - content of logbuf. We can't tell
> > anything about serial consoles speed from it.
> 
> Grrr, you are right. It would be interesting to see the output from
> the serial port as well.
> 
> Anyway, it does not change the fact that printing so many same lines is
> useless. The throttling still would make sense and probably would
> solve the problem.

You are right.

Looking at backtraces 
(https://bugzilla.kernel.org/attachment.cgi?id=274953=edit)
there *probably* was just one CPU doing all printk-s and all printouts. And
there was one CPU waiting for that printing CPU to unlock the queue spin_lock.

The printing CPU was looping in scsi_request_fn() picking up requests
and calling sdev_printk() for each of them, because the device was
offline. Given that serial console is not very fast, that we called
serial console under queue spin_lock and the number of printks called,
it was enough to lockup the CPU which was spining on queue spin_lock and
to hard lockup the system.

scsi_request_fn() does unlock the queue lock later, but not in that
!scsi_device_online(sdev) error case.

scsi_request_fn()
{
for (;;) {
int rtn;
/*
 * get next queueable request.  We do this early to make sure
 * that the request is fully prepared even if we cannot
 * accept it.
 */
req = blk_peek_request(q);
if (!req)
break;

if (unlikely(!scsi_device_online(sdev))) {
sdev_printk(KERN_ERR, sdev,
"rejecting I/O to offline device\n");
scsi_kill_request(req, q);
continue;
^ still under spinlock
}
}

I'd probably just unlock/lock queue lock, rather than ratelimit printk-s,
before `continue'. Dunno.

James, Martin, what do you think?

-ss


Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]

2017-08-30 Thread Sergey Senozhatsky
Hello Peter,

On (08/30/17 10:47), Peter Zijlstra wrote:
[..]
> On Wed, Aug 30, 2017 at 10:42:07AM +0200, Peter Zijlstra wrote:
> > 
> > So the overhead looks to be spread out over all sorts, which makes it
> > harder to find and fix.
> > 
> > stack unwinding is done lots and is fairly expensive, I've not yet
> > checked if crossrelease does too much of that.
> 
> Aah, we do an unconditional stack unwind for every __lock_acquire() now.
> It keeps a trace in the xhlocks[].
> 
> Does the below cure most of that overhead?

thanks.

mostly yes. the kernel is not so dramatically slower now. it's still
seems to be a bit slower, which is expected I suppose, but over all
it's much better

real1m35.209s
user4m12.467s
sys 0m49.457s

// approx 10 seconds slower.

-ss


Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]

2017-08-30 Thread Sergey Senozhatsky
Hi,

On (08/30/17 14:43), Byungchul Park wrote:
[..]
> > notably slower than earlier 4.13 linux-next. (e.g. scrolling in vim
> > is irritatingly slow)
> 
> To Ingo,
> 
> I cannot decide if we have to roll back CONFIG_LOCKDEP_CROSSRELEASE
> dependency on CONFIG_PROVE_LOCKING in Kconfig. With them enabled,
> lockdep detection becomes strong but has performance impact. But,
> it's anyway a debug option so IMHO we don't have to take case of the
> performance impact. Please let me know your decision.

well, I expected it :)

I've been running lockdep enabled kernels for years, and was OK with
the performance. but now it's just too much and I'm looking at disabling
lockdep.

a more relevant test -- compilation of a relatively small project

  LOCKDEP -CROSSRELEASE -COMPLETIONS LOCKDEP +CROSSRELEASE +COMPLETIONS

   real1m23.722s  real2m9.969s
   user4m11.300s  user4m15.458s
   sys 0m49.386s  sys 2m3.594s


you don't want to know how much time now it takes to recompile the
kernel ;)

-ss


Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]

2017-08-29 Thread Sergey Senozhatsky
On (08/23/17 09:03), Byungchul Park wrote:
[..]
> > Byungchul, did you add the crosslock checks to lockdep? Can you have a look 
> > at
> > the above report? That report namely doesn't make sense to me.
> 
> The report is talking about the following lockup:
> 
> A work in a worker A task work on exit to user
> -- ---
> mutex_lock(>bd_mutex)
>mutext_lock(>bd_mutex)
> blk_execute_rq()
>wait_for_completion_io_timeout()
>complete()
> 
[..]
> To Peterz,
> 
> Anyway I wanted to avoid lockdep reports in the case using a timeout
> interface. Do you think it's still worth reporting the kind of lockup?
> I'm ok if you do.

Byungchul, a quick question.
have you measured the performance impact? somehow my linux-next is
notably slower than earlier 4.13 linux-next. (e.g. scrolling in vim
is irritatingly slow)


`time dmesg' shows some difference, but probably that's not a good
test.

!LOCKDEPLOCKDEP LOCKDEP -CROSSRELEASE -COMPLETIONS
real 0m0.661s   0m2.290s0m1.920s
user 0m0.010s   0m0.105s0m0.000s
sys  0m0.636s   0m2.224s0m1.888s

anyone else "sees"/"can confirm" the slow down?


it gets back to "usual normal" when I disable CROSSRELEASE and COMPLETIONS.

---

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index b19c491cbc4e..cdc30ef81c5e 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1091,8 +1091,6 @@ config PROVE_LOCKING
select DEBUG_MUTEXES
select DEBUG_RT_MUTEXES if RT_MUTEXES
select DEBUG_LOCK_ALLOC
-   select LOCKDEP_CROSSRELEASE
-   select LOCKDEP_COMPLETIONS
select TRACE_IRQFLAGS
default n
help

---

-ss


Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]

2017-08-23 Thread Sergey Senozhatsky
Hi,

On (08/24/17 12:39), Boqun Feng wrote:
> On Wed, Aug 23, 2017 at 02:55:17PM +0900, Sergey Senozhatsky wrote:
> > On (08/23/17 13:35), Boqun Feng wrote:
> > > > KERN_CONT and "\n" should not be together. "\n" flushes the cont
> > > > buffer immediately.
> > > > 
> > > 
> > > Hmm.. Not quite familiar with printk() stuffs, but I could see several
> > > usages of printk(KERN_CONT "...\n") in kernel.
> > > 
> > > Did a bit research myself, and I now think the inappropriate use is to
> > > use a KERN_CONT printk *after* another printk ending with a "\n".
> > 
> > ah... I didn't check __print_lock_name(): it leaves unflushed cont buffer
> > upon the return. sorry, your code is correct.
> > 
> 
> So means printk(KERN_CON "..."); + printk(KERN_CONT "...\n") is a
> correct usage, right?

well, yes. with one precondition - there should be no printk-s from other
CPUs/tasks in between

printk(KERN_CON "...");  +  printk(KERN_CONT "...\n")
   ^
here we can have a preliminary flush and broken
cont line. but it's been this way forever.

-ss


Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]

2017-08-22 Thread Sergey Senozhatsky
On (08/23/17 13:35), Boqun Feng wrote:
> > KERN_CONT and "\n" should not be together. "\n" flushes the cont
> > buffer immediately.
> > 
> 
> Hmm.. Not quite familiar with printk() stuffs, but I could see several
> usages of printk(KERN_CONT "...\n") in kernel.
> 
> Did a bit research myself, and I now think the inappropriate use is to
> use a KERN_CONT printk *after* another printk ending with a "\n".

ah... I didn't check __print_lock_name(): it leaves unflushed cont buffer
upon the return. sorry, your code is correct.

-ss

> > >   printk("\n *** DEADLOCK ***\n\n");
> > > + } else if (cross_lock(src->instance)) {
> > > + printk(" Possible unsafe locking scenario by crosslock:\n\n");
> > > + printk("   CPU0CPU1\n");
> > > + printk("   \n");
> > > + printk("  lock(");
> > > + __print_lock_name(target);
> > > + printk(KERN_CONT ");\n");
> > > + printk("  lock(");
> > > + __print_lock_name(source);
> > > + printk(KERN_CONT ");\n");
> > > + printk("   lock(");
> > > + __print_lock_name(parent == source ? target : parent);
> > > + printk(KERN_CONT ");\n");
> > > + printk("   unlock(");
> > > + __print_lock_name(source);
> > > + printk(KERN_CONT ");\n");
> > > + printk("\n *** DEADLOCK ***\n\n");
> > >   } else {
> > >   printk(" Possible unsafe locking scenario:\n\n");
> > >   printk("   CPU0CPU1\n");
> > > -- 
> > > 2.14.1
> > > 




Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]

2017-08-22 Thread Sergey Senozhatsky
On (08/23/17 13:35), Boqun Feng wrote:
[..]
> > >   printk(KERN_CONT ");\n");
> > 
> > KERN_CONT and "\n" should not be together. "\n" flushes the cont
> > buffer immediately.
> > 
> 
> Hmm.. Not quite familiar with printk() stuffs, but I could see several
> usages of printk(KERN_CONT "...\n") in kernel.
> 
> Did a bit research myself, and I now think the inappropriate use is to
> use a KERN_CONT printk *after* another printk ending with a "\n". Am I
> missing some recent changes or rules of KERN_CONT?

has been this way for quite some time (if not always).

LOG_NEWLINE results in cont_flush(), which log_store() the content
of KERN_CONT buffer.

if we see that supplied message has no \n then we store it in a
dedicated buffer (cont buffer)

if (!(lflags & LOG_NEWLINE))
return cont_add();

return log_store();

we flush that buffer (move its content to the kernel log buffer) when
we receive a message with a \n or when printk() from another task/context
interrupts the current cont line and, thus, forces us to flush.

-ss


Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]

2017-08-22 Thread Sergey Senozhatsky
On (08/23/17 12:38), Boqun Feng wrote:
[..]
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index 642fb5362507..a3709e15f609 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -1156,6 +1156,23 @@ print_circular_lock_scenario(struct held_lock *src,
>   __print_lock_name(target);
>   printk(KERN_CONT ");\n");

KERN_CONT and "\n" should not be together. "\n" flushes the cont
buffer immediately.

-ss

>   printk("\n *** DEADLOCK ***\n\n");
> + } else if (cross_lock(src->instance)) {
> + printk(" Possible unsafe locking scenario by crosslock:\n\n");
> + printk("   CPU0CPU1\n");
> + printk("   \n");
> + printk("  lock(");
> + __print_lock_name(target);
> + printk(KERN_CONT ");\n");
> + printk("  lock(");
> + __print_lock_name(source);
> + printk(KERN_CONT ");\n");
> + printk("   lock(");
> + __print_lock_name(parent == source ? target : parent);
> + printk(KERN_CONT ");\n");
> + printk("   unlock(");
> + __print_lock_name(source);
> + printk(KERN_CONT ");\n");
> + printk("\n *** DEADLOCK ***\n\n");
>   } else {
>   printk(" Possible unsafe locking scenario:\n\n");
>   printk("   CPU0CPU1\n");
> -- 
> 2.14.1
> 


Re: possible circular locking dependency detected [was: linux-next: Tree for Aug 22]

2017-08-22 Thread Sergey Senozhatsky
On (08/23/17 09:03), Byungchul Park wrote:
[..]

aha, ok

> The report is talking about the following lockup:
> 
> A work in a worker A task work on exit to user
> -- ---
> mutex_lock(>bd_mutex)
>mutext_lock(>bd_mutex)
> blk_execute_rq()
>wait_for_completion_io_timeout()
>complete()
> 
> Is this impossible?

I was really confused how this "unlock" may lead to a deadlock

> > >  other info that might help us debug this:
> > >  Possible unsafe locking scenario by crosslock:
> > >CPU0CPU1
> > >
> > >   lock(>bd_mutex);
> > >   lock((complete)#2);
> > >lock(>bd_mutex);
> > >unlock((complete)#2);


any chance the report can be improved? mention timeout, etc?
// well, if this functionality will stay.


p.s.
Bart Van Assche, thanks for Cc-ing Park Byungchul, I was really
sure I didn't enabled the cross-release, but apparently I was wrong:
 CONFIG_LOCKDEP_CROSSRELEASE=y
 CONFIG_LOCKDEP_COMPLETIONS=y

-ss


possible circular locking dependency detected [was: linux-next: Tree for Aug 22]

2017-08-22 Thread Sergey Senozhatsky
Hello,

==
WARNING: possible circular locking dependency detected
4.13.0-rc6-next-20170822-dbg-00020-g39758ed8aae0-dirty #1746 Not tainted
--
fsck.ext4/148 is trying to acquire lock:
 (>bd_mutex){+.+.}, at: [] __blkdev_put+0x33/0x190

 but now in release context of a crosslock acquired at the following:
 ((complete)#2){+.+.}, at: [] blk_execute_rq+0xbb/0xda

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 ((complete)#2){+.+.}:
   lock_acquire+0x176/0x19e
   __wait_for_common+0x50/0x1e3
   blk_execute_rq+0xbb/0xda
   scsi_execute+0xc3/0x17d [scsi_mod]
   sd_revalidate_disk+0x112/0x1549 [sd_mod]
   rescan_partitions+0x48/0x2c4
   __blkdev_get+0x14b/0x37c
   blkdev_get+0x191/0x2c0
   device_add_disk+0x2b4/0x3e5
   sd_probe_async+0xf8/0x17e [sd_mod]
   async_run_entry_fn+0x34/0xe0
   process_one_work+0x2af/0x4d1
   worker_thread+0x19a/0x24f
   kthread+0x133/0x13b
   ret_from_fork+0x27/0x40

 -> #0 (>bd_mutex){+.+.}:
   __blkdev_put+0x33/0x190
   blkdev_close+0x24/0x27
   __fput+0xee/0x18a
   task_work_run+0x79/0xa0
   prepare_exit_to_usermode+0x9b/0xb5

 other info that might help us debug this:
 Possible unsafe locking scenario by crosslock:
   CPU0CPU1
   
  lock(>bd_mutex);
  lock((complete)#2);
   lock(>bd_mutex);
   unlock((complete)#2);

  *** DEADLOCK ***
4 locks held by fsck.ext4/148:
 #0:  (>bd_mutex){+.+.}, at: [] __blkdev_put+0x33/0x190
 #1:  (rcu_read_lock){}, at: [] rcu_lock_acquire+0x0/0x20
 #2:  (&(>lock)->rlock){-.-.}, at: [] 
ata_scsi_queuecmd+0x23/0x74 [libata]
 #3:  (>wait#14){-...}, at: [] complete+0x18/0x50

 stack backtrace:
CPU: 1 PID: 148 Comm: fsck.ext4 Not tainted 
4.13.0-rc6-next-20170822-dbg-00020-g39758ed8aae0-dirty #1746
Call Trace:
 dump_stack+0x67/0x8e
 print_circular_bug+0x2a1/0x2af
 ? zap_class+0xc5/0xc5
 check_prev_add+0x76/0x20d
 ? __lock_acquire+0xc27/0xcc8
 lock_commit_crosslock+0x327/0x35e
 complete+0x24/0x50
 scsi_end_request+0x8d/0x176 [scsi_mod]
 scsi_io_completion+0x1be/0x423 [scsi_mod]
 __blk_mq_complete_request+0x112/0x131
 ata_scsi_simulate+0x212/0x218 [libata]
 __ata_scsi_queuecmd+0x1be/0x1de [libata]
 ata_scsi_queuecmd+0x41/0x74 [libata]
 scsi_dispatch_cmd+0x194/0x2af [scsi_mod]
 scsi_queue_rq+0x1e0/0x26f [scsi_mod]
 blk_mq_dispatch_rq_list+0x193/0x2a7
 ? _raw_spin_unlock+0x2e/0x40
 blk_mq_sched_dispatch_requests+0x132/0x176
 __blk_mq_run_hw_queue+0x59/0xc5
 __blk_mq_delay_run_hw_queue+0x5f/0xc1
 blk_mq_flush_plug_list+0xfc/0x10b
 blk_flush_plug_list+0xc6/0x1eb
 blk_finish_plug+0x25/0x32
 generic_writepages+0x56/0x63
 do_writepages+0x36/0x70
 __filemap_fdatawrite_range+0x59/0x5f
 filemap_write_and_wait+0x19/0x4f
 __blkdev_put+0x5f/0x190
 blkdev_close+0x24/0x27
 __fput+0xee/0x18a
 task_work_run+0x79/0xa0
 prepare_exit_to_usermode+0x9b/0xb5
 entry_SYSCALL_64_fastpath+0xab/0xad
RIP: 0033:0x7ff5755a2f74
RSP: 002b:7ffe46fce038 EFLAGS: 0246 ORIG_RAX: 0003
RAX:  RBX: 555ddeddded0 RCX: 7ff5755a2f74
RDX: 1000 RSI: 555ddede2580 RDI: 0004
RBP:  R08: 555ddede2580 R09: 555ddedde080
R10: 00010800 R11: 0246 R12: 555ddedddfa0
R13: 7ff576523680 R14: 1000 R15: 555ddeddc2b0

-ss


Re: [-next] BUG_ON in scsi_target_destroy()

2016-04-13 Thread Sergey Senozhatsky
Hello,

On (04/13/16 08:14), James Bottomley wrote:
[..]
> How about good grief no!  A device with multiple targets will get it's
> lists screwed with this
> 
> The STARGET_REMOVE state you added only applies to the case we're
> trying to kill a target.  In the natural operation case, which is what
> everyone else is running into, we will try to remove a running target
> when it has no more scsi devices left on it.  So the correct patch
> should be to make the BUG_ON see this:

works for me.
Reported-and-tested-by: Sergey Senozhatsky <sergey.senozhat...@gmail.com>

-ss

> James
> 
> ---
> 
> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> index 27df7e7..e0a78f5 100644
> --- a/drivers/scsi/scsi_scan.c
> +++ b/drivers/scsi/scsi_scan.c
> @@ -319,8 +319,7 @@ static void scsi_target_destroy(struct scsi_target 
> *starget)
>   struct Scsi_Host *shost = dev_to_shost(dev->parent);
>   unsigned long flags;
>  
> - BUG_ON(starget->state != STARGET_REMOVE &&
> -starget->state != STARGET_CREATED);
> + BUG_ON(starget->state == STARGET_DEL);
>   starget->state = STARGET_DEL;
>   transport_destroy_device(dev);
>   spin_lock_irqsave(shost->host_lock, flags);
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [-next] BUG_ON in scsi_target_destroy()

2016-04-13 Thread Sergey Senozhatsky
Hi,

On (04/13/16 10:41), Johannes Thumshirn wrote:
> Hi Sergey,  Xiong,
> 
> Can you try below patch?

it panics my system.
first it warn_on-s in lib/kobject.c:244

then NULL dereferences
do_scan_async
 __scsi_remove_device
  scsi_target_reap
   device_del
dpm_sysfs_remove
 sysfs_unmerge_group
  kernfs_find_and_get_ns
   kernfs_find_ns

-ss
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[-next] BUG_ON in scsi_target_destroy()

2016-04-11 Thread Sergey Senozhatsky
Hello,

commit 7b106f2de6938c31ce5e9c86bc70ad3904666b96
Author: Johannes Thumshirn 
Date:   Tue Apr 5 11:50:44 2016 +0200

scsi: Add intermediate STARGET_REMOVE state to scsi_target_state


BUG_ON()s (next-20160411) each time I remove a usb flash

[   49.561600]  [] scsi_target_destroy+0x5a/0xcb [scsi_mod]
[   49.561607]  [] scsi_target_reap+0x4a/0x4f [scsi_mod]
[   49.561613]  [] __scsi_remove_device+0xc3/0xd0 [scsi_mod]
[   49.561619]  [] scsi_forget_host+0x52/0x63 [scsi_mod]
[   49.561623]  [] scsi_remove_host+0x8c/0x102 [scsi_mod]
[   49.561627]  [] usb_stor_disconnect+0x6b/0xab [usb_storage]
[   49.561634]  [] usb_unbind_interface+0x77/0x1ca [usbcore]
[   49.561636]  [] __device_release_driver+0x9d/0x121
[   49.561638]  [] device_release_driver+0x23/0x30
[   49.561639]  [] bus_remove_device+0xfb/0x10e
[   49.561641]  [] device_del+0x164/0x1e6
[   49.561648]  [] ? remove_intf_ep_devs+0x3b/0x48 [usbcore]
[   49.561655]  [] usb_disable_device+0x84/0x1a5 [usbcore]
[   49.561661]  [] usb_disconnect+0x94/0x19f [usbcore]
[   49.561667]  [] hub_event+0x5c1/0xdea [usbcore]
[   49.561670]  [] process_one_work+0x1dc/0x37f
[   49.561672]  [] worker_thread+0x282/0x36d
[   49.561673]  [] ? rescuer_thread+0x2ae/0x2ae
[   49.561675]  [] kthread+0xd2/0xda
[   49.561678]  [] ret_from_fork+0x22/0x40
[   49.561679]  [] ? kthread_worker_fn+0x13e/0x13e

-ss
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] st: convert DRIVER_ATTR macros to DRIVER_ATTR_RO

2015-06-24 Thread Sergey Senozhatsky
On (06/24/15 06:10), Seymour, Shane M wrote:
[..]
  
  /* The sysfs driver interface. Read-only at the moment */
 -static ssize_t st_try_direct_io_show(struct device_driver *ddp, char *buf)
 +static ssize_t try_direct_io_show(struct device_driver *ddp, char *buf)
  {
 - return snprintf(buf, PAGE_SIZE, %d\n, try_direct_io);
 + return sprintf(buf, %d\n, try_direct_io);
  }

a nitpick,

per Documentation/filesystems/sysfs.txt

:
: - show() should always use scnprintf().
:

  -ss

 -static DRIVER_ATTR(try_direct_io, S_IRUGO, st_try_direct_io_show, NULL);
 +static DRIVER_ATTR_RO(try_direct_io);
  
 -static ssize_t st_fixed_buffer_size_show(struct device_driver *ddp, char 
 *buf)
 +static ssize_t fixed_buffer_size_show(struct device_driver *ddp, char *buf)
  {
 - return snprintf(buf, PAGE_SIZE, %d\n, st_fixed_buffer_size);
 + return sprintf(buf, %d\n, st_fixed_buffer_size);
  }
 -static DRIVER_ATTR(fixed_buffer_size, S_IRUGO, st_fixed_buffer_size_show, 
 NULL);
 +static DRIVER_ATTR_RO(fixed_buffer_size);
  
 -static ssize_t st_max_sg_segs_show(struct device_driver *ddp, char *buf)
 +static ssize_t max_sg_segs_show(struct device_driver *ddp, char *buf)
  {
 - return snprintf(buf, PAGE_SIZE, %d\n, st_max_sg_segs);
 + return sprintf(buf, %d\n, st_max_sg_segs);
  }
 -static DRIVER_ATTR(max_sg_segs, S_IRUGO, st_max_sg_segs_show, NULL);
 +static DRIVER_ATTR_RO(max_sg_segs);
  
 -static ssize_t st_version_show(struct device_driver *ddd, char *buf)
 +static ssize_t version_show(struct device_driver *ddd, char *buf)
  {
 - return snprintf(buf, PAGE_SIZE, [%s]\n, verstr);
 + return sprintf(buf, [%s]\n, verstr);
  }
 -static DRIVER_ATTR(version, S_IRUGO, st_version_show, NULL);
 +static DRIVER_ATTR_RO(version);
  
  static struct attribute *st_drv_attrs[] = {
   driver_attr_try_direct_io.attr,
 --
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] st: convert DRIVER_ATTR macros to DRIVER_ATTR_RO

2015-06-24 Thread Sergey Senozhatsky
On (06/24/15 08:10), Greg KH gre...@linuxfoundation.org 
(gre...@linuxfoundation.org) wrote:
 On Wed, Jun 24, 2015 at 03:25:57PM +0900, Sergey Senozhatsky wrote:
  On (06/24/15 06:10), Seymour, Shane M wrote:
  [..]

/* The sysfs driver interface. Read-only at the moment */
   -static ssize_t st_try_direct_io_show(struct device_driver *ddp, char 
   *buf)
   +static ssize_t try_direct_io_show(struct device_driver *ddp, char *buf)
{
   - return snprintf(buf, PAGE_SIZE, %d\n, try_direct_io);
   + return sprintf(buf, %d\n, try_direct_io);
}
  
  a nitpick,
  
  per Documentation/filesystems/sysfs.txt
  
  :
  : - show() should always use scnprintf().
  :
 
 That should be rewritten to say, don't use snprintf(), but scnprintf(),
 if you want to.  Otherwise sprintf() should be fine as you obviously are
 only returning a single value to userspace
 

Sure, that was just a nitpick. For '%d' it's totally fine, I agree.
It was more of a 'do we strictly obey the rules' thing.

-ss
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[linux-next] scsi_attach_vpd() warning at mm/page_alloc.c:2497

2014-03-27 Thread Sergey Senozhatsky
Hello,

[1.971778] [ cut here ]
[1.971960] WARNING: CPU: 1 PID: 6 at mm/page_alloc.c:2497 
__alloc_pages_nodemask+0x1b9/0x693()
[1.972246] Modules linked in: sd_mod ahci
[1.972604] CPU: 1 PID: 6 Comm: kworker/u8:0 Not tainted 
3.14.0-rc8-next-20140327-dbg-dirty #202
[1.972890] Hardware name: Acer Aspire 5741G/Aspire 5741G
, BIOS V1.20 02/08/2011
[1.973182] Workqueue: events_unbound async_run_entry_fn
[1.973417]   8801534d5af0 813ad822 

[1.973994]  8801534d5b28 8103c03c 810ad4e5 
001000d0
[1.974529]  81656fc8  81656fc0 
8801534d5b38
[1.975125] Call Trace:
[1.975306]  [813ad822] dump_stack+0x4e/0x7a
[1.975488]  [8103c03c] warn_slowpath_common+0x75/0x8e
[1.975670]  [810ad4e5] ? __alloc_pages_nodemask+0x1b9/0x693
[1.975853]  [8103c0f4] warn_slowpath_null+0x15/0x17
[1.976035]  [810ad4e5] __alloc_pages_nodemask+0x1b9/0x693
[1.976220]  [81074220] ? console_unlock+0x2dd/0x2fa
[1.976404]  [812a08be] ? scsi_execute_req_flags+0x9c/0xb3
[1.976587]  [810ad9d1] __get_free_pages+0x12/0x3f
[1.976771]  [810d5c66] __kmalloc+0x37/0x112
[1.976951]  [8129b5f4] scsi_attach_vpd+0x41/0x1a8
[1.977133]  [812a3d81] scsi_probe_and_add_lun+0x8ec/0xa21
[1.977317]  [812a40f8] __scsi_add_device+0xce/0x10a
[1.977501]  [812b109a] ata_scsi_scan_host+0x60/0x142
[1.977683]  [812ad6e9] async_port_probe+0x45/0x4a
[1.977873]  [81057e18] async_run_entry_fn+0x5a/0x110
[1.978061]  [8104ee52] process_one_work+0x1c9/0x2e9
[1.978243]  [8104f3ec] worker_thread+0x1d3/0x2bd
[1.978424]  [8104f219] ? rescuer_thread+0x27d/0x27d
[1.978606]  [81053ab4] kthread+0xd6/0xde
[1.978786]  [810539de] ? kthread_create_on_node+0x162/0x162
[1.978970]  [813b2efc] ret_from_fork+0x7c/0xb0
[1.979151]  [810539de] ? kthread_create_on_node+0x162/0x162
[1.979334] ---[ end trace ca7c5bd74b0dca21 ]---


scsi_attach_vpd() (at pg0 label) attempts to firstly allocate 255 and later 
134217730 bytes.



---

general question,

scsi_attach_vpd() is sure that scsi_vpd_inquiry() returns `ret  0' in case of
an error and `ret  0' otherwise:

Returns size of the vpd page on success or a negative error number.

while this is not exactly true.

scsi_vpd_inquiry() indeed can return -EINVAL, but usually it returns
scsi_execute_req()-scsi_execute_req_flags()-scsi_execute() status, which
has different error indication and can be, e.g. for failed blk_get_request()
in scsi_execute() or failed sense kzalloc() in scsi_execute_req_flags(),
`DRIVER_ERROR  24'.


-ss
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [linux-next] scsi_attach_vpd() warning at mm/page_alloc.c:2497

2014-03-27 Thread Sergey Senozhatsky
On (03/27/14 07:29), James Bottomley wrote:
 On Thu, 2014-03-27 at 13:19 +0300, Sergey Senozhatsky wrote:
  Hello,
  
  [1.971778] [ cut here ]
  [1.971960] WARNING: CPU: 1 PID: 6 at mm/page_alloc.c:2497 
  __alloc_pages_nodemask+0x1b9/0x693()
  [1.972246] Modules linked in: sd_mod ahci
  [1.972604] CPU: 1 PID: 6 Comm: kworker/u8:0 Not tainted 
  3.14.0-rc8-next-20140327-dbg-dirty #202
  [1.972890] Hardware name: Acer Aspire 5741G/Aspire 
  5741G, BIOS V1.20 02/08/2011
  [1.973182] Workqueue: events_unbound async_run_entry_fn
  [1.973417]   8801534d5af0 813ad822 
  
  [1.973994]  8801534d5b28 8103c03c 810ad4e5 
  001000d0
  [1.974529]  81656fc8  81656fc0 
  8801534d5b38
  [1.975125] Call Trace:
  [1.975306]  [813ad822] dump_stack+0x4e/0x7a
  [1.975488]  [8103c03c] warn_slowpath_common+0x75/0x8e
  [1.975670]  [810ad4e5] ? __alloc_pages_nodemask+0x1b9/0x693
  [1.975853]  [8103c0f4] warn_slowpath_null+0x15/0x17
  [1.976035]  [810ad4e5] __alloc_pages_nodemask+0x1b9/0x693
  [1.976220]  [81074220] ? console_unlock+0x2dd/0x2fa
  [1.976404]  [812a08be] ? scsi_execute_req_flags+0x9c/0xb3
  [1.976587]  [810ad9d1] __get_free_pages+0x12/0x3f
  [1.976771]  [810d5c66] __kmalloc+0x37/0x112
  [1.976951]  [8129b5f4] scsi_attach_vpd+0x41/0x1a8
  [1.977133]  [812a3d81] scsi_probe_and_add_lun+0x8ec/0xa21
  [1.977317]  [812a40f8] __scsi_add_device+0xce/0x10a
  [1.977501]  [812b109a] ata_scsi_scan_host+0x60/0x142
  [1.977683]  [812ad6e9] async_port_probe+0x45/0x4a
  [1.977873]  [81057e18] async_run_entry_fn+0x5a/0x110
  [1.978061]  [8104ee52] process_one_work+0x1c9/0x2e9
  [1.978243]  [8104f3ec] worker_thread+0x1d3/0x2bd
  [1.978424]  [8104f219] ? rescuer_thread+0x27d/0x27d
  [1.978606]  [81053ab4] kthread+0xd6/0xde
  [1.978786]  [810539de] ? kthread_create_on_node+0x162/0x162
  [1.978970]  [813b2efc] ret_from_fork+0x7c/0xb0
  [1.979151]  [810539de] ? kthread_create_on_node+0x162/0x162
  [1.979334] ---[ end trace ca7c5bd74b0dca21 ]---
  
  
  scsi_attach_vpd() (at pg0 label) attempts to firstly allocate 255 and later 
  134217730 bytes.
 
 What device is this?  can you do a sg_inq -H -p 0 /dev/sda and email the
 results to see if there's a code error or the device really is sending
 data that's insane.


$ sudo sg_inq -H -p 0 /dev/sda
VPD INQUIRY, page code=0x00:
 00 00 00 00 07 00 80 83 89  b0 b1 b2   ...


acer aspire 5741G laptop

[1.593890] ahci :00:1f.2: version 3.0
[1.594108] ahci :00:1f.2: irq 43 for MSI/MSI-X
[1.594169] ahci :00:1f.2: AHCI 0001.0300 32 slots 4 ports 3 Gbps 0x3 
impl SATA mode
[1.594454] ahci :00:1f.2: flags: 64bit ncq sntf stag pm led clo pio 
slum part ems apst 
[1.602452] scsi0 : ahci
[1.602853] scsi1 : ahci
[1.603574] scsi2 : ahci
[1.603939] scsi3 : ahci
[1.604185] ata1: SATA max UDMA/133 abar m2048@0xb4105000 port 0xb4105100 
irq 43
[1.604481] ata2: SATA max UDMA/133 abar m2048@0xb4105000 port 0xb4105180 
irq 43
[1.604765] ata3: DUMMY
[1.604925] ata4: DUMMY
[1.921966] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[1.922179] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[1.924100] ata1.00: ATA-8: WDC WD6400BEVT-22A0RT0, 01.01A01, max UDMA/133
[1.924280] ata1.00: 1250263728 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
[1.926662] ata1.00: configured for UDMA/133
[1.927126] scsi 0:0:0:0: Direct-Access ATA  WDC WD6400BEVT-2 01.0 
PQ: 0 ANSI: 5
[1.929700] sd 0:0:0:0: [sda] 1250263728 512-byte logical blocks: (640 
GB/596 GiB)
[1.930393] sd 0:0:0:0: [sda] Write Protect is off
[1.930579] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[1.930822] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, 
doesn't support DPO or FUA
[1.938973] ata2.00: ATAPI: TSSTcorp CDDVDW TS-L633C, AC01, max UDMA/100
[1.939154] ata2.00: applying bridge limits
[1.957617] ata2.00: configured for UDMA/100
[1.963032] scsi 1:0:0:0: CD-ROMTSSTcorp CDDVDW TS-L633C  AC01 
PQ: 0 ANSI: 5
[1.971778] [ cut here ]
[1.971960] WARNING: CPU: 1 PID: 6 at mm/page_alloc.c:2497 
__alloc_pages_nodemask+0x1b9/0x693()
[1.972246] Modules linked in: sd_mod ahci
[1.972604] CPU: 1 PID: 6 Comm: kworker/u8:0 Not tainted 
3.14.0-rc8-next-20140327-dbg-dirty #202
[1.972890] Hardware name: Acer Aspire 5741G/Aspire 5741G
, BIOS V1.20 02/08/2011
[1.973182] Workqueue: events_unbound async_run_entry_fn
[1.973417]   8801534d5af0

Re: [linux-next] scsi_attach_vpd() warning at mm/page_alloc.c:2497

2014-03-27 Thread Sergey Senozhatsky
On (03/27/14 07:29), James Bottomley wrote:
 On Thu, 2014-03-27 at 13:19 +0300, Sergey Senozhatsky wrote:
  Hello,
  
  [1.971778] [ cut here ]
  [1.971960] WARNING: CPU: 1 PID: 6 at mm/page_alloc.c:2497 
  __alloc_pages_nodemask+0x1b9/0x693()
  [1.972246] Modules linked in: sd_mod ahci
  [1.972604] CPU: 1 PID: 6 Comm: kworker/u8:0 Not tainted 
  3.14.0-rc8-next-20140327-dbg-dirty #202
  [1.972890] Hardware name: Acer Aspire 5741G/Aspire 
  5741G, BIOS V1.20 02/08/2011
  [1.973182] Workqueue: events_unbound async_run_entry_fn
  [1.973417]   8801534d5af0 813ad822 
  
  [1.973994]  8801534d5b28 8103c03c 810ad4e5 
  001000d0
  [1.974529]  81656fc8  81656fc0 
  8801534d5b38
  [1.975125] Call Trace:
  [1.975306]  [813ad822] dump_stack+0x4e/0x7a
  [1.975488]  [8103c03c] warn_slowpath_common+0x75/0x8e
  [1.975670]  [810ad4e5] ? __alloc_pages_nodemask+0x1b9/0x693
  [1.975853]  [8103c0f4] warn_slowpath_null+0x15/0x17
  [1.976035]  [810ad4e5] __alloc_pages_nodemask+0x1b9/0x693
  [1.976220]  [81074220] ? console_unlock+0x2dd/0x2fa
  [1.976404]  [812a08be] ? scsi_execute_req_flags+0x9c/0xb3
  [1.976587]  [810ad9d1] __get_free_pages+0x12/0x3f
  [1.976771]  [810d5c66] __kmalloc+0x37/0x112
  [1.976951]  [8129b5f4] scsi_attach_vpd+0x41/0x1a8
  [1.977133]  [812a3d81] scsi_probe_and_add_lun+0x8ec/0xa21
  [1.977317]  [812a40f8] __scsi_add_device+0xce/0x10a
  [1.977501]  [812b109a] ata_scsi_scan_host+0x60/0x142
  [1.977683]  [812ad6e9] async_port_probe+0x45/0x4a
  [1.977873]  [81057e18] async_run_entry_fn+0x5a/0x110
  [1.978061]  [8104ee52] process_one_work+0x1c9/0x2e9
  [1.978243]  [8104f3ec] worker_thread+0x1d3/0x2bd
  [1.978424]  [8104f219] ? rescuer_thread+0x27d/0x27d
  [1.978606]  [81053ab4] kthread+0xd6/0xde
  [1.978786]  [810539de] ? kthread_create_on_node+0x162/0x162
  [1.978970]  [813b2efc] ret_from_fork+0x7c/0xb0
  [1.979151]  [810539de] ? kthread_create_on_node+0x162/0x162
  [1.979334] ---[ end trace ca7c5bd74b0dca21 ]---
  
  
  scsi_attach_vpd() (at pg0 label) attempts to firstly allocate 255 and later 
  134217730 bytes.
 
 What device is this?  can you do a sg_inq -H -p 0 /dev/sda and email the
 results to see if there's a code error or the device really is sending
 data that's insane.
 
  ---
  
  general question,
  
  scsi_attach_vpd() is sure that scsi_vpd_inquiry() returns `ret  0' in case 
  of
  an error and `ret  0' otherwise:
  
  Returns size of the vpd page on success or a negative error number.
  
  while this is not exactly true.
  
  scsi_vpd_inquiry() indeed can return -EINVAL, but usually it returns
  scsi_execute_req()-scsi_execute_req_flags()-scsi_execute() status, which
  has different error indication and can be, e.g. for failed blk_get_request()
  in scsi_execute() or failed sense kzalloc() in scsi_execute_req_flags(),
  `DRIVER_ERROR  24'.
 
 Yes, that looks to be the problem.  Can you test this patch?
 

NO mm/page_alloc.c warnings observed with the patch applied.

-ss

 James
 
 ---
 
 diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
 index 3241749..c4d632c 100644
 --- a/drivers/scsi/scsi.c
 +++ b/drivers/scsi/scsi.c
 @@ -952,7 +952,7 @@ static int scsi_vpd_inquiry(struct scsi_device *sdev, 
 unsigned char *buffer,
   result = scsi_execute_req(sdev, cmd, DMA_FROM_DEVICE, buffer,
 len, NULL, 30 * HZ, 3, NULL);
   if (result)
 - return result;
 + return -EIO;
  
   /* Sanity check that we got the page back that we asked for */
   if (buffer[1] != page)
 
 
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html