Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0

2016-12-22 Thread Chris Leech
On Fri, Dec 23, 2016 at 07:53:50AM +0800, Ming Lei wrote:
> On Fri, Dec 23, 2016 at 2:50 AM, Chris Leech  wrote:
> > I'm not reproducing any problems with xfstests running over iscsi_tcp
> > right now.  Two 10G luns exported from an LIO target, attached directly
> > to a test VM as sda/sdb and xfstests configured to use sda1/sdb1 as
> > TEST_DEV and SCRATCH_DEV.
> >
> > The virtio scatterlist issue that popped right away for me is triggered
> > by an hdparm ioctl, which is being run by tuned on Fedora.  And that
> > actually seems to happen back on 4.9 as well :(
> 
> Could you share us what the specific hdparm cmd line is? I tried several
> random cmds over virtio-blk/virito-scsi, looks not see this problem.

Of course, looks like I've screwed up my bisect run on this so I'm still
taking a look.  It triggers for me with 'hdparm -B /dev/vda' but may
also depend on kernel configuration.

I started with the fedora rawhide config with a lot of debug on and
trimed it down with a localmodconfig run in the VM to speed up rebuilds.

Chris

-- 
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to open-iscsi+unsubscr...@googlegroups.com.
To post to this group, send email to open-iscsi@googlegroups.com.
Visit this group at https://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.


Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0

2016-12-22 Thread Chris Leech
On Thu, Dec 22, 2016 at 05:50:12PM +1100, Dave Chinner wrote:
> On Wed, Dec 21, 2016 at 09:46:37PM -0800, Linus Torvalds wrote:
> > On Wed, Dec 21, 2016 at 9:13 PM, Dave Chinner  wrote:
> > >
> > > There may be deeper issues. I just started running scalability tests
> > > (e.g. 16-way fsmark create tests) and about a minute in I got a
> > > directory corruption reported - something I hadn't seen in the dev
> > > cycle at all.
> > 
> > By "in the dev cycle", do you mean your XFS changes, or have you been
> > tracking the merge cycle at least for some testing?
> 
> I mean the three months leading up to the 4.10 merge, when all the
> XFS changes were being tested against 4.9-rc kernels.
> 
> The iscsi problem showed up when I updated the base kernel from
> 4.9 to 4.10-current last week to test the pullreq I was going to
> send you. I've been bust with other stuff until now, so I didn't
> upgrade my working trees again until today in the hope the iscsi
> problem had already been found and fixed.
> 
> > > I unmounted the fs, mkfs'd it again, ran the
> > > workload again and about a minute in this fired:
> > >
> > > [628867.607417] [ cut here ]
> > > [628867.608603] WARNING: CPU: 2 PID: 16925 at mm/workingset.c:461 
> > > shadow_lru_isolate+0x171/0x220
> > 
> > Well, part of the changes during the merge window were the shadow
> > entry tracking changes that came in through Andrew's tree. Adding
> > Johannes Weiner to the participants.
> > 
> > > Now, this workload does not touch the page cache at all - it's
> > > entirely an XFS metadata workload, so it should not really be
> > > affecting the working set code.
> > 
> > Well, I suspect that anything that creates memory pressure will end up
> > triggering the working set code, so ..
> > 
> > That said, obviously memory corruption could be involved and result in
> > random issues too, but I wouldn't really expect that in this code.
> > 
> > It would probably be really useful to get more data points - is the
> > problem reliably in this area, or is it going to be random and all
> > over the place.
> 
> The iscsi problem is 100% reproducable. create a pair of iscsi luns,
> mkfs, run xfstests on them. iscsi fails a second after xfstests mounts
> the filesystems.
> 
> The test machine I'm having all these other problems on? stable and
> steady as a rock using PMEM devices. Moment I go to use /dev/vdc
> (i.e. run load/perf benchmarks) it starts falling over left, right
> and center.

I'm not reproducing any problems with xfstests running over iscsi_tcp
right now.  Two 10G luns exported from an LIO target, attached directly
to a test VM as sda/sdb and xfstests configured to use sda1/sdb1 as
TEST_DEV and SCRATCH_DEV.

The virtio scatterlist issue that popped right away for me is triggered
by an hdparm ioctl, which is being run by tuned on Fedora.  And that
actually seems to happen back on 4.9 as well :(

Chris

-- 
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to open-iscsi+unsubscr...@googlegroups.com.
To post to this group, send email to open-iscsi@googlegroups.com.
Visit this group at https://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.


Re: iscsistart -N skips offload NICs causing boot to fail on debian

2016-12-22 Thread Andrew Patterson
On 12/20/2016 12:28 PM, Andrew Patterson wrote:
> On 12/20/2016 12:13 PM, Mike Christie wrote:
>> On 12/20/2016 12:15 PM, Andrew Patterson wrote:
>>> On 12/12/2016 02:38 PM, Andrew Patterson wrote:


 On Friday, December 9, 2016 at 3:33:57 PM UTC-7, Mike Christie wrote:

 On 12/08/2016 10:55 AM, andrew.p...@hpe.com  wrote:
 > I am trying to get iSCSI boot working on a debian-based system
 > using built-in uEFI iSCSI initiator firmware and broadcom
 > NICs (bnx2x).
 >
 > The debian initramfs uses the ibft support in iscsistart to log
 > into the root volume. The initramfs script uses iscsistart -N to
 > bring up the NICs before logging in with iscsstart -b. This
 > process works fine when using non-offload NICs, but fails when
 > using NICs using the bnx2, bnx2x, cxgb3, and cxgb4 drivers due to
 > the following code in
 > utils/fwparam_ibft/fw_entry.c:fw_setup_nics():
 >
 > list_for_each_entry(context, , list) {  
 > /* if it is a offload nic ignore it */
 > if (!net_get_transport_name_from_netdev(context->iface,
 > transport))
 > continue
 >
 >
 > Which does a lookup in the table in usr/iscsi_net_util.c
 >
 > static struct iscsi_net_driver net_drivers[] = {
 > {"cxgb3", "cxgb3i" },
 > {"cxgb4", "cxgb4i" },
 > {"bnx2", "bnx2i" },
 > {"bnx2x", "bnx2i"},
 > {NULL, NULL}
 > };
 >
 >
 > to see if -N should skip this NIC.  I cannot determine the reason
 > for this check. Do iscsi offload NICs not need to be configured
 > for boot? The current code causes iscsi boots to fail. I removed the
 > check and booting works fine.
 >

 If you use the ibft net info for the nic, what do you use for the iscsi
 offload engine? 


 I am not configuring it at boot (at least not yet). I haven't tried
 using the boot support in the NIC firmware yet, just the uEFI iSCSI 
 firmware
 which will work with any NIC.


  

 Do they both have the same IP, and are you creating boot
 iscsi sessions through the offload engine?


 I am not using offload yet. But I will probably need to get this working 
 next.



 cxgb*i is able to share the net info. The normal old nic engine and
 iscsi engine are able to use the same IP. That originally was not
 supposed to be allowed upstream, but it got in.

 bnx2* is (or at least was when we wrote that code) not able to share 
 the
 net info. The nic and iscsi engines need different IPs.


 That makes some sense. The firmware has two different MAC addresses,
 one for the NIC and one for iSCSI.


  


 So to cover both types we just use the ibft net info for the iscsi
 offload engines.

 In the version of the code you are using is OFFLOAD_BOOT_SUPPORTED
 defined in open-iscsi/usr/iface.c or not?


 Version 2.0.874. OFFLOAD_BOOT_SUPPORTED is in the code but not defined.



 If it's not defined and you modified the code like you described, then 
 I
 think you would use the ibft info to setup the normal old nic, and then
 you would end up creating a boot session using software iscsi. 


 Yes, I believe that is the result I am getting.


  

 I am not
 100% sure about this last statement. I cannot remember and just looked
 at the code for a couple minutes.


 I will try defining OFFLOAD_BOOT_SUPPORTED to see if that brings up the 
 NIC.

  

>>>
>>> I tried defining OFFLOAD_BOOT_SUPPORTED. The NIC is brought up but no
>>> IP is assigned. Is that expected? I have been using the uEFI iSCSI
>>
>> It is expected for what you are doing. I misunderstood what you were
>> trying to do.
>>
>>> driver which does not support hardware offload. I also have at least
>>> one bnx2x NIC that does not support iSCSI hardware offload:
>>>
>>> 02:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719
>>> Gigabit Ethernet PCIe (rev 01)
>>> 02:00.0 0200: 14e4:1657 (rev 01)
>>>
>>> Do we need some sort of whitelist or a command-line flag to support these 
>>> sort
>>> of configurations?
>>>
>>
>> I thought you wanted to use a iscsi offload engine in the broadcom card.
>> If you wanted to use a card that is in those offload lists for normal
>> old networking, have iscsistart bring it up and do iscsi root using
>> software iscsi, then yeah, I think we need some sort of flag.
>>
> 
> I want to do both. I 

Re: Problem with iSCSI connected LTO-2 tape drive

2016-12-22 Thread The Lee-Man
Hi David:

I have created Issue#35 for this on github.

On Thursday, December 15, 2016 at 2:00:15 PM UTC-8, David C. Partridge 
wrote:
>
> You’re right, it is in section 8.2.  Maybe it needs to be said in 8.1.1 as 
> well?
>
>  
>
> Dave
>
>  
>
> *From:* open-iscsi@googlegroups.com [mailto:open-iscsi@googlegroups.com] *On 
> Behalf Of *Lee Duncan
> *Sent:* 15 December 2016 19:41
> *To:* open-iscsi@googlegroups.com
> *Subject:* Re: Problem with iSCSI connected LTO-2 tape drive
>
>  
>
> On Dec 15, 2016, at 7:14 AM, david.partri...@perdrix.co.uk wrote:
>
>  
>
> Lee,
>
> It would appear that the guilty party was:
>
> node.conn[0].timeo.noop_out_interval = 5
> node.conn[0].timeo.noop_out_timeout = 5
>
> I changed both of these to 0 for the tape device and the problem went away.
>
>  
>
> Excellent.
>
>
>
>
> Please note that the README.gz for open-scsi doesn't actually say that 
> this is what you need to do to disable the NOP-out polling, so could I 
> suggest that this be stated explicitly. 
>
>  
>
> My README says, in section 8.2:
>
>  
>
> For this setup, you can turn off iSCSI pings by setting:
>
>  
>
> node.conn[0].timeo.noop_out_interval = 0
>
> node.conn[0].timeo.noop_out_timeout = 0
>
>  
>
>
>
> I must admit that I find it hard to imagine that an iSCSI target would 
> reply to a NOP-out while it was processing a command such as a tape fsf or 
> even tape erase (whose timeout is 6 * the long-timeout of 4 hours).  Should 
> perhaps the NOP-out polling be suspended while a command is being 
> processed?  Or alternatively maybe the NOP-out polling be completely 
> disabled by default with something in the README.gz file that explains WHEN 
> you might want it and how to enable it.   It's certainly clear that (at 
> least) the MS iSCSI initiator doesn't send NOP-out polls.
>
>  
>
> open-iscsi is normally used to deal with discs. When it’s used with tape 
> it’s not unusual to find bugs or design errors that we did not know were 
> present.
>
>  
>
> Perhaps a small blurb in the README about dealing with tape, suggesting 
> turning NOOP/ping off. Please feel free to post a pull request on github or 
> suggest a patch on this list.
>
>  
>
>
>
>
> Regards
> Dave
>
>  
>
>  
>
> -- 
>
> Lee Duncan
>
>  
>
> "Choice means saying no to one thing so you can say yes to another." -- 
> Dan Millman
>
>  
>
> -- 
> You received this message because you are subscribed to a topic in the 
> Google Groups "open-iscsi" group.
> To unsubscribe from this topic, visit 
> https://groups.google.com/d/topic/open-iscsi/ViC-za8eHdc/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to 
> open-iscsi+unsubscr...@googlegroups.com.
> To post to this group, send email to open-iscsi@googlegroups.com.
> Visit this group at https://groups.google.com/group/open-iscsi.
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to open-iscsi+unsubscr...@googlegroups.com.
To post to this group, send email to open-iscsi@googlegroups.com.
Visit this group at https://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.


Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0

2016-12-22 Thread Dave Chinner
On Wed, Dec 21, 2016 at 09:46:37PM -0800, Linus Torvalds wrote:
> On Wed, Dec 21, 2016 at 9:13 PM, Dave Chinner  wrote:
> >
> > There may be deeper issues. I just started running scalability tests
> > (e.g. 16-way fsmark create tests) and about a minute in I got a
> > directory corruption reported - something I hadn't seen in the dev
> > cycle at all.
> 
> By "in the dev cycle", do you mean your XFS changes, or have you been
> tracking the merge cycle at least for some testing?

I mean the three months leading up to the 4.10 merge, when all the
XFS changes were being tested against 4.9-rc kernels.

The iscsi problem showed up when I updated the base kernel from
4.9 to 4.10-current last week to test the pullreq I was going to
send you. I've been bust with other stuff until now, so I didn't
upgrade my working trees again until today in the hope the iscsi
problem had already been found and fixed.

> > I unmounted the fs, mkfs'd it again, ran the
> > workload again and about a minute in this fired:
> >
> > [628867.607417] [ cut here ]
> > [628867.608603] WARNING: CPU: 2 PID: 16925 at mm/workingset.c:461 
> > shadow_lru_isolate+0x171/0x220
> 
> Well, part of the changes during the merge window were the shadow
> entry tracking changes that came in through Andrew's tree. Adding
> Johannes Weiner to the participants.
> 
> > Now, this workload does not touch the page cache at all - it's
> > entirely an XFS metadata workload, so it should not really be
> > affecting the working set code.
> 
> Well, I suspect that anything that creates memory pressure will end up
> triggering the working set code, so ..
> 
> That said, obviously memory corruption could be involved and result in
> random issues too, but I wouldn't really expect that in this code.
> 
> It would probably be really useful to get more data points - is the
> problem reliably in this area, or is it going to be random and all
> over the place.

The iscsi problem is 100% reproducable. create a pair of iscsi luns,
mkfs, run xfstests on them. iscsi fails a second after xfstests mounts
the filesystems.

The test machine I'm having all these other problems on? stable and
steady as a rock using PMEM devices. Moment I go to use /dev/vdc
(i.e. run load/perf benchmarks) it starts falling over left, right
and center.

And I just smacked into this in the bulkstat phase of the benchmark
(mkfs, fsmark, xfs_repair, mount, bulkstat, find, grep, rm):

[ 2729.750563] BUG: Bad page state in process bstat  pfn:14945
[ 2729.751863] page:ea525140 count:-1 mapcount:0 mapping:  
(null) index:0x0
[ 2729.753763] flags: 0x4000()
[ 2729.754671] raw: 4000   

[ 2729.756469] raw: dead0100 dead0200  

[ 2729.758276] page dumped because: nonzero _refcount
[ 2729.759393] Modules linked in:
[ 2729.760137] CPU: 7 PID: 25902 Comm: bstat Tainted: GB   
4.9.0-dgc #18
[ 2729.761888] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
Debian-1.8.2-1 04/01/2014
[ 2729.763943] Call Trace:
[ 2729.764523]  
[ 2729.765004]  dump_stack+0x63/0x83
[ 2729.765784]  bad_page+0xc4/0x130
[ 2729.766552]  free_pages_check_bad+0x4f/0x70
[ 2729.767531]  free_pcppages_bulk+0x3c5/0x3d0
[ 2729.768513]  ? page_alloc_cpu_dead+0x30/0x30
[ 2729.769510]  drain_pages_zone+0x41/0x60
[ 2729.770417]  drain_pages+0x3e/0x60
[ 2729.771215]  drain_local_pages+0x24/0x30
[ 2729.772138]  flush_smp_call_function_queue+0x88/0x160
[ 2729.773317]  generic_smp_call_function_single_interrupt+0x13/0x30
[ 2729.774742]  smp_call_function_single_interrupt+0x27/0x40
[ 2729.776000]  smp_call_function_interrupt+0xe/0x10
[ 2729.777102]  call_function_interrupt+0x8e/0xa0
[ 2729.778147] RIP: 0010:delay_tsc+0x41/0x90
[ 2729.779085] RSP: 0018:c9000f0cf500 EFLAGS: 0202 ORIG_RAX: 
ff03
[ 2729.780852] RAX: 77541291 RBX: 88008b5efe40 RCX: 002e
[ 2729.782514] RDX: 0577 RSI: 05541291 RDI: 0001
[ 2729.784167] RBP: c9000f0cf500 R08: 0007 R09: c9000f0cf678
[ 2729.785818] R10: 0006 R11: 1000 R12: 0061
[ 2729.787480] R13: 0001 R14: 83214e30 R15: 0080
[ 2729.789124]  
[ 2729.789626]  __delay+0xf/0x20
[ 2729.790333]  do_raw_spin_lock+0x8c/0x160
[ 2729.791255]  _raw_spin_lock+0x15/0x20
[ 2729.792112]  list_lru_add+0x1a/0x70
[ 2729.792932]  xfs_buf_rele+0x3e7/0x410
[ 2729.793792]  xfs_buftarg_shrink_scan+0x6b/0x80
[ 2729.794841]  shrink_slab.part.65.constprop.86+0x1dc/0x410
[ 2729.796099]  shrink_node+0x57/0x90
[ 2729.796905]  do_try_to_free_pages+0xdd/0x230
[ 2729.797914]  try_to_free_pages+0xce/0x1a0
[ 2729.798852]  __alloc_pages_slowpath+0x2df/0x960
[ 2729.799908]  __alloc_pages_nodemask+0x24b/0x290
[ 2729.800963]  new_slab+0x2ac/0x380
[ 2729.801743]  ___slab_alloc.constprop.82+0x336/0x440
[ 

Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0

2016-12-22 Thread Dave Chinner
On Thu, Dec 22, 2016 at 04:13:22PM +1100, Dave Chinner wrote:
> On Wed, Dec 21, 2016 at 04:13:03PM -0800, Chris Leech wrote:
> > On Wed, Dec 21, 2016 at 03:19:15PM -0800, Linus Torvalds wrote:
> > > Hi,
> > > 
> > > On Wed, Dec 21, 2016 at 2:16 PM, Dave Chinner  wrote:
> > > > On Fri, Dec 16, 2016 at 10:59:06AM -0800, Chris Leech wrote:
> > > >> Thanks Dave,
> > > >>
> > > >> I'm hitting a bug at scatterlist.h:140 before I even get any iSCSI
> > > >> modules loaded (virtio block) so there's something else going on in the
> > > >> current merge window.  I'll keep an eye on it and make sure there's
> > > >> nothing iSCSI needs fixing for.
> > > >
> > > > OK, so before this slips through the cracks.
> > > >
> > > > Linus - your tree as of a few minutes ago still panics immediately
> > > > when starting xfstests on iscsi devices. It appears to be a
> > > > scatterlist corruption and not an iscsi problem, so the iscsi guys
> > > > seem to have bounced it and no-one is looking at it.
> > > 
> > > Hmm. There's not much to go by.
> > > 
> > > Can somebody in iscsi-land please try to just bisect it - I'm not
> > > seeing a lot of clues to where this comes from otherwise.
> > 
> > Yeah, my hopes of this being quickly resolved by someone else didn't
> > work out and whatever is going on in that test VM is looking like a
> > different kind of odd.  I'm saving that off for later, and seeing if I
> > can't be a bisect on the iSCSI issue.
> 
> There may be deeper issues. I just started running scalability tests
> (e.g. 16-way fsmark create tests) and about a minute in I got a
> directory corruption reported - something I hadn't seen in the dev
> cycle at all. I unmounted the fs, mkfs'd it again, ran the
> workload again and about a minute in this fired:
> 
> [628867.607417] [ cut here ]
> [628867.608603] WARNING: CPU: 2 PID: 16925 at mm/workingset.c:461 
> shadow_lru_isolate+0x171/0x220
> [628867.610702] Modules linked in:
> [628867.611375] CPU: 2 PID: 16925 Comm: kworker/2:97 Tainted: GW  
>  4.9.0-dgc #18
> [628867.613382] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> Debian-1.8.2-1 04/01/2014
> [628867.616179] Workqueue: events rht_deferred_worker
> [628867.632422] Call Trace:
> [628867.634691]  dump_stack+0x63/0x83
> [628867.637937]  __warn+0xcb/0xf0
> [628867.641359]  warn_slowpath_null+0x1d/0x20
> [628867.643362]  shadow_lru_isolate+0x171/0x220
> [628867.644627]  __list_lru_walk_one.isra.11+0x79/0x110
> [628867.645780]  ? __list_lru_init+0x70/0x70
> [628867.646628]  list_lru_walk_one+0x17/0x20
> [628867.647488]  scan_shadow_nodes+0x34/0x50
> [628867.648358]  shrink_slab.part.65.constprop.86+0x1dc/0x410
> [628867.649506]  shrink_node+0x57/0x90
> [628867.650233]  do_try_to_free_pages+0xdd/0x230
> [628867.651157]  try_to_free_pages+0xce/0x1a0
> [628867.652342]  __alloc_pages_slowpath+0x2df/0x960
> [628867.653332]  ? __might_sleep+0x4a/0x80
> [628867.654148]  __alloc_pages_nodemask+0x24b/0x290
> [628867.655237]  kmalloc_order+0x21/0x50
> [628867.656016]  kmalloc_order_trace+0x24/0xc0
> [628867.656878]  __kmalloc+0x17d/0x1d0
> [628867.657644]  bucket_table_alloc+0x195/0x1d0
> [628867.658564]  ? __might_sleep+0x4a/0x80
> [628867.659449]  rht_deferred_worker+0x287/0x3c0
> [628867.660366]  ? _raw_spin_unlock_irq+0xe/0x30
> [628867.661294]  process_one_work+0x1de/0x4d0
> [628867.662208]  worker_thread+0x4b/0x4f0
> [628867.662990]  kthread+0x10c/0x140
> [628867.663687]  ? process_one_work+0x4d0/0x4d0
> [628867.664564]  ? kthread_create_on_node+0x40/0x40
> [628867.665523]  ret_from_fork+0x25/0x30
> [628867.666317] ---[ end trace 7c38634006a9955e ]---
> 
> Now, this workload does not touch the page cache at all - it's
> entirely an XFS metadata workload, so it should not really be
> affecting the working set code.

The system back up, and I haven't reproduced this problem yet.
However, benchmark results are way off where they should be, and at
times the performance is utterly abysmal. The XFS for-next tree
based on the 4.9 kernel shows none of these problems, so I don't
think there's an XFS problem here. Workload is the same 16-way
fsmark workload that I've been using for years as a performance
regression test.

The workload normally averages around 230k files/s - i'm seeing
and average of ~175k files/s on you current kernel. And there are
periods where performance just completely tanks:

#  ./fs_mark  -D  1  -S0  -n  10  -s  0  -L  32  -d  /mnt/scratch/0  -d 
 /mnt/scratch/1  -d  /mnt/scratch/2  -d  /mnt/scratch/3  -d  /mnt/scratch/4  -d 
 /mnt/scratch/5  -d  /mnt/scratch/6  -d  /mnt/scratch/7  -d  /mnt/scratch/8  -d 
 /mnt/scratch/9  -d  /mnt/scratch/10  -d  /mnt/scratch/11  -d  /mnt/scratch/12  
-d  /mnt/scratch/13  -d  /mnt/scratch/14  -d  /mnt/scratch/15
#   Version 3.3, 16 thread(s) starting at Thu Dec 22 16:29:20 2016
#   Sync method: NO SYNC: Test does not issue sync() or fsync() calls.
#   Directories:  Time based hash 

Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0

2016-12-22 Thread Dave Chinner
On Wed, Dec 21, 2016 at 04:13:03PM -0800, Chris Leech wrote:
> On Wed, Dec 21, 2016 at 03:19:15PM -0800, Linus Torvalds wrote:
> > Hi,
> > 
> > On Wed, Dec 21, 2016 at 2:16 PM, Dave Chinner  wrote:
> > > On Fri, Dec 16, 2016 at 10:59:06AM -0800, Chris Leech wrote:
> > >> Thanks Dave,
> > >>
> > >> I'm hitting a bug at scatterlist.h:140 before I even get any iSCSI
> > >> modules loaded (virtio block) so there's something else going on in the
> > >> current merge window.  I'll keep an eye on it and make sure there's
> > >> nothing iSCSI needs fixing for.
> > >
> > > OK, so before this slips through the cracks.
> > >
> > > Linus - your tree as of a few minutes ago still panics immediately
> > > when starting xfstests on iscsi devices. It appears to be a
> > > scatterlist corruption and not an iscsi problem, so the iscsi guys
> > > seem to have bounced it and no-one is looking at it.
> > 
> > Hmm. There's not much to go by.
> > 
> > Can somebody in iscsi-land please try to just bisect it - I'm not
> > seeing a lot of clues to where this comes from otherwise.
> 
> Yeah, my hopes of this being quickly resolved by someone else didn't
> work out and whatever is going on in that test VM is looking like a
> different kind of odd.  I'm saving that off for later, and seeing if I
> can't be a bisect on the iSCSI issue.

There may be deeper issues. I just started running scalability tests
(e.g. 16-way fsmark create tests) and about a minute in I got a
directory corruption reported - something I hadn't seen in the dev
cycle at all. I unmounted the fs, mkfs'd it again, ran the
workload again and about a minute in this fired:

[628867.607417] [ cut here ]
[628867.608603] WARNING: CPU: 2 PID: 16925 at mm/workingset.c:461 
shadow_lru_isolate+0x171/0x220
[628867.610702] Modules linked in:
[628867.611375] CPU: 2 PID: 16925 Comm: kworker/2:97 Tainted: GW   
4.9.0-dgc #18
[628867.613382] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
Debian-1.8.2-1 04/01/2014
[628867.616179] Workqueue: events rht_deferred_worker
[628867.632422] Call Trace:
[628867.634691]  dump_stack+0x63/0x83
[628867.637937]  __warn+0xcb/0xf0
[628867.641359]  warn_slowpath_null+0x1d/0x20
[628867.643362]  shadow_lru_isolate+0x171/0x220
[628867.644627]  __list_lru_walk_one.isra.11+0x79/0x110
[628867.645780]  ? __list_lru_init+0x70/0x70
[628867.646628]  list_lru_walk_one+0x17/0x20
[628867.647488]  scan_shadow_nodes+0x34/0x50
[628867.648358]  shrink_slab.part.65.constprop.86+0x1dc/0x410
[628867.649506]  shrink_node+0x57/0x90
[628867.650233]  do_try_to_free_pages+0xdd/0x230
[628867.651157]  try_to_free_pages+0xce/0x1a0
[628867.652342]  __alloc_pages_slowpath+0x2df/0x960
[628867.653332]  ? __might_sleep+0x4a/0x80
[628867.654148]  __alloc_pages_nodemask+0x24b/0x290
[628867.655237]  kmalloc_order+0x21/0x50
[628867.656016]  kmalloc_order_trace+0x24/0xc0
[628867.656878]  __kmalloc+0x17d/0x1d0
[628867.657644]  bucket_table_alloc+0x195/0x1d0
[628867.658564]  ? __might_sleep+0x4a/0x80
[628867.659449]  rht_deferred_worker+0x287/0x3c0
[628867.660366]  ? _raw_spin_unlock_irq+0xe/0x30
[628867.661294]  process_one_work+0x1de/0x4d0
[628867.662208]  worker_thread+0x4b/0x4f0
[628867.662990]  kthread+0x10c/0x140
[628867.663687]  ? process_one_work+0x4d0/0x4d0
[628867.664564]  ? kthread_create_on_node+0x40/0x40
[628867.665523]  ret_from_fork+0x25/0x30
[628867.666317] ---[ end trace 7c38634006a9955e ]---

Now, this workload does not touch the page cache at all - it's
entirely an XFS metadata workload, so it should not really be
affecting the working set code.

And worse, on that last error, the /host/ is now going into meltdown
(running 4.7.5) with 32 CPUs all burning down in ACPI code:

  PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+ COMMAND
35074 root  -2   0   0  0  0 R  99.0  0.0  12:38.92 acpi_pad/12
35079 root  -2   0   0  0  0 R  99.0  0.0  12:39.40 acpi_pad/16
35080 root  -2   0   0  0  0 R  99.0  0.0  12:39.29 acpi_pad/17
35085 root  -2   0   0  0  0 R  99.0  0.0  12:39.35 acpi_pad/22
35087 root  -2   0   0  0  0 R  99.0  0.0  12:39.13 acpi_pad/24
35090 root  -2   0   0  0  0 R  99.0  0.0  12:38.89 acpi_pad/27
35093 root  -2   0   0  0  0 R  99.0  0.0  12:38.88 acpi_pad/30
35063 root  -2   0   0  0  0 R  98.1  0.0  12:40.64 acpi_pad/1
35065 root  -2   0   0  0  0 R  98.1  0.0  12:40.38 acpi_pad/3
35066 root  -2   0   0  0  0 R  98.1  0.0  12:40.30 acpi_pad/4
35067 root  -2   0   0  0  0 R  98.1  0.0  12:40.82 acpi_pad/5
35077 root  -2   0   0  0  0 R  98.1  0.0  12:39.65 acpi_pad/14
35078 root  -2   0   0  0  0 R  98.1  0.0  12:39.58 acpi_pad/15
35081 root  -2   0   0  0  0 R  98.1  0.0  12:39.32 acpi_pad/18
35072 root  -2   0   0  0  0 R  96.2  

Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0

2016-12-22 Thread Linus Torvalds
On Wed, Dec 21, 2016 at 9:13 PM, Dave Chinner  wrote:
>
> There may be deeper issues. I just started running scalability tests
> (e.g. 16-way fsmark create tests) and about a minute in I got a
> directory corruption reported - something I hadn't seen in the dev
> cycle at all.

By "in the dev cycle", do you mean your XFS changes, or have you been
tracking the merge cycle at least for some testing?

> I unmounted the fs, mkfs'd it again, ran the
> workload again and about a minute in this fired:
>
> [628867.607417] [ cut here ]
> [628867.608603] WARNING: CPU: 2 PID: 16925 at mm/workingset.c:461 
> shadow_lru_isolate+0x171/0x220

Well, part of the changes during the merge window were the shadow
entry tracking changes that came in through Andrew's tree. Adding
Johannes Weiner to the participants.

> Now, this workload does not touch the page cache at all - it's
> entirely an XFS metadata workload, so it should not really be
> affecting the working set code.

Well, I suspect that anything that creates memory pressure will end up
triggering the working set code, so ..

That said, obviously memory corruption could be involved and result in
random issues too, but I wouldn't really expect that in this code.

It would probably be really useful to get more data points - is the
problem reliably in this area, or is it going to be random and all
over the place.

That said:

> And worse, on that last error, the /host/ is now going into meltdown
> (running 4.7.5) with 32 CPUs all burning down in ACPI code:

The obvious question here is how much you trust the environment if the
host ends up also showing problems. Maybe you do end up having hw
issues pop up too.

The primary suspect would presumably be the development kernel you're
testing triggering something, but it has to be asked..

 Linus

-- 
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to open-iscsi+unsubscr...@googlegroups.com.
To post to this group, send email to open-iscsi@googlegroups.com.
Visit this group at https://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.


Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0

2016-12-22 Thread Christoph Hellwig
On Thu, Dec 22, 2016 at 05:30:46PM +1100, Dave Chinner wrote:
> > For "normal" bios the for_each_segment loop iterates over bi_vcnt,
> > so it will be ignored anyway.  That being said both I and the lists
> > got CCed halfway through the thread and I haven't seen the original
> > report, so I'm not really sure what's going on here anyway.
> 
> http://www.gossamer-threads.com/lists/linux/kernel/2587485

This doesn't look like the discard changes, but if Chris wants to test
without them f9d03f96b988 reverts cleanly.

-- 
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to open-iscsi+unsubscr...@googlegroups.com.
To post to this group, send email to open-iscsi@googlegroups.com.
Visit this group at https://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.


Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0

2016-12-22 Thread Dave Chinner
On Thu, Dec 22, 2016 at 07:18:27AM +0100, Christoph Hellwig wrote:
> On Wed, Dec 21, 2016 at 03:19:15PM -0800, Linus Torvalds wrote:
> > Looking around a bit, the only even halfway suspicious scatterlist
> > initialization thing I see is commit f9d03f96b988 ("block: improve
> > handling of the magic discard payload") which used to have a magic
> > hack wrt !bio->bi_vcnt, and that got removed. See __blk_bios_map_sg(),
> > now it does __blk_bvec_map_sg() instead.
> 
> But that check was only for discard (and discard-like) bios which
> had the maic single page that sometimes was unused attached.
> 
> For "normal" bios the for_each_segment loop iterates over bi_vcnt,
> so it will be ignored anyway.  That being said both I and the lists
> got CCed halfway through the thread and I haven't seen the original
> report, so I'm not really sure what's going on here anyway.

http://www.gossamer-threads.com/lists/linux/kernel/2587485

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com

-- 
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to open-iscsi+unsubscr...@googlegroups.com.
To post to this group, send email to open-iscsi@googlegroups.com.
Visit this group at https://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.


Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0

2016-12-22 Thread Dave Chinner
On Fri, Dec 16, 2016 at 10:59:06AM -0800, Chris Leech wrote:
> Thanks Dave,
> 
> I'm hitting a bug at scatterlist.h:140 before I even get any iSCSI
> modules loaded (virtio block) so there's something else going on in the
> current merge window.  I'll keep an eye on it and make sure there's
> nothing iSCSI needs fixing for.

OK, so before this slips through the cracks.

Linus - your tree as of a few minutes ago still panics immediately
when starting xfstests on iscsi devices. It appears to be a
scatterlist corruption and not an iscsi problem, so the iscsi guys
seem to have bounced it and no-one is looking at it.

I'm disappearing for several months at the end of tomorrow, so I
thought I better make sure you know about it.  I've also added
linux-scsi, linux-block to the cc list

Cheers,

Dave.

> On Thu, Dec 15, 2016 at 09:29:53AM +1100, Dave Chinner wrote:
> > On Thu, Dec 15, 2016 at 09:24:11AM +1100, Dave Chinner wrote:
> > > Hi folks,
> > > 
> > > Just updated my test boxes from 4.9 to a current Linus 4.10 merge
> > > window kernel to test the XFS merge I am preparing for Linus.
> > > Unfortunately, all my test VMs using iscsi failed pretty much
> > > instantly on the first mount of an iscsi device:
> > > 
> > > [  159.372704] XFS (sdb): EXPERIMENTAL reverse mapping btree feature 
> > > enabled. Use at your own risk!
> > > [  159.374612] XFS (sdb): Mounting V5 Filesystem
> > > [  159.425710] XFS (sdb): Ending clean mount
> > > [  160.274438] BUG: unable to handle kernel NULL pointer dereference at 
> > > 000c
> > > [  160.275851] IP: iscsi_tcp_segment_done+0x20d/0x2e0
> > 
> > FYI, crash is here:
> > 
> > (gdb) l *(iscsi_tcp_segment_done+0x20d)
> > 0x81b950bd is in iscsi_tcp_segment_done 
> > (drivers/scsi/libiscsi_tcp.c:102).
> > 97  iscsi_tcp_segment_init_sg(struct iscsi_segment *segment,
> > 98struct scatterlist *sg, unsigned int offset)
> > 99  {
> > 100 segment->sg = sg;
> > 101 segment->sg_offset = offset;
> > 102 segment->size = min(sg->length - offset,
> > 103 segment->total_size - 
> > segment->total_copied);
> > 104 segment->data = NULL;
> > 105 }
> > 106 
> > 
> > So it looks to be sg = NULL, which means there's probably an issue
> > with the scatterlist...
> > 
> > -Dave.
> > 
> > > [  160.276565] PGD 336ed067 [  160.276885] PUD 31b0d067
> > > PMD 0 [  160.277309]
> > > [  160.277523] Oops:  [#1] PREEMPT SMP
> > > [  160.278004] Modules linked in:
> > > [  160.278407] CPU: 0 PID: 16 Comm: kworker/u2:1 Not tainted 4.9.0-dgc #18
> > > [  160.279224] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
> > > BIOS Debian-1.8.2-1 04/01/2014
> > > [  160.280314] Workqueue: iscsi_q_2 iscsi_xmitworker
> > > [  160.280919] task: 88003e28 task.stack: c908
> > > [  160.281647] RIP: 0010:iscsi_tcp_segment_done+0x20d/0x2e0
> > > [  160.282312] RSP: 0018:c9083c38 EFLAGS: 00010206
> > > [  160.282980] RAX:  RBX: 880039061730 RCX: 
> > > 
> > > [  160.283854] RDX: 1e00 RSI:  RDI: 
> > > 880039061730
> > > [  160.284738] RBP: c9083c90 R08: 0200 R09: 
> > > 05a8
> > > [  160.285627] R10: 9835607d R11:  R12: 
> > > 0200
> > > [  160.286495] R13:  R14: 8800390615a0 R15: 
> > > 880039061730
> > > [  160.287362] FS:  () GS:88003fc0() 
> > > knlGS:
> > > [  160.288340] CS:  0010 DS:  ES:  CR0: 80050033
> > > [  160.289113] CR2: 000c CR3: 31a8d000 CR4: 
> > > 06f0
> > > [  160.290084] Call Trace:
> > > [  160.290429]  ? inet_sendpage+0x4d/0x140
> > > [  160.290957]  iscsi_sw_tcp_xmit_segment+0x89/0x110
> > > [  160.291597]  iscsi_sw_tcp_pdu_xmit+0x56/0x180
> > > [  160.292190]  iscsi_tcp_task_xmit+0xb8/0x280
> > > [  160.292771]  iscsi_xmit_task+0x53/0xc0
> > > [  160.293282]  iscsi_xmitworker+0x274/0x310
> > > [  160.293835]  process_one_work+0x1de/0x4d0
> > > [  160.294388]  worker_thread+0x4b/0x4f0
> > > [  160.294889]  kthread+0x10c/0x140
> > > [  160.295333]  ? process_one_work+0x4d0/0x4d0
> > > [  160.295898]  ? kthread_create_on_node+0x40/0x40
> > > [  160.296525]  ret_from_fork+0x25/0x30
> > > [  160.297015] Code: 43 18 00 00 00 00 e9 ad fe ff ff 48 8b 7b 30 e8 da 
> > > e7 ca ff 8b 53 10 44 89 ee 48 89 df 2b 53 14 48 89 43 30 c7 43 40 00 00 
> > > 00 00 <8b
> > > [  160.300674] RIP: iscsi_tcp_segment_done+0x20d/0x2e0 RSP: 
> > > c9083c38
> > > [  160.301584] CR2: 000c
> > > 
> > > 
> > > Known problem, or something new?
> > > 
> > > Cheers,
> > > 
> > > Dave.
> > > -- 
> > > Dave Chinner
> > > da...@fromorbit.com
> > > 
> > 
> > -- 
> > Dave Chinner
> > da...@fromorbit.com
> 

-- 
Dave Chinner
da...@fromorbit.com

-- 
You received this message because you are subscribed to the Google 

Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0

2016-12-22 Thread Christoph Hellwig
On Wed, Dec 21, 2016 at 03:19:15PM -0800, Linus Torvalds wrote:
> Looking around a bit, the only even halfway suspicious scatterlist
> initialization thing I see is commit f9d03f96b988 ("block: improve
> handling of the magic discard payload") which used to have a magic
> hack wrt !bio->bi_vcnt, and that got removed. See __blk_bios_map_sg(),
> now it does __blk_bvec_map_sg() instead.

But that check was only for discard (and discard-like) bios which
had the maic single page that sometimes was unused attached.

For "normal" bios the for_each_segment loop iterates over bi_vcnt,
so it will be ignored anyway.  That being said both I and the lists
got CCed halfway through the thread and I haven't seen the original
report, so I'm not really sure what's going on here anyway.

-- 
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to open-iscsi+unsubscr...@googlegroups.com.
To post to this group, send email to open-iscsi@googlegroups.com.
Visit this group at https://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.


Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0

2016-12-22 Thread Linus Torvalds
Hi,

On Wed, Dec 21, 2016 at 2:16 PM, Dave Chinner  wrote:
> On Fri, Dec 16, 2016 at 10:59:06AM -0800, Chris Leech wrote:
>> Thanks Dave,
>>
>> I'm hitting a bug at scatterlist.h:140 before I even get any iSCSI
>> modules loaded (virtio block) so there's something else going on in the
>> current merge window.  I'll keep an eye on it and make sure there's
>> nothing iSCSI needs fixing for.
>
> OK, so before this slips through the cracks.
>
> Linus - your tree as of a few minutes ago still panics immediately
> when starting xfstests on iscsi devices. It appears to be a
> scatterlist corruption and not an iscsi problem, so the iscsi guys
> seem to have bounced it and no-one is looking at it.

Hmm. There's not much to go by.

Can somebody in iscsi-land please try to just bisect it - I'm not
seeing a lot of clues to where this comes from otherwise.

There clearly hasn't been anything done to drivers/scsi/*iscsi* since
4.9, so it's coming from elsewhere, presumably the block layer changes
as you say.

But I think we need iscsi people to pinpoint it a bit more, since
nobody else seems to be complaining.

Looking around a bit, the only even halfway suspicious scatterlist
initialization thing I see is commit f9d03f96b988 ("block: improve
handling of the magic discard payload") which used to have a magic
hack wrt !bio->bi_vcnt, and that got removed. See __blk_bios_map_sg(),
now it does __blk_bvec_map_sg() instead.

Maybe __blk_bvec_map_sg() would want that same

if (!bio->bi_vcnt)
return 0;

adding Christoph explicitly so that he can tell me why I'm a clueless
weenie. Although I suspect he's already on the scsi and block lists
and would have done so anyway.

> I'm disappearing for several months at the end of tomorrow, so I
> thought I better make sure you know about it.  I've also added
> linux-scsi, linux-block to the cc list

Thanks,

Linus

-- 
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to open-iscsi+unsubscr...@googlegroups.com.
To post to this group, send email to open-iscsi@googlegroups.com.
Visit this group at https://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.


Re: [PATCH 5/8] linux: drop __bitwise__ everywhere

2016-12-22 Thread Luca Coelho
On Thu, 2016-12-15 at 07:15 +0200, Michael S. Tsirkin wrote:
> __bitwise__ used to mean "yes, please enable sparse checks
> unconditionally", but now that we dropped __CHECK_ENDIAN__
> __bitwise is exactly the same.
> There aren't many users, replace it by __bitwise everywhere.
> 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  arch/arm/plat-samsung/include/plat/gpio-cfg.h| 2 +-
>  drivers/md/dm-cache-block-types.h| 6 +++---
>  drivers/net/ethernet/sun/sunhme.h| 2 +-
>  drivers/net/wireless/intel/iwlwifi/iwl-fw-file.h | 4 ++--

For drivers/net/wireless/intel/iwlwifi/iwl-fw-file.h:

Acked-by: Luca Coelho 

--
Luca.

-- 
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to open-iscsi+unsubscr...@googlegroups.com.
To post to this group, send email to open-iscsi@googlegroups.com.
Visit this group at https://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.