Re: Issue with Ceph File System and LIO

2015-12-22 Thread Eric Eastman
On Sun, Dec 20, 2015 at 7:38 PM, Eric Eastman
 wrote:
> On Fri, Dec 18, 2015 at 12:18 AM, Yan, Zheng  wrote:
>> On Fri, Dec 18, 2015 at 2:23 PM, Eric Eastman
>>  wrote:
>>>> Hi Yan Zheng, Eric Eastman
>>>>
>>>> Similar bug was reported in f2fs, btrfs, it does affect 4.4-rc4, the fixing
>>>> patch was merged into 4.4-rc5, dfd01f026058 ("sched/wait: Fix the signal
>>>> handling fix").
>>>>
>>>> Related report & discussion was here:
>>>> https://lkml.org/lkml/2015/12/12/149
>>>>
>>>> I'm not sure the current reported issue of ceph was related to that though,
>>>> but at least try testing with an upgraded or patched kernel could verify it.
>>>> :)
>>>>
>>>> Thanks,
>
>>
>> please try rc5 kernel without patches and DEBUG_VM=y
>>
>> Regards
>> Yan, Zheng
>
>
> The latest test with 4.4rc5 with CONFIG_DEBUG_VM=y has run for over 36
> hours with no ERRORS or WARNINGS.  My plan is to install the 4.4rc6
> kernel from the Ubuntu kernel-ppa site once it is available, and rerun
> the tests.
>

The test has run for 2 days using the 4.4rc6 kernel from the Ubuntu
kernel-ppa site without errors or warnings.  Looks like it was a
4.4rc4 bug.
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Issue with Ceph File System and LIO

2015-12-21 Thread Gregory Farnum
On Sun, Dec 20, 2015 at 6:38 PM, Eric Eastman
 wrote:
> On Fri, Dec 18, 2015 at 12:18 AM, Yan, Zheng  wrote:
>> On Fri, Dec 18, 2015 at 2:23 PM, Eric Eastman
>>  wrote:
>>>> Hi Yan Zheng, Eric Eastman
>>>>
>>>> Similar bug was reported in f2fs, btrfs, it does affect 4.4-rc4, the fixing
>>>> patch was merged into 4.4-rc5, dfd01f026058 ("sched/wait: Fix the signal
>>>> handling fix").
>>>>
>>>> Related report & discussion was here:
>>>> https://lkml.org/lkml/2015/12/12/149
>>>>
>>>> I'm not sure the current reported issue of ceph was related to that though,
>>>> but at least try testing with an upgraded or patched kernel could verify it.
>>>> :)
>>>>
>>>> Thanks,
>
>>
>> please try rc5 kernel without patches and DEBUG_VM=y
>>
>> Regards
>> Yan, Zheng
>
>
> The latest test with 4.4rc5 with CONFIG_DEBUG_VM=y has run for over 36
> hours with no ERRORS or WARNINGS.  My plan is to install the 4.4rc6
> kernel from the Ubuntu kernel-ppa site once it is available, and rerun
> the tests.
>
> Before running this test I had to rebuild the Ceph File System because,
> after the last logged errors on Friday using the 4.4rc4 kernel, the
> Ceph File System hung when accessing the exported image file.  After
> rebooting my iSCSI gateway, which uses the Ceph File System, I ran
> strace du -a cephfs from /, where cephfs is the mount point, and the
> hang happened on the newfstatat call on my image file:
>
> write(1, "0\tcephfs/ctdb/.ctdb.lock\n", 250 cephfs/ctdb/.ctdb.lock
> ) = 25
> close(5)= 0
> write(1, "0\tcephfs/ctdb\n", 140 cephfs/ctdb
> )= 14
> newfstatat(4, "iscsi", {st_mode=S_IFDIR|0755, st_size=993814480896,
> ...}, AT_SYMLINK_NOFOLLOW) = 0
> openat(4, "iscsi", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_NOFOLLOW) = 3
> fcntl(3, F_GETFD)   = 0
> fcntl(3, F_SETFD, FD_CLOEXEC)   = 0
> fstat(3, {st_mode=S_IFDIR|0755, st_size=993814480896, ...}) = 0
> fcntl(3, F_GETFL)   = 0x38800 (flags
> O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_NOFOLLOW)
> fcntl(3, F_SETFD, FD_CLOEXEC)   = 0
> newfstatat(4, "iscsi", {st_mode=S_IFDIR|0755, st_size=993814480896,
> ...}, AT_SYMLINK_NOFOLLOW) = 0
> fcntl(3, F_DUPFD, 3)= 5
> fcntl(5, F_GETFD)   = 0
> fcntl(5, F_SETFD, FD_CLOEXEC)   = 0
> getdents(3, /* 8 entries */, 65536) = 288
> getdents(3, /* 0 entries */, 65536) = 0
> close(3)= 0
> newfstatat(5, "iscsi900g.img", ^C
> ^C^C^C
> ^Z
> I could not break out with a ^C, and had to background the process to
> get my prompt back. The process would not die so I had to hard reset
> the system.
>
> This same hang happened on 2 other kernel mounted systems using a 4.3.0 
> kernel.
>
> On a separate system, I fuse mounted the file system and a du -a
> cephfs hung at the same point. Once again I could not break out of the
> hang, and had to hard reset the system.
>
> Restarting the MDS and Monitors did not clear the issue. Taking a
> quick look at the dumpcache showed it was large
>
> # ceph mds tell 0 dumpcache /tmp/dump.txt
> ok
> # wc /tmp/dump.txt
>   370556  5002449 59211054 /tmp/dump.txt
> # tail /tmp/dump.txt
> [inode 1259276 [...c4,head] ~mds0/stray0/1259276/ auth v977593
> snaprealm=0x561339e3fb00 f(v0 m2015-12-12 00:51:04.345614) n(v0
> rc2015-12-12 00:51:04.345614 1=0+1) (iversion lock) 0x561339c66228]
> [inode 120c1ba [...a6,head] ~mds0/stray0/120c1ba/ auth v742016
> snaprealm=0x56133ad19600 f(v0 m2015-12-10 18:25:55.880167) n(v0
> rc2015-12-10 18:25:55.880167 1=0+1) (iversion lock) 0x56133a5e0d88]
> [inode 10d0088 [...77,head] ~mds0/stray6/10d0088/ auth v292336
> snaprealm=0x5613537673c0 f(v0 m2015-12-08 19:23:20.269283) n(v0
> rc2015-12-08 19:23:20.269283 1=0+1) (iversion lock) 0x56134c2f7378]

These are deleted files that haven't been trimmed yet...

>
> I tried one more thing:
>
> ceph daemon mds.0 flush journal
>
> and restarted the MDS. Accessing the file system still locked up, but
> a du -a cephfs did not even get to the iscsi900g.img file. As I was
> running on a broken rc kernel, with snapshots turned on

...and I think we have some known issues in the tracker about snap
trimming and snapshotted inodes. So this is not entirely surprising.
:/
-Greg


>, when this
> corruption happened, I decided to recreate the file system and
> restart the ESXi iSCSI test.
>
> Regards,
> Eric


Re: Issue with Ceph File System and LIO

2015-12-20 Thread Eric Eastman
On Fri, Dec 18, 2015 at 12:18 AM, Yan, Zheng  wrote:
> On Fri, Dec 18, 2015 at 2:23 PM, Eric Eastman
>  wrote:
>>> Hi Yan Zheng, Eric Eastman
>>>
>>> Similar bug was reported in f2fs, btrfs, it does affect 4.4-rc4, the fixing
>>> patch was merged into 4.4-rc5, dfd01f026058 ("sched/wait: Fix the signal
>>> handling fix").
>>>
>>> Related report & discussion was here:
>>> https://lkml.org/lkml/2015/12/12/149
>>>
>>> I'm not sure the current reported issue of ceph was related to that though,
>>> but at least try testing with an upgraded or patched kernel could verify it.
>>> :)
>>>
>>> Thanks,

>
> please try rc5 kernel without patches and DEBUG_VM=y
>
> Regards
> Yan, Zheng


The latest test with 4.4rc5 with CONFIG_DEBUG_VM=y has run for over 36
hours with no ERRORS or WARNINGS.  My plan is to install the 4.4rc6
kernel from the Ubuntu kernel-ppa site once it is available, and rerun
the tests.
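
Before a rerun like this it can be worth confirming that the running
kernel really was built with CONFIG_DEBUG_VM=y. The following is a
minimal sketch, not something taken from the thread: it parses kernel
config text and reports whether an option is set to y. The function
name config_enabled is hypothetical, and reading the config from
/boot/config-<release> is an assumption that holds on typical Ubuntu
installs but not everywhere.

```python
def config_enabled(config_text, option):
    """Return True if `option` is set to "y" in kernel config text.

    Lines such as "# CONFIG_DEBUG_VM is not set" are comments and are
    treated as "not enabled". Option names must match exactly, so
    CONFIG_DEBUG_VM_RB does not count as CONFIG_DEBUG_VM.
    """
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name == option:
            return value == "y"
    return False
```

One way to use it, assuming the config file exists at that path, is to
read /boot/config-<release> (with <release> taken from uname -r) and
pass its contents together with "CONFIG_DEBUG_VM".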

Before running this test I had to rebuild the Ceph File System because,
after the last logged errors on Friday using the 4.4rc4 kernel, the
Ceph File System hung when accessing the exported image file.  After
rebooting my iSCSI gateway, which uses the Ceph File System, I ran
strace du -a cephfs from /, where cephfs is the mount point, and the
hang happened on the newfstatat call on my image file:

write(1, "0\tcephfs/ctdb/.ctdb.lock\n", 250 cephfs/ctdb/.ctdb.lock
) = 25
close(5)= 0
write(1, "0\tcephfs/ctdb\n", 140 cephfs/ctdb
)= 14
newfstatat(4, "iscsi", {st_mode=S_IFDIR|0755, st_size=993814480896,
...}, AT_SYMLINK_NOFOLLOW) = 0
openat(4, "iscsi", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_NOFOLLOW) = 3
fcntl(3, F_GETFD)   = 0
fcntl(3, F_SETFD, FD_CLOEXEC)   = 0
fstat(3, {st_mode=S_IFDIR|0755, st_size=993814480896, ...}) = 0
fcntl(3, F_GETFL)   = 0x38800 (flags
O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_NOFOLLOW)
fcntl(3, F_SETFD, FD_CLOEXEC)   = 0
newfstatat(4, "iscsi", {st_mode=S_IFDIR|0755, st_size=993814480896,
...}, AT_SYMLINK_NOFOLLOW) = 0
fcntl(3, F_DUPFD, 3)= 5
fcntl(5, F_GETFD)   = 0
fcntl(5, F_SETFD, FD_CLOEXEC)   = 0
getdents(3, /* 8 entries */, 65536) = 288
getdents(3, /* 0 entries */, 65536) = 0
close(3)= 0
newfstatat(5, "iscsi900g.img", ^C
^C^C^C
^Z
I could not break out with a ^C, and had to background the process to
get my prompt back. The process would not die so I had to hard reset
the system.
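
A hang like the one above can take the interactive shell with it, so
one option is to probe a suspect path from a child process with a
timeout. The sketch below is an illustration under assumptions, not
the procedure used in this thread: the helper name probe is
hypothetical, and the demo commands are stand-ins for something like
stat on the image file. Note that if the child is blocked inside the
kernel, as the du process above appeared to be, the kill is only best
effort and the child may linger, which is why the helper does not wait
for it after a timeout.

```python
import subprocess
import sys


def probe(cmd, timeout_s):
    """Run cmd and classify the result as 'ok', 'failed', or 'hung'.

    'hung' means the command had not exited after timeout_s seconds.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        rc = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Best effort only: a task stuck in an uninterruptible kernel
        # wait will not act on the signal, so do not block waiting on it.
        proc.kill()
        return "hung"
    return "ok" if rc == 0 else "failed"


# Demo with stand-in commands: one exits at once, one outlives the timeout.
print(probe([sys.executable, "-c", "pass"], timeout_s=5.0))
print(probe([sys.executable, "-c", "import time; time.sleep(5)"], timeout_s=0.5))
```

On a gateway, the command would be replaced by a stat of the file in
question, with a timeout long enough to rule out ordinary slowness.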

This same hang happened on 2 other kernel mounted systems using a 4.3.0 kernel.

On a separate system, I fuse mounted the file system and a du -a
cephfs hung at the same point. Once again I could not break out of the
hang, and had to hard reset the system.

Restarting the MDS and Monitors did not clear the issue. A quick look
at the cache dump showed that it was large:

# ceph mds tell 0 dumpcache /tmp/dump.txt
ok
# wc /tmp/dump.txt
  370556  5002449 59211054 /tmp/dump.txt
# tail /tmp/dump.txt
[inode 1259276 [...c4,head] ~mds0/stray0/1259276/ auth v977593
snaprealm=0x561339e3fb00 f(v0 m2015-12-12 00:51:04.345614) n(v0
rc2015-12-12 00:51:04.345614 1=0+1) (iversion lock) 0x561339c66228]
[inode 120c1ba [...a6,head] ~mds0/stray0/120c1ba/ auth v742016
snaprealm=0x56133ad19600 f(v0 m2015-12-10 18:25:55.880167) n(v0
rc2015-12-10 18:25:55.880167 1=0+1) (iversion lock) 0x56133a5e0d88]
[inode 10d0088 [...77,head] ~mds0/stray6/10d0088/ auth v292336
snaprealm=0x5613537673c0 f(v0 m2015-12-08 19:23:20.269283) n(v0
rc2015-12-08 19:23:20.269283 1=0+1) (iversion lock) 0x56134c2f7378]
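
To get a feel for how much of a dump like this consists of stray
entries, the lines can be tallied per stray directory. This is a rough
sketch that assumes each inode entry starts on its own line in the
dump file in the "[inode <ino> ... ~mds<N>/stray<M>/..." form shown
above (the entries are wrapped here by the mail client); the name
count_strays is hypothetical.

```python
import re
from collections import Counter

# Matches the start of an inode entry that lives under a stray directory,
# e.g. "[inode 1259276 [...c4,head] ~mds0/stray0/1259276/ auth v977593".
STRAY_RE = re.compile(r"^\[inode \S+ .*?~mds(\d+)/stray(\d+)/")


def count_strays(lines):
    """Count inode entries per stray directory, keyed like 'mds0/stray0'."""
    counts = Counter()
    for line in lines:
        match = STRAY_RE.match(line)
        if match:
            counts[f"mds{match.group(1)}/stray{match.group(2)}"] += 1
    return counts
```

Passing an open file object for /tmp/dump.txt as lines would give one
count per stray directory, which can be compared against the total
line count reported by wc.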

I tried one more thing:

ceph daemon mds.0 flush journal

and restarted the MDS. Accessing the file system still locked up, but
a du -a cephfs did not even get to the iscsi900g.img file. As I was
running on a broken rc kernel, with snapshots turned on, when this
corruption happened, I decided to recreate the file system and
restart the ESXi iSCSI test.

Regards,
Eric


Re: Issue with Ceph File System and LIO

2015-12-18 Thread Mike Christie
Eric,

Do you have iSCSI data digests on?

On 12/15/2015 12:08 AM, Eric Eastman wrote:
> I am testing Linux Target SCSI, LIO, with a Ceph File System backstore
> and I am seeing this error on my LIO gateway.  I am using Ceph v9.2.0
> on a 4.4rc4 Kernel, on Trusty, using a kernel mounted Ceph File
> System.  A file on the Ceph File System is exported via iSCSI to a
> VMware ESXi 5.0 server, and I am seeing this error when doing a lot of
> I/O on the ESXi server.   Is this a LIO or a Ceph issue?
> 
> [Tue Dec 15 00:46:55 2015] [ cut here ]
> [Tue Dec 15 00:46:55 2015] WARNING: CPU: 0 PID: 1123421 at
> /home/kernel/COD/linux/fs/ceph/addr.c:125
> ceph_set_page_dirty+0x230/0x240 [ceph]()
> [Tue Dec 15 00:46:55 2015] Modules linked in: iptable_filter ip_tables
> x_tables xfs rbd iscsi_target_mod vhost_scsi tcm_qla2xxx ib_srpt
> tcm_fc tcm_usb_gadget tcm_loop target_core_file target_core_iblock
> target_core_pscsi target_core_user target_core_mod ipmi_devintf vhost
> qla2xxx ib_cm ib_sa ib_mad ib_core ib_addr libfc scsi_transport_fc
> libcomposite udc_core uio configfs ipmi_ssif ttm drm_kms_helper
> gpio_ich drm i2c_algo_bit fb_sys_fops coretemp syscopyarea ipmi_si
> sysfillrect ipmi_msghandler sysimgblt kvm acpi_power_meter 8250_fintek
> irqbypass hpilo shpchp input_leds serio_raw lpc_ich i7core_edac
> edac_core mac_hid ceph libceph libcrc32c fscache bonding lp parport
> mlx4_en vxlan ip6_udp_tunnel udp_tunnel ptp pps_core hid_generic
> usbhid hid hpsa mlx4_core psmouse bnx2 scsi_transport_sas fjes [last
> unloaded: target_core_mod]
> [Tue Dec 15 00:46:55 2015] CPU: 0 PID: 1123421 Comm: iscsi_trx
> Tainted: GW I 4.4.0-040400rc4-generic #201512061930
> [Tue Dec 15 00:46:55 2015] Hardware name: HP ProLiant DL360 G6, BIOS
> P64 01/22/2015
> [Tue Dec 15 00:46:55 2015]   fdc0ce43
> 880bf38c38c0 813c8ab4
> [Tue Dec 15 00:46:55 2015]   880bf38c38f8
> 8107d772 ea00127a8680
> [Tue Dec 15 00:46:55 2015]  8804e52c1448 8804e52c15b0
> 8804e52c10f0 0200
> [Tue Dec 15 00:46:55 2015] Call Trace:
> [Tue Dec 15 00:46:55 2015]  [] dump_stack+0x44/0x60
> [Tue Dec 15 00:46:55 2015]  [] 
> warn_slowpath_common+0x82/0xc0
> [Tue Dec 15 00:46:55 2015]  [] warn_slowpath_null+0x1a/0x20
> [Tue Dec 15 00:46:55 2015]  []
> ceph_set_page_dirty+0x230/0x240 [ceph]
> [Tue Dec 15 00:46:55 2015]  [] ?
> pagecache_get_page+0x150/0x1c0
> [Tue Dec 15 00:46:55 2015]  [] ?
> ceph_pool_perm_check+0x48/0x700 [ceph]
> [Tue Dec 15 00:46:55 2015]  [] set_page_dirty+0x3d/0x70
> [Tue Dec 15 00:46:55 2015]  []
> ceph_write_end+0x5e/0x180 [ceph]
> [Tue Dec 15 00:46:55 2015]  [] ?
> iov_iter_copy_from_user_atomic+0x156/0x220
> [Tue Dec 15 00:46:55 2015]  []
> generic_perform_write+0x114/0x1c0
> [Tue Dec 15 00:46:55 2015]  []
> ceph_write_iter+0xf8a/0x1050 [ceph]
> [Tue Dec 15 00:46:55 2015]  [] ?
> ceph_put_cap_refs+0x143/0x320 [ceph]
> [Tue Dec 15 00:46:55 2015]  [] ?
> check_preempt_wakeup+0xfa/0x220
> [Tue Dec 15 00:46:55 2015]  [] ? zone_statistics+0x7c/0xa0
> [Tue Dec 15 00:46:55 2015]  [] ? copy_page_to_iter+0x5e/0xa0
> [Tue Dec 15 00:46:55 2015]  [] ?
> skb_copy_datagram_iter+0x122/0x250
> [Tue Dec 15 00:46:55 2015]  [] vfs_iter_write+0x76/0xc0
> [Tue Dec 15 00:46:55 2015]  []
> fd_do_rw.isra.5+0xd8/0x1e0 [target_core_file]
> [Tue Dec 15 00:46:55 2015]  []
> fd_execute_rw+0xc5/0x2a0 [target_core_file]
> [Tue Dec 15 00:46:55 2015]  []
> sbc_execute_rw+0x22/0x30 [target_core_mod]
> [Tue Dec 15 00:46:55 2015]  []
> __target_execute_cmd+0x1f/0x70 [target_core_mod]
> [Tue Dec 15 00:46:55 2015]  []
> target_execute_cmd+0x195/0x2a0 [target_core_mod]
> [Tue Dec 15 00:46:55 2015]  []
> iscsit_execute_cmd+0x20a/0x270 [iscsi_target_mod]
> [Tue Dec 15 00:46:55 2015]  []
> iscsit_sequence_cmd+0xda/0x190 [iscsi_target_mod]
> [Tue Dec 15 00:46:55 2015]  []
> iscsi_target_rx_thread+0x51d/0xe30 [iscsi_target_mod]
> [Tue Dec 15 00:46:55 2015]  [] ? __switch_to+0x1dc/0x5a0
> [Tue Dec 15 00:46:55 2015]  [] ?
> iscsi_target_tx_thread+0x1e0/0x1e0 [iscsi_target_mod]
> [Tue Dec 15 00:46:55 2015]  [] kthread+0xd8/0xf0
> [Tue Dec 15 00:46:55 2015]  [] ?
> kthread_create_on_node+0x1a0/0x1a0
> [Tue Dec 15 00:46:55 2015]  [] ret_from_fork+0x3f/0x70
> [Tue Dec 15 00:46:55 2015]  [] ?
> kthread_create_on_node+0x1a0/0x1a0
> [Tue Dec 15 00:46:55 2015] ---[ end trace 4079437668c77cbb ]---
> [Tue Dec 15 00:47:45 2015] ABORT_TASK: Found referenced iSCSI task_tag: 
> 95784927
> [Tue Dec 15 00:47:45 2015] ABORT_TASK: ref_tag: 95784927 already
> complete, skipping
> 
> If it is a Ceph File System issue, let me know and I will open a bug.
> 
> Thanks
> 
> Eric

Re: Issue with Ceph File System and LIO

2015-12-18 Thread Eric Eastman
Hi Mike,

On the ESXi server both Header Digest and Data Digest are set to Prohibited.

Eric

On Fri, Dec 18, 2015 at 2:54 PM, Mike Christie  wrote:
> Eric,
>
> Do you have iSCSI data digests on?
>
> On 12/15/2015 12:08 AM, Eric Eastman wrote:
>> I am testing Linux Target SCSI, LIO, with a Ceph File System backstore
>> and I am seeing this error on my LIO gateway.  I am using Ceph v9.2.0
>> on a 4.4rc4 Kernel, on Trusty, using a kernel mounted Ceph File
>> System.  A file on the Ceph File System is exported via iSCSI to a
>> VMware ESXi 5.0 server, and I am seeing this error when doing a lot of
>> I/O on the ESXi server.   Is this a LIO or a Ceph issue?
>>
>> [Tue Dec 15 00:46:55 2015] [ cut here ]
>> [Tue Dec 15 00:46:55 2015] WARNING: CPU: 0 PID: 1123421 at
>> /home/kernel/COD/linux/fs/ceph/addr.c:125
>> ceph_set_page_dirty+0x230/0x240 [ceph]()
>> [Tue Dec 15 00:46:55 2015] Modules linked in: iptable_filter ip_tables
>> x_tables xfs rbd iscsi_target_mod vhost_scsi tcm_qla2xxx ib_srpt
>> tcm_fc tcm_usb_gadget tcm_loop target_core_file target_core_iblock
>> target_core_pscsi target_core_user target_core_mod ipmi_devintf vhost
>> qla2xxx ib_cm ib_sa ib_mad ib_core ib_addr libfc scsi_transport_fc
>> libcomposite udc_core uio configfs ipmi_ssif ttm drm_kms_helper
>> gpio_ich drm i2c_algo_bit fb_sys_fops coretemp syscopyarea ipmi_si
>> sysfillrect ipmi_msghandler sysimgblt kvm acpi_power_meter 8250_fintek
>> irqbypass hpilo shpchp input_leds serio_raw lpc_ich i7core_edac
>> edac_core mac_hid ceph libceph libcrc32c fscache bonding lp parport
>> mlx4_en vxlan ip6_udp_tunnel udp_tunnel ptp pps_core hid_generic
>> usbhid hid hpsa mlx4_core psmouse bnx2 scsi_transport_sas fjes [last
>> unloaded: target_core_mod]
>> [Tue Dec 15 00:46:55 2015] CPU: 0 PID: 1123421 Comm: iscsi_trx
>> Tainted: GW I 4.4.0-040400rc4-generic #201512061930
>> [Tue Dec 15 00:46:55 2015] Hardware name: HP ProLiant DL360 G6, BIOS
>> P64 01/22/2015
>> [Tue Dec 15 00:46:55 2015]   fdc0ce43
>> 880bf38c38c0 813c8ab4
>> [Tue Dec 15 00:46:55 2015]   880bf38c38f8
>> 8107d772 ea00127a8680
>> [Tue Dec 15 00:46:55 2015]  8804e52c1448 8804e52c15b0
>> 8804e52c10f0 0200
>> [Tue Dec 15 00:46:55 2015] Call Trace:
>> [Tue Dec 15 00:46:55 2015]  [] dump_stack+0x44/0x60
>> [Tue Dec 15 00:46:55 2015]  [] 
>> warn_slowpath_common+0x82/0xc0
>> [Tue Dec 15 00:46:55 2015]  [] warn_slowpath_null+0x1a/0x20
>> [Tue Dec 15 00:46:55 2015]  []
>> ceph_set_page_dirty+0x230/0x240 [ceph]
>> [Tue Dec 15 00:46:55 2015]  [] ?
>> pagecache_get_page+0x150/0x1c0
>> [Tue Dec 15 00:46:55 2015]  [] ?
>> ceph_pool_perm_check+0x48/0x700 [ceph]
>> [Tue Dec 15 00:46:55 2015]  [] set_page_dirty+0x3d/0x70
>> [Tue Dec 15 00:46:55 2015]  []
>> ceph_write_end+0x5e/0x180 [ceph]
>> [Tue Dec 15 00:46:55 2015]  [] ?
>> iov_iter_copy_from_user_atomic+0x156/0x220
>> [Tue Dec 15 00:46:55 2015]  []
>> generic_perform_write+0x114/0x1c0
>> [Tue Dec 15 00:46:55 2015]  []
>> ceph_write_iter+0xf8a/0x1050 [ceph]
>> [Tue Dec 15 00:46:55 2015]  [] ?
>> ceph_put_cap_refs+0x143/0x320 [ceph]
>> [Tue Dec 15 00:46:55 2015]  [] ?
>> check_preempt_wakeup+0xfa/0x220
>> [Tue Dec 15 00:46:55 2015]  [] ? zone_statistics+0x7c/0xa0
>> [Tue Dec 15 00:46:55 2015]  [] ? 
>> copy_page_to_iter+0x5e/0xa0
>> [Tue Dec 15 00:46:55 2015]  [] ?
>> skb_copy_datagram_iter+0x122/0x250
>> [Tue Dec 15 00:46:55 2015]  [] vfs_iter_write+0x76/0xc0
>> [Tue Dec 15 00:46:55 2015]  []
>> fd_do_rw.isra.5+0xd8/0x1e0 [target_core_file]
>> [Tue Dec 15 00:46:55 2015]  []
>> fd_execute_rw+0xc5/0x2a0 [target_core_file]
>> [Tue Dec 15 00:46:55 2015]  []
>> sbc_execute_rw+0x22/0x30 [target_core_mod]
>> [Tue Dec 15 00:46:55 2015]  []
>> __target_execute_cmd+0x1f/0x70 [target_core_mod]
>> [Tue Dec 15 00:46:55 2015]  []
>> target_execute_cmd+0x195/0x2a0 [target_core_mod]
>> [Tue Dec 15 00:46:55 2015]  []
>> iscsit_execute_cmd+0x20a/0x270 [iscsi_target_mod]
>> [Tue Dec 15 00:46:55 2015]  []
>> iscsit_sequence_cmd+0xda/0x190 [iscsi_target_mod]
>> [Tue Dec 15 00:46:55 2015]  []
>> iscsi_target_rx_thread+0x51d/0xe30 [iscsi_target_mod]
>> [Tue Dec 15 00:46:55 2015]  [] ? __switch_to+0x1dc/0x5a0
>> [Tue Dec 15 00:46:55 2015]  [] ?
>> iscsi_target_tx_thread+0x1e0/0x1e0 [iscsi_target_mod]
>> [Tue Dec 15 00:46:55 2015]  [] kthread+0xd8/0xf0
>> [Tue Dec 15 00:46:55 2015]  [] ?
>> kthread_create_on_node+0x1a0/0x1a0
>> [Tue Dec 15 00:46:55 2015]  [] ret_from_fork+0x3f/0x70
>> [Tue Dec 15 00:46:55 2015]  [] ?
>> kthread_create_on_node+0x1a0/0x1a0
>> [Tue Dec 15 00:46:55 2015] ---[ end trace 4079437668c77cbb ]---
>> [Tue Dec 15 00:47:45 2015] ABORT_TASK: Found referenced iSCSI task_tag: 
>> 95784927
>> [Tue Dec 15 00:47:45 2015] ABORT_TASK: ref_tag: 95784927 already
>> complete, skipping
>>
>> If it is a Ceph File System issue, let me know and I will open a bug.
>>
>> Thanks
>>
>> Eric

Re: Issue with Ceph File System and LIO

2015-12-17 Thread Eric Eastman
I patched the 4.4rc4 kernel source and restarted the test.  Shortly
after starting it, this showed up in dmesg:

[Thu Dec 17 03:29:55 2015] WARNING: CPU: 0 PID: 2547 at
fs/ceph/addr.c:1162 ceph_write_begin+0xfb/0x120 [ceph]()
[Thu Dec 17 03:29:55 2015] Modules linked in: iscsi_target_mod
vhost_scsi tcm_qla2xxx ib_srpt tcm_fc tcm_usb_gadget tcm_loop
target_core_file target_core_iblock target_core_pscsi target_core_user
target_core_mod ipmi_devintf vhost qla2xxx ib_cm ib_sa ib_mad ib_core
ib_addr libfc scsi_transport_fc libcomposite udc_core uio configfs ttm
ipmi_ssif drm_kms_helper drm coretemp kvm gpio_ich i2c_algo_bit
i7core_edac fb_sys_fops syscopyarea edac_core sysfillrect sysimgblt
ipmi_si input_leds hpilo ipmi_msghandler shpchp acpi_power_meter
irqbypass serio_raw 8250_fintek lpc_ich mac_hid ceph bonding libceph
lp parport libcrc32c fscache mlx4_en vxlan ip6_udp_tunnel udp_tunnel
ptp pps_core hid_generic usbhid hid mlx4_core hpsa psmouse bnx2 fjes
scsi_transport_sas [last unloaded: target_core_mod]
[Thu Dec 17 03:29:55 2015] CPU: 0 PID: 2547 Comm: iscsi_trx Tainted: G
   W I 4.4.0-rc4-ede1 #1
[Thu Dec 17 03:29:55 2015] Hardware name: HP ProLiant DL360 G6, BIOS
P64 01/22/2015
[Thu Dec 17 03:29:55 2015]  c020cd47 8805f1e97958
813ad644 
[Thu Dec 17 03:29:55 2015]  8805f1e97990 81079702
8805f1e97a50 015dd000
[Thu Dec 17 03:29:55 2015]  880c034df800 0200
eab26a80 8805f1e979a0
[Thu Dec 17 03:29:55 2015] Call Trace:
[Thu Dec 17 03:29:55 2015]  [] dump_stack+0x44/0x60
[Thu Dec 17 03:29:55 2015]  [] warn_slowpath_common+0x82/0xc0
[Thu Dec 17 03:29:55 2015]  [] warn_slowpath_null+0x1a/0x20
[Thu Dec 17 03:29:55 2015]  []
ceph_write_begin+0xfb/0x120 [ceph]
[Thu Dec 17 03:29:55 2015]  []
generic_perform_write+0xbf/0x1a0
[Thu Dec 17 03:29:55 2015]  []
ceph_write_iter+0xf5c/0x1010 [ceph]
[Thu Dec 17 03:29:55 2015]  [] ? __enqueue_entity+0x6c/0x70
[Thu Dec 17 03:29:55 2015]  [] ?
iov_iter_get_pages+0x113/0x210
[Thu Dec 17 03:29:55 2015]  [] ?
skb_copy_datagram_iter+0x122/0x250
[Thu Dec 17 03:29:55 2015]  [] vfs_iter_write+0x63/0xa0
[Thu Dec 17 03:29:55 2015]  []
fd_do_rw.isra.5+0xc9/0x1b0 [target_core_file]
[Thu Dec 17 03:29:55 2015]  []
fd_execute_rw+0xc5/0x2a0 [target_core_file]
[Thu Dec 17 03:29:55 2015]  []
sbc_execute_rw+0x22/0x30 [target_core_mod]
[Thu Dec 17 03:29:55 2015]  []
__target_execute_cmd+0x1f/0x70 [target_core_mod]
[Thu Dec 17 03:29:55 2015]  []
target_execute_cmd+0x195/0x2a0 [target_core_mod]
[Thu Dec 17 03:29:55 2015]  []
iscsit_execute_cmd+0x20a/0x270 [iscsi_target_mod]
[Thu Dec 17 03:29:55 2015]  []
iscsit_sequence_cmd+0xda/0x190 [iscsi_target_mod]
[Thu Dec 17 03:29:55 2015]  []
iscsi_target_rx_thread+0x51d/0xe30 [iscsi_target_mod]
[Thu Dec 17 03:29:55 2015]  [] ? __switch_to+0x1cd/0x570
[Thu Dec 17 03:29:55 2015]  [] ?
iscsi_target_tx_thread+0x1c0/0x1c0 [iscsi_target_mod]
[Thu Dec 17 03:29:55 2015]  [] kthread+0xc9/0xe0
[Thu Dec 17 03:29:55 2015]  [] ?
kthread_create_on_node+0x180/0x180
[Thu Dec 17 03:29:55 2015]  [] ret_from_fork+0x3f/0x70
[Thu Dec 17 03:29:55 2015]  [] ?
kthread_create_on_node+0x180/0x180
[Thu Dec 17 03:29:55 2015] ---[ end trace 382a45986961da4e ]---

There are WARNINGs at both line 125 and line 1162 of fs/ceph/addr.c.
I will attach the whole set of dmesg output to tracker ticket 14086.
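
When a long dmesg capture contains many of these traces, a small
script can summarize which source lines are firing. The sketch below
is only an illustration: it assumes the "WARNING: CPU: N PID: N at
<file>:<line> <function>+..." layout seen in the traces in this
thread, tolerates the mail client's line wrapping by matching any
whitespace between fields, and uses a hypothetical function name,
tally_warnings.

```python
import re
from collections import Counter

# Captures the source file, line number, and function from a WARNING
# header. \s+ between fields lets the pattern span wrapped lines.
WARN_RE = re.compile(
    r"WARNING: CPU: \d+ PID: \d+ at\s+(\S+):(\d+)\s+(\w+)\+"
)


def tally_warnings(dmesg_text):
    """Count WARNING headers keyed by (file basename, line, function)."""
    counts = Counter()
    for path, line, func in WARN_RE.findall(dmesg_text):
        basename = path.rsplit("/", 1)[-1]
        counts[(basename, int(line), func)] += 1
    return counts
```

Keying on the file basename groups traces from kernels built in
different source trees, for example /home/kernel/COD/linux/fs/ceph/addr.c
and fs/ceph/addr.c.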

I wanted to note that file system snapshots are enabled and being used
on this file system.

Thanks
Eric

On Wed, Dec 16, 2015 at 8:15 AM, Eric Eastman
 wrote:
>>>
>> This warning is really strange. Could you try the attached debug patch.
>>
>> Regards
>> Yan, Zheng
>
> I will try the patch and get back to the list.
>
> Eric


Re: Issue with Ceph File System and LIO

2015-12-17 Thread Yan, Zheng
On Thu, Dec 17, 2015 at 4:56 PM, Eric Eastman
 wrote:
> I patched the 4.4rc4 kernel source and restarted the test.  Shortly
> after starting it, this showed up in dmesg:
>
> [Thu Dec 17 03:29:55 2015] WARNING: CPU: 0 PID: 2547 at
> fs/ceph/addr.c:1162 ceph_write_begin+0xfb/0x120 [ceph]()
> [Thu Dec 17 03:29:55 2015] Modules linked in: iscsi_target_mod
> vhost_scsi tcm_qla2xxx ib_srpt tcm_fc tcm_usb_gadget tcm_loop
> target_core_file target_core_iblock target_core_pscsi target_core_user
> target_core_mod ipmi_devintf vhost qla2xxx ib_cm ib_sa ib_mad ib_core
> ib_addr libfc scsi_transport_fc libcomposite udc_core uio configfs ttm
> ipmi_ssif drm_kms_helper drm coretemp kvm gpio_ich i2c_algo_bit
> i7core_edac fb_sys_fops syscopyarea edac_core sysfillrect sysimgblt
> ipmi_si input_leds hpilo ipmi_msghandler shpchp acpi_power_meter
> irqbypass serio_raw 8250_fintek lpc_ich mac_hid ceph bonding libceph
> lp parport libcrc32c fscache mlx4_en vxlan ip6_udp_tunnel udp_tunnel
> ptp pps_core hid_generic usbhid hid mlx4_core hpsa psmouse bnx2 fjes
> scsi_transport_sas [last unloaded: target_core_mod]
> [Thu Dec 17 03:29:55 2015] CPU: 0 PID: 2547 Comm: iscsi_trx Tainted: G
>W I 4.4.0-rc4-ede1 #1
> [Thu Dec 17 03:29:55 2015] Hardware name: HP ProLiant DL360 G6, BIOS
> P64 01/22/2015
> [Thu Dec 17 03:29:55 2015]  c020cd47 8805f1e97958
> 813ad644 
> [Thu Dec 17 03:29:55 2015]  8805f1e97990 81079702
> 8805f1e97a50 015dd000
> [Thu Dec 17 03:29:55 2015]  880c034df800 0200
> eab26a80 8805f1e979a0
> [Thu Dec 17 03:29:55 2015] Call Trace:
> [Thu Dec 17 03:29:55 2015]  [] dump_stack+0x44/0x60
> [Thu Dec 17 03:29:55 2015]  [] 
> warn_slowpath_common+0x82/0xc0
> [Thu Dec 17 03:29:55 2015]  [] warn_slowpath_null+0x1a/0x20
> [Thu Dec 17 03:29:55 2015]  []
> ceph_write_begin+0xfb/0x120 [ceph]
> [Thu Dec 17 03:29:55 2015]  []
> generic_perform_write+0xbf/0x1a0
> [Thu Dec 17 03:29:55 2015]  []
> ceph_write_iter+0xf5c/0x1010 [ceph]
> [Thu Dec 17 03:29:55 2015]  [] ? __enqueue_entity+0x6c/0x70
> [Thu Dec 17 03:29:55 2015]  [] ?
> iov_iter_get_pages+0x113/0x210
> [Thu Dec 17 03:29:55 2015]  [] ?
> skb_copy_datagram_iter+0x122/0x250
> [Thu Dec 17 03:29:55 2015]  [] vfs_iter_write+0x63/0xa0
> [Thu Dec 17 03:29:55 2015]  []
> fd_do_rw.isra.5+0xc9/0x1b0 [target_core_file]
> [Thu Dec 17 03:29:55 2015]  []
> fd_execute_rw+0xc5/0x2a0 [target_core_file]
> [Thu Dec 17 03:29:55 2015]  []
> sbc_execute_rw+0x22/0x30 [target_core_mod]
> [Thu Dec 17 03:29:55 2015]  []
> __target_execute_cmd+0x1f/0x70 [target_core_mod]
> [Thu Dec 17 03:29:55 2015]  []
> target_execute_cmd+0x195/0x2a0 [target_core_mod]
> [Thu Dec 17 03:29:55 2015]  []
> iscsit_execute_cmd+0x20a/0x270 [iscsi_target_mod]
> [Thu Dec 17 03:29:55 2015]  []
> iscsit_sequence_cmd+0xda/0x190 [iscsi_target_mod]
> [Thu Dec 17 03:29:55 2015]  []
> iscsi_target_rx_thread+0x51d/0xe30 [iscsi_target_mod]
> [Thu Dec 17 03:29:55 2015]  [] ? __switch_to+0x1cd/0x570
> [Thu Dec 17 03:29:55 2015]  [] ?
> iscsi_target_tx_thread+0x1c0/0x1c0 [iscsi_target_mod]
> [Thu Dec 17 03:29:55 2015]  [] kthread+0xc9/0xe0
> [Thu Dec 17 03:29:55 2015]  [] ?
> kthread_create_on_node+0x180/0x180
> [Thu Dec 17 03:29:55 2015]  [] ret_from_fork+0x3f/0x70
> [Thu Dec 17 03:29:55 2015]  [] ?
> kthread_create_on_node+0x180/0x180
> [Thu Dec 17 03:29:55 2015] ---[ end trace 382a45986961da4e ]---


Could you please apply the new incremental patch and try again?


Regards
Yan, Zheng


>
> There are WARNINGs at both line 125 and line 1162 of fs/ceph/addr.c.
> I will attach the whole set of dmesg output to tracker ticket 14086.
>
> I wanted to note that file system snapshots are enabled and being used
> on this file system.
>
> Thanks
> Eric
>
> On Wed, Dec 16, 2015 at 8:15 AM, Eric Eastman
>  wrote:

>>> This warning is really strange. Could you try the attached debug patch.
>>>
>>> Regards
>>> Yan, Zheng
>>
>> I will try the patch and get back to the list.
>>
>> Eric


cephfs1.patch
Description: Binary data


Re: Issue with Ceph File System and LIO

2015-12-17 Thread Minfei Huang
Hi.

It may help in tracking down this issue if we turn the debug option on.

Thanks
Minfei

On 12/17/15 at 01:56P, Eric Eastman wrote:
> I patched the 4.4rc4 kernel source and restarted the test.  Shortly
> after starting it, this showed up in dmesg:
> 
> [Thu Dec 17 03:29:55 2015] WARNING: CPU: 0 PID: 2547 at
> fs/ceph/addr.c:1162 ceph_write_begin+0xfb/0x120 [ceph]()
> [Thu Dec 17 03:29:55 2015] Modules linked in: iscsi_target_mod
> vhost_scsi tcm_qla2xxx ib_srpt tcm_fc tcm_usb_gadget tcm_loop
> target_core_file target_core_iblock target_core_pscsi target_core_user
> target_core_mod ipmi_devintf vhost qla2xxx ib_cm ib_sa ib_mad ib_core
> ib_addr libfc scsi_transport_fc libcomposite udc_core uio configfs ttm
> ipmi_ssif drm_kms_helper drm coretemp kvm gpio_ich i2c_algo_bit
> i7core_edac fb_sys_fops syscopyarea edac_core sysfillrect sysimgblt
> ipmi_si input_leds hpilo ipmi_msghandler shpchp acpi_power_meter
> irqbypass serio_raw 8250_fintek lpc_ich mac_hid ceph bonding libceph
> lp parport libcrc32c fscache mlx4_en vxlan ip6_udp_tunnel udp_tunnel
> ptp pps_core hid_generic usbhid hid mlx4_core hpsa psmouse bnx2 fjes
> scsi_transport_sas [last unloaded: target_core_mod]
> [Thu Dec 17 03:29:55 2015] CPU: 0 PID: 2547 Comm: iscsi_trx Tainted: G
>W I 4.4.0-rc4-ede1 #1
> [Thu Dec 17 03:29:55 2015] Hardware name: HP ProLiant DL360 G6, BIOS
> P64 01/22/2015
> [Thu Dec 17 03:29:55 2015]  c020cd47 8805f1e97958
> 813ad644 
> [Thu Dec 17 03:29:55 2015]  8805f1e97990 81079702
> 8805f1e97a50 015dd000
> [Thu Dec 17 03:29:55 2015]  880c034df800 0200
> eab26a80 8805f1e979a0
> [Thu Dec 17 03:29:55 2015] Call Trace:
> [Thu Dec 17 03:29:55 2015]  [] dump_stack+0x44/0x60
> [Thu Dec 17 03:29:55 2015]  [] 
> warn_slowpath_common+0x82/0xc0
> [Thu Dec 17 03:29:55 2015]  [] warn_slowpath_null+0x1a/0x20
> [Thu Dec 17 03:29:55 2015]  []
> ceph_write_begin+0xfb/0x120 [ceph]
> [Thu Dec 17 03:29:55 2015]  []
> generic_perform_write+0xbf/0x1a0
> [Thu Dec 17 03:29:55 2015]  []
> ceph_write_iter+0xf5c/0x1010 [ceph]
> [Thu Dec 17 03:29:55 2015]  [] ? __enqueue_entity+0x6c/0x70
> [Thu Dec 17 03:29:55 2015]  [] ?
> iov_iter_get_pages+0x113/0x210
> [Thu Dec 17 03:29:55 2015]  [] ?
> skb_copy_datagram_iter+0x122/0x250
> [Thu Dec 17 03:29:55 2015]  [] vfs_iter_write+0x63/0xa0
> [Thu Dec 17 03:29:55 2015]  []
> fd_do_rw.isra.5+0xc9/0x1b0 [target_core_file]
> [Thu Dec 17 03:29:55 2015]  []
> fd_execute_rw+0xc5/0x2a0 [target_core_file]
> [Thu Dec 17 03:29:55 2015]  []
> sbc_execute_rw+0x22/0x30 [target_core_mod]
> [Thu Dec 17 03:29:55 2015]  []
> __target_execute_cmd+0x1f/0x70 [target_core_mod]
> [Thu Dec 17 03:29:55 2015]  []
> target_execute_cmd+0x195/0x2a0 [target_core_mod]
> [Thu Dec 17 03:29:55 2015]  []
> iscsit_execute_cmd+0x20a/0x270 [iscsi_target_mod]
> [Thu Dec 17 03:29:55 2015]  []
> iscsit_sequence_cmd+0xda/0x190 [iscsi_target_mod]
> [Thu Dec 17 03:29:55 2015]  []
> iscsi_target_rx_thread+0x51d/0xe30 [iscsi_target_mod]
> [Thu Dec 17 03:29:55 2015]  [] ? __switch_to+0x1cd/0x570
> [Thu Dec 17 03:29:55 2015]  [] ?
> iscsi_target_tx_thread+0x1c0/0x1c0 [iscsi_target_mod]
> [Thu Dec 17 03:29:55 2015]  [] kthread+0xc9/0xe0
> [Thu Dec 17 03:29:55 2015]  [] ?
> kthread_create_on_node+0x180/0x180
> [Thu Dec 17 03:29:55 2015]  [] ret_from_fork+0x3f/0x70
> [Thu Dec 17 03:29:55 2015]  [] ?
> kthread_create_on_node+0x180/0x180
> [Thu Dec 17 03:29:55 2015] ---[ end trace 382a45986961da4e ]---
> 
> There are WARNINGs at both line 125 and line 1162 of fs/ceph/addr.c.
> I will attach the whole set of dmesg output to tracker ticket 14086.
> 
> I wanted to note that file system snapshots are enabled and being used
> on this file system.
> 
> Thanks
> Eric
> 
> On Wed, Dec 16, 2015 at 8:15 AM, Eric Eastman
>  wrote:
> >>>
> >> This warning is really strange. Could you try the attached debug patch.
> >>
> >> Regards
> >> Yan, Zheng
> >
> > I will try the patch and get back to the list.
> >
> > Eric


Re: Issue with Ceph File System and LIO

2015-12-17 Thread Eric Eastman
With cephfs.patch and cephfs1.patch applied, I am now seeing:

[Thu Dec 17 14:27:59 2015] [ cut here ]
[Thu Dec 17 14:27:59 2015] WARNING: CPU: 0 PID: 3036 at
fs/ceph/addr.c:1171 ceph_write_begin+0xfb/0x120 [ceph]()
[Thu Dec 17 14:27:59 2015] Modules linked in: iscsi_target_mod
vhost_scsi tcm_qla2xxx ib_srpt tcm_fc tcm_usb_gadget tcm_loop
target_core_file target_core_iblock target_core_pscsi target_core_user
target_core_mod ipmi_devintf vhost qla2xxx ib_cm ib_sa ib_mad ib_core
ib_addr libfc scsi_transport_fc libcomposite udc_core uio configfs ttm
drm_kms_helper drm ipmi_ssif coretemp gpio_ich i2c_algo_bit kvm
fb_sys_fops syscopyarea sysfillrect sysimgblt shpchp input_leds ceph
irqbypass i7core_edac serio_raw hpilo edac_core ipmi_si
ipmi_msghandler 8250_fintek lpc_ich acpi_power_meter libceph mac_hid
libcrc32c fscache bonding lp parport mlx4_en vxlan ip6_udp_tunnel
udp_tunnel ptp pps_core hid_generic usbhid hid mlx4_core hpsa psmouse
bnx2 fjes scsi_transport_sas [last unloaded: target_core_mod]
[Thu Dec 17 14:27:59 2015] CPU: 0 PID: 3036 Comm: iscsi_trx Tainted: G
   W I 4.4.0-rc4-ede2 #1
[Thu Dec 17 14:27:59 2015] Hardware name: HP ProLiant DL360 G6, BIOS
P64 01/22/2015
[Thu Dec 17 14:27:59 2015]  c02b2e37 880c0289b958
813ad644 
[Thu Dec 17 14:27:59 2015]  880c0289b990 81079702
880c0289ba50 000846c21000
[Thu Dec 17 14:27:59 2015]  880c009ea200 1000
ea00122ed700 880c0289b9a0
[Thu Dec 17 14:27:59 2015] Call Trace:
[Thu Dec 17 14:27:59 2015]  [] dump_stack+0x44/0x60
[Thu Dec 17 14:27:59 2015]  [] warn_slowpath_common+0x82/0xc0
[Thu Dec 17 14:27:59 2015]  [] warn_slowpath_null+0x1a/0x20
[Thu Dec 17 14:27:59 2015]  []
ceph_write_begin+0xfb/0x120 [ceph]
[Thu Dec 17 14:27:59 2015]  []
generic_perform_write+0xbf/0x1a0
[Thu Dec 17 14:27:59 2015]  []
ceph_write_iter+0xf5c/0x1010 [ceph]
[Thu Dec 17 14:27:59 2015]  [] ? __schedule+0x386/0x9c0
[Thu Dec 17 14:27:59 2015]  [] ? schedule+0x35/0x80
[Thu Dec 17 14:27:59 2015]  [] ? __slab_free+0xb5/0x290
[Thu Dec 17 14:27:59 2015]  [] ?
iov_iter_get_pages+0x113/0x210
[Thu Dec 17 14:27:59 2015]  [] vfs_iter_write+0x63/0xa0
[Thu Dec 17 14:27:59 2015]  []
fd_do_rw.isra.5+0xc9/0x1b0 [target_core_file]
[Thu Dec 17 14:27:59 2015]  []
fd_execute_rw+0xc5/0x2a0 [target_core_file]
[Thu Dec 17 14:27:59 2015]  []
sbc_execute_rw+0x22/0x30 [target_core_mod]
[Thu Dec 17 14:27:59 2015]  []
__target_execute_cmd+0x1f/0x70 [target_core_mod]
[Thu Dec 17 14:27:59 2015]  []
target_execute_cmd+0x195/0x2a0 [target_core_mod]
[Thu Dec 17 14:27:59 2015]  []
iscsit_execute_cmd+0x20a/0x270 [iscsi_target_mod]
[Thu Dec 17 14:27:59 2015]  []
iscsit_sequence_cmd+0xda/0x190 [iscsi_target_mod]
[Thu Dec 17 14:27:59 2015]  []
iscsi_target_rx_thread+0x51d/0xe30 [iscsi_target_mod]
[Thu Dec 17 14:27:59 2015]  [] ? __switch_to+0x1cd/0x570
[Thu Dec 17 14:27:59 2015]  [] ?
iscsi_target_tx_thread+0x1c0/0x1c0 [iscsi_target_mod]
[Thu Dec 17 14:27:59 2015]  [] kthread+0xc9/0xe0
[Thu Dec 17 14:27:59 2015]  [] ?
kthread_create_on_node+0x180/0x180
[Thu Dec 17 14:27:59 2015]  [] ret_from_fork+0x3f/0x70
[Thu Dec 17 14:27:59 2015]  [] ?
kthread_create_on_node+0x180/0x180
[Thu Dec 17 14:27:59 2015] ---[ end trace 8346192e3f29ed5d ]---

Each WARNING on line 1171 is followed by a WARNING on line 125.
The dmesg output is attached to tracker ticket 14086.
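
For anyone wanting to check this pairing in a saved dmesg capture, here is a
minimal Python sketch. It assumes the log has been saved as plain text. The
helper names (warning_sites, each_followed_by) are made up for illustration
and are not part of any Ceph or kernel tooling, and the second entry in the
sample text is invented to show the pattern, not copied from the real log.

```python
import re
from collections import Counter

# Header printed by the kernel for WARN_ON(): "WARNING: CPU: N PID: N at <file>:<line>".
# \s+ after "at" tolerates a mail client wrapping the path onto the next line.
WARN_RE = re.compile(r"WARNING: CPU: \d+ PID: \d+ at\s+(\S+):(\d+)")


def warning_sites(dmesg_text):
    """Return (source_file, line_number) for each WARNING header, in log order."""
    return [(m.group(1), int(m.group(2))) for m in WARN_RE.finditer(dmesg_text)]


def each_followed_by(sites, first_line, second_line):
    """True if every WARNING at first_line is immediately followed by one at second_line."""
    lines = [line for _path, line in sites]
    return all(
        i + 1 < len(lines) and lines[i + 1] == second_line
        for i, line in enumerate(lines)
        if line == first_line
    )


# Illustrative input only: the second WARNING below is a hypothetical example.
sample = """\
[Thu Dec 17 14:27:59 2015] WARNING: CPU: 0 PID: 3036 at
fs/ceph/addr.c:1171 ceph_write_begin+0xfb/0x120 [ceph]()
[Thu Dec 17 14:27:59 2015] WARNING: CPU: 0 PID: 3036 at
fs/ceph/addr.c:125 ceph_set_page_dirty+0x230/0x240 [ceph]()
"""

sites = warning_sites(sample)
print(Counter(sites))                      # how many times each site fired
print(each_followed_by(sites, 1171, 125))  # is the 1171 -> 125 pairing consistent?
```

The same Counter also gives a quick per-site tally when estimating how often
a warning fires over a test run.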

Regards,
Eric

On Thu, Dec 17, 2015 at 2:38 AM, Yan, Zheng  wrote:
> On Thu, Dec 17, 2015 at 4:56 PM, Eric Eastman
>  wrote:
>> I patched the 4.4rc4 kernel source and restarted the test.  Shortly
>> after starting it, this showed up in dmesg:
>>
>> [Thu Dec 17 03:29:55 2015] WARNING: CPU: 0 PID: 2547 at
>> fs/ceph/addr.c:1162 ceph_write_begin+0xfb/0x120 [ceph]()
>> [Thu Dec 17 03:29:55 2015] Modules linked in: iscsi_target_mod
>> vhost_scsi tcm_qla2xxx ib_srpt tcm_fc tcm_usb_gadget tcm_loop
>> target_core_file target_core_iblock target_core_pscsi target_core_user
>> target_core_mod ipmi_devintf vhost qla2xxx ib_cm ib_sa ib_mad ib_core
>> ib_addr libfc scsi_transport_fc libcomposite udc_core uio configfs ttm
>> ipmi_ssif drm_kms_helper drm coretemp kvm gpio_ich i2c_algo_bit
>> i7core_edac fb_sys_fops syscopyarea edac_core sysfillrect sysimgblt
>> ipmi_si input_leds hpilo ipmi_msghandler shpchp acpi_power_meter
>> irqbypass serio_raw 8250_fintek lpc_ich mac_hid ceph bonding libceph
>> lp parport libcrc32c fscache mlx4_en vxlan ip6_udp_tunnel udp_tunnel
>> ptp pps_core hid_generic usbhid hid mlx4_core hpsa psmouse bnx2 fjes
>> scsi_transport_sas [last unloaded: target_core_mod]
>> [Thu Dec 17 03:29:55 2015] CPU: 0 PID: 2547 Comm: iscsi_trx Tainted: G
>>W I 4.4.0-rc4-ede1 #1
>> [Thu Dec 17 03:29:55 2015] Hardware name: HP ProLiant DL360 G6, BIOS
>> P64 01/22/2015
>> [Thu Dec 17 03:29:55 2015]  c020cd47 8805f1e97958
>> 813ad644 
>> [Thu Dec 17 03:29:55 2015]  

Re: Issue with Ceph File System and LIO

2015-12-17 Thread Yan, Zheng
On Fri, Dec 18, 2015 at 2:23 PM, Eric Eastman
<eric.east...@keepertech.com> wrote:
>> Hi Yan Zheng, Eric Eastman
>>
>> Similar bug was reported in f2fs, btrfs, it does affect 4.4-rc4, the fixing
>> patch was merged into 4.4-rc5, dfd01f026058 ("sched/wait: Fix the signal
>> handling fix").
>>
>> Related report & discussion was here:
>> https://lkml.org/lkml/2015/12/12/149
>>
>> I'm not sure the current reported issue of ceph was related to that though,
>> but at least try testing with an upgraded or patched kernel could verify it.
>> :)
>>
>> Thanks,
>>
>>> -Original Message-
>>> From: ceph-devel-ow...@vger.kernel.org 
>>> [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of
>>> Yan, Zheng
>>> Sent: Friday, December 18, 2015 12:05 PM
>>> To: Eric Eastman
>>> Cc: Ceph Development
>>> Subject: Re: Issue with Ceph File System and LIO
>>>
>>> On Fri, Dec 18, 2015 at 3:49 AM, Eric Eastman
>>> <eric.east...@keepertech.com> wrote:
>>> > With cephfs.patch and cephfs1.patch applied, I am now seeing:
>>> >
>>> > [Thu Dec 17 14:27:59 2015] [ cut here ]
>>> > [Thu Dec 17 14:27:59 2015] WARNING: CPU: 0 PID: 3036 at
>>> > fs/ceph/addr.c:1171 ceph_write_begin+0xfb/0x120 [ceph]()
>>> > [Thu Dec 17 14:27:59 2015] Modules linked in: iscsi_target_mod
> ...
>>> >
>>>
>>> The page gets unlocked mysteriously. I still haven't found any clue. Could
>>> you please try the new patch (a full patch, not an incremental one)? Also,
>>> please enable CONFIG_DEBUG_VM when compiling the kernel.
>>>
>>> Thank you very much
>>> Yan, Zheng
>>
> I have just installed the cephfs_new.patch and have set
> CONFIG_DEBUG_VM=y on a new 4.4rc4 kernel and restarted the ESXi iSCSI
> test to my Ceph File System gateway.  I plan to let it run overnight
> and report the status tomorrow.
>
> Let me know if I should move on to 4.4rc5 with or without patches and
> with or without  CONFIG_DEBUG_VM=y
>

Please try the rc5 kernel without patches and with DEBUG_VM=y.

Regards
Yan, Zheng


> Looking at the network traffic stats on my iSCSI gateway, with
> CONFIG_DEBUG_VM=y, throughput seems to be down by a factor of at least
> 10 compared to my last test without setting CONFIG_DEBUG_VM=y
>
> Regards,
> Eric


Re: Issue with Ceph File System and LIO

2015-12-17 Thread Eric Eastman
> Hi Yan Zheng, Eric Eastman
>
> Similar bug was reported in f2fs, btrfs, it does affect 4.4-rc4, the fixing
> patch was merged into 4.4-rc5, dfd01f026058 ("sched/wait: Fix the signal
> handling fix").
>
> Related report & discussion was here:
> https://lkml.org/lkml/2015/12/12/149
>
> I'm not sure the current reported issue of ceph was related to that though,
> but at least try testing with an upgraded or patched kernel could verify it.
> :)
>
> Thanks,
>
>> -Original Message-
>> From: ceph-devel-ow...@vger.kernel.org 
>> [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of
>> Yan, Zheng
>> Sent: Friday, December 18, 2015 12:05 PM
>> To: Eric Eastman
>> Cc: Ceph Development
>> Subject: Re: Issue with Ceph File System and LIO
>>
>> On Fri, Dec 18, 2015 at 3:49 AM, Eric Eastman
>> <eric.east...@keepertech.com> wrote:
>> > With cephfs.patch and cephfs1.patch applied, I am now seeing:
>> >
>> > [Thu Dec 17 14:27:59 2015] [ cut here ]
>> > [Thu Dec 17 14:27:59 2015] WARNING: CPU: 0 PID: 3036 at
>> > fs/ceph/addr.c:1171 ceph_write_begin+0xfb/0x120 [ceph]()
>> > [Thu Dec 17 14:27:59 2015] Modules linked in: iscsi_target_mod
...
>> >
>>
>> The page gets unlocked mysteriously. I still haven't found any clue. Could
>> you please try the new patch (a full patch, not an incremental one)? Also,
>> please enable CONFIG_DEBUG_VM when compiling the kernel.
>>
>> Thank you very much
>> Yan, Zheng
>
I have just installed the cephfs_new.patch and have set
CONFIG_DEBUG_VM=y on a new 4.4rc4 kernel and restarted the ESXi iSCSI
test to my Ceph File System gateway.  I plan to let it run overnight
and report the status tomorrow.

Let me know if I should move on to 4.4rc5 with or without patches and
with or without  CONFIG_DEBUG_VM=y

Looking at the network traffic stats on my iSCSI gateway, with
CONFIG_DEBUG_VM=y, throughput seems to be down by a factor of at least
10 compared to my last test without setting CONFIG_DEBUG_VM=y

Regards,
Eric


Re: Issue with Ceph File System and LIO

2015-12-17 Thread Yan, Zheng
On Fri, Dec 18, 2015 at 3:49 AM, Eric Eastman
 wrote:
> With cephfs.patch and cephfs1.patch applied, I am now seeing:
>
> [Thu Dec 17 14:27:59 2015] [ cut here ]
> [Thu Dec 17 14:27:59 2015] WARNING: CPU: 0 PID: 3036 at
> fs/ceph/addr.c:1171 ceph_write_begin+0xfb/0x120 [ceph]()
> [Thu Dec 17 14:27:59 2015] Modules linked in: iscsi_target_mod
> vhost_scsi tcm_qla2xxx ib_srpt tcm_fc tcm_usb_gadget tcm_loop
> target_core_file target_core_iblock target_core_pscsi target_core_user
> target_core_mod ipmi_devintf vhost qla2xxx ib_cm ib_sa ib_mad ib_core
> ib_addr libfc scsi_transport_fc libcomposite udc_core uio configfs ttm
> drm_kms_helper drm ipmi_ssif coretemp gpio_ich i2c_algo_bit kvm
> fb_sys_fops syscopyarea sysfillrect sysimgblt shpchp input_leds ceph
> irqbypass i7core_edac serio_raw hpilo edac_core ipmi_si
> ipmi_msghandler 8250_fintek lpc_ich acpi_power_meter libceph mac_hid
> libcrc32c fscache bonding lp parport mlx4_en vxlan ip6_udp_tunnel
> udp_tunnel ptp pps_core hid_generic usbhid hid mlx4_core hpsa psmouse
> bnx2 fjes scsi_transport_sas [last unloaded: target_core_mod]
> [Thu Dec 17 14:27:59 2015] CPU: 0 PID: 3036 Comm: iscsi_trx Tainted: G
>W I 4.4.0-rc4-ede2 #1
> [Thu Dec 17 14:27:59 2015] Hardware name: HP ProLiant DL360 G6, BIOS
> P64 01/22/2015
> [Thu Dec 17 14:27:59 2015]  c02b2e37 880c0289b958
> 813ad644 
> [Thu Dec 17 14:27:59 2015]  880c0289b990 81079702
> 880c0289ba50 000846c21000
> [Thu Dec 17 14:27:59 2015]  880c009ea200 1000
> ea00122ed700 880c0289b9a0
> [Thu Dec 17 14:27:59 2015] Call Trace:
> [Thu Dec 17 14:27:59 2015]  [] dump_stack+0x44/0x60
> [Thu Dec 17 14:27:59 2015]  [] 
> warn_slowpath_common+0x82/0xc0
> [Thu Dec 17 14:27:59 2015]  [] warn_slowpath_null+0x1a/0x20
> [Thu Dec 17 14:27:59 2015]  []
> ceph_write_begin+0xfb/0x120 [ceph]
> [Thu Dec 17 14:27:59 2015]  []
> generic_perform_write+0xbf/0x1a0
> [Thu Dec 17 14:27:59 2015]  []
> ceph_write_iter+0xf5c/0x1010 [ceph]
> [Thu Dec 17 14:27:59 2015]  [] ? __schedule+0x386/0x9c0
> [Thu Dec 17 14:27:59 2015]  [] ? schedule+0x35/0x80
> [Thu Dec 17 14:27:59 2015]  [] ? __slab_free+0xb5/0x290
> [Thu Dec 17 14:27:59 2015]  [] ?
> iov_iter_get_pages+0x113/0x210
> [Thu Dec 17 14:27:59 2015]  [] vfs_iter_write+0x63/0xa0
> [Thu Dec 17 14:27:59 2015]  []
> fd_do_rw.isra.5+0xc9/0x1b0 [target_core_file]
> [Thu Dec 17 14:27:59 2015]  []
> fd_execute_rw+0xc5/0x2a0 [target_core_file]
> [Thu Dec 17 14:27:59 2015]  []
> sbc_execute_rw+0x22/0x30 [target_core_mod]
> [Thu Dec 17 14:27:59 2015]  []
> __target_execute_cmd+0x1f/0x70 [target_core_mod]
> [Thu Dec 17 14:27:59 2015]  []
> target_execute_cmd+0x195/0x2a0 [target_core_mod]
> [Thu Dec 17 14:27:59 2015]  []
> iscsit_execute_cmd+0x20a/0x270 [iscsi_target_mod]
> [Thu Dec 17 14:27:59 2015]  []
> iscsit_sequence_cmd+0xda/0x190 [iscsi_target_mod]
> [Thu Dec 17 14:27:59 2015]  []
> iscsi_target_rx_thread+0x51d/0xe30 [iscsi_target_mod]
> [Thu Dec 17 14:27:59 2015]  [] ? __switch_to+0x1cd/0x570
> [Thu Dec 17 14:27:59 2015]  [] ?
> iscsi_target_tx_thread+0x1c0/0x1c0 [iscsi_target_mod]
> [Thu Dec 17 14:27:59 2015]  [] kthread+0xc9/0xe0
> [Thu Dec 17 14:27:59 2015]  [] ?
> kthread_create_on_node+0x180/0x180
> [Thu Dec 17 14:27:59 2015]  [] ret_from_fork+0x3f/0x70
> [Thu Dec 17 14:27:59 2015]  [] ?
> kthread_create_on_node+0x180/0x180
> [Thu Dec 17 14:27:59 2015] ---[ end trace 8346192e3f29ed5d ]---
>

The page gets unlocked mysteriously. I still haven't found any clue. Could
you please try the new patch (a full patch, not an incremental one)? Also,
please enable CONFIG_DEBUG_VM when compiling the kernel.
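
Before starting a long test run it can help to confirm that the kernel was
really built with the option. The following is a minimal Python sketch that
looks an option up in kernel build-config text. The function name
kernel_config_value and the sample config text are illustrative assumptions,
not taken from this thread.

```python
import re


def kernel_config_value(config_text, option):
    """Return the value assigned to `option` (e.g. "y" or "m"), or None if it is not set."""
    # Anchoring on "^OPTION=" avoids matching longer names that share the prefix,
    # and skips the "# OPTION is not set" comment form.
    pattern = re.compile(r"^%s=(.*)$" % re.escape(option), re.MULTILINE)
    match = pattern.search(config_text)
    return match.group(1) if match else None


# Illustrative config fragment, not copied from the test machine.
sample_config = """\
# CONFIG_DEBUG_VM_VMACACHE is not set
CONFIG_DEBUG_VM=y
# CONFIG_DEBUG_VM_RB is not set
"""

print(kernel_config_value(sample_config, "CONFIG_DEBUG_VM"))
print(kernel_config_value(sample_config, "CONFIG_DEBUG_VM_RB"))
```

On distributions that install the build config next to the kernel image, the
text would typically come from a file such as /boot/config-<release>; that
location is an assumption about the distribution, so adjust it as needed.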

Thank you very much
Yan, Zheng


cephfs_new.patch
Description: Binary data


Re: Issue with Ceph File System and LIO

2015-12-16 Thread Yan, Zheng
On Wed, Dec 16, 2015 at 12:51 AM, Eric Eastman
 wrote:
> I have opened ticket: 14086
>
> On Tue, Dec 15, 2015 at 5:05 AM, Yan, Zheng  wrote:
>> On Tue, Dec 15, 2015 at 2:08 PM, Eric Eastman
>>> [Tue Dec 15 00:46:55 2015] [ cut here ]
>>> [Tue Dec 15 00:46:55 2015] WARNING: CPU: 0 PID: 1123421 at
>>> /home/kernel/COD/linux/fs/ceph/addr.c:125
>>
>> Could you confirm that addr.c:125 is WARN_ON(!PageLocked(page))?
>
> I am using the generic kernel from:
> http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.4-rc4-wily
> Assuming they did not change anything, the 4.4rc4 source tree
> I pulled down shows:
>
> 124 ret = __set_page_dirty_nobuffers(page);
> 125 WARN_ON(!PageLocked(page));
> 126 WARN_ON(!page->mapping);
>
>
> modinfo ceph
> filename:   /lib/modules/4.4.0-040400rc4-generic/kernel/fs/ceph/ceph.ko
> license:GPL
> description:Ceph filesystem for Linux
> author: Patience Warnick 
> author: Yehuda Sadeh 
> author: Sage Weil 
> alias:  fs-ceph
> srcversion: E94BA78C2D998705FE2C600
> depends:libceph,fscache
> intree: Y
> vermagic:   4.4.0-040400rc4-generic SMP mod_unload modversions
>
> This error has shown up about 20 times in 12 hours, since I started
> the ESXi test.
>

This warning is really strange. Could you try the attached debug patch?

Regards
Yan, Zheng


cephfs.patch
Description: Binary data


Re: Issue with Ceph File System and LIO

2015-12-16 Thread Eric Eastman
>>
> This warning is really strange. Could you try the attached debug patch?
>
> Regards
> Yan, Zheng

I will try the patch and get back to the list.

Eric


Re: Issue with Ceph File System and LIO

2015-12-15 Thread Eric Eastman
I have opened ticket: 14086

On Tue, Dec 15, 2015 at 5:05 AM, Yan, Zheng  wrote:
> On Tue, Dec 15, 2015 at 2:08 PM, Eric Eastman
>> [Tue Dec 15 00:46:55 2015] [ cut here ]
>> [Tue Dec 15 00:46:55 2015] WARNING: CPU: 0 PID: 1123421 at
>> /home/kernel/COD/linux/fs/ceph/addr.c:125
>
> Could you confirm that addr.c:125 is WARN_ON(!PageLocked(page))?

I am using the generic kernel from:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.4-rc4-wily
Assuming they did not change anything, the 4.4rc4 source tree
I pulled down shows:

124 ret = __set_page_dirty_nobuffers(page);
125 WARN_ON(!PageLocked(page));
126 WARN_ON(!page->mapping);
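
A numbered excerpt like the one above can be pulled from any source file with
a few lines of Python. This is only a sketch; the helper name numbered_lines
is made up for illustration, and the demo input is synthetic rather than real
kernel source.

```python
from itertools import islice


def numbered_lines(lines, first, last):
    """Return 'N text' strings for lines first..last (1-indexed, inclusive)."""
    # islice is 0-indexed and excludes its stop value, hence first - 1 and last.
    window = islice(lines, first - 1, last)
    return ["%d %s" % (n, text.rstrip("\n")) for n, text in enumerate(window, start=first)]


# Synthetic demo input standing in for a source file.
demo = ["line %d\n" % i for i in range(1, 11)]
for entry in numbered_lines(demo, 4, 6):
    print(entry)
```

Because an open file iterates line by line, passing a file object opened on
fs/ceph/addr.c (a path relative to wherever the kernel tree was unpacked,
which is an assumption here) with the range 124 to 126 would give the same
kind of excerpt.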


modinfo ceph
filename:   /lib/modules/4.4.0-040400rc4-generic/kernel/fs/ceph/ceph.ko
license:GPL
description:Ceph filesystem for Linux
author: Patience Warnick 
author: Yehuda Sadeh 
author: Sage Weil 
alias:  fs-ceph
srcversion: E94BA78C2D998705FE2C600
depends:libceph,fscache
intree: Y
vermagic:   4.4.0-040400rc4-generic SMP mod_unload modversions

This error has shown up about 20 times in 12 hours, since I started
the ESXi test.

Thanks,
Eric


Re: Issue with Ceph File System and LIO

2015-12-15 Thread Mike Christie
On 12/15/2015 12:08 AM, Eric Eastman wrote:
> I am testing Linux Target SCSI, LIO, with a Ceph File System backstore
> and I am seeing this error on my LIO gateway.  I am using Ceph v9.2.0
> on a 4.4rc4 Kernel, on Trusty, using a kernel mounted Ceph File
> System.  A file on the Ceph File System is exported via iSCSI to a
> VMware ESXi 5.0 server, and I am seeing this error when doing a lot of
> I/O on the ESXi server.   Is this a LIO or a Ceph issue?
> 
> [Tue Dec 15 00:46:55 2015] [ cut here ]
> [Tue Dec 15 00:46:55 2015] WARNING: CPU: 0 PID: 1123421 at
> /home/kernel/COD/linux/fs/ceph/addr.c:125
> ceph_set_page_dirty+0x230/0x240 [ceph]()
> [Tue Dec 15 00:46:55 2015] Modules linked in: iptable_filter ip_tables
> x_tables xfs rbd iscsi_target_mod vhost_scsi tcm_qla2xxx ib_srpt
> tcm_fc tcm_usb_gadget tcm_loop target_core_file target_core_iblock
> target_core_pscsi target_core_user target_core_mod ipmi_devintf vhost
> qla2xxx ib_cm ib_sa ib_mad ib_core ib_addr libfc scsi_transport_fc
> libcomposite udc_core uio configfs ipmi_ssif ttm drm_kms_helper
> gpio_ich drm i2c_algo_bit fb_sys_fops coretemp syscopyarea ipmi_si
> sysfillrect ipmi_msghandler sysimgblt kvm acpi_power_meter 8250_fintek
> irqbypass hpilo shpchp input_leds serio_raw lpc_ich i7core_edac
> edac_core mac_hid ceph libceph libcrc32c fscache bonding lp parport
> mlx4_en vxlan ip6_udp_tunnel udp_tunnel ptp pps_core hid_generic
> usbhid hid hpsa mlx4_core psmouse bnx2 scsi_transport_sas fjes [last
> unloaded: target_core_mod]
> [Tue Dec 15 00:46:55 2015] CPU: 0 PID: 1123421 Comm: iscsi_trx
> Tainted: GW I 4.4.0-040400rc4-generic #201512061930
> [Tue Dec 15 00:46:55 2015] Hardware name: HP ProLiant DL360 G6, BIOS
> P64 01/22/2015
> [Tue Dec 15 00:46:55 2015]   fdc0ce43
> 880bf38c38c0 813c8ab4
> [Tue Dec 15 00:46:55 2015]   880bf38c38f8
> 8107d772 ea00127a8680
> [Tue Dec 15 00:46:55 2015]  8804e52c1448 8804e52c15b0
> 8804e52c10f0 0200
> [Tue Dec 15 00:46:55 2015] Call Trace:
> [Tue Dec 15 00:46:55 2015]  [] dump_stack+0x44/0x60
> [Tue Dec 15 00:46:55 2015]  [] 
> warn_slowpath_common+0x82/0xc0
> [Tue Dec 15 00:46:55 2015]  [] warn_slowpath_null+0x1a/0x20
> [Tue Dec 15 00:46:55 2015]  []
> ceph_set_page_dirty+0x230/0x240 [ceph]
> [Tue Dec 15 00:46:55 2015]  [] ?
> pagecache_get_page+0x150/0x1c0
> [Tue Dec 15 00:46:55 2015]  [] ?
> ceph_pool_perm_check+0x48/0x700 [ceph]
> [Tue Dec 15 00:46:55 2015]  [] set_page_dirty+0x3d/0x70
> [Tue Dec 15 00:46:55 2015]  []
> ceph_write_end+0x5e/0x180 [ceph]
> [Tue Dec 15 00:46:55 2015]  [] ?
> iov_iter_copy_from_user_atomic+0x156/0x220
> [Tue Dec 15 00:46:55 2015]  []
> generic_perform_write+0x114/0x1c0
> [Tue Dec 15 00:46:55 2015]  []
> ceph_write_iter+0xf8a/0x1050 [ceph]
> [Tue Dec 15 00:46:55 2015]  [] ?
> ceph_put_cap_refs+0x143/0x320 [ceph]
> [Tue Dec 15 00:46:55 2015]  [] ?
> check_preempt_wakeup+0xfa/0x220
> [Tue Dec 15 00:46:55 2015]  [] ? zone_statistics+0x7c/0xa0
> [Tue Dec 15 00:46:55 2015]  [] ? copy_page_to_iter+0x5e/0xa0
> [Tue Dec 15 00:46:55 2015]  [] ?
> skb_copy_datagram_iter+0x122/0x250
> [Tue Dec 15 00:46:55 2015]  [] vfs_iter_write+0x76/0xc0
> [Tue Dec 15 00:46:55 2015]  []
> fd_do_rw.isra.5+0xd8/0x1e0 [target_core_file]
> [Tue Dec 15 00:46:55 2015]  []
> fd_execute_rw+0xc5/0x2a0 [target_core_file]
> [Tue Dec 15 00:46:55 2015]  []
> sbc_execute_rw+0x22/0x30 [target_core_mod]
> [Tue Dec 15 00:46:55 2015]  []
> __target_execute_cmd+0x1f/0x70 [target_core_mod]
> [Tue Dec 15 00:46:55 2015]  []
> target_execute_cmd+0x195/0x2a0 [target_core_mod]
> [Tue Dec 15 00:46:55 2015]  []
> iscsit_execute_cmd+0x20a/0x270 [iscsi_target_mod]
> [Tue Dec 15 00:46:55 2015]  []
> iscsit_sequence_cmd+0xda/0x190 [iscsi_target_mod]
> [Tue Dec 15 00:46:55 2015]  []
> iscsi_target_rx_thread+0x51d/0xe30 [iscsi_target_mod]
> [Tue Dec 15 00:46:55 2015]  [] ? __switch_to+0x1dc/0x5a0
> [Tue Dec 15 00:46:55 2015]  [] ?
> iscsi_target_tx_thread+0x1e0/0x1e0 [iscsi_target_mod]
> [Tue Dec 15 00:46:55 2015]  [] kthread+0xd8/0xf0
> [Tue Dec 15 00:46:55 2015]  [] ?
> kthread_create_on_node+0x1a0/0x1a0
> [Tue Dec 15 00:46:55 2015]  [] ret_from_fork+0x3f/0x70
> [Tue Dec 15 00:46:55 2015]  [] ?
> kthread_create_on_node+0x1a0/0x1a0
> [Tue Dec 15 00:46:55 2015] ---[ end trace 4079437668c77cbb ]---
> [Tue Dec 15 00:47:45 2015] ABORT_TASK: Found referenced iSCSI task_tag: 
> 95784927
> [Tue Dec 15 00:47:45 2015] ABORT_TASK: ref_tag: 95784927 already
> complete, skipping
> 


For writes, LIO just allocates pages using GFP_KERNEL, passes them to
sock_recvmsg to read the data into them, then passes them to the fs
using the function you see above, vfs_iter_write. So it does not do
anything fancy.

Do we need to send specific types of pages to ceph?


Re: Issue with Ceph File System and LIO

2015-12-15 Thread John Spray
On Tue, Dec 15, 2015 at 9:26 AM, Mike Christie  wrote:
> On 12/15/2015 12:08 AM, Eric Eastman wrote:
>> I am testing Linux Target SCSI, LIO, with a Ceph File System backstore
>> and I am seeing this error on my LIO gateway.  I am using Ceph v9.2.0
>> on a 4.4rc4 Kernel, on Trusty, using a kernel mounted Ceph File
>> System.  A file on the Ceph File System is exported via iSCSI to a
>> VMware ESXi 5.0 server, and I am seeing this error when doing a lot of
>> I/O on the ESXi server.   Is this a LIO or a Ceph issue?
>>
>> [Tue Dec 15 00:46:55 2015] [ cut here ]
>> [Tue Dec 15 00:46:55 2015] WARNING: CPU: 0 PID: 1123421 at
>> /home/kernel/COD/linux/fs/ceph/addr.c:125
>> ceph_set_page_dirty+0x230/0x240 [ceph]()
>> [Tue Dec 15 00:46:55 2015] Modules linked in: iptable_filter ip_tables
>> x_tables xfs rbd iscsi_target_mod vhost_scsi tcm_qla2xxx ib_srpt
>> tcm_fc tcm_usb_gadget tcm_loop target_core_file target_core_iblock
>> target_core_pscsi target_core_user target_core_mod ipmi_devintf vhost
>> qla2xxx ib_cm ib_sa ib_mad ib_core ib_addr libfc scsi_transport_fc
>> libcomposite udc_core uio configfs ipmi_ssif ttm drm_kms_helper
>> gpio_ich drm i2c_algo_bit fb_sys_fops coretemp syscopyarea ipmi_si
>> sysfillrect ipmi_msghandler sysimgblt kvm acpi_power_meter 8250_fintek
>> irqbypass hpilo shpchp input_leds serio_raw lpc_ich i7core_edac
>> edac_core mac_hid ceph libceph libcrc32c fscache bonding lp parport
>> mlx4_en vxlan ip6_udp_tunnel udp_tunnel ptp pps_core hid_generic
>> usbhid hid hpsa mlx4_core psmouse bnx2 scsi_transport_sas fjes [last
>> unloaded: target_core_mod]
>> [Tue Dec 15 00:46:55 2015] CPU: 0 PID: 1123421 Comm: iscsi_trx
>> Tainted: GW I 4.4.0-040400rc4-generic #201512061930
>> [Tue Dec 15 00:46:55 2015] Hardware name: HP ProLiant DL360 G6, BIOS
>> P64 01/22/2015
>> [Tue Dec 15 00:46:55 2015]   fdc0ce43
>> 880bf38c38c0 813c8ab4
>> [Tue Dec 15 00:46:55 2015]   880bf38c38f8
>> 8107d772 ea00127a8680
>> [Tue Dec 15 00:46:55 2015]  8804e52c1448 8804e52c15b0
>> 8804e52c10f0 0200
>> [Tue Dec 15 00:46:55 2015] Call Trace:
>> [Tue Dec 15 00:46:55 2015]  [] dump_stack+0x44/0x60
>> [Tue Dec 15 00:46:55 2015]  [] 
>> warn_slowpath_common+0x82/0xc0
>> [Tue Dec 15 00:46:55 2015]  [] warn_slowpath_null+0x1a/0x20
>> [Tue Dec 15 00:46:55 2015]  []
>> ceph_set_page_dirty+0x230/0x240 [ceph]
>> [Tue Dec 15 00:46:55 2015]  [] ?
>> pagecache_get_page+0x150/0x1c0
>> [Tue Dec 15 00:46:55 2015]  [] ?
>> ceph_pool_perm_check+0x48/0x700 [ceph]
>> [Tue Dec 15 00:46:55 2015]  [] set_page_dirty+0x3d/0x70
>> [Tue Dec 15 00:46:55 2015]  []
>> ceph_write_end+0x5e/0x180 [ceph]
>> [Tue Dec 15 00:46:55 2015]  [] ?
>> iov_iter_copy_from_user_atomic+0x156/0x220
>> [Tue Dec 15 00:46:55 2015]  []
>> generic_perform_write+0x114/0x1c0
>> [Tue Dec 15 00:46:55 2015]  []
>> ceph_write_iter+0xf8a/0x1050 [ceph]
>> [Tue Dec 15 00:46:55 2015]  [] ?
>> ceph_put_cap_refs+0x143/0x320 [ceph]
>> [Tue Dec 15 00:46:55 2015]  [] ?
>> check_preempt_wakeup+0xfa/0x220
>> [Tue Dec 15 00:46:55 2015]  [] ? zone_statistics+0x7c/0xa0
>> [Tue Dec 15 00:46:55 2015]  [] ? 
>> copy_page_to_iter+0x5e/0xa0
>> [Tue Dec 15 00:46:55 2015]  [] ?
>> skb_copy_datagram_iter+0x122/0x250
>> [Tue Dec 15 00:46:55 2015]  [] vfs_iter_write+0x76/0xc0
>> [Tue Dec 15 00:46:55 2015]  []
>> fd_do_rw.isra.5+0xd8/0x1e0 [target_core_file]
>> [Tue Dec 15 00:46:55 2015]  []
>> fd_execute_rw+0xc5/0x2a0 [target_core_file]
>> [Tue Dec 15 00:46:55 2015]  []
>> sbc_execute_rw+0x22/0x30 [target_core_mod]
>> [Tue Dec 15 00:46:55 2015]  []
>> __target_execute_cmd+0x1f/0x70 [target_core_mod]
>> [Tue Dec 15 00:46:55 2015]  []
>> target_execute_cmd+0x195/0x2a0 [target_core_mod]
>> [Tue Dec 15 00:46:55 2015]  []
>> iscsit_execute_cmd+0x20a/0x270 [iscsi_target_mod]
>> [Tue Dec 15 00:46:55 2015]  []
>> iscsit_sequence_cmd+0xda/0x190 [iscsi_target_mod]
>> [Tue Dec 15 00:46:55 2015]  []
>> iscsi_target_rx_thread+0x51d/0xe30 [iscsi_target_mod]
>> [Tue Dec 15 00:46:55 2015]  [] ? __switch_to+0x1dc/0x5a0
>> [Tue Dec 15 00:46:55 2015]  [] ?
>> iscsi_target_tx_thread+0x1e0/0x1e0 [iscsi_target_mod]
>> [Tue Dec 15 00:46:55 2015]  [] kthread+0xd8/0xf0
>> [Tue Dec 15 00:46:55 2015]  [] ?
>> kthread_create_on_node+0x1a0/0x1a0
>> [Tue Dec 15 00:46:55 2015]  [] ret_from_fork+0x3f/0x70
>> [Tue Dec 15 00:46:55 2015]  [] ?
>> kthread_create_on_node+0x1a0/0x1a0
>> [Tue Dec 15 00:46:55 2015] ---[ end trace 4079437668c77cbb ]---
>> [Tue Dec 15 00:47:45 2015] ABORT_TASK: Found referenced iSCSI task_tag: 
>> 95784927
>> [Tue Dec 15 00:47:45 2015] ABORT_TASK: ref_tag: 95784927 already
>> complete, skipping

Looks likely to be a kclient bug, as it's in the newish
pool_perm_check path.  Perhaps we don't usually see this because we'd
usually hit the permissions checks earlier (or during a read).

CCing zyan, who will have a better idea than me.

Eric: you should probably go ahead and 

Re: Issue with Ceph File System and LIO

2015-12-15 Thread Yan, Zheng
On Tue, Dec 15, 2015 at 2:08 PM, Eric Eastman
 wrote:
> I am testing Linux Target SCSI, LIO, with a Ceph File System backstore
> and I am seeing this error on my LIO gateway.  I am using Ceph v9.2.0
> on a 4.4rc4 Kernel, on Trusty, using a kernel mounted Ceph File
> System.  A file on the Ceph File System is exported via iSCSI to a
> VMware ESXi 5.0 server, and I am seeing this error when doing a lot of
> I/O on the ESXi server.   Is this a LIO or a Ceph issue?
>
> [Tue Dec 15 00:46:55 2015] [ cut here ]
> [Tue Dec 15 00:46:55 2015] WARNING: CPU: 0 PID: 1123421 at
> /home/kernel/COD/linux/fs/ceph/addr.c:125

Could you confirm that addr.c:125 is WARN_ON(!PageLocked(page))?

Regards
Yan, Zheng

> ceph_set_page_dirty+0x230/0x240 [ceph]()
> [Tue Dec 15 00:46:55 2015] Modules linked in: iptable_filter ip_tables
> x_tables xfs rbd iscsi_target_mod vhost_scsi tcm_qla2xxx ib_srpt
> tcm_fc tcm_usb_gadget tcm_loop target_core_file target_core_iblock
> target_core_pscsi target_core_user target_core_mod ipmi_devintf vhost
> qla2xxx ib_cm ib_sa ib_mad ib_core ib_addr libfc scsi_transport_fc
> libcomposite udc_core uio configfs ipmi_ssif ttm drm_kms_helper
> gpio_ich drm i2c_algo_bit fb_sys_fops coretemp syscopyarea ipmi_si
> sysfillrect ipmi_msghandler sysimgblt kvm acpi_power_meter 8250_fintek
> irqbypass hpilo shpchp input_leds serio_raw lpc_ich i7core_edac
> edac_core mac_hid ceph libceph libcrc32c fscache bonding lp parport
> mlx4_en vxlan ip6_udp_tunnel udp_tunnel ptp pps_core hid_generic
> usbhid hid hpsa mlx4_core psmouse bnx2 scsi_transport_sas fjes [last
> unloaded: target_core_mod]
> [Tue Dec 15 00:46:55 2015] CPU: 0 PID: 1123421 Comm: iscsi_trx
> Tainted: GW I 4.4.0-040400rc4-generic #201512061930
> [Tue Dec 15 00:46:55 2015] Hardware name: HP ProLiant DL360 G6, BIOS
> P64 01/22/2015
> [Tue Dec 15 00:46:55 2015]   fdc0ce43
> 880bf38c38c0 813c8ab4
> [Tue Dec 15 00:46:55 2015]   880bf38c38f8
> 8107d772 ea00127a8680
> [Tue Dec 15 00:46:55 2015]  8804e52c1448 8804e52c15b0
> 8804e52c10f0 0200
> [Tue Dec 15 00:46:55 2015] Call Trace:
> [Tue Dec 15 00:46:55 2015]  [] dump_stack+0x44/0x60
> [Tue Dec 15 00:46:55 2015]  [] 
> warn_slowpath_common+0x82/0xc0
> [Tue Dec 15 00:46:55 2015]  [] warn_slowpath_null+0x1a/0x20
> [Tue Dec 15 00:46:55 2015]  []
> ceph_set_page_dirty+0x230/0x240 [ceph]
> [Tue Dec 15 00:46:55 2015]  [] ?
> pagecache_get_page+0x150/0x1c0
> [Tue Dec 15 00:46:55 2015]  [] ?
> ceph_pool_perm_check+0x48/0x700 [ceph]
> [Tue Dec 15 00:46:55 2015]  [] set_page_dirty+0x3d/0x70
> [Tue Dec 15 00:46:55 2015]  []
> ceph_write_end+0x5e/0x180 [ceph]
> [Tue Dec 15 00:46:55 2015]  [] ?
> iov_iter_copy_from_user_atomic+0x156/0x220
> [Tue Dec 15 00:46:55 2015]  []
> generic_perform_write+0x114/0x1c0
> [Tue Dec 15 00:46:55 2015]  []
> ceph_write_iter+0xf8a/0x1050 [ceph]
> [Tue Dec 15 00:46:55 2015]  [] ?
> ceph_put_cap_refs+0x143/0x320 [ceph]
> [Tue Dec 15 00:46:55 2015]  [] ?
> check_preempt_wakeup+0xfa/0x220
> [Tue Dec 15 00:46:55 2015]  [] ? zone_statistics+0x7c/0xa0
> [Tue Dec 15 00:46:55 2015]  [] ? copy_page_to_iter+0x5e/0xa0
> [Tue Dec 15 00:46:55 2015]  [] ?
> skb_copy_datagram_iter+0x122/0x250
> [Tue Dec 15 00:46:55 2015]  [] vfs_iter_write+0x76/0xc0
> [Tue Dec 15 00:46:55 2015]  []
> fd_do_rw.isra.5+0xd8/0x1e0 [target_core_file]
> [Tue Dec 15 00:46:55 2015]  []
> fd_execute_rw+0xc5/0x2a0 [target_core_file]
> [Tue Dec 15 00:46:55 2015]  []
> sbc_execute_rw+0x22/0x30 [target_core_mod]
> [Tue Dec 15 00:46:55 2015]  []
> __target_execute_cmd+0x1f/0x70 [target_core_mod]
> [Tue Dec 15 00:46:55 2015]  []
> target_execute_cmd+0x195/0x2a0 [target_core_mod]
> [Tue Dec 15 00:46:55 2015]  []
> iscsit_execute_cmd+0x20a/0x270 [iscsi_target_mod]
> [Tue Dec 15 00:46:55 2015]  []
> iscsit_sequence_cmd+0xda/0x190 [iscsi_target_mod]
> [Tue Dec 15 00:46:55 2015]  []
> iscsi_target_rx_thread+0x51d/0xe30 [iscsi_target_mod]
> [Tue Dec 15 00:46:55 2015]  [] ? __switch_to+0x1dc/0x5a0
> [Tue Dec 15 00:46:55 2015]  [] ?
> iscsi_target_tx_thread+0x1e0/0x1e0 [iscsi_target_mod]
> [Tue Dec 15 00:46:55 2015]  [] kthread+0xd8/0xf0
> [Tue Dec 15 00:46:55 2015]  [] ?
> kthread_create_on_node+0x1a0/0x1a0
> [Tue Dec 15 00:46:55 2015]  [] ret_from_fork+0x3f/0x70
> [Tue Dec 15 00:46:55 2015]  [] ?
> kthread_create_on_node+0x1a0/0x1a0
> [Tue Dec 15 00:46:55 2015] ---[ end trace 4079437668c77cbb ]---
> [Tue Dec 15 00:47:45 2015] ABORT_TASK: Found referenced iSCSI task_tag: 
> 95784927
> [Tue Dec 15 00:47:45 2015] ABORT_TASK: ref_tag: 95784927 already
> complete, skipping
>
> If it is a Ceph File System issue, let me know and I will open a bug.
>
> Thanks
>
> Eric
--
To unsubscribe from this