Re: Issue with Ceph File System and LIO
On Sun, Dec 20, 2015 at 7:38 PM, Eric Eastman wrote:
> On Fri, Dec 18, 2015 at 12:18 AM, Yan, Zheng wrote:
>> On Fri, Dec 18, 2015 at 2:23 PM, Eric Eastman wrote:
>>>> Hi Yan Zheng, Eric Eastman
>>>>
>>>> Similar bug was reported in f2fs, btrfs, it does affect 4.4-rc4, the fixing
>>>> patch was merged into 4.4-rc5, dfd01f026058 ("sched/wait: Fix the signal
>>>> handling fix").
>>>>
>>>> Related report & discussion was here:
>>>> https://lkml.org/lkml/2015/12/12/149
>>>>
>>>> I'm not sure the current reported issue of ceph was related to that though,
>>>> but at least try testing with an upgraded or patched kernel could verify it.
>>>> :)
>>>>
>>>> Thanks,
>>
>> please try rc5 kernel without patches and DEBUG_VM=y
>>
>> Regards
>> Yan, Zheng
>
> The latest test with 4.4rc5 with CONFIG_DEBUG_VM=y has run for over 36
> hours with no ERRORS or WARNINGS. My plan is to install the 4.4rc6
> kernel from the Ubuntu kernel-ppa site once it is available, and rerun
> the tests.

The test has run for 2 days using the 4.4rc6 kernel from the Ubuntu kernel-ppa site without error or warning. It looks like it was a 4.4rc4 bug.

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Issue with Ceph File System and LIO
On Sun, Dec 20, 2015 at 6:38 PM, Eric Eastman wrote:
> On Fri, Dec 18, 2015 at 12:18 AM, Yan, Zheng wrote:
>> On Fri, Dec 18, 2015 at 2:23 PM, Eric Eastman wrote:
>>>> Hi Yan Zheng, Eric Eastman
>>>>
>>>> Similar bug was reported in f2fs, btrfs, it does affect 4.4-rc4, the fixing
>>>> patch was merged into 4.4-rc5, dfd01f026058 ("sched/wait: Fix the signal
>>>> handling fix").
>>>>
>>>> Related report & discussion was here:
>>>> https://lkml.org/lkml/2015/12/12/149
>>>>
>>>> I'm not sure the current reported issue of ceph was related to that though,
>>>> but at least try testing with an upgraded or patched kernel could verify it.
>>>> :)
>>>>
>>>> Thanks,
>>
>> please try rc5 kernel without patches and DEBUG_VM=y
>>
>> Regards
>> Yan, Zheng
>
> The latest test with 4.4rc5 with CONFIG_DEBUG_VM=y has run for over 36
> hours with no ERRORS or WARNINGS. My plan is to install the 4.4rc6
> kernel from the Ubuntu kernel-ppa site once it is available, and rerun
> the tests.
>
> Before running this test I had to rebuild the Ceph File System as
> after the last logged errors on Friday using the 4.4rc4 kernel, the
> Ceph File system hung accessing the exported image file.
> After rebooting my iSCSI gateway using the Ceph File System, from / using
> command: strace du -a cephfs, the mount point, the hang happened on
> the newfstatat call on my image file:
>
> write(1, "0\tcephfs/ctdb/.ctdb.lock\n", 250 cephfs/ctdb/.ctdb.lock
> ) = 25
> close(5) = 0
> write(1, "0\tcephfs/ctdb\n", 140 cephfs/ctdb
> ) = 14
> newfstatat(4, "iscsi", {st_mode=S_IFDIR|0755, st_size=993814480896,
> ...}, AT_SYMLINK_NOFOLLOW) = 0
> openat(4, "iscsi", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_NOFOLLOW) = 3
> fcntl(3, F_GETFD) = 0
> fcntl(3, F_SETFD, FD_CLOEXEC) = 0
> fstat(3, {st_mode=S_IFDIR|0755, st_size=993814480896, ...}) = 0
> fcntl(3, F_GETFL) = 0x38800 (flags
> O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_NOFOLLOW)
> fcntl(3, F_SETFD, FD_CLOEXEC) = 0
> newfstatat(4, "iscsi", {st_mode=S_IFDIR|0755, st_size=993814480896,
> ...}, AT_SYMLINK_NOFOLLOW) = 0
> fcntl(3, F_DUPFD, 3) = 5
> fcntl(5, F_GETFD) = 0
> fcntl(5, F_SETFD, FD_CLOEXEC) = 0
> getdents(3, /* 8 entries */, 65536) = 288
> getdents(3, /* 0 entries */, 65536) = 0
> close(3) = 0
> newfstatat(5, "iscsi900g.img", ^C
> ^C^C^C
> ^Z
> I could not break out with a ^C, and had to background the process to
> get my prompt back. The process would not die so I had to hard reset
> the system.
>
> This same hang happened on 2 other kernel mounted systems using a 4.3.0
> kernel.
>
> On a separate system, I fuse mounted the file system and a du -a
> cephfs hung at the same point. Once again I could not break out of the
> hang, and had to hard reset the system.
>
> Restarting the MDS and Monitors did not clear the issue.
> Taking a quick look at the dumpcache showed it was large
>
> # ceph mds tell 0 dumpcache /tmp/dump.txt
> ok
> # wc /tmp/dump.txt
> 370556 5002449 59211054 /tmp/dump.txt
> # tail /tmp/dump.txt
> [inode 1259276 [...c4,head] ~mds0/stray0/1259276/ auth v977593
> snaprealm=0x561339e3fb00 f(v0 m2015-12-12 00:51:04.345614) n(v0
> rc2015-12-12 00:51:04.345614 1=0+1) (iversion lock) 0x561339c66228]
> [inode 120c1ba [...a6,head] ~mds0/stray0/120c1ba/ auth v742016
> snaprealm=0x56133ad19600 f(v0 m2015-12-10 18:25:55.880167) n(v0
> rc2015-12-10 18:25:55.880167 1=0+1) (iversion lock) 0x56133a5e0d88]
> [inode 10d0088 [...77,head] ~mds0/stray6/10d0088/ auth v292336
> snaprealm=0x5613537673c0 f(v0 m2015-12-08 19:23:20.269283) n(v0
> rc2015-12-08 19:23:20.269283 1=0+1) (iversion lock) 0x56134c2f7378]

These are deleted files that haven't been trimmed yet...

> I tried one more thing:
>
> ceph daemon mds.0 flush journal
>
> and restarted the MDS. Accessing the file system still locked up, but
> a du -a cephfs did not even get to the iscsi900g.img file. As I was
> running on a broken rc kernel, with snapshots turned on

...and I think we have some known issues in the tracker about snap trimming and snapshotted inodes. So this is not entirely surprising. :/
-Greg

>, when this
> corruption happened, I decided to recreate the file system and
> restart the ESXi iSCSI test.
>
> Regards,
> Eric
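The tail of the dump quoted above shows only stray entries, which Greg identifies as deleted files that have not been trimmed yet. A minimal sketch for counting such entries per stray directory in a saved dump follows. It assumes each inode record starts with "[inode " and carries a path of the form ~mds<rank>/stray<N>/, as in the three sample lines from the message; the function name count_strays is hypothetical and not part of any Ceph tool.

```python
import re
from collections import Counter

# Assumption: stray inodes appear with a path like ~mds0/stray6/<ino>/,
# matching the tail output quoted in the message above.
STRAY_RE = re.compile(r"^\[inode \S+ \[[^\]]*\] ~mds(\d+)/stray(\d+)/")


def count_strays(lines):
    """Count stray inode records, keyed by (mds_rank, stray_dir)."""
    counts = Counter()
    for line in lines:
        match = STRAY_RE.match(line)
        if match:
            counts[(int(match.group(1)), int(match.group(2)))] += 1
    return counts


# First physical line of each record from the quoted tail output.
sample = [
    "[inode 1259276 [...c4,head] ~mds0/stray0/1259276/ auth v977593",
    "[inode 120c1ba [...a6,head] ~mds0/stray0/120c1ba/ auth v742016",
    "[inode 10d0088 [...77,head] ~mds0/stray6/10d0088/ auth v292336",
]

for (rank, stray_dir), n in sorted(count_strays(sample).items()):
    print(f"mds{rank} stray{stray_dir}: {n}")
```

For a real dump the same function could be fed an open file, for example count_strays(open("/tmp/dump.txt")), since only the start of each record is matched.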
Re: Issue with Ceph File System and LIO
On Fri, Dec 18, 2015 at 12:18 AM, Yan, Zheng wrote:
> On Fri, Dec 18, 2015 at 2:23 PM, Eric Eastman wrote:
>>> Hi Yan Zheng, Eric Eastman
>>>
>>> Similar bug was reported in f2fs, btrfs, it does affect 4.4-rc4, the fixing
>>> patch was merged into 4.4-rc5, dfd01f026058 ("sched/wait: Fix the signal
>>> handling fix").
>>>
>>> Related report & discussion was here:
>>> https://lkml.org/lkml/2015/12/12/149
>>>
>>> I'm not sure the current reported issue of ceph was related to that though,
>>> but at least try testing with an upgraded or patched kernel could verify it.
>>> :)
>>>
>>> Thanks,
>
> please try rc5 kernel without patches and DEBUG_VM=y
>
> Regards
> Yan, Zheng

The latest test with 4.4rc5 with CONFIG_DEBUG_VM=y has run for over 36 hours with no ERRORS or WARNINGS. My plan is to install the 4.4rc6 kernel from the Ubuntu kernel-ppa site once it is available, and rerun the tests.

Before running this test I had to rebuild the Ceph File System, because after the last logged errors on Friday using the 4.4rc4 kernel, the Ceph File System hung accessing the exported image file.
After rebooting my iSCSI gateway using the Ceph File System, I ran strace du -a cephfs (cephfs being the mount point) from /, and the hang happened on the newfstatat call on my image file:

write(1, "0\tcephfs/ctdb/.ctdb.lock\n", 250 cephfs/ctdb/.ctdb.lock
) = 25
close(5) = 0
write(1, "0\tcephfs/ctdb\n", 140 cephfs/ctdb
) = 14
newfstatat(4, "iscsi", {st_mode=S_IFDIR|0755, st_size=993814480896, ...}, AT_SYMLINK_NOFOLLOW) = 0
openat(4, "iscsi", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_NOFOLLOW) = 3
fcntl(3, F_GETFD) = 0
fcntl(3, F_SETFD, FD_CLOEXEC) = 0
fstat(3, {st_mode=S_IFDIR|0755, st_size=993814480896, ...}) = 0
fcntl(3, F_GETFL) = 0x38800 (flags O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_NOFOLLOW)
fcntl(3, F_SETFD, FD_CLOEXEC) = 0
newfstatat(4, "iscsi", {st_mode=S_IFDIR|0755, st_size=993814480896, ...}, AT_SYMLINK_NOFOLLOW) = 0
fcntl(3, F_DUPFD, 3) = 5
fcntl(5, F_GETFD) = 0
fcntl(5, F_SETFD, FD_CLOEXEC) = 0
getdents(3, /* 8 entries */, 65536) = 288
getdents(3, /* 0 entries */, 65536) = 0
close(3) = 0
newfstatat(5, "iscsi900g.img", ^C
^C^C^C
^Z

I could not break out with a ^C, and had to background the process to get my prompt back. The process would not die, so I had to hard reset the system.

This same hang happened on 2 other kernel mounted systems using a 4.3.0 kernel.

On a separate system, I fuse mounted the file system and a du -a cephfs hung at the same point. Once again I could not break out of the hang, and had to hard reset the system.

Restarting the MDS and Monitors did not clear the issue.
Taking a quick look at the dumpcache showed it was large:

# ceph mds tell 0 dumpcache /tmp/dump.txt
ok
# wc /tmp/dump.txt
370556 5002449 59211054 /tmp/dump.txt
# tail /tmp/dump.txt
[inode 1259276 [...c4,head] ~mds0/stray0/1259276/ auth v977593 snaprealm=0x561339e3fb00 f(v0 m2015-12-12 00:51:04.345614) n(v0 rc2015-12-12 00:51:04.345614 1=0+1) (iversion lock) 0x561339c66228]
[inode 120c1ba [...a6,head] ~mds0/stray0/120c1ba/ auth v742016 snaprealm=0x56133ad19600 f(v0 m2015-12-10 18:25:55.880167) n(v0 rc2015-12-10 18:25:55.880167 1=0+1) (iversion lock) 0x56133a5e0d88]
[inode 10d0088 [...77,head] ~mds0/stray6/10d0088/ auth v292336 snaprealm=0x5613537673c0 f(v0 m2015-12-08 19:23:20.269283) n(v0 rc2015-12-08 19:23:20.269283 1=0+1) (iversion lock) 0x56134c2f7378]

I tried one more thing:

ceph daemon mds.0 flush journal

and restarted the MDS. Accessing the file system still locked up, but a du -a cephfs did not even get to the iscsi900g.img file. As I was running on a broken rc kernel, with snapshots turned on, when this corruption happened, I decided to recreate the file system and restart the ESXi iSCSI test.

Regards,
Eric
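A process that ignores ^C and cannot be killed, as described in the message above, is consistent with a task blocked in uninterruptible sleep (state D), although the thread does not confirm this. A minimal sketch for checking a task's state from the contents of /proc/<pid>/stat follows. The helper names and the sample stat line are hypothetical; only the "pid (comm) state ..." layout is taken from the proc(5) documentation.

```python
def task_state(stat_text):
    """Return the one-letter state field from the contents of /proc/<pid>/stat."""
    # The comm field is wrapped in parentheses and may contain spaces or ')',
    # so split on the last ')' instead of splitting on whitespace.
    _, _, rest = stat_text.rpartition(")")
    return rest.split()[0]


def is_uninterruptible(stat_text):
    """True if the task is in state D (uninterruptible sleep)."""
    return task_state(stat_text) == "D"


# Hypothetical sample; the PID and trailing fields are made up for illustration.
sample = "4242 (du) D 4100 4242 4100 34816 4242 4194304 0 0 0 0"
print(task_state(sample), is_uninterruptible(sample))
```

On a live system the input would come from reading /proc/<pid>/stat for the hung du process, assuming the machine is still responsive enough to open a second shell.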
Re: Issue with Ceph File System and LIO
Eric,

Do you have iSCSI data digests on?

On 12/15/2015 12:08 AM, Eric Eastman wrote:
> I am testing Linux Target SCSI, LIO, with a Ceph File System backstore
> and I am seeing this error on my LIO gateway. I am using Ceph v9.2.0
> on a 4.4rc4 Kernel, on Trusty, using a kernel mounted Ceph File
> System. A file on the Ceph File System is exported via iSCSI to a
> VMware ESXi 5.0 server, and I am seeing this error when doing a lot of
> I/O on the ESXi server. Is this a LIO or a Ceph issue?
>
> [Tue Dec 15 00:46:55 2015] ------------[ cut here ]------------
> [Tue Dec 15 00:46:55 2015] WARNING: CPU: 0 PID: 1123421 at /home/kernel/COD/linux/fs/ceph/addr.c:125 ceph_set_page_dirty+0x230/0x240 [ceph]()
> [Tue Dec 15 00:46:55 2015] Modules linked in: iptable_filter ip_tables
> x_tables xfs rbd iscsi_target_mod vhost_scsi tcm_qla2xxx ib_srpt
> tcm_fc tcm_usb_gadget tcm_loop target_core_file target_core_iblock
> target_core_pscsi target_core_user target_core_mod ipmi_devintf vhost
> qla2xxx ib_cm ib_sa ib_mad ib_core ib_addr libfc scsi_transport_fc
> libcomposite udc_core uio configfs ipmi_ssif ttm drm_kms_helper
> gpio_ich drm i2c_algo_bit fb_sys_fops coretemp syscopyarea ipmi_si
> sysfillrect ipmi_msghandler sysimgblt kvm acpi_power_meter 8250_fintek
> irqbypass hpilo shpchp input_leds serio_raw lpc_ich i7core_edac
> edac_core mac_hid ceph libceph libcrc32c fscache bonding lp parport
> mlx4_en vxlan ip6_udp_tunnel udp_tunnel ptp pps_core hid_generic
> usbhid hid hpsa mlx4_core psmouse bnx2 scsi_transport_sas fjes [last
> unloaded: target_core_mod]
> [Tue Dec 15 00:46:55 2015] CPU: 0 PID: 1123421 Comm: iscsi_trx Tainted: GW I 4.4.0-040400rc4-generic #201512061930
> [Tue Dec 15 00:46:55 2015] Hardware name: HP ProLiant DL360 G6, BIOS P64 01/22/2015
> [Tue Dec 15 00:46:55 2015] fdc0ce43 880bf38c38c0 813c8ab4
> [Tue Dec 15 00:46:55 2015] 880bf38c38f8 8107d772 ea00127a8680
> [Tue Dec 15 00:46:55 2015] 8804e52c1448 8804e52c15b0 8804e52c10f0 0200
> [Tue Dec 15 00:46:55 2015] Call Trace:
> [Tue Dec 15 00:46:55 2015] [] dump_stack+0x44/0x60
> [Tue Dec 15 00:46:55 2015] [] warn_slowpath_common+0x82/0xc0
> [Tue Dec 15 00:46:55 2015] [] warn_slowpath_null+0x1a/0x20
> [Tue Dec 15 00:46:55 2015] [] ceph_set_page_dirty+0x230/0x240 [ceph]
> [Tue Dec 15 00:46:55 2015] [] ? pagecache_get_page+0x150/0x1c0
> [Tue Dec 15 00:46:55 2015] [] ? ceph_pool_perm_check+0x48/0x700 [ceph]
> [Tue Dec 15 00:46:55 2015] [] set_page_dirty+0x3d/0x70
> [Tue Dec 15 00:46:55 2015] [] ceph_write_end+0x5e/0x180 [ceph]
> [Tue Dec 15 00:46:55 2015] [] ? iov_iter_copy_from_user_atomic+0x156/0x220
> [Tue Dec 15 00:46:55 2015] [] generic_perform_write+0x114/0x1c0
> [Tue Dec 15 00:46:55 2015] [] ceph_write_iter+0xf8a/0x1050 [ceph]
> [Tue Dec 15 00:46:55 2015] [] ? ceph_put_cap_refs+0x143/0x320 [ceph]
> [Tue Dec 15 00:46:55 2015] [] ? check_preempt_wakeup+0xfa/0x220
> [Tue Dec 15 00:46:55 2015] [] ? zone_statistics+0x7c/0xa0
> [Tue Dec 15 00:46:55 2015] [] ? copy_page_to_iter+0x5e/0xa0
> [Tue Dec 15 00:46:55 2015] [] ? skb_copy_datagram_iter+0x122/0x250
> [Tue Dec 15 00:46:55 2015] [] vfs_iter_write+0x76/0xc0
> [Tue Dec 15 00:46:55 2015] [] fd_do_rw.isra.5+0xd8/0x1e0 [target_core_file]
> [Tue Dec 15 00:46:55 2015] [] fd_execute_rw+0xc5/0x2a0 [target_core_file]
> [Tue Dec 15 00:46:55 2015] [] sbc_execute_rw+0x22/0x30 [target_core_mod]
> [Tue Dec 15 00:46:55 2015] [] __target_execute_cmd+0x1f/0x70 [target_core_mod]
> [Tue Dec 15 00:46:55 2015] [] target_execute_cmd+0x195/0x2a0 [target_core_mod]
> [Tue Dec 15 00:46:55 2015] [] iscsit_execute_cmd+0x20a/0x270 [iscsi_target_mod]
> [Tue Dec 15 00:46:55 2015] [] iscsit_sequence_cmd+0xda/0x190 [iscsi_target_mod]
> [Tue Dec 15 00:46:55 2015] [] iscsi_target_rx_thread+0x51d/0xe30 [iscsi_target_mod]
> [Tue Dec 15 00:46:55 2015] [] ? __switch_to+0x1dc/0x5a0
> [Tue Dec 15 00:46:55 2015] [] ?
> iscsi_target_tx_thread+0x1e0/0x1e0 [iscsi_target_mod]
> [Tue Dec 15 00:46:55 2015] [] kthread+0xd8/0xf0
> [Tue Dec 15 00:46:55 2015] [] ? kthread_create_on_node+0x1a0/0x1a0
> [Tue Dec 15 00:46:55 2015] [] ret_from_fork+0x3f/0x70
> [Tue Dec 15 00:46:55 2015] [] ? kthread_create_on_node+0x1a0/0x1a0
> [Tue Dec 15 00:46:55 2015] ---[ end trace 4079437668c77cbb ]---
> [Tue Dec 15 00:47:45 2015] ABORT_TASK: Found referenced iSCSI task_tag: 95784927
> [Tue Dec 15 00:47:45 2015] ABORT_TASK: ref_tag: 95784927 already complete, skipping
>
> If it is a Ceph File System issue, let me know and I will open a bug.
>
> Thanks
>
> Eric
Re: Issue with Ceph File System and LIO
Hi Mike,

On the ESXi server both Header Digest and Data Digest are set to Prohibited.

Eric

On Fri, Dec 18, 2015 at 2:54 PM, Mike Christie wrote:
> Eric,
>
> Do you have iSCSI data digests on?
>
> On 12/15/2015 12:08 AM, Eric Eastman wrote:
>> I am testing Linux Target SCSI, LIO, with a Ceph File System backstore
>> and I am seeing this error on my LIO gateway. I am using Ceph v9.2.0
>> on a 4.4rc4 Kernel, on Trusty, using a kernel mounted Ceph File
>> System. A file on the Ceph File System is exported via iSCSI to a
>> VMware ESXi 5.0 server, and I am seeing this error when doing a lot of
>> I/O on the ESXi server. Is this a LIO or a Ceph issue?
>>
>> [Tue Dec 15 00:46:55 2015] ------------[ cut here ]------------
>> [Tue Dec 15 00:46:55 2015] WARNING: CPU: 0 PID: 1123421 at /home/kernel/COD/linux/fs/ceph/addr.c:125 ceph_set_page_dirty+0x230/0x240 [ceph]()
>> [Tue Dec 15 00:46:55 2015] Modules linked in: iptable_filter ip_tables
>> x_tables xfs rbd iscsi_target_mod vhost_scsi tcm_qla2xxx ib_srpt
>> tcm_fc tcm_usb_gadget tcm_loop target_core_file target_core_iblock
>> target_core_pscsi target_core_user target_core_mod ipmi_devintf vhost
>> qla2xxx ib_cm ib_sa ib_mad ib_core ib_addr libfc scsi_transport_fc
>> libcomposite udc_core uio configfs ipmi_ssif ttm drm_kms_helper
>> gpio_ich drm i2c_algo_bit fb_sys_fops coretemp syscopyarea ipmi_si
>> sysfillrect ipmi_msghandler sysimgblt kvm acpi_power_meter 8250_fintek
>> irqbypass hpilo shpchp input_leds serio_raw lpc_ich i7core_edac
>> edac_core mac_hid ceph libceph libcrc32c fscache bonding lp parport
>> mlx4_en vxlan ip6_udp_tunnel udp_tunnel ptp pps_core hid_generic
>> usbhid hid hpsa mlx4_core psmouse bnx2 scsi_transport_sas fjes [last
>> unloaded: target_core_mod]
>> [Tue Dec 15 00:46:55 2015] CPU: 0 PID: 1123421 Comm: iscsi_trx Tainted: GW I 4.4.0-040400rc4-generic #201512061930
>> [Tue Dec 15 00:46:55 2015] Hardware name: HP ProLiant DL360 G6, BIOS P64 01/22/2015
>> [Tue Dec 15 00:46:55 2015] fdc0ce43 880bf38c38c0 813c8ab4
>> [Tue Dec 15 00:46:55 2015] 880bf38c38f8 8107d772 ea00127a8680
>> [Tue Dec 15 00:46:55 2015] 8804e52c1448 8804e52c15b0 8804e52c10f0 0200
>> [Tue Dec 15 00:46:55 2015] Call Trace:
>> [Tue Dec 15 00:46:55 2015] [] dump_stack+0x44/0x60
>> [Tue Dec 15 00:46:55 2015] [] warn_slowpath_common+0x82/0xc0
>> [Tue Dec 15 00:46:55 2015] [] warn_slowpath_null+0x1a/0x20
>> [Tue Dec 15 00:46:55 2015] [] ceph_set_page_dirty+0x230/0x240 [ceph]
>> [Tue Dec 15 00:46:55 2015] [] ? pagecache_get_page+0x150/0x1c0
>> [Tue Dec 15 00:46:55 2015] [] ? ceph_pool_perm_check+0x48/0x700 [ceph]
>> [Tue Dec 15 00:46:55 2015] [] set_page_dirty+0x3d/0x70
>> [Tue Dec 15 00:46:55 2015] [] ceph_write_end+0x5e/0x180 [ceph]
>> [Tue Dec 15 00:46:55 2015] [] ? iov_iter_copy_from_user_atomic+0x156/0x220
>> [Tue Dec 15 00:46:55 2015] [] generic_perform_write+0x114/0x1c0
>> [Tue Dec 15 00:46:55 2015] [] ceph_write_iter+0xf8a/0x1050 [ceph]
>> [Tue Dec 15 00:46:55 2015] [] ? ceph_put_cap_refs+0x143/0x320 [ceph]
>> [Tue Dec 15 00:46:55 2015] [] ? check_preempt_wakeup+0xfa/0x220
>> [Tue Dec 15 00:46:55 2015] [] ? zone_statistics+0x7c/0xa0
>> [Tue Dec 15 00:46:55 2015] [] ? copy_page_to_iter+0x5e/0xa0
>> [Tue Dec 15 00:46:55 2015] [] ?
>> skb_copy_datagram_iter+0x122/0x250
>> [Tue Dec 15 00:46:55 2015] [] vfs_iter_write+0x76/0xc0
>> [Tue Dec 15 00:46:55 2015] [] fd_do_rw.isra.5+0xd8/0x1e0 [target_core_file]
>> [Tue Dec 15 00:46:55 2015] [] fd_execute_rw+0xc5/0x2a0 [target_core_file]
>> [Tue Dec 15 00:46:55 2015] [] sbc_execute_rw+0x22/0x30 [target_core_mod]
>> [Tue Dec 15 00:46:55 2015] [] __target_execute_cmd+0x1f/0x70 [target_core_mod]
>> [Tue Dec 15 00:46:55 2015] [] target_execute_cmd+0x195/0x2a0 [target_core_mod]
>> [Tue Dec 15 00:46:55 2015] [] iscsit_execute_cmd+0x20a/0x270 [iscsi_target_mod]
>> [Tue Dec 15 00:46:55 2015] [] iscsit_sequence_cmd+0xda/0x190 [iscsi_target_mod]
>> [Tue Dec 15 00:46:55 2015] [] iscsi_target_rx_thread+0x51d/0xe30 [iscsi_target_mod]
>> [Tue Dec 15 00:46:55 2015] [] ? __switch_to+0x1dc/0x5a0
>> [Tue Dec 15 00:46:55 2015] [] ? iscsi_target_tx_thread+0x1e0/0x1e0 [iscsi_target_mod]
>> [Tue Dec 15 00:46:55 2015] [] kthread+0xd8/0xf0
>> [Tue Dec 15 00:46:55 2015] [] ? kthread_create_on_node+0x1a0/0x1a0
>> [Tue Dec 15 00:46:55 2015] [] ret_from_fork+0x3f/0x70
>> [Tue Dec 15 00:46:55 2015] [] ? kthread_create_on_node+0x1a0/0x1a0
>> [Tue Dec 15 00:46:55 2015] ---[ end trace 4079437668c77cbb ]---
>> [Tue Dec 15 00:47:45 2015] ABORT_TASK: Found referenced iSCSI task_tag: 95784927
>> [Tue Dec 15 00:47:45 2015] ABORT_TASK: ref_tag: 95784927 already complete, skipping
>>
>> If it is a Ceph File System issue, let me know and I will open a bug.
>>
>> Thanks
>>
>> Eric
Re: Issue with Ceph File System and LIO
I patched the 4.4rc4 kernel source and restarted the test. Shortly after starting it, this showed up in dmesg:

[Thu Dec 17 03:29:55 2015] WARNING: CPU: 0 PID: 2547 at fs/ceph/addr.c:1162 ceph_write_begin+0xfb/0x120 [ceph]()
[Thu Dec 17 03:29:55 2015] Modules linked in: iscsi_target_mod vhost_scsi tcm_qla2xxx ib_srpt tcm_fc tcm_usb_gadget tcm_loop target_core_file target_core_iblock target_core_pscsi target_core_user target_core_mod ipmi_devintf vhost qla2xxx ib_cm ib_sa ib_mad ib_core ib_addr libfc scsi_transport_fc libcomposite udc_core uio configfs ttm ipmi_ssif drm_kms_helper drm coretemp kvm gpio_ich i2c_algo_bit i7core_edac fb_sys_fops syscopyarea edac_core sysfillrect sysimgblt ipmi_si input_leds hpilo ipmi_msghandler shpchp acpi_power_meter irqbypass serio_raw 8250_fintek lpc_ich mac_hid ceph bonding libceph lp parport libcrc32c fscache mlx4_en vxlan ip6_udp_tunnel udp_tunnel ptp pps_core hid_generic usbhid hid mlx4_core hpsa psmouse bnx2 fjes scsi_transport_sas [last unloaded: target_core_mod]
[Thu Dec 17 03:29:55 2015] CPU: 0 PID: 2547 Comm: iscsi_trx Tainted: G W I 4.4.0-rc4-ede1 #1
[Thu Dec 17 03:29:55 2015] Hardware name: HP ProLiant DL360 G6, BIOS P64 01/22/2015
[Thu Dec 17 03:29:55 2015] c020cd47 8805f1e97958 813ad644
[Thu Dec 17 03:29:55 2015] 8805f1e97990 81079702 8805f1e97a50 015dd000
[Thu Dec 17 03:29:55 2015] 880c034df800 0200 eab26a80 8805f1e979a0
[Thu Dec 17 03:29:55 2015] Call Trace:
[Thu Dec 17 03:29:55 2015] [] dump_stack+0x44/0x60
[Thu Dec 17 03:29:55 2015] [] warn_slowpath_common+0x82/0xc0
[Thu Dec 17 03:29:55 2015] [] warn_slowpath_null+0x1a/0x20
[Thu Dec 17 03:29:55 2015] [] ceph_write_begin+0xfb/0x120 [ceph]
[Thu Dec 17 03:29:55 2015] [] generic_perform_write+0xbf/0x1a0
[Thu Dec 17 03:29:55 2015] [] ceph_write_iter+0xf5c/0x1010 [ceph]
[Thu Dec 17 03:29:55 2015] [] ? __enqueue_entity+0x6c/0x70
[Thu Dec 17 03:29:55 2015] [] ? iov_iter_get_pages+0x113/0x210
[Thu Dec 17 03:29:55 2015] [] ? skb_copy_datagram_iter+0x122/0x250
[Thu Dec 17 03:29:55 2015] [] vfs_iter_write+0x63/0xa0
[Thu Dec 17 03:29:55 2015] [] fd_do_rw.isra.5+0xc9/0x1b0 [target_core_file]
[Thu Dec 17 03:29:55 2015] [] fd_execute_rw+0xc5/0x2a0 [target_core_file]
[Thu Dec 17 03:29:55 2015] [] sbc_execute_rw+0x22/0x30 [target_core_mod]
[Thu Dec 17 03:29:55 2015] [] __target_execute_cmd+0x1f/0x70 [target_core_mod]
[Thu Dec 17 03:29:55 2015] [] target_execute_cmd+0x195/0x2a0 [target_core_mod]
[Thu Dec 17 03:29:55 2015] [] iscsit_execute_cmd+0x20a/0x270 [iscsi_target_mod]
[Thu Dec 17 03:29:55 2015] [] iscsit_sequence_cmd+0xda/0x190 [iscsi_target_mod]
[Thu Dec 17 03:29:55 2015] [] iscsi_target_rx_thread+0x51d/0xe30 [iscsi_target_mod]
[Thu Dec 17 03:29:55 2015] [] ? __switch_to+0x1cd/0x570
[Thu Dec 17 03:29:55 2015] [] ? iscsi_target_tx_thread+0x1c0/0x1c0 [iscsi_target_mod]
[Thu Dec 17 03:29:55 2015] [] kthread+0xc9/0xe0
[Thu Dec 17 03:29:55 2015] [] ? kthread_create_on_node+0x180/0x180
[Thu Dec 17 03:29:55 2015] [] ret_from_fork+0x3f/0x70
[Thu Dec 17 03:29:55 2015] [] ? kthread_create_on_node+0x180/0x180
[Thu Dec 17 03:29:55 2015] ---[ end trace 382a45986961da4e ]---

There are WARNINGs on both line 125 and line 1162. I will attach the whole set of dmesg output to tracker ticket 14086.

I wanted to note that file system snapshots are enabled and being used on this file system.

Thanks
Eric

On Wed, Dec 16, 2015 at 8:15 AM, Eric Eastman wrote:
>>>
>> This warning is really strange. Could you try the attached debug patch.
>>
>> Regards
>> Yan, Zheng
>
> I will try the patch and get back to the list.
>
> Eric
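The message above reports WARNINGs at two different source lines. A minimal sketch for tallying kernel WARNING sites from saved dmesg text follows, assuming the "WARNING: CPU: N PID: N at <file>:<line> <function>+0x..." layout seen in this thread. The name warning_sites is hypothetical, and the two sample lines are the WARNING headers quoted in the thread.

```python
import os
import re
from collections import Counter

# Assumption: WARNING headers follow the layout printed by the kernels in
# this thread: "WARNING: CPU: <n> PID: <n> at <file>:<line> <func>+0x..."
WARN_RE = re.compile(r"WARNING: CPU: \d+ PID: \d+ at (\S+):(\d+) (\w+)\+0x")


def warning_sites(lines):
    """Count WARNINGs keyed by (source file basename, line number, function)."""
    sites = Counter()
    for line in lines:
        match = WARN_RE.search(line)
        if match:
            path, lineno, func = match.groups()
            sites[(os.path.basename(path), int(lineno), func)] += 1
    return sites


sample = [
    "[Tue Dec 15 00:46:55 2015] WARNING: CPU: 0 PID: 1123421 at "
    "/home/kernel/COD/linux/fs/ceph/addr.c:125 ceph_set_page_dirty+0x230/0x240 [ceph]()",
    "[Thu Dec 17 03:29:55 2015] WARNING: CPU: 0 PID: 2547 at "
    "fs/ceph/addr.c:1162 ceph_write_begin+0xfb/0x120 [ceph]()",
]

for (fname, lineno, func), n in sorted(warning_sites(sample).items()):
    print(f"{fname}:{lineno} {func}: {n}")
```

Keying on the basename makes the distro build path and the local build path count as the same file, which is a simplification that would merge unrelated files sharing a name.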
Re: Issue with Ceph File System and LIO
On Thu, Dec 17, 2015 at 4:56 PM, Eric Eastman wrote:
> I patched the 4.4rc4 kernel source and restarted the test. Shortly
> after starting it, this showed up in dmesg:
>
> [Thu Dec 17 03:29:55 2015] WARNING: CPU: 0 PID: 2547 at fs/ceph/addr.c:1162 ceph_write_begin+0xfb/0x120 [ceph]()
> [Thu Dec 17 03:29:55 2015] Modules linked in: iscsi_target_mod
> vhost_scsi tcm_qla2xxx ib_srpt tcm_fc tcm_usb_gadget tcm_loop
> target_core_file target_core_iblock target_core_pscsi target_core_user
> target_core_mod ipmi_devintf vhost qla2xxx ib_cm ib_sa ib_mad ib_core
> ib_addr libfc scsi_transport_fc libcomposite udc_core uio configfs ttm
> ipmi_ssif drm_kms_helper drm coretemp kvm gpio_ich i2c_algo_bit
> i7core_edac fb_sys_fops syscopyarea edac_core sysfillrect sysimgblt
> ipmi_si input_leds hpilo ipmi_msghandler shpchp acpi_power_meter
> irqbypass serio_raw 8250_fintek lpc_ich mac_hid ceph bonding libceph
> lp parport libcrc32c fscache mlx4_en vxlan ip6_udp_tunnel udp_tunnel
> ptp pps_core hid_generic usbhid hid mlx4_core hpsa psmouse bnx2 fjes
> scsi_transport_sas [last unloaded: target_core_mod]
> [Thu Dec 17 03:29:55 2015] CPU: 0 PID: 2547 Comm: iscsi_trx Tainted: G W I 4.4.0-rc4-ede1 #1
> [Thu Dec 17 03:29:55 2015] Hardware name: HP ProLiant DL360 G6, BIOS P64 01/22/2015
> [Thu Dec 17 03:29:55 2015] c020cd47 8805f1e97958 813ad644
> [Thu Dec 17 03:29:55 2015] 8805f1e97990 81079702 8805f1e97a50 015dd000
> [Thu Dec 17 03:29:55 2015] 880c034df800 0200 eab26a80 8805f1e979a0
> [Thu Dec 17 03:29:55 2015] Call Trace:
> [Thu Dec 17 03:29:55 2015] [] dump_stack+0x44/0x60
> [Thu Dec 17 03:29:55 2015] [] warn_slowpath_common+0x82/0xc0
> [Thu Dec 17 03:29:55 2015] [] warn_slowpath_null+0x1a/0x20
> [Thu Dec 17 03:29:55 2015] [] ceph_write_begin+0xfb/0x120 [ceph]
> [Thu Dec 17 03:29:55 2015] [] generic_perform_write+0xbf/0x1a0
> [Thu Dec 17 03:29:55 2015] [] ceph_write_iter+0xf5c/0x1010 [ceph]
> [Thu Dec 17 03:29:55 2015] [] ? __enqueue_entity+0x6c/0x70
> [Thu Dec 17 03:29:55 2015] [] ? iov_iter_get_pages+0x113/0x210
> [Thu Dec 17 03:29:55 2015] [] ? skb_copy_datagram_iter+0x122/0x250
> [Thu Dec 17 03:29:55 2015] [] vfs_iter_write+0x63/0xa0
> [Thu Dec 17 03:29:55 2015] [] fd_do_rw.isra.5+0xc9/0x1b0 [target_core_file]
> [Thu Dec 17 03:29:55 2015] [] fd_execute_rw+0xc5/0x2a0 [target_core_file]
> [Thu Dec 17 03:29:55 2015] [] sbc_execute_rw+0x22/0x30 [target_core_mod]
> [Thu Dec 17 03:29:55 2015] [] __target_execute_cmd+0x1f/0x70 [target_core_mod]
> [Thu Dec 17 03:29:55 2015] [] target_execute_cmd+0x195/0x2a0 [target_core_mod]
> [Thu Dec 17 03:29:55 2015] [] iscsit_execute_cmd+0x20a/0x270 [iscsi_target_mod]
> [Thu Dec 17 03:29:55 2015] [] iscsit_sequence_cmd+0xda/0x190 [iscsi_target_mod]
> [Thu Dec 17 03:29:55 2015] [] iscsi_target_rx_thread+0x51d/0xe30 [iscsi_target_mod]
> [Thu Dec 17 03:29:55 2015] [] ? __switch_to+0x1cd/0x570
> [Thu Dec 17 03:29:55 2015] [] ? iscsi_target_tx_thread+0x1c0/0x1c0 [iscsi_target_mod]
> [Thu Dec 17 03:29:55 2015] [] kthread+0xc9/0xe0
> [Thu Dec 17 03:29:55 2015] [] ? kthread_create_on_node+0x180/0x180
> [Thu Dec 17 03:29:55 2015] [] ret_from_fork+0x3f/0x70
> [Thu Dec 17 03:29:55 2015] [] ? kthread_create_on_node+0x180/0x180
> [Thu Dec 17 03:29:55 2015] ---[ end trace 382a45986961da4e ]---

Could you please apply the new incremental patch and try again.

Regards
Yan, Zheng

>
> There are WARNINGs on both line 125 and 1162. I will attach the
> whole set of dmesg output to the tracker ticket 14086
>
> I wanted to note that file system snapshots are enabled and being used
> on this file system.
>
> Thanks
> Eric
>
> On Wed, Dec 16, 2015 at 8:15 AM, Eric Eastman wrote:
>>> This warning is really strange. Could you try the attached debug patch.
>>>
>>> Regards
>>> Yan, Zheng
>>
>> I will try the patch and get back to the list.
>>
>> Eric

cephfs1.patch
Description: Binary data
Re: Issue with Ceph File System and LIO
Hi.

It may be helpful to address this issue, if we flip the debug.

Thanks
Minfei

On 12/17/15 at 01:56P, Eric Eastman wrote:
> I patched the 4.4rc4 kernel source and restarted the test. Shortly
> after starting it, this showed up in dmesg:
>
> [Thu Dec 17 03:29:55 2015] WARNING: CPU: 0 PID: 2547 at fs/ceph/addr.c:1162 ceph_write_begin+0xfb/0x120 [ceph]()
> [Thu Dec 17 03:29:55 2015] Modules linked in: iscsi_target_mod
> vhost_scsi tcm_qla2xxx ib_srpt tcm_fc tcm_usb_gadget tcm_loop
> target_core_file target_core_iblock target_core_pscsi target_core_user
> target_core_mod ipmi_devintf vhost qla2xxx ib_cm ib_sa ib_mad ib_core
> ib_addr libfc scsi_transport_fc libcomposite udc_core uio configfs ttm
> ipmi_ssif drm_kms_helper drm coretemp kvm gpio_ich i2c_algo_bit
> i7core_edac fb_sys_fops syscopyarea edac_core sysfillrect sysimgblt
> ipmi_si input_leds hpilo ipmi_msghandler shpchp acpi_power_meter
> irqbypass serio_raw 8250_fintek lpc_ich mac_hid ceph bonding libceph
> lp parport libcrc32c fscache mlx4_en vxlan ip6_udp_tunnel udp_tunnel
> ptp pps_core hid_generic usbhid hid mlx4_core hpsa psmouse bnx2 fjes
> scsi_transport_sas [last unloaded: target_core_mod]
> [Thu Dec 17 03:29:55 2015] CPU: 0 PID: 2547 Comm: iscsi_trx Tainted: G W I 4.4.0-rc4-ede1 #1
> [Thu Dec 17 03:29:55 2015] Hardware name: HP ProLiant DL360 G6, BIOS P64 01/22/2015
> [Thu Dec 17 03:29:55 2015] c020cd47 8805f1e97958 813ad644
> [Thu Dec 17 03:29:55 2015] 8805f1e97990 81079702 8805f1e97a50 015dd000
> [Thu Dec 17 03:29:55 2015] 880c034df800 0200 eab26a80 8805f1e979a0
> [Thu Dec 17 03:29:55 2015] Call Trace:
> [Thu Dec 17 03:29:55 2015] [] dump_stack+0x44/0x60
> [Thu Dec 17 03:29:55 2015] [] warn_slowpath_common+0x82/0xc0
> [Thu Dec 17 03:29:55 2015] [] warn_slowpath_null+0x1a/0x20
> [Thu Dec 17 03:29:55 2015] [] ceph_write_begin+0xfb/0x120 [ceph]
> [Thu Dec 17 03:29:55 2015] [] generic_perform_write+0xbf/0x1a0
> [Thu Dec 17 03:29:55 2015] [] ceph_write_iter+0xf5c/0x1010 [ceph]
> [Thu Dec 17 03:29:55 2015] [] ? __enqueue_entity+0x6c/0x70
> [Thu Dec 17 03:29:55 2015] [] ? iov_iter_get_pages+0x113/0x210
> [Thu Dec 17 03:29:55 2015] [] ? skb_copy_datagram_iter+0x122/0x250
> [Thu Dec 17 03:29:55 2015] [] vfs_iter_write+0x63/0xa0
> [Thu Dec 17 03:29:55 2015] [] fd_do_rw.isra.5+0xc9/0x1b0 [target_core_file]
> [Thu Dec 17 03:29:55 2015] [] fd_execute_rw+0xc5/0x2a0 [target_core_file]
> [Thu Dec 17 03:29:55 2015] [] sbc_execute_rw+0x22/0x30 [target_core_mod]
> [Thu Dec 17 03:29:55 2015] [] __target_execute_cmd+0x1f/0x70 [target_core_mod]
> [Thu Dec 17 03:29:55 2015] [] target_execute_cmd+0x195/0x2a0 [target_core_mod]
> [Thu Dec 17 03:29:55 2015] [] iscsit_execute_cmd+0x20a/0x270 [iscsi_target_mod]
> [Thu Dec 17 03:29:55 2015] [] iscsit_sequence_cmd+0xda/0x190 [iscsi_target_mod]
> [Thu Dec 17 03:29:55 2015] [] iscsi_target_rx_thread+0x51d/0xe30 [iscsi_target_mod]
> [Thu Dec 17 03:29:55 2015] [] ? __switch_to+0x1cd/0x570
> [Thu Dec 17 03:29:55 2015] [] ? iscsi_target_tx_thread+0x1c0/0x1c0 [iscsi_target_mod]
> [Thu Dec 17 03:29:55 2015] [] kthread+0xc9/0xe0
> [Thu Dec 17 03:29:55 2015] [] ? kthread_create_on_node+0x180/0x180
> [Thu Dec 17 03:29:55 2015] [] ret_from_fork+0x3f/0x70
> [Thu Dec 17 03:29:55 2015] [] ? kthread_create_on_node+0x180/0x180
> [Thu Dec 17 03:29:55 2015] ---[ end trace 382a45986961da4e ]---
>
> There are WARNINGs on both line 125 and 1162. I will attached the
> whole set of dmesg output to the tracker ticket 14086
>
> I wanted to note that file system snapshots are enabled and being used
> on this file system.
>
> Thanks
> Eric
>
> On Wed, Dec 16, 2015 at 8:15 AM, Eric Eastman wrote:
> >>>
> >> This warning is really strange. Could you try the attached debug patch.
> >>
> >> Regards
> >> Yan, Zheng
> >
> > I will try the patch and get back to the list.
> >
> > Eric
Re: Issue with Ceph File System and LIO
With cephfs.patch and cephfs1.patch applied, I am now seeing:

[Thu Dec 17 14:27:59 2015] ------------[ cut here ]------------
[Thu Dec 17 14:27:59 2015] WARNING: CPU: 0 PID: 3036 at fs/ceph/addr.c:1171 ceph_write_begin+0xfb/0x120 [ceph]()
[Thu Dec 17 14:27:59 2015] Modules linked in: iscsi_target_mod vhost_scsi tcm_qla2xxx ib_srpt tcm_fc tcm_usb_gadget tcm_loop target_core_file target_core_iblock target_core_pscsi target_core_user target_core_mod ipmi_devintf vhost qla2xxx ib_cm ib_sa ib_mad ib_core ib_addr libfc scsi_transport_fc libcomposite udc_core uio configfs ttm drm_kms_helper drm ipmi_ssif coretemp gpio_ich i2c_algo_bit kvm fb_sys_fops syscopyarea sysfillrect sysimgblt shpchp input_leds ceph irqbypass i7core_edac serio_raw hpilo edac_core ipmi_si ipmi_msghandler 8250_fintek lpc_ich acpi_power_meter libceph mac_hid libcrc32c fscache bonding lp parport mlx4_en vxlan ip6_udp_tunnel udp_tunnel ptp pps_core hid_generic usbhid hid mlx4_core hpsa psmouse bnx2 fjes scsi_transport_sas [last unloaded: target_core_mod]
[Thu Dec 17 14:27:59 2015] CPU: 0 PID: 3036 Comm: iscsi_trx Tainted: G W I 4.4.0-rc4-ede2 #1
[Thu Dec 17 14:27:59 2015] Hardware name: HP ProLiant DL360 G6, BIOS P64 01/22/2015
[Thu Dec 17 14:27:59 2015] c02b2e37 880c0289b958 813ad644
[Thu Dec 17 14:27:59 2015] 880c0289b990 81079702 880c0289ba50 000846c21000
[Thu Dec 17 14:27:59 2015] 880c009ea200 1000 ea00122ed700 880c0289b9a0
[Thu Dec 17 14:27:59 2015] Call Trace:
[Thu Dec 17 14:27:59 2015] [] dump_stack+0x44/0x60
[Thu Dec 17 14:27:59 2015] [] warn_slowpath_common+0x82/0xc0
[Thu Dec 17 14:27:59 2015] [] warn_slowpath_null+0x1a/0x20
[Thu Dec 17 14:27:59 2015] [] ceph_write_begin+0xfb/0x120 [ceph]
[Thu Dec 17 14:27:59 2015] [] generic_perform_write+0xbf/0x1a0
[Thu Dec 17 14:27:59 2015] [] ceph_write_iter+0xf5c/0x1010 [ceph]
[Thu Dec 17 14:27:59 2015] [] ? __schedule+0x386/0x9c0
[Thu Dec 17 14:27:59 2015] [] ? schedule+0x35/0x80
[Thu Dec 17 14:27:59 2015] [] ? __slab_free+0xb5/0x290
[Thu Dec 17 14:27:59 2015] [] ? iov_iter_get_pages+0x113/0x210
[Thu Dec 17 14:27:59 2015] [] vfs_iter_write+0x63/0xa0
[Thu Dec 17 14:27:59 2015] [] fd_do_rw.isra.5+0xc9/0x1b0 [target_core_file]
[Thu Dec 17 14:27:59 2015] [] fd_execute_rw+0xc5/0x2a0 [target_core_file]
[Thu Dec 17 14:27:59 2015] [] sbc_execute_rw+0x22/0x30 [target_core_mod]
[Thu Dec 17 14:27:59 2015] [] __target_execute_cmd+0x1f/0x70 [target_core_mod]
[Thu Dec 17 14:27:59 2015] [] target_execute_cmd+0x195/0x2a0 [target_core_mod]
[Thu Dec 17 14:27:59 2015] [] iscsit_execute_cmd+0x20a/0x270 [iscsi_target_mod]
[Thu Dec 17 14:27:59 2015] [] iscsit_sequence_cmd+0xda/0x190 [iscsi_target_mod]
[Thu Dec 17 14:27:59 2015] [] iscsi_target_rx_thread+0x51d/0xe30 [iscsi_target_mod]
[Thu Dec 17 14:27:59 2015] [] ? __switch_to+0x1cd/0x570
[Thu Dec 17 14:27:59 2015] [] ? iscsi_target_tx_thread+0x1c0/0x1c0 [iscsi_target_mod]
[Thu Dec 17 14:27:59 2015] [] kthread+0xc9/0xe0
[Thu Dec 17 14:27:59 2015] [] ? kthread_create_on_node+0x180/0x180
[Thu Dec 17 14:27:59 2015] [] ret_from_fork+0x3f/0x70
[Thu Dec 17 14:27:59 2015] [] ? kthread_create_on_node+0x180/0x180
[Thu Dec 17 14:27:59 2015] ---[ end trace 8346192e3f29ed5d ]---

Each WARNING on line 1171 is followed by a WARNING on line 125. The dmesg output is attached to tracker ticket 14086.

Regards,
Eric

On Thu, Dec 17, 2015 at 2:38 AM, Yan, Zheng wrote:
> On Thu, Dec 17, 2015 at 4:56 PM, Eric Eastman wrote:
>> I patched the 4.4rc4 kernel source and restarted the test.
Shortly >> after starting it, this showed up in dmesg: >> >> [Thu Dec 17 03:29:55 2015] WARNING: CPU: 0 PID: 2547 at >> fs/ceph/addr.c:1162 ceph_write_begin+0xfb/0x120 [ceph]() >> [Thu Dec 17 03:29:55 2015] Modules linked in: iscsi_target_mod >> vhost_scsi tcm_qla2xxx ib_srpt tcm_fc tcm_usb_gadget tcm_loop >> target_core_file target_core_iblock target_core_pscsi target_core_user >> target_core_mod ipmi_devintf vhost qla2xxx ib_cm ib_sa ib_mad ib_core >> ib_addr libfc scsi_transport_fc libcomposite udc_core uio configfs ttm >> ipmi_ssif drm_kms_helper drm coretemp kvm gpio_ich i2c_algo_bit >> i7core_edac fb_sys_fops syscopyarea edac_core sysfillrect sysimgblt >> ipmi_si input_leds hpilo ipmi_msghandler shpchp acpi_power_meter >> irqbypass serio_raw 8250_fintek lpc_ich mac_hid ceph bonding libceph >> lp parport libcrc32c fscache mlx4_en vxlan ip6_udp_tunnel udp_tunnel >> ptp pps_core hid_generic usbhid hid mlx4_core hpsa psmouse bnx2 fjes >> scsi_transport_sas [last unloaded: target_core_mod] >> [Thu Dec 17 03:29:55 2015] CPU: 0 PID: 2547 Comm: iscsi_trx Tainted: G >>W I 4.4.0-rc4-ede1 #1 >> [Thu Dec 17 03:29:55 2015] Hardware name: HP ProLiant DL360 G6, BIOS >> P64 01/22/2015 >> [Thu Dec 17 03:29:55 2015] c020cd47 8805f1e97958 >> 813ad644 >> [Thu Dec 17 03:29:55 2015]
Re: Issue with Ceph File System and LIO
On Fri, Dec 18, 2015 at 2:23 PM, Eric Eastman <eric.east...@keepertech.com> wrote:
>> Hi Yan Zheng, Eric Eastman
>>
>> Similar bug was reported in f2fs, btrfs, it does affect 4.4-rc4, the fixing
>> patch was merged into 4.4-rc5, dfd01f026058 ("sched/wait: Fix the signal
>> handling fix").
>>
>> Related report & discussion was here:
>> https://lkml.org/lkml/2015/12/12/149
>>
>> I'm not sure the current reported issue of ceph was related to that though,
>> but at least try testing with an upgraded or patched kernel could verify it.
>> :)
>>
>> Thanks,
>>
>>> -Original Message-
>>> From: ceph-devel-ow...@vger.kernel.org
>>> [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of
>>> Yan, Zheng
>>> Sent: Friday, December 18, 2015 12:05 PM
>>> To: Eric Eastman
>>> Cc: Ceph Development
>>> Subject: Re: Issue with Ceph File System and LIO
>>>
>>> On Fri, Dec 18, 2015 at 3:49 AM, Eric Eastman
>>> <eric.east...@keepertech.com> wrote:
>>> > With cephfs.patch and cephfs1.patch applied and I am now seeing:
>>> >
>>> > [Thu Dec 17 14:27:59 2015] [ cut here ]
>>> > [Thu Dec 17 14:27:59 2015] WARNING: CPU: 0 PID: 3036 at
>>> > fs/ceph/addr.c:1171 ceph_write_begin+0xfb/0x120 [ceph]()
>>> > [Thu Dec 17 14:27:59 2015] Modules linked in: iscsi_target_mod
> ...
>>> >
>>>
>>> The page gets unlocked mystically. I still don't find any clue. Could
>>> you please try the new patch (not incremental patch). Besides, please
>>> enable CONFIG_DEBUG_VM when compiling the kernel.
>>>
>>> Thanks you very much
>>> Yan, Zheng
>>
> I have just installed the cephfs_new.patch and have set
> CONFIG_DEBUG_VM=y on a new 4.4rc4 kernel and restarted the ESXi iSCSI
> test to my Ceph File System gateway. I plan to let it run overnight
> and report the status tomorrow.
>
> Let me know if I should move on to 4.4rc5 with or without patches and
> with or without CONFIG_DEBUG_VM=y
>

please try rc5 kernel without patches and DEBUG_VM=y

Regards
Yan, Zheng

> Looking at the network traffic stats on my iSCSI gateway, with
> CONFIG_DEBUG_VM=y, throughput seems to be down by a factor of at least
> 10 compared to my last test without setting CONFIG_DEBUG_VM=y
>
> Regards,
> Eric
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Issue with Ceph File System and LIO
> Hi Yan Zheng, Eric Eastman
>
> Similar bug was reported in f2fs, btrfs, it does affect 4.4-rc4, the fixing
> patch was merged into 4.4-rc5, dfd01f026058 ("sched/wait: Fix the signal
> handling fix").
>
> Related report & discussion was here:
> https://lkml.org/lkml/2015/12/12/149
>
> I'm not sure the current reported issue of ceph was related to that though,
> but at least try testing with an upgraded or patched kernel could verify it.
> :)
>
> Thanks,
>
>> -Original Message-
>> From: ceph-devel-ow...@vger.kernel.org
>> [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of
>> Yan, Zheng
>> Sent: Friday, December 18, 2015 12:05 PM
>> To: Eric Eastman
>> Cc: Ceph Development
>> Subject: Re: Issue with Ceph File System and LIO
>>
>> On Fri, Dec 18, 2015 at 3:49 AM, Eric Eastman
>> <eric.east...@keepertech.com> wrote:
>> > With cephfs.patch and cephfs1.patch applied and I am now seeing:
>> >
>> > [Thu Dec 17 14:27:59 2015] [ cut here ]
>> > [Thu Dec 17 14:27:59 2015] WARNING: CPU: 0 PID: 3036 at
>> > fs/ceph/addr.c:1171 ceph_write_begin+0xfb/0x120 [ceph]()
>> > [Thu Dec 17 14:27:59 2015] Modules linked in: iscsi_target_mod ...
>> >
>>
>> The page gets unlocked mystically. I still don't find any clue. Could
>> you please try the new patch (not incremental patch). Besides, please
>> enable CONFIG_DEBUG_VM when compiling the kernel.
>>
>> Thanks you very much
>> Yan, Zheng
>

I have just installed the cephfs_new.patch and have set CONFIG_DEBUG_VM=y on a new 4.4rc4 kernel and restarted the ESXi iSCSI test to my Ceph File System gateway. I plan to let it run overnight and report the status tomorrow.

Let me know if I should move on to 4.4rc5 with or without patches and with or without CONFIG_DEBUG_VM=y

Looking at the network traffic stats on my iSCSI gateway, with CONFIG_DEBUG_VM=y, throughput seems to be down by a factor of at least 10 compared to my last test without setting CONFIG_DEBUG_VM=y

Regards,
Eric
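Since the throughput drop is being attributed to CONFIG_DEBUG_VM=y, it can help to confirm which setting the running kernel was actually built with before comparing runs. The following is only a rough sketch: it assumes the distribution installs the build config as /boot/config-<kernel release> (common on Ubuntu, but not guaranteed), and the function name config_option_enabled is made up for illustration.

```python
import platform


def config_option_enabled(config_text, option):
    """Return True if `option` is set to "y" in kernel config text."""
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        # Disabled options appear as comments: "# CONFIG_FOO is not set".
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        # Exact name match, so CONFIG_DEBUG_VM_RB does not count.
        if name == option:
            return value == "y"
    return False


if __name__ == "__main__":
    # Assumption: the build config lives at /boot/config-<kernel release>.
    path = "/boot/config-" + platform.release()
    try:
        with open(path) as f:
            print(config_option_enabled(f.read(), "CONFIG_DEBUG_VM"))
    except OSError as exc:
        print("could not read", path, "-", exc)
```

If the config file is not installed on the gateway, the same function can be pointed at the .config used for the kernel build.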
Re: Issue with Ceph File System and LIO
On Fri, Dec 18, 2015 at 3:49 AM, Eric Eastman wrote:
> With cephfs.patch and cephfs1.patch applied and I am now seeing:
>
> [Thu Dec 17 14:27:59 2015] [ cut here ]
> [Thu Dec 17 14:27:59 2015] WARNING: CPU: 0 PID: 3036 at
> fs/ceph/addr.c:1171 ceph_write_begin+0xfb/0x120 [ceph]()
> [Thu Dec 17 14:27:59 2015] Modules linked in: iscsi_target_mod
> vhost_scsi tcm_qla2xxx ib_srpt tcm_fc tcm_usb_gadget tcm_loop
> target_core_file target_core_iblock target_core_pscsi target_core_user
> target_core_mod ipmi_devintf vhost qla2xxx ib_cm ib_sa ib_mad ib_core
> ib_addr libfc scsi_transport_fc libcomposite udc_core uio configfs ttm
> drm_kms_helper drm ipmi_ssif coretemp gpio_ich i2c_algo_bit kvm
> fb_sys_fops syscopyarea sysfillrect sysimgblt shpchp input_leds ceph
> irqbypass i7core_edac serio_raw hpilo edac_core ipmi_si
> ipmi_msghandler 8250_fintek lpc_ich acpi_power_meter libceph mac_hid
> libcrc32c fscache bonding lp parport mlx4_en vxlan ip6_udp_tunnel
> udp_tunnel ptp pps_core hid_generic usbhid hid mlx4_core hpsa psmouse
> bnx2 fjes scsi_transport_sas [last unloaded: target_core_mod]
> [Thu Dec 17 14:27:59 2015] CPU: 0 PID: 3036 Comm: iscsi_trx Tainted: G
> W I 4.4.0-rc4-ede2 #1
> [Thu Dec 17 14:27:59 2015] Hardware name: HP ProLiant DL360 G6, BIOS
> P64 01/22/2015
> [Thu Dec 17 14:27:59 2015] c02b2e37 880c0289b958
> 813ad644
> [Thu Dec 17 14:27:59 2015] 880c0289b990 81079702
> 880c0289ba50 000846c21000
> [Thu Dec 17 14:27:59 2015] 880c009ea200 1000
> ea00122ed700 880c0289b9a0
> [Thu Dec 17 14:27:59 2015] Call Trace:
> [Thu Dec 17 14:27:59 2015] [] dump_stack+0x44/0x60
> [Thu Dec 17 14:27:59 2015] []
> warn_slowpath_common+0x82/0xc0
> [Thu Dec 17 14:27:59 2015] [] warn_slowpath_null+0x1a/0x20
> [Thu Dec 17 14:27:59 2015] []
> ceph_write_begin+0xfb/0x120 [ceph]
> [Thu Dec 17 14:27:59 2015] []
> generic_perform_write+0xbf/0x1a0
> [Thu Dec 17 14:27:59 2015] []
> ceph_write_iter+0xf5c/0x1010 [ceph]
> [Thu Dec 17 14:27:59 2015] [] ? __schedule+0x386/0x9c0
> [Thu Dec 17 14:27:59 2015] [] ? schedule+0x35/0x80
> [Thu Dec 17 14:27:59 2015] [] ? __slab_free+0xb5/0x290
> [Thu Dec 17 14:27:59 2015] [] ?
> iov_iter_get_pages+0x113/0x210
> [Thu Dec 17 14:27:59 2015] [] vfs_iter_write+0x63/0xa0
> [Thu Dec 17 14:27:59 2015] []
> fd_do_rw.isra.5+0xc9/0x1b0 [target_core_file]
> [Thu Dec 17 14:27:59 2015] []
> fd_execute_rw+0xc5/0x2a0 [target_core_file]
> [Thu Dec 17 14:27:59 2015] []
> sbc_execute_rw+0x22/0x30 [target_core_mod]
> [Thu Dec 17 14:27:59 2015] []
> __target_execute_cmd+0x1f/0x70 [target_core_mod]
> [Thu Dec 17 14:27:59 2015] []
> target_execute_cmd+0x195/0x2a0 [target_core_mod]
> [Thu Dec 17 14:27:59 2015] []
> iscsit_execute_cmd+0x20a/0x270 [iscsi_target_mod]
> [Thu Dec 17 14:27:59 2015] []
> iscsit_sequence_cmd+0xda/0x190 [iscsi_target_mod]
> [Thu Dec 17 14:27:59 2015] []
> iscsi_target_rx_thread+0x51d/0xe30 [iscsi_target_mod]
> [Thu Dec 17 14:27:59 2015] [] ? __switch_to+0x1cd/0x570
> [Thu Dec 17 14:27:59 2015] [] ?
> iscsi_target_tx_thread+0x1c0/0x1c0 [iscsi_target_mod]
> [Thu Dec 17 14:27:59 2015] [] kthread+0xc9/0xe0
> [Thu Dec 17 14:27:59 2015] [] ?
> kthread_create_on_node+0x180/0x180
> [Thu Dec 17 14:27:59 2015] [] ret_from_fork+0x3f/0x70
> [Thu Dec 17 14:27:59 2015] [] ?
> kthread_create_on_node+0x180/0x180
> [Thu Dec 17 14:27:59 2015] ---[ end trace 8346192e3f29ed5d ]---
>

The page gets unlocked mysteriously. I still haven't found any clue. Could you please try the new patch (not an incremental patch)? Besides, please enable CONFIG_DEBUG_VM when compiling the kernel.

Thank you very much
Yan, Zheng

cephfs_new.patch
Description: Binary data
Re: Issue with Ceph File System and LIO
On Wed, Dec 16, 2015 at 12:51 AM, Eric Eastman wrote:
> I have opened ticket: 14086
>
> On Tue, Dec 15, 2015 at 5:05 AM, Yan, Zheng wrote:
>> On Tue, Dec 15, 2015 at 2:08 PM, Eric Eastman wrote:
>>> [Tue Dec 15 00:46:55 2015] [ cut here ]
>>> [Tue Dec 15 00:46:55 2015] WARNING: CPU: 0 PID: 1123421 at
>>> /home/kernel/COD/linux/fs/ceph/addr.c:125
>>
>> could you confirm that addr.c:125 is WARN_ON(!PageLocked(page));
>
> I am using the generic kernel from:
> http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.4-rc4-wily
> and assuming they did not change anything, from the 4.4rc4 source tree
> I pulled down shows:
>
> 124         ret = __set_page_dirty_nobuffers(page);
> 125         WARN_ON(!PageLocked(page));
> 126         WARN_ON(!page->mapping);
>
>
> modinfo ceph
> filename:       /lib/modules/4.4.0-040400rc4-generic/kernel/fs/ceph/ceph.ko
> license:        GPL
> description:    Ceph filesystem for Linux
> author:         Patience Warnick
> author:         Yehuda Sadeh
> author:         Sage Weil
> alias:          fs-ceph
> srcversion:     E94BA78C2D998705FE2C600
> depends:        libceph,fscache
> intree:         Y
> vermagic:       4.4.0-040400rc4-generic SMP mod_unload modversions
>
> This error has shown up about 20 times in 12 hours, since I started
> the ESXi test.
>

This warning is really strange. Could you try the attached debug patch.

Regards
Yan, Zheng

cephfs.patch
Description: Binary data
Re: Issue with Ceph File System and LIO
> This warning is really strange. Could you try the attached debug patch.
>
> Regards
> Yan, Zheng

I will try the patch and get back to the list.

Eric
Re: Issue with Ceph File System and LIO
I have opened ticket: 14086

On Tue, Dec 15, 2015 at 5:05 AM, Yan, Zheng wrote:
> On Tue, Dec 15, 2015 at 2:08 PM, Eric Eastman wrote:
>> [Tue Dec 15 00:46:55 2015] [ cut here ]
>> [Tue Dec 15 00:46:55 2015] WARNING: CPU: 0 PID: 1123421 at
>> /home/kernel/COD/linux/fs/ceph/addr.c:125
>
> could you confirm that addr.c:125 is WARN_ON(!PageLocked(page));

I am using the generic kernel from:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.4-rc4-wily
and, assuming they did not change anything, the 4.4rc4 source tree I pulled down shows:

124         ret = __set_page_dirty_nobuffers(page);
125         WARN_ON(!PageLocked(page));
126         WARN_ON(!page->mapping);

modinfo ceph
filename:       /lib/modules/4.4.0-040400rc4-generic/kernel/fs/ceph/ceph.ko
license:        GPL
description:    Ceph filesystem for Linux
author:         Patience Warnick
author:         Yehuda Sadeh
author:         Sage Weil
alias:          fs-ceph
srcversion:     E94BA78C2D998705FE2C600
depends:        libceph,fscache
intree:         Y
vermagic:       4.4.0-040400rc4-generic SMP mod_unload modversions

This error has shown up about 20 times in 12 hours since I started the ESXi test.

Thanks,
Eric
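For anyone wanting to put a firmer number on "about 20 times in 12 hours", saved dmesg output can be tallied by the source location named in each WARNING header. This is a rough sketch only: it assumes the header format seen in this thread ("WARNING: CPU: N PID: N at file:line"), the function name count_warnings is made up, and the sample text is modeled on the report rather than taken from a real log.

```python
import re
from collections import Counter

# Matches the WARNING header and captures the source path and line number.
# \s+ after "at" tolerates a header that was wrapped onto the next line.
WARNING_RE = re.compile(r"WARNING: CPU: \d+ PID: \d+ at\s+(\S+):(\d+)")


def count_warnings(dmesg_text):
    """Count kernel WARNING headers by (source path, line number)."""
    counts = Counter()
    for match in WARNING_RE.finditer(dmesg_text):
        path, line = match.group(1), int(match.group(2))
        counts[(path, line)] += 1
    return counts


# Illustrative sample: the second header is wrapped after "at".
sample = (
    "[Tue Dec 15 00:46:55 2015] WARNING: CPU: 0 PID: 1123421 at "
    "/home/kernel/COD/linux/fs/ceph/addr.c:125 "
    "ceph_set_page_dirty+0x230/0x240 [ceph]()\n"
    "[Tue Dec 15 00:46:55 2015] WARNING: CPU: 0 PID: 1123421 at\n"
    "/home/kernel/COD/linux/fs/ceph/addr.c:125 "
    "ceph_set_page_dirty+0x230/0x240 [ceph]()\n"
)

if __name__ == "__main__":
    for (path, line), n in count_warnings(sample).items():
        print(f"{path}:{line} seen {n} time(s)")
```

Feeding it the output of `dmesg -T` saved to a file gives a per-location count that can be compared across kernel builds.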
Re: Issue with Ceph File System and LIO
On 12/15/2015 12:08 AM, Eric Eastman wrote:
> I am testing Linux Target SCSI, LIO, with a Ceph File System backstore
> and I am seeing this error on my LIO gateway. I am using Ceph v9.2.0
> on a 4.4rc4 Kernel, on Trusty, using a kernel mounted Ceph File
> System. A file on the Ceph File System is exported via iSCSI to a
> VMware ESXi 5.0 server, and I am seeing this error when doing a lot of
> I/O on the ESXi server. Is this a LIO or a Ceph issue?
>
> [Tue Dec 15 00:46:55 2015] [ cut here ]
> [Tue Dec 15 00:46:55 2015] WARNING: CPU: 0 PID: 1123421 at
> /home/kernel/COD/linux/fs/ceph/addr.c:125
> ceph_set_page_dirty+0x230/0x240 [ceph]()
> [Tue Dec 15 00:46:55 2015] Modules linked in: iptable_filter ip_tables
> x_tables xfs rbd iscsi_target_mod vhost_scsi tcm_qla2xxx ib_srpt
> tcm_fc tcm_usb_gadget tcm_loop target_core_file target_core_iblock
> target_core_pscsi target_core_user target_core_mod ipmi_devintf vhost
> qla2xxx ib_cm ib_sa ib_mad ib_core ib_addr libfc scsi_transport_fc
> libcomposite udc_core uio configfs ipmi_ssif ttm drm_kms_helper
> gpio_ich drm i2c_algo_bit fb_sys_fops coretemp syscopyarea ipmi_si
> sysfillrect ipmi_msghandler sysimgblt kvm acpi_power_meter 8250_fintek
> irqbypass hpilo shpchp input_leds serio_raw lpc_ich i7core_edac
> edac_core mac_hid ceph libceph libcrc32c fscache bonding lp parport
> mlx4_en vxlan ip6_udp_tunnel udp_tunnel ptp pps_core hid_generic
> usbhid hid hpsa mlx4_core psmouse bnx2 scsi_transport_sas fjes [last
> unloaded: target_core_mod]
> [Tue Dec 15 00:46:55 2015] CPU: 0 PID: 1123421 Comm: iscsi_trx
> Tainted: G W I 4.4.0-040400rc4-generic #201512061930
> [Tue Dec 15 00:46:55 2015] Hardware name: HP ProLiant DL360 G6, BIOS
> P64 01/22/2015
> [Tue Dec 15 00:46:55 2015] fdc0ce43
> 880bf38c38c0 813c8ab4
> [Tue Dec 15 00:46:55 2015] 880bf38c38f8
> 8107d772 ea00127a8680
> [Tue Dec 15 00:46:55 2015] 8804e52c1448 8804e52c15b0
> 8804e52c10f0 0200
> [Tue Dec 15 00:46:55 2015] Call Trace:
> [Tue Dec 15 00:46:55 2015] [] dump_stack+0x44/0x60
> [Tue Dec 15 00:46:55 2015] []
> warn_slowpath_common+0x82/0xc0
> [Tue Dec 15 00:46:55 2015] [] warn_slowpath_null+0x1a/0x20
> [Tue Dec 15 00:46:55 2015] []
> ceph_set_page_dirty+0x230/0x240 [ceph]
> [Tue Dec 15 00:46:55 2015] [] ?
> pagecache_get_page+0x150/0x1c0
> [Tue Dec 15 00:46:55 2015] [] ?
> ceph_pool_perm_check+0x48/0x700 [ceph]
> [Tue Dec 15 00:46:55 2015] [] set_page_dirty+0x3d/0x70
> [Tue Dec 15 00:46:55 2015] []
> ceph_write_end+0x5e/0x180 [ceph]
> [Tue Dec 15 00:46:55 2015] [] ?
> iov_iter_copy_from_user_atomic+0x156/0x220
> [Tue Dec 15 00:46:55 2015] []
> generic_perform_write+0x114/0x1c0
> [Tue Dec 15 00:46:55 2015] []
> ceph_write_iter+0xf8a/0x1050 [ceph]
> [Tue Dec 15 00:46:55 2015] [] ?
> ceph_put_cap_refs+0x143/0x320 [ceph]
> [Tue Dec 15 00:46:55 2015] [] ?
> check_preempt_wakeup+0xfa/0x220
> [Tue Dec 15 00:46:55 2015] [] ? zone_statistics+0x7c/0xa0
> [Tue Dec 15 00:46:55 2015] [] ? copy_page_to_iter+0x5e/0xa0
> [Tue Dec 15 00:46:55 2015] [] ?
> skb_copy_datagram_iter+0x122/0x250
> [Tue Dec 15 00:46:55 2015] [] vfs_iter_write+0x76/0xc0
> [Tue Dec 15 00:46:55 2015] []
> fd_do_rw.isra.5+0xd8/0x1e0 [target_core_file]
> [Tue Dec 15 00:46:55 2015] []
> fd_execute_rw+0xc5/0x2a0 [target_core_file]
> [Tue Dec 15 00:46:55 2015] []
> sbc_execute_rw+0x22/0x30 [target_core_mod]
> [Tue Dec 15 00:46:55 2015] []
> __target_execute_cmd+0x1f/0x70 [target_core_mod]
> [Tue Dec 15 00:46:55 2015] []
> target_execute_cmd+0x195/0x2a0 [target_core_mod]
> [Tue Dec 15 00:46:55 2015] []
> iscsit_execute_cmd+0x20a/0x270 [iscsi_target_mod]
> [Tue Dec 15 00:46:55 2015] []
> iscsit_sequence_cmd+0xda/0x190 [iscsi_target_mod]
> [Tue Dec 15 00:46:55 2015] []
> iscsi_target_rx_thread+0x51d/0xe30 [iscsi_target_mod]
> [Tue Dec 15 00:46:55 2015] [] ? __switch_to+0x1dc/0x5a0
> [Tue Dec 15 00:46:55 2015] [] ?
> iscsi_target_tx_thread+0x1e0/0x1e0 [iscsi_target_mod]
> [Tue Dec 15 00:46:55 2015] [] kthread+0xd8/0xf0
> [Tue Dec 15 00:46:55 2015] [] ?
> kthread_create_on_node+0x1a0/0x1a0
> [Tue Dec 15 00:46:55 2015] [] ret_from_fork+0x3f/0x70
> [Tue Dec 15 00:46:55 2015] [] ?
> kthread_create_on_node+0x1a0/0x1a0
> [Tue Dec 15 00:46:55 2015] ---[ end trace 4079437668c77cbb ]---
> [Tue Dec 15 00:47:45 2015] ABORT_TASK: Found referenced iSCSI task_tag:
> 95784927
> [Tue Dec 15 00:47:45 2015] ABORT_TASK: ref_tag: 95784927 already
> complete, skipping
>

For writes, LIO just allocates pages using GFP_KERNEL, passes them to sock_recvmsg to read the data into them, then passes them to the fs using the function you see above, vfs_iter_write. So it does not do anything fancy.

Do we need to send specific types of pages to ceph?
Re: Issue with Ceph File System and LIO
On Tue, Dec 15, 2015 at 9:26 AM, Mike Christiewrote: > On 12/15/2015 12:08 AM, Eric Eastman wrote: >> I am testing Linux Target SCSI, LIO, with a Ceph File System backstore >> and I am seeing this error on my LIO gateway. I am using Ceph v9.2.0 >> on a 4.4rc4 Kernel, on Trusty, using a kernel mounted Ceph File >> System. A file on the Ceph File System is exported via iSCSI to a >> VMware ESXi 5.0 server, and I am seeing this error when doing a lot of >> I/O on the ESXi server. Is this a LIO or a Ceph issue? >> >> [Tue Dec 15 00:46:55 2015] [ cut here ] >> [Tue Dec 15 00:46:55 2015] WARNING: CPU: 0 PID: 1123421 at >> /home/kernel/COD/linux/fs/ceph/addr.c:125 >> ceph_set_page_dirty+0x230/0x240 [ceph]() >> [Tue Dec 15 00:46:55 2015] Modules linked in: iptable_filter ip_tables >> x_tables xfs rbd iscsi_target_mod vhost_scsi tcm_qla2xxx ib_srpt >> tcm_fc tcm_usb_gadget tcm_loop target_core_file target_core_iblock >> target_core_pscsi target_core_user target_core_mod ipmi_devintf vhost >> qla2xxx ib_cm ib_sa ib_mad ib_core ib_addr libfc scsi_transport_fc >> libcomposite udc_core uio configfs ipmi_ssif ttm drm_kms_helper >> gpio_ich drm i2c_algo_bit fb_sys_fops coretemp syscopyarea ipmi_si >> sysfillrect ipmi_msghandler sysimgblt kvm acpi_power_meter 8250_fintek >> irqbypass hpilo shpchp input_leds serio_raw lpc_ich i7core_edac >> edac_core mac_hid ceph libceph libcrc32c fscache bonding lp parport >> mlx4_en vxlan ip6_udp_tunnel udp_tunnel ptp pps_core hid_generic >> usbhid hid hpsa mlx4_core psmouse bnx2 scsi_transport_sas fjes [last >> unloaded: target_core_mod] >> [Tue Dec 15 00:46:55 2015] CPU: 0 PID: 1123421 Comm: iscsi_trx >> Tainted: GW I 4.4.0-040400rc4-generic #201512061930 >> [Tue Dec 15 00:46:55 2015] Hardware name: HP ProLiant DL360 G6, BIOS >> P64 01/22/2015 >> [Tue Dec 15 00:46:55 2015] fdc0ce43 >> 880bf38c38c0 813c8ab4 >> [Tue Dec 15 00:46:55 2015] 880bf38c38f8 >> 8107d772 ea00127a8680 >> [Tue Dec 15 00:46:55 2015] 8804e52c1448 8804e52c15b0 >> 8804e52c10f0 
0200 >> [Tue Dec 15 00:46:55 2015] Call Trace: >> [Tue Dec 15 00:46:55 2015] [] dump_stack+0x44/0x60 >> [Tue Dec 15 00:46:55 2015] [] >> warn_slowpath_common+0x82/0xc0 >> [Tue Dec 15 00:46:55 2015] [] warn_slowpath_null+0x1a/0x20 >> [Tue Dec 15 00:46:55 2015] [] >> ceph_set_page_dirty+0x230/0x240 [ceph] >> [Tue Dec 15 00:46:55 2015] [] ? >> pagecache_get_page+0x150/0x1c0 >> [Tue Dec 15 00:46:55 2015] [] ? >> ceph_pool_perm_check+0x48/0x700 [ceph] >> [Tue Dec 15 00:46:55 2015] [] set_page_dirty+0x3d/0x70 >> [Tue Dec 15 00:46:55 2015] [] >> ceph_write_end+0x5e/0x180 [ceph] >> [Tue Dec 15 00:46:55 2015] [] ? >> iov_iter_copy_from_user_atomic+0x156/0x220 >> [Tue Dec 15 00:46:55 2015] [] >> generic_perform_write+0x114/0x1c0 >> [Tue Dec 15 00:46:55 2015] [] >> ceph_write_iter+0xf8a/0x1050 [ceph] >> [Tue Dec 15 00:46:55 2015] [] ? >> ceph_put_cap_refs+0x143/0x320 [ceph] >> [Tue Dec 15 00:46:55 2015] [] ? >> check_preempt_wakeup+0xfa/0x220 >> [Tue Dec 15 00:46:55 2015] [] ? zone_statistics+0x7c/0xa0 >> [Tue Dec 15 00:46:55 2015] [] ? >> copy_page_to_iter+0x5e/0xa0 >> [Tue Dec 15 00:46:55 2015] [] ? >> skb_copy_datagram_iter+0x122/0x250 >> [Tue Dec 15 00:46:55 2015] [] vfs_iter_write+0x76/0xc0 >> [Tue Dec 15 00:46:55 2015] [] >> fd_do_rw.isra.5+0xd8/0x1e0 [target_core_file] >> [Tue Dec 15 00:46:55 2015] [] >> fd_execute_rw+0xc5/0x2a0 [target_core_file] >> [Tue Dec 15 00:46:55 2015] [] >> sbc_execute_rw+0x22/0x30 [target_core_mod] >> [Tue Dec 15 00:46:55 2015] [] >> __target_execute_cmd+0x1f/0x70 [target_core_mod] >> [Tue Dec 15 00:46:55 2015] [] >> target_execute_cmd+0x195/0x2a0 [target_core_mod] >> [Tue Dec 15 00:46:55 2015] [] >> iscsit_execute_cmd+0x20a/0x270 [iscsi_target_mod] >> [Tue Dec 15 00:46:55 2015] [] >> iscsit_sequence_cmd+0xda/0x190 [iscsi_target_mod] >> [Tue Dec 15 00:46:55 2015] [] >> iscsi_target_rx_thread+0x51d/0xe30 [iscsi_target_mod] >> [Tue Dec 15 00:46:55 2015] [] ? __switch_to+0x1dc/0x5a0 >> [Tue Dec 15 00:46:55 2015] [] ? 
>> iscsi_target_tx_thread+0x1e0/0x1e0 [iscsi_target_mod] >> [Tue Dec 15 00:46:55 2015] [] kthread+0xd8/0xf0 >> [Tue Dec 15 00:46:55 2015] [] ? >> kthread_create_on_node+0x1a0/0x1a0 >> [Tue Dec 15 00:46:55 2015] [] ret_from_fork+0x3f/0x70 >> [Tue Dec 15 00:46:55 2015] [] ? >> kthread_create_on_node+0x1a0/0x1a0 >> [Tue Dec 15 00:46:55 2015] ---[ end trace 4079437668c77cbb ]--- >> [Tue Dec 15 00:47:45 2015] ABORT_TASK: Found referenced iSCSI task_tag: >> 95784927 >> [Tue Dec 15 00:47:45 2015] ABORT_TASK: ref_tag: 95784927 already >> complete, skipping Looks likely to be a kclient bug, as it's in the newish pool_perm_check path. Perhaps we don't usually see this because we'd usually hit the permissions checks earlier (or during a read). CCing zyan, who will have a better idea than me. Eric: you should probably go ahead and
Re: Issue with Ceph File System and LIO
On Tue, Dec 15, 2015 at 2:08 PM, Eric Eastmanwrote: > I am testing Linux Target SCSI, LIO, with a Ceph File System backstore > and I am seeing this error on my LIO gateway. I am using Ceph v9.2.0 > on a 4.4rc4 Kernel, on Trusty, using a kernel mounted Ceph File > System. A file on the Ceph File System is exported via iSCSI to a > VMware ESXi 5.0 server, and I am seeing this error when doing a lot of > I/O on the ESXi server. Is this a LIO or a Ceph issue? > > [Tue Dec 15 00:46:55 2015] [ cut here ] > [Tue Dec 15 00:46:55 2015] WARNING: CPU: 0 PID: 1123421 at > /home/kernel/COD/linux/fs/ceph/addr.c:125 could you confirm that addr.c:125 is WARN_ON(!PageLocked(page)); Regards Yan, Zheng > ceph_set_page_dirty+0x230/0x240 [ceph]() > [Tue Dec 15 00:46:55 2015] Modules linked in: iptable_filter ip_tables > x_tables xfs rbd iscsi_target_mod vhost_scsi tcm_qla2xxx ib_srpt > tcm_fc tcm_usb_gadget tcm_loop target_core_file target_core_iblock > target_core_pscsi target_core_user target_core_mod ipmi_devintf vhost > qla2xxx ib_cm ib_sa ib_mad ib_core ib_addr libfc scsi_transport_fc > libcomposite udc_core uio configfs ipmi_ssif ttm drm_kms_helper > gpio_ich drm i2c_algo_bit fb_sys_fops coretemp syscopyarea ipmi_si > sysfillrect ipmi_msghandler sysimgblt kvm acpi_power_meter 8250_fintek > irqbypass hpilo shpchp input_leds serio_raw lpc_ich i7core_edac > edac_core mac_hid ceph libceph libcrc32c fscache bonding lp parport > mlx4_en vxlan ip6_udp_tunnel udp_tunnel ptp pps_core hid_generic > usbhid hid hpsa mlx4_core psmouse bnx2 scsi_transport_sas fjes [last > unloaded: target_core_mod] > [Tue Dec 15 00:46:55 2015] CPU: 0 PID: 1123421 Comm: iscsi_trx > Tainted: GW I 4.4.0-040400rc4-generic #201512061930 > [Tue Dec 15 00:46:55 2015] Hardware name: HP ProLiant DL360 G6, BIOS > P64 01/22/2015 > [Tue Dec 15 00:46:55 2015] fdc0ce43 > 880bf38c38c0 813c8ab4 > [Tue Dec 15 00:46:55 2015] 880bf38c38f8 > 8107d772 ea00127a8680 > [Tue Dec 15 00:46:55 2015] 8804e52c1448 8804e52c15b0 > 
8804e52c10f0 0200 > [Tue Dec 15 00:46:55 2015] Call Trace: > [Tue Dec 15 00:46:55 2015] [] dump_stack+0x44/0x60 > [Tue Dec 15 00:46:55 2015] [] > warn_slowpath_common+0x82/0xc0 > [Tue Dec 15 00:46:55 2015] [] warn_slowpath_null+0x1a/0x20 > [Tue Dec 15 00:46:55 2015] [] > ceph_set_page_dirty+0x230/0x240 [ceph] > [Tue Dec 15 00:46:55 2015] [] ? > pagecache_get_page+0x150/0x1c0 > [Tue Dec 15 00:46:55 2015] [] ? > ceph_pool_perm_check+0x48/0x700 [ceph] > [Tue Dec 15 00:46:55 2015] [] set_page_dirty+0x3d/0x70 > [Tue Dec 15 00:46:55 2015] [] > ceph_write_end+0x5e/0x180 [ceph] > [Tue Dec 15 00:46:55 2015] [] ? > iov_iter_copy_from_user_atomic+0x156/0x220 > [Tue Dec 15 00:46:55 2015] [] > generic_perform_write+0x114/0x1c0 > [Tue Dec 15 00:46:55 2015] [] > ceph_write_iter+0xf8a/0x1050 [ceph] > [Tue Dec 15 00:46:55 2015] [] ? > ceph_put_cap_refs+0x143/0x320 [ceph] > [Tue Dec 15 00:46:55 2015] [] ? > check_preempt_wakeup+0xfa/0x220 > [Tue Dec 15 00:46:55 2015] [] ? zone_statistics+0x7c/0xa0 > [Tue Dec 15 00:46:55 2015] [] ? copy_page_to_iter+0x5e/0xa0 > [Tue Dec 15 00:46:55 2015] [] ? > skb_copy_datagram_iter+0x122/0x250 > [Tue Dec 15 00:46:55 2015] [] vfs_iter_write+0x76/0xc0 > [Tue Dec 15 00:46:55 2015] [] > fd_do_rw.isra.5+0xd8/0x1e0 [target_core_file] > [Tue Dec 15 00:46:55 2015] [] > fd_execute_rw+0xc5/0x2a0 [target_core_file] > [Tue Dec 15 00:46:55 2015] [] > sbc_execute_rw+0x22/0x30 [target_core_mod] > [Tue Dec 15 00:46:55 2015] [] > __target_execute_cmd+0x1f/0x70 [target_core_mod] > [Tue Dec 15 00:46:55 2015] [] > target_execute_cmd+0x195/0x2a0 [target_core_mod] > [Tue Dec 15 00:46:55 2015] [] > iscsit_execute_cmd+0x20a/0x270 [iscsi_target_mod] > [Tue Dec 15 00:46:55 2015] [] > iscsit_sequence_cmd+0xda/0x190 [iscsi_target_mod] > [Tue Dec 15 00:46:55 2015] [] > iscsi_target_rx_thread+0x51d/0xe30 [iscsi_target_mod] > [Tue Dec 15 00:46:55 2015] [] ? __switch_to+0x1dc/0x5a0 > [Tue Dec 15 00:46:55 2015] [] ? 
> iscsi_target_tx_thread+0x1e0/0x1e0 [iscsi_target_mod] > [Tue Dec 15 00:46:55 2015] [] kthread+0xd8/0xf0 > [Tue Dec 15 00:46:55 2015] [] ? > kthread_create_on_node+0x1a0/0x1a0 > [Tue Dec 15 00:46:55 2015] [] ret_from_fork+0x3f/0x70 > [Tue Dec 15 00:46:55 2015] [] ? > kthread_create_on_node+0x1a0/0x1a0 > [Tue Dec 15 00:46:55 2015] ---[ end trace 4079437668c77cbb ]--- > [Tue Dec 15 00:47:45 2015] ABORT_TASK: Found referenced iSCSI task_tag: > 95784927 > [Tue Dec 15 00:47:45 2015] ABORT_TASK: ref_tag: 95784927 already > complete, skipping > > If it is a Ceph File System issue, let me know and I will open a bug. > > Thanks > > Eric > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this