Re: race condition in dm-crypt?
On Fri, Mar 23, 2007 at 06:42:14PM +0100, markus reichelt wrote:
> * "Jan C. Nordholz" <[EMAIL PROTECTED]> wrote:
> > I'm seeing this for quite a while now (since 2.6.16 at least), but
> > without any obvious indicator to what might be causing it... where
> > should I continue debugging this?
> I bet folks at [EMAIL PROTECTED] would love to hear about this.

As mentioned in another thread, please try these patches if they aren't
already in your kernel:

http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/2.6.19/dm-io-fix-bi_max_vecs.patch
http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/dm-merge-max_hw_sector.patch
http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/dm-crypt-disable-barriers.patch
http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/dm-crypt-fix-call-to-clone_init.patch
http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/dm-crypt-fix-avoid-cloned-bio-ref-after-free.patch
http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/dm-crypt-fix-remove-first_clone.patch
http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/dm-crypt-use-smaller-bvecs-in-clones.patch

Alasdair
--
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: race condition in dm-crypt?
On Fri, 2007-03-23 at 21:41 +0100, Christoph Maier wrote:
> Jan C. Nordholz wrote:
> > I think I'm experiencing a race condition: Irregularly my kernel runs
> > into an Oops when it tries to initialize my crypt containers.
>
> FYI, there are similar reports on the net, going as far back as May 2006:
> http://article.gmane.org/gmane.linux.kernel.device-mapper.dm-crypt/1636
> is the oldest one I could find.
>
> Bugzilla entry: http://bugzilla.kernel.org/show_bug.cgi?id=7388
>
> I, too, ran into the bug and failed to reproduce it. However, it might
> be worth knowing that the system went to 100% iowait afterwards.

Very interesting, actually. I run dm-crypt myself, and fairly regularly my
I/O stops for 5-10 seconds with seemingly no errors or high load; I/O just
stalls and then returns after a while.

> Regards, Christoph Maier
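
For anyone who wants to put numbers on stalls like these, here is a minimal sketch (not something posted in this thread) that measures the share of CPU time spent in iowait between two readings of the aggregate "cpu" line in /proc/stat. The function names and the one-second interval are hypothetical choices; the field order (user, nice, system, idle, iowait, irq, softirq, steal) is the documented /proc/stat layout.

```python
import time


def iowait_fraction(before, after):
    """Share of CPU time spent in iowait between two aggregate 'cpu' lines.

    Both arguments are the first line of /proc/stat, e.g.
    "cpu 100 0 50 800 50 0 0 0". Only the first eight counters are summed
    (user nice system idle iowait irq softirq steal); later guest counters,
    where present, are already included in user time.
    """
    a = [int(x) for x in before.split()[1:9]]
    b = [int(x) for x in after.split()[1:9]]
    delta = [y - x for x, y in zip(a, b)]
    total = sum(delta)
    # iowait is the fifth counter (index 4) on the cpu line.
    return delta[4] / total if total else 0.0


def read_cpu_line(path="/proc/stat"):
    """Return the aggregate 'cpu' line (Linux only)."""
    with open(path) as f:
        return f.readline()


def sample(interval=1.0):
    """Hypothetical helper: iowait share over one sampling interval."""
    before = read_cpu_line()
    time.sleep(interval)
    return iowait_fraction(before, read_cpu_line())
```

Calling sample() in a loop while the stall happens should show whether the machine really sits near 100% iowait, as Christoph reported, or whether I/O is simply not being issued.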
Re: race condition in dm-crypt?
Jan C. Nordholz wrote:
> I think I'm experiencing a race condition: Irregularly my kernel runs
> into an Oops when it tries to initialize my crypt containers.

FYI, there are similar reports on the net, going as far back as May 2006:
http://article.gmane.org/gmane.linux.kernel.device-mapper.dm-crypt/1636
is the oldest one I could find.

Bugzilla entry: http://bugzilla.kernel.org/show_bug.cgi?id=7388

I, too, ran into the bug and failed to reproduce it. However, it might
be worth knowing that the system went to 100% iowait afterwards.

Regards, Christoph Maier
Re: race condition in dm-crypt?
* "Jan C. Nordholz" <[EMAIL PROTECTED]> wrote:
> I'm seeing this for quite a while now (since 2.6.16 at least), but
> without any obvious indicator to what might be causing it... where
> should I continue debugging this?

I bet folks at [EMAIL PROTECTED] would love to hear about this.

--
left blank, right bald
race condition in dm-crypt?
Dear list,

I think I'm experiencing a race condition: Irregularly my kernel runs
into an Oops when it tries to initialize my crypt containers.

> Mar 23 17:33:08 1A:hejre kernel: BUG: unable to handle kernel NULL pointer dereference at virtual address
> Mar 23 17:33:08 1A:hejre kernel: printing eip:
> Mar 23 17:33:08 4A:hejre kernel: c0143543
> Mar 23 17:33:08 1A:hejre kernel: *pde =
> Mar 23 17:33:08 0A:hejre kernel: Oops: [#1]
> Mar 23 17:33:08 0A:hejre kernel: PREEMPT
> Mar 23 17:33:08 4A:hejre kernel: Modules linked in: xt_NFQUEUE xt_tcpudp xt_state xt_limit xt_CONNMARK xt_connmark xt_multiport ipt_REDIRECT ipt_MASQUERADE ipt_LOG nfnetlink_queue nfnetlink iptable_mangle iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack iptable_filter ip_tables x_tables ipv6 xfrm4_mode_transport esp4 deflate des md5 crypto_null hmac crypto_hash af_key ntfs sha256 aes_i586 cbc dm_crypt dm_mod snd_virmidi snd_seq_virmidi snd_ca0106 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm ac97_bus snd_page_alloc snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device snd soundcore usb_storage sd_mod scsi_mod nls_utf8 nls_cp850 nls_iso8859_1 hisax_fcpcipnp hisax_isac hisax via686a w83781d hwmon_vid hwmon i2c_isa i2c_viapro i2c_core msr isdn_bsdcomp isdn cpuid rtc
> Mar 23 17:33:08 0A:hejre kernel: CPU:0
> Mar 23 17:33:08 0A:hejre kernel: EIP:0060:[c0143543]Not tainted VLI
> Mar 23 17:33:08 0A:hejre kernel: EFLAGS: 00010282 (2.6.20-git12 #2)
> Mar 23 17:33:08 0A:hejre kernel: EIP is at mempool_free+0x13/0xb0
> Mar 23 17:33:08 0A:hejre kernel: eax: cdf39d44 ebx: cdf39d44 ecx: 0001 edx:
> Mar 23 17:33:08 0A:hejre kernel: esi: edi: cdf39d44 ebp: dff19eb4 esp: dff19ea4
> Mar 23 17:33:08 0A:hejre kernel: ds: 007b es: 007b fs: 00d8 gs: ss: 0068
> Mar 23 17:33:08 0A:hejre kernel: Process kcryptd/0 (pid: 772, ti=dff18000 task=dff050d0 task.ti=dff18000)
> Mar 23 17:33:08 0A:hejre kernel: Stack: c011c58b cdf39d44 dd87f9e0 dff19ed8 e092c792 e092d7c4 cdf39d44
> Mar 23 17:33:08 0A:hejre kernel: e0a45040 dd87f9e0 dff19f20 dd87f9e0 dfbb76e0 dff19f4c e092cc1b d10abbe0
> Mar 23 17:33:08 0A:hejre kernel: 007c d10abbe0 cdf39d44 d10ab9a0 c0117598 000f
> Mar 23 17:33:08 0A:hejre kernel: Call Trace:
> Mar 23 17:33:08 0A:hejre kernel: [c010500a] show_trace_log_lvl+0x1a/0x30
> Mar 23 17:33:08 0A:hejre kernel: [c01050c9] show_stack_log_lvl+0xa9/0xd0
> Mar 23 17:33:08 0A:hejre kernel: [c01052d1] show_registers+0x1e1/0x330
> Mar 23 17:33:08 0A:hejre kernel: [c010552e] die+0x10e/0x230
> Mar 23 17:33:08 0A:hejre kernel: [c01168a0] do_page_fault+0x2b0/0x5d0
> Mar 23 17:33:08 0A:hejre kernel: [c0319bd4] error_code+0x74/0x7c
> Mar 23 17:33:08 0A:hejre kernel: [e092c792] dec_pending+0x62/0x80 [dm_crypt]
> Mar 23 17:33:08 0A:hejre kernel: [e092cc1b] kcryptd_do_work+0x2fb/0x3b0 [dm_crypt]
> Mar 23 17:33:08 0A:hejre kernel: [c012b6e4] run_workqueue+0xa4/0x180
> Mar 23 17:33:08 0A:hejre kernel: [c012bde7] worker_thread+0x137/0x160
> Mar 23 17:33:08 0A:hejre kernel: [c012ec73] kthread+0xa3/0xd0
> Mar 23 17:33:08 0A:hejre kernel: [c0104c3b] kernel_thread_helper+0x7/0x1c
> Mar 23 17:33:08 0A:hejre kernel: ===
> Mar 23 17:33:08 0A:hejre kernel: Code: f4 45 1d 00 8d 74 26 00 eb d3 31 db e9 37 ff ff ff 8d b4 26 00 00 00 00 55 89 e5 83 ec 10 89 75 f8 89 7d fc 89 d6 89 5d f4 89 c7 <8b> 02 39 42 04 7d 27 9c 5b fa 89 e0 25 00 e0 ff ff ff 40 14 8b
> Mar 23 17:33:08 0A:hejre kernel: EIP: [c0143543] mempool_free+0x13/0xb0 SS:ESP 0068:dff19ea4

The userland process triggering the BUG gets stuck in sync_page, but the
rest of the system survives.

-

I suspected dec_pending's cc pointer to become invalid, so I inserted a
few printk()s in crypt_ctr, _dtr and dec_pending, et voilà:

(successful cryptsetup, a few days ago)
> Mar 21 20:37:31 6A:hejre kernel: Crypt_Ctr, DmTarget e0a4c040 now has CryptConfig daf12aa0.
> Mar 21 20:37:31 4A:hejre kernel: This is dm_crypt::dec_pending, IO d4e03e84, DmTgt e0a4c040, CC daf12aa0.
> Mar 21 20:37:31 4A:hejre last message repeated 124 times
> Mar 21 20:37:31 6A:hejre kernel: Crypt_Dtr, freeing DmTarget e0a4c040's CryptConfig daf12aa0.
> Mar 21 20:37:31 6A:hejre kernel: Crypt_Ctr, DmTarget e0a4c040 now has CryptConfig daf12ba0.
> Mar 21 20:37:31 4A:hejre kernel: This is dm_crypt::dec_pending, IO d4e03e84, DmTgt e0a4c040, CC daf12ba0.
> Mar 21 20:37:31 4A:hejre kernel: This is dm_crypt::dec_pending, IO d4e03e5c, DmTgt e0a4c040, CC daf12ba0.
> Mar 21 20:37:31 4A:hejre kernel: This is dm_crypt::dec_pending, IO d4e03e34, DmTgt e0a4c040, CC daf12ba0.
> [... and continues happily]

(context of the above oops)
> Mar 23 17:33:08 6A:hejre kernel: Crypt_Ctr, DmTarget e0a45040 now has CryptConfig dd87f9e0.
> Mar 23 17:33:08 4A:hejre kernel: This is dm_crypt::dec_pending, IO cdf39d44, DmTgt e0a45040, CC dd87f9e0.
> Mar 23 17:33:08 4A:hejre kernel: This is
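
The printk output above can also be checked mechanically. The following is only a sketch, not part of the original report: it assumes one message per syslog line in exactly the Crypt_Ctr / Crypt_Dtr / dec_pending wording shown above, and it flags any dec_pending that names a CryptConfig already freed by Crypt_Dtr. The function name find_stale_uses is hypothetical, and the last sample line is invented purely to show what a hit would look like; it does not appear in the posted logs.

```python
import re

# Patterns follow the wording of the printk()s quoted in the report.
CTR = re.compile(r"Crypt_Ctr, DmTarget (\w+) now has CryptConfig (\w+)\.")
DTR = re.compile(r"Crypt_Dtr, freeing DmTarget (\w+)'s CryptConfig (\w+)\.")
DEC = re.compile(r"dm_crypt::dec_pending, IO (\w+), DmTgt (\w+), CC (\w+)\.")


def find_stale_uses(lines):
    """Return (line_no, io, cc) for each dec_pending that names a freed CC."""
    freed = set()
    stale = []
    for no, line in enumerate(lines, start=1):
        m = CTR.search(line)
        if m:
            # A constructor may reuse an address that was freed earlier.
            freed.discard(m.group(2))
            continue
        m = DTR.search(line)
        if m:
            freed.add(m.group(2))
            continue
        m = DEC.search(line)
        if m and m.group(3) in freed:
            stale.append((no, m.group(1), m.group(3)))
    return stale


# Messages taken from the successful run quoted above.
SAMPLE = [
    "Crypt_Ctr, DmTarget e0a4c040 now has CryptConfig daf12aa0.",
    "This is dm_crypt::dec_pending, IO d4e03e84, DmTgt e0a4c040, CC daf12aa0.",
    "Crypt_Dtr, freeing DmTarget e0a4c040's CryptConfig daf12aa0.",
    "Crypt_Ctr, DmTarget e0a4c040 now has CryptConfig daf12ba0.",
    "This is dm_crypt::dec_pending, IO d4e03e84, DmTgt e0a4c040, CC daf12ba0.",
]

# HYPOTHETICAL line, not from the posted logs: a late completion that
# still points at the CryptConfig freed on line 3.
HYPOTHETICAL_STALE = (
    "This is dm_crypt::dec_pending, IO d4e03e5c, DmTgt e0a4c040, CC daf12aa0."
)

if __name__ == "__main__":
    print(find_stale_uses(SAMPLE))
    print(find_stale_uses(SAMPLE + [HYPOTHETICAL_STALE]))
```

On the successful run nothing is flagged. Since the log for the failing run is cut off, whether it contains such a stale reference cannot be told from what was posted.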