Re: race condition in dm-crypt?

2007-03-26 Thread Alasdair G Kergon
On Fri, Mar 23, 2007 at 06:42:14PM +0100, markus reichelt wrote:
> * "Jan C. Nordholz" <[EMAIL PROTECTED]> wrote:
> > I'm seeing this for quite a while now (since 2.6.16 at least), but
> > without any obvious indicator to what might be causing it... where
> > should I continue debugging this?
> I bet folks at [EMAIL PROTECTED] would love to hear about this.
 
As mentioned in another thread, please try these patches if they aren't already
in your kernel:

  
  http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/2.6.19/dm-io-fix-bi_max_vecs.patch
  http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/dm-merge-max_hw_sector.patch
  http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/dm-crypt-disable-barriers.patch
  http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/dm-crypt-fix-call-to-clone_init.patch
  http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/dm-crypt-fix-avoid-cloned-bio-ref-after-free.patch
  http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/dm-crypt-fix-remove-first_clone.patch
  http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/dm-crypt-use-smaller-bvecs-in-clones.patch

Alasdair
-- 
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: race condition in dm-crypt?

2007-03-24 Thread Kasper Sandberg
On Fri, 2007-03-23 at 21:41 +0100, Christoph Maier wrote:
> Jan C. Nordholz wrote:
> > I think I'm experiencing a race condition: Irregularly my kernel runs
> > into an Oops when it tries to initialize my crypt containers.
> 
> FYI, there are similar reports on the net, going as far back as May 2006:
> http://article.gmane.org/gmane.linux.kernel.device-mapper.dm-crypt/1636 
> is the oldest one I could find.
> 
> Bugzilla entry: http://bugzilla.kernel.org/show_bug.cgi?id=7388
> 
> I, too, ran into the bug and failed to reproduce it. However, it might 
> be worth knowing that the system went to 100% iowait afterwards.
Very interesting, actually. I run dm-crypt myself, and fairly regularly
my I/O stops for 5-10 seconds with seemingly no errors or high load; I/O
just stalls and then resumes after a while.

> 
> Regards, Christoph Maier
> 


Re: race condition in dm-crypt?

2007-03-23 Thread Christoph Maier

Jan C. Nordholz wrote:
> I think I'm experiencing a race condition: Irregularly my kernel runs
> into an Oops when it tries to initialize my crypt containers.


FYI, there are similar reports on the net, going as far back as May 2006:
http://article.gmane.org/gmane.linux.kernel.device-mapper.dm-crypt/1636 
is the oldest one I could find.


Bugzilla entry: http://bugzilla.kernel.org/show_bug.cgi?id=7388

I, too, ran into the bug and failed to reproduce it. However, it might 
be worth knowing that the system went to 100% iowait afterwards.


Regards, Christoph Maier



Re: race condition in dm-crypt?

2007-03-23 Thread markus reichelt
* "Jan C. Nordholz" <[EMAIL PROTECTED]> wrote:

> I'm seeing this for quite a while now (since 2.6.16 at least), but
> without any obvious indicator to what might be causing it... where
> should I continue debugging this?

I bet folks at [EMAIL PROTECTED] would love to hear about this.

-- 
left blank, right bald




race condition in dm-crypt?

2007-03-23 Thread Jan C. Nordholz
Dear list,

I think I'm experiencing a race condition: Irregularly my kernel runs
into an Oops when it tries to initialize my crypt containers.

> Mar 23 17:33:08 1A:hejre kernel: BUG: unable to handle kernel NULL pointer 
> dereference at virtual address 
> Mar 23 17:33:08 1A:hejre kernel:  printing eip:
> Mar 23 17:33:08 4A:hejre kernel: c0143543
> Mar 23 17:33:08 1A:hejre kernel: *pde = 
> Mar 23 17:33:08 0A:hejre kernel: Oops:  [#1]
> Mar 23 17:33:08 0A:hejre kernel: PREEMPT 
> Mar 23 17:33:08 4A:hejre kernel: Modules linked in: xt_NFQUEUE xt_tcpudp 
> xt_state xt_limit xt_CONNMARK xt_connmark xt_multiport ipt_REDIRECT 
> ipt_MASQUERADE ipt_LOG nfnetlink_queue nfnetlink iptable_mangle iptable_nat 
> nf_nat nf_conntrack_ipv4 nf_conntrack iptable_filter ip_tables x_tables ipv6 
> xfrm4_mode_transport esp4 deflate des md5 crypto_null hmac crypto_hash af_key 
> ntfs sha256 aes_i586 cbc dm_crypt dm_mod snd_virmidi snd_seq_virmidi 
> snd_ca0106 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm ac97_bus 
> snd_page_alloc snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi 
> snd_seq_midi_event snd_seq snd_timer snd_seq_device snd soundcore usb_storage 
> sd_mod scsi_mod nls_utf8 nls_cp850 nls_iso8859_1 hisax_fcpcipnp hisax_isac 
> hisax via686a w83781d hwmon_vid hwmon i2c_isa i2c_viapro i2c_core msr 
> isdn_bsdcomp isdn cpuid rtc
> Mar 23 17:33:08 0A:hejre kernel: CPU:0
> Mar 23 17:33:08 0A:hejre kernel: EIP:0060:[<c0143543>] Not tainted VLI
> Mar 23 17:33:08 0A:hejre kernel: EFLAGS: 00010282   (2.6.20-git12 #2)
> Mar 23 17:33:08 0A:hejre kernel: EIP is at mempool_free+0x13/0xb0
> Mar 23 17:33:08 0A:hejre kernel: eax: cdf39d44   ebx: cdf39d44   ecx: 
> 0001   edx: 
> Mar 23 17:33:08 0A:hejre kernel: esi:    edi: cdf39d44   ebp: 
> dff19eb4   esp: dff19ea4
> Mar 23 17:33:08 0A:hejre kernel: ds: 007b   es: 007b   fs: 00d8  gs:   
> ss: 0068
> Mar 23 17:33:08 0A:hejre kernel: Process kcryptd/0 (pid: 772, ti=dff18000 
> task=dff050d0 task.ti=dff18000)
> Mar 23 17:33:08 0A:hejre kernel: Stack: c011c58b cdf39d44  dd87f9e0 
> dff19ed8 e092c792 e092d7c4 cdf39d44 
> Mar 23 17:33:08 0A:hejre kernel:e0a45040 dd87f9e0 dff19f20 dd87f9e0 
> dfbb76e0 dff19f4c e092cc1b d10abbe0 
> Mar 23 17:33:08 0A:hejre kernel:007c   d10abbe0 
> cdf39d44 d10ab9a0 c0117598 000f 
> Mar 23 17:33:08 0A:hejre kernel: Call Trace:
> Mar 23 17:33:08 0A:hejre kernel:  [<c010500a>] show_trace_log_lvl+0x1a/0x30
> Mar 23 17:33:08 0A:hejre kernel:  [<c01050c9>] show_stack_log_lvl+0xa9/0xd0
> Mar 23 17:33:08 0A:hejre kernel:  [<c01052d1>] show_registers+0x1e1/0x330
> Mar 23 17:33:08 0A:hejre kernel:  [<c010552e>] die+0x10e/0x230
> Mar 23 17:33:08 0A:hejre kernel:  [<c01168a0>] do_page_fault+0x2b0/0x5d0
> Mar 23 17:33:08 0A:hejre kernel:  [<c0319bd4>] error_code+0x74/0x7c
> Mar 23 17:33:08 0A:hejre kernel:  [<e092c792>] dec_pending+0x62/0x80
> [dm_crypt]
> Mar 23 17:33:08 0A:hejre kernel:  [<e092cc1b>] kcryptd_do_work+0x2fb/0x3b0
> [dm_crypt]
> Mar 23 17:33:08 0A:hejre kernel:  [<c012b6e4>] run_workqueue+0xa4/0x180
> Mar 23 17:33:08 0A:hejre kernel:  [<c012bde7>] worker_thread+0x137/0x160
> Mar 23 17:33:08 0A:hejre kernel:  [<c012ec73>] kthread+0xa3/0xd0
> Mar 23 17:33:08 0A:hejre kernel:  [<c0104c3b>] kernel_thread_helper+0x7/0x1c
> Mar 23 17:33:08 0A:hejre kernel:  ===
> Mar 23 17:33:08 0A:hejre kernel: Code: f4 45 1d 00 8d 74 26 00 eb d3 31 db e9 
> 37 ff ff ff 8d b4 26 00 00 00 00 55 89 e5 83 ec 10 89 75 f8 89 7d fc 89 d6 89 
> 5d f4 89 c7 <8b> 02 39 42 04 7d 27 9c 5b fa 89 e0 25 00 e0 ff ff ff 40 14 8b
> Mar 23 17:33:08 0A:hejre kernel: EIP: [<c0143543>] mempool_free+0x13/0xb0
> SS:ESP 0068:dff19ea4
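
A note on reading the "Code:" line: the bytes marked `<8b> 02` are the
faulting instruction. Opcode 0x8b is a 32-bit register load, and the byte
after it is a ModRM byte that names the destination register and the
register holding the source address. The sketch below only decodes that one
byte; the helper names (`decode_modrm`, `mov_load_base_register`) are made
up for illustration and come from no kernel or disassembler API.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of an x86 ModRM byte: mod (bits 7-6), reg (bits 5-3), rm (bits 2-0). */
struct modrm {
    unsigned mod;
    unsigned reg;
    unsigned rm;
};

/* 32-bit general-purpose register names, indexed by their 3-bit encoding. */
static const char *const reg32_names[8] = {
    "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
};

/* Split a ModRM byte into its three fields. */
static struct modrm decode_modrm(uint8_t byte)
{
    struct modrm m;
    m.mod = (byte >> 6) & 0x3;
    m.reg = (byte >> 3) & 0x7;
    m.rm  = byte & 0x7;
    return m;
}

/*
 * For opcode 0x8b (MOV r32, r/m32) with mod == 0 and rm other than 4 or 5,
 * the source operand is the memory at the address held in register rm.
 * Returns the name of that base register, or NULL for the forms this
 * sketch does not handle (SIB byte, disp32, displacements, register-direct).
 */
static const char *mov_load_base_register(uint8_t modrm_byte)
{
    struct modrm m = decode_modrm(modrm_byte);
    if (m.mod != 0 || m.rm == 4 || m.rm == 5)
        return NULL;
    return reg32_names[m.rm];
}
```

For the byte 0x02 this yields mod 0, reg 0 (eax) and rm 2 (edx), i.e. a load
into eax from the address in edx. The register dump above prints no value
for edx in this archived copy; if that blank stands for zero, the load
matches the reported NULL pointer dereference. Assuming the i386 register
calling convention, where the second argument travels in edx, that would be
the pool argument of mempool_free rather than the element being freed.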

The userland process triggering the BUG gets stuck in sync_page, but the rest of
the system survives. I suspected that dec_pending's cc pointer was becoming
invalid, so I inserted a few printk()s in crypt_ctr, crypt_dtr and dec_pending,
et voilà:

(successful cryptsetup, a few days ago)
> Mar 21 20:37:31 6A:hejre kernel: Crypt_Ctr, DmTarget e0a4c040 now has 
> CryptConfig daf12aa0.
> Mar 21 20:37:31 4A:hejre kernel: This is dm_crypt::dec_pending, IO d4e03e84, 
> DmTgt e0a4c040, CC daf12aa0.
> Mar 21 20:37:31 4A:hejre last message repeated 124 times
> Mar 21 20:37:31 6A:hejre kernel: Crypt_Dtr, freeing DmTarget e0a4c040's 
> CryptConfig daf12aa0.
> Mar 21 20:37:31 6A:hejre kernel: Crypt_Ctr, DmTarget e0a4c040 now has 
> CryptConfig daf12ba0.
> Mar 21 20:37:31 4A:hejre kernel: This is dm_crypt::dec_pending, IO d4e03e84, 
> DmTgt e0a4c040, CC daf12ba0.
> Mar 21 20:37:31 4A:hejre kernel: This is dm_crypt::dec_pending, IO d4e03e5c, 
> DmTgt e0a4c040, CC daf12ba0.
> Mar 21 20:37:31 4A:hejre kernel: This is dm_crypt::dec_pending, IO d4e03e34, 
> DmTgt e0a4c040, CC daf12ba0.
> [... and continues happily]
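
The instrumentation itself is not included in the message. As a rough
user-space sketch of the idea, the code below formats a trace line in the
style of the excerpts and additionally reports whether the target still
carries the config the IO started with. Everything here is a reduced
stand-in: the struct layouts, the `cc_at_submit` field, and the functions
`crypt_io_init` and `trace_dec_pending` are hypothetical, and snprintf takes
the place of printk.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct crypt_config { int unused; };
struct dm_target    { void *private; };   /* points at the crypt_config */
struct crypt_io {
    struct dm_target    *target;
    struct crypt_config *cc_at_submit;    /* hypothetical: snapshot taken at IO setup */
};

/* Remember which config the target had when the IO was set up. */
static void crypt_io_init(struct crypt_io *io, struct dm_target *ti)
{
    io->target = ti;
    io->cc_at_submit = ti->private;
}

/*
 * Format one trace line and compare the target's current config pointer
 * with the snapshot. Returns 1 if it is unchanged, 0 if it was swapped.
 * Only pointer values are compared; nothing is dereferenced through cc.
 */
static int trace_dec_pending(const struct crypt_io *io, char *buf, size_t len)
{
    struct crypt_config *cc_now = io->target->private;

    snprintf(buf, len,
             "This is dm_crypt::dec_pending, IO %p, DmTgt %p, CC %p.",
             (const void *)io, (void *)io->target, (void *)cc_now);
    return cc_now == io->cc_at_submit;
}
```

A check like this only notices a config that was replaced by one at a
different address. A config freed and reallocated at the same address would
pass unnoticed, so it is a debugging aid under those assumptions, not a fix.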

(context of the above oops)
> Mar 23 17:33:08 6A:hejre kernel: Crypt_Ctr, DmTarget e0a45040 now has 
> CryptConfig dd87f9e0.
> Mar 23 17:33:08 4A:hejre kernel: This is dm_crypt::dec_pending, IO cdf39d44, 
> DmTgt e0a45040, CC dd87f9e0.
> Mar 23 17:33:08 4A:hejre kernel: This is 
