Re: [B.A.T.M.A.N.] [PATCH] Removing the big batman lock

2009-12-13 Thread Simon Wunderlich
Hello Andrew,

sorry for the late answer and thank you for your remarks.

I like your idea of the static hash iterator and have got this implemented in
svn revision 1499. This should remove the problems about the memory leaks
with dangling hash iterators. What do you think?

As the patch obviously has some more severe problems (deadlocks etc) i will
have to review this seperately ... :(

On Mon, Nov 30, 2009 at 11:09:02AM +0100, Andrew Lunn wrote:
 On Sun, Nov 29, 2009 at 09:09:50PM +0100, Simon Wunderlich wrote:
 
  I did some testing, including loading, unloading, killing individual
  nodes etc, which seems to be clean so far. However there might be 
  more race conditions introduced by this large patch, and i'm therefore
  requesting a careful review before committing.
 
 Hi Simon
 
 I've not done a careful review yet, just a quick look.
 
 Two idea for improvements:
 
 In the purge code, add some debug code which looks at the refcount
 value. If it is not between 1 and 5, prinkt() a warning what there is
 probably a missing refcount operation. Since the purge code does not
 run very often, it should not add much overhead, yet it is still a
 useful debug tool.
 
 The following bit of code happens quite a lot:
 
 while (NULL != (orig_node = orig_hash_iterate(hashit, orig_node))) {
 
 
 There is also a comment about having to free hashit, if you don't
 iterate to the end of the hash. How about refactoring this, more like
 the linux list.h.
 
 Make hashit a stack variable, with an init macro:
 
 #define HASHIT(name) struct hash_it_t name = { .index = -1, .bucket = NULL, \
  .prev_bucket=NULL,   \
  .first_bucket=NULL }
 
 and a macro for iterating over the hash
 
 HASHIT(hashit);
 
 orig_hash_for_each(orig_node, hashit) {
 
 foo(orig_node);
 }
 
 
 Andrew
 ___
 B.A.T.M.A.N mailing list
 b.a.t.m@lists.open-mesh.net
 https://lists.open-mesh.net/mm/listinfo/b.a.t.m.a.n
 


signature.asc
Description: Digital signature


Re: [B.A.T.M.A.N.] [PATCH] Removing the big batman lock

2009-12-03 Thread Linus Lüssing
Hi Simon,

I just gave your patch a try on my laptop and could successfully,
reproduceably crash my kernel in the following way:
Setting up wifi to ad-hoc mode and connecting it to other
batman-wifi-nodes, insmodding batman-adv on my laptop and adding
this wifi interface to batman - kernel hangs (see the two
attachements for more detailed error messages thrown by the kernel).

I'm not an expert in this, just a guess: Could it be, that
purge_orig() is executing the spinlock first and calling
free_orig_node() then, which tries to lock the same variable
again, resulting into a deadlock?

Cheers, Linus
[21060.326349] batman-adv:Adding interface: wlan1
[21060.337123] batman-adv:Interface activated: wlan1
[21125.872005] BUG: soft lockup - CPU#0 stuck for 61s! [bat_events:5157]
[21125.872007] Modules linked in: batman_adv(-) tun nvidia(P) uinput ppdev lp 
parport sco bridge stp bnep rfcomm kvm_intel kvm acpi_cpufreq cpufreq_powersave 
cpufreq_conservative cpufreq_userspace cpufreq_stats l2cap fuse dm_snapshot 
dm_mirror dm_region_hash dm_log firewire_sbp2 loop snd_hda_codec_realtek 
snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss uvcvideo snd_mixer_oss arc4 
ecb btusb snd_pcm snd_seq_midi snd_rawmidi videodev snd_seq_midi_event snd_seq 
v4l1_compat snd_timer iwlagn v4l2_compat_ioctl32 snd_seq_device iwlcore 
bluetooth snd i2c_i801 mac80211 soundcore i2c_core acer_wmi button processor ac 
battery snd_page_alloc cfg80211 psmouse evdev pcspkr serio_raw wmi rfkill ext3 
jbd mbcache sha256_generic cryptd aes_x86_64 aes_generic cbc dm_crypt dm_mod 
ide_cd_mod sd_mod crc_t10dif cdrom ata_generic ide_pci_generic mmc_block ahci 
uhci_hcd libata piix ide_core sdhci_pci sdhci tg3 ricoh_mmc libphy 
firewire_ohci firewire_core crc_itu_t scsi_mod mmc_core led_class intel_agp 
video output ehci_hcd thermal fan thermal_sys [last unloaded: batman_adv]
[21125.872007] CPU 0:
[21125.872007] Modules linked in: batman_adv(-) tun nvidia(P) uinput ppdev lp 
parport sco bridge stp bnep rfcomm kvm_intel kvm acpi_cpufreq cpufreq_powersave 
cpufreq_conservative cpufreq_userspace cpufreq_stats l2cap fuse dm_snapshot 
dm_mirror dm_region_hash dm_log firewire_sbp2 loop snd_hda_codec_realtek 
snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss uvcvideo snd_mixer_oss arc4 
ecb btusb snd_pcm snd_seq_midi snd_rawmidi videodev snd_seq_midi_event snd_seq 
v4l1_compat snd_timer iwlagn v4l2_compat_ioctl32 snd_seq_device iwlcore 
bluetooth snd i2c_i801 mac80211 soundcore i2c_core acer_wmi button processor ac 
battery snd_page_alloc cfg80211 psmouse evdev pcspkr serio_raw wmi rfkill ext3 
jbd mbcache sha256_generic cryptd aes_x86_64 aes_generic cbc dm_crypt dm_mod 
ide_cd_mod sd_mod crc_t10dif cdrom ata_generic ide_pci_generic mmc_block ahci 
uhci_hcd libata piix ide_core sdhci_pci sdhci tg3 ricoh_mmc libphy 
firewire_ohci firewire_core crc_itu_t scsi_mod mmc_core led_class intel_agp 
video output ehci_hcd thermal fan thermal_sys [last unloaded: batman_adv]
[21125.872007] Pid: 5157, comm: bat_events Tainted: P   2.6.31-1-amd64 
#1 ��� 
[21125.872007] RIP: 0010:[812de34a]  [812de34a] 
_spin_lock+0xf/0x1b
[21125.872007] RSP: :88007dd99d98  EFLAGS: 0297
[21125.872007] RAX: 0008 RBX: 88004f878b40 RCX: 0001
[21125.872007] RDX: 0007 RSI: a0e8f02b RDI: 88004f878b90
[21125.872007] RBP: 8101166e R08: 88007d5eca20 R09: 000f
[21125.872007] R10: 0020 R11:  R12: 
[21125.872007] R13: 88007d9d R14: 88005f1f2c80 R15: 880070052000
[21125.872007] FS:  () GS:880001643000() 
knlGS:
[21125.872007] CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
[21125.872007] CR2: f39dc000 CR3: 01001000 CR4: 26f0
[21125.872007] DR0:  DR1:  DR2: 
[21125.872007] DR3:  DR6: 0ff0 DR7: 0400
[21125.872007] Call Trace:
[21125.872007]  [a0e8c78a] ? free_orig_node+0x26/0x84 [batman_adv]
[21125.872007]  [a0e8ca18] ? purge_orig+0x198/0x1d8 [batman_adv]
[21125.872007]  [a0e8c880] ? purge_orig+0x0/0x1d8 [batman_adv]
[21125.872007]  [8105b43b] ? worker_thread+0x174/0x211
[21125.872007]  [8105f39e] ? autoremove_wake_function+0x0/0x2e
[21125.872007]  [8105b2c7] ? worker_thread+0x0/0x211
[21125.872007]  [8105f042] ? kthread+0x8b/0x93
[21125.872007]  [81011baa] ? child_rip+0xa/0x20
[21125.872007]  [8105efb7] ? kthread+0x0/0x93
[21125.872007]  [81011ba0] ? child_rip+0x0/0x20
[ 4412.876146] wlan1: Trigger new scan to find an IBSS to join
[ 4417.816058] wlan1: Trigger new scan to find an IBSS to join
[ 4420.102974] wlan1: Creating new IBSS network, BSSID 1a:f2:51:40:bc:e9
[ 4421.761168] wlan1: Selected IBSS BSSID 02:22:b0:44:94:5d based on configured 
SSID
[ 4433.902832] 

Re: [B.A.T.M.A.N.] [PATCH] Removing the big batman lock

2009-12-03 Thread Andrew Lunn
On Thu, Dec 03, 2009 at 02:31:22AM +0100, Linus L??ssing wrote:
 Hi Simon,
 
 I just gave your patch a try on my laptop and could successfully,
 reproduceably crash my kernel in the following way:

This looks like a deadlock. 

Simon: Did you try lockdep on this new code? 

http://lwn.net/Articles/185666/

This should hopefully show you what locks are being taken in the wrong
order. This is something i keep intending on running, but never get
around to it. Now we have a probably deadlock, it might be enough
reason for me to actually do it.

Andrew


Re: [B.A.T.M.A.N.] [PATCH] Removing the big batman lock

2009-11-30 Thread Andrew Lunn
On Sun, Nov 29, 2009 at 09:09:50PM +0100, Simon Wunderlich wrote:

 I did some testing, including loading, unloading, killing individual
 nodes etc, which seems to be clean so far. However there might be 
 more race conditions introduced by this large patch, and i'm therefore
 requesting a careful review before committing.

Hi Simon

I've not done a careful review yet, just a quick look.

Two idea for improvements:

In the purge code, add some debug code which looks at the refcount
value. If it is not between 1 and 5, prinkt() a warning what there is
probably a missing refcount operation. Since the purge code does not
run very often, it should not add much overhead, yet it is still a
useful debug tool.

The following bit of code happens quite a lot:

while (NULL != (orig_node = orig_hash_iterate(hashit, orig_node))) {


There is also a comment about having to free hashit, if you don't
iterate to the end of the hash. How about refactoring this, more like
the linux list.h.

Make hashit a stack variable, with an init macro:

#define HASHIT(name) struct hash_it_t name = { .index = -1, .bucket = NULL, \
   .prev_bucket=NULL,   \
   .first_bucket=NULL }

and a macro for iterating over the hash

HASHIT(hashit);

orig_hash_for_each(orig_node, hashit) {

foo(orig_node);
}


Andrew


[B.A.T.M.A.N.] [PATCH] Removing the big batman lock

2009-11-29 Thread Simon Wunderlich
The orig_hash_lock locks big sections of the batman-adv code base, which
might lead to unneccesary performance degradation at some points. 

Therefore this patch moves the locking from the whole originator hash
table to individual orig nodes, introducing a reference count based
system to identify whether a node is still in use or can be free()d.

To summarize the changes:
 * when iterating, the hash is only locked while an item is picked
   from the hash. Access to the orig_nodes themselves is no longer 
   protected by orig_hash_lock.
 * Each orig_node is referenced when it is used or referenced (e.g. as
   neighbor), and freed when there are no references left. This makes
   it neccesary to carefully reference and unreference nodes.
 * orig_nodes receive an individual spin lock to protect access to
   their data.

I did some testing, including loading, unloading, killing individual
nodes etc, which seems to be clean so far. However there might be 
more race conditions introduced by this large patch, and i'm therefore
requesting a careful review before committing.

Signed-off-by: Simon Wunderlich simon.wunderl...@s2003.tu-chemnitz.de
---

Index: batman-adv-kernelland/vis.c
===
--- batman-adv-kernelland/vis.c (revision 1489)
+++ batman-adv-kernelland/vis.c (working copy)
@@ -26,6 +26,7 @@
 #include soft-interface.h
 #include hard-interface.h
 #include hash.h
+#include originator.h
 #include compat.h
 
 struct hashtable_t *vis_hash;
@@ -262,16 +263,16 @@
 
 /* Walk the originators and find the VIS server with the best tq. Set the 
packet
  * address to its address and return the best_tq.
- *
- * Must be called with the originator hash locked */
+ */
 static int find_best_vis_server(struct vis_info *info)
 {
struct hash_it_t *hashit = NULL;
-   struct orig_node *orig_node;
+   struct orig_node *orig_node = NULL;
int best_tq = -1;
 
-   while (NULL != (hashit = hash_iterate(orig_hash, hashit))) {
-   orig_node = hashit-bucket-data;
+   while (NULL != (orig_node = orig_hash_iterate(hashit, orig_node))) {
+
+   spin_lock(orig_node-lock);
if ((orig_node != NULL) 
(orig_node-router != NULL) 
(orig_node-flags  VIS_SERVER) 
@@ -280,6 +281,7 @@
memcpy(info-packet.target_orig, orig_node-orig,
   ETH_ALEN);
}
+   spin_unlock(orig_node-lock);
}
return best_tq;
 }
@@ -298,7 +300,7 @@
 static int generate_vis_packet(void)
 {
struct hash_it_t *hashit = NULL;
-   struct orig_node *orig_node;
+   struct orig_node *orig_node = NULL;
struct vis_info *info = (struct vis_info *)my_vis_info;
struct vis_info_entry *entry, *entry_array;
struct hna_local_entry *hna_local_entry;
@@ -307,7 +309,6 @@
 
info-first_seen = jiffies;
 
-   spin_lock(orig_hash_lock);
memcpy(info-packet.target_orig, broadcastAddr, ETH_ALEN);
info-packet.ttl = TTL;
info-packet.seqno++;
@@ -316,17 +317,17 @@
if (!is_vis_server_locked()) {
best_tq = find_best_vis_server(info);
if (best_tq  0) {
-   spin_unlock(orig_hash_lock);
return -1;
}
}
-   hashit = NULL;
 
entry_array = (struct vis_info_entry *)
((char *)info + sizeof(struct vis_info));
 
-   while (NULL != (hashit = hash_iterate(orig_hash, hashit))) {
-   orig_node = hashit-bucket-data;
+   while (NULL != (orig_node = orig_hash_iterate(hashit, orig_node))) {
+
+   spin_lock(orig_node-lock);
+
if (orig_node-router != NULL
 compare_orig(orig_node-router-addr, 
orig_node-orig)
 orig_node-batman_if
@@ -340,15 +341,16 @@
entry-quality = orig_node-router-tq_avg;
info-packet.entries++;
 
+   /* TODO: this is is a possible memory leak, 
+* hashit should be freed somewhere. */
if (vis_packet_full(info)) {
-   spin_unlock(orig_hash_lock);
+   spin_unlock(orig_node-lock);
+   orig_unreference(orig_node);
return 0;
}
}
}
 
-   spin_unlock(orig_hash_lock);
-
hashit = NULL;
spin_lock_irqsave(hna_local_hash_lock, flags);
while (NULL != (hashit = hash_iterate(hna_local_hash, hashit))) {
@@ -388,14 +390,12 @@
 static void broadcast_vis_packet(struct vis_info *info, int packet_length)
 {
struct hash_it_t *hashit = NULL;
-   struct orig_node *orig_node;
+   struct orig_node *orig_node = NULL;
 
-   spin_lock(orig_hash_lock);
-