Re: Oops in filter add

2007-03-21 Thread jamal
On Tue, 2007-20-03 at 11:58 +0100, Patrick McHardy wrote:
 jamal wrote:

  So the resolution (as Dave points out) was wrong. In any case, restoring
  queue_lock for now would slow things but will remove the race.
 
 
 Yes. I think thats what we should do for 2.6.21, since fixing
 this while keeping ingress_lock is quite intrusive.
 

reasonable.

 I'm on it. I'm using the opportunity to try to simply the qdisc locking.

Ok, thanks Patrick. 
BTW, I was just staring at the code and i think i have found probably a
long standing minor bug on the holding of the tree lock. I will post
 a patch shortly if i dont get disrupted.

cheers,
jamal

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Oops in filter add

2007-03-20 Thread jamal
On Tue, 2007-20-03 at 07:58 +0100, Patrick McHardy wrote:
 jamal wrote:

  The main idea is to avoid one BigLock for both ingress and egress;
  Which was/is still useful in the compat mode where netfilter is used
  instead. 
 
 
 In that case is isn't even used.
 

Ok. It certainly used to matter in the old days.

 You would need to make qdisc_lock_tree() aware of the difference
 between ingress and egress.

If you have the cycles please go ahead - I really dont have much time
today (have to present in a few hours and havent even started).
I will most certainly look in 1-2 days; for now the queue_lock should
suffice; 


cheers,
jamal

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Oops in filter add

2007-03-20 Thread Patrick McHardy
jamal wrote:
 On Tue, 2007-20-03 at 07:58 +0100, Patrick McHardy wrote:
 
 Ok. It certainly used to matter in the old days.


Actually it has never been used anywhere else but in ing_filter,
it was introduced together with the TC actions.

You would need to make qdisc_lock_tree() aware of the difference
between ingress and egress.
 
 
 If you have the cycles please go ahead - I really dont have much time
 today (have to present in a few hours and havent even started).
 I will most certainly look in 1-2 days; for now the queue_lock should
 suffice; 


I'll try, but no promises, I'm a bit behind with various things myself.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Oops in filter add

2007-03-20 Thread jamal
On Tue, 2007-20-03 at 08:29 +0100, Patrick McHardy wrote:
 jamal wrote:
  On Tue, 2007-20-03 at 07:58 +0100, Patrick McHardy wrote:
  
  Ok. It certainly used to matter in the old days.
 
 
 Actually it has never been used anywhere else but in ing_filter,
 it was introduced together with the TC actions.
 

You are correct. I looked at old 2.4 and all i saw was:

--
/* 
 revisit later: Use a private since lock dev-queue_lock is also
 used on the egress (might slow things for an iota)
*/
 
 if (dev-qdisc_ingress) {
 spin_lock(dev-queue_lock);
 if ((q = dev-qdisc_ingress) != NULL)
 fwres = q-enqueue(skb, q);
 spin_unlock(dev-queue_lock);
 }
--

So the resolution (as Dave points out) was wrong. In any case, restoring
queue_lock for now would slow things but will remove the race.

 
 I'll try, but no promises, I'm a bit behind with various things myself.

I will ping you in a few days and if you havent done anything i will
take it up.
I am almost tempted to make the ingress filters to not have any
dependencies on egress whatsoever. It will create more locks but
will make the datapath faster. Actions can still be shared, but thats
lesser of an overhead.

cheers,
jamal

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Oops in filter add

2007-03-20 Thread Chris Madden
Thanks for all your replies!

One thing I did notice in examining tc_ctl_tfilter was that there is
something like:

qdisc_lock_tree(dev);
tp-next = *back;
*back = tp;
qdisc_unlock_tree(dev);

And then proceed to the data structure down below with:

err = tp-ops-change(tp, cl, t-tcm_handle, tca, fh);

Simply reordering these seems to ameliorate the problem greatly.  I
don't know if this is a generic solution or something specific to the
basic filter only. 

Chris Madden

jamal wrote:
 On Tue, 2007-20-03 at 08:29 +0100, Patrick McHardy wrote:
   
 jamal wrote:
 
 On Tue, 2007-20-03 at 07:58 +0100, Patrick McHardy wrote:

 Ok. It certainly used to matter in the old days.
   
 Actually it has never been used anywhere else but in ing_filter,
 it was introduced together with the TC actions.

 

 You are correct. I looked at old 2.4 and all i saw was:

 --
 /* 
  revisit later: Use a private since lock dev-queue_lock is also
  used on the egress (might slow things for an iota)
 */
  
  if (dev-qdisc_ingress) {
  spin_lock(dev-queue_lock);
  if ((q = dev-qdisc_ingress) != NULL)
  fwres = q-enqueue(skb, q);
  spin_unlock(dev-queue_lock);
  }
 --

 So the resolution (as Dave points out) was wrong. In any case, restoring
 queue_lock for now would slow things but will remove the race.

   
 I'll try, but no promises, I'm a bit behind with various things myself.
 

 I will ping you in a few days and if you havent done anything i will
 take it up.
 I am almost tempted to make the ingress filters to not have any
 dependencies on egress whatsoever. It will create more locks but
 will make the datapath faster. Actions can still be shared, but thats
 lesser of an overhead.

 cheers,
 jamal

   

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Oops in filter add

2007-03-20 Thread Patrick McHardy
jamal wrote:
 On Tue, 2007-20-03 at 08:29 +0100, Patrick McHardy wrote:
 
Actually it has never been used anywhere else but in ing_filter,
it was introduced together with the TC actions.

 
 
 You are correct. I looked at old 2.4 and all i saw was:
 
 --
 /* 
  revisit later: Use a private since lock dev-queue_lock is also
  used on the egress (might slow things for an iota)
 */
  
  if (dev-qdisc_ingress) {
  spin_lock(dev-queue_lock);
  if ((q = dev-qdisc_ingress) != NULL)
  fwres = q-enqueue(skb, q);
  spin_unlock(dev-queue_lock);
  }
 --
 
 So the resolution (as Dave points out) was wrong. In any case, restoring
 queue_lock for now would slow things but will remove the race.


Yes. I think thats what we should do for 2.6.21, since fixing
this while keeping ingress_lock is quite intrusive.


I'll try, but no promises, I'm a bit behind with various things myself.
 
 
 I will ping you in a few days and if you havent done anything i will
 take it up.
 I am almost tempted to make the ingress filters to not have any
 dependencies on egress whatsoever. It will create more locks but
 will make the datapath faster. Actions can still be shared, but thats
 lesser of an overhead.


I'm on it. I'm using the opportunity to try to simply the qdisc locking.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Oops in filter add

2007-03-20 Thread Patrick McHardy
Chris Madden wrote:
 Thanks for all your replies!
 
 One thing I did notice in examining tc_ctl_tfilter was that there is
 something like:
 
 qdisc_lock_tree(dev);
 tp-next = *back;
 *back = tp;
 qdisc_unlock_tree(dev);
 
 And then proceed to the data structure down below with:
 
 err = tp-ops-change(tp, cl, t-tcm_handle, tca, fh);
 
 Simply reordering these seems to ameliorate the problem greatly.  I
 don't know if this is a generic solution or something specific to the
 basic filter only. 


It might hide the problem, but there are currently a lot of places
where things can go wrong.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Oops in filter add

2007-03-20 Thread Chris Madden
jamal wrote:
 On Mon, 2007-19-03 at 19:22 -0700, David Miller wrote:
 Can you just replace the above with dev-queue_lock and see if
 that makes your problem go away?  THanks.
 

 It should; 
 i will stare at the code later and see if i can send a better patch,
 maybe a read_lock(qdisc_tree_lock). Chris, if you can experiment by just
 locking against filters instead, that would be nicer for the reasons i
 described above.

   
Ok, I replaced ingress_lock with queue_lock in ing_filter and it died in
the same place.  Trace below (looks to be substantively the same)...

If I am reading tc_ctl_tfilter correctly, we are adding our new
tcf_proto to the end of the list, and it is getting used before the
change function is getting executed on it.   Locking the ingress_lock in
qdisc_lock_tree ( in addition to the extant queue_lock )seems to have
the same effect; I can get  a traceback from that if its useful.

BUG: unable to handle kernel NULL pointer dereference at virtual address
004
 printing
eip: 
f8a5f093   

*pde =

Oops: 
[#1]
SMP

Modules linked in: af_packet cls_basic sch_ingress button ac battery
ts_match bn
CPU:   
0  
EIP:0060:[f8a5f093]Not tainted
VLI   
EFLAGS: 00010246   (2.6.20.3-x86
#1)   
EIP is at basic_classify+0xb/0x69
[cls_basic]  
eax: f3d57080   ebx: f3dd6f00   ecx: c0329f1c   edx:
f3dd6f00  
esi: c0329f1c   edi:    ebp: f3d57080   esp:
c0329ee4  
ds: 007b   es: 007b   ss:
0068 
Process python (pid: 2503, ti=c0329000 task=f431e570
task.ti=f42ac000) 
Stack: f3dd6f00 f3d57080 f3dd6f00 0008 c02206a1 0286 f8aaad60
f3d57080 
   c0329f1c f3e606c0 f3d57080  df993000 f8a5c10b 0080
c02108cd 
   df993000 f3d57080 c021454e df993000 0040 f3db8780 0008
 
Call Trace:
 [c02206a1] tc_classify+0x34/0xbc  
 [f8a5c10b] ingress_enqueue+0x16/0x55 [sch_ingress]
 [c02108cd] __alloc_skb+0x53/0xfd  
 [c021454e] netif_receive_skb+0x1cd/0x2a6  
 [f88aa162] e1000_clean_rx_irq+0x35b/0x42c [e1000] 
 [f88a9e07] e1000_clean_rx_irq+0x0/0x42c [e1000]   
 [f88a9194] e1000_clean+0x6e/0x23d [e1000] 
 [c011007b] handle_vm86_fault+0x16/0x6f7   
 [c0215ef6] net_rx_action+0xcc/0x1bc   
 [c011cafc] __do_softirq+0x5d/0xba 
 [c0104c7f] do_softirq+0x59/0xa9   
 [c01341e3] handle_fasteoi_irq+0x0/0xa0
 [c0104d70] do_IRQ+0xa1/0xb9   
 [c010367f] common_interrupt+0x23/0x28 
 [c014c39a] kmem_cache_zalloc+0x33/0x5c
 [f8a5f4ac] basic_change+0x17c/0x370 [cls_basic]   
 [c02220ff] tc_ctl_tfilter+0x3ec/0x469 
 [c0221d13] tc_ctl_tfilter+0x0/0x469   
 [c021b270] rtnetlink_rcv_msg+0x1b3/0x1d8  
 [c0221398] tc_dump_qdisc+0x0/0xfe 
 [c02259ed] netlink_run_queue+0x50/0xbe
 [c021b0bd] rtnetlink_rcv_msg+0x0/0x1d8
 [c021b07c] rtnetlink_rcv+0x25/0x3d
 [c0225e34] netlink_data_ready+0x12/0x4c
 [c0224e2e] netlink_sendskb+0x19/0x30
 [c0225e16]
netlink_sendmsg+0x242/0x24e  
 [c020ba5f]
sock_sendmsg+0xbc/0xd4   
 [c012832d]
autoremove_wake_function+0x0/0x35
 [c012832d]
autoremove_wake_function+0x0/0x35
 [c010367f]
common_interrupt+0x23/0x28   
 [c021184e]
verify_iovec+0x3e/0x70   
 [c020bc0b]
sys_sendmsg+0x194/0x1f9  
 [c0112bde]
__wake_up+0x32/0x43  
 [c022508a]
netlink_insert+0x106/0x110   
 [c022515b]
netlink_autobind+0xc7/0xe3   
 [c0226376]
netlink_bind+0x8d/0x127  
 [c013f623]
do_wp_page+0x149/0x36c

Re: Oops in filter add

2007-03-20 Thread Patrick McHardy
Chris Madden wrote:
 Ok, I replaced ingress_lock with queue_lock in ing_filter and it died in
 the same place.  Trace below (looks to be substantively the same)...
 
 If I am reading tc_ctl_tfilter correctly, we are adding our new
 tcf_proto to the end of the list, and it is getting used before the
 change function is getting executed on it.   Locking the ingress_lock in
 qdisc_lock_tree ( in addition to the extant queue_lock )seems to have
 the same effect; I can get  a traceback from that if its useful.


The problem is that some classifiers (like basic, fw and route)
implement empty -init functions, although fw and route properly
deal with it in their classification functions. The -init function
should allocate and initalize tp-root, which basic fails to do.
If you give me a few hours I'll cook up a patch for this.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Oops in filter add

2007-03-20 Thread Patrick McHardy
Patrick McHardy wrote:
 Chris Madden wrote:
 
Ok, I replaced ingress_lock with queue_lock in ing_filter and it died in
the same place.  Trace below (looks to be substantively the same)...

If I am reading tc_ctl_tfilter correctly, we are adding our new
tcf_proto to the end of the list, and it is getting used before the
change function is getting executed on it.   Locking the ingress_lock in
qdisc_lock_tree ( in addition to the extant queue_lock )seems to have
the same effect; I can get  a traceback from that if its useful.
 
 
 
 The problem is that some classifiers (like basic, fw and route)
 implement empty -init functions, although fw and route properly
 deal with it in their classification functions. The -init function
 should allocate and initalize tp-root, which basic fails to do.
 If you give me a few hours I'll cook up a patch for this.


Actually its only cls_basic thats broken, in case of route and fw
its intentional for backwards-compatibility.

Can you try this patch please?

diff --git a/net/sched/cls_basic.c b/net/sched/cls_basic.c
index fad08e5..ed788d0 100644
--- a/net/sched/cls_basic.c
+++ b/net/sched/cls_basic.c
@@ -81,6 +81,13 @@ static void basic_put(struct tcf_proto *
 
 static int basic_init(struct tcf_proto *tp)
 {
+   struct basic_head *head;
+
+   head = kzalloc(sizeof(*head), GFP_KERNEL);
+   if (head == NULL)
+   return -ENOMEM;
+   INIT_LIST_HEAD(head-flist);
+   tp-root = head;
return 0;
 }
 
@@ -175,16 +182,6 @@ static int basic_change(struct tcf_proto
return basic_set_parms(tp, f, base, tb, tca[TCA_RATE-1]);
}
 
-   err = -ENOBUFS;
-   if (head == NULL) {
-   head = kzalloc(sizeof(*head), GFP_KERNEL);
-   if (head == NULL)
-   goto errout;
-
-   INIT_LIST_HEAD(head-flist);
-   tp-root = head;
-   }
-
f = kzalloc(sizeof(*f), GFP_KERNEL);
if (f == NULL)
goto errout;


Re: Oops in filter add

2007-03-20 Thread Thomas Graf
* Patrick McHardy [EMAIL PROTECTED] 2007-03-20 15:57
 Actually its only cls_basic thats broken, in case of route and fw
 its intentional for backwards-compatibility.

Absolutely right, thanks Patrick.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Oops in filter add

2007-03-20 Thread Patrick McHardy
Thomas Graf wrote:
 * Patrick McHardy [EMAIL PROTECTED] 2007-03-20 15:57
 
Actually its only cls_basic thats broken, in case of route and fw
its intentional for backwards-compatibility.
 
 
 Absolutely right, thanks Patrick.


There was a small bug in my patch (broken return value on memory
allocation failure, not relevant for testing though). I'll push
the fixed patch to Dave once Chris confirms that it fixes the
problem he's seeing.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Oops in filter add

2007-03-20 Thread Chris Madden
Patrick McHardy wrote:

 There was a small bug in my patch (broken return value on memory
 allocation failure, not relevant for testing though). I'll push
 the fixed patch to Dave once Chris confirms that it fixes the
 problem he's seeing.
   
Looks like that may have done it.  I ramped up the traffic on it and did
thousands of inserts/removals and the box seems happy.

I had ing_filter still locking dev-ingress_lock ( instead of the
queue_lock suggestion earlier ).   So while that race may still be in
there, I haven't been able to hit the locking problem.

chris
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Oops in filter add

2007-03-20 Thread Patrick McHardy
Chris Madden wrote:
 Patrick McHardy wrote:
 
There was a small bug in my patch (broken return value on memory
allocation failure, not relevant for testing though). I'll push
the fixed patch to Dave once Chris confirms that it fixes the
problem he's seeing.
  
 
 Looks like that may have done it.  I ramped up the traffic on it and did
 thousands of inserts/removals and the box seems happy.


Thanks Chris.

 I had ing_filter still locking dev-ingress_lock ( instead of the
 queue_lock suggestion earlier ).   So while that race may still be in
 there, I haven't been able to hit the locking problem.


I'll push a patch for this as well, we can hopefully go back to using
ingress_lock in 2.6.22.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Oops in filter add

2007-03-19 Thread David Miller
From: Chris Madden [EMAIL PROTECTED]
Date: Mon, 19 Mar 2007 16:10:29 -0400

 I did some digging, and it appears the filter add isn't mutexed right.  
 Inside net/core/dev.c, ing_filter, I see:
 
 spin_lock(dev-ingress_lock);
 if ((q = dev-qdisc_ingress) != NULL)
 result = q-enqueue(skb, q);
 spin_unlock(dev-ingress_lock);
 
 And unless I'm missing something, this is the only place this lock is
 used ( other than initialization ).  In net/sched/cls_api.c, I see we do
 qdisc_lock_tree/qdisc_unlock_tree (which locks dev-queue_lock).  As
 near as I can tell, this is our problem ( our mutexes don't prohibit
 manipulation while packets are flowing ).

I think this should use dev-queue_lock.

It looks like the idea might have been to allow more parallelized
running of ingress filters, but this is done wrong and leads to
the crashes you are seeing.

Can you just replace the above with dev-queue_lock and see if
that makes your problem go away?  THanks.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Oops in filter add

2007-03-19 Thread jamal
On Mon, 2007-19-03 at 19:22 -0700, David Miller wrote:

 
 I think this should use dev-queue_lock.
 

It would slow down things if he is doing both ingress and egress
traffic as well as control changes.

 It looks like the idea might have been to allow more parallelized
 running of ingress filters, but this is done wrong and leads to
 the crashes you are seeing.

The main idea is to avoid one BigLock for both ingress and egress;
Which was/is still useful in the compat mode where netfilter is used
instead. 

 Can you just replace the above with dev-queue_lock and see if
 that makes your problem go away?  THanks.

It should; 
i will stare at the code later and see if i can send a better patch,
maybe a read_lock(qdisc_tree_lock). Chris, if you can experiment by just
locking against filters instead, that would be nicer for the reasons i
described above.

cheers,
jamal

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Oops in filter add

2007-03-19 Thread Patrick McHardy
jamal wrote:
 On Mon, 2007-19-03 at 19:22 -0700, David Miller wrote:
 
It looks like the idea might have been to allow more parallelized
running of ingress filters, but this is done wrong and leads to
the crashes you are seeing.
 
 
 The main idea is to avoid one BigLock for both ingress and egress;
 Which was/is still useful in the compat mode where netfilter is used
 instead. 


In that case is isn't even used.

Can you just replace the above with dev-queue_lock and see if
that makes your problem go away?  THanks.
 
 
 It should; 
 i will stare at the code later and see if i can send a better patch,
 maybe a read_lock(qdisc_tree_lock).


You would need to make qdisc_lock_tree() aware of the difference
between ingress and egress.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html