Hello!
On Wed, Jan 23, 2013 at 07:01:52PM +0400, Andrey Vagin wrote:
> -#define tcp_time_stamp ((__u32)(jiffies))
> +#define tcp_time_stamp(tp) ((__u32)(jiffies) + tp->tsoffset)
This implies that you always have some tp in hands. AFAIK this is not true,
so that I am puzzled how you

Hello!
> Is there a reason that the target hardware address isn't the target
> hardware address?
It is bound only to the fact that linux uses protocol address
of the machine, which responds. It would be highly confusing
(more than confusing :-)), if we used our protocol address and hardware
Hello!
> Send a correct arp reply instead of one with sender ip and sender
> hardware address in target fields.
I do not see anything more legal in setting target address to 0.
Actually, semantics of target address in ARP reply is ambiguous.
If it is a reply to some real request, it is set to
Hello!
> This file is so outdated that I can't see any value in keeping it.
Absolutely agree.
Alexey
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello!
> Also, create_workqueue() is very costly. The last 2 lines should be
> reverted.
Indeed.
The result improves from 3988 nanoseconds to 3975. :-)
Actually, the difference is within statistical variance,
which is about 20 ns.
Alexey
Hello!
> What changed?
softirq remains raised for such a tasklet. In old times softirq was processed
once per invocation, in schedule and on syscall exit, and this was relatively
harmless. Since softirqs are very weakly moderated, it results in strong
cpu hogging.
> And can it be fixed?
With
Hello!
> If I understand correctly, this is because tasklet_head.list is protected
> by local_irq_save(), and t could be scheduled on another CPU, so we just
> can't steal it, yes?
Yes. All that code is written to avoid synchronization as much as possible.
> If we use workqueues, we can change
Hello!
> again, there is no reason why this couldnt be done in a hardirq context.
> If a hardirq preempts another hardirq and the first hardirq already
> processes the 'softnet work', you dont do it from the second one but
> queue it with the first one. (into the already existing
>
Hello!
> Not a very accurate measurement (jiffies that is).
Believe me or not, but the measurement has nanosecond precision.
> Since the work queue *is* a thread, you are running a busy loop here. Even
> though you call schedule, this thread still may have quota available, and
> will not yield
Hello!
> I felt that three calls to tasklet_disable were better than a gazillion calls
> to
> spin_(un)lock.
It is not better.
Actually, it also has something equivalent to spinlock inside.
It raises some flag and waits for completion of already running
tasklets (cf. spin_lock_bh). And if
Hello!
> > The difference between softirqs and hardirqs lies not in their
> > "heavyness". It is in reentrancy protection, which has to be done with
> > local_irq_disable(), unless networking is not isolated from hardirqs.
>
> i know that pretty well ;)
You forgot about this again in the
Hello!
> I find the 4usecs cost on a P4 interesting and a bit too high - how did
> you measure it?
Simple and stupid:
int flag;

static void do_test(unsigned long dummy)
{
        flag = 1;
}

static void do_test_wq(void *dummy)
{
        flag = 1;
}

static void measure_tasklet0(void)
{
Hello!
> the context-switch argument i'll believe if i see numbers. You'll
> probably need in excess of tens of thousands of irqs/sec to even be able
> to measure its overhead. (workqueues are driven by nice kernel threads
> so there's no TLB overhead, etc.)
It was authors of the patch who
Hello!
> We actually need to do something about this, as we might loop for ever
> there. The robust cleanup code can fail (e.g. due to list corruption)
> and we would see exit_state != 0 and the OWNER_DIED bit would never be
> set, so we are stuck in a busy loop.
Yes...
It is possible to take
Hello!
> Hmm, what means not expected ? -ESRCH is returned, when the owner task
> is not found.
This is not supposed to happen with robust futexes.
glibc aborts (which is correct), or a build with debugging disabled
enters a simulated deadlock (which is confusing).
> lock. Also using uval is
Hello!
> #2 crash be explained via any of the bugs you fixed? (i.e. memory
> corruption?)
Yes, I found the reason, it is really fixed by taking tasklist_lock.
This happens after a task struct with an uncleared pi_state_list is freed
and the list of futex_pi_state's is corrupted.
Meanwhile... two
Hello!
1. New entries can be added to tsk->pi_state_list after task completed
exit_pi_state_list(). The result is memory leakage and deadlocks.
2. handle_mm_fault() is called under spinlock. The result is obvious.
3. State machine is broken. Kernel thinks it owns futex after
it released
Hello!
> This might work. Could you post a patch to better show what you mean to do?
Here it is.
->neigh_destructor() is killed (not used), replaced with ->neigh_cleanup(),
which is called when neighbor entry goes to dead state. At this point
everything is still valid: neigh->dev, neigh->parms
Hello!
> infiniband sets parm->neigh_destructor, and I search for a way to prevent
> this destructor from being called after the module has been unloaded.
> Ideas?
It must be called in any case to update/release internal ipoib structures.
The idea is to move call of parm->neigh_destructor from
Hello!
> If a device driver sets neigh_destructor in neigh_params, this could
> get called after the device has been unregistered and the driver module
> removed.
It is the same problem: if dst->neighbour holds neighbour, it should
not hold device. parms->dev is not supposed to be used after
Hello!
> I think the thing to do is to just leave the loopback references
> in place, try to unregister the per-namespace loopback device,
> and that will safely wait for all the references to go away.
Yes, it is exactly how it works in openvz. All the sockets are killed,
queues are cleared,
Hello!
> Does this look sane (untested)?
It does not, unfortunately.
Instead of regular crash in infiniband you will get numerous
random NULL pointer dereferences both due to dst->neighbour
and due to dst->dev.
Alexey
Hello!
> Well I don't think the loopback device is currently but as soon
> as we get network namespace support we will have multiple loopback
> devices and they will get unregistered when we remove the network
> namespace.
There is no logical difference. At the moment when namespace is gone
Hello!
> > It should be cleared and we should be sure it will not be destroyed
> > before quiescent state.
>
> I'm confused. didn't you say dst_ifdown is called after quiescent state?
Quiescent state should happen after dst->neighbour is invalidated.
And this implies that all the users of
Hello!
> This is not new code, and should have triggered long time ago,
> so I am not sure how come we are triggering this only now,
> but somehow this did not lead to crashes in 2.6.20
I see. I guess this was plain luck.
> Why is neighbour->dev changed here?
It holds reference to device and
Hello!
> Hmm. Something I don't understand: does the code
> in question not run on *each* device unregister?
It does.
> Why do I only see this under stress?
You should have some referenced destination entries to trigger bad path.
This should happen not only under stress.
F.e. just try to ssh
Hello!
> Getting an error there is all the more reason to proceed
> with the swapoff, not to give up and break out of it.
Yes, from this viewpoint a more reasonable approach would be to untie
corresponding ptes from swap entry and mark them as invalid to trigger
fault on access.
Not even tried
Hello!
> Well, take a look at the double acks for 84439343, 84440447 and 84441059,
> they seem pretty much identical to me.
It is just a little tcpdump glitch.
19:34:54.532271 < 10.2.20.246.33060 > 65.171.224.182.8700: . 44:44(0) ack 84439343 win 24544 <nop,nop,timestamp 226080638 99717832> (DF) (ttl 64, id 60946)
Hello!
> This is where things start going bad. The window starts shrinking from
> 15340 all the way down to 2355 over the course of 0.3 seconds. Notice the
> many duplicate acks that serve no purpose
These are not duplicate, TCP_NODELAY sender just starts flooding
tiny segments, and those are
Hello!
> I wonder if clamping the window though is too harsh. Maybe just
> setting the rcv_ssthresh down is better?
It is too harsh. This was invented before we learned how to collapse
received data; back then tiny segments were fatal and clamping was
the last weapon against misbehaving
Hello!
> I experienced the very same problem but with window size going all the
> way down to just a few bytes (14 bytes). dump files available upon
> requests :)
I do request.
TCP is not allowed to reduce window to a value less than 2*MSS no matter
how hard network device or peer try to
Hello!
> If you overflow the socket's memory bound, it ends up calling
> tcp_clamp_window(). (I'm not sure this is really the right thing to do
> here before trying to collapse the queue.)
Collapsing is too expensive a procedure; it is rather an emergency measure.
So, tcp collapses queue, when
Hello!
> If this packet came in from an 802.1Q VLAN device, the VLAN code already
> has the logic necessary to map the .1q priority to an arbitrary
> skb->priority.
Actually, the patch makes sense when it is straight ethernet bridge
not involving full parsing of VLAN. I guess the case when the