Re: [PATCH] [RFC] tcp: add ability to set a timestamp offset (v2)

2013-01-23 Thread Alexey Kuznetsov
Hello! On Wed, Jan 23, 2013 at 07:01:52PM +0400, Andrey Vagin wrote: > -#define tcp_time_stamp ((__u32)(jiffies)) > +#define tcp_time_stamp(tp) ((__u32)(jiffies) + tp->tsoffset) This implies that you always have some tp at hand. AFAIK this is not true, so I am puzzled how you

Re: [PATCH] net/ipv4/arp.c: Fix arp reply when sender ip 0 (was: Strange behavior in arp probe reply, bug or feature?)

2007-11-19 Thread Alexey Kuznetsov
Hello! > Is there a reason that the target hardware address isn't the target > hardware address? It is tied only to the fact that Linux uses the protocol address of the machine which responds. It would be highly confusing (more than confusing :-)), if we used our protocol address and hardware

Re: [PATCH] net/ipv4/arp.c: Fix arp reply when sender ip 0 (was: Strange behavior in arp probe reply, bug or feature?)

2007-11-15 Thread Alexey Kuznetsov
Hello! > Send a correct arp reply instead of one with sender ip and sender > hardware address in target fields. I do not see anything more legal in setting the target address to 0. Actually, the semantics of the target address in an ARP reply are ambiguous. If it is a reply to some real request, it is set to

Re: [2.6 patch] remove Documentation/networking/routing.txt

2007-11-05 Thread Alexey Kuznetsov
Hello! > This file is so outdated that I can't see any value in keeping it. Absolutely agree. Alexey - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please

Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-29 Thread Alexey Kuznetsov
Hello! > Also, create_workqueue() is very costly. The last 2 lines should be > reverted. Indeed. The result improves from 3988 nanoseconds to 3975. :-) Actually, the difference is within statistical variance, which is about 20 ns. Alexey - To unsubscribe from this list: send the line

Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-29 Thread Alexey Kuznetsov
Hello! > What changed? softirq remains raised for such tasklet. In old times softirq was processed once per invocation, in schedule and on syscall exit, and this was relatively harmless. Since softirqs are very weakly moderated, it results in strong cpu hogging. > And can it be fixed? With

Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-29 Thread Alexey Kuznetsov
Hello! > If I understand correctly, this is because tasklet_head.list is protected > by local_irq_save(), and t could be scheduled on another CPU, so we just > can't steal it, yes? Yes. All that code is written to avoid synchronization as much as possible. > If we use worqueues, we can change

Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-29 Thread Alexey Kuznetsov
Hello! > again, there is no reason why this couldnt be done in a hardirq context. > If a hardirq preempts another hardirq and the first hardirq already > processes the 'softnet work', you dont do it from the second one but > queue it with the first one. (into the already existing >

Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-29 Thread Alexey Kuznetsov
Hello! > Not a very accurate measurement (jiffies that is). Believe it or not, the measurement has nanosecond precision. > Since the work queue *is* a thread, you are running a busy loop here. Even > though you call schedule, this thread still may have quota available, and > will not yield

Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-29 Thread Alexey Kuznetsov
Hello! > I felt that three calls to tasklet_disable were better than a gazillion calls to spin_(un)lock. It is not better. Actually, it also has something equivalent to a spinlock inside. It raises some flag and waits for completion of already running tasklets (cf. spin_lock_bh). And if

Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-29 Thread Alexey Kuznetsov
Hello! > > The difference between softirqs and hardirqs lies not in their > > "heaviness". It is in reentrancy protection, which has to be done with > > local_irq_disable(), unless networking is isolated from hardirqs. > > i know that pretty well ;) You forgot about this again in the

Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-29 Thread Alexey Kuznetsov
Hello! > I find the 4usecs cost on a P4 interesting and a bit too high - how did > you measure it? Simple and stupid: int flag; static void do_test(unsigned long dummy) { flag = 1; } static void do_test_wq(void *dummy) { flag = 1; } static void measure_tasklet0(void) {

Re: [RFC PATCH 0/6] Convert all tasklets to workqueues

2007-06-28 Thread Alexey Kuznetsov
Hello! > the context-switch argument i'll believe if i see numbers. You'll > probably need in excess of tens of thousands of irqs/sec to even be able > to measure its overhead. (workqueues are driven by nice kernel threads > so there's no TLB overhead, etc.) It was the authors of the patch who

Re: [RFC][PATCH] multiple bugs in PI futexes

2007-06-05 Thread Alexey Kuznetsov
Hello! > We actually need to do something about this, as we might loop for ever > there. The robust cleanup code can fail (e.g. due to list corruption) > and we would see exit_state != 0 and the OWNER_DIED bit would never be > set, so we are stuck in a busy loop. Yes... It is possible to take

Re: [RFC][PATCH] multiple bugs in PI futexes

2007-06-05 Thread Alexey Kuznetsov
Hello! > Hmm, what means not expected ? -ESRCH is returned, when the owner task > is not found. This is not supposed to happen with robust futexes. glibc aborts (which is correct), or, for a build with debugging disabled, enters a simulated deadlock (which is confusing). > lock. Also using uval is

Re: [RFC][PATCH] multiple bugs in PI futexes

2007-05-23 Thread Alexey Kuznetsov
Hello! > #2 crash be explained via any of the bugs you fixed? (i.e. memory > corruption?) Yes, I found the reason; it is really fixed by taking tasklist_lock. This happens after a task struct with a not-yet-cleared pi_state_list is freed and the list of futex_pi_state's is corrupted. Meanwhile... two

[RFC][PATCH] multiple bugs in PI futexes

2007-05-07 Thread Alexey Kuznetsov
Hello! 1. New entries can be added to tsk->pi_state_list after the task has completed exit_pi_state_list(). The result is memory leakage and deadlocks. 2. handle_mm_fault() is called under a spinlock. The result is obvious. 3. The state machine is broken. The kernel thinks it owns the futex after it released

Re: dst_ifdown breaks infiniband?

2007-03-19 Thread Alexey Kuznetsov
Hello! > This might work. Could you post a patch to better show what you mean to do? Here it is. ->neigh_destructor() is killed (not used), replaced with ->neigh_cleanup(), which is called when neighbor entry goes to dead state. At this point everything is still valid: neigh->dev, neigh->parms

Re: dst_ifdown breaks infiniband?

2007-03-19 Thread Alexey Kuznetsov
Hello! > infiniband sets parm->neigh_destructor, and I search for a way to prevent > this destructor from being called after the module has been unloaded. > Ideas? It must be called in any case to update/release internal ipoib structures. The idea is to move call of parm->neigh_destructor from

Re: dst_ifdown breaks infiniband?

2007-03-19 Thread Alexey Kuznetsov
Hello! > If a device driver sets neigh_destructor in neigh_params, this could > get called after the device has been unregistered and the driver module > removed. It is the same problem: if dst->neighbour holds neighbour, it should not hold device. parms->dev is not supposed to be used after

Re: [ofa-general] Re: dst_ifdown breaks infiniband?

2007-03-19 Thread Alexey Kuznetsov
Hello! > I think the thing to do is to just leave the loopback references > in place, try to unregister the per-namespace loopback device, > and that will safely wait for all the references to go away. Yes, it is exactly how it works in openvz. All the sockets are killed, queues are cleared,

Re: [ofa-general] Re: dst_ifdown breaks infiniband?

2007-03-19 Thread Alexey Kuznetsov
Hello! > Does this look sane (untested)? It does not, unfortunately. Instead of regular crash in infiniband you will get numerous random NULL pointer dereferences both due to dst->neighbour and due to dst->dev. Alexey - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in

Re: [ofa-general] Re: dst_ifdown breaks infiniband?

2007-03-19 Thread Alexey Kuznetsov
Hello! > Well I don't think the loopback device is currently but as soon > as we get network namespace support we will have multiple loopback > devices and they will get unregistered when we remove the network > namespace. There is no logical difference. At the moment when namespace is gone

Re: dst_ifdown breaks infiniband?

2007-03-18 Thread Alexey Kuznetsov
Hello! > > It should be cleared and we should be sure it will not be destroyed > > before quiescent state. > > I'm confused. didn't you say dst_ifdown is called after quiescent state? Quiescent state should happen after dst->neighbour is invalidated. And this implies that all the users of

Re: dst_ifdown breaks infiniband?

2007-03-18 Thread Alexey Kuznetsov
Hello! > This is not new code, and should have triggered long time ago, > so I am not sure how come we are triggering this only now, > but somehow this did not lead to crashes in 2.6.20 I see. I guess this was plain luck. > Why is neighbour->dev changed here? It holds reference to device and

Re: dst_ifdown breaks infiniband?

2007-03-18 Thread Alexey Kuznetsov
Hello! > Hmm. Something I don't understand: does the code > in question not run on *each* device unregister? It does. > Why do I only see this under stress? You should have some referenced destination entries to trigger bad path. This should happen not only under stress. F.e. just try to ssh

Re: [PATCH] Don't map random pages if swapoff errors

2007-01-19 Thread Alexey Kuznetsov
Hello! > Getting an error there is all the more reason to proceed > with the swapoff, not to give up and break out of it. Yes, from this viewpoint more reasonable approach would be to untie corresponding ptes from swap entry and mark them as invalid to trigger fault on access. Not even tried

Re: Possible BUG in IPv4 TCP window handling, all recent 2.4.x/2.6.x kernels

2005-09-02 Thread Alexey Kuznetsov
Hello! > Well, take a look at the double acks for 84439343, 84440447 and 84441059, > they seem pretty much identical to me. It is just a little tcpdump glitch. 19:34:54.532271 < 10.2.20.246.33060 > 65.171.224.182.8700: . 44:44(0) ack 84439343 win 24544 (DF) (ttl 64, id 60946)

Re: Possible BUG in IPv4 TCP window handling, all recent 2.4.x/2.6.x kernels

2005-09-02 Thread Alexey Kuznetsov
Hello! > This is where things start going bad. The window starts shrinking from > 15340 all the way down to 2355 over the course of 0.3 seconds. Notice the > many duplicate acks that serve no purpose These are not duplicates; the TCP_NODELAY sender just starts flooding tiny segments, and those are

Re: Possible BUG in IPv4 TCP window handling, all recent 2.4.x/2.6.x kernels

2005-09-02 Thread Alexey Kuznetsov
Hello! > I wonder if clamping the window though is too harsh. Maybe just > setting the rcv_ssthresh down is better? It is too harsh. This was invented before we learned how to collapse received data; at that time tiny segments were fatal and clamping was the last weapon against misbehaving

Re: Possible BUG in IPv4 TCP window handling, all recent 2.4.x/2.6.x kernels

2005-09-02 Thread Alexey Kuznetsov
Hello! > I experienced the very same problem but with window size going all the > way down to just a few bytes (14 bytes). dump files available upon > requests :) I do request. TCP is not allowed to reduce window to a value less than 2*MSS no matter how hard network device or peer try to

Re: Possible BUG in IPv4 TCP window handling, all recent 2.4.x/2.6.x kernels

2005-09-02 Thread Alexey Kuznetsov
Hello! > If you overflow the socket's memory bound, it ends up calling > tcp_clamp_window(). (I'm not sure this is really the right thing to do > here before trying to collapse the queue.) Collapsing is too expensive a procedure; it is rather an emergency measure. So, tcp collapses the queue when

Re: [PATCH] ethernet-bridge: update skb->priority in case forwarded frame has VLAN-header

2005-03-07 Thread Alexey Kuznetsov
Hello! > If this packet came in from an 802.1Q VLAN device, the VLAN code already > has the logic necessary to map the .1q priority to an arbitrary > skb->priority. Actually, the patch makes sense when it is a straight ethernet bridge not involving full parsing of VLAN. I guess the case when the
