Re: OSPF performance/SPF calculations

2010-04-23 Thread Joakim Tjernlund
Ondrej Zajicek santi...@crfreenet.org wrote on 2010/04/21 20:15:07: On Wed, Apr 21, 2010 at 09:41:47AM +0200, Joakim Tjernlund wrote: I am using Quagga ATM but I had a quick look at BIRD and I got a few observations. Hello. Thank you for your tips and notes. The LSA/checksum

Re: OSPF performance/SPF calculations

2010-04-23 Thread Joakim Tjernlund
Joakim Tjernlund/Transmode wrote on 2010/04/21 22:04:06: Ondrej Zajicek santi...@crfreenet.org wrote on 2010/04/21 20:15:07: On Wed, Apr 21, 2010 at 09:41:47AM +0200, Joakim Tjernlund wrote: I am using Quagga ATM but I had a quick look at BIRD and I got a few observations. Hello.

[PATCH] ipsum_calc_block: Optimize size and speed

2010-04-23 Thread Joakim Tjernlund
This is a much simpler and efficent impl. of the IP checksum. It is a dry port from Quagga and I have not tested it. --- lib/checksum.c | 44 +++- 1 files changed, 11 insertions(+), 33 deletions(-) diff --git a/lib/checksum.c b/lib/checksum.c index

Re: OSPF performance/SPF calculations

2010-04-23 Thread Ondrej Zajicek
On Fri, Apr 23, 2010 at 08:27:40AM +0200, Joakim Tjernlund wrote: Yes, but not at the moment. The endian problem should be addressed when you build the lsa. Does this help at all? In any case, the (int) cast should be there. Removing of endianity swap is correct only if the Fletcher

Re: OSPF performance/SPF calculations

2010-04-23 Thread Joakim Tjernlund
Ondrej Zajicek santi...@crfreenet.org wrote on 2010/04/23 11:52:26: From: Ondrej Zajicek santi...@crfreenet.org To: Joakim Tjernlund joakim.tjernl...@transmode.se Cc: bird-us...@trubka.network.cz Date: 2010/04/23 11:46 Subject: Re: OSPF performance/SPF calculations On Fri, Apr 23, 2010

Re: possible bug: bgp md5 authentication and multiple source ip addresses

2010-04-23 Thread Joakim Tjernlund
On Fri, Apr 23, 2010 at 10:13:32AM +0200, Wolfgang Hennerbichler wrote: now I've setup BIRD to peer on the different source interfaces and from different ASes to simulate productive routers: protocol bgp R1 { debug all; local as 1120; neighbor 193.203.0.3 as 1267; import

Re: OSPF performance/SPF calculations

2010-04-23 Thread Ondrej Zajicek
On Fri, Apr 23, 2010 at 01:06:20PM +0200, Joakim Tjernlund wrote: I must be missing something then(not surprising as I just started looking at BIRD). Why do you need the separate allocation for the body of the LSA then? Why not just adding entries to the allocated LSA header? Ahh, I am

Re: [PATCH] ipsum_calc_block: Optimize size and speed

2010-04-23 Thread Martin Mares
Hello! This is a much simpler and efficent impl. of the IP checksum. It is a dry port from Quagga and I have not tested it. Are you convinced that your version is more efficient? The original version processes 32 bits at a time, while your code does only 16 bits at a time. It might be worth

Re: OSPF performance/SPF calculations

2010-04-23 Thread Joakim Tjernlund
Ondrej Zajicek santi...@crfreenet.org wrote on 2010/04/23 14:22:18: On Fri, Apr 23, 2010 at 01:06:20PM +0200, Joakim Tjernlund wrote: I must be missing something then(not surprising as I just started looking at BIRD). Why do you need the separate allocation for the body of the LSA

Re: [PATCH] ipsum_calc_block: Optimize size and speed

2010-04-23 Thread Joakim Tjernlund
Martin Mares m...@ucw.cz wrote on 2010/04/23 15:03:34: Hello! This is a much simpler and efficent impl. of the IP checksum. It is a dry port from Quagga and I have not tested it. Are you convinced that your version is more efficient? The original version processes 32 bits at a time,

Re: [PATCH] ipsum_calc_block: Optimize size and speed

2010-04-23 Thread Martin Mares
Hello! Fairly, I once had the same idea for Quagga but found all those extra tests and additions were much slower(I benched it). Just look at the number of extra ops one has to do in the current code. If you want to do it faster you have to go ASM. It would be easy to add support for

Re: OSPF performance/SPF calculations

2010-04-23 Thread Martin Mares
Hello! But it slows down the fletcher checksum as one need to test and extra calculations because of this. Any gain by the separation is lost many times over in the fletcher checksum which could be as simple as(from Quagga with my tweaks): Again, do you have any real numbers backing this

Re: [PATCH] ipsum_calc_block: Optimize size and speed

2010-04-23 Thread Martin Mares
Hello! But you can't get rid of: z + (z sum) which is the real bottleneck. Perhaps this doesn't cost much on high end CPUs but it sure does on embedded CPUs Why should it be? It can be compiled as a sequence of add with carry instructions, can't it? Have

Re: [PATCH] ipsum_calc_block: Optimize size and speed

2010-04-23 Thread Joakim Tjernlund
Hello! But you can't get rid of: z + (z sum) which is the real bottleneck. Perhaps this doesn't cost much on high end CPUs but it sure does on embedded CPUs Why should it be? It can be compiled as a sequence of add with carry instructions, can't it? Yes, but have you seen gcc do

Re: OSPF performance/SPF calculations

2010-04-23 Thread Joakim Tjernlund
Ondrej Zajicek santi...@crfreenet.org wrote on 2010/04/23 16:09:00: On Fri, Apr 23, 2010 at 10:54:33AM +0200, Joakim Tjernlund wrote: Removing of endianity swap is correct only if the Fletcher checksum would return the same value regardless of endianity swap. Is this a property of the

Re: [PATCH] ipsum_calc_block: Optimize size and speed

2010-04-23 Thread Joakim Tjernlund
Joakim Tjernlund/Transmode wrote on 2010/04/23 16:14:58: Hello! But you can't get rid of: z + (z sum) which is the real bottleneck. Perhaps this doesn't cost much on high end CPUs but it sure does on embedded CPUs Why should it be? It can be compiled as a sequence of add

Re: OSPF performance/SPF calculations

2010-04-23 Thread Ondrej Zajicek
On Fri, Apr 23, 2010 at 04:12:27PM +0200, Joakim Tjernlund wrote: As i looked on the Fletcher checksum, it seems that you cannot just swap the result instead of swapping the checked data. Then there is a bug else where. Fletcher as such does not depend on host endian. It operates on bytes

Re: [PATCH] ipsum_calc_block: Optimize size and speed

2010-04-23 Thread Martin Mares
Hello! Just tried this and it didn't with gcc 3.4.3 on PowerPC It would be better to let gcc unroll the loop (if it is critical for performance, it should be unrolled anyway) and use a newer version of gcc. Some arch does not have an add with carry insn(MIPS?) Well, first of all, we should

Re: OSPF performance/SPF calculations

2010-04-23 Thread Joakim Tjernlund
On Fri, Apr 23, 2010 at 04:12:27PM +0200, Joakim Tjernlund wrote: As i looked on the Fletcher checksum, it seems that you cannot just swap the result instead of swapping the checked data. Then there is a bug else where. Fletcher as such does not depend on host endian. It operates on

Re: OSPF performance/SPF calculations

2010-04-23 Thread Ondrej Zajicek
On Fri, Apr 23, 2010 at 05:29:09PM +0200, Joakim Tjernlund wrote: But you also need LSAs in host endianity when doing SPF calculation. Although it would be probably possible to change SPF calculation to use directly BE values it would be huge work and it is questionable whether it wouldn't

Re: [PATCH] ipsum_calc_block: Optimize size and speed

2010-04-23 Thread Ondrej Filip
On 23.4.2010 18:32, Ondrej Zajicek wrote: On Fri, Apr 23, 2010 at 03:20:55PM +0200, Martin Mares wrote: Hello! Fairly, I once had the same idea for Quagga but found all those extra tests and additions were much slower(I benched it). Just look at the number of extra ops one has to do in

Re: [PATCH] ipsum_calc_block: Optimize size and speed

2010-04-23 Thread Joakim Tjernlund
Ondrej Filip fe...@network.cz wrote on 2010/04/23 18:41:57: On 23.4.2010 18:32, Ondrej Zajicek wrote: On Fri, Apr 23, 2010 at 03:20:55PM +0200, Martin Mares wrote: Hello! Fairly, I once had the same idea for Quagga but found all those extra tests and additions were much slower(I

Re: OSPF performance/SPF calculations

2010-04-23 Thread Ondrej Filip
On 23.4.2010 14:22, Ondrej Zajicek wrote: On Fri, Apr 23, 2010 at 01:06:20PM +0200, Joakim Tjernlund wrote: I must be missing something then(not surprising as I just started looking at BIRD). Why do you need the separate allocation for the body of the LSA then? Why not just adding entries to

Re: OSPF performance/SPF calculations

2010-04-23 Thread Joakim Tjernlund
Ondrej Zajicek santi...@crfreenet.org wrote on 2010/04/23 18:46:19: On Fri, Apr 23, 2010 at 05:29:09PM +0200, Joakim Tjernlund wrote: But you also need LSAs in host endianity when doing SPF calculation. Although it would be probably possible to change SPF calculation to use directly BE

Re: [PATCH] ipsum_calc_block: Optimize size and speed

2010-04-23 Thread Martin Mares
Hello! So there isn't really difference in performance of both implementations. Even on slow embedded AMD Geode CPU, it gives ~ 180 MB/s. No difference? what does 1.2 mean? to me this means 20% which is a lot Yes, but according to Santiago's benchmarks, your code is sometimes 20%

Re: [PATCH] ipsum_calc_block: Optimize size and speed

2010-04-23 Thread Ondrej Filip
On 23.4.2010 19:01, Joakim Tjernlund wrote: Ondrej Filip fe...@network.cz wrote on 2010/04/23 18:41:57: On 23.4.2010 18:32, Ondrej Zajicek wrote: On Fri, Apr 23, 2010 at 03:20:55PM +0200, Martin Mares wrote: Hello! My primary reaction was If something isn't broken, don't fix it. I.e.,

Re: [PATCH] ipsum_calc_block: Optimize size and speed

2010-04-23 Thread Joakim Tjernlund
Martin Mares m...@ucw.cz wrote on 2010/04/23 19:23:18: Hello! So there isn't really difference in performance of both implementations. Even on slow embedded AMD Geode CPU, it gives ~ 180 MB/s. No difference? what does 1.2 mean? to me this means 20% which is a lot Yes, but

Re: [PATCH] ipsum_calc_block: Optimize size and speed

2010-04-23 Thread Martin Mares
Hello! Peformance matter, especially when the network grows. Is this the way BIRD development works? Only work on stuff that currently feels important is acceptet? Yes, performance matters. This is why performance optimizations in BIRD have to be justified by real numbers, not by hand-waving.

Re: [PATCH] ipsum_calc_block: Optimize size and speed

2010-04-23 Thread Joakim Tjernlund
Ondrej Filip fe...@network.cz wrote on 2010/04/23 19:28:27: On 23.4.2010 19:01, Joakim Tjernlund wrote: Ondrej Filip fe...@network.cz wrote on 2010/04/23 18:41:57: On 23.4.2010 18:32, Ondrej Zajicek wrote: On Fri, Apr 23, 2010 at 03:20:55PM +0200, Martin Mares wrote: Hello! My

Re: [PATCH] ipsum_calc_block: Optimize size and speed

2010-04-23 Thread Ondrej Filip
Again, BIRD is used on some very important places and therefore we are very conservative in accepting new patches. But I don't think we have NIH syndrome. We have been accepting foreign patches since beginning and Ondrej Zajicek is a good example. :-) He had sent me some new patches and later

Re: [PATCH] ipsum_calc_block: Optimize size and speed

2010-04-23 Thread Martin Mares
Hello! So basically you are saying that outsiders like my self aren't welcome because BIRD is so important to some IXPs that you don't want to take any chanches? Certainly not. However, it means that the criteria for accepting patches are somewhat stricter than in many other projects. All

Re: [PATCH] ipsum_calc_block: Optimize size and speed

2010-04-23 Thread Ondrej Zajicek
On Fri, Apr 23, 2010 at 07:40:28PM +0200, Joakim Tjernlund wrote: So there isn't really difference in performance of both implementations. Even on slow embedded AMD Geode CPU, it gives ~ 180 MB/s. No difference? what does 1.2 mean? to me this means 20% which is a lot Yes,

Re: [PATCH] ipsum_calc_block: Optimize size and speed

2010-04-23 Thread Ondrej Zajicek
On Fri, Apr 23, 2010 at 07:40:28PM +0200, Joakim Tjernlund wrote: Martin Mares m...@ucw.cz wrote on 2010/04/23 19:23:18: Hello! So there isn't really difference in performance of both implementations. Even on slow embedded AMD Geode CPU, it gives ~ 180 MB/s. No

Re: [PATCH] ipsum_calc_block: Optimize size and speed

2010-04-23 Thread Joakim Tjernlund
Ondrej Zajicek santi...@crfreenet.org wrote on 2010/04/23 21:39:06: On Fri, Apr 23, 2010 at 07:40:28PM +0200, Joakim Tjernlund wrote: Martin Mares m...@ucw.cz wrote on 2010/04/23 19:23:18: Hello! So there isn't really difference in performance of both implementations. Even

Re: [PATCH] ipsum_calc_block: Optimize size and speed

2010-04-23 Thread Joakim Tjernlund
Martin Mares m...@ucw.cz wrote on 2010/04/23 23:02:29: Hello! How did you come to the conclusion the the current code was better than the previous version? Seems like hand waving to me. Did I claim anywhere that the old code is better? I only pointed out You did when you commited it