[PATCH] ipsum_calc_block: Optimize size and speed

2010-04-23 Thread Joakim Tjernlund
This is a much simpler and efficent impl. of the IP checksum. It is a dry port from Quagga and I have not tested it. --- lib/checksum.c | 44 +++- 1 files changed, 11 insertions(+), 33 deletions(-) diff --git a/lib/checksum.c b/lib/checksum.c index

Re: [PATCH] ipsum_calc_block: Optimize size and speed

2010-04-23 Thread Martin Mares
Hello! This is a much simpler and efficent impl. of the IP checksum. It is a dry port from Quagga and I have not tested it. Are you convinced that your version is more efficient? The original version processes 32 bits at a time, while your code does only 16 bits at a time. It might be worth

Re: [PATCH] ipsum_calc_block: Optimize size and speed

2010-04-23 Thread Joakim Tjernlund
Martin Mares m...@ucw.cz wrote on 2010/04/23 15:03:34: Hello! This is a much simpler and efficent impl. of the IP checksum. It is a dry port from Quagga and I have not tested it. Are you convinced that your version is more efficient? The original version processes 32 bits at a time,

Re: [PATCH] ipsum_calc_block: Optimize size and speed

2010-04-23 Thread Martin Mares
Hello! Fairly, I once had the same idea for Quagga but found all those extra tests and additions were much slower(I benched it). Just look at the number of extra ops one has to do in the current code. If you want to do it faster you have to go ASM. It would be easy to add support for

Re: [PATCH] ipsum_calc_block: Optimize size and speed

2010-04-23 Thread Martin Mares
Hello! But you can't get rid of: z + (z sum) which is the real bottleneck. Perhaps this doesn't cost much on high end CPUs but it sure does on embedded CPUs Why should it be? It can be compiled as a sequence of add with carry instructions, can't it? Have

Re: [PATCH] ipsum_calc_block: Optimize size and speed

2010-04-23 Thread Joakim Tjernlund
Hello! But you can't get rid of: z + (z sum) which is the real bottleneck. Perhaps this doesn't cost much on high end CPUs but it sure does on embedded CPUs Why should it be? It can be compiled as a sequence of add with carry instructions, can't it? Yes, but have you seen gcc do

Re: [PATCH] ipsum_calc_block: Optimize size and speed

2010-04-23 Thread Joakim Tjernlund
Joakim Tjernlund/Transmode wrote on 2010/04/23 16:14:58: Hello! But you can't get rid of: z + (z sum) which is the real bottleneck. Perhaps this doesn't cost much on high end CPUs but it sure does on embedded CPUs Why should it be? It can be compiled as a sequence of add

Re: [PATCH] ipsum_calc_block: Optimize size and speed

2010-04-23 Thread Martin Mares
Hello! Just tried this and it didn't with gcc 3.4.3 on PowerPC It would be better to let gcc unroll the loop (if it is critical for performance, it should be unrolled anyway) and use a newer version of gcc. Some arch does not have an add with carry insn(MIPS?) Well, first of all, we should

Re: [PATCH] ipsum_calc_block: Optimize size and speed

2010-04-23 Thread Ondrej Filip
On 23.4.2010 18:32, Ondrej Zajicek wrote: On Fri, Apr 23, 2010 at 03:20:55PM +0200, Martin Mares wrote: Hello! Fairly, I once had the same idea for Quagga but found all those extra tests and additions were much slower(I benched it). Just look at the number of extra ops one has to do in

Re: [PATCH] ipsum_calc_block: Optimize size and speed

2010-04-23 Thread Joakim Tjernlund
Ondrej Filip fe...@network.cz wrote on 2010/04/23 18:41:57: On 23.4.2010 18:32, Ondrej Zajicek wrote: On Fri, Apr 23, 2010 at 03:20:55PM +0200, Martin Mares wrote: Hello! Fairly, I once had the same idea for Quagga but found all those extra tests and additions were much slower(I

Re: [PATCH] ipsum_calc_block: Optimize size and speed

2010-04-23 Thread Martin Mares
Hello! So there isn't really difference in performance of both implementations. Even on slow embedded AMD Geode CPU, it gives ~ 180 MB/s. No difference? what does 1.2 mean? to me this means 20% which is a lot Yes, but according to Santiago's benchmarks, your code is sometimes 20%

Re: [PATCH] ipsum_calc_block: Optimize size and speed

2010-04-23 Thread Ondrej Filip
On 23.4.2010 19:01, Joakim Tjernlund wrote: Ondrej Filip fe...@network.cz wrote on 2010/04/23 18:41:57: On 23.4.2010 18:32, Ondrej Zajicek wrote: On Fri, Apr 23, 2010 at 03:20:55PM +0200, Martin Mares wrote: Hello! My primary reaction was If something isn't broken, don't fix it. I.e.,

Re: [PATCH] ipsum_calc_block: Optimize size and speed

2010-04-23 Thread Joakim Tjernlund
Martin Mares m...@ucw.cz wrote on 2010/04/23 19:23:18: Hello! So there isn't really difference in performance of both implementations. Even on slow embedded AMD Geode CPU, it gives ~ 180 MB/s. No difference? what does 1.2 mean? to me this means 20% which is a lot Yes, but

Re: [PATCH] ipsum_calc_block: Optimize size and speed

2010-04-23 Thread Martin Mares
Hello! Peformance matter, especially when the network grows. Is this the way BIRD development works? Only work on stuff that currently feels important is acceptet? Yes, performance matters. This is why performance optimizations in BIRD have to be justified by real numbers, not by hand-waving.

Re: [PATCH] ipsum_calc_block: Optimize size and speed

2010-04-23 Thread Joakim Tjernlund
Ondrej Filip fe...@network.cz wrote on 2010/04/23 19:28:27: On 23.4.2010 19:01, Joakim Tjernlund wrote: Ondrej Filip fe...@network.cz wrote on 2010/04/23 18:41:57: On 23.4.2010 18:32, Ondrej Zajicek wrote: On Fri, Apr 23, 2010 at 03:20:55PM +0200, Martin Mares wrote: Hello! My

Re: [PATCH] ipsum_calc_block: Optimize size and speed

2010-04-23 Thread Ondrej Filip
Again, BIRD is used on some very important places and therefore we are very conservative in accepting new patches. But I don't think we have NIH syndrome. We have been accepting foreign patches since beginning and Ondrej Zajicek is a good example. :-) He had sent me some new patches and later

Re: [PATCH] ipsum_calc_block: Optimize size and speed

2010-04-23 Thread Martin Mares
Hello! So basically you are saying that outsiders like my self aren't welcome because BIRD is so important to some IXPs that you don't want to take any chanches? Certainly not. However, it means that the criteria for accepting patches are somewhat stricter than in many other projects. All

Re: [PATCH] ipsum_calc_block: Optimize size and speed

2010-04-23 Thread Ondrej Zajicek
On Fri, Apr 23, 2010 at 07:40:28PM +0200, Joakim Tjernlund wrote: So there isn't really difference in performance of both implementations. Even on slow embedded AMD Geode CPU, it gives ~ 180 MB/s. No difference? what does 1.2 mean? to me this means 20% which is a lot Yes,

Re: [PATCH] ipsum_calc_block: Optimize size and speed

2010-04-23 Thread Ondrej Zajicek
On Fri, Apr 23, 2010 at 07:40:28PM +0200, Joakim Tjernlund wrote: Martin Mares m...@ucw.cz wrote on 2010/04/23 19:23:18: Hello! So there isn't really difference in performance of both implementations. Even on slow embedded AMD Geode CPU, it gives ~ 180 MB/s. No

Re: [PATCH] ipsum_calc_block: Optimize size and speed

2010-04-23 Thread Joakim Tjernlund
Ondrej Zajicek santi...@crfreenet.org wrote on 2010/04/23 21:39:06: On Fri, Apr 23, 2010 at 07:40:28PM +0200, Joakim Tjernlund wrote: Martin Mares m...@ucw.cz wrote on 2010/04/23 19:23:18: Hello! So there isn't really difference in performance of both implementations. Even

Re: [PATCH] ipsum_calc_block: Optimize size and speed

2010-04-23 Thread Joakim Tjernlund
Martin Mares m...@ucw.cz wrote on 2010/04/23 23:02:29: Hello! How did you come to the conclusion the the current code was better than the previous version? Seems like hand waving to me. Did I claim anywhere that the old code is better? I only pointed out You did when you commited it