Re: [Gimp-developer] Solaris 64bit compile
Sven Neumann [EMAIL PROTECTED] wrote:

> The code is combining the multiplications done on 2 channels of the
> same pixel into one. It is also meant as an example of what can be
> done without using CPU-specific instructions.

Here's another example (4 x 8bit saturated addition):

    uint32 padd_sat_4x8(uint32 a, uint32 b)
    {
        uint32 ta, tb, tm, q, u, m;

        /* save overflow-causing bits in ta, tb */
        ta = a & 0x80808080;
        tb = b & 0x80808080;
        q = a + b - (ta + tb);

        /* determine overflow conditions */
        tm = ta | tb;
        u = (ta & tb) | (q & tm);

        /* u now contains overflow bits, propagate them over fields */
        m = (u << 1) - (u >> 7);
        return (q + tm - u) | m;
    }

This is completely portable, and should be a good deal faster than conditionally adding each component separately, at least on modern superscalar machines with expensive unpredicted branches. And benchmarks confirm this.

Extending the above to 8 x 8bit (using 64-bit integers) is trivial of course.

___
Gimp-developer mailing list
[EMAIL PROTECTED]
http://lists.xcf.berkeley.edu/mailman/listinfo/gimp-developer
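A minimal sketch of the 8 x 8bit extension mentioned above, assuming a C99 compiler that provides <stdint.h>. The thread itself contains no 64-bit version; the name padd_sat_8x8 and the use of uint64_t are illustrative choices, and the steps are simply the 32-bit ones with wider masks.

```c
#include <stdint.h>

/* Saturated addition of eight unsigned 8-bit fields packed into 64 bits.
 * Hypothetical widening of the 32-bit routine: same steps, wider masks. */
static uint64_t padd_sat_8x8(uint64_t a, uint64_t b)
{
    const uint64_t high = UINT64_C(0x8080808080808080);
    uint64_t ta, tb, tm, q, u, m;

    /* save the top bit of every field */
    ta = a & high;
    tb = b & high;

    /* sum of the low 7 bits of every field; cannot carry across fields */
    q = a + b - (ta + tb);

    /* a field overflows if both top bits are set, or if one top bit is
     * set and the 7-bit sum carried into bit 7 */
    tm = ta | tb;
    u = (ta & tb) | (q & tm);

    /* turn each overflow bit into a full 0xff field mask */
    m = (u << 1) - (u >> 7);

    return (q + tm - u) | m;
}
```

With two RGBA pixels packed into one 64-bit word, a single call would saturate all eight channels at once. Whether this beats two 32-bit calls on a given machine would have to be measured.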
Re: [Gimp-developer] Solaris 64bit compile
On 06 Sep 2001 10:19:28 +0200, Mattias Engdegård wrote:

> This is completely portable, and should be a good deal faster than
> conditionally adding each component separately, at least on modern
> superscalar machines with expensive unpredicted branches. And
> benchmarks confirm this

I like it. I did some benchmarking of a few routines with different compilers on ppc and i686, and here are the results:

    egger@sonja:~/test > time ./testmat
    Time needed for padd_sat_4x8 in clocks: 555
    Time needed for padd_sat_4x8_and in clocks: 664
    Time needed for padd_sat_4x8_norm in clocks: 709

    real    0m21.046s
    user    0m18.680s
    sys     0m0.650s

Options to compile: /opt/experimental/bin/gcc -O3 -fssa -save-temps test.c -o testmat

    egger@sonja:~/test > time ./testmat
    Time needed for padd_sat_4x8 in clocks: 555
    Time needed for padd_sat_4x8_and in clocks: 584
    Time needed for padd_sat_4x8_norm in clocks: 784
    Time needed for padd_sat_4x8_vec in clocks: 178

    real    0m21.477s
    user    0m20.520s
    sys     0m0.530s

Same machine but gcc-2.95.3 with Altivec support. Options to compile: /opt/gcc-altivec/bin/gcc -O3 -mcpu=7400 -fvec -save-temps test.c -o testmat

    egger@alex:~ > time ./testmat
    Time needed for padd_sat_4x8 in clocks: 883
    Time needed for padd_sat_4x8_and in clocks: 1073
    Time needed for padd_sat_4x8_norm in clocks: 1101

    real    0m30.614s
    user    0m30.370s
    sys     0m0.210s

This machine is a Duron-800 with 1GB RAM. I've no idea why it performs so poorly compared to the G4. The compiler was gcc 2.95.3 with -march=i686 and -mcpu=i686; however, it didn't use the conditional move instructions of the higher Pentium CPUs, which should have sped up the _norm case considerably, as they make it possible to do the same without branches.

The source is attached; feel free to study it and provide faster code. At the moment it is pretty clear that Mattias' code is pretty efficient and compiles equally well with several compilers on several architectures.
Servus, Daniel

    #include <stdio.h>
    #include <glib-1.2/glib.h>
    #include <time.h>

    static guint32 dest[2000] __attribute__ ((aligned (16)));
    static guint32 source1[2000] __attribute__ ((aligned (16)));
    static guint32 source2[2000] __attribute__ ((aligned (16)));

    /* Packed version: all four channels in one word, no branches. */
    static inline void padd_sat_4x8 (guint32 *dest, guint32 *pa, guint32 *pb)
    {
      guint32 a = *pa, b = *pb;
      guint32 ta, tb, tm, q, u, m;

      /* save overflow-causing bits in ta, tb */
      ta = a & 0x80808080;
      tb = b & 0x80808080;
      q = a + b - (ta + tb);

      /* determine overflow conditions */
      tm = ta | tb;
      u = (ta & tb) | (q & tm);

      /* u now contains overflow bits, propagate them over fields */
      m = (u << 1) - (u >> 7);
      *dest = ((q + tm - u) | m);
    }

    /* Straightforward version: one conditional clamp per channel. */
    static inline void padd_sat_4x8_norm (guint32 *dest, guint32 *pa, guint32 *pb)
    {
      guint8 *newdest = (guint8 *) dest;
      guint16 dr, dg, db, da;
      guint8 r1 = *((guint8 *) (pa) + 0);
      guint8 g1 = *((guint8 *) (pa) + 1);
      guint8 b1 = *((guint8 *) (pa) + 2);
      guint8 a1 = *((guint8 *) (pa) + 3);
      guint8 r2 = *((guint8 *) (pb) + 0);
      guint8 g2 = *((guint8 *) (pb) + 1);
      guint8 b2 = *((guint8 *) (pb) + 2);
      guint8 a2 = *((guint8 *) (pb) + 3);

      dr = r1 + r2;
      dg = g1 + g2;
      db = b1 + b2;
      da = a1 + a2;

      newdest[0] = dr > 255 ? 255 : dr;
      newdest[1] = dg > 255 ? 255 : dg;
      newdest[2] = db > 255 ? 255 : db;
      newdest[3] = da > 255 ? 255 : da;
    }

    /* Shift-and-mask version: per-channel sums, clamped with a mask
       instead of a branch.  newdest[0] receives the most significant
       byte, which matches the other routines only on big-endian
       machines such as the PPC. */
    static inline void padd_sat_4x8_and (guint32 *dest, guint32 *pa, guint32 *pb)
    {
      guint32 s1 = *pa, s2 = *pb;
      guint16 dr, dg, db, da;
      guint8 *newdest = (guint8 *) dest;
      guint8 scratch;

      /* each term is parenthesised: '+' binds tighter than '&' */
      dr = ((s1 >> 24) & 0xff) + ((s2 >> 24) & 0xff);
      dg = ((s1 >> 16) & 0xff) + ((s2 >> 16) & 0xff);
      db = ((s1 >> 8) & 0xff) + ((s2 >> 8) & 0xff);
      da = (s1 & 0xff) + (s2 & 0xff);

      /* (d >> 8) is 1 on overflow, so the mask becomes 0xff; otherwise 0 */
      newdest[0] = (guint8) (~((dr >> 8) - 1)) | dr;
      newdest[1] = (guint8) (~((dg >> 8) - 1)) | dg;
      newdest[2] = (guint8) (~((db >> 8) - 1)) | db;
      newdest[3] = (guint8) (~((da >> 8) - 1)) | da;
    }

    #ifdef __VEC__
    /* Altivec version: one saturating add over a 16-byte vector. */
    static inline void padd_sat_4x8_vec (guint32 *dest, guint32 *pa, guint32 *pb)
    {
      vector unsigned char vdest, source1, source2;

      source1 = vec_ld (0, (unsigned char *) pa);
      source2 = vec_ld (0, (unsigned char *) pb);
      vdest = vec_adds (source1, source2);
      vec_st (vdest, 0, (unsigned char *) dest);
    }
    #endif

    int main (void)
    {
      int i, current, iter;

      current = clock ();
      for (iter = 0; iter < 10; iter++)
        {
          for (i = 0; i < 2000; i++)
            {
              padd_sat_4x8 (dest + i, source1 + i, source2 + i);
            }
        }
      current = clock () - current;
      printf ("Time needed for padd_sat_4x8 in clocks: %i\n", current);

      current = clock ();
      for (iter = 0; iter < 10; iter++)
        {
          for (i = 0; i < 2000; i++)
            {
              padd_sat_4x8_and (dest + i, source1 + i, source2 + i);
            }
        }
      current = clock () - current;
      printf ("Time needed for padd_sat_4x8_and in clocks: %i\n", current);

      current = clock ();
      for (iter = 0; iter < 10; iter++)
        {
          for (i = 0; i < 2000; i++)
            {
              padd_sat_4x8_norm (dest + i, source1 + i, source2 +

[the archived attachment is truncated here]
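The benchmark above measures speed only. A small cross-check like the following could confirm that the packed routine agrees with a plain per-channel clamp before any timing is trusted. This is a sketch under assumptions, not part of the attachment: it uses <stdint.h> types instead of glib ones, the value-returning form of padd_sat_4x8 from earlier in the thread, and made-up helper names (padd_sat_4x8_ref, count_mismatches).

```c
#include <stdint.h>

/* Packed saturated add, as posted earlier in the thread. */
static uint32_t padd_sat_4x8(uint32_t a, uint32_t b)
{
    uint32_t ta, tb, tm, q, u, m;

    ta = a & 0x80808080u;
    tb = b & 0x80808080u;
    q = a + b - (ta + tb);
    tm = ta | tb;
    u = (ta & tb) | (q & tm);
    m = (u << 1) - (u >> 7);
    return (q + tm - u) | m;
}

/* Reference: clamp each 8-bit field separately.  Uses shifts rather than
 * byte pointers, so the result does not depend on byte order. */
static uint32_t padd_sat_4x8_ref(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    int shift;

    for (shift = 0; shift < 32; shift += 8) {
        uint32_t sum = ((a >> shift) & 0xffu) + ((b >> shift) & 0xffu);
        if (sum > 255u)
            sum = 255u;
        result |= sum << shift;
    }
    return result;
}

/* Compare both routines on n pseudo-random input pairs (simple linear
 * congruential generators, chosen only for reproducibility) and return
 * the number of disagreements. */
static unsigned long count_mismatches(unsigned long n)
{
    uint32_t x = 12345u, y = 67890u;
    unsigned long i, bad = 0;

    for (i = 0; i < n; i++) {
        x = x * 1664525u + 1013904223u;
        y = y * 22695477u + 1u;
        if (padd_sat_4x8(x, y) != padd_sat_4x8_ref(x, y))
            bad++;
    }
    return bad;
}
```

A call such as count_mismatches(100000) should return 0 if the two routines agree on every sampled pair. The same comparison could be pointed at the _and and _norm variants, keeping in mind that _and orders its bytes for a big-endian machine.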
Re: [Gimp-developer] Solaris 64bit compile
Sven Neumann [EMAIL PROTECTED] wrote:

>   __u32 __rb = (((color.r)<<16) | (color.b));
>   __u32 __g  = ((color.g)<<8);
>
>   switch (a) {\
>        case 0xff: *(d) = (0xff000000 | __rb | __g); \
>        case 0: break; \
>        default: {\
>             __u32 pixel = *(d);\
>             __u16 s = (a)+1;\
>             register __u32 t1,t2; \
>             t1 = (pixel&0x00ff00ff); t2 = (pixel&0x0000ff00); \
>             pixel = ((((__rb-t1)*s+(t1<<8)) & 0xff00ff00) + \
>                      ((( __g-t2)*s+(t2<<8)) & 0x00ff0000)) >> 8; \
>             *(d) = pixel;\
>        }\
>   }
>
> if you think this looks ugly, you should have a look at the same
> function for RGB16 and RGB15 ;-)

I don't think they are that bad --- the readability of the above code merely suffers from a pollution of backslashes and underscores.

But the general principle is useful and it's not hard to do parallel saturating additions and subtractions without any branches at all, just using bit fiddling. Many modern architectures can do better with vector instructions but generic fallback code is of course always needed.
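The message mentions branch-free saturating subtraction without showing it. One possible sketch, which is not necessarily the method the poster had in mind, derives it from the saturated addition posted in this thread: complementing a field turns x into 255 - x, so ~padd_sat(~a, b) gives max(0, a - b) in every field. The name psub_sat_4x8 and the <stdint.h> types are assumptions made for this example.

```c
#include <stdint.h>

/* Packed saturated add of four 8-bit fields, as posted in this thread. */
static uint32_t padd_sat_4x8(uint32_t a, uint32_t b)
{
    uint32_t ta, tb, tm, q, u, m;

    ta = a & 0x80808080u;
    tb = b & 0x80808080u;
    q = a + b - (ta + tb);
    tm = ta | tb;
    u = (ta & tb) | (q & tm);
    m = (u << 1) - (u >> 7);
    return (q + tm - u) | m;
}

/* Saturated subtraction, clamping each field at 0.
 * Per field: ~a is 255 - a, so the saturated sum is min(255, 255 - a + b),
 * and complementing that again yields max(0, a - b). */
static uint32_t psub_sat_4x8(uint32_t a, uint32_t b)
{
    return ~padd_sat_4x8(~a, b);
}
```

This costs two extra complements on top of the addition. A dedicated borrow-tracking routine might be shorter, but reusing the addition keeps the reasoning easy to check.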
Re: [Gimp-developer] Solaris 64bit compile
Hi,

Mattias Engdegård [EMAIL PROTECTED] writes:

> I don't think they are that bad --- the readability of the above code
> merely suffers from a pollution of backslashes and underscores.

I took this code out of a macro and forgot to remove the backslashes. Also we'd use different types since glib defines guint8, guint16 and guint32. I chose not to show off the code for RGB16 and RGB15 since the masks and shift values used there make the code hard to understand, while the RGB32 and ARGB cases are more obvious. Also, fortunately, we don't do RGB16 in gimp.

Salut, Sven
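For readers who want to try the RGB32 blend outside its macro, here is a sketch of it as an ordinary function with the backslashes removed. Several things are assumptions rather than anything stated in the thread: the function name blend_rgb32, the value-returning interface, the <stdint.h> types, and the mask constants, which are written out for an xRGB layout (red in bits 16-23, green in bits 8-15, blue in bits 0-7).

```c
#include <stdint.h>

/* Blend colour (r, g, b) over destination pixel dst with alpha a (0..255).
 * Red and blue share one multiplication, green gets its own. */
static uint32_t blend_rgb32(uint32_t dst, uint8_t r, uint8_t g, uint8_t b,
                            uint8_t a)
{
    uint32_t rb = ((uint32_t) r << 16) | b;   /* red and blue, 16 bits apart */
    uint32_t gg = (uint32_t) g << 8;          /* green alone */
    uint32_t s, t1, t2;

    if (a == 0xff)                /* fully opaque: plain overwrite */
        return 0xff000000u | rb | gg;
    if (a == 0)                   /* fully transparent: nothing to do */
        return dst;

    s = (uint32_t) a + 1;         /* scale factor, 2..255 */
    t1 = dst & 0x00ff00ffu;       /* destination red and blue */
    t2 = dst & 0x0000ff00u;       /* destination green */

    /* Per channel: (src - dst) * s + dst * 256, then keep the high byte.
     * Each channel's result fits in 16 bits, so nothing spills into the
     * neighbouring channel even though the subtraction wraps. */
    return ((((rb - t1) * s + (t1 << 8)) & 0xff00ff00u) +
            (((gg - t2) * s + (t2 << 8)) & 0x00ff0000u)) >> 8;
}
```

For example, blending white over black with a = 127 gives 0x7f in each colour channel. In the partially transparent case the top byte of the result is 0, since the masks and the final shift discard it.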
Re: [Gimp-developer] Solaris 64bit compile
On 03 Sep 2001 17:45:13 -0700, Brian Weber wrote:

> So if I understand you correctly, since the gimp's main purpose is to
> perform bitwise manipulation that 64 bit won't give much of a benefit.

No, it's doing bytewise operations on 8, 16, 24 or 32bit data, and it has to, because many processors cannot operate on several 8bit channels at once: they lack the CPU instructions to do so. It would be a nice speedup to work on 4x8bit at once on processors that support it, and with 64bit we could even do 2 RGBA pixels at once if the processor supports it.

> If someone were to go in and change the memory management scheme of
> the gimp to make use of the OS instead of gimp's tiles then we would
> probably see a benefit in the load time and save time.

Actually we considered using a chunked memory region instead of tiles. This also has several drawbacks, but it would probably speed up imaging a lot if enough memory is available (which is probably not an issue anymore with today's memory prices). In theory a tile-based system has huge advantages when working with pictures that are bigger than the available memory, because only the tiles being worked on have to be in physical memory.

> I don't know why but for some reason I got it in my head that 64 bit
> would be twice as fast for any application that was processor
> intensive.

It is (at least) twice as fast if you have data that needs 64bit precision, because emulated 64bit operation is really heavyweight. With imaging software one normally operates on byte or double-byte data, which is why big speedups aren't possible at the moment.

> Thinking that you can have, I don't know how many more, many more
> instructions and the system bus would be twice the width allowing
> twice the amount of data to go between ram and the CPUs. Maybe when I
> design my new processor I will include that in :)

If you do that, take a PPC-like design and add lots of units and a better dispatcher. :)

Servus, Daniel
Re: [Gimp-developer] Solaris 64bit compile
On 04 Sep 2001 15:51:34 +0200, Sven Neumann wrote:

> you certainly can process several 8 bit channels in one operation
> without special support from the processor and I would like to
> contribute such code to The GIMP

Uii, now I'm curious. If you group an RGBA pixel together in a word and want to add another one, what would the code look like so that it does proper saturation instead of smearing between the channels?

> but I'm waiting until the new paint_funcs code is in place.

Ayiie, now I'm really starting to feel guilty and I still cannot compile HEAD gimp :(

Servus, Daniel
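The "smearing" asked about here is easy to reproduce, and stopping it is the first half of any packed addition. The sketch below is illustrative only and both function names are made up. It contrasts a plain word addition with a per-channel addition that keeps carries inside each byte. The second one wraps around instead of saturating, so it shows only the carry isolation, not the clamping.

```c
#include <stdint.h>

/* Plain word addition: a carry out of one channel leaks into the next. */
static uint32_t add_naive(uint32_t a, uint32_t b)
{
    return a + b;
}

/* Per-channel addition that wraps within each 8-bit field.
 * The low 7 bits of every field are added normally (their carry lands in
 * bit 7 of the same field); the top bits are then combined with XOR, so
 * no carry ever crosses a field boundary. */
static uint32_t padd_wrap_4x8(uint32_t a, uint32_t b)
{
    uint32_t low = (a & 0x7f7f7f7fu) + (b & 0x7f7f7f7fu);
    return low ^ ((a ^ b) & 0x80808080u);
}
```

Adding 0x000000ff and 0x00000001 with add_naive gives 0x00000100: the overflow of the lowest channel shows up as a 1 in its neighbour. padd_wrap_4x8 gives 0x00000000 for the same inputs, leaving the neighbour untouched. A saturating version additionally has to detect the overflow and force the affected field to 0xff.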
Re: [Gimp-developer] Solaris 64bit compile
So if I understand you correctly, since the gimp's main purpose is to perform bitwise manipulation that 64 bit won't give much of a benefit. If someone were to go in and change the memory management scheme of the gimp to make use of the OS instead of gimp's tiles then we would probably see a benefit in the load time and save time. I could also see that breaking other OS's unless it was done to work across platforms.

I don't know why but for some reason I got it in my head that 64 bit would be twice as fast for any application that was processor intensive. Thinking that you can have, I don't know how many more, many more instructions and the system bus would be twice the width allowing twice the amount of data to go between ram and the CPUs. Maybe when I design my new processor I will include that in :)

----- Original Message -----
From: Daniel Egger [EMAIL PROTECTED]
To: Brian Weber [EMAIL PROTECTED]
Cc: GIMP developer [EMAIL PROTECTED]
Sent: Monday, September 03, 2001 8:21 AM
Subject: Re: [Gimp-developer] Solaris 64bit compile

> On 25 Aug 2001 22:21:58 -0700, Brian Weber wrote:
>
> > I did get a successful compile of 64 bit gimp. I ran it and used
> > some of the limited functionality that I usually use with no
> > problems at all.
>
> I wouldn't expect them either. I ran GIMP before on 64bit USparc, PPC
> and Itanium without any quirks.
>
> > The other thing I didn't see is any performance increase.
>
> Bad luck. :)
>
> > My machine is an Ultra 2 with dual 168 MHz and 192 Meg of ram. I
> > grabbed a relatively small bitmap and used the globe plugin for 10
> > iterations to make sure that I had enough time to get some
> > statistics. I wasn't using any swap space and one of the processors
> > was pegged about 3/4s of the time. The other processor got pinged a
> > couple times but that was probably more os and vmstat.
>
> Ok, two reasons for that: GIMP is not requesting a single big chunk
> of memory from the system but rather does its own memory management
> using a tile-based system. So if you do not excessively utilize your
> system or lie about your memory size then GIMP will probably never
> hit swap at all. The second CPU will only be used for tile rendering
> if GIMP is compiled with the MP option; however, it doesn't give much
> of a benefit, is rather untested and will definitely result in a
> deterioration on a single-CPU system because of the additional checks
> (which is also why it is not activated by default).
>
> > My question is, am I expecting too much from 64 bit?
>
> Yes you are.
>
> > Does the C code actually have to change to get the benefit from 64
> > bit?
>
> Yes. I don't exactly know how your machine is organized and how
> USPARC addresses memory, but it may be that differently aligned
> memory would show benefits here. Also I have some ideas how to
> optimize the GIMP to give bigger benefits on 64bit architectures but
> cannot benchmark them due to unavailability of those systems; the
> trick here would be to reorganize memory utilisation to use the full
> 64bit capabilities of your machine. Those architectures normally
> shine while handling big chunks of data and suffer when addressing
> small amounts of data (say bytewise) with a lot of instructions
> because they are clocked so slowly.
>
> > Is this the right list to be asking this question?
>
> Sure it is.
>
> Servus, Daniel
Re: [Gimp-developer] Solaris 64bit compile
Hi,

> My question is, am I expecting too much from 64 bit? Does the C code
> actually have to change to get the benefit from 64 bit? Is this

Sorry, you're expecting too much if you want an application like The GIMP to benefit from 64 bit. I'd say look for pure CPU speed and memory bandwidth, or SIMD/vector instructions if the code is optimized for them. 64 bit gives the most to applications that handle giant amounts of data or need very high precision.

Jens
[Gimp-developer] Solaris 64bit compile
All,

I hope this question hasn't been asked too many times before but I couldn't find it in the archives. I am using an Ultra 2 sparc with a 64 bit kernel. I was told by Sun that the only way to see a real benefit from 64 bit is to have both the kernel and the application compiled for 64 bit. That was the start of my adventure.

glib and gtk+ version 1.2.8 compiled without any problems using egcs with the following options: "-m64 -mcmodel=medlow -g -O2". I then went to compile the gimp version 1.2.2 with the same options. flarefx.c had some problems, but I figured that if I got 64 bit then I could do without one component. It compiled with a few warnings but I don't believe they are any different from the 32 bit warnings. I did get a successful compile of 64 bit gimp. I ran it and used some of the limited functionality that I usually use with no problems at all.

Now for the question. The other thing I didn't see is any performance increase. My machine is an Ultra 2 with dual 168 MHz and 192 Meg of ram. I grabbed a relatively small bitmap and used the globe plugin for 10 iterations to make sure that I had enough time to get some statistics. I wasn't using any swap space and one of the processors was pegged about 3/4s of the time. The other processor got pinged a couple times but that was probably more os and vmstat.

I also have a windows machine with gimp installed for windows. That machine has 64 Meg of ram and a single 266 MHz processor. I did a side by side test of 10 iterations with the globe plugin. The windows machine finished first, with the sparc still having 3 more iterations to go.

My question is, am I expecting too much from 64 bit? Does the C code actually have to change to get the benefit from 64 bit? Is this the right list to be asking this question?

I know this was a long email. I hope it was an appropriate question for this list and I appreciate any comments. If this is not a question for this list and you still have comments for me, just send them directly to me so as not to disturb the list any more than I already have.

Thanks
Brian