Re: [OpenJDK 2D-Dev] sun.java2D.Pisces renderer Performance and Memory enhancements
>> Also, this is with the Java version, right?

> Yes, my patch is pure java given as webrev:
> http://jmmc.fr/~bourgesl/share/java2d-pisces/webrev-1/

>> We got a decent 2x speedup in FX by porting the version of Open Pisces that you started with to C code (all except on Linux where we couldn't find the right gcc options to make it faster than Hotspot). So, we have yet to investigate a native version in the JDK which might provide even more gains…

Oleg did more analysis on this and it appears the reason hotspot on Linux was faster than the C version was because on Linux it is -server compiler (c2) whereas on Windows / Mac it is client compiler (c1). Possibly using -server on windows / mac would also have hotspot beating the C version, although that hasn't been tested.

Richard
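Since the c1 / c2 distinction matters for reading these numbers, a benchmark run can log which VM it is on. The following is only a minimal sketch: the class and method names are made up for illustration, `java.vm.name` is a standard system property, and `java.vm.info` is set by HotSpot but may be absent on other VMs.

```java
/** Hypothetical helper (not part of the patch) that reports which VM a benchmark runs on. */
final class VmInfo {

    /** Returns the VM name plus its mode string, e.g. a name containing "Server VM" on HotSpot c2. */
    static String describeVm() {
        String name = System.getProperty("java.vm.name"); // standard property
        String info = System.getProperty("java.vm.info"); // HotSpot-specific, may be null elsewhere
        return name + " (" + info + ")";
    }

    /** Heuristic only: HotSpot names its c2 flavour "... Server VM". */
    static boolean looksLikeServerVm() {
        String name = System.getProperty("java.vm.name");
        return name != null && name.contains("Server");
    }

    public static void main(String[] args) {
        System.out.println(describeVm());
        System.out.println("server VM: " + looksLikeServerVm());
    }
}
```

Printing this at the start of each run makes it easier to tell whether two result sets were produced by comparable compilers.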
Re: [OpenJDK 2D-Dev] sun.java2D.Pisces renderer Performance and Memory enhancements
On Tue, Apr 9, 2013 at 3:02 PM, Laurent Bourgès bourges.laur...@gmail.com wrote:

> Dear Java2D members,
>
> Could someone review the following webrev concerning Java2D Pisces to enhance its performance and reduce its memory footprint (RendererContext stored in thread local or concurrent queue):
> http://jmmc.fr/~bourgesl/share/java2d-pisces/webrev-1/
>
> FYI I fixed file headers in this patch and signed my OCA 3 weeks ago.
>
> Remaining work:
> - cleanup (comments ...)
> - statistics to perform auto-tuning
> - cache / memory cleanup (SoftReference ?): use hints or System properties to adapt it to use cases
> - another problem: fix clipping performance in Dasher / Stroker for segments out of bounds
>
> Could somebody support me, i.e. help me work on these tasks or just discuss the Pisces algorithm / implementation?

Hi,
I would like to express my support for this patch. Given that micro-benchmarks have already been run, I took the patch for a spin in a large, real world benchmark instead, the OSGeo WMS Shootout 2010 benchmark, for which you can see the results here:
http://www.slideshare.net/gatewaygeomatics.com/wms-performance-shootout-2010

The presentation is long, but suffice it to say all Java based implementations took quite the beating due to the poor scalability of Ductus with antialiased rendering of vector data (for an executive summary just look at slide 27 and slide 66, where GeoServer, Oracle MapViewer and Constellation SDI were the Java based ones).

I took the same tests and ran them again on my machine (different hardware than the original tests, so don't try to compare the absolute values), using Oracle JDK 1.7.0_17, OpenJDK 8 (a checkout a couple of weeks old) and the same OpenJDK 8 with Laurent's patches applied.

Here are the results, throughput (in maps generated per second) with the load generator (JMeter) going up from one client to 64 concurrent clients:

Threads  JDK 1.7.0_17  OpenJDK 8, vanilla  OpenJDK 8 + pisces renderer improvements  Pisces renderer performance gain, %
1        13,97         12,43               13,03                                     4,75%
2        22,08         20,60               20,77                                     0,81%
4        34,36         33,15               33,36                                     0,62%
8        39,39         40,51               41,71                                     2,96%
16       40,61         44,57               46,98                                     5,39%
32       41,41         44,73               48,16                                     7,66%
64       37,09         42,19               45,28                                     7,32%

Well, first of all, congratulations to the JDK developers, I don't know what changed in JDK 8, but GeoServer seems to like it quite a bit :-).
That said, Laurent's patch also gives a visible boost, especially when several concurrent clients are asking for the maps.

Mind, as I said, this is no micro-benchmark, it is a real application doing a lot of I/O (from the operating system file cache) and other processing before the data reaches the rendering pipeline, and it then has to PNG encode the output BufferedImage (PNG encoding being rather expensive), so getting this speedup from just a change in the rendering pipeline is significant.

Long story short... personally I'd be very happy if this patch was going to be integrated in Java 8 :-)

Cheers
Andrea

--
==
GeoServer training in Milan, 6th & 7th June 2013! Visit http://geoserver.geo-solutions.it for more information.
==
Ing. Andrea Aime
@geowolf
Technical Lead
GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy
phone: +39 0584 962313
fax: +39 0584 1660272
mob: +39 339 8844549
http://www.geo-solutions.it
http://twitter.com/geosolutions_it
---
Re: [OpenJDK 2D-Dev] sun.java2D.Pisces renderer Performance and Memory enhancements
Jim,

thanks for taking an interest in my efforts! As I got almost no feedback, I felt quite disappointed and was starting to think that improving Pisces was not important ...

Here are the Ductus results and the comparison (OpenOffice format):
http://jmmc.fr/~bourgesl/share/java2d-pisces/ductus_det.log
http://jmmc.fr/~bourgesl/share/java2d-pisces/compareRef_Patch.ods

test              threads  ops  Tavg      Tmed      stdDev    rms       Med+Stddev  min       max
boulder_17        1        20   73,92%    69,34%    27,98%    69,34%    69,14%      69,81%    146,89%
boulder_17        2        20   110,86%   110,48%   613,80%   112,01%   125,43%     94,71%    136,35%
boulder_17        4        20   135,28%   135,86%   226,61%   136,46%   141,85%     125,14%   111,32%
shp_alllayers_47  1        20   82,25%    82,49%    47,50%    82,48%    82,30%      82,64%    78,08%
shp_alllayers_47  2        20   115,87%   115,67%   315,45%   115,85%   119,89%     109,33%   128,71%
shp_alllayers_47  4        20   218,59%   218,76%   169,38%   218,65%   216,45%     220,17%   206,17%

Ductus vs Patch:
1  80,74%
2  120,69%
4  205,92%

Reminder: Ref vs Patch:
1  237,71%
2  271,68%
4  286,15%

Note: I only have 2 cores + HT on my laptop, so I did not test with more threads (Andrea went up to 64).

2013/4/16 Jim Graham james.gra...@oracle.com

> If I'm reading this correctly, your patch is faster even for a single thread? That's great news.

Not yet, but Ductus is now only 20% faster than my patch with 1 thread, and it is 20% and 2x slower with 2 and 4 threads respectively. I still hope to beat it by applying a few more optimizations:
- Renderer.ScanLine iterator / Renderer.endRendering can be improved
- avoid a few more array zero fills (reset) if possible
- adding statistics to better set initial array sizes ...
- use SunGraphics2D to hold a hard reference (temporarily) on RendererContext (to avoid many ThreadLocal.get())
- cache eviction (WeakReference or SoftReference) ?

Why not use divide and conquer (thread pool) to boost single thread rendering if the machine has more cpu cores ?

It would be helpful if the AATileGenerator had access to SunGraphics2D to get rendering hints or share variables (cache ...)

For the moment, I have not modified the algorithms themselves, but I will do so to enhance clipping (dasher segments / stroker) ...

> One of the problems we've had with replacing Ductus is that it has been faster in a single thread situation than the open source versions we've created. One of its drawbacks is that it had been designed to take advantage of some AA-accelerating hardware that never came to be. With the accelerator it would have been insanely fast, but hardware went in a different direction. The problem was that this early design goal caused the entire library to be built around an abstraction layer that allowed for a single tile producer internally (because there would be only one - insanely fast - hardware chip available) and the software version of the abstraction layer thus had a lot of native static data structures (there's only one of me, right?) that prevented MT access. It was probably solvable, but I'd be happier if someone could come up with a faster rasterizer, imagining that there must have been some sort of advancements in the nearly 2 decades since the original was written.
>
> If I'm misinterpreting and single thread is still slower than Ductus (or if it is still slower on some other platforms), then frowny face.

Not yet: slower than Ductus by 20% but 2 times faster than the current Pisces !

> Also, this is with the Java version, right?

Yes, my patch is pure java given as webrev:
http://jmmc.fr/~bourgesl/share/java2d-pisces/webrev-1/

> We got a decent 2x speedup in FX by porting the version of Open Pisces that you started with to C code (all except on Linux where we couldn't find the right gcc options to make it faster than Hotspot). So, we have yet to investigate a native version in the JDK which might provide even more gains...

Personally I prefer working on java code, as hotspot can perform so many optimizations for free, there are no pointers to deal with and, more important, there are concurrent primitives (thread local, collections) !
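To make the "avoid array zero fill (reset)" item from the list above more concrete, here is one possible approach, sketched under assumptions: the buffer remembers which slice was written and clears only that slice on reuse. The class `DirtyRangeBuffer` and its methods are hypothetical and are not taken from the webrev.

```java
import java.util.Arrays;

/** Hypothetical reusable int buffer that tracks the written slice so reset() can skip clean cells. */
final class DirtyRangeBuffer {
    private final int[] data;
    private int dirtyFrom = Integer.MAX_VALUE; // inclusive lower bound of written cells
    private int dirtyTo = 0;                   // exclusive upper bound of written cells

    DirtyRangeBuffer(int capacity) {
        data = new int[capacity];
    }

    void set(int index, int value) {
        data[index] = value;
        if (index < dirtyFrom) dirtyFrom = index;
        if (index + 1 > dirtyTo) dirtyTo = index + 1;
    }

    int get(int index) {
        return data[index];
    }

    /** Zeroes only the written slice and returns how many cells were cleared. */
    int reset() {
        if (dirtyFrom >= dirtyTo) {
            return 0; // nothing was written since the last reset
        }
        int cleared = dirtyTo - dirtyFrom;
        Arrays.fill(data, dirtyFrom, dirtyTo, 0);
        dirtyFrom = Integer.MAX_VALUE;
        dirtyTo = 0;
        return cleared;
    }
}
```

With a large buffer and a small shape, `reset()` then touches a handful of cells instead of the whole array; whether this pays off in Pisces would have to be measured.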
Laurent

On 4/15/13 3:01 AM, Laurent Bourgès wrote:

> Jim, Andrea,
>
> I updated MapBench to provide test statistics (avg, median, stddev, rms, med + stddev, min, max) and CSV output (tab separator):
> http://jmmc.fr/~bourgesl/share/java2d-pisces/MapBench/
>
> Here are the results (OpenJDK8 Ref vs Patched):
> http://jmmc.fr/~bourgesl/share/java2d-pisces/ref_det.log
> http://jmmc.fr/~bourgesl/share/java2d-pisces/patch_det.log
>
> test              threads  ops  Tavg      Tmed      stdDev    rms       Med+Stddev  min       max
> boulder_17        1        20   180,22%   181,08%   1186,01%  181,17%   185,92%     176,35%   170,36%
> boulder_17        2        20
Re: [OpenJDK 2D-Dev] sun.java2D.Pisces renderer Performance and Memory enhancements
> whereas on Windows / Mac it is client compiler (c1).

For Mac we only have a 64 bit VM which SFAIK should be c2 as well, yet in that case native was presumably still faster. So it's also a matter of factoring in how good the code generated by the C compiler is.

-phil.

On 4/17/2013 1:26 PM, Richard Bair wrote:

>>> Also, this is with the Java version, right?
>>
>> Yes, my patch is pure java given as webrev:
>> http://jmmc.fr/~bourgesl/share/java2d-pisces/webrev-1/
>>
>>> We got a decent 2x speedup in FX by porting the version of Open Pisces that you started with to C code (all except on Linux where we couldn't find the right gcc options to make it faster than Hotspot). So, we have yet to investigate a native version in the JDK which might provide even more gains…
>
> Oleg did more analysis on this and it appears the reason hotspot on Linux was faster than the C version was because on Linux it is -server compiler (c2) whereas on Windows / Mac it is client compiler (c1). Possibly using -server on windows / mac would also have hotspot beating the C version, although that hasn't been tested.
>
> Richard
Re: [OpenJDK 2D-Dev] sun.java2D.Pisces renderer Performance and Memory enhancements
If I'm reading this correctly, your patch is faster even for a single thread? That's great news.

One of the problems we've had with replacing Ductus is that it has been faster in a single thread situation than the open source versions we've created. One of its drawbacks is that it had been designed to take advantage of some AA-accelerating hardware that never came to be. With the accelerator it would have been insanely fast, but hardware went in a different direction. The problem was that this early design goal caused the entire library to be built around an abstraction layer that allowed for a single tile producer internally (because there would be only one - insanely fast - hardware chip available) and the software version of the abstraction layer thus had a lot of native static data structures (there's only one of me, right?) that prevented MT access. It was probably solvable, but I'd be happier if someone could come up with a faster rasterizer, imagining that there must have been some sort of advancements in the nearly 2 decades since the original was written.

If I'm misinterpreting and single thread is still slower than Ductus (or if it is still slower on some other platforms), then frowny face.

Also, this is with the Java version, right? We got a decent 2x speedup in FX by porting the version of Open Pisces that you started with to C code (all except on Linux where we couldn't find the right gcc options to make it faster than Hotspot). So, we have yet to investigate a native version in the JDK which might provide even more gains...
...jim

On 4/15/13 3:01 AM, Laurent Bourgès wrote:

> Jim, Andrea,
>
> I updated MapBench to provide test statistics (avg, median, stddev, rms, med + stddev, min, max) and CSV output (tab separator):
> http://jmmc.fr/~bourgesl/share/java2d-pisces/MapBench/
>
> Here are the results (OpenJDK8 Ref vs Patched):
> http://jmmc.fr/~bourgesl/share/java2d-pisces/ref_det.log
> http://jmmc.fr/~bourgesl/share/java2d-pisces/patch_det.log
>
> test              threads  ops  Tavg      Tmed      stdDev    rms       Med+Stddev  min       max
> boulder_17        1        20   180,22%   181,08%   1186,01%  181,17%   185,92%     176,35%   170,36%
> boulder_17        2        20   183,15%   183,80%   162,68%   183,78%   183,17%     174,01%   169,89%
> boulder_17        4        20   216,62%   218,03%   349,31%   218,87%   226,68%     172,15%   167,54%
> shp_alllayers_47  1        20   243,90%   244,86%   537,92%   244,87%   246,39%     240,64%   231,00%
> shp_alllayers_47  2        20   286,42%   287,07%   294,87%   287,07%   287,23%     277,19%   272,23%
> shp_alllayers_47  4        20   303,08%   302,15%   168,19%   301,90%   295,90%     462,70%   282,41%
>
> PATCH:
> test              threads  ops  Tavg      Tmed      stdDev    rms       Med+Stddev  min       max
> boulder_17        1        20   110,196   109,244   0,529     109,246   109,773     108,197   129,327
> boulder_17        2        40   127,916   127,363   3,899     127,423   131,262     125,262   151,561
> boulder_17        4        80   213,085   212,268   14,988    212,796   227,256     155,512   334,407
> shp_alllayers_47  1        20   1139,452  1134,858  5,971     1134,873  1140,829    1125,859  1235,746
> shp_alllayers_47  2        40   1306,889  1304,598  28,157    1304,902  1332,755    1280,49   1420,351
> shp_alllayers_47  4        80   2296,487  2303,81   112,816   2306,57   2416,626    1390,31   2631,455
>
> REF:
> test              threads  ops  Tavg      Tmed      stdDev    rms       Med+Stddev  min       max
> boulder_17        1        20   198,591   197,816   6,274     197,916   204,091     190,805   220,319
> boulder_17        2        40   234,272   234,09    6,343     234,176   240,433     217,967   257,485
> boulder_17        4        80   461,579   462,8     52,354    465,751   515,153     267,712   560,254
> shp_alllayers_47  1        20   2779,133  2778,823  32,119    2779,009  2810,943    2709,285  2854,557
> shp_alllayers_47  2        40   3743,255  3745,111  83,027    3746,031  3828,138    3549,364  3866,612
> shp_alllayers_47  4        80   6960,23   6960,948  189,75    6963,533  7150,698    6432,945  7431,541
>
> Linux 64 server vm
> JVM: -Xms128m -Xmx128m (low mem)
>
> Laurent
>
> 2013/4/14 Andrea Aime andrea.a...@geo-solutions.it
>
>> On Tue, Apr 9, 2013 at 3:02 PM, Laurent Bourgès bourges.laur...@gmail.com wrote:
>>
>>> Dear Java2D members,
Re: [OpenJDK 2D-Dev] sun.java2D.Pisces renderer Performance and Memory enhancements
Jim, Andrea,

I updated MapBench to provide test statistics (avg, median, stddev, rms, med + stddev, min, max) and CSV output (tab separator):
http://jmmc.fr/~bourgesl/share/java2d-pisces/MapBench/

Here are the results (OpenJDK8 Ref vs Patched):
http://jmmc.fr/~bourgesl/share/java2d-pisces/ref_det.log
http://jmmc.fr/~bourgesl/share/java2d-pisces/patch_det.log

test              threads  ops  Tavg      Tmed      stdDev    rms       Med+Stddev  min       max
boulder_17        1        20   180,22%   181,08%   1186,01%  181,17%   185,92%     176,35%   170,36%
boulder_17        2        20   183,15%   183,80%   162,68%   183,78%   183,17%     174,01%   169,89%
boulder_17        4        20   216,62%   218,03%   349,31%   218,87%   226,68%     172,15%   167,54%
shp_alllayers_47  1        20   243,90%   244,86%   537,92%   244,87%   246,39%     240,64%   231,00%
shp_alllayers_47  2        20   286,42%   287,07%   294,87%   287,07%   287,23%     277,19%   272,23%
shp_alllayers_47  4        20   303,08%   302,15%   168,19%   301,90%   295,90%     462,70%   282,41%

PATCH:
test              threads  ops  Tavg      Tmed      stdDev    rms       Med+Stddev  min       max
boulder_17        1        20   110,196   109,244   0,529     109,246   109,773     108,197   129,327
boulder_17        2        40   127,916   127,363   3,899     127,423   131,262     125,262   151,561
boulder_17        4        80   213,085   212,268   14,988    212,796   227,256     155,512   334,407
shp_alllayers_47  1        20   1139,452  1134,858  5,971     1134,873  1140,829    1125,859  1235,746
shp_alllayers_47  2        40   1306,889  1304,598  28,157    1304,902  1332,755    1280,49   1420,351
shp_alllayers_47  4        80   2296,487  2303,81   112,816   2306,57   2416,626    1390,31   2631,455

REF:
test              threads  ops  Tavg      Tmed      stdDev    rms       Med+Stddev  min       max
boulder_17        1        20   198,591   197,816   6,274     197,916   204,091     190,805   220,319
boulder_17        2        40   234,272   234,09    6,343     234,176   240,433     217,967   257,485
boulder_17        4        80   461,579   462,8     52,354    465,751   515,153     267,712   560,254
shp_alllayers_47  1        20   2779,133  2778,823  32,119    2779,009  2810,943    2709,285  2854,557
shp_alllayers_47  2        40   3743,255  3745,111  83,027    3746,031  3828,138    3549,364  3866,612
shp_alllayers_47  4        80   6960,23   6960,948  189,75    6963,533  7150,698    6432,945  7431,541

Linux 64 server vm
JVM: -Xms128m -Xmx128m (low mem)

Laurent

2013/4/14 Andrea Aime andrea.a...@geo-solutions.it

> On Tue, Apr 9, 2013 at 3:02 PM, Laurent Bourgès bourges.laur...@gmail.com wrote:
>
>> Dear Java2D members,
>>
>> Could someone review the following webrev concerning Java2D Pisces to enhance its performance and reduce its memory footprint (RendererContext stored in thread local or concurrent queue):
>> http://jmmc.fr/~bourgesl/share/java2d-pisces/webrev-1/
>>
>> FYI I fixed file headers in this patch and signed my OCA 3 weeks ago.
>>
>> Remaining work:
>> - cleanup (comments ...)
>> - statistics to perform auto-tuning
>> - cache / memory cleanup (SoftReference ?): use hints or System properties to adapt it to use cases
>> - another problem: fix clipping performance in Dasher / Stroker for segments out of bounds
>>
>> Could somebody support me, i.e. help me work on these tasks or just discuss the Pisces algorithm / implementation?
>
> Hi,
> I would like to express my support for this patch. Given that micro-benchmarks have already been run, I took the patch for a spin in a large, real world benchmark instead, the OSGeo WMS Shootout 2010 benchmark, for which you can see the results here:
> http://www.slideshare.net/gatewaygeomatics.com/wms-performance-shootout-2010
>
> The presentation is long, but suffice it to say all Java based implementations took quite the beating due to the poor scalability of Ductus with antialiased rendering of vector data (for an executive summary just look at slide 27 and slide 66, where GeoServer, Oracle MapViewer and Constellation SDI were the Java based ones).
>
> I took the same tests and ran them again on my machine (different hardware than the original tests, so don't try to compare the absolute values), using Oracle JDK 1.7.0_17, OpenJDK 8 (a checkout a couple of weeks old) and the same OpenJDK 8 with Laurent's patches applied.
>
> Here are the results, throughput (in maps generated per second) with the load generator (JMeter) going up from one client to 64 concurrent clients:
>
> Threads  JDK 1.7.0_17  OpenJDK 8, vanilla  OpenJDK 8 + pisces renderer improvements  Pisces renderer performance gain, %
> 1        13,97         12,43               13,03                                     4,75%
> 2        22,08         20,60               20,77                                     0,81%
> 4        34,36         33,15               33,36                                     0,62%
> 8        39,39         40,51               41,71                                     2,96%
> 16       40,61         44,57               46,98                                     5,39%
> 32       41,41         44,73               48,16                                     7,66%
> 64       37,09         42,19               45,28                                     7,32%
>
> Well, first of all, congratulations to the JDK developers, I don't know what changed in JDK 8, but GeoServer seems to like it quite a bit :-).
> That said, Laurent's patch also gives a visible boost, especially when several concurrent clients are asking for the maps.
>
> Mind, as I said, this is no micro-benchmark, it is a real application doing a lot of I/O (from the operating system file cache) and other processing before the data reaches the rendering pipeline, and it then has to PNG encode the output BufferedImage (PNG encoding being rather expensive), so getting this speedup from just a change in the rendering pipeline is significant.
>
> Long story short... personally I'd be very
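For readers who want to check the tables in this message, the sketch below shows how most of the reported columns can be derived from raw timings. It is an illustration under assumptions and not MapBench's code: `TimingStats` is a hypothetical class, the standard deviation is taken as the population form, and the `rms` column is left out because the thread does not say how MapBench computes it. The percentage table does match Ref divided by Patch (for example 198,591 / 110,196 gives about 180,22%).

```java
import java.util.Arrays;

/** Hypothetical summary of a set of timings; not taken from MapBench. */
final class TimingStats {
    final double avg;
    final double median;
    final double stdDev; // population standard deviation (assumption)
    final double min;
    final double max;

    TimingStats(double[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        double[] s = samples.clone(); // keep the caller's array untouched
        Arrays.sort(s);
        int n = s.length;

        double sum = 0;
        for (double v : s) sum += v;
        avg = sum / n;

        median = (n % 2 == 1) ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;

        double squares = 0;
        for (double v : s) squares += (v - avg) * (v - avg);
        stdDev = Math.sqrt(squares / n);

        min = s[0];
        max = s[n - 1];
    }

    /** The "Med+Stddev" column, e.g. 109,244 + 0,529 = 109,773 in the PATCH table. */
    double medianPlusStdDev() {
        return median + stdDev;
    }

    /** Ratio used in the comparison table: reference time over patched time, in percent. */
    static double ratioPercent(double ref, double patched) {
        return 100.0 * ref / patched;
    }
}
```

A value above 100% in the comparison table therefore means the patched build was faster on that statistic.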
[OpenJDK 2D-Dev] sun.java2D.Pisces renderer Performance and Memory enhancements
Dear Java2D members,

Could someone review the following webrev concerning Java2D Pisces to enhance its performance and reduce its memory footprint (RendererContext stored in thread local or concurrent queue):
http://jmmc.fr/~bourgesl/share/java2d-pisces/webrev-1/

FYI I fixed file headers in this patch and signed my OCA 3 weeks ago.

Remaining work:
- cleanup (comments ...)
- statistics to perform auto-tuning
- cache / memory cleanup (SoftReference ?): use hints or System properties to adapt it to use cases
- another problem: fix clipping performance in Dasher / Stroker for segments out of bounds

Could somebody support me, i.e. help me work on these tasks or just discuss the Pisces algorithm / implementation?

Best regards,
Laurent Bourgès
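The two storage strategies named above (thread local or concurrent queue), together with the SoftReference idea from the remaining-work list, could look roughly like the following. This is a sketch under assumptions and not the code in the webrev: `ContextStore` and its methods are hypothetical, and the `RendererContext` class here is only a stand-in with an arbitrary buffer size.

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Stand-in for the patch's RendererContext; fields and sizes are made up. */
final class RendererContext {
    final int[] edgeBuffer = new int[4096];
}

/** Hypothetical holder showing both storage strategies side by side. */
final class ContextStore {
    // Strategy 1: one context per thread, softly referenced so the GC may reclaim it under memory pressure.
    private static final ThreadLocal<SoftReference<RendererContext>> PER_THREAD = new ThreadLocal<>();

    // Strategy 2: a shared lock-free pool, useful when rendering threads are short-lived.
    private static final ConcurrentLinkedQueue<RendererContext> POOL = new ConcurrentLinkedQueue<>();

    static RendererContext forCurrentThread() {
        SoftReference<RendererContext> ref = PER_THREAD.get();
        RendererContext ctx = (ref != null) ? ref.get() : null;
        if (ctx == null) { // first use on this thread, or the GC cleared the reference
            ctx = new RendererContext();
            PER_THREAD.set(new SoftReference<>(ctx));
        }
        return ctx;
    }

    static RendererContext borrow() {
        RendererContext ctx = POOL.poll(); // null when the pool is empty
        return (ctx != null) ? ctx : new RendererContext();
    }

    static void giveBack(RendererContext ctx) {
        POOL.offer(ctx);
    }
}
```

The thread-local variant avoids any hand-back step but keeps one context alive per thread, while the queue variant bounds memory by the number of concurrent renderings at the cost of an explicit `giveBack` call.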