Re: [OpenJDK 2D-Dev] sun.java2D.Pisces renderer Performance and Memory enhancements

2013-04-19 Thread Richard Bair
 
 Also, this is with the Java version, right?  
 
 Yes, my patch is pure java given as webrev:
 http://jmmc.fr/~bourgesl/share/java2d-pisces/webrev-1/
  
 We got a decent 2x speedup in FX by porting the version of Open Pisces that 
 you started with to C code (all except on Linux where we couldn't find the 
 right gcc options to make it faster than Hotspot).  So, we have yet to 
 investigate a native version in the JDK which might provide even more gains…

Oleg did more analysis on this and it appears the reason HotSpot on Linux was
faster than the C version was because on Linux it is the -server compiler (c2)
whereas on Windows / Mac it is the client compiler (c1). Possibly using -server
on Windows / Mac would also have HotSpot beating the C version, although that
hasn't been tested.

Richard



Re: [OpenJDK 2D-Dev] sun.java2D.Pisces renderer Performance and Memory enhancements

2013-04-19 Thread Andrea Aime
On Tue, Apr 9, 2013 at 3:02 PM, Laurent Bourgès
bourges.laur...@gmail.com wrote:

 Dear Java2D members,

 Could someone review the following webrev concerning Java2D Pisces to
 enhance its performance and reduce its memory footprint (RendererContext
 stored in thread local or concurrent queue):
 http://jmmc.fr/~bourgesl/share/java2d-pisces/webrev-1/

 FYI I fixed file headers in this patch and signed my OCA 3 weeks ago.

 Remaining work:
 - cleanup (comments ...)
 - statistics to perform auto-tuning
 - cache / memory cleanup (SoftReference?): use hints or System properties
 to adapt it to use cases
 - another problem: fix clipping performance in Dasher / Stroker for
 segments out of bounds

 Could somebody support me? i.e. help me work on these tasks, or just
 discuss the Pisces algorithm / implementation?


Hi,
I would like to express my support for this patch.
Given that micro-benchmarks have already been run, I took the patch for a
spin in a large, real world benchmark instead,
the OSGeo WMS Shootout 2010 benchmark, for which you can see the results
here:
http://www.slideshare.net/gatewaygeomatics.com/wms-performance-shootout-2010

The presentation is long, but suffice it to say that all Java-based
implementations took quite a beating due to the poor scalability of Ductus
with antialiased rendering of vector data (for an executive summary, just look
at slides 27 and 66, where GeoServer, Oracle MapViewer and Constellation SDI
were the Java-based ones).

I took the same tests and ran them again on my machine (different hardware
than the original tests, so don't try to compare the absolute values), using
Oracle JDK 1.7.0_17, OpenJDK 8 (a checkout a couple of weeks old) and the same
OpenJDK 8 build with Laurent's patches applied.
Here are the results: throughput (in maps generated per second) with the load
generator (JMeter) going up from one client to 64 concurrent clients:

   Threads  JDK 1.7.0_17  OpenJDK 8, vanilla  OpenJDK 8 + Pisces renderer improvements  Pisces renderer performance gain, %
   1        13,97         12,43               13,03                                     4,75%
   2        22,08         20,60               20,77                                     0,81%
   4        34,36         33,15               33,36                                     0,62%
   8        39,39         40,51               41,71                                     2,96%
   16       40,61         44,57               46,98                                     5,39%
   32       41,41         44,73               48,16                                     7,66%
   64       37,09         42,19               45,28                                     7,32%
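The last column appears to be the relative gain of the patched build over vanilla OpenJDK 8, i.e. (patched / vanilla - 1) x 100. A minimal Java sketch recomputing it from the rounded throughput values in the table; the class name `ThroughputGain` is hypothetical. Because the inputs are rounded to two decimals, results can differ slightly from the quoted column (about 4,83% instead of 4,75% for one thread).

```java
import java.util.Locale;

/** Recomputes the relative throughput gain (patched vs vanilla OpenJDK 8) from the table. */
class ThroughputGain {

    /** Relative gain in percent: (patched / vanilla - 1) * 100. */
    static double gainPercent(double vanilla, double patched) {
        return (patched / vanilla - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        int[] threads = {1, 2, 4, 8, 16, 32, 64};
        // Maps per second, copied from the "vanilla" and "+ Pisces renderer improvements" columns.
        double[] vanilla = {12.43, 20.60, 33.15, 40.51, 44.57, 44.73, 42.19};
        double[] patched = {13.03, 20.77, 33.36, 41.71, 46.98, 48.16, 45.28};
        for (int i = 0; i < threads.length; i++) {
            System.out.printf(Locale.ROOT, "%2d threads: %+.2f%%%n",
                    threads[i], gainPercent(vanilla[i], patched[i]));
        }
    }
}
```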

Well, first of all, congratulations to the JDK developers: I don't know what
changed in JDK 8, but GeoServer seems to like it quite a bit :-).
That said, Laurent's patch also gives a visible boost, especially when
several concurrent clients are asking for maps.

Mind, as I said, this is no micro-benchmark: it is a real application
workload doing a lot of I/O (from the operating system file cache) and other
processing before the data reaches the rendering pipeline, and it then has to
PNG-encode the output BufferedImage (PNG encoding being rather expensive), so
getting this speedup from a change in the rendering pipeline alone is
significant.

Long story short... personally I'd be very happy if this patch was going to
be integrated in Java 8 :-)

Cheers
Andrea

-- 
==
GeoServer training in Milan, 6th & 7th June 2013!  Visit
http://geoserver.geo-solutions.it for more information.
==

Ing. Andrea Aime
@geowolf
Technical Lead

GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054  Massarosa (LU)
Italy
phone: +39 0584 962313
fax: +39 0584 1660272
mob: +39  339 8844549

http://www.geo-solutions.it
http://twitter.com/geosolutions_it

---


Re: [OpenJDK 2D-Dev] sun.java2D.Pisces renderer Performance and Memory enhancements

2013-04-17 Thread Laurent Bourgès
Jim,

thanks for taking an interest in my efforts!
As I got almost no feedback, I felt quite disappointed and was thinking
that improving Pisces was not important ...

Here are the Ductus results and the comparison (OpenOffice format):
http://jmmc.fr/~bourgesl/share/java2d-pisces/ductus_det.log
http://jmmc.fr/~bourgesl/share/java2d-pisces/compareRef_Patch.ods

   test              threads  ops  Tavg       Tmed       stdDev     rms        Med+Stddev min        max
   boulder_17        1        20   73,92%     69,34%     27,98%     69,34%     69,14%     69,81%     146,89%
   boulder_17        2        20   110,86%    110,48%    613,80%    112,01%    125,43%    94,71%     136,35%
   boulder_17        4        20   135,28%    135,86%    226,61%    136,46%    141,85%    125,14%    111,32%
   shp_alllayers_47  1        20   82,25%     82,49%     47,50%     82,48%     82,30%     82,64%     78,08%
   shp_alllayers_47  2        20   115,87%    115,67%    315,45%    115,85%    119,89%    109,33%    128,71%
   shp_alllayers_47  4        20   218,59%    218,76%    169,38%    218,65%    216,45%    220,17%    206,17%

   Ductus vs Patch:         1 thread: 80,74%    2 threads: 120,69%    4 threads: 205,92%
   Reminder, Ref vs Patch:  1 thread: 237,71%   2 threads: 271,68%    4 threads: 286,15%

Note: I only have 2 cores + HT on my laptop and did not test with more
threads (64, like Andrea).

2013/4/16 Jim Graham james.gra...@oracle.com

 If I'm reading this correctly, your patch is faster even for a single
 thread?  That's great news.


Not yet, but Ductus is now only 20% faster than my patch with 1 thread, and
it is 20% and 2x slower with 2 and 4 threads respectively.
I still hope to beat it by applying a few more optimizations:
- Renderer.ScanLine iterator / Renderer.endRendering can be improved
- avoid a few more array zero-fills (resets) where possible
- add statistics to better set initial array sizes ...
- use SunGraphics2D to hold a hard reference (temporarily) on the
RendererContext (to avoid many ThreadLocal.get() calls)
- cache eviction (WeakReference or SoftReference)?
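For the "avoid array zero-fills" item, one common technique is to remember the highest index written since the last reset and clear only that prefix. A minimal sketch, assuming a reusable int[] scratch buffer; the class `DirtyIntArray` and its methods are hypothetical, not code from the webrev.

```java
import java.util.Arrays;

/** Reusable int buffer that clears only the range touched since the last reset (hypothetical). */
class DirtyIntArray {
    private final int[] data;
    private int dirtyEnd; // exclusive upper bound of indices written since reset()

    DirtyIntArray(int capacity) {
        data = new int[capacity];
    }

    void set(int index, int value) {
        data[index] = value;
        if (index >= dirtyEnd) {
            dirtyEnd = index + 1; // grow the dirty range
        }
    }

    int get(int index) {
        return data[index];
    }

    /** Zero-fills [0, dirtyEnd) instead of the whole array. */
    void reset() {
        Arrays.fill(data, 0, dirtyEnd, 0);
        dirtyEnd = 0;
    }

    int dirtyEnd() {
        return dirtyEnd;
    }
}
```

With this approach the reset cost is proportional to what the previous shape actually used, not to the (possibly large) buffer capacity.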

Why not use divide and conquer (a thread pool) to speed up single-thread
rendering when the machine has more CPU cores?
It would be helpful if the AATileGenerator had access to SunGraphics2D to
get rendering hints or share variables (cache ...).
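The divide-and-conquer idea can be sketched as splitting the output into horizontal bands and rasterizing each band on a thread pool. This is only an illustration of the threading pattern, assuming bands can be filled independently; a filled circle sampled at pixel centers stands in for a real antialiasing rasterizer, and `BandRasterizer` is hypothetical. Each task writes a disjoint set of rows, so the only synchronization needed is waiting on the futures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical band-parallel rasterization of a circle into a 0/1 mask. */
class BandRasterizer {

    /** Fills rows [y0, y1) with 1 where the pixel center lies inside the circle (cx, cy, r). */
    static void fillBand(byte[] mask, int width, int y0, int y1, double cx, double cy, double r) {
        for (int y = y0; y < y1; y++) {
            double dy = y + 0.5 - cy;
            for (int x = 0; x < width; x++) {
                double dx = x + 0.5 - cx;
                mask[y * width + x] = (byte) (dx * dx + dy * dy <= r * r ? 1 : 0);
            }
        }
    }

    /** Splits the image into nThreads bands and rasterizes them concurrently. */
    static byte[] rasterizeParallel(int width, int height, double cx, double cy, double r,
                                    int nThreads) throws Exception {
        byte[] mask = new byte[width * height];
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            int bandHeight = (height + nThreads - 1) / nThreads;
            List<Future<?>> futures = new ArrayList<>();
            for (int y0 = 0; y0 < height; y0 += bandHeight) {
                final int start = y0;
                final int end = Math.min(height, y0 + bandHeight);
                futures.add(pool.submit(() -> fillBand(mask, width, start, end, cx, cy, r)));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for every band; also propagates task exceptions
            }
        } finally {
            pool.shutdown();
        }
        return mask;
    }
}
```

Whether Pisces' edge and scanline state can actually be partitioned this way is the open question; the sketch ignores shared per-shape setup cost, which may dominate for small shapes.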

For the moment, I have not modified the algorithms themselves, but I will do
so to improve clipping (Dasher segments / Stroker) ...


 One of the problems we've had with replacing Ductus is that it has been
 faster in a single thread situation than the open source versions we've
 created.  One of its drawbacks is that it had been designed to take
 advantage of some AA-accelerating hardware that never came to be.  With the
 accelerator it would have been insanely fast, but hardware went in a
 different direction.  The problem was that this early design goal caused
 the entire library to be built around an abstraction layer that allowed for
 a single tile producer internally (because there would be only one -
 insanely fast - hardware chip available) and the software version of the
 abstraction layer thus had a lot of native static data structures
 (there's only one of me, right?) that prevented MT access.  It was probably
 solvable, but I'd be happier if someone could come up with a faster
 rasterizer, imagining that there must have been some sort of advancements
 in the nearly 2 decades since the original was written.

 If I'm misinterpreting and single thread is still slower than Ductus (or
 if it is still slower on some other platforms), then frowny face.


Not yet: 20% slower than Ductus, but 2 times faster than the current
Pisces!


 Also, this is with the Java version, right?


Yes, my patch is pure java given as webrev:
http://jmmc.fr/~bourgesl/share/java2d-pisces/webrev-1/


 We got a decent 2x speedup in FX by porting the version of Open Pisces
 that you started with to C code (all except on Linux where we couldn't find
 the right gcc options to make it faster than Hotspot).  So, we have yet to
 investigate a native version in the JDK which might provide even more
 gains...


Personally, I prefer working on Java code, as HotSpot performs so many
optimizations for free, there are no pointers to deal with and, more
importantly, there are concurrency primitives (thread locals, collections)!

Laurent



 On 4/15/13 3:01 AM, Laurent Bourgès wrote:

 Jim, Andrea,

 I updated MapBench to provide test statistics (avg, median, stddev, rms,
 med + stddev, min, max) and CSV output (tab separator):
 http://jmmc.fr/~bourgesl/share/java2d-pisces/MapBench/
 



 Here are the results (OpenJDK8 Ref vs Patched):
 http://jmmc.fr/~bourgesl/share/java2d-pisces/ref_det.log
 http://jmmc.fr/~bourgesl/share/java2d-pisces/patch_det.log

 test              threads  ops  Tavg       Tmed       stdDev     rms        Med+Stddev min        max
 boulder_17        1        20   180,22%    181,08%    1186,01%   181,17%    185,92%    176,35%    170,36%
 boulder_17        2        20

Re: [OpenJDK 2D-Dev] sun.java2D.Pisces renderer Performance and Memory enhancements

2013-04-17 Thread Phil Race

 whereas on Windows / Mac it is client compiler (c1).

For Mac we only have a 64-bit VM, which AFAIK should be c2 as well,
yet in that case native was presumably still faster. So it's also a matter
of factoring in how good the code generated by the C compiler is.

-phil.

On 4/17/2013 1:26 PM, Richard Bair wrote:



Also, this is with the Java version, right? 



Yes, my patch is pure java given as webrev:
http://jmmc.fr/~bourgesl/share/java2d-pisces/webrev-1/ 


We got a decent 2x speedup in FX by porting the version of Open
Pisces that you started with to C code (all except on Linux where
we couldn't find the right gcc options to make it faster than
Hotspot).  So, we have yet to investigate a native version in the
JDK which might provide even more gains… 



Oleg did more analysis on this and it appears the reason HotSpot on
Linux was faster than the C version was because on Linux it is the -server
compiler (c2) whereas on Windows / Mac it is the client compiler (c1).
Possibly using -server on Windows / Mac would also have HotSpot
beating the C version, although that hasn't been tested.


Richard





Re: [OpenJDK 2D-Dev] sun.java2D.Pisces renderer Performance and Memory enhancements

2013-04-16 Thread Jim Graham
If I'm reading this correctly, your patch is faster even for a single 
thread?  That's great news.


One of the problems we've had with replacing Ductus is that it has been 
faster in a single thread situation than the open source versions we've 
created.  One of its drawbacks is that it had been designed to take 
advantage of some AA-accelerating hardware that never came to be.  With 
the accelerator it would have been insanely fast, but hardware went in a 
different direction.  The problem was that this early design goal caused 
the entire library to be built around an abstraction layer that allowed 
for a single tile producer internally (because there would be only one 
- insanely fast - hardware chip available) and the software version of 
the abstraction layer thus had a lot of native static data structures 
(there's only one of me, right?) that prevented MT access.  It was 
probably solvable, but I'd be happier if someone could come up with a 
faster rasterizer, imagining that there must have been some sort of 
advancements in the nearly 2 decades since the original was written.


If I'm misinterpreting and single thread is still slower than Ductus (or 
if it is still slower on some other platforms), then frowny face.


Also, this is with the Java version, right?  We got a decent 2x speedup 
in FX by porting the version of Open Pisces that you started with to C 
code (all except on Linux where we couldn't find the right gcc options 
to make it faster than Hotspot).  So, we have yet to investigate a 
native version in the JDK which might provide even more gains...


...jim

On 4/15/13 3:01 AM, Laurent Bourgès wrote:

Jim, Andrea,

I updated MapBench to provide test statistics (avg, median, stddev, rms,
med + stddev, min, max) and CSV output (tab separator):
http://jmmc.fr/~bourgesl/share/java2d-pisces/MapBench/


Here are the results (OpenJDK8 Ref vs Patched):
http://jmmc.fr/~bourgesl/share/java2d-pisces/ref_det.log
http://jmmc.fr/~bourgesl/share/java2d-pisces/patch_det.log

test              threads  ops  Tavg       Tmed       stdDev     rms        Med+Stddev min        max
boulder_17        1        20   180,22%    181,08%    1186,01%   181,17%    185,92%    176,35%    170,36%
boulder_17        2        20   183,15%    183,80%    162,68%    183,78%    183,17%    174,01%    169,89%
boulder_17        4        20   216,62%    218,03%    349,31%    218,87%    226,68%    172,15%    167,54%
shp_alllayers_47  1        20   243,90%    244,86%    537,92%    244,87%    246,39%    240,64%    231,00%
shp_alllayers_47  2        20   286,42%    287,07%    294,87%    287,07%    287,23%    277,19%    272,23%
shp_alllayers_47  4        20   303,08%    302,15%    168,19%    301,90%    295,90%    462,70%    282,41%



PATCH:
test              threads  ops  Tavg       Tmed       stdDev     rms        Med+Stddev min        max
boulder_17        1        20   110,196    109,244    0,529      109,246    109,773    108,197    129,327
boulder_17        2        40   127,916    127,363    3,899      127,423    131,262    125,262    151,561
boulder_17        4        80   213,085    212,268    14,988     212,796    227,256    155,512    334,407
shp_alllayers_47  1        20   1139,452   1134,858   5,971      1134,873   1140,829   1125,859   1235,746
shp_alllayers_47  2        40   1306,889   1304,598   28,157     1304,902   1332,755   1280,49    1420,351
shp_alllayers_47  4        80   2296,487   2303,81    112,816    2306,57    2416,626   1390,31    2631,455



REF:
test              threads  ops  Tavg       Tmed       stdDev     rms        Med+Stddev min        max
boulder_17        1        20   198,591    197,816    6,274      197,916    204,091    190,805    220,319
boulder_17        2        40   234,272    234,09     6,343      234,176    240,433    217,967    257,485
boulder_17        4        80   461,579    462,8      52,354     465,751    515,153    267,712    560,254
shp_alllayers_47  1        20   2779,133   2778,823   32,119     2779,009   2810,943   2709,285   2854,557
shp_alllayers_47  2        40   3743,255   3745,111   83,027     3746,031   3828,138   3549,364   3866,612
shp_alllayers_47  4        80   6960,23    6960,948   189,75     6963,533   7150,698   6432,945   7431,541


Linux 64 server vm
JVM: -Xms128m -Xmx128m (low mem)

Laurent

2013/4/14 Andrea Aime andrea.a...@geo-solutions.it

On Tue, Apr 9, 2013 at 3:02 PM, Laurent Bourgès
bourges.laur...@gmail.com wrote:

Dear Java2D members,

Re: [OpenJDK 2D-Dev] sun.java2D.Pisces renderer Performance and Memory enhancements

2013-04-15 Thread Laurent Bourgès
Jim, Andrea,

I updated MapBench to provide test statistics (avg, median, stddev, rms,
med + stddev, min, max) and CSV output (tab separator):
http://jmmc.fr/~bourgesl/share/java2d-pisces/MapBench/
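A minimal sketch of how such per-test statistics could be computed and written as a tab-separated line. It is not MapBench's code: the class `BenchStats` is hypothetical, the population standard deviation is an assumption, and rms is left out because its exact definition in MapBench is not given here. The Med+Stddev column in the tables below does match median + stdDev (e.g. 109,244 + 0,529 = 109,773).

```java
import java.util.Arrays;
import java.util.Locale;

/** Summary statistics for a series of timings (hypothetical helper, not MapBench code). */
final class BenchStats {
    final double avg, median, stdDev, medPlusStdDev, min, max;

    BenchStats(double[] samples) {
        if (samples.length == 0) throw new IllegalArgumentException("no samples");
        double[] s = samples.clone();
        Arrays.sort(s);
        int n = s.length;
        double sum = 0;
        for (double v : s) sum += v;
        avg = sum / n;
        median = (n % 2 == 1) ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
        double sq = 0;
        for (double v : s) sq += (v - avg) * (v - avg);
        stdDev = Math.sqrt(sq / n); // population standard deviation (assumption)
        medPlusStdDev = median + stdDev;
        min = s[0];
        max = s[n - 1];
    }

    /** Tab-separated line: test, threads, ops, Tavg, Tmed, stdDev, Med+Stddev, min, max. */
    String toTsv(String test, int threads, int ops) {
        return String.join("\t", test, Integer.toString(threads), Integer.toString(ops),
                fmt(avg), fmt(median), fmt(stdDev), fmt(medPlusStdDev), fmt(min), fmt(max));
    }

    private static String fmt(double v) {
        return String.format(Locale.ROOT, "%.3f", v);
    }
}
```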


Here are the results (OpenJDK8 Ref vs Patched):
http://jmmc.fr/~bourgesl/share/java2d-pisces/ref_det.log
http://jmmc.fr/~bourgesl/share/java2d-pisces/patch_det.log

   test              threads  ops  Tavg       Tmed       stdDev     rms        Med+Stddev min        max
   boulder_17        1        20   180,22%    181,08%    1186,01%   181,17%    185,92%    176,35%    170,36%
   boulder_17        2        20   183,15%    183,80%    162,68%    183,78%    183,17%    174,01%    169,89%
   boulder_17        4        20   216,62%    218,03%    349,31%    218,87%    226,68%    172,15%    167,54%
   shp_alllayers_47  1        20   243,90%    244,86%    537,92%    244,87%    246,39%    240,64%    231,00%
   shp_alllayers_47  2        20   286,42%    287,07%    294,87%    287,07%    287,23%    277,19%    272,23%
   shp_alllayers_47  4        20   303,08%    302,15%    168,19%    301,90%    295,90%    462,70%    282,41%

PATCH:
   test              threads  ops  Tavg       Tmed       stdDev     rms        Med+Stddev min        max
   boulder_17        1        20   110,196    109,244    0,529      109,246    109,773    108,197    129,327
   boulder_17        2        40   127,916    127,363    3,899      127,423    131,262    125,262    151,561
   boulder_17        4        80   213,085    212,268    14,988     212,796    227,256    155,512    334,407
   shp_alllayers_47  1        20   1139,452   1134,858   5,971      1134,873   1140,829   1125,859   1235,746
   shp_alllayers_47  2        40   1306,889   1304,598   28,157     1304,902   1332,755   1280,49    1420,351
   shp_alllayers_47  4        80   2296,487   2303,81    112,816    2306,57    2416,626   1390,31    2631,455

REF:
   test              threads  ops  Tavg       Tmed       stdDev     rms        Med+Stddev min        max
   boulder_17        1        20   198,591    197,816    6,274      197,916    204,091    190,805    220,319
   boulder_17        2        40   234,272    234,09     6,343      234,176    240,433    217,967    257,485
   boulder_17        4        80   461,579    462,8      52,354     465,751    515,153    267,712    560,254
   shp_alllayers_47  1        20   2779,133   2778,823   32,119     2779,009   2810,943   2709,285   2854,557
   shp_alllayers_47  2        40   3743,255   3745,111   83,027     3746,031   3828,138   3549,364   3866,612
   shp_alllayers_47  4        80   6960,23    6960,948   189,75     6963,533   7150,698   6432,945   7431,541
Linux 64 server vm
JVM: -Xms128m -Xmx128m (low mem)
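The percentage table appears to be the REF time divided by the PATCH time for each column (for example 198,591 / 110,196 gives 180,22% for boulder_17 Tavg with 1 thread), so values above 100% mean the reference renderer took longer, i.e. the patch is faster. A minimal sketch recomputing a few of those ratios; the class `RefVsPatch` is hypothetical.

```java
import java.util.Locale;

/** Recomputes REF/PATCH time ratios (in percent) from the REF and PATCH tables. */
class RefVsPatch {

    /** Reference time over patched time, in percent; above 100% means the patch is faster. */
    static double ratioPercent(double refTime, double patchTime) {
        return refTime / patchTime * 100.0;
    }

    public static void main(String[] args) {
        String[] labels = {"boulder_17 x1 Tavg", "boulder_17 x1 Tmed", "shp_alllayers_47 x1 Tavg"};
        double[] ref = {198.591, 197.816, 2779.133};
        double[] patch = {110.196, 109.244, 1139.452};
        for (int i = 0; i < labels.length; i++) {
            System.out.printf(Locale.ROOT, "%-26s %.2f%%%n", labels[i], ratioPercent(ref[i], patch[i]));
        }
    }
}
```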

Laurent

2013/4/14 Andrea Aime andrea.a...@geo-solutions.it

 On Tue, Apr 9, 2013 at 3:02 PM, Laurent Bourgès bourges.laur...@gmail.com
  wrote:

 Dear Java2D members,

 Could someone review the following webrev concerning Java2D Pisces to
 enhance its performance and reduce its memory footprint (RendererContext
 stored in thread local or concurrent queue):
 http://jmmc.fr/~bourgesl/share/java2d-pisces/webrev-1/

 FYI I fixed file headers in this patch and signed my OCA 3 weeks ago.

 Remaining work:
 - cleanup (comments ...)
 - statistics to perform auto-tuning
 - cache / memory cleanup (SoftReference?): use hints or System
 properties to adapt it to use cases
 - another problem: fix clipping performance in Dasher / Stroker for
 segments out of bounds

 Could somebody support me? i.e. help me work on these tasks, or just
 discuss the Pisces algorithm / implementation?


 Hi,
 I would like to express my support for this patch.
 Given that micro-benchmarks have already been run, I took the patch for a
 spin in a large, real world benchmark instead,
 the OSGeo WMS Shootout 2010 benchmark, for which you can see the results
 here:

 http://www.slideshare.net/gatewaygeomatics.com/wms-performance-shootout-2010

 The presentation is long, but suffice it to say that all Java-based
 implementations took quite a beating due to the poor scalability of Ductus
 with antialiased rendering of vector data (for an executive summary, just
 look at slides 27 and 66, where GeoServer, Oracle MapViewer and
 Constellation SDI were the Java-based ones).

 I took the same tests and ran them again on my machine (different hardware
 than the original tests, so don't try to compare the absolute values), using
 Oracle JDK 1.7.0_17, OpenJDK 8 (a checkout a couple of weeks old) and the
 same OpenJDK 8 build with Laurent's patches applied.
 Here are the results: throughput (in maps generated per second) with the
 load generator (JMeter) going up from one client to 64 concurrent clients:

 Threads  JDK 1.7.0_17  OpenJDK 8, vanilla  OpenJDK 8 + Pisces renderer improvements  Pisces renderer performance gain, %
 1        13,97         12,43               13,03                                     4,75%
 2        22,08         20,60               20,77                                     0,81%
 4        34,36         33,15               33,36                                     0,62%
 8        39,39         40,51               41,71                                     2,96%
 16       40,61         44,57               46,98                                     5,39%
 32       41,41         44,73               48,16                                     7,66%
 64       37,09         42,19               45,28                                     7,32%

 Well, first of all, congratulations to the JDK developers: I don't know what
 changed in JDK 8, but GeoServer seems to like it quite a bit :-).
 That said, Laurent's patch also gives a visible boost, especially when
 several concurrent clients are asking for maps.

 Mind, as I said, this is no micro-benchmark: it is a real application
 workload doing a lot of I/O (from the operating system file cache) and other
 processing before the data reaches the rendering pipeline, and it then has to
 PNG-encode the output BufferedImage (PNG encoding being rather expensive), so
 getting this speedup from a change in the rendering pipeline alone is
 significant.

 Long story short... personally I'd be very 

[OpenJDK 2D-Dev] sun.java2D.Pisces renderer Performance and Memory enhancements

2013-04-09 Thread Laurent Bourgès
Dear Java2D members,

Could someone review the following webrev concerning Java2D Pisces to
enhance its performance and reduce its memory footprint (RendererContext
stored in thread local or concurrent queue):
http://jmmc.fr/~bourgesl/share/java2d-pisces/webrev-1/
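The two storage options mentioned for the RendererContext (a thread local or a concurrent queue) can be sketched as follows. This assumes a context that only holds reusable scratch arrays; `RendererContext` and `RendererContexts` here are hypothetical stand-ins, not the webrev's classes.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical per-rendering scratch state (stand-in for the patch's RendererContext). */
final class RendererContext {
    final float[] edges = new float[4096];  // example reusable buffer
    final int[] crossings = new int[1024];  // example reusable buffer
}

/** Two ways of reusing RendererContext instances across rendering calls (sketch). */
final class RendererContexts {

    // Option 1: one context per thread, created lazily on first use.
    private static final ThreadLocal<RendererContext> PER_THREAD =
            ThreadLocal.withInitial(RendererContext::new);

    static RendererContext perThread() {
        return PER_THREAD.get();
    }

    // Option 2: a shared pool; a thread borrows a context and must give it back.
    private static final ConcurrentLinkedQueue<RendererContext> POOL =
            new ConcurrentLinkedQueue<>();

    static RendererContext acquire() {
        RendererContext ctx = POOL.poll();
        return (ctx != null) ? ctx : new RendererContext();
    }

    static void release(RendererContext ctx) {
        POOL.offer(ctx);
    }
}
```

A caller using the pool would wrap rendering in try/finally so `release` always runs. Roughly, the thread-local variant avoids any hand-off but keeps one context alive per rendering thread, while the queue bounds the number of contexts by peak concurrency at the cost of an explicit release.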

FYI I fixed file headers in this patch and signed my OCA 3 weeks ago.

Remaining work:
- cleanup (comments ...)
- statistics to perform auto-tuning
- cache / memory cleanup (SoftReference?): use hints or System properties
to adapt it to use cases
- another problem: fix clipping performance in Dasher / Stroker for
segments out of bounds
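For the cache / memory cleanup item, one option is to keep each thread's context behind a SoftReference, so the garbage collector may reclaim it under memory pressure and it is simply recreated on next use. A minimal sketch; `SoftContextCache` and its nested `Context` are hypothetical names.

```java
import java.lang.ref.SoftReference;

/** Per-thread context cache that the GC may clear under memory pressure (sketch). */
final class SoftContextCache {

    /** Hypothetical scratch state; the array stands in for the renderer's cached buffers. */
    static final class Context {
        final int[] buffer = new int[1 << 16];
    }

    private static final ThreadLocal<SoftReference<Context>> CACHE = new ThreadLocal<>();

    /** Returns this thread's context, recreating it if the soft reference was cleared. */
    static Context get() {
        SoftReference<Context> ref = CACHE.get();
        Context ctx = (ref != null) ? ref.get() : null;
        if (ctx == null) {
            ctx = new Context();
            CACHE.set(new SoftReference<>(ctx));
        }
        return ctx;
    }
}
```

A WeakReference variant would be cleared at any GC once no rendering holds the context, which may discard the cache too eagerly; soft references are only cleared when the JVM decides memory is needed (and always before an OutOfMemoryError).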

Could somebody support me? i.e. help me work on these tasks, or just
discuss the Pisces algorithm / implementation?

Best regards,
Laurent Bourgès