Re: [OpenJDK 2D-Dev] Speed of drawPolyline on JDK11

2018-11-27 Thread Laurent Bourgès
Hi Phil,

> I proposed 2 small fixes for jdk12... still to be done, as better
> solutions are not yet ready and I am completely overbusy on testing Java
> Sorting algorithms.
>
> Maybe I will give up fixing the native renderer, for these reasons:
> - stdlib qsort implementation and performance (bentley qsort 93 or merge
> sort on linux) is platform dependent, so it may take time to validate its
> behaviour (small vs large arrays, not really random, closer to almost
> sorted data)
>
>
> I don't think we should use any C++ stdlib:: code here if that is what you
> mean.
>

No, I proposed to use the C stdlib qsort(..., cmp), as is already done in the
same file for the initial sort of the edges; see my previous diff => ~1s time.
I prefer to focus on integrating Marlin nonAA (like jfx) to get rid of this
native code rather than on fixing C code, except for such a simple change.


>
> Any small improvements we can make by having a better sort in our sources
> would
> be better.
> If they aren't ready for 12, no problem. 13 is around the corner ..
>

Yes, I know, but if you have time, I can propose a very simple Marlin patch
that slightly improves its sorting performance in such an extreme case: ~0.5s
to 0.22s!
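
The kind of tuning mentioned here (keep insertion sort only for small arrays and switch to a merge sort beyond a size threshold) can be sketched as follows. This is only an illustration in C, not the Marlin patch itself (Marlin is written in Java); the names sort_crossings and INSERTION_SORT_THRESHOLD and the value 40 are assumptions made up for the example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical cut-off: at or below it, insertion sort is used. */
#define INSERTION_SORT_THRESHOLD 40

/* Insertion sort on a[lo, hi): cheap for short or almost sorted ranges. */
static void insertion_sort(int *a, int lo, int hi)
{
    for (int i = lo + 1; i < hi; i++) {
        int v = a[i];
        int j = i;
        while (j > lo && a[j - 1] > v) {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = v;
    }
}

/* Top-down merge sort on a[lo, hi) using tmp as scratch space. */
static void merge_sort(int *a, int *tmp, int lo, int hi)
{
    if (hi - lo <= INSERTION_SORT_THRESHOLD) {
        insertion_sort(a, lo, hi);
        return;
    }
    int mid = lo + (hi - lo) / 2;
    merge_sort(a, tmp, lo, mid);
    merge_sort(a, tmp, mid, hi);

    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi) {
        /* "<=" keeps equal keys in their original order (stable) */
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    }
    while (i < mid) tmp[k++] = a[i++];
    while (j < hi)  tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (size_t)(hi - lo) * sizeof(int));
}

/* Sorts n crossings; returns 0 on success, -1 if allocation failed. */
int sort_crossings(int *a, int n)
{
    if (n <= INSERTION_SORT_THRESHOLD) {
        insertion_sort(a, 0, n);
        return 0;
    }
    int *tmp = malloc((size_t)n * sizeof(int));
    if (tmp == NULL) {
        return -1;
    }
    merge_sort(a, tmp, 0, n);
    free(tmp);
    return 0;
}
```

The scratch buffer is allocated per call here only to keep the sketch short; a renderer would more likely reuse one buffer across rows.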


> - AA rendering is now the common case where the Marlin renderer is active
> & fast enough.
>
> Would you accept enabling antialiasing by default in jdk12 ?
> Up to now, RenderingHints.VALUE_ANTIALIAS_DEFAULT means VALUE_ANTIALIAS_OFF.
>
>
> That is a big change and I do not think we should do it.
>

Yes it is a one-liner but it can have large impacts, so I agree it should
be discussed in CSR ... so maybe in 13 ?

Cheers,
Laurent



Several users faced performance issues with the native renderer (hidpi,
complex shapes, clipping dashed shapes) where the Marlin renderer performs
far better.

Finally I can now focus on a trivial Marlin patch to tune its sort
algorithm selection...

Cheers,
Laurent

On Thu, 22 Nov 2018 at 00:04, Sergey Bylokhov wrote:

> Hi, Laurent.
>
> I do not think that we should postpone it to the next version.
> Just send the changes when they are ready, if the review fails
> to take place at the right time, then the fix will be
> moved to the jdk13.
>
> On 20/11/2018 00:28, Laurent Bourgès wrote:
> > As OpenJDK12 RDP1 is coming soon, I propose this plan:
> > - integrate this basic fix in ShapeSpanIterator.c code to use stdlib
> sort (mergesort on linux)
> > - integrate a very simple patch in Marlin renderer to disable insertion
> sort for large arrays: 0.5s to 0.25s, few LOC
> > - postpone my changes to Marlin sort & Marlin nonAA renderer integration
> in OpenJDK 13
> >
> > Will you have time to review 2 small patches in time?
>
>


Re: [OpenJDK 2D-Dev] Speed of drawPolyline on JDK11

2018-11-26 Thread Phil Race

Hi,


On 11/22/18 12:40 AM, Laurent Bourgès wrote:

Hi sergey,

I proposed 2 small fixes for jdk12... still to be done, as better 
solutions are not yet ready and I am completely overbusy on testing 
Java Sorting algorithms.


Maybe I will give up fixing the native renderer, for these reasons:
- stdlib qsort implementation and performance (bentley qsort 93 or 
merge sort on linux) is platform dependent, so it may take time to 
validate its behaviour (small vs large arrays, not really random, 
closer to almost sorted data)


I don't think we should use any C++ stdlib:: code here if that is what
you mean.


Any small improvements we can make by having a better sort in our 
sources would

be better.
If they aren't ready for 12, no problem. 13 is around the corner ..

- AA rendering is now the common case where the Marlin renderer is 
active & fast enough.


Would you accept enabling antialiasing by default in jdk12 ?
Up to now, RenderingHints.VALUE_ANTIALIAS_DEFAULT means VALUE_ANTIALIAS_OFF.


That is a big change and I do not think we should do it.

-phil.




Several users faced performance issues with the native renderer 
(hidpi, complex shapes, clipping dashed shapes) where the Marlin 
renderer performs far better.


Finally I can now focus on a trivial Marlin patch to tune its sort 
algorithm selection...


Cheers,
Laurent

On Thu, 22 Nov 2018 at 00:04, Sergey Bylokhov wrote:


Hi, Laurent.

I do not think that we should postpone it to the next version.
Just send the changes when they are ready, if the review fails
to take place at the right time, then the fix will be
moved to the jdk13.

On 20/11/2018 00:28, Laurent Bourgès wrote:
> As OpenJDK12 RDP1 is coming soon, I propose this plan:
> - integrate this basic fix in ShapeSpanIterator.c code to use
stdlib sort (mergesort on linux)
> - integrate a very simple patch in Marlin renderer to disable
insertion sort for large arrays: 0.5s to 0.25s, few LOC
> - postpone my changes to Marlin sort & Marlin nonAA renderer
integration in OpenJDK 13
>
> Will you have time to review 2 small patches in time?





Re: [OpenJDK 2D-Dev] Speed of drawPolyline on JDK11

2018-11-22 Thread Laurent Bourgès
Hi sergey,

I proposed 2 small fixes for jdk12... still to be done, as better solutions
are not yet ready and I am completely overbusy on testing Java Sorting
algorithms.

Maybe I will give up fixing the native renderer, for these reasons:
- stdlib qsort implementation and performance (bentley qsort 93 or merge
sort on linux) is platform dependent, so it may take time to validate its
behaviour (small vs large arrays, not really random, closer to almost
sorted data)
- AA rendering is now the common case where the Marlin renderer is active &
fast enough.

Would you accept enabling antialiasing by default in jdk12 ?
Up to now, RenderingHints.VALUE_ANTIALIAS_DEFAULT means VALUE_ANTIALIAS_OFF.

Several users faced performance issues with the native renderer (hidpi,
complex shapes, clipping dashed shapes) where the Marlin renderer performs
far better.

Finally I can now focus on a trivial Marlin patch to tune its sort
algorithm selection...

Cheers,
Laurent

On Thu, 22 Nov 2018 at 00:04, Sergey Bylokhov wrote:

> Hi, Laurent.
>
> I do not think that we should postpone it to the next version.
> Just send the changes when they are ready, if the review fails
> to take place at the right time, then the fix will be
> moved to the jdk13.
>
> On 20/11/2018 00:28, Laurent Bourgès wrote:
> > As OpenJDK12 RDP1 is coming soon, I propose this plan:
> > - integrate this basic fix in ShapeSpanIterator.c code to use stdlib
> sort (mergesort on linux)
> > - integrate a very simple patch in Marlin renderer to disable insertion
> sort for large arrays: 0.5s to 0.25s, few LOC
> > - postpone my changes to Marlin sort & Marlin nonAA renderer integration
> in OpenJDK 13
> >
> > Will you have time to review 2 small patches in time?
>
>


Re: [OpenJDK 2D-Dev] Speed of drawPolyline on JDK11

2018-11-21 Thread Sergey Bylokhov

Hi, Laurent.

I do not think that we should postpone it to the next version.
Just send the changes when they are ready, if the review fails
to take place at the right time, then the fix will be
moved to the jdk13.

On 20/11/2018 00:28, Laurent Bourgès wrote:

As OpenJDK12 RDP1 is coming soon, I propose this plan:
- integrate this basic fix in ShapeSpanIterator.c code to use stdlib sort 
(mergesort on linux)
- integrate a very simple patch in Marlin renderer to disable insertion sort 
for large arrays: 0.5s to 0.25s, few LOC
- postpone my changes to Marlin sort & Marlin nonAA renderer integration in 
OpenJDK 13

Will you have time to review 2 small patches in time?





Cheers,
Laurent

On Tue, 23 Oct 2018 at 22:37, Laurent Bourgès wrote:

Phil,
I quickly modified the final update & sort loop to:
- move sort in another block
- use qsort() using a new comparator sortSegmentsByCurX

This improves performance in PolyLineTest by 3 times: ~1s vs 3.5s !
Apparently qsort() is not optimal (comparator can not be inlined by c) so 
it may explain why Marlin (0x0 sampling) is still 2 times faster with its 
custom merge-sort (in-place).

Any idea to improve C sort ?
Is it good enough ?

- USE_QSORT_X: 1
oct. 23, 2018 10:15:29 PM polylinetest.Canvas paintComponent
INFOS: Paint Time: 1,081s
INFOS: Paint Time: 1,058s
INFOS: Paint Time: 1,067s

- USE_QSORT_X: 0
oct. 23, 2018 10:18:50 PM polylinetest.Canvas paintComponent
INFOS: Paint Time: 3,318s
INFOS: Paint Time: 3,258s
INFOS: Paint Time: 3,273s

Patch:
diff -r 297450fcab26 
src/java.desktop/share/native/libawt/java2d/pipe/ShapeSpanIterator.c
--- a/src/java.desktop/share/native/libawt/java2d/pipe/ShapeSpanIterator.c  
  Tue Oct 16 23:21:05 2018 +0530
+++ b/src/java.desktop/share/native/libawt/java2d/pipe/ShapeSpanIterator.c  
  Tue Oct 23 22:31:00 2018 +0200
@@ -1243,6 +1243,18 @@
  }
  }

+/* LBO: enable (1) / disable (0) qsort on curx */
+#define USE_QSORT_X (0)
+
+static int CDECL
+sortSegmentsByCurX(const void *elem1, const void *elem2)
+{
+    jint x1 = (*(segmentData **)elem1)->curx;
+    jint x2 = (*(segmentData **)elem2)->curx;
+
+    return (x1 - x2);
+}
+
  static jboolean
  ShapeSINextSpan(void *state, jint spanbox[])
  {
@@ -1378,16 +1390,28 @@
  seg->curx = x0;
  seg->cury = y0;
  seg->error = err;
+    }

-    /* Then make sure the segment is sorted by x0 */
-    for (new = cur; new > lo; new--) {
-    segmentData *seg2 = segmentTable[new - 1];
-    if (seg2->curx <= x0) {
-    break;
+    if (USE_QSORT_X && (hi - lo) > 100)
+    {
+    /* use quick sort on [lo - hi] range */
+    qsort(&(segmentTable[lo]), (hi - lo), sizeof(segmentData *),
+    sortSegmentsByCurX);
+    } else {
+    for (cur = lo; cur < hi; cur++) {
+    seg = segmentTable[cur];
+    x0 = seg->curx;
+
+    /* Then make sure the segment is sorted by x0 */
+    for (new = cur; new > lo; new--) {
+    segmentData *seg2 = segmentTable[new - 1];
+    if (seg2->curx <= x0) {
+    break;
+    }
+    segmentTable[new] = seg2;
  }
-    segmentTable[new] = seg2;
+    segmentTable[new] = seg;
  }
-    segmentTable[new] = seg;
  }
  cur = lo;
  }

Cheers,
Laurent

On Tue, 23 Oct 2018 at 08:30, Laurent Bourgès wrote:

Phil,
Yesterday I started hacking ShapeSpanIterator.c to add stats: the last 
stage (sort by x0) is the bottleneck.
In this case every sort takes up to 15ms per pixel row !

I will see if I can adapt Marlin's MergeSort.java to C to have an 
efficient sort in-place.
Do you know if libawt has already an efficient sort instead of porting 
mine ?

PS: "To save the planet, make software more efficient" is my quote of 
the day !

Cheers,
Laurent




--
Best regards, Sergey.


Re: [OpenJDK 2D-Dev] Speed of drawPolyline on JDK11

2018-11-20 Thread Peter Hull
On Tue, Nov 20, 2018 at 8:28 AM Laurent Bourgès
 wrote:
> Will you have time to review 2 small patches in time?
If it would help for me to have a look, I am happy to do so.
Pete


Re: [OpenJDK 2D-Dev] Speed of drawPolyline on JDK11

2018-11-20 Thread Laurent Bourgès
Hi,
As OpenJDK12 RDP1 is coming soon, I propose this plan:
- integrate this basic fix in ShapeSpanIterator.c code to use stdlib sort
(mergesort on linux)
- integrate a very simple patch in Marlin renderer to disable insertion
sort for large arrays: 0.5s to 0.25s, few LOC
- postpone my changes to Marlin sort & Marlin nonAA renderer integration in
OpenJDK 13

Will you have time to review 2 small patches in time?

Cheers,
Laurent

On Tue, 23 Oct 2018 at 22:37, Laurent Bourgès wrote:

> Phil,
> I quickly modified the final update & sort loop to:
> - move sort in another block
> - use qsort() using a new comparator sortSegmentsByCurX
>
> This improves performance in PolyLineTest by 3 times: ~1s vs 3.5s !
> Apparently qsort() is not optimal (comparator can not be inlined by c) so
> it may explain why Marlin (0x0 sampling) is still 2 times faster with its
> custom merge-sort (in-place).
>
> Any idea to improve C sort ?
> Is it good enough ?
>
> - USE_QSORT_X: 1
> oct. 23, 2018 10:15:29 PM polylinetest.Canvas paintComponent
> INFOS: Paint Time: 1,081s
> INFOS: Paint Time: 1,058s
> INFOS: Paint Time: 1,067s
>
> - USE_QSORT_X: 0
> oct. 23, 2018 10:18:50 PM polylinetest.Canvas paintComponent
> INFOS: Paint Time: 3,318s
> INFOS: Paint Time: 3,258s
> INFOS: Paint Time: 3,273s
>
> Patch:
> diff -r 297450fcab26
> src/java.desktop/share/native/libawt/java2d/pipe/ShapeSpanIterator.c
> ---
> a/src/java.desktop/share/native/libawt/java2d/pipe/ShapeSpanIterator.c
> Tue Oct 16 23:21:05 2018 +0530
> +++
> b/src/java.desktop/share/native/libawt/java2d/pipe/ShapeSpanIterator.c
> Tue Oct 23 22:31:00 2018 +0200
> @@ -1243,6 +1243,18 @@
>  }
>  }
>
> +/* LBO: enable (1) / disable (0) qsort on curx */
> +#define USE_QSORT_X (0)
> +
> +static int CDECL
> +sortSegmentsByCurX(const void *elem1, const void *elem2)
> +{
> +jint x1 = (*(segmentData **)elem1)->curx;
> +jint x2 = (*(segmentData **)elem2)->curx;
> +
> +return (x1 - x2);
> +}
> +
>  static jboolean
>  ShapeSINextSpan(void *state, jint spanbox[])
>  {
> @@ -1378,16 +1390,28 @@
>  seg->curx = x0;
>  seg->cury = y0;
>  seg->error = err;
> +}
>
> -/* Then make sure the segment is sorted by x0 */
> -for (new = cur; new > lo; new--) {
> -segmentData *seg2 = segmentTable[new - 1];
> -if (seg2->curx <= x0) {
> -break;
> +if (USE_QSORT_X && (hi - lo) > 100)
> +{
> +/* use quick sort on [lo - hi] range */
> +qsort(&(segmentTable[lo]), (hi - lo), sizeof(segmentData *),
> +sortSegmentsByCurX);
> +} else {
> +for (cur = lo; cur < hi; cur++) {
> +seg = segmentTable[cur];
> +x0 = seg->curx;
> +
> +/* Then make sure the segment is sorted by x0 */
> +for (new = cur; new > lo; new--) {
> +segmentData *seg2 = segmentTable[new - 1];
> +if (seg2->curx <= x0) {
> +break;
> +}
> +segmentTable[new] = seg2;
>  }
> -segmentTable[new] = seg2;
> +segmentTable[new] = seg;
>  }
> -segmentTable[new] = seg;
>  }
>  cur = lo;
>  }
>
> Cheers,
> Laurent
>
> On Tue, 23 Oct 2018 at 08:30, Laurent Bourgès wrote:
>
>> Phil,
>> Yesterday I started hacking ShapeSpanIterator.c to add stats: the last
>> stage (sort by x0) is the bottleneck.
>> In this case every sort takes up to 15ms per pixel row !
>>
>> I will see if I can adapt Marlin's MergeSort.java to C to have an
>> efficient sort in-place.
>> Do you know if libawt has already an efficient sort instead of porting
>> mine ?
>>
>> PS: "To save the planet, make software more efficient" is my quote of the
>> day !
>>
>> Cheers,
>> Laurent
>>
>


Re: [OpenJDK 2D-Dev] Speed of drawPolyline on JDK11

2018-10-30 Thread Sergey Bylokhov

On 26/10/2018 00:41, Laurent Bourgès wrote:

I suppose a JEP or RFE is required, isn't it?


It depends on the changes, but such contributions are welcome!



Cheers,
Laurent


On 23/10/2018 13:37, Laurent Bourgès wrote:
 > Phil,
 > I quickly modified the final update & sort loop to:
 > - move sort in another block
 > - use qsort() using a new comparator sortSegmentsByCurX
 >
 > This improves performance in PolyLineTest by 3 times: ~1s vs 3.5s !
 > Apparently qsort() is not optimal (comparator can not be inlined by c) 
so it may explain why Marlin (0x0 sampling) is still 2 times faster with its 
custom merge-sort (in-place).
 >
 > Any idea to improve C sort ?
 > Is it good enough ?
 >
 > - USE_QSORT_X: 1
 > oct. 23, 2018 10:15:29 PM polylinetest.Canvas paintComponent
 > INFOS: Paint Time: 1,081s
 > INFOS: Paint Time: 1,058s
 > INFOS: Paint Time: 1,067s
 >
 > - USE_QSORT_X: 0
 > oct. 23, 2018 10:18:50 PM polylinetest.Canvas paintComponent
 > INFOS: Paint Time: 3,318s
 > INFOS: Paint Time: 3,258s
 > INFOS: Paint Time: 3,273s
 >
 > Patch:
 > diff -r 297450fcab26 
src/java.desktop/share/native/libawt/java2d/pipe/ShapeSpanIterator.c
 > --- 
a/src/java.desktop/share/native/libawt/java2d/pipe/ShapeSpanIterator.c    Tue Oct 
16 23:21:05 2018 +0530
 > +++ 
b/src/java.desktop/share/native/libawt/java2d/pipe/ShapeSpanIterator.c    Tue Oct 
23 22:31:00 2018 +0200
 > @@ -1243,6 +1243,18 @@
 >   }
 >   }
 >
 > +/* LBO: enable (1) / disable (0) qsort on curx */
 > +#define USE_QSORT_X (0)
 > +
 > +static int CDECL
 > +sortSegmentsByCurX(const void *elem1, const void *elem2)
 > +{
 > +    jint x1 = (*(segmentData **)elem1)->curx;
 > +    jint x2 = (*(segmentData **)elem2)->curx;
 > +
 > +    return (x1 - x2);
 > +}
 > +
 >   static jboolean
 >   ShapeSINextSpan(void *state, jint spanbox[])
 >   {
 > @@ -1378,16 +1390,28 @@
 >   seg->curx = x0;
 >   seg->cury = y0;
 >   seg->error = err;
 > +    }
 >
 > -    /* Then make sure the segment is sorted by x0 */
 > -    for (new = cur; new > lo; new--) {
 > -    segmentData *seg2 = segmentTable[new - 1];
 > -    if (seg2->curx <= x0) {
 > -    break;
 > +    if (USE_QSORT_X && (hi - lo) > 100)
 > +    {
 > +    /* use quick sort on [lo - hi] range */
 > +    qsort(&(segmentTable[lo]), (hi - lo), sizeof(segmentData *),
 > +    sortSegmentsByCurX);
 > +    } else {
 > +    for (cur = lo; cur < hi; cur++) {
 > +    seg = segmentTable[cur];
 > +    x0 = seg->curx;
 > +
 > +    /* Then make sure the segment is sorted by x0 */
 > +    for (new = cur; new > lo; new--) {
 > +    segmentData *seg2 = segmentTable[new - 1];
 > +    if (seg2->curx <= x0) {
 > +    break;
 > +    }
 > +    segmentTable[new] = seg2;
 >   }
 > -    segmentTable[new] = seg2;
 > +    segmentTable[new] = seg;
 >   }
 > -    segmentTable[new] = seg;
 >   }
 >   cur = lo;
 >   }
 >
 > Cheers,
 > Laurent
 >
 > On Tue, 23 Oct 2018 at 08:30, Laurent Bourgès wrote:
 >
 >     Phil,
 >     Yesterday I started hacking ShapeSpanIterator.c to add stats: the 
last stage (sort by x0) is the bottleneck.
 >     In this case every sort takes up to 15ms per pixel row !
 >
 >     I will see if I can adapt Marlin's MergeSort.java to C to have an 
efficient sort in-place.
 >     Do you know if libawt has already an efficient sort instead of 
porting mine ?
 >
 >     PS: "To save the planet, make software more efficient" is my quote 
of the day !
 >
 >     Cheers,
 >     Laurent
 >


-- 
Best regards, Sergey.





--
Best regards, Sergey.


Re: [OpenJDK 2D-Dev] Speed of drawPolyline on JDK11

2018-10-26 Thread Laurent Bourgès
Sergey,

On Wed, 24 Oct 2018 at 23:50, Sergey Bylokhov wrote:

> I have no comments about the current proposal(results is good), but is
> that really necessary to have this implementation in native code?
>

1. I read the academic paper "Sort Race" (2016) and I will experiment with
their best sort (merge6) in Marlin to further optimize the sorting of
crossings (x).

It reports that the Linux qsort() is actually a merge sort (Sedgewick), which
does not take advantage of ordered/reversed runs as Java's Timsort does:
stdlib is up to 2 times slower (worst case).

See https://arxiv.org/abs/1609.04471

2. My goal is to fix the worst case here, as demonstrated by this test case.
I may port my MergeSort to C if there is interest.

Later we could get rid of this native non-AA pipeline if we switch to the
Marlin NonAA renderer like OpenJFX.

I suppose a JEP or RFE is required, isn't it?

Cheers,
Laurent


> On 23/10/2018 13:37, Laurent Bourgès wrote:
> > Phil,
> > I quickly modified the final update & sort loop to:
> > - move sort in another block
> > - use qsort() using a new comparator sortSegmentsByCurX
> >
> > This improves performance in PolyLineTest by 3 times: ~1s vs 3.5s !
> > Apparently qsort() is not optimal (comparator can not be inlined by c)
> so it may explain why Marlin (0x0 sampling) is still 2 times faster with
> its custom merge-sort (in-place).
> >
> > Any idea to improve C sort ?
> > Is it good enough ?
> >
> > - USE_QSORT_X: 1
> > oct. 23, 2018 10:15:29 PM polylinetest.Canvas paintComponent
> > INFOS: Paint Time: 1,081s
> > INFOS: Paint Time: 1,058s
> > INFOS: Paint Time: 1,067s
> >
> > - USE_QSORT_X: 0
> > oct. 23, 2018 10:18:50 PM polylinetest.Canvas paintComponent
> > INFOS: Paint Time: 3,318s
> > INFOS: Paint Time: 3,258s
> > INFOS: Paint Time: 3,273s
> >
> > Patch:
> > diff -r 297450fcab26
> src/java.desktop/share/native/libawt/java2d/pipe/ShapeSpanIterator.c
> > ---
> a/src/java.desktop/share/native/libawt/java2d/pipe/ShapeSpanIterator.c
> Tue Oct 16 23:21:05 2018 +0530
> > +++
> b/src/java.desktop/share/native/libawt/java2d/pipe/ShapeSpanIterator.c
> Tue Oct 23 22:31:00 2018 +0200
> > @@ -1243,6 +1243,18 @@
> >   }
> >   }
> >
> > +/* LBO: enable (1) / disable (0) qsort on curx */
> > +#define USE_QSORT_X (0)
> > +
> > +static int CDECL
> > +sortSegmentsByCurX(const void *elem1, const void *elem2)
> > +{
> > +jint x1 = (*(segmentData **)elem1)->curx;
> > +jint x2 = (*(segmentData **)elem2)->curx;
> > +
> > +return (x1 - x2);
> > +}
> > +
> >   static jboolean
> >   ShapeSINextSpan(void *state, jint spanbox[])
> >   {
> > @@ -1378,16 +1390,28 @@
> >   seg->curx = x0;
> >   seg->cury = y0;
> >   seg->error = err;
> > +}
> >
> > -/* Then make sure the segment is sorted by x0 */
> > -for (new = cur; new > lo; new--) {
> > -segmentData *seg2 = segmentTable[new - 1];
> > -if (seg2->curx <= x0) {
> > -break;
> > +if (USE_QSORT_X && (hi - lo) > 100)
> > +{
> > +/* use quick sort on [lo - hi] range */
> > +qsort(&(segmentTable[lo]), (hi - lo), sizeof(segmentData *),
> > +sortSegmentsByCurX);
> > +} else {
> > +for (cur = lo; cur < hi; cur++) {
> > +seg = segmentTable[cur];
> > +x0 = seg->curx;
> > +
> > +/* Then make sure the segment is sorted by x0 */
> > +for (new = cur; new > lo; new--) {
> > +segmentData *seg2 = segmentTable[new - 1];
> > +if (seg2->curx <= x0) {
> > +break;
> > +}
> > +segmentTable[new] = seg2;
> >   }
> > -segmentTable[new] = seg2;
> > +segmentTable[new] = seg;
> >   }
> > -segmentTable[new] = seg;
> >   }
> >   cur = lo;
> >   }
> >
> > Cheers,
> > Laurent
> >
> > On Tue, 23 Oct 2018 at 08:30, Laurent Bourgès wrote:
> >
> > Phil,
> > Yesterday I started hacking ShapeSpanIterator.c to add stats: the
> last stage (sort by x0) is the bottleneck.
> > In this case every sort takes up to 15ms per pixel row !
> >
> > I will see if I can adapt Marlin's MergeSort.java to C to have an
> efficient sort in-place.
> > Do you know if libawt has already an efficient sort instead of
> porting mine ?
> >
> > PS: "To save the planet, make software more efficient" is my quote
> of the day !
> >
> > Cheers,
> > Laurent
> >
>
>
> --
> Best regards, Sergey.
>


Re: [OpenJDK 2D-Dev] Speed of drawPolyline on JDK11

2018-10-24 Thread Sergey Bylokhov

I have no comments about the current proposal(results is good), but is that 
really necessary to have this implementation in native code?

On 23/10/2018 13:37, Laurent Bourgès wrote:

Phil,
I quickly modified the final update & sort loop to:
- move sort in another block
- use qsort() using a new comparator sortSegmentsByCurX

This improves performance in PolyLineTest by 3 times: ~1s vs 3.5s !
Apparently qsort() is not optimal (comparator can not be inlined by c) so it 
may explain why Marlin (0x0 sampling) is still 2 times faster with its custom 
merge-sort (in-place).

Any idea to improve C sort ?
Is it good enough ?

- USE_QSORT_X: 1
oct. 23, 2018 10:15:29 PM polylinetest.Canvas paintComponent
INFOS: Paint Time: 1,081s
INFOS: Paint Time: 1,058s
INFOS: Paint Time: 1,067s

- USE_QSORT_X: 0
oct. 23, 2018 10:18:50 PM polylinetest.Canvas paintComponent
INFOS: Paint Time: 3,318s
INFOS: Paint Time: 3,258s
INFOS: Paint Time: 3,273s

Patch:
diff -r 297450fcab26 
src/java.desktop/share/native/libawt/java2d/pipe/ShapeSpanIterator.c
--- a/src/java.desktop/share/native/libawt/java2d/pipe/ShapeSpanIterator.c    
Tue Oct 16 23:21:05 2018 +0530
+++ b/src/java.desktop/share/native/libawt/java2d/pipe/ShapeSpanIterator.c    
Tue Oct 23 22:31:00 2018 +0200
@@ -1243,6 +1243,18 @@
  }
  }

+/* LBO: enable (1) / disable (0) qsort on curx */
+#define USE_QSORT_X (0)
+
+static int CDECL
+sortSegmentsByCurX(const void *elem1, const void *elem2)
+{
+    jint x1 = (*(segmentData **)elem1)->curx;
+    jint x2 = (*(segmentData **)elem2)->curx;
+
+    return (x1 - x2);
+}
+
  static jboolean
  ShapeSINextSpan(void *state, jint spanbox[])
  {
@@ -1378,16 +1390,28 @@
  seg->curx = x0;
  seg->cury = y0;
  seg->error = err;
+    }

-    /* Then make sure the segment is sorted by x0 */
-    for (new = cur; new > lo; new--) {
-    segmentData *seg2 = segmentTable[new - 1];
-    if (seg2->curx <= x0) {
-    break;
+    if (USE_QSORT_X && (hi - lo) > 100)
+    {
+    /* use quick sort on [lo - hi] range */
+    qsort(&(segmentTable[lo]), (hi - lo), sizeof(segmentData *),
+    sortSegmentsByCurX);
+    } else {
+    for (cur = lo; cur < hi; cur++) {
+    seg = segmentTable[cur];
+    x0 = seg->curx;
+
+    /* Then make sure the segment is sorted by x0 */
+    for (new = cur; new > lo; new--) {
+    segmentData *seg2 = segmentTable[new - 1];
+    if (seg2->curx <= x0) {
+    break;
+    }
+    segmentTable[new] = seg2;
  }
-    segmentTable[new] = seg2;
+    segmentTable[new] = seg;
  }
-    segmentTable[new] = seg;
  }
  cur = lo;
  }

Cheers,
Laurent

On Tue, 23 Oct 2018 at 08:30, Laurent Bourgès wrote:

Phil,
Yesterday I started hacking ShapeSpanIterator.c to add stats: the last 
stage (sort by x0) is the bottleneck.
In this case every sort takes up to 15ms per pixel row !

I will see if I can adapt Marlin's MergeSort.java to C to have an efficient 
sort in-place.
Do you know if libawt has already an efficient sort instead of porting mine 
?

PS: "To save the planet, make software more efficient" is my quote of the 
day !

Cheers,
Laurent




--
Best regards, Sergey.


Re: [OpenJDK 2D-Dev] Speed of drawPolyline on JDK11

2018-10-23 Thread Laurent Bourgès
Phil,
I quickly modified the final update & sort loop to:
- move sort in another block
- use qsort() using a new comparator sortSegmentsByCurX

This improves performance in PolyLineTest by 3 times: ~1s vs 3.5s !
Apparently qsort() is not optimal (comparator can not be inlined by c) so
it may explain why Marlin (0x0 sampling) is still 2 times faster with its
custom merge-sort (in-place).

Any idea to improve C sort ?
Is it good enough ?

- USE_QSORT_X: 1
oct. 23, 2018 10:15:29 PM polylinetest.Canvas paintComponent
INFOS: Paint Time: 1,081s
INFOS: Paint Time: 1,058s
INFOS: Paint Time: 1,067s

- USE_QSORT_X: 0
oct. 23, 2018 10:18:50 PM polylinetest.Canvas paintComponent
INFOS: Paint Time: 3,318s
INFOS: Paint Time: 3,258s
INFOS: Paint Time: 3,273s

Patch:
diff -r 297450fcab26 src/java.desktop/share/native/libawt/java2d/pipe/ShapeSpanIterator.c
--- a/src/java.desktop/share/native/libawt/java2d/pipe/ShapeSpanIterator.c  Tue Oct 16 23:21:05 2018 +0530
+++ b/src/java.desktop/share/native/libawt/java2d/pipe/ShapeSpanIterator.c  Tue Oct 23 22:31:00 2018 +0200
@@ -1243,6 +1243,18 @@
     }
 }

+/* LBO: enable (1) / disable (0) qsort on curx */
+#define USE_QSORT_X (0)
+
+static int CDECL
+sortSegmentsByCurX(const void *elem1, const void *elem2)
+{
+    jint x1 = (*(segmentData **)elem1)->curx;
+    jint x2 = (*(segmentData **)elem2)->curx;
+
+    return (x1 - x2);
+}
+
 static jboolean
 ShapeSINextSpan(void *state, jint spanbox[])
 {
@@ -1378,16 +1390,28 @@
             seg->curx = x0;
             seg->cury = y0;
             seg->error = err;
+        }

-            /* Then make sure the segment is sorted by x0 */
-            for (new = cur; new > lo; new--) {
-                segmentData *seg2 = segmentTable[new - 1];
-                if (seg2->curx <= x0) {
-                    break;
+        if (USE_QSORT_X && (hi - lo) > 100)
+        {
+            /* use quick sort on [lo - hi] range */
+            qsort(&(segmentTable[lo]), (hi - lo), sizeof(segmentData *),
+                  sortSegmentsByCurX);
+        } else {
+            for (cur = lo; cur < hi; cur++) {
+                seg = segmentTable[cur];
+                x0 = seg->curx;
+
+                /* Then make sure the segment is sorted by x0 */
+                for (new = cur; new > lo; new--) {
+                    segmentData *seg2 = segmentTable[new - 1];
+                    if (seg2->curx <= x0) {
+                        break;
+                    }
+                    segmentTable[new] = seg2;
                 }
-                segmentTable[new] = seg2;
+                segmentTable[new] = seg;
             }
-            segmentTable[new] = seg;
         }
         cur = lo;
     }
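
A side note on the comparator in the patch above: returning the difference x1 - x2 is a common qsort() idiom, but in C the subtraction can overflow when the two values are far apart (for example a large negative and a large positive coordinate). A minimal sketch of a comparator that avoids the subtraction, assuming a simplified stand-in for segmentData (the seg struct and the function names below are hypothetical, not the libawt types):

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the real segment structure. */
typedef struct {
    int32_t curx;   /* current x of the segment on this scanline */
} seg;

/* Compares two (seg *) array slots by curx without subtracting. */
static int cmp_seg_by_curx(const void *e1, const void *e2)
{
    int32_t x1 = (*(seg *const *)e1)->curx;
    int32_t x2 = (*(seg *const *)e2)->curx;

    /* yields -1, 0 or 1 and cannot overflow */
    return (x1 > x2) - (x1 < x2);
}

/* Sorts the half-open range table[lo, hi) by increasing curx. */
void sort_active_segments(seg **table, int lo, int hi)
{
    if (hi - lo > 1) {
        qsort(&table[lo], (size_t)(hi - lo), sizeof(seg *), cmp_seg_by_curx);
    }
}
```

Whether such extreme coordinates can actually reach this code path depends on the clipping done earlier, which this sketch does not try to model.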

Cheers,
Laurent

On Tue, 23 Oct 2018 at 08:30, Laurent Bourgès wrote:

> Phil,
> Yesterday I started hacking ShapeSpanIterator.c to add stats: the last
> stage (sort by x0) is the bottleneck.
> In this case every sort takes up to 15ms per pixel row !
>
> I will see if I can adapt Marlin's MergeSort.java to C to have an
> efficient sort in-place.
> Do you know if libawt has already an efficient sort instead of porting
> mine ?
>
> PS: "To save the planet, make software more efficient" is my quote of the
> day !
>
> Cheers,
> Laurent
>


Re: [OpenJDK 2D-Dev] Speed of drawPolyline on JDK11

2018-10-23 Thread Laurent Bourgès
Phil,
Yesterday I started hacking ShapeSpanIterator.c to add stats: the last
stage (sort by x0) is the bottleneck.
In this case every sort takes up to 15ms per pixel row !
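
To give an idea of why an insertion sort can become the bottleneck on such rows: the number of element moves grows quadratically when the input is far from sorted. A small sketch that counts those moves (the function name is made up for the illustration, and plain int keys stand in for the segments' x values):

```c
#include <stddef.h>

/* Insertion sort of a[0, n); returns how many elements were shifted. */
long insertion_sort_count_shifts(int *a, size_t n)
{
    long shifts = 0;
    for (size_t i = 1; i < n; i++) {
        int v = a[i];
        size_t j = i;
        /* move larger elements one slot to the right */
        while (j > 0 && a[j - 1] > v) {
            a[j] = a[j - 1];
            j--;
            shifts++;
        }
        a[j] = v;
    }
    return shifts;
}
```

On an already sorted array this does no shifts at all; on a reversed array of n elements it does n(n-1)/2 of them, i.e. 4950 for n = 100 and about 50 million for n = 10000.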

I will see if I can adapt Marlin's MergeSort.java to C to have an efficient
sort in-place.
Do you know if libawt has already an efficient sort instead of porting mine
?

PS: "To save the planet, make software more efficient" is my quote of the
day !

Cheers,
Laurent

On Fri, 12 Oct 2018 at 16:37, Philip Race wrote:

> If there are no comments in the source, then there is no documentation :-(
> Reverse engineering/studying the code is what we'd have to do if that were
> the approach to be taken here.
>
> -phil.
>
> On 10/12/18, 2:18 AM, Laurent Bourgès wrote:
>
> Phil,
> I looked at the hotspot in
> src/java.desktop/share/native/libawt/java2d/pipe/ShapeSpanIterator.c (75%
> cpu time) and its sort algorithm looks like an insertion sort ...
> If you could give me some explanations (or documentation), I could try
> optimizing this method.
>
> Do you know if it uses an Active Edge Table (AET) or it traverses all
> segments every time ?
> i.e. segmentTable contains only ACTIVE segments or all ?
>
>


Re: [OpenJDK 2D-Dev] Speed of drawPolyline on JDK11

2018-10-12 Thread Peter Hull
This has been given a Java bug number now: JDK-8212124
https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8212124

I have been following the discussion here and I have a couple of
workarounds to try, which is great.

But, is there any more information you need from me? I'm always happy to help.

Thanks,
Pete


Re: [OpenJDK 2D-Dev] Speed of drawPolyline on JDK11

2018-10-12 Thread Philip Race

If there are no comments in the source, then there is no documentation :-(
Reverse engineering/studying the code is what we'd have to do if that were
the approach to be taken here.

-phil.

On 10/12/18, 2:18 AM, Laurent Bourgès wrote:

Phil,
I looked at the hotspot in
src/java.desktop/share/native/libawt/java2d/pipe/ShapeSpanIterator.c 
(75% cpu time) and its sort algorithm looks like an insertion sort ...
If you could give me some explanations (or documentation), I could try 
optimizing this method.


Do you know if it uses an Active Edge Table (AET) or it traverses all 
segments every time ?

i.e. segmentTable contains only ACTIVE segments or all ?



Re: [OpenJDK 2D-Dev] Speed of drawPolyline on JDK11

2018-10-12 Thread Laurent Bourgès
Phil,
I looked at the hotspot in
src/java.desktop/share/native/libawt/java2d/pipe/ShapeSpanIterator.c (75%
cpu time) and its sort algorithm looks like an insertion sort ...
If you could give me some explanations (or documentation), I could try
optimizing this method.

Do you know if it uses an Active Edge Table (AET) or it traverses all
segments every time ?
i.e. segmentTable contains only ACTIVE segments or all ?

static jboolean
ShapeSINextSpan(void *state, jint spanbox[])
{
    pathData *pd = (pathData *)state;
    int lo, cur, new, hi;
    int num = pd->numSegments;
    jint x0, x1, y0, err;
    jint loy;
    int ret = JNI_FALSE;
    segmentData **segmentTable;
    segmentData *seg;

    if (pd->state != STATE_SPAN_STARTED) {
        if (!initSegmentTable(pd)) {
            /* REMIND: - throw exception? */
            pd->lowSegment = num;
            return JNI_FALSE;
        }
    }

    lo = pd->lowSegment;
    cur = pd->curSegment;
    hi = pd->hiSegment;
    num = pd->numSegments;
    loy = pd->loy;
    segmentTable = pd->segmentTable;

    while (lo < num) {
        if (cur < hi) {
            seg = segmentTable[cur];
            x0 = seg->curx;
            if (x0 >= pd->hix) {
                cur = hi;
                continue;
            }
            if (x0 < pd->lox) {
                x0 = pd->lox;
            }

            if (pd->evenodd) {
                cur += 2;
                if (cur <= hi) {
                    x1 = segmentTable[cur - 1]->curx;
                } else {
                    x1 = pd->hix;
                }
            } else {
                int wind = seg->windDir;
                cur++;

                while (JNI_TRUE) {
                    if (cur >= hi) {
                        x1 = pd->hix;
                        break;
                    }
                    seg = segmentTable[cur++];
                    wind += seg->windDir;
                    if (wind == 0) {
                        x1 = seg->curx;
                        break;
                    }
                }
            }

            if (x1 > pd->hix) {
                x1 = pd->hix;
            }
            if (x1 <= x0) {
                continue;
            }
            spanbox[0] = x0;
            spanbox[1] = loy;
            spanbox[2] = x1;
            spanbox[3] = loy + 1;
            ret = JNI_TRUE;
            break;
        }

        if (++loy >= pd->hiy) {
            lo = cur = hi = num;
            break;
        }

        /* Go through active segments and toss which end "above" loy */
        cur = new = hi;
        while (--cur >= lo) {
            seg = segmentTable[cur];
            if (seg->lasty > loy) {
                segmentTable[--new] = seg;
            }
        }

        lo = new;
        if (lo == hi && lo < num) {
            /* The current list of segments is empty so we need to
             * jump to the beginning of the next set of segments.
             * Since the segments are not clipped to the output
             * area we need to make sure we don't jump "backwards"
             */
            seg = segmentTable[lo];
            if (loy < seg->cury) {
                loy = seg->cury;
            }
        }

        /* Go through new segments and accept any which start "above" loy */
        while (hi < num && segmentTable[hi]->cury <= loy) {
            hi++;
        }

        /* Update and sort the active segments by x0 */
        for (cur = lo; cur < hi; cur++) {
            seg = segmentTable[cur];

            /* First update the x0, y0 of the segment */
            x0 = seg->curx;
            y0 = seg->cury;
            err = seg->error;
            if (++y0 == loy) {
                x0 += seg->bumpx;
                err += seg->bumperr;
                x0 -= (err >> 31);
                err &= ERRSTEP_MAX;
            } else {
                jlong steps = loy;
                steps -= y0 - 1;
                y0 = loy;
                x0 += (jint) (steps * seg->bumpx);
                steps = err + (steps * seg->bumperr);
                x0 += (jint) (steps >> 31);
                err = ((jint) steps) & ERRSTEP_MAX;
            }
            seg->curx = x0;
            seg->cury = y0;
            seg->error = err;

            /* Then make sure the segment is sorted by x0 */
            for (new = cur; new > lo; new--) {
                segmentData *seg2 = segmentTable[new - 1];
                if (seg2->curx <= x0) {
                    break;
                }
                segmentTable[new] = seg2;
            }
            segmentTable[new] = seg;
        }
        cur = lo;
    }

    pd->lowSegment = lo;
    pd->hiSegment = hi;
    pd->curSegment = cur;
    pd->loy = loy;
    return ret;
}

Cheers,
Laurent

On Fri, Oct 12, 2018 at 07:04, Laurent Bourgès wrote:

> Phil,
>
> It reminds me I have rewritten in Marlin renderer the crossing sort at
> every scanline.
> Pisces was using a trivial insertion sort but it became very slow when the
> crossing count is 

Re: [OpenJDK 2D-Dev] Speed of drawPolyline on JDK11

2018-10-11 Thread Laurent Bourgès
Phil,

It reminds me I have rewritten in Marlin renderer the crossing sort at
every scanline.
Pisces was using a trivial insertion sort but it became very slow when the
crossing count is large.
I adopted a special merge sort as crossings are mostly ordered: big win.

What sort algo is in action in drawPolyLine ?

Cheers,
Laurent
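
To illustrate why a plain insertion sort becomes slow once the crossing
count is large, here is a minimal Java sketch. It is not the Pisces, Marlin
or ShapeSpanIterator code; the class and method names are hypothetical and
the inputs are synthetic. It only counts how many element moves an insertion
sort needs on random input versus mostly ordered input.

```java
import java.util.Random;

class InsertionSortCost {

    /** Sorts a in place with insertion sort and returns the number of element moves. */
    static long insertionSort(int[] a) {
        long moves = 0;
        for (int cur = 1; cur < a.length; cur++) {
            int x = a[cur];
            int pos = cur;
            // Shift larger elements one slot to the right until x fits.
            while (pos > 0 && a[pos - 1] > x) {
                a[pos] = a[pos - 1];
                pos--;
                moves++;
            }
            a[pos] = x;
        }
        return moves;
    }

    /** Uniformly random values: the move count grows roughly with n * n. */
    static int[] randomInput(int n, long seed) {
        Random rnd = new Random(seed);
        int[] a = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = rnd.nextInt(1 << 20);
        }
        return a;
    }

    /** Sorted values disturbed by n / 100 adjacent swaps: at most n / 100 moves. */
    static int[] mostlyOrderedInput(int n, long seed) {
        Random rnd = new Random(seed);
        int[] a = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;
        }
        for (int k = 0; k < n / 100; k++) {
            int i = rnd.nextInt(n - 1);
            int t = a[i];
            a[i] = a[i + 1];
            a[i + 1] = t;
        }
        return a;
    }

    public static void main(String[] args) {
        for (int n = 2000; n <= 8000; n *= 2) {
            long random = insertionSort(randomInput(n, 42L));
            long ordered = insertionSort(mostlyOrderedInput(n, 42L));
            System.out.println("n=" + n + " random moves=" + random
                    + " mostly ordered moves=" + ordered);
        }
    }
}
```

On random input, doubling n roughly quadruples the number of moves, while
mostly ordered input stays cheap. That growth is of the same order as the
2^15 versus 2^16 timings reported elsewhere in this thread, although this
sketch does not show that the sort is the only cause.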

On Fri, Oct 12, 2018 at 01:15, Phil Race wrote:

>
> In my previous email I was asking only about the "older" system,
> precisely because as you confirm below, I wanted to know that
> it was operating on an unscaled graphics.
>
> It is being triggered by the scale. If you add :
> graphics.scale(1.25, 1.25)
> in your application and run on 8 you'll see the window size is
> not changed but the contents are and the test now runs slowly
> like the JDK 9+ case.
>
> I think most primitives (text, images, fills, gradients, untransformed
> rectangle drawing) will be only slightly slower. The same if
> you were drawing anti-aliased lines - they are going to be slow already
> by comparison.
>
> A few similar primitives (drawArc, drawOval, drawPolygon ..) may be
> similarly
> affected but drawPolyLine even has dedicated loops for single pixel wide
> lines
> so may be the most affected when these loops can't be used.
>
> So this is a kind of worst case difference. Untransformed, aliased lines
> are super fast.
> Once you do anything like add anti-aliasing or a transform, they get
> slower.
>
> Note: hidpi does not mean that acceleration is "turned off", rather that
> some operations can no longer be sent to the accelerated pipeline, either
> it doesn't support that mode, or we haven't implemented the necessary code
> to invoke it for that mode.
>
> In Peter's case on Intel there will be no acceleration, since we do not
> enable D3D
> on Intel graphics cards. But on my system the time is identical whether
> I use D3D or not.
>
> But there is something else going on here too.
> Peter's test uses 2^16 line segments.
>
> On my windows system at 1.25 scale, this takes 55 seconds to run.
> But 2^15 line segments completes in 10 seconds.
> So 2 x the no. of lines takes approx 5 times as long to run ..
>
> I have a modified version of Peter's program which breaks the polyline
> array
> into subarrays which get passed to multiple calls to drawPolyline.
> It misses joining the last point in ARR[N] to the first point in
> ARR[N+1] but
> I think that should not make much difference but if someone wants to use
> that in a real app they'll need to handle it.
>
> What I see is that using the smaller arrays makes a big difference.
>
> So instead of 60 seconds to draw one 65,536 element polyline, to
> draw 64 polylines of 1,024 elements takes just 1.1 seconds.
> Still not 0.05 but better.
>  From what I can see it is being turned into a huge GeneralPath and
> rendered as a Shape. Multiple smaller shapes perform better.
> Perhaps we can add a loop that is specific to polygons that will handle
> this better, but that isn't likely to be jumped on .. and obviously
> it will never be *as* fast as narrow lines.
>
> -phil.
>
>
>
> On 10/11/2018 04:26 AM, Peter Hull wrote:
> > Hi Laurent,
> > Thanks for the detailed explanation. I quickly checked on the older
> > Windows system and the Java 11 window was the same size as the Java 8
> > one, implying no scaling was going on (I guess just because it has a
> > lower resolution monitor) - so that confirms your hypothesis.
> >
> > If I use -Dsun.java2d.uiScale=1.0 that's OK for my laptop, it doesn't
> > matter if the window is a bit small. However I believe some higher end
> > systems have much higher scaling factors (2x, 3x?). Is there a general
> > way to specify a 1px line regardless of scaling, because in my case I
> > don't mind too much if it's a 'hair-line'?
> >
> > By the way, my actual application doesn't have 65000 lines but it
> > draws 3 graphs with about 3000 points, which makes it noticeably slow
> > when resizing the Window. I suppose I should look into cutting down
> > the number of points somehow...
> >
> > Pete
>
>


Re: [OpenJDK 2D-Dev] Speed of drawPolyline on JDK11

2018-10-11 Thread Philip Race

And something else to try .. which I haven't tried, since I now believe
the problem isn't the drawing performance of a wide line, is to see
what happens if you do go that path - i.e. try 3,000 individual drawLine
calls.

It depends how much overhead matters whether it is better than (say)
breaking it into 100 calls with 30 points to drawPolyline ..

-phil.

On 10/11/18, 4:15 PM, Phil Race wrote:



By the way, my actual application doesn't have 65000 lines but it
draws 3 graphs with about 3000 points, which makes it noticeably slow
when resizing the Window. I suppose I should look into cutting down
the number of points somehow...


Re: [OpenJDK 2D-Dev] Speed of drawPolyline on JDK11

2018-10-11 Thread Phil Race



In my previous email I was asking only about the "older" system,
precisely because as you confirm below, I wanted to know that
it was operating on an unscaled graphics.

It is being triggered by the scale. If you add :
graphics.scale(1.25, 1.25)
in your application and run on 8 you'll see the window size is
not changed but the contents are and the test now runs slowly
like the JDK 9+ case.

I think most primitives (text, images, fills, gradients, untransformed
rectangle drawing) will be only slightly slower. The same if
you were drawing anti-aliased lines - they are going to be slow already
by comparison.

A few similar primitives (drawArc, drawOval, drawPolygon ..) may be
similarly affected, but drawPolyline even has dedicated loops for single
pixel wide lines, so it may be the most affected when these loops can't
be used.

So this is a kind of worst case difference. Untransformed, aliased lines
are super fast.
Once you do anything like add anti-aliasing or a transform, they get slower.

Note: hidpi does not mean that acceleration is "turned off", rather that
some operations can no longer be sent to the accelerated pipeline, either
it doesn't support that mode, or we haven't implemented the necessary code
to invoke it for that mode.

In Peter's case on Intel there will be no acceleration, since we do not
enable D3D on Intel graphics cards. But on my system the time is identical
whether I use D3D or not.


But there is something else going on here too.
Peter's test uses 2^16 line segments.

On my windows system at 1.25 scale, this takes 55 seconds to run.
But 2^15 line segments completes in 10 seconds.
So 2 x the no. of lines takes approx 5 times as long to run ..

I have a modified version of Peter's program which breaks the polyline array
into subarrays which get passed to multiple calls to drawPolyline.
It misses joining the last point in ARR[N] to the first point in ARR[N+1].
I think that should not make much difference, but if someone wants to use
that in a real app they'll need to handle it.

What I see is that using the smaller arrays makes a big difference.

So instead of 60 seconds to draw one 65,536 element polyline, to
draw 64 polylines of 1,024 elements takes just 1.1 seconds.
Still not 0.05 but better.
From what I can see it is being turned into a huge GeneralPath and
rendered as a Shape. Multiple smaller shapes perform better.
Perhaps we can add a loop that is specific to polygons that will handle
this better, but that isn't likely to be jumped on .. and obviously
it will never be *as* fast as narrow lines.

-phil.
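
A minimal sketch of the splitting approach described above, under stated
assumptions: this is not Phil's modified test program, and the class and
method names are hypothetical. Unlike the version described in the message,
consecutive pieces here share one point, so the segment joining ARR[N] to
ARR[N+1] is still drawn.

```java
import java.awt.Graphics2D;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ChunkedPolyline {

    /**
     * Splits n points into {start, length} ranges of at most chunk points.
     * Consecutive ranges share one point, so no connecting segment is lost.
     */
    static List<int[]> ranges(int n, int chunk) {
        if (chunk < 2) {
            throw new IllegalArgumentException("chunk must be at least 2");
        }
        List<int[]> result = new ArrayList<>();
        int start = 0;
        while (start < n - 1) {
            int len = Math.min(chunk, n - start);
            result.add(new int[] {start, len});
            start += len - 1; // reuse the last point as the next first point
        }
        return result;
    }

    /** Draws the polyline as several smaller drawPolyline calls. */
    static void draw(Graphics2D g, int[] xs, int[] ys, int n, int chunk) {
        for (int[] r : ranges(n, chunk)) {
            int[] cx = Arrays.copyOfRange(xs, r[0], r[0] + r[1]);
            int[] cy = Arrays.copyOfRange(ys, r[0], r[0] + r[1]);
            g.drawPolyline(cx, cy, r[1]);
        }
    }
}
```

A call such as ChunkedPolyline.draw(graphics, xs, ys, xs.length, 1024) would
replace a single drawPolyline call. With a wide stroke, the shared point is
rendered with line caps rather than a join, so the output may differ slightly
from one continuous polyline.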



On 10/11/2018 04:26 AM, Peter Hull wrote:

Hi Laurent,
Thanks for the detailed explanation. I quickly checked on the older
Windows system and the Java 11 window was the same size as the Java 8
one, implying no scaling was going on (I guess just because it has a
lower resolution monitor) - so that confirms your hypothesis.

If I use -Dsun.java2d.uiScale=1.0 that's OK for my laptop, it doesn't
matter if the window is a bit small. However I believe some higher end
systems have much higher scaling factors (2x, 3x?). Is there a general
way to specify a 1px line regardless of scaling, because in my case I
don't mind too much if it's a 'hair-line'?

By the way, my actual application doesn't have 65000 lines but it
draws 3 graphs with about 3000 points, which makes it noticeably slow
when resizing the Window. I suppose I should look into cutting down
the number of points somehow...

Pete




Re: [OpenJDK 2D-Dev] Speed of drawPolyline on JDK11

2018-10-11 Thread Laurent Bourgès
Hi,

One last thing about Marlin renderer:
it is available since OpenJDK 9 and you can tune its subpixels to, let's say,
1x1, i.e. 1 pixel: -Dsun.java2d.renderer.subPixel_log2_X=0
-Dsun.java2d.renderer.subPixel_log2_Y=0

I ran again the 'slow' test on linux ~ 0.5s:
- 4x faster than Marlin AA defaults
- 6.5x faster than AWT C code (HiDPI)
- still 16x slower than accelerated pipeline (xrender)

OpenJDK Runtime Environment 18.9 (build 11+28)
JAVA_OPTS: -DuseAA=true -Dsun.java2d.uiScale=2.5
-Dsun.java2d.renderer.subPixel_log2_X=0
-Dsun.java2d.renderer.subPixel_log2_Y=0
Java: 11 11+28
oct. 11, 2018 2:36:12 PM polylinetest.Canvas paintComponent
INFO: Paint Time: 0,747s
oct. 11, 2018 2:36:12 PM polylinetest.Canvas paintComponent
INFO: Paint Time: 0,553s
oct. 11, 2018 2:36:13 PM polylinetest.Canvas paintComponent
INFO: Paint Time: 0,559s
oct. 11, 2018 2:36:13 PM polylinetest.Canvas paintComponent
INFO: Paint Time: 0,55s

Of course, you should enable antialiasing:

 @Override
 protected void paintComponent(Graphics g) {
     super.paintComponent(g);
     Graphics2D graphics = (Graphics2D) g;
+    if (USE_AA) {
+        graphics.setRenderingHint(RenderingHints.KEY_ANTIALIASING,
+                RenderingHints.VALUE_ANTIALIAS_ON);
+    }

PS: In OpenJFX, noAA rendering is using a specific Marlin renderer instance
(1x1 sampling) so it could be applied to Java2D noAA too.

Cheers,
Laurent
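
A self-contained sketch of the antialiasing toggle used in the test, for
readers who want to reproduce it. How USE_AA is defined is not shown in the
thread; reading it from the -DuseAA system property is an assumption, and the
class and method names are hypothetical.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;

class AntialiasToggle {

    // Assumption: the flag mirrors the -DuseAA=true|false option seen in JAVA_OPTS.
    static final boolean USE_AA = Boolean.getBoolean("useAA");

    /** Sets the antialiasing hint explicitly in both cases. */
    static void configure(Graphics2D graphics, boolean useAA) {
        graphics.setRenderingHint(RenderingHints.KEY_ANTIALIASING,
                useAA ? RenderingHints.VALUE_ANTIALIAS_ON
                      : RenderingHints.VALUE_ANTIALIAS_OFF);
    }
}
```

Inside paintComponent this would be called as
AntialiasToggle.configure(graphics, AntialiasToggle.USE_AA) before drawing.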

On Thu, Oct 11, 2018 at 13:27, Peter Hull wrote:

> Hi Laurent,
> Thanks for the detailed explanation. I quickly checked on the older
> Windows system and the Java 11 window was the same size as the Java 8
> one, implying no scaling was going on (I guess just because it has a
> lower resolution monitor) - so that confirms your hypothesis.
>
> If I use -Dsun.java2d.uiScale=1.0 that's OK for my laptop, it doesn't
> matter if the window is a bit small. However I believe some higher end
> systems have much higher scaling factors (2x, 3x?). Is there a general
> way to specify a 1px line regardless of scaling, because in my case I
> don't mind too much if it's a 'hair-line'?
>
> By the way, my actual application doesn't have 65000 lines but it
> draws 3 graphs with about 3000 points, which makes it noticeably slow
> when resizing the Window. I suppose I should look into cutting down
> the number of points somehow...
>
> Pete
>


-- 
-- 
Laurent Bourgès


Re: [OpenJDK 2D-Dev] Speed of drawPolyline on JDK11

2018-10-11 Thread Peter Hull
Hi Laurent,
Thanks for the detailed explanation. I quickly checked on the older
Windows system and the Java 11 window was the same size as the Java 8
one, implying no scaling was going on (I guess just because it has a
lower resolution monitor) - so that confirms your hypothesis.

If I use -Dsun.java2d.uiScale=1.0 that's OK for my laptop, it doesn't
matter if the window is a bit small. However I believe some higher end
systems have much higher scaling factors (2x, 3x?). Is there a general
way to specify a 1px line regardless of scaling, because in my case I
don't mind too much if it's a 'hair-line'?

By the way, my actual application doesn't have 65000 lines but it
draws 3 graphs with about 3000 points, which makes it noticeably slow
when resizing the Window. I suppose I should look into cutting down
the number of points somehow...

Pete


Re: [OpenJDK 2D-Dev] Speed of drawPolyline on JDK11

2018-10-11 Thread Laurent Bourgès
Hi Peter,

I confirm that HiDPI support is causing your problem.

On linux (xrender), I added -Dsun.java2d.uiScale=2.5 and the performance
becomes poor ~ 3.3s vs 0.03s !

java -Dsun.java2d.uiScale=2.5 -DuseAA=false -jar dist/PolylineTest.jar
Java: 11 11+28
oct. 11, 2018 1:02:00 PM polylinetest.Canvas paintComponent
INFO: Paint Time: 3,781s
oct. 11, 2018 1:02:03 PM polylinetest.Canvas paintComponent
INFO: Paint Time: 3,003s
oct. 11, 2018 1:02:06 PM polylinetest.Canvas paintComponent
*INFO: Paint Time: 3,318s*

java -jar dist/PolylineTest.jar
Java: 11 11+28
oct. 11, 2018 12:50:33 PM polylinetest.Canvas paintComponent
INFO: Paint Time: 0,073s
oct. 11, 2018 12:50:33 PM polylinetest.Canvas paintComponent
INFO: Paint Time: 0,037s
oct. 11, 2018 12:50:33 PM polylinetest.Canvas paintComponent
*INFO: Paint Time: 0,029s*


I enabled antialiasing hint to use Marlin renderer and performance is
slightly better ~1.9s vs 3.3s.
Java: 11 11+28
oct. 11, 2018 1:01:27 PM polylinetest.Canvas paintComponent
INFO: Paint Time: 2,304s
oct. 11, 2018 1:01:29 PM polylinetest.Canvas paintComponent
INFO: Paint Time: 1,911s
oct. 11, 2018 1:01:31 PM polylinetest.Canvas paintComponent
*INFO: Paint Time: 1,881s*

Moreover, your polyline is very complicated (65K segments), so AWT (C code)
may struggle, in contrast to the Marlin renderer (pure Java, AA-optimized
code), which is faster even though it performs AA computations (8 times more
sampling).

I performed quick profiling using linux perf:
perf record -g java -Dsun.java2d.uiScale=2.5 -jar dist/PolylineTest.jar

Samples: 58K of event 'cycles:ppp', Event count (approx.): 48668354960,
DSO: libawt.so
  Children    Self  Command          Symbol
+   74,35%   0,00%  AWT-EventQueue-  [.] Java_sun_java2d_pipe_ShapeSpanIterator_nextSpan
+   74,35%  74,10%  AWT-EventQueue-  [.] ShapeSINextSpan
+   10,00%   0,05%  AWT-EventQueue-  [.] Java_sun_java2d_pipe_ShapeSpanIterator_lineTo
     0,22%   0,22%  AWT-EventQueue-  [.] sortSegmentsByLeadingY
     0,14%   0,14%  AWT-EventQueue-  [.] appendSegment
     0,08%   0,00%  java             [.] Java_sun_java2d_loops_GraphicsPrimitiveMgr_initIDs
     0,04%   0,04%  AWT-EventQueue-  [.] subdivideLine.isra.0
     0,03%   0,00%  java             [.] AWT_OnLoad
     0,03%   0,00%  java             [.] AWTIsHeadless
     0,02%   0,02%  AWT-EventQueue-  [.] GetSpanData
     0,02%   0,00%  java             [.] Java_sun_java2d_SurfaceData_initIDs
     0,01%   0,00%  AWT-EventQueue-  [.] initSegmentTable
     0,01%   0,00%  java             [.] Java_sun_java2d_loops_GraphicsPrimitiveMgr_registerNativeLoops
     0,01%   0,01%  AWT-EventQueue-  [.] free@plt
     0,00%   0,00%  java             [.] RegisterPrimitives
     0,00%   0,00%  Java2D Disposer  [.] SurfaceData_DisposeOps
     0,00%   0,00%  AWT-EventQueue-  [.] memcpy@plt
     0,00%   0,00%  AWT-EventQueue-  [.] calloc@plt
     0,00%   0,00%  java             [.] InitSimpleTypes.constprop.0
     0,00%   0,00%  java             [.] MapAccelFunction
     0,00%   0,00%  java             [.] Java_sun_java2d_pipe_SpanClipRenderer_initIDs

I suspect that HiDPI implies software rendering instead of accelerated
rendering (xrender drawline, AFAIR).

However, I am not sure such performance issue can be fixed any time soon.
Workaround: use -Dsun.java2d.uiScale=1.0

Regards,
Laurent

On Thu, Oct 11, 2018 at 10:30, Peter Hull wrote:

> I can answer part of that, but I can't get access to the older system just
> now.
>
> On Wed, Oct 10, 2018 at 4:41 PM Philip Race 
> wrote:
> > In other words does
> >
> > -Dsun.java2d.uiScale=1.0
> >
> > even change the physical size of the window on JDK 9/10/11 ?
> >
> Yes, because I can run the same jar under Java 8 and 11. Without the
> scale option, the Java 11 window is bigger than the Java 8 one, by
> about 1.25x (this corresponds to my system setting)
> When I add the scale=1 option to both, they are both the same size
> (and the same as JDK8 without any scaling)
> I've attached 2 images so you can see what I mean, one is without any
> scale option (and I labelled the approx size on this) and the other is
> with -Dsun.java2d.uiScale=1.0.
> The 

Re: [OpenJDK 2D-Dev] Speed of drawPolyline on JDK11

2018-10-11 Thread Peter Hull
I can answer part of that, but I can't get access to the older system just now.

On Wed, Oct 10, 2018 at 4:41 PM Philip Race  wrote:
> In other words does
>
> -Dsun.java2d.uiScale=1.0
>
> even change the physical size of the window on JDK 9/10/11 ?
>
Yes, because I can run the same jar under Java 8 and 11. Without the
scale option, the Java 11 window is bigger than the Java 8 one, by
about 1.25x (this corresponds to my system setting)
When I add the scale=1 option to both, they are both the same size
(and the same as JDK8 without any scaling)
I've attached 2 images so you can see what I mean, one is without any
scale option (and I labelled the approx size on this) and the other is
with -Dsun.java2d.uiScale=1.0.
The window title contains the system property "java.runtime.version"
so you can see which is which.

I do appreciate your help on this. It looks like it's coming down to
Intel's graphics driver, do you agree?

Pete


Re: [OpenJDK 2D-Dev] Speed of drawPolyline on JDK11

2018-10-10 Thread Philip Race




On 10/10/18, 6:07 AM, Peter Hull wrote:

> On Wed, Oct 10, 2018 at 1:21 PM Peter Hull wrote:
>> -Dsun.java2d.uiScale=1.0
>> And this does make it fast again for me (paint time < 0.1sec)!

Which would sound like drawing wide lines on the software pipeline is
slowing it down which makes sense except the next statement potentially
invalidates that :

> Also I tried on an older system with "Intel HD Graphics 4600" and it
> did not have the slow down problem. So it seems to be quite specific
> to my system.

Did that older system actually have UI scaling ?
What was the uiScale factor you saw on your "slow" system and what
is it on the no-slowdown system ?

In other words does

-Dsun.java2d.uiScale=1.0

even change the physical size of the window on JDK 9/10/11 ?


Measure the actual pixels by using a ruler on your screen and of
course you'll need to know the screen resolution you have in use
to do the calculation :

Screen width in pixels  X window width in inches / screen width in inches

-phil
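
A worked example of the formula above as a small Java sketch. All
measurements are hypothetical numbers chosen for illustration, and the class
and method names are made up.

```java
class WindowPixelWidth {

    /** Screen width in pixels x window width in inches / screen width in inches. */
    static double windowWidthInPixels(int screenWidthPx, double windowWidthIn,
                                      double screenWidthIn) {
        return screenWidthPx * windowWidthIn / screenWidthIn;
    }

    public static void main(String[] args) {
        // Hypothetical measurements: a 1920 px wide screen that is 12 in wide,
        // showing a window that measures 5 in across with a ruler.
        double physicalPx = windowWidthInPixels(1920, 5.0, 12.0);
        // Hypothetical logical width requested by the application.
        double logicalPx = 640.0;
        System.out.println("physical width: " + physicalPx + " px");
        System.out.println("effective scale: " + (physicalPx / logicalPx));
    }
}
```

With these made-up numbers the window is 800 physical pixels wide, which for
a 640 pixel logical width corresponds to a scale factor of 1.25.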


Re: [OpenJDK 2D-Dev] Speed of drawPolyline on JDK11

2018-10-10 Thread Peter Hull
On Wed, Oct 10, 2018 at 1:21 PM Peter Hull  wrote:
> -Dsun.java2d.uiScale=1.0
> And this does make it fast again for me (paint time < 0.1sec)!

Also I tried on an older system with "Intel HD Graphics 4600" and it
did not have the slow down problem. So it seems to be quite specific
to my system.


Re: [OpenJDK 2D-Dev] Speed of drawPolyline on JDK11

2018-10-10 Thread Peter Hull
On Wed, 10 Oct 2018 at 11:55 Laurent Bourgès  wrote:
>
> Peter,
> What is the corresponding bug ?
>
 I don't know; I filled in the details and it said they would let me
know the bug number if & when it was accepted.
>
>
> I think it is -Djava2d.ui.scale=1.0 AFAIR.

It's
-Dsun.java2d.uiScale=1.0
And this does make it fast again for me (paint time < 0.1sec)!

How do I make it run verbosely, I tried
-Dsun.java2d.trace=log,out:log.txt,verbose
But didn't see anything helpful.

Thanks,
Peter


Re: [OpenJDK 2D-Dev] Speed of drawPolyline on JDK11

2018-10-10 Thread Laurent Bourgès
Peter,
What is the corresponding bug ?


> > Peter, what is your hardware & OS info ?
> It's an Intel core i7-8750H, 8GB RAM, intel UHD 630 graphics
> Windows 10 Pro 1803 build 17134.320
>
> Note that Java 8 is still 'fast' so there must be some difference
> between 8 & 11.
>
> I saw that on Java 11, the window size was bigger. I assume this is
> due to the HiDPI support added in Java 9 (my display scaling is set to
> 125%, which is the 'recommended' setting)
>
> Are there any options I can pass to java.exe which would turn off
> scaling - that might help to narrow down the problem.
>

I think it is -Djava2d.ui.scale=1.0 AFAIR.


> By the way, I tried it on a Mac, it was also 'fast', so it just seems
> to be Windows at the moment. I would appreciate it if someone else on
> Windows could check it out.
>

I can run tests on Windows, but on an i5 with a discrete nvidia card.

You should run java2d in verbose mode to see if your intel gpu is not
supported by jdk11.
 It could then use software rendering as fallback and explain the slowdown.

Cheers,
Laurent


Re: [OpenJDK 2D-Dev] Speed of drawPolyline on JDK11

2018-10-10 Thread Peter Hull
On Tue, Oct 9, 2018 at 3:52 PM Laurent Bourgès
 wrote:
> Peter, what is your hardware & OS info ?
It's an Intel core i7-8750H, 8GB RAM, intel UHD 630 graphics
Windows 10 Pro 1803 build 17134.320

Note that Java 8 is still 'fast' so there must be some difference
between 8 & 11.

I saw that on Java 11, the window size was bigger. I assume this is
due to the HiDPI support added in Java 9 (my display scaling is set to
125%, which is the 'recommended' setting)

Are there any options I can pass to java.exe which would turn off
scaling - that might help to narrow down the problem.

By the way, I tried it on a Mac, it was also 'fast', so it just seems
to be Windows at the moment. I would appreciate it if someone else on
Windows could check it out.

Thanks,
Peter


Re: [OpenJDK 2D-Dev] Speed of drawPolyline on JDK11

2018-10-09 Thread Laurent Bourgès
Hi Peter,

I tried on my linux laptop (i7 + nvidia binary driver) and my results are
the same on OpenJDK 8 / 11:





- Java: 1.8.0_181 25.181-b13
oct. 09, 2018 4:29:31 PM polylinetest.Canvas paintComponent
INFOS: Paint Time: 0,078s
oct. 09, 2018 4:29:31 PM polylinetest.Canvas paintComponent
INFOS: Paint Time: 0,032s

- Java: 11 11+28
oct. 09, 2018 4:33:17 PM polylinetest.Canvas paintComponent
INFO: Paint Time: 0,058s
oct. 09, 2018 4:33:17 PM polylinetest.Canvas paintComponent
INFO: Paint Time: 0,03s

Maybe it is related to your hardware, as such graphics calls should be
handled directly by the accelerated pipelines ... XRender on linux.

Here are netbeans hotspots:
Name | Self Time | Self Time (CPU) | Total Time | Total Time (CPU) | Hits
sun.java2d.xr.XRRenderer$XRDrawHandler.drawLine (int, int, int, int) | 599 ms (8,9 %) | 599 ms (39 %) | 702 ms (0,8 %) | 702 ms (1,3 %) | 393
sun.java2d.xr.XRBackendNative.XRenderRectanglesNative[native] (int, byte, short, short, short, short, int[], int) | 327 ms (4,9 %) | 327 ms (21,3 %) | 327 ms (0,4 %) | 327 ms (0,6 %) | 21
sun.java2d.loops.ProcessPath.doProcessPath (sun.java2d.loops.ProcessPath.ProcessHandler, java.awt.geom.Path2D.Float, float, float) | 109 ms (1,6 %) | 109 ms (7,1 %) | 836 ms (0,9 %) | 836 ms (1,5 %) | 21
sun.java2d.xr.XRDrawLine.rasterizeLine (sun.java2d.xr.GrowableRectArray, int, int, int, int, int, int, int, int, boolean, boolean) | 85,1 ms (1,3 %) | 85,1 ms (5,5 %) | 103 ms (0,1 %) | 103 ms (0,2 %) | 458
sun.awt.X11.XInputMethod.createXICNative[native] (long) | 46,9 ms (0,7 %) | 46,9 ms (3,1 %) | 46,9 ms (0,1 %) | 46,9 ms (0,1 %) | 1
java.awt.geom.Path2D$Float.needRoom (boolean, int) | 45,2 ms (0,7 %) | 45,2 ms (2,9 %) | 76,7 ms (0,1 %) | 76,7 ms (0,1 %) | 30
java.util.Arrays.copyOf (float[], int) | 21,9 ms (0,3 %) | 21,9 ms (1,4 %) | 23,0 ms (0 %) | 23,0 ms (0 %) | 28
sun.java2d.xr.GrowableIntArray.growArray () | 20,5 ms (0,3 %) | 20,5 ms (1,3 %) | 21,0 ms (0 %) | 21,0 ms (0 %) | 4
sun.java2d.loops.ProcessPath$DrawProcessHandler.PROCESS_LINE (int, int, int, int, boolean, int[]) | 10,3 ms (0,2 %) | 10,3 ms (0,7 %) | 716 ms (0,8 %) | 716 ms (1,3 %) | 391

Peter, what is your hardware & OS info ?

Laurent

PS: I made few modifs in your code:
diff --git a/src/polylinetest/Canvas.java b/src/polylinetest/Canvas.java
index 8aad48c..a355e18 100644
--- a/src/polylinetest/Canvas.java
+++ b/src/polylinetest/Canvas.java
@@ -1,17 +1,26 @@
 package polylinetest;

+import java.awt.Color;
+import java.awt.Dimension;
 import java.awt.Graphics;
 import java.awt.Graphics2D;
+import java.awt.event.ActionEvent;
+import java.awt.event.ActionListener;
 import java.util.Random;
 import java.util.logging.Level;
 import java.util.logging.Logger;
-import javax.swing.JComponent;
+import javax.swing.JPanel;
+import javax.swing.Timer;

-public class Canvas extends JComponent {
+public class Canvas extends JPanel {

     public static int SIZEGRADE = 16;

     public Canvas() {
+        super();
+        setPreferredSize(new Dimension(300, 200));
+        setBackground(Color.WHITE);
+
         Random rnd = new Random();
         int len = 1 << SIZEGRADE;
         xs = new int[len];
@@ -20,6 +29,19 @@ public class Canvas extends JComponent {
             xs[i] = rnd.nextInt(640);
             ys[i] = rnd.nextInt(480);
         }
+        final Timer t = new Timer(100, new ActionListener() {
+            int count = 0;
+            @Override
+            public void actionPerformed(ActionEvent e) {
+                repaint();
+                count++;
+                if (count > 20) {
+                    System.exit(0);
+                }
+            }
+        });
+        t.setRepeats(true);
+        t.start();
     }

     @Override
diff --git a/src/polylinetest/PolylineTest.java b/src/polylinetest/PolylineTest.java
index 61c9e10..636c3b7 100644
--- a/src/polylinetest/PolylineTest.java
+++ b/src/polylinetest/PolylineTest.java
@@ -1,14 +1,29 @@
 package polylinetest;

+import java.awt.BorderLayout;
 import javax.swing.JFrame;
+import javax.swing.JPanel;

 public class PolylineTest {
+
     public static void main(String[] args) {
-        JFrame frame = new JFrame("polyline test");
-        frame.setSize(640, 480);
-        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
-        frame.getContentPane().add(new Canvas());
-        javax.swing.SwingUtilities.invokeLater(() -> frame.setVisible(true));
+        System.out.println("Java: "+System.getProperty("java.version") + " " + System.getProperty("java.vm.version"));
+
+        javax.swing.SwingUtilities.invokeLater(new Runnable() {
+            @Override
+            public void run() {
+                JFrame frame = new JFrame("polyline test");
+                frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
+
+                JPanel panel = new JPanel(new BorderLayout());
+                panel.add(new Canvas(), BorderLayout.CENTER);
+
+                frame.getContentPane().add(panel);
+
+                frame.pack();
+                frame.setVisible(true);
+            }
+        });
     }
-
+
 }

Le