Re: Regression Test Run for upcoming 5.0.0

2021-01-05 Thread Dominik Stadler
Hi,

The second run of the regression tests has now finished. Results look much
better, with only very few failures left (56 failures in 12 stacktraces):

1) o.a.p.ooxml.POIXMLException: error: The document is not a
xml@urn:schemas-poi-apache-org:vmldrawing: document element namespace
mismatch expected "urn:schemas-poi-apache-org:vmldrawing" got
"http://schemas.openxmlformats.org/spreadsheetml/2006/main"
=> Seems to have been introduced by #64773 - Visual signatures for
.xlsx/.docx, Subversion Revision 1882394

2) A few failures related to drawing slideshows, likely introduced by
supporting much more functionality there; not sure if we need to fix those

3) java.lang.RuntimeException: CountryRecord or SSTRecord not found: This
is just a change in an error message, which needs to be caught differently
in the integration tests

4) Some documents try to allocate very large arrays, which I would ignore,
as a user can easily increase the maximum allowed allocation

5) "java.lang.IllegalArgumentException: Invalid char (*) found at index (*)
in sheet name *" => now happens because we fixed another issue, so not an
actual regression
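
As an aside, the "56 failures in 12 stacktraces" style of grouping can be
approximated by masking the variable parts of each message, the way the
pattern in 5) is written. A minimal Java sketch, assuming messages shaped
like that one; the class FailureGrouping, its methods and the sample
messages are made up for illustration and are not the actual regression
tooling:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FailureGrouping {
    // Mask parenthesised values and single-quoted names with '*', so that
    // messages differing only in those details fall into the same bucket.
    static String normalize(String message) {
        return message
                .replaceAll("\\([^)]*\\)", "(*)")
                .replaceAll("'[^']*'", "*");
    }

    // Count how many raw messages map to each normalized pattern,
    // keeping the order in which patterns were first seen.
    static Map<String, Integer> countByPattern(List<String> messages) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String m : messages) {
            counts.merge(normalize(m), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample messages, not taken from the real reports.
        List<String> samples = Arrays.asList(
                "Invalid char (/) found at index (3) in sheet name 'a/b'",
                "Invalid char (?) found at index (0) in sheet name '?x'",
                "CountryRecord or SSTRecord not found");
        countByPattern(samples).forEach((pattern, n) ->
                System.out.println(n + " x " + pattern));
    }
}
```

Real stack traces would probably need more masking rules (file names,
offsets), so treat the two regular expressions as a starting point only.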

Full reports are at
http://people.apache.org/~centic/poi_regression/reports/index412RC3to500RC1.html
and
http://people.apache.org/~centic/poi_regression/reportsAll/index412RC3to500RC1.html

I think we only need to take a look at 1) and 2) before releasing.

Thanks... Dominik.



Re: Regression Test Run for upcoming 5.0.0

2021-01-03 Thread Dominik Stadler
Hi,

Thanks for the fixes and the "stress" documents. I added a few more, plus a
test in the normal unit tests that triggers those documents; otherwise, as
far as I saw, ooxml-schema-lite does not contain the parts they need.

Next regression-run is underway...

Dominik.



Re: Regression Test Run for upcoming 5.0.0

2020-12-30 Thread Andreas Beeker

Hi,

I've mentioned it in our private Slack group *) - there's also an Ant error,
which ignores quite a few *$Factory.class files when packing the lite jar.
I'm currently trying to figure out how I can work around this.


> Another potential approach: ...

This was my first approach (class -> xsb), but it was not reliable, so I
spent some time working out the few lines of byte-buddy code.
So those .xsb files are the ones we use in our tests. If we do b), those
should be picked up.

Andi

*) this is just a participation reminder for the rest - I'm happy to invite you
if you tell me your ASF Slack ID ;)

-
To unsubscribe, e-mail: dev-unsubscr...@poi.apache.org
For additional commands, e-mail: dev-h...@poi.apache.org



Re: Regression Test Run for upcoming 5.0.0

2020-12-30 Thread Dominik Stadler
Hi,

I'd go for b). Hopefully not too many are necessary; it seems a simple test
that reads in the document triggers the necessary parts in most cases.

c) would mean anybody out there with such a file would now get
regression errors unless they switch to the full jar.

Another potential approach: I don't know much about how you do all this
agent stuff nowadays, but is there a way to match the classes to the .xsb
files to find the missing ones? We seem to cover the classes themselves
already, as they are only included when used in tests.

Dominik.



Re: Regression Test Run for upcoming 5.0.0

2020-12-30 Thread Andreas Beeker

Hi Dominik,

thank you for running the regression test.


> * Most of these are because the "lite" ooxml-schema jar is still missing
> some stuff, not sure if the new way of building the lite-jar is the cause
> or if we now use more parts in the regression tests


The lite jar used to contain all *.xsb files; it now contains only the
ones used in the tests, which decreased its size by around 40%.

Should we ... ?
a) roll back the change and include all *.xsb files - the class files might
still be missing
b) provide unit tests for the failing files - we might need a few roundtrips
to fix those cases, i.e. best would be a reduced file list of those failures
c) use the full schema for the regression tests
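
To see which *.xsb files the lite jar no longer carries, the entry lists of
the two jars can simply be compared. A rough Java sketch using only
java.util.jar; the class XsbDiff and its method names are hypothetical, and
the jar paths are passed on the command line because the real artifact
names are not assumed here:

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Set;
import java.util.TreeSet;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

final class XsbDiff {
    // Collect the names of all entries ending in ".xsb" from one jar.
    static Set<String> xsbEntries(Path jar) throws IOException {
        try (JarFile jf = new JarFile(jar.toFile())) {
            return jf.stream()
                    .map(entry -> entry.getName())
                    .filter(name -> name.endsWith(".xsb"))
                    .collect(Collectors.toCollection(TreeSet::new));
        }
    }

    // Entries present in the full jar but absent from the lite jar.
    static Set<String> missingFromLite(Path fullJar, Path liteJar) throws IOException {
        Set<String> missing = xsbEntries(fullJar);
        missing.removeAll(xsbEntries(liteJar));
        return missing;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: XsbDiff <full.jar> <lite.jar>");
            return;
        }
        missingFromLite(Paths.get(args[0]), Paths.get(args[1]))
                .forEach(System.out::println);
    }
}
```

This only lists what is absent; it does not tell which of those entries a
given failing document actually needs.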

Andi





Regression Test Run for upcoming 5.0.0

2020-12-30 Thread Dominik Stadler
Hi,

In order to get the release-preparations rolling a bit, I have finished a
first run of the "mass regression test" exercise.

As usual it brings up cases where documents fail now, but did work fine
previously, i.e. regressions that we may have introduced since the previous
release.

I now process 3,356,984 documents (460k of those are skipped because they
are duplicates). Currently there are around 3800 documents which show a
regression:
* Most of these are because the "lite" ooxml-schema jar is still missing
some stuff; not sure if the new way of building the lite jar is the cause
or if we now use more parts in the regression tests
* Some exceptions/NPEs are probably related to more support for
drawing/rendering PPT(X), so some may in fact simply be new "expected"
exceptions for broken documents
* Note: The ones with TIMEOUT or OLDFORMAT are not regressions
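
For readers wondering how duplicates can be skipped in a corpus of this
size: one common approach is to key each document by a hash of its bytes. A
minimal Java sketch under that assumption; the mail does not say how the
regression run actually detects duplicates, and the class DuplicateFilter is
invented for illustration:

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

final class DuplicateFilter {
    // Hex-encoded SHA-256 digests of every document seen so far.
    private final Set<String> seen = new HashSet<>();

    // Returns true the first time this content is seen,
    // false for byte-identical repeats.
    boolean isNew(byte[] content) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            String key = new BigInteger(1, sha.digest(content)).toString(16);
            return seen.add(key);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        DuplicateFilter filter = new DuplicateFilter();
        // Hypothetical stand-ins for file contents.
        String[] docs = {"doc A", "doc B", "doc A"};
        int processed = 0;
        for (String doc : docs) {
            if (filter.isNew(doc.getBytes(StandardCharsets.UTF_8))) {
                processed++;
            }
        }
        System.out.println("processed " + processed + " of " + docs.length);
    }
}
```

Keeping only the digests in memory means the set stays small even for
millions of files, at the cost of reading each file once to hash it.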

5.0.0 vs. 4.1.2:
http://people.apache.org/~centic/poi_regression/reports/index412RC3to500RC1.html

5.0.0 overall errors:
http://people.apache.org/~centic/poi_regression/reportsAll/index412RC3to500RC1.html

I can fairly easily re-run this as soon as we have fixes for some of the
things.

Thanks... Dominik.