Re: XmlStreamReader encoding regexp does not work anymore without version

2024-01-03 Thread Andreas Hubold

Hi Gary,

I can confirm that this fixed the issue. Our tests are green again with 
current master. Thanks a lot for your quick response and fix!


Andreas

Gary Gregory wrote on 02.01.24 17:27:

I fixed this in git master and 2.16.0-SNAPSHOT builds.

Please test and report back! 

Gary





Re: XmlStreamReader encoding regexp does not work anymore without version

2024-01-02 Thread Gary Gregory
I fixed this in git master and 2.16.0-SNAPSHOT builds.

Please test and report back! 

Gary


On Tue, Jan 2, 2024, 11:03 AM Gary Gregory  wrote:

> Ah, interesting, I'll look into it.
>
> Gary


Re: XmlStreamReader encoding regexp does not work anymore without version

2024-01-02 Thread Gary Gregory
Ah, interesting, I'll look into it.

Gary


On Tue, Jan 2, 2024, 9:50 AM Andreas Hubold
 wrote:

> Hi Gary,
>
> right, but it is optional for external entities, see
> https://www.w3.org/TR/xml/#TextEntities
>
> And the examples in https://www.w3.org/TR/xml/#NT-EncodingDecl also
> don't have version attributes, so this might still be a valid use case?
>
> > 
> > 
>
> Cheers
> Andreas


Re: XmlStreamReader encoding regexp does not work anymore without version

2024-01-02 Thread Gary Gregory
Andreas,

I just remembered that we have a lenient setting that could be used to
access a different regular expression that does not care about correctness.

If we do support this, then the regular expression must be lenient enough
but not so much that it can be used as an attack vector for resource
consumption, which was a problem in the past IIRC.

Whether or not it's a good idea to have a new lenient setting, overload the
current one, or have one at all, is a different topic.
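
As a rough illustration of "lenient enough, but not an attack vector", here is a minimal sketch in plain Java. It is not the Commons IO implementation: the class name LenientEncodingSketch, the method sniffEncoding, and the 1024-character limit are all hypothetical. The idea is to look only at a bounded prefix of the input, and to use possessive quantifiers so the regular expression cannot backtrack at length.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LenientEncodingSketch {

    // Hypothetical upper bound on how much input is ever inspected.
    static final int MAX_PROLOG_CHARS = 1024;

    // Possessive quantifiers (*+) give up no characters once matched,
    // so a failed match cannot trigger long backtracking.
    static final Pattern ENCODING_ATTR = Pattern.compile(
            "[ \\t\\r\\n]encoding[ \\t\\r\\n]*+=[ \\t\\r\\n]*+"
            + "([\"'])([A-Za-z][A-Za-z0-9._-]*+)\\1");

    // Returns the declared encoding, or null if none is found in the prefix.
    // A "version" attribute is neither required nor rejected.
    static String sniffEncoding(String head) {
        String prefix = head.length() > MAX_PROLOG_CHARS
                ? head.substring(0, MAX_PROLOG_CHARS) : head;
        if (!prefix.startsWith("<?xml") || prefix.length() < 6
                || " \t\r\n".indexOf(prefix.charAt(5)) < 0) {
            return null; // not an "xml" declaration (e.g. <?xml-stylesheet ...?>)
        }
        int end = prefix.indexOf("?>");
        if (end < 0) {
            return null; // declaration does not end within the bounded prefix
        }
        Matcher m = ENCODING_ATTR.matcher(prefix.substring(0, end));
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(sniffEncoding("<?xml encoding='Cp1047'?>"));
        System.out.println(sniffEncoding("<?xml version=\"1.0\" encoding=\"UTF-8\"?>"));
        System.out.println(sniffEncoding("<root/>"));
    }
}
```

Because the work is capped by the prefix length, an oversized or unterminated prolog simply yields no encoding instead of consuming more resources.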

Gary

On Tue, Jan 2, 2024, 9:42 AM Gary Gregory  wrote:

> Hi Andreas,
>
> In an "xml" PI, the "version" is NOT optional, see
> https://www.w3.org/TR/REC-xml/#sec-pi
>
> If we tried to handle all cases of invalid documents, then there would
> be no end to it.
>
> Gary


Re: XmlStreamReader encoding regexp does not work anymore without version

2024-01-02 Thread Andreas Hubold

Hi Gary,

right, but it is optional for external entities, see 
https://www.w3.org/TR/xml/#TextEntities


And the examples in https://www.w3.org/TR/xml/#NT-EncodingDecl also 
don't have version attributes, so this might still be a valid use case?
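
The distinction can be sketched in plain Java. The XML 1.0 grammar defines XMLDecl as '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>' and TextDecl (used for external parsed entities) as '<?xml' VersionInfo? EncodingDecl S? '?>'. The class below transcribes those two productions into regular expressions; the names DeclKindSketch and kindOf are hypothetical, and this is an illustration only, not what XmlStreamReader does. The first two inputs in main are the encoding declaration examples from the W3C specification.

```java
import java.util.regex.Pattern;

public class DeclKindSketch {

    // Building blocks transcribed from the XML 1.0 productions.
    private static final String S = "[ \\t\\r\\n]+";
    private static final String EQ = "[ \\t\\r\\n]*=[ \\t\\r\\n]*";
    private static final String VERSION_INFO =
            S + "version" + EQ + "(?:'1\\.[0-9]+'|\"1\\.[0-9]+\")";
    private static final String ENC_NAME = "[A-Za-z][A-Za-z0-9._-]*";
    private static final String ENCODING_DECL =
            S + "encoding" + EQ + "(?:'" + ENC_NAME + "'|\"" + ENC_NAME + "\")";
    private static final String SD_DECL =
            S + "standalone" + EQ + "(?:'(?:yes|no)'|\"(?:yes|no)\")";

    // XMLDecl: version required, encoding and standalone optional.
    static final Pattern XML_DECL = Pattern.compile(
            "<\\?xml" + VERSION_INFO + "(?:" + ENCODING_DECL + ")?(?:" + SD_DECL
            + ")?[ \\t\\r\\n]*\\?>");

    // TextDecl: version optional, encoding required.
    static final Pattern TEXT_DECL = Pattern.compile(
            "<\\?xml(?:" + VERSION_INFO + ")?" + ENCODING_DECL + "[ \\t\\r\\n]*\\?>");

    static String kindOf(String decl) {
        boolean xml = XML_DECL.matcher(decl).matches();
        boolean text = TEXT_DECL.matcher(decl).matches();
        if (xml && text) return "XMLDecl or TextDecl";
        if (xml) return "XMLDecl only";
        if (text) return "TextDecl only";
        return "neither";
    }

    public static void main(String[] args) {
        System.out.println(kindOf("<?xml encoding='UTF-8'?>"));               // TextDecl only
        System.out.println(kindOf("<?xml encoding='EUC-JP'?>"));              // TextDecl only
        System.out.println(kindOf("<?xml version='1.0' encoding='UTF-8'?>")); // XMLDecl or TextDecl
        System.out.println(kindOf("<?xml version='1.0'?>"));                  // XMLDecl only
    }
}
```

So a declaration with an encoding but no version is not a valid XMLDecl for a document entity, yet it is a valid TextDecl for an external parsed entity.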







Cheers
Andreas


Gary Gregory wrote on 02.01.24 at 15:42:

Hi Andreas,

In an "xml" PI, the "version" is NOT optional, see
https://www.w3.org/TR/REC-xml/#sec-pi

If we tried to handle all cases of invalid documents, then there would
be no end to it.

Gary




Re: XmlStreamReader encoding regexp does not work anymore without version

2024-01-02 Thread Gary Gregory
Hi Andreas,

In an "xml" PI, the "version" is NOT optional, see
https://www.w3.org/TR/REC-xml/#sec-pi

If we tried to handle all cases of invalid documents, then there would
be no end to it.

Gary

On Tue, Jan 2, 2024 at 9:36 AM Gary Gregory  wrote:
>
> Ah, you are talking about something different, I am sorry about that. 
> Looking...
>
> Gary

-
To unsubscribe, e-mail: user-unsubscr...@commons.apache.org
For additional commands, e-mail: user-h...@commons.apache.org



Re: XmlStreamReader encoding regexp does not work anymore without version

2024-01-02 Thread Gary Gregory
Ah, you are talking about something different, I am sorry about that. Looking...

Gary

On Tue, Jan 2, 2024 at 9:35 AM Gary Gregory  wrote:
>
> Hello Andreas,
>
> Please try git master or a 2.16.0-SNAPSHOT build
> (https://repository.apache.org/content/repositories/snapshots/commons-io/commons-io/2.16.0-SNAPSHOT)
> I fixed this today as reported in 
> https://github.com/apache/commons-io/pull/550
>
> TY!
> Gary




Re: XmlStreamReader encoding regexp does not work anymore without version

2024-01-02 Thread Gary Gregory
Hello Andreas,

Please try git master or a 2.16.0-SNAPSHOT build
(https://repository.apache.org/content/repositories/snapshots/commons-io/commons-io/2.16.0-SNAPSHOT).
I fixed this today, as reported in https://github.com/apache/commons-io/pull/550

TY!
Gary

On Tue, Jan 2, 2024 at 9:33 AM Andreas Hubold
 wrote:
>
> Hi,
>
> the regular expression for the encoding was changed in XmlStreamReader
> between 2.13.0 and 2.15.1.
>
> It now requires a version attribute in the XML declaration and does not
> work anymore with some real world files.
>
> For example, the encoding from the following example declaration is
> respected by 2.13.0, but not by 2.15.1:
>
> <?xml encoding='Cp1047'?>
>
> It works if the version is specified: <?xml version='1.0' encoding='Cp1047'?>
>
> Note, however, that https://www.w3.org/TR/xml/#NT-EncodingDecl also
> shows examples without a version attribute, at least for entities. It
> would be good to restore the previous behavior, IMHO.
>
> Cheers,
> Andreas



XmlStreamReader encoding regexp does not work anymore without version

2024-01-02 Thread Andreas Hubold

Hi,

the regular expression for the encoding was changed in XmlStreamReader 
between 2.13.0 and 2.15.1.


It now requires a version attribute in the XML declaration and does not 
work anymore with some real world files.


For example, the encoding from the following example declaration is
respected by 2.13.0, but not by 2.15.1:

<?xml encoding='Cp1047'?>

It works if the version is specified: <?xml version='1.0' encoding='Cp1047'?>


Note, however, that https://www.w3.org/TR/xml/#NT-EncodingDecl also
shows examples without a version attribute, at least for entities. It
would be good to restore the previous behavior, IMHO.
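
To make the report concrete, here is a minimal sketch in plain Java of the difference between a pattern that requires a version attribute and one that treats it as optional. The two patterns and the names EncodingDeclDemo, VERSION_REQUIRED, VERSION_OPTIONAL and encodingOf are illustrative assumptions; they are not the regular expressions used by XmlStreamReader in any Commons IO release.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EncodingDeclDemo {

    // Hypothetical strict pattern: "version" must precede "encoding".
    static final Pattern VERSION_REQUIRED = Pattern.compile(
            "^<\\?xml\\s+version\\s*=\\s*(?:\"[^\"]*\"|'[^']*')"
            + "\\s+encoding\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')");

    // Hypothetical tolerant pattern: "version" may be absent.
    static final Pattern VERSION_OPTIONAL = Pattern.compile(
            "^<\\?xml\\s+(?:version\\s*=\\s*(?:\"[^\"]*\"|'[^']*')\\s+)?"
            + "encoding\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')");

    // Returns the encoding name captured by the pattern, or null if no match.
    static String encodingOf(Pattern pattern, String prolog) {
        Matcher m = pattern.matcher(prolog);
        if (!m.find()) {
            return null;
        }
        // Group 1 is set for double quotes, group 2 for single quotes.
        return m.group(1) != null ? m.group(1) : m.group(2);
    }

    public static void main(String[] args) {
        String noVersion = "<?xml encoding='Cp1047'?>";
        String withVersion = "<?xml version='1.0' encoding='Cp1047'?>";

        System.out.println(encodingOf(VERSION_REQUIRED, noVersion));   // null
        System.out.println(encodingOf(VERSION_REQUIRED, withVersion)); // Cp1047
        System.out.println(encodingOf(VERSION_OPTIONAL, noVersion));   // Cp1047
        System.out.println(encodingOf(VERSION_OPTIONAL, withVersion)); // Cp1047
    }
}
```

With the strict pattern the declared encoding is silently ignored when the version is missing, which matches the symptom described here.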


Cheers,
Andreas



