Re: XmlStreamReader encoding regexp does not work anymore without version
Hi Gary,

I can confirm that this fixed the issue. Our tests are green again with current master. Thanks a lot for your quick response and fix!

Andreas

Gary Gregory wrote on 02.01.24 17:27:
> I fixed this in git master and 2.16.0-SNAPSHOT builds. Please test and report back!
>
> Gary

To unsubscribe, e-mail: user-unsubscr...@commons.apache.org
For additional commands, e-mail: user-h...@commons.apache.org
Re: XmlStreamReader encoding regexp does not work anymore without version
I fixed this in git master and 2.16.0-SNAPSHOT builds. Please test and report back!

Gary

On Tue, Jan 2, 2024, 11:03 AM Gary Gregory wrote:
> Ah, intersection, I'll look into it.
>
> Gary
Re: XmlStreamReader encoding regexp does not work anymore without version
Ah, intersection, I'll look into it.

Gary

On Tue, Jan 2, 2024, 9:50 AM Andreas Hubold wrote:
> Hi Gary,
>
> right, but it is optional for external entities, see
> https://www.w3.org/TR/xml/#TextEntities
>
> And the examples in https://www.w3.org/TR/xml/#NT-EncodingDecl also
> don't have version attributes, so this might still be a valid use case?
>
> Cheers
> Andreas
Re: XmlStreamReader encoding regexp does not work anymore without version
Andreas,

I just remembered that we have a lenient setting that could be used to access a different regular expression that does not care about correctness.

If we do support this, then the regular expression must be lenient enough, but not so lenient that it can be used as an attack vector for resource consumption, which was a problem in the past, IIRC.

Whether or not it's a good idea to have a new lenient setting, overload the current one, or have one at all, is a different topic.

Gary

On Tue, Jan 2, 2024, 9:42 AM Gary Gregory wrote:
> Hi Andreas,
>
> In an "xml" PI, the "version" is NOT optional, see
> https://www.w3.org/TR/REC-xml/#sec-pi
>
> If we tried to handle all cases of invalid documents, then there would
> be no end to it.
>
> Gary
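One way to keep a lenient encoding lookup from becoming a resource-consumption problem is to bound the amount of input the regular expression ever sees. The following is a minimal sketch under stated assumptions, not the Commons IO implementation: the class name, the `MAX_PROLOG_CHARS` limit, the 64-character cap on the encoding name, and the pattern itself are all hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundedPrologScan {

    // Hypothetical cap on how many leading characters are inspected at all.
    static final int MAX_PROLOG_CHARS = 1024;

    // Hypothetical lenient pattern: it ignores the version attribute entirely.
    // The quoted value is capped at 64 characters and no quantifier is nested,
    // so matching work stays bounded by the size of the window.
    static final Pattern LENIENT_ENCODING = Pattern.compile(
            "encoding\\s*=\\s*(?:\"([^\"]{1,64})\"|'([^']{1,64})')");

    // Returns the declared encoding, or null if none is found inside the window.
    static String findEncoding(String head) {
        if (!head.startsWith("<?xml")) {
            return null;
        }
        // Only look at a fixed-size window, no matter how large the input is.
        String window = head.substring(0, Math.min(head.length(), MAX_PROLOG_CHARS));
        int end = window.indexOf("?>");
        if (end < 0) {
            return null; // declaration not closed within the window
        }
        Matcher m = LENIENT_ENCODING.matcher(window.substring(0, end));
        if (!m.find()) {
            return null;
        }
        return m.group(1) != null ? m.group(1) : m.group(2);
    }

    public static void main(String[] args) {
        System.out.println(findEncoding("<?xml encoding='Cp1047'?><root/>"));
        System.out.println(findEncoding("<?xml version=\"1.0\" encoding=\"UTF-8\"?><root/>"));
        System.out.println(findEncoding("<root/>"));
    }
}
```

The design choice here is to trade completeness for predictability: a declaration that is not closed within the window is simply treated as having no encoding, so an oversized or malformed prolog cannot drive up matching cost.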
Re: XmlStreamReader encoding regexp does not work anymore without version
Hi Gary,

right, but it is optional for external entities, see https://www.w3.org/TR/xml/#TextEntities

And the examples in https://www.w3.org/TR/xml/#NT-EncodingDecl also don't have version attributes, so this might still be a valid use case?

Cheers
Andreas

Gary Gregory wrote on 02.01.24 at 15:42:
> Hi Andreas,
>
> In an "xml" PI, the "version" is NOT optional, see
> https://www.w3.org/TR/REC-xml/#sec-pi
>
> If we tried to handle all cases of invalid documents, then there would
> be no end to it.
>
> Gary
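The distinction drawn above comes from two productions in the XML 1.0 specification: `XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'` for document entities and `TextDecl ::= '<?xml' VersionInfo? EncodingDecl S? '?>'` for external parsed entities. The sketch below transcribes both into regular expressions to show which declarations each one accepts. The class and method names are hypothetical, and this is an illustration of the grammar only, not code from Commons IO.

```java
import java.util.regex.Pattern;

class DeclKindDemo {

    // Building blocks transcribed from the XML 1.0 productions S, Eq,
    // VersionInfo, EncodingDecl and SDDecl.
    private static final String S = "[ \\t\\r\\n]+";
    private static final String EQ = "[ \\t\\r\\n]*=[ \\t\\r\\n]*";
    private static final String VERSION =
            S + "version" + EQ + "(?:'1\\.[0-9]+'|\"1\\.[0-9]+\")";
    private static final String ENCODING =
            S + "encoding" + EQ
            + "(?:'[A-Za-z][A-Za-z0-9._-]*'|\"[A-Za-z][A-Za-z0-9._-]*\")";
    private static final String STANDALONE =
            S + "standalone" + EQ + "(?:'(?:yes|no)'|\"(?:yes|no)\")";
    private static final String END = "[ \\t\\r\\n]*\\?>";

    // XMLDecl: version required, encoding and standalone optional.
    static final Pattern XML_DECL = Pattern.compile(
            "<\\?xml" + VERSION + "(?:" + ENCODING + ")?(?:" + STANDALONE + ")?" + END);

    // TextDecl: version optional, encoding required, no standalone.
    static final Pattern TEXT_DECL = Pattern.compile(
            "<\\?xml" + "(?:" + VERSION + ")?" + ENCODING + END);

    static boolean isXmlDecl(String decl) {
        return XML_DECL.matcher(decl).matches();
    }

    static boolean isTextDecl(String decl) {
        return TEXT_DECL.matcher(decl).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "<?xml encoding='Cp1047'?>",
            "<?xml version='1.0' encoding='Cp1047'?>",
            "<?xml version='1.0'?>",
        };
        for (String s : samples) {
            System.out.println(s + " xmlDecl=" + isXmlDecl(s) + " textDecl=" + isTextDecl(s));
        }
    }
}
```

A declaration with both a version and an encoding satisfies both productions, while one with only an encoding is a valid text declaration but not a valid XML declaration.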
Re: XmlStreamReader encoding regexp does not work anymore without version
Hi Andreas,

In an "xml" PI, the "version" is NOT optional, see https://www.w3.org/TR/REC-xml/#sec-pi

If we tried to handle all cases of invalid documents, then there would be no end to it.

Gary

On Tue, Jan 2, 2024 at 9:36 AM Gary Gregory wrote:
> Ah, you are talking about something different, I am sorry about that.
> Looking...
>
> Gary
Re: XmlStreamReader encoding regexp does not work anymore without version
Ah, you are talking about something different, I am sorry about that. Looking...

Gary

On Tue, Jan 2, 2024 at 9:35 AM Gary Gregory wrote:
> Hello Andreas,
>
> Please try git master or a 2.16.0-SNAPSHOT build
> (https://repository.apache.org/content/repositories/snapshots/commons-io/commons-io/2.16.0-SNAPSHOT)
> I fixed this today as reported in
> https://github.com/apache/commons-io/pull/550
>
> TY!
> Gary
Re: XmlStreamReader encoding regexp does not work anymore without version
Hello Andreas,

Please try git master or a 2.16.0-SNAPSHOT build (https://repository.apache.org/content/repositories/snapshots/commons-io/commons-io/2.16.0-SNAPSHOT).

I fixed this today as reported in https://github.com/apache/commons-io/pull/550

TY!
Gary

On Tue, Jan 2, 2024 at 9:33 AM Andreas Hubold wrote:
> Hi,
>
> the regular expression for the encoding was changed in XmlStreamReader
> between 2.13.0 and 2.15.1.
>
> It now requires a version attribute in the XML declaration and does not
> work anymore with some real world files.
> [...]
XmlStreamReader encoding regexp does not work anymore without version
Hi,

the regular expression for the encoding was changed in XmlStreamReader between 2.13.0 and 2.15.1.

It now requires a version attribute in the XML declaration and no longer works with some real-world files.

For example, the encoding from the following example declaration is respected by 2.13.0, but not by 2.15.1:

[...]

It works if the version is specified:

[...] encoding='Cp1047'?>

Note, however, that https://www.w3.org/TR/xml/#NT-EncodingDecl also mentions examples without a version attribute, at least for entities. It would be good to restore the previous behavior, IMHO.

Cheers,
Andreas
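The reported behavior can be illustrated with plain `java.util.regex`, without depending on Commons IO. This is a minimal sketch: the two patterns are hypothetical stand-ins written for this example and are not the regular expressions used by `XmlStreamReader` in any release. The example declarations did not survive in the archived message, so the two strings used in `main` are assumptions built around the surviving `encoding='Cp1047'?>` fragment.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EncodingDeclDemo {

    // A quoted attribute value, single or double quotes.
    private static final String QUOTED = "(?:\"[^\"]*\"|'[^']*')";

    // The version attribute followed by mandatory whitespace.
    private static final String VERSION = "version\\s*=\\s*" + QUOTED + "\\s+";

    // The encoding attribute; the value lands in group 1 or group 2.
    private static final String ENCODING =
            "encoding\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')";

    // Hypothetical strict pattern: a version attribute must precede the encoding.
    static final Pattern STRICT = Pattern.compile("^<\\?xml\\s+" + VERSION + ENCODING);

    // Hypothetical tolerant pattern: the version attribute is optional.
    static final Pattern TOLERANT =
            Pattern.compile("^<\\?xml\\s+(?:" + VERSION + ")?" + ENCODING);

    // Returns the declared encoding, or null if the pattern does not match.
    static String encodingOf(Pattern pattern, String prolog) {
        Matcher m = pattern.matcher(prolog);
        if (!m.find()) {
            return null;
        }
        return m.group(1) != null ? m.group(1) : m.group(2);
    }

    public static void main(String[] args) {
        // Assumed reconstructions of the two declarations from the report.
        String withoutVersion = "<?xml encoding='Cp1047'?>";
        String withVersion = "<?xml version='1.0' encoding='Cp1047'?>";

        System.out.println("strict,   no version: " + encodingOf(STRICT, withoutVersion));
        System.out.println("strict,   version:    " + encodingOf(STRICT, withVersion));
        System.out.println("tolerant, no version: " + encodingOf(TOLERANT, withoutVersion));
        System.out.println("tolerant, version:    " + encodingOf(TOLERANT, withVersion));
    }
}
```

Only the strict pattern fails on the declaration without a version; making the version group optional is enough to recover the encoding in both cases.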