Re: How to cope with duplicate attributes in XML tags
On 2/7/2021 10:22 AM, Arrigo Marchiori wrote:
> Hello all,
>
> re-replying to Jim's message.
>
> On Wed, Feb 03, 2021 at 02:25:16PM -0500, Jim Jagielski wrote:
>
>> Funny that you bring this up... I've been tracking down some bugs and
>> they all seem to be XML related... fastsax->libwriterfilter with
>> occasional cores due to __cxa_call_unexpected.
>>
>> I feel that making AOO more fragile by trying to work around cases where
>> invalid and/or non-compliant XML is encountered is just wrong. We should
>> either ignore the error (catch it) or raise an exception. Invalid data
>> shouldn't be tolerated. Additionally, trying to be "lenient" is an easy
>> vector for vulnerabilities.
>
> For the record: the detection of duplicated attributes is done
> internally by the expat library. Our code just receives the error
> message and cannot do anything to recover from it.
>
> I don't believe it's worth patching expat to allow duplicated
> attributes. I don't know the library well and I fear the consequences
> of tinkering with it.
>
> But then my question becomes: do we want to offer any data recovery
> tools for corrupted documents? Like ``dumb'' XML parsers that just
> shave away XML errors?
>
> 1- it could be an external tool, written in a language that is easier
> to code in (like Python, Perl, Java... whatever)?
>
> 2- or an internal pre-parsing phase? It should not be based on the
> expat library, though; do we have any other possibilities among the
> current modules?
>
> 3- or we leave it to hand-crafting by knowledgeable people on the
> forum, as is happening now?
>
> I am looking forward to opinions... and possibly reviews of PR 122,
> please ;-)
>
> Best regards,

Purely from a user's point of view I agree with Jim. It should not be allowed to happen. Asking the user to run an external program, or to send the file to the forum to be hand-edited, is a recipe for disaster for our user base and from a marketing standpoint.
I could see an external program as a short-term, stop-gap workaround. However, it should only be that.

Regards,
Keith
Re: How to cope with duplicate attributes in XML tags
Hello Peter, and thank you for taking the time to reply.

On Sun, Feb 07, 2021 at 04:58:19PM +0100, Peter Kovacs wrote:
> Hello,
>
> On 07.02.21 16:22, Arrigo Marchiori wrote:
> > Hello all,
> >
> > re-replying to Jim's message.
> >
> > On Wed, Feb 03, 2021 at 02:25:16PM -0500, Jim Jagielski wrote:
> >
> > > Funny that you bring this up... I've been tracking down some bugs
> > > and they all seem to be XML related... fastsax->libwriterfilter
> > > with occasional cores due to __cxa_call_unexpected.
> > >
> > > I feel that making AOO more fragile by trying to work around cases
> > > where invalid and/or non-compliant XML is encountered is just
> > > wrong. We should either ignore the error (catch it) or raise an
> > > exception. Invalid data shouldn't be tolerated. Additionally,
> > > trying to be "lenient" is an easy vector for vulnerabilities.
> >
> > For the record: the detection of duplicated attributes is done
> > internally by the expat library. Our code just receives the error
> > message and cannot do anything to recover from it.
>
> I think it is not an issue of expat itself. It is an issue of how
> expat is set up.
>
> From the pure XML lore you can allow multiple elements of the same
> name. Consider an unordered HTML list, as a reference.

I may have explained myself badly... I am not talking about duplicated _elements_ but rather duplicated _attributes_. To stay within your example, the problem is not having multiple elements, but rather something like: [...]

> We could provide a helper to help the user figure out what has
> happened, maybe.

By "helper" do you mean a separate program or a function inside AOO?

[..]

> > I am looking forward to opinions... and possibly reviews of PR 122,
> > please ;-)
>
> I plan to have a look!

Many thanks! If we approve it, I will surely need your indication (as per a previous thread on this ML) about the next "high priority issue" to address.
Best regards,
--
Arrigo
http://rigo.altervista.org

-
To unsubscribe, e-mail: dev-unsubscr...@openoffice.apache.org
For additional commands, e-mail: dev-h...@openoffice.apache.org
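[Editor's note: for readers unfamiliar with the failure mode under discussion, expat rejects a duplicated attribute as a fatal well-formedness error and the parse stops there, which is why AOO "just receives the error message". A minimal illustration using Python's bundled expat binding; the sample markup is invented for the demo and is not taken from the bug report:

```python
import xml.parsers.expat

# A tag carrying the same attribute twice -- the general shape of the
# corruption discussed in this thread (made-up example data).
BROKEN = '<text:p text:style-name="P1" text:style-name="P2">hello</text:p>'

parser = xml.parsers.expat.ParserCreate()
try:
    parser.Parse(BROKEN, True)
    print("parsed OK")
except xml.parsers.expat.ExpatError as err:
    # expat stops at the duplicate; the caller only gets the error
    # message and cannot resume parsing from this point.
    print("rejected:", err)
```

The same hard stop happens inside AOO's SAX layer: there is no expat switch to tolerate the duplicate, matching Arrigo's remark above.]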
Re: How to cope with duplicate attributes in XML tags
Hello,

On 07.02.21 16:22, Arrigo Marchiori wrote:
> Hello all,
>
> re-replying to Jim's message.
>
> On Wed, Feb 03, 2021 at 02:25:16PM -0500, Jim Jagielski wrote:
>
> > Funny that you bring this up... I've been tracking down some bugs and
> > they all seem to be XML related... fastsax->libwriterfilter with
> > occasional cores due to __cxa_call_unexpected.
> >
> > I feel that making AOO more fragile by trying to work around cases
> > where invalid and/or non-compliant XML is encountered is just wrong.
> > We should either ignore the error (catch it) or raise an exception.
> > Invalid data shouldn't be tolerated. Additionally, trying to be
> > "lenient" is an easy vector for vulnerabilities.
>
> For the record: the detection of duplicated attributes is done
> internally by the expat library. Our code just receives the error
> message and cannot do anything to recover from it.

I think it is not an issue of expat itself. It is an issue of how expat is set up.

From the pure XML lore you can allow multiple elements of the same name. Consider an unordered HTML list, as a reference.

I would opt for checking whether we could allow this element to be read as a duplicate. The user can then delete the entry they do not like, and thereby fix the document. We could provide a helper to help the user figure out what has happened, maybe.

> I don't believe it's worth patching expat to allow duplicated
> attributes. I don't know the library well and I fear the consequences
> of tinkering with it.
>
> But then my question becomes: do we want to offer any data recovery
> tools for corrupted documents? Like ``dumb'' XML parsers that just
> shave away XML errors?
>
> 1- it could be an external tool, written in a language that is easier
> to code in (like Python, Perl, Java... whatever)?
>
> 2- or an internal pre-parsing phase? It should not be based on the
> expat library, though; do we have any other possibilities among the
> current modules?
>
> 3- or we leave it to hand-crafting by knowledgeable people on the
> forum, as is happening now?
>
> I am looking forward to opinions... and possibly reviews of PR 122,
> please ;-)

I plan to have a look!

Best regards,
--
This is the Way! http://www.apache.org/theapacheway/index.html
Re: How to cope with duplicate attributes in XML tags
Hello all,

re-replying to Jim's message.

On Wed, Feb 03, 2021 at 02:25:16PM -0500, Jim Jagielski wrote:
> Funny that you bring this up... I've been tracking down some bugs and
> they all seem to be XML related... fastsax->libwriterfilter with
> occasional cores due to __cxa_call_unexpected.
>
> I feel that making AOO more fragile by trying to work around cases where
> invalid and/or non-compliant XML is encountered is just wrong. We should
> either ignore the error (catch it) or raise an exception. Invalid data
> shouldn't be tolerated. Additionally, trying to be "lenient" is an easy
> vector for vulnerabilities.

For the record: the detection of duplicated attributes is done internally by the expat library. Our code just receives the error message and cannot do anything to recover from it.

I don't believe it's worth patching expat to allow duplicated attributes. I don't know the library well and I fear the consequences of tinkering with it.

But then my question becomes: do we want to offer any data recovery tools for corrupted documents? Like ``dumb'' XML parsers that just shave away XML errors?

1- it could be an external tool, written in a language that is easier to code in (like Python, Perl, Java... whatever)?

2- or an internal pre-parsing phase? It should not be based on the expat library, though; do we have any other possibilities among the current modules?

3- or we leave it to hand-crafting by knowledgeable people on the forum, as is happening now?

I am looking forward to opinions... and possibly reviews of PR 122, please ;-)

Best regards,
--
Arrigo
http://rigo.altervista.org
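[Editor's note: as a rough sketch of option 1-, an external clean-up tool does not even need a conforming XML parser. The snippet below is an illustrative proof of concept only: the function name and the keep-first-value policy are my own choices, and it deliberately ignores single-quoted attributes, CDATA sections and other edge cases a real recovery tool would have to handle:

```python
import re

# Match a start tag: name, a run of double-quoted attributes, optional "/".
TAG_RE = re.compile(r'<([A-Za-z_][\w.:-]*)((?:\s+[\w.:-]+\s*=\s*"[^"]*")*)\s*(/?)>')
ATTR_RE = re.compile(r'([\w.:-]+)\s*=\s*("[^"]*")')

def dedupe_attributes(xml_text):
    """Rewrite every start tag so each attribute name appears only once,
    keeping the first value (a choice; one could also keep the last)."""
    def fix_tag(match):
        name, attrs, selfclose = match.group(1), match.group(2), match.group(3)
        seen = {}
        for key, value in ATTR_RE.findall(attrs):
            seen.setdefault(key, value)  # first occurrence wins
        rebuilt = "".join(f' {k}={v}' for k, v in seen.items())
        return f'<{name}{rebuilt}{selfclose}>'
    return TAG_RE.sub(fix_tag, xml_text)
```

Run over a broken document before feeding it back to AOO, e.g. `dedupe_attributes('<text:p fo:size="12" fo:size="14">x</text:p>')` drops the second `fo:size`. Closing tags start with `</` and are left untouched by the pattern.]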
Re: How to cope with duplicate attributes in XML tags
Hello Peter, All,

On Thu, Feb 04, 2021 at 12:00:44PM +0100, Peter Kovacs wrote:
> +1 for Option 3)
> [... FTR ...]
> > 3- raise an exception because it is just not acceptable.

I believe it should be a subclass of uno::Exception, but... which one? There seem to be a lot of Exception subclasses in the UNO framework.

Thank you in advance,
--
Arrigo
http://rigo.altervista.org
Re: How to cope with duplicate attributes in XML tags
Hello Jim,

On Wed, Feb 03, 2021 at 02:25:16PM -0500, Jim Jagielski wrote:
> Funny that you bring this up... I've been tracking down some bugs and
> they all seem to be XML related...

:-)

[...]

> I feel that making AOO more fragile by trying to work around cases where
> invalid and/or non-compliant XML is encountered is just wrong. We should
> either ignore the error (catch it) or raise an exception.

Could you please explain better what you mean by "ignore the error (catch it)"? I understand the word "catch" for exceptions...

> Invalid data shouldn't be tolerated. Additionally, trying to be
> "lenient" is an easy vector for vulnerabilities.

I agree if we are speaking about the XML code generated by AOO, I mean during the export of XML data. If the save operation fails, an exception should be thrown and the user should be warned, so they can retry saving in another format, or copy & paste into another document... I am sure they would rather try many times than silently risk losing their data.

Once the data is on disk, being able to recover a corrupt file would IMHO be very useful, because users get very upset if their data is suddenly lost because of a bug, and honestly, so would I. As long as the errors consist of a missing end tag or a duplicated attribute, I believe we should be able to handle them with few security implications (as far as I can imagine -- I am not a security expert).

In a previous thread on this ML, "High priority issues", Peter pointed out this bug too: https://bz.apache.org/ooo/show_bug.cgi?id=126927 which can be generalized as "warn the user if there are problems in the input". I would very much like to have such a system in place, for example with ``warnings'' recorded during XML parsing and eventually displayed.
Users would become worried by the unexpected warning messages emitted during load and save operations, and would come back to us (on user forums, mailing lists, Bugzilla etc.); this could help them (and us) better address issues, maybe even identify problems before they cause data loss.

If you really do not like the "lenient" parsing, we could try to make it an optional second attempt. If the "strict" parsing fails, then the user could be offered the chance to retry with the lenient algorithm, while being warned about the possible security implications.

I hope I have explained myself clearly. I believe that these topics may have already been discussed in the past; if you want to avoid re-discussing any already consolidated decisions, please send me the pointers.

Best regards,
--
Arrigo
http://rigo.altervista.org
Re: How to cope with duplicate attributes in XML tags
+1 for Option 3)

On 03.02.21 19:38, Arrigo Marchiori wrote:
> Dear List,
>
> bug 128356 [1] is mostly about an XML tag that carries an attribute
> _twice_. This is not allowed; the SAX parser raises an exception when
> it finds it while loading the file, and the user gets upset because
> they cannot open their file any more.
>
> While I am looking for the actual cause of this bug, I think it may be
> useful to stop this XML error from happening anywhere. Our API should
> complain heavily when requested to do such an illegal action.
>
> I am going to add a ``safety-check'' into the methods of class
> SvXMLAttributeList to avoid ``adding'' an attribute that is already in
> the list. But what shall these methods do instead, apart from
> complaining?
>
> 1- just ignore the request, leaving the current value for the
> attribute;
>
> 2- set the new value, overwriting the current one;
>
> 3- raise an exception because it is just not acceptable.
>
> I personally like option 3-, but someone may have a different opinion?
>
> In addition to this, when finished with the bug, I will try to make
> the SAX parser accept these duplicated attributes, and try to carry on
> loading the file. This should help our angry users recover their data,
> even if some information may be lost.
>
> Please share your opinion.
>
> References:
> 1: https://bz.apache.org/ooo/show_bug.cgi?id=128356
>
> Best regards,

--
This is the Way! http://www.apache.org/theapacheway/index.html
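[Editor's note: the three options for the proposed safety check can be modelled in a toy class. This Python sketch only mirrors the idea behind option 3- (refuse the duplicate outright); the class and method names are illustrative inventions, not the actual SvXMLAttributeList API:

```python
class AttributeList:
    """Toy stand-in for an XML attribute list with the proposed safety check."""

    def __init__(self):
        self._attrs = {}

    def add(self, name, value):
        # Option 3-: adding an attribute that is already present is an error.
        if name in self._attrs:
            raise ValueError(f"duplicate attribute: {name}")
        self._attrs[name] = value

    def get(self, name):
        return self._attrs[name]
```

Option 1- would replace the `raise` with a plain `return` (keep the old value), and option 2- with an unconditional assignment (overwrite); the check itself is the same in all three.]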
Re: How to cope with duplicate attributes in XML tags
Funny that you bring this up... I've been tracking down some bugs and they all seem to be XML related... fastsax->libwriterfilter with occasional cores due to __cxa_call_unexpected.

I feel that making AOO more fragile by trying to work around cases where invalid and/or non-compliant XML is encountered is just wrong. We should either ignore the error (catch it) or raise an exception. Invalid data shouldn't be tolerated. Additionally, trying to be "lenient" is an easy vector for vulnerabilities.

My 2c

> On Feb 3, 2021, at 1:38 PM, Arrigo Marchiori wrote:
>
> Dear List,
>
> bug 128356 [1] is mostly about an XML tag that carries an attribute
> _twice_. This is not allowed; the SAX parser raises an exception when
> it finds it while loading the file, and the user gets upset because
> they cannot open their file any more.
>
> While I am looking for the actual cause of this bug, I think it may be
> useful to stop this XML error from happening anywhere. Our API should
> complain heavily when requested to do such an illegal action.
>
> I am going to add a ``safety-check'' into the methods of class
> SvXMLAttributeList to avoid ``adding'' an attribute that is already in
> the list. But what shall these methods do instead, apart from
> complaining?
>
> 1- just ignore the request, leaving the current value for the
> attribute;
>
> 2- set the new value, overwriting the current one;
>
> 3- raise an exception because it is just not acceptable.
>
> I personally like option 3-, but someone may have a different opinion?
>
> In addition to this, when finished with the bug, I will try to make
> the SAX parser accept these duplicated attributes, and try to carry on
> loading the file. This should help our angry users recover their data,
> even if some information may be lost.
>
> Please share your opinion.
> References:
> 1: https://bz.apache.org/ooo/show_bug.cgi?id=128356
>
> Best regards,
> --
> Arrigo
> http://rigo.altervista.org