Re: How to cope with duplicate attributes in XML tags

2021-02-07 Thread Keith N. McKenna
On 2/7/2021 10:22 AM, Arrigo Marchiori wrote:
> Hello all,
> 
> re-replying to Jim's message.
> 
> On Wed, Feb 03, 2021 at 02:25:16PM -0500, Jim Jagielski wrote:
> 
>> Funny that you bring this up... I've been tracking down some bugs and they
>> all seem to be XML related... fastsax->libwriterfilter with occasional cores
>> due to __cxa_call_unexpected.
>>
>> I feel that making AOO more fragile by trying to work around cases where
>> invalid and/or non-compliant XML is encountered is just wrong. We should
>> either ignore the error (catch it) or raise an exception. Invalid data 
>> shouldn't
>> be tolerated. Additionally, trying to be "lenient" is an easy vector for
>> vulnerabilities.
> 
> For the record: the detection of duplicated attributes is done
> internally by the expat library. Our code just receives the error
> message and cannot do anything to recover from it.
> 
> I don't believe it's worth patching expat to allow duplicated
> attributes. I don't know the library well and I fear the
> consequences of tinkering with it.
> 
> But then my question becomes: do we want to offer any data recovery
> tools for corrupted documents? Like ``dumb'' XML parsers that just
> shave away XML errors?
> 
>  1- it could be an external tool, written in a language that is easier
> to code into? (like Python, Perl, Java... whatever)
> 
>  2- or an internal pre-parsing phase? It should not be based on the
> expat library though; do we have any other possibilities among the
> current modules?
> 
>  3- or we leave it to hand-crafting by knowledgeable people on the
> forum, as it is happening now?
> 
> I am looking forward to opinions ... and possibly reviews of PR 122
> please ;-)
> 
> Best regards,
> 
Purely from a user's point of view, I agree with Jim. It should not be
allowed to happen. Asking the user to run an external program, or to
send the file to the forum to be hand edited, is a recipe for disaster
for our user base and from a marketing standpoint.

I could see an external program as a short-term, stop-gap workaround.
However, it should only be that.

Regards
Keith






Re: How to cope with duplicate attributes in XML tags

2021-02-07 Thread Arrigo Marchiori
Hello Peter, and thank you for taking time to reply.

On Sun, Feb 07, 2021 at 04:58:19PM +0100, Peter Kovacs wrote:

> Hello,
> 
> On 07.02.21 16:22, Arrigo Marchiori wrote:
> > Hello all,
> > 
> > re-replying to Jim's message.
> > 
> > On Wed, Feb 03, 2021 at 02:25:16PM -0500, Jim Jagielski wrote:
> > 
> > > Funny that you bring this up... I've been tracking down some bugs and they
> > > all seem to be XML related... fastsax->libwriterfilter with occasional 
> > > cores
> > > due to __cxa_call_unexpected.
> > > 
> > > I feel that making AOO more fragile by trying to work around cases where
> > > invalid and/or non-compliant XML is encountered is just wrong. We should
> > > either ignore the error (catch it) or raise an exception. Invalid data 
> > > shouldn't
> > > be tolerated. Additionally, trying to be "lenient" is an easy vector for
> > > vulnerabilities.
> > For the record: the detection of duplicated attributes is done
> > internally by the expat library. Our code just receives the error
> > message and cannot do anything to recover from it.
> 
> I think it is not an issue of expat itself. It is an issue of how expat
> is set up.
> 
> As far as pure XML goes, you can allow multiple elements of the same name.
> 
> Consider an unordered HTML list as a reference.

I may not have explained myself clearly... I am not talking about
duplicated _elements_ but rather duplicated _attributes_.

To stay with your example, the problem is not having multiple <li>
elements, but rather a single element that carries the same attribute
twice, e.g. <li class="a" class="b">.

[...]
> We could perhaps provide a helper to help the user figure out what has
> happened.

By "helper" do you mean a separate program or a function inside AOO?

[..]
> > I am looking forward to opinions ... and possibly reviews of PR 122
> > please ;-)
> I plan to have a look!

Many thanks!

If we approve it, I will surely need your guidance (as per a
previous thread on this ML) about the next "high priority issue" to
address.

Best regards,
-- 
Arrigo

http://rigo.altervista.org




Re: How to cope with duplicate attributes in XML tags

2021-02-07 Thread Peter Kovacs

Hello,

On 07.02.21 16:22, Arrigo Marchiori wrote:

Hello all,

re-replying to Jim's message.

On Wed, Feb 03, 2021 at 02:25:16PM -0500, Jim Jagielski wrote:


Funny that you bring this up... I've been tracking down some bugs and they
all seem to be XML related... fastsax->libwriterfilter with occasional cores
due to __cxa_call_unexpected.

I feel that making AOO more fragile by trying to work around cases where
invalid and/or non-compliant XML is encountered is just wrong. We should
either ignore the error (catch it) or raise an exception. Invalid data shouldn't
be tolerated. Additionally, trying to be "lenient" is an easy vector for
vulnerabilities.

For the record: the detection of duplicated attributes is done
internally by the expat library. Our code just receives the error
message and cannot do anything to recover from it.


I think it is not an issue of expat itself. It is an issue of how expat
is set up.


As far as pure XML goes, you can allow multiple elements of the same name.

Consider an unordered HTML list as a reference.

I would opt for checking whether we could allow this element to be read
as a duplicate.


The user can then delete the entry they do not like, and thereby fix
the document.


We could perhaps provide a helper to help the user figure out what has
happened.




I don't believe it's worth patching expat to allow duplicated
attributes. I don't know the library well and I fear the
consequences of tinkering with it.

But then my question becomes: do we want to offer any data recovery
tools for corrupted documents? Like ``dumb'' XML parsers that just
shave away XML errors?

  1- it could be an external tool, written in a language that is easier
 to code into? (like Python, Perl, Java... whatever)

  2- or an internal pre-parsing phase? It should not be based on the
 expat library though; do we have any other possibilities among the
 current modules?

  3- or we leave it to hand-crafting by knowledgeable people on the
 forum, as it is happening now?

I am looking forward to opinions ... and possibly reviews of PR 122
please ;-)

I plan to have a look!


Best regards,

--
This is the Way! http://www.apache.org/theapacheway/index.html




Re: How to cope with duplicate attributes in XML tags

2021-02-07 Thread Arrigo Marchiori
Hello all,

re-replying to Jim's message.

On Wed, Feb 03, 2021 at 02:25:16PM -0500, Jim Jagielski wrote:

> Funny that you bring this up... I've been tracking down some bugs and they
> all seem to be XML related... fastsax->libwriterfilter with occasional cores
> due to __cxa_call_unexpected.
> 
> I feel that making AOO more fragile by trying to work around cases where
> invalid and/or non-compliant XML is encountered is just wrong. We should
> either ignore the error (catch it) or raise an exception. Invalid data 
> shouldn't
> be tolerated. Additionally, trying to be "lenient" is an easy vector for
> vulnerabilities.

For the record: the detection of duplicated attributes is done
internally by the expat library. Our code just receives the error
message and cannot do anything to recover from it.
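
For illustration, here is a minimal stand-alone sketch (plain expat, not
our code; the XML snippet is made up) showing that the rejection really
happens inside the library itself:

    // Minimal expat example: parsing an element with a duplicated attribute.
    // Build with: g++ expat_demo.cxx -lexpat
    #include <expat.h>
    #include <cstdio>

    int main()
    {
        // The same attribute appears twice in one start tag.
        const char xml[] = "<doc><p style=\"a\" style=\"b\"/></doc>";

        XML_Parser parser = XML_ParserCreate(NULL);
        if (XML_Parse(parser, xml, (int) sizeof(xml) - 1, 1) == XML_STATUS_ERROR)
        {
            // expat stops with XML_ERROR_DUPLICATE_ATTRIBUTE here; the
            // calling code only gets this error code back -- there is no
            // switch to make the library tolerate the duplicate.
            std::printf("error at line %lu: %s\n",
                        (unsigned long) XML_GetCurrentLineNumber(parser),
                        XML_ErrorString(XML_GetErrorCode(parser)));
        }
        XML_ParserFree(parser);
        return 0;
    }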

I don't believe it's worth patching expat to allow duplicated
attributes. I don't know the library well and I fear the
consequences of tinkering with it.

But then my question becomes: do we want to offer any data recovery
tools for corrupted documents? Like ``dumb'' XML parsers that just
shave away XML errors?

 1- it could be an external tool, written in a language that is easier
to code into? (like Python, Perl, Java... whatever)

 2- or an internal pre-parsing phase? It should not be based on the
expat library though; do we have any other possibilities among the
current modules?

 3- or we leave it to hand-crafting by knowledgeable people on the
forum, as it is happening now?
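
To make option 1 above more concrete, here is a rough sketch of what
such an external ``dumb'' cleaner could look like. It is hypothetical
code, not an existing tool: it simply keeps the first occurrence of
each attribute inside every start tag, and it is not aware of CDATA
sections, comments or namespaces.

    // dedup_attrs.cxx -- hypothetical stand-alone cleaner (not an existing
    // tool): keeps only the first occurrence of each attribute inside every
    // start tag and prints the result to stdout.
    // Build with: g++ -std=c++11 dedup_attrs.cxx -o dedup_attrs
    #include <fstream>
    #include <iostream>
    #include <regex>
    #include <set>
    #include <sstream>
    #include <string>

    // Rewrite one start tag, dropping repeated attributes.
    static std::string dedupTag(const std::string& rTag)
    {
        static const std::regex aAttr(
            "([A-Za-z_:][-A-Za-z0-9_:.]*)\\s*=\\s*(\"[^\"]*\"|'[^']*')");
        std::set<std::string> aSeen;
        std::string sOut;
        std::size_t nLast = 0;
        for (std::sregex_iterator it(rTag.begin(), rTag.end(), aAttr), end;
             it != end; ++it)
        {
            const std::size_t nPos = static_cast<std::size_t>(it->position());
            sOut += rTag.substr(nLast, nPos - nLast);
            if (aSeen.insert((*it)[1].str()).second)
                sOut += it->str();          // first occurrence: keep it
            // any further occurrence of the same name is silently dropped
            nLast = nPos + it->str().length();
        }
        sOut += rTag.substr(nLast);
        return sOut;
    }

    int main(int argc, char** argv)
    {
        if (argc != 2)
        {
            std::cerr << "usage: dedup_attrs file.xml\n";
            return 1;
        }
        std::ifstream aIn(argv[1]);
        std::stringstream aBuf;
        aBuf << aIn.rdbuf();
        const std::string sXml = aBuf.str();

        // Crude tag scan: no awareness of comments, CDATA or processing
        // instructions -- good enough for a best-effort recovery attempt.
        static const std::regex aTag("<[A-Za-z_:][^>]*>");
        std::string sResult;
        std::size_t nLast = 0;
        for (std::sregex_iterator it(sXml.begin(), sXml.end(), aTag), end;
             it != end; ++it)
        {
            const std::size_t nPos = static_cast<std::size_t>(it->position());
            sResult += sXml.substr(nLast, nPos - nLast);
            sResult += dedupTag(it->str());
            nLast = nPos + it->str().length();
        }
        sResult += sXml.substr(nLast);
        std::cout << sResult;
        return 0;
    }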

I am looking forward to opinions ... and possibly reviews of PR 122
please ;-)

Best regards,
-- 
Arrigo

http://rigo.altervista.org




Re: How to cope with duplicate attributes in XML tags

2021-02-04 Thread Arrigo Marchiori
Hello Peter, All,

On Thu, Feb 04, 2021 at 12:00:44PM +0100, Peter Kovacs wrote:

> +1 for Option 3)
> 
[... FTR ...]
> >   3- raise an exception because it is just not acceptable.

I believe it should be a subclass of uno::Exception, but... which one?
There seem to be a lot of Exception subclasses in the UNO framework.
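
For what it is worth, the two candidates that look most natural to me --
just a guess, the choice is exactly what I am asking about -- would be
css::xml::sax::SAXException (since the error is an XML one) or
css::lang::IllegalArgumentException. A sketch, assuming the generated
full constructor (Message, Context, WrappedException):

    // Hypothetical sketch only: how the duplicate check could report the
    // error. Both exception types exist in the UNO API; which one to use
    // is the open question above.
    #include <com/sun/star/uno/Any.hxx>
    #include <com/sun/star/uno/Reference.hxx>
    #include <com/sun/star/xml/sax/SAXException.hpp>
    #include <rtl/ustring.hxx>

    using namespace ::com::sun::star;

    static void reportDuplicateAttribute( const ::rtl::OUString& rName )
    {
        // Candidate A: the SAX-specific exception, since this is an XML error.
        throw xml::sax::SAXException(
            ::rtl::OUString::createFromAscii( "duplicate attribute: " ) + rName,
            uno::Reference< uno::XInterface >(),
            uno::Any() );

        // Candidate B would be lang::IllegalArgumentException instead, on
        // the grounds that the caller passed an argument we cannot accept.
    }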

Thank you in advance,
-- 
Arrigo

http://rigo.altervista.org




Re: How to cope with duplicate attributes in XML tags

2021-02-04 Thread Arrigo Marchiori
Hello Jim,

On Wed, Feb 03, 2021 at 02:25:16PM -0500, Jim Jagielski wrote:

> Funny that you bring this up... I've been tracking down some bugs and they
> all seem to be XML related...

:-)

[...]
> I feel that making AOO more fragile by trying to work around cases where
> invalid and/or non-compliant XML is encountered is just wrong. We should
> either ignore the error (catch it) or raise an exception.

Could you please explain in more detail what you mean by "ignore the
error (catch it)"? I understand the word "catch" for exceptions...

> Invalid data shouldn't be tolerated. Additionally, trying to be
> "lenient" is an easy vector for vulnerabilities.

I agree if we are speaking about the XML code generated by AOO, that
is, during the export of XML data.

If the save operation fails, an exception should be thrown and the
user should be warned, so they can retry saving in another format, or
copy & paste into another document... I am sure they would rather try
many times than silently risk losing their data.

Once the data is on disk, being able to recover a corrupt file would
IMHO be very useful, because users get very upset if their data is
suddenly lost because of a bug, and honestly, so would I.

As long as the errors consist of a missing end tag or a duplicated
attribute, I believe we should be able to handle them with few
security implications (as far as I can imagine -- I am not a security
expert).

In a previous thread on this ML, "High priority issues", Peter pointed
out this bug too: https://bz.apache.org/ooo/show_bug.cgi?id=126927
which can be generalized as "warn the user if there are problems in
the input". I would very much like to have such a system in place, for
example with ``warnings'' recorded during the XML parsing and then
displayed to the user.
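
To make the idea a bit more tangible, a tiny sketch of what I mean
(hypothetical, nothing like this exists in the code yet): the import
could carry a small collector that the parser fills and that the UI
shows once loading has finished.

    // Sketch only: a trivial "import warning" collector, not existing
    // AOO code.
    #include <rtl/ustring.hxx>
    #include <vector>

    // Filled while loading; shown to the user when loading has finished.
    class ImportWarnings
    {
    public:
        void Add( const ::rtl::OUString& rMessage, sal_Int32 nLine )
        {
            maMessages.push_back(
                ::rtl::OUString::createFromAscii( "line " ) +
                ::rtl::OUString::valueOf( nLine ) +
                ::rtl::OUString::createFromAscii( ": " ) + rMessage );
        }
        const std::vector< ::rtl::OUString >& GetAll() const
            { return maMessages; }

    private:
        std::vector< ::rtl::OUString > maMessages;
    };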

Users would be alerted by the unexpected warning messages emitted
during load and save operations, and would come back to us (on the
user forums, mailing lists, Bugzilla, etc.); this could help them (and
us) better address issues, maybe even identify problems before they
cause data loss.

If you really do not like the "lenient" parsing, we could make it an
optional second attempt: if the "strict" parsing fails, the user could
be offered the option of retrying with the lenient algorithm, after
being warned about the possible security implications.
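
Very roughly, and with completely made-up helper names (none of these
functions exist today), the flow I have in mind would be:

    // Sketch only: the three helpers below are hypothetical placeholders,
    // not existing AOO functions.
    #include <com/sun/star/xml/sax/SAXParseException.hpp>
    #include <rtl/ustring.hxx>

    bool loadWithStrictParser( const ::rtl::OUString& rUrl );   // placeholder
    bool loadWithLenientParser( const ::rtl::OUString& rUrl );  // placeholder
    bool askUserToRetryLeniently();                             // placeholder dialog

    bool loadDocument( const ::rtl::OUString& rUrl )
    {
        try
        {
            return loadWithStrictParser( rUrl );     // normal, compliant path
        }
        catch ( const ::com::sun::star::xml::sax::SAXParseException& )
        {
            // Strict parsing failed: offer a lenient second attempt, but
            // only after warning about the possible security implications.
            if ( askUserToRetryLeniently() )
                return loadWithLenientParser( rUrl ); // best-effort recovery
            throw;                                    // user declined: give up
        }
    }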

I hope I have explained myself clearly. I believe that these topics
may have already been discussed in the past; if you want to avoid
re-discussing decisions that are already settled, please send me the
relevant pointers.

Best regards,
-- 
Arrigo

http://rigo.altervista.org




Re: How to cope with duplicate attributes in XML tags

2021-02-04 Thread Peter Kovacs

+1 for Option 3)

On 03.02.21 19:38, Arrigo Marchiori wrote:

Dear List,

bug 128356 [1] is mostly about an XML tag that carries an attribute
_twice_. This is not allowed; the SAX parser raises an exception when
it finds it while loading the file, and the user gets upset because
they cannot open their file any more.

While I am looking for the actual cause of this bug, I think it may be
useful to stop this XML error from happening anywhere. Our API should
complain heavily when requested to do such an illegal action.

I am going to add a ``safety-check'' into the methods of class
SvXMLAttributeList to avoid ``adding'' an attribute that is already in
the list. But what shall these methods do instead, apart from
complaining?

  1- just ignore the request, leaving the current value for the
 attribute;

  2- set the new value, overwriting the current one;

  3- raise an exception because it is just not acceptable.

I personally like option 3, but perhaps someone has a different opinion?
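
Concretely, the check I have in mind would look something like the
following sketch. It is not the actual code: the real AddAttribute()
signature may differ, and HasAttributeByName() / SetValueByName() are
placeholders for whatever lookup and update SvXMLAttributeList really
offers.

    // Sketch only, not the actual code.
    void SvXMLAttributeList::AddAttribute( const ::rtl::OUString& sName,
                                           const ::rtl::OUString& sValue )
    {
        if ( HasAttributeByName( sName ) )          // duplicate detected
        {
            // Option 1: return;                           -> keep the current value
            // Option 2: SetValueByName( sName, sValue );  -> overwrite it
            // Option 3 (my preference): refuse loudly.
            throw ::com::sun::star::xml::sax::SAXException(
                ::rtl::OUString::createFromAscii( "duplicate attribute: " )
                    + sName,
                ::com::sun::star::uno::Reference<
                    ::com::sun::star::uno::XInterface >(),
                ::com::sun::star::uno::Any() );
        }
        // ... existing code that appends the (sName, sValue) pair ...
    }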

In addition to this, when finished with the bug, I will try to make
the SAX parser accept these duplicated attributes, and try to carry on
loading the file. This should help our angry users recover their data,
even if some information may be lost.

Please share your opinion.

References:
  1: https://bz.apache.org/ooo/show_bug.cgi?id=128356

Best regards,

--
This is the Way! http://www.apache.org/theapacheway/index.html




Re: How to cope with duplicate attributes in XML tags

2021-02-03 Thread Jim Jagielski
Funny that you bring this up... I've been tracking down some bugs and they
all seem to be XML related... fastsax->libwriterfilter with occasional cores
due to __cxa_call_unexpected.

I feel that making AOO more fragile by trying to work around cases where
invalid and/or non-compliant XML is encountered is just wrong. We should
either ignore the error (catch it) or raise an exception. Invalid data shouldn't
be tolerated. Additionally, trying to be "lenient" is an easy vector for
vulnerabilities.

My 2c

> On Feb 3, 2021, at 1:38 PM, Arrigo Marchiori  wrote:
> 
> Dear List,
> 
> bug 128356 [1] is mostly about an XML tag that carries an attribute
> _twice_. This is not allowed; the SAX parser raises an exception when
> it finds it while loading the file, and the user gets upset because
> they cannot open their file any more.
> 
> While I am looking for the actual cause of this bug, I think it may be
> useful to stop this XML error from happening anywhere. Our API should
> complain heavily when requested to do such an illegal action.
> 
> I am going to add a ``safety-check'' into the methods of class
> SvXMLAttributeList to avoid ``adding'' an attribute that is already in
> the list. But what shall these methods do instead, apart from
> complaining?
> 
> 1- just ignore the request, leaving the current value for the
>attribute;
> 
> 2- set the new value, overwriting the current one;
> 
> 3- raise an exception because it is just not acceptable.
> 
> I personally like option 3, but perhaps someone has a different opinion?
> 
> In addition to this, when finished with the bug, I will try to make
> the SAX parser accept these duplicated attributes, and try to carry on
> loading the file. This should help our angry users recover their data,
> even if some information may be lost.
> 
> Please share your opinion.
> 
> References:
> 1: https://bz.apache.org/ooo/show_bug.cgi?id=128356
> 
> Best regards,
> -- 
> Arrigo
> 
> http://rigo.altervista.org
> 
> 

