Re: [xml] Need help on normalization/canonicalization with namespace prefix rewrite

2018-01-29 Thread Aleksey Sanin
I don't understand what you are trying to do, so I really don't have
a good answer for you.

Aleksey

On 1/29/18 12:33 AM, Mikhail Goloborodko wrote:
> Alexey,
> 
> Thank you!
> I'm new to libxml2. I would appreciate it if you could advise me on how to achieve this.
> Maybe I need a callback in the Reader or Writer?
> 
> Mikhail
> 
> On Sun, Jan 28, 2018 at 10:02 PM, Aleksey Sanin wrote:
> 
> I am not sure what the suggested algorithm to "rewrite namespace
> prefixes" would be. Regardless, this is not part of the C14N spec; it is
> something you will have to do yourself.
> 
> Aleksey
> 
> On 1/28/18 3:19 AM, Mikhail Goloborodko wrote:
> > Hi All,
> >
> > I would appreciate it if somebody could help with how to normalize and
> > canonicalize XML.
> >
> > For example
> > 
> > 
> > 
> >
> > I need to get
> >
> > 
> >
> > And for
> >
> > 
> > 
> >   
> > 
> >
> > I need
> >
> >  attr="4583001999" > attr="value">
> >
> > In other words, I need to remove whitespace and rewrite namespace
> > prefixes.
> > I use:
> > std::string src;
> > xmlChar *canon;
> > xmlDocPtr xDoc = xmlReadMemory(src.data(), src.size(), nullptr, nullptr,
> >                                XML_PARSE_NOBLANKS);
> > int bytes = xmlC14NDocDumpMemory(xDoc, nullptr, 0, nullptr, 0, &canon);
> >
> > It removes whitespace; I need help with the namespace prefix rewrite.
> >
> > Thank you in advance.
> >
> > On Sun, Jan 28, 2018 at 12:41 AM, Mikhail Goloborodko wrote:
> >
> >     Hi,
> >
> >     I need help with how to normalize and canonicalize XML.
> >     For example
> >     
> >      xmlns:ed="urn:ru:ed:v2.0">
> >     
> >
> >     I need to get
> >
> >     
> >
> >     And for
> >
> >     
> >     
> >       
> >     
> >
> >     I need
> >
> >      >     attr="value">
> >
> >     In other words, I need to remove whitespace and rewrite namespace
> >     prefixes.
> >     I use:
> >     std::string src;
> >     xmlChar *canon;
> >     xmlDocPtr xDoc = xmlReadMemory(src.data(), src.size(), nullptr,
> >                                    nullptr, XML_PARSE_NOBLANKS);
> >     int bytes = xmlC14NDocDumpMemory(xDoc, nullptr, 0, nullptr, 0, &canon);
> >
> >     It clearly removes whitespace; I need help with the namespace prefix
> >     rewrite.
> >
> >     Thank you in advance.
> >
> >     Mikhail 
> >
> >
> >
> >
> > ___
> > xml mailing list, project page  http://xmlsoft.org/
> > xml@gnome.org 
> > https://mail.gnome.org/mailman/listinfo/xml
> 
> >
> 
> 


Re: [xml] Incorrect server side include parsing can lead to XSS and other similar issues

2018-01-29 Thread Patrick Toomey
Does anyone have any thoughts on this? Apologies if the original post
didn’t outline the issue concisely. In short, with the current libxml2
behavior, well-formed HTML is parsed in a way that does not round-trip
correctly: new elements are added to the output.

{{{
->  libxml2 git:(bc5a5d65) ✗ cat test/HTML/ssiquote.html
test1
->  libxml2 git:(bc5a5d65) ✗ make testHTML
->  libxml2 git:(bc5a5d65) ✗ ./testHTML test/HTML/ssiquote.html
http://www.w3.org/TR/REC-html40/loose.dtd;>
alert(1)-->">test1
}}}

Running the sample input through the `testHTML` test utility shows that it
does not round-trip correctly. Because libxml2 is used by many libraries,
including ones that do HTML sanitization, this can result in security bugs;
for now we have to maintain a custom patch for libxml2 to avoid them.

On Fri, Jan 12, 2018 at 10:12 AM Patrick Toomey wrote:

> Shoot... I see that the href from the example was stripped when it was
> displayed on
> https://mail.gnome.org/archives/xml/2018-January/msg00010.html. Here is a
> gist that preserves the formatting:
> https://gist.github.com/ptoomey3/4f684c7386229658b39b69756e262050.
>
>
>
> On Fri, Jan 12, 2018 at 10:01 AM Patrick Toomey wrote:
>
>> While triaging a reported cross-site scripting bug, we were analyzing the
>> behavior of our HTML sanitization code and noticed that it was parsing an
>> input in an unexpected way. The sanitization library itself eventually
>> wraps Nokogiri, which is a relatively thin wrapper around libxml2. We
>> reached out to Kurt Seifried and Daniel Veillard to let them know about
>> the observed behavior. There was discussion about whether the observed
>> behavior was expected or not, and eventually it was suggested that we
>> move the discussion to the libxml2 mailing list (the folks on that thread
>> reserved CVE-2016-3709 in case it is decided that this is an issue).
>>
>> In short, https://github.com/GNOME/libxml2/commit/960f0e2 introduced
>> support for not URI-escaping server-side includes. However, this seems to
>> lead to the behavior below.
>>
>> ->  libxml2 git:(bc5a5d65) ✗ cat test/HTML/ssiquote.html
>> > href='!--"scriptalert(1)/script-->'>test1
>> ->  libxml2 git:(bc5a5d65) ✗ make testHTML
>> ->  libxml2 git:(bc5a5d65) ✗ ./testHTML test/HTML/ssiquote.html
>> > http://www.w3.org/TR/REC-html40/loose.dtd;>
>> > href="">test1
>>
>> Notice that a script tag has been added to the parsed output. For
>> comparison, here is a small bit of Java/Xerces code:
>>
>> import java.io.IOException;
>> import java.io.StringReader;
>> import java.io.StringWriter;
>>
>> import javax.xml.parsers.*;
>> import javax.xml.transform.Transformer;
>> import javax.xml.transform.TransformerException;
>> import javax.xml.transform.TransformerFactory;
>> import javax.xml.transform.dom.DOMSource;
>> import javax.xml.transform.stream.StreamResult;
>>
>> import org.w3c.dom.Document;
>> import org.xml.sax.InputSource;
>> import org.xml.sax.SAXException;
>>
>> public class XmlTester {
>>
>>     public static void main(String[] args) throws ParserConfigurationException,
>>             SAXException, IOException, TransformerException {
>>         String text = "> href='!--\"scriptalert(1)/script-->'>test1";
>>         DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
>>         DocumentBuilder builder = factory.newDocumentBuilder();
>>
>>         // text contains the XML content
>>         Document doc = builder.parse(new InputSource(new StringReader(text)));
>>         System.out.println(getStringFromDocument(doc));
>>     }
>>
>>     public static String getStringFromDocument(Document doc)
>>             throws TransformerException {
>>         DOMSource domSource = new DOMSource(doc);
>>         StringWriter writer = new StringWriter();
>>         StreamResult result = new StreamResult(writer);
>>         TransformerFactory tf = TransformerFactory.newInstance();
>>         Transformer transformer = tf.newTransformer();
>>         transformer.transform(domSource, result);
>>         return writer.toString();
>>     }
>> }
>>
>> This Java code, with the same input, results in the following output:
>>
>> 
>> 
>> 
>> test1
>> 
>> 
>> 
>>
>> The attribute contents are quoted/escaped such that they don’t break out
>> of the attribute once it is parsed. This libxml2 behavior doesn’t apply to
>> all attributes: if we change the href to a class attribute, there is no
>> issue. This likely makes sense, since the commit mentioned above
>> specifically concerns skipping URI escaping.
>>
>> ->  libxml2 git:(bc5a5d65) ✗ cat test/HTML/ssiquote.html
>> > class='!--"scriptalert(1)/script-->'>test1
>> ->  libxml2 git:(bc5a5d65) ✗ make testHTML
>> ->  libxml2 git:(bc5a5d65) ✗ ./testHTML test/HTML/ssiquote.html
>> > http://www.w3.org/TR/REC-html40/loose.dtd;>
>> > class=''>test1
>>
>> So, I guess the question is: what do people think? I believe the argument
>> from Daniel was roughly that 

Re: [xml] Need help on normalization/canonicalization with namespace prefix rewrite

2018-01-29 Thread Mikhail Goloborodko
Alexey,

Thank you!
I'm new to libxml2. I would appreciate it if you could advise me on how to achieve this.
Maybe I need a callback in the Reader or Writer?

Mikhail

On Sun, Jan 28, 2018 at 10:02 PM, Aleksey Sanin wrote:

> I am not sure what the suggested algorithm to "rewrite namespace
> prefixes" would be. Regardless, this is not part of the C14N spec; it is
> something you will have to do yourself.
>
> Aleksey
>
> On 1/28/18 3:19 AM, Mikhail Goloborodko wrote:
> > Hi All,
> >
> > I would appreciate it if somebody could help with how to normalize and
> > canonicalize XML.
> >
> > For example
> > 
> > 
> > 
> >
> > I need to get
> >
> > 
> >
> > And for
> >
> > 
> > 
> >   
> > 
> >
> > I need
> >
> >  attr="4583001999" > attr="value">
> >
> > In other words, I need to remove whitespace and rewrite namespace
> > prefixes.
> > I use:
> > std::string src;
> > xmlChar *canon;
> > xmlDocPtr xDoc = xmlReadMemory(src.data(), src.size(), nullptr, nullptr,
> >                                XML_PARSE_NOBLANKS);
> > int bytes = xmlC14NDocDumpMemory(xDoc, nullptr, 0, nullptr, 0, &canon);
> >
> > It removes whitespace; I need help with the namespace prefix rewrite.
> >
> > Thank you in advance.
> >
> > On Sun, Jan 28, 2018 at 12:41 AM, Mikhail Goloborodko wrote:
> >
> > Hi,
> >
> > I need help with how to normalize and canonicalize XML.
> > For example
> > 
> > 
> > 
> >
> > I need to get
> >
> > 
> >
> > And for
> >
> > 
> > 
> >   
> > 
> >
> > I need
> >
> >  > attr="value">
> >
> > In other words, I need to remove whitespace and rewrite namespace
> > prefixes.
> > I use:
> > std::string src;
> > xmlChar *canon;
> > xmlDocPtr xDoc = xmlReadMemory(src.data(), src.size(), nullptr,
> >                                nullptr, XML_PARSE_NOBLANKS);
> > int bytes = xmlC14NDocDumpMemory(xDoc, nullptr, 0, nullptr, 0, &canon);
> >
> > It clearly removes whitespace; I need help with the namespace prefix
> > rewrite.
> >
> > Thank you in advance.
> >
> > Mikhail
> >
> >
> >
> >