Re: [xml] i'm here to contribute

Ladar Levison Mon, 31 Oct 2011 17:48:02 -0700

On Mon, 10/31/2011 5:48 PM, Stefan Sauer wrote:

On 09/18/2011 10:24 PM, Glen Hein wrote:
Hello,
I'm a software developer and I'd like to contribute to Gnome's XML project. I've used the libxml software for a long time and I'd like to give something back.
I just started a voluntary career break, but I'd like to stay active.

I looked over the TODO file, but I'm not sure which item to tackle. Could you 
recommend an item for someone new to the project?

Thanks,
Glen Hein
One thing that would be super cool would be multi-threaded xslt processing (e.g. for chunked document output). Unfortunately again, this is not trivial at all. But any speedup for xslt processing would be great. The docbook xml -> html step in gtk-doc is so slow that most developers still turn api-doc generation off :/
Stefan

My vote is to add a generic XML sanitizer. Presumably it would correct syntax problems, escape special characters, etc. Once the data is syntactically correct, the sanitizer could use a dtd/schema/xslt to add missing elements, or more importantly strip unwanted elements. The obvious application is HTML. A web server could pass untrusted bytes into the sanitizer and get back a result that is both valid and safe. Different levels/rules would be used to achieve different results.
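
To make the idea concrete, here is a rough sketch of the kind of pass I have in mind, written in Python only because it is quick to try out. Nothing here is a libxml2 API: the names Sanitizer, sanitize, ALLOWED_TAGS, ALLOWED_ATTRS and DROP_CONTENT are made up for illustration, and the rule set is a placeholder, not a vetted policy. It repairs unclosed tags, escapes special characters, and strips anything not on the allow list.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical rule set (an assumption for this sketch, not a vetted policy).
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "a"}
ALLOWED_ATTRS = {"a": {"href"}}
# Elements whose content is dropped along with the tag itself.
DROP_CONTENT = {"script", "style"}


class Sanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []   # stack of allowed tags that are still open
        self.skip_depth = 0   # > 0 while inside a DROP_CONTENT element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # the tag is stripped; its text content is still kept
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue
            # Only plain web links survive; javascript: and friends do not.
            if name == "href" and not value.lower().startswith(("http://", "https://")):
                continue
            kept.append(' %s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s%s>" % (tag, "".join(kept)))
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in self.open_tags:
            return  # stray or disallowed end tag
        # Close every tag opened since the matching start tag.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append("</%s>" % top)
            if top == tag:
                break

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(untrusted):
    parser = Sanitizer()
    parser.feed(untrusted)
    parser.close()
    # Repair the syntax: close whatever the input left open.
    while parser.open_tags:
        parser.out.append("</%s>" % parser.open_tags.pop())
    return "".join(parser.out)
```

A daemon would call sanitize() on the untrusted bytes before storing or relaying them. A real implementation inside the library would presumably work on the parsed tree and take its rules from a dtd/schema/xslt rather than hard-coded sets.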

Of course there are existing solutions, but everything I've found so far is written in PHP, Perl, Python, Java, et al. And most are written as standalone command line tools. Launching a command line tool, particularly an executable that runs atop a virtual machine, is very inefficient and difficult to scale. Having the functionality inside libxml2 means daemons that already use the library could easily sanitize their output, and with relatively little overhead protect themselves from a number of potential problems.

A secondary goal would be the standardization of the dtd/schema/xslt rules that are used to sanitize HTML (and other XML formatted content). Right now, every sanitizer uses a different set of rules, and looks for a different collection of exploits. If a new trick is discovered to pass harmful data to clients, presumably by encapsulating it in a way that might be valid, but which gets parsed by some clients in a "vendor specific" way, updating the standardized rules would allow all the sanitizers to adapt without changing code...
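
As a rough illustration of "rules as data", the sketch below keeps the policy in a small JSON document and applies it with unchanged code. This is only a stand-in for a shared dtd/schema/xslt: the JSON layout and the names RULES_V1, RULES_V2, apply_rules and _strip are all hypothetical, and the input is assumed to be well-formed already.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical shared rule file, version 1 (an assumption for this sketch).
RULES_V1 = json.loads(
    '{"allowed_tags": ["doc", "p", "b", "a"], "allowed_attrs": {"a": ["href"]}}'
)
# Version 2: suppose a trick involving <b> was discovered, so it is dropped.
RULES_V2 = json.loads(
    '{"allowed_tags": ["doc", "p", "a"], "allowed_attrs": {"a": ["href"]}}'
)


def _strip(elem, tags, attrs):
    # Remove attributes that the rules do not list for this element.
    for name in list(elem.attrib):
        if name not in attrs.get(elem.tag, set()):
            del elem.attrib[name]
    prev = None  # last child that was kept
    for child in list(elem):
        if child.tag in tags:
            _strip(child, tags, attrs)
            prev = child
            continue
        # Drop the child and its subtree, but keep the text that followed it.
        if child.tail:
            if prev is None:
                elem.text = (elem.text or "") + child.tail
            else:
                prev.tail = (prev.tail or "") + child.tail
        elem.remove(child)


def apply_rules(xml_text, rules):
    # The root element itself is assumed to be allowed.
    # NOTE: ElementTree is not hardened against malicious XML (entity
    # expansion, etc.); a real sanitizer would need a hardened parser.
    tags = set(rules["allowed_tags"])
    attrs = {tag: set(names) for tag, names in rules["allowed_attrs"].items()}
    root = ET.fromstring(xml_text)
    _strip(root, tags, attrs)
    return ET.tostring(root, encoding="unicode")
```

Calling apply_rules(text, RULES_V1) and then apply_rules(text, RULES_V2) on the same input gives different results with no code change, which is the point: only the rule file would need to be redistributed.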


Just my .02.

_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
http://mail.gnome.org/mailman/listinfo/xml
