Re: Working with HTML5 documents
On Thu, Nov 20, 2014 at 1:10 PM, Stefan Behnel wrote:
> Ian Kelly wrote on 20.11.2014 at 20:44:
>> On Thu, Nov 20, 2014 at 12:02 PM, Stefan Behnel wrote:
>>> There's also the E-factory for creating (sub-)trees in a nicely objectish
>>> way:
>>>
>>> http://lxml.de/lxmlhtml.html#creating-html-with-the-e-factory
>>
>> That looks ugly with all those caps and also hard to extend. Notably
>> it seems to be missing any functions to build HTML5 elements, unless
>> those have been added in lxml 3.4.
>
> It's actually trivial to extend, and it's designed for it. The factory
> simply uses "__getattr__()", so you can ask it for any tag name. The
> predefined names in the builder.py module are mainly there to easily detect
> typos on user side.

This is not the case from what I saw in my testing based on the documentation.

    Python 2.7.6 (default, Mar 22 2014, 22:59:56)
    [GCC 4.8.2] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from lxml.html import builder as E
    >>> html = E.HTML(E.HEAD(), E.BODY())
    >>> html = E.HTML(E.HEAD(), E.BODY(E.ARTICLE()))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'module' object has no attribute 'ARTICLE'

> https://github.com/lxml/lxml/blob/master/src/lxml/html/builder.py
>
> If you don't like capital names for constants, just copy the module and
> change the tag names to lower case, or use the blank E-factory if you feel
> like it.

Based on the source file that you linked, I can see that this would
work but is undocumented:

    >>> from lxml.builder import ElementMaker
    >>> import lxml.html
    >>> E = ElementMaker(makeelement=lxml.html.html_parser.makeelement)
    >>> html = E.html(E.head(), E.body(E.article()))
    >>> lxml.html.tostring(html)
    '<html><head></head><body><article></article></body></html>'

--
https://mail.python.org/mailman/listinfo/python-list
Re: Working with HTML5 documents
Ian Kelly wrote on 20.11.2014 at 20:44:
> On Thu, Nov 20, 2014 at 12:02 PM, Stefan Behnel wrote:
>> There's also the E-factory for creating (sub-)trees in a nicely objectish
>> way:
>>
>> http://lxml.de/lxmlhtml.html#creating-html-with-the-e-factory
>
> That looks ugly with all those caps and also hard to extend. Notably
> it seems to be missing any functions to build HTML5 elements, unless
> those have been added in lxml 3.4.

It's actually trivial to extend, and it's designed for it. The factory
simply uses "__getattr__()", so you can ask it for any tag name. The
predefined names in the builder.py module are mainly there to make typos
on the user's side easy to detect.

https://github.com/lxml/lxml/blob/master/src/lxml/html/builder.py

If you don't like capital names for constants, just copy the module and
change the tag names to lower case, or use the blank E-factory if you feel
like it.

Stefan
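A minimal sketch of what "ask it for any tag name" can look like in practice, assuming lxml is installed. It uses a blank ElementMaker bound to the HTML parser's makeelement rather than the predefined upper-case names in lxml.html.builder. The particular tags and the variable names (article, section, page, markup) are only illustrative.

```python
import lxml.html
from lxml.builder import ElementMaker

# A blank E-factory that creates lxml.html elements.
E = ElementMaker(makeelement=lxml.html.html_parser.makeelement)

# Attribute access yields a factory for any tag name, HTML5 ones included.
article = E.article(
    E.header(E.h1("This Is The Title")),
    E.p("This is some text."),
)

# Calling the factory with the tag name as a string does the same job.
section = E("section", article)

page = E.html(E.head(E.title("This Is The Title")), E.body(section))

# The doctype is passed in at serialisation time; it is not part of the tree.
markup = lxml.html.tostring(page, encoding="unicode", doctype="<!DOCTYPE html>")
print(markup)
```

Because nothing checks the tag names here, a misspelt tag is silently created as written, which is the trade-off against the predefined names mentioned above.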
Re: Working with HTML5 documents
On Thu, Nov 20, 2014 at 12:02 PM, Stefan Behnel wrote:
> There's also the E-factory for creating (sub-)trees in a nicely objectish
> way:
>
> http://lxml.de/lxmlhtml.html#creating-html-with-the-e-factory

That looks ugly with all those caps and also hard to extend. Notably
it seems to be missing any functions to build HTML5 elements, unless
those have been added in lxml 3.4.

Working with lxml.html.Element directly seems pretty versatile, though.
Re: Working with HTML5 documents
Tim wrote on 20.11.2014 at 18:31:
> On Thursday, November 20, 2014 12:04:09 PM UTC-5, Denis McMahon wrote:
>>> On Wednesday, November 19, 2014 2:08:27 PM UTC-7, Denis McMahon wrote:
>>>> So what I'm looking for is a method to create an html5 document using
>>>> "dom manipulation", ie:
>>>>
>>>> doc = new htmldocument(doctype="HTML")
>>>> html = new html5element("html")
>>>> doc.appendChild(html)
>>>> head = new html5element("head")
>>>> html.appendChild(head)
>>>> body = new html5element("body")
>>>> html.appendChild(body)
>>>> title = new html5element("title")
>>>> txt = new textnode("This Is The Title")
>>>> title.appendChild(txt)
>>>> head.appendChild(title)
>>>> para = new html5element("p")
>>>> txt = new textnode("This is some text.")
>>>> para.appendChild(txt)
>>>> body.appendChild(para)
>>>>
>>>> print(doc.serialise())
>>>>
>>>> generates:
>>>>
>>>> <!DOCTYPE HTML><html><head><title>This Is The Title</title></head><body><p>This is some text.</p></body></html>
>>>>
>>>> I'm finding various mechanisms to generate the structure from an
>>>> existing piece of html (eg html5lib, beautifulsoup etc) but I can't
>>>> seem to find any mechanism to generate, manipulate and produce html5
>>>> documents using this dom manipulation approach. Where should I be
>>>> looking?
>>
>> Everything there seems to assume I'll be creating a document serially, eg
>> that I won't get to some point in the document and decide that I want to
>> add an element earlier.
>>
>> bs4 and html5lib will parse a document into a tree structure, but they're
>> not so hot on manipulating the tree structure, eg adding and moving nodes.
>>
>> Actually it looks like bs4 is going to be my best bet, although limited
>> it does have most of what I'm looking for. I just need to start by giving
>> it "" to parse.
>
> I believe lxml should work for this. Here's a snippet that I have used to
> create an HTML document:
>
> from lxml import etree
> page = etree.Element('html')
> doc = etree.ElementTree(page)
>
> head = etree.SubElement(page, 'head')
> body = etree.SubElement(page, 'body')
> table = etree.SubElement(body, 'table')
>
> etc etc
>
> with open('mynewfile.html', 'wb') as f:
>     doc.write(f, pretty_print=True, method='html')
>
> (you can leave out the method= option to get xhtml).

There's also the E-factory for creating (sub-)trees in a nicely objectish
way:

http://lxml.de/lxmlhtml.html#creating-html-with-the-e-factory

and the just released lxml 3.4.1 has an "htmlfile" context manager that
allows you to generate HTML incrementally:

http://lxml.de/api.html#incremental-xml-generation

Obviously, you can combine both, so you can create a subtree in memory and
write it into an incrementally built HTML stream. Pretty versatile.

Stefan
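A rough sketch of combining the two approaches, assuming lxml 3.4.1 or later is installed: a small subtree is built in memory and then written into a stream that is generated incrementally with etree.htmlfile. The io.BytesIO buffer stands in for a real file opened in binary mode, and the names para, out and markup are only illustrative.

```python
import io
from lxml import etree

# A subtree built in memory the usual way.
para = etree.Element("p")
para.text = "This is some text."

out = io.BytesIO()  # stand-in for a file opened with mode "wb"
with etree.htmlfile(out, encoding="utf-8") as xf:
    with xf.element("html"):
        with xf.element("head"):
            with xf.element("title"):
                xf.write("This Is The Title")  # plain text content
        with xf.element("body"):
            xf.write(para)  # the prebuilt subtree goes into the stream

markup = out.getvalue().decode("utf-8")
print(markup)
```

The incremental part is strictly serial: once an element's "with" block has closed, it cannot be revisited, so anything that may still change is better kept in an in-memory subtree until it is ready to be written.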
Re: Working with HTML5 documents
On Thursday, November 20, 2014 12:04:09 PM UTC-5, Denis McMahon wrote:
> On Wed, 19 Nov 2014 13:43:17 -0800, Novocastrian_Nomad wrote:
>
> > On Wednesday, November 19, 2014 2:08:27 PM UTC-7, Denis McMahon wrote:
> >> So what I'm looking for is a method to create an html5 document using
> >> "dom manipulation", ie:
> >>
> >> doc = new htmldocument(doctype="HTML")
> >> html = new html5element("html")
> >> doc.appendChild(html)
> >> head = new html5element("head")
> >> html.appendChild(head)
> >> body = new html5element("body")
> >> html.appendChild(body)
> >> title = new html5element("title")
> >> txt = new textnode("This Is The Title")
> >> title.appendChild(txt)
> >> head.appendChild(title)
> >> para = new html5element("p")
> >> txt = new textnode("This is some text.")
> >> para.appendChild(txt)
> >> body.appendChild(para)
> >>
> >> print(doc.serialise())
> >>
> >> generates:
> >>
> >> <!DOCTYPE HTML><html><head><title>This Is The Title</title></head><body><p>This is some text.</p></body></html>
> >>
> >> I'm finding various mechanisms to generate the structure from an
> >> existing piece of html (eg html5lib, beautifulsoup etc) but I can't
> >> seem to find any mechanism to generate, manipulate and produce html5
> >> documents using this dom manipulation approach. Where should I be
> >> looking?
>
> > Use a search engine (Google, DuckDuckGo etc) and search for 'python
> > write html'
>
> Surprise surprise, already tried that, can't find anything that holds the
> document in the sort of tree structure that I want to manipulate it in.
>
> Everything there seems to assume I'll be creating a document serially, eg
> that I won't get to some point in the document and decide that I want to
> add an element earlier.
>
> bs4 and html5lib will parse a document into a tree structure, but they're
> not so hot on manipulating the tree structure, eg adding and moving nodes.
>
> Actually it looks like bs4 is going to be my best bet, although limited
> it does have most of what I'm looking for. I just need to start by giving
> it "" to parse.
>
> --
> Denis McMahon

I believe lxml should work for this. Here's a snippet that I have used to
create an HTML document:

    from lxml import etree
    page = etree.Element('html')
    doc = etree.ElementTree(page)

    head = etree.SubElement(page, 'head')
    body = etree.SubElement(page, 'body')
    table = etree.SubElement(body, 'table')

    etc etc

    with open('mynewfile.html', 'wb') as f:
        doc.write(f, pretty_print=True, method='html')

(you can leave out the method= option to get xhtml).

hope that helps,
--Tim
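Since the concern was adding elements earlier in the document after the fact, here is a small sketch of non-serial tree editing. To keep it runnable without extra packages it uses the standard library's xml.etree.ElementTree as a stand-in; the calls used here (Element, SubElement, insert, remove, append) are also available in lxml.etree. The element names and the hand-prepended doctype string are only illustrative.

```python
import xml.etree.ElementTree as ET  # stand-in; lxml.etree offers the same calls

# Start with the body only.
page = ET.Element("html")
body = ET.SubElement(page, "body")
para = ET.SubElement(body, "p")
para.text = "This is some text."

# Later: add a <head> *before* the already existing <body>.
head = ET.Element("head")
page.insert(0, head)
title = ET.SubElement(head, "title")
title.text = "This Is The Title"

# Move an existing node: wrap the paragraph in a new <article>.
article = ET.SubElement(body, "article")
body.remove(para)      # the stdlib needs an explicit detach before re-attaching
article.append(para)

# The tree has no doctype node here, so one is prepended by hand.
markup = "<!DOCTYPE html>" + ET.tostring(page, encoding="unicode", method="html")
print(markup)
```

The tree stays fully editable until it is serialised, so the order in which elements are created does not have to match their order in the output.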
Re: Working with HTML5 documents
On Wed, 19 Nov 2014 13:43:17 -0800, Novocastrian_Nomad wrote:

> On Wednesday, November 19, 2014 2:08:27 PM UTC-7, Denis McMahon wrote:
>> So what I'm looking for is a method to create an html5 document using
>> "dom manipulation", ie:
>>
>> doc = new htmldocument(doctype="HTML")
>> html = new html5element("html")
>> doc.appendChild(html)
>> head = new html5element("head")
>> html.appendChild(head)
>> body = new html5element("body")
>> html.appendChild(body)
>> title = new html5element("title")
>> txt = new textnode("This Is The Title")
>> title.appendChild(txt)
>> head.appendChild(title)
>> para = new html5element("p")
>> txt = new textnode("This is some text.")
>> para.appendChild(txt)
>> body.appendChild(para)
>>
>> print(doc.serialise())
>>
>> generates:
>>
>> <!DOCTYPE HTML><html><head><title>This Is The Title</title></head><body><p>This is some text.</p></body></html>
>>
>> I'm finding various mechanisms to generate the structure from an
>> existing piece of html (eg html5lib, beautifulsoup etc) but I can't
>> seem to find any mechanism to generate, manipulate and produce html5
>> documents using this dom manipulation approach. Where should I be
>> looking?

> Use a search engine (Google, DuckDuckGo etc) and search for 'python
> write html'

Surprise surprise, already tried that, can't find anything that holds the
document in the sort of tree structure that I want to manipulate it in.

Everything there seems to assume I'll be creating a document serially, eg
that I won't get to some point in the document and decide that I want to
add an element earlier.

bs4 and html5lib will parse a document into a tree structure, but they're
not so hot on manipulating the tree structure, eg adding and moving nodes.

Actually it looks like bs4 is going to be my best bet, although limited
it does have most of what I'm looking for. I just need to start by giving
it "" to parse.

--
Denis McMahon, denismfmcma...@gmail.com
Re: Working with HTML5 documents
On Wednesday, November 19, 2014 2:08:27 PM UTC-7, Denis McMahon wrote:
> So what I'm looking for is a method to create an html5 document using "dom
> manipulation", ie:
>
> doc = new htmldocument(doctype="HTML")
> html = new html5element("html")
> doc.appendChild(html)
> head = new html5element("head")
> html.appendChild(head)
> body = new html5element("body")
> html.appendChild(body)
> title = new html5element("title")
> txt = new textnode("This Is The Title")
> title.appendChild(txt)
> head.appendChild(title)
> para = new html5element("p")
> txt = new textnode("This is some text.")
> para.appendChild(txt)
> body.appendChild(para)
>
> print(doc.serialise())
>
> generates:
>
> <!DOCTYPE HTML><html><head><title>This Is The Title</title></head><body><p>This is some text.</p></body></html>
>
> I'm finding various mechanisms to generate the structure from an existing
> piece of html (eg html5lib, beautifulsoup etc) but I can't seem to find
> any mechanism to generate, manipulate and produce html5 documents using
> this dom manipulation approach. Where should I be looking?
>
> --
> Denis McMahon,

Use a search engine (Google, DuckDuckGo etc) and search for 'python write html'
Working with HTML5 documents
So what I'm looking for is a method to create an html5 document using "dom
manipulation", ie:

    doc = new htmldocument(doctype="HTML")
    html = new html5element("html")
    doc.appendChild(html)
    head = new html5element("head")
    html.appendChild(head)
    body = new html5element("body")
    html.appendChild(body)
    title = new html5element("title")
    txt = new textnode("This Is The Title")
    title.appendChild(txt)
    head.appendChild(title)
    para = new html5element("p")
    txt = new textnode("This is some text.")
    para.appendChild(txt)
    body.appendChild(para)

    print(doc.serialise())

generates:

    <!DOCTYPE HTML><html><head><title>This Is The Title</title></head><body><p>This is some text.</p></body></html>

I'm finding various mechanisms to generate the structure from an existing
piece of html (eg html5lib, beautifulsoup etc) but I can't seem to find
any mechanism to generate, manipulate and produce html5 documents using
this dom manipulation approach. Where should I be looking?

--
Denis McMahon, denismfmcma...@gmail.com