Re: Working with HTML5 documents

2014-11-20 Thread Ian Kelly
On Thu, Nov 20, 2014 at 1:10 PM, Stefan Behnel wrote:
> Ian Kelly wrote on 20.11.2014 at 20:44:
>> On Thu, Nov 20, 2014 at 12:02 PM, Stefan Behnel wrote:
>>> There's also the E-factory for creating (sub-)trees and a nicely objectish 
>>> way:
>>>
>>> http://lxml.de/lxmlhtml.html#creating-html-with-the-e-factory
>>
>> That looks ugly with all those caps and also hard to extend. Notably
>> it seems to be missing any functions to build HTML5 elements, unless
>> those have been added in lxml 3.4.
>
> It's actually trivial to extend, and it's designed for it. The factory
> simply uses "__getattr__()", so you can ask it for any tag name. The
> predefined names in the builder.py module are mainly there to easily detect
> typos on user side.

This is not the case from what I saw in my testing based on the documentation.

Python 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml.html import builder as E
>>> html = E.HTML(E.HEAD(), E.BODY())
>>> html = E.HTML(E.HEAD(), E.BODY(E.ARTICLE()))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'ARTICLE'

> https://github.com/lxml/lxml/blob/master/src/lxml/html/builder.py
>
> If you don't like capital names for constants, just copy the module and
> change the tag names to lower case, or use the blank E-factory if you feel
> like it.

Based on the source file that you linked, I can see that this would
work but is undocumented:

>>> from lxml.builder import ElementMaker
>>> import lxml.html
>>> E = ElementMaker(makeelement=lxml.html.html_parser.makeelement)
>>> html = E.html(E.head(), E.body(E.article()))
>>> lxml.html.tostring(html)
'<html><head></head><body><article></article></body></html>'
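Building on the ElementMaker approach above, lxml.html.tostring() also accepts a doctype argument, so a complete HTML5 page can be produced in one call. This is a minimal sketch that assumes a current lxml on Python 3. The page content is only example data.

```python
from lxml.builder import ElementMaker
import lxml.html

# Same factory as above: any attribute name becomes a tag, including HTML5 ones.
E = ElementMaker(makeelement=lxml.html.html_parser.makeelement)

page = E.html(
    E.head(E.title("This Is The Title")),
    E.body(
        # A dict among the children is applied as attributes of the element.
        E.article({"class": "post"}, E.p("This is some text.")),
    ),
)

# doctype= prepends the given doctype string to the serialized HTML.
html_bytes = lxml.html.tostring(page, doctype="<!DOCTYPE html>")
print(html_bytes.decode("ascii"))
```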
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Working with HTML5 documents

2014-11-20 Thread Stefan Behnel
Ian Kelly wrote on 20.11.2014 at 20:44:
> On Thu, Nov 20, 2014 at 12:02 PM, Stefan Behnel wrote:
>> There's also the E-factory for creating (sub-)trees and a nicely objectish 
>> way:
>>
>> http://lxml.de/lxmlhtml.html#creating-html-with-the-e-factory
> 
> That looks ugly with all those caps and also hard to extend. Notably
> it seems to be missing any functions to build HTML5 elements, unless
> those have been added in lxml 3.4.

It's actually trivial to extend, and it's designed for it. The factory
simply uses "__getattr__()", so you can ask it for any tag name. The
predefined names in the builder.py module are mainly there to easily detect
typos on user side.
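The following toy version shows why a __getattr__-based factory needs no predefined tag list. It is built on the standard library's xml.etree.ElementTree. TagFactory is a hypothetical name and this is not lxml's implementation. It only mimics the idea that looking up any attribute name returns a builder for that tag.

```python
import xml.etree.ElementTree as ET


class TagFactory:
    """Hypothetical mini E-factory: E.<anything>(...) builds an <anything> element."""

    def __getattr__(self, tag):
        # Refuse private/dunder names so introspection doesn't get fake "tags".
        if tag.startswith("_"):
            raise AttributeError(tag)

        def build(*children, **attrib):
            # A trailing underscore allows class_="..." for the reserved word.
            elem = ET.Element(tag, {k.rstrip("_"): v for k, v in attrib.items()})
            for child in children:
                if isinstance(child, str):
                    if len(elem):
                        # Text after a child element belongs in that child's tail.
                        last = elem[-1]
                        last.tail = (last.tail or "") + child
                    else:
                        elem.text = (elem.text or "") + child
                else:
                    elem.append(child)
            return elem

        return build


E = TagFactory()
page = E.html(
    E.head(E.title("Demo")),
    E.body(E.article(E.p("Hello ", E.b("world"), "!"), class_="post")),
)
print(ET.tostring(page, encoding="unicode", method="html"))
```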

https://github.com/lxml/lxml/blob/master/src/lxml/html/builder.py

If you don't like capital names for constants, just copy the module and
change the tag names to lower case, or use the blank E-factory if you feel
like it.

Stefan




Re: Working with HTML5 documents

2014-11-20 Thread Ian Kelly
On Thu, Nov 20, 2014 at 12:02 PM, Stefan Behnel wrote:
> There's also the E-factory for creating (sub-)trees and a nicely objectish 
> way:
>
> http://lxml.de/lxmlhtml.html#creating-html-with-the-e-factory

That looks ugly with all those caps and also hard to extend. Notably
it seems to be missing any functions to build HTML5 elements, unless
those have been added in lxml 3.4.

Working with lxml.html.Element directly seems pretty versatile, though.


Re: Working with HTML5 documents

2014-11-20 Thread Stefan Behnel
Tim wrote on 20.11.2014 at 18:31:
> On Thursday, November 20, 2014 12:04:09 PM UTC-5, Denis McMahon wrote:
>>> On Wednesday, November 19, 2014 2:08:27 PM UTC-7, Denis McMahon wrote:
>>>> So what I'm looking for is a method to create an html5 document using
>>>> "dom manipulation", ie:
>>>>
>>>> doc = new htmldocument(doctype="HTML")
>>>> html = new html5element("html")
>>>> doc.appendChild(html)
>>>> head = new html5element("head")
>>>> html.appendChild(head)
>>>> body = new html5element("body")
>>>> html.appendChild(body)
>>>> title = new html5element("title")
>>>> txt = new textnode("This Is The Title")
>>>> title.appendChild(txt)
>>>> head.appendChild(title)
>>>> para = new html5element("p")
>>>> txt = new textnode("This is some text.")
>>>> para.appendChild(txt)
>>>> body.appendChild(para)
>>>>
>>>> print(doc.serialise())
>>>>
>>>> generates:
>>>>
>>>> <!DOCTYPE html><html><head><title>This Is The Title</title></head><body><p>This is some text.</p></body></html>
>>>>
>>>> I'm finding various mechanisms to generate the structure from an
>>>> existing piece of html (eg html5lib, beautifulsoup etc) but I can't
>>>> seem to find any mechanism to generate, manipulate and produce html5
>>>> documents using this dom manipulation approach. Where should I be
>>>> looking?
>>
>> Everything there seems to assume I'll be creating a document serially, eg 
>> that I won't get to some point in the document and decide that I want to 
>> add an element earlier.
>>
>> bs4 and html5lib will parse a document into a tree structure, but they're 
>> not so hot on manipulating the tree structure, eg adding and moving nodes.
>>
>> Actually it looks like bs4 is going to be my best bet, although limited 
>> it does have most of what I'm looking for. I just need to start by giving 
>> it "" to parse.
> 
> I believe lxml should work for this. Here's a snippet that I have used to 
> create an HTML document:
> 
> from lxml import etree
> page = etree.Element('html')
> doc = etree.ElementTree(page)
> 
> head = etree.SubElement(page, 'head')
> body = etree.SubElement(page, 'body')
> table = etree.SubElement(body, 'table')
> 
> # etc etc
>
> with open('mynewfile.html', 'wb') as f:
>     doc.write(f, pretty_print=True, method='html')
> 
> (you can leave out the method= option to get xhtml).

There's also the E-factory for creating (sub-)trees and a nicely objectish way:

http://lxml.de/lxmlhtml.html#creating-html-with-the-e-factory

and the just released lxml 3.4.1 has an "htmlfile" context manager that
allows you to generate HTML incrementally:

http://lxml.de/api.html#incremental-xml-generation

Obviously, you can combine both, so you can create a subtree in memory and
write it into an incrementally built HTML stream. Pretty versatile.

Stefan




Re: Working with HTML5 documents

2014-11-20 Thread Tim
On Thursday, November 20, 2014 12:04:09 PM UTC-5, Denis McMahon wrote:
> On Wed, 19 Nov 2014 13:43:17 -0800, Novocastrian_Nomad wrote:
> 
> > On Wednesday, November 19, 2014 2:08:27 PM UTC-7, Denis McMahon wrote:
> >> So what I'm looking for is a method to create an html5 document using
> >> "dom manipulation", ie:
> >> 
> >> doc = new htmldocument(doctype="HTML")
> >> html = new html5element("html")
> >> doc.appendChild(html)
> >> head = new html5element("head")
> >> html.appendChild(head)
> >> body = new html5element("body")
> >> html.appendChild(body)
> >> title = new html5element("title")
> >> txt = new textnode("This Is The Title")
> >> title.appendChild(txt)
> >> head.appendChild(title)
> >> para = new html5element("p")
> >> txt = new textnode("This is some text.")
> >> para.appendChild(txt)
> >> body.appendChild(para)
> >> 
> >> print(doc.serialise())
> >> 
> >> generates:
> >> 
> >> <!DOCTYPE html><html><head><title>This Is The Title</title></head><body><p>This is some text.</p></body></html>
> >> 
> >> I'm finding various mechanisms to generate the structure from an
> >> existing piece of html (eg html5lib, beautifulsoup etc) but I can't
> >> seem to find any mechanism to generate, manipulate and produce html5
> >> documents using this dom manipulation approach. Where should I be
> >> looking?
> 
> > Use a search engine (Google, DuckDuckGo etc) and search for 'python
> > write html'
> 
> Surprise surprise, already tried that, can't find anything that holds the 
> document in the sort of tree structure that I want to manipulate it in.
> 
> Everything there seems to assume I'll be creating a document serially, eg 
> that I won't get to some point in the document and decide that I want to 
> add an element earlier.
> 
> bs4 and html5lib will parse a document into a tree structure, but they're 
> not so hot on manipulating the tree structure, eg adding and moving nodes.
> 
> Actually it looks like bs4 is going to be my best bet, although limited 
> it does have most of what I'm looking for. I just need to start by giving 
> it "" to parse.
> 
> -- 
> Denis McMahon

I believe lxml should work for this. Here's a snippet that I have used to 
create an HTML document:

from lxml import etree
page = etree.Element('html')
doc = etree.ElementTree(page)

head = etree.SubElement(page, 'head')
body = etree.SubElement(page, 'body')
table = etree.SubElement(body, 'table')

# etc etc

with open('mynewfile.html', 'wb') as f:
    doc.write(f, pretty_print=True, method='html')

(you can leave out the method= option to get xhtml).
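The snippet above writes no doctype, and an HTML5 page normally starts with one. Below is a sketch of the same pattern using the standard library's xml.etree.ElementTree instead of lxml. It is swapped in here on the assumption that lxml may not be installed, and it lacks lxml's pretty_print option. The doctype is written by hand. build_page and write_html5 are hypothetical helper names.

```python
import io
import xml.etree.ElementTree as ET


def build_page():
    """Hypothetical helper mirroring the snippet above: html > head, body > table."""
    page = ET.Element("html")
    head = ET.SubElement(page, "head")
    ET.SubElement(head, "title").text = "Report"
    body = ET.SubElement(page, "body")
    table = ET.SubElement(body, "table")
    row = ET.SubElement(table, "tr")
    ET.SubElement(row, "td").text = "cell"
    return ET.ElementTree(page)


def write_html5(tree, out):
    """Write an HTML5 doctype, then the tree using HTML serialization rules."""
    # ElementTree.write() has no doctype option, so emit it by hand first.
    out.write("<!DOCTYPE html>\n")
    tree.write(out, encoding="unicode", method="html")


buf = io.StringIO()
write_html5(build_page(), buf)
print(buf.getvalue())
```

To write a real file, open it in text mode: with open('mynewfile.html', 'w', encoding='utf-8') as f: write_html5(build_page(), f).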

hope that helps,
--Tim




Re: Working with HTML5 documents

2014-11-20 Thread Denis McMahon
On Wed, 19 Nov 2014 13:43:17 -0800, Novocastrian_Nomad wrote:

> On Wednesday, November 19, 2014 2:08:27 PM UTC-7, Denis McMahon wrote:
>> So what I'm looking for is a method to create an html5 document using
>> "dom manipulation", ie:
>> 
>> doc = new htmldocument(doctype="HTML")
>> html = new html5element("html")
>> doc.appendChild(html)
>> head = new html5element("head")
>> html.appendChild(head)
>> body = new html5element("body")
>> html.appendChild(body)
>> title = new html5element("title")
>> txt = new textnode("This Is The Title")
>> title.appendChild(txt)
>> head.appendChild(title)
>> para = new html5element("p")
>> txt = new textnode("This is some text.")
>> para.appendChild(txt)
>> body.appendChild(para)
>> 
>> print(doc.serialise())
>> 
>> generates:
>> 
>> <!DOCTYPE html><html><head><title>This Is The Title</title></head><body><p>This is some text.</p></body></html>
>> 
>> I'm finding various mechanisms to generate the structure from an
>> existing piece of html (eg html5lib, beautifulsoup etc) but I can't
>> seem to find any mechanism to generate, manipulate and produce html5
>> documents using this dom manipulation approach. Where should I be
>> looking?

> Use a search engine (Google, DuckDuckGo etc) and search for 'python
> write html'

Surprise surprise, already tried that, can't find anything that holds the 
document in the sort of tree structure that I want to manipulate it in.

Everything there seems to assume I'll be creating a document serially, eg 
that I won't get to some point in the document and decide that I want to 
add an element earlier.

bs4 and html5lib will parse a document into a tree structure, but they're 
not so hot on manipulating the tree structure, eg adding and moving nodes.
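Adding a node earlier in the tree, or moving one, takes a single method call on an element tree. This minimal sketch uses the standard library's xml.etree.ElementTree, swapped in because it needs no install. lxml's etree elements provide the same insert(), remove() and append() methods.

```python
import xml.etree.ElementTree as ET

html = ET.Element("html")
head = ET.SubElement(html, "head")
body = ET.SubElement(html, "body")
ET.SubElement(body, "p").text = "This is some text."

# Later on, decide a heading belongs *before* the existing paragraph:
# insert() takes a child index, so 0 means "first child of <body>".
h1 = ET.Element("h1")
h1.text = "A Heading"
body.insert(0, h1)

# A <title> that ended up in the wrong place...
title = ET.SubElement(body, "title")
title.text = "This Is The Title"
# ...is moved by detaching it and re-attaching it elsewhere. (Stdlib ElementTree
# needs the explicit remove(); lxml moves an element automatically on append().)
body.remove(title)
head.append(title)

print(ET.tostring(html, encoding="unicode", method="html"))
```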

Actually it looks like bs4 is going to be my best bet, although limited 
it does have most of what I'm looking for. I just need to start by giving 
it "" to parse.

-- 
Denis McMahon, denismfmcma...@gmail.com


Re: Working with HTML5 documents

2014-11-19 Thread Novocastrian_Nomad
On Wednesday, November 19, 2014 2:08:27 PM UTC-7, Denis McMahon wrote:
> So what I'm looking for is a method to create an html5 document using "dom 
> manipulation", ie:
> 
> doc = new htmldocument(doctype="HTML")
> html = new html5element("html")
> doc.appendChild(html)
> head = new html5element("head")
> html.appendChild(head)
> body = new html5element("body")
> html.appendChild(body)
> title = new html5element("title")
> txt = new textnode("This Is The Title")
> title.appendChild(txt)
> head.appendChild(title)
> para = new html5element("p")
> txt = new textnode("This is some text.")
> para.appendChild(txt)
> body.appendChild(para)
> 
> print(doc.serialise())
> 
> generates:
> 
> <!DOCTYPE html><html><head><title>This Is The Title</title></head><body><p>This is some text.</p></body></html>
> 
> I'm finding various mechanisms to generate the structure from an existing 
> piece of html (eg html5lib, beautifulsoup etc) but I can't seem to find 
> any mechanism to generate, manipulate and produce html5 documents using 
> this dom manipulation approach. Where should I be looking?
> 
> -- 
> Denis McMahon,

Use a search engine (Google, DuckDuckGo etc) and search for 'python write html'


Working with HTML5 documents

2014-11-19 Thread Denis McMahon
So what I'm looking for is a method to create an html5 document using "dom 
manipulation", ie:

doc = new htmldocument(doctype="HTML")
html = new html5element("html")
doc.appendChild(html)
head = new html5element("head")
html.appendChild(head)
body = new html5element("body")
html.appendChild(body)
title = new html5element("title")
txt = new textnode("This Is The Title")
title.appendChild(txt)
head.appendChild(title)
para = new html5element("p")
txt = new textnode("This is some text.")
para.appendChild(txt)
body.appendChild(para)

print(doc.serialise())

generates:

<!DOCTYPE html><html><head><title>This Is The Title</title></head><body><p>This is some text.</p></body></html>

I'm finding various mechanisms to generate the structure from an existing 
piece of html (eg html5lib, beautifulsoup etc) but I can't seem to find 
any mechanism to generate, manipulate and produce html5 documents using 
this dom manipulation approach. Where should I be looking?
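The standard library's xml.dom.minidom implements the W3C DOM interfaces, so the pseudocode above maps onto it almost line for line (createElement, createTextNode, appendChild). This is a minimal sketch, and serialise() is a hypothetical helper. minidom is an XML DOM, so it follows XML serialization rules rather than HTML5 ones. For example, an empty element comes out as <br/>.

```python
from xml.dom import minidom


def serialise(doc):
    """Hypothetical helper: doctype + root element, without the <?xml ...?>
    declaration that Document.toxml() would prepend."""
    return "".join(node.toxml() for node in doc.childNodes)


impl = minidom.getDOMImplementation()
doctype = impl.createDocumentType("html", None, None)
doc = impl.createDocument(None, "html", doctype)  # also creates the <html> root
html = doc.documentElement

head = doc.createElement("head")
html.appendChild(head)
body = doc.createElement("body")
html.appendChild(body)

title = doc.createElement("title")
title.appendChild(doc.createTextNode("This Is The Title"))
head.appendChild(title)

para = doc.createElement("p")
para.appendChild(doc.createTextNode("This is some text."))
body.appendChild(para)

print(serialise(doc))
```

Because the whole tree stays in memory, nodes can still be inserted earlier (insertBefore) or moved after the fact.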

-- 
Denis McMahon, denismfmcma...@gmail.com