robert <[EMAIL PROTECTED]>:

> I often want to extract the contents of a web table. The formats
> are mostly static, with simple text & numbers in the cells; other
> tags should be stripped off. So a simple & fast approach would be ok.
> 
> Which of the different modules around is easiest to use, stable,
> up to date, and offers iterator access or, better, matrix access
> (without needing callback functions, classes... for basic tasks)?

It takes no more than a handful of lines with lxml.html:

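def htmltable2matrix(table):
    """Convert an HTML table to a matrix (a list of rows of cell texts).

    :param table:  The HTML table element
    :type table:  An lxml.html element
    :returns:  One list per <tr>, holding the text of each <td>/<th>
    """
    matrix = []
    # iter('tr') also finds rows wrapped in <thead>/<tbody>/<tfoot>
    for row in table.iter('tr'):
        # text_content() drops nested tags and keeps only the text
        matrix.append([e.text_content() for e in row.iterchildren('td', 'th')])
    return matrix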
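
For completeness, here is a rough sketch of how such a function might be called. The sample markup and the name SAMPLE_HTML are made up for illustration; in practice the HTML would come from a file or a downloaded page. The function is repeated in compact form so the snippet runs on its own.

```python
import lxml.html


def htmltable2matrix(table):
    """Return the cell texts of an lxml.html table element, row by row."""
    return [[cell.text_content() for cell in row.iterchildren("td", "th")]
            for row in table.iter("tr")]


# Made-up sample page (an assumption for this sketch, not real data).
SAMPLE_HTML = """
<html><body>
<table>
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Apple</td><td><b>1.20</b></td></tr>
  <tr><td>Pear</td><td>0.95</td></tr>
</table>
</body></html>
"""

doc = lxml.html.fromstring(SAMPLE_HTML)
table = doc.xpath("//table")[0]   # first table in the document
matrix = htmltable2matrix(table)

for row in matrix:
    print(row)
```

Note that the <b> tag around 1.20 is stripped and only its text is kept. Picking the table with an XPath expression is just one option; any way of getting at the table element will do.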



-- 
Freedom is always the freedom of dissenters.
                                      (Rosa Luxemburg)