2009/1/13 Girish Redekar girish.rede...@gmail.com:
I'm trying to build a search engine in Python and am stuck at the place where I
parse HTML to get useful text. One should ideally be able to parse the text
(out of HTML tags) along with its position (for phrase searches) and
font-size (to weigh
Thanks Noah - Beautiful Soup does give a tree that can be used - however,
getting from the tree to the result I desire is still a long way.
I'm using lxml (for speed concerns) and it also returns a tree similar to BS.
I have even got as far as parsing the CSS and getting the attributes for
each
2009/1/12 Girish Redekar girish.rede...@gmail.com:
is still tedious as font sizes in HTML/CSS can be expressed in multiple
ways (FONT tags, sizes in pixels, relative sizes, default larger
sizes for headers, etc.). One can get down and code each of these cases, but I
was hoping someone has
2009/1/12 Girish Redekar:
I'm trying to build a search engine in Python and am stuck at the place where I
parse HTML to get useful text. One should ideally be able to parse the text
(out of HTML tags) along with its position (for phrase searches) and
font-size (to weigh words appropriately).
Have
Girish Redekar wrote:
I'm trying to build a search engine in Python and am stuck at the place
where I parse HTML to get useful text. One should ideally be able to
parse the text (out of HTML tags) along with its position (for phrase
searches) and font-size (to weigh words appropriately).
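
For what it's worth, here is a minimal sketch of the kind of extraction described above, using only the standard library's html.parser rather than lxml or Beautiful Soup. It records each word with its position (for phrase searches) and a weight taken from the enclosing tags. The names WeightedTextParser, extract_tokens and TAG_WEIGHTS, and the weight values themselves, are assumptions made up for illustration. It only handles sizes implied by tags such as headers and bold; it does not look at FONT tags or CSS at all, so it covers just one of the cases mentioned in the thread.

```python
from html.parser import HTMLParser
import re

# Hypothetical weights per tag (assumed values, tune for your own ranking).
TAG_WEIGHTS = {"h1": 3.0, "h2": 2.5, "h3": 2.0, "b": 1.5, "strong": 1.5}
# Text inside these tags is not visible content and is skipped.
SKIP_TAGS = {"script", "style"}
# Void elements never get a closing tag, so they are not tracked.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class WeightedTextParser(HTMLParser):
    """Collects (word, position, weight) tuples from visible text."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, outermost first
        self.tokens = []  # (word, position, weight)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray end tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if any(t in SKIP_TAGS for t in self.stack):
            return
        # A word takes the largest weight among its enclosing tags.
        weight = max([TAG_WEIGHTS.get(t, 1.0) for t in self.stack], default=1.0)
        for word in re.findall(r"\w+", data.lower()):
            # Position is the running word index across the whole document.
            self.tokens.append((word, len(self.tokens), weight))


def extract_tokens(html):
    parser = WeightedTextParser()
    parser.feed(html)
    parser.close()
    return parser.tokens
```

Calling extract_tokens() on a page gives a flat list that can be fed into an inverted index. Consecutive positions allow phrase matching, and the weight can scale a term's score. Handling CSS and FONT sizes would mean replacing the TAG_WEIGHTS lookup with whatever per-element size you have already computed from the stylesheet.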