Hi,
I'm trying to use the code in html-parse.c (v1.7) in standalone mode
but am having trouble parsing some pages. For some reason, <img
src=... > tags are recognized but then skipped almost every time they
are encountered. When using the full program with recursive retrieval,
the images are in fact retrieved, so it seems that the parser does work
correctly when not in standalone mode.
It seems that the check at the end of the following code succeeds when
parsing img tag attributes, so the parser jumps to backout_tag:

    /* Establish bounds of attribute name. */
    attr_name_begin = p;            /* <foo bar ...> */
                                    /*      ^        */
    while (NAME_CHAR_P (*p))
      ADVANCE (p);
    attr_name_end = p;              /* <foo bar ...> */
                                    /*         ^     */
    if (attr_name_begin == attr_name_end)
      goto backout_tag;

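To narrow this down, here is a minimal sketch of the same scan pulled out
into a helper that can be tested on its own. Note that name_char_p and
attr_name_length are hypothetical names of my own, and the character set
accepted by name_char_p is only my assumption; the real NAME_CHAR_P macro
in html-parse.c may be defined differently. A return value of 0
corresponds to attr_name_begin == attr_name_end, i.e. the backout case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for NAME_CHAR_P. Assumption: printable ASCII
   other than '=', '>' and '/' may appear in an attribute name. The
   real macro in html-parse.c may accept a different set. */
static int name_char_p(unsigned char c)
{
    return c > 32 && c < 127 && c != '=' && c != '>' && c != '/';
}

/* Scan an attribute name starting at p and return its length.
   A length of 0 is the case where attr_name_begin == attr_name_end,
   which makes the parser back out of the tag. */
static size_t attr_name_length(const char *p)
{
    const char *begin;

    assert(p != NULL);
    begin = p;
    while (name_char_p((unsigned char) *p))
        ++p;
    return (size_t) (p - begin);
}
```

With this stand-in, a pointer sitting on `src="a.png">` gives a non-zero
length, while a pointer sitting on whitespace, `=`, `/` or the end of the
buffer gives 0. So if the real code behaves similarly, the backout would
mean p is not where I expect it to be when the attribute scan starts.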
Can someone shed some light on this?
Thanks.
-- Anees