On 23 Aug 2001, at 13:33, Edward J. Sabol wrote:
> Nathan J. Yoder wrote:
> >> Please fix this soon,
> >>
> >> ***COMMAND***
> >> wget -k http://reality.sgi.com/fxgovers_houst/yama/panels/panelsIntro.html
> >[snip]
> >> 02:30:05 (23.54 KB/s) - `panelsIntro.html' saved [3061/3061]
> >>
> >> Converting panelsIntro.html... zsh: segmentation fault (core dumped)
>
> Ian Abbott replied:
> > I cannot reproduce this failure on my RedHat 7.1 box.
>
> I was able to reproduce this pretty easily on both Irix 6.5.2 and Digital
> Unix 4.0d, using gcc 2.95.2. (I bet Linux's glibc has code to protect against
> fwrite() calls with negative lengths.)
>
> The problem occurs when you have a single tag with multiple attributes that
> specify links that need to be converted. In this case, it's an IMG tag with
> SRC and LOWSRC attributes. The urlpos structure passed to convert_links() is
> a linked list of pointers to where the links are that needed to be converted.
> The problem is that the links are not in positional order. The second
> attribute is in the linked list before the first attribute, causing the
> length of the string to be printed out to be a negative number.
Thanks for tracking that down. I've now found the problem, fixed it and
created a patch (attached) against the current CVS sources.
> Here's a diff (against the current CVS sources) which will prevent the core
> dump, but please note that it does not fix the problem. html-parse.c and
> html-url.c are some dense code, and I'm still wading through it. (Also, it's
> not clear if the linked list is supposed to be in positional order or if
> convert_links() is wrongly assuming that.)
[snipped the diff]
At least that extra code was a convenient place for me to stick a
brreakpoint on in gdb, and also helped me verify that I've nailed the
bug (I checked the converted html file too, of course!).
It's a shame Hrvoje Niksik's not arround at the moment to apply all
these patches to the repository.
Index: src/html-url.c
===================================================================
RCS file: /pack/anoncvs/wget/src/html-url.c,v
retrieving revision 1.10
diff -u -r1.10 html-url.c
--- src/html-url.c 2001/05/27 19:35:02 1.10
+++ src/html-url.c 2001/08/24 15:07:49
@@ -383,7 +383,7 @@
{
case TC_LINK:
{
- int i;
+ int i, id, first;
int size = ARRAY_SIZE (url_tag_attr_map);
for (i = 0; i < size; i++)
if (url_tag_attr_map[i].tagid == tagid)
@@ -391,25 +391,34 @@
/* We've found the index of url_tag_attr_map where the
attributes of our tags begin. Now, look for every one of
them, and handle it. */
- for (; (i < size && url_tag_attr_map[i].tagid == tagid); i++)
+ /* Need to process the attributes in the order they appear in
+ the tag, as this is required if we convert links. */
+ first = i;
+ for (id = 0; id < tag->nattrs; id++)
{
- char *attr_value;
- int id;
- if (closure->dash_p_leaf_HTML
- && (url_tag_attr_map[i].flags & AF_EXTERNAL))
- /* If we're at a -p leaf node, we don't want to retrieve
- links to references we know are external to this document,
- such as <a href=...>. */
- continue;
+ /* This nested loop may seem inefficient (O(n^2)), but it's
+ not, since the number of attributes (n) we loop over is
+ extremely small. In the worst case of IMG with all its
+ possible attributes, n^2 will be only 9. */
+ for (i = first; (i < size && url_tag_attr_map[i].tagid == tagid);
+ i++)
+ {
+ char *attr_value;
+ if (closure->dash_p_leaf_HTML
+ && (url_tag_attr_map[i].flags & AF_EXTERNAL))
+ /* If we're at a -p leaf node, we don't want to retrieve
+ links to references we know are external to this document,
+ such as <a href=...>. */
+ continue;
- /* This find_attr() buried in a loop may seem inefficient
- (O(n^2)), but it's not, since the number of attributes
- (n) we loop over is extremely small. In the worst case
- of IMG with all its possible attributes, n^2 will be
- only 9. */
- attr_value = find_attr (tag, url_tag_attr_map[i].attr_name, &id);
- if (attr_value)
- handle_link (closure, attr_value, tag, id);
+ if (!strcasecmp (tag->attrs[id].name,
+ url_tag_attr_map[i].attr_name))
+ {
+ attr_value = tag->attrs[id].value;
+ if (attr_value)
+ handle_link (closure, attr_value, tag, id);
+ }
+ }
}
}
break;