Hi,

I think there has been mail on this issue in the past (especially from Eddy Thilleman), but IMO it hasn't been adequately addressed. Currently there is no facility in wget (1.6) to choose which *HTML* links are followed. All HTML links are followed (subject to the recursion and other rules), and the wildcards in the accept/reject options only control which files are actually downloaded.

I recently needed to download image files pointed to by the children of some top-level pages. Unfortunately, each child page also pointed to all its uncles. Because all the HTML files live in the same directory, directory-based rules did not help. I had to hack recur.c (at lines 316-344) to apply the accept/reject rules to HTML links too. That way I could say

  ~/wget-1.6/src/wget -p -r -nH --cut-dirs=2 --wait=10 --accept=jpg,\[0-9\]\*_poster.html --reject=thumb.jpg,\[a-z\]0_poster.html www.webshots.com/posters/html/art_abstract0_poster.html

[Explanation: the [a-zA-Z]*_poster.html files are the top-level pages. They point to the actual poster pages, [0-9]*_poster.html, which in turn link back to all the top-level pages. I wanted to download only the subtree rooted at art_abstract0_poster.html.]

I suggest adding an option (say --checkhtml) that applies the accept/reject rules to HTML links as well.

Thanks, and thanks for wget.
-Ullas
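For illustration, here is a minimal C sketch of the check the proposed option might perform on each HTML link before following it. It is not the recur.c patch described above, and the names link_file_name and html_link_acceptable are hypothetical. It assumes the accept/reject semantics wget documents for ordinary files: an element containing *, ?, [ or ] is a wildcard pattern matched against the whole file name (here with POSIX fnmatch), and any other element is treated as a file-name suffix. A link is followed only if it matches some accept element (when an accept list is given) and no reject element.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the file-name part of a URL (text after the last '/').
   Simplified: query strings and fragments are not handled. */
const char *link_file_name(const char *url)
{
    const char *slash = strrchr(url, '/');
    return slash ? slash + 1 : url;
}

/* An element is treated as a pattern if it contains any of *, ?, [ or ]. */
static bool has_wildcards(const char *elem)
{
    return strpbrk(elem, "*?[]") != NULL;
}

/* True if s ends with suffix. */
static bool ends_with(const char *s, const char *suffix)
{
    size_t ls = strlen(s), lx = strlen(suffix);
    return lx <= ls && strcmp(s + ls - lx, suffix) == 0;
}

/* True if file matches any element of the NULL-terminated list. */
static bool matches_any(const char *file, const char *const *list)
{
    for (; list != NULL && *list != NULL; list++) {
        bool hit = has_wildcards(*list)
                       ? fnmatch(*list, file, 0) == 0 /* whole-name pattern */
                       : ends_with(file, *list);      /* plain suffix */
        if (hit)
            return true;
    }
    return false;
}

/* Hypothetical --checkhtml test: follow an HTML link only if its file name
   passes the accept list (when one is given) and matches no reject element. */
bool html_link_acceptable(const char *file,
                          const char *const *accepts,
                          const char *const *rejects)
{
    assert(file != NULL);
    if (accepts != NULL && *accepts != NULL && !matches_any(file, accepts))
        return false;
    return !matches_any(file, rejects);
}
```

With the lists from the command above, a poster page with a made-up name such as 123_poster.html would be followed, while a top-level page such as art_abstract0_poster.html would not, because it matches no accept element.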
