On Thu, Apr 30, 2009 at 3:14 AM, Petr Pisar petr.pi...@atlas.cz wrote:
On Wed, Apr 29, 2009 at 06:50:11PM -0500, Jake b wrote:
Instead of creating something like 912.html or index.html, it instead
becomes: viewtopic@t=29807&postdays=0&postorder=asc&start=27330
That's normal, because the server doesn't provide any useful alternative name
via HTTP headers, which wget could otherwise pick up with its
--content-disposition option.
I already know how to get the page number (my Python script converts
27330 to 912 and back), but I'm not sure how to tell wget what the
output HTML file should be named.
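Since I'm scripting this in Python anyway, a minimal sketch of the naming side: wget's -O option writes the page to a filename of your choosing. The helper names here are mine, and the 30-posts-per-page figure is an assumption (it's what makes start=27330 map to page 912):

```python
import subprocess

POSTS_PER_PAGE = 30  # assumption: start=27330 <-> page 912 implies 30 posts/page


def start_to_page(start):
    # offset of the first post on a page -> 1-based page number
    return start // POSTS_PER_PAGE + 1


def page_to_start(page):
    # 1-based page number -> start offset used in the query string
    return (page - 1) * POSTS_PER_PAGE


def wget_command(page):
    url = ('http://forums.sijun.com/viewtopic.php'
           '?t=29807&postdays=0&postorder=asc&start=%d' % page_to_start(page))
    # -O names the output file explicitly instead of the mangled query string
    return ['wget', '-O', '%d.html' % page, url]

# subprocess.run(wget_command(912))  # uncomment to actually download
```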
How do I make wget download all the images on the page? I don't want to
recurse into other hosts, or even the rest of sijun: just download this
page and all the images needed to display it.
That's not an easy task, especially because all the big desktop images are
stored on other servers. I think wget is not powerful enough to do it all
on its own.
Are you saying that because some services show a thumbnail, you have to
click through for the full image? I'm not worried about that, since the
majority are full size in the thread.
Would it be simpler to say something like: download page 912 with recursion
level 1 (or 2?), except for non-image links, so it only allows recursion on
images, i.e. downloading randomguyshost.com/3.png. But the problem is that
wget then does not span any hosts? Could I achieve this by doing the same,
except allowing it to span all hosts, with recursion level 1, recursing
only into image links?
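wget does have options that approximate this: --page-requisites fetches everything needed to render the page, --span-hosts lets those requisites come from other servers, and --accept filters by extension. These are real wget options, but I haven't tried this exact combination on the thread, so treat it as a sketch (driven from Python here):

```python
import subprocess

url = ('http://forums.sijun.com/viewtopic.php'
       '?t=29807&postdays=0&postorder=asc&start=27330')

# --page-requisites: fetch inline images etc. needed to display the page
# --span-hosts:      allow requisites hosted on other servers
# --accept:          keep only these extensions; html is included so wget
#                    doesn't delete the page itself after parsing it
cmd = ['wget', '--page-requisites', '--span-hosts', '--level=1',
       '--accept', 'jpg,jpeg,png,gif,html', url]

# subprocess.run(cmd)  # uncomment to actually download
```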
I propose using other tools to extract the image URLs and then downloading
them with wget. E.g.:
I guess I could use wget to get the HTML and parse it for image tags
manually, but then I don't get the forum thread comments. They aren't
required, but would be nice.
wget -O - 'http://forums.sijun.com/viewtopic.php?t=29807&postdays=0&postorder=asc&start=27330' \
  | grep -o -E 'http://[^"]*\.(jpg|jpeg|png)' | wget -i -
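The same extraction can be done from Python (handy on Windows, without Unix pipes). The regex below mirrors the grep pattern above, and the sample HTML in the usage is made up for illustration:

```python
import re

# roughly the grep pattern above: an http URL ending in a common image extension
IMG_URL = re.compile(r'http://[^"\'\s<>]+\.(?:jpe?g|png)', re.IGNORECASE)


def extract_image_urls(html):
    # return unique image URLs in order of first appearance
    seen, urls = set(), []
    for url in IMG_URL.findall(html):
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

Feed the result to wget -i, or fetch each URL from Python directly.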
Ok, I will have to try it out. (I'm in Windows at the moment, so I can't pipe.)
Actually, I assumed you were using some Unix environment, where a powerful
collection of external tools (grep, seq) and shell scripting abilities
(like pipes and loops) are available.
-- Petr
I'm using Python, and I have a dual boot if needed.
--
Jake