Hi again, Ian and fellow wgeteers!
A debug log will be useful if you can produce one.
Sure, I (or rather wget) can, and did.
It is 60 kB of text. Should I zip it? Attach it?
Also note that if you receive cookies that expire around 2038 with
debugging on, the Windows version of Wget will crash! (This is a
known bug with a known fix, but not yet finalised in CVS.)
Funny you mention that!
I came across a crash caused by a cookie
two days ago. I disabled cookies and it worked.
I should have traced this a bit further.
I just installed 1.7.1, which also works breadth-first.
(I think you mean depth-first.)
*doh* /slaps forehead
Of course, thanks.
Wget 1.7.1 used depth-first retrieval. There are advantages and
disadvantages to both types of retrieval.
I understand; I followed (though did not totally understand)
the discussion back then.
Of course, this is possible.
I had just hoped that combining
-F -i url.html
with domain acceptance would save me a lot of time.
Oh, I think I see what your first complaint is now. I initially
assumed that your local HTML file was being served by a local HTTP
server rather than being fed to the -F -i options. Is your complaint
really that URLs supplied on the command line or via the -i option
are not subjected to the acceptance/rejection rules? That does
indeed seem to be the current behavior, but there is no particular
reason why we couldn't apply the tests to these URLs as well as to
the URLs obtained through recursion.
Well, you are confusing me a bit ;}
Assume a file like
<html>
<body>
<a href="http://www.audistory-nospam.com">1</a>
<a href="http://www.audistory-nospam.de">2</a>
<a href="http://www.audi100-online-nospam.de">3</a>
<a href="http://www.kolaschnik-nospam.de">4</a>
</body>
</html>
and a command line like
wget -nc -x -r -l0 -t10 -H -Dstory.de,audi -o example.log -k -d
-R.gif,.exe,*tn*,*thumb*,*small* -F -i example.html
Result with 1.8.1 and 1.7.1 with -nh:
audistory.com: only index.html
audistory.de: everything
audi100-online: only the first page
kolaschnik.de: only the first page
What I would have liked and expected:
audistory.com: everything
audistory.de: everything
audi100-online: everything
kolaschnik.de: nothing
Independent of the question of how the string audi
should be matched within the URL, I think rejected URLs
should be neither parsed nor retrieved.
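For what it's worth, here is a workaround sketch until wget applies the rules to -i URLs itself. This is my own pre-filtering idea, not a wget feature, and the grep pattern is only a rough stand-in for how -Dstory.de,audi might match:

```shell
# Recreate the example file from above.
cat > example.html <<'EOF'
<html>
<body>
<a href="http://www.audistory-nospam.com">1</a>
<a href="http://www.audistory-nospam.de">2</a>
<a href="http://www.audi100-online-nospam.de">3</a>
<a href="http://www.kolaschnik-nospam.de">4</a>
</body>
</html>
EOF

# Extract the hrefs and keep only URLs matching the acceptance
# strings, so rejected hosts never reach wget at all.
grep -o 'href="[^"]*"' example.html \
  | sed 's/^href="//; s/"$//' \
  | grep -E 'audi|story\.de' > accepted.txt

# Then retrieve with a plain -i (no -F needed any more), e.g.:
# wget -nc -x -r -l0 -t10 -H -Dstory.de,audi -i accepted.txt
```

With the file above, accepted.txt ends up with the three audi* URLs, and kolaschnik.de is dropped entirely before wget ever sees it.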
I hope I could articulate what I wanted to say :)
CU
Jens