I know! But that directory is intentionally left without an index.html. It should
display the contents of the directory, and I want wget to mirror it
correctly.
A similar situation exists here:
http://chemfan.pl.feedle.com/arch/chemfanftp/
it is left intentionally without index.html so that people could
Do I understand correctly that the mirror at feedle is created by you and
wget?
Yes, because this is in the HTML file itself:
http://znik.wbc.lublin.pl/Mineraly/Ftp/UpLoad/index.html
It does not work in a browser, so why should it work in wget?
It works in the browser:
Wget saves a mirror to your hard disk. Therefore, it cannot rely on an
Apache server generating a directory listing. Thus, it created an index.html as
Apparently you have not tried to open that link,
Which link? The non-working one on your incorrect mirror or the working one
on my correct
The problem was that the link:
http://znik.wbc.lublin.pl/Mineraly/Ftp/UpLoad/
instead of being properly converted to:
http://mineraly.feedle.com/Ftp/UpLoad/
Or, in fact, wget's default:
http://mineraly.feedle.com/Ftp/UpLoad/index.html
was left like this on the main mirror page:
Hi Alan!
As the URL starts with https, it is a secure server.
You will need to log in to this server in order to download stuff.
See the manual for info on how to do that (I have no experience with it).
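Untested, as I said, but judging from the manual something like this
(user and password made up) might do it:
wget --http-user=alan --http-passwd=secret https://www.example.com/file.pdf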
Good luck
Jens (just another user)
I am having trouble getting the files I want using a
Hi!
Yes, I see now, I misread Alan's original post.
I thought he would not even be able to download the single .pdf.
Don't know why, as he clearly said it works getting a single pdf.
Sorry for the confusion!
Jens
Tony Lewis [EMAIL PROTECTED] writes:
PS) Jens was mistaken when he said
Hi Jerry!
AFAIK, RegExp for (HTML?) file rejection was requested a few times, but is
not implemented at the moment.
CU
Jens (just another user)
The -R option is not working in wget 1.9.1 for anything but
specifically-hardcoded filenames.
file[Nn]ames such as [Tt]hese are simply
Hallo!
I don't speak French (or hardly any at all)...
C:\wget>wget --proxy=on -x -r -l 2 -k -x -l
imit-rate=50k --tries=45 --directory-prefix=AsptDD
I think it should be:
C:\wget>wget --proxy=on -x -r -l 2 -k -x --limit-rate=50k --tries=45
--directory-prefix=AsptDD
on a single command line.
Hi Levander!
I am not an expert by any means, just another user,
but what does the -E option do for you?
-E = --html-extension
Apache. Could wget, for URLs that end in slashes, read the
Content-Type header, and if it's text/xml, could wget create index.xml
inside the directory wget
Hi Jorge!
Current wget versions do not support large files > 2 GB.
However, the CVS version does and the fix will be introduced
to the normal wget source.
Jens
(just another user)
When downloading a file of 2 GB or more, the counter goes crazy; it
should probably use a long instead of an int.
Hi Vince!
I did give -X*backup a try, and
it didn't work for me either. :(
Does the -Xdir work for you at all?
If not, there might be a problem with MacOS.
I hope one of the more knowledgeable people here
can help you!
However, I would like to confirm something dumb - will wget fetch these
Hi LucMa!
I have found a command
to auto-skip a file if the file on my PC has the same name as the file
on the ftp
-nc I guess.
but now I need a command to overwrite the file on my PC if it is
smaller than the file on the ftp.
-e robots=off -N
-N should do the trick.
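For example (untested, URL made up):
wget -N ftp://ftp.example.com/pub/file.zip
-N re-downloads the file whenever the remote copy is newer or the sizes differ.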
Hi Vince!
tip or two with regards to using -X?
I'll try!
wget -r --exclude-directories='*.backup*' --no-parent \
http://example.com/dir/stuff/
Well, I am using wget under Windows and there, you
have to use exp, not 'exp', to make it work. The *x* works as expected.
I could not test
Hi Vince!
So, so far these don't work for me:
--exclude-directories='*.backup*'
--exclude-directories=*.backup*
--exclude-directories=*\.backup*
Would -X*backup be OK for you?
If yes, give it a try.
If not, I think you'd need the correct escaping for the .,
but I have no idea how to do
Hi Mike!
Strange!
I suspect that you have some kind of typo in your test.txt
If you cannot spot one, try
wget -d -o logi.txt -i test.txt
as a command line and send the debug output.
Good luck
Jens (just another user)
a) I've verified that they both exist
b) All of the URLs are purely HTTP.
Hi Jon!
Yes, I tried using the 'command prompt' (thru XP) and it replied:
'wget' is not recognized as an internal or external command, operable
program or batch file
Did you cd to the directory wget was unpacked to?
If not, you need to either cd there or add the wget directory to your path
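For example, assuming wget was unpacked to C:\wget:
cd C:\wget
wget --version
or, to add it to the path for the current session:
set PATH=%PATH%;C:\wget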
Hi Jon!
and added the text "wget http://xoomer.virgilio.it/hherold/index.html"
in the file and saved it. I then double-clicked on it and nothing
happened that I could see.
Well, there now should be a file called index.html in your wget directory!
Now replace the text in your wget.bat with
Hi Wgeteers!
I understand that -C off as short for --cache=off
was dropped, right?
However, the wget.html that comes with Herold's
Windows binary mentions only
--no-cache
and the wgetrc option
cache = on/off
I just tried
1.9+cvs-dev-200404081407 unstable development version
and
--cache=off
Hi Colin!
I am just another user and rarely use timeout.
I hope I am not causing more trouble than I'm solving.
I'm attempting to use wget's --timeout flag to limit the time spent
downloading a file. However, wget seems to ignore this setting. For
example, if I set --timeout=2, the download
Hi Gerriet!
Only three images, which were referenced in styles.css, were missing.
Yes, wget does not parse css or javascript.
I thought that the -p option causes Wget to download all the files
that are necessary to properly display a given HTML page. This includes
such things as inlined
dynsrc is Microsoft DHTML for IE, if I am not mistaken.
As wget is -thankfully- not MS IE, it fails.
I just did a quick google and it seems that the use of
dynsrc is not recommended anyway.
What you can do is to download
http://www.wideopenwest.com/~nkuzmenko7225/Collision.mpg
Jens
(and
Hi Helmut!
I suspect there is a robots.txt that says "index, no follow"
Try
wget -nc -r -l0 -p -np -erobots=off
http://www.vatican.va/archive/DEU0035/_FA.HTM
it works for me.
-l0 says: infinite recursion depth
-p means page requisites (not really necessary)
-erobots=off orders wget to ignore any
Hi Phil!
Without more info (wget's verbose or even debug output, full command
line,...) I find it hard to tell what is happening.
However, I have had very good success with wget and google.
So, some hints:
1. protect the google URL by enclosing it in
2. remember to span (and allow only certain)
Hi François!
Well, it seems to work for me. Here's how:
Open the frame in another window (works in Mozilla easily),
then you'll see the
URL:
Hi all!
François just told me that it works. :)
I thought that maybe I should add why it does ;)
The original website sits on www.zurich-airport.com,
the info frame however is loaded from
http://www.uniqueairport.com
As wget by default only downloads pages from
the same server (which makes
Hi Ben!
Not a bug as far as I can see.
Use -A to accept only certain files.
Furthermore, the pdf and ppt files are located across various servers,
you need to allow wget to parse other servers than the original one by -H
and then restrict it to only certain ones by -D.
wget -nc -x -r -l2 -p
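Putting it together, something like this (untested, domains made up) should be close:
wget -r -l2 -H -Dexample.com,files.example.com -A pdf,ppt http://example.com/start.html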
Hi Kelvin!
I must admit that I am a bit puzzled.
I am trying to mirror a web site that has many
hierarchical levels.
I am using the command
wget -m -k $site
which allows me to view the site fine.
However, I wish the mirror to make a directory
structure that also mimics the website
Hi Hrvoje!
In other words, save a copy of wget.texi, make the change, and send the
output of `diff -u wget.texi.orig wget.texi'. That's it.
Uhm, ok.
I found diff for windows among other GNU utilities at
http://unxutils.sourceforge.net/
if someone is interested.
distribution. See
You're close. You forgot the `-u' option to diff (very important),
and you snipped the beginning of the `patch' output (also important).
Ok, I forgot the -u switch which was stupid as I actually read
the command line in the patches file :(
But concerning the snipping I just did
diff
[...]
Cygwin considers `c:\Documents and Settings\USERNAME' to be the
home directory. I wonder if that is reachable through registry...
Does anyone have an idea what we should consider the home dir under
Windows, and how to find it?
Doesn't this depend on each user's personal
Hi Hrvoje!
PS: One note to the manual editor(s?): The -e switch could be
(briefly?) mentioned also at the wgetrc commands paragraph. I
think it would make sense to mention it there again without
cluttering the manual too much. Currently it is only mentioned in
Basic Startup Options (and
Hi Dan,
I must admit that I don't fully understand your question.
-nc
means no clobber, that means that files that already exist
locally are not downloaded again, independent of their age or size or
whatever.
-N
means that only newer files are downloaded (or if the size differs).
So these
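A quick illustration (URL made up):
wget -nc http://example.com/file.zip
never overwrites an existing local file, while
wget -N http://example.com/file.zip
overwrites it if the remote file is newer (or the size differs).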
use
robots = on/off in your wgetrc
or
wget -e robots=on/off URL in your command line
Jens
PS: One note to the manual editor(s?):
The -e switch could be (briefly?) mentioned
also at the wgetrc commands paragraph.
I think it would make sense to mention it there again
without cluttering the
Hi Heiko!
Until now, I linked to your main page.
Would you mind if people short-cut this?
Linking to the directory is bad since people would download
Sorry, I meant linking directly to the latest zip.
However, I personally prefer to read what the provider
(in this case you) has to say
Hi Ron!
If I understand you correctly, you could probably use the
-A acclist
--accept acclist
accept = acclist
option.
So, probably (depending on your site), the syntax should be something like:
wget -r -A *.pdf URL
wget -r -A *.pdf -np URL
or, if you have to recurse through multiple html
Hello Heiko!
I added a wget-complete-stable.zip, if you want to link to a fixed url
use
that, I'll update it whenever needed. Currently it is the same archive
as the wget-1.9.1b-complete.zip.
Great! Thank you very much, Heiko.
I think I'll use it on my wgetgui page as well! :)
But
Note:
Mail redirected from bug to normal wget list.
H> For getting Wget you might want to link directly to
H> ftp://ftp.sunsite.dk/projects/wget/windows/wget-1.9.1b-complete.zip,
OK, but too bad there's no stable second link .../latest.zip so I
don't have to update my web page to follow the
Hi Tommy!
Does this option, first shown in 1.9.1 (I think) help you:
--restrict-file-names=mode
It controls file-name escaping.
I'll mail the complete extract from the manual to your private mail address.
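In short, something like this (untested) should escape characters such as ?
that Windows does not allow in file names:
wget --restrict-file-names=windows http://example.com/page?foo=bar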
You can download the current wget version from
http://www.sunsite.dk/wget/
CU
Jens
Hi!
Do you propose that squashing newlines would break legitimate uses of
unescaped newlines in links?
I personally think that this is the main question.
If it doesn't break other things, implement squashing newlines
as the default behaviour.
Or are you arguing on principle that
such
Hi Jing-Shin!
Thanks for the pointers. Where can I get a version that supports
the --post-data option? My newest version is 1.8.2, but it doesn't
have this option. -JS
Current version is 1.9.1.
The wget site lists download options on
http://wget.sunsite.dk/#downloading
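Once you have 1.9.1, the usage should be something like (untested, form fields made up):
wget --post-data="user=foo&pass=bar" http://example.com/login.cgi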
Good luck
Jens
--
Hi Suhas!
Well, I am by no means an expert, but I think that wget
closes the connection after the first retrieval.
The SSL server realizes this and decides that wget has no right to log in
for the second retrieval, even though the cookie is there.
I think that is a correct behaviour for a
Hi Hrvoje!
retrieval, even though the cookie is there. I think that is a
correct behaviour for a secure server, isn't it?
Why would it be correct?
Sorry, I seem to have been misled by my own (limited) experience:
From the few secure sites I use, most will not let you
log in again after
Hi Suhas!
I am trying to use wget for Windows and get this message: "The ordinal 508
could not be located in the dynamic link library LIBEAY32.dll."
You are very probably using the wrong version of the SSL files.
Take a look at
http://xoomer.virgilio.it/hherold/
Herold has nicely rearranged the
Hi Sergey!
-nc does not only apply to .htm(l) files.
All files are considered.
At least in all wget versions I know of.
I cannot comment on your suggestion to restrict -nc to a
user-specified list of file types.
I personally don't need it, but I could imagine certain situations
where this
Hi Stacee,
a quick cut'n'paste into google revealed the following page:
http://curl.haxx.se/mail/archive-2001-06/0017.html
Hope that helps
Jens
Stacee Kinney wrote:
Hello,
I installed Wget.exe on a Windows 2000 system and have set up Wget.exe
to run a maintenance file on an hourly
Hi Karl!
From my POV, the current set-up is the best solution.
Of course, I am also no developer, but an avid user.
Sometimes you just don't know the structure of the website
in advance, so using -m as a trouble-free no-brainer
will get you the complete site neatly done with timestamps.
BTW,
Hi Pi!
Copied straight from the wget.hlp:
-i file
--input-file=file
Read URLs from file, in which case no URLs need to be on the command
line. If there are URLs both on the command line and in an input file,
those on the command lines will be the first ones to be retrieved.
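So, for example (file name made up):
wget -i urls.txt
where urls.txt simply lists one URL per line.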
Hi Dominic!
Since wget 1.8, the following should be the case:
** When in page-requisites (-p) mode, no-parent (-np) is ignored when
retrieving for inline images, stylesheets, and other documents needed
to display the page.
(Taken from the included news file of wget
Hi Jakub!
But I get the same files as when running this command without the
user-agent string.
What is wrong with the files you get?
Do you not get all the files?
Many servers (sites) do not care which
user-agent accesses them. So the files will not differ.
If you know that you don't
Hi Chris!
Using the -k switch (convert local files to relative links)
should do what you want.
CU
Jens
Christopher Stone wrote:
Hi.
I am new to wget, and although it doesn't seem too
difficult, I am unable to get the desired results that
I am looking for.
I currently have a web
Hi!
With wget 1.9-beta, wget will download .exe files
although they should be rejected (-r -R.exe).
After the download, wget removes the local file.
I understand that html files are downloaded even if -R.html,.htm
is specified as the links that may be included in them
have to be parsed.
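For reference, my command is of this form (URL made up):
wget -r -R.exe http://example.com/
and the .exe files are still fetched first (then deleted locally).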
Hi guys!
Could someone please explain to me how to use
-X (exclude_directories; --exclude)
correctly on Windows machines?
I tried
wget -X/html -x -k -r -l0 http://home.arcor.de/???/versuch.html
wget -Xhtml -x -k -r -l0 http://home.arcor.de/???/versuch.html
wget -X html -x -k -r -l0
Hi Pike and the list!
or your indexing mech might loop on it, or crash the server. who knows.
I have yet to find a site which forces wGet into a loop as you said.
I have a few. And I have a few java servers on linux that really hog the
machine when requested. They're up for testing.
Ok,
Hi Joonas!
There was a lengthy discussion about this topic a few months ago.
I am pretty sure (= I hope) that no one wants to revamp this (again).
I personally think that if people start regarding this as
a bug wget is damn close to absolute perfection.
(Yes, I know, perfection is per
Hi!
Why not just put robots=off in your .wgetrc?
hey hey
the robots.txt didn't just appear in the website; someone's
put it there and thought about it. what's in there has a good reason.
Well, from my own experience, the #1 reason is that webmasters
do not want webgrabbers of any kind
Hi Dale!
Do I have to do 4 separate logins, passing my username/password each time?
If not, how do I list the 4 separate directories I need to pull files from
without performing 4 logins?
you should be able to put the four directory URLs into a .txt file and then
use this txt-file with -i filename.txt
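Untested, and with made-up names, but filename.txt could simply contain:
ftp://user:password@ftp.example.com/dir1/
ftp://user:password@ftp.example.com/dir2/
ftp://user:password@ftp.example.com/dir3/
ftp://user:password@ftp.example.com/dir4/
followed by:
wget -r -np -i filename.txt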
Hi Frederic!
I'd like to know if there is a simple way to 'mirror' only the images
from a gallery (i.e. without thumbnails).
[...]
I won't address the options you suggested, because I think they should
be evaluated by a developer/coder.
However, as I often download galleries (and have some
Hi Brix!
It also seems these options are incompatible:
--continue with --recursive
[...]
JR How should wget decide if it needs to re-get or continue the file?
[...]
Brix:
Not wanting to repeat my post from a few days ago (but doing so nevertheless) the
one way
without checking all files
Hi Ian!
This is amazingly stupid.
It seems to make more sense if you subtract one from the number of
periods.
That was what I thought, too.
Could you assume that all two-letter TLDs are country-code TLDs and
require one more period than other TLDs (which are presumably at
least three
Hi Ian!
wgetrc works fine under windows (always has)
however, .wgetrc is not possible, but
maybe '.' does mean the root dir under Unix?
The code does different stuff for Windows. Instead of looking for
'.wgetrc' in the user's home directory, it looks for a file called
'wget.ini' in the
Hi!
Just to be complete, thanks to Hrvoje's tip,
I was able to find
-e command
--execute command
Execute command as if it were a part of .wgetrc (see Startup File.).
A command thus invoked will be executed after the
commands in .wgetrc, thus taking precedence over them.
I always wondered
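For example (untested):
wget -e robots=off -r http://example.com/
behaves as if robots = off had been set in your .wgetrc.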
Hi!
I recently managed to get my big machine online using a two PC
(Windows boxes) LAN.
A P-I is the server, running both ZoneAlarm and Jana under Win98.
The first one a firewall, the second one a proxy programme.
On my client, an Athlon 1800+ with Windows 2000
I want to work with wget and
Hi Gérard!
I think you should have a look at the -p option.
It stands for page requisites and should do exactly what you want.
If I am not mistaken, -p was introduced in wget 1.8
and improved for 1.8.1 (the current version).
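So, untested, something like
wget -p -k http://example.com/page.html
should grab the page plus its inlined images etc.
(-k converts the links for local viewing).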
CU
Jens
I'd like to download a html file with its embedded
Hi Noèl!
-nh
and
-nH
are totally different.
from wget 1.7.1 (I think the last version to offer both):
`-nh'
`--no-host-lookup'
Disable the time-consuming DNS lookup of almost all hosts (*note
Host Checking::).
`-nH'
`--no-host-directories'
Disable generation of host-prefixed
Hallo Markus!
This is not a bug (I reckon) and should therefore have been sent to
the normal wget list.
Using both wget 1.7.1 and 1.8.1 on Windows the file is
downloaded with
wget -d -U "Mozilla/5.0 (compatible; Konqueror/2.2.1; Linux)" -r
Hi again, Ian and fellow wgeteers!
A debug log will be useful if you can produce one.
Sure I (or wget) can and did.
It is 60kB of text. Zipping? Attaching?
Also note that if you receive cookies that expire around 2038 with
debugging on, the Windows version of Wget will crash! (This is a
known
Hi List!
As a non-wget-programmer I also think that this
option may be very useful.
I'd be happy to see it in wget soon :)
Just thought to drop in some positive feedback :)
CU
Jens
-u, --unfollowed-links=FILE log unfollowed links to FILE.
Nice. It sounds useful.
Hi Noèl!
(message CC changed to normal wget list)
Rate-limiting is possible since wget 1.7.1 or so, please correct me if
it was 1.8!
requests for http post pop up occasionally,
but as far as I am concerned, I don't need it and
I think it is not in the scope of wget currently.
Filling out forms
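For the rate-limiting, something like this (untested, URL made up) should work:
wget --limit-rate=50k http://example.com/bigfile.zip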
Hi Peter!
I was using 153
I am getting 181 now
Good idea, but
--accept=patchdiagxref,103566*,103603*,103612*,103618*
<a href="patches/112502readme">112502readme</a><br>
<a href="patches/112504-01zip">112504-01zip</a><br>
<a href="patches/112504readme">112504readme</a><br>
<a
Hi Herold!
Thanks for the testing, I must admit, trying -nd did not occur to me :(
I have already implemented a \ to / conversion in my wgetgui,
but forgot to strip the trailing / (as Hrvoje suggested) *doh*
Anyway, I would of course be happy to see a patch like you proposed,
but I understand
Hi Fred!
First, I think this would rather belong in the normal wget list,
as I cannot see a bug here.
Sorry to the bug tracers, I am posting to the normal wget List and
cc-ing Fred,
hope that is ok.
To your first request: -Q (Quota) should do precisely what you want.
I used it with -k and it
Hi!
I already posted this on the normal wget list, to which I am subscribed.
Problem:
-nh does not work in 1.8 latest windows binary.
By not working I mean that it is not recognized as a valid parameter.
(-nh is no-host look-up and with it on,
two domain names pointing to the same IP are
Hi Hrvoje!
-nh does not work in 1.8 latest windows binary.
By not working I mean that it is not recognized as a valid parameter.
(-nh is no-host look-up and with it on,
two domain names pointing to the same IP are treated as different)
You no longer need `-nh' to get that kind of
Hi!
2. Wouldn't it be a good idea to mention the
deletion of the -nh option in a file?
Maybe. What file do you have in mind?
First and foremost
the news file, but I think it would also not be misplaced in
wget.html and/or wget.hlp /.info (whatever it is called on Unix
systems).
Hi wgeteers!
I noticed that -nh (no host look-up) seems to be gone in 1.8.1.
Is that right?
At first I thought, "Oh, you fool, it is -nH, you mixed it up!"
But, obviously, these are two different options.
I read the news file and the wget.hlp and wget.html but could not find
an answer.
I always
Hi Vladi!
If you are using windows, you might try
http://www.jensroesner.de/wgetgui/
it is a GUI for wGet written in VB 6.0.
If you click on the checkbox "identify as browser", wGetGUI
will create a command line like you want.
I use it and it works for me.
Hope this helps?
CU
Jens
Vladi wrote:
Hi guys & gals! ;)
I just wanted to let you know that with v0.5 of wGetGUI, it is now
released under the
GPL, so if you feel like modifying or laughing at the source code, you
can now do so.
CU
Jens
http://www.jensroesner.de/wgetgui
Hi!
For all who cannot download the windows binaries,
they are now available through my site:
http://www.jensroesner.de/wgetgui/data/wget20010605-17b.zip
And while you are there, why not download wGetGUI v0.4?
:) http://www.jensroesner.de/wgetgui
If Heiko is reading this:
May I just keep the
Hi there!
First, let me introduce myself:
I am studying mechanical engineering and for a lecture I am learning
Visual Basic.
I was looking for a non-brain-dead way to get used to it and when a
friend of mine told me that he finds wGet too difficult to use I just
went *bing*
So, look what I have