Re: -X regex syntax? (repost)
On Thu, 17 Feb 2005, "Jens Rösner" wrote:

Hi Jens,

} Would -X"*backup" be OK for you?

It depends on how the trailing wildcard is used. The actual name of the directories is ".backup", but there is one in each directory [and yes, there is HTML in each page which refers to them, which is why I'm trying to avoid grabbing them in the first place]. I did give -X"*backup" a try, and it didn't work for me either. :(

} If yes, give it a try.
} If not, I think you'd need the correct escaping for the ".",
} but I have no idea how to do that, but
} http://mrpip.orcon.net.nz/href/asciichar.html
} lists
} %2E
} as the code. Does this work?

I gave that a try too [thanks!], but it still fetches the .backup directory:

--exclude-directories="%2Ebackup"

However, I would like to confirm something dumb: will wget fetch these directories regardless of what I put in --exclude-directories, and then discard them once it is done fetching the URL? I ask because each time I've tried this, I've interrupted the process with a ^C when I saw it fetching files from a .backup directory. One of the goals, besides saving disk space, is to save bandwidth, so ideally I'd like wget never to fetch those directories to begin with.

Thanks for the tips, Jens!

/vjl/
Re: how to follow incorrect links?
Hi Tomasz!

> There are some websites with backslashes instead of slashes in links.
> For instance : instead of :
> Internet Explorer can "repair" such addresses.

My own assumption is that it repairs them because Microsoft introduced that #censored# way of writing HTML. Anyway, I know this will not help you. I think you should email the webmaster and tell him/her about the errors.

> How can I make wget follow such addresses?

I think it is impossible. I can think of one workaround:

Start
wget -nc -r -l0 -p URL
After it finishes, replace all "\" with "/" in the downloaded htm(l) files. This will make the HTML files correct.
After that, start
wget -nc -r -l0 -p URL
again. wget will now parse the downloaded and corrected HTML files instead of the wrong files on the net.
Continue this procedure until wget does not download any more files.

I do not know how handy you are with your OS, but this should be doable with one or two small batch files.
Maybe one of the pros has a better idea. :)

CU
Jens (just another user)
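The replace step in the workaround above could be scripted roughly as follows. This is only a sketch, not something taken from the thread: the function names fix_links and fix_tree are made up for illustration, and instead of replacing every "\" in the file it only touches href and src attribute values, which is a narrower variant of what Jens describes. It also assumes the links are quoted.

```python
import re
from pathlib import Path

# Matches href="..." or src="..." with single or double quotes.
# Assumption: attribute values are quoted; unquoted values are left alone.
LINK_ATTR = re.compile(
    r'''(\b(?:href|src)\s*=\s*)(["'])(.*?)\2''',
    re.IGNORECASE | re.DOTALL,
)


def fix_links(html):
    """Replace backslashes with slashes inside href/src attribute values."""
    def repl(match):
        prefix, quote, value = match.group(1), match.group(2), match.group(3)
        return prefix + quote + value.replace("\\", "/") + quote
    return LINK_ATTR.sub(repl, html)


def fix_tree(root):
    """Rewrite every .htm/.html file under root; return how many files changed."""
    changed = 0
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix.lower() not in (".htm", ".html"):
            continue
        # latin-1 maps every byte to one character, so the round trip
        # leaves all bytes other than the replaced backslashes untouched.
        text = path.read_bytes().decode("latin-1")
        fixed = fix_links(text)
        if fixed != text:
            path.write_bytes(fixed.encode("latin-1"))
            changed += 1
    return changed
```

Calling fix_tree on the directory wget created (for example fix_tree("example.com"), a placeholder name) between two wget -nc runs would correspond to one round of the loop; when both wget and fix_tree report nothing new, the loop is finished.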
how to follow incorrect links?
There are some websites with backslashes instead of slashes in links.
For instance : instead of :
Internet Explorer can "repair" such addresses.

My question is: how can I make wget follow such addresses? (I'd like to recursively retrieve a website with many such links.)

Regards,
-tt
Re: -X regex syntax? (repost)
Hi Vince!

> So, so far these don't work for me:
>
> --exclude-directories='*.backup*'
> --exclude-directories="*.backup*"
> --exclude-directories="*\.backup*"

Would -X"*backup" be OK for you?
If yes, give it a try.
If not, I think you'd need the correct escaping for the ".",
but I have no idea how to do that, but
http://mrpip.orcon.net.nz/href/asciichar.html
lists
%2E
as the code. Does this work?

CU
Jens

> I've also tried this on my Linux box running v1.9.1 as well. Same results.
> Any other ideas?
>
> Thanks a lot for your tips, and quick reply!
>
> /vjl/
Re: -X regex syntax? (repost)
On Thu, 17 Feb 2005, "Jens Rösner" wrote:

Hi Jens!

} > tip or two with regards to using -X?
} I'll try!

Thanks - I do appreciate it!

} > wget -r --exclude-directories='*.backup*' --no-parent \
} > http://example.com/dir/stuff/
} Well, I am using wget under Windows, and there you have
} to use "exp", not 'exp', to make it work. The *x* works as expected.
} I could not test whether the . in your dir name causes any problem.

I tried it with double quotes, and I'm still seeing wget download files in the .backup directories. I've also tried escaping the "." with a "\", but that doesn't seem to work either. :(

So, so far these don't work for me:

--exclude-directories='*.backup*'
--exclude-directories="*.backup*"
--exclude-directories="*\.backup*"

I've also tried this on my Linux box running v1.9.1 as well. Same results. Any other ideas?

Thanks a lot for your tips, and quick reply!

/vjl/
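One possible explanation for why none of these patterns match, offered as an assumption rather than a confirmed diagnosis: the wget manual describes -X as taking a comma-separated list of directories whose elements may contain wildcards, i.e. shell-style globs rather than regular expressions. If the glob is compared against the whole directory path and "*" is not allowed to cross a "/" (as with fnmatch's FNM_PATHNAME flag), then "*.backup*" cannot match a nested path such as dir/stuff/.backup. The Python sketch below only models that assumed rule; glob_to_regex and matches_dir are hypothetical names, character classes like [abc] are not handled, and nothing here is taken from wget itself.

```python
import re


def glob_to_regex(pattern):
    """Translate a glob into a regex in which * and ? never match '/'."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append("[^/]*")   # any run of characters within one path component
        elif ch == "?":
            parts.append("[^/]")    # exactly one character, but not a slash
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def matches_dir(pattern, directory):
    """True if the glob matches the whole directory path (leading '/' ignored)."""
    regex = glob_to_regex(pattern.lstrip("/"))
    return regex.fullmatch(directory.strip("/")) is not None


# Under the assumed rule, a bare "*.backup*" only matches a top-level directory:
print(matches_dir("*.backup*", ".backup"))              # top-level directory
print(matches_dir("*.backup*", "dir/stuff/.backup"))    # nested directory
print(matches_dir("*/*/.backup", "dir/stuff/.backup"))  # one * per path component
```

If wget does behave this way, a pattern would need one "*" per path component leading down to .backup; whether that holds for 1.9.1 would have to be checked against the real program.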
Re: -X regex syntax? (repost)
Hi Vince!

> tip or two with regards to using -X?

I'll try!

> wget -r --exclude-directories='*.backup*' --no-parent \
> http://example.com/dir/stuff/

Well, I am using wget under Windows, and there you have to use "exp", not 'exp', to make it work. The *x* works as expected.
I could not test whether the . in your dir name causes any problem.

Good luck!
Jens (just another user)
-X regex syntax? (repost)
I hate to do this, but I am still stumped by this. Can anyone pass along a tip or two with regards to using -X?

Thanks,
/vjl/

[repost follows]:

Hi all,

I'm using GNU Wget 1.9.1 under Mac OS X, and I'm trying to confirm that I have the correct syntax for using the -X [or --exclude-directories] argument.

For example, I have a URL which I would like to wget with a -r. The URL contains many directories that are named ".backup". I do not wish to download those directories. The way I've been attempting to do that is as follows:

wget -r --exclude-directories='*.backup*' --no-parent \
     http://example.com/dir/stuff/

This does not appear to work. What is the proper syntax for wget's regex engine?

Thanks for any tips you can provide...

/vjl/