Re: -X regex syntax? (repost)

2005-02-17 Thread Vince LaMonica
On Thu, 17 Feb 2005, Jens Rösner wrote:

Hi Jens,

} Would -X"*backup" be OK for you? 

It depends on how the trailing wildcard is used - the actual name of the 
directories is ".backup", and there is one inside each directory [and yes, 
there is html in each page which refers to them, which is why i'm trying to 
avoid grabbing them in the first place]. I did give -X"*backup" a try, and 
it didn't work for me either. :(

} If yes, give it a try.
} If not, I think you'd need the correct escaping for the ".".
} I have no idea how to do that, but
} http://mrpip.orcon.net.nz/href/asciichar.html
} lists
} %2E
} as the code. Does this work?

I gave that a try too [thanks!], but it still fetches the .backup 
directory: --exclude-directories="%2Ebackup".

However, I would like to confirm something dumb: does wget fetch these 
directories regardless of what I put in --exclude-directories, and only 
discard them once it is done fetching the URL? The reason I ask is that 
each time I've tried this, I've interrupted the process with a ^C when I 
saw it fetching files from a .backup directory. One of the goals, besides 
saving disc space, is to save bandwidth, so ideally I'd like wget never to 
fetch those directories in the first place.
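Regardless of whether wget discards them on its own, any .backup
directories that have already been fetched can be pruned from the local
mirror afterwards - this recovers disc space, though of course not the
bandwidth. A minimal sketch ("site" here is a placeholder standing in for
the top of the downloaded tree, with a simulated fetch):

```shell
#!/bin/sh
# Simulate a mirror into which a .backup directory slipped:
mkdir -p site/dir/stuff/.backup
touch site/dir/stuff/.backup/old.html
touch site/dir/stuff/index.html

# Delete every directory named ".backup"; -prune stops find from
# descending into a directory that is about to be removed.
find site -type d -name '.backup' -prune -exec rm -rf {} +
```

After this, site/dir/stuff/index.html is still there but the .backup
subdirectory is gone.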

Thanks for the tips, Jens!

/vjl/

Re: how to follow incorrect links?

2005-02-17 Thread Jens Rösner
Hi Tomasz!

> There are some websites with backslashes instead of slashes in links.
> For instance : 
> instead of   : 
> Internet Explorer can "repair" such addresses. 
My own assumption is: It repairs them, because Microsoft 
introduced that #censored# way of writing HTML.
Anyway, this will not help you, I know.
I think you should email the webmaster and tell him/her 
about the errors.

> How can I make wget follow such addresses?
I think it is impossible. 

I can think of one way:
start wget -nc -r -l0 -p URL
after it finishes, replace all "\" with "/" in the downloaded htm(l) files 
This will make the html files correct.
After that, start wget -nc -r -l0 -p URL again
wget will now parse the downloaded and corrected HTML files instead of the
wrong files on the net.
Continue this procedure until wget does not download any more files.
I do not know how handy you are in your OS, but this should be doable with
one or two small batch files.
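The fetch-fix-refetch loop above can be sketched roughly like this (the
wget calls are shown as comments since they need the real URL and a
network; the testable core is the backslash fix, which assumes GNU sed
for -i):

```shell
#!/bin/sh
# Step 1 (commented; replace the URL with the real site):
# wget -nc -r -l0 -p http://example.com/

# Simulate a downloaded page containing a backslash link:
mkdir -p site
cat > site/index.html <<'EOF'
<a href="sub\page.html">broken link</a>
EOF

# Step 2: replace every backslash with a forward slash in the html files.
find site -name '*.htm*' -exec sed -i 's|\\|/|g' {} +

# Step 3 (commented): rerun wget; -nc keeps the already-fetched files,
# so wget parses the corrected local copies instead of refetching them.
# wget -nc -r -l0 -p http://example.com/
```

Repeat steps 2 and 3 until a run downloads nothing new.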

Maybe one of the pros has a better idea. :)

CU
Jens (just another user)

-- 
DSL Komplett from GMX +++ Get started super-cheap and stress-free!
Use the "no setup fee" promotion: http://www.gmx.net/de/go/dsl


how to follow incorrect links?

2005-02-17 Thread Tomasz Toczyski
There are some websites with backslashes instead of slashes in links.
For instance : 
instead of   : 
Internet Explorer can "repair" such addresses.

My question is:
How can I make wget follow such addresses?
(I'd like to recursively retrieve a website with many such links)


Regards,
-tt


Re: -X regex syntax? (repost)

2005-02-17 Thread Jens Rösner
Hi Vince!

> So, so far these don't work for me:
> 
> --exclude-directories='*.backup*'
> --exclude-directories="*.backup*"
> --exclude-directories="*\.backup*"

Would -X"*backup" be OK for you? 
If yes, give it a try.
If not, I think you'd need the correct escaping for the ".".
I have no idea how to do that, but
http://mrpip.orcon.net.nz/href/asciichar.html
lists
%2E
as the code. Does this work?

CU
Jens


> 
> I've also tried this on my linux box running v1.9.1. Same results.
> Any other ideas?
> 
> Thanks a lot for your tips, and quick reply!
> 
> /vjl/

-- 
Let your thoughts run free... e.g. via FreeSMS
GMX offers up to 100 FreeSMS per month: http://www.gmx.net/de/go/mail


Re: -X regex syntax? (repost)

2005-02-17 Thread Vince LaMonica
On Thu, 17 Feb 2005, Jens Rösner wrote:

Hi Jens!

} > tip or two with regards to using -X?
} I'll try!

Thanks - I do appreciate it!

} > wget -r --exclude-directories='*.backup*' --no-parent \ 
} > http://example.com/dir/stuff/
} Well, I am using wget under Windows and there, you 
} have to use "exp", not 'exp', to make it work. The *x* works as expected.
} I could not test whether the . in your dir name causes any problem. 

I tried it with double quotes, and I'm still seeing wget download files in 
the .backup directories. I've also tried escaping the "." with a "\" but 
that doesn't seem to work either. :( So, so far these don't work for me:

--exclude-directories='*.backup*'
--exclude-directories="*.backup*"
--exclude-directories="*\.backup*"

I've also tried this on my linux box running v1.9.1. Same results. 
Any other ideas?

Thanks a lot for your tips, and quick reply!

/vjl/

Re: -X regex syntax? (repost)

2005-02-17 Thread Jens Rösner
Hi Vince!

> tip or two with regards to using -X?
I'll try!

> wget -r --exclude-directories='*.backup*' --no-parent \ 
> http://example.com/dir/stuff/
Well, I am using wget under Windows and there, you 
have to use "exp", not 'exp', to make it work. The *x* works as expected.
I could not test whether the . in your dir name causes any problem. 

Good luck!
Jens (just another user)





-X regex syntax? (repost)

2005-02-17 Thread Vince LaMonica
I hate to do this, but I am still stumped by this. Can anyone pass along a 
tip or two with regards to using -X?

Thanks,
/vjl/
[repost follows]:
Hi all,
I'm using GNU Wget 1.9.1 under Mac OS X, and I'm trying to confirm that I have 
the correct syntax for using the -X [or --exclude-directories] argument.

For example, I have a URL which I would like to wget with a -r. The site 
contains many directories named ".backup". I do not wish to download 
those directories. The way I've been attempting to do that is as follows:

wget -r --exclude-directories='*.backup*' --no-parent \ 
http://example.com/dir/stuff/

This does not appear to work. What is the proper syntax for wget's regex 
engine?
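For what it's worth, -X takes shell-style wildcard (glob) patterns rather
than regexes. As a rough local sanity check, plain sh "case" globbing
(which may well be looser than wget's internal matcher) can show whether
the candidate patterns even match a path like /dir/stuff/.backup:

```shell
#!/bin/sh
# Check each candidate -X pattern against a sample directory path using
# sh case globbing, where '*' matches any characters including '/'.
path="/dir/stuff/.backup"

for pat in '*.backup*' '*backup' '*/.backup'; do
  case $path in
    $pat) echo "$pat matches" ;;
    *)    echo "$pat does not match" ;;
  esac
done
```

All three patterns match under plain globbing here, so if wget still
descends into .backup, its matcher is evidently stricter (possibly
matching per path component); a pattern with the dot written out
explicitly per component, such as -X "*/.backup", may be worth trying.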

Thanks for any tips you can provide...
/vjl/