Re: links conversion; non-existent index.html

2005-05-01 Thread Jens Rösner

 I know! But that is intentionally left without index.html. It should 
 display content of the directory, and I want that wget mirror it 
 correctly.
 Similar situation is here:
 http://chemfan.pl.feedle.com/arch/chemfanftp/
 it is left intentionally without index.html so that people could download 
 these archives. 
Is something wrong with my browser?
This does not look like a plain directory listing; the file has formatting and
even a background image. http://chemfan.pl.feedle.com/arch/chemfanftp/ looks
the same as http://chemfan.pl.feedle.com/arch/chemfanftp/index.html in my
Mozilla, and wget downloads it correctly.

 If wget put here index.html in the mirror of such site 
 then there will be no access to these files.
IMO, this is not correct. The index.html will contain the information the
directory listing shows at the time of download.
This works for me with znik.wbc.lublin.pl/Mineraly/Ftp/UpLoad/ as well -
which seemed to be a problem according to your other post.

 Well, if wget has to put index.html is such situations then wget is not 
 suitable for mirroring such sites, 
What exactly do you mean? It works for me: the saved index.html looks
like the Apache-generated directory listing. When mirroring, index.html will
be re-written if/when it has changed on the server since the last mirroring.

 and I expect that problem to be 
 corrected in future wget versions.
You expect??

Jens



Re: links conversion; non-existent index.html

2005-05-01 Thread Jens Rösner
Do I understand correctly that the mirror at feeble is created by you and
wget?

  Yes, because this is in the HTML file itself:
  http://znik.wbc.lublin.pl/Mineraly/Ftp/UpLoad/index.html
  It does not work in a browser, so why should it work in wget?
 It works in the browser:
 http://znik.wbc.lublin.pl/Mineraly/Ftp/UpLoad/
 There is no index.html and the content of the directory is displayed.
I assume I was confused by the different sites you wrote about. I was sure
that both included the same link to ...index.html and the same gif-address.

 http://znik.wbc.lublin.pl/Mineraly/Ftp/UpLoad/index.html
 The link was not converted properly, it should be:
 http://mineraly.feedle.com/Ftp/UpLoad/
 and it should be without any index.html, because there is none in the 
 original.
Wget saves a mirror to your hard disk. Therefore, it cannot rely on an Apache
server generating a directory listing. Thus, it created an index.html, as
Tony Lewis explained. Now, _you_ uploaded (if I understood correctly) the
copy from your HDD but did not save the index.html. Otherwise it would be
there and it would work.

Jens



Re: links conversion; non-existent index.html

2005-05-01 Thread Jens Rösner
  Wget saves a mirror to your hard disk. Therefore, it cannot rely on an Apache
  server generating a directory listing. Thus, it created an index.html as
 
 Apparently you have not tried to open that link, 
Which link? The non-working one on your incorrect mirror or the working one
on my correct mirror on my HDD?

 got it now?
No need to get snappy, Andrzej.

From your other mail:
 No, you did not understand. I run wget on remote machines. 
Ah! Sorry, missed that.

 The problem is 
 solved though by running the 1.9.1 wget version.
I am still wondering, because even wget 1.5 correctly generates the
index.html from the server output when called on my local box.
I really do not know what is happening on your remote machine, but my wget
1.5 is able to mirror the site. It creates the
Mineraly/Ftp/UpLoad/index.html file and the correct link to it. 
I understand that it is not what you want (having an index.html), but wget
1.5 creates a working mirror - as it is supposed to do.

CU
Jens







Re: links conversion; non-existent index.html

2005-05-01 Thread Jens Rösner
 The problem was that that link:
 http://znik.wbc.lublin.pl/Mineraly/Ftp/UpLoad/
 instead of being properly converted to:
 http://mineraly.feedle.com/Ftp/UpLoad/
Or, in fact, wget's default:
http://mineraly.feedle.com/Ftp/UpLoad/index.html

 was left like this on the main mirror page:
 http://znik.wbc.lublin.pl/Mineraly/Ftp/UpLoad/index.html
 and hence while clicking on it:
 Not Found
 The requested URL /Mineraly/Ftp/UpLoad/index.html was not found on this 
 server.

Yup. So I assume that the problem you see is not that of wget mirroring, but
a combination of saving to a custom dir (with --cut-dirs and the like) and
conversion of the links. Obviously, the link to
http://znik.wbc.lublin.pl/Mineraly/Ftp/UpLoad/index.html which would be
correct for a standard "wget -m URL" was carried over, while the custom link
to http://mineraly.feedle.com/Ftp/UpLoad/index.html was not created.
My test with wget 1.5 was just a simple "wget15 -m -np URL" and it worked.
So maybe the convert/rename problem/bug was solved in 1.9.1.
This would also explain the missing gif file, I think.

Jens





Re: newbie question

2005-04-14 Thread Jens Rösner
Hi Alan!

As the URL starts with https, it is a secure server. 
You will need to log in to this server in order to download stuff.
See the manual for info on how to do that (I have no experience with it).

Good luck
Jens (just another user)


  I am having trouble getting the files I want using a wildcard
 specifier (-A option = accept list).  The following command works fine to get
 an individual file:
 
 wget

https://164.224.25.30/FY06.nsf/($reload)/85256F8A00606A1585256F900040A32F/$FILE/160RDTEN_FY06PB.pdf
 
 However, I cannot get all PDF files with this command: 
 
 wget -A *.pdf

https://164.224.25.30/FY06.nsf/($reload)/85256F8A00606A1585256F900040A32F/$FILE/
 
 Instead, I get:
 
 Connecting to 164.224.25.30:443 . . . connected.
 HTTP request sent, awaiting response . . . 400 Bad Request
 15:57:52  ERROR 400: Bad Request.
 
I also tried this command without success:
 
 wget

https://164.224.25.30/FY06.nsf/($reload)/85256F8A00606A1585256F900040A32F/$FILE/*.pdf
 
 Instead, I get:
 
 HTTP request sent, awaiting response . . . 404 Bad Request
 15:57:52  ERROR 404: Bad Request.
 
  I read through the manual but am still having trouble.  What am I
 doing wrong?
 
 Thanks, Alan
 
 
 



Re: newbie question

2005-04-14 Thread Jens Rösner
Hi! 

Yes, I see now, I misread Alan's original post. 
I thought he would not even be able to download the single .pdf. 
Don't know why, as he clearly said it works getting a single pdf.

Sorry for the confusion! 
Jens

 Tony Lewis [EMAIL PROTECTED] writes:
 
  PS) Jens was mistaken when he said that https requires you to log
  into the server. Some servers may require authentication before
  returning information over a secure (https) channel, but that is not
  a given.
 
 That is true.  HTTPS provides encrypted communication between the
 client and the server, but it doesn't always imply authentication.
 



Re: File rejection is not working

2005-04-06 Thread Jens Rösner
Hi Jerry!

AFAIK, RegExp for (HTML?) file rejection was requested a few times, but is
not implemented at the moment.
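For the record, a rough sketch of what -R does accept today - a comma-separated
list of suffixes and simple wildcard patterns (example.com is just a placeholder):
wget -r -np -R "*.exe,*.zip,thumb*" http://example.com/dir/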

CU
Jens (just another user)

 The -R option is not working in wget 1.9.1 for anything but
 specifically-hardcoded filenames..
  
 file[Nn]ames such as [Tt]hese are simply ignored...
  
 Please respond... Do not delete my email address as I am not a
 subscriber... Yet
  
 Thanks
  
 Jerry
 



Re: de Hacene : raport un bug

2005-03-25 Thread Jens Rösner
Hallo!

I do not speak French (or hardly any at all)...

 C:\wget> wget --proxy=on -x -r -l 2 -k -x -l
 imit-rate=50k --tries=45 --directory-prefix=AsptDD 

I think it should be:
C:\wget> wget --proxy=on -x -r -l 2 -k -x --limit-rate=50k --tries=45
--directory-prefix=AsptDD 
on a single line of text. You had a line break between -l and imit...:
 wget: reclevel: Invalid specification `imit-rate=50k'.

Furthermore, your version of wget is very old.

Download a newer version for windows here:
http://xoomer.virgilio.it/hherold/

CU
Jens (just another wget user)




Re: Feautre Request: Directory URL's and Mime Content-Type Header

2005-03-21 Thread Jens Rösner
Hi Levander!

I am not an expert by any means, just another user, 
but what does the -E option do for you?
-E = --html-extension 
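A minimal sketch of what I mean (example.com is hypothetical):
wget -r -E http://example.com/data/
With -E, any document served as text/html whose name does not already end in
.htm/.html is saved with an .html suffix appended.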

 apache.  Could wget, for url's that end in slashes, read the 
 content-type header, and if it's text/xml, could wget create index.xml 
 inside the directory wget creates?

Don't you mean create index.html?

CU
Jens




Re: Bug

2005-03-20 Thread Jens Rösner
Hi Jorge!

Current wget versions do not support large files (larger than 2 GB). 
However, the CVS version does, and the fix will be introduced 
into the normal wget source. 

Jens
(just another user)

 When downloading a file of 2GB or more, the counter goes crazy; probably
 it should use a long instead of an int.



Re: -X regex syntax? (repost)

2005-02-18 Thread Jens Rösner
Hi Vince!

 I did give -X*backup a try, and
 it too didn't work for me. :(

Does the -Xdir work for you at all?
If not, there might be a problem with MacOS.
I hope one of the more knowledgeable people here 
can help you!

 However, I would like to confirm something dumb - will wget fetch these 
 directories, regardless of what I put in --exclude-directories, but when 
 it is done fetching the URL, will it then discard those directories? 

As far as I can tell from a log file I just created, wget does not follow
links into these directories. So no files downloaded from them.

CU
Jens




Re: a little help for use wget

2005-02-18 Thread Jens Rösner
Hi LucMa!

I have find a command 
 for autoskip if the file in my PC have the same name of the file 
 on the ftp 
-nc I guess.

 but now i need a command for overwrite the file on my PC if is smaller 
 respect to the file in the ftp.
 -e robots=off -N 
-N should do the trick. 

 http://www.playagain.net/download/m.php?p=roms/zzyzzyx2.zip
I am not sure about that kind of ?-URL (query string) - can wget use it?

 Last help, i use a script that copy from an ftp the exactly structure of
 it, 
 so i have on my pc many dir created from wget, is possible create a rules 
 that wget download all the file from an ftp only in one dir?
Does -np work for you? No-parent means it will only descend (go deeper) into
the directory tree, not up.
Or try -I downloadDIR, which means that wget will only accept the directory
of the file/directory it is started with.
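A rough sketch of both ideas (server and path are placeholders):
wget -r -nc -np "ftp://ftp.example.com/pub/roms/"
wget -r -nc -I /pub/roms "ftp://ftp.example.com/pub/roms/"
The first stays at or below /pub/roms/; the second only accepts files from that
directory.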

CU
Jens (just another user)



Re: -X regex syntax? (repost)

2005-02-17 Thread Jens Rösner
Hi Vince!

 tip or two with regards to using -X?
I'll try!

 wget -r --exclude-directories='*.backup*' --no-parent \ 
 http://example.com/dir/stuff/
Well, I am using wget under Windows and there you have
to use "exp", not 'exp', to make it work. The *x* wildcard works as expected.
I could not test whether the . in your dir name causes any problem.
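A minimal sketch of the quoting I mean (example.com and the directory name are
placeholders); on Windows, leave the pattern unquoted:
wget -r --no-parent --exclude-directories=*backup* http://example.com/dir/stuff/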

Good luck!
Jens (just another user)





Re: -X regex syntax? (repost)

2005-02-17 Thread Jens Rösner
Hi Vince!

 So, so far these don't work for me:
 
 --exclude-directories='*.backup*'
 --exclude-directories=*.backup*
 --exclude-directories=*\.backup*

Would -X*backup be OK for you? 
If yes, give it a try.
If not, I think you'd need the correct escaping for the ".",
but I have no idea how to do that;
http://mrpip.orcon.net.nz/href/asciichar.html
lists
%2E
as the code. Does this work?

CU
Jens


 
 I've also tried this on my linux box running v1.9.1 as well. Same results.
 Any other ideas?
 
 Thanks a lot for your tips, and quick reply!
 
 /vjl/



RE: -i problem: Invalid URL ¹h: Unsupported scheme

2005-02-13 Thread Jens Rösner
Hi Mike!

Strange! 
I suspect that you have some kind of typo in your test.txt
If you cannot spot one, try 
wget -d -o logi.txt -i test.txt
as a command line and send the debug output.

Good luck
Jens (just another user)

 a) I've verified that they both exist
 b) All of the URLs are purely HTTP. In fact, I used http://www.gnu.org as
 the simplest example.
 c) I've tried with both Notepad and Wordpad. I get the same error either
 way.
 d) I actually am using -i. I carelessly allowed the spell checker to
 capitalize it when it sent the e-mail.




Re: command prompt closes immediately after opening

2005-02-12 Thread Jens Rösner
Hi Jon!

 Yes, I tried using the 'command prompt' (thru XP) and it replied:
 
 'wget' is not recognized as an internal or external command, operatable 
 program or batch file
 
Did you cd to the directory wget was unpacked to?
If not, you either need to do that or add the wget directory to your PATH
environment. I have always used wget with a .bat file that I edit and start
from the Windows Explorer. But you might prefer the pure command-line
approach :)
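For example, assuming wget.exe was unpacked to C:\wget (adjust to your setup):
cd C:\wget
wget --version
or, to make wget callable from anywhere in the current session:
set PATH=%PATH%;C:\wget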

 aside from leaving out a comma after program, 
Well, if my memory of German grammar lessons from 15 years ago is valid 
for English in 2005, the "or" replaces the comma in this enumeration =)

CU
Jens




Re: command prompt closes immediately after opening

2005-02-12 Thread Jens Rösner
Hi Jon!

 and added the text "wget http://xoomer.virgilio.it/hherold/index.html"
 in the file and saved it. I then double-clicked on it and nothing 
 happened that I could see.
Well, there now should be a file called index.html in your wget directory!
Now replace the text in your wget.bat with 
wget -x http://xoomer.virgilio.it/hherold/index.html
and watch as wget will create matching directories.

 Did you cd to the directory wget was unpacked to?
 I'm sorry, what does cd mean?
Uhm, cd is the windows command to change directory.

 I have a site www.helpusall.com and all I want to do is backup all the
 html and pictures on the entire site.
use 
wget -m -k www.helpusall.com
it should be all you'll need from what I have seen.

CU
Jens



--cache=off: misleading manual?

2005-01-16 Thread Jens Rösner
Hi Wgeteers!
I understand that -C off as short for --cache=off
was dropped, right?
However, the wget.html that comes with Herold's
Windows binary mentions only
  --no-cache
and the wgetrc option
  cache = on/off
I just tried
1.9+cvs-dev-200404081407 unstable development version
and
--cache=off still works.
I think this is not the latest cvs version and
possibly the manual will be updated accordingly.
But I think it would be nice to mention
that --cache=off still works for backwards compatibility.
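For reference, the forms side by side (URL is a placeholder; the -e variant
should be equivalent, if I am not mistaken):
wget --no-cache URL
wget --cache=off URL
wget -e cache=off URL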
I am aware that there are bigger tasks (LFS)
currently, I just stumbled over this issue and
thought I'd mention it.
I hope I am not missing something!
Jens



Re: timeouts not functioning

2005-01-04 Thread Jens Rösner
Hi Colin!

I am just another user and rarely use timeout. 
I hope I am not causing more trouble than I'm solving.

 I'm attempting to use wget's --timeout flag to limit the time spent
 downloading a file. However, wget seems to ignore this setting. For
 example, if I set --timeout=2, the download should take at most 6
 seconds, because the dns, connect, and read timeouts are all 2
 seconds. However, it does not seem to obey any such limit.

Maybe I am misinterpreting what you write, but it sounds as if you expect
wget to stop downloading the file after these 6 seconds? 
I think that the timeout refers to the time wget will try a single action
before it considers the attempt failed.
E.g.: if after 2 seconds there is no DNS connection, wget times out;
if during a GET no data is transferred for 2 seconds, it times out.
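A sketch of what I mean - the three phases can also be set separately (URL is
a placeholder):
wget --dns-timeout=2 --connect-timeout=2 --read-timeout=2 URL
Each value limits one phase (name lookup, connecting, waiting for data), not
the total duration of the download.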

CU
Jens



Re: wget 1.9.1

2004-10-18 Thread Jens Rösner
Hi Gerriet!

 Only three images, which were referenced in styles.css, were missing.
Yes, wget does not parse css or javascript.

 I thought that the -p option causes Wget to download all the files 
 that are necessary to properly display a given HTML page. This includes 
 such things as inlined images, sounds, and referenced stylesheets.
The referenced stylesheet whatever.css should have been saved.

 Yes, it does not explicitly mention images referenced in style sheets, 
 but it does claim to download everything necessary to properly display 
 a given HTML page.
I think this paragraph is misleading. As soon as JavaScript or CSS are
involved in certain ways (like displaying images), -p will not be able to
fully display the site.
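A rough workaround sketch, assuming a Unix shell and hypothetical names
(example.com, page.html, styles.css):
wget -p -k http://example.com/page.html
grep -o "url([^)]*)" example.com/styles.css
The grep only lists the images the stylesheet references; you then have to
fetch each one by hand, e.g.
wget -x http://example.com/background.gif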

 So - is this a bug, 
no - it is a missing feature.

 did I misunderstand the documentation, 
somehow

 did I use the wrong options?
kind of, but as the right options don't exist, you are not to blame ;)

 Should I get a newer version of wget? 
1.9.1 is the latest stable version according to http://wget.sunsite.dk/

CU
Jens (just another user)





Re: img dynsrc not downloaded?

2004-10-17 Thread Jens Rösner
dynsrc is Microsoft DHTML for IE, if I am not mistaken.
As wget is -thankfully- not MS IE, it fails.
I just did a quick Google search and it seems that the use of
dynsrc is not recommended anyway. 

What you can do is to download 
http://www.wideopenwest.com/~nkuzmenko7225/Collision.mpg

Jens

(and before you ask, no I am not a developer of wget, just a user)



 Hello.
 Wget could not follow dynsrc tags; the mpeg file was not downloaded:
   <p><img dynsrc="Collision.mpg" CONTROLS LOOP=1>
 at
   http://www.wideopenwest.com/~nkuzmenko7225/Collision.htm
 
 Regards,
 Juhana
 







Re: wget -r ist not recursive

2004-09-13 Thread Jens Rösner
Hi Helmut!

I suspect there is a robots.txt that says "index, no follow".

Try
wget -nc -r -l0 -p -np -erobots=off "http://www.vatican.va/archive/DEU0035/_FA.HTM"
it works for me.

-l0 says: infinite recursion depth
-p means page requisites (not really necessary)
-erobots=off orders wget to ignore any robot rules

You could also add 
-k: converts absolute to local links for maximum offline browsability.

CU
Jens

 I tried
 
 wget -r -np -nc http://www.vatican.va/archive/DEU0035/_FA.HTM
 
 both with cygwin / Wget 1.9.1 and Linux / Wget 1.8.2.
 They return just one single file but none of
 
 http://www.vatican.va/archive/DEU0035/_FA1.HTM
 http://www.vatican.va/archive/DEU0035/_FA2.HTM
 
 etc. which are referenced on that page. Looking at the
 downloaded file shows that these links are really present.
  
 What are the correct options to receive these pages recursively?
 
 Thank you for any help,
 
 Helmut Zeisel
 
 
 




Re: Cannot WGet Google Search Page?

2004-06-12 Thread Jens Rösner
Hi Phil!

Without more info (wget's verbose or even debug output, full command
line, ...) I find it hard to tell what is happening.
However, I have had very good success with wget and Google.
So, some hints (see the sketch below):
1. protect the Google URL by enclosing it in quotation marks
2. remember to span (and allow only certain) hosts; otherwise, wget will
only download Google pages 
And lastly - but you obviously did so - think about restricting the recursion
depth.
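A hedged sketch of what such a call could look like (the domains, the -U
string and the query are only examples, and Google may still refuse robots):
wget -r -l1 -H -Dgoogle.com,interestingsite.com -U "Mozilla/5.0" "http://www.google.com/search?q=deepwater+oil"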

Hope that helps a bit
Jens

  I have been trying to wget several levels deep from a Google search page
 (e.g., http://www.google.com/search?=deepwater+oil). But on the very first
 page, wget returns a 403 Forbidden error and stops. Anyone know how I can
 get around this?
 
 Regards, Phil 
 Philip E. Lewis, P.E.
 [EMAIL PROTECTED]
 
 




Re: HELP: Can not load websites with frames

2004-06-11 Thread Jens Rösner
Hi François!

Well, it seems to work for me. Here's how:
Open the frame in another window (works in Mozilla easily), 
then you'll see the
URL:
http://www.uniqueairport.com/timetable/fplan_landung_imm.asp?ID_site=1&sp=en&le=2&ID_level1=1&ID_level2=2&ID_level3=7&ID_level4=&ID_level5=&d=timetable/fplan_landung_imm.asp&u=2&t=Timetable%20visitors&br=

You should be able to use that one in wget.

CU
Jens



 Hello,
 
 I'm using wget since months for saving the daily
 arrival/departure information of the local airport.
 Now they changed the design of the website and started
 to use frames. I'm stucked now! No idea how to extract
 the data from this framed website. Any ideas?
 
 Here's an example:

http://www.zurich-airport.com/ZRH/?ID_site=1&le=2&d=timetable/fplan_landung_imm.asp&sp=en&u=2&ID_level1=1&ID_level2=2&ID_level3=7&t=Timetable%20visitors&a4=94
 
 Thanks for any hints.
 
 Rgds,
 François
 [EMAIL PROTECTED]
 




Re: HELP: Can not load websites with frames

2004-06-11 Thread Jens Rösner
Hi all!

François just told me that it works. :)
I thought that maybe I should add why it does ;)
The original website sits on www.zurich-airport.com, 
the info frame however is loaded from 
http://www.uniqueairport.com
As wget by default only downloads pages from 
the same server (which makes sense), it 
will not download the info frame.
If one would like to download the complete page, 
then a combination of -D and -H must be used 
to allow wget to travel to different hosts.
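A sketch of such a call (the exact options depend on what you want; the start
URL is shortened here):
wget -p -k -H -Dzurich-airport.com,uniqueairport.com "http://www.zurich-airport.com/ZRH/"
-H allows foreign hosts at all, and -D then limits them to the two domains the
page actually uses.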

CU
Jens


 Hi François!
 
 Well, it seems to work for me. Here's how:
 Open the frame in another window (works in Mozilla easily), 
 then you'll see the
 URL:

http://www.uniqueairport.com/timetable/fplan_landung_imm.asp?ID_site=1&sp=en&le=2&ID_level1=1&ID_level2=2&ID_level3=7&ID_level4=&ID_level5=&d=timetable/fplan_landung_imm.asp&u=2&t=Timetable%20visitors&br=
 
 You should be able to use that one in wget.
 
 CU
 Jens
 
 
 
  Hello,
  
  I'm using wget since months for saving the daily
  arrival/departure information of the local airport.
  Now they changed the design of the website and started
  to use frames. I'm stucked now! No idea how to extract
  the data from this framed website. Any ideas?
  
  Here's an example:
 

http://www.zurich-airport.com/ZRH/?ID_site=1&le=2&d=timetable/fplan_landung_imm.asp&sp=en&u=2&ID_level1=1&ID_level2=2&ID_level3=7&t=Timetable%20visitors&a4=94
  
  Thanks for any hints.
  
  Rgds,
  François
  [EMAIL PROTECTED]
  
 



Re: Maybe a bug or something else for wget

2004-05-24 Thread Jens Rösner
Hi Ben!

Not a bug as far as I can see.
Use -A to accept only certain files.
Furthermore, the pdf and ppt files are located across various servers;
you need to allow wget to visit other servers than the original one with -H
and then restrict it to only certain ones with -D.

wget -nc -x -r -l2 -p -erobots=off -t10 -w2 --random-wait --waitretry=7
-U "Mozilla/4.03 [en] (X11; I; SunOS 5.5.1 sun4u)"
--referer="http://devresource.hp.com/drc/topics/utility_comp.jsp" -k -v
-A*.ppt,*.pdf,utility_comp.jsp -H -Dwww.hpl.hp.com,www.nesc.ac.uk
http://devresource.hp.com/drc/topics/utility_comp.jsp

works for me. It was generated using my gui front-end to wget, so it is not
streamlined ;)

Jens



 Hi,
 How can I download all pdf and ppt file by the following url with command
 line of:
 
 wget -k -r -l 1 http://devresource.hp.com/drc/topics/utility_comp.jsp
 
 I am on windows 2000 server sp4 with latest update.
 
 E:\Releasewget -V
 GNU Wget 1.9.1
 
 Copyright (C) 2003 Free Software Foundation, Inc.
 This program is distributed in the hope that it will be useful,
 but WITHOUT ANY WARRANTY; without even the implied warranty of
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 GNU General Public License for more details.
 
 Originally written by Hrvoje Niksic [EMAIL PROTECTED].
 
 
 Thank you for your nice work.
 
 Ben
 




Re: Site Mirror

2004-05-11 Thread Jens Rösner
Hi Kelvin!

I must admit that I am a bit puzzled.

 I am trying to mirror a web site that has many
 hierarchical levels.  
 I am using the command 
 wget -m -k $site
 which allows me to view the site fine.  

 However, I wish the mirror to make a directory
 structure that also mimics the website rather than
 having the html files in a single directory.
Normally, your syntax should do this.
With -r (and -m is short for -r -N IIRC) 
the host directories should be created locally as well.
Here are some lines from the wget manual:


-nd
--no-directories
Do not create a hierarchy of directories when retrieving recursively.  With
this option turned on, all files will get saved to the current directory,
without clobbering (if a name shows up more than once, the filenames will
get extensions .n).

-x
--force-directories
The opposite of -nd: create a hierarchy of directories, even if one would
not have been created otherwise.  E.g. wget -x
http://fly.srk.fer.hr/robots.txt will save the downloaded file to
fly.srk.fer.hr/robots.txt.

-nH
--no-host-directories
Disable generation of host-prefixed directories.  By default, invoking Wget
with -r http://fly.srk.fer.hr/ will create a structure of directories
beginning with fly.srk.fer.hr/.  This option disables such behavior.



So, I'd recommend trying the -x switch, although I am not sure what your
problem is exactly.

CU
Jens





Re: skip robots

2004-02-08 Thread Jens Rösner
Hi Hrvoje!

 In other words, save a copy of wget.texi, make the change, and send the
 output of `diff -u wget.texi.orig wget.texi'.  That's it.

Uhm, ok. 
I found diff for windows among other GNU utilities at
http://unxutils.sourceforge.net/
if someone is interested.

 distribution.  See
 http://cvs.sunsite.dk/viewcvs.cgi/*checkout*/wget/PATCHES?rev=1.5

Thanks, I tried to understand that. Let's see if I understood it.
Sorry if I am not sending this to the patches list, the document above 
says that it is ok to evaluate the patch with the general list.

CU
Jens


Patch sum up:
a) Tell users how to --execute more than one wgetrc command
b) Tell about and link to --execute when explaining wgetrc commands.
Reason: Better understanding and navigating the manual.

ChangeLog entry:
Changed wget.texi concerning --execute switch to facilitate 
use and user navigation.

Start patch:


409,410c409,412
< @emph{after} the commands in @file{.wgetrc}, thus taking precedence over
< them.
---
> @emph{after} the commands in @file{.wgetrc}, thus taking precedence over 
> them. If you need to use more than one wgetrc command in your
> command-line, use -e preceeding each.
> 
2150,2151c2152,2154
< Most of these commands have command-line equivalents (@pxref{Invoking}),
< though some of the more obscure or rarely used ones do not.
---
> Most of these commands have command-line equivalents (@pxref{Invoking}). Any
> wgetrc command can be used in the command-line by using the -e (--execute) (@pxref{Basic Startup Options}) switch.
> 





Re: skip robots

2004-02-08 Thread Jens Rösner
 You're close.  You forgot the `-u' option to diff (very important),
 and you snipped the beginning of the `patch' output (also important).

Ok, I forgot the -u switch which was stupid as I actually read 
the command line in the patches file :(
But concerning the snipping I just did 
diff   file.txt 
so I cannot have snipped anything. Is my shell (win2000)
doing something wrong, or is the missing bit there now (when using the -u
switch)?

Jens

Once more:

Patch sum up:
a) Tell users how to --execute more than one wgetrc command
b) Tell about and link to --execute when listing wgetrc commands.
Reason: Better understanding and navigating the manual

ChangeLog entry:
Changed wget.texi concerning --execute switch to facilitate 
use and user navigation.

Start patch:

--- wget.texi   Sun Nov 09 00:46:32 2003
+++ wget_mod.texi   Sun Feb 08 20:46:07 2004
@@ -406,8 +406,10 @@
 @itemx --execute @var{command}
 Execute @var{command} as if it were a part of @file{.wgetrc}
 (@pxref{Startup File}).  A command thus invoked will be executed
-@emph{after} the commands in @file{.wgetrc}, thus taking precedence over
-them.
+@emph{after} the commands in @file{.wgetrc}, thus taking precedence over 
+them. If you need to use more than one wgetrc command in your
+command-line, use -e preceeding each.
+
 @end table
 
 @node Logging and Input File Options, Download Options, Basic Startup Options, Invoking
@@ -2147,8 +2149,9 @@
 integer, or @samp{inf} for infinity, where appropriate.  @var{string}
 values can be any non-empty string.
 
-Most of these commands have command-line equivalents (@pxref{Invoking}),
-though some of the more obscure or rarely used ones do not.
+Most of these commands have command-line equivalents (@pxref{Invoking}). Any
+wgetrc command can be used in the command-line by using the -e (--execute) (@pxref{Basic Startup Options}) switch.
+
 
 @table @asis
 @item accept/reject = @var{string} 









Re: Startup delay on Windows

2004-02-08 Thread Jens Rösner
[...]
Cygwin considers `c:\Documents and Settings\USERNAME' to be the
home directory.  I wonder if that is reachable through registry...
 
 Does anyone have an idea what we should consider the home dir under
 Windows, and how to find it?

Doesn't this depend on each user's personal preference?
I think most could live with
c:\Documents and Settings\all users (or whatever it is called in each
language) 
or the cygwin approach 
c:\Documents and Settings\USERNAME
which will be less likely to conflict with security limits on multi-user PCs
I think.
I personally would like to keep everything wget-ish in the directory its exe
is in 
and treat that as its home dir.

BTW: 
Is this bug connected to the bug under Windows, that saving into another
directory 
than wget's starting dir by using the -P (--directory-prefix) option 
does not work when switching drives?

wget -r -P C:\temp URL
will save to
.\C3A\temp\*.*

wget -r -P 'C:\temp\' URL
will save to
.\'C3A\temp\'\*.*

wget -r -P C:\temp\ URL
does not work at all ('Missing URL') error

however
wget -r -P ..\temp2\ URL
works like a charme.

CU
Jens








Re: skip robots

2004-02-07 Thread Jens Rösner
Hi Hrvoje!

  PS: One note to the manual editor(s?): The -e switch could be
  (briefly?) mentioned also at the wgetrc commands paragraph. I
  think it would make sense to mention it there again without
  cluttering the manual too much. Currently it is only mentioned in
  Basic Startup Options (and in an example dealing with robots).
  Opinions?
 
 Sure, why not.  Have you just volunteered to write the patch?  :-)
 
Touché!
;)
Well, seriously, I don't know how to write a patch!

I'll write a text and maybe someone finds it useful and embeds it into a
patch.
Or someone sends me a link that explains patching so that even a Windowser 
can do it ;)

I'd suggest that in the paragraph of Wgetrc Commands 
instead of
Most of these commands have command-line equivalents (see Invoking.),
though some of the more obscure or rarely used ones do not.
could be written
Most of these commands have command-line equivalents (see Invoking.).  Any
wgetrc command can be used in the command-line by using the -e (--execute)
[LINK] switch.

There, in the paragraph Basic Startup Options
instead of 
-e command
--execute command
Execute command as if it were a part of .wgetrc (see Startup File.). A
command thus invoked will be executed after the commands in .wgetrc, thus taking
precedence over them.
I'd suggest
-e command
--execute command
Execute command as if it were a part of .wgetrc (see Startup File.). A
command thus invoked will be executed after the commands in .wgetrc, thus taking
precedence over them. If you need to use more than one wgetrc command in your
command-line, use -e preceeding each.

Hope this is ok
Jens





Re: Why no -nc with -N?

2004-02-05 Thread Jens Rösner
Hi Dan,

I must admit that I don't fully understand your question.

-nc
means no clobber, that means that files that already exist
locally are not downloaded again, independent from their age or size or 
whatever.

-N
means that only newer files are downloaded (or if the size differs).

So these two options are mutually exclusive.
I could imagine that you want something like
wget --no-clobber --keep-server-time URL
right?
If I understand the manual correctly, this date should normally be kept 
for http,
at least if you specify
wget URL
I just tested this and it works for me.
(With -S and/or -s you can print the http headers, if you need to.)

However, I noticed that quite a few servers do not provide a 
last-modified header.

Did this answer your question?
Jens






 I'd love to have an option so that, when mirroring, it
 will backup only files that are replaced because they
 are newer on the source system (time-stamping).
 
 Is there a reason these can't be enabled together?
 
 




Re: skip robots

2004-02-04 Thread Jens Rösner
use 
robots = on/off in your wgetrc
or 
wget -e robots=on/off URL on your command line (no spaces around the =).

Jens

PS: One note to the manual editor(s?): 
The -e switch could be (briefly?) mentioned 
also at the wgetrc commands paragraph. 
I think it would make sense to mention it there again 
without cluttering the manual too much. 
Currently it is only mentioned in Basic Startup Options
(and in an example dealing with robots).
Opinions?



 I onced used the skip robots directive in the wgetrc file.
 But I can't find it anymore in wget 1.9.1 documentation.
 Did it disapeared from the doc or from the program ?
 
 Please answer me, as I'm not subscribed to this list
 




RE: apt-get via Windows with wget

2004-02-03 Thread Jens Rösner
Hi Heiko!

  Until now, I linked to your main page. 
  Would you mind if people short-cut this? 
 Linking to the directory is bad since people would download 

Sorry, I meant linking directly to the latest zip.
However, I personally prefer to read what the provider 
(in this case you) has to say about a download anyway.


 Do link to the complete url if you prefer to, although I like to keep 
 some stats.

Understood.


 for example since start of the year
 there have been 7 referrals from www.jensroesner.de/wgetgui 

Wow, that's massive... 
...not!
;-)


 Since that is about 0.05% stats shouldn't be 
 altered too much if you link directly to the archive ;-)

Thanks for pointing that out ;-}


  What do you think about adding a latest-ssl-libraries.zip?
 I don't think so.
 If you get the latest complete wget archive those are included anyway 
 and you are sure it will work. 

Oh, I'm very sorry, I must have overlooked/misunderstood that. 
I thought the latest zip would not contain the SSLs.
That's great.


 I'd prefer to not force a unneeded (admittedly, small) download by
bundling 
 the ssl libraries in every package.

Very true.
Especially as wget seems to be used by quite some people on slow
connections.


Kind regards
Jens







Re: downloading multiple files question...

2004-02-03 Thread Jens Rösner
Hi Ron!

If I understand you correctly, you could probably use the 
-A acclist
--accept acclist
accept = acclist
option.

So, probably (depending on your site), the syntax should be something like:
wget -r -A *.pdf URL
wget -r -A *.pdf -np URL
or, if you have to recurse through multiple html files, 
it could be necessary/beneficial to
wget -r -l0 -A *.pdf,*.htm* -np URL

Hope that helps (and is correct ;) )
Jens


 In the docs I've seen on wget, I see that I can use wildcards to 
 download multiple files on ftp sites.  So using *.pdf would get me all 
 the pdfs in a directory.  It seems that this isn't possible with http 
 sites though.  For work I often have to download lots of pdfs when 
 there's new info I need, so is there any way to download multiple files 
 of the same type from an http web page?
 
 I'd like to be cc'd in replies to my post please as I'm not subscribed 
 to the mailing list.
 




RE: apt-get via Windows with wget

2004-02-02 Thread Jens Rösner
Hello Heiko!

 I added a wget-complete-stable.zip, if you want to link to a fixed url 
 use
 that, I'll update it whenever needed. Currently it is the same archive 
 as the wget-wget-1.9.1b-complete.zip .

Great! Thank you very much, Heiko. 
I think I'll use it on my wgetgui page as well! :)
But what would you prefer?
Until now, I linked to your main page. 
Would you mind if people short-cut this? 

[SSL-enabled / plain binaries]
 the ssl version. As long as the libraries are placed somewhere in the 
 path
 OR simply kept in the same directory where wget is the ssl version is 
 fine for everything after all.

I agree.  
What do you think about adding a latest-ssl-libraries.zip?

Kind regards
Jens





Re: apt-get via Windows with wget

2004-02-01 Thread Jens Rösner
Note:
Mail redirected from bug to normal wget list.


 H For getting Wget you might want to link directly to
 H ftp://ftp.sunsite.dk/projects/wget/windows/wget-1.9.1b-complete.zip,
 OK, but too bad there's no stable second link .../latest.zip so I
 don't have to update my web page to follow the link.
Yep, this would make things much easier for applications like yours.
However, I think the current wget version can do all you'll ever 
need for that purpose.

 Furthermore, they don't need SSL, but I don't see any 'diet'
 versions...
Right, Heiko is so kind to compile the SSL enabled wget binaries. 
If you need it without SSL, you would have to compile it yourself.
But since you don't have windows...

 H Oh, and the Windows users should preferrably be ones who know how to
 H run a command-line application, but I assume you've got that covered.
 Exactly not.  I recall being able to get to a little window where one
 enters a command... Anyway, can you give an example of all the steps
 needed to do wget -x -i fetch_list.txt -B
 http://debian.linux.org.tw/debian/pool/main/

Hm, why don't you do the following:
download wget and the ssl libraries, unzip them on a windows box 
(I know, you don't have one, but someone on this planet you know should have
one; 
I heard it is fairly widespread). 
Unzip them to a sensible directory like c:\wget\
add a startupdate.bat file to the directory
this file should read something like 
wget -x -i fetch_list.txt -B http://debian.linux.org.tw/debian/pool/main/
Now, pack everything into a zip again, preserving full folder info. 
(I always use Power Archiver 6.1, the last freeware version.)
now create a self-extracting archive from it.
Distribute the archive.exe to your buddies 
all they have to do is 
a) doubleclick on the archive
b) browse to c:\wget\ with Windows explorer and 
c) doubleclick on startupdate.bat
d) afterwards, do the CD writing

Thinking about it, you could distribute wget with the 
SSL and startupdate.bat file unzipped on a 1.44MB floppy disk.

CU
Jens

http://www.jensroesner.de/wgetgui/







Re: Spaces in directories/files are converted to '@' symbol.

2004-01-09 Thread Jens Rösner
Hi Tommy!

Does this option, first introduced in 1.9.1 (I think), help you:
--restrict-file-names=mode 
It controls file-name escaping. 
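A minimal sketch (the URL is a placeholder; if I read the manual correctly, the
modes include unix, windows and nocontrol) - try them and compare the resulting
file names:
wget --restrict-file-names=windows "http://example.com/some dir/file 1.html"
wget --restrict-file-names=nocontrol "http://example.com/some dir/file 1.html"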
I'll mail the complete extract from the manual to your private mail address.
You can download the current wget version from
http://www.sunsite.dk/wget/

CU
Jens




 I've notice that spaces in directories/files are automatically converted
 to  '@' character.  Is there any way to turn this option off?
 
 e.g.   template directory   =   [EMAIL PROTECTED] 
 
 
 Thanks,
 
 Tommy. 





Re: problem with LF/CR etc.

2003-11-20 Thread Jens Rösner
Hi!

 Do you propose that squashing newlines would break legitimate uses of
 unescaped newlines in links?  
I personally think that this is the main question.
If it doesn't break other things, implement squashing newlines 
as the default behaviour.

 Or are you arguing on principle that
 such practices are too heinous to cater to by default?  
Well, if I may speak openly, 
I don't think wget should be a moralist here.
If the fix is easy to implement and doesn't break things, let's do it. 
After all, ignoring these links does not punish the culprit (the HTML coder)
but the innocent user, who expects that wget will download the site.

 IMHO we should either cater to this by default or not at all.
Agreed.
But if (for whatever reasons) an option is unavoidable, I would 
suggest something like
--relax_html_rules #integer
where integer is a bit-code (I hope that's the right term). 
For example
0 = off
1 (2^0)= smart comment checking
2 (2^1)= smart line-break checking
4 (2^2)= option to come
8 (2^3)= another option to come
So specifiying
wget -m --relax_html_rules 0 URL
would ensure strict HTML obeyance, while
wget -m --relax_html_rules 15 URL
would relax the above mentioned rules
By using this bit mask, one integer is able 
to represent all combinations of relaxations 
by summing up the individual options.
One could even think about 
wget -m --relax_html_rules inf URL
to ensure that _all_ rules are relaxed, 
to be upward compatible with future wget versions.
Whether 
--relax_html_rules inf
or
--relax_html_rules 0
or 
--relax_html_rules another-combination-that-makes-most-sense
should be default, is up to negotiation.
However, I would vote for complete relaxation.

I hope that made a bit of sense
Jens








Re: How to send line breaks for textarea tag with wget

2003-11-16 Thread Jens Rösner
Hi Jing-Shin!

 Thanks for the pointers. Where can I get a version that support
 the --post-data option? My newest version is 1.8.2, but it doesn't
 have this option. -JS

Current version is 1.9.1.
The wget site lists download options on 
http://wget.sunsite.dk/#downloading

Good luck
Jens





Re: Web page source using wget?

2003-10-13 Thread Jens Rösner
Hi Suhas!

Well, I am by no means an expert, but I think that wget 
closes the connection after the first retrieval. 
The SSL server realizes this and decides that wget has no right to log in 
for the second retrieval, even though the cookie is there.
I think that is a correct behaviour for a secure server, isn't it?

Does this make sense? 
Jens


 A slight correction the first wget should read:
 
 wget --save-cookies=cookies.txt 
 http://customer.website.com/supplyweb/general/default.asp?UserAccount=USER&AccessCode=PASSWORD&Locale=en-us&TimeZone=EST:-300&action-Submit=Login
 
 I tried this link in IE, but it it comes back to the same login screen. 
 No errors messages are displayed at this point. Am I missing something? 
 I have attached the source for the login page.
 
 Thanks,
 Suhas
 
 
 - Original Message - 
 From: Suhas Tembe [EMAIL PROTECTED]
 To: Hrvoje Niksic [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]
 Sent: Monday, October 13, 2003 11:53 AM
 Subject: Re: Web page source using wget?
 
 
 I tried, but it doesn't seem to have worked. This what I did:
 
 wget --save-cookies=cookies.txt 
 http://customer.website.com?UserAccount=USER&AccessCode=PASSWORD&Locale=English (United States)&TimeZone=(GMT-5:00) Eastern Standard Time (USA & Canada)&action-Submit=Login
 
 wget --load-cookies=cookies.txt 
 http://customer.website.com/supplyweb/smi/inventorystatus.asp?cboSupplier=4541-134289&status=all&action-select=Query 
 --http-user=4542-134289
 
 After executing the above two lines, it creates two files: 
 1). [EMAIL PROTECTED] :  I can see that 
 this file contains a message (among other things): Your session has 
 expired due to a period of inactivity
 2). [EMAIL PROTECTED]
 
 Thanks,
 Suhas
 
 
 - Original Message - 
 From: Hrvoje Niksic [EMAIL PROTECTED]
 To: Suhas Tembe [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]
 Sent: Monday, October 13, 2003 11:37 AM
 Subject: Re: Web page source using wget?
 
 
  Suhas Tembe [EMAIL PROTECTED] writes:
  
   There are two steps involved:
   1). Log in to the customer's web site. I was able to create the 
 following link after I looked at the form section in the source as 
 explained to me earlier by Hrvoje.
   wget 
  http://customer.website.com?UserAccount=USER&AccessCode=PASSWORD&Locale=English (United States)&TimeZone=(GMT-5:00) Eastern Standard Time (USA & Canada)&action-Submit=Login
  
  Did you add --save-cookies=FILE?  By default Wget will use cookies,
  but will not save them to an external file and they will therefore be
  lost.
  
   2). Execute: wget
   
  http://customer.website.com/InventoryStatus.asp?cboSupplier=4541-134289&status=all&action-select=Query
  
  For this step, add --load-cookies=FILE, where FILE is the same file
  you specified to --save-cookies above.
 
 




Re: Web page source using wget?

2003-10-13 Thread Jens Rösner
Hi Hrvoje!

  retrieval, eventhough the cookie is there.  I think that is a
  correct behaviour for a secure server, isn't it?
 Why would it be correct?  
Sorry, I seem to have been misled by my own (limited) experience:
From the few secure sites I use, most will not let you 
log in again after you closed and restarted your browser or redialed 
your connection. That's what reminded me of Suhas' problem.

 Even if it were the case, you could tell Wget to use the same
 connection, like this:
 wget http://URL1... http://URL2...
Right, I always forget that, thanks!

Cya
Jens






Re: Error: wget for Windows.

2003-10-08 Thread Jens Rösner
Hi Suhas!

 I am trying to use wget for Windows  get this message: The ordinal 508 
 could not be located in the dynamic link library LIBEAY32.dll.

You are very probably using the wrong version of the SSL files.
Take a look at 
http://xoomer.virgilio.it/hherold/
Herold has nicely rearranged the links to 
wget binaries and the SSL binaries.
As you can see, different wget versions need 
different SSL versions.
Just download the matching SSL, 
everything else should then be easy :)

Jens



 
 This is the command I am using:
 wget http://www.website.com --http-user=username 
 --http-passwd=password
 
 I have the LIBEAY32.dll file in the same folder as the wget. What could 
 be wrong?
 
 Thanks in advance.
 Suhas
 




Re: no-clobber add more suffix

2003-10-06 Thread Jens Rösner
Hi Sergey!

-nc does not only apply to .htm(l) files.
All files are considered.
At least in all wget versions I know of.

I cannot comment on your suggestion to restrict -nc to a 
user-specified list of file types.
I personally don't need it, but I could imagine certain situations 
were this could indeed be helpful. 
Hopefully someone with more knowledge than me 
can elaborate a bit more on this :)

CU
Jens



 `--no-clobber' is very usfull option, but i retrive document not only with
 .html/.htm suffix.
 
 Make addition option that like -A/-R define all allowed/rejected rules
 for -nc option.
 




Bug in Windows binary?

2003-10-05 Thread Jens Rösner
Hi!

I downloaded 
wget 1.9 beta 2003/09/29 from Heiko
http://xoomer.virgilio.it/hherold/
along with the SSL binaries.
wget --help 
and 
wget --version 
will work, but 
any downloading like 
wget http://www.google.com
will immediately fail.
The debug output is very brief as well:

wget -d http://www.google.com
DEBUG output created by Wget 1.9-beta on Windows.

set_sleep_mode(): mode 0x8001, rc 0x8000

I disabled my wgetrc as well and the output was exactly the same.

I then tested 
wget 1.9 beta 2003/09/18 (earlier build!)
from the same place and it works smoothly.

Can anyone reproduce this bug?
System is Win2000, latest Service Pack installed.

Thanks for your assistance and sorry if I missed an 
earlier report of this bug, I know a lot has been done over the last weeks 
and I may have missed something.
Jens






Re: The Dynamic link Library LIBEAY32.dll

2003-01-14 Thread Jens Rösner
Hi Stacee, 

a quick cut'n'paste into google revealed the following page:
http://curl.haxx.se/mail/archive-2001-06/0017.html

Hope that helps
Jens


 Stacee Kinney wrote:
 
 Hello,
 
 I installed Wget.exe on a Windows 2000 system and has setup Wget.exe
 to run a maintenance file on an hourly bases. However, I am getting
 the following error.
 
 wget.exe - Unable to Locate DLL
 
 The dynamic link library LIBEAY32.dll could not be found in the
 specified path
 
C:\WINNT;,;C:\WINNT\System32;C:\WINNT\system;c:\WINNT;C:\Perl\bin;C\WINNT\system32;C;WINNT;C:\WINNT\system32\WBEM.
 
 I am not at all knowledgeable about Wget and just tried to follow
 instructions for its installation to run the maintenance program.
 Could you please help me with this problem and the DLL file Wget is
 looking for?
 
 Regards
 Stacee



Re: wget -m imply -np?

2002-12-30 Thread Jens Rösner
Hi Karl!

From my POV, the current set-up is the best solution. 
Of course, I am also no developer, but an avid user.
Sometimes you just don't know the structure of the website 
in advance, so using -m as a trouble-free no-brainer 
will get you the complete site neatly done with timestamps.
BTW, -m is an abbreviation:
-m = -r -l0 -N IIRC
If you _know_ that you don't want to grab upwards, 
just add -np and you're done. Otherwise someone 
would have to come up with a switch to disable the default -np 
that you suggested or the user would have to rely on 
the single options that -m is made of - hassle without benefit.
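A minimal sketch (example.com is hypothetical):
wget -m -np http://www.example.com/docs/
mirrors everything at or below /docs/ with timestamping, and nothing above it.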
You furthermore said:
"generally, that leads to the whole Internet"
That is wrong, if I understand you correctly. 
Wget will always stay at the start-host, except when you 
allow different hosts via a smart combination of 
-D -H -I 
switches.

H2H
Jens


Karl Berry wrote:
 
 I wonder if it would make sense for wget -m (--mirror) to imply -np
 (--no-parent).  I know that I, at least, have no interest in ever
 mirroring anything above the target url(s) -- generally, that leads to
 the whole Internet.  An option to explicitly include the parents could
 be added.
 
 Just a suggestion.  Thanks for the great software.
 
 [EMAIL PROTECTED]



Re: Improvement: Input file Option

2002-10-08 Thread Jens Rösner

Hi Pi!

Copied straight from the wget.hlp:

#

-i file
--input-file=file

Read URLs from file, in which case no URLs need to be on the command
line.  If there are URLs both on the command line and in an input file,
those on the command lines will be the first ones to be retrieved.  The
file need not be an HTML document (but no harm if it is)--it is enough
if the URLs are just listed sequentially.

However, if you specify --force-html, the document will be regarded as
html.  In that case you may have problems with relative links, which you
can solve either by adding <base href="url"> to the documents or by
specifying --base=url on the command line.

-F
--force-html

When input is read from a file, force it to be treated as an HTML file. 
This enables you to retrieve relative links from existing HTML files on
your local disk, by adding <base href="url"> to HTML, or using the
--base command-line option.

-B URL
--base=URL

When used in conjunction with -F, prepends URL to relative links in the
file specified by -i.

#
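A minimal sketch combining these options (the file name and base URL are
placeholders):
wget -F -B http://www.example.com/ -i saved-page.html
This reads the links from the local saved-page.html and resolves relative ones
against the given base URL.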

I think that should help, or I am missing your point.

CU
Jens


Thomas Otto wrote:
 
 Hi!
 
 I miss an option to use wget with a local html file that I have
 downloaded and maybe already edited. Wget should take this file plus the
 option where this file originally came from and take this file instead
 of the first document it gets after connecting.
 
-Thomas



Re: -p is not respected when used with --no-parent

2002-09-20 Thread Jens Rösner

Hi Dominic!

Since wget 1.8, the following should be the case:
*
*** When in page-requisites (-p) mode, no-parent (-np) is ignored when
retrieving for inline images, stylesheets, and other documents needed
to display the page.
**
(Taken from the included news file of wget 1.8.1)

However, I remember that I once had the same problem: 
-p -np would only get page requisites at or below the current
directory.
I currently run wget 1.9-beta and haven't seen the problem yet.

CU
Jens
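
With a 1.8+ version, the command line quoted below should therefore work
as intended, e.g.:

  wget -p -r -l0 -np -A htm,html,png,gif,jpg,jpeg http://java.sun.com/products/jlf/ed2/book/index.html

-np still limits the recursion, but (per the NEWS entry quoted above) it no
longer blocks inline images and other requisites that live above that
directory.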


 Dominic Chambers wrote:
 
 Hi again,
 
 I just noticed that one of the inline images on one of the jobs I did
 was not included. I looked into this, and it was because it was
 outside the scope that I had asked to remain within using --no-parent.
 So I ran the job again, using the -p option to ensure that I kept
 to the right pages but got all the page requisites regardless.
 
 However, this had no effect, and I therefore assume that this option
 is not compatible with the --no-parent option:
 
 wget -p -r -l0 -A htm,html,png,gif,jpg,jpeg
 --no-parent http://java.sun.com/products/jlf/ed2/book/index.html
 
 Hope that helps, Dominic.



Re: user-agent string

2002-09-02 Thread Jens Rösner

Hi Jakub!

But I get the same files as when running this command without the
user-agent
string.

What is wrong with the files you get?
Do you not get all the files?
Many servers (sites) do not care which 
user-agent accesses them, so the files will not differ.
If you know that you don't get all the files (or get the wrong ones), 
it may be that you should ignore robots via the 
wgetrc command
robots = on/off
or that you need a special referrer if you want 
to start in the middle of the site.

CU
Jens
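
For example (example.com stands in for the real site), ignoring robots and
setting the agent in one go could look like:

  wget -m -e robots=off --user-agent="Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" http://www.example.com/

-e robots=off has the same effect as putting robots = off into wgetrc.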


 (Jakub Grosman) wrote:
 
 Hi all,
 I have been using wget for a long time and it is a really great utility.
 I run wget 1.8.22 on redhat 7.3 and my problem concerns the user-agent
 string.
 I run this command:
 wget -m --user-agent="Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" -l0 -H
 -Dsite -nd -np -Pdirectory http://site
 
 But I get the same files as when running this command without the user-agent
 string.
 Could someone explain to me what I am doing wrong?
 
 Thanks
 Jakub



Re: getting the correct links

2002-08-28 Thread Jens Rösner

Hi Chris!

Using the -k switch (convert the links in downloaded files 
to point to the local copies) should do what you want.

CU
Jens
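
A minimal example (example.com standing in for the hosting site):

  wget -r -l0 -k http://www.example.com/

After the download finishes, -k rewrites the links in the saved pages so
they point at the local copies instead of the remote server.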


Christopher Stone wrote:
 
 Hi.
 
 I am new to wget, and although it doesn't seem too
 difficult, I am unable to get the desired results that
 I am looking for.
 
 I currently have a web site hosted by a web hosting
 site. I would like to take this web site as is and
 bring it to my local web server. Obviously, the ip
 address and all the links point back to this web
 server.
 
 When I ran wget and sucked the site to my local box,
 it pulled all the pages down and the index page comes
 up fine, but when I click on a link, it goes back to
 the remote server.
 
 What switch(s) do I use, so that when I pull the pages
 to my box, that all of the links are changed also?
 
 Thank you.
 
 Chris
 
 please cc to me, as i am not a list subscriber.
 
 __
 Do You Yahoo!?
 Yahoo! Finance - Get real-time stock quotes
 http://finance.yahoo.com



-r -R.exe downloads exe file

2002-08-04 Thread Jens Rösner

Hi!

With wget 1.9-beta, wget will download .exe files 
although they should be rejected (-r -R.exe).
After the download, wget removes the local file.
I understand that html files are downloaded even if -R.html,.htm 
is specified as the links that may be included in them 
have to be parsed.
However, I think this makes no sense for .exe files 
and wanted to ask whether this behaviour of wget 
could perhaps be reconsidered.

Kind regards
Jens



Syntax for exclude_directories?

2002-07-27 Thread Jens Rösner

Hi guys!

Could someone please explain to me how to use 
-X (exclude_directories; --exclude-directories) 
correctly on Windows machines?

I tried 
wget -X/html -x -k -r -l0 http://home.arcor.de/???/versuch.html
wget -Xhtml -x -k -r -l0 http://home.arcor.de/???/versuch.html
wget -X html -x -k -r -l0 http://home.arcor.de/???/versuch.html
wget -X/html -x -k -r -l0 http://home.arcor.de/???/versuch.html
wget -X'/html' -x -k -r -l0 http://home.arcor.de/???/versuch.html

All will traverse into the http://home.arcor.de/???/html folder.
I also tried the wgetrc version with either quotes, slashes or 
combinations.

I had a look into the wget documentation html file, but could not find
my mistake.

I tried both wget 1.5 and 1.9-beta.

Kind regards
Jens
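
For reference, a sketch based on the manual's description of
--exclude-directories: -X compares its entries against the directory part
of each URL starting at the server root, so the subdirectory has to be
given with its full path (the ??? is left exactly as in the URLs above);
versions that accept wildcards in -X also allow the pattern form:

  wget -x -k -r -l0 -X /???/html http://home.arcor.de/???/versuch.html
  wget -x -k -r -l0 -X "*/html" http://home.arcor.de/???/versuch.html

This is untested here, so treat it as a starting point rather than a
confirmed recipe.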



Re: robots.txt

2002-06-10 Thread Jens Rösner

Hi Pike and the list!


   or your indexing mech might loop on it, or crash the server. who knows.
  I have yet to find a site which forces wGet into a loop as you said.
 I have a few. And I have a few java servers on linux that really hog the
 machine when requested. They're up for testing.
Ok, I am sorry, I always thought that when something like this happens, 
the person causing the loop would suffer most and therefore be punished 
directly. 
I did not imagine that the server could really go down in such a 
situation.

  If the robots.txt said that no user-agent may access the page, you would
  be right.
 right. or if it says some page is only ment for one specific bot.
 these things have a reason.
Yes, you are right. I concluded from my own experience that most
robots.txt say:
If you are a browser or google (and so on), go ahead, if you are
anything else, stop.
Allowing a certain bot to a bot-specific page was outside my scope.

CU
Jens



Re: speed units

2002-06-10 Thread Jens Rösner

Hi Joonas!

There was a lengthy discussion about this topic a few months ago.
I am pretty sure (=I hope) that no one wants to revamp this (again).
I personally think that if people start regarding this as 
a bug, wget is damn close to absolute perfection.
(Yes, I know, perfection is by definition complete, so that is a
pleonasm.)
If you are really interested, do 
a) a search in Google
b) a search in the wget Mailing list archive

CU
Jens


Joonas Kortesalmi wrote:
 
 Wget seems to report speeds with the wrong units. It uses for example KB/s
 rather than kB/s, which would be correct. Any possibility to fix that? :)
 
 K = Kelvin
 k = Kilo
 
 Probably you want to use a small k with download speeds, right?
 
 Thanks a lot anyways for such a great tool!
 Keep up the good work, free software rules the world!
 
 --
 Joonas Kortesalmi [EMAIL PROTECTED]



Re: robots.txt

2002-06-09 Thread Jens Rösner

Hi!

  Why not just put robots=off in your .wgetrc?
 hey hey
 the robots.txt didn't just appear in the website; someone's
 put it there and thought about it. what's in there has a good reason.
Well, from my own experience, the #1 reason is that webmasters 
do not want webgrabbers of any kind to download the site, in order to 
force the visitor to interactively browse the site and thus click 
advertisement banners.

 The only reason is 
 you might be indexing old, doubled or invalid data, 
That is cute, someone who believes that all people on the 
internet do what they do to make life easier for everyone.
If you had said "one reason is" or even "one reason might be", 
I would not be that cynical, sorry.

 or your indexing mech might loop on it, or crash the server. who knows.
I have yet to find a site which forces wGet into a loop as you said.
Others on the list can probably estimate the theoretical likelihood of
such events.

 ask the webmaster or sysadmin before you 'hack' the site.
LOL!
"hack"! Please provide a serious definition of "to hack" that includes 
automatically downloading pages that could be downloaded with any
interactive web-browser.
If the robots.txt said that no user-agent may access the page, you would
be right.
But then: How would anyone know of the existence of this page then?
[rant]
Then again, maybe the page has a high percentage of cgi, JavaScript,
iFrames and thus only allows 
IE 6.0.123b to access the site. Then wget could maybe slow down the
server, especially as it is 
probably a w-ows box : But I ask: Is this a bad thing?
Whuahaha!
[/rant]

Ok, sorry for my sarcasm, but I think you overestimate the benefits of
robots.txt for mankind.

CU
Jens



Re: A couple of wget newbie questions (proxy server and multiplefiles)

2002-05-16 Thread Jens Rösner

Hi Dale!

 Do I have to do 4 separate logins passing my username/password each time?
 If not, how do I list the 4 separate directories I need to pull files from
 without performing 4 logins?
You should be able to put the four URLs into a .txt file and then 
use this text file with -i filename.txt
I use Windows, so if you run Linux your file extension may differ
(right?)
Also please note that I have used wget neither on a password-protected
site 
nor on ftp, so I may be wrong here.
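
A sketch of what such a list file could look like for a password-protected
FTP server (host, user, password and directory names are made up):

  ftp://user:password@ftp.example.com/dir1/
  ftp://user:password@ftp.example.com/dir2/
  ftp://user:password@ftp.example.com/dir3/
  ftp://user:password@ftp.example.com/dir4/

Saved as files.txt, it would then be fetched with something like
wget -r -i files.txt, so the login data travels in the URLs instead of
being typed four times.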

 We are behind a firewall, I can't see how to pass the proxy server IP
 address to wget. And unfortunately, our IT group will not open up a hole for
 me to pull these files.
No problem, use
proxy = on
http_proxy = IP/URL
ftp_proxy = IP/URL
proxy_user = username
proxy_password = proxypass
in your wgetrc.
This is also included in the wget manual, 
but I, too, was too dumb to find it. ;)

CU
Jens



Re: Feature request

2002-04-24 Thread Jens Rösner

Hi Frederic!

 I'd like to know if there is a simple way to 'mirror' only the images
 from a gallery (i.e. without thumbnails).
[...]
I won't address the options you suggested, because I think they should 
be evaluated by a developer/coder.
However, as I often download galleries (and have some myself), I might be
able to give you a few hints:
Restricting files to be downloaded by
a) file-name
b) the directory they are in

To a):
-R*.gif,*tn*,*thumb*,*_jpg*,*small*
you get the picture I guess (pun not intended, but funny nevertheless). 
Works quite well. 

To b):
--reject-dir *thumb*

(I am not sure about the correct spelling/syntax, I currently have neither
wget nor winzip -or similar- on this machine, sorry!)
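
Putting a) and b) together, a gallery grab might look roughly like this
(example.com/gallery is a placeholder, and the -X pattern assumes a wget
version that accepts wildcards there):

  wget -r -l0 -np -R "*.gif,*tn*,*thumb*,*small*" -X "*thumb*" http://www.example.com/gallery/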

 It also seems these options are incompatible:
 --continue with --recursive
 This could be useful, imho.
IIRC, you are correct, but this is intentional. (right?)
You probably think of the case where during a recursive download, the
connection breaks and a large file is only partially downloaded.
I could imagine that this might be useful.
However, I see a problem when using timestamps, which normally require 
that a file be re-downloaded if the local and server sizes do not match 
or the date on the server is newer. 
How should wget decide whether it needs to re-get or continue the file?
You could probably do smart guessing, but the chance of false decisions
persists.
As a matter of fact, the problem also exists when using --continue on a
single file, but then it is the user's decision and the story is therefore
quite different (I think).

CU
Jens


-- 
GMX - Die Kommunikationsplattform im Internet.
http://www.gmx.net




Re: Feature request

2002-04-24 Thread Jens Rösner

Hi Brix!

  It also seems these options are incompatible:
  --continue with --recursive
[...]
 JR How should wget decide if it needs to re-get or continue the file?
[...]
Brix: 
 Not wanting to repeat my post from a few days ago (but doing so 
 nevertheless): the one way without checking all files online is to have 
 wget write the downloaded file into a temp file (like *.wg! or something) 
 and rename it only after completing the download. 

Sorry for not paying attention. 
It sounds like a good idea :)
But I am no coder...

CU
Jens



Re: Validating cookie domains

2002-04-19 Thread Jens Rösner

Hi Ian!

  This is amazingly stupid.
 It seems to make more sense if you subtract one from the number of
 periods.
That was what I thought, too.

 Could you assume that all two-letter TLDs are country-code TLDs and
 require one more period than other TLDs (which are presumably at
 least three characters long)?
No, I don't think so.
Take my sites, for example
http://www.ichwillbagger-ladida.de
http://ichwillbagger-ladida.de
(remove the -ladida)
both work.

Or - as another phenomenon I found - take 
http://www.uvex-ladida.de
and 
http://uvex-ladida.de
(remove the -ladida)
They are different...

I hope I did not miss your point.

CU
Jens

-- 
GMX - Die Kommunikationsplattform im Internet.
http://www.gmx.net




Re: LAN with Proxy, no Router

2002-04-10 Thread Jens Rösner

Hi Ian!

  wgetrc works fine under windows (always has)
  however, .wgetrc is not possible, but 
  maybe . does mean in root dir under Unix?
 
 The code does different stuff for Windows. Instead of looking for
 '.wgetrc' in the user's home directory, it looks for a file called
 'wget.ini' in the directory that contains the executable. This does
 not seemed to be mentioned anywhere in the documentation.

From my own experience, you are right concerning the location wget searches
for wgetrc on Windows.
However, a file simply called wgetrc is sufficient.
In fact, wgetrc.ini will not be found and thus 
its options will be ignored.
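
In other words, on Windows a plain file named wgetrc next to wget.exe is
enough. If you want to keep it somewhere else, recent versions also check
the WGETRC environment variable for its location (C:\etc\wgetrc is just an
example path):

  set WGETRC=C:\etc\wgetrc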
 
CU
Jens


-- 
GMX - Die Kommunikationsplattform im Internet.
http://www.gmx.net




Re: feature wish: switch to disable robots.txt usage

2002-04-10 Thread Jens Rösner

Hi!

Just to be complete, thanks to Hrvoje's tip, 
I was able to find 

-e command
--execute command
Execute command as if it were a part of .wgetrc (see Startup File.). 
A command thus invoked will be executed after the 
commands in .wgetrc, thus taking precedence over them.

I always wondered about that. *sigh*
I can now think about changing my wgetgui in this aspect :)

Thanks again
Jens


Hrvoje Niksic wrote:
 
 Noel Koethe [EMAIL PROTECTED] writes:
 
  Ok got it. But it is possible to get this option as a switch for
  using it on the command line?
 
 Yes, like this:
 
 wget -erobots=off ...



LAN with Proxy, no Router

2002-04-09 Thread Jens Rösner

Hi!

I recently managed to get my big machine online using a two PC 
(Windows boxes) LAN. 
A PI is the server, running both ZoneAlarm and Jana under Win98.
The first one is a firewall, the second one a proxy programme.
On my client, an Athlon 1800+ with Windows 2000 
I want to work with wget and download files over http from the www.

For Netscape, I need to specify the LAN IP of the server as Proxy
address.
Setting up LeechFTP works similarly, IE is also set up (all three work).
But wget does not work the way I tried.
I just basically started it, it failed (of course) 
and I searched the wget help and the www with google.
However, the only thing that looks remotely like what I need is

''
-Y on/off
--proxy=on/off

Turn proxy support on or off.  The proxy is on by default 
if the appropriate environmental variable is defined.
''

Could someone please tell me what 
"the appropriate environmental variable" is, 
how I change it in Windows, 
or what else I need to do?

I'd expect something like
--proxy=on/off
--proxy-address
--proxy-user
--proxy-passwd
as a collection of proxy-related commands.
All except --proxy-address=IP exist, so it is apparently not necessary.

Kind regards
Jens
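
For what it's worth, the environment variables in question are http_proxy
and ftp_proxy; on Windows they could be set like this before calling wget
(192.168.0.1:3128 is a placeholder for the Jana box's LAN address and
proxy port):

  set http_proxy=http://192.168.0.1:3128/
  set ftp_proxy=http://192.168.0.1:3128/
  wget http://www.example.com/somefile.zip

The same values can also go into wgetrc as http_proxy = ... and
ftp_proxy = ... lines.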



Re: wget usage

2002-04-05 Thread Jens Rösner

Hi Gérard!

I think you should have a look at the -p option.
It stands for page requisites and should do exactly what you want.
If I am not mistaken, -p was introduced in wget 1.8 
and improved for 1.8.1 (the current version).
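
A minimal example (example.com is a placeholder):

  wget -p http://www.example.com/page.html

This fetches page.html plus the images, stylesheets and other inline
elements it references.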

CU
Jens

 I'd like to download a html file with its embedded
 elements (e.g. .gif files).

[PS: CC changed to the normal wget list]



Re: option changed: -nh - -nH

2002-04-03 Thread Jens Rösner

Hi Noèl!

-nh
and
-nH
are totally different.

from wget 1.7.1 (I think the last version to offer both):
`-nh'
`--no-host-lookup'
 Disable the time-consuming DNS lookup of almost all hosts (*note
 Host Checking::).

`-nH'
`--no-host-directories'
 Disable generation of host-prefixed directories.  By default,
 invoking Wget with `-r http://fly.srk.fer.hr/' will create a
 structure of directories beginning with `fly.srk.fer.hr/'.  This
 option disables such behavior.

For wget 1.8.x -nh became the default behavior.
Switching back to host-look-up is not possible.
I already complained that many old scripts now break and suggested 
that entering -nh at the command line would 
either be completely ignored or the user would be 
informed and wget executed nevertheless.
Apparently this was not regarded as useful.

CU
Jens
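
For completeness, a small -nH example (example.com is a placeholder):

  wget -r -nH http://www.example.com/docs/

saves into docs/ directly instead of www.example.com/docs/.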


 
 The option --no-host-directories
 changed from -nh to -nH (v1.8.1).
 
 Is there a reason for this?
 It breaks a lot of scripts when upgrading,
 I think.
 
 Could this be changed back to -nh?
 
 Thank you.
 
 -- 
   Noèl Köthe
 

-- 
GMX - Die Kommunikationsplattform im Internet.
http://www.gmx.net




Re: cuj.com file retrieving fails -why?

2002-04-03 Thread Jens Rösner

Hello Markus!

This is not a bug (I reckon) and should therefore have been sent to 
the normal wget list.

Using both wget 1.7.1 and 1.8.1 on Windows the file is 
downloaded with 

wget -d -U "Mozilla/5.0 (compatible; Konqueror/2.2.1; Linux)" -r
http://www.cuj.com/images/resource/experts/alexandr.gif

as well as with

wget http://www.cuj.com/images/resource/experts/alexandr.gif

So, I do not know what your problem is, but it is neither wget's nor cuj's
fault, AFAICT.

CU
Jens


 
 This problem is independent on whether a proxy is used or not:
 The download hangs, though I can read the content using konqueror.
 So what do cuj people do to inhibit automatic download and how can
 I circumvent it?
 
 
 wget --proxy=off -d -U "Mozilla/5.0 (compatible; Konqueror/2.2.1; Linux)"
 -r http://www.cuj.com/images/resource/experts/alexandr.gif
 DEBUG output created by Wget 1.7 on linux.
 
 parseurl ("http://www.cuj.com/images/resource/experts/alexandr.gif") ->
 host www.cuj.com -> opath images/resource/experts/alexandr.gif -> dir
 images/resource/experts -> file alexandr.gif -> ndir
 images/resource/experts
 newpath: /images/resource/experts/alexandr.gif
 Checking for www.cuj.com in host_name_address_map.
 Checking for www.cuj.com in host_slave_master_map.
 First time I hear about www.cuj.com by that name; looking it up.
 Caching www.cuj.com - 66.35.216.85
 Checking again for www.cuj.com in host_slave_master_map.
 --14:32:35--  http://www.cuj.com/images/resource/experts/alexandr.gif
=> `www.cuj.com/images/resource/experts/alexandr.gif'
 Connecting to www.cuj.com:80... Found www.cuj.com in
 host_name_address_map: 66.35.216.85
 Created fd 3.
 connected!
 ---request begin---
 GET /images/resource/experts/alexandr.gif HTTP/1.0
 User-Agent: Mozilla/5.0 (compatible; Konqueror/2.2.1; Linux)
 Host: www.cuj.com
 Accept: */*
 Connection: Keep-Alive
 
 ---request end---
 HTTP request sent, awaiting response...
 
 nothing happens 
 
 
 Markus
 

-- 
GMX - Die Kommunikationsplattform im Internet.
http://www.gmx.net




Re: spanning hosts: 2 Problems

2002-03-28 Thread Jens Rösner

Hi again, Ian and fellow wgeteers!

 A debug log will be useful if you can produce one. 
Sure I (or wget) can and did.
It is 60kB of text. Zipping? Attaching?

 Also note that if receive cookies that expire around 2038 with
 debugging on, the Windows version of Wget will crash! (This is a
 known bug with a known fix, but not yet finalised in CVS.)
Funny you mention that! 
I came across a crash caused by a cookie 
two days ago. I disabled cookies and it worked.
Should have traced this a bit more.

  I just installed 1.7.1, which also works breadth-first.
 (I think you mean depth-first.) 
*doh* /slaps forehead
Of course, thanks.

 used depth-first retrieval. There are advantages and disadvantages
 with both types of retrieval.
I understand, I followed (but not totally understood) 
the discussion back then.

  Of course, this is possible.
  I just had hoped that by combining
  -F -i url.html
  with domain acceptance would save me a lot of time.
 
 Oh, I think I see what your first complaint is now. I initially
 assumed that your local html file was being served by a local HTTP
 server rather than being fed to the -F -i options. Is your complaint
 really that URLs supplied on the command line or via the
 -i option are not subjected to the acceptance/rejection rules? That
 does indeed seem to be the current behavior, but there is no
 particular reason why we couldn't apply the tests to these URLs as
 well as the URLs obtained through recursion.

Well, you are confusing me a bit ;}
Assume a file like

<html>
<body>
<a href="http://www.audistory-nospam.com">1</a>
<a href="http://www.audistory-nospam.de">2</a>
<a href="http://www.audi100-online-nospam.de">3</a>
<a href="http://www.kolaschnik-nospam.de">4</a>
</body>
</html>

and a command line like

wget -nc -x -r -l0 -t10 -H -Dstory.de,audi -o example.log -k -d
-R.gif,.exe,*tn*,*thumb*,*small* -F -i example.html

Result with 1.8.1 and 1.7.1 with -nh: 
audistory.com: Only index.html
audistory.de: Everything
audi100-online: only the first page 
kolaschnik.de: only the first page

What I would have liked and expected:
audistory.com: Everything
audistory.de: Everything
audi100-online: Everything
kolaschnik.de: nothing

Independent of the question how the string "audi" 
should be matched within the URL, I think rejected URLs 
should neither be parsed nor retrieved.

I hope I could articulate what I wanted to say :)

CU
Jens



Re: (Fwd) Proposed new --unfollowed-links option for wget

2002-03-08 Thread Jens Rösner

Hi List!

As a non-wget-programmer I also think that this 
option may be very useful.
I'd be happy to see it in wget soon :)
Just thought to drop in some positive feedback :)

CU
Jens

-u,  --unfollowed-links=FILE  log unfollowed links to FILE.
 Nice. It sounds useful.



Re: maybe code from pavuk would help

2002-03-02 Thread Jens Rösner

Hi Noèl!

(message CC changed to normal wget list)

Rate-limiting has been possible since wget 1.7.1 or so, please correct me if
it was 1.8!
Requests for HTTP POST pop up occasionally, 
but as far as I am concerned, I don't need it and 
I think it is currently not in the scope of wget.
Filling out forms could probably be very useful for some users, I guess.
If it were possible without too much fuss, 
I would encourage this, even though I would not need it.
BTW: Could you elaborate a bit more on the "..." part of your mail?
BTW2: Why did you send this to the bug list? (insert multiple question
marks here)

CU
Jens

Noel Koethe wrote:
 
 Hello,
 
 I tested pavuk (http://www.pavuk.org/, GPL) and there are some features
 I miss in wget:
 
 -supports HTTP POST requests
 -can automaticaly fill forms from HTML documents and make POST or GET
  requestes based on user input and form content
 -you can limit transfer rate over network (speed throttling)
 ...
 
 Maybe there is some code which could be used in wget.:)
 So the wheel wouldn't invented twice.
 
 --
 Noèl Köthe



RE: Accept list

2002-02-28 Thread Jens Rösner

Hi Peter!
 
 I was using 1.5.3
 I am getting 1.8.1 now
Good idea, but

  --accept=patchdiag.xref,103566*,103603*,103612*,103618*
  <a href="patches/112502.readme">112502.readme</a><br>
  <a href="patches/112504-01.zip">112504-01.zip</a><br>
  <a href="patches/112504.readme">112504.readme</a><br>
  <a href="patches/112518-01.zip">112518-01.zip</a><br>
  <a href="patches/112518.readme">112518.readme</a><br>
[snip]
look at the file names you want, none of them includes 103*, they all
start with 112*
So, wget works absolutely ok, I think
Or am I missing something here?

CU
Jens

-- 
GMX - Die Kommunikationsplattform im Internet.
http://www.gmx.net




Re: How does -P work?

2002-01-14 Thread Jens Rösner

Hi Herold!

Thanks for the testing, I must admit, trying -nd did not occur to me :(

I already have implemented a \ to / conversion in my wgetgui, 
but forgot to strip the trailing / (as Hrvoje suggested) *doh*
Anyway, I would of course be happy to see a patch like you proposed, 
but I understand too little to judge where it belongs :}

CU
Jens

http://www.JensRoesner.de/wgetgui/


 Note: tests done on NT4. W9x probably would behave different (even
 worse).
 starting from (for example) c:, with d: being another writable disk of
 some kind, something like
 wget -nd -P d:/dir http://www.previnet.it
 does work as expected.
 wget -nd -P d:\dir http://www.previnet.it
 also does work as expected.
 wget -P d:\dir http://www.previnet.it
 did create a directory d:\@5Cdir and started from there, in other words
 the \ is converted by wget since it doesn't recognize it as a valid
 local directory separator.
 wget -P d:/dir http://www.previnet.it
 failed in a way or another for the impossibility to create the correct
 directory or use it if already present.
[snip]



Re: Suggestion on job size

2002-01-11 Thread Jens Rösner

Hi Fred!

First, I think this would rather belong in the normal wget list, 
as I cannot see a bug here.
Sorry to the bug tracers, I am posting to the normal wget List and
cc-ing Fred, 
hope that is ok.

To your first request: -Q (Quota) should do precisely what you want.
I used it with -k and it worked very well.
Or am I missing your point here?
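
Something along these lines, with example.com as a stand-in and the usual
k/m suffixes for -Q:

  wget -r -l0 -k -K -Q 600m http://www.example.com/

wget stops starting new downloads once the quota is exceeded; as mentioned
above, the -k conversion still ran fine in combination with it here.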

Your second wish is AFAIK not possible now.
Maybe in the future wget could write the record 
of downloaded files in the appropriate directory.
After exiting wget, this file could then be used 
to process all the files mentioned in it.
Just an idea, I would normally not think that 
this option is an often requested one.
HOWEVER: 
-K works (if I understand it correctly) on the fly, as it decides at
run time 
whether the server file is newer, whether a previously converted file exists,
and what to do.
So, only -k would work after the download, right?

CU
Jens

http://www.JensRoesner.de/wgetgui/

 It would be nice to have some way to limit the total size of any job, and
 have it exit gracefully upon reaching that size, by completing the -k -K
 process upon termination, so that what one has downloaded is useful.  A
 switch that would set the total size of all downloads --total-size=600MB
 would terminate the run when the total bytes downloaded reached 600 MB, and
 process the -k -K.  What one had already downloaded would then be properly
 linked for viewing.
 
 Probably more difficult would be a way of terminating the run manually
 (Ctrl-break??), but then being able to run the -k -K process on the
 already-downloaded files.
 
 Fred Holmes



How does -P work?

2002-01-05 Thread Jens Rösner

Hi!

Can I use -P (directory prefix) to save files in a user-determined
folder on another drive under Windows?
I tried -PC:\temp\ which does not work (I am starting from D:\)
Also -P..\test\ would not save into the dir above the current one.
So I changed the \ into / and it worked.
However, I still could not save to another drive with -Pc:/temp
Any way around this? Bug/Feature? Windows/Unix problem?

CU
Jens



-nh broken 1.8

2001-12-24 Thread Jens Rösner

Hi!

I already posted this on the normal wget list, to which I am subscribed.
Problem:
-nh does not work in the latest 1.8 Windows binary.
By not working I mean that it is not recognized as a valid parameter.
(-nh is no-host look-up and with it on, 
two domain names pointing to the same IP are treated as different)

I am not sure which version first had this problem, but 1.7 did not show
it.
I really would like to have this option back.
Does anyone know where it has gone?
Maybe it is on holiday?

CU
Jens

http://www.jensroesner.de/wgetgui/



Re: -nh broken 1.8

2001-12-24 Thread Jens Rösner

Hi Hrvoje!

  -nh does not work in 1.8 latest windows binary.
  By not working I mean that it is not recognized as a valid parameter.
  (-nh is no-host look-up and with it on,
  two domain names pointing to the same IP are treated as different)
 
 You no longer need `-nh' to get that kind of behavior: it is now the
 default.

Ok, then three questions:

1. Is there then now a way to turn off -nh?
So that wget does not distinguish between domain names of the same IP?
Or is this option irrelevant given the net's current structure?

2. Wouldn't it be a good idea to mention the 
deletion of the -nh option in a file? 
Or was it already mentioned and I am too blind/stupid?

3. on a different aspect:
All command lines with -nh that were created before 1.8 are now
non-functional, 
except for the old versions of course.
Would it be possible for new wget versions to simply ignore it, so that 
the same command lines still work with older versions too?
This would greatly enhance (forward) compatibility between different
versions, 
something I would regard as at least desirable.

CU
Jens



Re: -nh broken 1.8

2001-12-24 Thread Jens Rösner

Hi!

  2. Wouldn't it be a good idea to mention the 
  deletion of the -nh option in a file?
 
 Maybe.  What file do you have in mind?

First and foremost 
the news file, but I think it would also not be misplaced in 
wget.html and/or wget.hlp /.info (whatever it is called on Unix
systems).


  3. on a different aspect:
  All command lines with -nh that were created before 1.8 are now
  non-functional,
 Those command lines will need to be adjusted for the new Wget.  This
 is sad, but unavoidable.  Wget's command line options don't change
 every day, but they are not guaranteed to be cast in stone either.

I don't expect them to last forever, 
I just meant that simply ignoring -nh in wget 1.8 
would have been an easy way to avoid the hassle.
I am of course thinking of my wGetGUI, where "no host look-up" is an
option.
So, I now either have to explain to every user (who of course reads no
manual) 
that s/he should only use this option with an old wget version,
or I could simply delete the -nh option and say that it is not important
enough 
for all the users of old wget versions.
And then there is the problem if someone upgrades from an old wget to a
new 
one, but keeps his/her old copy of wgetgui, which now of course produces 
invalid 1.8 command lines :(

CU
Jens

http://www.jensroesner.de/wgetgui/



-nh -nH??

2001-12-23 Thread Jens Rösner

Hi wgeteers!

I noticed that -nh (no host look-up) seems to be gone in 1.8.1.
Is that right?
At first I thought, "Oh, you fool, it is -nH, you mixed it up!"
But, obviously, these are two different options.
I read the news file and the wget.hlp and wget.html but could not find
an answer.
I always thought that this option is quite important nowadays?!

Any help appreciated.

CU and a Merry Christmas
Jens



Re: referer question

2001-09-13 Thread Jens Rösner

Hi Vladi!

If you are using windows, you might try 
http://www.jensroesner.de/wgetgui/
it is a GUI for wGet written in VB 6.0.
If you click on the checkbox "identify as browser", wGetGUI
will create a command line like you want.
I use it and it works for me.
Hope this helps?

CU
Jens

Vladi wrote:
   is it possible (well I mean easy way:)) to make wget pass referer auto?
   I mean for every url that wget tries to fetch to pass hostname as
 referer.
[snip]

-- 
GMX - Die Kommunikationsplattform im Internet.
http://www.gmx.net




wGetGUI now under the GPL

2001-06-23 Thread Jens Rösner

Hi guys & gals! ;)

I just wanted to let you know that with v0.5, wGetGUI is now
released under the 
GPL, so if you feel like modifying or laughing at the source code, you
can now do so.

CU
Jens

http://www.jensroesner.de/wgetgui



Re: Download problem

2001-06-14 Thread Jens Rösner

Hi!

For all who cannot download the windows binaries,
they are now available through my site:
http://www.jensroesner.de/wgetgui/data/wget20010605-17b.zip
And while you are there, why not download wGetGUI v0.4?
:) http://www.jensroesner.de/wgetgui 
If Heiko is reading this:
May I just keep the file on my site?
And make it available to the public?

CU
Jens



(complete) GUI for those Windows users

2001-06-12 Thread Jens Rösner

Hi there!

First, let me introduce myself:
I am studying mechanical engineering and for a lecture I am learning
Visual Basic.
I was looking for a non-brain-dead way to get used to it and when a
friend of mine told me that he finds wGet too difficult to use I just
went *bing*
So, look what I have done:
http://www.jensroesner.de/wgetgui
Yes, it is a GUI and yes, it is not as powerful as the command line
execution.
I understand that most people who will read this are Unix/Linux users
and as that 
might have no use for my programme.
However, I would like to encourage you to send me any tips and bug
reports you find.
As I have not yet subscribed to the wget list, I would appreciate a CC
to my 
e-mail address.

Thanks!
Jens