-q and -S are incompatible

2003-10-06 Thread Dan Jacobson
-q and -S are incompatible; combining them should perhaps produce an error, and
the incompatibility should be noted in the docs.

BTW, there seems to be no way to get the -S output without the progress
indicator; -nv and -q suppress both.

P.S. One shouldn't have to confirm each bug submission; once should be enough.


Wget 1.9-beta4 is available for testing

2003-10-06 Thread Hrvoje Niksic
Several bugs fixed since beta3, including a fatal one on Windows.
Includes a working Windows implementation of run_with_timeout.

Get it from:

http://fly.srk.fer.hr/~hniksic/wget/wget-1.9-beta4.tar.gz



Using chunked transfer for HTTP requests?

2003-10-06 Thread Hrvoje Niksic
As I was writing the manual for `--post', I decided that I wasn't
happy with this part:

Please be aware that Wget needs to know the size of the POST data
in advance.  Therefore the argument to @code{--post-file} must be
a regular file; specifying a FIFO or something like
@file{/dev/stdin} won't work.

My first impulse was to bemoan Wget's antiquated HTTP code, which doesn't
understand "chunked" transfer.  But, come to think of it, even if Wget
used HTTP/1.1, I don't see how a client can send chunked requests and
interoperate with HTTP/1.0 servers.
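
For illustration (the server, path, and body below are made up), here is the
same POST framed both ways.  With Content-Length, which any HTTP/1.0 server
understands, the body size must be known before the request is sent:

    POST /form HTTP/1.0
    Content-Type: application/x-www-form-urlencoded
    Content-Length: 15

    id=foo&data=bar

With chunked transfer the body goes out as hex-sized pieces terminated by a
zero-length chunk, so the total size need not be known in advance, but only
HTTP/1.1 servers are required to understand that framing:

    POST /form HTTP/1.1
    Host: server.example
    Content-Type: application/x-www-form-urlencoded
    Transfer-Encoding: chunked

    f
    id=foo&data=bar
    0

Here `f' is the hex length (15) of the single chunk, and `0' marks the end of
the body.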

The thing is, to be certain that you can use chunked transfer, you
have to know you're dealing with an HTTP/1.1 server.  But you can't
know that until you receive a response.  And you don't get a response
until you've finished sending the request.  A chicken-and-egg problem!

Of course, once a response is received, we could remember that we're
dealing with an HTTP/1.1 server, but that information is all but
useless, since Wget's `--post' is typically used to POST information
to one URL and exit.

Is there a sane way to stream data to HTTP/1.0 servers that expect
POST?



Re: Web page "source" using wget?

2003-10-06 Thread Tony Lewis
Hrvoje Niksic wrote:

> Wget 1.9 can send POST data.

That slipped past me. :-) Just in case I'm not the only one in that boat...

From NEWS:

** It is now possible to specify that POST method be used for HTTP
requests.  For example, `wget --post-data="id=foo&data=bar" URL' will
send a POST request with the specified contents.

wget --help

HTTP options:

   --post-data=STRING    use the POST method; send STRING as the data.
   --post-file=FILE      use the POST method; send contents of FILE.
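
For instance (the file and server names here are just placeholders), a form
body prepared in advance in a regular file could be sent with something like:

   wget --post-file=postdata.txt http://server.example/process.cgi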



Re: Web page "source" using wget?

2003-10-06 Thread Hrvoje Niksic
"Suhas Tembe" <[EMAIL PROTECTED]> writes:

> Hello Everyone,
>
> I am new to this wget utility, so pardon my ignorance.. Here is a
> brief explanation of what I am currently doing:
>
> 1). I go to our customer's website every day & log in using a User Name & Password.
> 2). I click on 3 links before I get to the page I want.
> 3). I right-click on the page & choose "view source". It opens it up in Notepad.
> 4). I save the "source" to a file & subsequently perform various tasks on that file.
>
> As you can see, it is a manual process. What I would like to do is
> automate this process of obtaining the "source" of a page using
> wget. Is this possible? Maybe you can give me some suggestions.

It's possible; in fact, it's what Wget does in its most basic form.
Disregarding authentication, the recipe would be:

1) Write down the URL.

2) Type `wget URL' and you get the source of the page in a file named
   SOMETHING.html, where SOMETHING.html is the file name that the URL
   ends with.
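
For example (the URL is just a placeholder):

    wget http://www.example.com/reports/today.html

leaves the page source in the file `today.html' in the current directory.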

Of course, you will also have to specify the credentials to the page,
and Tony explained how to do that.



Re: Problems and suggestions

2003-10-06 Thread Hrvoje Niksic
Thanks for the report; I agree with your suggestions.  IIRC, the "no
such file or directory" is a kludge that comes from the inability of
the FTP code to distinguish between different kinds of errors in RETR.
In fact, Wget's FTP code is old and somewhat crufty, although it works
well in practice.  The problems will probably be fixed in a later
release, when I get around to revamping the backend implementations.

Patches that fix the problems in the current codebase are very likely
to be accepted.



Re: Web page "source" using wget?

2003-10-06 Thread Hrvoje Niksic
"Tony Lewis" <[EMAIL PROTECTED]> writes:

> wget
> http://www.custsite.com/some/page.html --http-user=USER --http-passwd=PASS
>
> If you supply your user ID and password via a web form, it will be
> tricky (if not impossible) because wget doesn't POST forms (unless
> someone added that option while I wasn't looking. :-)

Wget 1.9 can send POST data.

But there's a simpler way to handle web sites that use cookies for
authorization: make Wget use the site's own cookie.  Export cookies as
explained in the manual, and specify:

wget --load-cookies=COOKIE-FILE http://...

Here is an excerpt from the manual section that explains how to export
cookies.

`--load-cookies FILE'
 Load cookies from FILE before the first HTTP retrieval.  FILE is a
 textual file in the format originally used by Netscape's
 `cookies.txt' file.

 You will typically use this option when mirroring sites that
 require that you be logged in to access some or all of their
 content.  The login process typically works by the web server
 issuing an HTTP cookie upon receiving and verifying your
 credentials.  The cookie is then resent by the browser when
 accessing that part of the site, and so proves your identity.

 Mirroring such a site requires Wget to send the same cookies your
 browser sends when communicating with the site.  This is achieved
 by `--load-cookies'--simply point Wget to the location of the
 `cookies.txt' file, and it will send the same cookies your browser
 would send in the same situation.  Different browsers keep textual
 cookie files in different locations:

Netscape 4.x.
  The cookies are in `~/.netscape/cookies.txt'.

Mozilla and Netscape 6.x.
  Mozilla's cookie file is also named `cookies.txt', located
  somewhere under `~/.mozilla', in the directory of your
  profile.  The full path usually ends up looking somewhat like
  `~/.mozilla/default/SOME-WEIRD-STRING/cookies.txt'.

Internet Explorer.
  You can produce a cookie file Wget can use by using the File
  menu, Import and Export, Export Cookies.  This has been
  tested with Internet Explorer 5; it is not guaranteed to work
  with earlier versions.

Other browsers.
  If you are using a different browser to create your cookies,
  `--load-cookies' will only work if you can locate or produce a
  cookie file in the Netscape format that Wget expects.

 If you cannot use `--load-cookies', there might still be an
 alternative.  If your browser supports a "cookie manager", you can
 use it to view the cookies used when accessing the site you're
 mirroring.  Write down the name and value of the cookie, and
 manually instruct Wget to send those cookies, bypassing the
 "official" cookie support:

  wget --cookies=off --header "Cookie: NAME=VALUE"
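
For example, with a hypothetical session cookie named SID whose value is
"abcd1234", that would be:

  wget --cookies=off --header "Cookie: SID=abcd1234" http://www.example.com/private/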




Re: Web page "source" using wget?

2003-10-06 Thread Tony Lewis
Suhas Tembe wrote:

> 1). I go to our customer's website every day & log in using a User Name &
> Password.
[snip]
> 4). I save the "source" to a file & subsequently perform various tasks on
> that file.
>
> What I would like to do is automate this process of obtaining the "source"
> of a page using wget. Is this possible?

That depends on how you enter your user name and password. If it's via an HTTP
user ID and password, that's pretty easy.

wget
http://www.custsite.com/some/page.html --http-user=USER --http-passwd=PASS

If you supply your user ID and password via a web form, it will be tricky
(if not impossible) because wget doesn't POST forms (unless someone added
that option while I wasn't looking. :-)

Tony



Problems and suggestions

2003-10-06 Thread Bloodflowers [Tuth 10]
I'm a big fan of wget. I've been using it for quite a while now, and am now 
testing the 1.9beta3 on win2k.

First of all, I'd like to suggest a couple of things:
# it should be possible to tell wget to ignore certain errors:
   FTPLOGINC     // FTPs often give out this error when they're full; I want it to keep trying
   CONREFUSED    // the FTP may be temporarily down
   FTPLOGREFUSED // the FTP may be full
   FTPSRVERR     // freakish errors happen every once in a while

# if I tell wget to download files from a list and a download fails, it should 
still obey the waitretry timeout, as was pointed out by someone else earlier (I 
don't have the time right now to go look for the post); see the example below
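
Something like this made-up invocation is what I have in mind (the list file
name and the values are examples only):

   wget -i urls.txt --tries=20 --waitretry=60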

and now for the problem:

Apparently, when wget has a problem during a transfer on win32 and dies, it 
then starts saying:
"failed: No such file or directory"

This has happened to me on HTTP; I have to check to see what the real error 
is (I'm guessing CONREFUSED), but it shouldn't be giving this error anyway.

thanks for everything ;)




Web page "source" using wget?

2003-10-06 Thread Suhas Tembe
Hello Everyone,

I am new to this wget utility, so pardon my ignorance.. Here is a brief explanation of 
what I am currently doing:

1). I go to our customer's website every day & log in using a User Name & Password.
2). I click on 3 links before I get to the page I want.
3). I right-click on the page & choose "view source". It opens it up in Notepad.
4). I save the "source" to a file & subsequently perform various tasks on that file.

As you can see, it is a manual process. What I would like to do is automate this 
process of obtaining the "source" of a page using wget. Is this possible? Maybe you 
can give me some suggestions.

Thanks in advance.
Suhas


New win binary (was: RE: Compilation breakage in html-parse.c)

2003-10-06 Thread Herold Heiko
> From: Hrvoje Niksic [mailto:[EMAIL PROTECTED]
> 
> This might be one cause for compilation breakage in html-parse.c.
> It's a Gcc-ism/c99-ism/c++-ism, depending on how you look at it, fixed
> by this patch:
> 
> 2003-10-03  Hrvoje Niksic  <[EMAIL PROTECTED]>
> 
>   * html-parse.c (convert_and_copy): Move variable declarations
>   before statements.

Either this or another patch resolved it; I didn't have time to track it down
for good. I didn't even read the ChangeLog, just did a quick export, make, and
minimal test, and put it up on the site.
New MSVC binary from current CVS at http://xoomer.virgilio.it/hherold
(yes, the ISP decided to change the URL; old URLs still work).

Heiko

-- 
-- PREVINET S.p.A. www.previnet.it
-- Heiko Herold [EMAIL PROTECTED]
-- +39-041-5907073 ph
-- +39-041-5907472 fax


RE: Bug in Windows binary?

2003-10-06 Thread Herold Heiko
> From: Gisle Vanem [mailto:[EMAIL PROTECTED]

> "Jens Rösner" <[EMAIL PROTECTED]> said:
> 
...
 
> I assume Heiko didn't notice it because he doesn't have that function
> in his kernel32.dll. Heiko and Hrvoje, will you correct this ASAP?
> 
> --gv

Probably.
Currently I'm compiling and testing on NT 4.0 only.
Besides that, I'm VERY tight on time at the moment, so testing usually means
"Does it run? Does it download one sample HTTP and one HTTPS site? Yes?
Put it up for testing!".

Heiko

-- 
-- PREVINET S.p.A. www.previnet.it
-- Heiko Herold [EMAIL PROTECTED]
-- +39-041-5907073 ph
-- +39-041-5907472 fax


Re: no-clobber add more suffix

2003-10-06 Thread Jens Rösner
Hi Sergey!

-nc does not apply only to .htm(l) files; all files are considered,
at least in all wget versions I know of.

I cannot comment on your suggestion to restrict -nc to a
user-specified list of file types.
I personally don't need it, but I can imagine certain situations
where this could indeed be helpful.
Hopefully someone with more knowledge than me
can elaborate a bit more on this :)

CU
Jens



> `--no-clobber' is a very useful option, but I retrieve documents not only
> with the .html/.htm suffix.
> 
> Make an additional option that, like -A/-R, defines all allowed/rejected
> rules for -nc.
> 




Re: can wget disable HTTP Location Forward ?

2003-10-06 Thread Hrvoje Niksic
There is currently no way to disable following redirects.  A patch to
do so has been submitted recently, but I didn't see a good reason why
one would need it, so I didn't add the option.  Your mail is a good
argument, but I don't know how prevalent that behavior is.

What is it with servers that can't be bothered to return 404?  Are
there lots of them nowadays?  Is a new default setting of Apache or
IIS to blame, or are people intentionally screwing up their
configurations?


Re: subscribe wget

2003-10-06 Thread Hrvoje Niksic
To subscribe to this list, please send mail to
<[EMAIL PROTECTED]>.


can wget disable HTTP Location Forward ?

2003-10-06 Thread fwei
hello bug-wget,

When I use wget to grab a page that does not exist, the server usually
returns a Location redirect to its home page (instead of a 404 error);
wget then follows this link, which causes an unnecessary HTTP request.

So my question is: which option should I use to disable this default
behavior?  Thanks.



Regards,

fwei
[EMAIL PROTECTED]

2003-10-06




no-clobber add more suffix

2003-10-06 Thread Sergey Vasilevsky
`--no-clobber' is a very useful option, but I retrieve documents not only
with the .html/.htm suffix.

Make an additional option that, like -A/-R, defines all allowed/rejected
rules for -nc.
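
For reference, the existing accept list already takes a comma-separated set of
suffixes, for example (the URL is just a placeholder):

    wget -r -nc -A html,htm http://www.example.com/

The suggestion is for an analogous list controlling which files -nc applies to.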