Re: Web page "source" using wget?
"Suhas Tembe" <[EMAIL PROTECTED]> writes: > It does look a little complicated This is how it looks: > > > 454A > 454B > Those are the important parts. It's not hard to submit this form. With Wget 1.9, you can even use the POST method, e.g.: wget http://.../InventoryStatus.asp --post-data \ 'cboSupplier=4541-134289&status=all&action-select=Query' \ -O InventoryStatus1.asp wget http://.../InventoryStatus.asp --post-data \ 'cboSupplier=4542-134289&status=all&action-select=Query' -O InventoryStatus2.asp It might even work to simply use GET, and retrieve http://.../InventoryStatus.asp?cboSupplier=4541-134289&status=all&action-select=Query without the need for `--post-data' or `-O', but that depends on the ASP script that does the processing. The harder part is to automate this process for *any* values in the drop-down list. You might need to use an intermediary Perl script that extracts all the from the HTML source of the page with the drop-down. Then, from the output of the Perl script, you call Wget as shown above. It's doable, but it takes some work. Unfortunately, I don't know of a (command-line) tool that would make this easier.
Re: Web page "source" using wget?
It does look a little complicated This is how it looks: Supplier 454A 454B Quantity Status Over Under Both All I don't see any specific URL that would get the relevant data after I hit submit. Maybe I am missing something... Thanks, Suhas - Original Message - From: "Hrvoje Niksic" <[EMAIL PROTECTED]> To: "Suhas Tembe" <[EMAIL PROTECTED]> Cc: <[EMAIL PROTECTED]> Sent: Tuesday, October 07, 2003 5:24 PM Subject: Re: Web page "source" using wget? > "Suhas Tembe" <[EMAIL PROTECTED]> writes: > > > this page contains a "drop-down" list of our customer's locations. > > At present, I choose one location from the "drop-down" list & click > > submit to get the data, which is displayed in a report format. I > > "right-click" & then choose "view source" & save "source" to a file. > > I then choose the next location from the "drop-down" list, click > > submit again. I again do a "view source" & save the source to > > another file and so on for all their locations. > > It's possible to automate this, but it requires some knowledge of > HTML. Basically, you need to look at the ... part of the > page and find the tag that defines the drop-down. Assuming > that the form looks like this: > > http://foo.com/customer"; method=GET> > > California > Massachussetts > ... > > > > you'd automate getting the locations by doing something like: > > for loc in ca ma ... > do > wget "http://foo.com/customer?location=$loc"; > done > > Wget will save the respective sources in files named > "customer?location=ca", "customer?location=ma", etc. > > But this was only an example. The actual process depends on what's in > the form, and it might be considerably more complex than this. >
Re: Web page "source" using wget?
"Suhas Tembe" <[EMAIL PROTECTED]> writes: > this page contains a "drop-down" list of our customer's locations. > At present, I choose one location from the "drop-down" list & click > submit to get the data, which is displayed in a report format. I > "right-click" & then choose "view source" & save "source" to a file. > I then choose the next location from the "drop-down" list, click > submit again. I again do a "view source" & save the source to > another file and so on for all their locations. It's possible to automate this, but it requires some knowledge of HTML. Basically, you need to look at the ... part of the page and find the tag that defines the drop-down. Assuming that the form looks like this: http://foo.com/customer"; method=GET> California Massachussetts ... you'd automate getting the locations by doing something like: for loc in ca ma ... do wget "http://foo.com/customer?location=$loc"; done Wget will save the respective sources in files named "customer?location=ca", "customer?location=ma", etc. But this was only an example. The actual process depends on what's in the form, and it might be considerably more complex than this.
Re: Web page "source" using wget?
Got it! Thanks! So far so good. After logging-in, I was able to get to the page I am interested in. There was one thing that I forgot to mention in my earlier posts (I apologize)... this page contains a "drop-down" list of our customer's locations. At present, I choose one location from the "drop-down" list & click submit to get the data, which is displayed in a report format. I "right-click" & then choose "view source" & save "source" to a file. I then choose the next location from the "drop-down" list, click submit again. I again do a "view source" & save the source to another file and so on for all their locations. I am not quite sure how to automate this process! How can I do this non-interactively? especially the "submit" portion of the page. Is this possible using wget? Thanks, Suhas - Original Message - From: "Hrvoje Niksic" <[EMAIL PROTECTED]> To: "Suhas Tembe" <[EMAIL PROTECTED]> Cc: <[EMAIL PROTECTED]> Sent: Tuesday, October 07, 2003 5:02 PM Subject: Re: Web page "source" using wget? > "Suhas Tembe" <[EMAIL PROTECTED]> writes: > > > Thanks everyone for the replies so far.. > > > > The problem I am having is that the customer is using ASP & Java > > script. The URL stays the same as I click through the links. > > URL staying the same is usually a sign of the use of frame, not of ASP > and JavaScript. Instead of looking at the URL entry field, try using > "copy link to clipboard" instead of clicking on the last link. Then > use Wget on that. >
Re: Web page "source" using wget?
"Suhas Tembe" <[EMAIL PROTECTED]> writes: > Thanks everyone for the replies so far.. > > The problem I am having is that the customer is using ASP & Java > script. The URL stays the same as I click through the links. URL staying the same is usually a sign of the use of frame, not of ASP and JavaScript. Instead of looking at the URL entry field, try using "copy link to clipboard" instead of clicking on the last link. Then use Wget on that.
Re: Major, and seemingly random problems with wget 1.8.2
Josh Brooks <[EMAIL PROTECTED]> writes:

>> > At first it will act normally, just going over the site in question, but
>> > sometimes, you will come back to the terminal and see it grabbing all
>> > sorts of pages from totally different sites (!)
>>
>> The only way I've seen it happen is when it follows a redirection to a
>> different site. The redirection is followed because it's considered
>> to be part of the same download. However, further links on the
>> redirected site are not (supposed to be) followed.
>
> Ok, is there a way to tell wget not to follow redirects, so it will
> not ever do that at all ?

Not yet, sorry. But people have asked for it a lot, so it'll probably
make it in after 1.9.
Re: Web page "source" using wget?
Thanks everyone for the replies so far.. The problem I am having is that the customer is using ASP & Java script. The URL stays the same as I click through the links. So, using "wget URL" for the page I want may not work (I may be wrong). Any suggestions on how I can tackle this? Thanks, Suhas - Original Message - From: "Hrvoje Niksic" <[EMAIL PROTECTED]> To: "Suhas Tembe" <[EMAIL PROTECTED]> Cc: <[EMAIL PROTECTED]> Sent: Monday, October 06, 2003 5:19 PM Subject: Re: Web page "source" using wget? > "Suhas Tembe" <[EMAIL PROTECTED]> writes: > > > Hello Everyone, > > > > I am new to this wget utility, so pardon my ignorance.. Here is a > > brief explanation of what I am currently doing: > > > > 1). I go to our customer's website every day & log in using a User Name & Password. > > 2). I click on 3 links before I get to the page I want. > > 3). I right-click on the page & choose "view source". It opens it up in Notepad. > > 4). I save the "source" to a file & subsequently perform various tasks on that > > file. > > > > As you can see, it is a manual process. What I would like to do is > > automate this process of obtaining the "source" of a page using > > wget. Is this possible? Maybe you can give me some suggestions. > > It's possible, in fact it's what Wget does in its most basic form. > Disregarding authentication, the recipe would be: > > 1) Write down the URL. > > 2) Type `wget URL' and you get the source of the page in file named >SOMETHING.html, where SOMETHING is the file name that the URL ends >with. > > Of course, you will also have to specify the credentials to the page, > and Tony explained how to do that. >
Re: Using chunked transfer for HTTP requests?
Hrvoje Niksic wrote:

> That would work for short streaming, but would be pretty bad in the
> mkisofs example. One would expect Wget to be able to stream the data
> to the server, and that's just not possible if the size needs to be
> known in advance, which HTTP/1.0 requires.

One might expect it, but if it's not possible using the HTTP protocol, what
can you do? :-)
Re: Major, and seemingly random problems with wget 1.8.2
Thank you for the great response. It is much appreciated - see below... On Tue, 7 Oct 2003, Hrvoje Niksic wrote: > www.zorg.org/vsound/ contains this markup: > > > > That explicitly tells robots, such as Wget, not to follow the links in > the page. Wget respects this and does not follow the links. You can > tell Wget to ignore the robot directives. For me, this works as > expected: > > wget -km -e robots=off http://www.zorg.org/vsound/ Perfect - thank you. > > At first it will act normally, just going over the site in question, but > > sometimes, you will come back to the terminal and see if grabbing all > > sorts of pages from totally different sites (!) > > The only way I've seen it happen is when it follows a redirection to a > different site. The redirection is followed because it's considered > to be part of the same download. However, further links on the > redirected site are not (supposed to be) followed. Ok, is there a way to tell wget not to follow redirects, so it will not ever do that at all ? Basically I am looking for a way to tell wget "don't ever get anything with a different FQDN than what I started you with" thanks.
Re: Major, and seemingly random problems with wget 1.8.2
Josh Brooks <[EMAIL PROTECTED]> writes:

> I have noticed very unpredictable behavior from wget 1.8.2 -
> specifically I have noticed two things:
>
> a) sometimes it does not follow all of the links it should
>
> b) sometimes wget will follow links to other sites and URLs - when the
> command line used should not allow it to do that.

Thanks for the report. A more detailed response follows below:

> First, sometimes when you attempt to download a site with -k -m
> (--convert-links and --mirror) wget will not follow all of the links and
> will skip some of the files!
>
> I have no idea why it does this with some sites and doesn't do it with
> other sites. Here is an example that I have reproduced on several systems
> - all with 1.8.2:

Links are missed on some sites because of the use of incorrect
comments. This has been fixed for Wget 1.9, where a more relaxed
comment parsing code is the default. But that's not the case for
www.zorg.org/vsound/.

www.zorg.org/vsound/ contains this markup:

That explicitly tells robots, such as Wget, not to follow the links in
the page. Wget respects this and does not follow the links. You can
tell Wget to ignore the robot directives. For me, this works as
expected:

    wget -km -e robots=off http://www.zorg.org/vsound/

You can put `robots=off' in your .wgetrc and this problem will not
bother you again.

> The second problem, and I cannot currently give you an example to try
> yourself but _it does happen_, is if you use this command line:
>
> wget --tries=inf -nH --no-parent
> --directory-prefix=/usr/data/www.explodingdog.com --random-wait -r -l inf
> --convert-links --html-extension --user-agent="Mozilla/4.0 (compatible;
> MSIE 6.0; AOL 7.0; Windows NT 5.1)" www.example.com
>
> At first it will act normally, just going over the site in question, but
> sometimes, you will come back to the terminal and see it grabbing all
> sorts of pages from totally different sites (!)

The only way I've seen it happen is when it follows a redirection to a
different site. The redirection is followed because it's considered
to be part of the same download. However, further links on the
redirected site are not (supposed to be) followed.

If you have a repeatable example, please mail it here so we can
examine it in more detail.
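For readers unfamiliar with the per-page robot directives mentioned above, here is a small sketch of the kind of check a crawler might make. It is not Wget's code. The exact markup on the zorg.org page was lost in the archive, so the sample page below is hypothetical and only assumes the usual `<meta name="robots" content="...">` convention; `RobotsMetaParser` and `may_follow_links` are names invented for this sketch.

```python
from html.parser import HTMLParser

# Hypothetical page: the real markup was stripped from the archived message.
SAMPLE_PAGE = """
<html><head>
<meta name="robots" content="noindex, nofollow">
</head><body><a href="vsound.jpg">picture</a></body></html>
"""


class RobotsMetaParser(HTMLParser):
    """Record the directives of any <meta name="robots" content="..."> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip())


def may_follow_links(html):
    """Return False if the page asks robots not to follow its links."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # "none" is conventionally shorthand for "noindex, nofollow".
    return not ({"nofollow", "none"} & parser.directives)


if __name__ == "__main__":
    print(may_follow_links(SAMPLE_PAGE))
```

A page carrying such a directive would explain why a polite crawler fetches index.html and then stops, which matches the 1.8.2 behaviour in the report.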
Re: Using chunked transfer for HTTP requests?
"Tony Lewis" <[EMAIL PROTECTED]> writes: > Hrvoje Niksic wrote: > >> I don't understand what you're proposing. Reading the whole file in >> memory is too memory-intensive for large files (one could presumably >> POST really huge files, CD images or whatever). > > I was proposing that you read the file to determine the length, but > that was on the assumption that you could read the input twice, > which won't work with the example you proposed. In fact, it won't work with anything except regular files and links to them. > Can you determine if --post-file is a regular file? Yes. > If so, I still think you should just read (or otherwise examine) the > file to determine the length. That's how --post-file works now. The problem is that it doesn't work for non-regular files. My first message explains it, or at least tries to. > For other types of input, perhaps you want write the input to a > temporary file. That would work for short streaming, but would be pretty bad in the mkisofs example. One would expect Wget to be able to stream the data to the server, and that's just not possible if the size needs to be known in advance, which HTTP/1.0 requires.
Re: Using chunked transfer for HTTP requests?
Hrvoje Niksic wrote:

> I don't understand what you're proposing. Reading the whole file in
> memory is too memory-intensive for large files (one could presumably
> POST really huge files, CD images or whatever).

I was proposing that you read the file to determine the length, but that was
on the assumption that you could read the input twice, which won't work with
the example you proposed.

> It would be really nice to be able to say something like:
>
> mkisofs blabla | wget http://burner/localburn.cgi --post-file
> /dev/stdin

Stefan Eissing wrote:

> I just checked with RFC 1945 and it explicitly says that POSTs must
> carry a valid Content-Length header.

In that case, Hrvoje will need to get creative. :-)

Can you determine if --post-file is a regular file? If so, I still think you
should just read (or otherwise examine) the file to determine the length.

For other types of input, perhaps you want to write the input to a temporary
file.

Tony
Major, and seemingly random problems with wget 1.8.2
Hello, I have noticed very unpredictable behavior from wget 1.8.2 - specifically I have noticed two things: a) sometimes it does not follow all of the links it should b) sometimes wget will follow links to other sites and URLs - when the command line used should not allow it to do that. Here are the details. First, sometimes when you attempt to download a site with -k -m (--convert-links and --mirror) wget will not follow all of the links and will skip some of the files! I have no idea why it does this with some sites and doesn't do it with other sites. Here is an example that I have reproduced on several systems - all with 1.8.2: # wget -k -m http://www.zorg.org/vsound/ --17:09:32-- http://www.zorg.org/vsound/ => `www.zorg.org/vsound/index.html' Resolving www.zorg.org... done. Connecting to www.zorg.org[213.232.100.31]:80... connected. HTTP request sent, awaiting response... 200 OK Length: unspecified [text/html] [ <=> ] 12,23553.82K/s Last-modified header missing -- time-stamps turned off. 17:09:32 (53.82 KB/s) - `www.zorg.org/vsound/index.html' saved [12235] FINISHED --17:09:32-- Downloaded: 12,235 bytes in 1 files Converting www.zorg.org/vsound/index.html... 2-6 Converted 1 files in 0.03 seconds. What is the problem here ? When I run the exact same command line with wget 1.6, I get this: # wget -k -m http://www.zorg.org/vsound/ --11:10:06-- http://www.zorg.org/vsound/ => `www.zorg.org/vsound/index.html' Connecting to www.zorg.org:80... connected! HTTP request sent, awaiting response... 200 OK Length: unspecified [text/html] 0K -> .. . Last-modified header missing -- time-stamps turned off. 11:10:07 (71.12 KB/s) - `www.zorg.org/vsound/index.html' saved [12235] Loading robots.txt; please ignore errors. --11:10:07-- http://www.zorg.org/robots.txt => `www.zorg.org/robots.txt' Connecting to www.zorg.org:80... connected! HTTP request sent, awaiting response... 404 Not Found 11:10:07 ERROR 404: Not Found. 
--11:10:07-- http://www.zorg.org/vsound/vsound.jpg => `www.zorg.org/vsound/vsound.jpg' Connecting to www.zorg.org:80... connected! HTTP request sent, awaiting response... 200 OK Length: 27,629 [image/jpeg] 0K -> .. .. .. [100%] 11:10:08 (51.49 KB/s) - `www.zorg.org/vsound/vsound.jpg' saved [27629/27629] --11:10:09-- http://www.zorg.org/vsound/vsound-0.2.tar.gz => `www.zorg.org/vsound/vsound-0.2.tar.gz' Connecting to www.zorg.org:80... connected! HTTP request sent, awaiting response... 200 OK Length: 108,987 [application/x-tar] 0K -> .. .. .. .. .. [ 46%] 50K -> .. .. .. .. .. [ 93%] 100K -> .. [100%] 11:10:12 (46.60 KB/s) - `www.zorg.org/vsound/vsound-0.2.tar.gz' saved [108987/108987] --11:10:12-- http://www.zorg.org/vsound/vsound-0.5.tar.gz => `www.zorg.org/vsound/vsound-0.5.tar.gz' Connecting to www.zorg.org:80... connected! HTTP request sent, awaiting response... 200 OK Length: 116,904 [application/x-tar] 0K -> .. .. .. .. .. [ 43%] 50K -> .. .. .. .. .. [ 87%] 100K -> .. [100%] 11:10:14 (60.44 KB/s) - `www.zorg.org/vsound/vsound-0.5.tar.gz' saved [116904/116904] --11:10:14-- http://www.zorg.org/vsound/vsound => `www.zorg.org/vsound/vsound' Connecting to www.zorg.org:80... connected! HTTP request sent, awaiting response... 200 OK Length: 3,365 [text/plain] 0K -> ...[100%] 11:10:14 (3.21 MB/s) - `www.zorg.org/vsound/vsound' saved [3365/3365] Converting www.zorg.org/vsound/index.html... done. FINISHED --11:10:14-- Downloaded: 269,120 bytes in 5 files Converting www.zorg.org/vsound/index.html... done. See ? It gets the links inside of index.html, and mirrors those links, and converts them - just like it should. Why does 1.8.2 have a problem with this site ? Other sites are handled just fine by 1.8.2 with the same command line ... it makes no sense that wget 1.8.2 has problems with particular web sites. This is incorrect behavior - and if you try the same URL with 1.8.2 you can reproduce the same results. 
The second problem, and I cannot currently give you an example to try
yourself but _it does happen_, is if you use this command line:

wget --tries=inf -nH --no-parent
--directory-prefix=/usr/data/www.explodingdog.com --random-wait -r -l inf
--convert-links --html-extension --user-agent="Mozilla/4.0 (compatible;
MSIE 6.0; AOL 7.0; Windows NT 5.1)" www.example.com

At first it will act normally, just going over the site in question, but
sometimes, you will come back to the terminal and see it grabbing all
sorts of pages from totally different sites (!) I have seen
Re: [PATCH] wget-1.8.2: Portability, plus EBCDIC patch
Martin, thanks for the patch and the detailed report. Note that it
might have made more sense to apply the patch to the latest CVS
version, which is somewhat different from 1.8.2.

I'm really not sure whether to add this patch. On the one hand, it's
nice to support as many architectures as possible. But on the other
hand, most systems are ASCII. All the systems I've ever seen or
worked on have been ASCII. I am fairly certain that I would not be
able to support EBCDIC in the long run and that, unless someone were
to continually support EBCDIC, the existing support would bitrot away.

Is anyone on the Wget list using an EBCDIC system?
[PATCH] wget-1.8.2: Portability, plus EBCDIC patch
Hello Hrvoje and Dan,

I have been using wget for many years now, and finally got to applying
a patch I made long ago (EBCDIC patch against wget-1.5.3) to the
current wget-1.8.2. This patch makes wget compile and run on a
mainframe computer using the EBCDIC character set.

Also, when compiling wget on Solaris (using the SUNWspro "Forte"
compiler), I stumbled over a portability problem (C++ comments in a C
source) to which I add a patch as well.

About the EBCDIC patch:

* The goal was to create a patch which worked for our EBCDIC system
  (Fujitsu-Siemens' mainframe OS is called BS2000, it runs on /390
  hardware, but is not compatible with OS/390 per se) but would be
  easily adaptable to OS/390 (to which I have no access, but whose
  behaviour I know from similar ports). The code to actually make it
  work for OS/390 is not in place, but I added a tool (called
  safe-ctype-mk.c -- delete if you don't like it) to create the
  additions to safe-ctype.c which are necessary because IBM's EBCDIC
  differs from "our" EBCDIC.

* Because code conversion is necessary for text files, a distinction
  between "text" and "binary" download was added (based on the
  downloaded MIME type; see the routines http_set_convert_flag() and
  http_get_convert_flag(). A future patch may add a new
  --conversion=text/binary/auto switch which is not implemented yet.)
  Currently, the same heuristics are used as in the Apache HTTP server
  to determine whether conversion is required (for several kinds of
  text files) or not required (for images, compressed files etc.)

* Because EBCDIC alphabetic characters live in the range between
  '\xA1' and '\xE9', the getopt_long() numbers have been shifted up by
  200, beyond the 0xFF boundary, to avoid conflicts between
  single-character options and numeric long-option values. That does
  not change the behaviour on ASCII machines, but allows the source to
  compile on EBCDIC machines (otherwise: error: multiple case in
  switch).
* wget-1.8.2 has been compiled on our BS2000, with the patch applied, and with SSL enabled (against openssl-0.9.6k), and has been tested to work correctly. If you would add the patch to future versions of wget, then all users of our BS2000 as well as users of IBM's OS/390 could take advantage of the availability of wget for EBCDIC-based machines, and hopefully someone would also contribute the missing IBM-EBCDIC counterparts to our BS2000-EBCDIC patch. Martin -- <[EMAIL PROTECTED]> | Fujitsu Siemens Fon: +49-89-636-46021, FAX: +49-89-636-47655 | 81730 Munich, Germany diff -bur wget-1.8.2/src/ftp.c work/wget-1.8.2/src/ftp.c --- wget-1.8.2/src/ftp.c.orig 2003-10-06 17:20:58.710178000 +0200 +++ wget-1.8.2/src/ftp.c2003-10-06 17:17:00.399371000 +0200 @@ -474,7 +474,7 @@ } err = ftp_size(&con->rbuf, u->file, len); -// printf("\ndebug: %lld\n", *len); +/* printf("\ndebug: %lld\n", *len); */ /* FTPRERR */ switch (err) { diff -bur wget-1.8.2/src/http.c work/wget-1.8.2/src/http.c --- wget-1.8.2/src/http.c.orig 2003-10-06 17:20:58.900182000 +0200 +++ wget-1.8.2/src/http.c 2003-10-06 17:19:16.829836000 +0200 @@ -1777,7 +1777,7 @@ FREE_MAYBE (dummy); return RETROK; } -// fprintf(stderr, "test: hstat.len: %lld, hstat.restval: %lld\n", hstat.dltime); +/* fprintf(stderr, "test: hstat.len: %lld, hstat.restval: %lld\n", hstat.dltime); */ tmrate = retr_rate (hstat.len - hstat.restval, hstat.dltime, 0); if (hstat.len == hstat.contlen) diff -bur wget-1.8.2.orig/src/connect.c wget-1.8.2/src/connect.c --- wget-1.8.2.orig/src/connect.c Mon Oct 6 17:13:11 2003 +++ wget-1.8.2/src/connect.cMon Oct 6 17:10:28 2003 @@ -47,6 +47,10 @@ #endif #endif /* WINDOWS */ +#if #system(bs2000) +#include +#endif + #include #ifdef HAVE_STRING_H # include @@ -73,6 +77,26 @@ to connect_to_one. 
*/ static const char *connection_host_name; +#if 'A' == '\xC1' /* CHARSET_EBCDIC */ +/* Start off with convert=1 (headers are always converted) */ +static int convert_flag_last_reply = 1; + +void +http_set_convert_flag(const char *type) +{ +convert_flag_last_reply = + (strncasecmp(type, "text/", 5) == 0 + || strncasecmp(type, "message/", 8) == 0 + || strcasecmp(type, "application/postscript") == 0); +} + +int +http_get_convert_flag() +{ +return convert_flag_last_reply; +} +#endif + void set_connection_host_name (const char *host) { @@ -459,6 +483,11 @@ } while (res == -1 && errno == EINTR); +#if 'A' == '\xC1' + if (res > 0 && http_get_convert_flag()) +_a2e_n(buf,res); +#endif + return res; } @@ -472,6 +501,25 @@ { int res = 0; +#if 'A' == '\xC1' /* CHARSET_EBCDIC */ + static char *cbuf = NULL; + static int csize = 0; + + if (len > csize) { +if (cbuf != NULL) + free(cbuf); +cbuf = malloc(csize = len+8192); /* add arbitrary amount of skew */ +
Re: Using chunked transfer for HTTP requests?
On Tuesday, 07.10.03, at 17:02 (Europe/Berlin), Hrvoje Niksic wrote:

>> That's probably true. But have you tried sending without
>> Content-Length and Connection: close and closing the output side of
>> the socket before starting to read the reply from the server?
>
> That might work, but it sounds too dangerous to do by default, and too
> obscure to devote a command-line option to. Besides, HTTP/1.1
> *requires* requests with a request-body to provide Content-Length:
>
>    For compatibility with HTTP/1.0 applications, HTTP/1.1 requests
>    containing a message-body MUST include a valid Content-Length header
>    field unless the server is known to be HTTP/1.1 compliant.

I just checked with RFC 1945 and it explicitly says that POSTs must
carry a valid Content-Length header.

That leaves the option of first sending an OPTIONS request to the
server (either url or *) to check the HTTP version.

//Stefan
Re: Using chunked transfer for HTTP requests?
Stefan Eissing <[EMAIL PROTECTED]> writes:

> On Tuesday, 07.10.03, at 16:36 (Europe/Berlin), Hrvoje Niksic wrote:
>> What the current code does is: determine the file size, send
>> Content-Length, read the file in chunks (up to the promised size) and
>> send those chunks to the server. But that works only with regular
>> files. It would be really nice to be able to say something like:
>>
>> mkisofs blabla | wget http://burner/localburn.cgi --post-file
>> /dev/stdin
>
> That would indeed be nice. Since I'm coming from the WebDAV side
> of life: does wget allow the use of PUT?

No.

>> I haven't checked, but I'm 99% convinced that browsers simply don't
>> give a shit about non-regular files.
>
> That's probably true. But have you tried sending without
> Content-Length and Connection: close and closing the output side of
> the socket before starting to read the reply from the server?

That might work, but it sounds too dangerous to do by default, and too
obscure to devote a command-line option to. Besides, HTTP/1.1
*requires* requests with a request-body to provide Content-Length:

   For compatibility with HTTP/1.0 applications, HTTP/1.1 requests
   containing a message-body MUST include a valid Content-Length header
   field unless the server is known to be HTTP/1.1 compliant.
Re: some wget patches against beta3
Karl Eichwalder <[EMAIL PROTECTED]> writes:

> I guess, you as the wget maintainer switched from something
> supported to the unsupported "betaX" scheme and now we have
> something to talk about ;)

I had no idea that something as usual as "betaX" was unsupported. In
fact, I believe that "bX" was added when Francois saw me using it in
Wget. :-)

> Using something different than exactly "wget-1.9-b3.de.po" will
> confuse the robot

>> Returning an error that says "your version number is unparsable to
>> this piece of software, you must use one of <...> instead" would be
>> more correct in the long run.
>
> Sure. You should have received a message like this, didn't you?

I didn't. Maybe it was an artifact of the robot not having worked at
the time, though.
Re: Using chunked transfer for HTTP requests?
On Tuesday, 07.10.03, at 16:36 (Europe/Berlin), Hrvoje Niksic wrote:

> What the current code does is: determine the file size, send
> Content-Length, read the file in chunks (up to the promised size) and
> send those chunks to the server. But that works only with regular
> files. It would be really nice to be able to say something like:
>
> mkisofs blabla | wget http://burner/localburn.cgi --post-file
> /dev/stdin

That would indeed be nice. Since I'm coming from the WebDAV side of
life: does wget allow the use of PUT?

>>> My first impulse was to bemoan Wget's antiquated HTTP code which
>>> doesn't understand "chunked" transfer. But, coming to think of it,
>>> even if Wget used HTTP/1.1, I don't see how a client can send
>>> chunked requests and interoperate with HTTP/1.0 servers.
>>
>> How do browsers figure out whether they can do a chunked transfer or
>> not?
>
> I haven't checked, but I'm 99% convinced that browsers simply don't
> give a shit about non-regular files.

That's probably true. But have you tried sending without Content-Length
and Connection: close and closing the output side of the socket before
starting to read the reply from the server?

//Stefan
Re: Using chunked transfer for HTTP requests?
"Tony Lewis" <[EMAIL PROTECTED]> writes: > Hrvoje Niksic wrote: > >> Please be aware that Wget needs to know the size of the POST >> data in advance. Therefore the argument to @code{--post-file} >> must be a regular file; specifying a FIFO or something like >> @file{/dev/stdin} won't work. > > There's nothing that says you have to read the data after you've > started sending the POST. Why not just read the --post-file before > constructing the request so that you know how big it is? I don't understand what you're proposing. Reading the whole file in memory is too memory-intensive for large files (one could presumably POST really huge files, CD images or whatever). What the current code does is: determine the file size, send Content-Length, read the file in chunks (up to the promised size) and send those chunks to the server. But that works only with regular files. It would be really nice to be able to say something like: mkisofs blabla | wget http://burner/localburn.cgi --post-file /dev/stdin >> My first impulse was to bemoan Wget's antiquated HTTP code which >> doesn't understand "chunked" transfer. But, coming to think of it, >> even if Wget used HTTP/1.1, I don't see how a client can send >> chunked requests and interoperate with HTTP/1.0 servers. > > How do browsers figure out whether they can do a chunked transfer or > not? I haven't checked, but I'm 99% convinced that browsers simply don't give a shit about non-regular files.
Re: Using chunked transfer for HTTP requests?
Hrvoje Niksic wrote:

> Please be aware that Wget needs to know the size of the POST data
> in advance. Therefore the argument to @code{--post-file} must be
> a regular file; specifying a FIFO or something like
> @file{/dev/stdin} won't work.

There's nothing that says you have to read the data after you've started
sending the POST. Why not just read the --post-file before constructing the
request so that you know how big it is?

> My first impulse was to bemoan Wget's antiquated HTTP code which
> doesn't understand "chunked" transfer. But, coming to think of it,
> even if Wget used HTTP/1.1, I don't see how a client can send chunked
> requests and interoperate with HTTP/1.0 servers.

How do browsers figure out whether they can do a chunked transfer or not?

Tony
Re: some wget patches against beta3
Karl Eichwalder <[EMAIL PROTECTED]> writes:

> Hrvoje Niksic <[EMAIL PROTECTED]> writes:
>
>> Ouch. Why does the robot care about version names at all?
>
> It must know about the sequences; this is important for merging
> issues. IIRC, we have at least these sequences supported by the
> robot:
>
> 1.2 -> 1.2.1 -> 1.2.2 -> 1.3 etc.
>
> 1.2 -> 1.2a -> 1.2b -> 1.3
>
> 1.2 -> 1.3-pre1 -> 1.3-pre2 -> 1.3
>
> 1.2 -> 1.3-b1 -> 1.3-b2 -> 1.3

Thanks for the clarification, Karl. But as a maintainer of a project
that tries to use the robot, I must say that I'm not happy about this.

If the robot absolutely must be able to collate versions, then it
should be smarter about it and support a larger array of formats in
use out there. See `dpkg' for an example of how it can be done,
although the TP robot certainly doesn't need to do all that `dpkg'
does.

This way, unless I'm missing something, the robot seems to be in the
position to dictate its very narrow-minded versioning scheme to the
projects that would only like to use it (the robot). That's really
bad. But what's even worse is that something or someone silently
changed "beta3" to "b3" in the POT, and then failed to perform the
same change for my translation, which caused it to get dropped without
notice. Returning an error that says "your version number is
unparsable to this piece of software, you must use one of <...>
instead" would be more correct in the long run.

Is the robot written in Python? Would you consider it for inclusion
if I donated a function that performed the comparison more fully
(provided, of course, that the code meets your standards of quality)?
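To make the idea of a more tolerant comparison concrete, here is a minimal sketch of a sort key that orders the four sequences quoted above and also accepts "beta3". It is not the function offered in the message and not the robot's code, and it is far simpler than what `dpkg' does. `version_key` and `PRE_ALIASES` are hypothetical names, and treating "b" and "beta" as the same stage is an assumption made for this sketch.

```python
import re

# Assumption for this sketch: "b3" and "beta3" name the same pre-release.
PRE_ALIASES = {"b": "beta"}

_COMPONENT = re.compile(r"(\d+)([a-z]*)")   # "2", "2a", "2b"
_PRERELEASE = re.compile(r"([a-z]+)(\d*)")  # "pre1", "b2", "beta3"


def version_key(version):
    """Return a tuple that sorts version strings in release order.

    Raises ValueError for strings it cannot parse, instead of silently
    dropping or rewriting them.
    """
    release, _, pre = version.partition("-")
    parts = []
    for component in release.split("."):
        m = _COMPONENT.fullmatch(component)
        if m is None:
            raise ValueError("unparsable version number: %r" % version)
        parts.append((int(m.group(1)), m.group(2)))
    if not pre:
        # A plain release sorts after all of its pre-releases.
        return (tuple(parts), (1, "", 0))
    m = _PRERELEASE.fullmatch(pre)
    if m is None:
        raise ValueError("unparsable version number: %r" % version)
    name = PRE_ALIASES.get(m.group(1), m.group(1))
    return (tuple(parts), (0, name, int(m.group(2) or 0)))


if __name__ == "__main__":
    print(sorted(["1.3", "1.3-b2", "1.2", "1.3-b1"], key=version_key))
```

With a key like this, a file named for "1.9-beta3" and a template named for "1.9-b3" would compare as equal rather than one of them being rejected.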
Re: some wget patches against beta3
Karl Eichwalder <[EMAIL PROTECTED]> writes:

> Hrvoje Niksic <[EMAIL PROTECTED]> writes:
>
>> I'm not sure what "b3" is, but the version in the POT file was
>> supposed to be "beta3". Was there a misunderstanding somewhere along
>> the line?
>
> Yes, the robot does not like beta3 as part of the version
> string. "b3" or "pre3" are okay.

Ouch. Why does the robot care about version names at all?
Re: some wget patches against beta3
Karl Eichwalder <[EMAIL PROTECTED]> writes:

>> Also, my Croatian translation of 1.9 doesn't seem to have made it
>> in. Is that expected?
>
> Unfortunately, yes. Will you please resubmit it with the subject line
> updated (IIRC, it's now):
>
> TP-Robot wget-1.9-b3.hr.po

I'm not sure what "b3" is, but the version in the POT file was supposed to be "beta3". Was there a misunderstanding somewhere along the line?
Re: some wget patches against beta3
Karl Eichwalder <[EMAIL PROTECTED]> writes:

> Hrvoje Niksic <[EMAIL PROTECTED]> writes:
>
>> As for the Polish translation, translations are normally handled
>> through the Translation Project. The TP robot is currently down, but
>> I assume it will be back up soon, and then we'll submit the POT file
>> and update the translations /en masse/.
>
> It took a little bit longer than expected but now, the robot is up and
> running again. This morning (CET) I installed b3 for translation.

However, http://www2.iro.umontreal.ca/~gnutra/registry.cgi?domain=wget still shows `wget-1.8.2.pot' to be the "current template for [the] domain".

Also, my Croatian translation of 1.9 doesn't seem to have made it in. Is that expected?
Re: -q and -S are incompatible
Dan Jacobson <[EMAIL PROTECTED]> writes:

> -q and -S are incompatible and should perhaps produce errors and be
> noted thus in the docs.

They seem to work as I'd expect -- `-q' tells Wget to print *nothing*, and that's what happens. The output Wget would have generated does contain HTTP headers, among other things, but it never gets printed.

> BTW, there seems no way to get the -S output, but no progress
> indicator. -nv, -q kill them both.

It's a bug that `-nv' kills `-S' output, I think.

> P.S. one shouldn't have to confirm each bug submission. Once should
> be enough.

You're right. :-( I'll ask the sunsite people if there's a way to establish some form of white lists...
Re: some wget patches against beta3
Thanks!
Re: some wget patches against beta3
Hrvoje Niksic <[EMAIL PROTECTED]> writes:

> As for the Polish translation, translations are normally handled
> through the Translation Project. The TP robot is currently down, but
> I assume it will be back up soon, and then we'll submit the POT file
> and update the translations /en masse/.

It took a little bit longer than expected but now, the robot is up and running again. This morning (CET) I installed b3 for translation.
Re: wget 1.9 - behaviour change in recursive downloads
Quoting Hrvoje Niksic <[EMAIL PROTECTED]>:

> Jochen Roderburg <[EMAIL PROTECTED]> writes:
>
> > Quoting Hrvoje Niksic <[EMAIL PROTECTED]>:
> >
> >> It's a feature. `-A zip' means `-A zip', not `-A zip,html'. Wget
> >> downloads the HTML files only because it absolutely has to, in order
> >> to recurse through them. After it finds the links in them, it deletes
> >> them.
> >
> > Hmm, so it has really been an undetected error over all the years
> > ;-) ?
>
> s/undetected/unfixed/
>
> At least I've always considered it an error. I didn't know people
> depended on it.

Well, *depend* is a rather strong expression for that ;-) It always worked that way, I got used to it, and I never really thought about whether it was correct or not, because I had a use for it. So I was astonished when these files suddenly disappeared.

As I wrote already, I will mention them explicitly now. I think the worst that will happen is that I get a few more of them than before.

Perhaps the whole thing could be mentioned in the documentation of the accept/reject option. Currently there is only this sentence there:

>> Note that these two options do not affect the downloading of HTML
>> files; Wget must load all the HTMLs to know where to go at
>> all--recursive retrieval would make no sense otherwise.

J. Roderburg
Re: Using chunked transfer for HTTP requests?
On Tue, 7 Oct 2003, Hrvoje Niksic wrote:

> My first impulse was to bemoan Wget's antiquated HTTP code which doesn't
> understand "chunked" transfer. But, coming to think of it, even if Wget
> used HTTP/1.1, I don't see how a client can send chunked requests and
> interoperate with HTTP/1.0 servers.
>
> The thing is, to be certain that you can use chunked transfer, you
> have to know you're dealing with an HTTP/1.1 server. But you can't
> know that until you receive a response. And you don't get a response
> until you've finished sending the request. A chicken-and-egg problem!

The only way to deal with this automatically that I can think of is to use an "Expect: 100-continue" request header; based on the 100 response, you can decide if the server is 1.1 or not.

Other than that, I think a command line option is the only choice.

--
 -=- Daniel Stenberg -=- http://daniel.haxx.se -=-
  ech`echo xiun|tr nu oc|sed 'sx\([sx]\)\([xoi]\)xo un\2\1 is xg'`ol
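The decision Daniel describes might look roughly like the Python sketch below. It assumes the client has already sent its request headers, including "Expect: 100-continue", and has read the first status line the server sent back (if any). The helper names `can_send_chunked` and `encode_chunk` are hypothetical, and nothing here reflects code in Wget or curl. A server that never sends an interim reply leaves the client waiting, so a real client would also need a timeout and a fallback to a known Content-Length.

```python
def can_send_chunked(interim_status_line):
    """Decide from the interim reply whether a chunked body is safe.

    Hypothetical helper: returns True only for an explicit
    "HTTP/1.1 100 ..." status line.
    """
    fields = interim_status_line.strip().split(None, 2)
    if len(fields) < 2:
        return False  # nothing usable came back, e.g. after a timeout
    version, status = fields[0], fields[1]
    return version == "HTTP/1.1" and status == "100"


def encode_chunk(data):
    """Frame one piece of the body in chunked transfer coding."""
    size_line = format(len(data), "x").encode("ascii")  # size in hex
    return size_line + b"\r\n" + data + b"\r\n"


# A zero-length chunk followed by an empty line terminates the body.
LAST_CHUNK = b"0\r\n\r\n"


if __name__ == "__main__":
    if can_send_chunked("HTTP/1.1 100 Continue\r\n"):
        body = encode_chunk(b"status=all") + LAST_CHUNK
        print(body)
```

Only an explicit 100 reply from an HTTP/1.1 server enables chunking in this sketch; any other outcome is treated as "unknown", which is the case where a command line option would have to decide.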
Re: Using chunked transfer for HTTP requests?
Theoretically, an HTTP/1.0 server should accept an unknown content-length if the connection is closed after the request. Unfortunately, the response 411 Length Required is only defined in HTTP/1.1.

//Stefan

On Tuesday, 07.10.03, at 01:12 (Europe/Berlin), Hrvoje Niksic wrote:

As I was writing the manual for `--post', I decided that I wasn't happy with this part:

    Please be aware that Wget needs to know the size of the POST data
    in advance. Therefore the argument to @code{--post-file} must be
    a regular file; specifying a FIFO or something like
    @file{/dev/stdin} won't work.

My first impulse was to bemoan Wget's antiquated HTTP code which doesn't understand "chunked" transfer. But, coming to think of it, even if Wget used HTTP/1.1, I don't see how a client can send chunked requests and interoperate with HTTP/1.0 servers.

The thing is, to be certain that you can use chunked transfer, you have to know you're dealing with an HTTP/1.1 server. But you can't know that until you receive a response. And you don't get a response until you've finished sending the request. A chicken-and-egg problem!

Of course, once a response is received, we could remember that we're dealing with an HTTP/1.1 server, but that information is all but useless, since Wget's `--post' is typically used to POST information to one URL and exit.

Is there a sane way to stream data to HTTP/1.0 servers that expect POST?