Re: Feature Request: Directory URLs and MIME Content-Type Header
Hi Levander!

I am not an expert by any means, just another user, but what does the -E (= --html-extension) option do for you?

> Could wget, for URLs that end in slashes, read the Content-Type
> header, and if it's text/xml, could wget create index.xml inside the
> directory wget creates?

Don't you mean create index.html?

CU Jens
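For the record, the Content-Type header can already be inspected from a script today. A rough sketch (assuming wget's -S/--spider options, which print the server's response headers on stderr; `content_type` is a hypothetical helper, not part of wget):

```shell
# content_type: extract the media type from HTTP response headers fed
# on stdin, e.g. "Content-Type: text/xml; charset=UTF-8" -> "text/xml".
content_type() {
    sed -n 's/^[[:space:]]*Content-Type:[[:space:]]*\([^;[:space:]]*\).*/\1/p' | head -n 1
}

# Usage against a live server (needs network, so not executed here):
#   ctype=$(wget -S --spider "$URL" 2>&1 | content_type)
#   case "$ctype" in
#       text/xml) name=index.xml ;;
#       *)        name=index.html ;;
#   esac
```

This only emulates the requested feature from outside; doing it inside wget would still need the option discussed above.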
File name too long
Hi all,

is there a fix when file names are too long? Example:

  URL=http://search.ebay.de/ws/search/SaleSearch?copagenum=3D1sosortproperty=3D2sojs=3D1version=3D2sosortorder=3D2dfts=3D-1catref=3DC6coaction=3Dcomparesoloctog=3D9dfs=3D20050024dfte=3D-1saendtime=3D396614from=3DR9dfe=3D20050024satitle=wgetcoentrypage=3DsearchssPageName=3DADME:B:SS:DE:21

  bash-2.04$ wget -kxE $URL
  --15:16:37--  http://search.ebay.de/ws/search/SaleSearch?copagenum=3D1sosortproperty=3D2sojs=3D1version=3D2sosortorder=3D2dfts=3D-1catref=3DC6coaction=3Dcomparesoloctog=3D9dfs=3D20050024dfte=3D-1saendtime=3D396614from=3DR9dfe=3D20050024satitle=wgetcoentrypage=3DsearchssPageName=3DADME:B:SS:DE:21
             => `search.ebay.de/ws/search/SaleSearch?copagenum=3D1sosortproperty=3D2sojs=3D1version=3D2sosortorder=3D2dfts=3D-1catref=3DC6coaction=3Dcomparesoloctog=3D9dfs=3D20050024dfte=3D-1saendtime=3D396614from=3DR9dfe=3D20050024satitle=wgetcoentrypage=3DsearchssPageName=3DADME:B:SS:DE:21'
  Proxy request sent, awaiting response... 301 Moved Permanently
  Location: /wget_W0QQcatrefZ3DC6QQcoactionZ3DcompareQQcoentrypageZ3DsearchQQcopagenumZ3D1QQdfeZ3D20050024QQdfsZ3D20050024QQdfteZ3DQ2d1QQdftsZ3DQ2d1QQfltZ3D9QQfromZ3DR9QQfsooZ3D2QQfsopZ3D2QQsaetmZ3D396614QQsojsZ3D1QQsspagenameZ3DADMEQ3aBQ3aSSQ3aDEQ3a21QQversionZ3D2 [following]
  --15:16:37--  http://search.ebay.de/wget_W0QQcatrefZ3DC6QQcoactionZ3DcompareQQcoentrypageZ3DsearchQQcopagenumZ3D1QQdfeZ3D20050024QQdfsZ3D20050024QQdfteZ3DQ2d1QQdftsZ3DQ2d1QQfltZ3D9QQfromZ3DR9QQfsooZ3D2QQfsopZ3D2QQsaetmZ3D396614QQsojsZ3D1QQsspagenameZ3DADMEQ3aBQ3aSSQ3aDEQ3a21QQversionZ3D2
             => `search.ebay.de/wget_W0QQcatrefZ3DC6QQcoactionZ3DcompareQQcoentrypageZ3DsearchQQcopagenumZ3D1QQdfeZ3D20050024QQdfsZ3D20050024QQdfteZ3DQ2d1QQdftsZ3DQ2d1QQfltZ3D9QQfromZ3DR9QQfsooZ3D2QQfsopZ3D2QQsaetmZ3D396614QQsojsZ3D1QQsspagenameZ3DADMEQ3aBQ3aSSQ3aDEQ3a21QQversionZ3D2'
  Length: 46,310 [text/html]
  search.ebay.de/wget_W0QQcatrefZ3DC6QQcoactionZ3DcompareQQcoentrypageZ3DsearchQQcopagenumZ3D1QQdfeZ3D20050024QQdfsZ3D20050024QQdfteZ3DQ2d1QQdftsZ3DQ2d1QQfltZ3D9QQfromZ3DR9QQfsooZ3D2QQfsopZ3D2QQsaetmZ3D396614QQsojsZ3D1QQsspagenameZ3DADMEQ3aBQ3aSSQ3aDEQ3a21QQversionZ3D2.html: File name too long
  Cannot write to `search.ebay.de/wget_W0QQcatrefZ3DC6QQcoactionZ3DcompareQQcoentrypageZ3DsearchQQcopagenumZ3D1QQdfeZ3D20050024QQdfsZ3D20050024QQdfteZ3DQ2d1QQdftsZ3DQ2d1QQfltZ3D9QQfromZ3DR9QQfsooZ3D2QQfsopZ3D2QQsaetmZ3D396614QQsojsZ3D1QQsspagenameZ3DADMEQ3aBQ3aSSQ3aDEQ3a21QQversionZ3D2.html' (File name too long).
  search.ebay.de/wget_W0QQcatrefZ3DC6QQcoactionZ3DcompareQQcoentrypageZ3DsearchQQcopagenumZ3D1QQdfeZ3D20050024QQdfsZ3D20050024QQdfteZ3DQ2d1QQdftsZ3DQ2d1QQfltZ3D9QQfromZ3DR9QQfsooZ3D2QQfsopZ3D2QQsaetmZ3D396614QQsojsZ3D1QQsspagenameZ3DADMEQ3aBQ3aSSQ3aDEQ3a21QQversionZ3D2.html: File name too long
  Converting search.ebay.de/wget_W0QQcatrefZ3DC6QQcoactionZ3DcompareQQcoentrypageZ3DsearchQQcopagenumZ3D1QQdfeZ3D20050024QQdfsZ3D20050024QQdfteZ3DQ2d1QQdftsZ3DQ2d1QQfltZ3D9QQfromZ3DR9QQfsooZ3D2QQfsopZ3D2QQsaetmZ3D396614QQsojsZ3D1QQsspagenameZ3DADMEQ3aBQ3aSSQ3aDEQ3a21QQversionZ3D2.html... nothing to do.
  Converted 1 files in 0.00 seconds.

... apart from that, the main thing I look for is how to obtain the search results. I still haven't managed to get the result from search.ebay.de and then download the links to cgi.ebay.de in one go:

  wget -kxrE -l1 -D cgi.ebay.de -H $URL
Re: File name too long
On Mon, 21 Mar 2005, Martin Trautmann wrote:

> is there a fix when file names are too long?
>
> bash-2.04$ wget -kxE $URL
> [...]
> search.ebay.de/wget_W0QQcatrefZ3DC6QQcoactionZ3DcompareQQcoentrypageZ3DsearchQQcopagenumZ3D1QQdfeZ3D20050024QQdfsZ3D20050024QQdfteZ3DQ2d1QQdftsZ3DQ2d1QQfltZ3D9QQfromZ3DR9QQfsooZ3D2QQfsopZ3D2QQsaetmZ3D396614QQsojsZ3D1QQsspagenameZ3DADMEQ3aBQ3aSSQ3aDEQ3a21QQversionZ3D2.html: File name too long

*** This is not a problem of wget, but of your filesystem.
Try to do:

  touch search.ebay.de/wget_W0QQcatrefZ3DC6QQcoactionZ3DcompareQQcoentrypageZ3DsearchQQcopagenumZ3D1QQdfeZ3D20050024QQdfsZ3D20050024QQdfteZ3DQ2d1QQdftsZ3DQ2d1QQfltZ3D9QQfromZ3DR9QQfsooZ3D2QQfsopZ3D2QQsaetmZ3D396614QQsojsZ3D1QQsspagenameZ3DADMEQ3aBQ3aSSQ3aDEQ3a21QQversionZ3D2.html

> ... apart from that the main thing I look for is how to obtain the
> search results. I still don't manage how to get the result from
> search.ebay.de and then download the links to cgi.ebay.de in one:
> wget -kxrE -l1 -D cgi.ebay.de -H $URL

*** Maybe create a SHA1 sum of the request and store the result in that file (but you will not know what the original request was, if you don't create some DB of requests):

  URL=...
  sha1sum=$( echo -n "$URL" | sha1sum )
  echo "$sha1sum $URL" >> SHA1-URL.db
  wget -O $sha1sum.html [other options] "$URL"

Or do just simple counting:

  URL=...
  i=0
  echo "$i $URL" >> URL.db
  wget -O search-$i.html "$URL"

Could this be your solution?

Wolf.
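The SHA1 idea above can be sketched a bit more concretely. This is a minimal, hypothetical wrapper (`url_to_name` and `log_url` are made-up names, not wget features), assuming GNU coreutils' `sha1sum`:

```shell
# url_to_name: derive a short, stable local name from an arbitrary URL.
# The name is the 40-hex-char SHA1 of the URL, so it never exceeds
# filesystem name limits no matter how long the URL is.
url_to_name() {
    printf '%s' "$1" | sha1sum | cut -d' ' -f1
}

# log_url: remember which hash belongs to which URL, since the hash
# alone does not reveal the original request.
log_url() {
    printf '%s %s\n' "$(url_to_name "$1")" "$1" >> SHA1-URL.db
}

# Usage (needs network access, so not executed here):
#   URL='http://search.ebay.de/ws/search/SaleSearch?...'
#   log_url "$URL"
#   wget -O "$(url_to_name "$URL").html" "$URL"
```

Because the same URL always maps to the same name, re-running the job overwrites the previous download instead of accumulating duplicates.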
help!!!
Hello all:

Here is the scenario... I am trying to d/l a webpage, but the issue is there are two authentication challenges. The first is from the web server, so I wanted to use something like

  wget --http-user=user --http-passwd=password http://website.html

Once authenticated with the web server, there is then an application authentication via POST, so I found

  wget --http-post="login=user&password=pw" http://www.yourclient.com/somepage.html

on the GNU site to be an option. Is there a way I can concatenate both challenges into one string when trying to d/l the page? Any help would be greatly appreciated.

PS Jorge: is the list receiving this email? Or am I wasting time sending to the wrong address?

Many thanks,
Richard Emanilov
[EMAIL PROTECTED]
Re: File name too long
On 2005-03-21 15:32, [EMAIL PROTECTED] wrote:

> *** This is not problem of wget, but your filesystem. Try to do
> touch search.ebay.de/wget_W0QQcatrefZ3DC6QQcoactionZ3DcompareQQcoentrypageZ3DsearchQQcopagenumZ3D1QQdfeZ3D20050024QQdfsZ3D20050024QQdfteZ3DQ2d1QQdftsZ3DQ2d1QQfltZ3D9QQfromZ3DR9QQfsooZ3D2QQfsopZ3D2QQsaetmZ3D396614QQsojsZ3D1QQsspagenameZ3DADMEQ3aBQ3aSSQ3aDEQ3a21QQversionZ3D2.html

I'm very sure that my file system has some limits somewhere - but I suppose a web server may create virtual URLs which are too long, or which include illegal characters, for almost any file system around.

The file name here might get repaired by some regex, e.g.

  wget_?catref=C6coaction=comparecoentrypage=searchcopagenum=1dfte=Q2d1dfts=Q2d1flt=9from=R9fsoo=2fsop=2saetm=396614sojs=1sspagename=ADMEQ3aBQ3aSSQ3aDEQ3a21version=2.html

However, I'd be comfortable enough with some fixed length or character limitation, such as a 'trim' extension:

  -tc, --trimcharacter CHAR  cut filename after CHAR, such as _
  -tl, --trimlength NUM      cut filename after NUM characters
  -ts, --trimsuffix NUM      digits used for incremented cut filenames
  -tt, --trimtable FILE      log trimmed file name and original to FILE

For the moment I'd be happy enough with saving to an md5-checksum.html filename instead of a filename too long for my fs. The output log could tell me about the shrunk and the original filename.

> *** Maybe create a SHA1 sum of the request and store the result in
> this file (but you will not know what the original request was, if
> you don't create some DB of requests). Or do just simple counting:
>
>   URL=...
>   sha1sum=$( echo -n "$URL" | sha1sum )
>   echo "$sha1sum $URL" >> SHA1-URL.db
>   wget -O $sha1sum.html [other options] "$URL"
>
> or
>
>   URL=...
>   i=0
>   echo "$i $URL" >> URL.db
>   wget -O search-$i.html "$URL"
>
> Could this be your solution?

Nice idea - I'll give it a try. However, it does not answer the -D problem itself: how to get the result from search.ebay.de and then download the links to cgi.ebay.de in one go:

  wget -kxrE -l1 -D cgi.ebay.de -H $URL

I'm afraid this does require some further awk/sed processing of the result?

Thanks, Martin
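The proposed trim behaviour can also be emulated with a small wrapper today. A sketch under stated assumptions (`trim_name` is a hypothetical helper, not a real wget option): keep the first NUM characters and disambiguate with an incrementing suffix when the short name is already taken, roughly what -tl plus -ts would do:

```shell
# trim_name NAME NUM: truncate NAME to NUM characters; if that file
# already exists, append an incrementing suffix (.1, .2, ...) until
# the candidate name is free, then print it.
trim_name() {
    name=$1
    max=$2
    short=$(printf '%s' "$name" | cut -c1-"$max")
    candidate=$short
    i=0
    while [ -e "$candidate" ]; do
        i=$((i + 1))
        candidate="$short.$i"
    done
    printf '%s\n' "$candidate"
}
```

Logging each result with something like `echo "$candidate $URL" >> trim-table.db` would give roughly the -tt/--trimtable behaviour sketched above.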
Re: File name too long
Martin Trautmann [EMAIL PROTECTED] writes:

> is there a fix when file names are too long?

I'm afraid not. The question here would be: how should Wget know the maximum size of file name the file system supports? I don't think there's a portable way to determine that.

Maybe there should be a way for --restrict-file-names to handle this too.
RE: help!!!
The --post-data option was added in version 1.9. You need to upgrade your version of wget.

Tony

-Original Message-
From: Richard Emanilov [mailto:[EMAIL PROTECTED]
Sent: Monday, March 21, 2005 8:49 AM
To: Tony Lewis; [EMAIL PROTECTED]
Cc: wget@sunsite.dk
Subject: RE: help!!!

  wget --http-user=login --http-passwd=passwd --post-data=login=login&password=passwd https://site
  wget: unrecognized option `--post-data=login=login&password=password'
  Usage: wget [OPTION]... [URL]...

  wget --http-user=login --http-passwd=passwd --http-post=login=login&password=password https://site
  wget: unrecognized option `--http-post=login=login&password=passwd'
  Usage: wget [OPTION]... [URL]...
  Try `wget --help' for more options.

  wget -V
  GNU Wget 1.8.2

Richard Emanilov
[EMAIL PROTECTED]

-Original Message-
From: Tony Lewis [mailto:[EMAIL PROTECTED]
Sent: Monday, March 21, 2005 10:26 AM
To: wget@sunsite.dk
Cc: Richard Emanilov
Subject: RE: help!!!

Richard Emanilov wrote:

> Below is what I have tried with no success:
>
>   wget --http-user=login --http-passwd=passwd --http-post=login=login&password=passwd

That should be:

  wget --http-user=login --http-passwd=passwd --post-data=login=login&password=passwd

Tony
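For the archive, the two challenges do combine into a single invocation. One detail that bites people on the shell: the `&` inside the POST body must be quoted, or the shell treats it as a control operator and backgrounds wget. A sketch with placeholder credentials and a placeholder URL (`auth_fetch_cmd` is a made-up helper that just assembles the command line):

```shell
# auth_fetch_cmd USER PASS URL: print the wget command line that sends
# HTTP Basic auth (web-server challenge) and a POST login form
# (application challenge) in one request. Note the single quotes
# around the --post-data value, protecting the '&' from the shell.
auth_fetch_cmd() {
    printf "wget --http-user=%s --http-passwd=%s --post-data='login=%s&password=%s' %s\n" \
        "$1" "$2" "$1" "$2" "$3"
}
```

Running the printed command requires a wget built with SSL support for https:// URLs, as noted later in this thread.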
Re: File name too long
Martin Trautmann [EMAIL PROTECTED] writes:

> On 2005-03-21 17:13, Hrvoje Niksic wrote:
>> Martin Trautmann [EMAIL PROTECTED] writes:
>>> is there a fix when file names are too long?
>>
>> I'm afraid not. The question here would be, how should Wget know
>> the maximum size of file name the file system supports? I don't
>> think there's a portable way to determine that.
>
> Where did the warning come from that stated 'File name too long'?

I don't think it's a warning; it's an error that came from trying to open the file. By the time this error occurs, it's pretty much too late to change the file name.

> If the writing failed, you'll know for sure that either writing was
> not possible or that the file name was too long.

Exactly -- there can be a number of reasons why opening a file fails, and a too-long file name is only one of them.

>> Maybe there should be a way for --restrict-file-names to handle
>> this too.
>
> I guess the problem is less how to identify too long filenames, but
> more how to handle them.

Identifying them is the harder problem. Imposing an arbitrary limit would hurt file systems with larger limits.

> It might be easier to use e.g. the suggested md5 checksum instead -

It might be useful to have an option that did that. The problem is that it's a very heavy-handed solution -- looking at the file name would no longer provide a hint about which URL the file came from.
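As an aside, POSIX does expose a per-directory name limit through pathconf(_PC_NAME_MAX), and the getconf utility reports it from the shell; whether that is portable enough for Wget to rely on in practice is a separate question:

```shell
# NAME_MAX is the longest single file name (one path component) the
# filesystem holding the given directory accepts; on typical Linux
# filesystems (ext2/3/4) this prints 255.
getconf NAME_MAX .
```

POSIX only guarantees a minimum of 14 ({_POSIX_NAME_MAX}), and on some filesystems the limit can be reported as undefined, which is presumably part of the portability worry.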
Re: help!!!
On Monday 21 March 2005 02:22 pm, Richard Emanilov wrote:

> Guys, thanks so much for your help. When running
>
>   wget --http-user=login --http-passwd=passwd --post-data=login=login&password=passwd https://site
>
> with version 1.9.1, I get the error message "Site: Unsupported scheme."

have you compiled wget with SSL support?

-- 
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi
University of Ferrara - Dept. of Eng.  http://www.ing.unife.it
Institute of Human Machine Cognition   http://www.ihmc.us
Deep Space 6 - IPv6 for Linux          http://www.deepspace6.net
Ferrara Linux User Group               http://www.ferrara.linux.it
RE: help!!!
  /usr/local/bin/wget -dv --post-data='login=login&password=password' https://login:[EMAIL PROTECTED]:8443/ft
  DEBUG output created by Wget 1.9.1 on linux-gnu.
  --17:11:35--  https://login:[EMAIL PROTECTED]:8443/ft
             => `ft'
  Connecting to ip... connected.
  Created socket 3.
  Releasing 0x8123110 (new refcount 0).
  Deleting unused 0x8123110.
  ---request begin---
  POST /ft HTTP/1.0
  User-Agent: Wget/1.9.1
  Host: ip:8443
  Accept: */*
  Connection: Keep-Alive
  Authorization: Basic cm9zZW46Y3VybHlx
  Content-Type: application/x-www-form-urlencoded
  Content-Length: 30
  [POST data: login=login&password=passwd]
  ---request end---
  HTTP request sent, awaiting response... HTTP/1.1 302 Moved Temporarily
  Location: https://login:[EMAIL PROTECTED]:8443/ft
  Content-Length: 0
  Date: Mon, 21 Mar 2005 22:11:35 GMT
  Server: Apache-Coyote/1.1
  Connection: Keep-Alive
  Registered fd 3 for persistent reuse.
  Location: https://ip:8443/ft/ [following]
  Closing fd 3
  Releasing 0x81359a0 (new refcount 0).
  Deleting unused 0x81359a0.
  Invalidating fd 3 from further reuse.
  --17:11:35--  https://ip:8443/ft/
             => `index.html'
  Connecting to ip:8443... connected.
  Created socket 3.
  Releasing 0x8118718 (new refcount 0).
  Deleting unused 0x8118718.
  ---request begin---
  GET /ft/ HTTP/1.0
  User-Agent: Wget/1.9.1
  Host: ip:8443
  Accept: */*
  Connection: Keep-Alive
  ---request end---
  HTTP request sent, awaiting response... HTTP/1.1 401 Unauthorized
  WWW-Authenticate: Basic realm="Portfolio Viewer"
  Content-Type: text/html;charset=ISO-8859-1
  Content-Language: en-US
  Date: Mon, 21 Mar 2005 22:11:35 GMT
  Server: Apache-Coyote/1.1
  Connection: close
  Closing fd 3
  Authorization failed.

Again, I'd like to thank you guys so much; made some progress. Any of you guys familiar with this issue?

-Original Message-
From: Mauro Tortonesi [mailto:[EMAIL PROTECTED]
Sent: Monday, March 21, 2005 4:11 PM
To: Richard Emanilov
Cc: Tony Lewis; wget@sunsite.dk; [EMAIL PROTECTED]
Subject: Re: help!!!
On Monday 21 March 2005 02:22 pm, Richard Emanilov wrote:
> [...]
have you compiled wget with SSL support?
[...]
RE: help!!!
I'm no longer seeing the 401 error; I am now seeing:

  HTTP request sent, awaiting response... HTTP/1.1 501 Not Implemented

I need to end this nightmare!

Richard Emanilov
[EMAIL PROTECTED]

-Original Message-
From: Richard Emanilov
Sent: Monday, March 21, 2005 5:17 PM
To: 'Mauro Tortonesi'
Cc: Tony Lewis; wget@sunsite.dk; [EMAIL PROTECTED]
Subject: RE: help!!!

> [... debug transcript quoted in full; snipped ...]
[... rest of quoted thread snipped ...]