Re: [Bug-wget] CNET download links not working with WGET
No, it's not working. It downloads part of the URL and creates a file named 3001-8022_4-10804572.html@spi=077d9109e846975d0db9532bd610588f.1, which is 68 KB. I cannot get wget to treat the string of characters as a whole URL.

Please help; I really need to get this script working, and the only place to download this file is from CNET.
Re: [Bug-wget] CNET download links not working with WGET
If you read the most recent output of wget that you gave (after quoting the URL), it _does_ treat the string of characters as a whole URL. The server redirects it to a shorter URL. If I enter that same URL into a browser, it does the same redirection there and results in an HTML page, just like what wget gets.

That page seems to have some JavaScript or something that initiates a separate download of something else; I suppose that something else is what you wanted. As you may know, wget doesn't execute JavaScript code from a webpage, so you'll need to find the real URL of the thing you wanted to download and feed that to wget.

-mjc

On 06/08/2011 09:38 AM, Jeff Givens wrote:

No, it's not working. It downloads part of the URL and creates a file named 3001-8022_4-10804572.html@spi=077d9109e846975d0db9532bd610588f.1, which is 68 KB. I cannot get wget to treat the string of characters as a whole URL.

Please help; I really need to get this script working, and the only place to download this file is from CNET.
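One way to start looking for the "real URL" Micah mentions is to scan the HTML page that wget saved for links that look like downloadable files. The following is only a minimal sketch: `LinkCollector`, `candidate_downloads`, the suffix list, and the sample HTML (including its `example.invalid` address) are all hypothetical and not taken from the CNET page.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the values of href and src attributes from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def candidate_downloads(html_text, suffixes=(".exe", ".zip", ".msi")):
    """Return links whose path (ignoring any query string) ends in a file suffix."""
    collector = LinkCollector()
    collector.feed(html_text)
    return [
        link
        for link in collector.links
        if link.lower().split("?", 1)[0].endswith(suffixes)
    ]


# Invented sample page, used only to show the call; not the real CNET markup.
SAMPLE_HTML = (
    '<html><body><a href="/help.html">Help</a>'
    '<a href="http://example.invalid/files/setup.exe?id=1">Download</a>'
    "</body></html>"
)
found = candidate_downloads(SAMPLE_HTML)
```

If the download address is assembled by the page's script rather than written into the markup, a scan like this will find nothing, and watching the browser's network requests is the more likely route.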
Re: [Bug-wget] CNET download links not working with WGET
Micah, thanks for your help. That was the piece that I was missing. I didn't realize it was redirecting to another site. I was able to find the other site it was going to and download the executable, and then I just put in a command to rename the exe file, since it was named after the URL.

Thanks again for your help.
Re: [Bug-wget] CNET download links not working with WGET
So... looks like it works, then. Your command shell isn't complaining about weird command names, wget is clearly requesting the full and correct URL, it follows redirections, and it saves using the final redirection URL (the latest sources wouldn't follow that last step - they'd save using the request URI by default).

If you dislike the filename, then, provided you have a recent enough version of wget, you can add the --content-disposition option if the server provides a rename header (Content-Disposition); or else use -E to have wget force the file name to end in .html.

-mjc

(05/26/2011 12:19 PM), Jeff Givens wrote:

Hi,

I know this is an older topic, but thanks for replying. I forgot to mention I had already tried what you listed below, and this is the output I get:

C:\DOWNLOAD>wget "http://dw.com.com/redir?edId=3&siteId=4&oId=3000-8022_4-10804572&ontId=8022_4&spi=077d9109e846975d0db9532bd610588f&lop=link&tag=tdw_dltext&ltype=dl_dlnow&pid=11665648&mfgId=6290020&merId=6290020&pguid=HFsQLwoOYJQAABuImQcAAAGm&destUrl=http%3A%2F%2Fdownload.cnet.com%2F3001-8022_4-10804572.html%3Fspi%3D077d9109e846975d0db9532bd610588f"
--2011-05-02 12:34:20--  http://dw.com.com/redir?edId=3&siteId=4&oId=3000-8022_4-10804572&ontId=8022_4&spi=077d9109e846975d0db9532bd610588f&lop=link&tag=tdw_dltext&ltype=dl_dlnow&pid=11665648&mfgId=6290020&merId=6290020&pguid=HFsQLwoOYJQAABuImQcAAAGm&destUrl=http%3A%2F%2Fdownload.cnet.com%2F3001-8022_4-10804572.html%3Fspi%3D077d9109e846975d0db9532bd610588f
Resolving dw.com.com... 216.239.113.95
Connecting to dw.com.com|216.239.113.95|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://download.cnet.com/3001-8022_4-10804572.html?spi=077d9109e846975d0db9532bd610588f [following]
--2011-05-02 12:34:21--  http://download.cnet.com/3001-8022_4-10804572.html?spi=077d9109e846975d0db9532bd610588f
Resolving download.cnet.com... 64.30.224.58
Connecting to download.cnet.com|64.30.224.58|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `3001-8022_4-10804572.html@spi=077d9109e846975d0db9532bd610588f.1'

    [ <=> ] 69,240      77.3K/s   in 0.9s

2011-05-02 12:34:22 (77.3 KB/s) - `3001-8022_4-10804572.html@spi=077d9109e846975d0db9532bd610588f.1' saved [69240]

C:\DOWNLOAD>

Thanks for your help.

-Jeff
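The odd-looking file name in the wget output can be explained with a rough model: take the last path segment of the final URL, append the query string (on Windows, wget's file-name restrictions use `@` in place of `?`), and add a numeric suffix if a file of that name already exists. The sketch below is an approximation of the behavior visible in the log, not wget's actual implementation; `local_name` and its parameters are made up for illustration.

```python
from urllib.parse import urlsplit


def local_name(final_url, existing=(), windows=True):
    """Approximate the local file name chosen for a URL (illustrative only)."""
    parts = urlsplit(final_url)
    # Last path segment, or index.html when the URL ends in a slash.
    name = parts.path.rsplit("/", 1)[-1] or "index.html"
    if parts.query:
        # On Windows '?' is not allowed in file names, so '@' stands in for it.
        name += ("@" if windows else "?") + parts.query
    if name not in existing:
        return name
    # Name already taken: try name.1, name.2, ... until one is free.
    n = 1
    while f"{name}.{n}" in existing:
        n += 1
    return f"{name}.{n}"


FINAL_URL = (
    "http://download.cnet.com/3001-8022_4-10804572.html"
    "?spi=077d9109e846975d0db9532bd610588f"
)
first = local_name(FINAL_URL)
second = local_name(FINAL_URL, existing={first})
```

Under this model the trailing `.1` in the log suggests an earlier download of the same page was already sitting in the directory. To choose the name directly, wget's `-O` option sets the output file explicitly.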
Re: [Bug-wget] CNET download links not working with WGET
hello,

the '&' character in the url is interpreted by your shell. Try using something like:

wget "URL"

Cheers,
Giuseppe

Jeff Givens <j...@sds.net> writes:

Hello,

I am having an issue downloading files via download links from CNET. It appears to locate some of the URL but stops at the first siteId part. I have included the debug information as well. Thanks in advance for your help.

C:\DOWNLOAD\wget http://dw.com.com/redir?edId=3&siteId=4&oId=3000-8022_4-10804572&ontId=8022_4&spi=077d9109e846975d0db9532bd610588f&lop=link&tag=tdw_dltext&ltype=dl_dlnow&pid=11665648&mfgId=6290020&merId=6290020&pguid=HFsQLwoOYJQAABuImQcAAAGm&destUrl=http%3A%2F%2Fdownload.cnet.com%2F3001-8022_4-10804572.html%3Fspi%3D077d9109e846975d0db9532bd610588f
--2011-04-19 11:30:35--  http://dw.com.com/redir?edId=3
Resolving dw.com.com... 216.239.113.95
Connecting to dw.com.com|216.239.113.95|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://dw.com.com/redir/redx/?edId=3 [following]
--2011-04-19 11:30:36--  http://dw.com.com/redir/redx/?edId=3
Reusing existing connection to dw.com.com:80.
HTTP request sent, awaiting response... 404 Not Found
2011-04-19 11:30:36 ERROR 404: Not Found.

'siteId' is not recognized as an internal or external command, operable program or batch file.
'oId' is not recognized as an internal or external command, operable program or batch file.
'ontId' is not recognized as an internal or external command, operable program or batch file.
'spi' is not recognized as an internal or external command, operable program or batch file.
'lop' is not recognized as an internal or external command, operable program or batch file.
'tag' is not recognized as an internal or external command, operable program or batch file.
'ltype' is not recognized as an internal or external command, operable program or batch file.
'pid' is not recognized as an internal or external command, operable program or batch file.
'mfgId' is not recognized as an internal or external command, operable program or batch file.
'merId' is not recognized as an internal or external command, operable program or batch file.
'pguid' is not recognized as an internal or external command, operable program or batch file.
'destUrl' is not recognized as an internal or external command, operable program or batch file.

DEBUG output created by Wget 1.11.4 on Windows-MSVC.

--2011-04-19 11:27:09--  http://dw.com.com/redir?edId=3
Resolving dw.com.com... seconds 0.00, 64.30.224.42
Caching dw.com.com => 64.30.224.42
Connecting to dw.com.com|64.30.224.42|:80... seconds 0.00, connected.
Created socket 340.
Releasing 0x01411158 (new refcount 1).

---request begin---
GET /redir?edId=3 HTTP/1.0
User-Agent: Wget/1.11.4
Accept: */*
Host: dw.com.com
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 302 Found
Date: Tue, 19 Apr 2011 15:27:26 GMT
Server: Apache/2.0
Pragma: no-cache
Cache-control: no-cache, must-revalidate, no-transform
Vary: *
Expires: Fri, 23 Jan 1970 12:12:12 GMT
Set-Cookie: XCLGFbrowser=Cg5iVk2tqd6J8Sg; expires=Sun, 18-Apr-2021 15:27:26 GMT; domain=.com.com; path=/
Location: http://dw.com.com/redir/redx/?edId=3
Content-Length: 0
P3P: CP=CAO DSP COR CURa ADMa DEVa PSAa PSDa IVAi IVDi CONi OUR OTRi IND PHY ONL UNI FIN COM NAV INT DEM STA
Keep-Alive: timeout=363, max=760
Connection: Keep-Alive
Content-Type: text/plain
---response end---
302 Found
Registered socket 340 for persistent reuse.
cdm: 1 2 3 4 5 6 7 8
Stored cookie com.com -1 (ANY) / <permanent> <insecure> [expiry 2021-04-18 11:27:26] XCLGFbrowser Cg5iVk2tqd6J8Sg
Location: http://dw.com.com/redir/redx/?edId=3 [following]
Skipping 0 bytes of body: [] done.
--2011-04-19 11:27:09--  http://dw.com.com/redir/redx/?edId=3
Reusing existing connection to dw.com.com:80.
Reusing fd 340.

---request begin---
GET /redir/redx/?edId=3 HTTP/1.0
User-Agent: Wget/1.11.4
Accept: */*
Host: dw.com.com
Connection: Keep-Alive
Cookie: XCLGFbrowser=Cg5iVk2tqd6J8Sg
---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 404 Not Found
Date: Tue, 19 Apr 2011 15:27:26 GMT
Server: Apache/2.0
Content-Length: 209
Keep-Alive: timeout=363, max=779
Connection: Keep-Alive
Content-Type: text/html; charset=iso-8859-1
---response end---
404 Not Found
Skipping 209 bytes of body: [<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
<p>The requested URL /redir/redx/ was not found on this server.</p>
</body></html>
] done.
2011-04-19 11:27:09 ERROR 404: Not Found.
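Giuseppe's point about the shell can be checked against the error messages in the original report. The sketch below is a deliberately rough model, assuming only that an unquoted `&` ends one command and starts the next; it is not an emulation of cmd.exe. The `&` positions in `URL` are inferred from the "is not recognized" messages, and `split_like_unquoted_shell` and `wget_argv` are hypothetical helper names.

```python
from urllib.parse import parse_qs, urlsplit

# URL from the thread, with '&' separators inferred from the shell errors.
URL = (
    "http://dw.com.com/redir?edId=3&siteId=4&oId=3000-8022_4-10804572"
    "&ontId=8022_4&spi=077d9109e846975d0db9532bd610588f&lop=link"
    "&tag=tdw_dltext&ltype=dl_dlnow&pid=11665648&mfgId=6290020"
    "&merId=6290020&pguid=HFsQLwoOYJQAABuImQcAAAGm"
    "&destUrl=http%3A%2F%2Fdownload.cnet.com%2F3001-8022_4-10804572.html"
    "%3Fspi%3D077d9109e846975d0db9532bd610588f"
)


def split_like_unquoted_shell(command_line):
    """Rough model: each unquoted '&' ends one command and starts another."""
    first, *rest = command_line.split("&")
    # The text before '=' in each leftover piece is what the shell tries to run.
    stray_names = [piece.split("=", 1)[0] for piece in rest]
    return first, stray_names


def wget_argv(url):
    """Argument list suitable for subprocess.run(); no shell parses the URL."""
    return ["wget", url]


seen_by_wget, stray_commands = split_like_unquoted_shell("wget " + URL)
# The destUrl parameter, percent-decoded, is where the redirect leads.
dest_url = parse_qs(urlsplit(URL).query)["destUrl"][0]
```

In this model wget only ever sees `http://dw.com.com/redir?edId=3`, which matches the first request in the log, and the leftover names match the twelve shell errors. Passing the URL as a single list element, as `wget_argv` does, is one way a script can sidestep shell quoting altogether.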