Re: [Bug-wget] downloading links in a dynamic site
On Mon, Jul 26, 2010 at 2:02 PM, Vinh Nguyen wrote:
> On Mon, Jul 26, 2010 at 1:51 PM, Vinh Nguyen wrote:
>> That's displayed in the source. Also, when I try to manually enter
>> the url changing =10, =20, =30, I get the right page, so I don't think
>> it's a javascript issue. What else could it be besides referer and
>> cookies?
>
> Confirmed that it also works in a DIFFERENT browser (conkeror and
> firefox). Hmm, what can be the difference between wget and these
> browsers?

This issue is RESOLVED: put quotes around the url. I thought I had done
that the entire time. Thanks, everyone.

Vinh
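[For anyone finding this thread later: the quoting matters because, in a POSIX shell, an unquoted `&` is the background operator. It terminates the command, so everything after it never reaches wget, and `p_o=10` is executed as a separate variable assignment. A minimal illustration, with `echo` standing in for `wget`:]

```shell
# Unquoted: the shell treats '&' as "run in background", so the command
# receives only the URL up to the '&'. The trailing 'p_o=10' becomes a
# separate shell variable assignment, silently doing nothing useful.
echo http://site.com/?sortorder=asc&p_o=10

# Quoted: the full query string is passed through as one argument.
echo 'http://site.com/?sortorder=asc&p_o=10'
```

[The first command prints only `http://site.com/?sortorder=asc` — which is why every unquoted wget invocation in this thread fetched the same first page regardless of the `p_o` value.]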
Re: [Bug-wget] downloading links in a dynamic site
On Mon, Jul 26, 2010 at 1:51 PM, Vinh Nguyen wrote:
> That's displayed in the source. Also, when I try to manually enter
> the url changing =10, =20, =30, I get the right page, so I don't think
> it's a javascript issue. What else could it be besides referer and
> cookies?

Confirmed that it also works in a DIFFERENT browser (conkeror and
firefox). Hmm, what can be the difference between wget and these
browsers?
Re: [Bug-wget] downloading links in a dynamic site
On Mon, Jul 26, 2010 at 11:18 AM, Keisial wrote:
> Vinh Nguyen wrote:
>> Dear list,
>>
>> My goal is to download some pdf files from a dynamic site (not sure on
>> the terminology). For example, I would execute:
>>
>> wget -U firefox -r -l1 -nd -e robots=off -A '*.pdf,*.pdf.*'
>> http://site.com/?sortorder=asc&p_o=0
>>
>> and would get my 10 pdf files. On the page I can click a "Next" link
>> (to have more files), and I execute:
>>
>> wget -U firefox -r -l1 -nd -e robots=off -A '*.pdf,*.pdf.*'
>> http://site.com/?sortorder=asc&p_o=10
>>
>> However, the downloaded files are identical to the previous. I tried
>> the cookies setting and referer setting:
>>
>> wget -U firefox --cookies=on --keep-session-cookies
>> --save-cookies=cookie.txt -r -l1 -nd -e robots=off -A '*.pdf,*.pdf.*'
>> http://site.com/?sortorder=asc&p_o=0
>> wget -U firefox --referer='http://site.com/?sortorder=asc&p_o=0'
>> --cookies=on --load-cookies=cookie.txt --keep-session-cookies
>> --save-cookies=cookie.txt -r -l1 -nd -e robots=off -A '*.pdf,*.pdf.*'
>> http://site.com/?sortorder=asc&p_o=10
>>
>> but the results again are identical. Any suggestions?
>>
>> Thanks.
>> Vinh
>
> Look at the page source to see how they are generating the urls.
> Maybe they are using some ugly javascript, although that discards
> the benefit of paging...

Thanks for your response, Keisial. I looked at the source and, of
course, there is javascript. However, I couldn't tie it to anything
that generates links. The links that I click on:

32 Chapters
First | 1-10 | 11-20 | 21-30 | 31-32 | Next

That's displayed in the source. Also, when I try to manually enter the
url changing =10, =20, =30, I get the right page, so I don't think it's
a javascript issue. What else could it be besides referer and cookies?

Vinh
Re: [Bug-wget] downloading links in a dynamic site
Vinh Nguyen wrote:
> Dear list,
>
> My goal is to download some pdf files from a dynamic site (not sure on
> the terminology). For example, I would execute:
>
> wget -U firefox -r -l1 -nd -e robots=off -A '*.pdf,*.pdf.*'
> http://site.com/?sortorder=asc&p_o=0
>
> and would get my 10 pdf files. On the page I can click a "Next" link
> (to have more files), and I execute:
>
> wget -U firefox -r -l1 -nd -e robots=off -A '*.pdf,*.pdf.*'
> http://site.com/?sortorder=asc&p_o=10
>
> However, the downloaded files are identical to the previous. I tried
> the cookies setting and referer setting:
>
> wget -U firefox --cookies=on --keep-session-cookies
> --save-cookies=cookie.txt -r -l1 -nd -e robots=off -A '*.pdf,*.pdf.*'
> http://site.com/?sortorder=asc&p_o=0
> wget -U firefox --referer='http://site.com/?sortorder=asc&p_o=0'
> --cookies=on --load-cookies=cookie.txt --keep-session-cookies
> --save-cookies=cookie.txt -r -l1 -nd -e robots=off -A '*.pdf,*.pdf.*'
> http://site.com/?sortorder=asc&p_o=10
>
> but the results again are identical. Any suggestions?
>
> Thanks.
> Vinh

Look at the page source to see how they are generating the urls.
Maybe they are using some ugly javascript, although that discards
the benefit of paging...
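[One quick way to follow that advice from the command line is to dump the page and list its href attributes. A minimal sketch — the piped-in HTML below is a hypothetical stand-in for the fetched page; in practice you would produce the input with `wget -q -O - '<url>'` (URL quoted):]

```shell
# Extract every href attribute from an HTML stream. In real use, replace
# the printf with:  wget -q -O - 'http://site.com/?sortorder=asc&p_o=0'
# (the file names below are made up for illustration).
printf '%s\n' '<a href="/files/ch01.pdf">Ch 1</a> <a href="?sortorder=asc&p_o=10">Next</a>' \
  | grep -o 'href="[^"]*"'
```

[If the paging links show up here as plain hrefs, they are in the static source and wget can follow them; if they only appear after script execution, wget alone cannot.]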
[Bug-wget] downloading links in a dynamic site
Dear list,

My goal is to download some pdf files from a dynamic site (not sure on
the terminology). For example, I would execute:

wget -U firefox -r -l1 -nd -e robots=off -A '*.pdf,*.pdf.*'
http://site.com/?sortorder=asc&p_o=0

and would get my 10 pdf files. On the page I can click a "Next" link
(to have more files), and I execute:

wget -U firefox -r -l1 -nd -e robots=off -A '*.pdf,*.pdf.*'
http://site.com/?sortorder=asc&p_o=10

However, the downloaded files are identical to the previous. I tried
the cookies setting and referer setting:

wget -U firefox --cookies=on --keep-session-cookies
--save-cookies=cookie.txt -r -l1 -nd -e robots=off -A '*.pdf,*.pdf.*'
http://site.com/?sortorder=asc&p_o=0
wget -U firefox --referer='http://site.com/?sortorder=asc&p_o=0'
--cookies=on --load-cookies=cookie.txt --keep-session-cookies
--save-cookies=cookie.txt -r -l1 -nd -e robots=off -A '*.pdf,*.pdf.*'
http://site.com/?sortorder=asc&p_o=10

but the results again are identical. Any suggestions?

Thanks.
Vinh