Re: [Bug-wget] downloading links in a dynamic site

2010-07-26 Thread Keisial
 Vinh Nguyen wrote:
 Dear list,

 My goal is to download some pdf files from a dynamic site (not sure on
 the terminology).  For example, I would execute:

 wget -U firefox -r -l1 -nd -e robots=off -A '*.pdf,*.pdf.*'
 http://site.com/?sortorder=asc&p_o=0

 and would get my 10 pdf files.  On the page I can click a Next link
 (to have more files), and I execute:

 wget -U firefox -r -l1 -nd -e robots=off -A '*.pdf,*.pdf.*'
 http://site.com/?sortorder=asc&p_o=10

 However, the downloaded files are identical to the previous.  I tried
 the cookies setting and referer setting:

 wget -U firefox --cookies=on --keep-session-cookies
 --save-cookies=cookie.txt -r -l1 -nd -e robots=off -A '*.pdf,*.pdf.*'
 http://site.com/?sortorder=asc&p_o=0
 wget -U firefox --referer='http://site.com/?sortorder=asc&p_o=0'
 --cookies=on --load-cookies=cookie.txt --keep-session-cookies
 --save-cookies=cookie.txt -r -l1 -nd -e robots=off -A '*.pdf,*.pdf.*'
 http://site.com/?sortorder=asc&p_o=10

 but the results again are identical.  Any suggestions?

 Thanks.
 Vinh

Look at the page source to see how they are generating the URLs.
Maybe they are using some ugly JavaScript, although that would defeat
the benefit of paging...
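
One way to do that check from the command line is to save the page and list the link targets it contains. The following is only a rough sketch: page.html, its contents, and the link paths are made-up stand-ins for the real listing page, which in practice would be saved first with something like wget -q -O page.html followed by the quoted URL.

```shell
# Hypothetical saved copy of the listing page (contents invented for illustration).
cat > page.html <<'EOF'
<a href="/files/a.pdf">A</a>
<a href="/?sortorder=asc&amp;p_o=10">Next</a>
EOF

# Pull out every href value, one per line, so the paging URLs can be inspected.
links=$(grep -o 'href="[^"]*"' page.html | sed 's/^href="//; s/"$//')
printf '%s\n' "$links"
```

If the Next link shows up here as a plain URL, the paging is done server-side and JavaScript is probably not involved.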




Re: [Bug-wget] downloading links in a dynamic site

2010-07-26 Thread Vinh Nguyen
On Mon, Jul 26, 2010 at 1:51 PM, Vinh Nguyen vinhdi...@gmail.com wrote:
 That's displayed in the source.  Also, when I try to manually enter
 the URL, changing =10, =20, =30, I get the right page, so I don't think
 it's a JavaScript issue.  What else could it be besides referer and
 cookies?

Confirmed that it also works in a DIFFERENT browser (conkeror and
firefox).  Hmm, what can be the difference between wget and these
browsers?



Re: [Bug-wget] downloading links in a dynamic site

2010-07-26 Thread Vinh Nguyen
This issue is RESOLVED.  The fix was to put quotes around the URL.  I
thought I had been doing this the entire time.  Thanks everyone.

Vinh
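
For readers wondering why the quotes matter: in a POSIX shell an unquoted & ends the command and runs it in the background, so anything after it never reaches wget. The sketch below is an illustration only and was not posted in the thread. show_url is a hypothetical stand-in for wget, and the & between the two query parameters is an assumption (the archived URL reads ascp_o, which looks like a stripped &).

```shell
# Stand-in for wget: report the URL argument the command actually receives.
show_url() { printf 'fetching: %s\n' "$1"; }

# Quoted: the whole URL, including &p_o=10, arrives as one argument.
quoted=$(show_url 'http://site.com/?sortorder=asc&p_o=10')

# Unquoted: the shell stops the command at &, backgrounds it, and then
# runs p_o=10 as a separate variable assignment.
unquoted=$(show_url http://site.com/?sortorder=asc&p_o=10)

printf '%s\n' "$quoted"    # fetching: http://site.com/?sortorder=asc&p_o=10
printf '%s\n' "$unquoted"  # fetching: http://site.com/?sortorder=asc
```

Under that assumption every unquoted invocation requests the same first page regardless of the p_o value, which matches the identical downloads described above. The working form of the second command would be: wget -U firefox -r -l1 -nd -e robots=off -A '*.pdf,*.pdf.*' 'http://site.com/?sortorder=asc&p_o=10'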



[Bug-wget] downloading links in a dynamic site

2010-07-25 Thread Vinh Nguyen
Dear list,

My goal is to download some pdf files from a dynamic site (not sure on
the terminology).  For example, I would execute:

wget -U firefox -r -l1 -nd -e robots=off -A '*.pdf,*.pdf.*'
http://site.com/?sortorder=asc&p_o=0

and would get my 10 pdf files.  On the page I can click a Next link
(to have more files), and I execute:

wget -U firefox -r -l1 -nd -e robots=off -A '*.pdf,*.pdf.*'
http://site.com/?sortorder=asc&p_o=10

However, the downloaded files are identical to the previous.  I tried
the cookies setting and referer setting:

wget -U firefox --cookies=on --keep-session-cookies
--save-cookies=cookie.txt -r -l1 -nd -e robots=off -A '*.pdf,*.pdf.*'
http://site.com/?sortorder=asc&p_o=0
wget -U firefox --referer='http://site.com/?sortorder=asc&p_o=0'
--cookies=on --load-cookies=cookie.txt --keep-session-cookies
--save-cookies=cookie.txt -r -l1 -nd -e robots=off -A '*.pdf,*.pdf.*'
http://site.com/?sortorder=asc&p_o=10

but the results again are identical.  Any suggestions?

Thanks.
Vinh