Re: [Bug-wget] downloading links in a dynamic site

2010-07-26 Thread Vinh Nguyen
On Mon, Jul 26, 2010 at 2:02 PM, Vinh Nguyen  wrote:
> On Mon, Jul 26, 2010 at 1:51 PM, Vinh Nguyen  wrote:
>> That's displayed in the source.  Also, when I manually enter the URL
>> with the offset changed to =10, =20, =30, I get the right page, so I
>> don't think it's a JavaScript issue.  What else could it be besides
>> the referer and cookies?
>
> Confirmed that it also works in DIFFERENT browsers (Conkeror and
> Firefox).  Hmm, what could the difference be between wget and these
> browsers?

This issue is RESOLVED.  The fix was to put quotes around the URL, so
the shell doesn't split the command at the '&'.  I thought I had done
this the entire time.  Thanks everyone.
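For anyone hitting the same thing: without quotes, the shell treats the '&' in the query string as its background operator, so wget never sees the p_o parameter at all.  A minimal sketch of the behavior (using a stand-in function instead of wget, and the placeholder site.com URL from this thread):

```shell
#!/bin/sh
# show_args stands in for wget; it just prints each argument it receives.
show_args() { printf '[%s]\n' "$@"; }

# Unquoted: '&' backgrounds the command, which receives only
# 'http://site.com/?sortorder=asc'.  The trailing 'p_o=10' runs
# separately as a shell variable assignment -- the offset is lost.
show_args http://site.com/?sortorder=asc&p_o=10
wait

# Quoted: the whole query string is passed as one intact argument.
show_args 'http://site.com/?sortorder=asc&p_o=10'
```

Single or double quotes both work; either keeps the shell from splitting the URL at '&' (and from glob-expanding the '?').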

Vinh



Re: [Bug-wget] downloading links in a dynamic site

2010-07-26 Thread Vinh Nguyen
On Mon, Jul 26, 2010 at 1:51 PM, Vinh Nguyen  wrote:
> That's displayed in the source.  Also, when I manually enter the URL
> with the offset changed to =10, =20, =30, I get the right page, so I
> don't think it's a JavaScript issue.  What else could it be besides
> the referer and cookies?

Confirmed that it also works in DIFFERENT browsers (Conkeror and
Firefox).  Hmm, what could the difference be between wget and these
browsers?



Re: [Bug-wget] downloading links in a dynamic site

2010-07-26 Thread Vinh Nguyen
On Mon, Jul 26, 2010 at 11:18 AM, Keisial  wrote:
>  Vinh Nguyen wrote:
>> Dear list,
>>
>> My goal is to download some PDF files from a dynamic site (not sure
>> of the terminology).  For example, I would execute:
>>
>> wget -U firefox -r -l1 -nd -e robots=off -A '*.pdf,*.pdf.*'
>> http://site.com/?sortorder=asc&p_o=0
>>
>> and would get my 10 pdf files.  On the page I can click a "Next" link
>> (to have more files), and I execute:
>>
>> wget -U firefox -r -l1 -nd -e robots=off -A '*.pdf,*.pdf.*'
>> http://site.com/?sortorder=asc&p_o=10
>>
>> However, the downloaded files are identical to the previous ones.  I
>> tried the cookies and referer settings:
>>
>> wget -U firefox --cookies=on --keep-session-cookies
>> --save-cookies=cookie.txt -r -l1 -nd -e robots=off -A '*.pdf,*.pdf.*'
>> http://site.com/?sortorder=asc&p_o=0
>> wget -U firefox --referer='http://site.com/?sortorder=asc&p_o=0'
>> --cookies=on --load-cookies=cookie.txt --keep-session-cookies
>> --save-cookies=cookie.txt -r -l1 -nd -e robots=off -A '*.pdf,*.pdf.*'
>> http://site.com/?sortorder=asc&p_o=10
>>
>> but the results again are identical.  Any suggestions?
>>
>> Thanks.
>> Vinh
>
> Look at the page source to see how they are generating the URLs.
> Maybe they are using some ugly JavaScript, although that would defeat
> the benefit of paging...


Thanks for your response, Keisial.  I looked at the source and, of
course, there is JavaScript.  However, I couldn't tie it to anything
that generates the links.  The links that I click on:

32 Chapters    First | 1-10 | 11-20 | 21-30 | 31-32 | Next

That's displayed in the source.  Also, when I manually enter the URL
with the offset changed to =10, =20, =30, I get the right page, so I
don't think it's a JavaScript issue.  What else could it be besides
the referer and cookies?

Vinh



Re: [Bug-wget] downloading links in a dynamic site

2010-07-26 Thread Keisial
 Vinh Nguyen wrote:
> Dear list,
>
> My goal is to download some PDF files from a dynamic site (not sure
> of the terminology).  For example, I would execute:
>
> wget -U firefox -r -l1 -nd -e robots=off -A '*.pdf,*.pdf.*'
> http://site.com/?sortorder=asc&p_o=0
>
> and would get my 10 pdf files.  On the page I can click a "Next" link
> (to have more files), and I execute:
>
> wget -U firefox -r -l1 -nd -e robots=off -A '*.pdf,*.pdf.*'
> http://site.com/?sortorder=asc&p_o=10
>
> However, the downloaded files are identical to the previous ones.  I
> tried the cookies and referer settings:
>
> wget -U firefox --cookies=on --keep-session-cookies
> --save-cookies=cookie.txt -r -l1 -nd -e robots=off -A '*.pdf,*.pdf.*'
> http://site.com/?sortorder=asc&p_o=0
> wget -U firefox --referer='http://site.com/?sortorder=asc&p_o=0'
> --cookies=on --load-cookies=cookie.txt --keep-session-cookies
> --save-cookies=cookie.txt -r -l1 -nd -e robots=off -A '*.pdf,*.pdf.*'
> http://site.com/?sortorder=asc&p_o=10
>
> but the results again are identical.  Any suggestions?
>
> Thanks.
> Vinh

Look at the page source to see how they are generating the URLs.
Maybe they are using some ugly JavaScript, although that would defeat
the benefit of paging...
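One quick way to do that check from the command line (a sketch only; site.com is the placeholder host from this thread): fetch the page and list the hrefs it actually contains.  Links generated purely by JavaScript won't show up here.  Note the quotes around the URL, without which the shell splits the command at '&'.

```shell
# Dump the page source to stdout (-q quiet, -O - write to stdout)
# and extract every href attribute from it.
wget -qO- 'http://site.com/?sortorder=asc&p_o=10' \
  | grep -oE 'href="[^"]*"'
```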




[Bug-wget] downloading links in a dynamic site

2010-07-25 Thread Vinh Nguyen
Dear list,

My goal is to download some PDF files from a dynamic site (not sure
of the terminology).  For example, I would execute:

wget -U firefox -r -l1 -nd -e robots=off -A '*.pdf,*.pdf.*'
http://site.com/?sortorder=asc&p_o=0

and would get my 10 pdf files.  On the page I can click a "Next" link
(to have more files), and I execute:

wget -U firefox -r -l1 -nd -e robots=off -A '*.pdf,*.pdf.*'
http://site.com/?sortorder=asc&p_o=10

However, the downloaded files are identical to the previous ones.  I
tried the cookies and referer settings:

wget -U firefox --cookies=on --keep-session-cookies
--save-cookies=cookie.txt -r -l1 -nd -e robots=off -A '*.pdf,*.pdf.*'
http://site.com/?sortorder=asc&p_o=0
wget -U firefox --referer='http://site.com/?sortorder=asc&p_o=0'
--cookies=on --load-cookies=cookie.txt --keep-session-cookies
--save-cookies=cookie.txt -r -l1 -nd -e robots=off -A '*.pdf,*.pdf.*'
http://site.com/?sortorder=asc&p_o=10

but the results again are identical.  Any suggestions?

Thanks.
Vinh