Following is a message that was forwarded to me by someone who was
having difficulties posting to the list. Please Cc the author in your
replies.

I asked augustin to talk to [EMAIL PROTECTED] about the difficulties in
posting to the list (I think a move to gnu.org really needs to happen
RSN). I also recommended removing the space around the "=" in options
like "--referer=...", and using the --debug flag to check wget's request
headers against the target set. Hopefully this will solve the problem,
but if anyone has additional advice, feel free to share it.
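
As a sketch of that advice (not a tested fix for the YouTube case), the
first call in the forwarded message could be written with no space
around "=" and with the URL quoted, so that the shell does not treat "&"
as a background operator. The function name fetch_page and the log file
name wget-debug.log are made up for illustration; the URL, referer,
user agent and POST data are copied from the message below.

```shell
#!/bin/sh
# Sketch only: values are copied from the forwarded message; fetch_page and
# wget-debug.log are illustrative names, not part of wget.
url='http://tw.youtube.com/profile?action_ajax=1&user=BarackObamadotcom&box_method=draw_page_internal&box_name=user_videos'
referer='http://tw.youtube.com/profile?user=BarackObamadotcom&view=videos'
agent='Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.17) Gecko/20080924 Ubuntu/8.04 (hardy) Firefox/2.0.0.17'
post='session_token=&messages=[{"type":"box_method","request":{"name":"user_videos","user_id":27679989,"style":"None","x_position":1,"y_position":24,"method":"draw_page_internal","params":{"start":40,"num":20,"view_all_mode":"True","sort":"p"}}}]'

fetch_page() {
    # Each option is a single shell word: "--name=value", no space around "=".
    # --debug writes the request wget actually sent into the log file.
    wget --debug --output-file=wget-debug.log \
        --keep-session-cookies \
        --load-cookies=cookie.txt \
        --save-cookies=cookie.txt \
        --referer="$referer" \
        --user-agent="$agent" \
        --post-data="$post" \
        "$url"
}
```

Calling fetch_page sends the request; wget-debug.log can then be checked
against the headers captured in the browser.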

-Micah

----------  Forwarded Message  ----------

Subject: Downloading video list from youtube profile.
Date: Friday 31 October 2008
From: augustin <[EMAIL PROTECTED]>
To: wget@sunsite.dk



Hello,

I am trying to use wget to download the video list from a youtube
profile, but youtube uses some AJAX and there is no direct download link
to use, which makes the task a bit complicated.

I tried to subscribe to this list but my subscription was refused with the
following message:
<[EMAIL PROTECTED]>: host a.mx.sunsite.dk[130.225.254.106] said: 550
    5.7.1 Blocked by SpamAssassin (in reply to end of DATA command)
Therefore, I am NOT subscribed to this list and would appreciate it if
you could CC me in your reply.


If you point your browser to:
http://tw.youtube.com/profile?user=BarackObamadotcom&view=videos
you will invariably be pointed to the first page of videos.
You can see at the bottom that there are more pages. Clicking on any
subsequent page will call some AJAX script which will refresh the inside of
the page.

There is no direct way to get a link to download, say, the list of
videos on page 38. Even manually, that would be tedious, because you can
only click on the largest page number shown and hop from page to page
toward the end of the list.

I am using Firefox and the very good Firebug extension to get a clue
about what's happening behind the scenes. That way I can see the full
headers of the AJAX request and of the reply, and I use them to try to
replicate the same request with wget.
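
For that comparison it helps to see only the request that wget itself
sent. A minimal sketch, assuming wget was run with --debug and its output
saved to a log file, and assuming the debug output brackets the request
with the "---request begin---" and "---request end---" markers; the
helper name show_request is made up for illustration.

```shell
#!/bin/sh
# Print only the request section of a wget --debug log, so it can be read
# side by side with the headers shown by Firebug.
# show_request is an illustrative helper name; the log path is its argument.
show_request() {
    sed -n '/---request begin---/,/---request end---/p' "$1"
}
```

For example, after "wget --debug --output-file=wget-debug.log ...",
running "show_request wget-debug.log" prints the request line and the
headers wget sent.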


Here is a sample HEADER for a request:

Host    tw.youtube.com
User-Agent      Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.17) Gecko/20080924 Ubuntu/8.04 (hardy) Firefox/2.0.0.17
Accept  text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language en-us,en;q=0.5
Accept-Encoding gzip,deflate
Accept-Charset  ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive      300
Connection      keep-alive
Content-Type    application/x-www-form-urlencoded
Referer http://tw.youtube.com/profile?user=BarackObamadotcom&view=videos
Content-Length  422
Cookie  use_hitbox=72c46ff6cbcdb7c5585c36411b6b334edAEAAAAw; VISITOR_INFO1_LIVE=fvxHpXl_mLY; PREF=f1=11000000&gl=TW&hl=zh-TW; GEO=89297d4335cbe1b88e16edc35e26fbedcwwAAAAyVFfbRIilAP95Ckk=; __utma=207772311.1297267009699863800.1225423366.1225423366.1225426494.2; __utmc=207772311; __utmz=207772311.1225423366.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); watched_video_id_list=91c191d2b5f06f2215ec8f1813faa796WwEAAABzCwAAAE1nNTZLYnRtQVJj; __utmb=207772311.1.10.1225426494
Pragma  no-cache
Cache-Control   no-cache

With the POST information:

messages        [{"type":"box_method","request":{"name":"user_videos","user_id":27679989,"style":"None","x_position":1,"y_position":24,"method":"draw_page_internal","params":{"start":80,"num":20,"view_all_mode":"True","sort":"p"}}}]
session_token   

With some PARAMS which I don't know how to use:

action_ajax     1
box_method      draw_page_internal
box_name        user_videos
user    BarackObamadotcom



Finally, here is my wget call, attempting to replicate the above request:



wget \
--keep-session-cookies \
--post-data = 'session_token=&messages=[{"type":"box_method","request":{"name":"user_videos","user_id":27679989,"style":"None","x_position":1,"y_position":24,"method":"draw_page_internal","params":{"start":40,"num":20,"view_all_mode":"True","sort":"p"}}}]' \
--save-cookies cookie.txt \
--load-cookies cookie.txt \
--referer = "http://tw.youtube.com/profile?user=BarackObamadotcom&view=videos"; \
--user-agent="Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.17) Gecko/20080924 Ubuntu/8.04 (hardy) Firefox/2.0.0.17" \
http://tw.youtube.com/profile?action_ajax=1&user=BarackObamadotcom&box_method=draw_page_internal&box_name=user_videos
;


or:


wget \
--keep-session-cookies \
--post-data = 'action_ajax=1&user=BarackObamadotcom&box_method=draw_page_internal&box_name=user_videos
session_token=FRR-1WmHVPCLE6r3ImZ48PqDSrF8MTIyNTUxMjg4OQ==&messages=%5B%7B%22type%22%3A%22box_method%22%2C%22request%22%3A%7B%22name%22%3A%22user_videos%22%2C%22user_id%22%3A27679989%2C%22style%22%3A%22None%22%2C%22x_position%22%3A1%2C%22y_position%22%3A24%2C%22method%22%3A%22draw_page_internal%22%2C%22params%22%3A%7B%22start%22%3A80%2C%22num%22%3A20%2C%22view_all_mode%22%3A%22True%22%2C%22sort%22%3A%22p%22%7D%7D%7D%5D' \
--save-cookies cookie.txt \
--load-cookies cookie.txt \
--referer = "http://tw.youtube.com/profile?user=BarackObamadotcom&view=videos"; \
--user-agent="Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.17) Gecko/20080924 Ubuntu/8.04 (hardy) Firefox/2.0.0.17" \
http://tw.youtube.com/profile?action_ajax=1&user=BarackObamadotcom&box_method=draw_page_internal&box_name=user_videos
;


I've tried various other combinations but all failed.






You can try yourself. I can't manage to download the second page or any
subsequent page. I only ever get the content of the first page in return.


I don't know what I am missing or what I am doing wrong.

Thanks for any help,

Augustin.








-------------------------------------------------------
