Following is a message that was forwarded to me by someone who was having difficulties posting to the list. Please Cc the author in your replies.
I asked augustin to talk to [EMAIL PROTECTED] about the difficulties in posting to the list (I think a move to gnu.org is really going to need to happen RSN). I also recommended removing the spaces around the "=" in options like "--referer=...", and using the --debug flag to check wget's request headers against the target set of headers. Hopefully this will solve the problem, but if anyone has additional advice, feel free.

-Micah

----------  Forwarded Message  ----------

Subject: Downloading video list from youtube profile.
Date: Friday 31 October 2008
From: augustin <[EMAIL PROTECTED]>
To: wget@sunsite.dk

Hello,

I am trying to use wget to download the video list from a youtube profile, but youtube uses some AJAX and there is no direct download link to use, which makes the task a bit complicated.

I tried to subscribe to this list but my subscription was refused with the following message:

  <[EMAIL PROTECTED]>: host a.mx.sunsite.dk[130.225.254.106] said: 550 5.7.1 Blocked by SpamAssassin (in reply to end of DATA command)

Therefore, I am NOT subscribed to this list and would appreciate it if you could CC me in your reply.

If you point your browser to:

  http://tw.youtube.com/profile?user=BarackObamadotcom&view=videos

you will invariably be pointed to the first page of videos. You can see at the bottom that there are more pages. Clicking on any subsequent page calls an AJAX script which refreshes the inside of the page. There is no direct way to get a link to download, say, the list of videos on page 38. Even manually, that would be tedious, because you can only click on the largest page number available and hop page after page to the end of the list.

I am using Firefox and the very good Firebug extension to get a clue of what's happening behind the scenes. Thus, I can get the full headers of the AJAX request, and the reply. I use this to try to replicate the same request with wget.

Here is a sample HEADER for a request:

  Host             tw.youtube.com
  User-Agent       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.17) Gecko/20080924 Ubuntu/8.04 (hardy) Firefox/2.0.0.17
  Accept           text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
  Accept-Language  en-us,en;q=0.5
  Accept-Encoding  gzip,deflate
  Accept-Charset   ISO-8859-1,utf-8;q=0.7,*;q=0.7
  Keep-Alive       300
  Connection       keep-alive
  Content-Type     application/x-www-form-urlencoded
  Referer          http://tw.youtube.com/profile?user=BarackObamadotcom&view=videos
  Content-Length   422
  Cookie           use_hitbox=72c46ff6cbcdb7c5585c36411b6b334edAEAAAAw; VISITOR_INFO1_LIVE=fvxHpXl_mLY; PREF=f1=11000000&gl=TW&hl=zh-TW; GEO=89297d4335cbe1b88e16edc35e26fbedcwwAAAAyVFfbRIilAP95Ckk=; __utma=207772311.1297267009699863800.1225423366.1225423366.1225426494.2; __utmc=207772311; __utmz=207772311.1225423366.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); watched_video_id_list=91c191d2b5f06f2215ec8f1813faa796WwEAAABzCwAAAE1nNTZLYnRtQVJj; __utmb=207772311.1.10.1225426494
  Pragma           no-cache
  Cache-Control    no-cache

With the POST information:

  messages       [{"type":"box_method","request": {"name":"user_videos","user_id":27679989,"style":"None","x_position":1,"y_position":24,"method":"draw_page_internal","params": {"start":80,"num":20,"view_all_mode":"True","sort":"p"}}}]
  session_token

With some PARAMS which I don't know how to use:

  action_ajax  1
  box_method   draw_page_internal
  box_name     user_videos
  user         BarackObamadotcom

Finally, here is my wget call, attempting to replicate the above request:

  wget \
    --keep-session-cookies \
    --post-data = 'session_token=&messages=[{"type":"box_method","request": {"name":"user_videos","user_id":27679989,"style":"None","x_position":1,"y_position":24,"method":"draw_page_internal","params": {"start":40,"num":20,"view_all_mode":"True","sort":"p"}}}]' \
    --save-cookies cookie.txt \
    --load-cookies cookie.txt \
    --referer = "http://tw.youtube.com/profile?user=BarackObamadotcom&view=videos" \
    --user-agent="Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.17) Gecko/20080924 Ubuntu/8.04 (hardy) Firefox/2.0.0.17" \
    http://tw.youtube.com/profile?action_ajax=1&user=BarackObamadotcom&box_method=draw_page_internal&box_name=user_videos ;

or:

  wget \
    --keep-session-cookies \
    --post-data = 'action_ajax=1&user=BarackObamadotcom&box_method=draw_page_internal&box_name=user_videos session_token=FRR-1WmHVPCLE6r3ImZ48PqDSrF8MTIyNTUxMjg4OQ==&messages=%5B%7B%22type%22%3A%22box_method%22%2C%22request%22%3A%7B%22name%22%3A%22user_videos%22%2C%22user_id%22%3A27679989%2C%22style%22%3A%22None%22%2C%22x_position%22%3A1%2C%22y_position%22%3A24%2C%22method%22%3A%22draw_page_internal%22%2C%22params%22%3A%7B%22start%22%3A80%2C%22num%22%3A20%2C%22view_all_mode%22%3A%22True%22%2C%22sort%22%3A%22p%22%7D%7D%7D%5D' \
    --save-cookies cookie.txt \
    --load-cookies cookie.txt \
    --referer = "http://tw.youtube.com/profile?user=BarackObamadotcom&view=videos" \
    --user-agent="Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.17) Gecko/20080924 Ubuntu/8.04 (hardy) Firefox/2.0.0.17" \
    http://tw.youtube.com/profile?action_ajax=1&user=BarackObamadotcom&box_method=draw_page_internal&box_name=user_videos ;

I've tried various other combinations but all failed. You can try yourself. I can't manage to download the second page or any subsequent page. I only ever get the content of the first page in return.

I don't know what I am missing or what I am doing wrong.

Thanks for any help,

Augustin.

-------------------------------------------------------
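
For anyone who wants to experiment, the advice at the top of this message (no space around "=", plus --debug to compare headers) could be applied roughly as in the sketch below. It is untested against the site and is not claimed to fix the first-page problem. The host, query parameters, user_id and percent-encoded "messages" value are copied from the forwarded message. The function names, the DRY_RUN, START and SESSION_TOKEN variables and the output file name are illustrative assumptions. The URL is quoted because an unquoted "&" makes the shell end the command there and run it in the background, so wget would never see the rest of the query string.

```shell
#!/bin/sh
# Sketch only: values come from the forwarded message; DRY_RUN, START,
# SESSION_TOKEN and the output file name are hypothetical additions.

profile='http://tw.youtube.com/profile'
referer="$profile?user=BarackObamadotcom&view=videos"
# Quoted, so the shell does not treat each "&" as a command separator.
ajax_url="$profile?action_ajax=1&user=BarackObamadotcom&box_method=draw_page_internal&box_name=user_videos"

# Print the form body for the page whose first video has index $1.
# The encoded JSON is the one from the second attempt in the message,
# with only the "start" value made variable.
post_body() {
  start=$1
  printf '%s' "session_token=${SESSION_TOKEN:-}&messages=%5B%7B%22type%22%3A%22box_method%22%2C%22request%22%3A%7B%22name%22%3A%22user_videos%22%2C%22user_id%22%3A27679989%2C%22style%22%3A%22None%22%2C%22x_position%22%3A1%2C%22y_position%22%3A24%2C%22method%22%3A%22draw_page_internal%22%2C%22params%22%3A%7B%22start%22%3A${start}%2C%22num%22%3A20%2C%22view_all_mode%22%3A%22True%22%2C%22sort%22%3A%22p%22%7D%7D%7D%5D"
}

# Build the wget argument list for one page. Every option is a single
# word with "=" attached directly, so no value gets mistaken for a URL.
fetch_page() {
  set -- \
    --debug \
    --keep-session-cookies \
    --load-cookies=cookie.txt \
    --save-cookies=cookie.txt \
    --referer="$referer" \
    --post-data="$(post_body "$1")" \
    --output-document="videos-start-$1.html" \
    "$ajax_url"
  if [ "${DRY_RUN:-1}" = 1 ]; then
    # Default: only show the command that would be run.
    printf 'wget'; printf " '%s'" "$@"; printf '\n'
  else
    wget "$@"
  fi
}

fetch_page "${START:-40}"
```

By default the script only prints the command. Running it with DRY_RUN=0 sends the request, and the --debug output can then be compared line by line with the headers captured in Firebug. A --user-agent="..." option can be added to the list in the same way if the server turns out to care about it.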