Re: Wget and Yahoo login?
On Tue, Sep 9, 2008 at 3:14 AM, Daniel Stenberg [EMAIL PROTECTED] wrote:
> On Mon, 8 Sep 2008, Donald Allen wrote:
>> The page I get is what would be obtained if an un-logged-in user went to the specified url. Opening that same url in Firefox *does* correctly indicate that it is logged in as me and reflects my customizations.
>
> First, LiveHTTPHeaders is the Firefox plugin everyone who tries these stunts needs. Then you read the capture and replay the requests as closely as possible using your tool.
>
> As you will find out, sites like this use all sorts of funny tricks to figure you out and to make it hard to automate what you're trying to do. They tend to use javascripts for redirects and for fiddling with cookies just to make sure you have a javascript and cookie enabled browser. So you need to work hard(er) when trying this with non-browsers.
>
> It's certainly still possible, even without using the browser to get the first cookie file. But it may take some effort.

I have not been able to retrieve a page with wget as if I were logged in using --load-cookies and Micah's suggestion about 'Accept-Encoding' (there was a typo in his message -- it's 'Accept-Encoding', not 'Accept-Encodings'). I did install LiveHTTPHeaders and tried --no-cookies together with --header carrying the cookie info from the LiveHTTPHeaders capture, and that did work.

Some of the cookie info sent by Firefox was a mystery, because it's not in the cookie file. Perhaps that's the crucial difference -- I'm speculating that wget isn't sending quite the same thing as Firefox when --load-cookies is used, because Firefox is adding stuff that isn't in the cookie file. Just a guess.

Is there a way to ask wget to print the headers it sends (a la LiveHTTPHeaders)? I've looked through the options on the man page and didn't see anything, though I might have missed it.

> --
> / daniel.haxx.se
Re: Wget and Yahoo login?
On Tue, Sep 9, 2008 at 12:23 PM, Micah Cowan [EMAIL PROTECTED] wrote:
> Donald Allen wrote:
>> On Tue, Sep 9, 2008 at 3:14 AM, Daniel Stenberg [EMAIL PROTECTED] wrote:
>>> On Mon, 8 Sep 2008, Donald Allen wrote:
>>>> The page I get is what would be obtained if an un-logged-in user went to the specified url. Opening that same url in Firefox *does* correctly indicate that it is logged in as me and reflects my customizations.
>>>
>>> First, LiveHTTPHeaders is the Firefox plugin everyone who tries these stunts needs. Then you read the capture and replay the requests as closely as possible using your tool.
>>>
>>> As you will find out, sites like this use all sorts of funny tricks to figure you out and to make it hard to automate what you're trying to do. They tend to use javascripts for redirects and for fiddling with cookies just to make sure you have a javascript and cookie enabled browser. So you need to work hard(er) when trying this with non-browsers.
>>>
>>> It's certainly still possible, even without using the browser to get the first cookie file. But it may take some effort.
>>
>> I have not been able to retrieve a page with wget as if I were logged in using --load-cookies and Micah's suggestion about 'Accept-Encoding' (there was a typo in his message -- it's 'Accept-Encoding', not 'Accept-Encodings'). I did install LiveHTTPHeaders and tried --no-cookies together with --header carrying the cookie info from the LiveHTTPHeaders capture, and that did work.
>
> That's how I did it as well (except I got the headers from tcpdump); I'm using Firefox 3, so I don't have access to FF's new SQLite-based cookies file (apart from the patch at http://wget.addictivecode.org/FrontPage?action=AttachFile&do=view&target=wget-firefox3-cookie.patch ).
>
>> Some of the cookie info sent by Firefox was a mystery, because it's not in the cookie file. Perhaps that's the crucial difference -- I'm speculating that wget isn't sending quite the same thing as Firefox when --load-cookies is used, because Firefox is adding stuff that isn't in the cookie file. Just a guess.
>
> Probably there are session cookies involved, that are sent in the first page, that you're not sending back with the form submit.
>
> --keep-session-cookies and --save-cookies=foo.txt make a good combination.
>
>> Is there a way to ask wget to print the headers it sends (a la LiveHTTPHeaders)? I've looked through the options on the man page and didn't see anything, though I might have missed it.
>
> --debug

Well, I rebuilt my wget with the 'debug' use flag and ran it on the yahoo test page (after having logged in to yahoo with firefox, of course) with --load-cookies and the Accept-Encoding header item, with --debug. Very useful. wget is sending every cookie item in firefox's cookies.txt. But firefox sends three additional cookie items in the header that wget does not send. Those items are *not* in firefox's cookies.txt, so wget has no way of knowing about them. Is it possible that firefox is not writing session cookies to the file?

The result of this test, just to be clear, was a page that indicated yahoo thought I was not logged in. Those extra items firefox is sending appear to be the difference: when I included them (from the LiveHTTPHeaders output) while sending the cookies manually with --header, wget got back the same page, this time indicating that yahoo knew I was logged in and formatted with my preferences.

/Don

> --
> HTH,
> Micah J. Cowan
> Programmer, musician, typesetting enthusiast, gamer.
> GNU Maintainer: wget, screen, teseq
> http://micah.cowan.name/
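The workaround Don describes (send everything in cookies.txt plus the session cookies copied from a browser capture, via --no-cookies and --header) could be sketched in Python along these lines. This is only an illustration under stated assumptions: the cookie names and values, the host `my.example.com`, and both helper functions are made up, the file is assumed to be in the tab-separated Netscape cookies.txt layout, and the domain matching is deliberately simplistic (it ignores path, secure flag, and expiry).

```python
from typing import Dict, Iterable


def parse_cookies_txt(lines: Iterable[str], host: str) -> Dict[str, str]:
    """Collect name/value pairs from Netscape-format cookies.txt lines
    whose domain field matches `host` (hypothetical helper)."""
    cookies: Dict[str, str] = {}
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 7:
            continue  # not a well-formed cookie entry
        domain, _subdomains, _path, _secure, _expiry, name, value = fields
        bare = domain.lstrip(".")
        # ".example.com" is taken to match example.com and its subdomains.
        if host == bare or host.endswith("." + bare):
            cookies[name] = value
    return cookies


def cookie_header(file_cookies: Dict[str, str],
                  session_cookies: Dict[str, str]) -> str:
    """Merge cookies from the file with session cookies copied by hand
    from a browser capture, and format one Cookie request header."""
    merged = dict(file_cookies)
    merged.update(session_cookies)
    return "Cookie: " + "; ".join(f"{k}={v}" for k, v in merged.items())


# Made-up sample data standing in for a real cookies.txt and capture.
sample = [
    "# HTTP Cookie File\n",
    ".example.com\tTRUE\t/\tFALSE\t1999999999\tB\tabc123\n",
    ".other.org\tTRUE\t/\tFALSE\t1999999999\tX\tnope\n",
]
file_cookies = parse_cookies_txt(sample, "my.example.com")
header = cookie_header(file_cookies, {"T": "sessionvalue"})
print(header)  # Cookie: B=abc123; T=sessionvalue
```

The resulting line is what would be passed to wget as `--header '<line>'` together with `--no-cookies`, as described in the message above.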
Re: Wget and Yahoo login?
On Tue, Sep 9, 2008 at 1:29 PM, Micah Cowan [EMAIL PROTECTED] wrote:
> Donald Allen wrote:
>> The result of this test, just to be clear, was a page that indicated yahoo thought I was not logged in. Those extra items firefox is sending appear to be the difference: when I included them (from the LiveHTTPHeaders output) while sending the cookies manually with --header, wget got back the same page, this time indicating that yahoo knew I was logged in and formatted with my preferences.
>
> Perhaps you missed this in my last message:
>
>> Probably there are session cookies involved, that are sent in the first page, that you're not sending back with the form submit.
>>
>> --keep-session-cookies and --save-cookies=foo.txt make a good combination.

I think we're mis-communicating, easily my fault, since I know just enough about this stuff to be dangerous.

I am doing the yahoo session login with firefox, not with wget, so I'm using the first and easier of your two suggested methods. I'm guessing you are thinking that I'm trying to log in to the yahoo session with wget, in which case --keep-session-cookies and --save-cookies=foo.txt would make perfect sense to me, but that's not what I'm doing (yet -- if I'm right about what's happening here, I'm going to have to resort to this). But using firefox to initiate the session, it looks to me like wget never gets to see the session cookies, because I don't think firefox writes them to its cookie file (which actually makes sense -- if they only need to live as long as the session, why write them out?).

/Don
Re: Wget and Yahoo login?
On Tue, Sep 9, 2008 at 1:41 PM, Micah Cowan [EMAIL PROTECTED] wrote:
> Donald Allen wrote:
>> I am doing the yahoo session login with firefox, not with wget, so I'm using the first and easier of your two suggested methods. I'm guessing you are thinking that I'm trying to log in to the yahoo session with wget, in which case --keep-session-cookies and --save-cookies=foo.txt would make perfect sense to me, but that's not what I'm doing (yet -- if I'm right about what's happening here, I'm going to have to resort to this). But using firefox to initiate the session, it looks to me like wget never gets to see the session cookies, because I don't think firefox writes them to its cookie file (which actually makes sense -- if they only need to live as long as the session, why write them out?).
>
> Yes, and I understood this; the thing is, that if session cookies are involved (i.e., cookies that are marked for immediate expiration and are not meant to be saved to the cookies file), then I don't see how you have much choice other than to use the harder method, or else to fake the session cookies by manually inserting them into your cookies file or whatnot (not sure how well that may be expected to work). Or, yeah, add an explicit --header 'Cookie: ...'.

Ah, the misunderstanding was that the stuff you thought I missed was intended to push me in the direction of Plan B -- log in to yahoo with wget.

I understand now. I'll look at trying to make this work. Thanks for all the help, though I can't guarantee that you are done yet :-) But, hopefully, this exchange will benefit others.

/Don
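Micah's other idea, faking the session cookies by adding them to a cookies file by hand, could look roughly like the sketch below. It leans on the behaviour documented in the wget manual for --keep-session-cookies: session cookies are written with an expiry timestamp of 0, and --load-cookies recognises such entries as session cookies. Whether a given site then accepts them is a separate question, as Micah says. The function name and the sample domain, name, and value are hypothetical; in practice the values would be copied from a browser capture, and the line would go into a copy of the file rather than the browser's own.

```python
def session_cookie_line(domain: str, name: str, value: str,
                        path: str = "/", secure: bool = False) -> str:
    """Format one tab-separated cookies.txt entry with expiry 0,
    the marker wget uses for session cookies (hypothetical helper)."""
    # Second field: whether the cookie applies to subdomains as well.
    include_subdomains = "TRUE" if domain.startswith(".") else "FALSE"
    fields = [
        domain,
        include_subdomains,
        path,
        "TRUE" if secure else "FALSE",
        "0",  # expiry 0: treated as a session cookie by --load-cookies
        name,
        value,
    ]
    return "\t".join(fields)


# Made-up values; real ones would come from the browser's request headers.
line = session_cookie_line(".example.com", "T", "sessionvalue")
print(line.split("\t"))
```

Appending one such line per missing cookie to a private copy of cookies.txt and pointing --load-cookies at that copy keeps the browser's file untouched.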
Re: Wget and Yahoo login?
On Tue, Sep 9, 2008 at 1:51 PM, Micah Cowan [EMAIL PROTECTED] wrote:
> Donald Allen wrote:
>> On Tue, Sep 9, 2008 at 1:41 PM, Micah Cowan [EMAIL PROTECTED] wrote:
>>> Donald Allen wrote:
>>>> I am doing the yahoo session login with firefox, not with wget, so I'm using the first and easier of your two suggested methods. I'm guessing you are thinking that I'm trying to log in to the yahoo session with wget, in which case --keep-session-cookies and --save-cookies=foo.txt would make perfect sense to me, but that's not what I'm doing (yet -- if I'm right about what's happening here, I'm going to have to resort to this). But using firefox to initiate the session, it looks to me like wget never gets to see the session cookies, because I don't think firefox writes them to its cookie file (which actually makes sense -- if they only need to live as long as the session, why write them out?).
>>>
>>> Yes, and I understood this; the thing is, that if session cookies are involved (i.e., cookies that are marked for immediate expiration and are not meant to be saved to the cookies file), then I don't see how you have much choice other than to use the harder method, or else to fake the session cookies by manually inserting them into your cookies file or whatnot (not sure how well that may be expected to work). Or, yeah, add an explicit --header 'Cookie: ...'.
>>
>> Ah, the misunderstanding was that the stuff you thought I missed was intended to push me in the direction of Plan B -- log in to yahoo with wget.
>
> Yes; and that's entirely my fault, as I didn't explicitly say that.

No problem.

>> I understand now. I'll look at trying to make this work. Thanks for all the help, though I can't guarantee that you are done yet :-) But, hopefully, this exchange will benefit others.
>
> I was actually surprised you kept going after I pointed out that it required the Accept-Encoding header that results in gzipped content.

That didn't faze me because the pages I'm after will be processed by a python program, so having to gunzip would not require a manual step.

> This behavior is a little surprising to me from Yahoo!. It's not surprising in _general_, but for a site that really wants to be as accessible as possible (I would think?), insisting on the latest browsers seems ill-advised.
>
> Ah, well. At least the days are _mostly_ gone when I'd fire up Netscape, visit a site, and get a server-generated page that's empty other than the phrase "You're not using Internet Explorer." :p

And taking it one step further, I'm greatly enjoying watching Microsoft thrash around, trying to save themselves, which I don't think they will. Perhaps they'll re-invent themselves, as IBM did, but their cash cow is not going to produce milk for much longer. I've just installed the Chrome beta on the Windows side of one of my machines (I grudgingly give it 10 GB on each machine; Linux gets the rest), and it looks very, very nice. They've still got work to do, but they appear to be heading in a very good direction. These are smart people at Google. All signs seem to be pointing towards more and more computing happening on the server side in the coming years.

/Don
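The gunzip step Don mentions for his python program might look something like this minimal sketch. It assumes the body was saved still compressed (for instance via wget's --output-document) and that the page text is UTF-8; both are assumptions, as is the function name. It checks for the gzip magic bytes so that an uncompressed response passes through unchanged.

```python
import gzip


def decode_body(raw: bytes) -> str:
    """Return the page text, gunzipping first if the body is gzip data."""
    # gzip streams start with the two magic bytes 0x1f 0x8b.
    if raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    # UTF-8 is an assumption here; undecodable bytes are replaced.
    return raw.decode("utf-8", errors="replace")


# Stand-in for the bytes read from the downloaded file.
page = gzip.compress("<html>hello</html>".encode("utf-8"))
print(decode_body(page))  # <html>hello</html>
```

With a real download, `raw` would come from something like `open(path, "rb").read()` on the file wget wrote.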
Re: Wget and Yahoo login?
After surprisingly little struggle, I got Plan B working -- logged into yahoo with wget, saved the cookies, including session cookies, and then proceeded to fetch pages using the saved cookies. Those pages came back logged in as me, with my customizations. Thanks to Tony, Daniel, and Micah -- you all provided critical advice in solving this problem.

/Don
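For readers who want to try Plan B themselves, here is a rough sketch of the sequence described in this thread: fetch the login page so its session cookies are saved, submit the form while sending those cookies back, then fetch pages with the saved cookie file. The thread does not give the exact commands, so everything specific here is an assumption: the URLs, the form field names, the file names, the `Accept-Encoding: gzip` value, and the `plan_b_commands` helper are all hypothetical. Only the wget options themselves (--keep-session-cookies, --save-cookies, --load-cookies, --post-data, --header, --output-document) are taken from wget. The code only builds and prints the command lines; it does not run them.

```python
import shlex
from typing import Dict, List
from urllib.parse import urlencode


def plan_b_commands(login_page: str, login_action: str,
                    form_fields: Dict[str, str], page_url: str,
                    jar: str, out_file: str) -> List[List[str]]:
    """Return three wget command lines as argv lists (nothing is executed)."""
    save = ["--keep-session-cookies", f"--save-cookies={jar}"]
    discard = "--output-document=/dev/null"
    return [
        # 1. Fetch the login page so any session cookies it sets are saved.
        ["wget", *save, discard, login_page],
        # 2. Submit the form, sending those cookies back and saving new ones.
        ["wget", f"--load-cookies={jar}", *save,
         f"--post-data={urlencode(form_fields)}", discard, login_action],
        # 3. Fetch the wanted page using the saved cookies.
        ["wget", f"--load-cookies={jar}", "--header=Accept-Encoding: gzip",
         f"--output-document={out_file}", page_url],
    ]


# All values below are placeholders, not real Yahoo endpoints or fields.
commands = plan_b_commands(
    login_page="https://login.example.com/",
    login_action="https://login.example.com/submit",
    form_fields={"login": "someuser", "passwd": "secret"},
    page_url="http://my.example.com/",
    jar="cookies.txt",
    out_file="page.html.gz",
)
for argv in commands:
    print(shlex.join(argv))
```

Each argv list could be handed to `subprocess.run(argv, check=True)` once the placeholders are replaced with the real login form details, which would have to be read from the site's login page.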
Wget and Yahoo login?
There was a recent discussion concerning using wget to obtain pages from yahoo while logged into yahoo as a particular user. Micah replied to Rick Nakroshis with instructions describing two methods for doing this. Micah has also added this information to the wiki.

I just tried the simpler of the two methods -- logging into yahoo with my browser (Firefox 2.0.0.16) and then downloading a page with

wget --output-document=/tmp/yahoo/yahoo.htm --load-cookies <my home directory>/.mozilla/firefox/id2dmo7r.default/cookies.txt 'http://<yahoo url>'

The page I get is what would be obtained if an un-logged-in user went to the specified url. Opening that same url in Firefox *does* correctly indicate that it is logged in as me and reflects my customizations.

wget -V: GNU Wget 1.11.1

I am running a reasonably up-to-date Gentoo system (updated within the last month) on a Thinkpad X61.

Have I missed something here? Any help will be appreciated. Please include my personal address in your replies as I am not (yet) a subscriber to this list.

Thanks --
/Don Allen
Re: Wget and Yahoo login?
2008/9/8 Tony Godshall [EMAIL PROTECTED]:
> I haven't done this but I can speculate that you need to have wget identify itself as firefox.

When I read this, I thought it looked promising, but it doesn't work. I tried sending exactly the user-agent string firefox is sending and still got a page from yahoo that clearly indicates yahoo thinks I'm not logged in.

/Don

> Quote from man wget...
>
>   -U agent-string
>   --user-agent=agent-string
>       Identify as agent-string to the HTTP server.
>
>       The HTTP protocol allows the clients to identify themselves using a User-Agent header field. This enables distinguishing the WWW software, usually for statistical purposes or for tracing of protocol violations. Wget normally identifies as Wget/version, version being the current version number of Wget.
>
>       However, some sites have been known to impose the policy of tailoring the output according to the User-Agent-supplied information. While this is not such a bad idea in theory, it has been abused by servers denying information to clients other than (historically) Netscape or, more frequently, Microsoft Internet Explorer. This option allows you to change the User-Agent line issued by Wget. Use of this option is discouraged, unless you really know what you are doing.
>
> --
> Best Regards.
> Please keep in touch.
> This is unedited.
> P-)