Re: Wget and Yahoo login?
And you'll probably have to do this again -- I bet Yahoo expires the session cookies!

On Tue, Sep 9, 2008 at 2:18 PM, Donald Allen [EMAIL PROTECTED] wrote:
> After surprisingly little struggle, I got Plan B working -- logged into yahoo with wget, saved the cookies, including session cookies, and then proceeded to fetch pages using the saved cookies. Those pages came back logged in as me, with my customizations. Thanks to Tony, Daniel, and Micah -- you all provided critical advice in solving this problem.
>
> /Don

--
Best Regards.
Please keep in touch.
This is unedited.
P-)
Re: Wget and Yahoo login?
On Mon, 8 Sep 2008, Donald Allen wrote:

> The page I get is what would be obtained if an un-logged-in user went to the specified url. Opening that same url in Firefox *does* correctly indicate that it is logged in as me and reflects my customizations.

First, LiveHTTPHeaders is the Firefox plugin everyone who tries these stunts needs. Then you read the capture and replay the requests as closely as possible using your tool.

As you will find out, sites like this use all sorts of funny tricks to figure you out and to make it hard to automate what you're trying to do. They tend to use JavaScript for redirects and for fiddling with cookies, just to make sure you have a JavaScript- and cookie-enabled browser. So you need to work hard(er) when trying this with non-browsers.

It's certainly still possible, even without using the browser to get the first cookie file. But it may take some effort.

--
 / daniel.haxx.se
Re: Wget and Yahoo login?
On Tue, Sep 9, 2008 at 3:14 AM, Daniel Stenberg [EMAIL PROTECTED] wrote:
> On Mon, 8 Sep 2008, Donald Allen wrote:
>
>> The page I get is what would be obtained if an un-logged-in user went to the specified url. Opening that same url in Firefox *does* correctly indicate that it is logged in as me and reflects my customizations.
>
> First, LiveHTTPHeaders is the Firefox plugin everyone who tries these stunts needs. Then you read the capture and replay the requests as closely as possible using your tool.
>
> As you will find out, sites like this use all sorts of funny tricks to figure you out and to make it hard to automate what you're trying to do. They tend to use JavaScript for redirects and for fiddling with cookies, just to make sure you have a JavaScript- and cookie-enabled browser. So you need to work hard(er) when trying this with non-browsers.
>
> It's certainly still possible, even without using the browser to get the first cookie file. But it may take some effort.
>
> --
>  / daniel.haxx.se

I have not been able to retrieve a page with wget as if I were logged in using --load-cookies and Micah's suggestion about 'Accept-Encoding' (there was a typo in his message -- it's 'Accept-Encoding', not 'Accept-Encodings'). I did install livehttpheaders and tried --no-cookies together with --header, passing the cookie info captured by livehttpheaders, and that did work.

Some of the cookie info sent by Firefox was a mystery, because it's not in the cookie file. Perhaps that's the crucial difference -- I'm speculating that wget isn't sending quite the same thing as Firefox when --load-cookies is used, because Firefox is adding stuff that isn't in the cookie file. Just a guess.

Is there a way to ask wget to print the headers it sends (ala livehttpheaders)? I've looked through the options on the man page and didn't see anything, though I might have missed it.
Re: Wget and Yahoo login?
Donald Allen wrote:
> I have not been able to retrieve a page with wget as if I were logged in using --load-cookies and Micah's suggestion about 'Accept-Encoding' (there was a typo in his message -- it's 'Accept-Encoding', not 'Accept-Encodings'). I did install livehttpheaders and tried --no-cookies and --header cookie info from livehttpheaders and that did work.

That's how I did it as well (except I got the headers from tcpdump); I'm using Firefox 3, so I don't have access to FF's new SQLite-based cookies file (apart from the patch at http://wget.addictivecode.org/FrontPage?action=AttachFile&do=view&target=wget-firefox3-cookie.patch).

> Some of the cookie info sent by Firefox was a mystery, because it's not in the cookie file. Perhaps that's the crucial difference -- I'm speculating that wget isn't sending quite the same thing as Firefox when --load-cookies is used, because Firefox is adding stuff that isn't in the cookie file. Just a guess.

Probably there are session cookies involved, that are sent in the first page, that you're not sending back with the form submit.

--keep-session-cookies and --save-cookies=foo.txt make a good combination.

> Is there a way to ask wget to print the headers it sends (ala livehttpheaders)? I've looked through the options on the man page and didn't see anything, though I might have missed it.

--debug

--
HTH,
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
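A minimal sketch of the combination suggested above, written in Python only to assemble the two wget command lines: one that logs in and saves every cookie (session cookies included), and one that reuses the saved file with --debug so the outgoing headers are printed. The login URL, form fields, page URL, and output file names are hypothetical placeholders; the real Yahoo login form is not described in this thread. Note that --debug only works if wget was built with debug support.

```python
import shlex


def wget_login_commands(login_url, post_data, page_url, cookie_file="foo.txt"):
    """Build two wget argument lists: log in, then fetch a page with the saved cookies."""
    login = [
        "wget",
        "--keep-session-cookies",         # write session cookies to the file too
        "--save-cookies=" + cookie_file,
        "--post-data=" + post_data,       # form body; field names are site-specific
        "--output-document=login.html",
        login_url,
    ]
    fetch = [
        "wget",
        "--load-cookies=" + cookie_file,
        "--debug",                        # print the request headers wget sends
        "--output-document=page.html",
        page_url,
    ]
    return login, fetch


# Hypothetical URLs and form fields, for illustration only.
login_cmd, fetch_cmd = wget_login_commands(
    "https://login.example.com/",
    "user=me&passwd=secret",
    "https://my.example.com/",
)
print(shlex.join(login_cmd))
print(shlex.join(fetch_cmd))
```

The lists can be passed to `subprocess.run` as they are; printing them with `shlex.join` gives a shell-safe form to paste into a terminal.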
Re: Wget and Yahoo login?
On Tue, Sep 9, 2008 at 12:23 PM, Micah Cowan [EMAIL PROTECTED] wrote:
> Donald Allen wrote:
>> Some of the cookie info sent by Firefox was a mystery, because it's not in the cookie file. Perhaps that's the crucial difference -- I'm speculating that wget isn't sending quite the same thing as Firefox when --load-cookies is used, because Firefox is adding stuff that isn't in the cookie file. Just a guess.
>
> Probably there are session cookies involved, that are sent in the first page, that you're not sending back with the form submit.
>
> --keep-session-cookies and --save-cookies=foo.txt make a good combination.
>
>> Is there a way to ask wget to print the headers it sends (ala livehttpheaders)? I've looked through the options on the man page and didn't see anything, though I might have missed it.
>
> --debug

Well, I rebuilt my wget with the 'debug' USE flag and ran it on the yahoo test page (after having logged in to yahoo with firefox, of course) with --load-cookies and the Accept-Encoding header item, with --debug. Very useful.

wget is sending every cookie item in firefox's cookies.txt. But firefox sends three additional cookie items in the header that wget does not send. Those items are *not* in firefox's cookies.txt, so wget has no way of knowing about them. Is it possible that firefox is not writing session cookies to the file?

The result of this test, just to be clear, was a page that indicated yahoo thought I was not logged in. Those extra items firefox is sending appear to be the difference: when I included them (taken from the livehttpheaders output) while sending the cookies manually with --header, wget got back the same page, this time indicating that yahoo knew I was logged in and formatted with my preferences.

/Don
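The comparison described above (cookies in cookies.txt versus cookies the browser actually sent) can be sketched in Python. This assumes the tab-separated Netscape cookies.txt layout with seven fields per line and a captured `Cookie:` header line; the cookie names and values below are made up for illustration and are not Yahoo's.

```python
def cookie_names_in_file(cookies_txt):
    """Return the cookie names found in Netscape-format cookies.txt text."""
    names = set()
    for line in cookies_txt.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        # domain, subdomain flag, path, secure, expiry, name, value
        if len(fields) == 7:
            names.add(fields[5])
    return names


def cookie_names_in_header(cookie_header):
    """Return the cookie names found in a captured 'Cookie: a=1; b=2' header."""
    value = cookie_header
    if cookie_header.lower().startswith("cookie:"):
        value = cookie_header.split(":", 1)[1]
    names = set()
    for pair in value.split(";"):
        name, sep, _ = pair.strip().partition("=")
        if sep:
            names.add(name)
    return names


def missing_from_file(cookies_txt, cookie_header):
    """Names the browser sent that are absent from the cookie file."""
    return sorted(cookie_names_in_header(cookie_header) - cookie_names_in_file(cookies_txt))


# Made-up sample data: the browser sends S, but the file only holds A and B.
sample_file = "\n".join([
    "# Netscape HTTP Cookie File",
    "\t".join([".example.com", "TRUE", "/", "FALSE", "1893456000", "A", "1"]),
    "\t".join([".example.com", "TRUE", "/", "FALSE", "1893456000", "B", "2"]),
])
sample_header = "Cookie: A=1; B=2; S=abc"
print(missing_from_file(sample_file, sample_header))
```

Any name this reports is a candidate session cookie that the browser holds only in memory.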
Re: Wget and Yahoo login?
Donald Allen wrote:
> The result of this test, just to be clear, was a page that indicated yahoo thought I was not logged in. Those extra items firefox is sending appear to be the difference, because I included them (from the livehttpheaders output) when I tried sending the cookies manually with --header, I got the same page back with wget that indicated that yahoo knew I was logged in and formatted the page with my preferences.

Perhaps you missed this in my last message:

> Probably there are session cookies involved, that are sent in the first page, that you're not sending back with the form submit.
>
> --keep-session-cookies and --save-cookies=foo.txt make a good combination.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
Re: Wget and Yahoo login?
On Tue, Sep 9, 2008 at 1:29 PM, Micah Cowan [EMAIL PROTECTED] wrote:
> Perhaps you missed this in my last message:
>
>> Probably there are session cookies involved, that are sent in the first page, that you're not sending back with the form submit.
>>
>> --keep-session-cookies and --save-cookies=foo.txt make a good combination.

I think we're mis-communicating, easily my fault, since I know just enough about this stuff to be dangerous.

I am doing the yahoo session login with firefox, not with wget, so I'm using the first and easier of your two suggested methods. I'm guessing you are thinking that I'm trying to login to the yahoo session with wget, and thus --keep-session-cookies and --save-cookies=foo.txt would make perfect sense to me, but that's not what I'm doing (yet -- if I'm right about what's happening here, I'm going to have to resort to this). But using firefox to initiate the session, it looks to me like wget never gets to see the session cookies because I don't think firefox writes them to its cookie file (which actually makes sense -- if they only need to live as long as the session, why write them out?).

/Don
Re: Wget and Yahoo login?
Donald Allen wrote:
> I am doing the yahoo session login with firefox, not with wget, so I'm using the first and easier of your two suggested methods. I'm guessing you are thinking that I'm trying to login to the yahoo session with wget, and thus --keep-session-cookies and --save-cookies=foo.txt would make perfect sense to me, but that's not what I'm doing (yet -- if I'm right about what's happening here, I'm going to have to resort to this). But using firefox to initiate the session, it looks to me like wget never gets to see the session cookies because I don't think firefox writes them to its cookie file (which actually makes sense -- if they only need to live as long as the session, why write them out?).

Yes, and I understood this; the thing is, that if session cookies are involved (i.e., cookies that are marked for immediate expiration and are not meant to be saved to the cookies file), then I don't see how you have much choice other than to use the harder method, or else to fake the session cookies by manually inserting them to your cookies file or whatnot (not sure how well that may be expected to work).

Or, yeah, add an explicit --header 'Cookie: ...'.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
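The explicit --header 'Cookie: ...' route can be sketched as follows, again using Python only to build the wget argument list. The cookie names, values, and URL are hypothetical; in practice they would be copied from a capture of what the browser sent.

```python
def cookie_header(cookies):
    """Format name/value pairs as a single Cookie request header line."""
    return "Cookie: " + "; ".join(f"{name}={value}" for name, value in cookies.items())


def wget_with_cookie_header(url, cookies):
    """Build a wget argument list that sends the given cookies by hand."""
    return [
        "wget",
        "--no-cookies",                        # keep wget's own cookie handling out of the way
        "--header=" + cookie_header(cookies),  # the hand-built Cookie line
        url,
    ]


# Hypothetical cookie values and URL, for illustration only.
cmd = wget_with_cookie_header("https://my.example.com/", {"A": "1", "S": "abc"})
print(cmd)
```

Passing the list directly to `subprocess.run` avoids shell quoting problems with the spaces and semicolons in the header value.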
Re: Wget and Yahoo login?
On Tue, Sep 9, 2008 at 1:41 PM, Micah Cowan [EMAIL PROTECTED] wrote:
> Yes, and I understood this; the thing is, that if session cookies are involved (i.e., cookies that are marked for immediate expiration and are not meant to be saved to the cookies file), then I don't see how you have much choice other than to use the harder method, or else to fake the session cookies by manually inserting them to your cookies file or whatnot (not sure how well that may be expected to work).
>
> Or, yeah, add an explicit --header 'Cookie: ...'.

Ah, the misunderstanding was that the stuff you thought I missed was intended to push me in the direction of Plan B -- log in to yahoo with wget.

I understand now. I'll look at trying to make this work. Thanks for all the help, though I can't guarantee that you are done yet :-) But, hopefully, this exchange will benefit others.

/Don
Re: Wget and Yahoo login?
Donald Allen wrote:
> Ah, the misunderstanding was that the stuff you thought I missed was intended to push me in the direction of Plan B -- log in to yahoo with wget.

Yes; and that's entirely my fault, as I didn't explicitly say that.

> I understand now. I'll look at trying to make this work. Thanks for all the help, though I can't guarantee that you are done yet :-) But, hopefully, this exchange will benefit others.

I was actually surprised you kept going after I pointed out that it required the Accept-Encoding header that results in gzipped content.

This behavior is a little surprising to me from Yahoo!. It's not surprising in _general_, but for a site that really wants to be as accessible as possible (I would think?), insisting on the latest browsers seems ill-advised.

Ah, well. At least the days are _mostly_ gone when I'd fire up Netscape, visit a site, and get a server-generated page that's empty other than the phrase "You're not using Internet Explorer." :p

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
Re: Wget and Yahoo login?
On Tue, Sep 9, 2008 at 1:51 PM, Micah Cowan [EMAIL PROTECTED] wrote:
> Donald Allen wrote:
>> Ah, the misunderstanding was that the stuff you thought I missed was intended to push me in the direction of Plan B -- log in to yahoo with wget.
>
> Yes; and that's entirely my fault, as I didn't explicitly say that.

No problem.

> I was actually surprised you kept going after I pointed out that it required the Accept-Encoding header that results in gzipped content.

That didn't faze me because the pages I'm after will be processed by a python program, so having to gunzip would not require a manual step.

> This behavior is a little surprising to me from Yahoo!. It's not surprising in _general_, but for a site that really wants to be as accessible as possible (I would think?), insisting on the latest browsers seems ill-advised.
>
> Ah, well. At least the days are _mostly_ gone when I'd fire up Netscape, visit a site, and get a server-generated page that's empty other than the phrase "You're not using Internet Explorer." :p

And taking it one step further, I'm greatly enjoying watching Microsoft thrash around, trying to save themselves, which I don't think they will. Perhaps they'll re-invent themselves, as IBM did, but their cash cow is not going to produce milk too much longer. I've just installed the Chrome beta on the Windows side of one of my machines (I grudgingly give it 10 GB on each machine; Linux gets the rest), and it looks very, very nice. They've still got work to do, but they appear to be heading in a very good direction. These are smart people at Google. All signs seem to be pointing towards more and more computing happening on the server side in the coming years.

/Don
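A minimal sketch of the "gunzip in the Python program" step mentioned above, assuming the downloaded page has already been read into memory as bytes. It handles gzip only, detected by its two-byte magic number, and passes anything else through untouched; a raw deflate response would need separate handling. The function name is illustrative.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def decode_body(raw):
    """Return the page bytes, gunzipping them if they look gzip-compressed."""
    if raw[:2] == GZIP_MAGIC:
        return gzip.decompress(raw)
    return raw


# Round trip on made-up page content: compressed and plain input both come back as the page.
page = b"<html>logged in</html>"
print(decode_body(gzip.compress(page)) == page)
print(decode_body(page) == page)
```

Checking the magic bytes rather than trusting a file name means the same code works whether or not the server actually compressed the response.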
Re: Wget and Yahoo login?
After surprisingly little struggle, I got Plan B working -- logged into yahoo with wget, saved the cookies, including session cookies, and then proceeded to fetch pages using the saved cookies. Those pages came back logged in as me, with my customizations.

Thanks to Tony, Daniel, and Micah -- you all provided critical advice in solving this problem.

/Don
Re: Wget and Yahoo login?
2008/9/8 Tony Godshall [EMAIL PROTECTED]:
> I haven't done this but I can speculate that you need to have wget identify itself as firefox.

When I read this, I thought it looked promising, but it doesn't work. I tried sending exactly the user-agent string firefox is sending and still got a page from yahoo that clearly indicates yahoo thinks I'm not logged in.

/Don

> Quote from man wget...
>
>    -U agent-string
>    --user-agent=agent-string
>        Identify as agent-string to the HTTP server.
>
>        The HTTP protocol allows the clients to identify themselves using a User-Agent header field. This enables distinguishing the WWW software, usually for statistical purposes or for tracing of protocol violations. Wget normally identifies as Wget/version, version being the current version number of Wget.
>
>        However, some sites have been known to impose the policy of tailoring the output according to the User-Agent-supplied information. While this is not such a bad idea in theory, it has been abused by servers denying information to clients other than (historically) Netscape or, more frequently, Microsoft Internet Explorer. This option allows you to change the User-Agent line issued by Wget. Use of this option is discouraged, unless you really know what you are doing.
>
> On Mon, Sep 8, 2008 at 12:25 PM, Donald Allen [EMAIL PROTECTED] wrote:
>> There was a recent discussion concerning using wget to obtain pages from yahoo logged into yahoo as a particular user. Micah replied to Rick Nakroshis with instructions describing two methods for doing this. This information has also been added by Micah to the wiki.
>>
>> I just tried the simpler of the two methods -- logging into yahoo with my browser (Firefox 2.0.0.16) and then downloading a page with
>>
>>    wget --output-document=/tmp/yahoo/yahoo.htm --load-cookies my home directory/.mozilla/firefox/id2dmo7r.default/cookies.txt 'http://yahoo url'
>>
>> The page I get is what would be obtained if an un-logged-in user went to the specified url. Opening that same url in Firefox *does* correctly indicate that it is logged in as me and reflects my customizations.
>>
>> wget -V:
>> GNU Wget 1.11.1
>>
>> I am running a reasonably up-to-date Gentoo system (updated within the last month) on a Thinkpad X61.
>>
>> Have I missed something here? Any help will be appreciated. Please include my personal address in your replies as I am not (yet) a subscriber to this list.
>>
>> Thanks --
>> /Don Allen
>
> --
> Best Regards.
> Please keep in touch.
> This is unedited.
> P-)
Re: Wget and Yahoo login?
Donald Allen wrote:

There was a recent discussion concerning using wget to obtain pages from yahoo logged into yahoo as a particular user. Micah replied to Rick Nakroshis with instructions describing two methods for doing this. This information has also been added by Micah to the wiki.

I just tried the simpler of the two methods -- logging into yahoo with my browser (Firefox 2.0.0.16) and then downloading a page with

    wget --output-document=/tmp/yahoo/yahoo.htm --load-cookies my home directory/.mozilla/firefox/id2dmo7r.default/cookies.txt 'http://yahoo url'

The page I get is what would be obtained if an un-logged-in user went to the specified url. Opening that same url in Firefox *does* correctly indicate that it is logged in as me and reflects my customizations.

Are you signing into the main Yahoo! site?

When I try to do so, whether I use the cookies or not, I get a message about "update your browser to something more modern" or the like. The difference appears to be a combination of _both_ User-Agent (as you've done), _and_ --header 'Accept-Encoding: gzip,deflate'.

This plus appropriate cookies gets me a decent logged-in page, but of course it's gzip-compressed. Since Wget doesn't currently support gzip-decoding and the like, that makes the use of Wget in this situation cumbersome. Support for something like this probably won't be seen until 1.13 or 1.14, I'm afraid.

-- 
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
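Since Wget leaves the gzip-compressed body as-is, the decompression has to happen afterwards. The following is only a minimal sketch of that step in Python, assuming the page body has already been read from the file Wget wrote; the helper name `decode_body` is hypothetical and not part of Wget or of anything posted in this thread.

```python
import gzip
import zlib


def decode_body(raw: bytes) -> bytes:
    """Return the decompressed page body, or the input unchanged.

    Hypothetical helper: it assumes `raw` is the body of a page fetched
    with the request header 'Accept-Encoding: gzip,deflate'.
    """
    # gzip streams start with the two magic bytes 0x1f 0x8b.
    if raw[:2] == b"\x1f\x8b":
        return gzip.decompress(raw)
    try:
        # "deflate" bodies are normally zlib-wrapped.
        return zlib.decompress(raw)
    except zlib.error:
        # Not compressed (or an encoding this sketch does not know):
        # hand the bytes back untouched.
        return raw
```

A processing script could read the saved file in binary mode and pass its contents through `decode_body` before parsing the HTML.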
RE: Wget and Yahoo login?
Micah Cowan wrote:

The easiest way to do what you want may be to log in using your browser, and then tell Wget to use the cookies from your browser, using

Given the frequency of the "login and then download a file" use case, it should probably be documented on the wiki. (Perhaps it already is. :-)

Also, it would probably be helpful to have a shell script to automate this.

Tony
Re: Wget and Yahoo login?
Tony Lewis wrote:

Micah Cowan wrote:

The easiest way to do what you want may be to log in using your browser, and then tell Wget to use the cookies from your browser, using

Given the frequency of the "login and then download a file" use case, it should probably be documented on the wiki. (Perhaps it already is. :-)

Yeah, at http://wget.addictivecode.org/FrequentlyAskedQuestions#password-protected

I think you missed the final sentence of my how-to:

(I'm going to put this up on the Wgiki Faq now, at http://wget.addictivecode.org/FrequentlyAskedQuestions)

:)

(Back to you:)

Also, it would probably be helpful to have a shell script to automate this.

I filed the following issue some time ago: https://savannah.gnu.org/bugs/index.php?22561

The report is low on details; but I was envisioning something that would spew out forms and their fields, accept values for fields in one form, and invoke the appropriate Wget command to do the submission.

I don't know if it could be _completely_ automated, since it's not 100% possible for the script to know which form fields are the ones it should be filling out. OTOH, there are some damn good heuristics that could be done: I imagine that the right form (in the event of more than one) can usually be guessed by seeing which one has a password-type input (assuming there's also only one of those). If that form has only one text-type input, then we've found the username field as well. Name-based heuristics (with pass, user, uname, login, etc) could also help.

If someone wants to do this, that'd be terrific. Could probably reuse the existing HTML parser code from Wget. Otherwise, it'd probably be a while before I could get to it, since I've got higher priorities that have been languishing.

Such a tool might also be an appropriate place to add FF3 SQLite cookies support.

-- 
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
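The heuristics described above could be sketched roughly as follows. This is an illustration only, written in Python with the standard-library `html.parser` rather than Wget's own HTML parser; the class and function names (`FormCollector`, `guess_login_form`) are hypothetical, and the sketch only looks at `<input>` elements of type text and password.

```python
from html.parser import HTMLParser


class FormCollector(HTMLParser):
    """Collect each <form> on a page together with its <input> fields."""

    def __init__(self):
        super().__init__()
        self.forms = []        # one dict per form, in document order
        self._current = None   # the form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "GET").upper(),
                "inputs": [],
            }
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None:
            # In HTML, an <input> without a type attribute is a text input.
            kind = (attrs.get("type") or "text").lower()
            self._current["inputs"].append((kind, attrs.get("name")))

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def guess_login_form(html):
    """Return (action, user_field, password_field), or None if ambiguous."""
    parser = FormCollector()
    parser.feed(html)
    # Heuristic 1: the login form is the only form with exactly one
    # password-type input.
    candidates = [
        form for form in parser.forms
        if sum(1 for kind, _ in form["inputs"] if kind == "password") == 1
    ]
    if len(candidates) != 1:
        return None
    form = candidates[0]
    password = next(name for kind, name in form["inputs"] if kind == "password")
    texts = [name for kind, name in form["inputs"] if kind == "text"]
    # Heuristic 2: a single text-type input is taken to be the username field.
    if len(texts) == 1:
        return form["action"], texts[0], password
    # Heuristic 3: otherwise fall back to name-based hints.
    hints = ("user", "uname", "login")
    named = [n for n in texts if n and any(h in n.lower() for h in hints)]
    if len(named) == 1:
        return form["action"], named[0], password
    return None
```

Returning `None` whenever the guess is ambiguous reflects the point made above that the process cannot be completely automated; a real tool would presumably list the forms and ask the user at that point.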
Re: Wget and Yahoo login?
At 04:27 PM 8/10/2008, you wrote:

Rick Nakroshis wrote:

Micah,

If you will excuse a quick question about Wget, I'm trying to find out if I can use it to download a page from Yahoo that requires me to be logged in using my Yahoo profile name and password. It's a display of a CSV file, and the only wrinkle is trying to get past the Yahoo login. Try as I may, I just can't seem to find anything about Wget and Yahoo. Any suggestions or pointers?

Hi Rick,

In the future, it's better if you post questions to the mailing list at wget@sunsite.dk; I don't always have time to respond.

The easiest way to do what you want may be to log in using your browser, and then tell Wget to use the cookies from your browser, using --load-cookies=path-to-browser's-cookies. Of course, this only works if your browser saves its cookies in the standard text format (Firefox prior to version 3 will do this), or can export to that format (note that someone contributed a patch to allow Wget to work with Firefox 3 cookies; it's linked from http://wget.addictivecode.org/, but it's unofficial so I can't vouch for its quality).

Otherwise, you can perform the login using Wget, saving the cookies to a file of your choice, using --post-data=..., --save-cookies=cookies.txt, and probably --keep-session-cookies. This will require that you know what data to place in --post-data, which generally requires that you dig around in the HTML to find the right form field names, and where to post them.

For instance, if you find a form like the following within the page containing the log-in form:

    <form action="/doLogin.php" method="POST">
      <input type="text" name="s-login">
      <input type="password" name="s-pass">
    </form>

then you need to do something like:

    $ wget --post-data='s-login=USERNAME&s-pass=PASSWORD' \
           --save-cookies=my-cookies.txt --keep-session-cookies \
           http://HOSTNAME/doLogin.php

(Note that you _don't_ necessarily send the information to the page that had the login form: you send it to the spot mentioned in the "action" attribute of the password form.)

Once this is done, you _should_ be able to perform further operations with Wget as if you're logged in, by using

    $ wget --load-cookies=my-cookies.txt --save-cookies=my-cookies.txt \
           --keep-session-cookies ...

(I'm going to put this up on the Wgiki Faq now, at http://wget.addictivecode.org/FrequentlyAskedQuestions)

-- 
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/

Micah,

Thank you for taking the time to answer so thoroughly, and doing so promptly, too. You've given me a great boost forward, and I appreciate it. Thank you, sir!

Rick
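For anyone scripting the two-step procedure from the how-to above, the commands could be assembled along these lines. This is a sketch under stated assumptions: the function names are hypothetical, the field names and cookie file name are the placeholders from the example, and only the Wget options named in the how-to are used. It builds argument lists and does not run anything itself.

```python
from urllib.parse import urlencode, urljoin


def build_login_command(page_url, action, fields, cookie_file="my-cookies.txt"):
    """Build the Wget argument list for the login POST (step one)."""
    return [
        "wget",
        # urlencode joins the name=value pairs with '&' and percent-escapes
        # characters that would otherwise break the form data.
        "--post-data=" + urlencode(fields),
        "--save-cookies=" + cookie_file,
        "--keep-session-cookies",
        # Post to the form's action URL, resolved against the page that
        # contained the form, not to that page itself.
        urljoin(page_url, action),
    ]


def build_fetch_command(url, cookie_file="my-cookies.txt"):
    """Build the Wget argument list for a logged-in fetch (step two)."""
    return [
        "wget",
        "--load-cookies=" + cookie_file,
        "--save-cookies=" + cookie_file,
        "--keep-session-cookies",
        url,
    ]
```

Either list can be handed to `subprocess.run(cmd, check=True)`; passing a list rather than a shell string avoids quoting problems when the password contains characters such as `&` or spaces.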