Re: Wget and Yahoo login?

2008-09-09 Thread Donald Allen
On Tue, Sep 9, 2008 at 3:14 AM, Daniel Stenberg [EMAIL PROTECTED] wrote:
 On Mon, 8 Sep 2008, Donald Allen wrote:

 The page I get is what would be obtained if an un-logged-in user went to
 the specified url. Opening that same url in Firefox *does* correctly
 indicate that it is logged in as me and reflects my customizations.

 First, LiveHTTPHeaders is the Firefox plugin everyone who tries these stunts
 needs. Then you read the capture and replay it as closely as possible using
 your tool.

 As you will find out, sites like this use all sorts of funny tricks to
 figure you out and to make it hard to automate what you're trying to do.
 They tend to use javascripts for redirects and for fiddling with cookies
 just to make sure you have a javascript and cookie enabled browser. So you
 need to work hard(er) when trying this with non-browsers.

 It's certainly still possible, even without using the browser to get the
 first cookie file. But it may take some effort.

I have still not been able to retrieve a page with wget as if I were
logged in, using --load-cookies together with Micah's suggestion about
'Accept-Encoding' (there was a typo in his message -- it's
'Accept-Encoding', not 'Accept-Encodings'). I did install
LiveHTTPHeaders, and passing --no-cookies plus the cookie info it
captured via --header did work. Some of the cookie info sent by Firefox
was a mystery, because it's not in the cookie file. Perhaps that's the
crucial difference -- I'm speculating that wget isn't sending quite the
same thing as Firefox when --load-cookies is used, because Firefox is
adding stuff that isn't in the cookie file. Just a guess. Is there a
way to ask wget to print the headers it sends (à la LiveHTTPHeaders)?
I've looked through the options on the man page and didn't see
anything, though I might have missed it.


 --

  / daniel.haxx.se



Re: Wget and Yahoo login?

2008-09-09 Thread Donald Allen
On Tue, Sep 9, 2008 at 12:23 PM, Micah Cowan [EMAIL PROTECTED] wrote:


 Donald Allen wrote:
  On Tue, Sep 9, 2008 at 3:14 AM, Daniel Stenberg [EMAIL PROTECTED] wrote:
  On Mon, 8 Sep 2008, Donald Allen wrote:
 
  The page I get is what would be obtained if an un-logged-in user went to
  the specified url. Opening that same url in Firefox *does* correctly
  indicate that it is logged in as me and reflects my customizations.
 
  First, LiveHTTPHeaders is the Firefox plugin everyone who tries these stunts
  needs. Then you read the capture and replay it as closely as possible using
  your tool.
 
  As you will find out, sites like this use all sorts of funny tricks to
  figure you out and to make it hard to automate what you're trying to do.
  They tend to use javascripts for redirects and for fiddling with cookies
  just to make sure you have a javascript and cookie enabled browser. So you
  need to work hard(er) when trying this with non-browsers.
 
  It's certainly still possible, even without using the browser to get the
  first cookie file. But it may take some effort.
 
  I have not been able to retrieve a page with wget as if I were logged
  in using --load-cookies and Micah's suggestion about 'Accept-Encoding'
  (there was a typo in his message -- it's 'Accept-Encoding', not
  'Accept-Encodings'). I did install livehttpheaders and tried
  --no-cookies and --header cookie info from livehttpheaders and that
  did work.

 That's how I did it as well (except I got the headers from tcpdump); I'm
 using Firefox 3, so I don't have access to FF's new SQLite-based cookies
 file (apart from the patch at
 http://wget.addictivecode.org/FrontPage?action=AttachFile&do=view&target=wget-firefox3-cookie.patch
 ).

  Some of the cookie info sent by Firefox was a mystery,
  because it's not in the cookie file. Perhaps that's the crucial
  difference -- I'm speculating that wget isn't sending quite the same
  thing as Firefox when --load-cookies is used, because Firefox is
  adding stuff that isn't in the cookie file. Just a guess.

 Probably there are session cookies involved, that are sent in the first
 page, that you're not sending back with the form submit.
 --keep-session-cookies and --save-cookies=foo.txt make a good
 combination.

  Is there a
  way to ask wget to print the headers it sends (ala livehttpheaders)?
  I've looked through the options on the man page and didn't see
  anything, though I might have missed it.

 --debug


Well, I rebuilt my wget with the 'debug' USE flag and ran it on the yahoo
test page (after having logged in to yahoo with firefox, of course) with
--load-cookies, the Accept-Encoding header, and --debug. Very
useful. wget is sending every cookie item in firefox's cookies.txt. But
firefox sends three additional cookie items in the header that wget does not
send. Those items are *not* in firefox's cookies.txt, so wget has no way of
knowing about them. Is it possible that firefox is not writing session
cookies to the file?

The result of this test, just to be clear, was a page that indicated yahoo
thought I was not logged in. Those extra items firefox is sending appear to
be the difference: when I included them (from the livehttpheaders output)
while sending the cookies manually with --header, the page I got back with
wget indicated that yahoo knew I was logged in and was formatted with my
preferences.
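
One way to pin down which cookies the browser sends that are absent from
cookies.txt is to compare the names mechanically. What follows is a minimal
Python sketch, not something taken from the thread. It assumes the cookie
file uses the tab-separated Netscape cookies.txt layout and that the
browser's "Cookie:" request header has been copied out of a LiveHTTPHeaders
capture. The function names and the sample cookie names and values are
invented for illustration.

```python
def cookie_names_in_file(cookies_txt):
    """Return the cookie names found in Netscape-format cookies.txt content."""
    names = set()
    for line in cookies_txt.splitlines():
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        # Fields: domain, subdomain flag, path, secure flag, expiry, name, value
        fields = line.split("\t")
        if len(fields) == 7:
            names.add(fields[5])
    return names


def cookie_names_in_header(cookie_header):
    """Return the cookie names in a captured 'Cookie: a=1; b=2' header."""
    value = cookie_header
    if value.lower().startswith("cookie:"):
        value = value.split(":", 1)[1]
    names = set()
    for pair in value.split(";"):
        # Only the first '=' separates name from value; values may contain '='.
        name, sep, _ = pair.strip().partition("=")
        if sep and name:
            names.add(name)
    return names


def missing_from_file(cookies_txt, cookie_header):
    """Names the browser sent that the cookie file does not contain."""
    sent = cookie_names_in_header(cookie_header)
    stored = cookie_names_in_file(cookies_txt)
    return sorted(sent - stored)


if __name__ == "__main__":
    # Invented sample data, standing in for a real file and a real capture.
    sample_file = ".example.com\tTRUE\t/\tFALSE\t1999999999\tB\tabc\n"
    sample_header = "Cookie: B=abc; Y=v=1&n=x; T=z=2"
    print(missing_from_file(sample_file, sample_header))
```

Any name this reports is a candidate session cookie, that is, one the
browser holds in memory but has not written to the file.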

/Don





 --
 HTH,
 Micah J. Cowan
 Programmer, musician, typesetting enthusiast, gamer.
 GNU Maintainer: wget, screen, teseq
 http://micah.cowan.name/



Re: Wget and Yahoo login?

2008-09-09 Thread Donald Allen
On Tue, Sep 9, 2008 at 1:29 PM, Micah Cowan [EMAIL PROTECTED] wrote:


 Donald Allen wrote:
  The result of this test, just to be clear, was a page that indicated
  yahoo thought I was not logged in. Those extra items firefox is sending
  appear to be the difference: when I included them (from the
  livehttpheaders output) while sending the cookies manually with
  --header, the page I got back with wget indicated that yahoo knew I was
  logged in and was formatted with my preferences.

 Perhaps you missed this in my last message:

  Probably there are session cookies involved, that are sent in the first
  page, that you're not sending back with the form submit.
  --keep-session-cookies and --save-cookies=foo.txt make a good
  combination.


I think we're miscommunicating, easily my fault, since I know just enough
about this stuff to be dangerous.

I am doing the yahoo session login with firefox, not with wget, so I'm using
the first and easier of your two suggested methods. I'm guessing you are
thinking that I'm trying to log in to the yahoo session with wget, in which
case --keep-session-cookies and --save-cookies=foo.txt would make perfect
sense to me, but that's not what I'm doing (yet -- if I'm right about what's
happening here, I'm going to have to resort to this). But when firefox
initiates the session, it looks to me like wget never gets to see the session
cookies because I don't think firefox writes them to its cookie file (which
actually makes sense -- if they only need to live as long as the session,
why write them out?).

/Don





 --
 Micah J. Cowan
 Programmer, musician, typesetting enthusiast, gamer.
 GNU Maintainer: wget, screen, teseq
 http://micah.cowan.name/



Re: Wget and Yahoo login?

2008-09-09 Thread Donald Allen
On Tue, Sep 9, 2008 at 1:41 PM, Micah Cowan [EMAIL PROTECTED] wrote:


 Donald Allen wrote:
  I am doing the yahoo session login with firefox, not with wget, so I'm
  using the first and easier of your two suggested methods. I'm guessing
  you are thinking that I'm trying to login to the yahoo session with
  wget, and thus --keep-session-cookies and --save-cookies=foo.txt would
  make perfect sense to me, but that's not what I'm doing (yet -- if I'm
  right about what's happening here, I'm going to have to resort to this).
  But using firefox to initiate the session, it looks to me like wget
  never gets to see the session cookies because I don't think firefox
  writes them to its cookie file (which actually makes sense -- if they
  only need to live as long as the session, why write them out?).

 Yes, and I understood this; the thing is, that if session cookies are
 involved (i.e., cookies that carry no expiry date, last only until the
 browser session ends, and are not meant to be saved to the cookies file),
 then I don't see how you have much choice other than to use the harder
 method, or else to fake the session cookies by manually inserting them
 into your cookies file or whatnot (not sure how well that may be expected
 to work). Or, yeah, add an explicit --header 'Cookie: ...'.
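
The two workarounds mentioned here can be sketched in Python as follows.
This is an illustration under stated assumptions, not a tested recipe: the
helper names are made up, the cookie names and values are placeholders, and
using an expiry field of 0 to mark a session cookie reflects how wget is
understood to write such entries with --keep-session-cookies. Whether a
hand-added line satisfies the server is not something the thread confirms.

```python
def cookie_header(cookies):
    """Build the argument for wget's --header option from name/value pairs."""
    pairs = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return "Cookie: " + pairs


def session_cookie_line(domain, name, value, path="/"):
    """Format one cookies.txt line for a session cookie.

    Assumption: an expiry of 0 marks the entry as a session cookie.
    """
    # A leading dot on the domain conventionally means "include subdomains".
    include_subdomains = "TRUE" if domain.startswith(".") else "FALSE"
    secure = "FALSE"
    return "\t".join(
        [domain, include_subdomains, path, secure, "0", name, value]
    )


if __name__ == "__main__":
    # Placeholder values standing in for what a header capture would show.
    captured = {"Y": "v=1", "T": "z=2"}
    print(cookie_header(captured))
    for name, value in captured.items():
        print(session_cookie_line(".example.com", name, value))
```

The header form is the one reported to work earlier in the thread, in
combination with --no-cookies; the file form is the untested alternative.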


Ah, the misunderstanding was that the stuff you thought I missed was
intended to push me in the direction of Plan B -- log in to yahoo with wget.
I understand now. I'll look at trying to make this work. Thanks for all the
help, though I can't guarantee that you are done yet :-) But, hopefully,
this exchange will benefit others.

/Don



 --
 Micah J. Cowan
 Programmer, musician, typesetting enthusiast, gamer.
 GNU Maintainer: wget, screen, teseq
 http://micah.cowan.name/



Re: Wget and Yahoo login?

2008-09-09 Thread Donald Allen
On Tue, Sep 9, 2008 at 1:51 PM, Micah Cowan [EMAIL PROTECTED] wrote:


 Donald Allen wrote:
 
 
  On Tue, Sep 9, 2008 at 1:41 PM, Micah Cowan [EMAIL PROTECTED] wrote:
 
  Donald Allen wrote:
  I am doing the yahoo session login with firefox, not with wget, so I'm
  using the first and easier of your two suggested methods. I'm guessing
  you are thinking that I'm trying to login to the yahoo session with
  wget, and thus --keep-session-cookies and --save-cookies=foo.txt would
  make perfect sense to me, but that's not what I'm doing (yet -- if I'm
  right about what's happening here, I'm going to have to resort to this).
  But using firefox to initiate the session, it looks to me like wget
  never gets to see the session cookies because I don't think firefox
  writes them to its cookie file (which actually makes sense -- if they
  only need to live as long as the session, why write them out?).
 
  Yes, and I understood this; the thing is, that if session cookies are
  involved (i.e., cookies that carry no expiry date, last only until the
  browser session ends, and are not meant to be saved to the cookies file),
  then I don't see how you have much choice other than to use the harder
  method, or else to fake the session cookies by manually inserting them
  into your cookies file or whatnot (not sure how well that may be expected
  to work). Or, yeah, add an explicit --header 'Cookie: ...'.
 
 
  Ah, the misunderstanding was that the stuff you thought I missed was
  intended to push me in the direction of Plan B -- log in to yahoo with
  wget.

 Yes; and that's entirely my fault, as I didn't explicitly say that.


No problem.



  I understand now. I'll look at trying to make this work. Thanks
  for all the help, though I can't guarantee that you are done yet :-)
  But, hopefully, this exchange will benefit others.

 I was actually surprised you kept going after I pointed out that it
 required the Accept-Encoding header that results in gzipped content.


That didn't faze me because the pages I'm after will be processed by a
python program, so having to gunzip would not require a manual step.
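
For readers wondering what that step looks like, here is a minimal sketch of
decompressing a fetched page in Python. It is not the program referred to
above; the function names are invented, and the check on the two leading
gzip magic bytes is simply one way to cope with a server that sometimes
returns an uncompressed body.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of any gzip stream


def page_bytes(raw):
    """Return the page body, decompressing it only if it is gzip data."""
    if raw[:2] == GZIP_MAGIC:
        return gzip.decompress(raw)
    return raw


def read_page(path):
    """Read a file saved by wget and return its uncompressed contents."""
    with open(path, "rb") as f:
        return page_bytes(f.read())
```

A caller would pass read_page() the file given to wget's --output-document
option and decode the resulting bytes using the page's character set.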


 This behavior is a little surprising to me from Yahoo!. It's not
 surprising in _general_, but for a site that really wants to be as
 accessible as possible (I would think?), insisting on the latest
 browsers seems ill-advised.

 Ah, well. At least the days are _mostly_ gone when I'd fire up Netscape,
 visit a site, and get a server-generated page that's empty other than
 the phrase "You're not using Internet Explorer." :p


And taking it one step further, I'm greatly enjoying watching Microsoft
thrash around, trying to save themselves, which I don't think they will.
Perhaps they'll re-invent themselves, as IBM did, but their cash cow is not
going to produce milk too much longer. I've just installed the Chrome beta
on the Windows side of one of my machines (I grudgingly give Windows 10 GB
on each machine; Linux gets the rest), and it looks very, very nice. Google
still has work to do, but they appear to be heading in a very good
direction. These are smart people at Google. All signs seem to be pointing
towards more and more computing happening on the server side in the coming
years.

/Don




 --
 Micah J. Cowan
 Programmer, musician, typesetting enthusiast, gamer.
 GNU Maintainer: wget, screen, teseq
 http://micah.cowan.name/



Re: Wget and Yahoo login?

2008-09-09 Thread Donald Allen
After surprisingly little struggle, I got Plan B working -- logged into
yahoo with wget, saved the cookies, including session cookies, and then
proceeded to fetch pages using the saved cookies. Those pages came back
logged in as me, with my customizations. Thanks to Tony, Daniel, and Micah
-- you all provided critical advice in solving this problem.
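
The message does not include the actual commands, so the following is only
a sketch of what the two steps might look like, written as Python argument
lists for wget. The wget options are real ones (--save-cookies,
--keep-session-cookies, --post-data, --load-cookies, --header,
--output-document). Everything else is an assumption: the login URL, the
form field names inside the POST data, the file names, and the
"Accept-Encoding: gzip" value, which is carried over from the earlier
discussion rather than quoted from it.

```python
import shlex


def login_command(login_url, post_data, cookie_file):
    """wget invocation that submits a login form and keeps session cookies."""
    return [
        "wget",
        "--save-cookies=" + cookie_file,
        "--keep-session-cookies",        # write session cookies to the file too
        "--post-data=" + post_data,
        "--output-document=/dev/null",   # the login response page is not needed
        login_url,
    ]


def fetch_command(page_url, cookie_file, output_file):
    """wget invocation that fetches a page using the saved cookies."""
    return [
        "wget",
        "--load-cookies=" + cookie_file,
        "--header=Accept-Encoding: gzip",  # assumed value, see the note above
        "--output-document=" + output_file,
        page_url,
    ]


if __name__ == "__main__":
    # Hypothetical URL and form fields; a real login form will differ.
    login = login_command(
        "https://login.example.invalid/", "user=me&passwd=secret", "cookies.txt"
    )
    fetch = fetch_command("http://example.invalid/page", "cookies.txt", "page.html")
    print(shlex.join(login))
    print(shlex.join(fetch))
```

Each list can be handed to subprocess.run(..., check=True) once the
placeholders have been replaced with the real login form details.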

/Don






Wget and Yahoo login?

2008-09-08 Thread Donald Allen
There was a recent discussion concerning using wget to obtain pages
from yahoo logged into yahoo as a particular user. Micah replied to
Rick Nakroshis with instructions describing two methods for doing
this. This information has also been added by Micah to the wiki.

I just tried the simpler of the two methods -- logging into yahoo with
my browser (Firefox 2.0.0.16) and then downloading a page with

wget --output-document=/tmp/yahoo/yahoo.htm \
  --load-cookies <my home directory>/.mozilla/firefox/id2dmo7r.default/cookies.txt \
  'http://<yahoo url>'

The page I get is what would be obtained if an un-logged-in user went
to the specified url. Opening that same url in Firefox *does*
correctly indicate that it is logged in as me and reflects my
customizations.

wget -V:
GNU Wget 1.11.1

I am running a reasonably up-to-date Gentoo system (updated within the
last month) on a Thinkpad X61.

Have I missed something here? Any help will be appreciated. Please
include my personal address in your replies as I am not (yet) a
subscriber to this list.

Thanks --
/Don Allen


Re: Wget and Yahoo login?

2008-09-08 Thread Donald Allen
2008/9/8 Tony Godshall [EMAIL PROTECTED]:
 I haven't done this but I can speculate that you need to
 have wget identify itself as firefox.

When I read this, I thought it looked promising, but it doesn't work.
I tried sending exactly the user-agent string firefox is sending and
still got a page from yahoo that clearly indicates yahoo thinks I'm
not logged in.

/Don


 Quote from man wget...

   -U agent-string
   --user-agent=agent-string
       Identify as agent-string to the HTTP server.

       The HTTP protocol allows the clients to identify themselves using a
       User-Agent header field.  This enables distinguishing the WWW
       software, usually for statistical purposes or for tracing of
       protocol violations.  Wget normally identifies as Wget/version,
       version being the current version number of Wget.

       However, some sites have been known to impose the policy of
       tailoring the output according to the User-Agent-supplied
       information.  While this is not such a bad idea in theory, it has
       been abused by servers denying information to clients other than
       (historically) Netscape or, more frequently, Microsoft Internet
       Explorer.  This option allows you to change the User-Agent line
       issued by Wget.  Use of this option is discouraged, unless you
       really know what you are doing.






 --
 Best Regards.
 Please keep in touch.
 This is unedited.
 P-)