Re: Wget and Yahoo login?

2008-09-10 Thread Tony Godshall
And you'll probably have to do this again- I bet
yahoo expires the session cookies!


On Tue, Sep 9, 2008 at 2:18 PM, Donald Allen [EMAIL PROTECTED] wrote:
 After surprisingly little struggle, I got Plan B working -- logged into
 yahoo with wget, saved the cookies, including session cookies, and then
 proceeded to fetch pages using the saved cookies. Those pages came back
 logged in as me, with my customizations. Thanks to Tony, Daniel, and Micah
 -- you all provided critical advice in solving this problem.

 /Don

 On Tue, Sep 9, 2008 at 2:21 PM, Donald Allen [EMAIL PROTECTED] wrote:


 On Tue, Sep 9, 2008 at 1:51 PM, Micah Cowan [EMAIL PROTECTED] wrote:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Donald Allen wrote:
 
 
  On Tue, Sep 9, 2008 at 1:41 PM, Micah Cowan [EMAIL PROTECTED]
  mailto:[EMAIL PROTECTED] wrote:
 
  Donald Allen wrote:
  I am doing the yahoo session login with firefox, not with wget,
  so I'm
  using the first and easier of your two suggested methods. I'm
  guessing
  you are thinking that I'm trying to login to the yahoo session with
  wget, and thus --keep-session-cookies and
  --save-cookies=foo.txt would
  make perfect sense to me, but that's not what I'm doing (yet --
  if I'm
  right about what's happening here, I'm going to have to resort to
  this).
  But using firefox to initiate the session, it looks to me like wget
  never gets to see the session cookies because I don't think firefox
  writes them to its cookie file (which actually makes sense -- if they
  only need to live as long as the session, why write them out?).
 
  Yes, and I understood this; the thing is, that if session cookies are
  involved (i.e., cookies that are marked for immediate expiration and
  are
  not meant to be saved to the cookies file), then I don't see how you
  have much choice other than to use the harder method, or else to fake
  the session cookies by manually inserting them to your cookies file or
  whatnot (not sure how well that may be expected to work). Or, yeah, add
  an explicit --header 'Cookie: ...'.
 
 
  Ah, the misunderstanding was that the stuff you thought I missed was
  intended to push me in the direction of Plan B -- log in to yahoo with
  wget.

 Yes; and that's entirely my fault, as I didn't explicitly say that.

 No problem.


  I understand now. I'll look at trying to make this work. Thanks
  for all the help, though I can't guarantee that you are done yet :-)
  But, hopefully, this exchange will benefit others.

 I was actually surprised you kept going after I pointed out that it
 required the Accept-Encoding header that results in gzipped content.

 That didn't faze me because the pages I'm after will be processed by a
 python program, so having to gunzip would not require a manual step.

 This behavior is a little surprising to me from Yahoo!. It's not
 surprising in _general_, but for a site that really wants to be as
 accessible as possible (I would think?), insisting on the latest
 browsers seems ill-advised.

 Ah, well. At least the days are _mostly_ gone when I'd fire up Netscape,
 visit a site, and get a server-generated page that's empty other than
 the phrase You're not using Internet Explorer. :p

 And taking it one step further, I'm greatly enjoying watching Microsoft
 thrash around, trying to save themselves, which I don't think they will.
 Perhaps they'll re-invent themselves, as IBM did, but their cash cow is not
 going to produce milk too much longer. I've just installed the Chrome beta
 on the Windows side of one of my machines (I grudgingly give it 10 Gb on
 each machine; Linux gets the rest), and it looks very, very nice. They've
 still got work to do, but they appear to be heading in a very good
 direction. These are smart people at Google. All signs seem to be pointing
 towards more and more computing happening on the server side in the coming
 years.

 /Don


 - --
 Micah J. Cowan
 Programmer, musician, typesetting enthusiast, gamer.
 GNU Maintainer: wget, screen, teseq
 http://micah.cowan.name/
 -BEGIN PGP SIGNATURE-
 Version: GnuPG v1.4.7 (GNU/Linux)
 Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

 iD8DBQFIxreZ7M8hyUobTrERAslyAJwKfirhzth9ACgdunxp/rfQlR86mQCcClik
 3HbbATyqnrm0hAJXqNTqpl4=
 =3XD/
 -END PGP SIGNATURE-






-- 
Best Regards.
Please keep in touch.
This is unedited.
P-)


Re: Wget and Yahoo login?

2008-09-09 Thread Daniel Stenberg

On Mon, 8 Sep 2008, Donald Allen wrote:

The page I get is what would be obtained if an un-logged-in user went to the 
specified url. Opening that same url in Firefox *does* correctly indicate 
that it is logged in as me and reflects my customizations.


First, LiveHTTPHeaders is the Firefox plugin everyone who tries these stunts 
need. Then you read the capure and replay them as closely as possible using 
your tool.


As you will find out, sites like this use all sorts of funny tricks to figure 
out you and to make it hard to automate what you're trying to do. They tend to 
use javascripts for redirects and for fiddling with cookies just to make sure 
you have a javascript and cookie enabled browser. So you need to work hard(er) 
when trying this with non-browsers.


It's certainly still possible, even without using the browser to get the first 
cookie file. But it may take some effort.


--

 / daniel.haxx.se


Re: Wget and Yahoo login?

2008-09-09 Thread Donald Allen
On Tue, Sep 9, 2008 at 3:14 AM, Daniel Stenberg [EMAIL PROTECTED] wrote:
 On Mon, 8 Sep 2008, Donald Allen wrote:

 The page I get is what would be obtained if an un-logged-in user went to
 the specified url. Opening that same url in Firefox *does* correctly
 indicate that it is logged in as me and reflects my customizations.

 First, LiveHTTPHeaders is the Firefox plugin everyone who tries these stunts
 need. Then you read the capure and replay them as closely as possible using
 your tool.

 As you will find out, sites like this use all sorts of funny tricks to
 figure out you and to make it hard to automate what you're trying to do.
 They tend to use javascripts for redirects and for fiddling with cookies
 just to make sure you have a javascript and cookie enabled browser. So you
 need to work hard(er) when trying this with non-browsers.

 It's certainly still possible, even without using the browser to get the
 first cookie file. But it may take some effort.

I have not been able to retrieve a page with wget as if I were logged
in using --load-cookies and Micah's suggestion about 'Accept-Encoding'
(there was a typo in his message -- it's 'Accept-Encoding', not
'Accept-Encodings'). I did install livehttpheaders and tried
--no-cookies and --header cookie info from livehttpheaders and that
did work. Some of the cookie info sent by Firefox was a mystery,
because it's not in the cookie file. Perhaps that's the crucial
difference -- I'm speculating that wget isn't sending quite the same
thing as Firefox when --load-cookies is used, because Firefox is
adding stuff that isn't in the cookie file. Just a guess. Is there a
way to ask wget to print the headers it sends (ala livehttpheaders)?
I've looked through the options on the man page and didn't see
anything, though I might have missed it.


 --

  / daniel.haxx.se



Re: Wget and Yahoo login?

2008-09-09 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Donald Allen wrote:
 On Tue, Sep 9, 2008 at 3:14 AM, Daniel Stenberg [EMAIL PROTECTED] wrote:
 On Mon, 8 Sep 2008, Donald Allen wrote:

 The page I get is what would be obtained if an un-logged-in user went to
 the specified url. Opening that same url in Firefox *does* correctly
 indicate that it is logged in as me and reflects my customizations.
 First, LiveHTTPHeaders is the Firefox plugin everyone who tries these stunts
 need. Then you read the capure and replay them as closely as possible using
 your tool.

 As you will find out, sites like this use all sorts of funny tricks to
 figure out you and to make it hard to automate what you're trying to do.
 They tend to use javascripts for redirects and for fiddling with cookies
 just to make sure you have a javascript and cookie enabled browser. So you
 need to work hard(er) when trying this with non-browsers.

 It's certainly still possible, even without using the browser to get the
 first cookie file. But it may take some effort.
 
 I have not been able to retrieve a page with wget as if I were logged
 in using --load-cookies and Micah's suggestion about 'Accept-Encoding'
 (there was a typo in his message -- it's 'Accept-Encoding', not
 'Accept-Encodings'). I did install livehttpheaders and tried
 --no-cookies and --header cookie info from livehttpheaders and that
 did work.

That's how I did it as well (except I got the headers from tcpdump); I'm
using Firefox 3, so don't have access to FF's new sqllite-based cookies
file (apart from the patch at
http://wget.addictivecode.org/FrontPage?action=AttachFiledo=viewtarget=wget-firefox3-cookie.patch).

 Some of the cookie info sent by Firefox was a mystery,
 because it's not in the cookie file. Perhaps that's the crucial
 difference -- I'm speculating that wget isn't sending quite the same
 thing as Firefox when --load-cookies is used, because Firefox is
 adding stuff that isn't in the cookie file. Just a guess.

Probably there are session cookies involved, that are sent in the first
page, that you're not sending back with the form submit.
- --keep-session-cookies and --save-cookies=foo.txt make a good
combination.

 Is there a
 way to ask wget to print the headers it sends (ala livehttpheaders)?
 I've looked through the options on the man page and didn't see
 anything, though I might have missed it.

- --debug

- --
HTH,
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIxqL77M8hyUobTrERAovFAJ9yagS2xW+2wFG65BwiFkJNfTMylgCfYaq7
1vOmTDimFg8E7Cn+Q+HGZn8=
=JKXH
-END PGP SIGNATURE-


Re: Wget and Yahoo login?

2008-09-09 Thread Donald Allen
On Tue, Sep 9, 2008 at 12:23 PM, Micah Cowan [EMAIL PROTECTED] wrote:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Donald Allen wrote:
  On Tue, Sep 9, 2008 at 3:14 AM, Daniel Stenberg [EMAIL PROTECTED] wrote:
  On Mon, 8 Sep 2008, Donald Allen wrote:
 
  The page I get is what would be obtained if an un-logged-in user went
 to
  the specified url. Opening that same url in Firefox *does* correctly
  indicate that it is logged in as me and reflects my customizations.
  First, LiveHTTPHeaders is the Firefox plugin everyone who tries these
 stunts
  need. Then you read the capure and replay them as closely as possible
 using
  your tool.
 
  As you will find out, sites like this use all sorts of funny tricks to
  figure out you and to make it hard to automate what you're trying to do.
  They tend to use javascripts for redirects and for fiddling with cookies
  just to make sure you have a javascript and cookie enabled browser. So
 you
  need to work hard(er) when trying this with non-browsers.
 
  It's certainly still possible, even without using the browser to get the
  first cookie file. But it may take some effort.
 
  I have not been able to retrieve a page with wget as if I were logged
  in using --load-cookies and Micah's suggestion about 'Accept-Encoding'
  (there was a typo in his message -- it's 'Accept-Encoding', not
  'Accept-Encodings'). I did install livehttpheaders and tried
  --no-cookies and --header cookie info from livehttpheaders and that
  did work.

 That's how I did it as well (except I got the headers from tcpdump); I'm
 using Firefox 3, so don't have access to FF's new sqllite-based cookies
 file (apart from the patch at

 http://wget.addictivecode.org/FrontPage?action=AttachFiledo=viewtarget=wget-firefox3-cookie.patch
 ).

  Some of the cookie info sent by Firefox was a mystery,
  because it's not in the cookie file. Perhaps that's the crucial
  difference -- I'm speculating that wget isn't sending quite the same
  thing as Firefox when --load-cookies is used, because Firefox is
  adding stuff that isn't in the cookie file. Just a guess.

 Probably there are session cookies involved, that are sent in the first
 page, that you're not sending back with the form submit.
 - --keep-session-cookies and --save-cookies=foo.txt make a good
 combination.

  Is there a
  way to ask wget to print the headers it sends (ala livehttpheaders)?
  I've looked through the options on the man page and didn't see
  anything, though I might have missed it.

 - --debug


Well, I rebuilt my wget with the 'debug' use flag and ran it on the yahoo
test page (after having logged in to yahoo with firefox, of course) with
--load-cookies and the accept-encoding header item, with --debug. Very
useful. wget is sending every cookie item in firefox's cookies.txt. But
firefox sends three additional cookie items in the header that wget does not
send. Those items are *not* in firefox's cookies.txt so wget has no way of
knowing about them. Is it possible that firefox is not writing session
cookies to the file?

The result of this test, just to be clear, was a page that indicated yahoo
thought I was not logged in. Those extra items firefox is sending appear to
be the difference, because I included them (from the livehttpheaders output)
when I tried sending the cookies manually with --header, I got the same page
back with wget that indicated that yahoo knew I was logged in and formatted
with page with my preferences.

/Don





 - --
 HTH,
 Micah J. Cowan
 Programmer, musician, typesetting enthusiast, gamer.
 GNU Maintainer: wget, screen, teseq
 http://micah.cowan.name/
 -BEGIN PGP SIGNATURE-
 Version: GnuPG v1.4.7 (GNU/Linux)
 Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

 iD8DBQFIxqL77M8hyUobTrERAovFAJ9yagS2xW+2wFG65BwiFkJNfTMylgCfYaq7
 1vOmTDimFg8E7Cn+Q+HGZn8=
 =JKXH
 -END PGP SIGNATURE-



Re: Wget and Yahoo login?

2008-09-09 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Donald Allen wrote:
 The result of this test, just to be clear, was a page that indicated
 yahoo thought I was not logged in. Those extra items firefox is sending
 appear to be the difference, because I included them (from the
 livehttpheaders output) when I tried sending the cookies manually with
 --header, I got the same page back with wget that indicated that yahoo
 knew I was logged in and formatted with page with my preferences.

Perhaps you missed this in my last message:

 Probably there are session cookies involved, that are sent in the first
 page, that you're not sending back with the form submit.
 --keep-session-cookies and --save-cookies=foo.txt make a good
 combination.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIxrJ17M8hyUobTrERAvdsAJ9XEwMfimHXRUXKtV66P+YsG+tA7gCfWKbq
nCqAmXJfU3kTncMQkKk0JZo=
=17Yr
-END PGP SIGNATURE-


Re: Wget and Yahoo login?

2008-09-09 Thread Donald Allen
On Tue, Sep 9, 2008 at 1:29 PM, Micah Cowan [EMAIL PROTECTED] wrote:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Donald Allen wrote:
  The result of this test, just to be clear, was a page that indicated
  yahoo thought I was not logged in. Those extra items firefox is sending
  appear to be the difference, because I included them (from the
  livehttpheaders output) when I tried sending the cookies manually with
  --header, I got the same page back with wget that indicated that yahoo
  knew I was logged in and formatted with page with my preferences.

 Perhaps you missed this in my last message:

  Probably there are session cookies involved, that are sent in the first
  page, that you're not sending back with the form submit.
  --keep-session-cookies and --save-cookies=foo.txt make a good
  combination.


I think we're mis-communicating, easily my fault, since I know just enough
about this stuff to be dangerous.

I am doing the yahoo session login with firefox, not with wget, so I'm using
the first and easier of your two suggested methods. I'm guessing you are
thinking that I'm trying to login to the yahoo session with wget, and thus
--keep-session-cookies and --save-cookies=foo.txt would make perfect sense
to me, but that's not what I'm doing (yet -- if I'm right about what's
happening here, I'm going to have to resort to this). But using firefox to
initiate the session, it looks to me like wget never gets to see the session
cookies because I don't think firefox writes them to its cookie file (which
actually makes sense -- if they only need to live as long as the session,
why write them out?).

/Don





 - --
 Micah J. Cowan
 Programmer, musician, typesetting enthusiast, gamer.
 GNU Maintainer: wget, screen, teseq
 http://micah.cowan.name/
 -BEGIN PGP SIGNATURE-
 Version: GnuPG v1.4.7 (GNU/Linux)
 Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

 iD8DBQFIxrJ17M8hyUobTrERAvdsAJ9XEwMfimHXRUXKtV66P+YsG+tA7gCfWKbq
 nCqAmXJfU3kTncMQkKk0JZo=
 =17Yr
 -END PGP SIGNATURE-



Re: Wget and Yahoo login?

2008-09-09 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Donald Allen wrote:
 I am doing the yahoo session login with firefox, not with wget, so I'm
 using the first and easier of your two suggested methods. I'm guessing
 you are thinking that I'm trying to login to the yahoo session with
 wget, and thus --keep-session-cookies and --save-cookies=foo.txt would
 make perfect sense to me, but that's not what I'm doing (yet -- if I'm
 right about what's happening here, I'm going to have to resort to this).
 But using firefox to initiate the session, it looks to me like wget
 never gets to see the session cookies because I don't think firefox
 writes them to its cookie file (which actually makes sense -- if they
 only need to live as long as the session, why write them out?).

Yes, and I understood this; the thing is, that if session cookies are
involved (i.e., cookies that are marked for immediate expiration and are
not meant to be saved to the cookies file), then I don't see how you
have much choice other than to use the harder method, or else to fake
the session cookies by manually inserting them to your cookies file or
whatnot (not sure how well that may be expected to work). Or, yeah, add
an explicit --header 'Cookie: ...'.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIxrVD7M8hyUobTrERAt19AJ9bmmczCKjzMtGCoXb8B5g25uMLRQCeK8qh
M57W3Reqj+/pO8GuDwb9Nok=
=ajp/
-END PGP SIGNATURE-


Re: Wget and Yahoo login?

2008-09-09 Thread Donald Allen
On Tue, Sep 9, 2008 at 1:41 PM, Micah Cowan [EMAIL PROTECTED] wrote:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Donald Allen wrote:
  I am doing the yahoo session login with firefox, not with wget, so I'm
  using the first and easier of your two suggested methods. I'm guessing
  you are thinking that I'm trying to login to the yahoo session with
  wget, and thus --keep-session-cookies and --save-cookies=foo.txt would
  make perfect sense to me, but that's not what I'm doing (yet -- if I'm
  right about what's happening here, I'm going to have to resort to this).
  But using firefox to initiate the session, it looks to me like wget
  never gets to see the session cookies because I don't think firefox
  writes them to its cookie file (which actually makes sense -- if they
  only need to live as long as the session, why write them out?).

 Yes, and I understood this; the thing is, that if session cookies are
 involved (i.e., cookies that are marked for immediate expiration and are
 not meant to be saved to the cookies file), then I don't see how you
 have much choice other than to use the harder method, or else to fake
 the session cookies by manually inserting them to your cookies file or
 whatnot (not sure how well that may be expected to work). Or, yeah, add
 an explicit --header 'Cookie: ...'.


Ah, the misunderstanding was that the stuff you thought I missed was
intended to push me in the direction of Plan B -- log in to yahoo with wget.
I understand now. I'll look at trying to make this work. Thanks for all the
help, though I can't guarantee that you are done yet :-) But, hopefully,
this exchange will benefit others.

/Don



 - --
 Micah J. Cowan
 Programmer, musician, typesetting enthusiast, gamer.
 GNU Maintainer: wget, screen, teseq
 http://micah.cowan.name/
 -BEGIN PGP SIGNATURE-
 Version: GnuPG v1.4.7 (GNU/Linux)
 Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

 iD8DBQFIxrVD7M8hyUobTrERAt19AJ9bmmczCKjzMtGCoXb8B5g25uMLRQCeK8qh
 M57W3Reqj+/pO8GuDwb9Nok=
 =ajp/
 -END PGP SIGNATURE-



Re: Wget and Yahoo login?

2008-09-09 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Donald Allen wrote:
 
 
 On Tue, Sep 9, 2008 at 1:41 PM, Micah Cowan [EMAIL PROTECTED]
 mailto:[EMAIL PROTECTED] wrote:
 
 Donald Allen wrote:
 I am doing the yahoo session login with firefox, not with wget,
 so I'm
 using the first and easier of your two suggested methods. I'm
 guessing
 you are thinking that I'm trying to login to the yahoo session with
 wget, and thus --keep-session-cookies and
 --save-cookies=foo.txt would
 make perfect sense to me, but that's not what I'm doing (yet --
 if I'm
 right about what's happening here, I'm going to have to resort to
 this).
 But using firefox to initiate the session, it looks to me like wget
 never gets to see the session cookies because I don't think firefox
 writes them to its cookie file (which actually makes sense -- if they
 only need to live as long as the session, why write them out?).
 
 Yes, and I understood this; the thing is, that if session cookies are
 involved (i.e., cookies that are marked for immediate expiration and are
 not meant to be saved to the cookies file), then I don't see how you
 have much choice other than to use the harder method, or else to fake
 the session cookies by manually inserting them to your cookies file or
 whatnot (not sure how well that may be expected to work). Or, yeah, add
 an explicit --header 'Cookie: ...'.
 
 
 Ah, the misunderstanding was that the stuff you thought I missed was
 intended to push me in the direction of Plan B -- log in to yahoo with
 wget.

Yes; and that's entirely my fault, as I didn't explicitly say that.

 I understand now. I'll look at trying to make this work. Thanks
 for all the help, though I can't guarantee that you are done yet :-)
 But, hopefully, this exchange will benefit others.

I was actually surprised you kept going after I pointed out that it
required the Accept-Encoding header that results in gzipped content.
This behavior is a little surprising to me from Yahoo!. It's not
surprising in _general_, but for a site that really wants to be as
accessible as possible (I would think?), insisting on the latest
browsers seems ill-advised.

Ah, well. At least the days are _mostly_ gone when I'd fire up Netscape,
visit a site, and get a server-generated page that's empty other than
the phrase You're not using Internet Explorer. :p

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIxreZ7M8hyUobTrERAslyAJwKfirhzth9ACgdunxp/rfQlR86mQCcClik
3HbbATyqnrm0hAJXqNTqpl4=
=3XD/
-END PGP SIGNATURE-


Re: Wget and Yahoo login?

2008-09-09 Thread Donald Allen
On Tue, Sep 9, 2008 at 1:51 PM, Micah Cowan [EMAIL PROTECTED] wrote:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Donald Allen wrote:
 
 
  On Tue, Sep 9, 2008 at 1:41 PM, Micah Cowan [EMAIL PROTECTED]
  mailto:[EMAIL PROTECTED] wrote:
 
  Donald Allen wrote:
  I am doing the yahoo session login with firefox, not with wget,
  so I'm
  using the first and easier of your two suggested methods. I'm
  guessing
  you are thinking that I'm trying to login to the yahoo session with
  wget, and thus --keep-session-cookies and
  --save-cookies=foo.txt would
  make perfect sense to me, but that's not what I'm doing (yet --
  if I'm
  right about what's happening here, I'm going to have to resort to
  this).
  But using firefox to initiate the session, it looks to me like wget
  never gets to see the session cookies because I don't think firefox
  writes them to its cookie file (which actually makes sense -- if they
  only need to live as long as the session, why write them out?).
 
  Yes, and I understood this; the thing is, that if session cookies are
  involved (i.e., cookies that are marked for immediate expiration and are
  not meant to be saved to the cookies file), then I don't see how you
  have much choice other than to use the harder method, or else to fake
  the session cookies by manually inserting them to your cookies file or
  whatnot (not sure how well that may be expected to work). Or, yeah, add
  an explicit --header 'Cookie: ...'.
 
 
  Ah, the misunderstanding was that the stuff you thought I missed was
  intended to push me in the direction of Plan B -- log in to yahoo with
  wget.

 Yes; and that's entirely my fault, as I didn't explicitly say that.


No problem.



  I understand now. I'll look at trying to make this work. Thanks
  for all the help, though I can't guarantee that you are done yet :-)
  But, hopefully, this exchange will benefit others.

 I was actually surprised you kept going after I pointed out that it
 required the Accept-Encoding header that results in gzipped content.


That didn't faze me because the pages I'm after will be processed by a
python program, so having to gunzip would not require a manual step.


 This behavior is a little surprising to me from Yahoo!. It's not
 surprising in _general_, but for a site that really wants to be as
 accessible as possible (I would think?), insisting on the latest
 browsers seems ill-advised.

 Ah, well. At least the days are _mostly_ gone when I'd fire up Netscape,
 visit a site, and get a server-generated page that's empty other than
 the phrase You're not using Internet Explorer. :p


And taking it one step further, I'm greatly enjoying watching Microsoft
thrash around, trying to save themselves, which I don't think they will.
Perhaps they'll re-invent themselves, as IBM did, but their cash cow is not
going to produce milk too much longer. I've just installed the Chrome beta
on the Windows side of one of my machines (I grudgingly give it 10 Gb on
each machine; Linux gets the rest), and it looks very, very nice. They've
still got work to do, but they appear to be heading in a very good
direction. These are smart people at Google. All signs seem to be pointing
towards more and more computing happening on the server side in the coming
years.

/Don




 - --
 Micah J. Cowan
 Programmer, musician, typesetting enthusiast, gamer.
 GNU Maintainer: wget, screen, teseq
 http://micah.cowan.name/
 -BEGIN PGP SIGNATURE-
 Version: GnuPG v1.4.7 (GNU/Linux)
 Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

 iD8DBQFIxreZ7M8hyUobTrERAslyAJwKfirhzth9ACgdunxp/rfQlR86mQCcClik
 3HbbATyqnrm0hAJXqNTqpl4=
 =3XD/
 -END PGP SIGNATURE-



Re: Wget and Yahoo login?

2008-09-09 Thread Donald Allen
After surprisingly little struggle, I got Plan B working -- logged into
yahoo with wget, saved the cookies, including session cookies, and then
proceeded to fetch pages using the saved cookies. Those pages came back
logged in as me, with my customizations. Thanks to Tony, Daniel, and Micah
-- you all provided critical advice in solving this problem.

/Don

On Tue, Sep 9, 2008 at 2:21 PM, Donald Allen [EMAIL PROTECTED] wrote:



 On Tue, Sep 9, 2008 at 1:51 PM, Micah Cowan [EMAIL PROTECTED] wrote:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Donald Allen wrote:
 
 
  On Tue, Sep 9, 2008 at 1:41 PM, Micah Cowan [EMAIL PROTECTED]
  mailto:[EMAIL PROTECTED] wrote:
 
  Donald Allen wrote:
  I am doing the yahoo session login with firefox, not with wget,
  so I'm
  using the first and easier of your two suggested methods. I'm
  guessing
  you are thinking that I'm trying to login to the yahoo session with
  wget, and thus --keep-session-cookies and
  --save-cookies=foo.txt would
  make perfect sense to me, but that's not what I'm doing (yet --
  if I'm
  right about what's happening here, I'm going to have to resort to
  this).
  But using firefox to initiate the session, it looks to me like wget
  never gets to see the session cookies because I don't think firefox
  writes them to its cookie file (which actually makes sense -- if they
  only need to live as long as the session, why write them out?).
 
  Yes, and I understood this; the thing is, that if session cookies are
  involved (i.e., cookies that are marked for immediate expiration and are
  not meant to be saved to the cookies file), then I don't see how you
  have much choice other than to use the harder method, or else to fake
  the session cookies by manually inserting them to your cookies file or
  whatnot (not sure how well that may be expected to work). Or, yeah, add
  an explicit --header 'Cookie: ...'.
 
 
  Ah, the misunderstanding was that the stuff you thought I missed was
  intended to push me in the direction of Plan B -- log in to yahoo with
  wget.

 Yes; and that's entirely my fault, as I didn't explicitly say that.


 No problem.



  I understand now. I'll look at trying to make this work. Thanks
  for all the help, though I can't guarantee that you are done yet :-)
  But, hopefully, this exchange will benefit others.

 I was actually surprised you kept going after I pointed out that it
 required the Accept-Encoding header that results in gzipped content.


 That didn't faze me because the pages I'm after will be processed by a
 python program, so having to gunzip would not require a manual step.


 This behavior is a little surprising to me from Yahoo!. It's not
 surprising in _general_, but for a site that really wants to be as
 accessible as possible (I would think?), insisting on the latest
 browsers seems ill-advised.

 Ah, well. At least the days are _mostly_ gone when I'd fire up Netscape,
 visit a site, and get a server-generated page that's empty other than
 the phrase You're not using Internet Explorer. :p


 And taking it one step further, I'm greatly enjoying watching Microsoft
 thrash around, trying to save themselves, which I don't think they will.
 Perhaps they'll re-invent themselves, as IBM did, but their cash cow is not
 going to produce milk too much longer. I've just installed the Chrome beta
 on the Windows side of one of my machines (I grudgingly give it 10 Gb on
 each machine; Linux gets the rest), and it looks very, very nice. They've
 still got work to do, but they appear to be heading in a very good
 direction. These are smart people at Google. All signs seem to be pointing
 towards more and more computing happening on the server side in the coming
 years.

 /Don




 - --
 Micah J. Cowan
 Programmer, musician, typesetting enthusiast, gamer.
 GNU Maintainer: wget, screen, teseq
 http://micah.cowan.name/
 -BEGIN PGP SIGNATURE-
 Version: GnuPG v1.4.7 (GNU/Linux)
 Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

 iD8DBQFIxreZ7M8hyUobTrERAslyAJwKfirhzth9ACgdunxp/rfQlR86mQCcClik
 3HbbATyqnrm0hAJXqNTqpl4=
 =3XD/
 -END PGP SIGNATURE-





Re: Wget and Yahoo login?

2008-09-08 Thread Donald Allen
2008/9/8 Tony Godshall [EMAIL PROTECTED]:
 I haven't done this but I can speculate that you need to
 have wget identify itself as firefox.

When I read this, I thought it looked promising, but it doesn't work.
I tried sending exactly the user-agent string firefox is sending and
still got a page from yahoo that clearly indicates yahoo thinks I'm
not logged in.

/Don


 Quote from man wget...

   -U agent-string
   --user-agent=agent-string
   Identify as agent-string to the HTTP server.

   The HTTP protocol allows the clients to identify themselves
 using a User-Agent header field.  This enables distinguishing the
 WWW software,
   usually for statistical purposes or for tracing of protocol
 violations.  Wget normally identifies as Wget/version, version being
 the current ver‐
   sion number of Wget.

   However, some sites have been known to impose the policy of
 tailoring the output according to the User-Agent-supplied
 information.  While this
   is not such a bad idea in theory, it has been abused by
 servers denying information to clients other than (historically)
 Netscape or, more fre‐
   quently, Microsoft Internet Explorer.  This option allows
 you to change the User-Agent line issued by Wget.  Use of this
 option is discouraged,
   unless you really know what you are doing.


 On Mon, Sep 8, 2008 at 12:25 PM, Donald Allen [EMAIL PROTECTED] wrote:
 There was a recent discussion concerning using wget to obtain pages
 from yahoo logged into yahoo as a particular user. Micah replied to
 Rick Nakroshis with instructions describing two methods for doing
 this. This information has also been added by Micah to the wiki.

 I just tried the simpler of the two methods -- logging into yahoo with
 my browser (Firefox 2.0.0.16) and then downloading a page with

 wget --output-document=/tmp/yahoo/yahoo.htm --load-cookies my home
 directory/.mozilla/firefox/id2dmo7r.default/cookies.txt
 'http://yahoo url'

 The page I get is what would be obtained if an un-logged-in user went
 to the specified url. Opening that same url in Firefox *does*
 correctly indicate that it is logged in as me and reflects my
 customizations.

 wget -V:
 GNU Wget 1.11.1

 I am running a reasonably up-to-date Gentoo system (updated within the
 last month) on a Thinkpad X61.

 Have I missed something here? Any help will be appreciated. Please
 include my personal address in your replies as I am not (yet) a
 subscriber to this list.

 Thanks --
 /Don Allen




 --
 Best Regards.
 Please keep in touch.
 This is unedited.
 P-)



Re: Wget and Yahoo login?

2008-09-08 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Donald Allen wrote:
 There was a recent discussion concerning using wget to obtain pages
 from yahoo logged into yahoo as a particular user. Micah replied to
 Rick Nakroshis with instructions describing two methods for doing
 this. This information has also been added by Micah to the wiki.
 
 I just tried the simpler of the two methods -- logging into yahoo with
 my browser (Firefox 2.0.0.16) and then downloading a page with
 
 wget --output-document=/tmp/yahoo/yahoo.htm --load-cookies my home
 directory/.mozilla/firefox/id2dmo7r.default/cookies.txt
 'http://yahoo url'
 
 The page I get is what would be obtained if an un-logged-in user went
 to the specified url. Opening that same url in Firefox *does*
 correctly indicate that it is logged in as me and reflects my
 customizations.

Are you signing into the main Yahoo! site?

When I try to do so, whether I use the cookies or no, I get a message
about update your browser to something more modern or the like. The
difference appears to be a combination of _both_ User-Agent (as you've
done), _and_ --header Accept-Encodings: gzip,deflate. This plus
appropriate cookies gets me a decent logged-in page, but of course it's
gzip-compressed.

Since Wget doesn't currently support gzip-decoding and the like, that
makes the use of Wget in this situation cumbersome. Support for
something like this probably won't be seen until 1.13 or 1.14, I'm afraid.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIxdw77M8hyUobTrERAi/QAJ0atPMeUQ/0YCNwAP+XiH4nDyvclwCcDxYo
obud0CjpATBYDvA0eS3ZHGY=
=vv4R
-END PGP SIGNATURE-


RE: Wget and Yahoo login?

2008-08-21 Thread Tony Lewis
Micah Cowan wrote:

 The easiest way to do what you want may be to log in using your browser,
 and then tell Wget to use the cookies from your browser, using

Given the frequency of the login and then download a file use case , it
should probably be documented on the wiki. (Perhaps it already is. :-)

Also, it would probably be helpful to have a shell script to automate this.

Tony



Re: Wget and Yahoo login?

2008-08-21 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Tony Lewis wrote:
 Micah Cowan wrote:
 
 The easiest way to do what you want may be to log in using your browser,
 and then tell Wget to use the cookies from your browser, using
 
 Given the frequency of the login and then download a file use case , it
 should probably be documented on the wiki. (Perhaps it already is. :-)

Yeah, at
http://wget.addictivecode.org/FrequentlyAskedQuestions#password-protected

I think you missed the final sentence of my how-to:

 (I'm going to put this up on the Wgiki Faq now, at
 http://wget.addictivecode.org/FrequentlyAskedQuestions)

:)

(Back to you:)
 Also, it would probably be helpful to have a shell script to automate this.

I filed the following issue some time ago:
https://savannah.gnu.org/bugs/index.php?22561

The report is low on details; but I was envisioning something that would
spew out forms and their fields, accept values for fields in one form,
and invoke the appropriate Wget command to do the submission.

I don't know if it could be _completely_ automated, since it's not 100%
possible for the script to know which form fields are the ones it should
be filling out.

OTOH, there are some damn good heuristics that could be done: I imagine
that the right form (in the event of more than one) can usually be
guessed by seeing which one has a password-type input (assuming
there's also only one of those). If that form has only one text-type
input, then we've found the username field as well. Name-based
heuristics (with pass, user, uname, login, etc) could also help.

If someone wants to do this, that'd be terrific. Could probably reuse
the existing HTML parser code from Wget. Otherwise, it'd probably be a
while before I could get to it, since I've got higher priorities that
have been languishing.

Such a tool might also be an appropriate place to add FF3 sqllite
cookies support.

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIrb0s7M8hyUobTrERAlVXAJ9YnAM7JiQrxrB/KclA1FXDnoVswgCdGO7t
Vaa98nhNRuEY4aLMx2BFXm0=
=ScoA
-END PGP SIGNATURE-


Re: Wget and Yahoo login?

2008-08-11 Thread Rick Nakroshis

At 04:27 PM 8/10/2008, you wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Rick Nakroshis wrote:
 Micah,

 If you will excuse a quick question about Wget, I'm trying to find out
 if I can use it to download a page from Yahoo that requires me to be
 logged in using my Yahoo profile name and password.  It's a display of a
 CSV file, and the only wrinkle is trying to get past the Yahoo login.

 Try as I may, I just can't seem to find anything about Wget and Yahoo.
 Any suggestions or pointers?

Hi Rick,

In the future, it's better if you post questions to the mailing list at
wget@sunsite.dk; I don't always have time to respond.

The easiest way to do what you want may be to log in using your browser,
and then tell Wget to use the cookies from your browser, using
- --load-cookies=path-to-browser's-cookies. Of course, this only works
if your browser saves its cookies in the standard text format (Firefox
prior to version 3 will do this), or can export to that format (note
that someone contributed a patch to allow Wget to work with Firefox 3
cookies; it's linked from http://wget.addictivecode.org/, it's
unoffocial so I can't vouch for its quality).

Otherwise, you can perform the login using Wget, saving the cookies to a
file of your choice, using --post-data=..., --save-cookies=cookies.txt,
and probably --keep-session-cookies. This will require that you know
what data to place in --post-data, which generally requires that you dig
around in the HTML to find the right form field names, and where to post
them.

For instance, if you find a form like the following within the page
containing the log-in form:

form action=/doLogin.php method=POST
  input type=text name=s-login
  input type=password name=s-pass
/form

then you need to do something like:

  $ wget --post-data='s-login=USERNAMEs-pass=PASSWORD' \
--save-cookies=my-cookies.txt --keep-session-cookies \
http://HOSTNAME/doLogin.php

(Note that you _don't_ necessarily send the information to the page that
 had the login page: you send it to the spot mentioned in the action
attribute of the password form.)

Once this is done, you _should_ be able to perform further operations
with Wget as if you're logged in, by using

  $ wget --load-cookies=my-cookies.txt --save-cookies=my-cookies.txt \
--keep-session-cookies ...

(I'm going to put this up on the Wgiki Faq now, at
http://wget.addictivecode.org/FrequentlyAskedQuestions)

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIn09A7M8hyUobTrERAu04AJ9EgRoBBhvNCDwOt87f91p+HpWktACdFgMM
KEfliBtfrPBbh/XdvusEPiw=
=qlGZ
-END PGP SIGNATURE-



Micah,

Thank you for taking the time to answer so thoroughly, and doing so 
promptly, too.  You've given me a great boost forward, and I appreciate it.


Thank you, sir!


Rick



Re: Wget and Yahoo login?

2008-08-10 Thread Micah Cowan
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Rick Nakroshis wrote:
 Micah,
 
 If you will excuse a quick question about Wget, I'm trying to find out
 if I can use it to download a page from Yahoo that requires me to be
 logged in using my Yahoo profile name and password.  It's a display of a
 CSV file, and the only wrinkle is trying to get past the Yahoo login.
 
 Try as I may, I just can't seem to find anything about Wget and Yahoo. 
 Any suggestions or pointers?

Hi Rick,

In the future, it's better if you post questions to the mailing list at
wget@sunsite.dk; I don't always have time to respond.

The easiest way to do what you want may be to log in using your browser,
and then tell Wget to use the cookies from your browser, using
- --load-cookies=path-to-browser's-cookies. Of course, this only works
if your browser saves its cookies in the standard text format (Firefox
prior to version 3 will do this), or can export to that format (note
that someone contributed a patch to allow Wget to work with Firefox 3
cookies; it's linked from http://wget.addictivecode.org/, it's
unoffocial so I can't vouch for its quality).

Otherwise, you can perform the login using Wget, saving the cookies to a
file of your choice, using --post-data=..., --save-cookies=cookies.txt,
and probably --keep-session-cookies. This will require that you know
what data to place in --post-data, which generally requires that you dig
around in the HTML to find the right form field names, and where to post
them.

For instance, if you find a form like the following within the page
containing the log-in form:

form action=/doLogin.php method=POST
  input type=text name=s-login
  input type=password name=s-pass
/form

then you need to do something like:

  $ wget --post-data='s-login=USERNAMEs-pass=PASSWORD' \
--save-cookies=my-cookies.txt --keep-session-cookies \
http://HOSTNAME/doLogin.php

(Note that you _don't_ necessarily send the information to the page that
 had the login page: you send it to the spot mentioned in the action
attribute of the password form.)

Once this is done, you _should_ be able to perform further operations
with Wget as if you're logged in, by using

  $ wget --load-cookies=my-cookies.txt --save-cookies=my-cookies.txt \
--keep-session-cookies ...

(I'm going to put this up on the Wgiki Faq now, at
http://wget.addictivecode.org/FrequentlyAskedQuestions)

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIn09A7M8hyUobTrERAu04AJ9EgRoBBhvNCDwOt87f91p+HpWktACdFgMM
KEfliBtfrPBbh/XdvusEPiw=
=qlGZ
-END PGP SIGNATURE-