small doc typo in 9.1 Robot Exclusion

2008-09-10 Thread Michael Kessler
9.1 Robot Exclusion

..
.
Although Wget is not a web robot in the strictest sense of the word, it
can downloads large parts of the site without the user's...
..
.

possibly meant:
...it can download large 

cheers 
michael



Re: Wget and Yahoo login?

2008-09-10 Thread Tony Godshall
And you'll probably have to do this again; I bet
yahoo expires the session cookies!


On Tue, Sep 9, 2008 at 2:18 PM, Donald Allen [EMAIL PROTECTED] wrote:
 After surprisingly little struggle, I got Plan B working -- logged into
 yahoo with wget, saved the cookies, including session cookies, and then
 proceeded to fetch pages using the saved cookies. Those pages came back
 logged in as me, with my customizations. Thanks to Tony, Daniel, and Micah
 -- you all provided critical advice in solving this problem.

 /Don
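
For readers following along, the Plan B workflow described above (log in with wget, save all cookies including session cookies, then fetch pages with the saved cookies) might look roughly like the sketch below. It only builds the wget command lines and does not run them. The login URL, the form field names `login` and `passwd`, and the file name `cookies.txt` are assumptions for illustration; the thread does not give the real values, and they would have to be read off the site's actual login form.

```python
import shlex
from urllib.parse import urlencode

# Hypothetical values, not taken from the thread.
LOGIN_URL = "https://login.example.com/login"
COOKIE_FILE = "cookies.txt"


def login_command(user, password):
    """Build the wget argv that posts the login form and saves cookies."""
    form = urlencode({"login": user, "passwd": password})  # assumed field names
    return [
        "wget",
        "--save-cookies=" + COOKIE_FILE,
        "--keep-session-cookies",  # also write session cookies to the file
        "--post-data=" + form,
        "--output-document=/dev/null",  # the login response itself is discarded
        LOGIN_URL,
    ]


def fetch_command(url, out_file):
    """Build the wget argv that fetches a page using the saved cookies."""
    return [
        "wget",
        "--load-cookies=" + COOKIE_FILE,
        "--output-document=" + out_file,
        url,
    ]


if __name__ == "__main__":
    for argv in (login_command("don", "secret"),
                 fetch_command("https://www.example.com/page", "page.html")):
        # Print shell-quoted command lines instead of executing them.
        print(" ".join(shlex.quote(arg) for arg in argv))
```

If the site expires its session cookies, as Tony suspects, the login command would need to be re-run before fetching again.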

 On Tue, Sep 9, 2008 at 2:21 PM, Donald Allen [EMAIL PROTECTED] wrote:


 On Tue, Sep 9, 2008 at 1:51 PM, Micah Cowan [EMAIL PROTECTED] wrote:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Donald Allen wrote:
 
 
  On Tue, Sep 9, 2008 at 1:41 PM, Micah Cowan [EMAIL PROTECTED] wrote:
 
  Donald Allen wrote:
  I am doing the yahoo session login with firefox, not with wget, so I'm
  using the first and easier of your two suggested methods. I'm guessing
  you are thinking that I'm trying to login to the yahoo session with
  wget, and thus --keep-session-cookies and --save-cookies=foo.txt would
  make perfect sense to me, but that's not what I'm doing (yet -- if I'm
  right about what's happening here, I'm going to have to resort to
  this). But using firefox to initiate the session, it looks to me like
  wget never gets to see the session cookies because I don't think
  firefox writes them to its cookie file (which actually makes sense --
  if they only need to live as long as the session, why write them out?).
 
  Yes, and I understood this; the thing is, that if session cookies are
  involved (i.e., cookies that are marked for immediate expiration and
  are not meant to be saved to the cookies file), then I don't see how
  you have much choice other than to use the harder method, or else to
  fake the session cookies by manually inserting them to your cookies
  file or whatnot (not sure how well that may be expected to work). Or,
  yeah, add an explicit --header 'Cookie: ...'.
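
The two workarounds mentioned here (an explicit Cookie header, or session cookies inserted by hand into the cookies file) could be sketched as follows. The cookie names and values are hypothetical, copied by hand from a browser in the scenario the thread describes. The second helper assumes the tab-separated Netscape cookie-file layout that wget reads, with an expiry of 0, which is how wget itself marks session cookies when saving with --keep-session-cookies. As the message says, it is unclear how well the hand-inserted variant works in practice.

```python
def cookie_header(cookies):
    """Return a wget --header argument carrying the given cookies."""
    pairs = "; ".join("{}={}".format(name, value)
                      for name, value in cookies.items())
    return "--header=Cookie: " + pairs


def cookie_file_line(domain, name, value, path="/", secure=False):
    """Return one Netscape-format cookie line with expiry 0 (session cookie)."""
    # Second field: whether the cookie also applies to subdomains.
    include_subdomains = "TRUE" if domain.startswith(".") else "FALSE"
    return "\t".join([
        domain,
        include_subdomains,
        path,
        "TRUE" if secure else "FALSE",
        "0",  # expiry timestamp 0 marks a session cookie
        name,
        value,
    ])


if __name__ == "__main__":
    # Hypothetical cookie names and values.
    session = {"sid": "abc", "tok": "123"}
    print(["wget", cookie_header(session), "https://www.example.com/page"])
    for name, value in session.items():
        print(cookie_file_line(".example.com", name, value))
```

Passing the header as a single argv element avoids shell quoting problems with the semicolons and spaces in the cookie string.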
 
 
  Ah, the misunderstanding was that the stuff you thought I missed was
  intended to push me in the direction of Plan B -- log in to yahoo with
  wget.

 Yes; and that's entirely my fault, as I didn't explicitly say that.

 No problem.


  I understand now. I'll look at trying to make this work. Thanks
  for all the help, though I can't guarantee that you are done yet :-)
  But, hopefully, this exchange will benefit others.

 I was actually surprised you kept going after I pointed out that it
 required the Accept-Encoding header that results in gzipped content.

 That didn't faze me because the pages I'm after will be processed by a
 python program, so having to gunzip would not require a manual step.
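
A minimal sketch of that gunzip step in Python, assuming the fetched page body has been read as bytes and that the text is UTF-8 (the thread does not say which encoding the pages use). The helper name `read_page` is made up for illustration. It checks for the two-byte gzip magic number so that uncompressed responses pass through unchanged.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def read_page(raw):
    """Return the page text, gunzipping first if the body is gzip data."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    # UTF-8 is an assumption; undecodable bytes are replaced, not fatal.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    body = gzip.compress(b"<html>hello</html>")
    print(read_page(body))
```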

 This behavior is a little surprising to me from Yahoo!. It's not
 surprising in _general_, but for a site that really wants to be as
 accessible as possible (I would think?), insisting on the latest
 browsers seems ill-advised.

 Ah, well. At least the days are _mostly_ gone when I'd fire up Netscape,
 visit a site, and get a server-generated page that's empty other than
 the phrase "You're not using Internet Explorer." :p

 And taking it one step further, I'm greatly enjoying watching Microsoft
 thrash around, trying to save themselves, which I don't think they will.
 Perhaps they'll re-invent themselves, as IBM did, but their cash cow is not
 going to produce milk much longer. I've just installed the Chrome beta
 on the Windows side of one of my machines (I grudgingly give it 10 GB on
 each machine; Linux gets the rest), and it looks very, very nice. They've
 still got work to do, but they appear to be heading in a very good
 direction. These are smart people at Google. All signs seem to be pointing
 towards more and more computing happening on the server side in the coming
 years.

 /Don


 - --
 Micah J. Cowan
 Programmer, musician, typesetting enthusiast, gamer.
 GNU Maintainer: wget, screen, teseq
 http://micah.cowan.name/
 -BEGIN PGP SIGNATURE-
 Version: GnuPG v1.4.7 (GNU/Linux)
 Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

 iD8DBQFIxreZ7M8hyUobTrERAslyAJwKfirhzth9ACgdunxp/rfQlR86mQCcClik
 3HbbATyqnrm0hAJXqNTqpl4=
 =3XD/
 -END PGP SIGNATURE-






-- 
Best Regards.
Please keep in touch.
This is unedited.
P-)