Hello, All and bug #21793

2008-09-08 Thread David Coon
Hello everyone,

I thought I'd introduce myself to you all, as I intend to start helping out
with wget.  This will be my first time contributing to any kind of free or
open source software, so I may have some basic questions down the line about
best practices and such, though I'll try to keep that to a minimum.

Anyway, I've been researching unicode and utf-8 recently, so I'm gonna try
to tackle bug #21793 https://savannah.gnu.org/bugs/?21793.

-David A Coon


Re: Hello, All and bug #21793

2008-09-08 Thread Micah Cowan
David Coon wrote:
> Hello everyone,
>
> I thought I'd introduce myself to you all, as I intend to start helping
> out with wget.  This will be my first time contributing to any kind of
> free or open source software, so I may have some basic questions down
> the line about best practices and such, though I'll try to keep that to
> a minimum.
>
> Anyway, I've been researching unicode and utf-8 recently, so I'm gonna
> try to tackle bug #21793 https://savannah.gnu.org/bugs/?21793.

Hi David, and welcome!

If you haven't already, please see
http://wget.addictivecode.org/HelpingWithWget

I'd encourage you to get a Savannah account, so I can assign that bug to
you. Also, I tend to hang out quite a bit on IRC (#wget @
irc.freenode.net), so you might want to sign on there.

Since you mentioned an interest in Unicode and UTF-8, you might want to
check out Saint Xavier's recent work on IRI and IDN support in Wget,
which is available at http://hg.addictivecode.org/wget/sxav/.

Among other things, sxav's additions make Wget more aware of the user's
locale, so it might be useful for providing a feature to automatically
transcode filenames to the user's locale, rather than just supporting
UTF-8 only (which should still probably remain an explicit option). If
that sounds like the direction you'd like to take it, you should
probably base your work on sxav's repository, rather than mainline.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/


Wget and Yahoo login?

2008-09-08 Thread Donald Allen
There was a recent discussion concerning using wget to obtain pages
from yahoo logged into yahoo as a particular user. Micah replied to
Rick Nakroshis with instructions describing two methods for doing
this. This information has also been added by Micah to the wiki.

I just tried the simpler of the two methods -- logging into yahoo with
my browser (Firefox 2.0.0.16) and then downloading a page with

wget --output-document=/tmp/yahoo/yahoo.htm \
  --load-cookies <my home directory>/.mozilla/firefox/id2dmo7r.default/cookies.txt \
  'http://<yahoo url>'

The page I get is what would be obtained if an un-logged-in user went
to the specified url. Opening that same url in Firefox *does*
correctly indicate that it is logged in as me and reflects my
customizations.
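
One thing worth ruling out is whether the cookies file actually contains Yahoo cookies that Wget can load. The sketch below is only an illustration in Python, not part of Wget: it reads a Netscape-format cookies.txt (the textual format --load-cookies reads) and lists the cookies stored for a domain. The function name cookies_for_domain and the "yahoo.com" suffix are assumptions made for this example.

```python
from http.cookiejar import MozillaCookieJar


def cookies_for_domain(cookie_file, domain_suffix):
    """Return (name, domain, is_session) for cookies whose domain ends with domain_suffix."""
    jar = MozillaCookieJar()
    # Keep session and expired cookies too, so everything in the file is visible.
    jar.load(cookie_file, ignore_discard=True, ignore_expires=True)
    return [
        (cookie.name, cookie.domain, cookie.expires is None)
        for cookie in jar
        if cookie.domain.endswith(domain_suffix)
    ]
```

Calling cookies_for_domain(path, "yahoo.com") on the profile's cookies.txt shows whether any Yahoo cookies were saved there at all. If the list is empty, the login cookies may be session-only cookies that the browser never wrote to disk, which would be one possible explanation for a logged-out page.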

wget -V:
GNU Wget 1.11.1

I am running a reasonably up-to-date Gentoo system (updated within the
last month) on a Thinkpad X61.

Have I missed something here? Any help will be appreciated. Please
include my personal address in your replies as I am not (yet) a
subscriber to this list.

Thanks --
/Don Allen


Re: Wget and Yahoo login?

2008-09-08 Thread Donald Allen
2008/9/8 Tony Godshall [EMAIL PROTECTED]:
> I haven't done this but I can speculate that you need to
> have wget identify itself as firefox.

When I read this, I thought it looked promising, but it doesn't work.
I tried sending exactly the user-agent string firefox is sending and
still got a page from yahoo that clearly indicates yahoo thinks I'm
not logged in.

/Don


> Quote from man wget...
>
>   -U agent-string
>   --user-agent=agent-string
>       Identify as agent-string to the HTTP server.
>
>       The HTTP protocol allows the clients to identify themselves
>       using a User-Agent header field.  This enables distinguishing
>       the WWW software, usually for statistical purposes or for
>       tracing of protocol violations.  Wget normally identifies as
>       Wget/version, version being the current version number of Wget.
>
>       However, some sites have been known to impose the policy of
>       tailoring the output according to the User-Agent-supplied
>       information.  While this is not such a bad idea in theory, it
>       has been abused by servers denying information to clients other
>       than (historically) Netscape or, more frequently, Microsoft
>       Internet Explorer.  This option allows you to change the
>       User-Agent line issued by Wget.  Use of this option is
>       discouraged, unless you really know what you are doing.


> On Mon, Sep 8, 2008 at 12:25 PM, Donald Allen [EMAIL PROTECTED] wrote:
> > There was a recent discussion concerning using wget to obtain pages
> > from yahoo logged into yahoo as a particular user. Micah replied to
> > Rick Nakroshis with instructions describing two methods for doing
> > this. This information has also been added by Micah to the wiki.
> >
> > I just tried the simpler of the two methods -- logging into yahoo with
> > my browser (Firefox 2.0.0.16) and then downloading a page with
> >
> > wget --output-document=/tmp/yahoo/yahoo.htm \
> >   --load-cookies <my home directory>/.mozilla/firefox/id2dmo7r.default/cookies.txt \
> >   'http://<yahoo url>'
> >
> > The page I get is what would be obtained if an un-logged-in user went
> > to the specified url. Opening that same url in Firefox *does*
> > correctly indicate that it is logged in as me and reflects my
> > customizations.
> >
> > wget -V:
> > GNU Wget 1.11.1
> >
> > I am running a reasonably up-to-date Gentoo system (updated within the
> > last month) on a Thinkpad X61.
> >
> > Have I missed something here? Any help will be appreciated. Please
> > include my personal address in your replies as I am not (yet) a
> > subscriber to this list.
> >
> > Thanks --
> > /Don Allen
>
> --
> Best Regards.
> Please keep in touch.
> This is unedited.
> P-)



Re: Wget and Yahoo login?

2008-09-08 Thread Micah Cowan
Donald Allen wrote:
> There was a recent discussion concerning using wget to obtain pages
> from yahoo logged into yahoo as a particular user. Micah replied to
> Rick Nakroshis with instructions describing two methods for doing
> this. This information has also been added by Micah to the wiki.
>
> I just tried the simpler of the two methods -- logging into yahoo with
> my browser (Firefox 2.0.0.16) and then downloading a page with
>
> wget --output-document=/tmp/yahoo/yahoo.htm \
>   --load-cookies <my home directory>/.mozilla/firefox/id2dmo7r.default/cookies.txt \
>   'http://<yahoo url>'
>
> The page I get is what would be obtained if an un-logged-in user went
> to the specified url. Opening that same url in Firefox *does*
> correctly indicate that it is logged in as me and reflects my
> customizations.

Are you signing into the main Yahoo! site?

When I try to do so, whether I use the cookies or not, I get a message
along the lines of "update your browser to something more modern". The
difference appears to be a combination of _both_ User-Agent (as you've
done), _and_ --header 'Accept-Encoding: gzip,deflate'. This plus
appropriate cookies gets me a decent logged-in page, but of course it's
gzip-compressed.
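
To make the combination concrete, here is a minimal sketch in Python that assembles such an invocation as an argument list. It only uses Wget options already mentioned in this thread (--load-cookies, --user-agent, --header, --output-document); the helper name build_wget_args and any values passed to it are hypothetical placeholders, not a tested recipe for Yahoo.

```python
def build_wget_args(url, cookie_file, user_agent, output_file):
    """Build a Wget argument list that sends cookies, a browser User-Agent
    and an Accept-Encoding header together."""
    return [
        "wget",
        f"--load-cookies={cookie_file}",
        f"--user-agent={user_agent}",
        # Header name is singular: Accept-Encoding.
        "--header=Accept-Encoding: gzip,deflate",
        f"--output-document={output_file}",
        url,
    ]
```

The resulting list can be handed to subprocess.run(args, check=True). Passing a list rather than a single shell string avoids quoting problems with the spaces in the User-Agent and header values. The saved file would still be compressed if the server honours the header.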

Since Wget doesn't currently support gzip-decoding and the like, that
makes the use of Wget in this situation cumbersome. Support for
something like this probably won't be seen until 1.13 or 1.14, I'm afraid.
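
In the meantime the compressed output can be unpacked after the fact. The following is a rough sketch in Python, assuming the server answered with gzip (a "deflate" response is not handled here); the function name maybe_gunzip is made up for this example. It checks for the gzip magic bytes, since the saved file carries no Content-Encoding information unless the headers were saved as well.

```python
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def maybe_gunzip(data: bytes) -> bytes:
    """Return the decompressed body if data looks like gzip, else data unchanged."""
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data
```

For a one-off download, running the saved file through gunzip on the command line achieves the same thing.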

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/