Hello, All and bug #21793
Hello everyone,

I thought I'd introduce myself to you all, as I intend to start helping out with wget. This will be my first time contributing to any kind of free or open source software, so I may have some basic questions down the line about best practices and such, though I'll try to keep that to a minimum.

Anyway, I've been researching unicode and utf-8 recently, so I'm gonna try to tackle bug #21793: https://savannah.gnu.org/bugs/?21793

-David A Coon
Re: Hello, All and bug #21793
David Coon wrote:
> Hello everyone, I thought I'd introduce myself to you all, as I intend
> to start helping out with wget. This will be my first time contributing
> to any kind of free or open source software, so I may have some basic
> questions down the line about best practices and such, though I'll try
> to keep that to a minimum. Anyway, I've been researching unicode and
> utf-8 recently, so I'm gonna try to tackle bug #21793
> https://savannah.gnu.org/bugs/?21793.

Hi David, and welcome! If you haven't already, please see http://wget.addictivecode.org/HelpingWithWget

I'd encourage you to get a Savannah account, so I can assign that bug to you. Also, I tend to hang out quite a bit on IRC (#wget @ irc.freenode.net), so you might want to sign on there.

Since you mentioned an interest in Unicode and UTF-8, you might want to check out Saint Xavier's recent work on IRI and IDN support in Wget, which is available at http://hg.addictivecode.org/wget/sxav/. Among other things, sxav's additions make Wget more aware of the user's locale, so it might be useful for providing a feature to automatically transcode filenames to the user's locale, rather than supporting UTF-8 only (which should still probably remain an explicit option). If that sounds like the direction you'd like to take it, you should probably base your work on sxav's repository, rather than mainline.

-- 
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
Wget and Yahoo login?
There was a recent discussion concerning using wget to obtain pages from yahoo logged into yahoo as a particular user. Micah replied to Rick Nakroshis with instructions describing two methods for doing this. This information has also been added by Micah to the wiki.

I just tried the simpler of the two methods -- logging into yahoo with my browser (Firefox 2.0.0.16) and then downloading a page with

    wget --output-document=/tmp/yahoo/yahoo.htm --load-cookies my home directory/.mozilla/firefox/id2dmo7r.default/cookies.txt 'http://yahoo url'

The page I get is what would be obtained if an un-logged-in user went to the specified url. Opening that same url in Firefox *does* correctly indicate that it is logged in as me and reflects my customizations.

wget -V: GNU Wget 1.11.1

I am running a reasonably up-to-date Gentoo system (updated within the last month) on a Thinkpad X61. Have I missed something here? Any help will be appreciated. Please include my personal address in your replies as I am not (yet) a subscriber to this list.

Thanks
-- 
/Don Allen
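One check that might help narrow this down is whether the cookies file actually contains entries for the site in question. The following is a minimal sketch, assuming the file is in the Netscape cookies.txt format (seven tab-separated fields per line: domain, subdomain flag, path, secure flag, expiry, name, value). The function name `list_cookies` is hypothetical and not part of wget.

```shell
#!/bin/sh
# list_cookies FILE DOMAIN   (hypothetical helper, not a wget feature)
# Print "domain<TAB>name<TAB>expiry" for every cookie in a Netscape-format
# cookies.txt whose domain field contains DOMAIN.
list_cookies() {
    awk -F '\t' -v domain="$2" '
        /^#/ || NF != 7 { next }    # skip comments, blank and malformed lines
        index($1, domain) > 0 {     # field 1 is the cookie domain
            printf "%s\t%s\t%s\n", $1, $6, $5
        }
    ' "$1"
}
```

Calling `list_cookies /path/to/cookies.txt yahoo.com` (the path is a placeholder) lists the matching cookies. If nothing is printed, that would suggest the browser has not written the relevant cookies to the file, in which case `--load-cookies` has nothing to send.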
Re: Wget and Yahoo login?
2008/9/8 Tony Godshall:
> I haven't done this but I can speculate that you need to have wget
> identify itself as firefox.

When I read this, I thought it looked promising, but it doesn't work. I tried sending exactly the user-agent string firefox is sending and still got a page from yahoo that clearly indicates yahoo thinks I'm not logged in.

/Don

> Quote from man wget...
>
>   -U agent-string
>   --user-agent=agent-string
>       Identify as agent-string to the HTTP server.
>
>       The HTTP protocol allows the clients to identify themselves
>       using a User-Agent header field. This enables distinguishing
>       the WWW software, usually for statistical purposes or for
>       tracing of protocol violations. Wget normally identifies as
>       Wget/version, version being the current version number of Wget.
>
>       However, some sites have been known to impose the policy of
>       tailoring the output according to the User-Agent-supplied
>       information. While this is not such a bad idea in theory, it
>       has been abused by servers denying information to clients other
>       than (historically) Netscape or, more frequently, Microsoft
>       Internet Explorer. This option allows you to change the
>       User-Agent line issued by Wget. Use of this option is
>       discouraged, unless you really know what you are doing.
>
> On Mon, Sep 8, 2008 at 12:25 PM, Donald Allen wrote:
> > There was a recent discussion concerning using wget to obtain pages
> > from yahoo logged into yahoo as a particular user. Micah replied to
> > Rick Nakroshis with instructions describing two methods for doing
> > this. This information has also been added by Micah to the wiki.
> >
> > I just tried the simpler of the two methods -- logging into yahoo
> > with my browser (Firefox 2.0.0.16) and then downloading a page with
> >
> >     wget --output-document=/tmp/yahoo/yahoo.htm --load-cookies my home directory/.mozilla/firefox/id2dmo7r.default/cookies.txt 'http://yahoo url'
> >
> > The page I get is what would be obtained if an un-logged-in user
> > went to the specified url. Opening that same url in Firefox *does*
> > correctly indicate that it is logged in as me and reflects my
> > customizations.
> >
> > wget -V: GNU Wget 1.11.1
> >
> > I am running a reasonably up-to-date Gentoo system (updated within
> > the last month) on a Thinkpad X61. Have I missed something here? Any
> > help will be appreciated. Please include my personal address in your
> > replies as I am not (yet) a subscriber to this list.
> >
> > Thanks
> > --
> > /Don Allen
>
> --
> Best Regards.
> Please keep in touch.
> This is unedited.
> P-)
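To confirm that the User-Agent and Cookie headers wget sends really match what the browser sends, it may help to look at the request itself. Here is a rough sketch, assuming a wget build with debug support, whose `--debug` output brackets each request with `---request begin---` and `---request end---` lines. The helper names `request_headers` and `show_request` are made up for illustration.

```shell
#!/bin/sh
# request_headers: reduce wget --debug output on stdin to the request
# block(s) wget sent, using its begin/end marker lines.
request_headers() {
    sed -n '/---request begin---/,/---request end---/p'
}

# show_request UA COOKIE_FILE URL   (hypothetical helper)
# Fetch URL, discard the body, and print only the request wget sent.
show_request() {
    wget --debug --user-agent="$1" --load-cookies "$2" \
         --output-document=/dev/null "$3" 2>&1 | request_headers
}
```

Running `show_request "<agent string>" /path/to/cookies.txt "<url>"` (all three arguments are placeholders) prints the request line and headers, which can then be compared by eye against the browser's request.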
Re: Wget and Yahoo login?
Donald Allen wrote:
> There was a recent discussion concerning using wget to obtain pages
> from yahoo logged into yahoo as a particular user. Micah replied to
> Rick Nakroshis with instructions describing two methods for doing this.
> This information has also been added by Micah to the wiki.
>
> I just tried the simpler of the two methods -- logging into yahoo with
> my browser (Firefox 2.0.0.16) and then downloading a page with
>
>     wget --output-document=/tmp/yahoo/yahoo.htm --load-cookies my home directory/.mozilla/firefox/id2dmo7r.default/cookies.txt 'http://yahoo url'
>
> The page I get is what would be obtained if an un-logged-in user went
> to the specified url. Opening that same url in Firefox *does* correctly
> indicate that it is logged in as me and reflects my customizations.

Are you signing into the main Yahoo! site? When I try to do so, whether I use the cookies or not, I get a message along the lines of "update your browser to something more modern".

The difference appears to be a combination of _both_ User-Agent (as you've done), _and_ --header "Accept-Encoding: gzip,deflate". This plus appropriate cookies gets me a decent logged-in page, but of course it's gzip-compressed.

Since Wget doesn't currently support gzip-decoding and the like, that makes the use of Wget in this situation cumbersome. Support for something like this probably won't be seen until 1.13 or 1.14, I'm afraid.

-- 
Micah J. Cowan
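Until Wget can decode compressed responses itself, one possible workaround is to write the body to standard output and decompress it with an external tool. The sketch below combines the three ingredients described above (browser User-Agent, an Accept-Encoding request header, and the browser's cookies). It assumes the server answers with gzip rather than deflate or an uncompressed body, and the helper names `decode_gzip` and `fetch_logged_in` are hypothetical.

```shell
#!/bin/sh
# decode_gzip: decompress a gzip stream from stdin to stdout.
decode_gzip() {
    gzip -dc
}

# fetch_logged_in UA COOKIE_FILE URL   (hypothetical helper)
# Request URL with a browser-like User-Agent, an Accept-Encoding header
# and the browser's cookies, then decompress the body outside of wget.
# Assumption: the response body is gzip-encoded; gzip -dc fails otherwise.
fetch_logged_in() {
    wget --quiet \
         --user-agent="$1" \
         --header='Accept-Encoding: gzip,deflate' \
         --load-cookies "$2" \
         --output-document=- "$3" | decode_gzip
}
```

For example, `fetch_logged_in "<agent string>" /path/to/cookies.txt "<url>" > page.html`, where all three arguments are placeholders. If the server replies uncompressed or with deflate, the `gzip -dc` step reports an error instead of producing a page.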