Re: Support for file://
Michelle Konzack wrote:
> On 2008-09-20 22:05:35, Micah Cowan wrote:
>> I'm confused. If you can successfully download the files from
>> HOSTINGPROVIDER in the first place, then why would a difference exist?
>> And if you can't, then this wouldn't be an effective way to find out.
>
> I mean, IF you have a local (master) mirror and your website @ISP and
> you want to know whether the two websites are identical and have no
> cruft in them, you can

I didn't follow this thread, but just FYI: there is an excellent (not only FTP) client called "lftp" that has a built-in "mirror" command. It has a similar effect to the rsync tool, i.e. it synchronizes remote and local directories recursively.

--
Petr
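A minimal sketch of the lftp approach Petr describes (the host, user, and paths below are placeholders, not anything from this thread):

    # Pull the remote directory tree into a local copy:
    lftp -c 'open ftp://user@ftp.example.invalid; mirror --verbose /htdocs /path/to/local/copy'

    # Or push local changes back to the server (reverse mirror):
    lftp -c 'open ftp://user@ftp.example.invalid; mirror -R --verbose /path/to/local/copy /htdocs'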
Re: Support for file://
On 2008-09-20 22:05:35, Micah Cowan wrote:
> I'm confused. If you can successfully download the files from
> HOSTINGPROVIDER in the first place, then why would a difference exist?
> And if you can't, then this wouldn't be an effective way to find out.

I mean, IF you have a local (master) mirror and your website @ISP and you want to know whether the two websites are identical and have no cruft in them, you can

1) fetch the website from your ISP recursively with

   wget -r -nH -P /tmp/tmp_ISP http://website.isp.tld/

2) fetch the local mirror with

   wget -r -nH -P /tmp/tmp_LOC file:///path/to/local/mirror/

   where the full path in 2) holds the same content as the website in 1),

3) and then compare the results against /path/to/local/mirror/.

If you have edited the files locally and remotely, you can get surprising results. Fetching /index.html recursively means that ALL files mentioned in ANY of the HTML files are downloaded. So if the tree from 1) differs from ftp://website.isp.tld/ then there is something wrong with the site...

Thanks, Greetings and nice Day/Evening
Michelle Konzack
Systemadministrator, Tamay Dogan Network
Debian GNU/Linux Consultant
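Put together, and assuming wget had the file:// support being discussed here, the comparison would look roughly like this (URLs and paths are placeholders):

    # 1) fetch the live site from the ISP
    wget -r -nH -P /tmp/tmp_ISP http://website.isp.tld/

    # 2) fetch the local master the same way (this is the missing file:// feature)
    wget -r -nH -P /tmp/tmp_LOC file:///path/to/local/mirror/

    # 3) compare the two trees; anything listed is unreferenced, missing, or out of sync
    diff -r -q /tmp/tmp_ISP /tmp/tmp_LOC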
Re: Support for file://
David wrote:
> Hi Micah,
>
> You're right - this was raised before and in fact it was a feature
> Mauro Tortonesi intended to be implemented for the 1.12 release, but it
> seems to have been forgotten somewhere along the line. I wrote to the
> list in 2006 describing what I consider a compelling reason to support
> file://. Here is what I wrote then:
>
> At 03:45 PM 26/06/2006, David wrote:
>> In replies to the post requesting support of the "file://" scheme,
>> requests were made for someone to provide a compelling reason to want
>> to do this. Perhaps the following is such a reason.
>>
>> I have a CD with HTML content (it is a CD of abstracts from a
>> scientific conference), however for space reasons not all the content
>> was included on the CD - there remain links to figures and diagrams on
>> a remote web site. I'd like to create an archive of the complete
>> content locally by having wget retrieve everything and convert the
>> links to point to the retrieved material. Thus the wget functionality
>> when retrieving the local files should work the same as if the files
>> were retrieved from a web server (i.e. the input local file needs to
>> be processed, both local and remote content retrieved, and the copies
>> made of the local and remote files all need to be adjusted to now
>> refer to the local copy rather than the remote content). A simple
>> shell script that runs cp or rsync on local files without any further
>> processing would not achieve this aim.

Fair enough. This example at least makes sense to me. I suppose it can't hurt to provide this, so long as we document clearly that it is not a replacement for cp or rsync, and is never intended to be (it won't handle attributes and special file properties).

However, support for file:// will introduce security issues, so care is needed. For instance, file:// should never be respected when it comes from the web. Even on the local machine, it could be problematic to use it on files writable by other users (as they can then craft links to download privileged files with upgraded permissions). Perhaps files that are only readable by root should always be skipped, or wget should require a "--force" sort of option if the current mode can result in more permissive settings on the downloaded file. Perhaps it would be wise to make this a configurable option. It might also be prudent to enable an option for file:// to be disallowed for root.

https://savannah.gnu.org/bugs/?24347

If any of you can think of additional security issues that will need consideration, please add them in comments to the report.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
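To make the permissions concern concrete, here is a hypothetical illustration, assuming file:// support existed and the fetch were run as root (none of this works in today's wget; the paths are examples only):

    # /etc/shadow is normally mode 0640, readable only by root and group shadow:
    ls -l /etc/shadow

    # If root were induced (e.g. by a crafted link) to fetch it via file://,
    # the copy would be created with the ordinary umask (typically 0644),
    # i.e. world-readable -- the "more permissive settings" mentioned above:
    sudo wget file:///etc/shadow -O /tmp/shadow-copy
    ls -l /tmp/shadow-copy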
Re: Support for file://
Hi Micah,

You're right - this was raised before and in fact it was a feature Mauro Tortonesi intended to be implemented for the 1.12 release, but it seems to have been forgotten somewhere along the line. I wrote to the list in 2006 describing what I consider a compelling reason to support file://. Here is what I wrote then:

At 03:45 PM 26/06/2006, David wrote:
> In replies to the post requesting support of the "file://" scheme,
> requests were made for someone to provide a compelling reason to want
> to do this. Perhaps the following is such a reason.
>
> I have a CD with HTML content (it is a CD of abstracts from a
> scientific conference), however for space reasons not all the content
> was included on the CD - there remain links to figures and diagrams on
> a remote web site. I'd like to create an archive of the complete
> content locally by having wget retrieve everything and convert the
> links to point to the retrieved material. Thus the wget functionality
> when retrieving the local files should work the same as if the files
> were retrieved from a web server (i.e. the input local file needs to be
> processed, both local and remote content retrieved, and the copies made
> of the local and remote files all need to be adjusted to now refer to
> the local copy rather than the remote content). A simple shell script
> that runs cp or rsync on local files without any further processing
> would not achieve this aim.

Regarding where the local files should be copied, I suggest a default scheme similar to the current http functionality. For example, if the local source was /source/index.htm, and I ran something like:

   wget.exe -m -np -k file:///source/index.htm

this could be retrieved to ./source/index.htm (assuming that I ran the command from anywhere other than the root directory). On Windows, if the local source file is c:\test.htm, then the destination could be .\c\test.htm. It would probably be fair enough for wget to throw up an error if the source and destination were the same file (and perhaps helpfully suggest that the user change into a new subdirectory and retry the command).

One additional problem this scheme needs to deal with is when one or more /../ in the path specification results in the destination being above the current parent directory; then the destination would have to be adjusted to ensure the file remained within the parent directory structure. For example, if I am in /dir/dest/ and ran

   wget.exe -m -np -k file://../../source/index.htm

this could be saved to ./source/index.htm (i.e. /dir/dest/source/index.htm).

-David.

At 08:49 AM 3/09/2008, you wrote:
> Petri Koistinen wrote:
>> Hi,
>>
>> It would be nice if wget would also support file://.
>
> Feel free to file an issue for this (I'll mark it "Needs Discussion" and
> set it at low priority). I'd thought there was already an issue for
> this, but can't find it (either open or closed). I know this has come
> up before, at least.
>
> I think I'd need some convincing on this, as well as a clear definition
> of what the scope for such a feature ought to be. Unlike curl, which
> "groks urls", Wget "W(eb)-gets", and file:// can't really be argued to
> be part of the web. That in and of itself isn't really a reason not to
> support it, but my real misgivings have to do with the existence of
> various excellent tools that already do local-file transfers, and
> likely do it _much_ better than Wget could hope to. Rsync springs
> readily to mind. Even the system "cp" command is likely to handle
> things much better than Wget.
>
> In particular, special OS-specific, extended file attributes, extended
> permissions and the like, are among the things that existing system
> tools probably handle quite well, and that Wget is unlikely to. I don't
> really want Wget to be in the business of duplicating the system "cp"
> command, but I might conceivably not mind "file://" support if it means
> simple _content_ transfer, and not actual file duplication.
>
> Also in need of addressing is what "recursion" should mean for file://.
> Between ftp:// and http://, "recursion" currently means different
> things. In FTP, it means "traverse the file hierarchy recursively",
> whereas in HTTP it means "traverse links recursively". I'm guessing
> file:// should work like FTP (i.e., recurse when the path is a
> directory, ignore HTML-ness), but anyway this is something that'd need
> answering.
>
> --
> Micah J. Cowan
> Programmer, musician, typesetting enthusiast, gamer.
> GNU Maintainer: wget, screen, teseq
> http://micah.cowan.name/
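A sketch of the CD use-case under the behaviour David asks for (the mount point and remote site are made up, and current wget does not accept file:// at all):

    # Retrieve the CD's HTML tree plus the remote figures it links to,
    # and rewrite all links to point at the downloaded copies:
    wget -r -k -p -nH file:///media/cdrom/index.html

    # Remote images referenced from the CD pages, e.g.
    #   http://conference.example.invalid/figures/fig1.png
    # would be fetched alongside, and the saved pages adjusted to refer
    # to the local copies rather than the web server.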
Re: Support for file://
Michelle Konzack wrote:
> Imagine you have a local mirror of your website and you want to know
> why the site @HOSTINGPROVIDER has some extra files or such.
>
> You can spider the website @HOSTINGPROVIDER recursively into a local
> "tmp1" directory and then, with the same command line, you can do the
> same with the local mirror and "download" the files recursively into
> "tmp2"; now you can make a recursive fs-diff and know which files are
> used... on both the local mirror and @HOSTINGPROVIDER.

I'm confused. If you can successfully download the files from HOSTINGPROVIDER in the first place, then why would a difference exist? And if you can't, then this wouldn't be an effective way to find out.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
Re: Support for file://
Hello Micah,

On 2008-09-02 15:49:15, Micah Cowan wrote:
> I think I'd need some convincing on this, as well as a clear definition
> of what the scope for such a feature ought to be. Unlike curl, which
> "groks urls", Wget "W(eb)-gets", and file:// can't really be argued to
> be part of the web.

Right but...

> That in and of itself isn't really a reason not to support it, but my
> real misgivings have to do with the existence of various excellent
> tools that already do local-file transfers, and likely do it _much_
> better than Wget could hope to. Rsync springs readily to mind.
>
> Even the system "cp" command is likely to handle things much better
> than Wget. In particular, special OS-specific, extended file
> attributes, extended permissions and the like, are among the things
> that existing system tools probably handle quite well, and that Wget is
> unlikely to. I don't really want Wget to be in the business of
> duplicating the system "cp" command, but I might conceivably not mind
> "file://" support if it means simple _content_ transfer, and not actual
> file duplication.
>
> Also in need of addressing is what "recursion" should mean for file://.
> Between ftp:// and http://, "recursion" currently means different
> things. In FTP, it means "traverse the file hierarchy recursively",
> whereas in HTTP it means "traverse links recursively". I'm guessing
> file:// should work like FTP (i.e., recurse when the path is a
> directory, ignore HTML-ness), but anyway this is something that'd need
> answering.

Imagine you have a local mirror of your website and you want to know why the site @HOSTINGPROVIDER has some extra files or such.

You can spider the website @HOSTINGPROVIDER recursively into a local "tmp1" directory and then, with the same command line, you can do the same with the local mirror and "download" the files recursively into "tmp2"; now you can make a recursive fs-diff and know which files are used... on both the local mirror and @HOSTINGPROVIDER.

I was searching for such a feature several times, and currently the only way is to install a web server locally, which is not always possible.

Maybe this is worth a discussion?

Greetings
Michelle
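The workaround Michelle mentions - temporarily serving the local mirror over HTTP so that plain http:// recursion can be used - can be sketched roughly like this (paths and port are placeholders; any small web server would do):

    cd /path/to/local/mirror
    python -m SimpleHTTPServer 8080 &     # Python 2; with Python 3: python3 -m http.server 8080
    wget -r -nH -P /tmp/tmp_LOC http://localhost:8080/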
Re: Support for file://
Petri Koistinen wrote:
> Hi,
>
> It would be nice if wget would also support file://.

Feel free to file an issue for this (I'll mark it "Needs Discussion" and set it at low priority). I'd thought there was already an issue for this, but can't find it (either open or closed). I know this has come up before, at least.

I think I'd need some convincing on this, as well as a clear definition of what the scope for such a feature ought to be. Unlike curl, which "groks urls", Wget "W(eb)-gets", and file:// can't really be argued to be part of the web. That in and of itself isn't really a reason not to support it, but my real misgivings have to do with the existence of various excellent tools that already do local-file transfers, and likely do it _much_ better than Wget could hope to. Rsync springs readily to mind. Even the system "cp" command is likely to handle things much better than Wget.

In particular, special OS-specific, extended file attributes, extended permissions and the like, are among the things that existing system tools probably handle quite well, and that Wget is unlikely to. I don't really want Wget to be in the business of duplicating the system "cp" command, but I might conceivably not mind "file://" support if it means simple _content_ transfer, and not actual file duplication.

Also in need of addressing is what "recursion" should mean for file://. Between ftp:// and http://, "recursion" currently means different things. In FTP, it means "traverse the file hierarchy recursively", whereas in HTTP it means "traverse links recursively". I'm guessing file:// should work like FTP (i.e., recurse when the path is a directory, ignore HTML-ness), but anyway this is something that'd need answering.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
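For contrast, the two existing recursion models look like this (hosts are placeholders); the open question is which one a hypothetical file:// mode would imitate:

    wget -r ftp://ftp.example.invalid/pub/    # FTP: walks the directory tree on the server
    wget -r http://www.example.invalid/       # HTTP: parses each HTML page and follows its links

    # Under the FTP-like model, "wget -r file:///srv/www/" would list the
    # directory and descend into it; under the HTTP-like model it would
    # instead parse /srv/www/index.html (say) and follow the links found there.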
Support for file://
Hi,

It would be nice if wget would also support file://.

Petri