Re: [PLUG] Download All PDF Files From Website
This works better, since a lot of files are linked with additional info
appended to the name; otherwise wget will download the file and then proceed
to delete it:

   wget --convert-links -r -A '*.pdf*' -erobots=off http://www

On Sat, 2015-06-20 at 09:19 -0700, Rich Shepard wrote:
> Perhaps I'm the only one who did not know how to use wget to download
> multiple .pdf files from a website rather than the site itself. If others
> have also tried and failed, this information may be useful.
>
> After reading the curl and wget man pages I tried various options to
> download ~50 .pdf files from a web site. All attempts failed. Web searches
> revealed many threads and blog posts that showed how both tools can be
> used to download an entire site, but not just .pdf, .jpg, or other file
> types without the HTML itself.
>
> Then I found a thread on linuxquestions.org where a responder pointed out
> that the site's robots.txt file prevented the downloads. The original
> poster used that information to create the solution: use this command
> followed by the full URL:
>
>    wget --convert-links -r -A '*.pdf' -erobots=off http://www
>
> It works!
>
> Rich
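For context, wget honors robots.txt by default during recursive retrieval,
and -erobots=off tells it to ignore the file. The actual robots.txt on the
site in question wasn't posted, but a minimal sketch of the kind of rule
that blocks a recursive wget looks like this:

   # Hypothetical robots.txt at the site root; a blanket rule like
   # this makes a default recursive wget skip every page.
   User-agent: *
   Disallow: /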
Re: [PLUG] Download All PDF Files From Website
On Sat, 20 Jun 2015, Charles Sliger wrote:
> This works better, since a lot of files are linked with additional info
> appended to the name; otherwise wget will download the file and then
> proceed to delete it:
>
>    wget --convert-links -r -A '*.pdf*' -erobots=off http://www

Charles,

I don't know that I've seen text following the .pdf extension before, but
this is good to know.

Thanks,

Rich

___
PLUG mailing list
PLUG@lists.pdxlinux.org
http://lists.pdxlinux.org/mailman/listinfo/plug
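For anyone else who hasn't run into it, the trailing text is usually a query
string on the link. wget saves the file under that full name, so a pattern
anchored at .pdf no longer matches and the file is removed after download.
A minimal sketch, with a made-up host and file name:

   # Hypothetical link found on a crawled page:
   #   http://www.example.com/docs/manual.pdf?rev=3
   # wget saves it locally as 'manual.pdf?rev=3', which -A '*.pdf'
   # rejects (and deletes after download) but -A '*.pdf*' accepts.
   wget --convert-links -r -A '*.pdf*' -erobots=off http://www.example.com/docs/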