Re: [PLUG] Download All PDF Files From Website

2015-06-20 Thread Charles Sliger
This works better because many files get served with additional text
appended to the name (a query string, for instance).  Otherwise wget
downloads the file and then deletes it, since the saved name no longer
matches the accept pattern.

wget --convert-links -r -A '*.pdf*' -e robots=off http://www
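
A quick shell-glob illustration (hypothetical file names) of why the
trailing * matters; wget's accept list uses the same sort of pattern
matching against the saved name:

  for name in report.pdf 'report.pdf?version=2'; do
      case "$name" in
          *.pdf)  echo "$name: matches *.pdf" ;;
          *.pdf*) echo "$name: matches only *.pdf*" ;;
      esac
  done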


On Sat, 2015-06-20 at 09:19 -0700, Rich Shepard wrote:
Perhaps I'm the only one who did not know how to use wget to download
 multiple .pdf files from a website rather than the site itself. If others
 have also tried and failed, this information may be useful.
 
After reading the curl and wget man pages, I tried various options to
 download ~50 .pdf files from a website. All attempts failed.
 
Web searches revealed many threads and blogs showing how both tools can be
 used to download an entire site, but not how to fetch only .pdf, .jpg, or
 other specific file types without the html itself. Then I found a thread on
 linuxquestions.org where a responder pointed out that the site's robots.txt
 file was preventing the downloads.
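
 For reference, a robots.txt that blocks automated retrieval of the whole
 site looks roughly like this (a made-up example; wget honors these rules
 by default during recursive downloads, which is why -e robots=off is
 needed):

   User-agent: *
   Disallow: /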
 
The original poster used that information to create the solution. Use this
 command followed by the full URL:
 
 wget --convert-links -r -A '*.pdf' -e robots=off http://www
 
It works!
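
 For reference, my reading of the man page on what each option does, with a
 placeholder URL standing in for the real site:

   # --convert-links  rewrite links in the saved pages to point at the local copies
   # -r               recurse through the site following links
   # -A '*.pdf'       accept only names matching the pattern; everything else
   #                  fetched during recursion is deleted afterwards
   # -e robots=off    ignore the site's robots.txt exclusions
   wget --convert-links -r -A '*.pdf' -e robots=off http://www.example.com/reports/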
 
 Rich

___
PLUG mailing list
PLUG@lists.pdxlinux.org
http://lists.pdxlinux.org/mailman/listinfo/plug


Re: [PLUG] Download All PDF Files From Website

2015-06-20 Thread Rich Shepard
On Sat, 20 Jun 2015, Charles Sliger wrote:

 This works better because many files get served with additional text
 appended to the name (a query string, for instance). Otherwise wget
 downloads the file and then deletes it, since the saved name no longer
 matches the accept pattern.

 wget --convert-links -r -A '*.pdf*' -e robots=off http://www

Charles,

   I don't know that I've seen text following the .pdf extension before, but
this is good to know.

Thanks,

Rich


___
PLUG mailing list
PLUG@lists.pdxlinux.org
http://lists.pdxlinux.org/mailman/listinfo/plug