Thanks Bill, it worked like a charm. I ended up adding a couple of extra spaces. Thanks so much for the help.

revised:
wget -r -H -A.pdf -D access.gpo.gov -I /nara/cfr/waisidx_08,/nara/cfr/waisidx_07,/cfr_2008/octqtr/pdf/ http://www.access.gpo.gov/cgi-bin/cfrassemble.cgi?title=200846

Asa

Bill Barry wrote:
> On Sun, Nov 29, 2009 at 6:22 AM, Asa Nathannael Hunt <[email protected]> wrote:
>
>> does anyone know of a (set of) tool(s) I could use to download and
>> compile several html text pages into one document?
>>
>> I'm looking for a way to generate a PDF copy of the CFR, which is only
>> available as individual, txt or pdf subchapters. Rather I'd prefer to
>> have one document that includes each of the title/chapter heading as
>> it's laid out on the site.
>>
>> for example see:
>> http://www.access.gpo.gov/cgi-bin/cfrassemble.cgi?title=200846
>>
>> Asa Nathannael Hunt
>>
>
> A combination of wget and pdftk will do this
>
> I think the following will download all the pdf files you are interested in
> without downloading all of .gov
> wget -r -H -A.pdf
> -Daccess.gpo.gov-I/nara/cfr/waisidx_08,/nara/cfr/waisidx_07,/cfr_2008/octqtr/pdf/
> http://www.access.gpo.gov/cgi-bin/cfrassemble.cgi?title=200846
>
> All the pdf files end up in a directory
> edocket.access.gpo.gov/cfr_2008/octqtr/pdf
>
> cd to that directory and concatenate all the pdf files into out.pdf with
>
> pdftk *pdf cat output out.pdf
>
> I have not completely tested this as it involves a pretty big download, so
> let me know if there is a problem.
>
> Bill

_______________________________________________
PLUG mailing list
[email protected]
http://lists.pdxlinux.org/mailman/listinfo/plug
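
For anyone who finds this thread in the archives: the two steps discussed here (download with wget, then merge with pdftk) could be wrapped in a small script along these lines. This is only a sketch and was not posted in the thread. The URL, domain, include directories, and PDF directory are copied from the messages; the variable names, the `run` helper, and the `DRY_RUN` switch are made up for illustration. Because the download is large, the sketch only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: fetch the CFR PDFs with wget, then merge them with pdftk.
# Assumption: DRY_RUN defaults to 1, so commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Values taken from the thread.
URL='http://www.access.gpo.gov/cgi-bin/cfrassemble.cgi?title=200846'
DOMAINS='access.gpo.gov'
INCLUDE_DIRS='/nara/cfr/waisidx_08,/nara/cfr/waisidx_07,/cfr_2008/octqtr/pdf/'
PDF_DIR='edocket.access.gpo.gov/cfr_2008/octqtr/pdf'

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# -r recurse, -H span hosts, -A accept only .pdf,
# -D limit to these domains, -I limit to these directories.
fetch_pdfs() {
    run wget -r -H -A.pdf -D "$DOMAINS" -I "$INCLUDE_DIRS" "$URL"
}

# Concatenate the PDFs in shell glob (alphabetical) order.
# out.pdf is written to the current directory rather than into PDF_DIR,
# so a rerun does not pick up the previous out.pdf as an input.
merge_pdfs() {
    run pdftk "$PDF_DIR"/*.pdf cat output out.pdf
}

fetch_pdfs
merge_pdfs
```

Running it as `DRY_RUN=0 sh script.sh` (the file name is arbitrary) would perform the real download and merge. Note the spaces after `-D` and `-I`, which are the fix mentioned at the top of this message.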
