On Sun, Nov 29, 2009 at 6:22 AM, Asa Nathannael Hunt <[email protected]> wrote:
> does anyone know of a (set of) tool(s) I could use to download and
> compile several html text pages into one document?
>
> I'm looking for a way to generate a PDF copy of the CFR, which is only
> available as individual, txt or pdf subchapters. Rather I'd prefer to
> have one document that includes each of the title/chapter heading as
> it's laid out on the site.
>
> for example see:
> http://www.access.gpo.gov/cgi-bin/cfrassemble.cgi?title=200846
>
> Asa Nathannael Hunt
>

A combination of wget and pdftk will do this.

I think the following will download all the PDF files you are interested
in without downloading all of .gov:

wget -r -H -A.pdf -Daccess.gpo.gov \
  -I/nara/cfr/waisidx_08,/nara/cfr/waisidx_07,/cfr_2008/octqtr/pdf/ \
  'http://www.access.gpo.gov/cgi-bin/cfrassemble.cgi?title=200846'

All the PDF files end up in the directory
edocket.access.gpo.gov/cfr_2008/octqtr/pdf. cd to that directory and
concatenate all the PDF files into out.pdf with:

pdftk *pdf cat output out.pdf

I have not completely tested this, as it involves a pretty big download,
so let me know if there is a problem.

Bill

_______________________________________________
PLUG mailing list
[email protected]
http://lists.pdxlinux.org/mailman/listinfo/plug
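One caveat on the pdftk step: the shell expands `*pdf` in lexical order, so if the downloaded file names carry part numbers, part 10 would be merged before part 2. A minimal sketch of a natural-order merge, assuming GNU sort (for `sort -V`) and file names without spaces. The function names `ordered_pdfs` and `merge_pdfs` are hypothetical helpers, not part of pdftk or wget, and the actual file naming on the GPO site is an assumption here.

```shell
#!/bin/sh
# Sketch only: wraps the pdftk step from the message above.
# Assumptions (not from the original post): GNU sort with -V is available,
# and the PDF file names contain no whitespace.

# Hypothetical helper: print the PDFs in the current directory in natural
# (version) order, so that e.g. part2.pdf sorts before part10.pdf.
ordered_pdfs() {
  printf '%s\n' *.pdf | sort -V
}

# Hypothetical helper: concatenate the ordered PDFs into the file named
# by the first argument, using pdftk's "cat" operation.
merge_pdfs() {
  out="$1"
  # Word splitting of $(...) is intentional; it relies on the
  # no-whitespace assumption above.
  pdftk $(ordered_pdfs) cat output "$out"
}

# Usage, after the wget step has finished:
#   cd edocket.access.gpo.gov/cfr_2008/octqtr/pdf
#   merge_pdfs ../out.pdf
```

Writing the merged file to `../out.pdf` keeps it out of the `*.pdf` glob, so rerunning the merge does not fold the previous output back in.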
