On Sun, Nov 29, 2009 at 6:22 AM, Asa Nathannael Hunt <[email protected]> wrote:

> does anyone know of a (set of) tool(s) I could use to download and
> compile several html text pages into one document?
>
> I'm looking for a way to generate a PDF copy of the CFR, which is only
> available as individual, txt or pdf subchapters. Rather I'd prefer to
> have one document that includes each of the title/chapter heading as
> it's laid out on the site.
>
> for example see:
> http://www.access.gpo.gov/cgi-bin/cfrassemble.cgi?title=200846
>
> Asa Nathannael Hunt
>

A combination of wget and pdftk will do this.

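I think the following will download all the PDF files you are interested in
without downloading all of .gov. Here -r recurses through the links, -H lets
wget follow them onto other hosts, -D limits those hosts to access.gpo.gov,
-I limits the crawl to the listed directories, and -A.pdf keeps only PDF files:

wget -r -H -A.pdf -Daccess.gpo.gov \
  -I/nara/cfr/waisidx_08,/nara/cfr/waisidx_07,/cfr_2008/octqtr/pdf/ \
  'http://www.access.gpo.gov/cgi-bin/cfrassemble.cgi?title=200846'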

All the PDF files end up in the directory
edocket.access.gpo.gov/cfr_2008/octqtr/pdf

cd to that directory and concatenate all the PDF files into out.pdf with:

pdftk *.pdf cat output out.pdf
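
One thing to watch with the shell glob is ordering: it expands alphabetically,
so if the file names carry numbers that are not zero-padded (an assumption, I
have not checked the actual names), part 10 would land before part 2 in
out.pdf. A rough sketch of a workaround, assuming GNU find and sort (for
-maxdepth and -V) and file names without newlines; sorted_pdfs and merge_pdfs
are made-up helper names, not part of pdftk:

```shell
#!/bin/sh
# Sketch only: sorted_pdfs and merge_pdfs are hypothetical helper names.

# Print the PDFs directly inside DIR (default: current directory), one per
# line, in version order so that a name containing "2" precedes one with "10".
# An existing out.pdf is skipped so a rerun does not merge the old result.
sorted_pdfs() {
    find "${1:-.}" -maxdepth 1 -type f -name '*.pdf' ! -name 'out.pdf' | sort -V
}

# Concatenate those PDFs into DIR/out.pdf with pdftk.
# Assumes no file name contains a newline.
merge_pdfs() {
    dir=${1:-.}
    list=$(sorted_pdfs "$dir")
    if [ -z "$list" ]; then
        echo "no PDF files found in $dir" >&2
        return 1
    fi
    old_ifs=$IFS
    IFS='
'
    set -f            # no globbing while the list is split on newlines
    set -- $list      # one positional parameter per file name
    set +f
    IFS=$old_ifs
    pdftk "$@" cat output "$dir/out.pdf"
}
```

After sourcing that, something like
merge_pdfs edocket.access.gpo.gov/cfr_2008/octqtr/pdf
should write out.pdf into the download directory. Run sorted_pdfs on the
directory first to check that the order matches the layout on the site.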

I have not completely tested this as it involves a pretty big download, so
let me know if there is a problem.

Bill
_______________________________________________
PLUG mailing list
[email protected]
http://lists.pdxlinux.org/mailman/listinfo/plug
