Thanks Bill,
It worked like a charm. I ended up adding a couple of extra spaces
(after -D and before -I). Thanks so much for the help.

revised:
wget -r -H -A.pdf -D access.gpo.gov \
  -I /nara/cfr/waisidx_08,/nara/cfr/waisidx_07,/cfr_2008/octqtr/pdf/ \
  http://www.access.gpo.gov/cgi-bin/cfrassemble.cgi?title=200846
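
For anyone repeating this, the download and merge steps from this thread
could be wrapped in a small script. This is only a sketch, assuming wget
and pdftk are installed. The names DRY_RUN, run, fetch and merge, and the
output file cfr.pdf, are made up for illustration and are not part of
either tool. By default it only prints the commands, since the real
download is large.

```shell
#!/bin/sh
# Sketch only: DRY_RUN, run, fetch, merge and cfr.pdf are illustrative names.

URL='http://www.access.gpo.gov/cgi-bin/cfrassemble.cgi?title=200846'
DIRS='/nara/cfr/waisidx_08,/nara/cfr/waisidx_07,/cfr_2008/octqtr/pdf/'
# Directory where the PDFs ended up, as reported in this thread.
PDF_DIR='edocket.access.gpo.gov/cfr_2008/octqtr/pdf'

# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Note the spaces after -D and before -I. The URL is kept in a quoted
# variable so the shell never treats the '?' as a glob character.
fetch() {
    run wget -r -H -A.pdf -D access.gpo.gov -I "$DIRS" "$URL"
}

# Write the merged file outside PDF_DIR so that a second run does not
# pick up its own output through the *.pdf glob.
merge() {
    run pdftk "$PDF_DIR"/*.pdf cat output cfr.pdf
}

fetch
merge
```

Running it with DRY_RUN=0 performs the actual download and merge. The
shell expands *.pdf in sorted filename order, so it is worth checking
that this matches the order you want in the final document.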


Asa

Bill Barry wrote:
> On Sun, Nov 29, 2009 at 6:22 AM, Asa Nathannael Hunt <[email protected]> wrote:
> 
>> Does anyone know of a tool (or set of tools) I could use to download
>> and compile several html text pages into one document?
>>
>> I'm looking for a way to generate a PDF copy of the CFR, which is only
>> available as individual txt or pdf subchapters. I'd prefer to have one
>> document that includes each of the title/chapter headings as it's laid
>> out on the site.
>>
>> for example see:
>> http://www.access.gpo.gov/cgi-bin/cfrassemble.cgi?title=200846
>>
>> Asa Nathannael Hunt
>>
> 
> A combination of wget and pdftk will do this
> 
> I think the following will download all the pdf files you are interested in
> without downloading all of .gov
> wget -r -H -A.pdf
> -Daccess.gpo.gov-I/nara/cfr/waisidx_08,/nara/cfr/waisidx_07,/cfr_2008/octqtr/pdf/
> http://www.access.gpo.gov/cgi-bin/cfrassemble.cgi?title=200846
> 
> All the pdf files end up in a directory
> edocket.access.gpo.gov/cfr_2008/octqtr/pdf
> 
> cd to that directory and concatenate all the pdf files into out.pdf with
> 
> pdftk *pdf cat output out.pdf
> 
> I have not completely tested this as it involves a pretty big download, so
> let me know if there is a problem.
> 
> Bill
> _______________________________________________
> PLUG mailing list
> [email protected]
> http://lists.pdxlinux.org/mailman/listinfo/plug
