On Mon, 23 Jul 2018, Tomas Kuchta wrote:

Maybe you can speed things up by pdf2txt and identify the lines of interest
in awk.

Tomas,

  Almost every page is different. All have headers, data for a variable
number of hours (some with flags in the left margin, most without), and some
have summaries at the bottom. Then there are the days with missing data. And
some days have data in a specific column (but not on all data rows) while
other days are blank in that column.
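
  For illustration only, here is a rough sketch of what sorting extracted text lines into those categories might look like. Everything specific in it is an assumption, not the real layout of the source documents: the one-letter flag, the HH:MM hour stamp, the single reading, the optional extra column, and the summary keywords are all hypothetical, as is the name classify().

```python
import re

# Hypothetical row layout (an assumption, not the real documents):
#   [flag]  HH:MM  value  [extra]
# where the flag and the extra column may each be absent.
DATA_ROW = re.compile(
    r"^\s*(?P<flag>[A-Z])?\s*"
    r"(?P<hour>\d{1,2}:\d{2})\s+"
    r"(?P<value>-?\d+(?:\.\d+)?)"
    r"(?:\s+(?P<extra>\S+))?\s*$"
)

# Hypothetical words that might start a summary line at the bottom of a page.
SUMMARY_WORDS = ("total", "mean", "max", "min")


def classify(line):
    """Return ('data', fields), ('summary', text), or ('other', text)."""
    match = DATA_ROW.match(line)
    if match:
        # Missing flag or extra column comes back as None.
        return "data", match.groupdict()
    text = line.strip()
    if text.lower().startswith(SUMMARY_WORDS):
        return "summary", text
    # Headers, blank lines, and anything unrecognized land here.
    return "other", text
```

  Even under these simplified assumptions, each variation between pages (another flag style, a differently placed column, a missing day) would need its own pattern or special case.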

  And, this is a one-time process. It's to get the data from the source
documents into a format suitable for import into a database and statistical
analyses.

Thanks,

Rich
_______________________________________________
PLUG mailing list
[email protected]
http://lists.pdxlinux.org/mailman/listinfo/plug
