On Mon, 23 Jul 2018, Tomas Kuchta wrote:
Maybe you can speed things up by pdf2txt and identify the lines of interest in awk.
Tomas,

Almost every page is different. All have headers and data for a variable number of hours (some with flags in the left margin, most without), and some have summaries at the bottom. Then there are the days with missing data. And some days have data in a specific column (but not on all data rows) while other days are blank in that column.

And, this is a one-time process. It's to get the data from the source documents into a format suitable for import into a database and statistical analyses.

Thanks,

Rich
_______________________________________________
PLUG mailing list
[email protected]
http://lists.pdxlinux.org/mailman/listinfo/plug
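
For anyone following the thread, here is a rough sketch of the "identify the lines of interest" idea applied to pages like the ones described above, written in Python rather than awk. The layout is entirely assumed, not taken from the actual documents: an optional one-character flag in the left margin, then an hour from 0 to 23, then one or more numeric readings. The names `DATA_ROW` and `parse_page` are made up for illustration. Anything that does not fit that shape (headers, summaries, blank lines) is skipped.

```python
import re

# Assumed (hypothetical) data-row layout: optional single-character flag,
# then an hour, then whitespace-separated numeric readings.
DATA_ROW = re.compile(
    r"^\s*(?P<flag>[A-Za-z*])?\s*(?P<hour>\d{1,2})\s+(?P<rest>.*)$"
)


def is_number(token):
    """Return True if the token parses as a float."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def parse_page(text):
    """Pick out the data rows from one page of extracted text.

    Lines that do not look like data rows are ignored, so headers,
    bottom-of-page summaries, and blank lines drop out on their own.
    """
    rows = []
    for line in text.splitlines():
        match = DATA_ROW.match(line)
        if not match:
            continue
        hour = int(match.group("hour"))
        fields = match.group("rest").split()
        # Reject lines such as dates that merely start with a small number.
        if hour > 23 or not fields or not all(is_number(f) for f in fields):
            continue
        rows.append({
            "flag": match.group("flag") or "",
            "hour": hour,
            "values": [float(f) for f in fields],
        })
    return rows
```

Splitting on whitespace loses track of which column a reading came from, so for the days where one column is blank on some rows this would need fixed-width slicing of each line instead. That only works if the text extraction preserves the page layout.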
