I have a PDF file that I download from a web site. Unfortunately, copying the data directly results in long columns rather than rows going across.
pdftohtml does produce output in a much more user-friendly format that I can easily parse, except for one issue: if a cell is completely empty, nothing is produced for it at all. Generally there are 19 fields per row, but if a cell is blank the row has only 18, and that is not simple to tell, since no row marker is included either. I can detect a short row, since the last field of a row is longer than the starting field, which is a 3-digit line number. But there is no way to determine which of the fields is the empty one. Sometimes it is field 14, but sometimes field 17. Once that is determined, it is just a matter of inserting a cell in the correct position.

Is there some option or switch that would make pdftohtml include something for blank cells?

Thanks.

+------------------------------------------------------------+
 Michael D. Setzer II - Computer Science Instructor (Retired)
 mailto:[email protected]
 mailto:[email protected]
 Guam - Where America's Day Begins
 G4L Disk Imaging Project maintainer
 http://sourceforge.net/projects/g4l/
+------------------------------------------------------------+
_______________________________________________
poppler mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/poppler
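
For illustration, the short-row check described in the message could be sketched in Python roughly as follows. This is only a sketch under assumptions that are not in the original post: the parsed fields are already available as one flat list of strings, at most one cell is blank per row, and the last field of a row never looks like a 3-digit line number. The names `split_rows` and `insert_blank` are hypothetical helpers.

```python
import re

FIELDS_PER_ROW = 19                 # normal row width, as stated in the post
LINE_NO = re.compile(r"^\d{3}$")    # a row starts with a 3-digit line number


def split_rows(fields):
    """Split a flat list of field strings into rows.

    Assumption: at most one cell is missing per row, and the real last
    field of a row is never a bare 3-digit number.
    """
    rows = []
    i = 0
    while i < len(fields):
        row = fields[i:i + FIELDS_PER_ROW]
        # If the 19th item looks like a line number, it belongs to the
        # next row, so the current row is one field short.
        if len(row) == FIELDS_PER_ROW and LINE_NO.match(row[-1]):
            row = row[:-1]
        rows.append(row)
        i += len(row)
    return rows


def insert_blank(row, field_number):
    """Return a copy of row with an empty cell at the 1-based field_number."""
    fixed = list(row)
    fixed.insert(field_number - 1, "")
    return fixed


if __name__ == "__main__":
    # Made-up sample data: the first row is missing one cell.
    short = ["001"] + ["x"] * 16 + ["a longer last field"]
    full = ["002"] + ["y"] * 17 + ["another last field"]
    for r in split_rows(short + full):
        print(len(r), r[0])
```

This only finds the short rows. Which field is the blank one (14 or 17 in the cases mentioned) cannot be recovered from the flat field list alone, so the position passed to `insert_blank` would have to come from some other source.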
