I have a PDF file that I download from a web site. 
Unfortunately, copying the data directly out of it produces 
long columns rather than rows going across.

pdftohtml does produce output in a much more user-friendly 
format that I can easily parse, except for one issue: if a 
cell is completely empty, nothing is produced for it at all. 
Normally there are 19 fields per row, but a row with a blank 
cell has only 18. That is not simple to tell, since no row 
marker is included either.

I can detect a short row, since the last field of a row is 
longer than the first field, which is a 3-digit line number. 
So if the 19th field read is a 3-digit number, it is really 
the start of the next row. But there is no way to determine 
which of the fields is the empty one. Sometimes it is field 
14, sometimes field 17. Once that is determined, it is just a 
matter of inserting a cell in the correct position.
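
The detection and insertion steps above can be sketched 
roughly as follows. This is only an illustration, assuming 
the cells arrive as one flat list of strings, that no other 
cell in the last column is exactly three digits, and that at 
most one cell is missing per row. The names split_rows and 
insert_blank are made up for this sketch, and the position to 
insert at still has to come from somewhere else.

```python
import re

FIELDS_PER_ROW = 19                    # normal row width
LINE_NUMBER = re.compile(r"^\d{3}$")   # first field: 3-digit line number


def split_rows(fields, width=FIELDS_PER_ROW):
    """Cut a flat list of cells into rows, spotting rows that are one cell short."""
    rows = []
    i = 0
    while i < len(fields):
        row = fields[i:i + width]
        # If the last slot looks like a line number, it belongs to the
        # next row, so this row is missing one cell.
        if len(row) == width and LINE_NUMBER.match(row[-1]):
            row = row[:-1]
        rows.append(row)
        i += len(row)
    return rows


def insert_blank(row, position, width=FIELDS_PER_ROW):
    """Return a copy of row with an empty cell at 0-based position, if it is short."""
    if len(row) >= width:
        return list(row)
    return row[:position] + [""] + row[position:]
```

For example, insert_blank(row, 13) would put the empty cell 
in field 14, counting fields from 1.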

I don't know whether pdftohtml has some option or switch 
that would emit something for blank cells.
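
One possible workaround, sketched here under assumptions and 
not tested on this file: the -xml mode of pdftohtml writes 
each text fragment as a <text> element with top and left 
attributes, so a blank cell might show up as a gap in the 
left positions. The sketch below assumes every cell in a 
column starts at about the same left value, that cells of one 
row share the same top value, and that the column positions 
are known in advance. The function name rows_from_xml, the 
column_lefts list, and the tolerance are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def rows_from_xml(xml_text, column_lefts, tolerance=5):
    """Rebuild table rows from pdftohtml -xml text, leaving "" for missing cells."""
    root = ET.fromstring(xml_text)
    by_top = defaultdict(dict)          # top position -> {column index: text}
    for node in root.iter("text"):
        top = int(node.get("top"))
        left = int(node.get("left"))
        # Assign the fragment to the first column whose left edge is close enough.
        for col, col_left in enumerate(column_lefts):
            if abs(left - col_left) <= tolerance:
                by_top[top][col] = "".join(node.itertext()).strip()
                break
    # Emit rows top to bottom, filling absent columns with empty strings.
    return [
        [cells.get(col, "") for col in range(len(column_lefts))]
        for top, cells in sorted(by_top.items())
    ]
```

Right-aligned columns or rows whose cells sit at slightly 
different top values would need a looser matching rule than 
this.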

Thanks.

+------------------------------------------------------------+
 Michael D. Setzer II - Computer Science Instructor (Retired)
 mailto:[email protected]                            
 mailto:[email protected]
 Guam - Where America's Day Begins                        
 G4L Disk Imaging Project maintainer 
 http://sourceforge.net/projects/g4l/
+------------------------------------------------------------+



_______________________________________________
poppler mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/poppler
