I have a couple of spreadsheets that my college puts on its web site as PDF files, created from Excel spreadsheets. Using poppler's pdftohtml or pdftotext I'm able to get the data out of the files. The only issue is that a few records have blank cells, and this throws the columns off. The original spreadsheet has columns A through S, but on rows with a blank cell the data gets shifted. My program is able to correct this: a later column only ever holds one of 4 different values, so it checks that column, and if a row has some other value there, it shifts the values over. I don't know whether the issue is in how Excel creates the PDF file, or whether nothing is output simply because the cell is empty.
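In case it helps anyone with the same problem, here is a rough sketch of that check-and-shift idea in Python. It is not the actual program. The function name `realign_row`, the column positions, and the four sample values are all made up for illustration, and it assumes the dropped blank cells always belong at one known column.

```python
def realign_row(cells, sentinel_col, allowed, blank_col, width):
    """Return a copy of `cells` with empty cells re-inserted so that the
    value found at `sentinel_col` is one of the `allowed` values.

    Assumptions (hypothetical, adjust to the real sheet):
      - `sentinel_col` is the later column that holds only a few known values
      - any dropped blank cells belong at `blank_col`, left of the sentinel
      - `width` is the full column count (19 for columns A through S)
    """
    row = list(cells)

    def sentinel_ok():
        return len(row) > sentinel_col and row[sentinel_col] in allowed

    # Insert empty cells at the blank-prone column until the sentinel lines up
    # or the row has reached its full width.
    while len(row) < width and not sentinel_ok():
        row.insert(blank_col, "")

    if not sentinel_ok():
        raise ValueError(f"could not realign row: {cells!r}")

    # Pad any blanks that were dropped after the sentinel column.
    row.extend([""] * (width - len(row)))
    return row


if __name__ == "__main__":
    # Toy 6-column sheet; the real one would use width=19.
    allowed = {"FR", "SO", "JR", "SR"}
    shifted = ["2", "Bob", "102", "SO", "y"]  # cell at index 2 was blank
    print(realign_row(shifted, sentinel_col=4, allowed=allowed,
                      blank_col=2, width=6))
    # prints ['2', 'Bob', '', '102', 'SO', 'y']
```

Rows that cannot be fixed this way raise an error instead of being guessed at, so they can be checked by hand.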
I was using pdftohtml, since it tended to put each cell out on a separate line, but recently it has been randomly combining some cells onto one line. Like I said, I have a program that automatically cleans it all up, so it's not a real issue, but I thought I'd ask. I have used at least one site for a paid program that has a demo process; it does export the data, and it catches empty cells somehow. Thanks for all the work. Otherwise it is great. Have a nice day.
