I tried exporting the file as a spreadsheet from Acrobat Pro, but that didn’t work. Doing a Save as Text from Acrobat Reader was more successful, but the columns come out in a different order, and some columns get combined into a single string.
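
For what it's worth, here is a rough sketch of how the pdftotext route quoted below might be tried on this file. It is untested against the county PDF and rests on several assumptions: that pdftotext is installed and on the path LiveCode's shell function sees, that its -layout option (which asks it to keep the physical layout of the page) leaves the columns in reading order, and that columns end up separated by runs of two or more spaces. The handler name, the file paths, and the field name are made up for illustration.

```livecode
-- Sketch only. The handler name, file paths, and field name are hypothetical.
-- Assumes pdftotext is installed and on the path that shell() sees.
on extractPrecinctTable
   local tSrc, tDest, tText, tTable, tLine
   put "/Users/me/folder/precinctreport.pdf" into tSrc
   put "/Users/me/folder/precinctreport.txt" into tDest

   -- -layout asks pdftotext to keep the physical page layout;
   -- the quotes guard against spaces in the paths
   get shell("pdftotext -layout" && quote & tSrc & quote && quote & tDest & quote)

   put url ("file:" & tDest) into tText
   repeat for each line tLine in tText
      -- assumption: a run of two or more spaces marks a column break
      put replaceText(tLine, " {2,}", tab) & return after tTable
   end repeat

   -- tTable is now tab-delimited text, one row per line
   put tTable into field "Results"
end extractPrecinctTable
```

If columns still come out merged, that would suggest the two-space assumption does not hold for this report, and the pattern would need adjusting.
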
> On Jul 8, 2016, at 11:44 AM, Richard Gaskin <[email protected]> wrote:
>
> Jim Hurley wrote:
>
> > My County is now publishing the election results to the web as a PDF
> > file:
> >
> > https://www.mynevadacounty.com/nc/elections/docs/2016%20Elections/June%207%2c%202016%2c%20Presidential%20Primary/Election%20Results/precinctreport.pdf
> >
> > Is there a way to parse these PDF files?
>
> It's unfortunate that so many orgs release data useful to analysis in complex
> formats that inhibit such use. PDF is great when the goal is to preserve
> page layout, but a uniquely poor choice for sharing data to be used for
> analytics. Alas, that hasn't slowed its unfortunate use in such contexts.
>
> If this is to be done within an application for others to use, perhaps the
> smoothest user experience would be via the XPDF external, currently available
> only in LiveCode Business Edition at $1999/yr. While that may seem high, for
> commercial products of such scope it may be a good bargain.
>
> However, if this is only for use in tools you'll be using yourself, where an
> extra step or two is less important, there are many options.
>
> If it's just one file, perhaps the simplest is to use Save As Text from
> Adobe's PDF Viewer.
>
> If you'll need to automate this for reuse, here's a way to use Apple's
> Automator for that:
> <https://www.engadget.com/2013/02/11/mac-101-use-automater-to-extract-text-from-pdfs/>
>
> I believe there may also be a command line option available on macOS, which
> could be called from within LC using the shell function. I don't know the
> name of the command line tool for that on macOS, but in Linux I use
> pdftotext, where the syntax is pretty simple:
>
>    pdftotext <sourcePdfFile> <destTextFile>
>
> e.g.:
>
>    put "/Users/me/folder/SomeFile.pdf" into tSrc
>    put "/Users/me/folder/SomeFile.txt" into tDest
>    get shell("pdftotext "& tSrc && tDest)
>
> --
> Richard Gaskin
> Fourth World Systems
> Software Design and Development for the Desktop, Mobile, and the Web
> ____________________________________________________________________
> [email protected] http://www.FourthWorld.com
>
>
> _______________________________________________
> use-livecode mailing list
> [email protected]
> Please visit this url to subscribe, unsubscribe and manage your subscription
> preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode
