The trouble with PDF is that, contrary to expectations, it is actually much closer to an image file format than to a text document format. In other words, it doesn’t know anything about paragraphs, or headers, or footers; all it knows about are simple instructions to draw a given letter at given coordinates. (Worse than that, some PDFs are actually just embedded bitmaps.)
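To make that concrete, here is a rough Python sketch of the guesswork involved. It assumes a page has already been reduced to a list of hypothetical (character, x, y) records, which is my own simplification and not the output of any real PDF library. The function name, tolerances and glyph width are likewise made up for illustration.

```python
from typing import List, Tuple

# Hypothetical glyph record: (character, x, y). This is an assumed
# simplification of what a page's drawing instructions amount to.
Glyph = Tuple[str, float, float]


def glyphs_to_lines(
    glyphs: List[Glyph],
    line_tol: float = 2.0,    # assumed: max y difference within one line
    char_width: float = 6.0,  # assumed: nominal glyph advance
    gap: float = 3.0,         # assumed: extra space that signals a word break
) -> List[str]:
    """Naive heuristic: cluster glyphs into lines by y, then order them by x."""
    rows: List[Tuple[float, List[Glyph]]] = []
    # PDF y coordinates grow upwards, so visit the top of the page first.
    for glyph in sorted(glyphs, key=lambda g: -g[2]):
        if rows and abs(rows[-1][0] - glyph[2]) <= line_tol:
            rows[-1][1].append(glyph)       # close enough: same line
        else:
            rows.append((glyph[2], [glyph]))  # start a new line

    lines: List[str] = []
    for _, row in rows:
        row.sort(key=lambda g: g[1])  # left to right
        text = ""
        prev_x = None
        for ch, x, _ in row:
            # Guess a word break when the horizontal jump is unusually large.
            if prev_x is not None and x - prev_x > char_width + gap:
                text += " "
            text += ch
            prev_x = x
        lines.append(text)
    return lines
```

Even this toy version has to guess at line height and word spacing, and it knows nothing about columns, headers, footers or varying font sizes.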
That means that converting a PDF into a conventional document is more akin to “optical character recognition” than ordinary file format conversion. It takes machine learning or sophisticated heuristics for software to figure out the structural relationships behind the document image. There is some effective software available to do this conversion, but it tends to be expensive because it’s such a hard problem and the capability is so valuable.

Best wishes

Jeremy.

> On 1 Sep 2017, at 05:02, TonyM <[email protected]> wrote:
>
> Such a tool would be helpful.
>
> Personally I would look into tools to turn PDFs into text then import that,
> because there are many issues going from a highly formatted document type to
> plain text. Not to mention text inside images where some OCR is needed.
>
> Foxit Reader and Pro is great for PDF work but not sure it will help you.
>
> Regards
> Tony

