Mike Bonner wrote:

> I haven't needed to do this before, but is there a (relatively) easy
> way to extract the text from a bunch of pdf files?  I'm hoping I can
> build some indexes for the boatload of files I want to go through
> (Though, I guess I could bypass LC and just grep my heart out)
>
> Any suggestions?

Long term:

Per Postel's Law, reduce the stockpile of PDFs littering humanity's infosphere by generating none except in the increasingly rare cases where no other format is a better choice.

PDF is an archaic format held over from the days when nearly all display devices had screens at least as wide as a printed page. Back in the '90s, when it was popularized, a fixed-size format emulating a printed piece of paper was not an unreasonable thing to do.

But times have changed. We rarely kill trees just to read anymore, so the bounds of a printed page are approaching meaninglessness.

This becomes critically important for delivering an enjoyable reading experience when we consider that an ever-smaller minority of our time is spent on screens large enough to accommodate that size.

Many of our screens are much smaller, and moreover they vary enough to make any single fixed size needlessly cumbersome.

Attempting to read PDFs on a phone ranges from mildly annoying to prohibitively frustrating.

That unnecessary pain is easily replaced these days with modern formats that reflow text to fit any of the many devices we might be using at any given moment.

There's a good argument for using EPub as that foundation.

But that's a long-term solution, and while I believe it's an inevitability as mobile use continues to grow, it won't solve your need in the here-and-now, so:


Short term:

The Linux universe has many good command-line solutions available for extracting text from PDFs easily and efficiently, like this one:
https://www.howtogeek.com/228531/how-to-convert-a-pdf-file-to-editable-text-using-the-command-line-in-linux/

For those Win10 Pro users who can be convinced to tick a checkbox, the entire universe of the Ubuntu shell is now available.

macOS also includes utilities for this, but I don't believe they're the same ones (at least not without installing an independent package manager like Homebrew).
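
For the indexing part, here's a rough sketch of how the extracted text could be turned into a simple word index. It assumes the pdftotext utility (from poppler-utils) is installed and on the PATH; that tool choice is my assumption, and the function names extract_text and build_index are made up for illustration, so treat this as a starting point rather than a finished solution:

```python
import re
import subprocess
import sys
from collections import defaultdict
from pathlib import Path


def extract_text(pdf_path):
    """Return the text of one PDF by shelling out to pdftotext.

    Assumption: pdftotext (poppler-utils) is on the PATH.
    Passing "-" as the output file sends the text to stdout.
    """
    result = subprocess.run(
        ["pdftotext", "-layout", str(pdf_path), "-"],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,
    )
    return result.stdout


def build_index(texts):
    """Map each lowercased word to a sorted list of the file names containing it."""
    index = defaultdict(set)
    for name, text in texts.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return {word: sorted(names) for word, names in index.items()}


if __name__ == "__main__":
    # Folder of PDFs comes from the first argument, defaulting to the current one.
    folder = Path(sys.argv[1] if len(sys.argv) > 1 else ".")
    texts = {p.name: extract_text(p) for p in sorted(folder.glob("*.pdf"))}
    for word, names in sorted(build_index(texts).items()):
        print(word, ", ".join(names), sep="\t")
```

Run it with the folder of PDFs as its only argument and it prints one tab-separated line per word, listing the files that contain it. The same pdftotext call could just as well be made from LC's shell() function, or its output saved to .txt files and handed to grep.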

--
 Richard Gaskin
 Fourth World Systems


_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode