At 10:40 PM 7/19/2006, Aaron J Weber wrote:
I basically am trying to filter PDFs
to see if they're in a non-Latin-based language (Japanese, Korean, Chinese
to name a few).
Thanks for any
hints/tips/suggestions.
If I were
trying to tackle this problem, I would simply find all fonts in the
Thanks for the suggestion. I had thought
about that. But what if the document is PDF-Image (doesn't have a
significant text "layer")? Then I'm just going to have a lot of binary
stream data in there with very little (if any) notation of fonts,
right?
Puzzling stuff... :(
Thanks again,
AJ
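The font-scanning suggestion quoted above, together with AJ's image-only concern, might be sketched roughly as follows. This is a crude heuristic, not a PDF parser, and the function names (`scan_pdf`, `_chunks`) are hypothetical. It assumes CJK text is set in CID-keyed fonts whose /CIDSystemInfo /Ordering names one of Adobe's CJK character collections (Japan1, GB1, CNS1, Korea1), and it only inflates FlateDecode streams. Fonts with an Identity ordering, encrypted files, and other stream filters will slip past it. A file with image XObjects but no font dictionaries is flagged as probably image-only, which would need OCR before any language check is possible.

```python
import re
import zlib

# Adobe CJK character collections, as named in a CIDFont's
# /CIDSystemInfo /Ordering string (assumed to be present for CJK text).
CJK_ORDERINGS = {
    b"Japan1": "Japanese",
    b"GB1": "Simplified Chinese",
    b"CNS1": "Traditional Chinese",
    b"Korea1": "Korean",
}

FONT_RE = re.compile(rb"/Type\s*/Font\b")  # \b skips /FontDescriptor
IMAGE_RE = re.compile(rb"/Subtype\s*/Image\b")
ORDERING_RE = re.compile(rb"/Ordering\s*\((Japan1|GB1|CNS1|Korea1)\)")
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.S)


def _chunks(data: bytes):
    """Yield the raw file plus every stream that inflates with zlib.

    Inflating matters because PDF 1.5+ files may keep font
    dictionaries inside compressed object streams.
    """
    yield data
    for m in STREAM_RE.finditer(data):
        try:
            # decompressobj keeps the trailing EOL before 'endstream'
            # in unused_data instead of raising on it.
            yield zlib.decompressobj().decompress(m.group(1))
        except zlib.error:
            pass  # not Flate-encoded (e.g. DCTDecode JPEG data): skip it


def scan_pdf(data: bytes) -> dict:
    """Heuristically count fonts and images and guess CJK scripts."""
    fonts = images = 0
    languages = set()
    for chunk in _chunks(data):
        fonts += len(FONT_RE.findall(chunk))
        images += len(IMAGE_RE.findall(chunk))
        for m in ORDERING_RE.finditer(chunk):
            languages.add(CJK_ORDERINGS[m.group(1)])
    return {
        "fonts": fonts,
        "images": images,
        "cjk_languages": sorted(languages),
        # No fonts but some images: likely a scanned, image-only PDF.
        "image_only": fonts == 0 and images > 0,
    }
```

Usage would be something like `scan_pdf(open(path, "rb").read())`. Regular expressions over raw bytes are used here only to keep the sketch dependency-free; a real PDF parsing library that walks each page's /Resources /Font dictionary would be more reliable.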
At 09:51 AM 7/20/2006, Aaron J Weber wrote:
Thanks for the suggestion. I
had thought about that. But what if the document is PDF-Image
(doesn't have a significant text layer)?
Then you
won't find any fonts in the document. (Keep in mind that a
PDF isn't a single thing - each object on each
Excuse me again...
I appreciate your correspondence on the matter, but
I don't understand your last comment.
My point is that if the file is solely a
PDF/Image (as I have found examples of), then there are no fonts listed in the
PDF at all (as you correctly stated).
-- Your initial reply
At 02:18 PM 7/20/2006, Aaron J Weber wrote:
My point is that if the file is
solely a PDF/Image (as I have found examples of), then there are no fonts
listed in the PDF at all (as you correctly stated).
If a PDF
consists of a collection of pages, where each page contains a single
image - then
I basically am trying to filter PDFs to see if
they're in a non-Latin-based language (Japanese, Korean, Chinese to name a
few).
Thanks for any hints/tips/suggestions.
Sorry for the off-topic inquiry... there are a lot
of "PDF Experts" on this list! :)
-AJ