Re: [iText-questions] Slightly OT: Does anyone have a way to determine the language or even codepage of a PDF?

2006-07-20 Thread Leonard Rosenthol
At 10:40 PM 7/19/2006, Aaron J Weber wrote:
> I'm basically trying to filter PDFs to see whether they're in a non-Latin-based language (Japanese, Korean, Chinese, to name a few). Thanks for any hints/tips/suggestions.

If I were trying to tackle this problem, I would simply find all fonts in the [...]
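
To make the font-inventory idea concrete, here is a rough sketch in Java, assuming you only want a hint and are willing to scan the raw file bytes instead of parsing the PDF properly. `CjkFontSniffer` and `guessLanguages` are hypothetical names, not iText API. The sketch looks for two real PDF markers: the `/Ordering` value in a CID font's `/CIDSystemInfo` (Adobe's `Japan1`, `GB1`, `CNS1` and `Korea1` collections) and a few predefined CJK CMap names used as a Type0 font's `/Encoding`. Caveats: font dictionaries stored in compressed object streams (PDF 1.5 and later) are invisible to a raw scan, and embedded fonts using `/Identity-H` often report an `Identity` ordering, so an empty result proves nothing. Walking the real font dictionaries with a PDF library such as iText is the sturdier route.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: a raw-byte heuristic, not a PDF parser.
public class CjkFontSniffer {

    // Adobe CIDSystemInfo /Ordering values for the CJK character collections.
    private static final Map<String, String> ORDERINGS = Map.of(
            "Japan1", "Japanese",
            "GB1", "Chinese (Simplified)",
            "CNS1", "Chinese (Traditional)",
            "Korea1", "Korean");

    // Prefixes of some predefined CJK CMap names that appear as a Type0 font's /Encoding.
    private static final Map<String, String> CMAP_PREFIXES = Map.of(
            "UniJIS", "Japanese",
            "90ms", "Japanese",
            "UniGB", "Chinese (Simplified)",
            "GBK", "Chinese (Simplified)",
            "UniCNS", "Chinese (Traditional)",
            "ETen", "Chinese (Traditional)",
            "UniKS", "Korean",
            "KSCms", "Korean");

    private static final Pattern ORDERING = Pattern.compile("/Ordering\\s*\\(([A-Za-z0-9]+)\\)");
    private static final Pattern ENCODING = Pattern.compile("/Encoding\\s*/([A-Za-z0-9-]+)");

    /** Returns the CJK languages hinted at by font dictionaries visible in the PDF text. */
    public static Set<String> guessLanguages(String pdfText) {
        Set<String> found = new TreeSet<>();
        Matcher m = ORDERING.matcher(pdfText);
        while (m.find()) {
            String lang = ORDERINGS.get(m.group(1));
            if (lang != null) {
                found.add(lang);
            }
        }
        m = ENCODING.matcher(pdfText);
        while (m.find()) {
            String cmap = m.group(1);
            for (Map.Entry<String, String> e : CMAP_PREFIXES.entrySet()) {
                if (cmap.startsWith(e.getKey())) {
                    found.add(e.getValue());
                }
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // ISO-8859-1 maps each byte to one char, so binary stream data doesn't break decoding.
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.ISO_8859_1);
        Set<String> langs = guessLanguages(text);
        System.out.println(langs.isEmpty() ? "no CJK font hints found" : langs);
    }
}
```

A Latin-only file typically shows encodings such as `/WinAnsiEncoding`, which match none of the prefixes, so the set stays empty.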

2006-07-20 Thread Aaron J Weber
Thanks for the suggestion. I had thought about that. But what if the document is a PDF-Image (doesn't have a significant text "layer")? Then I'm just going to have a lot of binary stream data in there with very little (if any) mention of fonts, right? Puzzling stuff... :(

Thanks again,
AJ

2006-07-20 Thread Leonard Rosenthol
At 09:51 AM 7/20/2006, Aaron J Weber wrote:
> Thanks for the suggestion. I had thought about that. But what if the document is a PDF-Image (doesn't have a significant text layer)?

Then you won't find any fonts in the document. (Keep in mind that a PDF isn't a single thing - each object on each [...]

2006-07-20 Thread Aaron J Weber
Excuse me again... I appreciate your correspondence on the matter, but I don't understand your last comment. My point is that if the file is solely a PDF/Image (as I have found examples of), then there are no fonts listed in the PDF at all (as you correctly stated). -- Your initial reply [...]

2006-07-20 Thread Leonard Rosenthol
At 02:18 PM 7/20/2006, Aaron J Weber wrote:
> My point is that if the file is solely a PDF/Image (as I have found examples of), then there are no fonts listed in the PDF at all (as you correctly stated).

If a PDF consists of a collection of pages, where each page contains a single image - then [...]
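
A rough sketch of how one might flag the image-only case, again assuming a raw byte scan rather than a real PDF parser (`ImageOnlyCheck` is a hypothetical name). If the file contains image XObjects (`/Subtype /Image`) but no font dictionaries or inline `/Font` resources, it is probably image-only, and the language can then only be guessed by running OCR on the page images. Image streams can't live inside compressed object streams, but font dictionaries can, so when the file contains an object stream and no visible fonts the sketch answers `UNKNOWN` instead of guessing. It also looks at the whole file, so a PDF mixing text pages and scanned pages reports `HAS_FONTS`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Pattern;

// Hypothetical helper: a whole-file heuristic, not a PDF parser.
public class ImageOnlyCheck {

    public enum Kind { HAS_FONTS, IMAGE_ONLY, UNKNOWN }

    // A font dictionary (/Type /Font, but not /FontDescriptor) or an inline /Font resource dictionary.
    private static final Pattern FONT = Pattern.compile("/Type\\s*/Font\\b|/Font\\s*<<");
    // The stream dictionary of an image XObject.
    private static final Pattern IMAGE = Pattern.compile("/Subtype\\s*/Image\\b");
    // A compressed object stream (PDF 1.5+); font dictionaries may be hidden inside one.
    private static final Pattern OBJSTM = Pattern.compile("/Type\\s*/ObjStm\\b");

    public static Kind classify(String pdfText) {
        if (FONT.matcher(pdfText).find()) {
            return Kind.HAS_FONTS;
        }
        if (OBJSTM.matcher(pdfText).find()) {
            // Fonts could be compressed out of sight, so don't guess.
            return Kind.UNKNOWN;
        }
        return IMAGE.matcher(pdfText).find() ? Kind.IMAGE_ONLY : Kind.UNKNOWN;
    }

    public static void main(String[] args) throws IOException {
        // ISO-8859-1 maps each byte to one char, so binary stream data doesn't break decoding.
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.ISO_8859_1);
        System.out.println(classify(text));
    }
}
```

Treat `IMAGE_ONLY` as "send it to OCR before deciding on a language", not as a verdict on its own.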

[iText-questions] Slightly OT: Does anyone have a way to determine the language or even codepage of a PDF?

2006-07-19 Thread Aaron J Weber
I'm basically trying to filter PDFs to see whether they're in a non-Latin-based language (Japanese, Korean, Chinese, to name a few). Thanks for any hints/tips/suggestions. Sorry for the off-topic inquiry... there are a lot of "PDF Experts" on this list! :)

-AJ