PDF and MS Word Metadata question: page counts

Ensor, Neal Thu, 22 Jul 2010 08:24:00 -0700

Just a curiosity:  I'm currently using Tika 0.7 for some simple text 
extraction, and noticed that I can't access page counts for either PDF or 
Word documents.


I know the information is available via underlying library calls (e.g., 
PDFBox), and it appears it should be available via extended information in the 
MS Office parser, but I don't see it in the metadata of any documents I tried.  
My question is: was there some reason why page counts are omitted?  I hacked my 
local copy of PDFParser to provide the count via the 
PDDocument.getNumberOfPages() call, but was wondering if I missed something 
somewhere, or whether there is a reason not to provide such information.  For 
Word documents, since the count apparently should already be provided, I guess 
I'm out of luck there, but for my purposes I'd like at least the parsed PDF 
metadata to provide that information if possible.  
Thanks!

Neal Ensor
[email protected]
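
For reference, the kind of change described above might look roughly like the 
following.  This is only a minimal sketch, not the actual patch: the class 
name, the "pageCount" property name, and the plain Map standing in for Tika's 
Metadata object are all assumptions made for illustration.  The only call 
taken from the message itself is PDDocument.getNumberOfPages().

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

public class PageCountSketch {

    // Hypothetical property name; not a key that Tika 0.7 is known to define.
    static final String PAGE_COUNT = "pageCount";

    // Copies the page count of an already-loaded PDFBox document into a
    // metadata map. The Map is a stand-in for Tika's Metadata object.
    static void addPageCount(PDDocument document, Map<String, String> metadata) {
        metadata.put(PAGE_COUNT, Integer.toString(document.getNumberOfPages()));
    }

    public static void main(String[] args) throws Exception {
        // Build a two-page document in memory so that no input file is needed.
        PDDocument document = new PDDocument();
        try {
            document.addPage(new PDPage());
            document.addPage(new PDPage());

            Map<String, String> metadata = new LinkedHashMap<String, String>();
            addPageCount(document, metadata);
            System.out.println(PAGE_COUNT + "=" + metadata.get(PAGE_COUNT));
        } finally {
            document.close();
        }
    }
}
```

Inside the parser itself, the same value would presumably be stored with a 
call along the lines of metadata.set(name, value) on the Metadata instance 
passed to parse(), at the point where the PDDocument is already open.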
