Daniel,
Is issues.apache.org the place where I need to create an issue? I am new
to this, so I am not familiar with it.
For now, as I understand it, I am sending this to '[email protected]'.
I am also attaching the example file.
Problem Description
My application is a document management application which manages PDF
documents.
When a PDF document is created as version '1.4 (5.x) or later' by
Adobe Professional 9.0, revision 1 is uploaded and the PDF is generated
successfully.
But when I try to create a new revision, say 2, then at the time of
creation of the 'PDF difference file', line 11 below:
1 // Text Stripper initialisation
2 PDFTextStripper stripper = new PDFTextStripper();
3 pdfStream = new ByteArrayInputStream(pdf_buf);
4
5 // Open and load PDF document content.
6 document = PDDocument.load(pdfStream);
7 // Get the document content in String format.
8 // And suppress all non ascii characters
9 //try
10 //{
11 PDF_text = stripper.getText(document);
returns null in 'PDF_text' instead of the text content of the PDF file.
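To double-check which PDF version a given file declares before it is
handed to PDFBox, the header can be read straight from the byte buffer.
This is only a minimal sketch, assuming the file starts with the standard
'%PDF-x.y' header at offset 0; the class and method names
(PdfHeaderVersion, read) are hypothetical and not part of PDFBox or of
the application described above.

```java
import java.nio.charset.StandardCharsets;

class PdfHeaderVersion {
    /**
     * Returns the version string from a "%PDF-x.y" header, or null if
     * the buffer does not start with such a header.
     * Assumption: the header sits at offset 0 of the buffer.
     */
    static String read(byte[] pdfBuf) {
        if (pdfBuf == null || pdfBuf.length < 8) {
            return null;
        }
        // The header is plain ASCII, so decoding the first 8 bytes is safe.
        String head = new String(pdfBuf, 0, 8, StandardCharsets.US_ASCII);
        if (!head.startsWith("%PDF-")) {
            return null;
        }
        return head.substring(5); // the "x.y" part, e.g. "1.4"
    }

    public static void main(String[] args) {
        byte[] sample = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(read(sample));
    }
}
```

Note that a document catalog may also carry a /Version entry that
overrides the header, so the header value is a first hint rather than the
final word on the version.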
Thanks & Regards,
Pooja Gupta
Tata Consultancy Services
Mailto: [email protected]
Website: http://www.tcs.com
____________________________________________
Experience certainty. IT Services
Business Solutions
Outsourcing
____________________________________________
From: Daniel Wilson <[email protected]>
To: [email protected]
Date: 05/27/2010 07:57 PM
Subject: Re: Extract Text from PDF
Pooja,
Would you create an issue at issues.apache.org for this and attach an
example file?
Thanks.
Daniel
On Wed, May 26, 2010 at 12:03 AM, Pooja4 G <[email protected]> wrote:
> I tried to use PDFBox 1.1.0, but with it PDF generation failed while
> we were checking the documents for encryption.
> Does anyone know of another API, other than PDFBox, that we can use
> for creating the PDF diff file in a DMS?
> We upload documents from Adobe Professional 9.0, and when we create
> a new revision of a document, it fails at creation of the PDF diff
> file. getText() of the PDFTextStripper class returns null, as below.
>
> String PDF_text = new String();
> PDFTextStripper stripper = new PDFTextStripper();
>
> PDF_text = stripper.getText(document);
>
> So please help me in solving this.
>
> Thanks & Regards,
> Pooja Gupta
>
>
>
> From: Andreas Lehmkuehler <[email protected]>
> To: [email protected]
> Date: 05/20/2010 09:21 PM
> Subject: Re: Extract Text from PDF
>
>
>
> Hi,
>
> Thomas Fischer wrote:
> > Hello Pooja,
> >
> > I don't have any Adobe 9.0 documents, but I know that in my tests the
> > newer versions of PDFBox perform significantly better than version
> > 0.7.3. I would suggest you try the fairly recent version 1.1.0; it
> > works very well, at least on my Adobe Acrobat 8.1 documents.
> Which can be found at [1]
>
>
> BR
> Andreas Lehmkühler
>
> [1] http://pdfbox.apache.org/download.html
> >
> > Best regards
> > Thomas Fischer
> >
> >
> > On 20.05.2010 at 14:07, Pooja4 G wrote:
> >
> >> Which versions of PDF documents are supported by PDFBox 0.7.3? We
> >> upload a document created with Adobe Professional 9.0, and while
> >> creating the difference files to compare, we extract the text data
> >> from the PDF document using the PDFTextStripper class method
> >> getText().
> >>
> >> String PDF_text = new String();
> >> PDFTextStripper stripper = new PDFTextStripper();
> >>
> >> PDF_text = stripper.getText(document);
> >>
> >> But it returns null if the document passed as the argument was
> >> created with Adobe Professional 9.0; otherwise it runs successfully.
> >> Please help, or at least let us know whether any upcoming version
> >> of PDFBox will support this.
> >>
> >> Thanks & Regards,
> >> Pooja Gupta
> >> =====-----=====-----=====
> >> Notice: The information contained in this e-mail
> >> message and/or attachments to it may contain
> >> confidential or privileged information. If you are
> >> not the intended recipient, any dissemination, use,
> >> review, distribution, printing or copying of the
> >> information contained in this e-mail message
> >> and/or attachments to it are strictly prohibited. If
> >> you have received this communication in error,
> >> please notify us by reply e-mail or telephone and
> >> immediately and permanently delete the message
> >> and any attachments. Thank you
> >>
> >>
> >
>
>
>
>
>
>
// Imports needed by the enclosing class (PDFBox 1.x package names assumed).
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;

/**
 * Open PDF Document Stream from a byte array buffer and extract its text.
 *
 * @param pdf_buf raw bytes of the PDF document
 * @return byte[] containing PDF text
 */
public static byte[] fromBuffer(byte[] pdf_buf)
        throws TechnicalException, FunctionalException {
    String m_name = "PDFBOX-fromBuffer";
    PDDocument document = null;
    InputStream pdfStream = null;
    try {
        logger.debug("Entering function " + m_name);
        // Text Stripper initialisation
        PDFTextStripper stripper = new PDFTextStripper();
        pdfStream = new ByteArrayInputStream(pdf_buf);
        // Open and load PDF document content.
        document = PDDocument.load(pdfStream);
        // NOTE: the result of isCrypted() is never assigned, so isCrypt
        // always stays false here.
        boolean isCrypt = false;
        try {
            PdfPdfbox.isCrypted(pdf_buf);
        } catch (FunctionalException fu) {
            throw new FunctionalException(m_name, fu.getMessage());
        }
        logger.debug("Get the Document Content in String Format");
        // Get the document content in String format.
        // And suppress all non ascii characters
        //try
        //{
        String PDF_text = stripper.getText(document);
        logger.debug("Ascii Character Error " + PDF_text);
        //}
        //catch (Exception fu){
        //    logger.debug("Ascii Character Error " + fu.getMessage());
        //    fu.printStackTrace();
        //    logger.debug("-------------------------- " + PDF_text);
        //}
        PDF_text = StringUtils.filterNonAsciiCharacters(PDF_text);
        if (!isCrypt) {
            logger.debug("Get Creator " + m_name);
            // Read the creator while the document is still open; the
            // finally block below closes the document and the stream.
            String docCreator = document.getDocumentInformation().getCreator();
            logger.info("Creator " + docCreator);
            // Suppress all windings characters
            // if requested for the site and if the creator is PDF Creator
            if (!Utils.StringIsNull(Constants.PDF_CREATOR_SUPPRESS_WINDINGS_FONT)
                    && !Utils.StringIsNull(docCreator)
                    && docCreator.startsWith("PDFCreator")) {
                return pdfCreatorTools.pdfCreatorUpdate(PDF_text.getBytes());
            }
        }
        return PDF_text.getBytes();
    } catch (FunctionalException fu) {
        throw new FunctionalException(m_name, fu.getMessage());
    } catch (IOException ioe) {
        throw new TechnicalException(m_name, ioe);
    } catch (Exception e) {
        throw new TechnicalException(m_name, e);
    } finally {
        if (document != null) {
            try {
                document.close();
            } catch (IOException ioe) {
                // nothing;
            }
        }
        if (pdfStream != null) {
            try {
                pdfStream.close();
            } catch (IOException ioe) {
                // nothing;
            }
        }
    } // end finally
} // end fromBuffer
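Since the reported symptom is a null text value reaching the non-ASCII
filtering step, a null-tolerant filter may make the failure easier to
diagnose. The following is only a sketch: AsciiFilter.filterNonAscii is a
hypothetical stand-in, and the behaviour of the application's own
StringUtils.filterNonAsciiCharacters is not known from this thread.

```java
class AsciiFilter {
    /**
     * Keeps only 7-bit ASCII characters.
     * Assumption: a null input is mapped to an empty string instead of
     * causing a NullPointerException further down.
     */
    static String filterNonAscii(String text) {
        if (text == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 128) { // drop everything outside the ASCII range
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "Gr\u00fc\u00dfe" contains two non-ASCII characters.
        System.out.println(filterNonAscii("Gr\u00fc\u00dfe")); // prints "Gre"
    }
}
```

With such a guard, an empty result can be logged and handled explicitly
by the caller rather than surfacing later as a generic exception.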