Hi,
On 20.06.2013 12:48, Ramesh Shrestha wrote:
Thanks,
Following your suggestion to use annotations, I was able to extract the name
of the embedded file. However, I could not extract the contents of that file.
Please refer to the code below.
var originalDocument = PDDocument.load(_PdfFile);
var originalCatalog = originalDocument.getDocumentCatalog();
java.util.List sourceDocumentPages = originalCatalog.getAllPages();
var newDocument = new PDDocument();
//number of pages in pdf file = 2
int[] PageNumbers = { 1, 2 };
foreach (var pageNumber in PageNumbers)
{
// Page numbers are 1-based, but PDPages are contained in a zero-based array:
int pageIndex = pageNumber - 1;
PDPage pdpage = new PDPage();
try
{
pdpage = (PDPage)sourceDocumentPages.get(pageIndex);
java.util.List anno = pdpage.getAnnotations();
if (anno.size() > 0)
{
PDAnnotationFileAttachment pafa = (PDAnnotationFileAttachment)anno.get(0);
//FILENAME = GETCONTENTS()
string filename = pafa.getContents();
PDFileSpecification fs = pafa.getFile();
}
}
catch (Exception ex)
{ Console.WriteLine(ex); } // report the failure instead of silently swallowing it
}
Can you help me one more time to extract the embedded file and write it to a
specified location?
You already mentioned some sample code yourself. [1] demonstrates how to do
that.
[1]
http://svn.apache.org/repos/asf/pdfbox/trunk/examples/src/main/java/org/apache/pdfbox/examples/pdmodel/ExtractEmbeddedFiles.java
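For the final step, writing the attachment to a chosen folder, a minimal Java sketch could look like the one below. It uses only the standard library. The class name AttachmentWriter, the method save and the fallback name attachment.bin are made up for illustration. With PDFBox, the input stream would presumably come from the embedded file reached through the annotation's file specification, much as [1] does for the name tree; here a byte array stands in for it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of PDFBox.
class AttachmentWriter {

    // Copies the stream into targetDir under the attachment's base name
    // and returns the path of the written file.
    static Path save(String attachmentName, InputStream content, Path targetDir) {
        // Keep only the last path segment so a name stored in the PDF
        // cannot point outside the target directory.
        int cut = Math.max(attachmentName.lastIndexOf('/'), attachmentName.lastIndexOf('\\'));
        String safeName = attachmentName.substring(cut + 1);
        if (safeName.isEmpty() || safeName.equals(".") || safeName.equals("..")) {
            safeName = "attachment.bin"; // assumed fallback name
        }
        Path target = targetDir.resolve(safeName);
        try {
            Files.createDirectories(targetDir);
            Files.copy(content, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target;
    }

    public static void main(String[] args) {
        // Stand-in for the stream of an embedded file.
        InputStream fake = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        Path dir = Paths.get(System.getProperty("java.io.tmpdir"), "pdf-attachments");
        System.out.println("Wrote " + save("report.doc", fake, dir));
    }
}
```

Stripping directory parts from the name is a defensive choice: the file name comes from the PDF itself and should not be trusted as a path.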
On Thu, Jun 20, 2013 at 2:46 PM, Ramesh Shrestha <[email protected]> wrote:
Even after trying annotations, I am not able to extract the
embedded/attached doc file located on a page of the PDF.
On Tue, Jun 11, 2013 at 5:29 PM, Andreas Lehmkuehler <[email protected]> wrote:
On 11.06.2013 07:06, Ramesh Shrestha wrote:
Thanks,
The Java example link I provided should have been:
http://svn.apache.org/repos/asf/pdfbox/trunk/examples/src/main/java/org/apache/pdfbox/examples/pdmodel/ExtractEmbeddedFiles.java
But your suggestion works.
Now I am able to extract the attached file located in the attachments tab,
but I haven't been able to extract the attached file located on a page. I am
getting a null efTree in this case.
PDDocumentNameDictionary namesDictionary = new PDDocumentNameDictionary(pdfDoc.getDocumentCatalog());
PDEmbeddedFilesNameTreeNode efTree = namesDictionary.getEmbeddedFiles();
So I am now working on it.
Embedded files are always document related. If an embedded file is referenced
on a single page, a file attachment annotation is used. Try something like
this to get all annotations of a single page:
List annotations = page.getAnnotations();
The one you are looking for has to be an instance of the class
org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationFileAttachment.
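Because a page may carry other annotation types as well, each entry of the list is worth checking before it is cast. A rough Java sketch of that check follows. The nested classes are hypothetical stand-ins so the snippet compiles without PDFBox; in real code the test would be against PDAnnotationFileAttachment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AnnotationFilterDemo {
    // Hypothetical stand-ins for PDFBox annotation classes.
    static class Annotation {}
    static class LinkAnnotation extends Annotation {}
    static class FileAttachmentAnnotation extends Annotation {
        final String contents;
        FileAttachmentAnnotation(String contents) { this.contents = contents; }
    }

    // Takes a raw List, like the one returned for a page, and keeps only
    // the entries that really are file attachment annotations.
    @SuppressWarnings("rawtypes")
    static List<FileAttachmentAnnotation> fileAttachments(List annotations) {
        List<FileAttachmentAnnotation> result = new ArrayList<FileAttachmentAnnotation>();
        for (Object candidate : annotations) {
            if (candidate instanceof FileAttachmentAnnotation) {
                result.add((FileAttachmentAnnotation) candidate);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Annotation> page = Arrays.asList(
                new LinkAnnotation(), new FileAttachmentAnnotation("report.doc"));
        for (FileAttachmentAnnotation a : fileAttachments(page)) {
            System.out.println(a.contents);
        }
    }
}
```

Casting the first list entry without a check would fail with a class cast error as soon as the first annotation on the page is, say, a link.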
On Mon, Jun 10, 2013 at 7:38 PM, Andreas Lehmkuehler <[email protected]> wrote:
Hi,
On 10.06.2013 11:22, Ramesh Shrestha wrote:
Hi,
I am developing a .NET application that uses PDFBox to extract metadata,
content and attached files from PDFs.
I was able to extract the metadata and content, but I am stuck on extracting
attached/embedded files.
I have a PDF with an embedded/attached doc file and want to retrieve that
file. I have gone through the Java example at
http://www.docjar.com/html/api/org/apache/pdfbox/examples/pdmodel/EmbeddedFiles.java.html
But when I tried to use it in .NET, I got the error "non-generic type
'java.util.Map' cannot be used with type arguments" for the following code
snippet:
java.util.Map<String, COSObjectable> names = efTree.getNames();
So I would be grateful if anybody could help me extract the file from the PDF.
I'm not a .NET expert and don't know what may cause that issue. But maybe it
is a good idea to just omit the generics and try something like this:
java.util.Map names = efTree.getNames();
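With the generics omitted, keys and values come back as plain objects and have to be cast by hand. A small Java sketch of walking such a raw map is shown below. The map contents and the names RawMapDemo, namesStandIn and describe are invented for illustration, and the explicit iterator is used only because it should translate to C# almost line for line.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

class RawMapDemo {
    // Hypothetical stand-in for efTree.getNames(): a map handed out
    // through the raw (non-generic) type.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static Map namesStandIn() {
        Map names = new LinkedHashMap();
        names.put("report.doc", "spec-1");
        names.put("notes.txt", "spec-2");
        return names;
    }

    // Walks the raw map with an explicit iterator and manual casts.
    @SuppressWarnings("rawtypes")
    static String describe(Map names) {
        StringBuilder sb = new StringBuilder();
        for (Iterator it = names.keySet().iterator(); it.hasNext();) {
            String key = (String) it.next(); // cast needed without generics
            Object value = names.get(key);   // would be cast to the file specification type
            sb.append(key).append("=").append(value).append(";");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(namesStandIn()));
    }
}
```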
Thanks in advance.
HTH
Andreas Lehmkühler
BR
Andreas Lehmkühler