Thanks,

Following your suggestion to use annotations, I was able to get the name of
the embedded file, but I could not extract the contents of that file.
Please refer to the code below.
var originalDocument = PDDocument.load(_PdfFile);
var originalCatalog = originalDocument.getDocumentCatalog();
java.util.List sourceDocumentPages = originalCatalog.getAllPages();
var newDocument = new PDDocument();
// Number of pages in the PDF file = 2
int[] PageNumbers = { 1, 2 };
foreach (var pageNumber in PageNumbers)
{
    // Page numbers are 1-based, but PDPages are contained in a zero-based array:
    int pageIndex = pageNumber - 1;
    try
    {
        PDPage pdpage = (PDPage)sourceDocumentPages.get(pageIndex);
        java.util.List anno = pdpage.getAnnotations();
        if (anno.size() > 0)
        {
            PDAnnotationFileAttachment pafa = (PDAnnotationFileAttachment)anno.get(0);
            // getContents() returns the file name here
            string filename = pafa.getContents();
            PDFileSpecification fs = pafa.getFile();
        }
    }
    catch (Exception)
    {
        // Exceptions are silently ignored.
    }
}
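
For the remaining step, writing the attachment to disk, a minimal Java sketch
is shown here (PDFBox itself is a Java library, so the same calls should be
reachable from .NET through the java.* classes already used in the code).
It assumes the bytes are available as an InputStream. In PDFBox 1.8 that
stream would presumably come from casting the result of pafa.getFile() to
PDComplexFileSpecification and calling getEmbeddedFile().createInputStream();
that route is an assumption and is not confirmed anywhere in this thread.
EmbeddedFileDumper and dump are hypothetical names, and the sketch uses only
the Java standard library.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of PDFBox.
final class EmbeddedFileDumper {

    // Copies the stream into targetDir under fileName and returns the written path.
    static Path dump(InputStream content, String fileName, Path targetDir) throws IOException {
        if (fileName == null) {
            throw new IllegalArgumentException("file name is missing");
        }
        // Keep only the last path element so a name stored in the PDF
        // cannot point outside targetDir.
        Path last = Paths.get(fileName.replace('\\', '/')).getFileName();
        String safeName = (last == null) ? "" : last.toString();
        if (safeName.isEmpty() || safeName.equals(".") || safeName.equals("..")) {
            throw new IllegalArgumentException("unusable file name: " + fileName);
        }
        Files.createDirectories(targetDir);
        Path target = targetDir.resolve(safeName);
        // try-with-resources closes the source stream after copying.
        try (InputStream in = content) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in bytes; with PDFBox the stream would come from the embedded file.
        byte[] data = "sample attachment".getBytes(StandardCharsets.UTF_8);
        Path dir = Files.createTempDirectory("attachments");
        Path written = dump(new ByteArrayInputStream(data), "report.doc", dir);
        System.out.println("Wrote " + written);
    }
}
```

The file name passed in would be the one obtained from getContents() above.
Reducing it to its last path element is a precaution, since the name comes
from inside the PDF and may contain directory parts.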
Could you help me once more with extracting the embedded file and saving it
to a specified location?
On Thu, Jun 20, 2013 at 2:46 PM, Ramesh Shrestha <[email protected]> wrote:
>
> Even after trying annotations, I am not able to extract the
> embedded/attached doc file located on a page of the PDF.
>
> On Tue, Jun 11, 2013 at 5:29 PM, Andreas Lehmkuehler <[email protected]> wrote:
>
>> On 11.06.2013 07:06, Ramesh Shrestha wrote:
>>
>>> Thanks,
>>>
>>> The Java example link I provided should have been:
>>>
>>> http://svn.apache.org/repos/asf/pdfbox/trunk/examples/src/main/java/org/apache/pdfbox/examples/pdmodel/ExtractEmbeddedFiles.java
>>>
>>> But your suggestion WORKS.
>>>
>>> Now I am able to extract the attached file located in the *attachments
>>> tab*, but *haven't been able to extract the attached file located in a
>>> page*. I am getting a null efTree in this case.
>>>
>>> PDDocumentNameDictionary namesDictionary =
>>>     new PDDocumentNameDictionary(pdfDoc.getDocumentCatalog());
>>> PDEmbeddedFilesNameTreeNode efTree = namesDictionary.getEmbeddedFiles();
>>>
>>> So I am now working on it.
>>>
>> Embedded files are always document related. If an embedded file is
>> referenced on a single page, a file attachment annotation is used. Try
>> something like this to get all annotations of a single page:
>>
>> List annotations = page.getAnnotations();
>>
>> The one you are looking for has to be an instance of the class
>> org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationFileAttachment.
>>
>>> On Mon, Jun 10, 2013 at 7:38 PM, Andreas Lehmkuehler <[email protected]> wrote:
>>>
>>> Hi,
>>>>
>>>> On 10.06.2013 11:22, Ramesh Shrestha wrote:
>>>>
>>>> Hi,
>>>>
>>>>>
>>>>>
>>>>> I am developing a .NET application using PDFBox to extract metadata,
>>>>> content, and attached files from PDFs.
>>>>>
>>>>> I was able to extract the metadata and content, but I am stuck on
>>>>> extracting attached/embedded files.
>>>>>
>>>>> I have a PDF with an embedded/attached doc file and want to retrieve
>>>>> that file. I have gone through the Java example:
>>>>>
>>>>> http://www.docjar.com/html/api/org/apache/pdfbox/examples/pdmodel/EmbeddedFiles.java.html
>>>>>
>>>>> But while trying to use it in .NET, I got the error "non generic type
>>>>> 'java.util.Map' cannot be used with type arguments" for the following
>>>>> code snippet:
>>>>>
>>>>> java.util.Map<String, COSObjectable> names = efTree.getNames();
>>>>>
>>>>> So I would be grateful if anybody could help me extract the file from
>>>>> the PDF.
>>>>>
>>>> I'm not a .NET expert and don't know what may cause that issue. But
>>>> maybe it is a good idea to just omit the generics and try something
>>>> like this:
>>>>
>>>> java.util.Map names = efTree.getNames();
>>>>
>>>> Thanks in advance.
>>>>
>>>>>
>>>>>
>>>> HTH
>>>> Andreas Lehmkühler
>>>>
>>>
>> BR
>> Andreas Lehmkühler
>>
>>
>
>
>
--
pasa