Sorry – other things have priority…

Then, as previously mentioned, PoDoFo is happy to do exactly what you want.  It 
will do the PDF decoding parts of the process and leave you with the original data 
that was placed into the stream.

In your original example, someone put a ZIP file in a stream.  And you will get 
a ZIP file back out.
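
As an illustration of "a ZIP goes in, a ZIP comes out", here is a minimal Python sketch that does not use PoDoFo at all. It builds a small ZIP in memory, wraps it in an unfiltered stream object like the one in the sample file, pulls the payload back out using /Length, and opens it with the standard zipfile module. The helper names (build_zip, wrap_in_stream_object, extract_stream) are hypothetical, and the extraction assumes a direct integer /Length and a single line feed after the `stream` keyword.

```python
import io
import re
import zipfile


def build_zip() -> bytes:
    """Create a tiny ZIP archive in memory (stand-in for the embedded file)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("Text File.txt", "sample text")
    return buf.getvalue()


def wrap_in_stream_object(data: bytes) -> bytes:
    """Wrap raw bytes in an unfiltered PDF stream object (no /Filter entry)."""
    header = b"2 0 obj\n<</Length %d>>stream\n" % len(data)
    return header + data + b"\nendstream\nendobj\n"


def extract_stream(obj: bytes) -> bytes:
    """Return the payload between 'stream' and 'endstream', sized by /Length.

    Assumption: /Length is a direct integer and the keyword is followed by
    a single line feed. Real files may use CRLF or an indirect /Length.
    """
    length = int(re.search(rb"/Length\s+(\d+)", obj).group(1))
    start = obj.index(b"stream\n") + len(b"stream\n")
    return obj[start:start + length]


zip_bytes = build_zip()
recovered = extract_stream(wrap_in_stream_object(zip_bytes))
names = zipfile.ZipFile(io.BytesIO(recovered)).namelist()
print(recovered == zip_bytes, names)
```

Writing `recovered` to a file opened in binary mode would give something any ZIP tool can open, which is the behaviour described above for an unfiltered stream.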

Leonard

From: Nadav Ben Jakov
Date: Sunday, February 15, 2015 at 11:49 AM
To: podofo-users@lists.sourceforge.net
Subject: [Podofo-users] Fwd: Problem with enumerating objects with PoDoFo

Hi;
Can anyone help me please?

---------- Forwarded message ----------
From: Nadav Ben Jakov <tbya...@gmail.com>
Date: Tue, Feb 10, 2015 at 6:34 PM
Subject: Re: [Podofo-users] Problem with enumerating objects with PoDoFo
To: Leonard Rosenthol <lrose...@adobe.com>


Hi,
I'm sorry for my rather vague explanations. I'm not trying to render the 
stream's content, nor to parse the streams as data. I just want to print the 
section between 'stream' and 'endstream'.
As for printing, I basically write the streams to files as output, so they are 
in fact "printable" in that sense. In the case I gave, I expect to be able 
to open the output file with WinZip/7-Zip later and view its contents. If that 
stream had contained a PE file's content, I would have wanted to be able 
to execute the output file just like a normal PE file.

Thanks in advance,
Nadav

On Tue, Feb 10, 2015 at 6:16 PM, Leonard Rosenthol <lrose...@adobe.com> wrote:
> I'm trying to extract each of the streams data out of all of the objects of a 
> given PDF file,
>and remove any encoding/encryption related to the PDF format
>
That is EXACTLY what PoDoFo is doing for you.


> Later I'm printing each stream separately
>
Printing them in what way?  Most (if not all) streams in a PDF aren’t in a 
format that would be printable.  You have font data, color data, XML streams, 
etc.  None of these make any sense to be printed.  Even a page content stream 
isn’t necessarily printable.


>In the example file I mentioned, I was hoping to extract that object stream 
>(the zip file) and print it as a raw stream
>
I don’t understand what it means to “print a ZIP file as a raw stream”.  A ZIP 
file itself can contain MANY files and in a rich hierarchical organization.   
So what would you be expecting to get back??

What if it was a .exe file – which is also perfectly fine in a PDF – what would 
you be expecting to get back??


>(again, I'm not after unzipping or decompressing the zip, just the plain 
>object).
>
And I think that’s the problem here – you are expecting something that isn’t 
reasonable.

Have you read the PDF Standard – ISO 32000-1 – or at least a good book on 
the subject of PDF (e.g. <http://shop.oreilly.com/product/0636920025269.do>) to 
get a better understanding of PDF?


Leonard

From: Nadav Ben Jakov
Date: Tuesday, February 10, 2015 at 10:07 AM

To: Leonard Rosenthol
Subject: Re: [Podofo-users] Problem with enumerating objects with PoDoFo

I'm trying to extract each of the streams data out of all of the objects of a 
given PDF file, and remove any encoding/encryption related to the PDF format. 
Later I'm printing each stream separately.
I'm only trying to figure out whether I can rely on PoDoFo to parse and extract 
any stream in any object, even if the stream in question is unfiltered/clear. 
In those cases I'm basically after the object's raw stream.
In the example file I mentioned, I was hoping to extract that object stream (the 
zip file) and print it as a raw stream (again, I'm not after unzipping or 
decompressing the zip, just the plain object).

Nadav

On Tue, Feb 10, 2015 at 4:51 PM, Leonard Rosenthol <lrose...@adobe.com> wrote:
This stream IS clear (to use your term) as far as PDF is concerned.  It is 
neither filtered nor encrypted.  Therefore you get the raw bits – which in this 
case, happen to be a .zip file.  But as mentioned, it might be font data, 
ICC profile data or any other manner of thing that can be put into a PDF stream 
in various ways.

Perhaps you could explain the reason for your request, since it’s unclear 
what you are trying to achieve at the end of the process.

Leonard

From: Nadav Ben Jakov
Date: Tuesday, February 10, 2015 at 9:02 AM
To: Leonard Rosenthol
Subject: Re: [Podofo-users] Problem with enumerating objects with PoDoFo

Hi,

Thank you for your quick answer, but it didn't completely answer my question. 
I need to get all of the streams as output, regardless of whether they have 
/Filter or /Encrypt. If the object happens to contain a /Filter, the stream 
should be decoded. If the stream is clear, it should only be parsed and returned 
as a clear stream.
Can PoDoFo (or any other PDF library for that matter) perform such an operation?
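
The rule stated here (decode when /Filter is present, pass through when it is not) can be sketched in a few lines of Python. This is only an illustration of the logic, not PoDoFo's implementation: the function name decode_stream is hypothetical, the dictionary is matched with a simple regular expression rather than a real PDF parser, and only /FlateDecode is handled (filter arrays, other filters and encryption are out of scope).

```python
import re
import zlib


def decode_stream(dictionary: bytes, payload: bytes) -> bytes:
    """Return the stream data with PDF-level encoding removed.

    Assumption: the dictionary names at most one filter, and only
    /FlateDecode is supported in this sketch.
    """
    match = re.search(rb"/Filter\s*/(\w+)", dictionary)
    if match is None:
        # No /Filter entry: the payload already is the original data.
        return payload
    if match.group(1) == b"FlateDecode":
        return zlib.decompress(payload)
    raise NotImplementedError("unsupported filter: " + match.group(1).decode())


original = b"hello stream " * 4
# Clear stream: returned unchanged.
clear = decode_stream(b"<</Length 52>>", original)
# Filtered stream: inflated back to the original bytes.
packed = zlib.compress(original)
filtered = decode_stream(b"<</Length %d/Filter/FlateDecode>>" % len(packed), packed)
print(clear == original, filtered == original)
```

Both calls return the same bytes, so the caller does not need to know in advance whether a given stream was filtered.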

Thanks in advance,
 Nadav

On Tue, Feb 10, 2015 at 3:38 PM, Leonard Rosenthol <lrose...@adobe.com> wrote:
That stream that you show doesn’t have a filter.  If it did, you’d see it on 
the stream dictionary.

Instead, what you have is an unfiltered stream containing arbitrary data – 
which in this case, just happens to be a ZIP file.  But it could just as well 
have been a Word file, another PDF or even a .exe.
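
Since an unfiltered stream can hold any kind of data, the only way to tell what came out is to look at the bytes themselves. The Python sketch below guesses the payload type from its leading signature. The function name guess_payload_type and the short signature table are illustrative assumptions, and the list is far from complete.

```python
def guess_payload_type(data: bytes) -> str:
    """Guess what an extracted stream contains from its first bytes."""
    signatures = [
        (b"PK\x03\x04", "zip"),  # ZIP local file header
        (b"%PDF-", "pdf"),       # PDF header
        (b"MZ", "exe"),          # DOS/PE executable header
    ]
    for magic, name in signatures:
        if data.startswith(magic):
            return name
    return "unknown"


print(guess_payload_type(b"PK\x03\x04" + b"\x00" * 8))
print(guess_payload_type(b"%PDF-1.4\n"))
print(guess_payload_type(b"MZ\x90\x00"))
print(guess_payload_type(b"BT /F1 12 Tf ET"))
```

A check like this is separate from PDF decoding: the PDF library hands back the bytes, and deciding whether they are a ZIP, a PDF or an executable is up to the caller.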

There is nothing for PoDoFo (or any other PDF library) to do here.

Leonard

From: Nadav Ben Jakov
Date: Tuesday, February 10, 2015 at 8:32 AM
To: podofo-users@lists.sourceforge.net
Subject: [Podofo-users] Problem with enumerating objects with PoDoFo

Hi.
I'm new to the PoDoFo library, and I've been trying to use it for my program.
In general, my program needs to open a PDF file, decode all of its filters, 
decrypt them with the default password (if necessary), and print out all decoded 
and decrypted objects.
I saw that podofouncompress (which uses the PoDoFo library) pretty much does 
that, so I tried to use the same library functions it uses. The whole 
project went well from that point until I encountered an error. Apparently 
the function PdfMemDocument::GetObjects() doesn't return all objects, but only 
those that have been handled (or loaded?). Anyway, it would be great to find a 
way to get the PDF's entire object list decrypted and uncompressed 
(PdfMemDocument::UncompressObjects()).

I've uploaded a sample file I've been working on, in which object 2 generation 0 
is clearly an "embedded" zip file:

2 0 obj
<</Length 144>>stream
PK   ÏyJFëEŒ‰%   &

   Text File.txtíÆ1
 0 0+ ˜ f “ðpLþŒ´WsºâÖÛ ©ªªªªªª PK   ÏyJFëEŒ‰%   &

                Text File.txtPK     ;   P
endstream
endobj

The file is at the link below:
http://www57.zippyshare.com/v/or6jqo28/file.html

When scanning with PoDoFo, only object 6 0 is available in the mentioned 
function's output, and podofouncompress completely ignores object 2 0.

Thanks in advance,
 Nadav




_______________________________________________
Podofo-users mailing list
Podofo-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/podofo-users
