Sorry – other things have priority…
Then, as previously mentioned, PoDoFo is happy to do exactly what you want. It
will do the PDF decoding parts of the process and leave you with the original
data that was placed into the stream.
In your original example, someone put a ZIP file in a stream. And you will get
a ZIP file back out.
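The round trip described above can be illustrated with a minimal sketch. It does not use PoDoFo; it only assumes an unfiltered stream object laid out like the one in the original example, and the helper names (make_zip, wrap_in_stream, unwrap_stream) are hypothetical, chosen for illustration only.

```python
import io
import zipfile


def make_zip(name: str, text: str) -> bytes:
    """Build a small ZIP archive in memory and return its raw bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, text)
    return buf.getvalue()


def wrap_in_stream(payload: bytes) -> bytes:
    """Place the payload, unfiltered, into a PDF stream object (hypothetical object 2 0)."""
    header = b"2 0 obj\n<</Length %d>>stream\n" % len(payload)
    return header + payload + b"\nendstream\nendobj\n"


def unwrap_stream(obj_bytes: bytes) -> bytes:
    """Return the bytes between the 'stream' and 'endstream' keywords.

    Assumption: the object has no /Filter entry, so there is nothing to decode.
    """
    start = obj_bytes.index(b"stream\n") + len(b"stream\n")
    end = obj_bytes.rindex(b"\nendstream")
    return obj_bytes[start:end]


zip_bytes = make_zip("Text File.txt", "hello")
recovered = unwrap_stream(wrap_in_stream(zip_bytes))

# The recovered bytes are the same ZIP archive and open as one.
with zipfile.ZipFile(io.BytesIO(recovered)) as zf:
    print(zf.namelist())
```

Writing `recovered` to a file with a .zip extension should give an archive that WinZip or 7-Zip can open, since the bytes are identical to what was placed into the stream.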
Leonard
From: Nadav Ben Jakov
Date: Sunday, February 15, 2015 at 11:49 AM
To: podofo-users@lists.sourceforge.net
Subject: [Podofo-users] Fwd: Problem with enumerating objects with PoDoFo
Hi;
Can anyone help me please?
---------- Forwarded message ----------
From: Nadav Ben Jakov <tbya...@gmail.com>
Date: Tue, Feb 10, 2015 at 6:34 PM
Subject: Re: [Podofo-users] Problem with enumerating objects with PoDoFo
To: Leonard Rosenthol <lrose...@adobe.com>
Hi,
I'm sorry for my rather vague explanations. I'm not trying to render the
stream's content nor to parse the streams as data. I just want to print the
section between 'stream' and 'endstream'.
As to printing, I basically write each stream to a file as output, so they are
in fact "printable" for that matter. In the case I gave, I expect to be able
to open the output file with WinZip/7-Zip later and view its content. If that
stream had contained a PE file's content, I would have wanted to be able
to execute the output file just like a normal PE file.
Thanks in advance,
Nadav
On Tue, Feb 10, 2015 at 6:16 PM, Leonard Rosenthol
<lrose...@adobe.com> wrote:
> I'm trying to extract each of the streams data out of all of the objects of a
> given PDF file,
>and remove any encoding/encryption related to the PDF format
>
That is EXACTLY what PoDoFo is doing for you.
> Later I'm printing each stream separately
>
Printing them in what way? Most (if not all) streams in a PDF aren’t in a
format that would be printable. You have font data, color data, XML streams,
etc. None of these make any sense to be printed. Even a page content stream
isn’t necessarily printable.
>In the example file I mentioned, I was hoping to extract that object stream
>(the zip file) and print it as a raw stream
>
I don’t understand what it means to “print a ZIP file as a raw stream”. A ZIP
file itself can contain MANY files and in a rich hierarchical organization.
So what would you be expecting to get back??
What if it was a .exe file – which is also perfectly fine in a PDF – what would
you be expecting to get back??
>(again, I'm not after unzipping or decompressing the zip, just the plain
>object).
>
And I think that’s the problem here – you are expecting something that isn’t
reasonable.
Have you read the PDF Standard – ISO 32000-1 – or at least a good book on
the subject of PDF (e.g. <http://shop.oreilly.com/product/0636920025269.do>) to
get a better understanding of PDF?
Leonard
From: Nadav Ben Jakov
Date: Tuesday, February 10, 2015 at 10:07 AM
To: Leonard Rosenthol
Subject: Re: [Podofo-users] Problem with enumerating objects with PoDoFo
I'm trying to extract each of the streams data out of all of the objects of a
given PDF file, and remove any encoding/encryption related to the PDF format.
Later I'm printing each stream separately.
I'm only trying to figure out if I can rely on PoDoFo's parsing and extraction
of any stream in any object, even if the stream in question is unfiltered/clear.
In these cases I'm basically after the object's raw stream.
In the example file I mentioned, I was hoping to extract that object stream (the
zip file) and print it as a raw stream (again, I'm not after unzipping or
decompressing the zip, just the plain object).
Nadav
On Tue, Feb 10, 2015 at 4:51 PM, Leonard Rosenthol
<lrose...@adobe.com> wrote:
This stream IS clear (to use your term) as far as PDF is concerned. It is
neither filtered nor encrypted. Therefore you get the raw bits – which in this
case, happen to be a .zip file. But as mentioned, it might be font data,
ICC profile data or any other manner of thing that can be put into a PDF stream
in various ways.
Perhaps you could explain the reason for your request, since it’s unclear
what you are trying to achieve at the end of the process.
Leonard
From: Nadav Ben Jakov
Date: Tuesday, February 10, 2015 at 9:02 AM
To: Leonard Rosenthol
Subject: Re: [Podofo-users] Problem with enumerating objects with PoDoFo
Hi,
Thank you for your quick answer, yet it didn't completely answer my question:
I need to get as an output all of the streams, regardless of having /Filter or
/Encrypt. If the object happens to contain a /Filter - the stream should be
decoded. If the stream is clear - it should only be parsed and returned as
clear stream.
Can PoDoFo (or any other PDF library for that matter) perform such an operation?
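The behaviour asked for here (decode when a /Filter is present, otherwise return the bytes untouched) can be sketched roughly as follows. This is not PoDoFo code: it is a simplified illustration that assumes a direct integer /Length, handles only /FlateDecode, and ignores encryption entirely. The function name extract_stream is hypothetical.

```python
import re
import zlib


def extract_stream(obj_bytes: bytes) -> bytes:
    """Return the payload of one PDF stream object, decoded if it is filtered.

    Assumptions: /Length is a direct integer (not an indirect reference),
    the only filter supported is /FlateDecode, and the object is not encrypted.
    """
    # Locate the end of the stream dictionary and the 'stream' keyword.
    keyword = re.search(rb">>\s*stream\r?\n", obj_bytes)
    if keyword is None:
        raise ValueError("object has no stream")
    dictionary = obj_bytes[:keyword.start()]

    # Use /Length rather than searching for 'endstream', because binary
    # payloads may legitimately contain that byte sequence.
    length = re.search(rb"/Length\s+(\d+)", dictionary)
    if length is None:
        raise ValueError("stream dictionary has no direct /Length")
    data = obj_bytes[keyword.end():keyword.end() + int(length.group(1))]

    if re.search(rb"/Filter\s*/FlateDecode", dictionary):
        return zlib.decompress(data)   # filtered: decode
    if b"/Filter" in dictionary:
        raise NotImplementedError("only /FlateDecode is handled in this sketch")
    return data                        # clear: return the bytes as-is
```

For an object such as `<</Length 144>>stream ...`, which has no /Filter entry, the function simply returns the 144 bytes that follow the keyword.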
Thanks in advance,
Nadav
On Tue, Feb 10, 2015 at 3:38 PM, Leonard Rosenthol
<lrose...@adobe.com> wrote:
That stream that you show doesn’t have a filter. If it did, you’d see it on
the stream dictionary.
Instead, what you have is an unfiltered stream containing arbitrary data –
which in this case, just happens to be a ZIP file. But it could just as well
have been a Word file, another PDF or even a .exe.
There is nothing for PoDoFo (or any other PDF library) to do here.
Leonard
From: Nadav Ben Jakov
Date: Tuesday, February 10, 2015 at 8:32 AM
To: podofo-users@lists.sourceforge.net
Subject: [Podofo-users] Problem with enumerating objects with PoDoFo
Hi.
I'm new to the PoDoFo library, and I've been trying to use it for my program.
In general, my program needs to open a PDF file, decode all of its filters,
decrypt them with default password (if necessary), and print out all decoded
and decrypted objects.
I saw that podofouncompress (which uses the PoDoFo library) pretty much does
that, so I tried to use the same library functions as it does. The whole
project went great from that point until I encountered an error. Apparently
the function PdfMemDocument::GetObjects() doesn't return all objects, but only
those that had been handled (or loaded?). Anyway, it would be great to find a
way to get the PDF's entire object list decrypted and uncompressed
(PdfMemDocument::UncompressObjects()).
I've uploaded a sample file I've been working on, in which object 2 generation 0
is clearly an embedded zip file:
2 0 obj
<</Length 144>>stream
PK ÏyJFëEŒ‰% &
Text File.txtíÆ1
0 0+ ˜ f “ðpLþŒ´WsºâÖÛ ©ªªªªªª PK ÏyJFëEŒ‰% &
Text File.txtPK ; P
endstream
endobj
The file is at the link below:
http://www57.zippyshare.com/v/or6jqo28/file.html
When scanning with PoDoFo, only object 6 0 is available in the output of the
function mentioned above, and podofouncompress completely ignores object 2 0.
Thanks in advance,
Nadav