It turned out to be fairly simple to get the behavior I wanted.
I'm attaching a patch which fixes the issue I brought up by making
PdfTokenizer::GetNextToken virtual and overriding it in PdfContentsTokenizer.
- Mike Slegeir
Mike Slegeir wrote:
Hey Amin,
Good point, but I don't think the change I'm suggesting would affect
your usage. I assume that, in order to handle streams one at a time, you're
using the PdfContentsTokenizer constructor that accepts a const char* and a
length rather than the PdfCanvas* constructor. If that's not the case, I'm not
sure how you're able to detect stream boundaries as is. My suggestion
is just to move the code at the top of PdfContentsTokenizer::ReadNext
(the if(!gotToken) block) into a virtual
PdfContentsTokenizer::GetNextToken method. If you're using the first
constructor, m_lstContents will be empty and PdfContentsTokenizer will
behave as before. Otherwise, if you construct with a PdfCanvas*, the
stream transitions will be seamless (as they have been), but it will now
behave correctly when an object is split across Content streams.
I'm also curious whether your application could handle the PDF that I
previously posted, where an array is split across the streams.
Unfortunately, I don't think it would work, and I don't think it is
really fixable when handling streams one at a time: you would have to use
the PdfCanvas* constructor of PdfContentsTokenizer, which my suggested
change would fix.
- Mike Slegeir
A. Massad wrote:
Hi Mike,
If you change the behavior of PdfContentsTokenizer::GetNextToken() to
span across streams, could you please provide a flag to toggle this
behavior? For some users (like me) it might be important to change
back to the "old" behavior which DOES NOT span across streams.
I have got an application which parses through streams and replaces
the content of each single stream without changing the overall
structure of the streams. I think that this wouldn't be possible any
longer if PdfContentsTokenizer::GetNextToken() did not detect stream
boundaries anymore.
Thanks in advance!
Greetings,
Amin
On 26.08.2009, at 17:17, Mike Slegeir wrote:
I've discovered another related issue. PdfTokenizer is unable to
reach into the next content stream in order to get a token, so any
object which is split across Contents streams causes an UnexpectedEOF
to be raised. My suggested solution is either to concatenate all the
Content streams before doing any tokenization, or to make
PdfTokenizer::GetNextToken virtual and move the stream switching
logic into PdfContentsTokenizer::GetNextToken, such that it will try
the parent's version, attempt to move to the next stream (if it
exists) on failure, then retry. Attached is a very basic example
of an array split between two streams.
- Mike Slegeir
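The second option can be sketched in isolation. The classes below are hypothetical miniatures and not PoDoFo's real interfaces: Tokenizer, ContentsTokenizer, ReadArray and SetBuffer are illustrative names, and tokens are assumed to be whitespace-delimited. The sketch only shows how a virtual GetNextToken lets the parent's array-reading code cross a stream boundary.

```cpp
#include <deque>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the base tokenizer (not the PoDoFo class).
class Tokenizer {
public:
    virtual ~Tokenizer() = default;

    // Returns false once the current buffer has no more tokens.
    virtual bool GetNextToken(std::string& token) {
        return static_cast<bool>(m_in >> token);
    }

    // Reads "[ a b c ]" as one object. Every token is pulled through the
    // virtual GetNextToken, so a derived class may switch buffers mid-array.
    bool ReadArray(std::vector<std::string>& items) {
        std::string token;
        if (!GetNextToken(token) || token != "[")
            return false;
        while (GetNextToken(token)) {
            if (token == "]")
                return true;
            items.push_back(token);
        }
        return false; // input ended before the closing "]"
    }

protected:
    // Replaces the buffer being tokenized and clears the EOF/fail state.
    void SetBuffer(const std::string& data) {
        m_in.clear();
        m_in.str(data);
    }

private:
    std::istringstream m_in;
};

// Hypothetical stand-in for the contents tokenizer with a list of streams.
class ContentsTokenizer : public Tokenizer {
public:
    explicit ContentsTokenizer(std::deque<std::string> streams)
        : m_streams(std::move(streams)) {}

    bool GetNextToken(std::string& token) override {
        // Try the parent's version first; on failure move to the next
        // stream (if one exists) and retry.
        while (!Tokenizer::GetNextToken(token)) {
            if (m_streams.empty())
                return false; // no more streams to read
            SetBuffer(m_streams.front());
            m_streams.pop_front();
        }
        return true;
    }

private:
    std::deque<std::string> m_streams;
};
```

With the streams "[ 1 2" and "3 ] Tj", ReadArray collects 1, 2 and 3 even though the array closes in the second stream. Without the override, the same call would stop at the end of the first stream.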
Mike Slegeir wrote:
I've resolved this issue in an admittedly hacky way. This may be
sufficient for this problem though. Attached is a patch which
fixes the issue. I've only done limited testing, but it does at
least correct the issue.
- Mike Slegeir
When using PdfContentsTokenizer with a PDF that has an array for Contents
rather than a single stream, the tokenizer resets its position to the
beginning of the first stream upon exhausting a stream. A Contents array
with contents X Y Z will appear as X X Y X Y Z to a user of the
PdfContentsTokenizer. Attached is a PDF which has a Contents array. I can
provide example code and output if necessary.
<split-array.pdf>
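The X X Y X Y Z pattern can be reproduced in miniature. The sketch below is not PoDoFo code: ReadAll and its keepPosition parameter are hypothetical. It assumes, as the Tell()/Seek() lines in the attached patch suggest, that each stream's decoded data is appended to one shared buffer while the read position is reset to the start.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical model of the symptom; none of these names are PoDoFo API.
// Assumption: decoded stream data accumulates in a single growing buffer.
std::vector<std::string> ReadAll(const std::vector<std::string>& streams,
                                 bool keepPosition)
{
    std::vector<std::string> seen; // tokens in the order a caller sees them
    std::string buffer;            // accumulated decoded data
    std::size_t position = 0;      // read offset into buffer

    for (const std::string& s : streams) {
        // Switching streams appends the new data to the same buffer.
        buffer += s;
        buffer += ' ';

        // Buggy variant: the read position goes back to the start, so
        // everything already consumed is tokenized again.
        if (!keepPosition)
            position = 0;

        std::istringstream in(buffer.substr(position));
        std::string token;
        while (in >> token)
            seen.push_back(token);

        position = buffer.size(); // the current data is now exhausted
    }
    return seen;
}
```

Under that assumption, ReadAll({"X", "Y", "Z"}, false) yields X X Y X Y Z, while passing true for keepPosition yields X Y Z.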
------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now. http://p.sf.net/sfu/bobj-july
_______________________________________________
Podofo-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/podofo-users
Index: src/PdfContentsTokenizer.cpp
===================================================================
--- src/PdfContentsTokenizer.cpp (revision 1140)
+++ src/PdfContentsTokenizer.cpp (working copy)
@@ -76,9 +76,32 @@
PdfBufferOutputStream stream( &m_curBuffer );
pStream->GetFilteredCopy( &stream );
+ std::streamoff position = m_device.Device() ? m_device.Device()->Tell() : 0;
m_device = PdfRefCountedInputDevice( m_curBuffer.GetBuffer(), m_curBuffer.GetSize() );
+ m_device.Device()->Seek(position);
}
+bool PdfContentsTokenizer::GetNextToken( const char *& pszToken, EPdfTokenType* peType )
+{
+ bool gotToken = PdfTokenizer::GetNextToken( pszToken, peType );
+ if ( !gotToken )
+ {
+ if ( m_lstContents.size() )
+ {
+ // We ran out of tokens in this stream. Switch to the next stream
+ // and try again.
+ SetCurrentContentsStream( m_lstContents.front() );
+ m_lstContents.pop_front();
+ return PdfTokenizer::GetNextToken( pszToken, peType );
+ }
+ // No more content stream tokens to read.
+ return false;
+ }
+
+ // The parent tokenizer returned a token from the current stream.
+ return true;
+}
+
bool PdfContentsTokenizer::ReadNext( EPdfContentsType& reType, const char*& rpszKeyword, PdfVariant & rVariant )
{
if (m_readingInlineImgData)
@@ -103,21 +126,7 @@
bool gotToken = this->GetNextToken( pszToken, &eTokenType );
if ( !gotToken )
- {
- if ( m_lstContents.size() )
- {
- // We ran out of tokens in this stream. Switch to the next stream
- // and try again.
- SetCurrentContentsStream( m_lstContents.front() );
- m_lstContents.pop_front();
- return ReadNext( reType, rpszKeyword, rVariant );
- }
- else
- {
- // No more content stream tokens to read.
- return false;
- }
- }
+ return false;
eDataType = this->DetermineDataType( pszToken, eTokenType, rVariant );
Index: src/PdfContentsTokenizer.h
===================================================================
--- src/PdfContentsTokenizer.h (revision 1140)
+++ src/PdfContentsTokenizer.h (working copy)
@@ -100,6 +100,8 @@
*/
bool ReadNext( EPdfContentsType& reType, const char*& rpszKeyword, PoDoFo::PdfVariant & rVariant );
+ bool GetNextToken( const char *& pszToken, EPdfTokenType* peType = NULL);
+
private:
/** Set another objects stream as the current stream for parsing
*
Index: src/PdfTokenizer.h
===================================================================
--- src/PdfTokenizer.h (revision 1140)
+++ src/PdfTokenizer.h (working copy)
@@ -71,13 +71,13 @@
* \param[out] peType On true return, if not NULL the type of the read token
* will be stored into this parameter. Undefined on false
* return.
- *
+ *
* \returns True if a token was read, false if there are no
* more tokens to read.
*
* \see GetBuffer
*/
- bool GetNextToken( const char *& pszToken, EPdfTokenType* peType = NULL);
+ virtual bool GetNextToken( const char *& pszToken, EPdfTokenType* peType = NULL);
/** Reads the next token from the current file position
* ignoring all comments and compare the passed token