It turned out to be fairly simple to get the behavior I wanted. I'm attaching a patch which fixes the issue I brought up by making PdfTokenizer::GetNextToken virtual and overriding it in PdfContentsTokenizer.

- Mike Slegeir

Mike Slegeir wrote:
Hey Amin,

Good point, but I don't think the change I'm suggesting would affect your usage. I assume that, in order to handle streams one at a time, you're using the PdfContentsTokenizer constructor which accepts a const char* and a length rather than the PdfCanvas* constructor. If that's not the case, I'm not sure how you're able to detect stream boundaries as it is.

My suggestion is just to move the code at the top of PdfContentsTokenizer::ReadNext (the if(!gotToken) block) into a virtual PdfContentsTokenizer::GetNextToken method. If you're using the first constructor, m_lstContents will be empty and PdfContentsTokenizer will behave as before. Otherwise, if you construct with a PdfCanvas*, the stream transitions will be seamless (as they have been), but the tokenizer will now behave correctly when an object is split across Content streams.

I'm also curious whether your application could handle the PDF that I previously posted, where an array is split across the streams. Unfortunately, I don't think it would work, nor that it's really fixable in that case: you'd have to use the PdfCanvas* constructor of PdfContentsTokenizer, which my suggested change would fix.
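The point about split arrays can be illustrated with a small toy check. This is only a sketch and not PoDoFo code: OpenArraysAtEnd is a hypothetical helper, and it assumes tokens are whitespace-separated words rather than real PDF syntax. It tokenizes one stream in isolation and reports how many arrays are still open when that stream ends.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: tokenizes a single stream on its own and returns the
// array nesting depth left open at its end. A positive value means "[" was
// seen without a matching "]"; a negative value means the reverse.
int OpenArraysAtEnd(const std::string& stream)
{
    std::istringstream input(stream);
    std::string token;
    int depth = 0;
    while (input >> token)
    {
        if (token == "[")
            ++depth;
        else if (token == "]")
            --depth;
    }
    return depth;
}
```

For the two streams "[ 1 2" and "3 ] Tj", each one is unbalanced when examined alone, while the joined text "[ 1 2 3 ] Tj" is balanced. A tokenizer that only ever sees one buffer has no way to close the array.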

- Mike Slegeir

A. Massad wrote:
Hi Mike,

If you change the behavior of PdfContentsTokenizer::GetNextToken() to span across streams, could you please provide a flag to toggle this behavior? For some users (like me) it might be important to change back to the "old" behavior which DOES NOT span across streams.

I have got an application which parses through streams and replaces the content of each single stream without changing the overall structure of the streams. I think that this wouldn't be possible any longer if PdfContentsTokenizer::GetNextToken() did not detect stream boundaries anymore.

Thanks in advance!

Greetings,
Amin

On 26.08.2009, at 17:17, Mike Slegeir wrote:

I've discovered another related issue. PdfTokenizer is unable to reach into the next content stream in order to get a token, so any object which is split across Contents streams raises an UnexpectedEOF. My suggested solution is either to concatenate all the Content streams before doing any tokenization, or to make PdfTokenizer::GetNextToken virtual and move the stream-switching logic into PdfContentsTokenizer::GetNextToken, such that it will try the parent's version, attempt to move to the next stream (if one exists) on failure, and then retry. Attached is a very basic example of an array split between two streams.
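The second option can be sketched in isolation. What follows is a toy illustration under stated assumptions, not PoDoFo code: ToyTokenizer, ToyContentsTokenizer, SetBuffer, ReadArray and ReadFirstArray are hypothetical names, and tokens are plain whitespace-separated words. The idea it shows is that the array reader only ever calls the virtual GetNextToken, so the subclass can switch streams without the reader noticing.

```cpp
#include <deque>
#include <sstream>
#include <string>
#include <vector>

// Toy stand-in for a base tokenizer: splits one buffer on whitespace.
class ToyTokenizer
{
public:
    virtual ~ToyTokenizer() = default;

    // Returns false when the current buffer has no more tokens.
    virtual bool GetNextToken(std::string& token)
    {
        return static_cast<bool>(m_input >> token);
    }

    // Collects tokens up to the closing "]". Every read goes through the
    // virtual GetNextToken, so a subclass decides where tokens come from.
    bool ReadArray(std::vector<std::string>& items)
    {
        std::string token;
        while (GetNextToken(token))
        {
            if (token == "]")
                return true;
            items.push_back(token);
        }
        return false; // input ended before "]" was seen
    }

protected:
    void SetBuffer(const std::string& data)
    {
        m_input.clear(); // reset the EOF/fail flags left by the last read
        m_input.str(data);
    }

private:
    std::istringstream m_input;
};

// Toy stand-in for a contents tokenizer holding a list of pending streams.
class ToyContentsTokenizer : public ToyTokenizer
{
public:
    explicit ToyContentsTokenizer(const std::deque<std::string>& streams)
        : m_pending(streams) {}

    bool GetNextToken(std::string& token) override
    {
        // Try the parent's version; on failure move to the next stream
        // (if any) and retry.
        while (!ToyTokenizer::GetNextToken(token))
        {
            if (m_pending.empty())
                return false;
            SetBuffer(m_pending.front());
            m_pending.pop_front();
        }
        return true;
    }

private:
    std::deque<std::string> m_pending;
};

// Skips to the first "[" and reads that array; true if "]" was reached.
bool ReadFirstArray(const std::deque<std::string>& streams,
                    std::vector<std::string>& items)
{
    ToyContentsTokenizer tokenizer(streams);
    std::string token;
    while (tokenizer.GetNextToken(token))
    {
        if (token == "[")
            return tokenizer.ReadArray(items);
    }
    return false;
}
```

With the streams "[ 1 2" and "3 ] Tj", ReadFirstArray returns true and collects the three items 1, 2 and 3. With "[ 1 2" alone it returns false, which corresponds to the unexpected end of input described above. The retry is written as a loop here so that an empty stream is skipped; that is a choice made for this sketch.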

- Mike Slegeir

Mike Slegeir wrote:
I've resolved this issue in an admittedly hacky way. This may be sufficient for this problem though. Attached is a patch which fixes the issue. I've only done limited testing, but it does at least correct the issue.

- Mike Slegeir


When using PdfContentsTokenizer with a PDF that has an array for Contents rather than a single stream, the tokenizer resets its position to the beginning of the first stream upon exhausting a stream. A Contents array with contents X Y Z will appear as X X Y X Y Z to a user of the PdfContentsTokenizer. Attached is a PDF which has a Contents array. I can provide example code and output if necessary.

<split-array.pdf>

Podofo-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/podofo-users

Index: src/PdfContentsTokenizer.cpp
===================================================================
--- src/PdfContentsTokenizer.cpp	(revision 1140)
+++ src/PdfContentsTokenizer.cpp	(working copy)
@@ -76,9 +76,32 @@
     PdfBufferOutputStream stream( &m_curBuffer );
     pStream->GetFilteredCopy( &stream );
 
+    std::streamoff position = m_device.Device() ? m_device.Device()->Tell() : 0;
     m_device = PdfRefCountedInputDevice( m_curBuffer.GetBuffer(), m_curBuffer.GetSize() );
+    m_device.Device()->Seek(position);
 }
 
+bool PdfContentsTokenizer::GetNextToken( const char *& pszToken, EPdfTokenType* peType )
+{
+	bool gotToken = PdfTokenizer::GetNextToken( pszToken, peType );
+	if ( !gotToken )
+	{
+		if ( m_lstContents.size() )
+		{
+			// We ran out of tokens in this stream. Switch to the next stream
+			// and try again.
+			SetCurrentContentsStream( m_lstContents.front() );
+			m_lstContents.pop_front();
+			return PdfTokenizer::GetNextToken( pszToken, peType );
+		}
+		// No more content stream tokens to read.
+		return false;
+	}
+
+	// A token was read from the current stream.
+	return true;
+}
+
 bool PdfContentsTokenizer::ReadNext( EPdfContentsType& reType, const char*& rpszKeyword, PdfVariant & rVariant )
 {
     if (m_readingInlineImgData)
@@ -103,21 +126,7 @@
 
     bool gotToken = this->GetNextToken( pszToken, &eTokenType );
     if ( !gotToken )
-    {
-        if ( m_lstContents.size() )
-        {
-        // We ran out of tokens in this stream. Switch to the next stream
-        // and try again.
-            SetCurrentContentsStream( m_lstContents.front() );
-            m_lstContents.pop_front();
-            return ReadNext( reType, rpszKeyword, rVariant );
-        }
-        else
-        {
-            // No more content stream tokens to read.
-            return false;
-        }
-    }
+		return false;
 
     eDataType = this->DetermineDataType( pszToken, eTokenType, rVariant );
 
Index: src/PdfContentsTokenizer.h
===================================================================
--- src/PdfContentsTokenizer.h	(revision 1140)
+++ src/PdfContentsTokenizer.h	(working copy)
@@ -100,6 +100,8 @@
      */
     bool ReadNext( EPdfContentsType& reType, const char*& rpszKeyword, PoDoFo::PdfVariant & rVariant );
 
+    bool GetNextToken( const char *& pszToken, EPdfTokenType* peType = NULL);
+
  private:
     /** Set another objects stream as the current stream for parsing
      *
Index: src/PdfTokenizer.h
===================================================================
--- src/PdfTokenizer.h	(revision 1140)
+++ src/PdfTokenizer.h	(working copy)
@@ -71,13 +71,13 @@
      *  \param[out] peType On true return, if not NULL the type of the read token
      *                     will be stored into this parameter. Undefined on false
      *                     return.
-     * 
+     *
      *  \returns           True if a token was read, false if there are no
      *                     more tokens to read.
      *
      *  \see GetBuffer
      */
-    bool GetNextToken( const char *& pszToken, EPdfTokenType* peType = NULL);
+    virtual bool GetNextToken( const char *& pszToken, EPdfTokenType* peType = NULL);
 
     /** Reads the next token from the current file position
      *  ignoring all comments and compare the passed token