Re: [Podofo-users] PdfContentsTokenizer position is reset with multiple streams

Mike Slegeir Wed, 26 Aug 2009 11:23:48 -0700

It looks like it was fairly simple to accomplish my desired behavior. I'm attaching a patch which fixes the issue I brought up by virtually overriding PdfTokenizer::GetNextToken in PdfContentsTokenizer.


- Mike Slegeir


Mike Slegeir wrote:

Hey Amin,
Good point, but I don't think the change I'm suggesting would affect your usage. I assume in order to handle streams one at a time, you're using the PdfContentsTokenizer which accepts a const char* and a length rather than the PdfCanvas* constructor. If that's not the case, I'm not sure how you're able to detect stream boundaries as is.
My suggestion is just to move the code at the top of PdfContentsTokenizer::ReadNext (the if(!gotToken) block) into a virtual PdfContentsTokenizer::GetNextToken method. If you're using the first constructor, m_lstContents will be empty and PdfContentsTokenizer will behave as before. Otherwise, if you construct with a PdfCanvas*, the stream transitions will be seamless (as they have been), but it will now behave correctly when an object is split across Content streams.
I'm also curious if your application could handle the PDF that I previously posted where an array is split across the streams. Unfortunately, though, I don't think that it would work nor that it's really fixable in that case: you'd just have to use the PdfCanvas* PdfContentsTokenizer which could be fixed by my suggested change.
- Mike Slegeir

A. Massad wrote:
Hi Mike,
If you change the behavior of PdfContentsTokenizer::GetNextToken() to span across streams, could you please provide a flag to toggle this behavior? For some users (like me) it might be important to change back to the "old" behavior which DOES NOT span across streams.
I have got an application which parses through streams and replaces the content of each single stream without changing the overall structure of the streams. I think that this wouldn't be possible any longer if PdfContentsTokenizer::GetNextToken() did not detect stream boundaries anymore.
Thanks in advance!

Greetings,
Amin

On 26.08.2009, at 17:17, Mike Slegeir wrote:
I've discovered another related issue. PdfTokenizer is unable to reach into the next content stream in order to get a token. So any objects which are split across Contents have an UnexpectedEOF raised. My suggested solution to the problem is to either concatenate all the Content streams before doing any tokenization or to make PdfTokenizer::GetNextToken virtual and move the stream switching logic into PdfContentsTokenizer::GetNextToken such that it will try the parent's version, attempt to move to the next stream (if it exists) on failure, then retry. Attached is a very basic example of an array split between two streams.
- Mike Slegeir
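
The second option (virtual GetNextToken with retry on the next stream) can be illustrated with a small standalone sketch. None of this is PoDoFo code: ToyTokenizer, ToyContentsTokenizer, ReadArray and SetBuffer are hypothetical names, tokens are simply whitespace-separated words, and the retry is written as a loop so that empty streams are skipped, which goes slightly beyond a single retry.

```cpp
#include <deque>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical base tokenizer: reads whitespace-separated tokens from one buffer.
class ToyTokenizer {
public:
    virtual ~ToyTokenizer() = default;

    // Returns false when the current buffer has no more tokens.
    virtual bool GetNextToken(std::string& token) {
        return static_cast<bool>(m_input >> token);
    }

    // Reads "[ a b c ]" through GetNextToken. Because the call is virtual,
    // a derived class can supply more input in the middle of the array.
    bool ReadArray(std::vector<std::string>& out) {
        std::string tok;
        if (!GetNextToken(tok) || tok != "[") return false;
        while (GetNextToken(tok)) {
            if (tok == "]") return true;
            out.push_back(tok);
        }
        return false;  // input ended before the closing bracket
    }

protected:
    // Replaces the current buffer and clears any end-of-input state.
    void SetBuffer(const std::string& data) {
        m_input.clear();
        m_input.str(data);
    }

private:
    std::istringstream m_input;
};

// Hypothetical contents tokenizer: owns a queue of streams and moves on to
// the next one whenever the parent's GetNextToken runs out of input.
class ToyContentsTokenizer : public ToyTokenizer {
public:
    explicit ToyContentsTokenizer(std::deque<std::string> streams)
        : m_streams(std::move(streams)) {}

    bool GetNextToken(std::string& token) override {
        // Try the parent's version; on failure switch streams and retry.
        while (!ToyTokenizer::GetNextToken(token)) {
            if (m_streams.empty()) return false;  // nothing left to read
            SetBuffer(m_streams.front());
            m_streams.pop_front();
        }
        return true;
    }

private:
    std::deque<std::string> m_streams;
};
```

With the streams "[ 1 2" and "3 ]", ReadArray on a ToyContentsTokenizer collects 1, 2 and 3. If GetNextToken were not virtual, ReadArray would call the base version directly and fail at the end of the first stream.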

Mike Slegeir wrote:
I've resolved this issue in an admittedly hacky way. This may be sufficient for this problem though. Attached is a patch which fixes the issue. I've only done limited testing, but it does at least correct the issue.
- Mike Slegeir
When using PdfContentsTokenizer with a PDF with an array for Contents rather than a single stream, the tokenizer will reset its position to the beginning of the first stream upon exhausting a stream. A Contents array with contents X Y Z will appear as X X Y X Y Z to a user of the PdfContentsTokenizer. Attached is a PDF which has a Contents array. I can provide example code and output if necessary.
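
The X X Y X Y Z pattern can be reproduced with a toy model. This is only a sketch under an assumption read off the attached patch: each new stream is appended to one growing buffer, and the read position restarts at offset 0 unless it is saved and restored. ToyContentsReader and its members are hypothetical names, not PoDoFo API, and each character stands in for one token.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reader: every stream is appended to a single buffer.
class ToyContentsReader {
public:
    ToyContentsReader(std::vector<std::string> streams, bool keepPosition)
        : m_streams(std::move(streams)), m_keepPosition(keepPosition) {}

    // Returns every token the caller would observe, in order.
    std::string ReadAll() {
        std::string seen;
        for (const std::string& stream : m_streams) {
            m_buffer += stream;  // the buffer keeps the earlier streams too
            if (!m_keepPosition)
                m_pos = 0;       // reported behavior: position restarts at 0
            while (m_pos < m_buffer.size())
                seen += m_buffer[m_pos++];
        }
        return seen;
    }

private:
    std::vector<std::string> m_streams;
    bool m_keepPosition;
    std::string m_buffer;
    std::size_t m_pos = 0;
};
```

With streams X, Y and Z, a reader built with keepPosition set to false observes XXYXYZ, while one built with keepPosition set to true observes XYZ.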
<split-array.pdf>
Podofo-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/podofo-users

Index: src/PdfContentsTokenizer.cpp
===================================================================
--- src/PdfContentsTokenizer.cpp	(revision 1140)
+++ src/PdfContentsTokenizer.cpp	(working copy)
@@ -76,9 +76,32 @@
     PdfBufferOutputStream stream( &m_curBuffer );
     pStream->GetFilteredCopy( &stream );
 
+    std::streamoff position = m_device.Device() ? m_device.Device()->Tell() : 0;
     m_device = PdfRefCountedInputDevice( m_curBuffer.GetBuffer(), m_curBuffer.GetSize() );
+    m_device.Device()->Seek(position);
 }
 
+bool PdfContentsTokenizer::GetNextToken( const char *& pszToken, EPdfTokenType* peType )
+{
+	bool gotToken = PdfTokenizer::GetNextToken( pszToken, peType );
+	if ( !gotToken )
+	{
+		if ( m_lstContents.size() )
+		{
+			// We ran out of tokens in this stream. Switch to the next stream
+			// and try again.
+			SetCurrentContentsStream( m_lstContents.front() );
+			m_lstContents.pop_front();
+			return PdfTokenizer::GetNextToken( pszToken, peType );
+		}
+		// No more content stream tokens to read.
+		return false;
+	}
+
+	// The parent tokenizer produced a token.
+	return true;
+}
+
 bool PdfContentsTokenizer::ReadNext( EPdfContentsType& reType, const char*& rpszKeyword, PdfVariant & rVariant )
 {
     if (m_readingInlineImgData)
@@ -103,21 +126,7 @@
 
     bool gotToken = this->GetNextToken( pszToken, &eTokenType );
     if ( !gotToken )
-    {
-        if ( m_lstContents.size() )
-        {
-        // We ran out of tokens in this stream. Switch to the next stream
-        // and try again.
-            SetCurrentContentsStream( m_lstContents.front() );
-            m_lstContents.pop_front();
-            return ReadNext( reType, rpszKeyword, rVariant );
-        }
-        else
-        {
-            // No more content stream tokens to read.
-            return false;
-        }
-    }
+		return false;
 
     eDataType = this->DetermineDataType( pszToken, eTokenType, rVariant );
 
Index: src/PdfContentsTokenizer.h
===================================================================
--- src/PdfContentsTokenizer.h	(revision 1140)
+++ src/PdfContentsTokenizer.h	(working copy)
@@ -100,6 +100,8 @@
      */
     bool ReadNext( EPdfContentsType& reType, const char*& rpszKeyword, PoDoFo::PdfVariant & rVariant );
 
+    bool GetNextToken( const char *& pszToken, EPdfTokenType* peType = NULL);
+
  private:
     /** Set another objects stream as the current stream for parsing
      *
Index: src/PdfTokenizer.h
===================================================================
--- src/PdfTokenizer.h	(revision 1140)
+++ src/PdfTokenizer.h	(working copy)
@@ -71,13 +71,13 @@
      *  \param[out] peType On true return, if not NULL the type of the read token
      *                     will be stored into this parameter. Undefined on false
      *                     return.
-     * 
+     *
      *  \returns           True if a token was read, false if there are no
      *                     more tokens to read.
      *
      *  \see GetBuffer
      */
-    bool GetNextToken( const char *& pszToken, EPdfTokenType* peType = NULL);
+    virtual bool GetNextToken( const char *& pszToken, EPdfTokenType* peType = NULL);
 
     /** Reads the next token from the current file position
      *  ignoring all comments and compare the passed token
