AW: [PHP] Parsing pdf file

2005-02-09 Thread Mirco Blitz
Hi,
Sorry, I didn't really find anything useful there.

Greetings
Mirco Blitz

-----Original Message-----
From: Matt M. [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, February 9, 2005 22:09
To: Mirco Blitz
Cc: php-general@lists.php.net
Subject: Re: [PHP] Parsing pdf file

> For a customer's project I need to know whether a PDF file contains
> special functions and buttons.
>
> Is there a way to parse a PDF file in PHP?

you might be able to find something at

http://us3.php.net/pdf
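
For the original question (does a PDF contain form functions or buttons), a rough first check is to scan the raw file bytes for the PDF name tokens that interactive features normally use: /AcroForm for an interactive form, /Btn for button fields, /JavaScript for script actions. The sketch below is only one assumed way to do that. pdfFeatureMarkers() is a hypothetical helper, not a function of PHP's PDF extension, and a plain byte scan will miss tokens that sit inside compressed object streams.

```php
<?php
// Hypothetical helper: reports which interactive-feature tokens
// appear anywhere in the raw bytes of a PDF file.
function pdfFeatureMarkers(string $bytes): array
{
    // PDF name tokens associated with forms, buttons, and scripts.
    $markers = [
        'form'       => '/AcroForm',
        'button'     => '/Btn',
        'javascript' => '/JavaScript',
    ];
    $found = [];
    foreach ($markers as $label => $token) {
        $found[$label] = strpos($bytes, $token) !== false;
    }
    return $found;
}

// Inline fragment standing in for file_get_contents($pathToPdf).
$sample = "1 0 obj\n<< /Type /Catalog /AcroForm 2 0 R >>\nendobj\n"
        . "3 0 obj\n<< /FT /Btn /T (Submit) >>\nendobj\n";
print_r(pdfFeatureMarkers($sample));
```

A hit only says the token is present somewhere in the file, so treat it as a hint to look closer rather than as proof.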

--
PHP General Mailing List (http://www.php.net/) To unsubscribe, visit:
http://www.php.net/unsub.php




AW: [PHP] Parsing pdf file

2005-02-09 Thread Mirco Blitz
Thank you for that big piece of code. I will try it.

Greetings
Mirco Blitz 

-----Original Message-----
From: Matt M. [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, February 9, 2005 22:39
To: Mirco Blitz
Cc: php-general@lists.php.net
Subject: Re: [PHP] Parsing pdf file

did you try this?


<?php
$test = pdf2string('pathtoPDFfile');  # placeholder: path to the PDF file
echo $test;

# Returns a -1 if uncompression failed
function pdf2string($sourcefile)
{
   $fp = fopen($sourcefile, 'rb');
   $content = fread($fp, filesize($sourcefile));
   fclose($fp);

   # Locate all text hidden within the stream and endstream tags
   $searchstart = 'stream';
   $searchend = 'endstream';
   $pdfdocument = "";

   $pos = 0;
   $pos2 = 0;
   $startpos = 0;
   # Iterate through each stream block
   while( $pos !== false && $pos2 !== false )
   {
      # Grab beginning and end tag locations if they have not yet been parsed
      $pos = strpos($content, $searchstart, $startpos);
      $pos2 = strpos($content, $searchend, $startpos + 1);
      if( $pos !== false && $pos2 !== false )
      {
         # Extract compressed text from between stream tags and uncompress
         $textsection = substr($content, $pos + strlen($searchstart) + 2,
                               $pos2 - $pos - strlen($searchstart) - 1);
         $data = @gzuncompress($textsection);
         # Clean up text via a special function
         $data = ExtractText($data);
         # Increase our PDF pointer past the section we just read
         $startpos = $pos2 + strlen($searchend) - 1;
         if( $data === false ) { return -1; }
         $pdfdocument = $pdfdocument . $data;
      }
   }

   return $pdfdocument;
}

function ExtractText($postScriptData)
{
   # Start with empty output and search from the beginning of the data
   $plainText = '';
   $textStart = 0;
   while( (($textStart = strpos($postScriptData, '(', $textStart)) &&
           ($textEnd = strpos($postScriptData, ')', $textStart + 1)) &&
           substr($postScriptData, $textEnd - 1) != '\\') )
   {
      $plainText .= substr($postScriptData, $textStart + 1,
                           $textEnd - $textStart - 1);
      # This adds quite some additional spaces between the words
      if( substr($postScriptData, $textEnd + 1, 1) == ']' )
      {
         $plainText .= ' ';
      }

      $textStart = $textStart < $textEnd ? $textEnd : $textStart + 1;
   }

   return stripslashes($plainText);
}
?>
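
The snippet above locates each stream by fixed offsets and assumes every stream is zlib-compressed. As a rough alternative sketch (not the snippet author's method), the stream bodies could be collected with a regular expression, silently skipping anything gzuncompress() cannot decode, such as images or fonts. inflatePdfStreams() is a hypothetical name, and the pattern assumes each stream body is delimited by a line break after "stream" and before "endstream".

```php
<?php
// Hypothetical helper: returns the inflated contents of every
// stream in the raw PDF bytes that zlib is able to decode.
function inflatePdfStreams(string $bytes): array
{
    $streams = [];
    // Capture the bytes between "stream" + EOL and EOL + "endstream".
    if (preg_match_all('/stream\r?\n(.*?)\r?\nendstream/s', $bytes, $matches)) {
        foreach ($matches[1] as $raw) {
            $inflated = @gzuncompress($raw);
            if ($inflated !== false) {   // skip data that is not zlib-compressed
                $streams[] = $inflated;
            }
        }
    }
    return $streams;
}

// Tiny stand-in for a PDF: one compressed stream, one that is not.
$payload = gzcompress("BT (Hello) Tj ET");
$sample  = "4 0 obj\n<< /Length " . strlen($payload) . " >>\nstream\n"
         . $payload . "\nendstream\nendobj\n"
         . "5 0 obj\n<< /Length 3 >>\nstream\nabc\nendstream\nendobj\n";
print_r(inflatePdfStreams($sample));
```

Each returned string could then be handed to a cleanup routine such as ExtractText() from the snippet above.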




Re: AW: [PHP] Parsing pdf file

2005-02-09 Thread Jason Barnett
Mirco Blitz wrote:
> Thank you for that big piece of code. I will try it.
> Greetings
> Mirco Blitz
> -----Original Message-----
> From: Matt M. [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, February 9, 2005 22:39
> To: Mirco Blitz
> Cc: php-general@lists.php.net
> Subject: Re: [PHP] Parsing pdf file
> did you try this?
...
Code worked fine for me as well, thanks for that extremely useful
snippet!  I ran it on a test .pdf document and it pulled everything out.
<plans>In fact, this snippet gives me almost exactly the missing
functionality that I needed to work on a new project of mine!</plans>
--
Teach a man to fish...
NEW? | http://www.catb.org/~esr/faqs/smart-questions.html
STFA | http://marc.theaimsgroup.com/?l=php-general&w=2
STFM | http://www.php.net/manual/en/index.php
STFW | http://www.google.com/search?q=php
LAZY |
http://mycroft.mozdev.org/download.html?name=PHP&submitform=Find+search+plugins

