> From: Erick Papadakis [mailto:[EMAIL PROTECTED]]
> 
> Hi,
> 
> I need to do a simple thing. I want to read a binary
> file (e.g., microsoft word, excel etc) and then
> extract only the text from it. I am using simple
> fopen() and fread() and when I print out the contents
> of the file, it returns me the text but apart from the
> text, there is some junk which is probably because of
> the file being binary. 
> 
> Is it possible through the regexp to specify that I
> only want some of the ASCII characters from the binary
> stream? Here is the perl equivalent: 
> 
>     /([\040-\176\s]{3,})/g
> 
> I want only those words that are at least 3 characters
> long, and I want the characters to match the ASCII codes
> from octal 040 to 176 (decimal 32 to 126). 
> 
You can use the same regex in PHP as in Perl. Try it out here:

http://www.php.comzept.de/rexpr/index.php4

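PHP has no /g option. Instead, read the file into an array with
file(), then filter that array with preg_grep() and the regex to
keep only the lines that match. Note that preg_grep() returns whole
lines; to pull out just the matched runs, preg_match_all() is the
closer counterpart to /g. Want a single string? Use implode() with
an empty delimiter.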
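
As a rough sketch of the above (not tested against real Word or Excel files), the following shows both routes on an in-memory sample. The helper name extract_text_runs() and the sample bytes are made up for illustration; in practice the input would come from file() or file_get_contents().

```php
<?php
// Hypothetical helper: returns every run of at least $minLen
// printable-ASCII or whitespace bytes found in $data.
function extract_text_runs(string $data, int $minLen = 3): array
{
    // \040-\176 are octal escapes: space (32) through tilde (126).
    $pattern = '/[\040-\176\s]{' . $minLen . ',}/';
    preg_match_all($pattern, $data, $matches);
    return $matches[0]; // index 0 holds the full matches
}

// Made-up binary sample: text runs separated by non-printable bytes.
$sample = "\x00\x01Hello, world\x02\xffab\x03Second run\x00";
$runs = extract_text_runs($sample);
print_r($runs);

// preg_grep() route: keeps whole array elements that contain a match,
// junk bytes included, and preserves the original array keys.
$lines = ["\x00\x01\x02", "\x00Title page\x01", "ok\x05"];
$kept = preg_grep('/[\040-\176\s]{3,}/', $lines);

// Join the extracted runs into one string, here with a newline between them.
$text = implode("\n", $runs);
```

The preg_match_all() route returns only the readable runs, while preg_grep() is mainly useful for discarding lines that contain no readable text at all.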

Joerg Krause
******************************************************
E-Mail: [EMAIL PROTECTED] Info:    www.joerg.krause.net
German Reference Handbook: www.php.comzept.de/referenz
******************************************************
