> From: Erick Papadakis [mailto:[EMAIL PROTECTED]]
>
> Hi,
>
> I need to do a simple thing: read a binary
> file (e.g. Microsoft Word, Excel) and extract
> only the text from it. I am using plain
> fopen() and fread(), and when I print out the
> contents of the file I get the text, but along
> with it there is some junk, probably because
> the file is binary.
>
> Is it possible to specify with a regexp that I
> only want certain ASCII characters from the binary
> stream? Here is what I would use in Perl:
>
> /([\040-\176\s]{3,})/g
>
> I only want runs of at least 3 characters, made up
> of the ASCII characters from octal 040 to 176
> (decimal 32 to 126, space through '~') plus whitespace.
>
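A quick PHP check of what the two escapes in that character class
stand for. In PCRE, as in Perl, \040 and \176 are octal escapes, so
the range covers decimal 32 (space) through 126 ('~'), which is the
printable ASCII set:

```php
<?php
// The Perl/PCRE escapes \040 and \176 are octal, not decimal.
$low  = octdec('040'); // 4 * 8 = 32
$high = octdec('176'); // 64 + 56 + 6 = 126

// Print the decimal codes and the characters they stand for.
// Prints: 32 126 ' ' '~'
printf("%d %d '%s' '%s'\n", $low, $high, chr($low), chr($high));
```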
You can use the same regex in PHP as in Perl. Try it here:
http://www.php.comzept.de/rexpr/index.php4
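PHP's preg functions have no /g modifier; the equivalent of a
global match is preg_match_all(), which collects every matching
run into an array. Read the whole file into one string with
file_get_contents() and run preg_match_all() on it. (preg_grep()
on an array from file() would only pick out whole lines that
contain a match, junk bytes included.) Want a single string?
implode() the matches.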
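A minimal sketch of the whole job, under a few assumptions: the
function names extract_text() and extract_text_from_file() are
hypothetical, the file is read in one go with file_get_contents()
(binary-safe, no line splitting), and the text is stored as
single-byte characters (text stored as UTF-16 has a zero byte after
each letter and would not match this pattern). It uses
preg_match_all(), which returns each matched run rather than
filtering whole lines the way preg_grep() does:

```php
<?php
/**
 * Hypothetical helper: return every run of printable ASCII
 * (octal 040-176) or whitespace that is at least $min bytes long.
 */
function extract_text(string $data, int $min = 3): array
{
    // Same character class as the Perl pattern. In a single-quoted
    // PHP string the backslashes reach PCRE unchanged.
    $pattern = '/[\040-\176\s]{' . $min . ',}/';

    // preg_match_all() finds every match, like Perl's /g.
    preg_match_all($pattern, $data, $matches);
    return $matches[0];
}

/**
 * Hypothetical wrapper: read a file as raw bytes and return the
 * extracted runs as one string, one run per line.
 */
function extract_text_from_file(string $path, int $min = 3): string
{
    $data = file_get_contents($path);
    if ($data === false) {
        throw new RuntimeException("Cannot read $path");
    }
    return implode("\n", extract_text($data, $min));
}
```

Joining with an empty string would glue neighbouring runs together
into one word, so this sketch separates them with a newline.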
Joerg Krause
******************************************************
E-Mail: [EMAIL PROTECTED] Info: www.joerg.krause.net
German Reference Handbook: www.php.comzept.de/referenz
******************************************************
--
PHP General Mailing List (http://www.php.net/)