Re: [PHP] How to fetch .DOC or .DOCX file in php

2008-12-05 Thread tedd

At 10:59 AM -0800 12/4/08, Jim Lucas wrote:

Ah, not true about the MS requirement.  If all you want is the clear/clean
text (without any formatting), then I can do it with php on any platform.

If this is what is needed, here is the code to do it.

<?php

-snip- code

?>

Hope this helps.
--
Jim Lucas



Jim:

Most excellent code!

I was considering a way for clients to post directly from MS Word
into a form requiring text only. No matter how many times I tell them
"text only," they keep cutting and pasting directly from a formatted Word
document and then wonder, "Where did those characters come from?"
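Here is a minimal sketch of one way to handle that on the receiving end, assuming the form posts UTF-8. It maps the typographic characters Word substitutes (curly quotes, dashes, ellipsis, non-breaking space) to ASCII, then drops whatever else falls outside printable ASCII. The function name `word_to_plain()` and the character map are illustrative assumptions, not part of the thread.

```php
<?php
// Hypothetical helper for a text-only form field; assumes UTF-8 input.
function word_to_plain(string $s): string
{
    // Characters Word typically substitutes while you type
    $map = [
        "\u{2018}" => "'",  "\u{2019}" => "'",   // curly single quotes
        "\u{201C}" => '"',  "\u{201D}" => '"',   // curly double quotes
        "\u{2013}" => '-',  "\u{2014}" => '--',  // en dash, em dash
        "\u{2026}" => '...',                     // ellipsis
        "\u{00A0}" => ' ',                       // non-breaking space
    ];
    $s = strtr($s, $map);

    // Drop every remaining byte outside printable ASCII, keeping tab, LF and CR
    return preg_replace('/[^\x09\x0A\x0D\x20-\x7E]/', '', $s);
}
```

The final step is lossy (an accented "é" disappears entirely), which is the trade-off a strictly text-only field accepts.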


Thanks,

tedd

--
---
http://sperling.com  http://ancientstones.com  http://earthstones.com

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] How to fetch .DOC or .DOCX file in php

2008-12-05 Thread tedd

At 6:39 PM -0600 12/4/08, Shawn McKenzie wrote:

Jim Lucas wrote:

  Hope this helps.


 I am working on a set of php classes that will be able to read the 
text with the formatting included and convert it to a standard 
document format.

 The standard format that it will end up in has yet


has yet...  what?

Are you O.K. Jim?  Did you die while writing this?



If he did, at least we got the code.  :-)

Cheers,

tedd



Re: [PHP] How to fetch .DOC or .DOCX file in php

2008-12-05 Thread tedd

At 7:35 PM -0800 12/4/08, Jim Lucas wrote:
A question to all then.  How would you like to see the text, with 
formatting, stored?


All suggestions welcome!

--
Jim Lucas



Jim:

What's wrong with .txt?

Cheers,

tedd




Re: [PHP] How to fetch .DOC or .DOCX file in php

2008-12-05 Thread Andrew Ballard
On Thu, Dec 4, 2008 at 10:35 PM, Jim Lucas [EMAIL PROTECTED] wrote:
 A question to all then.  How would you like to see the text, with formatting,
 stored?

 All suggestions welcome!

 --
 Jim Lucas


It's an excellent start. It pulled in some additional control
characters in some of the documents I tried, and some documents had
extra stuff at the end of the document. It was still text, but it
looked like the text from the page header/footer definitions. It would
be cool to see this polished and released. I just wish there was
something this basic that worked this well on PDF files! :-)
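Here is a sketch of a cleanup pass for those extra control characters. It assumes the text came out one byte per character, as in Jim's snippet. It relies on how the Word binary format marks things in the text stream: `\r` ends a paragraph, `\x0B` is a manual line break, `\x0C` a page break, `\x07` a table cell/row mark, and `\x13`/`\x14`/`\x15` begin, separate, and end a field. `clean_word_text()` is a hypothetical name, and nested fields are not handled.

```php
<?php
// Hypothetical helper: tidy text pulled from a .doc text stream.
// Assumes 8-bit text and non-nested fields.
function clean_word_text(string $text): string
{
    // Field codes: 0x13 begin, 0x14 separator, 0x15 end.
    // Drop the field instruction (e.g. HYPERLINK ...) but keep the displayed result.
    $text = preg_replace('/\x13[^\x13\x14\x15]*\x14/', '', $text);
    // A field with no separator has no visible result; drop it entirely.
    $text = preg_replace('/\x13[^\x13\x14\x15]*\x15/', '', $text);

    // Paragraph marks and manual line breaks become newlines,
    // cell marks become tabs, page breaks become a blank line.
    $text = strtr($text, ["\r" => "\n", "\x0B" => "\n", "\x07" => "\t", "\x0C" => "\n\n"]);

    // Remove any other control characters (including stray 0x15 field ends),
    // keeping only tab and newline.
    return preg_replace('/[\x00-\x08\x0E-\x1F\x7F]/', '', $text);
}
```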

Andrew




Re: [PHP] How to fetch .DOC or .DOCX file in php

2008-12-05 Thread Jim Lucas
Andrew Ballard wrote:
 
 It's an excellent start. It pulled in some additional control
 characters in some of the documents I tried, and some documents had
 extra stuff at the end of the document. It was still text, but it
 looked like the text from the page header/footer definitions. It would
 be cool to see this polished and released. I just wish there was
 something this basic that worked this well on PDF files! :-)
 
 Andrew
 

Ah, that's a part I hadn't thought to check: whether the header/footer text
would be included.  Like I said, this is just the starting point.

I will continue hacking at this thing.  Hopefully something good will come of 
it... :)
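Here is one possible way to leave the header/footer text out. Treat it as an untested assumption about the Word 97-2003 format, not a confirmed fix. In that format, the text stream stores the main document first, followed by footnotes, headers/footers and other sub-documents. The FIB (the header structure the snippet reads from) records the main document's character count, ccpText, at FIB offset 0x4C. If the FIB starts at file offset 0x200, which is common but not guaranteed, that count sits at file offset 0x24C. The sketch trims the extracted text to that count. It assumes one byte per character, and `doc_main_text()` is a hypothetical name.

```php
<?php
// Hypothetical helper. Assumes the FIB starts at file offset 0x200 and that
// text is stored one byte per character, so ccpText is at file offset 0x24C.
function doc_main_text(string $headers, string $allText): string
{
    if (strlen($headers) < 0x250) {
        return $allText; // header too short to hold ccpText; leave text as-is
    }
    // 'V' = unsigned 32-bit little-endian integer
    $ccpText = unpack('V', substr($headers, 0x24C, 4))[1];

    // Keep only the main document; footnotes, headers/footers etc. follow it
    return substr($allText, 0, $ccpText);
}
```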

-- 
Jim Lucas

   Some men are born to greatness, some achieve greatness,
   and some have greatness thrust upon them.

Twelfth Night, Act II, Scene V
by William Shakespeare




Re: [PHP] How to fetch .DOC or .DOCX file in php

2008-12-05 Thread Eric Butera
On Thu, Dec 4, 2008 at 10:35 PM, Jim Lucas [EMAIL PROTECTED] wrote:
 I was going to say that I haven't yet decided on what the final output format 
 is going to be.  Probably either rtf or OpenXML.

 How about I ask for suggestions on what would be the best format to store the 
 final copy.

 I figured that this tool would mainly be used for .doc to web conversion, but 
 I guess it could be used to also convert to other document formats too.

 But, I would like to have the ability to at least store the formatting inline
 with the text.  So, either some form of xml.  Be it (x)HTML or plain XML
 or even OpenXML.

 A question to all then.  How would you like to see the text, with formatting,
 stored?

 All suggestions welcome!




Is there a way to make it so that additional output renderers could be
created?  I'd lean towards XML, though, since that can be parsed fairly
easily.
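Here is a sketch of how pluggable renderers might look. The extractor produces a list of paragraphs, and each renderer turns that list into one output format. The interface, the class names and the `<document>`/`<para>` element names are all hypothetical.

```php
<?php
// Hypothetical pluggable output renderers: each takes the same list of
// paragraphs and produces a different format.
interface DocRenderer
{
    /** @param string[] $paragraphs */
    public function render(array $paragraphs): string;
}

class PlainTextRenderer implements DocRenderer
{
    public function render(array $paragraphs): string
    {
        // Separate paragraphs with a blank line
        return implode("\n\n", $paragraphs);
    }
}

class XmlRenderer implements DocRenderer
{
    public function render(array $paragraphs): string
    {
        $xml = "<document>\n";
        foreach ($paragraphs as $p) {
            // ENT_XML1 escapes &, <, > and quotes for XML output
            $xml .= '  <para>'
                . htmlspecialchars($p, ENT_QUOTES | ENT_XML1 | ENT_SUBSTITUTE, 'UTF-8')
                . "</para>\n";
        }
        return $xml . "</document>\n";
    }
}

// The caller picks the format by passing a renderer
function render_document(array $paragraphs, DocRenderer $renderer): string
{
    return $renderer->render($paragraphs);
}
```

Adding RTF or (x)HTML output would then mean writing one more class that implements `DocRenderer`, without touching the extraction code.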




RE: [PHP] How to fetch .DOC or .DOCX file in php

2008-12-04 Thread Boyd, Todd M.
 -Original Message-
 From: Jagdeep Singh [mailto:[EMAIL PROTECTED]
 Sent: Thursday, December 04, 2008 8:39 AM
 To: php-general@lists.php.net
 Subject: [PHP] How to fetch .DOC or .DOCX file in php
 Importance: Low
 
 Hi !
 
 I want to fetch text from a .doc / .docx file and save it into a database.
 But when I tried to fetch the text with fopen/fgets etc., it gave me
 special characters along with the text.
 
 (With .txt files everything is fine.)
 The only problem is with doc/docx files.
 I don't know how to remove the SPECIAL CHARACTERS from this text...

A.) This has been handled on this list several times. Please search the
archives before posting a question.
B.) Did you even TRY to Google for this? In the first 5 matches for "php
open ms word" I found this:

http://www.developertutorials.com/blog/php/extracting-text-from-word-documents-via-php-and-com-81/

You will need an MS Windows machine for this solution to work. If you're
using *nix... well... good luck.


// Todd




Re: [PHP] How to fetch .DOC or .DOCX file in php

2008-12-04 Thread Jim Lucas
Boyd, Todd M. wrote:
 You will need an MS Windows machine for this solution to work. If you're
 using *nix... well... good luck.
 
 
 // Todd
 

Ah, not true about the MS requirement.  If all you want is the clear/clean
text (without any formatting), then I can do it with php on any platform.

If this is what is needed, here is the code to do it.

<?php

$filename = './12345.doc';
if ( file_exists($filename) ) {

  # Open in binary mode ('rb') so no line-ending translation happens
  if ( ($fh = fopen($filename, 'rb')) !== false ) {

    # Read (and skip past) the first 0xA00 bytes; the text follows them
    $headers = fread($fh, 0xA00);

    # Bytes 0x21C-0x21F form a 4-byte little-endian number;
    # the four terms below add up to that number minus 0x801.

    # 1 = (ord(n)*1) ; Document has from 0 to 255 characters
    $n1 = ( ord($headers[0x21C]) - 1 );

    # 1 = ((ord(n)-8)*256) ; Document has from 256 to 63743 characters
    $n2 = ( ( ord($headers[0x21D]) - 8 ) * 256 );

    # 1 = ((ord(n)*256)*256) ; Document has from 63744 to 16775423 characters
    $n3 = ( ( ord($headers[0x21E]) * 256 ) * 256 );

    # (((ord(n)*256)*256)*256) ; Document has from 16775424 to 4294965504 characters
    $n4 = ( ( ( ord($headers[0x21F]) * 256 ) * 256 ) * 256 );

    # Total length of text in the document
    $textLength = ($n1 + $n2 + $n3 + $n4);

    # fread() needs a positive length
    $extracted_plaintext = ( $textLength > 0 ) ? fread($fh, $textLength) : '';
    fclose($fh);

    # if you want the plain text with no formatting, do this
    echo $extracted_plaintext;

    # if you want to see your paragraphs in a web page, use this instead
    # echo nl2br($extracted_plaintext);

  }

}

?>

Hope this helps.
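The four `$n` terms rebuild the 4-byte little-endian number stored at offsets 0x21C-0x21F and subtract 0x801 (1 from the low byte, plus 8 * 256 = 0x800 from the next byte). Here is a sketch of the same calculation with `unpack()`. `doc_text_length()` is a hypothetical helper name, and it validates the file no more than the original does.

```php
<?php
// Hypothetical helper: same arithmetic as the $n1..$n4 version above.
function doc_text_length(string $headers): int
{
    if (strlen($headers) < 0x220) {
        return 0; // too short to hold the 4 bytes at 0x21C-0x21F
    }
    // 'V' = unsigned 32-bit little-endian integer
    $value = unpack('V', substr($headers, 0x21C, 4))[1];

    // 0x801 = 1 (taken off the low byte) + 8 * 256 (taken off the next byte)
    return max(0, $value - 0x801);
}
```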

I am working on a set of php classes that will be able to read the text with 
the formatting included and convert it to a standard document format.
The standard format that it will end up in has yet

-- 
Jim Lucas




Re: [PHP] How to fetch .DOC or .DOCX file in php

2008-12-04 Thread Shawn McKenzie
Jim Lucas wrote:
 I am working on a set of php classes that will be able to read the text with 
 the formatting included and convert it to a standard document format.
 The standard format that it will end up in has yet
 
has yet...  what?

Are you O.K. Jim?  Did you die while writing this?

-- 
Thanks!
-Shawn
http://www.spidean.com




Re: [PHP] How to fetch .DOC or .DOCX file in php

2008-12-04 Thread Jim Lucas
Shawn McKenzie wrote:
 Jim Lucas wrote:
 I am working on a set of php classes that will be able to read the text with 
 the formatting included and convert it to a standard document format.
 The standard format that it will end up in has yet

   has yet...  what?
 
 Are you O.K. Jim?  Did you die while writing this?
 

Sorry, still kickin'

I was going to say that I haven't yet decided on what the final output format 
is going to be.  Probably either RTF or OpenXML.

How about I ask for suggestions on what would be the best format to store the 
final copy?

I figured that this tool would mainly be used for .doc to web conversion, but I 
guess it could also be used to convert to other document formats.

But I would like to at least be able to store the formatting inline 
with the text.  So, some form of XML, be it (x)HTML, plain XML, 
or even OpenXML.

A question to all, then: how would you like to see the text, with formatting, 
stored?

All suggestions welcome!
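For the (x)HTML option, here is a minimal sketch that stores the extracted paragraphs inline as XHTML `<p>` elements. It escapes markup characters so the result stays well-formed. It assumes Word's paragraph mark (`\r`) separates paragraphs and that the text has already been converted to UTF-8. `ENT_SUBSTITUTE` replaces invalid byte sequences instead of returning an empty string. `doc_text_to_xhtml()` is a hypothetical name.

```php
<?php
// Hypothetical: turn extracted .doc text into XHTML paragraphs.
// Assumes paragraphs are separated by Word's paragraph mark (\r) and UTF-8 text.
function doc_text_to_xhtml(string $text): string
{
    $html = '';
    foreach (explode("\r", $text) as $para) {
        $para = trim($para);
        if ($para === '') {
            continue; // skip empty paragraphs
        }
        // Escape &, <, > and quotes so the output stays well-formed
        $html .= '<p>'
            . htmlspecialchars($para, ENT_QUOTES | ENT_XHTML | ENT_SUBSTITUTE, 'UTF-8')
            . "</p>\n";
    }
    return $html;
}
```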

-- 
Jim Lucas
