Re: [Rails] How to Parse Microsoft Word Document

Scott Ribe Wed, 16 Mar 2011 21:53:06 -0700

On Mar 16, 2011, at 5:10 PM, Vladimir Rybas wrote:

> 1. Convert .doc to .pdf with PyODConverter
> http://www.artofsolving.com/opensource/pyodconverter
> 
> 2. Convert .pdf to .tiff with ImageMagick
> 
> 3. Process .tiff through Tesseract OCR and get .txt


Wow, talk about a long, slow way to potentially lose text flow and introduce 
errors...
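
For reference, the three quoted steps could be scripted roughly as below. This is only a sketch under assumptions: it assumes PyODConverter's DocumentConverter.py script (with an OpenOffice instance already listening for it), ImageMagick's convert, and the tesseract binary are all installed and on the path. The helper names, the 300 dpi density, and the file names are hypothetical, chosen just for illustration.

```ruby
# Sketch of the quoted .doc -> .pdf -> .tiff -> .txt pipeline.
# Assumptions: DocumentConverter.py (PyODConverter), `convert` (ImageMagick)
# and `tesseract` are installed; helper names and paths are hypothetical.

# Build the three command lines without running anything.
def ocr_pipeline_commands(doc_path, converter_script = "DocumentConverter.py")
  # Path without extension, e.g. "/tmp/report.doc" -> "/tmp/report"
  base = File.join(File.dirname(doc_path), File.basename(doc_path, ".*"))
  pdf  = "#{base}.pdf"
  tiff = "#{base}.tiff"
  [
    # 1. .doc -> .pdf via PyODConverter
    ["python", converter_script, doc_path, pdf],
    # 2. .pdf -> .tiff via ImageMagick (density is an assumed value)
    ["convert", "-density", "300", pdf, tiff],
    # 3. .tiff -> .txt via Tesseract (it appends ".txt" to the output base)
    ["tesseract", tiff, base]
  ]
end

# Run each step in order and stop at the first failure.
def run_ocr_pipeline(doc_path)
  ocr_pipeline_commands(doc_path).each do |cmd|
    system(*cmd) or raise "command failed: #{cmd.join(' ')}"
  end
end
```

Calling run_ocr_pipeline("/tmp/report.doc") would leave the text in /tmp/report.txt if every step succeeds. Each stage throws away structure the previous one had, which is where the lost text flow and OCR errors come from.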

-- 
Scott Ribe
[email protected]
http://www.elevated-dev.com/
(303) 722-0567 voice




