1. Convert .doc to .pdf with PyODConverter
http://www.artofsolving.com/opensource/pyodconverter

2. Convert .pdf to .tiff with ImageMagick

3. Process .tiff through Tesseract OCR and get .txt
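
The three steps above could be chained roughly as in the sketch below. It is only an illustration under several assumptions: PyODConverter's DocumentConverter.py sits in the working directory and a headless OpenOffice instance is already listening for it, ImageMagick's `convert` and the `tesseract` binary are on the PATH, and the `python` that gets invoked has the UNO bindings. The helper names (`build_commands`, `doc_to_text`) and the 300 dpi / 8-bit settings are my own choices, not anything the tools require.

```python
import subprocess
from pathlib import Path


def build_commands(doc_path, converter_script="DocumentConverter.py"):
    """Return the three command lines for .doc -> .pdf -> .tiff -> .txt.

    Hypothetical helper: the script location and image settings are assumptions.
    """
    doc = Path(doc_path)
    pdf = doc.with_suffix(".pdf")
    tiff = doc.with_suffix(".tiff")
    txt_base = doc.with_suffix("")  # tesseract appends ".txt" to this base name
    return [
        # Step 1: PyODConverter (needs a running headless OpenOffice)
        ["python", converter_script, str(doc), str(pdf)],
        # Step 2: ImageMagick; 300 dpi and 8-bit depth are guesses for OCR quality
        ["convert", "-density", "300", str(pdf), "-depth", "8", str(tiff)],
        # Step 3: Tesseract writes <txt_base>.txt
        ["tesseract", str(tiff), str(txt_base)],
    ]


def doc_to_text(doc_path, converter_script="DocumentConverter.py"):
    """Run each step in order and return the recognized text."""
    for cmd in build_commands(doc_path, converter_script):
        subprocess.run(cmd, check=True)  # raises CalledProcessError if a step fails
    return Path(doc_path).with_suffix(".txt").read_text()
```

From a Rails app the same three commands could be run with `system` or backticks instead; the order and the intermediate files stay the same. Multi-page documents may need extra handling depending on the Tesseract version, which this sketch does not attempt.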


On Wed, Mar 16, 2011 at 9:51 PM, Hafiz Badrie Lubis
<[email protected]> wrote:
> Hi People,
>
> I just joined the group and I have a question. I'm still learning
> Ruby on Rails, and I now have a task to parse Microsoft Word
> documents and store their content in a database.
>
> Do you have any suggestions on how to do it?
>
> FYI, I am developing under a Unix environment, so I can't use
> win32ole there (correct me if I'm wrong).
>
> I have also searched the internet about this, but all I found is
> that I need to either use JRuby combined with Apache POI or use
> win32ole. As far as I know, to use JRuby I would need to create the
> Rails project with JRuby as well, but unfortunately I already created
> the project with plain Ruby.
>
> So, I don't know what to do anymore. Does anybody have a clue?
>
> Regards,
>
> Hafiz Badrie Lubis
>
> --
> You received this message because you are subscribed to the Google Groups 
> "Ruby on Rails: Talk" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to 
> [email protected].
> For more options, visit this group at 
> http://groups.google.com/group/rubyonrails-talk?hl=en.
>
>

