1. Convert .doc to .pdf with PyODConverter http://www.artofsolving.com/opensource/pyodconverter
2. Convert .pdf to .tiff with ImageMagick
3. Process .tiff through Tesseract OCR and get .txt

On Wed, Mar 16, 2011 at 9:51 PM, Hafiz Badrie Lubis <[email protected]> wrote:
> Hi People,
>
> I just joined the group and I want to ask something about my problem.
> I'm still learning Ruby on Rails and now I have a task to parse
> Microsoft Word and store the content into database.
>
> Do you have any suggestion how to do it?
>
> FYI, I develop it under Unix Environment. So, I don't have a chance to
> use win32ole on it, CMIIW.
>
> I also have searched the internet about this. But all I found that I
> need to use JRuby and combine it with Apache POI or else I need to use
> win32ole. As far as I know, to use JRuby I need to create the rails
> project also with JRuby but unfortunately I already created the
> project with plain Ruby.
>
> So, I don't know what to do anymore. Does anybody have clue?
>
> Regards,
>
> Hafiz Badrie Lubis
>
> --
> You received this message because you are subscribed to the Google Groups
> "Ruby on Rails: Talk" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/rubyonrails-talk?hl=en.
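For reference, the three steps at the top of this message could be driven from plain Ruby roughly as follows. This is only a sketch under several assumptions: PyODConverter's DocumentConverter.py sits in the current directory and an OpenOffice instance is already running as a service for it to talk to, ImageMagick's `convert` and `tesseract` are on the PATH, and the 300 dpi density is just a guess at a reasonable OCR resolution. The method names `doc_to_text_commands` and `doc_to_text` are made up for illustration.

```ruby
# Build the three command lines without running anything.
# Each command is an argument array, so no shell quoting is needed.
def doc_to_text_commands(doc_path, work_dir: ".")
  # Intermediate files share the input's base name, e.g. report.doc -> report.pdf
  base = File.join(work_dir, File.basename(doc_path, ".*"))
  pdf  = "#{base}.pdf"
  tiff = "#{base}.tiff"

  [
    # Step 1: .doc -> .pdf (assumes DocumentConverter.py is in the current directory)
    ["python", "DocumentConverter.py", doc_path, pdf],
    # Step 2: .pdf -> .tiff; the density value is an assumption, tune it for your scans
    ["convert", "-density", "300", pdf, "-depth", "8", tiff],
    # Step 3: .tiff -> .txt; tesseract appends ".txt" to the output base itself
    ["tesseract", tiff, base]
  ]
end

# Run the pipeline and return the extracted text.
def doc_to_text(doc_path, work_dir: ".")
  doc_to_text_commands(doc_path, work_dir: work_dir).each do |cmd|
    # system returns false or nil when the command fails or cannot be started
    system(*cmd) or raise "command failed: #{cmd.join(' ')}"
  end

  base = File.join(work_dir, File.basename(doc_path, ".*"))
  File.read("#{base}.txt")
end
```

In a Rails app you would call something like `text = doc_to_text("/path/to/report.doc", work_dir: "/tmp")` and then save `text` through whatever model holds the content. Keeping the command building separate from the execution makes it easy to check the command lines without having the tools installed.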

