For a start, here's the man page for catdoc, which you will need to install.
http://linux.die.net/man/1/catdoc

Then, read up on using system() or the backtick operator in a Ruby script to invoke it. You'll need a path to the file you want to process, which depends heavily on the system you're using to store the files.

In Paperclip, I made this processor to extract text from PDF files (pdftotext ships with Xpdf/Poppler rather than with catdoc, but it's the same kind of command-line tool):

    # lib/paperclip_processors/text.rb
    module Paperclip
      # Handles extracting plain text from PDF file attachments
      class Text < Processor
        attr_accessor :whiny

        # Creates a Text extract from PDF
        def make
          src = @file
          dst = Tempfile.new([@basename, 'txt'].compact.join("."))
          command = <<-end_command
            "#{ File.expand_path(src.path) }"
            "#{ File.expand_path(dst.path) }"
          end_command

          begin
            success = Paperclip.run("/usr/bin/pdftotext -nopgbrk", command.gsub(/\s+/, " "))
            Rails.logger.info "Processing #{src.path} to #{dst.path} in the text processor."
          rescue PaperclipCommandLineError
            raise PaperclipError, "There was an error processing the text for #{@basename}" if @whiny
          end
          dst
        end
      end
    end

Depending on how you are uploading your files, your mileage may vary. At the very simplest, the command would be

    text_contents = `/usr/bin/catdoc /root/relative/path/to/file.doc`

(Use backticks here, since they return the command's output; system() only tells you whether the command succeeded.) But that's hopelessly naive: it does no error handling at all.

Walter

On Sep 16, 2012, at 6:16 AM, rovin varshney wrote:

>
> Hi Walter Lee Davis , Paul
>
> Please can u give some code snipet or give some more clarification
> about parsing doc file.
>
> On Sat, Sep 15, 2012 at 7:37 PM, Scott Ribe <[email protected]>
> wrote:
> On Sep 15, 2012, at 7:27 AM, Paul wrote:
>
> > The docx format is actually pretty simple...
>
> You are really cruel to toy with him like that ;-)
>
>
> --
> Scott Ribe
> [email protected]
> http://www.elevated-dev.com/
> (303) 722-0567 voice
>
>
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "Ruby on Rails: Talk" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
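
For what it's worth, here is one way the bare catdoc call could be made a little less naive, using Open3 from Ruby's standard library. This is only a sketch: extract_text and TextExtractionError are names made up for illustration, and /usr/bin/catdoc is an assumption about where the binary lives on your system.

```ruby
require 'open3'

# Hypothetical error class for this sketch; not part of catdoc or Paperclip.
class TextExtractionError < StandardError; end

# Runs an external converter (catdoc by default, path assumed) on +path+
# and returns whatever the converter wrote to standard output.
def extract_text(path, command: ['/usr/bin/catdoc'])
  full_path = File.expand_path(path)
  raise TextExtractionError, "No such file: #{full_path}" unless File.file?(full_path)

  # Passing the program and its arguments separately bypasses the shell,
  # so spaces or quotes in the file name need no escaping.
  stdout, stderr, status = Open3.capture3(*command, full_path)
  unless status.success?
    raise TextExtractionError,
          "#{command.first} failed (#{status.exitstatus}): #{stderr.strip}"
  end
  stdout
rescue Errno::ENOENT => e
  # Raised when the converter binary itself cannot be found.
  raise TextExtractionError, "Could not run #{command.first}: #{e.message}"
end

# Usage, with the same placeholder path as above:
#   text_contents = extract_text('/root/relative/path/to/file.doc')
```

The caller gets either the extracted text or a single exception type to rescue, instead of an empty string when something goes wrong.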

