For a start, here's the man page for catdoc, which you will need to install. 

http://linux.die.net/man/1/catdoc

Then, read up on using system() or the backtick operator in a Ruby script to
invoke it. You'll need a path to the file you want to process, and how you get
that depends heavily on the system you're using to store the files. In
Paperclip, I made this processor to extract text from PDF files (pdftotext
comes from the Xpdf/Poppler utilities rather than the catdoc package, but it is
driven the same way):

# lib/paperclip_processors/text.rb

module Paperclip
  # Handles extracting plain text from PDF file attachments
  class Text < Processor

    attr_accessor :whiny

    def initialize(file, options = {}, attachment = nil)
      super
      # The base Processor only stores file/options/attachment,
      # so set the name and error behaviour used by #make here
      @basename = File.basename(@file.path, File.extname(@file.path))
      @whiny    = options[:whiny].nil? ? true : options[:whiny]
    end

    # Creates a Text extract from PDF
    def make
      src = @file
      # The array form makes the temp file name end in ".txt"
      dst = Tempfile.new([@basename, '.txt'])
      command = %Q("#{File.expand_path(src.path)}" "#{File.expand_path(dst.path)}")

      begin
        Paperclip.run("/usr/bin/pdftotext -nopgbrk", command)
        Rails.logger.info "Processing #{src.path} to #{dst.path} in the text processor."
      rescue PaperclipCommandLineError # Paperclip 2.x error class names
        raise PaperclipError, "There was an error processing the text for #{@basename}" if @whiny
      end
      dst
    end
  end
end
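
If you want to try the same extraction outside Paperclip (say, from a Rake task
or the console), the make method boils down to roughly the following plain-Ruby
sketch. The method name extract_to_tempfile and its tool: keyword are
hypothetical; it assumes pdftotext lives at /usr/bin/pdftotext and, as in the
processor above, takes the source and destination paths as its last two
arguments.

```ruby
require "tempfile"

# Hypothetical standalone version of Text#make (no Paperclip needed).
# Runs `tool... SRC DST` and returns a Tempfile whose path holds the output.
def extract_to_tempfile(src_path, tool: ["/usr/bin/pdftotext", "-nopgbrk"])
  basename = File.basename(src_path, File.extname(src_path))
  dst = Tempfile.new([basename, ".txt"]) # temp file name ends in ".txt"
  dst.close # the external tool writes to dst.path itself

  # Separate arguments bypass the shell, so paths need no manual quoting
  ok = system(*tool, File.expand_path(src_path), dst.path)
  unless ok
    dst.unlink
    raise "#{tool.first} failed on #{src_path} (exit status #{$?&.exitstatus.inspect})"
  end
  dst
end
```

Calling File.read(extract_to_tempfile("/path/to/file.pdf").path) would then give
you the text. Passing the command as separate arguments to system() is the main
design choice here: it sidesteps the quoting that the heredoc/string version has
to do by hand.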

Depending on how you are uploading your files, your mileage may vary. At the 
very simplest, the command would be

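text_contents = `/usr/bin/catdoc /root/relative/path/to/file.doc`

(backticks rather than system(), because system() only returns true or false
and throws the output away). But that's hopelessly naive: it doesn't escape the
path, and it never checks whether catdoc actually succeeded.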
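
A slightly less naive sketch, assuming catdoc is installed at /usr/bin/catdoc
and prints the extracted text to standard output (its default behaviour). The
helper name doc_to_text and the command: keyword are hypothetical, added so the
command can be swapped out. Open3.capture3 from Ruby's standard library takes
the arguments as separate strings, so paths with spaces or quotes need no
escaping, and it hands back the exit status so a failure is reported instead of
silently producing an empty string.

```ruby
require "open3"

# Hypothetical helper: run catdoc on a Word .doc file and return the text
# it prints to standard output. `command` may be a path string or an
# argument array. Raises RuntimeError (with catdoc's stderr) on a non-zero
# exit; Open3 itself raises Errno::ENOENT if the executable is missing.
def doc_to_text(path, command: "/usr/bin/catdoc")
  argv = Array(command) + [File.expand_path(path)]
  stdout, stderr, status = Open3.capture3(*argv)
  raise "#{argv.first} failed for #{path}: #{stderr.strip}" unless status.success?
  stdout
end
```

Usage would then be text_contents = doc_to_text("/root/relative/path/to/file.doc"),
wrapped in a rescue wherever a bad upload shouldn't take the request down.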

Walter


On Sep 16, 2012, at 6:16 AM, rovin varshney wrote:

> 
> Hi Walter Lee Davis, Paul
> 
>          Could you please give a code snippet or some more clarification 
> about parsing doc files?
> 
> On Sat, Sep 15, 2012 at 7:37 PM, Scott Ribe <[email protected]> 
> wrote:
> On Sep 15, 2012, at 7:27 AM, Paul wrote:
> 
> > The docx format is actually pretty simple...
> 
> You are really cruel to toy with him like that ;-)
> 
> 
> --
> Scott Ribe
> [email protected]
> http://www.elevated-dev.com/
> (303) 722-0567 voice
> 
> 
> 
> 
> --
> You received this message because you are subscribed to the Google Groups 
> "Ruby on Rails: Talk" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to 
> [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.