Re: [Rails] How to read Microsoft document file in ruby on rails ?

rovin varshney Mon, 17 Sep 2012 22:40:18 -0700

Hello everyone,
   Thanks, everyone. I finally found a solution while looking into the
things you had all explained.
   There is a docx gem for parsing docx files and docx-html for converting
them to HTML.


   require 'docx'

   d = Docx::Document.open('example.docx')
   # Print each paragraph of the document
   d.each_paragraph do |p|
     puts p
   end

And for a docx file stored on Amazon S3:

   require 'open-uri'  # open() on a URL needs open-uri (URI.open on Ruby 3.0+)
   Docx::Document.open(open('http://S3-URL/original.docx',
                            :ssl_verify_mode => OpenSSL::SSL::VERIFY_NONE))
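
In case the gem would rather have a file path than a remote stream, here is a minimal sketch that copies whatever open-uri returns into a temporary file first. It uses only the standard library. The helper name `with_local_copy` is made up for illustration and is not part of the docx gem.

```ruby
require 'tempfile'
require 'stringio'

# Copies any readable IO (for example, the object open-uri returns) into a
# temporary file and yields the local path. The file is deleted when the
# block returns. `with_local_copy` is a hypothetical helper name.
def with_local_copy(io, extension = '.docx')
  Tempfile.create(['download', extension]) do |tmp|
    tmp.binmode                 # docx files are zip archives, so copy raw bytes
    IO.copy_stream(io, tmp)
    tmp.flush
    yield tmp.path
  end
end

# Usage sketch (assumes the docx gem and open-uri are loaded):
#   with_local_copy(open('http://S3-URL/original.docx')) do |path|
#     Docx::Document.open(path).each_paragraph { |p| puts p }
#   end
```

The block form keeps cleanup automatic, so no stray temp files pile up on the server.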

A big thanks to all.


On Sun, Sep 16, 2012 at 9:42 PM, Walter Lee Davis <[email protected]> wrote:

> For a start, here's the man page for catdoc, which you will need to
> install.
>
> http://linux.die.net/man/1/catdoc
>
> Then, read up on using the system() or backtick operators in a Ruby script
> to engage it. You'll need to have a path to the file you want to process,
> which is highly dependent on the system you're using to store the files. In
> Paperclip, I made this processor to extract text from PDF files (pdftotext
> is part of the same collection of utilities as catdoc, I believe):
>
> #lib/paperclip_processors/text.rb
>
> module Paperclip
>   # Handles extracting plain text from PDF file attachments
>   class Text < Processor
>
>     attr_accessor :whiny
>
>     # Creates a Text extract from PDF
>     def make
>       src = @file
>       dst = Tempfile.new([@basename, 'txt'].compact.join("."))
>       command = <<-end_command
>         "#{ File.expand_path(src.path) }"
>         "#{ File.expand_path(dst.path) }"
>       end_command
>
>       begin
>         success = Paperclip.run("/usr/bin/pdftotext -nopgbrk", command.gsub(/\s+/, " "))
>         Rails.logger.info "Processing #{src.path} to #{dst.path} in the text processor."
>       rescue PaperclipCommandLineError
>         raise PaperclipError, "There was an error processing the text for #{@basename}" if @whiny
>       end
>       dst
>     end
>   end
> end
>
> Depending on how you are uploading your files, your mileage may vary. At
> the very simplest, the command would be
>
> # Backticks return the command's output; system() only returns true, false, or nil
> text_contents = `/usr/bin/catdoc /root/relative/path/to/file.doc`
>
> But that's hopelessly naive and will blow up on any error.
>
> Walter
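
On the "hopelessly naive" point above, a slightly more defensive version might look like the sketch below. It uses Open3 from the standard library so the exit status and stderr can be checked. The method name `extract_text`, the error class, and the default catdoc location are assumptions for illustration, so adjust them for your setup.

```ruby
require 'open3'

# Raised when the external converter cannot be run or exits with an error.
# (Hypothetical class name, used only in this sketch.)
class TextExtractionError < StandardError; end

# Runs an external converter on `path` and returns what it prints to stdout.
# `command` is an argv-style array, so the path never passes through a shell.
# The default binary location is an assumption.
def extract_text(path, command: ['/usr/bin/catdoc'])
  raise TextExtractionError, "No such file: #{path}" unless File.file?(path)

  stdout, stderr, status = Open3.capture3(*command, path)
  unless status.success?
    raise TextExtractionError,
          "#{command.first} failed (#{status.exitstatus}): #{stderr.strip}"
  end
  stdout
rescue Errno::ENOENT, Errno::EACCES => e
  # The converter binary itself is missing or not executable.
  raise TextExtractionError, "Could not run #{command.first}: #{e.message}"
end

# Usage sketch:
#   text_contents = extract_text('/root/relative/path/to/file.doc')
```

Passing the command as an array avoids quoting problems with file names that contain spaces, which matters for user uploads.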
>
>
> On Sep 16, 2012, at 6:16 AM, rovin varshney wrote:
>
> >
> > Hi Walter Lee Davis, Paul,
> >
> >          Could you please give a code snippet or some more clarification
> > about parsing a doc file?
> >
> > On Sat, Sep 15, 2012 at 7:37 PM, Scott Ribe <[email protected]> wrote:
> > On Sep 15, 2012, at 7:27 AM, Paul wrote:
> >
> > > The docx format is actually pretty simple...
> >
> > You are really cruel to toy with him like that ;-)
> >
> >
> > --
> > Scott Ribe
> > [email protected]
> > http://www.elevated-dev.com/
> > (303) 722-0567 voice
> >
> >
> >
> >
> > --
> > You received this message because you are subscribed to the Google
> Groups "Ruby on Rails: Talk" group.
> > To post to this group, send email to [email protected].
> > To unsubscribe from this group, send email to
> [email protected].
> > For more options, visit https://groups.google.com/groups/opt_out.

