Hello everyone,

Thanks, everyone. I finally found a solution while looking into the things
you all had explained.

There is a docx gem for parsing docx files, and docx-html for converting
them into HTML.

require 'docx'
d = Docx::Document.open('example.docx')
d.each_paragraph do |p|
  puts p
end

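If the text is needed as a string rather than printed, something along these lines might work. It is only a sketch: paragraph_texts and docx_to_text are names made up here, and it assumes each paragraph object from the docx gem returns its text from to_s.

```ruby
# Collects the text of every paragraph from any object that responds to
# each_paragraph (Docx::Document from the docx gem is assumed to).
def paragraph_texts(document)
  texts = []
  document.each_paragraph { |para| texts << para.to_s }
  texts
end

# Hypothetical helper: opens a .docx file and returns its text,
# one paragraph per line. Needs the docx gem to be installed.
def docx_to_text(path)
  require 'docx' # loaded lazily so paragraph_texts works without the gem
  paragraph_texts(Docx::Document.open(path)).join("\n")
end
```

With the gem installed, docx_to_text('example.docx') would then give the whole document as one string.
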
And for a docx file stored on Amazon S3 (this needs open-uri):

require 'open-uri'
Docx::Document.open(open('http://S3-URL/original.docx',
                         :ssl_verify_mode => OpenSSL::SSL::VERIFY_NONE))

A big thanks to all.

On Sun, Sep 16, 2012 at 9:42 PM, Walter Lee Davis <[email protected]> wrote:
> For a start, here's the man page for catdoc, which you will need to
> install.
>
> http://linux.die.net/man/1/catdoc
>
> Then, read up on using the system() or backtick operators in a Ruby script
> to engage it. You'll need to have a path to the file you want to process,
> which is highly dependent on the system you're using to store the files. In
> Paperclip, I made this processor to extract text from PDF files (pdftotext
> is part of the same collection of utilities as catdoc, I believe):
>
> # lib/paperclip_processors/text.rb
>
> module Paperclip
>   # Handles extracting plain text from PDF file attachments
>   class Text < Processor
>
>     attr_accessor :whiny
>
>     # Creates a Text extract from PDF
>     def make
>       src = @file
>       dst = Tempfile.new([@basename, 'txt'].compact.join("."))
>       command = <<-end_command
>         "#{ File.expand_path(src.path) }"
>         "#{ File.expand_path(dst.path) }"
>       end_command
>
>       begin
>         success = Paperclip.run("/usr/bin/pdftotext -nopgbrk", command.gsub(/\s+/, " "))
>         Rails.logger.info "Processing #{src.path} to #{dst.path} in the text processor."
>       rescue PaperclipCommandLineError
>         raise PaperclipError, "There was an error processing the text for #{@basename}" if @whiny
>       end
>       dst
>     end
>   end
> end
>
> Depending on how you are uploading your files, your mileage may vary. At
> the very simplest, the command would be
>
> text_contents = `/usr/bin/catdoc /root/relative/path/to/file.doc`
>
> But that's hopelessly naive and will blow up on any error.
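
A slightly more defensive way to shell out to catdoc might look like the sketch below. It uses Open3.capture3 from the standard library to capture the output and check the exit status. run_extractor and TextExtractionError are hypothetical names invented for this example; the catdoc path is the one from the message above.

```ruby
require 'open3'

# Hypothetical error class for this sketch.
class TextExtractionError < StandardError; end

# Runs an external command and returns whatever it wrote to stdout.
# When the program and its arguments are passed separately, no shell is
# involved, so the file path needs no escaping.
# Raises TextExtractionError if the command cannot be started or exits
# with a non-zero status.
def run_extractor(*argv)
  stdout, stderr, status = Open3.capture3(*argv)
  unless status.success?
    raise TextExtractionError,
          "#{argv.first} exited with #{status.exitstatus}: #{stderr.strip}"
  end
  stdout
rescue Errno::ENOENT => e
  raise TextExtractionError, "could not run #{argv.first}: #{e.message}"
end

# Assumed usage:
#   text_contents = run_extractor('/usr/bin/catdoc', '/root/relative/path/to/file.doc')
```

A missing binary or a failed conversion then raises a single error type that the caller can rescue, instead of quietly handing back an empty result.
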
>
> Walter
>
>
> On Sep 16, 2012, at 6:16 AM, rovin varshney wrote:
>
> >
> > Hi Walter Lee Davis, Paul,
> >
> > Could you please give a code snippet or some more clarification about
> > parsing a doc file?
> >
> > On Sat, Sep 15, 2012 at 7:37 PM, Scott Ribe <[email protected]> wrote:
> > On Sep 15, 2012, at 7:27 AM, Paul wrote:
> >
> > > The docx format is actually pretty simple...
> >
> > You are really cruel to toy with him like that ;-)
> >
> >
> > --
> > Scott Ribe
> > [email protected]
> > http://www.elevated-dev.com/
> > (303) 722-0567 voice
> >
> >
> >
> >
> > --
> > You received this message because you are subscribed to the Google
> > Groups "Ruby on Rails: Talk" group.
> > To post to this group, send email to [email protected].
> > To unsubscribe from this group, send email to [email protected].
> > For more options, visit https://groups.google.com/groups/opt_out.