Extracting text from various file-types

2012-08-10 Thread Robert Rhodes

Hello again to all.

I need a way to extract text from Word, Excel, plain-text, PDF, and
PowerPoint files with ColdFusion, as each file is submitted via a form.  The
output does not have to be particularly pretty or nicely formatted -- just
plain text that can be stored and searched later.

Any ideas?

--RR


~|
Order the Adobe Coldfusion Anthology now!
http://www.amazon.com/Adobe-Coldfusion-Anthology/dp/1430272155/?tag=houseoffusion
Archive: 
http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:352103
Subscription: http://www.houseoffusion.com/groups/cf-talk/subscribe.cfm
Unsubscribe: http://www.houseoffusion.com/groups/cf-talk/unsubscribe.cfm


Re: Extracting text from various file-types

2012-08-10 Thread Bruce Sorge

Check out the CFFILE tag. That offers this type of functionality.

Bruce





Re: Extracting text from various file-types

2012-08-10 Thread Robert Rhodes

Hi Bruce.  Thanks for the reply.

I did, but no luck.  On text files, I got the text just fine.  On Word
docs, I got the text but with a whole bunch of garbage in the return.

PPT, PDF, and Excel docs all come out as unreadable garbage.  I tried both
the read and readbinary actions, and neither worked.

Maybe I am doing something wrong?

I am using CF9.

-RR




Re: Extracting text from various file-types

2012-08-10 Thread Bruce Sorge

For Word, did you add the attribute in cffile action="readbinary"?

For Excel, there is a cfspreadsheet tag that will read a spreadsheet; put a
query attribute on it and you can output the result as a query.

For PDFs, there is a cfpdf tag that you can use.

Obviously you will have to get the file type first, then use cfif to tell the
page which tag to use for which file. Hope this helps.
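
The "get the file type, then branch" step could be sketched roughly as
follows. This is only an illustration in Java (ColdFusion runs on the JVM),
not tested against CF9. The class name, method names, and route labels are all
made up for the example. The "office-library" label is a placeholder for Word
and PowerPoint, which cffile alone reportedly returns as garbage; POI is
suggested for those later in this thread.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: maps an uploaded file's extension to the extraction
// route discussed in this thread. The route names are labels only, not real
// tag invocations.
final class ExtractorDispatch {

    // Extension -> route label. Only the file types named in the thread.
    private static final Map<String, String> ROUTES = Map.of(
        "txt", "cffile-read",      // plain text reads fine with cffile
        "pdf", "cfpdf",            // PDF goes to the cfpdf tag
        "xls", "cfspreadsheet",    // Excel goes to the cfspreadsheet tag
        "doc", "office-library",   // placeholder: needs a real parser
        "ppt", "office-library"    // placeholder: needs a real parser
    );

    // Returns the lower-cased extension, or "" when the name has none.
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return "";
        }
        return fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // Picks a route label for the file, or "unknown" for unlisted types.
    static String chooseRoute(String fileName) {
        return ROUTES.getOrDefault(extensionOf(fileName), "unknown");
    }

    public static void main(String[] args) {
        String[] samples = {"notes.txt", "report.PDF", "budget.xls", "slides.ppt", "archive"};
        for (String name : samples) {
            System.out.println(name + " -> " + chooseRoute(name));
        }
    }
}
```

In a CFML page the same branching would be a cfif/cfelseif chain (or a
cfswitch) on the uploaded file's extension. An extension is only a hint, so
anything unrecognised is better rejected than guessed at.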


Bruce




Re: Extracting text from various file-types

2012-08-10 Thread Leigh

I do not have the URL handy, but take a look at Raymond Camden's blog. He
wrote an entry on extracting text from MS Office documents using POI.

For PDF, use cfpdf's extract text option (action="extracttext").


-Leigh
