All:

 

I'm using (or trying to use) the CFPDF processddx action to extract the text
contents of a large number of PDF files.  I'm running into two problems that
I'm hoping you can help with.

 

The first problem is that, on the hosted, production server, running the tag
fails to write the contents of the PDF file to an XML file.  The output file
(the value of the "result" attribute of the "DocumentText" tag) is never
written.  The code works perfectly in development environments but not in
the production environment.  Here's the relevant code:

 

<!--- Create DDX --->

<CFSAVECONTENT VARIABLE="ddx"><?xml version="1.0" encoding="UTF-8"?>

<DDX xmlns="http://ns.adobe.com/DDX/1.0/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://ns.adobe.com/DDX/1.0/ coldfusion_ddx.xsd">

<DocumentText result="output1">

<PDF source="input1"/>

</DocumentText>

</DDX></CFSAVECONTENT>

 

<!--- Set in/out parameters --->

<CFSET stInput = {input1="#VARIABLES.pdfFile#"}>

<CFSET stOutput = {output1="#VARIABLES.xmlFile#"}>

 

<!--- Process --->

<CFPDF ACTION="processddx" DDXFILE="#ddx#" INPUTFILES="#stInput#"
OUTPUTFILES="#stOutput#" NAME="ddxResult">
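
In case it helps anyone reproduce the first problem, here is a minimal
diagnostic sketch of what I can run on the production server.  It only checks
the obvious preconditions and dumps whatever the tag reports; it assumes the
same ddx, stInput and stOutput variables as above, and "outDir" is a name
introduced here purely for illustration.  Nothing in it is a confirmed cause
of the failure.

```cfml
<!--- Diagnostic sketch only; outDir is a hypothetical helper variable --->
<CFSET outDir = GetDirectoryFromPath(VARIABLES.xmlFile)>

<!--- Check the DDX string and the paths before calling the tag --->
<CFOUTPUT>
DDX valid: #IsDDX(ddx)#<br>
Input file exists: #FileExists(VARIABLES.pdfFile)#<br>
Output directory exists: #DirectoryExists(outDir)#<br>
</CFOUTPUT>

<!--- Run the same call, then dump the result struct or any exception --->
<CFTRY>
    <CFPDF ACTION="processddx" DDXFILE="#ddx#" INPUTFILES="#stInput#"
        OUTPUTFILES="#stOutput#" NAME="ddxResult">
    <CFDUMP VAR="#ddxResult#" LABEL="ddxResult">
    <CFCATCH TYPE="any">
        <CFDUMP VAR="#CFCATCH#" LABEL="processddx error">
    </CFCATCH>
</CFTRY>
```

As I understand it, the struct named by the NAME attribute should contain one
entry per output key (output1 here) with a status or failure reason, so the
dump may show why the file is not being written.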

 

The second problem, even in the development environments, has to do with the
text that is extracted from the PDFs.  Some of the files, likely because of
how they were scanned, produce results with spaces between the letters of
the extracted contents like:

                T h i s   shoul d not b e  ha pp ening

 

The interesting thing is that if you open the PDF in Acrobat and copy and
paste the affected text, it pastes properly (i.e., no weird spacing issues).
So I had Acrobat run "Recognize text with OCR" on those files and then tried
extracting the contents again.  After that, CFPDF consistently produced empty
XML files, where before it had produced XML files with the weird spacing.

 

Does anyone know if there's a limitation in the processddx functionality
that prevents it from reading PDF files that have been processed that way?

 

Thanks in advance.

 

--

Mosh Teitelbaum

evoch, LLC

WWW: http://www.evoch.com/

 




Archive: 
http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:325883