---------- Forwarded message ----------
From: leena chourey <[email protected]>
Date: Tue, Jul 6, 2010 at 4:49 PM
Subject: Re: [poppler] Accessibility of PDF documents (corrected patch
attached)
To: Albert Astals Cid <[email protected]>
Cc: [email protected], onkar <[email protected]>, Aparna
Ramamurthy <[email protected]>


Dear Albert,

Thanks for your response.

As discussed in the last mail, we have modified the patch so that:

   - There is no behavioural change in pdftohtml -c <filename>; it
   produces exactly the same output as it did before.
   - A new option, pdftohtml -s <filename>, generates a single html
   file for the whole pdf file.
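
The difference between the two options comes down to which file each page is written to and how that file is opened. A minimal sketch of that choice, assuming a standalone helper (`PageOutput` and `choosePageOutput` are hypothetical names for illustration; in the patch this logic sits inline in HtmlPage::dumpComplex and uses GooString):

```cpp
#include <string>

// Hypothetical helper, not part of the patch: mirrors how the output
// file name and fopen() mode are picked for each page.
struct PageOutput {
    std::string fileName;  // file the page is written to
    const char *mode;      // fopen() mode: "w" truncates, "a" appends
};

PageOutput choosePageOutput(const std::string &docName, int page, bool singleHtml) {
    if (!singleHtml) {
        // -c: one file per page, <doc>-<page>.html, rewritten on every run
        return { docName + "-" + std::to_string(page) + ".html", "w" };
    }
    // -s: every page is appended to the same file, <doc>-html.html
    return { docName + "-html.html", "a" };
}
```

Since "a" never truncates, a `<doc>-html.html` left over from an earlier run would be appended to rather than replaced, unless it is removed or truncated somewhere else first.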

Please check and let us know if any further changes are required.

With best regards
Leena C


On Wed, Jun 23, 2010 at 1:19 AM, Albert Astals Cid <[email protected]> wrote:

> On Tuesday, 22 June 2010, leena chourey wrote:
> > Dear Albert,
> >
> > Thanks for the detailed comments on the patch.
> > Please see our updates inline:
>
> Please do not forget to CC the poppler mailing list.
>
> >
> > On Thu, Jun 17, 2010 at 4:14 AM, Albert Astals Cid <[email protected]> wrote:
> > > On Wednesday, 16 June 2010, omkar wrote:
> > > > Dear Albert,
> > > >
> > > > Please find the corrected patch for "accessibility of pdf document "
> > > > and give your feedback.
> > >
> > > Hi, some comments:
> > >  * Comments like
> > >  // One more parameter(int j) is added in the getCSStyle function by CDAC developer Team
> > >    need to be removed. If each line said who coded it, the code would
> > >    be twice as big and much less readable
> >
> > Done, deleted all unwanted comments
> >
> > >   * The spacing of your patches could be better, that is
> > >
> > > GooString* HtmlFontAccu::getCSStyle(int i, GooString* content ,int j){
> > > should be
> > > +GooString* HtmlFontAccu::getCSStyle(int i, GooString* content, int j){
> > > but that's nothing huge, i can fix it
> >
> > Updated accordingly.
> >
> > >   * You are leaking (i.e. not deleting) jStr in both
> > >     HtmlFontAccu::getCSStyle and HtmlFontAccu::CSStyle
> >
> > Deleted jStr
> >
> > >  * I see that the new HtmlPage::complexHtml and the old
> > >    HtmlPage::dumpComplex are very similar, it would be better if you
> > >    reused the code instead of copying it
> > >
> > >  * This introduces a behavioural change that is unacceptable. I
> > >    understand you want pdftohtml to produce a different (in your opinion
> > >    better) output; for that you'll have to introduce a new command-line
> > >    option to pdftohtml (something like --singlehtml)
> >
> > For the last 2 points we would like some clarification.
> > You said the behavioural change is unacceptable and suggested introducing
> > a new command line option to generate a single html. So if we do the
> > following, will it be acceptable?
> >
> >    - *Existing:*
> >    Command line option: pdftohtml -c <filename>
> >    Function called:
> >
> >    dumpComplex()
> >    {
> >        Read from input file
> >        Write into file to generate page-wise html format
> >    }
> >
> >    - *Proposed changes:*
> >    New command line option: pdftohtml -s <filename>
> >    // Checked, nothing is already defined for -s
> >    // (pdftohtml -c <filename> will remain as it is)
> >    Function called:
> >
> >    dumpSingle()  // new function similar to dumpComplex
> >    {
> >        Read from input file
> >        Append to file to generate single html format
> >    }
> >
> >    - A function to “Read from input file” can be defined and called in
> >    both dumpComplex() and dumpSingle(), so that the code duplication is
> >    removed (for the second-to-last point of your mail).
> >    - With the -s option (for single html) the behavioural change will be
> >    defined separately. (-c will not be affected)
>
> To be clear, pdftohtml -c should produce exactly the same output it did
> before your patch; with pdftohtml -s you can output your version.
>
> So yes, i think i kind of agree with your proposal.
>
> Albert
>
>
> >
> >
> > For your opinion
> >
> > With Regards
> > Leena C & Onkar P
> > (for CDAC Accessibility Team)
>



-- 
Leena C



From 091e83261478eea9d41d1f6b8de73993989992b9 Mon Sep 17 00:00:00 2001
From: Leena <[email protected]>, Onkar<[email protected]>
Date: Tue, 6 Jul 2010 16:31:06 +0530
Subject: [PATCH] accesspdf(pdftohtml -s <file>.pdf)

This will help to make pdftohtml more accessible and usable.
A new option '-s' for pdftohtml is defined to generate a single html file, <file>-html.html, that includes all pages.
---
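
The font class names in the diff below are built by appending the page number j and then the font index i directly after "ft". A minimal sketch of that naming, assuming a standalone function (`fontClassName` is a hypothetical name for illustration; the patch itself works on GooString inside HtmlFontAccu::getCSStyle and HtmlFontAccu::CSStyle):

```cpp
#include <string>

// Hypothetical stand-in for the class naming in the diff:
// "ft" + page number + font index, with no separator in between.
// The page defaults to 0, matching the default argument j=0 in HtmlFonts.h.
std::string fontClassName(int fontIndex, int page = 0) {
    return "ft" + std::to_string(page) + std::to_string(fontIndex);
}

// Two consequences that follow from plain concatenation:
//  - with the default page of 0 the name still carries the "0",
//    e.g. font 3 becomes "ft03" rather than "ft3";
//  - without a separator, (page 1, font 12) and (page 11, font 2)
//    both map to "ft112".
bool sameClassName(int fontA, int pageA, int fontB, int pageB) {
    return fontClassName(fontA, pageA) == fontClassName(fontB, pageB);
}
```

If the second case turns out to matter for long documents, a separator between the two numbers (for instance "ft" + page + "_" + index) would keep the names distinct; that is only a suggestion and is not what the diff does.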
 utils/HtmlFonts.cc     |   11 ++++++++---
 utils/HtmlFonts.h      |    5 +++--
 utils/HtmlOutputDev.cc |   43 ++++++++++++++++++++++++++++++++-----------
 utils/pdftohtml.cc     |   14 +++++++++++---
 4 files changed, 54 insertions(+), 19 deletions(-)

diff --git a/utils/HtmlFonts.cc b/utils/HtmlFonts.cc
index d2cbfd5..d908820 100644
--- a/utils/HtmlFonts.cc
+++ b/utils/HtmlFonts.cc
@@ -288,12 +288,14 @@ int HtmlFontAccu::AddFont(const HtmlFont& font){
 }
 
 // get CSS font name for font #i 
-GooString* HtmlFontAccu::getCSStyle(int i, GooString* content){
+GooString* HtmlFontAccu::getCSStyle(int i, GooString* content ,int j){
   GooString *tmp;
   GooString *iStr=GooString::fromInt(i);
+  GooString *jStr=GooString::fromInt(j);
   
   if (!xml) {
     tmp = new GooString("<span class=\"ft");
+    tmp->append(jStr);
     tmp->append(iStr);
     tmp->append("\">");
     tmp->append(content);
@@ -303,15 +305,16 @@ GooString* HtmlFontAccu::getCSStyle(int i, GooString* content){
     tmp->append(content);
   }
 
+  delete jStr;
   delete iStr;
   return tmp;
 }
 
 // get CSS font definition for font #i 
-GooString* HtmlFontAccu::CSStyle(int i){
+GooString* HtmlFontAccu::CSStyle(int i, int j){
    GooString *tmp=new GooString();
    GooString *iStr=GooString::fromInt(i);
-
+   GooString *jStr=GooString::fromInt(j);
    GooVector<HtmlFont>::iterator g=accu->begin();
    g+=i;
    HtmlFont font=*g;
@@ -322,6 +325,7 @@ GooString* HtmlFontAccu::CSStyle(int i){
    
    if(!xml){
      tmp->append(".ft");
+     tmp->append(jStr);
      tmp->append(iStr);
      tmp->append("{font-size:");
      tmp->append(Size);
@@ -352,6 +356,7 @@ GooString* HtmlFontAccu::CSStyle(int i){
 
    delete fontName;
    delete colorStr;
+   delete jStr;
    delete iStr;
    delete Size;
    return tmp;
diff --git a/utils/HtmlFonts.h b/utils/HtmlFonts.h
index df2b570..33a66f5 100644
--- a/utils/HtmlFonts.h
+++ b/utils/HtmlFonts.h
@@ -89,8 +89,9 @@ public:
     g+=i;  
     return g;
   } 
-  GooString* getCSStyle (int i, GooString* content);
-  GooString* CSStyle(int i);
+//One more parameter(int j) is added in the getCSStyle and CSStyle function by CDAC developer Team
+  GooString* getCSStyle (int i,GooString* content, int j=0);
+  GooString* CSStyle(int i,int j=0);
   int size() const {return accu->size();}
   
 };  
diff --git a/utils/HtmlOutputDev.cc b/utils/HtmlOutputDev.cc
index 81f8b88..10e918b 100644
--- a/utils/HtmlOutputDev.cc
+++ b/utils/HtmlOutputDev.cc
@@ -64,6 +64,7 @@ GooList *HtmlOutputDev::imgList=new GooList();
 
 extern double scale;
 extern GBool complexMode;
+extern GBool singleHtml;
 extern GBool ignore;
 extern GBool printCommands;
 extern GBool printHtml;
@@ -669,22 +670,34 @@ void HtmlPage::dumpComplex(FILE *file, int page){
   {
       GooString* pgNum=GooString::fromInt(page);
       tmp = new GooString(DocName);
-      tmp->append('-')->append(pgNum)->append(".html");
+      if (!singleHtml){
+            tmp->append('-')->append(pgNum)->append(".html");
+            pageFile = fopen(tmp->getCString(), "w");
+      }
+      else {
+            tmp->append("-html")->append(".html");//////////////
+            pageFile = fopen(tmp->getCString(), "a");
+      }
       delete pgNum;
-  
-      if (!(pageFile = fopen(tmp->getCString(), "w"))) {
+      if (!pageFile) {
 	  error(-1, "Couldn't open html file '%s'", tmp->getCString());
 	  delete tmp;
 	  return;
-      } 
+      }
+
-      delete tmp;
 
-      fprintf(pageFile,"%s\n<HTML>\n<HEAD>\n<TITLE>Page %d</TITLE>\n\n",
-	      DOCTYPE, page);
+      if (!singleHtml)
+          fprintf(pageFile,"%s\n<HTML>\n<HEAD>\n<TITLE>Page %d</TITLE>\n\n", DOCTYPE, page);
+      else
+          fprintf(pageFile,"%s\n<HTML>\n<HEAD>\n<TITLE> %s</TITLE>\n\n", DOCTYPE, tmp->getCString());////file name
+      delete tmp; // freed only after its last use in the title above
 
       htmlEncoding = HtmlOutputDev::mapEncodingToHtml
 	  (globalParams->getTextEncodingName());
-      fprintf(pageFile, "<META http-equiv=\"Content-Type\" content=\"text/html; charset=%s\">\n", htmlEncoding);
+      if (!singleHtml)
+          fprintf(pageFile, "<META http-equiv=\"Content-Type\" content=\"text/html; charset=%s\">\n", htmlEncoding);
+      else
+          fprintf(pageFile, "<META http-equiv=\"Content-Type\" content=\"text/html; charset=%s\">\n<br>\n", htmlEncoding);
   }
   else 
   {
@@ -700,7 +713,11 @@ void HtmlPage::dumpComplex(FILE *file, int page){
    
   fputs("<STYLE type=\"text/css\">\n<!--\n",pageFile);
   for(int i=fontsPageMarker;i!=fonts->size();i++) {
-    GooString *fontCSStyle = fonts->CSStyle(i);
+    GooString *fontCSStyle;
+    if (!singleHtml)
+          fontCSStyle = fonts->CSStyle(i);
+    else
+          fontCSStyle = fonts->CSStyle(i,page);
     fprintf(pageFile,"\t%s\n",fontCSStyle->getCString());
     delete fontCSStyle;
   }
@@ -731,7 +748,10 @@ void HtmlPage::dumpComplex(FILE *file, int page){
 	      xoutRound(tmp1->yMin),
 	      xoutRound(tmp1->xMin));
       fputs("<nobr>",pageFile); 
-      str1=fonts->getCSStyle(tmp1->fontpos, str);  
+      if (!singleHtml)
+          str1=fonts->getCSStyle(tmp1->fontpos, str);
+      else
+          str1=fonts->getCSStyle(tmp1->fontpos, str, page);
       fputs(str1->getCString(),pageFile);
       delete str;      
       delete str1;
@@ -751,7 +771,7 @@ void HtmlPage::dumpComplex(FILE *file, int page){
 
 void HtmlPage::dump(FILE *f, int pageNum) 
 {
-  if (complexMode)
+  if (complexMode || singleHtml)
   {
     if (xml) dumpAsXML(f, pageNum);
     if (!xml) dumpComplex(f, pageNum);  
@@ -943,6 +963,7 @@ HtmlOutputDev::HtmlOutputDev(char *fileName, char *title,
   if(!xml && !noframes)
   {
      GooString* left=new GooString(fileName);
+     if (!singleHtml){
      left->append("_ind.html");
 
      doFrame(firstPage);
@@ -963,7 +984,7 @@ HtmlOutputDev::HtmlOutputDev(char *fileName, char *title,
 		fprintf(fContentsFrame, "<A href=\"%s%s\" target=\"contents\">Outline</a><br>", str->getCString(), complexMode ? "-outline.html" : "s.html#outline");
 		delete str;
 	}
-  	
+     }
 	if (!complexMode)
 	{	/* not in complex mode */
 		
diff --git a/utils/pdftohtml.cc b/utils/pdftohtml.cc
index 3c74c6e..3578641 100644
--- a/utils/pdftohtml.cc
+++ b/utils/pdftohtml.cc
@@ -61,6 +61,7 @@ GBool printCommands = gTrue;
 static GBool printHelp = gFalse;
 GBool printHtml = gFalse;
 GBool complexMode=gFalse;
+GBool singleHtml=gFalse; // singleHtml
 GBool ignore=gFalse;
 //char extension[5]=".png";
 double scale=1.5;
@@ -99,6 +100,8 @@ static const ArgDesc argDesc[] = {
    "exchange .pdf links by .html"}, 
   {"-c",      argFlag,     &complexMode,          0,
    "generate complex document"},
+  {"-s",      argFlag,     &singleHtml,          0,
+   "generate single document that includes all pages"},
   {"-i",      argFlag,     &ignore,        0,
    "ignore images"},
   {"-noframes", argFlag,   &noframes,      0,
@@ -253,7 +256,7 @@ int main(int argc, char *argv[]) {
    if (scale>3.0) scale=3.0;
    if (scale<0.5) scale=0.5;
    
-   if (complexMode) {
+   if (complexMode || singleHtml) {
      //noframes=gFalse;
      stout=gFalse;
    } 
@@ -261,11 +264,13 @@ int main(int argc, char *argv[]) {
    if (stout) {
      noframes=gTrue;
      complexMode=gFalse;
+     singleHtml=gFalse;
    }
 
    if (xml)
    { 
        complexMode = gTrue;
+       singleHtml=gTrue;
        noframes = gTrue;
        noMerge = gTrue;
    }
@@ -300,7 +305,10 @@ int main(int argc, char *argv[]) {
 	  }
   }}
 
-  rawOrder = complexMode; // todo: figure out what exactly rawOrder do :)
+  if (!singleHtml)
+      rawOrder = complexMode; // todo: figure out what exactly rawOrder do :)
+  else
+      rawOrder = singleHtml;
 
   // write text file
   htmlOut = new HtmlOutputDev(htmlFileName->getCString(), 
@@ -341,7 +349,7 @@ int main(int argc, char *argv[]) {
 	}
   }
   
-  if( complexMode && !xml && !ignore ) {
+  if( (complexMode || singleHtml) && !xml && !ignore ) {
     int h=xoutRound(htmlOut->getPageHeight()/scale);
     int w=xoutRound(htmlOut->getPageWidth()/scale);
     //int h=xoutRound(doc->getPageHeight(1)/scale);
-- 
1.6.3.3

_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler
