Thanks, Albert, for reviewing this patch again. My comments are inline.

On Sun, Jul 25, 2010 at 9:27 PM, Albert Astals Cid <[email protected]> wrote:
> On Tuesday, 6 July 2010, leena chourey wrote:
> > Dear Albert,
>
> Hi
>
> > Thanks for your response.
> >
> > As discussed in the last mail, we have modified the patch so that:
> >
> > - There is no behavioural change in pdftohtml -c <filename>, meaning it
> >   produces exactly the same output it did before.
> > - A new option, pdftohtml -s <filename>, is defined to generate a single
> >   HTML file corresponding to a PDF file.
> >
> > Please check it and let us know if any further changes are required.
>
> You are using a variable you deleted (tmp) in this chunk of code:
>
> ***********************
> delete tmp;

The "delete tmp;" line comes from the original development code; we did not
make any changes regarding tmp. I have checked, and it is also present in the
latest development version.

> - fprintf(pageFile,"%s\n<HTML>\n<HEAD>\n<TITLE>Page %d</TITLE>\n\n",
> -         DOCTYPE, page);
> + if (!singleHtml)
> +   fprintf(pageFile,"%s\n<HTML>\n<HEAD>\n<TITLE>Page %d</TITLE>\n\n", DOCTYPE, page);
> + else
> +   fprintf(pageFile,"%s\n<HTML>\n<HEAD>\n<TITLE> %s</TITLE>\n\n", DOCTYPE, tmp->getCString());////file name
> ***********************
>
> I'm also concerned about you adding several <HTML> elements to the same
> .html page; my limited HTML knowledge says you can only have one of those.

Regarding the above: every page has its own heading details and title, and
this should not be changed for the individual pages.

>
> It would also be necessary for you to update the pdftohtml.1 file (the man
> page), adding the new option.

pdftohtml.1 has been updated. Please find the latest patch for
"pdftohtml -s <file.pdf>" below and give us your feedback.

>
> Albert
>
> >
> > With best regards
> > Leena C
> >
> > On Wed, Jun 23, 2010 at 1:19 AM, Albert Astals Cid <[email protected]> wrote:
> > > On Tuesday, 22 June 2010, leena chourey wrote:
> > > > Dear Albert,
> > > >
> > > > Thanks for your detailed comments on the patch.
> > > >
> > > > Please check the updates given inline:
> > >
> > > Please do not forget to CC the poppler mailing list.
> > >
> > > > On Thu, Jun 17, 2010 at 4:14 AM, Albert Astals Cid <[email protected]> wrote:
> > > > > On Wednesday, 16 June 2010, omkar wrote:
> > > > > > Dear Albert,
> > > > > >
> > > > > > Please find the corrected patch for "accessibility of pdf document"
> > > > > > and give your feedback.
> > > > >
> > > > > Hi, some comments:
> > > > > * Comments like
> > > > >   // One more parameter(int j) is added in the getCSStyle function by CDAC developer Team
> > > > >   need to be removed. If every line noted who coded it, the code
> > > > >   would be twice as big and much less readable.
> > > >
> > > > Done; deleted all unwanted comments.
> > > >
> > > > > * The spacing of your patches could be better, that is
> > > > >   GooString* HtmlFontAccu::getCSStyle(int i, GooString* content ,int j){
> > > > >   should be
> > > > >   +GooString* HtmlFontAccu::getCSStyle(int i, GooString* content, int j){
> > > > >   but that's nothing huge, I can fix it.
> > > >
> > > > Updated accordingly.
> > > >
> > > > > * You are leaking (i.e. not deleting) jStr in both
> > > > >   HtmlFontAccu::getCSStyle and HtmlFontAccu::CSStyle.
> > > >
> > > > Deleted jStr.
> > > >
> > > > > * I see that the new HtmlPage::complexHtml and the old
> > > > >   HtmlPage::dumpComplex are very similar; I'd prefer it if you
> > > > >   reused the code instead of copying it.
> > > > >
> > > > > * This introduces a behavioural change that is unacceptable. I
> > > > >   understand you want pdftohtml to produce a different (in your
> > > > >   opinion better) output; for that you'll have to introduce a new
> > > > >   command-line option to pdftohtml (something like --singlehtml).
> > > >
> > > > For the last 2 points we would like some clarification.
> > > > As you said, a behavioural change is unacceptable, and you suggested
> > > > introducing a new command-line option to generate a single HTML file.
> > > > Would it be acceptable if we do the following?
> > > >
> > > > - *Existing:*
> > > >   Command-line option: pdftohtml -c <filename>
> > > >
> > > >   Function called:
> > > >   dumpComplex()
> > > >   {
> > > >     Read from input file
> > > >     Write into file to generate page-wise HTML format
> > > >   }
> > > >
> > > > - *Proposed changes:*
> > > >   New command-line option: pdftohtml -s <filename>
> > > >   // Checked: nothing is already defined for -s
> > > >   // (pdftohtml -c <filename> will remain as it is)
> > > >
> > > > - Function called:
> > > >   dumpSingle()   // new function similar to dumpComplex
> > > >   {
> > > >     Read from input file
> > > >     Write into file, appending in single-HTML format
> > > >   }
> > > >
> > > > - A function to "Read from input file" can be defined and called from
> > > >   both dumpComplex() and dumpSingle(), so that code duplication is
> > > >   removed (for the second-to-last point of your mail).
> > > >
> > > > - With the -s option (for --singlehtml), the behavioural change will be
> > > >   defined separately (-c will not be affected).
> > >
> > > To be clear, pdftohtml -c should produce exactly the same output it did
> > > before your patch; with pdftohtml -s you can output your version.
> > >
> > > So yes, I think I kind of agree with your proposal.
> > >
> > > Albert
> > >
> > > > Awaiting your opinion.
> > > >
> > > > With regards
> > > > Leena C & Onkar P
> > > > (for CDAC Accessibility Team)

With best regards
Leena C
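The points raised above (reading tmp after "delete tmp;", several <HTML> elements in one file, and sharing one "read from input file" step between dumpComplex() and a single-file mode) can be illustrated with a small sketch. It is not the patch's code and not poppler API: it uses std::string instead of GooString, and every name in it (PageTarget, targetFor, renderPageBody, pageDocument, singleDocument, countOccurrences) is hypothetical. The file naming mirrors the patch below (<doc>-<page>.html for -c, <doc>-html.html for -s), but this sketch opens the single file with "w" for the first page and "a" afterwards, whereas the patch always uses "a". DOCTYPE, META and STYLE output are left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch, not poppler code.
// Where one page's output goes and how the file is opened.
struct PageTarget {
  std::string path;  // output file name
  std::string mode;  // fopen() mode: "w" truncates, "a" appends
};

// -c: a new file per page. -s: one <doc>-html.html, truncated when the first
// page is written and appended to for every later page.
PageTarget targetFor(const std::string& docName, int page, int firstPage,
                     bool singleHtml) {
  assert(page >= firstPage);
  if (singleHtml)
    return {docName + "-html.html", page == firstPage ? "w" : "a"};
  return {docName + "-" + std::to_string(page) + ".html", "w"};
}

// Hypothetical stand-in for the shared "read from input file" step that
// both output modes would call.
std::string renderPageBody(int page) {
  return "<p>text of page " + std::to_string(page) + "</p>\n";
}

// -c: every page is a complete HTML document of its own.
std::string pageDocument(int page) {
  return "<HTML>\n<HEAD>\n<TITLE>Page " + std::to_string(page) +
         "</TITLE>\n</HEAD>\n<BODY>\n" + renderPageBody(page) +
         "</BODY>\n</HTML>\n";
}

// -s: a single <HTML> element. The title is an owned std::string, so it stays
// valid for the whole call; each page adds only an anchor plus the same body
// the -c path uses.
std::string singleDocument(const std::string& title, int firstPage,
                           int lastPage) {
  std::string out =
      "<HTML>\n<HEAD>\n<TITLE>" + title + "</TITLE>\n</HEAD>\n<BODY>\n";
  for (int page = firstPage; page <= lastPage; ++page) {
    out += "<a name=\"" + std::to_string(page) + "\"></a>\n";
    out += renderPageBody(page);
  }
  return out + "</BODY>\n</HTML>\n";
}

// Number of non-overlapping occurrences of needle in hay.
std::size_t countOccurrences(const std::string& hay, const std::string& needle) {
  assert(!needle.empty());  // an empty needle would never advance
  std::size_t n = 0;
  for (std::size_t pos = hay.find(needle); pos != std::string::npos;
       pos = hay.find(needle, pos + needle.size()))
    ++n;
  return n;
}
```

Truncating on the first page is the design choice worth noting: when every page is opened with "a", running pdftohtml -s twice on the same document appends the second run's output to the first. Whether per-page titles should survive in -s mode is the open question in this exchange; the sketch simply keeps one <HEAD> for the whole file.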
From 972092740c809f978861b379752878e2c1e1ea87 Mon Sep 17 00:00:00 2001
From: Leena <[email protected]>
Date: Wed, 1 Sep 2010 11:00:03 +0530
Subject: [PATCH] accesspdf(pdftohtml -s <file>.pdf)

This will help to make pdftohtml more accessible and usable. New option
'-s' for pdftohtml is defined to generate complex html with a single
<file>-html.html including all pages.
---
 utils/HtmlFonts.cc     |   10 ++++++-
 utils/HtmlFonts.h      |    5 ++-
 utils/HtmlOutputDev.cc |   70 ++++++++++++++++++++++++++++++-----------------
 utils/pdftohtml.1      |    3 ++
 utils/pdftohtml.cc     |   14 +++++++--
 5 files changed, 70 insertions(+), 32 deletions(-)

diff --git a/utils/HtmlFonts.cc b/utils/HtmlFonts.cc
index d2cbfd5..aff3626 100644
--- a/utils/HtmlFonts.cc
+++ b/utils/HtmlFonts.cc
@@ -288,12 +288,14 @@ int HtmlFontAccu::AddFont(const HtmlFont& font){
 }
 
 // get CSS font name for font #i
-GooString* HtmlFontAccu::getCSStyle(int i, GooString* content){
+GooString* HtmlFontAccu::getCSStyle(int i, GooString* content ,int j){
   GooString *tmp;
   GooString *iStr=GooString::fromInt(i);
+  GooString *jStr=GooString::fromInt(j);
 
   if (!xml) {
     tmp = new GooString("<span class=\"ft");
+    tmp->append(jStr);
     tmp->append(iStr);
     tmp->append("\">");
     tmp->append(content);
@@ -303,14 +305,16 @@ GooString* HtmlFontAccu::getCSStyle(int i, GooString* content){
     tmp->append(content);
   }
 
+  delete jStr;
   delete iStr;
   return tmp;
 }
 
 // get CSS font definition for font #i
-GooString* HtmlFontAccu::CSStyle(int i){
+GooString* HtmlFontAccu::CSStyle(int i, int j){
    GooString *tmp=new GooString();
    GooString *iStr=GooString::fromInt(i);
+   GooString *jStr=GooString::fromInt(j);
 
    GooVector<HtmlFont>::iterator g=accu->begin();
    g+=i;
@@ -322,6 +326,7 @@ GooString* HtmlFontAccu::CSStyle(int i){
 
    if(!xml){
      tmp->append(".ft");
+     tmp->append(jStr);
      tmp->append(iStr);
      tmp->append("{font-size:");
      tmp->append(Size);
@@ -352,6 +357,7 @@ GooString* HtmlFontAccu::CSStyle(int i){
 
    delete fontName;
    delete colorStr;
+   delete jStr;
    delete iStr;
    delete Size;
    return tmp;
diff --git a/utils/HtmlFonts.h b/utils/HtmlFonts.h
index df2b570..ceb47ef 100644
--- a/utils/HtmlFonts.h
+++ b/utils/HtmlFonts.h
@@ -89,8 +89,9 @@ public:
     g+=i;
     return g;
   }
-  GooString* getCSStyle (int i, GooString* content);
-  GooString* CSStyle(int i);
+  //One more parameter(int j) is added in the getCSStyle and CSStyle function by CDAC developer Team
+  GooString* getCSStyle (int i,GooString* content, int j=0);
+  GooString* CSStyle(int i,int j=0);
   int size() const {return accu->size();}
 };
 
diff --git a/utils/HtmlOutputDev.cc b/utils/HtmlOutputDev.cc
index dbf677f..89285cc 100644
--- a/utils/HtmlOutputDev.cc
+++ b/utils/HtmlOutputDev.cc
@@ -65,6 +65,7 @@ GooList *HtmlOutputDev::imgList=new GooList();
 
 extern double scale;
 extern GBool complexMode;
+extern GBool singleHtml;
 extern GBool ignore;
 extern GBool printCommands;
 extern GBool printHtml;
@@ -670,23 +671,34 @@ void HtmlPage::dumpComplex(FILE *file, int page){
   {
     GooString* pgNum=GooString::fromInt(page);
     tmp = new GooString(DocName);
-    tmp->append('-')->append(pgNum)->append(".html");
+    if (!singleHtml){
+      tmp->append('-')->append(pgNum)->append(".html");
+      pageFile = fopen(tmp->getCString(), "w");
+    }
+    else {
+      tmp->append("-html")->append(".html");//////////////
+      pageFile = fopen(tmp->getCString(), "a");
+    }
     delete pgNum;
-
-    if (!(pageFile = fopen(tmp->getCString(), "w"))) {
+    if (!pageFile) {
       error(-1, "Couldn't open html file '%s'", tmp->getCString());
       delete tmp;
       return;
     }
+
     delete tmp;
 
-    fprintf(pageFile,"%s\n<HTML>\n<HEAD>\n<TITLE>Page %d</TITLE>\n\n",
-            DOCTYPE, page);
+    if (!singleHtml)
+      fprintf(pageFile,"%s\n<HTML>\n<HEAD>\n<TITLE>Page %d</TITLE>\n\n", DOCTYPE, page);
+    else
+      fprintf(pageFile,"%s\n<HTML>\n<HEAD>\n<TITLE> %s</TITLE>\n\n", DOCTYPE, tmp->getCString());////file name
 
     htmlEncoding = HtmlOutputDev::mapEncodingToHtml
       (globalParams->getTextEncodingName());
-    fprintf(pageFile, "<META http-equiv=\"Content-Type\" content=\"text/html; charset=%s\">\n", htmlEncoding);
-  }
+    if (!singleHtml)
+      fprintf(pageFile, "<META http-equiv=\"Content-Type\" content=\"text/html; charset=%s\">\n", htmlEncoding);
+    else
+      fprintf(pageFile, "<META http-equiv=\"Content-Type\" content=\"text/html; charset=%s\">\n <br>\n", htmlEncoding);
   }
   else {
     pageFile = file;
@@ -701,7 +713,11 @@ void HtmlPage::dumpComplex(FILE *file, int page){
 
   fputs("<STYLE type=\"text/css\">\n<!--\n",pageFile);
   for(int i=fontsPageMarker;i!=fonts->size();i++) {
-    GooString *fontCSStyle = fonts->CSStyle(i);
+    GooString *fontCSStyle;
+    if (!singleHtml)
+      fontCSStyle = fonts->CSStyle(i);
+    else
+      fontCSStyle = fonts->CSStyle(i,page);
     fprintf(pageFile,"\t%s\n",fontCSStyle->getCString());
     delete fontCSStyle;
   }
@@ -732,7 +748,10 @@ void HtmlPage::dumpComplex(FILE *file, int page){
             xoutRound(tmp1->yMin),
             xoutRound(tmp1->xMin));
     fputs("<nobr>",pageFile);
-    str1=fonts->getCSStyle(tmp1->fontpos, str);
+    if (!singleHtml)
+      str1=fonts->getCSStyle(tmp1->fontpos, str);
+    else
+      str1=fonts->getCSStyle(tmp1->fontpos, str, page);
     fputs(str1->getCString(),pageFile);
     delete str;
     delete str1;
@@ -752,7 +771,7 @@ void HtmlPage::dumpComplex(FILE *file, int page){
 
 void HtmlPage::dump(FILE *f, int pageNum) 
 {
-  if (complexMode)
+  if (complexMode || singleHtml)
   {
     if (xml) dumpAsXML(f, pageNum);
     if (!xml) dumpComplex(f, pageNum);
@@ -944,27 +963,28 @@ HtmlOutputDev::HtmlOutputDev(char *fileName, char *title,
   if(!xml && !noframes)
   {
     GooString* left=new GooString(fileName);
-    left->append("_ind.html");
+    if (!singleHtml){
+      left->append("_ind.html");
 
-    doFrame(firstPage);
+      doFrame(firstPage);
 
-    if (!(fContentsFrame = fopen(left->getCString(), "w")))
+      if (!(fContentsFrame = fopen(left->getCString(), "w")))
+      {
+        error(-1, "Couldn't open html file '%s'", left->getCString());
+        delete left;
+        return;
+      }
+      delete left;
+      fputs(DOCTYPE, fContentsFrame);
+      fputs("<HTML>\n<HEAD>\n<TITLE></TITLE>\n</HEAD>\n<BODY>\n",fContentsFrame);
+
+      if (doOutline)
       {
-        error(-1, "Couldn't open html file '%s'", left->getCString());
-        delete left;
-        return;
-      }
-    delete left;
-    fputs(DOCTYPE, fContentsFrame);
-    fputs("<HTML>\n<HEAD>\n<TITLE></TITLE>\n</HEAD>\n<BODY>\n",fContentsFrame);
-
-    if (doOutline)
-      {
         GooString *str = basename(Docname);
         fprintf(fContentsFrame, "<A href=\"%s%s\" target=\"contents\">Outline</a><br>", str->getCString(), complexMode ? "-outline.html" : "s.html#outline");
         delete str;
-      }
-
+      }
+    }
 
     if (!complexMode)
     {	/* not in complex mode */
diff --git a/utils/pdftohtml.1 b/utils/pdftohtml.1
index 6cdc6c6..bbdfa56 100644
--- a/utils/pdftohtml.1
+++ b/utils/pdftohtml.1
@@ -40,6 +40,9 @@ exchange .pdf links with .html
 .B \-c
 generate complex output
 .TP
+.B \-s
+generate single html that includes all pages
+.TP
 .B \-i
 ignore images
 .TP
diff --git a/utils/pdftohtml.cc b/utils/pdftohtml.cc
index 5762f90..761396a 100644
--- a/utils/pdftohtml.cc
+++ b/utils/pdftohtml.cc
@@ -67,6 +67,7 @@ GBool printCommands = gTrue;
 static GBool printHelp = gFalse;
 GBool printHtml = gFalse;
 GBool complexMode=gFalse;
+GBool singleHtml=gFalse; // singleHtml
 GBool ignore=gFalse;
 GBool useSplash=gTrue;
 char extension[5]="png";
@@ -107,6 +108,8 @@ static const ArgDesc argDesc[] = {
    "exchange .pdf links by .html"}, 
   {"-c",      argFlag,     &complexMode,   0,
    "generate complex document"},
+  {"-s",      argFlag,     &singleHtml,    0,
+   "generate single document that includes all pages"},
   {"-i",      argFlag,     &ignore,        0,
    "ignore images"},
   {"-noframes", argFlag,   &noframes,      0,
@@ -293,7 +296,7 @@ int main(int argc, char *argv[]) {
   if (scale>3.0) scale=3.0;
   if (scale<0.5) scale=0.5;
   
-  if (complexMode) {
+  if (complexMode || singleHtml) {
     //noframes=gFalse;
     stout=gFalse;
   }
@@ -301,11 +304,13 @@ int main(int argc, char *argv[]) {
   if (stout) {
     noframes=gTrue;
     complexMode=gFalse;
+    singleHtml=gFalse;
   }
 
   if (xml)
   { 
     complexMode = gTrue;
+    singleHtml = gFalse;
     noframes = gTrue;
     noMerge = gTrue;
   }
@@ -359,7 +364,10 @@ int main(int argc, char *argv[]) {
   }
 #endif
 
-  rawOrder = complexMode; // todo: figure out what exactly rawOrder do :)
+  if (!singleHtml)
+    rawOrder = complexMode; // todo: figure out what exactly rawOrder do :)
+  else
+    rawOrder = singleHtml;
 
   // write text file
   htmlOut = new HtmlOutputDev(htmlFileName->getCString(), 
@@ -400,7 +408,7 @@ int main(int argc, char *argv[]) {
     }
   }
   
-  if( complexMode && !xml && !ignore ) {
+  if ((complexMode || singleHtml) && !xml && !ignore) {
     if(useSplash) {
 #ifdef HAVE_SPLASH
       GooString *imgFileName = NULL;
-- 
1.6.3.3
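The HtmlFonts.cc hunks build CSS class names by appending the page number j and then the font index i to "ft", in both getCSStyle and CSStyle. The sketch below re-expresses that naming with std::string; the function names are hypothetical. It makes two consequences of plain concatenation visible: because j defaults to 0 and is still appended, -c output would use names such as ft05 where it previously used ft5, and different (page, font) pairs can produce the same name (page 1/font 12 and page 11/font 2 both give ft112). classNameSketch is one possible alternative, not part of the patch.

```cpp
#include <cassert>
#include <string>

// Hypothetical re-expression of the patch's naming: "ft" + j (page) +
// i (font index), where j defaults to 0 and is appended even then.
std::string classNamePatch(int fontIndex, int page = 0) {
  assert(fontIndex >= 0 && page >= 0);
  return "ft" + std::to_string(page) + std::to_string(fontIndex);
}

// One possible alternative (not in the patch): keep the old "ft<i>" name
// when no page is given (the -c case), and put a separator between the page
// and the font index otherwise, so distinct pairs map to distinct names.
std::string classNameSketch(int fontIndex, int page = 0) {
  assert(fontIndex >= 0 && page >= 0);
  if (page == 0) return "ft" + std::to_string(fontIndex);
  return "ft" + std::to_string(page) + "-" + std::to_string(fontIndex);
}
```

Because std::string releases its own storage, this form also has no counterpart to the jStr leak discussed earlier in the thread.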
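The pdftohtml.cc hunks adjust several flags after argument parsing. As a sketch only, the same adjustments can be written as a pure function over a hypothetical Options struct (the real code uses file-scope GBool globals, and unrelated steps such as scale clamping are omitted). It also shows that the patch's if/else for rawOrder gives the same result as rawOrder = complexMode || singleHtml.

```cpp
#include <cassert>

// Hypothetical stand-in for the GBool globals in pdftohtml.cc.
struct Options {
  bool complexMode = false;  // -c
  bool singleHtml = false;   // -s (added by the patch)
  bool stout = false;        // write to standard output
  bool xml = false;          // XML output
  bool ignore = false;       // -i: ignore images
  bool noframes = false;
  bool noMerge = false;
  bool rawOrder = false;
};

// Applies, in the same order as main(), the adjustments shown in the patch.
Options resolve(Options o) {
  if (o.complexMode || o.singleHtml) o.stout = false;
  if (o.stout) {
    o.noframes = true;
    o.complexMode = false;
    o.singleHtml = false;
  }
  if (o.xml) {
    o.complexMode = true;
    o.singleHtml = false;
    o.noframes = true;
    o.noMerge = true;
  }
  // Equivalent to: if (!singleHtml) rawOrder = complexMode; else rawOrder = singleHtml;
  o.rawOrder = o.complexMode || o.singleHtml;
  return o;
}

// Outer condition under which the patched main() goes on to render images.
bool rendersImages(const Options& o) {
  return (o.complexMode || o.singleHtml) && !o.xml && !o.ignore;
}
```

For example, combining -s with standard output keeps single-file mode (standard output is switched off first), while combining -s with XML output falls back to complex XML mode.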
_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler
