Re: [WSG] Making PDF and Word files accessible
I have good experience with Tidy: http://tidy.sourceforge.net/

/Anders

George S. Williams wrote:
> On Fri, 2005-06-03 at 06:36, Angela Galvin wrote:
> > Secondly, with the Word documents, is there an easier way to convert them to HTML?
>
> I use an open source program, antiword, to convert the Word docs to text and then just add the necessary markup. (And, of course, edit out the Word weirdness!) I've found this to be about 5 times faster than cut and paste.
>
> This is on a Linux box, but a Windows version of antiword seems to be available at http://www.informatik.uni-frankfurt.de/~markus/antiword/
>
> George

** The discussion list for http://webstandardsgroup.org/
See http://webstandardsgroup.org/mail/guidelines.cfm for some hints on posting to the list & getting help **
Re: [WSG] Making PDF and Word files accessible
Hope Stewart wrote:
> Hi Angela,
>
> I see that your email was sent using Apple Mail. Assuming you are also using Dreamweaver on a Mac, you can try what I do: cut & paste the Word doc into AppleWorks. Then either save the AppleWorks doc as HTML or cut & paste from AppleWorks into Dreamweaver. AppleWorks strips out all the Word rubbish.
>
> HTH,
> Hope Stewart

This sounds familiar, oh yeah! On a PC, I found out just yesterday that cutting and pasting Word text into Notepad, THEN cutting and pasting from Notepad into Dreamweaver, seems to work for me. /But/ it removes those pesky (tm) and (r) symbols and sometimes curly quotes.

The link sent by heretic [http://textism.com/wordcleaner/] works wonders!! I just tried it on a one-page Word doc. I ran the same file through TidyGUI: it didn't create the ul and li elements where needed, and it left some artifacts:

<p><o:p> <p>&nbsp;</p> </o:p></p>

It's nice to know we have tools at our disposal to help make our lives easier (and cut the time spent coding)! :D

regards,
Z u l e m a O r t i z
w e b d e s i g n e r
email : [EMAIL PROTECTED]
website : http://zoblue.com/
weblog : http://blog.zoblue.com/
browser : http://getfirefox.com/
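One way around the symbol loss described above is to turn every non-ASCII character into a numeric character reference before the text passes through a plain-text editor. The following is only a minimal Python sketch of that idea, not something anyone in the thread used; the function name `to_ascii_html` is made up for illustration.

```python
import html


def to_ascii_html(text: str) -> str:
    """Return a pure-ASCII string that still renders the same in HTML."""
    # Escape &, < and > first so literal characters are not mistaken for markup.
    escaped = html.escape(text, quote=False)
    # Replace each non-ASCII character (trademark sign, curly quotes, ...)
    # with a numeric character reference such as &#8482;.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    print(to_ascii_html("Acme\u2122 is a \u201cregistered\u201d mark\u00ae"))
```

Because the result is plain ASCII, a round trip through Notepad or a similar editor should leave the references untouched.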
[WSG] Making PDF and Word files accessible
Hello all,

I have the task of adding a bunch of PDF and Word files to a web site I work on that currently conforms to WAI Priority 1 guidelines.

My first question: if I convert the PDF files to HTML to make them more accessible, am I right in thinking that this is only half my job done? If the original file wasn't marked up correctly in the first place before being saved as PDF (with headings, etc.), does this mean that it's still not really accessible?

Secondly, with the Word documents, is there an easier way to convert them to HTML? At the moment I am saving as HTML from Word, taking them into Dreamweaver and using 'Clean up Word HTML'. After that I use 'Find and replace' to strip out all font and span tags, plus attributes such as class and style from p. At which point I still have to mark up the document with proper headings, bulleted lists, etc. A little time-consuming and fiddly to say the least!

Am I doing this right or is there another way to make these files accessible? (And make my life easier; after all, it is Friday :-) )

Angela

Angela Galvin
Worth Media
15-17 Middle Street
Brighton BN1 1AL
T: 01273 201149
F: 01273 710004
www.worthmedia.net
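The 'Find and replace' step described above (dropping font and span tags, then class and style attributes) could be scripted roughly as follows. This is a hedged Python sketch using regular expressions, not Dreamweaver's own clean-up logic; the function name `strip_word_markup` is hypothetical, and a regex pass like this assumes reasonably regular markup rather than arbitrary HTML.

```python
import re

# Opening or closing <font> and <span> tags, with any attributes.
PRESENTATIONAL_TAGS = re.compile(r"</?(?:font|span)\b[^>]*>", re.IGNORECASE)
# A class=... or style=... attribute, quoted or unquoted.
CLASS_OR_STYLE = re.compile(
    r"""\s+(?:class|style)\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""",
    re.IGNORECASE,
)
# Any tag, so attributes are only stripped inside markup, never in body text.
ANY_TAG = re.compile(r"<[^>]+>")


def strip_word_markup(markup: str) -> str:
    """Remove font/span tags and class/style attributes, keeping the text."""
    without_tags = PRESENTATIONAL_TAGS.sub("", markup)
    return ANY_TAG.sub(lambda m: CLASS_OR_STYLE.sub("", m.group(0)), without_tags)


if __name__ == "__main__":
    sample = '<p class=MsoNormal style="margin:0"><span lang="EN-GB">Hello</span></p>'
    print(strip_word_markup(sample))
```

Headings and lists would still have to be marked up by hand afterwards, as the message says.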
Re: [WSG] Making PDF and Word files accessible
At 05:36 AM 6/3/2005, you wrote:
> <snip>
>
> Secondly, with the Word documents, is there an easier way to convert them to HTML? At the moment I am saving as HTML from Word, taking them into Dreamweaver and using 'Clean up Word HTML'. After that I use 'Find and replace' to strip out all font and span tags, plus attributes such as class and style from p. At which point I still have to mark up the document with proper headings, bulleted lists, etc. A little time-consuming and fiddly to say the least!
>
> Am I doing this right or is there another way to make these files accessible? (And make my life easier; after all, it is Friday :-) )
>
> Angela
>
> Angela Galvin
> Worth Media
> 15-17 Middle Street
> Brighton BN1 1AL
> T: 01273 201149
> F: 01273 710004
> www.worthmedia.net

I would skip the part where you save from Word into HTML. Why give yourself the grief?

If you copy and paste the text into the 'content' part of your standard page, the line breaks will show you where the paragraphs and headings are. I'm using Homesite, so I just select and repeat the similar code (first p, then h1, h2, etc.) from one end of the document to the other.

Generally the only thing missing then is the use of bold and italic within the text (not part of the heading structure) and any tables or lists within the text. Validate to catch any stray weirdness and on to the next.

Perhaps not the most interesting type of web coding, but listening to music of your taste, you can work up a good rhythm and code a whack of stuff relatively cleanly. Not a bad way to spend a Friday.

Mary Krieger
Winnipeg Manitoba Canada
http://www.mts.net/~mkrieger
Re: [WSG] Making PDF and Word files accessible
Angela Galvin wrote:
> Hello all,
>
> I have the task of adding a bunch of PDF and Word files to a web site I work on that currently conforms to WAI Priority 1 guidelines.
>
> My first question: if I convert the PDF files to HTML to make them more accessible, am I right in thinking that this is only half my job done? If the original file wasn't marked up correctly in the first place before being saved as PDF (with headings, etc.), does this mean that it's still not really accessible?
>
> Secondly, with the Word documents, is there an easier way to convert them to HTML? At the moment I am saving as HTML from Word, taking them into Dreamweaver and using 'Clean up Word HTML'. After that I use 'Find and replace' to strip out all font and span tags, plus attributes such as class and style from p. At which point I still have to mark up the document with proper headings, bulleted lists, etc. A little time-consuming and fiddly to say the least!
>
> Am I doing this right or is there another way to make these files accessible? (And make my life easier; after all, it is Friday :-) )
>
> Angela

Hi Angela,

No easy way, but the most reliable is to cut and paste from Word into the design view of Dreamweaver. Using the design view ensures that all the spacing is preserved and that all the quotes etc. are presented as the correct codes.

I didn't know this myself until recently, when someone on this list told me about it.

Hope this helps,

--
Bob McClelland
Cornwall (U.K.)
www.gwelanmor-internet.co.uk
Re: [WSG] Making PDF and Word files accessible
On Fri, 2005-06-03 at 06:36, Angela Galvin wrote:
> Secondly, with the Word documents, is there an easier way to convert them to HTML?

I use an open source program, antiword, to convert the Word docs to text and then just add the necessary markup. (And, of course, edit out the Word weirdness!) I've found this to be about 5 times faster than cut and paste.

This is on a Linux box, but a Windows version of antiword seems to be available at http://www.informatik.uni-frankfurt.de/~markus/antiword/

George
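For a batch of documents, the antiword step could be driven from a script. The sketch below is an assumption-laden illustration in Python, not George's actual setup: it assumes `antiword` is installed and on the PATH, relies only on antiword's default behaviour of writing the converted text to standard output, and the helper names `build_antiword_command` and `doc_to_text` are hypothetical.

```python
import subprocess
from pathlib import Path


def build_antiword_command(doc_path: Path) -> list[str]:
    """Build the command line; with no options, antiword prints plain text."""
    return ["antiword", str(doc_path)]


def doc_to_text(doc_path: Path) -> str:
    """Run antiword on one .doc file and return its plain-text output."""
    result = subprocess.run(
        build_antiword_command(doc_path),
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode the output to str
        check=True,           # raise CalledProcessError if antiword fails
    )
    return result.stdout
```

The returned text still needs its markup added by hand or by a follow-up script, as described in the message.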
RE: [WSG] Making PDF and Word files accessible
Mary Krieger wrote:
> If you copy and paste the text into the 'content' part of your standard page, the line breaks will show you where the paragraphs and headings are. I'm using Homesite, so I just select and repeat the similar code (first p, then h1, h2, etc.) from one end of the document to the other.

Depending on your version of MS Office, copying from displayed text may bring in a bunch of inline styles. Yes, even pasting into a text document! Ack!

So, I usually save Word files as plain text (no line breaks) first. Next I use a good text editor with regular expression searching (I use TextPad; there are many others) to wrap text chunks in paragraph tags (e.g. ^ is the beginning of a line, $ is the end, \n is a line break, etc.). And last, I do a search and replace for weird apostrophes, quotes, dashes, etc.

> Generally the only thing missing then is the use of bold and italic within the text (not part of the heading structure) and any tables or lists within the text.

If you save as text, you'll still have tabs and funky characters for lists, which can also be found with regular expressions and replaced with the right tags.

I actually create a batch action for each contributor role that regularly sends me Word documents. It runs most of the standard searches one after another (and in the right order, which I can screw up if it's been a while) with the press of a hotkey. This allows me to include foreign characters for certain contributors, em dashes for others, different list designators for Macs vs. PCs, etc.

The newest Acrobat (7 Pro) also exports to plain text quite effectively, not just RTF. It ostensibly offers an HTML with CSS option, but that uses inline styles extensively, so the plain text route is more efficient.

Jona Decker
Madison, WI USA
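The search-and-replace sequence described above (wrap each line in paragraph tags, then swap the odd punctuation) can be approximated outside a text editor. This Python sketch is an illustration only, not Jona's TextPad batch action; the function name `text_to_paragraphs`, the entity table, and the assumption that each non-empty line is one paragraph are all choices made for the example.

```python
import html
import re

# Word's "smart" punctuation mapped to named HTML entities (a partial list).
PUNCTUATION = {
    "\u2018": "&lsquo;",
    "\u2019": "&rsquo;",
    "\u201c": "&ldquo;",
    "\u201d": "&rdquo;",
    "\u2013": "&ndash;",
    "\u2014": "&mdash;",
    "\u2026": "&hellip;",
}

# ^ and $ anchor to each line because of re.MULTILINE; blank lines are skipped
# since the pattern requires at least one non-whitespace character.
LINE = re.compile(r"^[ \t]*(\S.*?)[ \t]*$", re.MULTILINE)


def text_to_paragraphs(text: str) -> str:
    """Wrap every non-empty line in <p> tags and replace smart punctuation."""
    # Escape &, < and > before adding entities, so the entities survive.
    escaped = html.escape(text, quote=False)
    for char, entity in PUNCTUATION.items():
        escaped = escaped.replace(char, entity)
    return LINE.sub(r"<p>\1</p>", escaped)


if __name__ == "__main__":
    print(text_to_paragraphs("It\u2019s Friday\n\nHeadings \u2014 and lists \u2014 come later"))
```

The order matters in the same way it does in the editor: escaping has to run before the entity replacements, or the ampersands in the new entities would be escaped too.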
Re: [WSG] Making PDF and Word files accessible
Hi there,

> My first question: if I convert the PDF files to HTML to make them more accessible, am I right in thinking that this is only half my job done? If the original file wasn't marked up correctly in the first place before being saved as PDF (with headings, etc.), does this mean that it's still not really accessible?

As an extremely broad generalisation, yes: bad source gets bad output. However, every case is different, so you'll have to check your resulting (X)HTML to make sure it's standards compliant and accessible.

> Secondly, with the Word documents, is there an easier way to convert them to HTML? At the moment I am saving as HTML from Word, taking them into Dreamweaver and using 'Clean up Word HTML'.

Try http://textism.com/wordcleaner/
I've found it's pretty good, esp. in conjunction with the DW tricks you mention.

If you have a large amount of this sort of work, you might like to invest in http://cita.disability.uiuc.edu/software/office/

cheers
h

--
--- http://www.200ok.com.au/
--- The future has arrived; it's just not
--- evenly distributed. - William Gibson