On Fri, 11 Aug 2006 15:39:41 +0200
Claudia Drechsle <[EMAIL PROTECTED]> wrote:

> 
> >>> Can I convert a saved PDF to a Microsoft Word with the openoffice
> >>> program?
> >>>
> >>>
> >> No.  This can only be done with one of the commercial Adobe
> >> programs. All others are prohibited from doing so.
> > 
> > If you have access to Linux (or a Linux Live CD), KWord can do it
> > very well.
> > 
> 
> Yes, KWord can import PDF-files. But if there are frames, graphics,
> tables etc within the pdf, the result is not very usable.
> In my cases KWord even crashed in some cases when I tried to import a
> PDF-file.

Same here (KWord 1.5.1, KDE 3.5.3). And I couldn't get acceptable
results by saving to ODT and opening the file in OOo either.

> So I think, PDF that contain only text may be imported by KWord. 
> Are other programs able to import also structured PDF's into a
> Text-Format?

Open source ones? I guess not.

A possible (indirect) workaround is pdftohtml.
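
If you want to script this, a minimal Python sketch that just shells
out to pdftohtml could look like this. It assumes pdftohtml is on your
PATH and uses only the -c switch (complex output); the helper names
build_pdftohtml_cmd and convert are my own, not part of pdftohtml.

```python
import subprocess
from typing import List


def build_pdftohtml_cmd(pdf_path: str, out_base: str,
                        complex_mode: bool = False) -> List[str]:
    """Build the pdftohtml command line (hypothetical helper)."""
    cmd = ["pdftohtml"]
    if complex_mode:
        # -c: "complex" output, one absolutely positioned HTML page per PDF page
        cmd.append("-c")
    # pdftohtml takes the input PDF followed by an output base name
    cmd += [pdf_path, out_base]
    return cmd


def convert(pdf_path: str, out_base: str, complex_mode: bool = False) -> None:
    """Run pdftohtml; raises CalledProcessError if it exits non-zero."""
    subprocess.run(build_pdftohtml_cmd(pdf_path, out_base, complex_mode),
                   check=True)
```

Usage would be e.g. convert("manual.pdf", "manual") for the simple
output, or convert("manual.pdf", "manual", complex_mode=True) for -c.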

I've tried version 0.36 (Debian package) on a 132-page user's manual
with tables and pictures (annotated with text boxes, arrows, etc.), and
here are the results:

Simple output: loses many images and ignores fonts and font sizes, but
preserves the text flow and bold text (maybe italics too, I don't
know), and produces a single file (plus separate index and outline
files). Tables and text boxes are imported as plain text, and drawing
objects aren't imported at all.
If at least all the images were imported in the right places, this
would be the best approach. Some formatting would still be needed, but
that isn't hard using styles.

---------------------------------

Complex HTML output (the -c option) is very precise when opened in
Firefox. The precision comes at the cost of rendering everything
except the text as a single background image per page, and placing the
text in <div> elements with absolute positioning.
Each PDF page is converted to a separate HTML document.
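
Since -c gives you one HTML file per page, here is a rough Python
sketch for collecting them in page order. It assumes the pages are
named <base>-1.html, <base>-2.html, etc., which is what I believe
pdftohtml does, but check your own output; order_pages and page_files
are hypothetical helper names.

```python
import os
import re
from typing import Iterable, List


def order_pages(filenames: Iterable[str], base: str) -> List[str]:
    """Pick out '<base>-N.html' page files and sort them by page number."""
    # Assumption: pdftohtml -c names page N "<base>-N.html"
    pattern = re.compile(re.escape(base) + r"-(\d+)\.html")
    numbered = []
    for name in filenames:
        m = pattern.fullmatch(name)
        if m:
            numbered.append((int(m.group(1)), name))
    # Numeric sort: page 10 must come after page 9, not after page 1
    return [name for _, name in sorted(numbered)]


def page_files(out_dir: str, base: str) -> List[str]:
    """Full paths of the per-page HTML files in out_dir, in page order."""
    return [os.path.join(out_dir, n)
            for n in order_pages(os.listdir(out_dir), base)]
```

The frameset and index files (e.g. manual.html, manual_ind.html) don't
match the pattern, so they are skipped.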

Text flow is broken around subscripts and superscripts (paragraphs are
split and overly large gaps appear). A few paragraphs are in the wrong
font, which is strange since the document uses only one font.

This option is good if you don't need to change the layout of the
original graphics. But you are stuck if, for example, you translate
the document and the translation doesn't fit into a table cell, or a
picture needs to be moved because a paragraph now has a different
number of lines.
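
About the text flow breaking at subscripts and superscripts: my guess
(unverified) is that they come out as separate <div>s with a slightly
different top value, so anything reading the page line by line splits
there. A rough Python sketch of re-joining such runs by grouping top
values that lie within a few pixels of each other is below. The markup
it looks for (style="position:absolute;top:N;left:M") and the 4-pixel
tolerance are my assumptions, not taken from the pdftohtml docs.

```python
import re
from html.parser import HTMLParser
from typing import List, Tuple

# Assumption: each text run looks like
# <div style="position:absolute;top:145;left:108">...text...</div>
POS_RE = re.compile(r"top:\s*(\d+)(?:px)?;\s*left:\s*(\d+)", re.I)


class PositionedText(HTMLParser):
    """Collect (top, left, text) for every absolutely positioned <div>."""

    def __init__(self) -> None:
        super().__init__()
        self.runs: List[Tuple[int, int, str]] = []
        self._pos = None  # (top, left) of the div we are inside, if any

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            m = POS_RE.search(dict(attrs).get("style") or "")
            if m:
                self._pos = (int(m.group(1)), int(m.group(2)))
                self.runs.append((*self._pos, ""))

    def handle_data(self, data):
        if self._pos is not None:
            top, left, text = self.runs[-1]
            self.runs[-1] = (top, left, text + data)

    def handle_endtag(self, tag):
        if tag == "div":
            self._pos = None


def rebuild_lines(html: str, tolerance: int = 4) -> List[str]:
    """Join runs whose top values differ by at most `tolerance` pixels."""
    parser = PositionedText()
    parser.feed(html)
    parser.close()  # flush any buffered text
    lines: List[List[Tuple[int, int, str]]] = []
    for run in sorted(parser.runs):  # sort by top, then left
        if lines and run[0] - lines[-1][0][0] <= tolerance:
            lines[-1].append(run)  # close enough: same visual line
        else:
            lines.append([run])
    # Within a line, order runs left to right and join them with spaces
    return [" ".join(t.strip() for _, _, t in sorted(line, key=lambda r: r[1]))
            for line in lines]
```

With tolerance=0 a subscript lands on a line of its own, which mimics
the splitting I saw. Note that runs are joined with spaces, so "H2O"
would come back as "H 2 O".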

--------------------------------

But better try it yourself: I've tested just one file, and the only
documentation I've read is the man page (not sure if there are other docs).

Another issue is getting the HTML into ODT. I couldn't get good results
straight away with Writer/Web, but these were the first HTML files I've
ever opened with OOo, so it's probably not a problem for anyone who
knows how to use it.

So, wait for KWord to improve both its PDF import and its ODF support? :-)



