Hi,

There is no .NET version of PDFBox. There are only some unofficial projects based on old PDFBox versions.

PDF has no "formatting" in the HTML sense. Glyphs are placed at specified positions, sometimes one character at a time. Some products try to reconstruct paragraphs from this. PDFBox attempts it too, but the result is not perfect; see PDFText2HTML.java. To get the images, see ExtractImages.java and PrintImageLocations.java. You would have to combine all of this, and the result would still not look very close to the original PDF.
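
To illustrate why this is hard, here is a rough sketch of the kind of guessing a tool has to do to turn positioned glyphs back into lines of text. It is not PDFBox code and not how PDFText2HTML works internally. The Glyph class, the GlyphLineGrouper class, and both tolerance parameters are made up for this example; in a real program the positions would come from a PDF text extraction pass (in PDFBox, PDFTextStripper is the starting point).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class GlyphLineGrouper {

    /** Hypothetical stand-in for one positioned glyph; not a PDFBox class. */
    public static final class Glyph {
        final String text;
        final float x;      // left edge of the glyph
        final float y;      // baseline, measured from the top of the page
        final float width;  // advance width of the glyph

        public Glyph(String text, float x, float y, float width) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.width = width;
        }
    }

    /**
     * Groups glyphs into lines, top to bottom.
     * yTolerance: maximum baseline difference for glyphs on the same line (assumed heuristic).
     * spaceGap: horizontal gap above which a space is inserted (assumed heuristic).
     */
    public static List<String> toLines(List<Glyph> glyphs, float yTolerance, float spaceGap) {
        List<Glyph> sorted = new ArrayList<>(glyphs);
        sorted.sort(Comparator.comparingDouble((Glyph g) -> g.y));

        // Start a new line whenever the baseline moves away from the line's first glyph.
        List<List<Glyph>> lines = new ArrayList<>();
        for (Glyph g : sorted) {
            List<Glyph> last = lines.isEmpty() ? null : lines.get(lines.size() - 1);
            if (last != null && Math.abs(g.y - last.get(0).y) <= yTolerance) {
                last.add(g);
            } else {
                List<Glyph> line = new ArrayList<>();
                line.add(g);
                lines.add(line);
            }
        }

        List<String> result = new ArrayList<>();
        for (List<Glyph> line : lines) {
            // Within a line, order glyphs left to right.
            line.sort(Comparator.comparingDouble((Glyph g) -> g.x));
            StringBuilder sb = new StringBuilder();
            Glyph prev = null;
            for (Glyph g : line) {
                // A PDF often contains no space characters at all; guess them from gaps.
                if (prev != null && g.x - (prev.x + prev.width) > spaceGap) {
                    sb.append(' ');
                }
                sb.append(g.text);
                prev = g;
            }
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        // Glyphs deliberately given out of reading order, as can happen in a PDF.
        List<Glyph> glyphs = Arrays.asList(
                new Glyph("k", 16f, 115f, 5f),
                new Glyph("P", 25f, 100.4f, 6f),
                new Glyph("H", 10f, 100f, 6f),
                new Glyph("o", 10f, 115f, 6f),
                new Glyph("i", 16f, 100f, 3f),
                new Glyph("F", 37f, 100f, 6f),
                new Glyph("D", 31f, 100f, 6f));
        for (String line : toLines(glyphs, 1.0f, 2.0f)) {
            System.out.println(line);
        }
    }
}
```

Even this toy version needs two tuning values, and it knows nothing about columns, tables, fonts, rotated text or paragraph breaks. That is why the converted output rarely matches the PDF.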

Tilman

On 20.03.2023 06:40, Vaishali Mahajan wrote:
Hi,

I am creating a PDF to Word conversion application using the PDFBox .NET
version. I get all the text from the PDF, but without formatting. I want
to preserve the formatting of the text as well as all images when
converting from PDF to Word. Please guide me.


Thanks



