Hi,
There is no official .NET version of PDFBox. There are only some
unofficial builds derived from old PDFBox versions.
There is no "formatting" in PDF like there is in HTML. Glyphs are put
at specified places, sometimes one character at a time.
There are products that try to recreate paragraphs from these glyph
positions. Even PDFBox tries this, but it's not perfect; see
PDFText2HTML.java.
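To give a rough idea of what "recreating" text from positioned glyphs
involves, here is a minimal sketch in plain Java. It is not PDFBox code
and not how PDFText2HTML works internally: the class GlyphLineSketch,
the Glyph type and both tolerance parameters are hypothetical names made
up for this illustration. It only groups glyphs into lines by baseline
and guesses word breaks from horizontal gaps.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; none of these names are PDFBox API.
class GlyphLineSketch {

    // One positioned glyph: its text, left edge x, baseline y and advance width.
    static final class Glyph {
        final String text;
        final double x, y, width;

        Glyph(String text, double x, double y, double width) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.width = width;
        }
    }

    // Groups glyphs into lines by baseline (within yTolerance), top line first.
    static List<String> toLines(List<Glyph> glyphs, double yTolerance, double gapTolerance) {
        List<Glyph> sorted = new ArrayList<>(glyphs);
        // PDF y coordinates grow upwards, so sort by descending y.
        sorted.sort(Comparator.comparingDouble((Glyph g) -> -g.y));

        List<String> lines = new ArrayList<>();
        List<Glyph> current = new ArrayList<>();
        for (Glyph g : sorted) {
            // Start a new line when the baseline is too far from the line's first glyph.
            if (!current.isEmpty() && Math.abs(current.get(0).y - g.y) > yTolerance) {
                lines.add(joinLine(current, gapTolerance));
                current = new ArrayList<>();
            }
            current.add(g);
        }
        if (!current.isEmpty()) {
            lines.add(joinLine(current, gapTolerance));
        }
        return lines;
    }

    // Orders one line left to right and inserts a space wherever the gap
    // between neighbouring glyphs exceeds gapTolerance.
    static String joinLine(List<Glyph> line, double gapTolerance) {
        line.sort(Comparator.comparingDouble((Glyph g) -> g.x));
        StringBuilder sb = new StringBuilder();
        Glyph prev = null;
        for (Glyph g : line) {
            if (prev != null && g.x - (prev.x + prev.width) > gapTolerance) {
                sb.append(' ');
            }
            sb.append(g.text);
            prev = g;
        }
        return sb.toString();
    }
}
```

Even this toy version needs two hand-picked tolerances, and it ignores
fonts, columns, rotation and tables, which is part of why the results of
such reconstruction are never perfect.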
To get the images, see ExtractImages.java and
PrintImageLocations.java. One would have to combine all of this, and
the result would still not look very close to the original PDF.
Tilman
On 20.03.2023 06:40, Vaishali Mahajan wrote:
Hi,
I am creating a PDF to Word conversion application using the PDFBox
.NET version. I am getting all the text from the PDF, but without
formatting. I want to preserve the formatting of the text as well as
all images when converting from PDF to Word. Please guide me.
Thanks