[Libreoffice-bugs] [Bug 151552] PDF import into writer messes up line justification

2022-11-04 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=151552

Heiko Tietze  changed:

What       | Removed | Added
Resolution | ---     | DUPLICATE
Status     | NEW     | RESOLVED

--- Comment #11 from Heiko Tietze  ---


*** This bug has been marked as a duplicate of bug 49705 ***

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Libreoffice-bugs] [Bug 151552] PDF import into writer messes up line justification

2022-10-24 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=151552

--- Comment #10 from V Stuart Foote  ---
(In reply to Kevin Suo from comment #9)
> ...

> I think we really need to have this ticket short and provide as much useful
> information as possible, otherwise devs would not finish reading this ticket
> and this one will never be fixed.

A little history from the poppler side...

https://bugs.freedesktop.org/show_bug.cgi?id=55977


[Libreoffice-bugs] [Bug 151552] PDF import into writer messes up line justification

2022-10-24 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=151552

Kevin Suo  changed:

What           | Removed     | Added
CC             |             | suokunl...@126.com
Status         | UNCONFIRMED | NEW
Ever confirmed | 0           | 1

--- Comment #9 from Kevin Suo  ---
To have "justified" text be justified in Writer's pdfimport as well, we need
to know how PDF specifies "justified alignment" as per the PDF specification.
If we can find the PDF token defining the justified alignment (there should be
one, but we need to read the PDF specification carefully to identify it), then
we can add a line of output to the (poppler-based) xpdfimport binary, and then
handle that in the so-called "emitting" process during import.

This is similar to how we handle bold, underline, etc. We read the PDF tokens;
if we encounter the PDF token specifying that the text should be justified,
then we apply that in our import process.
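
A rough illustration of the token-reading idea, not of how xpdfimport actually
works: as far as I can tell, PDF content streams carry no explicit "justified"
token, but they can set word spacing with the `Tw` operator, which is one place
justification may show up. The function name `word_spacings` and the sample
stream below are hypothetical, and the regex is deliberately naive (it does not
skip string literals, and a producer may position words individually instead of
using `Tw`).

```python
import re

# Matches a number immediately followed by the Tw (word spacing) operator,
# e.g. "1.75 Tw". Naive: it does not skip text inside (...) string literals.
_TW = re.compile(r"(-?\d*\.?\d+)\s+Tw\b")

def word_spacings(content_stream: str) -> list[float]:
    """Return every word-spacing value set via Tw, in stream order."""
    return [float(m.group(1)) for m in _TW.finditer(content_stream)]

# Hypothetical content stream fragment: one spaced-out line, one normal line.
sample = "BT /F1 12 Tf 1.75 Tw (A justified line of text) Tj 0 Tw (Last line) Tj ET"
print(word_spacings(sample))
```

A non-zero value on every line but the last would be a hint that the paragraph
was justified; the importer could emit that hint alongside the text run.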

I think we really need to have this ticket short and provide as much useful
information as possible, otherwise devs would not finish reading this ticket
and this one will never be fixed.

Could the irrelevant comments be tagged as "obsolete"?

I am setting this to NEW, as I see there is a bug here.


[Libreoffice-bugs] [Bug 151552] PDF import into writer messes up line justification

2022-10-16 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=151552

--- Comment #8 from Eyal Rozenberg  ---
(In reply to V Stuart Foote from comment #7)

> (In reply to Eyal Rozenberg from comment #6)
> No, please understand how our poppler based PDF import filtering functions.

I actually assumed everything you wrote in your post. I'm not that dense... :-)

But - it is irrelevant how the current filter works. Or rather, it's relevant
when evaluating whether or not a fix can be based on the current implementation
- it is not relevant for evaluating what the desired behavior is.

> PDF is not an editable format. 

First of all, of course it's an editable format. It's not _convenient_ to edit;
it expresses many things implicitly, sure; and still, it's editable.

... but I won't fall for the moving-of-the-goalposts you seem to be setting up
here. PDFs do not need to be editable to have editors. We've already described
what an editor does - and that does not require directly working on the format
it's an editor for. It is perfectly legitimate for an editor to
import-edit-export. gimp and Photoshop do that for most image formats, because
those are also not editing-friendly.

> We do not Edit PDFs. 

I told you I wouldn't fall for that. You might as well say "We do not edit
OOXMLs"... ok, sure, but LO is still a DOCX editor, and one of the better ones.


> Even for a document being "round-tripped" LibreOffice's import filter(s),
> using external poppler and poppler-utils libraries, extracts the content
> streams from the published presentation, and converts each stream into a
> discrete draw Shape object. 

This, at most, may mean that fixing this bug may require a lot of effort due
to the need for an alternative to the use of poppler (although - maybe not; I'm
not familiar with poppler's capabilities). Fine! I do not claim that this
issue should be the LO project's top priority. 

> PDF Viewers don't need to do more with the content streams--they simply
> parse them and lay them out as described in the postscript pages.

Indeed, PDF viewers have it easier, and don't have to reach structural
conclusions. A PDF import filter for a textual document editor needs to work
much harder, reconstituting structure, deducing features and styles etc.

I don't expect this to work perfectly for arbitrary PDFs. But I definitely
expect it to work well for the most straightforward of PDFs for us to import:
Paragraphs of text exported from LO Writer.

> Put another way it is not justified to expend dev, QA and design resources
> working on the PDF import filters when we offer exceptional fidelity for PDF
> content using the pdfium based insert filters.

But you know that's not what a PDF import filter is for. The PDF import filter
for Writer is for editing PDFs in Writer, and that's not at all provided by
pdfium. So, the existence of pdfium does not constitute an argument against
investing effort in improving the Writer PDF import filter.

In fact, I must say that you're taking a rather myopic view of the matter.
Think about the promotion of LO as a product! Especially vis-a-vis MS Office.
If you could tell the user "Someone sent you a document as a PDF? With LO, you
can edit it! Either make it your own by modifying the text or use Track Changes
to treat it as a draft for discussion." - that's very attractive functionality
that Microsoft doesn't offer. 


> And again, LibreOffice is *not* a PDF editor.

I commend your valiant (?) attempt to try to argue this point. Unfortunately,
your argument was based on the false premise that an editor for a file format
must be able to manipulate that format's internal structure directly.


[Libreoffice-bugs] [Bug 151552] PDF import into writer messes up line justification

2022-10-16 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=151552

V Stuart Foote  changed:

What      | Removed | Added
Component | Writer  | filters and storage


[Libreoffice-bugs] [Bug 151552] PDF import into writer messes up line justification

2022-10-16 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=151552

V Stuart Foote  changed:

What      | Removed             | Added
Component | filters and storage | Writer
CC        |                     | kha...@aliftype.com, qui...@gmail.com,
          |                     | t...@libreoffice.org, xiscofa...@libreoffice.org

--- Comment #7 from V Stuart Foote  ---
(In reply to Eyal Rozenberg from comment #6)
No, please understand how our poppler based PDF import filtering functions.

PDF is not an editable format. We do not Edit PDFs. A PDF viewer processor will
open and parse PDF stream content onto fully described (in postscript) pages.
And then manage display of those complete pages.

Even for a document being "round-tripped", LibreOffice's import filter(s), using
external poppler and poppler-utils libraries, extracts the content streams from
the published presentation, and converts each stream into a discrete draw Shape
object. 

The text runs in the PDF are just one of the content streams. Those discrete
text run content streams have no lexical details and are strictly glyph-based
snippets of text with font and character metrics that are then used to create
the draw Shape textboxes. The content stream includes a starting position on
the published page, and that is used to coarsely position the draw textbox on
the LO canvas. That is why the text runs are not rendered to the LO canvas as
"justified" and can exceed the LO canvas margins.

The mishandling of the RTL text was also a manifestation of the fact that the
content stream records text in the order it is recorded to the postscript
page. There are similar issues for complex text recorded to PDF with
/ActualText flag support.

PDF Viewers don't need to do more with the content streams--they simply parse
them and lay them out as described in the postscript pages.

And LibreOffice actually includes a PDF viewer processor--that is the pdfium
based ipdf filter used to insert PDF page as image.

Improving fidelity of filter imported draw Shapes to content on the source PDF
published page is out of scope for the project.

Put another way it is not justified to expend dev, QA and design resources
working on the PDF import filters when we offer exceptional fidelity for PDF
content using the pdfium based insert filters. Any "manipulation" of the
source PDF (e.g. page extraction, clipping, etc.) to prepare it for insertion
is best done external to LibreOffice.

And that is why I make the suggestion that perhaps it would be best just to
drop the functional poppler based PDF import filter from core LO deliverables.
It could then be packaged more effectively as an extension (where it
started in the Oracle OOo era).

And again, LibreOffice is *not* a PDF editor.


[Libreoffice-bugs] [Bug 151552] PDF import into writer messes up line justification

2022-10-16 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=151552

Eyal Rozenberg  changed:

What      | Removed | Added
Component | Writer  | filters and storage


[Libreoffice-bugs] [Bug 151552] PDF import into writer messes up line justification

2022-10-16 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=151552

--- Comment #6 from Eyal Rozenberg  ---
(In reply to V Stuart Foote from comment #5)
> LibreOffice is not a PDF editor.

I explained why it is, and you have not presented a counter-argument. Repeating
your statement without a counter-argument is effectively conceding the point. 

> Spacing of the text runs is something that can not be efficiently extracted
> from the PDF

If that were true, the PDF format would be useless and PDF viewers would not
work.

Also, it would be good enough if LO simply realized that it's seeing a
justified line, and formatted it accordingly (as after all, we can justify
single lines.) That might not result in exactly the same spacing as in the PDF
file, but it would be pretty close typically, and could be identical if the
file had originated in LO (and if the text box were sized appropriately). 
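
A minimal sketch of what "realizing that it's seeing a justified line" could
look like, assuming the importer already knows the left and right x extents of
each extracted line. The `Line` type, the `looks_justified` name and the
tolerance are hypothetical; this is a heuristic illustration, not the import
filter's logic.

```python
from dataclasses import dataclass

@dataclass
class Line:
    left: float   # x coordinate of the first glyph's left edge
    right: float  # x coordinate of the last glyph's right edge

def looks_justified(lines: list[Line], tol: float = 1.0) -> bool:
    """Heuristic: all lines start at the same x, and all lines except the
    last also end at the same x (both within tol units)."""
    if len(lines) < 3:
        return False  # too few lines to tell justified from left-aligned
    lefts = [ln.left for ln in lines]
    rights = [ln.right for ln in lines[:-1]]  # last line may end short
    return (max(lefts) - min(lefts) <= tol
            and max(rights) - min(rights) <= tol)

# Hypothetical extents, in points, for two three-line paragraphs.
justified = [Line(72, 523), Line(72, 523.4), Line(72, 310)]
ragged = [Line(72, 500), Line(72, 523), Line(72, 310)]
print(looks_justified(justified), looks_justified(ragged))
```

A first-line indent would defeat the left-edge check as written, so a real
detector would need to treat the first line separately.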

> When a user chooses to filter import a
> source PDF to LO, they *must* understand the content of the PDF is being
> extracted and constituent elements rendered as drawing Shapes to document
> canvas.

Why does it matter whether users "understand" that there's a bug? I'm not
following.

> It is time for UX and ESC to flatly state what the project will do regarding
> PDF source materials--up to and including *removal* of the PDF import filters to
> eliminate the misguided perception that LibreOffice is a PDF editor.

It's obvious you're trying to promote this agenda by pushing back against bug
reports on PDF import filters. That's not appropriate. PDF import into writer
is an officially supported feature. If you want to remove it - open an issue
about it (or actually, don't, since it's an important and useful feature);
certainly don't try to suppress requests to fix the import filter.


[Libreoffice-bugs] [Bug 151552] PDF import into writer messes up line justification

2022-10-16 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=151552

V Stuart Foote  changed:

What     | Removed | Added
CC       |         | libreoffice-ux-advise@lists.freedesktop.org
Keywords |         | needsUXEval

--- Comment #5 from V Stuart Foote  ---
Spacing of the text runs is something that cannot be efficiently extracted
from the PDF. IMHO this is NAB (not a bug) and => WF (WONTFIX).

LibreOffice is not a PDF editor. When a user chooses to filter import a source
PDF to LO, they *must* understand the content of the PDF is being extracted and
constituent elements rendered as drawing Shapes to document canvas: Draw by
default, or optionally Impress or Writer.

It is time for UX and ESC to flatly state what the project will do regarding
PDF source materials--up to and including *removal* of the PDF import filters
to eliminate the misguided perception that LibreOffice is a PDF editor.


[Libreoffice-bugs] [Bug 151552] PDF import into writer messes up line justification

2022-10-15 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=151552

Eyal Rozenberg  changed:

What     | Removed | Added
See Also |         | https://bugs.documentfoundation.org/show_bug.cgi?id=151554


[Libreoffice-bugs] [Bug 151552] PDF import into writer messes up line justification

2022-10-15 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=151552

--- Comment #4 from Eyal Rozenberg  ---
(In reply to V Stuart Foote from comment #3)
> Why is this a bug?

I believe you're being facetious here, but: A PDF imported into Writer should
(ignoring complex and esoteric PDF features) be rendered near-identically to
its rendering in a PDF viewer. Or phrased otherwise: Printing the PDF and
printing the Writer-imported PDF onto paper should result in almost-identical
printed documents.

If that seems too presumptuous, then at the very least that should hold for
PDFs created by exporting Writer documents (ignoring complex and esoteric
features).

In particular, if words are spaced out within a line in the PDF, they should be
spaced-out exactly, or almost-exactly the same way in the PDF-imported Writer
document.
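
To make "spaced out the same way" concrete, here is a small sketch, assuming
each word on a line is available as a start position and a width (for example
from the glyph metrics of the extracted text runs). The function names and the
sample numbers are hypothetical.

```python
def word_gaps(words: list[tuple[float, float]]) -> list[float]:
    """words: (x_start, width) for each word on one line, left to right.
    Returns the horizontal gap between each pair of neighbouring words."""
    return [nxt[0] - (cur[0] + cur[1]) for cur, nxt in zip(words, words[1:])]

def extra_spacing(words: list[tuple[float, float]], space_width: float) -> float:
    """Average gap beyond the font's normal space width; 0.0 if no gaps."""
    gaps = word_gaps(words)
    if not gaps:
        return 0.0
    return sum(gaps) / len(gaps) - space_width

# Hypothetical line: three words whose gaps are wider than a 3-unit space.
line = [(72.0, 30.0), (108.0, 40.0), (154.0, 20.0)]
print(word_gaps(line), extra_spacing(line, space_width=3.0))
```

An import that preserved these gaps, or at least noticed that they exceed the
normal space width, would reproduce the spacing seen in a PDF viewer.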

>  And I will restate the obvious LibreOffice is not a PDF editor! 

That is obvious...ly wrong: LibreOffice is a PDF editor.

Dictionary.com defines [1] editor as:

"A program used for writing and revising code, data, or text"

LO can open PDFs, make edits to the opened PDF, and save the result to a PDF.
It's a poor PDF editor, but considering the lack of FOSS alternatives, and the
fact that LO is installed so widely - it's the PDF editor of choice for many.

If PDF import filters - in particular for Writer, but for Draw as well - would
improve, LO could become a mediocre PDF editor.


> The import filter results in draw Shape textboxes not LO paragraphs.

That's an implementation detail. One could argue whether it's a good idea in
general for a Writer import filter, but regardless - implementation details are
not an excuse to mess up the import.

> The
> poppler based PDF import filter does not provide the spacings recorded into
> PDF.

So here's your bug.

> We use it to convert the text runs held in PDF into draw Shape
> objects--specifically textbox. The lack of "justified" filter import is
> expected and by design.

It's not what users expect, and if it was by design - the bug is in the design
of the import filter.

> We have pretty much the same result to canvas with the poppler based Impress
> filter.

Actually, that's not true, and I'll open a bug about the Draw filter
separately; but the difference is not a good one...

> If you need layout fidelity to the original PDF page, use the Insert PDF
> filter!

I need both layout fidelity and editability, and that's what the import filter
should provide.



[1] : https://www.dictionary.com/browse/editor


[Libreoffice-bugs] [Bug 151552] PDF import into writer messes up line justification

2022-10-15 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=151552

V Stuart Foote  changed:

What | Removed | Added
CC   |         | vstuart.fo...@utsa.edu

--- Comment #3 from V Stuart Foote  ---
Why is this a bug? And I will restate the obvious: LibreOffice is not a PDF
editor! 

The import filter results in draw Shape textboxes not LO paragraphs. The
poppler based PDF import filter does not provide the spacings recorded into
PDF. We use it to convert the text runs held in PDF into draw Shape
objects--specifically textbox. The lack of "justified" filter import is
expected and by design. We ignore any spacing in parsing out the text.

We have pretty much the same result to canvas with the poppler based Impress
filter.

The filters (there are two, separate but similar) only provide a workflow to
extract text runs (varying fidelity by script) and to merge them into one draw
Shape textbox (the 'Consolidate text' action). From there, you can select the
text stream to use, if needed, in a new LibreOffice paragraph object.

If you need layout fidelity to the original PDF page, use the Insert PDF
filter!

Otherwise accept the tool for what it provides--an ability to extract text runs
from PDF.


[Libreoffice-bugs] [Bug 151552] PDF import into writer messes up line justification

2022-10-15 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=151552

--- Comment #2 from Eyal Rozenberg  ---
Created attachment 183068
  --> https://bugs.documentfoundation.org/attachment.cgi?id=183068&action=edit
Screenshot of attachment 183066 imported into Writer

Screenshot showing how the alignment after import is left-only.


[Libreoffice-bugs] [Bug 151552] PDF import into writer messes up line justification

2022-10-15 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=151552

Eyal Rozenberg  changed:

What    | Removed                                      | Added
Summary | PDF import does messes up line justification | PDF import into writer messes up line justification
