[Libreoffice-bugs] [Bug 152661] "Hybrid PDF" must share embedded media between the ODT and the proper PDF

2023-01-22 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=152661

--- Comment #12 from Tomaz Vajngerl  ---
(In reply to V Stuart Foote from comment #10)
> Back to Unconfirmed then.
> 
> @quikee, are you offering to tackle it?

I'll try, but first I need to change the ODF document so that it is embedded
as a compatible PDF embedded file.

(In reply to Eyal Rozenberg from comment #11)
> So, that might actually be relevant even when the images don't originally
> come from a PDF.  Or do you mean you want to avoid re-encoding the images as
> object streams, even when not recompressing?

I mean the option in PDF export that re-compresses JPEG images to reduce their
resolution (DPI). Re-compressing would be problematic in this case, as we don't
want to mess with the original images in the ODF document.

> Do you feel this bug should focus just on images, leaving fonts for a
> separate bug report? Or is it close enough to keep them together in a single
> bug?

I think fonts would be way more messy and probably not worth the effort to
de-duplicate, so they are at least out of my scope. I would keep this bug for
images only, also because the document you refer to doesn't have fonts embedded
in the ODT file, but it does contain 20+ MB of images that can be de-duplicated.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Libreoffice-bugs] [Bug 152661] "Hybrid PDF" must share embedded media between the ODT and the proper PDF

2023-01-17 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=152661

--- Comment #11 from Eyal Rozenberg  ---
(In reply to Tomaz Vajngerl from comment #9)
> There are some issues like making sure we don't re-compress the images when
> saving them to PDF (disable that option with hybrid PDF) 

So, that might actually be relevant even when the images don't originally come
from a PDF.  Or do you mean you want to avoid re-encoding the images as object
streams, even when not recompressing?

> This probably wouldn't work for fonts, as PDF subsets the fonts, and normally
> the fonts also aren't included in the ODF. For a max compatibility option we
> could however embed the whole font into the PDF and do for fonts something
> similar to what we do for images.

Do you feel this bug should focus just on images, leaving fonts for a separate
bug report? Or is it close enough to keep them together in a single bug?

> I like this idea, because the smaller the overhead of hybrid PDF, the more
> likely it is that the user will use it.

:-)

[Libreoffice-bugs] [Bug 152661] "Hybrid PDF" must share embedded media between the ODT and the proper PDF

2023-01-17 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=152661

V Stuart Foote  changed:

   What|Removed |Added

   Severity|normal  |enhancement

[Libreoffice-bugs] [Bug 152661] "Hybrid PDF" must share embedded media between the ODT and the proper PDF

2023-01-17 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=152661

V Stuart Foote  changed:

   What|Removed |Added

 Resolution|WONTFIX |---
 Status|RESOLVED|UNCONFIRMED

--- Comment #10 from V Stuart Foote  ---
Back to Unconfirmed then.

@quikee, are you offering to tackle it?

[Libreoffice-bugs] [Bug 152661] "Hybrid PDF" must share embedded media between the ODT and the proper PDF

2023-01-16 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=152661

--- Comment #9 from Tomaz Vajngerl  ---
Having thought about it a bit, I don't think this is that hard to implement. We
don't really need to mess with the PDF structure - all we really need is to
extract all the images from the PDF (easily done with PDFium, I think), make
sure to preserve the image names (various solutions), and reconstruct the ODF
document before the filter reads it, then open the document normally.

When saving the hybrid PDF, we save the ODF normally, but just skip saving the
images.
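
The save/load round trip described above can be sketched roughly as follows.
This is only an illustration in Python, not LibreOffice code: it assumes the
ODF package is an ordinary ZIP file whose images live under "Pictures/", and
that the images have already been pulled out of the PDF by some other means
(for example PDFium) into a name-to-bytes mapping. The function names
strip_media and restore_media are hypothetical.

```python
import io
import zipfile

MEDIA_PREFIX = "Pictures/"  # assumption: images are stored under this folder


def strip_media(odt_bytes):
    """Save side: return (stripped_package, {name: data}) without the images."""
    media = {}
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as src, \
            zipfile.ZipFile(out, "w") as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename.startswith(MEDIA_PREFIX) and not info.is_dir():
                media[info.filename] = data  # would be kept only in the PDF
            else:
                dst.writestr(info, data)  # keeps entry order and compression
    return out.getvalue(), media


def restore_media(stripped_bytes, media):
    """Load side: rebuild a complete package before the ODF filter reads it."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(stripped_bytes)) as src, \
            zipfile.ZipFile(out, "w") as dst:
        for info in src.infolist():
            dst.writestr(info, src.read(info.filename))
        for name, data in media.items():
            dst.writestr(name, data)  # images recovered from the PDF
    return out.getvalue()
```

The sketch leaves META-INF/manifest.xml untouched, so the manifest still lists
the images; preserving the original image names is what lets the references in
content.xml resolve again after restore_media.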

There are some issues like making sure we don't re-compress the images when
saving them to PDF (disable that option with hybrid PDF) and making sure that
the images are all compatible; if they are not, we would duplicate them or do
something else.

This probably wouldn't work for fonts, as PDF subsets the fonts, and normally
the fonts also aren't included in the ODF. For a max compatibility option we
could however embed the whole font into the PDF and do for fonts something
similar to what we do for images.

I like this idea, because the smaller the overhead of hybrid PDF, the more
likely it is that the user will use it.

[Libreoffice-bugs] [Bug 152661] "Hybrid PDF" must share embedded media between the ODT and the proper PDF

2023-01-16 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=152661

V Stuart Foote  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |WONTFIX

--- Comment #8 from V Stuart Foote  ---
Not a bug and in no sense agreed to. Closing.

[Libreoffice-bugs] [Bug 152661] "Hybrid PDF" must share embedded media between the ODT and the proper PDF

2023-01-15 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=152661

Heiko Tietze  changed:

   What|Removed |Added

   Keywords|needsUXEval |
   CC|libreoffice-ux-advise@lists.freedesktop.org |heiko.tietze@documentfoundation.org

--- Comment #7 from Heiko Tietze  ---
UX input given, removing the keyword.

[Libreoffice-bugs] [Bug 152661] "Hybrid PDF" must share embedded media between the ODT and the proper PDF

2023-01-13 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=152661

--- Comment #6 from Eyal Rozenberg  ---
(In reply to V Stuart Foote from comment #5)
> The current two-way filters are efficient and functional--suited to our
> needs for Hybrid PDF. 

They're not efficient - they roughly double the amount of space necessary when
the embedded media is significantly larger than the rest of the document. Hence
this bug.
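
To make the "roughly double" claim concrete, here is a back-of-the-envelope
sketch in Python. It assumes the media is stored in the same encoded form in
both the PDF part and the embedded ODF (no re-compression), and the byte
counts in the example are made-up illustrative numbers, not measurements of
any real document. The function name hybrid_overhead is hypothetical.

```python
def hybrid_overhead(media_bytes, pdf_other_bytes, odf_other_bytes):
    """Ratio of a hybrid file that stores media twice to one that stores it once."""
    duplicated = 2 * media_bytes + pdf_other_bytes + odf_other_bytes
    shared = media_bytes + pdf_other_bytes + odf_other_bytes
    return duplicated / shared


# Illustrative only: 20 MB of images, 0.5 MB of other data on each side.
MB = 1024 * 1024
ratio = hybrid_overhead(20 * MB, MB // 2, MB // 2)
print(f"hybrid file is {ratio:.2f}x the size of a media-sharing hybrid")
```

The ratio approaches 2 as the media comes to dominate the file, and approaches
1 when the document contains almost no media.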


As for the rest of your comment...

Right now, the PDF import filter, upon noticing a PDF is a "hybrid PDF" - e.g.
by some field/tag in the trailer or xref table, I guess - chucks all of the PDF
and keeps the embedded ODF document. So, there's already some parsing going on
which results in a coherent ODF - although, granted, it's limited. Also, the
PDF export filter (whether it's a hybrid PDF or not) already packs elements
into multiple PDF object streams and creates xref entries for them. The change
I'm proposing is that, instead of media references in the ODF saying "the PNG
file named foo.png packed into this ODF", we will have references saying, oh,
maybe something like "the indirect object 12345 foopng within the PDF this ODF
is in".

Indeed, this means there will need to be more parsing. But - that's nothing
compared to the amount of work done when importing MSO files! It's basically at
the level of complexity of a regexp application.
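
As a rough illustration of how small that reference rewrite could be, here is
a Python sketch. ODF content normally points at packaged images with
xlink:href="Pictures/..."; the "pdfobj:<number>" form used as the replacement
is a purely hypothetical scheme invented for this example (neither ODF nor
LibreOffice defines it), and the mapping from image paths to PDF object
numbers is assumed to be supplied by the PDF export.

```python
import re

# Matches package-internal image references such as xlink:href="Pictures/foo.png".
HREF_RE = re.compile(r'xlink:href="(Pictures/[^"]+)"')


def point_refs_at_pdf(content_xml, object_ids):
    """Rewrite image hrefs to the HYPOTHETICAL 'pdfobj:<n>' scheme.

    object_ids maps package paths to PDF indirect object numbers. Paths
    without an entry are left untouched, i.e. they stay embedded in the ODF.
    """
    def repl(match):
        path = match.group(1)
        if path not in object_ids:
            return match.group(0)
        return 'xlink:href="pdfobj:%d"' % object_ids[path]

    return HREF_RE.sub(repl, content_xml)
```

Leaving unmapped paths alone gives a fallback for media that cannot be shared
with the PDF: those files simply remain duplicated inside the ODF.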

> However, refactoring PDF export filter to reliably embed ODF canvas
> internals as PDF object streams

I don't think I suggested doing that. I hope my last couple of paragraphs
illustrate what I mean.

> would be non-performant--which elements go
> where?  While the likely necessary use of /ActualText (as for bug 117428)
> tagging for *all* text runs

I'm only talking about media such as images, sound, video and arbitrary binary
files. I really think you've misunderstood my suggestions.

[Libreoffice-bugs] [Bug 152661] "Hybrid PDF" must share embedded media between the ODT and the proper PDF

2023-01-13 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=152661

--- Comment #5 from V Stuart Foote  ---
You are asking for PDF filter export to pack ODF-compliant elements into a PDF
object stream with individual xref entries, and to provide a reverse PDF filter
import that parses those same object streams back into coherent ODF-ready XML.

The current approach is a single export stream embedding a single ODF-compliant
document as a PDF object stream--the LibreOffice "Hybrid PDF". That is matched
by a filter import stream that reads the PDF xref table, recognizes the entry
for the LO-generated source ODF, and selectively parses that ODF stream rather
than the full PDF.

The current two-way filters are efficient and functional--suited to our needs
for Hybrid PDF. 

As noted in the See Also bug 95328, refactoring the export/import filters to
make the ODF a PDF "attachment" might make sense, as it would allow other PDF
viewers to recognize the attached ODF.

However, refactoring PDF export filter to reliably embed ODF canvas internals
as PDF object streams would be non-performant--which elements go where?  While
the likely necessary use of /ActualText (as for bug 117428) tagging for *all*
text runs would negate any potential size reduction of embedding ODF elements
as PDF object streams.

And then there would be filter requirements to be able to roundtrip--rather
than a single fully ODF-compliant source document, we would have to parse the
entire PDF xref table, identify where each PDF object needs to be placed (and
on which page), individually extract and hold each one, and then reassemble
them into some semblance of the original source ODF.

It could be done, obviously--but it is not advantageous to the project in any
sense to do so! Not an imperative, and certainly not worth the dev effort that
refactoring both PDF filters would require.

So again, NO!

[Libreoffice-bugs] [Bug 152661] "Hybrid PDF" must share embedded media between the ODT and the proper PDF

2023-01-13 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=152661

Eyal Rozenberg  changed:

   What|Removed |Added

 Status|RESOLVED|UNCONFIRMED
 Resolution|INVALID |---

--- Comment #4 from Eyal Rozenberg  ---
(In reply to Heiko Tietze from comment #3)
> While reducing the footprint is desired in general, we have to deal with the
> standards. And either all PDF readers change their implementation or ODF
> relies on the way PDF handles content, if we read embedded data from the
> alien format. So this is unfortunately not going to fly.

I believe you've misunderstood my suggestion. I'm suggesting that the PDF not
change at all, and remain perfectly valid regardless of the ODT tacked on to
it. It is the ODT which should be altered, so that instead of referring to
media within the ODT, it refers to media that's part of the PDF. If one then
saves the opened file to an ODT, it will be saved the "regular" way.

At worst, this would require some tweaking of how one refers to media in the
ODF format, to cover this use-case. At best - ODF already supports something
like this, and it's just a matter of using this support.

[Libreoffice-bugs] [Bug 152661] "Hybrid PDF" must share embedded media between the ODT and the proper PDF

2023-01-12 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=152661

Heiko Tietze  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #3 from Heiko Tietze  ---
We discussed the topic in the design meeting.

While reducing the footprint is desired in general, we have to deal with the
standards. And either all PDF readers change their implementation or ODF relies
on the way PDF handles content, if we read embedded data from the alien format.
So this is unfortunately not going to fly.

[Libreoffice-bugs] [Bug 152661] "Hybrid PDF" must share embedded media between the ODT and the proper PDF

2022-12-23 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=152661

--- Comment #2 from Eyal Rozenberg  ---
(In reply to V Stuart Foote from comment #1)
> Why?
> 
> Not sure there is any means to do this--PDF and ODF are radically different
> document formats.

Of course there is - don't just concatenate the ODF. Entwine aspects of it with
the PDF. Then, if you want a proper ODF document, you can extract it from the
hybrid format.

>  That PDF can "hold" a fully described ODF document in a
> LibreOffice "Hybrid PDF" is a nice means to deliver an editable PDF
> rendering of an ODF source document.

It's not "nice" - it doubles the size.

> The cost is duplicated content and increased size--the PDF is fully rendered
> by the PDF viewer, and LibreOffice can parse out the ODF stream for the
> source document.

You can have a fully renderable PDF, and data which can be reconstituted into
an ODF.

> Otherwise don't use it.

This workflow will be improved if it doesn't double the size.

[Libreoffice-bugs] [Bug 152661] "Hybrid PDF" must share embedded media between the ODT and the proper PDF

2022-12-23 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=152661

V Stuart Foote  changed:

   What|Removed |Added

   Keywords||needsUXEval
   CC||libreoffice-ux-advise@lists.freedesktop.org, vsfo...@libreoffice.org

--- Comment #1 from V Stuart Foote  ---
Why?

Not sure there is any means to do this--PDF and ODF are radically different
document formats.  That PDF can "hold" a fully described ODF document in a
LibreOffice "Hybrid PDF" is a nice means to deliver an editable PDF rendering
of an ODF source document.

The cost is duplicated content and increased size--the PDF is fully rendered by
the PDF viewer, and LibreOffice can parse out the ODF stream for the source
document.

If that is a workflow a user needs, great! It is functional. Accept the cost
(PDF size) and get on with it.

Otherwise don't use it.

Not a lot of reason to refactor or make any attempt at reducing the project's
Hybrid PDF size.

IMHO => INVALID

[Libreoffice-bugs] [Bug 152661] "Hybrid PDF" must share embedded media between the ODT and the proper PDF

2022-12-23 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=152661

Eyal Rozenberg  changed:

   What|Removed |Added

   See Also||https://bugs.documentfoundation.org/show_bug.cgi?id=95328

[Libreoffice-bugs] [Bug 152661] "Hybrid PDF" must share embedded media between the ODT and the proper PDF

2022-12-23 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=152661

Eyal Rozenberg  changed:

   What|Removed |Added

 Blocks||103378


Referenced Bugs:

https://bugs.documentfoundation.org/show_bug.cgi?id=103378
[Bug 103378] [META] PDF export bugs and enhancements