[Libreoffice-bugs] [Bug 76080] FILESAVE: URLs encoded into UTF-8 after saving HTML
https://bugs.documentfoundation.org/show_bug.cgi?id=76080 QA Administrators qa-ad...@libreoffice.org changed: What|Removed |Added Status|NEEDINFO|RESOLVED Resolution|--- |INVALID --- Comment #17 from QA Administrators qa-ad...@libreoffice.org --- Dear Bug Submitter, Please read this message in its entirety before proceeding. Your bug report is being closed as INVALID due to inactivity and a lack of information which is needed in order to accurately reproduce and confirm the problem. We encourage you to retest your bug against the latest release. If the issue is still present in the latest stable release, we need the following information (please ignore any that you've already provided): a) Provide details of your system including your operating system and the latest version of LibreOffice that you have confirmed the bug to be present b) Provide easy to reproduce steps – the simpler the better c) Provide any test case(s) which will help us confirm the problem d) Provide screenshots of the problem if you think it might help e) Read all comments and provide any requested information Once all of this is done, please set the bug back to UNCONFIRMED and we will attempt to reproduce the issue. Please do not: a) respond via email b) update the version field in the bug or any of the other details on the top section of FDO Message generated on: 2015-02-11 -- You are receiving this mail because: You are the assignee for the bug. ___ Libreoffice-bugs mailing list Libreoffice-bugs@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/libreoffice-bugs
[Libreoffice-bugs] [Bug 76080] FILESAVE: URLs encoded into UTF-8 after saving HTML
https://bugs.documentfoundation.org/show_bug.cgi?id=76080 --- Comment #18 from Stephan Bergmann sberg...@redhat.com --- (might be related to, or even a duplicate of, bug 76291, but hard to tell with the information given here)
[Libreoffice-bugs] [Bug 76080] FILESAVE: URLs encoded into UTF-8 after saving HTML
https://bugs.freedesktop.org/show_bug.cgi?id=76080 --- Comment #16 from Stephan Bergmann sberg...@redhat.com ---
Attachment 100862 is both broken and has problematic content:

For one, as comment 14 notes, the HTML file is labelled as charset=utf-8, but contains the raw bytes E8 9A F8 9E EC that do not constitute UTF-8. How has this broken file been generated?

For another, the file URL contained in the a link is problematic:

First, that file URL, as written in the HTML file, contains raw non-ASCII bytes (see above). How they should be interpreted when extracting the URL from the HTML file depends on the HTML file's encoding (UTF-8), but as noted above the file is broken and those bytes cannot be interpreted meaningfully. Different software in different scenarios (OS's locale settings, etc.) will likely respond in different ways when confronted with such broken input.

Second, even if the URL could meaningfully be extracted from the HTML file, it would contain non-ASCII bytes. URLs are written in a subset of ASCII. If a URL's payload (which is, roughly, a sequence of arbitrary byte values) is to contain values that are outside ASCII, they need to be escaped as %XX sequences. Again, different software in different scenarios (OS's locale settings, etc.) will likely respond in different ways when confronted with such broken input.

Third, even if the file URL's payload (i.e., a representation of a Windows pathname) could meaningfully be extracted, as it contains non-ASCII bytes, it would be unclear how to interpret it as an actual Windows pathname. Windows pathnames are basically sequences of (16-bit) UTF-16 code units. An alternative way to access pathnames is via the OS's selected 8-bit character set (like windows-1250 etc.), where Windows internally translates between that 8-bit character set and UTF-16. Some valid UTF-16 pathnames cannot be represented in certain 8-bit character sets, and the same 8-bit input sequence can denote different UTF-16 pathnames depending on the actually selected OS 8-bit character set. It is unspecified how (encodings of) non-ASCII bytes in a file URL's payload are to be interpreted on Windows, but general consensus appears to be to interpret them according to the OS's selected 8-bit character set (all the shortcomings of that approach notwithstanding). That, again, means that software in different scenarios (i.e., OS's locale settings) will likely respond in different ways when confronted with such problematic input.
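The observations in comment 16 can be checked with a short sketch. The Python below is illustrative only and comes neither from the bug report nor from LibreOffice's code. The names `RAW` and `is_valid_utf8` are hypothetical, and reading the bytes as windows-1250 is an assumption based on the reporter's mention of that charset. It shows that the five bytes are not valid UTF-8, that the same bytes denote different text under windows-1250 and windows-1252, and what the resulting characters look like once escaped as %XX sequences.

```python
from urllib.parse import quote

# The raw bytes quoted in comment 16.
RAW = bytes([0xE8, 0x9A, 0xF8, 0x9E, 0xEC])


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The same 8-bit sequence denotes different text under different code pages.
as_cp1250 = RAW.decode("cp1250")  # assumed charset of the reporter's system
as_cp1252 = RAW.decode("cp1252")  # a Western European system for comparison

# Non-ASCII payload must be escaped as %XX; the result depends on which
# byte encoding is chosen for the payload before escaping.
escaped_utf8 = quote(as_cp1250, encoding="utf-8")
escaped_cp1250 = quote(as_cp1250, encoding="cp1250")

if __name__ == "__main__":
    print("valid UTF-8:", is_valid_utf8(RAW))
    print("windows-1250:", as_cp1250)
    print("windows-1252:", as_cp1252)
    print("escaped (UTF-8 payload):", escaped_utf8)
    print("escaped (windows-1250 payload):", escaped_cp1250)
```

The two escaped forms differ even though they stand for the same text, which is why a consumer that guesses the wrong payload encoding resolves the link to a different (usually nonexistent) pathname.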
[Libreoffice-bugs] [Bug 76080] FILESAVE: URLs encoded into UTF-8 after saving HTML
https://bugs.freedesktop.org/show_bug.cgi?id=76080 --- Comment #15 from QA Administrators qa-ad...@libreoffice.org --- Dear Bug Submitter, This bug has been in NEEDINFO status with no change for at least 6 months. Please provide the requested information as soon as possible and mark the bug as UNCONFIRMED. Due to regular bug tracker maintenance, if the bug is still in NEEDINFO status with no change in 30 days the QA team will close the bug as INVALID due to lack of needed information. For more information about our NEEDINFO policy please read the wiki located here: https://wiki.documentfoundation.org/QA/FDO/NEEDINFO If you have already provided the requested information, please mark the bug as UNCONFIRMED so that the QA team knows that the bug is ready to be confirmed. Thank you for helping us make LibreOffice even better for everyone! Warm Regards, QA Team Message generated on: 10/01/2015
[Libreoffice-bugs] [Bug 76080] FILESAVE: URLs encoded into UTF-8 after saving HTML
https://bugs.freedesktop.org/show_bug.cgi?id=76080 --- Comment #14 from Urmas davian...@gmail.com --- That file is not in UTF-8.
[Libreoffice-bugs] [Bug 76080] FILESAVE: URLs encoded into UTF-8 after saving HTML
https://bugs.freedesktop.org/show_bug.cgi?id=76080 --- Comment #11 from Tomáš Tunkl tun...@gmail.com --- I think that is the catch. In my case it keeps windows-1250 and converts the link to UTF-8 strings. I think it has something to do with the default charset. Here is a screen recording: https://www.youtube.com/watch?v=4x0FRHkCNzQ
[Libreoffice-bugs] [Bug 76080] FILESAVE: URLs encoded into UTF-8 after saving HTML
https://bugs.freedesktop.org/show_bug.cgi?id=76080 --- Comment #12 from Tomáš Tunkl tun...@gmail.com --- Created attachment 100862 -- https://bugs.freedesktop.org/attachment.cgi?id=100862&action=edit File before editing in LO (utf-8)
[Libreoffice-bugs] [Bug 76080] FILESAVE: URLs encoded into UTF-8 after saving HTML
https://bugs.freedesktop.org/show_bug.cgi?id=76080 --- Comment #13 from Tomáš Tunkl tun...@gmail.com --- Hello again. I have tried to reproduce the problem on more machines, and I have found that the bigger problem occurs when the HTML file is encoded in UTF-8 before editing in LO. On most machines LO changes the encoding to windows-1250 and converts links to UTF-8 strings. On some PCs (like mine) it does so even with HTML files encoded in windows-1250 from the beginning. Most PCs in fact keep the links OK, as you stated. I have tested this issue on the following setups: Win 7 LO (3.5.4.2); Win XP LO (3.3.2); Win 7 LO (3.4.4); Win 7 LO (4.2.4.2). So I'm sorry, my initial file "File before editing in LO" was probably a bad example. Please try the newly uploaded "File before editing in LO (utf-8)", as this one should reproduce the problem better.
[Libreoffice-bugs] [Bug 76080] FILESAVE: URLs encoded into UTF-8 after saving HTML
https://bugs.freedesktop.org/show_bug.cgi?id=76080 Tomáš Tunkl tun...@gmail.com changed: What|Removed |Added Version|unspecified |4.2.1.1 release
[Libreoffice-bugs] [Bug 76080] FILESAVE: URLs encoded into UTF-8 after saving HTML
https://bugs.freedesktop.org/show_bug.cgi?id=76080 --- Comment #3 from Tomáš Tunkl tun...@gmail.com --- Created attachment 100812 -- https://bugs.freedesktop.org/attachment.cgi?id=100812&action=edit Test target file
[Libreoffice-bugs] [Bug 76080] FILESAVE: URLs encoded into UTF-8 after saving HTML
https://bugs.freedesktop.org/show_bug.cgi?id=76080 --- Comment #5 from Tomáš Tunkl tun...@gmail.com --- Created attachment 100814 -- https://bugs.freedesktop.org/attachment.cgi?id=100814&action=edit File after edit in LO
[Libreoffice-bugs] [Bug 76080] FILESAVE: URLs encoded into UTF-8 after saving HTML
https://bugs.freedesktop.org/show_bug.cgi?id=76080 --- Comment #4 from Tomáš Tunkl tun...@gmail.com --- Created attachment 100813 -- https://bugs.freedesktop.org/attachment.cgi?id=100813&action=edit File before editing in LO
[Libreoffice-bugs] [Bug 76080] FILESAVE: URLs encoded into UTF-8 after saving HTML
https://bugs.freedesktop.org/show_bug.cgi?id=76080 --- Comment #6 from Tomáš Tunkl tun...@gmail.com --- I have updated to the latest stable release 4.2.4.2 and it does the same.
[Libreoffice-bugs] [Bug 76080] FILESAVE: URLs encoded into UTF-8 after saving HTML
https://bugs.freedesktop.org/show_bug.cgi?id=76080 --- Comment #7 from Tomáš Tunkl tun...@gmail.com --- Created attachment 100815 -- https://bugs.freedesktop.org/attachment.cgi?id=100815&action=edit Visual code comparison
[Libreoffice-bugs] [Bug 76080] FILESAVE: URLs encoded into UTF-8 after saving HTML
https://bugs.freedesktop.org/show_bug.cgi?id=76080 --- Comment #8 from Jay Philips philip...@hotmail.com --- Hi Tomas, I did some testing, and this issue appears to be Windows-only: on Linux, LibreOffice automatically changes the charset to UTF-8. On Windows, when you edit the link, first click on the 'web' tab rather than editing it in the 'document' tab, and the link should be fine.
[Libreoffice-bugs] [Bug 76080] FILESAVE: URLs encoded into UTF-8 after saving HTML
https://bugs.freedesktop.org/show_bug.cgi?id=76080 --- Comment #9 from Tomáš Tunkl tun...@gmail.com --- Hello. It happens not only when editing the link. You just open the document and change, e.g., the font size, and all the links in the document are transformed into UTF-8 strings.
[Libreoffice-bugs] [Bug 76080] FILESAVE: URLs encoded into UTF-8 after saving HTML
https://bugs.freedesktop.org/show_bug.cgi?id=76080 --- Comment #10 from Jay Philips philip...@hotmail.com --- Created attachment 100838 -- https://bugs.freedesktop.org/attachment.cgi?id=100838&action=edit modified version of the before edit in LibreOffice html file (In reply to comment #9) "Hello. It happens not only when editing the link. You just open the document and change f.e. font size and all the links in the document transform into UTF-8 strings." Hi. I tried to reproduce TestHTML_after_edit_LO.html from TestHTML_before_edit_LO.html. The only change between the two files was that you modified the URL from an absolute location on your hard disk to a relative location within the same directory, and I confirm that it worked in IE. You stated that if I did more things to the file, it would all turn into UTF-8 strings, so I made further changes: I added text, changed the font, and added another link. Everything went well for me, and the file could still be opened in IE. I did notice that LibreOffice in my case changed the charset to windows-1252 and changed a number of the characters to their '#' numeric equivalents.
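The '#' numeric equivalents mentioned in comment 10 are HTML numeric character references, which an exporter can fall back to when the target charset has no byte for a character. The Python sketch below is a hedged illustration of that general mechanism, not a description of how LibreOffice's HTML export is implemented; the function name `save_as` and the sample text are hypothetical.

```python
import html


def save_as(text: str, charset: str) -> bytes:
    """Encode text, replacing unencodable characters with &#NNN; references."""
    return text.encode(charset, errors="xmlcharrefreplace")


# Hypothetical sample: Czech letters, only some of which exist in windows-1252.
sample = "čšřžě"
saved = save_as(sample, "cp1252")

# A consumer that honours the declared charset and resolves the references
# recovers the original text.
restored = html.unescape(saved.decode("cp1252"))

if __name__ == "__main__":
    print(saved)
    print(restored == sample)
```

This fallback is lossless for HTML text content, which would be consistent with the document body surviving the charset change in comment 10.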
[Libreoffice-bugs] [Bug 76080] FILESAVE: URLs encoded into UTF-8 after saving HTML
https://bugs.freedesktop.org/show_bug.cgi?id=76080 Jay Philips philip...@hotmail.com changed: What|Removed |Added Status|UNCONFIRMED |NEW CC||philip...@hotmail.com Ever confirmed|0 |1 --- Comment #2 from Jay Philips philip...@hotmail.com --- Hi Tomas, Which version of LibreOffice did you try this on? I tried 4.0, 4.1 and 4.2 and didn't get the output you mentioned on Ask LibreOffice.
[Libreoffice-bugs] [Bug 76080] FILESAVE: URLs encoded into UTF-8 after saving HTML
https://bugs.freedesktop.org/show_bug.cgi?id=76080 Jay Philips philip...@hotmail.com changed: What|Removed |Added Status|NEW |NEEDINFO
[Libreoffice-bugs] [Bug 76080] FILESAVE: URLs encoded into UTF-8 after saving HTML
https://bugs.freedesktop.org/show_bug.cgi?id=76080 Tomáš Tunkl tun...@gmail.com changed: What|Removed |Added Hardware|Other |All Version|4.1.0.4 release |unspecified
[Libreoffice-bugs] [Bug 76080] FILESAVE: URLs encoded into UTF-8 after saving HTML
https://bugs.freedesktop.org/show_bug.cgi?id=76080 Stephan Bergmann sberg...@redhat.com changed: What|Removed |Added CC||sberg...@redhat.com --- Comment #1 from Stephan Bergmann sberg...@redhat.com --- Looks like conversion of file URLs between LO's internal (path payload always in UTF-8) and external (path payload according to platform expectations) representations does not happen for HTML (im-?)/export. (The example at http://ask.libreoffice.org/en/question/31061/can-url-encoding-be-disabled/ apparently involves file URLs, albeit relative ones, which is an extra challenge for the conversion between internal and external, which bases its work on the scheme of a (necessarily absolute) URI.)
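To make the internal/external distinction in comment 1 concrete, here is a minimal Python sketch of such a conversion. It is an assumption-laden illustration and not LibreOffice's actual code: the function name `internal_to_external`, the default `platform_charset` of windows-1250, and the sample URLs are all hypothetical. It also shows why relative references are the harder case: without a scheme, the function cannot tell that the reference is a file URL and leaves it untouched.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def internal_to_external(url: str, platform_charset: str = "cp1250") -> str:
    """Re-escape a file URL's path from a UTF-8 payload to a platform charset.

    Anything that is not an absolute file: URL is returned unchanged, because
    the decision is based on the scheme, which a relative reference lacks.
    """
    parts = urlsplit(url)
    if parts.scheme != "file":
        return url
    # Internal form: %XX escapes encode the UTF-8 bytes of the path.
    path = unquote(parts.path, encoding="utf-8", errors="strict")
    # External form: %XX escapes encode the platform's 8-bit charset instead.
    external_path = quote(path, safe="/:", encoding=platform_charset)
    return urlunsplit(parts._replace(path=external_path))


if __name__ == "__main__":
    # Absolute file URL: the payload is converted.
    print(internal_to_external("file:///C:/%C4%8D.html"))
    # Relative reference: no scheme, so nothing is converted.
    print(internal_to_external("%C4%8D.html"))
```

A real implementation would presumably resolve relative references against the document's base URL before deciding, which is the extra step the comment alludes to.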