[Libreoffice-bugs] [Bug 91192] AutoCorrect: Writer not recognizing a URL's trailing carat, hash mark, question mark, backslash, or pipe

2023-08-31 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=91192

--- Comment #19 from tylergloria  ---
The special characters you mentioned, such as carets (^), hash marks (#),
question marks (?), and backslashes (\), are sometimes used in URLs for specific
purposes.
This can lead to issues when the URL contains these characters at the end.
See https://www.ietf.org/rfc/rfc1738.txt for details.
The best way to dig into this is to use browser developer tools to inspect
network requests and redirects, or an online tool such as
https://redirectchecker.com/ . This can help you get the detailed redirection
chain and its status codes.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Libreoffice-bugs] [Bug 91192] AutoCorrect: Writer not recognizing a URL's trailing carat, hash mark, question mark, backslash, or pipe

2021-11-04 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=91192

himajin100...@gmail.com changed:

   What     | Removed | Added
   See Also |         | https://bugs.documentfoundation.org/show_bug.cgi?id=69599


[Libreoffice-bugs] [Bug 91192] AutoCorrect: Writer not recognizing a URL's trailing carat, hash mark, question mark, backslash, or pipe

2021-04-25 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=91192

himajin100...@gmail.com changed:

   What     | Removed | Added
   See Also |         | https://bugs.documentfoundation.org/show_bug.cgi?id=141894

-- 
You are receiving this mail because:
You are the assignee for the bug.___
Libreoffice-bugs mailing list
Libreoffice-bugs@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/libreoffice-bugs


[Libreoffice-bugs] [Bug 91192] AutoCorrect: Writer not recognizing a URL's trailing carat, hash mark, question mark, backslash, or pipe

2021-03-18 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=91192

himajin100...@gmail.com changed:

   What     | Removed | Added
   See Also |         | https://bugs.documentfoundation.org/show_bug.cgi?id=141104



[Libreoffice-bugs] [Bug 91192] AutoCorrect: Writer not recognizing a URL's trailing carat, hash mark, question mark, backslash, or pipe

2021-02-06 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=91192

himajin100...@gmail.com changed:

   What     | Removed | Added
   See Also |         | https://bugs.documentfoundation.org/show_bug.cgi?id=113526



[Libreoffice-bugs] [Bug 91192] AutoCorrect: Writer not recognizing a URL's trailing carat, hash mark, question mark, backslash, or pipe

2021-02-05 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=91192

Heiko Tietze  changed:

   What     | Removed                                     | Added
   Keywords | needsUXEval                                 |
   CC       | libreoffice-ux-advise@lists.freedesktop.org | heiko.tietze@documentfoundation.org

--- Comment #18 from Heiko Tietze  ---
So the question mark should be included by the algorithm, but the hash mark is
classified as unsafe by the RFC. Including anything else is a "must not".

Users probably do not understand the algorithm easily (but know the
"workaround"). The documentation should explain what happens and why.



[Libreoffice-bugs] [Bug 91192] AutoCorrect: Writer not recognizing a URL's trailing carat, hash mark, question mark, backslash, or pipe

2021-02-04 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=91192

Stephan Bergmann  changed:

   What | Removed | Added
   CC   |         | sberg...@redhat.com

--- Comment #17 from Stephan Bergmann  ---
The code that guesses which part of a larger text shall be auto-detected as a
URI is URIHelper::FindFirstURLInText (svl/source/misc/urihelper.cxx, containing
detailed documentation).  Of necessity, it needs to apply some heuristics, and,
also of necessity, the algorithm's outcome will not necessarily match any given
user's exact expectations.  That said:

(In reply to sdc.blanco from comment #12)
> Asking for UXEval:  Two questions.
> 
> 1.  Is it a considered a "bug" a potential URL that ends with #  (or ?) does
> not include the # (or ?) in the URL recognition?
> 
> (but, as noted, no problem if text follows # or ? )

Especially with "?" (and similar to e.g. "," and "."), the heuristics
conservatively try to avoid including trailing punctuation (for which it is
assumed that it was not meant to be part of the URI).

> 2.  Is it a problem that the three characters:  ^ | \ are not recognized as
> part of a URL (and URL recognition stops with these characters)?
> 
> Relevant to note that these three characters are considered "unsafe" and
> should have percent-encoding ( https://www.ietf.org/rfc/rfc1738.txt )

That's not a "should" but a "must".  None of those three characters can appear
in a URI as-is; they always need to be percent-encoded.  The heuristics used
generally do not assume that a character that cannot appear in a URI would form
part of a to-be-detected URI.
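
The two rules described in this comment (stop at a character that cannot
appear in a URI, then drop trailing punctuation) can be sketched roughly as
follows. This is only an illustration in Python, not the actual
URIHelper::FindFirstURLInText code; the names find_first_url, CANDIDATE and
TRAILING, the restriction to http/https, and the exact set of trimmed
characters are all assumptions made for the example.

```python
import re

# Characters that RFC 3986 allows to appear literally in a URI
# (unreserved, reserved, and "%" for percent-encoded octets).
URI_CHARS = r"A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%"

# Hypothetical candidate pattern: an http(s) scheme followed by allowed characters.
# The match ends at the first character outside the set, e.g. ^ | \ or a space.
CANDIDATE = re.compile(r"https?://[" + URI_CHARS + r"]+")

# Trailing characters assumed to be sentence punctuation rather than URI content.
TRAILING = ".,;:!?#"


def find_first_url(text):
    """Return the first URL-like span in text, or None (illustrative only)."""
    match = CANDIDATE.search(text)
    if match is None:
        return None
    # Conservatively drop trailing punctuation from the candidate.
    return match.group(0).rstrip(TRAILING)
```

With this sketch, "http://example.com/directory?testing" is kept whole,
"http://example.com/directory?" loses its final "?", and
"http://example.com/directory^testing" stops after "directory", which matches
the behaviour reported in comment #12.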



[Libreoffice-bugs] [Bug 91192] AutoCorrect: Writer not recognizing a URL's trailing carat, hash mark, question mark, backslash, or pipe

2021-02-02 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=91192

Guilhem Moulin  changed:

   What | Removed             | Added
   CC   | guil...@fripost.org |

--- Comment #16 from Guilhem Moulin  ---
My 2¢: Browsers have it much easier, since whatever is entered into the URL
bar is assumed to be a URL: the browser can apply whatever heuristics it likes
to turn the *entire string* into a valid RFC-compliant URL.  Trying to mimic
that logic in LO, with unclear boundaries and arbitrary text, will lead to false
positives.  But what do I know.  Removing myself from CC since I'm involved in
neither the decision nor the implementation.



[Libreoffice-bugs] [Bug 91192] AutoCorrect: Writer not recognizing a URL's trailing carat, hash mark, question mark, backslash, or pipe

2021-02-02 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=91192

--- Comment #15 from Nick Levinson  ---
The backslash should be accepted for another reason: If I type a URL with an
incorrect backslash directly into certain browsers, the browser changes the
incorrect backslash into a correct slash. Examples in Firefox 84.0.2 (64-bit):
http:\\slashsleep.com loads http://slashsleep.com/ and
http://slashsleep.com\3\you-will-sleep\1\sleep-always-wins.html loads
http://slashsleep.com/3/you-will-sleep/1/sleep-always-wins.html (that's my
website and I don't have an alias or redirection set up for the backslashes so
either the browser or the hosting server is doing it for all URLs).
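
For reference, the WHATWG URL Standard that browsers follow treats a backslash
like a forward slash in http and https URLs, so the rewriting most likely
happens in the browser. A simplified sketch of that rewrite in Python is shown
below; the function name normalize_backslashes is hypothetical, and unlike a
real browser it replaces every backslash (including any in a query or
fragment) and does not append the trailing slash after the host.

```python
def normalize_backslashes(url):
    """Replace backslashes with forward slashes in an http(s) URL."""
    # Assumption: only http/https inputs are rewritten; other text is returned as-is.
    if url.lower().startswith(("http:", "https:")):
        return url.replace("\\", "/")
    return url


print(normalize_backslashes(r"http:\\slashsleep.com"))
```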

This fails but shouldn't: http://example.com?age=293 . However, this is
properly hyperlinked in LO Writer: https://example.com/?age=293 . The sole
difference is in the slash after the TLD; I'm not sure if a server could be
configured to accept the slashless version, so LO should hyperlink it, just in
case.

I favor recognizing characters that are questionable in URLs on the same
principle that early on applied to emailing: be strict in what you send but
generous in what you accept. LO should generously recognize a typist's text as
a URL with the boundaries being spaces or angle brackets. The worst that can
happen is failing to arrive at the URL when clicked and even that can be
corrected in the browser's address bar, which is easier for nongeeks than
figuring out what should have been in the URL in the LO document. This example
uses a nonexistent TLD and yet is generously hyperlinked as a URL by LO Writer:
http://google.quibble

Parentheses, square brackets, and pipes (unfamiliar to me as a URL boundary but
here accepted arguendo) can be identified as URL boundaries if they appear
spacelessly both before and after the string that otherwise is a URL. Examples:
(example.com), [ftp://example.com], and |example.com| . However, spacelessness
must be at both ends; if it's at only one end, I don't know exactly what should
be hyperlinked.
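
The "spaceless at both ends" rule could be sketched as below. This is a
Python illustration of the proposal only, not existing LibreOffice behaviour;
PAIRS and strip_enclosing are hypothetical names, and the token is assumed to
have already been split out on whitespace.

```python
# Opening delimiter -> closing delimiter that must appear at the other end.
PAIRS = {"(": ")", "[": "]", "|": "|", "<": ">"}


def strip_enclosing(token):
    """Remove one pair of matching delimiters wrapped directly around token."""
    if len(token) >= 2 and PAIRS.get(token[0]) == token[-1]:
        return token[1:-1]
    # A delimiter at only one end is left alone, since that case is undecided.
    return token
```

Applied to the examples above, (example.com), [ftp://example.com] and
|example.com| all lose their delimiters, while a token such as (example.com
with only an opening parenthesis is returned unchanged.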

Angle brackets are already known to be boundaries. While 
properly hyperlinks in LO without hyperlinking the angle brackets,
 does not hyperlink in LO, but should.

A comma following a URL's directory, file, query, fragment, or slash should be
treated as part of the URL because the host's server might recognize it. But a
comma-and-space following an apparent TLD should be treated as not part of the
URL, although it's too burdensome to have LO check if a domain label is a known
or actually proposed TLD listed at iana.org or icann.org.

If a URL ends with a TLD, it may be followed by a period or not without
changing the URL. (I forgot which RFC says so.)



[Libreoffice-bugs] [Bug 91192] AutoCorrect: Writer not recognizing a URL's trailing carat, hash mark, question mark, backslash, or pipe

2021-02-01 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=91192

--- Comment #14 from Guilhem Moulin  ---
(In reply to Heiko Tietze from comment #13)
> With the pipe breaking the hyperlink it is clearly a bug to me.

I tend to disagree: any compliant URL parser would stop there as well.

> Not sure if all the other characters are proper URLs, but why should 
> LibreOffice guard the web? Would take everything until the next white space 
> (all kind of spaces, tab, cr) into the URL.

How about URLs enclosed in parentheses, square/angle brackets, or even pipes?
Formatted URLs shouldn't include the markers.  How about punctuation following
a URL?  IMHO it makes sense to greedily parse URLs following RFC 3986/3987 and
stick to that.
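
To make the concern concrete, here is a minimal Python sketch of the
"everything until the next white space" rule quoted above. The function name
url_until_whitespace and the restriction to http/https are assumptions for the
example; it is not code from LibreOffice.

```python
import re


def url_until_whitespace(text):
    """Return the first http(s) span up to the next whitespace, or None."""
    match = re.search(r"https?://\S+", text)
    return match.group(0) if match else None


print(url_until_whitespace("(see http://example.com/page), then"))
# prints: http://example.com/page),
```

The closing parenthesis and the comma end up inside the link, which is the
kind of false positive a whitespace-only boundary produces.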



[Libreoffice-bugs] [Bug 91192] AutoCorrect: Writer not recognizing a URL's trailing carat, hash mark, question mark, backslash, or pipe

2021-02-01 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=91192

Heiko Tietze  changed:

   What | Removed | Added
   CC   |         | caol...@redhat.com, guil...@fripost.org, ke...@collabora.com

--- Comment #13 from Heiko Tietze  ---
With the pipe breaking the hyperlink it is clearly a bug to me. Not sure if all
the other characters are proper URLs, but why should LibreOffice guard the web?
I would take everything until the next white space (all kinds of spaces, tab,
cr) into the URL.

> Relevant to note that these three characters are considered "unsafe" and
> should have percent-encoding ( https://www.ietf.org/rfc/rfc1738.txt )

Adding people with more expertise to get opinions. Btw, the issue is also
relevant for LibreOffice Online.



[Libreoffice-bugs] [Bug 91192] AutoCorrect: Writer not recognizing a URL's trailing carat, hash mark, question mark, backslash, or pipe

2021-01-31 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=91192

sdc.bla...@youmail.dk changed:

   What|Removed |Added

   See Also |         | https://bugs.documentfoundation.org/show_bug.cgi?id=84449



[Libreoffice-bugs] [Bug 91192] AutoCorrect: Writer not recognizing a URL's trailing carat, hash mark, question mark, backslash, or pipe

2021-01-31 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=91192

sdc.bla...@youmail.dk changed:

   What     | Removed | Added
   Keywords |         | needsUXEval
   CC       |         | libreoffice-ux-advise@lists.freedesktop.org, sdc.bla...@youmail.dk
   Blocks   |         | 103341
   Summary  | Writer not recognizing a URL's trailing carat, hash mark, question mark, backslash, or pipe | AutoCorrect: Writer not recognizing a URL's trailing carat, hash mark, question mark, backslash, or pipe

--- Comment #12 from sdc.bla...@youmail.dk ---
With AutoCorrect "URL Recognition" [T] and Tools > AutoCorrect > While Typing
enabled.

Can reproduce all examples shown in attachment 156589 using 7.2.0.0.alpha0+

Additional Information:

1. If additional text follows # or ?, then there is URL recognition

http://example.com/directory#testing  
http://example.com/directory?testing  

Both these examples are recognized as URLs.

2. For ^ | \

URL conversion stops with these characters, even if additional text is appended
to them. 

e.g.,  http://example.com/directory^testing  (URL stops at 'y' in directory)


Asking for UXEval:  Two questions.

1.  Is it considered a "bug" that a potential URL that ends with # (or ?) does
not include the # (or ?) in the URL recognition?

(but, as noted, no problem if text follows # or ? )

2.  Is it a problem that the three characters:  ^ | \ are not recognized as
part of a URL (and URL recognition stops with these characters)?

Relevant to note that these three characters are considered "unsafe" and should
have percent-encoding ( https://www.ietf.org/rfc/rfc1738.txt )

Could consider an enhancement request to percent-encode ^ | \ as part of URL
Recognition.
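
As an illustration of what such percent-encoding would produce, here is a
short Python sketch using the standard library's urllib.parse.quote. The
function name encode_unsafe and the choice of characters passed through
unchanged are assumptions for the example, not a proposal for the actual
implementation.

```python
from urllib.parse import quote


def encode_unsafe(url):
    """Percent-encode ^, | and \\ while leaving other URI characters alone."""
    # `safe` lists characters that quote() must NOT encode; it is an assumption
    # here that every RFC 3986 reserved character and "%" pass through unchanged.
    return quote(url, safe=":/?#[]@!$&'()*+,;=%")


print(encode_unsafe("http://example.com/directory^testing"))
# prints: http://example.com/directory%5Etesting
```

The pipe becomes %7C and the backslash becomes %5C in the same way.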


Referenced Bugs:

https://bugs.documentfoundation.org/show_bug.cgi?id=103341
[Bug 103341] [META] AutoCorrect and Word Completion bugs and enhancements