Re: [O] Bug: text export and multi-word link descriptions with line breaks
Mathias Bauer writes:

> I expect Org to do the following steps while parsing the source
> text:
>
> 1. "Normalize" or clean the link description, i.e. remove any
>    newlines, leading and trailing spaces, and replace any
>    occurrences of "[ \t]+" in the interior by a single space
>    only. (To be done.)
>
> 2. Check the tuple (description,target) for duplicates and drop
>    them. (Seems ok to me.)
>
> 3. Below the paragraph list the tuples as "[description] target"
>    in the order of occurrence in the original text. (Also seems
>    ok to me.)
>
> I hope this makes this issue a little bit more clear now.

Indeed. I missed the duplicate links. This should be fixed. Thank you
for the report.

Regards,

--
Nicolas Goaziou
Re: [O] Bug: text export and multi-word link descriptions with line breaks
Hello Nicolas,

* Nicolas Goaziou wrote on 2014-04-03 at 17:25 (+0200):
> Mathias Bauer writes:
>
> > I just stumbled over Org's plain text export and how it works on
> > links with descriptions consisting of multiple words and line
> > breaks between them. I'm running Org stable version 8.2.5h.
> >
> > Org source (spaces at the end of line 1 and 2 don't matter):
> >
> > snip
> > "OpenPGP Message Format" ([[https://tools.ietf.org/html/rfc4880][RFC
> > 4880]] which obsoletes [[https://tools.ietf.org/html/rfc1991][RFC
> > 1991]] and [[https://tools.ietf.org/html/rfc2440][RFC 2440]])...
> > ...
> > foo [[https://tools.ietf.org/html/rfc4880][RFC 4880]] bar
> > baz [[https://tools.ietf.org/html/rfc1991][RFC 1991]] foo
> > bar [[https://tools.ietf.org/html/rfc2440][RFC 2440]] baz
> > snip
> >
> > Text export result:
> >
> > snip
> > "OpenPGP Message Format" ([RFC 4880] which obsoletes [RFC 1991] and [RFC
> > 2440])... ... foo [RFC 4880] bar baz [RFC 1991] foo bar [RFC 2440] baz
> >
> >
> > [RFC 4880] https://tools.ietf.org/html/rfc4880
> >
> > [RFC 1991] https://tools.ietf.org/html/rfc1991
> >
> > [RFC 2440] https://tools.ietf.org/html/rfc2440
> >
> > [RFC 4880] https://tools.ietf.org/html/rfc4880
> >
> > [RFC 1991] https://tools.ietf.org/html/rfc1991
> > snip
> >
> > These multiple references look quite bad. Is it possible to
> > "normalize" the descriptions in some way *before* checking
> > them for uniqueness and output them thereafter?
>
> Could you be more explicit? What does look quite bad? What did
> you expect instead? How is it related to line breaks in the
> descriptions?

Ok, let's go into more details. See the Org source text:

1. There are three links and each of them appears twice. The link
   targets of every two of them are identical.

2. Each of the two "[...][RFC 2440]" links appears on a single line;
   the links "[...][RFC 4880]" and "[...][RFC 1991]" each have a
   newline in one of their descriptions. They are in fact
   "[...][RFC\n4880]" and "[...][RFC 4880]" and, respectively,
   "[...][RFC\n1991]" and "[...][RFC 1991]".

So, now let's examine the Org text export:

The final reference part - the five links below the paragraph - shows
two links, [RFC 4880] and [RFC 1991], which appear twice, but the link
[RFC 2440] appears only once there. This is, at least, inconsistent.

The point is that Org obviously considers "[...][RFC 4880]" and
"[...][RFC\n4880]" to be two different links internally and lists both
of them in the reference part. For this listing, the \n is removed.
This is what I called "normalization" in my first post. Human eyes,
however, won't see any difference between these two forms and will be
surprised.

I expect Org to do the following steps while parsing the source text:

1. "Normalize" or clean the link description, i.e. remove any
   newlines, leading and trailing spaces, and replace any
   occurrences of "[ \t]+" in the interior by a single space
   only. (To be done.)

2. Check the tuple (description,target) for duplicates and drop
   them. (Seems ok to me.)

3. Below the paragraph list the tuples as "[description] target"
   in the order of occurrence in the original text. (Also seems
   ok to me.)

I hope this makes this issue a little bit more clear now.

Kind regards,
Mathias
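
For illustration only, the three steps listed above can be sketched in Python. This is not Org's Emacs Lisp code and not the actual fix; the names normalize_description and unique_references are hypothetical, and the link list is copied by hand from the Org source in this thread rather than parsed from it. One assumption to flag: the sketch treats a newline as part of a whitespace run and collapses the run to a single space instead of deleting the newline outright, so that "RFC\n4880" becomes "RFC 4880".

```python
import re


def normalize_description(description: str) -> str:
    """Step 1: collapse each whitespace run (newlines included) to one
    space and strip leading and trailing whitespace."""
    return re.sub(r"\s+", " ", description).strip()


def unique_references(links):
    """Steps 2 and 3: drop duplicate (description, target) tuples after
    normalization, keeping the order of first occurrence."""
    seen = set()
    result = []
    for description, target in links:
        key = (normalize_description(description), target)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


# The six links from the Org source, in order of occurrence.
# The first two descriptions contain the line break from the report.
links = [
    ("RFC\n4880", "https://tools.ietf.org/html/rfc4880"),
    ("RFC\n1991", "https://tools.ietf.org/html/rfc1991"),
    ("RFC 2440", "https://tools.ietf.org/html/rfc2440"),
    ("RFC 4880", "https://tools.ietf.org/html/rfc4880"),
    ("RFC 1991", "https://tools.ietf.org/html/rfc1991"),
    ("RFC 2440", "https://tools.ietf.org/html/rfc2440"),
]

for description, target in unique_references(links):
    print(f"[{description}] {target}")
```

With normalization applied before the uniqueness check, the reference part would list each of the three links once instead of the five entries shown in the export result.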
Re: [O] Bug: text export and multi-word link descriptions with line breaks
Hello,

Mathias Bauer writes:

> I just stumbled over Org's plain text export and how it works on
> links with descriptions consisting of multiple words and line
> breaks between them. I'm running Org stable version 8.2.5h.
>
> Org source (spaces at the end of line 1 and 2 don't matter):
>
> snip
> "OpenPGP Message Format" ([[https://tools.ietf.org/html/rfc4880][RFC
> 4880]] which obsoletes [[https://tools.ietf.org/html/rfc1991][RFC
> 1991]] and [[https://tools.ietf.org/html/rfc2440][RFC 2440]])...
> ...
> foo [[https://tools.ietf.org/html/rfc4880][RFC 4880]] bar
> baz [[https://tools.ietf.org/html/rfc1991][RFC 1991]] foo
> bar [[https://tools.ietf.org/html/rfc2440][RFC 2440]] baz
> snip
>
> Text export result:
>
> snip
> "OpenPGP Message Format" ([RFC 4880] which obsoletes [RFC 1991] and [RFC
> 2440])... ... foo [RFC 4880] bar baz [RFC 1991] foo bar [RFC 2440] baz
>
>
> [RFC 4880] https://tools.ietf.org/html/rfc4880
>
> [RFC 1991] https://tools.ietf.org/html/rfc1991
>
> [RFC 2440] https://tools.ietf.org/html/rfc2440
>
> [RFC 4880] https://tools.ietf.org/html/rfc4880
>
> [RFC 1991] https://tools.ietf.org/html/rfc1991
> snip
>
> These multiple references look quite bad. Is it possible to
> "normalize" the descriptions in some way *before* checking them
> for uniqueness and output them thereafter?
>
> Thanks for considering this issue.

Could you be more explicit? What does look quite bad? What did you
expect instead? How is it related to line breaks in the descriptions?

Regards,

--
Nicolas Goaziou
[O] Bug: text export and multi-word link descriptions with line breaks
Dear Maintainers,

I just stumbled over Org's plain text export and how it works on
links with descriptions consisting of multiple words and line
breaks between them. I'm running Org stable version 8.2.5h.

Org source (spaces at the end of line 1 and 2 don't matter):

snip
"OpenPGP Message Format" ([[https://tools.ietf.org/html/rfc4880][RFC
4880]] which obsoletes [[https://tools.ietf.org/html/rfc1991][RFC
1991]] and [[https://tools.ietf.org/html/rfc2440][RFC 2440]])...
...
foo [[https://tools.ietf.org/html/rfc4880][RFC 4880]] bar
baz [[https://tools.ietf.org/html/rfc1991][RFC 1991]] foo
bar [[https://tools.ietf.org/html/rfc2440][RFC 2440]] baz
snip

Text export result:

snip
"OpenPGP Message Format" ([RFC 4880] which obsoletes [RFC 1991] and [RFC
2440])... ... foo [RFC 4880] bar baz [RFC 1991] foo bar [RFC 2440] baz


[RFC 4880] https://tools.ietf.org/html/rfc4880

[RFC 1991] https://tools.ietf.org/html/rfc1991

[RFC 2440] https://tools.ietf.org/html/rfc2440

[RFC 4880] https://tools.ietf.org/html/rfc4880

[RFC 1991] https://tools.ietf.org/html/rfc1991
snip

These multiple references look quite bad. Is it possible to
"normalize" the descriptions in some way *before* checking them
for uniqueness and output them thereafter?

Thanks for considering this issue.

Kind regards
Mathias