[O] Bug: text export and multi-word link descriptions with line breaks

2014-04-03 Thread Mathias Bauer
Dear Maintainers,

I just stumbled over Org's plain text export and how it works on
links with descriptions consisting of multiple words and line
breaks between them.  I'm running Org stable version 8.2.5h.

Org source (spaces at the end of lines 1 and 2 don't matter):

snip
OpenPGP Message Format ([[https://tools.ietf.org/html/rfc4880][RFC
4880]] which obsoletes [[https://tools.ietf.org/html/rfc1991][RFC
1991]] and [[https://tools.ietf.org/html/rfc2440][RFC 2440]])...
...
foo [[https://tools.ietf.org/html/rfc4880][RFC 4880]] bar
baz [[https://tools.ietf.org/html/rfc1991][RFC 1991]] foo
bar [[https://tools.ietf.org/html/rfc2440][RFC 2440]] baz
snip

Text export result:

snip
OpenPGP Message Format ([RFC 4880] which obsoletes [RFC 1991] and [RFC
2440])...  ...  foo [RFC 4880] bar baz [RFC 1991] foo bar [RFC 2440] baz


[RFC 4880] https://tools.ietf.org/html/rfc4880

[RFC 1991] https://tools.ietf.org/html/rfc1991

[RFC 2440] https://tools.ietf.org/html/rfc2440

[RFC 4880] https://tools.ietf.org/html/rfc4880

[RFC 1991] https://tools.ietf.org/html/rfc1991
snip

These multiple references look quite bad.  Is it possible to
normalize the descriptions in some way *before* checking them
for uniqueness and output them thereafter?

Thanks for considering this issue.

Kind regards
Mathias



Re: [O] Bug: text export and multi-word link descriptions with line breaks

2014-04-03 Thread Nicolas Goaziou
Hello,

Could you be more explicit? What exactly looks quite bad? What did you
expect instead? How is this related to line breaks in the descriptions?


Regards,

-- 
Nicolas Goaziou



Re: [O] Bug: text export and multi-word link descriptions with line breaks

2014-04-03 Thread Mathias Bauer
Hello Nicolas,

* Nicolas Goaziou wrote on 2014-04-03 at 17:25 (+0200):

 Could you be more explicit? What exactly looks quite bad? What
 did you expect instead? How is this related to line breaks in
 the descriptions?

Ok, let's go into more details.  See the Org source text:

1. There are three links and each of them appears twice.  Both
   occurrences of each link have the same target.

2. Each of the two [...][RFC 2440] links sits on a single line;
   the links [...][RFC 4880] and [...][RFC 1991] each have a
   newline in the description of their first occurrence.  They
   are in fact [...][RFC\n4880] and [...][RFC 4880] and,
   respectively, [...][RFC\n1991] and [...][RFC 1991].
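
To make the raw descriptions visible, here is a minimal Python sketch
(not Org's parser; the regular expression is a simplification that I
am assuming is good enough for this example).  It pulls the
(target, description) pairs out of the source above and prints each
description with its newline shown explicitly:

```python
import re

ORG_SOURCE = """\
OpenPGP Message Format ([[https://tools.ietf.org/html/rfc4880][RFC
4880]] which obsoletes [[https://tools.ietf.org/html/rfc1991][RFC
1991]] and [[https://tools.ietf.org/html/rfc2440][RFC 2440]])...
...
foo [[https://tools.ietf.org/html/rfc4880][RFC 4880]] bar
baz [[https://tools.ietf.org/html/rfc1991][RFC 1991]] foo
bar [[https://tools.ietf.org/html/rfc2440][RFC 2440]] baz
"""

# Simplified pattern for [[target][description]].  A negated character
# class also matches newlines, so a description may span two lines.
LINK_RE = re.compile(r"\[\[([^\]]+)\]\[([^\]]+)\]\]")


def extract_links(text):
    """Return (target, description) pairs in order of occurrence."""
    return LINK_RE.findall(text)


for target, description in extract_links(ORG_SOURCE):
    # repr() shows an embedded newline as \n instead of breaking the line.
    print(repr(description), target)
```

The first two descriptions come out as 'RFC\n4880' and 'RFC\n1991',
all the others contain a plain space.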

So, now let's examine the Org text export:

The final reference section (the five links below the paragraph)
lists [RFC 4880] and [RFC 1991] twice each, but [RFC 2440] only
once.

This is, at least, inconsistent.

The point is that Org apparently treats [...][RFC 4880] and
[...][RFC\n4880] as two different links internally and lists
both of them in the reference section.  Only for this listing is
the \n removed.  This is what I called normalization in my
first post.

Human eyes, however, won't see any difference between these two
forms, and the reader will be surprised.
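
The following Python sketch reproduces the effect.  It is only my
guess at the order of operations, not Org's actual code: duplicates
are detected on the raw description, and whitespace is collapsed only
when the entry is printed.  The function name and the LINKS list are
made up for this example.

```python
# (target, raw description) pairs in the order they occur in the source.
LINKS = [
    ("https://tools.ietf.org/html/rfc4880", "RFC\n4880"),
    ("https://tools.ietf.org/html/rfc1991", "RFC\n1991"),
    ("https://tools.ietf.org/html/rfc2440", "RFC 2440"),
    ("https://tools.ietf.org/html/rfc4880", "RFC 4880"),
    ("https://tools.ietf.org/html/rfc1991", "RFC 1991"),
    ("https://tools.ietf.org/html/rfc2440", "RFC 2440"),
]


def references_without_normalization(links):
    """Drop duplicates by raw description, normalize only for display."""
    seen = set()
    lines = []
    for target, description in links:
        key = (description, target)  # raw description, newline included
        if key in seen:
            continue
        seen.add(key)
        shown = " ".join(description.split())  # collapse whitespace late
        lines.append(f"[{shown}] {target}")
    return lines


for line in references_without_normalization(LINKS):
    print(line)
```

This yields five entries, with [RFC 4880] and [RFC 1991] listed twice,
just like the export result above.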

I expect Org to do the following steps while parsing the source
text:

1. Normalize or clean the link description, i.e. remove any
   newlines, leading and trailing spaces, and replace any
   occurrences of [ \t]+ in the interior by a single space
   only.  (To be done.)

2. Check the tuple (description,target) for duplicates and drop
   them.  (Seems ok to me.)

3. Below the paragraph, list the tuples as [description] target
   in the order of occurrence in the original text.  (Also seems
   ok to me.)
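
The three steps can be sketched in Python as follows.  This is an
illustration of the idea only, not a patch for Org; the function
names are hypothetical.  One assumption: a newline is treated like
any other whitespace and becomes a single space, since the export
already shows "RFC 4880" rather than "RFC4880".

```python
import re


def normalize_description(description):
    """Step 1: collapse whitespace runs (newlines included) and trim."""
    return re.sub(r"\s+", " ", description).strip()


def unique_references(links):
    """Steps 2 and 3: drop duplicate (description, target) tuples,
    keeping the order of first occurrence."""
    seen = set()
    refs = []
    for target, description in links:
        key = (normalize_description(description), target)
        if key not in seen:
            seen.add(key)
            refs.append(key)
    return refs


LINKS = [
    ("https://tools.ietf.org/html/rfc4880", "RFC\n4880"),
    ("https://tools.ietf.org/html/rfc1991", "RFC\n1991"),
    ("https://tools.ietf.org/html/rfc2440", "RFC 2440"),
    ("https://tools.ietf.org/html/rfc4880", "RFC 4880"),
    ("https://tools.ietf.org/html/rfc1991", "RFC 1991"),
    ("https://tools.ietf.org/html/rfc2440", "RFC 2440"),
]

for description, target in unique_references(LINKS):
    print(f"[{description}] {target}")
```

Because the description is normalized before the duplicate check,
each of the three links is listed exactly once.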

I hope this makes the issue a little clearer now.

Kind regards,
Mathias



Re: [O] Bug: text export and multi-word link descriptions with line breaks

2014-04-03 Thread Nicolas Goaziou
Mathias Bauer mba...@gmx.org writes:

 I expect Org to do the following steps while parsing the source
 text:

 1. Normalize or clean the link description, i.e. remove any
    newlines, leading and trailing spaces, and replace any
    occurrences of [ \t]+ in the interior by a single space
    only.  (To be done.)

 2. Check the tuple (description,target) for duplicates and drop
    them.  (Seems ok to me.)

 3. Below the paragraph, list the tuples as [description] target
    in the order of occurrence in the original text.  (Also seems
    ok to me.)

 I hope this makes the issue a little clearer now.

Indeed. I missed the duplicate links. This should be fixed.

Thank you for the report.


Regards,

-- 
Nicolas Goaziou