Re: [O] Bug: text export and multi-word link descriptions with line breaks

2014-04-03 Thread Nicolas Goaziou
Mathias Bauer writes:

> I expect Org to do the following steps while parsing the source
> text:
>
> 1. "Normalize" or clean the link description, i.e. remove any
>    newlines and leading and trailing spaces, and replace any
>    occurrences of "[ \t]+" in the interior by a single space.
>    (To be done.)
>
> 2. Check the tuple (description, target) for duplicates and drop
>    them.  (Seems ok to me.)
>
> 3. Below the paragraph, list the tuples as "[description] target"
>    in the order of occurrence in the original text.  (Also seems
>    ok to me.)
>
> I hope this makes the issue a little clearer now.

Indeed. I missed the duplicate links. This should be fixed.

Thank you for the report.


Regards,

-- 
Nicolas Goaziou



Re: [O] Bug: text export and multi-word link descriptions with line breaks

2014-04-03 Thread Mathias Bauer
Hello Nicolas,

* Nicolas Goaziou wrote on 2014-04-03 at 17:25 (+0200):

> Mathias Bauer writes:
>
> > I just stumbled over Org's plain text export and how it works on
> > links with descriptions consisting of multiple words and line
> > breaks between them.  I'm running Org stable version 8.2.5h.
> >
> > Org source (spaces at the end of line 1 and 2 don't matter):
> >
> > snip
> > "OpenPGP Message Format" ([[https://tools.ietf.org/html/rfc4880][RFC
> > 4880]] which obsoletes [[https://tools.ietf.org/html/rfc1991][RFC
> > 1991]] and [[https://tools.ietf.org/html/rfc2440][RFC 2440]])...
> > ...
> > foo [[https://tools.ietf.org/html/rfc4880][RFC 4880]] bar
> > baz [[https://tools.ietf.org/html/rfc1991][RFC 1991]] foo
> > bar [[https://tools.ietf.org/html/rfc2440][RFC 2440]] baz
> > snip
> >
> > Text export result:
> >
> > snip
> > "OpenPGP Message Format" ([RFC 4880] which obsoletes [RFC 1991] and [RFC
> > 2440])...  ...  foo [RFC 4880] bar baz [RFC 1991] foo bar [RFC 2440] baz
> >
> >
> > [RFC 4880] https://tools.ietf.org/html/rfc4880
> >
> > [RFC 1991] https://tools.ietf.org/html/rfc1991
> >
> > [RFC 2440] https://tools.ietf.org/html/rfc2440
> >
> > [RFC 4880] https://tools.ietf.org/html/rfc4880
> >
> > [RFC 1991] https://tools.ietf.org/html/rfc1991
> > snip
> >
> > These multiple references look quite bad.  Is it possible to
> > "normalize" the descriptions in some way *before* checking
> > them for uniqueness and output them thereafter?
>
> Could you be more explicit? What looks quite bad? What did
> you expect instead? How is this related to line breaks in the
> descriptions?

Ok, let's go into more details.  See the Org source text:

1. There are three links and each of them appears twice.  Both
   occurrences of a link have the same target.

2. Both "[...][RFC 2440]" links fit on one line; the links
   "[...][RFC 4880]" and "[...][RFC 1991]" each have one
   occurrence with a newline in its description.  They are in
   fact "[...][RFC\n4880]" and "[...][RFC 4880]" and,
   respectively, "[...][RFC\n1991]" and "[...][RFC 1991]".

So, now let's examine the Org text export:

The final reference part, the five links below the paragraph,
lists two links, [RFC 4880] and [RFC 1991], twice, but the link
[RFC 2440] only once.

This is, at least, inconsistent.

The point is that Org obviously treats "[...][RFC 4880]" and
"[...][RFC\n4880]" as two different links internally and
lists both of them in the reference part.  For this listing, the
\n is removed.  This is what I called "normalization" in my
first post.
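
To make the suspected behaviour concrete, here is a minimal Python
sketch that reproduces the five-entry list from the example.  It is
only a model of what the export output suggests, not Org's exporter
code (which is Emacs Lisp).  LINK_RE and collect_references_raw are
hypothetical names, and the regular expression only handles links of
the form [[target][description]].

```python
import re

# Hypothetical pattern for [[target][description]] links; the description
# may span a line break.
LINK_RE = re.compile(r"\[\[([^\]\[]+)\]\[([^\]]+)\]\]")


def collect_references_raw(text):
    """Model of the observed behaviour: deduplicate on the raw description
    and clean up whitespace only when the entry is printed."""
    seen = set()
    refs = []
    for target, description in LINK_RE.findall(text):
        key = (description, target)  # raw description, may contain "\n"
        if key not in seen:
            seen.add(key)
            # Whitespace is collapsed here, after the uniqueness check.
            refs.append("[%s] %s" % (" ".join(description.split()), target))
    return refs


SOURCE = (
    '"OpenPGP Message Format" ([[https://tools.ietf.org/html/rfc4880][RFC\n'
    "4880]] which obsoletes [[https://tools.ietf.org/html/rfc1991][RFC\n"
    "1991]] and [[https://tools.ietf.org/html/rfc2440][RFC 2440]])...\n"
    "foo [[https://tools.ietf.org/html/rfc4880][RFC 4880]] bar\n"
    "baz [[https://tools.ietf.org/html/rfc1991][RFC 1991]] foo\n"
    "bar [[https://tools.ietf.org/html/rfc2440][RFC 2440]] baz\n"
)

for line in collect_references_raw(SOURCE):
    print(line)
```

Because "RFC\n4880" and "RFC 4880" differ as raw strings, both pass
the uniqueness check and print as the same line, whereas the two
"RFC 2440" descriptions are identical and collapse into one entry.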

Human eyes, however, won't see any difference between these two
forms, and the reader will be surprised.

I expect Org to do the following steps while parsing the source
text:

1. "Normalize" or clean the link description, i.e. remove any
   newlines and leading and trailing spaces, and replace any
   occurrences of "[ \t]+" in the interior by a single space.
   (To be done.)

2. Check the tuple (description, target) for duplicates and drop
   them.  (Seems ok to me.)

3. Below the paragraph, list the tuples as "[description] target"
   in the order of occurrence in the original text.  (Also seems
   ok to me.)
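
The three steps above can be sketched in Python as follows.  This is
an illustration under stated assumptions, not a patch for Org:
normalize_description, list_references and LINK_RE are hypothetical
names, the pattern only covers [[target][description]] links, and a
newline inside a description is assumed to be replaced by a single
space rather than deleted, so that "RFC\n4880" becomes "RFC 4880".

```python
import re

# Hypothetical pattern for [[target][description]] links; the description
# may span a line break.
LINK_RE = re.compile(r"\[\[([^\]\[]+)\]\[([^\]]+)\]\]")


def normalize_description(description):
    """Step 1: collapse every whitespace run (spaces, tabs, newlines) into
    one space and drop leading and trailing whitespace."""
    return re.sub(r"\s+", " ", description).strip()


def list_references(text):
    """Steps 2 and 3: drop duplicate (description, target) tuples and
    return "[description] target" lines in order of first occurrence."""
    seen = set()
    refs = []
    for target, description in LINK_RE.findall(text):
        key = (normalize_description(description), target)
        if key not in seen:
            seen.add(key)
            refs.append("[%s] %s" % key)
    return refs
```

The only difference from the behaviour described earlier is the
order of operations: the description is normalized before the
uniqueness check, so both spellings of a link map to the same tuple.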

I hope this makes the issue a little clearer now.

Kind regards,
Mathias



Re: [O] Bug: text export and multi-word link descriptions with line breaks

2014-04-03 Thread Nicolas Goaziou
Hello,

Mathias Bauer writes:

> I just stumbled over Org's plain text export and how it works on
> links with descriptions consisting of multiple words and line
> breaks between them.  I'm running Org stable version 8.2.5h.
>
> Org source (spaces at the end of line 1 and 2 don't matter):
>
> snip
> "OpenPGP Message Format" ([[https://tools.ietf.org/html/rfc4880][RFC
> 4880]] which obsoletes [[https://tools.ietf.org/html/rfc1991][RFC
> 1991]] and [[https://tools.ietf.org/html/rfc2440][RFC 2440]])...
> ...
> foo [[https://tools.ietf.org/html/rfc4880][RFC 4880]] bar
> baz [[https://tools.ietf.org/html/rfc1991][RFC 1991]] foo
> bar [[https://tools.ietf.org/html/rfc2440][RFC 2440]] baz
> snip
>
> Text export result:
>
> snip
> "OpenPGP Message Format" ([RFC 4880] which obsoletes [RFC 1991] and [RFC
> 2440])...  ...  foo [RFC 4880] bar baz [RFC 1991] foo bar [RFC 2440] baz
>
>
> [RFC 4880] https://tools.ietf.org/html/rfc4880
>
> [RFC 1991] https://tools.ietf.org/html/rfc1991
>
> [RFC 2440] https://tools.ietf.org/html/rfc2440
>
> [RFC 4880] https://tools.ietf.org/html/rfc4880
>
> [RFC 1991] https://tools.ietf.org/html/rfc1991
> snip
>
> These multiple references look quite bad.  Is it possible to
> "normalize" the descriptions in some way *before* checking them
> for uniqueness and output them thereafter?
>
> Thanks for considering this issue.

Could you be more explicit? What looks quite bad? What did you
expect instead? How is this related to line breaks in the descriptions?


Regards,

-- 
Nicolas Goaziou



[O] Bug: text export and multi-word link descriptions with line breaks

2014-04-03 Thread Mathias Bauer
Dear Maintainers,

I just stumbled over Org's plain text export and how it works on
links with descriptions consisting of multiple words and line
breaks between them.  I'm running Org stable version 8.2.5h.

Org source (spaces at the end of line 1 and 2 don't matter):

snip
"OpenPGP Message Format" ([[https://tools.ietf.org/html/rfc4880][RFC
4880]] which obsoletes [[https://tools.ietf.org/html/rfc1991][RFC
1991]] and [[https://tools.ietf.org/html/rfc2440][RFC 2440]])...
...
foo [[https://tools.ietf.org/html/rfc4880][RFC 4880]] bar
baz [[https://tools.ietf.org/html/rfc1991][RFC 1991]] foo
bar [[https://tools.ietf.org/html/rfc2440][RFC 2440]] baz
snip

Text export result:

snip
"OpenPGP Message Format" ([RFC 4880] which obsoletes [RFC 1991] and [RFC
2440])...  ...  foo [RFC 4880] bar baz [RFC 1991] foo bar [RFC 2440] baz


[RFC 4880] https://tools.ietf.org/html/rfc4880

[RFC 1991] https://tools.ietf.org/html/rfc1991

[RFC 2440] https://tools.ietf.org/html/rfc2440

[RFC 4880] https://tools.ietf.org/html/rfc4880

[RFC 1991] https://tools.ietf.org/html/rfc1991
snip

These multiple references look quite bad.  Is it possible to
"normalize" the descriptions in some way *before* checking them
for uniqueness and output them thereafter?

Thanks for considering this issue.

Kind regards
Mathias