Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-11-25 Thread Max Nikulin

On 25/11/2021 19:11, Ihor Radchenko wrote:

Maxim Nikulin writes:


I think, new variant should be committed even it does not fix Juan's
case. He have not confirmed the fix yet.


The patch does apply onto main, but I have a comment.


+  ;; Emphasis of second word.
+  ;; Locales with significant spaces (C, es_ES, pl_PL)


I suspect that most of maintainers simply use C/en_US locales and we may
not be able to catch test errors in foreign locales. Do you know a good
way to automate testing with different locales? Can it be somehow
automated?

Otherwise, the patch looks good for me.


For this test locale is fixed to "C", see below. Concerning tests of 
multiple locales, I have no idea besides a virtual machine (qemu, 
VirtualBox, etc.) or container (docker, LXC, etc.)  with enough enabled 
and generated locales. I think, more locales may be enabled on SourceHut 
as well.


https://list.orgmode.org/s864tq$jet$1...@ciao.gmane.io/
On 21/05/2021 00:06, Maxim Nikulin wrote:>

In the following subthread Nicolas mentioned that some of them could fail
https://orgmode.org/list/87r1j6b6ku@nicolasgoaziou.fr
I do not see any reason for failure. I just have tried C.UTF-8, en_US.UTF-8,
es_ES.UTF-8, and ru_RU.UTF-8 locales (interactively) and do not see any
problem. This set of locales has 3 different collation rules,
however I do not think it matters for tests.

On 20/04/2021 19:37, Maxim Nikulin wrote:

On 20/04/2021 00:50, Nicolas Goaziou wrote:

Maxim Nikulin writes:


On 19/04/2021 23:08, Nicolas Goaziou wrote:

+  ;; Space role in sorting.
+  ;; Test would fail for locales with ignored space, e.g. en_US, it works
+  ;; in C and currently rare locales having significant space (es_ES, 
pl_PL)
+  (should
+   (equal "- Time stamp\n- Timer\n"
+  (org-test-with-temp-text "- Timer\n- Time stamp\n"
+(org-sort-list t ?a)
+(buffer-string))

Since this test is bound to fail for some developers, I assume it
shouldn't be included.


Locale "C" is forced for this group of tests.


Sorry, I don't understand. There is no locale change around this test,
so it will fail, for example, for me. I wouldn't want to get a noisy
failure each time I run tests.


Certainly flaky tests must be avoided. However I can not identify the source of 
confusion (yours or mine).

There is redefinition of `string-collate-lessp' to run tests with "C" locale:
https://code.orgmode.org/bzg/org-mode/src/master/testing/lisp/test-org-list.el#L1207

And it works for me






Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-11-25 Thread Ihor Radchenko
Maxim Nikulin  writes:

> I think, new variant should be committed even it does not fix Juan's 
> case. He have not confirmed the fix yet.

The patch does apply onto main, but I have a comment.

> +  ;; Emphasis of second word.
> +  ;; Locales with significant spaces (C, es_ES, pl_PL)

I suspect that most of maintainers simply use C/en_US locales and we may
not be able to catch test errors in foreign locales. Do you know a good
way to automate testing with different locales? Can it be somehow
automated?

Otherwise, the patch looks good for me.

Best,
Ihor



Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-05-16 Thread Bastien
Hi Nicolas,

Nicolas Goaziou  writes:

>> Is this okay to install this in the maint branch?
>
> I didn't test it, but it seems to fix the issues reported. So I guess
> this is fine.

Done, thanks!

-- 
 Bastien



Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-05-15 Thread Nicolas Goaziou
Hello,

Bastien  writes:

> Hi Nicolas,
>
> Nicolas Goaziou  writes:
>
>> Ah! I forgot the link part! Hopefully done here.
>>
>> (defun org-sort-remove-invisible (s)
>>   "Remove emphasis markers and any invisible property from string S.
>> Assume S may contain only objects."
>>   ;; org-element-interpret-data clears any text property, including
>>   ;; invisible part.
>
> Is this okay to install this in the maint branch?

I didn't test it, but it seems to fix the issues reported. So I guess
this is fine.

Regards,
-- 
Nicolas Goaziou



Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-05-15 Thread Bastien
Maxim Nikulin  writes:

> Bastien, the following message in this thread contains alternative
> draft from Nicolas:
> https://orgmode.org/list/87a6pt9hyd@nicolasgoaziou.fr/
>
> It is better than initial patches from Juan Manuel. It fixes the
> problem with removing characters around emphasized text and prepending
> a space instead. As a result it fixes sorting items in es_ES locale.

Thanks, I've followed-up on the other thread.

-- 
 Bastien



Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-05-15 Thread Bastien
Hi Nicolas,

Nicolas Goaziou  writes:

> Ah! I forgot the link part! Hopefully done here.
>
> (defun org-sort-remove-invisible (s)
>   "Remove emphasis markers and any invisible property from string S.
> Assume S may contain only objects."
>   ;; org-element-interpret-data clears any text property, including
>   ;; invisible part.

Is this okay to install this in the maint branch?

Thanks!

-- 
 Bastien



Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-05-15 Thread Maxim Nikulin

On 15/05/2021 20:32, Bastien wrote:

Bastien  writes:


Bastien  writes:


Thanks for the heads-up.  I reverted the commit.  Please go ahead with
whatever you see fit.


PS: Re-opening this work-in-progress patch for updates.orgmode.org.


I'm closing this now -- Juan, feel free to resubmit a patch if you
are still working on this.


Bastien, the following message in this thread contains alternative draft 
from Nicolas:

https://orgmode.org/list/87a6pt9hyd@nicolasgoaziou.fr/

It is better than initial patches from Juan Manuel. It fixes the problem 
with removing characters around emphasized text and prepending a space 
instead. As a result it fixes sorting items in es_ES locale.





Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-05-15 Thread Bastien
Bastien  writes:

> Bastien  writes:
>
>> Thanks for the heads-up.  I reverted the commit.  Please go ahead with
>> whatever you see fit.
>
> PS: Re-opening this work-in-progress patch for updates.orgmode.org.

I'm closing this now -- Juan, feel free to resubmit a patch if you
are still working on this.

Thanks,

-- 
 Bastien



Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-04-28 Thread Bastien
Bastien  writes:

> Thanks for the heads-up.  I reverted the commit.  Please go ahead with
> whatever you see fit.

PS: Re-opening this work-in-progress patch for updates.orgmode.org.

-- 
 Bastien



Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-04-28 Thread Bastien
Nicolas Goaziou  writes:

> Unfortunately, this is not a proper fix for the problem, as discussed in
> the thread.

Thanks for the heads-up.  I reverted the commit.  Please go ahead with
whatever you see fit.

-- 
 Bastien



Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-04-28 Thread Nicolas Goaziou
Hello,

Bastien  writes:

> Juan Manuel Macías  writes:
>
>> Done! I've attached the corrected patch. Sorry for the flaws in me
>> previous patch: I'm a bit of a novice at submitting patches...
>
> applied in the maint branch as commit 5be650714.

Unfortunately, this is not a proper fix for the problem, as discussed in
the thread.

Regards,
-- 
Nicolas Goaziou



Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-04-27 Thread Bastien
Hi Juan,

Juan Manuel Macías  writes:

> Done! I've attached the corrected patch. Sorry for the flaws in me
> previous patch: I'm a bit of a novice at submitting patches...

applied in the maint branch as commit 5be650714.

The patch was missing a proper changelog entry: please see
https://orgmode.org/worg/org-contribute.html#commit-messages
for future patches.

Thanks!



Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-04-24 Thread Maxim Nikulin

On 21/04/2021 22:45, Juan Manuel Macías wrote:


I have tried the Nicolas' patch (latest version) and I see that the
items with emphasis are already ordered well. However, it seems that the
problem with identical items with or without emphasis still persists:
which items should go before and in what order? For example, in the
following list I get:

- /a/
- *a*
- a
- *b*
- /b/
- b
- /v/
- *v*
- v


I am afraid, there is no easy way to take into account emphasis. Each 
item have to be split into logical units and locale-aware multilevel 
comparison should be applied to each unit separately. E.g. for 
description list, only term should be compared at first to properly 
order emphasized items, it does not matter if description starts from 
"a" or from "z". Simple `string-collate-lessp' for whole item uses 
further levels only strings are considered identical on previous levels. 
I had an idea to augment sort keys with some text properties for custom 
string comparator, but I decided that such complications would not 
ensure reliable sort for all possible cases.


- A :: B
- /A/ :: C
- *A* :: A

However `org-sort-remove invisible' still has some room for improvements 
(it is not mandatory in my opinion). I have realized it reading the 
thread on title representation for HTML export

https://orgmode.org/list/87h7jy4ebe@nicolasgoaziou.fr/

- <>A
- <<>>
- B




Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-04-21 Thread Juan Manuel Macías
Hi all,

Maxim Nikulin writes:

> I think, new variant should be committed even it does not fix Juan's
> case. He have not confirmed the fix yet.

Sorry, I have been busy with other matters and had lost the thread of
the discussion. I'm catching up on the messages...

I have tried the Nicolas' patch (latest version) and I see that the
items with emphasis are already ordered well. However, it seems that the
problem with identical items with or without emphasis still persists:
which items should go before and in what order? For example, in the
following list I get:

- /a/
- *a*
- a
- *b*
- /b/
- b
- /v/
- *v*
- v

Anyway, I think it is a minor problem (I can't think of many
scenarios in real life where this problem is relevant). In a more
realistic case, the result is correct:

- lo /bueno/
- lo bueno
- lo /vano/
- lo vano

...

- /a/
- b
- /v/
- *z*

Best regards,

Juan Manuel 



Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-04-21 Thread Maxim Nikulin

On 21/04/2021 03:37, Nicolas Goaziou wrote:

Maxim Nikulin writes:


(let ((s (org-sort-remove-invisible
"A wrapping [[https://orgmode.org?a=b=d#e][link]] emphasis/"

I expect "A wrapping link emphasis".


Yet, your expectations are wrong. There is no link in the text above.
Emphasized text starts at "/wrapping" and ends before "/?".

Granted, this is a situation where the Org syntax is not very intuitive.
Anyway, the new function is more accurate.


I think, new variant should be committed even it does not fix Juan's 
case. He have not confirmed the fix yet.



Maybe we should require a space after punctuation following emphasized
text. I don't know. This is orthogonal to the current discussion.


I still believe in my expectation, however I admit such limitation of 
parser. At first I have not recognized that the issue may be similar to

https://orgmode.org/list/CAH+Wm4-_XHUZKFTf=ztbfncpvqwkbeoegs8epym+8spmu8l...@mail.gmail.com/
Anyway for my example workaround is to add more markers before, inside, 
and after the link. Maybe I will look closer at Tom's parser if it 
solves ambiguity in the same way.



In the meanwhile I have tried

 (benchmark-run 1 (org-sort-list t ?a))

in a file (1100 lines) obtained using


I don't think performance is really an issue. Of course, the suggested
function is clearly slower than the current one.


It is OK since difference is not really huge, especially taking into 
account that new variant was not compiled.


Do you still have problem with locale dependency of added tests? I can 
not guess what could be its source and expect that test should work 
reliably. Disregard "/3" in the subject of the patches. Third change is 
your code.
>From c2c46d133e80ffa2323618ac88a1902e63ba1efc Mon Sep 17 00:00:00 2001
From: Max Nikulin 
Date: Mon, 19 Apr 2021 19:06:36 +0700
Subject: [PATCH 1/3] More tests for `org-sort-list'

* lisp/org.el (org-verbatim-re): Add to the docstring a statement
concerning subgroups meaning.
* testing/lisp/test-org-list.el (test-org-list/sort): Add cases with
text emphasis.  Stress in comments that it is significant whether
locale-specific collation rules ignores spaces.
* testing/lisp/test-org.el (test-org/org-sort-remove-invisible): Add
tests for `org-sort-list' helper.
---
 lisp/org.el   |  3 ++-
 testing/lisp/test-org-list.el | 34 +-
 testing/lisp/test-org.el  | 25 +
 3 files changed, 60 insertions(+), 2 deletions(-)

diff --git a/lisp/org.el b/lisp/org.el
index c2738100f..3d4f5553a 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -3685,7 +3685,8 @@ After a match, the match groups contain these elements:
 5  The character after the match, empty at the end of a line")
 
 (defvar org-verbatim-re nil
-  "Regular expression for matching verbatim text.")
+  "Regular expression for matching verbatim text.
+Groups 1-5 have the same meaning as in `org-emph-re' pattern.")
 
 (defvar org-emphasis-regexp-components) ; defined just below
 (defvar org-emphasis-alist) ; defined just below
diff --git a/testing/lisp/test-org-list.el b/testing/lisp/test-org-list.el
index 3689a172f..cc7914c8d 100644
--- a/testing/lisp/test-org-list.el
+++ b/testing/lisp/test-org-list.el
@@ -1225,7 +1225,39 @@ b. Item 2"
(equal "- b\n- a\n- C\n"
 	  (org-test-with-temp-text "- b\n- C\n- a\n"
 		(org-sort-list t ?A)
-		(buffer-string))
+		(buffer-string
+  ;; Emphasis in the beginning.
+  (should
+   (equal "- a\n- /z/\n"
+  (org-test-with-temp-text "- /z/\n- a\n"
+(org-sort-list t ?a)
+(buffer-string
+  (should
+   (equal "- *a*\n- z\n"
+  (org-test-with-temp-text "- z\n- *a*\n"
+(org-sort-list t ?a)
+(buffer-string
+  ;; Emphasis of second word.
+  ;; Locales with significant spaces (C, es_ES, pl_PL)
+  ;; are more sensitive to problems with sort.
+  (should
+   (equal "- a b\n- a /z/\n"
+  (org-test-with-temp-text "- a /z/\n- a b\n"
+(org-sort-list t ?a)
+(buffer-string
+  (should
+   (equal "- a *b*\n- a z\n"
+  (org-test-with-temp-text "- a z\n- a *b*\n"
+(org-sort-list t ?a)
+(buffer-string
+  ;; Space role in sorting.
+  ;; Test would fail for locales with ignored space, e.g. en_US, it works
+  ;; in C and currently rare locales having significant space (es_ES, pl_PL)
+  (should
+   (equal "- Time stamp\n- Timer\n"
+  (org-test-with-temp-text "- Timer\n- Time stamp\n"
+(org-sort-list t ?a)
+(buffer-string))
   ;; Sort numerically.
   (should
(equal "- 1\n- 2\n- 10\n"
diff --git a/testing/lisp/test-org.el b/testing/lisp/test-org.el
index f6fb4b3ca..3f74b3f35 100644
--- a/testing/lisp/test-org.el
+++ b/testing/lisp/test-org.el
@@ -8262,4 +8262,29 @@ two
 
 (provide 'test-org)
 

Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-04-20 Thread Nicolas Goaziou
Hello,

Maxim Nikulin  writes:

> Surprisingly there are still cases when the old approach works better:
>
> (let ((s (org-sort-remove-invisible
> "A /wrapping [[https://orgmode.org/?a=b=d#e][link]] emphasis/")))
>   (set-text-properties 0 (length s) nil s)
>   s)
> "A wrapping [[https://orgmode.org?a=b=d#e][link]] emphasis/"
>
> I expect "A wrapping link emphasis".

Yet, your expectations are wrong. There is no link in the text above.
Emphasized text starts at "/wrapping" and ends before "/?".

Granted, this is a situation where the Org syntax is not very intuitive.
Anyway, the new function is more accurate.

Maybe we should require a space after punctuation following emphasized
text. I don't know. This is orthogonal to the current discussion.

> In the meanwhile I have tried
>
> (benchmark-run 1 (org-sort-list t ?a))
>
> in a file (1100 lines) obtained using

I don't think performance is really an issue. Of course, the suggested
function is clearly slower than the current one.

Regards,
-- 
Nicolas Goaziou



Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-04-20 Thread Maxim Nikulin

On 20/04/2021 20:57, Nicolas Goaziou wrote:

Maxim Nikulin writes:

(org-sort-remove-invisible "A")
#("A" 0 1 (:parent (#("A" 0 1 ...


This is a string.


Thank you, from second attempt I have managed to strip text properties.

Since the intended usage of return value is sorting key, would not it 
benefit from passing result through the following expression?


(set-text-properties 0 (length s) nil s)

An alternative is to clean up keys in `org-sort-list' function.


Ah! I forgot the link part! Hopefully done here.


Surprisingly there are still cases when the old approach works better:

(let ((s (org-sort-remove-invisible
"A /wrapping [[https://orgmode.org/?a=b=d#e][link]] emphasis/")))
  (set-text-properties 0 (length s) nil s)
  s)
"A wrapping [[https://orgmode.org?a=b=d#e][link]] emphasis/"

I expect "A wrapping link emphasis".

In the meanwhile I have tried

(benchmark-run 1 (org-sort-list t ?a))

in a file (1100 lines) obtained using

grep '^- ' doc/org-manual.org >/tmp/list.org

It seems, performance is still acceptable (single run hardly could be 
considered as an accurate test):


(1.115571472 18 0.598646606999) ; new variant
(0.260384514 1 0.0980547519947) ; original code


--8<---cut here---start->8---
(defun org-sort-remove-invisible (s)
   "Remove emphasis markers and any invisible property from string S.
Assume S may contain only objects."
   ;; org-element-interpret-data clears any text property, including
   ;; invisible part.
   (org-element-interpret-data
(let ((tree (org-element-parse-secondary-string
 s (org-element-restriction 'paragraph
  (org-element-map tree '(bold code italic link strike-through underline 
verbatim)
(lambda (o)
  (pcase (org-element-type o)
;; Terminal object.  Replace it with its value.
((or `code `verbatim)
 (let ((new (org-element-property :value o)))
   (org-element-insert-before new o)
   (org-element-put-property
new :post-blank (org-element-property :post-blank o
;; Non-terminal objects.  Splice contents.
(type
 (let ((contents
(or (org-element-contents o)
(and (eq type 'link)
 (list (org-element-property :raw-link o)
   (c nil))
   (while contents
 (setq c (pop contents))
 (org-element-insert-before c o))
   (org-element-put-property
c :post-blank (org-element-property :post-blank o)
  (org-element-extract-element o)))
  ;; Return modified tree.
  tree)))
--8<---cut here---end--->8---






Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-04-20 Thread Nicolas Goaziou
Hello,

Maxim Nikulin  writes:

> Maybe first variant deserves to be committed while discussion of a 
> better option is in progress.

I'd rather not, since we're currently considering a somewhat different
path. The problem has been there for ages anyway.

> I can not even determine what type of structure is returned when 
> `org-sort-remove-invisible' is called from ert or from scratch buffer:
>
> (org-sort-remove-invisible "A")
> #("A" 0 1 (:parent (#("A" 0 1 ...

This is a string.

> A couple of obvious problems:
>
> 1. Link handling
>
> #+begin_src elisp
> (org-sort-remove-invisible
>   "- [[https://orgmode.org/submit?a=bc%20d=t+r#1234][a link]]")
> #+end_src
>
>
> #+RESULTS:
> : - [[https://orgmode.org/submit?a=bc%20d=t+r#1234][a link]]

Ah! I forgot the link part! Hopefully done here.

--8<---cut here---start->8---
(defun org-sort-remove-invisible (s)
  "Remove emphasis markers and any invisible property from string S.
Assume S may contain only objects."
  ;; org-element-interpret-data clears any text property, including
  ;; invisible part.
  (org-element-interpret-data
   (let ((tree (org-element-parse-secondary-string
s (org-element-restriction 'paragraph
 (org-element-map tree '(bold code italic link strike-through underline 
verbatim)
   (lambda (o)
 (pcase (org-element-type o)
   ;; Terminal object.  Replace it with its value.
   ((or `code `verbatim)
(let ((new (org-element-property :value o)))
  (org-element-insert-before new o)
  (org-element-put-property
   new :post-blank (org-element-property :post-blank o
   ;; Non-terminal objects.  Splice contents.
   (type
(let ((contents
   (or (org-element-contents o)
   (and (eq type 'link)
(list (org-element-property :raw-link o)
  (c nil))
  (while contents
(setq c (pop contents))
(org-element-insert-before c o))
  (org-element-put-property
   c :post-blank (org-element-property :post-blank o)
 (org-element-extract-element o)))
 ;; Return modified tree.
 tree)))
--8<---cut here---end--->8---

> 2. Missed spaces
>
> #+begin_src elisp
> (org-sort-remove-invisible "A *b* /i/ t.")
> #+end_src
>
> #+RESULTS:
> : A bit.

You need to update Org. I added a fix for that in "org-element"
yesterday.

Regards,
-- 
Nicolas Goaziou



Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-04-20 Thread Maxim Nikulin

On 20/04/2021 00:50, Nicolas Goaziou wrote:

Maxim Nikulin writes:


On 19/04/2021 23:08, Nicolas Goaziou wrote:

+  ;; Space role in sorting.
+  ;; Test would fail for locales with ignored space, e.g. en_US, it works
+  ;; in C and currently rare locales having significant space (es_ES, 
pl_PL)
+  (should
+   (equal "- Time stamp\n- Timer\n"
+  (org-test-with-temp-text "- Timer\n- Time stamp\n"
+(org-sort-list t ?a)
+(buffer-string))

Since this test is bound to fail for some developers, I assume it
shouldn't be included.


Locale "C" is forced for this group of tests.


Sorry, I don't understand. There is no locale change around this test,
so it will fail, for example, for me. I wouldn't want to get a noisy
failure each time I run tests.


Certainly flaky tests must be avoided. However I can not identify the 
source of confusion (yours or mine).


There is redefinition of `string-collate-lessp' to run tests with "C" 
locale:

https://code.orgmode.org/bzg/org-mode/src/master/testing/lisp/test-org-list.el#L1207

And it works for me

#+begin_src elisp
(let ((original-string-collate-lessp (symbol-function 
#'string-collate-lessp)))

  (cl-letf (((symbol-function 'string-collate-lessp)
 (lambda (s1 s2  locale ignore-case)
   (funcall original-string-collate-lessp
s1 s2 "C" ignore-case
(org-test-with-temp-text "- Timer\n- Time stamp\n"
  (org-sort-list t ?a)
  (buffer-string
#+end_src

#+RESULTS:
: - Time stamp
: - Timer

#+begin_src elisp
(let ((original-string-collate-lessp (symbol-function 
#'string-collate-lessp)))

  (cl-letf (((symbol-function 'string-collate-lessp)
 (lambda (s1 s2  locale ignore-case)
   (funcall original-string-collate-lessp
s1 s2 "en_US.UTF-8" ignore-case
(org-test-with-temp-text "- Timer\n- Time stamp\n"
  (org-sort-list t ?a)
  (buffer-string
#+end_src

#+RESULTS:
: - Timer
: - Time stamp

glibc-2.31 (source for locales package), Ubuntu-20.04 focal, emacs-26.3

Could you, please, show result of the following command in your 
environment (for a chance that "C" locale definition has changed):


printf 'Timestamp\nTime zone\n' \
  | LC_ALL= LC_COLLATE= LANG=C.UTF-8 LANGUAGE=en sort --debug
sort: using ‘C.UTF-8’ sorting rules
Time zone
_
Timestamp
_





Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-04-20 Thread Maxim Nikulin

Hi,

On 19/04/2021 23:08, Nicolas Goaziou wrote:



In my opinion, a more severe limitation comes from sequential
regexp-based approach. Consider stripping markers from
1. "a =b *c* d= e"
2. "*b* /i/"


Fair enough. Here comes another, more involved, attempt.


Maybe first variant deserves to be committed while discussion of a 
better option is in progress.



--8<---cut here---start->8---
(defun org-sort-remove-invisible (s)
   "Remove emphasis markers and any invisible property from string S.
Assume S may contain only objects."
   ;; org-element-interpret-data clears any text property, including
   ;; invisible part.
   (org-element-interpret-data


Sorry, I can not help you with polishing code of this function, I am not 
familiar with functions working on org element tree yet.


I can not even determine what type of structure is returned when 
`org-sort-remove-invisible' is called from ert or from scratch buffer:


(org-sort-remove-invisible "A")
#("A" 0 1 (:parent (#("A" 0 1 ...

A couple of obvious problems:

1. Link handling

#+begin_src elisp
(org-sort-remove-invisible
 "- [[https://orgmode.org/submit?a=bc%20d=t+r#1234][a link]]")
#+end_src

#+RESULTS:
: - [[https://orgmode.org/submit?a=bc%20d=t+r#1234][a link]]

2. Missed spaces

#+begin_src elisp
(org-sort-remove-invisible "A *b* /i/ t.")
#+end_src

#+RESULTS:
: A bit.


(let ((tree (org-element-parse-secondary-string
 s (org-element-restriction 'paragraph
  (org-element-map tree '(bold code italic strike-through underline 
verbatim)
(lambda (o)
  (pcase (org-element-type o)
;; Terminal object.  Replace it with its value.
((or `code `verbatim)
 (let ((new (org-element-property :value o)))
   (org-element-insert-before new o)
   (org-element-put-property
new :post-blank (org-element-property :post-blank o
;; Non-terminal objects.  Splice contents.
(_
 (let ((contents (org-element-contents o))
   (c nil))
   (while contents
 (setq c (pop contents))
 (org-element-insert-before c o))
   (org-element-put-property
c :post-blank (org-element-property :post-blank o)
  (org-element-extract-element o)))
  ;; Return modified tree.
  tree)))
--8<---cut here---end--->8---

It is not perfect, but it does a better job.

WDYT?





Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-04-19 Thread Greg Minshall
Tom, thanks!  i assumed something like that.



Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-04-19 Thread Nicolas Goaziou
Maxim Nikulin  writes:

> On 19/04/2021 23:08, Nicolas Goaziou wrote:
>>> +  ;; Space role in sorting.
>>> +  ;; Test would fail for locales with ignored space, e.g. en_US, it 
>>> works
>>> +  ;; in C and currently rare locales having significant space (es_ES, 
>>> pl_PL)
>>> +  (should
>>> +   (equal "- Time stamp\n- Timer\n"
>>> +  (org-test-with-temp-text "- Timer\n- Time stamp\n"
>>> +(org-sort-list t ?a)
>>> +(buffer-string))
>> Since this test is bound to fail for some developers, I assume it
>> shouldn't be included.
>
> Locale "C" is forced for this group of tests. 

Sorry, I don't understand. There is no locale change around this test,
so it will fail, for example, for me. I wouldn't want to get a noisy
failure each time I run tests.

Regards,



Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-04-19 Thread Maxim Nikulin

On 19/04/2021 23:08, Nicolas Goaziou wrote:

+  ;; Space role in sorting.
+  ;; Test would fail for locales with ignored space, e.g. en_US, it works
+  ;; in C and currently rare locales having significant space (es_ES, 
pl_PL)
+  (should
+   (equal "- Time stamp\n- Timer\n"
+  (org-test-with-temp-text "- Timer\n- Time stamp\n"
+(org-sort-list t ?a)
+(buffer-string))


Since this test is bound to fail for some developers, I assume it
shouldn't be included.


Locale "C" is forced for this group of tests. I have added the test to 
catch an attempt to change locale since it would affect other cases. I 
hope, the comment might be useful for those who play with sorting having 
a regular locale instead of fallback "C".





Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-04-19 Thread Tom Gillespie
Hi Greg,
seq cannot be used because it is not available in older versions
of emacs that org still supports. When support for those older
versions is dropped then seq could be used. Best,
Tom



Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-04-19 Thread Greg Minshall
hi, Nicolas,

i'm curious, not knowing history and/or procedures.

> ... CL is still necessary however, as we cannot use `seq' yet.

why is 'seq not "yet" available?  what will make it available?

cheers, Greg



Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-04-19 Thread Nicolas Goaziou
Hello,

Maxim Nikulin  writes:

> Just a curiosity, what is style guide recommendation: let - lambda or
> cl-labels?

I stay away from CL as much as possible, otherwise newcomers will have
to learn two languages to start contributing, Elisp and CL (cl-loop,
ewww). CL is still necessary however, as we cannot use `seq' yet.

Also, there is `letrec' instead of `cl-labels'.

> In my opinion, a more severe limitation comes from sequential
> regexp-based approach. Consider stripping markers from
> 1. "a =b *c* d= e"
> 2. "*b* /i/"

Fair enough. Here comes another, more involved, attempt.

--8<---cut here---start->8---
(defun org-sort-remove-invisible (s)
  "Remove emphasis markers and any invisible property from string S.
Assume S may contain only objects."
  ;; org-element-interpret-data clears any text property, including
  ;; invisible part.
  (org-element-interpret-data
   (let ((tree (org-element-parse-secondary-string
s (org-element-restriction 'paragraph
 (org-element-map tree '(bold code italic strike-through underline verbatim)
   (lambda (o)
 (pcase (org-element-type o)
   ;; Terminal object.  Replace it with its value.
   ((or `code `verbatim)
(let ((new (org-element-property :value o)))
  (org-element-insert-before new o)
  (org-element-put-property
   new :post-blank (org-element-property :post-blank o
   ;; Non-terminal objects.  Splice contents.
   (_
(let ((contents (org-element-contents o))
  (c nil))
  (while contents
(setq c (pop contents))
(org-element-insert-before c o))
  (org-element-put-property
   c :post-blank (org-element-property :post-blank o)
 (org-element-extract-element o)))
 ;; Return modified tree.
 tree)))
--8<---cut here---end--->8---

It is not perfect, but it does a better job.

WDYT?

> +  ;; Space role in sorting.
> +  ;; Test would fail for locales with ignored space, e.g. en_US, it works
> +  ;; in C and currently rare locales having significant space (es_ES, 
> pl_PL)
> +  (should
> +   (equal "- Time stamp\n- Timer\n"
> +  (org-test-with-temp-text "- Timer\n- Time stamp\n"
> +(org-sort-list t ?a)
> +(buffer-string))

Since this test is bound to fail for some developers, I assume it
shouldn't be included.

> +  (dolist (case '(("Lost =in *verbatim* text= fragment" .
> +   "Lost in *verbatim* text fragment")
> +  ("Adjucent emphasis: *Overlapped* /regexps/" .
> +   "Adjucent emphasis: Ovelapped regexps")))
  
  typo

Regards,
-- 
Nicolas Goaziou



Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-04-19 Thread Maxim Nikulin

On 19/04/2021 15:33, Nicolas Goaziou wrote:


Could you try the following instead?

--8<---cut here---start->8---
(defun org-sort-remove-invisible (s)
   "Remove invisible part of links and emphasis markers from string S."
   (let ((remove-markers
  (lambda (m)


Just a curiosity, what is style guide recommendation: let - lambda or 
cl-labels?



(concat (match-string 1 m)
(match-string 4 m)
(match-string 5 m)
 (remove-text-properties 0 (length s) org-rm-props s)
 (replace-regexp-in-string
  org-verbatim-re remove-markers
  (replace-regexp-in-string org-emph-re remove-markers 
(org-link-display-format s) t tt)


Single "t" is at the end, isn't it?


  t t)))


I think, the patch is an improvement.

There is still a minor shortcoming that ordering of emphasized and 
non-emphasized words is undefined

- /a/
- *a*
- a
Let's leave it aside since it requires multilevel comparison similar to 
collation in locales and more strict definition which part of string is 
compared at each level:

- /a/ :: A
- /a/ :: Z
- a :: A
- a :: Z

In my opinion, a more severe limitation comes from sequential 
regexp-based approach. Consider stripping markers from

1. "a =b *c* d= e"
2. "*b* /i/"
First case could be solved by splitting the input string by verbatim 
regexp at first and by applying emphasis substitution only to 
non-verbatim parts. However I am rather shy to experiment with regexps 
definition to avoid including chars before and after emphasis markers. 
It would be great to get role of particular characters from more 
reliable parser (or at least from text properties affected by 
fontification).


I have some tests for `org-sort-remove-invisible', see the attachment. 
Actually I do not like such style of tests, I strongly prefer to see all 
failures at once, but I have not found if such technique is used 
anywhere in org tests.
>From 6719386aa5a95394a4cb98ce28b889f545620bd9 Mon Sep 17 00:00:00 2001
From: Max Nikulin 
Date: Mon, 19 Apr 2021 19:06:36 +0700
Subject: [PATCH] More tests for `org-sort-list'

* lisp/org.el (org-verbatim-re): Add to the docstring a statement
concerning subgroups meaning.
* testing/lisp/test-org-list.el (test-org-list/sort): Add cases with
text emphasis.  Stress in comments that it is significant whether
locale-specific collation rules ignores spaces.
* testing/lisp/test-org.el (test-org/org-sort-remove-invisible): Add
tests for `org-sort-list' helper.

Add a test with expected failures to make apparent limitation of simple
regexp-based approach in `org-sort-remove-invisible' function.
---
 lisp/org.el   |  3 ++-
 testing/lisp/test-org-list.el | 34 +-
 testing/lisp/test-org.el  | 27 +++
 3 files changed, 62 insertions(+), 2 deletions(-)

diff --git a/lisp/org.el b/lisp/org.el
index c2738100f..3d4f5553a 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -3685,7 +3685,8 @@ After a match, the match groups contain these elements:
 5  The character after the match, empty at the end of a line")
 
 (defvar org-verbatim-re nil
-  "Regular expression for matching verbatim text.")
+  "Regular expression for matching verbatim text.
+Groups 1-5 have the same meaning as in `org-emph-re' pattern.")
 
 (defvar org-emphasis-regexp-components) ; defined just below
 (defvar org-emphasis-alist) ; defined just below
diff --git a/testing/lisp/test-org-list.el b/testing/lisp/test-org-list.el
index 3689a172f..cc7914c8d 100644
--- a/testing/lisp/test-org-list.el
+++ b/testing/lisp/test-org-list.el
@@ -1225,7 +1225,39 @@ b. Item 2"
(equal "- b\n- a\n- C\n"
 	  (org-test-with-temp-text "- b\n- C\n- a\n"
 		(org-sort-list t ?A)
-		(buffer-string))
+		(buffer-string
+  ;; Emphasis in the beginning.
+  (should
+   (equal "- a\n- /z/\n"
+  (org-test-with-temp-text "- /z/\n- a\n"
+(org-sort-list t ?a)
+(buffer-string
+  (should
+   (equal "- *a*\n- z\n"
+  (org-test-with-temp-text "- z\n- *a*\n"
+(org-sort-list t ?a)
+(buffer-string
+  ;; Emphasis of second word.
+  ;; Locales with significant spaces (C, es_ES, pl_PL)
+  ;; are more sensitive to problems with sort.
+  (should
+   (equal "- a b\n- a /z/\n"
+  (org-test-with-temp-text "- a /z/\n- a b\n"
+(org-sort-list t ?a)
+(buffer-string
+  (should
+   (equal "- a *b*\n- a z\n"
+  (org-test-with-temp-text "- a z\n- a *b*\n"
+(org-sort-list t ?a)
+(buffer-string
+  ;; Space role in sorting.
+  ;; Test would fail for locales with ignored space, e.g. en_US, it works
+  ;; in C and currently rare locales having significant space (es_ES, pl_PL)
+  (should
+   (equal "- Time stamp\n- Timer\n"
+  

Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-04-19 Thread Nicolas Goaziou
Hello,

Juan Manuel Macías  writes:

> I wonder if this other approach can be viable or if it is something
> crazy: if the spaces in org-sort-remove-invisible are a
> problem only for the first emphasis of each item, how about this
> fix to org-sort-list? (not modifying org-sort-remove-invisible):

I think `org-sort-remove-invisible' is wrong, so it is the one that
needs to be fixed.

Could you try the following instead?

--8<---cut here---start->8---
(defun org-sort-remove-invisible (s)
  "Remove invisible part of links and emphasis markers from string S."
  (let ((remove-markers
 (lambda (m)
   (concat (match-string 1 m)
   (match-string 4 m)
   (match-string 5 m)
(remove-text-properties 0 (length s) org-rm-props s)
(replace-regexp-in-string
 org-verbatim-re remove-markers
 (replace-regexp-in-string org-emph-re remove-markers 
(org-link-display-format s) t tt)
 t t)))
--8<---cut here---end--->8---

Regards,
--
Nicolas Goaziou



Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-04-18 Thread Juan Manuel Macías
Juan Manuel Macías writes:

> It seems that what I was proposing as a patch at the beginning is not,
> finally, a viable solution for all contexts...
>
> The problem is that, if the first space is removed, we get this
> abnormal result:
>
> #+begin_src emacs-lisp
> (org-sort-remove-invisible "- lo /bueno/")
> #+end_src
>
> #+RESULTS:
> : - lobueno

I wonder if this other approach can be viable or if it is something
crazy: if the spaces in org-sort-remove-invisible are a
problem only for the first emphasis of each item, how about this
fix to org-sort-list? (not modifying org-sort-remove-invisible):

@@ -2940,10 +2940,20 @@ function is being called interactively."
 (org-sort-remove-invisible
  (buffer-substring (match-end 0) (point-at-eol)
   ((= dcst ?a)
-   (funcall case-func
-(org-sort-remove-invisible
- (buffer-substring
-  (match-end 0) (point-at-eol)
+   (if (save-excursion
+ (beginning-of-line)
+ (forward-char)
+ (looking-at-p org-emph-re))
+   (replace-regexp-in-string
+"\\(^\\)\s+" "\\1"
+(funcall case-func
+ (org-sort-remove-invisible
+  (buffer-substring
+   (match-end 0) (point-at-eol)
+ (funcall case-func
+  (org-sort-remove-invisible
+   (buffer-substring
+(match-end 0) (point-at-eol))
   ((= dcst ?t)
(cond
 ;; If it is a timer list, convert timer to seconds



Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-04-18 Thread Juan Manuel Macías
Hi Maxim,

Maxim Nikulin writes:

> On 10/04/2021 07:01, Juan Manuel Macías wrote:
>> Nicolas Goaziou writes:
>>> Could you apply the same fix to the `org-verbatim-re' match above, and
>>> provide an appropriate commit message?
>> Done! I've attached the corrected patch. Sorry for the flaws in me
>> previous patch: I'm a bit of a novice at submitting patches...
>
> I have not read yet the following documents to realize what pitfalls
> could be expected due to significant space added to Spanish and Polish 
> collation rules in libc.
>
> http://www.unicode.org/reports/tr10/
> Unicode® Technical Standard #10
> Unicode Collation Algorithm
>
> http://www.unicode.org/reports/tr35/tr35-collation.html#CLDR_Collation_Algorithm
> Unicode Technical Standard #35
> Unicode Locale Data Markup Language (LDML)
> Part 5: Collation
>
> In the meanwhile I have realized that org-sort-remove-invisible
> function has rather strange behavior. There are no tests, so I am in
> doubt if the following result is expected and intended (unpatched
> version)

All this research you've done on spaces and collation rules is very
interesting and enlightening.

Certainly, space is important, as the following list in Spanish is
correctly ordered as:

- lo raro
- lo vano
- localidad

However, with the 'unpatched' version of org-sort-remove invisible I get
this result (which is correct) for this list:

- lo bueno
- lo /bueno/
- lo vano
- lo /vano/

And with the patched version, I get this, which is wrong (!):

- lo bueno
- lo vano
- lo /bueno/
- lo /vano/

It seems that what I was proposing as a patch at the beginning is not,
finally, a viable solution for all contexts...

The problem is that, if the first space is removed, we get this
abnormal result:

#+begin_src emacs-lisp
(org-sort-remove-invisible "- lo /bueno/")
#+end_src

#+RESULTS:
: - lobueno

Best regards,

Juan Manuel 



Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-04-17 Thread Maxim Nikulin

On 10/04/2021 07:01, Juan Manuel Macías wrote:

Nicolas Goaziou writes:

Could you apply the same fix to the `org-verbatim-re' match above, and
provide an appropriate commit message?


Done! I've attached the corrected patch. Sorry for the flaws in me
previous patch: I'm a bit of a novice at submitting patches...


I have not read yet the following documents to realize what pitfalls 
could be expected due to significant space added to Spanish and Polish 
collation rules in libc.


http://www.unicode.org/reports/tr10/
Unicode® Technical Standard #10
Unicode Collation Algorithm

http://www.unicode.org/reports/tr35/tr35-collation.html#CLDR_Collation_Algorithm
Unicode Technical Standard #35
Unicode Locale Data Markup Language (LDML)
Part 5: Collation

In the meanwhile I have realized that org-sort-remove-invisible function 
has rather strange behavior. There are no tests, so I am in doubt if the 
following result is expected and intended (unpatched version)


(org-sort-remove-invisible "- (*bold*)")
"-  bold "

I would expect
"- (bold)"

P.S.

On 10/04/2021 17:19, Nicolas Goaziou wrote:
>
> Do you have a simple test case to reproduce the problem? Currently
> sorting the following trivial lists causes no issue:

After the lengthy discussion the problem is traced down to the following:

- Ensure that you have *compiled* recent enough es_ES.UTF-8 locale 
(description includes 
https://sourceware.org/git/?p=glibc.git;a=blobdiff;f=localedata/locales/es_ES;h=aa919a26267fd6311b71d7aeb81655e55787b4df;hp=d17612f6726d0c098ac981e06f3702106540bb23;hb=159738548130d5ac4fe6178977e940ed5f8cfdc4;hpb=ce6636b06b67d6bb9b3d6927bf2a926b9b7478f5)

- LC_ALL= LANG=es_ES.UTF-8 emacs
- (org-sort-list t ?a) for the following snippet

- /a/
- a
- /v/
- v

Significance of space should be better illustrated with multi-word items 
but I am not ready to provide an impressive example yet.





Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-04-16 Thread Maxim Nikulin

On 16/04/2021 21:59, Maxim Nikulin wrote:

Ukrainian sort works better than Russian one with such example:

printf "Иванова Алла\nИванов Адам\nИванова Светлана\n" \
   | LANG=uk_UA.UTF-8 sort
Иванов Адам
Иванова Алла
Иванова Светлана

printf "Иванова Алла\nИванов Адам\nИванова Светлана\n" \
   | LANG=ru_RU.UTF-8 sort
Иванова Алла
Иванов Адам
Иванова Светлана


Sorry, I forgot to generate uk_UA locale so simple ordering were applied 
without ignoring of spaces in the shared part of locale definitions.


printf "Иванова Алла\nИванов Адам\nИванова Светлана\n" \
  | LANG=uk_UA.UTF-8 sort
Иванова Алла
Иванов Адам
Иванова Светлана

So only pl_PL and es_ES have significant spaces

 ;IGNORE;IGNORE;

vs.

 IGNORE;IGNORE;IGNORE; % SPACE

in iso14651_t1_common




Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-04-16 Thread Maxim Nikulin

Hi Juan,

On 16/04/2021 01:21, you wrote:

#+begin_src emacs-lisp :tangle list-var.el
   (message "%S" (sort '("-\s\sv" "-\sv" "-\sa" "-\s\sa")
#'string-collate-lessp))
#+end_src

#+begin_src sh
exec 2>&1
LC_ALL=en_US.UTF-8 emacs --batch -Q -l list-var.el
#+end_src

#+RESULTS:
: -  a" "- a" "-  v" "- v

#+begin_src sh
exec 2>&1
LC_ALL=es_ES.UTF-8 emacs --batch -Q -l list-var.el
#+end_src

#+RESULTS:
: -  a" "-  v" "- a" "- v


You have managed to convince me that despite my first suspects the 
locale on your computer is correct. It is unexpectedly correct and it is 
more correct that most of locales in libc.


However I do not have opinion concerning you patch yet. I have not 
realized what is the proper way to sort list.


Space is significant. At least it may be. Only a few languages have got 
such fix, Spanish is among them 
https://sourceware.org/git/?p=glibc.git;a=blobdiff;f=localedata/locales/es_ES;h=aa919a26267fd6311b71d7aeb81655e55787b4df;hp=d17612f6726d0c098ac981e06f3702106540bb23;hb=159738548130d5ac4fe6178977e940ed5f8cfdc4;hpb=ce6636b06b67d6bb9b3d6927bf2a926b9b7478f5


Notice "collating-symbol "

I have found example of sorting names in a language where woman surname 
usually have additional "a" in comparison to man surname.


printf "Ivanova Alla\nIvanov Adam\nIvanova Svetlana\n" \
  | LANG=pl_PL.UTF-8 sort
Ivanov Adam
Ivanova Alla
Ivanova Svetlana

es_ES behavior is just as the above example.

printf "Ivanova Alla\nIvanov Adam\nIvanova Svetlana\n" \
  | LANG=en_US.UTF-8 sort
Ivanova Alla
Ivanov Adam
Ivanova Svetlana

Ukrainian sort works better than Russian one with such example:

printf "Иванова Алла\nИванов Адам\nИванова Светлана\n" \
  | LANG=uk_UA.UTF-8 sort
Иванов Адам
Иванова Алла
Иванова Светлана

printf "Иванова Алла\nИванов Адам\nИванова Светлана\n" \
  | LANG=ru_RU.UTF-8 sort
Иванова Алла
Иванов Адам
Иванова Светлана

Man names are sorted first in such lists. Other cases might exist when 
significant space is undesired.


So sorting is tricky than I expected.




Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-04-15 Thread Juan Manuel Macías
Hi Maxim

Maxim Nikulin writes:

> I can reproduce the issue with emacs-27.1 from ubuntu-21.04 beta live
> image running in qemu. Org mode is current git HEAD. It seems that 
> something is changed in emacs since locale is correct:
>
> ubuntu@ubuntu:~$ printf '%s- v\n- /v/\n- a\n- /a/\n' '' \
>   | LANG=C.UTF-8 sort
> - /a/
> - /v/
> - a
> - v
> ubuntu@ubuntu:~$ printf '%s- v\n- /v/\n- a\n- /a/\n' '' \
>   | LANG=es_ES.UTF-8 sort
> - /a/
> - a
> - /v/
> - v
>
> Concerning org-sort-list, I do not see any problem with en_US.UTF-8,
> it_IT.UTF-8, and ru_RU.UTF-8 locales. However emphasized items are 
> sorted first for at least es_ES.UTF-8, es_MX.UTF-8, and es_US.UTF-8.
> I have found some evidence that the problem is on the org side
>
> cat list.el
> (message "%S" (sort '("- /v/" "- v" "- a" "- /a/")
> #'string-collate-lessp))
>
> LC_ALL=C.UTF-8 emacs --batch -Q -l list.el
> ("- /a/" "- /v/" "- a" "- v")
>
> LC_ALL=en_US.UTF-8 emacs --batch -Q -l list.el
> ("- /a/" "- a" "- /v/" "- v")
>
> LC_ALL=es_ES.UTF-8 emacs --batch -Q -l list.el
> ("- /a/" "- a" "- /v/" "- v")
>
> So even string-collate-lessp works as expected.
>
> I'm puzzled why the problem is specific to org-sort-list and namely to
> Spanish locales.
>
>

Well that's pretty weird ... I wonder if the (spurious?) spaces in
`org-sort-remove-invisible' that I mentioned at the beginning may be
helpful to keep a track, or if it's just something that masks the
problem.

Anyway, I have noticed that the only space that seems determinant (from
some way that escapes me) is this (where I put the @ sign):

org-emph-re (lambda (m) (format "@%s " (match-string 4 m)))

Following your examples, it occurred to me to try this, which I don't
know if it is relevant or if I have entered a dead end:

#+begin_src emacs-lisp :tangle list-var.el
  (message "%S" (sort '("-\s\sv" "-\sv" "-\sa" "-\s\sa")
#'string-collate-lessp))
#+end_src

#+begin_src sh
exec 2>&1
LC_ALL=en_US.UTF-8 emacs --batch -Q -l list-var.el
#+end_src

#+RESULTS:
: -  a" "- a" "-  v" "- v

#+begin_src sh
exec 2>&1
LC_ALL=es_ES.UTF-8 emacs --batch -Q -l list-var.el
#+end_src

#+RESULTS:
: -  a" "-  v" "- a" "- v

Best regards,

Juan Manuel 



Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-04-15 Thread Maxim Nikulin
I can reproduce the issue with emacs-27.1 from ubuntu-21.04 beta live 
image running in qemu. Org mode is current git HEAD. It seems that 
something is changed in emacs since locale is correct:


ubuntu@ubuntu:~$ printf '%s- v\n- /v/\n- a\n- /a/\n' '' \
  | LANG=C.UTF-8 sort
- /a/
- /v/
- a
- v
ubuntu@ubuntu:~$ printf '%s- v\n- /v/\n- a\n- /a/\n' '' \
  | LANG=es_ES.UTF-8 sort
- /a/
- a
- /v/
- v

Concerning org-sort-list, I do not see any problem with en_US.UTF-8, 
it_IT.UTF-8, and ru_RU.UTF-8 locales. However emphasized items are 
sorted first for at least es_ES.UTF-8, es_MX.UTF-8, and es_US.UTF-8.

I have found some evidence that the problem is on the org side

cat list.el
(message "%S" (sort '("- /v/" "- v" "- a" "- /a/")
#'string-collate-lessp))

LC_ALL=C.UTF-8 emacs --batch -Q -l list.el
("- /a/" "- /v/" "- a" "- v")

LC_ALL=en_US.UTF-8 emacs --batch -Q -l list.el
("- /a/" "- a" "- /v/" "- v")

LC_ALL=es_ES.UTF-8 emacs --batch -Q -l list.el
("- /a/" "- a" "- /v/" "- v")

So even string-collate-lessp works as expected.

I'm puzzled why the problem is specific to org-sort-list and namely to 
Spanish locales.





Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-04-14 Thread Juan Manuel Macías
Hello again,

Since I have an old Emacs version (24.5.1) on my Raspberry, I've done
a few more tests. The situation is the following:

1. On Arch Linux and Emacs 27.2:

- The system locales are set to "es_ES.UTF-8". When, inside Emacs, I do M-x
getenv RET LANG RET, I get "es_ES.UTF-8".

org-sort-list a -> wrong result;

- Launching Emacs from terminal with

LANG=es_ES.UTF-8 \
etc...

org-sort-list a -> wrong result again.

2. On Fedora 32 (virtual machine) and Emacs 27.1

- Everything as in the previous case.

3. On Raspian stretch and Emacs 24.5.1:

- The system locales are set to "es_ES.UTF-8" as well. When, inside a
normal Emacs session, I do M-x getenv RET lang RET, I get "es_ES.UTF-8".

org-sort-list a -> wrong result;

- Launching Emacs from terminal with

LANG=es_ES.UTF-8 \
etc...

In this case the list is ordered correctly, but I observe that similar
forms with or without emphasis are not always ordered in the same way.
Sometimes the non-emphasized forms are ordered before and sometimes
they are ordered after (?).

I don't know if I'm missing something...

Best regards,

Juan Manuel

Juan Manuel Macías writes:

> Hi Maxim,
>
> Thanks again. In my case, I keep getting the wrong result. In addition
> to the test I did in a virtual machine with Fedora, I use Arch Linux in
> my every day computers, with locales correctly (I hope) configured as
> es_ES.UTF-8 (there was a typo in my previous mail, sorry:
> 'en_ES.UTF-8'). In my /etc/locale.conf file I have:
>
> LANG=es_ES.UTF-8
> LC_ADDRESS=es_ES.UTF-8
> LC_IDENTIFICATION=es_ES.UTF-8
> LC_MEASUREMENT=es_ES.UTF-8
> LC_MONETARY=es_ES.UTF-8
> LC_NAME=es_ES.UTF-8
> LC_NUMERIC=es_ES.UTF-8
> LC_PAPER=es_ES.UTF-8
> LC_TELEPHONE=es_ES.UTF-8
> LC_TIME=es_ES.UTF-8
>
> And with locale -a, I get a list of enabled locales:
>
> C
> en_US.utf8
> es_ES.utf8
> POSIX
>
> However, I keep getting the wrong result :-( (Arch, Emacs 27.2).
>
> Even with
>
> LANG=en_EN.UTF-8 \
> emacs -nw -Q -L ~/src/org-mode/lisp/ -L ~/src/org-mode/contrib/lisp/ \
> list.org
>
> Maybe the problem is on Arch's side (?), although it was also reproducible
> in the test I did with Fedora in virtual machine and Emacs 27.
>
> Best regards,
>
> Juan Manuel
>
> Maxim Nikulin writes:
>
>> On 14/04/2021 02:08, Juan Manuel Macías wrote:
>>> The situation is that with locales configured for Spanish from Spain
>>> (en_ES.UTF-8) the list is not ordered correctly, unless those three
>>> spaces from org-sort-remove-invisible are removed. But I couldn't say
>>> why or if that would be appropriate as a patch...
>>
>> Did not you get a warning like the following one?
>>
>> (process:220): Gtk-WARNING **: 15:17:45.066: Locale not supported by C
>> library.
>>   Using the fallback 'C' locale.
>>
>>> Regarding the collation rule of forms with emphasis, at least in Spanish
>>> these should come after the non-emphasized forms. I don't know if this
>>> is a general rule also for other languages (at least it seems more
>>> natural that the forms without emphasis are placed before).
>>
>> LANG=es_ES.UTF-8 \
>>   emacs -Q -L ~/src/org-mode/lisp/ -L ~/src/org-mode/contrib/lisp/ \
>>   list.org
>>
>> - a
>> - /a/
>> - v
>> - /v/
>>
>> So it works accordingly to your expectation. Emacs 25.2.2,
>> Ubuntu-18.04 container.
>>
>> I have generated es_ES.UTF-8 locale using
>>
>> dpkg-reconfigure locales
>>
>> Depending on linux distribution you run, the locale may be ready to
>> use or not. I tend to think that in minimal environment of virtual
>> machine it was missed.
>>
>> I had an idea to add a test for sorting of items including emphasized
>> ones but test-org-list/sort forces C locale. Maybe it was done to
>> avoid failures due to missed locale.
>>
>>
>



Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-04-14 Thread Juan Manuel Macías
Hi Maxim,

Thanks again. In my case, I keep getting the wrong result. In addition
to the test I did in a virtual machine with Fedora, I use Arch Linux in
my every day computers, with locales correctly (I hope) configured as
es_ES.UTF-8 (there was a typo in my previous mail, sorry:
'en_ES.UTF-8'). In my /etc/locale.conf file I have:

LANG=es_ES.UTF-8
LC_ADDRESS=es_ES.UTF-8
LC_IDENTIFICATION=es_ES.UTF-8
LC_MEASUREMENT=es_ES.UTF-8
LC_MONETARY=es_ES.UTF-8
LC_NAME=es_ES.UTF-8
LC_NUMERIC=es_ES.UTF-8
LC_PAPER=es_ES.UTF-8
LC_TELEPHONE=es_ES.UTF-8
LC_TIME=es_ES.UTF-8

And with locale -a, I get a list of enabled locales:

C
en_US.utf8
es_ES.utf8
POSIX

However, I keep getting the wrong result :-( (Arch, Emacs 27.2).

Even with

LANG=en_EN.UTF-8 \
emacs -nw -Q -L ~/src/org-mode/lisp/ -L ~/src/org-mode/contrib/lisp/ \
list.org

Maybe the problem is on Arch's side (?), although it was also reproducible
in the test I did with Fedora in virtual machine and Emacs 27. 

Best regards,

Juan Manuel 

Maxim Nikulin writes:

> On 14/04/2021 02:08, Juan Manuel Macías wrote:
>> The situation is that with locales configured for Spanish from Spain
>> (en_ES.UTF-8) the list is not ordered correctly, unless those three
>> spaces from org-sort-remove-invisible are removed. But I couldn't say
>> why or if that would be appropriate as a patch...
>
> Did not you get a warning like the following one?
>
> (process:220): Gtk-WARNING **: 15:17:45.066: Locale not supported by C
> library.
>   Using the fallback 'C' locale.
>
>> Regarding the collation rule of forms with emphasis, at least in Spanish
>> these should come after the non-emphasized forms. I don't know if this
>> is a general rule also for other languages (at least it seems more
>> natural that the forms without emphasis are placed before).
>
> LANG=es_ES.UTF-8 \
>   emacs -Q -L ~/src/org-mode/lisp/ -L ~/src/org-mode/contrib/lisp/ \
>   list.org
>
> - a
> - /a/
> - v
> - /v/
>
> So it works accordingly to your expectation. Emacs 25.2.2,
> Ubuntu-18.04 container.
>
> I have generated es_ES.UTF-8 locale using
>
> dpkg-reconfigure locales
>
> Depending on linux distribution you run, the locale may be ready to
> use or not. I tend to think that in minimal environment of virtual
> machine it was missed.
>
> I had an idea to add a test for sorting of items including emphasized
> ones but test-org-list/sort forces C locale. Maybe it was done to
> avoid failures due to missed locale.
>
>



Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-04-14 Thread Maxim Nikulin

On 14/04/2021 22:42, Maxim Nikulin wrote:

I have generated es_ES.UTF-8 locale using

     dpkg-reconfigure locales

Depending on linux distribution you run, the locale may be ready to use 
or not. I tend to think that in minimal environment of virtual machine 
it was missed.


I forgot to add a command how to get the list of generated locales

 locale --all-locales

locales-2.27-3ubuntu1.4 package is built from glibc sources.





Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-04-14 Thread Maxim Nikulin

On 14/04/2021 02:08, Juan Manuel Macías wrote:


The situation is that with locales configured for Spanish from Spain
(en_ES.UTF-8) the list is not ordered correctly, unless those three
spaces from org-sort-remove-invisible are removed. But I couldn't say
why or if that would be appropriate as a patch...


Did not you get a warning like the following one?

(process:220): Gtk-WARNING **: 15:17:45.066: Locale not supported by C 
library.

Using the fallback 'C' locale.


Regarding the collation rule of forms with emphasis, at least in Spanish
these should come after the non-emphasized forms. I don't know if this
is a general rule also for other languages (at least it seems more
natural that the forms without emphasis are placed before).


LANG=es_ES.UTF-8 \
  emacs -Q -L ~/src/org-mode/lisp/ -L ~/src/org-mode/contrib/lisp/ \
  list.org

- a
- /a/
- v
- /v/

So it works accordingly to your expectation. Emacs 25.2.2, Ubuntu-18.04 
container.


I have generated es_ES.UTF-8 locale using

dpkg-reconfigure locales

Depending on linux distribution you run, the locale may be ready to use 
or not. I tend to think that in minimal environment of virtual machine 
it was missed.


I had an idea to add a test for sorting of items including emphasized 
ones but test-org-list/sort forces C locale. Maybe it was done to avoid 
failures due to missed locale.





Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-04-13 Thread Juan Manuel Macías
Hi, Maxim,

Thanks for clearing things up. So it seems obvious that the root
of the problem is in the locales and the collation rules.

The situation is that with locales configured for Spanish from Spain
(en_ES.UTF-8) the list is not ordered correctly, unless those three
spaces from org-sort-remove-invisible are removed. But I couldn't say
why or if that would be appropriate as a patch...

Regarding the collation rule of forms with emphasis, at least in Spanish
these should come after the non-emphasized forms. I don't know if this
is a general rule also for other languages (at least it seems more
natural that the forms without emphasis are placed before). 

Best regards,

Juan Manuel

Maxim Nikulin writes:

> I could reproduce such result but I am in doubt if it is a reason to
> merge the patch. I believe, the following behavior is almost expected
>
> list.org:
> - v
> - /v/
> - /a/
> - a
>
> LC_COLLATE=C.UTF-8 LC_ALL= LANGUAGE= \
>   emacs -Q -L ~/src/org-mode/lisp/ -L ~/src/org-mode/contrib/lisp/ \
>   list.org
>
> (org-sort-list t ?a)
>
> - /a/
> - /v/
> - a
> - v
>
> LC_COLLATE=en_US.UTF-8 LC_ALL= LANGUAGE= \
>   emacs -Q -L ~/src/org-mode/lisp/ -L ~/src/org-mode/contrib/lisp/ \
>   list.org
>
> (org-sort-list t ?a)
>
> - /a/
> - a
> - /v/
> - v
>
> Collation rules depend on language. The question is if emphasized
> variant should be sorted first.
>
> P.S. The thread is broken. Some new messages do not have proper
> In-Reply-To header. Original question was not referenced in the
> message with patch as well:
> https://orgmode.org/list/87blbac0k0@posteo.net/




Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-04-13 Thread Maxim Nikulin

On 10/04/2021 18:41, Juan Manuel Macías wrote:

Nicolas Goaziou writes:


Do you have a simple test case to reproduce the problem? Currently
sorting the following trivial lists causes no issue:

 - b
 - *a*

and

 - *b*
 - a


The current result is wrong:

- /a/
- /v/
- a
- b


I could reproduce such result but I am in doubt if it is a reason to 
merge the patch. I believe, the following behavior is almost expected


list.org:
- v
- /v/
- /a/
- a

LC_COLLATE=C.UTF-8 LC_ALL= LANGUAGE= \
  emacs -Q -L ~/src/org-mode/lisp/ -L ~/src/org-mode/contrib/lisp/ \
  list.org

(org-sort-list t ?a)

- /a/
- /v/
- a
- v

LC_COLLATE=en_US.UTF-8 LC_ALL= LANGUAGE= \
  emacs -Q -L ~/src/org-mode/lisp/ -L ~/src/org-mode/contrib/lisp/ \
  list.org

(org-sort-list t ?a)

- /a/
- a
- /v/
- v

Collation rules depend on language. The question is if emphasized 
variant should be sorted first.


P.S. The thread is broken. Some new messages do not have proper 
In-Reply-To header. Original question was not referenced in the message 
with patch as well: https://orgmode.org/list/87blbac0k0@posteo.net/





Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-04-13 Thread Juan Manuel Macías
Hi Samuel,

Samuel Wales writes:

> probably not a relevant non-confirmation but in recent maint, my config:
>
> - a
> - /a/
> - b

Thanks for trying. I've uploaded this screencast with a minimal Emacs on
a virtual machine:

https://gnutas.juanmanuelmacias.com/images/org-sort-issue-2021-04-13_15.44.52.mp4

and, as you can see, in my case `org-sort-list' can not sort the list
correctly. Maybe it's something related to locales (??), but the only
thing that I can confirm is that if I remove those spaces in
`org-sort-remove-invisible', the list is then sorted correctly.

Best regards,

Juan Manuel 



Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-04-12 Thread Samuel Wales
probably not a relevant non-confirmation but in recent maint, my config:

- a
- /a/
- b


On 4/12/21, Juan Manuel Macías  wrote:
> Hi Ypo,
>
> Ypo writes:
>
>> This is my result after doing "org-sort-list a":
>>
>>   - /a/
>>   - /v/
>>   - a
>>   - b
>>
>> Regards
>
> Thanks for trying. So it seems that you can reproduce the problem as
> well... I wonder if anyone else is able to reproduce it, preferably in a
> minimal emacs.
>
> Best regards,
>
> Juan Manuel
>
>


-- 
The Kafka Pandemic

Please learn what misopathy is.
https://thekafkapandemic.blogspot.com/2013/10/why-some-diseases-are-wronged.html



Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-04-12 Thread Juan Manuel Macías
Hi Ypo,

Ypo writes:

> This is my result after doing "org-sort-list a":
>
>   - /a/
>   - /v/
>   - a
>   - b
>
> Regards

Thanks for trying. So it seems that you can reproduce the problem as
well... I wonder if anyone else is able to reproduce it, preferably in a
minimal emacs.

Best regards,

Juan Manuel 



Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-04-12 Thread Ypo

This is my result after doing "org-sort-list a":

  - /a/
  - /v/
  - a
  - b

Regards


Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-04-12 Thread Juan Manuel Macías
Nicolas Goaziou writes:

> I cannot reproduce it. With your initial list, and a minimal init file,
> I get:
>
>   - /a/
>   - a
>   - b
>   - /v/
>
> Could you try with a minimal Emacs, too?

That's weird ... I have tried launching emacs -q in a virtual machine,
and I keep getting the wrong result (git version, master branch):

- /a/
- /v/
- a
- b

I have tried with the typical keyboard shortcut and also with M-: and
evaluating (org-sort-list t? a nil nil nil)

Best regards,

Juan Manuel 



Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-04-10 Thread Juan Manuel Macías
Hello Nicolas,

Nicolas Goaziou writes:

> Do you have a simple test case to reproduce the problem? Currently
> sorting the following trivial lists causes no issue:
>
> - b
> - *a*
>
> and
>
> - *b*
> - a

Consider this (unordered) list:

- a
- b
- /v/
- /a/

The current result is wrong:

- /a/
- /v/
- a
- b

With the spaces removed in `org-sort-remove-invisible', this would be the
expected result[1]:

- a
- /a/
- b
- /v/

[1] According to the National Information Standards Organization:
"Different typefaces (italic, boldface, blackletter, etc.) do not affect
the arrangement of letters." (see p. 3 on:
https://www.niso.org/sites/default/files/2017-08/tr03.pdf)

Best regards,

Juan Manuel 



Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-04-10 Thread Nicolas Goaziou
Hello,

Juan Manuel Macías  writes:

> Done! I've attached the corrected patch. Sorry for the flaws in me
> previous patch: I'm a bit of a novice at submitting patches...

No problem. Thank you.

Do you have a simple test case to reproduce the problem? Currently
sorting the following trivial lists causes no issue:

- b
- *a*

and 

- *b*
- a

Regards,
-- 
Nicolas Goaziou



Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-04-09 Thread Juan Manuel Macías
Hellow Nicolas:

Nicolas Goaziou writes:

> Thank you.
>
> Could you apply the same fix to the `org-verbatim-re' match above, and
> provide an appropriate commit message?

Done! I've attached the corrected patch. Sorry for the flaws in me
previous patch: I'm a bit of a novice at submitting patches...

Best regards,

Juan Manuel

>From ab104bb079e3e32d622a6ff53824b5047dc25c14 Mon Sep 17 00:00:00 2001
From: Juan Manuel Macias 
Date: Fri, 2 Apr 2021 21:20:17 +0200
Subject: [PATCH] Remove spurious spaces in org-sort-remove-invisible

Some spurious spaces in org-sort-remove-invisible is causing
org-sort-list not to sort correctly (alphabetically) the items that
contain emphasis marks in a plain list
---
 lisp/org.el | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lisp/org.el b/lisp/org.el
index 675a614e2..74e2afac9 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -8113,9 +8113,9 @@ Optional argument WITH-CASE means sort case-sensitively."
   "Remove invisible part of links and emphasis markers from string S."
   (remove-text-properties 0 (length s) org-rm-props s)
   (replace-regexp-in-string
-   org-verbatim-re (lambda (m) (format "%s " (match-string 4 m)))
+   org-verbatim-re (lambda (m) (format "%s" (match-string 4 m)))
(replace-regexp-in-string
-org-emph-re (lambda (m) (format " %s " (match-string 4 m)))
+org-emph-re (lambda (m) (format "%s" (match-string 4 m)))
 (org-link-display-format s)
 t t) t t))
 
-- 
2.26.0



Re: [Patch] to correctly sort the items with emphasis marks in a list

2021-04-09 Thread Nicolas Goaziou
Hello,

Juan Manuel Macías  writes:

> I have noticed that a couple of (spurious?) spaces in a `format'
> expression of `org-sort-remove-invisible' is causing `org-sort-list' not
> to sort correctly (alphabetically) the items that contain emphasis marks
> in a plain list. I propose this very simple patch to fix that problem.

Thank you.

Could you apply the same fix to the `org-verbatim-re' match above, and
provide an appropriate commit message?

Regards,
-- 
Nicolas Goaziou



[Patch] to correctly sort the items with emphasis marks in a list

2021-04-02 Thread Juan Manuel Macías
Hi all,

I have noticed that a couple of (spurious?) spaces in a `format'
expression of `org-sort-remove-invisible' is causing `org-sort-list' not
to sort correctly (alphabetically) the items that contain emphasis marks
in a plain list. I propose this very simple patch to fix that problem.

Best regards,

Juan Manuel 

diff --git a/lisp/org.el b/lisp/org.el
index 04da1afcd..d10cc2f5c 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -8114,7 +8114,7 @@ Optional argument WITH-CASE means sort case-sensitively."
   (replace-regexp-in-string
org-verbatim-re (lambda (m) (format "%s " (match-string 4 m)))
(replace-regexp-in-string
-org-emph-re (lambda (m) (format " %s " (match-string 4 m)))
+org-emph-re (lambda (m) (format "%s" (match-string 4 m)))
 (org-link-display-format s)
 t t) t t))