Re: [Patch] to correctly sort the items with emphasis marks in a list
On 25/11/2021 19:11, Ihor Radchenko wrote: Maxim Nikulin writes: I think, new variant should be committed even it does not fix Juan's case. He have not confirmed the fix yet. The patch does apply onto main, but I have a comment. + ;; Emphasis of second word. + ;; Locales with significant spaces (C, es_ES, pl_PL) I suspect that most of maintainers simply use C/en_US locales and we may not be able to catch test errors in foreign locales. Do you know a good way to automate testing with different locales? Can it be somehow automated? Otherwise, the patch looks good for me. For this test locale is fixed to "C", see below. Concerning tests of multiple locales, I have no idea besides a virtual machine (qemu, VirtualBox, etc.) or container (docker, LXC, etc.) with enough enabled and generated locales. I think, more locales may be enabled on SourceHut as well. https://list.orgmode.org/s864tq$jet$1...@ciao.gmane.io/ On 21/05/2021 00:06, Maxim Nikulin wrote:> In the following subthread Nicolas mentioned that some of them could fail https://orgmode.org/list/87r1j6b6ku@nicolasgoaziou.fr I do not see any reason for failure. I just have tried C.UTF-8, en_US.UTF-8, es_ES.UTF-8, and ru_RU.UTF-8 locales (interactively) and do not see any problem. This set of locales has 3 different collation rules, however I do not think it matters for tests. On 20/04/2021 19:37, Maxim Nikulin wrote: On 20/04/2021 00:50, Nicolas Goaziou wrote: Maxim Nikulin writes: On 19/04/2021 23:08, Nicolas Goaziou wrote: + ;; Space role in sorting. + ;; Test would fail for locales with ignored space, e.g. en_US, it works + ;; in C and currently rare locales having significant space (es_ES, pl_PL) + (should + (equal "- Time stamp\n- Timer\n" + (org-test-with-temp-text "- Timer\n- Time stamp\n" +(org-sort-list t ?a) +(buffer-string)) Since this test is bound to fail for some developers, I assume it shouldn't be included. Locale "C" is forced for this group of tests. Sorry, I don't understand. There is no locale change around this test, so it will fail, for example, for me. I wouldn't want to get a noisy failure each time I run tests. Certainly flaky tests must be avoided. However I can not identify the source of confusion (yours or mine). There is redefinition of `string-collate-lessp' to run tests with "C" locale: https://code.orgmode.org/bzg/org-mode/src/master/testing/lisp/test-org-list.el#L1207 And it works for me
Re: [Patch] to correctly sort the items with emphasis marks in a list
Maxim Nikulin writes: > I think, new variant should be committed even it does not fix Juan's > case. He have not confirmed the fix yet. The patch does apply onto main, but I have a comment. > + ;; Emphasis of second word. > + ;; Locales with significant spaces (C, es_ES, pl_PL) I suspect that most of maintainers simply use C/en_US locales and we may not be able to catch test errors in foreign locales. Do you know a good way to automate testing with different locales? Can it be somehow automated? Otherwise, the patch looks good for me. Best, Ihor
Re: [Patch] to correctly sort the items with emphasis marks in a list
Hi Nicolas, Nicolas Goaziou writes: >> Is this okay to install this in the maint branch? > > I didn't test it, but it seems to fix the issues reported. So I guess > this is fine. Done, thanks! -- Bastien
Re: [Patch] to correctly sort the items with emphasis marks in a list
Hello, Bastien writes: > Hi Nicolas, > > Nicolas Goaziou writes: > >> Ah! I forgot the link part! Hopefully done here. >> >> (defun org-sort-remove-invisible (s) >> "Remove emphasis markers and any invisible property from string S. >> Assume S may contain only objects." >> ;; org-element-interpret-data clears any text property, including >> ;; invisible part. > > Is this okay to install this in the maint branch? I didn't test it, but it seems to fix the issues reported. So I guess this is fine. Regards, -- Nicolas Goaziou
Re: [Patch] to correctly sort the items with emphasis marks in a list
Maxim Nikulin writes: > Bastien, the following message in this thread contains alternative > draft from Nicolas: > https://orgmode.org/list/87a6pt9hyd@nicolasgoaziou.fr/ > > It is better than initial patches from Juan Manuel. It fixes the > problem with removing characters around emphasized text and prepending > a space instead. As a result it fixes sorting items in es_ES locale. Thanks, I've followed-up on the other thread. -- Bastien
Re: [Patch] to correctly sort the items with emphasis marks in a list
Hi Nicolas, Nicolas Goaziou writes: > Ah! I forgot the link part! Hopefully done here. > > (defun org-sort-remove-invisible (s) > "Remove emphasis markers and any invisible property from string S. > Assume S may contain only objects." > ;; org-element-interpret-data clears any text property, including > ;; invisible part. Is this okay to install this in the maint branch? Thanks! -- Bastien
Re: [Patch] to correctly sort the items with emphasis marks in a list
On 15/05/2021 20:32, Bastien wrote: Bastien writes: Bastien writes: Thanks for the heads-up. I reverted the commit. Please go ahead with whatever you see fit. PS: Re-opening this work-in-progress patch for updates.orgmode.org. I'm closing this now -- Juan, feel free to resubmit a patch if you are still working on this. Bastien, the following message in this thread contains alternative draft from Nicolas: https://orgmode.org/list/87a6pt9hyd@nicolasgoaziou.fr/ It is better than initial patches from Juan Manuel. It fixes the problem with removing characters around emphasized text and prepending a space instead. As a result it fixes sorting items in es_ES locale.
Re: [Patch] to correctly sort the items with emphasis marks in a list
Bastien writes: > Bastien writes: > >> Thanks for the heads-up. I reverted the commit. Please go ahead with >> whatever you see fit. > > PS: Re-opening this work-in-progress patch for updates.orgmode.org. I'm closing this now -- Juan, feel free to resubmit a patch if you are still working on this. Thanks, -- Bastien
Re: [Patch] to correctly sort the items with emphasis marks in a list
Bastien writes: > Thanks for the heads-up. I reverted the commit. Please go ahead with > whatever you see fit. PS: Re-opening this work-in-progress patch for updates.orgmode.org. -- Bastien
Re: [Patch] to correctly sort the items with emphasis marks in a list
Nicolas Goaziou writes: > Unfortunately, this is not a proper fix for the problem, as discussed in > the thread. Thanks for the heads-up. I reverted the commit. Please go ahead with whatever you see fit. -- Bastien
Re: [Patch] to correctly sort the items with emphasis marks in a list
Hello, Bastien writes: > Juan Manuel Macías writes: > >> Done! I've attached the corrected patch. Sorry for the flaws in me >> previous patch: I'm a bit of a novice at submitting patches... > > applied in the maint branch as commit 5be650714. Unfortunately, this is not a proper fix for the problem, as discussed in the thread. Regards, -- Nicolas Goaziou
Re: [Patch] to correctly sort the items with emphasis marks in a list
Hi Juan, Juan Manuel Macías writes: > Done! I've attached the corrected patch. Sorry for the flaws in me > previous patch: I'm a bit of a novice at submitting patches... applied in the maint branch as commit 5be650714. The patch was missing a proper changelog entry: please see https://orgmode.org/worg/org-contribute.html#commit-messages for future patches. Thanks!
Re: [Patch] to correctly sort the items with emphasis marks in a list
On 21/04/2021 22:45, Juan Manuel Macías wrote: I have tried the Nicolas' patch (latest version) and I see that the items with emphasis are already ordered well. However, it seems that the problem with identical items with or without emphasis still persists: which items should go before and in what order? For example, in the following list I get: - /a/ - *a* - a - *b* - /b/ - b - /v/ - *v* - v I am afraid, there is no easy way to take into account emphasis. Each item have to be split into logical units and locale-aware multilevel comparison should be applied to each unit separately. E.g. for description list, only term should be compared at first to properly order emphasized items, it does not matter if description starts from "a" or from "z". Simple `string-collate-lessp' for whole item uses further levels only strings are considered identical on previous levels. I had an idea to augment sort keys with some text properties for custom string comparator, but I decided that such complications would not ensure reliable sort for all possible cases. - A :: B - /A/ :: C - *A* :: A However `org-sort-remove invisible' still has some room for improvements (it is not mandatory in my opinion). I have realized it reading the thread on title representation for HTML export https://orgmode.org/list/87h7jy4ebe@nicolasgoaziou.fr/ - <>A - <<>> - B
Re: [Patch] to correctly sort the items with emphasis marks in a list
Hi all, Maxim Nikulin writes: > I think, new variant should be committed even it does not fix Juan's > case. He have not confirmed the fix yet. Sorry, I have been busy with other matters and had lost the thread of the discussion. I'm catching up on the messages... I have tried the Nicolas' patch (latest version) and I see that the items with emphasis are already ordered well. However, it seems that the problem with identical items with or without emphasis still persists: which items should go before and in what order? For example, in the following list I get: - /a/ - *a* - a - *b* - /b/ - b - /v/ - *v* - v Anyway, I think it is a minor problem (I can't think of many scenarios in real life where this problem is relevant). In a more realistic case, the result is correct: - lo /bueno/ - lo bueno - lo /vano/ - lo vano ... - /a/ - b - /v/ - *z* Best regards, Juan Manuel
Re: [Patch] to correctly sort the items with emphasis marks in a list
On 21/04/2021 03:37, Nicolas Goaziou wrote: Maxim Nikulin writes: (let ((s (org-sort-remove-invisible "A wrapping [[https://orgmode.org?a=b&c=d#e][link]] emphasis/" I expect "A wrapping link emphasis". Yet, your expectations are wrong. There is no link in the text above. Emphasized text starts at "/wrapping" and ends before "/?". Granted, this is a situation where the Org syntax is not very intuitive. Anyway, the new function is more accurate. I think, new variant should be committed even it does not fix Juan's case. He have not confirmed the fix yet. Maybe we should require a space after punctuation following emphasized text. I don't know. This is orthogonal to the current discussion. I still believe in my expectation, however I admit such limitation of parser. At first I have not recognized that the issue may be similar to https://orgmode.org/list/CAH+Wm4-_XHUZKFTf=ztbfncpvqwkbeoegs8epym+8spmu8l...@mail.gmail.com/ Anyway for my example workaround is to add more markers before, inside, and after the link. Maybe I will look closer at Tom's parser if it solves ambiguity in the same way. In the meanwhile I have tried (benchmark-run 1 (org-sort-list t ?a)) in a file (1100 lines) obtained using I don't think performance is really an issue. Of course, the suggested function is clearly slower than the current one. It is OK since difference is not really huge, especially taking into account that new variant was not compiled. Do you still have problem with locale dependency of added tests? I can not guess what could be its source and expect that test should work reliably. Disregard "/3" in the subject of the patches. Third change is your code. >From c2c46d133e80ffa2323618ac88a1902e63ba1efc Mon Sep 17 00:00:00 2001 From: Max Nikulin Date: Mon, 19 Apr 2021 19:06:36 +0700 Subject: [PATCH 1/3] More tests for `org-sort-list' * lisp/org.el (org-verbatim-re): Add to the docstring a statement concerning subgroups meaning. * testing/lisp/test-org-list.el (test-org-list/sort): Add cases with text emphasis. Stress in comments that it is significant whether locale-specific collation rules ignores spaces. * testing/lisp/test-org.el (test-org/org-sort-remove-invisible): Add tests for `org-sort-list' helper. --- lisp/org.el | 3 ++- testing/lisp/test-org-list.el | 34 +- testing/lisp/test-org.el | 25 + 3 files changed, 60 insertions(+), 2 deletions(-) diff --git a/lisp/org.el b/lisp/org.el index c2738100f..3d4f5553a 100644 --- a/lisp/org.el +++ b/lisp/org.el @@ -3685,7 +3685,8 @@ After a match, the match groups contain these elements: 5 The character after the match, empty at the end of a line") (defvar org-verbatim-re nil - "Regular expression for matching verbatim text.") + "Regular expression for matching verbatim text. +Groups 1-5 have the same meaning as in `org-emph-re' pattern.") (defvar org-emphasis-regexp-components) ; defined just below (defvar org-emphasis-alist) ; defined just below diff --git a/testing/lisp/test-org-list.el b/testing/lisp/test-org-list.el index 3689a172f..cc7914c8d 100644 --- a/testing/lisp/test-org-list.el +++ b/testing/lisp/test-org-list.el @@ -1225,7 +1225,39 @@ b. Item 2" (equal "- b\n- a\n- C\n" (org-test-with-temp-text "- b\n- C\n- a\n" (org-sort-list t ?A) - (buffer-string)) + (buffer-string + ;; Emphasis in the beginning. + (should + (equal "- a\n- /z/\n" + (org-test-with-temp-text "- /z/\n- a\n" +(org-sort-list t ?a) +(buffer-string + (should + (equal "- *a*\n- z\n" + (org-test-with-temp-text "- z\n- *a*\n" +(org-sort-list t ?a) +(buffer-string + ;; Emphasis of second word. + ;; Locales with significant spaces (C, es_ES, pl_PL) + ;; are more sensitive to problems with sort. + (should + (equal "- a b\n- a /z/\n" + (org-test-with-temp-text "- a /z/\n- a b\n" +(org-sort-list t ?a) +(buffer-string + (should + (equal "- a *b*\n- a z\n" + (org-test-with-temp-text "- a z\n- a *b*\n" +(org-sort-list t ?a) +(buffer-string + ;; Space role in sorting. + ;; Test would fail for locales with ignored space, e.g. en_US, it works + ;; in C and currently rare locales having significant space (es_ES, pl_PL) + (should + (equal "- Time stamp\n- Timer\n" + (org-test-with-temp-text "- Timer\n- Time stamp\n" +(org-sort-list t ?a) +(buffer-string)) ;; Sort numerically. (should (equal "- 1\n- 2\n- 10\n" diff --git a/testing/lisp/test-org.el b/testing/lisp/test-org.el index f6fb4b3ca..3f74b3f35 100644 --- a/testing/lisp/test-org.el +++ b/testing/lisp/test-org.el @@ -8262,4 +8262,29 @@ two (provide 'test-org) +(
Re: [Patch] to correctly sort the items with emphasis marks in a list
Hello, Maxim Nikulin writes: > Surprisingly there are still cases when the old approach works better: > > (let ((s (org-sort-remove-invisible > "A /wrapping [[https://orgmode.org/?a=b&c=d#e][link]] emphasis/"))) > (set-text-properties 0 (length s) nil s) > s) > "A wrapping [[https://orgmode.org?a=b&c=d#e][link]] emphasis/" > > I expect "A wrapping link emphasis". Yet, your expectations are wrong. There is no link in the text above. Emphasized text starts at "/wrapping" and ends before "/?". Granted, this is a situation where the Org syntax is not very intuitive. Anyway, the new function is more accurate. Maybe we should require a space after punctuation following emphasized text. I don't know. This is orthogonal to the current discussion. > In the meanwhile I have tried > > (benchmark-run 1 (org-sort-list t ?a)) > > in a file (1100 lines) obtained using I don't think performance is really an issue. Of course, the suggested function is clearly slower than the current one. Regards, -- Nicolas Goaziou
Re: [Patch] to correctly sort the items with emphasis marks in a list
On 20/04/2021 20:57, Nicolas Goaziou wrote: Maxim Nikulin writes: (org-sort-remove-invisible "A") #("A" 0 1 (:parent (#("A" 0 1 ... This is a string. Thank you, from second attempt I have managed to strip text properties. Since the intended usage of return value is sorting key, would not it benefit from passing result through the following expression? (set-text-properties 0 (length s) nil s) An alternative is to clean up keys in `org-sort-list' function. Ah! I forgot the link part! Hopefully done here. Surprisingly there are still cases when the old approach works better: (let ((s (org-sort-remove-invisible "A /wrapping [[https://orgmode.org/?a=b&c=d#e][link]] emphasis/"))) (set-text-properties 0 (length s) nil s) s) "A wrapping [[https://orgmode.org?a=b&c=d#e][link]] emphasis/" I expect "A wrapping link emphasis". In the meanwhile I have tried (benchmark-run 1 (org-sort-list t ?a)) in a file (1100 lines) obtained using grep '^- ' doc/org-manual.org >/tmp/list.org It seems, performance is still acceptable (single run hardly could be considered as an accurate test): (1.115571472 18 0.598646606999) ; new variant (0.260384514 1 0.0980547519947) ; original code --8<---cut here---start->8--- (defun org-sort-remove-invisible (s) "Remove emphasis markers and any invisible property from string S. Assume S may contain only objects." ;; org-element-interpret-data clears any text property, including ;; invisible part. (org-element-interpret-data (let ((tree (org-element-parse-secondary-string s (org-element-restriction 'paragraph (org-element-map tree '(bold code italic link strike-through underline verbatim) (lambda (o) (pcase (org-element-type o) ;; Terminal object. Replace it with its value. ((or `code `verbatim) (let ((new (org-element-property :value o))) (org-element-insert-before new o) (org-element-put-property new :post-blank (org-element-property :post-blank o ;; Non-terminal objects. Splice contents. (type (let ((contents (or (org-element-contents o) (and (eq type 'link) (list (org-element-property :raw-link o) (c nil)) (while contents (setq c (pop contents)) (org-element-insert-before c o)) (org-element-put-property c :post-blank (org-element-property :post-blank o) (org-element-extract-element o))) ;; Return modified tree. tree))) --8<---cut here---end--->8---
Re: [Patch] to correctly sort the items with emphasis marks in a list
Hello, Maxim Nikulin writes: > Maybe first variant deserves to be committed while discussion of a > better option is in progress. I'd rather not, since we're currently considering a somewhat different path. The problem has been there for ages anyway. > I can not even determine what type of structure is returned when > `org-sort-remove-invisible' is called from ert or from scratch buffer: > > (org-sort-remove-invisible "A") > #("A" 0 1 (:parent (#("A" 0 1 ... This is a string. > A couple of obvious problems: > > 1. Link handling > > #+begin_src elisp > (org-sort-remove-invisible > "- [[https://orgmode.org/submit?a=bc%20d&e=t+r#1234][a link]]") > #+end_src > > > #+RESULTS: > : - [[https://orgmode.org/submit?a=bc%20d&e=t+r#1234][a link]] Ah! I forgot the link part! Hopefully done here. --8<---cut here---start->8--- (defun org-sort-remove-invisible (s) "Remove emphasis markers and any invisible property from string S. Assume S may contain only objects." ;; org-element-interpret-data clears any text property, including ;; invisible part. (org-element-interpret-data (let ((tree (org-element-parse-secondary-string s (org-element-restriction 'paragraph (org-element-map tree '(bold code italic link strike-through underline verbatim) (lambda (o) (pcase (org-element-type o) ;; Terminal object. Replace it with its value. ((or `code `verbatim) (let ((new (org-element-property :value o))) (org-element-insert-before new o) (org-element-put-property new :post-blank (org-element-property :post-blank o ;; Non-terminal objects. Splice contents. (type (let ((contents (or (org-element-contents o) (and (eq type 'link) (list (org-element-property :raw-link o) (c nil)) (while contents (setq c (pop contents)) (org-element-insert-before c o)) (org-element-put-property c :post-blank (org-element-property :post-blank o) (org-element-extract-element o))) ;; Return modified tree. tree))) --8<---cut here---end--->8--- > 2. Missed spaces > > #+begin_src elisp > (org-sort-remove-invisible "A *b* /i/ t.") > #+end_src > > #+RESULTS: > : A bit. You need to update Org. I added a fix for that in "org-element" yesterday. Regards, -- Nicolas Goaziou
Re: [Patch] to correctly sort the items with emphasis marks in a list
On 20/04/2021 00:50, Nicolas Goaziou wrote: Maxim Nikulin writes: On 19/04/2021 23:08, Nicolas Goaziou wrote: + ;; Space role in sorting. + ;; Test would fail for locales with ignored space, e.g. en_US, it works + ;; in C and currently rare locales having significant space (es_ES, pl_PL) + (should + (equal "- Time stamp\n- Timer\n" + (org-test-with-temp-text "- Timer\n- Time stamp\n" +(org-sort-list t ?a) +(buffer-string)) Since this test is bound to fail for some developers, I assume it shouldn't be included. Locale "C" is forced for this group of tests. Sorry, I don't understand. There is no locale change around this test, so it will fail, for example, for me. I wouldn't want to get a noisy failure each time I run tests. Certainly flaky tests must be avoided. However I can not identify the source of confusion (yours or mine). There is redefinition of `string-collate-lessp' to run tests with "C" locale: https://code.orgmode.org/bzg/org-mode/src/master/testing/lisp/test-org-list.el#L1207 And it works for me #+begin_src elisp (let ((original-string-collate-lessp (symbol-function #'string-collate-lessp))) (cl-letf (((symbol-function 'string-collate-lessp) (lambda (s1 s2 &optional locale ignore-case) (funcall original-string-collate-lessp s1 s2 "C" ignore-case (org-test-with-temp-text "- Timer\n- Time stamp\n" (org-sort-list t ?a) (buffer-string #+end_src #+RESULTS: : - Time stamp : - Timer #+begin_src elisp (let ((original-string-collate-lessp (symbol-function #'string-collate-lessp))) (cl-letf (((symbol-function 'string-collate-lessp) (lambda (s1 s2 &optional locale ignore-case) (funcall original-string-collate-lessp s1 s2 "en_US.UTF-8" ignore-case (org-test-with-temp-text "- Timer\n- Time stamp\n" (org-sort-list t ?a) (buffer-string #+end_src #+RESULTS: : - Timer : - Time stamp glibc-2.31 (source for locales package), Ubuntu-20.04 focal, emacs-26.3 Could you, please, show result of the following command in your environment (for a chance that "C" locale definition has changed): printf 'Timestamp\nTime zone\n' \ | LC_ALL= LC_COLLATE= LANG=C.UTF-8 LANGUAGE=en sort --debug sort: using ‘C.UTF-8’ sorting rules Time zone _ Timestamp _
Re: [Patch] to correctly sort the items with emphasis marks in a list
Hi, On 19/04/2021 23:08, Nicolas Goaziou wrote: In my opinion, a more severe limitation comes from sequential regexp-based approach. Consider stripping markers from 1. "a =b *c* d= e" 2. "*b* /i/" Fair enough. Here comes another, more involved, attempt. Maybe first variant deserves to be committed while discussion of a better option is in progress. --8<---cut here---start->8--- (defun org-sort-remove-invisible (s) "Remove emphasis markers and any invisible property from string S. Assume S may contain only objects." ;; org-element-interpret-data clears any text property, including ;; invisible part. (org-element-interpret-data Sorry, I can not help you with polishing code of this function, I am not familiar with functions working on org element tree yet. I can not even determine what type of structure is returned when `org-sort-remove-invisible' is called from ert or from scratch buffer: (org-sort-remove-invisible "A") #("A" 0 1 (:parent (#("A" 0 1 ... A couple of obvious problems: 1. Link handling #+begin_src elisp (org-sort-remove-invisible "- [[https://orgmode.org/submit?a=bc%20d&e=t+r#1234][a link]]") #+end_src #+RESULTS: : - [[https://orgmode.org/submit?a=bc%20d&e=t+r#1234][a link]] 2. Missed spaces #+begin_src elisp (org-sort-remove-invisible "A *b* /i/ t.") #+end_src #+RESULTS: : A bit. (let ((tree (org-element-parse-secondary-string s (org-element-restriction 'paragraph (org-element-map tree '(bold code italic strike-through underline verbatim) (lambda (o) (pcase (org-element-type o) ;; Terminal object. Replace it with its value. ((or `code `verbatim) (let ((new (org-element-property :value o))) (org-element-insert-before new o) (org-element-put-property new :post-blank (org-element-property :post-blank o ;; Non-terminal objects. Splice contents. (_ (let ((contents (org-element-contents o)) (c nil)) (while contents (setq c (pop contents)) (org-element-insert-before c o)) (org-element-put-property c :post-blank (org-element-property :post-blank o) (org-element-extract-element o))) ;; Return modified tree. tree))) --8<---cut here---end--->8--- It is not perfect, but it does a better job. WDYT?
Re: [Patch] to correctly sort the items with emphasis marks in a list
Tom, thanks! i assumed something like that.
Re: [Patch] to correctly sort the items with emphasis marks in a list
Maxim Nikulin writes: > On 19/04/2021 23:08, Nicolas Goaziou wrote: >>> + ;; Space role in sorting. >>> + ;; Test would fail for locales with ignored space, e.g. en_US, it >>> works >>> + ;; in C and currently rare locales having significant space (es_ES, >>> pl_PL) >>> + (should >>> + (equal "- Time stamp\n- Timer\n" >>> + (org-test-with-temp-text "- Timer\n- Time stamp\n" >>> +(org-sort-list t ?a) >>> +(buffer-string)) >> Since this test is bound to fail for some developers, I assume it >> shouldn't be included. > > Locale "C" is forced for this group of tests. Sorry, I don't understand. There is no locale change around this test, so it will fail, for example, for me. I wouldn't want to get a noisy failure each time I run tests. Regards,
Re: [Patch] to correctly sort the items with emphasis marks in a list
On 19/04/2021 23:08, Nicolas Goaziou wrote: + ;; Space role in sorting. + ;; Test would fail for locales with ignored space, e.g. en_US, it works + ;; in C and currently rare locales having significant space (es_ES, pl_PL) + (should + (equal "- Time stamp\n- Timer\n" + (org-test-with-temp-text "- Timer\n- Time stamp\n" +(org-sort-list t ?a) +(buffer-string)) Since this test is bound to fail for some developers, I assume it shouldn't be included. Locale "C" is forced for this group of tests. I have added the test to catch an attempt to change locale since it would affect other cases. I hope, the comment might be useful for those who play with sorting having a regular locale instead of fallback "C".
Re: [Patch] to correctly sort the items with emphasis marks in a list
Hi Greg, seq cannot be used because it is not available in older versions of emacs that org still supports. When support for those older versions is dropped then seq could be used. Best, Tom
Re: [Patch] to correctly sort the items with emphasis marks in a list
hi, Nicolas, i'm curious, not knowing history and/or procedures. > ... CL is still necessary however, as we cannot use `seq' yet. why is 'seq not "yet" available? what will make it available? cheers, Greg
Re: [Patch] to correctly sort the items with emphasis marks in a list
Hello, Maxim Nikulin writes: > Just a curiosity, what is style guide recommendation: let - lambda or > cl-labels? I stay away from CL as much as possible, otherwise newcomers will have to learn two languages to start contributing, Elisp and CL (cl-loop, ewww). CL is still necessary however, as we cannot use `seq' yet. Also, there is `letrec' instead of `cl-labels'. > In my opinion, a more severe limitation comes from sequential > regexp-based approach. Consider stripping markers from > 1. "a =b *c* d= e" > 2. "*b* /i/" Fair enough. Here comes another, more involved, attempt. --8<---cut here---start->8--- (defun org-sort-remove-invisible (s) "Remove emphasis markers and any invisible property from string S. Assume S may contain only objects." ;; org-element-interpret-data clears any text property, including ;; invisible part. (org-element-interpret-data (let ((tree (org-element-parse-secondary-string s (org-element-restriction 'paragraph (org-element-map tree '(bold code italic strike-through underline verbatim) (lambda (o) (pcase (org-element-type o) ;; Terminal object. Replace it with its value. ((or `code `verbatim) (let ((new (org-element-property :value o))) (org-element-insert-before new o) (org-element-put-property new :post-blank (org-element-property :post-blank o ;; Non-terminal objects. Splice contents. (_ (let ((contents (org-element-contents o)) (c nil)) (while contents (setq c (pop contents)) (org-element-insert-before c o)) (org-element-put-property c :post-blank (org-element-property :post-blank o) (org-element-extract-element o))) ;; Return modified tree. tree))) --8<---cut here---end--->8--- It is not perfect, but it does a better job. WDYT? > + ;; Space role in sorting. > + ;; Test would fail for locales with ignored space, e.g. en_US, it works > + ;; in C and currently rare locales having significant space (es_ES, > pl_PL) > + (should > + (equal "- Time stamp\n- Timer\n" > + (org-test-with-temp-text "- Timer\n- Time stamp\n" > +(org-sort-list t ?a) > +(buffer-string)) Since this test is bound to fail for some developers, I assume it shouldn't be included. > + (dolist (case '(("Lost =in *verbatim* text= fragment" . > + "Lost in *verbatim* text fragment") > + ("Adjucent emphasis: *Overlapped* /regexps/" . > + "Adjucent emphasis: Ovelapped regexps"))) typo Regards, -- Nicolas Goaziou
Re: [Patch] to correctly sort the items with emphasis marks in a list
On 19/04/2021 15:33, Nicolas Goaziou wrote: Could you try the following instead? --8<---cut here---start->8--- (defun org-sort-remove-invisible (s) "Remove invisible part of links and emphasis markers from string S." (let ((remove-markers (lambda (m) Just a curiosity, what is style guide recommendation: let - lambda or cl-labels? (concat (match-string 1 m) (match-string 4 m) (match-string 5 m) (remove-text-properties 0 (length s) org-rm-props s) (replace-regexp-in-string org-verbatim-re remove-markers (replace-regexp-in-string org-emph-re remove-markers (org-link-display-format s) t tt) Single "t" is at the end, isn't it? t t))) I think, the patch is an improvement. There is still a minor shortcoming that ordering of emphasized and non-emphasized words is undefined - /a/ - *a* - a Let's leave it aside since it requires multilevel comparison similar to collation in locales and more strict definition which part of string is compared at each level: - /a/ :: A - /a/ :: Z - a :: A - a :: Z In my opinion, a more severe limitation comes from sequential regexp-based approach. Consider stripping markers from 1. "a =b *c* d= e" 2. "*b* /i/" First case could be solved by splitting the input string by verbatim regexp at first and by applying emphasis substitution only to non-verbatim parts. However I am rather shy to experiment with regexps definition to avoid including chars before and after emphasis markers. It would be great to get role of particular characters from more reliable parser (or at least from text properties affected by fontification). I have some tests for `org-sort-remove-invisible', see the attachment. Actually I do not like such style of tests, I strongly prefer to see all failures at once, but I have not found if such technique is used anywhere in org tests. >From 6719386aa5a95394a4cb98ce28b889f545620bd9 Mon Sep 17 00:00:00 2001 From: Max Nikulin Date: Mon, 19 Apr 2021 19:06:36 +0700 Subject: [PATCH] More tests for `org-sort-list' * lisp/org.el (org-verbatim-re): Add to the docstring a statement concerning subgroups meaning. * testing/lisp/test-org-list.el (test-org-list/sort): Add cases with text emphasis. Stress in comments that it is significant whether locale-specific collation rules ignores spaces. * testing/lisp/test-org.el (test-org/org-sort-remove-invisible): Add tests for `org-sort-list' helper. Add a test with expected failures to make apparent limitation of simple regexp-based approach in `org-sort-remove-invisible' function. --- lisp/org.el | 3 ++- testing/lisp/test-org-list.el | 34 +- testing/lisp/test-org.el | 27 +++ 3 files changed, 62 insertions(+), 2 deletions(-) diff --git a/lisp/org.el b/lisp/org.el index c2738100f..3d4f5553a 100644 --- a/lisp/org.el +++ b/lisp/org.el @@ -3685,7 +3685,8 @@ After a match, the match groups contain these elements: 5 The character after the match, empty at the end of a line") (defvar org-verbatim-re nil - "Regular expression for matching verbatim text.") + "Regular expression for matching verbatim text. +Groups 1-5 have the same meaning as in `org-emph-re' pattern.") (defvar org-emphasis-regexp-components) ; defined just below (defvar org-emphasis-alist) ; defined just below diff --git a/testing/lisp/test-org-list.el b/testing/lisp/test-org-list.el index 3689a172f..cc7914c8d 100644 --- a/testing/lisp/test-org-list.el +++ b/testing/lisp/test-org-list.el @@ -1225,7 +1225,39 @@ b. Item 2" (equal "- b\n- a\n- C\n" (org-test-with-temp-text "- b\n- C\n- a\n" (org-sort-list t ?A) - (buffer-string)) + (buffer-string + ;; Emphasis in the beginning. + (should + (equal "- a\n- /z/\n" + (org-test-with-temp-text "- /z/\n- a\n" +(org-sort-list t ?a) +(buffer-string + (should + (equal "- *a*\n- z\n" + (org-test-with-temp-text "- z\n- *a*\n" +(org-sort-list t ?a) +(buffer-string + ;; Emphasis of second word. + ;; Locales with significant spaces (C, es_ES, pl_PL) + ;; are more sensitive to problems with sort. + (should + (equal "- a b\n- a /z/\n" + (org-test-with-temp-text "- a /z/\n- a b\n" +(org-sort-list t ?a) +(buffer-string + (should + (equal "- a *b*\n- a z\n" + (org-test-with-temp-text "- a z\n- a *b*\n" +(org-sort-list t ?a) +(buffer-string + ;; Space role in sorting. + ;; Test would fail for locales with ignored space, e.g. en_US, it works + ;; in C and currently rare locales having significant space (es_ES, pl_PL) + (should + (equal "- Time stamp\n- Timer\n" + (org-te
Re: [Patch] to correctly sort the items with emphasis marks in a list
Hello, Juan Manuel Macías writes: > I wonder if this other approach can be viable or if it is something > crazy: if the spaces in org-sort-remove-invisible are a > problem only for the first emphasis of each item, how about this > fix to org-sort-list? (not modifying org-sort-remove-invisible): I think `org-sort-remove-invisible' is wrong, so it is the one that needs to be fixed. Could you try the following instead? --8<---cut here---start->8--- (defun org-sort-remove-invisible (s) "Remove invisible part of links and emphasis markers from string S." (let ((remove-markers (lambda (m) (concat (match-string 1 m) (match-string 4 m) (match-string 5 m) (remove-text-properties 0 (length s) org-rm-props s) (replace-regexp-in-string org-verbatim-re remove-markers (replace-regexp-in-string org-emph-re remove-markers (org-link-display-format s) t tt) t t))) --8<---cut here---end--->8--- Regards, -- Nicolas Goaziou
Re: [Patch] to correctly sort the items with emphasis marks in a list
Juan Manuel Macías writes: > It seems that what I was proposing as a patch at the beginning is not, > finally, a viable solution for all contexts... > > The problem is that, if the first space is removed, we get this > abnormal result: > > #+begin_src emacs-lisp > (org-sort-remove-invisible "- lo /bueno/") > #+end_src > > #+RESULTS: > : - lobueno I wonder if this other approach can be viable or if it is something crazy: if the spaces in org-sort-remove-invisible are a problem only for the first emphasis of each item, how about this fix to org-sort-list? (not modifying org-sort-remove-invisible): @@ -2940,10 +2940,20 @@ function is being called interactively." (org-sort-remove-invisible (buffer-substring (match-end 0) (point-at-eol) ((= dcst ?a) - (funcall case-func -(org-sort-remove-invisible - (buffer-substring - (match-end 0) (point-at-eol) + (if (save-excursion + (beginning-of-line) + (forward-char) + (looking-at-p org-emph-re)) + (replace-regexp-in-string +"\\(^\\)\s+" "\\1" +(funcall case-func + (org-sort-remove-invisible + (buffer-substring + (match-end 0) (point-at-eol) + (funcall case-func + (org-sort-remove-invisible + (buffer-substring +(match-end 0) (point-at-eol)) ((= dcst ?t) (cond ;; If it is a timer list, convert timer to seconds
Re: [Patch] to correctly sort the items with emphasis marks in a list
Hi Maxim, Maxim Nikulin writes: > On 10/04/2021 07:01, Juan Manuel Macías wrote: >> Nicolas Goaziou writes: >>> Could you apply the same fix to the `org-verbatim-re' match above, and >>> provide an appropriate commit message? >> Done! I've attached the corrected patch. Sorry for the flaws in me >> previous patch: I'm a bit of a novice at submitting patches... > > I have not read yet the following documents to realize what pitfalls > could be expected due to significant space added to Spanish and Polish > collation rules in libc. > > http://www.unicode.org/reports/tr10/ > Unicode® Technical Standard #10 > Unicode Collation Algorithm > > http://www.unicode.org/reports/tr35/tr35-collation.html#CLDR_Collation_Algorithm > Unicode Technical Standard #35 > Unicode Locale Data Markup Language (LDML) > Part 5: Collation > > In the meanwhile I have realized that org-sort-remove-invisible > function has rather strange behavior. There are no tests, so I am in > doubt if the following result is expected and intended (unpatched > version) All this research you've done on spaces and collation rules is very interesting and enlightening. Certainly, space is important, as the following list in Spanish is correctly ordered as: - lo raro - lo vano - localidad However, with the 'unpatched' version of org-sort-remove invisible I get this result (which is correct) for this list: - lo bueno - lo /bueno/ - lo vano - lo /vano/ And with the patched version, I get this, which is wrong (!): - lo bueno - lo vano - lo /bueno/ - lo /vano/ It seems that what I was proposing as a patch at the beginning is not, finally, a viable solution for all contexts... The problem is that, if the first space is removed, we get this abnormal result: #+begin_src emacs-lisp (org-sort-remove-invisible "- lo /bueno/") #+end_src #+RESULTS: : - lobueno Best regards, Juan Manuel
Re: [Patch] to correctly sort the items with emphasis marks in a list
On 10/04/2021 07:01, Juan Manuel Macías wrote: Nicolas Goaziou writes: Could you apply the same fix to the `org-verbatim-re' match above, and provide an appropriate commit message? Done! I've attached the corrected patch. Sorry for the flaws in me previous patch: I'm a bit of a novice at submitting patches... I have not read yet the following documents to realize what pitfalls could be expected due to significant space added to Spanish and Polish collation rules in libc. http://www.unicode.org/reports/tr10/ Unicode® Technical Standard #10 Unicode Collation Algorithm http://www.unicode.org/reports/tr35/tr35-collation.html#CLDR_Collation_Algorithm Unicode Technical Standard #35 Unicode Locale Data Markup Language (LDML) Part 5: Collation In the meanwhile I have realized that org-sort-remove-invisible function has rather strange behavior. There are no tests, so I am in doubt if the following result is expected and intended (unpatched version) (org-sort-remove-invisible "- (*bold*)") "- bold " I would expect "- (bold)" P.S. On 10/04/2021 17:19, Nicolas Goaziou wrote: > > Do you have a simple test case to reproduce the problem? Currently > sorting the following trivial lists causes no issue: After the lengthy discussion the problem is traced down to the following: - Ensure that you have *compiled* recent enough es_ES.UTF-8 locale (description includes https://sourceware.org/git/?p=glibc.git;a=blobdiff;f=localedata/locales/es_ES;h=aa919a26267fd6311b71d7aeb81655e55787b4df;hp=d17612f6726d0c098ac981e06f3702106540bb23;hb=159738548130d5ac4fe6178977e940ed5f8cfdc4;hpb=ce6636b06b67d6bb9b3d6927bf2a926b9b7478f5) - LC_ALL= LANG=es_ES.UTF-8 emacs - (org-sort-list t ?a) for the following snippet - /a/ - a - /v/ - v Significance of space should be better illustrated with multi-word items but I am not ready to provide an impressive example yet.
Re: [Patch] to correctly sort the items with emphasis marks in a list
On 16/04/2021 21:59, Maxim Nikulin wrote: Ukrainian sort works better than Russian one with such example: printf "Иванова Алла\nИванов Адам\nИванова Светлана\n" \ | LANG=uk_UA.UTF-8 sort Иванов Адам Иванова Алла Иванова Светлана printf "Иванова Алла\nИванов Адам\nИванова Светлана\n" \ | LANG=ru_RU.UTF-8 sort Иванова Алла Иванов Адам Иванова Светлана Sorry, I forgot to generate uk_UA locale so simple ordering were applied without ignoring of spaces in the shared part of locale definitions. printf "Иванова Алла\nИванов Адам\nИванова Светлана\n" \ | LANG=uk_UA.UTF-8 sort Иванова Алла Иванов Адам Иванова Светлана So only pl_PL and es_ES have significant spaces ;IGNORE;IGNORE; vs. IGNORE;IGNORE;IGNORE; % SPACE in iso14651_t1_common
Re: [Patch] to correctly sort the items with emphasis marks in a list
Hi Juan, On 16/04/2021 01:21, you wrote: #+begin_src emacs-lisp :tangle list-var.el (message "%S" (sort '("-\s\sv" "-\sv" "-\sa" "-\s\sa") #'string-collate-lessp)) #+end_src #+begin_src sh exec 2>&1 LC_ALL=en_US.UTF-8 emacs --batch -Q -l list-var.el #+end_src #+RESULTS: : - a" "- a" "- v" "- v #+begin_src sh exec 2>&1 LC_ALL=es_ES.UTF-8 emacs --batch -Q -l list-var.el #+end_src #+RESULTS: : - a" "- v" "- a" "- v You have managed to convince me that despite my first suspects the locale on your computer is correct. It is unexpectedly correct and it is more correct that most of locales in libc. However I do not have opinion concerning you patch yet. I have not realized what is the proper way to sort list. Space is significant. At least it may be. Only a few languages have got such fix, Spanish is among them https://sourceware.org/git/?p=glibc.git;a=blobdiff;f=localedata/locales/es_ES;h=aa919a26267fd6311b71d7aeb81655e55787b4df;hp=d17612f6726d0c098ac981e06f3702106540bb23;hb=159738548130d5ac4fe6178977e940ed5f8cfdc4;hpb=ce6636b06b67d6bb9b3d6927bf2a926b9b7478f5 Notice "collating-symbol " I have found example of sorting names in a language where woman surname usually have additional "a" in comparison to man surname. printf "Ivanova Alla\nIvanov Adam\nIvanova Svetlana\n" \ | LANG=pl_PL.UTF-8 sort Ivanov Adam Ivanova Alla Ivanova Svetlana es_ES behavior is just as the above example. printf "Ivanova Alla\nIvanov Adam\nIvanova Svetlana\n" \ | LANG=en_US.UTF-8 sort Ivanova Alla Ivanov Adam Ivanova Svetlana Ukrainian sort works better than Russian one with such example: printf "Иванова Алла\nИванов Адам\nИванова Светлана\n" \ | LANG=uk_UA.UTF-8 sort Иванов Адам Иванова Алла Иванова Светлана printf "Иванова Алла\nИванов Адам\nИванова Светлана\n" \ | LANG=ru_RU.UTF-8 sort Иванова Алла Иванов Адам Иванова Светлана Man names are sorted first in such lists. Other cases might exist when significant space is undesired. So sorting is tricky than I expected.
Re: [Patch] to correctly sort the items with emphasis marks in a list
Hi Maxim Maxim Nikulin writes: > I can reproduce the issue with emacs-27.1 from ubuntu-21.04 beta live > image running in qemu. Org mode is current git HEAD. It seems that > something is changed in emacs since locale is correct: > > ubuntu@ubuntu:~$ printf '%s- v\n- /v/\n- a\n- /a/\n' '' \ > | LANG=C.UTF-8 sort > - /a/ > - /v/ > - a > - v > ubuntu@ubuntu:~$ printf '%s- v\n- /v/\n- a\n- /a/\n' '' \ > | LANG=es_ES.UTF-8 sort > - /a/ > - a > - /v/ > - v > > Concerning org-sort-list, I do not see any problem with en_US.UTF-8, > it_IT.UTF-8, and ru_RU.UTF-8 locales. However emphasized items are > sorted first for at least es_ES.UTF-8, es_MX.UTF-8, and es_US.UTF-8. > I have found some evidence that the problem is on the org side > > cat list.el > (message "%S" (sort '("- /v/" "- v" "- a" "- /a/") > #'string-collate-lessp)) > > LC_ALL=C.UTF-8 emacs --batch -Q -l list.el > ("- /a/" "- /v/" "- a" "- v") > > LC_ALL=en_US.UTF-8 emacs --batch -Q -l list.el > ("- /a/" "- a" "- /v/" "- v") > > LC_ALL=es_ES.UTF-8 emacs --batch -Q -l list.el > ("- /a/" "- a" "- /v/" "- v") > > So even string-collate-lessp works as expected. > > I'm puzzled why the problem is specific to org-sort-list and namely to > Spanish locales. > > Well that's pretty weird ... I wonder if the (spurious?) spaces in `org-sort-remove-invisible' that I mentioned at the beginning may be helpful to keep a track, or if it's just something that masks the problem. Anyway, I have noticed that the only space that seems determinant (from some way that escapes me) is this (where I put the @ sign): org-emph-re (lambda (m) (format "@%s " (match-string 4 m))) Following your examples, it occurred to me to try this, which I don't know if it is relevant or if I have entered a dead end: #+begin_src emacs-lisp :tangle list-var.el (message "%S" (sort '("-\s\sv" "-\sv" "-\sa" "-\s\sa") #'string-collate-lessp)) #+end_src #+begin_src sh exec 2>&1 LC_ALL=en_US.UTF-8 emacs --batch -Q -l list-var.el #+end_src #+RESULTS: : - a" "- a" "- v" "- v #+begin_src sh exec 2>&1 LC_ALL=es_ES.UTF-8 emacs --batch -Q -l list-var.el #+end_src #+RESULTS: : - a" "- v" "- a" "- v Best regards, Juan Manuel
Re: [Patch] to correctly sort the items with emphasis marks in a list
I can reproduce the issue with emacs-27.1 from ubuntu-21.04 beta live image running in qemu. Org mode is current git HEAD. It seems that something is changed in emacs since locale is correct: ubuntu@ubuntu:~$ printf '%s- v\n- /v/\n- a\n- /a/\n' '' \ | LANG=C.UTF-8 sort - /a/ - /v/ - a - v ubuntu@ubuntu:~$ printf '%s- v\n- /v/\n- a\n- /a/\n' '' \ | LANG=es_ES.UTF-8 sort - /a/ - a - /v/ - v Concerning org-sort-list, I do not see any problem with en_US.UTF-8, it_IT.UTF-8, and ru_RU.UTF-8 locales. However emphasized items are sorted first for at least es_ES.UTF-8, es_MX.UTF-8, and es_US.UTF-8. I have found some evidence that the problem is on the org side cat list.el (message "%S" (sort '("- /v/" "- v" "- a" "- /a/") #'string-collate-lessp)) LC_ALL=C.UTF-8 emacs --batch -Q -l list.el ("- /a/" "- /v/" "- a" "- v") LC_ALL=en_US.UTF-8 emacs --batch -Q -l list.el ("- /a/" "- a" "- /v/" "- v") LC_ALL=es_ES.UTF-8 emacs --batch -Q -l list.el ("- /a/" "- a" "- /v/" "- v") So even string-collate-lessp works as expected. I'm puzzled why the problem is specific to org-sort-list and namely to Spanish locales.
Re: [Patch] to correctly sort the items with emphasis marks in a list
Hello again, Since I have an old Emacs version (24.5.1) on my Raspberry, I've done a few more tests. The situation is the following: 1. On Arch Linux and Emacs 27.2: - The system locales are set to "es_ES.UTF-8". When, inside Emacs, I do M-x getenv RET LANG RET, I get "es_ES.UTF-8". org-sort-list a -> wrong result; - Launching Emacs from terminal with LANG=es_ES.UTF-8 \ etc... org-sort-list a -> wrong result again. 2. On Fedora 32 (virtual machine) and Emacs 27.1 - Everything as in the previous case. 3. On Raspian stretch and Emacs 24.5.1: - The system locales are set to "es_ES.UTF-8" as well. When, inside a normal Emacs session, I do M-x getenv RET lang RET, I get "es_ES.UTF-8". org-sort-list a -> wrong result; - Launching Emacs from terminal with LANG=es_ES.UTF-8 \ etc... In this case the list is ordered correctly, but I observe that similar forms with or without emphasis are not always ordered in the same way. Sometimes the non-emphasized forms are ordered before and sometimes they are ordered after (?). I don't know if I'm missing something... Best regards, Juan Manuel Juan Manuel Macías writes: > Hi Maxim, > > Thanks again. In my case, I keep getting the wrong result. In addition > to the test I did in a virtual machine with Fedora, I use Arch Linux in > my every day computers, with locales correctly (I hope) configured as > es_ES.UTF-8 (there was a typo in my previous mail, sorry: > 'en_ES.UTF-8'). In my /etc/locale.conf file I have: > > LANG=es_ES.UTF-8 > LC_ADDRESS=es_ES.UTF-8 > LC_IDENTIFICATION=es_ES.UTF-8 > LC_MEASUREMENT=es_ES.UTF-8 > LC_MONETARY=es_ES.UTF-8 > LC_NAME=es_ES.UTF-8 > LC_NUMERIC=es_ES.UTF-8 > LC_PAPER=es_ES.UTF-8 > LC_TELEPHONE=es_ES.UTF-8 > LC_TIME=es_ES.UTF-8 > > And with locale -a, I get a list of enabled locales: > > C > en_US.utf8 > es_ES.utf8 > POSIX > > However, I keep getting the wrong result :-( (Arch, Emacs 27.2). > > Even with > > LANG=en_EN.UTF-8 \ > emacs -nw -Q -L ~/src/org-mode/lisp/ -L ~/src/org-mode/contrib/lisp/ \ > list.org > > Maybe the problem is on Arch's side (?), although it was also reproducible > in the test I did with Fedora in virtual machine and Emacs 27. > > Best regards, > > Juan Manuel > > Maxim Nikulin writes: > >> On 14/04/2021 02:08, Juan Manuel Macías wrote: >>> The situation is that with locales configured for Spanish from Spain >>> (en_ES.UTF-8) the list is not ordered correctly, unless those three >>> spaces from org-sort-remove-invisible are removed. But I couldn't say >>> why or if that would be appropriate as a patch... >> >> Did not you get a warning like the following one? >> >> (process:220): Gtk-WARNING **: 15:17:45.066: Locale not supported by C >> library. >> Using the fallback 'C' locale. >> >>> Regarding the collation rule of forms with emphasis, at least in Spanish >>> these should come after the non-emphasized forms. I don't know if this >>> is a general rule also for other languages (at least it seems more >>> natural that the forms without emphasis are placed before). >> >> LANG=es_ES.UTF-8 \ >> emacs -Q -L ~/src/org-mode/lisp/ -L ~/src/org-mode/contrib/lisp/ \ >> list.org >> >> - a >> - /a/ >> - v >> - /v/ >> >> So it works accordingly to your expectation. Emacs 25.2.2, >> Ubuntu-18.04 container. >> >> I have generated es_ES.UTF-8 locale using >> >> dpkg-reconfigure locales >> >> Depending on linux distribution you run, the locale may be ready to >> use or not. I tend to think that in minimal environment of virtual >> machine it was missed. >> >> I had an idea to add a test for sorting of items including emphasized >> ones but test-org-list/sort forces C locale. Maybe it was done to >> avoid failures due to missed locale. >> >> >
Re: [Patch] to correctly sort the items with emphasis marks in a list
Hi Maxim, Thanks again. In my case, I keep getting the wrong result. In addition to the test I did in a virtual machine with Fedora, I use Arch Linux in my every day computers, with locales correctly (I hope) configured as es_ES.UTF-8 (there was a typo in my previous mail, sorry: 'en_ES.UTF-8'). In my /etc/locale.conf file I have: LANG=es_ES.UTF-8 LC_ADDRESS=es_ES.UTF-8 LC_IDENTIFICATION=es_ES.UTF-8 LC_MEASUREMENT=es_ES.UTF-8 LC_MONETARY=es_ES.UTF-8 LC_NAME=es_ES.UTF-8 LC_NUMERIC=es_ES.UTF-8 LC_PAPER=es_ES.UTF-8 LC_TELEPHONE=es_ES.UTF-8 LC_TIME=es_ES.UTF-8 And with locale -a, I get a list of enabled locales: C en_US.utf8 es_ES.utf8 POSIX However, I keep getting the wrong result :-( (Arch, Emacs 27.2). Even with LANG=en_EN.UTF-8 \ emacs -nw -Q -L ~/src/org-mode/lisp/ -L ~/src/org-mode/contrib/lisp/ \ list.org Maybe the problem is on Arch's side (?), although it was also reproducible in the test I did with Fedora in virtual machine and Emacs 27. Best regards, Juan Manuel Maxim Nikulin writes: > On 14/04/2021 02:08, Juan Manuel Macías wrote: >> The situation is that with locales configured for Spanish from Spain >> (en_ES.UTF-8) the list is not ordered correctly, unless those three >> spaces from org-sort-remove-invisible are removed. But I couldn't say >> why or if that would be appropriate as a patch... > > Did not you get a warning like the following one? > > (process:220): Gtk-WARNING **: 15:17:45.066: Locale not supported by C > library. > Using the fallback 'C' locale. > >> Regarding the collation rule of forms with emphasis, at least in Spanish >> these should come after the non-emphasized forms. I don't know if this >> is a general rule also for other languages (at least it seems more >> natural that the forms without emphasis are placed before). > > LANG=es_ES.UTF-8 \ > emacs -Q -L ~/src/org-mode/lisp/ -L ~/src/org-mode/contrib/lisp/ \ > list.org > > - a > - /a/ > - v > - /v/ > > So it works accordingly to your expectation. Emacs 25.2.2, > Ubuntu-18.04 container. > > I have generated es_ES.UTF-8 locale using > > dpkg-reconfigure locales > > Depending on linux distribution you run, the locale may be ready to > use or not. I tend to think that in minimal environment of virtual > machine it was missed. > > I had an idea to add a test for sorting of items including emphasized > ones but test-org-list/sort forces C locale. Maybe it was done to > avoid failures due to missed locale. > >
Re: [Patch] to correctly sort the items with emphasis marks in a list
On 14/04/2021 22:42, Maxim Nikulin wrote: I have generated es_ES.UTF-8 locale using dpkg-reconfigure locales Depending on linux distribution you run, the locale may be ready to use or not. I tend to think that in minimal environment of virtual machine it was missed. I forgot to add a command how to get the list of generated locales locale --all-locales locales-2.27-3ubuntu1.4 package is built from glibc sources.
Re: [Patch] to correctly sort the items with emphasis marks in a list
On 14/04/2021 02:08, Juan Manuel Macías wrote: The situation is that with locales configured for Spanish from Spain (en_ES.UTF-8) the list is not ordered correctly, unless those three spaces from org-sort-remove-invisible are removed. But I couldn't say why or if that would be appropriate as a patch... Did not you get a warning like the following one? (process:220): Gtk-WARNING **: 15:17:45.066: Locale not supported by C library. Using the fallback 'C' locale. Regarding the collation rule of forms with emphasis, at least in Spanish these should come after the non-emphasized forms. I don't know if this is a general rule also for other languages (at least it seems more natural that the forms without emphasis are placed before). LANG=es_ES.UTF-8 \ emacs -Q -L ~/src/org-mode/lisp/ -L ~/src/org-mode/contrib/lisp/ \ list.org - a - /a/ - v - /v/ So it works accordingly to your expectation. Emacs 25.2.2, Ubuntu-18.04 container. I have generated es_ES.UTF-8 locale using dpkg-reconfigure locales Depending on linux distribution you run, the locale may be ready to use or not. I tend to think that in minimal environment of virtual machine it was missed. I had an idea to add a test for sorting of items including emphasized ones but test-org-list/sort forces C locale. Maybe it was done to avoid failures due to missed locale.
Re: [Patch] to correctly sort the items with emphasis marks in a list
Hi, Maxim, Thanks for clearing things up. So it seems obvious that the root of the problem is in the locales and the collation rules. The situation is that with locales configured for Spanish from Spain (en_ES.UTF-8) the list is not ordered correctly, unless those three spaces from org-sort-remove-invisible are removed. But I couldn't say why or if that would be appropriate as a patch... Regarding the collation rule of forms with emphasis, at least in Spanish these should come after the non-emphasized forms. I don't know if this is a general rule also for other languages (at least it seems more natural that the forms without emphasis are placed before). Best regards, Juan Manuel Maxim Nikulin writes: > I could reproduce such result but I am in doubt if it is a reason to > merge the patch. I believe, the following behavior is almost expected > > list.org: > - v > - /v/ > - /a/ > - a > > LC_COLLATE=C.UTF-8 LC_ALL= LANGUAGE= \ > emacs -Q -L ~/src/org-mode/lisp/ -L ~/src/org-mode/contrib/lisp/ \ > list.org > > (org-sort-list t ?a) > > - /a/ > - /v/ > - a > - v > > LC_COLLATE=en_US.UTF-8 LC_ALL= LANGUAGE= \ > emacs -Q -L ~/src/org-mode/lisp/ -L ~/src/org-mode/contrib/lisp/ \ > list.org > > (org-sort-list t ?a) > > - /a/ > - a > - /v/ > - v > > Collation rules depend on language. The question is if emphasized > variant should be sorted first. > > P.S. The thread is broken. Some new messages do not have proper > In-Reply-To header. Original question was not referenced in the > message with patch as well: > https://orgmode.org/list/87blbac0k0@posteo.net/
Re: [Patch] to correctly sort the items with emphasis marks in a list
On 10/04/2021 18:41, Juan Manuel Macías wrote: Nicolas Goaziou writes: Do you have a simple test case to reproduce the problem? Currently sorting the following trivial lists causes no issue: - b - *a* and - *b* - a The current result is wrong: - /a/ - /v/ - a - b I could reproduce such result but I am in doubt if it is a reason to merge the patch. I believe, the following behavior is almost expected list.org: - v - /v/ - /a/ - a LC_COLLATE=C.UTF-8 LC_ALL= LANGUAGE= \ emacs -Q -L ~/src/org-mode/lisp/ -L ~/src/org-mode/contrib/lisp/ \ list.org (org-sort-list t ?a) - /a/ - /v/ - a - v LC_COLLATE=en_US.UTF-8 LC_ALL= LANGUAGE= \ emacs -Q -L ~/src/org-mode/lisp/ -L ~/src/org-mode/contrib/lisp/ \ list.org (org-sort-list t ?a) - /a/ - a - /v/ - v Collation rules depend on language. The question is if emphasized variant should be sorted first. P.S. The thread is broken. Some new messages do not have proper In-Reply-To header. Original question was not referenced in the message with patch as well: https://orgmode.org/list/87blbac0k0@posteo.net/
Re: [Patch] to correctly sort the items with emphasis marks in a list
Hi Samuel, Samuel Wales writes: > probably not a relevant non-confirmation but in recent maint, my config: > > - a > - /a/ > - b Thanks for trying. I've uploaded this screencast with a minimal Emacs on a virtual machine: https://gnutas.juanmanuelmacias.com/images/org-sort-issue-2021-04-13_15.44.52.mp4 and, as you can see, in my case `org-sort-list' can not sort the list correctly. Maybe it's something related to locales (??), but the only thing that I can confirm is that if I remove those spaces in `org-sort-remove-invisible', the list is then sorted correctly. Best regards, Juan Manuel
Re: [Patch] to correctly sort the items with emphasis marks in a list
probably not a relevant non-confirmation but in recent maint, my config: - a - /a/ - b On 4/12/21, Juan Manuel Macías wrote: > Hi Ypo, > > Ypo writes: > >> This is my result after doing "org-sort-list a": >> >> - /a/ >> - /v/ >> - a >> - b >> >> Regards > > Thanks for trying. So it seems that you can reproduce the problem as > well... I wonder if anyone else is able to reproduce it, preferably in a > minimal emacs. > > Best regards, > > Juan Manuel > > -- The Kafka Pandemic Please learn what misopathy is. https://thekafkapandemic.blogspot.com/2013/10/why-some-diseases-are-wronged.html
Re: [Patch] to correctly sort the items with emphasis marks in a list
Hi Ypo, Ypo writes: > This is my result after doing "org-sort-list a": > > - /a/ > - /v/ > - a > - b > > Regards Thanks for trying. So it seems that you can reproduce the problem as well... I wonder if anyone else is able to reproduce it, preferably in a minimal emacs. Best regards, Juan Manuel
Re: [Patch] to correctly sort the items with emphasis marks in a list
This is my result after doing "org-sort-list a": - /a/ - /v/ - a - b Regards
Re: [Patch] to correctly sort the items with emphasis marks in a list
Nicolas Goaziou writes: > I cannot reproduce it. With your initial list, and a minimal init file, > I get: > > - /a/ > - a > - b > - /v/ > > Could you try with a minimal Emacs, too? That's weird ... I have tried launching emacs -q in a virtual machine, and I keep getting the wrong result (git version, master branch): - /a/ - /v/ - a - b I have tried with the typical keyboard shortcut and also with M-: and evaluating (org-sort-list t? a nil nil nil) Best regards, Juan Manuel
Re: [Patch] to correctly sort the items with emphasis marks in a list
Hello Nicolas, Nicolas Goaziou writes: > Do you have a simple test case to reproduce the problem? Currently > sorting the following trivial lists causes no issue: > > - b > - *a* > > and > > - *b* > - a Consider this (unordered) list: - a - b - /v/ - /a/ The current result is wrong: - /a/ - /v/ - a - b With the spaces removed in `org-sort-remove-invisible', this would be the expected result[1]: - a - /a/ - b - /v/ [1] According to the National Information Standards Organization: "Different typefaces (italic, boldface, blackletter, etc.) do not affect the arrangement of letters." (see p. 3 on: https://www.niso.org/sites/default/files/2017-08/tr03.pdf) Best regards, Juan Manuel
Re: [Patch] to correctly sort the items with emphasis marks in a list
Hello, Juan Manuel Macías writes: > Done! I've attached the corrected patch. Sorry for the flaws in me > previous patch: I'm a bit of a novice at submitting patches... No problem. Thank you. Do you have a simple test case to reproduce the problem? Currently sorting the following trivial lists causes no issue: - b - *a* and - *b* - a Regards, -- Nicolas Goaziou
Re: [Patch] to correctly sort the items with emphasis marks in a list
Hellow Nicolas: Nicolas Goaziou writes: > Thank you. > > Could you apply the same fix to the `org-verbatim-re' match above, and > provide an appropriate commit message? Done! I've attached the corrected patch. Sorry for the flaws in me previous patch: I'm a bit of a novice at submitting patches... Best regards, Juan Manuel >From ab104bb079e3e32d622a6ff53824b5047dc25c14 Mon Sep 17 00:00:00 2001 From: Juan Manuel Macias Date: Fri, 2 Apr 2021 21:20:17 +0200 Subject: [PATCH] Remove spurious spaces in org-sort-remove-invisible Some spurious spaces in org-sort-remove-invisible is causing org-sort-list not to sort correctly (alphabetically) the items that contain emphasis marks in a plain list --- lisp/org.el | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lisp/org.el b/lisp/org.el index 675a614e2..74e2afac9 100644 --- a/lisp/org.el +++ b/lisp/org.el @@ -8113,9 +8113,9 @@ Optional argument WITH-CASE means sort case-sensitively." "Remove invisible part of links and emphasis markers from string S." (remove-text-properties 0 (length s) org-rm-props s) (replace-regexp-in-string - org-verbatim-re (lambda (m) (format "%s " (match-string 4 m))) + org-verbatim-re (lambda (m) (format "%s" (match-string 4 m))) (replace-regexp-in-string -org-emph-re (lambda (m) (format " %s " (match-string 4 m))) +org-emph-re (lambda (m) (format "%s" (match-string 4 m))) (org-link-display-format s) t t) t t)) -- 2.26.0
Re: [Patch] to correctly sort the items with emphasis marks in a list
Hello, Juan Manuel Macías writes: > I have noticed that a couple of (spurious?) spaces in a `format' > expression of `org-sort-remove-invisible' is causing `org-sort-list' not > to sort correctly (alphabetically) the items that contain emphasis marks > in a plain list. I propose this very simple patch to fix that problem. Thank you. Could you apply the same fix to the `org-verbatim-re' match above, and provide an appropriate commit message? Regards, -- Nicolas Goaziou
[Patch] to correctly sort the items with emphasis marks in a list
Hi all, I have noticed that a couple of (spurious?) spaces in a `format' expression of `org-sort-remove-invisible' is causing `org-sort-list' not to sort correctly (alphabetically) the items that contain emphasis marks in a plain list. I propose this very simple patch to fix that problem. Best regards, Juan Manuel diff --git a/lisp/org.el b/lisp/org.el index 04da1afcd..d10cc2f5c 100644 --- a/lisp/org.el +++ b/lisp/org.el @@ -8114,7 +8114,7 @@ Optional argument WITH-CASE means sort case-sensitively." (replace-regexp-in-string org-verbatim-re (lambda (m) (format "%s " (match-string 4 m))) (replace-regexp-in-string -org-emph-re (lambda (m) (format " %s " (match-string 4 m))) +org-emph-re (lambda (m) (format "%s" (match-string 4 m))) (org-link-display-format s) t t) t t))