Re: [PATCH 1/2] Emacs: Add a new function for balancing bidi control chars

2020-08-16 Thread Teemu Likonen
* 2020-08-16 19:28:51+03, Tomi Ollila wrote:

> Good stuff -- the implementation looks like a port of the PHP code in
>
>    https://www.iamcal.com/understanding-bidirectional-text
>
> to Emacs Lisp... anyway, nice implementation; it took me a bit of
> time to understand it...

I don't read PHP and didn't try to read that code at all but the idea is
simple enough.

> thoughts
>
> - is it slow to always execute it, being a pure Lisp implementation?
>   A (string-match "[\u202a-\u202e]" ...) check could be done before that.
>   (If it were executed often, could a loop with `looking-at`
>    (and then moving point based on match-end) be faster?)

I don't see any speed issues, but if we want to optimize I would create a
new sanitize function which walks just once across the characters
without using regular expressions. But currently I think it's an
unnecessary micro-optimization.
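
For illustration only, the pre-check suggested in the quoted text could be sketched roughly as follows. The names `my-bidi-maybe-balance` and BALANCE-FN are hypothetical and not part of the patch; BALANCE-FN stands in for whatever balancing function is used. The helper calls it only when the string contains a character in the range U+202A..U+202E.

```elisp
(defun my-bidi-maybe-balance (string balance-fn)
  "Return STRING, passed through BALANCE-FN only if it has bidi controls.
Hypothetical helper: BALANCE-FN stands in for a balancing function
such as the one proposed in this thread."
  ;; `string-match-p' does not modify the match data, so callers that
  ;; rely on their own match state are not disturbed.
  (if (string-match-p "[\u202a-\u202e]" string)
      (funcall balance-fn string)
    string))

;; Usage sketch with a stand-in function that simply strips the controls.
(my-bidi-maybe-balance
 "plain ascii"
 (lambda (s) (replace-regexp-in-string "[\u202a-\u202e]" "" s)))
```

Whether the regexp scan is cheaper than the Lisp loop for typical header strings would have to be measured; the sketch only shows the shape of the guard.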

> - *but* adding U+202C's in `notmuch-sanitize` is doing it too early, as
>   some functions truncate the strings afterwards if those are too long
>   (e.g. `notmuch-search-insert-authors`) so those get lost.. 

Good point. This would mean that we shouldn't do "bidi ctrl char
balancing" in notmuch-sanitize. We should call the new
notmuch-balance-bidi-ctrl-chars function in various places before
inserting arbitrary strings into a buffer and before combining such
strings with other strings.
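
A minimal sketch of that ordering, under stated assumptions: `my-balance-bidi` is a compact stand-in written for this example (not the function from the patch), and `my-truncate-then-balance` is a hypothetical caller. The point is only that truncation happens first, so a U+202C cut off by truncation gets added back.

```elisp
(defun my-balance-bidi (string)
  "Hypothetical compact stand-in for a bidi balancing function.
Drop unmatched U+202C characters in STRING and append the missing ones."
  (let ((depth 0)   ; number of push characters not yet closed
        (out nil))  ; result characters, in reverse order
    (dolist (c (string-to-list string))
      (cond ((memq c '(?\u202a ?\u202b ?\u202d ?\u202e))
             (setq depth (1+ depth))
             (push c out))
            ((eq c ?\u202c)
             ;; Keep a pop character only if something is open.
             (when (> depth 0)
               (setq depth (1- depth))
               (push c out)))
            (t (push c out))))
    ;; Close whatever is still open.
    (dotimes (_ depth)
      (push ?\u202c out))
    (concat (nreverse out))))

(defun my-truncate-then-balance (string width)
  "Truncate STRING to WIDTH columns first, then balance bidi controls."
  (my-balance-bidi (truncate-string-to-width string width)))

;; The trailing U+202C is lost by truncation and then restored.
(my-truncate-then-balance "\u202eabcdef\u202c" 3)
```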

> (What I noticed when looking at `notmuch-search-insert-authors` is that it
>  uses `length` to check the length of a string -- but that also counts these
>  bidi mode changing "characters" (as one char each). `string-width` would be
>  better there -- and probably in many other places.)

Yes, definitely use string-width when truncation is based on width and
when using a tabular format in buffers. With that function zero-width
characters really have no width.
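
To illustrate the difference, here is a small sketch; `my-length-and-width` is a hypothetical helper made up for this example. It assumes, as described above, that Emacs reports no display width for the bidi control characters, so the two numbers can differ for strings that contain them.

```elisp
(defun my-length-and-width (string)
  "Return (LENGTH . WIDTH) for STRING.
LENGTH counts every character, including bidi control characters;
WIDTH is the number of display columns reported by `string-width'."
  (cons (length string) (string-width string)))

;; U+202E followed by three letters: `length' counts four characters,
;; while `string-width' reports the columns the text occupies.
(my-length-and-width "\u202eabc")
```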

-- 
/// Teemu Likonen - .-.. http://www.iki.fi/tlikonen/
// OpenPGP: 4E1055DC84E9DFF613D78557719D69D324539450


___
notmuch mailing list -- notmuch@notmuchmail.org
To unsubscribe send an email to notmuch-le...@notmuchmail.org


Re: [PATCH 1/2] Emacs: Add a new function for balancing bidi control chars

2020-08-16 Thread Tomi Ollila
On Sat, Aug 15 2020, Teemu Likonen wrote:

> The following Unicode's bidirectional control chars are modal so that
> they push a new bidirectional rendering mode to a stack:
>
> U+202A LEFT-TO-RIGHT EMBEDDING
> U+202B RIGHT-TO-LEFT EMBEDDING
> U+202D LEFT-TO-RIGHT OVERRIDE
> U+202E RIGHT-TO-LEFT OVERRIDE

Good stuff -- the implementation looks like a port of the PHP code in

   https://www.iamcal.com/understanding-bidirectional-text

to Emacs Lisp... anyway, nice implementation; it took me a bit of
time to understand it...

thoughts

- is it slow to always execute it, being a pure Lisp implementation?
  A (string-match "[\u202a-\u202e]" ...) check could be done before that.
  (If it were executed often, could a loop with `looking-at`
   (and then moving point based on match-end) be faster?)

- *but* adding U+202C's in `notmuch-sanitize` is doing it too early, as
  some functions truncate the strings afterwards if those are too long
  (e.g. `notmuch-search-insert-authors`) so those get lost.. 

- what about https://en.wikipedia.org/wiki/Bidirectional_text#Isolates
  (was documented more in some page, cannot find it anymore...)
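
One way the isolate characters could be folded into the same balancing idea is sketched below. This is an assumption-laden illustration, not part of the patch: `my-balance-bidi-with-isolates` is a hypothetical name, and the pairing rules (U+2066, U+2067 and U+2068 are closed by U+2069; a U+2069 also closes any embeddings opened inside its isolate) are a simplified reading of the Unicode bidirectional algorithm.

```elisp
(defun my-balance-bidi-with-isolates (string)
  "Hypothetical sketch: balance embeddings, overrides and isolates in STRING."
  (let ((closers nil)  ; closing chars still owed, innermost first
        (out nil))     ; result characters, in reverse order
    (dolist (c (string-to-list string))
      (cond
       ;; LRE, RLE, LRO, RLO are closed by U+202C (PDF).
       ((memq c '(?\u202a ?\u202b ?\u202d ?\u202e))
        (push ?\u202c closers)
        (push c out))
       ;; LRI, RLI, FSI are closed by U+2069 (PDI).
       ((memq c '(?\u2066 ?\u2067 ?\u2068))
        (push ?\u2069 closers)
        (push c out))
       ;; PDF: keep it only if the innermost open mode expects it.
       ((eq c ?\u202c)
        (when (eq (car closers) ?\u202c)
          (pop closers)
          (push c out)))
       ;; PDI: close everything up to and including the nearest isolate.
       ((eq c ?\u2069)
        (when (memq ?\u2069 closers)
          (while (not (eq (car closers) ?\u2069))
            (push (pop closers) out))
          (push (pop closers) out)))
       (t (push c out))))
    ;; Close whatever is still open, innermost first.
    (dolist (closer closers)
      (push closer out))
    (concat (nreverse out))))
```

A real stack is used instead of a counter because the two kinds of closing characters are not interchangeable: a U+202C must not close an isolate, and a U+2069 with no open isolate is dropped.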

(What I noticed when looking at `notmuch-search-insert-authors` is that it
 uses `length` to check the length of a string -- but that also counts these
 bidi mode changing "characters" (as one char each). `string-width` would be
 better there -- and probably in many other places.)

(I tried quite a few things, looking for something that could "reset" the
 stack with e.g. one invisible tab, but no go (or that was filtered as I
 added it to `notmuch-sanitize` ;). As a final step I did

  (defun notmuch-sanitize (str)
  ...
  -  (replace-regexp-in-string "[[:cntrl:]\x7f\u2028\u2029]+" " " str))
  +  (replace-regexp-in-string
  +   "[\u202A-\u202E\u2066-\u2069]" ""
  +   (replace-regexp-in-string "[[:cntrl:]\x7f\u2028\u2029]+" " " str)))

just to test dropping those chars. Probably not good enough ;/)


Tomi


Re: [PATCH 0/2] Balance bidi control chars

2020-08-15 Thread Teemu Likonen
* 2020-08-15 12:30:34+03, Teemu Likonen wrote:

> These patches continue the ideas written in message:
>
> id:87sgcuuzio@iki.fi

Here is a nice and relatively short reference for anyone who is
interested in the subject:

https://www.iamcal.com/understanding-bidirectional-text

-- 
/// Teemu Likonen - .-.. http://www.iki.fi/tlikonen/
// OpenPGP: 4E1055DC84E9DFF613D78557719D69D324539450




[PATCH 1/2] Emacs: Add a new function for balancing bidi control chars

2020-08-15 Thread Teemu Likonen
The following Unicode's bidirectional control chars are modal so that
they push a new bidirectional rendering mode to a stack:

U+202A LEFT-TO-RIGHT EMBEDDING
U+202B RIGHT-TO-LEFT EMBEDDING
U+202D LEFT-TO-RIGHT OVERRIDE
U+202E RIGHT-TO-LEFT OVERRIDE

Every mode must be terminated with the character U+202C POP
DIRECTIONAL FORMATTING which pops the mode from the stack. The stack
is per paragraph. A new text paragraph resets the rendering mode
changed by these control characters.

This change adds a new function "notmuch-balance-bidi-ctrl-chars"
which reads its STRING argument and ensures that all push
characters (U+202A, U+202B, U+202D, U+202E) have a pop character
pair (U+202C). The function may add more U+202C characters at the end
of the returned string, or it may remove some U+202C characters. The
returned string is safe in the sense that it won't change the
surrounding bidirectional rendering mode. This function should be used
when sanitizing arbitrary input.
---
 emacs/notmuch-lib.el | 54 
 1 file changed, 54 insertions(+)

diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
index 118faf1e..e6252c6c 100644
--- a/emacs/notmuch-lib.el
+++ b/emacs/notmuch-lib.el
@@ -469,6 +469,60 @@ be displayed."
"[No Subject]"
   subject)))
 
+
+(defun notmuch-balance-bidi-ctrl-chars (string)
+  "Balance bidirectional control chars in STRING.
+
+The following Unicode's bidirectional control chars are modal so
+that they push a new bidirectional rendering mode to a stack:
+U+202A LEFT-TO-RIGHT EMBEDDING, U+202B RIGHT-TO-LEFT EMBEDDING,
+U+202D LEFT-TO-RIGHT OVERRIDE and U+202E RIGHT-TO-LEFT OVERRIDE.
+Every mode must be terminated with the character U+202C POP
+DIRECTIONAL FORMATTING which pops the mode from the stack. The
+stack is per paragraph. A new text paragraph resets the rendering
+mode changed by these control characters.
+
+This function reads the STRING argument and ensures that all push
+characters (U+202A, U+202B, U+202D, U+202E) have a pop character
+pair (U+202C). The function may add more U+202C characters at the
+end of the returned string, or it may remove some U+202C
+characters. The returned string is safe in the sense that it
+won't change the surrounding bidirectional rendering mode. This
+function should be used when sanitizing arbitrary input."
+
+  (let ((new-string nil)
+        (stack-count 0))
+
+    (cl-flet ((push-char-p (c)
+                ;; U+202A LEFT-TO-RIGHT EMBEDDING
+                ;; U+202B RIGHT-TO-LEFT EMBEDDING
+                ;; U+202D LEFT-TO-RIGHT OVERRIDE
+                ;; U+202E RIGHT-TO-LEFT OVERRIDE
+                (cl-find c '(?\u202a ?\u202b ?\u202d ?\u202e)))
+              (pop-char-p (c)
+                ;; U+202C POP DIRECTIONAL FORMATTING
+                (eql c ?\u202c)))
+
+      (cl-loop for char across string
+               do (cond ((push-char-p char)
+                         (cl-incf stack-count)
+                         (push char new-string))
+                        ((and (pop-char-p char)
+                              (cl-plusp stack-count))
+                         (cl-decf stack-count)
+                         (push char new-string))
+                        ((and (pop-char-p char)
+                              (not (cl-plusp stack-count)))
+                         ;; The stack is empty. Ignore this pop character.
+                         )
+                        (t (push char new-string)))))
+
+    ;; Add possible missing pop characters.
+    (cl-loop repeat stack-count
+             do (push ?\x202c new-string))
+
+    (seq-into (nreverse new-string) 'string)))
+
 (defun notmuch-sanitize (str)
   "Sanitize control character in STR.
 
-- 
2.20.1


[PATCH 0/2] Balance bidi control chars

2020-08-15 Thread Teemu Likonen
These patches continue the ideas written in message:

id:87sgcuuzio@iki.fi

The first patch adds a new function which can be used to balance
Unicode's bidirectional control characters in its string argument. The
second patch modifies the old "notmuch-sanitize" function so that it
calls the new function.


Teemu Likonen (2):
  Emacs: Add a new function for balancing bidi control chars
  Emacs: Call notmuch-balance-bidi-ctrl-chars in notmuch-sanitize

 emacs/notmuch-lib.el | 57 +++-
 1 file changed, 56 insertions(+), 1 deletion(-)

-- 
2.20.1


[PATCH 2/2] Emacs: Call notmuch-balance-bidi-ctrl-chars in notmuch-sanitize

2020-08-15 Thread Teemu Likonen
---
 emacs/notmuch-lib.el | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
index e6252c6c..e0122f7a 100644
--- a/emacs/notmuch-lib.el
+++ b/emacs/notmuch-lib.el
@@ -527,7 +527,8 @@ function should be used when sanitizing arbitrary input."
   "Sanitize control character in STR.
 
 This includes newlines, tabs, and other funny characters."
-  (replace-regexp-in-string "[[:cntrl:]\x7f\u2028\u2029]+" " " str))
+  (notmuch-balance-bidi-ctrl-chars
+   (replace-regexp-in-string "[[:cntrl:]\x7f\u2028\u2029]+" " " str)))
 
 (defun notmuch-escape-boolean-term (term)
   "Escape a boolean term for use in a query.
-- 
2.20.1


Sanitize bidi control chars

2020-08-10 Thread Teemu Likonen
* 2020-08-10 19:45:11+03, Teemu Likonen wrote:

> If we wanted to clean message headers from possible unpaired overrides
> we should clean all these:
>
> U+202A LEFT-TO-RIGHT EMBEDDING (push)
> U+202B RIGHT-TO-LEFT EMBEDDING (push)
> U+202C POP DIRECTIONAL FORMATTING (pop)
> U+202D LEFT-TO-RIGHT OVERRIDE (push)
> U+202E RIGHT-TO-LEFT OVERRIDE (push)
>
> Or we could even try to be clever and count those characters and then
> insert or remove some of them so that there are as many "push"
> characters as "pop" characters.

Below is an example Emacs Lisp function to balance those "push" and
"pop" bidi control chars. This kind of code could be used to sanitize
message headers or any arbitrary text coming from the user.

I'm not even sure if such a thing should be done in Emacs or in
lower-level Notmuch code. Anyway, I tried adding it to the
notmuch-sanitize function. With that change Tomi's message no longer
switched the direction of other text (in a notmuch-search-mode buffer).


(require 'cl-lib) ; cl-flet, cl-loop, cl-incf, cl-decf, cl-plusp, cl-find
(require 'seq)    ; seq-into

(defun notmuch-balance-bidi-ctrl-chars (string)
  (let ((new nil)
        (stack-count 0))

    (cl-flet ((push-char-p (c)
                ;; U+202A LEFT-TO-RIGHT EMBEDDING
                ;; U+202B RIGHT-TO-LEFT EMBEDDING
                ;; U+202D LEFT-TO-RIGHT OVERRIDE
                ;; U+202E RIGHT-TO-LEFT OVERRIDE
                (cl-find c '(?\x202a ?\x202b ?\x202d ?\x202e)))
              (pop-char-p (c)
                ;; U+202C POP DIRECTIONAL FORMATTING
                (eql c ?\x202c)))

      (cl-loop
       for char across string
       do (cond ((push-char-p char)
                 (cl-incf stack-count)
                 (push char new))
                ((and (pop-char-p char)
                      (cl-plusp stack-count))
                 (cl-decf stack-count)
                 (push char new))
                ((and (pop-char-p char)
                      (not (cl-plusp stack-count)))
                 ;; The stack is empty. Ignore this pop char.
                 )
                (t (push char new)))))

    ;; Add missing pops.
    (cl-loop
     repeat stack-count
     do (push ?\x202c new))

    (seq-into (nreverse new) 'string)))



-- 
/// Teemu Likonen - .-.. http://www.iki.fi/tlikonen/
// OpenPGP: 4E1055DC84E9DFF613D78557719D69D324539450




mutt-like interface [was: Re: BiDi]

2012-02-03 Thread David Edmondson
On Thu, 02 Feb 2012 14:39:40 -0500, James Vasile wrote:
> Is anybody interested in a more mutt-like single-email view with a
> threaded index?  No indenting, no waiting forever for long threads to
> fontify, no wading through a long thread looking for the one email you
> actually want.

Various people have suggested it. No-one has written the code. Do you
want that because you prefer it generally or because you believe it will
be faster?

You can disable the indenting, of course (see
`notmuch-show-indent-messages-width').

Did you spend some time determining that it's the fontification that's
causing things to be slow? There was some analysis which showed that
json.el is not the best performer, but it's more likely to be a
combination of many issues.

You could try the patch set at
id:"1328181833-14988-1-git-send-email-dme at dme.org", which includes some
things that can speed up thread display by showing only the matching
messages.

Last, I've been thinking about lazy insertion of message bodies, such
that non-matching messages will not be inserted and rendered until you
un-hide them. That should improve the rendering speed, but would have
the effect of disabling the current 'isearch can open closed messages'
behaviour.



BiDi

2012-02-03 Thread David Edmondson
On Tue, 24 Jan 2012 20:08:14 +0000, Clint Adams wrote:
> I just received an email in RTL script and it was rendered incorrectly
> in the emacs interface.  Does anyone know what to do about this?

Clint, can you provide a sample message for me to work with?



BiDi

2012-02-02 Thread Clint Adams
On Thu, Feb 02, 2012 at 09:44:07AM -0800, Jameson Graef Rollins wrote:
> More info please.  What is "RTL script"?  Is that "right to left"?  If
> so, yikes.  I can't even imagine what a mess that would have looked
> like.  Another argument against this crazy indenting stuff we do.  I
> hope the thread was also indenting from the right!

Yes, right to left.

Presumably this will be fixed in emacs some year;
http://www.emacswiki.org/emacs/SupportBiDi


BiDi

2012-02-02 Thread James Vasile
On Thu, 02 Feb 2012 09:44:07 -0800, Jameson Graef Rollins wrote:
> On Tue, 24 Jan 2012 20:08:14 +0000, Clint Adams wrote:
> > I just received an email in RTL script and it was rendered incorrectly
> > in the emacs interface.  Does anyone know what to do about this?
> 
> More info please.  What is "RTL script"?  Is that "right to left"?  If
> so, yikes.  I can't even imagine what a mess that would have looked
> like.  Another argument against this crazy indenting stuff we do.  I
> hope the thread was also indenting from the right!
> 
> jamie.

Is anybody interested in a more mutt-like single-email view with a
threaded index?  No indenting, no waiting forever for long threads to
fontify, no wading through a long thread looking for the one email you
actually want.


Re: BiDi

2012-02-02 Thread Jameson Graef Rollins
On Tue, 24 Jan 2012 20:08:14 +0000, Clint Adams wrote:
> I just received an email in RTL script and it was rendered incorrectly
> in the emacs interface.  Does anyone know what to do about this?

More info please.  What is "RTL script"?  Is that "right to left"?  If
so, yikes.  I can't even imagine what a mess that would have looked
like.  Another argument against this crazy indenting stuff we do.  I
hope the thread was also indenting from the right!

jamie.




BiDi

2012-01-24 Thread Clint Adams
I just received an email in RTL script and it was rendered incorrectly
in the emacs interface.  Does anyone know what to do about this?