Re: greedy substitution in org-open-file

2021-03-21 Thread Maxim Nikulin

On 13/02/2021 11:38, Kyle Meyer wrote:
  
+(defun org--open-file-format-spec (format specification)

+  (with-temp-buffer
+(insert format)
+(goto-char (point-min))
+(while (search-forward "%" nil t)
+  (cond ((eq (char-after) ?%)
+ (delete-char 1))
+((looking-at "[s0-9]")
+ (replace-match
+  (or (cdr (assoc (match-string 0) specification))
+  (error "Invalid format string"))
+  'fixed-case 'literal)
+ (delete-region (1- (match-beginning 0)) (match-beginning 0)))


Finally I managed to convince myself that delete-region does not change 
position in the buffer, so "%s" or "%1" in specification are not a 
problem. I am aware that this implementation is a simplified version of 
format-spec, but I am still unsure if jumping over a buffer is the best 
choice.


I am in doubts if strings or characters should be used in the spec:
'(("s" . "file.pdf")) vs. '((?s . "file.pdf")), but really it does not 
matter.


I have created some tests for this function, see the attachment. 
Actually I do not like such style of tests since first failure stops 
whole test and it is hard to get general impression to which degree the 
function under the test is broken, but I am not aware of a better way.
Recently I asked Ihor a similar question: 
https://orgmode.org/list/s324b0$74g$1...@ciao.gmane.io


I know that functions called from one point are not in favor in org 
sources, but I do not mind to have additional helper function to add 
tests that all substitutions are properly escaped.



+  (let ((ngroups (- (/ (length link-match-data) 2) 1)))
+(and (> ngroups 0)
+ (progn
+   (set-match-data link-match-data)
+   (mapcar (lambda (n)
+ (cons (number-to-string n)
+   (match-string-no-properties n dlink)))
+   (number-sequence 1 ngroups


Matter of taste: it seems that with (number-sequence 1 ngroups 1)) it is 
possible to avoid (> ngroups 0).


I have spent some time evaluating how to make errors more helpful to 
users. I am unsure if multiline message is acceptable to dump content of 
specification. For a while, a place in the format where error has 
happened (combined with a different approach to parse format string)


(defun org--open-file-format-spec (format specification)
  (apply #'concat
 (nreverse
  (let ((result nil)
(token-end 0))
(while (string-match "%\\(.?\\)" format token-end)
  (let ((token-start (match-beginning 0))
(subst (match-string-no-properties 1 format)))
(push (substring format token-end token-start) result)
(push (if (equal subst "%")
  "%"
(or (cdr (assoc subst specification))
  (error "Unknown substitution: '%s%s'"
   (substring format 0 token-start)
   (substring format token-start nil
  result))
  (setq token-end (match-end 0)))
(push (substring format token-end nil) result)

To my surprise neither ^ nor \\` in string-match regexp works if 
start-pos is not zero.
diff --git a/testing/lisp/test-org.el b/testing/lisp/test-org.el
index 78cd29576..b6e42dc99 100644
--- a/testing/lisp/test-org.el
+++ b/testing/lisp/test-org.el
@@ -8236,6 +8236,72 @@ two
  (call-interactively #'org-paste-subtree)
  (buffer-string)
 
+(ert-deftest org-test/org--open-file-format-spec ()
+  "Test `org-open-file' helper `org--open-file-format-spec'."
+  (let ((def-spec '(("s" . "file.pdf") ("1" . "10") ("2" . "pattern"
+(should (equal "zathura --page 10 --find pattern file.pdf"
+		   (org--open-file-format-spec
+		"zathura --page %1 --find %2 %s" def-spec)))
+(should (equal "no subst"
+		   (org--open-file-format-spec
+		"no subst" def-spec)))
+(should (equal "simple file.pdf"
+		   (org--open-file-format-spec
+		"simple %s" def-spec)))
+(should (equal "with --page 10 file.pdf"
+		   (org--open-file-format-spec
+		  "with --page %1 %s" def-spec)))
+(should (equal "file.pdf at start"
+		 (org--open-file-format-spec
+		  "%s at start" def-spec)))
+(should (equal "in the file.pdf middle"
+		 (org--open-file-format-spec
+		  "in the %s middle" def-spec)))
+(should (equal "literal %"
+		   (org--open-file-format-spec
+		"literal %%" def-spec)))
+(should (equal "literal %s in the middle"
+		   (org--open-file-format-spec
+		"literal %%s in the middle" def-spec)))
+(should (equal "% literal at start"
+		 (org--open-file-format-spec
+		  "%% literal at start" def-spec)))
+(should (equal "" (org--open-file-format-spec "" def-spec)))
+(should (equal "adj

Re: greedy substitution in org-open-file

2021-03-03 Thread Maxim Nikulin

Discussion of the original patch:
https://orgmode.org/list/4b51d104.9090...@jboecker.de/T/#u
https://lists.gnu.org/archive/html/emacs-orgmode/2010-01/msg00450.html

https://orgmode.org/list/4bb9c078.9050...@jboecker.de/

The patch may make it a little easier to break things by
misconfiguring org-file-apps.





Re: greedy substitution in org-open-file

2021-02-15 Thread Maxim Nikulin

On 13/02/2021 11:38, Kyle Meyer wrote:


All right, here's
a format-spec-inspired fix.  At the very least it needs doc updates and
a comment or two.


Thank you. I am hardly familiar with elisp so it would be difficult for 
me to express the same. My comments are mostly a matter of taste.


Sorry, I have not tried to run the code yet.


diff --git a/lisp/org.el b/lisp/org.el
index 5b1443c4e..e8f60fd83 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -8644,6 +8644,23 @@ (defun org--file-apps-regexp-alist (list &optional 
add-auto-mode)
 (when add-auto-mode
   (mapcar (lambda (x) (cons (car x) 'emacs)) auto-mode-alist
  
+(defun org--open-file-format-spec (format specification)

+  (with-temp-buffer
+(insert format)
+(goto-char (point-min))
+(while (search-forward "%" nil t)
+  (cond ((eq (char-after) ?%)
+ (delete-char 1))
+((looking-at "[s0-9]")


Personally I do not see a reason to limit substitutions just to just 
"%s". I would consider "[a-zA-Z0-9]".



+ (replace-match
+  (or (cdr (assoc (match-string 0) specification))
+  (error "Invalid format string"))


I think, substitution character in the error message could be useful 
during debugging, something like (format "Invalid format character %%%s" 
(match-string 0)).



+  'fixed-case 'literal)
+ (delete-region (1- (match-beginning 0)) (match-beginning 0)))
+(t
+ (error "Invalid format string"
+(buffer-string)))
+
  ;;;###autoload
  (defun org-open-file (path &optional in-emacs line search)
"Open the file at PATH.
@@ -8745,24 +8762,20 @@ (defun org-open-file (path &optional in-emacs line 
search)
;; Remove quotes around the file name - we'll use shell-quote-argument.
(while (string-match "['\"]%s['\"]" cmd)
(setq cmd (replace-match "%s" t t cmd)))
-  (setq cmd (replace-regexp-in-string
-"%s"
-(shell-quote-argument (convert-standard-filename file))
-cmd
-nil t))
-
-  ;; Replace "%1", "%2" etc. in command with group matches from regex
-  (save-match-data
-   (let ((match-index 1)
- (number-of-groups (- (/ (length link-match-data) 2) 1)))
- (set-match-data link-match-data)
- (while (<= match-index number-of-groups)
-   (let ((regex (concat "%" (number-to-string match-index)))
- (replace-with (match-string match-index dlink)))
- (while (string-match regex cmd)
-   (setq cmd (replace-match replace-with t t cmd
-   (setq match-index (+ match-index 1)
-
+  (setq cmd
+(org--open-file-format-spec
+ cmd
+ (cons
+  (cons "s" (shell-quote-argument
+ (convert-standard-filename file)))
+  (let ((ngroups (- (/ (length link-match-data) 2) 1)))
+(and (> ngroups 0)
+ (progn
+   (set-match-data link-match-data)
+   (mapcar (lambda (n)
+ (cons (number-to-string n)
+   (match-string-no-properties n dlink)))


Should not be numbered substitutions passed through shell-quote-argument 
as well? Besides just page numbers the link could be named internal 
anchor where more characters are allowed. It is the reason why in 
general I prefer bare exec to shell commands.


I am unsure concerning optional matches as

"\\(?:\\.pdf\\)\\(?:::\\([0-9]+\\)\\)?\\'"

Maybe they should not be used at all in favor of separate entries with 
and without page number. Maybe nil should coalesce to empty string "". 
Certainly I am against "nil" string for a missed group. By the way, in 
the original format-spec, cdr is applied after the check whether assoc 
value is nil.



+   (number-sequence 1 ngroups
(save-window-excursion
(message "Running %s...done" cmd)
(start-process-shell-command cmd nil cmd)







Re: greedy substitution in org-open-file

2021-02-12 Thread Kyle Meyer
Maxim Nikulin writes:

> On 12/02/2021 14:16, Kyle Meyer wrote:

>> Not relevant for the underlying issue, but doesn't xpdf require a colon
>> before the page number (i.e. ":%1")?
>
> At least for the application in debian & ubuntu xpdf package, page 
> number should be specified without a colon. It is Xt interface to 
> poppler PDF library, recently its maintainer decided to switch to 
> xpopple project as upstream. UI is derived from old version of xpdf. 
> Latest original xpdf version is based on Qt and might have different 
> convention in respect to page numbers.

Okay.  Fwiw the xpdf version I have that requires ":" before the page is
4.03.

>> What about flipping the processing, handling the %N placeholders first
>> and then formatting the file name?  Seems to work on my end, though I
>> haven't tested it thoroughly.
>
> I could anticipate similar problems if named destinations are involved. 
> I have not checked but I expect that internal links might have "%s" in 
> their names at least for some file types.

Indeed, flipping the order unsurprisingly just flips which placeholders
can be problematic.  A very contrived example:

  (setq org-file-apps
'(("\\.pdf::\\([A-z%]+\\)\\'" . "doesntmatter %s %1")))

  ;; file:/tmp/test.pdf::a%sb =>
  ;; Running doesntmatter /tmp/test.pdf a/tmp/test.pdfb...done

> That is why I would strongly prefer substitutions performed in a
> single pass. I do not like it, but it seems that simplified variant of
> format-spec is better. It should allows substitutions with digit. I
> hope, single digit should be enough.

True, realistically I don't think anyone has a command in org-file-apps
that relies on more than a couple of capture groups.  All right, here's
a format-spec-inspired fix.  At the very least it needs doc updates and
a comment or two.

diff --git a/lisp/org.el b/lisp/org.el
index 5b1443c4e..e8f60fd83 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -8644,6 +8644,23 @@ (defun org--file-apps-regexp-alist (list &optional 
add-auto-mode)
(when add-auto-mode
  (mapcar (lambda (x) (cons (car x) 'emacs)) auto-mode-alist
 
+(defun org--open-file-format-spec (format specification)
+  (with-temp-buffer
+(insert format)
+(goto-char (point-min))
+(while (search-forward "%" nil t)
+  (cond ((eq (char-after) ?%)
+ (delete-char 1))
+((looking-at "[s0-9]")
+ (replace-match
+  (or (cdr (assoc (match-string 0) specification))
+  (error "Invalid format string"))
+  'fixed-case 'literal)
+ (delete-region (1- (match-beginning 0)) (match-beginning 0)))
+(t
+ (error "Invalid format string"
+(buffer-string)))
+
 ;;;###autoload
 (defun org-open-file (path &optional in-emacs line search)
   "Open the file at PATH.
@@ -8745,24 +8762,20 @@ (defun org-open-file (path &optional in-emacs line 
search)
   ;; Remove quotes around the file name - we'll use shell-quote-argument.
   (while (string-match "['\"]%s['\"]" cmd)
(setq cmd (replace-match "%s" t t cmd)))
-  (setq cmd (replace-regexp-in-string
-"%s"
-(shell-quote-argument (convert-standard-filename file))
-cmd
-nil t))
-
-  ;; Replace "%1", "%2" etc. in command with group matches from regex
-  (save-match-data
-   (let ((match-index 1)
- (number-of-groups (- (/ (length link-match-data) 2) 1)))
- (set-match-data link-match-data)
- (while (<= match-index number-of-groups)
-   (let ((regex (concat "%" (number-to-string match-index)))
- (replace-with (match-string match-index dlink)))
- (while (string-match regex cmd)
-   (setq cmd (replace-match replace-with t t cmd
-   (setq match-index (+ match-index 1)
-
+  (setq cmd
+(org--open-file-format-spec
+ cmd
+ (cons
+  (cons "s" (shell-quote-argument
+ (convert-standard-filename file)))
+  (let ((ngroups (- (/ (length link-match-data) 2) 1)))
+(and (> ngroups 0)
+ (progn
+   (set-match-data link-match-data)
+   (mapcar (lambda (n)
+ (cons (number-to-string n)
+   (match-string-no-properties n dlink)))
+   (number-sequence 1 ngroups
   (save-window-excursion
(message "Running %s...done" cmd)
(start-process-shell-command cmd nil cmd)



Re: greedy substitution in org-open-file

2021-02-12 Thread Maxim Nikulin

On 12/02/2021 14:16, Kyle Meyer wrote:

#+begin_src elisp
(setq org-file-apps '(("\\.pdf::\\([0-9]+\\)\\'" . "xpdf %s %1")))
#+end_src


Not relevant for the underlying issue, but doesn't xpdf require a colon
before the page number (i.e. ":%1")?


At least for the application in debian & ubuntu xpdf package, page 
number should be specified without a colon. It is Xt interface to 
poppler PDF library, recently its maintainer decided to switch to 
xpopple project as upstream. UI is derived from old version of xpdf. 
Latest original xpdf version is based on Qt and might have different 
convention in respect to page numbers.



I believe format-spec requires the placeholder to be A-z:

   (format-spec "xpdf %s" '((?s . "a")))  => "xpdf a"
   (format-spec "xpdf %s %1" '((?s . "a") (?1 . "b")))  ;; Invalid format string


You are right. I missed that format-spec allows to specify field width, 
so digits could not be used.



What about flipping the processing, handling the %N placeholders first
and then formatting the file name?  Seems to work on my end, though I
haven't tested it thoroughly.


I could anticipate similar problems if named destinations are involved. 
I have not checked but I expect that internal links might have "%s" in 
their names at least for some file types. That is why I would strongly 
prefer substitutions performed in a single pass. I do not like it, but 
it seems that simplified variant of format-spec is better. It should 
allows substitutions with digit. I hope, single digit should be enough.





Re: greedy substitution in org-open-file

2021-02-11 Thread Kyle Meyer
Maxim Nikulin writes:

> Looking into the code related to 'pty problem with 
> start-process-shell-command, I have realized that the following case is 
> not handled correctly:
>
> #+begin_src elisp
>(setq org-file-apps '(("\\.pdf::\\([0-9]+\\)\\'" . "xpdf %s %1")))
> #+end_src

Not relevant for the underlying issue, but doesn't xpdf require a colon
before the page number (i.e. ":%1")?

> I hope, I adapted an example from [[help:org-file-apps]] correctly. When 
> I try to open the following link using C-c C-o
>
> [[file:test-%1f.pdf::42]]
>
> I get
>
> Running xpdf /home/ubuntu/examples/org/test-\42f.pdf 42...done
>
> I believe, it should be
>
> Running xpdf /home/ubuntu/examples/org/test-%1f.pdf 42...done
>
> Or
>
> Running xpdf /home/ubuntu/examples/org/test-\%1f.pdf 42...done

Yep, it should show up as "Running xpdf /tmp/test-\%1f.pdf :42...done",
with the backslash coming from shell-quote-argument.

> The source of the problem is that %s substitution is expanded at first 
> and regexp groups are replaced later. Ideally, it should be performed in 
> a single pass. I have found format-spec function but I am unsure 
> concerning it, maybe it is avoided intentionally.

I believe format-spec requires the placeholder to be A-z:

  (format-spec "xpdf %s" '((?s . "a")))  => "xpdf a"
  (format-spec "xpdf %s %1" '((?s . "a") (?1 . "b")))  ;; Invalid format string

What about flipping the processing, handling the %N placeholders first
and then formatting the file name?  Seems to work on my end, though I
haven't tested it thoroughly.

diff --git a/lisp/org.el b/lisp/org.el
index c61b8fb56..31dbe352a 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -8714,12 +8714,6 @@ (defun org-open-file (path &optional in-emacs line 
search)
   ;; Remove quotes around the file name - we'll use shell-quote-argument.
   (while (string-match "['\"]%s['\"]" cmd)
(setq cmd (replace-match "%s" t t cmd)))
-  (setq cmd (replace-regexp-in-string
-"%s"
-(shell-quote-argument (convert-standard-filename file))
-cmd
-nil t))
-
   ;; Replace "%1", "%2" etc. in command with group matches from regex
   (save-match-data
(let ((match-index 1)
@@ -8731,7 +8725,11 @@ (defun org-open-file (path &optional in-emacs line 
search)
  (while (string-match regex cmd)
(setq cmd (replace-match replace-with t t cmd
(setq match-index (+ match-index 1)
-
+  (setq cmd (replace-regexp-in-string
+"%s"
+(shell-quote-argument (convert-standard-filename file))
+cmd
+nil t))
   (save-window-excursion
(message "Running %s...done" cmd)
(start-process-shell-command cmd nil cmd)




greedy substitution in org-open-file

2021-01-20 Thread Maxim Nikulin
Looking into the code related to 'pty problem with 
start-process-shell-command, I have realized that the following case is 
not handled correctly:


#+begin_src elisp
  (setq org-file-apps '(("\\.pdf::\\([0-9]+\\)\\'" . "xpdf %s %1")))
#+end_src

I hope, I adapted an example from [[help:org-file-apps]] correctly. When 
I try to open the following link using C-c C-o


[[file:test-%1f.pdf::42]]

I get

Running xpdf /home/ubuntu/examples/org/test-\42f.pdf 42...done

I believe, it should be

Running xpdf /home/ubuntu/examples/org/test-%1f.pdf 42...done

Or

Running xpdf /home/ubuntu/examples/org/test-\%1f.pdf 42...done

Org mode version 9.4.4 (release_9.4.4-164-g7a9a8a

The source of the problem is that %s substitution is expanded at first 
and regexp groups are replaced later. Ideally, it should be performed in 
a single pass. I have found format-spec function but I am unsure 
concerning it, maybe it is avoided intentionally.