Re: greedy substitution in org-open-file
On 13/02/2021 11:38, Kyle Meyer wrote: +(defun org--open-file-format-spec (format specification) + (with-temp-buffer +(insert format) +(goto-char (point-min)) +(while (search-forward "%" nil t) + (cond ((eq (char-after) ?%) + (delete-char 1)) +((looking-at "[s0-9]") + (replace-match + (or (cdr (assoc (match-string 0) specification)) + (error "Invalid format string")) + 'fixed-case 'literal) + (delete-region (1- (match-beginning 0)) (match-beginning 0))) Finally I managed to convince myself that delete-region does not change position in the buffer, so "%s" or "%1" in specification are not a problem. I am aware that this implementation is a simplified version of format-spec, but I am still unsure if jumping over a buffer is the best choice. I am in doubts if strings or characters should be used in the spec: '(("s" . "file.pdf")) vs. '((?s . "file.pdf")), but really it does not matter. I have created some tests for this function, see the attachment. Actually I do not like such style of tests since first failure stops whole test and it is hard to get general impression to which degree the function under the test is broken, but I am not aware of a better way. Recently I asked Ihor a similar question: https://orgmode.org/list/s324b0$74g$1...@ciao.gmane.io I know that functions called from one point are not in favor in org sources, but I do not mind to have additional helper function to add tests that all substitutions are properly escaped. + (let ((ngroups (- (/ (length link-match-data) 2) 1))) +(and (> ngroups 0) + (progn + (set-match-data link-match-data) + (mapcar (lambda (n) + (cons (number-to-string n) + (match-string-no-properties n dlink))) + (number-sequence 1 ngroups Matter of taste: it seems that with (number-sequence 1 ngroups 1)) it is possible to avoid (> ngroups 0). I have spent some time evaluating how to make errors more helpful to users. I am unsure if multiline message is acceptable to dump content of specification. For a while, a place in the format where error has happened (combined with a different approach to parse format string) (defun org--open-file-format-spec (format specification) (apply #'concat (nreverse (let ((result nil) (token-end 0)) (while (string-match "%\\(.?\\)" format token-end) (let ((token-start (match-beginning 0)) (subst (match-string-no-properties 1 format))) (push (substring format token-end token-start) result) (push (if (equal subst "%") "%" (or (cdr (assoc subst specification)) (error "Unknown substitution: '%s%s'" (substring format 0 token-start) (substring format token-start nil result)) (setq token-end (match-end 0))) (push (substring format token-end nil) result) To my surprise neither ^ nor \\` in string-match regexp works if start-pos is not zero. diff --git a/testing/lisp/test-org.el b/testing/lisp/test-org.el index 78cd29576..b6e42dc99 100644 --- a/testing/lisp/test-org.el +++ b/testing/lisp/test-org.el @@ -8236,6 +8236,72 @@ two (call-interactively #'org-paste-subtree) (buffer-string) +(ert-deftest org-test/org--open-file-format-spec () + "Test `org-open-file' helper `org--open-file-format-spec'." + (let ((def-spec '(("s" . "file.pdf") ("1" . "10") ("2" . "pattern" +(should (equal "zathura --page 10 --find pattern file.pdf" + (org--open-file-format-spec + "zathura --page %1 --find %2 %s" def-spec))) +(should (equal "no subst" + (org--open-file-format-spec + "no subst" def-spec))) +(should (equal "simple file.pdf" + (org--open-file-format-spec + "simple %s" def-spec))) +(should (equal "with --page 10 file.pdf" + (org--open-file-format-spec + "with --page %1 %s" def-spec))) +(should (equal "file.pdf at start" + (org--open-file-format-spec + "%s at start" def-spec))) +(should (equal "in the file.pdf middle" + (org--open-file-format-spec + "in the %s middle" def-spec))) +(should (equal "literal %" + (org--open-file-format-spec + "literal %%" def-spec))) +(should (equal "literal %s in the middle" + (org--open-file-format-spec + "literal %%s in the middle" def-spec))) +(should (equal "% literal at start" + (org--open-file-format-spec + "%% literal at start" def-spec))) +(should (equal "" (org--open-file-format-spec "" def-spec))) +(should (equal "adj
Re: greedy substitution in org-open-file
Discussion of the original patch: https://orgmode.org/list/4b51d104.9090...@jboecker.de/T/#u https://lists.gnu.org/archive/html/emacs-orgmode/2010-01/msg00450.html https://orgmode.org/list/4bb9c078.9050...@jboecker.de/ The patch may make it a little easier to break things by misconfiguring org-file-apps.
Re: greedy substitution in org-open-file
On 13/02/2021 11:38, Kyle Meyer wrote: All right, here's a format-spec-inspired fix. At the very least it needs doc updates and a comment or two. Thank you. I am hardly familiar with elisp so it would be difficult for me to express the same. My comments are mostly a matter of taste. Sorry, I have not tried to run the code yet. diff --git a/lisp/org.el b/lisp/org.el index 5b1443c4e..e8f60fd83 100644 --- a/lisp/org.el +++ b/lisp/org.el @@ -8644,6 +8644,23 @@ (defun org--file-apps-regexp-alist (list &optional add-auto-mode) (when add-auto-mode (mapcar (lambda (x) (cons (car x) 'emacs)) auto-mode-alist +(defun org--open-file-format-spec (format specification) + (with-temp-buffer +(insert format) +(goto-char (point-min)) +(while (search-forward "%" nil t) + (cond ((eq (char-after) ?%) + (delete-char 1)) +((looking-at "[s0-9]") Personally I do not see a reason to limit substitutions just to just "%s". I would consider "[a-zA-Z0-9]". + (replace-match + (or (cdr (assoc (match-string 0) specification)) + (error "Invalid format string")) I think, substitution character in the error message could be useful during debugging, something like (format "Invalid format character %%%s" (match-string 0)). + 'fixed-case 'literal) + (delete-region (1- (match-beginning 0)) (match-beginning 0))) +(t + (error "Invalid format string" +(buffer-string))) + ;;;###autoload (defun org-open-file (path &optional in-emacs line search) "Open the file at PATH. @@ -8745,24 +8762,20 @@ (defun org-open-file (path &optional in-emacs line search) ;; Remove quotes around the file name - we'll use shell-quote-argument. (while (string-match "['\"]%s['\"]" cmd) (setq cmd (replace-match "%s" t t cmd))) - (setq cmd (replace-regexp-in-string -"%s" -(shell-quote-argument (convert-standard-filename file)) -cmd -nil t)) - - ;; Replace "%1", "%2" etc. in command with group matches from regex - (save-match-data - (let ((match-index 1) - (number-of-groups (- (/ (length link-match-data) 2) 1))) - (set-match-data link-match-data) - (while (<= match-index number-of-groups) - (let ((regex (concat "%" (number-to-string match-index))) - (replace-with (match-string match-index dlink))) - (while (string-match regex cmd) - (setq cmd (replace-match replace-with t t cmd - (setq match-index (+ match-index 1) - + (setq cmd +(org--open-file-format-spec + cmd + (cons + (cons "s" (shell-quote-argument + (convert-standard-filename file))) + (let ((ngroups (- (/ (length link-match-data) 2) 1))) +(and (> ngroups 0) + (progn + (set-match-data link-match-data) + (mapcar (lambda (n) + (cons (number-to-string n) + (match-string-no-properties n dlink))) Should not be numbered substitutions passed through shell-quote-argument as well? Besides just page numbers the link could be named internal anchor where more characters are allowed. It is the reason why in general I prefer bare exec to shell commands. I am unsure concerning optional matches as "\\(?:\\.pdf\\)\\(?:::\\([0-9]+\\)\\)?\\'" Maybe they should not be used at all in favor of separate entries with and without page number. Maybe nil should coalesce to empty string "". Certainly I am against "nil" string for a missed group. By the way, in the original format-spec, cdr is applied after the check whether assoc value is nil. + (number-sequence 1 ngroups (save-window-excursion (message "Running %s...done" cmd) (start-process-shell-command cmd nil cmd)
Re: greedy substitution in org-open-file
Maxim Nikulin writes: > On 12/02/2021 14:16, Kyle Meyer wrote: >> Not relevant for the underlying issue, but doesn't xpdf require a colon >> before the page number (i.e. ":%1")? > > At least for the application in debian & ubuntu xpdf package, page > number should be specified without a colon. It is Xt interface to > poppler PDF library, recently its maintainer decided to switch to > xpopple project as upstream. UI is derived from old version of xpdf. > Latest original xpdf version is based on Qt and might have different > convention in respect to page numbers. Okay. Fwiw the xpdf version I have that requires ":" before the page is 4.03. >> What about flipping the processing, handling the %N placeholders first >> and then formatting the file name? Seems to work on my end, though I >> haven't tested it thoroughly. > > I could anticipate similar problems if named destinations are involved. > I have not checked but I expect that internal links might have "%s" in > their names at least for some file types. Indeed, flipping the order unsurprisingly just flips which placeholders can be problematic. A very contrived example: (setq org-file-apps '(("\\.pdf::\\([A-z%]+\\)\\'" . "doesntmatter %s %1"))) ;; file:/tmp/test.pdf::a%sb => ;; Running doesntmatter /tmp/test.pdf a/tmp/test.pdfb...done > That is why I would strongly prefer substitutions performed in a > single pass. I do not like it, but it seems that simplified variant of > format-spec is better. It should allows substitutions with digit. I > hope, single digit should be enough. True, realistically I don't think anyone has a command in org-file-apps that relies on more than a couple of capture groups. All right, here's a format-spec-inspired fix. At the very least it needs doc updates and a comment or two. diff --git a/lisp/org.el b/lisp/org.el index 5b1443c4e..e8f60fd83 100644 --- a/lisp/org.el +++ b/lisp/org.el @@ -8644,6 +8644,23 @@ (defun org--file-apps-regexp-alist (list &optional add-auto-mode) (when add-auto-mode (mapcar (lambda (x) (cons (car x) 'emacs)) auto-mode-alist +(defun org--open-file-format-spec (format specification) + (with-temp-buffer +(insert format) +(goto-char (point-min)) +(while (search-forward "%" nil t) + (cond ((eq (char-after) ?%) + (delete-char 1)) +((looking-at "[s0-9]") + (replace-match + (or (cdr (assoc (match-string 0) specification)) + (error "Invalid format string")) + 'fixed-case 'literal) + (delete-region (1- (match-beginning 0)) (match-beginning 0))) +(t + (error "Invalid format string" +(buffer-string))) + ;;;###autoload (defun org-open-file (path &optional in-emacs line search) "Open the file at PATH. @@ -8745,24 +8762,20 @@ (defun org-open-file (path &optional in-emacs line search) ;; Remove quotes around the file name - we'll use shell-quote-argument. (while (string-match "['\"]%s['\"]" cmd) (setq cmd (replace-match "%s" t t cmd))) - (setq cmd (replace-regexp-in-string -"%s" -(shell-quote-argument (convert-standard-filename file)) -cmd -nil t)) - - ;; Replace "%1", "%2" etc. in command with group matches from regex - (save-match-data - (let ((match-index 1) - (number-of-groups (- (/ (length link-match-data) 2) 1))) - (set-match-data link-match-data) - (while (<= match-index number-of-groups) - (let ((regex (concat "%" (number-to-string match-index))) - (replace-with (match-string match-index dlink))) - (while (string-match regex cmd) - (setq cmd (replace-match replace-with t t cmd - (setq match-index (+ match-index 1) - + (setq cmd +(org--open-file-format-spec + cmd + (cons + (cons "s" (shell-quote-argument + (convert-standard-filename file))) + (let ((ngroups (- (/ (length link-match-data) 2) 1))) +(and (> ngroups 0) + (progn + (set-match-data link-match-data) + (mapcar (lambda (n) + (cons (number-to-string n) + (match-string-no-properties n dlink))) + (number-sequence 1 ngroups (save-window-excursion (message "Running %s...done" cmd) (start-process-shell-command cmd nil cmd)
Re: greedy substitution in org-open-file
On 12/02/2021 14:16, Kyle Meyer wrote: #+begin_src elisp (setq org-file-apps '(("\\.pdf::\\([0-9]+\\)\\'" . "xpdf %s %1"))) #+end_src Not relevant for the underlying issue, but doesn't xpdf require a colon before the page number (i.e. ":%1")? At least for the application in debian & ubuntu xpdf package, page number should be specified without a colon. It is Xt interface to poppler PDF library, recently its maintainer decided to switch to xpopple project as upstream. UI is derived from old version of xpdf. Latest original xpdf version is based on Qt and might have different convention in respect to page numbers. I believe format-spec requires the placeholder to be A-z: (format-spec "xpdf %s" '((?s . "a"))) => "xpdf a" (format-spec "xpdf %s %1" '((?s . "a") (?1 . "b"))) ;; Invalid format string You are right. I missed that format-spec allows to specify field width, so digits could not be used. What about flipping the processing, handling the %N placeholders first and then formatting the file name? Seems to work on my end, though I haven't tested it thoroughly. I could anticipate similar problems if named destinations are involved. I have not checked but I expect that internal links might have "%s" in their names at least for some file types. That is why I would strongly prefer substitutions performed in a single pass. I do not like it, but it seems that simplified variant of format-spec is better. It should allows substitutions with digit. I hope, single digit should be enough.
Re: greedy substitution in org-open-file
Maxim Nikulin writes: > Looking into the code related to 'pty problem with > start-process-shell-command, I have realized that the following case is > not handled correctly: > > #+begin_src elisp >(setq org-file-apps '(("\\.pdf::\\([0-9]+\\)\\'" . "xpdf %s %1"))) > #+end_src Not relevant for the underlying issue, but doesn't xpdf require a colon before the page number (i.e. ":%1")? > I hope, I adapted an example from [[help:org-file-apps]] correctly. When > I try to open the following link using C-c C-o > > [[file:test-%1f.pdf::42]] > > I get > > Running xpdf /home/ubuntu/examples/org/test-\42f.pdf 42...done > > I believe, it should be > > Running xpdf /home/ubuntu/examples/org/test-%1f.pdf 42...done > > Or > > Running xpdf /home/ubuntu/examples/org/test-\%1f.pdf 42...done Yep, it should show up as "Running xpdf /tmp/test-\%1f.pdf :42...done", with the backslash coming from shell-quote-argument. > The source of the problem is that %s substitution is expanded at first > and regexp groups are replaced later. Ideally, it should be performed in > a single pass. I have found format-spec function but I am unsure > concerning it, maybe it is avoided intentionally. I believe format-spec requires the placeholder to be A-z: (format-spec "xpdf %s" '((?s . "a"))) => "xpdf a" (format-spec "xpdf %s %1" '((?s . "a") (?1 . "b"))) ;; Invalid format string What about flipping the processing, handling the %N placeholders first and then formatting the file name? Seems to work on my end, though I haven't tested it thoroughly. diff --git a/lisp/org.el b/lisp/org.el index c61b8fb56..31dbe352a 100644 --- a/lisp/org.el +++ b/lisp/org.el @@ -8714,12 +8714,6 @@ (defun org-open-file (path &optional in-emacs line search) ;; Remove quotes around the file name - we'll use shell-quote-argument. (while (string-match "['\"]%s['\"]" cmd) (setq cmd (replace-match "%s" t t cmd))) - (setq cmd (replace-regexp-in-string -"%s" -(shell-quote-argument (convert-standard-filename file)) -cmd -nil t)) - ;; Replace "%1", "%2" etc. in command with group matches from regex (save-match-data (let ((match-index 1) @@ -8731,7 +8725,11 @@ (defun org-open-file (path &optional in-emacs line search) (while (string-match regex cmd) (setq cmd (replace-match replace-with t t cmd (setq match-index (+ match-index 1) - + (setq cmd (replace-regexp-in-string +"%s" +(shell-quote-argument (convert-standard-filename file)) +cmd +nil t)) (save-window-excursion (message "Running %s...done" cmd) (start-process-shell-command cmd nil cmd)
greedy substitution in org-open-file
Looking into the code related to 'pty problem with start-process-shell-command, I have realized that the following case is not handled correctly: #+begin_src elisp (setq org-file-apps '(("\\.pdf::\\([0-9]+\\)\\'" . "xpdf %s %1"))) #+end_src I hope, I adapted an example from [[help:org-file-apps]] correctly. When I try to open the following link using C-c C-o [[file:test-%1f.pdf::42]] I get Running xpdf /home/ubuntu/examples/org/test-\42f.pdf 42...done I believe, it should be Running xpdf /home/ubuntu/examples/org/test-%1f.pdf 42...done Or Running xpdf /home/ubuntu/examples/org/test-\%1f.pdf 42...done Org mode version 9.4.4 (release_9.4.4-164-g7a9a8a The source of the problem is that %s substitution is expanded at first and regexp groups are replaced later. Ideally, it should be performed in a single pass. I have found format-spec function but I am unsure concerning it, maybe it is avoided intentionally.