Dear AUCTeX developers,

I have some problems with how preview-latex handles coding systems
when I use Japanese LaTeX. Since recent TeX Live releases include
Japanese LaTeX by default, I suppose that non-Japanese users can
reproduce the problems as well if sample files are provided. So I have
organized this email into the following three parts:
A. The problems are described with the attached sample files so that
   anyone can reproduce the situation and examine what is going on in
   detail.
B. The reasons for the problems are explained, and tentative fixes are
   proposed in the attached patches.
C. The patches in B. fix the problems only partially. The remaining
   problem is described, and I ask for help with it.

A. There are two problems. I will describe them in order.

A-1.
How to reproduce:
(1) Start a new emacs session with
      env LC_ALL=ja_JP.SJIS emacs &
    and enable preview-latex.
(2) Open the attached file "preview-error-test.tex", which has many
    \section lines. They are all commented out initially.
(3) Uncomment any one of them and start preview-latex with
    C-c C-p C-d. Answer n to the "Cache preamble?" question. Then the
    error or bad result described on the line following the
    uncommented \section will occur, e.g.
      Invalid regexp: "Unmatched ( or \\("
(4) Comment out that \section line again, uncomment another \section
    line, and try C-c C-p C-d again. A different error will appear.
(5) Repeat the procedure described in (4).

Step (3) will not work if your TeX distribution lacks the Japanese
LaTeX command binary "platex". In that case, please check the
following list.
o Be sure to install TeX Live. Other TeX distributions usually lack
  the Japanese TeX engines.
o If you (or the package manager you are using) did not select a large
  enough scheme when installing TeX Live, the Japanese LaTeX suite is
  not present on your machine.
o Japanese TeX was first included in TeX Live several years ago. Thus
  if your TeX Live is older than that, Japanese LaTeX is not available.
o If your ghostscript is not configured to handle PS files with
  Japanese fonts, the characters in the preview image may be garbled.
  However, that is not the point I am making here. Rather, it is the
  error in the regexp match, which prevents preview-latex from doing
  its job, that I would like you to look at.

A-2.
How to reproduce:
(1) This time, start a new emacs session with another locale
      env LC_ALL=ja_JP.eucJP emacs &
    and enable preview-latex.
(2) Open the attached file "preview-error-test2.tex" and type
    C-c C-p C-d. This time, answer y, not n, to the "Cache preamble?"
    question.
(3) The preview image will then appear at the wrong position.

This example requires the `platex' binary, too.

B. The reasons for the problems and tentative fixes.

B-1. Shift-JIS encoding problem.
The bad results demonstrated in A-1 are caused by the nature of the
coding system `japanese-shift-jis' (SJIS for short). SJIS is one of
the major encodings for Japanese text and, for historical reasons, the
standard encoding in the Japanese edition of Windows.

Basically, SJIS represents one Japanese character by two bytes.
Examples of such two-byte sequences are, in hexadecimal form:
  8E 82   and   81 5B .
While the first byte of the sequence is always 8-bit (MSB on), the
second is not necessarily so. In the two examples above, the second
byte of the first example (82) is 8-bit, but that of the second
example (5B) is 7-bit (MSB off). It is this 7-bit byte that causes the
problems in A-1 above.

Unfortunately, this 7-bit byte sometimes coincides with a regexp meta
character, so `regexp-quote' in the function `preview-error-quote'
treats it as one and inserts a backslash in front of it.

Roughly speaking, `preview-error-quote' works along this flow:
1. Encodes the string in the given coding system (i.e., SJIS in this
   example).
2. Replaces each text that begins with "^^" with the corresponding
   byte.
3. Builds a regular expression, used later to locate the position in
   the buffer where the preview image is put, guarding the meta
   characters in the original text with `regexp-quote'.
4. Decodes the resulting string back out of the coding system.

However, when `regexp-quote' in item 3 quotes the 7-bit byte of an
SJIS character, decoding back no longer yields the original character.
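
The byte-level effect can also be checked outside Emacs. What follows
is only a sketch in Python (chosen so that it runs without Emacs), not
preview-latex code. `emacs_regexp_quote', `quote_via_bytes' and
`trail_byte_collisions' are hypothetical helper names; the first one
is a stand-in that escapes the characters Emacs's `regexp-quote'
treats as special. The sample strings are taken from the attached test
file and from the SJIS bytes discussed above.

```python
# Sketch only: mimics steps 1, 3 and 4 of the flow above on raw bytes.
# All helper names are hypothetical; none of this is AUCTeX code.

SPECIAL = b"[*.\\?+^$"  # characters that Emacs `regexp-quote' escapes


def emacs_regexp_quote(data: bytes) -> bytes:
    """Backslash-escape regexp-special bytes, one byte at a time."""
    out = bytearray()
    for b in data:
        if b in SPECIAL:
            out.append(0x5C)  # insert a backslash before the special byte
        out.append(b)
    return bytes(out)


def quote_via_bytes(text: str, encoding: str = "shift_jis") -> str:
    """Encode, quote the encoded bytes, decode again (the failing order)."""
    quoted = emacs_regexp_quote(text.encode(encoding))
    return quoted.decode(encoding, errors="replace")


def trail_byte_collisions(text: str, encoding: str = "shift_jis"):
    """List characters whose second byte is a regexp-special ASCII byte."""
    hits = []
    for ch in text:
        raw = ch.encode(encoding)
        if len(raw) == 2 and raw[1] in SPECIAL:
            hits.append((ch, raw.hex()))
    return hits


if __name__ == "__main__":
    # Characters mentioned in the sample file, plus one harmless one.
    print(trail_byte_collisions("表予能ー型あ"))
    for sample in ["タ", "表(1)", "アース"]:
        print(repr(sample), "->", repr(quote_via_bytes(sample)))
```

For "表(1)" the round trip yields 表 followed by a backslash and "(1)",
which is consistent with the "Unmatched ( or \\(" error listed for
that line in the sample file.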
The following example illustrates what is going on:

  (let* ((s1 (char-to-string (make-char 'japanese-jisx0208 37 63)))
         ;; s1 is a multibyte Japanese string.
         ;; Encode s1 in SJIS.
         (s2 (encode-coding-string s1 'shift_jis))
         ;; At this point s2 is "\203^".
         (s3 (regexp-quote s2))
         ;; Now s3 is "\203\\^".
         ;; Then decode back assuming SJIS encoding.
         (s4 (decode-coding-string s3 'shift_jis)))
    (string-equal s1 s4))
  => nil ;; no longer goes back to the original string s1.

The attached patch "preview-latex-fix" is my approach to fixing this
problem. It avoids handling the encoded string and performs the
relevant operations consistently on the decoded string.
(In addition, it fixes a problem that `char-to-string' in the original
code does not do the expected job in unicode-based emacs for chars
#x80 through #xFF. I changed the code to use `byte-to-string' instead
when that function is available.)

B-2. preview-latex drops a necessary command option.
The Japanese TeX commands sometimes need the "-kanji" option to know
the coding system of the given TeX file. In AUCTeX, this requirement
is usually covered by the "%(kanjiopt)" construct in the following
lines quoted from tex-jp.el:

  (setq TeX-engine-alist-builtin
        (append TeX-engine-alist-builtin
                '((ptex "pTeX" "ptex %(kanjiopt)" "platex %(kanjiopt)" "eptex")
                  (jtex "jTeX" "jtex" "jlatex" nil)
                  (uptex "upTeX" "euptex" "uplatex" "euptex"))))

This "%(kanjiopt)" is replaced with a suitable option string like
"-kanji XXX" when necessary.

However, if the answer to the "Cache preamble?" question is y,
preview-latex drops this option, which leads to the result described
in A-2 above.

The option "-kanji XXX" goes missing because
`TeX-inline-preview-internal' transforms the command line passed to
the OS shell by
`(preview-do-replacements command preview-undump-replacements)'
when caching the preamble is enabled.
Here the regular expression in `preview-undump-replacements' is
designed to pick up only the very first word of the value of the
variable `command', so the option "-kanji XXX" is left behind.

The attached patch "preview-latex-fix2" aims to resolve this problem.
It restores the latex command options given in the entry that
`(TeX-engine-alist)' returns, so that the command runs smoothly.

C. Call for help
Some problems still remain. I think we should have an integrated
framework that serves both preview-latex and tex-jp.el in determining
the suitable process coding system.

The coding systems used to communicate with the Japanese TeX commands
are not constant but vary with the environment. In fact, they can only
be determined at run time. Currently that situation is handled during
normal runs by the function `japanese-TeX-set-process-coding-system'
in tex-jp.el. That function is set as the value of
`TeX-after-start-process-function' and is called after the TeX process
starts. In that way, the process coding systems are set to values
suitable for the environment at that point in time.

However, the way preview-latex handles process coding systems
sometimes conflicts with that setting. For example,
`TeX-inline-preview-internal' overwrites the process coding system
after `japanese-TeX-set-process-coding-system' has done its job.
(Current preview-latex uses the value of
`TeX-japanese-process-output-coding-system', but relying on such a
constant value is not sufficient. In fact, the default value of
`TeX-japanese-process-output-coding-system' was changed to nil
recently.)

Even my patch "preview-latex-fix" is not sufficient on this point. The
coding-system argument supplied to `decode-coding-string' should not
simply be `buffer-file-coding-system'.

I would appreciate it if anyone with deeper knowledge of AUCTeX could
help to resolve all these coding system issues in preview-latex.

Best regards,
Ikumi Keita

P.S.
I subscribed to auctex-devel ML temporarily, so it is not necessary to put me on CC: when replying. I will stay on the ML until the discussion about this issue is settled.
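
Addendum to B-1: the ordering that "preview-latex-fix" aims for (turn
^^xx sequences back into bytes, decode, and only then quote) can be
sketched outside Emacs as well. The Python below is an illustration
under stated assumptions, not the code in the patch. All function
names are hypothetical, the log fragments used as input are made up,
and the patch's special handling of control sequences such as ^^I is
left out.

```python
import re

# Sketch only; every name here is hypothetical, not AUCTeX API.

_CARET_HEX = re.compile(rb"\^\^([89a-f][0-9a-f])")  # ^^80 .. ^^ff
_ASCII_RUN = re.compile(r"[\x00-\x7f]+")
_SPECIAL = "[*.\\?+^$"  # characters that Emacs `regexp-quote' escapes


def convert_caret_hex(segment: str) -> bytes:
    """Turn ^^80..^^ff in an ASCII segment into the raw bytes they denote."""
    raw = segment.encode("ascii")
    return _CARET_HEX.sub(lambda m: bytes([int(m.group(1), 16)]), raw)


def decode_log_text(text: str, encoding: str) -> str:
    """Convert and decode each ASCII run; leave already-decoded text alone."""
    return _ASCII_RUN.sub(
        lambda m: convert_caret_hex(m.group(0)).decode(encoding), text
    )


def emacs_regexp_quote(text: str) -> str:
    """Backslash-escape regexp-special characters in decoded text."""
    return "".join("\\" + ch if ch in _SPECIAL else ch for ch in text)


def quote_decoded(text: str, encoding: str = "shift_jis") -> str:
    """Decode first, quote afterwards (the order the patch aims for)."""
    return emacs_regexp_quote(decode_log_text(text, encoding))


if __name__ == "__main__":
    # Hypothetical log fragments: 0x95 shown as ^^95, 0x5c as a backslash.
    for fragment in ["^^95\\(1)", "^^83^", "a.b"]:
        print(repr(fragment), "->", repr(quote_decoded(fragment)))
```

Because quoting happens on characters rather than on encoded bytes,
the second byte of a two-byte character is never seen by the quoting
step.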
\documentclass{jarticle}
\begin{document}
% How to see the errors or unexpected result:
% Uncomment the each line of \section macro and enable
% preview-latex with C-c C-p C-d.  Answer with n to "Cache preamble?"
% question.  Then the error or unexpected result described on its next
% line will occur.

% The chars 表, 予 and 能 contain 0x5c backslash in the shift jis encoding.
%\section{表(1)}
% error in process sentinel: Invalid regexp: "Unmatched ( or \\("
%\section{予{a}}
% error in process sentinel: Invalid regexp: "Invalid content of \\{\\}"
%\section{(能)}
% error in process sentinel: Invalid regexp: "Unmatched ) or \\)"
%\section{能\|}
% No error, but the image covers the text only partially.
%\section{あ} %表
% error in process sentinel: Invalid regexp: "Trailing backslash"

% The char ー contains 0x5b [ in the shift jis encoding.
%\section{アース}
% error in process sentinel: Invalid regexp: "Unmatched [ or [^"

% The char 型 contains 0x5e ^ in the shift jis encoding.
%\section{型}
% No error, but the text is misplaced far rightward to the image.
\end{document}

%%% Local Variables:
%%% coding: japanese-shift-jis
%%% mode: japanese-latex
%%% TeX-master: t
%%% TeX-engine: ptex
%%% End:
\documentclass{jarticle}
\begin{document}
% Enable preview-latex by C-c C-p C-d.  Answer with y to "Cache preamble?"
% question.  Then you will see that the image is placed on wrong position
% in the buffer.

preview-latex で \(a^{2}=b^{2}+c^{2}\) のような数式を日本語 LaTeX でも preview したい。
\end{document}

%%% Local Variables:
%%% coding: euc-jp
%%% mode: japanese-latex
%%% TeX-master: t
%%% TeX-engine: ptex
%%% End:
diff --git a/preview.el.in b/preview.el.in
--- a/preview.el.in
+++ b/preview.el.in
@@ -2613,35 +2613,96 @@
 so the character represented by ^^^ preceding extended characters
 will not get matched, usually."
   (let (output case-fold-search)
-    (when (featurep 'mule)
-      (setq string (encode-coding-string string run-coding-system)))
-    (while (string-match "\\^\\{2,\\}\\(\\([@-_?]\\)\\|[8-9a-f][0-9a-f]\\)"
-                         string)
+    ;; Some coding systems (e.g. japanese-shift-jis) use regexp meta
+    ;; characters on encoding.  Such meta characters would be
+    ;; interfered with `regexp-quote' below.  Thus the idea of
+    ;; "encoding entire string beforehand and decoding it at the last
+    ;; stage" does not work for such coding systems.
+    ;; (when (featurep 'mule)
+    ;;   (setq string (encode-coding-string string run-coding-system)))
+    ;; Rather, we work consistently with decoded text.
+    (if (and (featurep 'xemacs) (featurep 'mule)
+             (eq 'raw-text (coding-system-name
+                            (coding-system-base run-coding-system))))
+        (setq string
+              (decode-coding-string string
+                                    (or (and (featurep 'tex-jp)
+                                             japanese-TeX-mode
+                                             TeX-japanese-process-output-coding-system)
+                                        buffer-file-coding-system))))
+
+    ;; Next, bytes with value 0x80 to 0xFF represented with ^^ form
+    ;; are converted to byte sequence, and decoded by suitable coding
+    ;; system.
+    (setq string
+          (preview--decode-^^ab string
+                                (if (featurep 'mule)
+                                    buffer-file-coding-system nil)))
+
+    ;; Then, control characters are taken into account.
+    (while (string-match "\\^\\{2,\\}\\([@-_?]\\)" string)
       (setq output
             (concat output
                     (regexp-quote (substring string
                                              0
                                              (- (match-beginning 1) 2)))
-                    (if (match-beginning 2)
-                        (concat
-                         "\\(?:" (regexp-quote
-                                  (substring string
-                                             (- (match-beginning 1) 2)
-                                             (match-end 0)))
-                         "\\|"
-                         (char-to-string
-                          (logxor (aref string (match-beginning 2)) 64))
-                         "\\)")
-                      (char-to-string
-                       (string-to-number (match-string 1 string) 16))))
+                    (concat
+                     "\\(?:" (regexp-quote
+                              (substring string
+                                         (- (match-beginning 1) 2)
+                                         (match-end 0)))
+                     "\\|"
+                     (char-to-string
+                      (logxor (aref string (match-beginning 1)) 64))
+                     "\\)"))
             string (substring string (match-end 0))))
     (setq output (concat output (regexp-quote string)))
-    (if (featurep 'mule)
-        (decode-coding-string output
-                              (or (and (boundp 'TeX-japanese-process-output-coding-system)
-                                       TeX-japanese-process-output-coding-system)
-                                  buffer-file-coding-system))
-      output)))
+    output))
+
+(defun preview--decode-^^ab (string coding-system)
+  "Decode ^^ sequences in STRING with CODING-SYSTEM.
+Sequences of control characters such as ^^I are left untouched.
+
+Return a new string."
+  ;; Since the given string can contain multibyte characters, decoding
+  ;; should be performed seperately on each segment made up entirely
+  ;; with ASCII characters.
+  (let ((result ""))
+    (while (string-match "[\x00-\x7F]+" string)
+      (setq result
+            (concat result
+                    (substring string 0 (match-beginning 0))
+                    (let ((text (preview--convert-^^ab
+                                 (match-string 0 string))))
+                      (if (featurep 'mule)
+                          (decode-coding-string text coding-system)
+                        text)))
+            string (substring string (match-end 0))))
+    (setq result (concat result string))
+    result))
+
+(defun preview--convert-^^ab (string)
+  "Convert ^^ sequences in STRING to raw 8bit.
+Sequences of control characters such as ^^I are left untouched.
+
+Return a new string."
+  (save-match-data
+    (let ((result ""))
+      (while (string-match "\\^\\^[8-9a-f][0-9a-f]" string)
+        (setq result
+              (concat result
+                      (substring string 0 (match-beginning 0))
+                      (let ((byte (string-to-number
+                                   (substring (match-string 0 string) 2) 16)))
+                        ;; `char-to-string' is not appropriate in
+                        ;; Emacs >= 23 because it converts #xAB into
+                        ;; "\u00AB" (multibyte string), not "\xAB"
+                        ;; (raw 8bit unibyte string).
+                        (if (fboundp 'byte-to-string)
+                            (byte-to-string byte) (char-to-string byte))))
+              string (substring string (match-end 0))))
+      (setq result (concat result string))
+      result)))
 
 (defun preview-parse-messages (open-closure)
   "Turn all preview snippets into overlays.
@@ -3496,9 +3557,10 @@
     (setq TeX-sentinel-function 'preview-TeX-inline-sentinel)
     (when (featurep 'mule)
       (setq preview-coding-system
-            (or (and (boundp 'TeX-japanese-process-output-coding-system)
-                     TeX-japanese-process-output-coding-system)
-                (with-current-buffer commandbuff
+            (with-current-buffer commandbuff
+              (or (and (featurep 'tex-jp)
+                       japanese-TeX-mode
+                       TeX-japanese-process-output-coding-system)
                   buffer-file-coding-system)))
       (when preview-coding-system
         (setq preview-coding-system
diff --git a/preview.el.in b/preview.el.in
--- a/preview.el.in
+++ b/preview.el.in
@@ -3542,7 +3542,13 @@
            "Preview-LaTeX"
            (if (consp (cdr dumped-cons))
                (preview-do-replacements
-                command preview-undump-replacements)
+                command
+                (append preview-undump-replacements
+                        ;; Since the command options provided in
+                        ;; (TeX-engine-alist) are dropped, give them
+                        ;; back.
+                        (list (list "\\`\\([^ ]+\\)"
+                                    (TeX-command-expand "%(latex)" nil)))))
              command) file)))
     (condition-case err
         (progn