Bug#575901: On LaTeX files, the encoding should be given
Vincent Lefevre wrote...

> On a LaTeX file, one currently gets:
>
>   LaTeX 2e document text
>
> It would be useful to have the encoding too, e.g.
>
>   ISO-8859-1 LaTeX 2e document text
>   UTF-8 LaTeX 2e document text
> (...)

From wheezy (5.11) on, file also prints a file encoding, like

| LaTeX 2e document, UTF-8 Unicode text

That one is guessed from the file content, not by examination of
statements like 'inputenc'. Is that sufficient for you?

> On LaTeX files, the encoding can be obtained unambiguously (well, in
> practice) by looking at \usepackage[...]{inputenc} commands, e.g.
>
>   \usepackage[latin1]{inputenc}
>   \usepackage[utf8]{inputenc}

Seems feasible but still requires some hackery using regular
expressions.

    Christoph
Bug#575901: On LaTeX files, the encoding should be given
On 2014-03-09 14:02:48 +0100, Christoph Biedl wrote:
> Vincent Lefevre wrote...
>
> > On a LaTeX file, one currently gets:
> >
> >   LaTeX 2e document text
> >
> > It would be useful to have the encoding too, e.g.
> >
> >   ISO-8859-1 LaTeX 2e document text
> >   UTF-8 LaTeX 2e document text
> > (...)
>
> From wheezy (5.11) on, file also prints a file encoding, like
>
> | LaTeX 2e document, UTF-8 Unicode text
>
> That one is guessed from the file content, not by examination of
> statements like 'inputenc'. Is that sufficient for you?

Yes, more or less. The problem is for ISO-8859 files: one doesn't
know which version of ISO-8859 it is. I only use the ISO-8859-1
version, so this is unambiguous for me, but it can be a problem
for filters based on file output that are distributed widely.

> > On LaTeX files, the encoding can be obtained unambiguously (well, in
> > practice) by looking at \usepackage[...]{inputenc} commands, e.g.
> >
> >   \usepackage[latin1]{inputenc}
> >   \usepackage[utf8]{inputenc}
>
> Seems feasible but still requires some hackery using regular
> expressions.

I think that in most cases, these commands occur at the beginning
of a line. Looking for such a command would be useful only in the
ISO-8859 case, to differentiate the various versions.

-- 
Vincent Lefèvre vinc...@vinc17.net - Web: https://www.vinc17.net/
100% accessible validated (X)HTML - Blog: https://www.vinc17.net/blog/
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
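The suggestion in this reply (look for the command at the beginning of a line, and only when the content-based guess is the ambiguous ISO-8859) could be sketched roughly as follows. This is only an illustration, not how file works: the function name `refine_encoding`, the label "ISO-8859" passed in by the caller, and the small option-to-charset table are all assumptions made for the example.

```python
import re

# Hypothetical, deliberately incomplete table from inputenc option names
# to charset names; a real filter would need more entries.
INPUTENC_TO_CHARSET = {
    "latin1": "ISO-8859-1",
    "latin2": "ISO-8859-2",
    "latin9": "ISO-8859-15",
}

# Only match the command at the very beginning of a line. A line that is
# commented out starts with '%', so it is skipped as a side effect.
_LINE_START = re.compile(r"^\\usepackage\[([^\]]*)\]\{inputenc\}", re.MULTILINE)


def refine_encoding(guessed, tex_source):
    """Refine a content-based guess only when it is the ambiguous "ISO-8859".

    `guessed` is whatever label the caller derived from the file content
    (an assumption of this sketch); any other value is returned unchanged.
    """
    if guessed != "ISO-8859":
        return guessed
    match = _LINE_START.search(tex_source)
    if match is None:
        return guessed
    # Unknown options fall back to the original, unrefined guess.
    return INPUTENC_TO_CHARSET.get(match.group(1).strip(), guessed)
```

Anchoring at the line start keeps the pattern simple, at the cost of missing an indented \usepackage line.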
Bug#575901: On LaTeX files, the encoding should be given
Package: file
Version: 5.04-1
Severity: wishlist

On a LaTeX file, one currently gets:

  LaTeX 2e document text

It would be useful to have the encoding too, e.g.

  ISO-8859-1 LaTeX 2e document text
  UTF-8 LaTeX 2e document text

so that one can automatically call iconv via LESSOPEN when viewing
a file with less, for instance.

On LaTeX files, the encoding can be obtained unambiguously (well, in
practice) by looking at \usepackage[...]{inputenc} commands, e.g.

  \usepackage[latin1]{inputenc}
  \usepackage[utf8]{inputenc}

Note: ignore everything that is after a % character, as one sometimes
comments out such commands with it.

-- System Information:
Debian Release: squeeze/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.30-2-amd64 (SMP w/8 CPU cores)
Locale: LANG=POSIX, LC_CTYPE=en_US.ISO8859-1 (charmap=ISO-8859-1)
Shell: /bin/sh linked to /bin/dash

Versions of packages file depends on:
ii  libc6      2.10.2-6          Embedded GNU C Library: Shared lib
ii  libmagic1  5.04-1            File type determination library us
ii  zlib1g     1:1.2.3.4.dfsg-3  compression library - runtime

file recommends no packages.

file suggests no packages.

-- no debconf information
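To make the request concrete, here is a minimal sketch of the lookup described in this report: drop everything after a % on each line, then search for a \usepackage[...]{inputenc} command. The function name `inputenc_option` is made up for the example, and the comment handling is a simplification (a % preceded by a backslash is treated as a literal percent sign; the rarer "\\%" case is not handled).

```python
import re

# Matches \usepackage[opts]{inputenc} and captures the option list.
_INPUTENC = re.compile(r"\\usepackage\s*\[([^\]]*)\]\s*\{inputenc\}")

# A '%' not preceded by a backslash starts a comment (simplified rule).
_COMMENT = re.compile(r"(?<!\\)%.*")


def inputenc_option(tex_source):
    """Return the first inputenc option found outside comments, or None."""
    for line in tex_source.splitlines():
        code = _COMMENT.sub("", line)  # ignore everything after '%'
        match = _INPUTENC.search(code)
        if match:
            return match.group(1).strip()
    return None
```

For a file whose preamble has a commented-out utf8 line followed by an active latin1 line, this returns "latin1", which a LESSOPEN filter could then translate into a charset name for iconv.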