Bug#575901: On LaTeX files, the encoding should be given

2014-03-09 Thread Christoph Biedl
Vincent Lefevre wrote...

 On a LaTeX file, one currently gets:
 
   LaTeX 2e document text
 
 It would be useful to have the encoding too, e.g.
 
   ISO-8859-1 LaTeX 2e document text
   UTF-8 LaTeX 2e document text
(...)

From wheezy (5.11) on, file also prints a file encoding, like

| LaTeX 2e document, UTF-8 Unicode text

That one is guessed from the file content, not by examination of
statements like 'inputenc'. Is that sufficient for you?

 On LaTeX files, the encoding can be obtained unambiguously (well,
 in practice) by looking at \usepackage[...]{inputenc} commands,
 e.g.
 
   \usepackage[latin1]{inputenc}
   \usepackage[utf8]{inputenc}

Seems feasible but still requires some hackery using regular
expressions.
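
For illustration only, here is a minimal Python sketch of the kind of regular expression meant here. It is not what file(1) does: the names INPUTENC_RE and inputenc_option are hypothetical, and the pattern only handles a single option inside the brackets.

```python
import re

# Hypothetical pattern: matches \usepackage[<option>]{inputenc},
# tolerating optional whitespace around the brackets and braces.
INPUTENC_RE = re.compile(
    r"\\usepackage\s*\[\s*([^\]\s]+)\s*\]\s*\{\s*inputenc\s*\}"
)

def inputenc_option(tex_source):
    """Return the first inputenc option found in tex_source, or None."""
    match = INPUTENC_RE.search(tex_source)
    return match.group(1) if match else None
```

Called on a source containing \usepackage[latin1]{inputenc}, this returns the string latin1; on a source without any inputenc command it returns None.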

Christoph




Bug#575901: On LaTeX files, the encoding should be given

2014-03-09 Thread Vincent Lefevre
On 2014-03-09 14:02:48 +0100, Christoph Biedl wrote:
 Vincent Lefevre wrote...
 
  On a LaTeX file, one currently gets:
  
    LaTeX 2e document text
  
  It would be useful to have the encoding too, e.g.
  
    ISO-8859-1 LaTeX 2e document text
    UTF-8 LaTeX 2e document text
 (...)
 
 From wheezy (5.11) on, file also prints a file encoding, like
 
 | LaTeX 2e document, UTF-8 Unicode text
 
 That one is guessed from the file content, not by examination of
 statements like 'inputenc'. Is that sufficient for you?

Yes, more or less. The problem is with ISO-8859 files: one doesn't
know which part of ISO-8859 it is. I only use ISO-8859-1, so this
is unambiguous for me, but it can be a problem for widely
distributed filters that rely on file output.

  On LaTeX files, the encoding can be obtained unambiguously (well,
  in practice) by looking at \usepackage[...]{inputenc} commands,
  e.g.
  
    \usepackage[latin1]{inputenc}
    \usepackage[utf8]{inputenc}
 
 Seems feasible but still requires some hackery using regular
 expressions.

I think that in most cases, these commands occur at the beginning
of a line (looking for such a command would be useful only in the
ISO-8859 case, to differentiate the various versions).
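
A rough Python sketch of this idea, as an illustration rather than a proposed patch: guess from the content first, and consult inputenc only when the data is 8-bit but not valid UTF-8. The function name, the "unknown-8bit" fallback string and the (incomplete) option table are assumptions made for the example.

```python
import re

# Assumed mapping from inputenc option names to charset names (not exhaustive).
INPUTENC_TO_CHARSET = {
    "latin1": "ISO-8859-1",
    "latin2": "ISO-8859-2",
    "latin9": "ISO-8859-15",
}

# Only accept the command at the beginning of a line.
LINE_START_RE = re.compile(
    rb"^\\usepackage\[([^\]]+)\]\{inputenc\}", re.MULTILINE
)

def guess_latex_encoding(data):
    """Guess the encoding of LaTeX source given as bytes."""
    try:
        data.decode("ascii")
        return "US-ASCII"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        pass
    # 8-bit but not UTF-8: use inputenc to tell the ISO-8859 parts apart.
    match = LINE_START_RE.search(data)
    if match:
        option = match.group(1).decode("ascii", "replace")
        return INPUTENC_TO_CHARSET.get(option, "unknown-8bit")
    return "unknown-8bit"
```

Because the pattern is anchored at the start of a line, a command that is indented or commented out is not matched, and the sketch then reports the encoding as unknown instead of guessing.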

-- 
Vincent Lefèvre vinc...@vinc17.net - Web: https://www.vinc17.net/
100% accessible validated (X)HTML - Blog: https://www.vinc17.net/blog/
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#575901: On LaTeX files, the encoding should be given

2010-03-30 Thread Vincent Lefevre
Package: file
Version: 5.04-1
Severity: wishlist

On a LaTeX file, one currently gets:

  LaTeX 2e document text

It would be useful to have the encoding too, e.g.

  ISO-8859-1 LaTeX 2e document text
  UTF-8 LaTeX 2e document text

so that one can automatically call iconv via LESSOPEN when viewing
a file with less, for instance.

On LaTeX files, the encoding can be obtained unambiguously (well,
in practice) by looking at \usepackage[...]{inputenc} commands,
e.g.

  \usepackage[latin1]{inputenc}
  \usepackage[utf8]{inputenc}

Note: ignore everything that is after a % character, as one
sometimes comments out such commands with it.
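
As an illustration of the comment handling, here is a small Python sketch. The helper names are made up for the example, and it makes one simplifying assumption: a % preceded by a backslash is always treated as a literal percent sign, so a sequence like \\% is not handled correctly.

```python
import re

# An unescaped % starts a comment; \% is a literal percent sign.
# Simplification: an even run of backslashes before % is not handled.
COMMENT_RE = re.compile(r"(?<!\\)%.*")

def strip_comment(line):
    """Remove a trailing LaTeX comment from a single line."""
    return COMMENT_RE.sub("", line)

def active_inputenc_lines(tex_source):
    """Return the comment-stripped lines that still mention inputenc."""
    stripped_lines = (strip_comment(line) for line in tex_source.splitlines())
    return [line for line in stripped_lines if "{inputenc}" in line]
```

With a source whose first line is a commented-out latin1 command and whose second line is an active utf8 command, only the second line is returned.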

-- System Information:
Debian Release: squeeze/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.30-2-amd64 (SMP w/8 CPU cores)
Locale: LANG=POSIX, LC_CTYPE=en_US.ISO8859-1 (charmap=ISO-8859-1)
Shell: /bin/sh linked to /bin/dash

Versions of packages file depends on:
ii  libc6   2.10.2-6 Embedded GNU C Library: Shared lib
ii  libmagic1   5.04-1   File type determination library us
ii  zlib1g  1:1.2.3.4.dfsg-3 compression library - runtime

file recommends no packages.

file suggests no packages.

-- no debconf information


