On Thu, Aug 23, 2012 at 07:19:29PM -0400, Geoff Steckel wrote:
> >Well, yes, using a character set conversion API in stupid ways can
> >munge data. How does that relate to anything I was saying?
> As long as iconv is only used to display data, not to change file
> contents, you're perfectly right.

Yes, that's what I meant (sorry if I wasn't clear enough).

Open the file, allow the user to specify the file's encoding
(and maybe auto-detect it somehow, but always allow the user to
override this), load the data into a buffer, convert the buffer
for display, and show it on the screen.

The user can now edit the buffer in the display encoding.

Before saving, convert back to the file's encoding. If that fails
because the user added characters that cannot be represented in the
original encoding, complain and offer the option to save the file
in a suitable encoding.
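
A minimal sketch of that round trip in Python, with the built-in codecs
standing in for iconv. The function names (load_buffer, save_buffer) and
the UTF-8 fallback are made up for illustration. A real editor would ask
the user instead of picking the fallback itself:

```python
def load_buffer(raw: bytes, file_encoding: str) -> str:
    """Decode the raw file bytes into the editing buffer."""
    return raw.decode(file_encoding)


def save_buffer(text: str, file_encoding: str, fallback: str = "utf-8"):
    """Encode the buffer for saving; return (bytes, encoding actually used)."""
    try:
        return text.encode(file_encoding), file_encoding
    except UnicodeEncodeError as err:
        # The user typed something the original encoding cannot represent.
        # Complain, then fall back instead of writing a damaged file.
        print(f"cannot save as {file_encoding}: {err.reason} "
              f"at position {err.start}; offering {fallback}")
        return text.encode(fallback), fallback


raw = "na\u00efve caf\u00e9\n".encode("latin-1")
buf = load_buffer(raw, "latin-1")

# Unchanged content converts back to the original encoding.
same, enc_same = save_buffer(buf, "latin-1")

# The euro sign is not in latin-1, so this save has to fall back.
edited = buf + "price: 5 \u20ac\n"
data, enc = save_buffer(edited, "latin-1")
```

The point is that a failed conversion never writes partial output: either
the whole buffer converts, or the user is told and offered another encoding.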

> A real example is a L***x editor using iconv. Open a 5000 line file,
> change line 100, line 500 contains a non-conforming character,
> file is truncated there.
> 
> Not pretty.

Yeah, that's obviously not done right.
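
One way to avoid the truncation, sketched in Python rather than with iconv:
decode with the surrogateescape error handler, which maps each undecodable
byte to a placeholder code point and maps it back to the same byte on encode.
This is Python's mechanism, not something iconv provides, and the sample
data is invented:

```python
# 0xff can never appear in valid UTF-8, so line 2 is "non-conforming".
raw = b"line one\nline \xfftwo\nline three\n"

# Strict decoding stops at the bad byte; an editor that writes out only what
# it managed to convert would truncate the file there.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError as err:
    strict_ok = False
    bad_offset = err.start

# surrogateescape keeps the bad byte as a lone surrogate in the buffer.
buf = raw.decode("utf-8", errors="surrogateescape")

# Edit an unrelated line, then save with the same handler.
lines = buf.split("\n")
lines[0] = "line 1"
saved = "\n".join(lines).encode("utf-8", errors="surrogateescape")
```

After the edit, saved still contains the 0xff byte on the second line and
the untouched third line, so nothing past the bad byte is lost.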

We can easily imagine other problems, like a mix of character encodings
ending up in a file by accident. Sometimes this is done on purpose,
however, and then the display conversion step gets interesting. At a
minimum the editor should display one of the encodings correctly and
allow users to switch the display encoding if necessary.

> Another real example. Bring up line containing non-conforming character.
> Line appears blank.
> 
> I agree that it takes a great deal of care to implement a multi-character
> set editor such that it works on all useful files while displaying in
> a particular locale's character set.

Yes, not every combination can be made to work. E.g. displaying UTF-8
text that contains non-latin1 characters in a latin1 locale just won't work,
and this must be treated as a user error (invalid input or locale
configuration). And that's fine since it's an expected failure mode.
It just needs to be handled in a way that doesn't destroy data.
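
A small Python sketch of that distinction: the conversion for display may
be lossy, as long as it is only a view and the buffer itself stays intact.
The function name for_display is hypothetical, and errors="replace" is just
one possible way to mark characters the locale cannot show:

```python
def for_display(buffer_text: str, locale_encoding: str) -> bytes:
    """Render the buffer for the terminal; unrepresentable chars become '?'."""
    return buffer_text.encode(locale_encoding, errors="replace")


# A UTF-8 file with two latin1-representable characters and two that are not.
file_bytes = "gr\u00fc\u00dfe \u4e16\u754c\n".encode("utf-8")
buffer_text = file_bytes.decode("utf-8")

shown = for_display(buffer_text, "latin-1")   # lossy, used for display only
saved = buffer_text.encode("utf-8")           # lossless, taken from the buffer
```

The user sees question marks for the two characters latin1 lacks, but saving
goes through the buffer and reproduces the original bytes.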

It isn't a trivial task by any account, but the result would be useful.

But for this kind of feature to appear in mg we'll need iconv in base.
As a first step, adding a UTF-8 mode to mg, where file content is expected
to be UTF-8 encoded, would be much easier and already quite useful.
