On Thu, Aug 23, 2012 at 07:19:29PM -0400, Geoff Steckel wrote:
> >Well, yes, using a character set conversion API in stupid ways can
> >munge data. How does that relate to anything I was saying?
> As long as iconv is only used to display data, not to change file
> contents, you're perfectly right.

Yes, that's what I meant (sorry if I wasn't clear enough).

Open the file, allow the user to specify the file's encoding (and maybe
auto-detect it somehow, but always allow the user to override this),
load the data into a buffer, convert the buffer for display, and show
it on the screen. The user can now edit the buffer in the display
encoding. Before saving, convert back to the file's encoding. If that
fails because the user added characters that cannot be represented in
the original encoding, complain and offer the option to save the file
in a suitable encoding.

> A real example is a L***x editor using iconv. Open a 5000 line file,
> change line 100, line 500 contains a non-conforming character,
> file is truncated there.
>
> Not pretty.

Yeah, that's obviously not done right. We can easily imagine other
problems, like a mix of character encodings ending up in a file by
accident. Sometimes this is done on purpose, however, and then the
display conversion step gets interesting. But at a minimum the editor
should display one of the encodings correctly and allow users to switch
the display encoding if necessary.

> Another real example. Bring up line containing non-conforming character.
> Line appears blank.
>
> I agree that it takes a great deal of care to implement a multi-character
> set editor such that it works on all useful files while displaying in
> a particular locale's character set.

Yes, not every combination can be made to work. E.g. displaying UTF-8
text that contains characters outside latin1 in a latin1 locale just
won't work, and this must be treated as a user error (invalid input or
locale configuration). And that's fine since it's an expected failure
mode. It just needs to be handled in a way that doesn't destroy data.

It isn't a trivial task by any account, but the result would be useful.
But for this kind of feature to appear in mg we'll need iconv in base.

As a first step, adding a UTF-8 mode to mg, where file content is expected to be UTF-8 encoded, would be much easier and already quite useful.
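
To make the load/edit/save cycle described above a bit more concrete,
here is a rough sketch in Python, with Python's built-in codecs standing
in for iconv. This is only an illustration of the idea, not a proposal
for how mg (which is C) would do it. The names load_buffer, save_buffer
and CannotEncode are made up for this example, and "utf-8" as the only
suggested alternative is an assumption.

```python
class CannotEncode(Exception):
    """Hypothetical error: the buffer holds a character the file encoding lacks."""

    def __init__(self, char, encoding, alternatives):
        super().__init__(f"{char!r} cannot be represented in {encoding}")
        self.char = char
        self.encoding = encoding
        self.alternatives = alternatives


def load_buffer(raw, file_encoding):
    # Strict decoding: a non-conforming byte raises UnicodeDecodeError
    # instead of silently truncating or blanking the rest of the text.
    return raw.decode(file_encoding, errors="strict")


def save_buffer(text, file_encoding, candidates=("utf-8",)):
    # Convert back to the file's encoding only at save time.
    try:
        return text.encode(file_encoding, errors="strict")
    except UnicodeEncodeError as exc:
        # Nothing has been written yet, so no data is lost. Work out which
        # candidate encodings could hold the whole buffer and report them.
        alternatives = []
        for enc in candidates:
            try:
                text.encode(enc, errors="strict")
            except UnicodeEncodeError:
                continue
            alternatives.append(enc)
        raise CannotEncode(text[exc.start], file_encoding, alternatives) from exc


if __name__ == "__main__":
    raw = b"caf\xe9\n"                    # latin1 bytes as read from disk
    text = load_buffer(raw, "latin-1")    # buffer the user edits
    edited = text + "\u20ac\n"            # user adds a euro sign
    try:
        save_buffer(edited, "latin-1")
    except CannotEncode as err:
        print("complain:", err, "offer:", err.alternatives)
```

The point is that the conversion back happens on the whole buffer before
anything touches the file, so a failure leaves the original file intact
and the user gets to pick another encoding.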
