Re: [fossil-users] File contains invalid UTF-8, but is not UTF-8.

2014-07-22 Thread Stephan Beal
On Tue, Jul 8, 2014 at 9:37 PM, Stephan Beal sgb...@googlemail.com wrote: No characters between 128 and 255 are valid UTF-8, to avoid confusion with the many encodings which use that range. For the record, that's apparently wrong. My local man pages (and experimentation with the termbox API)

Re: [fossil-users] File contains invalid UTF-8, but is not UTF-8.

2014-07-22 Thread Ron W
On Tue, Jul 22, 2014 at 11:48 AM, Stephan Beal sgb...@googlemail.com wrote: On Tue, Jul 8, 2014 at 9:37 PM, Stephan Beal sgb...@googlemail.com wrote: No characters between 128 and 255 are valid UTF-8, to avoid confusion with the many encodings which use that range. For the record, that's

Re: [fossil-users] File contains invalid UTF-8, but is not UTF-8.

2014-07-22 Thread Andy Bradford
Thus said Stephan Beal on Tue, 22 Jul 2014 19:01:27 +0200: One would think i'd be more conscious of how i throw around byte vs character :/. i'm still not clear on the whole char-vs-code point bit, though. The whole char-vs-codepoint has always been unclear for me, no matter how many
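On the byte vs. character vs. code point question raised here, a small Python sketch may help. It is only an illustration and is unrelated to Fossil's own code: the same visible character è can be one code point or two, and each form has a different UTF-8 byte length.

```python
import unicodedata

precomposed = "\u00e8"   # one code point: LATIN SMALL LETTER E WITH GRAVE
decomposed = "e\u0300"   # two code points: 'e' + COMBINING GRAVE ACCENT

# Number of code points in each form.
print(len(precomposed), len(decomposed))

# Number of bytes each form occupies once encoded as UTF-8.
print(len(precomposed.encode("utf-8")), len(decomposed.encode("utf-8")))

# NFC normalization composes the two-code-point form into the single one.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
```

Both strings render as the same character, which is why "character" is the ambiguous term; "byte" and "code point" are the ones that can be counted.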

Re: [fossil-users] File contains invalid UTF-8, but is not UTF-8.

2014-07-10 Thread Jan Nijtmans
2014-07-09 0:05 GMT+02:00 Andy Bradford amb-fos...@bradfords.org: Or perhaps just making the documentation more clear that all files must be valid UTF-8. Oh no, fossil doesn't require at all that all files are valid UTF-8. Only fossil ui assumes UTF-8 encoding for non-binary files, otherwise

[fossil-users] File contains invalid UTF-8, but is not UTF-8.

2014-07-08 Thread Andy Bradford
Hello, I have some Tcl scripts (for IRC) that previously had no problems when I committed. They don't have UTF-8 characters at all, but when I try to commit them I get the warning: ./test.tcl contains invalid UTF-8. Use --no-warnings or the encoding-glob setting to disable this warning.

Re: [fossil-users] File contains invalid UTF-8, but is not UTF-8.

2014-07-08 Thread Jan Nijtmans
2014-07-08 20:47 GMT+02:00 Andy Bradford amb-fos...@bradfords.org: Hello, I have some Tcl scripts (for IRC) that previously had no problems when I committed. They don't have UTF-8 characters at all, but when I try to commit them I get the warning: ./test.tcl contains invalid UTF-8. Use

Re: [fossil-users] File contains invalid UTF-8, but is not UTF-8.

2014-07-08 Thread Stephan Beal
On Tue, Jul 8, 2014 at 8:47 PM, Andy Bradford amb-fos...@bradfords.org wrote: If I remove the è (0xe8) character I can commit. I didn't think 0xe8 was UTF-8, but maybe I'm mistaken? No characters between 128 and 255 are valid UTF-8, to avoid confusion with the many encodings which use

Re: [fossil-users] File contains invalid UTF-8, but is not UTF-8.

2014-07-08 Thread Andy Bradford
Thus said Jan Nijtmans on Tue, 08 Jul 2014 21:35:07 +0200: If you don't want this warning, just set 'encoding-glob' to '*'. I might actually want encoding warnings though... But did you ever view this file in the fossil UI? Did the è really look like è there? I did not, however, if I

Re: [fossil-users] File contains invalid UTF-8, but is not UTF-8.

2014-07-08 Thread Andy Bradford
Thus said Stephan Beal on Tue, 08 Jul 2014 21:37:50 +0200: No characters between 128 and 255 are valid UTF-8, to avoid confusion with the many encodings which use that range. If no characters between 128 and 255 are valid UTF-8, and they can never be valid UTF-8 characters, and are used by

Re: [fossil-users] File contains invalid UTF-8, but is not UTF-8.

2014-07-08 Thread Scott Robison
On Tue, Jul 8, 2014 at 3:38 PM, Andy Bradford amb-fos...@bradfords.org wrote: That's a good suggestion for fixing the Tcl script, but I'm still not sure why Fossil thinks that è is UTF-8. I thought it was extended ASCII. I didn't think 0xe8 was UTF-8, but maybe I'm mistaken? In the

Re: [fossil-users] File contains invalid UTF-8, but is not UTF-8.

2014-07-08 Thread Stephan Beal
Interesting question/option, but i have no answer. Something to possibly consider? On Jul 8, 2014 11:43 PM, Andy Bradford amb-fos...@bradfords.org wrote: Thus said Stephan

Re: [fossil-users] File contains invalid UTF-8, but is not UTF-8.

2014-07-08 Thread Andy Bradford
Thus said Scott Robison on Tue, 08 Jul 2014 15:48:05 -0600: The warning you are seeing is that the stream is invalid UTF-8. 0xE8 byte could be an extended ASCII character from one of the ISO-8859-X code pages. Or it could be real binary data that just happens to mostly have ASCII text

Re: [fossil-users] File contains invalid UTF-8, but is not UTF-8.

2014-07-08 Thread Shal Farley
Andy, If no characters between 128 and 255 are valid UTF-8, and they can never be valid UTF-8 characters, and are used by many encodings, why doesn't Fossil simply ignore them when they are committed? I think Stephan said it poorly. A solitary byte in that range is never valid UTF-8, but
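The distinction between a solitary high byte and a high byte inside a multi-byte sequence can be checked directly. The following is a minimal Python sketch, not Fossil's actual check (Fossil is written in C and its logic may differ); the helper name `is_valid_utf8` is made up for illustration.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (illustrative helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# 0xE8 on its own, the way ISO-8859-1 stores the letter e-grave.
lone = b"caf\xe8"
# The same text encoded as UTF-8: e-grave becomes the two bytes 0xC3 0xA8.
proper = "caf\u00e8".encode("utf-8")
# 0xE8 used as the lead byte of a complete three-byte sequence (U+8BED).
lead = b"\xe8\xaf\xad"

print(is_valid_utf8(lone))     # solitary 0xE8: not valid UTF-8
print(is_valid_utf8(proper))   # 0xC3 0xA8: valid
print(is_valid_utf8(lead))     # 0xE8 followed by two continuation bytes: valid

# The "invalid" file is still perfectly good ISO-8859-1.
print(ascii(lone.decode("iso-8859-1")))
```

So a byte such as 0xE8 is not forbidden in UTF-8; it is only invalid when it is not followed by the continuation bytes it announces, which is the case for a Latin-1 encoded è.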

Re: [fossil-users] File contains invalid UTF-8, but is not UTF-8.

2014-07-08 Thread Andy Bradford
Thus said Stephan Beal on Tue, 08 Jul 2014 23:50:40 +0200: Interesting question/option, but i have no answer. Something to possibly consider? Or perhaps just making the documentation more clear that all files must be valid UTF-8. There is already an option to control how encodings are