On Thu, Feb 27, 2014 at 9:39 AM, Don Guinn <dongu...@gmail.com> wrote:
> Although the unicode value of 'þ' is less than 256 it still must be
> represented with two bytes in UTF-8. This is where it gets confusing to
> view UTF-8 as literal. And why I sometimes think it would be nice if UTF-8
> was a type unique from literal and unicode.
You could do that. But it could easily double the size of the J interpreter and introduce all sorts of new errors. It would also take a lot of work to implement. And it's not clear whether any of the existing J primitives should work on such a type. Personally, I'd much rather see J support UTF-32.

Put differently, J represents code points; it's up to the programmer to make sure that those code points represent meaningful characters. If you prohibit the language from representing things that are not meaningful to you, you also prohibit it from representing those things for other people.

For example, suppose 'þ' were represented in a UTF-8 type. What would 3 3$'þ' do? Here's how it works currently:

   3 3$'þ'
þ�
�þ
þ�

What you are seeing here is that 'þ' is a sequence of two literals in UTF-8 (the bytes 195 and 190). Reshaping that two-item list into a 3 by 3 array cycles through the bytes, and because each row is 3 bytes long (an odd number), the row boundaries fall in the middle of characters: rows 1 and 3 end with a dangling lead byte, and row 2 starts with an orphaned continuation byte.

Now, J will already report the error, if that is what you want:

   7 u:"1(3 3$'þ')
|domain error

But the real issue here is not J; it's the complexity of Unicode. No matter how the language is implemented, you are going to have to come to terms with that complexity if you are going to work with Unicode.

And yes, this is frustrating. And, yes, it's tempting to blame the language for this frustration. But if you've worked with Unicode in another programming language, you'll have experienced similar (or worse) frustrations. And eventually, once you get past those frustrations, you'll have a decent understanding of what's going on.

Personally, I think it's best if people do not limit themselves to a single programming language. Thinking about problems in multiple programming languages gives you useful perspectives on how to solve them. Of course, it's also good not to limit your knowledge to "only programming languages". To be useful, you need knowledge of other fields (engineering, or whatever else). Overspecialization keeps you from recognizing and solving problems.
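Since the same byte-splitting problem shows up outside J, here is a minimal Python 3 sketch of the 3 3$'þ' example. The helper names `reshape` and `strict_decode_ok` are hypothetical. `reshape` only approximates J's cyclic `$` for a two-dimensional shape. The sketch reshapes the UTF-8 bytes of 'þ' the way J reshapes a UTF-8 literal list, then reshapes the code-point string instead, which is closer to the UTF-32 view mentioned above.

```python
from itertools import cycle, islice

def reshape(rows, cols, items):
    """Hypothetical helper: fill a rows x cols grid by cycling through items,
    roughly like J's  rows cols $ items."""
    flat = list(islice(cycle(items), rows * cols))
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]

def strict_decode_ok(row):
    """Hypothetical helper: True if the byte row is well-formed UTF-8."""
    try:
        bytes(row).decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

thorn = "þ"                      # U+00FE
utf8 = thorn.encode("utf-8")     # two bytes in UTF-8
print(list(utf8))                # [195, 190]

# Reshape the bytes (iterating a bytes object yields ints).
byte_rows = reshape(3, 3, utf8)
for row in byte_rows:
    # errors="replace" substitutes U+FFFD for each broken sequence,
    # giving the same þ� / �þ / þ� pattern as the J session.
    print(bytes(row).decode("utf-8", errors="replace"))

# Strict decoding rejects every row, much as 7 u:"1 reports a domain error.
print([strict_decode_ok(r) for r in byte_rows])  # [False, False, False]

# Reshape code points instead: one item per character, so nothing is split.
char_rows = reshape(3, 3, thorn)
for row in char_rows:
    print("".join(row))          # þþþ
```

Working on code points avoids the split characters. It does not remove the general complexity of Unicode, since a user-perceived character can still span several code points.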
Thanks,

-- 
Raul

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm