Re: [julia-users] Re: UTF16String or UTF8String?

2015-09-28 Thread Scott Jones
On Monday, September 28, 2015 at 3:32:55 AM UTC-4, Jameson wrote:
> I find it interesting to note that the Wikipedia article points out that if size compression is the goal (and there is enough text for it to matter), then SCSU (or other attempts at creating a Unicode-specific compress

Re: [julia-users] Re: UTF16String or UTF8String?

2015-09-28 Thread Jameson Nash
I find it interesting to note that the Wikipedia article points out that if size compression is the goal (and there is enough text for it to matter), then SCSU (or other attempts at creating a Unicode-specific compression scheme) is inferior to using a general-purpose compression algorithm. Since t

Re: [julia-users] Re: UTF16String or UTF8String?

2015-09-27 Thread Scott Jones
The ANSI Latin 1 (ISO 8859-1) character set, which is equivalent to the first 256 code points of Unicode, supports the following languages: Western Europe and Americas: Afrikaans, Basque, Catalan, Danish, Dutch, English, Faeroese, Finnish, French, Galician, German, Icelandic, Irish, Italian, N
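The equivalence between Latin-1 and the first 256 Unicode code points can be checked directly. The following is a minimal Python sketch, not something posted in the thread; it assumes "Latin 1" means ISO 8859-1 and uses Python's codec names (`latin-1`, `cp1252`). It also shows that Windows-1252, which is often called "ANSI" as well, differs in the 0x80 to 0x9F range.

```python
# Decode every possible byte value as ISO 8859-1 (Latin-1) and compare the
# resulting code points with the byte values themselves.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")
code_points = [ord(ch) for ch in decoded]

# True when byte value N always decodes to code point U+00NN.
identity_holds = code_points == list(range(256))

# Windows-1252 is a different mapping for some bytes in 0x80..0x9F.
# Byte 0x80 is the euro sign there, but a C1 control character in Latin-1.
euro_cp1252 = b"\x80".decode("cp1252")
ctrl_latin1 = b"\x80".decode("latin-1")

print(identity_holds)
print(f"cp1252 0x80 -> U+{ord(euro_cp1252):04X}")
print(f"latin-1 0x80 -> U+{ord(ctrl_latin1):04X}")
```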

Re: [julia-users] Re: UTF16String or UTF8String?

2015-09-27 Thread Daniel Carrera
On 27 September 2015 at 23:41, Scott Jones wrote:
> No. Most characters used in the countries I mentioned above can be represented using just ANSI Latin1 (which is why I specified *Western Europe*), so UTF-8 will take 1 or 2 bytes for each character, but when you are dealing with the Mid
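The "1 or 2 bytes" figure for Latin-1 characters in UTF-8 can be verified with a short Python sketch. This is an illustration only; the helper names and the sample word are made up for the example and do not come from the thread.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes the character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def utf16_len(ch: str) -> int:
    """Number of bytes the character occupies in UTF-16 (no BOM)."""
    return len(ch.encode("utf-16-le"))


# Code points U+0000..U+007F (ASCII) take 1 byte in UTF-8.
ascii_is_one_byte = all(utf8_len(chr(cp)) == 1 for cp in range(0x80))

# Code points U+0080..U+00FF (the rest of Latin-1) take 2 bytes in UTF-8.
upper_latin1_is_two_bytes = all(utf8_len(chr(cp)) == 2 for cp in range(0x80, 0x100))

# Every Latin-1 code point takes exactly 2 bytes in UTF-16.
latin1_is_two_bytes_utf16 = all(utf16_len(chr(cp)) == 2 for cp in range(0x100))

# Hypothetical sample: 4 code points, one of them non-ASCII.
sample = "caf\u00e9"
sample_sizes = (len(sample.encode("utf-8")), len(sample.encode("utf-16-le")))

print(ascii_is_one_byte, upper_latin1_is_two_bytes, latin1_is_two_bytes_utf16)
print(sample_sizes)
```

For text that is mostly ASCII with occasional accented letters, UTF-8 therefore comes out smaller than UTF-16, which matches the size claim quoted above.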

Re: [julia-users] Re: UTF16String or UTF8String?

2015-09-27 Thread Scott Jones
On Sunday, September 27, 2015 at 5:40:03 PM UTC-4, Páll Haraldsson wrote:
> 2015-09-27 21:26 GMT+00:00 Páll Haraldsson:
>> 2015-09-27 20:29 GMT+00:00 Scott Jones:
>>> If it is mainly in North/South America, Western Europe, or Australia/NZ, UTF-8 does OK. UTF-8 is great for

Re: [julia-users] Re: UTF16String or UTF8String?

2015-09-27 Thread Scott Jones
On Sunday, September 27, 2015 at 5:26:26 PM UTC-4, Páll Haraldsson wrote:
> 2015-09-27 20:29 GMT+00:00 Scott Jones:
>> If it is mainly in North/South America, Western Europe, or Australia/NZ, UTF-8 does OK. UTF-8 is great for data interchange, but can really slow things down if

Re: [julia-users] Re: UTF16String or UTF8String?

2015-09-27 Thread Páll Haraldsson
2015-09-27 21:26 GMT+00:00 Páll Haraldsson:
> 2015-09-27 20:29 GMT+00:00 Scott Jones:
>> If it is mainly in North/South America, Western Europe, or Australia/NZ, UTF-8 does OK. UTF-8 is great for data interchange, but can really slow things down if you have many non-ASCII characters

Re: [julia-users] Re: UTF16String or UTF8String?

2015-09-27 Thread Páll Haraldsson
2015-09-27 20:29 GMT+00:00 Scott Jones:
> If it is mainly in North/South America, Western Europe, or Australia/NZ, UTF-8 does OK. UTF-8 is great for data interchange, but can really slow things down if you have many non-ASCII characters
Did you mean non-BMP? Non-ASCII, but BMP ("European

Re: [julia-users] Re: UTF16String or UTF8String?

2015-09-27 Thread Scott Jones
On Sunday, September 27, 2015 at 4:56:28 PM UTC-4, Jameson wrote:
>> UTF-16 is much faster in many situations than UTF-8.
> An encoding is not a speed; it is a format. Both formats are variable-length encodings, and therefore both algorithms have the same time and space complexity (

Re: [julia-users] Re: UTF16String or UTF8String?

2015-09-27 Thread Jameson Nash
> UTF-16 is much faster in many situations than UTF-8.
An encoding is not a speed; it is a format. Both formats are variable-length encodings, and therefore both algorithms have the same time and space complexity (although the implementation of UTF-16 does appear to be simpler from the length o
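The point that both formats are variable-length can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not code from the thread: the function names are hypothetical, and the input is assumed to be well-formed UTF-8 or little-endian UTF-16 without a BOM. In both encodings, counting code points requires a linear scan over the code units.

```python
import struct


def code_units(ch: str) -> tuple:
    """Return (UTF-8 code units, UTF-16 code units) for one code point."""
    return (len(ch.encode("utf-8")), len(ch.encode("utf-16-le")) // 2)


def count_code_points_utf8(data: bytes) -> int:
    """Count code points by skipping continuation bytes (10xxxxxx)."""
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def count_code_points_utf16(data: bytes) -> int:
    """Count code points by skipping low (trailing) surrogates."""
    units = struct.unpack(f"<{len(data) // 2}H", data)
    return sum(1 for u in units if not 0xDC00 <= u <= 0xDFFF)


# One code point from each UTF-8 length class; the last one lies outside the
# BMP, so it needs a surrogate pair (2 code units) in UTF-16.
examples = ["A", "\u00e9", "\u20ac", "\U0001F600"]
units_per_code_point = [code_units(ch) for ch in examples]

text = "".join(examples)
n_utf8 = count_code_points_utf8(text.encode("utf-8"))
n_utf16 = count_code_points_utf16(text.encode("utf-16-le"))

print(units_per_code_point)
print(n_utf8, n_utf16)
```

Both counters walk the whole buffer once, so the asymptotic cost is the same; any difference in practice would come from constant factors and from how much of the data falls outside the single-unit range of each encoding.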

Re: [julia-users] Re: UTF16String or UTF8String?

2015-09-27 Thread Scott Jones
UTF-16 is much faster in many situations than UTF-8. It really depends a lot on just what you are doing, and the data you are processing. If it is mainly in North/South America, Western Europe, or Australia/NZ, UTF-8 does OK. UTF-8 is great for data interchange, but can really slow things down if

Re: [julia-users] Re: UTF16String or UTF8String?

2015-09-27 Thread Daniel Carrera
Thanks.
On 27 September 2015 at 20:42, Páll Haraldsson wrote:
> UTF-16 came earlier (strictly speaking, UCS-2) and Windows adopted it (it is also used elsewhere). UTF-8 is better in almost all cases (except for East Asian languages, and not even there if you use something like HTML (or I guess