On Mon, 20 Aug 2012 00:44:22 -0400, Roy Smith wrote:

> In article <5031bb2f$0$29972$c3e8da3$54964...@news.astraweb.com>,
>  Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> wrote:
> 
>> > So it may be with utf-8 someday.
>> 
>> Only if you believe that people's ability to generate data will remain
>> lower than people's ability to install more storage.
> 
> We're not talking *data*, we're talking *text*.  Most of those
> whatever-bytes people are generating are images, video, and music.  Text
> is a pittance compared to those.

Paul Rubin already told you about his experience using OCR to generate
multiple terabytes of text, and how he would not be happy if that was
stored in UCS-4.

HTML is text. XML is text. SVG is text. Source code is text. Email is 
text. (Well, it's actually bytes, but it looks like ASCII text.) Log 
files are text, and they can fill a hard drive pretty quickly. Lots of 
data is text.

Pittance or not, I do not believe that people will widely abandon compact 
storage formats like UTF-8 and Latin-1 for UCS-4 any time soon. Given 
that we're still trying to convince people to use UTF-8 over ASCII, I 
reckon it will be at least 40 years before there's even a slim chance of 
migrating from UTF-8 to UCS-4 in a widespread manner. In the IT world, 
that's close enough to "never" -- we might not even be using Unicode in 
2052.
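
To put rough numbers on the size difference, here is a minimal sketch in
Python 3. The sample string and the helper name encoded_sizes are made up
for illustration, and 'utf-32-be' is used as a stand-in for UCS-4 because
that variant does not prepend a byte order mark.

```python
# Hypothetical sample: an ASCII-only log line, invented for this example.
sample = "2012-08-20 00:44:22 INFO request served in 12 ms\n"

def encoded_sizes(text):
    """Return a dict mapping encoding name to encoded size in bytes."""
    # 'utf-32-be' stands in for UCS-4; the -be variant adds no 4-byte BOM.
    encodings = ("latin-1", "utf-8", "utf-32-be")
    return {enc: len(text.encode(enc)) for enc in encodings}

sizes = encoded_sizes(sample)
for enc, size in sizes.items():
    print(enc, size)
```

For ASCII-only text like this, Latin-1 and UTF-8 both use one byte per
character and the UCS-4 stand-in uses four, so the same log file would
take four times the space. Text with many non-ASCII characters narrows
the gap, since UTF-8 then needs two to four bytes for each of those.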

In any case, time will tell who is right.



-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list
