John Millikin jmilli...@gmail.com writes:
The reason many Japanese and Chinese users reject UTF-8 isn't due to
space constraints (UTF-8 and UTF-16 are roughly equal), it's because
they reject Unicode itself.
Probably because they don't think it's complicated enough¹?
Shift-JIS and the
Jinjing Wang wrote:
John Millikin wrote:
The reason many Japanese and Chinese users reject UTF-8 isn't due to
space constraints (UTF-8 and UTF-16 are roughly equal), it's because
they reject Unicode itself.
+1.
This is the thing Unicode advocates don't want to admit. Until Unicode has
code
Alright, here's the results for the first three in the list (please forgive
me for being lazy- I am a Haskell programmer after all):
ifeng.com:
UTF8: 299949
UTF16: 566610
dzh.mop.com:
GBK: 1866
UTF8: 1891
UTF16: 3684
www.csdn.net:
UTF8: 122870
UTF16: 217420
Seems like UTF8 is a consistent
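The byte counts above can be reproduced from first principles. A minimal sketch (using only `base`; the helper names are mine, not taken from any quoted code) that counts the bytes each codepoint needs in UTF-8 and in UTF-16:

```haskell
import Data.Char (ord)

-- Bytes needed for one codepoint in UTF-8.
utf8Bytes :: Char -> Int
utf8Bytes c
  | n < 0x80    = 1
  | n < 0x800   = 2
  | n < 0x10000 = 3
  | otherwise   = 4
  where n = ord c

-- Bytes needed in UTF-16: one 16-bit unit, or a surrogate pair above U+FFFF.
utf16Bytes :: Char -> Int
utf16Bytes c = if ord c < 0x10000 then 2 else 4

utf8Size, utf16Size :: String -> Int
utf8Size  = sum . map utf8Bytes
utf16Size = sum . map utf16Bytes
```

For pure ASCII, `utf16Size` is exactly double `utf8Size`; for CJK text (mostly three UTF-8 bytes per character against one UTF-16 unit) the ratio flips to 3:2 the other way. That the measured Chinese sites still come out smaller in UTF-8 suggests their pages are dominated by ASCII markup rather than CJK prose.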
On Wed, Aug 18, 2010 at 2:12 AM, John Meacham j...@repetae.net wrote:
ranty thing to follow
That said, there is never a reason to use UTF-16, it is a vestigial
remnant from the brief period when it was thought 16 bits would be
enough for the Unicode standard, any defense of it nowadays is
Johan Tibell johan.tib...@gmail.com writes:
Text continues to be UTF-16 today because
* no one has written a benchmark that shows that UTF-8 would be faster
*for use in Data.Text*, and
* no one has written a patch that converts Text to use UTF-8 internally.
I'm quite frustrated by
On Wed, Aug 18, 2010 at 2:39 PM, Johan Tibell johan.tib...@gmail.com wrote:
On Wed, Aug 18, 2010 at 2:12 AM, John Meacham j...@repetae.net wrote:
ranty thing to follow
That said, there is never a reason to use UTF-16, it is a vestigial
remnant from the brief period when it was thought 16
On 18 August 2010 15:04, Michael Snoyman mich...@snoyman.com wrote:
For me, the whole point of this discussion was to
determine whether we should attempt porting to UTF-8, which as I understand
it would be a rather large undertaking.
And the answer to that is, yes but only if we have good
Hi Michael,
On Wed, Aug 18, 2010 at 4:04 PM, Michael Snoyman mich...@snoyman.com wrote:
Here's my response to the two points:
* I haven't written a patch showing that Data.Text would be faster using
UTF-8 because that would require fulfilling the second point (I'll get to in
a second). I
On Wed, Aug 18, 2010 at 4:12 AM, wren ng thornton w...@freegeek.org wrote:
There was a study recently on this. They found that there are four main
parts of the Internet:
* a densely connected core, where from any site you can get to any other
* an in cone, from which you can reach the core
On Wed, Aug 18, 2010 at 6:24 PM, Johan Tibell johan.tib...@gmail.com wrote:
Hi Michael,
On Wed, Aug 18, 2010 at 4:04 PM, Michael Snoyman mich...@snoyman.com wrote:
Here's my response to the two points:
* I haven't written a patch showing that Data.Text would be faster using
UTF-8 because
On Wed, Aug 18, 2010 at 10:12 AM, Michael Snoyman mich...@snoyman.com wrote:
While working on optimizing Hamlet I started playing around with the
BigTable benchmark. I wrote two blog posts on the topic:
http://www.snoyman.com/blog/entry/bigtable-benchmarks/
On Wed, Aug 18, 2010 at 7:12 PM, Michael Snoyman mich...@snoyman.com wrote:
On Wed, Aug 18, 2010 at 6:24 PM, Johan Tibell johan.tib...@gmail.com wrote:
Sorry, I thought I'd sent these out. While working on optimizing Hamlet I
started playing around with the BigTable benchmark. I wrote two
On Wed, Aug 18, 2010 at 11:58 PM, Johan Tibell johan.tib...@gmail.com wrote:
As for blaze I'm not sure exactly how it deals with UTF-8 input. I tried to
browse through the repo but couldn't find where input ByteStrings are actually
validated. If they're not, it's a bit generous to say that
Benedikt Huber benj...@gmx.net writes:
Despite of all this, I think the performance of the text
package is very promising, and hope it will improve further!
I agree, Data.Text is great. Unfortunately, its internal use of UTF-16
makes it inefficient for many purposes.
A large fraction -
On Tue, Aug 17, 2010 at 10:08 AM, Ketil Malde ke...@malde.org wrote:
Benedikt Huber benj...@gmx.net writes:
Despite of all this, I think the performance of the text
package is very promising, and hope it will improve further!
I agree, Data.Text is great. Unfortunately, its internal use
On Tue, Aug 17, 2010 at 9:08 AM, Ketil Malde ke...@malde.org wrote:
Benedikt Huber benj...@gmx.net writes:
Despite of all this, I think the performance of the text
package is very promising, and hope it will improve further!
I agree, Data.Text is great. Unfortunately, its internal use of
Hello Johan,
Tuesday, August 17, 2010, 12:20:37 PM, you wrote:
I agree, Data.Text is great. Unfortunately, its internal use of UTF-16
makes it inefficient for many purposes.
It's not clear to me that using UTF-16 internally does make
Data.Text noticeably slower.
not slower but require
On Tue, Aug 17, 2010 at 10:34, Bulat Ziganshin bulat.zigans...@gmail.com wrote:
Hello Johan,
Tuesday, August 17, 2010, 12:20:37 PM, you wrote:
I agree, Data.Text is great. Unfortunately, its internal use of UTF-16
makes it inefficient for many purposes.
It's not clear to me that
Hi Bulat,
On Tue, Aug 17, 2010 at 10:34 AM, Bulat Ziganshin bulat.zigans...@gmail.com
wrote:
It's not clear to me that using UTF-16 internally does make
Data.Text noticeably slower.
not slower but require 2x more memory. speed is the same since
Unicode contains 2^20 codepoints
Yes, in
Hello Johan,
Tuesday, August 17, 2010, 1:06:30 PM, you wrote:
So it's not clear to me that using UTF-16 makes the program
noticeably slower or use more memory on a real program.
it's a clear misunderstanding. of course, not every program holds much
text data in memory. but some do, and here
Hello Tako,
Tuesday, August 17, 2010, 12:46:35 PM, you wrote:
not slower but require 2x more memory. speed is the same since
Unicode contains 2^20 codepoints
This is not entirely correct because it all depends on your data.
of course i mean ascii chars
--
Best regards,
Bulat
I agree, Data.Text is great. Unfortunately, its internal use of UTF-16
makes it inefficient for many purposes.
In the first iteration of the Text package, UTF-16 was chosen because
it had a nice balance of arithmetic overhead and space. The
arithmetic for UTF-8 started to have serious
Johan Tibell johan.tib...@gmail.com writes:
It's not clear to me that using UTF-16 internally does make Data.Text
noticeably slower.
I haven't benchmarked it, but I'm fairly sure that, if you try to fit a
3Gbyte file (the Human genome, say¹), into a computer with 4Gbytes of
RAM, UTF-16 will
Ketil Malde ke...@malde.org writes:
Johan Tibell johan.tib...@gmail.com writes:
It's not clear to me that using UTF-16 internally does make Data.Text
noticeably slower.
I haven't benchmarked it, but I'm fairly sure that, if you try to fit a
3Gbyte file (the Human genome, say¹), into a
Ketil == Ketil Malde ke...@malde.org writes:
Ketil Johan Tibell johan.tib...@gmail.com writes:
It's not clear to me that using UTF-16 internally does make
Data.Text noticeably slower.
Ketil I think that *IF* we are aiming for a single, grand, unified
Ketil text library to
Hello Tom,
Tuesday, August 17, 2010, 2:09:09 PM, you wrote:
In the first iteration of the Text package, UTF-16 was chosen because
it had a nice balance of arithmetic overhead and space. The
arithmetic for UTF-8 started to have serious performance impacts in
situations where the entire
2010/8/17 Bulat Ziganshin bulat.zigans...@gmail.com:
Hello Tom,
snip
i don't understand what you mean. are you support all 2^20 codepoints
in Data.Text package?
Bulat,
Yes, its internal representation is UTF-16, which is capable of
encoding *any* valid Unicode codepoint.
-- Tom
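Surrogate pairs are how UTF-16 reaches the planes above U+FFFF, which is what lets it encode every valid codepoint. A minimal sketch (the helper name is mine) of the standard encoding step, checked against the textbook example U+1D11E, which encodes as the pair D834 DD1E:

```haskell
import Data.Char (ord)
import Data.Bits (shiftR, (.&.))

-- Encode a codepoint above U+FFFF as a UTF-16 surrogate pair (high, low).
toSurrogatePair :: Char -> (Int, Int)
toSurrogatePair c = (0xD800 + (n `shiftR` 10), 0xDC00 + (n .&. 0x3FF))
  where n = ord c - 0x10000
```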
Ketil Malde wrote:
I haven't benchmarked it, but I'm fairly sure that, if you try to fit a
3Gbyte file (the Human genome, say¹), into a computer with 4Gbytes of
RAM, UTF-16 will be slower than UTF-8...
I don't think the genome is typical text. And
I doubt that is true if that text is in a CJK
Tom Harper rtomhar...@gmail.com writes:
2010/8/17 Bulat Ziganshin bulat.zigans...@gmail.com:
Hello Tom,
snip
i don't understand what you mean. are you support all 2^20 codepoints
in Data.Text package?
Bulat,
Yes, its internal representation is UTF-16, which is capable of
encoding
Ivan Lazar Miljenovic wrote:
Tom Harper rtomhar...@gmail.com writes:
2010/8/17 Bulat Ziganshin bulat.zigans...@gmail.com:
Hello Tom,
snip
i don't understand what you mean. are you support all 2^20 codepoints
in Data.Text package?
Bulat,
Yes, its internal representation is UTF-16, which
Miguel Mitrofanov miguelim...@yandex.ru writes:
Ivan Lazar Miljenovic wrote:
Tom Harper rtomhar...@gmail.com writes:
2010/8/17 Bulat Ziganshin bulat.zigans...@gmail.com:
Hello Tom,
snip
i don't understand what you mean. are you support all 2^20 codepoints
in Data.Text package?
Bulat,
On Tue, Aug 17, 2010 at 1:50 PM, Yitzchak Gale g...@sefer.org wrote:
Ketil Malde wrote:
I haven't benchmarked it, but I'm fairly sure that, if you try to fit a
3Gbyte file (the Human genome, say¹), into a computer with 4Gbytes of
RAM, UTF-16 will be slower than UTF-8...
I don't think the
On Tue, Aug 17, 2010 at 12:54, Ivan Lazar Miljenovic
ivan.miljeno...@gmail.com wrote:
Tom Harper rtomhar...@gmail.com writes:
2010/8/17 Bulat Ziganshin bulat.zigans...@gmail.com:
Hello Tom,
snip
i don't understand what you mean. are you support all 2^20 codepoints
in Data.Text
Ivan Lazar Miljenovic ivan.miljeno...@gmail.com writes:
Seeing as how the genome just uses 4 base letters,
Yes, the bulk of the data is not really text at all, but each sequence
(it's fragmented due to the molecular division into chromosomes, and
due to incompleteness) also has a textual
Hello Tako,
Tuesday, August 17, 2010, 3:03:20 PM, you wrote:
Unless a Char in Haskell is 32 bits (or at least more than 16 bits)
it can NOT encode all Unicode points.
it's 32 bit
--
Best regards,
Bulat  mailto:bulat.zigans...@gmail.com
Hi Ketil,
On Tue, Aug 17, 2010 at 12:09 PM, Ketil Malde ke...@malde.org wrote:
Johan Tibell johan.tib...@gmail.com writes:
It's not clear to me that using UTF-16 internally does make Data.Text
noticeably slower.
I haven't benchmarked it, but I'm fairly sure that, if you try to fit a
On Tue, Aug 17, 2010 at 12:39 PM, Bulat Ziganshin bulat.zigans...@gmail.com
wrote:
Hello Tom,
Tuesday, August 17, 2010, 2:09:09 PM, you wrote:
In the first iteration of the Text package, UTF-16 was chosen because
it had a nice balance of arithmetic overhead and space. The
arithmetic
On Tue, Aug 17, 2010 at 1:05 PM, Bulat Ziganshin
bulat.zigans...@gmail.com wrote:
Hello Tako,
Tuesday, August 17, 2010, 3:03:20 PM, you wrote:
Unless a Char in Haskell is 32 bits (or at least more than 16 bits)
it can NOT encode all Unicode points.
it's 32 bit
Like Bulat said it's 32
Tako Schotanus t...@codejive.org writes:
On Tue, Aug 17, 2010 at 12:54, Ivan Lazar Miljenovic
ivan.miljeno...@gmail.com wrote:
Tom Harper rtomhar...@gmail.com writes:
2010/8/17 Bulat Ziganshin bulat.zigans...@gmail.com:
Hello Tom,
snip
i don't understand what you mean. are you
Michael Snoyman mich...@snoyman.com writes:
I don't think *anyone* is asserting that UTF-16 is a common encoding
for files anywhere,
*ahem*
http://en.wikipedia.org/wiki/UTF-16/UCS-2#Use_in_major_operating_systems_and_environments
--
Ivan Lazar Miljenovic
ivan.miljeno...@gmail.com
On Tue, Aug 17, 2010 at 13:00, Michael Snoyman mich...@snoyman.com wrote:
On Tue, Aug 17, 2010 at 1:50 PM, Yitzchak Gale g...@sefer.org wrote:
Ketil Malde wrote:
I haven't benchmarked it, but I'm fairly sure that, if you try to fit a
3Gbyte file (the Human genome, say¹), into a computer
On Tue, Aug 17, 2010 at 2:20 PM, Ivan Lazar Miljenovic
ivan.miljeno...@gmail.com wrote:
Michael Snoyman mich...@snoyman.com writes:
I don't think *anyone* is asserting that UTF-16 is a common encoding
for files anywhere,
*ahem*
On Tue, Aug 17, 2010 at 13:29, Ketil Malde ke...@malde.org wrote:
Tako Schotanus t...@codejive.org writes:
Just like Char is capable of encoding any valid Unicode codepoint.
Unless a Char in Haskell is 32 bits (or at least more than 16 bits) it
can NOT encode all Unicode points.
And
Michael Snoyman mich...@snoyman.com writes:
As far as space usage, you are correct that CJK data will take up more
memory in UTF-8 than UTF-16.
With the danger of sounding ... alphabetist? as well as belaboring a
point I agree is irrelevant (the storage format):
I'd point out that it seems
Ivan == Ivan Lazar Miljenovic ivan.miljeno...@gmail.com writes:
Char is not an encoding, right?
Ivan No, but in GHC at least it corresponds to a Unicode codepoint.
I don't think this is right, or shouldn't be right, anyway.. Surely it
stands for a character. Unicode codepoints include
Yitzchak Gale g...@sefer.org writes:
I don't think the genome is typical text.
I think the typical *large* collection of text is text-encoded data, and
not, for lack of a better word, literature. Genomics data is just an
example.
-k
--
If I haven't seen further, it is by standing in the
On Tue, Aug 17, 2010 at 13:40, Ketil Malde ke...@malde.org wrote:
Michael Snoyman mich...@snoyman.com writes:
As far as space usage, you are correct that CJK data will take up more
memory in UTF-8 than UTF-16.
With the danger of sounding ... alphabetist? as well as belaboring a
point I
On Aug 17, 1:55 pm, Tako Schotanus t...@codejive.org wrote:
I'll repeat here that in my opinion a Text package should be good at
handling text, human text, from whatever country. If I need to handle large
streams of ASCII I'll use something else.
I would mostly agree.
However, a key use
Michael Snoyman wrote:
Regarding the data: you haven't actually quoted any
statistics about the prevalence of CJK data
True, I haven't seen any - except for Google, which
I don't believe is accurate. I would like to see some
good unbiased data.
Right now we just have our intuitions based on
Colin Paul Adams co...@colina.demon.co.uk writes:
Char is not an encoding, right?
Ivan No, but in GHC at least it corresponds to a Unicode codepoint.
I don't think this is right, or shouldn't be right, anyway.. Surely it
stands for a character. Unicode codepoints include non-characters
Ketil Malde wrote:
I'd point out that it seems at least as unfair to optimize for CJK at
the cost of Western languages.
Quite true.
[...speculative calculation from which we conclude that]
a given document translated
between Chinese and English should occupy roughly the same space in
On Tue, Aug 17, 2010 at 1:36 PM, Tako Schotanus t...@codejive.org wrote:
Yeah, I tried looking it up but couldn't find the technical definition for
Char; in the end I found that maxBound was 0x10FFFF, making it
basically 24 bits :)
I think that's enough to represent all the assigned
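This is easy to confirm in GHC, where Char's upper bound is the last Unicode codepoint:

```haskell
-- GHC's Char covers the full Unicode codepoint range 0 .. 0x10FFFF.
charMax :: Int
charMax = fromEnum (maxBound :: Char)  -- 1114111, i.e. 0x10FFFF
```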
Hello, Ketil Malde!
On Tue, Aug 17, 2010 at 8:02 AM, Ketil Malde ke...@malde.org wrote:
Ivan Lazar Miljenovic ivan.miljeno...@gmail.com writes:
Seeing as how the genome just uses 4 base letters,
Yes, the bulk of the data is not really text at all, but each sequence
(it's fragmented due to
Johan == Johan Tibell johan.tib...@gmail.com writes:
Johan On Tue, Aug 17, 2010 at 1:36 PM, Tako Schotanus t...@codejive.org
wrote:
Johan Yeah, I tried looking it up but I could find the
Johan technical definition for Char, but in the end I found that
Johan maxBound was
On Tue, Aug 17, 2010 at 2:23 PM, Yitzchak Gale g...@sefer.org wrote:
Michael Snoyman wrote:
Regarding the data: you haven't actually quoted any
statistics about the prevalence of CJK data
True, I haven't seen any - except for Google, which
I don't believe is accurate. I would like to see
On Tue, Aug 17, 2010 at 3:23 PM, Yitzchak Gale g...@sefer.org wrote:
Michael Snoyman wrote:
Regarding the data: you haven't actually quoted any
statistics about the prevalence of CJK data
True, I haven't seen any - except for Google, which
I don't believe is accurate. I would like to see
Sounds to me like we need a lazy Data.Text variation that allows UTF-8 and
UTF-16 segments in its list of strict text elements :) Then big chunks of
western text will be encoded efficiently, and same with CJK! Not sure what
to do about strict Data.Text though :)
On Tue, Aug 17, 2010 at 1:40 PM,
Felipe Lessa felipe.le...@gmail.com writes:
[-snip- I've already spent too much time on the other stuff :-]
And what do you think about creating a real SeqData data type
with two bases per byte? In terms of processing speed I guess
there will be a small penalty, but if you need to have large
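The idea of a dedicated sequence type can be sketched with plain bit twiddling: at two bits per base, four nucleotides fit in one byte. This is my own illustrative sketch (the names are hypothetical, not an actual SeqData implementation):

```haskell
import Data.Word (Word8)
import Data.Bits (shiftL, shiftR, (.|.), (.&.))

-- Two bits per base: A=0, C=1, G=2, T=3.
baseCode :: Char -> Word8
baseCode 'A' = 0
baseCode 'C' = 1
baseCode 'G' = 2
baseCode 'T' = 3
baseCode b   = error ("not a base: " ++ [b])

baseChar :: Word8 -> Char
baseChar n = "ACGT" !! fromIntegral n

-- Pack four bases into one byte; unpack them again.
pack4 :: (Char, Char, Char, Char) -> Word8
pack4 (a, b, c, d) =
  baseCode a `shiftL` 6 .|. baseCode b `shiftL` 4
    .|. baseCode c `shiftL` 2 .|. baseCode d

unpack4 :: Word8 -> (Char, Char, Char, Char)
unpack4 w =
  ( baseChar (w `shiftR` 6)
  , baseChar (w `shiftR` 4 .&. 3)
  , baseChar (w `shiftR` 2 .&. 3)
  , baseChar (w .&. 3) )
```

Against a String at (in GHC) several machine words per character, this is a 16x or better space saving, at the cost of a shift-and-mask on every access.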
Someone mentioned earlier that IMHO all of this messing around with
encodings and conversions should be handled transparently, and I guess
you could do something like have the internal representation be along
the lines of Either UTF8 UTF16 (or perhaps even more encodings), and
then implement every
(Actually, this seems more like a job for a type class.)
2010/8/17 Gábor Lehel illiss...@gmail.com:
Someone mentioned earlier that IMHO all of this messing around with
encodings and conversions should be handled transparently, and I guess
you could do something like have the internal
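The type-class version of that design could look roughly like this (entirely hypothetical names; a sketch of the shape, not a worked-out library): each encoding implements the same operations, and user code is written against the class rather than a concrete representation.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import Data.Word (Word8, Word16)

-- Hypothetical sketch: every representation supplies the same operations.
class TextRepr r where
  reprLength :: r -> Int   -- length in codepoints
  reprBytes  :: r -> Int   -- storage size in bytes

newtype Utf8Text  = Utf8Text  [Word8]
newtype Utf16Text = Utf16Text [Word16]

instance TextRepr Utf8Text where
  -- Count lead bytes: anything outside the 0x80..0xBF continuation range.
  reprLength (Utf8Text bs) = length [b | b <- bs, b < 0x80 || b >= 0xC0]
  reprBytes  (Utf8Text bs) = length bs

instance TextRepr Utf16Text where
  -- Count units that are not low surrogates (0xDC00..0xDFFF).
  reprLength (Utf16Text us) = length [u | u <- us, u < 0xDC00 || u > 0xDFFF]
  reprBytes  (Utf16Text us) = 2 * length us

-- User code works against any representation, chosen at runtime.
data SomeText = forall r. TextRepr r => SomeText r

textLength :: SomeText -> Int
textLength (SomeText r) = reprLength r
</code>
```

The cost, as the thread goes on to discuss, is that every text operation has to be implemented once per encoding.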
On Tue, Aug 17, 2010 at 06:12, Michael Snoyman mich...@snoyman.com wrote:
I'm not talking about API changes here; the topic at hand is the internal
representation of the stream of characters used by the text package. That is
currently UTF-16; I would argue switching to UTF8.
The
On Tue, Aug 17, 2010 at 6:19 PM, John Millikin jmilli...@gmail.com wrote:
Ruby, which has an enormous Japanese userbase, solved the problem by
essentially defining Text = (Encoding, ByteString), and then
re-implementing text logic for each encoding. This allows very
efficient operation with
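A rough Haskell rendering of the Ruby design (my own sketch, just to make the shape concrete): text is raw bytes paired with an encoding tag, and every operation dispatches on the tag.

```haskell
import Data.Word (Word8)

-- Ruby-style: text is raw bytes plus an encoding tag.
data Encoding = ASCII | UTF8 | UTF16BE | ShiftJIS
  deriving (Show, Eq)

data TaggedText = TaggedText Encoding [Word8]

byteLength :: TaggedText -> Int
byteLength (TaggedText _ bs) = length bs

-- Character-level operations need separate logic per encoding.
charLength :: TaggedText -> Int
charLength (TaggedText ASCII   bs) = length bs
charLength (TaggedText UTF8    bs) = length [b | b <- bs, b < 0x80 || b >= 0xC0]
charLength (TaggedText UTF16BE bs) = length bs `div` 2  -- ignoring surrogates, for brevity
charLength (TaggedText ShiftJIS _) = error "each encoding must be implemented separately"
```

The appeal is zero-cost I/O in the text's native encoding; the price, visible in the last clause, is that the text logic multiplies by the number of supported encodings.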
Quoth John Millikin jmilli...@gmail.com,
Ruby, which has an enormous Japanese userbase, solved the problem by
essentially defining Text = (Encoding, ByteString), and then
re-implementing text logic for each encoding. This allows very
efficient operation with every possible encoding, at the
On Tue, Aug 17, 2010 at 9:30 PM, Donn Cave d...@avvanta.com wrote:
Quoth John Millikin jmilli...@gmail.com,
Ruby, which has an enormous Japanese userbase, solved the problem by
essentially defining Text = (Encoding, ByteString), and then
re-implementing text logic for each encoding. This
Hi michael, here is a web site http://zh.wikipedia.org/zh-cn/. It is the
wikipedia for Chinese.
-Andrew
On Tue, Aug 17, 2010 at 7:00 PM, Michael Snoyman mich...@snoyman.com wrote:
On Tue, Aug 17, 2010 at 1:50 PM, Yitzchak Gale g...@sefer.org wrote:
Ketil Malde wrote:
I haven't benchmarked
On Tue, Aug 17, 2010 at 03:21:32PM +0200, Daniel Peebles wrote:
Sounds to me like we need a lazy Data.Text variation that allows UTF-8 and
UTF-16 segments in its list of strict text elements :) Then big chunks of
western text will be encoded efficiently, and same with CJK! Not sure what
to do
Bulat Ziganshin wrote:
Johan wrote:
So it's not clear to me that using UTF-16 makes the program
noticeably slower or use more memory on a real program.
it's a clear misunderstanding. of course, not every program holds much
text data in memory. but some do, and here you will double memory
On Tue, Aug 17, 2010 at 12:30, Donn Cave d...@avvanta.com wrote:
If Haskell had the development resources to make something like this
work, would it actually take the form of a Haskell-level type like
that - data Text = (Encoding, ByteString)? I mean, I know that's
just a very clear and
Michael Snoyman wrote:
On Tue, Aug 17, 2010 at 2:20 PM, Ivan Lazar Miljenovic
ivan.miljeno...@gmail.com wrote:
Michael Snoyman mich...@snoyman.com writes:
I don't think *anyone* is asserting that UTF-16 is a common encoding
for files anywhere,
*ahem*
John Millikin wrote:
The reason many Japanese and Chinese users reject UTF-8 isn't due to
space constraints (UTF-8 and UTF-16 are roughly equal), it's because
they reject Unicode itself.
+1.
This is the thing Unicode advocates don't want to admit. Until Unicode
has code points for _all_
Johan Tibell wrote:
To my knowledge the data we have about prevalence of encoding on the web is
accurate. We crawl all pages we can get our hands on, by starting at some
set of seeds and then following all the links. You cannot be sure that
you've reached all web sites as there might be cliques
On 18 August 2010 12:12, wren ng thornton w...@freegeek.org wrote:
Johan Tibell wrote:
To my knowledge the data we have about prevalence of encoding on the web
is
accurate. We crawl all pages we can get our hands on, by starting at some
set of seeds and then following all the links. You
On Aug 17, 2010, at 11:51 PM, Ketil Malde wrote:
Yitzchak Gale g...@sefer.org writes:
I don't think the genome is typical text.
I think the typical *large* collection of text is text-encoded data, and
not, for lack of a better word, literature. Genomics data is just an
example.
I have
Ivan Lazar Miljenovic wrote:
On 18 August 2010 12:12, wren ng thornton w...@freegeek.org wrote:
Johan Tibell wrote:
To my knowledge the data we have about prevalence of encoding on the web
is
accurate. We crawl all pages we can get our hands on, by starting at some
set of seeds and then
Well, I'm not certain if it counts as a typical Chinese website, but here
are the stats;
UTF8: 64,198
UTF16: 113,160
And just for fun, after gziping:
UTF8: 17,708
UTF16: 19,367
On Wed, Aug 18, 2010 at 2:59 AM, anderson leo fireman...@gmail.com wrote:
Hi michael, here is a web site
John Millikin wrote:
The reason many Japanese and Chinese users reject UTF-8 isn't due to
space constraints (UTF-8 and UTF-16 are roughly equal), it's because
they reject Unicode itself.
+1.
This is the thing Unicode advocates don't want to admit. Until Unicode has
code points for _all_
Hi Bulat,
On Monday 16 August 2010 07:35:44, Bulat Ziganshin wrote:
Hello Daniel,
Sunday, August 15, 2010, 10:39:24 PM, you wrote:
That's great. If that performance difference is a show stopper, one
shouldn't go higher-level than C anyway :)
*all* speed measurements that find Haskell is
On 16.08.10 14:44, Daniel Fischer wrote:
Hi Bulat,
On Monday 16 August 2010 07:35:44, Bulat Ziganshin wrote:
Hello Daniel,
Sunday, August 15, 2010, 10:39:24 PM, you wrote:
That's great. If that performance difference is a show stopper, one
shouldn't go higher-level than C anyway :)
*all*
On Sat, Aug 14, 2010 at 10:46 PM, Michael Snoyman mich...@snoyman.com wrote:
When I'm writing a web app, my code is sitting on a Linux system where the
default encoding is UTF-8, communicating with a database speaking UTF-8,
receiving request bodies in UTF-8 and sending response bodies in
Quoth John Millikin jmilli...@gmail.com,
I don't see why [Char] is obvious -- you'd never use [Word8] for
storing binary data, right? [Char] is popular because it's the default
type for string literals, and due to simple inertia, but when there's
a type based on packed arrays there's no
On Sat, Aug 14, 2010 at 10:07 PM, Donn Cave d...@avvanta.com wrote:
Am I confused about this? It's why I can't see Text ever being
simply the obvious choice. [Char] will continue to be the obvious
choice if you want a functional data type that supports pattern
matching etc.
Actually,
Bryan == Bryan O'Sullivan b...@serpentine.com writes:
Bryan On Sat, Aug 14, 2010 at 10:46 PM, Michael Snoyman
mich...@snoyman.com wrote:
Bryan When I'm writing a web app, my code is sitting on a Linux
Bryan system where the default encoding is UTF-8, communicating
Bryan with
Hi Colin,
On Sun, Aug 15, 2010 at 9:34 AM, Colin Paul Adams
co...@colina.demon.co.uk wrote:
But UTF-16 (apart from being an abomination for creating a hole in the
codepoint space and making it impossible to ever extend it) is slow to
process compared with UTF-32 - you can't get the nth
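The indexing point in concrete terms: in a fixed-width encoding the nth codepoint is a single array index, while UTF-8 (and UTF-16, once surrogates are involved) must scan from the start. A sketch over a UTF-8 byte list (helper names are mine):

```haskell
import Data.Word (Word8)
import Data.Bits ((.&.))

-- Length of a UTF-8 sequence, read off its lead byte.
seqLen :: Word8 -> Int
seqLen b
  | b < 0x80           = 1
  | b .&. 0xE0 == 0xC0 = 2
  | b .&. 0xF0 == 0xE0 = 3
  | otherwise          = 4

-- Skip n codepoints: an O(n) scan, where UTF-32 would be one array index.
dropCodepoints :: Int -> [Word8] -> [Word8]
dropCodepoints _ []       = []
dropCodepoints 0 bs       = bs
dropCodepoints n (b : bs) = dropCodepoints (n - 1) (drop (seqLen b - 1) bs)
```

This is one reason Data.Text deliberately exposes stream and fold operations rather than random indexing: on a variable-width representation, indexing is the operation that cannot be made cheap.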
Don Stewart d...@galois.com writes:
* Pay attention to Haskell Cafe announcements
* Follow the Reddit Haskell news.
* Read the quarterly reports on Hackage
* Follow Planet Haskell
And yet there are still many packages that fall under the radar with no
announcements of any
2010/8/15 Ivan Lazar Miljenovic ivan.miljeno...@gmail.com:
Don Stewart d...@galois.com writes:
* Pay attention to Haskell Cafe announcements
* Follow the Reddit Haskell news.
* Read the quarterly reports on Hackage
* Follow Planet Haskell
And yet there are still many
Don Stewart wrote:
So, to stay up to date, but without drowning in data. Do one of:
* Pay attention to Haskell Cafe announcements
* Follow the Reddit Haskell news.
* Read the quarterly reports on Hackage
* Follow Planet Haskell
Interesting. Obviously I look at Haskell Cafe
Vo Minh Thu not...@gmail.com writes:
2010/8/15 Ivan Lazar Miljenovic ivan.miljeno...@gmail.com:
Don Stewart d...@galois.com writes:
* Pay attention to Haskell Cafe announcements
* Follow the Reddit Haskell news.
* Read the quarterly reports on Hackage
* Follow Planet
2010/8/15 Ivan Lazar Miljenovic ivan.miljeno...@gmail.com:
Vo Minh Thu not...@gmail.com writes:
2010/8/15 Ivan Lazar Miljenovic ivan.miljeno...@gmail.com:
Don Stewart d...@galois.com writes:
* Pay attention to Haskell Cafe announcements
* Follow the Reddit Haskell news.
* Read
On 8/15/10 03:01 , Bryan O'Sullivan wrote:
On Sat, Aug 14, 2010 at 10:07 PM, Donn Cave d...@avvanta.com wrote:
We'll have a three way choice between programming
elegance, correctness and efficiency. If Haskell
No, not really. Linked lists are very easy to deal with recursively and
Strings automatically work with any already-defined list functions.
On Sun, Aug 15, 2010 at 11:17 AM, Brandon S Allbery KF8NH
allb...@ece.cmu.edu wrote:
More to the point, there's nothing elegant about [Char] --- its sole
Quoth Bryan O'Sullivan b...@serpentine.com,
On Sat, Aug 14, 2010 at 10:07 PM, Donn Cave d...@avvanta.com wrote:
...
ByteString will continue to be the obvious choice
for big data loads.
Don't confuse I have big data with I need bytes. If you are working with
bytes, use bytestring. If you are
Quoth Bill Atkins watk...@alum.rpi.edu,
No, not really. Linked lists are very easy to deal with recursively and
Strings automatically work with any already-defined list functions.
Yes, they're great - a terrible mistake, for a practical programming
language, but if you fail to recognize the
On Sun, Aug 15, 2010 at 12:50 PM, Donn Cave d...@avvanta.com wrote:
I wonder how many ByteString users are `working with bytes', in the
sense you apparently mean where the bytes are not text characters.
My impression is that in practice, there is a sizeable contingent
out here using
On 8/15/10 11:25 , Bill Atkins wrote:
No, not really. Linked lists are very easy to deal with recursively and
Strings automatically work with any already-defined list functions.
On Sun, Aug 15, 2010 at 11:17 AM, Brandon S Allbery KF8NH
Donn Cave wrote:
I wonder how many ByteString users are `working with bytes', in the
sense you apparently mean where the bytes are not text characters.
My impression is that in practice, there is a sizeable contingent
out here using ByteString.Char8 and relatively few applications for
the Word8
Donn Cave wrote:
Quoth Bill Atkins watk...@alum.rpi.edu,
No, not really. Linked lists are very easy to deal with recursively and
Strings automatically work with any already-defined list functions.
Yes, they're great - a terrible mistake, for a practical programming
language, but if
On 8/15/10 13:53 , Andrew Coppin wrote:
injection attacks, the Y2K bug, programs that can't handle files larger than
2GB or that don't understand Unicode, and so forth. All things that could
have been almost trivially avoided if everybody wasn't so
On Sat, Aug 14, 2010 at 6:05 PM, Bryan O'Sullivan b...@serpentine.com wrote:
- If it's not good enough, and the fault lies in a library you chose,
report a bug and provide a test case.
As a case in point, I took the string search benchmark that Daniel shared
on Friday, and boiled it
Brandon S Allbery KF8NH wrote:
(Remember that Unix is itself a practical example of a research platform
avoiding success at any cost gone horribly wrong.)
I haven't used Erlang myself, but I've heard it described in a similar
way. (I don't know how true that actually is...)
On Sunday 15 August 2010 20:04:01, Bryan O'Sullivan wrote:
On Sat, Aug 14, 2010 at 6:05 PM, Bryan O'Sullivan
b...@serpentine.com wrote:
- If it's not good enough, and the fault lies in a library you
chose, report a bug and provide a test case.
As a case in point, I took the string