Re: String != [Char]

2012-04-01 Thread Greg Weber
I am starting up the proposal.

http://hackage.haskell.org/trac/haskell-prime/ticket/143
http://hackage.haskell.org/trac/haskell-prime/wiki/OpaqueText

Unfortunately I haven't had any time to work on this for the last week
and won't for 2 more weeks.
Your help is appreciated. I think the first step is to move list
functions into a separate module for the Text package to see if we can
get rid of name conflicts with the Prelude.

On Sat, Mar 31, 2012 at 11:49 PM, Colin Paul Adams
co...@colina.demon.co.uk wrote:
 Gabriel == Gabriel Dos Reis g...@integrable-solutions.net writes:

    Gabriel On Mon, Mar 26, 2012 at 5:08 AM, Christian Siefkes

    Gabriel It is not the precision of Char or char that is the issue
    Gabriel here.  It has been clarified at several points that Char is
    Gabriel not a Unicode character, but a Unicode code point.

 That's not what the standard says:

 6.1.2 Characters and Strings

 The character type Char is an enumeration whose values represent Unicode
 characters [2].

 [2]   Unicode Consortium. Unicode
 standard. http://unicode.org/standard/standard.html.

 --
 Colin Adams
 Preston Lancashire
 ()  ascii ribbon campaign - against html e-mail
 /\  www.asciiribbon.org   - against proprietary attachments

 ___
 Haskell-prime mailing list
 Haskell-prime@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-prime

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-27 Thread Yitzchak Gale
Ben Millwood wrote:
 No-one's yet argued against OverloadedStrings. I think there /is/ an
 argument to be made, that it introduces ambiguity and could break
 existing programs (probably we can extend defaulting to take care of
 this, but I think there are people who'd be happier if we killed
 defaulting too).

For the record, I have a more fundamental objection to
OverloadedStrings. When library authors write partial functions
for the fromString method - and we see in practice that
they do so - it creates the shocking situation in which
invalid syntax of a literal is only caught at run time.

Instead, Haskell' should provide a pragma, or other
syntactic mechanism, by which compilers can allow
specific types other than String for string literals at
*compile* time in a monomorphic context. The
implementation of that syntax would be compiler
dependent, of course. For GHC, Template Haskell
could help. Another approach would be for a
compiler to have a built-in way of embedding, say,
Text literals directly in object code.

For people who like OverloadedStrings despite this
problem, it could be available via a pragma. But
never by default - it should be possible to get
fundamental string types like Text and ByteString
for string literals without having to turn on
OverloadedStrings. In GHC, quasi-quoters are the
right way to provide convenient user-defined syntax
for string-like types other than the canonical
string types.

Theoretically, polymorphism for numeric literals
is just as bad. But in practice, it seems to be much
more rare for people to implement fromIntegral
and fromRational with dangerous functions.

-Yitz

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


RE: String != [Char]

2012-03-26 Thread Simon Marlow
 The primary argument is to not break something that works well for most
 purposes, including teaching, at a huge cost of backwards compatibility
 for marginal if any real benefits.

I'm persuaded by this argument.  And I'm glad that teachers are speaking up in 
this debate - it's hard to get a balanced discussion on an obscure mailing list.

So I'm far from convinced that [Char] is a bad default for the String type.  
But it's important that as far as possible Text should not be a second class 
citizen, so I'd support adding OverloadedStrings to the language, and maybe 
looking at overloading some of the String APIs in the standard libraries.

Remember that FilePath is not part of the debate, since neither [Char] nor Text 
are correct representations of FilePath.

If we want to do an evaluation of the pedagogical value of [Char] vs. Text, I 
suggest writing something like a regex matcher in both and comparing the two.
 
One more thing: historically, performance considerations have been given a 
fairly low priority in the language design process for Haskell, and rightly so. 
 That doesn't mean performance has been ignored altogether (for example, seq), 
but it is almost never the case that a concession in other language design 
principles (e.g. consistency, simplicity) is made for performance reasons 
alone.  We should remember, when thinking about changes to Haskell, that 
Haskell is the way it is because of this uncompromising attitude, and we should 
be glad that Haskell is not burdened with (many) legacy warts that were 
invented to work around performance problems that no longer exist.  I'm not 
saying that this means we should ignore Text as a performance hack, just that 
performance should not come at the expense of good language design.

Cheers,
Simon



___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-26 Thread Christian Siefkes
On 03/26/2012 02:39 AM, Gabriel Dos Reis wrote:
 True, but should the language definition default to a string type
 that is one the most unsuited for text processing in the 21st
 century where global multilingualism abounds?  Even C has qualms
 about that.
...
 I have no doubt believing that if all texts my students have to
 process are US ASCII, [Char] is more than sufficient.  So, I have
 sympathy for your position.  However,  I doubt [Char] would be
 adequate if I ask them to shared texts from their diverse cultures.

Uh, while a C char is (usually) just a byte (2^8 bits of information, like
Word8 in Haskell), a Haskell Char is a Unicode character (2^21 bits of
information). A single C char cannot contain arbitrary Unicode character,
while a Haskell Char can, and does. Hence [Char] is (efficiency issues
aside) perfectly adequate for dealing with texts written in arbitrary languages.

Best regards
Christian

[I first accidentally send this just to Gabriel, sorry.]

-- 
|--- Dr. Christian Siefkes --- christ...@siefkes.net ---
| Homepage: http://www.siefkes.net/ | Blog: http://www.keimform.de/
|Peer Production Everywhere:   http://peerconomy.org/wiki/
|-- OpenPGP Key ID: 0x346452D8 --
A bug is a test case you haven't written yet.



signature.asc
Description: OpenPGP digital signature
___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-26 Thread Gabriel Dos Reis
On Mon, Mar 26, 2012 at 5:08 AM, Christian Siefkes
christ...@siefkes.net wrote:
 On 03/26/2012 02:39 AM, Gabriel Dos Reis wrote:
 True, but should the language definition default to a string type
 that is one the most unsuited for text processing in the 21st
 century where global multilingualism abounds?  Even C has qualms
 about that.
 ...
 I have no doubt believing that if all texts my students have to
 process are US ASCII, [Char] is more than sufficient.  So, I have
 sympathy for your position.  However,  I doubt [Char] would be
 adequate if I ask them to shared texts from their diverse cultures.

 Uh, while a C char is (usually) just a byte (2^8 bits of information, like
 Word8 in Haskell), a Haskell Char is a Unicode character (2^21 bits of
 information).

It is not the precision of Char or char that is the issue here.
It has been clarified at several points that Char is not a Unicode character,
but a Unicode code point.  Not every Unicode code point represents a
Unicode code character, and not every sequence of Unicode code points
represents a character or a sequence of Unicode character.

 A single C char cannot contain arbitrary Unicode character,
 while a Haskell Char can, and does. Hence [Char] is (efficiency issues
 aside) perfectly adequate for dealing with texts written in arbitrary 
 languages.

See above.

-- Gaby

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-26 Thread Christian Siefkes
On 03/26/2012 01:26 PM, Gabriel Dos Reis wrote:
 It is not the precision of Char or char that is the issue here.
 It has been clarified at several points that Char is not a Unicode character,
 but a Unicode code point.  Not every Unicode code point represents a
 Unicode code character, and not every sequence of Unicode code points
 represents a character or a sequence of Unicode character.

What do you mean? Every Unicode character corresponds to one code point, and
every code point in the range 0 to 0x10 (excluding the range 0xD800 to
0xDFFF which is reserved for surrogate pairs in UTF-16, and a handful of
noncharacters, see
http://en.wikipedia.org/wiki/Mapping_of_Unicode_characters#Special_code_points
) corresponds to one character.

Maybe your criticism is that Char does not explicitly prevent these special
code points from being assigned? While true, that seems a relatively minor
matter. Moreover, a future revision of the Haskell standard could easily
declare that a assigning a forbidden character results in an error/bottom
if that is so desired.

Best regards
Christian

-- 
|--- Dr. Christian Siefkes --- christ...@siefkes.net ---
| Homepage: http://www.siefkes.net/ | Blog: http://www.keimform.de/
|Peer Production Everywhere:   http://peerconomy.org/wiki/
|-- OpenPGP Key ID: 0x346452D8 --
Linux is like living in a tipi: no windows, no gates, Apache inside.



signature.asc
Description: OpenPGP digital signature
___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-26 Thread Thomas Schilling
On 26 March 2012 13:29, Christian Siefkes christ...@siefkes.net wrote:
 On 03/26/2012 01:26 PM, Gabriel Dos Reis wrote:
 It is not the precision of Char or char that is the issue here.
 It has been clarified at several points that Char is not a Unicode character,
 but a Unicode code point.  Not every Unicode code point represents a
 Unicode code character, and not every sequence of Unicode code points
 represents a character or a sequence of Unicode character.

 What do you mean? Every Unicode character corresponds to one code point, and
 every code point in the range 0 to 0x10 (excluding the range 0xD800 to
 0xDFFF which is reserved for surrogate pairs in UTF-16, and a handful of
 noncharacters, see
 http://en.wikipedia.org/wiki/Mapping_of_Unicode_characters#Special_code_points
 ) corresponds to one character.

I think it's best not to use the term Unicode character since it's
highly ambiguous, to quote from
http://www.icu-project.org/docs/papers/forms_of_unicode/:


We have seen that characters, glyphs, code points, and code units are
all different. Unfortunately the term character is vastly overloaded.
At various times people can use it to mean any of these things:

 - An image on paper (glyph)
 - What an end-user thinks of as a character (grapheme)
 - What a character encoding standard encodes (code point)
 - A memory storage unit in a character encoding (code unit)

Because of this, ironically, it is best to avoid the use of the term
character entirely when discussing character encodings, and stick to
the term code point.


The section http://www.icu-project.org/docs/papers/forms_of_unicode/#h0
is also important to keep in mind.



 Maybe your criticism is that Char does not explicitly prevent these special
 code points from being assigned? While true, that seems a relatively minor
 matter. Moreover, a future revision of the Haskell standard could easily
 declare that a assigning a forbidden character results in an error/bottom
 if that is so desired.

 Best regards
        Christian

 --
 |--- Dr. Christian Siefkes --- christ...@siefkes.net ---
 | Homepage: http://www.siefkes.net/ | Blog: http://www.keimform.de/
 |    Peer Production Everywhere:       http://peerconomy.org/wiki/
 |-- OpenPGP Key ID: 0x346452D8 --
 Linux is like living in a tipi: no windows, no gates, Apache inside.


 ___
 Haskell-prime mailing list
 Haskell-prime@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-prime




-- 
Push the envelope. Watch it bend.

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-26 Thread Gábor Lehel
On Sun, Mar 25, 2012 at 5:19 AM, Greg Weber g...@gregweber.info wrote:
 On Sat, Mar 24, 2012 at 7:26 PM, Gabriel Dos Reis
 g...@integrable-solutions.net wrote:
 On Sat, Mar 24, 2012 at 9:09 PM, Greg Weber g...@gregweber.info wrote:

 Problem: we want to write beautiful (and possibly inefficient) code
 that is easy to explain. If nothing else, this is pedagologically
 important.
 The goals of this code are to:
  * use list processing pattern matching and functions on a string type

 I may have missed this question so I will ask it (apologies if it is a
 repeat):  Why is it believed that list processing pattern matching is
 appropriate or the right tool for text processing?

 Nobody said it is the right tool for text processing. In fact, I think
 we all agreed it is the wrong tool for many cases. But it is easy for
 students to understand since they are already being taught to use
 lists for everything else.  It would be great if you can talk with
 teachers of Haskell and figure out a better way to teach text
 processing.


I think a helpful question might be whether [Char] is mainly used to
teach about lists, or whether it's mainly used to teach about how to
do Unicode text processing correctly. If it's mainly used to teach
about lists, pattern matching, etc., as I suspect, then the fine
details of Unicode don't matter so much, you could even work with
ASCII-only strings and it would work equally well for teaching about
lists. How to do Unicode text processing correctly is a topic that
seems like it would become important much later, when someone's going
to write code that's meant to be used in a production environment.
Most students in an introductory university course probably don't get
close to that point. If you do want to teach about how to do Unicode
text processing correctly (which, for the record, is an important
issue irrespective of which programming language you're using) then
presumably you want to teach about Text, but hopefully your students
will be more advanced by then and it won't be so much of a problem.

I'm not really sure what that recommends in terms of policy. Mainly
what you need is it should be possible to work with lists of
characters and it should be possible to work with Text, which we
more-or-less have already. The important bits seem to be
OverloadedStrings and ideally some way to avoid a pervasive API bias
towards the wrong type (the tradeoffs there are probably more tricky).
(So... basically what Simon M. said.)

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-26 Thread Greg Weber
Can anyone explain how the tangent discussion of the finer points of
Unicode and the value of teaching [Char] is relevant to the proposal
under discussion? We aren't going to completely eliminate String and
break most existing Haskell code as Simon said. String is just a list
anyways, and lists are here to stay in Haskell.

I would like to get back to working on the proposal and determining
how Text can be added to the language.

Thank you,
Greg weber

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-26 Thread Gabriel Dos Reis
On Mon, Mar 26, 2012 at 8:35 AM, Gábor Lehel illiss...@gmail.com wrote:
 On Sun, Mar 25, 2012 at 5:19 AM, Greg Weber g...@gregweber.info wrote:
 On Sat, Mar 24, 2012 at 7:26 PM, Gabriel Dos Reis
 g...@integrable-solutions.net wrote:
 On Sat, Mar 24, 2012 at 9:09 PM, Greg Weber g...@gregweber.info wrote:

 Problem: we want to write beautiful (and possibly inefficient) code
 that is easy to explain. If nothing else, this is pedagologically
 important.
 The goals of this code are to:
  * use list processing pattern matching and functions on a string type

 I may have missed this question so I will ask it (apologies if it is a
 repeat):  Why is it believed that list processing pattern matching is
 appropriate or the right tool for text processing?

 Nobody said it is the right tool for text processing. In fact, I think
 we all agreed it is the wrong tool for many cases. But it is easy for
 students to understand since they are already being taught to use
 lists for everything else.  It would be great if you can talk with
 teachers of Haskell and figure out a better way to teach text
 processing.


 I think a helpful question might be whether [Char] is mainly used to
 teach about lists, or whether it's mainly used to teach about how to
 do Unicode text processing correctly. If it's mainly used to teach
 about lists, pattern matching, etc., as I suspect, then the fine
 details of Unicode don't matter so much, you could even work with
 ASCII-only strings and it would work equally well for teaching about
 lists.

I agree that if the purpose is to teach list and list pattern matching,
it does not matter much what the element type is as long as it follows
reasonable constraints.  However, as someone observed earlier, the
Haskell Report is not a vehicle to prescribe how Haskell should be
taught or for what reasons Haskell should be taught.  That argument,
while it was made to support String = [Char] for pedagogical purposes,
is in fact a good argument against.

  How to do Unicode text processing correctly is a topic that
 seems like it would become important much later, when someone's going
 to write code that's meant to be used in a production environment.
 Most students in an introductory university course probably don't get
 close to that point. If you do want to teach about how to do Unicode
 text processing correctly (which, for the record, is an important
 issue irrespective of which programming language you're using) then
 presumably you want to teach about Text, but hopefully your students
 will be more advanced by then and it won't be so much of a problem.

The Haskell Report claims very prominently that it uses the Unicode
character set.  The question is whether it should be using it correctly
at all, and if so should it even try to pretend that its default string type
use those characters correctly.

I do not subscribe to the notion that simple correct text processing
is something
students would have to learn only in advanced classes dedicated to
Unicode.  In the region of this side of the Atlantic Ocean where I teach, the
student population is very diverse and I do think it would responsible to stand
in front of students and say:
 You are all welcome; this class is open to all cultures and we
are committed
  to diversity and equal opportunity.  However, for the purpose of
simplicity and
  pedagogy, we would refrain from looking at texts from this and
other students.
  If you are really interested, you should take an advanced class.
 I hope you
  enjoy the class.

Furthermore, I am not convinced it is a good strategy to try hard to reflect
the notion that text processing is hard, either in the language or in
its presentation
(e.g. it is advanced topic, you need to be advanced before we talk about it.)

 I'm not really sure what that recommends in terms of policy. Mainly
 what you need is it should be possible to work with lists of
 characters and it should be possible to work with Text, which we
 more-or-less have already. The important bits seem to be
 OverloadedStrings and ideally some way to avoid a pervasive API bias
 towards the wrong type (the tradeoffs there are probably more tricky).
 (So... basically what Simon M. said.)

-- Gaby

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-26 Thread Gabriel Dos Reis
On Mon, Mar 26, 2012 at 9:21 AM, Greg Weber g...@gregweber.info wrote:
 Can anyone explain how the tangent discussion of the finer points of
 Unicode and the value of teaching [Char] is relevant to the proposal
 under discussion? We aren't going to completely eliminate String and
 break most existing Haskell code as Simon said. String is just a list
 anyways, and lists are here to stay in Haskell.

 I would like to get back to working on the proposal and determining
 how Text can be added to the language.

The discussion started because of the question of whether Text should
support list processing functions at all, and if so how.  That is a
very legitimate
question related to the Text proposal, at least if you are concerned about
correct semantics.  Once you are there, the discussion about Unicode
characters is unavoidable, and is very much within the scope of discussing
Text.

I may have missed the proposal to eliminate list from Haskell, though.

-- Gaby

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-26 Thread Greg Weber

 I would like to get back to working on the proposal and determining
 how Text can be added to the language.

 The discussion started because of the question of whether Text should
 support list processing functions at all, and if so how.  That is a
 very legitimate
 question related to the Text proposal, at least if you are concerned about
 correct semantics.  Once you are there, the discussion about Unicode
 characters is unavoidable, and is very much within the scope of discussing
 Text.


Can we take a break from arguing then, and can you create a wiki page
that explains how you think Text should behave? This conversation is
getting extremely long and repetitive.

Can someone please show me how to create a wiki page on the Haskell
proposals site (or suggest a different appropriate site)?

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-26 Thread Johan Tibell
On Mon, Mar 26, 2012 at 7:48 AM, Malcolm Wallace malcolm.wall...@me.com wrote:
 In the region of this side of the Atlantic Ocean where I teach, the
 student population is very diverse

 Prelude putStrLn (take 5 Fröhßen)
 Fröhß

ghci putStrLn Fro\x0308hßen
Fröhßen
ghci putStrLn (take 5 Fro\x0308hßen)
Fröh

Your example works because your input happens to be in a normal form.

-- Johan

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-26 Thread Christian Siefkes
On 03/26/2012 05:50 PM, Johan Tibell wrote:
 Normalization isn't quite enough unfortunately, as it does solve e.g.
 
 upcase = map toUppper
 
 You need all-at-once functions on strings (which we could add.) I'm
 just pointing out that most (all?) list functions do the wrong thing
 when used on Strings.

Hm, do you have any other examples besides toUpper/toLower?

Also, that example is not really an argument against using list functions on
strings (which, by any reasonable definition, seem to be sequences of
characters -- whether that sequence is represented as a list, an array, or
something else, seems more like an implementation detail to me). Rather, it
indicates the fact that Char.toUpper may have to wrong type. If its type was
Char - String instead of Char - Char, it could handle things like toUppper
'ß' == SS correctly. Then stuff like

upcase = concatMap toUppper

would work fine.

As it is, the problem seems to be with Char, not with [Char].

Best regards
Christian

-- 
|--- Dr. Christian Siefkes --- christ...@siefkes.net ---
| Homepage: http://www.siefkes.net/ | Blog: http://www.keimform.de/
|Peer Production Everywhere:   http://peerconomy.org/wiki/
|-- OpenPGP Key ID: 0x346452D8 --
Failure is just success rounded down.
-- Ryan North



signature.asc
Description: OpenPGP digital signature
___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-26 Thread Johan Tibell
On Mon, Mar 26, 2012 at 9:59 AM, Henrik Nilsson n...@cs.nott.ac.uk wrote:
 So, is the argument to deprecate Char, then? As long as Haskell
 allows Chars to be handled in isolation, it would seem impossible
 to prevent naive users from accidentally stumbling over the
 complexities of Unicode?

I haven't proposed anything at all. Someone asked why one should
prefer Text to String. I showed that the former is more correct (given
the currently available APIs) and much faster.

 There are canonical equivalence and compatibility, and each
 has two normal forms (fully composed and fully decomposed),
 and each of these four normal forms can be used in text processing.

 As an example of the difference between equivalent and compatible,
 the ligature ff is compatible - but not canonically equivalent
 to a sequence of two characters latin f, meaning they may be treated the
 same way in some applications (such as sorting and indexing), but not in
 others; and may be substituted for each other in some situations, but not in
 others.

 Is it realistic to think that if only Haskell used Text and not
 String = [Char], a naive user/beginner would be able to write
 correct code for all manner of text processing tasks without
 needing to understand a great deal about Unicode?

 I'm sorry, but I'm rather sceptical.

Why? We can hide most of these details behind the Text API. We can
pick which encoding and normal form is used internally and then have
the externally provided API for e.g. sorting do the right thing.

 So I reiterate that I see little if any gain, be it in terms of making
 life simpler for beginners, making Haskell more multi cultural, or
 giving Haskell applications in general a performance boost, in
 deprecating String = [Char] and mandating the use of Text.
 But the costs would be massive.

I agree and thus I don't propose we do something like that.

The way this will go down is that part of the Prelude and other base
modules will eventually be replaced by more modern packages (e.g. see
system-fileio) and the use of String will decline. Unfortunately it's
a bit of a painful transition as today we need to convert back and
forth between the two string types quite a lot.

-- Johan

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-26 Thread Johan Tibell
On Mon, Mar 26, 2012 at 10:12 AM, Ian Lynagh ig...@earth.li wrote:
 I am very unicode-ignorant, so apologies if I have misunderstood
 something, but doesn't Text do the same thing?

 Prelude T import Data.Text.IO as T
 Prelude T T T.putStrLn (T.take 5 (T.pack Fro\x0308hßen))
 Fröh

 Maybe your point is that neither take function should be used with
 unicode strings, but I don't see how advocating the Text type is going
 to help with that.

We already covered this. Text inherited a list-based API, even if that
sometimes doesn't make sense.

To work with Unicode you need more specific functions for different
tasks. Text only implements a few so far, like case conversion and
case-less comparison, and asks you to use text-icu for the rest.

-- Johan

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-26 Thread Brandon Allbery
On Mon, Mar 26, 2012 at 13:12, Ian Lynagh ig...@earth.li wrote:

 Maybe your point is that neither take function should be used with
 unicode strings, but I don't see how advocating the Text type is going
 to help with that.


I think we established earlier that the list-like operations on Text are a
backward compatibility wart.  Either they should go away, or they should be
modified to operate on some other level than codepoints.  Probably the way
the ecosystem should work is that [Char] (or possibly a packed version
thereof, sort of like lazy ByteStrings with Word32 instead of Word8 as the
fundamental unit) is the codepoint view and Text is the grapheme view; both
are necessary at various times, but the grapheme view is the more natural
one for text /per se/.

-- 
brandon s allbery  allber...@gmail.com
wandering unix systems administrator (available) (412) 475-9364 vm/sms
___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-26 Thread Evan Laforge
 No-one's yet argued against OverloadedStrings. I think there /is/ an
 argument to be made, that it introduces ambiguity and could break
 existing programs (probably we can extend defaulting to take care of

Definitely, I have ones that would need some :: sprinkled in.

 this, but I think there are people who'd be happier if we killed
 defaulting too). Too much polymorphism /can/ be a bad thing. But I

Also me :)  I'd like that 'default' keyword back too, it's too useful
for variables names to waste on something so obscure.

That said, I like OverloadedStrings.

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: Long live String = [Char] (Was: Re: String != [Char])

2012-03-25 Thread Henrik Nilsson

Hi all,

Thomas Schilling wrote:

 OK, I agree that breaking text books is a big deal.  On the other
 hand, the lack of a good Text data type forced text books to teach bad
 approaches to dealing with strings.  Haskell should do better.

As far as I know, none of the introductory Haskell text books
has the ambition of teaching serious text processing in Haskell.
And what they do for simple text processing for purpose of illustration
is no worse than what one typically would do in, say, an introduction
to programming using any language, like C or Java.

So I don't buy that argument per se. But I do agree, of course,
that a good library for text processing, and with adequate language
support for making it convenient to use, is important.

 Johan mentioned both semantic and performance problems with Strings.
 A part he didn't stress is that Strings are also a horribly
 memory-inefficient way of storing strings.  On 64 bit GHC systems a
 single ASCII character needs 16 bytes of memory (i.e., an overhead of
 16x). A non-ASCII character (ord c  255) actually requires 32 bytes.
 (This is due to a de-duplication optimisation in the GHC GC).  Other
 implementations may do better, but an abstract type would still be
 better to enable more freedom for implementors.

Sure it's inefficient. I doubt the above is news to anyone on this list.
The point, though, is that once we're at the level of applications,
in most cases, this inefficiency is negligible.

And in the cases where it is not, the programmer will be well aware
of this and pick a better representation, or will learn about it
the hard way and be forced to pick a better representation. Just
as with processing of significant amounts of *any* data.

It simply isn't the case that the Haskell world magically would
be significantly better of in terms of performance of only everyone
was forced to use something like Text instead of String = [Char].

Moreover, the above analysis is unnecessarily pessimistic for one
(somewhat important case: string literals. Thanks to Haskell being
lazy, it is very easy if one really worry (for an implementor) to
arrange that string literals are stored very compactly in a binary,
only to be materialized when (and if) actually used. (I did just that
years ago in the Freja compiler: memory was significantly smaller in
those days, so I did worry :-)

 Correct handling of unicode strings is a Hard Problem and String =
 [Char] is only better if you ignore all the issues (which is certainly
 fine a teaching environment).

Yes. Unicode is unfortunately (partly but not exclusively out of
necessity), very complicated. I doubt one would want to discuss
this in depth in any introductory programming course.

My point was that String = [Char] is fine as far as it goes. Not that
it should be the basis for serious string processing libraries.

 I would be happy to have a simplistic String = [Char] coexist with a
 Text type if it weren't for the problem that so many things are biased
 towards String.  E.g., error takes a String,

Yes. That's a bias. But is it a problem? Here we're just talking
about getting a sequence of (possibly unicode) characters to stderr.

 Show is used everywhere and produces strings,

Show and Read are mainly used for simplistic serialisation
and deserialisation. When ppl really care, they tend to use
more refined approaches, e.g. proper scanners and parsers,
or binary I/O. So again, while there is certainly a bias,
it doesn't seem like a genuine problem in most cases.

I can possibly see issues for conversion from and to e.g.
built-in numeric types and various string representations,
but I can't see why solving those would necessitate
getting rid of String = [Char]. Read and Show could be
overloaded on the string type, for example (at least
given multi-parameter type classes), and/or a bit of
compiler optimization ought to be enough to dispatch such
uses of read and show to appropriate primitives of e.g.
the Text library anyway.

 the pretty printing library uses Strings,

But that is a library issue, not a language issue.

 Read parses Strings.

See above.

The special status of Read and Show is questionable anyway.
Will hopefully be possible at some point to implement those
completely as libraries. So I'm not not overly swayed by the argument
of language bias in those cases.

 As I said, while I'm not a huge fan of having two String types
 co-exist, I could accept it as a necessary trade-off to keep text
 books valid and preserve backwards compatibility.

While an undue proliferation of string types would be unfortunate,
compared with the plethora of other representational choices one
is faced with when it comes to e.g. numeric types, arrays, maps,
etc., a couple of string types doesn't seem like a big deal,
especially not if one is designated the default choice for
any program that will do non-trivial text processing or aims
at doing internationalisation properly.

 (There are also other issues with String.  For example, 

Re: String != [Char]

2012-03-25 Thread Edward Kmett
On Sun, Mar 25, 2012 at 11:42 AM, Gabriel Dos Reis 
g...@integrable-solutions.net wrote:

 Perhaps we are underestimating their competences  and are
 complicating their lives unnecessarily...


Have you ever actually taught an introductory languages course?

If anything we delude ourselves by overestimating the ability of kids just
shortly out of highschool to assimilate an entire new worldview in a couple
of weeks while they are distracted by other things. Any additional
distraction that makes this harder is a serious pain point.

Consequently, in my experience, most instructors don't even go outside of
the Prelude, except perhaps to introduce simple custom data types that
their students define. The goal in that period is to get the students
accustomed to non-strictness, do some list processing, and hope that an
understanding of well-founded recursion vs. productive corecursion sticks,
because these are the things that you can't teach well in another language
and which are useful to the student no matter what tools they wind up using
in the future.

I would rather extra time be spent trying to get the users up to speed on
the really interesting and novel parts of the language, such as typeclasses
and monads in particular, than lose at least a quarter of my time fiddling
about with text processing, a special case API and qualified imports,
because those couple of weeks are going to shape many of those students'
opinion of the language forever.

-Edward
___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-25 Thread Gabriel Dos Reis
On Sun, Mar 25, 2012 at 2:08 PM, Edward Kmett ekm...@gmail.com wrote:
 On Sun, Mar 25, 2012 at 11:42 AM, Gabriel Dos Reis
 g...@integrable-solutions.net wrote:

 Perhaps we are underestimating their competences  and are
 complicating their lives unnecessarily...


 Have you ever actually taught an introductory languages course?

Yes, and Haskell (if you asked); that is part of my daytime job.

-- Gaby

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-25 Thread Gabriel Dos Reis
On Sun, Mar 25, 2012 at 2:08 PM, Edward Kmett ekm...@gmail.com wrote:

 If anything we delude ourselves by overestimating the ability of kids just
 shortly out of highschool to assimilate an entire new worldview in a couple
 of weeks while they are distracted by other things. Any additional
 distraction that makes this harder is a serious pain point.

We are doing our students no favor, no good, in being condescending to
them pretending that they can't handle teaching material that would actually
be close real world experience.  If we truly believe that they don't have
enough time to learn what would really be useful to them, then we are
truly wasting their valuable time teaching them things they would have to
unlearn before writing good and correct code.  The education would have
been a complete failure and waste of resources.

 Consequently, in my experience, most instructors don't even go outside of
 the Prelude,

but  is that even a good thing?

 except perhaps to introduce simple custom data types that their
 students define.

I would say that does not do students any good, nor does it do justice to
the language.  If one believes that Haskell is unlike any other mainstream
language, then it is an opportunity to show that it can handle beautifully and
flawlessly some real world problems whose solutions are more involved
in other more popular languages.   These days, most of the students are
strolling around with smart phones that have lot of data in form of texts
(email, SMS chats, etc.)  What better real world data could you find at
a cheaper price to get them experiment with?  Restricting oneself to
purely academic
exercises, with no practical benefit whatsoever,  would only reinforce
students'
(mis)perception that the language isn't of any use to them -- and they would
probably be indulging more into distractions, and it would be hard to
blame them.

 The goal in that period is to get the students accustomed
 to non-strictness, do some list processing, and hope that an understanding
 of well-founded recursion vs. productive corecursion sticks, because these
 are the things that you can't teach well in another language and which are
 useful to the student no matter what tools they wind up using in the future.

I would say that is even more reasons to get them learn something that they
would not have to unlearn in order to remain harmless :-)


 I would rather extra time be spent trying to get the users up to speed on
 the really interesting and novel parts of the language, such as typeclasses
 and monads in particular, than lose at least a quarter of my time fiddling
 about with text processing, a special case API and qualified imports,
 because those couple of weeks are going to shape many of those students'
 opinion of the language forever.

More reasons not to show them anything that would reinforce the idea that
language should not be taken seriously and is complete waste of time.

You would be surprised to learn how bored students are *because* we obsess
too much on trying simplify their lives, while they are craving for us
to make it
more interesting, more challenging.

-- Gaby

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-25 Thread Daniel Peebles
On Sun, Mar 25, 2012 at 3:47 PM, Gabriel Dos Reis 
g...@integrable-solutions.net wrote:


 We are doing our students no favor, no good, in being condescending to
 them pretending that they can't handle teaching material that would
 actually
 be close real world experience.  If we truly believe that they don't have
 enough time to learn what would really be useful to them, then we are
 truly wasting their valuable time teaching them things they would have to
 unlearn before writing good and correct code.  The education would have
 been a complete failure and waste of resources.


When people teach Haskell, it typically isn't to give them real world
experience, but to teach them an interesting programming language and all
the great computer science it leads to. Types, laziness, higher-order
abstractions are the hard bits to learn, not a string-processing API.

If people want to learn how to deal with unicode correctly, I can think of
several better places to learn about it than a Haskell course. I don't
think it's condescending or impractical to focus on the things that make
Haskell unique, rather than teaching a unicode-correct API that could
conceivably be written in any other language. Learning that real human text
cannot be treated just an independent list of characters is something that
takes minutes to hours at most: someone tells you that there are all sorts
of exceptions to the list-of-chars paradigm, and then you read an article
or two on the language-specific difficulties, learn to use specialized API
functions, and then you get on with what you were actually trying to do.

So I think saying that ignoring unicode-correct strings a complete failure
and waste of resources is a bit hyperbolic, honestly.
___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-25 Thread Gabriel Dos Reis
On Sun, Mar 25, 2012 at 4:03 PM, Daniel Peebles pumpkin...@gmail.com wrote:
 On Sun, Mar 25, 2012 at 3:47 PM, Gabriel Dos Reis
 g...@integrable-solutions.net wrote:


 We are doing our students no favor, no good, in being condescending to
 them pretending that they can't handle teaching material that would
 actually
 be close real world experience.  If we truly believe that they don't have
 enough time to learn what would really be useful to them, then we are
 truly wasting their valuable time teaching them things they would have to
 unlearn before writing good and correct code.  The education would have
 been a complete failure and waste of resources.


 When people teach Haskell, it typically isn't to give them real world
 experience, but to teach them an interesting programming language and all
 the great computer science it leads to.

Yes, but you have to frame it in the context of interesting problems, otherwise
it reduces to a series of dry, pointless, uninspiring series functions
named f, g, h :-)


 Types, laziness, higher-order
 abstractions are the hard bits to learn, not a string-processing API.

 If people want to learn how to deal with unicode correctly, I can think of
 several better places to learn about it than a Haskell course.

I don't think anybody suggested that a Haskell course should be a substitute
for Unicode course.  However, I maintain that it isn't an excuse to purposefully
teach something that the students have to unlearn.

 I don't think
 it's condescending or impractical to focus on the things that make Haskell
 unique, rather than teaching a unicode-correct API that could conceivably be
 written in any other language.

Why should a Unicode-correct API would have to be written in any other
language and not Haskell?

 Learning that real human text cannot be
 treated just an independent list of characters is something that takes
 minutes to hours at most: someone tells you that there are all sorts of
 exceptions to the list-of-chars paradigm, and then you read an article or
 two on the language-specific difficulties, learn to use specialized API
 functions, and then you get on with what you were actually trying to do.

Which brings us back to square zero: Is there any fundamental reason
why the language can't provide a good Unicode-correct API  to illustrate
solutions to text processing problems (many of them interesting) and that
can be used in introductory classes instead of having to say the
above.  Students,
like most children, learn by imitation.  They will replicate whatever they
are shown (for a long period of time, if not forever.) If it is true
that students
don't time, why should they have to waste that scarce resource listening to the
list-of-char  paradigm in the first place?

 So I think saying that ignoring unicode-correct strings a complete failure
 and waste of resources is a bit hyperbolic, honestly.

It may be an inconvenient truth, but not hyperbole [ I see you are
trying to be polite :-) ]  Look at the almost permanent damage done by the
culture that equated 'char*' to strings.  It may be inconvenient to say, but
[Char] isn't any better -- in fact, I'll go further and say: it is
spreading the same
damage but only with a different syntax.  The damage is semantics, no amount
of syntax clothing will undo it.  I realize that may appear as a
strong statement,
but think about it.

-- Gaby

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-25 Thread Henrik Nilsson

Hi all,

Gabriel Dos Reis wrote:

 Look at the almost permanent damage done by the
 culture that equated 'char*' to strings.  It may be inconvenient to
 say, but [Char] isn't any better -- in fact, I'll go further and say:
 it is spreading the same damage but only with a different syntax.
 The damage is semantics, no amount of syntax clothing will undo it.
 I realize that may appear as a strong statement, but think about it.

Reasonable people might choose to disagree with that.

In any case, this is hardly the place to to discuss how to best
teach Haskell or programming in general.

Nor is the Haskell standard a vehicle to prescribe how Haskell
should be taught or for what reasons Haskell should be taught:
that can only be decided by individual educators based in their
experience and given a specific teaching context.

Given intimate knowledge of our specific teaching context
here at Nottingham, I can say that removing String = [Char]
from the language wouldn't be helpful to us.

And we do respect our students.

Best,

/Henrik

--
Henrik Nilsson
School of Computer Science
The University of Nottingham
n...@cs.nott.ac.uk

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-25 Thread Gabriel Dos Reis
On Sun, Mar 25, 2012 at 6:54 PM, Henrik Nilsson n...@cs.nott.ac.uk wrote:

 In any case, this is hardly the place to to discuss how to best
 teach Haskell or programming in general.

Sure, I haven't seen any disagreement with that.

Note however that the pedagogical arguments was brought
in as support for the [Char] definition.  It is only natural that it being
challenged on that ground.

 Nor is the Haskell standard a vehicle to prescribe how Haskell
 should be taught or for what reasons Haskell should be taught:

I have not seen any assertion to that effect.

 that can only be decided by individual educators based in their
 experience and given a specific teaching context.

True, but should the language definition default to a string type
that is one the most unsuited for text processing in the 21st
century where global multilingualism abounds?  Even C has qualms
about that.

 Given intimate knowledge of our specific teaching context
 here at Nottingham, I can say that removing String = [Char]
 from the language wouldn't be helpful to us.

I have no doubt believing that if all texts my students have to
process are US ASCII, [Char] is more than sufficient.  So, I have
sympathy for your position.  However,  I doubt [Char] would be
adequate if I ask them to shared texts from their diverse cultures.
Should the language definition make it much harder to share such
experience in classroom when the primary argument for [Char]
is pedagogy?

-- Gaby

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-25 Thread Greg Weber
On Sun, Mar 25, 2012 at 5:39 PM, Gabriel Dos Reis
g...@integrable-solutions.net wrote:
 On Sun, Mar 25, 2012 at 6:54 PM, Henrik Nilsson n...@cs.nott.ac.uk wrote:

 In any case, this is hardly the place to to discuss how to best
 teach Haskell or programming in general.

 Sure, I haven't seen any disagreement with that.

As interesting as this discussion is, I think this agreement is the
perfect segway to take it off list. Debating the usefulness of String
= [Char] does not seem to be productive here anyways. What would be
helpful is if alternatives were offered, tested out, and shared among
the teaching community. If this requires any changes to the language
(other than what is being discussed now), please let us know.

The Prelude is still going to export list functions whether they are
used on [Char] or not. So we are still in the position of requiring
qualified imports for Text functions or needing to change the Text
package or something about the language to avoid conflicts with list
functions.

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: Long live String = [Char] (Was: Re: String != [Char])

2012-03-24 Thread Thomas Schilling
On 24 March 2012 12:53, Henrik Nilsson n...@cs.nott.ac.uk wrote:
 Hi all,

 Thomas Schilling wrote:

 I think most here agree that the main advantage of the current
 definition is only pedagogical.

 But that in itself is not a small deal. In fact, it's a pretty
 major advantage.

 Moreover, the utter simplicity of String = [Char] is a benefit
 in its own right. Let's not forget that this, in practice,
 across all Haskell applications, works just fine in the vast
 majority of cases.

 I get the sense that the proponents for deprecating, and ultimately
 get rid of, String = [Char], are suggesting that this would lead
 to noticeable performance improvements across the board by virtue
 of preventing programmers from accidentally making a poor choice
 of data structure for representing string. But I conjecture that
 the performance impact of switching form e.g. String to Text at
 the level of complete applications would be negligible in most
 cases, simply because most Haskell applications are not dominated
 by heavy-duty string processing. And those that are, probably
 already uses something like Text, and were written be people
 who know a thing or two about appropriate choice of data structures
 anyway.

 As to teaching:

 I don't really
 think that having an abstract type is such a big problem for teaching.
 You can do string processing by doing (pack . myfunction . unpack)

 Here at Nottingham, we're teaching all our 1st-year undergraduates
 Haskell. It works, but it is a challenge, and, alas, far from everyone
 gets it. And this is despite the module being taught by one of
 the leading and most experienced Haskell educators (and text book
 author), Graham Hutton.

 Without starting an endless discussion about how to best teach
 programming languages in general and Haskell in particular to
 (near) beginners, I dare say that idioms like the one suggested
 above would do nothing to help.

 String != [Char] would break no end of code, text books, tutorials,
 lecture slides, would not help with teaching Haskell, all
 for very little if any benefit in the grand scheme of things.

OK, I agree that breaking text books is a big deal.  On the other
hand, the lack of a good Text data type forced text books to teach bad
approaches to dealing with strings.  Haskell should do better.

Johan mentioned both semantic and performance problems with Strings.
A part he didn't stress is that Strings are also a horribly
memory-inefficient way of storing strings.  On 64 bit GHC systems a
single ASCII character needs 16 bytes of memory (i.e., an overhead of
16x). A non-ASCII character (ord c  255) actually requires 32 bytes.
(This is due to a de-duplication optimisation in the GHC GC).  Other
implementations may do better, but an abstract type would still be
better to enable more freedom for implementors.

Correct handling of unicode strings is a Hard Problem and String =
[Char] is only better if you ignore all the issues (which is certainly
fine a teaching environment).

I would be happy to have a simplistic String = [Char] coexist with a
Text type if it weren't for the problem that so many things are biased
towards String.  E.g., error takes a String, Show is used everywhere
and produces strings, the pretty printing library uses Strings, Read
parses Strings.

 On the other hand, a standardised, well thought-out, API for
 high-performance strings and appropriate mechanisms such
 as a measure of overloading to make it easy and palatable to
 use, and that work alongside the present String = [Char], would be a
 good thing.

As I said, while I'm not a huge fan of having two String types
co-exist, I could accept it as a necessary trade-off to keep text
books valid and preserve backwards compatibility.  (There are also
other issues with String.  For example, you can't write an instance
MyClass String in Haskell2010, and even with GHC extensions it seems
wrong and you often end up writing instances that overlap with MyClass
[a].)  I'm using Data.Text a lot, so I can work around the issue, but
unfortunately you run into a lot of issues where the standard library
forces the use of String, and that, I believe, is wrong.

If changing the standard library is the bigger issue, however, then
I'm not sure whether this discussion needs to take place on the
haskell-prime list or on the libraries list.

/ Thomas

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-24 Thread Ian Lynagh

Hi Johan,

On Sat, Mar 24, 2012 at 11:50:10AM -0700, Johan Tibell wrote:
 
 On Sat, Mar 24, 2012 at 12:39 AM, Heinrich Apfelmus
 apfel...@quantentunnel.de wrote:
  Which brings me to the fundamental question behind this proposal: Why do we
  need Text at all? What are its virtues and how do they compare? What is the
  trade-off? (I'm not familiar enough with the Text library to answer these.)
 
  To put it very pointedly: is a %20 performance increase on the current
  generation of computers worth the cost in terms of ease-of-use, when the
  performance can equally be gained by buying a faster computer or more RAM?
  I'm not sure whether I even agree with this statement, but this is the
  trade-off we are deciding on.
 
 Correctness
 ==
 
 Using list-based operations on Strings are almost always wrong

Data.Text seems to think that many of them are worth reimplementing for
Text. It looks like someone's systematically gone through Data.List.
And in fact, very few functions there /don't/ look like they are
directly equivalent to list functions.

 , as
 soon as you move away from English text. You almost always have to
 deal with Unicode strings as blobs, considering several code points at
 once. For example,
 
 upcase :: String - String
 upcase = map toUpper

This is no more incorrect than
upcase = Data.Text.map toUpper

There's no reason that there couldn't be a Data.String.toUpper
corresponding to Data.Text.toUpper.

 Performance
 ===
 
 Depending on the benchmark, the difference can be much bigger than
 20%. For example, here's a comparison of decoding UTF-8 byte data into
 a String vs a Text value:

I think Heinrich meant 20% performance in a useful program, not a
micro-benchmark.


Thanks
Ian


___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-24 Thread Thomas Schilling
On 24 March 2012 20:16, Ian Lynagh ig...@earth.li wrote:

 Hi Johan,

 On Sat, Mar 24, 2012 at 11:50:10AM -0700, Johan Tibell wrote:

 On Sat, Mar 24, 2012 at 12:39 AM, Heinrich Apfelmus
 apfel...@quantentunnel.de wrote:
  Which brings me to the fundamental question behind this proposal: Why do we
  need Text at all? What are its virtues and how do they compare? What is the
  trade-off? (I'm not familiar enough with the Text library to answer these.)
 
  To put it very pointedly: is a %20 performance increase on the current
  generation of computers worth the cost in terms of ease-of-use, when the
  performance can equally be gained by buying a faster computer or more RAM?
  I'm not sure whether I even agree with this statement, but this is the
  trade-off we are deciding on.

 Correctness
 ==

 Using list-based operations on Strings are almost always wrong

 Data.Text seems to think that many of them are worth reimplementing for
 Text. It looks like someone's systematically gone through Data.List.

That's exactly what happened as part of the platform inclusion
process.  In fact, there was quite a bit of bike shedding whether the
Text API should be compatible with the list API or not.  In the end
the decision was made to add all the list functions even if that
encouraged running into unicode issues.  I'm pretty sure you
participated in that discussion.



 Performance
 ===

 Depending on the benchmark, the difference can be much bigger than
 20%. For example, here's a comparison of decoding UTF-8 byte data into
 a String vs a Text value:

 I think Heinrich meant 20% performance in a useful program, not a
 micro-benchmark.

Generating web sites is a huge application area of Haskell and one
where a proper text type is in no way a micro optimisation.

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-24 Thread Brandon Allbery
On Sat, Mar 24, 2012 at 16:16, Ian Lynagh ig...@earth.li wrote:

 On Sat, Mar 24, 2012 at 11:50:10AM -0700, Johan Tibell wrote:
  Using list-based operations on Strings are almost always wrong

 Data.Text seems to think that many of them are worth reimplementing for
 Text. It looks like someone's systematically gone through Data.List.
 And in fact, very few functions there /don't/ look like they are
 directly equivalent to list functions.


I was under the impression they have been very carefully designed to do the
right thing with characters represented by multiple codepoints, which is
something the String version *cannot* do.  It would help if Bryan were
involved with this discussion, though.  (I'm cc:ing him on this.)  Since
the whole point of Data.Text is to handle stuff like this properly I would
be surprised if your assertion that

 upcase :: String - String
  upcase = map toUpper

 This is no more incorrect than
upcase = Data.Text.map toUpper


is correct.

-- 
brandon s allbery  allber...@gmail.com
wandering unix systems administrator (available) (412) 475-9364 vm/sms
___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-24 Thread Johan Tibell
On Sat, Mar 24, 2012 at 2:31 PM, Brandon Allbery allber...@gmail.com wrote:
 I was under the impression they have been very carefully designed to do the
 right thing with characters represented by multiple codepoints, which is
 something the String version *cannot* do.  It would help if Bryan were
 involved with this discussion, though.  (I'm cc:ing him on this.)  Since the
 whole point of Data.Text is to handle stuff like this properly I would be
 surprised if your assertion that

      upcase :: String - String
      upcase = map toUpper

 This is no more incorrect than
    upcase = Data.Text.map toUpper


 is correct.

This is simply not possible given the Unicode specification. There's
no code point that corresponds to the two characters used to represent
an upcased version of the essets. I think the list based API predates
Bryan.

-- Johan

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-24 Thread Freddie Manners
To add my tuppence-worth on this, addressed to no-one in particular:

(1) I think getting hung up on UTF-8 correctness is a distraction here.  I
can't imagine anyone suggesting that the C/C++ standards removed support
for (char*) because it wasn't UTF-8 correct: sure, you'd recommend people
use a different type when it matters, but the language standard itself
shouldn't be driven by technical issues that don't affect most people most
of the time.  I'm sure it's good engineering practice to worry about these
things, but the standard isn't there to encourage good engineering practice.

(2) I'd suggest that a proposal that advocated overloaded string literals
-- of which [Char] was an option -- couldn't be much more confusing from a
pedagogical perspective than the fact that numeric literals are overloaded.
 Since that seems to be one of the main biases in favour of [Char] in the
current standard, that might be a possible incremental fix.

Best,
Freddie

On 24 March 2012 22:15, Ian Lynagh ig...@earth.li wrote:

 On Sat, Mar 24, 2012 at 08:38:23PM +, Thomas Schilling wrote:
  On 24 March 2012 20:16, Ian Lynagh ig...@earth.li wrote:
  
   Correctness
   ==
  
   Using list-based operations on Strings are almost always wrong
  
   Data.Text seems to think that many of them are worth reimplementing for
   Text. It looks like someone's systematically gone through Data.List.
 
  That's exactly what happened as part of the platform inclusion
  process.  In fact, there was quite a bit of bike shedding whether the
  Text API should be compatible with the list API or not.  In the end
  the decision was made to add all the list functions even if that
  encouraged running into unicode issues.  I'm pretty sure you
  participated in that discussion.

 As far as I remember, a few functions were added to text and bytestring
 during that, but mostly the discussion was about naming.

 Even in the first 0.1 release of bytestring:

 http://hackage.haskell.org/packages/archive/text/0.1/doc/html/Data-Text.html
 there is a large amount of Data.List covered, e.g. map, transpose,
 foldl1', minimum, mapAccumR, groupBy.


 Thanks
 Ian


 ___
 Haskell-prime mailing list
 Haskell-prime@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-prime

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-24 Thread Thomas Schilling
On 24 March 2012 22:27, Ian Lynagh ig...@earth.li wrote:
 On Sat, Mar 24, 2012 at 05:31:48PM -0400, Brandon Allbery wrote:
 On Sat, Mar 24, 2012 at 16:16, Ian Lynagh ig...@earth.li wrote:

  On Sat, Mar 24, 2012 at 11:50:10AM -0700, Johan Tibell wrote:
   Using list-based operations on Strings are almost always wrong
 
  Data.Text seems to think that many of them are worth reimplementing for
  Text. It looks like someone's systematically gone through Data.List.
  And in fact, very few functions there /don't/ look like they are
  directly equivalent to list functions.
 

 I was under the impression they have been very carefully designed to do the
 right thing with characters represented by multiple codepoints, which is
 something the String version *cannot* do.  It would help if Bryan were
 involved with this discussion, though.  (I'm cc:ing him on this.)  Since
 the whole point of Data.Text is to handle stuff like this properly I would
 be surprised if your assertion that

      upcase :: String - String
       upcase = map toUpper
 
  This is no more incorrect than
     upcase = Data.Text.map toUpper

 is correct.

 I don't see how it could do any better, given both use
    toUpper :: Char - Char
 to do the hard work. That's why there is also a
    Data.Text.toUpper :: Text - Text

 Based on a very quick skim I think that there are only 3 such functions
 in Data.Text (toCaseFold, toLower, toUpper), although the 3
 justification functions may handle double-width characters properly.


 Anyway, my main point is that I don't think that either text or String
 should make it any easier for people to get things right. It's true that
 currently only text makes correct case-conversions easy, but only
 because no-one's written Data.String.to* yet.

The reason Text uses UTF16 internally is so that it can be used with
the ICU library (written in C, I think) which implements all the
difficult things (http://hackage.haskell.org/package/text-icu).
Reimplementing all that in Haskell would be a significant undertaking.
 You could do the same for String, but that would have to encode and
re-encode on each invokation.

BTW, I checked the version history of the text package and most of the
list functions existed already in Tom Harper's version that text was
based on in 2009.  If you look at the documentation you can see that
many of the list-like functions treat some invalid characters
specially, so they are different.

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-24 Thread Johan Tibell
On Sat, Mar 24, 2012 at 3:45 PM, Isaac Dupree
m...@isaac.cedarswampstudios.org wrote:
 How is Text for small strings currently (e.g. one English word, if not one
 character)?  Can we reasonably recommend it for that?
 This recent question suggests it's still not great:
 http://stackoverflow.com/questions/9398572/memory-efficient-strings-in-haskell

It's definitely not as good as it could be with the common case being
2 bytes per code point and then some fixed overhead.

The UTF-8 GSoC project last summer was an attempt to see if we could
do better, but unfortunately GHC does a worse job streaming out of a
byte array containing utf-8 than out of a byte array containing utf-16
(due to bad branch layout.)

This resulted in some performance gains and some performance losses,
with some more wins and losses. As there are other engineering
benefits in favor of utf-16 (e.g. being able to use ICU efficiently)
we opted for not switching the decoding. If we can get GHC to the
point where it compiles an utf-8 based Text really well, we could
reconsider this decision.

There's also a design trade-off in Text that favors better asymptotic
complexity for some operations (e.g. taking substrings) that adds 2
words of overhead to every string.

-- Johan

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-24 Thread Thomas Schilling
On 24 March 2012 22:33, Freddie Manners f.mann...@gmail.com wrote:
 To add my tuppence-worth on this, addressed to no-one in particular:

 (1) I think getting hung up on UTF-8 correctness is a distraction here.  I
 can't imagine anyone suggesting that the C/C++ standards removed support for
 (char*) because it wasn't UTF-8 correct: sure, you'd recommend people use a
 different type when it matters, but the language standard itself shouldn't
 be driven by technical issues that don't affect most people most of the
 time.  I'm sure it's good engineering practice to worry about these things,
 but the standard isn't there to encourage good engineering practice.

It doesn't really have anything to do with UTF-8.  UTF-8 is just a
particular serialisation of a unicode string.

Here's a simple illustration of the problems one faces:  Let's say you
want to search for the string fix.  Now, the problem is that the
sequence 'f','i' could be represented both as ['f', 'i'] or as [chr
0xfb01] (the fi ligature).  The text-icu package provides a function
to normalise a string such that only one of these forms can occur in
each string.  Because the world's languages are rather complex there
are many more such cases which need to be handled properly (if you
don't want to run into weird corner cases).

 (2) I'd suggest that a proposal that advocated overloaded string literals --
 of which [Char] was an option -- couldn't be much more confusing from a
 pedagogical perspective than the fact that numeric literals are overloaded.
  Since that seems to be one of the main biases in favour of [Char] in the
 current standard, that might be a possible incremental fix.

I agree that this proposal should probably include the standardisation
of the OverloadedStrings extension.


 Best,
 Freddie


 On 24 March 2012 22:15, Ian Lynagh ig...@earth.li wrote:

 On Sat, Mar 24, 2012 at 08:38:23PM +, Thomas Schilling wrote:
  On 24 March 2012 20:16, Ian Lynagh ig...@earth.li wrote:
  
   Correctness
   ==
  
   Using list-based operations on Strings are almost always wrong
  
   Data.Text seems to think that many of them are worth reimplementing
   for
   Text. It looks like someone's systematically gone through Data.List.
 
  That's exactly what happened as part of the platform inclusion
  process.  In fact, there was quite a bit of bike shedding whether the
  Text API should be compatible with the list API or not.  In the end
  the decision was made to add all the list functions even if that
  encouraged running into unicode issues.  I'm pretty sure you
  participated in that discussion.

 As far as I remember, a few functions were added to text and bytestring
 during that, but mostly the discussion was about naming.

 Even in the first 0.1 release of bytestring:

  http://hackage.haskell.org/packages/archive/text/0.1/doc/html/Data-Text.html
 there is a large amount of Data.List covered, e.g. map, transpose,
 foldl1', minimum, mapAccumR, groupBy.


 Thanks
 Ian


 ___
 Haskell-prime mailing list
 Haskell-prime@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-prime



 ___
 Haskell-prime mailing list
 Haskell-prime@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-prime




-- 
Push the envelope. Watch it bend.

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-24 Thread Johan Tibell
On Sat, Mar 24, 2012 at 3:33 PM, Freddie Manners f.mann...@gmail.com wrote:
 To add my tuppence-worth on this, addressed to no-one in particular:

 (1) I think getting hung up on UTF-8 correctness is a distraction here.  I
 can't imagine anyone suggesting that the C/C++ standards removed support for
 (char*) because it wasn't UTF-8 correct: sure, you'd recommend people use a
 different type when it matters, but the language standard itself shouldn't
 be driven by technical issues that don't affect most people most of the
 time.  I'm sure it's good engineering practice to worry about these things,
 but the standard isn't there to encourage good engineering practice.

(I assume you mean Unicode correctness. UTF-8 is only one possible
encoding. Also I'm not arguing for removing type String = [Char], I
arguing why Text is better than String.)

C++'s char* is morally equivalent of our ByteString, not Text. There's
no standardized C++ Unicode string type, ICU's UnicodeString is
perhaps the closest to one.

-- Johan

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-24 Thread Gabriel Dos Reis
On Sat, Mar 24, 2012 at 5:33 PM, Freddie Manners f.mann...@gmail.com wrote:
 To add my tuppence-worth on this, addressed to no-one in particular:

 (1) I think getting hung up on UTF-8 correctness is a distraction here.  I
 can't imagine anyone suggesting that the C/C++ standards removed support for
 (char*) because it wasn't UTF-8 correct: sure, you'd recommend people use a
 different type when it matters, but the language standard itself shouldn't
 be driven by technical issues that don't affect most people most of the
 time.  I'm sure it's good engineering practice to worry about these things,
 but the standard isn't there to encourage good engineering practice.

C++ does not consider 'char*' as the type of a string.

It has a standard template std::basic_string that can be instantiated on
char (giving std::string) or encoding type (of unicode characters) char16_t,
char32_t, and wchar_t giving rise to u16string, u32string, and wstring.
It has a large number of functions to manipulate a string as a sequence
(Haskell's statu quo) or as a text thanks to an elaborated
localization machinery.

-- Gaby, back to lurking mode

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-24 Thread Gabriel Dos Reis
On Sat, Mar 24, 2012 at 6:00 PM, Johan Tibell johan.tib...@gmail.com wrote:

 C++'s char* is morally equivalent of our ByteString, not Text. There's
 no standardized C++ Unicode string type, ICU's UnicodeString is
 perhaps the closest to one.

Hmm, std::u16string, std::u23string, and std::wstring are C++ standard
types to process Unicode texts.

Anyway, my inclination is that having a proper string in Haskell type would
be a Good Thing.  Sometimes it is worth breaking the textbook.

In our local Haskell system for AVR microcontrollers, we explicitly made
String distinct from [Char] -- we cannot afford the memory
inefficiency that [Char] entails, just to represent simple strings.

-- Gaby

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-24 Thread Johan Tibell
On Sat, Mar 24, 2012 at 4:42 PM, Gabriel Dos Reis
g...@integrable-solutions.net wrote:
 Hmm, std::u16string, std::u23string, and std::wstring are C++ standard
 types to process Unicode texts.

Note that at least u16string is too small to encode all of Unicode and
wstring might be as 16 bits is not enough to encode all of Unicode.

-- Johan

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-24 Thread Greg Weber
Can we all agree that

* Text can now demonstrate both CPU and RAM performance improvements
in benchmarks. Because Text is an opaque type it has a maximum
potential for future performance improvements. Declaring a String to
be a list limits performance improvements
* In a Unicode world, String = [Char] is not always correct: instead
for some operations one must operate on the String as a whole. Using a
[Char] type makes it much more likely for a programmer to  mistakenly
operate on individual characters. Using a Text type allows us to
choose to not expose character manipulation functions.
* The usage of String in the base libraries will continue as long as
Text is not in the language standard. This will continue to make
writing Haskell code a greater chore than is necessary: converting
between types, and working around the inconvenience of defining
typeclasses that operate on both String and [].


These are important enough to *try* to include Text into the standard,
even if there are objections to how it might practically be included.

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-24 Thread Johan Tibell
On Sat, Mar 24, 2012 at 5:54 PM, Gabriel Dos Reis
g...@integrable-solutions.net wrote:
 I think there is a confusion here.  A Unicode character is an abstract
 entity.  For it to exist in some concrete form in a program, you need
 an encoding.  The fact that char16_t is 16-bit wide is irrelevant to
 whether it can be used in a representation of a Unicode text, just like
 uint8_t (e.g. 'unsigned char') can be used to encode Unicode string
 despite it being only 8-bit wide.   You do not need to make the
 character type exactly equal to the type of the individual element
 in the text representation.

Well, if you have a 21-bit type you can declare its value to be a
Unicode code point (which are numbered.) Using a char* that you claim
contain utf-8 encoded data is bad for safety, as there is no guarantee
that that's indeed the case.

 Note also that an encoding itself (whether UTF-8, UTF-16, etc.) is 
 insufficient
 as far as text processing goes; you also need a localization at the
 minimum.  It is the
 combination of the two that gives some meaning to text representation
 and operations.

text does that via ICU. Some operations would be possible without
using the locale, if it wasn't for those Turkish i:s. :/

-- Johan

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-24 Thread Gabriel Dos Reis
On Sat, Mar 24, 2012 at 9:09 PM, Greg Weber g...@gregweber.info wrote:
 # Switching to Text by default makes us embarrassed!

Text processing /is/ quick to embarrassment :-)

 Problem: we want to write beautiful (and possibly inefficient) code
 that is easy to explain. If nothing else, this is pedagologically
 important.
 The goals of this code are to:
  * use list processing pattern matching and functions on a string type

I may have missed this question so I will ask it (apologies if it is a
repeat):  Why is it believed that list processing pattern matching is
appropriate or the right tool for text processing?


-- Gaby

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-24 Thread Greg Weber
On Sat, Mar 24, 2012 at 7:26 PM, Gabriel Dos Reis
g...@integrable-solutions.net wrote:
 On Sat, Mar 24, 2012 at 9:09 PM, Greg Weber g...@gregweber.info wrote:

 Problem: we want to write beautiful (and possibly inefficient) code
 that is easy to explain. If nothing else, this is pedagologically
 important.
 The goals of this code are to:
  * use list processing pattern matching and functions on a string type

 I may have missed this question so I will ask it (apologies if it is a
 repeat):  Why is it believed that list processing pattern matching is
 appropriate or the right tool for text processing?

Nobody said it is the right tool for text processing. In fact, I think
we all agreed it is the wrong tool for many cases. But it is easy for
students to understand since they are already being taught to use
lists for everything else.  It would be great if you can talk with
teachers of Haskell and figure out a better way to teach text
processing.



 -- Gaby

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-23 Thread ARJANEN Loïc Jean David
2012/3/22 Greg Weber g...@gregweber.info:
 I am not trying to win an argument with anyone. Just trying to do what
 is best for the community. Many others here have a better grasp of the
 issue than me and can help answer questions and come up with a
 solution.

 I am also not saying this proposal is done. A lot of thought and work
 is needed to ensure it can be implemented as smoothly as possible. It
 does seem though that everyone thinks it is a good proposal.

Sorry for the misunderstanding, but when you said that someone should
take this proposal up and help make sure it gets in the next batch, I
believed you thought we could take this proposal as is. Deeply sorry
for my error.
But now we have at least two tasks to do before we can put up the
proposal: define what operations should be supported by String and
should we apply this proposal in the next batch. Given that this
proposal will break many codebases (we shouldn't hope to apply all of
list's syntax to this string type) should we apply it alone or wait
until we have more other codebase-breakers to apply ?

-- 
ARJANEN Loïc Jean David
http://luigiscorner.wordpress.com
---
Computer science is no more about computers than astronomy is about
telescopes, biology is about microscopes, or chemistry is about
beakers and test tubes. Science is not about tools. It is about how we
use them, and what we find out when we do.
Michael R. Fellows and Ian Parberry

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-23 Thread Greg Weber
I would really just like for someone to show me how to create a wiki
proposal page :)

This proposal doesn't have to break any codebases.
One possibility is to add the Text type to the standard while keeping
String and marking it as deprecated.

On Fri, Mar 23, 2012 at 6:13 AM, ARJANEN Loïc Jean David
arjanen.l...@gmail.com wrote:
 2012/3/22 Greg Weber g...@gregweber.info:
 I am not trying to win an argument with anyone. Just trying to do what
 is best for the community. Many others here have a better grasp of the
 issue than me and can help answer questions and come up with a
 solution.

 I am also not saying this proposal is done. A lot of thought and work
 is needed to ensure it can be implemented as smoothly as possible. It
 does seem though that everyone thinks it is a good proposal.

 Sorry for the misunderstanding, but when you said that someone should
 take this proposal up and help make sure it gets in the next batch, I
 believed you thought we could take this proposal as is. Deeply sorry
 for my error.
 But now we have at least two tasks to do before we can put up the
 proposal: define what operations should be supported by String and
 should we apply this proposal in the next batch. Given that this
 proposal will break many codebases (we shouldn't hope to apply all of
 list's syntax to this string type) should we apply it alone or wait
 until we have more other codebase-breakers to apply ?

 --
 ARJANEN Loïc Jean David
 http://luigiscorner.wordpress.com
 ---
 Computer science is no more about computers than astronomy is about
 telescopes, biology is about microscopes, or chemistry is about
 beakers and test tubes. Science is not about tools. It is about how we
 use them, and what we find out when we do.
 Michael R. Fellows and Ian Parberry

 ___
 Haskell-prime mailing list
 Haskell-prime@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-prime

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-23 Thread Christian Siefkes
On 03/23/2012 02:13 PM, ARJANEN Loïc Jean David wrote:
 2012/3/22 Greg Weber g...@gregweber.info:
 But now we have at least two tasks to do before we can put up the
 proposal: define what operations should be supported by String and
 should we apply this proposal in the next batch. Given that this
 proposal will break many codebases (we shouldn't hope to apply all of
 list's syntax to this string type) should we apply it alone or wait
 until we have more other codebase-breakers to apply ?

I very much hope that the Haskell committee will never ever accept a
proposal that will break many codebases! That's what ruined Perl 6 und
Python 3, and quite unnecessarily so.

Even if I a future Haskell standard defines String as something that doesn't
have to be implemented as a list of Char, it still would have to behave as
if it was [Char] for most practical purposes (except performance-wise, of
course!). That's necessary for compatibility. Or String could just be
complemented with an additional standardized Text type, as Greg suggested.

Best regards
Christian

-- 
|--- Dr. Christian Siefkes --- christ...@siefkes.net ---
| Homepage: http://www.siefkes.net/ | Blog: http://www.keimform.de/
|Peer Production Everywhere:   http://peerconomy.org/wiki/
|-- OpenPGP Key ID: 0x346452D8 --
Just so that nobody takes his guess for the full truth, here's my standing
on keeping control, in 2 words (three?):
I won't.
-- Linus Torvalds, The Tanenbaum-Torvalds Debate



signature.asc
Description: OpenPGP digital signature
___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-23 Thread Tillmann Rendel

Hi,

ARJANEN Loïc Jean David wrote:

But now we have at least two tasks to do before we can put up the
proposal: define what operations should be supported by String and
should we apply this proposal in the next batch. Given that this
proposal will break many codebases (we shouldn't hope to apply all of
list's syntax to this string type) should we apply it alone or wait
until we have more other codebase-breakers to apply ?


I would expect the following steps:

 1. Define what operations should be supported by String,
that is, define a String API, possibly including thoughts
on performance, formal specification, tests, benchmarks, ...

 2. Convince all Haskell implementations to provide an
implementation of the String API outside the Prelude, as an
additional module (in the base package?). That implementation can
be based on [Char] or something else.

 3. Convince all string-like-packages on Hackage to provide
exactly the String API in a separate module, so these packages
are now drop-in replacements for the String implementations from
step 2 above.

At this point, we haven't touched the Prelude, but we have a blessed 
String API with multiple implementations. So applications can opt-in to 
use that String API instead of the Prelude-based [Char]. This allows us to:


 4. Convince packages on Hackage to use the type String (from
step 2) instead of Prelude-based [Char]; or to use the StringLike
class instead of a concrete string type.

 5. Refine the String API according to experience.

And finally, we can:

 6. Change Prelude.String to be the type from step 2 above.


My hope is that because of steps 2 and 3, the investment from step 1 
starts to pay off long before we reach step 6.


  Tillmann

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-23 Thread Greg Weber
Does Python 3 have the equivalent of LANGUAGE pragmas? That is, as a
GHC user i can add {-# LANGUAGE OLD_STRINGS -#} and my program works
with the new language standard!

I think what ruined Perl 6 is that it is still under development!
Avoiding breakage is important. But throwing around comparisons to
Perl 6 is not going to help move the discussion along!

On Fri, Mar 23, 2012 at 6:33 AM, Christian Siefkes
christ...@siefkes.net wrote:
 On 03/23/2012 02:13 PM, ARJANEN Loïc Jean David wrote:
 2012/3/22 Greg Weber g...@gregweber.info:
 But now we have at least two tasks to do before we can put up the
 proposal: define what operations should be supported by String and
 should we apply this proposal in the next batch. Given that this
 proposal will break many codebases (we shouldn't hope to apply all of
 list's syntax to this string type) should we apply it alone or wait
 until we have more other codebase-breakers to apply ?

 I very much hope that the Haskell committee will never ever accept a
 proposal that will break many codebases! That's what ruined Perl 6 und
 Python 3, and quite unnecessarily so.

 Even if I a future Haskell standard defines String as something that doesn't
 have to be implemented as a list of Char, it still would have to behave as
 if it was [Char] for most practical purposes (except performance-wise, of
 course!). That's necessary for compatibility. Or String could just be
 complemented with an additional standardized Text type, as Greg suggested.

 Best regards
        Christian

 --
 |--- Dr. Christian Siefkes --- christ...@siefkes.net ---
 | Homepage: http://www.siefkes.net/ | Blog: http://www.keimform.de/
 |    Peer Production Everywhere:       http://peerconomy.org/wiki/
 |-- OpenPGP Key ID: 0x346452D8 --
 Just so that nobody takes his guess for the full truth, here's my standing
 on keeping control, in 2 words (three?):
 I won't.
        -- Linus Torvalds, The Tanenbaum-Torvalds Debate


 ___
 Haskell-prime mailing list
 Haskell-prime@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-prime


___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-23 Thread Ben Millwood
On Fri, Mar 23, 2012 at 1:24 PM, Greg Weber g...@gregweber.info wrote:
 I would really just like for someone to show me how to create a wiki
 proposal page :)

 This proposal doesn't have to break any codebases.
 One possibility is to add the Text type to the standard while keeping
 String and marking it as deprecated.


I'm in favour of this. In fact, I'm not sure if I would even deprecate
String. I think just adding Text or something Text-like to the
standard would be a good step towards encouraging libraries to use it
as their first choice. It might, however, be wise to first adopt GHC's
OverloadedStrings proposal so that the syntax for using string
alternatives is more convenient.

I'm inclined to start slow and small: OverloadedStrings is already in
use, and makes sense with overloaded numeric literals that we already
have, so I think it's reasonable to push for including that in the
standard. I don't think that blessing any other string type is going
to be very successful *without* OverloadedStrings, and I think that
Duncan is right that we want to introduce a new type before removing
the old one.

With regards to performance of fromString, I feel like if it was a
serious problem (and how many really big strings are going to be built
that way?) then an effort to do some special-case inlining (after all,
the parameters are constant and specified at compile time) might be
beneficial.

With regards to a general string API, I don't think a typeclass is the
correct solution (except for string literals); my view is that things
like ListLike may be practical but are awkward to use, and ambiguity
problems only make things more upsetting. I think we should just take
the abstract Text type and API, and leave implementors to do whatever
they want behind that.

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-23 Thread Brandon Allbery
On Fri, Mar 23, 2012 at 13:05, Edward Kmett ekm...@gmail.com wrote:

 Isn't it enough that it is part of the platform?


As long as the entire Prelude and large chunks of the bootlibs are based
around String, String will be preferred.  String as a boxed singly-linked
list type is therefore a major problem.

-- 
brandon s allbery  allber...@gmail.com
wandering unix systems administrator (available) (412) 475-9364 vm/sms
___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-23 Thread Edward Kmett
Like I said, my objection to including Text is a lot less strong than my 
feelings on any notion of deprecating String.

However, I still see a potentially huge downside from an pedagogical 
perspective to pushing Text, especially into a place where it will be front and 
center to new users. String lets the user learn about induction, and encourages 
a Haskelly programming style, where you aren't mucking about with indices and 
Builders everywhere, which is frankly very difficult to use when building Text. 
If you cons or append to build up a Text fragment, frankly you're doing it 
wrong.

The pedagogical concern is quite real, remember many introductory lanuage 
classes have time to present Haskell and the list data type and not much else. 
Showing parsing through pattern matching on strings makes a very powerful tool, 
its harder to show that with Text.

But even when taking apart Text, the choice of UTF16 internally makes it pretty 
much a worst case for many string manipulation purposes. (e.g. slicing has to 
spend linear time scanning the string) due to the existence of codepoints 
outside of plane 0. 

The major benefits of Text come from FFI opportunities, but even there if you 
dig into its internals it has to copy out of the array to talk to foreign 
functions because it lives in unpinned memory unlike ByteString.

The workarounds for these  limitations all require access to the internals, so 
a Text proposed in an implementation-agnostic manner is less than useful, and 
one supplied with a rigid set of implementation choices seems to fossilize the 
current design.

All of these things make me lean towards a position that it is premature to 
push Text as the one true text representation.

That I am very sympathetic to the position that the standard should ensure that 
there are Text equivalents for all of the exposed string operations, like read, 
show, etc, and the various IO primitives, so that a user who is savvy to all of 
these concerns has everything he needs to make his code perform well.

Sent from my iPad

On Mar 23, 2012, at 1:32 PM, Brandon Allbery allber...@gmail.com wrote:

 On Fri, Mar 23, 2012 at 13:05, Edward Kmett ekm...@gmail.com wrote:
 Isn't it enough that it is part of the platform?
 
 As long as the entire Prelude and large chunks of the bootlibs are based 
 around String, String will be preferred.  String as a boxed singly-linked 
 list type is therefore a major problem.
 
 -- 
 brandon s allbery  allber...@gmail.com
 wandering unix systems administrator (available) (412) 475-9364 vm/sms
 
___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-23 Thread Brandon Allbery
On Fri, Mar 23, 2012 at 16:21, Nate Soares n...@so8r.es wrote:

 I think the 'naming issue' that you mention highlights the need for better
 use of type classes in the prelude.


...which is a rat's nest best avoided, unfortunately, unless the idea is to
stifle it entirely.  (How long have people been proposing alternatives with
no net effect?)

-- 
brandon s allbery  allber...@gmail.com
wandering unix systems administrator (available) (412) 475-9364 vm/sms
___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-23 Thread Edward Kmett
On Fri, Mar 23, 2012 at 4:21 PM, Nate Soares n...@so8r.es wrote:

 Note that this might be a good time to consider re-factoring the list
 operations, for example, making ++ operate on monoids instead of just
 lists.


Note: we have () for Monoid, which was deliberately chosen rather than
generalizing (++) because that had already changed meaning from applying to
MonadPlus to the more restricted type during what I tend to refer to as the
great momomorphization revolution of 1998, and not every MonadPlus is
compatible with the corresponding monoid (Maybe comes to mind).

This also entails moving Data.Monoid into the standard.


 I think the 'naming issue' that you mention highlights the need for better
 use of type classes in the prelude.


The major issue with typeclasses for things like special-purpose containers
is that they almost inevitably wind up requiring multiparameter type
classes with functional dependencies, or they need type families. This then
runs afoul of the fact that since neither one is better than the other for
all usecases, neither one seems to be drifting any closer to
standardization.

-Edward
___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-22 Thread ARJANEN Loïc Jean David

Le 22/03/2012 04:29, Greg Weber a écrit :

This proposal seems fairly uncontroversial at the moment. I would
really like it if someone more familiar with the proposal process can
take this proposal up and help make sure it gets in the next batch. I
can't even figure out how to create a wiki page for the proposal right
now :)


Well, this proposal seems uncontroversial because we didn't arrive to 
the difficult part: what operations should we define on this String type 
for it to be useful.
Because with only this proposal as it stands now (String defined as an 
implementation-defined newtype, a typeclass defined for conversion 
from/to String and [Char] as an instance of this typeclass), we're in a 
worse situation than before: not only String became useless given there 
is no operations defined on it, the only mean we have to portably work 
with it is to translate it to [Char] before doing anything.
So now, the fun part begins...what operations should String support ? I 
propose obtaining the length of a String, taking a substring of a given 
size beginning at a given index, taking the character at index i in a 
String, concatenation, converting a string to upper/lower case and 
determining if a string is contained in/a prefix/a suffix of another.
I am sure I am forgetting some useful operations and some operations I 
said are better placed in the typeclass or in a typeclass instance or 
are particular cases of general operations we should define rather than 
the particular cases. So, what are the operations we should define 
according to you ?


Regards,
ARJANEN Loïc

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-21 Thread ARJANEN Loïc Jean David

Le 20/03/2012 16:29, Tillmann Rendel a écrit :

Hi,

Thomas Schilling wrote:

I agree that the language standard should not prescribe the
implementation of a Text datatype.  It should instead require an
abstract data type (which may just be a newtype wrapper for [Char] in
some implementations) and a (minimal) set of operations on it.

Regarding the type class for converting to and from that type, there
is a perhaps more complicated question: The current fromString method
uses String as the source type which causes unnecessary overhead.


Is this still a problem if String would be replaced by an 
implementation-dependend newtype? Presumably, GHC would use a more 
efficient representation behind the newtype, so the following would be 
efficient in practice (or not?)


  newtype String
= ...

  class IsString a where
fromString :: String - a

The standard could even prescribe that an instance for [Char] exists:

  explode :: String - [Char]
  explode = ...

  instance IsString [Char] where
fromString = explode

Tillmann


A recent message on Haskell-café made me think that if the standard 
mandates that any instance exists, it should mandates that an instance 
exists for CString and CWString (C's strings and wide strings) or, more 
generally, that an instance exists for any foreign string type defined 
in the FFIs implemented. That is to say, if you implement a FFI for .Net 
and you expose .Net's string type, you should implement conversions 
between that string type and Haskell's one.


Regards,
ARJANEN Loïc


___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-21 Thread Greg Weber
This proposal seems fairly uncontroversial at the moment. I would
really like it if someone more familiar with the proposal process can
take this proposal up and help make sure it gets in the next batch. I
can't even figure out how to create a wiki page for the proposal right
now :)

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


RE: String != [Char]

2012-03-20 Thread Simon Marlow
 On Mon, Mar 19, 2012 at 9:02 AM, Christian Siefkes christ...@siefkes.net
 wrote:
  On 03/19/2012 04:53 PM, Johan Tibell wrote:
  I've been thinking about this question as well. How about
 
  class IsString s where
      unpackCString :: Ptr Word8 - CSize - s
 
  What's the Ptr Word8 supposed to contain? A UTF-8 encoded string?
 
 Yes.
 
 We could make a distinction between byte and Unicode literals and have:
 
 class IsBytes a where
 unpackBytes :: Ptr Word8 - Int - a
 
 class IsText a where
 unpackText :: Ptr Word8 - Int - a
 
 In the latter the caller guarantees that the passed in pointer points to
 wellformed UTF-8 data.

Is there a reason not to put all these methods in the IsString class, with 
appropriate default definitions?  You would need a UTF-8 encoder ( decoder) of 
course, but it would reduce the burden on clients and improve backwards 
compatibility.

Cheers,
Simon



___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-20 Thread Johan Tibell
On Tue, Mar 20, 2012 at 2:25 AM, Simon Marlow simon...@microsoft.com wrote:
 Is there a reason not to put all these methods in the IsString class, with 
 appropriate default definitions?  You would need a UTF-8 encoder ( decoder) 
 of course, but it would reduce the burden on clients and improve backwards 
 compatibility.

That sounds fine to me. I'm leaning towards only having
unpackUTF8String (in addition to the existing method), as in the
absence of proper byte literals we would have literals which change
types, depending on which bytes they contain*. Ugh!

* Is it even possible to create non-UTF8 literals without using
escaped sequences?

-- Johan

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-19 Thread Greg Weber
I actually was not able to successfully google for Text vs. String
benchmarks. If someone can point one out that would be very helpful.

On Sat, Mar 17, 2012 at 1:52 AM, Christopher Done
chrisd...@googlemail.com wrote:
 On 17 March 2012 05:30, Tony Morris tonymor...@gmail.com wrote:
 Do you know if there is a good write-up of the benefits of Data.Text
 over String? I'm aware of the advantages just by my own usage; hoping
 someone has documented it rather than in our heads.

 Good point, it would be good to collate the experience and wisdom of
 this decision with some benchmark results on the HaskellWiki as The
 Place to link to when justifying it.

 ___
 Haskell-prime mailing list
 Haskell-prime@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-prime

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-19 Thread Johan Tibell
Hi Greg,

There are a few blog posts on Bryan's blog. Here are two of them:


http://www.serpentine.com/blog/2009/10/09/announcing-a-major-revision-of-the-haskell-text-library/
http://www.serpentine.com/blog/2009/12/10/the-performance-of-data-text/

Unfortunately the blog seems partly broken. Images are missing and
some articles are missing altogether (i.e. the article is there but
the actualy body text is gone.)

-- Johan

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-19 Thread Duncan Coutts
On 17 March 2012 01:44, Greg Weber g...@gregweber.info wrote:
 the text library and Text data type have shown the worth in real world
 Haskell usage with GHC.
 I try to avoid String whenever possible, but I still have to deal with
 conversions and other issues.
 There is a lot of real work to be done to convert away from [Char],
 but I think we need to take it out of the language definition as a
 first step.

I'm pretty sure the majoirty of people would agree that if we were
making the Haskell standard nowadays we'd make String type abstract.

Unfortunately I fear making the change now will be quite disruptive,
though I don't think we've collectively put much effort yet into
working out just how disruptive.

In principle I'd support changing to reduce the number of string types
used in interfaces. From painful professional experience, I think that
one of the biggest things where C++ went wrong was not having a single
string type that everyone would use (I once had to write a C++
component integrating code that used 5 different string types). Like
Python 3, we should have two common string types used in interfaces:
string and bytes (with implementations like our current Text and
ByteString).

BTW, I don't think taking it out of the langauge would be a helpful
step. We actually want to tell people use *this* string type in
interfaces, not leave everyone to make their own choice. I think
taking it out of the language would tend to encourage everyone to make
their own choice.

Duncan

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-19 Thread Thomas Schilling
On 18 March 2012 19:29, ARJANEN Loïc Jean David arjanen.l...@gmail.com wrote:

 Good point, but rather than specifying in the standard that the new string
 type should be the Text datatype, maybe the new definition should be that
 String is a newtype with suitable operations defined on it, and perhaps a
 typeclass to convert to and from this newtype. The reason of my remark is
 although most implementations compile to native code, an implementation
 compiling to, for example, JavaScript might wish to use JavaScript's string
 type rather than forcing its users to have a native library installed.

I agree that the language standard should not prescribe the
implementation of a Text datatype.  It should instead require an
abstract data type (which may just be a newtype wrapper for [Char] in
some implementations) and a (minimal) set of operations on it.

Regarding the type class for converting to and from that type, there
is a perhaps more complicated question: The current fromString method
uses String as the source type which causes unnecessary overhead. This
is unfortunate since GHC's built-in mechanism actually uses
unpackCString[Utf8]# which constructs the inefficient String
representation from a compact memory representation.  I think it would
be best if the new fromString/fromText class allowed an efficient
mechanism like that.  unpackCString# has type Addr# - [Char] which is
obviously GHC-specific.


-- 
Push the envelope. Watch it bend.

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-19 Thread Johan Tibell
On Mon, Mar 19, 2012 at 8:45 AM, Thomas Schilling
nomin...@googlemail.com wrote:
 Regarding the type class for converting to and from that type, there
 is a perhaps more complicated question: The current fromString method
 uses String as the source type which causes unnecessary overhead. This
 is unfortunate since GHC's built-in mechanism actually uses
 unpackCString[Utf8]# which constructs the inefficient String
 representation from a compact memory representation.  I think it would
 be best if the new fromString/fromText class allowed an efficient
 mechanism like that.  unpackCString# has type Addr# - [Char] which is
 obviously GHC-specific.

I've been thinking about this question as well. How about

class IsString s where
unpackCString :: Ptr Word8 - CSize - s

It's morally equivalent of unpackCString#, but uses standard Haskell types.

-- Johan

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-19 Thread Christian Siefkes
On 03/19/2012 04:53 PM, Johan Tibell wrote:
 I've been thinking about this question as well. How about
 
 class IsString s where
 unpackCString :: Ptr Word8 - CSize - s

What's the Ptr Word8 supposed to contain? A UTF-8 encoded string?

Best regards
Christian

-- 
|--- Dr. Christian Siefkes --- christ...@siefkes.net ---
| Homepage: http://www.siefkes.net/ | Blog: http://www.keimform.de/
|Peer Production Everywhere:   http://peerconomy.org/wiki/
|-- OpenPGP Key ID: 0x346452D8 --
A choice of masters is not freedom.
-- Bradley M. Kuhn and Richard M. Stallman, Freedom Or Power?



signature.asc
Description: OpenPGP digital signature
___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-19 Thread Johan Tibell
On Mon, Mar 19, 2012 at 9:02 AM, Christian Siefkes
christ...@siefkes.net wrote:
 On 03/19/2012 04:53 PM, Johan Tibell wrote:
 I've been thinking about this question as well. How about

 class IsString s where
     unpackCString :: Ptr Word8 - CSize - s

 What's the Ptr Word8 supposed to contain? A UTF-8 encoded string?

Yes.

We could make a distinction between byte and Unicode literals and have:

class IsBytes a where
unpackBytes :: Ptr Word8 - Int - a

class IsText a where
unpackText :: Ptr Word8 - Int - a

In the latter the caller guarantees that the passed in pointer points
to wellformed UTF-8 data.

-- Johan

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-19 Thread Greg Weber
This is the best I can do with Bryan's blog posts, but none of the
graphs (which contain all the information) show up:
http://web.archive.org/web/20100222031602/http://www.serpentine.com/blog/2009/12/10/the-performance-of-data-text/

If someone has some benchmarks that can be ran that would be helpful.

On Mon, Mar 19, 2012 at 7:51 AM, Johan Tibell johan.tib...@gmail.com wrote:
 Hi Greg,

 There are a few blog posts on Bryan's blog. Here are two of them:

    
 http://www.serpentine.com/blog/2009/10/09/announcing-a-major-revision-of-the-haskell-text-library/
    http://www.serpentine.com/blog/2009/12/10/the-performance-of-data-text/

 Unfortunately the blog seems partly broken. Images are missing and
 some articles are missing altogether (i.e. the article is there but
 the actualy body text is gone.)

 -- Johan

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


RE: String != [Char]

2012-03-19 Thread Simon Peyton-Jones
Don't forget that with -XOverloadedStrings we already have a IsString class.  
(That's not a Haskell Prime extension though.)

class IsString a where
fromString :: String - a

Simon

|  -Original Message-
|  From: haskell-prime-boun...@haskell.org [mailto:haskell-prime-
|  boun...@haskell.org] On Behalf Of Johan Tibell
|  Sent: 19 March 2012 15:54
|  To: Thomas Schilling
|  Cc: haskell-prime@haskell.org
|  Subject: Re: String != [Char]
|  
|  On Mon, Mar 19, 2012 at 8:45 AM, Thomas Schilling
|  nomin...@googlemail.com wrote:
|   Regarding the type class for converting to and from that type, there
|   is a perhaps more complicated question: The current fromString method
|   uses String as the source type which causes unnecessary overhead. This
|   is unfortunate since GHC's built-in mechanism actually uses
|   unpackCString[Utf8]# which constructs the inefficient String
|   representation from a compact memory representation.  I think it would
|   be best if the new fromString/fromText class allowed an efficient
|   mechanism like that.  unpackCString# has type Addr# - [Char] which is
|   obviously GHC-specific.
|  
|  I've been thinking about this question as well. How about
|  
|  class IsString s where
|  unpackCString :: Ptr Word8 - CSize - s
|  
|  It's morally equivalent of unpackCString#, but uses standard Haskell types.
|  
|  -- Johan
|  
|  ___
|  Haskell-prime mailing list
|  Haskell-prime@haskell.org
|  http://www.haskell.org/mailman/listinfo/haskell-prime



___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-19 Thread Daniel Peebles
If the input is specified to be UTF-8, wouldn't it be better to call the
method unpackUTF8 or something like that?

On Mon, Mar 19, 2012 at 12:59 PM, Johan Tibell johan.tib...@gmail.comwrote:

 On Mon, Mar 19, 2012 at 9:02 AM, Christian Siefkes
 christ...@siefkes.net wrote:
  On 03/19/2012 04:53 PM, Johan Tibell wrote:
  I've been thinking about this question as well. How about
 
  class IsString s where
  unpackCString :: Ptr Word8 - CSize - s
 
  What's the Ptr Word8 supposed to contain? A UTF-8 encoded string?

 Yes.

 We could make a distinction between byte and Unicode literals and have:

 class IsBytes a where
unpackBytes :: Ptr Word8 - Int - a

 class IsText a where
unpackText :: Ptr Word8 - Int - a

 In the latter the caller guarantees that the passed in pointer points
 to wellformed UTF-8 data.

 -- Johan

 ___
 Haskell-prime mailing list
 Haskell-prime@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-prime

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-19 Thread Johan Tibell
On Mon, Mar 19, 2012 at 2:55 PM, Daniel Peebles pumpkin...@gmail.com wrote:
 If the input is specified to be UTF-8, wouldn't it be better to call the
 method unpackUTF8 or something like that?

Sure.

-- Johan

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-18 Thread ARJANEN Loïc Jean David

Le 17/03/2012 02:44, Greg Weber a écrit :

the text library and Text data type have shown the worth in real world
Haskell usage with GHC.
I try to avoid String whenever possible, but I still have to deal with
conversions and other issues.
There is a lot of real work to be done to convert away from [Char],
but I think we need to take it out of the language definition as a
first step.

I can only see one issue with the proposal: it can be convenient to
operate on a list of characters.
But I think there are plenty of solutions at our disposal. A simple
conversion from Text to a list of characters might suffice. In GHC,
OverloadedStrings means users would still be free to use String the
same way they are now.

Good point, but rather than specifying in the standard that the new 
string type should be the Text datatype, maybe the new definition should 
be that String is a newtype with suitable operations defined on it, and 
perhaps a typeclass to convert to and from this newtype. The reason of 
my remark is although most implementations compile to native code, an 
implementation compiling to, for example, JavaScript might wish to use 
JavaScript's string type rather than forcing its users to have a native 
library installed.


Regards,
ARJANEN Loïc

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-17 Thread Christopher Done
On 17 March 2012 05:30, Tony Morris tonymor...@gmail.com wrote:
 Do you know if there is a good write-up of the benefits of Data.Text
 over String? I'm aware of the advantages just by my own usage; hoping
 someone has documented it rather than in our heads.

Good point, it would be good to collate the experience and wisdom of
this decision with some benchmark results on the HaskellWiki as The
Place to link to when justifying it.

___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: String != [Char]

2012-03-16 Thread Tony Morris
On 17/03/12 11:44, Greg Weber wrote:
 the text library and Text data type have shown the worth in real world
 Haskell usage with GHC.
 I try to avoid String whenever possible, but I still have to deal with
 conversions and other issues.
 There is a lot of real work to be done to convert away from [Char],
 but I think we need to take it out of the language definition as a
 first step.

 I can only see one issue with the proposal: it can be convenient to
 operate on a list of characters.
 But I think there are plenty of solutions at our disposal. A simple
 conversion from Text to a list of characters might suffice. In GHC,
 OverloadedStrings means users would still be free to use String the
 same way they are now.

 ___
 Haskell-prime mailing list
 Haskell-prime@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-prime
Do you know if there is a good write-up of the benefits of Data.Text
over String? I'm aware of the advantages just by my own usage; hoping
someone has documented it rather than in our heads.

-- 
Tony Morris
http://tmorris.net/



___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime