[Haskell-cafe] RE: Enum class
Well, you already have succ :: (Enum a) => a -> a defined for all data of the Enum class. You also need Bounded so that you can check maxBound. This (untested) code should do it:

    next :: (Enum a, Bounded a) => a -> a
    next v = if (toEnum v) == (toEnum maxBound) then minBound else succ v

-- Arjun Guha [EMAIL PROTECTED]

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Some random newbie questions
> I'm constantly surprised hearing from so many people about their space problems. I cannot remember having space problems with my programs. I don't know what everybody else is doing wrong :-)

At least two common cases:

Extracting compact data structures from large files. The contents of the large file are read as a linked list (ugh) of pointers (double ugh) to 32-bit Chars (triple ugh) -- twelve times the size of the file, if my calculations are correct. The contents can't be GC'ed before the extracted data is fully evaluated. (Now if the file were an mmap'ed array it wouldn't be so bad; perhaps in the next-generation IO that people are discussing this will be easier?)

Naive use of foldl. I tend to think the default foldl should be strict (i.e. replaced by foldl') -- are there important cases where it needs to be lazy?

Indeed, extracting a compact data structure from a large data structure can easily cost much space because of lazy evaluation. foldl is probably used mostly for that purpose.

Me not having space problems is probably related to the kind of programs I write. Most of my programs are program transformations that take an abstract syntax tree as input and produce a different abstract syntax tree (again a large structure). Trying to be lazy then means trying to produce as much output as possible while processing as little input as possible. More formally: if there is some partial input for a function such that, for all completions of this partial input to fully defined inputs, all outputs of the function have a common prefix, then the function should already yield this prefix as output for the partial input.

In the example that I mentioned in my previous posting I did actually originally compute size information for a large data structure, so I did extract something compact from something large.
However, I then realised that I only did some very basic arithmetic with the size information before generating another large data structure of this computed size. So I decided not to compute integers at all, but to do the arithmetic directly on the algebraic data type. Gone was the space problem, without using seq.

You might also want to look at Colin Runciman's paper "What about the Natural Numbers?" in the Journal of Functional Programming, in which he argues for a type of lazy natural numbers with the same semantics as data Nat = Zero | Succ Nat. It fits much better for computing the size of a lazy data structure.

I don't claim that all space problems can easily be dealt with.

Olaf
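To make the lazy-naturals idea concrete, here is a minimal sketch (the function names natLength and atLeast are my own, not from Runciman's paper): because Nat is built lazily, asking "does this list have at least n elements?" forces only the first n conses, even on an infinite list.

```haskell
-- Lazy natural numbers, as in "What about the Natural Numbers?"
data Nat = Zero | Succ Nat

-- Lazy length: produces the Nat incrementally as the list is demanded.
natLength :: [a] -> Nat
natLength []     = Zero
natLength (_:xs) = Succ (natLength xs)

-- Compare a Nat against an Int bound, forcing at most n conses.
atLeast :: Int -> Nat -> Bool
atLeast n _ | n <= 0 = True
atLeast _ Zero       = False
atLeast n (Succ m)   = atLeast (n - 1) m

main :: IO ()
main = do
  print (atLeast 3 (natLength [1 ..]))   -- terminates on an infinite list
  print (atLeast 3 (natLength [1, 2]))
```

Computing the length as an Int here would diverge on the infinite list; the Nat version only inspects as much structure as the comparison needs.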
Re: [Haskell-cafe] Hugs vs GHC (again) was: Re: Some random newbie questions
Simon Marlow [EMAIL PROTECTED] writes:

> - Do the character class functions (isUpper, isAlpha etc.) work correctly on the full range of Unicode characters?
>
> This is true in Hugs. It's true with GHC on some systems (basically we were lazy and used the underlying C library's support here, which is patchy).

It's not obvious what the predicates should really mean, e.g. should isDigit and isHexDigit include non-ASCII digits, or should isSpace include non-breaking space characters? The Haskell 98 report gives some guidelines which don't necessarily coincide with C practice nor with the expectations of Unicode people. Once this is agreed, it would be easy to make scripts which generate C code from Unicode's UnicodeData.txt tables. I think table-driven predicates and toUpper/toLower are better implemented in C; Haskell is not good at static constant tables of numbers.

Another issue is that the set of predicates provided by the Haskell 98 library report is not enough to implement e.g. a Haskell 98 lexer, which needs "any Unicode symbol or punctuation".

Case mapping would be better done string -> string rather than character -> character; this breaks a long-established Haskell interface. Case mapping is locale-sensitive (in very minor ways). Haskell doesn't provide algorithms like normalization or collation. In general the Haskell 98 interface is not enough for complex Unicode processing.

> - Can you do String I/O in some encoding of Unicode?
>
> No Haskell compiler has support for this yet, and there are design decisions to be made.

The problem with designing an API of recoders is that, depending on whether the recoder is implemented in Haskell or interfaced from C, it needs a different data representation. Pure Haskell recoders prefer lazy lists of characters or bytes (except that a desire to detect source errors or characters unavailable in the target encoding breaks this), while high-performance C prefers pointers to buffers with chunks of text.
Transparent recoding makes some behavior hard to express. Imagine parsing HTTP headers followed by \r\n\r\n and a binary file. If you read headers line by line and decoding is performed in blocks, then once you determine where the headers end it's too late to find the start of the binary file: a part of it has already been decoded into text. You have to determine the end of the headers while working with bytes, not characters, and only convert the first part. Not performing the recoding in blocks is tricky if the decoder is implemented in C: giving 1-byte buffers to lots of iconv() calls is not nice.

Or imagine parsing an HTML file with the encoding specified inside it in a meta element. Switching the encoding in the middle is incompatible with buffering. Maybe the best option is to parse the beginning in ISO-8859-1 just to determine the encoding, and then reparse everything again once the encoding is known.

If characters are recoded automatically on I/O, one is tempted to extend the framework to other conversions like compression, line-ending convention, HTML character escaping etc.

> - What about Unicode FilePaths?
>
> This was discussed a few months ago on the haskell(-cafe) list, no support yet in any compiler. Nobody knows what the semantics should be.

I've once written elsewhere a short report about handling filename encodings in various languages and environments which use Unicode as their string representation. Here it is (I've been later corrected that Unicode non-characters are valid in UTF-x):

I describe here languages which exclusively use Unicode strings. Some languages have both byte strings and Unicode strings (e.g. Python), and then byte strings are generally used for strings exchanged with the OS; the programmer is responsible for the conversion if he wishes to use Unicode. I consider situations where the encoding is implicit. For I/O of file contents it's always possible to set the encoding explicitly somehow. Corrections are welcome.
This is mostly based on experimentation.

Java (Sun) -- Strings are UTF-16. Filenames are assumed to be in the locale encoding.
  a) Interpreting. Bytes which cannot be converted are replaced by U+FFFD.
  b) Creating. Characters which cannot be converted are replaced by ?.
Command line arguments and standard I/O are treated in the same way.

Java (GNU) -- Strings are UTF-16. Filenames are assumed to be in Java-modified UTF-8.
  a) Interpreting. If a filename cannot be converted, a directory listing contains a null instead of a string object.
  b) Creating. All Java characters are representable in Java-modified UTF-8. Obviously not all potential filenames can be represented.
Command line arguments are interpreted according to the locale. Bytes which cannot be converted are skipped. Standard I/O works in ISO-8859-1 by default. Obviously all input is accepted. On output, characters above U+00FF are replaced by ?.

C# (mono) -- Strings are UTF-16. Filenames use
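Going back to the HTTP example earlier in this post: the "split on bytes first, decode second" approach can be sketched with plain byte lists (splitHeaders and decodeLatin1 are made-up names for illustration, not any library's API):

```haskell
import Data.Word (Word8)
import Data.Char (chr)

-- The byte sequence that terminates the headers: \r\n\r\n.
crlfcrlf :: [Word8]
crlfcrlf = [13, 10, 13, 10]

-- Split raw input into (header bytes, body bytes) without decoding
-- anything, so the binary body is untouched.
splitHeaders :: [Word8] -> ([Word8], [Word8])
splitHeaders = go []
  where
    go acc rest
      | take 4 rest == crlfcrlf = (reverse acc, drop 4 rest)
    go acc (b:rest) = go (b:acc) rest
    go acc []       = (reverse acc, [])

-- Decode only the header part (here Latin-1: one byte, one character);
-- the body stays as raw bytes.
decodeLatin1 :: [Word8] -> String
decodeLatin1 = map (chr . fromIntegral)

main :: IO ()
main = print (decodeLatin1 headers, body)
  where (headers, body) = splitHeaders [72, 105, 13, 10, 13, 10, 1, 2]
```

The point is only the ordering: the \r\n\r\n search happens on bytes, and decoding is applied to a prefix whose extent is already known, which is exactly what block-at-a-time transparent recoding makes impossible.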
Re: [Haskell-cafe] Some random newbie questions
On Friday 07 January 2005 12:03, Ketil Malde wrote:
> Naive use of foldl. I tend to think the default foldl should be strict (i.e. replaced by foldl') -- are there important cases where it needs to be lazy?

Hi,
One simple example would be,

    reverse = foldl (flip (:)) []

J.A.
Re: [Haskell-cafe] Hugs vs GHC (again) was: Re: Some random newbie questions
On 9 janv. 05, at 20:03, Marcin 'Qrczak' Kowalczyk wrote:
> Once this is agreed, it would be easy to make scripts which generate C code from Unicode's UnicodeData.txt tables. I think table-driven predicates and toUpper/toLower are better implemented in C; Haskell is not good at static constant tables of numbers.

Sebastien Carlier already wrote this for hOp, see:
http://etudiants.insia.org/~jbobbio/hOp/Gen_wctype.hs

Cheers,
Jérémy.
Re: [Haskell-cafe] Some random newbie questions
Jorge Adriano Aires [EMAIL PROTECTED] writes:

> > Naive use of foldl. I tend to think the default foldl should be strict (i.e. replaced by foldl') -- are there important cases where it needs to be lazy?
>
> Hi,
> One simple example would be,
>
>     reverse = foldl (flip (:)) []

No, it would work with strict foldl too. In fact in the absence of optimization it would work better (uses less time and space). The optimization required is inlining and strictness analysis.

A function which requires lazy foldl for correctness would have to be sometimes lazy in its first argument, and at the same time some partial results would have to be undefined. (By "function" I mean the first argument of foldl, treated as a binary function.) Here this doesn't apply because flip (:) x y is always defined. And another common case for foldl, sum, doesn't apply because (+) is usually strict in both arguments (although in principle it does not have to be, because of overloading, which implies that a compiler can only optimize particular specializations of sum, not generic sum). I don't know of any real-life example.

Perhaps there are cases where evaluating partial results is correct but inefficient. I don't know such a case either.

--
Marcin 'Qrczak' Kowalczyk [EMAIL PROTECTED]
http://qrnik.knm.org.pl/~qrczak/
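The correctness-vs-optimization point here can be seen directly in the definitions. A sketch (the names foldlLazy and foldlStrict are mine; foldlStrict mirrors what Data.List's foldl' does):

```haskell
-- Lazy left fold, as in the Prelude's foldl.
foldlLazy :: (b -> a -> b) -> b -> [a] -> b
foldlLazy _ z []     = z
foldlLazy f z (x:xs) = foldlLazy f (f z x) xs
  -- (f z x) is passed on as an unevaluated thunk: on a long list the
  -- accumulator becomes a chain of thunks unless the compiler's
  -- strictness analysis removes them.

-- Strict left fold, as in Data.List's foldl'.
foldlStrict :: (b -> a -> b) -> b -> [a] -> b
foldlStrict _ z []     = z
foldlStrict f z (x:xs) = let z' = f z x
                         in z' `seq` foldlStrict f z' xs
  -- seq forces each partial result to weak head normal form. For
  -- reverse this forces only the cons cells, never the list elements,
  -- which is why the strict version is still correct there.

main :: IO ()
main = do
  print (foldlStrict (flip (:)) [] [1, 2, 3 :: Int])
  print (foldlStrict (+) 0 [1 .. 100 :: Int])
```

Since flip (:) applied to two arguments is always defined, every partial result forced by seq is defined, matching the argument above that reverse works with a strict foldl.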
Re: [Haskell-cafe] Some random newbie questions
On Friday 07 January 2005 12:03, Ketil Malde wrote:
> Naive use of foldl. I tend to think the default foldl should be strict (i.e. replaced by foldl') -- are there important cases where it needs to be lazy?

Hi,
One simple example would be,

    reverse = foldl (flip (:)) []

A better example would be building some other lazy structure that is strict in its elements...

J.A.

---
module Test where
import Data.List

data L = E | !Int :+: L deriving Show

-- my head
h (x :+: xs) = x
h E          = error "ops"

--
rev1 = foldl  (flip (:+:)) E
rev2 = foldl' (flip (:+:)) E

l = [error "...", error "...", 1 :: Int]

--
*Test> h (rev1 l)
1
(0.00 secs, 264560 bytes)
*Test> h (rev2 l)
*** Exception:
(0.01 secs, 264524 bytes)
Re: [Haskell-cafe] Some random newbie questions
> No, it would work with strict foldl too. In fact in the absence of optimization it would work better (uses less time and space). The optimization required is inlining and strictness analysis.

Is this also true if you're just going to use the first few elements after reversing it?

> A function which requires lazy foldl for correctness would have to be sometimes lazy in its first argument, and at the same time some partial results would have to be undefined. By function I mean the first argument of foldl, treated as a binary function. Here this doesn't apply because flip (:) x y is always defined. And another common case for foldl, sum, doesn't apply because (+) is usually strict in both arguments (although in principle it does not have to be, because of overloading, which implies that a compiler can only optimize particular specializations of sum, not generic sum). I don't know of any real-life example.

Yes, you are right, my bad. I was thinking as if lists were not lazy in their elements, hence my second example... But yes, now it is not a common example anymore.

> Perhaps there are cases where evaluating partial results is correct but inefficient. I don't know such a case either.

Just replace the errors in my second example by some big computations...

J.A.
Re: [Haskell-cafe] Some random newbie questions
Jorge Adriano Aires [EMAIL PROTECTED] writes:

> > No, it would work with strict foldl too. In fact in the absence of optimization it would work better (uses less time and space). The optimization required is inlining and strictness analysis.
>
> Is this also true if you're just going to use the first few elements after reversing it?

Yes. A strict foldl would evaluate cons cells of the result while they are constructed, not list elements. They are all defined (flip (:) x y is always defined), so a strict foldl is correct.

Making a cons cell should be no more expensive than making a thunk which will make a cons cell when evaluated. Well, unless the implementation doesn't inline flip, and thus making these thunks is actually faster than running them. In this case a lazy foldl is more efficient than a strict foldl as long as a sufficiently small part of the result is used; it's always less efficient if the whole result is examined.

--
Marcin 'Qrczak' Kowalczyk [EMAIL PROTECTED]
http://qrnik.knm.org.pl/~qrczak/
Re: [Haskell-cafe] Some random newbie questions
On Sunday 09 January 2005 21:30, Marcin 'Qrczak' Kowalczyk wrote:
> Yes. A strict foldl would evaluate cons cells of the result while they are constructed, not list elements. They are all defined (flip (:) x y is always defined), so a strict foldl is correct.

Yes, now I was referring to the efficiency issue only.

> Making a cons cell should be no more expensive than making a thunk which will make a cons cell when evaluated.

Ok, I wasn't sure about this...

> Well, unless the implementation doesn't inline flip, and thus making these thunks is actually faster than running them. In this case a lazy foldl is more efficient than a strict foldl as long as a sufficiently small part of the result is used; it's always less efficient if the whole result is examined.

Right.

J.A.
Re: [Haskell-cafe] Some random newbie questions
> (+) is usually strict in both arguments (although in principle it does not have to be, because of overloading, which implies that a compiler can only optimize particular specializations of sum, not generic sum).

Since you mention it, there was some talk about this in the #haskell channel, and I wondered: why aren't sum and product members of Num, with default definitions, just like mconcat is a member of Data.Monoid.Monoid? From the docs:

    mconcat :: [a] -> a
    Fold a list using the monoid. For most types, the default definition
    for mconcat will be used, but the function is included in the class
    definition so that an optimized version can be provided for specific
    types.

J.A.
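What the suggestion would look like, as a sketch (this is NOT the actual Prelude Num class; the class name NumSum and method name nsum are made up for illustration):

```haskell
{-# LANGUAGE BangPatterns #-}

-- A class carrying a summation method with a default definition,
-- analogous to mconcat's place in Data.Monoid.Monoid.
class Num a => NumSum a where
  nsum :: [a] -> a
  nsum = foldr (+) 0          -- default definition

-- An instance that is happy with the default:
instance NumSum Integer

-- An instance overriding it with a strict, specialized accumulator loop:
instance NumSum Int where
  nsum = go 0
    where
      go !acc []     = acc
      go !acc (x:xs) = go (acc + x) xs

main :: IO ()
main = do
  print (nsum [1 .. 10 :: Integer])  -- uses the default definition
  print (nsum [1 .. 10 :: Int])      -- uses the strict specialization
```

This is exactly the workaround for the overloading problem mentioned above: a generic sum can't be optimized for every type at once, but a class method lets each instance supply its own strict loop.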
RE: [Haskell-cafe] RE: Enum class
Sorry, that was a stupid error. I meant to use fromEnum:

    next :: (Enum a, Bounded a) => a -> a
    next v = if (fromEnum v) == (fromEnum maxBound) then minBound else succ v

I don't think this would quite work if fromEnum :: a -> Int for type a were not injective. However, that might be part of the specification... I don't remember. It should be fine, so long as the preimage of (fromEnum maxBound) is {maxBound}.

-Arjun

-- Arjun Guha [EMAIL PROTECTED]
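For reference, here is a compiling version of the corrected next, tried on a small example type (Direction is made up for illustration). Note one extra wrinkle: in the code above the type of maxBound inside fromEnum maxBound is ambiguous, so this sketch pins it to the argument's type with asTypeOf.

```haskell
data Direction = North | East | South | West
  deriving (Show, Eq, Enum, Bounded)

-- Wrap around to minBound after the last constructor.
next :: (Enum a, Bounded a) => a -> a
next v
  | fromEnum v == fromEnum (maxBound `asTypeOf` v) = minBound
  | otherwise                                      = succ v

main :: IO ()
main = do
  print (next North)  -- East
  print (next West)   -- wraps around to North
```

On the injectivity point: for types where derived Enum/Bounded instances are used, fromEnum is injective, so the maxBound check behaves as intended.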