[Haskell-cafe] RE: Enum class

2005-01-09 Thread Arjun Guha
Well, you already have succ :: (Enum a) => a -> a defined for all types in the 
Enum class.  You also need Bounded so that you can check maxBound.


This (untested) code should do it:

 next :: (Enum a, Bounded a) => a -> a
 next v = if (toEnum v) == (toEnum (maxBound))
            then minBound
            else succ v

--
Arjun Guha [EMAIL PROTECTED]

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Some random newbie questions

2005-01-09 Thread O . Chitil

 I'm constantly surprised hearing from so many people about their space
 problems. I cannot remember having space problems with my programs. I
 don't know what everybody else is doing wrong :-)

 At least two common cases.

 Extracting compact data structures from large files.  The contents of
 the large file are read as a linked list (ugh) of pointers (double ugh)
 to 32-bit Chars (triple ugh) -- twelve times the size of the file, if
 my calculations are correct.  The contents can't be GC'ed before the
 extracted data is fully evaluated.  (Now if the file was an mmap'ed
 array, it wouldn't be so bad, perhaps in the next generation IO that
 people are discussing this will be easier?)

 Naive use of foldl.  I tend to think the default foldl should be
 strict (ie. replaced by foldl') -- are there important cases where it
 needs to be lazy?

Indeed, extracting a compact data structure from a large data structure
can easily cost a lot of space because of lazy evaluation. foldl is probably
used mostly for that purpose. My not having space problems is probably
related to the kind of programs I write. Most of my programs are program
transformations that take an abstract syntax tree as input and produce a
different abstract syntax tree (again a large structure). Trying to be
lazy then means trying to produce as much output as possible while
processing as little input as possible. More formally, that means: if there
is some partial input for a function such that for all completions of this
partial input to fully defined inputs all outputs of the function have a
common prefix, then the function should already yield this prefix as
output for the partial input.
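For instance, map already behaves this way; a small sketch (the names here are
illustrative, not from the original posting):

```haskell
-- map has the property described above: given the partial input
-- 1 : 2 : undefined, it yields the partial output 2 : 3 : undefined,
-- which is the common prefix of the outputs for all completions of
-- the input.
addOneAll :: [Int] -> [Int]
addOneAll = map (+ 1)

-- Demanding only the defined prefix never touches the undefined tail:
partialDemo :: [Int]
partialDemo = take 2 (addOneAll (1 : 2 : error "rest of input"))
```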

In the example that I mentioned in my previous posting I did actually
originally compute size information for a large data structure, so did
extract something compact from something large. However, I then realised
that I only did some very basic arithmetic with the size information
before generating another large data structure of this computed size. So
then I decided not to compute integers at all but to do the arithmetic
directly on the algebraic data type. Gone was the space problem, without
using seq.

You might also want to look at Colin Runciman's paper "What about the
Natural Numbers?" in the Journal of Functional Programming, in which he
argues for a type of lazy natural numbers with the same semantics as
data Nat = Zero | Succ Nat. It fits much better for computing the size of a
lazy data structure.
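A minimal sketch of such lazy naturals (lazyLength and atLeast are
illustrative names, not from Runciman's paper):

```haskell
data Nat = Zero | Succ Nat

-- The length of a list as a lazy natural: the spine of the Nat is
-- built only as far as it is demanded.
lazyLength :: [a] -> Nat
lazyLength = foldr (\_ n -> Succ n) Zero

-- Ask whether a list has at least k elements without computing its
-- full length; this terminates even for infinite lists.
atLeast :: Int -> Nat -> Bool
atLeast k _ | k <= 0 = True
atLeast _ Zero       = False
atLeast k (Succ n)   = atLeast (k - 1) n
```

With an Int length, atLeast 3 (length [1 ..]) would diverge; with the lazy
Nat it inspects only three Succ constructors.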

I don't claim that all space problems can easily be dealt with.

Olaf



Re: [Haskell-cafe] Hugs vs GHC (again) was: Re: Some random newbie questions

2005-01-09 Thread Marcin 'Qrczak' Kowalczyk
Simon Marlow [EMAIL PROTECTED] writes:

  - Do the character class functions (isUpper, isAlpha etc.) work
correctly on the full range of Unicode characters?  This is true in
Hugs.  It's true with GHC on some systems (basically we were lazy
and used the underlying C library's support here, which is patchy).

It's not obvious what the predicates should really mean, e.g. should
isDigit and isHexDigit include non-ASCII digits or should isSpace
include non-breaking space characters. The Haskell 98 report gives some
guidelines which don't necessarily coincide with C practice nor
with the expectations of Unicode people.

Once this is agreed, it would be easy to make scripts which generate
C code from UnicodeData.txt tables from Unicode. I think table-driven
predicates and toUpper/toLower should better be implemented in C;
Haskell is not good at static constant tables with numbers.

Another issue is that the set of predicates provided by Haskell 98
library report is not enough to implement e.g. a Haskell 98 lexer,
which needs any Unicode symbol or punctuation. Case mapping would
be better done string -> string rather than character -> character;
this breaks a long-established Haskell interface. Case mapping is
locale-sensitive (in very minor ways). Haskell doesn't provide
algorithms like normalization or collation. In general the Haskell 98
interface is not enough for complex Unicode processing.

  - Can you do String I/O in some encoding of Unicode?  No Haskell
compiler has support for this yet, and there are design decisions
to be made.

The problem with designing an API of recoders is that depending on
whether the recoder is implemented in Haskell or interfaced from C, it
needs different data representation. Pure Haskell recoders prefer lazy
lists of characters or bytes (except that a desire to detect source
errors or characters unavailable in the target encoding breaks this),
while high performance C prefers pointers to buffers with chunks of
text.

Transparent recoding makes some behavior hard to express. Imagine
parsing HTTP headers followed by \r\n\r\n and a binary file. If you
read headers line by line and decoding is performed in blocks, then
once you determine where the headers end it's too late to find the
start of the binary file: a part of it has already been decoded into
text. You have to determine the end of the headers while working with
bytes, not characters, and only convert the first part. Not performing
the recoding in blocks is tricky if the decoder is implemented in C.
Giving 1-byte buffers for lots of iconv() calls is not nice.
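The point about splitting on bytes before decoding can be sketched with the
bytestring package (which postdates this posting; splitHeaders is a
hypothetical helper):

```haskell
import qualified Data.ByteString.Char8 as B

-- Find the end of the HTTP headers while still working on raw bytes,
-- so the binary body is never pushed through a text decoder.  Only
-- the header part would then be decoded into text.
splitHeaders :: B.ByteString -> (B.ByteString, B.ByteString)
splitHeaders raw =
  let (hdrs, rest) = B.breakSubstring (B.pack "\r\n\r\n") raw
  in (hdrs, B.drop 4 rest)
```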

Or imagine parsing an HTML file with the encoding specified inside
it in a meta element. Switching the encoding in the middle is
incompatible with buffering. Maybe the best option is to parse the
beginning in ISO-8859-1 just to determine the encoding, and then
reparse everything again once the encoding is known.

If characters are recoded automatically on I/O, one is tempted to
extend the framework for other conversions like compression, line
ending convention, HTML character escaping etc.

  - What about Unicode FilePaths?  This was discussed a few months ago
on the haskell(-cafe) list, no support yet in any compiler.

Nobody knows what the semantics should be.

I've once written elsewhere a short report about handling filename
encodings in various languages and environments which use Unicode as
their string representation. Here it is (I was later corrected that
Unicode non-characters are valid in UTF-x):

I describe here languages which exclusively use Unicode strings.
Some languages have both byte strings and Unicode strings (e.g. Python)
and then byte strings are generally used for strings exchanged with
the OS, the programmer is responsible for the conversion if he wishes
to use Unicode.

I consider situations when the encoding is implicit. For I/O of file
contents it's always possible to set the encoding explicitly somehow.

Corrections are welcome. This is mostly based on experimentation.


Java (Sun)
--

Strings are UTF-16.

Filenames are assumed to be in the locale encoding.

a) Interpreting. Bytes which cannot be converted are replaced by U+FFFD.

b) Creating. Characters which cannot be converted are replaced by ?.

Command line arguments and standard I/O are treated in the same way.


Java (GNU)
--

Strings are UTF-16.

Filenames are assumed to be in Java-modified UTF-8.

a) Interpreting. If a filename cannot be converted, a directory listing
   contains a null instead of a string object.

b) Creating. All Java characters are representable in Java-modified UTF-8.
   Obviously not all potential filenames can be represented.

Command line arguments are interpreted according to the locale.
Bytes which cannot be converted are skipped.

Standard I/O works in ISO-8859-1 by default. Obviously all input is
accepted. On output characters above U+00FF are replaced by ?.


C# (mono)
-

Strings are UTF-16.

Filenames use 

Re: [Haskell-cafe] Some random newbie questions

2005-01-09 Thread Jorge Adriano Aires
On Friday 07 January 2005 12:03, Ketil Malde wrote:
 Naive use of foldl.  I tend to think the default foldl should be
 strict (ie. replaced by foldl') -- are there important cases where it
 needs to be lazy?

Hi, 
One simple example would be,
 reverse = foldl (flip (:)) []

J.A.


Re: [Haskell-cafe] Hugs vs GHC (again) was: Re: Some random newbie questions

2005-01-09 Thread Jérémy Bobbio
On 9 janv. 05, at 20:03, Marcin 'Qrczak' Kowalczyk wrote:
Once this is agreed, it would be easy to make scripts which generate
C code from UnicodeData.txt tables from Unicode. I think table-driven
predicates and toUpper/toLower should better be implemented in C;
Haskell is not good at static constant tables with numbers.
Sebastien Carlier already wrote this for hOp, see :
http://etudiants.insia.org/~jbobbio/hOp/Gen_wctype.hs
Cheers,
Jérémy.


Re: [Haskell-cafe] Some random newbie questions

2005-01-09 Thread Marcin 'Qrczak' Kowalczyk
Jorge Adriano Aires [EMAIL PROTECTED] writes:

 Naive use of foldl. I tend to think the default foldl should be
 strict (ie. replaced by foldl') -- are there important cases where it
 needs to be lazy?

 Hi, 
 One simple example would be,
 reverse = foldl (flip (:)) []

No, it would work with strict foldl too. In fact in the absence
of optimization it would work better (uses less time and space).
The optimization required is inlining and strictness analysis.

A function which requires lazy foldl for correctness would have
to be sometimes lazy in its first argument, and at the same time
some partial results would have to be undefined. By function
I mean the first argument of foldl, treated as a binary function.
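A contrived function meeting both conditions can be sketched as a running sum
that a zero in the input resets (step and lazyOk are illustrative names):

```haskell
-- Sometimes lazy in its accumulator: a 0 in the input resets the
-- running sum without ever looking at acc.
step :: Int -> Int -> Int
step acc x = if x == 0 then 0 else acc + x

-- With lazy foldl the undefined partial result (error "no start" + 5)
-- is discarded by the 0 before it is ever demanded, so this evaluates
-- to 0 + 3 + 4 = 7.  A strict foldl' would force that partial result
-- and crash.
lazyOk :: Int
lazyOk = foldl step (error "no start") [5, 0, 3, 4]
```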

Here this doesn't apply because flip (:) x y is always defined. And
another common case for foldl, sum, doesn't apply because (+) is
usually strict on both arguments (although in principle it does not
have to be true because of overloading, which implies that a compiler
can only optimize particular specializations of sum, not generic sum).
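At a monomorphic type, where (+) is known to be strict, the usual fix is
foldl' (a sketch):

```haskell
import Data.List (foldl')

-- Summing Ints with foldl' keeps the accumulator evaluated at each
-- step, so this can run in constant space instead of building a
-- chain of (+) thunks the way lazy foldl does.
sumInts :: [Int] -> Int
sumInts = foldl' (+) 0
```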

I don't know of any real-life example.

Perhaps there are cases where evaluating partial results is correct
but inefficient. I don't know such case either.

-- 
   __("< Marcin Kowalczyk
   \__/   [EMAIL PROTECTED]
    ^^ http://qrnik.knm.org.pl/~qrczak/


Re: [Haskell-cafe] Some random newbie questions

2005-01-09 Thread Jorge Adriano Aires

 On Friday 07 January 2005 12:03, Ketil Malde wrote:
  Naive use of foldl.  I tend to think the default foldl should be
  strict (ie. replaced by foldl') -- are there important cases where it
  needs to be lazy?

 Hi,
 One simple example would be,
  reverse = foldl (flip (:)) []

A better example would be building some other lazy structure that is strict 
in its elements...
J.A.

---
module Test where
import Data.List 

data L = E | !Int :+: L deriving Show

-- my head
h (x :+: xs) = x
h E          = error "ops"

-- 
rev1 = foldl  (flip (:+:)) E
rev2 = foldl' (flip (:+:)) E

l = [error "...", error "...", 1::Int]
--

*Test> h (rev1 l)
1
(0.00 secs, 264560 bytes)
*Test> h (rev2 l)
*** Exception:
(0.01 secs, 264524 bytes)



Re: [Haskell-cafe] Some random newbie questions

2005-01-09 Thread Jorge Adriano Aires

 No, it would work with strict foldl too. In fact in the absence
 of optimization it would work better (uses less time and space).
 The optimization required is inlining and strictness analysis.

Is this also true if you're just going to use the first few elements after 
reversing it?

 A function which requires lazy foldl for correctness would have
 to be sometimes lazy in its first argument, and at the same time
 some partial results would have to be undefined. By function
 I mean the first argument of foldl, treated as a binary function.

 Here this doesn't apply because flip (:) x y is always defined. And
 another common case for foldl, sum, doesn't apply because (+) is
 usually strict on both arguments (although in principle it does not
 have to be true because of overloading, which implies that a compiler
 can only optimize particular specializations of sum, not generic sum).

 I don't know of any real-life example.

Yes, you are right, my bad. I was thinking as if lists were not lazy in their 
elements, hence my second example... But yes, it is not a common 
example any more. 

 Perhaps there are cases where evaluating partial results is correct
 but inefficient. I don't know such case either.

Just replace the errors on my second example by some big computations...

J.A.


Re: [Haskell-cafe] Some random newbie questions

2005-01-09 Thread Marcin 'Qrczak' Kowalczyk
Jorge Adriano Aires [EMAIL PROTECTED] writes:

 No, it would work with strict foldl too. In fact in the absence
 of optimization it would work better (uses less time and space).
 The optimization required is inlining and strictness analysis.

 Is this also true if your just going to use the first few elements after 
 reversing it?

Yes. A strict fold would evaluate cons cells of the result while they
are constructed, not list elements. They are all defined (flip (:) x y
is always defined), so a strict foldl is correct.

Making a cons cell should be no more expensive than making a thunk
which will make a cons cell when evaluated. Well, unless the
implementation doesn't inline flip and thus making these thunks
is actually faster than running them. In this case a lazy foldl is
more efficient than a strict foldl, as long as a sufficiently small
part of the result is used; it's always less efficient if the whole
result is examined.



Re: [Haskell-cafe] Some random newbie questions

2005-01-09 Thread Jorge Adriano Aires
On Sunday 09 January 2005 21:30, Marcin 'Qrczak' Kowalczyk wrote:
 Jorge Adriano Aires [EMAIL PROTECTED] writes:
  No, it would work with strict foldl too. In fact in the absence
  of optimization it would work better (uses less time and space).
  The optimization required is inlining and strictness analysis.
 
  Is this also true if you're just going to use the first few elements after
  reversing it?

 Yes. A strict fold would evaluate cons cells of the result while they
 are constructed, not list elements. They are all defined (flip (:) x y
 is always defined), so a strict foldl is correct.

Yes, now I was referring to the efficiency issue only. 

 Making a cons cell should be no more expensive than making a thunk
 which will make a cons cell when evaluated. 

Ok, I wasn't sure about this...

 Well, unless the 
 implementation doesn't inline flip and thus making these thunks
 is actually faster than running them. In this case a lazy foldl is
 more efficient than a strict foldl, as long as a sufficiently small
 part of the result is used; it's always less efficient if the whole
 result is examined.

Right.

J.A.


Re: [Haskell-cafe] Some random newbie questions

2005-01-09 Thread Jorge Adriano Aires
 (+) is
 usually strict on both arguments (although in principle it does not
 have to be true because of overloading, which implies that a compiler
 can only optimize particular specializations of sum, not generic sum).

Since you mention it, there was some talk about this in the #haskell channel, 
and I wondered why sum and product aren't members of Num with default 
definitions, just as mconcat is a member of Data.Monoid.Monoid. 
From the docs: 

mconcat :: [a] -> a
Fold a list using the monoid. For most types, the default definition for 
mconcat will be used, but the function is included in the class definition so 
that an optimized version can be provided for specific types.
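The suggestion can be sketched as a class carrying its own fold with an
overridable default, in the style of mconcat (MyNum, add, zero and gsum are
hypothetical names, not a real library API):

```haskell
-- A Num-like class whose fold has a default definition that
-- instances may replace with an optimized version.
class MyNum a where
  add  :: a -> a -> a
  zero :: a
  gsum :: [a] -> a
  gsum = foldr add zero   -- default, like mconcat's default

instance MyNum Int where
  add  = (+)
  zero = 0
  gsum = sum              -- an instance overriding the default
```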

J.A.


RE: [Haskell-cafe] RE: Enum class

2005-01-09 Thread Arjun Guha
Sorry, that was a stupid error.  I meant to use fromEnum:

 next :: (Enum a, Bounded a) => a -> a
 next v = if (fromEnum v) == (fromEnum maxBound)
            then minBound
            else succ v

I don't think this will quite work if fromEnum :: a -> Int, for a type a, were 
not injective.  However, that might be part of the specification... I don't 
remember.

It should be fine, so long as the preimage of (fromEnum maxBound) is 
{maxBound}.
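A variant that sidesteps fromEnum, and hence the injectivity question, at the
cost of an extra Eq constraint (an untested sketch in the spirit of the
thread):

```haskell
-- Compare the value itself against maxBound instead of going through
-- fromEnum, so the preimage question never comes into play.
next :: (Enum a, Bounded a, Eq a) => a -> a
next v = if v == maxBound then minBound else succ v
```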

-Arjun

--
Arjun Guha [EMAIL PROTECTED]
