Re: binary files in haskell

2001-02-08 Thread Fergus Henderson

On 08-Feb-2001, John Meacham [EMAIL PROTECTED] wrote:
 A nice advantage of using my mid-level routines is that there are very
 few requirements placed on 'Byte' as a type; this means that as long
 as, to the outside world, you only read in 8-bit values and spit 8-bit
 values out, you can represent it internally however you want.
 
 For example, you might have a machine where a 16-bit word is the smallest
 addressable entity. If you relied on hPut Word8 then your program would
 not work, since Word8 cannot exist on that platform. However, if you made
 Byte be 16 bits and only used the bottom half of each word, then your
 program would run unchanged even on architectures such as this.

I agree that `Byte' is a useful abstraction.
However, I think what you say about Word8 here is not correct.
Word8 can be implemented on a 16-bit machine just by computing all
arithmetic operations modulo 256.  There is no requirement that Word8
be physically 8 bits, just that it represents an 8-bit quantity.

Indeed, I think ghc uses this technique, representing Word8 as a full
machine word (e.g. 32 bits for x86, of which the topmost 24 are always
zero).
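
As a small illustration of that point (using the modern Data.Word module
name, which postdates this message):

  import Data.Word (Word8)

  -- Word8 arithmetic wraps modulo 256, however the value happens to be
  -- represented physically by the implementation.
  wrapDemo :: Word8
  wrapDemo = 200 + 100   -- 300 mod 256 == 44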

-- 
Fergus Henderson [EMAIL PROTECTED]  |  "I have always known that the pursuit
|  of excellence is a lethal habit"
WWW: http://www.cs.mu.oz.au/~fjh  | -- the last words of T. S. Garp.




Re: binary files in haskell

2001-02-06 Thread Koen Claessen

Marcin 'Qrczak' Kowalczyk wrote:

 | IMHO it looks nicer to have Byte in function names, so
 | there can be such a type synonym too.

Style Warning!

Why do many people, when designing libraries, not make full
use of the Haskell module system? Instead of

  writeByte
  readByte

(or so), one could also say:

  Byte.write
  Byte.read

So that the context (Byte.) can be left out when
unambiguous, and one can just say:

  write
  read
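
For instance, the qualified style reads like this in use (a sketch; the
Byte module and its read/write operations are hypothetical here, named
only to illustrate the convention):

  import IO (Handle)
  import qualified Byte   -- hypothetical module providing read/write

  copyByte :: Handle -> Handle -> IO ()
  copyByte hIn hOut = Byte.read hIn >>= Byte.write hOut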

/Koen.

--
Koen Claessen http://www.cs.chalmers.se/~koen
phone:+46-31-772 5424  mailto:[EMAIL PROTECTED]
-
Chalmers University of Technology, Gothenburg, Sweden





Re: binary files in haskell

2001-02-06 Thread Johannes Waldmann

 Style Warning! ... writeByte -> Byte.write

Yes yes yes please!

Often, if someone writes identifiersWithSuffix,
the suffix actually carries type or module information,
and the programmer should use the type system or module system
of the language to express that.

Should this also apply to names in the standard library,
like Monad (filterM, zipWithM, ...)?
I mean, theoretically yes, but is it feasible to change it?

While we're at it, stylistically: the sight of "g" changing to "G"
in getLine -> hGetLine (and similar) always irritates me.

Best regards,
-- 
-- Johannes Waldmann  http://www.informatik.uni-leipzig.de/~joe/ --
-- [EMAIL PROTECTED] -- phone/fax (+49) 341 9732 204/252 --




Re: binary files in haskell

2001-02-06 Thread Stefan Karrmann

Marcin 'Qrczak' Kowalczyk wrote the following on Tue, Feb 06, 2001 at 12:54:29PM +0100:
 On Tue, 6 Feb 2001, Olaf Chitil wrote:
 
  I just see one problem with John's proposal: the type Byte.
 
 type Byte = Word8

I would prefer

type Octet = Word8

to emphasise that the functions really use 8 bits.

-- 
Stefan Karrmann




Re: binary files in haskell

2001-02-06 Thread Erik Meijer

This is exactly what I proposed when fmap and the other weird names were
introduced. Hopefully there are more allies now.

Erik

- Original Message -
From: "Koen Claessen" [EMAIL PROTECTED]
To: "The Haskell Mailing List" [EMAIL PROTECTED]
Sent: Tuesday, February 06, 2001 6:25 AM
Subject: Re: binary files in haskell


 Johannes Waldmann wrote:

  | Should this also apply to names in the standard
  | library? like Monad (filterM, zipWithM ,..)  I mean,
  | theoretically yes, but is it feasible to change it?

 Obviously, these functions should have been called:

   Monad.filter, Monad.zipWith

 The lazy programmer can then say:

   import Monad as M(*)

   M.filter, M.zipWith

 Just (asymptotically) 1 character more! :-)

  | getLine -> hGetLine always irritates me.

 How about:

   import Handle as H

   H.getLine

 (This is a good example where type classes would not help
 making this any better, since the types of getLine and
 H.getLine are very different.)

 While we're at it, how about instead of the "fmap" function:

   Functor.map  (F.map)
   List.map (L.map)
   Maybe.map(M.map)

 The programmer can then pick whichever function to use.
 (The Prelude really has too many functions in it, and very
 often the choice of whether a function lives in the Prelude or in
 Char/List/Maybe/Monad/IO/etc. is not well motivated.)

 What do people think about this? If people prefer these
 stylistic changes, I think we should not hesitate making
 them for Haskell/2 by completely redesigning the module
 structure and using more consistent naming conventions.

 /Koen.

 (*) What actually happened to the excellent proposal
 somebody made a while ago for Haskell98:

   import M = Monad

 ? I like it a lot!

 --
 Koen Claessen http://www.cs.chalmers.se/~koen
 phone:+46-31-772 5424  mailto:[EMAIL PROTECTED]
 -
 Chalmers University of Technology, Gothenburg, Sweden







RE: binary files in haskell

2001-02-06 Thread Simon Marlow

  How about this slightly more general interface, which works with the
  new FFI libraries, and is trivial to implement on top of the
  primitives in GHC's IOExts:
  
  hPut :: Storable a => Handle -> a -> IO ()
  hGet :: Storable a => Handle -> IO a
 
 What about endianness? In which format are Floats or even just Bools
 stored? For a file which will probably be read from different machines
 this is not clear at all.

The behaviour is defined by the Storable instances for each type.  The
endianness for writing say an Int32 would be the same as the host
architecture, for instance.  If you want to work with just bytes, you
can always just use hPut and hGet at type Word8.

Overloading with Storable gives you more flexibility, since if you have
a way to serialise an object in memory for passing to a foreign
function, you also have a way to store it in binary format in a file
(modulo problems with pointers, of course).
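
As a minimal sketch of how such an interface can be built from the FFI
marshalling primitives (using today's System.IO hPutBuf/hGetBuf in place
of the IOExts primitives mentioned above; not the actual GHC
implementation):

  import Foreign.Marshal.Alloc (alloca)
  import Foreign.Ptr (Ptr)
  import Foreign.Storable (Storable (..))
  import System.IO (Handle, hGetBuf, hPutBuf)

  hPut :: Storable a => Handle -> a -> IO ()
  hPut h x = alloca $ \p -> do
      poke p x                    -- marshal the value into a temporary buffer
      hPutBuf h p (sizeOf x)      -- and write the raw bytes out

  hGet :: Storable a => Handle -> IO a
  hGet h = alloca $ \p -> do
      n <- hGetBuf h p (sizeOf (ptrElem p))
      if n < sizeOf (ptrElem p)
        then ioError (userError "hGet: unexpected end of file")
        else peek p
    where
      -- used only to name the element type of the buffer
      ptrElem :: Ptr b -> b
      ptrElem _ = undefined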

In the long term, we'll want to be able to serialise more than just
Storable objects (c.f. the other overloaded binary I/O libraries out
there), and possibly make the output endian-independent - but after all
there's no requirement that Haskell's Int has the same size on all
implementations, so there's no guarantee that binary files written on
one machine will be readable on another, unless they only use explicitly
sized types or Integer.

Perhaps these should be called hPutStorable and hGetStorable so as not
to prematurely steal the best names.

 I think John is right that there needs to be a primitive interface for
 just writing bytes. You can then build anything more complicated on
 top (probably different high-level ones for different purposes).
 
 I just see one problem with John's proposal: the type Byte. It is
 completely useless if you don't have operations that go with it:
 bit operations and conversions to and from Int. The FFI already
 defines such a type: Word8. So I suggest that the binary IO library
 explicitly reads and writes Word8s.

Yup, that's what I had in mind.

Cheers,
Simon




Re: binary files in haskell

2001-02-05 Thread John Meacham

There is actually NO way in Haskell 98 to portably read and write
binary files, and many libraries which provide this feature are
inherently non-portable (they assume a Haskell Char = C char = byte),
which is not necessarily the case. I wrote up a proposal for a binary
file IO mechanism to be added as a 'blessed addendum' to the standard at
best, and as a commonly implemented extension (in hslibs) at least.

My original proposal messages to the list can be found at
http://www.ugcs.caltech.edu/~john/computer/haskell/
although without the messages from others on the list they may
seem out of context.

-- 
--
John Meacham   http://www.ugcs.caltech.edu/~john/
California Institute of Technology, Alum.  [EMAIL PROTECTED]
--




Re: Binary files in Haskell

1998-03-16 Thread Malcolm Wallace

 Malcolm In fact, Binary allows you to process the data directly
 Malcolm from the disk rather than hauling it all into memory.
 Malcolm This eliminates start-up time altogether.  What's more,
 Malcolm provided the file is used read-only, you can use pure
 Malcolm lazy functions to read the data, rather than having to
 Malcolm sequence everything through the IO monad.
 
 Is the fact that the processing is "directly from disk" transparent to
 the programmer

Absolutely.

 or does the processing function have to be written
 differently to avoid startup time ?

No - the program can be identical but for one line, which opens the file.

 Does 'get' return its result lazily ?

There are two functions for reading, with different semantics - 'getAt'
returns its result eagerly in the IO monad, while 'getFAt' is a pure
function which returns its result lazily.

  class Binary a where
    ...
    getAt  :: BinHandle -> BinPtr a -> IO a
    getFAt :: BinHandle -> BinPtr a -> a

The latter obviously requires some guarantee that the file cannot be
modified between the building of the thunk and the demand which
evaluates it.  So the 'getFAt' function only gives a result when the
file is in RO (read-only) mode.  (After writing to a file, it can be
"frozen" to RO mode using the 'freezeBin' operation.)

On the other hand, 'getAt' has to return its result eagerly, because
there is always the possibility that a subsequent 'putAt' operation
will overwrite that part of the file before the value read earlier
is needed.
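
For example, assuming the handle refers to a file already frozen to RO
mode, and that Binary instances exist for Int and lists (a sketch;
sumStored is an illustrative name, not part of the library):

  -- The stored list is fetched lazily: data is only read from disk
  -- as the sum demands further elements.
  sumStored :: BinHandle -> BinPtr [Int] -> Int
  sumStored bh p = sum (getFAt bh p)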

 I've been looking at the html doc and the nhc Binary src, but I can't
 quite see a transparent way of processing data on disk. There is the
 putAt operation, but that needs a file pointer; and there is the
 BinArray, which can only be written sequentially.

Sorry I wasn't clear.  The original question was from someone who
wished only to *read* a huge quantity of data from a file in order to
do some processing on it - but not to write any data back.  I was
claiming that with the Binary library s/he could process the data
transparently direct from disk without having to load everything into
memory at once.  As I understand it, the quantity of data was so large
that having enough memory for everything was a real bottleneck.
Provided the processing function is in some sense incremental - does
not require *all* the data at once - then Binary will be a win.

As you say however, more general processing which involves update to
a structure is a rather different beast.  The Binary library does not
support transparent update in the way you describe, where only the
modified part of the value is written back.  As you note, the whole
(large) structure has to be written back.

Regards,
Malcolm

[EMAIL PROTECTED]
Dr Malcolm Wallace (functional programming research)   +44 1904 434756
Department of Computer Science, University of York, YORK YO1 5DD, U.K.
http://www.cs.york.ac.uk/~malcolm/





Re: Binary files in Haskell

1998-03-14 Thread Timothy Robin BARBOUR

 "Malcolm" == Malcolm Wallace [EMAIL PROTECTED] writes:

Malcolm In fact, Binary allows you to process the data directly
Malcolm from the disk rather than hauling it all into memory.
Malcolm This eliminates start-up time altogether.  What's more,
Malcolm provided the file is used read-only, you can use pure
Malcolm lazy functions to read the data, rather than having to
Malcolm sequence everything through the IO monad.

Is the fact that the processing is "directly from disk" transparent to
the programmer or does the processing function have to be written
differently to avoid startup time ? Does 'get' return its result
lazily ?

I've been looking at the html doc and the nhc Binary src, but I can't
quite see a transparent way of processing data on disk. There is the
putAt operation, but that needs a file pointer; and there is the
BinArray, which can only be written sequentially.

If one were to read a (large) data structure, modify it slightly
(in-place), then write back the change (without using a file pointer),
I imagine the whole data structure would be written back to
disk. Whereas using a memory mapped file, only modified pages would be
written back.

Would it be helpful and possible to bolt a mmap'ed file underneath
some of these operations so that large data structures could be
modified (either functionally or in-place) without having to write
back the whole structure ? I guess the problem comes back to lazy
closures and associated pointers to volatile code.

Tim
-- 


--
T.R.BARBOUR Email : [EMAIL PROTECTED]
--
Department of Computer Science
The University of Melbourne
Parkville, Victoria 3052
Australia
--





Re: Binary files in Haskell

1998-03-12 Thread Malcolm Wallace

 If the representation is simple enough, you might consider accessing
 the database via C ...
 
 Of course, Binary or Native would be easier, but the start-up time
 for a large database might be a problem.

In fact, Binary allows you to process the data directly from the disk
rather than hauling it all into memory.  This eliminates start-up time
altogether.  What's more, provided the file is used read-only, you can
use pure lazy functions to read the data, rather than having to sequence
everything through the IO monad.

(And of course for efficiency, the implementation of Binary is written
largely in C, via the GreenCard preprocessor for Haskell.)

Malcolm

[EMAIL PROTECTED]
Dr Malcolm Wallace (functional programming research)   +44 1904 434756
Department of Computer Science, University of York, YORK YO1 5DD, U.K.
http://www.cs.york.ac.uk/~malcolm/





Re: Binary files in Haskell

1998-03-11 Thread Malcolm Wallace

Alex Ferguson and (earlier) Yoshihiko Ichikawa ask for clarification of
the intent of the discussion on binary I/O.

It might help if I outline the capabilities of the Binary library
(currently available only for nhc13, but as Simon PJ notes, if you want
it in ghc/hugs, ask him).

   1. Reading and writing binary data in a prescribed format.
  Typical examples are image data in GIF, TIFF, Sun Raster etc,
  and user-account statistics files used in Unix.

Easy to program using the Binary library.

   2. Reading and writing part of Haskell data in a file.
  This resembles the array dumping mechanism found in some
  implementations of FORTRAN. 

Comes for free (by adding a "deriving" clause on your datatype).

   3. Warm starting Haskell sessions or programs.
  As found in ordinary implementations of Smalltalk, Lisp, 
  and Yale implementation of Haskell.

Can be programmed, if it amounts to selecting appropriate data
structures to "save" at the end of one run and "reload" at the
beginning of the next.  (It is a far more difficult problem if you
want to save un-evaluated closures.)

   4. Supporting orthogonal persistence in Haskell.
  This may lead to a persistent Haskell implementation.

No, the Binary library doesn't lend itself to this.

Regards,
Malcolm

[EMAIL PROTECTED]
Dr Malcolm Wallace (functional programming research)   +44 1904 434756
Department of Computer Science, University of York, YORK YO1 5DD, U.K.
http://www.cs.york.ac.uk/~malcolm/





Re: Binary files in Haskell

1998-03-11 Thread Simon L Peyton Jones


 Real world example: development tools process a large geometric data set to
 build a run-time optimized BSP tree with precalculated lighting and
 collision information.  The user application will not modify this data, but
 it will have to load it dynamically without slowing down a 30Hz
 graphics/interaction loop. (Apologies to Quake and Mario 64 :)
 
 Solution in C: load the data as contiguous raw bytes and cast pointers into
 the block to the appropriate structure types.
 
 Solution in Haskell?

Malcolm Wallace and Colin Runciman at York have been working
on this kind of thing.  They had a paper at the Haskell workshop
last year (which is on Malcolm's home page 
http://www.cs.york.ac.uk/~malcolm/).  I believe they are near to
releasing a revised version.  

It only works for nhc at the moment, but if enough people like you yell loudly
enough (construe this positively!) then we'll put it into GHC/Hugs too.
Our priorities are largely driven by what people ask for.

Simon






Re: Binary files in Haskell

1998-03-11 Thread Steve Roggenkamp

Alex Ferguson wrote:
 
 Tony Davie:
  And anyway didn't this start out about how to do I/O on binary data.
 
 To be honest, I'm not sure this discussion does necessarily address
 the original query:
 
 Steve Roggenkamp:
  I would like to use Haskell for several larger scale projects, but I
  can't figure out how to read and write binary data.
 

I'm not sure that it has.  I haven't saved every message, but it appears
I could use a Native class from ghc.  I've not tried this yet.  Chapter
7 of the Haskell 1.4 Report appears to deal with character I/O only and
I can't find any binary I/O operations in the library.

 Steve, can you clarify what sort of binary file manipulation it is you
 are actually looking for, and whether the thread so far has answered
 your question (either in the positive, or the negative)?
 

I am currently working on a database which consists mainly of integers.  I
have to search large arrays of these integers which may contain over a
hundred million entries.  With this amount of data I'm looking at ways
to compress it to a minimum amount of physical space.  I also do not
want to have to parse a string to read in an integer.  Both
computational performance and space would suffer too much.  Thus the
need for storing binary data.

The major performance determinant appears to be the speed of the disk
drive.  Most disk drives can only deliver about 100-125 blocks/sec when
randomly accessing offsets in a file.  So I need to be careful about
which disk blocks I access.  Even with 250 million entries, I can find a
single entry in less than 30 disk I/Os by using binary search (the
ceiling of log2 of 250 million is 28), so I
could conduct about three of these searches per second, assuming no help
from the disk cache.

I have considered using `mmap', but I think the database will eventually
outgrow the address space of a single process.

 Though in any case, I do agree that the whole persistency area is an
 alarmingly open one wrt Haskell, and the discussion has merit for its
 own sakes.  Yoshihiko Ichikawa's request for clarification was also
 rather to the point; we're in danger of covering the whole of a very
 broad and somewhat diffuse subject area -- possibly by a search algorithm
 reasonably characterisable as a species of Drunkard's Walk ;-)
 
 Slainte,
 Alex.

Steve

-- 
-
Steve Roggenkamp
InterNet:[EMAIL PROTECTED]
Address: 9159 Eversole Run Road
 Powell, OH  43065  USA
Phone:   614.873.6573
WWW: http://www.infinet.com/~sroggen/home.html





Re: Binary files in Haskell

1998-02-26 Thread Martin Norbäck

On Wed, 25 Feb 1998, Jerzy Karczmarczuk wrote:

: PS. Could somebody inform me what is the current status of
: multi-parametric classes? Concretely (for example) I would
: like to construct a generic Universal Parser
: 
:  type Uparser a c = [c] -> [(a,[c])]
: 
: which consumes any stream and produces any result. The classical
: combinators do not care about [c] being character strings, but
: with two parameters it is not possible to apply the magic word
: Monad directly. Of course, my problem is not veeery dramatic, but
: those multi-parametric classes have bothered me for some time. 

Well, it is actually possible to solve this in Haskell 1.4 without
multi-parameter type classes.

Instead of making a specific parser data type, I've implemented a general
state transformer type, which can be directly used as a parser.

It has the type

 newtype StateM s m a = SM (s -> m (a,s))
 unStateM (SM f) = f

and takes three arguments: a state type, a container monad and the result
type.

A typical Monad instance looks like this:

 instance Monad m => Monad (StateM s m) where
   (SM stm) >>= f = SM $ \s -> stm s >>= \(v,s') -> unStateM (f v) s'
   return v = SM $ \s -> return (v, s)

so your Uparser would be

 type Uparser a c = StateM [c] [] a
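
As a small illustration (a sketch; item and runParser are illustrative
names, not from the message), a primitive token parser and a run function
in this representation would be:

 -- consume one token from the input stream, failing on empty input
 item :: Uparser c c
 item = SM $ \s -> case s of
                     []     -> []
                     (c:cs) -> [(c, cs)]

 -- run a parser on an input stream
 runParser :: Uparser a c -> [c] -> [(a, [c])]
 runParser = unStateM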

I have actually implemented the Hugs Parselib using these types.
If you are interested, you can mail me for the source.

n.

---[ www.dtek.chalmers.se/~d95mback ]--[ PGP: 0x453504F1 ]--[ UIN: 4439498 ]---





Re: Binary files in Haskell

1998-02-25 Thread Malcolm Wallace

Tim writes:
 I noticed the tracer stuff on the nhc web-site. It looks very much
 like a declarative-debugger, although that term never appeared in the
 documentation I read. Will it in fact be a declarative-debugger ?  

Debugging is certainly one of the applications of a tracer.  I'm not
sure what you mean by "declarative" in this context.  The browser
permits you to explore the program trace, but not to modify it - in
that sense you see a declaration of what occurred.

 If so, will it be portable to ghc?

Not without a large amount of work.  There are three parts to the tracer:

  1. The source-to-source transformation (from an ordinary program to one
 which computes its own trace in addition to its normal behaviour),
 is currently implemented within the nhc13 compiler, but has the
 potential to be re-written as a free-standing tool.  However, all
 primitively-defined operations (such as I/O) must also have a
 trace-computing variant available, and at present these are written
 by hand.
  2. Modifications to nhc13's runtime-system, in particular to the garbage
 collector, enable large programs to be feasibly traced without
 accumulating vast quantities of irrelevant trace data.
  3. The trace browser is a free-standing application which connects to
 a running Haskell program using sockets.  This is very portable.

Regards,
Malcolm

[EMAIL PROTECTED]
Dr Malcolm Wallace (functional programming research)   +44 1904 434756
Department of Computer Science, University of York, YORK YO1 5DD, U.K.
http://www.cs.york.ac.uk/~malcolm/





Re: Binary files in Haskell

1998-02-25 Thread S. Alexander Jacobson

On Wed, 25 Feb 1998, Jerzy Karczmarczuk wrote:
 Compete in the space of professional production quality languages?
 
 I am sorry, but if one day the Haskell, or any other language creators
 decide to include *everything* into their language, it will be The End
 of it.

Yes, that comment went too far.  My point was not that Haskell must
implement all this functionality, but that the absence of an ability (in
principle) to maintain persistent state easily would be a serious
shortcoming.  I understood some of the posts I saw to indicate that it
would be difficult, in principle, to provide a relatively transparent
persistence system.  e.g.

 I am afraid that storing the internal objects in binary format under
 user control might be difficult in a pure lazy language. There is no
 difference between closures and their reduced results; referential
 transparency should be maintained. Moreover functions in Haskell do
 not belong to Eq.

Given that a huge amount of modern programming involves databases and
persistence (especially in a web environment), this strikes me as a huge
issue.  Of course, one might argue that Haskell programmers should just
call out to external database APIs via Corba/DCOM. But, given that most
DBMSs provide a declarative interface, and some provide atomic transactions
and object versioning, it should be possible to have a smoother storage
interface than that of pure Corba/DCOM API calls.

-Alex-
___
S. Alexander Jacobson   i2x Media  
1-212-697-0184 voice1-212-697-1427 fax






Re: Binary files in Haskell

1998-02-25 Thread Jerzy Karczmarczuk

The discussion about heap dumps, binary persistent objects, etc.
continues.

S. Alexander Jacobson:

 Java comes with a serialization mechanism that allows the user to save
 and load all objects ...


 Object databases, like PSE, provide caching and indirection facilities
  ...

 Relational databases force the programmer to translate data structures
 into a relational format and then use JDBC to store/retrieve them.
  ...

 I realize that implementing these options is easier in languages that are
 pointer or handle based than in languages that treat all objects as
 literals, but if Haskell does not provide a way to use facilities like
 these, it will not be able to compete in the space of professional
 production quality languages.
 
 -Alex-


???
Compete in the space of professional production quality languages?

I am sorry, but if one day the creators of Haskell, or of any other
language, decide to include *everything* in their language, it will be
The End of it.

Of course, Scheme, which has an internal compiler, is able to store its
bytecode on disk, and Python gives the user full access to the internal
data structures and permits 'pickling' or 'marshalling' of some
modules. 

But is this what makes them "professional production quality languages"?
You cannot do it - normally - in "C" without special add-on libraries,
and saving internal data has nothing to do with storing and dynamically 
loading compiled code.

I am afraid that storing the internal objects in binary format under
user control might be difficult in a pure lazy language. There is no
difference between closures and their reduced results; referential
transparency should be maintained. Moreover functions in Haskell do
not belong to Eq.

So, it could be done - if at all - only at the implementation level, totally
transparent to the user, just a kind of virtual memory. Is it really
vital for a typical Haskell application? But I am not an expert...


Jerzy Karczmarczuk
University of Caen, France.


**

PS. Could somebody inform me what is the current status of
multi-parametric classes? Concretely (for example) I would
like to construct a generic Universal Parser

 type Uparser a c = [c] -> [(a,[c])]

which consumes any stream and produces any result. The classical
combinators do not care about [c] being character strings, but
with two parameters it is not possible to apply the magic word
Monad directly. Of course, my problem is not veeery dramatic, but
those multi-parametric classes have bothered me for some time. 





Re: Binary files in Haskell

1998-02-24 Thread Tony Davie

Tim Barbour writes

 "Tony" == Tony Davie [EMAIL PROTECTED] writes:

Tony This has well known disadvantages. Simon has already pointed
Tony out that it's not relocatable. It's also limited to the size
Tony of virtual memory and takes up that amount of space even
Tony though most of it may not be used. Loading a complete
Tony persistent memory at once is overkill. You should load its
Tony objects lazily and only store away accessible stuff at
Tony commit time.

Relocation and address space are not such a problem on 64-bit
machines. Not having enough address space is really a disadvantage of
32-bit machines, not persistence.

I was suggesting using memory-mapped files, using e.g. mmap. Such a
file is mapped instantly into the address space, but its contents are
paged in and out on demand. A mmapped file takes up disk space only
for the populated part of its address region, and does not take up
virtual memory. It takes up real memory only according to demand for
pages from the file, in much the same way as buffer blocks do on a
conventional file.

Tim
--



A much more serious problem is that, in a persistent system, one may want
to extend the persistent store by adding new objects to it and making use,
in newly written programs, of the objects found there already. It's then
very inconvenient to treat the store in one monolithic lump. And anyway,
didn't this start out being about how to do I/O on binary data? Surely you
don't want to output everything every time you output a single item, only
those things reachable from it.

Tony Davie, Computer Science, St.Andrews University, North Haugh, St.Andrews
Scotland, KY16 9SS,  Tel: +44 1334 463257,  Fax: +44 1334
463278
mailto:[EMAIL PROTECTED]  Home:  http://www.dcs.st-and.ac.uk/~ad/Home.html
Handel Index and Chronology:
http://bruichladdich.dcs.st-and.ac.uk/HandelWWW/HandelCat.html
Limerick Archive:
http://bruichladdich.dcs.st-and.ac.uk/LimericksDir/Limericks.html

'There is magic in the web' - Othello, Act 3, Scene 4






Re: Binary files in Haskell

1998-02-24 Thread Timothy Robin BARBOUR

 "Tony" == Tony Davie [EMAIL PROTECTED] writes:

Tony This has well known disadvantages. Simon has already pointed
Tony out that it's not relocatable. It's also limited to the size
Tony of virtual memory and takes up that amount of space even
Tony though most of it may not be used. Loading a complete
Tony persistent memory at once is overkill. You should load its
Tony objects lazily and only store away accessible stuff at
Tony commit time.

Relocation and address space are not such a problem on 64-bit
machines. Not having enough address space is really a disadvantage of
32-bit machines, not persistence.

I was suggesting using memory-mapped files, using e.g. mmap. Such a
file is mapped instantly into the address space, but its contents are
paged in and out on demand. A mmapped file takes up disk space only
for the populated part of its address region, and does not take up
virtual memory. It takes up real memory only according to demand for
pages from the file, in much the same way as buffer blocks do on a
conventional file.

Tim
-- 


--
T.R.BARBOUR Email : [EMAIL PROTECTED]
--
Department of Computer Science
The University of Melbourne
Parkville, Victoria 3052
Australia
--





Re: Binary files in Haskell

1998-02-24 Thread S. Alexander Jacobson


The debate about heavy or light-weight saving sounds a lot like the debate
in traditional programming languages about how users should implement
persistent state.

Java comes with a serialization mechanism that allows the user to save
and load all objects reachable from a particular object root. It handles
cyclic data structures by replacing memory pointers with pointers to
object locations in a handle table. (see
http://www.javasoft.com/products/jdk/1.1/docs/guide/serialization/index.html)

Object databases, like PSE, provide caching and indirection facilities
that allow the "lazy" loading/saving/updating of object graphs.  (Some
versions provide ACID transactions)
(see http://www.odi.com)

Relational databases force the programmer to translate data structures
into a relational format and then use JDBC to store/retrieve them.
(see http://www.javasoft.com/products/jdk/1.1/docs/guide/jdbc/index.html)

I realize that implementing these options is easier in languages that are
pointer or handle based than in languages that treat all objects as
literals, but if Haskell does not provide a way to use facilities like
these, it will not be able to compete in the space of professional
production quality languages.

-Alex-

___
S. Alexander Jacobson   i2x Media  
1-212-697-0184 voice1-212-697-1427 fax

On Tue, 24 Feb 1998, Tony Davie wrote:

 Tim Barbour writes
 
  "Tony" == Tony Davie [EMAIL PROTECTED] writes:
 
 Tony This has well known disadvantages. Simon has already pointed
 Tony out that it's not relocatable. It's also limited to the size
 Tony of virtual memory and takes up that amount of space even
 Tony though most of it may not be used. Loading a complete
 Tony persistent memory at once is overkill. You should load its
 Tony objects lazily and only store away accessible stuff at
 Tony commit time.
 
 Relocation and address space are not such a problem on 64-bit
 machines. Not having enough address space is really a disadvantage of
 32-bit machines, not persistence.
 
 I was suggesting using memory-mapped files, using e.g. mmap. Such a
 file is mapped instantly into the address space, but its contents are
 paged in and out on demand. A mmapped file takes up disk space only
 for the populated part of its address region, and does not take up
 virtual memory. It takes up real memory only according to demand for
 pages from the file, in much the same way as buffer blocks do on a
 conventional file.
 
 Tim
 --
 
 
 
 A much more serious problem is that, in a persistent system, one may want
 to extend the persistent store by adding new objects to it and making use,
 in newly written programs, of the objects found there already. It's then
 very inconvenient to treat the store in one monolithic lump. And anyway
 didn't this start out about how to do I/O on binary data. Surely you don't
 want to output everything every time you output a single item, only those
 things reachable from it.
 
 Tony Davie, Computer Science, St.Andrews University, North Haugh, St.Andrews
 Scotland, KY16 9SS,  Tel: +44 1334 463257,  Fax: +44 1334
 463278
 mailto:[EMAIL PROTECTED]  Home:  http://www.dcs.st-and.ac.uk/~ad/Home.html
 Handel Index and Chronology:
 http://bruichladdich.dcs.st-and.ac.uk/HandelWWW/HandelCat.html
 Limerick Archive:
 http://bruichladdich.dcs.st-and.ac.uk/LimericksDir/Limericks.html
 
 'There is magic in the web' - Othello, Act 3, Scene 4
 






Re: Binary files in Haskell

1998-02-23 Thread Simon L Peyton Jones


 I would like to use Haskell for several larger scale projects, but I
 can't figure out how to read and write binary data.  It does not appear
 that the language supports binary files.  Am I missing something?

Colin Runciman and his Merrie Men are working on writing
Haskell values into binary files with the additional feature
that the files are compressed.  There was a paper in the '97 Haskell
workshop and I know he's been working on it since; but I don't
think a "product" has emerged for general use.  But with encouragement
perhaps it will!

Simon





Re: Binary files in Haskell

1998-02-23 Thread Olivier Lefevre


This thread and particularly the following passage:

   "If I understand this right, you're suggesting essentially dumping 
out part of the Haskell heap to a file."

reminds me very strongly of APL, where dumping the current state of
the system into a binary file that you can later reload (and resume
your work where you left it) has always been a possibility. This was
useful not just for interactive work but also for breaking down large
computations and escaping the reaper built into the mainframe's batch 
queue :) Is it anything of the kind that Runciman is working on or 
something less ambitious?

Regards,

-- O.l.





Re: Binary files in Haskell

1998-02-23 Thread Malcolm Wallace

"Steve" == Steve Roggenkamp [EMAIL PROTECTED] writes:

  Steve I would like to use Haskell for several larger scale
  Steve projects, but I can't figure out how to read and write
  Steve binary data.  It does not appear that the language supports
  Steve binary files.  Am I missing something?

"Tim" == Timothy Robin BARBOUR [EMAIL PROTECTED] writes:

  Tim There is the (automatically derivable) Binary class, defined in the
  Tim Haskell Report? Unfortunately ghc does not support it yet.  But ghc
  Tim does provide the Native class (not automatically derivable), though
  Tim with many pre-defined instances.

The Haskell Report no longer defines a Binary class.  However, the
nhc13 compiler now supports an improved library for the Binary class,
including automatic derivation for any user-defined type.
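
For instance (a sketch; deriving Binary is an nhc13 extension rather than
Haskell 98, and the datatype is only an illustration):

  data Tree a = Leaf | Node (Tree a) a (Tree a)
                deriving (Show, Binary)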

  Tim Another thing you will need is some way to transport your binary data
  Tim between platforms, e.g. Intel and Alpha.

Nhc13's implementation of Binary values always stores the same representation,
regardless of the endianness of the underlying machine.

  Tim There have been claims that Read is sometimes very inefficient at
  Tim parsing large structures

Indeed.  The Binary class does not suffer from this problem.

  Tim There is another way one might proceed. Why not just use a
  Tim memory-mapped file (mmap) to make the data persistent in-place ?

With our current design, it is possible to map binary files into memory
and vice versa, in a completely orthogonal fashion.

  Tim The data structure better not contain any lazy closures,...

Yes, the three restrictions on our design are:
  (1) values must be fully-evaluated when written to a binary file;
  (2) sharing is lost, so cycles and infinite structures are excluded;
  (3) functions cannot yet be stored in binary files.

Full information on nhc13, including downloads, is available at
http://www.cs.york.ac.uk/fp/nhc13/
This "York" version of nhc13 is currently in beta-release, with a full
public release expected soon.  A paper describing a preliminary design
for Binary I/O was published in the Haskell Workshop '97 (and is available
from my homepage).  The current design has moved on quite a bit however -
some documentation is available in html, and a new paper is in preparation.

Regards,
Malcolm

[EMAIL PROTECTED]
Dr Malcolm Wallace (functional programming research)   +44 1904 434756
Department of Computer Science, University of York, YORK YO1 5DD, U.K.
http://www.cs.york.ac.uk/~malcolm/





Re: Binary files in Haskell

1998-02-23 Thread Yoshihiko ICHIKAWA


Sorry for the interruption, but I feel this thread of discussion
includes four different aspects of `binary' data:

   1. Reading and writing binary data in a prescribed format.
  Typical examples are image data in GIF, TIFF, Sun Raster etc,
  and user-account statistics files used in Unix.

   2. Reading and writing part of Haskell data in a file.
  This resembles the array dumping mechanism found in some
  implementations of FORTRAN. 

   3. Warm starting Haskell sessions or programs.
  As found in ordinary implementations of Smalltalk, Lisp, 
  and Yale implementation of Haskell.

   4. Supporting orthogonal persistence in Haskell.
  This may lead to a persistent Haskell implementation.

Could someone clarify the main topic?   All of them?


Yoshihiko Ichikawa, Dept of Info Sci, Fac of Sci, Ochanomizu University
Phone:  +81-3-5978-5708 (Dial-in) / +81-3-5978-5704 (Library of Department)
Fax:+81-3-5978-5898 (Faculty) / +81-3-5878-5705 (Library of Department)
E-mail: [EMAIL PROTECTED]





Re: Binary files in Haskell

1998-02-23 Thread Tony Davie

This thread and particularly the following passage:

   "If I understand this right, you're suggesting essentially dumping
out part of the Haskell heap to a file."

reminds me very strongly of APL, where dumping the current state of
the system into a binary file that you can later reload (and resume
your work where you left it) has always been a possibility. This was
useful not just for interactive work but also for breaking down large
computations and escaping the reaper built into the mainframe's batch
queue :) Is it anything of the kind that Runciman is working on or
something less ambitious?



This has well known disadvantages. Simon has already pointed out that it's
not relocatable. It's also limited to the size of virtual memory and takes
up that amount of space even though most of it may not be used. Loading a
complete persistent memory at once is overkill. You should load its
objects lazily and only store away accessible stuff at commit time.


Tony Davie, Computer Science, St.Andrews University, North Haugh, St.Andrews
Scotland, KY16 9SS,  Tel: +44 1334 463257,  Fax: +44 1334
463278
mailto:[EMAIL PROTECTED]  Home:  http://www.dcs.st-and.ac.uk/~ad/Home.html
Handel Index and Chronology:
http://bruichladdich.dcs.st-and.ac.uk/HandelWWW/HandelCat.html
Limerick Archive:
http://bruichladdich.dcs.st-and.ac.uk/LimericksDir/Limericks.html

'There is magic in the web' - Othello, Act 3, Scene 4






Re: Binary files in Haskell

1998-02-23 Thread Olivier Lefevre


Tony Davie wrote:
 
 This has well known disadvantages. Simon has already pointed out that 
 it's not relocatable. 

How is it done in APL and in other systems that do it (SmallTalk
and some LISP systems, according to another poster)?

 Loading a complete persistent memory at once is overkill.

Perhaps but it has an appealing simplicity :) Dry-freezing your
workspace by hand, so to speak, must be a rather involved task,
if probably an interesting one.

-- O.L.





Re: Binary files in Haskell

1998-02-22 Thread Timothy Robin BARBOUR

 "Steve" == Steve Roggenkamp [EMAIL PROTECTED] writes:

Steve I would like to use Haskell for several larger scale
Steve projects, but I can't figure out how to read and write
Steve binary data.  It does not appear that the language supports
Steve binary files.  Am I missing something?

I have the same problem, but it is not intractable. 

I'd like some comments from Glasgow people on the last possibility
below.

There is the (automatically derivable) Binary class, defined in the
Haskell Report? Unfortunately ghc does not support it yet.  But ghc
does provide the Native class (not automatically derivable), though with
many pre-defined instances. Using Native should be very like using C++
streams.

The function that flattens an arbitrary (mostly) type into bytes is
probably a polytypic function. There is a polytypic pre-processor for
Haskell called Polyp, but it has severe restrictions at present. It
was claimed some time ago that the next version of Polyp would remove
the restrictions. This would be rather useful. In the meantime there
is another pre-processor called Derive, which can be used to
automatically derive instances of non-standard classes (such as
Native). It lacks the elegance and generality of Polyp, but it is
usable now. See http://www.dcs.gla.ac.uk/~nww/derive.html .

Another thing you will need is some way to transport your binary data
between platforms, e.g. Intel and Alpha. It would probably be
straightforward to make an endian-aware subclass of Native that kept
the binary representation in network byte order. If no-one else does
this I will at some stage. Of course transporting between platforms
may be quite rare, in which case ASCII (Show and Read) might
work. There have been claims that Read is sometimes very inefficient at
parsing large structures - it *might* need a million years for a large
file.

There is another way one might proceed. Why not just use a
memory-mapped file (mmap) to make the data persistent in-place ? This
would be a way of getting efficient persistence of (almost) any
Haskell data structure without any code-writing and without any
flattening. There are a few difficulties here, but it may well be
feasible. The problems are:

(i) The data structure had better not contain any lazy closures, since
they are unlikely to be valid on a future run of the
program. Solution: restrict this technique to strict data structures
for now.

(ii) When the data structure is constructed, ghc will put it in bits of
memory from all over the heap. Solution: use the 2-space copying
garbage collector to gc the data structure (just give it the data
structure for its root set), with the to-space being the memory from
the mmapped file, e.g. obtained using mmalloc. This will eliminate
ordinary ghc heap from the data structure, so the file can be closed.
For a large file (e.g. several Gb), closing would take quite a while
because of the gc. It would also need (temporarily) twice the storage
space of the file.

Doing the above might just be a case of knowing how to call the
garbage collector appropriately. 

Any comments from Glasgow ?

Tim
--


--
T.R.BARBOUR Email : [EMAIL PROTECTED]
--
Department of Computer Science
The University of Melbourne
Parkville, Victoria 3052
Australia
--





Re: Binary files in Haskell

1998-02-22 Thread Simon Marlow

Timothy Robin BARBOUR [EMAIL PROTECTED] writes:

 There is another way one might proceed. Why not just use a
 memory-mapped file (mmap) to make the data persistent in-place ? This
 would be a way of getting efficient persistence of (almost) any
 Haskell data structure without any code-writing and without any
 flattening. There are a few difficulties here, but it may well be
 feasible. The problems are:

If I understand this right, you're suggesting essentially dumping out
part of the Haskell heap to a file.  Assuming the data you want to
dump out is closed (i.e. has no external references) and is fully
evaluated, there are still some problems remaining:

- The data is location-dependent, and the exact location the
  data was originally resident at might not be available at
  load-time, so it would have to be relocated.

- Heap-resident data in GHC refers (by address) to the code of
  the program itself, and to parts of the run-time system.
  You'd have to do some kind of dynamic linking to be able to
  load data back again.

- The format of heap-resident data varies depending on things
  like profiling, so you wouldn't be able (for instance) to
  load data generated by a profiled program into one that isn't.

So, you could avoid these problems by always loading the data at the
same address in memory, and only using one program to save/load data.
If you recompiled the program, you'd have to regenerate the persistent
store.  This seems overly restrictive, it might be simpler to just use
the Native or Binary classes.

Cheers,
Simon

-- 
Simon Marlow [EMAIL PROTECTED]
University of Glasgow   http://www.dcs.gla.ac.uk/~simonm/
finger for PGP public key





Re: Binary files in Haskell

1997-03-21 Thread malcolm

Phil Trinder asked:

 Do any Haskell implementations support Binary files?

At York we have a version of nhc13 which supports, amongst other things,
binary file I/O.  We hope to release it to the world fairly soon.

We have confirmed that binary files enable a faster data transfer rate
than the textual parsing of ordinary files, as well as delivering
considerable space savings.  As with ordinary files, our implementation
allows only data to be stored - functional values are somewhat tricky.
Also, it is easy to lose sharing in the data structure when it is
converted to the binary format - however we now have a means of
avoiding that loss.

More details will be available soon.

Malcolm