Re: binary files in haskell
On 08-Feb-2001, John Meacham [EMAIL PROTECTED] wrote: A nice advantage of using my mid-level routines is that there are very little requirements placed on 'Byte' as a type, this means that as long as to the outside world you only read in 8 bit values and spit 8 bit values out you can represent it internally however you want. for example you might have a machine where a 16 bit word is the smallest addressable entity, if you relied on hPut Word8 then your program would not work since Word8 cannot exist on that platform. however if you made Byte be 16 bits and only used the bottom half of each word then your program will run unchanged even among architectures such as this. I agree that `Byte' is a useful abstraction. However, I think what you say about Word8 here is not correct. Word8 can be implemented on a 16-bit machine just by computing all arithmetic operations modulo 256. There is no requirement that Word8 be physically 8 bits, just that it represents an 8-bit quantity. Indeed, I think ghc uses this technique, representing Word8 as a full machine word (e.g. 32 bits for x86, of which the topmost 24 are always zero). -- Fergus Henderson [EMAIL PROTECTED] | "I have always known that the pursuit | of excellence is a lethal habit" WWW: http://www.cs.mu.oz.au/~fjh | -- the last words of T. S. Garp. ___ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell
Re: binary files in haskell
Marcin 'Qrczak' Kowalczyk wrote: | IMHO it looks nicer to have Byte in function names, so | there can be such type synonym too. Style Warning! Why do many people when designing libraries not make full use of the Haskell module system? Instead of writeByte readByte (or so), one could also say: Byte.write Byte.read So that the context (Byte.) can be left out when unambiguous, and one can just say: write read /Koen. -- Koen Claessen http://www.cs.chalmers.se/~koen phone:+46-31-772 5424 mailto:[EMAIL PROTECTED] - Chalmers University of Technology, Gothenburg, Sweden ___ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell
Re: binary files in haskell
Style Warning! ... writeByte - Byte.write Yes yes yes please! Often, if someone writes identifiersWithSuffix, the suffix actually carries a type information or a module information, and the programmer should use the type resp. module system of the language to express that. Should this also apply to names in the standard library? like Monad (filterM, zipWithM ,..) I mean, theoretically yes, but is it feasible to change it? While we're at it, stylistically: the sight of "g" changing to "G" in getLine - hGetLine (and similar) always irritates me. Best regards, -- -- Johannes Waldmann http://www.informatik.uni-leipzig.de/~joe/ -- -- [EMAIL PROTECTED] -- phone/fax (+49) 341 9732 204/252 -- ___ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell
Re: binary files in haskell
Marcin 'Qrczak' Kowalczyk schrieb folgendes am Tue, Feb 06, 2001 at 12:54:29PM +0100: On Tue, 6 Feb 2001, Olaf Chitil wrote: I just see one problem with John's proposal: the type Byte. type Byte = Word8 I would prefer type Octet = Word8 to emphasise that the functions really uses 8 bits. -- Stefan Karrmann ___ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell
Re: binary files in haskell
This is exactly what I proposed when fmap and the other weird names were introduced. Hopefully there are more allies now. Erik - Original Message - From: "Koen Claessen" [EMAIL PROTECTED] To: "The Haskell Mailing List" [EMAIL PROTECTED] Sent: Tuesday, February 06, 2001 6:25 AM Subject: Re: binary files in haskell Johannes Waldmann wrote: | Should this also apply to names in the standard | library? like Monad (filterM, zipWithM ,..) I mean, | theoretically yes, but is it feasible to change it? Obviously, these functions should have been called: Monad.filter, Monad.zipWith The lazy programmer can then say: import Monad as M(*) M.filter, M.zipWith Just (asymptotically) 1 character more! :-) | getLine - hGetLine always irritates me. How about: import Handle as H H.getLine (This is a good example where type classes would not help making this any better, since the types of getLine and H.getLine are very different.) While we're at it, how about instead of the "fmap" function: Functor.map (F.map) List.map (L.map) Maybe.map(M.map) The programmer can pick him/herself what function to use. (The Prelude really has too many functions in it, and very often the rationale for a function being in Prelude or in Char/List/Maybe/Monad/IO/etc. is not motivated.) What do people think about this? If people prefer these stylistic changes, I think we should not hesitate making them for Haskell/2 by completely redesigning the module structure and using more consistent naming conventions. /Koen. (*) What actually happened to the excellent proposal somebody made a while ago for Haskell98: import M = Monad ? I like it a lot! -- Koen Claessen http://www.cs.chalmers.se/~koen phone:+46-31-772 5424 mailto:[EMAIL PROTECTED] - Chalmers University of Technology, Gothenburg, Sweden ___ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell ___ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell
RE: binary files in haskell
How about this slightly more general interface, which works with the new FFI libraries, and is trivial to implement on top of the primitives in GHC's IOExts: hPut :: Storable a = Handle - a - IO () hGet :: Storable a = Handle - IO a What about endianess? In which format are Floats or even just Bools stored? For a file which probably shall be read from different machines this is not clear at all. The behaviour is defined by the Storable instances for each type. The endianness for writing say an Int32 would be the same as the host architecture, for instance. If you want to work with just bytes, you can always just use hPut and hGet at type Word8. Overloading with Storable gives you more flexibility, since if you have a way to serialise an object in memory for passing to a foreign function, you also have a way to store it in binary format in a file (modulo problems with pointers, of course). In the long term, we'll want to be able to serialise more than just Storable objects (c.f. the other overloaded binary I/O libraries out there), and possibly make the output endian-independent - but after all there's no requirement that Haskell's Int has the same size on all implementations, so there's no guarantee that binary files written on one machine will be readable on another, unless they only use explicitly sized types or Integer. Perhaps these should be called hPutStorable and hGetStorable so as not to prematurely steal the best names. I think John is right that there needs to be a primitive interface for just writing bytes. You can then build anything more complicated on top (probably different high-level ones for different purposes). I just see one problem with John's proposal: the type Byte. It is completely useless if you don't have operations that go with it; bit-operations and conversions to and from Int. The FFI already defines such a type: Word8. So I suggest that the binary IO library explicitely reads and writes Word8's. yup, that's what I had in mind. Cheers, Simon ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: binary files in haskell
There is actually NO way with haskell 98 to portably read and write binary files. and many libraries which provide this feature are inherently non portable, (they assume a Haskell Char = C char = byte) which is not necisarily the case. I wrote up a proposal for a binary file IO mechanism to be added as a 'blessed addendum' to the standard at best and as a commonly implmented extension (in hslibs) at least.. my original proposal messages to the list can be gotten from http://www.ugcs.caltech.edu/~john/computer/haskell/ although without the messages from others in the list they may seem out of context. -- -- John Meacham http://www.ugcs.caltech.edu/~john/ California Institute of Technology, Alum. [EMAIL PROTECTED] -- ___ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell
Re: Binary files in Haskell
Malcolm In fact, Binary allows you to process the data directly Malcolm from the disk rather than hauling it all into memory. Malcolm This eliminates start-up time altogether. What's more, Malcolm provided the file is used read-only, you can use pure Malcolm lazy functions to read the data, rather than having to Malcolm sequence everything through the IO monad. Is the fact that the processing is "directly from disk" transparent to the programmer Absolutely. or does the processing function have to be written differently to avoid startup time ? No - the program can be identical but for one line, which opens the file. Does 'get' return its result lazily ? There are two functions for reading, with different semantics - 'getAt' returns its result eagerly in the IO monad, while 'getFAt' is a pure function which returns its result lazily. class Binary where ... getAt :: BinHandle - BinPtr a - IO a getFAt :: BinHandle - BinPtr a - a The latter obviously requires some guarantee that the file cannot be modified between the building of the thunk and the demand which evaluates it. So the 'getFAt' function only gives a result when the file is in RO (read-only) mode. (After writing to a file, it can be "frozen" to RO mode using the 'freezeBin' operation.) On the other hand, 'getAt' has to return its result eagerly, because there is always the possibility that a subsequent 'putAt' operation will overwrite that part of the file before the value read earlier is needed. I've been looking at the html doc and the nhc Binary src, but I can't quite see a transparent way of processing data on disk. There is the putAt operation, but that needs a file pointer; and there is the BinArray, which can only be written sequentially. Sorry I wasn't clear. The original question was from someone who wished only to *read* a huge quantity of data from a file in order to do some processing on it - but not to write any data back. I was claiming that with the Binary library s/he could process the data transparently direct from disk without having to load everything into memory at once. As I understand it, the quantity of data was so large that having enough memory for everything was a real bottleneck. Provided the processing function is in some sense incremental - does not require *all* the data at once - then Binary will be a win. As you say however, more general processing which involves update to a structure is a rather different beast. The Binary library does not support transparent update in the way you describe, where only the modified part of the value is written back. As you note, the whole (large) structure has to be written back. Regards, Malcolm [EMAIL PROTECTED] Dr Malcolm Wallace (functional programming research) +44 1904 434756 Department of Computer Science, University of York, YORK YO1 5DD, U.K. http://www.cs.york.ac.uk/~malcolm/
Re: Binary files in Haskell
"Malcolm" == Malcolm Wallace [EMAIL PROTECTED] writes: Malcolm In fact, Binary allows you to process the data directly Malcolm from the disk rather than hauling it all into memory. Malcolm This eliminates start-up time altogether. What's more, Malcolm provided the file is used read-only, you can use pure Malcolm lazy functions to read the data, rather than having to Malcolm sequence everything through the IO monad. Is the fact that the processing is "directly from disk" transparent to the programmer or does the processing function have to be written differently to avoid startup time ? Does 'get' return its result lazily ? I've been looking at the html doc and the nhc Binary src, but I can't quite see a transparent way of processing data on disk. There is the putAt operation, but that needs a file pointer; and there is the BinArray, which can only be written sequentially. If one were to read a (large) data structure, modify it slightly (in-place), then write back the change (without using a file pointer), I imagine the whole data structure would be written back to disk. Whereas using a memory mapped file, only modified pages would be written back. Would it be helpful and possible to bolt a mmap'ed file underneath some of these operations so that large data structures could be modified (either functionally or in-place) without having to write back the whole structure ? I guess the problem comes back to lazy closures and associated pointers to volatile code. Tim -- -- T.R.BARBOUR Email : [EMAIL PROTECTED] -- Department of Computer Science The University of Melbourne Parkville, Victoria 3052 Australia --
Re: Binary files in Haskell
If the representation is simple enough, you might consider accessing the database via C ... Of course, Binary or Native would be easier, but the start-up time for a large database might be a problem. In fact, Binary allows you to process the data directly from the disk rather than hauling it all into memory. This eliminates start-up time altogether. What's more, provided the file is used read-only, you can use pure lazy functions to read the data, rather than having to sequence everything through the IO monad. (And of course for efficiency, the implementation of Binary is written largely in C, via the GreenCard preprocessor for Haskell.) Malcolm [EMAIL PROTECTED] Dr Malcolm Wallace (functional programming research) +44 1904 434756 Department of Computer Science, University of York, YORK YO1 5DD, U.K. http://www.cs.york.ac.uk/~malcolm/
Re: Binary files in Haskell
Alex Ferguson and (earlier) Yoshihiko Ichikawa ask for clarification of the intent of the discussion on binary I/O. It might help if I outline the capabilities of the Binary library (currently available only for nhc13, but as Simon PJ notes, if you want it in ghc/hugs, ask him). 1. Reading and writing binary data in a prescribed format. Typical examples are image data in GIF, TIFF, Sun Raster etc, and user-account statistics files used in Unix. Easy to program using the Binary library. 2. Reading and writing part of Haskell data in a file. This resembles the array dumping mechanism found in some implementation of FORTRAN. Comes for free (by adding a "deriving" clause on your datatype). 3. Warm starting Haskell sessions or programs. As found in ordinary implementations of Smalltalk, Lisp, and Yale implementation of Haskell. Can be programmed, if it amounts to selecting appropriate data structures to "save" at the end of one run and "reload" at the beginning of the next. (It is a far more difficult problem if you want to save un-evaluated closures.) 4. Supporting orthogonal persistence in Haskell. This may lead to a persistent Haskell implementation. No, the Binary library doesn't lend itself to this. Regards, Malcolm [EMAIL PROTECTED] Dr Malcolm Wallace (functional programming research) +44 1904 434756 Department of Computer Science, University of York, YORK YO1 5DD, U.K. http://www.cs.york.ac.uk/~malcolm/
Re: Binary files in Haskell
Real world example: development tools process a large geometric data set to build a run-time optimized BSP tree with precalculated lighting and collision information. The user application will not modify this data, but it will have to load it dynamically without slowing down a 30Hz graphics/interaction loop. (Apologies to Quake and Mario 64 :) Solution in C: load the data as contiguous raw bytes and cast pointers into the block to the appropriate structure types. Solution in Haskell? Malcolm Wallace and Colin Runciman at York have been working on this kind of thing. They had a paper at the Haskell workshop last year (which is on Malcolm's home page http://www.cs.york.ac.uk/~malcolm/). I believe they are near to releasing a revised version. It only works for nhc at the moment, but if enough people like you yell loudly enough (construe this positively!) then we'll put it into GHC/Hugs too. Our priorities are largely driven by what people ask for. Simon
Re: Binary files in Haskell
Alex Ferguson wrote: Tony Davie: And anyway didn't this start out about how to do I/O on binary data. To be honest, I'm not sure this discussion does necessarily address the original query: Steve Roggenkamp: I would like to use Haskell for several larger scale projects, but I can't figure out how to read and write binary data. I'm not sure that it has. I haven't saved every message, but it appears I could use a Native class from ghc. I've not tried this yet. Chapter 7 of the Haskell 1.4 Report appears to deal with character I/O only and I can't find any binary I/O operations in the library. Steve, can you clarify what sort of binary file manipulation it is you are actually looking for, and whether the thread so far has answered your question (either in the positive, or the negative)? I currently working on a database which consists mainly of integers. I have to search large arrays of these integers which may contain over a hundred million entries. With this amount of data I'm looking at ways to compress it to a minimum amount of physical space. I also do not want to have to parse a string to read in an integer. Both computational performance and space would suffer too much. Thus the need for storing binary data. The major performance determinant appears to be the speed of the disk drive. Most disk drives can only deliver about 100-125 blocks/sec when randomly accessing offsets in a file. So I need to be careful about which disk blocks I access. Even with 250 million entries, I can find a single entry in less than 30 disk I/Os by using binary search, so I could conduct about three of these searches per second, assuming no help from the disk cache. I have considered using `mmap', but I think the database will eventually outgrow the address space of a single process. Though in any case, I do agree that the whole persistency area is an alarmingly open one wrt Haskell, and the discussion has merit for its own sakes. Yoshihiko Ichikawa's request for clarification was also rather to the point; we're in danger of covering the whole of a very broad and somewhat diffuse subject area -- possibly by a search algorithm reasonably characterisable as a species of Drunkard's Walk ;-) Slainte, Alex. Steve -- - Steve Roggenkamp InterNet:[EMAIL PROTECTED] Address: 9159 Eversole Run Road Powell, OH 43065 USA Phone: 614.873.6573 WWW: http://www.infinet.com/~sroggen/home.html
Re: Binary files in Haskell
On Wed, 25 Feb 1998, Jerzy Karczmarczuk wrote: : PS. Could somebody inform me what is the current status of : multi-parametric classes? Concretely (for example) I would : like to construct a generic Universal Parser : : type Uparser a c = [c] - [(a,[c])] : : which consumes any stream and produces any result. The classical : combinators do not care about [c] being character strings, but : with two parameters it is not possible to apply directly the magic : word Monad. Of course, my problem is not veeery dramatic, but : those multiparametric classes bother me already some time. Well, it is actually possible to solve with Haskell 1.4 without multi parameter type classes. Instead of making a specific parser data type, I've implemented a general state transformer type, which can be directly used as a parser. it has the type newtype StateM s m a = SM (s - m (a,s)) unStateM (SM f) = f and takes three arguments, a state type, a container monad and the result type. a typical Monad instantiation looks like this instance Monad m = Monad (StateM s m) where (SM stm) = f = SM $ \s - stm s = \(v,s') - unStateM (f v) s' return v = SM $ \s - return (v, s) so your Uparser would be type Uparser a c = StateM c [] a I have actually implemented the Hugs Parselib using these types. If you are interested, you can mail me for the source. n. ---[ www.dtek.chalmers.se/~d95mback ]--[ PGP: 0x453504F1 ]--[ UIN: 4439498 ]---
Re: Binary files in Haskell
Tim writes: I noticed the tracer stuff on the nhc web-site. It looks very much like a declarative-debugger, although that term never appeared in the documentation I read. Will it in fact be a declarative-debugger ? Debugging is certainly one of the applications of a tracer. I'm not sure what you mean by "declarative" in this context. The browser permits you to explore the program trace, but not to modify it - in that sense you see a declaration of what occurred. If so, will it be portable to ghc? Not without a large amount of work. There are three parts to the tracer: 1. The source-to-source transformation (from an ordinary program to one which computes its own trace in addition to its normal behaviour), is currently implemented within the nhc13 compiler, but has the potential to be re-written as a free-standing tool. However, all primitively-defined operations (such as I/O) must also have a trace-computing variant available, and at present these are written by hand. 2. Modifications to nhc13's runtime-system, in particular to the garbage collector, enable large programs to be feasibly traced without accumulating vast quantities of irrelevant trace data. 3. The trace browser is a free-standing application which connects to a running Haskell program using sockets. This is very portable. Regards, Malcolm [EMAIL PROTECTED] Dr Malcolm Wallace (functional programming research) +44 1904 434756 Department of Computer Science, University of York, YORK YO1 5DD, U.K. http://www.cs.york.ac.uk/~malcolm/
Re: Binary files in Haskell
On Wed, 25 Feb 1998, Jerzy Karczmarczuk wrote: Compete in the space of professional production quality languages? I am sorry, but if one day the Haskell, or any other language creators decide to include *everything* into their language, it will be The End of it. Yes that comment went too far. My point was not that Haskell must implement all this functionality, but that absence of an ability (in principle) to maintain persistent state easily would be a serious shortcoming. I understood some of the posts I saw to indicate that it would be difficult, in principle, to provide a relatively transparent persistence system. e.g. I am afraid that storing under binary format the internal objects under user control might be difficult in a pure lazy language. There is no difference between closures and their reduced results, the referential transparence should be maintained. Moreover functions in Haskell do not belong to Eq. Given that a huge amount of modern programming involves databases and persistence (especially in a web environment), this strikes me as a huge issue. Of course, one might argue that Haskell programmers should just call out to external database api's via Corba/DCOM. But, given that most DBMS provide a declarative interface, and some provide atomic transactions and object versioning, it should be possible to have a smoother storage interface than that of pure Corba/DCOM API calls. -Alex- ___ S. Alexander Jacobson i2x Media 1-212-697-0184 voice1-212-697-1427 fax
Re: Binary files in Haskell
The discussion about heap dumps, binary persistent objects, etc. continues. S. Alexander Jacobson: Java comes with a serialization mechanism that allows the user to save and load all objects ... Object databases, like PSE, provide caching and indirection facilities ... Relational databases forces the programmer to transalte data structures into a relational format and then use JDBC to store/retrieve them. ... I realize that implementing these options is easier in languages that are pointer or handle based rather than in languages that treat all objects as literals, but if Haskell does not provide a way to use facilities like these, it will not be able to compete in the space of professional production quality languages. -Alex- ??? Compete in the space of professional production quality languages? I am sorry, but if one day the Haskell, or any other language creators decide to include *everything* into their language, it will be The End of it. Of course, Scheme which has an internal compiler is able to store on the disk the bytecodes, Python gives the user the full access to the data internal structures an permits to 'pickle' or to 'marshal' some modules. But is this, which makes them "professional production quality languages? You cannot do it - normally - in "C" without special add-on libraries, and saving internal data has nothing to do with storing and dynamic loading of the compiled code. I am afraid that storing under binary format the internal objects under user control might be difficult in a pure lazy language. There is no difference between closures and their reduced results, the referential transparence should be maintained. Moreover functions in Haskell do not belong to Eq. So, it could be done - if at all - only on the implementation level, totally transparent for the user, just a kind of virtual memory. Is it really vital for a typical Haskell application? But I am not an expert... Jerzy Karczmarczuk University of Caen, France. ** PS. Could somebody inform me what is the current status of multi-parametric classes? Concretely (for example) I would like to construct a generic Universal Parser type Uparser a c = [c] - [(a,[c])] which consumes any stream and produces any result. The classical combinators do not care about [c] being character strings, but with two parameters it is not possible to apply directly the magic word Monad. Of course, my problem is not veeery dramatic, but those multiparametric classes bother me already some time.
Re: Binary files in Haskell
Tim Barbour writes "Tony" == Tony Davie [EMAIL PROTECTED] writes: Tony This has well know disadvantages. Simon has already pointed Tony out that it's not relocatable. It's also limited to the size Tony of virtual memory and takes up that amount of space even Tony though most of it may not be used. Loading a complete Tony persistent memory at once is overkill. You should load it's Tony objects lazily and only store away accessible stuff at Tony commit time. Relocation and address space are not such a problem on 64-bit machines. Not having enough address space is really a disadvantage of 32-bit machines, not persistence. I was suggesting using memory-mapped files, using e.g. mmap. Such a file is mapped instantly into the address space, but its contents are paged in and out on demand. A mmapped file takes up disk space only for the populated part of its address region, and does not take up virtual memory. It takes up real memory only according to demand for pages from the file, in much the same way as buffer blocks do on a conventional file. Tim -- A much more serious problem is that, in a persistent system, one may want to extend the persistent store by adding new objects to it and making use, in newly written programs, of the objects found there already. It's then very inconvenient to treat the store in one monolithic lump. And anyway didn't this start out about how to do I/O on binary data. Surely you don't want to output everything every time you output a single item, only those things reachable from it. Tony Davie, Computer Science, St.Andrews University, North Haugh, St.Andrews Scotland, KY16 9SS, Tel: +44 1334 463257, Fax: +44 1334 463278 mailto:[EMAIL PROTECTED] Home: http://www.dcs.st-and.ac.uk/~ad/Home.html Handel Index and Chronology: http://bruichladdich.dcs.st-and.ac.uk/HandelWWW/HandelCat.html Limerick Archive: http://bruichladdich.dcs.st-and.ac.uk/LimericksDir/Limericks.html 'There is magic in the web' - Othello, Act 3, Scene 4
Re: Binary files in Haskell
"Tony" == Tony Davie [EMAIL PROTECTED] writes: Tony This has well know disadvantages. Simon has already pointed Tony out that it's not relocatable. It's also limited to the size Tony of virtual memory and takes up that amount of space even Tony though most of it may not be used. Loading a complete Tony persistent memory at once is overkill. You should load it's Tony objects lazily and only store away accessible stuff at Tony commit time. Relocation and address space are not such a problem on 64-bit machines. Not having enough address space is really a disadvantage of 32-bit machines, not persistence. I was suggesting using memory-mapped files, using e.g. mmap. Such a file is mapped instantly into the address space, but its contents are paged in and out on demand. A mmapped file takes up disk space only for the populated part of its address region, and does not take up virtual memory. It takes up real memory only according to demand for pages from the file, in much the same way as buffer blocks do on a conventional file. Tim -- -- T.R.BARBOUR Email : [EMAIL PROTECTED] -- Department of Computer Science The University of Melbourne Parkville, Victoria 3052 Australia --
Re: Binary files in Haskell
The debate about heavy or light-weight saving sounds a lot like the debate in traditional programming languages of how users should implement persistent state. Java comes with a serialization mechanism that allows the user to save and load all objects reachable from a particular object root. It handles cyclic data structures by replacing memory pointers with pointers to object locations in a handle table. (see http://www.javasoft.com/products/jdk/1.1/docs/guide/serialization/index.html) Object databases, like PSE, provide caching and indirection facilities that allow the "lazy" loading/saving/updating of object graphs. (Some versions provide ACID transactions) (see http://www.odi.com) Relational databases forces the programmer to transalte data structures into a relational format and then use JDBC to store/retrieve them. (see http://www.javasoft.com/products/jdk/1.1/docs/guide/jdbc/index.html) I realize that implementing these options is easier in languages that are pointer or handle based rather than in languages that treat all objects as literals, but if Haskell does not provide a way to use facilities like these, it will not be able to compete in the space of professional production quality languages. -Alex- ___ S. Alexander Jacobson i2x Media 1-212-697-0184 voice1-212-697-1427 fax On Tue, 24 Feb 1998, Tony Davie wrote: Tim Barbour writes "Tony" == Tony Davie [EMAIL PROTECTED] writes: Tony This has well know disadvantages. Simon has already pointed Tony out that it's not relocatable. It's also limited to the size Tony of virtual memory and takes up that amount of space even Tony though most of it may not be used. Loading a complete Tony persistent memory at once is overkill. You should load it's Tony objects lazily and only store away accessible stuff at Tony commit time. Relocation and address space are not such a problem on 64-bit machines. Not having enough address space is really a disadvantage of 32-bit machines, not persistence. I was suggesting using memory-mapped files, using e.g. mmap. Such a file is mapped instantly into the address space, but its contents are paged in and out on demand. A mmapped file takes up disk space only for the populated part of its address region, and does not take up virtual memory. It takes up real memory only according to demand for pages from the file, in much the same way as buffer blocks do on a conventional file. Tim -- A much more serious problem is that, in a persistent system, one may want to extend the persistent store by adding new objects to it and making use, in newly written programs, of the objects found there already. It's then very inconvenient to treat the store in one monolithic lump. And anyway didn't this start out about how to do I/O on binary data. Surely you don't want to output everything every time you output a single item, only those things reachable from it. Tony Davie, Computer Science, St.Andrews University, North Haugh, St.Andrews Scotland, KY16 9SS, Tel: +44 1334 463257, Fax: +44 1334 463278 mailto:[EMAIL PROTECTED] Home: http://www.dcs.st-and.ac.uk/~ad/Home.html Handel Index and Chronology: http://bruichladdich.dcs.st-and.ac.uk/HandelWWW/HandelCat.html Limerick Archive: http://bruichladdich.dcs.st-and.ac.uk/LimericksDir/Limericks.html 'There is magic in the web' - Othello, Act 3, Scene 4
Re: Binary files in Haskell
I would like to use Haskell for several larger scale projects, but I can't figure out how to read and write binary data. It does not appear that the language supports binary files. Am I missing something? Colin Runciman and his Merrie Men are working on writing Haskell values into binary files with the additional feature that the files are compressed. There was a paper in the '97 Haskell workshop and I know he's been working on it since; but I don't think a "product" has emerged for general use. But with encouragement perhaps it will! Simon
Re: Binary files in Haskell
--MimeMultipartBoundary Content-Type: text/plain; charset="us-ascii" This thread and particularly the following passage: "If I understand this right, you're suggesting essentially dumping out part of the Haskell heap to a file." reminds me very strongly of APL, where dumping the current state of the system into a binary file that you can later reload (and resume your work where you left it) has always been a possibility. This was useful not just for interactive work but also for breaking down large computations and escape the reaper built into the mainframe's batch queue :) Is it anything of the kind that Runciman is working on or something less ambitious? Regards, -- O.l. --MimeMultipartBoundary--
Re: Binary files in Haskell
"Steve" == Steve Roggenkamp [EMAIL PROTECTED] writes: Steve I would like to use Haskell for several larger scale Steve projects, but I can't figure out how to read and write Steve binary data. It does not appear that the language supports Steve binary files. Am I missing something? "Tim" == Timothy Robin BARBOUR [EMAIL PROTECTED] writes: Tim There is the (automatically deriveable) Binary class, defined in the Tim Haskell Report ? Unfortuately ghc does not support it yet. But ghc Tim does provide the Native class (not automatically deriveable), but with Tim many pre-defined instances. The Haskell Report no longer defines a Binary class. However, the nhc13 compiler now supports an improved library for the Binary class, including automatic derivation for any user-defined type. Tim Another thing you will need is some way to transport your binary data Tim between platforms e.g. Intel - Alpha. Nhc13's implementation of Binary values always stores the same representation, regardless of the endian-ness of the underlying machine. Tim There have been claims that Read is sometimes very inefficient at Tim parsing large structures Indeed. The Binary class does not suffer from this problem. Tim There is another way one might proceed. Why not just use a Tim memory-mapped file (mmap) to make the data persistent in-place ? With our current design, it is possible to map binary files into memory and vice versa, in a completely orthogonal fashion. Tim The data structure better not contain any lazy closures,... Yes, the three restrictions on our design are: (1) values must be fully-evaluated when written to a binary file; (2) sharing is lost, so cycles and infinite structures are excluded; (3) functions cannot yet be stored in binary files. Full information on nhc13, including downloads, is available at http://www.cs.york.ac.uk/fp/nhc13/ This "York" version of nhc13 is currently in beta-release, with a full public release expected soon. A paper describing a preliminary design for Binary I/O was published in the Haskell Workshop '97 (and is available from my homepage). The current design has moved on quite a bit however - some documentation is available in html, and a new paper is in preparation. Regards, Malcolm [EMAIL PROTECTED] Dr Malcolm Wallace (functional programming research) +44 1904 434756 Department of Computer Science, University of York, YORK YO1 5DD, U.K. http://www.cs.york.ac.uk/~malcolm/
Re: Binary files in Haskell
Sorry for interruption, but I feel this thread of discussion includes four different aspects of `binary' data: 1. Reading and writing binary data in a prescribed format. Typical examples are image data in GIF, TIFF, Sun Raster etc, and user-account statistics files used in Unix. 2. Reading and writing part of Haskell data in a file. This resembles the array dumping mechanism found in some implementation of FORTRAN. 3. Warm starting Haskell sessions or programs. As found in ordinary implementations of Smalltalk, Lisp, and Yale implementation of Haskell. 4. Supporting orthogonal persistence in Haskell. This may lead to a persistent Haskell implementation. Could someone clarify the main topic? All of them? Yoshihiko Ichikawa, Dept of Info Sci, Fac of Sci, Ochanomizu University Phone: +81-3-5978-5708 (Dial-in) / +81-3-5978-5704 (Library of Department) Fax:+81-3-5978-5898 (Faculty) / +81-3-5878-5705 (Library of Department) E-mail: [EMAIL PROTECTED]
Re: Binary files in Haskell
This thread and particularly the following passage: "If I understand this right, you're suggesting essentially dumping out part of the Haskell heap to a file." reminds me very strongly of APL, where dumping the current state of the system into a binary file that you can later reload (and resume your work where you left it) has always been a possibility. This was useful not just for interactive work but also for breaking down large computations and escape the reaper built into the mainframe's batch queue :) Is it anything of the kind that Runciman is working on or something less ambitious? This has well know disadvantages. Simon has already pointed out that it's not relocatable. It's also limited to the size of virtual memory and takes up that amount of space even though most of it may not be used. Loading a complete persistent memory at once is overkill. You should load it's objects lazily and only store away accessible stuff at commit time. Tony Davie, Computer Science, St.Andrews University, North Haugh, St.Andrews Scotland, KY16 9SS, Tel: +44 1334 463257, Fax: +44 1334 463278 mailto:[EMAIL PROTECTED] Home: http://www.dcs.st-and.ac.uk/~ad/Home.html Handel Index and Chronology: http://bruichladdich.dcs.st-and.ac.uk/HandelWWW/HandelCat.html Limerick Archive: http://bruichladdich.dcs.st-and.ac.uk/LimericksDir/Limericks.html 'There is magic in the web' - Othello, Act 3, Scene 4
Re: Binary files in Haskell
--MimeMultipartBoundary Content-Type: text/plain; charset="us-ascii" Tony Davie wrote: This has well know disadvantages. Simon has already pointed out that it's not relocatable. How is it done in APL and in other systems that do it (SmallTalk and some LISP systems, according to another poster)? Loading a complete persistent memory at once is overkill. Perhaps but it has an appealing simplicity :) Dry-freezing your workspace by hand, so to speak, must be a rather involved task, if probably an interesting one. -- O.L. --MimeMultipartBoundary--
Re: Binary files in Haskell
"Steve" == Steve Roggenkamp [EMAIL PROTECTED] writes: Steve I would like to use Haskell for several larger scale Steve projects, but I can't figure out how to read and write Steve binary data. It does not appear that the language supports Steve binary files. Am I missing something? I have the same problem, but it is not intractable. I'd like some comments from Glasgow people on the last possibility below. There is the (automatically deriveable) Binary class, defined in the Haskell Report ? Unfortuately ghc does not support it yet. But ghc does provide the Native class (not automatically deriveable), but with many pre-defined instances. Using Native should be very like using C++ streams. The function that flattens an arbitrary (mostly) type into bytes is probably a polytypic function. There is a polytypic pre-processor for Haskell called Polyp, but it has severe restrictions at present. It was claimed some time ago that the next version of Polyp would remove the restrictions. This would be rather useful. In the meantime there is another pre-processor called Derive, which can be used to automatically derive instances of non-standard classes (such as Native). It lacks the elegance and generality of Polyp, but it is usable now. See http://www.dcs.gla.ac.uk/~nww/derive.html . Another thing you will need is some way to transport your binary data between platforms e.g. Intel - Alpha. It would probably be straighforward to make an endian-aware subclass of Native, that kept the binary representation in network-byte-order. If no-one else does this I will at some stage. Of course transporting between platforms may be quite rare, in which case ascii (Show and Read) might work. There have been claims that Read is sometimes very inefficient at parsing large structures - it *might* need a million years for a large file. There is another way one might proceed. Why not just use a memory-mapped file (mmap) to make the data persistent in-place ? This would be a way of getting efficient persistence of (almost) any Haskell data structure without any code-writing and without any flattening. There are a few difficulties here, but it may well be feasible. The problems are: (i) The data structure better not contain any lazy closures, since they are unlikely to be valid on a future run of the program. Solution: restrict this technique to strict data structures for now. (ii) When the data structure is constructed, ghc will put in bits of memory from all-over the heap. Solution: use the 2-space copying garbage collector to gc the data structure (just give it the data structure for its root set), with the too-space being the memory from the mmapped file e.g. obtained using mmalloc. This will eliminate ordinary ghc heap from the data structure, so the file can be closed. For a large file (e.g. several Gb), closing would take quite a while because of the gc. It would also need (temporarily) twice the storage space of the file. Doing the above might just be a case of knowing how to call the garbage collector appropriately. Any comments from Glasgow ? Tim -- -- T.R.BARBOUR Email : [EMAIL PROTECTED] -- Department of Computer Science The University of Melbourne Parkville, Victoria 3052 Australia --
Re: Binary files in Haskell
Timothy Robin BARBOUR [EMAIL PROTECTED] writes: There is another way one might proceed. Why not just use a memory-mapped file (mmap) to make the data persistent in-place ? This would be a way of getting efficient persistence of (almost) any Haskell data structure without any code-writing and without any flattening. There are a few difficulties here, but it may well be feasible. The problems are: If I understand this right, you're suggesting essentially dumping out part of the Haskell heap to a file. Assuming the data you want to dump out is closed (i.e. has no external references) and is fully evaluated, there are still some problems remaining: - The data is location-dependent, and the exact location the data was originally resident at might not be available at load-time, so it would have to be relocated. - Heap-resident data in GHC refers (by address) to the code of the program itself, and to parts of the run-time system. You'd have to do some kind of dynamic linking to be able to load data back again. - The format of heap-resident data varies depending on things like profiling, so you wouldn't be able (for instance) to load data generated by a profiled program into one that isn't. So, you could avoid these problems by always loading the data at the same address in memory, and only using one program to save/load data. If you recompiled the program, you'd have to regenerate the persistent store. This seems overly restrictive, it might be simpler to just use the Native or Binary classes. Cheers, Simon -- Simon Marlow [EMAIL PROTECTED] University of Glasgow http://www.dcs.gla.ac.uk/~simonm/ finger for PGP public key
Re: Binary files in Haskell
Phil Trinder asked: Do any Haskell implementations support Binary files? At York we have a version of nhc13 which supports, amongst other things, binary file I/O. We hope to release it to the world fairly soon. We have confirmed that binary files enable a faster data transfer rate than the textual parsing of ordinary files, as well as delivering considerable space savings. As with ordinary files, our implementation allows only data to be stored - functional values are somewhat tricky. Also, it is easy to lose sharing in the data structure when it is converted to the binary format - however we now have a means of avoiding that loss. More details will be available soon. Malcolm