Re: [polyml] Re: reading a file on byte level
On Wed, 17 Oct 2012, David Matthews wrote: Michael, It's probably a good idea to send questions like this to the Poly/ML mailing list and I'm copying this reply to the list. There are people on the list who may have experience of what you're trying to do and also any replies get archived and reduce the chances of the same question being asked repeatedly. To reply to your specific question: there's no difference in efficiency between TextIO and BinIO on Unix. On Windows TextIO is more complicated because of the need to convert between CRLF and \n. I'm not clear how you are accessing the file, whether it is random or sequential. To read individual bytes you can always use BinIO.input1 or TextIO.input1. This would save you having to explode a vector. BinIO deals with values of type Word8.word and the vector is Word8Vector.vector. These are unsigned byte values. Regards, David On 16/10/2012 20:59, Michael Moeller wrote: Hi, I've got a problem regarding huge files as input. I need to get access to the single byteslike with fgetc() in C. TextIO and subsequently exploding the string does the jobbut performance matters and it seems to me like BinIOsaves me from the 'explode' step.Unfortunately I can't figure out how to access theresulting BinIO.vector. I'd be appreciative to seean exampleand please tell me if this is not the right approach. Regards, Michael David probably intended to send this back to Michael Moeller, and just got the M completion wrong. Nonetheless, I can offer two hints from the Isabelle/ML code base: (1) Combinators fold_lines and fold_pages to access large text files in a scalable manner: http://isabelle.in.tum.de/repos/isabelle/file/Isabelle2012/src/Pure/General/file.ML#l132 (2) Some applications of BinIO.vector over sockets, which had to be trashed for compatibility with old Poly/ML 5.4.x versions: http://isabelle.in.tum.de/repos/isabelle/file/63144ea111f7/src/Pure/System/system_channel.ML http://isabelle.in.tum.de/repos/isabelle/file/8b20be429cf3/src/Pure/System/system_channel.ML Makarius ___ polyml mailing list polyml@inf.ed.ac.uk http://lists.inf.ed.ac.uk/mailman/listinfo/polyml
Re: [polyml] Re: reading a file on byte level
On Wed, 17 Oct 2012, Michael Moeller wrote: Thanks for the links, I'll look into it. Talking about large amounts of data, do you happen to have an example of using HashArray? This would be great. Maybe someone else on the list has experience with hash tables. In Isabelle/ML we do the large data management in memory as immutable tree structures. This allows to conserve a lot of memory in Poly/ML 5.5.0 in particular, where David provides this exceedingly nice online sharing of old/life data on the heap. It also works well with multicore / multithread programming to have everything persistent, immutable, sharable by default. The main workhorse for map/set-like structures in Isabelle/ML is http://isabelle.in.tum.de/repos/isabelle/file/Isabelle2012/src/Pure/General/table.ML This is a quite efficient and reasonably simple implementation of 2-3 trees. When starting that in 1999, I've first had a look at the red-black tree implementation in SML/NJ (ported from C) and was not convinced -- later some people found actual problems in it that might be still there. Another prerequisite of efficiency are reasonable order functions defined on the key values for indexing tables. E.g. for type string we look at the length in the descriptor first, before walking through the memory to compare the text (you cannot do this in C). Same for complex types like lists, trees, lambda-terms etc.: first compare superficial outline structure, then walk down to the content. Since red-black trees are more present in the public these days (why?), several people have worked on this verified implementation using Isabelle/HOL: http://isabelle.in.tum.de/repos/isabelle/file/Isabelle2012/src/HOL/Library/RBT_Impl.thy http://isabelle.in.tum.de/repos/isabelle/file/Isabelle2012/src/HOL/Library/RBT.thy You can backport that to SML manually, or use the Isabelle/HOL generator to produce SML, Haskell, Scala etc. -- the result would not look much worse than the C - SML translation performed by hand in SML/NJ some years ago. Makarius ___ polyml mailing list polyml@inf.ed.ac.uk http://lists.inf.ed.ac.uk/mailman/listinfo/polyml