Re: [Haskell-cafe] Same compiled program behaving differently when called from ghci and shell
On 21/11/2010 06:49, Bruno Damour wrote:
> Hello, I have a very strange (for me) problem that I managed to reduce to this: I have a small program that reads a file containing only one character (è = e8). The program is ftest2.hs: [...]

The only difference I can see is the codepage used. The Windows console uses codepage 850:
http://stackoverflow.com/questions/1259084/what-encoding-code-page-is-cmd-exe-using
whereas the default Windows codepage for western languages is 1252. Now, as fate would have it (Python console):

>>> '\xe8'.decode('cp1252').encode('cp850')
'\x8a'
>>> '\xde'.decode('cp1252').encode('cp850')
'\xe8'

You can now see the possible cause of the problem. Try changing the codepage of the console. See also:
http://www.postgresql.org/docs/9.0/interactive/app-psql.html#AEN75686
[...]

Regards Manlio

___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Same compiled program behaving differently when called from ghci and shell
On 21/11/2010 19:06, Bruno Damour wrote:
>> [the first message and the Python codepage example] [...]
> yes, I kind of began to figure that IO might use an environment setting.

Did you try executing the program again, after setting the console codepage to 1252?

> That sounds a bit weird to me (newbie), that it should impact the result of a program depending on where it is launched... it's the same binary anyway? or?

This is only a guess, but recent versions of the GHC I/O library do low level character decoding when reading a file in text mode. This is the correct way, since a Char is supposed to be a Unicode character. I assume that when reading a text file, the I/O library just checks the system encoding and uses it.

In your case, you have a text file encoded with codepage 1252, but GHC is trying to read it using codepage 850 instead. So, as in the example I posted, you have (using, again, Python syntax):
- the character u'è', Unicode code point 0xe8
- a byte in the file, 0xe8; this is the result of u'è'.encode('cp1252')
- a Haskell Char '\xde'; this is the result of '\xe8'.decode('cp850')

There are 3 solutions:
1) open the file in binary mode
2) set the console codepage to 1252.
I do this by changing the Command Prompt shortcut destination to:
`%SystemRoot%\system32\cmd.exe /k chcp 1252`
3) explicitly set the encoding when reading the file in text mode.

Unfortunately the latter is a rather low level and GHC specific operation:
http://www.haskell.org/ghc/docs/6.12.2/html/libraries/base-4.2.0.1/GHC-IO-Handle.html

The Python API, by the way, is:
http://docs.python.org/dev/py3k/library/functions.html#open

The GHC API is quite different (if I understand it correctly): you can change the encoding only after the file has been opened, and you can change it again after having read some data (in Python, instead, the file encoding is immutable).

Regards Manlio
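The third option can be sketched as follows (a minimal, hedged example: the file name is made up, and the encoding name "CP1252" must be known to the platform's iconv/codepage machinery). The single byte 0xe8 is written in binary mode, then read back in text mode with an explicitly chosen encoding, so the result no longer depends on the console codepage:

```haskell
import System.IO
import qualified Data.ByteString as B

main :: IO ()
main = do
    -- write the single byte 0xe8 ('è' in codepage 1252), bypassing any text encoding
    B.writeFile "ftest.txt" (B.pack [0xe8])
    h <- openFile "ftest.txt" ReadMode
    -- mkTextEncoding and hSetEncoding are exported from System.IO since base 4.2
    enc <- mkTextEncoding "CP1252"
    hSetEncoding h enc
    c <- hGetChar h
    hClose h
    print c  -- '\232', independent of the console codepage
```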
Re: [Haskell-cafe] Same compiled program behaving differently when called from ghci and shell
On 21/11/2010 19:28, Bruno Damour wrote:
> [...] Of course you're right but that was a surprise to me...
> G:\CODE\rlib> chcp 1252
> Page de codes active: 1252
> G:\CODE\rlib> ftest3.exe è
> Just '1'
> G:\CODE\rlib> chcp 850
> Page de codes active : 850
> G:\CODE\rlib> ftest3.exe è
> Just '2'
> Quite treacherous IMHO? Or what?

It is not treacherous at all. When you open a file, GHC uses localeEncoding, which, as the name suggests, depends on the current system codepage (on Windows). In your example, you are simply changing the codepage, and thus the program behaviour changes accordingly.

It is the same as with a program that prints some environment variable (for example with System.Environment.getEnvironment): if you change the OS environment from the console, that program's behaviour will change. And it is also the same when your program reads a file's content: if you change the file content from elsewhere, the program behaviour will change.

As for the original example, I just think that the GHC user guide *should* clearly explain what it means to open a file in text mode [1], and, if possible, add a note about the Windows console (as has been done in the PostgreSQL documentation).

[1] right now I do not remember what the Haskell Report says

Regards Manlio
Re: [Haskell-cafe] Same compiled program behaving differently when called from ghci and shell
On 21/11/2010 21:51, Manlio Perillo wrote:
> [...] There are 3 solutions: 1) open the file in binary mode 2) set the console codepage to 1252. I do this by changing the Command Prompt shortcut destination to: `%SystemRoot%\system32\cmd.exe /k chcp 1252` 3) explicitly set the encoding when reading the file in text mode. Unfortunately this is now a rather low level and GHC specific operation: http://www.haskell.org/ghc/docs/6.12.2/html/libraries/base-4.2.0.1/GHC-IO-Handle.html

Correction: encoding support is in System.IO (base 4.2 package), but it is not documented in the Haskell 2010 Report.

By the way: what is the rationale for the TextEncoding data type not containing the encoding name?

Regards Manlio
Re: [Haskell-cafe] Re: dynamic loading of code on windows
On 12/11/2010 19:01, Kevin Jardine wrote:
> This isn't about the plugin functionality, it's about compiling code. As the message says: This requires a Unix compatibility toolchain such as MinGW+MSYS or Cygwin.

Is it really necessary to use autoconf? I have read the autoconf.ac file and I'm not sure I understand why autoconf is used.

Regards Manlio
[Haskell-cafe] back doors into the IO monad
Hi. What are the available methods to execute IO actions from pure code? I know only unsafePerformIO and foreign import (to call an impure foreign function).

Assuming I want to execute external untrusted code using plugins (via the `plugins` package), is it possible to completely forbid execution of IO actions from user code?

Thanks Manlio
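For reference, a minimal sketch of the unsafePerformIO back door (the value and its effect are made up for illustration). This is exactly what would need to be forbidden for untrusted plugin code, typically by rejecting or restricting imports of System.IO.Unsafe:

```haskell
import System.IO.Unsafe (unsafePerformIO)

-- A pure-looking value whose evaluation secretly performs IO.
leak :: Int
leak = unsafePerformIO $ do
    putStrLn "side effect escaping the IO monad"
    return 42

main :: IO ()
main = print leak
```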
[Haskell-cafe] surrogate code points in a Char
Hi. The Unicode Standard (version 4.0, section 3.9, D31, page 76) says:
> Because surrogate code points are not included in the set of Unicode scalar values, UTF-32 code units in the range D800..DFFF are ill-formed

However GHC does not reject these code units:

Prelude> print '\xD800'
'\55296'

Is this correct behaviour? Note that Python, too (2.5.4, UCS4 build, Linux Debian), accepts these code units.

Thanks Manlio
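A small helper (defined here for illustration, not a library function) makes it easy to check for such code units; as noted above, GHC happily constructs them:

```haskell
import Data.Char (ord)

-- True for code points in the surrogate range U+D800 .. U+DFFF,
-- which are not Unicode scalar values.
isSurrogate :: Char -> Bool
isSurrogate c = ord c >= 0xD800 && ord c <= 0xDFFF

main :: IO ()
main = mapM_ (print . isSurrogate) ['\xD800', '\xDFFF', 'a']
```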
Re: [Haskell-cafe] attoparsec and parsec
Jason Dusek wrote:
> To add to the confusion, I forked `bytestringparser` when I wrote the `json-b` package. The fork is here: http://hackage.haskell.org/package/bytestringparser-temporary/ I have added a number of things to the original, as well as fixing some problems with it. The reason I went with the older package is that the new one depended on stuff that wouldn't build on Hackage,

I installed attoparsec yesterday without problems.

> so I was like whatever; however, I now consider that it might have been better to work off the newer package. A subtle error, corrected in my version, seems still to be present in `attoparsec-0.7.2`, in an operator used internally to build up the result set.
>
> {-# LINE 132 Data/Attoparsec/Internal.hs #-}
> -- | Turn our split representation back into a normal lazy ByteString.
> (+:) :: SB.ByteString -> LB.ByteString -> LB.ByteString
> sb +: lb | SB.null sb = lb
>          | otherwise  = LB.Chunk sb lb
>
> Where this operator showed up in `bytestringparser`, I replaced `LB.Chunk` with the smart constructor, `LB.chunk`, to ensure that the no-empty-chunks invariant of lazy `ByteString`s was followed (I discovered this failing one evening when I was fleshing out the JSON parser).

Did you try to contact the author/maintainer? This test could be added to the test suite.

Thanks Manlio
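The fix described above amounts to replacing the `LB.Chunk` constructor with the `LB.chunk` smart constructor, which silently drops empty strict chunks. A hedged sketch, using the `Data.ByteString.Lazy.Internal` module that exports both:

```haskell
import qualified Data.ByteString as SB
import qualified Data.ByteString.Lazy as LB
import qualified Data.ByteString.Lazy.Internal as LBI

-- The smart-constructor version of (+:): LBI.chunk never builds an
-- empty Chunk, so the lazy ByteString invariant is preserved.
(+:) :: SB.ByteString -> LB.ByteString -> LB.ByteString
(+:) = LBI.chunk

main :: IO ()
main = print (LB.toChunks (SB.empty +: LB.empty))  -- []: no empty chunk is built
```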
[Haskell-cafe] Lexical Syntax and Unicode
Hi. Reading the Haskell 98 Report (section 9.2), I have found a possible problem. The lexical syntax supports Unicode, however this is not true for newline:

newline -> return linefeed | return | linefeed | formfeed

The Unicode standard adds two additional characters:
U+2028 LINE SEPARATOR
U+2029 PARAGRAPH SEPARATOR

The Unicode Character Database also defines two general categories:
Zl = Separator, line
Zp = Separator, paragraph

The Zl category contains only the LINE SEPARATOR character, and the Zp category contains only the PARAGRAPH SEPARATOR character. So, IMHO, the lexical syntax should be changed to:

newline -> return linefeed | return | linefeed | formfeed | uniLine | uniPara
uniLine -> any Unicode character defined as line separator
uniPara -> any Unicode character defined as paragraph separator

or, alternatively:

uniLine -> LINE SEPARATOR
uniPara -> PARAGRAPH SEPARATOR

Manlio Perillo
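Data.Char already exposes the Zl and Zp general categories, so a newline predicate along the lines of the proposed grammar can be sketched as:

```haskell
import Data.Char (generalCategory, GeneralCategory(..))

-- Newline characters: the report's return/linefeed/formfeed, plus the
-- Unicode line and paragraph separators (categories Zl and Zp).
isNewlineChar :: Char -> Bool
isNewlineChar c =
    c `elem` "\r\n\f" ||
    generalCategory c `elem` [LineSeparator, ParagraphSeparator]

main :: IO ()
main = print (map isNewlineChar ['\n', '\x2028', '\x2029', 'a'])
```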
[Haskell-cafe] attoparsec and parsec
Hi. I'm writing a generic log parsing package, and I'm searching for a parser combinator library. Parsing log files is a simple task, so I would like to use the most efficient solution.

I have looked at the attoparsec source code, and it seems very specialized for parsing lazy bytestrings, so I assume it is very efficient. How stable is the attoparsec package? How much more efficient is attoparsec than standard packages like parsec?

By the way: there seem to be problems with the generated documentation at http://hackage.haskell.org/package/attoparsec.

Moreover, there is a similar package: http://hackage.haskell.org/package/bytestringparser What is the status of this package? It has the same API as attoparsec, but it's older. However there is no indication that this package is deprecated in favour of attoparsec.

Thanks Manlio Perillo
Re: [Haskell-cafe] Converting IO [XmlTree] to [XmlTree]
Luke Palmer wrote:
> [...]
>> Note that it is not always possible to separate IO from pure code. As an example, consider an HTTP 1.1 server that reads a request body containing a number on each line, and returns a response body containing the sum of the numbers.
>
> What?
>
> sumRequest :: String -> String -- strips header, sums numbers, returns response
> sumServer :: IO () -- reads from socket, writes sumRequest to socket
>
> And many other permutations, with differing degrees of laziness and parametricity.

As long as you strictly read a string from the socket, this is ok. But you cannot read the string lazily.

Regards Manlio
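For what it's worth, the body-only part of Luke's sumRequest idea can be sketched in a line (ignoring the HTTP header stripping; the numbers default to Integer):

```haskell
-- Sum one number per line of the request body and render the total
-- as the response body.
sumBody :: String -> String
sumBody = show . sum . map read . lines

main :: IO ()
main = putStrLn (sumBody "1\n2\n3")
```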
Re: [Haskell-cafe] Converting IO [XmlTree] to [XmlTree]
Henning Thielemann wrote:
> On Tue, 14 Apr 2009, rodrigo.bonifacio wrote:
>> I guess this is a very simple question. How can I convert IO [XmlTree] to just a list of XmlTree?
> The old Wiki had: http://www.haskell.org/wikisnapshot/ThatAnnoyingIoType.html Should be ported to the new Wiki, since it is a FAQ...

Note that it is not always possible to separate IO from pure code. As an example, consider an HTTP 1.1 server that reads a request body containing a number on each line, and returns a response body containing the sum of the numbers.

Here, you cannot really separate IO from pure computation. And you cannot use lazy IO, if you want your server to support HTTP 1.1 pipelining.

Regards Manlio Perillo
Re: [Haskell-cafe] Maybe off-topic -- Writing contracts or software specifications
Maurício wrote:
> Hi, [...] I need, for instance, to write a contract with a programmer we are hiring for a task. [...]

The question is: how much do you trust the programmer? And how much does the programmer trust you? Much of the complication of contracts arises from the need to deal with parties that don't trust each other.

A few pages should suffice. Make sure that:
1) You explain accurately what the program must do, and how the programmer intends to write it. Do you need strong unit tests?
2) You write the deadline for program completion. What happens if the deadline is not honoured?
3) You write the estimated price for the work. Are price changes allowed?

Regards Manlio
[Haskell-cafe] advice on space efficient data structure with efficient snoc operation
Hi. I'm still working on my Netflix Prize project. For a function I wrote, I really need a data structure that is both space efficient (unboxed elements) and has an efficient snoc operation.

I have pasted a self contained module with the definition of the function I'm using:
http://hpaste.org/fastcgi/hpaste.fcgi/view?id=3453

The movie ratings are loaded from serialized data, and the result is serialized again, using the binary package:

transcodeIO :: IO ()
transcodeIO = do
    input <- L.hGetContents stdin
    let output = encodeZ $ transcode $ decodeZ input
    L.hPut stdout output

(here encodeZ and decodeZ are wrappers around Data.Binary.encode and Data.Binary.decode, with support for gzip compression/decompression)

This function (transcodeIO, not transcode) takes, on my Core2 CPU T7200 @ 2.00GHz:

real 30m8.794s
user 29m30.659s
sys 0m10.313s
1068 Mb total memory in use

The problem here is with snocU, which requires a lot of copying. I rewrote the transcode function so that the input data set is split in N parts:
http://hpaste.org/fastcgi/hpaste.fcgi/view?id=3453#a3456

The mapReduce function is the one defined in Real World Haskell. The new function takes (using only one thread):

real 18m48.039s
user 18m30.901s
sys 0m6.520s
1351 Mb total memory in use

The additional required memory is probably caused by unionsWith, which is not strict. The function takes less time, since array copying is optimized. I still use snocU, but on small arrays. GC time is very high: 54.4%.

Unfortunately I cannot test with more than one thread, since I get segmentation faults (probably a bug in the uvector package). I also got two strange errors (but these may just be a result of the segmentation fault; I'm not able to reproduce them):

tpp.c:63: __pthread_tpp_change_priority: Assertion `new_prio == -1 || (new_prio >= __sched_fifo_min_prio && new_prio <= __sched_fifo_max_prio)' failed.
internal error: removeThreadFromQueue: not found
(GHC version 6.8.2 for i386_unknown_linux)
Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug

Now the question is: what data structure should I use to optimize the transcode function? IMHO there are two solutions:

1) Use a lazy array, something like ByteString.Lazy, or what is available in the storablevector package. Using this data structure, I can avoid the use of appendU.

2) Use an unboxed list, something like:
http://mdounin.ru/hg/nginx-vendor-current/file/tip/src/core/ngx_list.h
That is: a linked list of unboxed arrays, but unlike the lazy array solution, a snoc operation avoids copying if there is space in the current array. I don't know if this is easy/efficient to implement in Haskell.

Any other suggestions?

Thanks Manlio Perillo
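Option 2 can be sketched in plain Haskell, with ordinary lists standing in for the unboxed chunk arrays (the names and chunk size here are made up); a real version would use UArr chunks, but the bookkeeping is the same: snoc only ever touches the small active chunk, never the already-filled ones:

```haskell
-- A chunked sequence: filled chunks in reverse order, plus the current
-- (also reversed) partially filled chunk and its length.
data ChunkSeq a = ChunkSeq [[a]] [a] !Int

chunkSize :: Int
chunkSize = 128

emptyCS :: ChunkSeq a
emptyCS = ChunkSeq [] [] 0

-- snoc: cons onto the small current chunk; flush it when full.
snocCS :: ChunkSeq a -> a -> ChunkSeq a
snocCS (ChunkSeq done cur n) x
    | n + 1 == chunkSize = ChunkSeq (reverse (x : cur) : done) [] 0
    | otherwise          = ChunkSeq done (x : cur) (n + 1)

toListCS :: ChunkSeq a -> [a]
toListCS (ChunkSeq done cur _) = concat (reverse (reverse cur : done))

main :: IO ()
main = print (toListCS (foldl snocCS emptyCS [1 .. 10 :: Int]))
```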
Re: [Haskell-cafe] advice on space efficient data structure with efficient snoc operation
Edward Kmett wrote:
> I'm in the process of adding a Data.Sequence.Unboxed to unboxed-containers. I hope to have it in hackage today or tomorrow, should its performance work out as well as Data.Set.Unboxed.

Looking at the data definition of Data.Sequence, I suspect that it is not really space efficient. Please note that I have 480189 arrays, where each array has, on average, 209 (Word16 :*: Word8) elements. Using a [(Word32, UArr (Word16 :*: Word8))] takes about 800 MB (but it's hard to measure exact memory usage).

[...]

Thanks Manlio
Re: [Haskell-cafe] binary package: memory problem decoding an IntMap
Manlio Perillo wrote:
> Hi. I'm having memory problems decoding a big IntMap. The data structure is: IntMap (UArr (Word16 :*: Word8)) There are 480189 keys, and a total of 100480507 elements (Netflix Prize). The size of the encoded (and compressed) data is 184 MB. When I load data from the Netflix Prize data set, total memory usage is 1030 Mb.

It seems there is a problem with tuples, too. I have a:

[(Word16, UArr (Word32 :*: Word8))]

This eats more memory than it should, since tuples are decoded lazily.

Manlio
Re: [Haskell-cafe] binary package: memory problem decoding an IntMap
Manlio Perillo wrote:
> [...] It seems there is a problem with tuples, too. I have a: [(Word16, UArr (Word32 :*: Word8))] This eats more memory than it should, since tuples are decoded lazily.

My bad, sorry. I solved it simply by using a strict consumer (foldl' instead of foldl).

Manlio
[Haskell-cafe] about import alias
Hi. Haskell 98 allows import aliases:

import qualified VeryLongModuleName as C

however it does not allow aliasing of imported names:

import NormalModule (veryLongFunctionName as C)

What is the rationale? IMHO this could be very useful in some cases.

Thanks Manlio
[Haskell-cafe] binary package: memory problem decoding an IntMap
Hi. I'm having memory problems decoding a big IntMap. The data structure is:

IntMap (UArr (Word16 :*: Word8))

There are 480189 keys, and a total of 100480507 elements (Netflix Prize). The size of the encoded (and compressed) data is 184 MB. When I load data from the Netflix Prize data set, total memory usage is 1030 Mb. However when I try to decode the data, memory usage grows too much (even using the -F1.1 option in the RTS).

The problem seems to be with the `fromAscList` function, defined as:

fromList :: [(Key,a)] -> IntMap a
fromList xs = foldlStrict ins empty xs
  where ins t (k,x) = insert k x t

(by the way, why does the IntMap module not use Data.List.foldl'?). The `ins` function is not strict in the value. This seems a hard problem to solve. First of all, IntMap should provide strict variants of the implemented functions. And the binary package should choose whether to use the strict or the lazy version.

For me, the simplest solution is to serialize the association list obtained from the `toAscList` function, instead of directly serializing the IntMap. The question is: can I reuse the data already serialized? Are the binary formats of `IntMap a` and `[(Int, a)]` compatible?

Thanks Manlio Perillo
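Given the serialized association list, a strict rebuild can be sketched like this (fromAscListStrict is a name made up here; forcing each value with seq before insertion keeps decode thunks from piling up):

```haskell
import qualified Data.IntMap as IM
import Data.List (foldl')

-- Rebuild the map from an (ascending) association list, forcing each
-- value before it is inserted.
fromAscListStrict :: [(Int, a)] -> IM.IntMap a
fromAscListStrict = foldl' ins IM.empty
  where ins m (k, v) = v `seq` IM.insert k v m

main :: IO ()
main = print (IM.toList (fromAscListStrict [(1, "a"), (2, "b")]))
```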
Re: [Haskell-cafe] uvector package appendU: memory leak?
wren ng thornton wrote:
> Manlio Perillo wrote:
>> Since ratings for each customer are parsed at the same time, using a plain list would consume a lot of memory, since stream fusion can only be executed at the end of the parsing. On the other hand, when I want to group ratings by movies, stream fusion seems to work fine. [...]
> For the problem as you've discussed it, I'd suggest a different approach: You can't fit all the data into memory at once, so you shouldn't try to. You should write one program that takes in the per-movie grouping of data and produces a per-user file as output.

Well, creating 480189 files in a directory is not a very nice thing to do to a normal file system. I could arrange the files in subdirectories, but then this starts to become too complex.

The solution I'm using now just works. It takes about 950 MB of memory and 35 minutes, but that is not a big problem, since:
1) Once loaded, I can serialize the data in binary format
2) I think the program can be parallelized, parsing subsets of the files in N threads, and then merging the maps. This method should optimize array copying. The problem is that unionWith seems to be lazy, and there is no strict variant; I'm not sure.

> Then have your second program read in the reorganized data and do fusion et al. This reduces the problem to just writing the PerMovie -> PerUser program. Since you still can't fit all the data into memory, that means you can't hope to write the per-user file in one go.

The data *do* fit into memory, fortunately.

[...]

> Best of luck.

Thanks Manlio
Re: [Haskell-cafe] uvector package appendU: memory leak?
Claus Reinke wrote:
>>> appendU is strict, insertWith just doesn't force it (follow the source link in the haddocks to see why).
>> Ok, I see. But, IMHO, this should be clearly documented.
> There seems to be some agreement that strict variant operations should also be provided, but it needs some more thinking, and whenever I look at the code, I have the feeling that there are just too many operations defined in it, so I look away again :-(

Well, it does not matter whether a strict variant operation is provided or not. But at least it should be documented whether an implemented operation is strict or lazy.

>> However, reading the code now, I prefer my version using alter.
> Yes, it is a more direct way to enforce strict evaluation. Btw, if your update function 'f' is strict in 'old' and 'new', you might just want to use 'Just $! f old new' and save all those 'seq's.

Right, thanks. And this seems to be a bit more memory friendly, too.

>> By the way, about insertWith/alter; from the IntMap documentation: insertWithKey: O(min(n,W)); alter: O(log n). So, is alter more efficient than insertWithKey? And what is that `W`?
> 'W' is a maximum bound (see comment at top of that haddock page).

Where? At the top of that page there is only a link to a paper where the algorithm is defined.

> 'alter' has some shortcuts if the update function doesn't actually do anything, but for reimplementing 'insertWithKey' the complexity should be the same, as the code would be the same.

Ok.

> But piling up appendUs is still not a good idea. For a moment, I thought that the stream representation's (+++) was handling runtime fusion gracefully, but a simple test case suggests otherwise, at least for the simpler case of consU (the attached appendU.hs doesn't do any appendUs, as the consU case already demonstrates the issue; be careful with large numbers here, it'll quickly eat your ram):

I'm not sure I fully understand the code. [Ok] But, again, IMHO it does not apply to my original problem. It is a question of resource constraints, probably.

> Instead of maintaining an 'IntMap (UArr Int)' all the way through buildup, you could use an 'IntMap [Int]' while parsing, and then compress into 'IntMap (UArr Int)' when parsing is finished.

No, as I have written, this is not possible for my problem, because it would eat all available memory after a few minutes.

The data I'm parsing from the Netflix Prize data set contains:
17770 movies
480189 customers (with customer ids ranging from 1 to 2649429)
100480507 total ratings

Ratings are grouped by movie, but I want to group them by customer. This means that each customer has few ratings (200, on average), so there are a lot of small arrays. Since ratings for each customer are parsed at the same time, using a plain list would consume a lot of memory, since stream fusion can only be executed at the end of the parsing. On the other hand, when I want to group ratings by movies, stream fusion seems to work fine. This is IMHO, of course.

> If you can afford the larger memory profile during parsing, this should give better overall performance, with the same memory need for the final state, but larger memory needs up to that state. I'd be interested in the differences, if you try it.

Unfortunately, I cannot afford this memory profile! My PC has only 2 GB of RAM. And I doubt many others could afford the required memory. Using UArr, the total required memory is about 900 MB.

Thanks Manlio
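Claus's 'Just $!' suggestion amounts to a strict insertWith built on alter; a sketch (insertWithStrict is a made-up name, shown on an IntMap of Ints for concreteness):

```haskell
import qualified Data.IntMap as IM

-- A strict variant of insertWith via alter: the combined value is
-- forced (Just $!) before it is stored in the map.
insertWithStrict :: (a -> a -> a) -> Int -> a -> IM.IntMap a -> IM.IntMap a
insertWithStrict f k new = IM.alter g k
  where
    g Nothing    = Just new
    g (Just old) = Just $! f old new

main :: IO ()
main = print (IM.toList (insertWithStrict (+) 1 2 (IM.fromList [(1, 3 :: Int)])))
```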
Re: [Haskell-cafe] uvector package appendU: memory leak?
Claus Reinke wrote:
>> Can I close this ticket as not being to do with uvector? -- Don
> You did notice the suggestion that performance of uvector and bytestring could be improved drastically if compile-time fusion were augmented with runtime fusion?

The storablevector package implements Data.StorableVector.Lazy. Just as with Data.ByteString.Lazy, it contains a linked list of chunks. I think this can improve the performance of my implementation, since it is much more efficient to append elements at the end of the vector (it avoids a lot of copying).

In my draft implementation of the Netflix Prize in the D language, I used a similar implementation, based on:
http://mdounin.ru/hg/nginx-vendor-current/file/tip/src/core/ngx_list.h
Unfortunately, the D garbage collector is really bad when there are a lot of allocations, so I gave up.

Manlio Perillo
Re: [Haskell-cafe] Haddock: inserting interactive sessions in the documentation
Wouter Swierstra wrote:
>> What is the suggested (if any) convention for inserting an interactive session in the documentation?
> You may want to look at: http://hackage.haskell.org/cgi-bin/hackage-scripts/package/DocTest

Ah, ok, thanks. So the convention is to just use `-- `.

Manlio
Re: [Haskell-cafe] uvector package appendU: memory leak?
Claus Reinke wrote:
>> But Claus was right, appendU is lazy; this seems to be the cause of the problem.
> appendU is strict, insertWith just doesn't force it (follow the source link in the haddocks to see why).

Ok, I see. But, IMHO, this should be clearly documented.

I have updated my test code:
http://hpaste.org/fastcgi/hpaste.fcgi/view?id=3103

The interesting thing is that using appendU is *faster*.

Using appendU:
real 0m38.736s
user 0m38.638s
sys 0m0.048s

Using snocU:
real 0m41.279s
user 0m40.939s
sys 0m0.040s

Memory usage is the same.

>> However now I don't really understand why the two implementations differ in laziness. Or, to ask a different question, how can I make the version using insertWith strict?
> deja vu :-( http://www.haskell.org/pipermail/haskell-cafe/2009-March/057032.html

You are right, sorry. The problem is that at the time I was not able to fully understand the code! However, reading the code now, I prefer my version using alter.

By the way, about insertWith/alter; from the IntMap documentation:
insertWithKey: O(min(n,W))
alter: O(log n)
So, is alter more efficient than insertWithKey? And what is that `W`?

> As you've noticed, alter also allows to enforce strictness. But piling up appendUs is still not a good idea. For a moment, I thought that the stream representation's (+++) was handling runtime fusion gracefully, but a simple test case suggests otherwise, at least for the simpler case of consU (the attached appendU.hs doesn't do any appendUs, as the consU case already demonstrates the issue; be careful with large numbers here, it'll quickly eat your ram):

I'm not sure I fully understand the code. But, again, IMHO it does not apply to my original problem.

[...]

Thanks Manlio
Re: [Haskell-cafe] uvector package appendU: memory leak?
Don Stewart wrote:
> Can I close this ticket as not being to do with uvector?

Yes, thanks; and sorry for the noise. But it may be interesting to study the example I have pasted:
http://hpaste.org/fastcgi/hpaste.fcgi/view?id=3103

I find it a bit surprising that using appendU is actually faster than using snocU.

Using appendU:
real 0m38.736s
user 0m38.638s
sys 0m0.048s

Using snocU:
real 0m41.279s
user 0m40.939s
sys 0m0.040s

Manlio
[Haskell-cafe] Haddock: inserting interactive sessions in the documentation
Hi. What is the suggested (if any) convention for inserting an interactive session in the documentation? Right now I'm doing (in the random-shuffle package):

-- |Convert a sequence @(e1...en)@ to a complete binary tree.
--
-- @
-- System.Random.Shuffle> buildTree ['a','b','c','d','e']
-- Node 5 (Node 4 (Node 2 (Leaf 'a') (Leaf 'b'))
--                (Node 2 (Leaf 'c') (Leaf 'd')))
--        (Leaf 'e')
-- @
buildTree :: [a] -> Tree a

As an example, in Python we use:

>>> 3 + 2
5

One nice feature of this special syntax is that it can be parsed, using doctest.

Thanks Manlio
[Haskell-cafe] uvector package appendU: memory leak?
Hi. As with a previous post, I think I have found a possible memory problem with the uvector package. I have this data structure (for, again, my Netflix Prize project):

IntMap (UArr (Word16 :*: Word8))

I was adding elements to the map using something like:

v = map singletonU (a :*: b)
insertWith appendU k v m

However doing this eats a *lot* of memory. Today I have rewritten my program to use `alter` and `snocU`:

append Nothing = Just $ singletonU (a :*: b)
append (Just u) = Just $ snocU u (a :*: b)

alter append k m

This, finally, works. Unfortunately I'm still not able to load the entire Netflix Prize training data set, grouping ratings by customer, because my PC has only 2 GB of RAM. The required memory is about 2 GB, but when the system starts to swap, I have to kill the program.

So the question is: why does appending an array of only one element to an existing array cause memory problems? This should be practically the same as adding an element.

Thanks Manlio
Re: [Haskell-cafe] uvector package appendU: memory leak?
Don Stewart wrote:
> [...]
>> So the question is: why does appending an array of only one element to an existing array cause memory problems?
> It must copy the entire array.

Isn't it the same with snocU? And, since the final result is the same, what happens to the temporary memory used for array copying?

I have executed the program with:
+RTS -A128M -s -c -F1.1 -RTS
The memory seems to leak.

> -- Don

Thanks Manlio
Re: [Haskell-cafe] uvector package appendU: memory leak?
Manlio Perillo wrote:
> Hi. As with a previous post, I think I have found a possible memory problem with the uvector package. I have this data structure (for, again, my Netflix Prize project): IntMap (UArr (Word16 :*: Word8)) [...] Today I have rewritten my program to use `alter` and `snocU`: append Nothing = Just $ singletonU (a :*: b) append (Just u) = Just $ snocU u (a :*: b) alter append k m [...] Unfortunately I'm still not able to load the entire Netflix Prize training data set, grouping ratings by customer, because my PC has only 2 GB of RAM. The required memory is about 2 GB, but when the system starts to swap, I have to kill the program.

Just another small strictness hint:

append (Just u) = u `seq` Just $ snocU u (a :*: b)

and now memory usage is 955 MB.

Regards Manlio
Re: [Haskell-cafe] uvector package appendU: memory leak?
Claus Reinke wrote:
>> IntMap (UArr (Word16 :*: Word8))
>> I was adding elements to the map using something like:
>> v = map singletonU (a :*: b)
>> insertWith appendU k v m
>> However doing this eats a *lot* of memory.
> Since 'insertWith' doesn't actually do the 'appendU', the appends will also be compressed in time, at point of use, rather than spread out in time, over construction. Which might be inconvenient if large volumes of data are involved.

Ah, ok; that may explain the problem, but I'm still not sure. With this program: http://hpaste.org/fastcgi/hpaste.fcgi/view?id=3063 memory usage seems ok: 955 MB, against 622 MB when ratings are grouped by movie (using [(MovieID, UArr (Word32 :*: Word8))]). The version using `insertWith appendU` is here: http://hpaste.org/fastcgi/hpaste.fcgi/view?id=3063#a3065 I still cannot explain such a large difference in memory usage.

>> So the question is: why does appending an array of only one element to an existing array cause memory problems?
> It must copy the entire array. And doing this repeatedly, with arrays of increasing length, would not be a good idea. I can't really see other solutions. For single-threaded use, one might use fusion to avoid the construction/recopying of the intermediate arrays.

Fusion is not possible in my case, IMHO. I'm parsing movie ratings, where we have customer ratings for each movie, in separate files (17770 movies in total). Each file has the format:

movie1:
customer1,rating1,date1
customer2,rating2,date2
...

movie2:
customer3,rating3,date3
...

I want to group ratings by customer, instead of grouping them by movie. Stream fusion is only possible in the latter case (but in that case I don't even need an IntMap). Am I missing something? [...]

Manlio
Re: [Haskell-cafe] uvector package appendU: memory leak?
Don Stewart wrote:
> manlio_perillo:
>> Isn't it the same with snocU? And, since the final result is the same, what happens to the temporary memory used for array copying? I have executed the program with: +RTS -A128M -s -c -F1.1 -RTS The memory seems to leak.
> Send me a test case.

http://hpaste.org/fastcgi/hpaste.fcgi/view?id=3071

But Claus was right: appendU is lazy, and this seems to be the cause of the problem. However, now I don't really understand why the two implementations differ in laziness. Or, to ask a different question: how can I make the version using insertWith strict?

Thanks Manlio
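For the record, the answer today would be to force the combined value at insertion time. In the containers of that era this was `Data.IntMap.insertWith'`; in current containers the same behaviour lives in `Data.IntMap.Strict.insertWith`, which evaluates the combined value before storing it. A sketch with simplified types (plain lists in place of UArr). Note one subtlety: for lists this forcing is only to WHNF, while for a flat array like UArr, forcing the append means performing the whole copy, which is exactly the work the lazy version was postponing:

```haskell
import qualified Data.IntMap.Strict as IM

-- Data.IntMap.Strict.insertWith evaluates the combined value as it
-- inserts, so no chain of pending appends can pile up inside the map.
addRating :: Int -> (Int, Int)
          -> IM.IntMap [(Int, Int)] -> IM.IntMap [(Int, Int)]
addRating k x = IM.insertWith (flip (++)) k [x]

ratings :: IM.IntMap [(Int, Int)]
ratings = foldl (\m (k, x) -> addRating k x m) IM.empty
                [(1, (10, 2)), (1, (20, 3))]

main :: IO ()
main = print (IM.lookup 1 ratings)
```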
[Haskell-cafe] g++ std:map vs GHC IntMap
Hi. I have tried to compare the performance of the g++ std::map versus the GHC IntMap. The test consists of adding 1000 elements to an empty map. Haskell code is here: http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2899 C++ code is here: http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2900 The execution time and CPU usage are almost the same. However the C++ version requires 305 MB, the GHC version 617 MB. gcc 4.3.2, ghc 6.8.2, on Debian Linux Lenny, i386. Is it really not possible to optimize memory usage in cases like this? I also tried with Data.HashTable: http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2902 but memory usage is 703 MB, and execution time is about 4.5 times slower! Thanks Manlio Perillo
Re: [Haskell-cafe] about Haskell code written to be too smart
Claus Reinke wrote:
> Continuing our adventures into stylistic and semantic differences :-)

Can you write this analysis on the wiki? Thanks!

Manlio
Re: [Haskell-cafe] g++ std:map vs GHC IntMap
Brandon S. Allbery KF8NH wrote:
> On 2009 Mar 26, at 11:39, Manlio Perillo wrote:
>> The execution time and CPU usage are almost the same. However the C++ version requires 305 MB, the GHC version 617 MB.
> I wonder how much of that is due to lifting (i.e. laziness).

http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2911

Performance is the same, but I'm not sure if this removes all laziness. I have changed

ins t (k, x) = insert k x t

to

ins t (k, x) = k `seq` x `seq` insert k x t

Manlio
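For reference, a self-contained skeleton of that strict insertion loop (this is my reconstruction of the benchmark's shape, not the hpaste code; the element count and values are placeholders):

```haskell
import Data.List (foldl')
import qualified Data.IntMap as IM

-- force key and value before the insert, as in the `seq` version above
ins :: IM.IntMap Int -> (Int, Int) -> IM.IntMap Int
ins t (k, x) = k `seq` x `seq` IM.insert k x t

-- foldl' keeps the accumulated map in WHNF at every step, so no chain
-- of suspended inserts builds up during the loop
filled :: IM.IntMap Int
filled = foldl' ins IM.empty [(k, 2 * k) | k <- [1 .. 1000]]

main :: IO ()
main = print (IM.size filled)  -- forcing the size walks the whole map
```

Note that `seq` on the key and value only evaluates them to WHNF, which for Int means fully; for more complex element types a deeper force may be needed to remove all laziness.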
Re: [Haskell-cafe] g++ std:map vs GHC IntMap
Bulat Ziganshin wrote:
> Hello Manlio, Thursday, March 26, 2009, 6:39:12 PM, you wrote:
>> The test consists of adding 1000 elements to an empty map.
> +RTS -c -F1.1 then read about garbage collection

It now requires 386 MB of memory, but is 4.7 times slower. So the memory required is now about the same as the C++ version, but how can I optimize memory usage without having to tweak the garbage collector?

Thanks Manlio
[Haskell-cafe] Re: generalized shuffle
o...@okmij.org wrote:
>> Hello! The reason is that in a project I need to extract random elements from a list (and I need to extract a lot of them), and using normal methods [1] seems to be rather inefficient.

Note that I used an incorrect term, sorry. What I need, in detail, is: given a population of 100480507 elements, extract 17770 random samples, where each sample has a size between 1 and 480189. Yes, it is again my Netflix Prize project; here I'm trying to write a program that generates the entire training data set using random data.

[1] By normal method I mean: extract a random number in the range [0, n] (where n is the length of the sequence), and get the element indexed by that number.

> BTW, have you tried to use Data.IntMap? That is, put the sequence into an IntMap (each element being indexed by its index) and then extract elements at random; IntMap seems to be quite efficient...

This should be ok if I need independent random choices from a population, but I need random samples instead; sorry for the confusion. Do you think that it is possible to further optimize your tree structure? Or to use IntMap instead?

> Cheers, Oleg

Thanks Manlio
Re: [Haskell-cafe] Re: about Haskell code written to be too smart
Heinrich Apfelmus wrote:
> Manlio Perillo wrote:
>> Conal Elliott wrote:
>>> Manlio, We live in the age of participation -- of co-education. Don't worry about text-books. Contribute to some wiki pages and blogs today that share these smart techniques with others.
>> When I started learning Haskell (on my own initiative), what I did was: [steps 1) - 9), mostly internet tutorials]
> I think you'd have had a much easier time by starting with a proper book right away, like Richard Bird's Introduction to Functional Programming in Haskell, accompanied by Real World Haskell.

Unfortunately, one year ago Real World Haskell was not yet available. And note that I have no problems with basic functional programming concepts; my problems are specific to Haskell.

> You see, the reason that books cost money is (should be) high quality content. :)

Manlio
Re: [Haskell-cafe] about Haskell code written to be too smart
wren ng thornton wrote:
> Manlio Perillo wrote: [...]
> Following directly from the Rule of Least Power, if you can get away with foreach then that's what you should use. Why? Because the less power the construct has, the fewer corner cases and generalizations a reader of the code needs to consider. Now, just because iterators exist does not mean that one should never use the more general tool. If you're fighting to break out of your chosen straitjacket, then chances are it's the wrong one to use in the first place; it'd be clearer to use more power and have less fighting. [...]

Note that, as I have already written, I agree with you. And this is one of the reasons I like Haskell. The main problem here is that:

- recursion and pattern matching are explained in every tutorial about functional programming and Haskell. This is the reason why I find them more natural.

- high level, Haskell-specific abstractions are *not* explained in normal tutorials or books. The libraries where these concepts are implemented are not well documented. Most of the documentation is in research papers, and a normal programmer doesn't want to read those papers. Only in the recent Real World Haskell have all these high level abstractions been properly documented.

Manlio
[Haskell-cafe] a cabal package with both a library and executable programs
Hi. This example is taken from the Cabal documentation:

Name: TestPackage
...

Library
  Build-Depends:   HUnit
  Exposed-Modules: A, B, C

Executable program1
  Main-Is:        Main.hs
  Hs-Source-Dirs: prog1
  Other-Modules:  A, B

Executable program2
  Main-Is:        Main.hs
  Hs-Source-Dirs: prog2
  Other-Modules:  A, C, Utils

Here I see a small problem. Why should I explicitly declare modules from the library that are used in the executables? I would like to do, instead:

...
Executable program1
  Main-Is:        Main.hs
  Hs-Source-Dirs: prog1
  Build-Depends:  TestPackage

Executable program2
  Main-Is:        Main.hs
  Hs-Source-Dirs: prog2
  Build-Depends:  TestPackage
  Other-Modules:  Utils

I tried this, but configuration fails, since Cabal does not find TestPackage. It is true that it does not yet exist, but we are going to build it! Should I file a feature request ticket, or is this how it is supposed to work?

Thanks Manlio
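For what it's worth, later versions of Cabal (1.8 and newer) added support for exactly this: an executable can name its own package in Build-Depends, and the dependency is resolved against the Library section being built in the same package. A sketch of how the request above looks there (field layout is mine, not from the Cabal docs):

```cabal
Name:    TestPackage
Version: 0.1
...

Library
  Build-Depends:   HUnit
  Exposed-Modules: A, B, C

Executable program1
  Main-Is:        Main.hs
  Hs-Source-Dirs: prog1
  -- resolved against the Library section above (Cabal >= 1.8)
  Build-Depends:  TestPackage
```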
[Haskell-cafe] Equations for `foo' have different numbers of arguments
Hi. There is a limitation in Haskell that I'm not sure I understand. Here is an example:

module Main where

divide :: Float -> Float -> Float
divide _ 0 = error "division by 0"
divide = (/)

main = do
  print $ divide 1.0 0.0
  print $ divide 4.0 2.0

With GHC I get:

Equations for `divide' have different numbers of arguments

With HUGS:

Equations give different arities for "divide"

However, the two equations really define the same two-argument function. What's the problem?

Thanks Manlio Perillo
[Haskell-cafe] about Haskell code written to be too smart
Hi. These days I'm in discussion with some friends who mainly use Python as their programming language, but know other languages well, such as Scheme, Prolog, and C. These friends are very interested in Haskell, but it seems that the main reason why they don't start seriously learning it is that when they start reading some code, they feel the Perl syndrome: that is, code written to be too smart, which ends up being totally illegible to a Haskell novice. I too have this feeling, from time to time. Since someone is starting to write the Haskell coding style, I really suggest taking this problem into strong consideration. Manlio
Re: [Haskell-cafe] Equations for `foo' have different numbers of arguments
Bulat Ziganshin wrote:
> Hello Manlio, Tuesday, March 24, 2009, 5:44:14 PM, you wrote:
>> divide _ 0 = error "division by 0"
>> divide = (/)
>> Equations for `divide' have different numbers of arguments
> you should write divide a b = a/b

Right. But my question was: why can't I write the function as I did? Why can't I write one of the equations in curried form?

Thanks (and thanks for the other responses; I will reply only here) Manlio
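A sketch of the fix, for completeness (my wording of the rule: all clauses of a function binding must have the same number of argument patterns, so the point-free second clause has to be eta-expanded to match the first):

```haskell
divide :: Float -> Float -> Float
divide _ 0 = error "division by 0"
divide a b = a / b   -- eta-expanded; the bare (/) clause has zero patterns

main :: IO ()
main = print (divide 4.0 2.0)
```

An alternative that keeps a single equation is to dispatch inside the body instead, e.g. `divide a b = if b == 0 then error "division by 0" else a / b`; then no clause-arity question arises at all.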
Re: [Haskell-cafe] about Haskell code written to be too smart
Tim Newsham wrote:
>> These friends are very interested in Haskell, but it seems that the main reason why they don't start seriously learning it is that when they start reading some code, they feel the Perl syndrome. That is, code written to be too smart, which ends up being totally illegible to a Haskell novice. I too have this feeling, from time to time. Since someone is starting to write the Haskell coding style, I really suggest taking this problem into strong consideration.
> When you think about it, what you are saying is that Haskell programmers shouldn't take advantage of the extra tools that Haskell provides.

No, I'm not saying this. But, as an example, when you read a function like:

buildPartitions xs ns = zipWith take ns . init $ scanl (flip drop) xs ns

that can be rewritten (arguments reversed) as:

takeList :: [Int] -> [a] -> [[a]]
takeList [] _ = []
takeList _ [] = []
takeList (n : ns) xs = head : takeList ns tail
  where (head, tail) = splitAt n xs

I think that there is a problem. buildPartitions contains too many blocks, and I have read code with even more blocks on one line. It may not be a problem for a seasoned Haskell programmer, but when you write some code, you should never forget that your code will be read by programmers who may not be at your level. I think that many Haskell programmers forget this detail, and IMHO this is wrong.

> Haskell provides the ability to abstract code beyond what many other programming systems allow. This abstraction gives you the ability to express things much more tersely. This makes the code a lot harder to read for people who are not familiar with the abstractions being used.

The problem is that I still have trouble reading and understanding code that is too terse... because I have to assemble each block in my mind, and if there are too many blocks I have problems. [...]

Manlio
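A self-contained sketch of the two versions being compared, runnable side by side (I've renamed the `head`/`tail` bindings to avoid shadowing the Prelude names; note the two functions differ on ragged input, where takeList stops when the list is exhausted while buildPartitions keeps emitting empty chunks):

```haskell
-- the point-free original (argument order: xs, then ns)
buildPartitions :: [a] -> [Int] -> [[a]]
buildPartitions xs ns = zipWith take ns . init $ scanl (flip drop) xs ns

-- the explicit recursive rewrite (arguments reversed)
takeList :: [Int] -> [a] -> [[a]]
takeList [] _ = []
takeList _ [] = []
takeList (n : ns) xs = chunk : takeList ns rest
  where (chunk, rest) = splitAt n xs

main :: IO ()
main = do
  print (buildPartitions "abcdefg" [2, 3, 2])  -- ["ab","cde","fg"]
  print (takeList [2, 3, 2] "abcdefg")         -- ["ab","cde","fg"]
  -- ragged input: here the two versions disagree
  print (buildPartitions "ab" [3, 1])          -- ["ab",""]
  print (takeList [3, 1] "ab")                 -- ["ab"]
```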
Re: [Haskell-cafe] about Haskell code written to be too smart
Jake McArthur wrote:
> Manlio Perillo wrote:
> | These friends are very interested in Haskell, but it seems that the main
> | reason why they don't start to seriously learning it, is that when they
> | start reading some code, they feel the Perl syndrome.
> |
> | That is, code written to be too smart, and that end up being totally
> | illegible by Haskell novice.
> |
> | I too have this feeling, from time to time.
> I used to think this as well, but have since changed my mind about most cases.

The same for me. [...]

> All these factors combined just means that you have to concentrate just as hard to understand one line of Haskell as you might 10 or 20 lines of other languages. There is 10 or 20 times the amount of information.

This is right. The problem is that often (IMHO) a function definition can be rewritten so that it is much more readable. As an example, see the takeList function I posted. In other cases, you can just break long lines, introducing intermediate functions that have a descriptive name *and* a type declaration. Doing this is an art, but a coding style for Haskell should try to document it. [...]

Manlio
Re: [Haskell-cafe] about Haskell code written to be too smart
Jake McArthur wrote:
> Manlio Perillo wrote:
> | This is right.
> | The problem is that often (IMHO) a function definition can be rewritten
> | so that it is much more readable.
> |
> | As an example, with the takeList function I posted.
> I looked at it, found nothing wrong with the original, and absolutely hated your fixed version.

With the original version, you have to follow 3 separate operations:

Prelude> let xs = [1, 2, 3, 4] :: [Int]
Prelude> let ns = [3, 1] :: [Int]
Prelude> let _1 = scanl (flip drop) xs ns
Prelude> let _2 = init _1
Prelude> let _3 = zipWith take ns _2

With my function, instead, you only have to follow 1 operation:

Prelude> let (head, tail) = splitAt n xs

[...] Manlio
Re: [Haskell-cafe] about Haskell code written to be too smart
Yitzchak Gale wrote: [...]
> So the bottom line is that Manlio is right, really. It's just that Haskell is still very different from what most programmers are used to. So it does take a while to get a feeling for what is too smart.

Right, you have pinpointed the problem! The problem is where to place the separation line between normal and too smart. Your function is readable, once I mentally separate each step. For someone with more experience, this operation may be automatic, and the function may appear totally natural. When writing these dense functions, it is important, IMHO, to help the reader with comments, or by introducing intermediate functions.

Manlio
Re: [Haskell-cafe] about Haskell code written to be too smart
Jake McArthur wrote: [...]
> | With my function, instead, you only have to follow 1 operation:
> |
> | Prelude> let (head, tail) = splitAt n xs
> I think you are way oversimplifying your own code.
>
>   takeList :: [Int] -> [a] -> [[a]]
>   takeList [] _ = []
>   takeList _ [] = []
>   takeList (n : ns) xs = head : takeList ns tail
>     where (head, tail) = splitAt n xs
>
> In order to understand this, I have to look at three different cases, an uncons, a splitAt, a cons, *and* a recursive call. This is *seven* different things I have to absorb.

These cases are, IMHO, more natural. We have a set of equations, pattern matching, and recursion. These are among the basic building blocks of Haskell. The only foreign building block is the splitAt function. But this may really be a question of personal taste or experience. What is more natural?

1) pattern matching
2) recursion

or

1) function composition
2) higher-order functions

? [...] Manlio
Re: [Haskell-cafe] about Haskell code written to be too smart
Conal Elliott wrote:
> Another helpful strategy for the reader is to get smarter, i.e. to invest effort in rising to the level of the writer. Or just choose a different book if s/he prefers. - Conal

This strategy is doomed to failure, unfortunately. We live in the real world, and compromises are necessary. [...]

Manlio
Re: [Haskell-cafe] about Haskell code written to be too smart
Jonathan Cast wrote: [...]
> I think, in general, the best way to document the purpose of the function is
>
> -- | Split a list into a sequence of partitions of specified length
> takeList :: [Int] -> [a] -> [[a]]

Note that I was not speaking about the best way to document a function. I was speaking about the best way to write a function, so that it may help someone who is learning Haskell. [...]

Manlio
Re: [Haskell-cafe] about Haskell code written to be too smart
Zachary Turner wrote: [...]
> but I do understand that one of the primary use cases and/or motivating factors for using Haskell is when you really just NEED that extra abstraction and power you get from being able to do these types of things. Someone once said that simple problems should be simple and difficult problems should be possible. That doesn't mean the difficult problems become EASY. One of the best uses for Haskell is solving difficult problems. It's obviously still going to be difficult to solve, and as such the writer (and hence, by extension, the reader) is going to have to be smart as well.

I agree with you, and in fact I'm still learning Haskell. The reason I'm still learning Haskell is that I like its syntax. And yes, I also like the ability to write efficient functions by composing other functions. But there is a limit. In C you have the ability to write assembly code, but one usually thinks twice before doing so, since it will become unreadable to most people. If you think that writing low level assembly code is the best solution, you should at least document it well, instead of assuming that the reader is as smart as you. As I wrote at the beginning of the thread, there are people I know (*much* smarter than me) who keep away from Haskell because they start to read some code and feel something is wrong. They *think*: ah, the author wrote code in this way just to show how smart he is; how can I learn a language if most of the available code is written in this way? Note the use of the verb "think". This is only a sensation, and it is wrong; but sensations are important. [...]

Manlio
Re: [Haskell-cafe] about Haskell code written to be too smart
Erik de Castro Lopo wrote:
> Manlio Perillo wrote:
>> I was speaking about the best way to write a function, so that it may help someone who is learning Haskell.
> I've been learning Haskell for about 3 months. I think it's a mistake to write code so that it's easy for someone learning Haskell to read it. Code should be written to be easily read by other experienced users of the language.

Note that writing code so that it's easy to read does not just mean rewriting the code as I did in the example; it also means adding good comments, in the right places.

> Erik

Manlio
Re: [Haskell-cafe] about Haskell code written to be too smart
Conal Elliott wrote:
> I'd love to help newbies get the hang of Haskell without having to jump in the deep (and smart-infested) end first. And I'd love for people to keep writing smart code for non-newbies to enjoy. Perhaps a practical suggestion would be some wiki pages devoted to pointing out code with various learning qualities, to help haskellers of all levels of experience learn effectively.

Yes, this is a good start: advice for people learning Haskell about how to learn to read code, and advice for experienced Haskell programmers about how to document their code so that it may help less experienced programmers. IMHO, this should also go into the future Haskell coding style. [...]

Manlio
Re: [Haskell-cafe] about Haskell code written to be too smart
Dan Piponi wrote:
> Miguel Mitrofanov wrote:
>> takeList = evalState . mapM (State . splitAt)
>> However, ironically, I stopped using them for pretty much the same reason that Manlio is saying.
> Are you saying there's a problem with this implementation? It's the only one I could just read immediately.

Yes, you understand it immediately once you know what a state monad is. But how well is the state monad introduced, explained, and emphasized in current textbooks? When I started learning Haskell, the first things I learned were recursion and pattern matching. This may be the reason why I find my takeList solution more readable. [...]

Manlio
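For anyone trying Miguel's one-liner today: the `State` data constructor is no longer exported by mtl, but the `state` function plays the same role, so a version that type-checks against current Control.Monad.State would be (note it differs from the recursive takeList on ragged input, since `splitAt` on an exhausted list keeps producing empty chunks):

```haskell
import Control.Monad.State (evalState, state)

-- each splitAt consumes a prefix of the list carried in the state;
-- mapM threads the shrinking remainder through the whole run
takeList :: [Int] -> [a] -> [[a]]
takeList = evalState . mapM (state . splitAt)

main :: IO ()
main = print (takeList [2, 3, 2] "abcdefg")  -- ["ab","cde","fg"]
```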
Re: [Haskell-cafe] about Haskell code written to be too smart
Conal Elliott wrote:
> Manlio, We live in the age of participation -- of co-education. Don't worry about text-books. Contribute to some wiki pages and blogs today that share these smart techniques with others.

When I started learning Haskell (on my own initiative), what I did was:

1) Quick reading of the first tutorial I found on the wiki: http://darcs.haskell.org/yaht/yaht.pdf, if I remember correctly.

2) Quick reading of the Haskell Report.

3) Reading another tutorial: http://www.haskell.org/tutorial/

4) Reading the Haskell Report again.

5) A lot of time spent finding good tutorials. Yet I did not know what monads were; I just felt that monads were some strange and advanced feature.

... A period where I stopped looking at Haskell.

6) Found some good tutorials about what monads are, but I still did not know anything about state monads, monad transformers, and so on.

... Another period where I stopped looking at Haskell.

7) The Real World Haskell book. Finally, one book with all the advanced concepts. I read the book online. I found the book good, but I think it is too dispersive in some chapters. I have already forgotten some of the concepts I read, mostly because in some chapters I got annoyed and started skipping things, or reading quickly. I will buy a copy in May, at PyCon Italy (where there will be an O'Reilly stand), so that I can read it again.

8) New impetus for learning Haskell. I read the Haskell Report again, and "A Gentle Introduction to Haskell". I finally started to understand how things work.

9) Started to write some real code.

Now I'm able to understand much of the code I read. But for some kinds of code I still have problems.

Manlio
Re: [Haskell-cafe] about Haskell code written to be too smart
Conal Elliott wrote:
>> And advice to experienced Haskell programmers about how to document their code so that it may help less experienced programmers.
> Manlio -- You may be missing the point of my suggestion,

Ah, sorry.

> which is to help people *find* code that suits them, rather than changing anyone's coding style. Optimizing code for one segment of readers is pessimizing it for another. Instead of dumbing down the smart code, I'd like to help your friends help each other find dumber code, *and* to help others of us find smarter code.

This may be hard to do. However, I have already suggested starting with the Prelude code, from the Haskell Report.

Manlio
Re: [Haskell-cafe] System.Random.Shuffle fix
friggin friggin wrote:
> I was looking for a shuffling algorithm to shuffle mp3 playlists, so I was very happy to see System.Random.Shuffle: http://hackage.haskell.org/cgi-bin/hackage-scripts/package/random-shuffle-0.0.2 However I get errors: non-exhaustive patterns in function shuffleTree or extractTree, depending on how I call it. Errors are at the bottom.

During building you should only get warnings. The non-exhaustive patterns are ok: you hit them only if the input data is incorrect. Probably, in these cases, `error` should be used.

> I fixed it but I don't have the math skills to see if I perhaps broke it statistically... Here is my fix; someone (I don't remember who) helped me a little: http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2789#a2789 The shuffle at the end is the one with the fix.
> *Freet> S.shuffle [1..10] [1..3]

Your input is not correct. If you read the source code (in a future version I'll add Haddock support):

-- Given a sequence (e1,...en) to shuffle, and a sequence
-- (r1,...r[n-1]) of numbers such that r[i] is an independent sample
-- from a uniform random distribution [0..n-i], compute the
-- corresponding permutation of the input sequence.

I have added a convenience function `shuffle'`, where you just need to supply a random number generator. Note that the shuffle' function contains a bug; it should return the new random generator:

shuffle' :: RandomGen gen => [a] -> Int -> gen -> ([a], gen)

I'm going to fix it in the next version. [...]

Regards Manlio
[Haskell-cafe] generalized shuffle
Hi. I have implemented a generalized shuffle function for the random-shuffle package: http://hackage.haskell.org/cgi-bin/hackage-scripts/package/random-shuffle Before committing the change, I would like to receive some feedback (especially from the original author of the shuffle function, in Cc). The new function is defined in this example: http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2808 Note that it makes use of functions not exported by the System.Random.Shuffle module. I have generalized the original shuffle function in two ways:

1) It is now possible to obtain a random sample from a sequence, by doing, as an example:

shuffles [1..10] [2, 3, 4, 2, 1]

Note that this is equivalent to taking the first 5 elements of the fully shuffled sequence, and the performance is practically the same.

2) It is now possible to pass an infinite r-sequence to the shuffle function, as an example:

shuffles [1..10] $ cycle [9, 8 .. 1]

The result is an infinite list, with elements cyclically extracted from the r-sequence. When the r-sequence contains random numbers, we get an infinite sequence containing all possible random choices of elements from the original sequence.

Why am I doing this? The reason is that in a project I need to extract random elements from a list (and I need to extract a lot of them), and using normal methods [1] seems to be rather inefficient. Using the `shuffles` function should be much more efficient, since it uses a tree, and this tree is built only once.

[1] By normal method I mean: extract a random number in the range [0, n] (where n is the length of the sequence), and get the element indexed by that number. Indexing into a list is very expensive. And it is not very fast even using arrays (unless you use the non-portable unsafeRead).

Thanks Manlio Perillo
[Haskell-cafe] Haskell syntax highlighting support for Trac
Hi. I have noted that Haskell projects that use Trac for issue tracking cannot get syntax highlighting for Haskell code: http://hackage.haskell.org/trac/hackage/wiki/WikiProcessors Is someone working on this? I can try to write a Trac plugin or patch to solve this. Manlio
Re: [Haskell-cafe] Haskell syntax highlighting support for Trac
Patai Gergely wrote:
> I'm not sure what your problem is. Yes, Haskell support is available in recent versions of Trac.

I was trying with the Trac instance for Hackage/Cabal. [...]

Manlio
[Haskell-cafe] Cabal properties with conditionals
Hi. Assuming this configuration fragment:

library xxx
  cc-options: -Wall
  if flag(HAVE_URANDOM)
    cc-options: -DHAVE_URANDOM

In case the HAVE_URANDOM flag is defined, what will be the value of the used cc-options?

1) -DHAVE_URANDOM
2) -Wall -DHAVE_URANDOM

Thanks Manlio
[Haskell-cafe] [ANN] random-stream 0.1.1
I'm pleased to announce a new version of the random-stream package. In this version I have rewritten the internal API. The package now exposes a new System.Random.URandom module, with the function:

urandom :: Int -> IO S.ByteString

This function is an interface to the system pseudo-random number generator. The API of the System.Random.Stream module is unchanged, but it now uses the urandom function internally. System.Random.Stream provides a pure interface over urandom, using a lazy ByteString of random bytes. Since it provides an infinite stream of random data, things should be ok. I have also added support for Windows (older versions may not be supported, however [1]). The package should be available on Hackage. The Mercurial repository is at: http://hg.mperillo.ath.cx/haskell/random-stream/ Example usage: http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2738

[1] It is possible to use OpenSSL as a fallback; just set the HAVE_SSL flag during package configuration. The OpenSSL DLL should be placed where the linker can find it. I tried to test this, but without success (though I did not put much effort into it).

Regards Manlio Perillo
Re: [Haskell-cafe] Re: [ANN] ansi-terminal, ansi-wl-pprint - ANSI terminal support for Haskell
Achim Schneider ha scritto: Manlio Perillo manlio_peri...@libero.it wrote: Do you plan to extend the package to any terminal, using terminfo database? Are there any non-ansi terminals left? I assumed they were extinct... it'd come close to using EBCDIC. http://groups.google.com/group/comp.unix.shell/browse_thread/thread/a1a088da1032f336/b3862f039688f0f8 Regards Manlio ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] [ANN] ansi-terminal, ansi-wl-pprint - ANSI terminal support for Haskell
Max Bolingbroke ha scritto: 2009/3/21 Manlio Perillo manlio_peri...@libero.it: Max Bolingbroke ha scritto: These two packages allow Haskell programs to produce much richer console output by allowing colorisation, emboldening and so on. Do you plan to extend the package to any terminal, using the terminfo database? Manlio, I don't plan on an extension like this myself, as assuming ANSI support is enough for all the applications I'm interested in, and I suspect dealing with terminfo stuff will be a headache. Sorry! I think, instead, that it should not be that hard. After all, the C API can be wrapped in pure functions. And you could also write a pure Haskell interface to the terminfo database. Cheers, Max Regards Manlio ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Cabal properties with conditionals
Duncan Coutts ha scritto: On Sat, 2009-03-21 at 14:26 +0100, Manlio Perillo wrote: Hi. Assuming this configuration fragment:

library xxx
  cc-options: -Wall
  if flag(HAVE_URANDOM)
    cc-options: -DHAVE_URANDOM

In case the HAVE_URANDOM flag is defined, what will be the value of the used cc-options? 1) -DHAVE_URANDOM 2) -Wall -DHAVE_URANDOM The latter. Try it. In general all fields get `mappend`ed, which for list-like fields means appending. For single value fields like True/False the latter field wins. Ok, thanks. P.S: I tried to send an email to cabal-devel some days ago, with a feature I would like to see in Cabal. But the mail was never posted to the mailing list. Is that list moderated? Should I simply file a ticket? Duncan Manlio ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
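[Editor's note] The merging rule Duncan describes can be sketched in a few lines of plain Haskell; this is a toy model of the semantics, not Cabal's actual types:

```haskell
-- Toy model of Cabal field merging (not Cabal's real code):
-- list-like fields append under mappend, so the options in the
-- flag branch are appended to the ones given unconditionally.
newtype CcOptions = CcOptions [String]
  deriving (Eq, Show)

instance Semigroup CcOptions where
  CcOptions a <> CcOptions b = CcOptions (a ++ b)

instance Monoid CcOptions where
  mempty = CcOptions []

main :: IO ()
main = print (CcOptions ["-Wall"] <> CcOptions ["-DHAVE_URANDOM"])
```

Running it prints CcOptions ["-Wall","-DHAVE_URANDOM"], i.e. answer 2) in the question above.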
Re: [Haskell-cafe] Cabal properties with conditionals
Duncan Coutts ha scritto: On Sat, 2009-03-21 at 23:05 +0100, Manlio Perillo wrote: P.S: I tried to send an email to cabal-devel some days ago, with a feature I would like to see in Cabal. But the mail was never posted to the mailing list. Is that list moderated? It's subscriber only, like all the haskell.org mailing lists. We got too much spam otherwise. :-( But I did subscribe! Should I simply file a ticket? Yes, thanks very much. Ok. I posted some details in this thread: http://www.haskell.org/pipermail/haskell-cafe/2009-March/057923.html Duncan Manlio ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] random shuffle and random list partition
Yitzchak Gale ha scritto: Hi Manlio, Manlio Perillo wrote: For my Netflix Prize project I have implemented two reusable modules. The first module implements a random shuffle on immutable lists... The second module implements a function used to partition a list into n sublists of random length. [...] As you point out, your partition algorithm is not fair. Using your Random.Shuffle and a well-know trick from combinatorics, you can easily get a fair partitions function: http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2485#a2495 Someone added an alternative implementation: http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2485#a2497 Regards Manlio ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Announce: language-python
Bernie Pope ha scritto: Language-python provides a parser (and lexer) for Python written in Haskell. Currently it only supports version 3 of Python (the most recent version), but it will support version 2 in the future. Great, thanks! I was planning to write a Python parser by myself, since I would like to do some AST processing, and Haskell is IMHO a better choice. [...] Manlio Perillo ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] [ANN] random-stream package
Hi. I'm pleased to announce the availability of my random-stream package. The cabalized package is available at: http://haskell.mperillo.ath.cx/random-stream-0.0.1.tar.gz Note that I have not uploaded it to Hackage, and I do not plan to upload it in the near future, at least until I consider the package mature enough. From the package description: Portable interface for the operating system source of pseudo random data. Supported sources are Unix /dev/urandom, Win32 CryptGenRandom and the OpenSSL pseudo random number generator. This package is based on an idea from the os.urandom implementation in CPython. The idea is to view the system pseudo random generator as a stream of infinite random bytes. So, I have used a lazy bytestring, and unsafePerformIO/unsafeInterleaveIO. The underlying data is:

newtype Stream = Stream L.ByteString

and the constructor *is exposed*. The package configuration *must* be done using a flag to specify the source to use. As an example:

runghc Setup.hs configure -O2 -fHAVE_URANDOM
runghc Setup.hs configure -O2 -fHAVE_SSL
runghc Setup.hs configure -O2 -fHAVE_WIN32_CRYPT

If the flag is not specified, the compilation will fail. Note that Windows is not yet supported. The stream generator implements the RandomGen interface. Some notes: 1) The system *pseudo* random generator is used. As an example, on Linux, /dev/random produces true random numbers, but it may block if there is not enough entropy in the system. More details at http://en.wikipedia.org/wiki//dev/random 2) When reading data, the lazy bytestring chunk size is used. The default chunk size, however, is fine when reading from a regular file, but may not be the best choice when reading pseudo random numbers. For SSL (and Windows) support, the chunk size can be easily changed; but for Unix support this seems not to be possible, unless I reimplement hGetContents. The Data.ByteString.Lazy module has an hGetContentsN function. Unfortunately it is not exposed.
I find this annoying; is it possible to export the *N functions from Data.ByteString.Lazy.Internal? 3) I have tested the package on Debian Linux Etch, using GHC 6.8.2. Feedback will be appreciated. I'm not sure about some details. Usage example:

module Main where

import System.Random
import System.Random.Stream

gen = mkStream

l :: [Int]
l = randoms gen

main = do print l

Manlio Perillo ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
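[Editor's note] The stream-of-infinite-random-bytes idea can be sketched with unsafeInterleaveIO. This is my sketch, assuming a Unix /dev/urandom source, not the package's code; conveniently, Data.ByteString.Lazy.Internal does export the chunk constructor, which sidesteps the hGetContentsN complaint above:

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Lazy.Internal as LI
import System.IO (IOMode (ReadMode), openBinaryFile)
import System.IO.Unsafe (unsafeInterleaveIO)

-- An infinite lazy ByteString over /dev/urandom: each chunk is read
-- on demand, thanks to unsafeInterleaveIO. The chunk size is an
-- explicit parameter here, which is exactly the knob the post says
-- the stock lazy hGetContents does not expose.
urandomStream :: Int -> IO L.ByteString
urandomStream chunkSize = do
  h <- openBinaryFile "/dev/urandom" ReadMode
  let go = unsafeInterleaveIO $ do
        c <- S.hGet h chunkSize
        rest <- go
        return (LI.chunk c rest)
  go
```

L.take over the result forces only as many chunks as are actually needed.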
[Haskell-cafe] doubts about runGetState in the binary package
Hi. I have some doubts about the runGetState function in the binary package. The signature is:

runGetState :: Get a -> LBS -> Int64 -> (a, LBS, Int64)

however the Int64 input parameter is not documented. What value should I pass? How will it be used? Thanks Manlio Perillo ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: doubts about runGetState in the binary package
ChrisK ha scritto: Manlio Perillo wrote: Hi. I have some doubts about the runGetState function in the binary package. The signature is:

runGetState :: Get a -> LBS -> Int64 -> (a, LBS, Int64)

however the Int64 input parameter is not documented. What value should I pass? How will it be used? [...] hackage has the code at http://hackage.haskell.org/packages/archive/binary/0.5.0.1/doc/html/src/Data-Binary-Get.html#runGetState Yes, and reading the code was the first thing I did. And (after some testing) I figured out how it works. However, I wanted to be sure I understood it since, as I have written, IMHO it is not clearly documented; and I can't see how it can be useful, as there are no usage examples. [...] Thanks Manlio Perillo ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
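[Editor's note] For later readers: as far as I can tell from the source, the Int64 argument is the starting byte offset, which is simply added to the count of bytes consumed. A small example of my reading of the API (worth double-checking against the binary version you use; runGetState is deprecated in newer binary releases but still exported):

```haskell
import Data.Binary.Get (getWord8, runGetState)
import qualified Data.ByteString.Lazy as L

-- Parse two bytes in two separate runGetState calls, threading the
-- offset through: the third argument is the offset to start counting
-- from, so passing the previous run's offset keeps a running total
-- across calls.
main :: IO ()
main = do
  let input = L.pack [1, 2, 3, 4]
      (a, rest1, off1) = runGetState getWord8 input 0
      (b, _rest2, off2) = runGetState getWord8 rest1 off1
  print (a, off1) -- (1,1)
  print (b, off2) -- (2,2)
```

Passing 0 at the start of a stream makes the returned offset the absolute position in the input.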
Re: [Haskell-cafe] [ANN] random-stream package
Brent Yorgey ha scritto: On Thu, Mar 19, 2009 at 11:55:16AM +0100, Manlio Perillo wrote: Note that I have not uploaded it to Hackage, and I do not plan to upload it in the near future, at least until I consider the package mature enough. I would encourage you to upload it to Hackage regardless of its supposed maturity, especially if you want feedback. I'm not sure why you would be hesitant to upload your package; Because I'm very strict with my code. I don't even release my Python packages on http://pypi.python.org/pypi, and I'm much more experienced in Python ;-). Hackage hosts an extremely wide range of packages in terms of maturity, and that's a good thing. It also provides a very convenient method (in combination with cabal-install) of disseminating libraries. I am several orders of magnitude more likely to try something out if I can just 'cabal install foo' than if I have to manually download and unpack a tarball myself, and I doubt I am alone in this. It should be possible to do: cabal install http://haskell.mperillo.ath.cx/random-stream-0.0.1.tar.gz It's unfortunate that this is not supported. Thanks Manlio ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] [ANN] random-stream package
Gökhan San ha scritto: Manlio Perillo manlio_peri...@libero.it writes: The stream generator implements the RandomGen interface. This is really cool, though I think 'split' is a must. Maybe all instances could share the same stream in the background; then it wouldn't cause resource issues. I have thought about this, but is it safe? I feared that this would break referential transparency. Also, IMHO mkStream should produce an IO Stream (like getStdGen), as the current implementation isn't referentially transparent; let the library user decide whether to use unsafePerformIO. The basic idea is that there is this system wide random number generator, that is always available. That's the reason why mkIOStream is hidden. 3) I have tested the package on Debian Linux Etch, using GHC 6.8.2. Tested with Gentoo Linux on a 64-bit machine using GHC 6.10.1. On my system, it is less than 2 times slower than StdGen. Thanks for the report. I have tested against the pure Mersenne twister, on my 32 bit system, and it seems to have the same performance. I should check against StdGen, too. Manlio ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] [ANN] random-stream package
Andrew Wagner ha scritto: Maybe the feature has been added, but not released, because somebody is strict about their code... We should first agree on the meaning of release some code... Manlio ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] [ANN] random-shuffle package
Max Rabkin ha scritto: On Thu, Mar 19, 2009 at 4:41 PM, Manlio Perillo manlio_peri...@libero.it wrote: However, in this case, the package name should be changed. I'm not sure it is a good idea to release a package that implements only one function (but I may be wrong). Personally, I think that there is little harm in releasing a package if it does something useful in a not-totally-broken way. Especially if you plan to extend it. Ok, I will add the package to Hackage, thanks. One last thing. Yitzchak Gale kindly posted an example with the implementation of my `partition` function: http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2485 using shuffle'. Now, I would like to implement a `sample` function, like the one in the Python random module: http://docs.python.org/library/random.html#random.sample Is an implementation based on shuffle good, or is a more efficient implementation possible? What should the interface of this sample function be? Should it return the sampled list *and* the original list without the sampled elements? Or is it better to just return the sampled list? If both the partition and sample functions can be implemented efficiently using shuffle', then it may be appropriate to add these two functions to the random-shuffle package. Thanks Manlio ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
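[Editor's note] One cheap answer to both interface questions is to shuffle and then splitAt: the first component is the sample and the second is the rest of the list. Everything below (the LCG stand-in included) is my own sketch so the example stays self-contained; it is not code from the thread or from random-shuffle:

```haskell
import Data.List (sortOn)

-- Tiny linear congruential generator used as a stand-in for a real
-- RandomGen (an assumption of this sketch, not the thread's code).
lcg :: Int -> Int
lcg s = (1103515245 * s + 12345) `mod` 2147483648

-- Shuffle by decorating each element with a pseudo-random key and
-- sorting on it. O(n log n), which is fine for a sketch.
shuffleWith :: Int -> [a] -> [a]
shuffleWith seed xs = map snd (sortOn fst (zip (tail (iterate lcg seed)) xs))

-- Sample m elements: returns the sample *and* the remaining
-- elements, answering the interface question by giving both.
sample :: Int -> Int -> [a] -> ([a], [a])
sample seed m xs = splitAt m (shuffleWith seed xs)
```

Callers who only want the sample can take the first component; a version built on shuffle' would thread a real generator instead of the seed.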
Re: [Haskell-cafe] random shuffle and random list partition
Yitzchak Gale ha scritto: [...] While I think Oleg's tree method is beautiful, in practice it may be re-inventing the wheel. I haven't tested it, but I doubt that this implementation is much better than using the classical shuffle algorithm on an IntMap. Do you have a working implementation? It's essentially the same tree inside. That's what I usually use for this, and it works fine. Oleg's implementation is rather efficient, but it requires a lot of memory for huge lists. Here, as an example, are two programs, one in Python and one in Haskell. The default generator in Python uses the Mersenne Twister, returning float numbers in the range [0, 1].

# Python version
from random import shuffle

n = 1000
m = 10

l = range(1, n + 1)
shuffle(l)
print l[:m]

-- Haskell version
module Main where

import Random.Shuffle
import System.Random.Mersenne.Pure64 (newPureMT)

n = 1000
m = 10
l = [1 .. n]

main = do
  gen <- newPureMT
  print $ take m $ shuffle' l n gen

The Python version performances are:

real 0m16.812s
user 0m16.469s
sys 0m0.280s

150 MB memory usage. The Haskell version performances are:

real 0m8.757s
user 0m7.920s
sys 0m0.792s

800 MB memory usage. In future I can add an implementation of the random shuffle algorithm on mutable arrays in the ST monad. I've tried that in the past. Surprisingly, it wasn't faster than using trees. Perhaps I did something wrong. Or perhaps the difference only becomes apparent for huge lists. Can you try it on the list I have posted above? As you point out, your partition algorithm is not fair. Using your Random.Shuffle and a well-known trick from combinatorics, you can easily get a fair partitions function: http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2485#a2495 Thanks, this is very nice. I have to run some benchmarks to see if it is efficient. Regards Manlio ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] can GHC build an executable from a C source file?
Hi. I'm checking if it is possible to build an executable from C source files only. As an example:

#include <stdio.h>

int main() {
    printf("hello world\n");
    return 0;
}

$ ghc --make foo.c

However this only produces the object file, foo.o; it does not build the executable file. What is the reason for this behaviour? I have tested with GHC 6.8.2. Thanks Manlio ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] can GHC build an executable from a C source file?
Anton Tayanovskyy ha scritto: Works for me without the --make, as `ghc foo.c` For me, too, thanks. Manlio ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] can GHC build an executable from a C source file?
Joe Fredette ha scritto: You know, I hear there's this brilliant program for compiling C code -- gcd? ccg? gcc, yah gcc... Anyone tried it? In all seriousness though, why do you need to compile C with ghc? I'm curious, it seems a bit pointless... It's for a possible extension I'm planning for Cabal. The idea is to add support for configuration features. A configuration feature is similar to a configuration flag, but with some important differences.

data Feature = Bool | String

A feature block:

Feature HaveURandom
  action: execute
  include-dirs: ...
  c-sources: features/urandom.c
  -- other possible properties, as listed in the Build information
  -- chapter, excluding `buildable` and `other-modules`.
  -- The `action` property can have values `compile` (default)
  -- or `execute`.

This means that I'm testing for a feature named HaveURandom, and the testing requires compiling/building and then executing some code. In this case there is only a C source file that needs to be compiled; this is the reason why I have asked whether GHC supports compiling C source files only, and creating an executable. Here I'm asking Cabal to execute the generated program. If the executable returns 0, then HaveURandom will have value `Feature True`; if it returns 1, then HaveURandom will have value `Feature False`. If the executable writes something on stdout, then HaveURandom will have value `Feature string`. If compilation fails, HaveURandom will have value `Feature False`. If the `action` property is compile, then a successful compilation will result in HaveURandom having a value of `Feature True`. Regards Manlio ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Visualising the Hierarchical Namespace
Don Stewart ha scritto: I've just finished a post (and quick tool) for graphing the complete module namespace of Haskell, taken from the core libraries and all of Hackage. It's quite large: http://donsbot.wordpress.com/2009/03/16/visualising-the-haskell-universe/ Just a note: isn't it time to switch to SVG images? All those PNG images are pretty useless; they are too small. Manlio ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Visualising the Hierarchical Namespace
Don Stewart ha scritto: manlio_perillo: Don Stewart ha scritto: I've just finished a post (and quick tool) for graphing the complete module namespace of Haskell, taken from the core libraries and all of Hackage. It's quite large: http://donsbot.wordpress.com/2009/03/16/visualising-the-haskell-universe/ Just a note: isn't it time to switch to SVG images? All those PNG images are pretty useless; they are too small. All the .svg's are linked at the bottom. Note: they will break your browser FF handles them quite well. The problem is that there is too much information :). By the way: in the SVG files there is a lot of redundant information. Graphviz should really use CSS instead of SVG presentation attributes. Using CSS to factor out common styling should reduce image size a lot. I would like to see a Graphics package for Haskell, based on the SVG/libVG/Cairo/PDF imaging model. Manlio ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] random shuffle and random list partition
Hi. For my Netflix Prize project I have implemented two reusable modules. The first module implements a random shuffle on immutable lists. It uses http://okmij.org/ftp/Haskell/perfect-shuffle.txt, with an additional wrapper function having a more friendly interface. The second module implements a function used to partition a list into n sublists of random length. I have pasted the modules here: http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2483 http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2485 If someone is interested (and if Oleg gives me permission), I can release them as a package on Hackage. I need to improve the documentation, however. In future I can add an implementation of the random shuffle algorithm on mutable arrays in the ST monad. Manlio ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] broken IO support in uvector package, when using non primitive types
Daniel Peebles ha scritto: I have added UAProd-based Binary instances to my copy of the uvector repo at http://patch-tag.com/publicrepos/pumpkin-uvector . I have some extensive tests (added to the existing tests directory) of things in there and they seem to be mostly sane. Thanks for the work. I have a question, however. Isn't it possible to write binary serialization code for the generic Stream type? Manlio ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Alternative to Data.Binary
Svein Ove Aas ha scritto: On Sat, Mar 14, 2009 at 1:37 PM, Grzegorz Chrupala grzegorz.chrup...@computing.dcu.ie wrote: Hi all, Is there a serialization library other than the Data.Binary from hackage? I am using Data.Binary in a couple of projects, but I have found its stack and memory usage very hard to control. It's very common that decoding a map or list of non-trivial size uses up all available RAM, or causes a stack overflow. That little problem appears to be an artifact of the particular Binary instance for lists (the map instance starts by converting to/from a list), and should be fixable in principle, for example by writing them in chunks of 256 elements. I can confirm that reading a serialized UArr from the uvector package does not cause memory problems. Manlio ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
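[Editor's note] The chunked-list fix Svein describes can be sketched directly with binary's Put/Get monads. The layout and names below are my own, not the instance binary actually ships:

```haskell
import Control.Monad (replicateM)
import Data.Binary (Binary, get, put)
import Data.Binary.Get (Get, getWord8, runGet)
import Data.Binary.Put (Put, putWord8, runPut)

-- Encode a list in length-prefixed chunks of at most 255 elements,
-- terminated by a 0-length chunk. The encoder never needs the length
-- of the whole list up front, so a long list can be written out
-- without first being forced into memory in its entirety.
putChunked :: Binary a => [a] -> Put
putChunked [] = putWord8 0
putChunked xs = do
  let (c, rest) = splitAt 255 xs
  putWord8 (fromIntegral (length c))
  mapM_ put c
  putChunked rest

getChunked :: Binary a => Get [a]
getChunked = do
  n <- getWord8
  if n == 0
    then return []
    else do
      c <- replicateM (fromIntegral n) get
      rest <- getChunked
      return (c ++ rest)
```

Round-tripping works as expected: runGet getChunked (runPut (putChunked xs)) gives back xs.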
Re: [Haskell-cafe] broken IO support in uvector package, when using non primitive types
Daniel Peebles ha scritto: I have added UAProd-based Binary instances to my copy of the uvector repo at http://patch-tag.com/publicrepos/pumpkin-uvector . I can confirm that it works for me. However I now have a memory problem with data decoding. I need to serialize the Netflix Prize training data set. When I parse the data from the original data set, memory usage is about 640 MB [1]. But when I load the serialized and compressed data (as [UArr (Word32 :*: Word8)]), memory usage is about 840 MB... The culprit is probably the decoding of the list (17770 elements). [1] I have written a script in Python that parses the data, and it only takes 491 MB (using a list of tuples with two compact arrays from numpy). So, GHC has memory problems here. Thanks Manlio Perillo ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Alternative to Data.Binary
Don Stewart ha scritto: grzegorz.chrupala: Hi all, Is there a serialization library other than the Data.Binary from hackage? I am using Data.Binary in a couple of projects, but I have found its stack and memory usage very hard to control. Its very common that decoding a map or list of non-trivial size uses up all available RAM, or causes a stack overflow. [...] Have you tried the latest release, which modified the Map and [a] instances? Tried right now. My [UArr (Word32 :*: Word8)], where the list length is 17770, now requires 660 MB of memory, when decoding, against 840 MB with the previous version of the binary package. This is fantastic, thanks! Regards Manlio ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] bytestring vs. uvector
Don Stewart ha scritto: [...] I think uvector only works with certain types that can be unboxed, while storablevector works with all types that instantiate Foreign.Storable.Storable. I don't know about vector. From the description of vector, I have the One of the nice features of uvector is the support for UArr (a :*: b). A UArr (a :*: b) can be easily (with fstU and sndU) transformed into a UArr a and a UArr b. The uvector package also supports Complex and Rational; however, the support for these types is hard-wired, using a UAProd class, and requires some boilerplate code (IMHO). I find the StorableVector implementation much simpler; I would like to see it in the Haskell Platform. As for Data.Parallel, uvector and vector, it seems there is some code duplication. Both Data.Parallel and uvector make use of a strict pair type. Such a type is also implemented in the strict package [1]. The authors are the same, so I don't understand the reason for the code duplication. There is also duplication in the definition of the Stream data type. [1] there seems to be an error in the documentation: http://hackage.haskell.org/packages/archive/strict/0.3.2/doc/html/Data-Strict-Tuple.html In the description there is: Same as regular Haskell pairs, but (x :*: _|_) = (_|_ :*: y) = _|_ but in the synopsis, the data constructor is :!:, not :*:. Regards Manlio Perillo ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
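[Editor's note] The strict pair in question is tiny. A sketch using the :!: constructor name from the strict package's synopsis (these are my definitions, not the package source):

```haskell
-- A pair with strict components: forcing the pair to weak head
-- normal form forces both fields, so (x :!: undefined) is undefined,
-- unlike the ordinary lazy pair (x, undefined).
data Pair a b = !a :!: !b
  deriving (Eq, Show)

fstS :: Pair a b -> a
fstS (x :!: _) = x

sndS :: Pair a b -> b
sndS (_ :!: y) = y
```

The documentation glitch noted above is just that: the description uses :*: where the synopsis (and the code) use :!:.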
Re: [Haskell-cafe] bytestring vs. uvector
Don Stewart ha scritto: bulat.ziganshin: Hello Don, Wednesday, March 11, 2009, 12:12:07 AM, you wrote: Right, so my point stands: there's no difference now. If you can write a Storable instance, you can write a UA et al instance. yes, if there is some class provided for this and not just hard-coded 4 or so base types That's right. For example (supporting even pairs):

instance (RealFloat a, UA a) => UA (Complex a) where
  newtype UArr (Complex a) = UAComplex (UArr (a :*: a))
  newtype MUArr (Complex a) s = MUAComplex (MUArr (a :*: a) s)

You also have to add an instance for UIO:

instance (RealFloat a, UIO a) => UIO (Complex a) where
  hPutU h (UAComplex arr) = hPutU h arr
  hGetU h = do
    arr <- hGetU h
    return (UAComplex arr)

With Storable, this should not be required; you just have to write an instance for the Storable class. Regards Manlio ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] bytestring vs. uvector
Manlio Perillo ha scritto: [...] uvector package also suppors Complex and Rational, however the support for these type is hard written, using a UAProd class, and requires some boiler plate code (IMHO). Correction: UAProd is not a class, sorry. It is the UA constructor overloaded for a:*:b, and Complex and Rational just reuse this specialization. [...] Manlio ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] bytestring vs. uvector
Don Stewart ha scritto: [...] You also have to add an instance for UIO: instance (RealFloat a, UIO a) => UIO (Complex a) where hPutU h (UAComplex arr) = hPutU h arr; hGetU h = do arr <- hGetU h; return (UAComplex arr) With Storable, this should not be required; you just have to write an instance for the Storable class. Though you get no IO operations with Storable... UIO is entirely separate. Yes, but using Storable and Foreign.Ptr, IO is rather simple. From the storablevector package:

-- | Outputs a 'Vector' to the specified 'Handle'.
hPut :: (Storable a) => Handle -> Vector a -> IO ()
hPut h v =
   if null v
     then return ()
     else
       let (fptr, s, l) = toForeignPtr v
       in  withForeignPtr fptr $ \ptr ->
             let ptrS = advancePtr ptr s
                 ptrE = advancePtr ptrS l
             -- use advancePtr and minusPtr in order to respect
             -- alignment
             in  hPutBuf h ptrS (minusPtr ptrE ptrS)

-- | Read a 'Vector' directly from the specified 'Handle'. This
-- is far more efficient than reading the characters into a list
-- and then using 'pack'.
hGet :: (Storable a) => Handle -> Int -> IO (Vector a)
hGet _ 0 = return empty
hGet h i =
   createAndTrim i $ \p ->
      let elemType :: Ptr a -> a
          elemType _ = undefined
          sizeOfElem = sizeOf (elemType p)
      in  fmap (flip div sizeOfElem) $ hGetBuf h p (i * sizeOfElem)

Regards Manlio ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] bytestring vs. uvector
Bryan O'Sullivan ha scritto: [...] text is not mature, and is based on the same modern fusion framework as uvector and vector. It uses unpinned arrays, but provides functions for dealing with foreign code. Why did you decide to use unpinned arrays (ByteArray#) instead of pinned arrays (Foreign.Ptr)? [...] Thanks Manlio ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] broken IO support in uvector package, when using non primitive types
Hi. I'm sure this is a known bug, but I can't find details with Google. The offending code is here: http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2362 When I execute the program I get: uio: readChunkBU: can't read What's the problem? I'm using uvector from: http://code.haskell.org/~dons/code/uvector/ Thanks Manlio Perillo ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] broken IO support in uvector package, when using non primitive types
Daniel Peebles ha scritto: As far as I know, the reason for this is that the UIO instance for products writes the two rows out sequentially to file, but doesn't include any means to determine the length of the two halves when it's loading up again. When you try to read the product back in, it tries to read in two arrays, but the first array read consumes all the input, leaving the second with nothing. Ok, thanks. For now, I think that the simple solution is to not use UArr (a :*: b), but (UArr a, UArr b). Or I should just switch to Data.Array.Unboxed. The operations I need on the array are only:

- sum of elements
- binary search
- serialization/deserialization
- sorting (but right now I sort the list before creating the array)

with a very big data set. For my problem, is uvector the best solution? What about storablevector or cvector, instead? Regards Manlio ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
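[Editor's note] Of the operations listed, binary search is the only one that needs a little care. A sketch over Data.Array.Unboxed, one of the options mentioned (lowerBound is my name; it assumes the array is sorted ascending):

```haskell
import Data.Array.Unboxed (UArray, bounds, listArray, (!))

-- Binary search on a sorted unboxed array: returns the index of the
-- first element >= key, or one past the last index if every element
-- is smaller than the key.
lowerBound :: UArray Int Int -> Int -> Int
lowerBound arr key = go lo (hi + 1)
  where
    (lo, hi) = bounds arr
    go l h
      | l >= h = l
      | arr ! m < key = go (m + 1) h
      | otherwise = go l m
      where
        m = (l + h) `div` 2
```

For example, lowerBound (listArray (0, 4) [1, 3, 5, 7, 9]) 5 gives 2, and searching for 10 gives 5 (one past the end).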
Re: [Haskell-cafe] broken IO support in uvector package, when using non primitive types
Daniel Fischer ha scritto: [...] Worked with uvector-0.1.0.1: [...] But not with uvector-0.2 [...] The main difference is that in uvector 0.2, hPutBU does not write the length of the array into the file; hGetBU simply uses the file size:

let elemSize = sizeBU 1 (undefined :: e)
n <- fmap ((`div` elemSize) . fromInteger) $ hFileSize h

So, the problem seems to be here. This simply doesn't support having two arrays written in the same file, and sizeBU belongs to the UAE class, whose instances are only declared for builtin types. So, the patch is: just revert this change. Regards Manlio ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] broken IO support in uvector package, when using non primitive types
Don Stewart ha scritto: [...] So, the patch is: just revert this change. Or... use your own UIO instance. That's why it's a type class! Why should I rewrite the UIO instance, if one already exists? Anyway, for the background on this: Tue Nov 18 08:44:46 PST 2008 Malcolm Wallace * Use hFileSize to determine arraysize, rather than encoding it in the file. Here is a patch to the uvector library that fixes hGetBU and hPutBU to use the filesize to determine arraysize, rather than encoding it within the file. I guess the disadvantages are that now only one array can live in a file, and the given Handle must indeed be a file, not a socket Handle. But the advantage is that one can read packed raw datafiles obtained externally. Still, again, I'd point out that uvector is alpha, APIs can and will change. Ok, with API changes. But they *should not* break normal package usage. That patch simply introduced a bug in uvector. If you think that the patch is useful, then the patch should also include a modification to the UIO instance for a:*:b. Regards Manlio ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] advice on a parsing function
Hi. I'm still working on the Netflix Prize; now I have managed to implement a parsing function for the qualifying data set (or quiz set). The quiz set is in the format:

1:
10
20
30
2:
100
200
3:
1000
2000
3000
4000
5000

Where 1, 2, 3 are movie ids, and the others are user ids. The parsing program is at: http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2300 The program reads the file using lazy IO. One of the features I want is for the quiz function to be a *good producer*. I'm quite satisfied with the result (this is the first complex parsing function I have written in Haskell), and I have also managed to avoid the use of an accumulator. However I'm interested to know if there is a more efficient solution. The qualifying.txt file is 51MB, 2834601 lines. On my laptop, the function performance is:

real 1m14.117s
user 0m2.496s
sys 0m0.136s

CPU usage is about 3%, system load is about 0.20, memory usage is 4956 KB. What I'm worried about is:

quiz' ((id, ":") : l) = (id, quiz'' l) : quiz' l
quiz' ((id, _) : l) = quiz' l

the problem here is that the same elements in the list are processed multiple times. I have written alternate versions. The first uses foldl: http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2303 (that, however, builds the entire data structure in memory, and in reverse order). The second uses foldr: http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2304 (that, however, is incorrect and I'm unable to fix it). The performance of the foldr version is very similar to that of the first implementation (it uses, however, 3704 KB, and it is about 3 seconds faster).
P.S.: the expected result for the sample quiz set I have posted is:

[(1,[10,20,30]),(2,[100,200]),(3,[1000,2000,3000,4000,5000])]

The foldl version produces:

[(3,[5000,4000,3000,2000,1000]),(2,[200,100]),(1,[30,20,10])]

The foldr version produces:

[(1,[]),(2,[10,20,30]),(3,[100,200]),(5000,[1000,2000,3000,4000])]

Thanks
Manlio Perillo
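For reference, a foldr version that produces the expected output above in a single pass can be written with a pending-users accumulator instead of re-scanning the list. This is a sketch under simplifying assumptions (`String` stands in for the lazy ByteString lines of the real program, and `parseQuiz` is a hypothetical name, not the posted code):

```haskell
type MovieId = Int
type UserId  = Int

-- One right fold over the lines: a movie line such as "1:" closes the
-- group of user lines collected to its right, so no element is examined
-- twice and the groups come out in the original order.
parseQuiz :: [String] -> [(MovieId, [UserId])]
parseQuiz = fst . foldr step ([], [])
  where
    step line (done, pending)
      | last line == ':' = ((read (init line), pending) : done, [])
      | otherwise        = (done, read line : pending)

main :: IO ()
main = print (parseQuiz
  [ "1:", "10", "20", "30"
  , "2:", "100", "200"
  , "3:", "1000", "2000", "3000", "4000", "5000" ])
-- prints [(1,[10,20,30]),(2,[100,200]),(3,[1000,2000,3000,4000,5000])]
```

The `pending` list collects user ids until the fold reaches the movie line that owns them, which is what the broken foldr attempt above was missing.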
Re: [Haskell-cafe] ByteString in patterns
Don Stewart wrote: manlio_perillo: Don Stewart wrote: [...]

{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Char8 as C

isMatch :: C.ByteString -> Bool
isMatch "match" = True
isMatch _ = False

main = print . map isMatch . C.lines =<< C.getContents

What is the reason why instance declarations for the IsString class are not defined for the available ByteStrings? I need to define them myself.

They're exported from Data.ByteString.Char8.

Then there is something I'm missing. Your code does not compile.

Thanks
Manlio Perillo
Re: [Haskell-cafe] advice on a parsing function
minh thu wrote: [...] I suggest you try an alternative strategy. That alternative strategy is twofold, just like you have quiz' and quiz''. This alternative strategy avoids pattern matching on strings (which would be cumbersome for a slightly more complex syntax), but for this specific case it is very compact and elegant (IMHO). [...] Now, given those two functions, try to apply them to your input string, feeding the next function application with the string left over from the current application.

So, I should not split the string into lines? A useful feature of my program is that it parses both an input like:

1:
1046323,2005-12-19

and:

1:
1046323

If I write a parser from scratch, I need to implement two separate functions. I will give it a try, just to check if it performs better.

Thanks
Manlio
Re: [Haskell-cafe] advice on a parsing function
minh thu wrote: [...] The approach I suggested is a bit overkill. You can indeed use L.lines to split the input into lines and then work on that. But still, avoid the pair (Int, ByteString). Instead, you can basically map on each line the unsafeReadInt modified to:

- return the id
- return whether it is one kind of id or the other

So:

type UserId = Int
type MovieId = Int

unsafeReadInt :: Line -> Either MovieId UserId

Now you have a nice list [Either MovieId UserId] that you need to transform into (MovieId, [UserId]).

Thanks, this seems a much better solution.

Manlio
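Minh's suggestion can be sketched like this, with `String` standing in for the ByteString `Line` type and the hypothetical names `readId`/`groupQuiz` (the real program uses `unsafeReadInt` over lazy ByteStrings):

```haskell
import Data.Either (isLeft)

type UserId  = Int
type MovieId = Int

-- Classify each line: "3:" is a movie id, "1000" is a user id.
readId :: String -> Either MovieId UserId
readId line
  | last line == ':' = Left (read (init line))
  | otherwise        = Right (read line)

-- Collapse the tagged stream into (MovieId, [UserId]) groups.
groupQuiz :: [Either MovieId UserId] -> [(MovieId, [UserId])]
groupQuiz []              = []
groupQuiz (Right _ : xs)  = groupQuiz xs   -- stray user id before any movie: skip
groupQuiz (Left mid : xs) = (mid, [u | Right u <- users]) : groupQuiz rest
  where (users, rest) = break isLeft xs

main :: IO ()
main = print (groupQuiz (map readId ["1:", "10", "20", "2:", "100"]))
-- prints [(1,[10,20]),(2,[100])]
```

The classification step and the grouping step stay separate, which is what makes the tagged-list representation nicer than carrying an (Int, ByteString) pair around.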
Re: [Haskell-cafe] ByteString in patterns
Don Stewart wrote: [...] Then there is something I'm missing. Your code does not compile.

Sure it does:

As Daniel suggested, I'm using an old bytestring version, the one that came with Debian Etch (GHC 6.8.2).

Thanks
Manlio Perillo
Re: [Haskell-cafe] advice on a parsing function
Manlio Perillo wrote: minh thu wrote: [...] Now you have a nice list [Either MovieId UserId] that you need to transform into (MovieId, [UserId]).

Thanks, this seems a much better solution.

Done: http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2309

real    1m15.220s
user    0m4.816s
sys     0m0.308s

Memory usage is 3084 KB; the previous version required 4956 KB.

Thanks again for the suggestion, Minh.

Manlio