Re: [Haskell-cafe] strict version of Haskell - does it exist?
http://www.vex.net/~trebla/haskell/lazy.xhtml It is half done. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] strict version of Haskell - does it exist?
Even though advertised as parallel programming tools, parMap and other functions that work in parallel over *sequential* access data structures (i.e. linked lists.) We want flat, strict, unpacked data structures to get good performance out of parallel algorithms. DPH, repa, and even vector show the way. You would think that tree data structures would be good here as well. For example, monad-par includes a definition of an append-based AList (like Guy Steele argues for). But alas that turns out to be much harder to get working well. For most algorithms Vectors so often end up better. -Ryan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] strict version of Haskell - does it exist?
Using insertWith' gets time down to 30-40 secs (thus only being 3-4 times slower than PHP). PHP still is at 13 secs, does not require installing libraries - does not require compilation and is trivial to write. A trivial C++ application takes 11-12secs and even with some googling was trivial to write. Excerpts from Felipe Almeida Lessa's message of Mon Jan 30 17:36:46 +0100 2012: Then please take a deeper look into my code. What you said that you've tried is something else. I didn't say that I tried your code. I gave enumerator package a try counting lines which I expected to behave similar to conduits because both serve a similar purpose. Then I hit the the sourceFile returns chunked lines issue (reported it, got fixed) - Anyway: My log files are a json dictionary on each line: { id : foo, ... } { id : bar, ... } Now how do I use the conduit package to split a chunked file into lines? Or should I create a new parser many json newline ? Except that I think my processJson for this test should look like this because I want to count how often the clients queried the server. Probalby I should also be using CL.fold as shown in the test cases of conduit. If you tell me how you'd cope with the one json dict on each line issue I'll try to benchmark this solution as well. -- probably existing library functions can be used here .. processJson :: (M.Map T.Text Int) - Value - (M.Map T.Text Int) processJson m value = case value of Ae.Object hash_map - case HMS.lookup (T.pack id) hash_map of Just id_o - case id_o of Ae.String id - M.insertWith' (+) id 1 m _ - m _ - m _ - m Marc Weber ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] strict version of Haskell - does it exist?
On Tue, Jan 31, 2012 at 6:05 AM, Marc Weber marco-owe...@gmx.de wrote: I didn't say that I tried your code. I gave enumerator package a try counting lines which I expected to behave similar to conduits because both serve a similar purpose. Then I hit the the sourceFile returns chunked lines issue (reported it, got fixed) - Anyway: My log files are a json dictionary on each line: { id : foo, ... } { id : bar, ... } Now how do I use the conduit package to split a chunked file into lines? Or should I create a new parser many json newline ? Currently there are two solutions. The first one is what I wrote earlier on this thread: jsonLines :: C.Resource m = C.Conduit B.ByteString m Value jsonLines = C.sequenceSink () $ do val - CA.sinkParser json' CB.dropWhile isSpace_w8 return $ C.Emit () [val] This conduit will run the json' parser (from aeson) and then drop any whitespace after that. Note that it will correctly parse all of your files but will also parse some files that don't conform to your specification. I assume that's fine. The other solution is going to released with conduit 0.2, probably today. There's a lines conduit that splits the file into lines, so you could write jsonLines above as: mapJson :: C.Resource m = C.Conduit B.ByteString m Value mapJson = C.sequenceSink () $ do val - CA.sinkParser json' return $ C.Emit () [val] which doesn't need to care about newlines, and then change main to main = do ... ret - forM_ fileList $ \fp - do C.runResourceT $ CB.sourceFile fp C.$= CB.lines C.$= -- new line is here mapJson C.$= CL.mapM processJson C.$$ CL.consume print ret I don't know which solution would be faster. Either way, both solutions will probably be faster with the new conduit 0.2. Except that I think my processJson for this test should look like this because I want to count how often the clients queried the server. Probalby I should also be using CL.fold as shown in the test cases of conduit. If you tell me how you'd cope with the one json dict on each line issue I'll try to benchmark this solution as well. This issue was already being coped with in my previous e-mail =). -- probably existing library functions can be used here .. processJson :: (M.Map T.Text Int) - Value - (M.Map T.Text Int) processJson m value = case value of Ae.Object hash_map - case HMS.lookup (T.pack id) hash_map of Just id_o - case id_o of Ae.String id - M.insertWith' (+) id 1 m _ - m _ - m _ - m Looks like the perfect job for CL.fold. Just change those three last lines in main from ... C.$= CL.mapM processJson C.$$ CL.consume into ... C.$$ CL.fold processJson and you should be ready to go. Cheers! -- Felipe. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] strict version of Haskell - does it exist?
jsonLines :: C.Resource m = C.Conduit B.ByteString m Value jsonLines = C.sequenceSink () $ do val - CA.sinkParser json' CB.dropWhile isSpace_w8 return $ C.Emit () [val] Adding a \state - (the way Felipe Lessa told me) make is work and it runs in about 20sec and that although some conduit overhead is likely to take place. omitting my custom data type using bytestrings operating on Value of Aeson reduces running time to 16secs. PHP/C++ still wins: less than 12secs. Now I can imagine again that even a desktop multi core system is faster than a single threaded C application. Thanks for your help. Maybe I can setup profiling again to understand why its still taking little bit more time. Marc Weber ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] strict version of Haskell - does it exist?
On Tue, Jan 31, 2012 at 1:36 PM, Marc Weber marco-owe...@gmx.de wrote: Adding a \state - (the way Felipe Lessa told me) make is work and it runs in about 20sec and that although some conduit overhead is likely to take place. Just out of curiosity: did you use conduit 0.1 or 0.2? Cheers! =) -- Felipe. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] strict version of Haskell - does it exist?
Excerpts from Felipe Almeida Lessa's message of Tue Jan 31 16:49:52 +0100 2012: Just out of curiosity: did you use conduit 0.1 or 0.2? I updated to 0.2 today because I was looking for a monad instance for SequenceSink - but didn't find it cause I tried using it the wrong way (\state - see last mail) I also tried json' vs json (strict and non strict versions) - didn't seem to make a big difference. Marc Weber ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] strict version of Haskell - does it exist?
Hi Everyone, I had a similar experience with a similar type of problem. The application was analyzing web pages that our web crawler had collected, well not the pages themselves but metadata about when the page was collected. The basic query was: SELECT Domain, Date, COUNT(*) FROM Pages GROUP BY Domain, Date The webpage data was split out across tens of thousands of files compressed binary. I used enumerator to load these files and select the appropriate columns. This step was performed in parallel using parMap and worked fine once i figured out how to add the appropriate !s. The second step was the group by. I built some tools across monad-par that had the normal higher level operators like map, groupBy, filter, etc... The typical pattern I followed was the map-reduce style pattern used in monad-par. I was hoping to someday share this work, although I have since abandoned work on it. It took me a couple of weeks to get the strictness mostly right. I say mostly because it still randomly blows up, meaning if I feed in a single 40kb file maybe 1 time in 10 it consumes all the memory on the machine in a few seconds. There is obviously a laziness bug in there somewhere although after working on it for a few days and failing to come up with a solid repro case I eventually built all the web page analysis tools in scala, in large part because I did not see a way forward and need to tie off that work and move on. My observations: Combining laziness and parallelism made it very difficult to reason about what was going on. Test cases became non-deterministic not in terms out output in the success case but whether they ran at all. The tooling around laziness does not give enough information about debugging complex problems. Because of this when people ask Is Haskell good for parallel development? I tell them the answer is complicated. Haskell has excellent primitives for parallel development like the STM which I love but it lacks a PLINQ like toolkit that is fully built out to enable flexible parallel data processing. The other thing is that deepseq is very important . IMHO this needs to be a first class language feature with all major libraries shipping with deepseq instances. There seems to have been some movement on this front but you can't do serious parallel development without it. Some ideas for things that might help would be a plugin for vim that showed the level of strictness of operations and data. I am going to take another crack at a PLINQ like library with GHC 7.4.1 in the next couple of months using the debug symbols that Peter has been working on. Conclusion: Haskell was the wrong platform to be doing webpage analysis anyhow, not because anything is wrong with the language but simply it does not have the tooling that the JVM does. I moved all my work into Hadoop to take advantage of multi-machine parallelism and higher level tools like Hive. There might be a future in building haskell code that could be translated into a Hive query. With better tools I think that Haskell can become the goto language for developing highly parallel software. We just need the tools to help developers better understand the laziness of their software. There also seems to be a documentation gap on developing data analysis or data transformation pipelines in haskell. Sorry for the length. I hope my experience is useful to someone. Steve On Tue, Jan 31, 2012 at 7:57 AM, Marc Weber marco-owe...@gmx.de wrote: Excerpts from Felipe Almeida Lessa's message of Tue Jan 31 16:49:52 +0100 2012: Just out of curiosity: did you use conduit 0.1 or 0.2? I updated to 0.2 today because I was looking for a monad instance for SequenceSink - but didn't find it cause I tried using it the wrong way (\state - see last mail) I also tried json' vs json (strict and non strict versions) - didn't seem to make a big difference. Marc Weber ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] strict version of Haskell - does it exist?
On Tue, Jan 31, 2012 at 9:19 PM, Steve Severance ssevera...@alphaheavy.comwrote: The other thing is that deepseq is very important . IMHO this needs to be a first class language feature with all major libraries shipping with deepseq instances. There seems to have been some movement on this front but you can't do serious parallel development without it. I completely agree on the first part, but deepseq is not a panacea either. It's a big hammer and overuse can sometimes cause wasteful O(n) no-op traversals of already-forced data structures. I also definitely wouldn't go so far as to say that you can't do serious parallel development without it! The only real solution to problems like these is a thorough understanding of Haskell's evaluation order, and how and why call-by-need is different than call-by-value. This is both a pedagogical problem and genuinely hard -- even Haskell experts like the guys at GHC HQ sometimes spend a lot of time chasing down space leaks. Haskell makes a trade-off here; reasoning about denotational semantics is much easier than in most other languages because of purity, but non-strict evaluation makes reasoning about operational semantics a little bit harder. In domains where you care a lot about operational semantics (like parallel and concurrent programming, where it's absolutely critical), programmers necessarily require a lot more experience and knowledge in order to be effective in Haskell. G -- Gregory Collins g...@gregorycollins.net ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] strict version of Haskell - does it exist?
On Tue, Jan 31, 2012 at 1:22 PM, Gregory Collins g...@gregorycollins.net wrote: I completely agree on the first part, but deepseq is not a panacea either. It's a big hammer and overuse can sometimes cause wasteful O(n) no-op traversals of already-forced data structures. I also definitely wouldn't go so far as to say that you can't do serious parallel development without it! I agree. The only time I ever use deepseq is in Criterion benchmarks, as it's a convenient way to make sure that the input data is evaluated before the benchmark starts. If you want a data structure to be fully evaluated, evaluate it as it's created, not after the fact. The only real solution to problems like these is a thorough understanding of Haskell's evaluation order, and how and why call-by-need is different than call-by-value. This is both a pedagogical problem and genuinely hard -- even Haskell experts like the guys at GHC HQ sometimes spend a lot of time chasing down space leaks. Haskell makes a trade-off here; reasoning about denotational semantics is much easier than in most other languages because of purity, but non-strict evaluation makes reasoning about operational semantics a little bit harder. +1 We can do a much better job at teaching how to reason about performance. A few rules of thumb gets you a long way. I'm (slowly) working on improving the state of affairs here. -- Johan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] strict version of Haskell - does it exist?
On Tue, Jan 31, 2012 at 12:19 PM, Steve Severance ssevera...@alphaheavy.com wrote: The webpage data was split out across tens of thousands of files compressed binary. I used enumerator to load these files and select the appropriate columns. This step was performed in parallel using parMap and worked fine once i figured out how to add the appropriate !s. Even though advertised as parallel programming tools, parMap and other functions that work in parallel over *sequential* access data structures (i.e. linked lists.) We want flat, strict, unpacked data structures to get good performance out of parallel algorithms. DPH, repa, and even vector show the way. -- Johan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] strict version of Haskell - does it exist?
On Sun, 2012-01-29 at 23:47 +0100, Marc Weber wrote: So maybe also the JSON parsing library kept too many unevaluated things in memory. So I could start either writing my own JSON parsing library (being more strict) Jfyi, aeson has been added strict parser variants json' and value' [1] some time ago, so you shouldn't have to write your own stricter JSON parsing library... [1]: http://hackage.haskell.org/packages/archive/aeson/0.6.0.0/doc/html/Data-Aeson-Parser.html#g:2 ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] strict version of Haskell - does it exist?
On Mon, Jan 30, 2012 at 6:21 AM, Herbert Valerio Riedel h...@gnu.org wrote: On Sun, 2012-01-29 at 23:47 +0100, Marc Weber wrote: So maybe also the JSON parsing library kept too many unevaluated things in memory. So I could start either writing my own JSON parsing library (being more strict) Jfyi, aeson has been added strict parser variants json' and value' some time ago, so you shouldn't have to write your own stricter JSON parsing library... Also, besides using those variants, you may also use the attoparsec-conduit library [1]. If you have processJson :: Value - IO X then you'd need just something like import Data.Aeson (Value, json') import Data.Attoparsec.Char8 (isSpace_w8) import qualified Data.ByteString as B import qualified Data.Conduit as C import qualified Data.Conduit.Attoparsec as CA import qualified Data.Conduit.Binary as CB import qualified Data.Conduit.List as CL main = do ... ret - forM_ fileList $ \fp - do C.runResourceT $ CB.sourceFile fp C.$= jsonLines C.$= CL.mapM processJson C.$$ CL.consume print ret jsonLines :: C.Resource m = C.Conduit B.ByteString m Value jsonLines = C.sequenceSink () $ do val - CA.sinkParser json' CB.dropWhile isSpace_w8 return $ C.Emit () [val] This code is extremely resource-friendly, since (a) you can't leak file descriptors and (b) you just have to make sure that processJson function isn't too lazy. It should be quite fast as well. Cheers! =) [1] http://hackage.haskell.org/package/attoparsec-conduit -- Felipe. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] strict version of Haskell - does it exist?
On Sun, Jan 29, 2012 at 11:25:09PM +0100, Ertugrul Söylemez wrote: First of all, /learning/ to optimize Haskell can be difficult. The optimizing itself is actually fairly easy in my experience, once you understand how the language works. Given the fact that you have obviously mastered the learning part of this, do you have any recommendations on what to read or on how to proceed in order to learn how to optimize Haskell code? I can imagine, it's not only about understanding the language itself, but also about understanding how your compiler and its switches work, being able to find the hot spots in your code, being able to measure the effects of your changes, developing a good sense for the tradeoffs, etc. So far, I have only stumpled upon chapter 25 of Real World Haskell. Anything else you might recommend? regards Alex signature.asc Description: Digital signature ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] strict version of Haskell - does it exist?
On 29 Jan 2012, at 22:25, Ertugrul Söylemez wrote: A strict-by-default Haskell comes with the implication that you can throw away most of the libraries, including the base library. So yes, a strict-by-default Haskell is very well possible, but the question is whether you actually want that. I wouldn't, because a lot of my code relies on the standard semantics. At work, we have a strict version of Haskell, and indeed we do not use the standard libraries, but have built up our own versions of the ones we use. However, our compiler is smart enough to transform and optimise the source code *as if* it were non-strict: it is only at runtime that things are evaluated strictly. This means that, in general, you can rely on the standard semantics to a surprisingly large extent. For instance, maybe (error foo) lines (Just hello\nworld) will succeed without calling error, just like in Haskell. Even if the final argument is supplied only at runtime, not statically, it will still do the right thing. However, the downside of being strict at runtime is frequently poorer performance. Stack overflows are common if you use explicit recursion: it is better to use higher-order functions (map, fold, until) that are implemented at a lower level in C (i.e. not directly using explicit recursion themselves). This is a good thing of course - thinking of data structures in the aggregate, rather than piecemeal. However, bulk operations do transform the entire data structure, not merely the fragments that are needed for the onward computation, so it can often be a net performance loss. The standard lazy computational paradigm of generate-and-test is therefore hideously expensive, on occasion. Regards, Malcolm ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] strict version of Haskell - does it exist?
On Mon, Jan 30, 2012 at 6:24 AM, Malcolm Wallace malcolm.wall...@me.comwrote: On 29 Jan 2012, at 22:25, Ertugrul Söylemez wrote: A strict-by-default Haskell comes with the implication that you can throw away most of the libraries, including the base library. So yes, a strict-by-default Haskell is very well possible, but the question is whether you actually want that. I wouldn't, because a lot of my code relies on the standard semantics. At work, we have a strict version of Haskell, and indeed we do not use the standard libraries, but have built up our own versions of the ones we use. However, our compiler is smart enough to transform and optimise the source code *as if* it were non-strict: it is only at runtime that things are evaluated strictly. This means that, in general, you can rely on the standard semantics to a surprisingly large extent. I wanted to emphasize Malcolm's point here. Optimizing using the original Haskell semantics turned out to be pretty important back when I was working on Eager Haskell. For example, a lot of Haskell functions are written in the following style: f a b c | guard1 d = g h i | guard2 e = h | guard3 f = i | otherwise = j where d = ...expensive... e = ...expensive... f = ...expensive... g = ...expensive... h = ...expensive... i = ...expensive... j = ... expensive... An an ordinary procedural language, where function calls in g, h, i, and j might have side effects, we can't sink bindings down to the point of use. Even in the absence of side effects, we have to account for the fact that some of these computations are used in some -- but not all -- right-hand sides, and that often we need to do some inlining to discover that a value isn't going to be used. It turns out Haskell code relies on this sort of thing all over the place, and simply coding around it leads to deeply-nested let bindings that walk off the right-hand edge of the screen. It's not difficult to rewrite most of the prelude functions in this style, but it's no longer pretty, and it's recognizably not idiomatic Haskell. However, bulk operations do transform the entire data structure, not merely the fragments that are needed for the onward computation, so it can often be a net performance loss. The standard lazy computational paradigm of generate-and-test is therefore hideously expensive, on occasion. This was a huge issue in Eager Haskell. By far our worst performance was on stream-like programs that generated infinite lists of results, and then sliced away the useless tail. With completely strict evaluation, this of course doesn't work at all, but it can be tricky to bound the required sizes of inputs even if you know how much of the output you want (imagine a call to takeWhile or filter on an infinite list). -Jan-Willem Maessen Regards, Malcolm ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] strict version of Haskell - does it exist?
Alexander Bernauer alex-hask...@copton.net wrote: On Sun, Jan 29, 2012 at 11:25:09PM +0100, Ertugrul Söylemez wrote: First of all, /learning/ to optimize Haskell can be difficult. The optimizing itself is actually fairly easy in my experience, once you understand how the language works. Given the fact that you have obviously mastered the learning part of this, do you have any recommendations on what to read or on how to proceed in order to learn how to optimize Haskell code? I can imagine, it's not only about understanding the language itself, but also about understanding how your compiler and its switches work, being able to find the hot spots in your code, being able to measure the effects of your changes, developing a good sense for the tradeoffs, etc. So far, I have only stumpled upon chapter 25 of Real World Haskell. Anything else you might recommend? That's the tricky part about this. Unfortunately the monad tutorial fallacy [1] applies here. At some point it makes click and suddenly you see all the data dependencies. If you were schizophrenic, you would probably see little arrows all over your code. What helped me a bit was to read the Wikibooks page about Haskell's denotational semantics [2], but other than that I pretty much learned it by myself. In any case it's helpful to use 'seq' explicitly instead of bang patterns. Personally I never use bang patterns, but always use seq, strict patterns or guards with strict functions. Example: add :: Integer - Integer - Integer add x 0 = x add x y = (add $! succ x) (pred y) Here the pattern forces the second argument, so there is no need to force it yourself. However, nothing forces the first argument, so it needs to be forced manually, in this case by using ($!). [1]: http://byorgey.wordpress.com/2009/01/12/ abstraction-intuition-and-the-monad-tutorial-fallacy/ [2]: http://en.wikibooks.org/wiki/Haskell/Denotational_semantics Greets, Ertugrul -- nightmare = unsafePerformIO (getWrongWife = sex) http://ertes.de/ signature.asc Description: PGP signature ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] strict version of Haskell - does it exist?
Replying to all replies at once: Malcolm Wallace At work, we have a strict version of Haskell :-) which proofs that it is worth thinking about it. Ertugrul If you want to save the time to learn how to write efficient Haskell programs, you may want to have a look into the Disciple language. Yes - I will. Its on my TODO list for at least 12 month :( Not sure whether there are parser combinator libraries yet. @ Herbert Valerio Riedel (suggesting aeson) I gave it yet another try - see below. It still fails. @ Felipe Almeida Lessa (suggesting conduits and atto parsec) I mentioned that I already tried it. Counting lines only was a lot slower than counting lines and parsing JSON using PHP. @ Chris Wong flag is that strictness isn't just something you turn on and off willy-nilly You're right. But those features are not required for all tasks :) Eg have a look at Data.Map. Would a strict implementation look that different? I came up with this now. Trying strict bytestrings and Aeson. note the LB.fromChunks usage below to turn a strict into a lazy bytestring. Result: PHP script doing the same runs in 33secs (using the AFindCheckoutsAndContacts branch) The haskell program below - I stopped it after 8 min. (At least it does no longer cause my system to swap .. You're right: I could start profiling. I could learn about how to optimize it.) But why? The conclusion is that if I ever use yesod - and if I ever want to analyse logs - I should call PHP from yesod as external process !? :-( Even if I had a core i7 (8 threads in parallel) I still would not be sure whether Haskell would be the right choice. I agree that I assume that all data fits into memory so that piecewise evaluation doesn't matter too much. Thanks for all your help - it proofs once again that the Haskell community is the most helpful I know about. Yet I think me being the programmer Haskell is the wrong choice for this task. Thanks Marc Weber my new attempt - now I should start profiling.. Looks like I haven't built all libs with profiling support .. import Data.Aeson.Types import Data.Aeson import Data.List import Control.Applicative import Debug.Trace import qualified Data.Map as M import Action import Data.Aeson.Parser as AP import qualified Data.ByteString.Lazy as LB import qualified Data.ByteString as BS import qualified Data.ByteString.Char8 as LBC8 data Action = ACountLine | AFindCheckoutsAndContacts -- lines look like this: -- {id:4ee535f01550c,r:,ua:Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm),t:1323644400,k:XXX,v:YY} data Item = Item { id :: SB.ByteString, ua :: SB.ByteString, t :: Int, k :: SB.ByteString, v :: SB.ByteString } instance FromJSON Item where parseJSON (Object v) = Item $ v .: id * v .: ua * v .: t * v .: k * v .: v parseJSON _ = empty run :: Action - [FilePath] - IO () run AFindCheckoutsAndContacts files = do -- find all ids quering the server more than 100 times. -- do so by building a map counting queries (lines :: [BS.ByteString]) - fmap (concat . (map LBC8.lines) ) $ mapM BS.readFile files let processLine :: (M.Map BS.ByteString Int) - BS.ByteString - (M.Map BS.ByteString Int) processLine st line = case decode' (LB.fromChunks [line]) of Nothing - st -- traceShow (bad line ++ (LBC8.unpack line)) st Just (Item id ua t k v) - M.insertWith (+) k 1 st let count_by_id = foldl' processLine (M.empty) lines mapM_ print $ M.toList $ M.filter ( 100) count_by_id ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] strict version of Haskell - does it exist?
On Mon, Jan 30, 2012 at 2:12 PM, Marc Weber marco-owe...@gmx.de wrote: @ Felipe Almeida Lessa (suggesting conduits and atto parsec) I mentioned that I already tried it. Counting lines only was a lot slower than counting lines and parsing JSON using PHP. Then please take a deeper look into my code. What you said that you've tried is something else. Cheers, -- Felipe. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] strict version of Haskell - does it exist?
Marc Weber wrote: Replying to all replies at once: Malcolm Wallace At work, we have a strict version of Haskell :-) which proofs that it is worth thinking about it. But doesn't necessarily prove that it's a good idea. Just (Item id ua t k v) - M.insertWith (+) k 1 st Does replacing this by insertWith' help? Roman ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] strict version of Haskell - does it exist?
Marc Weber wrote: Replying to all replies at once: Malcolm Wallace At work, we have a strict version of Haskell :-) which proofs that it is worth thinking about it. But doesn't necessarily prove that it's a good idea. Just (Item id ua t k v) - M.insertWith (+) k 1 st Does replacing this by insertWith' help? Roman ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] strict version of Haskell - does it exist?
Hi, Am Montag, den 30.01.2012, 10:52 +0100 schrieb Alexander Bernauer: On Sun, Jan 29, 2012 at 11:25:09PM +0100, Ertugrul Söylemez wrote: First of all, /learning/ to optimize Haskell can be difficult. The optimizing itself is actually fairly easy in my experience, once you understand how the language works. Given the fact that you have obviously mastered the learning part of this, do you have any recommendations on what to read or on how to proceed in order to learn how to optimize Haskell code? I can imagine, it's not only about understanding the language itself, but also about understanding how your compiler and its switches work, being able to find the hot spots in your code, being able to measure the effects of your changes, developing a good sense for the tradeoffs, etc. So far, I have only stumpled upon chapter 25 of Real World Haskell. Anything else you might recommend? Although I would not claim that I have mastered this, I did recently hold a talk that touched some of these issues, and also exhibits a case where you want something more fine-grained than just strictness or lazyness. From your name I think it is safe to point you to a German document: http://www.joachim-breitner.de/blog/archives/539-Guest-lecture-on-Haskell-performance.html Greetings, Joachim -- Joachim nomeata Breitner m...@joachim-breitner.de | nome...@debian.org | GPG: 0x4743206C xmpp: nome...@joachim-breitner.de | http://www.joachim-breitner.de/ signature.asc Description: This is a digitally signed message part ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] strict version of Haskell - does it exist?
Generally strict Haskell means using strict data types - vectors, arrays, bytestrings, intmaps where required. However, you usually don't want all code and data strict, all the time, since laziness/on-demand eval is critical for deferring non-essential work. Summary; -fstrict wouldn't magically make your code good. Using the right balance of strict and lazy code, via the right choice of strict and lazy types, however, often does. Id be interested to know what choices were made in your log file case led you into problems -- using something excessively lazy (like lazy lists) or something excessively strict (like strict bytestrings) would both be suboptimal for log analysis. A hybrid type like a lazy bytestring, would be more appropriate. On Sunday, January 29, 2012, Marc Weber marco-owe...@gmx.de wrote: A lot of work has been gone into GHC and its libraries. However for some use cases C is still preferred, for obvious speed reasons - because optimizing an Haskell application can take much time. Is there any document describing why there is no ghc --strict flag making all code strict by default? Wouldn't this make it easier to apply Haskell to some additional fields such as video processing etc? Wouldn't such a '--strict' flag turn Haskell/GHC into a better C/gcc compiler? Projects like this: https://github.com/thoughtpolice/strict-ghc-plugin show that the idea is not new. Eg some time ago I had to do some logfile analysis. I ended doing it in PHP because optimizing the Haskell code took too much time. Marc Weber ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] strict version of Haskell - does it exist?
On Mon, Jan 30, 2012 at 10:13 AM, Marc Weber marco-owe...@gmx.de wrote: A lot of work has been gone into GHC and its libraries. However for some use cases C is still preferred, for obvious speed reasons - because optimizing an Haskell application can take much time. As much as any other high-level language, I guess. Don't compare apples to oranges and complain oranges aren't crunchy enough ;) Is there any document describing why there is no ghc --strict flag making all code strict by default? Yes -- it's called the Haskell Report :) GHC does a lot of optimization already. If making something strict won't change how it behaves, it will, using a process called strictness analysis. The reason why there is no --strict flag is that strictness isn't just something you turn on and off willy-nilly: it changes how the whole language works. Structures such as infinite lists and Don Stewart's lazy bytestrings *depend* on laziness for their performance. Wouldn't this make it easier to apply Haskell to some additional fields such as video processing etc? Wouldn't such a '--strict' flag turn Haskell/GHC into a better C/gcc compiler? See above. Projects like this: https://github.com/thoughtpolice/strict-ghc-plugin show that the idea is not new. Not sure what that does, but I'll have a look at it. Eg some time ago I had to do some logfile analysis. I ended doing it in PHP because optimizing the Haskell code took too much time. That probably because you're using linked lists for strings. For intensive text processing, it's better to use the text package instead [1] Chris [1] http://hackage.haskell.org/package/text Marc Weber ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] strict version of Haskell - does it exist?
Marc Weber marco-owe...@gmx.de wrote: A lot of work has been gone into GHC and its libraries. However for some use cases C is still preferred, for obvious speed reasons - because optimizing an Haskell application can take much time. Is there any document describing why there is no ghc --strict flag making all code strict by default? Wouldn't this make it easier to apply Haskell to some additional fields such as video processing etc? Wouldn't such a '--strict' flag turn Haskell/GHC into a better C/gcc compiler? Projects like this: https://github.com/thoughtpolice/strict-ghc-plugin show that the idea is not new. Eg some time ago I had to do some logfile analysis. I ended doing it in PHP because optimizing the Haskell code took too much time. First of all, /learning/ to optimize Haskell can be difficult. The optimizing itself is actually fairly easy in my experience, once you understand how the language works. Usually the nonstrictness is no bottleneck. However, you have to know that you are in a nonstrict language. In fact, I find myself having difficulties writing efficient code in a strict language. Now to answer your question: A strict-by-default Haskell comes with the implication that you can throw away most of the libraries, including the base library. So yes, a strict-by-default Haskell is very well possible, but the question is whether you actually want that. I wouldn't, because a lot of my code relies on the standard semantics. I would also expect problems with the way Haskell performs I/O, because it would mean that forever (putStrLn Hello world) would cause a heap overflow, if Haskell were strict. Note that we don't have control structures. We have combinators, and their nonstrictness is essential. The flag you are proposing would turn Haskell into a language that is different enough that you couldn't do many useful things with it. If you want to save the time to learn how to write efficient Haskell programs, you may want to have a look into the Disciple language. You will find that it has a different type system, which captures side effects explicitly to make a pure strict language even possible. Greets, Ertugrul -- nightmare = unsafePerformIO (getWrongWife = sex) http://ertes.de/ signature.asc Description: PGP signature ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] strict version of Haskell - does it exist?
The strict-ghc-plugin (under my maintenance) is just a continuation of one of the original demos Max had for plugin support in the compiler. The idea is fairly simple: 'let' and 'case' are the forms for creating lazy/strict bindings in Core. It just systematically replaces all occurrences of 'let' in Core with 'case'. So 'let b = e1 in e2' becomes 'case e1 of { b - e2 }', making 'b' strict. It also replaces all applications of the form 'App e1 e2' (which is also lazy, as e2 is evaluated on demand) with an equivalent binding like 'case e2 of { x - App e1 x }'. Pretty simple, and results in a totally strict program. The idea is just a proof of concept; in particular, I (and likely Max although I cannot speak for him) am not using it as a position to say that sometimes you want everything strict. You don't; at some point, you're not even using Haskell anymore I suppose (remember: non-strict semantics.) I can't think of any instance in which I would need or want to use this plugin, honestly. But maybe someone else would - I did refactor it to where you can strictify individual functions, as opposed to full-blown modules, via annotations. So you could selectively strictify things if you found it beneficial on certain identifiers. But then there's the question of what affect that has on the rest of GHC's optimizers, which I cant answer: the strictifier modifies the pipeline to be the *first* pass, and the remaining ones run afterwords. Compilers are built on heuristics and built for 'average' code. Sometimes these heuristics interact in odd ways, especially with code that may deviate from 'the norm.' Once you're fighting the optimizer, it can become a very difficult battle to win. Careful analysis and selective optimization is probably going to take you farther than hitting it with a giant hammer. Having lazy and strict data structures and knowing when/where to use them is crucial for good performance, and both have their merits (same with every other thing under the sun, like by-ref/by-val semantics in `data` types, which you can control with UNPACK etc.) I think we could most certainly use better tools for analyzing low-level performance details and the tradeoff between strictness/laziness and (especially in large codebases,) but I don't think systematically making everything strict is going to be the right idea in a vast majority of situations. On Sun, Jan 29, 2012 at 4:12 PM, Chris Wong chrisyco+haskell-c...@gmail.com wrote: On Mon, Jan 30, 2012 at 10:13 AM, Marc Weber marco-owe...@gmx.de wrote: A lot of work has been gone into GHC and its libraries. However for some use cases C is still preferred, for obvious speed reasons - because optimizing an Haskell application can take much time. As much as any other high-level language, I guess. Don't compare apples to oranges and complain oranges aren't crunchy enough ;) Is there any document describing why there is no ghc --strict flag making all code strict by default? Yes -- it's called the Haskell Report :) GHC does a lot of optimization already. If making something strict won't change how it behaves, it will, using a process called strictness analysis. The reason why there is no --strict flag is that strictness isn't just something you turn on and off willy-nilly: it changes how the whole language works. Structures such as infinite lists and Don Stewart's lazy bytestrings *depend* on laziness for their performance. Wouldn't this make it easier to apply Haskell to some additional fields such as video processing etc? Wouldn't such a '--strict' flag turn Haskell/GHC into a better C/gcc compiler? See above. Projects like this: https://github.com/thoughtpolice/strict-ghc-plugin show that the idea is not new. Not sure what that does, but I'll have a look at it. Eg some time ago I had to do some logfile analysis. I ended doing it in PHP because optimizing the Haskell code took too much time. That probably because you're using linked lists for strings. For intensive text processing, it's better to use the text package instead [1] Chris [1] http://hackage.haskell.org/package/text Marc Weber ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe -- Regards, Austin ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] strict version of Haskell - does it exist?
Excerpts from Don Stewart's message of Sun Jan 29 22:55:08 +0100 2012: Summary; -fstrict wouldn't magically make your code good. No. you're right. I don't expect that to happen. I agree on it being always the programmers fault using wrong tools or not knowing the tools well enough to get a job done. The PHP code looks like this: foreach(glob('*.txt') as $file){ foreach(split(file_get_contents($file)) = $line){ $parsed_line = json_decode($line); // get some unix timestamps, keep some hashes of seen clients // (cookie ids) and such // check how many minutes before a checkout the customer visited // the site - and whether he did so for a couple of days. } } // print result The files are about 300 MB in size. However memory usage grew and grew and grew - I had to kill it or limit amount of files. The PHP code runs in a couple of seconds (parsing json and loading files).. the Haskell app took much longer. That PHP is fast is no surprise: I expect json_decode and split to be implemented in C. So yes - I used lazy lists. However 8GB of RAM should have been enough to keep things in RAM. So maybe also the JSON parsing library kept too many unevaluated things in memory. So I could start either writing my own JSON parsing library (being more strict) or profile the application many times - but I don't want to. Ignoring the json parsing I also gave conduits a try - only counting lines. I know its experimental - but from its description I concluded it would prevent me being a stupid Haskell programmer from taking too much memory even using bad Haskell code. However it looked like splitting the lines only counting them (recognizing utf-8 chars) took a lot longer than also doing the json parsing in PHP. Then the conduit implementation looked funny: hGetLine is used to feed a line each time ... (luckily - because the utf8-libraries don't provide nice ways to parse incomplete chunks of utf-8 bytes such as returning Either IncompleteMissingByte UTF8Chunk).. Probably the split(,\n) in PHP doesn't parse utf-8 chars - and luckily it doesn't seem to matter because \n only uses one byte. I know that I'm not an Haskell expert. I still got the impression that getting nice performance would be a small challenge and require much more time than I spend on the PHP version. That's why I'm wondering why there is no -fstrict option for such simple use cases because Haskell has so many optimizations other languages dream about. I mean lot's of non lazy language still get their jobs done. So it always depends on the use case. Isn't it easy to add a compiler flag to GHC adding those ! strictness annotations everywhere possible? Then simple use case like this would not be a huge challenge. Maybe you're right: I should just prepare some dummy files and ask the community to help. However optimizing the JSON parser and my code just seemed to be too much effort .. Marc Weber ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe