[Haskell-cafe] cereal vs. binary

2010-07-03 Thread braver
I dump results of a computation as a Data.Trie of [(Int,Float)]. It contains about 5 million entries, with lists of 35 or fewer pairs each. It takes 8 minutes to load with Data.Binary and look up a single key. What can take so long? If I change from compressed to uncompressed (and then

[Haskell-cafe] building a patched ghc

2010-06-25 Thread braver
How do I get the latest 6.12 source and cherry-pick selected patches from the trunk, e.g. Simon's GC fixes? The wiki says that you get the head with darcs get --lazy http://darcs.haskell.org/ghc and the 6.12 with darcs get --lazy http://darcs.haskell.org/ghc-6.12/ghc Since the trunk is big

[Haskell-cafe] Re: building a patched ghc

2010-06-25 Thread braver
An attempt to build the trunk gets me this: /opt/portage/usr/lib/gcc/x86_64-pc-linux-gnu/4.2.4/../../../../x86_64-pc-linux-gnu/bin/ld: rts/dist/build/RtsStartup.dyn_o: relocation R_X86_64_PC32 against symbol `StgRun' can not be used when making a shared object; recompile with -fPIC -- I use

[Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-24 Thread braver
On Jun 24, 5:07 am, Johan Tibell johan.tib...@gmail.com wrote: The new "The Performance of Haskell containers package" paper compares the performance of, among other things, Maps holding Strings/ByteString. It also improves the performance of many operations on these. I think it's very relevant

[Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-24 Thread braver
Claus -- cafe5 is pretty much where it's at. You're right, the proggy was used as the bug finder, actually at cafe3, still using ByteString. Having translated it from Clojure to Haskell to OCaml, I'm now debugging the logic and perhaps the conceptual data structures. Then better maps will be

[Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-24 Thread braver
Simon -- amazing feat! Thanks for tracking it down. I'll now happily rely on the Haskell version if it is fast enough :). -- Alexy ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-24 Thread braver
Simon -- so how can I get me a new ghc now? From git, I suppose? (It used to live in darcs...) -- Alexy

[Haskell-cafe] Re: How does one get off haskell?

2010-06-18 Thread braver
On Jun 18, 10:37 am, Ivan Lazar Miljenovic ivan.miljeno...@gmail.com wrote: C. McCann c...@uptoisomorphism.net writes: I've seen a lot of people claim that there are cases where it's easier/better to use dynamic typing than even Haskell-style static typing, but have never been given an example

[Haskell-cafe] Re: How does one get off haskell?

2010-06-18 Thread braver
On Jun 18, 12:59 pm, Edward Z. Yang ezy...@mit.edu wrote: ... I would still prefer Haskell for a system intended for production; with the pain of making sure you've handled all of the possible constructors for the data you're operating on, you also have a pretty good assurance that you haven't

[Haskell-cafe] Re: How does one get off haskell?

2010-06-17 Thread braver
If you're willing to use Haskell as a synonym for FP of sorts, you can now get cool jobs writing Scala -- e.g. Sony uses it to manage all of their disk farms; Clojure is awesome, although very different (dynamic and macro); and Jane Street is always hiring in New York, London and Tokyo doing

[Haskell-cafe] Re: What is Haskell unsuitable for?

2010-06-17 Thread braver
At this very moment I'm struggling with fitting a huge graph of Twitter communications into a Haskell program. Apparently it gets into a loop freeing memory. As I suspected, JVM garbage collector got more testing than Haskell at this scale; since not many people load it up as much, it may be

[Haskell-cafe] Re: How does one get off haskell?

2010-06-17 Thread braver
Another way to be happy is to get a family to support, then any idea of making a quick pfennig with C# and then luxuriating in Rio with a laptop full of Haskell will only work if your company goes public! :) -- Alexy

[Haskell-cafe] Re: What is Haskell unsuitable for?

2010-06-17 Thread braver
(Looks like my previous reply didn't go through yet) -- yes, we're working on it! :) Cheers, Alexy

[Haskell-cafe] Re: What is Haskell unsuitable for?

2010-06-17 Thread braver
On Jun 17, 12:20 pm, Don Stewart d...@galois.com wrote: Did you talk to Simon Marlow yet? Unlike the JVM, we provide direct access to the GC developers when you run into trouble. :) Yes -- Simon's very helpful, he showed how to identify the loop and debug it further. Hopefully we'll work

[Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-17 Thread braver
Since folks got interested, I've added a doc/ subdirectory (on the intern branch) with a PDF defining my karmic social capital mathematically. It is this definition which is faithfully computed both in Clojure and Haskell. I've also added a LICENSE file basically stating that this research is to

[Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-17 Thread braver
On Jun 17, 2:36 pm, Claus Reinke claus.rei...@talk21.com wrote: I'll work with Simon to investigate the runtime, but would welcome any ideas on further speeding up cafe4. Just a wild guess, but those foldWithKeys make me nervous. The result is strict, the step function tries to be strict,
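
For illustration only, here is a minimal sketch of a fold over a Map whose accumulator is forced at every step. It is not taken from the cafe4 code; the function name sumAndCount and the Double payload are made up for the example. It uses foldlWithKey' from Data.Map.Strict, and the bang patterns force both halves of the tuple so that neither accumulates a chain of thunks.

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.Map.Strict as M

-- Hypothetical example: sum the values and count the entries in one pass.
-- foldlWithKey' forces the accumulator (the tuple) to weak head normal form
-- at each step; the bang patterns additionally force both tuple components
-- when the next step pattern-matches on them.
sumAndCount :: M.Map k Double -> (Double, Int)
sumAndCount = M.foldlWithKey' step (0, 0)
  where
    step (!s, !n) _key v = (s + v, n + 1)
```

Without the bangs the tuple constructor alone would be forced, which is the usual way a fold that "tries to be strict" still ends up holding unevaluated sums.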

[Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-16 Thread braver
With @dafis's help, there's a version tagged cafe3 on the master branch which is better performing with ByteString. I also went ahead and interned ByteString as Int, converting the structure to IntMap everywhere. That's reflected on the new intern branch at tag cafe4. Still it can't do the full
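
A rough sketch of what interning ByteString keys as Int can look like, assuming a simple two-way table. The Intern type and the names emptyIntern and intern are hypothetical and are not the code on the intern branch; ids are handed out in order of first appearance.

```haskell
import qualified Data.ByteString.Char8 as B
import qualified Data.IntMap.Strict as IM
import qualified Data.Map.Strict as M

-- Hypothetical intern table: names to ids, plus the reverse mapping so
-- results can be printed with the original user names.
data Intern = Intern
  { toId   :: !(M.Map B.ByteString Int)
  , toName :: !(IM.IntMap B.ByteString)
  }

emptyIntern :: Intern
emptyIntern = Intern M.empty IM.empty

-- Return the id for a name, allocating the next free id on first sight.
intern :: B.ByteString -> Intern -> (Int, Intern)
intern name tbl@(Intern ids names) =
  case M.lookup name ids of
    Just i  -> (i, tbl)
    Nothing ->
      let i = M.size ids
      in (i, Intern (M.insert name i ids) (IM.insert i name names))
```

Once every user name has an id, the graph itself can be keyed by Int and stored in IntMap, with the table consulted only when reading input and writing output.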

[Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-15 Thread braver
If you just want to optimize it and not compare exactly equal idiomatic code, you should stop using functional data structures and use a structure that fits your problem (the ST monad has been designed for that in Haskell), because compilers do not detect single-threaded usage and rewrite all

[Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-15 Thread braver
On Jun 15, 6:27 am, Simon Marlow marlo...@gmail.com wrote: On 15/06/2010 06:09, braver wrote: In fact, the tag cafe2, when run on the full dataset, gets stuck at 11 days, with RAM slowly getting into 50 GB; a previous version caused ghc 6.12.1 to segfault around day 12 -- -debug showing

[Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-15 Thread braver
Wren -- thanks for the clarification! Someone said that Foldable on Trie may not be very efficient -- is that true? I use ByteString as a node type for the graph; these are Twitter user names. Surely it's useful to replace them with Int, which I'll try, but Clojure works with Java String fine

[Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-14 Thread braver
I've supplied a profile report there. Since I load the graphs in memory and then walk them a lot, the time seems expected. It allocates a lot, though. The main graph type is
    type Graph = M.Map User AdjList
    type AdjList = M.Map Day Reps
    type User = B.ByteString
    type Day = Int
    type Reps = M.Map

[Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-14 Thread braver
On Jun 14, 11:40 am, Don Stewart d...@galois.com wrote: Oh, you'll want insertWith'. You might also consider bytestring-trie for the Graph, and IntMap for the AdjList? Yeah, I saw jsonb using Trie and thought there's a reason for it. But it's very API-poor compared with Map, e.g. there's
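
To make the insertWith' suggestion concrete, here is a small sketch of strict accumulation into a Map. insertWith' was the strict variant in the containers of that era; in current containers the same role is played by insertWith from Data.Map.Strict, which is what this example assumes. The function name countKeys is made up for the illustration.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as M

-- Hypothetical example: count how often each key occurs.
-- Data.Map.Strict.insertWith evaluates the combined value to weak head
-- normal form on every insert, so no chain of (+) thunks builds up behind
-- a frequently updated key.
countKeys :: Ord k => [k] -> M.Map k Int
countKeys = foldl' step M.empty
  where
    step acc k = M.insertWith (+) k 1 acc
```

With the lazy insertWith from Data.Map.Lazy, each repeated key would instead grow an unevaluated sum that is only collapsed when the value is finally demanded.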

[Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-14 Thread braver
OK, sample data is uploaded to data/sample in the git repo, and README.md updated with the build and run command lines. I've achieved a bit more strictness, again with great help from @dons, @dafis, and other great folks here and #haskell, but it's still slower than Clojure and occupies a bit

[Haskell-cafe] Re: Mining Twitter data in Haskell and Clojure

2010-06-14 Thread braver
In fact, the tag cafe2, when run on the full dataset, gets stuck at 11 days, with RAM slowly getting into 50 GB; a previous version caused ghc 6.12.1 to segfault around day 12 -- -debug showing an assert failure in Storage.c. ghc 6.10 got stuck at 30 days for good, and when profiling crashed

[Haskell-cafe] Mining Twitter data in Haskell and Clojure

2010-06-13 Thread braver
I'm computing a communication graph from Twitter data and then scanning it daily to allocate social capital to nodes behaving in a good karmic manner. The graph is culled from 100 million tweets and has about 3 million nodes. First I wrote the simulation of the 35 days of data in Clojure and then