Using transients within fold
I've been experimenting with reducers using a small example that counts the words in Wikipedia pages by parsing the Wikipedia XML dump. The basic structure of the code is:

    (frequencies (flatten (map get-words (get-pages))))

where get-pages returns a lazy sequence of pages from the XML dump and get-words takes a page and returns a sequence of the words on that page.

The above code takes ~40s to count the words on the first 1 pages. If I convert that code to use reducers, it runs in ~22s (yay!). If I convert it to use fold and therefore run in parallel, it runs in ~13s on my 4-core MacBook Pro. So it's faster (yay!) but nowhere near 4x faster (boo).

The primary reason for this is that, in order to be able to use fold, I've had to write my own version of frequencies:

    (defn frequencies-parallel [words]
      (r/fold (partial merge-with +)
              (fn [counts x] (assoc counts x (inc (get counts x 0))))
              words))

And, unlike the version in core, this doesn't use transients. If I replace the fold with reduce (i.e. make it run sequentially) it runs in ~43s. So, I *am* getting close to a 4x speedup from parallelising the code, but unfortunately I'm also seeing a 2x slowdown because I can't use transients.

Can anyone think of any way that it would be possible to modify this code to use transients? Or any way to modify reducers to allow transients to be used?

--
paul.butcher-msgCount++

Snetterton, Castle Combe, Cadwell Park...
Who says I have a one track mind?

http://www.paulbutcher.com/
LinkedIn: http://www.linkedin.com/in/paulbutcher
MSN: p...@paulbutcher.com
AIM: paulrabutcher
Skype: paulrabutcher

--
--
You received this message because you are subscribed to the Google Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups Clojure group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
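For readers comparing the two approaches in the question: the sketch below shows a sequential, transient-based word count in the style that clojure.core/frequencies uses. The name frequencies-transient is made up for illustration and is not part of the original post.

```clojure
;; Sequential word count over a transient map, mirroring the approach
;; taken by clojure.core/frequencies.
;; The name frequencies-transient is hypothetical.
(defn frequencies-transient [words]
  (persistent!
   (reduce (fn [counts x]
             ;; assoc! may return a different transient, so the return
             ;; value is always threaded through the reduction.
             (assoc! counts x (inc (get counts x 0))))
           (transient {})
           words)))

(frequencies-transient ["a" "b" "a"])
;; => {"a" 2, "b" 1}
```

This does not drop straight into r/fold, because fold seeds every partition with the value of (combinef) and merges partition results with the same combinef, so the accumulator would have to be transient while reducing and persistent while combining.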
Re: Using transients within fold
Hi,

that's not really possible at the moment. cf. https://groups.google.com/d/topic/clojure-dev/UbJlMO9XYjo/discussion and https://github.com/cgrand/clojure/commit/65e1acef03362a76f7043ebf3fe2fa277c581912

Kind regards
Meikel
Re: Using transients within fold
On 14 Mar 2013, at 11:49, Meikel Brandmeyer (kotarak) m...@kotka.de wrote:

> that's not really possible at the moment. cf. https://groups.google.com/d/topic/clojure-dev/UbJlMO9XYjo/discussion and https://github.com/cgrand/clojure/commit/65e1acef03362a76f7043ebf3fe2fa277c581912

Dang. At least other people have the same problem, so perhaps there's a chance it'll be addressed :-)

--
paul.butcher-msgCount++
Re: Using transients within fold
As a temporary hack, perhaps you could implement a deftype ReduceToTransient wrapper that implements CollReduce by calling reduce on the parameter, and then calling persistent! on the return value of reduce. You'd also need to implement CollFold so that the partitioning function produces wrapped results.

Would that work?

--
Dave

On Thu, Mar 14, 2013 at 1:02 PM, Meikel Brandmeyer (kotarak) m...@kotka.de wrote:

> You could use the proposed change (second link) and use a patched clojure in your application.
>
> Meikel
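One possible reading of the ReduceToTransient idea is sketched below. It is an untested-for-performance illustration rather than anything posted in the thread. The assumptions: the seed returned by (combinef) is a persistent map (hence r/monoid with hash-map instead of a bare merge-with), the wrapper makes the seed transient itself, and partitioning is done with partition-all rather than the vector-splitting that the built-in fold uses. The name frequencies-fold-transient is hypothetical.

```clojure
(require '[clojure.core.reducers :as r]
         '[clojure.core.protocols :as p])

;; Hypothetical wrapper: reducing it runs the reduction over a transient
;; version of the seed and hands back a persistent result.
(deftype ReduceToTransient [coll]
  p/CollReduce
  (coll-reduce [this f]
    (p/coll-reduce this f (f)))
  (coll-reduce [_ f init]
    (persistent! (reduce f (transient init) coll)))

  r/CollFold
  (coll-fold [_ n combinef reducef]
    ;; Cut the input into partitions of at most n items and wrap each
    ;; one, so that every partition is reduced via the transient path.
    (let [parts (mapv #(ReduceToTransient. %) (partition-all n coll))]
      ;; Fold over the vector of wrapped partitions, one per leaf task.
      ;; Each leaf yields a persistent map, which combinef can merge.
      (r/fold 1
              combinef
              (fn [acc part]
                (combinef acc (reduce reducef (combinef) part)))
              parts))))

(defn frequencies-fold-transient [words]
  (r/fold (r/monoid (partial merge-with +) hash-map)
          ;; The reducing function sees a transient map here.
          (fn [counts x] (assoc! counts x (inc (get counts x 0))))
          (ReduceToTransient. words)))

(frequencies-fold-transient ["a" "b" "a" "c" "b" "a"])
```

Each transient is created, mutated and made persistent inside a single reduce call, so it never crosses a thread boundary. The partitioning step realises the whole input before any work starts, so whether this recovers the 2x lost to persistent assoc would need to be measured.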
Re: Using transients within fold
On 14 Mar 2013, at 13:13, David Powell djpow...@djpowell.net wrote:

> As a temporary hack, perhaps you could implement a deftype ReduceToTransient wrapper that implements CollReduce by calling reduce on the parameter, and then calling persistent! on the return value of reduce. You'd also need to implement CollFold so that the partitioning function produces wrapped results.
>
> Would that work?

Hmm. Possibly. I'll have a think about it...

--
paul.butcher-msgCount++