Using transients within fold

2013-03-14 Thread Paul Butcher
I've been experimenting with reducers using a small example that counts the 
words in Wikipedia pages by parsing the Wikipedia XML dump. The basic structure 
of the code is:

(frequencies (flatten (map get-words (get-pages))))

where get-pages returns a lazy sequence of pages from the XML dump and 
get-words takes a page and returns a sequence of the words on that page. The 
above code takes ~40s to count the words on the first 1 pages.

If I convert that code to use reducers, it runs in ~22s (yay!).
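
The post does not show the reducers version, so the following is only one plausible shape for it. get-pages and get-words here are tiny hypothetical stand-ins for the real XML-parsing functions, and r/mapcat replaces the flatten/map pair:

```clojure
(require '[clojure.core.reducers :as r]
         '[clojure.string :as str])

;; Hypothetical stand-ins for the real functions in the post.
(defn get-pages [] ["the quick fox" "the lazy dog"])
(defn get-words [page] (str/split page #"\s+"))

;; Lazy-sequence version, as in the post.
(defn count-lazy []
  (frequencies (flatten (map get-words (get-pages)))))

;; Reducers version (an assumption about how the conversion might look):
;; r/mapcat returns a reducible, and frequencies consumes it via reduce,
;; so no intermediate lazy sequences are built.
(defn count-reducers []
  (frequencies (r/mapcat get-words (get-pages))))
```

Both functions should return the same map; only the way the intermediate collection is produced differs.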

If I convert it to use fold and therefore run in parallel, it runs in ~13s on 
my 4-core MacBook Pro. So it's faster (yay!) but nowhere near 4x faster (boo).

The primary reason for this is that, in order to be able to use fold, I've had 
to write my own version of frequencies:

(defn frequencies-parallel [words]
  (r/fold (partial merge-with +)
          (fn [counts x] (assoc counts x (inc (get counts x 0))))
          words))

And, unlike the version in core, this doesn't use transients. If I replace the 
fold with reduce (i.e. make it run sequentially) it runs in ~43s.
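
For comparison, here is a sketch of the same counting step with and without transients. The second function follows the approach used by clojure.core/frequencies; the names count-words-persistent and count-words-transient are made up for this illustration. The transient version starts from (transient {}), updates with assoc!, and converts back to a persistent map exactly once at the end. That final persistent! call is the step a plain fold reducing function has no obvious place for.

```clojure
;; Persistent version: every step builds a new immutable map.
(defn count-words-persistent [words]
  (reduce (fn [counts x] (assoc counts x (inc (get counts x 0))))
          {}
          words))

;; Transient version: the accumulator is mutated in place with assoc!
;; and frozen once, after the whole reduction has finished.
(defn count-words-transient [words]
  (persistent!
    (reduce (fn [counts x] (assoc! counts x (inc (get counts x 0))))
            (transient {})
            words)))
```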

So, I *am* getting close to a 4x speedup from parallelising the code, but 
unfortunately I'm also seeing a 2x slowdown because I can't use transients.

Can anyone think of any way that it would be possible to modify this code to 
use transients? Or any way to modify reducers to allow transients to be used?

--
paul.butcher-msgCount++

Snetterton, Castle Combe, Cadwell Park...
Who says I have a one track mind?

http://www.paulbutcher.com/
LinkedIn: http://www.linkedin.com/in/paulbutcher
MSN: p...@paulbutcher.com
AIM: paulrabutcher
Skype: paulrabutcher

-- 
-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
Clojure group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.




Re: Using transients within fold

2013-03-14 Thread Meikel Brandmeyer (kotarak)
Hi,

that's not really possible at the moment. 
cf. https://groups.google.com/d/topic/clojure-dev/UbJlMO9XYjo/discussion 
and 
https://github.com/cgrand/clojure/commit/65e1acef03362a76f7043ebf3fe2fa277c581912

Kind regards
Meikel





Re: Using transients within fold

2013-03-14 Thread Paul Butcher
On 14 Mar 2013, at 11:49, Meikel Brandmeyer (kotarak) m...@kotka.de wrote:

> that's not really possible at the moment. cf.
> https://groups.google.com/d/topic/clojure-dev/UbJlMO9XYjo/discussion and
> https://github.com/cgrand/clojure/commit/65e1acef03362a76f7043ebf3fe2fa277c581912

Dang. At least other people have the same problem, so perhaps there's a chance 
it'll be addressed :-)





Re: Using transients within fold

2013-03-14 Thread David Powell
As a temporary hack, perhaps you could implement a deftype
ReduceToTransient wrapper that implements CollReduce by calling reduce on
the parameter, and then calling persistent! on the return value of reduce.
You'd also need to implement CollFold so that the partitioning function
produces wrapped results.

Would that work?
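
The following is not the wrapper described above but a different workaround that sticks to public functions, offered as a sketch only. It assumes the words can be pre-chunked: each fold step then counts one whole chunk with clojure.core/frequencies (which uses transients internally) and merges the result into a persistent map. The name frequencies-chunked and the chunk-size parameter are hypothetical.

```clojure
(require '[clojure.core.reducers :as r])

(defn frequencies-chunked
  "Counts words in parallel, one chunk per fold step.
  chunk-size is an illustrative tuning parameter."
  [chunk-size words]
  ;; fold only parallelises foldable collections such as vectors,
  ;; so the chunks are realised into a vector up front.
  (let [chunks (vec (partition-all chunk-size words))]
    (r/fold 1                                   ; one chunk per leaf task
            (partial merge-with +)              ; combine two partial counts
            (fn [counts chunk]                  ; count a chunk with transients
              (merge-with + counts (frequencies chunk)))
            chunks)))
```

A call would look like (frequencies-chunked 1000 words). The trade-off is that vec realises every chunk in memory before counting starts, so this only suits inputs that fit in memory.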

-- 
Dave



On Thu, Mar 14, 2013 at 1:02 PM, Meikel Brandmeyer (kotarak)
m...@kotka.de wrote:

> You could use the proposed change (second link) and use a patched clojure
> in your application.
>
> Meikel





Re: Using transients within fold

2013-03-14 Thread Paul Butcher
On 14 Mar 2013, at 13:13, David Powell djpow...@djpowell.net wrote:

> As a temporary hack, perhaps you could implement a deftype ReduceToTransient
> wrapper that implements CollReduce by calling reduce on the parameter, and
> then calling persistent! on the return value of reduce. You'd also need to
> implement CollFold so that the partitioning function produces wrapped results.
>
> Would that work?


Hmm. Possibly. I'll have a think about it...
