This is related to my multi-level bucketing problem, for which I am
starting a new thread.

The code is at:

   https://gist.github.com/952861

I am referring to the sum-by function that a fellow Clojurer in this
group provided. It choked when I passed in data of size one million:
I didn't run out of memory, but it took a very long time.

Quoted below are the relevant definitions from my code:

;build one million records (get-rec is defined in the gist)
(def data (take 1000000 (repeatedly get-rec)))

;get aggregate values for a list of attributes
(defn sum-by [data attrs]
  (let [aggregated (group-by (apply juxt attrs) data)]
    (zipmap (keys aggregated)
            (map #(reduce + (map :mv %)) (vals aggregated)))))

;invoke sum-by
(sum-by data [:attr1 :attr2])
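
For reference, on toy data the behaviour looks like this (records
assumed to carry an :mv value, as in the gist):

;e.g., with three hypothetical records:
(sum-by [{:attr1 :a :attr2 :x :mv 1}
         {:attr1 :a :attr2 :x :mv 2}
         {:attr1 :b :attr2 :y :mv 5}]
        [:attr1 :attr2])
;=> {[:a :x] 3, [:b :y] 5}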

Are there any obvious performance optimizations (e.g. transients) that
would make this function faster and use less memory? In general, what
should I watch out for when writing functions like these so that they
don't perform poorly on very large data sets?
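
For instance, I imagine a single-pass version along these lines, which
skips the intermediate grouped map and accumulates into a transient
(just a sketch, untested at scale; key-fn and :mv assumed as above):

;sum in a single pass: accumulate key->total directly,
;instead of building the grouped map first
(defn sum-by-fast [data attrs]
  (let [key-fn (apply juxt attrs)]
    (persistent!
     (reduce (fn [acc rec]
               (let [k (key-fn rec)]
                 (assoc! acc k (+ (get acc k 0) (:mv rec)))))
             (transient {})
             data))))

I have no idea whether that actually helps in practice, or whether the
cost is dominated by realizing the lazy data seq itself.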

Thanks for your help.

-- Shoeb
