Re: Working with a huge graph - how can I make Clojure performant?
Yes, that's definitely a good idea. I tried a few other things (including that, I think) after I posted that but nothing really worked and it turned out that the tail-recursive version even had a bug. I couldn't find a way to really keep the amount of copying of the data structures (stack, finished above) very low and thus my algorithm was slow. I know that the data structures are persistent and share structure but it still was slow for that many elements. I finally solved the problem by implementing an imperative solution with Java arrays and type hints. It ran in ~20-30 seconds. On Thursday, March 28, 2013 2:22:44 AM UTC+1, Stephen Compall wrote: On Mon, 2013-03-11 at 10:37 -0700, Balint Erdi wrote: (let [neighbors (persistent! (reduce (fn [c u] (if (explored u) c (conj! c u))) (transient []) (G v)))] What happens if you do ^^^ *after* vvv? (explored v) (recur vs explored lhalf rhalf (inc iter-cnt)) -- Stephen Compall ^aCollection allSatisfy: [:each | aCondition]: less is better than -- -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups Clojure group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.
Re: Working with a huge graph - how can I make Clojure performant?
Perhaps for inspiration have a look at Christophe Grand's implementation of Tarjan's algorithmhttp://clj-me.cgrand.net/2013/03/18/tarjans-strongly-connected-components-algorithm/(which is a more efficient version of Kosaraju's). On Thursday, March 28, 2013 12:06:45 PM UTC+1, Balint Erdi wrote: Yes, that's definitely a good idea. I tried a few other things (including that, I think) after I posted that but nothing really worked and it turned out that the tail-recursive version even had a bug. I couldn't find a way to really keep the amount of copying of the data structures (stack, finished above) very low and thus my algorithm was slow. I know that the data structures are persistent and share structure but it still was slow for that many elements. I finally solved the problem by implementing an imperative solution with Java arrays and type hints. It ran in ~20-30 seconds. On Thursday, March 28, 2013 2:22:44 AM UTC+1, Stephen Compall wrote: On Mon, 2013-03-11 at 10:37 -0700, Balint Erdi wrote: (let [neighbors (persistent! (reduce (fn [c u] (if (explored u) c (conj! c u))) (transient []) (G v)))] What happens if you do ^^^ *after* vvv? (explored v) (recur vs explored lhalf rhalf (inc iter-cnt)) -- Stephen Compall ^aCollection allSatisfy: [:each | aCondition]: less is better than -- -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups Clojure group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.
Re: Working with a huge graph - how can I make Clojure performant?
In fact I did. At first glance it seemed like it would have the same issues as my algorithm for really large graphs. However, there is no certainty without actually trying so I might give it a go with the huge graph. Thank you for bringing it up. On Thursday, March 28, 2013 3:29:01 PM UTC+1, Niels van Klaveren wrote: Perhaps for inspiration have a look at Christophe Grand's implementation of Tarjan's algorithmhttp://clj-me.cgrand.net/2013/03/18/tarjans-strongly-connected-components-algorithm/(which is a more efficient version of Kosaraju's). On Thursday, March 28, 2013 12:06:45 PM UTC+1, Balint Erdi wrote: Yes, that's definitely a good idea. I tried a few other things (including that, I think) after I posted that but nothing really worked and it turned out that the tail-recursive version even had a bug. I couldn't find a way to really keep the amount of copying of the data structures (stack, finished above) very low and thus my algorithm was slow. I know that the data structures are persistent and share structure but it still was slow for that many elements. I finally solved the problem by implementing an imperative solution with Java arrays and type hints. It ran in ~20-30 seconds. On Thursday, March 28, 2013 2:22:44 AM UTC+1, Stephen Compall wrote: On Mon, 2013-03-11 at 10:37 -0700, Balint Erdi wrote: (let [neighbors (persistent! (reduce (fn [c u] (if (explored u) c (conj! c u))) (transient []) (G v)))] What happens if you do ^^^ *after* vvv? (explored v) (recur vs explored lhalf rhalf (inc iter-cnt)) -- Stephen Compall ^aCollection allSatisfy: [:each | aCondition]: less is better than -- -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups Clojure group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.
Re: Working with a huge graph - how can I make Clojure performant?
If you can't parallelize the work, the default persistent data structures are just an impediment. If, however, you could parallelize it, and adapt the algorithms towards a divide-and-conquer principle instead of the accumulator principle that is the best choice for single-threaded work, then the persistent structures are a real win. Unfortunately, in many areas there are only sequential well-known algorithms. -marko On Thursday, March 28, 2013 12:06:45 PM UTC+1, Balint Erdi wrote: Yes, that's definitely a good idea. I tried a few other things (including that, I think) after I posted that but nothing really worked and it turned out that the tail-recursive version even had a bug. I couldn't find a way to really keep the amount of copying of the data structures (stack, finished above) very low and thus my algorithm was slow. I know that the data structures are persistent and share structure but it still was slow for that many elements. I finally solved the problem by implementing an imperative solution with Java arrays and type hints. It ran in ~20-30 seconds. On Thursday, March 28, 2013 2:22:44 AM UTC+1, Stephen Compall wrote: On Mon, 2013-03-11 at 10:37 -0700, Balint Erdi wrote: (let [neighbors (persistent! (reduce (fn [c u] (if (explored u) c (conj! c u))) (transient []) (G v)))] What happens if you do ^^^ *after* vvv? (explored v) (recur vs explored lhalf rhalf (inc iter-cnt)) -- Stephen Compall ^aCollection allSatisfy: [:each | aCondition]: less is better than -- -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups Clojure group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.
Re: Working with a huge graph - how can I make Clojure performant?
Have you looked at https://github.com/jordanlewis/data.union-find ? Personally, I'd prefer it to Christophe's implementation, since his blog post seems to start with I dislike this algorithm; I also helped out a bit in writing this version. On Monday, March 11, 2013 10:37:39 AM UTC-7, Balint Erdi wrote: Hey, I got an assignment to implement an algorithm to calculate strongly connected components in a graph ( http://en.wikipedia.org/wiki/Kosaraju's_algorithm). The graph is rather big, it has ~900.000 vertices. In its first pass, it needs to do a depth-first search on the graph and calculate the finishing time for each node (the finishing time for a node is a number from 0…V-1 where V is the number of vertices). Potentially several depth-first search need to be launched (done in the finishing-times function below) to discover all components. The version I pasted below is the most performant. It discovers ~600.000 vertices (2/3 of all vertices). However, on subsequent dfs-by-finishing-times it becomes extremely slow (exploring a handful of additional nodes/second) and I'm not sure why. Here are a few things I tried to speed up the algorithm: * rewrite to a recursive function (not tail-recursive) and use lazy-seqs * define dfs-by-finishing-times in finishing-times so that the whole graph (G) does not have to be passed at each call. * use persistent data structures everywhere instead of transients (I know this would only slow things down but it did not hurt to try) * use a profiler (VisualVM) to learn where the bottlenecks are. I'm not very proficient with profilers but I could not extract any valuable information Here are the code snippets in question: (I also pasted it here: https://www.refheap.com/paste/3840cc772cc3a5b1d9c4f1db3 for better readability) (defn dfs-by-finishing-times ([G u] (dfs-by-finishing-times G u #{})) ([G u explored] (loop [[v vs :as stack] (list u), explored (transient explored), lhalf [], rhalf [], iter-cnt 0] (if (seq stack) (let [neighbors (persistent! (reduce (fn [c u] (if (explored u) c (conj! c u))) (transient []) (G v)))] (cond (explored v) (recur vs explored lhalf rhalf (inc iter-cnt)) (empty? neighbors) (recur vs (conj! explored v) (conj lhalf v) rhalf (inc iter-cnt)) :else (recur (reduce (fn [stack e] (cons e stack)) vs neighbors) (conj! explored v) lhalf (cons v rhalf) (inc iter-cnt (concat lhalf rhalf))) )) (defn finishing-times [G vertices] The first pass of Kosaraju's algorithm. Scan the transpose graph of G, and mark the finishing time for each. G should already be the transposed graph (loop [[u vs :as stack] (seq vertices) explored #{}, finished []] (if (nil? u) finished (let [path (dfs-by-finishing-times G u explored) new-explored (into explored path)] (recur (remove new-explored vs) new-explored (into finished path)) Do you have any insights into what technique I could use to speed up my algorithm? I'm pretty sure I'm missing a key point but I'm not sure what. Presumably the whole algorithm runs under 10 seconds in C# on the same graph so this is rather embarrassing :) Appreciate your help, Balint -- -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups Clojure group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.
Re: Working with a huge graph - how can I make Clojure performant?
That's quoting far out of context Alan. All Christophe says in his blog is he dislikes the statefulness of most implementations of Tarjan, and shows how this isn't needed, and can be done in a functional way. You could have stated the arguments why you think your version is superior, and it might very well be, but don't picture it like Christophe said he doesn't like his own implementation. On Thursday, March 28, 2013 7:38:41 PM UTC+1, Alan Malloy wrote: Have you looked at https://github.com/jordanlewis/data.union-find ? Personally, I'd prefer it to Christophe's implementation, since his blog post seems to start with I dislike this algorithm; I also helped out a bit in writing this version. On Monday, March 11, 2013 10:37:39 AM UTC-7, Balint Erdi wrote: Hey, I got an assignment to implement an algorithm to calculate strongly connected components in a graph ( http://en.wikipedia.org/wiki/Kosaraju's_algorithm). The graph is rather big, it has ~900.000 vertices. In its first pass, it needs to do a depth-first search on the graph and calculate the finishing time for each node (the finishing time for a node is a number from 0…V-1 where V is the number of vertices). Potentially several depth-first search need to be launched (done in the finishing-times function below) to discover all components. The version I pasted below is the most performant. It discovers ~600.000 vertices (2/3 of all vertices). However, on subsequent dfs-by-finishing-times it becomes extremely slow (exploring a handful of additional nodes/second) and I'm not sure why. Here are a few things I tried to speed up the algorithm: * rewrite to a recursive function (not tail-recursive) and use lazy-seqs * define dfs-by-finishing-times in finishing-times so that the whole graph (G) does not have to be passed at each call. * use persistent data structures everywhere instead of transients (I know this would only slow things down but it did not hurt to try) * use a profiler (VisualVM) to learn where the bottlenecks are. I'm not very proficient with profilers but I could not extract any valuable information Here are the code snippets in question: (I also pasted it here: https://www.refheap.com/paste/3840cc772cc3a5b1d9c4f1db3 for better readability) (defn dfs-by-finishing-times ([G u] (dfs-by-finishing-times G u #{})) ([G u explored] (loop [[v vs :as stack] (list u), explored (transient explored), lhalf [], rhalf [], iter-cnt 0] (if (seq stack) (let [neighbors (persistent! (reduce (fn [c u] (if (explored u) c (conj! c u))) (transient []) (G v)))] (cond (explored v) (recur vs explored lhalf rhalf (inc iter-cnt)) (empty? neighbors) (recur vs (conj! explored v) (conj lhalf v) rhalf (inc iter-cnt)) :else (recur (reduce (fn [stack e] (cons e stack)) vs neighbors) (conj! explored v) lhalf (cons v rhalf) (inc iter-cnt (concat lhalf rhalf))) )) (defn finishing-times [G vertices] The first pass of Kosaraju's algorithm. Scan the transpose graph of G, and mark the finishing time for each. G should already be the transposed graph (loop [[u vs :as stack] (seq vertices) explored #{}, finished []] (if (nil? u) finished (let [path (dfs-by-finishing-times G u explored) new-explored (into explored path)] (recur (remove new-explored vs) new-explored (into finished path)) Do you have any insights into what technique I could use to speed up my algorithm? I'm pretty sure I'm missing a key point but I'm not sure what. Presumably the whole algorithm runs under 10 seconds in C# on the same graph so this is rather embarrassing :) Appreciate your help, Balint -- -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups Clojure group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.
Re: Working with a huge graph - how can I make Clojure performant?
On Mon, 2013-03-11 at 10:37 -0700, Balint Erdi wrote: (let [neighbors (persistent! (reduce (fn [c u] (if (explored u) c (conj! c u))) (transient []) (G v)))] What happens if you do ^^^ *after* vvv? (explored v) (recur vs explored lhalf rhalf (inc iter-cnt)) -- Stephen Compall ^aCollection allSatisfy: [:each | aCondition]: less is better than -- -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups Clojure group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.
Working with a huge graph - how can I make Clojure performant?
Hey, I got an assignment to implement an algorithm to calculate strongly connected components in a graph ( http://en.wikipedia.org/wiki/Kosaraju's_algorithm). The graph is rather big, it has ~900.000 vertices. In its first pass, it needs to do a depth-first search on the graph and calculate the finishing time for each node (the finishing time for a node is a number from 0…V-1 where V is the number of vertices). Potentially several depth-first search need to be launched (done in the finishing-times function below) to discover all components. The version I pasted below is the most performant. It discovers ~600.000 vertices (2/3 of all vertices). However, on subsequent dfs-by-finishing-times it becomes extremely slow (exploring a handful of additional nodes/second) and I'm not sure why. Here are a few things I tried to speed up the algorithm: * rewrite to a recursive function (not tail-recursive) and use lazy-seqs * define dfs-by-finishing-times in finishing-times so that the whole graph (G) does not have to be passed at each call. * use persistent data structures everywhere instead of transients (I know this would only slow things down but it did not hurt to try) * use a profiler (VisualVM) to learn where the bottlenecks are. I'm not very proficient with profilers but I could not extract any valuable information Here are the code snippets in question: (I also pasted it here: https://www.refheap.com/paste/3840cc772cc3a5b1d9c4f1db3 for better readability) (defn dfs-by-finishing-times ([G u] (dfs-by-finishing-times G u #{})) ([G u explored] (loop [[v vs :as stack] (list u), explored (transient explored), lhalf [], rhalf [], iter-cnt 0] (if (seq stack) (let [neighbors (persistent! (reduce (fn [c u] (if (explored u) c (conj! c u))) (transient []) (G v)))] (cond (explored v) (recur vs explored lhalf rhalf (inc iter-cnt)) (empty? neighbors) (recur vs (conj! explored v) (conj lhalf v) rhalf (inc iter-cnt)) :else (recur (reduce (fn [stack e] (cons e stack)) vs neighbors) (conj! explored v) lhalf (cons v rhalf) (inc iter-cnt (concat lhalf rhalf))) )) (defn finishing-times [G vertices] The first pass of Kosaraju's algorithm. Scan the transpose graph of G, and mark the finishing time for each. G should already be the transposed graph (loop [[u vs :as stack] (seq vertices) explored #{}, finished []] (if (nil? u) finished (let [path (dfs-by-finishing-times G u explored) new-explored (into explored path)] (recur (remove new-explored vs) new-explored (into finished path)) Do you have any insights into what technique I could use to speed up my algorithm? I'm pretty sure I'm missing a key point but I'm not sure what. Presumably the whole algorithm runs under 10 seconds in C# on the same graph so this is rather embarrassing :) Appreciate your help, Balint -- -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups Clojure group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.