Re: Working with a huge graph - how can I make Clojure performant?

2013-03-28 Thread Balint Erdi
Yes, that's definitely a good idea. I tried a few other things (including 
that, I think) after I posted that but nothing really worked and it turned 
out that the tail-recursive version even had a bug.

I couldn't find a way to really keep the amount of copying of the data 
structures (stack, finished above) very low and thus my algorithm was slow. 
I know that the data structures are persistent and share structure but it 
still was slow for that many elements.

I finally solved the problem by implementing an imperative solution with 
Java arrays and type hints. It ran in ~20-30 seconds.

On Thursday, March 28, 2013 2:22:44 AM UTC+1, Stephen Compall wrote:

 On Mon, 2013-03-11 at 10:37 -0700, Balint Erdi wrote: 
(let [neighbors (persistent! 
 (reduce 
  (fn [c u] (if (explored u) c (conj! c u))) 
  (transient []) 
  (G v)))] 

 What happens if you do ^^^ *after* vvv? 

   (explored v) (recur vs explored lhalf rhalf (inc iter-cnt)) 

 -- 
 Stephen Compall 
 ^aCollection allSatisfy: [:each | aCondition]: less is better than 




-- 
-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
Clojure group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.




Re: Working with a huge graph - how can I make Clojure performant?

2013-03-28 Thread Niels van Klaveren
Perhaps for inspiration have a look at Christophe Grand's implementation of 
Tarjan's 
algorithmhttp://clj-me.cgrand.net/2013/03/18/tarjans-strongly-connected-components-algorithm/(which
 is a more efficient version of Kosaraju's).

On Thursday, March 28, 2013 12:06:45 PM UTC+1, Balint Erdi wrote:

 Yes, that's definitely a good idea. I tried a few other things (including 
 that, I think) after I posted that but nothing really worked and it turned 
 out that the tail-recursive version even had a bug.

 I couldn't find a way to really keep the amount of copying of the data 
 structures (stack, finished above) very low and thus my algorithm was slow. 
 I know that the data structures are persistent and share structure but it 
 still was slow for that many elements.

 I finally solved the problem by implementing an imperative solution with 
 Java arrays and type hints. It ran in ~20-30 seconds.

 On Thursday, March 28, 2013 2:22:44 AM UTC+1, Stephen Compall wrote:

 On Mon, 2013-03-11 at 10:37 -0700, Balint Erdi wrote: 
(let [neighbors (persistent! 
 (reduce 
  (fn [c u] (if (explored u) c (conj! c u))) 
  (transient []) 
  (G v)))] 

 What happens if you do ^^^ *after* vvv? 

   (explored v) (recur vs explored lhalf rhalf (inc 
 iter-cnt)) 

 -- 
 Stephen Compall 
 ^aCollection allSatisfy: [:each | aCondition]: less is better than 




-- 
-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
Clojure group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.




Re: Working with a huge graph - how can I make Clojure performant?

2013-03-28 Thread Balint Erdi
In fact I did. At first glance it seemed like it would have the same issues 
as my algorithm for really large graphs. However, there is no certainty 
without actually trying so I might give it a go with the huge graph.

Thank you for bringing it up.

On Thursday, March 28, 2013 3:29:01 PM UTC+1, Niels van Klaveren wrote:

 Perhaps for inspiration have a look at Christophe Grand's implementation 
 of Tarjan's 
 algorithmhttp://clj-me.cgrand.net/2013/03/18/tarjans-strongly-connected-components-algorithm/(which
  is a more efficient version of Kosaraju's).

 On Thursday, March 28, 2013 12:06:45 PM UTC+1, Balint Erdi wrote:

 Yes, that's definitely a good idea. I tried a few other things (including 
 that, I think) after I posted that but nothing really worked and it turned 
 out that the tail-recursive version even had a bug.

 I couldn't find a way to really keep the amount of copying of the data 
 structures (stack, finished above) very low and thus my algorithm was slow. 
 I know that the data structures are persistent and share structure but it 
 still was slow for that many elements.

 I finally solved the problem by implementing an imperative solution with 
 Java arrays and type hints. It ran in ~20-30 seconds.

 On Thursday, March 28, 2013 2:22:44 AM UTC+1, Stephen Compall wrote:

 On Mon, 2013-03-11 at 10:37 -0700, Balint Erdi wrote: 
(let [neighbors (persistent! 
 (reduce 
  (fn [c u] (if (explored u) c (conj! c u))) 
  (transient []) 
  (G v)))] 

 What happens if you do ^^^ *after* vvv? 

   (explored v) (recur vs explored lhalf rhalf (inc 
 iter-cnt)) 

 -- 
 Stephen Compall 
 ^aCollection allSatisfy: [:each | aCondition]: less is better than 




-- 
-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
Clojure group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.




Re: Working with a huge graph - how can I make Clojure performant?

2013-03-28 Thread Marko Topolnik
If you can't parallelize the work, the default persistent data structures 
are just an impediment. If, however, you could parallelize it, and adapt 
the algorithms towards a divide-and-conquer principle instead of the 
accumulator principle that is the best choice for single-threaded work, 
then the persistent structures are a real win. Unfortunately, in many areas 
there are only sequential well-known algorithms.

-marko


On Thursday, March 28, 2013 12:06:45 PM UTC+1, Balint Erdi wrote:

 Yes, that's definitely a good idea. I tried a few other things (including 
 that, I think) after I posted that but nothing really worked and it turned 
 out that the tail-recursive version even had a bug.

 I couldn't find a way to really keep the amount of copying of the data 
 structures (stack, finished above) very low and thus my algorithm was slow. 
 I know that the data structures are persistent and share structure but it 
 still was slow for that many elements.

 I finally solved the problem by implementing an imperative solution with 
 Java arrays and type hints. It ran in ~20-30 seconds.

 On Thursday, March 28, 2013 2:22:44 AM UTC+1, Stephen Compall wrote:

 On Mon, 2013-03-11 at 10:37 -0700, Balint Erdi wrote: 
(let [neighbors (persistent! 
 (reduce 
  (fn [c u] (if (explored u) c (conj! c u))) 
  (transient []) 
  (G v)))] 

 What happens if you do ^^^ *after* vvv? 

   (explored v) (recur vs explored lhalf rhalf (inc 
 iter-cnt)) 

 -- 
 Stephen Compall 
 ^aCollection allSatisfy: [:each | aCondition]: less is better than 




-- 
-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
Clojure group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.




Re: Working with a huge graph - how can I make Clojure performant?

2013-03-28 Thread Alan Malloy
Have you looked at https://github.com/jordanlewis/data.union-find ? 
Personally, I'd prefer it to Christophe's implementation, since his blog 
post seems to start with I dislike this algorithm; I also helped out a 
bit in writing this version.

On Monday, March 11, 2013 10:37:39 AM UTC-7, Balint Erdi wrote:

 Hey,


 I got an assignment to implement an algorithm to calculate strongly 
 connected components in a graph (
 http://en.wikipedia.org/wiki/Kosaraju's_algorithm). The graph is rather 
 big, it has ~900.000 vertices.


 In its first pass, it needs to do a depth-first search on the graph and 
 calculate the finishing time for each node (the finishing time for a node 
 is a number from 0…V-1 where V is the number of vertices). Potentially 
 several depth-first search need to be launched (done in the finishing-times 
 function below) to discover all components.


 The version I pasted below is the most performant. It discovers ~600.000 
 vertices (2/3 of all vertices). However, on subsequent 
 dfs-by-finishing-times it becomes extremely slow (exploring a handful of 
 additional nodes/second) and I'm not sure why.


 Here are a few things I tried to speed up the algorithm:


 * rewrite to a recursive function (not tail-recursive) and use lazy-seqs

 * define dfs-by-finishing-times in finishing-times so that the whole graph 
 (G) does not have to be passed at each call.

 * use persistent data structures everywhere instead of transients (I know 
 this would only slow things down but it did not hurt to try)

 * use a profiler (VisualVM) to learn where the bottlenecks are. I'm not 
 very proficient with profilers but I could not extract any valuable 
 information


 Here are the code snippets in question:

 (I also pasted it here: 
 https://www.refheap.com/paste/3840cc772cc3a5b1d9c4f1db3 for better 
 readability)

 (defn dfs-by-finishing-times

   ([G u]

  (dfs-by-finishing-times G u #{}))

   ([G u explored]

  (loop [[v  vs :as stack] (list u), explored (transient explored), 
 lhalf [], rhalf [],  iter-cnt 0]

  (if (seq stack)

   (let [neighbors (persistent!

(reduce

 (fn [c u] (if (explored u) c (conj! c u)))

 (transient [])

 (G v)))]

 (cond

  (explored v) (recur vs explored lhalf rhalf (inc iter-cnt))

  (empty? neighbors) (recur vs (conj! explored v) (conj lhalf 
 v) rhalf (inc iter-cnt))

  :else (recur (reduce (fn [stack e] (cons e stack)) vs 
 neighbors)

(conj! explored v)

lhalf

(cons v rhalf)

(inc iter-cnt

   (concat lhalf rhalf)))

  ))


 (defn finishing-times [G vertices]

   The first pass of Kosaraju's algorithm.

Scan the transpose graph of G, and mark the finishing time for each.

G should already be the transposed graph

   (loop [[u  vs :as stack] (seq vertices)

   explored #{},

   finished []]

  (if (nil? u)

finished

(let [path (dfs-by-finishing-times G u explored)

  new-explored (into explored path)]

  (recur (remove new-explored vs)

 new-explored

 (into finished path))

 Do you have any insights into what technique I could use to speed up my 
 algorithm? I'm pretty sure I'm missing a key point but I'm not sure 
 what. Presumably the whole algorithm runs under 10 seconds in C# on the 
 same graph so this is rather embarrassing :)

 Appreciate your help,

 Balint


-- 
-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
Clojure group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.




Re: Working with a huge graph - how can I make Clojure performant?

2013-03-28 Thread Niels van Klaveren
That's quoting far out of context Alan. All Christophe says in his blog is 
he dislikes the statefulness of most implementations of Tarjan, and shows 
how this isn't needed, and can be done in a functional way.

You could have stated the arguments why you think your version is superior, 
and it might very well be, but don't picture it like Christophe said he 
doesn't like his own implementation.

On Thursday, March 28, 2013 7:38:41 PM UTC+1, Alan Malloy wrote:

 Have you looked at https://github.com/jordanlewis/data.union-find ? 
 Personally, I'd prefer it to Christophe's implementation, since his blog 
 post seems to start with I dislike this algorithm; I also helped out a 
 bit in writing this version.

 On Monday, March 11, 2013 10:37:39 AM UTC-7, Balint Erdi wrote:

 Hey,


 I got an assignment to implement an algorithm to calculate strongly 
 connected components in a graph (
 http://en.wikipedia.org/wiki/Kosaraju's_algorithm). The graph is rather 
 big, it has ~900.000 vertices.


 In its first pass, it needs to do a depth-first search on the graph and 
 calculate the finishing time for each node (the finishing time for a node 
 is a number from 0…V-1 where V is the number of vertices). Potentially 
 several depth-first search need to be launched (done in the finishing-times 
 function below) to discover all components.


 The version I pasted below is the most performant. It discovers ~600.000 
 vertices (2/3 of all vertices). However, on subsequent 
 dfs-by-finishing-times it becomes extremely slow (exploring a handful of 
 additional nodes/second) and I'm not sure why.


 Here are a few things I tried to speed up the algorithm:


 * rewrite to a recursive function (not tail-recursive) and use lazy-seqs

 * define dfs-by-finishing-times in finishing-times so that the whole 
 graph (G) does not have to be passed at each call.

 * use persistent data structures everywhere instead of transients (I know 
 this would only slow things down but it did not hurt to try)

 * use a profiler (VisualVM) to learn where the bottlenecks are. I'm not 
 very proficient with profilers but I could not extract any valuable 
 information


 Here are the code snippets in question:

 (I also pasted it here: 
 https://www.refheap.com/paste/3840cc772cc3a5b1d9c4f1db3 for better 
 readability)

 (defn dfs-by-finishing-times

   ([G u]

  (dfs-by-finishing-times G u #{}))

   ([G u explored]

  (loop [[v  vs :as stack] (list u), explored (transient explored), 
 lhalf [], rhalf [],  iter-cnt 0]

  (if (seq stack)

   (let [neighbors (persistent!

(reduce

 (fn [c u] (if (explored u) c (conj! c u)))

 (transient [])

 (G v)))]

 (cond

  (explored v) (recur vs explored lhalf rhalf (inc iter-cnt))

  (empty? neighbors) (recur vs (conj! explored v) (conj lhalf 
 v) rhalf (inc iter-cnt))

  :else (recur (reduce (fn [stack e] (cons e stack)) vs 
 neighbors)

(conj! explored v)

lhalf

(cons v rhalf)

(inc iter-cnt

   (concat lhalf rhalf)))

  ))


 (defn finishing-times [G vertices]

   The first pass of Kosaraju's algorithm.

Scan the transpose graph of G, and mark the finishing time for each.

G should already be the transposed graph

   (loop [[u  vs :as stack] (seq vertices)

   explored #{},

   finished []]

  (if (nil? u)

finished

(let [path (dfs-by-finishing-times G u explored)

  new-explored (into explored path)]

  (recur (remove new-explored vs)

 new-explored

 (into finished path))

 Do you have any insights into what technique I could use to speed up my 
 algorithm? I'm pretty sure I'm missing a key point but I'm not sure 
 what. Presumably the whole algorithm runs under 10 seconds in C# on the 
 same graph so this is rather embarrassing :)

 Appreciate your help,

 Balint



-- 
-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
Clojure group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.




Re: Working with a huge graph - how can I make Clojure performant?

2013-03-27 Thread Stephen Compall
On Mon, 2013-03-11 at 10:37 -0700, Balint Erdi wrote:
   (let [neighbors (persistent!
(reduce
 (fn [c u] (if (explored u) c (conj! c u)))
 (transient [])
 (G v)))]

What happens if you do ^^^ *after* vvv?

  (explored v) (recur vs explored lhalf rhalf (inc iter-cnt))

-- 
Stephen Compall
^aCollection allSatisfy: [:each | aCondition]: less is better than


-- 
-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
Clojure group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.




Working with a huge graph - how can I make Clojure performant?

2013-03-11 Thread Balint Erdi
 

Hey,


I got an assignment to implement an algorithm to calculate strongly 
connected components in a graph (
http://en.wikipedia.org/wiki/Kosaraju's_algorithm). The graph is rather 
big, it has ~900.000 vertices.


In its first pass, it needs to do a depth-first search on the graph and 
calculate the finishing time for each node (the finishing time for a node 
is a number from 0…V-1 where V is the number of vertices). Potentially 
several depth-first search need to be launched (done in the finishing-times 
function below) to discover all components.


The version I pasted below is the most performant. It discovers ~600.000 
vertices (2/3 of all vertices). However, on subsequent 
dfs-by-finishing-times it becomes extremely slow (exploring a handful of 
additional nodes/second) and I'm not sure why.


Here are a few things I tried to speed up the algorithm:


* rewrite to a recursive function (not tail-recursive) and use lazy-seqs

* define dfs-by-finishing-times in finishing-times so that the whole graph 
(G) does not have to be passed at each call.

* use persistent data structures everywhere instead of transients (I know 
this would only slow things down but it did not hurt to try)

* use a profiler (VisualVM) to learn where the bottlenecks are. I'm not 
very proficient with profilers but I could not extract any valuable 
information


Here are the code snippets in question:

(I also pasted it 
here: https://www.refheap.com/paste/3840cc772cc3a5b1d9c4f1db3 for better 
readability)

(defn dfs-by-finishing-times

  ([G u]

 (dfs-by-finishing-times G u #{}))

  ([G u explored]

 (loop [[v  vs :as stack] (list u), explored (transient explored), 
lhalf [], rhalf [],  iter-cnt 0]

 (if (seq stack)

  (let [neighbors (persistent!

   (reduce

(fn [c u] (if (explored u) c (conj! c u)))

(transient [])

(G v)))]

(cond

 (explored v) (recur vs explored lhalf rhalf (inc iter-cnt))

 (empty? neighbors) (recur vs (conj! explored v) (conj lhalf v) 
rhalf (inc iter-cnt))

 :else (recur (reduce (fn [stack e] (cons e stack)) vs 
neighbors)

   (conj! explored v)

   lhalf

   (cons v rhalf)

   (inc iter-cnt

  (concat lhalf rhalf)))

 ))


(defn finishing-times [G vertices]

  The first pass of Kosaraju's algorithm.

   Scan the transpose graph of G, and mark the finishing time for each.

   G should already be the transposed graph

  (loop [[u  vs :as stack] (seq vertices)

  explored #{},

  finished []]

 (if (nil? u)

   finished

   (let [path (dfs-by-finishing-times G u explored)

 new-explored (into explored path)]

 (recur (remove new-explored vs)

new-explored

(into finished path))

Do you have any insights into what technique I could use to speed up my 
algorithm? I'm pretty sure I'm missing a key point but I'm not sure 
what. Presumably the whole algorithm runs under 10 seconds in C# on the 
same graph so this is rather embarrassing :)

Appreciate your help,

Balint

-- 
-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
Clojure group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.