Re: Loading a huge graph
Thanks, having a C++ background, there is definitely a lot to learn about the JVM. I'm wrote a script in Python that ran in about 10-12s and used up to 320MB of memory. I'm running a clojure repl from Emacs via jack-in. What's the best way to adjust the heap size? swank-clojure and clojure-mode don't say. If I spin up the repl via lein swank, I can use the LEIN_JVM_OPTS, however it affects only one of the JVMs started by lein swank, the one with jline. Any advice? :) 2012/4/13 Alex Robbins alexander.j.robb...@gmail.com Yeah, sounds like it could definitely be a memory issue. This is one part where the JVM works a lot differently than I expected coming from a python background. Everybody may already know this, but the JVM only takes 64mb for the heap by default. You'll get an out of memory error if your program uses more than that. In contrast, python just takes all the memory it needs. As your program gets closer to the JVM memory limit it'll spend more and more time doing garbage collection, with less and less real work getting done. You can pass an -Xmx flag to give java access to more memory, which many (most?) programs do. On Thu, Apr 12, 2012 at 5:22 PM, David Nolen dnolen.li...@gmail.com wrote: How much memory do Python Go consume when you do this? Are you giving the JVM enough memory? On Thu, Apr 12, 2012 at 6:17 PM, László Török ltoro...@gmail.com wrote: Hi, I'm trying figure out how to load a huge file that contains some 800k pair of integers (two integers per line) which represent edges of a directed graph. So if the ith line has x and y, it means that there is an edge between x and y vertex in the graph. The goal is to load it in an array of arrays representation, where the kth array contains all the nodes, where there is a directed edge from the kth node to those nodes. I've attempted multiple variants of with-open reader and line-seq etc. but almost always ended up with OutMemoryException or sg VERY slow. My latest attempt that also does not work on the large input: (defn load-graph [input-f] (with-open [rdr (io/reader input-f)] (- (line-seq rdr) (map (fn [row] (let [[v1str v2str] (str/split row #\s)] [ (Integer/parseInt v1str) (Integer/parseInt v2str) ])) ) (reduce (fn [G [v1 v2]] (if-let [vs (get G v1)] (update-in G [v1] #(conj % v2)) (assoc G v1 [v2]))) { } I'm getting a bit frustrated as there are Python, Go implementations that load the graph in less the 5 seconds. What am I doing wrong? Thanks -- László Török -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -- László Török -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en
Re: Loading a huge graph
I have this option in my project.clj file, which does the trick if you are developing from emacs+swank+clojure-jack-in, and using large networks :jvm-opts [-Xmx4000m] And yes, one of the things to do when working with the jvm is learning how to use jconsole or visualvm to see why your program is slow :-), and know how to adapt the jvm options. If I remember correctly, java 1.7 has an option for a more flexible heap size, which might be a nice default. On Friday, 13 April 2012 10:32:24 UTC+2, Las wrote: Thanks, having a C++ background, there is definitely a lot to learn about the JVM. I'm wrote a script in Python that ran in about 10-12s and used up to 320MB of memory. I'm running a clojure repl from Emacs via jack-in. What's the best way to adjust the heap size? swank-clojure and clojure-mode don't say. If I spin up the repl via lein swank, I can use the LEIN_JVM_OPTS, however it affects only one of the JVMs started by lein swank, the one with jline. Any advice? :) 2012/4/13 Alex Robbins alexander.j.robb...@gmail.com Yeah, sounds like it could definitely be a memory issue. This is one part where the JVM works a lot differently than I expected coming from a python background. Everybody may already know this, but the JVM only takes 64mb for the heap by default. You'll get an out of memory error if your program uses more than that. In contrast, python just takes all the memory it needs. As your program gets closer to the JVM memory limit it'll spend more and more time doing garbage collection, with less and less real work getting done. You can pass an -Xmx flag to give java access to more memory, which many (most?) programs do. On Thu, Apr 12, 2012 at 5:22 PM, David Nolen dnolen.li...@gmail.com wrote: How much memory do Python Go consume when you do this? Are you giving the JVM enough memory? On Thu, Apr 12, 2012 at 6:17 PM, László Török ltoro...@gmail.com wrote: Hi, I'm trying figure out how to load a huge file that contains some 800k pair of integers (two integers per line) which represent edges of a directed graph. So if the ith line has x and y, it means that there is an edge between x and y vertex in the graph. The goal is to load it in an array of arrays representation, where the kth array contains all the nodes, where there is a directed edge from the kth node to those nodes. I've attempted multiple variants of with-open reader and line-seq etc. but almost always ended up with OutMemoryException or sg VERY slow. My latest attempt that also does not work on the large input: (defn load-graph [input-f] (with-open [rdr (io/reader input-f)] (- (line-seq rdr) (map (fn [row] (let [[v1str v2str] (str/split row #\s)] [ (Integer/parseInt v1str) (Integer/parseInt v2str) ])) ) (reduce (fn [G [v1 v2]] (if-let [vs (get G v1)] (update-in G [v1] #(conj % v2)) (assoc G v1 [v2]))) { } I'm getting a bit frustrated as there are Python, Go implementations that load the graph in less the 5 seconds. What am I doing wrong? Thanks -- László Török -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -- László Török -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en
Re: Loading a huge graph
Excellent! Thank you, that did the trick! 2012/4/13 bOR_ boris.sch...@gmail.com I have this option in my project.clj file, which does the trick if you are developing from emacs+swank+clojure-jack-in, and using large networks :jvm-opts [-Xmx4000m] And yes, one of the things to do when working with the jvm is learning how to use jconsole or visualvm to see why your program is slow :-), and know how to adapt the jvm options. If I remember correctly, java 1.7 has an option for a more flexible heap size, which might be a nice default. On Friday, 13 April 2012 10:32:24 UTC+2, Las wrote: Thanks, having a C++ background, there is definitely a lot to learn about the JVM. I'm wrote a script in Python that ran in about 10-12s and used up to 320MB of memory. I'm running a clojure repl from Emacs via jack-in. What's the best way to adjust the heap size? swank-clojure and clojure-mode don't say. If I spin up the repl via lein swank, I can use the LEIN_JVM_OPTS, however it affects only one of the JVMs started by lein swank, the one with jline. Any advice? :) 2012/4/13 Alex Robbins alexander.j.robb...@gmail.com** Yeah, sounds like it could definitely be a memory issue. This is one part where the JVM works a lot differently than I expected coming from a python background. Everybody may already know this, but the JVM only takes 64mb for the heap by default. You'll get an out of memory error if your program uses more than that. In contrast, python just takes all the memory it needs. As your program gets closer to the JVM memory limit it'll spend more and more time doing garbage collection, with less and less real work getting done. You can pass an -Xmx flag to give java access to more memory, which many (most?) programs do. On Thu, Apr 12, 2012 at 5:22 PM, David Nolen dnolen.li...@gmail.com wrote: How much memory do Python Go consume when you do this? Are you giving the JVM enough memory? On Thu, Apr 12, 2012 at 6:17 PM, László Török ltoro...@gmail.com wrote: Hi, I'm trying figure out how to load a huge file that contains some 800k pair of integers (two integers per line) which represent edges of a directed graph. So if the ith line has x and y, it means that there is an edge between x and y vertex in the graph. The goal is to load it in an array of arrays representation, where the kth array contains all the nodes, where there is a directed edge from the kth node to those nodes. I've attempted multiple variants of with-open reader and line-seq etc. but almost always ended up with OutMemoryException or sg VERY slow. My latest attempt that also does not work on the large input: (defn load-graph [input-f] (with-open [rdr (io/reader input-f)] (- (line-seq rdr) (map (fn [row] (let [[v1str v2str] (str/split row #\s)] [ (Integer/parseInt v1str) (Integer/parseInt v2str) ])) ) (reduce (fn [G [v1 v2]] (if-let [vs (get G v1)] (update-in G [v1] #(conj % v2)) (assoc G v1 [v2]))) { } I'm getting a bit frustrated as there are Python, Go implementations that load the graph in less the 5 seconds. What am I doing wrong? Thanks -- László Török -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscribe@**googlegroups.comclojure%2bunsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/**group/clojure?hl=enhttp://groups.google.com/group/clojure?hl=en -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscribe@**googlegroups.comclojure%2bunsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/**group/clojure?hl=enhttp://groups.google.com/group/clojure?hl=en -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscribe@**googlegroups.comclojure%2bunsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/**group/clojure?hl=enhttp://groups.google.com/group/clojure?hl=en -- László Török -- You received this message because you are subscribed to the Google Groups Clojure
Re: Loading a huge graph
If the jvm does have enough memory, you may want to try building up the map using a transient. And not sure if this is faster, (maybe it's slower), but you can spell the function you pass to reduce more succinctly: (fn [G [v1 v2]] (update-in G [v1] (fnil conj []) v2)) Robert On Thu, Apr 12, 2012 at 06:22:34PM -0400, David Nolen wrote: How much memory do Python Go consume when you do this? Are you giving the JVM enough memory? On Thu, Apr 12, 2012 at 6:17 PM, László Török ltoro...@gmail.com wrote: Hi, I'm trying figure out how to load a huge file that contains some 800k pair of integers (two integers per line) which represent edges of a directed graph. So if the ith line has x and y, it means that there is an edge between x and y vertex in the graph. The goal is to load it in an array of arrays representation, where the kth array contains all the nodes, where there is a directed edge from the kth node to those nodes. I've attempted multiple variants of with-open reader and line-seq etc. but almost always ended up with OutMemoryException or sg VERY slow. My latest attempt that also does not work on the large input: (defn load-graph [input-f] (with-open [rdr (io/reader input-f)] (- (line-seq rdr) (map (fn [row] (let [[v1str v2str] (str/split row #\s)] [ (Integer/parseInt v1str) (Integer/parseInt v2str) ])) ) (reduce (fn [G [v1 v2]] (if-let [vs (get G v1)] (update-in G [v1] #(conj % v2)) (assoc G v1 [v2]))) { } I'm getting a bit frustrated as there are Python, Go implementations that load the graph in less the 5 seconds. What am I doing wrong? Thanks -- László Török -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en
Re: Loading a huge graph
Thanks, I know about transients, but I'm already using mutable arrays for speed :-) On Apr 13, 2012 8:01 PM, Robert Marianski r...@marianski.com wrote: If the jvm does have enough memory, you may want to try building up the map using a transient. And not sure if this is faster, (maybe it's slower), but you can spell the function you pass to reduce more succinctly: (fn [G [v1 v2]] (update-in G [v1] (fnil conj []) v2)) Robert On Thu, Apr 12, 2012 at 06:22:34PM -0400, David Nolen wrote: How much memory do Python Go consume when you do this? Are you giving the JVM enough memory? On Thu, Apr 12, 2012 at 6:17 PM, László Török ltoro...@gmail.com wrote: Hi, I'm trying figure out how to load a huge file that contains some 800k pair of integers (two integers per line) which represent edges of a directed graph. So if the ith line has x and y, it means that there is an edge between x and y vertex in the graph. The goal is to load it in an array of arrays representation, where the kth array contains all the nodes, where there is a directed edge from the kth node to those nodes. I've attempted multiple variants of with-open reader and line-seq etc. but almost always ended up with OutMemoryException or sg VERY slow. My latest attempt that also does not work on the large input: (defn load-graph [input-f] (with-open [rdr (io/reader input-f)] (- (line-seq rdr) (map (fn [row] (let [[v1str v2str] (str/split row #\s)] [ (Integer/parseInt v1str) (Integer/parseInt v2str) ])) ) (reduce (fn [G [v1 v2]] (if-let [vs (get G v1)] (update-in G [v1] #(conj % v2)) (assoc G v1 [v2]))) { } I'm getting a bit frustrated as there are Python, Go implementations that load the graph in less the 5 seconds. What am I doing wrong? Thanks -- László Török -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en
Loading a huge graph
Hi, I'm trying figure out how to load a huge file that contains some 800k pair of integers (two integers per line) which represent edges of a directed graph. So if the ith line has x and y, it means that there is an edge between x and y vertex in the graph. The goal is to load it in an array of arrays representation, where the kth array contains all the nodes, where there is a directed edge from the kth node to those nodes. I've attempted multiple variants of with-open reader and line-seq etc. but almost always ended up with OutMemoryException or sg VERY slow. My latest attempt that also does not work on the large input: (defn load-graph [input-f] (with-open [rdr (io/reader input-f)] (- (line-seq rdr) (map (fn [row] (let [[v1str v2str] (str/split row #\s)] [ (Integer/parseInt v1str) (Integer/parseInt v2str) ])) ) (reduce (fn [G [v1 v2]] (if-let [vs (get G v1)] (update-in G [v1] #(conj % v2)) (assoc G v1 [v2]))) { } I'm getting a bit frustrated as there are Python, Go implementations that load the graph in less the 5 seconds. What am I doing wrong? Thanks -- László Török -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en
Re: Loading a huge graph
How much memory do Python Go consume when you do this? Are you giving the JVM enough memory? On Thu, Apr 12, 2012 at 6:17 PM, László Török ltoro...@gmail.com wrote: Hi, I'm trying figure out how to load a huge file that contains some 800k pair of integers (two integers per line) which represent edges of a directed graph. So if the ith line has x and y, it means that there is an edge between x and y vertex in the graph. The goal is to load it in an array of arrays representation, where the kth array contains all the nodes, where there is a directed edge from the kth node to those nodes. I've attempted multiple variants of with-open reader and line-seq etc. but almost always ended up with OutMemoryException or sg VERY slow. My latest attempt that also does not work on the large input: (defn load-graph [input-f] (with-open [rdr (io/reader input-f)] (- (line-seq rdr) (map (fn [row] (let [[v1str v2str] (str/split row #\s)] [ (Integer/parseInt v1str) (Integer/parseInt v2str) ])) ) (reduce (fn [G [v1 v2]] (if-let [vs (get G v1)] (update-in G [v1] #(conj % v2)) (assoc G v1 [v2]))) { } I'm getting a bit frustrated as there are Python, Go implementations that load the graph in less the 5 seconds. What am I doing wrong? Thanks -- László Török -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en
Re: Loading a huge graph
Yeah, sounds like it could definitely be a memory issue. This is one part where the JVM works a lot differently than I expected coming from a python background. Everybody may already know this, but the JVM only takes 64mb for the heap by default. You'll get an out of memory error if your program uses more than that. In contrast, python just takes all the memory it needs. As your program gets closer to the JVM memory limit it'll spend more and more time doing garbage collection, with less and less real work getting done. You can pass an -Xmx flag to give java access to more memory, which many (most?) programs do. On Thu, Apr 12, 2012 at 5:22 PM, David Nolen dnolen.li...@gmail.com wrote: How much memory do Python Go consume when you do this? Are you giving the JVM enough memory? On Thu, Apr 12, 2012 at 6:17 PM, László Török ltoro...@gmail.com wrote: Hi, I'm trying figure out how to load a huge file that contains some 800k pair of integers (two integers per line) which represent edges of a directed graph. So if the ith line has x and y, it means that there is an edge between x and y vertex in the graph. The goal is to load it in an array of arrays representation, where the kth array contains all the nodes, where there is a directed edge from the kth node to those nodes. I've attempted multiple variants of with-open reader and line-seq etc. but almost always ended up with OutMemoryException or sg VERY slow. My latest attempt that also does not work on the large input: (defn load-graph [input-f] (with-open [rdr (io/reader input-f)] (- (line-seq rdr) (map (fn [row] (let [[v1str v2str] (str/split row #\s)] [ (Integer/parseInt v1str) (Integer/parseInt v2str) ])) ) (reduce (fn [G [v1 v2]] (if-let [vs (get G v1)] (update-in G [v1] #(conj % v2)) (assoc G v1 [v2]))) { } I'm getting a bit frustrated as there are Python, Go implementations that load the graph in less the 5 seconds. What am I doing wrong? Thanks -- László Török -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en