Re: doseq vs dorun
Hi,

On Friday, October 18, 2013 12:12:31 AM UTC+2, Brian Craft wrote:

I briefly tried working with the reducers library, which generally made things 2-3 times slower, presumably because I'm using it incorrectly. I would really like to see more reducers examples, e.g. for this case: reading a seq larger than memory, doing transforms on the data, and then executing side effects.

I used reducers for processing lots of XML files. Probably the most common pitfall is that fold only does parallel computation when working on a vector. While all the XML data would not have fit into memory, the vector of filenames to read from certainly did, and that made a big difference. Plus, I reduced the chunk size from the default of 512 to 1.

Cheers,
Stefan

--
--
You received this message because you are subscribed to the Google Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups Clojure group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
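For readers looking for the kind of reducers example asked for above, here is a minimal sketch of the "fold over a vector of filenames" idea. It is not Stefan's actual code: parse-file and the file names are hypothetical stand-ins (parse-file just returns the length of the name so the sketch runs anywhere), and the only thing taken from the message is the partition size of 1 and the use of a vector as input.

```clojure
(require '[clojure.core.reducers :as r])

;; Hypothetical stand-in for parsing one XML file. A real version would
;; open and process the file; this one returns the length of the name.
(defn parse-file [filename]
  (count filename))

;; The input must be a vector (or another foldable collection) for
;; r/fold to run in parallel; on a lazy seq it falls back to plain reduce.
(def filenames ["a.xml" "bb.xml" "ccc.xml"])

(def total
  (r/fold 1                                    ; partition size (default is 512)
          +                                    ; combinef: merges partial results, (+) is the seed
          (fn [acc f] (+ acc (parse-file f)))  ; reducef: folds one file into a partial result
          filenames))
```

A partition size of 1 makes sense here only because each element stands for a whole file of work; for cheap per-element work the default of 512 is probably the better starting point.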
Re: doseq vs dorun
Hi All,

Thank you so much for your replies! For my particular use case (tail -f multiple files and write the entries into a db), I'm using pmap to process each file in a separate thread, and for each file I'm using doseq to write to the db. It seems to be working well (though I still need to benchmark it). Thanks to your help, I have a better understanding of how doseq, dorun, et al. work.

On Friday, October 18, 2013 12:05:50 AM UTC-7, Stefan Kamphausen wrote:

I used reducers for processing lots of XML files. Probably the most common pitfall is that fold only does parallel computation when working on a vector. While all the XML data would not have fit into memory, the vector of filenames to read from certainly did, and that made a big difference. Plus, I reduced the chunk size from the default of 512 to 1.
Re: doseq vs dorun
Ideally, you wouldn't be using a side effect at all, but something like reducers to return a single computed result after going over the sequence. (If the input is too big for main memory, you'd also need to partition the input seq into reducible-collection chunks small enough to fit in memory.)

If side effects are necessary because you're doing I/O for each element of the seq, then the overhead of wrapping in pmap is probably minimal, as the task is I/O-bound, but the benefit of pmap may not be significant either. Threaded I/O is generally only useful for:

1. preventing I/O from bottlenecking a CPU-bound task, by splitting them into separate threads; and
2. networking with many remote hosts, so you can usefully do something with host B while waiting for a response from host A, or with one remote host where latency and task orthogonality make several parallel interactions preferable to several sequential ones (e.g. a web browser loading images several at a time from a web server when the throughput is high but so is the latency).

If side effects are necessary because you're interacting with a legacy Java API that uses mutable state, you might want to look into pvalues and pcalls.

On Wed, Oct 16, 2013 at 10:34 PM, Pradeep Gollakota pradeep...@gmail.com wrote:

Hi All,

I’m (very) new to Clojure (and loving it)… and I’m trying to wrap my head around how to correctly choose doseq vs dorun for my particular use case. I’ve read this earlier post https://groups.google.com/forum/#!msg/clojure/8ebJsllH8UY/mXtixH3CRRsJ and I had a clarifying question.

From what I gathered in the above post, it’s more efficient to use doseq instead of dorun, since map creates another seq. However, if the fn you want to apply on the seq can be parallelized, doseq wouldn’t give you the ability to parallelize. With dorun you can use pmap instead of map and get parallelization.

    (doseq [i some-lazy-seq] (side-effect-fn i))
    (dorun (pmap side-effect-fn some-lazy-seq))

What is the idiomatic way of parallelizing a computation on a lazy seq?

Thanks,
Pradeep
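To make the two forms from the question, and the pcalls suggestion from the reply, concrete, here is a small runnable sketch. The atom named seen and this particular side-effect-fn are assumptions made up for illustration; in the thread the side effect is file or db I/O.

```clojure
;; Hypothetical side effect: record each item in an atom.
(def seen (atom []))

(defn side-effect-fn [x]
  (swap! seen conj x))

;; Sequential: doseq walks the seq and calls the fn for each element.
(doseq [i (range 5)]
  (side-effect-fn i))

;; Parallel: pmap returns a lazy seq, so dorun is needed to force it;
;; dorun discards the results and does not hold on to the head.
(dorun (pmap side-effect-fn (range 5 10)))

;; pcalls takes no-argument fns and runs them in parallel, also lazily.
(dorun (pcalls #(side-effect-fn :a) #(side-effect-fn :b)))
```

Note that pmap and pcalls run on the future thread pool, so a standalone script may need (shutdown-agents) at the very end to exit promptly.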
Re: doseq vs dorun
Hi,

What is the idiomatic way of parallelizing a computation on a lazy seq?

Keep in mind that pmap processes the seq lazily, with a moving window whose size depends on the number of available cores on your machine. If processing one element takes a long time, the parallel work will wait for it to finish before moving on. Thus, pmap may be an easy way to achieve parallel processing, but it is only suited to problems whose elements each take approximately the same time.

Stefan
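One way to see the consequence of this is that pmap hands results back in input order, so a consumer cannot get past a slow element even when later ones are already done. A minimal sketch, where slow-task and the sleep times are hypothetical:

```clojure
;; Hypothetical task: sleep for `ms` milliseconds, then return `ms`.
(defn slow-task [ms]
  (Thread/sleep (long ms))
  ms)

;; The 10 ms and 20 ms tasks finish first, but the results still come
;; back in input order, so the consumer waits on the 30 ms task.
(def results (doall (pmap slow-task [30 10 20])))
```

With tasks of very uneven duration, this ordering combined with the bounded window is what leaves cores idle.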
Re: doseq vs dorun
On Thursday, 17 October 2013 10:34:18 UTC+8, Pradeep Gollakota wrote:

With dorun you can use pmap instead of map and get parallelization.

    (doseq [i some-lazy-seq] (side-effect-fn i))
    (dorun (pmap side-effect-fn some-lazy-seq))

What is the idiomatic way of parallelizing a computation on a lazy seq?

I don't think there is a single idiomatic way. It depends on lots of things, e.g.:

- How expensive is each side-effect-fn? If it is cheap, then the overhead of making things parallel may not be worth it.
- Do you want to constrain the thread pool or have a separate thread for each element? For the latter, futures are an option.
- Where is the actual bottleneck? If an external resource is constrained, CPU parallelization may not help you at all.
- How is the lazy sequence being produced? Is it already realised, or being computed on the fly?
- Is there any concern about ordering / concurrent access to resources / race conditions?

Assuming that side-effect-fn is relatively CPU-expensive and that the runtimes of each call to it are reasonably similar, then I'd say that your (dorun (pmap ...)) version is a decent choice. Otherwise you may want to take a look at the reducers library - the Fork/Join capabilities are very impressive and should do what you need.
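The thread-pool question in the list above could be sketched roughly as follows. Both helper names (run-with-futures, run-in-pool) are hypothetical, and the fixed pool via java.util.concurrent is one possible way to constrain the number of threads, not something the message prescribes.

```clojure
(import '(java.util.concurrent Executors ExecutorService Future))

;; One future per element. mapv is eager, so every future is started
;; before the first deref blocks.
(defn run-with-futures [f coll]
  (let [futs (mapv #(future (f %)) coll)]
    (mapv deref futs)))

;; At most n worker threads. Clojure fns implement Callable, so a vector
;; of no-arg fns can be passed straight to invokeAll.
(defn run-in-pool [n f coll]
  (let [^ExecutorService pool (Executors/newFixedThreadPool n)]
    (try
      (let [tasks (mapv (fn [x] (fn [] (f x))) coll)
            futs  (.invokeAll pool tasks)]
        (mapv (fn [^Future fut] (.get fut)) futs))
      (finally
        (.shutdown pool)))))
```

Usage would look like (run-in-pool 4 side-effect-fn items). Both helpers realise the whole input, so neither is suitable as-is for a seq larger than memory.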
Re: doseq vs dorun
I have the same use case: walking a seq of an input file, and doing file/db operations for each row. pmap is working very well, but it has required a lot of attention to the data flow to make sure that no significant compute is done in the main thread. Otherwise IO blocks the compute.

I briefly tried working with the reducers library, which generally made things 2-3 times slower, presumably because I'm using it incorrectly. I would really like to see more reducers examples, e.g. for this case: reading a seq larger than memory, doing transforms on the data, and then executing side effects.

On Thursday, October 17, 2013 4:04:51 AM UTC-7, Mikera wrote:

Assuming that side-effect-fn is relatively CPU-expensive and that the runtimes of each call to it are reasonably similar, then I'd say that your (dorun (pmap ...)) version is a decent choice. Otherwise you may want to take a look at the reducers library - the Fork/Join capabilities are very impressive and should do what you need.
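For the case asked about here (a seq larger than memory, transforms, then side effects), one possible shape is to cut the input into chunks that fit in memory, fold each chunk in parallel, and run the side effects sequentially on the results. This is only a sketch under those assumptions; process-in-chunks, transform and sink! are hypothetical names, and whether it beats pmap will depend on how expensive the transform is.

```clojure
(require '[clojure.core.reducers :as r])

(defn process-in-chunks
  "Applies transform to every row, chunk-size rows at a time, then
  passes each result to the side-effecting fn sink!, in input order."
  [chunk-size transform sink! rows]
  (doseq [chunk (partition-all chunk-size rows)]
    ;; vec makes the chunk foldable; r/cat and r/append! collect the
    ;; transformed rows of this chunk in order.
    (let [results (r/fold r/cat r/append! (r/map transform (vec chunk)))]
      ;; Side effects stay on the calling thread.
      (doseq [x results]
        (sink! x)))))
```

For example, (process-in-chunks 10000 parse-row write-row! rows), with parse-row and write-row! supplied by the caller. The fold only splits work across threads when a chunk is larger than the partition size (512 by default), so chunk-size should be well above that, or a smaller partition size passed to r/fold.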