Re: doseq vs dorun

2013-10-18 Thread Stefan Kamphausen
Hi,

On Friday, October 18, 2013 12:12:31 AM UTC+2, Brian Craft wrote:

 I briefly tried working with the reducers library, which generally made 
 things 2-3 times slower, presumably because I'm using it incorrectly. I 
 would really like to see more reducers examples, e.g. for this case: 
 reading a seq larger than memory, doing transforms on the data, and then 
 executing side effects.


I used reducers for processing lots of XML files.  Probably the most common 
pitfall is that fold only does parallel computation when working on a 
vector.  While all the XML data would not have fit into memory, the vector 
of filenames to read from certainly did, and that made a big difference.  
I also reduced the chunk size from the default of 512 to 1.
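
A minimal sketch of that setup, assuming a hypothetical process-file function in place of the real XML parsing (here it only counts the characters in the name). r/fold splits foldable collections such as vectors across the Fork/Join pool; handed a lazy seq, it falls back to a plain sequential reduce, which is one easy way to get no speedup at all.

```clojure
(require '[clojure.core.reducers :as r])

;; Hypothetical stand-in for the expensive per-file work (e.g. parsing one
;; XML file); here it just returns the length of the file name.
(defn process-file [filename]
  (count filename))

(defn total-over-files
  "Folds over a vector of filenames so r/fold can split the work across the
  Fork/Join pool. With a partition size of 1, every file is its own task."
  [filenames]
  (r/fold 1                                   ; n: max elements per leaf task
          +                                   ; combinef; (+) => 0 seeds each leaf
          (fn [acc filename]                  ; reducef: adds one file's result
            (+ acc (process-file filename)))
          (vec filenames)))                   ; a vector, not a lazy seq
```

For example, (total-over-files ["a.xml" "bb.xml"]) returns 11.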


Cheers,
Stefan

-- 
-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
Clojure group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: doseq vs dorun

2013-10-18 Thread Pradeep Gollakota
Hi All,

Thank you so much for your replies!

For my particular use case (tail -f multiple files and write the entries 
into a db), I'm using pmap to process each file in a separate thread and 
for each file, I'm using doseq to write to db. It seems to be working well 
(though I still need to benchmark it).
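
A rough sketch of that structure, with hypothetical read-entries and write-entry! functions standing in for tailing a file and writing to the db (an atom plays the db here):

```clojure
;; Hypothetical stand-ins: the real read-entries would follow a growing file
;; (tail -f) and write-entry! would insert a row into the database.
(def written (atom []))

(defn read-entries [file]
  (map #(str file ":" %) (range 3)))

(defn write-entry! [entry]
  (swap! written conj entry))

(defn ingest-files!
  "One pmap call per file; inside each call, doseq writes that file's
  entries in order."
  [files]
  (dorun (pmap (fn [file]
                 (doseq [entry (read-entries file)]
                   (write-entry! entry)))
               files)))
```

One caveat, following Stefan's moving-window point in this thread: pmap only starts a bounded number of calls ahead of the consumer, and a real tail -f task never returns. With more files than that window, later files may never be started; one future per file is an alternative worth benchmarking in that case.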

Thanks to your help, I have a better understanding of how doseq, dorun, et 
al. work.



Re: doseq vs dorun

2013-10-17 Thread Cedric Greevey
Ideally, you wouldn't be using a side effect at all, but something like
reducers to return a single computed result after going over the sequence.
(If the input's too big for main memory, you'd also need to partition the
input seq into reducible-collection chunks small enough to fit in memory.)
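
A sketch of that idea, assuming a side-effect-free computation (summing squares, chosen only for illustration) and a caller-chosen chunk size: the lazy input is cut with partition-all, each chunk becomes a vector that r/fold can process in parallel, and the per-chunk totals are combined with an ordinary reduce.

```clojure
(require '[clojure.core.reducers :as r])

(defn sum-of-squares
  "Computes one result from a possibly huge lazy seq with no side effects.
  partition-all cuts the input into chunks of chunk-size elements; each chunk
  is realised as a vector so r/fold can work on it in parallel, and the
  per-chunk totals are added up sequentially."
  [xs chunk-size]
  (reduce +
          (map (fn [chunk]
                 (r/fold + (r/map #(* % %) (vec chunk))))
               (partition-all chunk-size xs))))
```

For example, (sum-of-squares (range 1 11) 3) returns 385.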

If side effects are necessary because you're doing I/O for each element of
the seq, then the overhead of wrapping in pmap is probably minimal as the
task is I/O-bound, but the benefit of pmap may not be significant either.
Threaded I/O is generally only useful for:

1. preventing I/O from bottlenecking a CPU-bound task, by putting the I/O
and the computation in separate threads; and
2. networking with many remote hosts, so you can usefully do something with
host B while waiting for a response from host A, or with one remote host
where latency and task orthogonality make several parallel interactions
preferable to several sequential ones (e.g. a web browser loading several
images at a time from a web server when both throughput and latency are
high).
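
To illustrate the networking case, here is a sketch with a hypothetical fetch function that simulates a high-latency request with Thread/sleep. Starting all the futures before dereferencing any of them lets the waits overlap, so the total time should be close to one request's latency rather than the sum.

```clojure
;; Hypothetical stand-in for a request to a remote host: high latency,
;; almost no CPU work.
(defn fetch [host]
  (Thread/sleep 100)
  (str "response from " host))

(defn fetch-all
  "Starts one future per host, then waits for all of them."
  [hosts]
  ;; mapv is eager, so every request is in flight before the first deref;
  ;; a lazy map could delay starting later futures until they were consumed.
  (let [pending (mapv #(future (fetch %)) hosts)]
    (mapv deref pending)))
```

For example, (fetch-all ["a" "b"]) returns ["response from a" "response from b"] after roughly 100 ms rather than 200 ms.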

If side effects are necessary because you're interacting with a legacy Java
API that uses mutable state, you might want to look into pvalues and pcalls.
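
A sketch of that suggestion, using java.lang.StringBuilder as a stand-in for a legacy API with mutable state; build-report is hypothetical. Each parallel task creates and mutates its own instance, so no mutable object is shared between threads, and pcalls/pvalues return the results in argument order.

```clojure
;; A mutable Java object (StringBuilder) standing in for a legacy API.
;; Each call creates and uses its own instance, so no mutable state is
;; shared between the parallel tasks.
(defn build-report [^String label n]
  (let [sb (StringBuilder.)]
    (dotimes [_ n]
      (.append sb label))
    (.toString sb)))

;; pcalls: runs no-arg fns in parallel; results come back in call order.
(def reports (doall (pcalls #(build-report "a" 3)
                            #(build-report "b" 2))))

;; pvalues: the same idea, but takes expressions instead of fns.
(def more-reports (doall (pvalues (build-report "x" 1)
                                  (build-report "y" 4))))
```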


On Wed, Oct 16, 2013 at 10:34 PM, Pradeep Gollakota pradeep...@gmail.com wrote:

 Hi All,

 I’m (very) new to Clojure (and loving it)… and I’m trying to wrap my head
 around how to correctly choose doseq vs dorun for my particular use case.
 I’ve read this earlier post
 https://groups.google.com/forum/#!msg/clojure/8ebJsllH8UY/mXtixH3CRRsJ and I
 had a clarifying question.

 From what I gathered in the above post, it’s more efficient to use doseq
 instead of dorun since map creates another seq. However, if the fn you want
 to apply on the seq can be parallelized, doseq wouldn’t give you the
 ability to parallelize. With dorun you can use pmap instead of map and get
 parallelization.

 (doseq [i some-lazy-seq] (side-effect-fn i))
 (dorun (pmap side-effect-fn some-lazy-seq))

 What is the idiomatic way of parallelizing a computation on a lazy seq?

 Thanks,
 Pradeep



Re: doseq vs dorun

2013-10-17 Thread Stefan Kamphausen
Hi,

 What is the idiomatic way of parallelizing a computation on a lazy seq?


Keep in mind that pmap lazily processes the seq with a moving window, the 
size of which depends on the number of available cores on your machine.  If 
processing one element takes a long time, the parallel work waits for it 
to finish before moving on.  Thus, pmap may be an easy way to achieve 
parallel processing, but it is only suited to problems where each element 
takes approximately the same time.
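
One way around that head-of-line blocking, sketched under the assumption that result order does not matter: a fixed-size java.util.concurrent thread pool hands the next queued task to whichever thread becomes free, so one slow element does not hold up the rest. The pool size and the one-hour termination timeout are arbitrary choices for illustration. Note that an exception thrown by f inside .execute goes to the worker thread's uncaught-exception handler, not to the caller.

```clojure
(import '(java.util.concurrent Executors TimeUnit))

(defn run-on-pool!
  "Runs (f x) for every x in coll on a fixed pool of pool-size threads.
  An idle thread takes the next pending task as soon as it frees up, so a
  slow element does not stall the others. Completion order is not preserved.
  Blocks until every task has finished (or the 1-hour timeout passes)."
  [pool-size f coll]
  (let [pool (Executors/newFixedThreadPool pool-size)]
    (try
      (doseq [x coll]
        (.execute pool (fn [] (f x))))
      (finally
        (.shutdown pool)                           ; no new tasks; queued ones still run
        (.awaitTermination pool 1 TimeUnit/HOURS)))))
```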

Stefan



Re: doseq vs dorun

2013-10-17 Thread Mikera
On Thursday, 17 October 2013 10:34:18 UTC+8, Pradeep Gollakota wrote:

 What is the idiomatic way of parallelizing a computation on a lazy seq?

I don't think there is a single idiomatic way. It depends on lots of 
things, e.g.:
- How expensive is each side-effect-fn? If it is cheap, then the overhead of 
making things parallel may not be worth it.
- Do you want to constrain the thread pool or have a separate thread for 
each element? For the latter, futures are an option.
- Where is the actual bottleneck? If an external resource is constrained, 
CPU parallelization may not help you at all.
- How is the lazy sequence being produced? Is it already realised, or being 
computed on the fly?
- Is there any concern about ordering / concurrent access to resources / 
race conditions?

Assuming that side-effect-fn is relatively CPU-expensive and that the 
runtimes of each call to it are reasonably similar, I'd say that your 
(dorun (pmap ...)) version is a decent choice. Otherwise you may want to 
take a look at the reducers library; its Fork/Join capabilities are very 
impressive and should do what you need.
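
For reference, the three variants from the original question side by side, with a hypothetical side-effect-fn that does some arithmetic and records its result in an atom. Each form runs side-effect-fn once per element; only the pmap version spreads the calls across threads.

```clojure
;; Hypothetical side effect: does a little arithmetic as stand-in work and
;; records [input result] in an atom.
(def results (atom []))

(defn side-effect-fn [i]
  (let [v (reduce + (range (* i 1000)))]
    (swap! results conj [i v])))

(def some-lazy-seq (map inc (range 8)))   ; the elements 1..8

;; 1. doseq walks the seq directly and builds no intermediate seq.
(doseq [i some-lazy-seq]
  (side-effect-fn i))

;; 2. map builds a lazy seq of return values; on its own it would run
;;    nothing, so dorun forces it (without retaining the head).
(dorun (map side-effect-fn some-lazy-seq))

;; 3. Same shape, but pmap runs the calls on other threads, a bounded
;;    window ahead of the element dorun is currently waiting on.
(dorun (pmap side-effect-fn some-lazy-seq))
```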



Re: doseq vs dorun

2013-10-17 Thread Brian Craft
I have the same use case: walking a seq over an input file and doing file/db 
operations for each row. pmap is working very well, but it has required a 
lot of attention to the data flow, to make sure that no significant compute 
is done in the main thread. Otherwise I/O blocks the compute.
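
A sketch of that data flow, with hypothetical transform and write-row! functions: the thread that consumes pmap's output only reads lines from the file, while both the transformation and the write happen inside the function given to pmap, and therefore on pmap's worker threads. A StringReader stands in for the real input file.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

;; Hypothetical per-row work: transform stands in for the CPU-heavy part,
;; write-row! for the file/db side effect.
(def rows-written (atom []))

(defn transform [line]
  (str/upper-case line))

(defn write-row! [row]
  (swap! rows-written conj row))

(defn process-input!
  "The calling thread only reads lines; transform and write-row! both run
  inside the fn passed to pmap, so they execute on pmap's worker threads."
  [source]
  (with-open [rdr (io/reader source)]
    (dorun (pmap (fn [line] (write-row! (transform line)))
                 (line-seq rdr)))))
```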

I briefly tried working with the reducers library, which generally made 
things 2-3 times slower, presumably because I'm using it incorrectly. I 
would really like to see more reducers examples, e.g. for this case: 
reading a seq larger than memory, doing transforms on the data, and then 
executing side effects.
