Re: Seeking a function to partially parallelize collection processing

2017-07-14 Thread arthur
I have a few questions if this doesn't solve your issue, but how about something as simple as: (pmap (partial map handler) (partition-by splitter collection))? partition-by is lazy, and pmap is lazy-ish. On Friday, June 16, 2017 at 10:13:11 AM UTC-4, Tom Connors wrote: > > I'm looking for a
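
The one-liner above can be sketched as a small function. This is only an illustration, not code from the thread: the name `handle-partitions` is hypothetical, and `mapv` is swapped in for `map` on the assumption that the handler work should run inside pmap's worker threads (a lazy `map` would defer the work to whoever consumes each partition's result).

```clojure
;; Hypothetical helper, not from the thread.
(defn handle-partitions
  "Applies `handler` to every item of `coll`. Consecutive items with equal
  `splitter` values form one partition that is handled sequentially;
  separate partitions may be handled in parallel by pmap."
  [splitter handler coll]
  (pmap (fn [part]
          ;; mapv is eager, so the handler calls run inside pmap's future
          ;; rather than later on the consuming thread.
          (mapv handler part))
        (partition-by splitter coll)))

(def example-result
  (handle-partitions :id
                     #(update % :val inc)
                     [{:id 1 :val 1} {:id 1 :val 2} {:id 3 :val 1}]))
;; example-result => ([{:id 1, :val 2} {:id 1, :val 3}] [{:id 3, :val 2}])
```

Note that partition-by only groups consecutive items, so if a key reappears later in the input it starts a new partition, and that partition may run at the same time as the earlier one with the same key.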

Re: Seeking a function to partially parallelize collection processing

2017-06-26 Thread Tom Connors
Thanks for these two examples, Didier. In particular, I like the second one a lot. I'm currently using a slightly altered version of my first solution that avoids batching, but the code has gotten pretty nasty. Once I get a chance I'll clean up my solution and benchmark yours and mine and post

Re: Seeking a function to partially parallelize collection processing

2017-06-20 Thread Didier
Here:
(ns dda.test)
(def test-infinite-lazy-seq
  (repeatedly (fn [] {:id (rand-int 2) :val (rand-int 10)})))
(def test-finite-seq
  [{:id 1 :val 1} {:id 1 :val 2} {:id 3 :val 1}])
(defn

Re: Seeking a function to partially parallelize collection processing

2017-06-20 Thread Didier
Do you want something like this?
(ns dda.test)
(def test-infinite-lazy-seq
  (repeatedly (fn [] {:id (rand-int 2) :val (rand-int 10)})))
(def test-finite-seq
  [{:id 1 :val 1} {:id 1 :val 2}

Re: Seeking a function to partially parallelize collection processing

2017-06-20 Thread Tom Connors
Great, I'll watch that video. Thanks again. -- You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To

Re: Seeking a function to partially parallelize collection processing

2017-06-20 Thread Justin Smith
Channel operations are IO, and intermixing them with processing leads to code that is difficult to read and debug. core.async has facilities to help you code more declaratively over channels. I think Timothy Baldridge's talk at the last Clojure/West does a great job of presenting the issue

Re: Seeking a function to partially parallelize collection processing

2017-06-20 Thread Tom Connors
Thanks, Justin. Regarding the mixing of program logic with channel IO, I don't understand why that's a problem in this case or how it could be improved. Do you mind explaining that a bit more?

Re: Seeking a function to partially parallelize collection processing

2017-06-20 Thread Justin Smith
Aside from style issues of mixing channel input/output with program logic, and hiding the useful return value of go-loop, the real problem here is doing your work inside a go block. Go blocks are not meant for blocking tasks, whether CPU or IO bound; doing real work inside go blocks risks starving

Re: Seeking a function to partially parallelize collection processing

2017-06-20 Thread Tom Connors
Thanks for the suggestion, Didier, but I was unable to find a way to make pmap work for my use case. For those interested, here's what I came up with, then some questions: (defn parallel-per "Handle records from input-chan in parallel, but records with matching `splitter` return values

Re: Seeking a function to partially parallelize collection processing

2017-06-19 Thread Didier
So, if I understand correctly, you need to have one function sequentially and lazily split the stream, then you want each split to be sequentially processed, but you'd like different splits to be processed in parallel. I think for splitting, you could use (reductions), and then you could (pmap)

Re: Seeking a function to partially parallelize collection processing

2017-06-19 Thread Tom Connors
Thanks Jose and Sam for the suggestions. I'm having some trouble figuring out the lifecycle for the channels created for each return value from the splitter function. I'll post my code once I have something I think works in case it's interesting to anyone in the future.

Re: Seeking a function to partially parallelize collection processing

2017-06-17 Thread Sam Raker
core.async pub/sub? On Friday, June 16, 2017 at 10:13:11 AM UTC-4, Tom Connors wrote: > > I'm looking for a function that would likely be named something like > "sequential-by" or "parallel-per" that takes some data-producing thing like > a

Re: Seeking a function to partially parallelize collection processing

2017-06-16 Thread Jose Figueroa Martinez
Hello Tom, I think you are talking about distribution, not parallelization. As I see it (sorry for not reading closely enough earlier), you want a way to handle different things sequentially, where each sequence of things (already grouped) is handled in a different thread. You can put the

Re: Seeking a function to partially parallelize collection processing

2017-06-16 Thread Tom Connors
Thanks Justin. My mistake. Point 2 stands. On Friday, June 16, 2017 at 3:58:38 PM UTC-4, Justin Smith wrote: > > pmap is rarely actually useful, but point 1 is false, pmap doesn't require > that its input or output fit in memory > > On Fri, Jun 16, 2017 at 12:52 PM Tom Connors

Re: Seeking a function to partially parallelize collection processing

2017-06-16 Thread Justin Smith
pmap is rarely actually useful, but point 1 is false: pmap doesn't require that its input or output fit in memory. On Fri, Jun 16, 2017 at 12:52 PM Tom Connors wrote: > Hello Jose, > Thank you for the response, but pmap does not address my use case. It's > insufficient
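
A quick way to see the point about memory is to run pmap over an input that never ends. This is a minimal sketch rather than anything posted in the thread; the var name `first-five` is made up for the illustration.

```clojure
;; (range) with no arguments is an infinite lazy sequence, so this can only
;; terminate because pmap produces its results lazily as they are consumed.
(def first-five
  (doall (take 5 (pmap inc (range)))))
;; first-five => (1 2 3 4 5)
```

pmap computes a small window of elements ahead of the consumer, so memory use stays bounded as long as nothing holds on to the head of the input or output sequence.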

Re: Seeking a function to partially parallelize collection processing

2017-06-16 Thread Tom Connors
Hello Jose, Thank you for the response, but pmap does not address my use case. It's insufficient for two reasons: 1) the entire collection must fit in memory, and my use case is handling records from a Kinesis stream; and 2) pmap parallelizes over the whole collection, whereas I want to parallelize

Re: Seeking a function to partially parallelize collection processing

2017-06-16 Thread Jose Figueroa Martinez
Hello, there are many videos on how to parallelize sequential processing on ClojureTV, but I think the most basic way in Clojure is *pmap*. Regards. On Friday, June 16, 2017, 9:13:11 (UTC-5), Tom Connors wrote: > > I'm looking for a function that would likely be named something like >

Seeking a function to partially parallelize collection processing

2017-06-16 Thread Tom Connors
I'm looking for a function that would likely be named something like "sequential-by" or "parallel-per" that takes some data-producing thing like a lazy seq or a core async channel, a function to split records from that input, and a function to handle each item. Each item with an identical
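
One way to picture the requested behavior, going by the rest of the thread (items with the same splitter value handled sequentially, different values handled in parallel), is one agent per splitter value. The sketch below is an assumption-laden illustration, not the solution posted in the thread: the argument order is invented, it consumes its input eagerly, and it keeps one agent per distinct key, so it does not suit an unbounded stream or an unbounded key space.

```clojure
;; Hypothetical sketch; the name is borrowed from the question, the
;; implementation is not from the thread.
(defn parallel-per
  "Calls `handler` on every item of the finite collection `coll` for side
  effects. Items with equal `splitter` values are handled one at a time, in
  input order; items with different values may be handled concurrently.
  Blocks until all items are handled and returns nil."
  [splitter handler coll]
  (let [agents (reduce (fn [agents item]
                         (let [k (splitter item)
                               a (or (get agents k) (agent nil))]
                           ;; Actions sent to a single agent run one at a
                           ;; time, in the order they were sent.
                           (send-off a (fn [_] (handler item)))
                           (assoc agents k a)))
                       {}
                       coll)]
    ;; Wait for every agent to finish its queued actions.
    (apply await (vals agents))
    nil))

;; Usage: record the order in which items are handled.
(def handled (atom []))
(parallel-per :id
              #(swap! handled conj %)
              [{:id 1 :val 1} {:id 1 :val 2} {:id 3 :val 1} {:id 1 :val 3}])
```

After the call, `@handled` contains all four items, and the items with `:id 1` appear in their input order. The position of the `:id 3` item relative to them is not fixed, and an exception thrown by `handler` puts that key's agent into a failed state, which this sketch does not handle.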