[julia-users] Re: Implementing mapreduce parallel model (not general multi-threading) ? easy and enough ?

2015-10-07 Thread cheng wang
Thx a lot. You saved my life :)

On Wednesday, October 7, 2015 at 3:00:25 PM UTC+2, Jonathan Malmaud wrote:
>
> Within the next few days, support for native threads will be merged into
> the development version of Julia
> (https://github.com/JuliaLang/julia/pull/13410).
>
> You can also use the SharedArray type, which Julia already has; it lets
> multiple Julia processes running on the same machine share memory. You
> would use the standard Julia task-parallel tools (like @parallel, etc.) in
> that model.

[julia-users] Re: Implementing mapreduce parallel model (not general multi-threading) ? easy and enough ?

2015-10-07 Thread Jonathan Malmaud
Within the next few days, support for native threads will be merged into
the development version of Julia
(https://github.com/JuliaLang/julia/pull/13410).

You can also use the SharedArray type, which Julia already has; it lets
multiple Julia processes running on the same machine share memory. You
would use the standard Julia task-parallel tools (like @parallel, etc.) in
that model.
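
A minimal sketch of that pattern (Julia 0.4-era syntax; the array size and
the squaring loop are just illustrative):

```julia
# Start some extra worker processes on this machine.
addprocs(3)

# A SharedArray is backed by shared memory, so every local process
# sees the same data without copying.
s = SharedArray(Float64, (1000,))

# @parallel splits the range across the workers; each fills its own slice.
@sync @parallel for i in 1:length(s)
    s[i] = i^2
end

# The master process can read the result directly.
println(s[1], " ", s[end])
```

(On current Julia versions the same idea is spelled with
`using Distributed, SharedArrays` and `@distributed`, but the shape of the
code is the same.)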


[julia-users] Re: Implementing mapreduce parallel model (not general multi-threading) ? easy and enough ?

2015-10-07 Thread cheng wang
Thanks all for replying.

I have read the parallel computing document before I posted this.
Actually, what I mean is a shared-memory model, not a distributed model.

My daily research involves extensive use of BLAS and parallel for-loops.
Julia has perfect support for BLAS, and parallel for-loops can be
handled with multiple processes.

However, if I want a shared array that supports efficient BLAS and
parallel for-loops at the same time, what is the best solution?



Re: [julia-users] Re: Implementing mapreduce parallel model (not general multi-threading) ? easy and enough ?

2015-10-07 Thread Steven Sagaert
I think what is meant is that in HPC this is typically done via MPI, which
is a low-level approach where you explicitly have to specify all the
data communication (compared to Hadoop & Spark, where it is implicit).
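
For contrast, a toy MPI-style reduction sketched with the MPI.jl package
(treat the exact Reduce signature as approximate - it has varied between
MPI.jl versions):

```julia
using MPI

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)

# "Map": each rank computes its own partial result locally.
partial = sum(rand(1000))

# "Reduce": the programmer explicitly states who sends what to whom;
# rank 0 receives the combined sum.
total = MPI.Reduce(partial, MPI.SUM, 0, comm)

rank == 0 && println("total = ", total)
MPI.Finalize()
```

Hadoop and Spark hide exactly this bookkeeping behind their map and reduce
calls.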

> > The only codes that really nail it are carefully handcrafted HPC codes.
>
> Could you please elaborate on this? I think I know Spark code quite well,
> but can't connect it to the notion of handcrafted HPC code.

Re: [julia-users] Re: Implementing mapreduce parallel model (not general multi-threading) ? easy and enough ?

2015-10-06 Thread Andrei Zh


> In my experience, Hadoop is pretty terrible about minimizing data 
> movement; Spark seems to be significantly better. 
>
>
If you mean MapReduce (the framework, version 1 or 2), it doesn't move data 
anywhere unless you tell it to do so in the reduce phase. You could hit 
another issue with MR1 - multiple reads and writes to disk in multistage 
jobs, which makes them terribly slow. (Recall that Hadoop was born to 
efficiently and reliably download and store millions of web pages obtained 
using Nutch, not to run iterative machine learning algorithms.)

> The only codes that really nail it are carefully handcrafted HPC codes.


Could you please elaborate on this? I think I know Spark code quite well, 
but can't connect it to the notion of handcrafted HPC code. 




Re: [julia-users] Re: Implementing mapreduce parallel model (not general multi-threading) ? easy and enough ?

2015-10-06 Thread Stefan Karpinski
In my experience, Hadoop is pretty terrible about minimizing data movement;
Spark seems to be significantly better. The only codes that really nail it
are carefully handcrafted HPC codes.



Re: [julia-users] Re: Implementing mapreduce parallel model (not general multi-threading) ? easy and enough ?

2015-10-06 Thread David van Leeuwen
See also an earlier discussion on a similar topic, for an out-of-core
approach.

---david


Re: [julia-users] Re: Implementing mapreduce parallel model (not general multi-threading) ? easy and enough ?

2015-10-06 Thread Tim Holy
There's

https://github.com/JuliaParallel/DistributedArrays.jl
https://github.com/JuliaParallel/HDFS.jl

in case they help. (See the other packages in JuliaParallel, in case you have 
missed that organization.)
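
To give a flavor of the first package, a sketch against the 0.4-era
DistributedArrays.jl API (the `distribute` call and the chunk-local
reduction are the key points; details may differ between versions):

```julia
# Add workers first so the package loads everywhere.
addprocs(3)
using DistributedArrays

# Spread an ordinary array across the workers; each worker owns
# one contiguous chunk.
a = distribute(collect(1.0:100.0))

# Reductions run on each chunk locally, then combine the partials.
println(sum(a))
```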

--Tim



Re: [julia-users] Re: Implementing mapreduce parallel model (not general multi-threading) ? easy and enough ?

2015-10-06 Thread Andrei Zh
Yet, calling Julia processes on other machines via ssh doesn't address data 
locality. In big data systems (say, > 1 TB) the main performance concern is 
not the number of CPUs but IO operations and data movement across the 
cluster, so map-reduce tries to do as much as possible on local data 
without any movement (map phase) and then combines results globally (reduce 
phase). This way a little program is sent to the data nodes instead of huge 
data being sent to the program's node(s). 

As far as I know, Julia doesn't provide any tools for working with huge 
distributed datasets; that's why I say it doesn't give you Hadoop- (or 
Spark-, or Google-like) map-reduce. But it's quite easy to add these 
features of MR too. E.g. one can use Elly.jl to access HDFS (including the 
location of data blocks) and execute tasks using remotecall() on the Julia 
worker which is closest to the data. 
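
A hand-wavy sketch of that scheme - everything here is hypothetical glue:
`block_locations` stands in for Elly.jl's block-location query, and
`process_block` is whatever your map function is:

```julia
# Hypothetical data-local map-reduce: run the map phase on the worker
# co-located with each HDFS block, then reduce on the master.
function data_local_mapreduce(path, worker_for_host)
    futures = Any[]
    for blk in block_locations(path)      # hypothetical Elly.jl query
        w = worker_for_host[blk.host]     # worker on the node holding the block
        push!(futures, remotecall(w, process_block, blk))
    end
    reduce(+, map(fetch, futures))        # reduce phase on the master
end
```

(`remotecall(worker, f, args...)` is the 0.4-era argument order; later
versions take the function first.)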



Re: [julia-users] Re: Implementing mapreduce parallel model (not general multi-threading) ? easy and enough ?

2015-10-06 Thread Stefan Karpinski
That works fine in a distributed setting if you start Julia workers on
other machines, so it is actually a legitimate form of map reduce. It
doesn't do anything for handling machine failures, however, which was
arguably the major concern of the original MapReduce design.
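
Concretely, spanning machines is just a different addprocs call (hostnames
here are invented; this uses the built-in ssh launcher, which expects
passwordless ssh and Julia installed on each host):

```julia
# One worker per remote host, launched over ssh.
addprocs(["node1", "node2", "node3"])

# The same reduction pattern now runs across the cluster.
nheads = @parallel (+) for i = 1:100000
    Int(rand(Bool))
end
```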



[julia-users] Re: Implementing mapreduce parallel model (not general multi-threading) ? easy and enough ?

2015-10-06 Thread Andrei Zh
Julia supports multiprocessing pretty well, including map-reduce-like jobs. 
E.g. in the next example I add 3 processes to a "workgroup", distribute a 
simulation between them, and then reduce the results via the (+) operator:


julia> addprocs(3)
3-element Array{Int64,1}:
 2
 3
 4

julia> nheads = @parallel (+) for i=1:2
           Int(rand(Bool))
       end
18845

You can find the full example and a lot of other fun in the official 
documentation on parallel computing: 

http://julia.readthedocs.org/en/latest/manual/parallel-computing/

Note, though, that it's not real (i.e. Hadoop/Spark-like) map-reduce, since 
the original idea of MR concerns distributed systems and data-local 
computations, while here we do everything on the same machine. If you are 
looking for a big data solution, search this forum for some (dead or alive) 
projects for it. 



On Monday, October 5, 2015 at 11:52:21 PM UTC+3, cheng wang wrote:
>
> Hello everyone,
>
> I am a Julia newbie. I have been thrilled by Julia recently. It's an 
> amazing language!
>
> I notice that Julia currently does not have good support for 
> multi-threaded programming.
> So I am thinking that a Spark-like mapreduce parallel model + 
> multi-process may be enough.
> It is easy to make thread-safe and it could solve most vector-based 
> computation.
>
> This idea might be too naive. However, I am happy to see your opinions.
>
> Thanks in advance,
> Cheng
>