Re: Scala Spark-like RDD for D?

2016-02-16 Thread Jon D via Digitalmars-d-learn

On Wednesday, 17 February 2016 at 02:32:01 UTC, bachmeier wrote:

You can discuss here, but there is also a gitter room

https://gitter.im/DlangScience/public

Also, I've got a project that embeds R inside D

http://lancebachmeier.com/rdlang/

It's not quite as good a user experience as others because I 
have limited time for things not related to work. I've got an 
older project to embed D inside R, but it hasn't been updated 
in a while and it's Linux only.


https://bitbucket.org/bachmeil/dmdinline2


Excellent, thanks, I'll check these out.  --Jon


Re: Scala Spark-like RDD for D?

2016-02-16 Thread bachmeier via Digitalmars-d-learn

On Wednesday, 17 February 2016 at 02:03:40 UTC, Jon D wrote:

On Tuesday, 16 February 2016 at 16:27:27 UTC, bachmeier wrote:
On Monday, 15 February 2016 at 11:09:10 UTC, data pulverizer 
wrote:


As an alternative are there plans for parallel/cluster 
computing frameworks for D?


You can use MPI:
https://github.com/DlangScience/OpenMPI


FWIW, I'm interested in the wider topic of incorporating D into 
data science environments also. Sounds as if there are several 
interesting projects in the area, but so far my understanding 
of them is limited. Perhaps the forum isn't the best place to 
discuss, but if there happen to be any blog posts or other 
descriptions, it'd be great to get links.


--Jon


You can discuss here, but there is also a gitter room

https://gitter.im/DlangScience/public

Also, I've got a project that embeds R inside D

http://lancebachmeier.com/rdlang/

It's not quite as good a user experience as others because I have 
limited time for things not related to work. I've got an older 
project to embed D inside R, but it hasn't been updated in a 
while and it's Linux only.


https://bitbucket.org/bachmeil/dmdinline2



Re: Scala Spark-like RDD for D?

2016-02-16 Thread Jon D via Digitalmars-d-learn

On Tuesday, 16 February 2016 at 16:27:27 UTC, bachmeier wrote:
On Monday, 15 February 2016 at 11:09:10 UTC, data pulverizer 
wrote:


As an alternative are there plans for parallel/cluster 
computing frameworks for D?


You can use MPI:
https://github.com/DlangScience/OpenMPI


FWIW, I'm interested in the wider topic of incorporating D into 
data science environments also. Sounds as if there are several 
interesting projects in the area, but so far my understanding of 
them is limited. Perhaps the forum isn't the best place to 
discuss, but if there happen to be any blog posts or other 
descriptions, it'd be great to get links.


--Jon


Re: Scala Spark-like RDD for D?

2016-02-16 Thread bachmeier via Digitalmars-d-learn
On Monday, 15 February 2016 at 11:09:10 UTC, data pulverizer 
wrote:


As an alternative are there plans for parallel/cluster 
computing frameworks for D?


You can use MPI:
https://github.com/DlangScience/OpenMPI




Re: Scala Spark-like RDD for D?

2016-02-16 Thread jmh530 via Digitalmars-d-learn

On Tuesday, 16 February 2016 at 15:03:36 UTC, Jakob Jenkov wrote:


I cannot speak on behalf of the D community. In my opinion I 
don't think that it is D that needs a big data strategy. It is 
the users of D that need that strategy.


I am originally a Java developer. Java devs create all kinds 
of crazy tools all the time. Many fail, but some survive and 
grow big, like Spark.


D devs need to do the same. Just jump into it. Have it be your 
hobby project in D. Then see where it takes you.


Good attitude. Nevertheless, I think there is a much larger 
population of people who would want to use D for normal data 
analysis if packages could replicate much of what people do in 
R/Python.


If the OP really wants to contribute to big data projects in D, 
he might want to start with things that will more easily allow D 
to interact with existing libraries.


For instance, Google's MR4C allows C code to be run in a Hadoop 
instance. Maybe adding support for D would be doable?


http://google-opensource.blogspot.com/2015/02/mapreduce-for-c-run-native-code-in.html

There is likely value in writing bindings to machine learning 
libraries. I did a quick search of machine learning libraries, and 
most of them looked like they were written in C++. I don't have 
much expertise with writing bindings to C++ libraries.





Re: Scala Spark-like RDD for D?

2016-02-16 Thread Jakob Jenkov via Digitalmars-d-learn
Perhaps the question is too prescriptive. Another way to put it: 
Does D have a big data strategy? But I tried to anchor it to some 
currently functioning framework, which is why I suggested RDD.


I cannot speak on behalf of the D community. In my opinion I 
don't think that it is D that needs a big data strategy. It is 
the users of D that need that strategy.


I am originally a Java developer. Java devs create all kinds of 
crazy tools all the time. Many fail, but some survive and grow 
big, like Spark.


D devs need to do the same. Just jump into it. Have it be your 
hobby project in D. Then see where it takes you.




Re: Scala Spark-like RDD for D?

2016-02-15 Thread data pulverizer via Digitalmars-d-learn
On Monday, 15 February 2016 at 11:09:10 UTC, data pulverizer 
wrote:
Are there any plans to create a Scala Spark-like RDD class 
for D 
(https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf)? 
This is a powerful model that has taken the data science world 
by storm; it would be useful to have something like this in the 
D world. Most of the algorithms in statistics/data science are 
iterative in nature, which fits well with this kind of data model.


I read through the Kind Of Container thread, which has some 
relationship with this issue 
(https://forum.dlang.org/thread/n07rh8$dmb$1...@digitalmars.com). 
It looks like immutability would be the way to go for an RDD 
data structure. But I am not wedded to any model as long as we 
can have something that provides the same functionality as the 
RDD.


As an alternative, are there plans for parallel/cluster 
computing frameworks for D?


Apologies if I am kicking a hornet's nest. It is not my 
intention.


Thanks


Perhaps the question is too prescriptive. Another way to put it: 
Does D have a big data strategy? But I tried to anchor it to some 
currently functioning framework, which is why I suggested RDD.




Scala Spark-like RDD for D?

2016-02-15 Thread data pulverizer via Digitalmars-d-learn
Are there any plans to create a Scala Spark-like RDD class 
for D 
(https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf)? 
This is a powerful model that has taken the data science world by 
storm; it would be useful to have something like this in the D 
world. Most of the algorithms in statistics/data science are 
iterative in nature, which fits well with this kind of data model.


I read through the Kind Of Container thread, which has some 
relationship with this issue 
(https://forum.dlang.org/thread/n07rh8$dmb$1...@digitalmars.com). It 
looks like immutability would be the way to go for an RDD data 
structure. But I am not wedded to any model as long as we can 
have something that provides the same functionality as the RDD.
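
To make the "immutable RDD" idea concrete, here is a rough, 
single-machine sketch in Python (chosen only for brevity; it is not 
D and not Spark's actual API). The class name MiniRDD and its 
fields are hypothetical. It only illustrates two properties 
described in the paper: transformations return a new immutable 
value, and nothing is computed until an action such as collect() 
replays the recorded lineage. Partitioning, distribution and fault 
recovery are left out entirely.

```python
import functools
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Iterator, Tuple


@dataclass(frozen=True)  # frozen: instances cannot be mutated after creation
class MiniRDD:
    """Hypothetical stand-in for an RDD: immutable, lazy, lineage-based."""

    # Re-creates the base data on demand (assumption: cheap and repeatable).
    source: Callable[[], Iterable[Any]]
    # Recorded transformations, oldest first: ("map" | "filter", function).
    lineage: Tuple[Tuple[str, Callable], ...] = ()

    def map(self, f: Callable) -> "MiniRDD":
        # Transformation: returns a NEW MiniRDD, computes nothing yet.
        return MiniRDD(self.source, self.lineage + (("map", f),))

    def filter(self, pred: Callable) -> "MiniRDD":
        # Transformation: returns a NEW MiniRDD, computes nothing yet.
        return MiniRDD(self.source, self.lineage + (("filter", pred),))

    def _iterate(self) -> Iterator[Any]:
        # Replay the lineage over the base data using the built-in
        # lazy map/filter (inside a method these names are the builtins).
        data: Iterator[Any] = iter(self.source())
        for kind, fn in self.lineage:
            data = map(fn, data) if kind == "map" else filter(fn, data)
        return data

    def collect(self) -> list:
        # Action: forces evaluation and returns all elements.
        return list(self._iterate())

    def reduce(self, op: Callable) -> Any:
        # Action: folds the elements with a binary operator.
        return functools.reduce(op, self._iterate())


base = MiniRDD(lambda: range(1, 6))
squares_of_evens = base.filter(lambda x: x % 2 == 0).map(lambda x: x * x)

print(squares_of_evens.collect())  # [4, 16]
print(base.collect())              # [1, 2, 3, 4, 5] (base is unchanged)
```

Because each value carries its own lineage, a lost result can in 
principle be recomputed from the source, which is the property the 
paper relies on for fault tolerance. A D version would presumably 
lean on lazy ranges for the same effect, but that is speculation on 
my part.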


As an alternative are there plans for parallel/cluster computing 
frameworks for D?


Apologies if I am kicking a hornet's nest. It is not my intention.

Thanks