Hi Matei,

1. If I understand pipe correctly, I don't think it can solve the problem
when the algorithm is iterative and requires a reduction step in each
iteration. Consider this simple linear regression example:

            // Example: batch gradient descent for linear regression,
            // ignoring biases
            for (int i = 0; i < NIter; i++) {
                var gradient = data.Sum(p => (w dot p.x - p.y) * p.x);
                w -= rate * gradient;
            }

To use pipe as you suggested, one needs to move the for loop into the
calling code (in Java), which may not be simple for more complex code and
would still require major refactoring of the ML libraries. Furthermore,
since pipe passes data to the external process through standard in/out,
there would be I/O in every iteration, which makes Spark no different from
Hadoop MapReduce.
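
To make the concern concrete, here is a minimal sketch of the restructured
loop in plain Python, with no Spark dependency. The names partial_gradient
and fit are hypothetical, and partial_gradient only stands in for the
external C# process that pipe would invoke on each partition; this is an
assumption for illustration, not how pipe is actually wired up.

```python
# Hypothetical sketch: the iteration loop lives in the driver, and every
# iteration makes one round trip to each partition followed by a reduction.

def partial_gradient(partition, w):
    """Stand-in for the external process reached through pipe.

    Returns the sum of (w . x - y) * x over the records of one partition,
    where each record is a pair (x, y) and x is a list of floats.
    """
    grad = [0.0] * len(w)
    for x, y in partition:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            grad[j] += err * xj
    return grad


def fit(partitions, dim, rate, n_iter):
    """Batch gradient descent for linear regression, ignoring biases."""
    w = [0.0] * dim
    for _ in range(n_iter):
        # One call per partition per iteration: with pipe, this is where the
        # data and the current w would cross the process boundary.
        partials = [partial_gradient(p, w) for p in partitions]
        # Reduction step: add the per-partition gradients component-wise.
        gradient = [sum(col) for col in zip(*partials)]
        w = [wi - rate * gi for wi, gi in zip(w, gradient)]
    return w
```

Even in this toy form, the loop, the reduction, and the weight update all
sit outside the code that computes the gradient, which is the refactoring
burden I am worried about.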

2. Before asking, I had also looked at jni4net. Besides its usage
complexity, jni4net has a few red flags:

   - It hasn't been developed since 2011, although its latest status is
   still alpha.
   - Its license terms (and code integrity) may not pass review by our
   legal department.
   - Its robustness and efficiency are dubious.

Anyway, I'm looking at some other alternatives (e.g. JNBridge).

Thanks.
-Ken


On Mon, Dec 16, 2013 at 12:04 PM, Matei Zaharia <[email protected]> wrote:

> Hi Kenneth,
>
> Try using the RDD.pipe() operator in Spark, which lets you call out to an
> external process by passing data to it through standard in/out. This will
> let you call programs written in C# (e.g. that use your ML libraries) from
> a Spark program.
>
> I believe there are other projects enabling communication from Java to
> .NET, e.g. http://jni4net.sourceforge.net, but I’m not sure how easy
> they’ll be to use.
>
> Matei
>
> On Dec 16, 2013, at 10:54 AM, Kenneth Tran <[email protected]> wrote:
>
> Hi,
>
> We have a large ML code base in .NET. Spark seems cool and we want to
> leverage it. What would be the best strategy to bridge our .NET code
> and Spark?
>
>
>    1. Initiate a Spark .NET project
>    2. A lightweight bridge between .NET and Java
>
> While (1) sounds too daunting, it's not clear to me how to do (2) easily
> and efficiently.
>
> I'm willing to contribute to (1) if there's already an existing effort.
>
>
