Yup, this is true: pipe will add overhead. It might still be worth a shot, though,
if you're okay with having mixed Scala + .NET code.

Matei

On Dec 16, 2013, at 4:42 PM, Kenneth Tran wrote:

> Hi Matei,
> 
> 1. If I understand pipe correctly, I don't think it can solve the problem
> when the algorithm is iterative and requires a reduction step in each
> iteration. Consider this simple linear regression example:
> 
>             // Example: batch-gradient-descent linear regression (least
>             // squares), ignoring biases. `w dot p.x` stands for the dot product.
>             for (int i = 0; i < NIter; i++) {
>                 // Reduction over the whole data set in every iteration
>                 var gradient = data.Sum(p => (w dot p.x - p.y) * p.x);
>                 w -= rate * gradient;
>             }
> 
> To use pipe as you suggest, one would need to move the for loop into the
> calling code (in Java), which may not be simple for more complex code and
> would still require (major) refactoring of the ML libraries. Furthermore,
> there would be I/O in every iteration, which makes Spark no different from
> Hadoop MapReduce.
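A minimal Java sketch of what moving the loop into the calling code implies, assuming plain in-memory `List`s stand in for RDD partitions (no Spark API is used here). Each iteration computes one partial gradient per partition and then sums them, i.e. one full pass over the data plus one reduction per iteration. The names `DriverLoopSketch`, `Point`, `partialGradient` and `train` are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: each inner List<Point> stands in for one partition of a Spark RDD.
class DriverLoopSketch {

    // A labeled data point: feature vector x and target y.
    static final class Point {
        final double[] x;
        final double y;

        Point(double[] x, double y) {
            this.x = x;
            this.y = y;
        }
    }

    static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int j = 0; j < a.length; j++) {
            s += a[j] * b[j];
        }
        return s;
    }

    // "Map" step: sum of the least-squares gradient (w . x - y) * x over one partition.
    static double[] partialGradient(List<Point> partition, double[] w) {
        double[] g = new double[w.length];
        for (Point p : partition) {
            double residual = dot(w, p.x) - p.y;
            for (int j = 0; j < g.length; j++) {
                g[j] += residual * p.x[j];
            }
        }
        return g;
    }

    // "Reduce" step: element-wise sum of two partial gradients.
    static double[] add(double[] a, double[] b) {
        double[] c = new double[a.length];
        for (int j = 0; j < a.length; j++) {
            c[j] = a[j] + b[j];
        }
        return c;
    }

    // Driver: the for loop lives here, with one map + reduce round per iteration.
    static double[] train(List<List<Point>> partitions, int dim, int nIter, double rate) {
        double[] w = new double[dim];
        for (int i = 0; i < nIter; i++) {
            double[] gradient = new double[dim];
            for (List<Point> part : partitions) {
                gradient = add(gradient, partialGradient(part, w));
            }
            for (int j = 0; j < dim; j++) {
                w[j] -= rate * gradient[j];
            }
        }
        return w;
    }

    public static void main(String[] args) {
        // Toy data with y = 2x, split into two "partitions".
        List<List<Point>> partitions = Arrays.asList(
                Arrays.asList(new Point(new double[] {1.0}, 2.0), new Point(new double[] {2.0}, 4.0)),
                Arrays.asList(new Point(new double[] {3.0}, 6.0)));
        double[] w = train(partitions, 1, 50, 0.05);
        System.out.println(w[0]); // converges toward 2.0
    }
}
```

With pipe, the per-partition step is where an external .NET process would be invoked, while the loop and the reduction stay on the JVM side.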
> 
> 2. Before asking, I also looked at jni4net. Besides its usage complexity,
> jni4net has a few red flags:
> - It hasn't been developed since 2011, even though its latest status is still alpha.
> - Its license terms (and code integrity) may not pass our legal department.
> - Its robustness and efficiency are dubious.
> Anyway, I'm looking at some other alternatives (e.g. JNBridge).
> 
> Thanks.
> -Ken
> 
> 
> On Mon, Dec 16, 2013 at 12:04 PM, Matei Zaharia wrote:
> Hi Kenneth,
> 
> Try using the RDD.pipe() operator in Spark, which lets you call out to an 
> external process by passing data to it through standard in/out. This will let 
> you call programs written in C# (e.g. that use your ML libraries) from a 
> Spark program.
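A minimal sketch of the worker side of `RDD.pipe()`, written in Java only as a stand-in for the C# program. Roughly, Spark starts the command once per partition, writes each element of that partition to the process's stdin as a line of text, and turns each line the process prints to stdout into an element of the resulting RDD. The record format (`y x1 ... xd`), passing the current weights as command-line arguments, and the class name `PipeWorkerSketch` are assumptions for illustration, not part of any existing library.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical stand-in for an external worker (e.g. a C# program) invoked via RDD.pipe().
class PipeWorkerSketch {

    // Reads records "y x1 ... xd" (a made-up text format), one per line, sums the
    // least-squares gradient (w . x - y) * x over them, and writes the sum as one line.
    static void run(Reader in, Writer out, double[] w) {
        double[] g = new double[w.length];
        try {
            BufferedReader reader = new BufferedReader(in);
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                String[] fields = line.split("\\s+");
                double y = Double.parseDouble(fields[0]);
                double[] x = new double[w.length];
                double residual = -y;
                for (int j = 0; j < w.length; j++) {
                    x[j] = Double.parseDouble(fields[j + 1]);
                    residual += w[j] * x[j];
                }
                for (int j = 0; j < w.length; j++) {
                    g[j] += residual * x[j];
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // One output line per invocation; Spark would turn it into one element of the piped RDD.
        StringBuilder sb = new StringBuilder();
        for (int j = 0; j < g.length; j++) {
            if (j > 0) {
                sb.append(' ');
            }
            sb.append(g[j]);
        }
        PrintWriter pw = new PrintWriter(out);
        pw.println(sb);
        pw.flush();
    }

    public static void main(String[] args) {
        // Current weights passed on the command line (a hypothetical convention).
        double[] w = new double[args.length];
        for (int j = 0; j < args.length; j++) {
            w[j] = Double.parseDouble(args[j]);
        }
        run(new InputStreamReader(System.in), new OutputStreamWriter(System.out), w);
    }
}
```

The Spark program would then parse these one-line partial gradients and sum them. Note that every call re-serializes the partition as text, which is where pipe's overhead comes from.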
> 
> I believe there are other projects enabling communication from Java to .NET, 
> e.g. http://jni4net.sourceforge.net, but I’m not sure how easy they’ll be to 
> use.
> 
> Matei
> 
> On Dec 16, 2013, at 10:54 AM, Kenneth Tran wrote:
> 
>> Hi, 
>> 
>> We have a large ML code base in .NET. Spark seems cool and we want to 
>> leverage it. What would be the best strategies to bridge our .NET code 
>> and Spark?
>> 
>> 1. Initiate a Spark .NET project.
>> 2. Build a lightweight bridge between .NET and Java.
>> 
>> While (1) sounds too daunting, it's not clear to me how to do (2) easily and 
>> efficiently.
>> 
>> I'm willing to contribute to (1) if there's an existing effort.
> 
