Yup, this is true, pipe will add overhead. Might still be worth a shot though if you’re okay with having mixed Scala + .NET code.
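For example, here is a rough Scala sketch of what calling out to a .NET executable through pipe() might look like. The "mono Score.exe" command, the input strings, and the Score.exe program itself are just placeholders for however you'd launch and feed your binary:

    import org.apache.spark.SparkContext

    object PipeExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext("local[4]", "PipeExample")

        // One record per line of text; pipe() writes each line to the external
        // process's stdin and turns each line it prints to stdout into one
        // element of the resulting RDD.
        val data = sc.parallelize(Seq("1.0,2.0", "3.0,4.0"))

        // "mono Score.exe" is a placeholder for whatever launches the .NET side.
        val scored = data.pipe("mono Score.exe")

        scored.collect().foreach(println)
        sc.stop()
      }
    }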
Matei

On Dec 16, 2013, at 4:42 PM, Kenneth Tran <[email protected]> wrote:

> Hi Matei,
>
> 1. If I understand pipe correctly, I don't think that it can solve the
> problem if the algorithm is iterative and requires a reduction step in
> each iteration. Consider this simple linear regression example:
>
>     // Example: batch-gradient-descent linear regression, ignoring biases
>     for (int i = 0; i < NIter; i++) {
>         var gradient = data.Sum(p => (w dot p.x - p.y) * p.x);
>         w -= rate * gradient;
>     }
>
> In order to use pipe as you said, one needs to move the for loop to the
> calling code (in Java), which may not be simple when dealing with more
> complex code and would still require (major) refactoring of the ML
> libraries. Furthermore, there will be I/O at each iteration, which makes
> Spark no different from Hadoop MapReduce.
>
> 2. Before asking this, I had also looked at jni4net. Besides the usage
> complexity, jni4net has a few red flags:
>
> - It hasn't been developed since 2011, although its latest status is alpha.
> - Its license terms (and code integrity) may not pass our legal department.
> - Its robustness and efficiency are dubious.
>
> Anyway, I'm looking at some other alternatives (e.g. JNBridge).
>
> Thanks.
> -Ken
>
>
> On Mon, Dec 16, 2013 at 12:04 PM, Matei Zaharia <[email protected]>
> wrote:
>
> Hi Kenneth,
>
> Try using the RDD.pipe() operator in Spark, which lets you call out to an
> external process by passing data to it through standard in/out. This will
> let you call programs written in C# (e.g. that use your ML libraries) from
> a Spark program.
>
> I believe there are other projects enabling communication from Java to
> .NET, e.g. http://jni4net.sourceforge.net, but I’m not sure how easy
> they’ll be to use.
>
> Matei
>
> On Dec 16, 2013, at 10:54 AM, Kenneth Tran <[email protected]> wrote:
>
>> Hi,
>>
>> We have a large ML code base in .NET. Spark seems cool and we want to
>> leverage it. What would be the best strategy to bridge our .NET code and
>> Spark?
>>
>> 1. Initiate a Spark .NET project
>> 2. A lightweight bridge between .NET and Java
>>
>> While (1) sounds too daunting, it's not clear to me how to do (2) easily
>> and efficiently.
>>
>> I'm willing to contribute to (1) if there's already an existing effort.
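For reference, the restructuring Kenneth describes in point 1 (moving the for loop to the Scala driver and piping each iteration's gradient computation out to a .NET process) might look roughly like the sketch below. The mono launcher, Gradient.exe, and points.txt are hypothetical stand-ins, and the per-iteration process launch and text serialization are exactly the overhead discussed above:

    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD

    object IterativePipeSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext("local[4]", "IterativePipeSketch")

        // One "x1,x2,...,xd,y" record per line; cached so each iteration
        // rereads memory rather than disk.
        val points: RDD[String] = sc.textFile("points.txt").cache()

        val dims = 10
        var w = Array.fill(dims)(0.0)
        val rate = 0.1

        for (i <- 1 to 100) {
          // The current weights go to the .NET process on its command line;
          // pipe() launches one such process per partition, every iteration.
          val cmd = Seq("mono", "Gradient.exe", w.mkString(","))

          // The external process is expected to print one partial-gradient
          // vector per input line; the reduction back to a single gradient
          // stays on the Spark side and returns the result to the driver.
          val gradient = points
            .pipe(cmd)
            .map(_.split(",").map(_.toDouble))
            .reduce((a, b) => a.zip(b).map { case (x, y) => x + y })

          w = w.zip(gradient).map { case (wi, gi) => wi - rate * gi }
        }

        println(w.mkString(","))
        sc.stop()
      }
    }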
