Re: A Spark Design Problem

2014-11-01 Thread Steve Lewis
Join seems to me the proper approach, followed by keying the fits by KeyID and using combineByKey to choose the best. I am implementing that now and will report on performance.

On Fri, Oct 31, 2014 at 11:56 AM, Sonal Goyal wrote:
> Does the following help?
>
> JavaPairRDD join with JavaPairRDD
>
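To make the "choose the best" step concrete, here is a minimal sketch in plain Java collections rather than Spark, so it runs without a cluster. The `Fit` class, its fields, and the rule that a higher score wins are assumptions for illustration; the thread does not specify them. The `better` function plays the role that the merge functions would play in a combineByKey call.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical value type: one scored lock/key pairing produced by the join.
final class Fit {
    final String keyId;
    final String lockId;
    final double score;

    Fit(String keyId, String lockId, double score) {
        this.keyId = keyId;
        this.lockId = lockId;
        this.score = score;
    }
}

final class BestFitSketch {
    // Keeps the higher-scoring of two fits (assumption: higher is better).
    // On a tie the first argument is kept.
    static Fit better(Fit a, Fit b) {
        return b.score > a.score ? b : a;
    }

    // Groups fits by keyId and keeps only the best fit seen for each key.
    static Map<String, Fit> bestPerKey(List<Fit> fits) {
        Map<String, Fit> best = new HashMap<>();
        for (Fit fit : fits) {
            // merge stores the fit if the key is new, otherwise applies better(old, new).
            best.merge(fit.keyId, fit, BestFitSketch::better);
        }
        return best;
    }
}
```

Because `better` is associative, the same function could serve for merging within a partition and across partitions, which is why a per-key reduction of this shape distributes well.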

Re: A Spark Design Problem

2014-10-31 Thread Sonal Goyal
Does the following help?

JavaPairRDD join with JavaPairRDD

If you partition both RDDs by the bin id, I think you should be able to get what you want.

Best Regards,
Sonal
Nube Technologies

On Fri, Oct 31, 2014 at 11:19 PM, wrote
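As a rough illustration of what a join on bin id produces, the sketch below uses plain Java maps instead of RDDs. The class name, method name, and the use of string ids for locks and keys are hypothetical; only the idea of joining both sides on a shared bin id comes from the message above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class BinJoinSketch {
    // Inner join on bin id: for every bin present on both sides, emit each
    // (lockId, keyId) pair, so only candidates in the same bin are paired.
    static List<String[]> joinByBin(Map<Integer, List<String>> locksByBin,
                                    Map<Integer, List<String>> keysByBin) {
        List<String[]> pairs = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> entry : locksByBin.entrySet()) {
            // Bins with no keys contribute nothing, as in an inner join.
            List<String> keys =
                keysByBin.getOrDefault(entry.getKey(), Collections.<String>emptyList());
            for (String lock : entry.getValue()) {
                for (String key : keys) {
                    pairs.add(new String[] {lock, key});
                }
            }
        }
        return pairs;
    }
}
```

The point of binning first is that the number of pairs to score grows with the bin sizes rather than with the product of all locks and all keys.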

Re: A Spark Design Problem

2014-10-31 Thread francois . garillot
Hi Steve,

Are you talking about sequence alignment?

-- FG

On Fri, Oct 31, 2014 at 5:44 PM, Steve Lewis wrote:
> The original problem is in biology but the following captures the CS
> issues. Assume I have a large number of locks and a large number of keys.
> There is a scoring function bet