This may be a bug in HttpBroadcast, or it may be my fault, but I can't find where :) The problem: at the start, a job computes a TreeMap(String, SomeObject) with a custom ordering (a dummy lowercase comparison), and this TreeMap is then broadcast. I then use the map to do some matching against an input RDD, excluding the records that don't exist in the map. What happens? In local mode (where broadcast is not used), or when I pass the whole TreeMap without broadcasting it, I get more than 3M matches; with the broadcast, the count falls to 20K.
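To make the setup concrete, here is a minimal sketch in plain Java. It is not the code from the actual job: the class name, the comparator and the sample keys are all hypothetical, and no Spark is involved. It only checks whether a TreeMap with a lowercase ordering still matches the same keys after a round trip through plain Java serialization. Whether the broadcast implementations serialize the map this way is an assumption, not something verified here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Hypothetical names throughout; this is an illustration, not the failing job.
class TreeMapRoundTrip {

    // Stand-in for the "dummy lowercase" ordering: compares keys by lowercase form.
    // It has to be Serializable, otherwise serializing the TreeMap fails.
    static final class LowercaseOrder implements Comparator<String>, Serializable {
        private static final long serialVersionUID = 1L;

        @Override
        public int compare(String a, String b) {
            return a.toLowerCase().compareTo(b.toLowerCase());
        }
    }

    // Serializes a value to bytes and reads it back with plain Java serialization.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    // The "matching" step: counts the input keys that are present in the map.
    static long countMatches(TreeMap<String, Integer> map, List<String> input) {
        return input.stream().filter(map::containsKey).count();
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> original = new TreeMap<>(new LowercaseOrder());
        original.put("Spark", 1);
        original.put("broadcast", 2);

        // Keys differ from the stored ones only by case, plus one that is absent.
        List<String> input = Arrays.asList("SPARK", "Broadcast", "missing");

        TreeMap<String, Integer> copy = roundTrip(original);
        System.out.println("matches before round trip: " + countMatches(original, input));
        System.out.println("matches after round trip:  " + countMatches(copy, input));
    }
}
```

If the two counts differed, the custom ordering would be getting lost during serialization. With plain Java serialization they should be equal, since TreeMap writes its comparator along with the entries, so a drop in matches would point at whatever serialization path the broadcast actually uses.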
Replacing HttpBroadcastFactory with TreeBroadcastFactory solves the problem (I get the expected results). I am trying to write a test case that reproduces it, but that is quite tricky in this case... BTW, is there a way to reproduce the broadcast mechanism in local mode? (I see that the SparkEnv instance is shared as a static, so I guess there is no easy way.)

Thanks,
Eugen
