This may be a bug in HttpBroadcast, or it may be my own mistake, but I can't find where :)

The problem:
  At the beginning, a job computes a TreeMap(String, someobject) with a
custom ordering (a dummy lowercase comparison), and this TreeMap is broadcast.
  Then I use this map to do some matching against the input RDD (excluding
the entries that don't exist in the map).
  What happens? In local mode (where the broadcast is not used), or when
passing the whole TreeMap without broadcasting it, I get more than 3M
matches; with the broadcast, the count falls to 20K.

Replacing HttpBroadcastFactory with TreeBroadcastFactory solves the
problem (I get the expected results). I am trying to write a test case
to reproduce it, but that is quite tricky in this case...

BTW, is there a way to reproduce the broadcast mechanism in local mode? (I
see that the SparkEnv instance is shared as a static, so I guess there is
no easy way.)
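
As a first step toward a test case, here is a minimal sketch that only
round-trips the map through plain Java serialization and compares the match
counts before and after. It does not exercise HttpBroadcast itself; it assumes
the map is a java.util.TreeMap and that the custom order is a Serializable
comparator. The names BroadcastRoundTrip, LowerCaseOrder, roundTrip and
countMatches are hypothetical, as is the sample data.

```java
import java.io.*;
import java.util.*;

public class BroadcastRoundTrip {

    // Hypothetical stand-in for the custom "lowercase" ordering.
    // It must be Serializable, otherwise the TreeMap cannot be serialized.
    static final class LowerCaseOrder implements Comparator<String>, Serializable {
        private static final long serialVersionUID = 1L;

        @Override
        public int compare(String a, String b) {
            return a.toLowerCase(Locale.ROOT).compareTo(b.toLowerCase(Locale.ROOT));
        }
    }

    // Serialize a value to bytes and read it back with Java serialization.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T value)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    // Count how many input keys are found in the map.
    static long countMatches(Map<String, ?> map, List<String> keys) {
        return keys.stream().filter(map::containsKey).count();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample data, only for illustration.
        TreeMap<String, Integer> original = new TreeMap<>(new LowerCaseOrder());
        original.put("Alpha", 1);
        original.put("beta", 2);
        List<String> input = Arrays.asList("ALPHA", "Beta", "gamma");

        TreeMap<String, Integer> copy = roundTrip(original);

        // If the two counts differ, the ordering was lost during serialization.
        System.out.println("before: " + countMatches(original, input));
        System.out.println("after:  " + countMatches(copy, input));
    }
}
```

If the counts agree here but still differ on the cluster, that would point
away from the map and its comparator and toward the broadcast path.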

Thanks,
Eugen
