RDDs are not Maps. lookup() does a linear scan (parallel by partition, but still linear). Yes, this is expected: an RDD is not supposed to be an O(1) lookup data structure. It would be much nicer to broadcast the relatively small data set as a Map and look keys up fast, locally.
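
To make the difference concrete, here is a rough sketch in plain Python. It does not use Spark: `partitions`, `rdd_style_lookup` and `build_local_map` are names made up for this illustration, and the scan only models what a lookup has to do when the RDD has no partitioner to narrow the search. The PySpark calls in the trailing comment show roughly how the map would be broadcast, assuming an existing SparkContext `sc` and pair RDD `rdd`.

```python
# Plain-Python model (no Spark required) of the two lookup strategies.
# `partitions` stands in for a cached RDD[(Int, String)] split into 4 partitions.

def rdd_style_lookup(partitions, key):
    """Scan every pair in every partition and collect matching values.

    This models a lookup on a pair RDD without a partitioner: the cost
    grows linearly with the number of records, even if partitions are
    scanned in parallel.
    """
    return [v for part in partitions for (k, v) in part if k == key]


def build_local_map(partitions):
    """Collect all pairs into one dict; this is the object one would broadcast."""
    return {k: v for part in partitions for (k, v) in part}


# Hypothetical sample data: keys 0..3999 spread over 4 partitions.
partitions = [
    [(i, f"value-{i}") for i in range(p * 1000, (p + 1) * 1000)]
    for p in range(4)
]

local_map = build_local_map(partitions)

scan_result = rdd_style_lookup(partitions, 2500)  # touches every record
map_result = local_map.get(2500)                  # single hash lookup

# In PySpark the map would be shipped once to each executor, roughly:
#   bc = sc.broadcast(rdd.collectAsMap())
#   other_rdd.map(lambda k: bc.value.get(k))
```

Note that the scan returns a list of values (a key may occur more than once in an RDD), while the map keeps one value per key, so this only fits data where keys are unique or the last value wins.
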
On Thu, Feb 19, 2015 at 3:29 PM, shahab <[email protected]> wrote:
> Hi,
>
> I am doing lookup on cached RDDs [(Int,String)], and I noticed that the
> lookup is relatively slow 30-100 ms ?? I even tried this on one machine with
> single partition, but no difference!
>
> The RDDs are not large at all, 3-30 MB.
>
> Is this expected behaviour? should I use other data structures, like HashMap
> to keep data and look up it there and use Broadcast to send a copy to all
> machines?
>
> best,
> /Shahab
