RDDs are not Maps. lookup() does a linear scan -- parallel by
partition, but still linear. So yes, this is expected: an RDD is not
supposed to be an O(1) lookup data structure. Since the data set is
relatively small, it'd be much nicer to broadcast it as a Map and look
keys up fast, locally.
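
Something along these lines might work as a rough sketch of the
broadcast approach. The object name, the helper resolve(), the sample
pairs and the "<missing>" default are all made up for illustration;
only parallelize, collectAsMap, broadcast and map are actual Spark
calls. It assumes the whole data set fits comfortably in memory on the
driver and on every executor, which should hold at 3-30 MB.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastLookup {

  // Hypothetical helper: resolves each key against a broadcast copy of `pairs`.
  def resolve(sc: SparkContext,
              pairs: Seq[(Int, String)],
              keys: Seq[Int]): Seq[(Int, String)] = {
    // Stand-in for the small cached RDD[(Int, String)] from the question.
    val small = sc.parallelize(pairs)

    // collectAsMap() pulls every pair to the driver, so this is only
    // reasonable because the data set is small.
    val table = sc.broadcast(small.collectAsMap())

    // Each task reads its local copy of the Map: a hash lookup per key,
    // with no per-key Spark job and no scan over a partition.
    sc.parallelize(keys)
      .map(k => k -> table.value.getOrElse(k, "<missing>"))
      .collect()
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    // local[*] is an assumption for trying this out on one machine.
    val conf = new SparkConf().setAppName("broadcast-lookup").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val sample = Seq(1 -> "one", 2 -> "two", 3 -> "three")
      resolve(sc, sample, Seq(1, 3, 4)).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

If the lookups only ever happen on the driver, the broadcast is not
even needed: the Map returned by collectAsMap() can be queried directly.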

On Thu, Feb 19, 2015 at 3:29 PM, shahab <[email protected]> wrote:
> Hi,
>
> I am doing lookups on cached RDDs [(Int,String)], and I noticed that each
> lookup is relatively slow: 30-100 ms. I even tried this on one machine with
> a single partition, but it made no difference!
>
> The RDDs are not large at all, 3-30 MB.
>
> Is this expected behaviour? Should I use another data structure, like a
> HashMap, to keep the data and look it up there, and use Broadcast to send a
> copy to all machines?
>
> best,
> /Shahab
>
>
