Re: Query a Dataframe in rdd.map()

Holden Karau Thu, 21 May 2015 10:40:55 -0700

So DataFrames, like RDDs, can only be accessed from the driver, not from
inside a function that runs on the executors (such as the lambda you pass
to rdd.map()). If your IP frequency table is small enough, you could
collect it and distribute it as a hashmap with broadcast, or you could
join your rdd with the IP frequency table. Hope that helps :)
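
A rough sketch of both options, in case it is useful. It assumes a recent
PySpark with SparkSession (on older releases a SQLContext would play that
role), and the names build_lookup, lookup_freq and run_with_spark are made
up for illustration. The 208.51.22.18 row is invented sample data so that
one lookup actually hits; the default of 0 for unknown IPs is also an
assumption.

```python
def build_lookup(rows):
    """Turn collected (ip, freq) pairs into a plain dict (hypothetical helper)."""
    return {ip: freq for ip, freq in rows}


def lookup_freq(ip, table, default=0):
    """Return the frequency for ip, or default when the IP is not in the table."""
    return table.get(ip, default)


def run_with_spark():
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[2]").appName("ip-freq").getOrCreate()
    sc = spark.sparkContext

    # Small reference table; the last row is invented for illustration.
    df = spark.createDataFrame(
        [("10.226.93.67", 1), ("161.168.251.101", 4), ("208.51.22.18", 7)],
        ["ip", "freq"],
    )
    rdd = sc.parallelize(["208.51.22.18", "31.207.6.173", "208.51.22.18"])

    # Option 1: collect the small table on the driver and broadcast the dict.
    # The lambda only touches the broadcast value, never the DataFrame.
    table = build_lookup((row["ip"], row["freq"]) for row in df.collect())
    bc = sc.broadcast(table)
    freqs = rdd.map(lambda ip: (ip, lookup_freq(ip, bc.value)))
    print(freqs.collect())

    # Option 2: turn the RDD into a one-column DataFrame and join.
    # A left join keeps IPs that are missing from the table (freq is null).
    ip_df = spark.createDataFrame(rdd.map(lambda ip: (ip,)), ["ip"])
    ip_df.join(df, on="ip", how="left").show()

    spark.stop()
```

Calling run_with_spark() on a machine with PySpark installed runs both
variants. The broadcast route only makes sense while the table fits
comfortably in memory on every executor; past that point the join is
probably the safer choice.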


On Thursday, May 21, 2015, ping yan <sharon...@gmail.com> wrote:

> I have a dataframe as a reference table for IP frequencies.
> e.g.,
>
> ip                freq
> 10.226.93.67      1
> 10.226.93.69      1
> 161.168.251.101   4
> 10.236.70.2       1
> 161.168.251.105   14
>
>
> All I need is to query the df in a map.
>
> rdd = sc.parallelize(['208.51.22.18', '31.207.6.173', '208.51.22.18'])
>
> freqs = rdd.map(lambda x: df.where(df.ip ==x ).first())
>
> It doesn't get through.. would appreciate any help.
>
> Thanks!
> Ping
>
>
>
>
> --
> Ping Yan
> Ph.D. in Management
> Dept. of Management Information Systems
> University of Arizona
> Tucson, AZ 85721
>
>

-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau
Linked In: https://www.linkedin.com/in/holdenkarau
