You can call any API that returns the hostname from inside your map
function. Here's a simplified example. You would generally use
mapPartitions instead, as it avoids the overhead of retrieving the hostname
once per element.

import scala.sys.process._

// Runs on the executors; each task reports the host it ran on.
val distinctHosts = sc.parallelize(0 to 100).map { _ =>
  val hostname = "hostname".!!.trim
  // your code
  hostname
}.collect.distinct

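For reference, here is a rough sketch of the mapPartitions variant mentioned above, so the hostname is looked up once per partition rather than once per element. It is only an illustration: the object name, the app name, and the local[*] fallback master are my own assumptions for trying it standalone, and I have swapped the shell call for java.net.InetAddress so it does not depend on a `hostname` binary being on the executors. Note that it only reports hosts that actually ran a task for this job, which is not necessarily every host in the cluster.

```scala
import java.net.InetAddress
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical object name, for illustration only.
object DistinctHosts {
  def main(args: Array[String]): Unit = {
    // Assumption: fall back to local[*] when no master is supplied
    // (e.g. when not launched through spark-submit).
    val conf = new SparkConf()
      .setAppName("distinct-hosts")
      .setIfMissing("spark.master", "local[*]")
    val sc = new SparkContext(conf)
    try {
      val distinctHosts = sc.parallelize(0 to 100).mapPartitions { iter =>
        // Resolved once per partition, on the executor that runs it.
        val hostname = InetAddress.getLocalHost.getHostName
        iter.map { _ =>
          // your code
          hostname
        }
      }.distinct().collect()

      distinctHosts.foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

In spark-shell you can drop the object and SparkContext setup and paste just the mapPartitions expression, since `sc` is already defined there.
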
On 24 October 2015 at 01:41, weoccc wrote:
> yea,
>
> my use cases is that i want to have some external communications where rdd
> is being run in map. The external communication might be handled separately
> transparent to spark. What will be the hacky way and nonhacky way to do
> that ? :)
>
> Weide
>
>
>
> On Fri, Oct 23, 2015 at 5:32 PM, Ted Yu wrote:
>
>> Can you outline your use case a bit more ?
>>
>> Do you want to know all the hosts which would run the map ?
>>
>> Cheers
>>
>> On Fri, Oct 23, 2015 at 5:16 PM, weoccc wrote:
>>
>>> in rdd map function, is there a way i can know the list of host names
>>> where the map runs ? any code sample would be appreciated ?
>>>
>>> thx,
>>>
>>> Weide
>>>
>>>
>>>
>>
>