It will be very appreciated if you can give more details about why runJob
function could not be called in getPreferredLocations()
In the NewHadoopRDD class and HadoopRDD class, they get the location
information from the inputSplit. But there may be an issue in NewHadoopRDD,
because it generates al
You can’t call runJob inside getPreferredLocations().
You can take a look at the source code of HadoopRDD to help you implement
getPreferredLocations() appropriately.
> On Dec 31, 2016, at 09:48, Fei Hu wrote:
>
> That is a good idea.
>
> I tried add the following code to get getPreferredLocat
That is a good idea.
I tried add the following code to get getPreferredLocations() function:
val results: Array[Array[DataChunkPartition]] = context.runJob(
partitionsRDD, (context: TaskContext, partIter:
Iterator[DataChunkPartition]) => partIter.toArray, dd, allowLocal = true)
But it seem
Maybe you can create your own subclass of RDD and override the
getPreferredLocations() to implement the logic of dynamic changing of the
locations.
> On Dec 30, 2016, at 12:06, Fei Hu wrote:
>
> Dear all,
>
> Is there any way to change the host location for a certain partition of RDD?
>
> "pr
Dear all,
Is there any way to change the host location for a certain partition of RDD?
"protected def getPreferredLocations(split: Partition)" can be used to
initialize the location, but how to change it after the initialization?
Thanks,
Fei