Re: RDD Location

2016-12-30 Thread Fei Hu
It will be very appreciated if you can give more details about why runJob function could not be called in getPreferredLocations() In the NewHadoopRDD class and HadoopRDD class, they get the location information from the inputSplit. But there may be an issue in NewHadoopRDD, because it generates al

Re: RDD Location

2016-12-30 Thread Sun Rui
You can’t call runJob inside getPreferredLocations(). You can take a look at the source code of HadoopRDD to help you implement getPreferredLocations() appropriately. > On Dec 31, 2016, at 09:48, Fei Hu wrote: > > That is a good idea. > > I tried add the following code to get getPreferredLocat

Re: RDD Location

2016-12-30 Thread Fei Hu
That is a good idea. I tried add the following code to get getPreferredLocations() function: val results: Array[Array[DataChunkPartition]] = context.runJob( partitionsRDD, (context: TaskContext, partIter: Iterator[DataChunkPartition]) => partIter.toArray, dd, allowLocal = true) But it seem

Re: RDD Location

2016-12-29 Thread Sun Rui
Maybe you can create your own subclass of RDD and override the getPreferredLocations() to implement the logic of dynamic changing of the locations. > On Dec 30, 2016, at 12:06, Fei Hu wrote: > > Dear all, > > Is there any way to change the host location for a certain partition of RDD? > > "pr

RDD Location

2016-12-29 Thread Fei Hu
Dear all, Is there any way to change the host location for a certain partition of RDD? "protected def getPreferredLocations(split: Partition)" can be used to initialize the location, but how to change it after the initialization? Thanks, Fei