You may try `sparkContext.hadoopConfiguration().set("mapred.max.split.size", "33554432")` to tune the partition size when reading from HDFS.
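For example, something along these lines (a rough sketch, not tested; 33554432 is just 32 MB, and "hdfs:///graph/edges" is a made-up path):

  val sc = spark.sparkContext  // assuming an existing SparkSession `spark`

  // Smaller max split size => more input splits => more partitions on read.
  // 33554432 bytes = 32 MB.
  sc.hadoopConfiguration.set("mapred.max.split.size", "33554432")
  // Newer Hadoop key for the same setting, in case the old (deprecated) one is ignored:
  sc.hadoopConfiguration.set("mapreduce.input.fileinputformat.split.maxsize", "33554432")

  // Read the edge files back; the partition count follows the input splits.
  val edges = sc.textFile("hdfs:///graph/edges")
  println(edges.getNumPartitions)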
Thanks,
Manu Zhang

On Mon, Apr 15, 2019 at 11:28 PM M Bilal <mbilalce....@gmail.com> wrote:

> Hi,
>
> I have implemented a custom partitioning algorithm to partition graphs in
> GraphX. Saving the partitioned graph (the edges) to HDFS creates separate
> files in the output folder, with the number of files equal to the number
> of partitions.
>
> However, reading the edges back creates a number of partitions equal to
> the number of blocks in the HDFS folder. Is there a way to instead create
> the same number of partitions as the number of files written to HDFS,
> while preserving the original partitioning?
>
> I would like to avoid repartitioning.
>
> Thanks.
> - Bilal