Is there a way to take advantage of the underlying datasource partitions when generating a DataFrame/SchemaRDD via catalyst? From the sql module, the only options appear to be RangePartitioner and HashPartitioner - and those are selected automatically by the code. It was not apparent either that the underlying partitioning is carried over to the partitions presented in the rdd, or that a custom partitioner can be provided.
The motivation is to subsequently use df.map and/or df.mapPartitions (the latter with preservesPartitioning=true) to perform operations that work within the original datasource partitions - thus avoiding a shuffle.
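
To make the goal concrete, here is a rough sketch of the behaviour I am after, written against the plain RDD API rather than the sql module. The `partitionBy(new HashPartitioner(2))` call is only a stand-in for a datasource that already delivers its rows grouped by key, and the object name and sample data are made up for illustration. Whether an rdd obtained from a DataFrame carries a partitioner like this at all is exactly what I am unsure about.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object PreservePartitioningSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("preserve-partitioning-sketch")
      .setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Stand-in (assumption) for a source whose rows arrive already grouped by key.
      val pairs = sc
        .parallelize(Seq(1 -> "a", 2 -> "b", 3 -> "c", 4 -> "d"))
        .partitionBy(new HashPartitioner(2))

      // The keys are left untouched, so the existing partitioner is still valid
      // and we can ask Spark to keep it on the resulting RDD.
      val upper = pairs.mapPartitions(
        iter => iter.map { case (k, v) => (k, v.toUpperCase) },
        preservesPartitioning = true)

      // The partitioner survives the per-partition transformation.
      assert(upper.partitioner == pairs.partitioner)

      // A key-based operation can now reuse that partitioner
      // instead of repartitioning the data.
      val merged = upper.reduceByKey(_ + _)
      merged.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

With `preservesPartitioning = false` (the default) the result of `mapPartitions` has no partitioner, and the subsequent `reduceByKey` would have to shuffle.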