*Issue:* When reading a sequence file in Spark, I can pass the number of partitions as an argument to the API:

public <K, V> JavaPairRDD<K, V> sequenceFile(String path, Class<K> keyClass, Class<V> valueClass, int minPartitions)
In *newAPIHadoopFile()*, this support has been removed. Below are the APIs:

- public <K, V, F extends org.apache.hadoop.mapreduce.InputFormat<K, V>> JavaPairRDD<K, V> *newAPIHadoopFile*(String path, Class<F> fClass, Class<K> kClass, Class<V> vClass, Configuration conf)
- public <K, V, F extends org.apache.hadoop.mapreduce.InputFormat<K, V>> JavaPairRDD<K, V> *newAPIHadoopRDD*(Configuration conf, Class<F> fClass, Class<K> kClass, Class<V> vClass)

Is there a way to specify the number of partitions when reading an *Avro* file using *newAPIHadoopFile()*? I explored and found that we can pass a Hadoop Configuration and set various Hadoop properties in it, including a maximum split size via ("*mapred.max.split.size*", "50mb"). Based on this, Hadoop will calculate the number of partitions, but then each partition's size may or may not be equal to or less than the specified size.

*Note:* I am looking for a way other than repartition().

*Execution Environment:*
- Spark (Java) version: 2.4.0
- JDK version: 1.8
- Spark artifactId: spark-core_2.11
- Avro version: 1.8.2

Please help us understand why this issue occurs.

Thanks,
Vatsal
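For illustration, here is a minimal plain-Java sketch (no Spark or Hadoop dependencies; class and method names are my own) of the FileInputFormat-style arithmetic behind this behaviour: with a maximum split size, a file yields roughly ceil(fileSize / maxSplitSize) splits, and the last split carries the remainder, which is why the resulting partitions are not guaranteed to be equal in size. It deliberately ignores HDFS block boundaries and the split-slop factor that real Hadoop splitting also applies.

```java
// Sketch (assumption): simplified FileInputFormat-style split arithmetic.
// Real Hadoop splitting also considers block size, min split size, and a
// SPLIT_SLOP factor, so actual counts can differ slightly.
public class SplitMath {
    // Number of splits a file of totalBytes produces under maxSplitBytes.
    static long numSplits(long totalBytes, long maxSplitBytes) {
        return (totalBytes + maxSplitBytes - 1) / maxSplitBytes; // ceiling division
    }

    // Size of the final split, which absorbs the remainder and may be smaller.
    static long lastSplitSize(long totalBytes, long maxSplitBytes) {
        long rem = totalBytes % maxSplitBytes;
        return rem == 0 ? maxSplitBytes : rem;
    }

    // Going the other way: a max split size that yields roughly the
    // desired partition count.
    static long maxSplitForPartitions(long totalBytes, int desiredPartitions) {
        return (totalBytes + desiredPartitions - 1) / desiredPartitions;
    }

    public static void main(String[] args) {
        long fileSize = 130L * 1024 * 1024; // a 130 MB file
        long maxSplit = 50L * 1024 * 1024;  // 50 MB max split size
        System.out.println(numSplits(fileSize, maxSplit));       // 3 splits
        System.out.println(lastSplitSize(fileSize, maxSplit));   // 31457280 bytes (30 MB remainder)
        System.out.println(maxSplitForPartitions(fileSize, 4));  // 34078720 bytes (32.5 MB)
    }
}
```

So a 50 MB cap on a 130 MB file gives two 50 MB partitions plus one 30 MB partition, not three equal ones. If an exact partition count matters more than an exact size, one workaround along these lines is to compute the split size from the desired count and set it (in bytes) on the Configuration passed to newAPIHadoopFile(), e.g. via the newer property name mapreduce.input.fileinputformat.split.maxsize; note that these split-size properties expect a plain byte count, not a suffixed string like "50mb".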