Hi,
No. It's a Java application that uses RDD APIs.

Thanks,
Akshay
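
For reference, the mapreduce.input.fileinputformat.split.maxsize / split.minsize keys are the ones read by the new-API org.apache.hadoop.mapreduce.lib.input.FileInputFormat, so one way to check whether the settings themselves take effect on the RDD side is to read the file through newAPIHadoopFile. A minimal sketch, assuming an uncompressed (or otherwise splittable) text file; the HDFS path, app name, and class name are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class SplitSizeSketch {
      public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("split-size-sketch");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Ask the new-API FileInputFormat for ~50MB splits.
        Configuration hadoopConf = new Configuration(sc.hadoopConfiguration());
        hadoopConf.set("mapreduce.input.fileinputformat.split.maxsize", "50000000");
        hadoopConf.set("mapreduce.input.fileinputformat.split.minsize", "50000000");

        // Placeholder path; LongWritable/Text are the usual text-file key/value types.
        JavaRDD<String> lines = sc
            .newAPIHadoopFile("hdfs:///data/large-file.txt",
                TextInputFormat.class, LongWritable.class, Text.class, hadoopConf)
            .map(pair -> pair._2().toString());

        // If the split settings are honoured, a 500MB file should come back as ~10 partitions.
        System.out.println("partitions = " + lines.getNumPartitions());
        sc.stop();
      }
    }

If the file still comes back as a single partition here, the limit is likely not the split-size settings themselves.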
On Mon, Feb 25, 2019 at 7:54 AM Manu Zhang <owenzhang1...@gmail.com> wrote:

> Is your application using the Spark SQL / DataFrame API? If so, please try
> setting
>
>     spark.sql.files.maxPartitionBytes
>
> to a larger value; it is 128MB by default.
>
> Thanks,
> Manu Zhang
>
> On Feb 25, 2019, 2:58 AM +0800, Akshay Mendole <akshaymend...@gmail.com>, wrote:
>
> Hi,
>     We have dfs.blocksize configured to be 512MB, and we have some large
> files in HDFS that we want to process with a Spark application. We want to
> split the files into more input splits to optimise for memory, but the
> parameters mentioned below are not working.
>     The max and min split size parameters below are both configured to be
> 50MB, yet a file as big as 500MB is read as a single split, while it is
> expected to be split into at least 10 input splits.
>
>     SparkConf conf = new SparkConf().setAppName(jobName);
>     SparkContext sparkContext = new SparkContext(conf);
>     sparkContext.hadoopConfiguration().set("mapreduce.input.fileinputformat.split.maxsize", "50000000");
>     sparkContext.hadoopConfiguration().set("mapreduce.input.fileinputformat.split.minsize", "50000000");
>     JavaSparkContext sc = new JavaSparkContext(sparkContext);
>     sc.hadoopConfiguration().set("io.compression.codecs", "com.hadoop.compression.lzo.LzopCodec");
>
> Could you please suggest what could be wrong with my configuration?
>
> Thanks,
> Akshay
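
For completeness, since the DataFrame path came up above: spark.sql.files.maxPartitionBytes only affects the file-based data sources read through SparkSession, not RDD input formats. A minimal sketch of how that setting is typically applied, with a placeholder path, class name, and target size:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class MaxPartitionBytesSketch {
      public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("max-partition-bytes-sketch")
            // Target bytes per file partition; 128MB is the default. 50MB here is a placeholder.
            .config("spark.sql.files.maxPartitionBytes", 50 * 1024 * 1024)
            .getOrCreate();

        // Placeholder path; text() reads each line into a single "value" column.
        Dataset<Row> df = spark.read().text("hdfs:///data/large-file.txt");
        System.out.println("partitions = " + df.rdd().getNumPartitions());

        spark.stop();
      }
    }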