Is your application using the Spark SQL / DataFrame API? If so, please try setting

spark.sql.files.maxPartitionBytes

to a smaller value; it defaults to 128MB and caps how many bytes are packed into a single partition when reading files, so lowering it gives you more splits. The mapreduce.* split settings only apply to the RDD-based Hadoop input formats, not to the DataFrame file readers.
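
For example, a minimal sketch (not taken from your job; the app name, input path, and the 50MB value are placeholders):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder()
        .appName("split-tuning-example")
        // pack at most ~50MB into each read partition instead of the default 128MB
        .config("spark.sql.files.maxPartitionBytes", "50000000")
        .getOrCreate();

// a 500MB file should now be read as roughly 10 partitions
Dataset<Row> df = spark.read().text("hdfs:///path/to/large/file");
System.out.println(df.javaRDD().getNumPartitions());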

Thanks,
Manu Zhang
On Feb 25, 2019, 2:58 AM +0800, Akshay Mendole <akshaymend...@gmail.com>, wrote:
> Hi,
>    We have dfs.blocksize configured to be 512MB  and we have some large files 
> in hdfs that we want to process with spark application. We want to split the 
> files get more splits to optimise for memory but the above mentioned 
> parameters are not working
> The max and min size params as below are configured to be 50MB still a file 
> which is as big as 500MB is read as one split while it is expected to split 
> into at least 10 input splits
> SparkConf conf = new SparkConf().setAppName(jobName);
>
> SparkContext sparkContext = new SparkContext(conf);
> sparkContext.hadoopConfiguration().set("mapreduce.input.fileinputformat.split.maxsize", "50000000");
> sparkContext.hadoopConfiguration().set("mapreduce.input.fileinputformat.split.minsize", "50000000");
> JavaSparkContext sc = new JavaSparkContext(sparkContext);
> sc.hadoopConfiguration().set("io.compression.codecs", "com.hadoop.compression.lzo.LzopCodec");
>
> Could you please suggest what could be wrong with my configuration?
>
> Thanks,
> Akshay
>
