Thanks Ayan!
Finally it worked!! Thanks a lot everyone for the inputs!
Once I prefixed the params with "spark.hadoop", I can see the no. of tasks
getting reduced.
I'm setting the following params:
--conf spark.hadoop.dfs.block.size
--conf spark.hadoop.mapreduce.input.fileinputformat.split.minsize
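For anyone hitting this later, a rough sketch of what the final submit could
look like (the 512 MB value and the trailing args are placeholders, not the
exact values I used):

  spark-submit \
    --conf spark.hadoop.dfs.block.size=536870912 \
    --conf spark.hadoop.mapreduce.input.fileinputformat.split.minsize=536870912 \
    ... (rest of the usual submit args)

With ~512 MB splits, the 60 GB read comes out to roughly 120 tasks instead of
the ~245 I was getting with the 250 MB splits.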
Maybe you need to set the parameters for the mapreduce API and not the mapred
API. I do not remember offhand how they differ, but the Hadoop web page should
tell you ;-)
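(If memory serves, the old mapred keys map to the new mapreduce ones roughly
like this, but please double-check against the Hadoop docs:

  mapred.min.split.size  ->  mapreduce.input.fileinputformat.split.minsize
  mapred.max.split.size  ->  mapreduce.input.fileinputformat.split.maxsize
)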
> On 10. Oct 2017, at 17:53, Kanagha Kumar wrote:
>
> Thanks for the inputs!!
>
> I passed in
Have you seen this:
https://stackoverflow.com/questions/42796561/set-hadoop-configuration-values-on-spark-submit-command-line
? Please try and let us know.
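If I remember right, the gist of that answer is to prefix the Hadoop property
with "spark.hadoop." on the submit command line, e.g. something like:

  --conf spark.hadoop.mapreduce.input.fileinputformat.split.minsize=536870912

(the value here is just an illustrative 512 MB).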
On Wed, Oct 11, 2017 at 2:53 AM, Kanagha Kumar
wrote:
> Thanks for the inputs!!
>
> I passed in
Thanks for the inputs!!
I passed in spark.mapred.max.split.size and spark.mapred.min.split.size set to
the split size I wanted, but it didn't have any effect.
I also tried passing in spark.dfs.block.size, with all the params set to the
same value.
I have not tested this, but you should be able to pass any map-reduce-like
conf on to the underlying Hadoop config. Essentially you should be able to
control the split behaviour just as you can in a map-reduce program (as Spark
uses the same input format).
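Untested, but a sketch of how that might look on the driver side (the 512 MB
figure and the path are placeholders):

  // sc is the live SparkContext; set the split size on its Hadoop configuration
  // before the read, since textFile picks up the current hadoopConfiguration.
  sc.hadoopConfiguration.set(
    "mapreduce.input.fileinputformat.split.minsize",
    (512L * 1024 * 1024).toString)
  val lines = sc.textFile("hdfs_file_path")
  println(lines.getNumPartitions)  // fewer, larger splits -> fewer read tasks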
On Tue, Oct 10, 2017 at 10:21 PM, Jörn Franke wrote:
Write your own input format/datasource or split the file yourself beforehand
(not recommended).
> On 10. Oct 2017, at 09:14, Kanagha Kumar wrote:
>
> Hi,
>
> I'm trying to read a 60GB HDFS file using spark textFile("hdfs_file_path",
> minPartitions).
>
> How can I
Hi,
I'm trying to read a 60GB HDFS file using spark textFile("hdfs_file_path",
minPartitions).
How can I control the no. of tasks by increasing the split size? With the
default split size of 250 MB, several tasks are created. But I would like
to have a specific no. of tasks created while reading from HDFS.
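(For context, a minimal sketch of the current read; the values are
placeholders. As far as I understand, minPartitions is only a suggested
minimum, so it cannot push the task count below roughly total size / split
size, i.e. ~60 GB / 250 MB ≈ 245 tasks here; getting fewer tasks means making
the splits bigger.)

  // sc is the live SparkContext; minPartitions only raises the split count,
  // it never lowers it below totalSize / splitSize.
  val minPartitions = 100
  val rdd = sc.textFile("hdfs_file_path", minPartitions)
  println(rdd.getNumPartitions)  // still ~245 with 250 MB splits on a 60 GB file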