Re: hadoop input/output format advanced control

2015-06-26 Thread ๏̯͡๏
I am trying the very same thing, configuring the min split size with Spark 1.3.1, and I get a compilation error.

Code:

    val hadoopConfiguration = new Configuration(sc.hadoopConfiguration)
    hadoopConfiguration.set("mapreduce.input.fileinputformat.split.maxsize", "67108864")
    sc.newAPIHadoopFile

Re: hadoop input/output format advanced control

2015-03-24 Thread Nick Pentreath
You can indeed override the Hadoop configuration at a per-RDD level - though it is a little more verbose, as in the example below, and you effectively need to make a copy of the Hadoop Configuration:

    val thisRDDConf = new Configuration(sc.hadoopConfiguration)
    thisRDDConf.set("mapred.min.split.size
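
For reference, here is a minimal sketch of the per-RDD override described above. It is not the code from either message (both are cut off in the archive). The object name `PerRddSplitSize` and the helpers `withMaxSplit` and `readText` are hypothetical names chosen for illustration. The sketch assumes a plain text input read through the new-API `TextInputFormat`, and uses the `mapreduce.input.fileinputformat.split.maxsize` key quoted earlier in the thread.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object PerRddSplitSize {
  // Key as quoted in the thread (new-API "mapreduce.*" naming).
  val MaxSplitKey = "mapreduce.input.fileinputformat.split.maxsize"

  // Copy the base configuration so the override stays local to one RDD
  // and does not modify the configuration passed in.
  def withMaxSplit(base: Configuration, maxBytes: Long): Configuration = {
    val conf = new Configuration(base)
    conf.setLong(MaxSplitKey, maxBytes)
    conf
  }

  // Hypothetical helper: read a text file with a per-RDD cap on split size.
  // Type parameters are spelled out explicitly rather than left to inference.
  def readText(sc: SparkContext, path: String, maxBytes: Long): RDD[(LongWritable, Text)] =
    sc.newAPIHadoopFile[LongWritable, Text, TextInputFormat](
      path,
      classOf[TextInputFormat],
      classOf[LongWritable],
      classOf[Text],
      withMaxSplit(sc.hadoopConfiguration, maxBytes))
}
```

A call such as `PerRddSplitSize.readText(sc, path, 67108864L)` would then leave `sc.hadoopConfiguration` untouched. Hadoop reuses its `Text` objects between records, so it is usually safer to map the values to `String` before caching or collecting them.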