I am running a series of Spark jobs with 9,000 executors, and this produces 9,000+ output files, which exceeds the namespace file count quota.

How can Spark be configured to use CombinedOutputFormat?
{code}
import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.{AvroJob, AvroKeyOutputFormat}
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.rdd.RDD

protected def writeOutputRecords(detailRecords: RDD[(AvroKey[DetailOutputRecord], NullWritable)],
                                 outputDir: String): Unit = {
  // The Job instance is only used as a container for the Hadoop output configuration.
  val writeJob = Job.getInstance()
  val schema = SchemaUtil.outputSchema(_detail)
  AvroJob.setOutputKeySchema(writeJob, schema)
  // One Avro output file is written per RDD partition.
  detailRecords.saveAsNewAPIHadoopFile(outputDir,
    classOf[AvroKey[GenericRecord]],
    classOf[org.apache.hadoop.io.NullWritable],
    classOf[AvroKeyOutputFormat[GenericRecord]],
    writeJob.getConfiguration)
}
{code}
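
Or is the right approach simply to coalesce the RDD down to a fixed number of partitions before writing, along the lines of the sketch below? (The 500 here is an arbitrary partition count I picked for illustration, not a recommendation.)

{code}
// Sketch: shrink the RDD to a fixed number of partitions so that
// saveAsNewAPIHadoopFile emits that many files instead of one per
// original partition. 500 is an arbitrary example value.
val combined = detailRecords.coalesce(500)
combined.saveAsNewAPIHadoopFile(outputDir,
  classOf[AvroKey[GenericRecord]],
  classOf[org.apache.hadoop.io.NullWritable],
  classOf[AvroKeyOutputFormat[GenericRecord]],
  writeJob.getConfiguration)
{code}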

-- 
Deepak
