I am running a series of Spark jobs with 9,000 executors, and this produces 9,000+ output files, which exceeds the namespace file count quota.

How can Spark be configured to use CombinedOutputFormat?
{code}
import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.{AvroJob, AvroKeyOutputFormat}
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.rdd.RDD

protected def writeOutputRecords(detailRecords: RDD[(AvroKey[DetailOutputRecord], NullWritable)],
                                 outputDir: String): Unit = {
  // The Job instance is only used as a container for the Hadoop output configuration.
  val writeJob = Job.getInstance()
  val schema = SchemaUtil.outputSchema(_detail)
  AvroJob.setOutputKeySchema(writeJob, schema)
  // One Avro output file is written per RDD partition.
  detailRecords.saveAsNewAPIHadoopFile(outputDir,
    classOf[AvroKey[GenericRecord]],
    classOf[org.apache.hadoop.io.NullWritable],
    classOf[AvroKeyOutputFormat[GenericRecord]],
    writeJob.getConfiguration)
}
{code}
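
Or is the right approach simply to coalesce the RDD down to a fixed number of partitions before writing, along the lines of the sketch below? (The 500 here is an arbitrary partition count I picked for illustration, not a recommendation.)

{code}
// Sketch: shrink the RDD to a fixed number of partitions so that
// saveAsNewAPIHadoopFile emits that many files instead of one per
// original partition. 500 is an arbitrary example value.
val combined = detailRecords.coalesce(500)
combined.saveAsNewAPIHadoopFile(outputDir,
  classOf[AvroKey[GenericRecord]],
  classOf[org.apache.hadoop.io.NullWritable],
  classOf[AvroKeyOutputFormat[GenericRecord]],
  writeJob.getConfiguration)
{code}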

-- 
Deepak
