Running a job from IDEA against local HDFS

'Michal Rozar' via Scalding Development Fri, 12 Aug 2016 15:12:40 -0700

I have HDFS running locally, and I have a job that I would like to run against 
that localhost HDFS from IDEA.


My latest attempt was using ExecutionApp like this:


import com.twitter.scalding._

object MyExApp extends ExecutionApp {

  val job: Execution[Unit] =
    TypedPipe.from(TextLine("input"))
      .flatMap(_.split("\\s+"))
      .map { word => (word, 1L) }
      .sumByKey
      .toTypedPipe
      .writeExecution(TypedTsv("output"))
}


and running it with the following arguments:

--hdfs --output /user/local/srp-visits --input /user/local/zeno


but I'm getting the following error: 


Exception in thread "main" com.twitter.scalding.InvalidSourceException: [com.twitter.scalding.TextLine(input)] Data is missing from one or more paths in: List(input)
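
A note on the error above: the path list in the message is the literal string input, which suggests the source was built from the string "input" rather than from the value passed as --input. Below is a minimal sketch of the same job reading both paths from the parsed arguments. It assumes that Execution.withArgs is available in the Scalding version in use and that the output is a (String, Long) pair per line. Whether /user/local/zeno then resolves against the local HDFS still depends on the Hadoop configuration visible on the classpath, which this sketch does not address.

```scala
import com.twitter.scalding._

object MyExApp extends ExecutionApp {

  // Assumption: Execution.withArgs exposes the parsed command-line Args,
  // so args("input") yields the value given after --input.
  def job: Execution[Unit] =
    Execution.withArgs { args =>
      TypedPipe.from(TextLine(args("input")))
        .flatMap(_.split("\\s+"))
        .map { word => (word, 1L) }
        .sumByKey
        .toTypedPipe
        // The element type is spelled out so the sink's converters resolve.
        .writeExecution(TypedTsv[(String, Long)](args("output")))
    }
}
```

Run with the same arguments as before: --hdfs --output /user/local/srp-visits --input /user/local/zeno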


I have also tried:


import com.twitter.scalding.{Args, Hdfs, Mode}
import org.apache.hadoop.conf.Configuration

object JobRunner extends App {

  val hadoopConfiguration: Configuration = new Configuration
//  hadoopConfiguration.set("mapred.job.tracker","hadoop-master:9000")
  hadoopConfiguration.set("fs.defaultFS","hdfs://localhost:9000")

  val hdfsMode = Hdfs(strict = true, hadoopConfiguration)
  val arguments = Mode.putMode(hdfsMode,
    Args("--output /user/local/srp-visits --input /user/local/zeno"))

  // Now create the job after the mode is set up properly.
  val job = new CountSrpVisitsWithKeyword(arguments)
  val flow = job.buildFlow
  flow.complete()
}


and ended up with the following error:

SEVERE: PriviledgedActionException as:<user> cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/user/local/zeno


Any suggestions as to what configuration I am missing? I would like to avoid 
building a .jar file and running it with hadoop commands, as that feels like a 
massive waste of time.


Thanks
