I have HDFS running locally, and I have a job that I would like to run
against it.
My latest attempt was using ExecutionApp like this:
object MyExApp extends ExecutionApp {
  val job: Execution[Unit] =
    TypedPipe.from(TextLine("input"))
      .flatMap(_.split("\\s+"))
      .map { word => (word, 1L) }
      .sumByKey
      .toTypedPipe
      .writeExecution(TypedTsv("output"))
}
and running it with the following arguments: --hdfs --output
/user/local/srp-visits --input /user/local/zeno
but I'm getting the following error:
Exception in thread "main" com.twitter.scalding.InvalidSourceException:
[com.twitter.scalding.TextLine(input)] Data is missing from one or more
paths in: List(input)
I have also tried:
object JobRunner extends App {
  val hadoopConfiguration: Configuration = new Configuration
  // hadoopConfiguration.set("mapred.job.tracker", "hadoop-master:9000")
  hadoopConfiguration.set("fs.defaultFS", "hdfs://localhost:9000")
  val hdfsMode = Hdfs(strict = true, hadoopConfiguration)
  val arguments = Mode.putMode(hdfsMode,
    Args("--output /user/local/srp-visits --input /user/local/zeno"))
  // Now create the job after the mode is set up properly.
  val job = new CountSrpVisitsWithKeyword(arguments)
  val flow = job.buildFlow
  flow.complete()
}
and ended up with the following error:
SEVERE: PriviledgedActionException as:<user>
cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not
exist: file:/user/local/zeno
Any suggestions as to what configuration I am missing? I would like to avoid
building a .jar file and running it with hadoop commands, as that feels like a
massive waste of time.
Thanks