In my Spark programming thus far, my unit of work has been a single row from an HDFS file, obtained by creating an RDD[Array[String]] with something like:

spark.textFile(path).map(_.split("\t"))

Now, I'd like to do some work over a large collection of files in which the unit of work is a single file (rather than a row from a file). Does Spark anticipate users creating an RDD[URI] or RDD[File] or some such, and does it support the actions and transformations one might want to perform on such an RDD? Any advice and/or code snippets would be appreciated!
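
For concreteness, here's a rough sketch of the kind of thing I have in mind. It just distributes a list of paths and opens each file inside a task; processFile and the paths are placeholders, and I don't know whether this is the idiomatic approach:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.SparkContext

// Placeholder for whatever per-file work I actually need to do.
def processFile(path: String, contents: String): Int =
  contents.split("\n").length

def processWholeFiles(sc: SparkContext, paths: Seq[String]) =
  sc.parallelize(paths).map { p =>
    // Open the file inside the task so each worker reads its own files.
    val fs = FileSystem.get(new Configuration())
    val in = fs.open(new Path(p))
    try processFile(p, scala.io.Source.fromInputStream(in).mkString)
    finally in.close()
  }

Is something along these lines reasonable, or is there a built-in mechanism I should be using instead?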

Thanks,
Philip
