textFile does read all the files in a directory. We have modified the Spark Streaming code base to read nested files from S3; you can check this function <https://github.com/sigmoidanalytics/spark-modified/blob/8074620414df6bbed81ac855067600573a7b22ca/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala#L206>, which does that, and implement something similar for your use case.
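Depending on the layout, a code change may not even be needed: plain sc.textFile also accepts a directory or a Hadoop glob pattern, which covers nesting of a known depth. A minimal sketch, with my-bucket/logs as a placeholder path:

    // every file directly under the directory
    val flat = sc.textFile("s3n://my-bucket/logs/")
    // files one directory level down, matched by a glob
    val nested = sc.textFile("s3n://my-bucket/logs/*/*.log")

The linked FileInputDStream change matters mainly for the streaming case, where nested files have to be picked up as they arrive.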
Or, if your job is just a batch job and you don't need to process the data file by file, you can iterate over your list, call sc.textFile for each entry, and run your computation on the resulting RDD. Something like this (a fuller sketch follows the quoted message below):

    for (file <- fileNames) {
      // create the SparkContext
      // sc.textFile(file)
      // do your computing
      // sc.stop
    }

Thanks
Best Regards

On Thu, May 21, 2015 at 1:45 AM, lovelylavs <lxn130...@utdallas.edu> wrote:

> Hi,
>
> I am trying to get a collection of files from S3 according to their
> LastModifiedDate:
>
>     List<String> fileNames = new ArrayList<String>();
>
>     ListObjectsRequest listObjectsRequest = new ListObjectsRequest()
>             .withBucketName(s3_bucket)
>             .withPrefix(logs_dir);
>
>     ObjectListing objectListing;
>
>     do {
>         objectListing = s3Client.listObjects(listObjectsRequest);
>         for (S3ObjectSummary objectSummary : objectListing.getObjectSummaries()) {
>             if (objectSummary.getLastModified().compareTo(dayBefore) > 0 &&
>                     objectSummary.getLastModified().compareTo(dayAfter) < 1 &&
>                     objectSummary.getKey().contains(".log")) {
>                 fileNames.add(objectSummary.getKey());
>             }
>         }
>         listObjectsRequest.setMarker(objectListing.getNextMarker());
>     } while (objectListing.isTruncated());
>
> I would like to process these files using Spark.
>
> I understand that textFile reads a single text file. Is there any way to
> read all the files that are part of this list?
>
> Thanks for your help.
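A fuller sketch of the batch approach, assuming the S3 credentials are already configured for the s3n:// filesystem and that fileNames holds the keys collected by the listing code quoted above (s3Bucket is a placeholder). It creates the SparkContext once rather than per file, which avoids repeated context start-up cost:

    import org.apache.spark.{SparkConf, SparkContext}

    object ProcessS3Logs {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("process-s3-logs"))
        val s3Bucket = "my-bucket"                       // placeholder bucket name
        val fileNames = Seq("logs/a.log", "logs/b.log")  // keys from the listing code above

        // Option 1: one RDD per file, processed independently.
        for (file <- fileNames) {
          val lines = sc.textFile(s"s3n://$s3Bucket/$file")
          println(s"$file: ${lines.count()} lines")      // stand-in for your computation
        }

        // Option 2: textFile also takes a comma-separated list of paths,
        // so all the files can be read into a single RDD.
        val all = sc.textFile(fileNames.map(f => s"s3n://$s3Bucket/$f").mkString(","))
        println(s"total: ${all.count()} lines")

        sc.stop()
      }
    }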