textFile does read all the files in a directory.

We have modified the Spark Streaming code base to read nested files from S3.
You can check this function, which does that, and implement something
similar for your use case.

Or, if your job is just a batch job and you don't mind processing file by
file, you can iterate over your list, call sc.textFile for each file entry,
and do the computing there. Something like:

for (file <- fileNames) {

  // Create the SparkContext (sc)
  // val lines = sc.textFile(file)
  // do your computing on lines
  // sc.stop()
}
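
As an alternative to one textFile call per file, textFile also accepts a
comma-separated list of paths, so the whole list can be read into a single
RDD. Below is a minimal sketch in Java (the language of the original
question) that only builds that path string. The class name S3PathJoiner,
the helper toCommaSeparatedPaths, the "s3n" scheme, and the bucket and key
values are assumptions made for illustration; the Spark call itself appears
only as a comment because it needs a running context.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class S3PathJoiner {

    // Hypothetical helper: turns S3 object keys into one comma-separated
    // string of full paths, e.g. "s3n://bucket/key1,s3n://bucket/key2".
    // Assumes no key contains a comma, since commas separate the paths.
    static String toCommaSeparatedPaths(String scheme, String bucket, List<String> keys) {
        return keys.stream()
                   .map(key -> scheme + "://" + bucket + "/" + key)
                   .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Stand-in for the FileNames list collected from the S3 listing.
        List<String> fileNames = Arrays.asList("logs/a.log", "logs/b.log");

        String paths = toCommaSeparatedPaths("s3n", "my-bucket", fileNames);
        System.out.println(paths);

        // With a live JavaSparkContext named sc (not created here), the
        // whole list could then be read as one RDD:
        //   JavaRDD<String> lines = sc.textFile(paths);
    }
}
```

This keeps a single SparkContext for the whole job instead of creating and
stopping one per file. The scheme to use (s3n, s3a, ...) depends on how your
cluster's Hadoop S3 support is configured.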


Best Regards

On Thu, May 21, 2015 at 1:45 AM, lovelylavs <lxn130...@utdallas.edu> wrote:

> Hi,
> I am trying to get a collection of files according to LastModifiedDate from
> S3:
>
>     List<String> FileNames = new ArrayList<String>();
>     ListObjectsRequest listObjectsRequest = new ListObjectsRequest()
>             .withBucketName(s3_bucket)
>             .withPrefix(logs_dir);
>     ObjectListing objectListing;
>     do {
>         objectListing = s3Client.listObjects(listObjectsRequest);
>         for (S3ObjectSummary objectSummary :
>                 objectListing.getObjectSummaries()) {
>             if ((objectSummary.getLastModified().compareTo(dayBefore) > 0)
>                     && (objectSummary.getLastModified().compareTo(dayAfter) < 1)
>                     && objectSummary.getKey().contains(".log"))
>                 FileNames.add(objectSummary.getKey());
>         }
>         listObjectsRequest.setMarker(objectListing.getNextMarker());
>     } while (objectListing.isTruncated());
>
> I would like to process these files using Spark.
> I understand that textFile reads a single text file. Is there any way to
> read all these files that are part of the List?
> Thanks for your help.
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Read-multiple-files-from-S3-tp22965.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
