Re: Loading lots of parquet files into dataframe from s3

lovelylavs Thu, 18 Jun 2015 08:19:56 -0700

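You can do something like this: list the bucket with the AWS SDK for Java, keep the keys you want, and join them into one comma-separated string of s3n:// paths: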

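    import com.amazonaws.services.s3.model.ListObjectsRequest;
    import com.amazonaws.services.s3.model.ObjectListing;
    import com.amazonaws.services.s3.model.S3ObjectSummary;
    import java.util.ArrayList;
    import java.util.Date;
    import java.util.List;

    // Assumes s3Client (an AmazonS3 client), s3_bucket, dayBefore and dayAfter
    // (java.util.Date) are already defined.
    ListObjectsRequest listObjectsRequest = new ListObjectsRequest()
            .withBucketName(s3_bucket);
    List<String> FileNames = new ArrayList<String>();
    ObjectListing objectListing;

    do {
        objectListing = s3Client.listObjects(listObjectsRequest);
        for (S3ObjectSummary objectSummary : objectListing.getObjectSummaries()) {
            Date lastModified = objectSummary.getLastModified();
            // Keep ".log" keys modified after dayBefore and no later than dayAfter.
            if (lastModified.compareTo(dayBefore) > 0
                    && lastModified.compareTo(dayAfter) <= 0
                    && objectSummary.getKey().contains(".log")) {
                FileNames.add(objectSummary.getKey());
            }
        }
        // S3 returns at most 1,000 keys per call; continue where this page ended.
        listObjectsRequest.setMarker(objectListing.getNextMarker());
    } while (objectListing.isTruncated());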
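The date check in the listing loop treats dayBefore as an exclusive bound and dayAfter as an inclusive one. A minimal sketch of the same predicate written with Date.after, so the boundaries are easy to test in isolation; KeyWindowFilter and matches are hypothetical names, and ".log" is only the marker this example filters on:

```java
import java.util.Date;

// Hypothetical helper mirroring the filter in the listing loop:
// the lower bound is exclusive, the upper bound is inclusive,
// and the key must contain the given marker string.
class KeyWindowFilter {
    static boolean matches(String key, Date lastModified,
                           Date dayBefore, Date dayAfter, String marker) {
        return lastModified.after(dayBefore)        // same as compareTo(dayBefore) > 0
                && !lastModified.after(dayAfter)    // same as compareTo(dayAfter) <= 0
                && key.contains(marker);
    }
}
```

An object modified exactly at dayBefore is skipped, while one modified exactly at dayAfter is kept.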


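    // Join the keys into one comma-separated string of s3n:// paths.
    StringBuilder concatBuilder = new StringBuilder();
    for (String fName : FileNames) {
        if (concatBuilder.length() > 0) {
            concatBuilder.append(",");  // separator before every path except the first
        }
        concatBuilder.append("s3n://").append(s3_bucket).append("/").append(fName);
    }
    String concatName = concatBuilder.toString();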
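On Java 8 or later, String.join can build the same comma-separated string without handling the separator by hand. A minimal sketch; S3Paths and joinPaths are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch (Java 8+): turn each key into an s3n:// path and join with commas.
class S3Paths {
    static String joinPaths(String bucket, List<String> keys) {
        List<String> paths = new ArrayList<>(keys.size());
        for (String key : keys) {
            paths.add("s3n://" + bucket + "/" + key);
        }
        // String.join puts "," only between elements; an empty list gives "".
        return String.join(",", paths);
    }
}
```

For example, joinPaths("my-bucket", Arrays.asList("logs/a.log", "logs/b.log")) returns "s3n://my-bucket/logs/a.log,s3n://my-bucket/logs/b.log".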



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Loading-lots-of-parquet-files-into-dataframe-from-s3-tp23127p23394.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
