Oozie may be able to orchestrate this for you, and it integrates with Spark.
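On the Spark side, one piece of the question is easy to sketch: because the hourly prefixes embed a zero-padded date path, the newest one can be picked by plain string comparison. A minimal sketch (newestPrefix is a name I made up, not a Spark API):

```scala
// Hourly S3 prefixes such as s3n://events/2016/03/01/00 sort
// chronologically as strings, because the year/month/day/hour
// components are zero-padded. The newest prefix is therefore just
// the lexicographic maximum.
def newestPrefix(prefixes: Seq[String]): Option[String] =
  if (prefixes.isEmpty) None else Some(prefixes.max)
```

To enumerate the prefixes you could use the Hadoop FileSystem API that Spark ships with (e.g. FileSystem.listStatus on the parent directory), then hand the winning prefix plus a wildcard to your reader.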

> On 09 Mar 2016, at 06:03, Benjamin Kim <bbuil...@gmail.com> wrote:
> 
> I am wondering if anyone can help.
> 
> Our company stores zipped CSV files in S3, which has been a big headache from 
> the start. I was wondering if anyone has created a way to iterate through 
> several subdirectories (s3n://events/2016/03/01/00, s3n://events/2016/03/01/01, 
> etc.) in S3 to find the newest files and load them. It would be a big bonus 
> to include the unzipping of the file in the process so that the CSV can be 
> loaded directly into a DataFrame for further processing. I’m pretty sure that 
> the S3 part of this request is not uncommon, though I would think the zipped 
> files are. If anyone can help, I would be truly grateful, as I am new to 
> Scala and Spark. This would be a great help in learning.
> 
> Thanks,
> Ben
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
> 
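For the zip part: Spark reads .gz files transparently, but .zip archives are not handled by textFile, so one common workaround is sc.binaryFiles plus java.util.zip. A minimal sketch of the decompression step in plain Scala (unzipFirstEntry is a hypothetical helper of mine; it assumes one CSV per archive):

```scala
import java.io.ByteArrayInputStream
import java.util.zip.{ZipEntry, ZipInputStream}

// Decompress the first entry of a .zip payload into its text content.
// In Spark you could apply this to the values produced by
// sc.binaryFiles(...), which are PortableDataStream objects
// (call .toArray on them to get the raw bytes).
def unzipFirstEntry(bytes: Array[Byte]): String = {
  val zis = new ZipInputStream(new ByteArrayInputStream(bytes))
  try {
    val entry: ZipEntry = zis.getNextEntry
    require(entry != null, "empty zip archive")
    // Reading the stream stops at the end of the current entry,
    // so this yields exactly the CSV text of the first entry.
    scala.io.Source.fromInputStream(zis).mkString
  } finally {
    zis.close()
  }
}
```

From there, something like sc.binaryFiles(newestDir + "/*.zip").mapValues(s => unzipFirstEntry(s.toArray)) should give you (path, csvText) pairs that you can split into lines and parse into a DataFrame.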
