Just some additions to the other answers: if you write output to, say, `s3://bucket/myfile`, then you can use that path as the input of other jobs (`sc.textFile('s3://bucket/myfile')`). By default all the `part-xxx` files under it will be read. There's also `sc.wholeTextFiles` that you can play with.
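To make the multi-part reading concrete: `sc.textFile` on an output directory treats every `part-xxxxx` file inside it as one logical dataset. The sketch below mimics that behaviour locally with plain Python (the directory and its contents are made up for illustration; no Spark needed):

```python
import glob
import os
import tempfile

# Simulate a Spark output directory containing several part files,
# like the ones produced by rdd.saveAsTextFile("s3://bucket/myfile").
outdir = tempfile.mkdtemp()
for i, chunk in enumerate([["a", "b"], ["c"], ["d", "e"]]):
    with open(os.path.join(outdir, "part-%05d" % i), "w") as f:
        f.write("\n".join(chunk) + "\n")

# sc.textFile(outdir) reads all part files as a single dataset;
# the hand-rolled equivalent is to glob and concatenate the parts.
lines = []
for path in sorted(glob.glob(os.path.join(outdir, "part-*"))):
    with open(path) as f:
        lines.extend(f.read().splitlines())

print(lines)  # ['a', 'b', 'c', 'd', 'e']
```

In other words, you point the next job at the directory, not at an individual part file.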
If your file is small and needs to be interoperable with other tools/languages, s3n may be a better choice. But in my experience, when reading directly from s3n, Spark creates only one input partition per file, regardless of the file size. This may lead to performance problems if you have big files.

2014-05-07 2:39 GMT+02:00 Andre Kuhnen <andrekuh...@gmail.com>:

> Try using s3n instead of s3
>
> On 06/05/2014 21:19, "kamatsuoka" <ken...@gmail.com> wrote:
>
>> I have a Spark app that writes out a file, s3://mybucket/mydir/myfile.txt.
>>
>> Behind the scenes, the S3 driver creates a bunch of files like
>> s3://mybucket/mydir/myfile.txt/part-0000, as well as block files like
>> s3://mybucket/block_3574186879395643429.
>>
>> How do I construct a URL to use this file as input to another Spark app?
>> I tried all the variations of s3://mybucket/mydir/myfile.txt, but none of
>> them work.
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-read-a-multipart-s3-file-tp5463.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.

--
*JU Han*
Data Engineer @ Botify.com
+33 0619608888