Hi Akshat,

Is there a particular reason you don't use s3a? In my experience, s3a performs 
much better than the other S3 file system implementations. I believe the 
inefficiency you're seeing comes from the s3n (NativeS3FileSystem) 
implementation itself.

Best Regards,

Jerry

Sent from my iPhone

> On 9 Aug, 2015, at 5:48 am, Akhil Das <ak...@sigmoidanalytics.com> wrote:
> 
> It depends on which operation you are doing. If you do a .count() on a 
> Parquet file, I don't think it downloads the entire file, but a .count() on a 
> plain text file will likely pull the whole thing.
> 
> Thanks
> Best Regards
> 
>> On Sat, Aug 8, 2015 at 3:12 AM, Akshat Aranya <aara...@gmail.com> wrote:
>> Hi,
>> 
>> I've been trying to track down some problems with Spark reads being very 
>> slow with s3n:// URIs (NativeS3FileSystem).  After some digging around, I 
>> realized that this file system implementation fetches the entire file, which 
>> isn't really a Spark problem, but it really slows things down when just 
>> reading headers from a Parquet file or just creating partitions in the 
>> RDD.  Is this something that others have observed before, or am I doing 
>> something wrong?
>> 
>> Thanks,
>> Akshat
> 
