Did you try with the data unzipped?
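
The reason for asking: as far as I know, a gzipped file has to be read as a single stream, so decompression may be what limits the rate rather than the network. A rough way to check, assuming you have a local copy of `big.json.gz`, is to time a plain streaming decompress and compare it with the rate you see in Drill. The function name below is just for illustration, and this only measures gzip on your machine, not Drill's own reader:

```python
import gzip
import time


def gunzip_throughput(path, chunk_size=1 << 20):
    """Stream-decompress a gzip file.

    Returns (uncompressed_bytes, elapsed_seconds).
    """
    total = 0
    start = time.perf_counter()
    with gzip.open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)  # read up to chunk_size uncompressed bytes
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total, elapsed
```

Calling `gunzip_throughput("big.json.gz")` and dividing the compressed file size by the elapsed time gives a figure you can compare with the 5Mb/s you see during the query. If the two are close, decompression is a likely suspect; if local decompression is much faster, the problem is probably elsewhere.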

On Sun, Feb 21, 2016 at 2:53 PM, Oscar Morante <[email protected]> wrote:

> I'm still fighting with this, so far I've tried:
>
>  - Adding the native hadoop libraries.
>  - `s3n://` and `s3a://`.
>  - SSL on/off.
>  - different S3 endpoints.
>  - different values in `drill.exec.buffer.size`.
>
> None of these seems to make a difference, and `s3cmd` is always about ten
> times faster than Drill at downloading the same file.  Netstat shows about
> 800k piling up in Recv-Q during the query, whereas with `s3cmd` it stays
> pretty much clean the whole time.
>
> I've also noticed that if I cancel the query in the middle, the download
> speed suddenly goes up and matches `s3cmd` for a while before it stops.
>
> Is there anything else that I can try to improve the situation?  At the
> beginning I thought that S3 was the bottleneck, but everything is pointing
> to some kind of lock in Drill.
>
> Or maybe I'm just being unrealistic and asking too much :?
> Cheers,
>
>
>
> On Fri, Feb 19, 2016 at 02:27:56PM +0200, Oscar Morante wrote:
>
>> Hi there,
>>
>> I'm experiencing very slow download rates from S3 but only when using
>> Drill.  This is testing with only one drillbit and querying a 250Mb gzipped
>> JSON:
>>
>>   select count(somefield) from s3.`test/big.json.gz`;
>>
>> The download speed while Drill is executing the query is about 5Mb/s.
>> Then if I try downloading the same file from the same environment using
>> `s3cmd` the average speed is about 60Mb/s.
>>
>> Any idea what could be causing such a big difference?  I'm not sure
>> what's the best way to debug this, or what are the relevant configuration
>> parameters that I should be tweaking.
>>
>> Thanks!
>>
>
>
>
> --
> Oscar Morante
> "Self-education is, I firmly believe, the only kind of education there is."
>                                                          -- Isaac Asimov.
>



-- 
----------------------------------
Paul Ilechko
Senior Systems Engineer
MapR Technologies
908 331 2207
