Re: WebHDFS performance issue in Knox

2018-09-11 Thread Kevin Risden
What client are you using to connect to Knox? Is this for a single file or a
bunch of files?

The SSL handshake can be slow if the client doesn't keep the connection
open.
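
To get a feel for how much repeated handshakes could matter, here is a rough back-of-the-envelope model in Python. It is only a sketch: the function name, the 50 ms handshake cost, the 1 MB file size and the 400 MB/s link rate are all assumed values for illustration, not measurements from this thread or from Knox.

```python
def effective_mb_per_s(n_files, file_mb, link_mb_per_s, handshake_s, reuse_connection):
    """Estimate overall throughput when fetching n_files of file_mb each.

    Simplified model: every new connection pays one handshake of handshake_s
    seconds, and the payload itself moves at link_mb_per_s.
    """
    # One handshake in total if the connection is kept open, otherwise one per file.
    handshakes = 1 if reuse_connection else n_files
    transfer_s = n_files * file_mb / link_mb_per_s
    total_s = handshakes * handshake_s + transfer_s
    return (n_files * file_mb) / total_s


if __name__ == "__main__":
    # Hypothetical workload: 100 files of 1 MB each, 50 ms per handshake.
    for reuse in (False, True):
        rate = effective_mb_per_s(100, 1.0, 400.0, 0.05, reuse)
        print(f"reuse_connection={reuse}: {rate:.1f} MB/s")
```

With many small files the handshake term dominates; with one large file it barely registers, which is why the single-file versus many-files question matters.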

Kevin Risden



Re: WebHDFS performance issue in Knox

2018-09-11 Thread Guang Yang
Thanks Larry. But the only difference is this part in my gateway-site.xml.

<property>
    <name>ssl.enabled</name>
    <value>false</value>
    <description>Indicates whether SSL is enabled.</description>
</property>
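
When comparing two gateways, a small script can confirm which value of ssl.enabled each gateway-site.xml actually carries. This is a minimal sketch, assuming the file uses the usual Hadoop-style configuration/property layout; the helper name read_property and the inline sample are illustrative only.

```python
import xml.etree.ElementTree as ET

# Inline sample standing in for a real gateway-site.xml (assumed layout).
SAMPLE = """<configuration>
    <property>
        <name>ssl.enabled</name>
        <value>false</value>
        <description>Indicates whether SSL is enabled.</description>
    </property>
</configuration>"""


def read_property(xml_text, name, default=None):
    """Return the value of the named property, or default if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == name:
            return prop.findtext("value", "").strip()
    return default


if __name__ == "__main__":
    print(read_property(SAMPLE, "ssl.enabled"))
```

For a real file, read its contents and pass the text in place of SAMPLE.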



Re: WebHDFS performance issue in Knox

2018-09-11 Thread larry mccay
I really don't think that kind of difference should be expected from merely
SSL overhead.
I don't, however, have any metrics to contradict it either, since I do not run
Knox without SSL.

Given the above, I am struggling to come up with a meaningful response to
this. :(
I don't think you should see a tenfold increase in speed by disabling SSL,
though.
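
One way to gather comparable numbers for the SSL and non-SSL cases is to time the same download with the same code in both setups. Below is a minimal Python sketch of such a timer. It works on any file-like object with a read() method; the function name, the 1 MiB chunk size and the in-memory stand-in stream are assumptions for illustration, not part of Knox or WebHDFS.

```python
import io
import time


def measure_throughput(stream, chunk_size=1 << 20, clock=time.perf_counter):
    """Drain a file-like stream and return (bytes_read, seconds, MB_per_s)."""
    start = clock()
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        total += len(chunk)
    elapsed = clock() - start
    # Decimal megabytes (10**6 bytes) per second; guard against a zero interval.
    rate = total / 1e6 / elapsed if elapsed > 0 else float("inf")
    return total, elapsed, rate


if __name__ == "__main__":
    # In-memory stand-in; in practice pass the HTTP response body stream.
    total, elapsed, rate = measure_throughput(io.BytesIO(b"x" * 8_000_000))
    print(f"{total} bytes in {elapsed:.4f} s = {rate:.1f} MB/s")
```

Running it against the direct path, Knox with SSL, and Knox without SSL, from the same client host and for the same file, would show whether the gap really tracks the SSL setting alone.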

On Tue, Sep 11, 2018 at 2:35 PM Guang Yang  wrote:

> Any idea guys?
>
> On Mon, Sep 10, 2018 at 3:07 PM, Guang Yang  wrote:
>
>> Thanks guys! The issue seems to be exactly what David pointed out: the
>> data is being encrypted over SSL.
>>
>> Without Knox, the download speed can reach *400M/s* if I call the
>> Namenode directly. With SSL disabled, the speed through Knox reaches
>> *~400M/s* as well. But with SSL enabled, the speed drops
>> significantly, to *~40M/s*. I know it's because of the encryption, but
>> such a large difference surprised me. Is this normal from your perspective?
>>
>> Thanks,
>> Guang
>>
>> On Tue, Sep 4, 2018 at 11:07 AM, David Villarreal <
>> dvillarr...@hortonworks.com> wrote:
>>
>>> Hi Guang,
>>>
>>> Keep in mind the data is being encrypted over SSL.  If you disable SSL
>>> you will most likely see a very significant boost in throughput.  Some
>>> people have used more powerful computers to make encryption quicker.
>>>
>>> Thanks,
>>> David
>>>
>>> *From: *Sean Roberts 
>>> *Reply-To: *"user@knox.apache.org" 
>>> *Date: *Tuesday, September 4, 2018 at 1:53 AM
>>> *To: *"user@knox.apache.org" 
>>> *Subject: *Re: WebHDFS performance issue in Knox
>>>
>>> Guang – This is somewhat to be expected.
>>>
>>> When you talk to WebHDFS directly, the client can distribute the request
>>> across many data nodes. Also, you are getting data directly from the source.
>>>
>>> With Knox, all traffic goes through the single Knox host. Knox is
>>> responsible for fetching from the datanodes and consolidating to send to
>>> you. This means overhead, as it’s acting as a middleman, and lower network
>>> capacity, since only 1 host is serving data to you.
>>>
>>> Also, if running on a cloud provider, the Knox host may be a smaller
>>> instance size with lower network capacity.
>>>
>>> --
>>> Sean Roberts
>>>
>>> *From: *Guang Yang 
>>> *Reply-To: *"user@knox.apache.org" 
>>> *Date: *Tuesday, 4 September 2018 at 07:46
>>> *To: *"user@knox.apache.org" 
>>> *Subject: *WebHDFS performance issue in Knox
>>>
>>> Hi,
>>>
>>> We're using Knox 1.1.0 to proxy WebHDFS requests. If we download a file
>>> through WebHDFS in Knox, the download speed is only about 11M/s. However,
>>> if we download directly from a datanode, the speed is at least 40M/s.
>>>
>>> Are you guys aware of this problem? Any suggestions?
>>>
>>> Thanks,
>>> Guang
>>>
>>
>>
>