If I run two downloads concurrently:

1,073,741,824 46.1MB/s   in 22s
1,073,741,824 51.3MB/s   in 22s

So the Knox gateway itself isn't limited in total bandwidth; the limitation
appears to be per connection, for some reason.
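
Because aggregate throughput grows with the number of connections, one possible client-side workaround is to split a single file into byte ranges and fetch them over several connections in parallel. The following is a minimal sketch, not a tested recipe. It assumes that the WebHDFS `op=OPEN` `offset`/`length` query parameters pass through the Knox topology unchanged. The gateway URL below is hypothetical. The sketch also omits the authentication and certificate handling a real Knox deployment needs.

```python
import concurrent.futures
import urllib.request

# Hypothetical Knox WebHDFS URL; adjust host, port, topology and file path.
BASE_URL = "https://knox-host:8443/gateway/default/webhdfs/v1/tmp/random-1g"


def split_ranges(total_size: int, parts: int) -> list[tuple[int, int]]:
    """Split [0, total_size) into at most `parts` contiguous (offset, length) chunks."""
    base, extra = divmod(total_size, parts)
    ranges = []
    offset = 0
    for i in range(parts):
        length = base + (1 if i < extra else 0)
        if length:  # skip empty chunks when the file is smaller than `parts`
            ranges.append((offset, length))
        offset += length
    return ranges


def fetch_range(offset: int, length: int) -> bytes:
    """Read one chunk with WebHDFS OPEN; urllib follows any redirect it receives."""
    url = f"{BASE_URL}?op=OPEN&offset={offset}&length={length}"
    with urllib.request.urlopen(url, timeout=300) as resp:
        return resp.read()


def parallel_download(total_size: int, parts: int = 4) -> bytes:
    """Fetch all chunks concurrently (one connection each) and reassemble in order."""
    ranges = split_ranges(total_size, parts)
    with concurrent.futures.ThreadPoolExecutor(max_workers=parts) as pool:
        # pool.map yields results in input order, so joining restores the file.
        return b"".join(pool.map(lambda r: fetch_range(*r), ranges))
```

Whether this helps depends on the per-connection bottleneck staying per connection as more streams are added. The two concurrent wget runs above suggest it does for at least two streams.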

Kevin Risden


On Tue, Oct 9, 2018 at 12:24 PM Kevin Risden <kris...@apache.org> wrote:

> So I was able to reproduce a slowdown with SSL using a pseudo-distributed
> HDFS setup on a single node, with Knox running on the same node. This was
> set up in VirtualBox on my laptop.
>
> Rough timings with wget for a 1GB random file:
>
>    - directly to WebHDFS: 1,073,741,824  252MB/s   in 3.8s
>    - Knox, no SSL: 1,073,741,824  264MB/s   in 3.6s
>    - Knox, SSL: 1,073,741,824 54.3MB/s   in 20s
>
> Throughput drops by roughly 5x with SSL enabled on Knox, and the cause
> isn't clear yet.
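
The size of the drop can be checked directly from the wget-reported rates. Here is a small sketch; the rates are copied from the timings above, and the helper name is only illustrative.

```python
# wget-reported rates (MB/s) for the same 1GB random file, from the timings above.
RATES = {
    "direct WebHDFS": 252.0,
    "Knox, no SSL": 264.0,
    "Knox, SSL": 54.3,
}


def slowdown(rates: dict[str, float], baseline: str, other: str) -> float:
    """How many times slower `other` is than `baseline` (ratio of throughputs)."""
    return rates[baseline] / rates[other]


if __name__ == "__main__":
    factor = slowdown(RATES, "Knox, no SSL", "Knox, SSL")
    print(f"Enabling SSL on Knox is ~{factor:.1f}x slower")
```

By the same measure, Guang's figures later in the thread (~400M/s without SSL vs. ~40M/s with it) are a 10x drop. That is about twice what this single-node reproduction shows.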
>
> Kevin Risden
>
>
> On Sun, Sep 23, 2018 at 8:53 PM larry mccay <lmc...@apache.org> wrote:
>
>> The SSL handshake will likely happen at least twice: once for the request
>> through Knox to the NN, and again because the redirect from the NN to the
>> DN goes all the way back to the client. The client has to follow that
>> redirect and do another handshake to reach the DN.
>>
>>
>> On Sun, Sep 23, 2018 at 8:30 PM Kevin Risden <kris...@apache.org> wrote:
>>
>>> So I found this in the Knox issues list in JIRA:
>>>
>>> https://issues.apache.org/jira/browse/KNOX-1221
>>>
>>> It sounds similar: a slowdown when going through Knox.
>>>
>>> Kevin Risden
>>>
>>>
>>> On Sat, Sep 15, 2018 at 10:17 PM Kevin Risden <kris...@apache.org> wrote:
>>>
>>>> Hmm, yeah, curl fetching a single file should only do the handshake once.
>>>>
>>>> What are the system performance statistics (CPU, memory, disk, etc.)
>>>> during the SSL vs. non-SSL testing? Ambari Metrics with Grafana would help
>>>> here if you're using that; otherwise, watching top may be helpful. It would
>>>> help to determine whether Knox is working harder during the SSL transfer.
>>>>
>>>> Kevin Risden
>>>>
>>>>
>>>> On Wed, Sep 12, 2018 at 2:52 PM Guang Yang <k...@uber.com> wrote:
>>>>
>>>>> I'm just using curl to download a single large file, so I assume the SSL
>>>>> handshake only happens once?
>>>>>
>>>>> On Tue, Sep 11, 2018 at 12:02 PM Kevin Risden <kris...@apache.org> wrote:
>>>>>
>>>>>> What client are you using to connect to Knox? Is this for a single file
>>>>>> or a bunch of files?
>>>>>>
>>>>>> The SSL handshake can be slow if the client doesn't keep the
>>>>>> connection open, since each new connection has to repeat it.
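
Here is a minimal sketch of what "keeping the connection open" means, using Python's standard `http.client`. All requests share one connection, so for HTTPS the handshake is paid once rather than per request. The host, port, and paths are placeholders. It assumes the server honors HTTP/1.1 keep-alive; if the server closes the connection, `http.client` silently reconnects, and a new handshake follows. Redirects are not followed.

```python
import http.client


def fetch_all(host: str, port: int, paths: list[str], use_tls: bool = True) -> list[bytes]:
    """GET each path in turn over a single persistent (keep-alive) connection.

    With use_tls=True the TLS handshake happens when the connection is first
    opened, not once per request. Redirects are NOT followed.
    """
    conn_cls = http.client.HTTPSConnection if use_tls else http.client.HTTPConnection
    # HTTPSConnection verifies certificates with the default SSL context; a
    # gateway with a self-signed certificate would need an explicit context.
    conn = conn_cls(host, port, timeout=60)
    bodies = []
    try:
        for path in paths:
            conn.request("GET", path)
            resp = conn.getresponse()
            # The body must be read completely before the connection is reused.
            bodies.append(resp.read())
    finally:
        conn.close()
    return bodies
```

For a single large file, as in this thread, there is only one request per hop, so connection reuse saves little.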
>>>>>>
>>>>>> Kevin Risden
>>>>>>
>>>>>> On Tue, Sep 11, 2018, 14:51 Guang Yang <k...@uber.com> wrote:
>>>>>>
>>>>>>> Thanks, Larry. But the only difference between the two setups is this
>>>>>>> part of my gateway-site.xml:
>>>>>>>
>>>>>>> <property>
>>>>>>>         <name>ssl.enabled</name>
>>>>>>>         <value>false</value>
>>>>>>>         <description>Indicates whether SSL is enabled.</description>
>>>>>>> </property>
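
One way to double-check that claim is to diff the `<property>` entries of the two gateway-site.xml files. Here is a minimal sketch using only the standard library, assuming the usual Hadoop-style `<configuration><property><name/><value/></property></configuration>` layout. The command-line file names are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET


def load_properties(xml_text) -> dict[str, str]:
    """Map each <property>'s <name> to its <value>; xml_text may be str or bytes."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="").strip()
        if name:
            props[name] = value
    return props


def diff_properties(a: dict[str, str], b: dict[str, str]) -> dict[str, tuple]:
    """Return {name: (value_in_a, value_in_b)} for every property that differs."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }


if __name__ == "__main__":
    # usage: python diff_conf.py gateway-site-ssl.xml gateway-site-nossl.xml
    with open(sys.argv[1], "rb") as f1, open(sys.argv[2], "rb") as f2:
        changes = diff_properties(load_properties(f1.read()), load_properties(f2.read()))
    for name, (va, vb) in changes.items():
        print(f"{name}: {va!r} -> {vb!r}")
```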
>>>>>>>
>>>>>>> On Tue, Sep 11, 2018 at 11:42 AM, larry mccay <lmc...@apache.org> wrote:
>>>>>>>
>>>>>>>> I really don't think that kind of difference should be expected
>>>>>>>> from SSL overhead alone.
>>>>>>>> I don't, however, have any metrics to contradict it either, since I do
>>>>>>>> not run Knox without SSL.
>>>>>>>>
>>>>>>>> Given that, I am struggling to come up with a meaningful
>>>>>>>> response to this. :(
>>>>>>>> I don't think you should see a 10-fold increase in speed from
>>>>>>>> disabling SSL, though.
>>>>>>>>
>>>>>>>> On Tue, Sep 11, 2018 at 2:35 PM Guang Yang <k...@uber.com> wrote:
>>>>>>>>
>>>>>>>>> Any ideas, guys?
>>>>>>>>>
>>>>>>>>> On Mon, Sep 10, 2018 at 3:07 PM, Guang Yang <k...@uber.com> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks, guys! The issue seems to be exactly what David pointed out:
>>>>>>>>>> the data is encrypted over SSL.
>>>>>>>>>>
>>>>>>>>>> Without Knox, the download speed can reach *400M/s* if I call the
>>>>>>>>>> NameNode directly. With SSL disabled, the speed through Knox can reach
>>>>>>>>>> *~400M/s* as well. But with SSL, the speed drops significantly to
>>>>>>>>>> *~40M/s*. I know it's because of the encryption, but I'm surprised by
>>>>>>>>>> such a large difference. Is that normal from your perspective?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Guang
>>>>>>>>>>
>>>>>>>>>> On Tue, Sep 4, 2018 at 11:07 AM, David Villarreal <dvillarr...@hortonworks.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Guang,
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Keep in mind the data is being encrypted over SSL.  If you
>>>>>>>>>>> disable SSL you will most likely see a very significant boost in
>>>>>>>>>>> throughput.  Some people have used more powerful computers to make
>>>>>>>>>>> encryption quicker.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> David
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> From: Sean Roberts <srobe...@hortonworks.com>
>>>>>>>>>>> Reply-To: "user@knox.apache.org" <user@knox.apache.org>
>>>>>>>>>>> Date: Tuesday, September 4, 2018 at 1:53 AM
>>>>>>>>>>> To: "user@knox.apache.org" <user@knox.apache.org>
>>>>>>>>>>> Subject: Re: WebHDFS performance issue in Knox
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Guang – This is somewhat to be expected.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> When you talk to WebHDFS directly, the client can distribute the
>>>>>>>>>>> request across many data nodes. Also, you are getting data directly
>>>>>>>>>>> from the source.
>>>>>>>>>>>
>>>>>>>>>>> With Knox, all traffic goes through the single Knox host. Knox
>>>>>>>>>>> is responsible for fetching from the datanodes and consolidating the
>>>>>>>>>>> data to send to you. This adds overhead, since it's acting as a
>>>>>>>>>>> middleman, and lowers network capacity, since only one host is
>>>>>>>>>>> serving data to you.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Also, if running on a cloud provider, the Knox host may be a
>>>>>>>>>>> smaller instance size with lower network capacity.
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>>
>>>>>>>>>>> Sean Roberts
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> From: Guang Yang <k...@uber.com>
>>>>>>>>>>> Reply-To: "user@knox.apache.org" <user@knox.apache.org>
>>>>>>>>>>> Date: Tuesday, 4 September 2018 at 07:46
>>>>>>>>>>> To: "user@knox.apache.org" <user@knox.apache.org>
>>>>>>>>>>> Subject: WebHDFS performance issue in Knox
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> We're using Knox 1.1.0 to proxy WebHDFS requests. If we download
>>>>>>>>>>> a file through WebHDFS in Knox, the download speed is only about
>>>>>>>>>>> 11M/s. However, if we download directly from the DataNode, the speed
>>>>>>>>>>> is at least 40M/s.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Are you guys aware of this problem? Any suggestions?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Guang
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
