So I was able to reproduce a slowdown with SSL using a pseudo-distributed HDFS setup on a single node, with Knox running on the same node. This was set up in VirtualBox on my laptop.
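
A minimal sketch for putting the three wget runs in this message on a common footing. It only does arithmetic on the rates wget reported; the labels and helper names (`slowdown`, `compare`) are made up for illustration and are not part of Knox or wget.

```python
# Rates as reported by wget in this message, in MB/s.
# The dictionary keys are shorthand labels, not Knox or HDFS identifiers.
reported_mb_per_s = {
    "webhdfs direct": 252.0,
    "knox no ssl": 264.0,
    "knox ssl": 54.3,
}


def slowdown(baseline, measured):
    """Return how many times longer a transfer takes at `measured` than at `baseline`."""
    if measured <= 0:
        raise ValueError("measured rate must be positive")
    return baseline / measured


def compare(rates, baseline_key):
    """Map each label to its transfer-time factor relative to the baseline rate."""
    base = rates[baseline_key]
    return {name: round(slowdown(base, rate), 2) for name, rate in rates.items()}


if __name__ == "__main__":
    for name, factor in compare(reported_mb_per_s, "webhdfs direct").items():
        print(f"{name}: takes {factor}x as long as direct WebHDFS")
```

By reported rate, the Knox SSL run takes roughly 4.6 times as long as going directly to WebHDFS, while Knox without SSL is on par with the direct run.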

Rough timings with wget for a 1GB random file:
- directly to webhdfs - 1,073,741,824 252MB/s in 3.8s
- knox no ssl - 1,073,741,824 264MB/s in 3.6s
- knox ssl - 1,073,741,824 54.3MB/s in 20s

There is a significant decrease with Knox SSL for some reason.

Kevin Risden

On Sun, Sep 23, 2018 at 8:53 PM larry mccay <lmc...@apache.org> wrote:

> SSL handshake will likely happen at least twice.
> Once for the request through Knox to the NN then the redirect from the NN
> to the DN goes all the way back to the client.
> So they have to follow the redirect and do the handshake to the DN.
>
> On Sun, Sep 23, 2018 at 8:30 PM Kevin Risden <kris...@apache.org> wrote:
>
>> So I found this in the Knox issues list in JIRA:
>>
>> https://issues.apache.org/jira/browse/KNOX-1221
>>
>> It sounds familiar in terms of a slowdown when going through Knox.
>>
>> Kevin Risden
>>
>> On Sat, Sep 15, 2018 at 10:17 PM Kevin Risden <kris...@apache.org> wrote:
>>
>>> Hmmm yea curl for a single file should do the handshake once.
>>>
>>> What are the system performance statistics during the SSL vs non SSL
>>> testing? CPU/memory/disk/etc? Ambari metrics with Grafana would help here
>>> if using that. Otherwise watching top may be helpful. It would help to
>>> determine if Knox is working harder during the SSL transfer.
>>>
>>> Kevin Risden
>>>
>>> On Wed, Sep 12, 2018 at 2:52 PM Guang Yang <k...@uber.com> wrote:
>>>
>>>> I'm just using curl to download a single large file. So I suspect SSL
>>>> handshake just happens once?
>>>>
>>>> On Tue, Sep 11, 2018 at 12:02 PM Kevin Risden <kris...@apache.org> wrote:
>>>>
>>>>> What client are you using to connect to Knox? Is this for a single file
>>>>> or a bunch of files?
>>>>>
>>>>> The SSL handshake can be slow if the client doesn't keep the
>>>>> connection open.
>>>>>
>>>>> Kevin Risden
>>>>>
>>>>> On Tue, Sep 11, 2018, 14:51 Guang Yang <k...@uber.com> wrote:
>>>>>
>>>>>> Thanks Larry.
>>>>>> But the only difference is this part in my gateway-site.xml.
>>>>>>
>>>>>> <property>
>>>>>>   <name>ssl.enabled</name>
>>>>>>   <value>false</value>
>>>>>>   <description>Indicates whether SSL is enabled.</description>
>>>>>> </property>
>>>>>>
>>>>>> On Tue, Sep 11, 2018 at 11:42 AM, larry mccay <lmc...@apache.org> wrote:
>>>>>>
>>>>>>> I really don't think that kind of difference should be expected from
>>>>>>> merely SSL overhead.
>>>>>>> I don't however have any metrics to contradict it either since I do
>>>>>>> not run Knox without SSL.
>>>>>>>
>>>>>>> Given the above, I am struggling to come up with a meaningful
>>>>>>> response to this. :(
>>>>>>> I don't think you should see a 10-fold increase in speed by
>>>>>>> disabling SSL though.
>>>>>>>
>>>>>>> On Tue, Sep 11, 2018 at 2:35 PM Guang Yang <k...@uber.com> wrote:
>>>>>>>
>>>>>>>> Any idea guys?
>>>>>>>>
>>>>>>>> On Mon, Sep 10, 2018 at 3:07 PM, Guang Yang <k...@uber.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks guys! The issue seems to be exactly what David pointed out:
>>>>>>>>> the data is encrypted over SSL.
>>>>>>>>>
>>>>>>>>> Without Knox, the download speed can reach *400M/s* if I call the
>>>>>>>>> Namenode directly. With SSL disabled, the speed reaches
>>>>>>>>> *~400M/s* through Knox as well. But with SSL, the speed drops
>>>>>>>>> significantly to *~40M/s*. I know it's because of the encryption, but
>>>>>>>>> such a difference did surprise me. Is that normal from your
>>>>>>>>> perspective?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Guang
>>>>>>>>>
>>>>>>>>> On Tue, Sep 4, 2018 at 11:07 AM, David Villarreal <
>>>>>>>>> dvillarr...@hortonworks.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Guang,
>>>>>>>>>>
>>>>>>>>>> Keep in mind the data is being encrypted over SSL. If you
>>>>>>>>>> disable SSL you will most likely see a very significant boost in
>>>>>>>>>> throughput. Some people have used more powerful computers to make
>>>>>>>>>> encryption quicker.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> David
>>>>>>>>>>
>>>>>>>>>> *From: *Sean Roberts <srobe...@hortonworks.com>
>>>>>>>>>> *Reply-To: *"user@knox.apache.org" <user@knox.apache.org>
>>>>>>>>>> *Date: *Tuesday, September 4, 2018 at 1:53 AM
>>>>>>>>>> *To: *"user@knox.apache.org" <user@knox.apache.org>
>>>>>>>>>> *Subject: *Re: WebHDFS performance issue in Knox
>>>>>>>>>>
>>>>>>>>>> Guang – This is somewhat to be expected.
>>>>>>>>>>
>>>>>>>>>> When you talk to WebHDFS directly, the client can distribute the
>>>>>>>>>> request across many data nodes. Also, you are getting data directly
>>>>>>>>>> from the source.
>>>>>>>>>>
>>>>>>>>>> With Knox, all traffic goes through the single Knox host. Knox is
>>>>>>>>>> responsible for fetching from the datanodes and consolidating to
>>>>>>>>>> send to you. This means overhead as it’s acting as a middle man, and
>>>>>>>>>> lower network capacity since only 1 host is serving data to you.
>>>>>>>>>>
>>>>>>>>>> Also, if running on a cloud provider, the Knox host may be a
>>>>>>>>>> smaller instance size with lower network capacity.
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>>
>>>>>>>>>> Sean Roberts
>>>>>>>>>>
>>>>>>>>>> *From: *Guang Yang <k...@uber.com>
>>>>>>>>>> *Reply-To: *"user@knox.apache.org" <user@knox.apache.org>
>>>>>>>>>> *Date: *Tuesday, 4 September 2018 at 07:46
>>>>>>>>>> *To: *"user@knox.apache.org" <user@knox.apache.org>
>>>>>>>>>> *Subject: *WebHDFS performance issue in Knox
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> We're using Knox 1.1.0 to proxy WebHDFS requests. If we download a
>>>>>>>>>> file through WebHDFS in Knox, the download speed is just about 11M/s.
>>>>>>>>>> However, if we download directly from the datanode, the speed is at
>>>>>>>>>> least about 40M/s.
>>>>>>>>>>
>>>>>>>>>> Are you guys aware of this problem? Any suggestion?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Guang