Re: WebHDFS performance issue in Knox
What client are you using to connect to Knox? Is this for a single file or a bunch of files? The SSL handshake can be slow if the client doesn't keep the connection open.

Kevin Risden

On Tue, Sep 11, 2018, 14:51 Guang Yang wrote:

> Thanks Larry. But the only difference is this part in my gateway-site.xml:
>
>     <property>
>         <name>ssl.enabled</name>
>         <value>false</value>
>         <description>Indicates whether SSL is enabled.</description>
>     </property>
>
> On Tue, Sep 11, 2018 at 11:42 AM, larry mccay wrote:
>
>> I really don't think that kind of difference should be expected from
>> merely SSL overhead.
>> I don't, however, have any metrics to contradict it either, since I do not
>> run Knox without SSL.
>>
>> Given the above, I am struggling to come up with a meaningful response to
>> this. :(
>> I don't think you should see a 10-fold increase in speed by disabling SSL,
>> though.
>>
>> On Tue, Sep 11, 2018 at 2:35 PM Guang Yang wrote:
>>
>>> Any ideas, guys?
>>>
>>> On Mon, Sep 10, 2018 at 3:07 PM, Guang Yang wrote:
>>>
>>>> Thanks guys! The issue seems to be exactly what David pointed out:
>>>> the data is being encrypted over SSL.
>>>>
>>>> Without Knox, the download speed can reach *400M/s* if I call the
>>>> Namenode directly. With SSL disabled, the speed through Knox can reach
>>>> *~400M/s* as well. But with SSL, the speed drops significantly to
>>>> *~40M/s*. I know it's because of the encryption, but such a difference
>>>> did surprise me. Is this normal from your perspective?
>>>>
>>>> Thanks,
>>>> Guang
>>>>
>>>> On Tue, Sep 4, 2018 at 11:07 AM, David Villarreal <
>>>> dvillarr...@hortonworks.com> wrote:
>>>>
>>>>> Hi Guang,
>>>>>
>>>>> Keep in mind the data is being encrypted over SSL. If you disable SSL
>>>>> you will most likely see a very significant boost in throughput. Some
>>>>> people have used more powerful computers to make encryption quicker.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> David
>>>>>
>>>>> From: Sean Roberts
>>>>> Reply-To: "user@knox.apache.org"
>>>>> Date: Tuesday, September 4, 2018 at 1:53 AM
>>>>> To: "user@knox.apache.org"
>>>>> Subject: Re: WebHDFS performance issue in Knox
>>>>>
>>>>> Guang, this is somewhat to be expected.
>>>>>
>>>>> When you talk to WebHDFS directly, the client can distribute the
>>>>> request across many data nodes. Also, you are getting data directly
>>>>> from the source.
>>>>>
>>>>> With Knox, all traffic goes through the single Knox host. Knox is
>>>>> responsible for fetching from the datanodes and consolidating to send
>>>>> to you. This means overhead, as it's acting as a middle man, and lower
>>>>> network capacity, since only 1 host is serving data to you.
>>>>>
>>>>> Also, if running on a cloud provider, the Knox host may be a smaller
>>>>> instance size with lower network capacity.
>>>>>
>>>>> --
>>>>> Sean Roberts
>>>>>
>>>>> From: Guang Yang
>>>>> Reply-To: "user@knox.apache.org"
>>>>> Date: Tuesday, 4 September 2018 at 07:46
>>>>> To: "user@knox.apache.org"
>>>>> Subject: WebHDFS performance issue in Knox
>>>>>
>>>>> Hi,
>>>>>
>>>>> We're using Knox 1.1.0 to proxy WebHDFS requests. If we download a
>>>>> file through WebHDFS in Knox, the download speed is only about 11M/s.
>>>>> However, if we download directly from the datanode, the speed is at
>>>>> least about 40M/s.
>>>>>
>>>>> Are you guys aware of this problem? Any suggestions?
>>>>>
>>>>> Thanks,
>>>>> Guang
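Kevin's point about keeping the connection open can be sketched roughly as below. This is a minimal illustration in Python, not something posted in the thread. The URL layout `/gateway/<topology>/webhdfs/v1/<path>` is assumed to be the default Knox layout, the helper names (`knox_webhdfs_path`, `get_following_redirects`, `download_sizes`) are made up for the example, and redirects are assumed to point back at the same Knox host. The idea is simply that several files are fetched over one TLS connection, so the handshake cost is paid once rather than per file.

```python
import http.client
import ssl
from urllib.parse import quote, urlsplit


def knox_webhdfs_path(topology, hdfs_path, op="OPEN"):
    """Build the request path for a WebHDFS call proxied through Knox."""
    # Assumed layout: /gateway/<topology>/webhdfs/v1/<hdfs path>?op=<OP>
    return f"/gateway/{topology}/webhdfs/v1/{quote(hdfs_path.lstrip('/'))}?op={op}"


def get_following_redirects(conn, path, headers, max_redirects=3):
    """GET `path` on an already-open connection, following same-host redirects."""
    for _ in range(max_redirects + 1):
        conn.request("GET", path, headers=headers)
        resp = conn.getresponse()
        if resp.status not in (301, 302, 303, 307, 308):
            return resp
        resp.read()  # drain the body so the connection can be reused
        target = urlsplit(resp.getheader("Location", ""))
        # Assumption: Knox rewrites redirects to point back at itself.
        if target.hostname not in (None, conn.host):
            raise RuntimeError(f"redirect to another host: {target.hostname}")
        path = target.path + (f"?{target.query}" if target.query else "")
    raise RuntimeError("too many redirects")


def download_sizes(host, port, topology, hdfs_paths, headers):
    """Download several files over ONE TLS connection; return bytes read per file."""
    conn = http.client.HTTPSConnection(host, port, context=ssl.create_default_context())
    sizes = {}
    try:
        for hdfs_path in hdfs_paths:
            resp = get_following_redirects(conn, knox_webhdfs_path(topology, hdfs_path), headers)
            if resp.status != 200:
                resp.read()
                raise RuntimeError(f"{hdfs_path}: HTTP {resp.status}")
            total = 0
            # Each response must be read to the end before the next request.
            while chunk := resp.read(1 << 20):
                total += len(chunk)
            sizes[hdfs_path] = total
    finally:
        conn.close()
    return sizes
```

A call might look like `download_sizes("knox.example.com", 8443, "default", ["/tmp/a.bin", "/tmp/b.bin"], {"Authorization": "Basic ..."})`, where the host, port, topology, paths, and credentials are placeholders. For a single large file the handshake is paid only once either way, so connection reuse matters mostly when many small files are fetched.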
Re: WebHDFS performance issue in Knox
Thanks Larry. But the only difference is this part in my gateway-site.xml:

    <property>
        <name>ssl.enabled</name>
        <value>false</value>
        <description>Indicates whether SSL is enabled.</description>
    </property>
Re: WebHDFS performance issue in Knox
I really don't think that kind of difference should be expected from merely SSL overhead.
I don't, however, have any metrics to contradict it either, since I do not run Knox without SSL.

Given the above, I am struggling to come up with a meaningful response to this. :(
I don't think you should see a 10-fold increase in speed by disabling SSL, though.
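Since the thread compares download speeds without shared measurements, a small timing helper may make the SSL and non-SSL runs easier to compare. The following is only a sketch: the function names are hypothetical, and "M/s" is assumed to mean megabytes (10^6 bytes) per second, which the thread does not state.

```python
import io
import time


def measure_throughput(stream, chunk_size=1 << 20):
    """Read `stream` to the end; return (bytes_read, seconds, MB_per_second)."""
    start = time.perf_counter()
    total = 0
    while chunk := stream.read(chunk_size):
        total += len(chunk)
    elapsed = time.perf_counter() - start
    # Assumption: 1 MB = 10**6 bytes.
    rate = total / elapsed / 1e6 if elapsed > 0 else float("inf")
    return total, elapsed, rate


def slowdown(baseline_rate, observed_rate):
    """How many times slower the observed run is than the baseline."""
    return baseline_rate / observed_rate


if __name__ == "__main__":
    # Stand-in for a real HTTP response body; any object with .read() works.
    total, elapsed, rate = measure_throughput(io.BytesIO(b"x" * 3_000_000))
    print(f"{total} bytes in {elapsed:.4f}s = {rate:.1f} MB/s")
    # Figures reported in the thread: ~400M/s without SSL, ~40M/s with SSL.
    print(f"slowdown: {slowdown(400, 40):.0f}x")
```

The `stream` argument can be the response object from `http.client` or `urllib.request.urlopen`, so the same helper can time a direct WebHDFS download, a download through Knox with SSL, and one with `ssl.enabled` set to false.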