Hi Mohammad -

That is wonderful to see!
Not sure I understand why they are so close but I'm not arguing with success.
"Performance Tip #1: turn off DEBUG logging" :)

These numbers may be useful within a blog post or something too.
Thank you for sharing with the community!

thanks,

--larry

On Wed, Nov 9, 2016 at 5:26 PM, Mohammad Islam <[email protected]> wrote:

> Updates:
>
> Root cause: The log level was DEBUG. As soon as I moved it to INFO for
> all, the performance got very comparable.
>
> Data: I ran each download 7 times and then averaged. Looks like they
> are very close to each other.
> I tried it from a 3rd machine, NOT the machine where Knox was running.
> Overall: download speed for direct WebHDFS was nearly 384M. With the
> Knox proxy, download speed was 383M.
>
> Data Size ~14 GB
>
> Iteration  Approach  Time (sec)   Download speed (MB/s)
> 1          Direct    42           325
> 1          Knox      44           310
> 2          Direct    31           444
> 2          Knox      29           467
> 3          Direct    44           314
> 3          Knox      51           270
> 4          Direct    38           359
> 4          Knox      36           382
> 5          Direct    73           188
> 5          Knox      39           350
> 6          Direct    25           536
> 6          Knox      28           489
> 7          Direct    26           523
> 7          Knox      33           410
> Average    Direct    39.85714286  384.1428571
> Average    Knox      37.14285714  382.5714286
>
>
> On Saturday, November 5, 2016 7:14 PM, Mohammad Islam <[email protected]>
> wrote:
>
> Thanks Larry for sharing your findings.
> The numbers look much better than mine. I tried with 0.9.1. Should I
> upgrade to 0.10?
>
> If possible, can you please share your exact command with its various
> options? Did you try with SSL on? My two hosts were different and I
> tried it from the KNOX_HOST box.
>
> Any other ideas on how I can get better numbers?
>
> Regards,
> Mohammad
>
>
> On Saturday, November 5, 2016 6:55 PM, larry mccay <[email protected]>
> wrote:
>
> Hi Mohammad -
>
> I have played around with this a bit and haven't been able to reproduce
> your results.
>
> My environment is a sandbox VM download and the Apache Knox 0.10.0 test
> instance running on the host machine.
> I put an ~8.5 GB file in hdfs and OPENed it with and without Knox.
>
> With Knox:
> 100 8470M    0 8470M    0     0  9.9M      0 --:--:--  0:14:09 --:--:--  9.9M
>
> Direct to WebHDFS:
> 100 8470M    0 8470M    0     0  13.6M     0 --:--:--  0:10:20 --:--:-- 14.9M
>
> While we are certainly not speeding things up it isn't too bad.
> I believe that there is still room for some optimization in our rewrite
> process as has been discussed a bit on [1].
>
> This would get the numbers even closer together probably.
> However, even that won't make up the difference that you are seeing.
>
> I wonder what your test environment looks like where you are getting
> 99.6M avg speed direct and 4.8M from Knox.
> If the KNOX_HOST and WEBHDFS_HOST are different machines maybe you
> should try the direct curl command from the KNOX_HOST and see if there
> is a difference being introduced by the network or something like that.
>
> thanks,
>
> --larry
>
> [1] https://issues.apache.org/jira/browse/KNOX-767
>
>
> On Sat, Nov 5, 2016 at 6:41 PM, larry mccay <[email protected]> wrote:
>
> Hi Mohammad -
>
> Thanks for reporting this.
>
> That is a big difference.
> Let me play around with it and see what I can reproduce.
>
> thanks,
>
> --larry
>
> On Sat, Nov 5, 2016 at 5:52 PM, Mohammad Islam <[email protected]> wrote:
>
> Hi,
> I did a very basic comparison of download speed. I used similar
> "curl .." commands to download a large file (13.6 GB) and gathered the
> numbers.
>
> Looks like WebHDFS with Knox is very slow (at least 20x slower). I ran
> it twice with similar numbers. For Knox, I turned off SSL, and in both
> cases I used an unsecured (non-Kerberos) cluster.
>
> Download with Knox took nearly 49 minutes whereas direct download took
> 2 mins. The download speed was *4811k* for Knox and *99.6M* for direct
> download.
>
> I'm sure I have done something wrong. Do you see any such performance?
> Any help will be really appreciated.
>
> Regards,
> Mohammad
>
>
> Interactions:
>
> curl -o t2.direct -L http://<WEBHDFS_HOST>:50070/webhdfs/v1/<FILE_PATH>?op=OPEN
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
>   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
> 100 13.5G  100 13.5G    0     0  *99.6M*    0  0:02:19 *0:02:19* --:--:--  117M
>
>
> curl -H "X-Auth-Params-Email: [email protected]" -o t2 -L http://<KNOX_HOST>:8445/gateway/sandbox/webhdfs/v1/<FILE_PATH>?op=OPEN
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
>   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
>   0     0    0 13.5G    0     0  *4811k*    0 --:--:-- *0:49:12* --:--:-- 6121k

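For anyone who wants to re-check the arithmetic in this thread, here is a minimal Python sketch. It recomputes the per-approach averages from the seven iterations Mohammad posted, and the slowdown from his first report (99.6M direct vs 4811k through Knox). The variable and function names are illustrative only, and the ratio assumes curl's "k" and "M" suffixes are 1024-based.

```python
from statistics import mean

# One tuple per iteration, copied from the table in the thread:
# (direct_time_s, direct_mb_per_s, knox_time_s, knox_mb_per_s)
runs = [
    (42, 325, 44, 310),
    (31, 444, 29, 467),
    (44, 314, 51, 270),
    (38, 359, 36, 382),
    (73, 188, 39, 350),
    (25, 536, 28, 489),
    (26, 523, 33, 410),
]


def column_means(rows):
    """Return the mean of each column across all iterations."""
    return tuple(mean(col) for col in zip(*rows))


direct_time, direct_speed, knox_time, knox_speed = column_means(runs)

# Slowdown in the first report, with DEBUG logging still on.
# Assumption: curl reports 99.6M as MiB/s and 4811k as KiB/s.
direct_kib_per_s = 99.6 * 1024
knox_kib_per_s = 4811
slowdown = direct_kib_per_s / knox_kib_per_s

print(f"Direct: {direct_time:.2f} s, {direct_speed:.2f} MB/s")
print(f"Knox:   {knox_time:.2f} s, {knox_speed:.2f} MB/s")
print(f"Slowdown with DEBUG logging: {slowdown:.1f}x")
```

The averages should match the last rows of the table, and the ratio is consistent with the "at least 20x slower" figure in the original report.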