Thanks Larry. As soon as I cross my current hurdles, I plan to write a blog
post. At Uber, we are relying on Knox a lot. So far it has been a good experience.
On Wednesday, November 9, 2016 5:43 PM, larry mccay <[email protected]>
wrote:
Hi Mohammad -
That is wonderful to see! Not sure I understand why they are so close but I'm
not arguing with success. "Performance Tip #1: turn off DEBUG logging" :)
These numbers may be useful within a blog post or something too.
Thank you for sharing with the community!
thanks,
--larry
On Wed, Nov 9, 2016 at 5:26 PM, Mohammad Islam <[email protected]> wrote:
Updates:
Root cause: The log level was DEBUG. As soon as I moved it to INFO for all,
the performance became very comparable.
Data: I ran each download 7 times and then averaged the results. They are
very close to each other. I ran them from a third machine, NOT the machine where
Knox was running. Overall, the average download speed for direct WebHDFS was
about 384 MB/s. With the Knox proxy, it was about 383 MB/s.
Data size: ~14 GB

| Iteration | Direct time (sec) | Direct download speed (MB/s) | Knox time (sec) | Knox download speed (MB/s) |
| 1 | 42 | 325 | 44 | 310 |
| 2 | 31 | 444 | 29 | 467 |
| 3 | 44 | 314 | 51 | 270 |
| 4 | 38 | 359 | 36 | 382 |
| 5 | 73 | 188 | 39 | 350 |
| 6 | 25 | 536 | 28 | 489 |
| 7 | 26 | 523 | 33 | 410 |
| Average | 39.85714286 | 384.1428571 | 37.14285714 | 382.5714286 |
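
The averages in the last row can be recomputed from the per-iteration numbers. The following is a minimal Python sketch, not part of the original test; the names `direct`, `knox`, and `averages` are illustrative, and the values are copied from the table above.

```python
# Per-iteration results copied from the table: (time in sec, speed in MB/s).
direct = [(42, 325), (31, 444), (44, 314), (38, 359), (73, 188), (25, 536), (26, 523)]
knox = [(44, 310), (29, 467), (51, 270), (36, 382), (39, 350), (28, 489), (33, 410)]


def averages(runs):
    """Return (mean time, mean speed) over a list of (time, speed) pairs."""
    n = len(runs)
    mean_time = sum(t for t, _ in runs) / n
    mean_speed = sum(s for _, s in runs) / n
    return mean_time, mean_speed


direct_time, direct_speed = averages(direct)
knox_time, knox_speed = averages(knox)

# Relative difference in mean download speed, Knox versus direct.
diff_pct = (direct_speed - knox_speed) / direct_speed * 100

print(f"Direct: {direct_time:.2f} sec, {direct_speed:.2f} MB/s")
print(f"Knox:   {knox_time:.2f} sec, {knox_speed:.2f} MB/s")
print(f"Knox mean speed differs from direct by {diff_pct:.2f}%")
```

With only 7 runs and a wide spread per run (for example 188 to 536 MB/s for direct), a difference this small is probably within the noise.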
On Saturday, November 5, 2016 7:14 PM, Mohammad Islam <[email protected]>
wrote:
Thanks Larry for sharing your findings. Your numbers look much better than mine. I
tried with 0.9.1. Should I upgrade to 0.10?
If possible, can you please share your exact command with the various options? Did
you try with SSL on? My two hosts were different and I ran it from the KNOX_HOST
box.
Any other ideas on how I can get better numbers?
Regards,
Mohammad
On Saturday, November 5, 2016 6:55 PM, larry mccay <[email protected]>
wrote:
Hi Mohammad -
I have played around with this a bit and haven't been able to reproduce your
results.
My environment is a sandbox VM download and the Apache Knox 0.10.0 test
instance running on the host machine. I put an ~8.5 GB file in HDFS and OPENed
it with and without Knox.
With Knox:
100 8470M 0 8470M 0 0 9.9M 0 --:--:-- 0:14:09 --:--:-- 9.9M
Direct to WebHDFS:
100 8470M 0 8470M 0 0 13.6M 0 --:--:-- 0:10:20 --:--:-- 14.9M
While we are certainly not speeding things up, it isn't too bad. I believe that
there is still room for some optimization in our rewrite process, as has been
discussed a bit on [1].
This would probably get the numbers even closer together. However, even that
won't make up the difference that you are seeing.
I wonder what your test environment looks like where you are getting 99.6M avg
speed direct and 4.8M from Knox. If the KNOX_HOST and WEBHDFS_HOST are different
machines, maybe you should try the direct curl command from the KNOX_HOST and
see if there is a difference being introduced by the network or something like
that.
thanks,
--larry
[1] https://issues.apache.org/jira/browse/KNOX-767
On Sat, Nov 5, 2016 at 6:41 PM, larry mccay <[email protected]> wrote:
Hi Mohammad -
Thanks for reporting this.
That is a big difference. Let me play around with it and see what I can
reproduce.
thanks,
--larry
On Sat, Nov 5, 2016 at 5:52 PM, Mohammad Islam <[email protected]> wrote:
Hi,
I did a very basic comparison of download speed. I used similar "curl .."
commands to download a large file (13.6 GB) and gathered the numbers.
It looks like WebHDFS with Knox is very slow (at least 20x slower). I ran it
twice with similar numbers. For Knox, I turned off SSL, and in both cases I used
an unsecured (non-Kerberos) cluster.
The download with Knox took nearly 49 minutes, whereas the direct download took
about 2 minutes.
The download speed was 4811k for Knox and 99.6M for the direct download.
I'm sure I have done something wrong. Do you see any such performance? Any help
will be really appreciated.
Regards,
Mohammad
Interactions:

curl -o t2.direct -L "http://<WEBHDFS_HOST>:50070/webhdfs/v1/<FILE_PATH>?op=OPEN"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 13.5G  100 13.5G    0     0  99.6M      0  0:02:19  0:02:19 --:--:--  117M

curl -H "X-Auth-Params-Email: [email protected]" -o t2 -L "http://<KNOX_HOST>:8445/gateway/sandbox/webhdfs/v1/<FILE_PATH>?op=OPEN"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0 13.5G    0     0  4811k      0 --:--:--  0:49:12 --:--:-- 6121k
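
The "at least 20x slower" figure can be checked against the final progress line of each curl run. The following is a minimal Python sketch, not part of the original test; the helper names `curl_speed_to_bytes` and `average_dload` are hypothetical, and it assumes the 1024-based k/M/G suffixes that curl's progress meter uses.

```python
def curl_speed_to_bytes(text):
    """Convert a curl progress-meter speed such as '4811k' or '99.6M' to bytes/sec.

    Assumes curl's 1024-based suffixes (k, M, G); a bare number is taken as bytes.
    """
    units = {"k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    suffix = text[-1]
    if suffix in units:
        return float(text[:-1]) * units[suffix]
    return float(text)


def average_dload(progress_line):
    """Return the 'Average Dload' field (7th column) of a final curl progress line."""
    fields = progress_line.split()
    return curl_speed_to_bytes(fields[6])


# Final progress lines from the two runs reported in this message.
direct_line = "100 13.5G 100 13.5G 0 0 99.6M 0 0:02:19 0:02:19 --:--:-- 117M"
knox_line = "0 0 0 13.5G 0 0 4811k 0 --:--:-- 0:49:12 --:--:-- 6121k"

slowdown = average_dload(direct_line) / average_dload(knox_line)
print(f"Direct average speed is about {slowdown:.1f}x the Knox average speed")
```

The ratio of elapsed times (0:49:12 versus 0:02:19) gives roughly the same factor, which is what one would expect when both runs download the same file.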