Hi Mohammad -

That is wonderful to see!
Not sure I understand why they are so close but I'm not arguing with
success.
"Performance Tip #1: turn off DEBUG logging" :)

These numbers may be useful within a blog post or something too.
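
As a minimal sketch (not something from the original runs), the averages in the quoted table below can be rechecked with a few lines of Python. The per-run values are copied from that table; the variable and function names are illustrative, and the "MBS" column is assumed to mean MB/s.

```python
# Per-iteration results copied from the table in the quoted message:
# (time in seconds, download speed in MB/s, assumed unit) for each of the 7 runs.
direct = [(42, 325), (31, 444), (44, 314), (38, 359), (73, 188), (25, 536), (26, 523)]
knox = [(44, 310), (29, 467), (51, 270), (36, 382), (39, 350), (28, 489), (33, 410)]


def averages(runs):
    """Return (mean time, mean speed) over a list of (time, speed) runs."""
    times, speeds = zip(*runs)
    return sum(times) / len(times), sum(speeds) / len(speeds)


direct_time, direct_speed = averages(direct)
knox_time, knox_speed = averages(knox)

# Relative slowdown of Knox versus direct WebHDFS, as a percentage of the direct speed.
overhead_pct = (direct_speed - knox_speed) / direct_speed * 100

print(f"Direct: {direct_time:.1f} s, {direct_speed:.1f} MB/s")
print(f"Knox:   {knox_time:.1f} s, {knox_speed:.1f} MB/s")
print(f"Knox overhead: {overhead_pct:.2f}%")
```

It reproduces the roughly 384 vs 383 MB/s averages, which puts the Knox proxy within about half a percent of direct WebHDFS in these runs.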

Thank you for sharing with the community!

thanks,

--larry


On Wed, Nov 9, 2016 at 5:26 PM, Mohammad Islam <[email protected]> wrote:

> Updates:
>
> Root cause: The log level was DEBUG. As soon as I moved it to INFO for
> all, the performance became very comparable.
>
> Data: I ran those downloads 7 times and then averaged. Looks like
> they are very close to each other.
> I ran them from a third machine, NOT the machine where Knox was running.
> Overall: download speed for direct WebHDFS was nearly 384M. With the Knox
> proxy, download speed was 383M.
>
>
> Data size: ~14 GB
>
> Iteration  Direct                      Knox
>            Time (sec)  Speed (MB/s)    Time (sec)  Speed (MB/s)
> 1          42          325             44          310
> 2          31          444             29          467
> 3          44          314             51          270
> 4          38          359             36          382
> 5          73          188             39          350
> 6          25          536             28          489
> 7          26          523             33          410
> Average    39.86       384.14          37.14       382.57
>
>
> On Saturday, November 5, 2016 7:14 PM, Mohammad Islam <[email protected]>
> wrote:
>
>
> Thanks Larry for sharing your findings.
> Your numbers look much better than mine. I tried with 0.9.1. Should I upgrade
> to 0.10?
>
> If possible, can you please share your exact command with its various options?
> Did you try with SSL on? My two hosts were different and I ran it from the
> KNOX_HOST box.
>
> Any other ideas on how I can get better numbers?
>
> Regards,
> Mohammad
>
>
>
>
>
> On Saturday, November 5, 2016 6:55 PM, larry mccay <[email protected]>
> wrote:
>
>
> Hi Mohammad -
>
> I have played around with this a bit and haven't been able to reproduce
> your results.
>
> My environment is a sandbox VM download and the Apache Knox 0.10.0 test
> instance running on the host machine.
> I put an ~8.5 GB file in hdfs and OPENed it with and without Knox.
>
> With Knox:
> 100 8470M    0 8470M    0     0   9.9M      0 --:--:--  0:14:09 --:--:--  9.9M
>
> Direct to WebHDFS:
> 100 8470M    0 8470M    0     0  13.6M      0 --:--:--  0:10:20 --:--:-- 14.9M
>
> While we are certainly not speeding things up it isn't too bad.
> I believe that there is still room for some optimization in our rewrite
> process as has been discussed a bit on [1].
>
> This would get the numbers even closer together probably.
> However, even that won't make up the difference that you are seeing.
>
> I wonder what your test environment looks like where you are getting 99.6M
> avg speed direct and 4.8M from Knox.
> If the KNOX_HOST and WEBHDFS_HOST are different machines maybe you should
> try the direct curl command from the KNOX_HOST and see if there is a
> difference being introduced by the network or something like that.
>
> thanks,
>
> --larry
>
> [1] https://issues.apache.org/jira/browse/KNOX-767
>
>
>
> On Sat, Nov 5, 2016 at 6:41 PM, larry mccay <[email protected]> wrote:
>
> Hi Mohammad -
>
> Thanks for reporting this.
>
> That is a big difference.
> Let me play around with it and see what I can reproduce.
>
> thanks,
>
> --larry
>
> On Sat, Nov 5, 2016 at 5:52 PM, Mohammad Islam <[email protected]> wrote:
>
> Hi,
> I did a very basic comparison of download speed. I used similar "curl .."
> commands to download a large file (13.6 GB) and gathered the numbers.
>
> Looks like WebHDFS with Knox is very slow (at least 20x slower). I ran it
> twice with similar numbers. For Knox, I turned off SSL, and in both cases I
> used an unsecured (non-Kerberos) cluster.
>
> Download with Knox took nearly 49 minutes whereas direct download took 2
> mins. The download speed was *4811k* for Knox and *99.6M* for direct
> download.
>
> I'm sure I have done something wrong. Do you see any such performance? Any
> help will be really appreciated.
>
> Regards,
> Mohammad
>
>
>
>
>
>
> Interactions:
> curl -o t2.direct -L "http://<WEBHDFS_HOST>:50070/webhdfs/v1/<FILE_PATH>?op=OPEN"
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
>   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
> 100 13.5G  100 13.5G    0     0  *99.6M*      0  0:02:19  *0:02:19* --:--:--  117M
>
>
>
>
> curl -H "X-Auth-Params-Email: [email protected]" -o t2 -L "http://<KNOX_HOST>:8445/gateway/sandbox/webhdfs/v1/<FILE_PATH>?op=OPEN"
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
>   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
>   0     0    0 13.5G    0     0  *4811k*      0 --:--:--  *0:49:12* --:--:-- 6121k
>