Re: Load balancing of Hiveserver2 through Knox

2019-01-15 Thread David Villarreal
Hi Rabii,

There is a lot to think about here.  I don't think checking ZooKeeper on every 
request/connection would be a good design, but if there is a way to identify a 
new client session, we could design Knox to check ZooKeeper at that point.  
We would also need to see what the performance impact would be.  But I do 
like the concept.  Just keep in mind that I don't think ZooKeeper is a true 
load balancer in the Hive code.  I believe it randomly returns a host:port 
for a registered HiveServer2 instance.
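
To make the difference concrete, here is a rough sketch (not Knox or Hive code) of random instance selection done once versus once per new client session. The host names, the port, and the pick_instance helper are made up for illustration; the list stands in for whatever ZooKeeper would return.

```python
import random


def pick_instance(registered, rng=random):
    """Return one registered HiveServer2 host:port chosen at random."""
    if not registered:
        raise ValueError("no HiveServer2 instances registered")
    return rng.choice(registered)


# Hypothetical instances standing in for the ZooKeeper registrations.
instances = [
    "hs2-a.example.com:10001",
    "hs2-b.example.com:10001",
    "hs2-c.example.com:10001",
]

# Look up once and reuse the result for every connection.
pinned = pick_instance(instances)
pinned_targets = [pinned for _ in range(6)]

# Look up again for each new client session.
per_session_targets = [pick_instance(instances) for _ in range(6)]

print("pinned:     ", pinned_targets)
print("per session:", per_session_targets)
```

Random selection spreads sessions only on average; it does not look at how busy each instance is, which is why this is not a true load balancer.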

Best regards,

David
From: rabii lamriq 
Reply-To: "user@knox.apache.org" 
Date: Tuesday, January 15, 2019 at 1:01 AM
To: "user@knox.apache.org" 
Subject: Load balancing of Hiveserver2 through Knox

Hi

I am using Knox to connect to HS2, but Knox provides only HA, not load 
balancing.

In fact, I noticed that there is load balancing when I connect to HS2 using 
ZooKeeper directly, but when using Knox, Knox connects to ZooKeeper to get an 
available instance of HS2, then uses this instance for all connections.

My question is: can we do anything to make Knox connect to ZooKeeper on 
each new connection, in order to get a different instance for each new 
connection to HS2?

Best
Rabii


Re: [CANCEL][VOTE] Release Apache Knox 1.2.0 RC 2

2018-11-30 Thread David Villarreal
Outstanding.  Keep up the great work!

From: larry mccay 
Reply-To: "user@knox.apache.org" 
Date: Friday, November 30, 2018 at 7:35 AM
To: "user@knox.apache.org" 
Subject: Re: [CANCEL][VOTE] Release Apache Knox 1.2.0 RC 2

Good find, Kevin!

On Fri, Nov 30, 2018 at 10:13 AM Kevin Risden <kris...@apache.org> wrote:
-1 found https://issues.apache.org/jira/browse/KNOX-1645

All other testing went well minus Zeppelin UI due to KNOX-1645. I'll respin a 
new rc later today.

Tested the following:

  *   src zip - mvn verify -Ppackage,release
  *   knoxshell - able to connect to Knox and execute a few examples
  *   Ran rc2 against https://github.com/risdenk/knox-performance-tests to test 
unsecure webhdfs, hbase, hive
  *   Manual testing of UIs with Kerberos
  *   Zeppelin UI with websocket enabled
  *   Manual testing of Knox against Kerberized cluster
  *   knoxsso Pac4j backed by Okta
  *   hadoopauth provider
  *   token service and Bearer tokens
  *   default topology url
  *   topology port mapping
  *   ambari discovery
  *   hadoop group provider

Kevin Risden

On Wed, Nov 28, 2018 at 2:41 PM larry mccay <lmc...@apache.org> wrote:
All -

Thanks to Kevin for so much work in cleaning up the backlog and taking on
release manager work for 1.2.0!

The 1.2.0 release happens to contain many dependency upgrades.
Not the least of which is Jetty itself from 9.2.x to 9.4.x.

We need to put some key areas through their paces pretty rigorously in
order to tease out any corner case regressions.

For instance, any Jetty specific features - such as:
* Websocket support
* SSL configuration params
* buffer size tweaks
* comb the gateway config and see which things are configuring aspects of
Jetty where units of measure defaults may change, etc

* Pac4J was upgraded - we will need KnoxSSO testing of SAML via Okta,
Google Authenticator, etc.

* Classloading is already known to have changed and has caused some issues
that have already been found.
Give some thought into possible classloading issues: singletons, custom
providers added to the ext directory, etc.

If anyone can think of other corner cases that may not be immediately
covered by default and common configurations please call them out and/or
list what you have tested for them.

thanks!

--larry

On Wed, Nov 28, 2018 at 2:16 PM Jeffrey Rodriguez <jeffrey...@gmail.com> wrote:

> +1 , based on my review and testing.
>
> Thanks,
> Jeffrey E Rodriguez
>
> On Wed, Nov 28, 2018 at 11:05 AM Kevin Risden <kris...@apache.org> wrote:
>
> > Release candidate #2 for the Apache Knox 1.2.0 release is available at:
> >
> > https://dist.apache.org/repos/dist/dev/knox/knox-1.2.0/
> >
> > The release candidate is a zip archive of the sources in:
> >
> > https://git-wip-us.apache.org/repos/asf/knox.git
> > Branch v1.2.0 (git checkout -b v1.2.0)
> > Tag is v1.2.0-rc2 (git checkout -b v1.2.0-rc2)
> >
> > The KEYS file for signature validation is available at:
> > https://dist.apache.org/repos/dist/release/knox/KEYS
> >
> > Please vote on releasing this package as Apache Knox 1.2.0.
> > The vote is open for the next 72 hours and passes if a majority of at
> > least three +1 Apache Knox PMC votes are cast.
> >
> > [ ] +1 Release this package as Apache Knox 1.2.0
> > [ ] -1 Do not release this package because...
> >
> > Kevin Risden
> >
>


Re: WebHDFS performance issue in Knox

2018-10-10 Thread David Villarreal
Interesting.  Nice work.  2x improvement is great!




From: Kevin Risden 
Reply-To: "user@knox.apache.org" 
Date: Wednesday, October 10, 2018 at 12:48 PM
To: "user@knox.apache.org" 
Subject: Re: WebHDFS performance issue in Knox

I tried disabling GCM ciphers based on the following information:
* https://www.wowza.com/docs/how-to-improve-ssl-performance-with-java-8
* https://stackoverflow.com/questions/25992131/slow-aes-gcm-encryption-and-decryption-with-java-8u20

The results for the read were:
* knox ssl no GCM - 1,073,741,824  125MB/s   in 8.7s
* knox ssl - 1,073,741,824 54.3MB/s   in 20s

This is a little more than a 2x speedup. There is also information in the links 
above that there should be more performance improvements with JDK 9+.
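
As a quick check of the arithmetic, the sketch below recomputes throughput from the file size and the elapsed times reported above. The elapsed times are rounded by wget, so the computed rates differ slightly from the rates wget printed; the helper name is only for illustration.

```python
def throughput_mib_s(size_bytes, seconds):
    """Average throughput in MiB/s for a transfer of size_bytes."""
    return size_bytes / (1024 * 1024) / seconds


SIZE = 1_073_741_824  # the 1 GiB test file

with_gcm = throughput_mib_s(SIZE, 20.0)    # "knox ssl" run
without_gcm = throughput_mib_s(SIZE, 8.7)  # "knox ssl no GCM" run
speedup = without_gcm / with_gcm           # same as 20.0 / 8.7

print(f"with GCM:    {with_gcm:.1f} MiB/s")
print(f"without GCM: {without_gcm:.1f} MiB/s")
print(f"speedup:     {speedup:.2f}x")
```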

For the write side slowdown, I found an issue with how Knox is handling the 
streaming data on writes only. I am looking into fixing this to get the write 
performance for HDFS improved.

Kevin Risden


Re: WebHDFS performance issue in Knox

2018-10-10 Thread David Villarreal
I believe curl has an option for which cipher to use.  You may also be able to 
force it at the server JVM level using /jre/lib/security/java.security


From: Sandeep Moré 
Reply-To: "user@knox.apache.org" 
Date: Tuesday, October 9, 2018 at 6:39 PM
To: "user@knox.apache.org" 
Subject: Re: WebHDFS performance issue in Knox

I think this would be a good test and worth a try. I am not sure how we can 
force a certain cipher to be used; perhaps some combination of 
ssl.include.ciphers and ssl.exclude.ciphers.
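
One way to build a candidate value for ssl.exclude.ciphers is to filter the enabled suite names for a marker such as GCM. This is only a sketch: the suite list is a small hand-picked sample of JSSE-style names, and joining the result with commas is an assumption about the property format that should be checked against the Knox documentation.

```python
def suites_containing(suite_names, marker):
    """Return the cipher suite names that contain the given marker."""
    return [name for name in suite_names if marker in name]


# Sample JSSE-style names; a real list would come from the running JVM.
enabled = [
    "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
    "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256",
    "TLS_RSA_WITH_AES_256_GCM_SHA384",
    "TLS_RSA_WITH_AES_128_CBC_SHA",
]

gcm_suites = suites_containing(enabled, "_GCM_")

# Assumed format: a comma-separated list of suite names.
exclude_value = ",".join(gcm_suites)
print(exclude_value)
```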

Best,
Sandeep


Re: Spark History UI Error WARN HttpParser: Header is too large >8192

2018-10-09 Thread David Villarreal
Hi Theyaa,

Increase the gateway.httpserver.requestHeaderBuffer property.  I think 
the default is 8192 (8 KB); change it to 16384 and see if that helps.

For the second problem, "Request is a replay (34)": this message is often seen 
when the clock on one of the servers is off.  Make sure you run NTPD on all 
servers and that they are all in sync.  If everything is in sync, you can work 
around this issue by turning off the krb5 replay cache with the following 
parameter:
-Dsun.security.krb5.rcache=none
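
To get a feel for why a request can exceed the header buffer, here is a rough sketch that adds up the bytes of a set of request headers and compares the total with a limit. It is an approximation only: the real limit also covers the request line, and the header values below are invented (large cookies or authentication tokens are a common cause).

```python
def header_bytes(headers):
    """Approximate wire size of the headers, one 'Name: value' line each."""
    return sum(
        len(f"{name}: {value}\r\n".encode("latin-1"))
        for name, value in headers.items()
    )


def fits(headers, limit=8192):
    """True if the approximate header size is within the given limit."""
    return header_bytes(headers) <= limit


# Invented request headers with an oversized cookie.
request_headers = {
    "Host": "knox.example.com",
    "Cookie": "session=" + "x" * 9000,
}

print(header_bytes(request_headers))
print("fits in 8192: ", fits(request_headers, 8192))
print("fits in 16384:", fits(request_headers, 16384))
```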

dav


On 10/9/18, 9:01 AM, "Theyaa Matti"  wrote:

Hi,
   I am getting this error message "WARN HttpParser: Header is too large
>8192" when trying to access the spark history ui through knox. Any idea
please?

Also when trying to load the executors page, I get: GSS initiate failed
[Caused by GSSException: Failure unspecified at GSS-API level (Mechanism
level: Request is a replay (34))]

when Knox is requesting executorspage-template.html

Appreciate any help here.




Re: WebHDFS performance issue in Knox

2018-10-09 Thread David Villarreal
Hi Kevin,

In my humble opinion, this has to do with the CPU cost of encryption in 
general, which depends on the cipher being used.  Couldn't the same type of 
principles/improvements (the HDFS encryption improvements) be applied here for, 
let's say, AES cipher suites?  If the main bottleneck here is CPU, couldn't you 
speed up encryption through hardware acceleration and see better 
performance numbers?

https://calomel.org/aesni_ssl_performance.html

Try forcing a less secure cipher to be used in your environment.  Do you then 
see better numbers?

dav


From: Kevin Risden 
Reply-To: "user@knox.apache.org" 
Date: Tuesday, October 9, 2018 at 1:05 PM
To: "user@knox.apache.org" 
Subject: Re: WebHDFS performance issue in Knox

@David - Not sure what you mean since this is SSL/TLS and not related to RPC 
encryption like the two JIRAs that you linked.
@Guang - NP just took some time to sit down and look at it.

Some preliminary investigation shows this may be the JDK implementation of 
TLS/SSL that is slowing down the read path. I need to dig into it further but 
found a few references showing that Java slowness for TLS/SSL affects Jetty.

  *   https://nbsoftsolutions.com/blog/the-cost-of-tls-in-java-and-solutions
  *   https://nbsoftsolutions.com/blog/dropwizard-1-3-upcoming-tls-improvements
  *   https://webtide.com/conscrypting-native-ssl-for-jetty/

Locally testing off a Jetty 9.4 branch (for KNOX-1516), I was able to enable 
Conscrypt 
(https://www.eclipse.org/jetty/documentation/9.4.x/configuring-ssl.html#conscrypt).
With that I was able to get read performance on par with non-SSL and native 
WebHDFS. The write side of the equation still has some performance differences 
that need to be looked at further.

Kevin Risden


On Tue, Oct 9, 2018 at 2:01 PM Guang Yang <k...@uber.com> wrote:
Thanks Kevin for conducting this experiment! This is exactly what I saw before. 
It doesn't look right that the download speed is 10x slower when SSL is enabled.

On Tue, Oct 9, 2018 at 10:40 AM David Villarreal <dvillarr...@hortonworks.com> wrote:
I bring this up because HDFS encryption saw an increase in performance.
https://issues.apache.org/jira/browse/HDFS-6606

https://issues.apache.org/jira/browse/HADOOP-10768

Maybe Knox can make some enhancements in this area?

From: David Villarreal <dvillarr...@hortonworks.com>
Date: Tuesday, October 9, 2018 at 10:34 AM
To: "user@knox.apache.org" <user@knox.apache.org>
Subject: Re: WebHDFS performance issue in Knox

Hi Kevin,
Now increase your CPU processing power and show me the numbers.

Do we support AES-NI optimization, which uses the extended CPU instruction set 
for AES hardware acceleration?
It needs a libcrypto.so library that supports hardware acceleration, such as 
OpenSSL 1.0.1e. (Many OS versions have an older version of the library that 
does not support AES-NI.)


From: Kevin Risden <kris...@apache.org>
Reply-To: "user@knox.apache.org" <user@knox.apache.org>
Date: Tuesday, October 9, 2018 at 10:26 AM
To: "user@knox.apache.org" <user@knox.apache.org>
Subject: Re: WebHDFS performance issue in Knox

Writes look to have a performance impact as well:

  *   directly to webhdfs - ~2.6 seconds
  *   knox no ssl - ~29 seconds
  *   knox ssl - ~49.6 seconds

Kevin Risden


On Tue, Oct 9, 2018 at 12:39 PM Kevin Risden <kris...@apache.org> wrote:
If I run two downloads concurrently:

1,073,741,824 46.1MB/s   in 22s
1,073,741,824 51.3MB/s   in 22s

So it isn't a limitation of the Knox gateway itself in total bandwidth but a 
per connection limitation somehow.
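
The reasoning can be checked with a couple of lines of arithmetic: if the gateway were capped in total, two concurrent downloads would share roughly the single-download rate. The figures are the ones reported by wget in this thread; the variable names are just for illustration.

```python
# Rates in MB/s as reported by wget in this thread.
single_ssl_download = 54.3
concurrent_downloads = [46.1, 51.3]

aggregate = sum(concurrent_downloads)
ratio = aggregate / single_ssl_download

print(f"aggregate of two downloads: {aggregate:.1f} MB/s")
print(f"relative to one download:   {ratio:.2f}x")
```

The aggregate is well above the single-download rate, which is what points to a per-connection limit rather than a total bandwidth limit.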

Kevin Risden


On Tue, Oct 9, 2018 at 12:24 PM Kevin Risden <kris...@apache.org> wrote:
So I was able to reproduce a slowdown with SSL with a pseudo-distributed HDFS 
setup on a single node with Knox running on the same node. This was set up in 
VirtualBox on my laptop.

Rough timings with wget for a 1GB random file:

  *   directly to webhdfs - 1,073,741,824  252MB/s   in 3.8s
  *   knox no ssl - 1,073,741,824  264MB/s   in 3.6s
  *   knox ssl - 1,073,741,824 54.3MB/s   in 20s

There is a significant decrease with Knox SSL for some reason.

Kevin Risden


On Sun, Sep 23, 2018 at 8:53 PM larry mccay <lmc...@apache.org> wrote:
SSL handshake will likely happen at least twice.
Once for the request through Knox to the NN then the redirect from the NN to 
the DN goes all the way back to the client.
So they have to follow the redirect and do the handshake to the DN.


On Sun, Sep 23, 2018 at 8:30 PM Kevin Risden <kris...@apache.org> wrote:
So I found this in the Knox issues list in JIRA:

https://issues.apache.org/jira/browse/KNOX-1221

It sounds familiar in terms of a slowdown when going through Knox.

Kevin Risden


On Sat, Sep 15, 2018 at 10:17 PM Kevin Ris

Re: WebHDFS performance issue in Knox

2018-09-04 Thread David Villarreal
Hi Guang,

Keep in mind the data is being encrypted over SSL.  If you disable SSL you will 
most likely see a very significant boost in throughput.  Some people have used 
more powerful computers to make encryption quicker.

Thanks,

David

From: Sean Roberts 
Reply-To: "user@knox.apache.org" 
Date: Tuesday, September 4, 2018 at 1:53 AM
To: "user@knox.apache.org" 
Subject: Re: WebHDFS performance issue in Knox

Guang – This is somewhat to be expected.

When you talk to WebHDFS directly, the client can distribute the request across 
many data nodes. Also, you are getting data directly from the source.
With Knox, all traffic goes through the single Knox host. Knox is responsible 
for fetching from the datanodes and consolidating to send to you. This means 
overhead as it’s acting as a middle man, and lower network capacity since only 
1 host is serving data to you.

Also, if running on a cloud provider, the Knox host may be a smaller instance 
size with lower network capacity.
--
Sean Roberts

From: Guang Yang 
Reply-To: "user@knox.apache.org" 
Date: Tuesday, 4 September 2018 at 07:46
To: "user@knox.apache.org" 
Subject: WebHDFS performance issue in Knox

Hi,

We're using Knox 1.1.0 to proxy WebHDFS requests. If we download a file through 
WebHDFS in Knox, the download speed is just about 11M/s. However, if we 
download directly from the datanode, the speed is about 40M/s at least.

Are you guys aware of this problem? Any suggestion?

Thanks,
Guang


Re: Impersonate/ProxyUser through Knox?

2018-08-31 Thread David Villarreal
Hi Sean,

Proxy/impersonation is configured on the Hadoop side, and the Knox 
user/principal impersonates users.  I think the answer to this question is no: 
Knox does not have its own proxy impersonation provider.
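
For reference, the Hadoop-side settings follow the hadoop.proxyuser.<user>.* naming pattern in core-site.xml. The small sketch below only builds those key names for a given service user; using "knox" as the user name is an assumption about the deployment, and the helper is not part of Knox or Hadoop.

```python
def proxyuser_keys(service_user):
    """Build the core-site.xml proxy-user key names for a service user."""
    prefix = f"hadoop.proxyuser.{service_user}"
    return [f"{prefix}.hosts", f"{prefix}.groups", f"{prefix}.users"]


# "knox" is an assumed service user name.
for key in proxyuser_keys("knox"):
    print(key)
```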

What I know Knox does have is identity assertion:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_security/content/knox_configuring_identity_assertion.html
http://kminder.github.io/knox/2015/11/20/identity-assertion.html
http://knox.apache.org/books/knox-1-1-0/user-guide.html#Identity+Assertion


From: Sean Roberts 
Date: Friday, August 31, 2018 at 12:43 PM
To: "user@knox.apache.org" 
Subject: Impersonate/ProxyUser through Knox?

Knox experts – Does Knox provide impersonation/proxyuser functionality like 
direct WebHDFS connections (hadoop.proxyuser.service-user.users) and HttpFS 
(httpfs.proxyuser.service-user.users)?

For example:

  *   “service-user” authenticates to Knox, then requests to run commands as 
“normal-user”.

--
Sean Roberts