[jira] [Comment Edited] (ARIES-1804) Timeout due to connection loss in RSA fastbin provider?

2018-06-01 Thread Alex Weirig (JIRA)


[ 
https://issues.apache.org/jira/browse/ARIES-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498073#comment-16498073
 ] 

Alex Weirig edited comment on ARIES-1804 at 6/1/18 2:48 PM:


Hi Johannes,

I wrote a scheduled job to test my remote authentication service. The job is 
run every 5 minutes and this is a normal execution:

2018-06-01T16:00:00,002 | INFO  | Karaf_Worker-3   | job                        
      | 193 - lu.tlk.administration.services.authentication.job - 
0.0.0.201806011221 | AuthenticationServiceJob :: execute :: begin

2018-06-01T16:00:00,003 | INFO  | Karaf_Worker-3   | job                        
      | 193 - lu.tlk.administration.services.authentication.job - 
0.0.0.201806011221 | AuthenticationServiceJob :: execute :: 
authenticationService = 
org.apache.aries.rsa.provider.fastbin.tcp.ClientInvokerImpl$ProxyInvocationHandler@1a288b28

*2018-06-01T16:00:00,065 | INFO  | Karaf_Worker-3   | job                       
       | 193 - lu.tlk.administration.services.authentication.job - 
0.0.0.201806011221 | AuthenticationServiceJob :: execute :: authenticateUser = 
User(userid=awtest, firstname=AW, lastname=Test, 
whdApiKey=Y1S3nHUdjceVDVMQpQvORVzwTPRuPBO2xW19RLmY, 
entitlements=[technolink-administration-technician, technolink-administration, 
technolink-administration-advanced-technician])*

2018-06-01T16:00:00,065 | INFO  | Karaf_Worker-3   | job                        
      | 193 - lu.tlk.administration.services.authentication.job - 
0.0.0.201806011221 | AuthenticationServiceJob :: execute :: end

the bold part is the result from the remote service that connects to our LDAP 
and authenticates a user. I've created a copy of my user.

 

And then suddenly I get this:

2018-06-01T16:30:00,002 | INFO  | Karaf_Worker-9   | job                        
      | 193 - lu.tlk.administration.services.authentication.job - 
0.0.0.201806011221 | AuthenticationServiceJob :: execute :: begin

2018-06-01T16:30:00,003 | INFO  | Karaf_Worker-9   | job                        
      | 193 - lu.tlk.administration.services.authentication.job - 
0.0.0.201806011221 | AuthenticationServiceJob :: execute :: 
authenticationService = 
org.apache.aries.rsa.provider.fastbin.tcp.ClientInvokerImpl$ProxyInvocationHandler@1a288b28

*2018-06-01T16:30:00,004 | ERROR | Karaf_Worker-8   | QuartzScheduler           
       | 192 - org.apache.karaf.scheduler.core - 4.2.0 | Exception during job 
execution of 
lu.tlk.administration.services.authentication.job.AuthenticationServiceJob@7b91685d
 : null*

*org.osgi.framework.ServiceException: null*

at 
org.apache.aries.rsa.provider.fastbin.tcp.ClientInvokerImpl$ProxyInvocationHandler.invoke(ClientInvokerImpl.java:328)
 ~[?:?]

at com.sun.proxy.$Proxy68.authenticateUser(Unknown Source) ~[?:?]

at 
lu.tlk.administration.services.authentication.job.AuthenticationServiceJob.execute(AuthenticationServiceJob.java:32)
 ~[?:?]

at 
org.apache.karaf.scheduler.core.QuartzJobExecutor.execute(QuartzJobExecutor.java:52)
 [192:org.apache.karaf.scheduler.core:4.2.0]

at org.quartz.core.JobRunShell.run(JobRunShell.java:202) 
[192:org.apache.karaf.scheduler.core:4.2.0]

at 
org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573) 
[192:org.apache.karaf.scheduler.core:4.2.0]

Caused by: java.util.concurrent.TimeoutException

at java.util.concurrent.FutureTask.get(FutureTask.java:205) ~[?:?]

at 
org.apache.aries.rsa.provider.fastbin.tcp.ClientInvokerImpl.request(ClientInvokerImpl.java:274)
 ~[?:?]

at 
org.apache.aries.rsa.provider.fastbin.tcp.ClientInvokerImpl$ProxyInvocationHandler.invoke(ClientInvokerImpl.java:312)
 ~[?:?]

... 5 more

at  timestamp 2018-06-01T16:30:00,003 it prints the object.toString from the 
@Reference of the remote service. So the @Reference variable is not null ( 
org.apache.aries.rsa.provider.fastbin.tcp.ClientInvokerImpl$ProxyInvocationHandler@1a288b28)
 but when it tries to call that service it produces the message in bold.

And from that point on the authentication is no longer working anymore ...

On my remote karaf I see that the remote service has correctly authenticated 
the user between the 2 timestamps  2018-06-01T16:30:00,006 and  
2018-06-01T16:30:00,008

2018-06-01T16:30:00,006 | INFO  | pool-11-thread-8 | authentication             
      | 167 - lu.tlk.administration.services.authentication - 
0.0.0.201805090646 | AuthenticationServiceImpl::authenticateUser - userid = 
awtest

2018-06-01T16:30:00,008 | INFO  | pool-11-thread-8 | ldap                       
      | 148 - lu.tlk.administration.services.authentication_provider.ldap - 
0.0.0.201805090647 | AuthenticationServiceProviderLDAPImpl::authenticateUser - 
userid = awtest

2018-06-01T16:30:00,008 | INFO  | pool-11-thread-8 | ldap                       
      | 136 - lu.tlk.common.services.ldap - 0.0.0.201805070851 | 
TechnolinkLDAPServiceImpl::authenticateUser - ldapDn = 

[jira] [Comment Edited] (ARIES-1804) Timeout due to connection loss in RSA fastbin provider?

2018-05-31 Thread Johannes Utzig (JIRA)


[ 
https://issues.apache.org/jira/browse/ARIES-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16496321#comment-16496321
 ] 

Johannes Utzig edited comment on ARIES-1804 at 5/31/18 9:50 AM:


It doesn't sound like Zookeeper (or the connection to it) is the problem here.

Zookeeper is just used for sharing the information which services are exported, 
it is not used for the actual remote conmmunicatio.

If the zookeeper connection broke down, the remote services should get 
deregisted, not stop working.

To me this sounds like e.g. a firewall is dropping the connection at some 
point, maybe due to inactivity.

You should check the firewall settings/logs.

I don't know how long it takes you to reproduce this, but you could test the 
theory by invoking a simple remote service every minute or so, to see if the 
periodic traffic prevents the firewall from dropping the connection. If the 
authentication service works correctly after that, it definetly sounds like 
firewall/iptables


was (Author: j.utzig):
It doesn't sound like Zookeeper (or the connection to it) is the problem here.

Zookeeper is just used for sharing the information which services are exported, 
it is not used for the actual remote configuration.

If the zookeeper connection broke down, the remote services should get 
deregisted, not stop working.

To me this sounds like e.g. a firewall is dropping the connection at some 
point, maybe due to inactivity.

You should check the firewall settings/logs.

I don't know how long it takes you to reproduce this, but you could test the 
theory by invoking a simple remote service every minute or so, to see if the 
periodic traffic prevents the firewall from dropping the connection. If the 
authentication service works correctly after that, it definetly sounds like 
firewall/iptables

> Timeout due to connection loss in RSA fastbin provider?
> ---
>
> Key: ARIES-1804
> URL: https://issues.apache.org/jira/browse/ARIES-1804
> Project: Aries
>  Issue Type: Bug
>  Components: Remote Service Admin
>Affects Versions: rsa-1.12.0
> Environment: Karaf 4.2.0
> RSA 1.12.0
> zookeeper 3.4.12
> java 1.8.0_172-b11
> RHEL 7.5
>Reporter: Alex Weirig
>Priority: Critical
> Attachments: AuthenticationServiceImpl.txt, LoginView.txt, 
> stacktrace.txt, zoo.cfg.txt
>
>
> Hello,
> I'm running two karaf (4.2.0) servers, one is running the frontend of my 
> application, the second one is running the backend.
> The backend services are published to 3 clustered zookeeper (3.4.12) servers. 
> In karaf I have deployed the following RSA features:
> karaf@appsrvtlk()> feature:list | grep rsa
> aries-rsa-core │ 1.12.0 │ │ Started │ aries-rsa-1.12.0 │
> aries-rsa-provider-tcp │ 1.12.0 │ │ Uninstalled │ aries-rsa-1.12.0 │
> aries-rsa-provider-fastbin │ 1.12.0 │ x │ Started │ aries-rsa-1.12.0 │
> aries-rsa-discovery-local │ 1.12.0 │ │ Uninstalled │ aries-rsa-1.12.0 │
> aries-rsa-discovery-config │ 1.12.0 │ │ Uninstalled │ aries-rsa-1.12.0 │
> aries-rsa-discovery-zookeeper │ 1.12.0 │ x │ Started │ aries-rsa-1.12.0 │
> aries-rsa-discovery-zookeeper-server │ 1.12.0 │ │ Uninstalled │ 
> aries-rsa-1.12.0 │
> When I start my karaf servers everything is working fine and my frontend can 
> call my backend service and gets the result. But after some time (I can't 
> figure out when) it seems that the connections between the karaf and 
> zookeeper gets lost and I'm getting a timeout when I call my remote service 
> eventhough all the servers (karaf and zookeepers) are still available and 
> responding. Exhibitor shows no apparent issues with the zookeepers.
> I have attached the 
>  * relevant parts of my LoginView UI where I declared the @Reference to my 
> service and where I call the remote service
>  * relevant parts of my AuthenticationService implementation that should be 
> called on the remote karaf
>  * the stacktrace that I'm getting on the frontend karaf when the timeout 
> occurs
>  * my zoo.cfg file
> From the stacktrace one can see that the LoginView has a non-null fastbin 
> proxy handler for the authentication service but that after 5 minutes a 
> timeout occurs and there is no line in the log that shows that the remote 
> service was actually called.
> Many thanks in advance for your support.
> Kind regards,
> Alex



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ARIES-1804) Timeout due to connection loss in RSA fastbin provider?

2018-05-31 Thread Johannes Utzig (JIRA)


[ 
https://issues.apache.org/jira/browse/ARIES-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16496291#comment-16496291
 ] 

Johannes Utzig edited comment on ARIES-1804 at 5/31/18 9:08 AM:


You should have a look on the side that provides the service. Is there an 
exception? How long will the method execution take in worst case?

Sync remote calls are expected to execute fast so that they do not block the 
threads used for remote communication. If a sync call takes too long (default 
is 15s AFAIR) you will get a timeout like this.

For potentially long running calls, use async calls. To make the call async, 
have the method return either a Future, CompletableFuture or Promise.


was (Author: j.utzig):
You should have a look on the side that provides the service. Is there an 
exception? How  will the method execution take in worst case?

Sync remote calls are expected to execute fast so that they do not block the 
threads used for remote communication. If a sync call takes too long (default 
is 15s AFAIR) you will get a timeout like this.

For potentially long running calls, use async calls. To make the call async, 
have the method return either a Future, CompletableFuture or Promise.

> Timeout due to connection loss in RSA fastbin provider?
> ---
>
> Key: ARIES-1804
> URL: https://issues.apache.org/jira/browse/ARIES-1804
> Project: Aries
>  Issue Type: Bug
>  Components: Remote Service Admin
>Affects Versions: rsa-1.12.0
> Environment: Karaf 4.2.0
> RSA 1.12.0
> zookeeper 3.4.12
> java 1.8.0_172-b11
> RHEL 7.5
>Reporter: Alex Weirig
>Priority: Critical
> Attachments: AuthenticationServiceImpl.txt, LoginView.txt, 
> stacktrace.txt, zoo.cfg.txt
>
>
> Hello,
> I'm running two karaf (4.2.0) servers, one is running the frontend of my 
> application, the second one is running the backend.
> The backend services are published to 3 clustered zookeeper (3.4.12) servers. 
> In karaf I have deployed the following RSA features:
> karaf@appsrvtlk()> feature:list | grep rsa
> aries-rsa-core │ 1.12.0 │ │ Started │ aries-rsa-1.12.0 │
> aries-rsa-provider-tcp │ 1.12.0 │ │ Uninstalled │ aries-rsa-1.12.0 │
> aries-rsa-provider-fastbin │ 1.12.0 │ x │ Started │ aries-rsa-1.12.0 │
> aries-rsa-discovery-local │ 1.12.0 │ │ Uninstalled │ aries-rsa-1.12.0 │
> aries-rsa-discovery-config │ 1.12.0 │ │ Uninstalled │ aries-rsa-1.12.0 │
> aries-rsa-discovery-zookeeper │ 1.12.0 │ x │ Started │ aries-rsa-1.12.0 │
> aries-rsa-discovery-zookeeper-server │ 1.12.0 │ │ Uninstalled │ 
> aries-rsa-1.12.0 │
> When I start my karaf servers everything is working fine and my frontend can 
> call my backend service and gets the result. But after some time (I can't 
> figure out when) it seems that the connections between the karaf and 
> zookeeper gets lost and I'm getting a timeout when I call my remote service 
> eventhough all the servers (karaf and zookeepers) are still available and 
> responding. Exhibitor shows no apparent issues with the zookeepers.
> I have attached the 
>  * relevant parts of my LoginView UI where I declared the @Reference to my 
> service and where I call the remote service
>  * relevant parts of my AuthenticationService implementation that should be 
> called on the remote karaf
>  * the stacktrace that I'm getting on the frontend karaf when the timeout 
> occurs
>  * my zoo.cfg file
> From the stacktrace one can see that the LoginView has a non-null fastbin 
> proxy handler for the authentication service but that after 5 minutes a 
> timeout occurs and there is no line in the log that shows that the remote 
> service was actually called.
> Many thanks in advance for your support.
> Kind regards,
> Alex



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)