Re: Freeradius proxy code questions and proposed patch
Alan DeKok wrote:
> Kostas Zorbadelos wrote:
>> I have read in the list about the major clean up version 2.0 of the server will be. While reading the code of versions 1.x I could see that there is great room for improvement. I will take a look in the 2.0 sources and I look forward to testing it when it becomes available.
>
> Please test it now. If everyone waits for 2.0 to be released before testing it, then everyone will discover little problems that they don't like. Spend some time now to give feedback, and 2.0 will be that much more robust for everyone.

I think it's a good idea to start releasing 2.0preX versions. That should get a few more people interested in testing the code and bring in more comments.

> Alan DeKok.
> --
> http://deployingradius.com - The web site of the book
> http://deployingradius.com/blog/ - The blog

--
Kostas Kalevras - Network Operations Center
National Technical University of Athens
http://kkalev.wordpress.com
-
List info/subscribe/unsubscribe? See http://www.freeradius.org/list/users.html
Re: Freeradius proxy code questions and proposed patch
On Fri 04 May 2007, Kostas Kalevras wrote:
> Alan DeKok wrote:
>> Kostas Zorbadelos wrote:
>>> I have read in the list about the major clean up version 2.0 of the server will be. While reading the code of versions 1.x I could see that there is great room for improvement. I will take a look in the 2.0 sources and I look forward to testing it when it becomes available.
>>
>> Please test it now. If everyone waits for 2.0 to be released before testing it, then everyone will discover little problems that they don't like. Spend some time now to give feedback, and 2.0 will be that much more robust for everyone.
>
> I think it's a good idea to start releasing 2.0preX versions. That should make a few more people interested in testing the code and get more comments.

I agree. While I have been rolling freeradius-server-snapshot rpms on a weekly basis, releasing freeradius-server-2.0preX rpms is likely to get a lot more people to upgrade. (Anyone using my repo will get the new version automatically.)

Cheers
--
Peter Nixon
http://www.peternixon.net/
PGP Key: http://www.peternixon.net/public.asc
Re: Freeradius proxy code questions and proposed patch
Kostas Kalevras wrote:
> I think it's a good idea to start releasing 2.0preX versions. That should make a few more people interested in testing the code and get more comments.

I'm working on fixing the handling of the detail files right now, so that one server will be able to read the detail files it writes. Once that's done, I think we're ready for 2.0-pre0.

For example, with the new code the server will be able to:

- proxy accounting to a home server
- if that fails, write a detail file
- read the detail file
- try to proxy the packets again
- if that fails, leave the data in the detail file

This means that a server doing proxying can just be a pass-through server when everything is OK. Then, if something goes wrong, it can log the accounting data to a file for later replay. Once the home servers come back up, the accounting data will be automatically sent there.

Alan DeKok.
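The fallback sequence listed above can be sketched in a few lines of C. This is only an illustration of the control flow, not the 2.0 implementation: the in-memory `struct spool` stands in for the detail file, and `home_server_up`, `send_to_home`, `handle_accounting` and `replay_spool` are hypothetical names invented for the example.

```c
#include <stdbool.h>
#include <stddef.h>

#define SPOOL_MAX 64

/* Hypothetical outcome of handling one accounting packet. */
enum acct_result { ACCT_PROXIED, ACCT_SPOOLED, ACCT_DROPPED };

/* Hypothetical stand-in for the detail file: a small in-memory queue. */
struct spool {
    int    packets[SPOOL_MAX]; /* ids of packets waiting for replay */
    size_t count;
};

/* Hypothetical stand-in for the network: toggle to simulate an outage. */
static bool home_server_up = true;

static bool send_to_home(int packet_id)
{
    (void)packet_id;
    return home_server_up;
}

/* Try to proxy; if that fails, keep the packet for later replay. */
enum acct_result handle_accounting(struct spool *sp, int packet_id)
{
    if (send_to_home(packet_id))
        return ACCT_PROXIED;
    if (sp->count == SPOOL_MAX)
        return ACCT_DROPPED; /* a real detail file would not have this limit */
    sp->packets[sp->count++] = packet_id;
    return ACCT_SPOOLED;
}

/*
 * Read the spool and try to proxy each packet again. Packets that still
 * fail stay in the spool, in their original order. Returns how many
 * packets were delivered.
 */
size_t replay_spool(struct spool *sp)
{
    size_t kept = 0, sent = 0;

    for (size_t i = 0; i < sp->count; i++) {
        if (send_to_home(sp->packets[i]))
            sent++;
        else
            sp->packets[kept++] = sp->packets[i];
    }
    sp->count = kept;
    return sent;
}
```

While the home server is reachable, `handle_accounting` is a pure pass-through and the spool stays empty. During an outage packets accumulate, and a later call to `replay_spool` drains them once sending succeeds again.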
Re: Freeradius proxy code questions and proposed patch
Kostas Zorbadelos wrote:
> Precisely. But when we work in 'synchronous' mode we want the NAS to be in charge of the retransmission policy, not our proxy server. If the home server does not reply for any reason, we want the client (NAS) to notice it and retransmit. Eventually, the client will mark our proxy server dead not because it is its fault, but because the home server is not responding.

Have you tried using failover for home servers? The whole point of marking a home server dead is to remove it from the pool of home servers. Then, if another one in the same pool is alive, the proxy will use it.

If you don't mark the home server dead, then you can't do failover, and your system becomes less robust.

>> Which server? All your patch does is make sure that the NAS marks the proxying server as dead.
>
> Eventually, yes this is what the NAS will do. All that is due to the synchronous mode in proxy operation.

The solution is not to patch the code to make the proxying server dead. The solution is to use more than one home server.

> I have read in the list about the major clean up version 2.0 of the server will be. While reading the code of versions 1.x I could see that there is great room for improvement. I will take a look in the 2.0 sources and I look forward to testing it when it becomes available.

Please test it now. If everyone waits for 2.0 to be released before testing it, then everyone will discover little problems that they don't like. Spend some time now to give feedback, and 2.0 will be that much more robust for everyone.

Alan DeKok.
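To make the failover argument concrete, here is a minimal sketch of a pool in which a dead home server is skipped until its dead time has passed. The field names `active` and `wakeup` echo the ones in the 1.1.6 patch under discussion; everything else (`struct home_server`, `mark_dead`, `pick_home_server`) is hypothetical and is not the FreeRADIUS API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical home server record. */
struct home_server {
    const char *name;
    bool        active;
    time_t      wakeup; /* earliest time a dead server may be retried */
};

/* Take a server out of the pool for dead_time seconds. */
void mark_dead(struct home_server *hs, time_t now, time_t dead_time)
{
    hs->active = false;
    hs->wakeup = now + dead_time;
}

/*
 * Return the first usable server in the pool, reviving any server whose
 * dead time has expired. Returns NULL when every server is still dead.
 */
struct home_server *pick_home_server(struct home_server *pool, size_t n,
                                     time_t now)
{
    for (size_t i = 0; i < n; i++) {
        if (!pool[i].active && now >= pool[i].wakeup)
            pool[i].active = true;
        if (pool[i].active)
            return &pool[i];
    }
    return NULL;
}
```

With two servers in the pool, marking the first one dead makes `pick_home_server` return the second until the first one's `wakeup` time is reached. With a single server the function returns NULL for the whole dead time, which corresponds to the rejections described in the original report.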
Freeradius proxy code questions and proposed patch
Hello to everyone.

In a previous thread

http://www.mail-archive.com/freeradius-users@lists.freeradius.org/msg33354.html

I described strange behavior in our large proxy setup. After running the server in debug mode (radiusd -xxx) on our production systems, we found out what was causing our problems. The home server in our proxy setup was marked dead quite often during the day, and with a dead_time of 30 seconds every request that arrived within those 30 seconds was rejected. Our proxy configuration initially looked like the following:

proxy server {
        synchronous = yes
        retry_delay = 0
        retry_count = 0
        dead_time = 30
        default_fallback = yes
        post_proxy_authorize = no
}

###
#
# Configuration for the proxy realms.
#
...

We first changed dead_time to 0 so as to avoid marking the home server dead in synchronous mode. Additionally, we implemented the following patch (against version 1.1.6):

--- ./src/main/files.c.orig 2007-04-23 15:14:14.569932000 +0300
+++ ./src/main/files.c 2007-04-23 15:22:30.995686000 +0300
@@ -489,6 +489,15 @@
         if (cl->last_reply > (( now - mainconfig.proxy_retry_delay * mainconfig.proxy_retry_count ))) {
                 continue;
         }
+        /*
+         * If we are running in synchronous proxy mode, there's no point marking the target
+         * server(s) dead, since this should be done by the radius client
+         */
+        if (mainconfig.proxy_synchronous) {
+                radlog(L_PROXY, "authentication server %s:%d for realm %s seems unresponsive.",
+                       cl->server, port, cl->realm);
+                continue;
+        }
         cl->active = FALSE;
         cl->wakeup = now + mainconfig.proxy_dead_time;
@@ -498,6 +507,15 @@
         if (cl->last_reply > (( now - mainconfig.proxy_retry_delay * mainconfig.proxy_retry_count ))) {
                 continue;
         }
+        /*
+         * If we are running in synchronous proxy mode, there's no point marking the target
+         * server(s) dead, since this should be done by the radius client
+         */
+        if (mainconfig.proxy_synchronous) {
+                radlog(L_PROXY, "accounting server %s:%d for realm %s seems unresponsive.",
+                       cl->acct_server, port, cl->realm);
+                continue;
+        }
         cl->acct_active = FALSE;
         cl->acct_wakeup = now + mainconfig.proxy_dead_time;

The purpose of this patch is to keep the freeradius server from marking the home server dead when working in synchronous mode. We believe that in synchronous operation it is a good idea to leave the job of marking the server dead to the NAS client.

All the above actions solved our initial problems. However, after a while we again noticed clients being rejected when they shouldn't be. The following code in request_list.c caught my attention:

/*
 * Refresh a request, by using proxy_retry_delay, cleanup_delay,
 * max_request_time, etc.
 *
 * When walking over the request list, all of the per-request
 * magic is done here.
 */
static int refresh_request(REQUEST *request, void *data)
{
        ...
        (around line 1264 version 1.1.6)

        } else if (request->proxy && !request->proxy_reply) {
                /*
                 * The request is NOT finished, but there is an
                 * outstanding proxy request, with no matching
                 * proxy reply.
                 *
                 * Wake up when it's time to re-send
                 * the proxy request.
                 *
                 * But in synchronous proxy, we don't retry but we update
                 * the next retry time as NAS has not resent the request
                 * in the given retry window.
                 */
                if (mainconfig.proxy_synchronous) {
                        /*
                         * If the retry_delay * count has passed,
                         * then mark the realm dead.
                         */
                        if (info->now > (request->timestamp + (mainconfig.proxy_retry_delay * mainconfig.proxy_retry_count))) {
                                rad_assert(request->child_pid == NO_SUCH_CHILD_PID);
                                request_reject(request);
                                realm_disable(request->proxy->dst_ipaddr,
                                              request->proxy->dst_port);
                                request->finished = TRUE;
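The two time checks quoted in this message are plain arithmetic, so it may help to evaluate them with the configuration shown earlier (retry_delay = 0, retry_count = 0). The sketch below assumes both comparisons are strict greater-than tests; the function names `retry_window_expired` and `replied_recently` are hypothetical and do not appear in the FreeRADIUS sources.

```c
#include <stdbool.h>
#include <time.h>

/*
 * Mirrors the test in the refresh_request excerpt: has more than
 * retry_delay * retry_count seconds passed since the request arrived?
 * The strict '>' is an assumption.
 */
bool retry_window_expired(time_t now, time_t request_timestamp,
                          int retry_delay, int retry_count)
{
    return now > request_timestamp + (time_t)retry_delay * retry_count;
}

/*
 * Mirrors the test in the files.c context lines of the patch: has the
 * home server replied within the last retry_delay * retry_count seconds?
 * When this is true the server is not marked dead.
 */
bool replied_recently(time_t last_reply, time_t now,
                      int retry_delay, int retry_count)
{
    return last_reply > now - (time_t)retry_delay * retry_count;
}
```

Under that reading, retry_delay = 0 and retry_count = 0 give a window of zero seconds. `retry_window_expired` then becomes true for any proxied request that has been outstanding for a full second, and `replied_recently` can never be true for a reply that arrived at or before the current time. With, say, retry_delay = 5 and retry_count = 3, the window is 15 seconds instead.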
Re: Freeradius proxy code questions and proposed patch
Kostas Zorbadelos wrote:
> I had described a strange behavior in our large proxy setup. After running the server in debug mode (radiusd -xxx) in our production systems we found out what was causing our problems. The problem was that the home server in our proxy setup was marked dead quite often during the day and with a dead_time of 30 secs every request that came within these 30 secs was rejected.

Yes. In 1.x, the proxy code does this. It's fixed in 2.0, which should be released real soon now.

> + /*
> + * If we are running in synchronous proxy mode, there's no point marking the target
> + * server(s) dead, since this should be done by the radius client

Uh, no. The RADIUS client doesn't know about the home servers. It only knows about the server it's sending packets to.

> The purpose of this patch is to not have the freeradius server mark the home server dead when working in synchronous mode. We believe that in synchronous operation it is a good idea to leave the job of marking the server dead to the NAS client.

Which server? All your patch does is make sure that the NAS marks the proxying server as dead.

> ...
> It seems that in some strange occasions the code enters the above path. A decision is made in case the current time is older than mainconfig.proxy_retry_delay * mainconfig.proxy_retry_count. If this is the case, the request is rejected and the code tries to disable the realm. However in the proxy.conf configuration file it is mentioned:

All of that code is *gone* in 2.0. The new code is so much better that it's really quite hard to describe how much better it is.

> Please let me know your thoughts on these matters (also on the patch we provide)

Take a look at the current CVS snapshot. It should be pretty robust with some recent bug fixes, and it will solve *all* of your proxying problems. And I do mean ALL of the problems.

Alan DeKok.
Re: Freeradius proxy code questions and proposed patch
On Mon, Apr 30, 2007 at 05:41:06PM +0200, Alan DeKok wrote:
> Kostas Zorbadelos wrote:
>> I had described a strange behavior in our large proxy setup. After running the server in debug mode (radiusd -xxx) in our production systems we found out what was causing our problems. The problem was that the home server in our proxy setup was marked dead quite often during the day and with a dead_time of 30 secs every request that came within these 30 secs was rejected.
>
> Yes. In 1.x, the proxy code does this. It's fixed in 2.0, which should be released real soon now.
>
>> + /*
>> + * If we are running in synchronous proxy mode, there's no point marking the target
>> + * server(s) dead, since this should be done by the radius client
>
> Uh, no. The RADIUS client doesn't know about the home servers. It only knows about the server it's sending packets to.

Precisely. But when we work in 'synchronous' mode we want the NAS to be in charge of the retransmission policy, not our proxy server. If the home server does not reply for any reason, we want the client (NAS) to notice it and retransmit. Eventually, the client will mark our proxy server dead not because it is its fault, but because the home server is not responding.

>> The purpose of this patch is to not have the freeradius server mark the home server dead when working in synchronous mode. We believe that in synchronous operation it is a good idea to leave the job of marking the server dead to the NAS client.
>
> Which server? All your patch does is make sure that the NAS marks the proxying server as dead.

Eventually, yes, this is what the NAS will do. All that is due to the synchronous mode in proxy operation.

>> ...
>> It seems that in some strange occasions the code enters the above path. A decision is made in case the current time is older than mainconfig.proxy_retry_delay * mainconfig.proxy_retry_count. If this is the case, the request is rejected and the code tries to disable the realm. However in the proxy.conf configuration file it is mentioned:
>
> All of that code is *gone* in 2.0. The new code is so much better that it's really quite hard to describe how much better it is.
>
>> Please let me know your thoughts on these matters (also on the patch we provide)
>
> Take a look at the current CVS snapshot. It should be pretty robust with some recent bug fixes, and it will solve *all* of your proxying problems. And I do mean ALL of the problems.

I have read on the list about the major cleanup that version 2.0 of the server will be. While reading the code of versions 1.x I could see that there is great room for improvement. I will take a look at the 2.0 sources and I look forward to testing it when it becomes available.

Thanks a lot Alan.

Kostas

> Alan DeKok.