Re: hashing + roundrobin algorithm

2011-12-04 Thread Willy Tarreau
On Sun, Dec 04, 2011 at 08:37:31AM +0100, Rerngvit Yanggratoke wrote:
> John, thanks for your idea. I thought about redirection as well.
> Nonetheless, for the legacy software, I could not change anything at the
> moment. They are simply not under my control.

Haproxy can do the redirection itself if it can help you (check the "redir"
keyword on servers). For instance you could have such a configuration :

frontend pub
use_backend farm1 if { hdr_beg(host) -i farm1. }
use_backend farm2 if { hdr_beg(host) -i farm2. }
use_backend farm3 if { hdr_beg(host) -i farm3. }
use_backend farm4 if { hdr_beg(host) -i farm4. }
default_backend hash

backend hash
balance uri
hash-type consistent
server farm1 127.0.0.1:80 redir http://farm1.mydomain
server farm2 127.0.0.1:80 redir http://farm2.mydomain
server farm3 127.0.0.1:80 redir http://farm3.mydomain
server farm4 127.0.0.1:80 redir http://farm4.mydomain

backend farm1
balance roundrobin
server srv1 srv1:80
server srv2 srv2:80
server srv3 srv3:80

backend farm2
balance roundrobin
server srv1 srv1:80
server srv2 srv2:80
server srv3 srv3:80

backend farm3
balance roundrobin
server srv1 srv1:80
server srv2 srv2:80
server srv3 srv3:80

backend farm4
balance roundrobin
server srv1 srv1:80
server srv2 srv2:80
server srv3 srv3:80

The principle is that your farms are composed of 3 servers each (in this
example), and the load balancing between the farms is based on a consistent
hash of the URI. Instead of internally chaining the backends to frontends,
you do that with a redirect so the client comes to fetch the URL on the
proper farm.

You can make variations on this, such as having dedicated IP:ports for
checking the farms' health and removing a farm from the consistent hash if
too many of its servers are missing.
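A rough, untested sketch of such a variation (the monitor port and the "lt 2"
threshold below are arbitrary): expose each farm's health through a small
monitor section, and have the redir servers in the "hash" backend check it,
so that a farm drops out of the consistent hash when it has too few servers
left. The "hash" backend above would then become :

listen farm1_health 127.0.0.1:8101
    mode http
    acl farm1_low nbsrv(farm1) lt 2
    monitor-uri /farm_up
    monitor fail if farm1_low

backend hash
    balance uri
    hash-type consistent
    option httpchk GET /farm_up
    server farm1 127.0.0.1:80 redir http://farm1.mydomain check port 8101
    # ... one monitor section and one checked redir server per farm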

Regards,
Willy




Re: hashing + roundrobin algorithm

2011-12-03 Thread Rerngvit Yanggratoke
John, thanks for your idea. I thought about redirection as well.
Nonetheless, I cannot change anything in the legacy software at the
moment. It is simply not under my control.

On Sun, Nov 27, 2011 at 11:01 PM, John Marrett  wrote:

> Rerngvit,
>
>
>  However, I couldn't change the file's URL by adding a prefix. The client
>> software requesting the files from this cluster are a legacy software. It
>> is simply not feasible to change it at the moment.
>>
>
> Does this legacy software by any chance support redirection? If it does
> you could load balance to one or more redirection servers that would serve
> redirects, these redirects could then feed into restructured path based
> URIs that could then be load balanced based on the path to sets of servers,
> as Baptiste (and possibly others) have proposed.
>
> -JohnF
>



-- 
Best Regards,
Rerngvit Yanggratoke


Re: Re: hashing + roundrobin algorithm

2011-12-03 Thread Rerngvit Yanggratoke
Thanks for sharing the code.

2011/11/30 wsq003 


Re: Re: hashing + roundrobin algorithm

2011-12-03 Thread Rerngvit Yanggratoke
Thank you, this is an interesting idea. You are using HAProxy for all layers
and simply changing the LB configuration at each layer, right?

2011/11/29 wsq003 




-- 
Best Regards,
Rerngvit Yanggratoke


Re: hashing + roundrobin algorithm

2011-12-03 Thread Rerngvit Yanggratoke
Thank you for the suggestion. Consistent hashing sounds promising. The
number of files I would have to redistribute is limited if some servers
fail.

On Sun, Nov 27, 2011 at 10:42 PM, Allan Wind
wrote:

> On 2011-11-26 01:30:41, Rerngvit Yanggratoke wrote:
> > We have over three millions of files. Each static file is rather
> > small (< 5MB) and has a unique identifier used as well as an URL. As a
> > result, we are in the second case you mentioned. In particular, we should
> > concern about if everybody downloads the same file simultaneously. We
> > replicate each file at least two servers to provide fail over and load
> > balancing. In particular, if a server temporary fails, users can retrieve
> > the files kept on the failing server from another server.
>
> In order for haproxy to route the request correctly it needs to
> know, per url, what the two backend servers should be.  Or needs
> to fail on the server that is temporarily down (make sure you
> define that, and haproxy has the same understanding) and reroute
> traffic the server that is up.  Do you care about the request
> that sees the first failure?
>
> I do not know enough about haproxy yet to determine whatever
> either option is available.
>
> If you replicate all files on server a to server b, then each server
> needs 200% capacity to handle failover.  If you replicate 3 times
> it would be 150% and if you replicate a given resource to a
> random via a consistent hash you get much better behavior.  Make
> sure you consider hot spots.
>
>
> /Allan
> --
> Allan Wind
> Life Integrity, LLC
> 
>
>


-- 
Best Regards,
Rerngvit Yanggratoke


Re: Re: hashing + roundrobin algorithm

2011-11-30 Thread Baptiste
Hi,

Ricardo, from Tuenti, posted a very nice presentation on slideshare:
http://www.slideshare.net/ricbartm/load-balancing-at-tuenti

They explain how they configured HAProxy to do what you're trying
to achieve ;)

cheers


2011/11/30 wsq003 :

Re: Re: hashing + roundrobin algorithm

2011-11-29 Thread wsq003

My modification is based on version 1.4.16.

===in struct server, add the following===
char vgroup_name[100];
struct proxy *vgroup; //if not NULL, means this is a Virtual GROUP

===in function process_chk(), add at line 1198===
if (s->vgroup) { /* this 'server' is a virtual group: its state follows whether the vgroup backend still has weight */
if ((s->vgroup->lbprm.tot_weight > 0) && !(s->state & SRV_RUNNING)) {
s->health = s->rise;
set_server_check_status(s, HCHK_STATUS_L4OK, "vgroup ok");
set_server_up(s);
} else if (!(s->vgroup->lbprm.tot_weight > 0) && (s->state & SRV_RUNNING)) {
s->health = s->rise;
set_server_check_status(s, HCHK_STATUS_HANA, "vgroup has no available server");
set_server_down(s);
}

if (s->state & SRV_RUNNING) {
s->health = s->rise + s->fall - 1;
set_server_check_status(s, HCHK_STATUS_L4OK, "vgroup ok");
}

while (tick_is_expired(t->expire, now_ms))
t->expire = tick_add(t->expire, MS_TO_TICKS(s->inter));
return t;
}

===in function assign_server(), add at line 622===
if (s->srv->vgroup) {
struct proxy *old = s->be;
s->be = s->srv->vgroup;
int ret = assign_server(s);
s->be = old;
return ret;
}

===in function cfg_parse_listen(), add at line 3949===
else if (!defsrv && !strcmp(args[cur_arg], "vgroup")) {
if (!args[cur_arg + 1]) {
Alert("parsing [%s:%d] : '%s' : missing virtual_group name.\n",
file, linenum, newsrv->id);
err_code |= ERR_ALERT | ERR_FATAL;
goto out;
}
if (newsrv->addr.sin_addr.s_addr) {
//for easy indicate
Alert("parsing [%s:%d] : '%s' : virtual_group requires the server address as 
0.0.0.0\n",
file, linenum, newsrv->id);
err_code |= ERR_ALERT | ERR_FATAL;
goto out;
}
newsrv->check_port = 1;
strlcpy2(newsrv->vgroup_name, args[cur_arg + 1], sizeof(newsrv->vgroup_name));
cur_arg += 2;
}

===in function check_config_validity(), add at line 5680===
/*
 * set vgroup if necessary
 */
newsrv = curproxy->srv;
while (newsrv != NULL) {
if (newsrv->vgroup_name[0] != '\0') {
struct proxy *px = findproxy(newsrv->vgroup_name, PR_CAP_BE);
if (px == NULL) {
Alert("[%s][%s] : vgroup '%s' not exist.\n", curproxy->id, newsrv->id, 
newsrv->vgroup_name);
err_code |= ERR_ALERT | ERR_FATAL;
break;
}
newsrv->vgroup = px;
}
newsrv = newsrv->next;
}

==

and some minor changes in function stats_dump_proxy() that are not important.

==

sample config file looks like:

backend internallighttpd
option httpchk /monitor/ok.htm
server wsqa 0.0.0.0 vgroup subproxy1 weight 32 check inter 4000 rise 3 fall 3
server wsqb 0.0.0.0 vgroup subproxy2 weight 32 check inter 4000 rise 3 fall 3
balance uri
hash-type consistent
option redispatch
retries 3

backend subproxy1
option httpchk /monitor/ok.htm
server wsq01 1.1.1.1:8001 weight 32 check inter 4000 rise 3 fall 3
server wsq02 1.1.1.2:8001 weight 32 check inter 4000 rise 3 fall 3
balance roundrobin
option redispatch
retries 3

backend subproxy2
option httpchk /monitor/ok.htm
server wsq03 1.1.1.1:8002 weight 32 check inter 4000 rise 3 fall 3
server wsq04 1.1.1.2:8002 weight 32 check inter 4000 rise 3 fall 3
balance roundrobin
option redispatch
retries 3

==

Sorry I can't provide a clean patch, because vgroup is just one of several
changes.
I did not consider the rewrite rules at that time. Maybe we can add a function
call before calling assign_server()?



Re: Re: hashing + roundrobin algorithm

2011-11-29 Thread Willy Tarreau
On Tue, Nov 29, 2011 at 02:56:49PM +0800, wsq003 wrote:
> 
> Backend proxies may be multiple layers, then every layer can have its own LB 
> param.
> Logically this is a tree-like structure, every real server is a leaf. Every 
> none-leaf node is a backend proxy and may have LB param.

I clearly understand what it looks like from the outside. It's still not very
clear how you *concretely* implemented it. Maybe you basically did what I've
been planning for a long time (the internal server) and then your code could
save us some time.

A feature I found important there was to be able to apply backend rewrite rules
again when selecting a new backend as a server. This is the only way I found to
be able to perform outgoing rewrites.

> When a HTTP request arrives, it go through the tree-like structure to find a 
> proper real server.
> 
> It would be better if official version can provide this feature.

Care to post your code so that we can comment on something real ? :-)

Cheers,
Willy




Re: Re: hashing + roundrobin algorithm

2011-11-28 Thread wsq003

Backend proxies may be stacked in multiple layers, and every layer can have its
own LB parameters.
Logically this is a tree-like structure: every real server is a leaf, and every
non-leaf node is a backend proxy with its own LB parameters.
When an HTTP request arrives, it walks through the tree-like structure to find
a proper real server.

It would be better if the official version could provide this feature.
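For example, with our private 'vgroup' keyword a two-level tree could look like
the sketch below (all names and addresses are made up for illustration, and
stock haproxy does not know 'vgroup' -- it needs our patch):

backend tier0
    balance uri
    hash-type consistent
    server groupA 0.0.0.0 vgroup tier1_a weight 32 check inter 4000 rise 3 fall 3
    server groupB 0.0.0.0 vgroup tier1_b weight 32 check inter 4000 rise 3 fall 3

backend tier1_a
    balance roundrobin
    # a member can itself be another virtual group (recursion)...
    server groupA1 0.0.0.0 vgroup tier2_a1 weight 32 check inter 4000 rise 3 fall 3
    # ...or a real leaf server
    server a2 10.0.0.2:80 weight 32 check inter 4000 rise 3 fall 3

backend tier2_a1
    balance roundrobin
    server a1a 10.0.1.1:80 check
    server a1b 10.0.1.2:80 check

# tier1_b would be defined the same way, down to its leaf servers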



Re: Re: hashing + roundrobin algorithm

2011-11-28 Thread Willy Tarreau
Hi,

On Tue, Nov 29, 2011 at 01:52:31PM +0800, wsq003 wrote:
> 
> We add a new keyword 'vgroup' under 'server' key word.
> server wsqa 0.0.0.0 vgroup subproxy1 weight 32 check inter 4000 rise 3 
> fall 3 
> means request assigned to this server will be treated as set backend 
> 'subproxy1'. Then in backend 'subproxy1' you can configure any load balance 
> strategy. This can be recursive.
> 
> In source code:
>  At the end of assign_server(), if we found that a server has 'vgroup' 
> property, we will set backend of cur_proxy and call assign_server() again.

Your trick sounds interesting but I'm not sure I completely understand
how it works.

There was a feature I wanted to implement some time ago, it would be sort
of an internal server which would directly map to a frontend (or maybe just
a backend) without passing via a TCP connection. It looks like your trick
does something similar but I just fail to understand how the LB params are
assigned to multiple backends for a given server.

Regards,
Willy




Re: Re: hashing + roundrobin algorithm

2011-11-28 Thread wsq003

We add a new keyword 'vgroup' under the 'server' keyword.
server wsqa 0.0.0.0 vgroup subproxy1 weight 32 check inter 4000 rise 3 fall 3
means that a request assigned to this server will be treated as if the backend
'subproxy1' had been selected. Then in backend 'subproxy1' you can configure any
load balancing strategy. This can be recursive.

In the source code:
At the end of assign_server(), if we find that the chosen server has a 'vgroup'
property, we set the session's backend to that proxy and call assign_server()
again.


From: Rerngvit Yanggratoke
Date: 2011-11-26 08:33
To: wsq003
CC: Willy Tarreau; haproxy; Baptiste
Subject: Re: Re: hashing + roundrobin algorithm
Hello wsq003,
   That sounds very interesting. It would be great if you could share your 
patch. If that is not possible, providing guideline on how to implement that 
would be helpful as well. Thank you!



Re: hashing + roundrobin algorithm

2011-11-27 Thread John Marrett

Rerngvit,

However, I couldn't change the file's URL by adding a prefix. The 
client software requesting the files from this cluster are a legacy 
software. It is simply not feasible to change it at the moment.


Does this legacy software by any chance support redirection? If it does 
you could load balance to one or more redirection servers that would 
serve redirects, these redirects could then feed into restructured path 
based URIs that could then be load balanced based on the path to sets of 
servers, as Baptiste (and possibly others) have proposed.


-JohnF



Re: hashing + roundrobin algorithm

2011-11-27 Thread Allan Wind
On 2011-11-26 01:30:41, Rerngvit Yanggratoke wrote:
> We have over three millions of files. Each static file is rather
> small (< 5MB) and has a unique identifier used as well as an URL. As a
> result, we are in the second case you mentioned. In particular, we should
> concern about if everybody downloads the same file simultaneously. We
> replicate each file at least two servers to provide fail over and load
> balancing. In particular, if a server temporary fails, users can retrieve
> the files kept on the failing server from another server.

In order for haproxy to route the request correctly it needs to
know, per URL, what the two backend servers should be.  Or it needs
to fail over from the server that is temporarily down (make sure you
define what "down" means, and that haproxy has the same understanding)
and reroute traffic to the server that is up.  Do you care about the
request that sees the first failure?

I do not know enough about haproxy yet to determine whether
either option is available.

If you replicate all files on server a to server b, then each server
needs 200% capacity to handle failover.  If you replicate 3 times
it would be 150%, and if you replicate a given resource to randomly
chosen servers via a consistent hash you get much better behavior.
Make sure you consider hot spots.
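To put rough numbers on that (assuming load is spread evenly): with pairwise
mirroring the surviving server must absorb its partner's full load, so each box
can normally run at only 50% of its capacity, hence the 200%; with 3-way groups
the two survivors each pick up half of the failed box's load, hence roughly
150%; with a consistent hash over n servers the failed box's load is spread
over the remaining n-1, so the required headroom drops to about n/(n-1).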


/Allan
-- 
Allan Wind
Life Integrity, LLC




Re: Re: hashing + roundrobin algorithm

2011-11-25 Thread Rerngvit Yanggratoke
Hello wsq003,
   That sounds very interesting. It would be great if you could share
your patch. If that is not possible, providing a guideline on how to
implement it would be helpful as well. Thank you!

2011/11/23 wsq003 

> **
>
> I've made a private patch to haproxy (just a few lines of code, but not
> elegant), which can support this feature.
>
> My condition is just like your imagination: consistent-hashing to a group
> then round-robin in this group.
>
> Our design is that several 'server' will share a physical machine, and
> 'severs' of one group will be distributed to several physical machine.
> So, if one physical machine is down, nothing will pass through the cache
> layer, because every group still works. Then we will get a chance to
> recover the cluster as we want.



-- 
Best Regards,
Rerngvit Yanggratoke


Re: hashing + roundrobin algorithm

2011-11-25 Thread Rerngvit Yanggratoke
Dear Willy,
   Thank you for your help. We have a clear performance goal for our
cluster: high availability and maximizing throughput under a predefined
constant latency. However, we don't have a clear idea yet what architecture
or software would allow us to achieve that. Let me provide more details and
try to answer your questions.
   We have over three million files. Each static file is rather small
(< 5MB) and has a unique identifier that is also used as its URL. As a
result, we are in the second case you mentioned. In particular, we have to
be concerned about everybody downloading the same file simultaneously. We
replicate each file on at least two servers to provide failover and load
balancing, so that if a server temporarily fails, users can retrieve the
files kept on the failing server from another server.
   We do not have a caching layer at the moment. More precisely, every
request is served directly from the web servers. We want the system to
scale linearly with its size. In particular, when a new server is added, we
want traffic to be channeled to the new server equally compared to existing
servers.
   I will investigate Varnish and see if it fits our system.
On Wed, Nov 23, 2011 at 8:15 AM, Willy Tarreau  wrote:



-- 
Best Regards,
Rerngvit Yanggratoke

Re: hashing + roundrobin algorithm

2011-11-25 Thread Rerngvit Yanggratoke
Thank you, Baptiste.  Let me elaborate in more detail then. We have over
three million files. Each static file is rather small (< 5MB) and has a
unique identifier that is also used as its URL. I referred to this URL as a
file key. The performance goal we want from our system is high availability
and maximized throughput under a predefined constant latency, for example,
300ms.

I think what you suggest would work if I could change the files' URLs. I
would be able to manually divide all files into different groups and create
a separate directory path for each group. Then, based on an ACL matching
each group, HAProxy would forward a request to a particular backend running
the roundrobin strategy.

However, I cannot change a file's URL by adding a prefix. The client
software requesting the files from this cluster is legacy software. It is
simply not feasible to change it at the moment.


On Tue, Nov 22, 2011 at 10:05 PM, Baptiste  wrote:




-- 
Best Regards,
Rerngvit Yanggratoke


Re: Re: hashing + roundrobin algorithm

2011-11-23 Thread wsq003

I've made a private patch to haproxy (just a few lines of code, but not
elegant), which can support this feature.

My situation is just like the one you imagine: consistent hashing to a group,
then round-robin within this group.

Our design is that several 'server' entries will share a physical machine, and
the 'servers' of one group will be distributed across several physical machines.
So, if one physical machine is down, nothing falls through the cache layer,
because every group still works. Then we will get a chance to recover the
cluster as we want.



Re: hashing + roundrobin algorithm

2011-11-22 Thread Willy Tarreau
Hi,

On Fri, Nov 18, 2011 at 05:48:54PM +0100, Rerngvit Yanggratoke wrote:
> Hello All,
> First of all, pardon me if I'm not communicating very well. English
> is not my native language. We are running a static file distribution
> cluster. The cluster consists of many web servers serving static files over
> HTTP.  We have very large number of files such that a single server simply
> can not keep all files (don't have enough disk space). In particular, a
> file can be served only from a subset of servers. Each file is uniquely
> identified by a file's URI. I would refer to this URI later as a key.
> I am investigating deploying HAProxy as a front end to this
> cluster. We want HAProxy to provide load balancing and automatic fail over.
> In other words, a request comes first to HAProxy and HAProxy should forward
> the request to appropriate backend server. More precisely, for a particular
> key, there should be at least two servers being forwarded to from HAProxy
> for the sake of load balancing. My question is what load
> balancing strategy should I use?
> I could use hashing(based on key) or consistent hashing. However,
> each file would end up being served by a single server on a particular
> moment. That means I wouldn't have load balancing and fail over for a
> particular key.

This question is much more a question of architecture than of configuration.
What is important is not what you can do with haproxy, but how you want your
service to run. I suspect that if you acquired hardware and bandwidth to build
your service, you have pretty clear ideas of how your files will be distributed
and/or replicated between your servers. You also know whether you'll serve
millions of files or just a few tens, which means in the first case that you
can safely have one server per URL, and in the latter that you would risk
overloading a server if everybody downloads the same file at a time. Maybe
you have installed caches to avoid overloading some servers. You have probably
planned what will happen when you add new servers, and what is supposed to
happen when a server temporarily fails.

All of these are very important questions, they determine whether your site
will work or fail.

Once you're able to respond to these questions, it becomes much more obvious
what the LB strategy can be, if you want to dedicate server farms to some
URLs, or load-balance each hash among a few servers because you have a
particular replication strategy. And once you know what you need, then we
can study how haproxy can respond to this need. Maybe it can't at all, maybe
it's easy to modify it to respond to your needs, maybe it does respond pretty
well.

My guess from what you describe is that it could make a lot of sense to
have one layer of haproxy in front of Varnish caches. The first layer of
haproxy chooses a cache based on a consistent hash of the URL, and each
varnish is then configured to address a small bunch of servers in round
robin. But this means that you need to assign servers to farms, and that
if you lose a varnish, all the servers behind it are lost too.
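A minimal, untested sketch of that first layer (addresses and ports are made
up; each varnish instance then round-robins to its own small set of servers):

frontend fe_static
    bind :80
    default_backend caches

backend caches
    balance uri
    hash-type consistent
    server cache1 10.0.0.11:6081 check
    server cache2 10.0.0.12:6081 check
    server cache3 10.0.0.13:6081 check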

If your files are present on all servers, it might make sense to use
varnish as explained above but which would round-robin across all servers.
That way you make the cache layer and the server layer independent of each
other. But this can imply complex replication strategies.

As you see, there is no single response, you really need to define how you
want your architecture to work and to scale first.

Regards,
Willy




Re: hashing + roundrobin algorithm

2011-11-22 Thread Baptiste
Hi,

As long as you don't share more details on how your files are accessed
and what makes each URL unique, I can't help.
As I said, splitting your files by directory path or by Host header may be good.

Concerning an example in haproxy, having the following in your frontend
will do the job:
  acl dir1 path_beg /dir1/
  use_backend bk_dir1 if dir1
  acl dir2 path_beg /dir2/
  use_backend bk_dir2 if dir2
...

then create the backends:
backend bk_dir1
  balance roundrobin
  server srv1
  server srv2
backend bk_dir2
  balance roundrobin
  server srv3
  server srv4
...

Hope this helps

On Mon, Nov 21, 2011 at 3:24 PM, Rerngvit Yanggratoke  wrote:



Re: hashing + roundrobin algorithm

2011-11-21 Thread Rerngvit Yanggratoke
Dear Baptiste,
Could you please give an example of a criterion that would reduce the
number of files per backend? And, if possible, explain how to implement that
with HAProxy?

On Sat, Nov 19, 2011 at 8:29 PM, Baptiste  wrote:




-- 
Best Regards,
Rerngvit Yanggratoke


Re: hashing + roundrobin algorithm

2011-11-19 Thread Baptiste
On Fri, Nov 18, 2011 at 5:48 PM, Rerngvit Yanggratoke  wrote:
> Hello All,
>         First of all, pardon me if I'm not communicating very well. English
> is not my native language. We are running a static file distribution
> cluster. The cluster consists of many web servers serving static files over
> HTTP.  We have very large number of files such that a single server simply
> can not keep all files (don't have enough disk space). In particular, a file
> can be served only from a subset of servers. Each file is uniquely
> identified by a file's URI. I would refer to this URI later as a key.
>         I am investigating deploying HAProxy as a front end to this cluster.
> We want HAProxy to provide load balancing and automatic fail over. In other
> words, a request comes first to HAProxy and HAProxy should forward the
> request to appropriate backend server. More precisely, for a particular key,
> there should be at least two servers being forwarded to from HAProxy for the
> sake of load balancing. My question is what load balancing strategy should I
> use?
>         I could use hashing(based on key) or consistent hashing. However,
> each file would end up being served by a single server on a particular
> moment. That means I wouldn't have load balancing and fail over for a
> particular key.
>        Is there something like a combination of hashing and
> roundrobin strategy? In particular, for a particular key, there would be
> multiple servers serving the requests and HAProxy selects one of them
> according to roundrobin policy. If there isn't such a strategy, any
> suggestions on how to implement this into HAProxy? Any other comments are
> welcome as well.
>
> --
> Best Regards,
> Rerngvit Yanggratoke
>

Hi,

You could create several backends and redirect requests based on an
arbitrary criterion to reduce the number of files per backend. Using
a URL path prefix might be a good idea.
Then inside a backend, you can use the URI hash load-balancing algorithm.

cheers