Re: [squid-users] Re: can squid load data into cache faster than sending it out?
On Thu, May 12, 2011 at 01:37:13PM +1200, Amos Jeffries wrote:
> On 12/05/11 08:18, Dave Dykstra wrote:
>> [...]
>>> So it's a choice of being partially vulnerable to Slowloris-style attacks (timeouts etc. prevent full vulnerability) or packet amplification on a massive scale.
>>
>> Just to make sure I understand you: in both cases you're talking about attacks, not normal operation, right? And are you saying that it is easier to mitigate the trickle-feed attack than the packet-amplification attack, so trickle-feed is less bad? I'm not so worried about attacks as normal operation.
>
> Both are real traffic types; the attack form is just artificially induced to make it worse. Like ping-flooding in the '90s, it happens normally, but not often. All it takes is a large number of slow clients requesting non-identical URLs. IIRC it was noticed worst by cellphone networks with very large numbers of very slow GSM clients.
>
> A client connects and sends a request; Squid reads back N bytes from the server and sends N-M to the client. Repeat until all FDs available in Squid are consumed, during which time M bytes of packets are overflowing the server link for each 2 FDs used. If the total of all M is greater than the server link capacity...
>
> Under the current design the worst case is the server running out of FDs first and rejecting new connections, or TCP protections dropping connections and Squid aborting the clients early. The overflow factor is 32 KB or 64 KB, linear with the number of FDs, and can't happen naturally where the client does read the data, just slowly.

With my application the server has a limit on the number of parallel connections it has to its backend database, so there is no danger of overflowing the bandwidth between the reverse-proxy squid and its server (also, they're on the same machine, so the network is intra-machine).

If there are many clients that suddenly make large requests, they are put into a queue on the server until they get their turn, and meanwhile the server sends keepalive messages every 5 seconds so the clients don't time out. With my preferred behavior, squid would read that data from the server as fast as possible, and then it wouldn't make any difference to the squid-to-server link whether the clients had low bandwidth or high bandwidth. I'll submit a feature request for an option to bugzilla.

Thanks a lot for your explanations, Amos.

- Dave
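Amos's overflow arithmetic can be sketched as a back-of-envelope calculation. This is an editor's illustration: the 64 KB per-connection figure is the upper value Amos mentions, while the client count and server link speed are invented assumptions.

```python
# Back-of-envelope model of the packet-amplification case Amos describes:
# each slow client lets Squid read ahead roughly one socket buffer
# (M bytes, 32-64 KB) per pair of FDs, so fetch-ahead demand on the
# server link grows linearly with the number of slow connections.
# The client count and link speed below are invented for illustration.

PER_CONN_READAHEAD = 64 * 1024      # bytes: the upper figure Amos cites
SERVER_LINK_BPS = 100_000_000 // 8  # assumed 100 Mbit/s server link, in bytes/s

def burst_bytes(num_slow_clients: int, per_conn: int = PER_CONN_READAHEAD) -> int:
    """Total read-ahead data Squid may pull on behalf of slow clients."""
    return num_slow_clients * per_conn

clients = 10_000                    # e.g. a large pool of very slow GSM clients
burst = burst_bytes(clients)
print(burst)                        # 655360000 bytes of fetch-ahead demand
print(burst / SERVER_LINK_BPS)      # ~52 seconds of the link, saturated
```

The point of the model is the linear term: with Dave's setup (origin server on the same machine, connection count capped by the backend database) that linear growth is bounded, which is why the overflow scenario doesn't apply to him.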
Re: [squid-users] Re: can squid load data into cache faster than sending it out?
On 11/05/11 04:34, Dave Dykstra wrote:
> On Sat, May 07, 2011 at 02:32:22PM +1200, Amos Jeffries wrote:
>> On 07/05/11 08:54, Dave Dykstra wrote:
>>> Ah, but as explained here
>>> http://www.squid-cache.org/mail-archive/squid-users/200903/0509.html
>>> this does risk using up a lot of memory, because squid keeps all of the read-ahead data in memory. I don't see a reason why it couldn't instead write it all out to the disk cache as normal and then read it back from there as needed. Is there some way to do that currently? If not,
>>
>> Squid should be writing to the cache in parallel with the data arrival, the only bit required in memory being the bit queued for sending to the client. Which gets bigger, and bigger... up to the read_ahead_gap limit.
>
> Amos,
>
> Yes, it makes sense that it's writing to the disk cache in parallel, but what I'm asking for is a way to get squid to keep reading from the origin server as fast as it can without reserving all that memory. I'm asking for an option to not block the reads from the origin server when the read_ahead_gap is full, and instead read data back from the cache to write it out when the client is ready for more. Most likely the data will still be in the filesystem cache, so it will be fast.

That will have to be a configuration option. We had a LOT of complaints when we accidentally made several 3.0 releases act that way.

>> IIRC it is supposed to be taken out of the cache_mem space available, but I've not seen anything to confirm that.
>
> I'm sure that's not the case, because I have been able to force the memory usage to grow by more than the cache_mem setting by doing a number of wgets of largish requests in parallel, using --limit-rate to make them take a long time. Besides, it doesn't make sense that it would do that when the read_ahead_gap is far greater than maximum_object_size_in_memory.

Okay. Thanks for that info.

> perhaps I'll just submit a ticket as a feature request. I *think* that under normal circumstances in my application squid won't run out of memory, but I'll see after running it in production for a while. So far I haven't seen a problem, but I can imagine ways that it could cause too much growth, so I'm worried that one day it will.

Yes, both approaches lead to problems. The trickle-feed approach used now leads to resource holding on the server. Not doing it leads to bandwidth overload as Squid downloads N objects for N clients but only has to send back one packet to each client. So it's a choice of being partially vulnerable to Slowloris-style attacks (timeouts etc. prevent full vulnerability) or packet amplification on a massive scale.

Amos
--
Please be using
  Current Stable Squid 2.7.STABLE9 or 3.1.12
  Beta testers wanted for 3.2.0.7 and 3.1.12.1
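The tradeoff Amos lays out can be made concrete with a small model. This is an editor's sketch: the 46 MB / 40-minute transfer comes from earlier in the thread, while the server-side rate is an invented assumption.

```python
# Sketch of the tradeoff: trickle-feed (current behavior) holds the
# origin-server connection at the slow client's pace; fetching ahead
# frees it as soon as the object is downloaded, shifting the cost to
# proxy-side buffering. Rates below are illustrative assumptions.

def server_conn_seconds(object_bytes: float, client_bps: float,
                        server_bps: float, fetch_ahead: bool) -> float:
    """Seconds the origin-server connection stays occupied per request."""
    if fetch_ahead:
        return object_bytes / server_bps   # freed once fully fetched
    return object_bytes / client_bps       # held at the client's pace

obj = 46 * 1024 * 1024            # the 46 MB transfer from the thread
client = obj / (40 * 60)          # the ~40-minute client, in bytes/s
server = 100 * 1024 * 1024        # assumed ~100 MB/s intra-machine link

print(server_conn_seconds(obj, client, server, fetch_ahead=False))  # ~2400 s
print(server_conn_seconds(obj, client, server, fetch_ahead=True))   # ~0.46 s
```

The same quantity of data flows either way; what changes is which scarce resource (server connections vs. proxy memory/bandwidth) absorbs the slowness, which is exactly the choice Amos describes.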
Re: [squid-users] Re: can squid load data into cache faster than sending it out?
On Wed, May 11, 2011 at 09:05:08PM +1200, Amos Jeffries wrote:
> On 11/05/11 04:34, Dave Dykstra wrote:
>> [...]
>> I'm asking for an option to not block the reads from the origin server when the read_ahead_gap is full, and instead read data back from the cache to write it out when the client is ready for more. Most likely the data will still be in the filesystem cache, so it will be fast.
>
> That will have to be a configuration option. We had a LOT of complaints when we accidentally made several 3.0 releases act that way.

That's interesting. I'm curious about what people didn't like about it; do you remember details?

>> perhaps I'll just submit a ticket as a feature request. I *think* that under normal circumstances in my application squid won't run out of memory, but I'll see after running it in production for a while. So far I haven't seen a problem, but I can imagine ways that it could cause too much growth, so I'm worried that one day it will.
>
> Yes, both approaches lead to problems. The trickle-feed approach used now leads to resource holding on the Server. Not doing it leads to bandwidth overload as Squid downloads N objects for N clients but only has to send back one packet to each client. So it's a choice of being partially vulnerable to Slowloris-style attacks (timeouts etc. prevent full vulnerability) or packet amplification on a massive scale.

Just to make sure I understand you: in both cases you're talking about attacks, not normal operation, right? And are you saying that it is easier to mitigate the trickle-feed attack than the packet-amplification attack, so trickle-feed is less bad? I'm not so worried about attacks as normal operation.

Thanks,
- Dave
Re: [squid-users] Re: can squid load data into cache faster than sending it out?
On 12/05/11 08:18, Dave Dykstra wrote:
> On Wed, May 11, 2011 at 09:05:08PM +1200, Amos Jeffries wrote:
>> [...]
>> That will have to be a configuration option. We had a LOT of complaints when we accidentally made several 3.0 releases act that way.
>
> That's interesting. I'm curious about what people didn't like about it; do you remember details?

The bandwidth overflow mentioned below.

>> [...] So it's a choice of being partially vulnerable to Slowloris-style attacks (timeouts etc. prevent full vulnerability) or packet amplification on a massive scale.
>
> Just to make sure I understand you: in both cases you're talking about attacks, not normal operation, right? And are you saying that it is easier to mitigate the trickle-feed attack than the packet-amplification attack, so trickle-feed is less bad? I'm not so worried about attacks as normal operation.

Both are real traffic types; the attack form is just artificially induced to make it worse. Like ping-flooding in the '90s, it happens normally, but not often. All it takes is a large number of slow clients requesting non-identical URLs. IIRC it was noticed worst by cellphone networks with very large numbers of very slow GSM clients.

A client connects and sends a request; Squid reads back N bytes from the server and sends N-M to the client. Repeat until all FDs available in Squid are consumed, during which time M bytes of packets are overflowing the server link for each 2 FDs used. If the total of all M is greater than the server link capacity...

Under the current design the worst case is the server running out of FDs first and rejecting new connections, or TCP protections dropping connections and Squid aborting the clients early. The overflow factor is 32 KB or 64 KB, linear with the number of FDs, and can't happen naturally where the client does read the data, just slowly.

Amos
Re: [squid-users] Re: can squid load data into cache faster than sending it out?
On Sat, May 07, 2011 at 02:32:22PM +1200, Amos Jeffries wrote:
> On 07/05/11 08:54, Dave Dykstra wrote:
>> Ah, but as explained here
>> http://www.squid-cache.org/mail-archive/squid-users/200903/0509.html
>> this does risk using up a lot of memory, because squid keeps all of the read-ahead data in memory. I don't see a reason why it couldn't instead write it all out to the disk cache as normal and then read it back from there as needed. Is there some way to do that currently? If not,
>
> Squid should be writing to the cache in parallel with the data arrival, the only bit required in memory being the bit queued for sending to the client. Which gets bigger, and bigger... up to the read_ahead_gap limit.

Amos,

Yes, it makes sense that it's writing to the disk cache in parallel, but what I'm asking for is a way to get squid to keep reading from the origin server as fast as it can without reserving all that memory. I'm asking for an option to not block the reads from the origin server when the read_ahead_gap is full, and instead read data back from the cache to write it out when the client is ready for more. Most likely the data will still be in the filesystem cache, so it will be fast.

> IIRC it is supposed to be taken out of the cache_mem space available, but I've not seen anything to confirm that.

I'm sure that's not the case, because I have been able to force the memory usage to grow by more than the cache_mem setting by doing a number of wgets of largish requests in parallel, using --limit-rate to make them take a long time. Besides, it doesn't make sense that it would do that when the read_ahead_gap is far greater than maximum_object_size_in_memory.

>> perhaps I'll just submit a ticket as a feature request. I *think* that under normal circumstances in my application squid won't run out of memory, but I'll see after running it in production for a while.

So far I haven't seen a problem, but I can imagine ways that it could cause too much growth, so I'm worried that one day it will.

- Dave
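Dave's memory-growth observation follows directly from how the read-ahead queue is bounded. A toy model (an editor's sketch; the gap and object sizes are invented for illustration) of the per-connection buffer:

```python
# Toy model of the per-connection in-memory queue: data read from the
# server but not yet delivered to the client, capped by read_ahead_gap.
# With read_ahead_gap larger than the objects (Dave's workaround), the
# cap never applies, so N parallel rate-limited clients cost roughly
# N whole objects of memory -- independent of cache_mem. Sizes are
# invented for illustration.

def buffered_bytes(fetched: int, delivered: int, read_ahead_gap: int) -> int:
    """Bytes held in memory for one client connection."""
    return min(fetched - delivered, read_ahead_gap)

gap = 64 * 1024 * 1024    # a read_ahead_gap larger than any object
obj = 10 * 1024 * 1024    # 10 MB object; client throttled with --limit-rate

# Squid has fetched the whole object while the client has taken none:
per_conn = buffered_bytes(fetched=obj, delivered=0, read_ahead_gap=gap)
print(per_conn == obj)                 # True: the cap never kicks in

# Twenty parallel slow wgets -> ~200 MB of buffers, whatever cache_mem says:
print(20 * per_conn // (1024 * 1024))  # 200
```

This also shows why the growth exceeding cache_mem is consistent: the queue is transfer buffering, not cached-object storage, so cache_mem accounting never sees it.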
Re: [squid-users] Re: can squid load data into cache faster than sending it out?
Ah, but as explained here
http://www.squid-cache.org/mail-archive/squid-users/200903/0509.html
this does risk using up a lot of memory, because squid keeps all of the read-ahead data in memory. I don't see a reason why it couldn't instead write it all out to the disk cache as normal and then read it back from there as needed. Is there some way to do that currently? If not, perhaps I'll just submit a ticket as a feature request. I *think* that under normal circumstances in my application squid won't run out of memory, but I'll see after running it in production for a while.

- Dave

On Wed, May 04, 2011 at 02:52:12PM -0500, Dave Dykstra wrote:
> I found the answer: set read_ahead_gap to a buffer larger than the largest data chunk I transfer.
>
> - Dave
>
> On Wed, May 04, 2011 at 09:11:59AM -0500, Dave Dykstra wrote:
>> I have a reverse-proxy squid on the same machine as my origin server. Sometimes queries from squid are sent around the world and can be very slow; for example, today there is one client taking 40 minutes to transfer 46MB. When the data is being transferred from the origin server, the connection between squid and the origin server is tied up for the entire 40 minutes, leaving it unavailable for other work (there's only a small number of connections allowed by the origin server to its upstream database).
>>
>> My question is: can squid be configured to take in the data from the origin server as fast as it can and cache it, and then send out the data to the client as bandwidth allows? I would want it to stream to the client during this process too, but not block the transfer from origin server to squid if the client is slow.
>>
>> I'm using squid-2.7.STABLE9, and the possibly relevant non-default squid.conf options I'm using are:
>>
>>   http_port 8000 accel defaultsite=127.0.0.1:8080
>>   cache_peer 127.0.0.1 parent 8080 0 no-query originserver
>>   collapsed_forwarding on
>>
>> - Dave
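Dave's May 4 workaround combined with his listed options might look like the following squid.conf fragment. This is a hypothetical sketch: the 64 MB value is an invented example of "larger than the largest data chunk I transfer" and would need to be sized for the actual workload.

```
# Hypothetical squid.conf sketch (squid-2.7): Dave's accelerator setup
# plus the read_ahead_gap workaround. The 64 MB value is an invented
# example; it should exceed the largest object transferred.
http_port 8000 accel defaultsite=127.0.0.1:8080
cache_peer 127.0.0.1 parent 8080 0 no-query originserver
collapsed_forwarding on
read_ahead_gap 64 MB
```

Note the memory cost discussed later in the thread: with a gap this large, each in-flight transfer can buffer an entire object in memory until the client drains it.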
Re: [squid-users] Re: can squid load data into cache faster than sending it out?
On 07/05/11 08:54, Dave Dykstra wrote:
> Ah, but as explained here
> http://www.squid-cache.org/mail-archive/squid-users/200903/0509.html
> this does risk using up a lot of memory, because squid keeps all of the read-ahead data in memory. I don't see a reason why it couldn't instead write it all out to the disk cache as normal and then read it back from there as needed. Is there some way to do that currently? If not,

Squid should be writing to the cache in parallel with the data arrival, the only bit required in memory being the bit queued for sending to the client. Which gets bigger, and bigger... up to the read_ahead_gap limit. IIRC it is supposed to be taken out of the cache_mem space available, but I've not seen anything to confirm that.

> perhaps I'll just submit a ticket as a feature request. I *think* that under normal circumstances in my application squid won't run out of memory, but I'll see after running it in production for a while.
>
> - Dave
>
> On Wed, May 04, 2011 at 02:52:12PM -0500, Dave Dykstra wrote:
>> I found the answer: set read_ahead_gap to a buffer larger than the largest data chunk I transfer.

Amos