Re: [squid-users] Re: can squid load data into cache faster than sending it out?

2011-05-12 Thread Dave Dykstra
On Thu, May 12, 2011 at 01:37:13PM +1200, Amos Jeffries wrote:
 On 12/05/11 08:18, Dave Dykstra wrote:
...
  So it's a choice of being partially vulnerable to slow-loris-style
 attacks (timeouts etc. prevent full vulnerability) or packet
 amplification on a massive scale.
 
 Just to make sure I understand you, in both cases you're talking about
 attacks, not normal operation, right?  And are you saying that it is
 easier to mitigate the trickle-feed attack than the packet-amplification
 attack, so trickle-feed is less bad?  I'm not so worried about attacks
 as normal operation.
 
 
 Both are real traffic types; the attack form is just artificially
 induced to make it worse. Like ping-flooding in the '90s, it happens
 normally, but not often. All it takes is a large number of slow
 clients requesting non-identical URLs.
 
 IIRC it was worst on cellphone networks with very large
 numbers of very slow GSM clients.
  A client connects and sends a request; Squid reads back N bytes from
 the server and sends N-M to the client. Repeat until all FDs available
 in Squid are consumed. During that time, M bytes of packets are
 overflowing the server link for every 2 FDs used. If the total of all
 M is greater than the server link capacity...
 
 Under the current design the worst case is the server running out of
 FDs first and rejecting new connections, or TCP protections dropping
 connections and Squid aborting the clients early. The overflow
 factor is 32K or 64K, linear with the number of FDs, and can't happen
 naturally where the client does read the data, just slowly.

With my application the server has a limit on the number of parallel
connections it has to its backend database, so there is no danger of
overflowing the bandwidth between the reverse-proxy squid and its server
(also, they're on the same machine so the network is intra-machine).
If there are many clients that suddenly make large requests, they are put
into a queue on the server until they get their turn, and meanwhile the
server sends keepalive messages every 5 seconds so the clients don't
time out.  With my preferred behavior, the squid would read that data
from the server as fast as possible, and then it wouldn't make any
difference to the squid-to-server link if the clients had low bandwidth
or high bandwidth.

I'll submit a feature request to Bugzilla for such an option.

Thanks a lot for your explanations, Amos.

- Dave


Re: [squid-users] Re: can squid load data into cache faster than sending it out?

2011-05-11 Thread Amos Jeffries

On 11/05/11 04:34, Dave Dykstra wrote:

On Sat, May 07, 2011 at 02:32:22PM +1200, Amos Jeffries wrote:

On 07/05/11 08:54, Dave Dykstra wrote:

Ah, but as explained here
 http://www.squid-cache.org/mail-archive/squid-users/200903/0509.html
this does risk using up a lot of memory because squid keeps all of the
read-ahead data in memory.  I don't see a reason why it couldn't instead
write it all out to the disk cache as normal and then read it back from
there as needed.  Is there some way to do that currently?  If not,


Squid should be writing to the cache in parallel with the data
arrival, the only bit required in memory being the bit queued for
sending to the client, which gets bigger and bigger... up to the
read_ahead_gap limit.


Amos,

Yes, it makes sense that it's writing to the disk cache in parallel, but
what I'm asking for is a way to get squid to keep reading from the
origin server as fast as it can without reserving all that memory.  I'm
asking for an option that does not block reading from the origin server
and writing to the cache when the read_ahead_gap is full, but instead
reads data back from the cache to write it out when the client is ready
for more.  Most likely the data will still be in the filesystem cache,
so it will be fast.


That will have to be a configuration option. We had a LOT of complaints
when we accidentally made several 3.0 releases act that way.





IIRC it is supposed to be taken out of the cache_mem space
available, but I've not seen anything to confirm that.


I'm sure that's not the case, because I have been able to force the
memory usage to grow by more than the cache_mem setting by doing a
number of wgets of largish requests in parallel, using --limit-rate to
cause them to take a long time.  Besides, it doesn't make sense that it
would do that when the read_ahead_gap is far greater than
maximum_object_size_in_memory.


Okay. Thanks for that info.




perhaps I'll just submit a ticket as a feature request.  I *think* that
under normal circumstances in my application squid won't run out of
memory, but I'll see after running it in production for a while.


So far I haven't seen a problem, but I can imagine ways that it could
cause too much growth, so I'm worried that one day it will.



Yes, both approaches lead to problems.  The trickle-feed approach used
now leads to resource holding on the server. Not doing it leads to
bandwidth overload, as Squid downloads N objects for N clients while
only having to send back one packet to each client.
 So it's a choice of being partially vulnerable to slow-loris-style
attacks (timeouts etc. prevent full vulnerability) or packet
amplification on a massive scale.


Amos
--
Please be using
  Current Stable Squid 2.7.STABLE9 or 3.1.12
  Beta testers wanted for 3.2.0.7 and 3.1.12.1


Re: [squid-users] Re: can squid load data into cache faster than sending it out?

2011-05-11 Thread Dave Dykstra
On Wed, May 11, 2011 at 09:05:08PM +1200, Amos Jeffries wrote:
 On 11/05/11 04:34, Dave Dykstra wrote:
 On Sat, May 07, 2011 at 02:32:22PM +1200, Amos Jeffries wrote:
 On 07/05/11 08:54, Dave Dykstra wrote:
 Ah, but as explained here
  http://www.squid-cache.org/mail-archive/squid-users/200903/0509.html
 this does risk using up a lot of memory because squid keeps all of the
 read-ahead data in memory.  I don't see a reason why it couldn't instead
 write it all out to the disk cache as normal and then read it back from
 there as needed.  Is there some way to do that currently?  If not,
 
 Squid should be writing to the cache in parallel with the data
 arrival, the only bit required in memory being the bit queued for
 sending to the client, which gets bigger and bigger... up to the
 read_ahead_gap limit.
 
 Amos,
 
 Yes, it makes sense that it's writing to the disk cache in parallel, but
 what I'm asking for is a way to get squid to keep reading from the
 origin server as fast as it can without reserving all that memory.  I'm
 asking for an option that does not block reading from the origin server
 and writing to the cache when the read_ahead_gap is full, but instead
 reads data back from the cache to write it out when the client is ready
 for more.  Most likely the data will still be in the filesystem cache,
 so it will be fast.
 
 That will have to be a configuration option. We had a LOT of
 complaints when we accidentally made several 3.0 releases act that way.

That's interesting.  I'm curious about what people didn't like about it;
do you remember details?


...
 perhaps I'll just submit a ticket as a feature request.  I *think* that
 under normal circumstances in my application squid won't run out of
 memory, but I'll see after running it in production for a while.
 
 So far I haven't seen a problem, but I can imagine ways that it could
 cause too much growth, so I'm worried that one day it will.
 
 Yes, both approaches lead to problems.  The trickle-feed approach
 used now leads to resource holding on the server. Not doing it leads
 to bandwidth overload, as Squid downloads N objects for N clients while
 only having to send back one packet to each client.
  So it's a choice of being partially vulnerable to slow-loris-style
 attacks (timeouts etc. prevent full vulnerability) or packet
 amplification on a massive scale.

Just to make sure I understand you, in both cases you're talking about
attacks, not normal operation, right?  And are you saying that it is
easier to mitigate the trickle-feed attack than the packet-amplification
attack, so trickle-feed is less bad?  I'm not so worried about attacks
as normal operation.

Thanks,

- Dave


Re: [squid-users] Re: can squid load data into cache faster than sending it out?

2011-05-11 Thread Amos Jeffries

On 12/05/11 08:18, Dave Dykstra wrote:

On Wed, May 11, 2011 at 09:05:08PM +1200, Amos Jeffries wrote:

On 11/05/11 04:34, Dave Dykstra wrote:

On Sat, May 07, 2011 at 02:32:22PM +1200, Amos Jeffries wrote:

On 07/05/11 08:54, Dave Dykstra wrote:

Ah, but as explained here
 http://www.squid-cache.org/mail-archive/squid-users/200903/0509.html
this does risk using up a lot of memory because squid keeps all of the
read-ahead data in memory.  I don't see a reason why it couldn't instead
write it all out to the disk cache as normal and then read it back from
there as needed.  Is there some way to do that currently?  If not,


Squid should be writing to the cache in parallel with the data
arrival, the only bit required in memory being the bit queued for
sending to the client, which gets bigger and bigger... up to the
read_ahead_gap limit.


Amos,

Yes, it makes sense that it's writing to the disk cache in parallel, but
what I'm asking for is a way to get squid to keep reading from the
origin server as fast as it can without reserving all that memory.  I'm
asking for an option that does not block reading from the origin server
and writing to the cache when the read_ahead_gap is full, but instead
reads data back from the cache to write it out when the client is ready
for more.  Most likely the data will still be in the filesystem cache,
so it will be fast.


That will have to be a configuration option. We had a LOT of
complaints when we accidentally made several 3.0 releases act that way.


That's interesting.  I'm curious about what people didn't like about it;
do you remember details?



The bandwidth overflow mentioned below.



...

perhaps I'll just submit a ticket as a feature request.  I *think* that
under normal circumstances in my application squid won't run out of
memory, but I'll see after running it in production for a while.


So far I haven't seen a problem, but I can imagine ways that it could
cause too much growth, so I'm worried that one day it will.


Yes, both approaches lead to problems.  The trickle-feed approach
used now leads to resource holding on the server. Not doing it leads
to bandwidth overload, as Squid downloads N objects for N clients while
only having to send back one packet to each client.
  So it's a choice of being partially vulnerable to slow-loris-style
attacks (timeouts etc. prevent full vulnerability) or packet
amplification on a massive scale.


Just to make sure I understand you, in both cases you're talking about
attacks, not normal operation, right?  And are you saying that it is
easier to mitigate the trickle-feed attack than the packet-amplification
attack, so trickle-feed is less bad?  I'm not so worried about attacks
as normal operation.



Both are real traffic types; the attack form is just artificially
induced to make it worse. Like ping-flooding in the '90s, it happens
normally, but not often. All it takes is a large number of slow clients
requesting non-identical URLs.


IIRC it was worst on cellphone networks with very large numbers
of very slow GSM clients.
 A client connects and sends a request; Squid reads back N bytes from the
server and sends N-M to the client. Repeat until all FDs available in
Squid are consumed. During that time, M bytes of packets are overflowing
the server link for every 2 FDs used. If the total of all M is greater
than the server link capacity...


Under the current design the worst case is the server running out of FDs
first and rejecting new connections, or TCP protections dropping
connections and Squid aborting the clients early. The overflow factor is
32K or 64K, linear with the number of FDs, and can't happen naturally
where the client does read the data, just slowly.
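
To put rough numbers on that overflow factor, here is a back-of-envelope
sketch (the gap size and client count are assumed for illustration, not
taken from this thread):

  # Each slow client lets Squid pull up to read_ahead_gap bytes from the
  # server beyond what the client has consumed, so the excess load on
  # the server link grows linearly with the number of FDs in use.
  read_ahead_gap = 64 * 1024      # bytes buffered ahead per connection
  slow_clients = 10_000           # concurrent trickle-feed clients
  excess = read_ahead_gap * slow_clients
  print(f"excess pulled ahead from server: {excess / 2**20:.0f} MiB")
  # -> 625 MiB fetched from the origin before clients have read it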


Amos
--
Please be using
  Current Stable Squid 2.7.STABLE9 or 3.1.12
  Beta testers wanted for 3.2.0.7 and 3.1.12.1


Re: [squid-users] Re: can squid load data into cache faster than sending it out?

2011-05-10 Thread Dave Dykstra
On Sat, May 07, 2011 at 02:32:22PM +1200, Amos Jeffries wrote:
 On 07/05/11 08:54, Dave Dykstra wrote:
 Ah, but as explained here
  http://www.squid-cache.org/mail-archive/squid-users/200903/0509.html
 this does risk using up a lot of memory because squid keeps all of the
 read-ahead data in memory.  I don't see a reason why it couldn't instead
 write it all out to the disk cache as normal and then read it back from
 there as needed.  Is there some way to do that currently?  If not,
 
 Squid should be writing to the cache in parallel with the data
 arrival, the only bit required in memory being the bit queued for
 sending to the client, which gets bigger and bigger... up to the
 read_ahead_gap limit.

Amos,

Yes, it makes sense that it's writing to the disk cache in parallel, but
what I'm asking for is a way to get squid to keep reading from the
origin server as fast as it can without reserving all that memory.  I'm
asking for an option that does not block reading from the origin server
and writing to the cache when the read_ahead_gap is full, but instead
reads data back from the cache to write it out when the client is ready
for more.  Most likely the data will still be in the filesystem cache,
so it will be fast.
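
As a minimal sketch of the behaviour I'm asking for (illustrative only;
this is not Squid code, and all the names are invented): one thread
drains the origin into a spool file at full speed, while a second feeds
the client from the spool at whatever pace the client can manage, so a
slow client never back-pressures the origin connection.

  import threading, time

  def drain_origin(origin, spool_path, finished):
      # Read from the origin as fast as it sends; the disk absorbs
      # the read-ahead instead of an in-memory buffer.
      with open(spool_path, "wb") as spool:
          while chunk := origin.read(64 * 1024):
              spool.write(chunk)
      finished.set()

  def feed_client(spool_path, client, finished):
      # Chase the spool file at the client's own pace.
      with open(spool_path, "rb") as spool:
          while True:
              chunk = spool.read(64 * 1024)
              if chunk:
                  client.write(chunk)       # blocks only this client
              elif finished.is_set():
                  break                     # origin done, spool drained
              else:
                  time.sleep(0.05)          # wait for more spooled data

  # Usage sketch, given file-like `origin` and `client` objects:
  #   done = threading.Event()
  #   threading.Thread(target=drain_origin,
  #                    args=(origin, "spool.tmp", done)).start()
  #   feed_client("spool.tmp", client, done)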

 IIRC it is supposed to be taken out of the cache_mem space
 available, but I've not seen anything to confirm that.

I'm sure that's not the case, because I have been able to force the
memory usage to grow by more than the cache_mem setting by doing a
number of wgets of largish requests in parallel, using --limit-rate to
cause them to take a long time.  Besides, it doesn't make sense that it
would do that when the read_ahead_gap is far greater than
maximum_object_size_in_memory.
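
For reference, that test amounts to something like the following (the
URL, rate limit, and client count here are assumed, not the actual
values used):

  import subprocess

  # Launch many artificially slow parallel fetches through the proxy so
  # each connection holds its read-ahead buffer open for a long time.
  procs = [subprocess.Popen(
              ["wget", "--limit-rate=10k", "-O", "/dev/null",
               "http://localhost:8000/some/large/object"])
           for _ in range(20)]
  for p in procs:
      p.wait()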

 perhaps I'll just submit a ticket as a feature request.  I *think* that
 under normal circumstances in my application squid won't run out of
 memory, but I'll see after running it in production for a while.

So far I haven't seen a problem, but I can imagine ways that it could
cause too much growth, so I'm worried that one day it will.

- Dave


Re: [squid-users] Re: can squid load data into cache faster than sending it out?

2011-05-06 Thread Dave Dykstra
Ah, but as explained here
http://www.squid-cache.org/mail-archive/squid-users/200903/0509.html
this does risk using up a lot of memory because squid keeps all of the
read-ahead data in memory.  I don't see a reason why it couldn't instead
write it all out to the disk cache as normal and then read it back from
there as needed.  Is there some way to do that currently?  If not,
perhaps I'll just submit a ticket as a feature request.  I *think* that
under normal circumstances in my application squid won't run out of
memory, but I'll see after running it in production for a while.

- Dave

On Wed, May 04, 2011 at 02:52:12PM -0500, Dave Dykstra wrote:
 I found the answer: set read_ahead_gap to a buffer larger than the
 largest data chunk I transfer.
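 
 For example (the 64 MB figure here is illustrative, sized above the
 46MB transfer mentioned in my first message, and not necessarily the
 value I actually used):
 
   read_ahead_gap 64 MB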
 
 - Dave
 
 On Wed, May 04, 2011 at 09:11:59AM -0500, Dave Dykstra wrote:
  I have a reverse proxy squid on the same machine as my origin server.
  Sometimes queries from squid are sent around the world and can be very
  slow; for example, today there is one client taking 40 minutes to
  transfer 46MB.  When the data is being transferred from the origin
  server, the connection between squid and the origin server is tied up
  for the entire 40 minutes, leaving it unavailable for other work
  (there's only a small number of connections allowed by the origin server
  to its upstream database).  My question is, can squid be configured to
  take in the data from the origin server as fast as it can and cache it,
  and then send out the data to the client as bandwidth allows?  I would
  want it to stream to the client during this process too, but not block
  the transfer from origin server to squid if the client is slow.
  
  I'm using squid-2.7STABLE9, and the possibly relevant non-default
  squid.conf options are:
  http_port 8000 accel defaultsite=127.0.0.1:8080
  cache_peer 127.0.0.1 parent 8080 0 no-query originserver
  collapsed_forwarding on
  
  - Dave


Re: [squid-users] Re: can squid load data into cache faster than sending it out?

2011-05-06 Thread Amos Jeffries

On 07/05/11 08:54, Dave Dykstra wrote:

Ah, but as explained here
 http://www.squid-cache.org/mail-archive/squid-users/200903/0509.html
this does risk using up a lot of memory because squid keeps all of the
read-ahead data in memory.  I don't see a reason why it couldn't instead
write it all out to the disk cache as normal and then read it back from
there as needed.  Is there some way to do that currently?  If not,


Squid should be writing to the cache in parallel with the data arrival,
the only bit required in memory being the bit queued for sending to the
client, which gets bigger and bigger... up to the read_ahead_gap limit.


IIRC it is supposed to be taken out of the cache_mem space available, 
but I've not seen anything to confirm that.



perhaps I'll just submit a ticket as a feature request.  I *think* that
under normal circumstances in my application squid won't run out of
memory, but I'll see after running it in production for a while.

- Dave

On Wed, May 04, 2011 at 02:52:12PM -0500, Dave Dykstra wrote:

I found the answer: set read_ahead_gap to a buffer larger than the
largest data chunk I transfer.



Amos
--
Please be using
  Current Stable Squid 2.7.STABLE9 or 3.1.12
  Beta testers wanted for 3.2.0.7 and 3.1.12.1