Re: squid-smp: synchronization issue & solutions

2009-11-24 Thread Sachin Malave
On Wed, Nov 25, 2009 at 7:48 AM, Amos Jeffries  wrote:
> On Tue, 24 Nov 2009 16:13:37 -0700, Alex Rousskov
>  wrote:
>> On 11/20/2009 10:59 PM, Robert Collins wrote:
>>> On Tue, 2009-11-17 at 08:45 -0700, Alex Rousskov wrote:
>> Q1. What are the major areas or units of asynchronous code
> execution?
>> Some of us may prefer large areas such as "http_port acceptor" or
>> "cache" or "server side". Others may root for AsyncJob as the
> largest
>> asynchronous unit of execution. These two approaches and their
>> implications differ a lot. There may be other designs worth
>> considering.
>>
>>> I'd like to let people start writing (and perf testing!) patches. To
>>> unblock people. I think the primary questions are:
>>>  - do we permit multiple approaches inside the same code base. E.g.
>>> OpenMP in some bits, pthreads / windows threads elsewhere, and 'job
>>> queues' or some such abstraction elsewhere ?
>>>     (I vote yes, but with caution: someone trying something we don't
>>> already do should keep it on a branch and really measure it well until
>>> it's got plenty of buy-in).
>>
>> I vote for multiple approaches at lower levels of the architecture and
>> against multiple approaches at highest level of the architecture. My Q1
>> was only about the highest levels, BTW.
>>
>> For example, I do not think it is a good idea to allow a combination of
>> OpenMP, ACE, and something else as a top-level design. Understanding,
>> supporting, and tuning such a mix would be a nightmare, IMO.
>>
>> On the other hand, using threads within some disk storage schemes while
>> using processes for things like "cache" may make a lot of sense, and we
>> already have examples of some of that working.
>>
>
> OpenMP gets an almost unanimously negative verdict from the people who know it.
>

OK


>>
>> This is why I believe that the decision of processes versus threads *at
>> the highest level* of the architecture is so important. Yes, we are,
>> can, and will use threads at lower levels. There is no argument there.
>> The question is whether we can also use threads to split Squid into
>> several instances of "major areas" like client side(s), cache(s), and
>> server side(s).
>>
>> See Henrik's email on why it is difficult to use threads at highest
>> levels. I am not convinced yet, but I do see Henrik's point, and I
>> consider the dangers he cites critical for the right Q1 answer.
>>
>>
>>>  - If we do *not* permit multiple approaches, then what approach do we
>>> want for parallelisation. E.g. a number of long lived threads that take
>>> on work, or many transient threads as particular bits of the code need
>>> threads. I favour the former (long lived 'worker' threads).
>>
>> For highest-level models, I do not think that "one job per
>> thread/process", "one call per thread/process", or any other "one little
>> short-lived something per thread/process" is a good idea. I do believe
>> we have to parallelize "major areas", and I think we should support
>> multiple instances of some of those "areas" (e.g., multiple client
>> sides). Each "major area" would be long-lived process/thread, of course.
>
> Agreed. mostly.
>
> As Rob points out the idea is for one small'ish pathway of the code to be
> run N times with different state data each time by a single thread.
>
> Sachin's initial AcceptFD thread proposal would perhaps be an exemplar for
> this type of thread: one thread does the comm layer, accept() through
> to the scheduling call hand-off to handlers outside comm, then goes back
> for the next accept().
>
> The only performance issue brought up, by you, was that this particular case
> might flood the slower main process if done first. Not all code can be done
> this way.
>
> The overheads are simply those of moving the state data in/out of the thread.
> IMO starting/stopping threads too often is a fairly bad idea. Most events will
> end up being grouped together into types (perhaps categorized by
> component, perhaps by client request, perhaps by pathway) with a small
> thread dedicated to handling that type of call.
>
>>
>> Again for higher-level models, I am also skeptical that it is a good
>> idea to just split Squid into N mostly non-cooperating nearly identical
>> instances. It may be the right first step, but I would like to offer
>> more than that in terms of overall performance and tunability.
>
> The answer to that is: of all the SMP models we theorize, that one is the
> only proven model so far.
> Administrators are already doing it on quad+ core machines, with all the
> instance management handled manually, and with a lot of performance success.
>
> In last night's discussion on IRC we covered what issues are outstanding
> in making this automatic; all are resolvable except the cache index, which is
> not easily shareable between instances.
>
>>
>> I hope the above explains why I consider Q1 critical for the meant
>> "highest level" scope and why "we already use processes and threads" is
>> certainly true but irrelevant within that scope.

Re: squid-smp: synchronization issue & solutions

2009-11-24 Thread Amos Jeffries
On Tue, 24 Nov 2009 16:13:37 -0700, Alex Rousskov
 wrote:
> On 11/20/2009 10:59 PM, Robert Collins wrote:
>> On Tue, 2009-11-17 at 08:45 -0700, Alex Rousskov wrote:
> Q1. What are the major areas or units of asynchronous code
execution?
> Some of us may prefer large areas such as "http_port acceptor" or
> "cache" or "server side". Others may root for AsyncJob as the
largest
> asynchronous unit of execution. These two approaches and their
> implications differ a lot. There may be other designs worth
> considering.
> 
>> I'd like to let people start writing (and perf testing!) patches. To
>> unblock people. I think the primary questions are:
>>  - do we permit multiple approaches inside the same code base. E.g.
>> OpenMP in some bits, pthreads / windows threads elsewhere, and 'job
>> queues' or some such abstraction elsewhere ?
>> (I vote yes, but with caution: someone trying something we don't
>> already do should keep it on a branch and really measure it well until
>> it's got plenty of buy-in).
> 
> I vote for multiple approaches at lower levels of the architecture and
> against multiple approaches at highest level of the architecture. My Q1
> was only about the highest levels, BTW.
> 
> For example, I do not think it is a good idea to allow a combination of
> OpenMP, ACE, and something else as a top-level design. Understanding,
> supporting, and tuning such a mix would be a nightmare, IMO.
> 
> On the other hand, using threads within some disk storage schemes while
> using processes for things like "cache" may make a lot of sense, and we
> already have examples of some of that working.
> 

OpenMP gets an almost unanimously negative verdict from the people who know it.

> 
> This is why I believe that the decision of processes versus threads *at
> the highest level* of the architecture is so important. Yes, we are,
> can, and will use threads at lower levels. There is no argument there.
> The question is whether we can also use threads to split Squid into
> several instances of "major areas" like client side(s), cache(s), and
> server side(s).
> 
> See Henrik's email on why it is difficult to use threads at highest
> levels. I am not convinced yet, but I do see Henrik's point, and I
> consider the dangers he cites critical for the right Q1 answer.
> 
> 
>>  - If we do *not* permit multiple approaches, then what approach do we
>> want for parallelisation. E.g. a number of long lived threads that take
>> on work, or many transient threads as particular bits of the code need
>> threads. I favour the former (long lived 'worker' threads).
> 
> For highest-level models, I do not think that "one job per
> thread/process", "one call per thread/process", or any other "one little
> short-lived something per thread/process" is a good idea. I do believe
> we have to parallelize "major areas", and I think we should support
> multiple instances of some of those "areas" (e.g., multiple client
> sides). Each "major area" would be long-lived process/thread, of course.

Agreed. mostly.

As Rob points out the idea is for one small'ish pathway of the code to be
run N times with different state data each time by a single thread.

Sachin's initial AcceptFD thread proposal would perhaps be an exemplar for
this type of thread: one thread does the comm layer, accept() through
to the scheduling call hand-off to handlers outside comm, then goes back
for the next accept().

The only performance issue brought up, by you, was that this particular case
might flood the slower main process if done first. Not all code can be done
this way.

The overheads are simply those of moving the state data in/out of the thread.
IMO starting/stopping threads too often is a fairly bad idea. Most events will
end up being grouped together into types (perhaps categorized by
component, perhaps by client request, perhaps by pathway) with a small
thread dedicated to handling that type of call.

> 
> Again for higher-level models, I am also skeptical that it is a good
> idea to just split Squid into N mostly non-cooperating nearly identical
> instances. It may be the right first step, but I would like to offer
> more than that in terms of overall performance and tunability.

The answer to that is: of all the SMP models we theorize, that one is the
only proven model so far.
Administrators are already doing it on quad+ core machines, with all the
instance management handled manually, and with a lot of performance success.

In last night's discussion on IRC we covered what issues are outstanding
in making this automatic; all are resolvable except the cache index, which is
not easily shareable between instances.

> 
> I hope the above explains why I consider Q1 critical for the meant
> "highest level" scope and why "we already use processes and threads" is
> certainly true but irrelevant within that scope.
> 
> 
> Thank you,
> 
> Alex.

Thank you for clarifying that. I now think we are all more or less headed
in the same direction(s). With three models proposed for t

Re: squid-smp: synchronization issue & solutions

2009-11-24 Thread Robert Collins
On Tue, 2009-11-24 at 16:13 -0700, Alex Rousskov wrote:

> For example, I do not think it is a good idea to allow a combination of
> OpenMP, ACE, and something else as a top-level design. Understanding,
> supporting, and tuning such a mix would be a nightmare, IMO.

I think that would be hard, yes.

> See Henrik's email on why it is difficult to use threads at highest
> levels. I am not convinced yet, but I do see Henrik's point, and I
> consider the dangers he cites critical for the right Q1 answer.

> >  - If we do *not* permit multiple approaches, then what approach do we
> > want for parallelisation. E.g. a number of long lived threads that take
> > on work, or many transient threads as particular bits of the code need
> > threads. I favour the former (long lived 'worker' threads).
> 
> For highest-level models, I do not think that "one job per
> thread/process", "one call per thread/process", or any other "one little
> short-lived something per thread/process" is a good idea.

Neither do I. Short lived things have a high overhead. But consider that
a queue of tasks in a single long lived thread doesn't have the high
overhead of making a new thread or process per item in the queue. Using
ACLs as an example, ACL checking is callback based nearly everywhere; we
could have a thread that does ACL checking and free up the main thread
to continue doing work. Later on, with more auditing we could have
multiple concurrent ACL checking threads.
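Rob's long-lived worker idea could be sketched roughly as follows (hypothetical names, not Squid code; match() stands in for the real ACL evaluation):

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>

// One persistent worker thread drains a queue of ACL checks. The main
// thread queues a check plus a callback and continues doing work; the
// result is delivered via the callback, matching the existing
// callback-based ACL API.
class AclWorker {
public:
    using Callback = std::function<void(bool allowed)>;

    explicit AclWorker(std::function<bool(const std::string&)> match)
        : match_(std::move(match)), worker_([this] { loop(); }) {}

    ~AclWorker() {
        { std::lock_guard<std::mutex> g(mx_); stop_ = true; }
        cv_.notify_one();
        worker_.join();        // drains remaining tasks, then exits
    }

    // Called from the main thread; returns immediately.
    void check(std::string subject, Callback cb) {
        {
            std::lock_guard<std::mutex> g(mx_);
            queue_.push_back({std::move(subject), std::move(cb)});
        }
        cv_.notify_one();
    }

private:
    struct Task { std::string subject; Callback cb; };

    void loop() {
        for (;;) {
            Task t;
            {
                std::unique_lock<std::mutex> g(mx_);
                cv_.wait(g, [this] { return stop_ || !queue_.empty(); });
                if (queue_.empty())
                    return;            // stop requested and queue drained
                t = std::move(queue_.front());
                queue_.pop_front();
            }
            t.cb(match_(t.subject));   // run the check outside the lock
        }
    }

    std::function<bool(const std::string&)> match_;
    std::mutex mx_;
    std::condition_variable cv_;
    std::deque<Task> queue_;
    bool stop_ = false;
    std::thread worker_;               // declared last: starts after members
};
```

Adding more concurrent checking threads later would mean little more than spawning additional workers on the same queue, once the ACL data they read is audited for sharing.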

-Rob




Re: squid-smp: synchronization issue & solutions

2009-11-24 Thread Alex Rousskov
On 11/20/2009 10:59 PM, Robert Collins wrote:
> On Tue, 2009-11-17 at 08:45 -0700, Alex Rousskov wrote:
 Q1. What are the major areas or units of asynchronous code execution?
 Some of us may prefer large areas such as "http_port acceptor" or
 "cache" or "server side". Others may root for AsyncJob as the largest
 asynchronous unit of execution. These two approaches and their
 implications differ a lot. There may be other designs worth considering.

> I'd like to let people start writing (and perf testing!) patches. To
> unblock people. I think the primary questions are:
>  - do we permit multiple approaches inside the same code base. E.g.
> OpenMP in some bits, pthreads / windows threads elsewhere, and 'job
> queues' or some such abstraction elsewhere ?
> (I vote yes, but with caution: someone trying something we don't
> already do should keep it on a branch and really measure it well until
> it's got plenty of buy-in).

I vote for multiple approaches at lower levels of the architecture and
against multiple approaches at highest level of the architecture. My Q1
was only about the highest levels, BTW.

For example, I do not think it is a good idea to allow a combination of
OpenMP, ACE, and something else as a top-level design. Understanding,
supporting, and tuning such a mix would be a nightmare, IMO.

On the other hand, using threads within some disk storage schemes while
using processes for things like "cache" may make a lot of sense, and we
already have examples of some of that working.


This is why I believe that the decision of processes versus threads *at
the highest level* of the architecture is so important. Yes, we are,
can, and will use threads at lower levels. There is no argument there.
The question is whether we can also use threads to split Squid into
several instances of "major areas" like client side(s), cache(s), and
server side(s).

See Henrik's email on why it is difficult to use threads at highest
levels. I am not convinced yet, but I do see Henrik's point, and I
consider the dangers he cites critical for the right Q1 answer.


>  - If we do *not* permit multiple approaches, then what approach do we
> want for parallelisation. E.g. a number of long lived threads that take
> on work, or many transient threads as particular bits of the code need
> threads. I favour the former (long lived 'worker' threads).

For highest-level models, I do not think that "one job per
thread/process", "one call per thread/process", or any other "one little
short-lived something per thread/process" is a good idea. I do believe
we have to parallelize "major areas", and I think we should support
multiple instances of some of those "areas" (e.g., multiple client
sides). Each "major area" would be long-lived process/thread, of course.

Again for higher-level models, I am also skeptical that it is a good
idea to just split Squid into N mostly non-cooperating nearly identical
instances. It may be the right first step, but I would like to offer
more than that in terms of overall performance and tunability.

I hope the above explains why I consider Q1 critical for the meant
"highest level" scope and why "we already use processes and threads" is
certainly true but irrelevant within that scope.


Thank you,

Alex.


Re: squid-smp: synchronization issue & solutions

2009-11-24 Thread Sachin Malave
On Tue, Nov 24, 2009 at 6:08 PM, Henrik Nordstrom
 wrote:
> ons 2009-11-25 klockan 00:55 +1300 skrev Amos Jeffries:
>
>> I kind of mean that by the "smaller units". I'm thinking primarily here
> of the internal DNS. Its API is very isolated from the work.
>
> And also a good example of where the CPU usage is negligible.
>
> And no, it's not really that isolated. It's allocating data for the
> response which is then handed to the caller, and modified in other parts
> of the code via ipcache..
>
> But yes, it's a good example of where one can try scheduling the
> processing on a separate thread to experiment with such model.

It's not only about how much CPU usage we distribute among the threads.
We also have to consider that a thread works best inside its own memory,
with little (ideally no) shared data. If we can let a thread work inside
its own private memory most of the time, it is worth creating the
thread: a thread scheduled on a core and accessing its own cache will
definitely speed up our Squid.

We also have to consider how the OS does read/write operations, because
all write operations must be done serially if a WRITE-THROUGH policy is
used to update all levels of memory (cache or main); otherwise there are
no issues.
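The thread-private-memory point can be illustrated with a toy sketch (hypothetical, not Squid code): each thread accumulates in a private local that stays in its core's cache, and touches shared memory only once at the end.

```cpp
#include <cstdint>
#include <thread>
#include <vector>

// Each thread works entirely in a thread-private accumulator, then makes
// a single write to its own slot of the shared vector. No locks needed,
// and the hot loop never leaves the core's cache.
std::uint64_t count_in_parallel(std::uint64_t per_thread, unsigned nthreads) {
    std::vector<std::uint64_t> partial(nthreads, 0);  // one slot per thread
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < nthreads; ++t) {
        threads.emplace_back([&partial, t, per_thread] {
            std::uint64_t local = 0;      // thread-private: no sharing
            for (std::uint64_t i = 0; i < per_thread; ++i)
                ++local;
            partial[t] = local;           // single write to shared memory
        });
    }
    std::uint64_t total = 0;
    for (unsigned t = 0; t < nthreads; ++t) {
        threads[t].join();
        total += partial[t];
    }
    return total;
}
```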



> Regards
> Henrik
>
>



-- 
Mr. S. H. Malave
Computer Science & Engineering Department,
Walchand College of Engineering,Sangli.
sachinmal...@wce.org.in


Re: squid-smp: synchronization issue & solutions

2009-11-24 Thread Henrik Nordstrom
ons 2009-11-25 klockan 00:55 +1300 skrev Amos Jeffries:

> I kind of mean that by the "smaller units". I'm thinking primarily here 
> of the internal DNS. Its API is very isolated from the work.

And also a good example of where the CPU usage is negligible.

And no, it's not really that isolated. It's allocating data for the
response which is then handed to the caller, and modified in other parts
of the code via ipcache..

But yes, it's a good example of where one can try scheduling the
processing on a separate thread to experiment with such model.

Regards
Henrik



Re: squid-smp: synchronization issue & solutions

2009-11-24 Thread Amos Jeffries

Henrik Nordstrom wrote:

sön 2009-11-22 klockan 00:12 +1300 skrev Amos Jeffries:

I think we can open the doors earlier than after that. I'm happy with an 
approach that would see the smaller units of Squid growing in 
parallelism to encompass two full cores.


And I have a more careful opinion.

Introducing threads in the current Squid core processing is very
non-trivial. This is due to the relatively high amount of shared data with
no access protection. We already have sufficient nightmares from data
access synchronization issues in the current non-threaded design, and
trying to synchronize access in threaded operation is many orders of
magnitude more complex.

The day the code base is cleaned up to the level that one can actually
assess what data is being accessed where, threads may become a viable
discussion; but as things are today it's almost impossible to judge what
data will be directly or indirectly accessed by any larger operation.


I kind of mean that by the "smaller units". I'm thinking primarily here 
of the internal DNS. Its API is very isolated from the work.





Using threads for micro operations will not help us. The overhead
involved in scheduling an operation to a thread is large compared to
most operations we are performing, and if we add to this the amount of
synchronization needed to shield the data accessed by that operation,
the overhead will in nearly all cases far outweigh the actual
processing time of the micro operations, resulting only in a net loss of
performance. There are some isolated cases I can think of, like SSL
handshake negotiation, where the actual processing may be significant,
but at the general level I don't see many operations which would be
candidates for micro threading.


These are the ones I can see without really looking ...

 * receive DNS packet,
 * validate
 * add to cache
 * schedule event
 * repeat
::shared: call event data, IP memory block (copy?), queue access, any 
stats counted


or the one Sachin found:
 * accept connection
 * perform NAT if needed
 * perform SSL handshakes if needed
 * generate connection state objects
 * schedule
 * repeat
::shared: state data object (write), SSL context (read-only?), call 
event data, call queue access, any stats counted


or the request body pump is a dead-end for handling:
 * read data chunk
 * compress/decompress
 * write to disk
 * write data chunk to client
 * repeat
::shared: state data object (read-only, if thread provides its own data 
buffer), 2N FD data (read-only), any stats counted



Yes, this last one is overkill unless we bunch up the concurrency a
little/lot in each thread, so that the request body data pump can pull/push
up to N active client connections at once.
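The pathways listed above share a shape: per-item work done privately in the thread, with only a brief locked hand-off touching shared state (the schedule queue and stats counters). A minimal sketch of one iteration, with hypothetical names (not Squid code):

```cpp
#include <functional>
#include <mutex>
#include <queue>

// The shared surface of such a pathway thread: a hand-off queue for the
// main event loop plus stats counters, guarded by one briefly-held lock.
struct SharedState {
    std::mutex mx;
    std::queue<int> scheduled;   // events handed off to the main loop
    long stats_handled = 0;
};

// One iteration of e.g. the DNS pathway: validate and build the event
// privately (no lock held), then take the lock only for the hand-off
// and the stats bump.
void pathway_step(SharedState& s, int packet,
                  const std::function<bool(int)>& validate) {
    if (!validate(packet))       // private work: no lock held
        return;
    int event = packet * 2;      // stand-in for "add to cache / build event"
    std::lock_guard<std::mutex> g(s.mx);
    s.scheduled.push(event);
    ++s.stats_handled;
}
```

The cost per item is then one mutex acquisition plus the copy of the event data, which is the overhead profile the discussion above assumes.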




Using threads for isolated things like disk I/O is one thing. The code
running in those threads is very isolated and limited in what it's
allowed to do (it may only access the data given to it, and may NOT
allocate new data or look up any other global data), but is still heavily
penalized by synchronization overhead. Further, the only reason we
have the threaded I/O model is that POSIX AIO does not provide a rich
enough interface, missing open/close operations which may both block for
a significant amount of time, so we had to implement our own alternative
having open/close operations. If you look closely at the threaded I/O
code you will see that it goes to great lengths to isolate the
threads from the main code, with obvious performance drawbacks. The
initial code went even further in isolation, but core changes have
over time provided a somewhat more suitable environment for some of
those operations.


For the same reasons I don't see OpenMP as fitting the problem scope
we have. The strength of OpenMP is parallelizing CPU-intensive regions
of the code which are well defined in what data they
access, not dealing with a large number of concurrent operations with
access to unknown amounts of shared data.



Trying to thread the Squid core engine is in many ways similar to the
problems kernel developers have had to fight in making the OS kernels
multithreaded, except that we don't even have threads of execution (the
OS developers at least had processes). If trying to do the same with the
Squid code then we would need an approach like the following:

1. Create a big Squid main lock, always held except for audited regions
known to use more fine grained locking.

2. Set up N threads of execution, all initially fighting for that big
main lock in each operation.

3. Gradually work over the code, identifying areas where the big lock
need not be held, and transition to more fine-grained locking,
starting at the main loops and working down from there.

This is not a path I favor for the Squid code. It's a transition which
is larger than the Squid-3 transition, and which has even bigger
negative impacts on performance until most of the work has been
completed.



Another alternative is to start on Squid-4, rewriting the code base
completely from scratch, starting from a parallel design and then plugging
in any pieces that can be rescued from earlier Squid generations.

Re: squid-smp: synchronization issue & solutions

2009-11-24 Thread Henrik Nordstrom
sön 2009-11-22 klockan 00:12 +1300 skrev Amos Jeffries:

> I think we can open the doors earlier than after that. I'm happy with an 
> approach that would see the smaller units of Squid growing in 
> parallelism to encompass two full cores.

And I have a more careful opinion.

Introducing threads in the current Squid core processing is very
non-trivial. This is due to the relatively high amount of shared data with
no access protection. We already have sufficient nightmares from data
access synchronization issues in the current non-threaded design, and
trying to synchronize access in threaded operation is many orders of
magnitude more complex.

The day the code base is cleaned up to the level that one can actually
assess what data is being accessed where, threads may become a viable
discussion; but as things are today it's almost impossible to judge what
data will be directly or indirectly accessed by any larger operation.

Using threads for micro operations will not help us. The overhead
involved in scheduling an operation to a thread is large compared to
most operations we are performing, and if we add to this the amount of
synchronization needed to shield the data accessed by that operation,
the overhead will in nearly all cases far outweigh the actual
processing time of the micro operations, resulting only in a net loss of
performance. There are some isolated cases I can think of, like SSL
handshake negotiation, where the actual processing may be significant,
but at the general level I don't see many operations which would be
candidates for micro threading.

Using threads for isolated things like disk I/O is one thing. The code
running in those threads is very isolated and limited in what it's
allowed to do (it may only access the data given to it, and may NOT
allocate new data or look up any other global data), but is still heavily
penalized by synchronization overhead. Further, the only reason we
have the threaded I/O model is that POSIX AIO does not provide a rich
enough interface, missing open/close operations which may both block for
a significant amount of time, so we had to implement our own alternative
having open/close operations. If you look closely at the threaded I/O
code you will see that it goes to great lengths to isolate the
threads from the main code, with obvious performance drawbacks. The
initial code went even further in isolation, but core changes have
over time provided a somewhat more suitable environment for some of
those operations.


For the same reasons I don't see OpenMP as fitting the problem scope
we have. The strength of OpenMP is parallelizing CPU-intensive regions
of the code which are well defined in what data they
access, not dealing with a large number of concurrent operations with
access to unknown amounts of shared data.



Trying to thread the Squid core engine is in many ways similar to the
problems kernel developers have had to fight in making the OS kernels
multithreaded, except that we don't even have threads of execution (the
OS developers at least had processes). If trying to do the same with the
Squid code then we would need an approach like the following:

1. Create a big Squid main lock, always held except for audited regions
known to use more fine grained locking.

2. Set up N threads of execution, all initially fighting for that big
main lock in each operation.

3. Gradually work over the code, identifying areas where the big lock
need not be held, and transition to more fine-grained locking,
starting at the main loops and working down from there.
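The three steps could be sketched as follows (a toy illustration with hypothetical names, not Squid code; two counters stand in for unaudited core state and an audited region):

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Step 1: the big Squid main lock, held by default for all core work.
std::mutex big_squid_lock;
// Step 3: an audited region that has been moved to its own fine lock.
std::mutex fine_stats_lock;

long core_state = 0;   // unaudited data: requires the big lock
long stats = 0;        // audited data: its own lock suffices

void worker_iteration() {
    {
        // Step 2: every thread contends for the big lock for core work.
        std::lock_guard<std::mutex> g(big_squid_lock);
        ++core_state;
    }
    {
        // Step 3: this region was audited, so it takes only its own lock,
        // no longer serializing against the rest of the core.
        std::lock_guard<std::mutex> g(fine_stats_lock);
        ++stats;
    }
}

void run_workers(unsigned nthreads, int iterations) {
    std::vector<std::thread> ts;
    for (unsigned i = 0; i < nthreads; ++i)
        ts.emplace_back([iterations] {
            for (int k = 0; k < iterations; ++k)
                worker_iteration();
        });
    for (auto& t : ts)
        t.join();
}
```

The sketch also shows the cost Henrik describes: until most regions reach step 3, nearly everything still serializes on big_squid_lock.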

This is not a path I favor for the Squid code. It's a transition which
is larger than the Squid-3 transition, and which has even bigger
negative impacts on performance until most of the work has been
completed.



Another alternative is to start on Squid-4, rewriting the code base
completely from scratch starting at a parallel design and then plug in
any pieces that can be rescued from earlier Squid generations if any.
But for obvious staffing reasons this is an approach I do not recommend
in this project. It's effectively starting another project, with very
little shared with the Squid we have today.


For these reasons I am more in favor of multi-process approaches. The
amount of work needed to make Squid multi-process capable is fairly
limited and mainly revolves around the cache index and a couple of
other areas that need to be shared for proper operation. We can fully
parallelize Squid today at the process level if we disable the persistent
shared cache + digest auth, and this is done by many users already. Squid-2
can even do it on the same http_port, letting the OS schedule connections
to the available Squid processes.


Regards
Henrik



Re: squid-smp: synchronization issue & solutions

2009-11-21 Thread Amos Jeffries

Robert Collins wrote:

On Tue, 2009-11-17 at 08:45 -0700, Alex Rousskov wrote:



Important features of OpenMP you might be interested in...

** If your compiler does not support OpenMP you don't have to do
anything special: the compiler simply ignores these #pragmas
and runs the code as a sequential single-threaded program,
without affecting the end goal.
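That graceful-degradation property can be illustrated with a toy example (not Squid code): compiled with OpenMP the loop is split across threads; without it the pragma is ignored and the same code runs sequentially, producing the identical result either way.

```cpp
#include <cstddef>

// With OpenMP enabled (-fopenmp), the iterations are divided among
// threads and the per-thread partial sums are combined by the
// reduction clause. Without OpenMP, the #pragma is ignored and the
// loop runs sequentially; the returned value is the same.
long sum_array(const long* data, std::size_t n) {
    long total = 0;
#pragma omp parallel for reduction(+:total)
    for (long i = 0; i < static_cast<long>(n); ++i)
        total += data[i];
    return total;
}
```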


I don't think this is useful to us: all the platforms we consider
important have threading libraries. Support does seem widespread.
 

** Programmers need not create any locking mechanism or worry
about critical sections.


We have to worry about this because:
 - OpenMP is designed for large data set manipulation
 - few of our datasets are large except for:
   - some ACL's
   - the main hash table

So we'll need separate threads created around large constructs like
'process a request' (unless we take a thread-per-CPU approach and a
queue of jobs). Either approach will require careful synchronisation on
the 20 or so shared data structures.


** By default it creates a number of threads equal to the number of
processors (× cores per processor) in your system.

All of the above make me think that OPENMP-enabled Squid may be
significantly slower than multi-instance Squid. I doubt OPENMP is so
smart that it can correctly and efficiently orchestrate the work of
Squid "threads" that are often not even visible/identifiable in the
current code.


I think it could, if we had a shared-nothing model under the hood so
that we could 'simply' parallelise the front end dispatch and let
everything run. However, that doesn't really fit our problem.


- Designed for parallelizing computation-intensive programs such as
various math models running on massively parallel computers. AFAICT, the
OpenMP steering group is comprised of folks that deal with such models
in such environments. Our environment and performance goals are rather
different.


But that doesn't mean that we cannot have independent threads.

It means that there is a high probability that it will not work well for
other, very different, problem areas. It may work, but not work well enough.


I agree. From my reading OpenMP isn't really suitable for our domain.
I've asked around a little and no one has said "Yes! You should do it."
Similar servers I know of, like Drizzle (MySQL), do not use it.


I think our first questions should instead include:

Q1. What are the major areas or units of asynchronous code execution?
Some of us may prefer large areas such as "http_port acceptor" or
"cache" or "server side". Others may root for AsyncJob as the largest
asynchronous unit of execution. These two approaches and their
implications differ a lot. There may be other designs worth considering.


I'd like to let people start writing (and perf testing!) patches. To
unblock people. I think the primary questions are:
 - do we permit multiple approaches inside the same code base. E.g.
OpenMP in some bits, pthreads / windows threads elsewhere, and 'job
queues' or some such abstraction elsewhere ?
(I vote yes, but with caution: someone trying something we don't
already do should keep it on a branch and really measure it well until
it's got plenty of buy-in).


I'm also in favor of the mixed approach, with care that the particular
approach taken at each point is appropriate for the operation being done.
For example, I wouldn't place each Call into a process, but a thread
each might be arguable; whereas a Job might be a process with multiple
threads, or a thread with async hops in time.




 - If we do *not* permit multiple approaches, then what approach do we
want for parallelisation. E.g. a number of long lived threads that take
on work, or many transient threads as particular bits of the code need
threads. I favour the former (long lived 'worker' threads).

If we can reach either a 'yes' on the first of these two questions or a
decision on the second, then folk can start working on their favourite
part of the code base. As long as its well tested and delivered with
appropriate synchronisation, I think the benefit of letting folk scratch
itches will be considerable.

I know you have processes vs threads as a key question, but I don't
actually think it is.


I don't think so either. Sounds like a good Q, but it's a choice of two
alternatives where the best alternative is number 3: both.


We _already_ have a mixed environment. The helpers and diskd/unlinkd are
perfect examples of having chosen the process model for some small
internal units of Squid, and idns vs dnsserver is an example of
the other choice being made.


We are not deciding how to make Squid parallel, but how to make it
massively _more_ parallel than it already is.




We *already* have significant experience with threads (the threaded disk
I/O engine) and multiple processes (the diskd I/O engine, helpers). We
shouldn't require a single answer for breaking Squid up, but rather good
analysis by the person doing the work on breaking a particular bit of it up.



Re: squid-smp: synchronization issue & solutions

2009-11-20 Thread Robert Collins
On Tue, 2009-11-17 at 08:45 -0700, Alex Rousskov wrote:


> > Important features of  OPENMP, you might be interested in...
> > 
> > ** If your compiler is not supporting OPENMP then you dont have to do
> > any special thing, Compiler simply ignores these #pragmas..
> > and runs codes as if they are in sequential single thread program,
> > without affecting the end goal.

I don't think this is useful to us: all the platforms we consider
important have threading libraries. Support does seem widespread.
 
> > ** Programmers need not to create any locking mechanism and worry
> > about critical sections,

We have to worry about this because:
 - OpenMP is designed for large data set manipulation
 - few of our datasets are large except for:
   - some ACLs
   - the main hash table

So we'll need separate threads created around large constructs like
'process a request' (unless we take a thread-per-CPU approach and a
queue of jobs). Either approach will require careful synchronisation on
the 20 or so shared data structures.

> > ** By default it creates number threads equals to processors( * cores
> > per processor) in your system.
> 
> All of the above make me think that OPENMP-enabled Squid may be
> significantly slower than multi-instance Squid. I doubt OPENMP is so
> smart that it can correctly and efficiently orchestrate the work of
> Squid "threads" that are often not even visible/identifiable in the
> current code.

I think it could, if we had a shared-nothing model under the hood so
that we could 'simply' parallelise the front end dispatch and let
everything run. However, that doesn't really fit our problem.

> >> - Designed for parallelizing computation-intensive programs such as
> >> various math models running on massively parallel computers. AFAICT, the
> >> OpenMP steering group is comprised of folks that deal with such models
> >> in such environments. Our environment and performance goals are rather
> >> different.
> >>
> > 
> > But that doesnt mean that we can not have independent threads,
> 
> It means that there is a high probability that it will not work well for
> other, very different, problem areas. It may work, but not work well enough.

I agree. From my reading, OpenMP isn't really suitable to our domain.
I've asked around a little and no one has said 'Yes! You should do it'.
The similar servers I know of, like Drizzle (MySQL), do not do it.

> >> I think our first questions should instead include:
> >>
> >> Q1. What are the major areas or units of asynchronous code execution?
> >> Some of us may prefer large areas such as "http_port acceptor" or
> >> "cache" or "server side". Others may root for AsyncJob as the largest
> >> asynchronous unit of execution. These two approaches and their
> >> implications differ a lot. There may be other designs worth considering.

I'd like to let people start writing (and perf testing!) patches, to
unblock people. I think the primary questions are:
 - Do we permit multiple approaches inside the same code base? E.g.
OpenMP in some bits, pthreads / Windows threads elsewhere, and 'job
queues' or some such abstraction elsewhere?
(I vote yes, but with caution: someone trying something we don't
already do should keep it on a branch and really measure it well until
it's got plenty of buy-in.)

 - If we do *not* permit multiple approaches, then what approach do we
want for parallelisation? E.g. a number of long-lived threads that take
on work, or many transient threads as particular bits of the code need
them? I favour the former (long-lived 'worker' threads).

If we can reach either a 'yes' on the first of these two questions or a
decision on the second, then folk can start working on their favourite
part of the code base. As long as it's well tested and delivered with
appropriate synchronisation, I think the benefit of letting folk scratch
itches will be considerable.

I know you have processes vs threads as a key question, but I don't
actually think it is.

We *already* have significant experience with threads (the threaded disk I/O
engine) and multiple processes (the diskd I/O engine, helpers). We shouldn't
require a single answer for breaking Squid up, but rather good analysis by
the person doing the work on breaking a particular bit of it up.


> > I AM THINKING ABOUT HYBRID OF BOTH...
> > 
> > Somebody might implement process model, Then we would merge both
> > process and thread models .. together we could have a better squid..
> > :)
> > What do u think? 
> 
> I doubt we have the resources to do a generic process model so I would
> rather decide on a single primary direction (processes or threads) and
> try to generalize that later if needed. However, a process (if we decide
> to go down that route) may still have lower-level threads, but that is a
> secondary question/decision.

We could simply adopt ACE wholesale and focus on the squid unique bits
of the stack. Squid is a pretty typical 'all in one' bundle at the
moment; I'd like to see us focus and reuse/split ou

Re: squid-smp: synchronization issue & solutions

2009-11-19 Thread Adrian Chadd
Right. That's the easy bit. I could even do that in Squid-2 with a
little bit of luck. The hard bit is rewriting the relevant code which
relies on cbdata-style reference counting behaviour. That is the
tricky bit.



Adrian

2009/11/20 Robert Collins :
> On Wed, 2009-11-18 at 10:46 +0800, Adrian Chadd wrote:
>> Plenty of kernels nowdays do a bit of TCP and socket process in
>> process/thread context; so you need to do your socket TX/RX in
>> different processes/threads to get parallelism in the networking side
>> of things.
>
> Very good point.
>
>> You could fake it somewhat by pushing socket IO into different threads
>> but then you have all the overhead of shuffling IO and completed IO
>> between threads. This may be .. complicated.
>
> The event loop I put together for -3 should be able to do that without
> changing the loop - just extending the modules that hook into it.
>
> -Rob
>


Re: squid-smp: synchronization issue & solutions

2009-11-19 Thread Robert Collins
On Wed, 2009-11-18 at 10:46 +0800, Adrian Chadd wrote:
> Plenty of kernels nowdays do a bit of TCP and socket process in
> process/thread context; so you need to do your socket TX/RX in
> different processes/threads to get parallelism in the networking side
> of things.

Very good point.

> You could fake it somewhat by pushing socket IO into different threads
> but then you have all the overhead of shuffling IO and completed IO
> between threads. This may be .. complicated.

The event loop I put together for -3 should be able to do that without
changing the loop - just extending the modules that hook into it.

-Rob




Re: squid-smp: synchronization issue & solutions

2009-11-17 Thread Adrian Chadd
Plenty of kernels nowadays do a bit of TCP and socket processing in
process/thread context, so you need to do your socket TX/RX in
different processes/threads to get parallelism in the networking side
of things.

You could fake it somewhat by pushing socket IO into different threads
but then you have all the overhead of shuffling IO and completed IO
between threads. This may be .. complicated.


Adrian

2009/11/18 Gonzalo Arana :
> On Tue, Nov 17, 2009 at 12:45 PM, Alex Rousskov
>  wrote:
>> On 11/17/2009 04:09 AM, Sachin Malave wrote:
>>
>> 
>>
>>> I AM THINKING ABOUT HYBRID OF BOTH...
>>>
>>> Somebody might implement process model, Then we would merge both
>>> process and thread models .. together we could have a better squid..
>>> :)
>>> What do u think? 
>
> In my limited squid expierence, cpu usage is hardly a bottleneck.  So,
> why not just use smp for the cpu/disk-intensive parts?
>
> The candidates I can think of are:
>  * evaluating regular expressions (url_regex acls).
>  * aufs/diskd (squid already has support for this).
>
> Best regards,
>
> --
> Gonzalo A. Arana
>
>


Re: squid-smp: synchronization issue & solutions

2009-11-17 Thread Robert Collins
On Tue, 2009-11-17 at 15:49 -0300, Gonzalo Arana wrote:


> In my limited squid expierence, cpu usage is hardly a bottleneck.  So,
> why not just use smp for the cpu/disk-intensive parts?
> 
> The candidates I can think of are:
>   * evaluating regular expressions (url_regex acls).
>   * aufs/diskd (squid already has support for this).

So, we can drive Squid to 100% CPU in production high-load environments.
To scale further we need:
 - more CPUs
 - more performance from the CPUs we have

Adrian is working on the latter, and the SMP discussion is about the
former. Simply putting each request in its own thread would go a long
way towards getting much more bang for buck - but that's not actually
trivial to do :)

-Rob




Re: squid-smp: synchronization issue & solutions

2009-11-17 Thread Sachin Malave
On Tue, Nov 17, 2009 at 9:15 PM, Alex Rousskov
 wrote:
> On 11/17/2009 04:09 AM, Sachin Malave wrote:
>
>>> After spending 2 minutes on openmp.org, I am not very excited about
>>> using OpenMP. Please correct me if I am wrong, but OpenMP seems to be:
>>>
>>> - An "approach" or "model" requiring compiler support and language
>>> extensions. It is _not_ a library. You examples with #pragmas is a good
>>> illustration.
>
>> Important features of  OPENMP, you might be interested in...
>>
>> ** If your compiler is not supporting OPENMP then you dont have to do
>> any special thing, Compiler simply ignores these #pragmas..
>> and runs codes as if they are in sequential single thread program,
>> without affecting the end goal.
>>
>> ** Programmers need not to create any locking mechanism and worry
>> about critical sections,
>>
>> ** By default it creates number threads equals to processors( * cores
>> per processor) in your system.
>
> All of the above make me think that OPENMP-enabled Squid may be
> significantly slower than multi-instance Squid. I doubt OPENMP is so
> smart that it can correctly and efficiently orchestrate the work of
> Squid "threads" that are often not even visible/identifiable in the
> current code.
>
>>> - Designed for parallelizing computation-intensive programs such as
>>> various math models running on massively parallel computers. AFAICT, the
>>> OpenMP steering group is comprised of folks that deal with such models
>>> in such environments. Our environment and performance goals are rather
>>> different.
>>>
>>
>> But that doesnt mean that we can not have independent threads,
>
> It means that there is a high probability that it will not work well for
> other, very different, problem areas. It may work, but not work well enough.
>
>>> I think our first questions should instead include:
>>>
>>> Q1. What are the major areas or units of asynchronous code execution?
>>> Some of us may prefer large areas such as "http_port acceptor" or
>>> "cache" or "server side". Others may root for AsyncJob as the largest
>>> asynchronous unit of execution. These two approaches and their
>>> implications differ a lot. There may be other designs worth considering.
>>>
>>
>> See my sample codes, I sent in last mail.. There i have separated out
>> the schedule() and dial()  functions, Where one thread is registering
>> calls in AsyncCallQueue and another is dispatching them..
>> Well, We can concentrate on other areas also
>
> scheedule() and dial() are low level routines that are irrelevant for Q1.
>
>>> Q2. Threads versus processes. Depending on Q1, we may have a choice. The
>>> choice will affect the required locking mechanism and other key decisions.
>>>
>>
>> If you are planning to use processes then it is as good as running
>> multiple squids on single machine..,
>
> I am not planning to use processes yet, but if they are indeed as good
> as running multiple Squids, that is a plus. Hopefully, we can do better
> than multi-instance Squid, but we should be at least as bad/good.
>
>
>>  Only thing is they must be
>> accepting requests on different ports... But if we want distribute
>> single squid's work then i feel threading is the best choice..
>
> You can have a process accepting a request and then forwarding the work
> to another process or receiving a cache hit from another process.
> Inter-process communication is slower than inter-thread communication,
> but it is not impossible.
>
>
>> I AM THINKING ABOUT HYBRID OF BOTH...
>>
>> Somebody might implement process model, Then we would merge both
>> process and thread models .. together we could have a better squid..
>> :)
>> What do u think? 
>
> I doubt we have the resources to do a generic process model so I would
> rather decide on a single primary direction (processes or threads) and
> try to generalize that later if needed. However, a process (if we decide
> to go down that route) may still have lower-level threads, but that is a
> secondary question/decision.
>

OK then, please be precise.
What exactly are you thinking?
Tell me which areas I should concentrate on.
I want to know what exactly is going on in your mind, so that I can
start working and experimenting in that direction... :)

Meanwhile, I will also keep experimenting with threading, as I am doing
right now; it will help when we start the actual development. Is that
OK?

Thanks.



> Cheers,
>
> Alex.
>



-- 
Mr. S. H. Malave
Computer Science & Engineering Department,
Walchand College of Engineering,Sangli.
sachinmal...@wce.org.in


Re: squid-smp: synchronization issue & solutions

2009-11-17 Thread Gonzalo Arana
On Tue, Nov 17, 2009 at 12:45 PM, Alex Rousskov
 wrote:
> On 11/17/2009 04:09 AM, Sachin Malave wrote:
>
> 
>
>> I AM THINKING ABOUT HYBRID OF BOTH...
>>
>> Somebody might implement process model, Then we would merge both
>> process and thread models .. together we could have a better squid..
>> :)
>> What do u think? 

In my limited Squid experience, CPU usage is hardly a bottleneck.  So,
why not just use SMP for the CPU/disk-intensive parts?

The candidates I can think of are:
  * evaluating regular expressions (url_regex ACLs).
  * aufs/diskd (Squid already has support for this).

Best regards,

-- 
Gonzalo A. Arana


Re: squid-smp: synchronization issue & solutions

2009-11-17 Thread Alex Rousskov
On 11/17/2009 04:09 AM, Sachin Malave wrote:

>> After spending 2 minutes on openmp.org, I am not very excited about
>> using OpenMP. Please correct me if I am wrong, but OpenMP seems to be:
>>
>> - An "approach" or "model" requiring compiler support and language
>> extensions. It is _not_ a library. You examples with #pragmas is a good
>> illustration.

> Important features of  OPENMP, you might be interested in...
> 
> ** If your compiler is not supporting OPENMP then you dont have to do
> any special thing, Compiler simply ignores these #pragmas..
> and runs codes as if they are in sequential single thread program,
> without affecting the end goal.
> 
> ** Programmers need not to create any locking mechanism and worry
> about critical sections,
> 
> ** By default it creates number threads equals to processors( * cores
> per processor) in your system.

All of the above make me think that OPENMP-enabled Squid may be
significantly slower than multi-instance Squid. I doubt OPENMP is so
smart that it can correctly and efficiently orchestrate the work of
Squid "threads" that are often not even visible/identifiable in the
current code.

>> - Designed for parallelizing computation-intensive programs such as
>> various math models running on massively parallel computers. AFAICT, the
>> OpenMP steering group is comprised of folks that deal with such models
>> in such environments. Our environment and performance goals are rather
>> different.
>>
> 
> But that doesnt mean that we can not have independent threads,

It means that there is a high probability that it will not work well for
other, very different, problem areas. It may work, but not work well enough.

>> I think our first questions should instead include:
>>
>> Q1. What are the major areas or units of asynchronous code execution?
>> Some of us may prefer large areas such as "http_port acceptor" or
>> "cache" or "server side". Others may root for AsyncJob as the largest
>> asynchronous unit of execution. These two approaches and their
>> implications differ a lot. There may be other designs worth considering.
>>
> 
> See my sample codes, I sent in last mail.. There i have separated out
> the schedule() and dial()  functions, Where one thread is registering
> calls in AsyncCallQueue and another is dispatching them..
> Well, We can concentrate on other areas also

scheedule() and dial() are low level routines that are irrelevant for Q1.

>> Q2. Threads versus processes. Depending on Q1, we may have a choice. The
>> choice will affect the required locking mechanism and other key decisions.
>>
> 
> If you are planning to use processes then it is as good as running
> multiple squids on single machine..,

I am not planning to use processes yet, but if they are indeed as good
as running multiple Squids, that is a plus. Hopefully, we can do better
than multi-instance Squid, but we should be at least as bad/good.


>  Only thing is they must be
> accepting requests on different ports... But if we want distribute
> single squid's work then i feel threading is the best choice..

You can have a process accepting a request and then forwarding the work
to another process or receiving a cache hit from another process.
Inter-process communication is slower than inter-thread communication,
but it is not impossible.


> I AM THINKING ABOUT HYBRID OF BOTH...
> 
> Somebody might implement process model, Then we would merge both
> process and thread models .. together we could have a better squid..
> :)
> What do u think? 

I doubt we have the resources to do a generic process model so I would
rather decide on a single primary direction (processes or threads) and
try to generalize that later if needed. However, a process (if we decide
to go down that route) may still have lower-level threads, but that is a
secondary question/decision.

Cheers,

Alex.


Re: squid-smp: synchronization issue & solutions

2009-11-17 Thread Sachin Malave
On Mon, Nov 16, 2009 at 9:43 PM, Alex Rousskov
 wrote:
> On 11/15/2009 11:59 AM, Sachin Malave wrote:
>
>> Since last few days i am analyzing squid code for smp support, I found
>> one big issue regarding debugs() function, It is very hard get rid of
>> this issue as it is appearing at almost everywhere in the code. So for
>> testing purpose i have disable the debug option in squid.conf as
>> follows
>>
>> ---
>> debug_options 0,0
>> ---
>>
>> Well this was only way, as did not want to spend time on this issue.
>
> You can certainly disable any feature as an intermediate step as long as
> the overall approach allows for the later efficient support of the
> temporary disabled feature. Debugging is probably the worst feature to
> disable though because without it we do not know much about Squid operation.
>
I agree, we should find a way to re-enable this feature; it is
temporarily disabled.
Of course, locking debugs() was not the solution; that is why it is disabled.


>
>> Now concentrating on locking mechanism...
>
> I would not recommend starting with such low-level decisions as locking
> mechanisms. We need to decide what needs to be locked first. AFAIK,
> there is currently no consensus whether we start with processes or
> threads, for example. The locking mechanism would depend on that.
>


>
>> As OpenMP library is widely supported by almost all platforms and
>> compilers, I am inheriting locking mechanism from the same
>> Just include omp.h & compile code with -fopenmp option if using gcc,
>> Other may use similar thing on their platform, Well that is not a big
>> issue..


>
> After spending 2 minutes on openmp.org, I am not very excited about
> using OpenMP. Please correct me if I am wrong, but OpenMP seems to be:
>
> - An "approach" or "model" requiring compiler support and language
> extensions. It is _not_ a library. You examples with #pragmas is a good
> illustration.
>

We have to use something to create and manage threads. There are some
other libraries and models too, but I feel we need something that will
work on all platforms.
Important features of OpenMP you might be interested in:

** If your compiler does not support OpenMP, you do not have to do
anything special: the compiler simply ignores these #pragmas
and runs the code as if it were a sequential single-threaded program,
without affecting the end goal.

** Programmers need not create any locking mechanism or worry
about critical sections.

** By default it creates a number of threads equal to the number of
processors (times cores per processor) in your system.

** Its fork-and-join model is scalable (of course, we must find
suitable areas in the existing code).

** OpenMP is old but still growing, providing new features with new
releases. Think about other threading libraries: I think their
development has stopped, some of them are not freely available, and
some of them are available only on Windows.

** It is free and open source, like us.

** Intel has just released TBB (Threading Building Blocks), but I
doubt its performance on AMD (non-Intel) hardware.

** You might be thinking about the old Pthreads, but I think OpenMP is
very safe and better than pthreads for programmers, especially one who
is making changes in existing code, and it is easier to debug.

Please think about my last point... :)






> - Designed for parallelizing computation-intensive programs such as
> various math models running on massively parallel computers. AFAICT, the
> OpenMP steering group is comprised of folks that deal with such models
> in such environments. Our environment and performance goals are rather
> different.
>

But that doesn't mean that we cannot have independent threads. The only
thing is that we have to start these threads in main(), because main()
never ends; otherwise those independent threads would die after
returning to the calling function.



>
>> 1. hash_link  LOCKED
>>
>> 2. dlink_list  LOCKED
>>
>> 3. ipcache, fqdncache   LOCKED,
>>
>> 4. FD / fde handling ---WELL, SEEMS NOT CREATING PROBLEM, If any then
>> please discuss.
>>
>> 5. statistic counters --- NOT LOCKED ( I know this is very important,
>> But these are scattered all around squid code, Write now they may be
>> holding wrong values)
>>
>> 6. memory manager --- DID NOT FOLLOW
>>
>> 7. configuration objects --- DID NOT FOLLOW
>
> I worry that the end result of this exercise would produce a slow and
> buggy Squid for several reasons:
>
> - Globally locking low-level but interdependent objects is likely to
> create deadlocks when two or more locked objects need to lock other
> locked objects in a circular fashion.
>

Is there any other option? As discussed, Amos is trying to make these
areas as independent as possible, so that we would have less locking
in the code.

> - Locking low-level objects without an overall performance-aware plan is
> likely to result in performance-killing competition for critical locks.
> I believe that with

Re: squid-smp: synchronization issue & solutions

2009-11-16 Thread Alex Rousskov
On 11/15/2009 11:59 AM, Sachin Malave wrote:

> Since last few days i am analyzing squid code for smp support, I found
> one big issue regarding debugs() function, It is very hard get rid of
> this issue as it is appearing at almost everywhere in the code. So for
> testing purpose i have disable the debug option in squid.conf as
> follows
> 
> ---
> debug_options 0,0
> ---
> 
> Well this was only way, as did not want to spend time on this issue.

You can certainly disable any feature as an intermediate step, as long as
the overall approach allows for the later efficient support of the
temporarily disabled feature. Debugging is probably the worst feature to
disable, though, because without it we do not know much about Squid operation.


> Now concentrating on locking mechanism...

I would not recommend starting with such low-level decisions as locking
mechanisms. We need to decide what needs to be locked first. AFAIK,
there is currently no consensus whether we start with processes or
threads, for example. The locking mechanism would depend on that.


> As OpenMP library is widely supported by almost all platforms and
> compilers, I am inheriting locking mechanism from the same
> Just include omp.h & compile code with -fopenmp option if using gcc,
> Other may use similar thing on their platform, Well that is not a big
> issue..

After spending 2 minutes on openmp.org, I am not very excited about
using OpenMP. Please correct me if I am wrong, but OpenMP seems to be:

- An "approach" or "model" requiring compiler support and language
extensions. It is _not_ a library. You examples with #pragmas is a good
illustration.

- Designed for parallelizing computation-intensive programs such as
various math models running on massively parallel computers. AFAICT, the
OpenMP steering group is comprised of folks that deal with such models
in such environments. Our environment and performance goals are rather
different.


> 1. hash_link  LOCKED
> 
> 2. dlink_list  LOCKED
> 
> 3. ipcache, fqdncache   LOCKED,
> 
> 4. FD / fde handling ---WELL, SEEMS NOT CREATING PROBLEM, If any then
> please discuss.
> 
> 5. statistic counters --- NOT LOCKED ( I know this is very important,
> But these are scattered all around squid code, Write now they may be
> holding wrong values)
> 
> 6. memory manager --- DID NOT FOLLOW
> 
> 7. configuration objects --- DID NOT FOLLOW

I worry that the end result of this exercise would produce a slow and
buggy Squid for several reasons:

- Globally locking low-level but interdependent objects is likely to
create deadlocks when two or more locked objects need to lock other
locked objects in a circular fashion.

- Locking low-level objects without an overall performance-aware plan is
likely to result in performance-killing competition for critical locks.
I believe that with the right design, many locks can be avoided.


I think our first questions should instead include:

Q1. What are the major areas or units of asynchronous code execution?
Some of us may prefer large areas such as "http_port acceptor" or
"cache" or "server side". Others may root for AsyncJob as the largest
asynchronous unit of execution. These two approaches and their
implications differ a lot. There may be other designs worth considering.

Q2. Threads versus processes. Depending on Q1, we may have a choice. The
choice will affect the required locking mechanism and other key decisions.


Thank you,

Alex.



Re: squid-smp: synchronization issue & solutions

2009-11-15 Thread Amos Jeffries
[NP: eliding recipients I know are getting these mails through squid-dev
anyway]

On Mon, 16 Nov 2009 12:52:15 +1100, Robert Collins
 wrote:
> On Mon, 2009-11-16 at 00:29 +0530, Sachin Malave wrote:
>> Hello,
>> 
>> Since last few days i am analyzing squid code for smp support, I found
>> one big issue regarding debugs() function, It is very hard get rid of
>> this issue as it is appearing at almost everywhere in the code. So for
>> testing purpose i have disable the debug option in squid.conf as
>> follows
>> 
>> ---
>> debug_options 0,0
>> ---
>> 
>> Well this was only way, as did not want to spend time on this
issue.
> 
> Its very important that debugs works.

What exactly were the problems identified?

> 
> 
>> 1. hash_link  LOCKED
> 
> Bad idea, not all hashes will be cross-thread, so making the primitive
> lock incurs massive overhead for all threads.
> 
>> 2. dlink_list  LOCKED
> 
> Ditto.
> 

Aye. These two need to be checked for thread-safe implementations and any
locking done in the caller code per the distinctly named hash/dlink.

>> 3. ipcache, fqdncache   LOCKED,
> 
> Probably important.
> 
>> 4. FD / fde handling ---WELL, SEEMS NOT CREATING PROBLEM, If any then
>> please discuss.
> 
>  we need analysis and proof, not 'seems to work'.

Aye. NP: this is one of the critical data stores in Squid. I wouldn't be
too far off generalizing that everything up and down the request handling
path uses it semi-'random access', directly or indirectly.

> 
>> 5. statistic counters --- NOT LOCKED ( I know this is very important,
>> But these are scattered all around squid code, Write now they may be
>> holding wrong values)
> 
> Will need to be fixed.
> 
>> 6. memory manager --- DID NOT FOLLOW
> 
> Will need attention, e.g. per thread allocators.
> 
>> 7. configuration objects --- DID NOT FOLLOW
> 
> ACL's are not threadsafe.
> 
>> AND FINALLY, Two sections in EventLoop.cc are separated and executed
>> in two threads simultaneously
>> as follows (#pragma lines added in existing code, no other changes)
> 
> I'm not at all sure that splitting the event loop like that is sensible.
> 
> Better to have the dispatcher dispatch to threads.
> 
> -Rob

Amos


Re: squid-smp: synchronization issue & solutions

2009-11-15 Thread Robert Collins
On Mon, 2009-11-16 at 00:29 +0530, Sachin Malave wrote:
> Hello,
> 
> Since last few days i am analyzing squid code for smp support, I found
> one big issue regarding debugs() function, It is very hard get rid of
> this issue as it is appearing at almost everywhere in the code. So for
> testing purpose i have disable the debug option in squid.conf as
> follows
> 
> ---
> debug_options 0,0
> ---
> 
> Well this was only way, as did not want to spend time on this issue.

Its very important that debugs works. 


> 1. hash_link  LOCKED

Bad idea, not all hashes will be cross-thread, so making the primitive
lock incurs massive overhead for all threads.

> 2. dlink_list  LOCKED

Ditto.

> 3. ipcache, fqdncache   LOCKED,

Probably important.

> 4. FD / fde handling ---WELL, SEEMS NOT CREATING PROBLEM, If any then
> please discuss.

 we need analysis and proof, not 'seems to work'.

> 5. statistic counters --- NOT LOCKED ( I know this is very important,
> But these are scattered all around squid code, Write now they may be
> holding wrong values)

Will need to be fixed.

> 6. memory manager --- DID NOT FOLLOW

Will need attention, e.g. per thread allocators.

> 7. configuration objects --- DID NOT FOLLOW

ACL's are not threadsafe.

> AND FINALLY, Two sections in EventLoop.cc are separated and executed
> in two threads simultaneously
> as follows (#pragma lines added in existing code, no other changes)

I'm not at all sure that splitting the event loop like that is sensible.

Better to have the dispatcher dispatch to threads.

-Rob




squid-smp: synchronization issue & solutions

2009-11-15 Thread Sachin Malave
Hello,

For the last few days I have been analyzing the Squid code for SMP
support, and I found one big issue regarding the debugs() function. It
is very hard to get rid of this issue, as it appears almost everywhere
in the code. So for testing purposes I have disabled the debug option
in squid.conf as follows:

---
debug_options 0,0
---

Well, this was the only way, as I did not want to spend time on this issue.

Now concentrating on the locking mechanism...

As the OpenMP library is widely supported by almost all platforms and
compilers, I am borrowing the locking mechanism from it.
Just include omp.h and compile the code with the -fopenmp option if
using gcc; others may use a similar thing on their platform. That is
not a big issue.

BUT, is it wise to take support from this library? Please discuss this
issue. I feel it is really easy to manage threads and critical sections
if we use OpenMP.

AS DISCUSSED BEFORE, and per the details available on
http://wiki.squid-cache.org/Features/SmpScale,
I think I have solved SOME critical section problems in the existing Squid code.

*AsyncCallQueue.cc***

void AsyncCallQueue::schedule(AsyncCall::Pointer &call)
{
#pragma omp critical (AsyncCallQueueLock_c) // HERE IS THE LOCK
    {
        if (theHead != NULL) { // append
            assert(!theTail->theNext);
            theTail->theNext = call;
            theTail = call;
        } else { // create queue from scratch
            theHead = theTail = call;
        }
    }
}

//AND THEN

void AsyncCallQueue::fireNext()
{
    AsyncCall::Pointer call;
#pragma omp critical (AsyncCallQueueLock_c) // SAME LOCK
    {
        call = theHead;
        theHead = call->theNext;
        call->theNext = NULL;
        if (theTail == call)
            theTail = NULL;
    }
    call->make(); // the call itself is fired outside the critical section
}

IT'S WORKING, AS CRITICAL SECTIONS WITH THE SAME NAME (i.e. AsyncCallQueueLock_c)
CANNOT BE ENTERED SIMULTANEOUSLY
**
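(In case OpenMP is rejected, the same protection the named critical section gives can be expressed with a plain mutex. A simplified sketch, using a bare `Node` list instead of squid's AsyncCall::Pointer:)

```cpp
#include <mutex>

// Intrusive singly linked queue node; stands in for AsyncCall here.
struct Node { Node *next = nullptr; };

class LockedQueue {
public:
    void schedule(Node *n) {
        std::lock_guard<std::mutex> g(mx);  // plays the role of the named critical section
        if (head) {                         // append
            tail->next = n;
            tail = n;
        } else {                            // create queue from scratch
            head = tail = n;
        }
    }
    Node *fireNext() {
        std::lock_guard<std::mutex> g(mx);  // same lock as schedule()
        Node *n = head;
        if (n) {
            head = n->next;
            n->next = nullptr;
            if (tail == n)
                tail = nullptr;
        }
        return n;                           // caller fires it outside the lock
    }
private:
    std::mutex mx;
    Node *head = nullptr;
    Node *tail = nullptr;
};
```

Holding the lock only around the pointer juggling, and firing the call after releasing it, keeps the critical section as short as possible.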

In the same way, the following things appearing on
/Features/SmpScale are also locked (maybe incompletely):

1. hash_link  LOCKED

2. dlink_list  LOCKED

3. ipcache, fqdncache   LOCKED,

4. FD / fde handling --- WELL, IT SEEMS NOT TO BE CREATING PROBLEMS. If
there are any, please discuss.

5. statistic counters --- NOT LOCKED (I know this is very important,
but these are scattered all around the squid code; right now they may be
holding wrong values)

6. memory manager --- DID NOT FOLLOW

7. configuration objects --- DID NOT FOLLOW

AND FINALLY, two sections in EventLoop.cc are separated and executed
in two threads simultaneously,
as follows (#pragma lines added to the existing code, no other changes)

**EventLoop.cc

#pragma omp parallel sections // PARALLEL SECTIONS
{
#pragma omp section // THREAD-1
    {
        if (waitingEngine != NULL)
            checkEngine(waitingEngine, true);
        if (timeService != NULL)
            timeService->tick();
        checked = true;
    }

#pragma omp section // THREAD-2
    {
        while (1) {
            if (lastRound)
                break;
            sawActivity = dispatchCalls();
            if (sawActivity)
                runOnceResult = false;
            if (checked)
                lastRound = true;
        }
    }
}


This may need deep testing, but it is working.
Am I on the right path?

Thank you,


--
Mr. S. H. Malave
Computer Science & Engineering Department,
Walchand College of Engineering,Sangli.
sachinmal...@wce.org.in