Re: squid-3.HEAD IPAddress leak

2008-02-12 Thread Amos Jeffries
>
> On Fri, 2008-02-08 at 12:14 +0900, Adrian Chadd wrote:
>>
>> > Together they make a pretty tree. But every used piece is
>> eseentially
>> > another new, memset, free.
>>
>> Ah, and here you will have problems.
>>
>> The members of that struct should probably be malloc, free, and not
>> new/delete. You're using new/delete which -should- map to a default
>> new operator and head off to the malloc libraries, but -squids- idea
>> of the malloc interface could differ from the -library- idea of
>> the malloc interface.
>
> You can tell C++ to use malloc/free if needed. Am I reading right that
> the OS calls free itself ? Usually /either/ clients free and allocate,
> or libraries free and allocate - never a mix. (Because even in C the
> malloc libraries can be partially overridden).
>
> If squid is doing the allocation and free the following:
>
>> So you should probably drop the new/delete'ing of the addrinfo stuff
>> and replace it with malloc/free.
>
> is irrelevant, but if its shared then we should definitely not use C++
> because operator delete can't be called. (And don't use talloc, or
> anything else either.

No it is very carefully non-shared. Handled via new/delete and only by
IPAddress internals.

>
>> You're also memset()'ing the addrinfo struct whether you allocated
>> it or not, which may be double-memset'ing the thing, and if someone
>> passed in an addrinfo it may have structure members which have now
>> been leaked.
>
> tighter boundaries may help this.

Amos



Re: squid-3.HEAD IPAddress leak

2008-02-11 Thread Robert Collins

On Fri, 2008-02-08 at 12:14 +0900, Adrian Chadd wrote:
> 
> > Together they make a pretty tree. But every used piece is
> eseentially
> > another new, memset, free.
> 
> Ah, and here you will have problems.
> 
> The members of that struct should probably be malloc, free, and not
> new/delete. You're using new/delete which -should- map to a default
> new operator and head off to the malloc libraries, but -squids- idea
> of the malloc interface could differ from the -library- idea of
> the malloc interface.

You can tell C++ to use malloc/free if needed. Am I reading right that
the OS calls free itself ? Usually /either/ clients free and allocate,
or libraries free and allocate - never a mix. (Because even in C the
malloc libraries can be partially overridden).

If squid is doing the allocation and free the following:

> So you should probably drop the new/delete'ing of the addrinfo stuff
> and replace it with malloc/free.

is irrelevant, but if its shared then we should definitely not use C++
because operator delete can't be called. (And don't use talloc, or
anything else either.

> You're also memset()'ing the addrinfo struct whether you allocated
> it or not, which may be double-memset'ing the thing, and if someone
> passed in an addrinfo it may have structure members which have now
> been leaked.

tighter boundaries may help this.

-Rob

-- 
GPG key available at: .


signature.asc
Description: This is a digitally signed message part


Re: squid-3.HEAD IPAddress leak

2008-02-08 Thread Alex Rousskov
On Sat, 2008-02-09 at 01:10 +1300, Amos Jeffries wrote:
> >> struct addrinfo
> >> {
> >>   int ai_family;
> >>   int ai_socktype;
> >>   int ai_protocol;
> >>   struct sockaddr *ai_addr; // pointer to sockaddr_storage/*_in/*_in6
> >>   int ai_addrlen;
> >>   char *ai_canonname; // we never new/free this ourselves in squid.
> >>   struct addrinfo *ai_next;// Pointer to next in list.
> >> };
> > 
> > Can you replace IPAddress data members with the above, except not use
> > any pointers and forget about ai_next and ai_canonname? I think doing so
> > will eliminate temporary allocations and other things that look rather
> > scary to both code quality and performance folks.
> 
> In the cases where addrinfo* is going into read-only usage yes.

My understanding is that you will not need addrinfo in most cases, not
for read-only usage, not for read-write usage. In fact, I do not see a
single line in comm.cc that would pass addrinfo to some system call! I
am sure I missed something, but clearly the majority of addrinfo uses is
just to store basic information like address family or protocol.

We create a complex structure, store basic information in there, use it,
and destroy. When we use that information, we pass it to most system
calls piece-by-piece, not as addrinfo structure! This tells me that we
do not need this complex temporary storage. We do not use that
complexity, but are paying a price both in terms of performance and in
terms of code quality.

Could you please make IPAddress store sockaddr_storage address (and
whatever other basic pieces of information you think we must store or
should cache for performance reasons) and remove (Get|Init|
Free)AddrInfo() calls from Squid code where addrinfo does not have to be
passed to system calls?

IMO, doing the above will improve the overall code quality a lot and
will address performance concerns raised on this thread.

Thank you,

Alex.
P.S. It is obviously unfortunate that I am making these comments after
IPAddress changes were committed to HEAD. To be honest, I just could not
imagine that supporting IPv6 would result in this kind of code changes.
I expected low-level IPv4 address storage to be replaced with IPAddress
class, leaving core code mostly intact. I do apologize for not reviewing
such a significant contribution earlier.




Re: squid-3.HEAD IPAddress leak

2008-02-08 Thread Alex Rousskov
On Sat, 2008-02-09 at 01:53 +1300, Amos Jeffries wrote:
> Amos Jeffries wrote:
> > Alex Rousskov wrote:
> >> On Fri, 2008-02-08 at 15:52 +1300, Amos Jeffries wrote:
> >>
> >>> The second;
> >>>  sockaddr_storage (as Husni uses, and Adrian mentioned) was created to
> >>> provide a better way of using sockaddr* so the sockaddr_in and
> >>> sockaddr_in6 bits could be read-written easily. But the big/litte endian
> >>> problems between OS screwed up the sockaddr_in* sa_family field 
> >>> locations
> >>> inside it so developers still can't portably use it for the v6/v6 flag
> >>> they wanted.
> >>
> >> Do you have a reference for that? I do not want to bug you with more
> >> questions but I am surprised to learn that some kind of a
> >> sockaddr_storage wrapper cannot work well for Squid... We may have to
> >> fix Polygraph that is using that approach, IIRC.
> > 
> > I don't have any online references. Everything I have found online 
> > indicated that the _storage should be padded properly.
> > But, I found in my hex-level debugging of the IPv6 code in squid that 
> > some things consistently went badly because the cast sa_family field was 
> > something much higher than any IANA protocol family. Its too long ago 
> > now to recall exact test cases. May have been a system-specific given 
> > this connect() bug.
> 
> Correction: I did have online source that were somewhat confusing but 
> explained the behaviour a little. This seems to be saying what I 
> remember, http://www.kame.net/newsletter/19980604/

Whew! The above link seems to confirm that we should be using
sockaddr_storage as the base for IPAddress wrapper.

Is it possible that you interpreted "use getaddrinfo() and getnameinfo()
everywhere" section _title_ too literally? :-) The actual text of that
section suggests where to use getaddrinfo. I think we are using it for a
lot more than that, right?

Alex.




Re: squid-3.HEAD IPAddress leak

2008-02-08 Thread Adrian Chadd
On Sat, Feb 09, 2008, Amos Jeffries wrote:

> Correction: I did have online source that were somewhat confusing but 
> explained the behaviour a little. This seems to be saying what I 
> remember, http://www.kame.net/newsletter/19980604/

Today's rant:

The C standard libraries after about 1995 weren't focusing on
performance. The non-"n" string library functions are pretty well
optimised for NULL strings.

Don't model stuff on the C standard libraries if you want anything
performance critical. And don't use C standard library calls if you
want anything to run fast.

Case in point: argh don't use *scanf and *printf in the critical path..
You haven't done that, but thats dotted all throughout the Squid codebase
and its really freaking irritating. Half the speedups in 2.6/2.7 were
me going through and slowly replacing all the stdio calls in -socket-
code.




adrian



Re: squid-3.HEAD IPAddress leak

2008-02-08 Thread Amos Jeffries

Amos Jeffries wrote:

Alex Rousskov wrote:

On Fri, 2008-02-08 at 15:52 +1300, Amos Jeffries wrote:


The second;
 sockaddr_storage (as Husni uses, and Adrian mentioned) was created to
provide a better way of using sockaddr* so the sockaddr_in and
sockaddr_in6 bits could be read-written easily. But the big/litte endian
problems between OS screwed up the sockaddr_in* sa_family field 
locations

inside it so developers still can't portably use it for the v6/v6 flag
they wanted.


Do you have a reference for that? I do not want to bug you with more
questions but I am surprised to learn that some kind of a
sockaddr_storage wrapper cannot work well for Squid... We may have to
fix Polygraph that is using that approach, IIRC.


I don't have any online references. Everything I have found online 
indicated that the _storage should be padded properly.
But, I found in my hex-level debugging of the IPv6 code in squid that 
some things consistently went badly because the cast sa_family field was 
something much higher than any IANA protocol family. Its too long ago 
now to recall exact test cases. May have been a system-specific given 
this connect() bug.


Correction: I did have online source that were somewhat confusing but 
explained the behaviour a little. This seems to be saying what I 
remember, http://www.kame.net/newsletter/19980604/


Amos






The third;
  addrinfo* defines a whole new type. Wrapping that old sockaddr* 
mess and

providing in a nice set fo bells and whistles for use. Most importantly
that flag we need to pas to the system calls.


I have to say that the "nice set of bells and whistles" in a basic
address structure used throughout a performance-sensitive program raises
red flags, but perhaps the actual performance implications are not as
bad.


The important bits for the squid comm code are:

struct addrinfo
{
  int ai_family;
  int ai_socktype;
  int ai_protocol;
  struct sockaddr *ai_addr; // pointer to sockaddr_storage/*_in/*_in6
  int ai_addrlen;
  char *ai_canonname; // we never new/free this ourselves in squid.
  struct addrinfo *ai_next;// Pointer to next in list.
};


Can you replace IPAddress data members with the above, except not use
any pointers and forget about ai_next and ai_canonname? I think doing so
will eliminate temporary allocations and other things that look rather
scary to both code quality and performance folks.


In the cases where addrinfo* is going into read-only usage yes.



When you do need a list of addresses or canonname, it is OK to use
addrinfo and convert from/to IPAddress as needed, of course. I am
guessing such uses will not affect performance or overall code quality.



Amos



--
Please use Squid 2.6STABLE17+ or 3.0STABLE1+
There are serious security advisories out on all earlier releases.


Re: squid-3.HEAD IPAddress leak

2008-02-08 Thread Amos Jeffries

Alex Rousskov wrote:

On Fri, 2008-02-08 at 15:52 +1300, Amos Jeffries wrote:


The second;
 sockaddr_storage (as Husni uses, and Adrian mentioned) was created to
provide a better way of using sockaddr* so the sockaddr_in and
sockaddr_in6 bits could be read-written easily. But the big/litte endian
problems between OS screwed up the sockaddr_in* sa_family field locations
inside it so developers still can't portably use it for the v6/v6 flag
they wanted.


Do you have a reference for that? I do not want to bug you with more
questions but I am surprised to learn that some kind of a
sockaddr_storage wrapper cannot work well for Squid... We may have to
fix Polygraph that is using that approach, IIRC.


I don't have any online references. Everything I have found online 
indicated that the _storage should be padded properly.
But, I found in my hex-level debugging of the IPv6 code in squid that 
some things consistently went badly because the cast sa_family field was 
something much higher than any IANA protocol family. Its too long ago 
now to recall exact test cases. May have been a system-specific given 
this connect() bug.






The third;
  addrinfo* defines a whole new type. Wrapping that old sockaddr* mess and
providing in a nice set fo bells and whistles for use. Most importantly
that flag we need to pas to the system calls.


I have to say that the "nice set of bells and whistles" in a basic
address structure used throughout a performance-sensitive program raises
red flags, but perhaps the actual performance implications are not as
bad.


The important bits for the squid comm code are:

struct addrinfo
{
  int ai_family;
  int ai_socktype;
  int ai_protocol;
  struct sockaddr *ai_addr; // pointer to sockaddr_storage/*_in/*_in6
  int ai_addrlen;
  char *ai_canonname; // we never new/free this ourselves in squid.
  struct addrinfo *ai_next; // Pointer to next in list.
};


Can you replace IPAddress data members with the above, except not use
any pointers and forget about ai_next and ai_canonname? I think doing so
will eliminate temporary allocations and other things that look rather
scary to both code quality and performance folks.


In the cases where addrinfo* is going into read-only usage yes.



When you do need a list of addresses or canonname, it is OK to use
addrinfo and convert from/to IPAddress as needed, of course. I am
guessing such uses will not affect performance or overall code quality.



Amos
--
Please use Squid 2.6STABLE17+ or 3.0STABLE1+
There are serious security advisories out on all earlier releases.


Re: squid-3.HEAD IPAddress leak

2008-02-07 Thread Alex Rousskov
On Fri, 2008-02-08 at 15:52 +1300, Amos Jeffries wrote:

> The second;
>  sockaddr_storage (as Husni uses, and Adrian mentioned) was created to
> provide a better way of using sockaddr* so the sockaddr_in and
> sockaddr_in6 bits could be read-written easily. But the big/litte endian
> problems between OS screwed up the sockaddr_in* sa_family field locations
> inside it so developers still can't portably use it for the v6/v6 flag
> they wanted.

Do you have a reference for that? I do not want to bug you with more
questions but I am surprised to learn that some kind of a
sockaddr_storage wrapper cannot work well for Squid... We may have to
fix Polygraph that is using that approach, IIRC.

> The third;
>   addrinfo* defines a whole new type. Wrapping that old sockaddr* mess and
> providing in a nice set fo bells and whistles for use. Most importantly
> that flag we need to pas to the system calls.

I have to say that the "nice set of bells and whistles" in a basic
address structure used throughout a performance-sensitive program raises
red flags, but perhaps the actual performance implications are not as
bad.

> The important bits for the squid comm code are:
> 
> struct addrinfo
> {
>   int ai_family;
>   int ai_socktype;
>   int ai_protocol;
>   struct sockaddr *ai_addr; // pointer to sockaddr_storage/*_in/*_in6
>   int ai_addrlen;
>   char *ai_canonname; // we never new/free this ourselves in squid.
>   struct addrinfo *ai_next;   // Pointer to next in list.
> };

Can you replace IPAddress data members with the above, except not use
any pointers and forget about ai_next and ai_canonname? I think doing so
will eliminate temporary allocations and other things that look rather
scary to both code quality and performance folks.

When you do need a list of addresses or canonname, it is OK to use
addrinfo and convert from/to IPAddress as needed, of course. I am
guessing such uses will not affect performance or overall code quality.

Thank you,

Alex.




Re: squid-3.HEAD IPAddress leak

2008-02-07 Thread Amos Jeffries
> On Fri, Feb 08, 2008, Amos Jeffries wrote:
>
>>  sockaddr_storage (as Husni uses, and Adrian mentioned) was created to
>> provide a better way of using sockaddr* so the sockaddr_in and
>> sockaddr_in6 bits could be read-written easily. But the big/litte endian
>> problems between OS screwed up the sockaddr_in* sa_family field
>> locations
>> inside it so developers still can't portably use it for the v6/v6 flag
>> they wanted.
>
> How'd that screw things up? Hm, I'll have to read.
>
>> ... (1) pointer to somewhere holding the sockaddr_storage/*_in/*_in6
>>   struct sockaddr *ai_addr;
>
> Which you have to allocate.

and do when needed.

>
>>   char *ai_canonname;
>
> Which we're not using, and you have to allocate.

and don't.

>
>> ... and because some system lookups in IPv6 can return _multiple_
>> addresses
>> ... it has a linked-list of branchs for each of the alternatives
>>   struct addrinfo *ai_next;  /* Pointer to next in list.  */
>> };
>>
>
> Which is only useful in the context of providing lists of addresses for
> a given host or IP address. What getaddrinfo() returns.

and consider read-only.

>
>> > Why do we need AddrInfo now
>>
>> To kill dozens of castings, magic operations on object sizes, and cut
>> down
>> dozens of lines of 'compatibility code'.
>
> And add in a minimum of one, and a maximum of two allocations per socket
> operation.
>
>> Together they make a pretty tree. But every used piece is eseentially
>> another new, memset, free.
>
> Ah, and here you will have problems.

Agreed.

>
> The members of that struct should probably be malloc, free, and not
> new/delete. You're using new/delete which -should- map to a default
> new operator and head off to the malloc libraries, but -squids- idea
> of the malloc interface could differ from the -library- idea of
> the malloc interface.

I was thinking squid overiding the new/delete to its xmalloc/xfree at the
same time it overrode the general malloc/free.

>
> So you should probably drop the new/delete'ing of the addrinfo stuff
> and replace it with malloc/free.

If thats better, no problem.

>
> You're also memset()'ing the addrinfo struct whether you allocated
> it or not, which may be double-memset'ing the thing, and if someone
> passed in an addrinfo it may have structure members which have now
> been leaked.

In the current usage the call should be the allocation. Not external.
Allowing external allocation via another API call would be the only
optimisation I can think of that does not break anything anywhere.

memset is needed there because I could not tell that the new/delete
_guaranteed_ pre-NULL'd memory and a single set bit in any unused fields
might cause a crash later.
With your deeper knowledge of the memory allocation in squid-3, feel free
to alter the default memset()'s.

Amos




Re: squid-3.HEAD IPAddress leak

2008-02-07 Thread Adrian Chadd
On Fri, Feb 08, 2008, Amos Jeffries wrote:

>  sockaddr_storage (as Husni uses, and Adrian mentioned) was created to
> provide a better way of using sockaddr* so the sockaddr_in and
> sockaddr_in6 bits could be read-written easily. But the big/litte endian
> problems between OS screwed up the sockaddr_in* sa_family field locations
> inside it so developers still can't portably use it for the v6/v6 flag
> they wanted.

How'd that screw things up? Hm, I'll have to read.

> ... (1) pointer to somewhere holding the sockaddr_storage/*_in/*_in6
>   struct sockaddr *ai_addr;

Which you have to allocate.

>   char *ai_canonname;

Which we're not using, and you have to allocate.

> ... and because some system lookups in IPv6 can return _multiple_ addresses
> ... it has a linked-list of branchs for each of the alternatives
>   struct addrinfo *ai_next;   /* Pointer to next in list.  */
> };
> 

Which is only useful in the context of providing lists of addresses for
a given host or IP address. What getaddrinfo() returns.

> > Why do we need AddrInfo now
> 
> To kill dozens of castings, magic operations on object sizes, and cut down
> dozens of lines of 'compatibility code'.

And add in a minimum of one, and a maximum of two allocations per socket
operation.

> Together they make a pretty tree. But every used piece is eseentially
> another new, memset, free.

Ah, and here you will have problems.

The members of that struct should probably be malloc, free, and not
new/delete. You're using new/delete which -should- map to a default
new operator and head off to the malloc libraries, but -squids- idea
of the malloc interface could differ from the -library- idea of
the malloc interface.

So you should probably drop the new/delete'ing of the addrinfo stuff
and replace it with malloc/free.

You're also memset()'ing the addrinfo struct whether you allocated
it or not, which may be double-memset'ing the thing, and if someone
passed in an addrinfo it may have structure members which have now
been leaked.




Adrian



Re: squid-3.HEAD IPAddress leak

2008-02-07 Thread Amos Jeffries
> On Fri, 2008-02-08 at 11:46 +1300, Amos Jeffries wrote:
>
>> The basics:
>>
>>   addrinfo* extends the neutral sockaddr* type ( union { sockaddr_in,
>> sockaddr_in6, sockaddr_storage} ) while adding the size values and
>> moving the flags and protocol details.
>
> Hi Amos,
>
> I understand that we need to support IPv4 and IPv6 addresses. I do
> not understand why we are suddenly talking about pointers and trees?
>
> I can understand if wherever Squid2 was using an IPv4 address, Squid3
> would use an (ip4, ip6, whichOneFlag) class with appropriate methods
> that check the flag and return or set the right thing. I know that would
> not be the best way to implement it, but it would work and I would
> understand what is going on.

The protocol extensions defined to replace sockaddr_in and in_addr went
through three generations of 'migration-pathway' attempts.
The first;
  sockaddr* defines a union type that was nasty for casting and did not
include that flag you mention in the structure itself. Most of the
accept(), connect(), bind() etc still use this as their input type.

The second;
 sockaddr_storage (as Husni uses, and Adrian mentioned) was created to
provide a better way of using sockaddr* so the sockaddr_in and
sockaddr_in6 bits could be read-written easily. But the big/litte endian
problems between OS screwed up the sockaddr_in* sa_family field locations
inside it so developers still can't portably use it for the v6/v6 flag
they wanted.

The third;
  addrinfo* defines a whole new type. Wrapping that old sockaddr* mess and
providing in a nice set fo bells and whistles for use. Most importantly
that flag we need to pas to the system calls.

It is in _exactly_ the class you mention. But in struct form so C people
can use it too.

>
> Can you explain AddrInfo in the above simple (ip4, ip6, whichOneFlag)
> class terms?

There are a few extra bits I'v snipped from this explanation.

The important bits for the squid comm code are:

struct addrinfo
{
... here is that flag you want to switch on if its been set properly.
... AF_INET (IPv4) or AF_INET6 (IPv6) or AF_UNSPEC (NULL, error, 'whatever')
  int ai_family;

... some bits needed only by the socket() creation calls
  int ai_socktype;
  int ai_protocol;

... and the actual data for the IP address:

... (1) pointer to somewhere holding the sockaddr_storage/*_in/*_in6
  struct sockaddr *ai_addr;

... (2) and the sizeof() for the (1) object.
... if the flag is not set this _might_ be magic'd instead.
  int ai_addrlen;


... along with some bits set when ia DNS lookup has been done
... _sometimes_ it's the host FQDN
... optional so we never new/free this ourselves in squid.
  char *ai_canonname;

... and because some system lookups in IPv6 can return _multiple_ addresses
... it has a linked-list of branchs for each of the alternatives
  struct addrinfo *ai_next; /* Pointer to next in list.  */
};


> Why do we need AddrInfo now

To kill dozens of castings, magic operations on object sizes, and cut down
dozens of lines of 'compatibility code'.

>  and why are we talking about a
> "tree" with "pointers" and alloc/free operations?

I see a tree of pointers I call it a tree even if its a class-1 (dynamic
linked-list).

addrinfo - makes a tree node.
   ai_addr (a sockaddr pile of unfortunate structs)
   ai_canonname (a null-terminated string),
   ai_next (another addrinfo .. repeat ad inifinitum)

Together they make a pretty tree. But every used piece is eseentially
another new, memset, free.

HTH.

>
> Did Squid2 use an equivalent of AddrInfo? If yes, did it use it as often
> as Squid3 does now? Did Squid2 allocated and freed it?

What I've seen of Husni's final 2.6s13 patch he used addrinfo in many of
the same system-interface places and uses the naked system
getaddrinfo()/freeaddrinfo() calls where I use my GetAddrInfo() wrappers
to the system ones.
Then where he is passing around either sockaddr* or sockaddr_storage*
where I universally use IPAddress& .


There are a few spots where a 'read-only' variant might be possible. So as
to pretend to allocate when no new/free is actually done, just a pointer
to the IPAddress field and a local addrinfo.
One of them is this connect() IFF that bug can be guaranteed not to
overwrite the internal IPAddress data and warp squid.

I may have some time to do this tonight. Bug me about it if I'm on.

Amos

>
> Thank you,
>
> Alex.
>
>
>> I'm using it in three ways.
>>
>> First Usage: [simplest via InitAddrInfo() paired with an assignment
>> somewhere ]
>>  - to create an empty addrinfo tree root and pass to the system to be
>> filled out. The results are copied to the fde or xxData objects for
>> later
>> async use.
>>
>>
>> Second usage: [ hiding complex setup via GetAddrInfo() ]
>>   - to create addrinfo tree containing the current IPAddress so system
>> or
>> squid can use the structure fields in some comm call.
>>   Variant 1: addrinfo provides: AF_*, sockaddr_storage*, and sizeof()
>> results in one package with fixe

Re: squid-3.HEAD IPAddress leak

2008-02-07 Thread Alex Rousskov
On Fri, 2008-02-08 at 11:46 +1300, Amos Jeffries wrote:

> The basics:
> 
>   addrinfo* extends the neutral sockaddr* type ( union { sockaddr_in,
> sockaddr_in6, sockaddr_storage} ) while adding the size values and
> moving the flags and protocol details.

Hi Amos,

I understand that we need to support IPv4 and IPv6 addresses. I do
not understand why we are suddenly talking about pointers and trees?

I can understand if wherever Squid2 was using an IPv4 address, Squid3
would use an (ip4, ip6, whichOneFlag) class with appropriate methods
that check the flag and return or set the right thing. I know that would
not be the best way to implement it, but it would work and I would
understand what is going on.

Can you explain AddrInfo in the above simple (ip4, ip6, whichOneFlag)
class terms? Why do we need AddrInfo now and why are we talking about a
"tree" with "pointers" and alloc/free operations?

Did Squid2 use an equivalent of AddrInfo? If yes, did it use it as often
as Squid3 does now? Did Squid2 allocated and freed it?

Thank you,

Alex.


> I'm using it in three ways.
> 
> First Usage: [simplest via InitAddrInfo() paired with an assignment
> somewhere ]
>  - to create an empty addrinfo tree root and pass to the system to be
> filled out. The results are copied to the fde or xxData objects for later
> async use.
> 
> 
> Second usage: [ hiding complex setup via GetAddrInfo() ]
>   - to create addrinfo tree containing the current IPAddress so system or
> squid can use the structure fields in some comm call.
>   Variant 1: addrinfo provides: AF_*, sockaddr_storage*, and sizeof()
> results in one package with fixed field names/locations for squid to
> send individually to a system call without mode-specific #ifdef's.
>   Variant 2: whole addrinfo goes to system for use.
> 
> 
> Third Usage: [ hiding complex usage via GetAddrInfo() paired with an
> assignment somewhere ]
>   - to create addrinfo tree containing the current IP and socket
> information so system can update/add its knowledge of the FQDN,
> hostname, alternate IPs, socket options, or original destination (in NAT
> case).
> (both above Variantions apply here too)
>   - which then gets saved somewhere for squid to use later.
> 
> 
> All three uses above end up with a tree of dynamic data squid is expected
> to cleanup. So all uses are terminated with a FreeAddrInfo() to perform
> that cleanup.
> 
> >
> > There's a netural sock addr size type - its called sockaddr_storage.
> > addr_info is for return results from hostname<->ip queries..
> >
> 
> struct addrinfo extends struct sockaddr_storage to handle multiple IPs,
> hostnames, and removes the need for many type-castings in user-space code.
> 
> Amos
> 
> >
> > Adrian
> >
> >
> > On Fri, Feb 08, 2008, Amos Jeffries wrote:
> >> > On Thu, Feb 07, 2008, Robert Collins wrote:
> >> >
> >> >> > Still, this is one of those "death of a thousand cuts" method of
> >> >> > killing performance..
> >> >>
> >> >> Right, I haven't seen the commit; care to mail the diff?
> >> >
> >> > Which? I just looked at the places where Amos is calling GetAddrInfo()
> >> > and FreeAddrInfo(); more then one are:
> >> >
> >> > * GetAddrInfo(temp, );
> >> > * F->{something} = temp;
> >> > * FreeAddrInfo(temp);
> >>
> >> Where?!
> >>
> >>  I hate addrinfo enough that I only added addrinfo where it needed:
> >>
> >>  GetAddrInfo(temp)
> >>  syscall_needing_neutral_sockaddr_size_family(temp)
> >>  FreeAddrInfo(temp)
> >>
> >> OR:
> >>
> >>  GetAddrInfo(temp)
> >>  syscall_altering_addrinfo(temp)
> >>  F->something = temp;
> >>  FreeAddrInfo(temp)
> >>
> >>
> >> Amos
> >>
> >
> > --
> > - Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid
> > Support -
> > - $25/pm entry-level VPSes w/ capped bandwidth charges available in WA -
> >
> 



Re: squid-3.HEAD IPAddress leak

2008-02-07 Thread Amos Jeffries
> And which version of debian on which platform is showing the busted
> behaviour?
>

 Dell i386
 Kernels 2.6.18-4 thru 2.6.22-2
 libstdc6 with g++ 4.1 thru 4.3

All the debian setups I have had to test with basically since I first
found it.

Amos

>
> On Fri, Feb 08, 2008, Amos Jeffries wrote:
>> >
>> > On Thu, 2008-02-07 at 12:21 +0900, Adrian Chadd wrote:
>> >>
>> >> Well, I haven't removed the temporary malloc/free pair, whatever its
>> >> called;
>> >> I've just removed Amos' workaround in src/comm.cc so it doesn't leak
>> >> on my
>> >> system whilst I profile.
>> >>
>> >> Still, this is one of those "death of a thousand cuts" method of
>> >> killing performance..
>> >
>> > Right, I haven't seen the commit; care to mail the diff?
>> >
>>
>>
>> Here you go.
>> "
>> --- src/comm.cc 2008-01-19 20:50:42.0 +1300
>> +++ src/comm.cc-2   2008-02-08 09:56:11.0 +1300
>> @@ -1381,7 +1381,7 @@
>>
>>  }
>>
>> -#ifdef _SQUID_LINUX_
>> +#ifdef 0 // _SQUID_LINUX_
>>  /* 2007-11-27:
>>   * Linux Debian replaces our allocated AI pointer with garbage when
>>   * connect() fails. This leads to segmentation faults deallocating
>> "
>>
>>
>> In reply to your last few emails Adrian.
>>
>> re: can we join up on IRC?
>>
>>   I may not be able to for the next few days, too much else on.
>>
>>
>> re: Why are you trying to allocate the structure on invocation of
>> GetAddrInfo() ?
>>
>>   The design was to follow the well-known structure of the kernel call
>> getaddrinfo(), which fills kernel-managed memory in a thread-safe way
>> down a dynamic structure. The addrinfo structs are too nasty to use
>> naked for anything like the amount of comm usage squid has. They are not
>> an object per-se, but the root of a pointer tree to various types of
>> node, which depend on the flags for their memory sizes.
>>   We could make a variant of the call which takes an addrinfo object and
>> sets it up to point at the IPAddress internals correctly. BUT the bug we
>> are looking at would then cause the IPAddress to become garbage, maybe
>> have the kernel free'ing stack memory, etc.
>>
>>
>> re: Argh, this temporary malloc/free pair is peppered throughout the
>> codebase! Grr.
>>
>>  You already know there are naked comm calls everywhere. :-( Most of
>> them
>> calls addrinfo was desgined for use in.
>>
>>
>> re: I've removed that hack, and things work fine for me. Ubuntu 7.01
>> here,
>> x86_64.
>>
>>  Yes, I found no problem on ubuntu also. I simply could not (after very
>> little searching) find a #if test that would only work for Debian.
>>
>>
>> Amos
>>
>
> --
> - Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid
> Support -
> - $25/pm entry-level VPSes w/ capped bandwidth charges available in WA -
>




Re: squid-3.HEAD IPAddress leak

2008-02-07 Thread Adrian Chadd
And which version of debian on which platform is showing the busted behaviour?



Adrian

On Fri, Feb 08, 2008, Amos Jeffries wrote:
> >
> > On Thu, 2008-02-07 at 12:21 +0900, Adrian Chadd wrote:
> >>
> >> Well, I haven't removed the temporary malloc/free pair, whatever its
> >> called;
> >> I've just removed Amos' workaround in src/comm.cc so it doesn't leak
> >> on my
> >> system whilst I profile.
> >>
> >> Still, this is one of those "death of a thousand cuts" method of
> >> killing performance..
> >
> > Right, I haven't seen the commit; care to mail the diff?
> >
> 
> 
> Here you go.
> "
> --- src/comm.cc 2008-01-19 20:50:42.0 +1300
> +++ src/comm.cc-2   2008-02-08 09:56:11.0 +1300
> @@ -1381,7 +1381,7 @@
> 
>  }
> 
> -#ifdef _SQUID_LINUX_
> +#ifdef 0 // _SQUID_LINUX_
>  /* 2007-11-27:
>   * Linux Debian replaces our allocated AI pointer with garbage when
>   * connect() fails. This leads to segmentation faults deallocating
> "
> 
> 
> In reply to your last few emails Adrian.
> 
> re: can we join up on IRC?
> 
>   I may not be able to for the next few days, too much else on.
> 
> 
> re: Why are you trying to allocate the structure on invocation of
> GetAddrInfo() ?
> 
>   The design was to follow the well-known structure of the kernel call
> getaddrinfo(), which fills kernel-managed memory in a thread-safe way
> down a dynamic structure. The addrinfo structs are too nasty to use
> naked for anything like the amount of comm usage squid has. They are not
> an object per-se, but the root of a pointer tree to various types of
> node, which depend on the flags for their memory sizes.
>   We could make a variant of the call which takes an addrinfo object and
> sets it up to point at the IPAddress internals correctly. BUT the bug we
> are looking at would then cause the IPAddress to become garbage, maybe
> have the kernel free'ing stack memory, etc.
> 
> 
> re: Argh, this temporary malloc/free pair is peppered throughout the
> codebase! Grr.
> 
>  You already know there are naked comm calls everywhere. :-( Most of them
> calls addrinfo was desgined for use in.
> 
> 
> re: I've removed that hack, and things work fine for me. Ubuntu 7.01 here,
> x86_64.
> 
>  Yes, I found no problem on ubuntu also. I simply could not (after very
> little searching) find a #if test that would only work for Debian.
> 
> 
> Amos
> 

-- 
- Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid Support -
- $25/pm entry-level VPSes w/ capped bandwidth charges available in WA -


Re: squid-3.HEAD IPAddress leak

2008-02-07 Thread Amos Jeffries
> On Fri, Feb 08, 2008, Amos Jeffries wrote:
>
>> Try it. You will be horrified.
>
> Why? Whats being trashed, AI, or AI->somemember?

AI->somemember.

>
> If AI is being trashed, then just create a temporary pointer copy
> of AI, use that in the socket call, and then throw it away.
> if its trashing the memory -at- the AI rather than the AI pointer
> itself then you should probably spend some time with valgrind.
>
>> The pointer-tree needs by the OS to work with screws that up. We would
>> need to allocate a new addrinfo for each master node and set its
>> pointers
>> to new memory, copy the original data into it ... and hey! thats what
>> IPAddress::GetAddrInfo() does!
>
> It doesn't do that in C! I don't get it.
>
>> Do not confuse IPAddress::GetAddrInfo() which allocates an initializes a
>> 'full' one-IP addrinfo* tree quickly which is about to be used in place
>> of
>> a blocking getaddrinfo() system call.
>>
>> The second-best alternative in many of these places is hard-coding
>> sockaddr_in* types and using #if USE_IPV6 to alternate the calls.
>
>> With IPAddress::InitAddrInfo() which allocates a single empty addrinfo*
>> ready for new data to be inserted and only sets the flags up.
>
> Are there any situations where you iterate over a list of AddrInfo's ?

Yes, in all those F->somewhere = *AI calls. Robert pointed out.
In all results from get*() system calls.
In all results from xgetaddrinfo() system calls.

>
> You're not passing in a struct addr_info which you've allocated yourself
> into
> a glibc call which then frobs it and possibly allocates another?

I suspect thats whats happening. Or, errors changing it to some sockaddr*
thing representing a bad-IP state. Same diff to squid.
When it should NOT be altering the pointers, but writing to the memory
provided.

Look at this:

int connect(int sockfd, const struct sockaddr *serv_addr, socklen_t addrlen);

in squid ...

  AI = { ...
ai_addr (0xfff...) 
ai_addrlen = 16
... };

   connect(fd, AI->ai_addr, AI->ai_addrlen);

afterwards:

  AI = { ...
ai_addr (0x80...) 
ai_addrlen = 16
... };

the rest of AI is untouched.

So its altering a CONST parameter. And worse. It's taking the pointer
by-value as if it was accepting by-ref and alocating its own.


Amos




Re: squid-3.HEAD IPAddress leak

2008-02-07 Thread Adrian Chadd
On Fri, Feb 08, 2008, Amos Jeffries wrote:

> Try it. You will be horrified.

Why? Whats being trashed, AI, or AI->somemember?

If AI is being trashed, then just create a temporary pointer copy
of AI, use that in the socket call, and then throw it away.
if its trashing the memory -at- the AI rather than the AI pointer
itself then you should probably spend some time with valgrind.

> The pointer-tree needs by the OS to work with screws that up. We would
> need to allocate a new addrinfo for each master node and set its pointers
> to new memory, copy the original data into it ... and hey! thats what
> IPAddress::GetAddrInfo() does!

It doesn't do that in C! I don't get it.

> Do not confuse IPAddress::GetAddrInfo() which allocates an initializes a
> 'full' one-IP addrinfo* tree quickly which is about to be used in place of
> a blocking getaddrinfo() system call.
> 
> The second-best alternative in many of these places is hard-coding
> sockaddr_in* types and using #if USE_IPV6 to alternate the calls.

> With IPAddress::InitAddrInfo() which allocates a single empty addrinfo*
> ready for new data to be inserted and only sets the flags up.

Are there any situations where you iterate over a list of AddrInfo's ?

You're not passing in a struct addr_info which you've allocated yourself into
a glibc call which then frobs it and possibly allocates another?




Adrian



Re: squid-3.HEAD IPAddress leak

2008-02-07 Thread Amos Jeffries
> Have you tried running valgrind?
>
> Also, wait. What exactly are you using addr_info for? Can you explain it
> to me?

The basics:

  addrinfo* extends the neutral sockaddr* type ( union { sockaddr_in,
sockaddr_in6, sockaddr_storage} ) while adding the size values and
moving the flags and protocol details.

I'm using it in three ways.

First Usage: [simplest via InitAddrInfo() paired with an assignment
somewhere ]
 - to create an empty addrinfo tree root and pass to the system to be
filled out. The results are copied to the fde or xxData objects for later
async use.


Second usage: [ hiding complex setup via GetAddrInfo() ]
  - to create addrinfo tree containing the current IPAddress so system or
squid can use the structure fields in some comm call.
  Variant 1: addrinfo provides: AF_*, sockaddr_storage*, and sizeof()
results in one package with fixed field names/locations for squid to
send individually to a system call without mode-specific #ifdef's.
  Variant 2: whole addrinfo goes to system for use.


Third Usage: [ hiding complex usage via GetAddrInfo() paired with an
assignment somewhere ]
  - to create addrinfo tree containing the current IP and socket
information so system can update/add its knowledge of the FQDN,
hostname, alternate IPs, socket options, or original destination (in NAT
case).
(both above Variantions apply here too)
  - which then gets saved somewhere for squid to use later.


All three uses above end up with a tree of dynamic data squid is expected
to cleanup. So all uses are terminated with a FreeAddrInfo() to perform
that cleanup.

>
> There's a netural sock addr size type - its called sockaddr_storage.
> addr_info is for return results from hostname<->ip queries..
>

struct addrinfo extends struct sockaddr_storage to handle multiple IPs,
hostnames, and removes the need for many type-castings in user-space code.

Amos

>
> Adrian
>
>
> On Fri, Feb 08, 2008, Amos Jeffries wrote:
>> > On Thu, Feb 07, 2008, Robert Collins wrote:
>> >
>> >> > Still, this is one of those "death of a thousand cuts" method of
>> >> > killing performance..
>> >>
>> >> Right, I haven't seen the commit; care to mail the diff?
>> >
>> > Which? I just looked at the places where Amos is calling GetAddrInfo()
>> > and FreeAddrInfo(); more then one are:
>> >
>> > * GetAddrInfo(temp, );
>> > * F->{something} = temp;
>> > * FreeAddrInfo(temp);
>>
>> Where?!
>>
>>  I hate addrinfo enough that I only added addrinfo where it needed:
>>
>>  GetAddrInfo(temp)
>>  syscall_needing_neutral_sockaddr_size_family(temp)
>>  FreeAddrInfo(temp)
>>
>> OR:
>>
>>  GetAddrInfo(temp)
>>  syscall_altering_addrinfo(temp)
>>  F->something = temp;
>>  FreeAddrInfo(temp)
>>
>>
>> Amos
>>
>
> --
> - Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid
> Support -
> - $25/pm entry-level VPSes w/ capped bandwidth charges available in WA -
>




Re: squid-3.HEAD IPAddress leak

2008-02-07 Thread Amos Jeffries
> Uhm, the kernel won't be free'ing userland memory at all; it doesn't give
> a rats how its managed.
>
> You're probably confused with the library - the libc is fumbling with
> all the struct addrinfo things.

Possibly. Once an external call is made I no longer care if its kernel or
library.  Bad effects are equally dangerous to squid.

>
> Also, if the bug bugs you, create a temporary pointer and pass that in. ;)
>

Try it. You will be horrified.
The pointer-tree needs by the OS to work with screws that up. We would
need to allocate a new addrinfo for each master node and set its pointers
to new memory, copy the original data into it ... and hey! thats what
IPAddress::GetAddrInfo() does!

Do not confuse IPAddress::GetAddrInfo() which allocates an initializes a
'full' one-IP addrinfo* tree quickly which is about to be used in place of
a blocking getaddrinfo() system call.

The second-best alternative in many of these places is hard-coding
sockaddr_in* types and using #if USE_IPV6 to alternate the calls.

With IPAddress::InitAddrInfo() which allocates a single empty addrinfo*
ready for new data to be inserted and only sets the flags up.

Amos

>
>
> Adrian
>
> On Fri, Feb 08, 2008, Amos Jeffries wrote:
>> >
>> > On Thu, 2008-02-07 at 12:21 +0900, Adrian Chadd wrote:
>> >>
>> >> Well, I haven't removed the temporary malloc/free pair, whatever its
>> >> called;
>> >> I've just removed Amos' workaround in src/comm.cc so it doesn't leak
>> >> on my
>> >> system whilst I profile.
>> >>
>> >> Still, this is one of those "death of a thousand cuts" method of
>> >> killing performance..
>> >
>> > Right, I haven't seen the commit; care to mail the diff?
>> >
>>
>>
>> Here you go.
>> "
>> --- src/comm.cc 2008-01-19 20:50:42.0 +1300
>> +++ src/comm.cc-2   2008-02-08 09:56:11.0 +1300
>> @@ -1381,7 +1381,7 @@
>>
>>  }
>>
>> -#ifdef _SQUID_LINUX_
>> +#ifdef 0 // _SQUID_LINUX_
>>  /* 2007-11-27:
>>   * Linux Debian replaces our allocated AI pointer with garbage when
>>   * connect() fails. This leads to segmentation faults deallocating
>> "
>>
>>
>> In reply to your last few emails Adrian.
>>
>> re: can we join up on IRC?
>>
>>   I may not be able to for the next few days, too much else on.
>>
>>
>> re: Why are you trying to allocate the structure on invocation of
>> GetAddrInfo() ?
>>
>>   The design was to follow the well-known structure of the kernel call
>> getaddrinfo(), which fills kernel-managed memory in a thread-safe way
>> down a dynamic structure. The addrinfo structs are too nasty to use
>> naked for anything like the amount of comm usage squid has. They are not
>> an object per-se, but the root of a pointer tree to various types of
>> node, which depend on the flags for their memory sizes.
>>   We could make a variant of the call which takes an addrinfo object and
>> sets it up to point at the IPAddress internals correctly. BUT the bug we
>> are looking at would then cause the IPAddress to become garbage, maybe
>> have the kernel free'ing stack memory, etc.
>>
>>
>> re: Argh, this temporary malloc/free pair is peppered throughout the
>> codebase! Grr.
>>
>>  You already know there are naked comm calls everywhere. :-( Most of
>> them
>> calls addrinfo was desgined for use in.
>>
>>
>> re: I've removed that hack, and things work fine for me. Ubuntu 7.01
>> here,
>> x86_64.
>>
>>  Yes, I found no problem on ubuntu also. I simply could not (after very
>> little searching) find a #if test that would only work for Debian.
>>
>>
>> Amos
>>
>
> --
> - Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid
> Support -
> - $25/pm entry-level VPSes w/ capped bandwidth charges available in WA -
>




Re: squid-3.HEAD IPAddress leak

2008-02-07 Thread Adrian Chadd
Have you tried running valgrind?

Also, wait. What exactly are you using addr_info for? Can you explain it
to me?

There's a netural sock addr size type - its called sockaddr_storage.
addr_info is for return results from hostname<->ip queries..


Adrian


On Fri, Feb 08, 2008, Amos Jeffries wrote:
> > On Thu, Feb 07, 2008, Robert Collins wrote:
> >
> >> > Still, this is one of those "death of a thousand cuts" method of
> >> > killing performance..
> >>
> >> Right, I haven't seen the commit; care to mail the diff?
> >
> > Which? I just looked at the places where Amos is calling GetAddrInfo()
> > and FreeAddrInfo(); more then one are:
> >
> > * GetAddrInfo(temp, );
> > * F->{something} = temp;
> > * FreeAddrInfo(temp);
> 
> Where?!
> 
>  I hate addrinfo enough that I only added addrinfo where it needed:
> 
>  GetAddrInfo(temp)
>  syscall_needing_neutral_sockaddr_size_family(temp)
>  FreeAddrInfo(temp)
> 
> OR:
> 
>  GetAddrInfo(temp)
>  syscall_altering_addrinfo(temp)
>  F->something = temp;
>  FreeAddrInfo(temp)
> 
> 
> Amos
> 

-- 
- Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid Support -
- $25/pm entry-level VPSes w/ capped bandwidth charges available in WA -


Re: squid-3.HEAD IPAddress leak

2008-02-07 Thread Amos Jeffries
>
> On Thu, 2008-02-07 at 14:08 +0900, Adrian Chadd wrote:
>> On Thu, Feb 07, 2008, Robert Collins wrote:
>>
>> > > Still, this is one of those "death of a thousand cuts" method of
>> > > killing performance..
>> >
>> > Right, I haven't seen the commit; care to mail the diff?
>>
>> Which? I just looked at the places where Amos is calling GetAddrInfo()
>> and FreeAddrInfo(); more then one are:
>>
>> * GetAddrInfo(temp, );
>> * F->{something} = temp;
>> * FreeAddrInfo(temp);
>
> Wheee, surely that would be better as
> F->{something} = GetAddrInfo(...)
>

F-something should all be IPAddress type. Assignment to update based on
some results kernel places in temp. That usage above is a guaranteed leak
of N bytes where N can be up to the size of a small DNS zone!

Please read this for a short synopsis of the requirements of addrinfo usage:
  http://linux.die.net/man/3/getaddrinfo

Amos



Re: squid-3.HEAD IPAddress leak

2008-02-07 Thread Adrian Chadd
Uhm, the kernel won't be free'ing userland memory at all; it doesn't give
a rats how its managed.

You're probably confused with the library - the libc is fumbling with
all the struct addrinfo things.

Also, if the bug bugs you, create a temporary pointer and pass that in. ;)



Adrian

On Fri, Feb 08, 2008, Amos Jeffries wrote:
> >
> > On Thu, 2008-02-07 at 12:21 +0900, Adrian Chadd wrote:
> >>
> >> Well, I haven't removed the temporary malloc/free pair, whatever its
> >> called;
> >> I've just removed Amos' workaround in src/comm.cc so it doesn't leak
> >> on my
> >> system whilst I profile.
> >>
> >> Still, this is one of those "death of a thousand cuts" method of
> >> killing performance..
> >
> > Right, I haven't seen the commit; care to mail the diff?
> >
> 
> 
> Here you go.
> "
> --- src/comm.cc 2008-01-19 20:50:42.0 +1300
> +++ src/comm.cc-2   2008-02-08 09:56:11.0 +1300
> @@ -1381,7 +1381,7 @@
> 
>  }
> 
> -#ifdef _SQUID_LINUX_
> +#ifdef 0 // _SQUID_LINUX_
>  /* 2007-11-27:
>   * Linux Debian replaces our allocated AI pointer with garbage when
>   * connect() fails. This leads to segmentation faults deallocating
> "
> 
> 
> In reply to your last few emails Adrian.
> 
> re: can we join up on IRC?
> 
>   I may not be able to for the next few days, too much else on.
> 
> 
> re: Why are you trying to allocate the structure on invocation of
> GetAddrInfo() ?
> 
>   The design was to follow the well-known structure of the kernel call
> getaddrinfo(), which fills kernel-managed memory in a thread-safe way
> down a dynamic structure. The addrinfo structs are too nasty to use
> naked for anything like the amount of comm usage squid has. They are not
> an object per-se, but the root of a pointer tree to various types of
> node, which depend on the flags for their memory sizes.
>   We could make a variant of the call which takes an addrinfo object and
> sets it up to point at the IPAddress internals correctly. BUT the bug we
> are looking at would then cause the IPAddress to become garbage, maybe
> have the kernel free'ing stack memory, etc.
> 
> 
> re: Argh, this temporary malloc/free pair is peppered throughout the
> codebase! Grr.
> 
>  You already know there are naked comm calls everywhere. :-( Most of them
> calls addrinfo was desgined for use in.
> 
> 
> re: I've removed that hack, and things work fine for me. Ubuntu 7.01 here,
> x86_64.
> 
>  Yes, I found no problem on ubuntu also. I simply could not (after very
> little searching) find a #if test that would only work for Debian.
> 
> 
> Amos
> 

-- 
- Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid Support -
- $25/pm entry-level VPSes w/ capped bandwidth charges available in WA -


Re: squid-3.HEAD IPAddress leak

2008-02-07 Thread Amos Jeffries
> On Thu, Feb 07, 2008, Robert Collins wrote:
>
>> > Still, this is one of those "death of a thousand cuts" method of
>> > killing performance..
>>
>> Right, I haven't seen the commit; care to mail the diff?
>
> Which? I just looked at the places where Amos is calling GetAddrInfo()
> and FreeAddrInfo(); more then one are:
>
> * GetAddrInfo(temp, );
> * F->{something} = temp;
> * FreeAddrInfo(temp);

Where?!

 I hate addrinfo enough that I only added addrinfo where it needed:

 GetAddrInfo(temp)
 syscall_needing_neutral_sockaddr_size_family(temp)
 FreeAddrInfo(temp)

OR:

 GetAddrInfo(temp)
 syscall_altering_addrinfo(temp)
 F->something = temp;
 FreeAddrInfo(temp)


Amos




Re: squid-3.HEAD IPAddress leak

2008-02-07 Thread Amos Jeffries
>
> On Thu, 2008-02-07 at 12:21 +0900, Adrian Chadd wrote:
>>
>> Well, I haven't removed the temporary malloc/free pair, whatever its
>> called;
>> I've just removed Amos' workaround in src/comm.cc so it doesn't leak
>> on my
>> system whilst I profile.
>>
>> Still, this is one of those "death of a thousand cuts" method of
>> killing performance..
>
> Right, I haven't seen the commit; care to mail the diff?
>


Here you go.
"
--- src/comm.cc 2008-01-19 20:50:42.0 +1300
+++ src/comm.cc-2   2008-02-08 09:56:11.0 +1300
@@ -1381,7 +1381,7 @@

 }

-#ifdef _SQUID_LINUX_
+#ifdef 0 // _SQUID_LINUX_
 /* 2007-11-27:
  * Linux Debian replaces our allocated AI pointer with garbage when
  * connect() fails. This leads to segmentation faults deallocating
"


In reply to your last few emails Adrian.

re: can we join up on IRC?

  I may not be able to for the next few days, too much else on.


re: Why are you trying to allocate the structure on invocation of
GetAddrInfo() ?

  The design was to follow the well-known structure of the kernel call
getaddrinfo(), which fills kernel-managed memory in a thread-safe way
down a dynamic structure. The addrinfo structs are too nasty to use
naked for anything like the amount of comm usage squid has. They are not
an object per-se, but the root of a pointer tree to various types of
node, which depend on the flags for their memory sizes.
  We could make a variant of the call which takes an addrinfo object and
sets it up to point at the IPAddress internals correctly. BUT the bug we
are looking at would then cause the IPAddress to become garbage, maybe
have the kernel free'ing stack memory, etc.


re: Argh, this temporary malloc/free pair is peppered throughout the
codebase! Grr.

 You already know there are naked comm calls everywhere. :-( Most of them
calls addrinfo was desgined for use in.


re: I've removed that hack, and things work fine for me. Ubuntu 7.01 here,
x86_64.

 Yes, I found no problem on ubuntu also. I simply could not (after very
little searching) find a #if test that would only work for Debian.


Amos




Re: squid-3.HEAD IPAddress leak

2008-02-06 Thread Adrian Chadd
On Thu, Feb 07, 2008, Robert Collins wrote:
> 
> On Thu, 2008-02-07 at 14:08 +0900, Adrian Chadd wrote:
> > On Thu, Feb 07, 2008, Robert Collins wrote:
> > 
> > > > Still, this is one of those "death of a thousand cuts" method of
> > > > killing performance..
> > > 
> > > Right, I haven't seen the commit; care to mail the diff?
> > 
> > Which? I just looked at the places where Amos is calling GetAddrInfo()
> > and FreeAddrInfo(); more then one are:
> > 
> > * GetAddrInfo(temp, );
> > * F->{something} = temp;
> > * FreeAddrInfo(temp);
> 
> Wheee, surely that would be better as
> F->{something} = GetAddrInfo(...)

or just GetAddrInfo(F->something, ...)

We're just filling in the details, there's no need to throw 50-60 bytes on
the stack and copy it when a pointer/reference will suffice.



Adrian

-- 
- Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid Support -
- $25/pm entry-level VPSes w/ capped bandwidth charges available in WA -


Re: squid-3.HEAD IPAddress leak

2008-02-06 Thread Robert Collins

On Thu, 2008-02-07 at 14:08 +0900, Adrian Chadd wrote:
> On Thu, Feb 07, 2008, Robert Collins wrote:
> 
> > > Still, this is one of those "death of a thousand cuts" method of
> > > killing performance..
> > 
> > Right, I haven't seen the commit; care to mail the diff?
> 
> Which? I just looked at the places where Amos is calling GetAddrInfo()
> and FreeAddrInfo(); more then one are:
> 
> * GetAddrInfo(temp, );
> * F->{something} = temp;
> * FreeAddrInfo(temp);

Wheee, surely that would be better as
F->{something} = GetAddrInfo(...)

?
-Rob

-- 
GPG key available at: .


signature.asc
Description: This is a digitally signed message part


Re: squid-3.HEAD IPAddress leak

2008-02-06 Thread Adrian Chadd
On Thu, Feb 07, 2008, Robert Collins wrote:

> > Still, this is one of those "death of a thousand cuts" method of
> > killing performance..
> 
> Right, I haven't seen the commit; care to mail the diff?

Which? I just looked at the places where Amos is calling GetAddrInfo()
and FreeAddrInfo(); more then one are:

* GetAddrInfo(temp, );
* F->{something} = temp;
* FreeAddrInfo(temp);

-- 
- Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid Support -
- $25/pm entry-level VPSes w/ capped bandwidth charges available in WA -


Re: squid-3.HEAD IPAddress leak

2008-02-06 Thread Robert Collins

On Thu, 2008-02-07 at 12:21 +0900, Adrian Chadd wrote:
> 
> Well, I haven't removed the temporary malloc/free pair, whatever its
> called;
> I've just removed Amos' workaround in src/comm.cc so it doesn't leak
> on my
> system whilst I profile.
> 
> Still, this is one of those "death of a thousand cuts" method of
> killing performance..

Right, I haven't seen the commit; care to mail the diff?

-Rob
-- 
GPG key available at: .


signature.asc
Description: This is a digitally signed message part


Re: squid-3.HEAD IPAddress leak

2008-02-06 Thread Adrian Chadd
On Thu, Feb 07, 2008, Robert Collins wrote:
> > 
> > I've removed that hack, and things work fine for me. Ubuntu 7.01 here, 
> > x86_64.
> 
> There's a useful c++ thing called 'placement new' - its used when you
> have classes that you want to use on the stack, but still have tight
> abstraction boundaries.

Well, I haven't removed the temporary malloc/free pair, whatever its called;
I've just removed Amos' workaround in src/comm.cc so it doesn't leak on my
system whilst I profile.

Still, this is one of those "death of a thousand cuts" method of killing 
performance..



Adrian



Re: squid-3.HEAD IPAddress leak

2008-02-06 Thread Robert Collins

On Thu, 2008-02-07 at 11:56 +0900, Adrian Chadd wrote:


> Argh, this temporary malloc/free pair is peppered throughout the codebase!
> Grr.
> 
> I've removed that hack, and things work fine for me. Ubuntu 7.01 here, x86_64.

There's a useful c++ thing called 'placement new' - its used when you
have classes that you want to use on the stack, but still have tight
abstraction boundaries.

-Rob
-- 
GPG key available at: .


signature.asc
Description: This is a digitally signed message part


Re: squid-3.HEAD IPAddress leak

2008-02-06 Thread Adrian Chadd
On Thu, Feb 07, 2008, Amos Jeffries wrote:
> > Amos, could you please poke the leak in IPAddress:GetAddrInfo a little?
> 
> I'll give it another try. But don't hold your breath too long on that one.
> 
> Did you see my note on it?
> I tracked that leak backwards from two segfaults to the connect() system
> call. Then I tracked the allocation from my init forwards to the connect()
> call.
> 
> As far as I can tell when connect() returns OK, things work. But if
> connect() fails in any way the addrinfo pointer is immediately pointing
> somewhere in the system read-only memory with a leak lost. Kernel
> segfaults squid when it attempt to cleanup anywhere in the read-only
> areas.

Argh, this temporary malloc/free pair is peppered throughout the codebase!
Grr.

I've removed that hack, and things work fine for me. Ubuntu 7.01 here, x86_64.



Adrian



Re: squid-3.HEAD IPAddress leak

2008-02-06 Thread Adrian Chadd
On Thu, Feb 07, 2008, Amos Jeffries wrote:
> > Amos, could you please poke the leak in IPAddress:GetAddrInfo a little?
> 
> I'll give it another try. But don't hold your breath too long on that one.

Why are you trying to allocate the structure on invocation of GetAddrInfo() ?

Wouldn't it be better to simply assert the caller must provide storage for
the structure, and you simply fill it in?

That avoids a malloc/free pair (as it'll generally be in the stack frame of the
caller or it'll be embedded in some other structure) and removes this special
case allocation.



Adrian

-- 
- Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid Support -
- $25/pm entry-level VPSes w/ capped bandwidth charges available in WA -


Re: squid-3.HEAD IPAddress leak

2008-02-06 Thread Adrian Chadd
On Thu, Feb 07, 2008, Amos Jeffries wrote:
> > Amos, could you please poke the leak in IPAddress:GetAddrInfo a little?
> 
> I'll give it another try. But don't hold your breath too long on that one.
> 
> Did you see my note on it?
> I tracked that leak backwards from two segfaults to the connect() system
> call. Then I tracked the allocation from my init forwards to the connect()
> call.
> 
> As far as I can tell when connect() returns OK, things work. But if
> connect() fails in any way the addrinfo pointer is immediately pointing
> somewhere in the system read-only memory with a leak lost. Kernel
> segfaults squid when it attempt to cleanup anywhere in the read-only
> areas.

Come on IRC and we'll talk through it? I can't imagine its going to take
long to fix.






Adrian



Re: squid-3.HEAD IPAddress leak

2008-02-06 Thread Amos Jeffries
> Amos, could you please poke the leak in IPAddress:GetAddrInfo a little?

I'll give it another try. But don't hold your breath too long on that one.

Did you see my note on it?
I tracked that leak backwards from two segfaults to the connect() system
call. Then I tracked the allocation from my init forwards to the connect()
call.

As far as I can tell when connect() returns OK, things work. But if
connect() fails in any way the addrinfo pointer is immediately pointing
somewhere in the system read-only memory with a leak lost. Kernel
segfaults squid when it attempt to cleanup anywhere in the read-only
areas.

Amos

>
> Thanks,
>
>
>
>
> Adrian
>
> ==19601==
> ==19601== ERROR SUMMARY: 9353 errors from 5 contexts (suppressed: 8 from
> 2)
> ==19601== malloc/free: in use at exit: 36,429,514 bytes in 14,790 blocks.
> ==19601== malloc/free: 114,225 allocs, 99,435 frees, 115,499,321 bytes
> allocated.
> ==19601== For counts of detected errors, rerun with: -v
> ==19601== searching for pointers to 14,790 not-freed blocks.
> ==19601== checked 37,289,184 bytes.
> ==19601==
> ==19601==
> ==19601== 4,104 bytes in 4 blocks are possibly lost in loss record 10 of
> 27
> ==19601==at 0x4C21C16: malloc (vg_replace_malloc.c:149)
> ==19601==by 0x4EDE17: xmalloc (util.c:506)
> ==19601==by 0x4754B8: httpHeaderBuildFieldsInfo (SquidNew.h:54)
> ==19601==by 0x46F44F: httpHeaderIdByNameDef (HttpHeader.cc:1707)
> ==19601==by 0x40E197: ACLHTTPHeaderData::parse()
> (ACLHTTPHeaderData.cc:90)
> ==19601==by 0x408D09: ACL::ParseAclLine(ConfigParser&, ACL**)
> (acl.cc:153)
> ==19601==by 0x421365: parse_line(char*) (cache_cf.cc:888)
> ==19601==by 0x424F9F: parseOneConfigFile(char const*, unsigned)
> (cache_cf.cc:325)
> ==19601==by 0x425990: parseConfigFile(char const*, CacheManager&)
> (cache_cf.cc:359)
> ==19601==by 0x483337: main (main.cc:1198)
> ==19601==
> ==19601==
> ==19601== 147,648 (110,736 direct, 36,912 indirect) bytes in 2,307 blocks
> are definitely lost in loss record 25 of 27
> ==19601==at 0x4C21C16: malloc (vg_replace_malloc.c:149)
> ==19601==by 0x4EDE17: xmalloc (util.c:506)
> ==19601==by 0x4E9BD9: IPAddress::GetAddrInfo(addrinfo*&, int) const
> (SquidNew.h:46)
> ==19601==by 0x4C2DAC: comm_connect_addr (comm.cc:1321)
> ==19601==by 0x4C3E70: ConnectStateData::connect() (comm.cc:1249)
> ==19601==by 0x44F7BA: FwdState::connectStart() (forward.cc:912)
> ==19601==by 0x4487AF: EventDispatcher::dispatch() (event.cc:131)
> ==19601==by 0x449747: EventLoop::runOnce() (EventLoop.cc:131)
> ==19601==by 0x449837: EventLoop::run() (EventLoop.cc:100)
> ==19601==by 0x4836F7: main (main.cc:1321)
>
>
>
> --
> - Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid
> Support -
> - $25/pm entry-level VPSes w/ capped bandwidth charges available in WA -
>




squid-3.HEAD IPAddress leak

2008-02-06 Thread Adrian Chadd
Amos, could you please poke the leak in IPAddress:GetAddrInfo a little?

Thanks,




Adrian

==19601==
==19601== ERROR SUMMARY: 9353 errors from 5 contexts (suppressed: 8 from 2)
==19601== malloc/free: in use at exit: 36,429,514 bytes in 14,790 blocks.
==19601== malloc/free: 114,225 allocs, 99,435 frees, 115,499,321 bytes 
allocated.
==19601== For counts of detected errors, rerun with: -v
==19601== searching for pointers to 14,790 not-freed blocks.
==19601== checked 37,289,184 bytes.
==19601==
==19601==
==19601== 4,104 bytes in 4 blocks are possibly lost in loss record 10 of 27
==19601==at 0x4C21C16: malloc (vg_replace_malloc.c:149)
==19601==by 0x4EDE17: xmalloc (util.c:506)
==19601==by 0x4754B8: httpHeaderBuildFieldsInfo (SquidNew.h:54)
==19601==by 0x46F44F: httpHeaderIdByNameDef (HttpHeader.cc:1707)
==19601==by 0x40E197: ACLHTTPHeaderData::parse() (ACLHTTPHeaderData.cc:90)
==19601==by 0x408D09: ACL::ParseAclLine(ConfigParser&, ACL**) (acl.cc:153)
==19601==by 0x421365: parse_line(char*) (cache_cf.cc:888)
==19601==by 0x424F9F: parseOneConfigFile(char const*, unsigned) 
(cache_cf.cc:325)
==19601==by 0x425990: parseConfigFile(char const*, CacheManager&) 
(cache_cf.cc:359)
==19601==by 0x483337: main (main.cc:1198)
==19601==
==19601==
==19601== 147,648 (110,736 direct, 36,912 indirect) bytes in 2,307 blocks are 
definitely lost in loss record 25 of 27
==19601==at 0x4C21C16: malloc (vg_replace_malloc.c:149)
==19601==by 0x4EDE17: xmalloc (util.c:506)
==19601==by 0x4E9BD9: IPAddress::GetAddrInfo(addrinfo*&, int) const 
(SquidNew.h:46)
==19601==by 0x4C2DAC: comm_connect_addr (comm.cc:1321)
==19601==by 0x4C3E70: ConnectStateData::connect() (comm.cc:1249)
==19601==by 0x44F7BA: FwdState::connectStart() (forward.cc:912)
==19601==by 0x4487AF: EventDispatcher::dispatch() (event.cc:131)
==19601==by 0x449747: EventLoop::runOnce() (EventLoop.cc:131)
==19601==by 0x449837: EventLoop::run() (EventLoop.cc:100)
==19601==by 0x4836F7: main (main.cc:1321)



-- 
- Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid Support -
- $25/pm entry-level VPSes w/ capped bandwidth charges available in WA -