Re: [PATCH] net: Saner thash_entries default with much memory

2007-10-30 Thread David Miller
From: Andi Kleen <[EMAIL PROTECTED]>
Date: Tue, 30 Oct 2007 21:42:05 +0100

> I still doubt it makes sense to have a separate lock for each bucket. It
> would probably be better to divide the hash value by a further factor
> and use the result to index a smaller, lock-only table.

Yes, and that's why we do it this way in the routing cache hashes.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] net: Saner thash_entries default with much memory

2007-10-30 Thread David Miller
From: Jean Delvare <[EMAIL PROTECTED]>
Date: Tue, 30 Oct 2007 14:18:27 +0100

> OK, let's go with (512 * 1024) then. Want me to send an updated patch?

Why submit a patch that's already in Linus's tree :-)


Re: [PATCH] net: Saner thash_entries default with much memory

2007-10-30 Thread Andi Kleen

> Next, machines that service that many sockets typically have them
> mostly with full transmit queues talking to a very slow receiver at
> the other end. 

Not sure -- there are likely use cases with lots of idle but connected 
sockets.

Also, the constraint here is not really how many sockets are served,
but how well the hash function spreads them across the table. I don't
have good data on that.

Still, (512 * 1024) sounds reasonable. In the case of many idle
sockets, for example, you are probably fine with hash chains holding
more than one entry in the worst case: a small working set will fit
in cache, and as long as the chains do not get very long, walking a
short list that is in cache will still be fast enough.

> So to me (512 * 1024) is a very reasonable limit and (with lockdep
> and spinlock debugging disabled) this makes the EHASH table consume
> 8MB on UP 64-bit and ~12MB on SMP 64-bit systems.

I still doubt it makes sense to have a separate lock for each bucket. It
would probably be better to divide the hash value by a further factor
and use the result to index a smaller, lock-only table.

-Andi
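
The shared lock table suggested above can be sketched roughly as
follows. This is only an illustration in Python, not kernel code: the
class name StripedHashTable, the parameter buckets_per_lock, and the
use of threading.Lock in place of kernel spinlocks are all assumptions
made for the example.

```python
import threading


class StripedHashTable:
    """Hash table whose buckets share a smaller table of locks (illustrative only)."""

    def __init__(self, n_buckets=1024, buckets_per_lock=16):
        if n_buckets % buckets_per_lock:
            raise ValueError("n_buckets must be a multiple of buckets_per_lock")
        self.n_buckets = n_buckets
        self.buckets_per_lock = buckets_per_lock
        self.buckets = [[] for _ in range(n_buckets)]
        # One lock per group of buckets instead of one lock per bucket.
        self.locks = [threading.Lock() for _ in range(n_buckets // buckets_per_lock)]

    def bucket_index(self, key):
        # Python's % always yields a non-negative index here.
        return hash(key) % self.n_buckets

    def lock_index(self, bucket):
        # Divide the bucket index by a factor to pick the shared lock.
        return bucket // self.buckets_per_lock

    def insert(self, key, value):
        b = self.bucket_index(key)
        with self.locks[self.lock_index(b)]:
            self.buckets[b].append((key, value))

    def lookup(self, key):
        b = self.bucket_index(key)
        with self.locks[self.lock_index(b)]:
            for k, v in self.buckets[b]:
                if k == key:
                    return v
        return None


if __name__ == "__main__":
    table = StripedHashTable(n_buckets=1024, buckets_per_lock=16)
    table.insert(42, "socket-a")
    print(len(table.buckets), "buckets share", len(table.locks), "locks")
    print(table.lookup(42))
```

With these example numbers, 1024 buckets are guarded by 64 locks, so
the lock storage shrinks by the chosen factor while chains in different
lock groups can still be accessed concurrently. Dividing puts
neighbouring buckets under the same lock; masking the index instead
would spread neighbours across locks. Which is preferable depends on
the access pattern.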


Re: [PATCH] net: Saner thash_entries default with much memory

2007-10-30 Thread Jean Delvare
Hi David,

On Tuesday 30 October 2007, David Miller wrote:
> From: Andi Kleen <[EMAIL PROTECTED]>
> Date: Fri, 26 Oct 2007 17:34:17 +0200
> 
> > On Fri, Oct 26, 2007 at 05:21:31PM +0200, Jean Delvare wrote:
> > > I propose 2 million entries as the arbitrary high limit. This
> > 
> > It's probably still far too large.
> 
> I agree.  Perhaps a better number is something on the order of
> (512 * 1024) so I think I'll check in a variant of Jean's patch
> with just the limit decreased like that.

That's fine with me. I originally proposed an admittedly high
limit to improve its chances of being accepted. I am not
familiar enough with networking to know what a more reasonable
limit would be, so I'm leaving it to the experts.

> Using just some back of the envelope calculations, on UP 64-bit
> systems each socket uses about 2424 bytes minimum of memory (this is
> the sum of tcp_sock, inode, dentry, socket, and file on sparc64 UP).
> This is an underestimate because it does not even consider things like
> allocator overhead.
> 
> Next, machines that service that many sockets typically have them
> mostly with full transmit queues talking to a very slow receiver at
> the other end.  So let's estimate that on average each socket consumes
> about 64K of retransmit queue data.
> 
> I think this is an extremely conservative estimate because it doesn't
> even consider overhead coming from struct sk_buff and related state.
> 
> So for (512 * 1024) established sockets we consume roughly 35GB of
> memory; this is '((2424 + (64 * 1024)) * (512 * 1024))' bytes.
> 
> So to me (512 * 1024) is a very reasonable limit and (with lockdep
> and spinlock debugging disabled) this makes the EHASH table consume
> 8MB on UP 64-bit and ~12MB on SMP 64-bit systems.

OK, let's go with (512 * 1024) then. Want me to send an updated patch?

Thanks,
-- 
Jean Delvare
Suse L3


Re: [PATCH] net: Saner thash_entries default with much memory

2007-10-29 Thread David Miller
From: Andi Kleen <[EMAIL PROTECTED]>
Date: Fri, 26 Oct 2007 17:34:17 +0200

> On Fri, Oct 26, 2007 at 05:21:31PM +0200, Jean Delvare wrote:
> > I propose 2 million entries as the arbitrary high limit. This
> 
> It's probably still far too large.

I agree.  Perhaps a better number is something on the order of
(512 * 1024) so I think I'll check in a variant of Jean's patch
with just the limit decreased like that.

Using just some back of the envelope calculations, on UP 64-bit
systems each socket uses about 2424 bytes minimum of memory (this is
the sum of tcp_sock, inode, dentry, socket, and file on sparc64 UP).
This is an underestimate because it does not even consider things like
allocator overhead.

Next, machines that service that many sockets typically have them
mostly with full transmit queues talking to a very slow receiver at
the other end.  So let's estimate that on average each socket consumes
about 64K of retransmit queue data.

I think this is an extremely conservative estimate because it doesn't
even consider overhead coming from struct sk_buff and related state.

So for (512 * 1024) established sockets we consume roughly 35GB of
memory; this is '((2424 + (64 * 1024)) * (512 * 1024))' bytes.

So to me (512 * 1024) is a very reasonable limit and (with lockdep
and spinlock debugging disabled) this makes the EHASH table consume
8MB on UP 64-bit and ~12MB on SMP 64-bit systems.

Thanks.
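
The back-of-the-envelope figures in the message above can be checked
with a few lines of Python. The per-socket numbers are taken from the
message itself. The per-bucket sizes of 16 and 24 bytes are not from
the kernel source: they are inferred by dividing the quoted table sizes
(8MB and ~12MB) by the bucket count, and all names are chosen for this
example only.

```python
# Figures quoted in the message; names are illustrative.
SOCKETS = 512 * 1024            # proposed upper limit on ehash entries
PER_SOCKET_STRUCTS = 2424       # bytes: tcp_sock + inode + dentry + socket + file (sparc64 UP)
RETRANSMIT_QUEUE = 64 * 1024    # bytes: estimated average queued data per socket

# Assumed bucket sizes, inferred from the quoted 8MB / ~12MB table sizes.
BUCKET_BYTES_UP = 16
BUCKET_BYTES_SMP = 24


def socket_memory_bytes(sockets=SOCKETS):
    """Memory consumed by `sockets` established sockets under the estimate above."""
    return (PER_SOCKET_STRUCTS + RETRANSMIT_QUEUE) * sockets


def table_bytes(bucket_bytes, buckets=SOCKETS):
    """Size of a hash table with `buckets` buckets of `bucket_bytes` each."""
    return bucket_bytes * buckets


if __name__ == "__main__":
    total = socket_memory_bytes()
    print(f"socket memory: {total} bytes = {total / 1e9:.1f} GB ({total / 2**30:.1f} GiB)")
    print(f"ehash table, UP:  {table_bytes(BUCKET_BYTES_UP) // 2**20} MiB")
    print(f"ehash table, SMP: {table_bytes(BUCKET_BYTES_SMP) // 2**20} MiB")
```

The total comes to 35,630,612,480 bytes, which matches "roughly 35GB"
in decimal units (about 33.2 GiB), and the table sizes come out at
8 MiB and 12 MiB under the assumed bucket sizes.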


Re: [PATCH] net: Saner thash_entries default with much memory

2007-10-26 Thread akepner
On Fri, Oct 26, 2007 at 05:21:31PM +0200, Jean Delvare wrote:
> 
> This is just one way to limit the hash size, there are others; I am not
> familiar enough with the TCP code to decide which is best. Thus, I
> would welcome the proposals of alternatives.
> 

Yeah, the existing way of sizing them is very bad on large systems 
and hardcoding a size sucks (though that's what we do on large 
Altix systems now).

IIRC there was some talk about using a different data structure 
here - something that wouldn't require a fixed size and that 
could maintain reasonably fast lookup even when it got large. 

Robert, has work been done to use TRASH to address this problem? 
Other ideas?

-- 
Arthur



Re: [PATCH] net: Saner thash_entries default with much memory

2007-10-26 Thread Andi Kleen
On Fri, Oct 26, 2007 at 05:21:31PM +0200, Jean Delvare wrote:
> I know that /proc/net/tcp is
> deprecated in favor of tcp_diag, however at the moment netstat only
> knows of the former.

Even tcp_diag will be slow when all slots are dumped. It's a fundamental
problem of the data structure. /proc has a slightly higher constant-factor
overhead, that's all.

Also there are some tricks to make it a little faster (e.g.
the old patches to not take the lock for empty buckets) but again
it's only just patching constant factors, not the fundamental
scaling issue.

> I propose 2 million entries as the arbitrary high limit. This

It's probably still far too large.

-Andi
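
The scaling issue described above can be illustrated with a small
Python sketch. It is a toy model, not the /proc/net/tcp or tcp_diag
code: dump_table and make_table are hypothetical helpers, and plain
lists stand in for the hash chains. The point is only that a full dump
has to visit every bucket, so its cost grows with the table size even
when the number of sockets stays the same.

```python
def dump_table(buckets):
    """Walk every bucket, as a full dump must, and count the work done.

    Returns (entries_found, buckets_visited).
    """
    entries = 0
    visited = 0
    for chain in buckets:
        visited += 1            # even an empty bucket has to be looked at
        entries += len(chain)
    return entries, visited


def make_table(n_buckets, n_sockets):
    """Spread n_sockets placeholder entries round-robin over n_buckets."""
    buckets = [[] for _ in range(n_buckets)]
    for i in range(n_sockets):
        buckets[i % n_buckets].append(i)
    return buckets


if __name__ == "__main__":
    # Same 100 sockets, increasingly large tables.
    for n_buckets in (1024, 64 * 1024, 512 * 1024):
        found, visited = dump_table(make_table(n_buckets, 100))
        print(f"{n_buckets:>7} buckets: {found} sockets found, {visited} buckets visited")
```

Skipping the lock for empty buckets, as mentioned above, would make
each visit cheaper but would not reduce the number of visits.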
