Re: [RFC] how to get the size of a malloc(9) block ?

2013-11-29 Thread Tim Kientzle

On Nov 29, 2013, at 3:44 PM, jb  wrote:

> Luigi Rizzo  iet.unipi.it> writes:
> 
>> ... 
>> There is a difference between applications peeking into
>> implementation details that should be hidden, and providing
>> instead limited and specific information through a well defined API.
>> ...
> 
> Right.
> 
> If you want to improve memory management, that is, have the system (kernel
> or user space) handle memory reallocation intelligently and transparently
> to the user, then aim at a well defined API:

Don’t forget:

 * Request a block of “at least N bytes” and have the
allocator tell you what it *really* allocated.

This allows applications to use memory more efficiently
by taking advantage of over-allocation when it happens.
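In user space, Tim's "at least N bytes" request can be approximated today on allocators that provide malloc_usable_size(). The sketch below assumes glibc (<malloc.h>) or FreeBSD (<malloc_np.h>). The helper name alloc_at_least() is made up for illustration and is not an existing API:

```c
#include <assert.h>
#include <stdlib.h>
#if defined(__FreeBSD__)
#include <malloc_np.h>   /* malloc_usable_size() on FreeBSD */
#else
#include <malloc.h>      /* malloc_usable_size() in glibc */
#endif

/*
 * Hypothetical helper (not an existing API): allocate at least n bytes
 * and store in *actual the size the allocator reports as usable.
 */
static void *alloc_at_least(size_t n, size_t *actual)
{
    void *p = malloc(n);

    if (p == NULL) {
        *actual = 0;
        return NULL;
    }
    *actual = malloc_usable_size(p);
    assert(*actual >= n);   /* never less than what was requested */
    return p;
}
```

A growable buffer would then record the reported size, not the requested one, as its capacity. Recent versions of the Linux malloc_usable_size(3) manual page discourage writing into the excess without first calling realloc(). So, outside the allocator itself, the reported size is best treated as a hint.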

Tim

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: [RFC] how to get the size of a malloc(9) block ?

2013-11-29 Thread dt71

What is the supposed definition of malloc_usable_size(p) in a hypothetical, 
upcoming C standard? With the rest of the C standard remaining the same, one 
could try:

Definition: The value of malloc_usable_size(p) is the amount of space allocated 
for object p, plus the amount of space after object p that can currently be 
written without messing up other objects or memory-management areas.

This definition is practically useless, because data written to the slack space
after the object could be claimed by calls to alloc()ish functions, perhaps by
other threads. Another attempt at defining something useful:

Definition: The value of malloc_usable_size(p) is the amount of space allocated 
for object p, plus the amount of space after object p that can be written 
without messing up other objects or memory-management areas, while alloc()ish 
functions are not called on p.

This raises the question: how much is usually overallocated? In some
implementations, usually just a few bytes (say, when the minimal allocation
unit is 8 bytes); where it is more than that, the memory manager is arguably
quite space-leaky.

It appears that it's not possible to make a proper API with 
malloc_usable_size() included, at least when multi-threading is involved (i.e.,
in the modern world).

However, it is still useful to create an API that supports the following cases:

- A program knows how to adapt to memory fragmentation without moving an
ever-growing but chainable array of data.
- A program would become faster if it knew when moving is required; then the
program could update various pointer-based (as opposed to array-index-based)
references to the object being moved. (Just like when memory is defragmented
in a garbage-collected programming language.)
- A program requires more memory in real time, which means it must either
receive more memory immediately and use it, or signal a real-time failure.

So new flags could be [1]:
- realloc_flags(p, s, REALLOCF_NO_MOVE): Resize object p, without moving it, to 
size s. With this restriction, when requesting more memory, and the specified 
amount isn't available, don't do anything (when requesting less memory, always 
succeed).
- realloc_flags(p, s, REALLOCF_NO_MOVE | REALLOCF_ELASTIC): Resize object p, 
without moving it, to size s. With this restriction, when requesting more 
memory, and the specified amount isn't available, reserve as much as possible 
(when requesting less memory, always succeed).
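The two flag combinations can be modelled in a few lines. This is only a sketch, under several assumptions:

- realloc_flags(), REALLOCF_NO_MOVE and REALLOCF_ELASTIC exist nowhere except in the proposal above.
- The flag values and the extra *reserved out-parameter are invented for this sketch.
- The emulation sits on top of malloc_usable_size() (glibc or FreeBSD) instead of inside the allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#if defined(__FreeBSD__)
#include <malloc_np.h>
#else
#include <malloc.h>
#endif

/* Hypothetical flag values: no C library defines these. */
#define REALLOCF_NO_MOVE  0x1
#define REALLOCF_ELASTIC  0x2

/*
 * Emulation of the proposed realloc_flags(p, s, flags) outside the
 * allocator. Without access to allocator internals it can only "grow"
 * into slack already reported by malloc_usable_size(), so it is far more
 * conservative than a real implementation would be. The out-parameter
 * *reserved is an addition of this sketch: the proposal does not say how
 * an ELASTIC caller learns how much it actually got.
 */
static void *realloc_flags(void *p, size_t s, int flags, size_t *reserved)
{
    assert(p != NULL && s > 0);

    if (!(flags & REALLOCF_NO_MOVE)) {
        /* No restriction: behave like plain realloc(), which may move. */
        void *q = realloc(p, s);
        *reserved = (q != NULL) ? malloc_usable_size(q) : 0;
        return q;
    }

    size_t usable = malloc_usable_size(p);

    if (s <= usable) {
        /* Shrinking, or growing within existing slack: always succeeds. */
        *reserved = s;
        return p;
    }
    if (flags & REALLOCF_ELASTIC) {
        /* Not enough room: keep p where it is and reserve what exists. */
        *reserved = usable;
        return p;
    }
    /* NO_MOVE alone: the block is left untouched and the call fails. */
    *reserved = 0;
    return NULL;
}
```

A real implementation inside the allocator could also take free neighbouring memory into account, which is exactly the information an outside wrapper lacks.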

On the other hand, consider a hypothetical scenario in which realloc()
would like to jump at the opportunity to move the object to a different space,
say, for the purpose of condensing slack space, when statistics show that 
allocated areas have plenty of holes. This means that the design of the new API 
can have more goals:

- The allocator implementation should be able to shape the workings of a 
program at quick-realloc points, for example, by coaxing it to call realloc() 
when memory is very scattered.
- The program should always be able to take advantage of a quick-realloc 
functionality, for example, to support certain real-time requirements of 
applications, to the extent reasonably possible within the implementation.

For this, there could be a REALLOCF_FORCE flag, to be used in real-time 
scenarios. Without the flag, the call can be expected to be rejected on the 
basis of some implementation-specific preference, such as anti-fragmentation.

Does anyone see any shortcomings in this API?

[1] When such a distinction makes sense and is supported (not stubbed) in the 
current architecture, environment, implementation, etc.


10.0-RELEASE status update

2013-11-29 Thread Glen Barber
Quick 10.0-RELEASE status update:

 - iconv(3) changes have been made in head/, and merged to stable/10
   today.

 - Two MFCs are undergoing review, one of which I will commit right
   before updating the stable/10 branch name to reflect '-BETA4'.[1]

 - Builds for 10.0-BETA4 will begin tomorrow.  The schedule page on the
   website will be updated to reflect the start of the -BETA4 builds, as
   well as updating the remainder of the schedule to reflect the
   adjustments after the delay.

[1] - Important note to those tracking stable/10:  An update will be
committed tomorrow that will disable automatic creation of pkg.conf(5).
Those installing new systems from 10.0-BETA4 should experience no
trouble, as the updated pkg(8) version (pkg-1.2.1) should be available
around the same time 10.0-BETA4 is announced.  This affects the pkg(8)
bootstrap functionality *only*.  Those with bootstrapped pkg(8) will not
be affected.  For those doing source-based upgrades from stable/10 *and*
who do not have pkg(8) already bootstrapped, there will be a brief (about
1 day, "brief" being relative) window of time where pkg(8) (versions
less than 1.2) will not have a pre-configured pkg.conf(5).  This will
also be noted in the 10.0-BETA4 announcement mail.[2]

[2] - I believe this is a non-issue, since usr.sbin/pkg/config.c has
pkg.FreeBSD.org set as the packagesite, but I am sure I am overlooking
something obvious that will prove me wrong.  So, that is why the
"important note" is longer than the actual status update.  :-)

Thank you for your patience.

Glen
On behalf of:   re@





Re: [RFC] how to get the size of a malloc(9) block ?

2013-11-29 Thread Luigi Rizzo
On Fri, Nov 29, 2013 at 5:02 PM, jb  wrote:

> Luigi Rizzo  iet.unipi.it> writes:
>
> > ...
> > > If you want to improve memory management, that is, have the system
> > > (kernel or user space) handle memory reallocation intelligently and
> > > transparently to the user, then aim at a well defined API:
> > > - reallocate "with no copy", which means new space appended (taking
> > >   into account *usable size*, a hidden-to-user implementation
> > >   detail), if possible
> > > - otherwise fail, and let the user decide about reallocation "with
> > >   copy" or allocation of a new space
> > >
> >
> > i respectfully disagree :) but am not pushing to add ksize.
> > Just note that both mine and your "well defined API" leak details:
> >
> > yours is (A) "I may be overallocating but won't tell you how much";
> > mine is  (B) "I may be overallocating and here is exactly how much".
> >
> > Now if I may make a comparison with going shopping,
> > I'd rather hear the final price from the seller (case B),
> > than having to guess by repeated trial and error,
> > which is what case A leads to if i really want to figure out.
> > ...
>
> This is not necessarily true - I omitted the details of reallocation
> implementation on purpose.
> From the caller's point of view, if it requested allocation of memory
> size, then that's what it wanted in the first place. If it got it, then
> there is no other info needed.
>

This is not what we are discussing.

We are discussing the case where the caller,
_before_ requesting extra memory, would like to know
how much space is available to make
different decisions such as

1. realloc unconditionally
2. give up
3. allocate a separate block and chain to it
4. reduce its requirements and live with what extra space
   is available (if any).

Your suggested flags support #1 and #2 directly,
#3 can be simulated with realloc(NO_ALLOC) + malloc(),
but they prevent #4.
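To make the four options concrete, here is a hypothetical decision routine. A caller could run it once it knows the usable size of its block, for example from a ksize()-style call in the kernel or from malloc_usable_size() in user space. The enum, the function and all thresholds are invented for illustration. Note that option 4 needs the slack size itself, which a pass/fail no-copy flag does not report:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Outcomes mirroring options 1-4, plus the free case in which the
 * request already fits in the block's slack. */
enum growth {
    GROW_IN_PLACE,   /* fits in slack reported by the allocator */
    GROW_REALLOC,    /* option 1: realloc (and possibly copy) */
    GROW_GIVE_UP,    /* option 2 */
    GROW_CHAIN,      /* option 3: allocate a separate block and link it */
    GROW_REDUCED     /* option 4: accept whatever slack is available */
};

/*
 * Hypothetical policy. used/usable describe the current block; want is
 * the desired extra space and min_want the least extra space worth
 * taking; copy_limit is the largest payload the caller will memcpy.
 */
static enum growth choose_growth(size_t used, size_t usable, size_t want,
                                 size_t min_want, size_t copy_limit,
                                 bool chainable)
{
    assert(used <= usable && min_want <= want);
    size_t slack = usable - used;

    if (want <= slack)
        return GROW_IN_PLACE;
    if (used <= copy_limit)              /* copying is cheap enough */
        return GROW_REALLOC;
    if (min_want > 0 && slack >= min_want)
        return GROW_REDUCED;
    if (chainable)
        return GROW_CHAIN;
    return GROW_GIVE_UP;
}
```

For example, with 100 bytes used in a 128-byte block, a request for 20 more bytes is satisfied in place. The same block asked for 100 more, with copying capped at 64 bytes, falls back to the 28 bytes of slack if the caller can live with 16 or more.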

cheers
luigi

> Next, if the caller came to the conclusion that more would be needed, then
> it should ask for memory reallocation, trusting that the system will do it
> in the most efficient way.
> If the caller wants to influence that process, then proper option(s) are
> needed in reallocation API, e.g.:
> - with no copy
> - with copy
> That means one call with options, with a specific (wanted by user) result.
> Of course, thinking thru the options (default, mutual exclusion, etc) is
> an important process and subject to RFC.
> A user-empowering API. No magic, no hacks.
>
> So, how about Request-for-Enhancement to GNU C lib, and the ugly hacks
> will disappear quickly.
>
> jb
>
>
>
>



-- 
-+---
 Prof. Luigi RIZZO, ri...@iet.unipi.it  . Dip. di Ing. dell'Informazione
 http://www.iet.unipi.it/~luigi/. Universita` di Pisa
 TEL  +39-050-2211611   . via Diotisalvi 2
 Mobile   +39-338-6809875   . 56122 PISA (Italy)
-+---


Re: [RFC] how to get the size of a malloc(9) block ?

2013-11-29 Thread Luigi Rizzo
On Fri, Nov 29, 2013 at 4:49 PM, Adrian Chadd  wrote:

> The reason I wouldn't implement this is to avoid having code that
> _relies_ on this behaviour in order to function or perform well.
>


nobody ever said (or could reasonably expect to do) that.

Applications don't know if the allocator overallocates,
so they have no hope unless they work even without the feature.

This is only about giving them an option to improve
performance in those (rare ?) cases where, as i showed,
knowing the underlying allocation size may lead to
better usage of memory.


>
> Heck, it may not even be portable to other operating systems. Except
> Linux, I guess.
>
>
in userspace, as jb commented, all major OSes have it
(malloc_usable_size() on FreeBSD and Linux,
_msize() on Windows, malloc_size() on OSX).
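Below is a minimal portability wrapper over the user-space calls listed above. It assumes that any platform other than Windows, macOS and FreeBSD declares malloc_usable_size() in <malloc.h>; this holds for glibc, but other C libraries may differ. The names usable_size() and USABLE_SIZE are illustrative. The wrapper only applies to memory from the malloc() family, not to malloc(9) in the kernel:

```c
#include <assert.h>
#include <stdlib.h>
#if defined(_WIN32)
#include <malloc.h>          /* _msize() */
#define USABLE_SIZE(p) _msize(p)
#elif defined(__APPLE__)
#include <malloc/malloc.h>   /* malloc_size() */
#define USABLE_SIZE(p) malloc_size(p)
#elif defined(__FreeBSD__)
#include <malloc_np.h>       /* malloc_usable_size() */
#define USABLE_SIZE(p) malloc_usable_size(p)
#else
#include <malloc.h>          /* glibc malloc_usable_size() */
#define USABLE_SIZE(p) malloc_usable_size(p)
#endif

/* Usable size of a block obtained from malloc(), calloc() or realloc(). */
static size_t usable_size(void *p)
{
    assert(p != NULL);
    return (size_t)USABLE_SIZE(p);
}
```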


In the kernel, I have no idea, but porting kernel code
across systems is a nightmare anyways...

cheers
luigi


Re: [RFC] how to get the size of a malloc(9) block ?

2013-11-29 Thread jb
Luigi Rizzo  iet.unipi.it> writes:

> ...
> > If you want to improve memory management, that is, have the system (kernel
> > or user space) handle memory reallocation intelligently and transparently
> > to the user, then aim at a well defined API:
> > - reallocate "with no copy", which means new space appended (taking into
> >   account *usable size*, a hidden-to-user implementation detail), if
> >   possible
> > - otherwise fail, and let the user decide about reallocation "with copy"
> >   or allocation of a new space
> >
> 
> i respectfully disagree :) but am not pushing to add ksize.
> Just note that both mine and your "well defined API" leak details:
> 
> yours is (A) "I may be overallocating but won't tell you how much";
> mine is  (B) "I may be overallocating and here is exactly how much".
> 
> Now if I may make a comparison with going shopping,
> I'd rather hear the final price from the seller (case B),
> than having to guess by repeated trial and error,
> which is what case A leads to if i really want to figure out.
> ...

This is not necessarily true - I omitted the details of reallocation
implementation on purpose.
From the caller's point of view, if it requested allocation of memory
size, then that's what it wanted in the first place. If it got it, then
there is no other info needed.
Next, if the caller came to the conclusion that more would be needed, then
it should ask for memory reallocation, trusting that the system will do it
in the most efficient way.
If the caller wants to influence that process, then proper option(s) are
needed in reallocation API, e.g.:
- with no copy
- with copy
That means one call with options, with a specific (wanted by user) result.
Of course, thinking thru the options (default, mutual exclusion, etc) is
an important process and subject to RFC.
A user-empowering API. No magic, no hacks.

So, how about Request-for-Enhancement to GNU C lib, and the ugly hacks
will disappear quickly.

jb





Re: [RFC] how to get the size of a malloc(9) block ?

2013-11-29 Thread Adrian Chadd
The reason I wouldn't implement this is to avoid having code that
_relies_ on this behaviour in order to function or perform well.

Heck, it may not even be portable to other operating systems. Except
Linux, I guess.


-adrian


Re: [RFC] how to get the size of a malloc(9) block ?

2013-11-29 Thread Luigi Rizzo
On Fri, Nov 29, 2013 at 3:44 PM, jb  wrote:

> Luigi Rizzo  iet.unipi.it> writes:
>
> > ...
> > There is a difference between applications peeking into
> > implementation details that should be hidden, and providing
> > instead limited and specific information through a well defined API.
> > ...
>
> Right.
>
> If you want to improve memory management, that is, have the system (kernel
> or user space) handle memory reallocation intelligently and transparently
> to the user, then aim at a well defined API:
> - reallocate "with no copy", which means new space appended (taking into
>   account *usable size*, a hidden-to-user implementation detail), if
>   possible
> - otherwise fail, and let the user decide about reallocation "with copy"
>   or allocation of a new space
>


i respectfully disagree :) but am not pushing to add ksize.
Just note that both mine and your "well defined API" leak details:

yours is (A) "I may be overallocating but won't tell you how much";
mine is  (B) "I may be overallocating and here is exactly how much".

Now if I may make a comparison with going shopping,
I'd rather hear the final price from the seller (case B),
than having to guess by repeated trial and error,
which is what case A leads to if i really want to figure out.
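The cost of case (A) can be sketched directly. Here a hypothetical try_grow_in_place() stands in for a pass/fail "grow without copying" call. Since no such call exists, it is emulated with malloc_usable_size() (glibc or FreeBSD), which is precisely the case (B) answer the search is trying to rediscover:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#if defined(__FreeBSD__)
#include <malloc_np.h>
#else
#include <malloc.h>
#endif

/* Stand-in for a case-(A) API that only answers yes or no. */
static bool try_grow_in_place(void *p, size_t n)
{
    return n <= malloc_usable_size(p);
}

/*
 * Binary search for the largest size in [lo, hi] accepted by
 * try_grow_in_place(). Requires lo itself to be accepted; *calls
 * counts the probes made during the search.
 */
static size_t probe_capacity(void *p, size_t lo, size_t hi, unsigned *calls)
{
    assert(lo <= hi && try_grow_in_place(p, lo));
    *calls = 0;
    while (lo < hi) {
        size_t mid = lo + (hi - lo + 1) / 2;   /* round up so lo advances */
        ++*calls;
        if (try_grow_in_place(p, mid))
            lo = mid;
        else
            hi = mid - 1;
    }
    return lo;
}
```

Over a 1 MiB search range this takes about 20 probes where case (B) needs one query. With a real allocator, every successful probe would also resize the block.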


> The malloc_usable_size() is a hack.
> The extra space allocated or not due to fragmentation, alignment, etc, is
> an internal by-product, irrelevant to original memory alloc request, and it
> should not be leaked, also because its details may change in future API
> implementations.
> So, these memory allocation functions leaking implementation details, and
> the two derived functions, ksize() and malloc_usable_size() (and other
> derivatives like malloc_size() in Mac OS X), are violations of a clean,
> safe, and maintainable API.
>
> Note that malloc_usable_size() is a GNU C Library extension, not part of
> the Single UNIX Specification.
>

Honestly i did not even know they existed until a few days ago;
but the fact that many different systems have come out with similar
extensions at least makes me wonder whether the SUS missed it.

cheers
luigi


Re: [RFC] how to get the size of a malloc(9) block ?

2013-11-29 Thread jb
Luigi Rizzo  iet.unipi.it> writes:

> ... 
> There is a difference between applications peeking into
> implementation details that should be hidden, and providing
> instead limited and specific information through a well defined API.
> ...

Right.

If you want to improve memory management, that is, have the system (kernel
or user space) handle memory reallocation intelligently and transparently
to the user, then aim at a well defined API:
- reallocate "with no copy", which means new space appended (taking into
  account *usable size*, a hidden-to-user implementation detail), if
  possible
- otherwise fail, and let the user decide about reallocation "with copy"
  or allocation of a new space

The malloc_usable_size() is a hack.
The extra space allocated or not due to fragmentation, alignment, etc, is
an internal by-product, irrelevant to original memory alloc request, and it
should not be leaked, also because its details may change in future API
implementations.
So, these memory allocation functions leaking implementation details, and
the two derived functions, ksize() and malloc_usable_size() (and other
derivatives like malloc_size() in Mac OS X), are violations of a clean,
safe, and maintainable API.

Note that malloc_usable_size() is a GNU C Library extension, not part of
the Single UNIX Specification.

jb
 




Re: [PATCH] SO_REUSEADDR and SO_REUSEPORT behaviour

2013-11-29 Thread Adrian Chadd
Sure, is there a TCP version of this patch floating around? How's it
doing load balancing to multiple listeners?


-a

On 29 November 2013 11:28, Oleg Moskalenko  wrote:
> It would be nice to have this feature compiled and supported in the FreeBSD
> kernel by default.
>
> Thanks
> Oleg
>
>
> On Fri, Nov 29, 2013 at 11:01 AM, Ermal Luçi  wrote:
>
>> And some better marketing from Dragonfly about it
>> http://forum.nginx.org/read.php?29,241283,241283 :)
>>
>>
>> On Fri, Nov 29, 2013 at 7:55 PM, Ermal Luçi  wrote:
>>
>>> Also some discussions and improvements to it.
>>>
>>> http://unix.derkeiler.com/Mailing-Lists/FreeBSD/net/2013-09/msg00165.html
>>>
>>>
>>> On Fri, Nov 29, 2013 at 7:42 PM, Ermal Luçi  wrote:
>>>
 Well, it seems Dragonfly already has some version of it, from commit [1].

 In FreeBSD there is a framework for this, enabled by defining PCBGROUP.
 It is explained at [2] and [3].
 It can achieve approximately the same features as SO_REUSEPORT on Linux.
 The only things missing are the marketing behind it and, I think, better
 RSS support.
 Judging by the dates, the support was there before Linux, so all you guys
 looking for it can experiment with it.

 What I was trying to accomplish was something other than a performance
 improvement, and maybe to put a sysctl behind it to make it more
 acceptable.

 [1]
 http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/740d1d9f7b7bf9c9c021abb8197718d7a2d441c9
 [2]
 http://fxr.watson.org/fxr/source/netinet/in_pcbgroup.c?im=bigexcerpts#L51
 [3]
 http://lists.freebsd.org/pipermail/svn-src-head/2011-June/028190.html


 On Fri, Nov 29, 2013 at 7:03 PM, Oleg Moskalenko wrote:

> Tim, you are wrong. Read the definition of "multicast", and read how
> UDP and TCP sockets work in Linux 3.9+ kernels.
>
> Oleg .
>
>
> On Fri, Nov 29, 2013 at 9:59 AM, Tim Kientzle wrote:
>
>>
>> On Nov 29, 2013, at 4:04 AM, Ermal Luçi  wrote:
>>
>> > Hello,
>> >
>> > since SO_REUSEADDR and SO_REUSEPORT are supposed to allow two
>> > daemons to share the same port and possibly listening ip …
>>
>> These flags are used with TCP-based servers.
>>
>> I’ve used them to make software upgrades go more smoothly.
>> Without them, the following often happens:
>>
>> * Old server stops.  In the process, all of its TCP connections are
>> closed.
>>
>> * Connections to old server remain in the TCP connection table until
>> the remote end can acknowledge.
>>
>> * New server starts.
>>
>> * New server tries to open port but fails because that port is “still
>> in use” by connections in the TCP connection table.
>>
>> With these flags, the new server can open the port even though
>> it is “still in use” by existing connections.
>>
>>
>> > This is not the case today.
>> > Only multicast sockets seem to have the behaviour of broadcasting
>> > the data to all sockets sharing the same properties through these
>> > options!
>>
>> That is what multicast is for.
>>
>> If you want the same data sent to all listeners, then
>> that is multicast behavior and you should be using
>> a multicast socket.
>>
>> > The patch at [1] implements/corrects the behaviour for UDP sockets.
>>
>> You’re trying to turn all UDP sockets with those options
>> into multicast sockets.
>>
>> If you want a multicast socket, you should ask for one.
>>
>> Tim
>>
>> ___
>> freebsd-...@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>> To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
>>
>
>


 --
 Ermal

>>>
>>>
>>>
>>> --
>>> Ermal
>>>
>>
>>
>>
>> --
>> Ermal
>>


Re: [RFC] how to get the size of a malloc(9) block ?

2013-11-29 Thread Luigi Rizzo
On Thu, Nov 28, 2013 at 7:13 AM, jb  wrote:

> Luigi Rizzo  iet.unipi.it> writes:
>
> > ...
> > But I don't understand why you find ksize()/malloc_usable_size()
> > dangerous.
> > ...
>
> The original crime is committed when *usable size* (an implementation
> detail) is exported (leaked) to the caller.
> To be blunt, when a caller requests memory of a certain size, and its
> request is satisfied, then it is not its business to learn details
> beyond that (and they should not be offered either).
> The API should be sanitized, in kernel and user space.
> Otherwise, all kinds of charlatans will try to play hair-raising games
> with it.
> If the caller wants to track the *requested size* programmatically, it
> is its business to do it, and it can be done very easily.
>
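jb's point that a caller can track the requested size itself usually means keeping a small header in front of each block. A minimal sketch follows; sized_malloc(), sized_size() and sized_free() are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Header stored just before the caller's data. The union with
 * max_align_t keeps the returned pointer aligned for any object type. */
typedef union {
    size_t size;          /* size the caller originally requested */
    max_align_t align;
} size_hdr;

static void *sized_malloc(size_t n)
{
    if (n > SIZE_MAX - sizeof(size_hdr))
        return NULL;                    /* the sum below would overflow */
    size_hdr *h = malloc(sizeof *h + n);
    if (h == NULL)
        return NULL;
    h->size = n;
    return h + 1;                       /* data starts after the header */
}

static size_t sized_size(const void *p)
{
    return ((const size_hdr *)p - 1)->size;
}

static void sized_free(void *p)
{
    if (p != NULL)
        free((size_hdr *)p - 1);
}
```

This records only what was requested. Any slack the allocator added stays invisible, and that slack is the information the rest of the thread is arguing about.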


There is a difference between applications peeking into
implementation details that should be hidden, and providing
instead limited and specific information through a well defined API.

In general (not in the specific code I am handling
and not something I personally need),
what the caller might want to do is optimize its requests
according to how the system behaves, and it cannot do that
without some help from below.

I have seen the following types of comments in this thread:

- "you should get it right the first time and never realloc"
  Maybe, but then the offending API is realloc(), not ksize()

- "build your own allocator"
  Yes i do it when it makes sense,
  but sometimes it is either overkill or a bad idea (as it loses
  opportunities for global optimizations, duplicates code,
  takes memory in subsystem-specific freelists...)

- "what if ksize()/malloc_usable_size() lies ?"
  Well, that would be a bug in the allocator: if it says
  the memory is usable, it must be usable, period.

- "rather than ksize() i'll give you a fix for one use case"
  (the NO_REALLOC flag to realloc()).
  This i think would be a mistake -- it acknowledges the need
  for exposing some information but then only provides a
  specific fix for one use case.

I'll just restate that there are multiple situations where
an application might use some information on actual allocation
sizes:

- when it needs to extend memory and has a choice between
  a cheap realloc() (if extra space is available),
  chaining blocks (when the memcpy would be too expensive),
  or giving up and living with whatever space is available.

- when it has freedom in picking the block size
  and so it wants to optimize its requests based on
  what the underlying allocator does.
  As an example, long ago FreeBSD was really suboptimal
  when you allocated blocks whose size was a power of 2,
  because the metadata was inline.
  These days, there is a different issue:
  powers of 2 are ok but blocks 2049 bytes
  and above seem to be padded to a multiple of 2048,
  leading to a huge overhead in some cases.
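The arithmetic behind "huge overhead" can be modelled simply. Under the rounding described above, a 2049-byte request occupies 4096 bytes and wastes 2047 of them, while an exact multiple of 2048 wastes nothing. The model below reflects Luigi's observation, not the malloc(9) code, and covers only sizes above 2048 bytes. The function names are made up:

```c
#include <assert.h>
#include <stddef.h>

/* Assumed padding quantum for requests above 2048 bytes (from the text). */
#define PAD_QUANTUM ((size_t)2048)

/* Modeled block size for a request of n > 2048 bytes: round up to the
 * next multiple of PAD_QUANTUM. */
static size_t padded_size(size_t n)
{
    assert(n > PAD_QUANTUM);   /* the model says nothing about smaller sizes */
    return (n + PAD_QUANTUM - 1) / PAD_QUANTUM * PAD_QUANTUM;
}

/* Percentage of the modeled block the caller cannot use (rounded down). */
static unsigned waste_percent(size_t n)
{
    size_t block = padded_size(n);
    return (unsigned)((block - n) * 100 / block);
}
```

Under this model a 2049-byte request wastes 49% of its block. A caller free to choose its buffer size would therefore prefer multiples of 2048. On a real system the size classes should be measured rather than assumed.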

cheers
luigi


Re: [PATCH] SO_REUSEADDR and SO_REUSEPORT behaviour

2013-11-29 Thread Ermal Luçi
And some better marketing from Dragonfly about it
http://forum.nginx.org/read.php?29,241283,241283 :)


On Fri, Nov 29, 2013 at 7:55 PM, Ermal Luçi  wrote:

> Also some discussions and improvements to it.
>
> http://unix.derkeiler.com/Mailing-Lists/FreeBSD/net/2013-09/msg00165.html
>
>
> On Fri, Nov 29, 2013 at 7:42 PM, Ermal Luçi  wrote:
>
>> Well, it seems Dragonfly already has some version of it, from commit [1].
>>
>> In FreeBSD there is a framework for this, enabled by defining PCBGROUP.
>> It is explained at [2] and [3].
>> It can achieve approximately the same features as SO_REUSEPORT on Linux.
>> The only things missing are the marketing behind it and, I think, better
>> RSS support.
>> Judging by the dates, the support was there before Linux, so all you guys
>> looking for it can experiment with it.
>>
>> What I was trying to accomplish was something other than a performance
>> improvement, and maybe to put a sysctl behind it to make it more
>> acceptable.
>>
>> [1]
>> http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/740d1d9f7b7bf9c9c021abb8197718d7a2d441c9
>> [2]
>> http://fxr.watson.org/fxr/source/netinet/in_pcbgroup.c?im=bigexcerpts#L51
>> [3] http://lists.freebsd.org/pipermail/svn-src-head/2011-June/028190.html
>>
>>
>> On Fri, Nov 29, 2013 at 7:03 PM, Oleg Moskalenko wrote:
>>
>>> Tim, you are wrong. Read the definition of "multicast", and read how
>>> UDP and TCP sockets work in Linux 3.9+ kernels.
>>>
>>> Oleg .
>>>
>>>
>>> On Fri, Nov 29, 2013 at 9:59 AM, Tim Kientzle wrote:
>>>

 On Nov 29, 2013, at 4:04 AM, Ermal Luçi  wrote:

 > Hello,
 >
 > since SO_REUSEADDR and SO_REUSEPORT are supposed to allow two daemons
 > to share the same port and possibly listening ip …

 These flags are used with TCP-based servers.

 I’ve used them to make software upgrades go more smoothly.
 Without them, the following often happens:

 * Old server stops.  In the process, all of its TCP connections are
 closed.

 * Connections to old server remain in the TCP connection table until
 the remote end can acknowledge.

 * New server starts.

 * New server tries to open port but fails because that port is “still
 in use” by connections in the TCP connection table.

 With these flags, the new server can open the port even though
 it is “still in use” by existing connections.


 > This is not the case today.
 > Only multicast sockets seem to have the behaviour of broadcasting the
 > data to all sockets sharing the same properties through these options!

 That is what multicast is for.

 If you want the same data sent to all listeners, then
 that is multicast behavior and you should be using
 a multicast socket.

 > The patch at [1] implements/corrects the behaviour for UDP sockets.

 You’re trying to turn all UDP sockets with those options
 into multicast sockets.

 If you want a multicast socket, you should ask for one.

 Tim


>>>
>>>
>>
>>
>> --
>> Ermal
>>
>
>
>
> --
> Ermal
>



-- 
Ermal


Re: [PATCH] SO_REUSEADDR and SO_REUSEPORT behaviour

2013-11-29 Thread Ermal Luçi
Also some discussions and improvements to it.

http://unix.derkeiler.com/Mailing-Lists/FreeBSD/net/2013-09/msg00165.html


On Fri, Nov 29, 2013 at 7:42 PM, Ermal Luçi  wrote:

> Well, it seems Dragonfly already has some version of it, from commit [1].
>
> In FreeBSD there is a framework for this, enabled by defining PCBGROUP.
> It is explained at [2] and [3].
> It can achieve approximately the same features as SO_REUSEPORT on Linux.
> The only things missing are the marketing behind it and, I think, better
> RSS support.
> Judging by the dates, the support was there before Linux, so all you guys
> looking for it can experiment with it.
>
> What I was trying to accomplish was something other than a performance
> improvement, and maybe to put a sysctl behind it to make it more
> acceptable.
>
> [1]
> http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/740d1d9f7b7bf9c9c021abb8197718d7a2d441c9
> [2]
> http://fxr.watson.org/fxr/source/netinet/in_pcbgroup.c?im=bigexcerpts#L51
> [3] http://lists.freebsd.org/pipermail/svn-src-head/2011-June/028190.html
>
>
> On Fri, Nov 29, 2013 at 7:03 PM, Oleg Moskalenko wrote:
>
>> Tim, you are wrong. Read the definition of "multicast", and read how UDP
>> and TCP sockets work in Linux 3.9+ kernels.
>>
>> Oleg .
>>
>>
>> On Fri, Nov 29, 2013 at 9:59 AM, Tim Kientzle wrote:
>>
>>>
>>> On Nov 29, 2013, at 4:04 AM, Ermal Luçi  wrote:
>>>
>>> > Hello,
>>> >
>>> > since SO_REUSEADDR and SO_REUSEPORT are supposed to allow two daemons
>>> > to share the same port and possibly listening ip …
>>>
>>> These flags are used with TCP-based servers.
>>>
>>> I’ve used them to make software upgrades go more smoothly.
>>> Without them, the following often happens:
>>>
>>> * Old server stops.  In the process, all of its TCP connections are
>>> closed.
>>>
>>> * Connections to old server remain in the TCP connection table until the
>>> remote end can acknowledge.
>>>
>>> * New server starts.
>>>
>>> * New server tries to open port but fails because that port is “still in
>>> use” by connections in the TCP connection table.
>>>
>>> With these flags, the new server can open the port even though
>>> it is “still in use” by existing connections.
>>>
>>>
>>> > This is not the case today.
>>> > Only multicast sockets seem to have the behaviour of broadcasting the
>>> > data to all sockets sharing the same properties through these options!
>>>
>>> That is what multicast is for.
>>>
>>> If you want the same data sent to all listeners, then
>>> that is multicast behavior and you should be using
>>> a multicast socket.
>>>
>>> > The patch at [1] implements/corrects the behaviour for UDP sockets.
>>>
>>> You’re trying to turn all UDP sockets with those options
>>> into multicast sockets.
>>>
>>> If you want a multicast socket, you should ask for one.
>>>
>>> Tim
>>>
>>>
>>
>>
>
>
> --
> Ermal
>



-- 
Ermal


Re: [PATCH] SO_REUSEADDR and SO_REUSEPORT behaviour

2013-11-29 Thread Ermal Luçi
On Fri, Nov 29, 2013 at 6:59 PM, Tim Kientzle  wrote:

>
> On Nov 29, 2013, at 4:04 AM, Ermal Luçi  wrote:
>
> > Hello,
> >
> > since SO_REUSEADDR and SO_REUSEPORT are supposed to allow two daemons to
> > share the same port and possibly listening ip …
>
> These flags are used with TCP-based servers.
>

Everyone has their own use case!


>
> I’ve used them to make software upgrades go more smoothly.
> Without them, the following often happens:
>
> * Old server stops.  In the process, all of its TCP connections are closed.
>
> * Connections to old server remain in the TCP connection table until the
> remote end can acknowledge.
>
> * New server starts.
>
> * New server tries to open port but fails because that port is “still in
> use” by connections in the TCP connection table.
>
> With these flags, the new server can open the port even though
> it is “still in use” by existing connections.
>
>
> > This is not the case today.
> > Only multicast sockets seem to have the behaviour of broadcasting the data
> > to all sockets sharing the same properties through these options!
>
> That is what multicast is for.
>
>
Multicast has its defined scope and its applications, though I think this is
about interpreting the same socket options consistently and respecting what
they should do and how they should behave.


> If you want the same data sent to all listeners, then
> that is multicast behavior and you should be using
> a multicast socket.
>
> > The patch at [1] implements/corrects the behaviour for UDP sockets.
>
> You’re trying to turn all UDP sockets with those options
> into multicast sockets.
>

Not really. The idea is how to support the use case of having two daemons
using the same port number but speaking different protocols.
The best option would be to merge these daemons, but when you cannot, there
should be some support for it.
In the end, there are only 65k ports :).

Perhaps putting the feature behind a sysctl would be a further compromise?


>
> If you want a multicast socket, you should ask for one.
>
> Tim
>
>


-- 
Ermal


Re: [PATCH] SO_REUSEADDR and SO_REUSEPORT behaviour

2013-11-29 Thread Ermal Luçi
Well, it seems DragonFly already has some version of it, from commit [1].

In FreeBSD, the framework for this is there if you define PCBGROUP.
See also the explanation of it at [2] and [3].
It can achieve approximately the same features as Linux's SO_REUSEPORT.
The only things missing are the marketing behind it and, I think, better
RSS support.
Judging by the dates, the support was there before Linux had it, so all of
you looking for it can experiment with it.

What I was trying to accomplish was something other than a performance
improvement, and maybe put a sysctl behind it to make it more acceptable.

[1] http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/740d1d9f7b7bf9c9c021abb8197718d7a2d441c9
[2] http://fxr.watson.org/fxr/source/netinet/in_pcbgroup.c?im=bigexcerpts#L51
[3] http://lists.freebsd.org/pipermail/svn-src-head/2011-June/028190.html


On Fri, Nov 29, 2013 at 7:03 PM, Oleg Moskalenko wrote:

> Tim, you are wrong. Read the definition of "multicast", and read how UDP
> and TCP sockets work in Linux 3.9+ kernels.
>
> Oleg .
>
>
> On Fri, Nov 29, 2013 at 9:59 AM, Tim Kientzle wrote:
>
>>
>> On Nov 29, 2013, at 4:04 AM, Ermal Luçi  wrote:
>>
>> > Hello,
>> >
>> > since SO_REUSEADDR and SO_REUSEPORT are supposed to allow two daemons to
>> > share the same port and possibly listening ip …
>>
>> These flags are used with TCP-based servers.
>>
>> I’ve used them to make software upgrades go more smoothly.
>> Without them, the following often happens:
>>
>> * Old server stops.  In the process, all of its TCP connections are
>> closed.
>>
>> * Connections to old server remain in the TCP connection table until the
>> remote end can acknowledge.
>>
>> * New server starts.
>>
>> * New server tries to open port but fails because that port is “still in
>> use” by connections in the TCP connection table.
>>
>> With these flags, the new server can open the port even though
>> it is “still in use” by existing connections.
>>
>>
>> > This is not the case today.
>> > Only multicast sockets seem to have the behaviour of broadcasting the data
>> > to all sockets sharing the same properties through these options!
>>
>> That is what multicast is for.
>>
>> If you want the same data sent to all listeners, then
>> that is multicast behavior and you should be using
>> a multicast socket.
>>
>> > The patch at [1] implements/corrects the behaviour for UDP sockets.
>>
>> You’re trying to turn all UDP sockets with those options
>> into multicast sockets.
>>
>> If you want a multicast socket, you should ask for one.
>>
>> Tim
>>
>>
>
>


-- 
Ermal


Re: [PATCH] SO_REUSEADDR and SO_REUSEPORT behaviour

2013-11-29 Thread Tim Kientzle

On Nov 29, 2013, at 4:04 AM, Ermal Luçi  wrote:

> Hello,
> 
> since SO_REUSEADDR and SO_REUSEPORT are supposed to allow two daemons to
> share the same port and possibly listening ip …

These flags are used with TCP-based servers.

I’ve used them to make software upgrades go more smoothly.
Without them, the following often happens:

* Old server stops.  In the process, all of its TCP connections are closed.

* Connections to old server remain in the TCP connection table until the remote 
end can acknowledge.

* New server starts.

* New server tries to open port but fails because that port is “still in use” 
by connections in the TCP connection table.

With these flags, the new server can open the port even though
it is “still in use” by existing connections.
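A minimal sketch of the restart pattern described above, assuming the POSIX sockets API (available on both FreeBSD and Linux). It sets only SO_REUSEADDR, the option normally used for this case, and sets it before bind(2). `open_listener()` and `bound_port()` are hypothetical helper names, and the loopback address is chosen only to keep the example self-contained.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open a TCP listener on 127.0.0.1:port
 * (port 0 lets the kernel choose). Returns the descriptor or -1. */
static int open_listener(unsigned short port)
{
    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0)
        return -1;

    /* Let bind(2) succeed while old connections on this port are still
     * in the connection table (e.g. TIME_WAIT). The option only helps
     * if it is set before bind(2). */
    int on = 1;
    if (setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0) {
        close(s);
        return -1;
    }

    struct sockaddr_in sin;
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_port = htons(port);
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
        listen(s, 16) < 0) {
        close(s);
        return -1;
    }
    return s;
}

/* Return the local port a socket is bound to (host byte order), or 0. */
static unsigned short bound_port(int s)
{
    struct sockaddr_in sin;
    socklen_t len = sizeof(sin);
    if (getsockname(s, (struct sockaddr *)&sin, &len) < 0)
        return 0;
    return ntohs(sin.sin_port);
}
```

A restarted server would call `open_listener()` with its fixed port; without the setsockopt() call, that bind can fail with EADDRINUSE while connections from the previous instance linger.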


> This is not the case today.
> Only multicast sockets seem to have the behaviour of broadcasting the data
> to all sockets sharing the same properties through these options!

That is what multicast is for.

If you want the same data sent to all listeners, then
that is multicast behavior and you should be using
a multicast socket.

> The patch at [1] implements/corrects the behaviour for UDP sockets.

You’re trying to turn all UDP sockets with those options
into multicast sockets.

If you want a multicast socket, you should ask for one.

Tim



Re: [PATCH] SO_REUSEADDR and SO_REUSEPORT behaviour

2013-11-29 Thread Julian Elischer

On 11/29/13, 8:04 PM, Ermal Luçi wrote:

> Hello,
>
> since SO_REUSEADDR and SO_REUSEPORT are supposed to allow two daemons to
> share the same port and possibly the listening IP, you would expect, if you
> bind two daemons with such options to the same port, to see the same traffic
> on both!

This is not how I interpret it. I presume it is to allow two
OUTGOING sessions from the same source.

> This is not the case today.
> Only multicast sockets seem to have the behaviour of broadcasting the data
> to all sockets sharing the same properties through these options!
>
> The patch at [1] implements/corrects the behaviour for UDP sockets.
> Is there anything to be corrected in that patch?
> Why has it not been provided before?
> Can it be committed to the tree?
> Any extra security checks for jails needed there?
>
>
> [1]
> https://github.com/pfsense/pfsense-tools/blob/master/patches/RELENG_10_0/udp_SO_REUSEADDR%2BPORT.diff





Re: [PATCH] SO_REUSEADDR and SO_REUSEPORT behaviour

2013-11-29 Thread Daniel Nebdal
On Fri, Nov 29, 2013 at 1:04 PM, Ermal Luçi  wrote:

> Hello,
>
> since SO_REUSEADDR and SO_REUSEPORT are supposed to allow two daemons to
> share the same port and possibly the listening IP, you would expect, if you
> bind two daemons with such options to the same port, to see the same traffic
> on both!
>
> This is not the case today.
> Only multicast sockets seem to have the behaviour of broadcasting the data
> to all sockets sharing the same properties through these options!
>
> The patch at [1] implements/corrects the behaviour for UDP sockets.
> Is there anything to be corrected in that patch?
> Why has it not been provided before?
> Can it be committed to the tree?
> Any extra security checks for jails needed there?
>
>
> [1]
>
> https://github.com/pfsense/pfsense-tools/blob/master/patches/RELENG_10_0/udp_SO_REUSEADDR%2BPORT.diff
>
> --
> Ermal


I understood it as working sort of like for TCP, where packets from a
given remote host+port all end up at exactly one of the local sockets? If
the idea is to split the workload over multiple threads holding their own
sockets listening to the same interface+port, wouldn't sending all packets
to all sockets all the time be kind of counterproductive?

Of course, I haven't actually used it much; I might be wrong.

-- 
Daniel Nebdal


[PATCH] SO_REUSEADDR and SO_REUSEPORT behaviour

2013-11-29 Thread Ermal Luçi
Hello,

since SO_REUSEADDR and SO_REUSEPORT are supposed to allow two daemons to
share the same port and possibly the listening IP, you would expect, if you
bind two daemons with such options to the same port, to see the same traffic
on both!

This is not the case today.
Only multicast sockets seem to have the behaviour of broadcasting the data
to all sockets sharing the same properties through these options!

The patch at [1] implements/corrects the behaviour for UDP sockets.
Is there anything to be corrected in that patch?
Why has it not been provided before?
Can it be committed to the tree?
Any extra security checks for jails needed there?


[1]
https://github.com/pfsense/pfsense-tools/blob/master/patches/RELENG_10_0/udp_SO_REUSEADDR%2BPORT.diff

-- 
Ermal


Re: [RFC] how to get the size of a malloc(9) block ?

2013-11-29 Thread Julian Elischer

On 11/29/13, 7:26 PM, Daniel Nebdal wrote:

> On Fri, Nov 29, 2013 at 11:59 AM, Gleb Smirnoff  wrote:
>
>> On Thu, Nov 28, 2013 at 03:13:53PM +, jb wrote:
>> j> > But I don't understand why you find ksize()/malloc_usable_size() dangerous.
>> j> > ...
>> j>
>> j> The original crime is committed when *usable size* (an implementation detail)
>> j> is exported (leaked) to the caller.
>> j> To be blunt, when a caller requests memory of a certain size, and its request is
>> j> satisfied, then it is not its business to learn details beyond that (and they
>> j> should not be offered as well).
>> j> The API should be sanitized, in kernel and user space.
>> j> Otherwise, all kinds of charlatans will try to play hair-raising games with it.
>> j> If the caller wants to track the *requested size* programmatically, it is its
>> j> business to do it and it can be done very easily.
>>
>> +1
>>
>> This is the kind of API that just shouldn't exist.
>>
>> --
>> Totus tuus, Glebius.
>
> Then again: using the "overflow" memory is only going to bite them if the
> API lies: if the return value is exactly "the size of the block you got
> allocated and can safely use until you free it", using it will by
> definition be safe. If the allocator later changes to, say, always
> allocate exact byte ranges, or to allocating blocks but having the option
> to fragment them later, then the return value would have to shrink to
> match, and any program using it would still DTRT.
>
> I'm completely ambivalent about adding it, though - it's not something I
> need, it's more stuff that needs to be handled if you change/rewrite the
> allocator, and it's not my decision.

I think that if you want to play games with expanding buffers etc.,
then you should write your own allocator.

You asked for X bytes. You should expect that you get X bytes and
nothing more...

Either that, or you should have asked for more in the first place.


> --
> Daniel Nebdal






Re: [RFC] how to get the size of a malloc(9) block ?

2013-11-29 Thread Daniel Nebdal
On Fri, Nov 29, 2013 at 11:59 AM, Gleb Smirnoff  wrote:

> On Thu, Nov 28, 2013 at 03:13:53PM +, jb wrote:
> j> > But I don't understand why you find ksize()/malloc_usable_size() dangerous.
> j> > ...
> j>
> j> The original crime is committed when *usable size* (an implementation detail)
> j> is exported (leaked) to the caller.
> j> To be blunt, when a caller requests memory of a certain size, and its request is
> j> satisfied, then it is not its business to learn details beyond that (and they
> j> should not be offered as well).
> j> The API should be sanitized, in kernel and user space.
> j> Otherwise, all kinds of charlatans will try to play hair-raising games with it.
> j> If the caller wants to track the *requested size* programmatically, it is its
> j> business to do it and it can be done very easily.
>
> +1
>
> This is the kind of API that just shouldn't exist.
>
> --
> Totus tuus, Glebius.
>


Then again: using the "overflow" memory is only going to bite them if the
API lies: if the return value is exactly "the size of the block you got
allocated and can safely use until you free it", using it will by
definition be safe. If the allocator later changes to, say, always
allocate exact byte ranges, or to allocating blocks but having the option
to fragment them later, then the return value would have to shrink to
match, and any program using it would still DTRT.
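For concreteness, a minimal userland sketch of the query being debated, assuming glibc (which declares `malloc_usable_size()` in `<malloc.h>`) or FreeBSD's libc (which declares it in `<malloc_np.h>`); the thread itself is about the kernel-side equivalent, which this does not show. The sketch only reads the reported size; whether code may then rely on the slack bytes is exactly the point in dispute. `alloc_and_report()` is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdlib.h>
#if defined(__FreeBSD__)
#include <malloc_np.h>  /* FreeBSD: non-portable allocator interfaces */
#else
#include <malloc.h>     /* glibc */
#endif

/* Hypothetical helper: allocate `want` bytes and store in *usable the
 * size the allocator reports for the block. Returns the block (release
 * it with free()) or NULL on failure. */
static void *alloc_and_report(size_t want, size_t *usable)
{
    void *p = malloc(want);
    if (p == NULL)
        return NULL;
    /* At least `want`; may be larger because requests are rounded up. */
    *usable = malloc_usable_size(p);
    return p;
}
```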

I'm completely ambivalent about adding it, though - it's not something I
need, it's more stuff that needs to be handled if you change/rewrite the
allocator, and it's not my decision.

 --
Daniel Nebdal


Re: [RFC] how to get the size of a malloc(9) block ?

2013-11-29 Thread David Chisnall

On 28 Nov 2013, at 15:13, jb  wrote:

> Luigi Rizzo  iet.unipi.it> writes:
> 
>> ... 
>> But I don't understand why you find ksize()/malloc_usable_size() dangerous.
>> ...
> 
> The original crime is committed when *usable size* (an implementation detail)
> is exported (leaked) to the caller.
> To be blunt, when a caller requests memory of a certain size, and its request is
> satisfied, then it is not its business to learn details beyond that (and they
> should not be offered as well).
> The API should be sanitized, in kernel and user space.
> Otherwise, all kinds of charlatans will try to play hair-raising games with it.
> If the caller wants to track the *requested size* programmatically, it is its
> business to do it and it can be done very easily.
> 
> Some of these guys got it perfectly right:
> http://stackoverflow.com/questions/5813078/is-it-possible-to-find-the-memory-allocated-to-the-pointer-without-searching-fo

I disagree. I've encountered several occasions where either locality doesn't
matter so much or I know the pointer is aliased, and I'd like to increase the
size of a relatively large allocation. I have two choices:

- Call realloc(), potentially copying a lot of data
- Call malloc(), and chain two (or more) allocations together.

What I'd like to do is call realloc() if it's effectively free, or call 
malloc() in other cases.

The malloc_usable_size() API is wrong, though. In the kernel, realloc()
already takes a flags argument, and an M_DONTALLOCATE flag would make more
sense: it would enlarge the allocation if that can be done without the
allocate-copy-free dance, but return NULL and leave the allocation unmodified
if not.
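To make the proposed contract concrete, here is a userland sketch under loud assumptions: M_DONTALLOCATE is only a proposal in this mail, not an existing malloc(9) flag, so a hypothetical toy bump allocator (`toy_alloc()`, `toy_grow_in_place()`) stands in for the kernel allocator. The hypothetical `extend()` shows the caller side: grow in place when that is effectively free, otherwise chain a second allocation.

```c
#include <stddef.h>

/* Toy bump allocator over a static arena: hypothetical, illustration
 * only (no alignment, no free). Only the most recently allocated block
 * can grow in place. */
#define ARENA_SIZE 4096

static unsigned char arena[ARENA_SIZE];
static size_t arena_used;          /* bytes handed out so far */
static unsigned char *last_block;  /* most recent allocation */
static size_t last_size;           /* its current size */

static void *toy_alloc(size_t n)
{
    if (n > ARENA_SIZE - arena_used)
        return NULL;
    last_block = arena + arena_used;
    last_size = n;
    arena_used += n;
    return last_block;
}

/* Stand-in for the proposed M_DONTALLOCATE behaviour: enlarge p to
 * newsize only if no allocate-copy-free is needed; otherwise return
 * NULL and leave p exactly as it was. */
static void *toy_grow_in_place(void *p, size_t newsize)
{
    if (p != (void *)last_block || newsize < last_size)
        return NULL;
    size_t extra = newsize - last_size;
    if (extra > ARENA_SIZE - arena_used)
        return NULL;
    arena_used += extra;
    last_size = newsize;
    return p;  /* same address: nothing was copied */
}

/* One segment of a (possibly chained) buffer. */
struct chunk {
    void *base;
    size_t size;
};

/* Grow *c in place if that is free, otherwise allocate the missing
 * bytes as a separate chunk *next to chain onto it.
 * Returns 0 if grown in place, 1 if chained, -1 if out of memory. */
static int extend(struct chunk *c, struct chunk *next, size_t newsize)
{
    next->base = NULL;
    next->size = 0;
    if (newsize <= c->size)
        return 0;
    if (toy_grow_in_place(c->base, newsize) != NULL) {
        c->size = newsize;
        return 0;
    }
    next->size = newsize - c->size;
    next->base = toy_alloc(next->size);
    if (next->base == NULL) {
        next->size = 0;
        return -1;
    }
    return 1;
}
```

The toy allocator only exists to make the "can it grow in place?" question observable; in the kernel, the mail suggests carrying that request in realloc()'s existing flags argument.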

David



Re: [RFC] how to get the size of a malloc(9) block ?

2013-11-29 Thread Gleb Smirnoff
On Thu, Nov 28, 2013 at 03:13:53PM +, jb wrote:
j> > But I don't understand why you find ksize()/malloc_usable_size() dangerous.
j> > ...
j> 
j> The original crime is committed when *usable size* (an implementation detail)
j> is exported (leaked) to the caller.
j> To be blunt, when a caller requests memory of a certain size, and its request is
j> satisfied, then it is not its business to learn details beyond that (and they
j> should not be offered as well).
j> The API should be sanitized, in kernel and user space.
j> Otherwise, all kinds of charlatans will try to play hair-raising games with it.
j> If the caller wants to track the *requested size* programmatically, it is its
j> business to do it and it can be done very easily.

+1

This is the kind of API that just shouldn't exist.
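jb's suggestion that callers track the requested size themselves can be sketched as a small wrapper, assuming plain userland C; the `sized_buf` type and its functions are hypothetical, not part of any FreeBSD API. The buffer records the size it asked for and only ever uses that, so it never depends on allocator slack.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper: a buffer that records the size it requested. */
struct sized_buf {
    unsigned char *data;
    size_t size;  /* bytes requested, not what the allocator rounded to */
};

static int sized_buf_init(struct sized_buf *b, size_t size)
{
    b->data = malloc(size);
    b->size = b->data != NULL ? size : 0;
    return b->data != NULL ? 0 : -1;
}

/* Grow to at least `want` bytes, doubling to amortize realloc() copies. */
static int sized_buf_reserve(struct sized_buf *b, size_t want)
{
    if (want <= b->size)
        return 0;
    size_t newsize = b->size != 0 ? b->size : 16;
    while (newsize < want) {
        if (newsize > SIZE_MAX / 2) {  /* doubling would overflow */
            newsize = want;
            break;
        }
        newsize *= 2;
    }
    unsigned char *p = realloc(b->data, newsize);
    if (p == NULL)
        return -1;  /* the old buffer is still valid */
    b->data = p;
    b->size = newsize;
    return 0;
}

static void sized_buf_free(struct sized_buf *b)
{
    free(b->data);
    b->data = NULL;
    b->size = 0;
}
```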

-- 
Totus tuus, Glebius.