Re: New Address Family: Inter Process Networking (IPN)

2007-12-10 Thread Chris Friesen

David Miller wrote:

> The kernel supports much more than 32 groups, see nlk->groups which is
> a bitmap which can be sized to arbitrary sizes.  nlk->nl_groups is
> for backwards compatibility only.
>
> netlink_change_ngroups() does the bitmap resizing when necessary.

Thanks for the explanation.  Given that it's a bitmap doesn't that
result in a cost of O(number of groups) when processing messages?  In
our case we need potentially thousands of groups.

> The root multicast listening restriction can be relaxed in some
> circumstances, whatever is needed to fill your needs.

Also, good to know.

> Stop making excuses, with minor adjustments we have the facilities to
> meet your needs.  There is no need for yet-another-protocol to do what
> you're trying to do, we already have too much duplicated
> functionality.


You may have confused me with the OP...I just chimed in because of some 
of the limitations we found when we wanted to do similar things.  In our 
case we created a new unix-like protocol to allow multicast, and have 
been using it for a few years.


However, if we could use netlink instead in our next release that would 
be a good thing.  A couple questions:


1) Is it possible to register to receive all netlink messages for a 
particular netlink family?  This is useful for debugging--it allows a 
tcpdump equivalent.


2) Is there any up-to-date netlink programming guide?  I found this one:

http://people.redhat.com/nhorman/papers/netlink.pdf

but it's three years old now.


Thanks,

Chris
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: New Address Family: Inter Process Networking (IPN)

2007-12-07 Thread Andi Kleen
> Stop making excuses, with minor adjustments we have the facilities to
> meet your needs.  There is no need for yet-another-protocol to do what

I suspect they would be better off just using IP multicast. But the localhost
latency penalty vs Unix Chris was talking about probably needs to be
investigated.

-Andi


Re: New Address Family: Inter Process Networking (IPN)

2007-12-06 Thread David Miller
From: "Chris Friesen" <[EMAIL PROTECTED]>
Date: Thu, 06 Dec 2007 22:21:39 -0600

> David Miller wrote:
> > From: "Chris Friesen" <[EMAIL PROTECTED]>
> > Date: Thu, 06 Dec 2007 14:36:54 -0600
> > 
> > 
> >>One problem we ran into was that there are only 32 multicast groups per 
> >>netlink protocol family.
> > 
> > 
> > I'm pretty sure we've removed this limitation.
> 
> As of 2.6.23 nl_groups is a 32-bit bitmask with one bit per group. 
> Also, it appears that only root is allowed to use multicast netlink.

The kernel supports much more than 32 groups, see nlk->groups which is
a bitmap which can be sized to arbitrary sizes.  nlk->nl_groups is
for backwards compatibility only.

netlink_change_ngroups() does the bitmap resizing when necessary.
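
A bitmap lookup for a single group is a constant-time bit test, so the size of the bitmap need not by itself make the membership check grow with the number of groups. The sketch below is a userspace illustration of a growable group bitmap, not the kernel's code; `group_set`, `gs_resize`, `gs_join` and `gs_member` are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical userspace stand-in for a per-socket group bitmap. */
struct group_set {
    unsigned long *bits;   /* group N (1-based) is stored at bit N-1 */
    unsigned int ngroups;  /* number of groups the bitmap can hold */
};

/* Grow the bitmap so it can hold at least `ngroups` groups. */
int gs_resize(struct group_set *gs, unsigned int ngroups)
{
    size_t old_words = (gs->ngroups + BITS_PER_WORD - 1) / BITS_PER_WORD;
    size_t new_words = (ngroups + BITS_PER_WORD - 1) / BITS_PER_WORD;
    unsigned long *p;

    if (ngroups <= gs->ngroups)
        return 0;
    p = realloc(gs->bits, new_words * sizeof(*p));
    if (!p)
        return -1;
    /* Newly added words start with no memberships. */
    memset(p + old_words, 0, (new_words - old_words) * sizeof(*p));
    gs->bits = p;
    gs->ngroups = ngroups;
    return 0;
}

/* Join a group, resizing on demand. Returns 0 on success. */
int gs_join(struct group_set *gs, unsigned int group)
{
    if (group == 0)
        return -1;
    if (group > gs->ngroups && gs_resize(gs, group) < 0)
        return -1;
    gs->bits[(group - 1) / BITS_PER_WORD] |= 1UL << ((group - 1) % BITS_PER_WORD);
    return 0;
}

/* Constant-time membership test, independent of how many groups exist. */
int gs_member(const struct group_set *gs, unsigned int group)
{
    unsigned int i = group - 1;

    if (group == 0 || group > gs->ngroups)
        return 0;
    return (int)((gs->bits[i / BITS_PER_WORD] >> (i % BITS_PER_WORD)) & 1UL);
}
```

Joining group 5000 on an empty set grows the bitmap once; later tests for any group are a single shift and mask.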

The root multicast listening restriction can be relaxed in some
circumstances, whatever is needed to fill your needs.

Stop making excuses, with minor adjustments we have the facilities to
meet your needs.  There is no need for yet-another-protocol to do what
you're trying to do, we already have too much duplicated
functionality.


Re: New Address Family: Inter Process Networking (IPN)

2007-12-06 Thread Ben Pfaff
"Chris Friesen" <[EMAIL PROTECTED]> writes:

> David Miller wrote:
>> From: "Chris Friesen" <[EMAIL PROTECTED]>
>>> One problem we ran into was that there are only 32 multicast groups
>>> per netlink protocol family.
>> I'm pretty sure we've removed this limitation.
> As of 2.6.23 nl_groups is a 32-bit bitmask with one bit per
> group. 

Use setsockopt(fd, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP, ...) to
join an arbitrary Netlink multicast group.
-- 
"A computer is a state machine.
 Threads are for people who cant [sic] program state machines."
--Alan Cox
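
A minimal sketch of the setsockopt() call mentioned above, assuming a Linux system with <linux/netlink.h>. The helper names `nl_legacy_mask` and `nl_open_and_join` are hypothetical. The first helper shows why the bind()-time mask stops at 32 groups; the second passes a 1-based group number to NETLINK_ADD_MEMBERSHIP, which is not limited to 32.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <unistd.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270   /* value used by the Linux socket headers */
#endif

/* Legacy bind()-time mask: only groups 1..32 can be expressed. */
uint32_t nl_legacy_mask(unsigned int group)
{
    return (group >= 1 && group <= 32) ? (UINT32_C(1) << (group - 1)) : 0;
}

/*
 * Open a netlink socket for `protocol` and join multicast `group`
 * (1-based, may exceed 32). Returns the fd, or -1 with errno set.
 */
int nl_open_and_join(int protocol, unsigned int group)
{
    struct sockaddr_nl addr = { .nl_family = AF_NETLINK };
    int fd = socket(AF_NETLINK, SOCK_RAW, protocol);

    if (fd < 0)
        return -1;
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        setsockopt(fd, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP,
                   &group, sizeof(group)) < 0) {
        int saved = errno;   /* keep the original error across close() */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

Whether an unprivileged process may join depends on the protocol family, as discussed elsewhere in this thread.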



Re: New Address Family: Inter Process Networking (IPN)

2007-12-06 Thread Chris Friesen

David Miller wrote:
> From: "Chris Friesen" <[EMAIL PROTECTED]>
> Date: Thu, 06 Dec 2007 14:36:54 -0600
>
> > One problem we ran into was that there are only 32 multicast groups per
> > netlink protocol family.
>
> I'm pretty sure we've removed this limitation.

As of 2.6.23 nl_groups is a 32-bit bitmask with one bit per group.
Also, it appears that only root is allowed to use multicast netlink.


Chris


Re: New Address Family: Inter Process Networking (IPN)

2007-12-06 Thread David Miller
From: "Chris Friesen" <[EMAIL PROTECTED]>
Date: Thu, 06 Dec 2007 14:36:54 -0600

> One problem we ran into was that there are only 32 multicast groups per 
> netlink protocol family.

I'm pretty sure we've removed this limitation.


Re: New Address Family: Inter Process Networking (IPN)

2007-12-06 Thread Renzo Davoli
I have done some raw tests.
(you can read the code here: http://www.cs.unibo.it/~renzo/rawperftest/)

The programs are quite simple. The sender sends "Hello World" as fast as it
can, while the receiver prints time() for every 1 million messages
received.

On my laptop, tests on 2000 "Hello World" packets, 

One receiver:
multicast   244,000 msg/sec
IPN 333,000 msg/sec  (36% faster)

Two receivers:
multicast   174,000 msg/sec
IPN 250,000 msg/sec  (43% faster)
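
The percentages above can be rechecked from the quoted message rates. This is only a worked-arithmetic sketch; `percent_faster` is a hypothetical helper, not part of the test programs linked above.

```c
#include <assert.h>

/* Relative throughput gain of `fast` over `slow`, in percent. */
double percent_faster(double fast, double slow)
{
    return (fast / slow - 1.0) * 100.0;
}
```

With the figures above, percent_faster(333000, 244000) is about 36.5 and percent_faster(250000, 174000) is about 43.7, which matches the quoted 36% and 43% if those were truncated to whole percent.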

Apart from this, how could I implement policies over a multicast socket,
e.g. how would a kernel VDE switch work on multicast sockets?

If I send an Ethernet packet over a multicast socket, it can emulate only a
hub. (It also seems quite unnatural to me to run TCP/UDP over IP over
Ethernet over UDP over IP. Okay, we can skip the Ethernet layer on
localhost, and long Ethernet frames get fragmented, but those are details.)

On a multicast socket you cannot use policies. I mean that an IPN network (or
bus or group) can have a policy that reads some info from the packet to
decide the set of recipients.
For a vde_switch it is the destination MAC address, when found in the
MAC hash table, that selects the recipient port. For MIDI communication it
could be the channel number.
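
To make the switch policy concrete, here is a minimal sketch of a MAC table that maps a destination address to a recipient port and falls back to flooding when the address is unknown. It is an illustration only, not VDE or IPN code; every name and the table size are hypothetical, and colliding addresses simply overwrite each other.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAC_TABLE_SIZE 256   /* hypothetical table size */
#define PORT_FLOOD (-1)      /* unknown destination: deliver to every port */

struct mac_entry {
    uint8_t mac[6];
    int port;
    int used;
};

struct mac_table {
    struct mac_entry slot[MAC_TABLE_SIZE];
};

/* Simple hash over the six address bytes. */
unsigned int mac_hash(const uint8_t mac[6])
{
    unsigned int h = 0;

    for (int i = 0; i < 6; i++)
        h = h * 31 + mac[i];
    return h % MAC_TABLE_SIZE;
}

/* Remember that `mac` was seen as a source address on `port`. */
void mac_learn(struct mac_table *t, const uint8_t mac[6], int port)
{
    struct mac_entry *e = &t->slot[mac_hash(mac)];

    memcpy(e->mac, mac, 6);
    e->port = port;
    e->used = 1;
}

/* Pick the recipient port for a destination address, or PORT_FLOOD. */
int mac_lookup(const struct mac_table *t, const uint8_t mac[6])
{
    const struct mac_entry *e = &t->slot[mac_hash(mac)];

    if (e->used && memcmp(e->mac, mac, 6) == 0)
        return e->port;
    return PORT_FLOOD;
}
```

A hub corresponds to always returning PORT_FLOOD; the policy is what lets the switch deliver to a single port instead.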

With the switching fabric moved to userland, the performance figures are
quite different.

renzo



Re: New Address Family: Inter Process Networking (IPN)

2007-12-06 Thread Chris Friesen

Andi Kleen wrote:
> On Thu, Dec 06, 2007 at 05:02:40PM -0600, Chris Friesen wrote:
>
> > I just reran on a 3.2GHZ P4 running 2.6.11 (Fedora Core 4).  42% latency
> > increase.
>
> Sounds like something that should be looked into. I know of no
> principal reasons for that.
>
> > For stream sockets, unix gives approximately a 62% bandwidth increase
> > over tcp.   (Although tcp could probably be tuned to do better than this.)
>
> How long a stream did you test? You might be measuring slow start.

No idea.  These are just the standard local networking tests in lmbench
v2.  In our case the latency was the big concern and we were using
datagrams anyway.


Chris


Re: New Address Family: Inter Process Networking (IPN)

2007-12-06 Thread Andi Kleen
On Thu, Dec 06, 2007 at 05:02:40PM -0600, Chris Friesen wrote:
> Andi Kleen wrote:
> 
> >>On a 1.4GHz P4 I measured a 44% increase in latency between a unix 
> >>datagram and a UDP datagram.
> 
> >That's weird.
> 
> I just reran on a 3.2GHZ P4 running 2.6.11 (Fedora Core 4).  42% latency 
> increase.

Sounds like something that should be looked into. I know of no
principal reasons for that.

> For stream sockets, unix gives approximately a 62% bandwidth increase 
> over tcp.   (Although tcp could probably be tuned to do better than this.)

How long a stream did you test? You might be measuring slow start.

-Andi


Re: New Address Family: Inter Process Networking (IPN)

2007-12-06 Thread Chris Friesen

Andi Kleen wrote:

> > On a 1.4GHz P4 I measured a 44% increase in latency between a unix
> > datagram and a UDP datagram.
>
> That's weird.

I just reran on a 3.2GHZ P4 running 2.6.11 (Fedora Core 4).  42% latency
increase.

For stream sockets, unix gives approximately a 62% bandwidth increase
over tcp.   (Although tcp could probably be tuned to do better than this.)


Chris


Re: New Address Family: Inter Process Networking (IPN)

2007-12-06 Thread Andi Kleen
> Renzo's IPN is a local protocol--you can't multicast to localhost.

You don't need to. All local clients can join the same group without
using localhost.

-Andi


Re: New Address Family: Inter Process Networking (IPN)

2007-12-06 Thread Andi Kleen
On Thu, Dec 06, 2007 at 11:18:37PM +0100, Renzo Davoli wrote:
> * IPN is for inter-process communication. It is *not* directly related 
> to TCP-IP or Ethernet.
> 
> If you want you can call it Inter Process Bus Communication.  It is an
> extension of AF_UNIX.  Comments saying that some services can be
> implemented by using TCP-IP multicast protocols are unrelated to IPN.
> All AF_UNIX services could be implemented as TCP-IP services on
> 127.0.0.1. Do we abolish AF_UNIX, then?  The problem is that to use
> TCP-IP, you'd need to wrap the packets with TCP or UDP, IP and Ethernet

No ethernet headers on localhost. Just to give you a perspective:
IP+TCP headers are 50 bytes (with timestamps) and IP+UDP is 28 bytes.
On the other hand the sk_buff+skb_shared_info headers, which are used for
all socket communication in Linux and mostly have to be set up every time,
are 192+312 bytes on 64bit [part of the 312 bytes is an array that is
typically only partly used] or 156+236 bytes on 32bit. So the internal
data structures dwarf the network headers.

There might be other reasons why TCP/IP is slower, but arguing 
with the size of the headers is just bogus.
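
The comparison can be put into numbers using only the sizes quoted in this message. The sketch below is plain arithmetic; the enum and function names are hypothetical.

```c
#include <assert.h>

/* Sizes in bytes, as quoted in the message above. */
enum {
    HDR_IP_TCP = 50, HDR_IP_UDP = 28,
    SKB_64 = 192, SHINFO_64 = 312,
    SKB_32 = 156, SHINFO_32 = 236
};

/* Per-packet bookkeeping: sk_buff plus skb_shared_info. */
int skb_overhead(int skb, int shinfo)
{
    return skb + shinfo;
}

/* How many times larger the bookkeeping is than the protocol headers. */
double overhead_ratio(int bookkeeping, int headers)
{
    return (double)bookkeeping / headers;
}
```

On 64bit that is 504 bytes of bookkeeping against 28 bytes of IP+UDP headers, a factor of 18; on 32bit it is 392 bytes, a factor of 14.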

My personal feeling would be that if TCP/IP is too slow for something
it is better to just improve the stack than to add a completely
new socket family. That will benefit many more applications without
requiring changes to them.

About the only good reason to use UNIX sockets is when you need to use
file system permissions. 

> * IPN services (like AF_UNIX) do not require root privileges.
> 
> There are many communication services where the user need broadcast or
> p2p among user processes.  If a user (not root) wants to run several

IP Multicast when properly set up also doesn't need root.

Broadcast is kind of obsolete anyways.

> User-Mode Linux, Qemu, Kvm VM the only way to have them connected
> together is our Virtual Distributed Ethernet.  (For this reason VDE

They could easily just tunnel over a local multicast group for example.

-Andi



Re: New Address Family: Inter Process Networking (IPN)

2007-12-06 Thread David Newall

Andi Kleen wrote:
> > Renzo also described something new (in the socket() arena): the
> > multi-reader, multi-writer is just not available in IP.
>
> How is that different from a multicast group?

Good question.  I don't know much about multicast IP.  It's a bit new
for me.  I knew it uses Martian addresses!  After a little reading, I
now know that it does allow many to many communication.

Renzo's IPN is a local protocol--you can't multicast to localhost.


Re: New Address Family: Inter Process Networking (IPN)

2007-12-06 Thread Renzo Davoli
Some more explanations trying to describe what IPN is and what it is
useful for.  We are writing the complete patch.

Summary:
* IPN is for inter-process communication. It is *not* directly related
to TCP-IP or Ethernet.
* IPN itself is a *level 1* virtual physical network.
* IPN services (like AF_UNIX) do not require root privileges.
* TAP and GRAB are just extra features for IPN delivering Ethernet frames.


* IPN is for inter-process communication. It is *not* directly related 
to TCP-IP or Ethernet.

If you want you can call it Inter Process Bus Communication.  It is an
extension of AF_UNIX.  Comments saying that some services can be
implemented by using TCP-IP multicast protocols are unrelated to IPN.
All AF_UNIX services could be implemented as TCP-IP services on
127.0.0.1. Do we abolish AF_UNIX, then?  The problem is that to use
TCP-IP, you'd need to wrap the packets with TCP or UDP, IP and Ethernet
headers, and the stack would lose time managing useless protocols.  If you
just want to send strings to a set of local processes, TCP-IP is an
overloaded solution.  Even X-Window uses AF_UNIX sockets to talk with
local clients; it is a performance issue... I think Chris is right.

* IPN itself is a *level 1* virtual physical network.

Like any physical network you can run higher level protocols on it, thus
Ethernet, and then TCP-IP can be services you can run on IPN, but there
can be IPN networks running neither TCP-IP nor Ethernet.

* IPN services (like AF_UNIX) do not require root privileges.

There are many communication services where users need broadcast or
p2p among user processes.  If a user (not root) wants to run several
User-Mode Linux, Qemu, or Kvm VMs, the only way to have them connected
together is our Virtual Distributed Ethernet.  (For this reason VDE
exists in almost all the distros, it has been ported to other OSs, and
is already supported in the Linux Kernel for User-Mode Linux).  VDE is a
userland daemon, hence requires two context switches to deliver a
packet: VM1 -> K -> Daemon -> K -> VM2. Kvde running on IPN needs just one:
VM1 -> K -> VM2.  I think D-Bus can use IPN, too. The same reduction in
context switches applies.  May I speculate that there will be a noticeable
increase in performance?  *nix systems are multiuser. It means that there do
exist people who need to set up services without root access.  And even
if you have root access, the less you need to work as root, the safer
your system is.

* TAP and GRAB are just extra features for IPN delivering Ethernet frames.

Some IPN networks do use Ethernet as Data-Link protocol.  It is useful
to provide means to connect the IPN socket to a virtual (TAP) interface
or to a real (GRAB) interface.  I know that a lot of people use tap
interfaces and the kernel bridge to connect Virtual Machines.  The
access can be restricted to some users or processes by iptables, but it
is not as simple as a chmod on the socket.  A lot of people also use tunctl
to define tap interfaces for users a priori.  They must define as many
tuntap interfaces as the number of VMs the users may want, and each user
has his/her own taps.  Some users define a userland VDE switch to
interconnect their VMs.

IPN itself could use a userland process to define a standard TAP
interface and lose its time and its cpu cycles moving packets from tap
to ipn and vice versa.  IPN is already kernel code, so all those context
switches and cpu cycles can be saved by accessing the tap or grabbed
interface directly from the kernel.  (TAP and GRAB obviously require
CAP_NET_ADMIN).

Using IPN with TAP you can define one single TAP interface connected to
an IPN socket. Several VMs can use that IPN socket; in this way the VMs
are connected by a (virtual ethernet) network which includes the TAP
interface.  The access control to the network (and then to the TAP) is
done by setting the permissions on the socket.  Tunctl is *not* able to
create a tap where all the users belonging to a group can start their
VMs. IPN can.


Re: New Address Family: Inter Process Networking (IPN)

2007-12-06 Thread Andi Kleen
On Thu, Dec 06, 2007 at 03:49:51PM -0600, Chris Friesen wrote:
> Andi Kleen wrote:
> >>Latency was very 
> >>important, so we ended up doing essentially a multicast unix socket 
> >>rather than taking the extra penalty for UDP multicast.
> >
> >
> >What extra penalty? Local UDP shouldn't be much more expensive than Unix.
> 
> On a 1.4GHz P4 I measured a 44% increase in latency between a unix 
> datagram and a UDP datagram.

That's weird.

> 
> For UDP it has to go down the udp stack, then the ip stack, then through 

UDP doesn't really have much stack. IP is also very little assuming
cached route (connect called first) 

I would expect the copies to dominate in both cases.
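
For reference, "connect called first" on a UDP socket looks roughly like the sketch below: the peer is fixed once, and later sends use send() rather than sendto(). `udp_open_connected` is a hypothetical helper name; the route-caching benefit is the claim made above, not something this code measures.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a UDP socket connected to ip:port. Returns the fd, or -1 on error. */
int udp_open_connected(const char *ip, uint16_t port)
{
    struct sockaddr_in peer;
    int fd;

    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &peer.sin_addr) != 1)
        return -1;   /* not a valid dotted-quad address */

    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    /* No packet is sent here; connect() only records the peer. */
    if (connect(fd, (struct sockaddr *)&peer, sizeof(peer)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```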

-Andi


Re: New Address Family: Inter Process Networking (IPN)

2007-12-06 Thread Chris Friesen

Andi Kleen wrote:
> > Latency was very
> > important, so we ended up doing essentially a multicast unix socket
> > rather than taking the extra penalty for UDP multicast.
>
> What extra penalty? Local UDP shouldn't be much more expensive than Unix.

On a 1.4GHz P4 I measured a 44% increase in latency between a unix
datagram and a UDP datagram.


For UDP it has to go down the udp stack, then the ip stack, then through 
the routing tables and back up the receive side.


The unix socket just hashes to get the destination and delivers the message.
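
A rough way to get a number for the unix datagram side is to time send/recv pairs over a socketpair, as in the sketch below. This is not the lmbench method used for the figures in this thread: it runs in a single process, so it measures syscall and copy cost without any context switch. `unix_dgram_ns_per_msg` is a hypothetical helper name. A UDP counterpart would use two sockets on 127.0.0.1 in the same loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/*
 * Average nanoseconds for one send()+recv() pair over an AF_UNIX
 * datagram socketpair. Returns -1.0 on error or bad arguments.
 */
double unix_dgram_ns_per_msg(int iterations)
{
    int sv[2];
    char buf[16] = "Hello World";
    struct timespec t0, t1;

    if (iterations <= 0 || socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iterations; i++) {
        /* Each datagram is queued by send() and drained by recv(). */
        if (send(sv[0], buf, sizeof(buf), 0) < 0 ||
            recv(sv[1], buf, sizeof(buf), 0) < 0) {
            close(sv[0]);
            close(sv[1]);
            return -1.0;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    close(sv[0]);
    close(sv[1]);
    return ((t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec))
           / iterations;
}
```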

Chris


Re: New Address Family: Inter Process Networking (IPN)

2007-12-06 Thread Andi Kleen
> Latency was very 
> important, so we ended up doing essentially a multicast unix socket 
> rather than taking the extra penalty for UDP multicast.

What extra penalty? Local UDP shouldn't be much more expensive than Unix.

-Andi


Re: New Address Family: Inter Process Networking (IPN)

2007-12-06 Thread Chris Friesen

Andi Kleen wrote:

> > "This document describes Linux Netlink, which is used in Linux both as
> >   an intra-kernel messaging system as well as between kernel and user
> >   space."
>
> It can be used between user space daemons as well. In fact it is.
> e.g. they often listen to each other's messages.


One problem we ran into was that there are only 32 multicast groups per 
netlink protocol family.


We had a situation where we could have used netlink, but we needed the 
equivalent of thousands of multicast groups.  Latency was very 
important, so we ended up doing essentially a multicast unix socket 
rather than taking the extra penalty for UDP multicast.


Chris


Re: New Address Family: Inter Process Networking (IPN)

2007-12-06 Thread Andi Kleen
> Renzo also described something new (in the socket() arena): the 
> multi-reader, multi-writer is just not available in IP.

How is that different from a multicast group?

-Andi


Re: New Address Family: Inter Process Networking (IPN)

2007-12-06 Thread Andi Kleen
> "This document describes Linux Netlink, which is used in Linux both as
>an intra-kernel messaging system as well as between kernel and user
>space."

It can be used between user space daemons as well. In fact it is.
e.g. they often listen to each other's messages.

-Andi



Re: New Address Family: Inter Process Networking (IPN)

2007-12-06 Thread Andi Kleen
 Renzo also described something new (in the socket() arena): the 
 multi-reader, multi-writer is just not available in IP.

How is that different from a multicast group?

-Andi
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: New Address Family: Inter Process Networking (IPN)

2007-12-06 Thread Andi Kleen
 This document describes Linux Netlink, which is used in Linux both as
an intra-kernel messaging system as well as between kernel and user
space.

It can be used between user space daemons as well. In fact it is.
e.g. they often listen to each other's messages.

-Andi

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: New Address Family: Inter Process Networking (IPN)

2007-12-06 Thread Chris Friesen

Andi Kleen wrote:

This document describes Linux Netlink, which is used in Linux both as
  an intra-kernel messaging system as well as between kernel and user
  space.



It can be used between user space daemons as well. In fact it is.
e.g. they often listen to each other's messages.


One problem we ran into was that there are only 32 multicast groups per 
netlink protocol family.


We had a situation where we could have used netlink, but we needed the 
equivalent of thousands of multicast groups.  Latency was very 
important, so we ended up doing essentially a multicast unix socket 
rather than taking the extra penalty for UDP multicast.


Chris
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: New Address Family: Inter Process Networking (IPN)

2007-12-06 Thread Andi Kleen
 Latency was very 
 important, so we ended up doing essentially a multicast unix socket 
 rather than taking the extra penalty for UDP multicast.

What extra penalty? Local UDP shouldn't be much more expensive than Unix.

-Andi
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: New Address Family: Inter Process Networking (IPN)

2007-12-06 Thread Chris Friesen

Andi Kleen wrote:
Latency was very 
important, so we ended up doing essentially a multicast unix socket 
rather than taking the extra penalty for UDP multicast.



What extra penalty? Local UDP shouldn't be much more expensive than Unix.


On a 1.4GHz P4 I measured a 44% increase in latency between a unix 
datagram and a UDP datagram.


For UDP it has to go down the udp stack, then the ip stack, then through 
the routing tables and back up the receive side.


The unix socket just hashes to get the destination and delivers the message.

Chris
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: New Address Family: Inter Process Networking (IPN)

2007-12-06 Thread Renzo Davoli
Some more explanations trying to describe what IPN is and what it is
useful for.  We are writing the complete patch

Summary:
* IPN is for inter-process communication. It is *not* directly related 
to TCP-IP or Ethernet.
* IPN itself is a *level 1* virtual physical network.  IPN services
* (like AF_UNIX) do not require root privileges.  TAP and GRAB are just
* extra features for for IPN deliverying Ethernet frames.


* IPN is for inter-process communication. It is *not* directly related 
to TCP-IP or Ethernet.

If you want you can call it Inter Process Bus Communication.  It is an
extension of AF_UNIX.  Comments saying that some services can be
implemented by using TCP-IP multicast protocols are unrelated to IPN.
All AF_UNIX services could be implemented as TCP-IP services on
127.0.0.1. Do we abolish AF_UNIX, then?  The problem is that to use
TCP-IP, you'd need to wrap the packets with TCP or UDP, IP and Ethernet
headers, the stack would lose time to manage useless protocols.  If you
want just to send strings to set of local processes TCP-IP is an
overloading solution.  Even X-Window uses AF_UNIX sockets to talk with
local clients, it is a performance issue... I think Chris is right.

* IPN itself is a *level 1* virtual physical network.

Like any physical network you can run higher level protocols on it, thus
Ethernet, and then TCP-IP can be services you can run on IPN, but there
can be IPN networks running neither TCP-IP nor Ethernet.

* IPN services (like AF_UNIX) do not require root privileges.

There are many communication services where the user need broadcast or
p2p among user processes.  If a user (not root) wants to run several
User-Mode Linux, Qemu, Kvm VM the only way to have them connected
together is our Virtual Distributed Ethernet.  (For this reason VDE
exists in almost all the distros, it has been ported to other OSs, and
is already supported in the Linux Kernel for User-Mode Linux).  VDE is a
userland deamon, hence requires two context switches to deliver a
packet: VM1 - K - Daemon - K - VM2. Kvde running on IPN just one:
VM1 - K -VM2.  I think D-Bus can use IPN, too. The same cutoff of
context switches applies.  May I speculate that there will be a sensible
increase in performance?  *nix are multiuser. It means that there do
exist people that need to set up services without root access.  And even
if you have root access, the less you need to work as root, the safer is
you system.

* TAP and GRAB are just extra features for for IPN deliverying Ethernet frames.

Some IPN networks do use Ethernet as the Data-Link protocol.  It is useful
to provide means to connect the IPN socket to a virtual (TAP) interface
or to a real (GRAB) interface.  I know that a lot of people use tap
interfaces and the kernel bridge to connect Virtual Machines.  The
access can be restricted to some users or processes by iptables, but it
is not as simple as a chmod on the socket.  A lot of people also use tunctl
to define tap interfaces for users a priori.  They must define as many
tuntap interfaces as the number of VMs the users may want; each user has
his/her own taps.  Some users define a userland VDE switch to
interconnect their VMs.  IPN itself could use a userland process to
define a standard TAP interface and lose its time and its cpu cycles
moving packets from tap to ipn and vice versa.  IPN is already kernel code,
so all those context switches and cpu cycles can be saved by
accessing the tap or grabbed interface directly from the kernel.  (TAP
and GRAB obviously require CAP_NET_ADMIN).  Using IPN with TAP you can
define one single TAP interface connected to an IPN socket.  Several VMs
can use that IPN socket; in this way the VMs are connected by a (virtual
Ethernet) network which includes the TAP interface.  Access control
to the network (and hence to the TAP) is done by setting the permissions
on the socket.  Tunctl is *not* able to create a tap where all the users
belonging to a group can start their VMs.  IPN can.


Re: New Address Family: Inter Process Networking (IPN)

2007-12-06 Thread David Newall

Andi Kleen wrote:
> > Renzo also described something new (in the socket() arena): the
> > multi-reader, multi-writer is just not available in IP.
>
> How is that different from a multicast group?

Good question.  I don't know much about multicast IP.  It's a bit new 
for me.  I knew it uses Martian addresses!  After a little reading, I 
now know that it does allow many to many communication.


Renzo's IPN is a local protocol--you can't multicast to localhost.


Re: New Address Family: Inter Process Networking (IPN)

2007-12-06 Thread Andi Kleen
On Thu, Dec 06, 2007 at 03:49:51PM -0600, Chris Friesen wrote:
> Andi Kleen wrote:
> > > Latency was very
> > > important, so we ended up doing essentially a multicast unix socket
> > > rather than taking the extra penalty for UDP multicast.
> >
> > What extra penalty? Local UDP shouldn't be much more expensive than Unix.
>
> On a 1.4GHz P4 I measured a 44% increase in latency between a unix
> datagram and a UDP datagram.

That's weird.

> For UDP it has to go down the udp stack, then the ip stack, then through

UDP doesn't really have much stack. IP is also very little assuming
cached route (connect called first) 

I would expect the copies to dominate in both cases.
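
A minimal sketch of the "connect called first" case: a UDP socket is
connected once to a loopback destination, and later send() calls reuse
that destination.  The helper name udp_connect_loopback is hypothetical,
and the sketch does not measure whether this narrows the latency gap.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a UDP socket and connect() it to 127.0.0.1:port (port in host
 * byte order).  After this, send() needs no per-packet address.
 * Returns the socket descriptor, or -1 on error. */
int udp_connect_loopback(unsigned short port)
{
    struct sockaddr_in dst;
    int s = socket(AF_INET, SOCK_DGRAM, 0);

    if (s < 0)
        return -1;
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(port);
    dst.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (connect(s, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
        close(s);
        return -1;
    }
    return s;
}
```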

-Andi


Re: New Address Family: Inter Process Networking (IPN)

2007-12-06 Thread Andi Kleen
On Thu, Dec 06, 2007 at 11:18:37PM +0100, Renzo Davoli wrote:
> * IPN is for inter-process communication. It is *not* directly related
> to TCP-IP or Ethernet.
>
> If you want you can call it Inter Process Bus Communication.  It is an
> extension of AF_UNIX.  Comments saying that some services can be
> implemented by using TCP-IP multicast protocols are unrelated to IPN.
> All AF_UNIX services could be implemented as TCP-IP services on
> 127.0.0.1. Do we abolish AF_UNIX, then?  The problem is that to use
> TCP-IP, you'd need to wrap the packets with TCP or UDP, IP and Ethernet

No ethernet headers on localhost. Just to give you a perspective:
IP+TCP headers are 50 bytes (with timestamps) and IP+UDP is 28 bytes.
On the other hand the sk_buff+skb_shared_info headers, which are used for
all socket communication in Linux and mostly have to be set up every time,
are 192+312 bytes on 64bit [part of the 312 bytes is an array that is
typically only partly used] or 156+236 bytes on 32bit. So the internal
data structures dwarf the network headers.

There might be other reasons why TCP/IP is slower, but arguing 
with the size of the headers is just bogus.

My personal feeling would be that if TCP/IP is too slow for something
it is better to just improve the stack than to add a completely
new socket family. That will benefit many more applications without
requiring them to change.

About the only good reason to use UNIX sockets is when you need to use
file system permissions. 

> * IPN services (like AF_UNIX) do not require root privileges.
>
> There are many communication services where the user needs broadcast or
> p2p among user processes.  If a user (not root) wants to run several

IP Multicast when properly set up also doesn't need root.

Broadcast is kind of obsolete anyways.

> User-Mode Linux, Qemu, or Kvm VMs, the only way to have them connected
> together is our Virtual Distributed Ethernet.  (For this reason VDE

They could easily just tunnel over a local multicast group for example.

-Andi



Re: New Address Family: Inter Process Networking (IPN)

2007-12-06 Thread Andi Kleen
> Renzo's IPN is a local protocol--you can't multicast to localhost.

You don't need to. All local clients can join the same group without
using localhost.

-Andi


Re: New Address Family: Inter Process Networking (IPN)

2007-12-06 Thread Chris Friesen

Andi Kleen wrote:

> > On a 1.4GHz P4 I measured a 44% increase in latency between a unix
> > datagram and a UDP datagram.
>
> That's weird.


I just reran on a 3.2GHZ P4 running 2.6.11 (Fedora Core 4).  42% latency 
increase.


For stream sockets, unix gives approximately a 62% bandwidth increase 
over tcp.   (Although tcp could probably be tuned to do better than this.)


Chris


Re: New Address Family: Inter Process Networking (IPN)

2007-12-06 Thread Andi Kleen
On Thu, Dec 06, 2007 at 05:02:40PM -0600, Chris Friesen wrote:
> Andi Kleen wrote:
>
> > > On a 1.4GHz P4 I measured a 44% increase in latency between a unix
> > > datagram and a UDP datagram.
> >
> > That's weird.
>
> I just reran on a 3.2GHZ P4 running 2.6.11 (Fedora Core 4).  42% latency
> increase.

Sounds like something that should be looked into. I know of no
principal reasons for that.

> For stream sockets, unix gives approximately a 62% bandwidth increase
> over tcp.   (Although tcp could probably be tuned to do better than this.)

How long a stream did you test? You might be measuring slow start.

-Andi


Re: New Address Family: Inter Process Networking (IPN)

2007-12-06 Thread Chris Friesen

Andi Kleen wrote:

> On Thu, Dec 06, 2007 at 05:02:40PM -0600, Chris Friesen wrote:
>
> > I just reran on a 3.2GHZ P4 running 2.6.11 (Fedora Core 4).  42% latency
> > increase.
>
> Sounds like something that should be looked into. I know of no
> principal reasons for that.
>
> > For stream sockets, unix gives approximately a 62% bandwidth increase
> > over tcp.   (Although tcp could probably be tuned to do better than this.)
>
> How long a stream did you test? You might be measuring slow start.


No idea.  These are just the standard local networking tests in lmbench 
v2.  In our case the latency was the big concern and we were using 
datagrams anyway.


Chris


Re: New Address Family: Inter Process Networking (IPN)

2007-12-06 Thread Renzo Davoli
I have done some raw tests.
(you can read the code here: http://www.cs.unibo.it/~renzo/rawperftest/)

The programs are quite simple. The sender sends "Hello World" as fast as it
can, while the receiver prints time() for every 1 million messages
received.

On my laptop, tests on 2000 "Hello World" packets:

One receiver:
multicast   244,000 msg/sec
IPN         333,000 msg/sec  (36% faster)

Two receivers:
multicast   174,000 msg/sec
IPN         250,000 msg/sec  (43% faster)
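
As a quick check of how the percentages relate to the raw rates, here is
a small sketch; the function name percent_faster is hypothetical.

```c
/* Relative speedup of `fast` over `slow`, in percent. */
double percent_faster(double fast, double slow)
{
    return (fast / slow - 1.0) * 100.0;
}
```

With the figures above this gives roughly 36.5 and 43.7, which match the
quoted 36% and 43% when truncated.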

Apart from this, how could I implement policies over a multicast socket,
e.g. how would a Kernel VDE_switch work on multicast sockets?

If I send an ethernet packet over a multicast socket it can emulate just a
hub (although it seems quite unnatural to me to have TCP-UDP
over IP over Ethernet over UDP over IP - okay we can skip the Ethernet
on localhost, long ethernet frames get fragmented but... details).

On a multicast socket you cannot use policies; I mean, an IPN network (or
bus or group) can have a policy that reads some info in the packet to
decide the set of recipients.
For a vde_switch it is the destination MAC address, when found in the
MAC hash table, that selects the recipient port. For midi communication it
could be the channel number.
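
To make the idea of a delivery policy concrete, here is a minimal
userland sketch of a MAC-learning lookup.  It is not the kvde_switch
code; the table size, the hash function and all names are assumptions
made for this illustration.

```c
#include <stdint.h>
#include <string.h>

#define TABLE_SIZE 256          /* assumed table size for this sketch */
#define PORT_FLOOD (-1)         /* unknown destination: deliver to all ports */

struct mac_entry { uint8_t mac[6]; int port; int used; };
static struct mac_entry table[TABLE_SIZE];

/* Simple illustrative hash over the six address bytes. */
static unsigned mac_hash(const uint8_t *mac)
{
    unsigned h = 0;
    for (int i = 0; i < 6; i++)
        h = h * 31 + mac[i];
    return h % TABLE_SIZE;
}

/* Learn: remember that `mac` was seen as a source address on `port`.
 * A colliding entry is simply overwritten. */
void mac_learn(const uint8_t *mac, int port)
{
    struct mac_entry *e = &table[mac_hash(mac)];
    memcpy(e->mac, mac, 6);
    e->port = port;
    e->used = 1;
}

/* Policy decision: the port that owns the destination MAC,
 * or PORT_FLOOD when the address is not in the table. */
int mac_lookup(const uint8_t *dst)
{
    const struct mac_entry *e = &table[mac_hash(dst)];
    if (e->used && memcmp(e->mac, dst, 6) == 0)
        return e->port;
    return PORT_FLOOD;
}
```

A broadcast destination is never learned as a source, so it always falls
through to PORT_FLOOD, which is the hub-like behaviour.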

If the switching fabric is moved to userland, the performance figures are
quite different.

renzo



Re: New Address Family: Inter Process Networking (IPN)

2007-12-06 Thread David Miller
From: Chris Friesen [EMAIL PROTECTED]
Date: Thu, 06 Dec 2007 14:36:54 -0600

> One problem we ran into was that there are only 32 multicast groups per
> netlink protocol family.

I'm pretty sure we've removed this limitation.


Re: New Address Family: Inter Process Networking (IPN)

2007-12-06 Thread Chris Friesen

David Miller wrote:

> From: Chris Friesen [EMAIL PROTECTED]
> Date: Thu, 06 Dec 2007 14:36:54 -0600
>
> > One problem we ran into was that there are only 32 multicast groups per
> > netlink protocol family.
>
> I'm pretty sure we've removed this limitation.


As of 2.6.23 nl_groups is a 32-bit bitmask with one bit per group. 
Also, it appears that only root is allowed to use multicast netlink.


Chris


Re: New Address Family: Inter Process Networking (IPN)

2007-12-06 Thread Ben Pfaff
Chris Friesen [EMAIL PROTECTED] writes:

> David Miller wrote:
> > From: Chris Friesen [EMAIL PROTECTED]
> > > One problem we ran into was that there are only 32 multicast groups
> > > per netlink protocol family.
> > I'm pretty sure we've removed this limitation.
> As of 2.6.23 nl_groups is a 32-bit bitmask with one bit per
> group.

Use setsockopt(fd, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP, ...) to
join an arbitrary Netlink multicast group.
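
A minimal sketch of that call.  The wrapper name netlink_join_group is
hypothetical; the option value is the group number itself rather than a
bit in a 32-bit mask.

```c
#include <sys/socket.h>
#include <linux/netlink.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270   /* value from the Linux headers; some libcs omit it */
#endif

/* Join netlink multicast group number `group` (1-based, not a bitmask)
 * on an already opened AF_NETLINK socket `fd`.
 * Returns 0 on success, -1 on error. */
int netlink_join_group(int fd, unsigned int group)
{
    return setsockopt(fd, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP,
                      &group, sizeof(group));
}
```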
-- 
A computer is a state machine.
 Threads are for people who cant [sic] program state machines.
--Alan Cox



Re: New Address Family: Inter Process Networking (IPN)

2007-12-06 Thread David Miller
From: Chris Friesen [EMAIL PROTECTED]
Date: Thu, 06 Dec 2007 22:21:39 -0600

> David Miller wrote:
> > From: Chris Friesen [EMAIL PROTECTED]
> > Date: Thu, 06 Dec 2007 14:36:54 -0600
> >
> > > One problem we ran into was that there are only 32 multicast groups per
> > > netlink protocol family.
> >
> > I'm pretty sure we've removed this limitation.
>
> As of 2.6.23 nl_groups is a 32-bit bitmask with one bit per group.
> Also, it appears that only root is allowed to use multicast netlink.

The kernel supports many more than 32 groups, see nlk->groups which is
a bitmap which can be sized to arbitrary sizes.  nlk->nl_groups is
for backwards compatibility only.

netlink_change_ngroups() does the bitmap resizing when necessary.
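
To illustrate the difference in plain userland C (this is not the
kernel's code, and all names below are hypothetical): a 32-bit mask can
only name groups 1..32, while a word-array bitmap can be sized for any
number of groups and still tests a single group in constant time.

```c
#include <limits.h>
#include <stdint.h>

/* Legacy form: groups 1..32 each map to one bit of a 32-bit mask. */
uint32_t legacy_group_mask(unsigned group)
{
    if (group < 1 || group > 32)
        return 0;                   /* not representable in 32 bits */
    return (uint32_t)1 << (group - 1);
}

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Bitmap form: mark `group` (1-based) as joined.  The caller must make
 * sure the map is large enough and that group >= 1. */
void bitmap_set_group(unsigned long *map, unsigned group)
{
    unsigned bit = group - 1;
    map[bit / BITS_PER_WORD] |= 1UL << (bit % BITS_PER_WORD);
}

/* Bitmap form: test membership of `group` in a map sized for `ngroups`. */
int bitmap_test_group(const unsigned long *map, unsigned ngroups,
                      unsigned group)
{
    unsigned bit;

    if (group < 1 || group > ngroups)
        return 0;
    bit = group - 1;
    return (int)((map[bit / BITS_PER_WORD] >> (bit % BITS_PER_WORD)) & 1UL);
}
```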

The root multicast listening restriction can be relaxed in some
circumstances, whatever is needed to fill your needs.

Stop making excuses, with minor adjustments we have the facilities to
meet your needs.  There is no need for yet-another-protocol to do what
you're trying to do, we already have too much duplicated
functionality.


Re: New Address Family: Inter Process Networking (IPN)

2007-12-05 Thread David Newall

Kyle Moffett wrote:

> On Dec 06, 2007, at 00:30:16, Renzo Davoli wrote:
> > AF_IPN is different.  AF_IPN is the broadcast and peer-to-peer
> > extension of AF_UNIX. It supports communication among *user* processes.
>
> Ok, you say it's different, but then you describe how IP unicast and
> broadcast work.


Renzo also described something new (in the socket() arena): the 
multi-reader, multi-writer is just not available in IP.


I wonder if this solves the same problem as d-bus?


> So if you really think this is something that belongs in the kernel
> you need to provide much more detailed descriptions and use-cases for
> why it cannot be implemented in user-space or with small modifications
> to existing UDP/TCP networking.


I would strengthen this sentiment: If you think something belongs in the 
kernel, you need to argue your case (provide much more detailed 
descriptions and use-cases.)



Re: New Address Family: Inter Process Networking (IPN)

2007-12-05 Thread Kyle Moffett

On Dec 06, 2007, at 00:30:16, Renzo Davoli wrote:
> AF_IPN is different.  AF_IPN is the broadcast and peer-to-peer
> extension of AF_UNIX. It supports communication among *user*
> processes.


Ok, you say it's different, but then you describe how IP unicast and  
broadcast work.  Both are frequently used for communication among  
"*user* processes".  Please provide significantly more details about  
exactly *how* it's different.




> Example:
>
> Qemu, User-Mode Linux, Kvm, and our umview machines can use IPN as an
> Ethernet Hub and communicate among themselves, with the hosting
> computer, and with the world through a tap-like interface.


You say "tap like" interface, but people do this already with  
existing infrastructure.  You can connect Qemu, UML, and KVM to a  
standard Linux "tap" interface, and then use the standard Linux
bridging code to connect the "tap" interface to your existing network  
interfaces.  Alternatively you could use the standard and well-tested  
IP routing/firewalling/NAT code to move your packets around.  None of  
this requires new network infrastructure in the slightest.  If you  
have problems with the existing code, please improve it instead of  
creating a slightly incompatible replacement which has different bugs  
and workarounds.



> You can also grab an interface (say eth1) and use eth0 for your
> hosting computer and eth1 for the IPN network of virtual machines.


You can do that already with the bridging code.


> If you load the kvde_switch submodule IPN can be a virtual Ethernet
> switch.


As I described above, this can be done with the existing bridging and  
tun/tap code.




> Another Example:
>
> You have a continuous stream of data packets generated by a
> process, and you want to send this data to many processes.  Maybe
> the set of processes is not known in advance, you want to send the
> data to any interested process. Some kind of publish
> communication service (among unix processes not on TCP-IP). Without
> IPN you need a server. With IPN the sender creates the socket,
> connects to it and feeds it with data packets. All the interested
> receivers connect to it and start reading. That's all.


This is already done frequently in userspace.  Just register a port  
number with IANA on which to implement a "registration" server and  
write a little daemon to listen on 127.0.0.1:${YOUR_PORT}.  Your  
interconnecting programs then use either unicast or multicast sockets  
to bind, then report to the registration server what service you are  
offering and what port it's on.  Your "receivers" then connect to the  
registration server, ask what port a given service is on, and then  
multicast-listen or unicast-connect to access that service.  The best  
part is that all of the performance implications are already  
thoroughly understood.  Furthermore, if you want to extend your  
communication protocol to other hosts as well, you just have to  
replace the 127.0.0.1 bind with a global bind.  This is exactly how  
the standard-specified multiple-participant "SIP" protocol works, for  
example.



So if you really think this is something that belongs in the kernel  
you need to provide much more detailed descriptions and use-cases for  
why it cannot be implemented in user-space or with small  
modifications to existing UDP/TCP networking.


Cheers,
Kyle Moffett



Re: New Address Family: Inter Process Networking (IPN)

2007-12-05 Thread Stephen Hemminger
On Thu, 6 Dec 2007 06:38:21 +0100
[EMAIL PROTECTED] (Renzo Davoli) wrote:

> On Wed, Dec 05, 2007 at 04:55:52PM -0500, Stephen Hemminger wrote:
> > On Wed, 5 Dec 2007 17:40:55 +0100
> > [EMAIL PROTECTED] (Renzo Davoli) wrote:
> > > 0- (Constructive) comments.
> > > 1- The "official" assignment of an Address Family.
> > > 2- Another "grabbing hook" for interfaces (like the ones already
> > > We are studying some way to register/deregister grabbing services,
> > > I feel this would be the cleanest way. 
> > 
> > Post complete source code for kernel part to [EMAIL PROTECTED]
> I'll do it as soon as possible.
> > If you want the hooks, you need to include the full source code for 
> > inclusion
> > in mainline. All the Documentation/SubmittingPatches rules apply;
> > you can't just ask for "facilitators" and expect to keep your stuff out of 
> > tree.
> I am sorry if I was misunderstood.
> I did not want any "facilitator", nor did I want to keep my code outside
> the kernel, on the contrary.

Great

> It is perfectly okay for me to provide the entire code for inclusion.
> The purposes of my message were the following:
> - I wanted to introduce the idea and say to the linux kernel community
>   that a team is working on it.
> - Address family: is it okay to send a patch that adds a new AF?
> is there an "AF registry" somewhere? (like the device major/minor
> registry or the well-known port assignment for TCP-IP).

The usual process is to just add the value as part of the patchset.
You then need to tell the glibc maintainers so it gets included appropriately
in userspace.

> - Hook: we have two different options. We can add another grabbing
> inline function like those used by the bridge and macvlan or we can
> design a grabbing service registration facility. Which one is preferable?

The problems with making it a registration facility are:
 * risk of making it easier for non-GPL out of tree abuse
 * possible ordering issues: ie. by hardcoding each hook, the
behaviour is defined in the case of multiple usages on the same
machine.

> The former is simpler, the latter is more elegant but it requires some 
> changes in the kernel bridge code.

Not a big deal, but see above

> So the choice is: the former is less invasive, safer, inelegant; the
> latter is more invasive, less safe, elegant.

 
> We need a bit of time to stabilize the code: deeply testing the existing
> features and implementing some more ideas we have on it.
> In the meanwhile we would be grateful if the community could kindly ask to the
> questions above.

I am a believer in review early and often. It is easier to just deal with
the nuisance issues (style, naming, configuration) at the beginning rather
than the final stage of the project.


Re: New Address Family: Inter Process Networking (IPN)

2007-12-05 Thread Renzo Davoli
> In the meanwhile we would be grateful if the community could kindly ask to the
> questions above.
Obviously I meant:
In the meanwhile we would be grateful if the community could kindly *answer*
to the questions above

sorry (it is early morning here, it happens ;-)

renzo


Re: New Address Family: Inter Process Networking (IPN)

2007-12-05 Thread Renzo Davoli
On Wed, Dec 05, 2007 at 04:55:52PM -0500, Stephen Hemminger wrote:
> On Wed, 5 Dec 2007 17:40:55 +0100
> [EMAIL PROTECTED] (Renzo Davoli) wrote:
> > 0- (Constructive) comments.
> > 1- The "official" assignment of an Address Family.
> > 2- Another "grabbing hook" for interfaces (like the ones already
> > We are studying some way to register/deregister grabbing services,
> > I feel this would be the cleanest way. 
> 
> Post complete source code for kernel part to [EMAIL PROTECTED]
I'll do it as soon as possible.
> If you want the hooks, you need to include the full source code for inclusion
> in mainline. All the Documentation/SubmittingPatches rules apply;
> you can't just ask for "facilitators" and expect to keep your stuff out of 
> tree.
I am sorry if I was misunderstood.
I did not want any "facilitator", nor did I want to keep my code outside
the kernel, on the contrary.
It is perfectly okay for me to provide the entire code for inclusion.
The purposes of my message were the following:
- I wanted to introduce the idea and say to the linux kernel community
  that a team is working on it.
- Address family: is it okay to send a patch that adds a new AF?
is there an "AF registry" somewhere? (like the device major/minor
registry or the well-known port assignment for TCP-IP).
- Hook: we have two different options. We can add another grabbing
inline function like those used by the bridge and macvlan or we can
design a grabbing service registration facility. Which one is preferable?
The former is simpler, the latter is more elegant but it requires some
changes in the kernel bridge code.
So the choice is: the former is less invasive, safer, inelegant; the
latter is more invasive, less safe, elegant.

We need a bit of time to stabilize the code: deeply testing the existing
features and implementing some more ideas we have on it.
In the meanwhile we would be grateful if the community could kindly ask to the
questions above.

renzo


Re: New Address Family: Inter Process Networking (IPN)

2007-12-05 Thread Renzo Davoli
On Thu, Dec 06, 2007 at 12:39:22AM +0100, Andi Kleen wrote:
> [EMAIL PROTECTED] (Renzo Davoli) writes:
> 
> > Berkeley sockets have been designed for client server or point to point
> > communication. All existing Address Families implement this idea.
> Netlink is multicast/broadcast by default for once. And BC/MC certainly
> works for IPv[46] and a couple of other protocols too.
>
> > IPN is an Inter Process Communication paradigm where all the processes
> > appear as if they were connected by a networking bus.
> 
> Sounds like netlink. See also RFC 3549

RFC 3549 says:
"This document describes Linux Netlink, which is used in Linux both as
   an intra-kernel messaging system as well as between kernel and user
   space."

We know AF_NETLINK, our user-space stack lwipv6 supports it.

AF_IPN is different. 
AF_IPN is the broadcast and peer-to-peer extension of AF_UNIX.
It supports communication among *user* processes. 

Example:

Qemu, User-Mode Linux, Kvm, and our umview machines can use IPN as an
Ethernet Hub and communicate among themselves, with the hosting computer,
and with the world through a tap-like interface.

You can also grab an interface (say eth1) and use eth0 for your hosting
computer and eth1 for the IPN network of virtual machines.

If you load the kvde_switch submodule IPN can be a virtual Ethernet switch.

This example is already working using the svn versions of ipn and
vdeplug.

Another Example:

You have a continuous stream of data packets generated by a process,
and you want to send this data to many processes.
Maybe the set of processes is not known in advance, you want to send the
data to any interested process. Some kind of publish
communication service (among unix processes not on TCP-IP).
Without IPN you need a server. With IPN the sender creates the socket,
connects to it and feeds it with data packets. All the interested
receivers connect to it and start reading. That's all.

I hope that this message can give a better understanding of what IPN is.

renzo


Re: New Address Family: Inter Process Networking (IPN)

2007-12-05 Thread Andi Kleen
[EMAIL PROTECTED] (Renzo Davoli) writes:

> Berkeley sockets have been designed for client server or point to point
> communication. All existing Address Families implement this idea.

Netlink is multicast/broadcast by default for once. And BC/MC certainly
works for IPv[46] and a couple of other protocols too.

> IPN is an Inter Process Communication paradigm where all the processes
> appear as if they were connected by a networking bus.

Sounds like netlink. See also RFC 3549

Haven't read further I admit.

-Andi


Re: New Address Family: Inter Process Networking (IPN)

2007-12-05 Thread Stephen Hemminger
On Wed, 5 Dec 2007 17:40:55 +0100
[EMAIL PROTECTED] (Renzo Davoli) wrote:

> 
> WHAT WE NEED FROM THE LINUX KERNEL COMMUNITY
> 
> 0- (Constructive) comments.
> 
> 1- The "official" assignment of an Address Family.
> (It is enough for everything but interface grabbing, see 2)
> 
> in include/linux/net.h:
> - #define NPROTO  34  /* should be enough for now..  */
> + #define NPROTO  35  /* should be enough for now..  */
> 
> in include/linux/socket.h
> + #define AF_IPN 34
> + #define PF_IPN AF_IPN
> - #define AF_MAX  34  /* For now.. */
> + #define AF_MAX  35  /* For now.. */
> 
> This seems to be quite simple.
> 
> 2- Another "grabbing hook" for interfaces (like the ones already
> existing for the kernel bridge and for the macvlan).
> 
> In include/linux/netdevice.h:
> among the fields of struct net_device:
> 
> /* bridge stuff */
>   struct net_bridge_port  *br_port;
>   /* macvlan */
>   struct macvlan_port *macvlan_port;
> +/* ipn */
> +struct ipn_node*ipn_port;
>
>   /* class/net/name entry */
>   struct device   dev;
> 
> In net/core/dev.c, we need another section for grabbing packets
> like the ones defined for CONFIG_BRIDGE and CONFIG_MACVLAN.
> I can write the patch (it needs just tens of minutes of cut and paste).
> We are studying some way to register/deregister grabbing services,
> I feel this would be the cleanest way. 
> 
> WHERE?
> --
> There is an experimental version in the VDE svn tree.
> http://sourceforge.net/projects/vde
>

Post complete source code for kernel part to [EMAIL PROTECTED]
If you want the hooks, you need to include the full source code for inclusion
in mainline. All the Documentation/SubmittingPatches rules apply;
you can't just ask for "facilitators" and expect to keep your stuff out of tree.



New Address Family: Inter Process Networking (IPN)

2007-12-05 Thread Renzo Davoli
Inter Process Networking: 
a kernel module (and some simple kernel patches) to provide 
AF_IPN: a new address family for process networking, i.e. multipoint,
multicast/broadcast communication among processes (and networks).

WHAT IS IT?
---
Berkeley sockets have been designed for client server or point to point
communication. All existing Address Families implement this idea.
IPN is a new address family designed for one-to-many, many-to-many and
peer-to-peer communication among processes.
IPN is an Inter Process Communication paradigm where all the processes
appear as if they were connected by a networking bus.
On IPN, processes can interoperate using real networking protocols
(e.g. ethernet) but also using application defined protocols (maybe
just sending ascii strings, video or audio frames, etc).
IPN provides networking (in the broadest definition you can imagine) to
the processes. Processes can be ethernet nodes, run their own TCP-IP stacks
if they like (e.g. virtual machines), mount ATAonEthernet disks, etc.
IPN networks can be interconnected with real networks, and IPN networks
running on different computers can interoperate (can be connected by
virtual cables).
IPN is part of the Virtual Square Project (vde, lwipv6, view-os, 
umview/kmview, see wiki.virtualsquare.org).

WHY?

Many applications can benefit from IPN.
First of all VDE (Virtual Distributed Ethernet): one service of IPN is a
kernel implementation of VDE.
IPN can be useful for applications where one or some processes feed their 
data to several consuming processes (maybe joining the stream at run time).
IPN sockets can be also connected to tap (tuntap) like interfaces or
to real interfaces (like "brctl addif").
There are specific ioctls to define a tap interface or grab an existing
one.
Several existing services could be implemented (and often could have extended
features) on the top of IPN:
- kernel bridge
- tuntap
- macvlan
IPN could be used (IMHO) to provide multicast services to processes.
Audio frames or video frames could be multiplexed such that multiple
applications can use them. I think that something like Jack can be
implemented on the top of IPN. Something like a VideoJack can
provide video frames to several applications: e.g. the same image from a
camera can be viewed by xawtv, recorded and sent to a streaming service.
IPN sockets can be used wherever there is the idea of a broadcasting channel
i.e. where processes can "join (and leave) the information flow" at
runtime. 
Different delivery policies can be defined as IPN protocols (loaded 
as submodules of ipn.ko).
E.g. an ethernet switch is a policy (kvde_switch.ko: packets are delivered
by unicast if the MAC address is already in the switching hash table).
We are designing an extended switch, full of interesting features like
our userland vde_switch (with vlan/fst/management etc.), and a layer3
switch, but other policies can be defined to implement the specific
requirements of other services. I feel that there are no limits to creativity
about multicast services for processes.
Userspace services (like vde or jack) do exist, but IPN provides
faster and unified support.

HOW?

The complete specifications for IPN can be found here:
http://wiki.virtualsquare.org/index.php/IPN

bind() creates the socket (if it does not already exist). When bind() succeeds, 
the process has the right to manage the "network". 
No data is received or can be sent if the socket is not connected
(only get/setsockopt and ioctl work on bound unconnected sockets).

connect() is used to join the network. When the socket is connected it 
is possible to send/receive data. If the socket is already bound there is
no need to specify the socket address again (you can use NULL, or specify the
same address).
connect() can also be used without bind(). In this case the process sends and
receives data but it cannot manage the network (in this case the socket
address specification is required).

listen() and accept() are for servers, thus they do not exist for IPN.

Examples:
1- Peer-to-Peer Communication:
Several processes run the same code:

  struct sockaddr_un sun={.sun_family=AF_IPN,.sun_path="/tmp/sockipn"};
  int s=socket(AF_IPN,SOCK_RAW,IPN_BROADCAST); 
  int err=bind(s,(struct sockaddr *)&sun,sizeof(sun)); /* create/manage the network */
  err=connect(s,NULL,0); /* already bound: no address needed */

In this case all the messages sent by each process get received by all the
other processes (IPN_BROADCAST). 
The processes need to be able to receive data when there are pending packets, 
e.g. by using poll/select and event driven programming or multithreading.

2- (One or) Some senders/many receivers
The sender runs the following code:

  struct sockaddr_un sun={.sun_family=AF_IPN,.sun_path="/tmp/sockipn"};
  int s=socket(AF_IPN,SOCK_RAW,IPN_BROADCAST);
  int err=shutdown(s,SHUT_RD); /* this process only sends */
  err=bind(s,(struct sockaddr *)&sun,sizeof(sun));
  err=connect(s,NULL,0);

The receivers do not need to define the network, thus they skip the bind():

  struct sockaddr_un 

New Address Family: Inter Process Networking (IPN)

2007-12-05 Thread Renzo Davoli
Inter Process Networking: 
a kernel module (and some simple kernel patches) to provide 
AF_IPN: a new address family for process networking, i.e. multipoint,
multicast/broadcast communication among processes (and networks).

WHAT IS IT?
---
Berkeley sockets have been designed for client-server or point-to-point
communication. All existing Address Families implement this idea.
IPN is a new address family designed for one-to-many, many-to-many and 
peer-to-peer communication among processes.
IPN is an Inter Process Communication paradigm where all the processes
appear as if they were connected by a networking bus.
On IPN, processes can interoperate using real networking protocols 
(e.g. ethernet) but also using application defined protocols (maybe 
just sending ascii strings, video or audio frames, etc).
IPN provides networking (in the broadest definition you can imagine) to
the processes. Processes can be ethernet nodes, run their own TCP-IP stacks
if they like (e.g. virtual machines), mount ATAonEthernet disks, etc.
IPN networks can be interconnected with real networks or IPN networks
running on different computers can interoperate (can be connected by
virtual cables).
IPN is part of the Virtual Square Project (vde, lwipv6, view-os, 
umview/kmview, see wiki.virtualsquare.org).

WHY?

Many applications can benefit from IPN.
First of all VDE (Virtual Distributed Ethernet): one service of IPN is a
kernel implementation of VDE.
IPN can be useful for applications where one or some processes feed their 
data to several consuming processes (maybe joining the stream at run time).
IPN sockets can also be connected to tap-like (tuntap) interfaces or
to real interfaces (like "brctl addif").
There are specific ioctls to define a tap interface or grab an existing
one.
Several existing services could be implemented (often with extended
features) on top of IPN:
- kernel bridge
- tuntap
- macvlan
IPN could be used (IMHO) to provide multicast services to processes.
Audio frames or video frames could be multiplexed such that multiple
applications can use them. I think that something like Jack can be
implemented on top of IPN. Something like a VideoJack can
provide video frames to several applications: e.g. the same image from a
camera can be viewed by xawtv, recorded and sent to a streaming service.
IPN sockets can be used wherever there is the idea of a broadcasting channel,
i.e. where processes can "join (and leave) the information flow" at
runtime.
Different delivery policies can be defined as IPN protocols (loaded 
as submodules of ipn.ko).
E.g. an ethernet switch is a policy (kvde_switch.ko: packets are delivered
by unicast if the MAC address is already in the switching hash table).
We are designing an extended switch, full of interesting features like
our userland vde_switch (with vlan/fst/management etc.), and a layer3
switch, but other policies can be defined to implement the specific
requirements of other services. I feel that there are no limits to creativity
about multicast services for processes.
Userspace services (like vde or jack) do exist, but IPN provides
faster and unified support.

HOW?

The complete specifications for IPN can be found here:
http://wiki.virtualsquare.org/index.php/IPN

bind() creates the socket (if it does not already exist). When bind() succeeds, 
the process has the right to manage the "network".
No data is received or can be sent if the socket is not connected
(only get/setsockopt and ioctl work on bound unconnected sockets).

connect() is used to join the network. When the socket is connected it 
is possible to send/receive data. If the socket is already bound there is
no need to specify the socket address again (you can use NULL, or specify the
same address).
connect() can also be used without bind(). In this case the process sends and
receives data but it cannot manage the network (in this case the socket
address specification is required).
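
A minimal sketch of the connect()-without-bind() case described above. AF_IPN
and IPN_BROADCAST are not defined by mainline headers, so the helper below
(join_by_path is a hypothetical name, not part of the IPN code) takes the
family, type and protocol as parameters. That way the same code can be tried
against an ordinary AF_UNIX datagram socket as a stand-in, which only shares
the "connect to a filesystem path" step and none of the broadcast semantics.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper: join an existing network without bind().
 * Creates a socket and connect()s it to the path naming the network.
 * Returns the connected descriptor, or -1 with errno set. */
int join_by_path(int family, int type, int protocol, const char *path)
{
    struct sockaddr_un addr;

    if (strlen(path) >= sizeof(addr.sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = family;
    strcpy(addr.sun_path, path);

    int s = socket(family, type, protocol);
    if (s < 0)
        return -1;

    /* No bind(): the address is mandatory here, as the text says. */
    if (connect(s, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        int saved = errno;      /* keep connect()'s errno across close() */
        close(s);
        errno = saved;
        return -1;
    }
    return s;
}
```

With the IPN module and its header available, a receiver would presumably call
join_by_path(AF_IPN, SOCK_RAW, IPN_BROADCAST, "/tmp/sockipn") and then read
from the returned descriptor.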

listen() and accept() are for servers, thus they do not exist for IPN.

Examples:
1- Peer-to-Peer Communication:
Several processes run the same code:

  struct sockaddr_un sun={.sun_family=AF_IPN,.sun_path="/tmp/sockipn"};
  int s=socket(AF_IPN,SOCK_RAW,IPN_BROADCAST); 
  int err=bind(s,(struct sockaddr *)&sun,sizeof(sun)); /* create/manage the network */
  err=connect(s,NULL,0); /* already bound: no address needed */

In this case all the messages sent by each process get received by all the
other processes (IPN_BROADCAST). 
The processes need to be able to receive data when there are pending packets, 
e.g. by using poll/select and event driven programming or multithreading.
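
A minimal sketch of the poll-based receive mentioned above. It uses only
standard calls (poll, recv), so it works on any descriptor that supports them;
recv_when_ready is a hypothetical helper name, and the timeout convention
(return -1 with errno set to ETIMEDOUT) is an assumption of this sketch, not
something IPN defines.

```c
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: wait up to timeout_ms for a pending packet on fd,
 * then read it. Returns the number of bytes received, or -1 with errno
 * set (ETIMEDOUT if nothing arrived in time). */
ssize_t recv_when_ready(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n;

    do {
        n = poll(&pfd, 1, timeout_ms);   /* retry if a signal interrupts us */
    } while (n < 0 && errno == EINTR);

    if (n < 0)
        return -1;
    if (n == 0) {
        errno = ETIMEDOUT;
        return -1;
    }
    /* The descriptor is readable (or in error); recv() reports which. */
    return recv(fd, buf, len, 0);
}
```

An event-driven peer would call this (or put the same pollfd into a larger
poll set) between its own sends, so that packets broadcast by the other
processes do not pile up unread.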

2- (One or) Some senders/many receivers
The sender runs the following code:

  struct sockaddr_un sun={.sun_family=AF_IPN,.sun_path="/tmp/sockipn"};
  int s=socket(AF_IPN,SOCK_RAW,IPN_BROADCAST);
  int err=shutdown(s,SHUT_RD); /* this process only sends */
  err=bind(s,(struct sockaddr *)&sun,sizeof(sun));
  err=connect(s,NULL,0);

The receivers do not need to define the network, thus they skip the bind():

  struct sockaddr_un 

Re: New Address Family: Inter Process Networking (IPN)

2007-12-05 Thread Andi Kleen
[EMAIL PROTECTED] (Renzo Davoli) writes:

 Berkeley socket have been designed for client server or point to point
 communication. All existing Address Families implement this idea.

Netlink is multicast/broadcast by default, for one. And BC/MC certainly
works for IPv[46] and a couple of other protocols too.

 IPN is an Inter Process Communication paradigm where all the processes
 appear as they were connected by a networking bus.

Sounds like netlink. See also RFC 3549

Haven't read further I admit.

-Andi


Re: New Address Family: Inter Process Networking (IPN)

2007-12-05 Thread Stephen Hemminger
On Wed, 5 Dec 2007 17:40:55 +0100
[EMAIL PROTECTED] (Renzo Davoli) wrote:

 
 WHAT WE NEED FROM THE LINUX KERNEL COMMUNITY
 
 0- (Constructive) comments.
 
 1- The official assignment of an Address Family.
 (It is enough for everything but interface grabbing, see 2)
 
 in include/linux/net.h:
 - #define NPROTO  34  /* should be enough for now..  */
 + #define NPROTO  35  /* should be enough for now..  */
 
 in include/linux/socket.h
 + #define AF_IPN 34
 + #define PF_IPN AF_IPN
 - #define AF_MAX  34  /* For now.. */
 + #define AF_MAX  35  /* For now.. */
 
 This seems to be quite simple.
 
 2- Another grabbing hook for interfaces (like the ones already
 existing for the kernel bridge and for the macvlan).
 
 In include/linux/netdevice.h:
 among the fields of struct net_device:
 
   /* bridge stuff */
   struct net_bridge_port  *br_port;
   /* macvlan */
   struct macvlan_port     *macvlan_port;
 + /* ipn */
 + struct ipn_node         *ipn_port;

   /* class/net/name entry */
   struct device           dev;
 
 In net/core/dev.c, we need another section for grabbing packets
 like the ones defined for CONFIG_BRIDGE and CONFIG_MACVLAN.
 I can write the patch (it needs just tens of minutes of cut&paste).
 We are studying some way to register/deregister grabbing services,
 I feel this would be the cleanest way. 
 
 WHERE?
 --
 There is an experimental version in the VDE svn tree.
 http://sourceforge.net/projects/vde


Post complete source code for kernel part to [EMAIL PROTECTED]
If you want the hooks, you need to include the full source code for inclusion
in mainline. All the Documentation/SubmittingPatches rules apply;
you can't just ask for facilitators and expect to keep your stuff out of tree.



Re: New Address Family: Inter Process Networking (IPN)

2007-12-05 Thread Renzo Davoli
On Thu, Dec 06, 2007 at 12:39:22AM +0100, Andi Kleen wrote:
 [EMAIL PROTECTED] (Renzo Davoli) writes:
 
  Berkeley socket have been designed for client server or point to point
  communication. All existing Address Families implement this idea.
 Netlink is multicast/broadcast by default for once. And BC/MC certainly
 works for IPv[46] and a couple of other protocols too.
 
  IPN is an Inter Process Communication paradigm where all the processes
  appear as they were connected by a networking bus.
 
 Sounds like netlink. See also RFC 3549

RFC 3549 says:
This document describes Linux Netlink, which is used in Linux both as
   an intra-kernel messaging system as well as between kernel and user
   space.

We know AF_NETLINK, our user-space stack lwipv6 supports it.

AF_IPN is different. 
AF_IPN is the broadcast and peer-to-peer extension of AF_UNIX.
It supports communication among *user* processes. 

Example:

Qemu, User-Mode Linux, Kvm, our umview machines can use IPN as an
Ethernet Hub and communicate among themselves, with the hosting computer
and with the world through a tap-like interface.

You can also grab an interface (say eth1) and use eth0 for your hosting
computer and eth1 for the IPN network of virtual machines.

If you load the kvde_switch submodule IPN can be a virtual Ethernet switch.

This example is already working using the svn versions of ipn and
vdeplug.

Another Example:

You have a continuous stream of data packets generated by a process,
and you want to send this data to many processes.
Maybe the set of processes is not known in advance, and you want to send the
data to any interested process: some kind of publish&subscribe
communication service (among unix processes, not on TCP-IP).
Without IPN you need a server. With IPN the sender creates the socket,
connects to it and feeds it with data packets. All the interested
receivers connect to it and start reading. That's all.

I hope that this message can give a better understanding of what IPN is.

renzo


Re: New Address Family: Inter Process Networking (IPN)

2007-12-05 Thread Renzo Davoli
On Wed, Dec 05, 2007 at 04:55:52PM -0500, Stephen Hemminger wrote:
 On Wed, 5 Dec 2007 17:40:55 +0100
 [EMAIL PROTECTED] (Renzo Davoli) wrote:
  0- (Constructive) comments.
  1- The official assignment of an Address Family.
  2- Another grabbing hook for interfaces (like the ones already
  We are studying some way to register/deregister grabbing services,
  I feel this would be the cleanest way. 
 
 Post complete source code for kernel part to [EMAIL PROTECTED]
I'll do it as soon as possible.
 If you want the hooks, you need to include the full source code for inclusion
 in mainline. All the Documentation/SubmittingPatches rules apply;
 you can't just ask for facilitators and expect to keep your stuff out of 
 tree.
I am sorry if I was misunderstood.
I did not want any facilitator, nor did I want to keep my code outside
the kernel; on the contrary.
It is perfectly okay for me to provide the entire code for inclusion.
The purposes of my message were the following:
- I wanted to introduce the idea and say to the linux kernel community
  that a team is working on it.
- Address family: is it okay to send a patch that adds a new AF?
Is there an AF registry somewhere? (like the device major/minor
registry or the well-known port assignment for TCP-IP).
- Hook: we have two different options. We can add another grabbing
inline function like those used by the bridge and macvlan or we can
design a grabbing service registration facility. Which one is preferable?
The former is simpler, the latter is more elegant but it requires some
changes in the kernel bridge code.
So the choice is between the former (less invasive, safer, inelegant) and
the latter (more invasive, less safe, elegant).

We need a bit of time to stabilize the code: deeply testing the existing
features and implementing some more ideas we have on it.
In the meanwhile we would be grateful if the community could kindly ask to the
questions above.

renzo


Re: New Address Family: Inter Process Networking (IPN)

2007-12-05 Thread Renzo Davoli
 In the meanwhile we would be grateful if the community could kindly ask to the
 questions above.
Obviously I meant:
In the meanwhile we would be grateful if the community could kindly *answer*
to the questions above

sorry (it is early morning here, it happens ;-)

renzo


Re: New Address Family: Inter Process Networking (IPN)

2007-12-05 Thread Stephen Hemminger
On Thu, 6 Dec 2007 06:38:21 +0100
[EMAIL PROTECTED] (Renzo Davoli) wrote:

 On Wed, Dec 05, 2007 at 04:55:52PM -0500, Stephen Hemminger wrote:
  On Wed, 5 Dec 2007 17:40:55 +0100
  [EMAIL PROTECTED] (Renzo Davoli) wrote:
   0- (Constructive) comments.
   1- The official assignment of an Address Family.
   2- Another grabbing hook for interfaces (like the ones already
   We are studying some way to register/deregister grabbing services,
   I feel this would be the cleanest way. 
  
  Post complete source code for kernel part to [EMAIL PROTECTED]
 I'll do it as soon as possible.
  If you want the hooks, you need to include the full source code for 
  inclusion
  in mainline. All the Documentation/SubmittingPatches rules apply;
  you can't just ask for facilitators and expect to keep your stuff out of 
  tree.
 I am sorry if I was misunderstood.
 I did not want any facilitator, nor I wanted to keep my code outside
 the kernel, on the contrary.

Great

 It is perfectly okay for me to provide the entire code for inclusion.
 The purposes of my message were the following:
 - I wanted to introduce the idea and say to the linux kernel community
   that a team is working on it.
 - Address family: is it okay to send a patch that add a new AF?
 is there a AF registry somewhere? (like the device major/minor
 registry or the well-known port assignment for TCP-IP).

The usual process is to just add the value as part of the patchset.
You then need to tell the glibc maintainers so it gets included appropriately
in userspace.

 - Hook: we have two different options. We can add another grabbing
 inline function like those used by the bridge and macvlan or we can
 design a grabbing service registration facility. Which one is preferrable?

The problems with making it a registration facility are:
 * risk of making it easier for non-GPL out of tree abuse
 * possible ordering issues: ie. by hardcoding each hook, the
behaviour is defined in the case of multiple usages on the same
machine.
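
To make the ordering concern concrete, here is a small user-space sketch (not
kernel code, and not taken from any posted patch) of what a registration table
would have to decide. Every name in it (grab_fn, register_grab_hook,
run_grab_hooks, the priority field, the two sample hooks) is hypothetical. The
point is only that once hooks are registered at run time, the order in which
they see a frame must come from something explicit such as a priority, whereas
hardcoded hooks get their order from the source.

```c
#include <stddef.h>

#define MAX_HOOKS 8

/* A grabbing hook returns 1 if it consumed the frame, 0 to pass it on. */
typedef int (*grab_fn)(const void *frame, size_t len);

struct grab_hook {
    int     priority;   /* lower value runs first */
    grab_fn fn;
};

static struct grab_hook hooks[MAX_HOOKS];
static size_t nhooks;

/* Insert a hook, keeping the table sorted by ascending priority.
 * Hooks with equal priority keep their registration order. */
int register_grab_hook(int priority, grab_fn fn)
{
    if (fn == NULL || nhooks == MAX_HOOKS)
        return -1;

    size_t i = nhooks;
    while (i > 0 && hooks[i - 1].priority > priority) {
        hooks[i] = hooks[i - 1];    /* shift later hooks down one slot */
        i--;
    }
    hooks[i].priority = priority;
    hooks[i].fn = fn;
    nhooks++;
    return 0;
}

/* Offer the frame to each hook in table order.
 * Returns the index of the hook that consumed it, or -1 if none did. */
int run_grab_hooks(const void *frame, size_t len)
{
    for (size_t i = 0; i < nhooks; i++)
        if (hooks[i].fn(frame, len))
            return (int)i;
    return -1;
}

/* Two sample hooks for experimenting with the table. */
int consume_nothing(const void *frame, size_t len)
{
    (void)frame; (void)len;
    return 0;
}

int consume_everything(const void *frame, size_t len)
{
    (void)frame; (void)len;
    return 1;
}
```

Registering the same two hooks with swapped priorities changes which one
grabs the frame, which is the behaviour that hardcoding pins down.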

 The former is simpler, the latter is more elegant but it requires some 
 changes in the kernel bridge code.

Not a big deal, but see above

 So the former choice is between less-invasive,safer,inelegant, the
 latter is more-invasive,less safe,elegant.

 
 We need a bit of time to stabilize the code: deeply testing the existing
 features and implementing some more ideas we have on it.
 In the meanwhile we would be grateful if the community could kindly ask to the
 questions above.

I am a believer in review early and often. It is easier to just deal with
the nuisance issues (style, naming, configuration) at the beginning rather
than the final stage of the project.


Re: New Address Family: Inter Process Networking (IPN)

2007-12-05 Thread Kyle Moffett

On Dec 06, 2007, at 00:30:16, Renzo Davoli wrote:
AF_IPN is different.  AF_IPN is the broadcast and peer-to-peer  
extension of AF_UNIX. It supports communication among *user*  
processes.


Ok, you say it's different, but then you describe how IP unicast and  
broadcast work.  Both are frequently used for communication among  
*user* processes.  Please provide significantly more details about  
exactly *how* it's different.




Example:

Qemu, User-Mode Linux, Kvm, our umview machines can use IPN as an  
Ethernet Hub and communicate among themselves with the hosting  
computer and the world by a tap like interface.


You say "tap like interface", but people do this already with
existing infrastructure.  You can connect Qemu, UML, and KVM to a
standard Linux tap interface, and then use the standard Linux
bridging code to connect the tap interface to your existing network  
interfaces.  Alternatively you could use the standard and well-tested  
IP routing/firewalling/NAT code to move your packets around.  None of  
this requires new network infrastructure in the slightest.  If you  
have problems with the existing code, please improve it instead of  
creating a slightly incompatible replacement which has different bugs  
and workarounds.



You can also grab an interface (say eth1) and use eth0 for your  
hosting computer and eth1 for the IPN network of virtual machines.


You can do that already with the bridging code.


If you load the kvde_switch submodule IPN can be a virtual Ethernet  
switch.


As I described above, this can be done with the existing bridging and  
tun/tap code.




Another Example:

You have a continuous stream of data packets generated by a  
process, and you want to send this data to many processes.  Maybe  
the set of processes is not known in advance, you want to send the  
data to any interested process. Some kind of publish&subscribe
communication service (among unix processes not on TCP-IP). Without  
IPN you need a server. With IPN the sender creates the socket  
connects to it and feed it with data packets. All the interested  
receivers connects to it and start reading. That's all.


This is already done frequently in userspace.  Just register a port  
number with IANA on which to implement a registration server and  
write a little daemon to listen on 127.0.0.1:${YOUR_PORT}.  Your  
interconnecting programs then use either unicast or multicast sockets  
to bind, then report to the registration server what service you are  
offering and what port it's on.  Your receivers then connect to the  
registration server, ask what port a given service is on, and then  
multicast-listen or unicast-connect to access that service.  The best  
part is that all of the performance implications are already  
thoroughly understood.  Furthermore, if you want to extend your  
communication protocol to other hosts as well, you just have to  
replace the 127.0.0.1 bind with a global bind.  This is exactly how  
the standard-specified multiple-participant SIP protocol works, for  
example.



So if you really think this is something that belongs in the kernel  
you need to provide much more detailed descriptions and use-cases for  
why it cannot be implemented in user-space or with small  
modifications to existing UDP/TCP networking.


Cheers,
Kyle Moffett



Re: New Address Family: Inter Process Networking (IPN)

2007-12-05 Thread David Newall

Kyle Moffett wrote:

On Dec 06, 2007, at 00:30:16, Renzo Davoli wrote:
AF_IPN is different.  AF_IPN is the broadcast and peer-to-peer 
extension of AF_UNIX. It supports communication among *user* processes.


Ok, you say it's different, but then you describe how IP unicast and 
broadcast work.


Renzo also described something new (in the socket() arena): the
multi-reader, multi-writer model is just not available in IP.


I wonder if this solves the same problem as d-bus?


So if you really think this is something that belongs in the kernel 
you need to provide much more detailed descriptions and use-cases for 
why it cannot be implemented in user-space or with small modifications 
to existing UDP/TCP networking. 


I would strengthen this sentiment: If you think something belongs in the 
kernel, you need to argue your case (provide much more detailed 
descriptions and use-cases.)
