Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key

2008-01-18 Thread jamal
On Fri, 2008-18-01 at 08:45 +0200, Timo Teräs wrote:

> I'll run my patched kernel 
> and try to get ipsec-tools fixed to use netlink...
> eventually.

If you are going to mod racoon to use netlink then thats without a doubt
the _best_ solution (linux distros dilema i had essentially disappears).

I would suggest as looking at racoon2 since the same KAME toolsmiths are
involved (Soichi Sakane CCed) and you can kill two birds with one stone.
 
In any case Timo - thanks again for your efforts.

cheers,
jamal

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key

2008-01-17 Thread Timo Teräs
Herbert Xu wrote:
> On Thu, Jan 17, 2008 at 03:31:14PM +0200, Timo Teräs wrote:
>> I guess the idea was that application should know about the SAs it
>> created. Though a SA dump needs to be done if you want to check
>> for existing entries (created by other processes, or if you are
>> recovering from a crash).
> 
> That's what SADB_GET is for.  In any case KMs cannot coexist so this
> is pointless.  After a crash you should simply flush all states and
> policies.

I thought KMs could coexist. Actually, in strongSwan you have two
daemons: charon and pluto. Both can run at the same time. The other
is for IKEv1 and the other one for IKEv2. It uses netlink, though.
Looking the way pfkey works, it looks like being designed to work
with multiple KMs (e.g. acquires are sent to all registered sockets).

>> SPD dumping is still a must if you want to work nicely with kernel.
> 
> No it isn't.  Look at how Openswan does it.  No dumping anywhere at all.

Then you have to have all policy/static association configuration in
the application configuration. ipsec-tools wanted separate that. As
this is more robust if someone decided to run multiple KMs.

My point (as an ipsec-tools developer) is that the patch would fix
a lot of problems until ipsec-tools is netlink enabled. It would
handle perfectly the dumping even as non-atomic. And that ipsec-tools
is the KM you get by default in almost all distributions.

But apparently you are too worried that all the so many other KMs
still using pfkey (is there others? I think most others use netlink
already) in Linux might break because the rely on an unspecified
feature of pfkey.

I'm also tired of arguing since this is going nowhere. I'll run my
patched kernel and try to get ipsec-tools fixed to use netlink...
eventually.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key

2008-01-17 Thread Herbert Xu
On Thu, Jan 17, 2008 at 03:31:14PM +0200, Timo Teräs wrote:
> 
> I guess the idea was that application should know about the SAs it
> created. Though a SA dump needs to be done if you want to check
> for existing entries (created by other processes, or if you are
> recovering from a crash).

That's what SADB_GET is for.  In any case KMs cannot coexist so this
is pointless.  After a crash you should simply flush all states and
policies.

> SPD dumping is still a must if you want to work nicely with kernel.

No it isn't.  Look at how Openswan does it.  No dumping anywhere at all.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key

2008-01-17 Thread Timo Teräs
Herbert Xu wrote:
> On Thu, Jan 17, 2008 at 07:42:30AM -0500, jamal wrote:
>> Looking at the pfkey RFC one more time, heres a funny quote:
>> "
>> The dump message is used for debugging
>> purposes only and is not intended for production use.
>> "
> 
> In fact it goes much further:
> 
>Support for the dump message MAY be discontinued in future versions
>of PF_KEY.  Key management applications MUST NOT depend on this
>message for basic operation.

I guess the idea was that application should know about the SAs it
created. Though a SA dump needs to be done if you want to check
for existing entries (created by other processes, or if you are
recovering from a crash).

SPD dumping is still a must if you want to work nicely with kernel.

As noted earlier pfkey is not really standardized. E.g. the SPD
dumping message are not in the RFC as David noted. The above RFC
comments and the fact that SPD stuff is unspecified made me think
that making non-atomic dumps would be a lot better alternative then
leaving the socket to bad state which would make the application
completely unusable.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key

2008-01-17 Thread jamal
On Thu, 2008-17-01 at 23:50 +1100, Herbert Xu wrote:

> In fact it goes much further:
> 
>Support for the dump message MAY be discontinued in future versions
>of PF_KEY.  Key management applications MUST NOT depend on this
>message for basic operation.

No doubt PF_KEY being an RFC has caused a lot of damage. 
Once something is deployed, unfortunately, it grows a foot and sometimes
a head[1]. 
Note: it's a big dilema in my mind as well and i agree in principle with
both Dave and you (we really should not be helping pfkey grow another
ear on the forehead); the only way i am convincing myself otherwise is
to note that Racoon/ipsec-tools is out there, shipped as default ipsec
user management tools by most if not all linux distros. If we really
want to stop the beast lets cut out the umbilicall code - just take out
pfkey altogether from Linux ;->


cheers,
jamal

[1] Whatever fix/approach Timo has will eventually show up in the BSDs
and solaris for example 

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key

2008-01-17 Thread Herbert Xu
On Thu, Jan 17, 2008 at 07:42:30AM -0500, jamal wrote:
> 
> Looking at the pfkey RFC one more time, heres a funny quote:
> "
> The dump message is used for debugging
> purposes only and is not intended for production use.
> "

In fact it goes much further:

   Support for the dump message MAY be discontinued in future versions
   of PF_KEY.  Key management applications MUST NOT depend on this
   message for basic operation.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key

2008-01-17 Thread jamal
On Thu, 2008-17-01 at 07:54 +0200, Timo Teräs wrote:

> You listen for the events. It is guaranteed that if the dumping code
> does return the entry to be deleted, the deletion notification will
> occur after that dump entry.

Ok, sounds reasonable - as long as there is a known order for
occurances, then there will be no ambiguity.
I am assuming that the same ordering will happen with
updates/modifications?
To go back to what i suggested earlier - is it possible to have this in
two stages? First pfkey with expected behavior being the same as current
netlink; then later the optimizations you are talking about.

Looking at the pfkey RFC one more time, heres a funny quote:
"
The dump message is used for debugging
purposes only and is not intended for production use.
"

One thing Dave mentioned thats extremely important is to ensure no ABI 
breakage. 
Think of racoon 0.6 which knows nothing of this; it should continue to work.

Dave: One reason i paid attention to this is because it was on your TODO
list from netconf 2005 ;-> It has just been sitting in the background
memory cells since.

cheers,
jamal

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key

2008-01-17 Thread Timo Teräs
David Miller wrote:
> From: Timo_Teräs <[EMAIL PROTECTED]>
> Date: Thu, 17 Jan 2008 13:00:09 +0200
> 
>> IMHO, it's a lot better then losing >50% of entries and the end
>> of sequence message on big dumps. SPD and SADB are not that
>> volatile; in most of the cases the dump would be as good as an
>> atomic one.
> 
> I humbly disagree with you.  Interface behavior stability
> is more important.

Small SPDs/SADBs would still be dumped atomically. The patch
affects only the cases when the receive queue is getting full.

>> I'm not sure if there's other major applications that we should
>> be concerned about, but at least ipsec-tools racoon does not
>> expect to get atomic dumps (which btw, comes originally from BSD).
> 
> Racoon was written as an addon to the BSD stack by an IPV6/IPSEC
> project in Japan named KAME, it did not "come from BSD".  It was
> added to BSD.
> 
> There are also other BSD based IPSEC daemons such as the one written
> by the OpenBSD folks.

Yes. I meant that it was originally written to be used in BSD. The
Linux port came later. Sorry for the ambiguous wording.

> I don't think this is arguable at all.  We're not changing semantics
> over what we've done for 4+ years and applications might depend upon.
> It's for a deprecated interface, which makes any semantic changes that
> much less inviting.
> 
> You can argue all you want, but it will not change the invariants in
> the previous paragraph.

True. If no one else agrees with me, I'll drop it. I can always run
my own patched kernel.

I'd appreciate feedback on the xfrm changes. I'll try to make that
part usable patch against net-2.6.25 git tree next week.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key

2008-01-17 Thread jamal
On Thu, 2008-17-01 at 22:11 +1100, Herbert Xu wrote:

> Sure racoon uses pfkey but the question is does it use pfkey dumping?
> 

it does when trying to purge phase 2 SAs...

cheers,
jamal

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key

2008-01-17 Thread Timo Teräs
Herbert Xu wrote:
> On Thu, Jan 17, 2008 at 07:54:32AM +0200, Timo Teräs wrote:
>>> Racoon doesn't use pfkey dumping as far as I know.
>> ipsec-tools racoon uses pfkey and only pfkey. And it's non trivial to
>> make it use netlink; it relies heavily all around the code to pfkey
>> structs. It also runs on BSD so we cannot rip pfkey away; adding a
>> layer to work with both pfkey and netlink would be doable, but just a
>> lot of work.
> 
> Sure racoon uses pfkey but the question is does it use pfkey dumping?

Yes it does.

It does SPD dump at startup and keeps the SP database in sync by
listening to notifications.

It also does SA dumps when it is figuring out which SAs to purge.

I started to work on the xfrm/pfkey patch only because racoon is
having problems with (as is anything else using pfkey).

- Timo

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key

2008-01-17 Thread Herbert Xu
On Thu, Jan 17, 2008 at 07:54:32AM +0200, Timo Teräs wrote:
>
> > Racoon doesn't use pfkey dumping as far as I know.
> 
> ipsec-tools racoon uses pfkey and only pfkey. And it's non trivial to
> make it use netlink; it relies heavily all around the code to pfkey
> structs. It also runs on BSD so we cannot rip pfkey away; adding a
> layer to work with both pfkey and netlink would be doable, but just a
> lot of work.

Sure racoon uses pfkey but the question is does it use pfkey dumping?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key

2008-01-17 Thread David Miller
From: Timo_Teräs <[EMAIL PROTECTED]>
Date: Thu, 17 Jan 2008 13:00:09 +0200

> IMHO, it's a lot better then losing >50% of entries and the end
> of sequence message on big dumps. SPD and SADB are not that
> volatile; in most of the cases the dump would be as good as an
> atomic one.

I humbly disagree with you.  Interface behavior stability
is more important.

> I'm not sure if there's other major applications that we should
> be concerned about, but at least ipsec-tools racoon does not
> expect to get atomic dumps (which btw, comes originally from BSD).

Racoon was written as an addon to the BSD stack by an IPV6/IPSEC
project in Japan named KAME, it did not "come from BSD".  It was
added to BSD.

There are also other BSD based IPSEC daemons such as the one written
by the OpenBSD folks.

I don't think this is arguable at all.  We're not changing semantics
over what we've done for 4+ years and applications might depend upon.
It's for a deprecated interface, which makes any semantic changes that
much less inviting.

You can argue all you want, but it will not change the invariants in
the previous paragraph.

All of the time you've spent arguing is time not spent on adding
netlink support to the daemons that do not do so already.  And that
would be 2 steps forwards compared to the 1 step backwards your
desired change would be.

I've stated my position as well as I can at this point so
respectfully, since I have tons of other things to do, I'm stepping
out of this specific discussion for now.

Thank you.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key

2008-01-17 Thread Timo Teräs
David Miller wrote:
> From: Timo_Teräs <[EMAIL PROTECTED]>
> Date: Thu, 17 Jan 2008 12:01:17 +0200
> 
>> David Miller wrote:
>>> This is an inherent aspect of AF_KEY (and what it was
>>> derived from, BSD routing sockets).
>> Yes, this is the way BSD does it.
>>  
>>> It has to provide dumps atomically, and if there is no
>>> space there is no way to provide those entries which
>>> would require more rcvbuf space.
>> RFC does not say it has to be atomic.
> 
> Every application out there in the universe expects BSD socket
> semantics, and therefore atomic dumps.  You cannot "fix" things
> without breaking applications.

IMHO, it's a lot better then losing >50% of entries and the end
of sequence message on big dumps. SPD and SADB are not that
volatile; in most of the cases the dump would be as good as an
atomic one.

Even if it did change during ongoing dump you still get an usable
dump. All the entries reflect real data and there is no dependency
between different entries.

I'm not sure if there's other major applications that we should
be concerned about, but at least ipsec-tools racoon does not
expect to get atomic dumps (which btw, comes originally from BSD).

Cheers,
  Timo

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key

2008-01-17 Thread David Miller
From: Timo_Teräs <[EMAIL PROTECTED]>
Date: Thu, 17 Jan 2008 12:01:17 +0200

> David Miller wrote:
> > This is an inherent aspect of AF_KEY (and what it was
> > derived from, BSD routing sockets).
> 
> Yes, this is the way BSD does it.
>  
> > It has to provide dumps atomically, and if there is no
> > space there is no way to provide those entries which
> > would require more rcvbuf space.
> 
> RFC does not say it has to be atomic.

Every application out there in the universe expects BSD socket
semantics, and therefore atomic dumps.  You cannot "fix" things
without breaking applications.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key

2008-01-17 Thread Timo Teräs
David Miller wrote:
> From: Timo_Teräs <[EMAIL PROTECTED]>
> Date: Thu, 17 Jan 2008 11:38:13 +0200
> 
>> The af_key issue is that in big dumps you get only first X
>> entries. The rest of the entries are dropped because the
>> socket receive buffer goes full. You get data corruption:
>> missing entries.
> 
> This is an inherent aspect of AF_KEY (and what it was
> derived from, BSD routing sockets).

Yes, this is the way BSD does it.
 
> It has to provide dumps atomically, and if there is no
> space there is no way to provide those entries which
> would require more rcvbuf space.

RFC does not say it has to be atomic.

It does say that the dump is terminated with SADB_DUMP
message having sadb_seq field set to zero. Currently
that is dropped too when the problem occurs. Thus the
socket is left in a bad state: dump ends never. This
can cause applications without any workarounds to hang.

- Timo
 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key

2008-01-17 Thread David Miller
From: Timo_Teräs <[EMAIL PROTECTED]>
Date: Thu, 17 Jan 2008 11:38:13 +0200

> The af_key issue is that in big dumps you get only first X
> entries. The rest of the entries are dropped because the
> socket receive buffer goes full. You get data corruption:
> missing entries.

This is an inherent aspect of AF_KEY (and what it was
derived from, BSD routing sockets).

It has to provide dumps atomically, and if there is no
space there is no way to provide those entries which
would require more rcvbuf space.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key

2008-01-17 Thread Timo Teräs
David Miller wrote:
> From: Timo_Teräs <[EMAIL PROTECTED]>
> Date: Thu, 17 Jan 2008 11:20:42 +0200
> 
>> Where as the pfkey bug fix is non-intrusive and helps all
>> legacy applications still using af_key by _fixing a bug in
>> kernel_.
> 
> It's not a bug.  You're fixing a speed issue, not a crash
> or a case where AF_KEY is providing incorrect data.
> 
> That is what I mean when I mean "life support", we fix crashes and
> data corruption.  We don't make performance tweaks.

No. The speed issue is complitely handled in xfrm_state
and xfrm_user changes.

The af_key issue is that in big dumps you get only first X
entries. The rest of the entries are dropped because the
socket receive buffer goes full. You get data corruption:
missing entries.

- Timo

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key

2008-01-17 Thread David Miller
From: Timo_Teräs <[EMAIL PROTECTED]>
Date: Thu, 17 Jan 2008 11:20:42 +0200

> Where as the pfkey bug fix is non-intrusive and helps all
> legacy applications still using af_key by _fixing a bug in
> kernel_.

It's not a bug.  You're fixing a speed issue, not a crash
or a case where AF_KEY is providing incorrect data.

That is what I mean when I mean "life support", we fix crashes and
data corruption.  We don't make performance tweaks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key

2008-01-17 Thread Timo Teräs
David Miller wrote:
> From: Timo_Teräs <[EMAIL PROTECTED]>
> Date: Thu, 17 Jan 2008 10:11:17 +0200
> 
>> I thought my patch would qualify as "life support" bug fix.
>> Currently racoon fails to work if there are too many SPDs or SAs
>> because the kernel cannot handle the dump request properly. And this
>> is what my patch fixes for pfkey. It adds no new features or
>> functionality; just makes the dumping work with large databases.
> 
> Racoon should use netlink for reasons far and beyond the
> problem you are trying to address.

Yes. But this is fairly major thing to do. One needs to create
API abstraction layer (still need to use pfkey in *BSD). Test it.
A lot of work that is not going to happen very soon.

Where as the pfkey bug fix is non-intrusive and helps all
legacy applications still using af_key by _fixing a bug in
kernel_.

> The dumping behavior of AF_KEY is just horrific, as one of
> several examples.

If af_key is all that bad and does not qualify to get maintanace
bug fixes, why not remove it complitely?

That would make userland adapt faster.

>> Then there's also the xfrm dumping changes which change the
>> algorithm from O(n^2) to O(n) with some memory overhead, but
>> that is a different story. Any comments on that?
> 
> I have no general objections to those changes although I am
> backlogged and thus have not studied them in detail.  Jamal
> is having what appears to be a healthy dialogue with you about
> the details so I'm not concerned much :)

Ok. I hope someone can also give feedback on the naming
conventions. And about the api changes to xfrm policy/state
walking.

- Timo

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key

2008-01-17 Thread David Miller
From: Timo_Teräs <[EMAIL PROTECTED]>
Date: Thu, 17 Jan 2008 10:11:17 +0200

> I thought my patch would qualify as "life support" bug fix.
> Currently racoon fails to work if there are too many SPDs or SAs
> because the kernel cannot handle the dump request properly. And this
> is what my patch fixes for pfkey. It adds no new features or
> functionality; just makes the dumping work with large databases.

Racoon should use netlink for reasons far and beyond the
problem you are trying to address.

The dumping behavior of AF_KEY is just horrific, as one of
several examples.

> Then there's also the xfrm dumping changes which change the
> algorithm from O(n^2) to O(n) with some memory overhead, but
> that is a different story. Any comments on that?

I have no general objections to those changes although I am
backlogged and thus have not studied them in detail.  Jamal
is having what appears to be a healthy dialogue with you about
the details so I'm not concerned much :)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key

2008-01-17 Thread Timo Teräs
David Miller wrote:
> Doing anything other than "life support" bug fixes for AF_KEY is
> inappropriate.

Yes. I thought my patch would qualify as "life support" bug fix.
Currently racoon fails to work if there are too many SPDs or SAs
because the kernel cannot handle the dump request properly. And
this is what my patch fixes for pfkey. It adds no new features or
functionality; just makes the dumping work with large databases.

Then there's also the xfrm dumping changes which change the
algorithm from O(n^2) to O(n) with some memory overhead, but
that is a different story. Any comments on that?

- Timo
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key

2008-01-16 Thread David Miller
From: Timo_Teräs <[EMAIL PROTECTED]>
Date: Thu, 17 Jan 2008 09:38:15 +0200

> David Miller wrote:
> > From: Timo_Teräs <[EMAIL PROTECTED]>
> > Date: Thu, 17 Jan 2008 08:27:14 +0200
> > 
> >> I don't know about netlink. But pfkey works in *BSD too and it is RFC'd.
> >> So I'd say pfkey might be a bit more portable. Though netlink is definitely
> >> more robust and extensive.
> > 
> > The RFCs say absolutely nothing about policy interfaces for AF_KEY,
> > everybody rolls their own in slightly incompatible ways.
> > 
> > It is therefore anything but standardized.
> 
> Yes, there's non-standardized extensions.

You can't implement a keying daemon without policy support, and policy
support is where the "non-standardized extensions" live.

Doing anything other than "life support" bug fixes for AF_KEY is
inappropriate.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key

2008-01-16 Thread Timo Teräs
David Miller wrote:
> From: Timo_Teräs <[EMAIL PROTECTED]>
> Date: Thu, 17 Jan 2008 08:27:14 +0200
> 
>> I don't know about netlink. But pfkey works in *BSD too and it is RFC'd.
>> So I'd say pfkey might be a bit more portable. Though netlink is definitely
>> more robust and extensive.
> 
> The RFCs say absolutely nothing about policy interfaces for AF_KEY,
> everybody rolls their own in slightly incompatible ways.
> 
> It is therefore anything but standardized.

Yes, there's non-standardized extensions. But the point was that there are
other implementations of pfkey. And ipsec-tools racoon is an example of
a widely used application that runs in Linux and *BSD using this API. So
for the time being I'd consider having pfkey fixes as a good thing. This
pfkey dumping problem seems to be affecting many users.

- Timo
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key

2008-01-16 Thread David Miller
From: Timo_Teräs <[EMAIL PROTECTED]>
Date: Thu, 17 Jan 2008 08:27:14 +0200

> I don't know about netlink. But pfkey works in *BSD too and it is RFC'd.
> So I'd say pfkey might be a bit more portable. Though netlink is definitely
> more robust and extensive.

The RFCs say absolutely nothing about policy interfaces for AF_KEY,
everybody rolls their own in slightly incompatible ways.

It is therefore anything but standardized.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key

2008-01-16 Thread Timo Teräs
Herbert Xu wrote:
> jamal <[EMAIL PROTECTED]> wrote:
>> There are two issues that are inter-mingled in there. The most important
>> is pf_key not being robust on dump. The other being the accurracy of
> 
> IMHO doing significant work on af_key is a waste of time.  It has no
> advantages at all over xfrm_user since neither is portable.  So we
> should discourage people from using af_key wherever possible.

I don't know about netlink. But pfkey works in *BSD too and it is RFC'd.
So I'd say pfkey might be a bit more portable. Though netlink is definitely
more robust and extensive.

Cheers,
  Timo


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key

2008-01-16 Thread Timo Teräs
jamal wrote:
> On Wed, 2008-16-01 at 16:28 +0200, Timo Teräs wrote:
>> > No. I'm not creating second copies of the SADB/SPD entries. The entries
>> > are just added to one more list.
> 
> Ah, sorry - yes, that sounds reasonable.
> So what happens if i delete an entry; does it get removed from the list?
> Also what happens on modification?

If the entry is removed befored it is dumped, it wont be dumped at all.
The state during dump code execution is returned. Depending when the
modification occurs it might or might not be reflected in the dumped
entry.

>> > If more entries are added, you can get notifications of them.
> 
> how would a user app (example racoon) appropriately deal with it?
> Example an entry sits in the dump-list, it gets deleted - an event gets
> generated user-space and later that entry shows up in user space dump.

You listen for the events. It is guaranteed that if the dumping code
does return the entry to be deleted, the deletion notification will
occur after that dump entry.

Herbert Xu wrote:
> On Wed, Jan 16, 2008 at 08:39:40PM -0500, jamal wrote:
>> I wouldnt disagree except some apps like racoon which depend on pfkey
>> are unfortunately beyond repair. Timo has a pretty good handle on the
> 
> Racoon doesn't use pfkey dumping as far as I know.

ipsec-tools racoon uses pfkey and only pfkey. And it's non trivial to
make it use netlink; it relies heavily all around the code to pfkey
structs. It also runs on BSD so we cannot rip pfkey away; adding a
layer to work with both pfkey and netlink would be doable, but just a
lot of work.

Also ipsec-tools racoon seems to be the default IKE daemon in some
popular distros. So for the time being I think pfkey is an evil we have
to live with.

Cheers,
  Timo
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key

2008-01-16 Thread Herbert Xu
On Wed, Jan 16, 2008 at 08:39:40PM -0500, jamal wrote:
>
> I wouldnt disagree except some apps like racoon which depend on pfkey
> are unfortunately beyond repair. Timo has a pretty good handle on the

Racoon doesn't use pfkey dumping as far as I know.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key

2008-01-16 Thread jamal
On Thu, 2008-17-01 at 09:58 +1100, Herbert Xu wrote:

> IMHO doing significant work on af_key is a waste of time.  It has no
> advantages at all over xfrm_user since neither is portable.  So we
> should discourage people from using af_key wherever possible.

I wouldnt disagree except some apps like racoon which depend on pfkey
are unfortunately beyond repair. Timo has a pretty good handle on the
issue from what i have seen so far, so it pretty much seems to be there
without significant changes (if the two issues are separated).

cheers,
jamal



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key

2008-01-16 Thread jamal
On Wed, 2008-16-01 at 16:28 +0200, Timo Teräs wrote:
[..]
> Creating a separate af_key patch would not be a big problem. I was
> just hoping avoiding it as the xfrm_state / xfrm_policy changes
> modify the API and requires changing af_key also.

The way dumping is done by xfrm_user is consistent across all netlink
not just ipsec. Thats why i said it had broader implications. 
OTOH, theres a clear issue with pf_key.

> No. I'm not creating second copies of the SADB/SPD entries. The entries
> are just added to one more list.

Ah, sorry - yes, that sounds reasonable.
So what happens if i delete an entry; does it get removed from the list?
Also what happens on modification?

> If more entries are added, you can get notifications of them.

how would a user app (example racoon) appropriately deal with it?
Example an entry sits in the dump-list, it gets deleted - an event gets
generated user-space and later that entry shows up in user space dump.

cheers,
jamal

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key

2008-01-16 Thread Herbert Xu
jamal <[EMAIL PROTECTED]> wrote:
>
> There are two issues that are inter-mingled in there. The most important
> is pf_key not being robust on dump. The other being the accurracy of

IMHO doing significant work on af_key is a waste of time.  It has no
advantages at all over xfrm_user since neither is portable.  So we
should discourage people from using af_key wherever possible.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key

2008-01-16 Thread Timo Teräs
jamal wrote:
> On Sun, 2008-13-01 at 14:26 +0200, Timo Teräs wrote:
>> I tried to address all these problems, and added the xfrm_policy
>> and xfrm_state structure to a new linked list that is used solely
>> for dumping. This way the dumps can be done in O(n) and have an
>> iterator point to them. I also modified to af_key to stop dumping
>> when receive queue is filling up and continue it from a callback at
>> recvfrom() when there's more room.
>>
>> But I'd like to hear about:
>> - if the approach is ok (adding the entries in one more list)?
>> - if the new/modified variables, structures and function names are ok?
>> - if I should port these against net-2.6 or net-2.6.25 git tree?
>> - if I should split this to more than one commit? (maybe xfrm core,
>> xfrm user and af_key parts?)
> 
> To answer your last 2 questions:
> There are two issues that are inter-mingled in there. The most
> important is pf_key not being robust on dump. The other being the
> accurracy of netlink dumping - this has much broader implications. I
> think you need to separate the patches into those two at least and
> prioritize on the pf_key as a patch that you push in. I would also
> try to make the pf_key dumping approach to mirror what netlink does
> today - in the worst case it would be as innacurate as netlink; which
> is not bad. I think youll find no resistance getting such a patch in.

Creating a separate af_key patch would not be a big problem. I was
just hoping avoiding it as the xfrm_state / xfrm_policy changes
modify the API and requires changing af_key also.

> To answer your first 2 questions:
> 
> My main dilema with your approach is the policy of maintaining such a
>  list in the kernel and memory consumption needed vs the innacuracy 
> of netlink dumps.With your approach of maintaining extra SA/P D, i
> would need double the RAM amount.

No. I'm not creating second copies of the SADB/SPD entries. The entries
are just added to one more list. Memory requirements go up on per entry
(figures based on 2.6.22.9 kernel):
sizeof(struct xfrm_policy) 636 -> 644 bytes
sizeof(struct xfrm_state)  452 -> 460 bytes

For 400K entries it would take about 3 megs of more memory. Where the
total database size is around 240-250 megs.

> Netlink dumping gives up accuracy[1], uses 50% of the memory you are 
> proposing but abuses more cpu cycles. User space could maintain the
> list you are proposing instead - and compensate for the innaccuracies
> by watching events[2] Of course that shifts the accuracy to events
> which could be lost on their way to user space. This issue is
> alleviated a little more with your approach of keeping the list in
> the kernel and adding new updates to the tail of the dump list
> (which, btw, your patch didnt seem to do); however, even if you solve
> that: ** you are still will be faced with challenges of not being
> atomic on updates; example an entry already on your dump list could
> be deleted before being read by user space. I cant see you solving
> that without abusing a _lot_ more cpu (trying to find the entry on
> the dump list that may have gotten updated or deleted). Theres also
> the issue of how long to keep the dump list before aging it out and
> how to deal with multiple users.

If more entries are added, you can get notifications of them.
 
Cheers,
  Timo

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH] Fixing SA/SP dumps on netlink/af_key

2008-01-16 Thread jamal

Timo,

Thanks for the effort you are putting on this.
I would just speak to the concepts instead of the code - hopefully that
would simulate a discussion (and shorten my email)

On Sun, 2008-13-01 at 14:26 +0200, Timo Teräs wrote:
> Hi all,
> 
> The problem with IPsec SP/SA dumping is that they don't work well with large 
> SPD/SADB:s. 
> In netlink the dumping code is O(n^2) and can miss some entries/dump 
> duplicates if the DB 
> is changed between the recv() calls to read the dump entries. This is due to 
> the entry 
> index counting done in xfrm_user code. With af_key the dump entries are just 
> dropped 
> when the socket receive queue is filled.

nod to the above

> I tried to address all these problems, and added the xfrm_policy and 
> xfrm_state structure 
> to a new linked list that is used solely for dumping. This way the dumps can 
> be done in O(n) 
> and have an iterator point to them. I also modified to af_key to stop dumping 
> when receive 
> queue is filling up and continue it from a callback at recvfrom() when 
> there's more room.
> 
> This patch is against 2.6.23 vanilla. I wanted to get some feedback before I 
> make it against 
> the git tree.
> 
> But I'd like to hear about:
> - if the approach is ok (adding the entries in one more list)?
> - if the new/modified variables, structures and function names are ok?
> - if I should port these against net-2.6 or net-2.6.25 git tree?
> - if I should split this to more than one commit? (maybe xfrm core, xfrm user 
> and af_key parts?)
> 

To answer your last 2 questions:
There are two issues that are inter-mingled in there. The most important
is pf_key not being robust on dump. The other being the accurracy of
netlink dumping - this has much broader implications. I think you need
to separate the patches into those two at least and prioritize on the
pf_key as a patch that you push in. I would also try to make the pf_key
dumping approach to mirror what netlink does today - in the worst case
it would be as innacurate as netlink; which is not bad. I think youll
find no resistance getting such a patch in.

To answer your first 2 questions:

My main dilema with your approach is the policy of maintaining such a
list in the kernel and memory consumption needed vs the innacuracy of
netlink dumps. I have run a system with >400K SPD entries and >100K SAD
entries on a running system and i needed a few Gig of RAM to just make
sure my other apps dont get hit by oom at some point or other.  With
your approach of maintaining extra SA/P D, i would need double the RAM
amount. RAM is relatively cheap these days, but latency involved is not
(and one could argue that CPU cycles are much much cheaper).

Netlink dumping gives up accuracy[1], uses 50% of the memory you are
proposing but abuses more cpu cycles. User space could maintain the list
you are proposing instead - and compensate for the innaccuracies by
watching events[2]
Of course that shifts the accuracy to events which could be lost on
their way to user space. This issue is alleviated a little more with
your approach of keeping the list in the kernel and adding new updates
to the tail of the dump list (which, btw, your patch didnt seem to do);
however, even if you solve that: 
** you are still will be faced with challenges of not being atomic on
updates; example an entry already on your dump list could be deleted
before being read by user space. I cant see you solving that without
abusing a _lot_ more cpu (trying to find the entry on the dump list that
may have gotten updated or deleted). Theres also the issue of how long
to keep the dump list before aging it out and how to deal with multiple
users.

Ok, lets say you throw in the computation in there to walk the dump list
and find the proper entry and find out it only consumes 62% cpu of what
current netlink dump approach does, you are still using twice the RAM
needs. And hence it becomes a policy choice and what would be needed is 
some knob for me to select "i dont care about abusing more memory; i
want more accuracy dammit". I can tell you based on my current
experience i would rather use less RAM - but thats a matter of choice at
the moment because i havent come across scenarios of the conflicts where
user space wasnt able to resolve. 

I apologize for the long email, it would have been longer if i commented
on the code.

I hope you find my comments useful and dont see them as rocks being
thrown at you - rather look at them as worth at least 2-cents Canadian
and will stimulate other people speak out.

cheers,
jamal

[1] In cases where a dump happens to not complete while SAD/SPD are
getting concurently updated, it is feasible that a dump list will have
innacurate details.

[2] Racoon for example doesnt do this, hence it needs to dump from the
kernel when purging SAs.  

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[RFC][PATCH] Fixing SA/SP dumps on netlink/af_key

2008-01-13 Thread Timo Teräs
Hi all,

The problem with IPsec SP/SA dumping is that they don't work well with large 
SPD/SADB:s. In netlink the dumping code is O(n^2) and can miss some 
entries/dump duplicates if the DB is changed between the recv() calls to read 
the dump entries. This is due to the entry index counting done in xfrm_user 
code. With af_key the dump entries are just dropped when the socket receive 
queue is filled.

I tried to address all these problems, and added the xfrm_policy and xfrm_state 
structure to a new linked list that is used solely for dumping. This way the 
dumps can be done in O(n) and have an iterator point to them. I also modified 
to af_key to stop dumping when receive queue is filling up and continue it from 
a callback at recvfrom() when there's more room.

This patch is against 2.6.23 vanilla. I wanted to get some feedback before I 
make it against the git tree.

But I'd like to hear about:
- if the approach is ok (adding the entries in one more list)?
- if the new/modified variables, structures and function names are ok?
- if I should port these against net-2.6 or net-2.6.25 git tree?
- if I should split this to more than one commit? (maybe xfrm core, xfrm user 
and af_key parts?)

Cheers,
  Timo

Index: linux-2.6.23/include/net/xfrm.h
===
--- linux-2.6.23.orig/include/net/xfrm.h2008-01-13 14:13:21.0 
+0200
+++ linux-2.6.23/include/net/xfrm.h 2008-01-13 14:13:34.0 +0200
@@ -104,6 +104,7 @@
 struct xfrm_state
 {
/* Note: bydst is re-used during gc */
+   struct list_headall;
struct hlist_node   bydst;
struct hlist_node   bysrc;
struct hlist_node   byspi;
@@ -346,6 +347,7 @@
 struct xfrm_policy
 {
struct xfrm_policy  *next;
+   struct list_headbytype;
struct hlist_node   bydst;
struct hlist_node   byidx;
 
@@ -912,6 +914,17 @@
int priority;
 };
 
+struct xfrm_state_walker {
+   struct xfrm_state *state;
+   int count;
+};
+
+struct xfrm_policy_walker {
+   struct xfrm_policy *policy;
+   int count;
+   int type;
+};
+
 extern void xfrm_init(void);
 extern void xfrm4_init(void);
 extern void xfrm6_init(void);
@@ -921,7 +934,15 @@
 extern void xfrm6_state_init(void);
 extern void xfrm6_state_fini(void);
 
-extern int xfrm_state_walk(u8 proto, int (*func)(struct xfrm_state *, int, 
void*), void *);
+static inline void xfrm_state_walk_start(struct xfrm_state_walker *walk)
+{
+   walk->state = NULL;
+   walk->count = 0;
+}
+
+extern int xfrm_state_walk(struct xfrm_state_walker *walk, u8 proto,
+  int (*func)(struct xfrm_state *, int, void*), void 
*);
+extern void xfrm_state_walk_end(struct xfrm_state_walker *walk);
 extern struct xfrm_state *xfrm_state_alloc(void);
 extern struct xfrm_state *xfrm_state_find(xfrm_address_t *daddr, 
xfrm_address_t *saddr, 
  struct flowi *fl, struct xfrm_tmpl 
*tmpl,
@@ -1025,7 +1046,17 @@
 #endif
 
 struct xfrm_policy *xfrm_policy_alloc(gfp_t gfp);
-extern int xfrm_policy_walk(u8 type, int (*func)(struct xfrm_policy *, int, 
int, void*), void *);
+
+static inline void xfrm_policy_walk_start(struct xfrm_policy_walker *walk)
+{
+   walk->policy = NULL;
+   walk->count = 0;
+   walk->type = XFRM_POLICY_TYPE_MAIN;
+}
+
+extern int xfrm_policy_walk(struct xfrm_policy_walker *walk, u8 type,
+   int (*func)(struct xfrm_policy *, int, int, void*), void *);
+extern void xfrm_policy_walk_end(struct xfrm_policy_walker *walk);
 int xfrm_policy_insert(int dir, struct xfrm_policy *policy, int excl);
 struct xfrm_policy *xfrm_policy_bysel_ctx(u8 type, int dir,
  struct xfrm_selector *sel,
Index: linux-2.6.23/net/key/af_key.c
===
--- linux-2.6.23.orig/net/key/af_key.c  2008-01-13 14:13:21.0 +0200
+++ linux-2.6.23/net/key/af_key.c   2008-01-13 14:13:34.0 +0200
@@ -48,6 +48,16 @@
struct sock sk;
int registered;
int promisc;
+
+   u8  dump_type;
+   uint8_t dump_msg_version;
+   uint32_tdump_msg_pid;
+   int (*dump)(struct pfkey_sock *sk);
+   void(*dump_done)(struct pfkey_sock *sk);
+   union {
+   struct xfrm_policy_walker   policy;
+   struct xfrm_state_walkerstate;
+   } u;
 };
 
 static inline struct pfkey_sock *pfkey_sk(struct sock *sk)
@@ -55,6 +65,30 @@
return (struct pfkey_sock *)sk;
 }
 
+static int pfkey_can_dump(struct sock *sk)
+{
+   if (3 * atomic_read(&sk->sk_rmem_alloc) <= 2 * sk->sk_rcvbuf)
+   return 1;
+   return 0;
+}
+
+static int pfkey_do_dump(struct pfkey_sock *pfk)
+{
+   int rc;
+
+   rc = pfk->dump(pfk);
+   if (rc != 0) {
+