Re: Business Proposal

2018-11-01 Thread Edward Yuan


Dear Friend, 

  My name is Mr. Edward Yuan, a consultant/broker. I know you might be a bit 
apprehensive because you do not know me. Nevertheless, I have a proposal on 
behalf of a client, a lucrative business that might be of mutual benefit to you.

If interested in this proposition please kindly and urgently contact me for 
more details. 

Best Regards.
Mr. Edward Yuan.




Re: Security enhancement proposal for kernel TLS

2018-08-03 Thread Dave Watson
On 08/02/18 05:23 PM, Vakul Garg wrote:
> > I agree that Boris' patch does what you say it does - it sets keys 
> > immediately
> > after CCS instead of after FINISHED message.  I disagree that the kernel tls
> > implementation currently requires that specific ordering, nor do I think 
> > that it
> > should require that ordering.
> 
> The current kernel implementation assumes record sequence number to start 
> from '0'.
> If keys have to be set after FINISHED message, then record sequence number 
> need to
> be communicated from user space TLS stack to kernel. IIRC, sequence number is 
> not 
> part of the interface through which key is transferred.

The setsockopt call struct takes the key, iv, salt, and seqno:

struct tls12_crypto_info_aes_gcm_128 {
	struct tls_crypto_info info;
	unsigned char iv[TLS_CIPHER_AES_GCM_128_IV_SIZE];
	unsigned char key[TLS_CIPHER_AES_GCM_128_KEY_SIZE];
	unsigned char salt[TLS_CIPHER_AES_GCM_128_SALT_SIZE];
	unsigned char rec_seq[TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE];
};
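
For illustration, here is a minimal sketch of how a user-space stack might fill the rec_seq field with a non-zero starting value. It assumes the 8-byte field carries the TLS record sequence number in network byte order. The helper name kt_encode_rec_seq and the local size constant are hypothetical and not part of any kernel header; real code would include <linux/tls.h>, fill the struct above, and pass it to setsockopt() at the SOL_TLS level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to equal TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE from <linux/tls.h>. */
#define KT_REC_SEQ_SIZE 8

/*
 * Hypothetical helper: store a 64-bit record sequence number in the
 * big-endian byte layout assumed for rec_seq[].
 */
static void kt_encode_rec_seq(uint64_t seq, unsigned char out[KT_REC_SEQ_SIZE])
{
	assert(out != NULL);
	for (size_t i = 0; i < KT_REC_SEQ_SIZE; i++)
		out[KT_REC_SEQ_SIZE - 1 - i] = (unsigned char)(seq >> (8 * i));
}
```

If the keys are handed over only after FINISHED, and FINISHED occupied exactly one record, the caller would encode 1 rather than 0 here, since the sequence number restarts at 0 after CCS.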


RE: Security enhancement proposal for kernel TLS

2018-08-02 Thread Vakul Garg



> -----Original Message-----
> From: Dave Watson [mailto:davejwat...@fb.com]
> Sent: Thursday, August 2, 2018 2:17 AM
> To: Vakul Garg 
> Cc: netdev@vger.kernel.org; Peter Doliwa ; Boris
> Pismenny 
> Subject: Re: Security enhancement proposal for kernel TLS
> 
> On 07/31/18 10:45 AM, Vakul Garg wrote:
> > > > IIUC, with the upstream implementation of tls record layer in
> > > > kernel, the decryption of tls FINISHED message happens in kernel.
> > > > Therefore the keys are already being sent to kernel tls socket
> > > > before handshake is
> > > completed.
> > >
> > > This is incorrect.
> >
> > Let us first reach a common ground on this.
> >
> >  The kernel TLS implementation can decrypt only after setting the keys on
> the socket.
> > The TLS message 'finished' (which is encrypted) is received after receiving
> 'CCS'
> > message. After the user space  TLS library receives CCS message, it
> > sets the keys on kernel TLS socket. Therefore, the next message in the
> > socket receive queue which is TLS finished gets decrypted in kernel only.
> >
> > Please refer to following Boris's patch on openssl. The  commit log says:
> > " We choose to set this option at the earliest - just after CCS is 
> > complete".
> 
> I agree that Boris' patch does what you say it does - it sets keys immediately
> after CCS instead of after FINISHED message.  I disagree that the kernel tls
> implementation currently requires that specific ordering, nor do I think that 
> it
> should require that ordering.

The current kernel implementation assumes that the record sequence number
starts from '0'. If keys have to be set after the FINISHED message, then the
record sequence number needs to be communicated from the user-space TLS stack
to the kernel. IIRC, the sequence number is not part of the interface through
which the key is transferred.



Re: Security enhancement proposal for kernel TLS

2018-08-01 Thread Dave Watson
On 07/31/18 10:45 AM, Vakul Garg wrote:
> > > IIUC, with the upstream implementation of tls record layer in kernel,
> > > the decryption of tls FINISHED message happens in kernel. Therefore
> > > the keys are already being sent to kernel tls socket before handshake is
> > completed.
> > 
> > This is incorrect.  
> 
> Let us first reach a common ground on this.
> 
>  The kernel TLS implementation can decrypt only after setting the keys on the 
> socket.
> The TLS message 'finished' (which is encrypted) is received after receiving 
> 'CCS'
> message. After the user space  TLS library receives CCS message, it sets the 
> keys
> on kernel TLS socket. Therefore, the next message in the  socket receive queue
> which is TLS finished gets decrypted in kernel only.
> 
> Please refer to following Boris's patch on openssl. The  commit log says:
> " We choose to set this option at the earliest - just after CCS is complete".

I agree that Boris' patch does what you say it does - it sets keys
immediately after CCS instead of after FINISHED message.  I disagree
that the kernel tls implementation currently requires that specific
ordering, nor do I think that it should require that ordering.


RE: Security enhancement proposal for kernel TLS

2018-07-31 Thread Vakul Garg



> -----Original Message-----
> From: Dave Watson [mailto:davejwat...@fb.com]
> Sent: Tuesday, July 31, 2018 2:46 AM
> To: Vakul Garg 
> Cc: netdev@vger.kernel.org; Peter Doliwa ; Boris
> Pismenny 
> Subject: Re: Security enhancement proposal for kernel TLS
> 
> On 07/30/18 06:31 AM, Vakul Garg wrote:
> > > It's not entirely clear how your TLS handshake daemon works -   Why is
> > > it necessary to set the keys in the kernel tls socket before the
> > > handshake is completed?
> >
> > IIUC, with the upstream implementation of tls record layer in kernel,
> > the decryption of tls FINISHED message happens in kernel. Therefore
> > the keys are already being sent to kernel tls socket before handshake is
> completed.
> 
> This is incorrect.  

Let us first reach a common ground on this.

The kernel TLS implementation can decrypt only after the keys are set on the
socket. The TLS 'finished' message (which is encrypted) is received after the
'CCS' message. After the user-space TLS library receives the CCS message, it
sets the keys on the kernel TLS socket. Therefore, the next message in the
socket receive queue, which is TLS finished, gets decrypted in the kernel only.

Please refer to the following patch from Boris on openssl. The commit log says:
"We choose to set this option at the earliest - just after CCS is complete".

--
commit a01dd062a32c687630b2a860b4bb053008f09ff5
Author: Boris Pismenny 
Date:   Sun Mar 11 16:18:27 2018 +0200

ssl: Linux TLS Rx Offload

This patch adds support for the Linux TLS Rx socket option.
It completes the previous patch for TLS Tx offload.
If the socket option is successful, then the receive data-path of the TCP
socket is implemented by the kernel.
We choose to set this option at the earliest - just after CCS is complete.
--

The fact that keys are handed over to the kernel TLS socket can also be
verified by adding a log statement in tls_sw_recvmsg().

I will stop here so that you can confirm my observation first.
Regards. Vakul


> Currently the kernel TLS implementation decrypts
> everything after you set the keys on the socket.  I'm suggesting that you
> don't set the keys on the socket until after the FINISHED message.
> 
> > > Or, why do you need to hand off the fd to the client program before
> > > the handshake is completed?
> >
> > The fd is always owned by the client program..
> >
> > In my proposal, the applications poll their own tcp socket using
> read/recvmsg etc.
> > If they get handshake record, they forward it to the entity running
> handshake agent.
> > The handshake agent could be a linux daemon or could run on a separate
> > security processor like 'Secure element' or say arm trustzone etc. The
> > applications forward any handshake message it gets backs from
> > handshake agent to the connected tcp socket. Therefore, the
> > applications act as a forwarder of the handshake messages between the
> peer tls endpoint and handshake agent.
> > The received data messages are absorbed by the applications themselves
> > (bypassing ssl stack completely). Similarly, the applications tx data 
> > directly
> by writing on their socket.
> >
> > > Waiting until after handshake solves both of these issues.
> >
> > The security sensitive check which is 'Wait for handshake to finish
> > completely before accepting data' should not be the onus of the
> > application. We have enough examples in past where application
> > programmers made mistakes in setting up tls correctly. The idea is to
> isolate tls session setting up from the applications.
> 
> It's not clear to me what you gain by putting this 'handshake finished'
> notification in the kernel instead of in the client's tls library - you're 
> already
> forwarding the handshake start notification to the daemon, why can't the
> daemon notify them back in userspace that
> the handshake is finished?
> 
> If you did want to put the notification in the kernel, how would you handle
> poll on the socket, since probably both the handshake daemon and client
> might be polling the socket, but one for control messages and one for data?
> 
> The original kernel TLS RFC did split these to two separate sockets, but we
> decided it was too complicated, and that's not how userspace TLS clients
> function today.
> 
> Do you have an implementation of this?  There are a bunch of tricky corner
> cases here, it might make more sense to have something concrete to discuss.
> 
> > Further, as per tls RFC it is ok to piggyback the data records after
> > the finished handshake message. This is called early data

Re: Security enhancement proposal for kernel TLS

2018-07-30 Thread Dave Watson
On 07/30/18 06:31 AM, Vakul Garg wrote:
> > It's not entirely clear how your TLS handshake daemon works -   Why is
> > it necessary to set the keys in the kernel tls socket before the handshake 
> > is
> > completed? 
> 
> IIUC, with the upstream implementation of tls record layer in kernel, the
> decryption of tls FINISHED message happens in kernel. Therefore the keys are
> already being sent to kernel tls socket before handshake is completed.

This is incorrect.  Currently the kernel TLS implementation decrypts
everything after you set the keys on the socket.  I'm suggesting that
you don't set the keys on the socket until after the FINISHED message.

> > Or, why do you need to hand off the fd to the client program
> > before the handshake is completed?
>   
> The fd is always owned by the client program..
> 
> In my proposal, the applications poll their own tcp socket using read/recvmsg 
> etc.
> If they get handshake record, they forward it to the entity running handshake 
> agent.
> The handshake agent could be a linux daemon or could run on a separate 
> security
> processor like 'Secure element' or say arm trustzone etc. The applications
> forward any handshake message it gets backs from handshake agent to the
> connected tcp socket. Therefore, the  applications act as a forwarder of the 
> handshake 
> messages between the peer tls endpoint and handshake agent.
> The received data messages are absorbed by the applications themselves 
> (bypassing ssl stack
> completely). Similarly, the applications tx data directly by writing on their 
> socket.
> 
> > Waiting until after handshake solves both of these issues.
>  
> The security sensitive check which is 'Wait for handshake to finish 
> completely before 
> accepting data' should not be the onus of the application. We have enough 
> examples
> in past where application programmers made mistakes in setting up tls 
> correctly. The idea
> is to isolate tls session setting up from the applications.

It's not clear to me what you gain by putting this 'handshake
finished' notification in the kernel instead of in the client's tls
library - you're already forwarding the handshake start notification
to the daemon, why can't the daemon notify them back in userspace that
the handshake is finished?   

If you did want to put the notification in the kernel, how would you
handle poll on the socket, since probably both the handshake daemon
and client might be polling the socket, but one for control messages
and one for data? 

The original kernel TLS RFC did split these to two separate sockets,
but we decided it was too complicated, and that's not how userspace
TLS clients function today.

Do you have an implementation of this?  There are a bunch of tricky
corner cases here, it might make more sense to have something concrete
to discuss.

> Further, as per tls RFC it is ok to piggyback the data records after the 
> finished handshake
> message. This is called early data. But then it is the responsibility of 
> applications to first
> complete finished message processing before accepting the data records.
> 
> The proposal is to disallow application world seeing data records
> before handshake finishes.

You're talking about the TLS 1.3 0-RTT feature, which is indeed an
interesting case.  For in-process TLS libraries, it's fairly easy to
punt, and don't set the kernel TLS keys until after the 0-RTT data +
handshake message.  For an OOB handshake daemon it might indeed make
more sense to leave the data in kernelspace ... somehow.

> > >   - The handshake state should fallback to 'unverified' in case a control
> > record is seen again by kernel TLS (e.g. in case of renegotiation, post
> > handshake client auth etc).
> > 
> > Currently kernel tls sockets return an error unless you explicitly handle 
> > the
> > control record for exactly this reason.
> 
> IIRC, any kind handshake message post handshake-completion is a problem for 
> kernel tls.
> This includes renegotiation, post handshake client-auth etc.
> 
> Please correct me if I am wrong.

You are correct, but currently kernel TLS sockets return an error
unless you explicitly handle the control message.  This should be
enough already to implement your proposal. 
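
As a rough sketch of what "explicitly handling the control message" can look like on the receive side: the caller supplies an ancillary-data buffer to recvmsg() and inspects it for the record type. The helper name kt_record_type is hypothetical. The two constants are fallback definitions in case the installed headers do not provide them, and their values are assumed to match the Linux UAPI headers.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>

#ifndef SOL_TLS
#define SOL_TLS 282              /* assumed value from the Linux UAPI headers */
#endif
#ifndef TLS_GET_RECORD_TYPE
#define TLS_GET_RECORD_TYPE 2    /* assumed value from <linux/tls.h> */
#endif

/*
 * Hypothetical helper: return the TLS record type found in the ancillary
 * data of a msghdr filled in by recvmsg(), or -1 if no such control
 * message is present.
 */
static int kt_record_type(struct msghdr *msg)
{
	struct cmsghdr *cmsg;

	assert(msg != NULL);
	for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
		if (cmsg->cmsg_level == SOL_TLS &&
		    cmsg->cmsg_type == TLS_GET_RECORD_TYPE) {
			unsigned char type;

			memcpy(&type, CMSG_DATA(cmsg), sizeof(type));
			return type;
		}
	}
	return -1;
}
```

A caller could treat a returned value of 22 (handshake) as the point at which to stop consuming application data and hand the record to whatever performs the handshake.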



RE: Security enhancement proposal for kernel TLS

2018-07-30 Thread Vakul Garg
Sorry for a delayed response.
Kindly see inline.

> -----Original Message-----
> From: Dave Watson [mailto:davejwat...@fb.com]
> Sent: Wednesday, July 25, 2018 9:30 PM
> To: Vakul Garg 
> Cc: netdev@vger.kernel.org; Peter Doliwa ; Boris
> Pismenny 
> Subject: Re: Security enhancement proposal for kernel TLS
> 
> You would probably get more responses if you cc the relevant people.
> Comments inline
> 
> On 07/22/18 12:49 PM, Vakul Garg wrote:
> > The kernel based TLS record layer allows the user space world to use a
> decoupled TLS implementation.
> > The applications need not be linked with TLS stack.
> > The TLS handshake can be done by a TLS daemon on the behalf of
> applications.
> >
> > Presently, as soon as the handshake process derives keys, it pushes the
> negotiated keys to kernel TLS .
> > Thereafter the applications can directly read and write data on their TCP
> socket (without having to use SSL apis).
> >
> > With the current kernel TLS implementation, there is a security problem.
> > Since the kernel TLS socket does not have information about the state
> > of handshake, it allows applications to be able to receive data from the
> peer TLS endpoint even when the handshake verification has not been
> completed by the SSL daemon.
> > It is a security problem if applications can receive data if verification 
> > of the
> handshake transcript is not completed (done with processing tls FINISHED
> message).
> >
> > My proposal:
> > - Kernel TLS should maintain state of handshake (verified or
> unverified).
> > In un-verified state, data records should not be allowed pass through
> to the applications.
> >
> > - Add a new control interface using which that the user space SSL
> stack can tell the TLS socket that handshake has been verified and DATA
> records can flow.
> > In 'unverified' state, only control records should be allowed to pass
> and reception DATA record should be pause the receive side record
> decryption.
> 
> It's not entirely clear how your TLS handshake daemon works -   Why is
> it necessary to set the keys in the kernel tls socket before the handshake is
> completed? 

IIUC, with the upstream implementation of the TLS record layer in the kernel,
the decryption of the TLS FINISHED message happens in the kernel. Therefore the
keys are already being sent to the kernel TLS socket before the handshake is
completed.

> Or, why do you need to hand off the fd to the client program
> before the handshake is completed?
  
The fd is always owned by the client program.
The client program opens the socket, binds/connects it over TCP, and then
hands it over to the SSL stack as a transport handle for exchanging handshake
messages. This is how it works today whether we use kernel TLS or not.
I do not propose to change it.

In my proposal, the applications poll their own TCP socket using read/recvmsg
etc. If they get a handshake record, they forward it to the entity running the
handshake agent. The handshake agent could be a Linux daemon or could run on a
separate security processor such as a 'Secure element' or, say, ARM TrustZone.
The applications forward any handshake message they get back from the
handshake agent to the connected TCP socket. Therefore, the applications act
as forwarders of the handshake messages between the peer TLS endpoint and the
handshake agent. The received data messages are absorbed by the applications
themselves (bypassing the SSL stack completely). Similarly, the applications
transmit data directly by writing to their socket.

> Waiting until after handshake solves both of these issues.
 
The security-sensitive check, 'wait for the handshake to finish completely
before accepting data', should not be the onus of the application. We have
enough examples in the past where application programmers made mistakes in
setting up TLS correctly. The idea is to isolate TLS session setup from the
applications.

> 
> I'm not aware of any tls libraries that send data before the finished message,
> is there any reason you need to support this?

Sending data records before sending the finished message is a protocol error.
A good TLS library never does that. But an attacker can exploit it if
applications can receive the data records before the handshake is finished.
With the current kernel TLS, it is possible to do so.

Further, as per the TLS RFC it is ok to piggyback the data records after the
finished handshake message. This is called early data. But then it is the
responsibility of applications to first complete finished message processing
before accepting the data records.

The proposal is to disallow the application world from seeing data records
before the handshake finishes.

> 
> >
> > - The handshake state should fallback to 'unverified' in case a control
> record is seen again by k

Re: Security enhancement proposal for kernel TLS

2018-07-25 Thread Dave Watson
You would probably get more responses if you cc the relevant people.
Comments inline

On 07/22/18 12:49 PM, Vakul Garg wrote:
> The kernel based TLS record layer allows the user space world to use a 
> decoupled TLS implementation.
> The applications need not be linked with TLS stack. 
> The TLS handshake can be done by a TLS daemon on the behalf of applications.
> 
> Presently, as soon as the handshake process derives keys, it pushes the 
> negotiated keys to kernel TLS . 
> Thereafter the applications can directly read and write data on their TCP 
> socket (without having to use SSL apis).
> 
> With the current kernel TLS implementation, there is a security problem. 
> Since the kernel TLS socket does not have information about the state of 
> handshake, 
> it allows applications to be able to receive data from the peer TLS endpoint 
> even when the handshake verification has not been completed by the SSL 
> daemon. 
> It is a security problem if applications can receive data if verification of 
> the handshake transcript is not completed (done with processing tls FINISHED 
> message).
> 
> My proposal:
>   - Kernel TLS should maintain state of handshake (verified or 
> unverified). 
>   In un-verified state, data records should not be allowed pass through 
> to the applications.
> 
>   - Add a new control interface using which that the user space SSL stack 
> can tell the TLS socket that handshake has been verified and DATA records can 
> flow. 
>   In 'unverified' state, only control records should be allowed to pass 
> and reception DATA record should be pause the receive side record decryption.

It's not entirely clear how your TLS handshake daemon works -   Why is
it necessary to set the keys in the kernel tls socket before the
handshake is completed?  Or, why do you need to hand off the fd to the
client program before the handshake is completed?  

Waiting until after handshake solves both of these issues.

I'm not aware of any tls libraries that send data before the finished
message, is there any reason you need to support this?

> 
>   - The handshake state should fallback to 'unverified' in case a control 
> record is seen again by kernel TLS (e.g. in case of renegotiation, post 
> handshake client auth etc).

Currently kernel tls sockets return an error unless you explicitly
handle the control record for exactly this reason.

If you want an external daemon to handle control messages after
handshake, there definitely might be some synchronization that would
make sense to push in the kernel.  However, with TLS 1.3 removing
renegotiation (and currently reneg is not implemented in kernel tls
anyway), there's much less reason to do so.


Security enhancement proposal for kernel TLS

2018-07-22 Thread Vakul Garg
Hi

The kernel-based TLS record layer allows the user-space world to use a
decoupled TLS implementation.
The applications need not be linked with a TLS stack.
The TLS handshake can be done by a TLS daemon on behalf of applications.

Presently, as soon as the handshake process derives keys, it pushes the
negotiated keys to kernel TLS.
Thereafter the applications can directly read and write data on their TCP
socket (without having to use SSL APIs).

With the current kernel TLS implementation, there is a security problem.
Since the kernel TLS socket does not have information about the state of the
handshake, it allows applications to receive data from the peer TLS endpoint
even when the handshake verification has not been completed by the SSL daemon.
It is a security problem if applications can receive data while verification
of the handshake transcript is not completed (i.e. before the TLS FINISHED
message has been processed).

My proposal:
- Kernel TLS should maintain the state of the handshake (verified or
unverified).
In the unverified state, data records should not be allowed to pass through
to the applications.

- Add a new control interface through which the user-space SSL stack
can tell the TLS socket that the handshake has been verified and DATA records
can flow.
In the 'unverified' state, only control records should be allowed to pass,
and reception of a DATA record should pause receive-side record decryption.

- The handshake state should fall back to 'unverified' in case a control
record is seen again by kernel TLS (e.g. in case of renegotiation,
post-handshake client auth etc).
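
The three points above can be sketched as a small state machine. This is only a user-space model of the proposed behaviour, not kernel code: every name in it (kt_gate, kt_gate_on_record, kt_gate_mark_verified) is hypothetical, and kt_gate_mark_verified stands in for the new control interface, whose actual form the proposal does not specify.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum kt_hs_state { KT_HS_UNVERIFIED, KT_HS_VERIFIED };

/* TLS ContentType values as defined by the TLS specification. */
enum {
	KT_REC_CCS = 20,
	KT_REC_ALERT = 21,
	KT_REC_HANDSHAKE = 22,
	KT_REC_DATA = 23
};

struct kt_gate {
	enum kt_hs_state state;
};

/* A new socket starts out unverified. */
static void kt_gate_init(struct kt_gate *g)
{
	g->state = KT_HS_UNVERIFIED;
}

/* Stand-in for the proposed "handshake has been verified" control call. */
static void kt_gate_mark_verified(struct kt_gate *g)
{
	g->state = KT_HS_VERIFIED;
}

/*
 * Decide whether a received record may be delivered right now.
 * Control records always pass and drop the state back to unverified;
 * DATA records pass only while the state is verified.
 */
static bool kt_gate_on_record(struct kt_gate *g, int record_type)
{
	assert(g != NULL);
	if (record_type != KT_REC_DATA) {
		g->state = KT_HS_UNVERIFIED;
		return true;
	}
	return g->state == KT_HS_VERIFIED;
}
```

In this model a DATA record that arrives while unverified is simply refused; the proposal's "pause decryption" would correspond to leaving it queued until kt_gate_mark_verified() is called.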

Kindly comment.

Regards

Vakul


Proposal

2018-07-12 Thread Miss Victoria Mehmet
Hello



I have a business proposal of mutual benefit that I would like to discuss with
you. I asked before and I still await your positive response. Thanks.


Proposal

2018-07-12 Thread Miss Victoria Mehmet
Hello

I have a business proposal of mutual benefit that I would like to discuss with
you.


Business Proposal

2018-07-05 Thread BRENDA WILSON



I am Sgt. Brenda Wilson, originally from Lake Jackson, Texas, USA. I personally
did some research and came across your information. I am presently
writing this mail to you from a U.S. military base in Kabul, Afghanistan. I have
a secured business proposal for you. Reply for more details via my private e-mail
( brendawilson...@hotmail.com )


Proposal

2018-06-07 Thread Mr. Fawaz KhE. Al Saleh




--
Good day,

I know you do not know me personally, but I have checked your profile
and I see generosity in you. There is an urgent offer attached
to your name here in the office of Mr. Fawaz KhE. Al Saleh, Member of
the Board of Directors, Kuveyt Türk Participation Bank (Turkey) and
head of private banking and wealth management.
Regards,
Mr. Fawaz KhE. Al Saleh



Proposal

2018-06-02 Thread Miss Victoria Mehmet




--
Hello

I have been trying to contact you. Did you get my business proposal?

Best Regards,
Miss.Victoria Mehmet


Lucrative Business Proposal

2018-06-02 Thread Adrien Saif




--
Dear Friend,

I would like to discuss a very important issue with you. I am writing 
to find out if this is your valid email. Please, let me know if this 
email is valid


Kind regards
Adrien Saif
Attorney to Quatif Group of Companies


Proposal

2018-05-28 Thread Miss Zeliha Omer Faruk




--
Hello

I have been trying to contact you. Did you get my business proposal?

Best Regards,
Miss.Zeliha ömer faruk
Esentepe Mahallesi Büyükdere
Caddesi Kristal Kule Binasi
No:215
Sisli - Istanbul, Turkey


Proposal

2018-05-27 Thread Miss Zeliha Omer Faruk



--
Hello

I have been trying to contact you. Did you get my business proposal?

Best Regards,
Miss.Zeliha ömer faruk
Esentepe Mahallesi Büyükdere
Caddesi Kristal Kule Binasi
No:215
Sisli - Istanbul, Turkey


Proposal

2018-05-26 Thread Miss Zeliha Omer Faruk



Hello

Greetings to you. I have a business proposal for you; please contact me
for more details as soon as possible. Thanks.

Best Regards,
Miss.Zeliha ömer faruk
Esentepe Mahallesi Büyükdere
Caddesi Kristal Kule Binasi
No:215
Sisli - Istanbul, Turkey



Proposal

2018-05-22 Thread Miss Zeliha Omer Faruk



Hello

Greetings to you. I have a business proposal for you; please contact me
for more details as soon as possible. Thanks.

Best Regards,
Miss.Zeliha ömer faruk
Esentepe Mahallesi Büyükdere
Caddesi Kristal Kule Binasi
No:215
Sisli - Istanbul, Turkey



Proposal

2018-05-17 Thread Miss Zeliha Omer Faruk



Hello

Greetings to you. I have a business proposal for you; please contact me
for more details as soon as possible. Thanks.

Best Regards,
Miss.Zeliha ömer faruk
Esentepe Mahallesi Büyükdere
Caddesi Kristal Kule Binasi
No:215
Sisli - Istanbul, Turkey



Proposal

2018-05-16 Thread Miss Zeliha Omer Faruk



Hello

Greetings to you. I have a business proposal for you; please contact me
for more details as soon as possible. Thanks.

Best Regards,
Miss.Zeliha ömer faruk
Esentepe Mahallesi Büyükdere
Caddesi Kristal Kule Binasi
No:215
Sisli - Istanbul, Turkey



Proposal

2018-05-13 Thread Zeliha Omer Faruk



--
Hello

Greetings to you. I have a business proposal for you; please contact me
for more details as soon as possible. Thanks.

Best Regards,
Miss.Zeliha ömer faruk
Esentepe Mahallesi Büyükdere
Caddesi Kristal Kule Binasi
No:215
Sisli - Istanbul, Turkey


Proposal

2018-05-11 Thread Zeliha Omer Faruk



--
Hello

Greetings to you. I have a business proposal for you; please contact me
for more details as soon as possible. Thanks.

Best Regards,
Miss.Zeliha ömer faruk
Esentepe Mahallesi Büyükdere
Caddesi Kristal Kule Binasi
No:215
Sisli - Istanbul, Turkey


Proposal

2018-05-09 Thread Zeliha Omer Faruk



--
Hello

Greetings to you. I have a business proposal for you; please contact me
for more details as soon as possible. Thanks.

Best Regards,
Miss.Zeliha ömer faruk
Esentepe Mahallesi Büyükdere
Caddesi Kristal Kule Binasi
No:215
Sisli - Istanbul, Turkey


Proposal

2018-04-30 Thread Miss Zeliha Omer Faruk



Hello

Greetings to you today. I asked before but I didn't get a response. Please,
I know this might come to you as a surprise because you do not know me
personally. I have a business proposal for our mutual benefit; please let
me know if you are interested.



Best Regards,

Esentepe Mahallesi Büyükdere
Caddesi Kristal Kule Binasi
No:215
 Sisli - Istanbul, Turkey





Proposal

2018-04-26 Thread MS Zeliha Omer Faruk



Hello

Greetings to you today. I asked before but I didn't get a response. Please,
I know this might come to you as a surprise because you do not know me
personally. I have a business proposal for you; please reply for more
info.



Best Regards,

Esentepe Mahallesi Büyükdere
Caddesi Kristal Kule Binasi
No:215
 Sisli - Istanbul, Turkey



Business Proposal,

2018-04-25 Thread Mrs Zeliha Faruk
Hello Dear 

Greetings to you, please I have a very important business proposal for our 
mutual benefit, please let me know if you are interested.

Best Regards,
Miss. Zeliha ömer Faruk
Caddesi Kristal Kule Binasi
No:215


Proposal

2018-04-16 Thread MS Zeliha Omer Faruk



Hello

Greetings to you. Please, did you get my previous email regarding my
investment proposal on Friday of last week?

MS.Zeliha ömer faruk
zeliha.omer.fa...@gmail.com



business Proposal / Geschäftsvorschlag

2018-04-07 Thread Anders Karlsson
I have a business Proposal for you, contact me directly 
This business has a cash involvement of $250,000,000.00

Anders Karlsson



[PATCH net-next 1/3] net/smc: restructure netinfo for CLC proposal msgs

2018-03-16 Thread Ursula Braun
From: Karsten Graul <kgr...@linux.vnet.ibm.com>

Introduce functions smc_clc_prfx_set to retrieve IP information for the
CLC proposal msg and smc_clc_prfx_match to match the contents of a
proposal message against the IP addresses of the net device. The new
functions replace the functionality provided by smc_clc_netinfo_by_tcpsk,
which is removed by this patch. The match functionality is extended to
scan all ipv4 addresses of the net device for a match against the
ipv4 subnet from the proposal msg.

Signed-off-by: Karsten Graul <kgr...@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubr...@linux.vnet.ibm.com>
---
 net/smc/af_smc.c  |  14 ++--
 net/smc/smc_clc.c | 100 +-
 net/smc/smc_clc.h |   4 +--
 3 files changed, 82 insertions(+), 36 deletions(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 649489f825a5..949a2714a453 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -767,8 +767,6 @@ static void smc_listen_work(struct work_struct *work)
struct smc_link *link;
int reason_code = 0;
int rc = 0;
-   __be32 subnet;
-   u8 prefix_len;
u8 ibport;
 
/* check if peer is smc capable */
@@ -803,17 +801,11 @@ static void smc_listen_work(struct work_struct *work)
goto decline_rdma;
}
 
-   /* determine subnet and mask from internal TCP socket */
-   rc = smc_clc_netinfo_by_tcpsk(newclcsock, &subnet, &prefix_len);
-   if (rc) {
-   reason_code = SMC_CLC_DECL_CNFERR; /* configuration error */
-   goto decline_rdma;
-   }
-
pclc = (struct smc_clc_msg_proposal *)&buf;
pclc_prfx = smc_clc_proposal_get_prefix(pclc);
-   if (pclc_prfx->outgoing_subnet != subnet ||
-   pclc_prfx->prefix_len != prefix_len) {
+
+   rc = smc_clc_prfx_match(newclcsock, pclc_prfx);
+   if (rc) {
reason_code = SMC_CLC_DECL_CNFERR; /* configuration error */
goto decline_rdma;
}
diff --git a/net/smc/smc_clc.c b/net/smc/smc_clc.c
index 874c5a75d6dd..dc3a2235978d 100644
--- a/net/smc/smc_clc.c
+++ b/net/smc/smc_clc.c
@@ -74,15 +74,35 @@ static bool smc_clc_msg_hdr_valid(struct smc_clc_msg_hdr *clcm)
return true;
 }
 
-/* determine subnet and mask of internal TCP socket */
-int smc_clc_netinfo_by_tcpsk(struct socket *clcsock,
-__be32 *subnet, u8 *prefix_len)
+/* find ipv4 addr on device and get the prefix len, fill CLC proposal msg */
+static int smc_clc_prfx_set4_rcu(struct dst_entry *dst, __be32 ipv4,
+struct smc_clc_msg_proposal_prefix *prop)
+{
+   struct in_device *in_dev = __in_dev_get_rcu(dst->dev);
+
+   if (!in_dev)
+   return -ENODEV;
+   for_ifa(in_dev) {
+   if (!inet_ifa_match(ipv4, ifa))
+   continue;
+   prop->prefix_len = inet_mask_len(ifa->ifa_mask);
+   prop->outgoing_subnet = ifa->ifa_address & ifa->ifa_mask;
+   /* prop->ipv6_prefixes_cnt = 0; already done by memset before */
+   return 0;
+   } endfor_ifa(in_dev);
+   return -ENOENT;
+}
+
+/* retrieve and set prefixes in CLC proposal msg */
+static int smc_clc_prfx_set(struct socket *clcsock,
+   struct smc_clc_msg_proposal_prefix *prop)
 {
struct dst_entry *dst = sk_dst_get(clcsock->sk);
-   struct in_device *in_dev;
-   struct sockaddr_in addr;
+   struct sockaddr_storage addrs;
+   struct sockaddr_in *addr;
int rc = -ENOENT;
 
+   memset(prop, 0, sizeof(*prop));
if (!dst) {
rc = -ENOTCONN;
goto out;
@@ -91,22 +111,58 @@ int smc_clc_netinfo_by_tcpsk(struct socket *clcsock,
rc = -ENODEV;
goto out_rel;
}
-
/* get address to which the internal TCP socket is bound */
-   kernel_getsockname(clcsock, (struct sockaddr *)&addr);
-   /* analyze IPv4 specific data of net_device belonging to TCP socket */
+   kernel_getsockname(clcsock, (struct sockaddr *)&addrs);
+   /* analyze IP specific data of net_device belonging to TCP socket */
rcu_read_lock();
-   in_dev = __in_dev_get_rcu(dst->dev);
+   if (addrs.ss_family == PF_INET) {
+   /* IPv4 */
+   addr = (struct sockaddr_in *)&addrs;
+   rc = smc_clc_prfx_set4_rcu(dst, addr->sin_addr.s_addr, prop);
+   }
+   rcu_read_unlock();
+out_rel:
+   dst_release(dst);
+out:
+   return rc;
+}
+
+/* match ipv4 addrs of dev against addr in CLC proposal */
+static int smc_clc_prfx_match4_rcu(struct net_device *dev,
+  struct smc_clc_msg_proposal_prefix *prop)
+{
+   struct in_device *in_dev = __in_dev_get_rcu(dev);
+
+   if (!in_dev)
+   return -ENODEV;
for_ifa(in_dev) {
-   if (!inet_ifa_match(ad

Proposal

2018-02-21 Thread melisa mehmet
Hello

Greetings to you and everyone around you. Please, did you get my previous email 
regarding my proposal?
Please let me know if we can work together on this.

Best Regards


Business Proposal Of $18,100,000.00

2017-12-13 Thread Mr Youichi Kanno


Dear Sir/Madam,

My name is Youichi Kanno and I work in an Audit & Credit Supervisory role at 
The Norinchukin Bank. I am contacting you regarding the asset of a deceased 
client, Mr. Grigor Kassan, and I need your assistance to process the fund 
claims of $18,100,000.00. If interested, get back to me so we can discuss 
the logistics of moving the funds to a safe offshore bank.

Yours sincerely,
Youichi Kanno



[PATCH net-next 6/6] smc: support variable CLC proposal messages

2017-12-07 Thread Ursula Braun
According to RFC7609 [1] the CLC proposal message contains an area of
unknown length for future growth. Additionally it may contain up to
8 IPv6 prefixes. The current version of the SMC-code does not
understand CLC proposal messages using these variable length fields and,
thus, is incompatible with SMC implementations in other operating
systems.

This patch makes sure SMC understands incoming CLC proposals
* with arbitrary length values for future growth
* with up to 8 IPv6 prefixes

[1] SMC-R Informational RFC: http://www.rfc-editor.org/info/rfc7609
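
As a rough illustration of the length rule that the header check in this patch enforces, here is a standalone sketch. The sizes are passed in by the caller because the real struct layouts are not reproduced here; struct clc_sizes and proposal_len_ok() are hypothetical names, and rejecting more than 8 IPv6 prefixes is an assumption taken from the description rather than from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_IPV6_PREFIXES 8 /* "up to 8 IPv6 prefixes" per the description */

/* Sizes of the fixed building blocks of a proposal message (hypothetical
 * container; the caller fills in the real sizeof() values).
 */
struct clc_sizes {
	size_t fixed;       /* fixed leading part of the proposal */
	size_t prefix;      /* IPv4 prefix block */
	size_t ipv6_prefix; /* one IPv6 prefix entry */
	size_t trailer;     /* trailing eyecatcher */
};

/* True if the length from the wire header equals the sum of the fixed part,
 * the variable area for future growth, the prefix block, the IPv6 prefix
 * entries and the trailer.
 */
static bool proposal_len_ok(const struct clc_sizes *sz, size_t wire_len,
			    uint16_t iparea_offset, uint8_t ipv6_prefixes_cnt)
{
	assert(sz != NULL);
	if (ipv6_prefixes_cnt > MAX_IPV6_PREFIXES)
		return false;
	return wire_len == sz->fixed + iparea_offset + sz->prefix +
			   (size_t)ipv6_prefixes_cnt * sz->ipv6_prefix +
			   sz->trailer;
}
```

Both variable parts (the growth area and the IPv6 prefix count) come from the peer, so the receiver has to recompute the expected total before it trusts the trailer position.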

Signed-off-by: Ursula Braun <ubr...@linux.vnet.ibm.com>
Reviewed-by: Hans Wippel <hwip...@linux.vnet.ibm.com>
---
 net/smc/af_smc.c  | 15 ++
 net/smc/smc_clc.c | 82 ++-
 net/smc/smc_clc.h | 34 +++
 3 files changed, 107 insertions(+), 24 deletions(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index d3ae0d5b1677..daf8075f5a4c 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -751,14 +751,16 @@ static void smc_listen_work(struct work_struct *work)
 {
struct smc_sock *new_smc = container_of(work, struct smc_sock,
smc_listen_work);
+   struct smc_clc_msg_proposal_prefix *pclc_prfx;
struct socket *newclcsock = new_smc->clcsock;
struct smc_sock *lsmc = new_smc->listen_smc;
struct smc_clc_msg_accept_confirm cclc;
int local_contact = SMC_REUSE_CONTACT;
struct sock *newsmcsk = &new_smc->sk;
-   struct smc_clc_msg_proposal pclc;
+   struct smc_clc_msg_proposal *pclc;
struct smc_ib_device *smcibdev;
struct sockaddr_in peeraddr;
+   u8 buf[SMC_CLC_MAX_LEN];
struct smc_link *link;
int reason_code = 0;
int rc = 0, len;
@@ -775,7 +777,7 @@ static void smc_listen_work(struct work_struct *work)
/* do inband token exchange -
 *wait for and receive SMC Proposal CLC message
 */
-   reason_code = smc_clc_wait_msg(new_smc, &pclc, sizeof(pclc),
+   reason_code = smc_clc_wait_msg(new_smc, &buf, sizeof(buf),
   SMC_CLC_PROPOSAL);
if (reason_code < 0)
goto out_err;
@@ -804,8 +806,11 @@ static void smc_listen_work(struct work_struct *work)
reason_code = SMC_CLC_DECL_CNFERR; /* configuration error */
goto decline_rdma;
}
-   if ((pclc.outgoing_subnet != subnet) ||
-   (pclc.prefix_len != prefix_len)) {
+
+   pclc = (struct smc_clc_msg_proposal *)&buf;
+   pclc_prfx = smc_clc_proposal_get_prefix(pclc);
+   if (pclc_prfx->outgoing_subnet != subnet ||
+   pclc_prfx->prefix_len != prefix_len) {
reason_code = SMC_CLC_DECL_CNFERR; /* configuration error */
goto decline_rdma;
}
@@ -816,7 +821,7 @@ static void smc_listen_work(struct work_struct *work)
/* allocate connection / link group */
mutex_lock(&smc_create_lgr_pending);
local_contact = smc_conn_create(new_smc, peeraddr.sin_addr.s_addr,
-   smcibdev, ibport, &pclc.lcl, 0);
+   smcibdev, ibport, &pclc->lcl, 0);
if (local_contact < 0) {
rc = local_contact;
if (rc == -ENOMEM)
diff --git a/net/smc/smc_clc.c b/net/smc/smc_clc.c
index f5e17d29112b..abf7ceb6690b 100644
--- a/net/smc/smc_clc.c
+++ b/net/smc/smc_clc.c
@@ -22,6 +22,54 @@
 #include "smc_clc.h"
 #include "smc_ib.h"
 
+/* check if received message has a correct header length and contains valid
+ * heading and trailing eyecatchers
+ */
+static bool smc_clc_msg_hdr_valid(struct smc_clc_msg_hdr *clcm)
+{
+   struct smc_clc_msg_proposal_prefix *pclc_prfx;
+   struct smc_clc_msg_accept_confirm *clc;
+   struct smc_clc_msg_proposal *pclc;
+   struct smc_clc_msg_decline *dclc;
+   struct smc_clc_msg_trail *trl;
+
+   if (memcmp(clcm->eyecatcher, SMC_EYECATCHER, sizeof(SMC_EYECATCHER)))
+   return false;
+   switch (clcm->type) {
+   case SMC_CLC_PROPOSAL:
+   pclc = (struct smc_clc_msg_proposal *)clcm;
+   pclc_prfx = smc_clc_proposal_get_prefix(pclc);
+   if (ntohs(pclc->hdr.length) !=
+   sizeof(*pclc) + ntohs(pclc->iparea_offset) +
+   sizeof(*pclc_prfx) +
+   pclc_prfx->ipv6_prefixes_cnt *
+   sizeof(struct smc_clc_ipv6_prefix) +
+   sizeof(*trl))
+   return false;
+   trl = (struct smc_clc_msg_trail *)
+   ((u8 *)pclc + ntohs(pclc->hdr.length) - sizeof(*trl));
+   break;
+   case SMC_CLC_ACCEPT:
+   case SMC_CLC_CONFIRM:
+   clc = (struct smc_clc_msg_accept_confirm *)clcm;
+   if (nt

[PATCH net-next 07/12] rxrpc: Don't transmit DELAY ACKs immediately on proposal

2017-11-24 Thread David Howells
Don't transmit a DELAY ACK immediately on proposal when the Rx window is
rotated, but rather defer it to the work function.  This means that we have
a chance to queue/consume more received packets before we actually send the
DELAY ACK, or even cancel it entirely, thereby reducing the number of
packets transmitted.

We do, however, want to continue sending other types of packet immediately,
particularly REQUESTED ACKs, as they may be used for RTT calculation by the
other side.
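
The decision described above boils down to a small predicate. The sketch below is a userspace illustration only: the enum and function names are made up for this example, and the only behaviour it models is "send everything except a pending DELAY ACK right away".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative reason codes (hypothetical names for this sketch). */
enum sketch_ack_reason {
	SKETCH_ACK_NONE      = 0, /* nothing pending */
	SKETCH_ACK_REQUESTED = 1, /* peer asked for an ACK */
	SKETCH_ACK_DELAY     = 8, /* nothing happened since the last packet */
};

/* Should the Rx-window rotation path transmit immediately? */
static bool send_ack_now(enum sketch_ack_reason pending)
{
	return pending != SKETCH_ACK_NONE && pending != SKETCH_ACK_DELAY;
}

/* Count how many of the proposed ACKs would go out immediately. */
static size_t count_immediate(const enum sketch_ack_reason *r, size_t n)
{
	size_t sent = 0;

	assert(r != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (send_ack_now(r[i]))
			sent++;
	return sent;
}
```

Everything that is not sent here is left for the deferred work function, which is where the saving in transmitted packets would come from.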

Signed-off-by: David Howells <dhowe...@redhat.com>
---

 net/rxrpc/recvmsg.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c
index 0b6609da80b7..fad5f42a3abd 100644
--- a/net/rxrpc/recvmsg.c
+++ b/net/rxrpc/recvmsg.c
@@ -219,9 +219,9 @@ static void rxrpc_rotate_rx_window(struct rxrpc_call *call)
after_eq(top, call->ackr_seen + 2) ||
(hard_ack == top && after(hard_ack, call->ackr_consumed)))
rxrpc_propose_ACK(call, RXRPC_ACK_DELAY, 0, serial,
- true, false,
+ true, true,
  rxrpc_propose_ack_rotate_rx);
-   if (call->ackr_reason)
+   if (call->ackr_reason && call->ackr_reason != RXRPC_ACK_DELAY)
rxrpc_send_ack_packet(call, false);
}
 }



Re: [virtio-dev] repost: af_packet vs virtio (was packed ring layout proposal v2)

2017-08-06 Thread Adam Tao
On Wed, Aug 02, 2017 at 04:50:03PM +0300, Michael S. Tsirkin wrote:
> On Tue, Aug 01, 2017 at 08:54:27PM -0700, Steven Luong wrote:
> > * Descriptor ring:
> > 
> > Guest adds descriptors with unique index values and DESC_HW set in 
> > flags.
> > Host overwrites used descriptors with correct len, index, and DESC_HW
> > clear.  Flags are always set/cleared last.
> > 
> > #define DESC_HW 0x0080
> > 
> > struct desc {
> >         __le64 addr;
> >         __le32 len;
> >         __le16 index;
> >         __le16 flags;
> > };
> > 
> > When DESC_HW is set, descriptor belongs to device. When it is clear,
> > it belongs to the driver.
> > 
> > We can use 1 bit to set direction
> > /* This marks a buffer as write-only (otherwise read-only). */
> > #define VRING_DESC_F_WRITE      2
> > 
> > * Scatter/gather support
> > 
> > We can use 1 bit to chain s/g entries in a request, same as virtio 1.0:
> > 
> > /* This marks a buffer as continuing via the next field. */
"Next field" reads like a structure field in software; maybe we should
change "next field" to "next desc" to avoid misunderstanding.
> > 
> > 
> > This comment here is confusing to me. In 1.0, virtq_desc has the next field.
> > When the flag VRING_DESC_F_NEXT is set, the next entry to continue is 
> > specified
> > in the next field.
> > 
> > Here in 1.1, struct desc does not have the next field, only addr, len, 
> > index,
> > and flags. So when VRING_DESC_F_NEXT is set in struct desc's flags field, 
> > where
> > is the next entry to continue the current descriptor, the entry immediately
> > following the current entry? ie, if the current entry is at index 10 in the
> > descriptor table and its flags is set for VRING_DESC_F_NEXT, is the entry
> > continuing the current entry in index 11?
> > 
> > Steven
> 
> Exactly, you got it right.
> 
> -
> To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org


Re: [virtio-dev] repost: af_packet vs virtio (was packed ring layout proposal v2)

2017-08-02 Thread Michael S. Tsirkin
On Tue, Aug 01, 2017 at 08:54:27PM -0700, Steven Luong wrote:
> * Descriptor ring:
> 
> Guest adds descriptors with unique index values and DESC_HW set in flags.
> Host overwrites used descriptors with correct len, index, and DESC_HW
> clear.  Flags are always set/cleared last.
> 
> #define DESC_HW 0x0080
> 
> struct desc {
>         __le64 addr;
>         __le32 len;
>         __le16 index;
>         __le16 flags;
> };
> 
> When DESC_HW is set, descriptor belongs to device. When it is clear,
> it belongs to the driver.
> 
> We can use 1 bit to set direction
> /* This marks a buffer as write-only (otherwise read-only). */
> #define VRING_DESC_F_WRITE      2
> 
> * Scatter/gather support
> 
> We can use 1 bit to chain s/g entries in a request, same as virtio 1.0:
> 
> /* This marks a buffer as continuing via the next field. */
> 
> 
> This comment here is confusing to me. In 1.0, virtq_desc has the next field.
> When the flag VRING_DESC_F_NEXT is set, the next entry to continue is 
> specified
> in the next field.
> 
> Here in 1.1, struct desc does not have the next field, only addr, len, index,
> and flags. So when VRING_DESC_F_NEXT is set in struct desc's flags field, 
> where
> is the next entry to continue the current descriptor, the entry immediately
> following the current entry? ie, if the current entry is at index 10 in the
> descriptor table and its flags is set for VRING_DESC_F_NEXT, is the entry
> continuing the current entry in index 11?
> 
> Steven

Exactly, you got it right.
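
To make the confirmed behaviour concrete, here is a small sketch of walking a chain in which the continuation is simply the next ring slot. It uses host-endian integer types instead of the __le types, and wrapping from the last slot back to slot 0 is an assumption that the thread does not spell out; chain_len() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VRING_DESC_F_NEXT 1 /* value from the proposal */

struct desc {
	uint64_t addr;
	uint32_t len;
	uint16_t index;
	uint16_t flags;
};

/* Count the entries in the chain that starts at ring slot `head`. */
static size_t chain_len(const struct desc *ring, size_t ring_size, size_t head)
{
	size_t n = 1;
	size_t pos = head;

	assert(ring_size > 0 && head < ring_size);
	/* Stop after ring_size entries so a corrupt ring cannot loop forever. */
	while ((ring[pos].flags & VRING_DESC_F_NEXT) && n < ring_size) {
		pos = (pos + 1) % ring_size; /* wrap-around is an assumption */
		n++;
	}
	return n;
}
```

With slot 10 flagged VRING_DESC_F_NEXT and slot 11 not flagged, the chain starting at slot 10 has two entries, which is the case asked about above.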


:::::::::BUSINESS PROPOSAL::::::

2017-06-01 Thread Johnson King & Co
Attn,

My name is Johnson King, the principal attorney of my law firm, Johnson King & 
Co. A deceased client, Mr. Henry, died in 2010 and left a sum a little above US$ 28 
million in his account here in Unity Bank Plc. Normally, banking procedures 
require that the bank declare the account forfeitable and transfer the 
proceeds to the Registry of Unclaimed Property for government use 8 years 
after the death of the deceased client.

The present situation made me contact you, given that you and my deceased 
client share the same last name and nationality, which favorably disposes me 
towards this proposal to present you as the cestui que trust and 
administrator of the account. It may also interest you to know that the 
transaction will be executed within the ambit of the law and nothing shall be done 
outside of it. If you are not familiar with estate and probate measures, I shall 
send further information to you concerning these once I get a positive 
response. We will discuss the ratios succinctly and set them out in a 
written, signed agreement before commencement.

I wish to submit that I would expect nothing less but honesty and transparency. 
I will uncover to you further information on the matter in our following 
communications. If this business interests you kindly revert with your direct 
phone number for further exhaustive phone talk.

I look forward to having a good business relationship with you.

Yours sincerely,
Johnson King & Co


repost: af_packet vs virtio (was packed ring layout proposal v2)

2017-04-13 Thread Michael S. Tsirkin
On Fri, Apr 14, 2017 at 05:42:58AM +0300, Michael S. Tsirkin wrote:
> Hi all, I wanted to raise the question of similarities between virtio
> and new zero copy af_packet interfaces.
> 
> First I would like to mention that virtio device development isn't spec
> limited - spec is there to help interoperability and add peace of mind
> for people worried about IPR.
> 
> So I tend to accept patches without requiring people write it up in the
> spec as work on spec proceeds at its own pace - all I ask is that the
> virtio mailing list is copied, this requires contributor to subscribe
> and in the process contributor promises that it's ok for us to add this
> to spec in the future.
> 
> There shouldn't thus be a fundamental problem preventing use of virtio
> format or reusing some of the code for af_packet, but it still might or
> might not make sense - it was designed for CPU to CPU communication so
> it seems to make sense though.  So I would like that discussion to
> happen even if we decide against.
> 
> And even if people decide against, the problem space is very similar.  You
> can look up packed ring layout proposal v2 - should I repost here?  Our
> prototyping shows significant performance improvements from using it as
> compared to head/tail layout.
> 
> To start this discussion I'm going to reply to this email reposting a
> copy of the simplified virtio layout that might be appropriate for
> af_packet as well.

Here's the repost (slightly cut down) sorry about the duplicates.

The idea is to have a r/w descriptor in a ring structure,
replacing the used and available ring, index and descriptor
buffer.

* Descriptor ring:

Guest adds descriptors with unique index values and DESC_HW set in flags.
Host overwrites used descriptors with correct len, index, and DESC_HW
clear.  Flags are always set/cleared last.

#define DESC_HW 0x0080

struct desc {
__le64 addr;
__le32 len;
__le16 index;
__le16 flags;
};

When DESC_HW is set, descriptor belongs to device. When it is clear,
it belongs to the driver.
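
A minimal sketch of the ownership handoff, assuming C11 atomics stand in for whatever barriers a real driver or device would use. The point it tries to show is the rule "flags are always set/cleared last": every other field is written first, then the flags word is published. The helper names are hypothetical and host-endian types replace the __le types.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DESC_HW 0x0080 /* value from the proposal */

struct desc {
	uint64_t addr;
	uint32_t len;
	uint16_t index;
	_Atomic uint16_t flags; /* written last, read first */
};

/* Driver side: fill the descriptor, then hand it to the device. */
static void driver_post(struct desc *d, uint64_t addr, uint32_t len,
			uint16_t index, uint16_t flags)
{
	assert(!(flags & DESC_HW)); /* ownership is set here, not by the caller */
	d->addr = addr;
	d->len = len;
	d->index = index;
	atomic_store_explicit(&d->flags, (uint16_t)(flags | DESC_HW),
			      memory_order_release);
}

/* Device side: true if the descriptor currently belongs to the device. */
static bool device_owns(struct desc *d)
{
	return atomic_load_explicit(&d->flags, memory_order_acquire) & DESC_HW;
}

/* Device side: record the used length, then give the descriptor back. */
static void device_complete(struct desc *d, uint32_t used_len)
{
	uint16_t f = atomic_load_explicit(&d->flags, memory_order_relaxed);

	d->len = used_len;
	atomic_store_explicit(&d->flags, (uint16_t)(f & ~DESC_HW),
			      memory_order_release);
}
```

The release store paired with the acquire load is one way to guarantee that whoever sees the flag change also sees the other fields; a real implementation would pick barriers appropriate to its transport.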

We can use 1 bit to set direction
/* This marks a buffer as write-only (otherwise read-only). */
#define VRING_DESC_F_WRITE  2

* Scatter/gather support

We can use 1 bit to chain s/g entries in a request, same as virtio 1.0:

/* This marks a buffer as continuing via the next field. */
#define VRING_DESC_F_NEXT   1

Unlike virtio 1.0, all descriptors must have distinct ID values.

Also unlike virtio 1.0, use of this flag will be an optional feature
(e.g. VIRTIO_F_DESC_NEXT) so both devices and drivers can opt out of it.

* Indirect buffers

Can be marked like in virtio 1.0:

/* This means the buffer contains a table of buffer descriptors. */
#define VRING_DESC_F_INDIRECT   4

Unlike virtio 1.0, this is a table, not a list:
struct indirect_descriptor_table {
/* The actual descriptors (16 bytes each) */
struct virtq_desc desc[len / 16];
};

The first descriptor is located at start of the indirect descriptor
table, additional indirect descriptors come immediately afterwards.
DESC_F_WRITE is the only valid flag for descriptors in the indirect
table. Others should be set to 0 and are ignored.  id is also set to 0
and should be ignored.
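
For illustration, the size of the indirect table and the flag masking can be sketched as below. Treating a length that is zero or not a multiple of 16 as invalid is an assumption (the text above only says len / 16), and indirect_count() and indirect_flags() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VRING_DESC_F_WRITE 2  /* value from the proposal */
#define DESC_SIZE          16 /* "16 bytes each" */

struct desc {
	uint64_t addr;
	uint32_t len;
	uint16_t index;
	uint16_t flags;
};

/* Number of descriptors in an indirect table of `len` bytes, 0 if invalid. */
static size_t indirect_count(uint32_t len)
{
	static_assert(sizeof(struct desc) == DESC_SIZE,
		      "descriptor must be 16 bytes");
	if (len == 0 || len % DESC_SIZE != 0)
		return 0;
	return len / DESC_SIZE;
}

/* Keep only the flag that is meaningful inside an indirect table. */
static uint16_t indirect_flags(uint16_t flags)
{
	return flags & VRING_DESC_F_WRITE;
}
```

Because the table is a flat array, entry i sits at byte offset i * 16 and no per-entry "next" lookup is needed.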

virtio 1.0 seems to allow a s/g entry followed by
an indirect descriptor. This does not seem useful,
so we do not allow that anymore.

This support would be an optional feature, same as in virtio 1.0

* Batching descriptors:

virtio 1.0 allows passing a batch of descriptors in both directions, by
incrementing the used/avail index by values > 1.  We can support this by
chaining a list of descriptors through a bit in the flags field.
To allow use together with s/g, a different bit will be used.

#define VRING_DESC_F_BATCH_NEXT 0x0010

Batching works for both driver and device descriptors.



* Processing descriptors in and out of order

Device processing all descriptors in order can simply flip
the DESC_HW bit as it is done with descriptors.

Device can write descriptors out in order as they are used, overwriting
descriptors that are there.

Device must not use a descriptor until DESC_HW is set.
It is only required to look at the first descriptor
submitted.

Driver must not overwrite a descriptor until DESC_HW is clear.
It is only required to look at the first descriptor
submitted.

* Device specific descriptor flags
We have a lot of unused space in the descriptor.  This can be put to
good use by reserving some flag bits for device use.
For example, network device can set a bit to request
that header in the descriptor is suppressed
(in case it's all 0s anyway). This reduces cache utilization.

Note: this feature can be supported in virtio 1.0 as well,
as we have unused bits in both descriptor and used ring there.

* Descriptor length in device descriptors

virtio 1.0 places strict requirements on descriptor length. For example
it must be 0 in use

Proposal...

2017-03-11 Thread Teresa Au


Personal business proposal for you. Contact me via my personal e-mail for more 
details: 
ms_teresa_a...@outlook.com



Business Proposal

2017-01-31 Thread QUATIF OIL GROUP OF COMPANIES



--
Dear Friend,

I would like to discuss a very important issue with you. I am writing  
to find out if this is your valid email. Please, let me know if this  
email is valid


Kind regards
Adrien Saif
Attorney to Quatif Group of Companies


Business Proposal

2017-01-21 Thread QUATIF GROUP OF COMPANIES

Dear Friend,

I would like to discuss a very important issue with you. I am writing to
find out if this is your valid email. Please, let me know if this email is
valid

Kind regards
Adrien Saif
Attorney to Quatif Group of Companies





Proposal

2016-11-17 Thread Teresa Au


Business partnership proposal for you. Contact me via my personal e-mail for 
further details: ms_teresa_a...@outlook.com



[PATCH net-next 14/15] rxrpc: Add tracepoint for ACK proposal

2016-09-23 Thread David Howells
Add a tracepoint to log proposed ACKs, including whether the proposal is
used to update a pending ACK or is discarded in favour of an earlier,
higher priority ACK.

Whilst we're at it, get rid of the rxrpc_acks() function and access the
name array directly.  We do, however, need to validate the ACK reason
number given to trace_rxrpc_rx_ack() to make sure we don't overrun the
array.
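
The validation mentioned above can be pictured as clamping the reason to a sentinel slot before indexing. This is a sketch with placeholder strings, not the table from net/rxrpc/misc.c; ack_names, clamp_ack_reason() and ack_name() are names made up for the example, and only the sentinel value 10 is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ACK__INVALID 10 /* same value as RXRPC_ACK__INVALID in the patch */

/* Placeholder names; the last slot represents any out-of-range reason. */
static const char *const ack_names[] = {
	"r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8", "r9", "-?-",
};

static_assert(sizeof(ack_names) / sizeof(ack_names[0]) == ACK__INVALID + 1,
	      "one name per reason plus the invalid slot");

/* Map any value received from the wire onto a valid array index. */
static uint8_t clamp_ack_reason(uint8_t reason)
{
	return reason < ACK__INVALID ? reason : ACK__INVALID;
}

/* Safe lookup: never reads past the end of ack_names. */
static const char *ack_name(uint8_t reason)
{
	return ack_names[clamp_ack_reason(reason)];
}
```

Clamping once, at the point where the value enters from the network, lets the trace output index the array directly afterwards.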

Signed-off-by: David Howells <dhowe...@redhat.com>
---

 include/rxrpc/packet.h   |1 +
 include/trace/events/rxrpc.h |   42 --
 net/rxrpc/ar-internal.h  |   25 +++--
 net/rxrpc/call_event.c   |   21 ++---
 net/rxrpc/input.c|   19 +--
 net/rxrpc/misc.c |   30 +++---
 net/rxrpc/output.c   |3 ++-
 net/rxrpc/recvmsg.c  |3 ++-
 8 files changed, 114 insertions(+), 30 deletions(-)

diff --git a/include/rxrpc/packet.h b/include/rxrpc/packet.h
index fd6eb3a60a8c..703a64b4681a 100644
--- a/include/rxrpc/packet.h
+++ b/include/rxrpc/packet.h
@@ -123,6 +123,7 @@ struct rxrpc_ackpacket {
 #define RXRPC_ACK_PING_RESPONSE    7   /* response to RXRPC_ACK_PING */
 #define RXRPC_ACK_DELAY            8   /* nothing happened since received packet */
 #define RXRPC_ACK_IDLE             9   /* ACK due to fully received ACK window */
+#define RXRPC_ACK__INVALID         10  /* Representation of invalid ACK reason */
 
uint8_t nAcks;  /* number of ACKs */
 #define RXRPC_MAXACKS  255
diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index 9413b17ba04b..d67a8c6b085a 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -251,7 +251,7 @@ TRACE_EVENT(rxrpc_rx_ack,
 
TP_printk("c=%p %s f=%08x n=%u",
  __entry->call,
- rxrpc_acks(__entry->reason),
+ rxrpc_ack_names[__entry->reason],
  __entry->first,
  __entry->n_acks)
);
@@ -314,7 +314,7 @@ TRACE_EVENT(rxrpc_tx_ack,
TP_printk(" c=%p ACK  %08x %s f=%08x r=%08x n=%u",
  __entry->call,
  __entry->serial,
- rxrpc_acks(__entry->reason),
+ rxrpc_ack_names[__entry->reason],
  __entry->ack_first,
  __entry->ack_serial,
  __entry->n_acks)
@@ -505,6 +505,44 @@ TRACE_EVENT(rxrpc_rx_lose,
  __entry->hdr.type <= 15 ? rxrpc_pkts[__entry->hdr.type] : "?UNK")
);
 
+TRACE_EVENT(rxrpc_propose_ack,
+   TP_PROTO(struct rxrpc_call *call, enum rxrpc_propose_ack_trace why,
+u8 ack_reason, rxrpc_serial_t serial, bool immediate,
+bool background, enum rxrpc_propose_ack_outcome outcome),
+
+   TP_ARGS(call, why, ack_reason, serial, immediate, background,
+   outcome),
+
+   TP_STRUCT__entry(
+   __field(struct rxrpc_call *,            call        )
+   __field(enum rxrpc_propose_ack_trace,   why         )
+   __field(rxrpc_serial_t,                 serial      )
+   __field(u8,                             ack_reason  )
+   __field(bool,                           immediate   )
+   __field(bool,                           background  )
+   __field(enum rxrpc_propose_ack_outcome, outcome     )
+),
+
+   TP_fast_assign(
+   __entry->call       = call;
+   __entry->why        = why;
+   __entry->serial     = serial;
+   __entry->ack_reason = ack_reason;
+   __entry->immediate  = immediate;
+   __entry->background = background;
+   __entry->outcome    = outcome;
+  ),
+
+   TP_printk("c=%p %s %s r=%08x i=%u b=%u%s",
+ __entry->call,
+ rxrpc_propose_ack_traces[__entry->why],
+ rxrpc_ack_names[__entry->ack_reason],
+ __entry->serial,
+ __entry->immediate,
+ __entry->background,
+ rxrpc_propose_ack_outcomes[__entry->outcome])
+   );
+
 #endif /* _TRACE_RXRPC_H */
 
 /* This part must be outside protection */
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index e564eca75985..042dbcc52654 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -689,8 +689,28 @@ enum rxrpc_timer_trace {
 
 e

BUSINESS PROPOSAL!!!

2016-07-20 Thread a . victima . lara
I am Mr. Saeed Bin Salem, Executive Director and Chief Financial Officer of the 
National Commercial Bank Libya. I have a secured business suggestion for you. 
Reply to me on my email: saeedbi...@qq.com


Proposal

2016-07-16 Thread Teresa Au


I have a business proposal for you. Contact me via my personal e-mail for more 
details: 
ms_teresa_a...@outlook.com



Fwd: Investment Proposal

2016-05-31 Thread Dr. Daniel Mminele


Business proposal: view the attached letter for more details.


CONFIDENTIAL INVESTMENT  PROPOSAL.doc
Description: MS-Word document


(Re: BUSINESS PROPOSAL!!!)

2016-03-09 Thread Mr. Anthony Oke
Email Address (anthony.o...@aol.com)
   Confidential Business Proposal!
 
 
I am Mr. Anthony Oke. I have a business proposal which will benefit both of us. 
The amount of money involved is (Thirty Five Million Great British Pounds), 
which I want to transfer from an abandoned account to your bank account; it is 
100% risk free.
 
 
Upon the conclusion of this transaction, I accept that (50%) Fifty Percent 
will be for you in respect of all your assistance for this transaction and (50%) 
Fifty Percent will be for me, being the pioneer of the business.
 
 
A lot of customers open private accounts with different Banks without the 
knowledge of their families and when they die, such money will be lost to the 
Bank unless someone comes to claim it.
 
 
This is how a lot of Bank Directors make so much money silently.
I would like you to provide the information below immediately, to enable me 
to use it and get you a next of kin application form from my bank.
 
 
1.Full Name:..
2.Full Address:...
3.Telephone Number:...
4.Country:
5.Occupation:.
6.Age:
7.Sex:
 
 
As soon as you reply through this private Email Address 
(anthony.o...@aol.com),I will let you know the next steps and procedures and 
more details to follow in order to finalize this transaction immediately.
 
 
Please keep whatever information you get from me strictly confidential even if 
you decide not to participate in this transaction.
 
 
Yours faithfully,
Mr. Anthony Oke


Fund Transaction Proposal

2016-03-07 Thread Teresa Au


US$23,200,000.00 transaction; for further details contact me via my 
personal e-mail: ms_teresa_...@outlook.com



Proposal for per-radio configuration file.

2016-03-03 Thread Ben Greear

Hello!

While working on ath10k NICs, I found a need to have one radio be configured
in one manner, and another in a different manner, and I need this config to
happen before the NIC is booted in at least some cases.  The primary reason is
that the NIC has limited resources, so there is definite need to allow the user
to optimize for their use case.  For instance, more vdevs vs more peer objects.

Module parameters do not work well for this because I want different NICs with
the same driver to have different configuration.

For ath10k, I implemented this with a text file for each NIC that is loaded
with the firmware-load API, parsed in the kernel, and then used to configure
the radio on bootup.

A patch I used to do this is here.  I think I ended up with a few follow-on
patches to fix some bugs, but this has the idea:

http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=6708e4047d91edf234239943332bc2f0d124d009

It seems to work fine in my testing, and is logically similar to loading board
init files and so forth (which ath10k already uses).

I am looking for feedback on this if anyone has any opinions.


The config files look like this:

]# ls -l /lib/firmware/ath10k/
total 12
-rw-r--r--  1 root root  311 Feb 23 11:27 fwcfg-pci-:05:00.0.txt
-rw-r--r--  1 root root  330 Feb 23 11:18 fwcfg-pci-:07:00.0.txt
drwxr-xr-x.  3 root root 4096 May 21  2015 QCA988X

]# cat /lib/firmware/ath10k/fwcfg-pci-:05:00.0.txt
# Configuration for radio 1
vdevs = 64
peers = 127
stations = 127
rate_ctrl_objs = 36
regdom = 840
fwname = firmware-2.bin
fwver = 2
nohwcrypt = 1
tx_desc = 680
#max_nss = 3
num_tids = 256
skid_limit = 128

[root@ben-ota-1 lanforge]# cat /lib/firmware/ath10k/fwcfg-pci-\:07\:00.0.txt
# Configuration for radio 2
# Driver will pick defaults for any commented-out or missing variables.

# vdevs = 8
# peers = 64
# stations = 64
# rate_ctrl_objs = 10
# regdom = 840
# fwname = firmware-5.bin
# fwver = 5
# nohwcrypt = 1
# tx_desc = 1024
#max_nss = 3
# num_tids = 128
# skid_limit = 32
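
For readers who want to picture the file format, a userspace sketch of parsing one such line follows. It is not the code from the patch linked above: parse_cfg_line() is a hypothetical helper, and the rules (leading '#' means comment, whitespace around key and value is trimmed) are inferred from the two example files.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Parse one "key = value" line. Returns true and fills key/val on success;
 * returns false for blank lines, comment lines and malformed input.
 */
static bool parse_cfg_line(const char *line, char *key, size_t keysz,
			   char *val, size_t valsz)
{
	const char *eq;
	size_t klen, vlen;

	assert(line && key && val && keysz > 0 && valsz > 0);
	while (isspace((unsigned char)*line))
		line++;
	if (*line == '\0' || *line == '#')
		return false; /* blank or commented out: driver default applies */
	eq = strchr(line, '=');
	if (!eq)
		return false;

	/* key: text before '=', trailing whitespace trimmed */
	klen = (size_t)(eq - line);
	while (klen > 0 && isspace((unsigned char)line[klen - 1]))
		klen--;

	/* value: text after '=', surrounding whitespace trimmed */
	eq++;
	while (isspace((unsigned char)*eq))
		eq++;
	vlen = strlen(eq);
	while (vlen > 0 && isspace((unsigned char)eq[vlen - 1]))
		vlen--;

	if (klen == 0 || klen >= keysz || vlen >= valsz)
		return false;
	memcpy(key, line, klen);
	key[klen] = '\0';
	memcpy(val, eq, vlen);
	val[vlen] = '\0';
	return true;
}
```

A caller would feed each line of the per-radio file through this and fall back to the driver default for any key that never shows up.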



Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Business Proposal of $12.8m USD

2015-12-05 Thread Song
Dear Friend,
I am Song Chen. I have a business proposal of $12.8m USD for you 
to handle with me from my bank. Contact me for more information
(mr.songchen...@hotmail.com)
Regards,
Mr Song Chen.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


I HAVE A PROPOSAL FOR YOU?

2015-10-21 Thread PROPOSAL
Good day

I am Major Alan Edward. I have a proposal for you. Please do get back to me for
more details.

Regards,
Major Alan Edward.


a question about the kcm proposal

2015-10-12 Thread Sowmini Varadhan
Thinking back a bit about the kcm proposal:
 https://www.mail-archive.com/netdev@vger.kernel.org/msg78696.html
I had a question:

If the user-space  has decided to encrypt the http/2 header using tls,
the len (and other http/2 fields) is no longer in the clear for the kernel.

My understanding is that http header encryption is common practice/BCP,
since the http hdr may contain a lot of identity, session and tenancy data.
If that's true, then wouldn't this break the BPF/kcm assumptions? 

There is a different but related problem in this space- existing TLS/DTLS
libraries (openssl, gnutls etc) only know how to work with tcp
or udp sockets - they do not know anything about PF_RDS or the
newly proposed kcm socket type.

In theory, it is possible to extend these libraries to handle
RDS/kcm etc, but (as we found out with RDS and IP_PKTINFO/BINDTODEVICE),
some things become tricky because of the many-to-one dgram-over-stream
hybrid.

I've looked at  IPSEC/IKE in transport mode for RDS on the kernel tcp
socket as we discussed at Plumbers in August, and that has some costs..
would be interesting to evaluate against other options..

--Sowmini




Re: a question about the kcm proposal

2015-10-12 Thread Tom Herbert
>
> If the user-space  has decided to encrypt the http/2 header using tls,
> the len (and other http/2 fields) is no longer in the clear for the kernel.
>
> My understanding is that http header encryption is common practice/BCP,
> since the http hdr may contain a lot of identity, session and tenancy data.
> If that's true, then wouldn't this break the BPF/kcm assumptions?
>

Right, if data is encrypted then we can't do message delineation on
receive. KCM wouldn't help much on transmit either since the crypto
state would need to be shared. The solution is to move TLS into the
kernel.
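
To illustrate what message delineation needs from the byte stream, here is a sketch of reading the length out of a cleartext HTTP/2 frame header (9 bytes, of which the first three are a big-endian payload length). h2_frame_size() is a hypothetical helper and is not part of the KCM proposal; once the stream is TLS ciphertext, the bytes at this position no longer carry the length, which is the problem raised above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define H2_FRAME_HDR_LEN 9 /* HTTP/2 frame header size */

/* Return the total size (header plus payload) of the frame that starts at
 * buf, or 0 if fewer than 9 bytes are available and the length is not yet
 * known.
 */
static size_t h2_frame_size(const uint8_t *buf, size_t avail)
{
	assert(buf != NULL || avail == 0);
	if (avail < H2_FRAME_HDR_LEN)
		return 0;
	return H2_FRAME_HDR_LEN +
	       (((size_t)buf[0] << 16) | ((size_t)buf[1] << 8) | (size_t)buf[2]);
}
```

A parser attached to the socket would return this size so the kernel knows where one message ends and the next begins.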

> There is a different but related problem in this space- existing TLS/DTLS
> libraries (openssl, gnutls etc) only know how to work with tcp
> or udp sockets - they do not know anything about PF_RDS or the
> newly proposed kcm socket type.
>
TLS-in-kernel would be a lower layer so it shouldn't have to know
anything about RDS or KCM. If it makes sense, KCM could be used for
parsing TLS records themselves...

> In theory, it is possible to extend these libraries to handle
> RDS/kcm etc, but (as we found out with RDS and IP_PKTINFO/BINDTODEVICE),
> some things become tricky because of the many-to-one dgram-over-stream
> hybrid.
>
> I've looked at  IPSEC/IKE in transport mode for RDS on the kernel tcp
> socket as we discussed at Plumbers in August, and that has some costs..
> would be interesting to evaluate against other options..
>
The design of TLS in the kernel is that it will be enabled on the TCP
socket, so that receive and transmit path are below RDS and KCM. We
have the transmit path for TLS-in-kernel running with good preliminary
results, we will post that at least as RFC shortly. Receive side still
seems to be feasible.

Thanks,
Tom

> --Sowmini
>
>


Re: a question about the kcm proposal

2015-10-12 Thread Sowmini Varadhan
On (10/12/15 15:05), Tom Herbert wrote:
> > There is a different but related problem in this space- existing TLS/DTLS
> > libraries (openssl, gnutls etc) only know how to work with tcp
> > or udp sockets - they do not know anything about PF_RDS or the
> > newly proposed kcm socket type.
> >
> TLS-in-kernel would be a lower layer so it shouldn't have to know
> anything about RDS or KCM. If it makes sense, KCM could be used for
> parsing TLS records themselves...

I wouldn't quite jump to that conclusion just yet though :-)

There are a lot of alternatives: you could have a uspace module
that shims between the application and kcm (even something that gets
LD_PRELOADed) and adds the right kcm header as needed. Or you
could use ipsec/ike.

TLS in the kernel can be quite complex, and history shows that it
can easily become hard to maintain: uspace TLS (both the protocol itself
and the negotiated crypto) tends to move much faster than kernel changes
(at least that's what the 10+ year long solaris-kssl experiment found).

There is another aspect to this: in the DB world, for example,
I might seriously care about encrypting my payroll-database, but not
care so much about the christmas-potluck-database. Thus allowing the
uspace to select when (and what type of crypto algo) to use is a flexibility
offered by TLS that a "kernel-TLS" would have a hard time matching.

> The design of TLS in the kernel is that it will be enabled on the TCP
> socket, so that receive and transmit path are below RDS and KCM. We
> have the transmit path for TLS-in-kernel running with good preliminary
> results, we will post that at least as RFC shortly. Receive side still
> seems to be feasible.

yes, please share.

TLS does complex things like mid-session CCS. Such things can result
in a lot of asynchrony in the kernel. Given that ipsec has already crossed
that bridge, I, for one, would like to understand the trade-offs.

The question in my mind is "how does this match up with
transport mode ipsec/ike", and if it does not, why not? The only
difference (in theory) is whether you do encryption before, or after,
adding the transport (tcp/udp) header, so if there is a big perf gap,
we need to understand why.

--Sowmini


Re: [net-next 0/16] Proposal for VRF-lite - v3

2015-07-28 Thread David Ahern

On 7/27/15 2:30 PM, Eric W. Biederman wrote:

> This paragraph is false when it comes to sockets, as I have already
> pointed out.
>
> - VPN Routing and Forwarding (RFC4364 and its kin) implies isolation
>    strong enough to allow using the same ip on different machines
>    in different VPN instances and not have confusion.
>
> - The routing table is not the only table in the kernel that uses
>    an ip address as a key.
>
>    The result is that you can combine packet fragments that come in
>    on different interfaces (irrespective of your VPN), confuse tcp
>    parameters between interfaces, scramble your ipsec connections and I
>    don't know what else.


The duplicate IP address is a problem with the networking stack today; 
the VRF device does not introduce it. The VRF device does allow 
duplicate IP addresses within a namespace but separate VRFs, though yes 
various places that rely solely on source address like IP fragmentation 
do need to be fixed.


I looked at the IPv4 fragmentation code yesterday and will continue
today. So help me with the history: is there any reason why the device
index is not used today? It seems like a straightforward change.


1. simple netdevices with the same IP address
-- no problem using index in the lookup

2. 2 ipsec tunnels -- different netdevices, same IP address
-- no problem using index

3. stacked devices like bonding and team interfaces appear to the stack 
as a single device

-- no problem using index of stacked device

4. If an interface is deleted and a new one is created with the same IP 
address then we want to fail the lookup

-- no problem using index

5. other???

Is there a use case where I can't add ifindex of the incoming device (or
higher level device if skb->dev is changed) to the hash and lookup for
fragments?




> > Version 3
> > - addressed comments from first 2 RFCs with the exception of the name
> >   Nicolas: We will do the name conversion once we agree on what the
> >   correct name should be (vrf, mrf or something else)
>
> Not so.  I described the deep problems between your goals and your
> implementation and they are not even mentioned let alone addressed.


I have addressed comments to the extent that I can. As I stated in my 
last followup to you Eric I did not understand your point. I asked for 
clarification, a --verbose if you will. I can't read your mind, so I 
need you to elaborate on your points to be able to respond and address 
your concerns.





> > -  packets flow through the VRF device in both directions allowing the
> >    following:
> >    - tcpdump -i vrfn
> >    - tc rules on vrf device
> >    - netfilter rules on vrf device
> >
> > Ingo/Andy: I added you two as a start point for the proposed task related
> >    changes. Not sure who should be the reviewer; please let me know
> >    if someone else is more appropriate. Thanks.
>
> It looks like you are trying to implement a namespace that isn't a
> namespace.  Given that it is broken by design you have my nack.


This is an L3 separation within a namespace, not a device level 
separation which is what namespaces provide.


David


Re: [net-next 0/16] Proposal for VRF-lite - v3

2015-07-28 Thread Eric W. Biederman
David Ahern d...@cumulusnetworks.com writes:

 On 7/27/15 2:30 PM, Eric W. Biederman wrote:
 This paragraph is false when it comes to sockets, as I have already
 pointed out.

 - VPN Routing and Forwarding (RFC4364 and its kin) implies isolation
   strong enough to allow using the same ip on different machines
   in different VPN instances and not have confusion.

 - The routing table is not the only table in the kernel that uses
   an ip address as a key.

   The result is that you can combine packet fragments that come in
   on different interfaces (irrespective of your VPN), confuse tcp
   parameters between interfaces, scramble your ipsec connections and I
   don't know what else.

 The duplicate IP address is a problem with the networking stack today; the VRF
 device does not introduce it. The VRF device does allow duplicate IP addresses
 within a namespace but separate VRFs, though yes various places that rely solely
 on source address like IP fragmentation do need to be fixed.

No. The same IP address being used by different machines is not a
problem with the IP stack today.  IP addresses are defined to be
globally unique.

At the point you introduce VPNs/VRFs you introduce duplicate IP
addresses and then the code needs to cope.

As such I think there is a deep mismatch between the semantics of
BINDTODEVICE and VRFs because BINDTODEVICE by definition does not worry
about duplicate IP addresses.

Which means that you can't just reuse the BINDTODEVICE infrastructure.
It is fundamentally insufficient to the task.  So as you are discovering
you have to invent something new.

That new thing needs a definition.  Maybe the new thing makes sense and
you can just slice off a chunk of a network namespace.  Maybe you go
through and change all of the code.

 I looked at the IPv4 fragmentation code yesterday and will continue today. So
 help me with the history: is there any reason why the device index is not used
 today? It seems like a straight forward change.

Sigh.  I would have hoped someone dealing with routing issues would have
seen this at a glance.  The reason is multi-path reception of fragments.

Adding the device index into the fragment reassembly logic would break
fragment reassembly when fragments of the same packet come into a
machine on different network devices.  Given that only the first
fragment has port numbers I can easily see network path selection code
hashing fragments onto different paths through the network.
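
A toy sketch of the failure mode, in Python rather than kernel C and purely
for illustration. The Reassembler class and its parameters are hypothetical;
real reassembly also tracks offsets, overlaps and timers. It keys the
fragment queue on (saddr, daddr, id, protocol) and optionally adds the
receiving device index:

```python
from collections import defaultdict


class Reassembler:
    """Toy fragment queue table keyed like IPv4 reassembly."""

    def __init__(self, key_on_ifindex: bool):
        self.key_on_ifindex = key_on_ifindex
        self.queues = defaultdict(dict)  # key -> {fragment number: data}

    def receive(self, ifindex, saddr, daddr, ip_id, proto, frag_no, total, data):
        """Store one fragment; return the payload once all fragments arrived."""
        key = (saddr, daddr, ip_id, proto)
        if self.key_on_ifindex:
            # The change under discussion: fragments seen on different
            # devices now end up in different queues.
            key += (ifindex,)
        queue = self.queues[key]
        queue[frag_no] = data
        if len(queue) < total:
            return None  # still waiting for the remaining fragments
        del self.queues[key]
        return b"".join(queue[i] for i in range(total))


def deliver_over_two_paths(key_on_ifindex: bool):
    """Send two fragments of one datagram in on ifindex 2 and ifindex 3."""
    r = Reassembler(key_on_ifindex)
    r.receive(2, "10.0.1.2", "10.0.1.1", 42, 17, 0, 2, b"abc")
    return r.receive(3, "10.0.1.2", "10.0.1.1", 42, 17, 1, 2, b"def")
```

With key_on_ifindex=False both fragments land in one queue and the datagram
completes. With key_on_ifindex=True they land in two queues and neither one
ever completes.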

 Is there a use case where I can't add ifindex of the incoming device (or higher
 level device if skb->dev is changed) to the hash and lookup for fragments?

As detailed above.  That breaks fragment reassembly on multiple paths.

 Version 3
 - addressed comments from first 2 RFCs with the exception of the name
Nicolas: We will do the name conversion once we agree on what the
 correct name should be (vrf, mrf or something else)

 Not so.  I described the deep problems between your goals and your
 implementation and they are not even mentioned let alone addressed.

 I have addressed comments to the extent that I can. As I stated in my last
 followup to you Eric I did not understand your point. I asked for clarification,
 a --verbose if you will. I can't read your mind, so I need you to elaborate on
 your points to be able to respond and address your concerns.

Hopefully this helps.

Everything we are talking about follows from what I said at the outset.
You are introducing the idea of having the same ip address refer to
different network destinations depending upon context.  Outside of
network namespaces that concept is new and it breaks a lot of
assumptions.

The entire network stack is too large to fit in my head.  I don't know
every place where ip addresses are used as part of the index into a
table.  It is incumbent on the implementer of a new feature to figure out
how to introduce such a concept safely.  I don't see that happening with
VRF-lite.

Pretty fundamentally a network device index is insufficient for your needs.

 -  packets flow through the VRF device in both directions allowing the
 following:
 - tcpdump -i vrfn
 - tc rules on vrf device
 - netfilter rules on vrf device

 Ingo/Andy: I added you two as a start point for the proposed task related
 changes. Not sure who should be the reviewer; please let me know
 if someone else is more appropriate. Thanks.

 It looks like you are trying to implement a namespace that isn't a
 namespace.  Given that it is broken by design you have my nack.

 This is an L3 separation within a namespace, not a device level separation which
 is what namespaces provide.

Not my meaning.  I was not talking about network namespaces and how your
vrf is almost but not completely the same as a network namespace.

What I was talking about is that you are implementing something that is
used roughly the same way as the other namespaces pid, mount, ipc, net,
uts, etc.  As the 

[net-next 0/16] Proposal for VRF-lite - v3

2015-07-27 Thread David Ahern
In the context of internet scale routing a requirement that always comes
up is the need to partition the available routing tables into disjoint
routing planes. A specific use case is the multi-tenancy problem where
each tenant has their own unique routing tables and at the very least
needs different default gateways.

This patch set adds the ability to create virtual router domains, aka VRFs
(VRF-lite to be specific), in the linux packet forwarding stack. The main
observation is that through the use of rules and socket binding to interfaces,
all the facilities that we need are already present in the infrastructure. What
is missing is a handle that identifies a routing domain and can be used to
gather applicable rules/tables and uniqify neighbor selection. The scheme used
needs to preserve the notions of ECMP and general routing principles.

This driver is a cross between functionality that the IPVLAN driver
and the Team drivers provide where a device is created and packets
into/out of the routing domain are shuttled through this device. The
device is then used as a handle to identify the applicable rules. The
VRF device is thus the layer3 equivalent of a vlan device.

The very important point to note is that this is only a Layer3 concept,
so L2 tools (e.g., LLDP) do not need to be run in each VRF; processes can
run in unaware mode or select a VRF to talk through. Also the
behavioral model is a generalized application of the familiar VRF-Lite
model with some performance paths that need optimization. (Specifically
the output route selector that Roopa, Robert, Thomas and EricB are
currently discussing on the MPLS thread)

High Level points
=================
1. Simple overlay driver (minimal changes to current stack)
   * uses the existing fib tables and fib rules infrastructure
2. Modelled closely after the ipvlan driver
3. Uses current API and infrastructure.
   * Applications can use SO_BINDTODEVICE or cmsg device identifiers
     to pick VRF (ping, traceroute just work)
   * Standard IP Rules work, and since they are aggregated against the
 device, scale is manageable
4. Completely orthogonal to Namespaces and only provides separation in
   the routing plane (and ARP)

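                                                 N2
           N1 (all configs here)          +---------------+
    +--------------+                      |               |
    |swp1 :10.0.1.1+----------------------+swp1 :10.0.1.2 |
    |              |                      |               |
    |swp2 :10.0.2.1+----------------------+swp2 :10.0.2.2 |
    |              |                      +---------------+
    |    VRF 1     |
    |   table 5    |
    |              |
    +--------------+
    |              |
    |    VRF 2     |                             N3
    |   table 6    |                      +---------------+
    |              |                      |               |
    |swp3 :10.0.2.1+----------------------+swp1 :10.0.2.2 |
    |              |                      |               |
    |swp4 :10.0.3.1+----------------------+swp2 :10.0.3.2 |
    +--------------+                      +---------------+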


Given the topology above, the setup needed to get the basic VRF
functions working would be

Create the VRF devices and associate with a table
ip link add vrf1 type vrf table 5
ip link add vrf2 type vrf table 6

Install the lookup rules that map table to VRF domain
ip rule add pref 200 oif vrf1 lookup 5
ip rule add pref 200 iif vrf1 lookup 5
ip rule add pref 200 oif vrf2 lookup 6
ip rule add pref 200 iif vrf2 lookup 6

ip link set vrf1 up
ip link set vrf2 up

Enslave the routing member interfaces
ip link set swp1 master vrf1
ip link set swp2 master vrf1
ip link set swp3 master vrf2
ip link set swp4 master vrf2

Connected routes are automatically moved from main table to the VRF
table.

ping using VRF 1 is simply
ping -I vrf1 10.0.1.2

Or, using the task context and a command such as the example chvrf in
patch 15, unmodified applications are run in a VRF context using:
   chvrf -v 1 ping 10.0.1.2
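
For an application that selects the VRF itself instead of relying on
ping -I or chvrf, binding the socket to the VRF device could look roughly
like the sketch below. It is illustrative only: device_option and
bind_to_vrf are hypothetical helper names, the device name passed in is
assumed to be a VRF device created as in the setup above, and the caller
needs CAP_NET_RAW.

```python
import socket

IFNAMSIZ = 16  # Linux interface name limit, including the trailing NUL

# Fall back to the common Linux value if this Python build lacks the constant.
SO_BINDTODEVICE = getattr(socket, "SO_BINDTODEVICE", 25)


def device_option(name: str) -> bytes:
    """Encode an interface name as the NUL-terminated option value."""
    raw = name.encode("ascii")
    if not raw or len(raw) >= IFNAMSIZ:
        raise ValueError(f"invalid interface name: {name!r}")
    return raw + b"\0"


def bind_to_vrf(sock, vrf_name: str) -> None:
    """Bind sock to the named device so its lookups use that VRF's table."""
    sock.setsockopt(socket.SOL_SOCKET, SO_BINDTODEVICE, device_option(vrf_name))
```

A UDP client would create its socket as usual, call bind_to_vrf(sock, "vrf1")
before the first sendto(), and leave the rest of its code unchanged.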


Design Highlights
=================
If a device is enslaved to a VRF device (i.e., associated with a VRF)
then:
1. Rx path
   The master device index is used as the iif for all lookups.

2. Tx path
   Similarly, for Tx the VRF device oif is used in the flow to direct
   lookups to the table associated with the VRF via its rule. From there
   the FLOWI_FLAG_VRFSRC flag is used to indicate that the oif should
   not be used for FIB table lookups.

3. Connected and local routes
   On link up for a device, connected and local routes are added to the
   table associated with the VRF device, rather than the local and main
   tables.

4. Socket lookups
   Socket lookups use the VRF device for comparison with sk_bound_dev_if.
   If a socket is not bound to a device a socket match can happen based
   on destination address, port and protocol in which case a VRF global
   or agnostic 

Re: [net-next 0/16] Proposal for VRF-lite - v3

2015-07-27 Thread Eric W. Biederman
David Ahern d...@cumulusnetworks.com writes:

 In the context of internet scale routing a requirement that always comes
 up is the need to partition the available routing tables into disjoint
 routing planes. A specific use case is the multi-tenancy problem where
 each tenant has their own unique routing tables and in the very least
 need different default gateways.

 This patch allows the ability to create virtual router domains (aka VRFs
 (VRF-lite to be specific) in the linux packet forwarding stack. The main
 observation is that through the use of rules and socket binding to interfaces,
 all the facilities that we need are already present in the infrastructure. What
 is missing is a handle that identifies a routing domain and can be used to
 gather applicable rules/tables and uniqify neighbor selection. The scheme used
 needs to preserves the notions of ECMP, and general routing principles.

This paragraph is false when it comes to sockets, as I have already
pointed out.

- VPN Routing and Forwarding (RFC4364 and its kin) implies isolation
  strong enough to allow using the same ip on different machines
  in different VPN instances and not have confusion.

- The routing table is not the only table in the kernel that uses
  an ip address as a key.

  The result is that you can combine packet fragments that come in
  on different interfaces (irrespective of your VPN), confuse tcp
  parameters between interfaces, scramble your ipsec connections and I
  don't know what else.

Binding a socket to a network device is not strong enough to do what you
want to do and it will lead to subtle bugs that can be triggered by
accident or by hostile actors.

If these kinds of limitations are well documented and it is specified
that these kinds of problems can occur with your socket code there may
be a place for this code somewhere.

However, as described, your code is wrong and fundamentally broken.

 Version 3
 - addressed comments from first 2 RFCs with the exception of the name
   Nicolas: We will do the name conversion once we agree on what the
correct name should be (vrf, mrf or something else)

Not so.  I described the deep problems between your goals and your
implementation and they are not even mentioned let alone addressed.

 -  packets flow through the VRF device in both directions allowing the
following:
- tcpdump -i vrfn
- tc rules on vrf device
- netfilter rules on vrf device

 Ingo/Andy: I added you two as a start point for the proposed task related
changes. Not sure who should be the reviewer; please let me know
if someone else is more appropriate. Thanks.

It looks like you are trying to implement a namespace that isn't a
namespace.  Given that it is broken by design you have my nack.

Nacked-by: Eric W. Biederman ebied...@xmission.com

Eric


Re: [RFC net-next 0/6] Proposal for VRF-lite - v2

2015-07-09 Thread Scott Feldman
On Mon, Jul 6, 2015 at 8:03 AM, David Ahern d...@cumulusnetworks.com wrote:
 In the context of internet scale routing a requirement that always
 comes up is the need to partition the available routing tables into
 disjoint routing planes. A specific use case is the multi-tenancy
 problem where each tenant has their own unique routing tables and in
 the very least need different default gateways.

Based on this problem statement, netns would be the answer: to
partition the physical router into N virtual routers.  If routing is
offloaded, the offload device is netns-aware to preserve the
partitioning down to the HW level.

I see from earlier discussions on VRF that netns is no good because
it's an inefficient use of resources.  I wonder if that's true in a
practical way?  If I have a 48-port router, I could create 24 2-port
virtual routers using netns, each running routing stuff (bgp, lldp,
ospf, etc).  Is the netns overhead plus the routing sw duplication not
going to fit on a Cumulus-class router?

In other words, if no one had ever heard of VRF, we'd conclude netns
given the problem statement.  And then focus on inefficiencies in
netns, if the implementation didn't fit a particular target.

So my C in RFC is: what's wrong with using netns?  And can those wrongs be fixed?


Re: [RFC net-next 0/6] Proposal for VRF-lite - v2

2015-07-08 Thread Nicolas Dichtel

On 06/07/2015 19:53, Shrijeet Mukherjee wrote:

> No no problem,
>
> Just trying to get the functional aspects worked out. the global
> search replace will be easy.
>
> Was hoping to see some more responses on the naming suggestions here
> from the community. If there is not disagreement we can spin patches
> with MRF as the name.

For me, it's ok.


Re: [RFC net-next 0/6] Proposal for VRF-lite - v2

2015-07-06 Thread Shrijeet Mukherjee
No, no problem.

Just trying to get the functional aspects worked out. The global
search-and-replace will be easy.

Was hoping to see some more responses on the naming suggestions here
from the community. If there is no disagreement we can spin patches
with MRF as the name.


On Mon, Jul 6, 2015 at 8:40 AM, Nicolas Dichtel
nicolas.dich...@6wind.com wrote:
 On 06/07/2015 17:03, David Ahern wrote:

 In the context of internet scale routing a requirement that always
 comes up is the need to partition the available routing tables into
 disjoint routing planes. A specific use case is the multi-tenancy
 problem where each tenant has their own unique routing tables and in
 the very least need different default gateways.

 This is an attempt to build the ability to create virtual router
 domains aka VRF's (VRF-lite to be specific) in the linux packet
 forwarding stack. The main observation is that through the use of
 rules and socket binding to interfaces, all the facilities that we
 need are already present in the infrastructure. What is missing is a
 handle that identifies a routing domain and can be used to gather
 applicable rules/tables and uniqify neighbor selection. The scheme
 used needs to preserves the notions of ECMP, and general routing
 principles.

 [snip]

   drivers/net/vrf.c | 486 ++

 [snip]

 I'm still opposed to name this 'vrf', see the v1 thread:
  - http://www.spinics.net/lists/netdev/msg332357.html
  - http://www.spinics.net/lists/netdev/msg332376.html

 Shrijeet seemed to agree to rename it, is there a problem?


 Regards,
 Nicolas


[RFC net-next 0/6] Proposal for VRF-lite - v2

2015-07-06 Thread David Ahern
In the context of internet scale routing a requirement that always
comes up is the need to partition the available routing tables into
disjoint routing planes. A specific use case is the multi-tenancy
problem where each tenant has their own unique routing tables and at
the very least needs different default gateways.

This is an attempt to build the ability to create virtual router
domains, aka VRFs (VRF-lite to be specific), in the linux packet
forwarding stack. The main observation is that through the use of
rules and socket binding to interfaces, all the facilities that we
need are already present in the infrastructure. What is missing is a
handle that identifies a routing domain and can be used to gather
applicable rules/tables and uniqify neighbor selection. The scheme
used needs to preserve the notions of ECMP and general routing
principles.

This driver is a cross between functionality that the IPVLAN driver
and the Team drivers provide where a device is created and packets
into/out of the routing domain are shuttled through this device. The
device is then used as a handle to identify the applicable rules. The
VRF device is thus the layer3 equivalent of a vlan device.

The very important point to note is that this is only a Layer3 concept,
so LLDP-like tools do not need to be run in each VRF; processes can
run in unaware mode or select a VRF to talk through. Also the
behavioral model is a generalized application of the familiar VRF-Lite
model with some performance paths that need optimization. (Specifically
the output route selector that Roopa, Robert, Thomas and EricB are
currently discussing on the MPLS thread)

High Level points
=================
1. Simple overlay driver (minimal changes to current stack)
   * uses the existing fib tables and fib rules infrastructure
2. Modelled closely after the ipvlan driver
3. Uses current API and infrastructure.
   * Applications can use SO_BINDTODEVICE or cmsg device identifiers
     to pick VRF (ping, traceroute just work)
   * Standard IP Rules work, and since they are aggregated against the
 device, scale is manageable
4. Completely orthogonal to Namespaces and only provides separation in
   the routing plane (and ARP)

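                                                 N2
           N1 (all configs here)          +---------------+
    +--------------+                      |               |
    |swp1 :10.0.1.1+----------------------+swp1 :10.0.1.2 |
    |              |                      |               |
    |swp2 :10.0.2.1+----------------------+swp2 :10.0.2.2 |
    |              |                      +---------------+
    |    VRF 1     |
    |   table 5    |
    |              |
    +--------------+
    |              |
    |    VRF 2     |                             N3
    |   table 6    |                      +---------------+
    |              |                      |               |
    |swp3 :10.0.2.1+----------------------+swp1 :10.0.2.2 |
    |              |                      |               |
    |swp4 :10.0.3.1+----------------------+swp2 :10.0.3.2 |
    +--------------+                      +---------------+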


Given the topology above, the setup needed to get the basic VRF
functions working would be

Create the VRF devices and associate with a table
ip link add vrf1 type vrf table 5
ip link add vrf2 type vrf table 6

Install the lookup rules that map table to VRF domain
ip rule add pref 200 oif vrf1 lookup 5
ip rule add pref 200 iif vrf1 lookup 5
ip rule add pref 200 oif vrf2 lookup 6
ip rule add pref 200 iif vrf2 lookup 6

ip link set vrf1 up
ip link set vrf2 up

Enslave the routing member interfaces
ip link set swp1 master vrf1
ip link set swp2 master vrf1
ip link set swp3 master vrf2
ip link set swp4 master vrf2

In this version connected routes are automatically moved from main table
to VRF table.
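
The same sequence can be generated for an arbitrary VRF-to-table mapping.
The sketch below only builds the command strings shown above (it does not
run them). vrf_setup_commands and the TOPOLOGY dict are names invented for
this example, and the "type vrf table N" syntax is the one proposed in this
RFC, so it depends on a matching iproute2.

```python
def vrf_setup_commands(vrfs, pref=200):
    """Return `ip` commands for a {vrf_name: (table, [member ifaces])} mapping."""
    cmds = []
    # Create the VRF devices and associate each with a table.
    for name, (table, _members) in vrfs.items():
        cmds.append(f"ip link add {name} type vrf table {table}")
    # Install the lookup rules that map each VRF device to its table.
    for name, (table, _members) in vrfs.items():
        cmds.append(f"ip rule add pref {pref} oif {name} lookup {table}")
        cmds.append(f"ip rule add pref {pref} iif {name} lookup {table}")
    # Bring the VRF devices up.
    for name in vrfs:
        cmds.append(f"ip link set {name} up")
    # Enslave the routing member interfaces.
    for name, (_table, members) in vrfs.items():
        for iface in members:
            cmds.append(f"ip link set {iface} master {name}")
    return cmds


# The N1 configuration from the diagram above.
TOPOLOGY = {
    "vrf1": (5, ["swp1", "swp2"]),
    "vrf2": (6, ["swp3", "swp4"]),
}

if __name__ == "__main__":
    print("\n".join(vrf_setup_commands(TOPOLOGY)))
```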

ping using VRF 1 is simply
ping -I vrf1 -I optional-src-addr 10.0.1.2

Or, using the task context and a command such as the example chvrf in
patch 6, unmodified applications are run in a VRF context using:
   chvrf -v 1 ping 10.0.1.2


Design Highlights
=================
If a device is enslaved to a VRF device (i.e., associated with a VRF)
then:
1. Rx path
   The master device index is used as the iif for all lookups.

2. Tx path
   Similarly, for Tx the VRF device oif is used in the flow to direct
   lookups to the table associated with the VRF via its rule. From there
   the FLOWI_FLAG_VRFSRC flag is used to indicate that the oif should
   not be used for FIB table lookups.

3. Connected and local routes
   On link up for a device, connected and local routes are added to the
   table associated with the VRF device, rather than the local and main
   tables.

4. Socket lookups
   Socket lookups use the VRF device for comparison with sk_bound_dev_if.
   If a socket is not bound to a device a socket match can happen based
   on destination address, port and protocol in which 

Re: [RFC net-next 0/6] Proposal for VRF-lite - v2

2015-07-06 Thread Nicolas Dichtel

On 06/07/2015 17:03, David Ahern wrote:

In the context of internet scale routing a requirement that always
comes up is the need to partition the available routing tables into
disjoint routing planes. A specific use case is the multi-tenancy
problem where each tenant has their own unique routing tables and in
the very least need different default gateways.

This is an attempt to build the ability to create virtual router
domains aka VRF's (VRF-lite to be specific) in the linux packet
forwarding stack. The main observation is that through the use of
rules and socket binding to interfaces, all the facilities that we
need are already present in the infrastructure. What is missing is a
handle that identifies a routing domain and can be used to gather
applicable rules/tables and uniqify neighbor selection. The scheme
used needs to preserves the notions of ECMP, and general routing
principles.

[snip]

  drivers/net/vrf.c | 486 ++

[snip]

I'm still opposed to name this 'vrf', see the v1 thread:
 - http://www.spinics.net/lists/netdev/msg332357.html
 - http://www.spinics.net/lists/netdev/msg332376.html

Shrijeet seemed to agree to rename it, is there a problem?


Regards,
Nicolas


Re: [RFC net-next 0/3] Proposal for VRF-lite

2015-06-12 Thread Thomas Graf
On 06/10/15 at 01:43pm, Shrijeet Mukherjee wrote:
 On Tue, Jun 9, 2015 at 3:15 AM, Thomas Graf tg...@suug.ch wrote:
  Do I understand this correctly that swp* represent veth pairs?
  Why do you have distinct addresses on each peer of the pair?
  Are the addresses in N2 and N3 considered private and NATed?
 
  [...]
 
 
 These are physical boxes in the picture, not veth pairs or NATs :)

I see. So if I translate this to a virtual world with veths where
the guest facing peer is in its own netns, the host facing veth
peer would get attached to a vrf device and we should be good.

 Are you worried about ip rule scale? This reduces the scale to the number of
 L3 domains, which should not be that large. I do think we need to speed up
 rule lookup from the linear walk we have right now.

I definitely have more L3 domains than what a linear search can
handle.

 A generic classifier seems like a bigger hammer, but if that is the way to
 replace rules it is a worthy concept.
 
 That said, the patches from Hannes et al. will make it such that the table
 lookup may be done from the driver directly and thus will skip past the fib rule
 lookup.

The approach from Hannes definitely works for the physical world
but is undesirable for overlays, logical or encapsulations, where
we want to avoid maintaining a net_device for every virtual network.

As I said, I think this is something that can be resolved later on
with a programmable classifier.


Re: [RFC net-next 0/3] Proposal for VRF-lite

2015-06-09 Thread Thomas Graf
On 06/08/15 at 11:35am, Shrijeet Mukherjee wrote:
[...]
> model with some performance paths that need optimization. (Specifically
> the output route selector that Roopa, Robert, Thomas and EricB are
> currently discussing on the MPLS thread)

Thanks for posting these patches just in time. This explains how
you intend to deploy Roopa's patches in a scalable manner.

> High Level points
>
> 1. Simple overlay driver (minimal changes to current stack)
>    * uses the existing fib tables and fib rules infrastructure
> 2. Modelled closely after the ipvlan driver
> 3. Uses current API and infrastructure.
>    * Applications can use SO_BINDTODEVICE or cmsg device indentifiers
>      to pick VRF (ping, traceroute just work)

I like the aspect of reusing existing user interfaces. We might
need to introduce a more fine grained capability than CAP_NET_RAW
to give containers the privileges to bind to a VRF without
allowing them to inject raw frames.

Assuming I understand this correctly: if my intent was to run a
process in multiple VRFs, then I would need to run that process
in the host network namespace, which contains the VRF devices
and would also contain the physical devices. While I might want
to grant my process the ability to bind to VRFs, I may not want
to give it the privileges to bind to any device. So we could
consider introducing CAP_NET_VRF, which would allow binding to
VRF devices only.
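
For reference, the binding step under discussion is the ordinary
SO_BINDTODEVICE socket option. What follows is a minimal user-space sketch,
not code from the patch set: it assumes a Linux host, the device name "vrf0"
is only an example, and the helper bind_to_device() is a hypothetical name
introduced here. Whether the call succeeds depends on the privileges of the
caller, which is exactly the capability question raised above.

```python
import socket

# SO_BINDTODEVICE is Linux-specific. Fall back to 25, its value in
# asm-generic/socket.h, if this Python build does not export the constant.
SO_BINDTODEVICE = getattr(socket, "SO_BINDTODEVICE", 25)


def bind_to_device(sock, ifname):
    """Try to bind sock to the named device; return True on success.

    Hypothetical helper for illustration only.
    """
    try:
        # The kernel expects the interface name as a NUL-terminated string.
        sock.setsockopt(socket.SOL_SOCKET, SO_BINDTODEVICE,
                        ifname.encode() + b"\0")
        return True
    except OSError:
        # Typical failures: no such device, or insufficient privileges.
        return False


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # "vrf0" is an assumed device name; it will simply fail if absent.
    print("bound to vrf0:", bind_to_device(s, "vrf0"))
    s.close()
```

An application that is not VRF-aware would need a call like this added (or a
wrapper doing it on its behalf), which is the concern about unmodified
applications raised elsewhere in the thread.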

>    * Standard IP Rules work, and since they are aggregated against the
>      device, scale is manageable
> 4. Completely orthogonal to Namespaces and only provides separation in
>    the routing plane (and ARP)
> 5. Debugging is built-in as tcpdump and counters on the VRF device
>    works as is.
>
>                                                     N2
>           N1 (all configs here)          +---------------+
>    +--------------+                      |               |
>    |swp1 :10.0.1.1+----------------------+swp1 :10.0.1.2 |
>    |              |                      |               |
>    |swp2 :10.0.2.1+----------------------+swp2 :10.0.2.2 |
>    |              |                      +---------------+
>    | VRF 0        |
>    | table 5      |
>    |              |
>    +--------------+
>    |              |
>    | VRF 1        |                             N3
>    | table 6      |                      +---------------+
>    |              |                      |               |
>    |swp3 :10.0.2.1+----------------------+swp1 :10.0.2.2 |
>    |              |                      |               |
>    |swp4 :10.0.3.1+----------------------+swp2 :10.0.3.2 |
>    +--------------+                      +---------------+

Do I understand this correctly that swp* represent veth pairs?
Why do you have distinct addresses on each peer of the pair?
Are the addresses in N2 and N3 considered private and NATed?

[...]

> # Install the lookup rules that map table to VRF domain
> ip rule add pref 200 oif vrf0 lookup 5
> ip rule add pref 200 iif vrf0 lookup 5
> ip rule add pref 200 oif vrf1 lookup 6
> ip rule add pref 200 iif vrf1 lookup 6

I think this is a good start but we all know the scalability
constraints of this. Depending on the number of L3 domains,
an eBPF classifier utilizing a map to translate origin to
routing table and vice versa might address the scale requirement
long term.
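
To make the scaling concern concrete, here is a small user-space model, not
kernel code and not an eBPF program. It assumes one rule per VRF device, as in
the quoted "ip rule" commands, and compares an ordered walk over the rules
with a single map lookup keyed on the device. All function names are
hypothetical and exist only for this illustration.

```python
def build_rules(num_vrfs, first_table=5):
    """Return an ordered rule list, one (device, table) pair per VRF."""
    return [("vrf%d" % i, first_table + i) for i in range(num_vrfs)]


def lookup_linear(rules, dev):
    """Walk the rules in order and report (table, rules examined)."""
    for steps, (rule_dev, table) in enumerate(rules, start=1):
        if rule_dev == dev:
            return table, steps
    return None, len(rules)


def lookup_map(table_by_dev, dev):
    """Single hash lookup, standing in for a map-based classifier."""
    return table_by_dev.get(dev)


rules = build_rules(1000)
table_by_dev = dict(rules)

if __name__ == "__main__":
    # The last VRF costs a full walk in the linear model.
    print(lookup_linear(rules, "vrf999"))
    print(lookup_map(table_by_dev, "vrf999"))
```

In the linear model the cost of a lookup grows with the number of L3 domains,
while the map lookup does not depend on how many domains are configured.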

[...]

I will comment on the implementation specifics once I have a
good understanding of what your desired end state looks like.


Re: [RFC net-next 0/3] Proposal for VRF-lite

2015-06-09 Thread Nicolas Dichtel

On 08/06/2015 20:35, Shrijeet Mukherjee wrote:
> From: Shrijeet Mukherjee s...@cumulusnetworks.com
>
> In the context of internet scale routing a requirement that always
> comes up is the need to partition the available routing tables into
> disjoint routing planes. A specific use case is the multi-tenancy
> problem where each tenant has their own unique routing tables and in
> the very least need different default gateways.
>
> This is an attempt to build the ability to create virtual router
> domains aka VRF's (VRF-lite to be specific) in the linux packet
> forwarding stack. The main observation is that through the use of

[snip]

>   drivers/net/vrf.c|  654 ++


I'm not really in favor of the name 'vrf'. This term is very controversial, and
reaching a consensus on what a 'vrf' is or contains is nearly impossible.
There have already been a lot of discussions about this topic on the quagga ml,
which show that everybody has a different opinion about this term ;-)

I know you call this 'MRF' internally, why not use this name instead?


Regards,
Nicolas


Re: [RFC net-next 0/3] Proposal for VRF-lite

2015-06-09 Thread Nicolas Dichtel

On 09/06/2015 16:21, David Ahern wrote:
> Hi Nicolas:
>
> On 6/9/15 2:58 AM, Nicolas Dichtel wrote:
> > I'm not really in favor of the name 'vrf'. This term is very
> > controversial and
> > having a consensus of what is/contains a 'vrf' is quite impossible.
> > There was already a lot of discussions about this topic on quagga ml
> > that show
> > that everybody has a different opinion about this term ;-)
>
> Are you referring to this thread?
> https://lists.quagga.net/pipermail/quagga-dev/2014-November/011795.html

No, there were recent discussions on quagga about that subject. Here are some
non-exhaustive pointers:
https://lists.quagga.net/pipermail/quagga-dev/2015-May/012581.html
https://lists.quagga.net/pipermail/quagga-dev/2015-May/012630.html
https://lists.quagga.net/pipermail/quagga-dev/2015-June/012715.html

Note the last pointer also explains why it was called MRF by Cumulus.


Regards,
Nicolas


Re: [RFC net-next 0/3] Proposal for VRF-lite

2015-06-09 Thread David Ahern

Hi Nicolas:

On 6/9/15 2:58 AM, Nicolas Dichtel wrote:
> I'm not really in favor of the name 'vrf'. This term is very
> controversial and
> having a consensus of what is/contains a 'vrf' is quite impossible.
> There was already a lot of discussions about this topic on quagga ml
> that show
> that everybody has a different opinion about this term ;-)

Are you referring to this thread?
https://lists.quagga.net/pipermail/quagga-dev/2014-November/011795.html

I could see differing opinions regarding the implementation of a VRF; is 
there really a controversy on what a VRF is?


David


Re: [RFC net-next 0/3] Proposal for VRF-lite

2015-06-09 Thread Shrijeet Mukherjee
On Tue, Jun 9, 2015 at 7:55 AM, Nicolas Dichtel
nicolas.dich...@6wind.com wrote:
> On 09/06/2015 16:21, David Ahern wrote:
> >
> > Hi Nicolas:
> >
> > On 6/9/15 2:58 AM, Nicolas Dichtel wrote:
> > >
> > > I'm not really in favor of the name 'vrf'. This term is very
> > > controversial and
> > > having a consensus of what is/contains a 'vrf' is quite impossible.
> > > There was already a lot of discussions about this topic on quagga ml
> > > that show
> > > that everybody has a different opinion about this term ;-)
> >
> >
> > Are you referring to this thread?
> > https://lists.quagga.net/pipermail/quagga-dev/2014-November/011795.html
>
> No, there were recent discussions on quagga about that subject. Here is some
> non-exhaustive pointers:
> https://lists.quagga.net/pipermail/quagga-dev/2015-May/012581.html
> https://lists.quagga.net/pipermail/quagga-dev/2015-May/012630.html
> https://lists.quagga.net/pipermail/quagga-dev/2015-June/012715.html
>
> Note the last pointer also explains why it was called MRF by Cumulus.



Agreed, I used the term VRF for this series to make sure we had the
right context, but clearly MRF is a term we are happier to use ..

> Regards,
> Nicolas


Re: [RFC net-next 0/3] Proposal for VRF-lite

2015-06-09 Thread Nicolas Dichtel

On 09/06/2015 12:15, Thomas Graf wrote:
> On 06/08/15 at 11:35am, Shrijeet Mukherjee wrote:
> [...]
> > model with some performance paths that need optimization. (Specifically
> > the output route selector that Roopa, Robert, Thomas and EricB are
> > currently discussing on the MPLS thread)
>
> Thanks for posting these patches just in time. This explains how
> you intent to deploy Roopa's patches in a scalable manner.
>
> > High Level points
> >
> > 1. Simple overlay driver (minimal changes to current stack)
> >    * uses the existing fib tables and fib rules infrastructure
> > 2. Modelled closely after the ipvlan driver
> > 3. Uses current API and infrastructure.
> >    * Applications can use SO_BINDTODEVICE or cmsg device indentifiers
> >      to pick VRF (ping, traceroute just work)
>
> I like the aspect of reusing existing user interfaces. We might
> need to introduce a more fine grained capability than CAP_NET_RAW
> to give containers the privileges to bind to a VRF without
> allowing them to inject raw frames.
>
> Given I understand this correctly: If my intent was to run a
> process in multiple VRFs, then I would need to run that process
> in the host network namespace which contains the VRF devices
> which would also contain the physical devices. While I might want
> to grant my process the ability to bind to VRFs, I may not want
> to give it the privileges to bind to any device. So we could
> consider introducing CAP_NET_VRF which would allow to bind to
> VRF devices.


If I understand correctly, all existing applications would also need to be
modified if I want to run them in a VRF/MRF (see my previous email)?

ssh, dhcp, httpd, etc. should be runnable per MRF without modifications to
their source code. So, it becomes a netns. What about an IKE daemon?

It makes sense to have both netns and MRF; each can have its own logic
of VRF-like behavior, depending on how a VRF is defined by the end users.


Re: [RFC net-next 0/3] Proposal for VRF-lite

2015-06-09 Thread Hannes Frederic Sowa
On Tue, Jun 9, 2015, at 14:30, Nicolas Dichtel wrote:
> On 09/06/2015 12:15, Thomas Graf wrote:
> > On 06/08/15 at 11:35am, Shrijeet Mukherjee wrote:
> > [...]
> > > model with some performance paths that need optimization. (Specifically
> > > the output route selector that Roopa, Robert, Thomas and EricB are
> > > currently discussing on the MPLS thread)
> >
> > Thanks for posting these patches just in time. This explains how
> > you intent to deploy Roopa's patches in a scalable manner.
> >
> > > High Level points
> > >
> > > 1. Simple overlay driver (minimal changes to current stack)
> > >    * uses the existing fib tables and fib rules infrastructure
> > > 2. Modelled closely after the ipvlan driver
> > > 3. Uses current API and infrastructure.
> > >    * Applications can use SO_BINDTODEVICE or cmsg device indentifiers
> > >      to pick VRF (ping, traceroute just work)
> >
> > I like the aspect of reusing existing user interfaces. We might
> > need to introduce a more fine grained capability than CAP_NET_RAW
> > to give containers the privileges to bind to a VRF without
> > allowing them to inject raw frames.
> >
> > Given I understand this correctly: If my intent was to run a
> > process in multiple VRFs, then I would need to run that process
> > in the host network namespace which contains the VRF devices
> > which would also contain the physical devices. While I might want
> > to grant my process the ability to bind to VRFs, I may not want
> > to give it the privileges to bind to any device. So we could
> > consider introducing CAP_NET_VRF which would allow to bind to
> > VRF devices.
>
> If I understand correctly, all existing applications should also be modified
> if I want to run them into a VRF/MRF (see my previous email)?
>
> ssh, dhcp, httpd, etc should be runnable per MRF without modifications of
> their source code. So, it becomes a netns. What's about an IKE dameon?
>
> It makes sense to have both: netns and MRF ; each can have their own logics
> of VRF-like behavior depending on how a VRF is defined by the end users.

Agreed, the idea is to have a prctl in the end which gets inherited by
fork. current->rt_table_id or some kind of vrf specifier in task_struct
would make that possible then.

A helper tool like "ip route exec table 100 /bin/bash" would then start a
session bound to a specific routing instance.

Bye,
Hannes


Re: [RFC net-next 0/3] Proposal for VRF-lite

2015-06-08 Thread David Ahern

On 6/8/15 12:35 PM, Shrijeet Mukherjee wrote:
> 5. Debugging is built-in as tcpdump and counters on the VRF device
>    works as is.

Is the intent that something like this

  tcpdump -i vrf0

can be used to see vrf traffic?

vrf_handle_frame only bumps counters; it does not switch skb->dev to the
vrf device, so on the Rx path tcpdump will not get the packets, i.e., tcpdump
only shows outbound packets.



Re: [RFC net-next 0/3] Proposal for VRF-lite

2015-06-08 Thread Shrijeet Mukherjee
Good catch, as you know I used to have the device getting modified in
the RX path and that made it all work

generic ip_rcv will need a fix to make RX visible to tcpdump, but yes,
that is the goal.

On Mon, Jun 8, 2015 at 12:13 PM, David Ahern dsah...@gmail.com wrote:
> On 6/8/15 12:35 PM, Shrijeet Mukherjee wrote:
> > 5. Debugging is built-in as tcpdump and counters on the VRF device
> >    works as is.
>
> Is the intent that something like this
>
>   tcpdump -i vrf0
>
> can be used to see vrf traffic?
>
> vrf_handle_frame only bumps counters; it does not switch skb->dev to the vrf
> device so for Rx path tcpdump will not get the packets. ie., tcpdump only
> shows outbound packets.


Re: [RFC net-next 0/3] Proposal for VRF-lite

2015-06-08 Thread Hannes Frederic Sowa
On Mon, Jun 8, 2015, at 21:13, David Ahern wrote:
> On 6/8/15 12:35 PM, Shrijeet Mukherjee wrote:
> > 5. Debugging is built-in as tcpdump and counters on the VRF device
> >    works as is.
>
> Is the intent that something like this
>
>   tcpdump -i vrf0
>
> can be used to see vrf traffic?
>
> vrf_handle_frame only bumps counters; it does not switch skb->dev to the
> vrf device so for Rx path tcpdump will not get the packets. ie., tcpdump
> only shows outbound packets.

My hope initially was that the vrf interface type would be as slim as
possible. I am not even sure if we need packet counters, as one could easily
have user space handle that by looking up the relations and accumulating
them. Same for VRF traffic.

But the current model does allow adding support for that easily, so why not?
It depends on how far we can and want to move parts of the logic into the
core stack in the end.

Would you see this as a requirement?

Thanks,
Hannes


[RFC][PATCH -mm take5 0/7] proposal for dynamic configurable netconsole

2007-06-13 Thread Keiichi KII
From: Keiichi KII [EMAIL PROTECTED]

Netconsole is a very useful module for collecting kernel messages under
certain circumstances (e.g. disk logging fails, serial port is unavailable).

But the current netconsole is not flexible. For example, to change the IP
address of the logging agent with a built-in netconsole, you have to change
the boot parameter and reboot your system; with a modular netconsole, you
have to remove the module and reload it with different parameters.

Adopting my patches makes the current netconsole a little more complex.
But kernel messages (especially panic messages) are significant information
for solving bugs and troubles promptly, and PCs and servers are losing their
serial console ports.

I think that we need an environment in which we can collect kernel messages
flexibly.

So, I propose the following extended features for netconsole.

1) support for multiple logging agents.
2) add an interface to access each netconsole parameter
   using sysfs.

[changes since take4]
-change kernel base from 2.6.21-rc6-mm1 to 2.6.22-rc4-mm2.
-update Documentation/networking/netconsole.txt
-fix Kconfig
-avoid forward-declared statics
-fix coding style
-use spin_lock_irqsave() and _restore()
-fix race condition(netconsole_event())
-remove extra lock(write in sysfs)
-change ioctl's location
-use kasprintf()
-error handling

Your comments are very welcome.

Signed-off-by: Keiichi KII [EMAIL PROTECTED]
Signed-off-by: Takayoshi Kochi [EMAIL PROTECTED]
---




[RFC][PATCH -mm take4 0/6] proposal for dynamic configurable netconsole

2007-04-18 Thread Keiichi KII
From: Keiichi KII [EMAIL PROTECTED]

Netconsole is a very useful module for collecting kernel messages under
certain circumstances (e.g. disk logging fails, serial port is unavailable).

But the current netconsole is not flexible. For example, to change the IP
address of the logging agent with a built-in netconsole, you have to change
the boot parameter and reboot your system; with a modular netconsole, you
have to remove the module and reload it with different parameters.

Adopting my patches makes the current netconsole a little more complex.
But kernel messages (especially panic messages) are significant information
for solving bugs and troubles promptly, and PCs and servers are losing their
serial console ports.

I think that we need an environment in which we can collect kernel messages
flexibly.

So, I propose the following extended features for netconsole.

1) support for multiple logging agents.
2) add an interface to access each netconsole parameter
   using sysfs.

[changes since take3]
-changing kernel base from 2.6.21-rc3-mm2 to 2.6.21-rc6-mm1.
-introducing CONFIG_NETCONSOLE_DYNCON.
-cleanup

Your comments are very welcome.

Signed-off-by: Keiichi KII [EMAIL PROTECTED]
Signed-off-by: Takayoshi Kochi [EMAIL PROTECTED]
---



[RFC][PATCH -mm take3 0/6][resend] proposal for dynamic configurable netconsole

2007-03-20 Thread Keiichi KII
From: Keiichi KII [EMAIL PROTECTED]

Netconsole is a very useful module for collecting kernel messages under
certain circumstances (e.g. disk logging fails, serial port is unavailable).

But the current netconsole is not flexible. For example, to change the IP
address of the logging agent with a built-in netconsole, you have to change
the boot parameter and reboot your system; with a modular netconsole, you
have to remove the module and reload it with different parameters.

So, I propose the following extended features for netconsole.

1) support for multiple logging agents.
2) add an interface to access each netconsole parameter
   using sysfs.

[changes since take2]
-changing kernel base from 2.6.20-rc1-mm1 to 2.6.21-rc3-mm2.
-using symbolic link for network device.
-changing part of the interface from sysfs to ioctl,
 because Stephen Hemminger advised us that giving sysfs magic side
 effects, such as adding/removing a port, is a misuse.

This patch set is for linux-2.6.21-rc3-mm2 and is split up by function.
Your comments are very welcome.

Signed-off-by: Keiichi KII [EMAIL PROTECTED]
Signed-off-by: Takayoshi Kochi [EMAIL PROTECTED]
---




Re: [RFC][PATCH -mm 0/5] proposal for dynamic configurable netconsole

2006-12-25 Thread Keiichi KII

Thank you for your comments.

> > So, I propose the following extended features for netconsole.
> >
> > 1) support for multiple logging agents.
> > 2) add interface to access each parameter of netconsole
> >    using sysfs.
> >
> > This patch is for linux-2.6.20-rc1-mm1 and is divided to each function.
> > Your comments are very welcome.
>
> Rather than extending the existing kludge with module parameter, to
> sysfs. I would rather see a better API for this. Please build think
> about doing a better API with a basic set of ioctl's. Some additional

What advantage does a set of ioctls have over sysfs?
I think that sysfs is easier and more readable than ioctls
for changing configuration (IP address, port number and so on).

ex)
# cat /sys/class/misc/netconsole/port1/remote_ip
192.168.0.1
# echo 172.16.0.1 > /sys/class/misc/netconsole/port1/remote_ip
# cat /sys/class/misc/netconsole/port1/remote_ip
172.16.0.1

And sysfs doesn't require writing an access program, as ioctls do.
If you change netconsole configuration through the sysfs interface,
a simple script containing a set of commands such as the echo above
will set things up automatically.
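
As a rough illustration of the "simple script" idea, the sketch below writes
and reads one attribute file per parameter, which is all the cat/echo example
does. It is only a model: it runs against a scratch directory created with
tempfile rather than the real /sys tree, and the port1/remote_ip layout is
taken from the example above, not from a merged kernel interface. The helper
names are hypothetical.

```python
import os
import tempfile


def set_param(base_dir, port, name, value):
    """Write one parameter, like `echo value > base_dir/port/name`."""
    path = os.path.join(base_dir, port, name)
    with open(path, "w") as f:
        f.write(value + "\n")


def get_param(base_dir, port, name):
    """Read one parameter back, like `cat base_dir/port/name`."""
    with open(os.path.join(base_dir, port, name)) as f:
        return f.read().strip()


def make_fake_tree(ports, params):
    """Create a scratch directory tree that stands in for the sysfs layout."""
    base = tempfile.mkdtemp()
    for port in ports:
        os.makedirs(os.path.join(base, port))
        for name, value in params.items():
            set_param(base, port, name, value)
    return base


if __name__ == "__main__":
    base = make_fake_tree(["port1"], {"remote_ip": "192.168.0.1"})
    set_param(base, "port1", "remote_ip", "172.16.0.1")
    print(get_param(base, "port1", "remote_ip"))
```

Note that this covers only reading and writing existing parameters. Adding or
removing a port is a different kind of operation, which is where the objection
to sysfs side effects comes in.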



> things:
> - shouldn't just be IPV4 specific, should handle IPV6 as well

I would like to implement IPv6 handling in the future, if there is demand.

> - shouldn't specify MAC address, it can do network discovery/arp to
>   find that when adding addresses

I think a userland application should rather find the target MAC address and
change it through sysfs.


--
Keiichi KII
NEC Corporation OSS Promotion Center
E-mail: [EMAIL PROTECTED]






[PATCH -mm take2 0/5] proposal for dynamic configurable netconsole

2006-12-25 Thread Keiichi KII
From: Keiichi KII [EMAIL PROTECTED]

Netconsole is a very useful module for collecting kernel messages under
certain circumstances (e.g. disk logging fails, serial port is unavailable).

But the current netconsole is not flexible. For example, to change the IP
address of the logging agent with a built-in netconsole, you have to change
the boot parameter and reboot your system; with a modular netconsole, you
have to remove the module and reload it with different parameters.

So, I propose the following extended features for netconsole.

1) support for multiple logging agents.
2) add an interface to access each netconsole parameter
   using sysfs.

This patch set is for linux-2.6.20-rc1-mm1 and is split up by function.
Your comments are very welcome.

Signed-off-by: Keiichi KII [EMAIL PROTECTED]
Signed-off-by: Takayoshi Kochi [EMAIL PROTECTED]
---
-- 
Keiichi KII
NEC Corporation OSS Promotion Center
E-mail: [EMAIL PROTECTED]


[RFC][PATCH -mm 0/5] proposal for dynamic configurable netconsole

2006-12-22 Thread Keiichi KII
From: Keiichi KII [EMAIL PROTECTED]

Netconsole is a very useful module for collecting kernel messages under
certain circumstances (e.g. disk logging fails, serial port is unavailable).

But the current netconsole is not flexible. For example, to change the IP
address of the logging agent with a built-in netconsole, you have to change
the boot parameter and reboot your system; with a modular netconsole, you
have to reload the netconsole module.

So, I propose the following extended features for netconsole.

1) support for multiple logging agents.
2) add an interface to access each netconsole parameter
   using sysfs.

This patch set is for linux-2.6.20-rc1-mm1 and is split up by function.
Your comments are very welcome.

Signed-off-by: Keiichi KII [EMAIL PROTECTED]
---
[changes]
1. change kernel base from 2.6.19 to 2.6.20-rc1-mm1.
-- 
Keiichi KII
NEC Corporation OSS Promotion Center
E-mail: [EMAIL PROTECTED]


Re: [RFC][PATCH -mm 0/5] proposal for dynamic configurable netconsole

2006-12-22 Thread Stephen Hemminger
On Fri, 22 Dec 2006 21:01:09 +0900
Keiichi KII [EMAIL PROTECTED] wrote:

> From: Keiichi KII [EMAIL PROTECTED]
>
> The netconsole is a very useful module for collecting kernel message under
> certain circumstances(e.g. disk logging fails, serial port is unavailable).
>
> But current netconsole is not flexible. For example, if you want to change ip
> address for logging agent, in the case of built-in netconsole, you can't
> change
> config except for changing boot parameter and rebooting your system, or in the
> case of module netconsole, you need to reload netconsole module.

If netconsole is a module, you should be able to remove it and reload
with different parameters.

> So, I propose the following extended features for netconsole.
>
> 1) support for multiple logging agents.
> 2) add interface to access each parameter of netconsole
>    using sysfs.
>
> This patch is for linux-2.6.20-rc1-mm1 and is divided to each function.
> Your comments are very welcome.

Rather than extending the existing module-parameter kludge to sysfs, I would
rather see a better API for this. Please think about building a better API
with a basic set of ioctls. Some additional things:
- shouldn't just be IPV4 specific, should handle IPV6 as well
- shouldn't specify MAC address, it can do network discovery/arp to
  find that when adding addresses


[RFC][PATCH 2.6.19 take2 0/5] proposal for dynamic configurable netconsole

2006-12-21 Thread Keiichi KII
From: Keiichi KII [EMAIL PROTECTED]

Netconsole is a very useful module for collecting kernel messages under
certain circumstances (e.g. disk logging fails, serial port is unavailable).

But the current netconsole is not flexible. For example, to change the IP
address of the logging agent with a built-in netconsole, you have to change
the boot parameter and reboot your system; with a modular netconsole, you
have to reload the netconsole module.

So, I propose the following extended features for netconsole.

1) support for multiple logging agents.
2) add an interface to access each netconsole parameter
   using sysfs.

This patch set is for linux-2.6.19 and is split up by function.
Your comments are very welcome.

Signed-off-by: Keiichi KII [EMAIL PROTECTED]
---
-- 
Keiichi KII
NEC Corporation OSS Promotion Center
E-mail: [EMAIL PROTECTED]



Re: [sungem] proposal for a new locking strategy

2006-11-06 Thread Stephen Hemminger
On Sun, 5 Nov 2006 21:11:34 +0100
Eric Lemoine [EMAIL PROTECTED] wrote:

> On 11/5/06, Stephen Hemminger [EMAIL PROTECTED] wrote:
> > On Sun, 5 Nov 2006 18:52:45 +0100
> > Eric Lemoine [EMAIL PROTECTED] wrote:
> >
> > > On 11/5/06, Stephen Hemminger [EMAIL PROTECTED] wrote:
> > > > On Sun, 5 Nov 2006 18:28:33 +0100
> > > > Eric Lemoine [EMAIL PROTECTED] wrote:
> > > >
> > > > > > You could also just use net_tx_lock() now.
> > > > >
> > > > > You mean netif_tx_lock()?
> > > > >
> > > > > Thanks for letting me know about that function. Yes, I may need it.
> > > > > tg3 and bnx2 use it to wake up the transmit queue:
> > > > >
> > > > >     if (unlikely(netif_queue_stopped(tp->dev) &&
> > > > >                  (tg3_tx_avail(tp) > TG3_TX_WAKEUP_THRESH))) {
> > > > >             netif_tx_lock(tp->dev);
> > > > >             if (netif_queue_stopped(tp->dev) &&
> > > > >                 (tg3_tx_avail(tp) > TG3_TX_WAKEUP_THRESH))
> > > > >                     netif_wake_queue(tp->dev);
> > > > >             netif_tx_unlock(tp->dev);
> > > > >     }
> > > > >
> > > > > 2.6.17 didn't use it. Was it a bug?
> > > > >
> > > > > Thanks,
> > > >
> > > > No, it was introduced in 2.6.18. The functions are just a wrapper
> > > > around the network device transmit lock that is normally held.
> > > >
> > > > If the device does not need to acquire the lock during IRQ, it
> > > > is a good alternative and avoids a second lock.
> > > >
> > > > For transmit locking there are three common alternatives:
> > > >
> > > > Method A: dev->queue_xmit_lock and per-device tx_lock
> > > >     send: dev->xmit_lock held by caller
> > > >           dev->hard_start_xmit acquires netdev_priv(dev)->tx_lock
> > > >
> > > >     irq:  netdev_priv(dev)->tx_lock acquired
> > > >
> > > > Method B: dev->queue_xmit_lock only
> > > >     send: dev->xmit_lock held by caller
> > > >     irq:  schedules softirq (NAPI)
> > > >     napi_poll: calls netif_tx_lock() which acquires dev->xmit_lock
> > > >
> > > > Method C: LLTX
> > > >     set dev->features LLTX
> > > >     send: no locks held by caller
> > > >           dev->hard_start_xmit acquires netdev_priv(dev)->tx_lock
> > > >     irq:  netdev_priv(dev)->tx_lock acquired
> > > >
> > > > Method A is the only one that works with 2.4 and early (2.6.8?) kernels.
> > >
> > > Current sungem does Method C, and uses two locks: lock and tx_lock.
> > > What I was planning to do is Method B (which current tg3 uses). It
> > > seems to me that Method B is better than Method C. What do you think?
> >
> > B is better than C because the transmit logic doesn't have to
> > spin in the case of lock contention, but it is not a big difference.
>
> Current sungem does C but uses try_lock() to acquire its private
> tx_lock. So it doesn't spin either in case of contention.


But the spin is still there, just more complex..
In qdisc_restart() processing of NETDEV_TX_LOCKED causes:
	spin_lock(dev->xmit_lock)

	q->requeue()
	netif_schedule(dev);

SOFTIRQ:
	net_tx_action()
		qdisc_run() --> qdisc_restart()

So instead of spinning in tight loop, you end up with a longer code
path.
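
The "longer code path" can be pictured with a toy user-space model. This is
not the kernel's qdisc code: the class and function names below are
hypothetical, a threading.Lock stands in for the driver's private tx_lock,
and a plain retry loop stands in for the netif_schedule() / softirq round
trip. It only illustrates that a failed trylock turns into requeue plus
another scheduled run, rather than a spin inside the driver.

```python
import threading
from collections import deque

TX_OK, TX_LOCKED = "ok", "locked"


class Driver:
    """Toy LLTX-style driver with a private tx_lock taken via trylock."""

    def __init__(self):
        self.tx_lock = threading.Lock()
        self.sent = []

    def hard_start_xmit(self, pkt):
        # Trylock: on contention, report TX_LOCKED instead of waiting.
        if not self.tx_lock.acquire(blocking=False):
            return TX_LOCKED
        try:
            self.sent.append(pkt)
            return TX_OK
        finally:
            self.tx_lock.release()


def qdisc_restart(queue, drv):
    """Send one packet; on TX_LOCKED requeue it and ask for another run."""
    pkt = queue.popleft()
    if drv.hard_start_xmit(pkt) == TX_LOCKED:
        queue.appendleft(pkt)  # requeue at the head of the queue
        return True            # caller must schedule another run
    return False


def run_queue(queue, drv, max_runs=100):
    """Retry loop standing in for rescheduling; returns the number of runs."""
    runs = 0
    while queue and runs < max_runs:
        runs += 1
        qdisc_restart(queue, drv)
    return runs
```

While the lock stays held, run_queue() keeps going around the requeue path
until max_runs is reached, so the waiting has moved into the retry loop
instead of disappearing.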


-- 
Stephen Hemminger [EMAIL PROTECTED]


Re: [sungem] proposal for a new locking strategy

2006-11-06 Thread Eric Lemoine

On 11/6/06, Stephen Hemminger [EMAIL PROTECTED] wrote:

On Sun, 5 Nov 2006 21:11:34 +0100
Eric Lemoine [EMAIL PROTECTED] wrote:

 On 11/5/06, Stephen Hemminger [EMAIL PROTECTED] wrote:
  On Sun, 5 Nov 2006 18:52:45 +0100
  Eric Lemoine [EMAIL PROTECTED] wrote:
 
   On 11/5/06, Stephen Hemminger [EMAIL PROTECTED] wrote:
On Sun, 5 Nov 2006 18:28:33 +0100
Eric Lemoine [EMAIL PROTECTED] wrote:
   
  You could also just use net_tx_lock() now.

 You mean netif_tx_lock()?

 Thanks for letting me know about that function. Yes, I may need it.
 tg3 and bnx2 use it to wake up the transmit queue:

  if (unlikely(netif_queue_stopped(tp-dev) 
   (tg3_tx_avail(tp)  TG3_TX_WAKEUP_THRESH))) {
  netif_tx_lock(tp-dev);
  if (netif_queue_stopped(tp-dev) 
  (tg3_tx_avail(tp)  TG3_TX_WAKEUP_THRESH))
  netif_wake_queue(tp-dev);
  netif_tx_unlock(tp-dev);
  }

 2.6.17 didn't use it. Was it a bug?

 Thanks,
   
No, it was introduced in 2.6.18. The functions are just a wrapper
around the network device transmit lock that is normally held.
   
If the device does not need to acquire the lock during IRQ, it
is a good alternative and avoids a second lock.
   
For transmit locking there are three common alternatives:
   
Method A: dev-queue_xmit_lock and per-device tx_lock
send: dev-xmit_lock held by caller
dev-hard_start_xmit acquires netdev_priv(dev)-tx_lock
   
irq:  netdev_priv(dev)-tx_lock acquired
   
Method B: dev-queue_xmit_lock only
send: dev-xmit_lock held by caller
irq:  schedules softirq (NAPI)
napi_poll: calls netif_tx_lock() which acquires dev-xmit_lock
   
Method C: LLTX
set dev-features LLTX
send: no locks held by caller
dev-hard_start_xmit acquires netdev_priv(dev)-tx_lock
irq: netdev_priv(dev)-tx_lock acquired
   
Method A is the only one that works with 2.4 and early (2.6.8?) kernels.
   
  
   Current sungem does Method C, and uses two locks: lock and tx_lock.
   What I was planning to do is Method B (which current tg3 uses). It
   seems to me that Method B is better than Method C. What do you think?
 
  B is better than C because the transmit logic doesn't have to
  spin in the case of lock contention, but it is not a big difference.

 Current sungem does C but uses try_lock() to acquire its private
 tx_lock. So it doesn't spin either in case of contention.


But the spin is still there, just more complex..
In qdisc_restart() processing of NETDEV_TX_LOCKED causes:
spin_lock(dev->xmit_lock)

q->requeue()
netif_schedule(dev);

SOFTIRQ:
net_tx_action()
qdisc_run() --> qdisc_restart()

So instead of spinning in tight loop, you end up with a longer code
path.


Stephen, sorry for insisting a bit but I'm failing to see how B is
different from C in that respect. With method B, in qdisc_restart(),
if netif_tx_trylock() fails to acquire the lock then we also
requeue(), etc. Same long code path in case of contention.

--
Eric
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [sungem] proposal for a new locking strategy

2006-11-06 Thread Stephen Hemminger
On Mon, 6 Nov 2006 21:55:20 +0100
Eric Lemoine [EMAIL PROTECTED] wrote:

 On 11/6/06, Stephen Hemminger [EMAIL PROTECTED] wrote:
  On Sun, 5 Nov 2006 21:11:34 +0100
  Eric Lemoine [EMAIL PROTECTED] wrote:
 
   On 11/5/06, Stephen Hemminger [EMAIL PROTECTED] wrote:
On Sun, 5 Nov 2006 18:52:45 +0100
Eric Lemoine [EMAIL PROTECTED] wrote:
   
 On 11/5/06, Stephen Hemminger [EMAIL PROTECTED] wrote:
  On Sun, 5 Nov 2006 18:28:33 +0100
  Eric Lemoine [EMAIL PROTECTED] wrote:
 
You could also just use net_tx_lock() now.
  
   You mean netif_tx_lock()?
  
   Thanks for letting me know about that function. Yes, I may need 
   it.
   tg3 and bnx2 use it to wake up the transmit queue:
  
   if (unlikely(netif_queue_stopped(tp->dev) &&
                (tg3_tx_avail(tp) > TG3_TX_WAKEUP_THRESH))) {
           netif_tx_lock(tp->dev);
           if (netif_queue_stopped(tp->dev) &&
               (tg3_tx_avail(tp) > TG3_TX_WAKEUP_THRESH))
                   netif_wake_queue(tp->dev);
           netif_tx_unlock(tp->dev);
   }
  
   2.6.17 didn't use it. Was it a bug?
  
   Thanks,
 
  No, it was introduced in 2.6.18. The functions are just a wrapper
  around the network device transmit lock that is normally held.
 
  If the device does not need to acquire the lock during IRQ, it
  is a good alternative and avoids a second lock.
 
  For transmit locking there are three common alternatives:
 
  Method A: dev->queue_xmit_lock and per-device tx_lock
      send: dev->xmit_lock held by caller
            dev->hard_start_xmit acquires netdev_priv(dev)->tx_lock

      irq:  netdev_priv(dev)->tx_lock acquired

  Method B: dev->queue_xmit_lock only
      send: dev->xmit_lock held by caller
      irq:  schedules softirq (NAPI)
      napi_poll: calls netif_tx_lock() which acquires dev->xmit_lock

  Method C: LLTX
      set dev->features LLTX
      send: no locks held by caller
            dev->hard_start_xmit acquires netdev_priv(dev)->tx_lock
      irq:  netdev_priv(dev)->tx_lock acquired
 
  Method A is the only one that works with 2.4 and early (2.6.8?) 
  kernels.
 

 Current sungem does Method C, and uses two locks: lock and tx_lock.
 What I was planning to do is Method B (which current tg3 uses). It
 seems to me that Method B is better than Method C. What do you think?
   
B is better than C because the transmit logic doesn't have to
spin in the case of lock contention, but it is not a big difference.
  
   Current sungem does C but uses try_lock() to acquire its private
   tx_lock. So it doesn't spin either in case of contention.
 
 
  But the spin is still there, just more complex..
  In qdisc_restart() processing of NETDEV_TX_LOCKED causes:
  spin_lock(dev->xmit_lock)

  q->requeue()
  netif_schedule(dev);

  SOFTIRQ:
  net_tx_action()
  qdisc_run() --> qdisc_restart()
 
  So instead of spinning in tight loop, you end up with a longer code
  path.
 
 Stephen, sorry for insisting a bit but I'm failing to see how B is
 different from C in that respect. With method B, in qdisc_restart(),
 if netif_tx_trylock() fails to acquire the lock then we also
 requeue(), etc. Same long code path in case of contention.
 

Method C (LLTX) causes repeated softirqs, which will be slower since the loop
requires more instructions than a simple spin loop (Method B).


-- 
Stephen Hemminger [EMAIL PROTECTED]


Re: [sungem] proposal for a new locking strategy

2006-11-06 Thread Eric Lemoine

On 11/6/06, Stephen Hemminger [EMAIL PROTECTED] wrote:

On Mon, 6 Nov 2006 21:55:20 +0100
Eric Lemoine [EMAIL PROTECTED] wrote:

 On 11/6/06, Stephen Hemminger [EMAIL PROTECTED] wrote:
  On Sun, 5 Nov 2006 21:11:34 +0100
  Eric Lemoine [EMAIL PROTECTED] wrote:
 
   On 11/5/06, Stephen Hemminger [EMAIL PROTECTED] wrote:
On Sun, 5 Nov 2006 18:52:45 +0100
Eric Lemoine [EMAIL PROTECTED] wrote:
   
 On 11/5/06, Stephen Hemminger [EMAIL PROTECTED] wrote:
  On Sun, 5 Nov 2006 18:28:33 +0100
  Eric Lemoine [EMAIL PROTECTED] wrote:
 
You could also just use net_tx_lock() now.
  
   You mean netif_tx_lock()?
  
   Thanks for letting me know about that function. Yes, I may need 
it.
   tg3 and bnx2 use it to wake up the transmit queue:
  
   if (unlikely(netif_queue_stopped(tp->dev) &&
                (tg3_tx_avail(tp) > TG3_TX_WAKEUP_THRESH))) {
           netif_tx_lock(tp->dev);
           if (netif_queue_stopped(tp->dev) &&
               (tg3_tx_avail(tp) > TG3_TX_WAKEUP_THRESH))
                   netif_wake_queue(tp->dev);
           netif_tx_unlock(tp->dev);
   }
  
   2.6.17 didn't use it. Was it a bug?
  
   Thanks,
 
  No, it was introduced in 2.6.18. The functions are just a wrapper
  around the network device transmit lock that is normally held.
 
  If the device does not need to acquire the lock during IRQ, it
  is a good alternative and avoids a second lock.
 
  For transmit locking there are three common alternatives:
 
  Method A: dev->queue_xmit_lock and per-device tx_lock
      send: dev->xmit_lock held by caller
            dev->hard_start_xmit acquires netdev_priv(dev)->tx_lock

      irq:  netdev_priv(dev)->tx_lock acquired

  Method B: dev->queue_xmit_lock only
      send: dev->xmit_lock held by caller
      irq:  schedules softirq (NAPI)
      napi_poll: calls netif_tx_lock() which acquires dev->xmit_lock

  Method C: LLTX
      set dev->features LLTX
      send: no locks held by caller
            dev->hard_start_xmit acquires netdev_priv(dev)->tx_lock
      irq:  netdev_priv(dev)->tx_lock acquired
 
  Method A is the only one that works with 2.4 and early (2.6.8?) 
kernels.
 

 Current sungem does Method C, and uses two locks: lock and tx_lock.
 What I was planning to do is Method B (which current tg3 uses). It
 seems to me that Method B is better than Method C. What do you think?
   
B is better than C because the transmit logic doesn't have to
spin in the case of lock contention, but it is not a big difference.
  
   Current sungem does C but uses try_lock() to acquire its private
   tx_lock. So it doesn't spin either in case of contention.
 
 
  But the spin is still there, just more complex..
  In qdisc_restart() processing of NETDEV_TX_LOCKED causes:
  spin_lock(dev->xmit_lock)

  q->requeue()
  netif_schedule(dev);

  SOFTIRQ:
  net_tx_action()
  qdisc_run() --> qdisc_restart()
 
  So instead of spinning in tight loop, you end up with a longer code
  path.

 Stephen, sorry for insisting a bit but I'm failing to see how B is
 different from C in that respect. With method B, in qdisc_restart(),
 if netif_tx_trylock() fails to acquire the lock then we also
 requeue(), etc. Same long code path in case of contention.


Method C LLTX causes repeated softirq's which will be slower since the loop
requires more instructions than a simple spin loop (Method B).


What I'm saying above is that Method B also causes repeated tx
softirqs in case of contention on netif_tx_lock. The code path is:
netif_tx_trylock() fails -> requeue() -> netif_schedule() ->
raise_softirq(NET_TX_SOFTIRQ). Am I missing anything?
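
The contention path described above can be pictured with a small userspace model. This is only a sketch under stated assumptions: `model_dev`, `model_dev_init()` and `model_qdisc_restart()` are hypothetical names, not kernel APIs, and a C11 `atomic_flag` stands in for whichever lock the trylock is attempted on. It shows that a failed trylock leads to a requeue plus a rescheduled softirq; it does not settle which method is cheaper in practice.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_dev {
    atomic_flag xmit_lock;   /* stands in for the lock the trylock targets */
    unsigned int sent;       /* packets handed to the driver */
    unsigned int requeued;   /* packets put back on the queue */
    unsigned int softirqs;   /* TX softirqs raised to retry later */
};

static void model_dev_init(struct model_dev *dev)
{
    atomic_flag_clear(&dev->xmit_lock);
    dev->sent = 0;
    dev->requeued = 0;
    dev->softirqs = 0;
}

/* One dequeue attempt. Returns true if the packet was sent. */
static bool model_qdisc_restart(struct model_dev *dev)
{
    assert(dev != NULL);

    /* Trylock: test_and_set returns the previous value, so "true"
     * means somebody else already holds the lock. */
    if (atomic_flag_test_and_set(&dev->xmit_lock)) {
        dev->requeued++;     /* models requeue() */
        dev->softirqs++;     /* models netif_schedule() raising the softirq */
        return false;
    }

    dev->sent++;             /* the driver's xmit routine would run here */
    atomic_flag_clear(&dev->xmit_lock);
    return true;
}
```

In this model every failed attempt costs one requeue and one softirq, regardless of which lock the trylock was aimed at.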


--
Eric


[sungem] proposal for a new locking strategy

2006-11-05 Thread Eric Lemoine

Hi!

Some (long) time ago benh wrote a blaming comment in sungem.c about
that driver's locking strategy. That comment basically says that we
probably don't need two spinlocks.

I agree!

Proposal:

Today's sungem effectively uses two spinlocks: lock and tx_lock.

tx_lock is held by the xmit function when sending out a packet. Lots
of functions grab tx_lock not to mess up with xmit (gem_stop_phy(),
gem_change_mtu(), etc.).

All of these funcs also take lock!

What we could do is remove tx_lock, have the above functions take
only lock, and rely on dev->_xmit_lock to protect the xmit func from
reentrance. In that case, obviously, the driver wouldn't feature LLTX
anymore.

When (re-)configuring we'd now quiesce the device, with the new
functions gem_netif_stop() and gem_full_lock(), in the same way as tg3
does.

gem_interrupt(), gem_poll(), and gem_start_xmit() could become lockless. Fast!

Basically this proposal makes the data path faster, the control path
slower, and simplifies the code by using one single spinlock within
the driver.
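
A rough userspace sketch of the shape of this proposal follows. Everything in it is an assumption made for illustration: the `model_gem_*` names are hypothetical, atomics stand in for the real stop/lock primitives, and the real driver would also have to wait for in-flight xmit and poll calls to finish before reconfiguring, which this model does not attempt.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of the proposal; not sungem code. */
struct model_gem {
    atomic_flag lock;          /* the single remaining driver lock */
    atomic_bool tx_stopped;    /* set while the control path has stopped TX */
    atomic_bool poll_stopped;  /* set while the control path has stopped poll */
    int mtu;
    unsigned int tx_packets;
};

static void model_gem_init(struct model_gem *gp, int mtu)
{
    atomic_flag_clear(&gp->lock);
    atomic_init(&gp->tx_stopped, false);
    atomic_init(&gp->poll_stopped, false);
    gp->mtu = mtu;
    gp->tx_packets = 0;
}

/* Control path, step 1: stop the data path (stands in for gem_netif_stop()). */
static void model_gem_netif_stop(struct model_gem *gp)
{
    atomic_store(&gp->poll_stopped, true);
    atomic_store(&gp->tx_stopped, true);
}

static void model_gem_netif_start(struct model_gem *gp)
{
    atomic_store(&gp->tx_stopped, false);
    atomic_store(&gp->poll_stopped, false);
}

/* Control path, step 2: take the one lock (stands in for gem_full_lock()). */
static void model_gem_full_lock(struct model_gem *gp)
{
    while (atomic_flag_test_and_set(&gp->lock))
        ;   /* spin until the lock is free */
}

static void model_gem_full_unlock(struct model_gem *gp)
{
    atomic_flag_clear(&gp->lock);
}

/* Data path: takes no driver lock. Reentrancy is assumed to be
 * prevented by the caller's transmit lock. */
static bool model_gem_start_xmit(struct model_gem *gp)
{
    if (atomic_load(&gp->tx_stopped))
        return false;   /* device is being reconfigured */
    gp->tx_packets++;
    return true;
}

/* Reconfiguration: slower, but the only place the lock is taken. */
static void model_gem_change_mtu(struct model_gem *gp, int new_mtu)
{
    model_gem_netif_stop(gp);
    model_gem_full_lock(gp);
    gp->mtu = new_mtu;
    model_gem_full_unlock(gp);
    model_gem_netif_start(gp);
}
```

The point of the split is that the cost of stopping and locking falls on the rarely used control path, while the transmit routine only reads a flag.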

If the idea seems reasonable to you guys I can go ahead and cook up something...

Thanks,

--
Eric


Re: [sungem] proposal for a new locking strategy

2006-11-05 Thread Benjamin Herrenschmidt
On Sun, 2006-11-05 at 14:00 +0100, Eric Lemoine wrote:
 Hi!
 
 Some (long) time ago benh wrote a blaming comment in sungem.c about
 that driver's locking strategy. That comment basically says that we
 probably don't need two spinlocks.

Yeah :) Note that I mostly blamed myself there ... Just never found the
time to sit down and figure out proper locking.

 I agree!
 
 Proposal:
 
 Today's sungem effectively uses two spinlocks: lock and tx_lock.
 
 tx_lock is held by the xmit function when sending out a packet. Lots
 of functions grab tx_lock not to mess up with xmit (gem_stop_phy(),
 gem_change_mtu(), etc.).
 
 All of these funcs also take lock!
 
 What we could do is remove tx_lock, have the above functions take
 only lock, and rely on dev->_xmit_lock to protect the xmit func from
 reentrance. In that case, obviously, the driver wouldn't feature LLTX
 anymore.

We could probably do even better but yeah, a single lock is a good
start. Overall, I'm unhappy with the infrastructure provided by the
network stack though. (Might be better nowadays, but last I looked, for
example, I couldn't properly do things like stopping NAPI poll from
set_multicast etc... due to locks held by the upper level).

 When (re-)configuring we'd now quiesce the device, with the new
 functions gem_netif_stop() and gem_full_lock(), in the same way as tg3
 does.
 
 gem_interrupt(), gem_poll(), and gem_start_xmit() could become lockless. Fast!
 
 Basically this proposal makes the data path faster, the control path
 slower, and simplifies the code by using one single spinlock within
 the driver.
 
 If the idea seems reasonable to you guys I can go ahead and cook up 
 something...

It certainly does. Bring on the patch! :-)

Ben.




Re: [sungem] proposal for a new locking strategy

2006-11-05 Thread Eric Lemoine

On 11/5/06, Benjamin Herrenschmidt [EMAIL PROTECTED] wrote:

On Sun, 2006-11-05 at 14:00 +0100, Eric Lemoine wrote:
 Hi!

 Some (long) time ago benh wrote a blaming comment in sungem.c about
 that driver's locking strategy. That comment basically says that we
 probably don't need two spinlocks.

Yeah :) Note that I mostly blamed myself there ... Just never found the
time to sit down and figure out proper locking.


I actually did introduce tx_lock! So you could well have blamed me :-)




 I agree!

 Proposal:

 Today's sungem effectively uses two spinlocks: lock and tx_lock.

 tx_lock is held by the xmit function when sending out a packet. Lots
 of functions grab tx_lock not to mess up with xmit (gem_stop_phy(),
 gem_change_mtu(), etc.).

 All of these funcs also take lock!

 What we could do is remove tx_lock, have the above functions take
 only lock, and rely on dev->_xmit_lock to protect the xmit func from
 reentrance. In that case, obviously, the driver wouldn't feature LLTX
 anymore.

We could probably do even better but yeah, a single lock is a good
start. Overall, I'm unhappy with the infrastructure provided by the
network stack though. (Might be better nowadays, but last I looked, for
example, I couldn't properly do things like stopping NAPI poll from
set_multicast etc... due to locks held by the upper level).


What you said in your comment is that set_multicast and change_mtu
cannot schedule() because the upper layer holds a spinlock. This is
still the case actually.



 When (re-)configuring we'd now quiesce the device, with the new
 functions gem_netif_stop() and gem_full_lock(), in the same way as tg3
 does.

 gem_interrupt(), gem_poll(), and gem_start_xmit() could become lockless. Fast!

 Basically this proposal makes the data path faster, the control path
 slower, and simplifies the code by using one single spinlock within
 the driver.

 If the idea seems reasonable to you guys I can go ahead and cook up 
something...

It certainly does. Bring on the patch! :-)


Will arrange some time to do it.

Thanks for your quick response.


--
Eric

