Re: Business Proposal
Dear Friend, My name is Mr. Edward Yuan, a consultant/broker. I know you might be a bit apprehensive because you do not know me. Nevertheless, I have a proposal on behalf of a client, a lucrative business that might be of mutual benefit to you. If interested in this proposition please kindly and urgently contact me for more details. Best Regards. Mr. Edward Yuan.
Re: Security enhancement proposal for kernel TLS
On 08/02/18 05:23 PM, Vakul Garg wrote:
> > I agree that Boris' patch does what you say it does - it sets keys
> > immediately after CCS instead of after FINISHED message. I disagree
> > that the kernel tls implementation currently requires that specific
> > ordering, nor do I think that it should require that ordering.
>
> The current kernel implementation assumes record sequence number to
> start from '0'. If keys have to be set after FINISHED message, then
> record sequence number need to be communicated from user space TLS
> stack to kernel. IIRC, sequence number is not part of the interface
> through which key is transferred.

The setsockopt call struct takes the key, iv, salt, and seqno:

struct tls12_crypto_info_aes_gcm_128 {
        struct tls_crypto_info info;
        unsigned char iv[TLS_CIPHER_AES_GCM_128_IV_SIZE];
        unsigned char key[TLS_CIPHER_AES_GCM_128_KEY_SIZE];
        unsigned char salt[TLS_CIPHER_AES_GCM_128_SALT_SIZE];
        unsigned char rec_seq[TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE];
};
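A minimal sketch of how a user space stack might fill the rec_seq field when keys are installed after the FINISHED record, so that the kernel does not have to assume the count starts at 0. The names demo_crypto_info and demo_set_rec_seq are hypothetical stand-ins, the leading struct tls_crypto_info header is left out, and the field sizes are restated locally on the assumption that they match the AES-GCM-128 constants in <linux/tls.h>. The sequence number is stored big-endian, as TLS record sequence numbers are.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed to mirror the AES-GCM-128 sizes in <linux/tls.h>; restated
 * here so the sketch builds without kernel headers. */
enum { IV_SIZE = 8, KEY_SIZE = 16, SALT_SIZE = 4, REC_SEQ_SIZE = 8 };

/* Hypothetical stand-in for struct tls12_crypto_info_aes_gcm_128,
 * without the leading struct tls_crypto_info member. */
struct demo_crypto_info {
	unsigned char iv[IV_SIZE];
	unsigned char key[KEY_SIZE];
	unsigned char salt[SALT_SIZE];
	unsigned char rec_seq[REC_SEQ_SIZE];
};

/* Store the next expected record sequence number in network byte
 * order (most significant byte first). */
void demo_set_rec_seq(struct demo_crypto_info *ci, uint64_t seq)
{
	assert(ci != NULL);
	for (int i = REC_SEQ_SIZE - 1; i >= 0; i--) {
		ci->rec_seq[i] = (unsigned char)(seq & 0xff);
		seq >>= 8;
	}
}
```

With TLS 1.2, where FINISHED is the first record protected by the new keys, a caller that consumed exactly one such record in user space would presumably pass 1 here before handing the real struct to setsockopt.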
RE: Security enhancement proposal for kernel TLS
> -----Original Message-----
> From: Dave Watson [mailto:davejwat...@fb.com]
> Sent: Thursday, August 2, 2018 2:17 AM
> To: Vakul Garg
> Cc: netdev@vger.kernel.org; Peter Doliwa; Boris Pismenny
> Subject: Re: Security enhancement proposal for kernel TLS
>
> On 07/31/18 10:45 AM, Vakul Garg wrote:
> > > > IIUC, with the upstream implementation of tls record layer in
> > > > kernel, the decryption of tls FINISHED message happens in kernel.
> > > > Therefore the keys are already being sent to kernel tls socket
> > > > before handshake is completed.
> > >
> > > This is incorrect.
> >
> > Let us first reach a common ground on this.
> >
> > The kernel TLS implementation can decrypt only after setting the keys
> > on the socket. The TLS message 'finished' (which is encrypted) is
> > received after receiving 'CCS' message. After the user space TLS
> > library receives CCS message, it sets the keys on kernel TLS socket.
> > Therefore, the next message in the socket receive queue which is TLS
> > finished gets decrypted in kernel only.
> >
> > Please refer to following Boris's patch on openssl. The commit log says:
> > " We choose to set this option at the earliest - just after CCS is
> > complete".
>
> I agree that Boris' patch does what you say it does - it sets keys
> immediately after CCS instead of after FINISHED message. I disagree
> that the kernel tls implementation currently requires that specific
> ordering, nor do I think that it should require that ordering.

The current kernel implementation assumes record sequence number to start
from '0'. If keys have to be set after FINISHED message, then record
sequence number need to be communicated from user space TLS stack to
kernel. IIRC, sequence number is not part of the interface through which
key is transferred.
Re: Security enhancement proposal for kernel TLS
On 07/31/18 10:45 AM, Vakul Garg wrote:
> > > IIUC, with the upstream implementation of tls record layer in kernel,
> > > the decryption of tls FINISHED message happens in kernel. Therefore
> > > the keys are already being sent to kernel tls socket before handshake
> > > is completed.
> >
> > This is incorrect.
>
> Let us first reach a common ground on this.
>
> The kernel TLS implementation can decrypt only after setting the keys on
> the socket. The TLS message 'finished' (which is encrypted) is received
> after receiving 'CCS' message. After the user space TLS library receives
> CCS message, it sets the keys on kernel TLS socket. Therefore, the next
> message in the socket receive queue which is TLS finished gets decrypted
> in kernel only.
>
> Please refer to following Boris's patch on openssl. The commit log says:
> " We choose to set this option at the earliest - just after CCS is complete".

I agree that Boris' patch does what you say it does - it sets keys
immediately after CCS instead of after FINISHED message. I disagree that
the kernel tls implementation currently requires that specific ordering,
nor do I think that it should require that ordering.
RE: Security enhancement proposal for kernel TLS
> -----Original Message-----
> From: Dave Watson [mailto:davejwat...@fb.com]
> Sent: Tuesday, July 31, 2018 2:46 AM
> To: Vakul Garg
> Cc: netdev@vger.kernel.org; Peter Doliwa; Boris Pismenny
> Subject: Re: Security enhancement proposal for kernel TLS
>
> On 07/30/18 06:31 AM, Vakul Garg wrote:
> > > It's not entirely clear how your TLS handshake daemon works - Why is
> > > it necessary to set the keys in the kernel tls socket before the
> > > handshake is completed?
> >
> > IIUC, with the upstream implementation of tls record layer in kernel,
> > the decryption of tls FINISHED message happens in kernel. Therefore
> > the keys are already being sent to kernel tls socket before handshake
> > is completed.
>
> This is incorrect.

Let us first reach a common ground on this.

The kernel TLS implementation can decrypt only after setting the keys on
the socket. The TLS message 'finished' (which is encrypted) is received
after receiving 'CCS' message. After the user space TLS library receives
CCS message, it sets the keys on kernel TLS socket. Therefore, the next
message in the socket receive queue which is TLS finished gets decrypted
in kernel only.

Please refer to following Boris's patch on openssl. The commit log says:
" We choose to set this option at the earliest - just after CCS is complete".

--
commit a01dd062a32c687630b2a860b4bb053008f09ff5
Author: Boris Pismenny
Date:   Sun Mar 11 16:18:27 2018 +0200

    ssl: Linux TLS Rx Offload

    This patch adds support for the Linux TLS Rx socket option.
    It completes the previous patch for TLS Tx offload.
    If the socket option is successful, then the receive data-path of the
    TCP socket is implemented by the kernel.
    We choose to set this option at the earliest - just after CCS is
    complete.
--

The fact that keys are handed over to kernel TLS socket can also be
verified by putting a log in tls_sw_recvmsg().

I would stop here for you to confirm my observation first.

Regards.
Vakul

> Currently the kernel TLS implementation decrypts everything after you
> set the keys on the socket. I'm suggesting that you don't set the keys
> on the socket until after the FINISHED message.
>
> > > Or, why do you need to hand off the fd to the client program before
> > > the handshake is completed?
> >
> > The fd is always owned by the client program..
> >
> > In my proposal, the applications poll their own tcp socket using
> > read/recvmsg etc. If they get handshake record, they forward it to the
> > entity running handshake agent. The handshake agent could be a linux
> > daemon or could run on a separate security processor like 'Secure
> > element' or say arm trustzone etc. The applications forward any
> > handshake message it gets backs from handshake agent to the connected
> > tcp socket. Therefore, the applications act as a forwarder of the
> > handshake messages between the peer tls endpoint and handshake agent.
> > The received data messages are absorbed by the applications themselves
> > (bypassing ssl stack completely). Similarly, the applications tx data
> > directly by writing on their socket.
> >
> > > Waiting until after handshake solves both of these issues.
> >
> > The security sensitive check which is 'Wait for handshake to finish
> > completely before accepting data' should not be the onus of the
> > application. We have enough examples in past where application
> > programmers made mistakes in setting up tls correctly. The idea is to
> > isolate tls session setting up from the applications.
>
> It's not clear to me what you gain by putting this 'handshake finished'
> notification in the kernel instead of in the client's tls library -
> you're already forwarding the handshake start notification to the
> daemon, why can't the daemon notify them back in userspace that the
> handshake is finished?
>
> If you did want to put the notification in the kernel, how would you
> handle poll on the socket, since probably both the handshake daemon and
> client might be polling the socket, but one for control messages and one
> for data?
>
> The original kernel TLS RFC did split these to two separate sockets, but
> we decided it was too complicated, and that's not how userspace TLS
> clients function today.
>
> Do you have an implementation of this? There are a bunch of tricky
> corner cases here, it might make more sense to have something concrete
> to discuss.
>
> > Further, as per tls RFC it is ok to piggyback the data records after
> > the finished handshake message. This is called early data
Re: Security enhancement proposal for kernel TLS
On 07/30/18 06:31 AM, Vakul Garg wrote:
> > It's not entirely clear how your TLS handshake daemon works - Why is
> > it necessary to set the keys in the kernel tls socket before the
> > handshake is completed?
>
> IIUC, with the upstream implementation of tls record layer in kernel, the
> decryption of tls FINISHED message happens in kernel. Therefore the keys
> are already being sent to kernel tls socket before handshake is completed.

This is incorrect. Currently the kernel TLS implementation decrypts
everything after you set the keys on the socket. I'm suggesting that you
don't set the keys on the socket until after the FINISHED message.

> > Or, why do you need to hand off the fd to the client program
> > before the handshake is completed?
>
> The fd is always owned by the client program..
>
> In my proposal, the applications poll their own tcp socket using
> read/recvmsg etc. If they get handshake record, they forward it to the
> entity running handshake agent. The handshake agent could be a linux
> daemon or could run on a separate security processor like 'Secure
> element' or say arm trustzone etc. The applications forward any
> handshake message it gets backs from handshake agent to the connected
> tcp socket. Therefore, the applications act as a forwarder of the
> handshake messages between the peer tls endpoint and handshake agent.
> The received data messages are absorbed by the applications themselves
> (bypassing ssl stack completely). Similarly, the applications tx data
> directly by writing on their socket.
>
> > Waiting until after handshake solves both of these issues.
>
> The security sensitive check which is 'Wait for handshake to finish
> completely before accepting data' should not be the onus of the
> application. We have enough examples in past where application
> programmers made mistakes in setting up tls correctly. The idea is to
> isolate tls session setting up from the applications.

It's not clear to me what you gain by putting this 'handshake finished'
notification in the kernel instead of in the client's tls library -
you're already forwarding the handshake start notification to the daemon,
why can't the daemon notify them back in userspace that the handshake is
finished?

If you did want to put the notification in the kernel, how would you
handle poll on the socket, since probably both the handshake daemon and
client might be polling the socket, but one for control messages and one
for data?

The original kernel TLS RFC did split these to two separate sockets, but
we decided it was too complicated, and that's not how userspace TLS
clients function today.

Do you have an implementation of this? There are a bunch of tricky
corner cases here, it might make more sense to have something concrete
to discuss.

> Further, as per tls RFC it is ok to piggyback the data records after the
> finished handshake message. This is called early data. But then it is
> the responsibility of applications to first complete finished message
> processing before accepting the data records.
>
> The proposal is to disallow application world seeing data records
> before handshake finishes.

You're talking about the TLS 1.3 0-RTT feature, which is indeed an
interesting case. For in-process TLS libraries, it's fairly easy to punt,
and don't set the kernel TLS keys until after the 0-RTT data + handshake
message. For an OOB handshake daemon it might indeed make more sense to
leave the data in kernelspace ... somehow.

> > > - The handshake state should fallback to 'unverified' in case a
> > > control record is seen again by kernel TLS (e.g. in case of
> > > renegotiation, post handshake client auth etc).
> >
> > Currently kernel tls sockets return an error unless you explicitly
> > handle the control record for exactly this reason.
>
> IIRC, any kind handshake message post handshake-completion is a problem
> for kernel tls.
> This includes renegotiation, post handshake client-auth etc.
>
> Please correct me if I am wrong.

You are correct, but currently kernel TLS sockets return an error unless
you explicitly handle the control message. This should be enough already
to implement your proposal.
RE: Security enhancement proposal for kernel TLS
Sorry for a delayed response.
Kindly see inline.

> -----Original Message-----
> From: Dave Watson [mailto:davejwat...@fb.com]
> Sent: Wednesday, July 25, 2018 9:30 PM
> To: Vakul Garg
> Cc: netdev@vger.kernel.org; Peter Doliwa; Boris Pismenny
> Subject: Re: Security enhancement proposal for kernel TLS
>
> You would probably get more responses if you cc the relevant people.
> Comments inline
>
> On 07/22/18 12:49 PM, Vakul Garg wrote:
> > The kernel based TLS record layer allows the user space world to use a
> > decoupled TLS implementation.
> > The applications need not be linked with TLS stack.
> > The TLS handshake can be done by a TLS daemon on the behalf of
> > applications.
> >
> > Presently, as soon as the handshake process derives keys, it pushes the
> > negotiated keys to kernel TLS .
> > Thereafter the applications can directly read and write data on their
> > TCP socket (without having to use SSL apis).
> >
> > With the current kernel TLS implementation, there is a security problem.
> > Since the kernel TLS socket does not have information about the state
> > of handshake, it allows applications to be able to receive data from the
> > peer TLS endpoint even when the handshake verification has not been
> > completed by the SSL daemon.
> > It is a security problem if applications can receive data if
> > verification of the handshake transcript is not completed (done with
> > processing tls FINISHED message).
> >
> > My proposal:
> >   - Kernel TLS should maintain state of handshake (verified or
> >     unverified).
> >     In un-verified state, data records should not be allowed pass
> >     through to the applications.
> >
> >   - Add a new control interface using which that the user space SSL
> >     stack can tell the TLS socket that handshake has been verified and
> >     DATA records can flow.
> >     In 'unverified' state, only control records should be allowed to
> >     pass and reception DATA record should be pause the receive side
> >     record decryption.
>
> It's not entirely clear how your TLS handshake daemon works - Why is
> it necessary to set the keys in the kernel tls socket before the
> handshake is completed?

IIUC, with the upstream implementation of tls record layer in kernel, the
decryption of tls FINISHED message happens in kernel. Therefore the keys
are already being sent to kernel tls socket before handshake is completed.

> Or, why do you need to hand off the fd to the client program
> before the handshake is completed?

The fd is always owned by the client program..
The client program opens up the socket, TCP bind/connect it and then hands
it over to SSL stack as a transport handle for exchanging handshake
messages. This is how it works today whether we use kernel TLS or not.
I do not propose to change it.

In my proposal, the applications poll their own tcp socket using
read/recvmsg etc. If they get handshake record, they forward it to the
entity running handshake agent. The handshake agent could be a linux
daemon or could run on a separate security processor like 'Secure element'
or say arm trustzone etc. The applications forward any handshake message
it gets backs from handshake agent to the connected tcp socket. Therefore,
the applications act as a forwarder of the handshake messages between the
peer tls endpoint and handshake agent.
The received data messages are absorbed by the applications themselves
(bypassing ssl stack completely). Similarly, the applications tx data
directly by writing on their socket.

> Waiting until after handshake solves both of these issues.

The security sensitive check which is 'Wait for handshake to finish
completely before accepting data' should not be the onus of the
application. We have enough examples in past where application programmers
made mistakes in setting up tls correctly. The idea is to isolate tls
session setting up from the applications.

> I'm not aware of any tls libraries that send data before the finished
> message, is there any reason you need to support this?

Sending data records before sending finished message is a protocol error.
A good tls library never does that. But an attacker can exploit it if
applications can receive the data records before handshake is finished.
With current kernel TLS, it is possible to do so.

Further, as per tls RFC it is ok to piggyback the data records after the
finished handshake message. This is called early data. But then it is the
responsibility of applications to first complete finished message
processing before accepting the data records.

The proposal is to disallow application world seeing data records before
handshake finishes.

> >   - The handshake state should fallback to 'unverified' in case a
> >     control record is seen again by k
Re: Security enhancement proposal for kernel TLS
You would probably get more responses if you cc the relevant people.
Comments inline

On 07/22/18 12:49 PM, Vakul Garg wrote:
> The kernel based TLS record layer allows the user space world to use a
> decoupled TLS implementation.
> The applications need not be linked with TLS stack.
> The TLS handshake can be done by a TLS daemon on the behalf of
> applications.
>
> Presently, as soon as the handshake process derives keys, it pushes the
> negotiated keys to kernel TLS .
> Thereafter the applications can directly read and write data on their
> TCP socket (without having to use SSL apis).
>
> With the current kernel TLS implementation, there is a security problem.
> Since the kernel TLS socket does not have information about the state of
> handshake, it allows applications to be able to receive data from the
> peer TLS endpoint even when the handshake verification has not been
> completed by the SSL daemon.
> It is a security problem if applications can receive data if
> verification of the handshake transcript is not completed (done with
> processing tls FINISHED message).
>
> My proposal:
>   - Kernel TLS should maintain state of handshake (verified or
>     unverified).
>     In un-verified state, data records should not be allowed pass
>     through to the applications.
>
>   - Add a new control interface using which that the user space SSL
>     stack can tell the TLS socket that handshake has been verified and
>     DATA records can flow.
>     In 'unverified' state, only control records should be allowed to
>     pass and reception DATA record should be pause the receive side
>     record decryption.

It's not entirely clear how your TLS handshake daemon works - Why is
it necessary to set the keys in the kernel tls socket before the handshake
is completed? Or, why do you need to hand off the fd to the client program
before the handshake is completed?

Waiting until after handshake solves both of these issues.

I'm not aware of any tls libraries that send data before the finished
message, is there any reason you need to support this?

>   - The handshake state should fallback to 'unverified' in case a
>     control record is seen again by kernel TLS (e.g. in case of
>     renegotiation, post handshake client auth etc).

Currently kernel tls sockets return an error unless you explicitly
handle the control record for exactly this reason.

If you want an external daemon to handle control messages after
handshake, there definitely might be some synchronization that would
make sense to push in the kernel. However, with TLS 1.3 removing
renegotiation (and currently reneg is not implemented in kernel tls
anyway), there's much less reason to do so.
Security enhancement proposal for kernel TLS
Hi

The kernel based TLS record layer allows the user space world to use a
decoupled TLS implementation.
The applications need not be linked with TLS stack.
The TLS handshake can be done by a TLS daemon on the behalf of applications.

Presently, as soon as the handshake process derives keys, it pushes the
negotiated keys to kernel TLS .
Thereafter the applications can directly read and write data on their TCP
socket (without having to use SSL apis).

With the current kernel TLS implementation, there is a security problem.
Since the kernel TLS socket does not have information about the state of
handshake, it allows applications to be able to receive data from the peer
TLS endpoint even when the handshake verification has not been completed
by the SSL daemon.
It is a security problem if applications can receive data if verification
of the handshake transcript is not completed (done with processing tls
FINISHED message).

My proposal:
  - Kernel TLS should maintain state of handshake (verified or unverified).
    In un-verified state, data records should not be allowed pass through
    to the applications.

  - Add a new control interface using which that the user space SSL stack
    can tell the TLS socket that handshake has been verified and DATA
    records can flow.
    In 'unverified' state, only control records should be allowed to pass
    and reception DATA record should be pause the receive side record
    decryption.

  - The handshake state should fallback to 'unverified' in case a control
    record is seen again by kernel TLS (e.g. in case of renegotiation,
    post handshake client auth etc).

Kindly comment.

Regards

Vakul
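The three points of the proposal can be read as a small receive-side state machine. The sketch below only illustrates that reading and is not kernel code: demo_gate and its functions are hypothetical names, and treating every record that is not application data as a "control record" is an assumption. The record type values are the standard TLS content types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Standard TLS record content types. */
enum { REC_CCS = 20, REC_ALERT = 21, REC_HANDSHAKE = 22, REC_APP_DATA = 23 };

enum hs_state { HS_UNVERIFIED, HS_VERIFIED };

/* Hypothetical per-socket handshake state. */
struct demo_gate {
	enum hs_state state;
};

/* A new socket starts out unverified. */
void demo_gate_init(struct demo_gate *g)
{
	g->state = HS_UNVERIFIED;
}

/* Stand-in for the proposed control interface: the user space stack
 * reports that the handshake transcript has been verified. */
void demo_gate_mark_verified(struct demo_gate *g)
{
	g->state = HS_VERIFIED;
}

/* Returns true if the record may be passed up, false if the receive
 * side should pause. Control records always pass, and seeing one
 * drops the state back to unverified. */
bool demo_gate_on_record(struct demo_gate *g, uint8_t content_type)
{
	assert(g != NULL);
	if (content_type != REC_APP_DATA) {
		g->state = HS_UNVERIFIED;
		return true;
	}
	return g->state == HS_VERIFIED;
}
```

In this reading, a data record that arrives before demo_gate_mark_verified() is held back, and a handshake record after verification (for example a renegotiation) closes the gate again until the user space stack re-verifies.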
Proposal
Hello, I have a business proposal of mutual benefit that I would like to discuss with you. I asked before and I still await your positive response. Thanks.
Proposal
Hello, I have a business proposal of mutual benefit that I would like to discuss with you.
Business Proposal
I am Sgt. Brenda Wilson, originally from Lake Jackson, Texas, USA. I personally did some research and came across your information. I am presently writing this mail to you from the U.S. military base in Kabul, Afghanistan. I have a secured business proposal for you. Reply for more details via my private e-mail ( brendawilson...@hotmail.com )
Proposal
Good day, I know you do not know me personally, but I have checked your profile and I see generosity in you. There is an urgent offer attached to your name here in the office of Mr. Fawaz KhE. Al Saleh, Member of the Board of Directors, Kuveyt Türk Participation Bank (Turkey) and head of private banking and wealth management. Regards, Mr. Fawaz KhE. Al Saleh
Proposal
-- Hello I have been trying to contact you. Did you get my business proposal? Best Regards, Miss.Victoria Mehmet
Lucrative Business Proposal
-- Dear Friend, I would like to discuss a very important issue with you. I am writing to find out if this is your valid email. Please, let me know if this email is valid Kind regards Adrien Saif Attorney to Quatif Group of Companies
Proposal
Hello, I have been trying to contact you. Did you get my business proposal? Best Regards, Miss. Zeliha ömer faruk Esentepe Mahallesi Büyükdere Caddesi Kristal Kule Binasi No:215 Sisli - Istanbul, Turkey
Proposal
Hello, greetings to you. I have a business proposal for you; please contact me for more details as soon as possible. Thanks. Best Regards, Miss. Zeliha ömer faruk Esentepe Mahallesi Büyükdere Caddesi Kristal Kule Binasi No:215 Sisli - Istanbul, Turkey
Proposal
Hello, greetings to you today. I asked before but I didn't get a response. I know this might come to you as a surprise because you do not know me personally. I have a business proposal for our mutual benefit; please let me know if you are interested. Best Regards, Esentepe Mahallesi Büyükdere Caddesi Kristal Kule Binasi No:215 Sisli - Istanbul, Turkey
Proposal
Hello, greetings to you today. I asked before but I didn't get a response. I know this might come to you as a surprise because you do not know me personally. I have a business proposal for you; please reply for more info. Best Regards, Esentepe Mahallesi Büyükdere Caddesi Kristal Kule Binasi No:215 Sisli - Istanbul, Turkey
Business Proposal,
Hello Dear Greetings to you, please I have a very important business proposal for our mutual benefit, please let me know if you are interested. Best Regards, Miss. Zeliha ömer Faruk Caddesi Kristal Kule Binasi No:215
Proposal
Hello, greetings to you. Did you get my previous email regarding my investment proposal last Friday? Ms. Zeliha ömer faruk zeliha.omer.fa...@gmail.com
Business Proposal
I have a business proposal for you; contact me directly. This business has a cash involvement of $250,000,000.00. Anders Karlsson
[PATCH net-next 1/3] net/smc: restructure netinfo for CLC proposal msgs
From: Karsten Graul <kgr...@linux.vnet.ibm.com>

Introduce functions smc_clc_prfx_set to retrieve IP information for the
CLC proposal msg and smc_clc_prfx_match to match the contents of a
proposal message against the IP addresses of the net device. The new
functions replace the functionality provided by smc_clc_netinfo_by_tcpsk,
which is removed by this patch. The match functionality is extended to
scan all ipv4 addresses of the net device for a match against the
ipv4 subnet from the proposal msg.

Signed-off-by: Karsten Graul <kgr...@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubr...@linux.vnet.ibm.com>
---
 net/smc/af_smc.c  |  14 ++--
 net/smc/smc_clc.c | 100 +-
 net/smc/smc_clc.h |   4 +--
 3 files changed, 82 insertions(+), 36 deletions(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 649489f825a5..949a2714a453 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -767,8 +767,6 @@ static void smc_listen_work(struct work_struct *work)
 	struct smc_link *link;
 	int reason_code = 0;
 	int rc = 0;
-	__be32 subnet;
-	u8 prefix_len;
 	u8 ibport;
 
 	/* check if peer is smc capable */
@@ -803,17 +801,11 @@ static void smc_listen_work(struct work_struct *work)
 		goto decline_rdma;
 	}
 
-	/* determine subnet and mask from internal TCP socket */
-	rc = smc_clc_netinfo_by_tcpsk(newclcsock, &subnet, &prefix_len);
-	if (rc) {
-		reason_code = SMC_CLC_DECL_CNFERR; /* configuration error */
-		goto decline_rdma;
-	}
-
 	pclc = (struct smc_clc_msg_proposal *)&buf;
 	pclc_prfx = smc_clc_proposal_get_prefix(pclc);
-	if (pclc_prfx->outgoing_subnet != subnet ||
-	    pclc_prfx->prefix_len != prefix_len) {
+
+	rc = smc_clc_prfx_match(newclcsock, pclc_prfx);
+	if (rc) {
 		reason_code = SMC_CLC_DECL_CNFERR; /* configuration error */
 		goto decline_rdma;
 	}
diff --git a/net/smc/smc_clc.c b/net/smc/smc_clc.c
index 874c5a75d6dd..dc3a2235978d 100644
--- a/net/smc/smc_clc.c
+++ b/net/smc/smc_clc.c
@@ -74,15 +74,35 @@ static bool smc_clc_msg_hdr_valid(struct smc_clc_msg_hdr *clcm)
 	return true;
 }
 
-/* determine subnet and mask of internal TCP socket */
-int smc_clc_netinfo_by_tcpsk(struct socket *clcsock,
-			     __be32 *subnet, u8 *prefix_len)
+/* find ipv4 addr on device and get the prefix len, fill CLC proposal msg */
+static int smc_clc_prfx_set4_rcu(struct dst_entry *dst, __be32 ipv4,
+				 struct smc_clc_msg_proposal_prefix *prop)
+{
+	struct in_device *in_dev = __in_dev_get_rcu(dst->dev);
+
+	if (!in_dev)
+		return -ENODEV;
+	for_ifa(in_dev) {
+		if (!inet_ifa_match(ipv4, ifa))
+			continue;
+		prop->prefix_len = inet_mask_len(ifa->ifa_mask);
+		prop->outgoing_subnet = ifa->ifa_address & ifa->ifa_mask;
+		/* prop->ipv6_prefixes_cnt = 0; already done by memset before */
+		return 0;
+	} endfor_ifa(in_dev);
+	return -ENOENT;
+}
+
+/* retrieve and set prefixes in CLC proposal msg */
+static int smc_clc_prfx_set(struct socket *clcsock,
+			    struct smc_clc_msg_proposal_prefix *prop)
 {
 	struct dst_entry *dst = sk_dst_get(clcsock->sk);
-	struct in_device *in_dev;
-	struct sockaddr_in addr;
+	struct sockaddr_storage addrs;
+	struct sockaddr_in *addr;
 	int rc = -ENOENT;
 
+	memset(prop, 0, sizeof(*prop));
 	if (!dst) {
 		rc = -ENOTCONN;
 		goto out;
@@ -91,22 +111,58 @@ int smc_clc_netinfo_by_tcpsk(struct socket *clcsock,
 		rc = -ENODEV;
 		goto out_rel;
 	}
-	/* get address to which the internal TCP socket is bound */
-	kernel_getsockname(clcsock, (struct sockaddr *)&addr);
-	/* analyze IPv4 specific data of net_device belonging to TCP socket */
+	kernel_getsockname(clcsock, (struct sockaddr *)&addrs);
+	/* analyze IP specific data of net_device belonging to TCP socket */
 	rcu_read_lock();
-	in_dev = __in_dev_get_rcu(dst->dev);
+	if (addrs.ss_family == PF_INET) {
+		/* IPv4 */
+		addr = (struct sockaddr_in *)&addrs;
+		rc = smc_clc_prfx_set4_rcu(dst, addr->sin_addr.s_addr, prop);
+	}
+	rcu_read_unlock();
+out_rel:
+	dst_release(dst);
+out:
+	return rc;
+}
+
+/* match ipv4 addrs of dev against addr in CLC proposal */
+static int smc_clc_prfx_match4_rcu(struct net_device *dev,
+				   struct smc_clc_msg_proposal_prefix *prop)
+{
+	struct in_device *in_dev = __in_dev_get_rcu(dev);
+
+	if (!in_dev)
+		return -ENODEV;
 	for_ifa(in_dev) {
-		if (!inet_ifa_match(ad
Proposal
Hello,

Greetings to you and everyone around you. Did you get my previous email regarding my proposal? Please let me know if we can work together on this.

Best Regards
Business Proposal Of $18,100,000.00
Dear Sir/Madam,

My name is Youichi Kanno and I work in an Audit & Credit Supervisory role at The Norinchukin Bank. I am contacting you regarding the asset of a deceased client, Mr. Grigor Kassan, and I need your assistance to process the fund claims of $18,100,000.00. If interested, get back to me so we can discuss the logistics of moving the funds to a safe offshore bank.

Yours sincerely,
Youichi Kanno
[PATCH net-next 6/6] smc: support variable CLC proposal messages
According to RFC7609 [1] the CLC proposal message contains an area of
unknown length for future growth. Additionally it may contain up to
8 IPv6 prefixes. The current version of the SMC-code does not
understand CLC proposal messages using these variable length fields and,
thus, is incompatible with SMC implementations in other operating systems.

This patch makes sure SMC understands incoming CLC proposals
* with arbitrary length values for future growth
* with up to 8 IPv6 prefixes

[1] SMC-R Informational RFC: http://www.rfc-editor.org/info/rfc7609

Signed-off-by: Ursula Braun <ubr...@linux.vnet.ibm.com>
Reviewed-by: Hans Wippel <hwip...@linux.vnet.ibm.com>
---
 net/smc/af_smc.c  | 15 ++
 net/smc/smc_clc.c | 82 ++-
 net/smc/smc_clc.h | 34 +++
 3 files changed, 107 insertions(+), 24 deletions(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index d3ae0d5b1677..daf8075f5a4c 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -751,14 +751,16 @@ static void smc_listen_work(struct work_struct *work)
 {
 	struct smc_sock *new_smc = container_of(work, struct smc_sock,
 						smc_listen_work);
+	struct smc_clc_msg_proposal_prefix *pclc_prfx;
 	struct socket *newclcsock = new_smc->clcsock;
 	struct smc_sock *lsmc = new_smc->listen_smc;
 	struct smc_clc_msg_accept_confirm cclc;
 	int local_contact = SMC_REUSE_CONTACT;
 	struct sock *newsmcsk = &new_smc->sk;
-	struct smc_clc_msg_proposal pclc;
+	struct smc_clc_msg_proposal *pclc;
 	struct smc_ib_device *smcibdev;
 	struct sockaddr_in peeraddr;
+	u8 buf[SMC_CLC_MAX_LEN];
 	struct smc_link *link;
 	int reason_code = 0;
 	int rc = 0, len;
@@ -775,7 +777,7 @@ static void smc_listen_work(struct work_struct *work)
 	/* do inband token exchange -
 	 *wait for and receive SMC Proposal CLC message
 	 */
-	reason_code = smc_clc_wait_msg(new_smc, &pclc, sizeof(pclc),
+	reason_code = smc_clc_wait_msg(new_smc, &buf, sizeof(buf),
 				       SMC_CLC_PROPOSAL);
 	if (reason_code < 0)
 		goto out_err;
@@ -804,8 +806,11 @@ static void smc_listen_work(struct work_struct *work)
 		reason_code = SMC_CLC_DECL_CNFERR; /* configuration error */
 		goto decline_rdma;
 	}
-	if ((pclc.outgoing_subnet != subnet) ||
-	    (pclc.prefix_len != prefix_len)) {
+
+	pclc = (struct smc_clc_msg_proposal *)&buf;
+	pclc_prfx = smc_clc_proposal_get_prefix(pclc);
+	if (pclc_prfx->outgoing_subnet != subnet ||
+	    pclc_prfx->prefix_len != prefix_len) {
 		reason_code = SMC_CLC_DECL_CNFERR; /* configuration error */
 		goto decline_rdma;
 	}
@@ -816,7 +821,7 @@ static void smc_listen_work(struct work_struct *work)
 	/* allocate connection / link group */
 	mutex_lock(&smc_create_lgr_pending);
 	local_contact = smc_conn_create(new_smc, peeraddr.sin_addr.s_addr,
-					smcibdev, ibport, &pclc.lcl, 0);
+					smcibdev, ibport, &pclc->lcl, 0);
 	if (local_contact < 0) {
 		rc = local_contact;
 		if (rc == -ENOMEM)
diff --git a/net/smc/smc_clc.c b/net/smc/smc_clc.c
index f5e17d29112b..abf7ceb6690b 100644
--- a/net/smc/smc_clc.c
+++ b/net/smc/smc_clc.c
@@ -22,6 +22,54 @@
 #include "smc_clc.h"
 #include "smc_ib.h"
 
+/* check if received message has a correct header length and contains valid
+ * heading and trailing eyecatchers
+ */
+static bool smc_clc_msg_hdr_valid(struct smc_clc_msg_hdr *clcm)
+{
+	struct smc_clc_msg_proposal_prefix *pclc_prfx;
+	struct smc_clc_msg_accept_confirm *clc;
+	struct smc_clc_msg_proposal *pclc;
+	struct smc_clc_msg_decline *dclc;
+	struct smc_clc_msg_trail *trl;
+
+	if (memcmp(clcm->eyecatcher, SMC_EYECATCHER, sizeof(SMC_EYECATCHER)))
+		return false;
+	switch (clcm->type) {
+	case SMC_CLC_PROPOSAL:
+		pclc = (struct smc_clc_msg_proposal *)clcm;
+		pclc_prfx = smc_clc_proposal_get_prefix(pclc);
+		if (ntohs(pclc->hdr.length) !=
+			sizeof(*pclc) + ntohs(pclc->iparea_offset) +
+			sizeof(*pclc_prfx) +
+			pclc_prfx->ipv6_prefixes_cnt *
+				sizeof(struct smc_clc_ipv6_prefix) +
+			sizeof(*trl))
+			return false;
+		trl = (struct smc_clc_msg_trail *)
+			((u8 *)pclc + ntohs(pclc->hdr.length) - sizeof(*trl));
+		break;
+	case SMC_CLC_ACCEPT:
+	case SMC_CLC_CONFIRM:
+		clc = (struct smc_clc_msg_accept_confirm *)clcm;
+		if (nt
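
The length check in the SMC_CLC_PROPOSAL case above can be read as a simple sum: fixed proposal part, plus the variable "future growth" area, plus the prefix block, plus one entry per IPv6 prefix, plus the trailer. Below is a minimal userspace sketch of that arithmetic. The `struct clc_sizes` type and the function name are hypothetical, and the sizes are passed in as parameters because the real values come from `sizeof()` on the structs in net/smc/smc_clc.h, which are not shown in this excerpt.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the fixed part sizes. In the patch these are
 * sizeof(*pclc), sizeof(*pclc_prfx), sizeof(struct smc_clc_ipv6_prefix)
 * and sizeof(*trl); the actual byte counts are not given here.
 */
struct clc_sizes {
	size_t proposal;    /* fixed proposal part, including the header */
	size_t prefix;      /* IPv4 prefix block */
	size_t ipv6_prefix; /* one IPv6 prefix entry */
	size_t trailer;     /* trailing eyecatcher */
};

/* Return true if the length announced in the header equals the sum of all
 * fixed and variable parts. hdr_len and iparea_offset are expected in host
 * byte order, i.e. the caller has already applied ntohs().
 */
static bool clc_proposal_len_ok(uint16_t hdr_len, uint16_t iparea_offset,
				uint8_t ipv6_prefixes_cnt,
				const struct clc_sizes *sz)
{
	size_t expected = sz->proposal + iparea_offset + sz->prefix +
			  (size_t)ipv6_prefixes_cnt * sz->ipv6_prefix +
			  sz->trailer;

	return hdr_len == expected;
}
```

A caller would fill `struct clc_sizes` once and reject any proposal for which the function returns false, mirroring the `return false` in the hunk above.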
[PATCH net-next 07/12] rxrpc: Don't transmit DELAY ACKs immediately on proposal
Don't transmit a DELAY ACK immediately on proposal when the Rx window is
rotated, but rather defer it to the work function. This means that we have
a chance to queue/consume more received packets before we actually send the
DELAY ACK, or even cancel it entirely, thereby reducing the number of
packets transmitted.

We do, however, want to continue sending other types of packet immediately,
particularly REQUESTED ACKs, as they may be used for RTT calculation by the
other side.

Signed-off-by: David Howells <dhowe...@redhat.com>
---
 net/rxrpc/recvmsg.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c
index 0b6609da80b7..fad5f42a3abd 100644
--- a/net/rxrpc/recvmsg.c
+++ b/net/rxrpc/recvmsg.c
@@ -219,9 +219,9 @@ static void rxrpc_rotate_rx_window(struct rxrpc_call *call)
 		    after_eq(top, call->ackr_seen + 2) ||
 		    (hard_ack == top && after(hard_ack, call->ackr_consumed)))
 			rxrpc_propose_ACK(call, RXRPC_ACK_DELAY, 0, serial,
-					  true, false,
+					  true, true,
 					  rxrpc_propose_ack_rotate_rx);
-		if (call->ackr_reason)
+		if (call->ackr_reason && call->ackr_reason != RXRPC_ACK_DELAY)
 			rxrpc_send_ack_packet(call, false);
 	}
 }
Re: [virtio-dev] repost: af_packet vs virtio (was packed ring layout proposal v2)
On Wed, Aug 02, 2017 at 04:50:03PM +0300, Michael S. Tsirkin wrote:
> On Tue, Aug 01, 2017 at 08:54:27PM -0700, Steven Luong wrote:
> > * Descriptor ring:
> >
> > Guest adds descriptors with unique index values and DESC_HW set in flags.
> > Host overwrites used descriptors with correct len, index, and DESC_HW
> > clear. Flags are always set/cleared last.
> >
> > #define DESC_HW 0x0080
> >
> > struct desc {
> >         __le64 addr;
> >         __le32 len;
> >         __le16 index;
> >         __le16 flags;
> > };
> >
> > When DESC_HW is set, descriptor belongs to device. When it is clear,
> > it belongs to the driver.
> >
> > We can use 1 bit to set direction
> > /* This marks a buffer as write-only (otherwise read-only). */
> > #define VRING_DESC_F_WRITE      2
> >
> > * Scatter/gather support
> >
> > We can use 1 bit to chain s/g entries in a request, same as virtio 1.0:
> >
> > /* This marks a buffer as continuing via the next field. */

"next field" reads like a structure field in the software; maybe we need to
change "next field" to "next desc" to avoid misunderstanding.

> >
> > This comment here is confusing to me. In 1.0, virtq_desc has the next field.
> > When the flag VRING_DESC_F_NEXT is set, the next entry to continue is
> > specified in the next field.
> >
> > Here in 1.1, struct desc does not have the next field, only addr, len,
> > index, and flags. So when VRING_DESC_F_NEXT is set in struct desc's flags
> > field, where is the next entry to continue the current descriptor, the
> > entry immediately following the current entry? ie, if the current entry is
> > at index 10 in the descriptor table and its flags is set for
> > VRING_DESC_F_NEXT, is the entry continuing the current entry in index 11?
> >
> > Steven
>
> Exactly, you got it right.
>
> -
> To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
Re: [virtio-dev] repost: af_packet vs virtio (was packed ring layout proposal v2)
On Tue, Aug 01, 2017 at 08:54:27PM -0700, Steven Luong wrote:
> * Descriptor ring:
>
> Guest adds descriptors with unique index values and DESC_HW set in flags.
> Host overwrites used descriptors with correct len, index, and DESC_HW
> clear. Flags are always set/cleared last.
>
> #define DESC_HW 0x0080
>
> struct desc {
>         __le64 addr;
>         __le32 len;
>         __le16 index;
>         __le16 flags;
> };
>
> When DESC_HW is set, descriptor belongs to device. When it is clear,
> it belongs to the driver.
>
> We can use 1 bit to set direction
> /* This marks a buffer as write-only (otherwise read-only). */
> #define VRING_DESC_F_WRITE      2
>
> * Scatter/gather support
>
> We can use 1 bit to chain s/g entries in a request, same as virtio 1.0:
>
> /* This marks a buffer as continuing via the next field. */
>
>
> This comment here is confusing to me. In 1.0, virtq_desc has the next field.
> When the flag VRING_DESC_F_NEXT is set, the next entry to continue is
> specified in the next field.
>
> Here in 1.1, struct desc does not have the next field, only addr, len, index,
> and flags. So when VRING_DESC_F_NEXT is set in struct desc's flags field,
> where is the next entry to continue the current descriptor, the entry
> immediately following the current entry? ie, if the current entry is at
> index 10 in the descriptor table and its flags is set for VRING_DESC_F_NEXT,
> is the entry continuing the current entry in index 11?
>
> Steven

Exactly, you got it right.
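
To make the confirmed behaviour concrete: with no next field in the descriptor, a chained request simply occupies consecutive ring slots. The sketch below counts the entries of one chain under that reading. The function name is hypothetical, the struct uses host-endian integer types instead of the `__le` types from the proposal, and wrapping from the last slot back to slot 0 is an assumption that the thread does not spell out.

```c
#include <stdint.h>

#define VRING_DESC_F_NEXT 1 /* buffer continues in the following ring slot */

/* Same layout as the proposal's struct desc, host-endian for simplicity */
struct desc {
	uint64_t addr;
	uint32_t len;
	uint16_t index;
	uint16_t flags;
};

/* Count the descriptors that make up the chain starting at slot 'head'.
 * A set VRING_DESC_F_NEXT means the chain continues in slot head + 1
 * (assumed to wrap at ring_size). The walk is capped at ring_size so a
 * ring with the flag set everywhere cannot loop forever.
 */
static unsigned int desc_chain_len(const struct desc *ring,
				   unsigned int ring_size, unsigned int head)
{
	unsigned int i = head;
	unsigned int n = 1;

	while ((ring[i].flags & VRING_DESC_F_NEXT) && n < ring_size) {
		i = (i + 1) % ring_size;
		n++;
	}
	return n;
}
```

For Steven's example, a descriptor at slot 10 with VRING_DESC_F_NEXT set and a slot 11 without it yields a chain length of 2.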
:::::::::BUSINESS PROPOSAL::::::
Attn, My name is Johnson King, the principal attorney of my law firm., Johnson King & Co. A deceased client Mr. Henry died in 2010 and left a sum little above US$ 28 million in his account here in Unity Bank Plc. Normally banking procedures requires that the bank declares the account forfeitable and transfer the proceeds to the Registry of Unclaimed Property for government use after 8 years from the time of the death of the diseased client. The present situation made me to contact you given that you and my deceased client share the same last name and nationality which made it favorably disposed towards this proposals to present you as the Cestui Que trust and administrator of the account. It may also interest you to know that the transaction will be executed within the ambit of law and nothing shall be done outside of it.If you are not familiar with estate and probate measures, I shall send further information to you concerning these once i get a positive response. Whereas We will discuss the ratios succinctly and promote them in written signed agreement before commencement. I wish to submit that I would expect nothing less but honesty and transparency. I will uncover to you further information on the matter in our following communications. If this business interests you kindly revert with your direct phone number for further exhaustive phone talk. I look forward to having a good business relationship with you. Yours sincerely, Johnson King & Co
repost: af_packet vs virtio (was packed ring layout proposal v2)
On Fri, Apr 14, 2017 at 05:42:58AM +0300, Michael S. Tsirkin wrote:
> Hi all, I wanted to raise the question of similarities between virtio
> and new zero copy af_packet interfaces.
>
> First I would like to mention that virtio device development isn't spec
> limited - spec is there to help interoperability and add peace of mind
> for people worried about IPR.
>
> So I tend to accept patches without requiring people write it up in the
> spec as work on spec proceeds at its own pace - all I ask is that the
> virtio mailing list is copied, this requires contributor to subscribe
> and in the process contributor promises that it's ok for us to add this
> to spec in the future.
>
> There shouldn't thus be a fundamental problem preventing use of virtio
> format or reusing some of the code for af_packet, but it still might or
> might not make sense - it was designed for CPU to CPU communication so
> it seems to make sense though. So I would like that discussion to
> happen even if we decide against.
>
> And even if people decide against, the problem space is very similar. You
> can look up packed ring layout proposal v2 - should I repost here? Our
> prototyping shows significant performance improvements from using it as
> compared to head/tail layout.
>
> To start this discussion I'm going to reply to this email reposting a
> copy of the simplified virtio layout that might be appropriate for
> af_packet as well.

Here's the repost (slightly cut down) sorry about the duplicates.

The idea is to have a r/w descriptor in a ring structure,
replacing the used and available ring, index and descriptor
buffer.

* Descriptor ring:

Guest adds descriptors with unique index values and DESC_HW set in flags.
Host overwrites used descriptors with correct len, index, and DESC_HW
clear. Flags are always set/cleared last.

#define DESC_HW 0x0080

struct desc {
        __le64 addr;
        __le32 len;
        __le16 index;
        __le16 flags;
};

When DESC_HW is set, descriptor belongs to device. When it is clear,
it belongs to the driver.

We can use 1 bit to set direction
/* This marks a buffer as write-only (otherwise read-only). */
#define VRING_DESC_F_WRITE      2

* Scatter/gather support

We can use 1 bit to chain s/g entries in a request, same as virtio 1.0:

/* This marks a buffer as continuing via the next field. */
#define VRING_DESC_F_NEXT       1

Unlike virtio 1.0, all descriptors must have distinct ID values.

Also unlike virtio 1.0, use of this flag will be an optional feature
(e.g. VIRTIO_F_DESC_NEXT) so both devices and drivers can opt out of it.

* Indirect buffers

Can be marked like in virtio 1.0:

/* This means the buffer contains a table of buffer descriptors. */
#define VRING_DESC_F_INDIRECT   4

Unlike virtio 1.0, this is a table, not a list:

struct indirect_descriptor_table {
        /* The actual descriptors (16 bytes each) */
        struct virtq_desc desc[len / 16];
};

The first descriptor is located at start of the indirect descriptor
table, additional indirect descriptors come immediately afterwards.
DESC_F_WRITE is the only valid flag for descriptors in the indirect
table. Others should be set to 0 and are ignored. id is also set to 0
and should be ignored.

virtio 1.0 seems to allow a s/g entry followed by
an indirect descriptor. This does not seem useful,
so we do not allow that anymore.

This support would be an optional feature, same as in virtio 1.0

* Batching descriptors:

virtio 1.0 allows passing a batch of descriptors in both directions, by
incrementing the used/avail index by values > 1. We can support this by
chaining a list of descriptors through a bit in the flags field.
To allow use together with s/g, a different bit will be used.

#define VRING_DESC_F_BATCH_NEXT 0x0010

Batching works for both driver and device descriptors.

* Processing descriptors in and out of order

Device processing all descriptors in order can simply flip
the DESC_HW bit as it is done with descriptors.

Device can write descriptors out in order as they are used, overwriting
descriptors that are there.

Device must not use a descriptor until DESC_HW is set.
It is only required to look at the first descriptor
submitted.

Driver must not overwrite a descriptor until DESC_HW is clear.
It is only required to look at the first descriptor
submitted.

* Device specific descriptor flags

We have a lot of unused space in the descriptor. This can be put to
good use by reserving some flag bits for device use.
For example, network device can set a bit to request
that header in the descriptor is suppressed
(in case it's all 0s anyway). This reduces cache utilization.

Note: this feature can be supported in virtio 1.0 as well,
as we have unused bits in both descriptor and used ring there.

* Descriptor length in device descriptors

virtio 1.0 places strict requirements on descriptor length. For example
it must be 0 in use
Proposal...
Personal Business proposal for you,contact me via my personal E-mail for more detail's: ms_teresa_a...@outlook.com
Business Proposal
-- Dear Friend, I would like to discuss a very important issue with you. I am writing to find out if this is your valid email. Please, let me know if this email is valid Kind regards Adrien Saif Attorney to Quatif Group of Companies
Proposal
Business Partnership Proposal For You,contact me via my personal E-mail for further detail's: ms_teresa_a...@outlook.com
[PATCH net-next 14/15] rxrpc: Add tracepoint for ACK proposal
Add a tracepoint to log proposed ACKs, including whether the proposal is
used to update a pending ACK or is discarded in favour of an earlier,
higher priority ACK.

Whilst we're at it, get rid of the rxrpc_acks() function and access the
name array directly. We do, however, need to validate the ACK reason
number given to trace_rxrpc_rx_ack() to make sure we don't overrun the
array.

Signed-off-by: David Howells <dhowe...@redhat.com>
---
 include/rxrpc/packet.h       |    1 +
 include/trace/events/rxrpc.h |   42 --
 net/rxrpc/ar-internal.h      |   25 +++--
 net/rxrpc/call_event.c       |   21 ++---
 net/rxrpc/input.c            |   19 +--
 net/rxrpc/misc.c             |   30 +++---
 net/rxrpc/output.c           |    3 ++-
 net/rxrpc/recvmsg.c          |    3 ++-
 8 files changed, 114 insertions(+), 30 deletions(-)

diff --git a/include/rxrpc/packet.h b/include/rxrpc/packet.h
index fd6eb3a60a8c..703a64b4681a 100644
--- a/include/rxrpc/packet.h
+++ b/include/rxrpc/packet.h
@@ -123,6 +123,7 @@ struct rxrpc_ackpacket {
 #define RXRPC_ACK_PING_RESPONSE		7	/* response to RXRPC_ACK_PING */
 #define RXRPC_ACK_DELAY			8	/* nothing happened since received packet */
 #define RXRPC_ACK_IDLE			9	/* ACK due to fully received ACK window */
+#define RXRPC_ACK__INVALID		10	/* Representation of invalid ACK reason */
 
 	uint8_t		nAcks;		/* number of ACKs */
 #define RXRPC_MAXACKS	255
diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index 9413b17ba04b..d67a8c6b085a 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -251,7 +251,7 @@ TRACE_EVENT(rxrpc_rx_ack,
 
 	    TP_printk("c=%p %s f=%08x n=%u",
 		      __entry->call,
-		      rxrpc_acks(__entry->reason),
+		      rxrpc_ack_names[__entry->reason],
 		      __entry->first,
 		      __entry->n_acks)
 	    );
@@ -314,7 +314,7 @@ TRACE_EVENT(rxrpc_tx_ack,
 	    TP_printk(" c=%p ACK %08x %s f=%08x r=%08x n=%u",
 		      __entry->call,
 		      __entry->serial,
-		      rxrpc_acks(__entry->reason),
+		      rxrpc_ack_names[__entry->reason],
 		      __entry->ack_first,
 		      __entry->ack_serial,
 		      __entry->n_acks)
@@ -505,6 +505,44 @@ TRACE_EVENT(rxrpc_rx_lose,
 		      __entry->hdr.type <= 15 ? rxrpc_pkts[__entry->hdr.type] : "?UNK")
 	    );
 
+TRACE_EVENT(rxrpc_propose_ack,
+	    TP_PROTO(struct rxrpc_call *call, enum rxrpc_propose_ack_trace why,
+		     u8 ack_reason, rxrpc_serial_t serial, bool immediate,
+		     bool background, enum rxrpc_propose_ack_outcome outcome),
+
+	    TP_ARGS(call, why, ack_reason, serial, immediate, background,
+		    outcome),
+
+	    TP_STRUCT__entry(
+		    __field(struct rxrpc_call *,		call		)
+		    __field(enum rxrpc_propose_ack_trace,	why		)
+		    __field(rxrpc_serial_t,			serial		)
+		    __field(u8,					ack_reason	)
+		    __field(bool,				immediate	)
+		    __field(bool,				background	)
+		    __field(enum rxrpc_propose_ack_outcome,	outcome		)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->call	= call;
+		    __entry->why	= why;
+		    __entry->serial	= serial;
+		    __entry->ack_reason	= ack_reason;
+		    __entry->immediate	= immediate;
+		    __entry->background	= background;
+		    __entry->outcome	= outcome;
+			   ),
+
+	    TP_printk("c=%p %s %s r=%08x i=%u b=%u%s",
+		      __entry->call,
+		      rxrpc_propose_ack_traces[__entry->why],
+		      rxrpc_ack_names[__entry->ack_reason],
+		      __entry->serial,
+		      __entry->immediate,
+		      __entry->background,
+		      rxrpc_propose_ack_outcomes[__entry->outcome])
+	    );
+
 #endif /* _TRACE_RXRPC_H */
 
 /* This part must be outside protection */
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index e564eca75985..042dbcc52654 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -689,8 +689,28 @@ enum rxrpc_timer_trace {
 e
BUSINESS PROPOSAL!!!
I am Mr.Saeed Bin Salem Executive Director and Chief Financial Officer of the National Commercial Bank Libya.I have a secured business suggestion for you reply me on my email: saeedbi...@qq.com
Proposal
I have a Business proposal for you,contact me via my personal E-mail for more detail's: ms_teresa_a...@outlook.com
Fwd: Investment Proposal
Business Proposal view the attached letter for more details'' CONFIDENTIAL INVESTMENT PROPOSAL.doc Description: MS-Word document
(Re: BUSINESS PROPOSAL!!!)
Email Address (anthony.o...@aol.com) Confidential Business Proposal! I am Mr. Anthony Oke. I have a business proposal which will benefit both of us, The amount of money involved is (Thirty Five Million Great British Pounds) which i want to transfer from an abandoned account to your bank account; it is 100% risk free. Upon the conclusion of this transaction, i accept that (50%) Fifty Percent will be for you in respect of all your assistance for this transaction and (50% ) Fifty - Percent will be for me being the pioneer of the business. A lot of customers open private accounts with different Banks without the knowledge of their families and when they die, such money will be lost to the Bank unless someone comes to claim it. This is how a lot of Bank Directors make so much money silently. I will like you to provide immediately the below information's, to enable me use it and get you next of kin application form from my bank. 1.Full Name:.. 2.Full Address:... 3.Telephone Number:... 4.Country: 5.Occupation:. 6.Age: 7.Sex: As soon as you reply through this private Email Address (anthony.o...@aol.com),I will let you know the next steps and procedures and more details to follow in order to finalize this transaction immediately. Please keep whatever information you get from me strictly confidential even if you decide not to participate in this transaction. Yours faithfully, Mr. Anthony Oke
Fund Transaction Proposal
US$23,200,000.00 Million Transaction, for further detail's contact me via my personal e- mail: ms_teresa_...@outlook.com
Proposal for per-radio configuration file.
Hello!

While working on ath10k NICs, I found a need to have one radio be
configured in one manner, and another in a different manner, and I need
this config to happen before the NIC is booted in at least some cases.

The primary reason is that the NIC has limited resources, so there is
definite need to allow the user to optimize for their use case. For
instance, more vdevs vs more peer objects.

Module parameters do not work well for this because I want different NICs
with the same driver to have different configuration.

For ath10k, I implemented this with a text file for each NIC that is loaded
with the firmware-load API, parsed in the kernel, and then used to configure
the radio on bootup.

A patch I used to do this is here. I think I ended up with a few follow-on
patches to fix some bugs, but this has the idea:

http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=6708e4047d91edf234239943332bc2f0d124d009

It seems to work fine in my testing, and is logically similar to
loading board init files and so forth (which ath10k already uses).

I am looking for feedback on this if anyone has any opinions.

The config files look like this:

]# ls -l /lib/firmware/ath10k/
total 12
-rw-r--r-- 1 root root  311 Feb 23 11:27 fwcfg-pci-:05:00.0.txt
-rw-r--r-- 1 root root  330 Feb 23 11:18 fwcfg-pci-:07:00.0.txt
rwxr-xr-x. 3 root root 4096 May 21 2015 QCA988X

]# cat /lib/firmware/ath10k/fwcfg-pci-:05:00.0.txt
# Configuration for radio 1
vdevs = 64
peers = 127
stations = 127
rate_ctrl_objs = 36
regdom = 840
fwname = firmware-2.bin
fwver = 2
nohwcrypt = 1
tx_desc = 680
#max_nss = 3
num_tids = 256
skid_limit = 128

[root@ben-ota-1 lanforge]# cat /lib/firmware/ath10k/fwcfg-pci-\:07\:00.0.txt
# Configuration for radio 2
# Driver will pick defaults for any commented-out or missing variables.
# vdevs = 8
# peers = 64
# stations = 64
# rate_ctrl_objs = 10
# regdom = 840
# fwname = firmware-5.bin
# fwver = 5
# nohwcrypt = 1
# tx_desc = 1024
#max_nss = 3
# num_tids = 128
# skid_limit = 32

Thanks,
Ben

--
Ben Greear
Candela Technologies Inc  http://www.candelatech.com
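
For readers who want to experiment with the file format shown above, here is a minimal userspace sketch of a line parser for it. It is not the parser from the linked patch, which is not reproduced in this message. The function names are hypothetical, and the grammar is inferred from the two sample files only: `key = value` pairs, with `#` starting a comment line.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copy [start, end) into dst without surrounding whitespace, always
 * NUL-terminating and truncating to dst_sz - 1 bytes. dst_sz must be > 0.
 */
static void copy_trimmed(char *dst, size_t dst_sz,
			 const char *start, const char *end)
{
	size_t n;

	while (start < end && isspace((unsigned char)*start))
		start++;
	while (end > start && isspace((unsigned char)end[-1]))
		end--;
	n = (size_t)(end - start);
	if (n >= dst_sz)
		n = dst_sz - 1;
	memcpy(dst, start, n);
	dst[n] = '\0';
}

/* Parse one line of a fwcfg-style file. Returns 1 and fills key/val for a
 * "key = value" line, 0 for blank lines, comment lines and lines without
 * '=', so that commented-out variables fall back to driver defaults.
 */
static int fwcfg_parse_line(const char *line, char *key, size_t key_sz,
			    char *val, size_t val_sz)
{
	const char *eq;

	while (isspace((unsigned char)*line))
		line++;
	if (*line == '\0' || *line == '#')
		return 0;
	eq = strchr(line, '=');
	if (!eq)
		return 0;
	copy_trimmed(key, key_sz, line, eq);
	copy_trimmed(val, val_sz, eq + 1, eq + 1 + strlen(eq + 1));
	return key[0] != '\0';
}
```

Feeding it the radio 1 file line by line yields pairs such as ("vdevs", "64") and ("fwname", "firmware-2.bin"), while every line of the radio 2 file is skipped as a comment.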
Business Proposal of $12.8m USD
Dear Friend, I am Song Chen i have a Business Proposal of $12.8m USD for you to handle with me from my bank contact me for more information (mr.songchen...@hotmail.com) Regards, Mr Song Chen. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
I HAVE A PROPOSAL FOR YOU?
Good day I am Major Alan Edward, I Have a Proposal for you Please do get back for more details. Regards, Major Alan Edward.
a question about the kcm proposal
Thinking back a bit about the kcm proposal:
https://www.mail-archive.com/netdev@vger.kernel.org/msg78696.html

I had a question:

If the user-space has decided to encrypt the http/2 header using tls,
the len (and other http/2 fields) is no longer in the clear for the kernel.

My understanding is that http header encryption is common practice/BCP,
since the http hdr may contain a lot of identity, session and tenancy data.
If that's true, then wouldn't this break the BPF/kcm assumptions?

There is a different but related problem in this space- existing TLS/DTLS
libraries (openssl, gnutls etc) only know how to work with tcp
or udp sockets - they do not know anything about PF_RDS or the
newly proposed kcm socket type.

In theory, it is possible to extend these libraries to handle
RDS/kcm etc, but (as we found out with RDS and IP_PKTINFO/BINDTODEVICE),
some things become tricky because of the many-to-one dgram-over-stream
hybrid.

I've looked at IPSEC/IKE in transport mode for RDS on the kernel tcp
socket as we discussed at Plumbers in August, and that has some costs..
would be interesting to evaluate against other options..

--Sowmini
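
To illustrate why the length field has to be readable, here is a small sketch of the delineation step for clear-text HTTP/2: the 9-byte frame header starts with a 24-bit big-endian payload length, so a stream parser can tell where the message ends. The function name is hypothetical and this is plain userspace C, not the BPF program that KCM would attach. Once the stream is TLS-encrypted, these bytes are ciphertext and the same arithmetic yields garbage.

```c
#include <stddef.h>
#include <stdint.h>

/* HTTP/2 frame header: 24-bit length, 8-bit type, 8-bit flags,
 * 1 reserved bit + 31-bit stream identifier = 9 bytes.
 */
#define H2_FRAME_HDR_LEN 9

/* Return the total size (header plus payload) of the frame starting at
 * buf, or 0 if fewer than H2_FRAME_HDR_LEN bytes are available yet.
 */
static size_t h2_frame_total_len(const uint8_t *buf, size_t avail)
{
	uint32_t payload_len;

	if (avail < H2_FRAME_HDR_LEN)
		return 0;
	payload_len = ((uint32_t)buf[0] << 16) |
		      ((uint32_t)buf[1] << 8) |
		      (uint32_t)buf[2];
	return H2_FRAME_HDR_LEN + (size_t)payload_len;
}
```

A receiver would call this on the head of the byte stream and hand off exactly that many bytes as one message.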
Re: a question about the kcm proposal
> If the user-space has decided to encrypt the http/2 header using tls,
> the len (and other http/2 fields) is no longer in the clear for the kernel.
>
> My understanding is that http header encryption is common practice/BCP,
> since the http hdr may contain a lot of identity, session and tenancy data.
> If that's true, then wouldn't this break the BPF/kcm assumptions?
>
Right, if data is encrypted then we can't do message delineation on
receive. KCM wouldn't help much on transmit either since the crypto
state would need to be shared. The solution is to move TLS into the
kernel.

> There is a different but related problem in this space- existing TLS/DTLS
> libraries (openssl, gnutls etc) only know how to work with tcp
> or udp sockets - they do not know anything about PF_RDS or the
> newly proposed kcm socket type.
>
TLS-in-kernel would be a lower layer so it shouldn't have to know
anything about RDS or KCM. If it makes sense, KCM could be used for
parsing TLS records themselves...

> In theory, it is possible to extend these libraries to handle
> RDS/kcm etc, but (as we found out with RDS and IP_PKTINFO/BINDTODEVICE),
> some things become tricky because of the many-to-one dgram-over-stream
> hybrid.
>
> I've looked at IPSEC/IKE in transport mode for RDS on the kernel tcp
> socket as we discussed at Plumbers in August, and that has some costs..
> would be interesting to evaluate against other options..
>
The design of TLS in the kernel is that it will be enabled on the TCP
socket, so that receive and transmit path are below RDS and KCM. We
have the transmit path for TLS-in-kernel running with good preliminary
results, we will post that at least as RFC shortly. Receive side still
seems to be feasible.

Thanks,
Tom

> --Sowmini
>
Re: a question about the kcm proposal
On (10/12/15 15:05), Tom Herbert wrote:
> > There is a different but related problem in this space- existing TLS/DTLS
> > libraries (openssl, gnutls etc) only know how to work with tcp
> > or udp sockets - they do not know anything about PF_RDS or the
> > newly proposed kcm socket type.
> >
> TLS-in-kernel would be a lower layer so it shouldn't have to know
> anything about RDS or KCM. If it makes sense, KCM could be used for
> parsing TLS records themselves...

I wouldn't quite jump to that conclusion just yet though :-)
there are a lot of alternatives- you could have a uspace module that
shims between the application and kcm (even something that
gets LD_PRELOADed) and adds the right kcm header as needed.
Or you could use ipsec/ike..

tls in the kernel can be quite complex and history shows that it
can easily become hard to maintain: uspace TLS (both the protocol itself,
and the negotiated crypto) tend to move much faster than kernel changes
(at least that's what the 10+ year long solaris-kssl experiment found).

There is another aspect to this: in the DB world, for example,
I might seriously care about encrypting my payroll-database, but
not care so much about the christmas-potluck-database. Thus
allowing the uspace to select when (and what type of crypto algo)
to use is a flexibility offered by TLS that a "kernel-TLS" would
have a hard time matching.

> The design of TLS in the kernel is that it will be enabled on the TCP
> socket, so that receive and transmit path are below RDS and KCM. We
> have the transmit path for TLS-in-kernel running with good preliminary
> results, we will post that at least as RFC shortly. Receive side still
> seems to be feasible.

yes, please share. TLS does complex things like mid-session CCS.
Such things can result in a lot of asynchrony in the kernel. Given that
ipsec has already crossed that bridge, I, for one, would like to
understand the trade-offs.

The question in my mind, is "how does this match up with transport mode
ipsec/ike", and if it does not, why not? The only difference (in theory)
is whether you do encryption before, or after, adding the transport
(tcp/udp) header, so if there is a big perf gap, we need to understand why.

--Sowmini
proposal
I wish to discuss a very confidential business proposition worth $48Million USD with you that will be of immense benefit to the both of us, but I want your consent before sending details. Mr Wing
did you receive my charity proposal details
Re: [net-next 0/16] Proposal for VRF-lite - v3
On 7/27/15 2:30 PM, Eric W. Biederman wrote:
> This paragraph is false when it comes to sockets, as I have already
> pointed out.
>
> - VPN Routing and Forwarding (RFC4364 and its kin) implies isolation
>   strong enough to allow using the same ip on different machines in
>   different VPN instances and not have confusion.
>
> - The routing table is not the only table in the kernel that uses an
>   ip address as a key.
>
> The result is that you can combine packet fragments that come in on
> different interfaces (irrespective of your VPN), confuse tcp parameters
> between interfaces, scramble your ipsec connections and I don't know
> what else.

The duplicate IP address is a problem with the networking stack today;
the VRF device does not introduce it. The VRF device does allow
duplicate IP addresses within a namespace but separate VRFs, though yes
various places that rely solely on source address like IP fragmentation
do need to be fixed.

I looked at the IPv4 fragmentation code yesterday and will continue
today. So help me with the history: is there any reason why the device
index is not used today? It seems like a straightforward change.

1. simple netdevices with the same IP address -- no problem using index
   in the lookup

2. 2 ipsec tunnels -- different netdevices, same IP address -- no
   problem using index

3. stacked devices like bonding and team interfaces appear to the stack
   as a single device -- no problem using index of stacked device

4. If an interface is deleted and a new one is created with the same IP
   address then we want to fail the lookup -- no problem using index

5. other???

Is there a use case where I can't add ifindex of the incoming device (or
higher level device if skb->dev is changed) to the hash and lookup for
fragments?

> > Version 3
> > - addressed comments from first 2 RFCs with the exception of the name
> >   Nicolas: We will do the name conversion once we agree on what the
> >   correct name should be (vrf, mrf or something else)
>
> Not so. I described the deep problems between your goals and your
> implementation and they are not even mentioned let alone addressed.

I have addressed comments to the extent that I can. As I stated in my
last followup to you Eric I did not understand your point. I asked for
clarification, a --verbose if you will. I can't read your mind, so I
need you to elaborate on your points to be able to respond and address
your concerns.

> > - packets flow through the VRF device in both directions allowing the
> >   following:
> >   - tcpdump -i vrfn
> >   - tc rules on vrf device
> >   - netfilter rules on vrf device
> >
> > Ingo/Andy: I added you two as a start point for the proposed task
> > related changes. Not sure who should be the reviewer; please let me
> > know if someone else is more appropriate. Thanks.
>
> It looks like you are trying to implement a namespace that isn't a
> namespace. Given that it is broken by design you have my nack.

This is an L3 separation within a namespace, not a device level
separation which is what namespaces provide.

David
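The question raised above, whether the incoming device index can simply be added to the fragment lookup key, can be illustrated with a small sketch. This is not kernel code: the `Fragment` type, `frag_key` and `group_fragments` are hypothetical names, and the key is only assumed to follow the usual IPv4 datagram identity (source, destination, IP ID, protocol).

```python
from collections import defaultdict
from typing import NamedTuple


class Fragment(NamedTuple):
    src: str
    dst: str
    ip_id: int
    proto: int
    ifindex: int  # device the fragment arrived on (illustrative field)
    offset: int
    payload: bytes


def frag_key(frag, use_ifindex):
    """Build the reassembly key, optionally extended with the device index."""
    key = (frag.src, frag.dst, frag.ip_id, frag.proto)
    return key + (frag.ifindex,) if use_ifindex else key


def group_fragments(frags, use_ifindex):
    """Group fragments into reassembly queues by key."""
    queues = defaultdict(list)
    for frag in frags:
        queues[frag_key(frag, use_ifindex)].append(frag)
    return dict(queues)


# Two tenants reuse 10.0.2.2 behind different devices (ifindex 3 and 5).
tenant_a = Fragment("10.0.2.2", "10.0.2.1", 42, 17, 3, 0, b"A")
tenant_b = Fragment("10.0.2.2", "10.0.2.1", 42, 17, 5, 0, b"B")

# One datagram whose two fragments arrive on different devices (3 and 4).
multi_1 = Fragment("10.0.9.9", "10.0.1.1", 7, 17, 3, 0, b"x")
multi_2 = Fragment("10.0.9.9", "10.0.1.1", 7, 17, 4, 8, b"y")
```

With `use_ifindex=False` the two tenants' fragments land in one queue; with `use_ifindex=True` they are kept apart, but the two fragments of the single datagram that arrived on different devices are also split into separate queues.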
Re: [net-next 0/16] Proposal for VRF-lite - v3
David Ahern d...@cumulusnetworks.com writes:
> On 7/27/15 2:30 PM, Eric W. Biederman wrote:
> > This paragraph is false when it comes to sockets, as I have already
> > pointed out.
> >
> > - VPN Routing and Forwarding (RFC4364 and its kin) implies isolation
> >   strong enough to allow using the same ip on different machines in
> >   different VPN instances and not have confusion.
> >
> > - The routing table is not the only table in the kernel that uses an
> >   ip address as a key.
> >
> > The result is that you can combine packet fragments that come in on
> > different interfaces (irrespective of your VPN), confuse tcp
> > parameters between interfaces, scramble your ipsec connections and I
> > don't know what else.
>
> The duplicate IP address is a problem with the networking stack today;
> the VRF device does not introduce it. The VRF device does allow
> duplicate IP addresses within a namespace but separate VRFs, though yes
> various places that rely solely on source address like IP fragmentation
> do need to be fixed.

No. The same IP address being used by different machines is not a
problem with the IP stack today. IP addresses are defined to be globally
unique. At the point you introduce VPNs/VRFs you introduce duplicate IP
addresses and then the code needs to cope.

As such I think there is a deep mismatch between the semantics of
BINDTODEVICE and VRFs because BINDTODEVICE by definition does not worry
about duplicate IP addresses. Which means that you can't just reuse the
BINDTODEVICE infrastructure. It is fundamentally insufficient to the
task.

So as you are discovering you have to invent something new. That new
thing needs a definition. Maybe the new thing makes sense and you can
just slice off a chunk of a network namespace. Maybe you go through and
change all of the code.

> I looked at the IPv4 fragmentation code yesterday and will continue
> today. So help me with the history: is there any reason why the device
> index is not used today? It seems like a straightforward change.

Sigh. I would have hoped someone dealing with routing issues would have
seen this at a glance.

The reason is multi-path reception of fragments. Adding the device index
into the fragment reassembly logic would break fragment reassembly when
fragments of the same packet come into a machine on different network
devices.

Given that only the first fragment has port numbers I can easily see
network path selection code hashing fragments onto different paths
through the network.

> Is there a use case where I can't add ifindex of the incoming device
> (or higher level device if skb->dev is changed) to the hash and lookup
> for fragments?

As detailed above. That breaks fragment reassembly on multiple paths.

> > > Version 3
> > > - addressed comments from first 2 RFCs with the exception of the
> > >   name
> > >   Nicolas: We will do the name conversion once we agree on what the
> > >   correct name should be (vrf, mrf or something else)
> >
> > Not so. I described the deep problems between your goals and your
> > implementation and they are not even mentioned let alone addressed.
>
> I have addressed comments to the extent that I can. As I stated in my
> last followup to you Eric I did not understand your point. I asked for
> clarification, a --verbose if you will. I can't read your mind, so I
> need you to elaborate on your points to be able to respond and address
> your concerns.

Hopefully this helps. Everything we are talking about follows from what
I said at the outset.

You are introducing the idea of having the same ip address refer to
different network destinations depending upon context. Outside of
network namespaces that concept is new and it breaks a lot of
assumptions.

The entire network stack is too large to fit in my head. I don't know
every place where ip addresses are used as part of the index into a
table.

It is incumbent on the implementor of a new feature to figure out how to
introduce such a concept safely. I don't see that happening with
VRF-lite.

Pretty fundamentally a network device index is insufficient for your
needs.

> > > - packets flow through the VRF device in both directions allowing
> > >   the following:
> > >   - tcpdump -i vrfn
> > >   - tc rules on vrf device
> > >   - netfilter rules on vrf device
> > >
> > > Ingo/Andy: I added you two as a start point for the proposed task
> > > related changes. Not sure who should be the reviewer; please let me
> > > know if someone else is more appropriate. Thanks.
> >
> > It looks like you are trying to implement a namespace that isn't a
> > namespace. Given that it is broken by design you have my nack.
>
> This is an L3 separation within a namespace, not a device level
> separation which is what namespaces provide.

Not my meaning. I was not talking about network namespaces and how your
vrf is almost but not completely the same as a network namespace.

What I was talking about is that you are implementing something that is
used roughly the same way as the other namespaces pid, mount, ipc, net,
uts, etc. As the
[net-next 0/16] Proposal for VRF-lite - v3
In the context of internet scale routing a requirement that always comes
up is the need to partition the available routing tables into disjoint
routing planes. A specific use case is the multi-tenancy problem where
each tenant has their own unique routing tables and at the very least
needs different default gateways.

This patch set adds the ability to create virtual router domains (aka
VRFs, VRF-lite to be specific) in the linux packet forwarding stack. The
main observation is that through the use of rules and socket binding to
interfaces, all the facilities that we need are already present in the
infrastructure. What is missing is a handle that identifies a routing
domain and can be used to gather applicable rules/tables and uniqify
neighbor selection. The scheme used needs to preserve the notions of
ECMP, and general routing principles.

This driver is a cross between functionality that the IPVLAN driver and
the Team drivers provide where a device is created and packets into/out
of the routing domain are shuttled through this device. The device is
then used as a handle to identify the applicable rules. The VRF device
is thus the layer3 equivalent of a vlan device.

The very important point to note is that this is only a Layer3 concept
so L2 tools (e.g., LLDP) do not need to be run in each VRF, processes
can run in unaware mode or select a VRF to be talking through. Also the
behavioral model is a generalized application of the familiar VRF-Lite
model with some performance paths that need optimization. (Specifically
the output route selector that Roopa, Robert, Thomas and EricB are
currently discussing on the MPLS thread)

High Level points
=================
1. Simple overlay driver (minimal changes to current stack)
   * uses the existing fib tables and fib rules infrastructure
2. Modelled closely after the ipvlan driver
3. Uses current API and infrastructure.
   * Applications can use SO_BINDTODEVICE or cmsg device identifiers
     to pick VRF (ping, traceroute just work)
   * Standard IP Rules work, and since they are aggregated against the
     device, scale is manageable
4. Completely orthogonal to Namespaces and only provides separation in
   the routing plane (and ARP)

                                                 N2
           N1 (all configs here)          +---------------+
    +--------------+                      |               |
    |swp1 :10.0.1.1+----------------------+swp1 :10.0.1.2 |
    |              |                      |               |
    |swp2 :10.0.2.1+----------------------+swp2 :10.0.2.2 |
    |              |                      +---------------+
    | VRF 1        |
    | table 5      |
    |              |
    +--------------+
    |              |
    | VRF 2        |                             N3
    | table 6      |                      +---------------+
    |              |                      |               |
    |swp3 :10.0.2.1+----------------------+swp1 :10.0.2.2 |
    |              |                      |               |
    |swp4 :10.0.3.1+----------------------+swp2 :10.0.3.2 |
    +--------------+                      +---------------+

Given the topology above, the setup needed to get the basic VRF
functions working would be

Create the VRF devices and associate with a table
    ip link add vrf1 type vrf table 5
    ip link add vrf2 type vrf table 6

Install the lookup rules that map table to VRF domain
    ip rule add pref 200 oif vrf1 lookup 5
    ip rule add pref 200 iif vrf1 lookup 5
    ip rule add pref 200 oif vrf2 lookup 6
    ip rule add pref 200 iif vrf2 lookup 6

    ip link set vrf1 up
    ip link set vrf2 up

Enslave the routing member interfaces
    ip link set swp1 master vrf1
    ip link set swp2 master vrf1
    ip link set swp3 master vrf2
    ip link set swp4 master vrf2

Connected routes are automatically moved from main table to the VRF
table.

ping using vrf1 is simply
    ping -I vrf1 10.0.1.2

Or using the task context and a command such as the example chvrf in
patch 15 unmodified applications are run in a VRF context using:
    chvrf -v 1 ping 10.0.1.2

Design Highlights
=================
If a device is enslaved to a VRF device (i.e., associated with a VRF)
then:

1. Rx path
   The master device index is used as the iif for all lookups.

2. Tx path
   Similarly, for Tx the VRF device oif is used in the flow to direct
   lookups to the table associated with the VRF via its rule. From there
   the FLOWI_FLAG_VRFSRC flag is used to indicate that the oif should
   not be used for FIB table lookups.

3. Connected and local routes
   On link up for a device, connected and local routes are added to the
   table associated with the VRF device, rather than the local and main
   tables.

4. Socket lookups
   Socket lookups use the VRF device for comparison with
   sk_bound_dev_if. If a socket is not bound to a device a socket match
   can happen based on destination address, port and protocol in which
   case a VRF global or agnostic
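The setup sequence in the cover letter follows one pattern per VRF, so it can be generated from a small description of the topology. The helper below is only an illustrative sketch: `vrf_setup_commands` is a hypothetical name, it merely builds the command strings shown above (it does not run them), and the `type vrf table N` syntax is taken from the cover letter as-is.

```python
def vrf_setup_commands(vrfs, pref=200):
    """Build the ip(8) command strings for a VRF layout.

    vrfs maps a VRF device name to (table id, list of member interfaces).
    The ordering mirrors the cover letter: create devices, install rules,
    bring the VRF devices up, then enslave the members.
    """
    cmds = []
    for name, (table, _members) in vrfs.items():
        cmds.append(f"ip link add {name} type vrf table {table}")
    for name, (table, _members) in vrfs.items():
        cmds.append(f"ip rule add pref {pref} oif {name} lookup {table}")
        cmds.append(f"ip rule add pref {pref} iif {name} lookup {table}")
    for name in vrfs:
        cmds.append(f"ip link set {name} up")
    for name, (_table, members) in vrfs.items():
        for dev in members:
            cmds.append(f"ip link set {dev} master {name}")
    return cmds


# Topology from the cover letter: vrf1 -> table 5, vrf2 -> table 6.
topology = {
    "vrf1": (5, ["swp1", "swp2"]),
    "vrf2": (6, ["swp3", "swp4"]),
}
commands = vrf_setup_commands(topology)
```

Printing `commands` one per line reproduces the twelve commands listed in the cover letter, in the same order.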
Re: [net-next 0/16] Proposal for VRF-lite - v3
David Ahern d...@cumulusnetworks.com writes:
> In the context of internet scale routing a requirement that always
> comes up is the need to partition the available routing tables into
> disjoint routing planes. A specific use case is the multi-tenancy
> problem where each tenant has their own unique routing tables and at
> the very least needs different default gateways.
>
> This patch set adds the ability to create virtual router domains (aka
> VRFs, VRF-lite to be specific) in the linux packet forwarding stack.
> The main observation is that through the use of rules and socket
> binding to interfaces, all the facilities that we need are already
> present in the infrastructure. What is missing is a handle that
> identifies a routing domain and can be used to gather applicable
> rules/tables and uniqify neighbor selection. The scheme used needs to
> preserve the notions of ECMP, and general routing principles.

This paragraph is false when it comes to sockets, as I have already
pointed out.

- VPN Routing and Forwarding (RFC4364 and its kin) implies isolation
  strong enough to allow using the same ip on different machines in
  different VPN instances and not have confusion.

- The routing table is not the only table in the kernel that uses an ip
  address as a key.

The result is that you can combine packet fragments that come in on
different interfaces (irrespective of your VPN), confuse tcp parameters
between interfaces, scramble your ipsec connections and I don't know
what else.

Binding a socket to a network device is not strong enough to do what you
want to do and it will lead to subtle bugs that can be triggered by
accident or by hostile actors.

If these kinds of limitations are well documented and it is specified
that these kinds of problems can occur with your socket code there may
be a place for this code somewhere. However, described like it is, your
code is wrong and fundamentally broken.

> Version 3
> - addressed comments from first 2 RFCs with the exception of the name
>   Nicolas: We will do the name conversion once we agree on what the
>   correct name should be (vrf, mrf or something else)

Not so. I described the deep problems between your goals and your
implementation and they are not even mentioned let alone addressed.

> - packets flow through the VRF device in both directions allowing the
>   following:
>   - tcpdump -i vrfn
>   - tc rules on vrf device
>   - netfilter rules on vrf device
>
> Ingo/Andy: I added you two as a start point for the proposed task
> related changes. Not sure who should be the reviewer; please let me
> know if someone else is more appropriate. Thanks.

It looks like you are trying to implement a namespace that isn't a
namespace. Given that it is broken by design you have my nack.

Nacked-by: Eric W. Biederman ebied...@xmission.com

Eric
Re: [RFC net-next 0/6] Proposal for VRF-lite - v2
On Mon, Jul 6, 2015 at 8:03 AM, David Ahern d...@cumulusnetworks.com wrote:
> In the context of internet scale routing a requirement that always
> comes up is the need to partition the available routing tables into
> disjoint routing planes. A specific use case is the multi-tenancy
> problem where each tenant has their own unique routing tables and at
> the very least needs different default gateways.

Based on this problem statement, netns would be the answer: to partition
the physical router into N virtual routers. If routing is offloaded, the
offload device is netns-aware to preserve the partitioning down to the
HW level.

I see from earlier discussions on VRF that netns is no good because it's
an inefficient use of resources. I wonder if that's true in a practical
way? If I have a 48-port router, I could create 24 2-port virtual
routers using netns, each running routing stuff (bgp, lldp, ospf, etc).
Is the netns overhead plus the routing sw duplication not going to fit
on a Cumulus-class router?

In other words, if no one had ever heard of VRF, we'd conclude netns
given the problem statement. And then focus on inefficiencies in netns,
if the implementation didn't fit a particular target.

So my C in RFC is what's wrong with using netns? And can those wrongs be
fixed?
Re: [RFC net-next 0/6] Proposal for VRF-lite - v2
On 06/07/2015 19:53, Shrijeet Mukherjee wrote:
> No no problem, Just trying to get the functional aspects worked out.
> the global search replace will be easy.
>
> Was hoping to see some more responses on the naming suggestions here
> from the community. If there is not disagreement we can spin patches
> with MRF as the name.

For me, it's ok.
Re: [RFC net-next 0/6] Proposal for VRF-lite - v2
No no problem, Just trying to get the functional aspects worked out. the
global search replace will be easy.

Was hoping to see some more responses on the naming suggestions here
from the community. If there is not disagreement we can spin patches
with MRF as the name.

On Mon, Jul 6, 2015 at 8:40 AM, Nicolas Dichtel nicolas.dich...@6wind.com wrote:
> On 06/07/2015 17:03, David Ahern wrote:
> > In the context of internet scale routing a requirement that always
> > comes up is the need to partition the available routing tables into
> > disjoint routing planes. A specific use case is the multi-tenancy
> > problem where each tenant has their own unique routing tables and at
> > the very least needs different default gateways.
> >
> > This is an attempt to build the ability to create virtual router
> > domains aka VRFs (VRF-lite to be specific) in the linux packet
> > forwarding stack. The main observation is that through the use of
> > rules and socket binding to interfaces, all the facilities that we
> > need are already present in the infrastructure. What is missing is a
> > handle that identifies a routing domain and can be used to gather
> > applicable rules/tables and uniqify neighbor selection. The scheme
> > used needs to preserve the notions of ECMP, and general routing
> > principles.
> >
> > [snip]
> > drivers/net/vrf.c | 486 ++
> > [snip]
>
> I'm still opposed to naming this 'vrf', see the v1 thread:
> - http://www.spinics.net/lists/netdev/msg332357.html
> - http://www.spinics.net/lists/netdev/msg332376.html
>
> Shrijeet seemed to agree to rename it, is there a problem?
>
> Regards,
> Nicolas
[RFC net-next 0/6] Proposal for VRF-lite - v2
In the context of internet scale routing a requirement that always comes
up is the need to partition the available routing tables into disjoint
routing planes. A specific use case is the multi-tenancy problem where
each tenant has their own unique routing tables and at the very least
needs different default gateways.

This is an attempt to build the ability to create virtual router domains
aka VRFs (VRF-lite to be specific) in the linux packet forwarding stack.
The main observation is that through the use of rules and socket binding
to interfaces, all the facilities that we need are already present in
the infrastructure. What is missing is a handle that identifies a
routing domain and can be used to gather applicable rules/tables and
uniqify neighbor selection. The scheme used needs to preserve the
notions of ECMP, and general routing principles.

This driver is a cross between functionality that the IPVLAN driver and
the Team drivers provide where a device is created and packets into/out
of the routing domain are shuttled through this device. The device is
then used as a handle to identify the applicable rules. The VRF device
is thus the layer3 equivalent of a vlan device.

The very important point to note is that this is only a Layer3 concept
so LLDP-like tools do not need to be run in each VRF, processes can run
in unaware mode or select a VRF to be talking through. Also the
behavioral model is a generalized application of the familiar VRF-Lite
model with some performance paths that need optimization. (Specifically
the output route selector that Roopa, Robert, Thomas and EricB are
currently discussing on the MPLS thread)

High Level points
=================
1. Simple overlay driver (minimal changes to current stack)
   * uses the existing fib tables and fib rules infrastructure
2. Modelled closely after the ipvlan driver
3. Uses current API and infrastructure.
   * Applications can use SO_BINDTODEVICE or cmsg device identifiers
     to pick VRF (ping, traceroute just work)
   * Standard IP Rules work, and since they are aggregated against the
     device, scale is manageable
4. Completely orthogonal to Namespaces and only provides separation in
   the routing plane (and ARP)

                                                 N2
           N1 (all configs here)          +---------------+
    +--------------+                      |               |
    |swp1 :10.0.1.1+----------------------+swp1 :10.0.1.2 |
    |              |                      |               |
    |swp2 :10.0.2.1+----------------------+swp2 :10.0.2.2 |
    |              |                      +---------------+
    | VRF 1        |
    | table 5      |
    |              |
    +--------------+
    |              |
    | VRF 2        |                             N3
    | table 6      |                      +---------------+
    |              |                      |               |
    |swp3 :10.0.2.1+----------------------+swp1 :10.0.2.2 |
    |              |                      |               |
    |swp4 :10.0.3.1+----------------------+swp2 :10.0.3.2 |
    +--------------+                      +---------------+

Given the topology above, the setup needed to get the basic VRF
functions working would be

Create the VRF devices and associate with a table
    ip link add vrf1 type vrf table 5
    ip link add vrf2 type vrf table 6

Install the lookup rules that map table to VRF domain
    ip rule add pref 200 oif vrf1 lookup 5
    ip rule add pref 200 iif vrf1 lookup 5
    ip rule add pref 200 oif vrf2 lookup 6
    ip rule add pref 200 iif vrf2 lookup 6

    ip link set vrf1 up
    ip link set vrf2 up

Enslave the routing member interfaces
    ip link set swp1 master vrf1
    ip link set swp2 master vrf1
    ip link set swp3 master vrf2
    ip link set swp4 master vrf2

In this version connected routes are automatically moved from main table
to VRF table.

ping using vrf1 is simply
    ping -I vrf1 -I optional-src-addr 10.0.1.2

Or using the task context and a command such as the example chvrf in
patch 6 unmodified applications are run in a VRF context using:
    chvrf -v 1 ping 10.0.1.2

Design Highlights
=================
If a device is enslaved to a VRF device (i.e., associated with a VRF)
then:

1. Rx path
   The master device index is used as the iif for all lookups.

2. Tx path
   Similarly, for Tx the VRF device oif is used in the flow to direct
   lookups to the table associated with the VRF via its rule. From there
   the FLOWI_FLAG_VRFSRC flag is used to indicate that the oif should
   not be used for FIB table lookups.

3. Connected and local routes
   On link up for a device, connected and local routes are added to the
   table associated with the VRF device, rather than the local and main
   tables.

4. Socket lookups
   Socket lookups use the VRF device for comparison with
   sk_bound_dev_if. If a socket is not bound to a device a socket match
   can happen based on destination address, port and protocol in which
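The Rx-path and socket-lookup rules described in the design highlights can be modelled in a few lines. This is a rough sketch under stated assumptions, not the patch's code: `Sock`, `lookup_ifindex` and `socket_matches` are hypothetical names, the ifindex values are invented, and the only kernel convention relied on is that a bound-device value of 0 means the socket is not bound to a device.

```python
from typing import NamedTuple


class Sock(NamedTuple):
    addr: str
    port: int
    proto: str
    bound_dev_if: int  # 0 means "not bound to any device"


def lookup_ifindex(in_ifindex, master_of):
    """Rx path: an enslaved device is represented by its VRF master's index."""
    return master_of.get(in_ifindex, in_ifindex)


def socket_matches(sock, dst, port, proto, in_ifindex, master_of):
    """Match a packet against a socket, comparing the VRF device for bound sockets."""
    if (sock.addr, sock.port, sock.proto) != (dst, port, proto):
        return False
    if sock.bound_dev_if == 0:
        # Unbound socket: address, port and protocol alone decide the match.
        return True
    return sock.bound_dev_if == lookup_ifindex(in_ifindex, master_of)


# Invented indices: swp1..swp4 are 2..5, vrf1 is 10, vrf2 is 11.
master_of = {2: 10, 3: 10, 4: 11, 5: 11}

bound_to_vrf1 = Sock("10.0.2.1", 22, "tcp", bound_dev_if=10)
unbound = Sock("10.0.2.1", 22, "tcp", bound_dev_if=0)
```

A packet for 10.0.2.1:22 arriving on swp2 (index 3) matches `bound_to_vrf1`, the same packet arriving on swp3 (index 4) does not, and `unbound` matches in both cases.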
Re: [RFC net-next 0/6] Proposal for VRF-lite - v2
On 06/07/2015 17:03, David Ahern wrote:
> In the context of internet scale routing a requirement that always
> comes up is the need to partition the available routing tables into
> disjoint routing planes. A specific use case is the multi-tenancy
> problem where each tenant has their own unique routing tables and at
> the very least needs different default gateways.
>
> This is an attempt to build the ability to create virtual router
> domains aka VRFs (VRF-lite to be specific) in the linux packet
> forwarding stack. The main observation is that through the use of rules
> and socket binding to interfaces, all the facilities that we need are
> already present in the infrastructure. What is missing is a handle that
> identifies a routing domain and can be used to gather applicable
> rules/tables and uniqify neighbor selection. The scheme used needs to
> preserve the notions of ECMP, and general routing principles.
>
> [snip]
> drivers/net/vrf.c | 486 ++
> [snip]

I'm still opposed to naming this 'vrf', see the v1 thread:
- http://www.spinics.net/lists/netdev/msg332357.html
- http://www.spinics.net/lists/netdev/msg332376.html

Shrijeet seemed to agree to rename it, is there a problem?

Regards,
Nicolas
Re: [RFC net-next 0/3] Proposal for VRF-lite
On 06/10/15 at 01:43pm, Shrijeet Mukherjee wrote:
> On Tue, Jun 9, 2015 at 3:15 AM, Thomas Graf tg...@suug.ch wrote:
> > Do I understand this correctly that swp* represent veth pairs? Why do
> > you have distinct addresses on each peer of the pair? Are the
> > addresses in N2 and N3 considered private and NATed?
> > [...]
>
> These are physical boxes in the picture not veth pairs or NAT's :)

I see. So if I translate this to a virtual world with veths where the
guest facing peer is in its own netns, the host facing veth peer would
get attached to a vrf device and we should be good.

> Are you worried about ip rule scale ? this reduces the scale to number
> of L3 domains, which should be not that large. I do think we need to
> speed up rule lookup from the linear walk we have right now.

I definitely have more L3 domains than what a linear search can handle.

> A generic classifier seems like a bigger hammer, but if that is the way
> to replace rules it is a worthy concept. That said, the patches from
> Hannes et al, will make it such that the table lookup may be from the
> driver directly and thus will skip past the fib rule lookup.

The approach from Hannes definitely works for the physical world but is
undesirable for overlays, logical networks or encapsulations, where we
want to avoid maintaining a net_device for every virtual network. As I
said, I think this is something that can be resolved later on with a
programmable classifier.
Re: [RFC net-next 0/3] Proposal for VRF-lite
On 06/08/15 at 11:35am, Shrijeet Mukherjee wrote:
> [...]
> model with some performance paths that need optimization. (Specifically
> the output route selector that Roopa, Robert, Thomas and EricB are
> currently discussing on the MPLS thread)

Thanks for posting these patches just in time. This explains how you
intend to deploy Roopa's patches in a scalable manner.

> High Level points
> 1. Simple overlay driver (minimal changes to current stack)
>    * uses the existing fib tables and fib rules infrastructure
> 2. Modelled closely after the ipvlan driver
> 3. Uses current API and infrastructure.
>    * Applications can use SO_BINDTODEVICE or cmsg device identifiers
>      to pick VRF (ping, traceroute just work)

I like the aspect of reusing existing user interfaces. We might need to
introduce a more fine grained capability than CAP_NET_RAW to give
containers the privileges to bind to a VRF without allowing them to
inject raw frames.

Given I understand this correctly: If my intent was to run a process in
multiple VRFs, then I would need to run that process in the host network
namespace which contains the VRF devices which would also contain the
physical devices. While I might want to grant my process the ability to
bind to VRFs, I may not want to give it the privileges to bind to any
device. So we could consider introducing CAP_NET_VRF which would allow
binding to VRF devices.

>    * Standard IP Rules work, and since they are aggregated against the
>      device, scale is manageable
> 4. Completely orthogonal to Namespaces and only provides separation in
>    the routing plane (and ARP)
> 5. Debugging is built-in as tcpdump and counters on the VRF device
>    works as is.
>
>                                                  N2
>            N1 (all configs here)          +---------------+
>     +--------------+                      |               |
>     |swp1 :10.0.1.1+----------------------+swp1 :10.0.1.2 |
>     |              |                      |               |
>     |swp2 :10.0.2.1+----------------------+swp2 :10.0.2.2 |
>     |              |                      +---------------+
>     | VRF 0        |
>     | table 5      |
>     |              |
>     +--------------+
>     |              |
>     | VRF 1        |                             N3
>     | table 6      |                      +---------------+
>     |              |                      |               |
>     |swp3 :10.0.2.1+----------------------+swp1 :10.0.2.2 |
>     |              |                      |               |
>     |swp4 :10.0.3.1+----------------------+swp2 :10.0.3.2 |
>     +--------------+                      +---------------+

Do I understand this correctly that swp* represent veth pairs? Why do
you have distinct addresses on each peer of the pair? Are the addresses
in N2 and N3 considered private and NATed?

> [...]
> # Install the lookup rules that map table to VRF domain
> ip rule add pref 200 oif vrf0 lookup 5
> ip rule add pref 200 iif vrf0 lookup 5
> ip rule add pref 200 oif vrf1 lookup 6
> ip rule add pref 200 iif vrf1 lookup 6

I think this is a good start but we all know the scalability constraints
of this. Depending on the number of L3 domains, an eBPF classifier
utilizing a map to translate origin to routing table and vice versa
might address the scale requirement long term.

[...]

I will comment on the implementation specifics once I have a good
understanding of what your desired end state looks like.
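The scaling concern above comes down to an ordered walk over the rule list versus a keyed lookup. The sketch below only illustrates that difference in cost: it is plain Python, not eBPF, the dict merely stands in for a map keyed by the originating device, and the function names and table numbers are made up for illustration.

```python
def linear_rule_lookup(rules, dev):
    """Walk the rules in order; return (table, number of rules examined)."""
    for examined, (match_dev, table) in enumerate(rules, start=1):
        if match_dev == dev:
            return table, examined
    return None, len(rules)


def build_table_map(rules):
    """Precompute a device -> table map; the first matching rule wins."""
    table_map = {}
    for match_dev, table in rules:
        table_map.setdefault(match_dev, table)
    return table_map


# One rule per L3 domain, 1000 domains (invented names and table ids).
rules = [(f"vrf{i}", 100 + i) for i in range(1000)]
table_map = build_table_map(rules)

worst_case = linear_rule_lookup(rules, "vrf999")
```

Here the linear walk examines all 1000 rules before it finds the last domain, while `table_map["vrf999"]` returns the same table with a single keyed lookup.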
Re: [RFC net-next 0/3] Proposal for VRF-lite
On 08/06/2015 20:35, Shrijeet Mukherjee wrote:
> From: Shrijeet Mukherjee s...@cumulusnetworks.com
>
> In the context of internet scale routing a requirement that always
> comes up is the need to partition the available routing tables into
> disjoint routing planes. A specific use case is the multi-tenancy
> problem where each tenant has their own unique routing tables and at
> the very least needs different default gateways.
>
> This is an attempt to build the ability to create virtual router
> domains aka VRFs (VRF-lite to be specific) in the linux packet
> forwarding stack. The main observation is that through the use of
>
> [snip]
> drivers/net/vrf.c | 654 ++

I'm not really in favor of the name 'vrf'. This term is very
controversial and reaching a consensus on what a 'vrf' is or contains is
nearly impossible. There have already been a lot of discussions about
this topic on the quagga ml that show that everybody has a different
opinion about this term ;-)

I know you call this 'MRF' internally, why not use this name instead?

Regards,
Nicolas
Re: [RFC net-next 0/3] Proposal for VRF-lite
On 09/06/2015 16:21, David Ahern wrote:
> Hi Nicolas:
>
> On 6/9/15 2:58 AM, Nicolas Dichtel wrote:
> > I'm not really in favor of the name 'vrf'. This term is very
> > controversial and reaching a consensus on what a 'vrf' is or contains
> > is nearly impossible. There have already been a lot of discussions
> > about this topic on the quagga ml that show that everybody has a
> > different opinion about this term ;-)
>
> Are you referring to this thread?
> https://lists.quagga.net/pipermail/quagga-dev/2014-November/011795.html

No, there were recent discussions on quagga about that subject. Here are
some non-exhaustive pointers:
https://lists.quagga.net/pipermail/quagga-dev/2015-May/012581.html
https://lists.quagga.net/pipermail/quagga-dev/2015-May/012630.html
https://lists.quagga.net/pipermail/quagga-dev/2015-June/012715.html

Note the last pointer also explains why it was called MRF by Cumulus.

Regards,
Nicolas
Re: [RFC net-next 0/3] Proposal for VRF-lite
Hi Nicolas:

On 6/9/15 2:58 AM, Nicolas Dichtel wrote:
> I'm not really in favor of the name 'vrf'. This term is very
> controversial and reaching a consensus on what a 'vrf' is or contains
> is nearly impossible. There have already been a lot of discussions
> about this topic on the quagga ml that show that everybody has a
> different opinion about this term ;-)

Are you referring to this thread?
https://lists.quagga.net/pipermail/quagga-dev/2014-November/011795.html

I could see differing opinions regarding the implementation of a VRF; is
there really a controversy on what a VRF is?

David
Re: [RFC net-next 0/3] Proposal for VRF-lite
On Tue, Jun 9, 2015 at 7:55 AM, Nicolas Dichtel nicolas.dich...@6wind.com wrote: Le 09/06/2015 16:21, David Ahern a écrit : Hi Nicolas: On 6/9/15 2:58 AM, Nicolas Dichtel wrote: I'm not really in favor of the name 'vrf'. This term is very controversial and having a consensus of what is/contains a 'vrf' is quite impossible. There was already a lot of discussions about this topic on quagga ml that show that everybody has a different opinion about this term ;-) Are you referring to this thread? https://lists.quagga.net/pipermail/quagga-dev/2014-November/011795.html No, there were recent discussions on quagga about that subject. Here is some non-exhaustive pointers: https://lists.quagga.net/pipermail/quagga-dev/2015-May/012581.html https://lists.quagga.net/pipermail/quagga-dev/2015-May/012630.html https://lists.quagga.net/pipermail/quagga-dev/2015-June/012715.html Note the last pointer also explains why it was called MRF by Cumulus. Agreed, I used the term VRF for this series to make sure we had the right context, but clearly MRF is a term we are happier to use .. Regards, Nicolas -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC net-next 0/3] Proposal for VRF-lite
Le 09/06/2015 12:15, Thomas Graf a écrit : On 06/08/15 at 11:35am, Shrijeet Mukherjee wrote: [...] model with some performance paths that need optimization. (Specifically the output route selector that Roopa, Robert, Thomas and EricB are currently discussing on the MPLS thread) Thanks for posting these patches just in time. This explains how you intent to deploy Roopa's patches in a scalable manner. High Level points 1. Simple overlay driver (minimal changes to current stack) * uses the existing fib tables and fib rules infrastructure 2. Modelled closely after the ipvlan driver 3. Uses current API and infrastructure. * Applications can use SO_BINDTODEVICE or cmsg device indentifiers to pick VRF (ping, traceroute just work) I like the aspect of reusing existing user interfaces. We might need to introduce a more fine grained capability than CAP_NET_RAW to give containers the privileges to bind to a VRF without allowing them to inject raw frames. Given I understand this correctly: If my intent was to run a process in multiple VRFs, then I would need to run that process in the host network namespace which contains the VRF devices which would also contain the physical devices. While I might want to grant my process the ability to bind to VRFs, I may not want to give it the privileges to bind to any device. So we could consider introducing CAP_NET_VRF which would allow to bind to VRF devices. If I understand correctly, all existing applications should also be modified if I want to run them into a VRF/MRF (see my previous email)? ssh, dhcp, httpd, etc should be runnable per MRF without modifications of their source code. So, it becomes a netns. What's about an IKE dameon? It makes sense to have both: netns and MRF ; each can have their own logics of VRF-like behavior depending on how a VRF is defined by the end users. 
Re: [RFC net-next 0/3] Proposal for VRF-lite
On Tue, Jun 9, 2015, at 14:30, Nicolas Dichtel wrote: Le 09/06/2015 12:15, Thomas Graf a écrit : On 06/08/15 at 11:35am, Shrijeet Mukherjee wrote: [...] model with some performance paths that need optimization. (Specifically the output route selector that Roopa, Robert, Thomas and EricB are currently discussing on the MPLS thread) Thanks for posting these patches just in time. This explains how you intent to deploy Roopa's patches in a scalable manner. High Level points 1. Simple overlay driver (minimal changes to current stack) * uses the existing fib tables and fib rules infrastructure 2. Modelled closely after the ipvlan driver 3. Uses current API and infrastructure. * Applications can use SO_BINDTODEVICE or cmsg device indentifiers to pick VRF (ping, traceroute just work) I like the aspect of reusing existing user interfaces. We might need to introduce a more fine grained capability than CAP_NET_RAW to give containers the privileges to bind to a VRF without allowing them to inject raw frames. Given I understand this correctly: If my intent was to run a process in multiple VRFs, then I would need to run that process in the host network namespace which contains the VRF devices which would also contain the physical devices. While I might want to grant my process the ability to bind to VRFs, I may not want to give it the privileges to bind to any device. So we could consider introducing CAP_NET_VRF which would allow to bind to VRF devices. If I understand correctly, all existing applications should also be modified if I want to run them into a VRF/MRF (see my previous email)? ssh, dhcp, httpd, etc should be runnable per MRF without modifications of their source code. So, it becomes a netns. What's about an IKE dameon? It makes sense to have both: netns and MRF ; each can have their own logics of VRF-like behavior depending on how a VRF is defined by the end users. Agreed, the idea is to have a prctl in the end which gets inherited by fork. 
current->rt_table_id or some kind of vrf specifier in task_struct would make that possible then. A helper tool like

    ip route exec table 100 /bin/bash

would then start a session bound to a specific routing instance.

Bye,
Hannes
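For reference, the SO_BINDTODEVICE path mentioned in the quoted high-level points can be sketched from user space roughly as follows. This is only a minimal sketch and not code from the patch series: the helper name bind_to_device() and the device name "vrf0" are assumptions made for illustration, and whether the bind succeeds depends on the privileges discussed above (CAP_NET_RAW in the current model).

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <net/if.h>      /* IFNAMSIZ */
#include <string.h>
#include <sys/socket.h>  /* setsockopt, SOL_SOCKET, SO_BINDTODEVICE */

/*
 * Hypothetical helper: bind an existing socket to a named device
 * (for example a VRF device such as "vrf0") with SO_BINDTODEVICE.
 * Returns 0 on success or a negative errno value on failure.
 */
int bind_to_device(int fd, const char *ifname)
{
    size_t len;

    assert(ifname);

    len = strlen(ifname);
    if (len >= IFNAMSIZ)
        return -ENAMETOOLONG;

    /* The option value is the interface name including its NUL byte. */
    if (setsockopt(fd, SOL_SOCKET, SO_BINDTODEVICE, ifname, len + 1) < 0)
        return -errno;

    return 0;
}
```

An unmodified application would not make this call itself, which is the point raised above: either the application is changed, or something outside it (a netns, or an inherited per-task setting) has to pick the routing instance for it.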
Re: [RFC net-next 0/3] Proposal for VRF-lite
On 6/8/15 12:35 PM, Shrijeet Mukherjee wrote:
> 5. Debugging is built-in as tcpdump and counters on the VRF device
> works as is.

Is the intent that something like this

    tcpdump -i vrf0

can be used to see vrf traffic? vrf_handle_frame only bumps counters; it does not switch skb->dev to the vrf device, so for the Rx path tcpdump will not get the packets, i.e., tcpdump only shows outbound packets.
Re: [RFC net-next 0/3] Proposal for VRF-lite
Good catch. As you know, I used to have the device getting modified in the RX path, and that made it all work. Generic ip_rcv will need a fix to make RX visible to tcpdump, but yes, that is the goal.

On Mon, Jun 8, 2015 at 12:13 PM, David Ahern dsah...@gmail.com wrote:
> On 6/8/15 12:35 PM, Shrijeet Mukherjee wrote:
> > 5. Debugging is built-in as tcpdump and counters on the VRF device
> > works as is.
>
> Is the intent that something like this
>
>     tcpdump -i vrf0
>
> can be used to see vrf traffic? vrf_handle_frame only bumps counters; it
> does not switch skb->dev to the vrf device so for Rx path tcpdump will not
> get the packets. ie., tcpdump only shows outbound packets.
Re: [RFC net-next 0/3] Proposal for VRF-lite
On Mon, Jun 8, 2015, at 21:13, David Ahern wrote:
> On 6/8/15 12:35 PM, Shrijeet Mukherjee wrote:
> > 5. Debugging is built-in as tcpdump and counters on the VRF device
> > works as is.
>
> Is the intent that something like this
>
>     tcpdump -i vrf0
>
> can be used to see vrf traffic? vrf_handle_frame only bumps counters; it
> does not switch skb->dev to the vrf device so for Rx path tcpdump will not
> get the packets. ie., tcpdump only shows outbound packets.

My hope initially was that the vrf interface type would be as slim as possible. I am not even sure if we need packet counters, as one could easily have user space handle that by looking up the relations and accumulating them. Same for VRF traffic.

But the current model does allow to add support for that easily, so why not? It depends on how far we can and want to move parts of the logic into the core stack in the end. Would you see this as a requirement?

Thanks,
Hannes
[RFC][PATCH -mm take5 0/7] proposal for dynamic configurable netconsole
From: Keiichi KII [EMAIL PROTECTED]

The netconsole is a very useful module for collecting kernel messages under certain circumstances (e.g. disk logging fails, serial port is unavailable). But the current netconsole is not flexible. For example, if you want to change the IP address of the logging agent, with built-in netconsole you can't change the config except by changing the boot parameter and rebooting your system, and with modular netconsole you need to remove the module and reload it with different parameters.

With my patches applied, netconsole becomes a little more complex. But kernel messages (especially panic messages) are significant information for solving bugs and troubles promptly, and PCs and servers have been losing their serial console ports. I think that we need an environment in which we can collect kernel messages flexibly.

So, I propose the following extended features for netconsole.

1) support for multiple logging agents.
2) add interface to access each parameter of netconsole using sysfs.

[changes since take4]
-change kernel base from 2.6.21-rc6-mm1 to 2.6.22-rc4-mm2.
-update Documentation/networking/netconsole.txt
-fix Kconfig
-avoid forward-declared statics
-fix coding style
-use spin_lock_irqsave() and _restore()
-fix race condition(netconsole_event())
-remove extra lock(write in sysfs)
-change ioctl's location
-use kasprintf()
-error handling

Your comments are very welcome.

Signed-off-by: Keiichi KII [EMAIL PROTECTED]
Signed-off-by: Takayoshi Kochi [EMAIL PROTECTED]
---
[RFC][PATCH -mm take4 0/6] proposal for dynamic configurable netconsole
From: Keiichi KII [EMAIL PROTECTED] The netconsole is a very useful module for collecting kernel message under certain circumstances(e.g. disk logging fails, serial port is unavailable). But current netconsole is not flexible. For example, if you want to change ip address for logging agent, in the case of built-in netconsole, you can't change config except for changing boot parameter and rebooting your system, or in the case of module netconsole, you need to remove it and reload with different parameters. By adopting my patches, the current netconsole becomes a little complex. But the kernel messages(especially panic messages) is significant information to solve bugs and troubles promptly and we have been losing serial console port with PCs and Servers. I think that we need the environment in which we can collect kernel messages flexibly. So, I propose the following extended features for netconsole. 1) support for multiple logging agents. 2) add interface to access each parameter of netconsole using sysfs. [changes since take3] -changing kernel base from 2.6.21-rc3-mm2 to 2.6.21-rc6-mm1. -introducing CONFIG_NETCONSOLE_DYNCON. -cleanup Your comments are very welcome. Signed-off-by: Keiichi KII [EMAIL PROTECTED] Signed-off-by: Takayoshi Kochi [EMAIL PROTECTED] --- - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC][PATCH -mm take3 0/6][resend] proposal for dynamic configurable netconsole
From: Keiichi KII [EMAIL PROTECTED] The netconsole is a very useful module for collecting kernel message under certain circumstances(e.g. disk logging fails, serial port is unavailable). But current netconsole is not flexible. For example, if you want to change ip address for logging agent, in the case of built-in netconsole, you can't change config except for changing boot parameter and rebooting your system, or in the case of module netconsole, you need to remove it and reload with different parameters. So, I propose the following extended features for netconsole. 1) support for multiple logging agents. 2) add interface to access each parameter of netconsole using sysfs. [changes since take2] -changing kernel base from 2.6.20-rc1-mm1 to 2.6.21-rc3.mm2. -using symbolic link for network device. -changing in part interface from sysfs to ioctl, because Stephen Hemminger advised us that it is a misuse that sysfs has the behavior with magic side effect such as adding/removing port. This patch is for linux-2.6.21-rc3-mm2 and is divided to each function. Your comments are very welcome. Signed-off-by: Keiichi KII [EMAIL PROTECTED] Signed-off-by: Takayoshi Kochi [EMAIL PROTECTED] --- - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][PATCH -mm 0/5] proposal for dynamic configurable netconsole
Thank you for your comments.

> > So, I propose the following extended features for netconsole.
> >
> > 1) support for multiple logging agents.
> > 2) add interface to access each parameter of netconsole using sysfs.
> >
> > This patch is for linux-2.6.20-rc1-mm1 and is divided to each function.
> > Your comments are very welcome.
>
> Rather than extending the existing kludge from module parameters to sysfs,
> I would rather see a better API for this. Please think about doing a
> better API with a basic set of ioctl's.

What advantage does a set of ioctl's have compared to sysfs? I think that sysfs is easier and more readable than ioctl's for changing configuration (IP address, port number and so on).

ex)
# cat /sys/class/misc/netconsole/port1/remote_ip
192.168.0.1
# echo 172.16.0.1 > /sys/class/misc/netconsole/port1/remote_ip
# cat /sys/class/misc/netconsole/port1/remote_ip
172.16.0.1

And sysfs doesn't need a dedicated access program the way ioctl's do. If you change netconsole configuration through the sysfs interface, a simple script containing a set of commands such as the echo above will let you set things up automatically.

> Some additional things:
> - shouldn't just be IPV4 specific, should handle IPV6 as well

I would like to implement IPv6 handling on demand in the future.

> - shouldn't specify MAC address, it can do network discovery/arp
>   to find that when adding addresses

I think it would be better for a userland application to find the target MAC address and change it through sysfs.

--
Keiichi KII
NEC Corporation OSS Promotion Center
E-mail: [EMAIL PROTECTED]
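For comparison with a shell script, the same sysfs access from a C program is only a pair of plain file operations. The following is a minimal sketch, assuming an attribute that takes one text value per file; the helper names attr_write() and attr_read() are made up for illustration, and the remote_ip path exists only with the proposed patches applied.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Write a text value to an attribute file. Returns 0 or a negative errno. */
int attr_write(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    int err = 0;

    if (!f)
        return -errno;
    if (fputs(value, f) == EOF)
        err = -EIO;
    /* Buffered write errors may only show up when the file is closed. */
    if (fclose(f) == EOF && !err)
        err = -errno;
    return err;
}

/* Read the first line of an attribute into buf, without the newline. */
int attr_read(const char *path, char *buf, size_t len)
{
    FILE *f;

    assert(len > 0);

    f = fopen(path, "r");
    if (!f)
        return -errno;
    if (!fgets(buf, (int)len, f))
        buf[0] = '\0';
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

A caller would pass something like /sys/class/misc/netconsole/port1/remote_ip as the path. An ioctl interface would instead need an agreed request number and argument structure on both sides.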
[PATCH -mm take2 0/5] proposal for dynamic configurable netconsole
From: Keiichi KII [EMAIL PROTECTED] The netconsole is a very useful module for collecting kernel message under certain circumstances(e.g. disk logging fails, serial port is unavailable). But current netconsole is not flexible. For example, if you want to change ip address for logging agent, in the case of built-in netconsole, you can't change config except for changing boot parameter and rebooting your system, or in the case of module netconsole, you need to remove it and reload with different parameters. So, I propose the following extended features for netconsole. 1) support for multiple logging agents. 2) add interface to access each parameter of netconsole using sysfs. This patch is for linux-2.6.20-rc1-mm1 and is divided to each function. Your comments are very welcome. Signed-off-by: Keiichi KII [EMAIL PROTECTED] Signed-off-by: Takayoshi Kochi [EMAIL PROTECTED] --- -- Keiichi KII NEC Corporation OSS Promotion Center E-mail: [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC][PATCH -mm 0/5] proposal for dynamic configurable netconsole
From: Keiichi KII [EMAIL PROTECTED] The netconsole is a very useful module for collecting kernel message under certain circumstances(e.g. disk logging fails, serial port is unavailable). But current netconsole is not flexible. For example, if you want to change ip address for logging agent, in the case of built-in netconsole, you can't change config except for changing boot parameter and rebooting your system, or in the case of module netconsole, you need to reload netconsole module. So, I propose the following extended features for netconsole. 1) support for multiple logging agents. 2) add interface to access each parameter of netconsole using sysfs. This patch is for linux-2.6.20-rc1-mm1 and is divided to each function. Your comments are very welcome. Signed-off-by: Keiichi KII [EMAIL PROTECTED] --- [changes] 1. change kernel base from 2.6.19 to 2.6.20-rc1-mm1. -- Keiichi KII NEC Corporation OSS Promotion Center E-mail: [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][PATCH -mm 0/5] proposal for dynamic configurable netconsole
On Fri, 22 Dec 2006 21:01:09 +0900 Keiichi KII [EMAIL PROTECTED] wrote:
> From: Keiichi KII [EMAIL PROTECTED]
>
> The netconsole is a very useful module for collecting kernel message under
> certain circumstances(e.g. disk logging fails, serial port is unavailable).
>
> But current netconsole is not flexible. For example, if you want to change ip
> address for logging agent, in the case of built-in netconsole, you can't change
> config except for changing boot parameter and rebooting your system, or in the
> case of module netconsole, you need to reload netconsole module.

If netconsole is a module, you should be able to remove it and reload with different parameters.

> So, I propose the following extended features for netconsole.
>
> 1) support for multiple logging agents.
> 2) add interface to access each parameter of netconsole using sysfs.
>
> This patch is for linux-2.6.20-rc1-mm1 and is divided to each function.
> Your comments are very welcome.

Rather than extending the existing kludge from module parameters to sysfs, I would rather see a better API for this. Please think about doing a better API with a basic set of ioctl's.

Some additional things:
- shouldn't just be IPV4 specific, should handle IPV6 as well
- shouldn't specify MAC address, it can do network discovery/arp to find that when adding addresses
[RFC][PATCH 2.6.19 take2 0/5] proposal for dynamic configurable netconsole
From: Keiichi KII [EMAIL PROTECTED] The netconsole is a very useful module for collecting kernel message under certain circumstances(e.g. disk logging fails, serial port is unavailable). But current netconsole is not flexible. For example, if you want to change ip address for logging agent, in the case of built-in netconsole, you can't change config except for changing boot parameter and rebooting your system, or in the case of module netconsole, you need to reload netconsole module. So, I propose the following extended features for netconsole. 1) support for multiple logging agents. 2) add interface to access each parameter of netconsole using sysfs. This patch is for linux-2.6.19 and is divided to each function. Your comments are very welcome. Signed-off-by: Keiichi KII [EMAIL PROTECTED] --- -- Keiichi KII NEC Corporation OSS Promotion Center E-mail: [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [sungem] proposal for a new locking strategy
On Sun, 5 Nov 2006 21:11:34 +0100 Eric Lemoine [EMAIL PROTECTED] wrote:
On 11/5/06, Stephen Hemminger [EMAIL PROTECTED] wrote:
On Sun, 5 Nov 2006 18:52:45 +0100 Eric Lemoine [EMAIL PROTECTED] wrote:
On 11/5/06, Stephen Hemminger [EMAIL PROTECTED] wrote:
On Sun, 5 Nov 2006 18:28:33 +0100 Eric Lemoine [EMAIL PROTECTED] wrote:

You could also just use net_tx_lock() now.

You mean netif_tx_lock()? Thanks for letting me know about that function. Yes, I may need it. tg3 and bnx2 use it to wake up the transmit queue:

    if (unlikely(netif_queue_stopped(tp->dev) &&
                 (tg3_tx_avail(tp) > TG3_TX_WAKEUP_THRESH))) {
            netif_tx_lock(tp->dev);
            if (netif_queue_stopped(tp->dev) &&
                (tg3_tx_avail(tp) > TG3_TX_WAKEUP_THRESH))
                    netif_wake_queue(tp->dev);
            netif_tx_unlock(tp->dev);
    }

2.6.17 didn't use it. Was it a bug?

Thanks,

No, it was introduced in 2.6.18. The functions are just a wrapper around the network device transmit lock that is normally held. If the device does not need to acquire the lock during IRQ, it is a good alternative and avoids a second lock.

For transmit locking there are three common alternatives:

Method A: dev->queue_xmit_lock and per-device tx_lock
    send: dev->xmit_lock held by caller
          dev->hard_start_xmit acquires netdev_priv(dev)->tx_lock
    irq:  netdev_priv(dev)->tx_lock acquired

Method B: dev->queue_xmit_lock only
    send: dev->xmit_lock held by caller
    irq:  schedules softirq (NAPI)
    napi_poll: calls netif_tx_lock() which acquires dev->xmit_lock

Method C: LLTX
    set dev->features LLTX
    send: no locks held by caller
          dev->hard_start_xmit acquires netdev_priv(dev)->tx_lock
    irq:  netdev_priv(dev)->tx_lock acquired

Method A is the only one that works with 2.4 and early (2.6.8?) kernels.

Current sungem does Method C, and uses two locks: lock and tx_lock. What I was planning to do is Method B (which current tg3 uses). It seems to me that Method B is better than Method C. What do you think?
B is better than C because the transmit logic doesn't have to spin in the case of lock contention, but it is not a big difference.

Current sungem does C but uses try_lock() to acquire its private tx_lock. So it doesn't spin either in case of contention.

But the spin is still there, just more complex. In qdisc_restart(), processing of NETDEV_TX_LOCKED causes:

    spin_lock(dev->xmit_lock)
    q->requeue()
    netif_schedule(dev);

SOFTIRQ:
    net_tx_action()
        qdisc_run() --> qdisc_restart()

So instead of spinning in a tight loop, you end up with a longer code path.

--
Stephen Hemminger [EMAIL PROTECTED]
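The tg3 wake-up snippet quoted in this thread follows a "check, lock, re-check" shape. A user-space analogue of that shape is sketched below, assuming a toy queue structure; struct txq and all the txq_* names are hypothetical, and the spin lock built on C11 atomic_flag merely stands in for netif_tx_lock(). It is not driver code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for a driver's transmit queue state (hypothetical). */
struct txq {
    atomic_flag lock;     /* plays the role of the device tx lock */
    atomic_bool stopped;  /* queue was stopped because the ring filled */
    atomic_int avail;     /* free descriptors in the ring */
};

void txq_init(struct txq *q, bool stopped, int avail)
{
    assert(avail >= 0);
    atomic_flag_clear(&q->lock);
    atomic_init(&q->stopped, stopped);
    atomic_init(&q->avail, avail);
}

static void txq_lock(struct txq *q)
{
    while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
        ; /* spin until the holder releases the lock */
}

static void txq_unlock(struct txq *q)
{
    atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Returns true if this call woke the queue. */
bool txq_maybe_wake(struct txq *q, int wake_thresh)
{
    bool woke = false;

    /* Cheap unlocked check: the common case is "nothing to do". */
    if (!(atomic_load(&q->stopped) && atomic_load(&q->avail) > wake_thresh))
        return false;

    txq_lock(q);
    /* Re-check under the lock; the xmit path may have raced with us. */
    if (atomic_load(&q->stopped) && atomic_load(&q->avail) > wake_thresh) {
        atomic_store(&q->stopped, false);
        woke = true;
    }
    txq_unlock(q);

    return woke;
}
```

The unlocked first test keeps the lock out of the common path, which is why this shape suits a completion handler that usually finds the queue running.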
Re: [sungem] proposal for a new locking strategy
On 11/6/06, Stephen Hemminger [EMAIL PROTECTED] wrote: On Sun, 5 Nov 2006 21:11:34 +0100 Eric Lemoine [EMAIL PROTECTED] wrote: On 11/5/06, Stephen Hemminger [EMAIL PROTECTED] wrote: On Sun, 5 Nov 2006 18:52:45 +0100 Eric Lemoine [EMAIL PROTECTED] wrote: On 11/5/06, Stephen Hemminger [EMAIL PROTECTED] wrote: On Sun, 5 Nov 2006 18:28:33 +0100 Eric Lemoine [EMAIL PROTECTED] wrote: You could also just use net_tx_lock() now. You mean netif_tx_lock()? Thanks for letting me know about that function. Yes, I may need it. tg3 and bnx2 use it to wake up the transmit queue: if (unlikely(netif_queue_stopped(tp-dev) (tg3_tx_avail(tp) TG3_TX_WAKEUP_THRESH))) { netif_tx_lock(tp-dev); if (netif_queue_stopped(tp-dev) (tg3_tx_avail(tp) TG3_TX_WAKEUP_THRESH)) netif_wake_queue(tp-dev); netif_tx_unlock(tp-dev); } 2.6.17 didn't use it. Was it a bug? Thanks, No, it was introduced in 2.6.18. The functions are just a wrapper around the network device transmit lock that is normally held. If the device does not need to acquire the lock during IRQ, it is a good alternative and avoids a second lock. For transmit locking there are three common alternatives: Method A: dev-queue_xmit_lock and per-device tx_lock send: dev-xmit_lock held by caller dev-hard_start_xmit acquires netdev_priv(dev)-tx_lock irq: netdev_priv(dev)-tx_lock acquired Method B: dev-queue_xmit_lock only send: dev-xmit_lock held by caller irq: schedules softirq (NAPI) napi_poll: calls netif_tx_lock() which acquires dev-xmit_lock Method C: LLTX set dev-features LLTX send: no locks held by caller dev-hard_start_xmit acquires netdev_priv(dev)-tx_lock irq: netdev_priv(dev)-tx_lock acquired Method A is the only one that works with 2.4 and early (2.6.8?) kernels. Current sungem does Method C, and uses two locks: lock and tx_lock. What I was planning to do is Method B (which current tg3 uses). It seems to me that Method B is better than Method C. What do you think? 
B is better than C because the transmit logic doesn't have to spin in the case of lock contention, but it is not a big difference.

Current sungem does C but uses try_lock() to acquire its private tx_lock. So it doesn't spin either in case of contention.

But the spin is still there, just more complex. In qdisc_restart(), processing of NETDEV_TX_LOCKED causes:

    spin_lock(dev->xmit_lock)
    q->requeue()
    netif_schedule(dev);

SOFTIRQ:
    net_tx_action()
        qdisc_run() --> qdisc_restart()

So instead of spinning in a tight loop, you end up with a longer code path.

Stephen, sorry for insisting a bit but I'm failing to see how B is different from C in that respect. With method B, in qdisc_restart(), if netif_tx_trylock() fails to acquire the lock then we also requeue(), etc. Same long code path in case of contention.

--
Eric
Re: [sungem] proposal for a new locking strategy
On Mon, 6 Nov 2006 21:55:20 +0100 Eric Lemoine [EMAIL PROTECTED] wrote: On 11/6/06, Stephen Hemminger [EMAIL PROTECTED] wrote: On Sun, 5 Nov 2006 21:11:34 +0100 Eric Lemoine [EMAIL PROTECTED] wrote: On 11/5/06, Stephen Hemminger [EMAIL PROTECTED] wrote: On Sun, 5 Nov 2006 18:52:45 +0100 Eric Lemoine [EMAIL PROTECTED] wrote: On 11/5/06, Stephen Hemminger [EMAIL PROTECTED] wrote: On Sun, 5 Nov 2006 18:28:33 +0100 Eric Lemoine [EMAIL PROTECTED] wrote: You could also just use net_tx_lock() now. You mean netif_tx_lock()? Thanks for letting me know about that function. Yes, I may need it. tg3 and bnx2 use it to wake up the transmit queue: if (unlikely(netif_queue_stopped(tp-dev) (tg3_tx_avail(tp) TG3_TX_WAKEUP_THRESH))) { netif_tx_lock(tp-dev); if (netif_queue_stopped(tp-dev) (tg3_tx_avail(tp) TG3_TX_WAKEUP_THRESH)) netif_wake_queue(tp-dev); netif_tx_unlock(tp-dev); } 2.6.17 didn't use it. Was it a bug? Thanks, No, it was introduced in 2.6.18. The functions are just a wrapper around the network device transmit lock that is normally held. If the device does not need to acquire the lock during IRQ, it is a good alternative and avoids a second lock. For transmit locking there are three common alternatives: Method A: dev-queue_xmit_lock and per-device tx_lock send: dev-xmit_lock held by caller dev-hard_start_xmit acquires netdev_priv(dev)-tx_lock irq: netdev_priv(dev)-tx_lock acquired Method B: dev-queue_xmit_lock only send: dev-xmit_lock held by caller irq: schedules softirq (NAPI) napi_poll: calls netif_tx_lock() which acquires dev-xmit_lock Method C: LLTX set dev-features LLTX send: no locks held by caller dev-hard_start_xmit acquires netdev_priv(dev)-tx_lock irq: netdev_priv(dev)-tx_lock acquired Method A is the only one that works with 2.4 and early (2.6.8?) kernels. Current sungem does Method C, and uses two locks: lock and tx_lock. What I was planning to do is Method B (which current tg3 uses). It seems to me that Method B is better than Method C. What do you think? 
B is better than C because the transmit logic doesn't have to spin in the case of lock contention, but it is not a big difference.

Current sungem does C but uses try_lock() to acquire its private tx_lock. So it doesn't spin either in case of contention.

But the spin is still there, just more complex. In qdisc_restart(), processing of NETDEV_TX_LOCKED causes:

    spin_lock(dev->xmit_lock)
    q->requeue()
    netif_schedule(dev);

SOFTIRQ:
    net_tx_action()
        qdisc_run() --> qdisc_restart()

So instead of spinning in a tight loop, you end up with a longer code path.

Stephen, sorry for insisting a bit but I'm failing to see how B is different from C in that respect. With method B, in qdisc_restart(), if netif_tx_trylock() fails to acquire the lock then we also requeue(), etc. Same long code path in case of contention.

Method C LLTX causes repeated softirq's which will be slower since the loop requires more instructions than a simple spin loop (Method B).

--
Stephen Hemminger [EMAIL PROTECTED]
Re: [sungem] proposal for a new locking strategy
On 11/6/06, Stephen Hemminger [EMAIL PROTECTED] wrote: On Mon, 6 Nov 2006 21:55:20 +0100 Eric Lemoine [EMAIL PROTECTED] wrote: On 11/6/06, Stephen Hemminger [EMAIL PROTECTED] wrote: On Sun, 5 Nov 2006 21:11:34 +0100 Eric Lemoine [EMAIL PROTECTED] wrote: On 11/5/06, Stephen Hemminger [EMAIL PROTECTED] wrote: On Sun, 5 Nov 2006 18:52:45 +0100 Eric Lemoine [EMAIL PROTECTED] wrote: On 11/5/06, Stephen Hemminger [EMAIL PROTECTED] wrote: On Sun, 5 Nov 2006 18:28:33 +0100 Eric Lemoine [EMAIL PROTECTED] wrote: You could also just use net_tx_lock() now. You mean netif_tx_lock()? Thanks for letting me know about that function. Yes, I may need it. tg3 and bnx2 use it to wake up the transmit queue: if (unlikely(netif_queue_stopped(tp-dev) (tg3_tx_avail(tp) TG3_TX_WAKEUP_THRESH))) { netif_tx_lock(tp-dev); if (netif_queue_stopped(tp-dev) (tg3_tx_avail(tp) TG3_TX_WAKEUP_THRESH)) netif_wake_queue(tp-dev); netif_tx_unlock(tp-dev); } 2.6.17 didn't use it. Was it a bug? Thanks, No, it was introduced in 2.6.18. The functions are just a wrapper around the network device transmit lock that is normally held. If the device does not need to acquire the lock during IRQ, it is a good alternative and avoids a second lock. For transmit locking there are three common alternatives: Method A: dev-queue_xmit_lock and per-device tx_lock send: dev-xmit_lock held by caller dev-hard_start_xmit acquires netdev_priv(dev)-tx_lock irq: netdev_priv(dev)-tx_lock acquired Method B: dev-queue_xmit_lock only send: dev-xmit_lock held by caller irq: schedules softirq (NAPI) napi_poll: calls netif_tx_lock() which acquires dev-xmit_lock Method C: LLTX set dev-features LLTX send: no locks held by caller dev-hard_start_xmit acquires netdev_priv(dev)-tx_lock irq: netdev_priv(dev)-tx_lock acquired Method A is the only one that works with 2.4 and early (2.6.8?) kernels. Current sungem does Method C, and uses two locks: lock and tx_lock. What I was planning to do is Method B (which current tg3 uses). 
It seems to me that Method B is better than Method C. What do you think?

B is better than C because the transmit logic doesn't have to spin in the case of lock contention, but it is not a big difference.

Current sungem does C but uses try_lock() to acquire its private tx_lock. So it doesn't spin either in case of contention.

But the spin is still there, just more complex. In qdisc_restart(), processing of NETDEV_TX_LOCKED causes:

    spin_lock(dev->xmit_lock)
    q->requeue()
    netif_schedule(dev);

SOFTIRQ:
    net_tx_action()
        qdisc_run() --> qdisc_restart()

So instead of spinning in a tight loop, you end up with a longer code path.

Stephen, sorry for insisting a bit but I'm failing to see how B is different from C in that respect. With method B, in qdisc_restart(), if netif_tx_trylock() fails to acquire the lock then we also requeue(), etc. Same long code path in case of contention.

Method C LLTX causes repeated softirq's which will be slower since the loop requires more instructions than a simple spin loop (Method B).

What I'm saying above is that Method B also causes repeated tx softirqs in case of contention on netif_tx_lock. The code path is:

    netif_tx_trylock() fails -> requeue() -> netif_schedule() -> raise_softirq(NET_TX_SOFTIRQ)

Am I missing anything?

--
Eric
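The contention path Eric describes (trylock fails, the packet is requeued, a softirq is asked to retry) can be modelled in a few lines of user-space C. This is a toy model under stated assumptions, not kernel code: struct toy_dev, toy_restart() and the counters are all hypothetical names, and a C11 atomic_flag stands in for whichever lock the trylock is taken on.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum tx_result { TX_SENT, TX_REQUEUED };

/* Toy device: one lock plus counters that record which path was taken. */
struct toy_dev {
    atomic_flag tx_lock;    /* stands in for the transmit lock */
    unsigned int sent;      /* packets handed to the "hardware" */
    unsigned int requeued;  /* packets put back on the queue */
    bool softirq_pending;   /* stands in for a raised TX softirq */
};

void toy_dev_init(struct toy_dev *d)
{
    assert(d);
    atomic_flag_clear(&d->tx_lock);
    d->sent = 0;
    d->requeued = 0;
    d->softirq_pending = false;
}

/* Returns true if the lock was acquired, false if it was already held. */
bool toy_tx_trylock(struct toy_dev *d)
{
    return !atomic_flag_test_and_set_explicit(&d->tx_lock,
                                              memory_order_acquire);
}

void toy_tx_unlock(struct toy_dev *d)
{
    atomic_flag_clear_explicit(&d->tx_lock, memory_order_release);
}

/* One attempt to transmit a packet, in the spirit of qdisc_restart(). */
enum tx_result toy_restart(struct toy_dev *d)
{
    if (!toy_tx_trylock(d)) {
        /* Contended: requeue and ask to be run again later. */
        d->requeued++;
        d->softirq_pending = true;
        return TX_REQUEUED;
    }

    d->sent++;
    toy_tx_unlock(d);
    return TX_SENT;
}
```

In this model the requeue branch is reached whenever the trylock fails, regardless of which lock it guards. What the model cannot show is how often each method actually hits that branch under real load.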
[sungem] proposal for a new locking strategy
Hi!

Some (long) time ago benh wrote a blaming comment in sungem.c about that driver's locking strategy. That comment basically says that we probably don't need two spinlocks. I agree!

Proposal:

Today's sungem effectively uses two spinlocks: lock and tx_lock. tx_lock is held by the xmit function when sending out a packet. Lots of functions grab tx_lock so as not to interfere with xmit (gem_stop_phy(), gem_change_mtu(), etc.). All of these functions also take lock!

What we could do is remove tx_lock, have the above functions take only lock, and rely on dev->_xmit_lock to protect the xmit function from reentrance. In that case, obviously, the driver wouldn't feature LLTX anymore.

When (re-)configuring we'd now quiesce the device, with the new functions gem_netif_stop() and gem_full_lock(), in the same way as tg3 does. gem_interrupt(), gem_poll(), and gem_start_xmit() could become lockless. Fast!

Basically this proposal makes the data path faster, the control path slower, and simplifies the code by using one single spinlock within the driver.

If the idea seems reasonable to you guys I can go ahead and cook up something...

Thanks,

-- Eric
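As a rough illustration of the shape of this proposal, here is a minimal userspace sketch in C. It is not sungem or tg3 code: the struct and all model_* functions are hypothetical stand-ins (model_netif_stop() and model_full_lock() loosely mirror the proposed gem_netif_stop() and gem_full_lock()), the single lock is a C11 atomic, and the model is single-threaded. In the real driver the lockless xmit path would rely on the caller holding dev->_xmit_lock, which is not modelled here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a driver with one lock; not actual sungem code. */
struct model_gem {
    atomic_bool lock;    /* the single remaining driver lock */
    bool tx_enabled;     /* cleared while the device is quiesced */
    bool poll_enabled;   /* cleared while the device is quiesced */
    int mtu;
    int tx_packets;
};

static void model_gem_init(struct model_gem *gp)
{
    atomic_init(&gp->lock, false);
    gp->tx_enabled = true;
    gp->poll_enabled = true;
    gp->mtu = 1500;
    gp->tx_packets = 0;
}

/* Quiesce the data path: no more polling, no more transmits. */
static void model_netif_stop(struct model_gem *gp)
{
    gp->poll_enabled = false;
    gp->tx_enabled = false;
}

static void model_netif_start(struct model_gem *gp)
{
    gp->tx_enabled = true;
    gp->poll_enabled = true;
}

/* Control path only: spin until the single driver lock is ours. */
static void model_full_lock(struct model_gem *gp)
{
    while (atomic_exchange(&gp->lock, true))
        ;
}

static void model_full_unlock(struct model_gem *gp)
{
    atomic_store(&gp->lock, false);
}

/* Data path: takes no driver lock, only checks the quiesce flag. */
static bool model_start_xmit(struct model_gem *gp)
{
    if (!gp->tx_enabled)
        return false;
    gp->tx_packets++;
    return true;
}

/* Control path: slower, because it quiesces first and then locks. */
static void model_change_mtu(struct model_gem *gp, int new_mtu)
{
    model_netif_stop(gp);
    model_full_lock(gp);
    assert(!gp->tx_enabled && !gp->poll_enabled);
    gp->mtu = new_mtu;
    model_full_unlock(gp);
    model_netif_start(gp);
}
```

The trade-off described above shows up directly: model_start_xmit() touches no driver lock, while model_change_mtu() pays for the stop, lock, unlock and restart sequence.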
Re: [sungem] proposal for a new locking strategy
On Sun, 2006-11-05 at 14:00 +0100, Eric Lemoine wrote:

Hi! Some (long) time ago benh wrote a blaming comment in sungem.c about that driver's locking strategy. That comment basically says that we probably don't need two spinlocks.

Yeah :) Note that I mostly blamed myself there ... Just never found the time to sit down and figure out a proper locking.

I agree! Proposal: Today's sungem effectively uses two spinlocks: lock and tx_lock. tx_lock is held by the xmit function when sending out a packet. Lots of functions grab tx_lock so as not to interfere with xmit (gem_stop_phy(), gem_change_mtu(), etc.). All of these functions also take lock! What we could do is remove tx_lock, have the above functions take only lock, and rely on dev->_xmit_lock to protect the xmit function from reentrance. In that case, obviously, the driver wouldn't feature LLTX anymore.

We could probably do even better but yeah, a single lock is a good start. Overall, I'm unhappy with the infrastructure provided by the network stack though. (Might be better nowadays, but last I looked, for example, I couldn't properly do things like stopping NAPI poll from set_multicast etc... due to locks held by the upper level).

When (re-)configuring we'd now quiesce the device, with the new functions gem_netif_stop() and gem_full_lock(), in the same way as tg3 does. gem_interrupt(), gem_poll(), and gem_start_xmit() could become lockless. Fast! Basically this proposal makes the data path faster, the control path slower, and simplifies the code by using one single spinlock within the driver. If the idea seems reasonable to you guys I can go ahead and cook up something...

It certainly does. Bring on the patch ! :-)

Ben.
Re: [sungem] proposal for a new locking strategy
On 11/5/06, Benjamin Herrenschmidt [EMAIL PROTECTED] wrote: On Sun, 2006-11-05 at 14:00 +0100, Eric Lemoine wrote:

Hi! Some (long) time ago benh wrote a blaming comment in sungem.c about that driver's locking strategy. That comment basically says that we probably don't need two spinlocks.

Yeah :) Note that I mostly blamed myself there ... Just never found the time to sit down and figure out a proper locking.

I actually did introduce tx_lock! So you could well have blamed me :-)

I agree! Proposal: Today's sungem effectively uses two spinlocks: lock and tx_lock. tx_lock is held by the xmit function when sending out a packet. Lots of functions grab tx_lock so as not to interfere with xmit (gem_stop_phy(), gem_change_mtu(), etc.). All of these functions also take lock! What we could do is remove tx_lock, have the above functions take only lock, and rely on dev->_xmit_lock to protect the xmit function from reentrance. In that case, obviously, the driver wouldn't feature LLTX anymore.

We could probably do even better but yeah, a single lock is a good start. Overall, I'm unhappy with the infrastructure provided by the network stack though. (Might be better nowadays, but last I looked, for example, I couldn't properly do things like stopping NAPI poll from set_multicast etc... due to locks held by the upper level).

What you said in your comment is that set_multicast and change_mtu cannot schedule() because the upper layer holds a spinlock. This is still the case actually.

When (re-)configuring we'd now quiesce the device, with the new functions gem_netif_stop() and gem_full_lock(), in the same way as tg3 does. gem_interrupt(), gem_poll(), and gem_start_xmit() could become lockless. Fast! Basically this proposal makes the data path faster, the control path slower, and simplifies the code by using one single spinlock within the driver. If the idea seems reasonable to you guys I can go ahead and cook up something...

It certainly does. Bring on the patch ! :-)

Will arrange some time to do it.

Thanks for your quick response.

-- Eric