Re: [PATCH 0/6] eBPF JIT for PPC64
On 2016/06/10 10:47PM, David Miller wrote: > From: "Naveen N. Rao"> Date: Tue, 7 Jun 2016 19:02:17 +0530 > > > Please note that patch [2] is a pre-requisite for this patchset, and is > > not yet upstream. > ... > > [1] http://thread.gmane.org/gmane.linux.kernel/2188694 > > [2] http://thread.gmane.org/gmane.linux.ports.ppc.embedded/96514 > > Because of #2 I don't think I can take this directly into the networking > tree, right? > > Therefore, how would you like this to be merged? Hi David, Thanks for asking. Yes, I think it is better to take this through the powerpc tree as all the changes are contained within arch/powerpc, unless Michael Ellerman feels differently. Michael? Regards, Naveen
Re: [PATCH ipvs-next] ipvs: count pre-established TCP states as active
On Sun, Jun 12, 2016 at 06:27:39PM +0300, Julian Anastasov wrote: > > Hello, > > On Fri, 3 Jun 2016, Michal Kubecek wrote: > > > Some users observed that "least connection" distribution algorithm doesn't > > handle well bursts of TCP connections from reconnecting clients after > > a node or network failure. > > > > This is because the algorithm counts active connection as worth 256 > > inactive ones where for TCP, "active" only means TCP connections in > > ESTABLISHED state. In case of a connection burst, new connections are > > handled before previous ones have finished the three way handshaking so > > that all are still counted as "inactive", i.e. cheap ones. The become > > "active" quickly but at that time, all of them are already assigned to one > > real server (or few), resulting in highly unbalanced distribution. > > > > Address this by counting the "pre-established" states as "active". > > > > Signed-off-by: Michal Kubecek> > Acked-by: Julian Anastasov > > Simon, please apply! Thanks, done.
[PATCH net] net_sched: prio: rollback allocations if prio_init() fails
From: Eric DumazetNow prio_init() can return -ENOMEM, it also has to make sure any allocated qdisc are freed, since the caller (qdisc_create()) wont call ->destroy() handler for us. Fixes: cbdf45116478 ("net_sched: prio: properly report out of memory errors") Signed-off-by: Eric Dumazet Reported-by: Cong Wang --- net/sched/sch_prio.c | 14 ++ 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c index 071718bccdab..9b703f2c921b 100644 --- a/net/sched/sch_prio.c +++ b/net/sched/sch_prio.c @@ -221,20 +221,18 @@ static int prio_tune(struct Qdisc *sch, struct nlattr *opt) static int prio_init(struct Qdisc *sch, struct nlattr *opt) { struct prio_sched_data *q = qdisc_priv(sch); - int i; + int err, i; for (i = 0; i < TCQ_PRIO_BANDS; i++) q->queues[i] = _qdisc; - if (opt == NULL) { + if (!opt) return -EINVAL; - } else { - int err; - if ((err = prio_tune(sch, opt)) != 0) - return err; - } - return 0; + err = prio_tune(sch, opt); + if (err) + prio_destroy(sch); + return err; } static int prio_dump(struct Qdisc *sch, struct sk_buff *skb)
Re: [PATCH net] net_sched: prio: properly report out of memory errors
On Sun, 2016-06-12 at 20:45 -0700, Cong Wang wrote: > On Sun, Jun 12, 2016 at 4:21 PM, Eric Dumazetwrote: > > + struct Qdisc *child; > > + > > + if (q->queues[i] != _qdisc) > > + continue; > > + > > + child = qdisc_create_dflt(sch->dev_queue, _qdisc_ops, > > + TC_H_MAKE(sch->handle, i + 1)); > > + if (!child) > > + return -ENOMEM; > > Since this is inside a loop, shouldn't we kfree the previous child > creations when we fail? You're right. prio_init() needs to do the cleanup, as prio_destroy() wont be called from qdisc_create() I am testing a fix with fault injection. Thanks.
Re: [PATCH net] net_sched: fix pfifo_head_drop behavior vs backlog
On Sun, Jun 12, 2016 at 8:01 PM, Eric Dumazetwrote: > From: Eric Dumazet > > When the qdisc is full, we drop a packet at the head of the queue, > queue the current skb and return NET_XMIT_CN > > Now we track backlog on upper qdiscs, we need to call > qdisc_tree_reduce_backlog(), even if the qlen did not change. > > Fixes: 2f5fb43f ("net_sched: update hierarchical backlog too") > Signed-off-by: Eric Dumazet > Cc: WANG Cong > Cc: Jamal Hadi Salim Acked-by: Cong Wang
Re: [PATCH net] net_sched: prio: properly report out of memory errors
On Sun, Jun 12, 2016 at 4:21 PM, Eric Dumazetwrote: > + struct Qdisc *child; > + > + if (q->queues[i] != _qdisc) > + continue; > + > + child = qdisc_create_dflt(sch->dev_queue, _qdisc_ops, > + TC_H_MAKE(sch->handle, i + 1)); > + if (!child) > + return -ENOMEM; Since this is inside a loop, shouldn't we kfree the previous child creations when we fail? > + sch_tree_lock(sch); > + q->queues[i] = child; > + sch_tree_unlock(sch); > } > return 0; > } > >
Re: [PATCH] netfilter/nflog: nflog-range does not truncate packets
On 06/09/2016 01:57 PM, Vishwanath Pai wrote: > On 06/08/2016 08:16 AM, Pablo Neira Ayuso wrote: >> Looking again at your code: >> >> case NFULNL_COPY_PACKET: >> - if (inst->copy_range > skb->len) >> + data_len = inst->copy_range; >> + if (li->u.ulog.copy_len < data_len) >> + data_len = li->u.ulog.copy_len; >> >> data_len is set to instance's copy_range. >> >> But then, if the NFLOG rule indicates smaller copy_len, you use this >> value. So to my understanding, NFLOG rule prevails over instance's >> copy_range in what matters, which is to shrink the copy range. >> --nflog-range will not override the per-instance default, the only time it would get preference is when its value is lesser than the per-instance value. If copy_range is lesser than --nflog-range then we retain copy_range. So basically what we are doing is min(copy_range, nflog-range). Just wanted to clarify this, if this is not how it's meant to be please let me know. Also, there is a bug in my patch, li->u.ulog.copy_len can be set to "0" from userspace (if --nflog-range is not specified), so we have to check for this condition before using the value. I will send a V2 of the patch based on your reply. >> Currently, li->u.ulog.copy_len is set to "0" by default when not >> specified. >> >> But copy_len = 0 is a valid possibility, so this looks a bit more >> tricky to me to fix since we may need to get flags here to know when >> this is set. >> >> Probably something like: >> >> if (li->flags & NF_LOG_F_COPY_RANGE) >> data_len = li->u.ulog.copy_len; >> /* Per-instance copy range prevails over global per-rule option. */ >> if (data_len < inst->copy_range) >> data_len = inst->copy_range; >> if (data_len > skb->len) >> data_len = skb->len; >> >> Although this would require a bit more code to introduce these flags. >> >> You will also need a new flag for xt_NFLOG: >> >> #define XT_NFLOG_COPY_LEN 0x2 >> >> it seems other XT_NFLOG_* flags were already in place. >> >> Interesting that nobody noticed this for so long BTW. > > I tried this out, I added two flags: one for XT_NFLOG to notify the > kernel when --nflog-range is set by the user, and another flag for > nfnetlink_log to pass this on to nfulnl_log_packet. This design works > fine but while testing this I found a problem. > > Lets say --nflog-range is set to 200, and the instance copy_range is set > to 100. According to the code above the final value of data_len will be > 200 so we try to pack 200 bytes into the skb. But somewhere between > nfnetlink_log to dumpcap the packet is getting truncated and dumpcap > doesn't like this: > > $> dumpcap -y NFLOG -i nflog:5 -s 100 > Capturing on 'nflog:5' > File: /tmp/wireshark_pcapng_nflog-5_20160609133531_pi6MrS > dumpcap: Error while capturing packets: Message truncated: (got: 228) > (nlmsg_len: 320) > Please report this to the Wireshark developers. > https://bugs.wireshark.org/ > (This is not a crash; please do not report it as such.) > Packets captured: 0 > Packets received/dropped on interface 'nflog:5': 0/0 > (pcap:0/dumpcap:0/flushed:0/ps_ifdrop:0) (0.0%) > > I'm trying to figure out where the packet is getting truncated. > I found where the problem is. This is the userspace code for libpcap: do { len = recv(handle->fd, handle->buffer, handle->bufsize, 0); if (handle->break_loop) { handle->break_loop = 0; return -2; } } while ((len == -1) && (errno == EINTR)); ... buf = handle->buffer; while (len >= NLMSG_SPACE(0)) { const struct nlmsghdr *nlh = (const struct nlmsghdr *) buf; u_int32_t msg_len; nftype_t type = OTHER; if (nlh->nlmsg_len < sizeof(struct nlmsghdr) || len < nlh->nlmsg_len) { snprintf(handle->errbuf, PCAP_ERRBUF_SIZE, "Message truncated: (got: %d) (nlmsg_len: %u)", len, nlh->nlmsg_len); return -1; } handle->bufsize is only big enough to accommodate the snaplen specified by the user in dumpcap. So if we send more data than that then we will break userspace. One way around this is to send min(li->u.ulog.copy_len, inst->copy_range). If this is OK then I can send a patch for this, please suggest.
Re: [PATCH v4] udp reuseport: fix packet of same flow hashed to different socket
On Mon, 2016-06-13 at 11:02 +0800, Su Xuemin wrote: > From: "Su, Xuemin"> > There is a corner case in which udp packets belonging to a same > flow are hashed to different socket when hslot->count changes from 10 > to 11: ... > Signed-off-by: Su, Xuemin > Signed-off-by: Eric Dumazet > --- > I use this tree to generate the patch: > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git > > net/ipv4/udp.c | 73 > +- > net/ipv6/udp.c | 71 +--- > 2 files changed, 32 insertions(+), 112 deletions(-) Very nice simplification of UDP stack, thanks a lot for finalizing this.
[PATCH v4] udp reuseport: fix packet of same flow hashed to different socket
From: "Su, Xuemin"There is a corner case in which udp packets belonging to a same flow are hashed to different socket when hslot->count changes from 10 to 11: 1) When hslot->count <= 10, __udp_lib_lookup() searches udp_table->hash, and always passes 'daddr' to udp_ehashfn(). 2) When hslot->count > 10, __udp_lib_lookup() searches udp_table->hash2, but may pass 'INADDR_ANY' to udp_ehashfn() if the sockets are bound to INADDR_ANY instead of some specific addr. That means when hslot->count changes from 10 to 11, the hash calculated by udp_ehashfn() is also changed, and the udp packets belonging to a same flow will be hashed to different socket. This is easily reproduced: 1) Create 10 udp sockets and bind all of them to 0.0.0.0:4. 2) From the same host send udp packets to 127.0.0.1:4, record the socket index which receives the packets. 3) Create 1 more udp socket and bind it to 0.0.0.0:44096. The number 44096 is 4 + UDP_HASH_SIZE(4096), this makes the new socket put into the same hslot as the aformentioned 10 sockets, and makes the hslot->count change from 10 to 11. 4) From the same host send udp packets to 127.0.0.1:4, and the socket index which receives the packets will be different from the one received in step 2. This should not happen as the socket bound to 0.0.0.0:44096 should not change the behavior of the sockets bound to 0.0.0.0:4. It's the same case for IPv6, and this patch also fixes that. Signed-off-by: Su, Xuemin Signed-off-by: Eric Dumazet --- I use this tree to generate the patch: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git net/ipv4/udp.c | 73 +- net/ipv6/udp.c | 71 +--- 2 files changed, 32 insertions(+), 112 deletions(-) diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 0ff31d9..55ec77c 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -391,9 +391,9 @@ int udp_v4_get_port(struct sock *sk, unsigned short snum) return udp_lib_get_port(sk, snum, ipv4_rcv_saddr_equal, hash2_nulladdr); } -static inline int compute_score(struct sock *sk, struct net *net, - __be32 saddr, unsigned short hnum, __be16 sport, - __be32 daddr, __be16 dport, int dif) +static int compute_score(struct sock *sk, struct net *net, +__be32 saddr, __be16 sport, +__be32 daddr, unsigned short hnum, int dif) { int score; struct inet_sock *inet; @@ -434,52 +434,6 @@ static inline int compute_score(struct sock *sk, struct net *net, return score; } -/* - * In this second variant, we check (daddr, dport) matches (inet_rcv_sadd, inet_num) - */ -static inline int compute_score2(struct sock *sk, struct net *net, -__be32 saddr, __be16 sport, -__be32 daddr, unsigned int hnum, int dif) -{ - int score; - struct inet_sock *inet; - - if (!net_eq(sock_net(sk), net) || - ipv6_only_sock(sk)) - return -1; - - inet = inet_sk(sk); - - if (inet->inet_rcv_saddr != daddr || - inet->inet_num != hnum) - return -1; - - score = (sk->sk_family == PF_INET) ? 2 : 1; - - if (inet->inet_daddr) { - if (inet->inet_daddr != saddr) - return -1; - score += 4; - } - - if (inet->inet_dport) { - if (inet->inet_dport != sport) - return -1; - score += 4; - } - - if (sk->sk_bound_dev_if) { - if (sk->sk_bound_dev_if != dif) - return -1; - score += 4; - } - - if (sk->sk_incoming_cpu == raw_smp_processor_id()) - score++; - - return score; -} - static u32 udp_ehashfn(const struct net *net, const __be32 laddr, const __u16 lport, const __be32 faddr, const __be16 fport) @@ -492,11 +446,11 @@ static u32 udp_ehashfn(const struct net *net, const __be32 laddr, udp_ehash_secret + net_hash_mix(net)); } -/* called with read_rcu_lock() */ +/* called with rcu_read_lock() */ static struct sock *udp4_lib_lookup2(struct net *net, __be32 saddr, __be16 sport, __be32 daddr, unsigned int hnum, int dif, - struct udp_hslot *hslot2, unsigned int slot2, + struct udp_hslot *hslot2, struct sk_buff *skb) { struct sock *sk, *result; @@ -506,7 +460,7 @@ static struct sock *udp4_lib_lookup2(struct net *net, result = NULL; badness = 0; udp_portaddr_for_each_entry_rcu(sk, >head) { - score = compute_score2(sk, net, saddr, sport, + score = compute_score(sk, net,
[PATCH net] net_sched: fix pfifo_head_drop behavior vs backlog
From: Eric DumazetWhen the qdisc is full, we drop a packet at the head of the queue, queue the current skb and return NET_XMIT_CN Now we track backlog on upper qdiscs, we need to call qdisc_tree_reduce_backlog(), even if the qlen did not change. Fixes: 2f5fb43f ("net_sched: update hierarchical backlog too") Signed-off-by: Eric Dumazet Cc: WANG Cong Cc: Jamal Hadi Salim --- net/sched/sch_fifo.c |4 1 file changed, 4 insertions(+) diff --git a/net/sched/sch_fifo.c b/net/sched/sch_fifo.c index 2177eac0a61e..2e4bd2c0a50c 100644 --- a/net/sched/sch_fifo.c +++ b/net/sched/sch_fifo.c @@ -37,14 +37,18 @@ static int pfifo_enqueue(struct sk_buff *skb, struct Qdisc *sch) static int pfifo_tail_enqueue(struct sk_buff *skb, struct Qdisc *sch) { + unsigned int prev_backlog; + if (likely(skb_queue_len(>q) < sch->limit)) return qdisc_enqueue_tail(skb, sch); + prev_backlog = sch->qstats.backlog; /* queue full, remove one skb to fulfill the limit */ __qdisc_queue_drop_head(sch, >q); qdisc_qstats_drop(sch); qdisc_enqueue_tail(skb, sch); + qdisc_tree_reduce_backlog(sch, 0, prev_backlog - sch->qstats.backlog); return NET_XMIT_CN; }
udp failures traced to e858fae2b0b8
Mike: UDP tests in my vrf unit test suite are failing. git bisect points to: dsa@kenny:~/kernel-2.git$ git bisect good e858fae2b0b8f41f0bed2cdffde25e7c97da38a7 is the first bad commit commit e858fae2b0b8f41f0bed2cdffde25e7c97da38a7 Author: Mike RapoportDate: Wed Jun 8 16:09:21 2016 +0300 virtio_net: use common code for virtio_net_hdr and skb GSO conversion Replace open coded conversion between virtio_net_hdr to skb GSO info with virtio_net_hdr_{from,to}_skb Signed-off-by: Mike Rapoport Signed-off-by: David S. Miller :04 04 7e53f434e2d3a50ac685ce153dd92fc1a207de0e 470864c731ae7dd680f9807d2a1a359c715a4270 M drivers On console I get the following: [ 80.414134] skbuff: bad partial csum: csum=34/6 len=40 [ 80.416105] eth1: bad gso: type: 0, size: 0 Nothing special on the setup -- shell scripts and qemu commands that have worked fine for years. Host OS is debian/jessie: dsa@kenny:dsa-ns:~$ /usr/bin/qemu-system-x86_64 --version QEMU emulator version 2.1.2 (Debian 1:2.1+dfsg-12+deb8u6), Copyright (c) 2003-2008 Fabrice Bellard Networking arg: -netdev type=tap,vhost=on,ifname=vm01-eth1,script=no,downscript=no,id=netdev1 -device virtio-net-pci,mac=02:e0:f9:1c:b9:74,netdev=netdev1,romfile= tap device is connected to a bridge. David
Re: [RFC] Handle error writing UINT_MAX to u32 fields
The suggested change would extend the usable range of positive numbers by one bit only. As many systems are 64 bit this does not seem forward looking. I would prefer to have a routine that can handle 64 bit integers with limits (let's call it proc_doint64vec_minmax) which uses fields extra1 and extra2 of ctl_table as min and max. Then set xfrm_table[].extra1 = 0 and xfrm_table[].extra2 = UINT_MAX if you need a result in the u32 range. Thanks Heinrich. Do you think we can use proc_doulongvec_minmax for this?
Re: [very-RFC 6/8] Add TSN event-tracing
On Sun, 12 Jun 2016 23:25:10 +0200 Henrik Austadwrote: > > > +#include > > > +#include > > > +/* #include */ > > > + > > > +/* FIXME: update to TRACE_CLASS to reduce overhead */ > > > > I'm curious to why I didn't do this now. A class would make less > > duplication of typing too ;-) > > Yeah, I found this in a really great article written by some tracing-dude, > I hear he talks really, really fast! I plead the 5th! > > https://lwn.net/Articles/381064/ > > > > +TRACE_EVENT(tsn_buffer_write, > > > + > > > + TP_PROTO(struct tsn_link *link, > > > + size_t bytes), > > > + > > > + TP_ARGS(link, bytes), > > > + > > > + TP_STRUCT__entry( > > > + __field(u64, stream_id) > > > + __field(size_t, size) > > > + __field(size_t, bsize) > > > + __field(size_t, size_left) > > > + __field(void *, buffer) > > > + __field(void *, head) > > > + __field(void *, tail) > > > + __field(void *, end) > > > + ), > > > + > > > + TP_fast_assign( > > > + __entry->stream_id = link->stream_id; > > > + __entry->size = bytes; > > > + __entry->bsize = link->used_buffer_size; > > > + __entry->size_left = (link->head - link->tail) % > > > link->used_buffer_size; > > > > Move this logic into the print statement, since you save head and tail. > > Ok, any particular reason? Because it removes calculations during the trace. The calculations done in TP_printk() are done at the time of reading the trace, and calculations done in TP_fast_assign() are done during the recording and hence adding more overhead to the trace itself. > > > > + __entry->buffer = link->buffer; > > > + __entry->head = link->head; > > > + __entry->tail = link->tail; > > > + __entry->end = link->end; > > > + ), > > > + > > > + TP_printk("stream_id=%llu, copy=%zd, buffer: %zd, avail=%zd, > > > [buffer=%p, head=%p, tail=%p, end=%p]", > > > + __entry->stream_id, __entry->size, __entry->bsize, > > > __entry->size_left, > > > > __entry->stream_id, __entry->size, __entry->bsize, > > (__entry->head - __entry->tail) % __entry->bsize, > > > > Ok, so is this about saving space by dropping one intermediate value, or is > it some other point I'm missing here? Nope, just moving the overhead from the recording of the trace to the reading of the trace. > > > > + __entry->buffer,__entry->head, __entry->tail, __entry->end) > > > + > > > + ); > > > + > > > + > > > + TP_fast_assign( > > > + __entry->stream_id = link->stream_id; > > > + __entry->vlan_tag = (skb_vlan_tag_present(skb) ? > > > skb_vlan_tag_get(skb) : 0); > > > + __entry->bytes = bytes; > > > + __entry->data_len = skb->data_len; > > > + __entry->headlen = skb_headlen(skb); > > > + __entry->protocol = ntohs(vlan_get_protocol(skb)); > > > > Maybe it would be better to do the ntohs() in the TP_printk() as well. > > > > > + __entry->prot_native = ntohs(skb->protocol); > > > > here too. > > > > > + __entry->tx_idx = skb_get_queue_mapping(skb); > > > + > > > + __entry->mac_len = skb->mac_len; > > > + __entry->hdr_len = skb->hdr_len; > > > + __entry->vlan_tci = skb->vlan_tci; > > > + __entry->mac_header = skb->mac_header; > > > + __entry->tail = (unsigned int)skb->tail; > > > + __entry->end = (unsigned int)skb->end; > > > + __entry->truesize = skb->truesize; > > > + ), > > > + > > > + > > > TP_printk("stream_id=%llu,vlan_tag=0x%04x,data_size=%zd,data_len=%zd,headlen=%u,proto=0x%04x > > > > > > (0x%04x),tx_idx=%d,mac_len=%u,hdr_len=%u,vlan_tci=0x%02x,mac_header=0x%02x,tail=%u,end=%u,truesize=%u", > > > + __entry->stream_id, > > > + __entry->vlan_tag, > > > + __entry->bytes, > > > + __entry->data_len, > > > + __entry->headlen, > > > + __entry->protocol, > > > + __entry->prot_native, __entry->tx_idx, > > > + __entry->mac_len, > > > + __entry->hdr_len, > > > + __entry->vlan_tci, > > > + __entry->mac_header, > > > > Is this an ether mac header? If so we support %M. But as it's defined > > as only u16, it doesn't seem like it can be. > > Actually, looking at the output, I'm not quite sure what it is that I > wanted to grab with that, the skb->mac_header should give an offset into > the header-area of skb, so it should be a constant offset from skb->head > (that is an actual pointer). > > I *think* I wanted to make sure I updated things correctly so that the > offset didn't suddenly change, but the fact that I'm no longer sure > indicates that I should just drop that one. That whole printout is too long > anyway.. > > Thanks for pointing a finger at this! > > > I'm still a bit stymied as to why logic should be in TP_printk() and not > TP_fast_assign(). Not that I really
Re: [PATCH v2 -next] sched: remove NET_XMIT_POLICED
From: Cong WangDate: Sat, 11 Jun 2016 17:14:09 -0700 > On Sat, Jun 11, 2016 at 3:54 PM, David Miller wrote: >> From: Florian Westphal >> Date: Sat, 11 Jun 2016 12:46:04 +0200 >> >>> sch_atm returns this when TC_ACT_SHOT classification occurs. >>> >>> But all other schedulers that use tc_classify >>> (htb, hfsc, drr, fq_codel ...) return NET_XMIT_SUCCESS | __BYPASS >>> in this case so just do that in atm. >> >> Yes, but it's been like this for two decades. >> >> One of the points of my reply to the previous version of this >> patch is that indeed it is explicitly returned, intentionally. > > I think this is a leftover from CONFIG_NET_CLS_POLICE removal. > > BTW, probably TC_POLICE_* can be removed together. Ok I'll apply this.
Re: [PATCH] net: ethernet: ti: cpsw: use destroy ctlr to destroy channels
From: Ivan KhoronzhukDate: Sat, 11 Jun 2016 01:11:54 +0300 > There is no reason to destroy channels that are destroyed while > cpdma_ctlr destroy. In this case no need to remember how much > channels where created and destroy them by one, as cpdma_ctlr > destroys all of them. > > Signed-off-by: Ivan Khoronzhuk Applied.
Re: [PATCH net-next RFT] net: fec: handle small PHY reset durations more precisely
From: Stefan WahrenDate: Wed, 8 Jun 2016 20:42:46 + > Since msleep is based on jiffies the PHY reset could take longer > than expected. So use msleep for values greater than 20 msec otherwise > usleep_range. > > Signed-off-by: Stefan Wahren Applied.
Re: [PATCH net] net_sched: prio: properly report out of memory errors
From: Eric DumazetDate: Sun, 12 Jun 2016 16:21:47 -0700 > From: Eric Dumazet > > At Qdisc creation or change time, prio_tune() creates missing > pfifo qdiscs but does not return an error code if one > qdisc could not be allocated. > > Leaving a qdisc in non operational state without telling user > anything about this problem is not good. > > Also, testing if we replace something different than noop_qdisc > a second time makes no sense so I removed useless code. > > Signed-off-by: Eric Dumazet Applied.
RE: [PATCH v2 1/2] ARM: imx6: disable deeper idle states when FEC is active w/o HW workaround
From: Lucas StachSent: Thursday, June 09, 2016 5:35 PM > To: Fugang Duan ; Shawn Guo > Cc: devicet...@vger.kernel.org; patchwork-...@pengutronix.de; > ker...@pengutronix.de; linux-arm-ker...@lists.infradead.org; > netdev@vger.kernel.org > Subject: Re: [PATCH v2 1/2] ARM: imx6: disable deeper idle states when FEC is > active w/o HW workaround > > Hi Fugang, > > Am Montag, den 06.06.2016, 02:00 + schrieb Fugang Duan: > > From: Lucas Stach Sent: Saturday, June 04, > > 2016 12:31 AM > > > > > > To: Shawn Guo ; Fugang Duan > > om> > > > Cc: devicet...@vger.kernel.org; patchwork-...@pengutronix.de; > > > ker...@pengutronix.de; linux-arm-ker...@lists.infradead.org; > > > netdev@vger.kernel.org > > > Subject: [PATCH v2 1/2] ARM: imx6: disable deeper idle states when > > > FEC is active w/o HW workaround > > > > > > The i.MX6 Q/DL has an erratum (ERR006687) that prevents the FEC from > > > waking the CPUs when they are in wait(unclocked) state. As the > > > hardware workaround isn't applicable to all boards, disable the > > > deeper idle state when the workaround isn't present and the FEC is > > > in use. > > > > > > This allows to safely run a kernel with CPUidle enabled on all > > > i.MX6 boards. > > > > > > Signed-off-by: Lucas Stach > > > Acked-by: David S. Miller (for network > > > changes) > > > --- > > [...] > > > Hi, Lucas, > > > > FEC irq cannot wake up CPUs when system is in wait mode. But we can > > use GPIO_6 for FEC interrupt that GPIO irq wake up CPUs. > > No need to disable wait mode as your such patches. > > > > You just config the gpio irq like below patches: > > bc20a5d6da71 (ARM: dts: imx6qdl-sabreauto: use GPIO_6 for FEC > > interrupt.) > > 6261c4c8f13e (ARM: dts: imx6qdl-sabrelite: use GPIO_6 for FEC > > interrupt.) > > > Please look at the description of this series again. The changes don't > disable the > deeper idle states on boards where the HW waorkaround is available. There is a > large number of boards in the wild which can not use the HW workaround, as > they use GPIO_6 for other purposes. The aim of this series is to have an > automatic software workaround available for those boards. > > Regards, > Lucas I see. My concern is the wild boards why don't follow NXP HW reference guide. For the patch itself, it is fine for me to fix the wild boards issue. Acked-by: Fugang Duan
[PATCH] esp: correct offset for ESN when using NAT-T
The offset for calculating ESN was not taking into account the new UDP header created for NAT-T. --- net/ipv4/esp4.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c index 4779374..c84d1fc 100644 --- a/net/ipv4/esp4.c +++ b/net/ipv4/esp4.c @@ -223,6 +223,8 @@ static int esp_output(struct xfrm_state *x, struct sk_buff *skb) uh->len = htons(skb->len - skb_transport_offset(skb)); uh->check = 0; + skb_set_transport_header(skb, skb_transport_offset(skb) + sizeof(struct udphdr)); + switch (encap_type) { default: case UDP_ENCAP_ESPINUDP: -- 2.8.3
[PATCH] IPsec NAT-T issue
During testing we have discovered an issue with IPsec NAT-T where the SPI is over writing the source and dest ports of the UDP header. I'm not super familiar with this code, but I've found a solution that seems to work in my setup. I'd like some feedback on this please, if it's the right thing to be doing here, or if it should be done elsewhere. Thanks very much Blair Steven (1): esp: correct offset for ESN when using NAT-T net/ipv4/esp4.c | 2 ++ 1 file changed, 2 insertions(+) -- 2.8.3
Re: off-by-one in DecodeQ931
Attached is the patch generated with git format-patch. On Mon, Jun 6, 2016 at 10:55 AM, Pablo Neira Ayusowrote: > On Mon, Jun 06, 2016 at 04:35:55PM +0200, Florian Westphal wrote: >> Toby DiPasquale wrote: >> > Is this latest patch OK? >> >> Yes, I don't know why it wasn't applied yet. >> >> Pablo? > > This doesn't apply. > > $ git am /tmp/off-by-one-in-DecodeQ931.patch -s > Applying: off-by-one in DecodeQ931 > error: patch failed: net/netfilter/nf_conntrack_h323_asn1.c:846 > error: net/netfilter/nf_conntrack_h323_asn1.c: patch does not apply > Patch failed at 0001 off-by-one in DecodeQ931 > The copy of the patch that failed is found in: >patch > When you have resolved this problem, run "git am --continue". > If you prefer to skip this patch, run "git am --skip" instead. > To restore the original branch and stop patching, run "git am --abort". > > Could you resubmit using git-format-patch? Thanks. -- Toby DiPasquale off-by-one.patch Description: Binary data
[PATCH net] net_sched: prio: properly report out of memory errors
From: Eric DumazetAt Qdisc creation or change time, prio_tune() creates missing pfifo qdiscs but does not return an error code if one qdisc could not be allocated. Leaving a qdisc in non operational state without telling user anything about this problem is not good. Also, testing if we replace something different than noop_qdisc a second time makes no sense so I removed useless code. Signed-off-by: Eric Dumazet --- net/sched/sch_prio.c | 32 1 file changed, 12 insertions(+), 20 deletions(-) diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c index 4b0a82191bc4..071718bccdab 100644 --- a/net/sched/sch_prio.c +++ b/net/sched/sch_prio.c @@ -202,26 +202,18 @@ static int prio_tune(struct Qdisc *sch, struct nlattr *opt) sch_tree_unlock(sch); for (i = 0; i < q->bands; i++) { - if (q->queues[i] == _qdisc) { - struct Qdisc *child, *old; - - child = qdisc_create_dflt(sch->dev_queue, - _qdisc_ops, - TC_H_MAKE(sch->handle, i + 1)); - if (child) { - sch_tree_lock(sch); - old = q->queues[i]; - q->queues[i] = child; - - if (old != _qdisc) { - qdisc_tree_reduce_backlog(old, - old->q.qlen, - old->qstats.backlog); - qdisc_destroy(old); - } - sch_tree_unlock(sch); - } - } + struct Qdisc *child; + + if (q->queues[i] != _qdisc) + continue; + + child = qdisc_create_dflt(sch->dev_queue, _qdisc_ops, + TC_H_MAKE(sch->handle, i + 1)); + if (!child) + return -ENOMEM; + sch_tree_lock(sch); + q->queues[i] = child; + sch_tree_unlock(sch); } return 0; }
Re: [PATCH net] fib_rules: don't break ECN with TOS rules
On 6/11/16 3:10 PM, Hannes Frederic Sowa wrote: @@ -215,6 +216,9 @@ static int fib4_rule_configure(struct fib_rule *rule, struct sk_buff *skb, rule4->dst_len = frh->dst_len; rule4->dstmask = inet_make_mask(rule4->dst_len); rule4->tos = frh->tos; + if (!INET_ECN_ignore(rule4->tos)) + pr_warn("ipv4: never matching ipv4 rule with match for %x added\n", + rule4->tos); The wording of the warning message is not very clear on what the problem is.
Re: [very-RFC 0/8] TSN driver for the kernel
On Sun, Jun 12, 2016 at 07:43:34PM +0900, Takashi Sakamoto wrote: > On Jun 12 2016 17:31, Henrik Austad wrote: > > On Sun, Jun 12, 2016 at 01:30:24PM +0900, Takashi Sakamoto wrote: > >> On Jun 12 2016 12:38, Takashi Sakamoto wrote: > >>> In your patcset, there's no actual codes about how to handle any > >>> interrupt contexts (software / hardware), how to handle packet payload, > >>> and so on. Especially, for recent sound subsystem, the timing of > >>> generating interrupts and which context does what works are important to > >>> reduce playback/capture latency and power consumption. > >>> > >>> Of source, your intention of this patchset is to show your early concept > >>> of TSN feature. Nevertheless, both of explaination and codes are > >>> important to the other developers who have little knowledges about TSN, > >>> AVB and AES-64 such as me. > >> > >> Oops. Your 5th patch was skipped by alsa-project.org. I guess that size > >> of the patch is too large to the list service. I can see it: > >> http://marc.info/?l=linux-netdev=146568672728661=2 > >> > >> As long as seeing the patch, packets are queueing in hrtimer callbacks > >> every 1 seconds. > > > > Actually, the hrtimer fires every 1ms, and that part is something I have to > > do something about, also because it sends of the same number of frames > > every time, regardless of how accurate the internal timer is to the rest of > > the network (there's no backpressure from the networking layer). > > > >> (This is a high level discussion and it's OK to ignore it for the > >> moment. When writing packet-oriented drivers for sound subsystem, you > >> need to pay special attention to accuracy of the number of PCM frames > >> transferred currently, and granularity of the number of PCM frames > >> transferred by one operation. In this case, snd_avb_hw, > >> snd_avb_pcm_pointer(), tsn_buffer_write_net() and tsn_buffer_read_net() > >> are involved in this discussion. You can see ALSA developers' struggle > >> in USB audio device class drivers and (of cource) IEC 61883-1/6 drivers.) > > > > Ah, good point. Any particular parts of the USB-subsystem I should start > > looking at? > > I don't think it's a beter way for you to study USB Audio Device Class > driver unless you're interested in ALSA or USB subsystem. > > (But for your information, snd-usb-audio is in sound/usb/* of Linux > kernel. IEC 61883-1/6 driver is in sound/firewire/*.) Ok, thanks, I'll definately be looking at the firewire bit > We need different strategy to achieve it on different transmission backend. > > > Knowing where to start looking is a tremendous help > > It's not well-documented, and not well-generalized for packet-oriented > drivers. Most of developers who have enough knowledge about it work for > DMA-oriented drivers in mobile platforms and have little interests in > packet-oriented drivers. You need to find your own way. > > Currently I have few advices to you, because I'm also on the way for > drivers to process IEC 61883-1/6 packets on IEEE 1394 bus with enough > accuracy and granularity. The paper I introduced is for the way (but not > mature). > > I wish you get more helps from the other developers. Your work is more > significant to Linux system, than mine. > > (And I hope your future work get no ignorance and no unreasonable > hostility from coarse users.) Ah well, I have asbestos-underwear so that should be fine :) Thanks for the pointers, I really appreciate them! -- Henrik Austad signature.asc Description: Digital signature
QUICK LOAN OFFER
[iproute2 PATCH 1/1] action pedit: stylistic changes
From: Jamal Hadi SalimMore modern layout. Signed-off-by: Jamal Hadi Salim --- tc/m_pedit.c | 118 +++ 1 file changed, 62 insertions(+), 56 deletions(-) diff --git a/tc/m_pedit.c b/tc/m_pedit.c index a539b68..d276ba0 100644 --- a/tc/m_pedit.c +++ b/tc/m_pedit.c @@ -32,8 +32,7 @@ static struct m_pedit_util *pedit_list; static int pedit_debug; -static void -explain(void) +static void explain(void) { fprintf(stderr, "Usage: ... pedit munge [CONTROL]\n"); fprintf(stderr, @@ -48,22 +47,24 @@ explain(void) } -static void -usage(void) +static void usage(void) { explain(); exit(-1); } -static int -pedit_parse_nopopt (int *argc_p, char ***argv_p, struct tc_pedit_sel *sel, struct tc_pedit_key *tkey) +static int pedit_parse_nopopt(int *argc_p, char ***argv_p, + struct tc_pedit_sel *sel, + struct tc_pedit_key *tkey) { int argc = *argc_p; char **argv = *argv_p; if (argc) { - fprintf(stderr, "Unknown action hence option \"%s\" is unparsable\n", *argv); - return -1; + fprintf(stderr, + "Unknown action hence option \"%s\" is unparsable\n", + *argv); + return -1; } return 0; @@ -75,7 +76,7 @@ static struct m_pedit_util *get_pedit_kind(const char *str) static void *pBODY; void *dlh; char buf[256]; - struct m_pedit_util *p; + struct m_pedit_util *p; for (p = pedit_list; p; p = p->next) { if (strcmp(p->id, str) == 0) @@ -107,15 +108,14 @@ noexist: p = malloc(sizeof(*p)); if (p) { memset(p, 0, sizeof(*p)); - strncpy(p->id, str, sizeof(p->id)-1); + strncpy(p->id, str, sizeof(p->id) - 1); p->parse_peopt = pedit_parse_nopopt; goto reg; } return p; } -int -pack_key(struct tc_pedit_sel *sel, struct tc_pedit_key *tkey) +int pack_key(struct tc_pedit_sel *sel, struct tc_pedit_key *tkey) { int hwm = sel->nkeys; @@ -137,9 +137,8 @@ pack_key(struct tc_pedit_sel *sel, struct tc_pedit_key *tkey) return 0; } - -int -pack_key32(__u32 retain, struct tc_pedit_sel *sel, struct tc_pedit_key *tkey) +int pack_key32(__u32 retain, struct tc_pedit_sel *sel, + struct tc_pedit_key *tkey) { if (tkey->off > (tkey->off & ~3)) { fprintf(stderr, @@ -152,11 +151,11 @@ pack_key32(__u32 retain, struct tc_pedit_sel *sel, struct tc_pedit_key *tkey) return pack_key(sel, tkey); } -int -pack_key16(__u32 retain, struct tc_pedit_sel *sel, struct tc_pedit_key *tkey) +int pack_key16(__u32 retain, struct tc_pedit_sel *sel, + struct tc_pedit_key *tkey) { int ind, stride; - __u32 m[4] = {0x, 0xFFFF, 0x}; + __u32 m[4] = { 0x, 0xFFFF, 0x }; if (tkey->val > 0x || tkey->mask > 0x) { fprintf(stderr, "pack_key16 bad value\n"); @@ -177,19 +176,20 @@ pack_key16(__u32 retain, struct tc_pedit_sel *sel, struct tc_pedit_key *tkey) tkey->off &= ~3; if (pedit_debug) - printf("pack_key16: Final val %08x mask %08x\n", tkey->val, tkey->mask); + printf("pack_key16: Final val %08x mask %08x\n", + tkey->val, tkey->mask); return pack_key(sel, tkey); } -int -pack_key8(__u32 retain, struct tc_pedit_sel *sel, struct tc_pedit_key *tkey) +int pack_key8(__u32 retain, struct tc_pedit_sel *sel, struct tc_pedit_key *tkey) { int ind, stride; - __u32 m[4] = {0x00FF, 0xFF00, 0x00FF, 0xFF00}; + __u32 m[4] = { 0x00FF, 0xFF00, 0x00FF, 0xFF00 }; if (tkey->val > 0xFF || tkey->mask > 0xFF) { - fprintf(stderr, "pack_key8 bad value (val %x mask %x\n", tkey->val, tkey->mask); + fprintf(stderr, "pack_key8 bad value (val %x mask %x\n", + tkey->val, tkey->mask); return -1; } @@ -202,12 +202,12 @@ pack_key8(__u32 retain, struct tc_pedit_sel *sel, struct tc_pedit_key *tkey) tkey->off &= ~3; if (pedit_debug) - printf("pack_key8: Final word off %d val %08x mask %08x\n", tkey->off, tkey->val, tkey->mask); + printf("pack_key8: Final word off %d val %08x mask %08x\n", + tkey->off, tkey->val, tkey->mask); return pack_key(sel, tkey); } -int -parse_val(int *argc_p, char ***argv_p, __u32 *val, int type) +int parse_val(int *argc_p, char ***argv_p, __u32 * val, int type) { int argc = *argc_p; char **argv = *argv_p; @@ -216,7 +216,7 @@ parse_val(int *argc_p, char ***argv_p, __u32 *val, int type)
[PATCH] net: ethernet: enic: move to new ethtool api {get|set}_link_ksettings
The ethtool api {get|set}_settings is deprecated. We move the enic driver to new api {get|set}_link_ksettings. Signed-off-by: Philippe Reynes--- drivers/net/ethernet/cisco/enic/enic_ethtool.c | 28 +-- 1 files changed, 16 insertions(+), 12 deletions(-) diff --git a/drivers/net/ethernet/cisco/enic/enic_ethtool.c b/drivers/net/ethernet/cisco/enic/enic_ethtool.c index f44a39c..fd3980c 100644 --- a/drivers/net/ethernet/cisco/enic/enic_ethtool.c +++ b/drivers/net/ethernet/cisco/enic/enic_ethtool.c @@ -103,25 +103,29 @@ static void enic_intr_coal_set_rx(struct enic *enic, u32 timer) } } -static int enic_get_settings(struct net_device *netdev, - struct ethtool_cmd *ecmd) +static int enic_get_ksettings(struct net_device *netdev, + struct ethtool_link_ksettings *ecmd) { struct enic *enic = netdev_priv(netdev); + struct ethtool_link_settings *base = >base; - ecmd->supported = (SUPPORTED_1baseT_Full | SUPPORTED_FIBRE); - ecmd->advertising = (ADVERTISED_1baseT_Full | ADVERTISED_FIBRE); - ecmd->port = PORT_FIBRE; - ecmd->transceiver = XCVR_EXTERNAL; + ethtool_link_ksettings_add_link_mode(ecmd, supported, +1baseT_Full); + ethtool_link_ksettings_add_link_mode(ecmd, supported, FIBRE); + ethtool_link_ksettings_add_link_mode(ecmd, advertising, +1baseT_Full); + ethtool_link_ksettings_add_link_mode(ecmd, advertising, FIBRE); + base->port = PORT_FIBRE; if (netif_carrier_ok(netdev)) { - ethtool_cmd_speed_set(ecmd, vnic_dev_port_speed(enic->vdev)); - ecmd->duplex = DUPLEX_FULL; + base->speed = vnic_dev_port_speed(enic->vdev); + base->duplex = DUPLEX_FULL; } else { - ethtool_cmd_speed_set(ecmd, SPEED_UNKNOWN); - ecmd->duplex = DUPLEX_UNKNOWN; + base->speed = SPEED_UNKNOWN; + base->duplex = DUPLEX_UNKNOWN; } - ecmd->autoneg = AUTONEG_DISABLE; + base->autoneg = AUTONEG_DISABLE; return 0; } @@ -500,7 +504,6 @@ static int enic_set_rxfh(struct net_device *netdev, const u32 *indir, } static const struct ethtool_ops enic_ethtool_ops = { - .get_settings = enic_get_settings, .get_drvinfo = enic_get_drvinfo, .get_msglevel = enic_get_msglevel, .set_msglevel = enic_set_msglevel, @@ -516,6 +519,7 @@ static const struct ethtool_ops enic_ethtool_ops = { .get_rxfh_key_size = enic_get_rxfh_key_size, .get_rxfh = enic_get_rxfh, .set_rxfh = enic_set_rxfh, + .get_link_ksettings = enic_get_ksettings, }; void enic_set_ethtool_ops(struct net_device *netdev) -- 1.7.4.4
Re: [RFC PATCH iproute2] tc: let m_ipt work with new iptables API headers
Hi, On 05/29/2016 08:27 PM, Alexander Aring wrote: > Since commit 5cd1adb ("Update to current iptables headers") the build > with m_ipt.o and the following config will fail: > > TC_CONFIG_XT:=n > TC_CONFIG_XT_OLD:=n > TC_CONFIG_XT_OLD_H:=n > > This patch renames "iptables_target" to "xtables_target" and some other > things which gets renamed and I noticed while reading iptables git log. > Functions which are not used in m_ipt.c and not exported by the header > are removed, if they still used in m_ipt.c I added a static to the function. > > Reported-by: Clemens Gruber> Signed-off-by: Alexander Aring > --- > I removed also "linux/if.h" because some redifinition errors and seems > not be necessary anymore. > > I didn't test this implementation, I just fix the compile issues at m_ipt > implementation according to the algoritmn described above. > > Maybe there are more TC_CONFIG_XT* config variants broken than just m_ipt? > ping, any comments to this patch? I can rebase and resend this patch if necessary. - Alex
Re: [very-RFC 6/8] Add TSN event-tracing
On Sun, Jun 12, 2016 at 12:58:03PM -0400, Steven Rostedt wrote: > On Sun, 12 Jun 2016 01:01:34 +0200 > Henrik Austadwrote: > > > From: Henrik Austad > > > > This needs refactoring and should be updated to use TRACE_CLASS, but for > > now it provides a fair debug-window into TSN. > > > > Cc: "David S. Miller" > > Cc: Steven Rostedt (maintainer:TRACING) > > Cc: Ingo Molnar (maintainer:TRACING) > > Signed-off-by: Henrik Austad > > --- > > include/trace/events/tsn.h | 349 > > + > > 1 file changed, 349 insertions(+) > > create mode 100644 include/trace/events/tsn.h > > > > diff --git a/include/trace/events/tsn.h b/include/trace/events/tsn.h > > new file mode 100644 > > index 000..ac1f31b > > --- /dev/null > > +++ b/include/trace/events/tsn.h > > @@ -0,0 +1,349 @@ > > +#undef TRACE_SYSTEM > > +#define TRACE_SYSTEM tsn > > + > > +#if !defined(_TRACE_TSN_H) || defined(TRACE_HEADER_MULTI_READ) > > +#define _TRACE_TSN_H > > + > > +#include > > +#include > > + > > +#include > > +#include > > +/* #include */ > > + > > +/* FIXME: update to TRACE_CLASS to reduce overhead */ > > I'm curious to why I didn't do this now. A class would make less > duplication of typing too ;-) Yeah, I found this in a really great article written by some tracing-dude, I hear he talks really, really fast! https://lwn.net/Articles/381064/ > > +TRACE_EVENT(tsn_buffer_write, > > + > > + TP_PROTO(struct tsn_link *link, > > + size_t bytes), > > + > > + TP_ARGS(link, bytes), > > + > > + TP_STRUCT__entry( > > + __field(u64, stream_id) > > + __field(size_t, size) > > + __field(size_t, bsize) > > + __field(size_t, size_left) > > + __field(void *, buffer) > > + __field(void *, head) > > + __field(void *, tail) > > + __field(void *, end) > > + ), > > + > > + TP_fast_assign( > > + __entry->stream_id = link->stream_id; > > + __entry->size = bytes; > > + __entry->bsize = link->used_buffer_size; > > + __entry->size_left = (link->head - link->tail) % > > link->used_buffer_size; > > Move this logic into the print statement, since you save head and tail. Ok, any particular reason? > > + __entry->buffer = link->buffer; > > + __entry->head = link->head; > > + __entry->tail = link->tail; > > + __entry->end = link->end; > > + ), > > + > > + TP_printk("stream_id=%llu, copy=%zd, buffer: %zd, avail=%zd, > > [buffer=%p, head=%p, tail=%p, end=%p]", > > + __entry->stream_id, __entry->size, __entry->bsize, > > __entry->size_left, > > __entry->stream_id, __entry->size, __entry->bsize, > (__entry->head - __entry->tail) % __entry->bsize, > Ok, so is this about saving space by dropping one intermediate value, or is it some other point I'm missing here? > > + __entry->buffer,__entry->head, __entry->tail, __entry->end) > > + > > + ); > > + > > +TRACE_EVENT(tsn_buffer_write_net, > > + > > + TP_PROTO(struct tsn_link *link, > > + size_t bytes), > > + > > + TP_ARGS(link, bytes), > > + > > + TP_STRUCT__entry( > > + __field(u64, stream_id) > > + __field(size_t, size) > > + __field(size_t, bsize) > > + __field(size_t, size_left) > > + __field(void *, buffer) > > + __field(void *, head) > > + __field(void *, tail) > > + __field(void *, end) > > + ), > > + > > + TP_fast_assign( > > + __entry->stream_id = link->stream_id; > > + __entry->size = bytes; > > + __entry->bsize = link->used_buffer_size; > > + __entry->size_left = (link->head - link->tail) % > > link->used_buffer_size; > > + __entry->buffer = link->buffer; > > + __entry->head = link->head; > > + __entry->tail = link->tail; > > + __entry->end = link->end; > > + ), > > + > > + TP_printk("stream_id=%llu, copy=%zd, buffer: %zd, avail=%zd, > > [buffer=%p, head=%p, tail=%p, end=%p]", > > + __entry->stream_id, __entry->size, __entry->bsize, > > __entry->size_left, > > + __entry->buffer,__entry->head, __entry->tail, __entry->end) > > + > > + ); > > + > > + > > +TRACE_EVENT(tsn_buffer_read, > > + > > + TP_PROTO(struct tsn_link *link, > > + size_t bytes), > > + > > + TP_ARGS(link, bytes), > > + > > + TP_STRUCT__entry( > > + __field(u64, stream_id) > > + __field(size_t, size) > > + __field(size_t, bsize) > > + __field(size_t, size_left) > > + __field(void *, buffer) > > + __field(void *, head) > > + __field(void *, tail) > > + __field(void *, end) > > + ), > > + > > + TP_fast_assign( >
[net-next PATCH v2 1/1] net sched actions: skbedit add support for mod-ing skb pkt_type
From: Jamal Hadi SalimExtremely useful for setting packet type to host so i dont have to modify the dst mac address using pedit (which requires that i know the mac address) Signed-off-by: Jamal Hadi Salim --- include/net/tc_act/tc_skbedit.h| 10 +- include/uapi/linux/tc_act/tc_skbedit.h | 2 ++ net/sched/act_skbedit.c| 18 +- 3 files changed, 24 insertions(+), 6 deletions(-) diff --git a/include/net/tc_act/tc_skbedit.h b/include/net/tc_act/tc_skbedit.h index b496d5a..d01a5d4 100644 --- a/include/net/tc_act/tc_skbedit.h +++ b/include/net/tc_act/tc_skbedit.h @@ -24,11 +24,11 @@ struct tcf_skbedit { struct tcf_common common; - u32 flags; - u32 priority; - u32 mark; - u16 queue_mapping; - /* XXX: 16-bit pad here? */ + u32 flags; + u32 priority; + u32 mark; + u16 queue_mapping; + u16 ptype; }; #define to_skbedit(a) \ container_of(a->priv, struct tcf_skbedit, common) diff --git a/include/uapi/linux/tc_act/tc_skbedit.h b/include/uapi/linux/tc_act/tc_skbedit.h index fecb5cc..a4d00c6 100644 --- a/include/uapi/linux/tc_act/tc_skbedit.h +++ b/include/uapi/linux/tc_act/tc_skbedit.h @@ -27,6 +27,7 @@ #define SKBEDIT_F_PRIORITY 0x1 #define SKBEDIT_F_QUEUE_MAPPING0x2 #define SKBEDIT_F_MARK 0x4 +#define SKBEDIT_F_PTYPE0x8 struct tc_skbedit { tc_gen; @@ -40,6 +41,7 @@ enum { TCA_SKBEDIT_QUEUE_MAPPING, TCA_SKBEDIT_MARK, TCA_SKBEDIT_PAD, + TCA_SKBEDIT_PTYPE, __TCA_SKBEDIT_MAX }; #define TCA_SKBEDIT_MAX (__TCA_SKBEDIT_MAX - 1) diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c index 8f235ff..8a7a34b 100644 --- a/net/sched/act_skbedit.c +++ b/net/sched/act_skbedit.c @@ -47,6 +47,8 @@ static int tcf_skbedit(struct sk_buff *skb, const struct tc_action *a, skb_set_queue_mapping(skb, d->queue_mapping); if (d->flags & SKBEDIT_F_MARK) skb->mark = d->mark; + if (d->flags & SKBEDIT_F_PTYPE) + skb->pkt_type = d->ptype; spin_unlock(>tcf_lock); return d->tcf_action; @@ -57,6 +59,7 @@ static const struct nla_policy skbedit_policy[TCA_SKBEDIT_MAX + 1] = { [TCA_SKBEDIT_PRIORITY] = { .len = sizeof(u32) }, [TCA_SKBEDIT_QUEUE_MAPPING] = { .len = sizeof(u16) }, [TCA_SKBEDIT_MARK] = { .len = sizeof(u32) }, + [TCA_SKBEDIT_PTYPE] = { .len = sizeof(u16) }, }; static int tcf_skbedit_init(struct net *net, struct nlattr *nla, @@ -68,7 +71,7 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla, struct tc_skbedit *parm; struct tcf_skbedit *d; u32 flags = 0, *priority = NULL, *mark = NULL; - u16 *queue_mapping = NULL; + u16 *queue_mapping = NULL, *ptype = NULL; int ret = 0, err, aexists = 0; if (nla == NULL) @@ -91,6 +94,13 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla, queue_mapping = nla_data(tb[TCA_SKBEDIT_QUEUE_MAPPING]); } + if (tb[TCA_SKBEDIT_PTYPE] != NULL) { + flags |= SKBEDIT_F_PTYPE; + ptype = nla_data(tb[TCA_SKBEDIT_PTYPE]); + if (*ptype > PACKET_KERNEL) + return -EINVAL; + } + if (tb[TCA_SKBEDIT_MARK] != NULL) { flags |= SKBEDIT_F_MARK; mark = nla_data(tb[TCA_SKBEDIT_MARK]); @@ -135,6 +145,8 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla, d->queue_mapping = *queue_mapping; if (flags & SKBEDIT_F_MARK) d->mark = *mark; + if (flags & SKBEDIT_F_PTYPE) + d->ptype = *ptype; d->tcf_action = parm->action; @@ -172,6 +184,10 @@ static int tcf_skbedit_dump(struct sk_buff *skb, struct tc_action *a, nla_put(skb, TCA_SKBEDIT_MARK, sizeof(d->mark), >mark)) goto nla_put_failure; + if ((d->flags & SKBEDIT_F_PTYPE) && + nla_put(skb, TCA_SKBEDIT_PTYPE, sizeof(d->ptype), + >ptype)) + goto nla_put_failure; tcf_tm_dump(, >tcf_tm); if (nla_put_64bit(skb, TCA_SKBEDIT_TM, sizeof(t), , TCA_SKBEDIT_PAD)) -- 1.9.1
Re: [net-next PATCH 1/1] net sched actions: skbedit add support for mod-ing skb pkt_type
On 16-06-12 05:12 PM, Jamal Hadi Salim wrote: From: Jamal Hadi SalimExtremely useful for setting packet type to host so i dont have to modify the dst mac address using pedit (which requires that i know the mac address) Signed-off-by: Jamal Hadi Salim Please ignore this version; dumping was buggy. Sending a new version in a minute. cheers. jamal
[net-next PATCH 1/1] net sched actions: skbedit add support for mod-ing skb pkt_type
From: Jamal Hadi SalimExtremely useful for setting packet type to host so i dont have to modify the dst mac address using pedit (which requires that i know the mac address) Signed-off-by: Jamal Hadi Salim --- include/net/tc_act/tc_skbedit.h| 10 +- include/uapi/linux/tc_act/tc_skbedit.h | 2 ++ net/sched/act_skbedit.c| 17 - 3 files changed, 23 insertions(+), 6 deletions(-) diff --git a/include/net/tc_act/tc_skbedit.h b/include/net/tc_act/tc_skbedit.h index b496d5a..d01a5d4 100644 --- a/include/net/tc_act/tc_skbedit.h +++ b/include/net/tc_act/tc_skbedit.h @@ -24,11 +24,11 @@ struct tcf_skbedit { struct tcf_common common; - u32 flags; - u32 priority; - u32 mark; - u16 queue_mapping; - /* XXX: 16-bit pad here? */ + u32 flags; + u32 priority; + u32 mark; + u16 queue_mapping; + u16 ptype; }; #define to_skbedit(a) \ container_of(a->priv, struct tcf_skbedit, common) diff --git a/include/uapi/linux/tc_act/tc_skbedit.h b/include/uapi/linux/tc_act/tc_skbedit.h index fecb5cc..a4d00c6 100644 --- a/include/uapi/linux/tc_act/tc_skbedit.h +++ b/include/uapi/linux/tc_act/tc_skbedit.h @@ -27,6 +27,7 @@ #define SKBEDIT_F_PRIORITY 0x1 #define SKBEDIT_F_QUEUE_MAPPING0x2 #define SKBEDIT_F_MARK 0x4 +#define SKBEDIT_F_PTYPE0x8 struct tc_skbedit { tc_gen; @@ -40,6 +41,7 @@ enum { TCA_SKBEDIT_QUEUE_MAPPING, TCA_SKBEDIT_MARK, TCA_SKBEDIT_PAD, + TCA_SKBEDIT_PTYPE, __TCA_SKBEDIT_MAX }; #define TCA_SKBEDIT_MAX (__TCA_SKBEDIT_MAX - 1) diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c index 8f235ff..8ab5d0c 100644 --- a/net/sched/act_skbedit.c +++ b/net/sched/act_skbedit.c @@ -47,6 +47,8 @@ static int tcf_skbedit(struct sk_buff *skb, const struct tc_action *a, skb_set_queue_mapping(skb, d->queue_mapping); if (d->flags & SKBEDIT_F_MARK) skb->mark = d->mark; + if (d->flags & SKBEDIT_F_PTYPE) + skb->pkt_type = d->ptype; spin_unlock(>tcf_lock); return d->tcf_action; @@ -57,6 +59,7 @@ static const struct nla_policy skbedit_policy[TCA_SKBEDIT_MAX + 1] = { [TCA_SKBEDIT_PRIORITY] = { .len = sizeof(u32) }, [TCA_SKBEDIT_QUEUE_MAPPING] = { .len = sizeof(u16) }, [TCA_SKBEDIT_MARK] = { .len = sizeof(u32) }, + [TCA_SKBEDIT_PTYPE] = { .len = sizeof(u16) }, }; static int tcf_skbedit_init(struct net *net, struct nlattr *nla, @@ -68,7 +71,7 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla, struct tc_skbedit *parm; struct tcf_skbedit *d; u32 flags = 0, *priority = NULL, *mark = NULL; - u16 *queue_mapping = NULL; + u16 *queue_mapping = NULL, *ptype = NULL; int ret = 0, err, aexists = 0; if (nla == NULL) @@ -91,6 +94,13 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla, queue_mapping = nla_data(tb[TCA_SKBEDIT_QUEUE_MAPPING]); } + if (tb[TCA_SKBEDIT_PTYPE] != NULL) { + flags |= SKBEDIT_F_PTYPE; + ptype = nla_data(tb[TCA_SKBEDIT_PTYPE]); + if (*ptype > PACKET_KERNEL) + return -EINVAL; + } + if (tb[TCA_SKBEDIT_MARK] != NULL) { flags |= SKBEDIT_F_MARK; mark = nla_data(tb[TCA_SKBEDIT_MARK]); @@ -135,6 +145,8 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla, d->queue_mapping = *queue_mapping; if (flags & SKBEDIT_F_MARK) d->mark = *mark; + if (flags & SKBEDIT_F_PTYPE) + d->ptype = *ptype; d->tcf_action = parm->action; @@ -171,6 +183,9 @@ static int tcf_skbedit_dump(struct sk_buff *skb, struct tc_action *a, if ((d->flags & SKBEDIT_F_MARK) && nla_put(skb, TCA_SKBEDIT_MARK, sizeof(d->mark), >mark)) + if ((d->flags & SKBEDIT_F_PTYPE) && + nla_put(skb, TCA_SKBEDIT_PTYPE, sizeof(d->ptype), + >ptype)) goto nla_put_failure; tcf_tm_dump(, >tcf_tm); -- 1.9.1
Re: ebpf: issue with clang
Hello, On Sun, 2016-06-12 at 20:35 +0200, Daniel Borkmann wrote: > On 06/12/2016 07:37 PM, Eric Leblond wrote: > > On Thu, 2016-06-09 at 17:34 -0700, Alexei Starovoitov wrote: > > > On Thu, Jun 09, 2016 at 11:10:05PM +0200, Eric Leblond wrote: > > > > Hello, > > > > > > > > I'm working on integrating ebpf cluster load balancing for > > > > AF_PACKET > > > > and I've got some problem to get real code inside the EBPF > > > > filter. > > > > > > > > I've tried different command lines in the build process. One of > > > > them > > > > is: > > > > clang-3.9 -Wall -O2 -emit-llvm -c hash_ports.c -o - | llc-3.9 > > > > -march=bpf -filetype=obj -o hash_ports.bpf > > > > > > > > If I use that one, then the generated code is almost void. If I > > > > remove > > > > the -O2 then I've got a generated code that fails during load. > > > > When > > > > not > > > > using -O2, I manage to load a trivial filter (return of static > > > > value). > > > > > > > > The C code is the following (a derivative of http-simple- > > > > filter.c > > > > used > > > > for testing): > > > > > > > > int filter(struct __sk_buff *skb) { > > > > uint8_t *cursor = 0; > > > > struct ethernet_t *ethernet = cursor_advance(cursor, > > > > sizeof(*ethernet)); > > > > > > this is bcc C syntax that is hiding the explicit > > > load_byte/half/word > > > operations > > > we have to do when using plain C. > > > If you want to compile C code with clang -O2 -target bpf file.c > > > -c -o > > > file.o > > > and copy .o around to be used in tc like: > > > tc filter add dev eth0 ingress bpf da obj file.o > > > then plain C should be used like in all samples/bpf/*_kern.c > > > examples. > > > Other folks like the convenience of bcc that hides clang/llvm > > > invocation. > > > It mostly applicable to tracing tools where both bcc-C and > > > corresponding python or lua bits are in the same file > > > like in iovisor/bcc/tools/* scripts. > > > The iovisor/bcc/examples/networking/* (where this http-simple- > > > filter.c came from) > > > are also suitable for networking and relying on pyroute2 to talk > > > to > > > kernel to create netns, veth and to attach bpf to qdisc. > > > > > > In summary there are several ways to write bpf C code: > > > 1. plain C syntax as in samples/bpf/*_kern.c > > > Pro: compiles with clang into .o > > > Con: .o requires elf loader (integrated into tc already for > > > networking), > > > > Yes, that's not an easy part. I've devel one loader for suricata > > but I > > will check the one in tc to see if I can take advantage of it. > > Sure, feel free to rip it out and adapt it. > > With AF_PACKET load balancing you mean a packet fanout eBPF demuxing > or > something else controlled via tc ingress? I'm using fanout eBPF demuxing to implement load balancing in Suricata. Current alpha level code is here: https://github.com/regit/suricata/commit/f299abe90bfed3590a9f3de1179091 b7afc2d90c I'm currently working on the demuxing to implement something more realistic than what current demuxing function. > > If packet fanout, then you also need to adapt the program type into > BPF_PROG_TYPE_SOCKET_FILTER. Yes, already done that (or at least it seems to work). BR, -- Eric Leblond
Re: ebpf: issue with clang
On 06/12/2016 07:37 PM, Eric Leblond wrote: On Thu, 2016-06-09 at 17:34 -0700, Alexei Starovoitov wrote: On Thu, Jun 09, 2016 at 11:10:05PM +0200, Eric Leblond wrote: Hello, I'm working on integrating ebpf cluster load balancing for AF_PACKET and I've got some problem to get real code inside the EBPF filter. I've tried different command lines in the build process. One of them is: clang-3.9 -Wall -O2 -emit-llvm -c hash_ports.c -o - | llc-3.9 -march=bpf -filetype=obj -o hash_ports.bpf If I use that one, then the generated code is almost void. If I remove the -O2 then I've got a generated code that fails during load. When not using -O2, I manage to load a trivial filter (return of static value). The C code is the following (a derivative of http-simple-filter.c used for testing): int filter(struct __sk_buff *skb) { uint8_t *cursor = 0; struct ethernet_t *ethernet = cursor_advance(cursor, sizeof(*ethernet)); this is bcc C syntax that is hiding the explicit load_byte/half/word operations we have to do when using plain C. If you want to compile C code with clang -O2 -target bpf file.c -c -o file.o and copy .o around to be used in tc like: tc filter add dev eth0 ingress bpf da obj file.o then plain C should be used like in all samples/bpf/*_kern.c examples. Other folks like the convenience of bcc that hides clang/llvm invocation. It mostly applicable to tracing tools where both bcc-C and corresponding python or lua bits are in the same file like in iovisor/bcc/tools/* scripts. The iovisor/bcc/examples/networking/* (where this http-simple- filter.c came from) are also suitable for networking and relying on pyroute2 to talk to kernel to create netns, veth and to attach bpf to qdisc. In summary there are several ways to write bpf C code: 1. plain C syntax as in samples/bpf/*_kern.c Pro: compiles with clang into .o Con: .o requires elf loader (integrated into tc already for networking), Yes, that's not an easy part. I've devel one loader for suricata but I will check the one in tc to see if I can take advantage of it. Sure, feel free to rip it out and adapt it. With AF_PACKET load balancing you mean a packet fanout eBPF demuxing or something else controlled via tc ingress? If packet fanout, then you also need to adapt the program type into BPF_PROG_TYPE_SOCKET_FILTER. Thanks! but not friendly for tracing that needs recompile for every kernel due to unstable kprobes 2. bcc C syntax that compiles C on the fly in memory and loads directly Pro: there is no .o, no extra files, no need to install clang/llvm Con: bcc is not widely available yet. ubuntu and others already have it in apt. python and lua may not be for everyone. c++ api is not stable yet. I need to include it into suricata which is C code. I've played with bcc and it is a great tool but installation on the different platform may be complicated for our users. 3. perf+bpf, it is similar to samples/pbf/ C style with few extensions. If .c file is passed, the perf calls external clang and loads .o eventually Pro: out-of-the-box perf and clang work well Con: not available for networking Out of scope for me then. Sounds like you want to use it with af_packet then tools/testing/selftests/net/reuseport_bpf.c could be a good start too, but there bpf is written in asm. Yes, bad point, asm is not really what I want. I want "normal advanced" users to be able to edit the load balancing function. If you pick bcc style then iovisor-...@lists.iovisor.org mailing list is a good place to ask questions. Be sure to subscribe first, since it rejects non-subscriber emails to reduce spam. Thanks a lot for all these explanations. You saved me days! BR,
Re: [PATCH 1/5] ethernet: add sun8i-emac driver
Le 09/06/2016 02:44, LABBE Corentin a écrit : > Hello > > I agree to all your comments, but for some I have additionnal questions > > On Mon, Jun 06, 2016 at 11:25:15AM -0700, Florian Fainelli wrote: >> On 06/03/2016 02:56 AM, LABBE Corentin wrote: >> >> [snip] >> >>> + >>> +/* The datasheet said that each descriptor can transfers up to 4096bytes >>> + * But latter, a register documentation reduce that value to 2048 >>> + * Anyway using 2048 cause strange behaviours and even BSP driver use 2047 >>> + */ >>> +#define DESC_BUF_MAX 2044 >>> +#if (DESC_BUF_MAX < (ETH_FRAME_LEN + 4)) >>> +#error "DESC_BUF_MAX must be set at minimum to ETH_FRAME_LEN + 4" >>> +#endif >> >> You can probably drop that, it would not make much sense to enable >> fragments and a buffer size smaller than ETH_FRAME_LEN + ETH_FCS_LEN anyway. >> > > I has added this test for preventing someone who want to "optimize" > DESC_BUF_MAX to doing mistake. > But I agree that it is of low use. It's actually dangerous, and if you don't make sure that the value is properly rounded to whatever the DMA controller's alignment should be, performance could be terribel too. > >>> +/* Return the number of contiguous free descriptors >>> + * starting from tx_slot >>> + */ >>> +static int rb_tx_numfreedesc(struct net_device *ndev) >>> +{ >>> + struct sun8i_emac_priv *priv = netdev_priv(ndev); >>> + >>> + if (priv->tx_slot < priv->tx_dirty) >>> + return priv->tx_dirty - priv->tx_slot; >> >> Does this work with if tx_dirty wraps around? >> > > The tx_dirty cannot wrap since I always keep an empty slot. (tx_slot cannot > go equal or after tx_dirty) OK, fair enough. > >>> +/* Grab a frame into a skb from descriptor number i */ >>> +static int sun8i_emac_rx_from_ddesc(struct net_device *ndev, int i) >>> +{ >>> + struct sk_buff *skb; >>> + struct sun8i_emac_priv *priv = netdev_priv(ndev); >>> + struct dma_desc *ddesc = priv->dd_rx + i; >>> + int frame_len; >>> + int crc_checked = 0; >>> + >>> + if (ndev->features & NETIF_F_RXCSUM) >>> + crc_checked = 1; >> >> Assuming CRC here refers to the Ethernet frame's FCS, then this is >> absolutely not how NETIF_F_RXCSUM works. NETIF_F_RXCSUM is about your >> Ethernet adapter supporting L3/L4 checksum offloads, while the Ethernet >> FCS is pretty much mandatory for the frame to be properly received in >> the first place. Can you clarify which way it is? >> > > No CRC here is RXCSUM. I understand the misnaming. > I will rename the variable to rxcsum_done. Thanks > >>> + >>> + priv->ndev->stats.rx_packets++; >>> + priv->ndev->stats.rx_bytes += frame_len; >>> + priv->rx_sk[i] = NULL; >>> + >>> + /* this frame is not the last */ >>> + if ((ddesc->status & BIT(8)) == 0) { >>> + dev_warn(priv->dev, "Multi frame not implemented currlen=%d\n", >>> +frame_len); >>> + } >>> + >>> + sun8i_emac_rx_sk(ndev, i); >>> + >>> + netif_rx(skb); >> >> netif_receive_skb() at the very least, or if you implement NAPI, like >> you shoud napi_gro_receive(). >> > > netif_receive_skb documentation say > "This function may only be called from softirq context and interrupts should > be enabled." > but the calling functions is in hardirq context. Well, yes, because you are not implementing NAPI, while you should, you execute in hard interrupt context. Once you move to NAPI, you can and should use netif_receive_skb(). > >>> + return 0; >>> +} >>> + >>> +/* Cycle over RX DMA descriptors for finding frame to receive >>> + */ >>> +static int sun8i_emac_receive_all(struct net_device *ndev) >>> +{ >>> + struct sun8i_emac_priv *priv = netdev_priv(ndev); >>> + struct dma_desc *ddesc; >>> + >>> + ddesc = priv->dd_rx + priv->rx_dirty; >>> + while (!(ddesc->status & BIT(31))) { >>> + sun8i_emac_rx_from_ddesc(ndev, priv->rx_dirty); >>> + rb_inc(>rx_dirty, nbdesc_rx); >>> + ddesc = priv->dd_rx + priv->rx_dirty; >>> + }; >> >> So, what if we ping flood your device here, is not there a remote chance >> that we keep the RX interrupt so busy we can't break out of this loop, >> and we are executing from hard IRQ context, that's bad. >> > > I have added a start variable for preventing to do more than a full loop. Which gets you close to a proper NAPI implementation, good. > >>> + >>> + return 0; >>> +} >>> + >>> +/* iterate over dma_desc for finding completed xmit. >>> + * Called from interrupt context, so no need to spinlock tx >> >> Humm, well it depends if what you are doing here may race with >> ndo_start_xmit(), and usually it does. >> > > I believe that how it is designed it cannot race each over (access the same > descriptor slot) since I keep a free slot between each other. > >> Also, you should consider completing TX packets in NAPI context (soft >> IRQ) instead of hard IRQs like here. >> > > I wanted to finish this driver the "old" way (with hard IRQ) and implementing > NAPI after as a Kconfig choice. > Does
Re: ebpf: issue with clang
Hello, On Thu, 2016-06-09 at 17:34 -0700, Alexei Starovoitov wrote: > On Thu, Jun 09, 2016 at 11:10:05PM +0200, Eric Leblond wrote: > > Hello, > > > > I'm working on integrating ebpf cluster load balancing for > > AF_PACKET > > and I've got some problem to get real code inside the EBPF filter. > > > > I've tried different command lines in the build process. One of > > them > > is: > > clang-3.9 -Wall -O2 -emit-llvm -c hash_ports.c -o - | llc-3.9 > > -march=bpf -filetype=obj -o hash_ports.bpf > > > > If I use that one, then the generated code is almost void. If I > > remove > > the -O2 then I've got a generated code that fails during load. When > > not > > using -O2, I manage to load a trivial filter (return of static > > value). > > > > The C code is the following (a derivative of http-simple-filter.c > > used > > for testing): > > > > int filter(struct __sk_buff *skb) { > > uint8_t *cursor = 0; > > struct ethernet_t *ethernet = cursor_advance(cursor, > > sizeof(*ethernet)); > > this is bcc C syntax that is hiding the explicit load_byte/half/word > operations > we have to do when using plain C. > If you want to compile C code with clang -O2 -target bpf file.c -c -o > file.o > and copy .o around to be used in tc like: > tc filter add dev eth0 ingress bpf da obj file.o > then plain C should be used like in all samples/bpf/*_kern.c > examples. > Other folks like the convenience of bcc that hides clang/llvm > invocation. > It mostly applicable to tracing tools where both bcc-C and > corresponding python or lua bits are in the same file > like in iovisor/bcc/tools/* scripts. > The iovisor/bcc/examples/networking/* (where this http-simple- > filter.c came from) > are also suitable for networking and relying on pyroute2 to talk to > kernel to create netns, veth and to attach bpf to qdisc. > > In summary there are several ways to write bpf C code: > 1. plain C syntax as in samples/bpf/*_kern.c > Pro: compiles with clang into .o > Con: .o requires elf loader (integrated into tc already for > networking), Yes, that's not an easy part. I've devel one loader for suricata but I will check the one in tc to see if I can take advantage of it. > but not friendly for tracing that needs recompile for every kernel > due to unstable kprobes > 2. bcc C syntax that compiles C on the fly in memory and loads > directly > Pro: there is no .o, no extra files, no need to install clang/llvm > Con: bcc is not widely available yet. ubuntu and others already have > it in apt. > python and lua may not be for everyone. c++ api is not stable yet. I need to include it into suricata which is C code. I've played with bcc and it is a great tool but installation on the different platform may be complicated for our users. > 3. perf+bpf, it is similar to samples/pbf/ C style with few > extensions. > If .c file is passed, the perf calls external clang and loads .o > eventually > Pro: out-of-the-box perf and clang work well > Con: not available for networking Out of scope for me then. > Sounds like you want to use it with af_packet then > tools/testing/selftests/net/reuseport_bpf.c > could be a good start too, but there bpf is written in asm. Yes, bad point, asm is not really what I want. I want "normal advanced" users to be able to edit the load balancing function. > If you pick bcc style then iovisor-...@lists.iovisor.org mailing list > is a good place to ask questions. Be sure to subscribe first, since > it rejects non-subscriber emails to reduce spam. Thanks a lot for all these explanations. You saved me days! BR, -- Eric Leblond
Re: [very-RFC 6/8] Add TSN event-tracing
On Sun, 12 Jun 2016 01:01:34 +0200 Henrik Austadwrote: > From: Henrik Austad > > This needs refactoring and should be updated to use TRACE_CLASS, but for > now it provides a fair debug-window into TSN. > > Cc: "David S. Miller" > Cc: Steven Rostedt (maintainer:TRACING) > Cc: Ingo Molnar (maintainer:TRACING) > Signed-off-by: Henrik Austad > --- > include/trace/events/tsn.h | 349 > + > 1 file changed, 349 insertions(+) > create mode 100644 include/trace/events/tsn.h > > diff --git a/include/trace/events/tsn.h b/include/trace/events/tsn.h > new file mode 100644 > index 000..ac1f31b > --- /dev/null > +++ b/include/trace/events/tsn.h > @@ -0,0 +1,349 @@ > +#undef TRACE_SYSTEM > +#define TRACE_SYSTEM tsn > + > +#if !defined(_TRACE_TSN_H) || defined(TRACE_HEADER_MULTI_READ) > +#define _TRACE_TSN_H > + > +#include > +#include > + > +#include > +#include > +/* #include */ > + > +/* FIXME: update to TRACE_CLASS to reduce overhead */ I'm curious to why I didn't do this now. A class would make less duplication of typing too ;-) > +TRACE_EVENT(tsn_buffer_write, > + > + TP_PROTO(struct tsn_link *link, > + size_t bytes), > + > + TP_ARGS(link, bytes), > + > + TP_STRUCT__entry( > + __field(u64, stream_id) > + __field(size_t, size) > + __field(size_t, bsize) > + __field(size_t, size_left) > + __field(void *, buffer) > + __field(void *, head) > + __field(void *, tail) > + __field(void *, end) > + ), > + > + TP_fast_assign( > + __entry->stream_id = link->stream_id; > + __entry->size = bytes; > + __entry->bsize = link->used_buffer_size; > + __entry->size_left = (link->head - link->tail) % > link->used_buffer_size; Move this logic into the print statement, since you save head and tail. > + __entry->buffer = link->buffer; > + __entry->head = link->head; > + __entry->tail = link->tail; > + __entry->end = link->end; > + ), > + > + TP_printk("stream_id=%llu, copy=%zd, buffer: %zd, avail=%zd, > [buffer=%p, head=%p, tail=%p, end=%p]", > + __entry->stream_id, __entry->size, __entry->bsize, > __entry->size_left, __entry->stream_id, __entry->size, __entry->bsize, (__entry->head - __entry->tail) % __entry->bsize, > + __entry->buffer,__entry->head, __entry->tail, __entry->end) > + > + ); > + > +TRACE_EVENT(tsn_buffer_write_net, > + > + TP_PROTO(struct tsn_link *link, > + size_t bytes), > + > + TP_ARGS(link, bytes), > + > + TP_STRUCT__entry( > + __field(u64, stream_id) > + __field(size_t, size) > + __field(size_t, bsize) > + __field(size_t, size_left) > + __field(void *, buffer) > + __field(void *, head) > + __field(void *, tail) > + __field(void *, end) > + ), > + > + TP_fast_assign( > + __entry->stream_id = link->stream_id; > + __entry->size = bytes; > + __entry->bsize = link->used_buffer_size; > + __entry->size_left = (link->head - link->tail) % > link->used_buffer_size; > + __entry->buffer = link->buffer; > + __entry->head = link->head; > + __entry->tail = link->tail; > + __entry->end = link->end; > + ), > + > + TP_printk("stream_id=%llu, copy=%zd, buffer: %zd, avail=%zd, > [buffer=%p, head=%p, tail=%p, end=%p]", > + __entry->stream_id, __entry->size, __entry->bsize, > __entry->size_left, > + __entry->buffer,__entry->head, __entry->tail, __entry->end) > + > + ); > + > + > +TRACE_EVENT(tsn_buffer_read, > + > + TP_PROTO(struct tsn_link *link, > + size_t bytes), > + > + TP_ARGS(link, bytes), > + > + TP_STRUCT__entry( > + __field(u64, stream_id) > + __field(size_t, size) > + __field(size_t, bsize) > + __field(size_t, size_left) > + __field(void *, buffer) > + __field(void *, head) > + __field(void *, tail) > + __field(void *, end) > + ), > + > + TP_fast_assign( > + __entry->stream_id = link->stream_id; > + __entry->size = bytes; > + __entry->bsize = link->used_buffer_size; > + __entry->size_left = (link->head - link->tail) % > link->used_buffer_size; > + __entry->buffer = link->buffer; > + __entry->head = link->head; > + __entry->tail = link->tail; > + __entry->end = link->end; > + ), > + > + TP_printk("stream_id=%llu, copy=%zd, buffer: %zd,
Re: [PATCH] net: alx: Work around the DMA RX overflow issue
On Sun, 2016-06-12 at 17:36 +0800, Feng Tang wrote: > Commit 26c5f03 uses a new skb allocator to avoid the RFD overflow > issue. > > But from debugging without datasheet, we found the error always > happen when the DMA RX address is set to 0xfc0, which is very > likely to be a HW/silicon problem. > > So one idea is instead of adding a new allocator, why not just > hitting the right target by avaiding the error-prone DMA address? > > This patch will actually > * Remove the commit 26c5f03 > * Apply rx skb with 64 bytes longer space, and if the allocated skb > has a 0x...fc0 address, it will use skb_resever(skb, 64) to > advance the address, so that the RX overflow can be avoided. > > In theory this method should also apply to atl1c driver, which > I can't find anyone who can help to test on real devices. > > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=70761 > Signed-off-by: Feng Tang> Suggested-by: Eric Dumazet > Tested-by: Ole Lukoie > --- > drivers/net/ethernet/atheros/alx/alx.h | 4 --- > drivers/net/ethernet/atheros/alx/main.c | 61 > - > 2 files changed, 14 insertions(+), 51 deletions(-) Acked-by: Eric Dumazet Thanks !
Re: [PATCH v3] udp reuseport: fix packet of same flow hashed to different socket
On Sun, 2016-06-12 at 20:43 +0800, Su Xuemin wrote: > From: "Su, Xuemin"... > Signed-off-by: Su, Xuemin > Signed-off-by: Eric Dumazet > --- First, I want to thank you for this very high quality submission, especially if this is your first linux kernel patch. I have one additional comment to make : > if (score > badness) { > reuseport = sk->sk_reuseport; > @@ -556,14 +510,20 @@ struct sock *__udp4_lib_lookup(struct net *net, __be32 > saddr, > daddr, hnum, dif, > hslot2, slot2, skb); > if (!result) { > + unsigned int old = hash2; > hash2 = udp4_portaddr_hash(net, htonl(INADDR_ANY), > hnum); > + > + /* avoid search the same slot again. */ > + if (unlikely(old == hash2)) > + return result; > + Technically speaking, what matters is the slot, not the hash value (32bit) So I would save in old, slot2 > slot2 = hash2 & udptable->mask; And here perform the check if (unlikely(slot2 != old_slot)) return result; > hslot2 = >hash2[slot2]; > if (hslot->count < hslot2->count) > goto begin; (Same remark applies in IPv6) Thanks !
Re: [PATCH ipvs-next] ipvs: count pre-established TCP states as active
Hello, On Fri, 3 Jun 2016, Michal Kubecek wrote: > Some users observed that "least connection" distribution algorithm doesn't > handle well bursts of TCP connections from reconnecting clients after > a node or network failure. > > This is because the algorithm counts active connection as worth 256 > inactive ones where for TCP, "active" only means TCP connections in > ESTABLISHED state. In case of a connection burst, new connections are > handled before previous ones have finished the three way handshaking so > that all are still counted as "inactive", i.e. cheap ones. The become > "active" quickly but at that time, all of them are already assigned to one > real server (or few), resulting in highly unbalanced distribution. > > Address this by counting the "pre-established" states as "active". > > Signed-off-by: Michal KubecekAcked-by: Julian Anastasov Simon, please apply! > --- > net/netfilter/ipvs/ip_vs_proto_tcp.c | 25 +++-- > 1 file changed, 23 insertions(+), 2 deletions(-) > > diff --git a/net/netfilter/ipvs/ip_vs_proto_tcp.c > b/net/netfilter/ipvs/ip_vs_proto_tcp.c > index d7024b2ed769..5117bcb7d2f0 100644 > --- a/net/netfilter/ipvs/ip_vs_proto_tcp.c > +++ b/net/netfilter/ipvs/ip_vs_proto_tcp.c > @@ -395,6 +395,20 @@ static const char *const > tcp_state_name_table[IP_VS_TCP_S_LAST+1] = { > [IP_VS_TCP_S_LAST] = "BUG!", > }; > > +static const bool tcp_state_active_table[IP_VS_TCP_S_LAST] = { > + [IP_VS_TCP_S_NONE] = false, > + [IP_VS_TCP_S_ESTABLISHED] = true, > + [IP_VS_TCP_S_SYN_SENT] = true, > + [IP_VS_TCP_S_SYN_RECV] = true, > + [IP_VS_TCP_S_FIN_WAIT] = false, > + [IP_VS_TCP_S_TIME_WAIT] = false, > + [IP_VS_TCP_S_CLOSE] = false, > + [IP_VS_TCP_S_CLOSE_WAIT]= false, > + [IP_VS_TCP_S_LAST_ACK] = false, > + [IP_VS_TCP_S_LISTEN]= false, > + [IP_VS_TCP_S_SYNACK]= true, > +}; > + > #define sNO IP_VS_TCP_S_NONE > #define sES IP_VS_TCP_S_ESTABLISHED > #define sSS IP_VS_TCP_S_SYN_SENT > @@ -418,6 +432,13 @@ static const char * tcp_state_name(int state) > return tcp_state_name_table[state] ? tcp_state_name_table[state] : "?"; > } > > +static bool tcp_state_active(int state) > +{ > + if (state >= IP_VS_TCP_S_LAST) > + return false; > + return tcp_state_active_table[state]; > +} > + > static struct tcp_states_t tcp_states [] = { > /* INPUT */ > /*sNO, sES, sSS, sSR, sFW, sTW, sCL, sCW, sLA, sLI, sSA */ > @@ -540,12 +561,12 @@ set_tcp_state(struct ip_vs_proto_data *pd, struct > ip_vs_conn *cp, > > if (dest) { > if (!(cp->flags & IP_VS_CONN_F_INACTIVE) && > - (new_state != IP_VS_TCP_S_ESTABLISHED)) { > + !tcp_state_active(new_state)) { > atomic_dec(>activeconns); > atomic_inc(>inactconns); > cp->flags |= IP_VS_CONN_F_INACTIVE; > } else if ((cp->flags & IP_VS_CONN_F_INACTIVE) && > -(new_state == IP_VS_TCP_S_ESTABLISHED)) { > +tcp_state_active(new_state)) { > atomic_inc(>activeconns); > atomic_dec(>inactconns); > cp->flags &= ~IP_VS_CONN_F_INACTIVE; > -- > 2.8.3 Regards -- Julian Anastasov
[PATCH v3] udp reuseport: fix packet of same flow hashed to different socket
From: "Su, Xuemin"There is a corner case in which udp packets belonging to a same flow are hashed to different socket when hslot->count changes from 10 to 11: 1) When hslot->count <= 10, __udp_lib_lookup() searches udp_table->hash, and always passes 'daddr' to udp_ehashfn(). 2) When hslot->count > 10, __udp_lib_lookup() searches udp_table->hash2, but may pass 'INADDR_ANY' to udp_ehashfn() if the sockets are bound to INADDR_ANY instead of some specific addr. That means when hslot->count changes from 10 to 11, the hash calculated by udp_ehashfn() is also changed, and the udp packets belonging to a same flow will be hashed to different socket. This is easily reproduced: 1) Create 10 udp sockets and bind all of them to 0.0.0.0:4. 2) From the same host send udp packets to 127.0.0.1:4, record the socket index which receives the packets. 3) Create 1 more udp socket and bind it to 0.0.0.0:44096. The number 44096 is 4 + UDP_HASH_SIZE(4096), this makes the new socket put into the same hslot as the aformentioned 10 sockets, and makes the hslot->count change from 10 to 11. 4) From the same host send udp packets to 127.0.0.1:4, and the socket index which receives the packets will be different from the one received in step 2. This should not happen as the socket bound to 0.0.0.0:44096 should not change the behavior of the sockets bound to 0.0.0.0:4. It's the same case for IPv6, and this patch also fixes that. Signed-off-by: Su, Xuemin Signed-off-by: Eric Dumazet --- I use this tree to generate the patch: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git net/ipv4/udp.c | 68 -- net/ipv6/udp.c | 66 2 files changed, 28 insertions(+), 106 deletions(-) diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 0ff31d9..586f1b0 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -391,9 +391,9 @@ int udp_v4_get_port(struct sock *sk, unsigned short snum) return udp_lib_get_port(sk, snum, ipv4_rcv_saddr_equal, hash2_nulladdr); } -static inline int compute_score(struct sock *sk, struct net *net, - __be32 saddr, unsigned short hnum, __be16 sport, - __be32 daddr, __be16 dport, int dif) +static int compute_score(struct sock *sk, struct net *net, +__be32 saddr, __be16 sport, +__be32 daddr, unsigned short hnum, int dif) { int score; struct inet_sock *inet; @@ -434,52 +434,6 @@ static inline int compute_score(struct sock *sk, struct net *net, return score; } -/* - * In this second variant, we check (daddr, dport) matches (inet_rcv_sadd, inet_num) - */ -static inline int compute_score2(struct sock *sk, struct net *net, -__be32 saddr, __be16 sport, -__be32 daddr, unsigned int hnum, int dif) -{ - int score; - struct inet_sock *inet; - - if (!net_eq(sock_net(sk), net) || - ipv6_only_sock(sk)) - return -1; - - inet = inet_sk(sk); - - if (inet->inet_rcv_saddr != daddr || - inet->inet_num != hnum) - return -1; - - score = (sk->sk_family == PF_INET) ? 2 : 1; - - if (inet->inet_daddr) { - if (inet->inet_daddr != saddr) - return -1; - score += 4; - } - - if (inet->inet_dport) { - if (inet->inet_dport != sport) - return -1; - score += 4; - } - - if (sk->sk_bound_dev_if) { - if (sk->sk_bound_dev_if != dif) - return -1; - score += 4; - } - - if (sk->sk_incoming_cpu == raw_smp_processor_id()) - score++; - - return score; -} - static u32 udp_ehashfn(const struct net *net, const __be32 laddr, const __u16 lport, const __be32 faddr, const __be16 fport) @@ -492,7 +446,7 @@ static u32 udp_ehashfn(const struct net *net, const __be32 laddr, udp_ehash_secret + net_hash_mix(net)); } -/* called with read_rcu_lock() */ +/* called with rcu_read_lock() */ static struct sock *udp4_lib_lookup2(struct net *net, __be32 saddr, __be16 sport, __be32 daddr, unsigned int hnum, int dif, @@ -506,7 +460,7 @@ static struct sock *udp4_lib_lookup2(struct net *net, result = NULL; badness = 0; udp_portaddr_for_each_entry_rcu(sk, >head) { - score = compute_score2(sk, net, saddr, sport, + score = compute_score(sk, net, saddr, sport, daddr, hnum, dif); if (score > badness) { reuseport = sk->sk_reuseport; @@ -556,14
Re: [PATCH net-next] ethtool: Macro definition for SFF-8436/8636 Memory map max sizes
On Sat, 2016-06-11 at 19:26 -0700, David Miller wrote: > From: Vidya Sagar Ravipati> Date: Sat, 11 Jun 2016 16:22:38 -0700 > > > As part of ethtool application, application is requesting the drivers > > to provide the supported eeprom size to allocate memory buffer for > > getting complete dump. > > And the right way to do that is the driver requests the eeprom info > with a buffer size of zero, then the driver fills in the size field > for what the size actually is. > > Then the application can allocate the proper buffer size and rerun > the eeprom request. > > Putting endless values for each and every eeprom type a device has is > just rediculous. > > I'm not going to continue promoting this broken and unscalable scheme, > we have to fix this. I don't think there's nothing broken here. ethtool doesn't use those macros, the drivers do. Ben. -- Ben Hutchings The program is absolutely right; therefore, the computer must be wrong. signature.asc Description: This is a digitally signed message part
Re: [PATCH net-next] ethtool: Macro definition for SFF-8436/8636 Memory map max sizes
On Sat, 2016-06-11 at 15:51 -0700, David Miller wrote: [...] > Why do we need these values in the header file at all? Because we don't like putting magic numbers in driver code, and these sizes are defined by standards that are independent of a single driver. > The application can probe the size by repeated eeprom calls, increasing > the buffer size each time as needed until success. But really it should use ETHTOOL_GMODULEINFO first. Ben. -- Ben Hutchings The program is absolutely right; therefore, the computer must be wrong. signature.asc Description: This is a digitally signed message part
Re: [linux-sunxi] [PATCH 0/5] net-next: ethernet: add sun8i-emac driver
Hi, On 03-06-16 11:56, LABBE Corentin wrote: Hello This patch series add the driver for sun8i-emac which handle the Ethernet MAC present on Allwinner H3/A83T/A64 SoCs. It supports 10/100/1000 Mbit/s speed with half/full duplex. It can use an internal PHY (MII 10/100) or an external PHY via RGMII/RMII. This patch series enable the driver only for the H3 SoC since A83T and A64 doesn't have the necessary clocks present in mainline. This patch series enable the driver only for the OrangePiPC board since other board with H3 use external PHY which need optional regulators that will be supported later. The driver have been tested on the following boards: - H3 Orange PI PC, Orange PI Plus, BananaPI-M2+ - A64 Pine64 - A83T BananaPI-M3 I would like to thanks Chen-Yu Tsai for his help on developing this driver. Awesome! Thanks for your work on this. I've tested this on a orangepi-one. You can add my: Tested-by: Hans de GoedeFor the next version of this series. Attached is a dts patch for adding support for the emac one that one (to be applied on top of this patch-set). I've a Orange PI Plus and a Orange PI 2 as well, both of which use an external phy, I would love to test on those as well, do you've a tree with adds the necessary regulator bits and an example of how this would like in dts somewhere ? One remark about: [PATCH 5/5] ARM: dts: sun8i: Enable sun8i-emac on the Orange PI PC We always keep the nodes in board.dts files sorted alphebetically, so the emac node should be after the ehci node rather then at the end. Regards, Hans Regards LABBE Corentin (5): ethernet: add sun8i-emac driver MAINTAINERS: Add myself as maintainers of sun8i-emac ARM: sun8i: dt: Add DT bindings documentation for Allwinner sun8i-emac ARM: dts: sun8i-h3: add sun8i-emac ethernet driver ARM: dts: sun8i: Enable sun8i-emac on the Orange PI PC .../bindings/net/allwinner,sun8i-emac.txt | 64 + MAINTAINERS|6 + arch/arm/boot/dts/sun8i-h3-orangepi-pc.dts | 11 + arch/arm/boot/dts/sun8i-h3.dtsi| 14 + drivers/net/ethernet/allwinner/Kconfig | 13 + drivers/net/ethernet/allwinner/Makefile|1 + drivers/net/ethernet/allwinner/sun8i-emac.c| 1943 7 files changed, 2052 insertions(+) create mode 100644 Documentation/devicetree/bindings/net/allwinner,sun8i-emac.txt create mode 100644 drivers/net/ethernet/allwinner/sun8i-emac.c >From 25ea802c4c2962dc50c34bd02faa9a5f3acd2e34 Mon Sep 17 00:00:00 2001 From: Hans de Goede Date: Sun, 12 Jun 2016 13:20:05 +0200 Subject: [PATCH v3] ARM: dts: sun8i: Enable sun8i-emac on the Orange PI One The sun8i-emac hardware is present on the Orange PI One. It uses the internal PHY. This patch create the needed emac and phy nodes. Signed-off-by: Hans de Goede --- arch/arm/boot/dts/sun8i-h3-orangepi-one.dts | 11 +++ 1 file changed, 11 insertions(+) diff --git a/arch/arm/boot/dts/sun8i-h3-orangepi-one.dts b/arch/arm/boot/dts/sun8i-h3-orangepi-one.dts index 0adf932..8df5c74 100644 --- a/arch/arm/boot/dts/sun8i-h3-orangepi-one.dts +++ b/arch/arm/boot/dts/sun8i-h3-orangepi-one.dts @@ -94,6 +94,17 @@ status = "okay"; }; + { + phy = <>; + phy-mode = "mii"; + allwinner,use-internal-phy; + allwinner,leds-active-low; + status = "okay"; + phy1: ethernet-phy@1 { + reg = <1>; + }; +}; + { pinctrl-names = "default"; pinctrl-0 = <_pins_a>, <_cd_pin>; -- 2.7.4
Re: [PATCH] netlink.7: describe netlink socket options
Hi Andrey, On 06/10/2016 10:28 PM, Andrey Vagin wrote: > Cc: Kir Kolyshkin> Cc: Michael Kerrisk > Cc: Herbert Xu > Cc: Patrick McHardy > Cc: Christophe Ricard > Cc: Nicolas Dichtel > Signed-off-by: Andrey Vagin > --- > man7/netlink.7 | 75 > ++ > 1 file changed, 75 insertions(+) Thanks for the nicely done patch. Applied! Cheers, Michael > diff --git a/man7/netlink.7 b/man7/netlink.7 > index 513f854..b4848df 100644 > --- a/man7/netlink.7 > +++ b/man7/netlink.7 > @@ -368,6 +368,81 @@ and > .BR NETLINK_SELINUX > groups allow other users to receive messages. > No groups allow other users to send messages. > + > +.SS Socket options > +To set or get a netlink socket option, call > +.BR getsockopt (2) > +to read or > +.BR setsockopt (2) > +to write the option with the option level argument set to > +.BR SOL_NETLINK . > +Unless otherwise noted, > +.I optval > +is a pointer to an > +.IR int . > +.TP > +.BR NETLINK_PKTINFO " (since Linux 2.6.14)" > +Enable > +.B nl_pktinfo > +control messages for received packets to get the extended > +destination group number. > +.TP > +.BR NETLINK_ADD_MEMBERSHIP ,\ NETLINK_DROP_MEMBERSHIP " (since Linux > 2.6.14)" > +Join/leave a group specified by > +.IR optval . > +.\" commit 9a4595bc7e67962f13232ee55a64e063062c3a99 > +.\" Author: Patrick McHardy > +.TP > +.BR NETLINK_LIST_MEMBERSHIPS " (since Linux 4.2)" > +Retrieve all groups a socket is a member of. > +.I optval > +is a pointer to > +.B __u32 > +and > +.I optlen > +is the size of the array. The array is filled with the full membership set > of the > +socket, and the required array size is returned in > +.I optlen. > +.\" commit b42be38b2778eda2237fc759e55e3b698b05b315 > +.\" Author: David Herrmann > +.TP > +.BR NETLINK_BROADCAST_ERROR " (since Linux 2.6.30)" > +When not set, > +.B netlink_broadcast() > +only reports > +.B ESRCH > +errors and silently ignore > +.B NOBUFS > +errors. > +.\" commit be0c22a46cfb79ab2342bb28fde99afa94ef868e > +.\" Author: Pablo Neira Ayuso > +.TP > +.BR NETLINK_NO_ENOBUFS " (since Linux 2.6.30)" > +This flag can be used by unicast and broadcast listeners to avoid receiving > +.B ENOBUFS > +errors. > +.\" commit 38938bfe3489394e2eed5e40c9bb8f66a2ce1405 > +.\" Author: Pablo Neira Ayuso > +.TP > +.BR NETLINK_LISTEN_ALL_NSID " (since Linux 4.2)" > +When set, this socket will receive netlink notifications from all network > namespaces that > +have an > +.I nsid > +assigned into the network namespace where the socket has been opened. The > +.I nsid > +is sent to user space via an ancillary data. > +.\" commit 59324cf35aba5336b611074028777838a963d03b > +.\" Author: Nicolas Dichtel > +.TP > +.BR NETLINK_CAP_ACK " (since Linux 4.2)" > +The kernel may fail to allocate the necessary room for the acknowledgment > +message back to userspace. This option trims off the payload of the original > +netlink message. > +The netlink message header is still included, so the user can guess from the > +sequence number what is the message that has triggered the acknowledgment. > +.\" commit 0a6a3a23ea6efde079a5b77688541a98bf202721 > +.\" Author: Christophe Ricard > + > .SH VERSIONS > The socket interface to netlink is a new feature of Linux 2.2. > > -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/
Re: [very-RFC 0/8] TSN driver for the kernel
On Jun 12 2016 17:31, Henrik Austad wrote: > On Sun, Jun 12, 2016 at 01:30:24PM +0900, Takashi Sakamoto wrote: >> On Jun 12 2016 12:38, Takashi Sakamoto wrote: >>> In your patcset, there's no actual codes about how to handle any >>> interrupt contexts (software / hardware), how to handle packet payload, >>> and so on. Especially, for recent sound subsystem, the timing of >>> generating interrupts and which context does what works are important to >>> reduce playback/capture latency and power consumption. >>> >>> Of source, your intention of this patchset is to show your early concept >>> of TSN feature. Nevertheless, both of explaination and codes are >>> important to the other developers who have little knowledges about TSN, >>> AVB and AES-64 such as me. >> >> Oops. Your 5th patch was skipped by alsa-project.org. I guess that size >> of the patch is too large to the list service. I can see it: >> http://marc.info/?l=linux-netdev=146568672728661=2 >> >> As long as seeing the patch, packets are queueing in hrtimer callbacks >> every 1 seconds. > > Actually, the hrtimer fires every 1ms, and that part is something I have to > do something about, also because it sends of the same number of frames > every time, regardless of how accurate the internal timer is to the rest of > the network (there's no backpressure from the networking layer). > >> (This is a high level discussion and it's OK to ignore it for the >> moment. When writing packet-oriented drivers for sound subsystem, you >> need to pay special attention to accuracy of the number of PCM frames >> transferred currently, and granularity of the number of PCM frames >> transferred by one operation. In this case, snd_avb_hw, >> snd_avb_pcm_pointer(), tsn_buffer_write_net() and tsn_buffer_read_net() >> are involved in this discussion. You can see ALSA developers' struggle >> in USB audio device class drivers and (of cource) IEC 61883-1/6 drivers.) > > Ah, good point. Any particular parts of the USB-subsystem I should start > looking at? I don't think it's a beter way for you to study USB Audio Device Class driver unless you're interested in ALSA or USB subsystem. (But for your information, snd-usb-audio is in sound/usb/* of Linux kernel. IEC 61883-1/6 driver is in sound/firewire/*.) We need different strategy to achieve it on different transmission backend. > Knowing where to start looking is a tremendous help It's not well-documented, and not well-generalized for packet-oriented drivers. Most of developers who have enough knowledge about it work for DMA-oriented drivers in mobile platforms and have little interests in packet-oriented drivers. You need to find your own way. Currently I have few advices to you, because I'm also on the way for drivers to process IEC 61883-1/6 packets on IEEE 1394 bus with enough accuracy and granularity. The paper I introduced is for the way (but not mature). I wish you get more helps from the other developers. Your work is more significant to Linux system, than mine. (And I hope your future work get no ignorance and no unreasonable hostility from coarse users.) Regards Takashi Sakamoto signature.asc Description: OpenPGP digital signature
Re: [PATCH net] fib_rules: don't break ECN with TOS rules
On Sun, Jun 12, 2016, at 11:28, Julian Anastasov wrote: > > Hello, > > On Sun, 12 Jun 2016, Hannes Frederic Sowa wrote: > > > On Sun, Jun 12, 2016, at 02:09, Julian Anastasov wrote: > > > > > > Well, may be the confusion comes from commit 89aef8921bfb > > > ("ipv4: Delete routing cache.") where the 'tos &= IPTOS_RT_MASK;' > > > line is lost from ip_route_input_common. I think, we should > > > add it back, so that we can properly match input routes with rules > > > that specify tos value. Old kernels didn't stored ECN bits in > > > flowi4_tos in the input path, so we should do the same. > > > > I would love to have done that but was fearing problems with user space > > compatibility. Also IPTOS_RT_MASK is not enough for filtering, we need > > to check for the whole INET_ECN_MASK. > > I checked even 2.4.33 for reference. Nobody sets bit 1 > in key->tos but using IPTOS_TOS_MASK on configuration allows > rules with 'tos 2'. I guess, in this case such rule can not be > matched. And the TOS matching was used only under > CONFIG_IP_ROUTE_TOS, so the ip rules were supposed to match only > bits 2..4, not full TOS values. Thanks for looking it up! :) > > > > + INET_ECN_ignore(r->tos) != INET_ECN_ignore(fl4->flowi4_tos)) > > > > return 0; > > > > > > fib4_rule_configure already rejects ECN bits in r->tos, > > > so no need to filter them again in fast path. > > > > The problem is that IPTOS_TOS_MASK is just the masking of the 0x1 bit, > > not both ECN bits. Thi stems from the pre-DSCP time where it was > > forbidden to use bit 0x1 in TOS. > > I see, I didn't noticed it. If we decide to reject > rules with tos 2 we can break scripts. May be we should just > filter it: > > if (r->tos && ((r->tos & IPTOS_RT_MASK) != fl4->flowi4_tos)) > > This will allow the hidden feature 'ip rule add tos 2' to > match properly bits2..4 == 0 :) But may be we should not spend > extra cycles in fast path for such feature. As nobody complained > for problems with 3.6+, it seems match by tos is not used often. Not sure if people really noticed or it just quietly broke ecn. :/ > May be we just need to restore the missing 'tos &= IPTOS_RT_MASK;' > line? I fear we already use those lower bits for other flags, like RTO_ONLINK. I will study this today and will report back. My last proposal would be to simply enclose my block with a if (r->tos) { ... }. Not sure, yet. Bye, Hannes
[PATCH net] ipv4: fix checksum annotation in udp4_csum_init
Reported-by: Cong WangCc: Cong Wang Cc: Tom Herbert Fixes: 4068579e1e098fa ("net: Implmement RFC 6936 (zero RX csums for UDP/IPv6") Signed-off-by: Hannes Frederic Sowa --- net/ipv4/udp.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 0ff31d97d48586..ba0d8b8b76900a 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1755,8 +1755,11 @@ static inline int udp4_csum_init(struct sk_buff *skb, struct udphdr *uh, return err; } - return skb_checksum_init_zero_check(skb, proto, uh->check, - inet_compute_pseudo); + /* Note, we are only interested in != 0 or == 0, thus the +* force to int. +*/ + return (__force int)skb_checksum_init_zero_check(skb, proto, uh->check, +inet_compute_pseudo); } /* -- 2.5.5
Re: [PATCH v9 11/22] IB/hns: Add IB device registration
On 2016/6/9 14:26, Leon Romanovsky wrote: On Wed, Jun 01, 2016 at 11:37:53PM +0800, Lijun Ou wrote: This patch registered IB device when loaded, and unregistered IB device when removed. Signed-off-by: Wei HuSigned-off-by: Nenglong Zhao Signed-off-by: Lijun Ou --- drivers/infiniband/hw/hns/hns_roce_main.c | 46 +++ 1 file changed, 46 insertions(+) diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c index 7fb0d34..f179a7f 100644 --- a/drivers/infiniband/hw/hns/hns_roce_main.c +++ b/drivers/infiniband/hw/hns/hns_roce_main.c @@ -62,6 +62,41 @@ #include "hns_roce_device.h" #include "hns_roce_icm.h" +void hns_roce_unregister_device(struct hns_roce_dev *hr_dev) You are not calling to this function in this patch. +{ + ib_unregister_device(_dev->ib_dev); +} + +int hns_roce_register_device(struct hns_roce_dev *hr_dev) This function should be static. +{ + int ret; + struct hns_roce_ib_iboe *iboe = NULL; + struct ib_device *ib_dev = NULL; + struct device *dev = _dev->pdev->dev; + + iboe = _dev->iboe; + + ib_dev = _dev->ib_dev; + strlcpy(ib_dev->name, "hisi_%d", IB_DEVICE_NAME_MAX); + + ib_dev->owner= THIS_MODULE; + ib_dev->node_type= RDMA_NODE_IB_CA; + ib_dev->dma_device = dev; + + ib_dev->phys_port_cnt= hr_dev->caps.num_ports; + ib_dev->local_dma_lkey = hr_dev->caps.reserved_lkey; + ib_dev->num_comp_vectors = hr_dev->caps.num_comp_vectors; + ib_dev->uverbs_abi_ver = 1; + + ret = ib_register_device(ib_dev, NULL); + if (ret) { + dev_err(dev, "ib_register_device failed!\n"); + return ret; + } + + return 0; +} + int hns_roce_get_cfg(struct hns_roce_dev *hr_dev) { int i; @@ -363,6 +398,17 @@ static int hns_roce_probe(struct platform_device *pdev) goto error_failed_engine_init; } + ret = hns_roce_register_device(hr_dev); + if (ret) { + dev_err(dev, "register_device failed!\n"); According to the current code, you will print this error together with error line in hns_roce_register_device for the same failure. "ib_register_device failed!" "register_device failed!" Hi, leon In this patch [PATCH v9 11/22], there is only one error branch in funtion named hns_roce_register_device. In the following patch [PATCH v9 13/22], we add more operation, there are more than two error branch in this function as below. @@ -212,8 +402,27 @@ int hns_roce_register_device(struct hns_roce_dev *hr_dev) goto error_failed_setup_mtu_gids; } spin_lock_init(>lock); iboe->nb.notifier_call = hns_roce_netdev_event; ret = register_netdevice_notifier(>nb); if (ret) { dev_err(dev, "register_netdevice_notifier failed!\n"); goto error_failed_setup_mtu_gids; } iboe->nb_inet.notifier_call = hns_roce_inet_event; ret = register_inetaddr_notifier(>nb_inet); if (ret) { dev_err(dev, "register inet addr notifier failed!\n"); goto error_failed_register_inetaddr_notifier; } return 0; error_failed_register_inetaddr_notifier: unregister_netdevice_notifier(>nb); error_failed_setup_mtu_gids: ib_unregister_device(ib_dev); Regards Wei Hu + goto error_failed_register_device; + } + + return 0; + +error_failed_register_device: + hns_roce_engine_exit(hr_dev); + error_failed_engine_init: hns_roce_cleanup_bitmap(hr_dev); -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] net: alx: Work around the DMA RX overflow issue
Commit 26c5f03 uses a new skb allocator to avoid the RFD overflow issue. But from debugging without datasheet, we found the error always happen when the DMA RX address is set to 0xfc0, which is very likely to be a HW/silicon problem. So one idea is instead of adding a new allocator, why not just hitting the right target by avaiding the error-prone DMA address? This patch will actually * Remove the commit 26c5f03 * Apply rx skb with 64 bytes longer space, and if the allocated skb has a 0x...fc0 address, it will use skb_resever(skb, 64) to advance the address, so that the RX overflow can be avoided. In theory this method should also apply to atl1c driver, which I can't find anyone who can help to test on real devices. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=70761 Signed-off-by: Feng TangSuggested-by: Eric Dumazet Tested-by: Ole Lukoie --- drivers/net/ethernet/atheros/alx/alx.h | 4 --- drivers/net/ethernet/atheros/alx/main.c | 61 - 2 files changed, 14 insertions(+), 51 deletions(-) diff --git a/drivers/net/ethernet/atheros/alx/alx.h b/drivers/net/ethernet/atheros/alx/alx.h index d02c424..8fc93c5 100644 --- a/drivers/net/ethernet/atheros/alx/alx.h +++ b/drivers/net/ethernet/atheros/alx/alx.h @@ -96,10 +96,6 @@ struct alx_priv { unsigned int rx_ringsz; unsigned int rxbuf_size; - struct page *rx_page; - unsigned int rx_page_offset; - unsigned int rx_frag_size; - struct napi_struct napi; struct alx_tx_queue txq; struct alx_rx_queue rxq; diff --git a/drivers/net/ethernet/atheros/alx/main.c b/drivers/net/ethernet/atheros/alx/main.c index c98acdc..e708e36 100644 --- a/drivers/net/ethernet/atheros/alx/main.c +++ b/drivers/net/ethernet/atheros/alx/main.c @@ -70,35 +70,6 @@ static void alx_free_txbuf(struct alx_priv *alx, int entry) } } -static struct sk_buff *alx_alloc_skb(struct alx_priv *alx, gfp_t gfp) -{ - struct sk_buff *skb; - struct page *page; - - if (alx->rx_frag_size > PAGE_SIZE) - return __netdev_alloc_skb(alx->dev, alx->rxbuf_size, gfp); - - page = alx->rx_page; - if (!page) { - alx->rx_page = page = alloc_page(gfp); - if (unlikely(!page)) - return NULL; - alx->rx_page_offset = 0; - } - - skb = build_skb(page_address(page) + alx->rx_page_offset, - alx->rx_frag_size); - if (likely(skb)) { - alx->rx_page_offset += alx->rx_frag_size; - if (alx->rx_page_offset >= PAGE_SIZE) - alx->rx_page = NULL; - else - get_page(page); - } - return skb; -} - - static int alx_refill_rx_ring(struct alx_priv *alx, gfp_t gfp) { struct alx_rx_queue *rxq = >rxq; @@ -115,9 +86,22 @@ static int alx_refill_rx_ring(struct alx_priv *alx, gfp_t gfp) while (!cur_buf->skb && next != rxq->read_idx) { struct alx_rfd *rfd = >rfd[cur]; - skb = alx_alloc_skb(alx, gfp); + /* +* When DMA RX address is set to something like +* 0xfc0, it will be very likely to cause DMA +* RFD overflow issue. +* +* To work around it, we apply rx skb with 64 bytes +* longer space, and offset the address whenever +* 0xfc0 is detected. +*/ + skb = __netdev_alloc_skb(alx->dev, alx->rxbuf_size + 64, gfp); if (!skb) break; + + if (((unsigned long)skb->data & 0xfff) == 0xfc0) + skb_reserve(skb, 64); + dma = dma_map_single(>hw.pdev->dev, skb->data, alx->rxbuf_size, DMA_FROM_DEVICE); @@ -153,7 +137,6 @@ static int alx_refill_rx_ring(struct alx_priv *alx, gfp_t gfp) alx_write_mem16(>hw, ALX_RFD_PIDX, cur); } - return count; } @@ -622,11 +605,6 @@ static void alx_free_rings(struct alx_priv *alx) kfree(alx->txq.bufs); kfree(alx->rxq.bufs); - if (alx->rx_page) { - put_page(alx->rx_page); - alx->rx_page = NULL; - } - dma_free_coherent(>hw.pdev->dev, alx->descmem.size, alx->descmem.virt, @@ -681,7 +659,6 @@ static int alx_request_irq(struct alx_priv *alx) alx->dev->name, alx); if (!err) goto out; - /* fall back to legacy interrupt */ pci_disable_msi(alx->hw.pdev); } @@ -725,7 +702,6 @@ static int alx_init_sw(struct alx_priv *alx) struct pci_dev *pdev = alx->hw.pdev; struct alx_hw
Re: [PATCH net] fib_rules: don't break ECN with TOS rules
Hello, On Sun, 12 Jun 2016, Hannes Frederic Sowa wrote: > On Sun, Jun 12, 2016, at 02:09, Julian Anastasov wrote: > > > > Well, may be the confusion comes from commit 89aef8921bfb > > ("ipv4: Delete routing cache.") where the 'tos &= IPTOS_RT_MASK;' > > line is lost from ip_route_input_common. I think, we should > > add it back, so that we can properly match input routes with rules > > that specify tos value. Old kernels didn't stored ECN bits in > > flowi4_tos in the input path, so we should do the same. > > I would love to have done that but was fearing problems with user space > compatibility. Also IPTOS_RT_MASK is not enough for filtering, we need > to check for the whole INET_ECN_MASK. I checked even 2.4.33 for reference. Nobody sets bit 1 in key->tos but using IPTOS_TOS_MASK on configuration allows rules with 'tos 2'. I guess, in this case such rule can not be matched. And the TOS matching was used only under CONFIG_IP_ROUTE_TOS, so the ip rules were supposed to match only bits 2..4, not full TOS values. > > > + INET_ECN_ignore(r->tos) != INET_ECN_ignore(fl4->flowi4_tos)) > > > return 0; > > > > fib4_rule_configure already rejects ECN bits in r->tos, > > so no need to filter them again in fast path. > > The problem is that IPTOS_TOS_MASK is just the masking of the 0x1 bit, > not both ECN bits. Thi stems from the pre-DSCP time where it was > forbidden to use bit 0x1 in TOS. I see, I didn't noticed it. If we decide to reject rules with tos 2 we can break scripts. May be we should just filter it: if (r->tos && ((r->tos & IPTOS_RT_MASK) != fl4->flowi4_tos)) This will allow the hidden feature 'ip rule add tos 2' to match properly bits2..4 == 0 :) But may be we should not spend extra cycles in fast path for such feature. As nobody complained for problems with 3.6+, it seems match by tos is not used often. May be we just need to restore the missing 'tos &= IPTOS_RT_MASK;' line? Regards -- Julian Anastasov
Re: [very-RFC 0/8] TSN driver for the kernel
On Jun 12 2016 17:28, Henrik Austad wrote: > On Sun, Jun 12, 2016 at 12:38:36PM +0900, Takashi Sakamoto wrote: >> I'm one of maintainers for ALSA firewire stack, which handles IEC >> 61883-1/6 and vendor-unique packets on IEEE 1394 bus for consumer >> recording equipments. >> (I'm not in MAINTAINERS because I'm a shy boy.) >> >> IEC 61883-6 describes that one packet can multiplex several types of >> data in its data channels; i.e. Multi Bit Linear Audio data (PCM >> samples), One Bit Audio Data (DSD), MIDI messages and so on. > > Hmm, that I did not know, not sure how that applies to AVB, but definately > something I have to look into. For your information, I describe more about it. You can see pre-standardized specification for IEC 61883-6 in website of 1394 Trade Association. Let's look for 'Audio and Music Data Transmission Protocol 2.3 (October 13, 2010, 1394TA)' http://1394ta.org/specifications/ In 'clause 12. AM824 SEQUENCE ADAPTATION LAYERS', you can see that one data block includes several types of data. But I can imagine that joint group for AVB loosely refers to IEC 61883-6. In this case, AVB specification might describe one data block transfers one type of data, to drop unreasonable complexities. >> If you handles packet payload in 'struct snd_pcm_ops.copy', a process >> context of an ALSA PCM applications performs the work. Thus, no chances >> to multiplex data with the other types. > > The driver is not adhering fully to any standards right now, the amount of > detail is quite high - but I'm slowly improving as I go through the > standards. Getting on top of all the standards and all the different > subsystems are definately a work in progress (it's a lot to digest!) In my taste, the driver is not necessarily compliant to any standards. It's enough just to work its task, without bad side-effects to Linux system. Based on this concept, current ALSA firewire stack just support PCM frames and MIDI messages. Here, I tell you that actual devices tend not to be compliant to any standards and lost inter-operability. (Especially, most of audio and music units on IEEE 1394 bus ignores some of items in standards. In short, they already lost inter-operability.) So here, we just consider about what actual devices do, instead of following any standards. Regards Takashi Sakamoto signature.asc Description: OpenPGP digital signature
Re: [very-RFC 0/8] TSN driver for the kernel
On Sun, Jun 12, 2016 at 01:30:24PM +0900, Takashi Sakamoto wrote: > On Jun 12 2016 12:38, Takashi Sakamoto wrote: > > In your patcset, there's no actual codes about how to handle any > > interrupt contexts (software / hardware), how to handle packet payload, > > and so on. Especially, for recent sound subsystem, the timing of > > generating interrupts and which context does what works are important to > > reduce playback/capture latency and power consumption. > > > > Of source, your intention of this patchset is to show your early concept > > of TSN feature. Nevertheless, both of explaination and codes are > > important to the other developers who have little knowledges about TSN, > > AVB and AES-64 such as me. > > Oops. Your 5th patch was skipped by alsa-project.org. I guess that size > of the patch is too large to the list service. I can see it: > http://marc.info/?l=linux-netdev=146568672728661=2 > > As long as seeing the patch, packets are queueing in hrtimer callbacks > every 1 seconds. Actually, the hrtimer fires every 1ms, and that part is something I have to do something about, also because it sends of the same number of frames every time, regardless of how accurate the internal timer is to the rest of the network (there's no backpressure from the networking layer). > (This is a high level discussion and it's OK to ignore it for the > moment. When writing packet-oriented drivers for sound subsystem, you > need to pay special attention to accuracy of the number of PCM frames > transferred currently, and granularity of the number of PCM frames > transferred by one operation. In this case, snd_avb_hw, > snd_avb_pcm_pointer(), tsn_buffer_write_net() and tsn_buffer_read_net() > are involved in this discussion. You can see ALSA developers' struggle > in USB audio device class drivers and (of cource) IEC 61883-1/6 drivers.) Ah, good point. Any particular parts of the USB-subsystem I should start looking at? Knowing where to start looking is a tremendous help Thanks for the feedback! -- Henrik Austad signature.asc Description: Digital signature
Re: [very-RFC 0/8] TSN driver for the kernel
On Sun, Jun 12, 2016 at 12:38:36PM +0900, Takashi Sakamoto wrote: > Hi, > > I'm one of maintainers for ALSA firewire stack, which handles IEC > 61883-1/6 and vendor-unique packets on IEEE 1394 bus for consumer > recording equipments. > (I'm not in MAINTAINERS because I'm a shy boy.) > > IEC 61883-6 describes that one packet can multiplex several types of > data in its data channels; i.e. Multi Bit Linear Audio data (PCM > samples), One Bit Audio Data (DSD), MIDI messages and so on. Hmm, that I did not know, not sure how that applies to AVB, but definately something I have to look into. > If you handles packet payload in 'struct snd_pcm_ops.copy', a process > context of an ALSA PCM applications performs the work. Thus, no chances > to multiplex data with the other types. Hmm, ok, I didn't know that, that is something I need to look into -and incidentally one of the reasons why I posted the series now instead of a few more months down the road - thanks! The driver is not adhering fully to any standards right now, the amount of detail is quite high - but I'm slowly improving as I go through the standards. Getting on top of all the standards and all the different subsystems are definately a work in progress (it's a lot to digest!) > To prevent this situation, current ALSA firewire stack handles packet > payload in software interrupt context of isochronous context of OHCI > 1394. As a result of this, the software stack supports PCM substreams > and MIDI substreams. > > In your patcset, there's no actual codes about how to handle any > interrupt contexts (software / hardware), how to handle packet payload, > and so on. Especially, for recent sound subsystem, the timing of > generating interrupts and which context does what works are important to > reduce playback/capture latency and power consumption. See reply in other mail :) > Of source, your intention of this patchset is to show your early concept > of TSN feature. Nevertheless, both of explaination and codes are > important to the other developers who have little knowledges about TSN, > AVB and AES-64 such as me. Yes, that is one of the things I aimed for, and also getting feedback on the overall thinking > And, I might cooperate to prepare for common IEC 61883 layer. For actual > codes of ALSA firewire stack, please see mainline kernel code. For > actual devices of IEC 61883-1/6 and IEEE 1394 bus, please refer to my > report in 2014. At least, you can get to know what to consider about > developing upper drivers near ALSA userspace applications. > https://github.com/takaswie/alsa-firewire-report Thanks, I'll dig into that, much appreciated > (But I confirm that the report includes my misunderstandings in clause > 3.4 and 6.2. need more time...) ok, good to know Thank you for your input, very much appreicated! -- Henrik Austad signature.asc Description: Digital signature
Re: [ovs-dev] [PATCH net-next] NSH(Network Service Header) implementation
On Mon, Jun 06, 2016 at 05:45:08PM -0700, pravin shelar wrote: > On Mon, Jun 6, 2016 at 2:34 AM, Yi Yangwrote: > > IETF defined NSH(Network Service Header) for Service > > Function Chaining, this is an IETF draft > > > > https://tools.ietf.org/html/draft-ietf-sfc-nsh-05 > > > > It will be a IETF standard shortly, this patch implemented > > NSH for Open vSwitch. > > > > Signed-off-by: Johnson Li > > Signed-off-by: Yi Yang > > --- > > drivers/net/vxlan.c | 7 ++ > > include/net/nsh.h| 117 +++ > > include/uapi/linux/openvswitch.h | 32 +++ > > net/openvswitch/actions.c| 68 + > > net/openvswitch/flow.c | 45 - > > net/openvswitch/flow.h | 15 +++ > > net/openvswitch/flow_netlink.c | 202 > > ++- > > net/openvswitch/vport-netdev.c | 3 +- > > net/openvswitch/vport-vxlan.c| 15 +++ > > 9 files changed, 501 insertions(+), 3 deletions(-) > > create mode 100644 include/net/nsh.h > > > > ... > ... > > diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c > > index 9a3eb7a..38e787c 100644 > > --- a/net/openvswitch/actions.c > > +++ b/net/openvswitch/actions.c > > @@ -29,6 +29,7 @@ > > #include > > #include > > #include > > +#include > > > > #include > > #include > > @@ -38,6 +39,7 @@ > > #include > > #include > > #include > > +#include > > > > #include "datapath.h" > > #include "flow.h" > > @@ -259,6 +261,64 @@ static int push_vlan(struct sk_buff *skb, struct > > sw_flow_key *key, > > ntohs(vlan->vlan_tci) & ~VLAN_TAG_PRESENT); > > } > > > ... > ... > > + > > +static int push_nsh(struct sk_buff *skb, struct sw_flow_key *key, > > + const struct ovs_action_push_nsh *nsh) > > +{ > > + if (nsh->len > 0 && nsh->len <= 256) { > > + struct nsh_hdr *nsh_hdr = NULL; > > + > > + if (skb_cow_head(skb, nsh->len) < 0) > > + return -ENOMEM; > > + > > + skb_push(skb, nsh->len); > > + nsh_hdr = (struct nsh_hdr *)(skb->data); > > + memcpy(nsh_hdr, nsh->header, nsh->len); > > + > > + if (!skb->inner_protocol) > > + skb_set_inner_protocol(skb, skb->protocol); > > + > > + skb->protocol = htons(ETH_P_NSH); /* 0x894F */ > > + key->eth.type = htons(ETH_P_NSH); > > + } else { > > + return -EINVAL; > > + } > > + > > + return 0; > > +} > > Networking stack or OVS can not handle arbitrary skb-protocol. For > example what happens if OVS has push vlan action or it sends this nsh > packet to net device which can not handle nsh packet? Even networking > stack can not parse such packet for handling offloads in software. > ... > ovs nsh patch set have been sent out, they can handle nsh header. This patch just implemented NSH push and pop actions, we do need some cdoe to do some sanity check for the case you mentioned, but I think it is out of scope of thi patch. > > diff --git a/net/openvswitch/vport-vxlan.c b/net/openvswitch/vport-vxlan.c > > index 5eb7694..3d060c4 100644 > > --- a/net/openvswitch/vport-vxlan.c > > +++ b/net/openvswitch/vport-vxlan.c > > @@ -52,6 +52,18 @@ static int vxlan_get_options(const struct vport *vport, > > struct sk_buff *skb) > > return -EMSGSIZE; > > > > nla_nest_end(skb, exts); > > + } else if (vxlan->flags & VXLAN_F_GPE) { > > + struct nlattr *exts; > > + > > + exts = nla_nest_start(skb, OVS_TUNNEL_ATTR_EXTENSION); > > + if (!exts) > > + return -EMSGSIZE; > > + > > + if (vxlan->flags & VXLAN_F_GPE && > > + nla_put_flag(skb, OVS_VXLAN_EXT_GPE)) > > + return -EMSGSIZE; > > + > > + nla_nest_end(skb, exts); > > } > > > > return 0; > > @@ -59,6 +71,7 @@ static int vxlan_get_options(const struct vport *vport, > > struct sk_buff *skb) > > > > static const struct nla_policy exts_policy[OVS_VXLAN_EXT_MAX + 1] = { > > [OVS_VXLAN_EXT_GBP] = { .type = NLA_FLAG, }, > > + [OVS_VXLAN_EXT_GPE] = { .type = NLA_FLAG, }, > > }; > > > > static int vxlan_configure_exts(struct vport *vport, struct nlattr *attr, > > @@ -76,6 +89,8 @@ static int vxlan_configure_exts(struct vport *vport, > > struct nlattr *attr, > > > > if (exts[OVS_VXLAN_EXT_GBP]) > > conf->flags |= VXLAN_F_GBP; > > + else if (exts[OVS_VXLAN_EXT_GPE]) > > + conf->flags |= VXLAN_F_GPE; > > > This is compatibility code, no need to add new features to this code. > Now we should be directly using net devices. Will use net device after those patches are merged into net-next, It seems current net device implementation in
Re: [Bridge] [PATCH net-next v2] bridge: Synchronize unicast filtering with FDB
On 16/06/12 (日) 1:17, Nikolay Aleksandrov via Bridge wrote: On 06/11/2016 07:35 AM, David Miller wrote: From: Toshiaki MakitaDate: Mon, 6 Jun 2016 21:20:13 +0900 Patrick Schaaf reported that flooding due to a missing fdb entry of the address of macvlan on the bridge device caused high CPU consumption of an openvpn process behind a tap bridge port. Adding an fdb entry of the macvlan address can suppress flooding and avoid this problem. This change makes bridge able to synchronize unicast filtering with fdb automatically so admin do not need to manually add an fdb entry. This effectively supports IFF_UNICAST_FLT in bridge, thus adding an macvlan device would not place bridge into promiscuous mode as well. v2: - Test vlan with br_vlan_should_use() in br_fdb_sync_uc() as per Nikolay Aleksandrov. Reported-by: Patrick Schaaf Signed-off-by: Toshiaki Makita I really need bridging experts to review and ACK/NACK this. Thanks. Oops, I almost missed the v2, sorry about that. So, technically it looks correct, but I only fear the scalability impact of the change. If there're a large number of vlans adding a macvlan (or any device that syncs uc addr) might become very slow and every flag change will become very slow too without an option to revert to the original behaviour so we'll have to wait for the entries to be added in order to delete them. Another common scenario is having 8021q interfaces on top of the bridge with different mac addresses for some of the configured vlans (or with macvlans on top of them for VRR), that use case would suffer as well because their macs need to be local only for those vlans, and not the 2000+ other vlans that might exist. On every sync_uc() call all the fdb entries get deleted and added again, so even after deleting some manually they can come back unexpectedly after some operation and also the message storm from all the deletes and adds could be problematic as well. E.g. 2000 br0 vlans, 25 macvlans on br0 (adding them took more than 5 minutes, 53k fdb entries): $ bridge fdb del de:8e:9f:16:c5:71 dev br0 vlan 289 $ ip l set br0 multicast on $ bridge fdb | grep 289 | grep de:8e:9f:16:c5:71 de:8e:9f:16:c5:71 dev br0 vlan 1289 master br0 permanent de:8e:9f:16:c5:71 dev br0 vlan 289 master br0 permanent In fact you can't escape the slow performance even if you delete all entries because on the next flag change or interface add, they will be added back. I still think this auto-sync should be done, otherwise macvlan imposes promiscuous mode on bridge even if you manually add such fdb entries. I believe most of your concern would disappear by making use of __dev_uc_sync() instead. Indeed it seems that there is no easy way to propagate the combination of uc addr and vlan from upper device, so local entries for unneeded vlan can still be created even if using __dev_uc_sync(). In case you worry about those unneeded entries, I can add a knob to disable this feature. Are you comfortable with this change? Toshiaki Makita