Re: [PATCH net 2/2] bpf: fix overflow in prog accounting

2016-12-16 Thread kbuild test robot
Hi Daniel,

[auto build test ERROR on net/master]

url:
https://github.com/0day-ci/linux/commits/Daniel-Borkmann/bpf-dynamically-allocate-digest-scratch-buffer/20161217-090046
config: openrisc-or1ksim_defconfig (attached as .config)
compiler: or32-linux-gcc (GCC) 4.5.1-or32-1.0rc1
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=openrisc 

All errors (new ones prefixed by >>):

   kernel/bpf/core.c: In function '__bpf_prog_charge':
>> kernel/bpf/core.c:80:50: error: 'struct user_struct' has no member named 'locked_vm'
   kernel/bpf/core.c:82:32: error: 'struct user_struct' has no member named 'locked_vm'
   kernel/bpf/core.c: In function '__bpf_prog_uncharge':
   kernel/bpf/core.c:93:31: error: 'struct user_struct' has no member named 'locked_vm'

vim +80 kernel/bpf/core.c

    74  static int __bpf_prog_charge(struct user_struct *user, u32 pages)
    75  {
    76          unsigned long memlock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
    77          unsigned long user_bufs;
    78
    79          if (user) {
  > 80                  user_bufs = atomic_long_add_return(pages, &user->locked_vm);
    81                  if (user_bufs > memlock_limit) {
    82                          atomic_long_sub(pages, &user->locked_vm);
    83                          return -EPERM;
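
A plausible cause, not stated in the report itself, is that locked_vm is declared in struct user_struct only when certain config options are enabled, and these defconfigs leave them off. The user-space sketch below shows that pattern under this assumption. The names demo_user, DEMO_HAS_LOCKED_VM and demo_charge() are made up for illustration and are not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel config option; left undefined here,
 * as it would be in a configuration that lacks the feature. */
/* #define DEMO_HAS_LOCKED_VM 1 */

struct demo_user {
	long locked_shm;
#ifdef DEMO_HAS_LOCKED_VM
	long locked_vm;		/* only exists when the option is enabled */
#endif
};

/* Charge pages when the counter exists; otherwise accounting is a no-op.
 * Touching u->locked_vm outside the #ifdef would fail to compile with
 * "has no member named 'locked_vm'" whenever the option is off. */
static long demo_charge(struct demo_user *u, long pages)
{
	assert(u != NULL);
#ifdef DEMO_HAS_LOCKED_VM
	u->locked_vm += pages;
	return u->locked_vm;
#else
	(void)pages;
	return 0;
#endif
}
```

Guarding every access with the same condition as the declaration (or moving the accessors into a file that is only built with the option) keeps such a build working.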

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH net 2/2] bpf: fix overflow in prog accounting

2016-12-16 Thread kbuild test robot
Hi Daniel,

[auto build test ERROR on net/master]

url:
https://github.com/0day-ci/linux/commits/Daniel-Borkmann/bpf-dynamically-allocate-digest-scratch-buffer/20161217-090046
config: sparc-defconfig (attached as .config)
compiler: sparc-linux-gcc (GCC) 6.2.0
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=sparc 

All errors (new ones prefixed by >>):

   kernel/bpf/core.c: In function '__bpf_prog_charge':
>> kernel/bpf/core.c:80:50: error: 'struct user_struct' has no member named 'locked_vm'; did you mean 'locked_shm'?
      user_bufs = atomic_long_add_return(pages, &user->locked_vm);
                                                     ^~
   kernel/bpf/core.c:82:32: error: 'struct user_struct' has no member named 'locked_vm'; did you mean 'locked_shm'?
      atomic_long_sub(pages, &user->locked_vm);
                                  ^~
   kernel/bpf/core.c: In function '__bpf_prog_uncharge':
   kernel/bpf/core.c:93:31: error: 'struct user_struct' has no member named 'locked_vm'; did you mean 'locked_shm'?
     atomic_long_sub(pages, &user->locked_vm);
                                 ^~

vim +80 kernel/bpf/core.c

    74  static int __bpf_prog_charge(struct user_struct *user, u32 pages)
    75  {
    76          unsigned long memlock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
    77          unsigned long user_bufs;
    78
    79          if (user) {
  > 80                  user_bufs = atomic_long_add_return(pages, &user->locked_vm);
    81                  if (user_bufs > memlock_limit) {
    82                          atomic_long_sub(pages, &user->locked_vm);
    83                          return -EPERM;

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


.config.gz
Description: application/gzip


Re: [B.A.T.M.A.N.] [PATCH net-next 1/2] net: batman-adv: Treat NET_XMIT_CN as transmit successfully

2016-12-16 Thread Feng Gao
On Fri, Dec 16, 2016 at 5:19 PM, Sven Eckelmann  wrote:
> On Montag, 21. November 2016 23:00:32 CET f...@ikuai8.com wrote:
>> From: Gao Feng 
>>
>> The tc could return NET_XMIT_CN as one congestion notification, but
>> it does not mean the packet is lost. Other modules like ipvlan,
>> macvlan, and others treat NET_XMIT_CN as success too.
>>
>> So batman-adv should add the NET_XMIT_CN check.
>>
>> Signed-off-by: Gao Feng 
>> ---
>>  net/batman-adv/distributed-arp-table.c |  2 +-
>>  net/batman-adv/fragmentation.c |  2 +-
>>  net/batman-adv/routing.c   | 10 +-
>>  net/batman-adv/soft-interface.c|  2 +-
>>  net/batman-adv/tp_meter.c  |  2 +-
>>  5 files changed, 9 insertions(+), 9 deletions(-)
>
> David marked your patches as "deferred" after "under review" and did not
> apply them directly. Also Florian Westphal didn't continue the discussion
> about the direction you should choose.
>
> The patches were therefore queued up in batman-adv
> 671630d6aad0..eab7617142d2. They will be submitted later(tm) in a pull
> request to David.

I get it. Thanks Sven.

Regards
Feng

>
> Thanks,
> Sven
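
The check described in the quoted patch text can be sketched in user space as follows. The numeric values mirror the kernel's NET_XMIT_SUCCESS, NET_XMIT_DROP and NET_XMIT_CN codes, redefined here with a DEMO_ prefix so the example builds on its own. The helpers demo_xmit_ok() and demo_count_sent() are made-up names and are not taken from the batman-adv patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local copies of the transmit return codes, for illustration only. */
enum {
	DEMO_NET_XMIT_SUCCESS = 0x00,
	DEMO_NET_XMIT_DROP    = 0x01,	/* packet was dropped */
	DEMO_NET_XMIT_CN      = 0x02,	/* congestion notification */
};

/* Congestion notification does not mean the packet was lost,
 * so it counts as a successful transmit. */
static bool demo_xmit_ok(int ret)
{
	return ret == DEMO_NET_XMIT_SUCCESS || ret == DEMO_NET_XMIT_CN;
}

/* Count how many return codes in rets[] represent a sent packet. */
static size_t demo_count_sent(const int *rets, size_t n)
{
	size_t sent = 0;

	assert(rets != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (demo_xmit_ok(rets[i]))
			sent++;
	return sent;
}
```

A caller that only compared against the success code would count the congestion case as a failure and might free or retry a packet that was in fact queued.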


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
> I already did this. Check my branch.

Do you think it should return "u32" (as you currently have it) or
"unsigned long"?  I thought the latter, since it doesn't cost any
more and makes more 

> I wonder if this could also lead to a similar aliasing
> with arch_get_random_int, since I'm pretty sure all rdrand-like
> instructions return native word size anyway.

Well, Intel's can return 16, 32 or 64 bits, and it makes a
small difference with reseed scheduling.

>> - Ted, Andy Lutomirski and I will try to figure out a construction of
>>   get_random_long() that we all like.

> And me, I hope... No need to make this exclusive.

Gaah, engage brain before fingers.  That was so obvious I didn't say
it, and the result came out sounding extremely rude.

A better (but longer) way to write it would be "I'm sorry that I, Ted,
and Andy are all arguing with you and each other about how to do this
and we can't finalize this part yet".


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Jason A. Donenfeld
On Sat, Dec 17, 2016 at 12:44 AM, George Spelvin
 wrote:
> The advice I'd give now is:
> - Implement
> unsigned long hsiphash(const void *data, size_t len, const unsigned long key[2])
>   .. as SipHash on 64-bit (maybe SipHash-1-3, still being discussed) and
>   HalfSipHash on 32-bit.

I already did this. Check my branch.

> - Document when it may or may not be used carefully.

Good idea. I'll write up some extensive documentation about all of
this, detailing use cases and our various conclusions.

> - #define get_random_int (unsigned)get_random_long

That's a good idea, since ultimately the other just casts in the
return value. I wonder if this could also lead to a similar aliasing
with arch_get_random_int, since I'm pretty sure all rdrand-like
instructions return native word size anyway.

> - Ted, Andy Lutomirski and I will try to figure out a construction of
>   get_random_long() that we all like.

And me, I hope... No need to make this exclusive.

Jason
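
The proposed define works because the cast ends up applied to the result of the call. A small user-space sketch, assuming a deterministic counter in place of a real random source; demo_get_random_long(), demo_get_random_int and demo_state are made-up names, not the kernel functions:

```c
#include <assert.h>

/* Hypothetical stand-in for get_random_long(): a fixed-step counter,
 * deliberately NOT random, so the behaviour is reproducible. */
static unsigned long demo_state;

static unsigned long demo_get_random_long(void)
{
	demo_state += 0x9e3779b9UL;
	return demo_state;
}

/* Same shape as the define discussed in the thread:
 * demo_get_random_int() expands to (unsigned)demo_get_random_long(),
 * so the cast truncates the returned word to unsigned int. */
#define demo_get_random_int (unsigned)demo_get_random_long
```

One design caveat of this form is that the name is no longer a function: taking its address or passing it as a callback will not compile, which a static inline wrapper would avoid.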


[PATCH] net/x25: use designated initializers

2016-12-16 Thread Kees Cook
Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.

Signed-off-by: Kees Cook 
---
 net/x25/sysctl_net_x25.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/x25/sysctl_net_x25.c b/net/x25/sysctl_net_x25.c
index 43239527a205..a06dfe143c67 100644
--- a/net/x25/sysctl_net_x25.c
+++ b/net/x25/sysctl_net_x25.c
@@ -70,7 +70,7 @@ static struct ctl_table x25_table[] = {
.mode = 0644,
.proc_handler = proc_dointvec,
},
-   { 0, },
+   { },
 };
 
 void __init x25_register_sysctl(void)
-- 
2.7.4


-- 
Kees Cook
Nexus Security
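
The hunk above only touches the table terminator. A minimal sketch of a sentinel-terminated table, with made-up types and names (demo_entry, demo_table, demo_table_len) rather than the real ctl_table: an empty brace pair zero-initializes every member without naming or relying on the first one. Empty braces are a GNU C extension that was later standardized in C23.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table entry; not the kernel's struct ctl_table. */
struct demo_entry {
	const char *name;
	int mode;
};

static const struct demo_entry demo_table[] = {
	{ .name = "first",  .mode = 0644 },
	{ .name = "second", .mode = 0444 },
	{ },	/* sentinel: all members zero, no positional field involved */
};

/* Walk the table until the zeroed sentinel entry is reached. */
static size_t demo_table_len(const struct demo_entry *t)
{
	size_t n = 0;

	assert(t != NULL);
	while (t[n].name != NULL)
		n++;
	return n;
}
```

By contrast, "{ 0, }" assigns 0 to whichever member happens to come first, which stops being well defined once the member order may be shuffled.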


[PATCH] isdn: use designated initializers

2016-12-16 Thread Kees Cook
Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.

Signed-off-by: Kees Cook 
---
 drivers/isdn/i4l/isdn_concap.c   |  6 +++---
 drivers/isdn/i4l/isdn_x25iface.c | 16 
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/isdn/i4l/isdn_concap.c b/drivers/isdn/i4l/isdn_concap.c
index 91d57304d4d3..336523ec077c 100644
--- a/drivers/isdn/i4l/isdn_concap.c
+++ b/drivers/isdn/i4l/isdn_concap.c
@@ -80,9 +80,9 @@ static int isdn_concap_dl_disconn_req(struct concap_proto *concap)
 }
 
 struct concap_device_ops isdn_concap_reliable_dl_dops = {
-   &isdn_concap_dl_data_req,
-   &isdn_concap_dl_connect_req,
-   &isdn_concap_dl_disconn_req
+   .data_req = &isdn_concap_dl_data_req,
+   .connect_req = &isdn_concap_dl_connect_req,
+   .disconn_req = &isdn_concap_dl_disconn_req
 };
 
 /* The following should better go into a dedicated source file such that
diff --git a/drivers/isdn/i4l/isdn_x25iface.c b/drivers/isdn/i4l/isdn_x25iface.c
index 0c5d8de41b23..ba60076e0b95 100644
--- a/drivers/isdn/i4l/isdn_x25iface.c
+++ b/drivers/isdn/i4l/isdn_x25iface.c
@@ -53,14 +53,14 @@ static int isdn_x25iface_disconn_ind(struct concap_proto *);
 
 
 static struct concap_proto_ops ix25_pops = {
-   &isdn_x25iface_proto_new,
-   &isdn_x25iface_proto_del,
-   &isdn_x25iface_proto_restart,
-   &isdn_x25iface_proto_close,
-   &isdn_x25iface_xmit,
-   &isdn_x25iface_receive,
-   &isdn_x25iface_connect_ind,
-   &isdn_x25iface_disconn_ind
+   .proto_new = &isdn_x25iface_proto_new,
+   .proto_del = &isdn_x25iface_proto_del,
+   .restart = &isdn_x25iface_proto_restart,
+   .close = &isdn_x25iface_proto_close,
+   .encap_and_xmit = &isdn_x25iface_xmit,
+   .data_ind = &isdn_x25iface_receive,
+   .connect_ind = &isdn_x25iface_connect_ind,
+   .disconn_ind = &isdn_x25iface_disconn_ind
 };
 
 /* error message helper function */
-- 
2.7.4


-- 
Kees Cook
Nexus Security
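
Why the conversion matters for layout randomization can be seen in a small sketch. The struct and functions below (demo_ops, demo_open, demo_close, demo_table) are made up for illustration: with designated initializers each function pointer is bound to a member by name, so the result does not depend on the order in which members are declared or listed.

```c
#include <assert.h>

/* Hypothetical ops structure; not one of the kernel structs in the patch. */
struct demo_ops {
	int (*open)(void);
	int (*close)(void);
};

static int demo_open(void)  { return 1; }
static int demo_close(void) { return 2; }

/* Members are deliberately listed in the "wrong" order: the names,
 * not the positions, decide which pointer goes where. A positional
 * initializer { demo_close, demo_open } would silently swap them. */
static const struct demo_ops demo_table = {
	.close = demo_close,
	.open  = demo_open,
};

/* Call through the table to show which function each slot holds. */
static int demo_call_open(const struct demo_ops *ops)
{
	assert(ops != 0 && ops->open != 0);
	return ops->open();
}
```

Since both members have the same pointer type, a positional mix-up would not even produce a compiler warning, which is what makes this class of bug easy to miss.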


[PATCH] bna: use designated initializers

2016-12-16 Thread Kees Cook
Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.

Signed-off-by: Kees Cook 
---
 drivers/net/ethernet/brocade/bna/bna_enet.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/brocade/bna/bna_enet.c b/drivers/net/ethernet/brocade/bna/bna_enet.c
index 4e5c3874a50f..bba81735ce87 100644
--- a/drivers/net/ethernet/brocade/bna/bna_enet.c
+++ b/drivers/net/ethernet/brocade/bna/bna_enet.c
@@ -1676,10 +1676,10 @@ bna_cb_ioceth_reset(void *arg)
 }
 
 static struct bfa_ioc_cbfn bna_ioceth_cbfn = {
-   bna_cb_ioceth_enable,
-   bna_cb_ioceth_disable,
-   bna_cb_ioceth_hbfail,
-   bna_cb_ioceth_reset
+   .enable_cbfn = bna_cb_ioceth_enable,
+   .disable_cbfn = bna_cb_ioceth_disable,
+   .hbfail_cbfn = bna_cb_ioceth_hbfail,
+   .reset_cbfn = bna_cb_ioceth_reset
 };
 
 static void bna_attr_init(struct bna_ioceth *ioceth)
-- 
2.7.4


-- 
Kees Cook
Nexus Security


[PATCH] WAN: use designated initializers

2016-12-16 Thread Kees Cook
Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.

Signed-off-by: Kees Cook 
---
 drivers/net/wan/lmc/lmc_media.c | 97 +
 1 file changed, 49 insertions(+), 48 deletions(-)

diff --git a/drivers/net/wan/lmc/lmc_media.c b/drivers/net/wan/lmc/lmc_media.c
index 5920c996fcdf..ff2e4a5654c7 100644
--- a/drivers/net/wan/lmc/lmc_media.c
+++ b/drivers/net/wan/lmc/lmc_media.c
@@ -95,62 +95,63 @@ static inline void write_av9110_bit (lmc_softc_t *, int);
 static void write_av9110(lmc_softc_t *, u32, u32, u32, u32, u32);
 
 lmc_media_t lmc_ds3_media = {
-  lmc_ds3_init,/* special media init stuff */
-  lmc_ds3_default, /* reset to default state */
-  lmc_ds3_set_status,  /* reset status to state provided */
-  lmc_dummy_set_1, /* set clock source */
-  lmc_dummy_set2_1,/* set line speed */
-  lmc_ds3_set_100ft,   /* set cable length */
-  lmc_ds3_set_scram,   /* set scrambler */
-  lmc_ds3_get_link_status, /* get link status */
-  lmc_dummy_set_1, /* set link status */
-  lmc_ds3_set_crc_length,  /* set CRC length */
-  lmc_dummy_set_1, /* set T1 or E1 circuit type */
-  lmc_ds3_watchdog
+  .init = lmc_ds3_init,                /* special media init stuff */
+  .defaults = lmc_ds3_default,         /* reset to default state */
+  .set_status = lmc_ds3_set_status,    /* reset status to state provided */
+  .set_clock_source = lmc_dummy_set_1, /* set clock source */
+  .set_speed = lmc_dummy_set2_1,   /* set line speed */
+  .set_cable_length = lmc_ds3_set_100ft,   /* set cable length */
+  .set_scrambler = lmc_ds3_set_scram,  /* set scrambler */
+  .get_link_status = lmc_ds3_get_link_status,  /* get link status */
+  .set_link_status = lmc_dummy_set_1,  /* set link status */
+  .set_crc_length = lmc_ds3_set_crc_length,/* set CRC length */
+  .set_circuit_type = lmc_dummy_set_1, /* set T1 or E1 circuit type */
+  .watchdog = lmc_ds3_watchdog
 };
 
 lmc_media_t lmc_hssi_media = {
-  lmc_hssi_init,   /* special media init stuff */
-  lmc_hssi_default,/* reset to default state */
-  lmc_hssi_set_status, /* reset status to state provided */
-  lmc_hssi_set_clock,  /* set clock source */
-  lmc_dummy_set2_1,/* set line speed */
-  lmc_dummy_set_1, /* set cable length */
-  lmc_dummy_set_1, /* set scrambler */
-  lmc_hssi_get_link_status,/* get link status */
-  lmc_hssi_set_link_status,/* set link status */
-  lmc_hssi_set_crc_length, /* set CRC length */
-  lmc_dummy_set_1, /* set T1 or E1 circuit type */
-  lmc_hssi_watchdog
+  .init = lmc_hssi_init,   /* special media init stuff */
+  .defaults = lmc_hssi_default,        /* reset to default state */
+  .set_status = lmc_hssi_set_status,   /* reset status to state provided */
+  .set_clock_source = lmc_hssi_set_clock,  /* set clock source */
+  .set_speed = lmc_dummy_set2_1,   /* set line speed */
+  .set_cable_length = lmc_dummy_set_1, /* set cable length */
+  .set_scrambler = lmc_dummy_set_1,/* set scrambler */
+  .get_link_status = lmc_hssi_get_link_status, /* get link status */
+  .set_link_status = lmc_hssi_set_link_status, /* set link status */
+  .set_crc_length = lmc_hssi_set_crc_length,   /* set CRC length */
+  .set_circuit_type = lmc_dummy_set_1, /* set T1 or E1 circuit type */
+  .watchdog = lmc_hssi_watchdog
 };
 
-lmc_media_t lmc_ssi_media = { lmc_ssi_init,/* special media init stuff */
-  lmc_ssi_default, /* reset to default state */
-  lmc_ssi_set_status,  /* reset status to state provided */
-  lmc_ssi_set_clock,   /* set clock source */
-  lmc_ssi_set_speed,   /* set line speed */
-  lmc_dummy_set_1, /* set cable length */
-  lmc_dummy_set_1, /* set scrambler */
-  lmc_ssi_get_link_status, /* get link status */
-  lmc_ssi_set_link_status, /* set link status */
-  lmc_ssi_set_crc_length,  /* set CRC length */
-  lmc_dummy_set_1, /* set T1 or E1 circuit type */
-  lmc_ssi_watchdog
+lmc_media_t lmc_ssi_media = {
+  .init = lmc_ssi_init,                /* special media init stuff */
+  .defaults = lmc_ssi_default,         /* reset to default state */
+  .set_status = lmc_ssi_set_status,    /* reset status to state provided */
+  .set_clock_source = lmc_ssi_set_clock,   /* set clock source */
+  .set_speed = lmc_ssi_set_speed,  /* set line speed */
+  .set_cable_length = lmc_dummy_set_1,

[PATCH] net: use designated initializers

2016-12-16 Thread Kees Cook
Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.

Signed-off-by: Kees Cook 
---
 net/decnet/dn_dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/decnet/dn_dev.c b/net/decnet/dn_dev.c
index b2c26b081134..41f803e35da3 100644
--- a/net/decnet/dn_dev.c
+++ b/net/decnet/dn_dev.c
@@ -201,7 +201,7 @@ static struct dn_dev_sysctl_table {
.extra1 = &min_t3,
.extra2 = &max_t3
},
-   {0}
+   { }
},
 };
 
-- 
2.7.4


-- 
Kees Cook
Nexus Security


[PATCH] ATM: use designated initializers

2016-12-16 Thread Kees Cook
Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.

Signed-off-by: Kees Cook 
---
 net/atm/lec.c|  6 ++--
 net/atm/mpoa_caches.c| 43 ++--
 net/vmw_vsock/vmci_transport_notify.c| 30 +--
 net/vmw_vsock/vmci_transport_notify_qstate.c | 30 +--
 4 files changed, 54 insertions(+), 55 deletions(-)

diff --git a/net/atm/lec.c b/net/atm/lec.c
index 779b3fa6052d..019557d0a11d 100644
--- a/net/atm/lec.c
+++ b/net/atm/lec.c
@@ -111,9 +111,9 @@ static inline void lec_arp_put(struct lec_arp_table *entry)
 }
 
 static struct lane2_ops lane2_ops = {
-   lane2_resolve,  /* resolve, spec 3.1.3 */
-   lane2_associate_req,/* associate_req,   spec 3.1.4 */
-   NULL/* associate indicator, spec 3.1.5 */
+   .resolve = lane2_resolve,   /* spec 3.1.3 */
+   .associate_req = lane2_associate_req,   /* spec 3.1.4 */
+   .associate_indicator = NULL /* spec 3.1.5 */
 };
 
 static unsigned char bus_mac[ETH_ALEN] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
diff --git a/net/atm/mpoa_caches.c b/net/atm/mpoa_caches.c
index 9e60e74c807d..a89fdebeffda 100644
--- a/net/atm/mpoa_caches.c
+++ b/net/atm/mpoa_caches.c
@@ -535,33 +535,32 @@ static void eg_destroy_cache(struct mpoa_client *mpc)
 
 
 static const struct in_cache_ops ingress_ops = {
-   in_cache_add_entry,   /* add_entry   */
-   in_cache_get, /* get */
-   in_cache_get_with_mask,   /* get_with_mask   */
-   in_cache_get_by_vcc,  /* get_by_vcc  */
-   in_cache_put, /* put */
-   in_cache_remove_entry,/* remove_entry*/
-   cache_hit,/* cache_hit   */
-   clear_count_and_expired,  /* clear_count */
-   check_resolving_entries,  /* check_resolving */
-   refresh_entries,  /* refresh */
-   in_destroy_cache  /* destroy_cache   */
+   .add_entry = in_cache_add_entry,
+   .get = in_cache_get,
+   .get_with_mask = in_cache_get_with_mask,
+   .get_by_vcc = in_cache_get_by_vcc,
+   .put = in_cache_put,
+   .remove_entry = in_cache_remove_entry,
+   .cache_hit = cache_hit,
+   .clear_count = clear_count_and_expired,
+   .check_resolving = check_resolving_entries,
+   .refresh = refresh_entries,
+   .destroy_cache = in_destroy_cache
 };
 
 static const struct eg_cache_ops egress_ops = {
-   eg_cache_add_entry,   /* add_entry*/
-   eg_cache_get_by_cache_id, /* get_by_cache_id  */
-   eg_cache_get_by_tag,  /* get_by_tag   */
-   eg_cache_get_by_vcc,  /* get_by_vcc   */
-   eg_cache_get_by_src_ip,   /* get_by_src_ip*/
-   eg_cache_put, /* put  */
-   eg_cache_remove_entry,/* remove_entry */
-   update_eg_cache_entry,/* update   */
-   clear_expired,/* clear_expired*/
-   eg_destroy_cache  /* destroy_cache*/
+   .add_entry = eg_cache_add_entry,
+   .get_by_cache_id = eg_cache_get_by_cache_id,
+   .get_by_tag = eg_cache_get_by_tag,
+   .get_by_vcc = eg_cache_get_by_vcc,
+   .get_by_src_ip = eg_cache_get_by_src_ip,
+   .put = eg_cache_put,
+   .remove_entry = eg_cache_remove_entry,
+   .update = update_eg_cache_entry,
+   .clear_expired = clear_expired,
+   .destroy_cache = eg_destroy_cache
 };
 
-
 void atm_mpoa_init_cache(struct mpoa_client *mpc)
 {
mpc->in_ops = &ingress_ops;
diff --git a/net/vmw_vsock/vmci_transport_notify.c b/net/vmw_vsock/vmci_transport_notify.c
index fd8cf0214d51..1406db4d97d1 100644
--- a/net/vmw_vsock/vmci_transport_notify.c
+++ b/net/vmw_vsock/vmci_transport_notify.c
@@ -662,19 +662,19 @@ static void vmci_transport_notify_pkt_process_negotiate(struct sock *sk)
 
 /* Socket control packet based operations. */
 const struct vmci_transport_notify_ops vmci_transport_notify_pkt_ops = {
-   vmci_transport_notify_pkt_socket_init,
-   vmci_transport_notify_pkt_socket_destruct,
-   vmci_transport_notify_pkt_poll_in,
-   vmci_transport_notify_pkt_poll_out,
-   vmci_transport_notify_pkt_handle_pkt,
-   vmci_transport_notify_pkt_recv_init,
-   vmci_transport_notify_pkt_recv_pre_block,
-   vmci_transport_notify_pkt_recv_pre_dequeue,
-   vmci_transport_notify_pkt_recv_post_dequeue,
-   vmci_transport_notify_pkt_send_init,
-   vmci_transport_notify_pkt_send_pre_block,
-   

[PATCH] isdn/gigaset: use designated initializers

2016-12-16 Thread Kees Cook
Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.

Signed-off-by: Kees Cook 
---
 drivers/isdn/gigaset/bas-gigaset.c | 32 
 drivers/isdn/gigaset/ser-gigaset.c | 32 
 drivers/isdn/gigaset/usb-gigaset.c | 32 
 3 files changed, 48 insertions(+), 48 deletions(-)

diff --git a/drivers/isdn/gigaset/bas-gigaset.c b/drivers/isdn/gigaset/bas-gigaset.c
index aecec6d32463..11e13c56126f 100644
--- a/drivers/isdn/gigaset/bas-gigaset.c
+++ b/drivers/isdn/gigaset/bas-gigaset.c
@@ -2565,22 +2565,22 @@ static int gigaset_post_reset(struct usb_interface *intf)
 
 
 static const struct gigaset_ops gigops = {
-   gigaset_write_cmd,
-   gigaset_write_room,
-   gigaset_chars_in_buffer,
-   gigaset_brkchars,
-   gigaset_init_bchannel,
-   gigaset_close_bchannel,
-   gigaset_initbcshw,
-   gigaset_freebcshw,
-   gigaset_reinitbcshw,
-   gigaset_initcshw,
-   gigaset_freecshw,
-   gigaset_set_modem_ctrl,
-   gigaset_baud_rate,
-   gigaset_set_line_ctrl,
-   gigaset_isoc_send_skb,
-   gigaset_isoc_input,
+   .write_cmd = gigaset_write_cmd,
+   .write_room = gigaset_write_room,
+   .chars_in_buffer = gigaset_chars_in_buffer,
+   .brkchars = gigaset_brkchars,
+   .init_bchannel = gigaset_init_bchannel,
+   .close_bchannel = gigaset_close_bchannel,
+   .initbcshw = gigaset_initbcshw,
+   .freebcshw = gigaset_freebcshw,
+   .reinitbcshw = gigaset_reinitbcshw,
+   .initcshw = gigaset_initcshw,
+   .freecshw = gigaset_freecshw,
+   .set_modem_ctrl = gigaset_set_modem_ctrl,
+   .baud_rate = gigaset_baud_rate,
+   .set_line_ctrl = gigaset_set_line_ctrl,
+   .send_skb = gigaset_isoc_send_skb,
+   .handle_input = gigaset_isoc_input,
 };
 
 /* bas_gigaset_init
diff --git a/drivers/isdn/gigaset/ser-gigaset.c b/drivers/isdn/gigaset/ser-gigaset.c
index b90776ef56ec..ab0b63a4d045 100644
--- a/drivers/isdn/gigaset/ser-gigaset.c
+++ b/drivers/isdn/gigaset/ser-gigaset.c
@@ -445,22 +445,22 @@ static int gigaset_set_line_ctrl(struct cardstate *cs, unsigned cflag)
 }
 
 static const struct gigaset_ops ops = {
-   gigaset_write_cmd,
-   gigaset_write_room,
-   gigaset_chars_in_buffer,
-   gigaset_brkchars,
-   gigaset_init_bchannel,
-   gigaset_close_bchannel,
-   gigaset_initbcshw,
-   gigaset_freebcshw,
-   gigaset_reinitbcshw,
-   gigaset_initcshw,
-   gigaset_freecshw,
-   gigaset_set_modem_ctrl,
-   gigaset_baud_rate,
-   gigaset_set_line_ctrl,
-   gigaset_m10x_send_skb,  /* asyncdata.c */
-   gigaset_m10x_input, /* asyncdata.c */
+   .write_cmd = gigaset_write_cmd,
+   .write_room = gigaset_write_room,
+   .chars_in_buffer = gigaset_chars_in_buffer,
+   .brkchars = gigaset_brkchars,
+   .init_bchannel = gigaset_init_bchannel,
+   .close_bchannel = gigaset_close_bchannel,
+   .initbcshw = gigaset_initbcshw,
+   .freebcshw = gigaset_freebcshw,
+   .reinitbcshw = gigaset_reinitbcshw,
+   .initcshw = gigaset_initcshw,
+   .freecshw = gigaset_freecshw,
+   .set_modem_ctrl = gigaset_set_modem_ctrl,
+   .baud_rate = gigaset_baud_rate,
+   .set_line_ctrl = gigaset_set_line_ctrl,
+   .send_skb = gigaset_m10x_send_skb,  /* asyncdata.c */
+   .handle_input = gigaset_m10x_input, /* asyncdata.c */
 };
 
 
diff --git a/drivers/isdn/gigaset/usb-gigaset.c b/drivers/isdn/gigaset/usb-gigaset.c
index 5f306e2eece5..eade36dafa34 100644
--- a/drivers/isdn/gigaset/usb-gigaset.c
+++ b/drivers/isdn/gigaset/usb-gigaset.c
@@ -862,22 +862,22 @@ static int gigaset_pre_reset(struct usb_interface *intf)
 }
 
 static const struct gigaset_ops ops = {
-   gigaset_write_cmd,
-   gigaset_write_room,
-   gigaset_chars_in_buffer,
-   gigaset_brkchars,
-   gigaset_init_bchannel,
-   gigaset_close_bchannel,
-   gigaset_initbcshw,
-   gigaset_freebcshw,
-   gigaset_reinitbcshw,
-   gigaset_initcshw,
-   gigaset_freecshw,
-   gigaset_set_modem_ctrl,
-   gigaset_baud_rate,
-   gigaset_set_line_ctrl,
-   gigaset_m10x_send_skb,
-   gigaset_m10x_input,
+   .write_cmd = gigaset_write_cmd,
+   .write_room = gigaset_write_room,
+   .chars_in_buffer = gigaset_chars_in_buffer,
+   .brkchars = gigaset_brkchars,
+   .init_bchannel = gigaset_init_bchannel,
+   .close_bchannel = gigaset_close_bchannel,
+   .initbcshw = gigaset_initbcshw,
+   .freebcshw = gigaset_freebcshw,
+   .reinitbcshw = gigaset_reinitbcshw,
+   .initcshw = gigaset_initcshw,
+   .freecshw = gigaset_freecshw,
+   .set_modem_ctrl = 

[PATCH net 2/2] bpf: fix overflow in prog accounting

2016-12-16 Thread Daniel Borkmann
Commit aaac3ba95e4c ("bpf: charge user for creation of BPF maps and
programs") made a wrong assumption of charging against prog->pages.
Unlike map->pages, prog->pages are still subject to change when we
need to expand the program through bpf_prog_realloc().

This can happen, for example, during the verification stage when we need to
expand and rewrite parts of the program. Should the required space
cross a page boundary, then prog->pages is no longer the same as the
original value that we charged in bpf_prog_charge_memlock(). Thus,
we'll hit a wrap-around during bpf_prog_uncharge_memlock() when the prog
is eventually freed. I noticed this when, despite having unlimited
memlock, programs suddenly refused to load with an EPERM error due to
insufficient memlock.
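
The wrap-around can be reproduced in user space. A minimal sketch, assuming a plain unsigned long counter in place of the kernel's atomic locked_vm and 4 KiB pages; wrap_locked, wrap_charge() and wrap_uncharge() are made-up names, and -1 stands in for -EPERM:

```c
#include <assert.h>
#include <limits.h>

/* Pages currently charged to the (single, hypothetical) user. */
static unsigned long wrap_locked;

/* Add first, then compare against the limit, undoing on failure. */
static int wrap_charge(unsigned long pages, unsigned long limit)
{
	assert(pages > 0);
	wrap_locked += pages;
	if (wrap_locked > limit) {
		wrap_locked -= pages;
		return -1;	/* the kernel code returns -EPERM here */
	}
	return 0;
}

/* Subtract whatever the caller believes was charged. If that is more
 * than was actually charged, the unsigned counter wraps below zero. */
static void wrap_uncharge(unsigned long pages)
{
	wrap_locked -= pages;
}
```

Charging 1 page and later uncharging 3 (because the program grew by two pages in between) leaves the counter at ULONG_MAX - 1, so the next 1-page charge exceeds even an "unlimited" limit of ULONG_MAX >> 12 pages and is refused.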

There are two ways to fix this issue. One would be to add a cached
variable to struct bpf_prog that takes a snapshot of prog->pages at the
time of charging. The other approach is to also account for resizes. I
chose to go with the latter for a couple of reasons: i) we want accounting
to be more accurate rather than further fooling limits, ii) adding
yet another page counter to struct bpf_prog would be a waste just
for this purpose. We also want to charge as early as possible to
avoid going into the verifier just to find out later on that we crossed
limits. The only place that needs to be fixed is bpf_prog_realloc(),
since this is the only place where we expand the program, so we try to
account for the needed delta and, should we fail, call-sites check for
the outcome anyway. On cBPF to eBPF migrations, we don't grab a reference
to the user as they are charged differently. With that in place, my test
case worked fine.
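
The chosen approach, charging only the delta when the program grows, can be sketched in user space as follows. This is an illustration under simplifying assumptions (a non-atomic counter, no real allocation, -1 in place of -EPERM or NULL); acct_prog, acct_locked, acct_charge(), acct_uncharge() and acct_realloc() are made-up names and not the patch's functions.

```c
#include <assert.h>

/* Hypothetical program descriptor: only the page count matters here. */
struct acct_prog {
	unsigned int pages;
};

/* Pages currently charged to the user. */
static unsigned long acct_locked;

static int acct_charge(unsigned long pages, unsigned long limit)
{
	acct_locked += pages;
	if (acct_locked > limit) {
		acct_locked -= pages;	/* undo on failure */
		return -1;
	}
	return 0;
}

static void acct_uncharge(unsigned long pages)
{
	acct_locked -= pages;
}

/* Grow the program to new_pages, charging only the additional pages.
 * The page count is updated only after the charge succeeded, so the
 * amount charged and prog->pages always stay in sync. */
static int acct_realloc(struct acct_prog *prog, unsigned int new_pages,
			unsigned long limit)
{
	unsigned int delta;

	assert(prog != 0);
	if (new_pages <= prog->pages)
		return 0;	/* nothing to grow, nothing to charge */

	delta = new_pages - prog->pages;
	if (acct_charge(delta, limit))
		return -1;

	prog->pages = new_pages;
	return 0;
}
```

Because every growth is charged, uncharging prog->pages at free time subtracts exactly what was added, and the counter returns to zero instead of wrapping.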

Fixes: aaac3ba95e4c ("bpf: charge user for creation of BPF maps and programs")
Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 include/linux/filter.h |  3 +++
 kernel/bpf/core.c  | 61 +++---
 kernel/bpf/syscall.c   | 25 -
 3 files changed, 61 insertions(+), 28 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index a0934e6..496a8c0 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -577,6 +577,9 @@ struct bpf_prog *bpf_prog_realloc(struct bpf_prog *fp_old, unsigned int size,
  gfp_t gfp_extra_flags);
 void __bpf_prog_free(struct bpf_prog *fp);
 
+int bpf_prog_charge_memlock(struct bpf_prog *prog);
+void bpf_prog_uncharge_memlock(struct bpf_prog *prog);
+
 static inline void bpf_prog_unlock_free(struct bpf_prog *fp)
 {
bpf_prog_unlock_ro(fp);
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 75c08b8..1f9a146 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -71,6 +71,51 @@ void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, int k, uns
return NULL;
 }
 
+static int __bpf_prog_charge(struct user_struct *user, u32 pages)
+{
+   unsigned long memlock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+   unsigned long user_bufs;
+
+   if (user) {
+   user_bufs = atomic_long_add_return(pages, &user->locked_vm);
+   if (user_bufs > memlock_limit) {
+   atomic_long_sub(pages, &user->locked_vm);
+   return -EPERM;
+   }
+   }
+
+   return 0;
+}
+
+static void __bpf_prog_uncharge(struct user_struct *user, u32 pages)
+{
+   if (user)
+   atomic_long_sub(pages, &user->locked_vm);
+}
+
+int bpf_prog_charge_memlock(struct bpf_prog *prog)
+{
+   struct user_struct *user = get_current_user();
+   int ret;
+
+   ret = __bpf_prog_charge(user, prog->pages);
+   if (ret) {
+   free_uid(user);
+   return ret;
+   }
+
+   prog->aux->user = user;
+   return 0;
+}
+
+void bpf_prog_uncharge_memlock(struct bpf_prog *prog)
+{
+   struct user_struct *user = prog->aux->user;
+
+   __bpf_prog_uncharge(user, prog->pages);
+   free_uid(user);
+}
+
 struct bpf_prog *bpf_prog_alloc(unsigned int size, gfp_t gfp_extra_flags)
 {
gfp_t gfp_flags = GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO |
@@ -105,19 +150,29 @@ struct bpf_prog *bpf_prog_realloc(struct bpf_prog *fp_old, unsigned int size,
gfp_t gfp_flags = GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO |
  gfp_extra_flags;
struct bpf_prog *fp;
+   u32 pages, delta;
+   int ret;
 
BUG_ON(fp_old == NULL);
 
size = round_up(size, PAGE_SIZE);
-   if (size <= fp_old->pages * PAGE_SIZE)
+   pages = size / PAGE_SIZE;
+   if (pages <= fp_old->pages)
return fp_old;
 
+   delta = pages - fp_old->pages;
+   ret = __bpf_prog_charge(fp_old->aux->user, delta);
+   if (ret)
+   return NULL;
+
fp = __vmalloc(size, gfp_flags, PAGE_KERNEL);
-   if (fp != NULL) {
+   if (fp == NULL) {
+   

[PATCH net 0/2] Two BPF fixes

2016-12-16 Thread Daniel Borkmann
This set contains two BPF fixes for net, one that addresses the
complaint from Geert wrt static allocations, and the other is a
fix wrt mem accounting that I found recently during testing.

Thanks!

Daniel Borkmann (2):
  bpf: dynamically allocate digest scratch buffer
  bpf: fix overflow in prog accounting

 include/linux/bpf.h|  2 +-
 include/linux/filter.h | 17 --
 kernel/bpf/core.c  | 88 ++
 kernel/bpf/syscall.c   | 27 +---
 kernel/bpf/verifier.c  |  6 ++--
 5 files changed, 94 insertions(+), 46 deletions(-)

-- 
1.9.3



[PATCH net 1/2] bpf: dynamically allocate digest scratch buffer

2016-12-16 Thread Daniel Borkmann
Geert rightfully complained that 7bd509e311f4 ("bpf: add prog_digest
and expose it via fdinfo/netlink") added an overly large allocation of
variable 'raw' in the bss section, which should instead be done dynamically:

  # ./scripts/bloat-o-meter kernel/bpf/core.o.1 kernel/bpf/core.o.2
  add/remove: 3/0 grow/shrink: 0/0 up/down: 33291/0 (33291)
  function                                     old     new   delta
  raw                                            -   32832  +32832
  [...]

Since this is only relevant during the program creation path, which can be
considered slow-path anyway, let's allocate that dynamically and not be
implicitly dependent on the verifier mutex. Move bpf_prog_calc_digest() to
the beginning of replace_map_fd_with_map_ptr() so that error handling
also stays straightforward.
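
The 32832 bytes reported by bloat-o-meter follow from the old SHA_BPF_RAW_SIZE formula, and the same arithmetic shows how small the buffer becomes once it is sized per program. A user-space sketch, assuming an 8-byte struct bpf_insn, a 64-byte SHA-1 message block and a maximum of 4096 instructions; demo_round_up() and demo_digest_scratch_size() are made-up names that mirror, but are not, the helpers added by the patch.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SHA_MESSAGE_BYTES	64u	/* assumed: 512-bit SHA-1 block */
#define DEMO_INSN_BYTES		8u	/* assumed: sizeof(struct bpf_insn) */

/* Round x up to the next multiple of align. */
static uint32_t demo_round_up(uint32_t x, uint32_t align)
{
	assert(align != 0);
	return ((x + align - 1) / align) * align;
}

/* Scratch space for the digest: the instructions, one 0x80 padding
 * byte and a 64-bit length field, rounded up to whole SHA blocks. */
static uint32_t demo_digest_scratch_size(uint32_t insn_count)
{
	uint32_t raw = insn_count * DEMO_INSN_BYTES +
		       (uint32_t)sizeof(uint64_t) + 1;

	return demo_round_up(raw, DEMO_SHA_MESSAGE_BYTES);
}
```

For 4096 instructions this gives 32768 + 8 + 1 = 32777, rounded up to 32832, which matches the size of 'raw' above; a one-instruction program needs a single 64-byte block.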

Reported-by: Geert Uytterhoeven 
Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 include/linux/bpf.h|  2 +-
 include/linux/filter.h | 14 +++---
 kernel/bpf/core.c  | 27 ---
 kernel/bpf/syscall.c   |  2 +-
 kernel/bpf/verifier.c  |  6 --
 5 files changed, 33 insertions(+), 18 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 8796ff0..201eb48 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -216,7 +216,7 @@ struct bpf_event_entry {
 u64 bpf_get_stackid(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
 
 bool bpf_prog_array_compatible(struct bpf_array *array, const struct bpf_prog *fp);
-void bpf_prog_calc_digest(struct bpf_prog *fp);
+int bpf_prog_calc_digest(struct bpf_prog *fp);
 
 const struct bpf_func_proto *bpf_get_trace_printk_proto(void);
 
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 6a16583..a0934e6 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -57,9 +57,6 @@
 /* BPF program can access up to 512 bytes of stack space. */
 #define MAX_BPF_STACK  512
 
-/* Maximum BPF program size in bytes. */
-#define MAX_BPF_SIZE   (BPF_MAXINSNS * sizeof(struct bpf_insn))
-
 /* Helper macros for filter block array initializers. */
 
 /* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */
@@ -517,6 +514,17 @@ static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
return BPF_PROG_RUN(prog, xdp);
 }
 
+static inline u32 bpf_prog_insn_size(const struct bpf_prog *prog)
+{
+   return prog->len * sizeof(struct bpf_insn);
+}
+
+static inline u32 bpf_prog_digest_scratch_size(const struct bpf_prog *prog)
+{
+   return round_up(bpf_prog_insn_size(prog) +
+   sizeof(__be64) + 1, SHA_MESSAGE_BYTES);
+}
+
 static inline unsigned int bpf_prog_size(unsigned int proglen)
 {
return max(sizeof(struct bpf_prog),
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 83e0d15..75c08b8 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -136,28 +136,29 @@ void __bpf_prog_free(struct bpf_prog *fp)
vfree(fp);
 }
 
-#define SHA_BPF_RAW_SIZE   \
-   round_up(MAX_BPF_SIZE + sizeof(__be64) + 1, SHA_MESSAGE_BYTES)
-
-/* Called under verifier mutex. */
-void bpf_prog_calc_digest(struct bpf_prog *fp)
+int bpf_prog_calc_digest(struct bpf_prog *fp)
 {
const u32 bits_offset = SHA_MESSAGE_BYTES - sizeof(__be64);
-   static u32 ws[SHA_WORKSPACE_WORDS];
-   static u8 raw[SHA_BPF_RAW_SIZE];
-   struct bpf_insn *dst = (void *)raw;
+   u32 raw_size = bpf_prog_digest_scratch_size(fp);
+   u32 ws[SHA_WORKSPACE_WORDS];
u32 i, bsize, psize, blocks;
+   struct bpf_insn *dst;
bool was_ld_map;
-   u8 *todo = raw;
+   u8 *raw, *todo;
__be32 *result;
__be64 *bits;
 
+   raw = vmalloc(raw_size);
+   if (!raw)
+   return -ENOMEM;
+
sha_init(fp->digest);
memset(ws, 0, sizeof(ws));
 
/* We need to take out the map fd for the digest calculation
 * since they are unstable from user space side.
 */
+   dst = (void *)raw;
for (i = 0, was_ld_map = false; i < fp->len; i++) {
dst[i] = fp->insnsi[i];
if (!was_ld_map &&
@@ -177,12 +178,13 @@ void bpf_prog_calc_digest(struct bpf_prog *fp)
}
}
 
-   psize = fp->len * sizeof(struct bpf_insn);
-   memset(&raw[psize], 0, sizeof(raw) - psize);
+   psize = bpf_prog_insn_size(fp);
+   memset(&raw[psize], 0, raw_size - psize);
raw[psize++] = 0x80;
 
bsize  = round_up(psize, SHA_MESSAGE_BYTES);
blocks = bsize / SHA_MESSAGE_BYTES;
+   todo   = raw;
if (bsize - psize >= sizeof(__be64)) {
bits = (__be64 *)(todo + bsize - sizeof(__be64));
} else {
@@ -199,6 +201,9 @@ void bpf_prog_calc_digest(struct bpf_prog *fp)
result = (__force __be32 *)fp->digest;
for (i = 0; i < SHA_DIGEST_WORDS; i++)
result[i] = cpu_to_be32(fp->digest[i]);
+
+   

[PATCH] net: mv643xx_eth: fix build failure

2016-12-16 Thread Sudip Mukherjee
The build of sparc allmodconfig fails with the error:
"of_irq_to_resource" [drivers/net/ethernet/marvell/mv643xx_eth.ko]
undefined!

of_irq_to_resource() is defined when CONFIG_OF_IRQ is defined. And also
CONFIG_OF_IRQ can only be defined if CONFIG_IRQ is defined. So we can
safely use #if defined(CONFIG_OF_IRQ) in the code.

Signed-off-by: Sudip Mukherjee <sudip.mukher...@codethink.co.uk>
---

build log of next-20161216 is at:
https://travis-ci.org/sudipm-mukherjee/parport/jobs/184652820

 drivers/net/ethernet/marvell/mv643xx_eth.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/mv643xx_eth.c b/drivers/net/ethernet/marvell/mv643xx_eth.c
index 5f62c3d..1fa7c03 100644
--- a/drivers/net/ethernet/marvell/mv643xx_eth.c
+++ b/drivers/net/ethernet/marvell/mv643xx_eth.c
@@ -2713,7 +2713,7 @@ static void infer_hw_params(struct mv643xx_eth_shared_private *msp)
 MODULE_DEVICE_TABLE(of, mv643xx_eth_shared_ids);
 #endif
 
-#if defined(CONFIG_OF) && !defined(CONFIG_MV64X60)
+#if defined(CONFIG_OF_IRQ) && !defined(CONFIG_MV64X60)
 #define mv643xx_eth_property(_np, _name, _v)   \
do {\
u32 tmp;\
-- 
1.9.1



Re: [PATCH 1/2] bpf: do not use KMALLOC_SHIFT_MAX

2016-12-16 Thread Alexei Starovoitov
On Sat, Dec 17, 2016 at 12:39:17AM +0100, Michal Hocko wrote:
> On Fri 16-12-16 15:23:42, Alexei Starovoitov wrote:
> > On Fri, Dec 16, 2016 at 11:02:35PM +0100, Michal Hocko wrote:
> > > On Fri 16-12-16 10:02:10, Alexei Starovoitov wrote:
> > > > On Thu, Dec 15, 2016 at 05:47:21PM +0100, Michal Hocko wrote:
> > > > > From: Michal Hocko 
> > > > > 
> > > > > 01b3f52157ff ("bpf: fix allocation warnings in bpf maps and integer
> > > > > overflow") has added checks for the maximum allocateable size. It
> > > > > (ab)used KMALLOC_SHIFT_MAX for that purpose. While this is not 
> > > > > incorrect
> > > > > it is not very clean because we already have KMALLOC_MAX_SIZE for this
> > > > > very reason so let's change both checks to use KMALLOC_MAX_SIZE 
> > > > > instead.
> > > > > 
> > > > > Cc: Alexei Starovoitov 
> > > > > Signed-off-by: Michal Hocko 
> > > > 
> > > > Nack until the patches 1 and 2 are reversed.
> > > 
> > > I do not insist on ordering. The thing is that it shouldn't matter all
> > > that much. Or are you worried about bisectability?
> > 
> > This patch 1 strongly depends on patch 2 !
> > Therefore order matters.
> > The patch 1 by itself is broken.
> > The commit log is saying
> > '(ab)used KMALLOC_SHIFT_MAX for that purpose .. use KMALLOC_MAX_SIZE 
> > instead'
> > that is also incorrect. We cannot do that until KMALLOC_MAX_SIZE is fixed.
> > So please change the order
> 
> Yes, I agree that using KMALLOC_MAX_SIZE could lead to a warning with
> the current ordering. Why that matters all that much is less clear to
> me. The allocation would simply fail and you would return ENOMEM rather
> than E2BIG. Does this really matter?
> 
> Anyway, as I've said, I do not really insist on the current ordering and
> will ask Andrew to reorder them. I am just really wondering about
> such a strong pushback about something that barely matters. Or maybe I
> am just missing your point and checking KMALLOC_MAX_SIZE without an
> update would lead to a wrong behavior, user space breakage, crash or
> anything similar.

If the admin sets the ulimit for locked memory high enough for a particular user,
that non-root user will be able to trigger the WARN_ON_ONCE in
__alloc_pages_slowpath, which is not acceptable.
Also see the comment in hashtab.c:
	if (htab->map.value_size >= (1 << (KMALLOC_SHIFT_MAX - 1)) -
	    MAX_BPF_STACK - sizeof(struct htab_elem))
		/* if value_size is bigger, the user space won't be able to
		 * access the elements via bpf syscall. This check also makes
		 * sure that the elem_size doesn't overflow and it's
		 * kmalloc-able later in htab_map_update_elem()
		 */
		goto free_htab;

> > and fix the commit log to say that KMALLOC_MAX_SIZE
> > is actually valid limit now.
> 
> KMALLOC_MAX_SIZE has always been the right limit. Its value has been
> incorrect but that is to be fixed now. Using KMALLOC_SHIFT_MAX is simply
> abusing an internal constant. So I am not sure what should be fixed in
> the changelog.

that's exactly my problem with this patch and the commit log.
You think it's abusing KMALLOC_SHIFT_MAX whereas it's doing so
for reasons stated above.
That piece of code cannot use KMALLOC_MAX_SIZE until it's fixed.
So commit log should say something like:
"now since KMALLOC_MAX_SIZE is fixed and size < KMALLOC_MAX_SIZE condition
guarantees warn free allocation in kmalloc(value_size, GFP_USER | __GFP_NOWARN);
we can safely use KMALLOC_MAX_SIZE instead of KMALLOC_SHIFT_MAX"
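
To make the shape of that check concrete, here is a minimal userspace
sketch. The constants are assumptions picked for illustration (the real
KMALLOC_SHIFT_MAX depends on the allocator and config, and
DEMO_ELEM_HDR_SIZE merely stands in for sizeof(struct htab_elem)). The only
point is that an oversized request is rejected with -E2BIG before any
allocation is attempted, so the allocator never gets a chance to warn.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed values, for illustration only. */
#define DEMO_KMALLOC_SHIFT_MAX 22
#define DEMO_MAX_BPF_STACK     512
#define DEMO_ELEM_HDR_SIZE     48   /* stand-in for sizeof(struct htab_elem) */

/* Returns 0 if value_size is acceptable, -E2BIG otherwise. */
static int demo_check_value_size(uint32_t value_size)
{
	const uint32_t limit = (1u << (DEMO_KMALLOC_SHIFT_MAX - 1)) -
			       DEMO_MAX_BPF_STACK - DEMO_ELEM_HDR_SIZE;

	if (value_size >= limit)
		return -E2BIG;
	return 0;
}
```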



Re: [PATCH] net: wan: Use dma_pool_zalloc

2016-12-16 Thread Joe Perches
On Fri, 2016-12-16 at 19:25 +0530, Souptick Joarder wrote:
> On Fri, Dec 16, 2016 at 11:40 AM, Joe Perches  wrote:
> > On Fri, 2016-12-16 at 11:33 +0530, Souptick Joarder wrote:
> > > On Thu, Dec 15, 2016 at 10:18 PM, Joe Perches  wrote:
> > > > On Thu, 2016-12-15 at 10:41 +0530, Souptick Joarder wrote:
> > > > > > On Mon, Dec 12, 2016 at 10:12 AM, Souptick Joarder wrote:
> > > > > > > On Fri, Dec 9, 2016 at 6:33 PM, Krzysztof Hałasa wrote:
> > > > > > > Souptick Joarder  writes:
> > > > > > > 
> > > > > > > > We should use dma_pool_zalloc instead of dma_pool_alloc/memset
> > > > 
> > > > []
> > > > > > > > diff --git a/drivers/net/wan/ixp4xx_hss.c 
> > > > > > > > b/drivers/net/wan/ixp4xx_hss.c
> > > > 
> > > > []
> > > > > > > > @@ -976,10 +976,9 @@ static int init_hdlc_queues(struct port 
> > > > > > > > *port)
> > > > > > > >   return -ENOMEM;
> > > > > > > >   }
> > > > > > > > 
> > > > > > > > > - if (!(port->desc_tab = dma_pool_alloc(dma_pool, GFP_KERNEL,
> > > > > > > > > -                                        &port->desc_tab_phys)))
> > > > > > > > > + if (!(port->desc_tab = dma_pool_zalloc(dma_pool, GFP_KERNEL,
> > > > > > > > > +                                         &port->desc_tab_phys)))
> > > > > > > >   return -ENOMEM;
> > > > > > > > - memset(port->desc_tab, 0, POOL_ALLOC_SIZE);
> > > > > > > >   memset(port->rx_buff_tab, 0, sizeof(port->rx_buff_tab)); 
> > > > > > > > /* tables */
> > > > > > > >   memset(port->tx_buff_tab, 0, sizeof(port->tx_buff_tab));
> > > > > > > 
> > > > > > > This look fine, feel free to send it to the netdev mailing list 
> > > > > > > for
> > > > > > > inclusion.
> > > > > > 
> > > > > > Including netdev mailing list based as requested.
> > > > > > > Acked-by: Krzysztof Halasa 
> > > > 
> > > > []
> > > > > Any comment on this patch ?
> > > > 
> > > > Shouldn't the one in drivers/net/ethernet/xscale/ixp4xx_eth.c
> > > > also be changed?
> > > 
> > > Yes, you are right.   Do you want me to include it in same patch?
> > 
> > Your choice.  I would use a single patch.
> 
> There are few other places where the same change is applicable.
> I am planning to put all those changes in a single patch. It includes
> changes in drivers/net/ethernet/xscale/ixp4xx_eth.c
> 
> You can review this patch separately.

If you are spanning multiple drivers maintained by different
groups, it's probably better to create a patch series, one for
each driver, to allow the various maintainers to apply the
patches to their individually maintained drivers.

Joe



Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
> 64-bit security for an RNG is not reasonable even with rekeying. No no
> no. Considering we already have a massive speed-up here with the
> secure version, there's zero reason to start weakening the security
> because we're trigger happy with our benchmarks. No no no.

Just to clarify, I was discussing the idea with Ted (who's in charge of
the whole thing, not me), not trying to make any sort of final decision
on the subject.  I need to look at the various users (46 non-trivial ones
for get_random_int, 15 for get_random_long) and see what their security
requirements actually are.

I'm also trying to see if HalfSipHash can be used in a way that gives
slightly more than 64 bits of effective security.

The problem is that the old MD5-based transform had unclear, but
obviously ample, security.  There were 64 bytes of global secret and
16 chaining bytes per CPU.  Adapting SipHash (even the full version)
takes more thinking.

An actual HalfSipHash-based equivalent to the existing code would be:

#define RANDOM_INT_WORDS (64 / sizeof(long))	/* 16 or 8 */

static u32 random_int_secret[RANDOM_INT_WORDS]
	____cacheline_aligned __read_mostly;
static DEFINE_PER_CPU(unsigned long[4], get_random_int_hash)
	__aligned(sizeof(unsigned long));

unsigned long get_random_long(void)
{
	unsigned long *hash = get_cpu_var(get_random_int_hash);
	unsigned long v0 = hash[0], v1 = hash[1], v2 = hash[2], v3 = hash[3];
	int i;

	/* This could be improved, but it's equivalent */
	v0 += current->pid + jiffies + random_get_entropy();

	for (i = 0; i < RANDOM_INT_WORDS; i++) {
		v3 ^= random_int_secret[i];
		HSIPROUND;
		HSIPROUND;
		v0 ^= random_int_secret[i];
	}
	/* To be equivalent, we *don't* finalize the transform */

	hash[0] = v0; hash[1] = v1; hash[2] = v2; hash[3] = v3;
	put_cpu_var(get_random_int_hash);

	return v0 ^ v1 ^ v2 ^ v3;
}

I don't think there's a 2^64 attack on that.

But 64 bytes of global secret is ridiculous if the hash function
doesn't require that minimum block size.  It'll take some thinking.
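
HSIPROUND is not defined in this message. Assuming it is the round
function from the HalfSipHash reference implementation (32-bit words,
rotations 5/16/8/7/13/16), the loop can be tried out in userspace with a
sketch like the following. The demo_ names, the fixed 16-word secret and
the 'noise' parameter (standing in for pid + jiffies + cycle counter) are
illustrative assumptions, and per-CPU handling is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_SECRET_WORDS 16  /* 64 bytes of secret as 32-bit words */

static uint32_t demo_rol32(uint32_t x, unsigned int r)
{
	return (x << r) | (x >> (32 - r));
}

/* One HalfSipHash round over four 32-bit state words. */
static void demo_hsipround(uint32_t v[4])
{
	v[0] += v[1]; v[1] = demo_rol32(v[1], 5);  v[1] ^= v[0]; v[0] = demo_rol32(v[0], 16);
	v[2] += v[3]; v[3] = demo_rol32(v[3], 8);  v[3] ^= v[2];
	v[0] += v[3]; v[3] = demo_rol32(v[3], 7);  v[3] ^= v[0];
	v[2] += v[1]; v[1] = demo_rol32(v[1], 13); v[1] ^= v[2]; v[2] = demo_rol32(v[2], 16);
}

/* Mix per-call noise and the secret into the persistent state; no finalization. */
static uint32_t demo_get_random_u32(uint32_t state[4],
				    const uint32_t secret[DEMO_SECRET_WORDS],
				    uint32_t noise)
{
	size_t i;

	state[0] += noise;
	for (i = 0; i < DEMO_SECRET_WORDS; i++) {
		state[3] ^= secret[i];
		demo_hsipround(state);
		demo_hsipround(state);
		state[0] ^= secret[i];
	}
	return state[0] ^ state[1] ^ state[2] ^ state[3];
}
```

The state array is updated in place, so consecutive calls chain on all
four words rather than only on the returned value.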


The advice I'd give now is:
- Implement
	unsigned long hsiphash(const void *data, size_t len, const unsigned long key[2])
  .. as SipHash on 64-bit (maybe SipHash-1-3, still being discussed) and
  HalfSipHash on 32-bit.
- Carefully document when it may or may not be used.
- #define get_random_int (unsigned)get_random_long
- Ted, Andy Lutomirski and I will try to figure out a construction of
  get_random_long() that we all like.


('scuse me for a few hours, I have some unrelated things I really *should*
be working on...)


Re: [PATCH 1/2] bpf: do not use KMALLOC_SHIFT_MAX

2016-12-16 Thread Daniel Borkmann

On 12/17/2016 12:23 AM, Alexei Starovoitov wrote:

On Fri, Dec 16, 2016 at 11:02:35PM +0100, Michal Hocko wrote:

On Fri 16-12-16 10:02:10, Alexei Starovoitov wrote:

On Thu, Dec 15, 2016 at 05:47:21PM +0100, Michal Hocko wrote:

From: Michal Hocko 

01b3f52157ff ("bpf: fix allocation warnings in bpf maps and integer
overflow") has added checks for the maximum allocateable size. It
(ab)used KMALLOC_SHIFT_MAX for that purpose. While this is not incorrect
it is not very clean because we already have KMALLOC_MAX_SIZE for this
very reason so let's change both checks to use KMALLOC_MAX_SIZE instead.

Cc: Alexei Starovoitov 
Signed-off-by: Michal Hocko 


Nack until the patches 1 and 2 are reversed.


I do not insist on ordering. The thing is that it shouldn't matter all
that much. Or are you worried about bisectability?


This patch 1 strongly depends on patch 2 !
Therefore order matters.
The patch 1 by itself is broken.
The commit log is saying
'(ab)used KMALLOC_SHIFT_MAX for that purpose .. use KMALLOC_MAX_SIZE instead'
that is also incorrect. We cannot do that until KMALLOC_MAX_SIZE is fixed.
So please change the order and fix the commit log to say that KMALLOC_MAX_SIZE
is actually valid limit now.


Michal, please also Cc netdev on your v2. Looks like the set
originally didn't Cc it (at least I didn't see 2/2). Thanks.


Re: [PATCH 1/2] bpf: do not use KMALLOC_SHIFT_MAX

2016-12-16 Thread Michal Hocko
On Fri 16-12-16 15:23:42, Alexei Starovoitov wrote:
> On Fri, Dec 16, 2016 at 11:02:35PM +0100, Michal Hocko wrote:
> > On Fri 16-12-16 10:02:10, Alexei Starovoitov wrote:
> > > On Thu, Dec 15, 2016 at 05:47:21PM +0100, Michal Hocko wrote:
> > > > From: Michal Hocko 
> > > > 
> > > > 01b3f52157ff ("bpf: fix allocation warnings in bpf maps and integer
> > > > overflow") has added checks for the maximum allocateable size. It
> > > > (ab)used KMALLOC_SHIFT_MAX for that purpose. While this is not incorrect
> > > > it is not very clean because we already have KMALLOC_MAX_SIZE for this
> > > > very reason so let's change both checks to use KMALLOC_MAX_SIZE instead.
> > > > 
> > > > Cc: Alexei Starovoitov 
> > > > Signed-off-by: Michal Hocko 
> > > 
> > > Nack until the patches 1 and 2 are reversed.
> > 
> > I do not insist on ordering. The thing is that it shouldn't matter all
> > that much. Or are you worried about bisectability?
> 
> This patch 1 strongly depends on patch 2 !
> Therefore order matters.
> The patch 1 by itself is broken.
> The commit log is saying
> '(ab)used KMALLOC_SHIFT_MAX for that purpose .. use KMALLOC_MAX_SIZE instead'
> that is also incorrect. We cannot do that until KMALLOC_MAX_SIZE is fixed.
> So please change the order

Yes, I agree that using KMALLOC_MAX_SIZE could lead to a warning with
the current ordering. Why that matters all that much is less clear to
me. The allocation would simply fail and you would return ENOMEM rather
than E2BIG. Does this really matter?

Anyway, as I've said, I do not really insist on the current ordering and
will ask Andrew to reorder them. I am just really wondering about
such a strong pushback about something that barely matters. Or maybe I
am just missing your point and checking KMALLOC_MAX_SIZE without an
update would lead to a wrong behavior, user space breakage, crash or
anything similar.

> and fix the commit log to say that KMALLOC_MAX_SIZE
> is actually valid limit now.

KMALLOC_MAX_SIZE has always been the right limit. Its value has been
incorrect but that is to be fixed now. Using KMALLOC_SHIFT_MAX is simply
abusing an internal constant. So I am not sure what should be fixed in
the changelog.

-- 
Michal Hocko
SUSE Labs


Re: Soft lockup in inet_put_port on 4.6

2016-12-16 Thread Josef Bacik
On Fri, Dec 16, 2016 at 5:18 PM, Tom Herbert wrote:
On Fri, Dec 16, 2016 at 2:08 PM, Josef Bacik wrote:
On Fri, Dec 16, 2016 at 10:21 AM, Josef Bacik wrote:
On Fri, Dec 16, 2016 at 9:54 AM, Josef Bacik wrote:
On Thu, Dec 15, 2016 at 7:07 PM, Hannes Frederic Sowa wrote:

Hi Josef,

On 15.12.2016 19:53, Josef Bacik wrote:
On Tue, Dec 13, 2016 at 6:32 PM, Tom Herbert wrote:
On Tue, Dec 13, 2016 at 3:03 PM, Craig Gallek wrote:
On Tue, Dec 13, 2016 at 3:51 PM, Tom Herbert wrote:


I think there may be some suspicious code in inet_csk_get_port. At
tb_found there is:

        if (((tb->fastreuse > 0 && reuse) ||
             (tb->fastreuseport > 0 &&
              !rcu_access_pointer(sk->sk_reuseport_cb) &&
              sk->sk_reuseport && uid_eq(tb->fastuid, uid))) &&
            smallest_size == -1)
                goto success;
        if (inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb, true)) {
                if ((reuse ||
                     (tb->fastreuseport > 0 &&
                      sk->sk_reuseport &&
                      !rcu_access_pointer(sk->sk_reuseport_cb) &&
                      uid_eq(tb->fastuid, uid))) &&
                    smallest_size != -1 && --attempts >= 0) {
                        spin_unlock_bh(&head->lock);
                        goto again;
                }
                goto fail_unlock;
        }

AFAICT there is redundancy in these two conditionals.  The same clause
is being checked in both: (tb->fastreuseport > 0 &&
!rcu_access_pointer(sk->sk_reuseport_cb) && sk->sk_reuseport &&
uid_eq(tb->fastuid, uid))) && smallest_size == -1. If this is true the
first conditional should be hit, goto done,  and the second will never
evaluate that part to true-- unless the sk is changed (do we need
READ_ONCE for sk->sk_reuseport_cb?).

That's an interesting point... It looks like this function also
changed in 4.6 from using a single local_bh_disable() at the beginning
with several spin_lock(&head->lock) to exclusively
spin_lock_bh(&head->lock) at each locking point.  Perhaps the full bh
disable variant was preventing the timers in your stack trace from
running interleaved with this function before?



Could be, although dropping the lock shouldn't be able to affect the
search state. TBH, I'm a little lost in reading the function, the
SO_REUSEPORT handling is pretty complicated. For instance,
rcu_access_pointer(sk->sk_reuseport_cb) is checked three times in that
function and also in every call to inet_csk_bind_conflict. I wonder if
we can simplify this under the assumption that SO_REUSEPORT is only
allowed if the port number (snum) is explicitly specified.

Ok first I have data for you Hannes, here's the time distributions
before during and after the lockup (with all the debugging in place the
box eventually recovers).  I've attached it as a text file since it is
long.

Thanks a lot!

Second is I was thinking about why we would spend so much time doing the
->owners list, and obviously it's because of the massive amount of
timewait sockets on the owners list.  I wrote the following dumb patch
and tested it and the problem has disappeared completely.  Now I don't
know if this is right at all, but I thought it was weird we weren't
copying the soreuseport option from the original socket onto the twsk.
Is there a reason we aren't doing this currently?  Does this help
explain what is happening?  Thanks,

The patch is interesting and a good clue, but I am immediately a bit
concerned that we don't copy/tag the socket with the uid also to keep
the security properties for SO_REUSEPORT. I have to think a bit more
about this.

We have seen hangs during connect. I am afraid this patch wouldn't help
there while also guaranteeing uniqueness.

Yeah so I looked at the code some more and actually my patch is really
bad.  If sk2->sk_reuseport is set we'll look at sk2->sk_reuseport_cb, which
is outside of the timewait sock, so that's definitely bad.

But we should at least be setting it to 0 so that we don't do this
normally.  Unfortunately simply setting it to 0 doesn't fix the problem.  So
for some reason having ->sk_reuseport set to 1 on a timewait socket makes
this problem non-existent, which is strange.

So back to the drawing board I guess.  I wonder if doing what craig
suggested and batching the timewait timer expires so it hurts less would
accomplish the same results.  Thanks,

Wait no I lied, we access the sk->sk_reuseport_cb, not sk2's.  This is the
code


Re: [PATCH 1/2] bpf: do not use KMALLOC_SHIFT_MAX

2016-12-16 Thread Alexei Starovoitov
On Fri, Dec 16, 2016 at 11:02:35PM +0100, Michal Hocko wrote:
> On Fri 16-12-16 10:02:10, Alexei Starovoitov wrote:
> > On Thu, Dec 15, 2016 at 05:47:21PM +0100, Michal Hocko wrote:
> > > From: Michal Hocko 
> > > 
> > > 01b3f52157ff ("bpf: fix allocation warnings in bpf maps and integer
> > > overflow") has added checks for the maximum allocateable size. It
> > > (ab)used KMALLOC_SHIFT_MAX for that purpose. While this is not incorrect
> > > it is not very clean because we already have KMALLOC_MAX_SIZE for this
> > > very reason so let's change both checks to use KMALLOC_MAX_SIZE instead.
> > > 
> > > Cc: Alexei Starovoitov 
> > > Signed-off-by: Michal Hocko 
> > 
> > Nack until the patches 1 and 2 are reversed.
> 
> I do not insist on ordering. The thing is that it shouldn't matter all
> that much. Or are you worried about bisectability?

This patch 1 strongly depends on patch 2 !
Therefore order matters.
The patch 1 by itself is broken.
The commit log is saying
'(ab)used KMALLOC_SHIFT_MAX for that purpose .. use KMALLOC_MAX_SIZE instead'
that is also incorrect. We cannot do that until KMALLOC_MAX_SIZE is fixed.
So please change the order and fix the commit log to say that KMALLOC_MAX_SIZE
is actually valid limit now.



Re: [kernel-hardening] Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
An idea I had which might be useful:

You could perhaps save two rounds in siphash_*u64.

The final word with the length (called "b" in your implementation)
only needs to be there if the input is variable-sized.

If every use of a given key is of a fixed-size input, you don't need
a length suffix.  When the input is an even number of words, that can
save you two rounds.

This requires an audit of callers (e.g. you have to use different
keys for IPv4 and IPv6 ISNs), but can save time.

(This is crypto 101; search "MD-strengthening" or see the remark on
p. 101 on Damgaard's 1989 paper "A design principle for hash functions" at
http://saluc.engr.uconn.edu/refs/algorithms/hashalg/damgard89adesign.pdf
but I'm sure that Ted, Jean-Philippe, and/or DJB will confirm if you'd
like.)
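
The saving can be counted directly. In SipHash-c-d each 64-bit block costs
c compression rounds and finalization costs d rounds; an input that is a
whole number of words normally gets one extra block carrying the length.
A small sketch of that arithmetic (demo_siphash_rounds is a made-up helper,
not part of any patch):

```c
/*
 * Total SipHash-c-d rounds for an input of nwords full 64-bit words,
 * with or without the trailing block that carries the length.
 */
static unsigned int demo_siphash_rounds(unsigned int nwords, int with_length_word,
					unsigned int c, unsigned int d)
{
	unsigned int blocks = nwords + (with_length_word ? 1 : 0);

	return c * blocks + d;
}
```

For SipHash-2-4 over two words this is 10 rounds with the length block and
8 without it, i.e. the two rounds mentioned above.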

Jason A. Donenfeld wrote:
> Oh, okay, that is exactly what I thought was going on. I just thought
> you were implying that jiffies could be moved inside the hash, which
> then confused my understanding of how things should be. In any case,
> thanks for the explanation.

No, the rekeying procedure is cleverer.

The thing is, all that matters is that the ISN increments fast enough,
but does not wrap too soon.

It *is* permitted to change the random base, as long as it only
increases, and slower than the timestamp does.

So what you do is every few minutes, you increment the high 4 bits of the
random base and change the key used to generate the low 28 bits.

The base used for any particular host might change from 0x10000000
to 0x2fffffff, or from 0x1fffffff to 0x20000000, but either way, it's
increasing, and not too fast.

This has the downside that an attacker can see 4 bits of the base,
so only needs to send 2^28 = 256 MB to flood the connection,
but the upside that the key used to generate the low bits changes
faster than it can be broken.
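
A minimal sketch of that layout, assuming a 32-bit base whose top 4 bits
are an epoch counter and whose low 28 bits come from a PRF keyed with that
epoch's key. demo_prf below is a toy mixer standing in for the real keyed
hash (it is NOT cryptographically secure), and all names are hypothetical.

```c
#include <stdint.h>

/* Toy stand-in for a keyed PRF such as SipHash; not secure. */
static uint32_t demo_prf(uint64_t key, uint32_t host)
{
	uint64_t x = key ^ ((uint64_t)host * 0x9e3779b97f4a7c15ULL);

	x ^= x >> 33;
	x *= 0xff51afd7ed558ccdULL;
	x ^= x >> 33;
	return (uint32_t)x;
}

/*
 * epoch:     counter bumped every few minutes
 * epoch_key: the key in use during that epoch
 */
static uint32_t demo_isn_base(uint32_t epoch, uint64_t epoch_key, uint32_t host)
{
	uint32_t high = (epoch & 0xfu) << 28;              /* visible to attacker */
	uint32_t low = demo_prf(epoch_key, host) & 0x0fffffffu;

	return high | low;
}
```

Whatever the two keys produce in the low bits, the base for epoch n+1 is
ahead of the base for epoch n by somewhere between 1 and 2^29 - 1 in
sequence-number arithmetic, including across the wrap from epoch 15 to 0.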


Re: [PATCH v6 3/5] random: use SipHash in place of MD5

2016-12-16 Thread Jason A. Donenfeld
Hi Andy,

> Agreed.  A simpler construction would be:
>
> chaining++;
> output = H(chaining, secret);
>
> And this looks a whole lot like Ted's ChaCha20 construction.

In that simpler construction with counter-based secret rekeying and in
Ted's ChaCha20 construction, the issue is that every X hits, there's a
call to get_random_bytes, which has variable performance and entropy
issues. With my time-based approach, in the event that somebody runs
` :(){ :|:& };:`, system performance doesn't suffer from ASLR making
repeated calls to get_random_bytes every 128 or so process creations.
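
As a rough illustration of the time-based variant: the secret is only
refreshed when a fixed interval has elapsed, no matter how many outputs
are requested in between. The interval, the struct and
demo_fresh_secret() (a stand-in for get_random_bytes()) are assumptions
made for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_REKEY_INTERVAL 300  /* assumed: ticks between rekeys */

struct demo_rng {
	uint64_t secret;
	uint64_t last_rekey;   /* tick of the last rekey */
	unsigned int rekeys;   /* how often fresh randomness was pulled */
};

/* Hypothetical stand-in for get_random_bytes(); just counts calls. */
static uint64_t demo_fresh_secret(struct demo_rng *r)
{
	r->rekeys++;
	return 0x9e3779b97f4a7c15ULL * (r->rekeys + 1);
}

/* Rekey only if at least DEMO_REKEY_INTERVAL ticks have passed. */
static bool demo_maybe_rekey(struct demo_rng *r, uint64_t now)
{
	if (now - r->last_rekey < DEMO_REKEY_INTERVAL)
		return false;
	r->secret = demo_fresh_secret(r);
	r->last_rekey = now;
	return true;
}
```

A burst of a thousand calls within one interval never reaches
demo_fresh_secret(), whereas a counter-based scheme would hit it several
times in the same burst.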

Jason


Re: Soft lockup in inet_put_port on 4.6

2016-12-16 Thread Tom Herbert
On Fri, Dec 16, 2016 at 2:08 PM, Josef Bacik  wrote:
> On Fri, Dec 16, 2016 at 10:21 AM, Josef Bacik  wrote:
>>
>> On Fri, Dec 16, 2016 at 9:54 AM, Josef Bacik  wrote:
>>>
>>> On Thu, Dec 15, 2016 at 7:07 PM, Hannes Frederic Sowa
>>>  wrote:

 Hi Josef,

 On 15.12.2016 19:53, Josef Bacik wrote:
>
>  On Tue, Dec 13, 2016 at 6:32 PM, Tom Herbert 
> wrote:
>>
>>  On Tue, Dec 13, 2016 at 3:03 PM, Craig Gallek 
>>  wrote:
>>>
>>>   On Tue, Dec 13, 2016 at 3:51 PM, Tom Herbert 
>>>  wrote:

I think there may be some suspicious code in inet_csk_get_port. At
tb_found there is:

        if (((tb->fastreuse > 0 && reuse) ||
             (tb->fastreuseport > 0 &&
              !rcu_access_pointer(sk->sk_reuseport_cb) &&
              sk->sk_reuseport && uid_eq(tb->fastuid, uid))) &&
            smallest_size == -1)
                goto success;
        if (inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb, true)) {
                if ((reuse ||
                     (tb->fastreuseport > 0 &&
                      sk->sk_reuseport &&
                      !rcu_access_pointer(sk->sk_reuseport_cb) &&
                      uid_eq(tb->fastuid, uid))) &&
                    smallest_size != -1 && --attempts >= 0) {
                        spin_unlock_bh(&head->lock);
                        goto again;
                }
                goto fail_unlock;
        }

AFAICT there is redundancy in these two conditionals.  The same clause
is being checked in both: (tb->fastreuseport > 0 &&
!rcu_access_pointer(sk->sk_reuseport_cb) && sk->sk_reuseport &&
uid_eq(tb->fastuid, uid))) && smallest_size == -1. If this is true the
first conditional should be hit, goto done,  and the second will never
evaluate that part to true-- unless the sk is changed (do we need
READ_ONCE for sk->sk_reuseport_cb?).
>>>
>>>   That's an interesting point... It looks like this function also
>>>   changed in 4.6 from using a single local_bh_disable() at the beginning
>>>   with several spin_lock(&head->lock) to exclusively
>>>   spin_lock_bh(&head->lock) at each locking point.  Perhaps the full bh
>>>   disable variant was preventing the timers in your stack trace from
>>>   running interleaved with this function before?
>>
>>
>>  Could be, although dropping the lock shouldn't be able to affect the
>>  search state. TBH, I'm a little lost in reading the function, the
>>  SO_REUSEPORT handling is pretty complicated. For instance,
>>  rcu_access_pointer(sk->sk_reuseport_cb) is checked three times in that
>>  function and also in every call to inet_csk_bind_conflict. I wonder if
>>  we can simplify this under the assumption that SO_REUSEPORT is only
>>  allowed if the port number (snum) is explicitly specified.
>
>
>  Ok first I have data for you Hannes, here's the time distributions
>  before during and after the lockup (with all the debugging in place the
>  box eventually recovers).  I've attached it as a text file since it is
>  long.


 Thanks a lot!

>  Second is I was thinking about why we would spend so much time doing the
>  ->owners list, and obviously it's because of the massive amount of
>  timewait sockets on the owners list.  I wrote the following dumb patch
>  and tested it and the problem has disappeared completely.  Now I don't
>  know if this is right at all, but I thought it was weird we weren't
>  copying the soreuseport option from the original socket onto the twsk.
>  Is there a reason we aren't doing this currently?  Does this help
>  explain what is happening?  Thanks,


 The patch is interesting and a good clue, but I am immediately a bit
 concerned that we don't copy/tag the socket with the uid also to keep
 the security properties for SO_REUSEPORT. I have to think a bit more
 about this.

 We have seen hangs during connect. I am afraid this patch wouldn't help
 there while also guaranteeing uniqueness.
>>>
>>>
>>>
>>> Yeah so I looked at the code some more and actually my patch is really
>>> bad.  If sk2->sk_reuseport is set we'll look at sk2->sk_reuseport_cb, which
>>> is outside of the timewait sock, so that's definitely bad.
>>>
>>> But we should at least be 

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Jason A. Donenfeld
On Fri, Dec 16, 2016 at 11:13 PM, George Spelvin
 wrote:
> Remembering that on "real" machines it's full SipHash, then I'd say that
> 64-bit security + rekeying seems reasonable.

64-bit security for an RNG is not reasonable even with rekeying. No no
no. Considering we already have a massive speed-up here with the
secure version, there's zero reason to start weakening the security
because we're trigger happy with our benchmarks. No no no.


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Andy Lutomirski
On Fri, Dec 16, 2016 at 2:13 PM, George Spelvin
 wrote:
>> What should we do with get_random_int() and get_random_long()?  In
>> some cases it's being used in performance sensitive areas, and where
>> anti-DoS protection might be enough.  In others, maybe not so much.
>
> This is tricky.  The entire get_random_int() structure is an abuse of
> the hash function and will need to be thoroughly rethought to convert
> it to SipHash.  Remember, SipHash's security goals are very different
> from MD5, so there's no obvious way to do the conversion.
>
> (It's *documented* as "not cryptographically secure", but we know
> where that goes.)
>
>> If we rekeyed the secret used by get_random_int() and
>> get_random_long() frequently (say, every minute or every 5 minutes),
>> would that be sufficient for current and future users of these
>> interfaces?
>
> Remembering that on "real" machines it's full SipHash, then I'd say that
> 64-bit security + rekeying seems reasonable.
>
> The question is, the idea has recently been floated to make hsiphash =
> SipHash-1-3 on 64-bit machines.  Is *that* okay?
>
>
> The annoying thing about the currently proposed patch is that the *only*
> chaining is the returned value.  What I'd *like* to do is the same
> pattern as we do with md5, and remember v[0..3] between invocations.
> But there's no partial SipHash primitive; we only get one word back.
>
> Even
> *chaining += ret = siphash_3u64(...)
>
> would be an improvement.

This is almost exactly what I suggested in my email on the other
thread from a few seconds ago :)

--Andy


Re: [PATCH v6 3/5] random: use SipHash in place of MD5

2016-12-16 Thread Andy Lutomirski
On Fri, Dec 16, 2016 at 1:45 PM, Jason A. Donenfeld  wrote:
> Hi Andy,
>
> On Fri, Dec 16, 2016 at 10:31 PM, Andy Lutomirski  wrote:
>> I think it would be nice to try to strengthen the PRNG construction.
>> FWIW, I'm not an expert in PRNGs, and there's fairly extensive
>> literature, but I can at least try.
>
> In an effort to keep this patchset initially as uncontroversial as
> possible, I kept the same construction as before and just swapped
> out slow MD5 for fast Siphash. Additionally, the function
> documentation says that it isn't cryptographically secure. But in the
> end I certainly agree with you; we should most definitely improve
> things, and seeing the eyeballs now on this series, I think we now
> might have a mandate to do so.
>
>> 1. A one-time leak of memory contents doesn't ruin security until
>> reboot.  This is especially valuable across suspend and/or hibernation.
>
> Ted and I were discussing this in another thread, and it sounds like
> he wants the same thing. I'll add re-generation of the secret every
> once in a while. Perhaps time-based makes more sense than
> counter-based for rekeying frequency?

Counter may be faster -- you don't need to read a timer.  Lots of CPUs
are surprisingly slow at timing.  OTOH jiffies are fast.

>
>> 2. An attack with a low work factor (2^64?) shouldn't break the scheme
>> until reboot.
>
> It won't. The key is 128-bits.

Whoops, I thought the key was 64-bit...

>
>> This is effectively doing:
>>
>> output = H(prev_output, weak "entropy", per-boot secret);
>>
>> One unfortunate downside is that, if used in a context where an
>> attacker can see a single output, the attacker learns the chaining
>> value.  If the attacker can guess the entropy, then, with 2^64 work,
>> they learn the secret, and they can predict future outputs.
>
> No, the secret is 128-bits, which isn't feasibly guessable. The secret
> also isn't part of the hash, but rather is the generator of the hash
> function. A keyed hash (a PRF) is a bit of a different construction
> than just hashing a secret value into a hash function.

I was thinking in the random oracle model, in which case the whole
function is just a PRF that takes a few input parameters, one of which
is a key.

>
>> Second, change the mode so that an attacker doesn't learn so much
>> internal state.  For example:
>>
>> output = H(old_chain, entropy, secret);
>> new_chain = old_chain + entropy + output;
>
> Happy to make this change, with making the chaining value additive
> rather than a replacement.
>
> However, I'm not sure adding entropy to the new_chain makes a
> difference. That entropy is basically just the cycle count plus the
> jiffies count. If an attacker can already guess them, then adding them
> again to the chaining value doesn't really add much.

Agreed.  A simpler construction would be:

chaining++;
output = H(chaining, secret);

And this looks a whole lot like Ted's ChaCha20 construction.

The benefit of my construction is that (in the random oracle model,
assuming my intuition is right), if an attacker misses ~128 samples
and entropy has at least one bit of independent min-entropy per
sample, then the attacker needs ~2^128 work to brute-force the
chaining value even if the attacker knew both the original chaining
value and the secret.

--Andy
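
For illustration, the simpler counter-style construction mentioned above (chaining++; output = H(chaining, secret)) could be sketched in C roughly as follows. This is only a sketch under stated assumptions: toy_prf() and mix64() are hypothetical stand-ins invented for this example, not SipHash or any kernel function, and the secret is shortened to 64 bits for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 64-bit mixer (splitmix64-style finalizer); NOT a secure PRF. */
static uint64_t mix64(uint64_t x)
{
    x ^= x >> 30;
    x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27;
    x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

/* Stand-in for H(chaining, secret); a real version would use a keyed PRF. */
static uint64_t toy_prf(uint64_t chaining, uint64_t secret)
{
    return mix64(mix64(chaining ^ secret) + secret);
}

struct ctr_rng {
    uint64_t chaining; /* plain counter, advanced on every call */
    uint64_t secret;   /* assumption: 64 bits here; the thread uses 128 */
};

/* chaining++; output = H(chaining, secret); */
static uint64_t ctr_rng_next(struct ctr_rng *rng)
{
    assert(rng != NULL);
    rng->chaining++;
    return toy_prf(rng->chaining, rng->secret);
}
```

In this form the output never feeds back into the state, so observing outputs reveals nothing about the counter beyond what the PRF itself leaks.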


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
> What should we do with get_random_int() and get_random_long()?  In
> some cases it's being used in performance sensitive areas, and where
> anti-DoS protection might be enough.  In others, maybe not so much.

This is tricky.  The entire get_random_int() structure is an abuse of
the hash function and will need to be thoroughly rethought to convert
it to SipHash.  Remember, SipHash's security goals are very different
from MD5, so there's no obvious way to do the conversion.

(It's *documented* as "not cryptographically secure", but we know
where that goes.)

> If we rekeyed the secret used by get_random_int() and
> get_random_long() frequently (say, every minute or every 5 minutes),
> would that be sufficient for current and future users of these
> interfaces?

Remembering that on "real" machines it's full SipHash, then I'd say that
64-bit security + rekeying seems reasonable.

The question is, the idea has recently been floated to make hsiphash =
SipHash-1-3 on 64-bit machines.  Is *that* okay?


The annoying thing about the currently proposed patch is that the *only*
chaining is the returned value.  What I'd *like* to do is the same
pattern as we do with md5, and remember v[0..3] between invocations.
But there's no partial SipHash primitive; we only get one word back.

Even
*chaining += ret = siphash_3u64(...)

would be an improvement.

Although we could do something like

c0 = chaining[0];
chaining[0] = c1 = chaining[1];

ret = hsiphash(c0, c1, ...)

chaining[1] = c0 + ret;
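
As a rough, self-contained sketch of the two-word chaining idea above: toy_prf() and mix64() below are hypothetical stand-ins for hsiphash() (they are not cryptographically secure and are not kernel APIs), and the entropy and key parameters are assumptions added for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 64-bit mixer (splitmix64-style finalizer); NOT a secure PRF. */
static uint64_t mix64(uint64_t x)
{
    x ^= x >> 30;
    x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27;
    x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

/* Stand-in for hsiphash(c0, c1, ...); assumption made for this sketch only. */
static uint64_t toy_prf(uint64_t c0, uint64_t c1, uint64_t entropy, uint64_t key)
{
    uint64_t x = mix64(c0 ^ key);

    x = mix64(x + c1);
    x = mix64(x ^ entropy);
    return mix64(x + key);
}

struct chain_state {
    uint64_t chaining[2];
};

/* One step: rotate c1 into slot 0, then fold c0 + ret into slot 1. */
static uint64_t chain_next(struct chain_state *st, uint64_t entropy, uint64_t key)
{
    uint64_t c0, c1, ret;

    assert(st != NULL);
    c0 = st->chaining[0];
    st->chaining[0] = c1 = st->chaining[1];

    ret = toy_prf(c0, c1, entropy, key);

    st->chaining[1] = c0 + ret;
    return ret;
}
```

The point of the rotation is that a single returned word no longer equals the whole chaining state: the caller sees ret, but the next call also depends on the old c1.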


Re: [PATCH v6 3/5] random: use SipHash in place of MD5

2016-12-16 Thread Jason A. Donenfeld
Hi Andy, Ted,

I've made the requested changes. Keys now rotate and are per-CPU
based. The chaining value is now additive instead of replacing.

DavidL suggested I lower the velocity of `git-send-email` triggers, so
if you'd like to take a look at this before I post v7, you can follow
along at my git tree here:

https://git.zx2c4.com/linux-dev/log/?h=siphash

Choose the commit entitled "random: use SipHash in place of MD5"

Thanks,
Jason


Re: [TSN RFC v2 5/9] Add TSN header for the driver

2016-12-16 Thread Richard Cochran
On Fri, Dec 16, 2016 at 06:59:09PM +0100, hen...@austad.us wrote:
> +/*
> + * List of current subtype fields in the common header of AVTPDU
> + *
> + * Note: AVTPDU is a remnant of the standards from when it was AVB.
> + *
> + * The list has been updated with the recent values from IEEE 1722, draft 16.
> + */
> +enum avtp_subtype {
> + TSN_61883_IIDC = 0, /* IEC 61883/IIDC Format */
> + TSN_MMA_STREAM, /* MMA Streams */
> + TSN_AAF,/* AVTP Audio Format */
> + TSN_CVF,/* Compressed Video Format */
> + TSN_CRF,/* Clock Reference Format */
> + TSN_TSCF,   /* Time-Synchronous Control Format */
> + TSN_SVF,/* SDI Video Format */
> + TSN_RVF,/* Raw Video Format */
> + /* 0x08 - 0x6D reserved */
> + TSN_AEF_CONTINOUS = 0x6e, /* AES Encrypted Format Continuous */
> + TSN_VSF_STREAM, /* Vendor Specific Format Stream */
> + /* 0x70 - 0x7e reserved */
> + TSN_EF_STREAM = 0x7f,   /* Experimental Format Stream */
> + /* 0x80 - 0x81 reserved */
> + TSN_NTSCF = 0x82,   /* Non Time-Synchronous Control Format */
> + /* 0x83 - 0xeb reserved */
> + TSN_ESCF = 0xec,/* ECC Signed Control Format */
> + TSN_EECF,   /* ECC Encrypted Control Format */
> + TSN_AEF_DISCRETE,   /* AES Encrypted Format Discrete */
> + /* 0xef - 0xf9 reserved */
> + TSN_ADP = 0xfa, /* AVDECC Discovery Protocol */
> + TSN_AECP,   /* AVDECC Enumeration and Control Protocol */
> + TSN_ACMP,   /* AVDECC Connection Management Protocol */
> + /* 0xfd reserved */
> + TSN_MAAP = 0xfe,/* MAAP Protocol */
> + TSN_EF_CONTROL, /* Experimental Format Control */
> +};

The kernel shouldn't be in the business of assembling media packets.

Thanks,
Richard


Re: Soft lockup in inet_put_port on 4.6

2016-12-16 Thread Josef Bacik

On Fri, Dec 16, 2016 at 10:21 AM, Josef Bacik wrote:
On Fri, Dec 16, 2016 at 9:54 AM, Josef Bacik wrote:
On Thu, Dec 15, 2016 at 7:07 PM, Hannes Frederic Sowa wrote:

Hi Josef,

On 15.12.2016 19:53, Josef Bacik wrote:
 On Tue, Dec 13, 2016 at 6:32 PM, Tom Herbert wrote:
 On Tue, Dec 13, 2016 at 3:03 PM, Craig Gallek wrote:
  On Tue, Dec 13, 2016 at 3:51 PM, Tom Herbert wrote:
  I think there may be some suspicious code in inet_csk_get_port. At
  tb_found there is:

          if (((tb->fastreuse > 0 && reuse) ||
               (tb->fastreuseport > 0 &&
                !rcu_access_pointer(sk->sk_reuseport_cb) &&
                sk->sk_reuseport && uid_eq(tb->fastuid, uid))) &&
              smallest_size == -1)
                  goto success;
          if (inet_csk(sk)->icsk_af_ops->bind_conflict(sk, tb, true)) {
                  if ((reuse ||
                       (tb->fastreuseport > 0 &&
                        sk->sk_reuseport &&
                        !rcu_access_pointer(sk->sk_reuseport_cb) &&
                        uid_eq(tb->fastuid, uid))) &&
                      smallest_size != -1 && --attempts >= 0) {
                          spin_unlock_bh(&head->lock);
                          goto again;
                  }
                  goto fail_unlock;
          }

  AFAICT there is redundancy in these two conditionals.  The same clause
  is being checked in both: (tb->fastreuseport > 0 &&
  !rcu_access_pointer(sk->sk_reuseport_cb) && sk->sk_reuseport &&
  uid_eq(tb->fastuid, uid))) && smallest_size == -1. If this is true the
  first conditional should be hit, goto done,  and the second will never
  evaluate that part to true-- unless the sk is changed (do we need
  READ_ONCE for sk->sk_reuseport_cb?).

  That's an interesting point... It looks like this function also
  changed in 4.6 from using a single local_bh_disable() at the beginning
  with several spin_lock(&head->lock) to exclusively
  spin_lock_bh(&head->lock) at each locking point.  Perhaps the full bh
  disable variant was preventing the timers in your stack trace from
  running interleaved with this function before?


 Could be, although dropping the lock shouldn't be able to affect the
 search state. TBH, I'm a little lost in reading this function; the
 SO_REUSEPORT handling is pretty complicated. For instance,
 rcu_access_pointer(sk->sk_reuseport_cb) is checked three times in that
 function and also in every call to inet_csk_bind_conflict. I wonder if
 we can simplify this under the assumption that SO_REUSEPORT is only
 allowed if the port number (snum) is explicitly specified.


 Ok first I have data for you Hannes, here's the time distributions
 before, during and after the lockup (with all the debugging in place the
 box eventually recovers).  I've attached it as a text file since it is
 long.


Thanks a lot!

 Second is I was thinking about why we would spend so much time doing the
 ->owners list, and obviously it's because of the massive amount of
 timewait sockets on the owners list.  I wrote the following dumb patch
 and tested it and the problem has disappeared completely.  Now I don't
 know if this is right at all, but I thought it was weird we weren't
 copying the soreuseport option from the original socket onto the twsk.
 Is there a reason we aren't doing this currently?  Does this help
 explain what is happening?  Thanks,


The patch is interesting and a good clue, but I am immediately a bit
concerned that we don't copy/tag the socket with the uid also to keep
the security properties for SO_REUSEPORT. I have to think a bit more
about this.

We have seen hangs during connect. I am afraid this patch wouldn't help
there while also guaranteeing uniqueness.



Yeah so I looked at the code some more and actually my patch is really
bad.  If sk2->sk_reuseport is set we'll look at sk2->sk_reuseport_cb,
which is outside of the timewait sock, so that's definitely bad.

But we should at least be setting it to 0 so that we don't do this
normally.  Unfortunately simply setting it to 0 doesn't fix the problem.
So for some reason having ->sk_reuseport set to 1 on a timewait socket
makes this problem non-existent, which is strange.

So back to the drawing board I guess.  I wonder if doing what Craig
suggested and batching the timewait timer expires so it hurts less would
accomplish the same results.  Thanks,

Wait no I lied, we access the sk->sk_reuseport_cb, not sk2's.  This is
the code


   if ((!reuse || !sk2->sk_reuse ||
   sk2->sk_state == TCP_LISTEN) &&
   (!reuseport || !sk2->sk_reuseport ||
rcu_access_pointer(sk->sk_reuseport_cb) ||
  

Re: [TSN RFC v2 0/9] TSN driver for the kernel

2016-12-16 Thread Richard Cochran
On Fri, Dec 16, 2016 at 06:59:04PM +0100, hen...@austad.us wrote:
> The driver is directed via ConfigFS as we need userspace to handle
> stream-reservation (MSRP), discovery and enumeration (IEEE 1722.1) and
> whatever other management is needed.

I complained about configfs before, but you didn't listen.

> 2 new fields in netdev_ops have been introduced, and the Intel
> igb-driver has been updated (as this is an AVB-capable NIC which is
> available as a PCI-e card).

The igb hacks show that you are on the wrong track.  We can and should
be able to support TSN without resorting to driver specific hacks and
module parameters.

> Before reading on - this is not even beta, but I'd really appreciate if
> people would comment on the overall architecture and perhaps provide
> some pointers to where I should improve/fix/update

As I said before about V1, this architecture stinks.  You appear to
have continued hacking along and posted the same design again.  Did
you even address any of the points I raised back then?

You are trying to put tons of code into the kernel that really belongs
in user space, and at the same time, you omit critical functions that
only the kernel can provide.

> There is at least one AVB-driver (the AV-part of TSN) in the kernel
> already.

And which driver is that?

Thanks,
Richard


Re: [PATCH 1/2] bpf: do not use KMALLOC_SHIFT_MAX

2016-12-16 Thread Michal Hocko
On Fri 16-12-16 10:02:10, Alexei Starovoitov wrote:
> On Thu, Dec 15, 2016 at 05:47:21PM +0100, Michal Hocko wrote:
> > From: Michal Hocko 
> > 
> > 01b3f52157ff ("bpf: fix allocation warnings in bpf maps and integer
> > overflow") has added checks for the maximum allocateable size. It
> > (ab)used KMALLOC_SHIFT_MAX for that purpose. While this is not incorrect
> > it is not very clean because we already have KMALLOC_MAX_SIZE for this
> > very reason so let's change both checks to use KMALLOC_MAX_SIZE instead.
> > 
> > Cc: Alexei Starovoitov 
> > Signed-off-by: Michal Hocko 
> 
> Nack until the patches 1 and 2 are reversed.

I do not insist on ordering. The thing is that it shouldn't matter all
that much. Or are you worried about bisectability?
-- 
Michal Hocko
SUSE Labs


Re: [PATCH v6 3/5] random: use SipHash in place of MD5

2016-12-16 Thread Jason A. Donenfeld
Hi Andy,

On Fri, Dec 16, 2016 at 10:31 PM, Andy Lutomirski  wrote:
> I think it would be nice to try to strengthen the PRNG construction.
> FWIW, I'm not an expert in PRNGs, and there's fairly extensive
> literature, but I can at least try.

In an effort to keep this patchset initially as uncontroversial as
possible, I kept the same construction as before and just swapped
out slow MD5 for fast Siphash. Additionally, the function
documentation says that it isn't cryptographically secure. But in the
end I certainly agree with you; we should most definitely improve
things, and seeing the eyeballs now on this series, I think we now
might have a mandate to do so.

> 1. A one-time leak of memory contents doesn't ruin security until
> reboot.  This is especially valuable across suspend and/or hibernation.

Ted and I were discussing this in another thread, and it sounds like
he wants the same thing. I'll add re-generation of the secret every
once in a while. Perhaps time-based makes more sense than
counter-based for rekeying frequency?

> 2. An attack with a low work factor (2^64?) shouldn't break the scheme
> until reboot.

It won't. The key is 128-bits.

> This is effectively doing:
>
> output = H(prev_output, weak "entropy", per-boot secret);
>
> One unfortunate downside is that, if used in a context where an
> attacker can see a single output, the attacker learns the chaining
> value.  If the attacker can guess the entropy, then, with 2^64 work,
> they learn the secret, and they can predict future outputs.

No, the secret is 128-bits, which isn't feasibly guessable. The secret
also isn't part of the hash, but rather is the generator of the hash
function. A keyed hash (a PRF) is a bit of a different construction
than just hashing a secret value into a hash function.

> Second, change the mode so that an attacker doesn't learn so much
> internal state.  For example:
>
> output = H(old_chain, entropy, secret);
> new_chain = old_chain + entropy + output;

Happy to make this change, with making the chaining value additive
rather than a replacement.

However, I'm not sure adding entropy to the new_chain makes a
difference. That entropy is basically just the cycle count plus the
jiffies count. If an attacker can already guess them, then adding them
again to the chaining value doesn't really add much.

Jason


[PATCH] isdn: Constify some function parameters

2016-12-16 Thread Kees Cook
From: Emese Revfy 

The coming initify gcc plugin expects const pointer types, and caught
some __printf arguments that weren't const yet. This fixes those.

Signed-off-by: Emese Revfy 
[kees: expanded commit message]
Signed-off-by: Kees Cook 
---
 drivers/isdn/hisax/config.c | 16 
 drivers/isdn/hisax/hisax.h  |  4 ++--
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/isdn/hisax/config.c b/drivers/isdn/hisax/config.c
index bf04d2a3cf4a..2d12c6ceeb89 100644
--- a/drivers/isdn/hisax/config.c
+++ b/drivers/isdn/hisax/config.c
@@ -659,7 +659,7 @@ int jiftime(char *s, long mark)
 
 static u_char tmpbuf[HISAX_STATUS_BUFSIZE];
 
-void VHiSax_putstatus(struct IsdnCardState *cs, char *head, char *fmt,
+void VHiSax_putstatus(struct IsdnCardState *cs, char *head, const char *fmt,
  va_list args)
 {
/* if head == NULL the fmt contains the full info */
@@ -669,23 +669,24 @@ void VHiSax_putstatus(struct IsdnCardState *cs, char *head, char *fmt,
u_char  *p;
isdn_ctrl   ic;
int len;
+   const u_char*data;
 
if (!cs) {
printk(KERN_WARNING "HiSax: No CardStatus for message");
return;
}
spin_lock_irqsave(&cs->statlock, flags);
-   p = tmpbuf;
if (head) {
+   p = tmpbuf;
p += jiftime(p, jiffies);
p += sprintf(p, " %s", head);
p += vsprintf(p, fmt, args);
*p++ = '\n';
*p = 0;
len = p - tmpbuf;
-   p = tmpbuf;
+   data = tmpbuf;
} else {
-   p = fmt;
+   data = fmt;
len = strlen(fmt);
}
if (len > HISAX_STATUS_BUFSIZE) {
@@ -699,13 +700,12 @@ void VHiSax_putstatus(struct IsdnCardState *cs, char *head, char *fmt,
if (i >= len)
i = len;
len -= i;
-   memcpy(cs->status_write, p, i);
+   memcpy(cs->status_write, data, i);
cs->status_write += i;
if (cs->status_write > cs->status_end)
cs->status_write = cs->status_buf;
-   p += i;
if (len) {
-   memcpy(cs->status_write, p, len);
+   memcpy(cs->status_write, data + i, len);
cs->status_write += len;
}
 #ifdef KERNELSTACK_DEBUG
@@ -729,7 +729,7 @@ void VHiSax_putstatus(struct IsdnCardState *cs, char *head, char *fmt,
}
 }
 
-void HiSax_putstatus(struct IsdnCardState *cs, char *head, char *fmt, ...)
+void HiSax_putstatus(struct IsdnCardState *cs, char *head, const char *fmt, ...)
 {
va_list args;
 
diff --git a/drivers/isdn/hisax/hisax.h b/drivers/isdn/hisax/hisax.h
index 6ead6314e6d2..338d0408b377 100644
--- a/drivers/isdn/hisax/hisax.h
+++ b/drivers/isdn/hisax/hisax.h
@@ -1288,9 +1288,9 @@ int jiftime(char *s, long mark);
 int HiSax_command(isdn_ctrl *ic);
 int HiSax_writebuf_skb(int id, int chan, int ack, struct sk_buff *skb);
 __printf(3, 4)
-void HiSax_putstatus(struct IsdnCardState *cs, char *head, char *fmt, ...);
+void HiSax_putstatus(struct IsdnCardState *cs, char *head, const char *fmt, ...);
 __printf(3, 0)
-void VHiSax_putstatus(struct IsdnCardState *cs, char *head, char *fmt, va_list args);
+void VHiSax_putstatus(struct IsdnCardState *cs, char *head, const char *fmt, va_list args);
 void HiSax_reportcard(int cardnr, int sel);
 int QuickHex(char *txt, u_char *p, int cnt);
 void LogFrame(struct IsdnCardState *cs, u_char *p, int size);
-- 
2.7.4


-- 
Kees Cook
Nexus Security


Re: [PATCH v6 3/5] random: use SipHash in place of MD5

2016-12-16 Thread Andy Lutomirski
On Thu, Dec 15, 2016 at 7:03 PM, Jason A. Donenfeld  wrote:
> -static DEFINE_PER_CPU(__u32 [MD5_DIGEST_WORDS], get_random_int_hash)
> -   __aligned(sizeof(unsigned long));
> +static DEFINE_PER_CPU(u64, get_random_int_chaining);
>

[...]

>  unsigned long get_random_long(void)
>  {
> -   __u32 *hash;
> unsigned long ret;
> +   u64 *chaining;
>
> if (arch_get_random_long(&ret))
> return ret;
>
> -   hash = get_cpu_var(get_random_int_hash);
> -
> -   hash[0] += current->pid + jiffies + random_get_entropy();
> -   md5_transform(hash, random_int_secret);
> -   ret = *(unsigned long *)hash;
> -   put_cpu_var(get_random_int_hash);
> -
> +   chaining = &get_cpu_var(get_random_int_chaining);
> +   ret = *chaining = siphash_3u64(*chaining, jiffies, random_get_entropy() +
> +  current->pid, random_int_secret);
> +   put_cpu_var(get_random_int_chaining);
> return ret;
>  }

I think it would be nice to try to strengthen the PRNG construction.
FWIW, I'm not an expert in PRNGs, and there's fairly extensive
literature, but I can at least try.  Here are some properties I'd
like:

1. A one-time leak of memory contents doesn't ruin security until
reboot.  This is especially valuable across suspend and/or hibernation.

2. An attack with a low work factor (2^64?) shouldn't break the scheme
until reboot.


This is effectively doing:

output = H(prev_output, weak "entropy", per-boot secret);

One unfortunate downside is that, if used in a context where an
attacker can see a single output, the attacker learns the chaining
value.  If the attacker can guess the entropy, then, with 2^64 work,
they learn the secret, and they can predict future outputs.

I would advocate adding two types of improvements.  First, re-seed it
every now and then (every 128 calls?) by just replacing both the
chaining value and the percpu secret with fresh CSPRNG output.
Second, change the mode so that an attacker doesn't learn so much
internal state.  For example:

output = H(old_chain, entropy, secret);
new_chain = old_chain + entropy + output;

This increases the effort needed to brute-force the internal state
from 2^64 to 2^128 (barring any weaknesses in the scheme).

Also, can we not call this get_random_int()?  get_random_int() sounds
too much like get_random_bytes(), and the latter is intended to be a
real CSPRNG.  Can we call it get_weak_random_int() or similar?

--Andy
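
A minimal C sketch of the second proposal above (output = H(old_chain, entropy, secret); new_chain = old_chain + entropy + output) might look like this. It assumes a hypothetical toy_prf() built on a splitmix64-style mixer purely as a placeholder for H; it is not SipHash, not secure, and the secret is shortened to 64 bits for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 64-bit mixer (splitmix64-style finalizer); NOT a secure PRF. */
static uint64_t mix64(uint64_t x)
{
    x ^= x >> 30;
    x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27;
    x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

/* Placeholder for H(old_chain, entropy, secret). */
static uint64_t toy_prf(uint64_t chain, uint64_t entropy, uint64_t secret)
{
    uint64_t x = mix64(chain ^ secret);

    x = mix64(x + entropy);
    return mix64(x ^ secret);
}

struct weak_rng {
    uint64_t chain;  /* chaining value, never returned directly */
    uint64_t secret; /* assumption: 64 bits here; the patch uses a 128-bit key */
};

static uint64_t weak_rng_next(struct weak_rng *rng, uint64_t entropy)
{
    uint64_t output;

    assert(rng != NULL);
    output = toy_prf(rng->chain, entropy, rng->secret);
    /* Additive update: an observer of 'output' alone does not learn 'chain'. */
    rng->chain = rng->chain + entropy + output;
    return output;
}
```

Compared with ret = *chaining = H(...), the returned value is only one summand of the new chaining value, so one observed output no longer discloses the full internal state.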


Re: [kernel-hardening] Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Jason A. Donenfeld
Hi George,

On Fri, Dec 16, 2016 at 10:25 PM, George Spelvin
 wrote:
> But yes, the sequence number is supposed to be (random base) + (timestamp).
> In the old days before Canter & Siegel when the internet was a nice place,
> people just used a counter that started at boot time.
>
> But then someone observed that I can start a connection to host X,
> see the sequence number it gives back to me, and thereby learn the
> sequence number it's using on its connections to host Y.
>
> And I can use that to inject forged data into an X-to-Y connection,
> without ever seeing a single byte of the traffic!  (If I *can* observe
> the traffic, of course, none of this makes the slightest difference.)
>
> So the random base was made a keyed hash of the endpoint identifiers.
> (Practically only the hosts matter, but generally the ports are thrown
> in for good measure.)  That way, the ISN that host X sends to me
> tells me nothing about the ISN it's using to talk to host Y.  Now the
> only way to inject forged data into the X-to-Y connection is to
> send 2^32 bytes, which is a little less practical.

Oh, okay, that is exactly what I thought was going on. I just thought
you were implying that jiffies could be moved inside the hash, which
then confused my understanding of how things should be. In any case,
thanks for the explanation.

Jason


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
Jason A. Donenfeld wrote:
> I saw that jiffies addition in there and was wondering what it was all
> about. It's currently added _after_ the siphash input, not before, to
> keep with how the old algorithm worked. I'm not sure if this is
> correct or if there's something wrong with that, as I haven't studied
> how it works. If that jiffies should be part of the siphash input and
> not added to the result, please tell me. Otherwise I'll keep things
> how they are to avoid breaking something that seems to be working.

Oh, geez, I didn't realize you didn't understand this code.

Full details at
https://en.wikipedia.org/wiki/TCP_sequence_prediction_attack

But yes, the sequence number is supposed to be (random base) + (timestamp).
In the old days before Canter & Siegel when the internet was a nice place,
people just used a counter that started at boot time.

But then someone observed that I can start a connection to host X,
see the sequence number it gives back to me, and thereby learn the
sequence number it's using on its connections to host Y.

And I can use that to inject forged data into an X-to-Y connection,
without ever seeing a single byte of the traffic!  (If I *can* observe
the traffic, of course, none of this makes the slightest difference.)

So the random base was made a keyed hash of the endpoint identifiers.
(Practically only the hosts matter, but generally the ports are thrown
in for good measure.)  That way, the ISN that host X sends to me
tells me nothing about the ISN it's using to talk to host Y.  Now the
only way to inject forged data into the X-to-Y connection is to
send 2^32 bytes, which is a little less practical.
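
The (random base) + (timestamp) structure described above can be sketched in C as follows. All names here (struct conn_id, toy_keyed_hash(), make_isn()) are hypothetical and exist only for this illustration; the hash is a non-cryptographic placeholder for the keyed hash, and the clock is passed in as a plain tick count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 64-bit mixer (splitmix64-style finalizer); NOT a secure PRF. */
static uint64_t mix64(uint64_t x)
{
    x ^= x >> 30;
    x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27;
    x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

/* Endpoint identifiers of one connection (illustrative layout). */
struct conn_id {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

/* Placeholder for the keyed hash of the endpoint identifiers. */
static uint32_t toy_keyed_hash(const struct conn_id *id, uint64_t key)
{
    uint64_t x = key;

    x = mix64(x ^ id->saddr);
    x = mix64(x ^ id->daddr);
    x = mix64(x ^ (((uint64_t)id->sport << 16) | id->dport));
    return (uint32_t)x;
}

/* ISN = per-connection secret base + clock; the sum wraps modulo 2^32. */
static uint32_t make_isn(const struct conn_id *id, uint64_t key,
                         uint32_t clock_ticks)
{
    assert(id != NULL);
    return toy_keyed_hash(id, key) + clock_ticks;
}
```

Because the clock is added outside the hash, ISNs for one connection tuple still advance monotonically with time, while the base for a different tuple is unrelated unless the key is known.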


Re: [kernel-hardening] Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Hannes Frederic Sowa
On Fri, Dec 16, 2016, at 22:01, Jason A. Donenfeld wrote:
> Yes, on x86-64. But on i386 chacha20 incurs nearly the same kind of
> slowdown as siphash, so I expect the comparison to be more or less
> equal. There's another thing I really didn't like about your chacha20
> approach which is that it uses the /dev/urandom pool, which means
> various things need to kick in in the background to refill this.
> Additionally, having to refill the buffered chacha output every 32 or
> so longs isn't nice. These things together make for inconsistent and
> hard to understand general operating system performance, because
> get_random_long is called at every process startup for ASLR. So, in
> the end, I believe there's another reason for going with the siphash
> approach: deterministic performance.

*Cough*, so from where do you generate your key for siphash if called
early from ASLR?

Bye,
Hannes


Re: [kernel-hardening] Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Jason A. Donenfeld
Hi Daniel,

On Fri, Dec 16, 2016 at 9:44 PM, Daniel Micay  wrote:
> On Fri, 2016-12-16 at 11:47 -0800, Tom Herbert wrote:
>>
>> That's about 3x of jhash speed (7 nsecs). So that might be closer
>> to a more palatable replacement for jhash. Do we lose any security
>> advantages with halfsiphash?
>
> Have you tested a lower round SipHash? Probably best to stick with the
> usual construction for non-DoS mitigation, but why not try SipHash 1-3,
> 1-2, etc. for DoS mitigation?
>
> Rust and Swift both went with SipHash 1-3 for hash tables.

Maybe not a bad idea.

SipHash2-4 for MD5 replacement, as we've done so far. This is when we
actually want things to be secure (and fast).

And then HalfSipHash1-3 for certain jhash replacements. This is for
when we're talking only about DoS or sort of just joking about
security, and want things to be very competitive with jhash. (Of
course for 64-bit we'd use SipHash1-3 instead of HalfSipHash for the
speedup.)

I need to think on this a bit more, but preliminarily, I guess this
would be maybe okay...

George, JP - what do you think?

Jason


Re: [kernel-hardening] Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Jason A. Donenfeld
Hi Ted,

On Fri, Dec 16, 2016 at 9:43 PM, Theodore Ts'o  wrote:
> What should we do with get_random_int() and get_random_long()?  In
> some cases it's being used in performance sensitive areas, and where
> anti-DoS protection might be enough.  In others, maybe not so much.
>
> If we rekeyed the secret used by get_random_int() and
> get_random_long() frequently (say, every minute or every 5 minutes),
> would that be sufficient for current and future users of these
> interfaces?

get_random_int() and get_random_long() should quite clearly use
SipHash with its secure 128-bit key and not HalfSipHash with its
64-bit key. HalfSipHash is absolutely insufficient for this use case.
Remember, we're already an order of magnitude or more faster than
md5...

With regard to periodic rekeying... since the secret is 128-bits, I
believe this is likely sufficient for _not_ rekeying. There's also the
chaining variable, to tie together invocations of the function. If
you'd prefer, instead of the chaining variable, we could use some
siphash output to mutate the original key, but I don't think this
approach is actually better and might introduce vulnerabilities. In my
opinion chaining+128bitkey is sufficient. On the other hand, rekeying
every X minutes is 3 or 4 lines of code. If you want (just say so),
I'll add this to my next revision.
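
To give a feel for how small time-based rekeying could be, here is a hedged userspace sketch. The names (struct rekey_state, maybe_rekey(), fill_key_fn, demo_fill_key()) and the 300-second interval are assumptions for illustration only; in the kernel the key would presumably come from get_random_bytes() and the clock from jiffies, and locking is ignored here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REKEY_INTERVAL_SECS 300 /* assumption: "every 5 minutes" */

struct rekey_state {
    uint64_t key[2];      /* 128-bit secret */
    uint64_t last_rekey;  /* time of last rekey, in seconds */
    unsigned int rekeys;  /* bookkeeping for this example only */
};

/* Hypothetical callback that fills the key with fresh CSPRNG output. */
typedef void (*fill_key_fn)(uint64_t key[2]);

/* Replace the secret once the interval has elapsed; otherwise do nothing. */
static void maybe_rekey(struct rekey_state *st, uint64_t now_secs,
                        fill_key_fn fill)
{
    assert(st != NULL && fill != NULL);
    if (now_secs - st->last_rekey >= REKEY_INTERVAL_SECS) {
        fill(st->key);
        st->last_rekey = now_secs;
        st->rekeys++;
    }
}

/* Demo filler, NOT random: just makes each new key distinguishable. */
static void demo_fill_key(uint64_t key[2])
{
    static uint64_t n;

    n++;
    key[0] = n;
    key[1] = ~n;
}
```

A counter-based variant would compare a call count instead of now_secs, which avoids reading a timer on the fast path.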

You asked about the security requirements of these functions. The
comment says they're not cryptographically secure. And right now with
MD5 they're not. So the expectations are pretty low. Moving to siphash
adds some cryptographic security, certainly. Moving to siphash plus
rekeying adds a bit more. Of course, on recent x86, RDRAND is used
instead, so the cryptographic strength then depends on the thickness
of your tinfoil hat. So probably we shouldn't change what we advertise
these functions provide, even though we're certainly improving them
performance-wise and security-wise.

> P.S.  I'll note that my performance figures when testing changes to
> get_random_int() were done on a 32-bit x86; Jason, I'm guessing your
> figures were using a 64-bit x86 system?  I haven't tried 32-bit ARM
> or smaller CPUs (e.g., mips, et al.) that might be more likely to be
> used on IoT devices, but I'm worried about those too, of course.

Yes, on x86-64. But on i386 chacha20 incurs nearly the same kind of
slowdown as siphash, so I expect the comparison to be more or less
equal. There's another thing I really didn't like about your chacha20
approach which is that it uses the /dev/urandom pool, which means
various things need to kick in in the background to refill this.
Additionally, having to refill the buffered chacha output every 32 or
so longs isn't nice. These things together make for inconsistent and
hard to understand general operating system performance, because
get_random_long is called at every process startup for ASLR. So, in
the end, I believe there's another reason for going with the siphash
approach: deterministic performance.

Jason


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Tom Herbert
On Fri, Dec 16, 2016 at 12:41 PM, George Spelvin
 wrote:
> Tom Herbert wrote:
>> Tested this. Distribution and avalanche effect are still good. Speed
>> wise I see about a 33% improvement over siphash (20 nsecs/op versus 32
>> nsecs). That's about 3x of jhash speed (7 nsecs). So that might be closer
>> to a more palatable replacement for jhash. Do we lose any security
>> advantages with halfsiphash?
>
> What are you testing on?  And what input size?  And does "33% improvement"
> mean 4/3 the rate and 3/4 the time?  Or 2/3 the time and 3/2 the rate?
>
Sorry, that is over an IPv4 tuple. Intel(R) Xeon(R) CPU E5-2660 0 @
2.20GHz. Recoded the function I was using to look more like the 64-bit
version and yes it is indeed slower.

> These are very odd results.  On a 64-bit machine, SipHash should be the
> same speed per round, and faster because it hashes more data per round.
> (Unless you're hitting some unexpected cache/decode effect due to REX
> prefixes.)
>
> On a 32-bit machine (other than ARM, where your results might make sense,
> or maybe if you're hashing large amounts of data), the difference should
> be larger.
>
> And yes, there is a *significant* security loss.  SipHash is 128 bits
> ("don't worry about it").  hsiphash is 64 bits, which is known breakable
> ("worry about it"), so we have to do a careful analysis of the cost of
> a successful attack.
>
> As mentioned in the e-mails that just flew by, hsiphash is intended
> *only* for 32-bit machines which bog down on full SipHash.  On all 64-bit
> machines, it will be implemented as an alias for SipHash and the security
> concerns will Just Go Away.
>
> The place where hsiphash is expected to make a big difference is 32-bit
> x86.  If you only see 33% difference with "gcc -m32", I'm going to be
> very confused.


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Jason A. Donenfeld
On Fri, Dec 16, 2016 at 9:17 PM, George Spelvin
 wrote:
> My (speaking generally; I should walk through every hash table you've
> converted) opinion is that:
>
> - Hash tables, even network-facing ones, can all use hsiphash as long
>   as an attacker can only see collisions, i.e. ((H(x) ^ H(y)) & bits) ==
>   0, and the consequence of a successful attack is only more collisions
>   (timing).  While the attack is only 2x the cost (two hashes rather than
>   one to test a key), the knowledge of the collision is statistical,
>   especially for network attackers, which raises the cost of guessing
>   beyond an even more brute-force attack.
> - When the hash value is directly visible (e.g. included in a network
>   packet), full SipHash should be the default.
> - Syncookies *could* use hsiphash, especially as there are
>   two keys in there.  Not sure if we need the performance.
> - For TCP ISNs, I'd prefer to use full SipHash.  I know this is
>   a very hot path, and if that's a performance bottleneck,
>   we can work harder on it.
>
> In particular, TCP ISNs *used* to rotate the key periodically,
> limiting the time available to an attacker to perform an
> attack before the secret goes stale and is useless.  commit
> 6e5714eaf77d79ae1c8b47e3e040ff5411b717ec upgraded to md5 and dropped
> the key rotation.

While I agree with this analysis for the most part, I do
think we should use SipHash and not HalfSipHash for syncookies.
Although the security risk is lower than with sequence numbers, it
previously used full MD5 for this, which means performance is not
generally a bottleneck and we'll get massive speedups no matter what,
whether using SipHash or HalfSipHash. In addition, using SipHash means
that the 128-bit key gives a larger margin and can be safe long-term.
So, I think we should err on the side of caution and stick with
SipHash in all cases in which we're upgrading from MD5.

In other words, only current jhash users should be potentially
eligible for hsiphash.


> Current code uses a 64 ns tick for the ISN, so it counts 2^24 per second.
> (32 bits wraps every 4.6 minutes.)  A 4-bit counter and 28-bit hash
> (or even 3+29) would work as long as the key is regenerated no more
> than once per minute.  (Just using the 4.6-minute ISN wrap time is the
> obvious simple implementation.)
>
> (Of course, I defer to DaveM's judgement on all network-related issues.)

I saw that jiffies addition in there and was wondering what it was all
about. It's currently added _after_ the siphash input, not before, to
keep with how the old algorithm worked. I'm not sure if this is
correct or if there's something wrong with that, as I haven't studied
how it works. If that jiffies should be part of the siphash input and
not added to the result, please tell me. Otherwise I'll keep things
how they are to avoid breaking something that seems to be working.
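
For reference, the timing figures in the quoted paragraph can be checked
with a few lines of C. This is only a sketch of the arithmetic; the helper
names are made up for illustration and are not kernel functions. A 64 ns
tick gives 15,625,000 ticks per second (a little under 2^24), and a 32-bit
counter at that rate wraps after roughly 275 seconds, i.e. about 4.6
minutes.

```c
#include <stdint.h>

/* Ticks per second for a clock that advances once every tick_ns ns. */
static uint64_t ticks_per_second(uint64_t tick_ns)
{
	return 1000000000ULL / tick_ns;
}

/* Seconds until a counter of the given width (< 64 bits) wraps. */
static double wrap_seconds(unsigned int bits, uint64_t tick_ns)
{
	return (double)(1ULL << bits) / (double)ticks_per_second(tick_ns);
}
```

Calling wrap_seconds(32, 64) and dividing by 60 gives the wrap time in
minutes.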


Re: [net-next PATCH v6 0/5] XDP for virtio_net

2016-12-16 Thread Michael S. Tsirkin
On Fri, Dec 16, 2016 at 01:20:02PM -0500, David Miller wrote:
> From: "Michael S. Tsirkin" 
> Date: Fri, 16 Dec 2016 01:17:44 +0200
> 
> > OK, I think we can queue this for -next.
> > 
> > It's fairly limited in the kind of hardware supported, we can and
> > probably should extend it further with time.
> > 
> > Acked-by: Michael S. Tsirkin 
> 
> Michael, thanks for reviewing.
> 
> Since the substance of this patch series has honestly been ready since
> before the merge window, would you mind if I send this to Linus now?
> 
> That's what I was hoping I would be able to do.
> 
> Thanks again.

ACK, no problem.
BTW in case people wonder, I'll be offline for a couple of weeks.
This might delay review of some patches a bit.

-- 
MST


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Jason A. Donenfeld
Hi JP,

On Fri, Dec 16, 2016 at 2:22 PM, Jean-Philippe Aumasson
 wrote:
> It needs some basic security review, which I'll try to do next week (check for
> security margin, optimality of rotation counts, etc.). But after a lot of
> experience with this kind of construction (BLAKE, SipHash, NORX), I'm
> confident it will be safe as it is.

I've implemented it in my siphash kernel branch:

https://git.zx2c4.com/linux-dev/log/?h=siphash

It's the commit that has "HalfSipHash" in the log message. As the
structure is nearly identical to SipHash, there wasn't a lot to
change, and so the same implementation strategy exists for each.

When you've finished your security review and feel good about it, some
test vectors using the same formula (key={0x03020100, 0x07060504},
input={0x0, 0x1, 0x2, 0x3...}, output=test_vectors) would be nice for
verification.
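
To make the formula concrete, here is a minimal sketch of how such inputs
could be generated. It assumes (as in the SipHash reference test) that the
key is the byte sequence 00 01 02 ... loaded as little-endian 32-bit
words, and that the i-th message is the first i bytes of the same
sequence. The helper names are hypothetical, and the hash itself is
deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill buf with 00 01 02 ... (len bytes). */
static void fill_sequential(uint8_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		buf[i] = (uint8_t)i;
}

/* Load a 32-bit little-endian word from a byte buffer. */
static uint32_t load_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Key word idx (0 or 1) of the 64-bit key built from bytes 00..07. */
static uint32_t sequential_key_word(size_t idx)
{
	uint8_t key[8];

	fill_sequential(key, sizeof(key));
	return load_le32(key + 4 * idx);
}
```

With this layout the two key words come out as 0x03020100 and 0x07060504,
matching the key written above.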

Jason


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Theodore Ts'o
On Fri, Dec 16, 2016 at 03:17:39PM -0500, George Spelvin wrote:
> > That's a nice analysis. Might one conclude from that that hsiphash is
> > not useful for our purposes? Or does it still remain useful for
> > network facing code?
> 
> I think for attacks where the threat is a DoS, it's usable.  The point
> is you only have to raise the cost to equal that of a packet flood.
> (Just like in electronic warfare, the best you can possibly do is force
> the enemy to use broadband jamming.)
> 
> Hash collision attacks just aren't that powerful.  The original PoC
> was against an application that implemented a hard limit on hash chain
> length as a DoS defense, which the attack then exploited to turn it into
> a hard DoS.

What should we do with get_random_int() and get_random_long()?  In
some cases it's being used in performance sensitive areas, and where
anti-DoS protection might be enough.  In others, maybe not so much.

If we rekeyed the secret used by get_random_int() and
get_random_long() frequently (say, every minute or every 5 minutes),
would that be sufficient for current and future users of these
interfaces?

- Ted

P.S.  I'll note that my performance figures when testing changes to
get_random_int() were done on a 32-bit x86; Jason, I'm guessing your
figures were using a 64-bit x86 system?  I haven't tried 32-bit ARM
or smaller CPUs (e.g., mips, et al.) that might be more likely to be
used on IoT devices, but I'm worried about those too, of course.



Re: [kernel-hardening] Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Daniel Micay
On Fri, 2016-12-16 at 11:47 -0800, Tom Herbert wrote:
> 
> That's about 3x of jhash speed (7 nsecs). So that might be closer
> to a more palatable replacement for jhash. Do we lose any security
> advantages with halfsiphash?

Have you tested a lower round SipHash? Probably best to stick with the
usual construction for non-DoS mitigation, but why not try SipHash 1-3,
1-2, etc. for DoS mitigation?

Rust and Swift both went with SipHash 1-3 for hash tables.
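
For anyone who wants to experiment with reduced-round variants, the
following is a userspace sketch of SipHash with the round counts as
parameters. It is not the kernel implementation from this patch series;
siphash_cd() and siphash_cd_sequential() are names invented for the
example. With c=2, d=4 it follows the published SipHash-2-4 definition, and
c=1, d=3 gives the SipHash-1-3 variant mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

#define ROTL64(x, b) (((x) << (b)) | ((x) >> (64 - (b))))

/* Load a 64-bit little-endian word from a byte buffer. */
static uint64_t load_le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* One SipRound over the four 64-bit state words. */
static void sipround(uint64_t v[4])
{
	v[0] += v[1]; v[1] = ROTL64(v[1], 13); v[1] ^= v[0]; v[0] = ROTL64(v[0], 32);
	v[2] += v[3]; v[3] = ROTL64(v[3], 16); v[3] ^= v[2];
	v[0] += v[3]; v[3] = ROTL64(v[3], 21); v[3] ^= v[0];
	v[2] += v[1]; v[1] = ROTL64(v[1], 17); v[1] ^= v[2]; v[2] = ROTL64(v[2], 32);
}

/* SipHash-c-d: c compression rounds per block, d finalization rounds. */
static uint64_t siphash_cd(int c, int d, const uint8_t key[16],
			   const uint8_t *in, size_t len)
{
	uint64_t k0 = load_le64(key), k1 = load_le64(key + 8);
	uint64_t v[4] = {
		k0 ^ 0x736f6d6570736575ULL, k1 ^ 0x646f72616e646f6dULL,
		k0 ^ 0x6c7967656e657261ULL, k1 ^ 0x7465646279746573ULL,
	};
	uint64_t last = (uint64_t)len << 56;	/* length byte goes on top */
	size_t full = len / 8;

	for (size_t i = 0; i < full; i++) {
		uint64_t m = load_le64(in + 8 * i);

		v[3] ^= m;
		for (int r = 0; r < c; r++)
			sipround(v);
		v[0] ^= m;
	}

	/* Final block: the remaining 0..7 bytes plus the length byte. */
	for (size_t i = 0; i < (len & 7); i++)
		last |= (uint64_t)in[8 * full + i] << (8 * i);
	v[3] ^= last;
	for (int r = 0; r < c; r++)
		sipround(v);
	v[0] ^= last;

	v[2] ^= 0xff;
	for (int r = 0; r < d; r++)
		sipround(v);
	return v[0] ^ v[1] ^ v[2] ^ v[3];
}

/* Hash bytes 00 01 02 ... (len of them, max 64) under key 00 01 ... 0f. */
static uint64_t siphash_cd_sequential(int c, int d, size_t len)
{
	uint8_t key[16], msg[64];

	if (len > sizeof(msg))
		len = sizeof(msg);
	for (size_t i = 0; i < sizeof(key); i++)
		key[i] = (uint8_t)i;
	for (size_t i = 0; i < len; i++)
		msg[i] = (uint8_t)i;
	return siphash_cd(c, d, key, msg, len);
}
```

Timing siphash_cd(1, 3, ...) against siphash_cd(2, 4, ...) on the same
inputs would give a rough idea of what the lower round counts buy, though
a generic loop like this will not match a hand-unrolled kernel version.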



Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Jason A. Donenfeld
On Fri, Dec 16, 2016 at 9:41 PM, George Spelvin
 wrote:
> What are you testing on?  And what input size?  And does "33% improvement"
> mean 4/3 the rate and 3/4 the time?  Or 2/3 the time and 3/2 the rate?

Now that I've published my hsiphash implementation to my tree, it
should be possible to conduct the tests back to back with nearly
identical implementation strategies, to remove a potential source of
error.


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
Tom Herbert wrote:
> Tested this. Distribution and avalanche effect are still good. Speed
> wise I see about a 33% improvement over siphash (20 nsecs/op versus 32
> nsecs). That's about 3x of jhash speed (7 nsecs). So that might be closer
> to a more palatable replacement for jhash. Do we lose any security
> advantages with halfsiphash?

What are you testing on?  And what input size?  And does "33% improvement"
mean 4/3 the rate and 3/4 the time?  Or 2/3 the time and 3/2 the rate?
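
Taking the quoted figures (20 ns/op versus 32 ns/op) at face value, the
two readings can be worked out directly. The sketch below is plain
arithmetic with hypothetical helper names: the new code takes 20/32 =
0.625 of the old time (37.5% less time) and runs at 32/20 = 1.6 times the
rate, which matches neither the 3/4 nor the 2/3 reading exactly. Against
the 7 ns jhash figure, 20 ns is about 2.86x slower.

```c
/* Fraction of the old per-op time that the new code takes. */
static double time_ratio(double new_ns, double old_ns)
{
	return new_ns / old_ns;
}

/* Throughput multiplier of the new code relative to the old. */
static double rate_ratio(double new_ns, double old_ns)
{
	return old_ns / new_ns;
}
```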

These are very odd results.  On a 64-bit machine, SipHash should be the
same speed per round, and faster because it hashes more data per round.
(Unless you're hitting some unexpected cache/decode effect due to REX
prefixes.)

On a 32-bit machine (other than ARM, where your results might make sense,
or maybe if you're hashing large amounts of data), the difference should
be larger.

And yes, there is a *significant* security loss.  SipHash is 128 bits
("don't worry about it").  hsiphash is 64 bits, which is known breakable
("worry about it"), so we have to do a careful analysis of the cost of
a successful attack.

As mentioned in the e-mails that just flew by, hsiphash is intended
*only* for 32-bit machines which bog down on full SipHash.  On all 64-bit
machines, it will be implemented as an alias for SipHash and the security
concerns will Just Go Away.

The place where hsiphash is expected to make a big difference is 32-bit
x86.  If you only see 33% difference with "gcc -m32", I'm going to be
very confused.


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
>> On a 64-bit machine, 64-bit SipHash is *always* faster than 32-bit, and
>> should be used always.  Don't even compile the 32-bit code, to prevent
>> anyone accidentally using it, and make hsiphash an alias for siphash.

> Fascinating! Okay. So I'll alias hsiphash to siphash on 64-bit then. I
> like this arrangement.

This is a basic assumption I make in the security analysis below:
on most machines, it's 128-bit-key SipHash everywhere and we can
consider security solved.

Our analysis *only* has to consider 32-bit machines.  My big concern
is home routers, with IoT appliances coming second.  The routers have
severe hardware cost constraints (so limited CPU power), but see a lot
of traffic and need to process (NAT) it.

> That's a nice analysis. Might one conclude from that that hsiphash is
> not useful for our purposes? Or does it still remain useful for
> network facing code?

I think for attacks where the threat is a DoS, it's usable.  The point
is you only have to raise the cost to equal that of a packet flood.
(Just like in electronic warfare, the best you can possibly do is force
the enemy to use broadband jamming.)

Hash collision attacks just aren't that powerful.  The original PoC
was against an application that implemented a hard limit on hash chain
length as a DoS defense, which the attack then exploited to turn it into
a hard DoS.

>> Let me consider your second example above, "secure against local users".
>> I should dig through your patchset and find the details, but what exactly
>> are the consequences of such an attack?  Hasn't a local user already
>> got much better ways to DoS the system?

> For example, an unpriv'd user putting lots of entries in one hash
> bucket for a shared resource that's used by root, like filesystems or
> other lookup tables. If he can cause root to use more of root's cpu
> schedule budget than otherwise in a directed way, then that's a bad
> DoS.

This issue was recently discussed when we redesigned the dcache hash.
Even a successful attack doesn't slow things down all *that* much.

Before you overkill every hash table in the kernel, think about whether
it's a bigger problem than the dcache.  (Hint: it's probably not.)
There's no point armor-plating the side door when the front door was
just upgraded from screen to wood.

>> These days, 32-bit CPUs are for embedded applications: network appliances,
>> TVs, etc.  That means basically single-user.  Even phones are 64 bit.
>> Is this really a threat that needs to be defended against?

> I interpret this to indicate all the more reason to alias hsiphash to
> siphash on 64-bit, and then the problem space collapses in a clear
> way.

Yes, exactly.  

> Right. Hence the need for always using full siphash and not hsiphash
> for sequence numbers, per my earlier email to David.
>
>> I wish we could get away with 64-bit security, but given that the
>> modern internet involves attacks from NSA/Spetssvyaz/3PLA, I agree
>> it's just not enough.
>
> I take this comment to be relevant for the sequence number case.

Yes.

> For hashtables and hashtable flooding, is it still your opinion that
> we will benefit from hsiphash? Or is this final conclusion a rejection
> of hsiphash for that too? We're talking about two different use cases,
> and your email kind of interleaved both into your analysis, so I'm not
> certain as to precisely what your conclusion is for each use case. Can
> you clear up the ambiguity?

My (speaking generally; I should walk through every hash table you've
converted) opinion is that:

- Hash tables, even network-facing ones, can all use hsiphash as long
  as an attacker can only see collisions, i.e. ((H(x) ^ H(y)) & bits) ==
  0, and the consequence of a successful attack is only more collisions
  (timing).  While the attack is only 2x the cost (two hashes rather than
  one to test a key), the knowledge of the collision is statistical,
  especially for network attackers, which raises the cost of guessing
  beyond an even more brute-force attack.
- When the hash value is directly visible (e.g. included in a network
  packet), full SipHash should be the default.
- Syncookies *could* use hsiphash, especially as there are
  two keys in there.  Not sure if we need the performance.
- For TCP ISNs, I'd prefer to use full SipHash.  I know this is
  a very hot path, and if that's a performance bottleneck,
  we can work harder on it.
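
The collision condition in the first bullet can be written out as a tiny
helper. This is only an illustration: same_bucket() and bucket_index() are
hypothetical names, and "bits" is assumed to be the table mask (table size
minus one for a power-of-two table). An attacker in this model learns at
most whether two keys share a bucket, never the hash values themselves.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bucket index for a power-of-two table with mask "bits". */
static uint32_t bucket_index(uint32_t hash, uint32_t bits)
{
	return hash & bits;
}

/* True when two hash values land in the same bucket. */
static bool same_bucket(uint32_t hx, uint32_t hy, uint32_t bits)
{
	return ((hx ^ hy) & bits) == 0;
}
```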

In particular, TCP ISNs *used* to rotate the key periodically,
limiting the time available to an attacker to perform an
attack before the secret goes stale and is useless.  commit
6e5714eaf77d79ae1c8b47e3e040ff5411b717ec upgraded to md5 and dropped
the key rotation.

If 2x hsiphash is faster than siphash, we could use a double-hashing
system like syncookies.  One 32-bit hash with a permanent key, summed
with a k-bit counter and a (32-k)-bit hash, where the key is rotated
(and the counter incremented) periodically.
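
A minimal sketch of how the three pieces might be combined, with the two
hash outputs passed in by the caller. The bit layout (counter in the top k
bits, truncated hash in the low 32-k bits) is an assumption made for the
example, as are the function names; nothing here is taken from existing
kernel code.

```c
#include <stdint.h>

/* k-bit counter on top, (32-k)-bit truncated hash below; k in 1..31. */
static uint32_t rotating_part(uint32_t counter, uint32_t h_rot,
			      unsigned int k)
{
	uint32_t hash_mask = (1u << (32 - k)) - 1;
	uint32_t ctr_mask = (1u << k) - 1;

	return ((counter & ctr_mask) << (32 - k)) | (h_rot & hash_mask);
}

/* Permanent-key hash summed (mod 2^32) with the rotating part. */
static uint32_t combine_parts(uint32_t h_perm, uint32_t counter,
			      uint32_t h_rot, unsigned int k)
{
	return h_perm + rotating_part(counter, h_rot, k);
}
```

Here h_perm would come from the hash under the permanent key and h_rot
from the hash under the current rotating key; the counter is bumped each
time the rotating key is replaced.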

The requirement is that the increment rate of the counter hash doesn't

RE: [PATCH] liquidio CN23XX: make timeout HZ independent

2016-12-16 Thread Chickles, Derek
> -Original Message-
> From: Nicholas Mc Guire [mailto:hof...@osadl.org]
> Sent: Friday, December 16, 2016 12:11 AM
> To: Chickles, Derek
> Cc: Burla, Satananda; Manlunas, Felix; Vatsavayi, Raghu;
> netdev@vger.kernel.org; linux-ker...@vger.kernel.org; Nicholas Mc Guire
> Subject: [PATCH] liquidio CN23XX: make timeout HZ independent
> 
> schedule_timeout_* takes a timeout in jiffies but the code currently is
> passing in a constant which makes this timeout HZ dependent, so pass it
> through msecs_to_jiffies() to fix this up.
> 
> Signed-off-by: Nicholas Mc Guire 
> ---
> 
> Problem found by coccinelle spatch
> 
> The current delay can vary by a factor of 10 depending on the HZ
> setting chosen, which does not seem reasonable here.
> 
> The below patch sets the timeout to 10ms - it is though not clear
> if this is the intent or if it should be longer/shorter as it is not
> clear what HZ setting was assumed during design and used for testing.
> 
> This needs an ack by someone who knows the device and can confirm that
> 10ms is reasonable to wait for completion of queuing.

We were actually looking at this in parallel already to speed up driver
loading. It would be better if we changed LIO_MBOX_WRITE_WAIT_TIME
to 1 and applied the msecs_to_jiffies() to the line specified in your patch
and the loop just above that.

while (readq(mbox->mbox_write_reg) != OCTEON_PFVFSIG) {
schedule_timeout_uninterruptible(LIO_MBOX_WRITE_WAIT_TIME);
if (count++ == LIO_MBOX_WRITE_WAIT_CNT) {
ret = OCTEON_MBOX_STATUS_FAILED;
break;
}
}   

If you can provide a new patch with both the changes we'll sign off on it.
Otherwise, it's on our list and we'll submit it soon ourselves.
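
To illustrate the HZ dependence being discussed, here is a small userspace
model. It is not the kernel's msecs_to_jiffies(); it is a simplified
stand-in that assumes HZ divides 1000 evenly and rounds up, and the
function names are made up. The point is that a raw count of N jiffies
lasts 10*N ms at HZ=100 but only N ms at HZ=1000, while a value expressed
in milliseconds and converted stays as close to the requested time as the
tick granularity allows.

```c
/* Simplified model: convert ms to jiffies, rounding up (hz divides 1000). */
static unsigned long model_msecs_to_jiffies(unsigned int ms, unsigned int hz)
{
	unsigned int ms_per_jiffy = 1000 / hz;

	return (ms + ms_per_jiffy - 1) / ms_per_jiffy;
}

/* Wall-clock duration in ms of a raw jiffies count at a given HZ. */
static unsigned int model_jiffies_to_msecs(unsigned long j, unsigned int hz)
{
	return (unsigned int)(j * (1000 / hz));
}
```

For example, a 1 ms request converts to one jiffy at both HZ=100 and
HZ=1000, but that one jiffy is 10 ms long in the first case and 1 ms long
in the second.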

Thanks,
Derek



Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Tom Herbert
On Fri, Dec 16, 2016 at 4:39 AM, Jason A. Donenfeld  wrote:
> Hey JP,
>
> On Fri, Dec 16, 2016 at 9:08 AM, Jean-Philippe Aumasson
>  wrote:
>> Here's a tentative HalfSipHash:
>> https://github.com/veorq/SipHash/blob/halfsiphash/halfsiphash.c
>>
>> Haven't computed the cycle count nor measured its speed.
>
Tested this. Distribution and avalanche effect are still good. Speed
wise I see about a 33% improvement over siphash (20 nsecs/op versus 32
nsecs). That's about 3x of jhash speed (7 nsecs). So that might be closer
to a more palatable replacement for jhash. Do we lose any security
advantages with halfsiphash?

Tom

> This is incredible. Really. Wow!
>
> I'll integrate this into my patchset and will write up some
> documentation about when one should be used over the other.
>
> Thanks again. Quite exciting.
>
> Jason


Re: [PATCH 2/3] Bluetooth: btusb: Add out-of-band wakeup support

2016-12-16 Thread Rajat Jain
Hi Brian,

I've just posted a v2 patchset after taking care of your
comments. Please see inline below.

On Wed, Dec 14, 2016 at 7:21 PM, Brian Norris  wrote:
> Hi,
>
> On Wed, Dec 14, 2016 at 11:12:58AM -0800, Rajat Jain wrote:
>> Some BT chips (e.g. Marvell 8997) contain a wakeup pin that can be
>> connected to a gpio on the CPU side, and can be used to wakeup
>> the host out-of-band. This can be useful in situations where the
>> in-band wakeup is not possible or not preferable (e.g. the in-band
>> wakeup may require the USB host controller to remain active, and
>> hence consuming more system power during system sleep).
>>
>> The oob gpio interrupt to be used for wakeup on the CPU side, is
>> read from the device tree node, (using standard interrupt descriptors).
>> A device tree binding document is also added for the driver.
>>
>> Signed-off-by: Rajat Jain 
>> ---
>>  Documentation/devicetree/bindings/net/btusb.txt | 38 
>>  drivers/bluetooth/btusb.c   | 82 
>> +
>>  2 files changed, 120 insertions(+)
>>  create mode 100644 Documentation/devicetree/bindings/net/btusb.txt
>>
>> diff --git a/Documentation/devicetree/bindings/net/btusb.txt 
>> b/Documentation/devicetree/bindings/net/btusb.txt
>> new file mode 100644
>> index 000..bb27f92
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/net/btusb.txt
>> @@ -0,0 +1,38 @@
>> +Generic Bluetooth controller over USB (btusb driver)
>> +---
>> +
>> +Required properties:
>> +
>> +  - compatible : should comply with the format "usbVID,PID" specified in
>> +  Documentation/devicetree/bindings/usb/usb-device.txt
>> +  At the time of writing, the only OF supported devices
>> +  (more may be added later) are:
>> +
>> +   "usb1286,204e" (Marvell 8997)
>> +
>> +Optional properties:
>> +
>> +  - interrupt-parent: phandle of the parent interrupt controller
>> +  - interrupts : The first interrupt specified is the interrupt that shall 
>> be
>> +  used for out-of-band wake-on-bt. Driver will request an irq
>> +  based on this interrupt number. During system suspend, the irq
>> +  will be enabled so that the bluetooth chip can wakeup host
>> +  platform out of band. During system resume, the irq will be
>> +  disabled to make sure unnecessary interrupt is not received.
>
> Might it be worthwhile to define an 'interrupt-names' property (e.g., =
> "wakeup") to help future-proof this?

Good idea, I've used the same.

>
>> +
>> +Example:
>> +
>> +Following example uses irq pin number 3 of gpio0 for out of band wake-on-bt:
>> +
>> +_host1_ehci {
>> +status = "okay";
>> +#address-cells = <1>;
>> +#size-cells = <0>;
>> +
>> +mvl_bt1: bt@1 {
>> + compatible = "usb1286,204e";
>> + reg = <1>;
>> + interrupt-parent = <&gpio0>;
>> + interrupts = <3 IRQ_TYPE_LEVEL_LOW>;
>> +};
>> +};
>> diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c
>> index ce22cef..32a6f22 100644
>> --- a/drivers/bluetooth/btusb.c
>> +++ b/drivers/bluetooth/btusb.c
>> @@ -24,6 +24,8 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>> +#include 
>>  #include 
>>
>>  #include 
>> @@ -369,6 +371,7 @@ static const struct usb_device_id blacklist_table[] = {
>>  #define BTUSB_BOOTING9
>>  #define BTUSB_RESET_RESUME   10
>>  #define BTUSB_DIAG_RUNNING   11
>> +#define BTUSB_OOB_WAKE_DISABLED  12
>>
>>  struct btusb_data {
>>   struct hci_dev   *hdev;
>> @@ -416,6 +419,8 @@ struct btusb_data {
>>   int (*recv_bulk)(struct btusb_data *data, void *buffer, int count);
>>
>>   int (*setup_on_usb)(struct hci_dev *hdev);
>> +
>> + int oob_wake_irq;   /* irq for out-of-band wake-on-bt */
>>  };
>>
>>  static inline void btusb_free_frags(struct btusb_data *data)
>> @@ -2728,6 +2733,65 @@ static int btusb_bcm_set_diag(struct hci_dev *hdev, 
>> bool enable)
>>  }
>>  #endif
>>
>> +#ifdef CONFIG_PM
>> +static irqreturn_t btusb_oob_wake_handler(int irq, void *priv)
>> +{
>> + struct btusb_data *data = priv;
>> +
>> + /* Disable only if not already disabled (keep it balanced) */
>> + if (!test_and_set_bit(BTUSB_OOB_WAKE_DISABLED, &data->flags)) {
>> + disable_irq_wake(irq);
>> + disable_irq_nosync(irq);
>> + }
>> + pm_wakeup_event(&data->udev->dev, 0);
>> + return IRQ_HANDLED;
>> +}
>> +
>> +static const struct of_device_id btusb_match_table[] = {
>> + { .compatible = "usb1286,204e" },
>> + { }
>> +};
>> +MODULE_DEVICE_TABLE(of, btusb_match_table);
>> +
>> +/* Use an oob wakeup pin? */
>> +static int btusb_config_oob_wake(struct hci_dev *hdev)
>> +{
>> + struct btusb_data *data = hci_get_drvdata(hdev);
>> + struct device *dev = &data->udev->dev;
>> + int irq, ret;
>> +
>> + if (!of_match_device(btusb_match_table, 

RE: [upstream-release] [PATCH net 2/4] fsl/fman: arm: call of_platform_populate() for arm64 platfrom

2016-12-16 Thread Madalin-Cristian Bucur
> From: Scott Wood
> Sent: Thursday, December 15, 2016 8:42 PM
> 
> On 12/15/2016 07:11 AM, Madalin Bucur wrote:
> > From: Igal Liberman 
> >
> > Signed-off-by: Igal Liberman 
> > ---
> >  drivers/net/ethernet/freescale/fman/fman.c | 10 ++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/freescale/fman/fman.c
> b/drivers/net/ethernet/freescale/fman/fman.c
> > index dafd9e1..f36b4eb 100644
> > --- a/drivers/net/ethernet/freescale/fman/fman.c
> > +++ b/drivers/net/ethernet/freescale/fman/fman.c
> > @@ -2868,6 +2868,16 @@ static struct fman *read_dts_node(struct
> platform_device *of_dev)
> >
> > fman->dev = _dev->dev;
> >
> > +#ifdef CONFIG_ARM64
> > +   /* call of_platform_populate in order to probe sub-nodes on arm64 */
> > +   err = of_platform_populate(fm_node, NULL, NULL, _dev->dev);
> > +   if (err) {
> > +   dev_err(_dev->dev, "%s: of_platform_populate() failed\n",
> > +   __func__);
> > +   goto fman_free;
> > +   }
> > +#endif
> 
> Should we remove fsl,fman from the PPC of_device_ids[], so this doesn't
> need an ifdef?
> 
> Why is it #ifdef CONFIG_ARM64 rather than #ifndef CONFIG_PPC?
> 
> -Scott

Igal was working on adding ARM64 support when this patch was created, thus the
choice of #ifdef CONFIG_ARM64. Unifying this for PPC and ARM64 by always calling
of_platform_populate() sounds like the best approach. I would need to
synchronize the introduction of this code with the removal of the fsl,fman
entry from the of_device_ids[] array.

Dave, Michael, Scott, is it ok to add to v2 of this patch set the patch that
removes the compatible "fsl,fman" from
arch/powerpc/platforms/85xx/corenet_generic.c?

Thanks,
Madalin



Re: [PATCH 1/1] tools: net: bpf_dbg.c fixed keyboard typo

2016-12-16 Thread Daniel Borkmann

On 12/16/2016 07:21 PM, Ozgur Karatas wrote:


This patch fixes a keyboard typo: a bracket was not closed.
I think the parenthesis should be closed.

Signed-off-by: Ozgur Karatas 


NAK for obvious reasons ...


---
  tools/net/bpf_dbg.c   | 2 +-
  1 files changed, 1 insertion(+), 1 deletions(-)

diff --git a/tools/net/bpf_dbg.c b/tools/net/bpf_dbg.c
index 4f254bc..f715f46 100644
--- a/tools/net/bpf_dbg.c
+++ b/tools/net/bpf_dbg.c
@@ -1213,7 +1213,7 @@ static int cmd_disassemble(char *line_string)

if (!bpf_prog_loaded())
return CMD_ERR;
-   if (strlen(line_string) > 0 &&
+   if (strlen(line_string) > 0 &&)
(line = strtoul(line_string, NULL, 10)) < bpf_prog_len)
single_line = true;
if (single_line)



[PATCH v2 2/3] Bluetooth: btusb: Add out-of-band wakeup support

2016-12-16 Thread Rajat Jain
Some onboard BT chips (e.g. Marvell 8997) contain a wakeup pin that
can be connected to a gpio on the CPU side, and can be used to wakeup
the host out-of-band. This can be useful in situations where the
in-band wakeup is not possible or not preferable (e.g. the in-band
wakeup may require the USB host controller to remain active, and
hence consuming more system power during system sleep).

The oob gpio interrupt to be used for wakeup on the CPU side, is
read from the device tree node, (using standard interrupt descriptors).
A device tree binding document is also added for the driver. The
compatible string is in compliance with
Documentation/devicetree/bindings/usb/usb-device.txt

Signed-off-by: Rajat Jain 
---
v2: * Use interrupt-names ("wakeup") instead of assuming first interrupt.
* Leave it on device tree to specify IRQ flags (level /edge triggered)
* Mark the device as non wakeable on exit.

 Documentation/devicetree/bindings/net/btusb.txt | 40 
 drivers/bluetooth/btusb.c   | 84 +
 2 files changed, 124 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/btusb.txt

diff --git a/Documentation/devicetree/bindings/net/btusb.txt 
b/Documentation/devicetree/bindings/net/btusb.txt
new file mode 100644
index 000..2c0355c
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/btusb.txt
@@ -0,0 +1,40 @@
+Generic Bluetooth controller over USB (btusb driver)
+---
+
+Required properties:
+
+  - compatible : should comply with the format "usbVID,PID" specified in
+Documentation/devicetree/bindings/usb/usb-device.txt
+At the time of writing, the only OF supported devices
+(more may be added later) are:
+
+ "usb1286,204e" (Marvell 8997)
+
+Optional properties:
+
+  - interrupt-parent: phandle of the parent interrupt controller
+  - interrupt-names: (see below)
+  - interrupts : The interrupt specified by the name "wakeup" is the interrupt
+that shall be used for out-of-band wake-on-bt. Driver will
+request this interrupt for wakeup. During system suspend, the
+irq will be enabled so that the bluetooth chip can wakeup host
+platform out of band. During system resume, the irq will be
+disabled to make sure unnecessary interrupt is not received.
+
+Example:
+
+Following example uses irq pin number 3 of gpio0 for out of band wake-on-bt:
+
+_host1_ehci {
+status = "okay";
+#address-cells = <1>;
+#size-cells = <0>;
+
+mvl_bt1: bt@1 {
+   compatible = "usb1286,204e";
+   reg = <1>;
+   interrupt-parent = <&gpio0>;
+   interrupt-names = "wakeup";
+   interrupts = <3 IRQ_TYPE_LEVEL_LOW>;
+};
+};
diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c
index ce22cef..beca4e9 100644
--- a/drivers/bluetooth/btusb.c
+++ b/drivers/bluetooth/btusb.c
@@ -24,6 +24,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 
 #include 
@@ -369,6 +371,7 @@ static const struct usb_device_id blacklist_table[] = {
 #define BTUSB_BOOTING  9
 #define BTUSB_RESET_RESUME 10
 #define BTUSB_DIAG_RUNNING 11
+#define BTUSB_OOB_WAKE_DISABLED12
 
 struct btusb_data {
struct hci_dev   *hdev;
@@ -416,6 +419,8 @@ struct btusb_data {
int (*recv_bulk)(struct btusb_data *data, void *buffer, int count);
 
int (*setup_on_usb)(struct hci_dev *hdev);
+
+   int oob_wake_irq;   /* irq for out-of-band wake-on-bt */
 };
 
 static inline void btusb_free_frags(struct btusb_data *data)
@@ -2728,6 +2733,65 @@ static int btusb_bcm_set_diag(struct hci_dev *hdev, bool 
enable)
 }
 #endif
 
+#ifdef CONFIG_PM
+static irqreturn_t btusb_oob_wake_handler(int irq, void *priv)
+{
+   struct btusb_data *data = priv;
+
+   /* Disable only if not already disabled (keep it balanced) */
+   if (!test_and_set_bit(BTUSB_OOB_WAKE_DISABLED, &data->flags)) {
+   disable_irq_nosync(irq);
+   disable_irq_wake(irq);
+   }
+   pm_wakeup_event(&data->udev->dev, 0);
+   return IRQ_HANDLED;
+}
+
+static const struct of_device_id btusb_match_table[] = {
+   { .compatible = "usb1286,204e" },
+   { }
+};
+MODULE_DEVICE_TABLE(of, btusb_match_table);
+
+/* Use an oob wakeup pin? */
+static int btusb_config_oob_wake(struct hci_dev *hdev)
+{
+   struct btusb_data *data = hci_get_drvdata(hdev);
+   struct device *dev = &data->udev->dev;
+   int irq, ret;
+
+   if (!of_match_device(btusb_match_table, dev))
+   return 0;
+
+   /* Move on if no IRQ specified */
+   irq = of_irq_get_byname(dev->of_node, "wakeup");
+   if (irq <= 0) {
+   bt_dev_dbg(hdev, "%s: no OOB Wakeup IRQ in DT", __func__);
+   return 0;
+   }
+
+   set_bit(BTUSB_OOB_WAKE_DISABLED, &data->flags);
+
+   ret = 

[PATCH v2 3/3] Bluetooth: btusb: Configure Marvell to use one of the pins for oob wakeup

2016-12-16 Thread Rajat Jain
The Marvell devices may have many gpio pins, and hence for wakeup
on these out-of-band pins, the chip needs to be told which pin is
to be used for wakeup, using an hci command.

Thus, we read the pin number etc from the device tree node and send
a command to the chip.

Signed-off-by: Rajat Jain 
---
v2: Fix the binding document to specify to use "wakeup" interrupt-name

 .../{marvell-bt-sd8xxx.txt => marvell-bt-8xxx.txt} | 46 ++---
 drivers/bluetooth/btusb.c  | 59 ++
 2 files changed, 97 insertions(+), 8 deletions(-)
 rename Documentation/devicetree/bindings/net/{marvell-bt-sd8xxx.txt => 
marvell-bt-8xxx.txt} (50%)

diff --git a/Documentation/devicetree/bindings/net/marvell-bt-sd8xxx.txt 
b/Documentation/devicetree/bindings/net/marvell-bt-8xxx.txt
similarity index 50%
rename from Documentation/devicetree/bindings/net/marvell-bt-sd8xxx.txt
rename to Documentation/devicetree/bindings/net/marvell-bt-8xxx.txt
index 6a9a63c..9be1059 100644
--- a/Documentation/devicetree/bindings/net/marvell-bt-sd8xxx.txt
+++ b/Documentation/devicetree/bindings/net/marvell-bt-8xxx.txt
@@ -1,16 +1,21 @@
-Marvell 8897/8997 (sd8897/sd8997) bluetooth SDIO devices
+Marvell 8897/8997 (sd8897/sd8997) bluetooth devices (SDIO or USB based)
 --
+The 8997 devices support multiple interfaces. When used on SDIO interfaces,
+the btmrvl driver is used and when used on USB interface, the btusb driver is
+used.
 
 Required properties:
 
   - compatible : should be one of the following:
-   * "marvell,sd8897-bt"
-   * "marvell,sd8997-bt"
+   * "marvell,sd8897-bt" (for SDIO)
+   * "marvell,sd8997-bt" (for SDIO)
+   * "usb1286,204e"  (for USB)
 
 Optional properties:
 
   - marvell,cal-data: Calibration data downloaded to the device during
  initialization. This is an array of 28 values(u8).
+ This is only applicable to SDIO devices.
 
   - marvell,wakeup-pin: It represents wakeup pin number of the bluetooth chip.
firmware will use the pin to wakeup host system (u16).
@@ -18,10 +23,15 @@ Optional properties:
  platform. The value will be configured to firmware. This
  is needed to work chip's sleep feature as expected (u16).
   - interrupt-parent: phandle of the parent interrupt controller
-  - interrupts : interrupt pin number to the cpu. Driver will request an irq 
based
-on this interrupt number. During system suspend, the irq will 
be
-enabled so that the bluetooth chip can wakeup host platform 
under
-certain condition. During system resume, the irq will be 
disabled
+  - interrupt-names: Used only for USB based devices (See below)
+  - interrupts : specifies the interrupt pin number to the cpu. For SDIO, the
+driver will use the first interrupt specified in the interrupt
+array. For USB based devices, the driver will use the interrupt
+named "wakeup" from the interrupt-names and interrupt arrays.
+The driver will request an irq based on this interrupt number.
+During system suspend, the irq will be enabled so that the
+bluetooth chip can wakeup host platform under certain
+conditions. During system resume, the irq will be disabled
 to make sure unnecessary interrupt is not received.
 
 Example:
@@ -29,7 +39,9 @@ Example:
 IRQ pin 119 is used as system wakeup source interrupt.
 wakeup pin 13 and gap 100ms are configured so that firmware can wakeup host
 using this device side pin and wakeup latency.
-calibration data is also available in below example.
+
+Example for SDIO device follows (calibration data is also available in
+below example).
 
  {
status = "okay";
@@ -54,3 +66,21 @@ calibration data is also available in below example.
marvell,wakeup-gap-ms = /bits/ 16 <0x64>;
};
 };
+
+Example for USB device:
+
+_host1_ohci {
+status = "okay";
+#address-cells = <1>;
+#size-cells = <0>;
+
+mvl_bt1: bt@1 {
+   compatible = "usb1286,204e";
+   reg = <1>;
+   interrupt-parent = <>;
+   interrupt-names = "wakeup";
+   interrupts = <119 IRQ_TYPE_LEVEL_LOW>;
+   marvell,wakeup-pin = /bits/ 16 <0x0d>;
+   marvell,wakeup-gap-ms = /bits/ 16 <0x64>;
+};
+};
diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c
index beca4e9..455b3d0 100644
--- a/drivers/bluetooth/btusb.c
+++ b/drivers/bluetooth/btusb.c
@@ -2343,6 +2343,58 @@ static int btusb_shutdown_intel(struct hci_dev *hdev)
return 0;
 }
 
+#ifdef CONFIG_PM
+static const struct of_device_id mvl_oob_wake_match_table[] = {
+   { .compatible = "usb1286,204e" },
+   { }
+};
+MODULE_DEVICE_TABLE(of, mvl_oob_wake_match_table);
+
+/* Configure an out-of-band gpio as wake-up pin, if specified in device tree */
+static int 

[PATCH v2 1/3] Bluetooth: btusb: Use an error label for error paths

2016-12-16 Thread Rajat Jain
Use a label to remove the repetitive cleanup for error cases.

Signed-off-by: Rajat Jain 
---
v2: same as v1

 drivers/bluetooth/btusb.c | 19 +--
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c
index 2f633df..ce22cef 100644
--- a/drivers/bluetooth/btusb.c
+++ b/drivers/bluetooth/btusb.c
@@ -2991,18 +2991,15 @@ static int btusb_probe(struct usb_interface *intf,
err = usb_set_interface(data->udev, 0, 0);
if (err < 0) {
BT_ERR("failed to set interface 0, alt 0 %d", err);
-   hci_free_dev(hdev);
-   return err;
+   goto out_free_dev;
}
}
 
if (data->isoc) {
err = usb_driver_claim_interface(&btusb_driver,
 data->isoc, data);
-   if (err < 0) {
-   hci_free_dev(hdev);
-   return err;
-   }
+   if (err < 0)
+   goto out_free_dev;
}
 
 #ifdef CONFIG_BT_HCIBTUSB_BCM
@@ -3016,14 +3013,16 @@ static int btusb_probe(struct usb_interface *intf,
 #endif
 
err = hci_register_dev(hdev);
-   if (err < 0) {
-   hci_free_dev(hdev);
-   return err;
-   }
+   if (err < 0)
+   goto out_free_dev;
 
usb_set_intfdata(intf, data);
 
return 0;
+
+out_free_dev:
+   hci_free_dev(hdev);
+   return err;
 }
 
 static void btusb_disconnect(struct usb_interface *intf)
-- 
2.8.0.rc3.226.g39d4020
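
For readers unfamiliar with the idiom, the single-exit cleanup introduced by
this patch can be sketched in userspace as below. This is only an
illustration, not the btusb code: struct dev_sketch, setup_step(),
probe_sketch() and the error_cleanups counter are hypothetical names invented
for the example.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the object that must be freed on failure. */
struct dev_sketch {
	int id;
};

/* Counts how often the shared error path ran (for illustration only). */
static int error_cleanups;

/* Hypothetical setup step: returns 0 on success or a negative error. */
static int setup_step(int fail)
{
	return fail ? -1 : 0;
}

/* Every failing step jumps to one label, so the cleanup is written once. */
static int probe_sketch(int fail_first, int fail_second)
{
	struct dev_sketch *d;
	int err;

	d = malloc(sizeof(*d));
	if (!d)
		return -1;

	err = setup_step(fail_first);
	if (err < 0)
		goto out_free_dev;

	err = setup_step(fail_second);
	if (err < 0)
		goto out_free_dev;

	/* Success: a real probe would keep d; freed here to avoid a leak. */
	free(d);
	return 0;

out_free_dev:
	free(d);
	error_cleanups++;
	return err;
}
```

Adding a further setup step then only needs one more "goto out_free_dev",
rather than another copy of the cleanup.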



Re: [PATCH 1/1] tools: net: bpf_dbg.c fixed keyboard typo

2016-12-16 Thread Ozgur Karatas


16.12.2016, 21:08, "Sergei Shtylyov" :
> Hello.

Hi

> On 12/16/2016 09:21 PM, Ozgur Karatas wrote:
>
>>  This patch fixed to keyboard typo, brackets not closed.
>>  I think, it should be close to parenthes.
>>
>>  Signed-off-by: Ozgur Karatas 
>>  ---
>>   tools/net/bpf_dbg.c | 2 +-
>>   1 files changed, 1 insertion(+), 1 deletions(-)
>>
>>  diff --git a/tools/net/bpf_dbg.c b/tools/net/bpf_dbg.c
>>  index 4f254bc..f715f46 100644
>>  --- a/tools/net/bpf_dbg.c
>>  +++ b/tools/net/bpf_dbg.c
>>  @@ -1213,7 +1213,7 @@ static int cmd_disassemble(char *line_string)
>>
>>   if (!bpf_prog_loaded())
>>   return CMD_ERR;
>>  - if (strlen(line_string) > 0 &&
>>  + if (strlen(line_string) > 0 &&)
>
> Have you tried to compile that? :-/

Yes, I compiled it, but I apologize if there was a NAK.
Also, checkpatch gives an error.

I could be wrong; I will review it again.

Best Regards!

>>   (line = strtoul(line_string, NULL, 10)) < bpf_prog_len)
>
> I think the code was correct before your patch...
>
>>   single_line = true;
>>   if (single_line)
>
> MBR, Sergei

~Ozgur


Re: [PATCH 1/1] tools: net: bpf_dbg.c fixed keyboard typo

2016-12-16 Thread Sergei Shtylyov

Hello.

On 12/16/2016 09:21 PM, Ozgur Karatas wrote:


This patch fixed to keyboard typo, brackets not closed.
I think, it should be close to parenthes.

Signed-off-by: Ozgur Karatas 
---
 tools/net/bpf_dbg.c   | 2 +-
 1 files changed, 1 insertion(+), 1 deletions(-)

diff --git a/tools/net/bpf_dbg.c b/tools/net/bpf_dbg.c
index 4f254bc..f715f46 100644
--- a/tools/net/bpf_dbg.c
+++ b/tools/net/bpf_dbg.c
@@ -1213,7 +1213,7 @@ static int cmd_disassemble(char *line_string)

if (!bpf_prog_loaded())
return CMD_ERR;
-   if (strlen(line_string) > 0 &&
+   if (strlen(line_string) > 0 &&)


   Have you tried to compile that? :-/


(line = strtoul(line_string, NULL, 10)) < bpf_prog_len)


   I think the code was correct before your patch...


single_line = true;
if (single_line)


MBR, Sergei



Re: [TSN RFC v2 0/9] TSN driver for the kernel

2016-12-16 Thread Henrik Austad
On Fri, Dec 16, 2016 at 01:20:57PM -0500, David Miller wrote:
> From: Greg 
> Date: Fri, 16 Dec 2016 10:12:44 -0800
> 
> > On Fri, 2016-12-16 at 18:59 +0100, hen...@austad.us wrote:
> >> From: Henrik Austad 
> >> 
> >> 
> >> The driver is directed via ConfigFS as we need userspace to handle
> >> stream-reservation (MSRP), discovery and enumeration (IEEE 1722.1) and
> >> whatever other management is needed. This also includes running an
> >> appropriate PTP daemon (TSN favors gPTP).
> > 
> > I suggest using a generic netlink interface to communicate with the
> > driver to set up and/or configure your drivers.
> > 
> > I think configfs is frowned upon for network drivers.  YMMV.
> 
> Agreed.

Ok - thanks!

I will have a look at netlink and see if I can wrap my head around it, and
whether I can apply it to bringing the media devices up once the TSN link
has been configured.

Thanks! :)

-- 
Henrik Austad


signature.asc
Description: PGP signature


Re: [PATCH 1/1] tools: net: bpf_dbg.c fixed keyboard typo

2016-12-16 Thread Ozgur Karatas
16.12.2016, 20:35, "Joe Perches" :
> On Fri, 2016-12-16 at 20:21 +0200, Ozgur Karatas wrote:
>>  This patch fixed to keyboard typo, brackets not closed.
>>  I think, it should be close to parenthes.
>
> No.
>
> Please compile and test your patches on your own system
> before you send them.

Also, the checkpatch script gives an error, which should not be forgotten.

$ ./scripts/checkpatch.pl --file --terse tools/net/bpf_dbg.c
tools/net/bpf_dbg.c:1216: ERROR: do not use assignment in if condition

After fix:

$ ./scripts/checkpatch.pl --file --terse tools/net/bpf_dbg.c 
total: 0 errors, 6 warnings, 1395 lines checked

Regards,

~Ozgur
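
For reference, the checkpatch error quoted above is about the assignment
inside the condition, not about a missing parenthesis. One way to address it
without changing behaviour would be to hoist the assignment out of the if.
The sketch below is a standalone illustration: parse_single_line() and its
prog_len parameter are hypothetical stand-ins for the check in
cmd_disassemble() and for bpf_prog_len.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Returns true and stores the parsed number in *line when line_string is
 * non-empty and names a line below prog_len; otherwise returns false.
 */
static bool parse_single_line(const char *line_string, unsigned long prog_len,
			      unsigned long *line)
{
	if (strlen(line_string) == 0)
		return false;

	/* The assignment is its own statement, so the test below is pure. */
	*line = strtoul(line_string, NULL, 10);
	return *line < prog_len;
}
```

As in the original condition, the string is only parsed when it is not empty.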


Re: [PATCH 1/1] tools: net: bpf_dbg.c fixed keyboard typo

2016-12-16 Thread Ozgur Karatas
16.12.2016, 20:35, "Joe Perches" :
> On Fri, 2016-12-16 at 20:21 +0200, Ozgur Karatas wrote:
>>  This patch fixed to keyboard typo, brackets not closed.
>>  I think, it should be close to parenthes.
>
> No.
>
> Please compile and test your patches on your own system
> before you send them.

Dear Perches,

I have already tested it, and it was not a part of the code anyway. If there are
no parentheses, the code works incorrectly and gives an error.
I'm sorry, I have a little problem with my English, but the "line_string" variable
would not equal NULL, 10, so the code skips it and runs to "bpf_prog_len".
If it should be equal to "0 &&", it should already be completed (>), right?

if (strlen(line_string) > 0 &&
(line = strtoul(line_string, NULL, 10)) < bpf_prog_len)

Testing:

$ make M=tools/
tools//Makefile:6: scripts/Makefile.include: No such file or directory
$ cp tools/scripts/Makefile.include scripts/Makefile
$ make M=tools/
  Building modules, stage 2.
  MODPOST 0 modules

I tried to load the module (insmod) and it worked.

Regards,

 ~Ozgur
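
To make the disputed expression concrete: in strtoul(line_string, NULL, 10)
the NULL and the 10 are arguments (no end pointer, base 10), not values that
line_string is compared against, and the "&&" ending the first line simply
continues on the next one. A small standalone sketch follows, in which
single_line_or_minus_one() and prog_len are hypothetical names standing in
for the check and for bpf_prog_len.

```c
#include <stdlib.h>
#include <string.h>

/* Returns the parsed line number if it is below prog_len, otherwise -1. */
static long single_line_or_minus_one(const char *line_string,
				     unsigned long prog_len)
{
	unsigned long line;

	/* Same shape as the original condition, split across two lines. */
	if (strlen(line_string) > 0 &&
	    (line = strtoul(line_string, NULL, 10)) < prog_len)
		return (long)line;

	return -1;
}
```

The parenthesis opened after "if" is closed at the end of the second line,
so the condition is already balanced as written.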


[PATCHv5 perf/core 3/5] tools lib bpf: Add bpf_prog_{attach,detach}

2016-12-16 Thread Joe Stringer
Commit d8c5b17f2bc0 ("samples: bpf: add userspace example for attaching
eBPF programs to cgroups") added these functions to samples/libbpf, but
during this merge all of the samples libbpf functionality is shifting to
tools/lib/bpf. Shift these functions there.

Signed-off-by: Joe Stringer 
---
v5: No change.
---
 samples/bpf/libbpf.c | 21 -
 samples/bpf/libbpf.h |  3 ---
 tools/lib/bpf/bpf.c  | 21 +
 tools/lib/bpf/bpf.h  |  3 +++
 4 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/samples/bpf/libbpf.c b/samples/bpf/libbpf.c
index 3391225ad7e9..d9af876b4a2c 100644
--- a/samples/bpf/libbpf.c
+++ b/samples/bpf/libbpf.c
@@ -11,27 +11,6 @@
 #include 
 #include "libbpf.h"
 
-int bpf_prog_attach(int prog_fd, int target_fd, enum bpf_attach_type type)
-{
-   union bpf_attr attr = {
-   .target_fd = target_fd,
-   .attach_bpf_fd = prog_fd,
-   .attach_type = type,
-   };
-
-   return syscall(__NR_bpf, BPF_PROG_ATTACH, &attr, sizeof(attr));
-}
-
-int bpf_prog_detach(int target_fd, enum bpf_attach_type type)
-{
-   union bpf_attr attr = {
-   .target_fd = target_fd,
-   .attach_type = type,
-   };
-
-   return syscall(__NR_bpf, BPF_PROG_DETACH, &attr, sizeof(attr));
-}
-
 int open_raw_sock(const char *name)
 {
struct sockaddr_ll sll;
diff --git a/samples/bpf/libbpf.h b/samples/bpf/libbpf.h
index cf7d2386d1f9..cc815624aacf 100644
--- a/samples/bpf/libbpf.h
+++ b/samples/bpf/libbpf.h
@@ -6,9 +6,6 @@
 
 struct bpf_insn;
 
-int bpf_prog_attach(int prog_fd, int attachable_fd, enum bpf_attach_type type);
-int bpf_prog_detach(int attachable_fd, enum bpf_attach_type type);
-
 /* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */
 
 #define BPF_ALU64_REG(OP, DST, SRC)\
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index d0afb26c2e0f..e19335df0d3a 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -167,3 +167,24 @@ int bpf_obj_get(const char *pathname)
 
return sys_bpf(BPF_OBJ_GET, &attr, sizeof(attr));
 }
+
+int bpf_prog_attach(int prog_fd, int target_fd, enum bpf_attach_type type)
+{
+   union bpf_attr attr = {
+   .target_fd = target_fd,
+   .attach_bpf_fd = prog_fd,
+   .attach_type = type,
+   };
+
+   return sys_bpf(BPF_PROG_ATTACH, &attr, sizeof(attr));
+}
+
+int bpf_prog_detach(int target_fd, enum bpf_attach_type type)
+{
+   union bpf_attr attr = {
+   .target_fd = target_fd,
+   .attach_type = type,
+   };
+
+   return sys_bpf(BPF_PROG_DETACH, &attr, sizeof(attr));
+}
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 7fcdce16fd62..a2f9853dd882 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -41,5 +41,8 @@ int bpf_map_delete_elem(int fd, void *key);
 int bpf_map_get_next_key(int fd, void *key, void *next_key);
 int bpf_obj_pin(int fd, const char *pathname);
 int bpf_obj_get(const char *pathname);
+int bpf_prog_attach(int prog_fd, int attachable_fd, enum bpf_attach_type type);
+int bpf_prog_detach(int attachable_fd, enum bpf_attach_type type);
+
 
 #endif
-- 
2.10.2
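
A caller-side sketch of what the two helpers do. The functions below mirror
the ones in this patch but call syscall(2) directly, so the example builds
without tools/lib/bpf. It is Linux-only and assumes uapi headers recent enough
to define BPF_PROG_ATTACH and BPF_CGROUP_INET_INGRESS. The names
prog_attach_sketch() and prog_detach_sketch() are hypothetical, and memset()
is used instead of a designated initializer purely to zero the whole union
explicitly.

```c
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Attach the program behind prog_fd to the object behind target_fd. */
static int prog_attach_sketch(int prog_fd, int target_fd,
			      enum bpf_attach_type type)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.target_fd = target_fd;
	attr.attach_bpf_fd = prog_fd;
	attr.attach_type = type;

	return (int)syscall(__NR_bpf, BPF_PROG_ATTACH, &attr, sizeof(attr));
}

/* Detach whatever program of the given type is attached to target_fd. */
static int prog_detach_sketch(int target_fd, enum bpf_attach_type type)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.target_fd = target_fd;
	attr.attach_type = type;

	return (int)syscall(__NR_bpf, BPF_PROG_DETACH, &attr, sizeof(attr));
}
```

Both return -1 with errno set on failure, for example when either file
descriptor is invalid or the caller lacks the required privileges.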



[PATCHv5 perf/core 5/5] samples/bpf: Move open_raw_sock to separate header

2016-12-16 Thread Joe Stringer
This function was declared in libbpf.c and was the only remaining
function in this library, but has nothing to do with BPF. Shift it out
into a new header, sock_example.h, and include it from the relevant
samples.

Signed-off-by: Joe Stringer 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: Wang Nan 
Link: http://lkml.kernel.org/r/20161209024620.31660-8-...@ovn.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
v5: No change.
---
 samples/bpf/Makefile | 2 +-
 samples/bpf/fds_example.c| 1 +
 samples/bpf/libbpf.h | 3 ---
 samples/bpf/sock_example.c   | 1 +
 samples/bpf/{libbpf.c => sock_example.h} | 3 +--
 samples/bpf/sockex1_user.c   | 1 +
 samples/bpf/sockex2_user.c   | 1 +
 samples/bpf/sockex3_user.c   | 1 +
 8 files changed, 7 insertions(+), 6 deletions(-)
 rename samples/bpf/{libbpf.c => sock_example.h} (92%)

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 5a73f5a7ace1..f01b66f277b0 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -36,7 +36,7 @@ hostprogs-y += lwt_len_hist
 hostprogs-y += xdp_tx_iptunnel
 
 # Libbpf dependencies
-LIBBPF := libbpf.o ../../tools/lib/bpf/bpf.o
+LIBBPF := ../../tools/lib/bpf/bpf.o
 
 test_lru_dist-objs := test_lru_dist.o $(LIBBPF)
 sock_example-objs := sock_example.o $(LIBBPF)
diff --git a/samples/bpf/fds_example.c b/samples/bpf/fds_example.c
index 6245062844d1..92592e38569b 100644
--- a/samples/bpf/fds_example.c
+++ b/samples/bpf/fds_example.c
@@ -14,6 +14,7 @@
 
 #include "bpf_load.h"
 #include "libbpf.h"
+#include "sock_example.h"
 
 #define BPF_F_PIN  (1 << 0)
 #define BPF_F_GET  (1 << 1)
diff --git a/samples/bpf/libbpf.h b/samples/bpf/libbpf.h
index 09aedc320009..3705fba453a0 100644
--- a/samples/bpf/libbpf.h
+++ b/samples/bpf/libbpf.h
@@ -185,7 +185,4 @@ struct bpf_insn;
.off   = 0, \
.imm   = 0 })
 
-/* create RAW socket and bind to interface 'name' */
-int open_raw_sock(const char *name);
-
 #endif
diff --git a/samples/bpf/sock_example.c b/samples/bpf/sock_example.c
index 5546f8aac37e..6fc6e193ef1b 100644
--- a/samples/bpf/sock_example.c
+++ b/samples/bpf/sock_example.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include "libbpf.h"
+#include "sock_example.h"
 
 char bpf_log_buf[BPF_LOG_BUF_SIZE];
 
diff --git a/samples/bpf/libbpf.c b/samples/bpf/sock_example.h
similarity index 92%
rename from samples/bpf/libbpf.c
rename to samples/bpf/sock_example.h
index bee473a494f1..09f7fe7e5fd7 100644
--- a/samples/bpf/libbpf.c
+++ b/samples/bpf/sock_example.h
@@ -1,4 +1,3 @@
-/* eBPF mini library */
 #include 
 #include 
 #include 
@@ -11,7 +10,7 @@
 #include 
 #include "libbpf.h"
 
-int open_raw_sock(const char *name)
+static inline int open_raw_sock(const char *name)
 {
struct sockaddr_ll sll;
int sock;
diff --git a/samples/bpf/sockex1_user.c b/samples/bpf/sockex1_user.c
index 9454448bf198..6cd2feb3e9b3 100644
--- a/samples/bpf/sockex1_user.c
+++ b/samples/bpf/sockex1_user.c
@@ -3,6 +3,7 @@
 #include 
 #include "libbpf.h"
 #include "bpf_load.h"
+#include "sock_example.h"
 #include 
 #include 
 
diff --git a/samples/bpf/sockex2_user.c b/samples/bpf/sockex2_user.c
index 6a40600d5a83..0e0207c90841 100644
--- a/samples/bpf/sockex2_user.c
+++ b/samples/bpf/sockex2_user.c
@@ -3,6 +3,7 @@
 #include 
 #include "libbpf.h"
 #include "bpf_load.h"
+#include "sock_example.h"
 #include 
 #include 
 #include 
diff --git a/samples/bpf/sockex3_user.c b/samples/bpf/sockex3_user.c
index 9099c4255f23..b5524d417eb5 100644
--- a/samples/bpf/sockex3_user.c
+++ b/samples/bpf/sockex3_user.c
@@ -3,6 +3,7 @@
 #include 
 #include "libbpf.h"
 #include "bpf_load.h"
+#include "sock_example.h"
 #include 
 #include 
 #include 
-- 
2.10.2



[PATCHv5 perf/core 4/5] samples/bpf: Remove perf_event_open() declaration

2016-12-16 Thread Joe Stringer
This declaration was made in samples/bpf/libbpf.c for convenience, but
there's already one in tools/perf/perf-sys.h. Reuse that one.

Committer notes:

Testing it:

  $ make -j4 O=../build/v4.9.0-rc8+ samples/bpf/
  make[1]: Entering directory '/home/build/v4.9.0-rc8+'
CHK include/config/kernel.release
GEN ./Makefile
CHK include/generated/uapi/linux/version.h
Using /home/acme/git/linux as source for kernel
CHK include/generated/utsrelease.h
CHK include/generated/timeconst.h
CHK include/generated/bounds.h
CHK include/generated/asm-offsets.h
CALL/home/acme/git/linux/scripts/checksyscalls.sh
HOSTCC  samples/bpf/test_verifier.o
HOSTCC  samples/bpf/libbpf.o
HOSTCC  samples/bpf/../../tools/lib/bpf/bpf.o
HOSTCC  samples/bpf/test_maps.o
HOSTCC  samples/bpf/sock_example.o
HOSTCC  samples/bpf/bpf_load.o

HOSTLD  samples/bpf/trace_event
HOSTLD  samples/bpf/sampleip
HOSTLD  samples/bpf/tc_l2_redirect
  make[1]: Leaving directory '/home/build/v4.9.0-rc8+'
  $

Signed-off-by: Joe Stringer 
Tested-by: Arnaldo Carvalho de Melo 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: Wang Nan 
Link: http://lkml.kernel.org/r/20161209024620.31660-7-...@ovn.org
[ Use -I$(srctree)/tools/lib/ to support out of source code tree builds ]
Signed-off-by: Arnaldo Carvalho de Melo 
---
v5: No change.
---
 samples/bpf/Makefile| 2 ++
 samples/bpf/bpf_load.c  | 3 ++-
 samples/bpf/libbpf.c| 7 ---
 samples/bpf/libbpf.h| 3 ---
 samples/bpf/sampleip_user.c | 3 ++-
 samples/bpf/trace_event_user.c  | 9 +
 samples/bpf/trace_output_user.c | 3 ++-
 samples/bpf/tracex6_user.c  | 3 ++-
 8 files changed, 15 insertions(+), 18 deletions(-)

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 81b0ef2f7994..5a73f5a7ace1 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -109,6 +109,8 @@ always += xdp_tx_iptunnel_kern.o
 HOSTCFLAGS += -I$(objtree)/usr/include
 HOSTCFLAGS += -I$(srctree)/tools/lib/
 HOSTCFLAGS += -I$(srctree)/tools/testing/selftests/bpf/
+HOSTCFLAGS += -I$(srctree)/tools/lib/ -I$(srctree)/tools/include
+HOSTCFLAGS += -I$(srctree)/tools/perf
 
 HOSTCFLAGS_bpf_load.o += -I$(objtree)/usr/include -Wno-unused-variable
 HOSTLOADLIBES_fds_example += -lelf
diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index 1bfb43394013..396e204888b3 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -23,6 +23,7 @@
 #include 
 #include "libbpf.h"
 #include "bpf_load.h"
+#include "perf-sys.h"
 
 #define DEBUGFS "/sys/kernel/debug/tracing/"
 
@@ -179,7 +180,7 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
id = atoi(buf);
attr.config = id;
 
-   efd = perf_event_open(&attr, -1/*pid*/, 0/*cpu*/, -1/*group_fd*/, 0);
+   efd = sys_perf_event_open(&attr, -1/*pid*/, 0/*cpu*/, -1/*group_fd*/, 0);
if (efd < 0) {
printf("event %d fd %d err %s\n", id, efd, strerror(errno));
return -1;
diff --git a/samples/bpf/libbpf.c b/samples/bpf/libbpf.c
index d9af876b4a2c..bee473a494f1 100644
--- a/samples/bpf/libbpf.c
+++ b/samples/bpf/libbpf.c
@@ -34,10 +34,3 @@ int open_raw_sock(const char *name)
 
return sock;
 }
-
-int perf_event_open(struct perf_event_attr *attr, int pid, int cpu,
-   int group_fd, unsigned long flags)
-{
-   return syscall(__NR_perf_event_open, attr, pid, cpu,
-  group_fd, flags);
-}
diff --git a/samples/bpf/libbpf.h b/samples/bpf/libbpf.h
index cc815624aacf..09aedc320009 100644
--- a/samples/bpf/libbpf.h
+++ b/samples/bpf/libbpf.h
@@ -188,7 +188,4 @@ struct bpf_insn;
 /* create RAW socket and bind to interface 'name' */
 int open_raw_sock(const char *name);
 
-struct perf_event_attr;
-int perf_event_open(struct perf_event_attr *attr, int pid, int cpu,
-   int group_fd, unsigned long flags);
 #endif
diff --git a/samples/bpf/sampleip_user.c b/samples/bpf/sampleip_user.c
index 5ac5adf75931..be59d7dcbdde 100644
--- a/samples/bpf/sampleip_user.c
+++ b/samples/bpf/sampleip_user.c
@@ -21,6 +21,7 @@
 #include 
 #include "libbpf.h"
 #include "bpf_load.h"
+#include "perf-sys.h"
 
 #define DEFAULT_FREQ   99
 #define DEFAULT_SECS   5
@@ -49,7 +50,7 @@ static int sampling_start(int *pmu_fd, int freq)
};
 
for (i = 0; i < nr_cpus; i++) {
-   pmu_fd[i] = perf_event_open(&pe_sample_attr, -1 /* pid */, i,
+   pmu_fd[i] = sys_perf_event_open(&pe_sample_attr, -1 /* pid */, i,
-1 /* group_fd */, 0 /* flags */);
if (pmu_fd[i] < 0) {
fprintf(stderr, "ERROR: Initializing perf sampling\n");
diff --git a/samples/bpf/trace_event_user.c b/samples/bpf/trace_event_user.c
index 704fe9fa77b2..0c5561d193a4 

Re: [PATCH 1/1] tools: net: bpf_dbg.c fixed keyboard typo

2016-12-16 Thread Joe Perches
On Fri, 2016-12-16 at 20:21 +0200, Ozgur Karatas wrote:
> This patch fixed to keyboard typo, brackets not closed. 
> I think, it should be close to parenthes.

No.

Please compile and test your patches on your own system
before you send them.



[PATCHv5 perf/core 1/5] samples/bpf: Make samples more libbpf-centric

2016-12-16 Thread Joe Stringer
Switch all of the sample code to use the function names from
tools/lib/bpf so that they're consistent with that, and to declare their
own log buffers. This allows the next commit to be purely devoted to
getting rid of the duplicate library in samples/bpf.

Committer notes:

Testing it:

On a fedora rawhide container, with clang/llvm 3.9, sharing the host
linux kernel git tree:

  # make O=/tmp/build/linux/ headers_install
  # make O=/tmp/build/linux -C samples/bpf/

Since I forgot to make it privileged, just tested it outside the
container, using what it generated:

  # uname -a
  Linux jouet 4.9.0-rc8+ #1 SMP Mon Dec 12 11:20:49 BRT 2016 x86_64 x86_64 x86_64 GNU/Linux
  # cd /var/lib/docker/devicemapper/mnt/c43e09a53ff56c86a07baf79847f00e2cc2a17a1e2220e1adbf8cbc62734feda/rootfs/tmp/build/linux/samples/bpf/
  # ls -la offwaketime
  -rwxr-xr-x. 1 root root 24200 Dec 15 12:19 offwaketime
  # file offwaketime
  offwaketime: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=c940d3f127d5e66cdd680e42d885cb0b64f8a0e4, not stripped
  # readelf -SW offwaketime_kern.o  | grep PROGBITS
  [ 2] .text PROGBITS 40 00 00  AX  0   0  4
  [ 3] kprobe/try_to_wake_up PROGBITS 40 d8 00  AX  0   0  8
  [ 5] tracepoint/sched/sched_switch PROGBITS 000118 000318 00  AX  0   0  8
  [ 7] maps  PROGBITS 000430 50 00  WA  0   0  4
  [ 8] license   PROGBITS 000480 04 00  WA  0   0  1
  [ 9] version   PROGBITS 000484 04 00  WA  0   0  4
  # ./offwaketime | head -5
  swapper/1;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule;__schedule;-;---;; 106
  CPU 0/KVM;entry_SYSCALL_64_fastpath;sys_ioctl;do_vfs_ioctl;kvm_vcpu_ioctl;kvm_arch_vcpu_ioctl_run;kvm_vcpu_block;schedule;__schedule;-;try_to_wake_up;swake_up_locked;swake_up;apic_timer_expired;apic_timer_fn;__hrtimer_run_queues;hrtimer_interrupt;local_apic_timer_interrupt;smp_apic_timer_interrupt;__irqentry_text_start;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary;;swapper/3 2
  Compositor;entry_SYSCALL_64_fastpath;sys_futex;do_futex;futex_wait;futex_wait_queue_me;schedule;__schedule;-;try_to_wake_up;futex_requeue;do_futex;sys_futex;entry_SYSCALL_64_fastpath;;SoftwareVsyncTh 5
  firefox;entry_SYSCALL_64_fastpath;sys_poll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;pollwake;__wake_up_common;__wake_up_sync_key;pipe_write;__vfs_write;vfs_write;sys_write;entry_SYSCALL_64_fastpath;;Timer 13
  JS Helper;entry_SYSCALL_64_fastpath;sys_futex;do_futex;futex_wait;futex_wait_queue_me;schedule;__schedule;-;try_to_wake_up;do_futex;sys_futex;entry_SYSCALL_64_fastpath;;firefox 2
  #

Signed-off-by: Joe Stringer 
Tested-by: Arnaldo Carvalho de Melo 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: Wang Nan 
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20161214224342.12858-2-...@ovn.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
v5: No change.
---
 samples/bpf/bpf_load.c| 17 +---
 samples/bpf/bpf_load.h|  3 +++
 samples/bpf/fds_example.c |  9 ---
 samples/bpf/lathist_user.c|  2 +-
 samples/bpf/libbpf.c  | 23 
 samples/bpf/libbpf.h  | 18 ++---
 samples/bpf/lwt_len_hist_user.c   |  6 +++--
 samples/bpf/offwaketime_user.c|  8 +++---
 samples/bpf/sampleip_user.c   |  4 +--
 samples/bpf/sock_example.c| 12 +
 samples/bpf/sockex1_user.c|  6 ++---
 samples/bpf/sockex2_user.c|  4 +--
 samples/bpf/sockex3_user.c|  4 +--
 samples/bpf/spintest_user.c   |  8 +++---
 samples/bpf/tc_l2_redirect_user.c |  4 +--
 samples/bpf/test_cgrp2_array_pin.c|  4 +--
 samples/bpf/test_cgrp2_attach.c   | 11 +---
 samples/bpf/test_cgrp2_attach2.c  |  7 +++--
 samples/bpf/test_cgrp2_sock.c |  6 +++--
 samples/bpf/test_current_task_under_cgroup_user.c |  8 +++---
 samples/bpf/test_lru_dist.c   | 32 +++
 samples/bpf/test_probe_write_user_user.c  |  2 +-
 samples/bpf/trace_event_user.c| 14 +-
 samples/bpf/trace_output_user.c   |  2 +-
 samples/bpf/tracex2_user.c| 10 +++
 samples/bpf/tracex3_user.c  

[PATCHv5 perf/core 0/5] Reuse libbpf from samples/bpf

2016-12-16 Thread Joe Stringer
Update tools/lib/bpf to provide the remaining bpf wrapper pieces needed by the
samples/bpf/ code, then get rid of all of the duplicate BPF libraries in
samples/bpf/libbpf.[ch].

---
v5: Fixed prog_size vs. instruction count API difference in bpf_load_program()

REBASE: Rebased v3 that was applied to perf/core.
Resolved merge conflict with net-next.
New patch shifts bpf_prog_{attach,detach}() to libbpf.
Drop unnecessary build targets
Drop extra unneeded log buffers

v3: Add ack for first patch.
Split out second patch from v2 into separate changes for remaining diff.
Add patches to switch samples/bpf over to using tools/lib/.

(Was "libbpf: Synchronize implementations")
v2: Don't shift non-bpf code into libbpf.
Drop the patch to synchronize ELF definitions with tc.

v1: https://www.mail-archive.com/netdev@vger.kernel.org/msg135088.html
First post.

Joe Stringer (5):
  samples/bpf: Make samples more libbpf-centric
  samples/bpf: Switch over to libbpf
  tools lib bpf: Add bpf_prog_{attach,detach}
  samples/bpf: Remove perf_event_open() declaration
  samples/bpf: Move open_raw_sock to separate header

 samples/bpf/Makefile  |  70 +
 samples/bpf/README.rst|   4 +-
 samples/bpf/bpf_load.c|  21 ++-
 samples/bpf/bpf_load.h|   3 +
 samples/bpf/fds_example.c |  11 +-
 samples/bpf/lathist_user.c|   2 +-
 samples/bpf/libbpf.c  | 176 --
 samples/bpf/libbpf.h  |  28 +---
 samples/bpf/lwt_len_hist_user.c   |   6 +-
 samples/bpf/offwaketime_user.c|   8 +-
 samples/bpf/sampleip_user.c   |   7 +-
 samples/bpf/sock_example.c|  14 +-
 samples/bpf/sock_example.h|  35 +
 samples/bpf/sockex1_user.c|   7 +-
 samples/bpf/sockex2_user.c|   5 +-
 samples/bpf/sockex3_user.c|   5 +-
 samples/bpf/spintest_user.c   |   8 +-
 samples/bpf/tc_l2_redirect_user.c |   4 +-
 samples/bpf/test_cgrp2_array_pin.c|   4 +-
 samples/bpf/test_cgrp2_attach.c   |  12 +-
 samples/bpf/test_cgrp2_attach2.c  |   8 +-
 samples/bpf/test_cgrp2_sock.c |   7 +-
 samples/bpf/test_current_task_under_cgroup_user.c |   8 +-
 samples/bpf/test_lru_dist.c   |  32 ++--
 samples/bpf/test_probe_write_user_user.c  |   2 +-
 samples/bpf/trace_event_user.c|  23 +--
 samples/bpf/trace_output_user.c   |   5 +-
 samples/bpf/tracex2_user.c|  10 +-
 samples/bpf/tracex3_user.c|   4 +-
 samples/bpf/tracex4_user.c|   4 +-
 samples/bpf/tracex6_user.c|   5 +-
 samples/bpf/xdp1_user.c   |   2 +-
 samples/bpf/xdp_tx_iptunnel_user.c|   6 +-
 tools/lib/bpf/bpf.c   |  21 +++
 tools/lib/bpf/bpf.h   |   3 +
 35 files changed, 238 insertions(+), 332 deletions(-)
 delete mode 100644 samples/bpf/libbpf.c
 create mode 100644 samples/bpf/sock_example.h

-- 
2.10.2
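
The v5 fix concerns a unit mismatch: per the changelog, samples/ passed a
program size to bpf_load_program() while libbpf takes an instruction count.
A minimal sketch of the conversion, assuming the size is given in bytes.
struct insn_sketch is a hypothetical 8-byte stand-in for struct bpf_insn, and
insn_count() is an illustrative helper that belongs to neither library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in with the same 8-byte layout as struct bpf_insn. */
struct insn_sketch {
	uint8_t code;
	uint8_t dst_reg:4;
	uint8_t src_reg:4;
	int16_t off;
	int32_t imm;
};

/* Convert a program size in bytes to a number of instructions. */
static size_t insn_count(size_t prog_size_bytes)
{
	return prog_size_bytes / sizeof(struct insn_sketch);
}
```

With this layout a 16-byte program is two instructions, so passing the byte
size where a count is expected would overstate the length by a factor of
eight.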



[PATCHv5 perf/core 2/5] samples/bpf: Switch over to libbpf

2016-12-16 Thread Joe Stringer
Now that libbpf under tools/lib/bpf/* is synced with the version from
samples/bpf, we can get rid of most of the libbpf library here.

Committer notes:

Tested it the same way as the previous patch in this series.

Signed-off-by: Joe Stringer 
Tested-by: Arnaldo Carvalho de Melo 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: Wang Nan 
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20161214224342.12858-3-...@ovn.org
[ Use -I$(srctree)/tools/lib/ to support out of source code tree builds, as 
noticed by Wang Nan ]
Signed-off-by: Arnaldo Carvalho de Melo 
---
v5: Joe - I took acme's version from his branch tmp.perf/samples-libbpf and
applied an incremental fix for the discrepancy between the
bpf_load_program() API - samples/ was using program size while libbpf uses
instruction count.
---
 samples/bpf/Makefile |  68 +---
 samples/bpf/README.rst   |   4 +-
 samples/bpf/bpf_load.c   |   3 +-
 samples/bpf/fds_example.c|   3 +-
 samples/bpf/libbpf.c | 111 ---
 samples/bpf/libbpf.h |  19 +--
 samples/bpf/sock_example.c   |   3 +-
 samples/bpf/test_cgrp2_attach.c  |   3 +-
 samples/bpf/test_cgrp2_attach2.c |   3 +-
 samples/bpf/test_cgrp2_sock.c|   3 +-
 10 files changed, 52 insertions(+), 168 deletions(-)

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index f2219c1489e5..81b0ef2f7994 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -35,40 +35,43 @@ hostprogs-y += tc_l2_redirect
 hostprogs-y += lwt_len_hist
 hostprogs-y += xdp_tx_iptunnel
 
-test_lru_dist-objs := test_lru_dist.o libbpf.o
-sock_example-objs := sock_example.o libbpf.o
-fds_example-objs := bpf_load.o libbpf.o fds_example.o
-sockex1-objs := bpf_load.o libbpf.o sockex1_user.o
-sockex2-objs := bpf_load.o libbpf.o sockex2_user.o
-sockex3-objs := bpf_load.o libbpf.o sockex3_user.o
-tracex1-objs := bpf_load.o libbpf.o tracex1_user.o
-tracex2-objs := bpf_load.o libbpf.o tracex2_user.o
-tracex3-objs := bpf_load.o libbpf.o tracex3_user.o
-tracex4-objs := bpf_load.o libbpf.o tracex4_user.o
-tracex5-objs := bpf_load.o libbpf.o tracex5_user.o
-tracex6-objs := bpf_load.o libbpf.o tracex6_user.o
-test_probe_write_user-objs := bpf_load.o libbpf.o test_probe_write_user_user.o
-trace_output-objs := bpf_load.o libbpf.o trace_output_user.o
-lathist-objs := bpf_load.o libbpf.o lathist_user.o
-offwaketime-objs := bpf_load.o libbpf.o offwaketime_user.o
-spintest-objs := bpf_load.o libbpf.o spintest_user.o
-map_perf_test-objs := bpf_load.o libbpf.o map_perf_test_user.o
-test_overhead-objs := bpf_load.o libbpf.o test_overhead_user.o
-test_cgrp2_array_pin-objs := libbpf.o test_cgrp2_array_pin.o
-test_cgrp2_attach-objs := libbpf.o test_cgrp2_attach.o
-test_cgrp2_attach2-objs := libbpf.o test_cgrp2_attach2.o cgroup_helpers.o
-test_cgrp2_sock-objs := libbpf.o test_cgrp2_sock.o
-test_cgrp2_sock2-objs := bpf_load.o libbpf.o test_cgrp2_sock2.o
-xdp1-objs := bpf_load.o libbpf.o xdp1_user.o
+# Libbpf dependencies
+LIBBPF := libbpf.o ../../tools/lib/bpf/bpf.o
+
+test_lru_dist-objs := test_lru_dist.o $(LIBBPF)
+sock_example-objs := sock_example.o $(LIBBPF)
+fds_example-objs := bpf_load.o $(LIBBPF) fds_example.o
+sockex1-objs := bpf_load.o $(LIBBPF) sockex1_user.o
+sockex2-objs := bpf_load.o $(LIBBPF) sockex2_user.o
+sockex3-objs := bpf_load.o $(LIBBPF) sockex3_user.o
+tracex1-objs := bpf_load.o $(LIBBPF) tracex1_user.o
+tracex2-objs := bpf_load.o $(LIBBPF) tracex2_user.o
+tracex3-objs := bpf_load.o $(LIBBPF) tracex3_user.o
+tracex4-objs := bpf_load.o $(LIBBPF) tracex4_user.o
+tracex5-objs := bpf_load.o $(LIBBPF) tracex5_user.o
+tracex6-objs := bpf_load.o $(LIBBPF) tracex6_user.o
+test_probe_write_user-objs := bpf_load.o $(LIBBPF) test_probe_write_user_user.o
+trace_output-objs := bpf_load.o $(LIBBPF) trace_output_user.o
+lathist-objs := bpf_load.o $(LIBBPF) lathist_user.o
+offwaketime-objs := bpf_load.o $(LIBBPF) offwaketime_user.o
+spintest-objs := bpf_load.o $(LIBBPF) spintest_user.o
+map_perf_test-objs := bpf_load.o $(LIBBPF) map_perf_test_user.o
+test_overhead-objs := bpf_load.o $(LIBBPF) test_overhead_user.o
+test_cgrp2_array_pin-objs := $(LIBBPF) test_cgrp2_array_pin.o
+test_cgrp2_attach-objs := $(LIBBPF) test_cgrp2_attach.o
+test_cgrp2_attach2-objs := $(LIBBPF) test_cgrp2_attach2.o cgroup_helpers.o
+test_cgrp2_sock-objs := $(LIBBPF) test_cgrp2_sock.o
+test_cgrp2_sock2-objs := bpf_load.o $(LIBBPF) test_cgrp2_sock2.o
+xdp1-objs := bpf_load.o $(LIBBPF) xdp1_user.o
 # reuse xdp1 source intentionally
-xdp2-objs := bpf_load.o libbpf.o xdp1_user.o
-test_current_task_under_cgroup-objs := bpf_load.o libbpf.o cgroup_helpers.o \
+xdp2-objs := bpf_load.o $(LIBBPF) xdp1_user.o
+test_current_task_under_cgroup-objs := bpf_load.o $(LIBBPF) cgroup_helpers.o \
   

Re: [PATCH 2/2] encx24j600: Fix some checkstyle warnings

2016-12-16 Thread David Miller
From: 
Date: Mon, 12 Dec 2016 14:29:09 +0100

> From: Jeroen De Wachter 
> 
> Signed-off-by: Jeroen De Wachter 

Applied.


Re: [PATCH 1/2] encx24j600: bugfix - always move ERXTAIL to next packet in encx24j600_rx_packets

2016-12-16 Thread David Miller
From: 
Date: Mon, 12 Dec 2016 14:29:08 +0100

> From: Jeroen De Wachter 
> 
> Before, encx24j600_rx_packets did not update encx24j600_priv's next_packet
> member when an error occurred during packet handling (either because the
> packet's RSV header indicates an error or because the
> encx24j600_receive_packet method can't allocate an sk_buff).
> 
> If the next_packet member is not updated, the ERXTAIL register will be set to
> the same value it had before, which means the bad packet remains in the
> component's memory and its RSV header will be read again when a new packet
> arrives. If the RSV header indicates a bad packet or if sk_buff allocation
> continues to fail, new packets will be stored in the component's memory until
> that memory is full, after which packets will be dropped.
> 
> The SETPKTDEC command is always executed though, so the encx24j600 hardware
> has an incorrect count of the packets in its memory.
> 
> To prevent this, the next_packet member should always be updated, allowing the
> packet to be skipped (either because it's bad, as indicated in its RSV header,
> or because allocating an sk_buff failed). In the allocation failure case, this
> does mean dropping a valid packet, but dropping the oldest packet to keep as
> much memory as possible available for new packets seems preferable to keeping
> old (but valid) packets around while dropping new ones.
> 
> Signed-off-by: Jeroen De Wachter 

Applied.
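
The reasoning in the quoted commit message can be illustrated with a small
simulation. This is not the driver code: struct rx_sim, rx_one(), rx_all() and
the fixed-size slot array are hypothetical, and only model the rule that the
read position advances whether or not the packet was delivered.

```c
#include <stdbool.h>
#include <stddef.h>

#define SIM_SLOTS 4

/* Hypothetical model of the receive buffer: one flag per pending packet. */
struct rx_sim {
	bool bad[SIM_SLOTS];	/* true if the packet's status marks it bad */
	size_t next_packet;	/* index of the packet to read next */
	size_t pending;		/* packets the hardware believes are queued */
	size_t delivered;	/* packets handed to the stack */
	size_t dropped;		/* packets skipped because of an error */
};

/* Handle one packet; next_packet always advances, even on error. */
static void rx_one(struct rx_sim *s)
{
	if (s->bad[s->next_packet])
		s->dropped++;
	else
		s->delivered++;

	/* The fix: move on regardless of the outcome above. */
	s->next_packet = (s->next_packet + 1) % SIM_SLOTS;
	/* Stand-in for the packet counter decrement (SETPKTDEC). */
	s->pending--;
}

/* Drain all pending packets and return how many were delivered. */
static size_t rx_all(struct rx_sim *s)
{
	while (s->pending > 0)
		rx_one(s);
	return s->delivered;
}
```

Because the position and the counter move together, one bad packet costs one
slot and the good packets queued behind it are still delivered.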


Re: [PATCH 0/2] net: ethernet: hisilicon: set dev->dev.parent before PHY connect

2016-12-16 Thread David Miller
From: Dongpo Li 
Date: Mon, 12 Dec 2016 20:03:41 +0800

> This patch series builds atop:
> ec988ad78ed6d184a7f4ca6b8e962b0e8f1de461 ("phy: Don't increment MDIO bus
> refcount unless it's a different owner")
> 
> I have checked all the hisilicon ethernet drivers and found that only two
> need to be fixed to make sure dev->dev.parent is set before PHY connect.

Series applied, thanks.


[patch net] mlxsw: spectrum: Mark split ports as such

2016-12-16 Thread Jiri Pirko
From: Ido Schimmel 

When a port is split we should mark it as such, as otherwise the split
ports aren't renamed correctly (e.g. sw1p3 -> sw1p3s1) and the unsplit
operation fails:

$ devlink port split sw1p3 count 4
$ devlink port unsplit eth0
devlink answers: Invalid argument
[  598.565307] mlxsw_spectrum :03:00.0 eth0: Port wasn't split

Fixes: 67963a33b4fd ("mlxsw: Make devlink port instances independent of spectrum/switchx2 port instances")
Signed-off-by: Ido Schimmel 
Reported-by: Tamir Winetroub 
Reviewed-by: Elad Raz 
Tested-by: Tamir Winetroub 
Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index fece974..d768c7b 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -2404,7 +2404,7 @@ static int mlxsw_sp_port_create(struct mlxsw_sp *mlxsw_sp, u8 local_port,
local_port);
return err;
}
-   err = __mlxsw_sp_port_create(mlxsw_sp, local_port, false,
+   err = __mlxsw_sp_port_create(mlxsw_sp, local_port, split,
 module, width, lane);
if (err)
goto err_port_create;
-- 
2.7.4



RE: [PATCH RFC] liquidio: make timeout HZ independent

2016-12-16 Thread Chickles, Derek
> -Original Message-
> From: Nicholas Mc Guire [mailto:hof...@osadl.org]
> Sent: Thursday, December 15, 2016 10:57 PM
> To: Chickles, Derek
> Cc: Burla, Satananda; Manlunas, Felix; Vatsavayi, Raghu;
> netdev@vger.kernel.org; linux-ker...@vger.kernel.org; Nicholas Mc Guire
> Subject: [PATCH RFC] liquidio: make timeout HZ independent
> 
> schedule_timeout_* takes a timeout in jiffies but the code currently is
> passing in a constant which makes this timeout HZ dependent, so pass it
> through msecs_to_jiffies() to fix this up.
> 
> Fixes: commit b0d66369edcd ("liquidio VF error handling")
> Signed-off-by: Nicholas Mc Guire 
> ---
> 
> Problem found by coccinelle spatch
> 
> The current wait time can vary by a factor of 10 depending on the HZ
> setting chosen, which does not seem reasonable here.
> 
> The below patch sets the timeout to 100ms - it is though not clear
> if this is the intent or if it should be longer/shorter as it is not
> clear what HZ setting was assumed during design and used for testing.
> 
> This needs an ack by someone who knows the device and can confirm that
> 100ms is reasonable to wait for completion of in-flight requests.

Yes, 100ms was the intent here.

Thanks for catching this.

Derek
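
To make the HZ dependence concrete, here is a small user-space model. It is a sketch under stated assumptions: `model_msecs_to_jiffies()` is a simplified stand-in for the kernel helper (the real one handles more cases, and HZ is a compile-time constant there rather than a parameter), and the raw tick count of 100 used in the note below is a hypothetical value, not taken from the driver.

```c
#include <stdint.h>

/*
 * Simplified model of msecs_to_jiffies(): round up so that a nonzero
 * delay never turns into zero ticks.
 */
static unsigned long model_msecs_to_jiffies(unsigned int ms, unsigned int hz)
{
	return ((unsigned long)ms * hz + 999) / 1000;
}

/* How long a raw tick count lasts, in milliseconds, at a given HZ. */
static unsigned long ticks_to_ms(unsigned long ticks, unsigned int hz)
{
	return ticks * 1000 / hz;
}
```

With these helpers, a raw count of 100 ticks lasts 1000 ms at HZ=100 but only 100 ms at HZ=1000, which is the factor of 10 mentioned above. Converting from milliseconds gives 10, 25 and 100 ticks at HZ=100, 250 and 1000, so the wall-clock delay stays at 100 ms in each case.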




[PATCH 1/1] tools: net: bpf_dbg.c fixed keyboard typo

2016-12-16 Thread Ozgur Karatas

This patch fixes a typo: a bracket is not closed.
I think the parenthesis should be closed here.

Signed-off-by: Ozgur Karatas 
---
 tools/net/bpf_dbg.c   | 2 +-
 1 files changed, 1 insertion(+), 1 deletions(-)

diff --git a/tools/net/bpf_dbg.c b/tools/net/bpf_dbg.c
index 4f254bc..f715f46 100644
--- a/tools/net/bpf_dbg.c
+++ b/tools/net/bpf_dbg.c
@@ -1213,7 +1213,7 @@ static int cmd_disassemble(char *line_string)
 
if (!bpf_prog_loaded())
return CMD_ERR;
-   if (strlen(line_string) > 0 &&
+   if (strlen(line_string) > 0 &&)
(line = strtoul(line_string, NULL, 10)) < bpf_prog_len)
single_line = true;
if (single_line)
-- 
2.1.4


Re: [PATCH net] net: dsa: mv88e6xxx: Fix opps when adding vlan bridge

2016-12-16 Thread David Miller
From: Andrew Lunn 
Date: Sun, 11 Dec 2016 21:07:19 +0100

> A port is not necessarily assigned to a netdev. And a port does not
> need to be a member of a bridge. So when iterating over all ports,
> check before using the netdev and bridge_dev for a port. Otherwise we
> dereference a NULL pointer.
> 
> Fixes: da9c359e19f0 ("net: dsa: mv88e6xxx: check hardware VLAN in use")
> Signed-off-by: Andrew Lunn 

Applied.


Re: net/3com/3c515: Fix timer handling, prevent leaks and crashes

2016-12-16 Thread David Miller
From: Thomas Gleixner 
Date: Sun, 11 Dec 2016 18:31:22 +0100 (CET)

> The timer handling in this driver is broken in several ways:
> 
> - corkscrew_open() initializes and arms a timer before requesting the
>   device interrupt. If the request fails the timer stays armed.
> 
>   A second call to corkscrew_open will unconditionally reinitialize the
>   queued timer and arm it again. Also an immediate device removal will leave
>   the timer queued because close() is not called (open() failed) and
>   therefore nothing issues del_timer().
> 
>   The reinitialization corrupts the link chain in the timer wheel hash
>   bucket and causes a NULL pointer dereference when the timer wheel tries
>   to operate on that hash bucket. Immediate device removal lets the link
>   chain poke into freed and possibly reused memory.
> 
>   Solution: Arm the timer after the successful irq request.
> 
> - corkscrew_close() uses del_timer()
> 
>   On close the timer is disarmed with del_timer() which lets the following
>   code race against a concurrent timer expiry function.
> 
>   Solution: Use del_timer_sync() instead
> 
> - corkscrew_close() calls del_timer() unconditionally
> 
>   del_timer() is invoked even if the timer was never initialized. This
>   works by chance because the struct containing the timer is zeroed at
>   allocation time.
> 
>   Solution: Move the setup of the timer into corkscrew_setup().
> 
> Reported-by: Matthew Whitehead 
> Signed-off-by: Thomas Gleixner 

Applied, thanks Thomas.
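
The first item in the list above comes down to ordering. A toy user-space model, with hypothetical names (`struct model_dev`, `open_arm_first()`, `open_irq_first()`) and no real timer or irq API involved, shows why arming the timer after the irq request removes the leak:

```c
#include <stdbool.h>

/* Hypothetical device state; only the ordering is of interest. */
struct model_dev {
	bool timer_armed;
	bool irq_held;
};

/* Ordering described as broken: arm the timer, then request the irq. */
static int open_arm_first(struct model_dev *d, bool irq_request_fails)
{
	d->timer_armed = true;
	if (irq_request_fails)
		return -1;	/* timer stays armed; close() will not run */
	d->irq_held = true;
	return 0;
}

/* Ordering after the fix: request the irq, arm the timer only on success. */
static int open_irq_first(struct model_dev *d, bool irq_request_fails)
{
	if (irq_request_fails)
		return -1;	/* nothing armed, nothing to undo */
	d->irq_held = true;
	d->timer_armed = true;
	return 0;
}
```

When the irq request fails, the first variant returns an error with `timer_armed` still set, while the second leaves the device untouched.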


Re: [TSN RFC v2 0/9] TSN driver for the kernel

2016-12-16 Thread David Miller
From: Greg 
Date: Fri, 16 Dec 2016 10:12:44 -0800

> On Fri, 2016-12-16 at 18:59 +0100, hen...@austad.us wrote:
>> From: Henrik Austad 
>> 
>> 
>> The driver is directed via ConfigFS as we need userspace to handle
>> stream-reservation (MSRP), discovery and enumeration (IEEE 1722.1) and
>> whatever other management is needed. This also includes running an
>> appropriate PTP daemon (TSN favors gPTP).
> 
> I suggest using a generic netlink interface to communicate with the
> driver to set up and/or configure your drivers.
> 
> I think configfs is frowned upon for network drivers.  YMMV.

Agreed.


Re: [PATCH] bpf: cgroup: annotate pointers in struct cgroup_bpf with __rcu

2016-12-16 Thread Alexei Starovoitov
On Thu, Dec 15, 2016 at 10:53:21AM +0100, Daniel Mack wrote:
> The member 'effective' in 'struct cgroup_bpf' is protected by RCU.
> Annotate it accordingly to squelch a sparse warning.
> 
> Signed-off-by: Daniel Mack 

Acked-by: Alexei Starovoitov 

was only wondering whether this is really needed for net or can wait till net-next.

> ---
>  include/linux/bpf-cgroup.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
> index 7b6e5d1..92bc89a 100644
> --- a/include/linux/bpf-cgroup.h
> +++ b/include/linux/bpf-cgroup.h
> @@ -20,7 +20,7 @@ struct cgroup_bpf {
>* when this cgroup is accessed.
>*/
>   struct bpf_prog *prog[MAX_BPF_ATTACH_TYPE];
> - struct bpf_prog *effective[MAX_BPF_ATTACH_TYPE];
> + struct bpf_prog __rcu *effective[MAX_BPF_ATTACH_TYPE];
>  };
>  
>  void cgroup_bpf_put(struct cgroup *cgrp);
> -- 
> 2.9.3
> 


Re: [net-next PATCH v6 0/5] XDP for virtio_net

2016-12-16 Thread David Miller
From: "Michael S. Tsirkin" 
Date: Fri, 16 Dec 2016 01:17:44 +0200

> OK, I think we can queue this for -next.
> 
> It's fairly limited in the kind of hardware supported, we can and
> probably should extend it further with time.
> 
> Acked-by: Michael S. Tsirkin 

Michael, thanks for reviewing.

Since the substance of this patch series has honestly been ready since
before the merge window, would you mind if I send this to Linus now?

That's what I was hoping I would be able to do.

Thanks again.


Re: [PATCH] cgroup: Fix CGROUP_BPF config

2016-12-16 Thread Alexei Starovoitov
On Fri, Dec 16, 2016 at 08:33:45AM -0800, Andy Lutomirski wrote:
> CGROUP_BPF depended on SOCK_CGROUP_DATA which can't be manually
> enabled, making it rather challenging to turn CGROUP_BPF on.
> 
> Signed-off-by: Andy Lutomirski 

Acked-by: Alexei Starovoitov 



Re: [PATCH net 1/3] cpsw/netcp: cpts depends on posix_timers

2016-12-16 Thread Nicolas Pitre
On Fri, 16 Dec 2016, Arnd Bergmann wrote:

> With posix timers having become optional, we get a build error with
> the cpts time sync option of the CPSW driver:
> 
> drivers/net/ethernet/ti/cpts.c: In function 'cpts_find_ts':
> drivers/net/ethernet/ti/cpts.c:291:23: error: implicit declaration of function 'ptp_classify_raw'; did you mean 'ptp_classifier_init'? [-Werror=implicit-function-declaration]
> 
> It really makes no sense to build this driver if we can't use PTP,
> so it's better to go back to 'select PTP_1588_CLOCK' but instead
> add a dependency on POSIX_TIMERS.

Why not depend on PTP_1588_CLOCK directly instead?

> Fixes: baa73d9e478f ("posix-timers: Make them configurable")
> Signed-off-by: Arnd Bergmann 
> ---
>  drivers/net/ethernet/ti/Kconfig | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/ti/Kconfig b/drivers/net/ethernet/ti/Kconfig
> index 296c8efd0038..366e29ff8605 100644
> --- a/drivers/net/ethernet/ti/Kconfig
> +++ b/drivers/net/ethernet/ti/Kconfig
> @@ -76,7 +76,8 @@ config TI_CPSW
>  config TI_CPTS
>   tristate "TI Common Platform Time Sync (CPTS) Support"
>   depends on TI_CPSW || TI_KEYSTONE_NETCP
> - imply PTP_1588_CLOCK
> + depends on POSIX_TIMERS
> + select PTP_1588_CLOCK
>   ---help---
> This driver supports the Common Platform Time Sync unit of
> the CPSW Ethernet Switch and Keystone 2 1g/10g Switch Subsystem.
> -- 
> 2.9.0
> 
> 


Re: [TSN RFC v2 0/9] TSN driver for the kernel

2016-12-16 Thread Greg
On Fri, 2016-12-16 at 18:59 +0100, hen...@austad.us wrote:
> From: Henrik Austad 
> 
> 
> The driver is directed via ConfigFS as we need userspace to handle
> stream-reservation (MSRP), discovery and enumeration (IEEE 1722.1) and
> whatever other management is needed. This also includes running an
> appropriate PTP daemon (TSN favors gPTP).

I suggest using a generic netlink interface to communicate with the
driver to set up and/or configure your drivers.

I think configfs is frowned upon for network drivers.  YMMV.

- Greg

> 
> Once we have all the required attributes, we can create link using
> mkdir, and use write() to set the attributes. Once ready, specify the
> 'shim' (basically a thin wrapper between TSN and another subsystem) and
> we start pushing out frames.
> 
> The network part: it ties directly into the Rx-handler for receive and
> writes skb's using dev_queue_xmit(). This could probably be improved.
> 
> 2 new fields in netdev_ops have been introduced, and the Intel
> igb-driver has been updated (as this is an AVB-capable NIC which is
> available as a PCI-e card).
> 
> What remains (tl;dr: a lot) a.k.a "Known problems" or "working on it!"
> - tie to (g)PTP properly, currently using ktime_get() for presentation
>   time
> - get time from shim into TSN
> - let shim create/manage buffer
> - redo parts of the link-stuff using RCUs, the current setup is a bit
>   clumsy.
> - The igb-driver does not work properly when compiled with IGB_TSN, some
>   details in setting the register values need to be figured out. I am
>   working on this, but as it stands, the best bet is to load tsn using
>   in_debug=1 to bypass the capability-check. I have had e1000 and sky2
>   running for several days without crashing, igb crashes and burns
>   violently.
> - The ALSA driver does not handle multiple devices very well and is a
>   work in progress.
> 
> * v2: changes since v1
> 
> Changes since v1
> - updated to latest upstream kernel (v4.8)
> - set dedicated enabled-attribute and let shim be stored in own (support
>   future plan for enabling per-shim attributes)
> - fixed endianness issue in bitfields used in tsn-structs
> - Updated some of the trace-events to use trace_class
> - Fix various silly typos
> - Handle disabling of link from hrtimer a bit more gracefully (that
>   actually works-ish).
> - use old skb and size of skb when that is set (Reported by Nikita)
> - Move PCP-codes to NIC and not in the link itself
> - Allow TSN-capable card to be loaded even when in debug-mode (and do
>   not enforce TSN behaviour)
> - Start hooking into ALSA's get_time_info hooks (very much incomplete)
> - use threads for sending frames, wake from hrtimer-callback.
>   This also queues up awaiting timers if we fail to complete the
>   transmit before another timer arrives, it will immediately execute
>   another iteration, so no events should be lost. That being said,
>   should this happen, it is a clear bug as we really should complete
>   well before the next interval.
> - Cleanup link-locking and differentiate between Talker and Listener (as
>   Listener grab link-lock from IRQ context)
> - Change list-lock to spinlock as we may need to take a link-lock whilst
>   holding the master list-lock.
> - Do a more careful dance holding the spinlocks to regions only doing
>   actual update.
> 
> Network driver (I210 only)
> - bring up all Tx-/Rx-queues when igb is in TSN-mode regardless of how
>   many CPUs the system has for I210
> - Correctly calculate the idle_slope in I210's configure hook
> - Update igb-driver with queue-select and return correct queue when
>   sending TSN-frames
> - add IGB_FLAG_QAV_PRIO flag to igb_adapter (to handle proper config of
>   tx-ring when adapter is brought up).
> - add TXDCTL logic (part of preparatory work for TSN) to igb-driver
> - Improve SR(A|B) accounting
> 
> ALSA Shim
> - Allow userspace to grab much smaller chunks of data (down to a single
>   Class A frame for S16_LE 2ch 48kHz).
> - Create the card with index/id pattern to avoid collision with other
>   cards.
> * v1
> 
> Before reading on - this is not even beta, but I'd really appreciate if
> people would comment on the overall architecture and perhaps provide
> some pointers to where I should improve/fix/update
> - thanks!
> 
> This is a very early RFC for a TSN-driver in the kernel. It has been
> floating around in my repo for a while and I would appreciate some
> feedback on the overall design to avoid doing some major blunders.
> 
> There is at least one AVB-driver (the AV-part of TSN) in the kernel
> already. This driver aims to solve a wider scope as TSN can do much more
> than just audio. A very basic ALSA-driver is added to the end that
> allows you to play music between 2 machines using aplay in one end and
> arecord | aplay on the other (some fiddling required). We have plans for
> doing the same for v4l2 eventually (but there are other fishes to fry
> first). The same goes for a TSN_SOCK type approach 

[TSN RFC v2 5/9] Add TSN header for the driver

2016-12-16 Thread henrik
From: Henrik Austad 

This defines the general TSN headers for network packets, the
shim-interface and the central 'tsn_list' structure.

Cc: "David S. Miller" 
Signed-off-by: Henrik Austad 
---
 include/linux/tsn.h | 952 
 1 file changed, 952 insertions(+)
 create mode 100644 include/linux/tsn.h

diff --git a/include/linux/tsn.h b/include/linux/tsn.h
new file mode 100644
index 000..9123b25
--- /dev/null
+++ b/include/linux/tsn.h
@@ -0,0 +1,952 @@
+/*   TSN - Time Sensitive Networking
+ *
+ *   Copyright (C) 2016- Henrik Austad 
+ *
+ *   This program is free software; you can redistribute it and/or modify
+ *   it under the terms of the GNU General Public License as published by
+ *   the Free Software Foundation; either version 2 of the License, or
+ *   (at your option) any later version.
+ *
+ *   This program is distributed in the hope that it will be useful,
+ *   but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *   GNU General Public License for more details.
+ */
+#ifndef _TSN_H
+#define _TSN_H
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * List of current subtype fields in the common header of AVTPDU
+ *
+ * Note: AVTPDU is a remnant of the standards from when it was AVB.
+ *
+ * The list has been updated with the recent values from IEEE 1722, draft 16.
+ */
+enum avtp_subtype {
+   TSN_61883_IIDC = 0, /* IEC 61883/IIDC Format */
+   TSN_MMA_STREAM, /* MMA Streams */
+   TSN_AAF,/* AVTP Audio Format */
+   TSN_CVF,/* Compressed Video Format */
+   TSN_CRF,/* Clock Reference Format */
+   TSN_TSCF,   /* Time-Synchronous Control Format */
+   TSN_SVF,/* SDI Video Format */
+   TSN_RVF,/* Raw Video Format */
+   /* 0x08 - 0x6D reserved */
+   TSN_AEF_CONTINOUS = 0x6e, /* AES Encrypted Format Continuous */
+   TSN_VSF_STREAM, /* Vendor Specific Format Stream */
+   /* 0x70 - 0x7e reserved */
+   TSN_EF_STREAM = 0x7f,   /* Experimental Format Stream */
+   /* 0x80 - 0x81 reserved */
+   TSN_NTSCF = 0x82,   /* Non Time-Synchronous Control Format */
+   /* 0x83 - 0xeb reserved */
+   TSN_ESCF = 0xec,/* ECC Signed Control Format */
+   TSN_EECF,   /* ECC Encrypted Control Format */
+   TSN_AEF_DISCRETE,   /* AES Encrypted Format Discrete */
+   /* 0xef - 0xf9 reserved */
+   TSN_ADP = 0xfa, /* AVDECC Discovery Protocol */
+   TSN_AECP,   /* AVDECC Enumeration and Control Protocol */
+   TSN_ACMP,   /* AVDECC Connection Management Protocol */
+   /* 0xfd reserved */
+   TSN_MAAP = 0xfe,/* MAAP Protocol */
+   TSN_EF_CONTROL, /* Experimental Format Control */
+};
+
+/* Link-states to help error-recovery detected from irq context.
+ */
+enum link_states {
+   LINK_OFF = 0,
+   LINK_RUNNING,
+   LINK_ERROR,
+};
+
+
+/* NOTE NOTE NOTE !!
+ * The headers below use bitfields extensively and verifications
+ * are needed when using little-endian vs big-endian systems.
+ */
+
+/* Common part of avtph header
+ *
+ * AVB Transport Protocol Common Header
+ *
+ * Defined in 1722-2011 Sec. 5.2
+ */
+struct avtp_ch {
+#if defined(__LITTLE_ENDIAN_BITFIELD)
+   /* use avtp_subtype enum.
+*/
+   u8 subtype:7;
+
+   /* Controlframe: 1
+* Dataframe   : 0
+*/
+   u8 cd:1;
+
+   /* Type specific data, part 1 */
+   u8 tsd_1:4;
+
+   /* In current version of AVB, only 0 is valid, all other values
+* are reserved for future versions.
+*/
+   u8 version:3;
+
+   /* Valid StreamID in frame
+*
+* ControlData not related to a specific stream should clear
+* this (and have stream_id = 0), _all_ other values should set
+* this to 1.
+*/
+   u8 sv:1;
+#elif defined(__BIG_ENDIAN_BITFIELD)
+   u8 cd:1;
+   u8 subtype:7;
+   u8 sv:1;
+   u8 version:3;
+   u8 tsd_1:4;
+#else
+#error "Unknown Endianness, cannot determine bitfield ordering"
+#endif
+   /* Type specific data (adjacent to tsd_1, but split due to bitfield) */
+   u16 tsd_2;
+   u64 stream_id;
+
+   /*
+* payload by subtype
+*/
+   u8 pbs[0];
+} __packed;
+
+/* AVTPDU Common Control header format
+ * IEEE 1722#5.3
+ */
+struct avtpc_header {
+#if defined(__LITTLE_ENDIAN_BITFIELD)
+   u8 subtype:7;
+   u8 cd:1;
+   u8 control_data:4;
+   u8 version:3;
+   u8 sv:1;
+   u16 control_data_length:11;
+   u16 status:5;
+#elif defined(__BIG_ENDIAN_BITFIELD)
+   u8 cd:1;
+   u8 subtype:7;
+   u8 sv:1;
+   u8 
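
The NOTE in the header above warns that the bitfields need verification across endianness. One way to sidestep compiler bitfield ordering entirely is to decode the first two octets with shifts and masks. This is a sketch, not part of the patch: `struct avtp_ch_fields` and `avtp_ch_decode()` are hypothetical names, and the bit positions assume that the first field declared in the `__BIG_ENDIAN_BITFIELD` branch of `struct avtp_ch` is the most significant bit on the wire.

```c
#include <stdint.h>

/* Decoded view of the first two octets of the common header. */
struct avtp_ch_fields {
	uint8_t cd;       /* 1 = control frame, 0 = data frame */
	uint8_t subtype;  /* 7 bits, see enum avtp_subtype */
	uint8_t sv;       /* stream_id valid */
	uint8_t version;  /* 3 bits */
	uint8_t tsd_1;    /* 4 bits of type specific data */
};

/* wire must point at at least two octets of a received frame header. */
static struct avtp_ch_fields avtp_ch_decode(const uint8_t *wire)
{
	struct avtp_ch_fields f;

	f.cd      = wire[0] >> 7;           /* top bit of octet 0 */
	f.subtype = wire[0] & 0x7f;         /* low 7 bits of octet 0 */
	f.sv      = wire[1] >> 7;           /* top bit of octet 1 */
	f.version = (wire[1] >> 4) & 0x07;  /* next 3 bits */
	f.tsd_1   = wire[1] & 0x0f;         /* low 4 bits of octet 1 */
	return f;
}
```

Because it only reads bytes, the same code gives the same result on little-endian and big-endian hosts, and can serve as a cross-check for the bitfield layout.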

[TSN RFC v2 6/9] Add TSN machinery to drive the traffic from a shim over the network

2016-12-16 Thread henrik
From: Henrik Austad 

In short summary:

* tsn_core.c is the main driver of tsn, all new links go through
  here and all data to/from the shims is handled here.
  Core also manages the shim-interface.

* tsn_configfs.c is the API to userspace. TSN is driven from userspace
  and a link is created, configured, enabled, disabled and removed
  purely from userspace. All attributes required must be determined by
  userspace, preferably via IEEE 1722.1 (discovery and enumeration).

  New is that setting a shim will not automatically enable it, this is to
  allow shims to expose own attributes via ConfigFS. It will also make the
  steps a bit more obvious.

* tsn_header.c small part that handles the actual header of the frames
  we send. Kept out of core for cleanliness.

* tsn_net.c handles operations towards the networking layer. A *very*
  simple hook for handling backpressure in the tx-queue is added, but this
  is currently nowhere near sufficient.

The current driver is under development. This means that from the moment it
is enabled (with a registered shim), it will send traffic, either 0-traffic
(frames of reserved length but with payload 0) or actual traffic. This will
change once the driver stabilizes.

We also use a kthread to handle the lifting when transmitting frames. This
should remove some of the old timeouts and issues we had when doing all of
this via the hrtimer callback. Should a new timer fire before we are done,
it will be queued up and handled immediately. Note that this is a bug (we
*really* should be done before the next 1ms tick happens).

For more detail, see Documentation/networking/tsn/

Cc: "David S. Miller" 
Signed-off-by: Henrik Austad 
---
 net/Makefile   |1 +
 net/tsn/Makefile   |6 +
 net/tsn/tsn_configfs.c |  673 +++
 net/tsn/tsn_core.c | 1189 
 net/tsn/tsn_header.c   |  162 +++
 net/tsn/tsn_internal.h |  397 
 net/tsn/tsn_net.c  |  392 
 7 files changed, 2820 insertions(+)
 create mode 100644 net/tsn/Makefile
 create mode 100644 net/tsn/tsn_configfs.c
 create mode 100644 net/tsn/tsn_core.c
 create mode 100644 net/tsn/tsn_header.c
 create mode 100644 net/tsn/tsn_internal.h
 create mode 100644 net/tsn/tsn_net.c

diff --git a/net/Makefile b/net/Makefile
index 4cafaa2..a0f7d41 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -81,3 +81,4 @@ obj-y += l3mdev/
 endif
 obj-$(CONFIG_QRTR) += qrtr/
 obj-$(CONFIG_NET_NCSI) += ncsi/
+obj-$(CONFIG_TSN)  += tsn/
diff --git a/net/tsn/Makefile b/net/tsn/Makefile
new file mode 100644
index 000..0d87687
--- /dev/null
+++ b/net/tsn/Makefile
@@ -0,0 +1,6 @@
+#
+# Makefile for the Linux TSN subsystem
+#
+
+obj-$(CONFIG_TSN) += tsn.o
+tsn-objs :=tsn_core.o tsn_configfs.o tsn_net.o tsn_header.o
diff --git a/net/tsn/tsn_configfs.c b/net/tsn/tsn_configfs.c
new file mode 100644
index 000..9ace1aa
--- /dev/null
+++ b/net/tsn/tsn_configfs.c
@@ -0,0 +1,673 @@
+/*
+ *   ConfigFS interface to TSN
+ *   Copyright (C) 2015- Henrik Austad 
+ *
+ *   This program is free software; you can redistribute it and/or modify
+ *   it under the terms of the GNU General Public License as published by
+ *   the Free Software Foundation; either version 2 of the License, or
+ *   (at your option) any later version.
+ *
+ *   This program is distributed in the hope that it will be useful,
+ *   but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *   GNU General Public License for more details.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "tsn_internal.h"
+
+static inline struct tsn_link *to_tsn_link(struct config_item *item)
+{
+   /* this line causes checkpatch to WARN. making checkpatch happy,
+* makes code messy..
+*/
+   return item ? container_of(to_config_group(item), struct tsn_link, group) : NULL;
+}
+
+static inline struct tsn_nic *to_tsn_nic(struct config_group *group)
+{
+   return group ? container_of(group, struct tsn_nic, group) : NULL;
+}
+
+static inline struct tsn_nic *item_to_tsn_nic(struct config_item *item)
+{
+   return item ? container_of(to_config_group(item), struct tsn_nic, group) : NULL;
+}
+
+/* ---
+ * Tier2 attributes
+ *
+ * The content of the links userspace can see/modify
+ * ---
+*/
+static ssize_t _tsn_max_payload_size_show(struct config_item *item,
+ char *page)
+{
+   struct tsn_link *link = to_tsn_link(item);
+
+   if (!link)
+   return -EINVAL;
+   return sprintf(page, "%u\n", (u32)link->max_payload_size);
+}
+
+static ssize_t 

[TSN RFC v2 8/9] AVB ALSA - Add ALSA shim for TSN

2016-12-16 Thread henrik
From: Henrik Austad 

This exposes a *very* rudimentary and simplistic ALSA driver that hooks
into TSN to create a device for userspace.

It currently only supports 44.1/48kHz sampling, 2ch, S16_LE

Userspace is supposed to reserve bandwidth, find StreamID etc.

To use as a Talker:

mkdir /config/tsn/test/eth0/talker
cd /config/tsn/test/eth0/talker
echo 65535 > buffer_size
echo 08:00:27:08:9f:c3 > remote_mac
echo 42 > stream_id
echo alsa > shim
echo on > enabled

 aplay -Ddefault:CARD=avb music.wav

  or

 arecord -r48000 -c2 -f S16_LE | aplay -Ddefault:CARD=avb

Alternatively, if the device is set as Listener:

 arecord -r48000 -c2 -f S16_LE -Ddefault:CARD=avb > file.wav

Cc: Mauro Carvalho Chehab 
Cc: Takashi Iwai 
Cc: Mark Brown 
Signed-off-by: Henrik Austad 
---
 drivers/media/Kconfig|  15 +
 drivers/media/Makefile   |   2 +-
 drivers/media/avb/Makefile   |   5 +
 drivers/media/avb/avb_alsa.c | 793 +++
 drivers/media/avb/tsn_iec61883.h | 152 
 5 files changed, 966 insertions(+), 1 deletion(-)
 create mode 100644 drivers/media/avb/Makefile
 create mode 100644 drivers/media/avb/avb_alsa.c
 create mode 100644 drivers/media/avb/tsn_iec61883.h

diff --git a/drivers/media/Kconfig b/drivers/media/Kconfig
index 7b85402..8250aff 100644
--- a/drivers/media/Kconfig
+++ b/drivers/media/Kconfig
@@ -221,3 +221,18 @@ source "drivers/media/tuners/Kconfig"
 source "drivers/media/dvb-frontends/Kconfig"
 
 endif # MEDIA_SUPPORT
+
+config MEDIA_AVB_ALSA
+   tristate "ALSA part of AVB over TSN"
+   depends on TSN
+   help
+
+ Enable the ALSA device that hooks into TSN and allows the
+ computer to send ethernet frames over the network carrying
+ audio-data to selected hosts.
+
+This must be configured by userspace as MSRP and IEEE 1722.1
+(discovery and enumeration) is not implemented within the
+kernel.
+
+If unsure, say N
\ No newline at end of file
diff --git a/drivers/media/Makefile b/drivers/media/Makefile
index 0deaa93..9dfee62 100644
--- a/drivers/media/Makefile
+++ b/drivers/media/Makefile
@@ -34,4 +34,4 @@ obj-y += rc/
 
 obj-y += common/ platform/ pci/ usb/ mmc/ firewire/ spi/
 obj-$(CONFIG_VIDEO_DEV) += radio/
-
+obj-$(CONFIG_MEDIA_AVB_ALSA) += avb/
diff --git a/drivers/media/avb/Makefile b/drivers/media/avb/Makefile
new file mode 100644
index 000..5d6302c
--- /dev/null
+++ b/drivers/media/avb/Makefile
@@ -0,0 +1,5 @@
+#
+# Makefile for the ALSA shim in AVB/TSN
+#
+
+obj-$(CONFIG_MEDIA_AVB_ALSA) += avb_alsa.o
diff --git a/drivers/media/avb/avb_alsa.c b/drivers/media/avb/avb_alsa.c
new file mode 100644
index 000..bd202f5
--- /dev/null
+++ b/drivers/media/avb/avb_alsa.c
@@ -0,0 +1,793 @@
+/* Copyright 2016 Cisco Systems, Inc. and/or its affiliates. All rights
+ * reserved.
+ *
+ * This program is free software; you may redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; version 2 of the License.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "tsn_iec61883.h"
+
+static int index[SNDRV_CARDS] = SNDRV_DEFAULT_IDX;
+static char *id[SNDRV_CARDS] = SNDRV_DEFAULT_STR;
+module_param_array(index, int, NULL, 0444);
+MODULE_PARM_DESC(index, "Index value for AVB soundcard.");
+module_param_array(id, charp, NULL, 0444);
+MODULE_PARM_DESC(id, "ID string for AVB soundcard.");
+
+struct avb_chip {
+   struct snd_card *card;
+   struct tsn_link *link;
+   struct snd_pcm *pcm;
+   struct snd_pcm_substream *substream;
+
+   /* Need a reference to this when we unregister the platform
+* driver.
+*/
+   struct platform_device *device;
+
+   /* on first copy, we set a few values, use this to make sure we
+* only do this once.
+*/
+   u8 first_copy;
+
+   u8 sample_size;
+   u8 channels;
+
+   /* current idx in 10ms set of frames
+* class A: 80
+* class B: 40
+*
+* This is mostly relevant for 44.1kHz samplefreq
+*/
+   u8 num_10ms_series;
+
+   u32 sample_freq;
+};
+
+/* currently, only playback is implemented in TSN layer
+ *
+
+ * FIXMEs: (should be set according to the active TSN link)
+ * - format
+ * - rates
+ * - channels
+ *
+ * 

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread Jason A. Donenfeld
Hi George,

On Fri, Dec 16, 2016 at 6:36 PM, George Spelvin
 wrote:
> A 128-bit output option was added to SipHash after the initial publication;
> this is just the equivalent in 32-bit.
> Personally, I'd put in a comment saying that "there's a 64-bit output
> variant that's not implemented" and punt until someone finds a need.

That's a good way to think about it. Okay, I'll do precisely that.

> On a 64-bit machine, 64-bit SipHash is *always* faster than 32-bit, and
> should be used always.  Don't even compile the 32-bit code, to prevent
> anyone accidentally using it, and make hsiphash an alias for siphash.

Fascinating! Okay. So I'll alias hsiphash to siphash on 64-bit then. I
like this arrangement.


> Fortunately, the cost of brute-forcing hash functions can be fairly
> exactly quantified, thanks to bitcoin miners.  It currently takes 2^70
> hashes to create one bitcoin block, worth 25 bitcoins ($19,500).  Thus,
> 2^63 hashes cost $152.
>
> Now, there are two factors that must be considered:
> - That's a very very "wholesale" rate.  That's assuming you're doing
>   large numbers of these and can put in the up-front effort designing
>   silicon ASICs to do the attack.
> - That's for a more difficult hash (double sha-256) than SipHash.
>   That's a constant factor, but a pretty significant one.  If the wholesale
>   assumption holds, that might bring the cost down another 6 or 7 bits,
>   to $1-2 per break.
>
> If you're not the NSA and limited to general-purpose silicon, let's
> assume a state of the art GPU (Radeon HD 7970; AMD GPUs seem to do better
> than nVidia).  The bitcoin mining rate for those is about 700M/second,
> 29.4 bits.  So 63 bits is 152502 GPU-days, divided by some factor
> to account for SipHash's high speed compared to two rounds of SHA-2.
> Call it 1000 GPU-days.
>
> It's very doable, but also very non-trivial.  The question is, wouldn't
> it be cheaper and easier just to do a brute-force flooding DDoS?
>
> (This is why I wish the key size could be tweaked up to 80 bits.
> That would take all these numbers out of the reasonable range.)

That's a nice analysis. Might one conclude from that that hsiphash is
not useful for our purposes? Or does it still remain useful for
network facing code?
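
For anyone wanting to re-check the arithmetic in the quoted analysis, a small sketch follows. The inputs (2^70 hashes for $19,500, roughly 700M hashes per second per GPU) are taken from the quoted text; the helper names `cost_usd()` and `days_at_rate()` are made up for this note, and the result says nothing about SipHash itself, only about the scaling.

```c
#include <stdint.h>

/*
 * Cost in dollars of 2^bits hashes, given that 2^ref_bits hashes cost
 * ref_usd. Assumes 0 <= ref_bits - bits <= 63.
 */
static double cost_usd(int bits, int ref_bits, double ref_usd)
{
	return ref_usd / (double)(1ULL << (ref_bits - bits));
}

/*
 * Days needed to compute 2^bits hashes at `rate` hashes per second.
 * Assumes 0 <= bits <= 63.
 */
static double days_at_rate(int bits, double rate)
{
	return (double)(1ULL << bits) / rate / 86400.0;
}
```

`cost_usd(63, 70, 19500.0)` divides by 2^7 = 128 and gives about $152, and `days_at_rate(63, 700e6)` comes out at roughly 152,500 GPU-days, in line with the figures quoted above, before any adjustment for SipHash being cheaper than double SHA-256.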

> Let me consider your second example above, "secure against local users".
> I should dig through your patchset and find the details, but what exactly
> are the consequences of such an attack?  Hasn't a local user already
> got much better ways to DoS the system?

For example, an unpriv'd user putting lots of entries in one hash
bucket for a shared resource that's used by root, like filesystems or
other lookup tables. If he can cause root to use more of root's cpu
schedule budget than otherwise in a directed way, then that's a bad
DoS.

> The thing to remember is that we're worried only about the combination
> of a *new* Linux kernel (new build or under active maintenance) and a
> 32-bit host.  You'd be hard-pressed to find a *single* machine fitting
> that description which is hosting multiple users or VMs and is not 64-bit.
>
> These days, 32-bit CPUs are for embedded applications: network appliances,
> TVs, etc.  That means basically single-user.  Even phones are 64 bit.
> Is this really a threat that needs to be defended against?

I interpret this to indicate all the more reason to alias hsiphash to
siphash on 64-bit, and then the problem space collapses in a clear
way.

> For your first case, network applications, the additional security
> is definitely attractive.  Syncookies are only a DoS, but sequence
> numbers are a real security issue; they can let you inject data into a
> TCP connection.
> With sequence numbers, a large part (32 bits) of the hash output is
> directly observable.

Right. Hence the need for always using full siphash and not hsiphash
for sequence numbers, per my earlier email to David.

>
> I wish we could get away with 64-bit security, but given that the
> modern internet involves attacks from NSA/Spetssvyaz/3PLA, I agree
> it's just not enough.

I take this comment to be relevant for the sequence number case.

For hashtables and hashtable flooding, is it still your opinion that
we will benefit from hsiphash? Or is this final conclusion a rejection
of hsiphash for that too? We're talking about two different use cases,
and your email kind of interleaved both into your analysis, so I'm not
certain as to precisely what your conclusion is for each use case. Can
you clear up the ambiguity?

Jason


[TSN RFC v2 7/9] Add TSN event-tracing

2016-12-16 Thread henrik
From: Henrik Austad 

Provide a fair debug-window into TSN. It tries to use TRACE_CLASS as much
as possible and moves as much as possible of the logic into TP_printk() to
minimize tracing overhead.

Cc: "David S. Miller" 
Cc: Steven Rostedt  (maintainer:TRACING)
Cc: Ingo Molnar  (maintainer:TRACING)
Signed-off-by: Henrik Austad 
---
 include/trace/events/tsn.h | 333 +
 1 file changed, 333 insertions(+)
 create mode 100644 include/trace/events/tsn.h

diff --git a/include/trace/events/tsn.h b/include/trace/events/tsn.h
new file mode 100644
index 000..59c
--- /dev/null
+++ b/include/trace/events/tsn.h
@@ -0,0 +1,333 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM tsn
+
+#if !defined(_TRACE_TSN_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_TSN_H
+
+#include 
+#include 
+
+#include 
+#include 
+/* #include  */
+DECLARE_EVENT_CLASS(tsn_buffer_template,
+
+   TP_PROTO(struct tsn_link *link,
+   size_t bytes),
+
+   TP_ARGS(link, bytes),
+
+   TP_STRUCT__entry(
+   __field(u64, stream_id)
+   __field(size_t, size)
+   __field(size_t, bsize)
+   __field(void *, buffer)
+   __field(void *, head)
+   __field(void *, tail)
+   __field(void *, end)
+   ),
+
+   TP_fast_assign(
+   __entry->stream_id = link->stream_id;
+   __entry->size = bytes;
+   __entry->bsize = link->used_buffer_size;
+   __entry->buffer = link->buffer;
+   __entry->head = link->head;
+   __entry->tail = link->tail;
+   __entry->end = link->end;
+   ),
+
+   TP_printk("stream_id=%llu, copy=%zd, buffer: %zd, avail=%zd, [buffer=%p, head=%p, tail=%p, end=%p]",
+   __entry->stream_id, __entry->size, __entry->bsize,
+   (__entry->head - __entry->tail) % __entry->bsize,
+   __entry->buffer,__entry->head, __entry->tail, __entry->end)
+);
+
+DEFINE_EVENT(tsn_buffer_template, tsn_buffer_write,
+   TP_PROTO(struct tsn_link *link, size_t bytes),
+   TP_ARGS(link, bytes)
+);
+
+DEFINE_EVENT(tsn_buffer_template, tsn_buffer_write_net,
+   TP_PROTO(struct tsn_link *link, size_t bytes),
+   TP_ARGS(link, bytes)
+);
+
+DEFINE_EVENT(tsn_buffer_template, tsn_buffer_read,
+   TP_PROTO(struct tsn_link *link, size_t bytes),
+   TP_ARGS(link, bytes)
+);
+
+DEFINE_EVENT(tsn_buffer_template, tsn_buffer_read_net,
+   TP_PROTO(struct tsn_link *link, size_t bytes),
+   TP_ARGS(link, bytes)
+);
+
+
+DECLARE_EVENT_CLASS(tsn_buffer_update,
+
+   TP_PROTO(struct tsn_link *link,
+   size_t reported_avail),
+
+   TP_ARGS(link, reported_avail),
+
+   TP_STRUCT__entry(
+   __field(u64, stream_id)
+   __field(size_t, bsize)
+   __field(void *, head)
+   __field(void *, tail)
+   __field(size_t, reported_left)
+   __field(size_t, low_water)
+   ),
+
+   TP_fast_assign(
+   __entry->stream_id = link->stream_id;
+   __entry->bsize = link->used_buffer_size;
+   __entry->head = link->head;
+   __entry->tail = link->tail;
+   __entry->reported_left = reported_avail;
+   __entry->low_water = link->low_water_mark;
+   ),
+
+   TP_printk("stream_id=%llu, buffer_size=%zd, avail=%zd, reported=%zd, low_water=%zd",
+   __entry->stream_id, __entry->bsize,
+   (__entry->head - __entry->tail) % __entry->bsize,
+   __entry->reported_left, __entry->low_water)
+);
+
+/* Bytes will be "reported left", i.e. how much more space we have in
+ * the buffer before we wrap.
+ */
+DEFINE_EVENT(tsn_buffer_update, tsn_refill,
+   TP_PROTO(struct tsn_link *link, size_t bytes),
+   TP_ARGS(link, bytes)
+);
+
+DEFINE_EVENT(tsn_buffer_update, tsn_buffer_drain,
+   TP_PROTO(struct tsn_link *link, size_t bytes),
+   TP_ARGS(link, bytes)
+);
+
+TRACE_EVENT(tsn_send_batch,
+
+   TP_PROTO(struct tsn_link *link,
+   int num_send,
+   u64 ts_base_ns,
+   u64 ts_delta_ns),
+
+   TP_ARGS(link, num_send, ts_base_ns, ts_delta_ns),
+
+   TP_STRUCT__entry(
+   __field(u64, stream_id)
+   __field(int, seqnr)
+   __field(int, num_send)
+   __field(u64, ts_base_ns)
+   __field(u64, ts_delta_ns)
+   ),
+
+   TP_fast_assign(
+   __entry->stream_id   = link->stream_id;
+   __entry->seqnr   = (int)link->last_seqnr;
+   __entry->ts_base_ns  = ts_base_ns;
+   __entry->ts_delta_ns = ts_delta_ns;
+   __entry->num_send= num_send;
+   ),
+
+   

[TSN RFC v2 4/9] Adding TSN-driver to Intel I210 controller

2016-12-16 Thread henrik
From: Henrik Austad 

This adds support for loading the igb.ko module with tsn
capabilities. This requires a 2-step approach: first enable TSN in
.config, then load the module with use_tsn=1.

Once enabled and loaded, the controller will be placed in "Qav-mode"
which is when the credit-based shaper is available, 3 of the queues are
removed from regular traffic, max payload is set to 1522 octets (no
jumboframes allowed).

It dumps the registers of interest before and after, so this clutters
kern.log a bit if it is loaded with debug_tsn=1.

Regardless of the number of online CPUs, it will enable *all* Tx-queues, as
2 are required for Qav traffic. This has not been tested extensively, so
there may be some instabilities in this.

Improved SR(A|B) accounting:
Use the idleslope-bins to keep track of how much time is reserved for
each class. This can then be used to strip the vlan-tag on the NIC when
the last stream goes (and also allow for reconfiguration of PCP when the
NIC is not sending TSN traffic).
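
The accounting idea can be sketched in plain userspace C. This is only an
illustration under stated assumptions: the struct, the function name
sr_account() and the way idleslope values are added and subtracted are
hypothetical and do not reproduce the code in igb_tsn.c; only the field names
sra_idleslope_res and srb_idleslope_res are taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

enum sr_class { SR_CLASS_A = 1, SR_CLASS_B = 2 };

/* Hypothetical stand-in for the per-adapter reservation bins. */
struct sr_accounting {
	int32_t sra_idleslope_res;	/* reserved for class A */
	int32_t srb_idleslope_res;	/* reserved for class B */
	bool vlan_added;		/* true while any stream is reserved */
};

/* Add or remove one stream's idleslope; returns 0 on success, -1 on error. */
int sr_account(struct sr_accounting *acc, enum sr_class class,
	       int32_t idleslope, bool add_link)
{
	int32_t *bin;

	if (idleslope < 0)
		return -1;

	switch (class) {
	case SR_CLASS_A:
		bin = &acc->sra_idleslope_res;
		break;
	case SR_CLASS_B:
		bin = &acc->srb_idleslope_res;
		break;
	default:
		return -1;
	}

	if (add_link) {
		if (idleslope > INT32_MAX - *bin)
			return -1;	/* would overflow the bin */
		*bin += idleslope;
	} else {
		if (idleslope > *bin)
			return -1;	/* cannot release more than reserved */
		*bin -= idleslope;
	}

	/* When both bins are empty the last stream is gone. */
	acc->vlan_added = acc->sra_idleslope_res || acc->srb_idleslope_res;
	return 0;
}
```

Once vlan_added drops to false in this sketch, no TSN stream holds a
reservation, which is the point where the vlan-tag could be stripped and the
PCP reconfigured.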

Note: currently this driver is *not* stable, it is still a work in
progress, some points to keep tabs on:
- Set hicred to unlim (for testing this is ok and we avoid some nasty
  calculations)
- once we configure it for TSN, enable credit shaping, do not wait for
  first link to be configured (nobody else should use these queues after
  being configured).
- enable all Tx-/Rx-queues in TSN-mode regardless of num_online_cpus()
- Add 802.1Qav Prio-bit in adapter->flags
Cc: Jeff Kirsher 
Cc: Jesse Brandeburg 
Cc: intel-wired-...@lists.osuosl.org
Cc: "David S. Miller" 
Signed-off-by: Henrik Austad 
---
 drivers/net/ethernet/intel/Kconfig|  18 ++
 drivers/net/ethernet/intel/igb/Makefile   |   2 +-
 drivers/net/ethernet/intel/igb/igb.h  |  26 ++
 drivers/net/ethernet/intel/igb/igb_main.c |  39 ++-
 drivers/net/ethernet/intel/igb/igb_tsn.c  | 468 ++
 5 files changed, 550 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/igb/igb_tsn.c

diff --git a/drivers/net/ethernet/intel/Kconfig b/drivers/net/ethernet/intel/Kconfig
index c0e1743..d4382b4 100644
--- a/drivers/net/ethernet/intel/Kconfig
+++ b/drivers/net/ethernet/intel/Kconfig
@@ -99,6 +99,24 @@ config IGB
  To compile this driver as a module, choose M here. The module
  will be called igb.
 
+config IGB_TSN
+   tristate "TSN Support for Intel(R) 82575/82576 i210 Network Controller"
+   depends on IGB && TSN
+   ---help---
+ This driver supports TSN (AVB) on Intel I210 network controllers.
+
+ When enabled, it will allow the module to be loaded with
+ "use_tsn" which will initialize the controller to A/V-mode
+ instead of legacy-mode. This will take 3 of the tx-queues and
+ place them in 802.1Q QoS mode and enable the credit-based
+ shaper for 2 of the queues.
+
+ If built with this option, but not loaded with use_tsn, the
+ only difference is a slightly larger module, no extra
+ code paths are called.
+
+ If unsure, say No
+
 config IGB_HWMON
bool "Intel(R) PCI-Express Gigabit adapters HWMON support"
default y
diff --git a/drivers/net/ethernet/intel/igb/Makefile b/drivers/net/ethernet/intel/igb/Makefile
index 5bcb2de..1a9b776 100644
--- a/drivers/net/ethernet/intel/igb/Makefile
+++ b/drivers/net/ethernet/intel/igb/Makefile
@@ -33,4 +33,4 @@ obj-$(CONFIG_IGB) += igb.o
 
 igb-objs := igb_main.o igb_ethtool.o e1000_82575.o \
e1000_mac.o e1000_nvm.o e1000_phy.o e1000_mbx.o \
-   e1000_i210.o igb_ptp.o igb_hwmon.o
+   e1000_i210.o igb_ptp.o igb_hwmon.o igb_tsn.o
diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index d11093d..474a5b4 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -394,6 +394,7 @@ struct igb_nfc_filter {
 };
 
 /* board specific private data structure */
+
 struct igb_adapter {
unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)];
 
@@ -519,6 +520,17 @@ struct igb_adapter {
/* lock for RX network flow classification filter */
spinlock_t nfc_lock;
bool etype_bitmap[MAX_ETYPE_FILTER];
+
+#if IS_ENABLED(CONFIG_IGB_TSN)
+   /* Reserved BW for class A and B */
+   s32 sra_idleslope_res;
+   s32 srb_idleslope_res;
+   u8 pcp_hi;
+   u8 pcp_lo;
+   u8 tsn_ready:1;
+   u8 tsn_vlan_added:1;
+   u8 res:6;
+#endif /* IGB_TSN */
 };
 
 /* flags controlling PTP/1588 function */
@@ -540,6 +552,7 @@ struct igb_adapter {
 #define IGB_FLAG_HAS_MSIX  BIT(13)
 #define IGB_FLAG_EEE   BIT(14)
 #define IGB_FLAG_VLAN_PROMISC  BIT(15)
+#define IGB_FLAG_QAV_PRIO  BIT(16)
 
 /* Media Auto Sense */
 #define IGB_MAS_ENABLE_0   0X0001
@@ -603,6 

Re: [PATCH 1/2] bpf: do not use KMALLOC_SHIFT_MAX

2016-12-16 Thread Alexei Starovoitov
On Thu, Dec 15, 2016 at 05:47:21PM +0100, Michal Hocko wrote:
> From: Michal Hocko 
> 
> 01b3f52157ff ("bpf: fix allocation warnings in bpf maps and integer
> overflow") has added checks for the maximum allocatable size. It
> (ab)used KMALLOC_SHIFT_MAX for that purpose. While this is not incorrect
> it is not very clean because we already have KMALLOC_MAX_SIZE for this
> very reason so let's change both checks to use KMALLOC_MAX_SIZE instead.
> 
> Cc: Alexei Starovoitov 
> Signed-off-by: Michal Hocko 

Nack until the patches 1 and 2 are reversed.

The bug that patch 2 fixes was the reason we used KMALLOC_SHIFT_MAX - 1 here
instead of KMALLOC_MAX_SIZE,
so you have to fix the kmalloc vs __alloc_pages_slowpath discrepancy first.
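
The two checks do not reject the same sizes, which is why the ordering
matters. A minimal userspace sketch, assuming a shift of 22 purely for
illustration (the real KMALLOC_SHIFT_MAX depends on the allocator and the
config; the SKETCH_ names and both helper functions are made up here):

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed value for illustration only; the real one depends on the config. */
#define SKETCH_KMALLOC_SHIFT_MAX 22
#define SKETCH_KMALLOC_MAX_SIZE ((size_t)1 << SKETCH_KMALLOC_SHIFT_MAX)

/* Old check: reject anything at or above half of the kmalloc maximum. */
bool rejected_old(size_t value_size)
{
	return value_size >= ((size_t)1 << (SKETCH_KMALLOC_SHIFT_MAX - 1));
}

/* Proposed check: reject only sizes strictly above the kmalloc maximum. */
bool rejected_new(size_t value_size)
{
	return value_size > SKETCH_KMALLOC_MAX_SIZE;
}
```

Every size from half of the maximum up to the maximum itself is rejected by
the old check and accepted by the new one, so those are exactly the
allocations that start reaching the allocator once patch 1 is applied.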

> ---
>  kernel/bpf/arraymap.c | 2 +-
>  kernel/bpf/hashtab.c  | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> index a2ac051c342f..229a5d5df977 100644
> --- a/kernel/bpf/arraymap.c
> +++ b/kernel/bpf/arraymap.c
> @@ -56,7 +56,7 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
>   attr->value_size == 0 || attr->map_flags)
>   return ERR_PTR(-EINVAL);
>  
> - if (attr->value_size >= 1 << (KMALLOC_SHIFT_MAX - 1))
> + if (attr->value_size > KMALLOC_MAX_SIZE)
>   /* if value_size is bigger, the user space won't be able to
>* access the elements.
>*/
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index ad1bc67aff1b..c5ec7dc71c84 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -181,7 +181,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
>*/
>   goto free_htab;
>  
> - if (htab->map.value_size >= (1 << (KMALLOC_SHIFT_MAX - 1)) -
> + if (htab->map.value_size >= KMALLOC_MAX_SIZE -
>   MAX_BPF_STACK - sizeof(struct htab_elem))
>   /* if value_size is bigger, the user space won't be able to
>* access the elements via bpf syscall. This check also makes
> -- 
> 2.10.2
> 


[TSN RFC v2 1/9] igb: add missing fields to TXDCTL-register

2016-12-16 Thread henrik
From: Henrik Austad 

The current list of E1000_TXDCTL-registers is incomplete. This adds the
missing parts for the Transmit Descriptor Control (TXDCTL) register.

The rest of these values (threshold for descriptor read/write) for
TXDCTL seems to be defined in igb/igb.h, not sure why this is split
though.

It seems that this was left out in the commit that added support for
82575 Gigabit Ethernet driver 9d5c8243 (igb: PCI-Express 82575 Gigabit
Ethernet driver).

Cc: linux-ker...@vger.kernel.org
Cc: Jeff Kirsher 
Cc: intel-wired-...@lists.osuosl.org
Signed-off-by: Henrik Austad 
---
 drivers/net/ethernet/intel/igb/e1000_82575.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/e1000_82575.h b/drivers/net/ethernet/intel/igb/e1000_82575.h
index acf0605..7faa482 100644
--- a/drivers/net/ethernet/intel/igb/e1000_82575.h
+++ b/drivers/net/ethernet/intel/igb/e1000_82575.h
@@ -158,7 +158,11 @@ struct e1000_adv_tx_context_desc {
 
 /* Additional Transmit Descriptor Control definitions */
 #define E1000_TXDCTL_QUEUE_ENABLE  0x0200 /* Enable specific Tx Queue */
+
+/* Transmit Software Flush, sw-triggered desc writeback */
+#define E1000_TXDCTL_SWFLSH0x0400
 /* Tx Queue Arbitration Priority 0=low, 1=high */
+#define E1000_TXDCTL_PRIORITY  0x0800
 
 /* Additional Receive Descriptor Control definitions */
 #define E1000_RXDCTL_QUEUE_ENABLE  0x0200 /* Enable specific Rx Queue */
-- 
2.7.4



[TSN RFC v2 3/9] TSN: Add the standard formerly known as AVB to the kernel

2016-12-16 Thread henrik
From: Henrik Austad 

TSN provides a mechanism to create reliable, jitter-free, low latency
guaranteed bandwidth links over a local network. It does this by
reserving a path through the network. Support for TSN must be found in
both the NIC as well as in the network itself.

This adds required hooks into netdev_ops so that the core TSN driver can
use this when configuring a new NIC or setting up a new link. It also
provides hook for removing a link and reducing the idle_slope parameter on
the NIC.

(We need to set the PCP values when we first configure the link. This
 value should not change as long as we have valid streams running, and in
 most cases, the PCP for the domain will not change.)

Cc: "David S. Miller" 
Signed-off-by: Henrik Austad 
---
 include/linux/netdevice.h | 44 
 net/Kconfig   |  1 +
 net/tsn/Kconfig   | 32 
 3 files changed, 77 insertions(+)
 create mode 100644 net/tsn/Kconfig

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e16a2a9..0d758aa 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -112,6 +112,15 @@ enum netdev_tx {
 };
 typedef enum netdev_tx netdev_tx_t;
 
+#if IS_ENABLED(CONFIG_TSN)
+enum sr_class {
+   SR_CLASS_A = 1,
+   SR_CLASS_B = 2,
+   SR_CLASS_LAST,
+   SR_CLASS_ERR,
+};
+#endif
+
 /*
  * Current order: NETDEV_TX_MASK > NET_XMIT_MASK >= 0 is significant;
  * hard_start_xmit() return < NET_XMIT_MASK means skb was consumed.
@@ -944,6 +953,31 @@ struct netdev_xdp {
  *
  * void (*ndo_poll_controller)(struct net_device *dev);
  *
+ * TSN functions (if CONFIG_TSN)
+ *
+ * int (*ndo_tsn_capable)(struct net_device *dev);
+ * If a particular device is capable of sustaining TSN traffic
+ * provided current configuration
+ *
+ * int (*ndo_tsn_link_configure)(struct net_device *dev,
+ *  enum sr_class class,
+ *  u16 framesize,
+ *  u16 vid,
+ *  u8 add_link,
+ *  u8 pcp_hi,
+ *  u8 pcp_lo)
+);
+ * Configure a NIC to handle TSN-streams
+ * - Update the bandwidth for the particular stream-class.
+ * - The framesize is the size of the _entire_ frame (not just the payload)
+ *   since the full size is required to allocate bandwidth through
+ *   the credit based shaper in the NIC
+ * - the vlan_id is the configured vlan for TSN in this session.
+ * - add_link: if the link should be added or subtracted from the current
+ *   budget.
+ *- u8 pcp_hi: 802.1Q priority value for high-class traffic (class A)
+ *- u8 pcp_lo: 802.1Q priority value for low-class traffic (class B)
+ *
  * SR-IOV management functions.
  * int (*ndo_set_vf_mac)(struct net_device *dev, int vf, u8* mac);
  * int (*ndo_set_vf_vlan)(struct net_device *dev, int vf, u16 vlan,
@@ -1185,6 +1219,16 @@ struct net_device_ops {
 #ifdef CONFIG_NET_RX_BUSY_POLL
int (*ndo_busy_poll)(struct napi_struct *dev);
 #endif
+
+#if IS_ENABLED(CONFIG_TSN)
+   int (*ndo_tsn_capable)(struct net_device *dev);
+   int (*ndo_tsn_link_configure)(struct net_device *dev,
+ enum sr_class class,
+ u16 framesize,
+ u16 vid, u8 add_link,
+ u8 pcp_hi, u8 pcp_lo);
+#endif /* CONFIG_TSN */
+
int (*ndo_set_vf_mac)(struct net_device *dev,
  int queue, u8 *mac);
int (*ndo_set_vf_vlan)(struct net_device *dev,
diff --git a/net/Kconfig b/net/Kconfig
index 7b6cd34..19b8f9a 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -215,6 +215,7 @@ source "net/802/Kconfig"
 source "net/bridge/Kconfig"
 source "net/dsa/Kconfig"
 source "net/8021q/Kconfig"
+source "net/tsn/Kconfig"
 source "net/decnet/Kconfig"
 source "net/llc/Kconfig"
 source "net/ipx/Kconfig"
diff --git a/net/tsn/Kconfig b/net/tsn/Kconfig
new file mode 100644
index 000..1fc3c1d
--- /dev/null
+++ b/net/tsn/Kconfig
@@ -0,0 +1,32 @@
+#
+# Configuration for 802.1 Time Sensitive Networking (TSN)
+#
+
+config TSN
+   tristate "802.1 TSN Support"
+   depends on VLAN_8021Q && PTP_1588_CLOCK && CONFIGFS_FS
+   ---help---
+ Select this if you want to enable TSN on capable interfaces.
+
+ TSN allows you to set up deterministic links on your LAN (only
+ L2 is currently supported). Once loaded, the driver will probe
+ all available interfaces if they are capable of supporting TSN
+ links.
+
+ Once loaded, a directory in 

[TSN RFC v2 0/9] TSN driver for the kernel

2016-12-16 Thread henrik
From: Henrik Austad 


The driver is directed via ConfigFS as we need userspace to handle
stream-reservation (MSRP), discovery and enumeration (IEEE 1722.1) and
whatever other management is needed. This also includes running an
appropriate PTP daemon (TSN favors gPTP).

Once we have all the required attributes, we can create link using
mkdir, and use write() to set the attributes. Once ready, specify the
'shim' (basically a thin wrapper between TSN and another subsystem) and
we start pushing out frames.

The network part: it ties directly into the Rx-handler for receive and
writes skb's using dev_queue_xmit(). This could probably be improved.

2 new fields in netdev_ops have been introduced, and the Intel
igb-driver has been updated (as this is an AVB-capable NIC which is
available as a PCI-e card).

What remains (tl;dr: a lot) a.k.a "Known problems" or "working on it!"
- tie to (g)PTP properly, currently using ktime_get() for presentation
  time
- get time from shim into TSN
- let shim create/manage buffer
- redo parts of the link-stuff using RCUs, the current setup is a bit
  clumsy.
- The igb-driver does not work properly when compiled with IGB_TSN, some
  details in setting the register values need to be figured out. I am
  working on this, but as it stands, the best bet is to load tsn using
  in_debug=1 to bypass the capability-check. I have had e1000 and sky2
  running for several days without crashing, igb crashes and burns
  violently.
- The ALSA driver does not handle multiple devices very well and is a
  work in progress.

* v2: changes since v1

Changes since v1
- updated to latest upstream kernel (v4.8)
- set dedicated enabled-attribute and let shim be stored in own (support
  future plan for enabling per-shim attributes)
- fixed endianness issue in bitfields used in tsn-structs
- Updated some of the trace-events to use trace_class
- Fix various silly typos
- Handle disabling of link from hrtimer a bit more gracefully (that
  actually works-ish).
- use old skb and size of skb when that is set (Reported by Nikita)
- Move PCP-codes to NIC and not in the link itself
- Allow TSN-capable card to be loaded even when in debug-mode (and do
  not enforce TSN behaviour)
- Start hooking into ALSA's get_time_info hooks (very much incomplete)
- use threads for sending frames, wake from hrtimer-callback.
  This also queues up awaiting timers if we fail to complete the
  transmit before another timer arrives, it will immediately execute
  another iteration, so no events should be lost. That being said,
  should this happen, it is a clear bug as we really should complete
  well before the next interval.
- Cleanup link-locking and differentiate between Talker and Listener (as
  Listener grab link-lock from IRQ context)
- Change list-lock to spinlock as we may need to take a link-lock whilst
  holding the master list-lock.
- Do a more careful dance holding the spinlocks to regions only doing
  actual update.

Network driver (I210 only)
- bring up all Tx-/Rx-queues when igb is in TSN-mode regardless of how
  many CPUs the system has for I210
- Correctly calculate the idle_slope in I210's configure hook
- Update igb-driver with queue-select and return correct queue when
  sending TSN-frames
- add IGB_FLAG_QAV_PRIO flag to igb_adapter (to handle proper config of
  tx-ring when adapter is brought up).
- add TXDCTL logic (part of preparatory work for TSN) to igb-driver
- Improve SR(A|B) accounting

ALSA Shim
- Allow userspace to grab much smaller chunks of data (down to a single
  Class A frame for S16_LE 2ch 48kHz).
- Create the card with index/id pattern to avoid collision with other
  cards.
* v1

Before reading on - this is not even beta, but I'd really appreciate if
people would comment on the overall architecture and perhaps provide
some pointers to where I should improve/fix/update
- thanks!

This is a very early RFC for a TSN-driver in the kernel. It has been
floating around in my repo for a while and I would appreciate some
feedback on the overall design to avoid doing some major blunders.

There is at least one AVB-driver (the AV-part of TSN) in the kernel
already. This driver aims to solve a wider scope as TSN can do much more
than just audio. A very basic ALSA-driver is added to the end that
allows you to play music between 2 machines using aplay in one end and
arecord | aplay on the other (some fiddling required). We have plans for
doing the same for v4l2 eventually (but there are other fishes to fry
first). The same goes for a TSN_SOCK type approach as well.

Henrik Austad (9):
  igb: add missing fields to TXDCTL-register
  TSN: add documentation
  TSN: Add the standard formerly known as AVB to the kernel
  Adding TSN-driver to Intel I210 controller
  Add TSN header for the driver
  Add TSN machinery to drive the traffic from a shim over the network
  Add TSN event-tracing
  AVB ALSA - Add ALSA shim for TSN
  MAINTAINERS: add TSN/AVB-entries

 Documentation/TSN/tsn.txt

[TSN RFC v2 9/9] MAINTAINERS: add TSN/AVB-entries

2016-12-16 Thread henrik
From: Henrik Austad 

Not sure how relevant this is other than making a point about
maintaining it.

Signed-off-by: Henrik Austad 
---
 MAINTAINERS | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 63cefa6..7c5afd2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12295,6 +12295,20 @@ T: git git://linuxtv.org/anttip/media_tree.git
 S: Maintained
 F: drivers/media/tuners/tua9001*
 
+TSN CORE DRIVER
+M: Henrik Austad 
+L: linux-ker...@vger.kernel.org
+S: Supported
+F: drivers/net/tsn/
+F: include/linux/tsn.h
+F: include/trace/events/tsn.h
+
+TSN_AVB_DRIVER
+M: Henrik Austad 
+L: alsa-de...@alsa-project.org (moderated for non-subscribers)
+S: Supported
+F: drivers/media/avb/
+
 TULIP NETWORK DRIVERS
 L: netdev@vger.kernel.org
 L: linux-par...@vger.kernel.org
-- 
2.7.4



[TSN RFC v2 2/9] TSN: add documentation

2016-12-16 Thread henrik
From: Henrik Austad 

Describe the overall design behind the TSN standard, the TSN-driver,
requirements to userspace and new functionality introduced.

Cc: "David S. Miller" 
Signed-off-by: Henrik Austad 
---
 Documentation/TSN/tsn.txt | 345 ++
 1 file changed, 345 insertions(+)
 create mode 100644 Documentation/TSN/tsn.txt

diff --git a/Documentation/TSN/tsn.txt b/Documentation/TSN/tsn.txt
new file mode 100644
index 000..540246f
--- /dev/null
+++ b/Documentation/TSN/tsn.txt
@@ -0,0 +1,345 @@
+Time Sensitive Networking (TSN)
+---
+
+[work in progress]
+
+1. Motivation
+=
+
+TSN is a set of open standards, formerly known as 'AVB' (Audio/Video
+Bridging). It was renamed to TSN to better reflect that it can do much
+more than just media transport and extended to handle more types of
+traffic.
+
+TSN is a way to create reliable, deterministic streams across a network
+without loss of frames due to congestion in the network. By using gPTP
+(a specialized IEEE-1588v2 PTP profile), the time can be synchronized
+with sub-us granularity across all the connected devices in the AVB
+domain.
+
+In its current version, this driver only supports L2 traffic (i.e.
+Ethernet frames only), but a later version is planned to handle L3. L2-L3
+traversing is currently being worked on by the IETF detnet working
+group.
+
+2. Intro to AVB/TSN
+===
+
+The original standards were written with Audio/Video in mind, so the
+initial standards refer to this as 'AVB'. In later standards, this has
+changed to TSN, and AVB now refers to a service you can add on top of
+TSN. In some parts of the driver, this naming shines through, in
+particular for AVTP (AVB Transport Protocol), and this is to reflect the
+naming in the standards.
+
+In this document, we refer to the infrastructure part as TSN, and AVB to
+the ALSA/V4L2 shim which can be added on top of TSN to provide a
+media-service.
+
+TSN operates with 'streams', and one stream can contain pretty much
+whatever you like. An AVB stream carrying audio can carry multiple
+channels. The current revision of AVTP (defined in IEEE 1722 d16)
+defines many more types than media.
+
+A stream flows through the network from a Talker to a Listener. A Talker
+is a single End-station in the network, a Listener can be a single
+End-station (unicast) or a group of end-stations (multicast).
+
+2.1 Domains
+
+2.1.1 SRP Domain
+
+An SRP domain is the set of entities in the network that support the
+Stream Reservation Protocol (IEEE 802.1Q-2014 Sec 35) and where all
+entities agree on the priority code points (PCP). A bridge will mark
+each port as either SRP capable or not capable.
+
+PCP is used to map a specific priority to a given traffic-class,
+typically class A or B.
+
+2.1.2 gPTP domain
+
+This is the set of all connected bridges and end-stations that support
+the gPTP protocol. gPTP is a PTPv2 profile.
+
+2.1.3 AVB Domain
+
+An AVB domain is the intersection of an SRP Domain and gPTP domain.
+
+
+2.2 End Station (ES)
+
+A TSN ES is where a stream either originates or ends - what others would
+call sources (Talkers) and sinks (Listeners). Looking back at pre-TSN
+when this was called AVB, these names make a bit more sense.
+
+Common for both types, they need to be PTPv2 capable, i.e. you need to
+timestamp gPTP frames upon ingress/egress to improve the accuracy of
+PTP.
+
+2.2.1 Talkers
+
+A Talker must be a single ES in the AVB Domain.
+
+Hardware requirements:
+- Multiple Tx-queues
+- Credit based shaper on at least one of the queues for pacing the
+  frames onto the network
+- VLAN capable
+
+2.2.2 Listener
+
+A Listener does not have the same requirements as a Talker as it cannot
+control the pace of the incoming frames anyway. It is beneficial if the
+NIC understands VLANs and has a few Rx-queues so that you can steer all
+TSN-frames to a dedicated queue, but this is not a hard requirement.
+
+If the Listener receives audio, having an adjustable PLL is a clear
+benefit to avoid resampling.
+
+2.3 Bridges
+
+A Bridge is what TSN calls switches that are TSN-capable. They must be
+able to prioritize TSN-streams, have the credit-based shaper available
+for that class, support SRP, support gPTP and so on. The requirements are
+laid down in "Forwarding and Queueing of Time Sensitive Streams" (IEEE
+802.1Q-2014 sec. 34).
+
+2.4 Relevant standards
+
+* IEEE 802.1BA-2011 Audio Video Bridging (AVB) Systems
+
+* IEEE 802.1Q-2014 sec 34 and 35
+
+  What is referred to as:
+  IEEE 802.1Qav (Forwarding and Queueing for Time-sensitive Streams)
+  IEEE 802.1Qat (Stream Reservation Protocol)
+
+* IEEE 802.1AS gPTP
+
+  A PTPv2 profile (from IEEE 1588) tailored for this domain. Notable
+  changes include the requirement that all nodes in the network must be
+  gPTP capable (i.e. no traversing non-PTP entities), and it allows
+ 

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
> It appears that hsiphash can produce either 32-bit output or 64-bit
> output, with the output length parameter as part of the hash algorithm
> in there. When I code this for my kernel patchset, I very likely will
> only implement one output length size. Right now I'm leaning toward
> 32-bit.

A 128-bit output option was added to SipHash after the initial publication;
this is just the equivalent in 32-bit.

> - Is this a reasonable choice?

Yes.

> - Are there reasons why hsiphash with 64-bit output would be
>   reasonable? Or will we be fine sticking with 32-bit output only?

Personally, I'd put in a comment saying that "there's a 64-bit output
variant that's not implemented" and punt until someone finds a need.

> With both hsiphash and siphash, the division of usage will probably become:
> - Use 64-bit output 128-bit key siphash for keyed RNG-like things,
>   such as syncookies and sequence numbers
> - Use 64-bit output 128-bit key siphash for hashtables that must
>   absolutely be secure to an extremely high bandwidth attacker, such as
>   userspace directly DoSing a kernel hashtable
> - Use 32-bit output 64-bit key hsiphash for quick hashtable functions
>   that still must be secure but do not require as large of a security
>   margin.

On a 64-bit machine, 64-bit SipHash is *always* faster than 32-bit, and
should be used always.  Don't even compile the 32-bit code, to prevent
anyone accidentally using it, and make hsiphash an alias for siphash.
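
A rough userspace sketch of that aliasing, for illustration only. The two
_stub functions are placeholders and are NOT SipHash or HalfSipHash; their
names and signatures are made up here, and ULONG_MAX stands in for the
kernel's BITS_PER_LONG test:

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder only, NOT SipHash: stands in for the 64-bit-word function. */
uint64_t siphash_stub(const void *data, size_t len, uint64_t key)
{
	const unsigned char *p = data;
	uint64_t h = key;

	while (len--)
		h = (h ^ *p++) * 0x100000001b3ULL;
	return h;
}

#if ULONG_MAX > 0xffffffffUL
/* 64-bit long: no separate 32-bit code is compiled, the name is an alias. */
#define hsiphash_stub(data, len, key) siphash_stub(data, len, key)
#else
/* 32-bit long: placeholder for a distinct, cheaper 32-bit-word function. */
uint32_t hsiphash_stub(const void *data, size_t len, uint32_t key)
{
	const unsigned char *p = data;
	uint32_t h = key;

	while (len--)
		h = (h ^ *p++) * 0x01000193u;
	return h;
}
#endif
```

Callers write hsiphash_stub() everywhere; on a 64-bit build there is simply
no weaker function left to call by accident.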

On a 32-bit machine, it's a much trickier case.  I'd be tempted to
use the 32-bit code always, but it needs examination.

Fortunately, the cost of brute-forcing hash functions can be fairly
exactly quantified, thanks to bitcoin miners.  It currently takes 2^70
hashes to create one bitcoin block, worth 25 bitcoins ($19,500).  Thus,
2^63 hashes cost $152.

Now, there are two factors that must be considered:
- That's a very very "wholesale" rate.  That's assuming you're doing
  large numbers of these and can put in the up-front effort designing
  silicon ASICs to do the attack.
- That's for a more difficult hash (double sha-256) than SipHash.
  That's a constant factor, but a pretty significant one.  If the wholesale
  assumption holds, that might bring the cost down another 6 or 7 bits,
  to $1-2 per break.

If you're not the NSA and limited to general-purpose silicon, let's
assume a state of the art GPU (Radeon HD 7970; AMD GPUs seem to do better
than nVidia).  The bitcoin mining rate for those is about 700M/second,
29.4 bits.  So 63 bits is 152502 GPU-days, divided by some factor
to account for SipHash's high speed compared to two rounds of SHA-2.
Call it 1000 GPU-days.
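
The arithmetic above can be reproduced with a few lines of C. This is only a
sketch: the function names are made up, the inputs are the figures quoted in
this mail, and the speed-up factor for SipHash versus double SHA-256 is
deliberately left out because it is an estimate, not a computed value.

```c
#include <stdint.h>

/*
 * Dollar cost of 2^target_bits hashes, given that 2^block_bits hashes
 * cost block_usd.  Requires 0 <= block_bits - target_bits < 64.
 */
double usd_for_hashes(unsigned int block_bits, double block_usd,
		      unsigned int target_bits)
{
	return block_usd / (double)(UINT64_C(1) << (block_bits - target_bits));
}

/* Days needed to compute 2^bits hashes (bits < 64) at hashes_per_sec. */
double gpu_days(unsigned int bits, double hashes_per_sec)
{
	return (double)(UINT64_C(1) << bits) / hashes_per_sec / 86400.0;
}
```

usd_for_hashes(70, 19500.0, 63) gives roughly $152 and gpu_days(63, 700e6)
gives roughly 152502 days, matching the numbers above before the SipHash
speed-up factor is applied.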

It's very doable, but also very non-trivial.  The question is, wouldn't
it be cheaper and easier just to do a brute-force flooding DDoS?

(This is why I wish the key size could be tweaked up to 80 bits.
That would take all these numbers out of the reasonable range.)


Let me consider your second example above, "secure against local users".
I should dig through your patchset and find the details, but what exactly
are the consequences of such an attack?  Hasn't a local user already
got much better ways to DoS the system?

The thing to remember is that we're worried only about the combination
of a *new* Linux kernel (new build or under active maintenance) and a
32-bit host.  You'd be hard-pressed to find a *single* machine fitting
that description which is hosting multiple users or VMs and is not 64-bit.

These days, 32-bit CPUs are for embedded applications: network appliances,
TVs, etc.  That means basically single-user.  Even phones are 64 bit.
Is this really a threat that needs to be defended against?


For your first case, network applications, the additional security
is definitely attractive.  Syncookies are only a DoS, but sequence
numbers are a real security issue; they can let you inject data into a
TCP connection.

Hash tables are much harder to attack.  The information you get back from
timing probes is statistical, and thus testing a key is more expensive.
With sequence numbers, a large part (32 bits) of the hash output is
directly observable.

I wish we could get away with 64-bit security, but given that the
modern internet involves attacks from NSA/Spetssvyaz/3PLA, I agree
it's just not enough.


Re: [PULL] virtio, vhost: new device, fixes, speedups

2016-12-16 Thread Linus Torvalds
On Fri, Dec 16, 2016 at 9:09 AM, Michael S. Tsirkin  wrote:
>
> Oh, that's because I set orderfile globally rather than
> just for the qemu project which wants it.
> Fixed, sorry about that.

That explains it. I should have remembered, I think this came up once
before with somebody else.

Yeah, for the kernel it makes things much easier (at least for me) to
have everything just the default alphabetical ordering, particularly
because we use directory structure for maintenance areas.

So ordering the diffs by type ends up breaking my mental model for "is
this pull request touching the right files", which is why I reacted.

 Linus

