date:20071031

Re: [PATCH v2] using mii-bitbang on different processor ports

2007-10-31 Thread Sergej Stepanov

On Tuesday, 30.10.2007 at 13:23 -0500, Scott Wood wrote:
 Sergej Stepanov wrote:
  +   if( !of_address_to_resource(np, 1, &res[1])) {
 
 The spacing is still wrong.
 
  -   iounmap(bitbang->dir);
  +   if ( bitbang->mdio.dir != bitbang->mdc.dir)
  +   iounmap(bitbang->mdio.dir);
  +   iounmap(bitbang->mdc.dir);
 
 And here.
 
 -Scott
Oh, sorry.
-- 
Sergej I. Stepanov
E-PA
IDS GmbH
Nobelstr. 18, Zim. 2.1.05
D-76275 Ettlingen
T +49 (0) 72 43/2 18-615
F +49 (0) 72 43/2 18-400
Email: [EMAIL PROTECTED]

http://www.ids.de
Managing directors: Norbert Wagner, Friedrich Abriß
Registered office: Ettlingen
District court Mannheim, HRB 362503

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH]: Fix networking scatterlist regressions.

2007-10-31 Thread Jens Axboe

On Tue, Oct 30 2007, David Miller wrote:
 
 I just checked the following bug fix into net-2.6
 
 Rusty, have a quick look at virtio_net wrt. the changes I
 made to skb_to_sgvec()'s behavior.  I think I might have
 even fixed something :-)
 
 Jens, please review my commentary wrt. sg_mark_end() and
 its nonintuitive behavior which led to these bugs.

I fully agree, let's just change sg_mark_end() to NOT overwrite a stored
page there. The current interface isn't nice and can't be used after
filling the sg table, which is what users would want. I've added such a
patch to the sg repo.
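
To make the difference concrete, here is a minimal user-space sketch. It is not the kernel code: struct sg_entry, mark_end_overwrite(), mark_end_preserve() and entry_page() are made-up names for illustration, and the only thing assumed from the real scatterlist is that the low two bits of page_link carry flags while the rest holds the page pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct scatterlist (illustrative only). */
struct sg_entry {
    uintptr_t page_link;   /* page pointer, low two bits used as flags */
};

#define SG_END_BIT ((uintptr_t)0x02)

/* Old behaviour: plain assignment, any stored page pointer is lost. */
static void mark_end_overwrite(struct sg_entry *sgl, unsigned int nents)
{
    sgl[nents - 1].page_link = SG_END_BIT;
}

/* Proposed behaviour: OR in the termination bit, keep the pointer. */
static void mark_end_preserve(struct sg_entry *sg)
{
    sg->page_link |= SG_END_BIT;
}

/* Recover the pointer part by masking off the two flag bits. */
static void *entry_page(const struct sg_entry *sg)
{
    return (void *)(sg->page_link & ~(uintptr_t)0x03);
}
```

With the overwrite variant the last entry has to be marked before it is filled; with the preserve variant it can be marked afterwards, which is the order callers naturally use.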

From 5a0347663f51850eb52b89c4dcf6a714ea8d3965 Mon Sep 17 00:00:00 2001
From: Jens Axboe [EMAIL PROTECTED]
Date: Wed, 31 Oct 2007 08:31:23 +0100
Subject: [PATCH] [SG] Remove __sg_mark_end()

Make sg_mark_end() NOT overwrite the page link. Then it can be used
after filling the sg table, which is what users want. That means that
__sg_mark_end() is no longer useful, so kill it.

It's important the sg entries be initialized before using sg_mark_end(),
so also add a debug check to catch use-before-init.

Signed-off-by: Jens Axboe [EMAIL PROTECTED]
---
 block/ll_rw_blk.c   |2 +-
 include/linux/scatterlist.h |   10 --
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
index e948407..fdc0707 100644
--- a/block/ll_rw_blk.c
+++ b/block/ll_rw_blk.c
@@ -1369,7 +1369,7 @@ new_segment:
} /* segments in rq */
 
if (sg)
-   __sg_mark_end(sg);
+   sg_mark_end(sg);
 
return nsegs;
 }
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index d5e1876..aa97954 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -195,13 +195,11 @@ static inline void sg_chain(struct scatterlist *prv, unsigned int prv_nents,
  *   Marks the last entry as the termination point for sg_next()
  *
  **/
-static inline void sg_mark_end(struct scatterlist *sgl, unsigned int nents)
-{
-   sgl[nents - 1].page_link = 0x02;
-}
-
-static inline void __sg_mark_end(struct scatterlist *sg)
+static inline void sg_mark_end(struct scatterlist *sg)
 {
+#ifdef CONFIG_DEBUG_SG
+   BUG_ON(sg->sg_magic != SG_MAGIC);
+#endif
    sg->page_link |= 0x02;
 }
 
-- 
1.5.3.GIT


-- 
Jens Axboe


Re: [PATCH]: Fix networking scatterlist regressions.

2007-10-31 Thread David Miller

From: Jens Axboe [EMAIL PROTECTED]
Date: Wed, 31 Oct 2007 08:32:07 +0100

 [SG] Remove __sg_mark_end()

 Make sg_mark_end() NOT overwrite the page link. Then it can be used
 after filling the sg table, which is what users want. That means that
 __sg_mark_end() is no longer useful, so kill it.

 It's important the sg entries be initialized before using sg_mark_end(),
 so also add a debug check to catch use-before-init.

 Signed-off-by: Jens Axboe [EMAIL PROTECTED]

Ok, but I just pushed my changes to Linus and once those show up
you'll need to extend this patch to kill the '__' prefix from
all the rest of the calls which will be in the tree.

Thanks!

Re: [PATCH]: Fix networking scatterlist regressions.

2007-10-31 Thread Jens Axboe

On Wed, Oct 31 2007, David Miller wrote:
 From: Jens Axboe [EMAIL PROTECTED]
 Date: Wed, 31 Oct 2007 08:32:07 +0100

  [SG] Remove __sg_mark_end()

  Make sg_mark_end() NOT overwrite the page link. Then it can be used
  after filling the sg table, which is what users want. That means that
  __sg_mark_end() is no longer useful, so kill it.

  It's important the sg entries be initialized before using sg_mark_end(),
  so also add a debug check to catch use-before-init.

  Signed-off-by: Jens Axboe [EMAIL PROTECTED]

 Ok, but I just pushed my changes to Linus and once those show up
 you'll need to extend this patch to kill the '__' prefix from
 all the rest of the calls which will be in the tree.

 Thanks!

No problem, I'll base further sg_mark_end() updates on top of yours.
Just need to get that last email I sent out resolved, the
gss_krb5_crypto bits.

-- 
Jens Axboe


Re: [PATCH]: Fix networking scatterlist regressions.

2007-10-31 Thread Jens Axboe

On Tue, Oct 30 2007, David Miller wrote:
 diff --git a/net/sunrpc/auth_gss/gss_krb5_crypto.c b/net/sunrpc/auth_gss/gss_krb5_crypto.c
 index 91cd8f0..4a8aa94 100644
 --- a/net/sunrpc/auth_gss/gss_krb5_crypto.c
 +++ b/net/sunrpc/auth_gss/gss_krb5_crypto.c
 @@ -211,8 +211,8 @@ encryptor(struct scatterlist *sg, void *data)
   if (thislen == 0)
   return 0;
  
 - sg_mark_end(desc->infrags, desc->fragno);
 - sg_mark_end(desc->outfrags, desc->fragno);
 + __sg_mark_end(desc->infrags, desc->fragno);
 + __sg_mark_end(desc->outfrags, desc->fragno);
  
   ret = crypto_blkcipher_encrypt_iv(&desc->desc, desc->outfrags,
 desc->infrags, thislen);
 @@ -293,7 +293,7 @@ decryptor(struct scatterlist *sg, void *data)
   if (thislen == 0)
   return 0;
  
 - sg_mark_end(desc->frags, desc->fragno);
 + __sg_mark_end(desc->frags, desc->fragno);
  
   ret = crypto_blkcipher_decrypt_iv(&desc->desc, desc->frags,
 desc->frags, thislen);

Hmm? These don't seem right. It also has a weird code sequence:

...
sg_mark_end(&desc->infrags[desc->fragno - 1]);
sg_mark_end(&desc->outfrags[desc->fragno - 1]);

ret = crypto_blkcipher_encrypt_iv(&desc->desc, desc->outfrags,
  desc->infrags, thislen);
if (ret)
    return ret;

sg_init_table(desc->infrags, 4);
sg_init_table(desc->outfrags, 4);
...

Did something go wrong there?

-- 
Jens Axboe


Re: [PATCH]: Fix networking scatterlist regressions.

2007-10-31 Thread David Miller

From: Jens Axboe [EMAIL PROTECTED]
Date: Wed, 31 Oct 2007 08:46:21 +0100

 On Tue, Oct 30 2007, David Miller wrote:
  @@ -293,7 +293,7 @@ decryptor(struct scatterlist *sg, void *data)
  if (thislen == 0)
  return 0;
   
  -   sg_mark_end(desc->frags, desc->fragno);
  +   __sg_mark_end(desc->frags, desc->fragno);
   
  ret = crypto_blkcipher_decrypt_iv(&desc->desc, desc->frags,
desc->frags, thislen);
 
 Hmm? These don't seem right. It also has a weird code sequence:
 ...
 Did something go wrong there?

Yes, I fixed those up after doing some allmodconfig builds.

Here is the final patch I actually pushed to Linus:

From 51c739d1f484b2562040a3e496dc8e1670d4e279 Mon Sep 17 00:00:00 2001
From: David S. Miller [EMAIL PROTECTED]
Date: Tue, 30 Oct 2007 21:29:29 -0700
Subject: [PATCH] [NET]: Fix incorrect sg_mark_end() calls.

This fixes scatterlist corruptions added by

commit 68e3f5dd4db62619fdbe520d36c9ebf62e672256
[CRYPTO] users: Fix up scatterlist conversion errors

The issue is that the code calls sg_mark_end() which clobbers the
sg_page() pointer of the final scatterlist entry.

The first part of the fix makes skb_to_sgvec() do __sg_mark_end().

After considering all skb_to_sgvec() call sites the most correct
solution is to call __sg_mark_end() in skb_to_sgvec() since that is
what all of the callers would end up doing anyways.

I suspect this might have fixed some problems in virtio_net which is
the sole non-crypto user of skb_to_sgvec().

Other similar sg_mark_end() cases were converted over to
__sg_mark_end() as well.

Arguably sg_mark_end() is a poorly named function because it doesn't
just mark, it clears out the page pointer as a side effect, which is
what led to these bugs in the first place.

The one remaining plain sg_mark_end() call is in scsi_alloc_sgtable()
and arguably it could be converted to __sg_mark_end() if only so that
we can delete this confusing interface from linux/scatterlist.h

Signed-off-by: David S. Miller [EMAIL PROTECTED]
---
 net/core/skbuff.c |   16 +---
 net/ipv4/esp4.c   |   12 +++-
 net/ipv4/tcp_ipv4.c   |2 +-
 net/ipv6/esp6.c   |   13 +++--
 net/ipv6/tcp_ipv6.c   |2 +-
 net/rxrpc/rxkad.c |9 +
 net/sunrpc/auth_gss/gss_krb5_crypto.c |6 +++---
 7 files changed, 37 insertions(+), 23 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 573e172..64b50ff 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2028,8 +2028,8 @@ void __init skb_init(void)
  * Fill the specified scatter-gather list with mappings/pointers into a
  * region of the buffer space attached to a socket buffer.
  */
-int
-skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len)
+static int
+__skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len)
 {
    int start = skb_headlen(skb);
    int i, copy = start - offset;
@@ -2078,7 +2078,8 @@ skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len)
    if ((copy = end - offset) > 0) {
    if (copy > len)
    copy = len;
-   elt += skb_to_sgvec(list, sg+elt, offset - start, copy);
+   elt += __skb_to_sgvec(list, sg+elt, offset - start,
+ copy);
    if ((len -= copy) == 0)
    return elt;
    offset += copy;
@@ -2090,6 +2091,15 @@ skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len)
    return elt;
 }
 
+int skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len)
+{
+   int nsg = __skb_to_sgvec(skb, sg, offset, len);
+
+   __sg_mark_end(&sg[nsg - 1]);
+
+   return nsg;
+}
+
 /**
  * skb_cow_data - Check that a socket buffer's data buffers are writable
  * @skb: The socket buffer to check.
diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index cad4278..c31bccb 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -111,9 +111,10 @@ static int esp_output(struct xfrm_state *x, struct sk_buff *skb)
    goto unlock;
    }
    sg_init_table(sg, nfrags);
-   sg_mark_end(sg, skb_to_sgvec(skb, sg, esph->enc_data +
- esp->conf.ivlen -
- skb->data, clen));
+   skb_to_sgvec(skb, sg,
+esph->enc_data +
+esp->conf.ivlen -
+skb->data, clen);
    err = crypto_blkcipher_encrypt(&desc, sg, sg, clen);
if (unlikely(sg !=

[PATCH 2.6.24 1/1]S2io: Support for add/delete/store/restore ethernet addresses

2007-10-31 Thread Sreenivasa Honnur

- Add support to add/delete/store/restore 64 and 128 Ethernet addresses for
Xframe I and Xframe II respectively.

Signed-off-by: Sreenivasa Honnur [EMAIL PROTECTED]
---
diff -urpN org/drivers/net/s2io.c patch_1/drivers/net/s2io.c
--- org/drivers/net/s2io.c  2007-09-26 00:01:14.0 +0530
+++ patch_1/drivers/net/s2io.c  2007-09-26 22:42:11.0 +0530
@@ -84,7 +84,7 @@
 #include "s2io.h"
 #include "s2io-regs.h"
 
-#define DRV_VERSION "2.0.26.5"
+#define DRV_VERSION "2.0.26.6"
 
 /* S2io Driver name & version. */
 static char s2io_driver_name[] = "Neterion";
@@ -3363,6 +3363,9 @@ static void s2io_reset(struct s2io_nic *
/* Set swapper to enable I/O register access */
s2io_set_swapper(sp);
 
+   /* restore mac address entries */
+   do_s2io_restore_unicast_mc(sp);
+
/* Restore the MSIX table entries from local variables */
restore_xmsi_data(sp);
 
@@ -3421,9 +3424,6 @@ static void s2io_reset(struct s2io_nic *
    writeq(val64, &bar0->pcc_err_reg);
    }
 
-   /* restore the previously assigned mac address */
-   do_s2io_prog_unicast(sp->dev, (u8 *)sp->def_mac_addr[0].mac_addr);
-
    sp->device_enabled_once = FALSE;
 }
 
@@ -3896,8 +3896,17 @@ hw_init_failed:
 static int s2io_close(struct net_device *dev)
 {
    struct s2io_nic *sp = dev->priv;
+   struct config_param *config = &sp->config;
+   u64 tmp64;
+   int off;
 
    netif_stop_queue(dev);
+   /* delete all populated mac entries */
+   for (off = 1; off < config->max_mc_addr; off++) {
+   tmp64 = do_s2io_read_unicast_mc(sp, off);
+   if (tmp64 != S2IO_DISABLE_MAC_ENTRY)
+   do_s2io_delete_unicast_mc(sp, tmp64);
+   }
    napi_disable(&sp->napi);
/* Reset card, kill tasklet and free Tx and Rx buffers. */
s2io_card_down(sp);
@@ -4699,8 +4708,9 @@ static void s2io_set_multicast(struct ne
    struct XENA_dev_config __iomem *bar0 = sp->bar0;
    u64 val64 = 0, multi_mac = 0x010203040506ULL, mask =
    0xfeffffffffffULL;
-   u64 dis_addr = 0xffffffffffffULL, mac_addr = 0;
+   u64 dis_addr = S2IO_DISABLE_MAC_ENTRY, mac_addr = 0;
    void __iomem *add;
+   struct config_param *config = &sp->config;
 
    if ((dev->flags & IFF_ALLMULTI) && (!sp->m_cast_flg)) {
/*  Enable all Multicast addresses */
@@ -4710,7 +4720,7 @@ static void s2io_set_multicast(struct ne
       &bar0->rmac_addr_data1_mem);
    val64 = RMAC_ADDR_CMD_MEM_WE |
    RMAC_ADDR_CMD_MEM_STROBE_NEW_CMD |
-   RMAC_ADDR_CMD_MEM_OFFSET(MAC_MC_ALL_MC_ADDR_OFFSET);
+   RMAC_ADDR_CMD_MEM_OFFSET(config->max_mc_addr - 1);
    writeq(val64, &bar0->rmac_addr_cmd_mem);
    /* Wait till command completes */
    wait_for_cmd_complete(&bar0->rmac_addr_cmd_mem,
@@ -4718,7 +4728,7 @@ static void s2io_set_multicast(struct ne
    S2IO_BIT_RESET);
 
    sp->m_cast_flg = 1;
-   sp->all_multi_pos = MAC_MC_ALL_MC_ADDR_OFFSET;
+   sp->all_multi_pos = config->max_mc_addr - 1;
    } else if ((dev->flags & IFF_ALLMULTI) && (sp->m_cast_flg)) {
    /*  Disable all Multicast addresses */
    writeq(RMAC_ADDR_DATA0_MEM_ADDR(dis_addr),
@@ -4787,7 +4797,7 @@ static void s2io_set_multicast(struct ne
    /*  Update individual M_CAST address list */
    if ((!sp->m_cast_flg) && dev->mc_count) {
    if (dev->mc_count >
-   (MAX_ADDRS_SUPPORTED - MAC_MC_ADDR_START_OFFSET - 1)) {
+   (config->max_mc_addr - config->max_mac_addr)) {
    DBG_PRINT(ERR_DBG, "%s: No more Rx filters ",
      dev->name);
    DBG_PRINT(ERR_DBG, "can be added, please enable ");
@@ -4807,7 +4817,7 @@ static void s2io_set_multicast(struct ne
    val64 = RMAC_ADDR_CMD_MEM_WE |
    RMAC_ADDR_CMD_MEM_STROBE_NEW_CMD |
    RMAC_ADDR_CMD_MEM_OFFSET
-   (MAC_MC_ADDR_START_OFFSET + i);
+   (config->mc_start_offset + i);
    writeq(val64, &bar0->rmac_addr_cmd_mem);
 
    /* Wait for command completes */
@@ -4839,7 +4849,7 @@ static void s2io_set_multicast(struct ne
    val64 = RMAC_ADDR_CMD_MEM_WE |
    RMAC_ADDR_CMD_MEM_STROBE_NEW_CMD |
    RMAC_ADDR_CMD_MEM_OFFSET
-   (i + MAC_MC_ADDR_START_OFFSET);
+   (i + config->mc_start_offset);
    writeq(val64, &bar0->rmac_addr_cmd_mem);
 
/* Wait for command completes */
@@ -4855,8 +4865,76 @@ static void s2io_set_multicast(struct ne
}
 }
 
-/* add unicast MAC address to CAM */
-static int

Re: [PATCH]: Fix networking scatterlist regressions.

2007-10-31 Thread David Miller

From: Jens Axboe [EMAIL PROTECTED]
Date: Wed, 31 Oct 2007 09:01:43 +0100

 On Wed, Oct 31 2007, David Miller wrote:
  From: Jens Axboe [EMAIL PROTECTED]
  Date: Wed, 31 Oct 2007 08:46:21 +0100

   On Tue, Oct 30 2007, David Miller wrote:
@@ -293,7 +293,7 @@ decryptor(struct scatterlist *sg, void *data)
if (thislen == 0)
return 0;

-   sg_mark_end(desc->frags, desc->fragno);
+   __sg_mark_end(desc->frags, desc->fragno);

ret = crypto_blkcipher_decrypt_iv(&desc->desc, desc->frags,
  desc->frags, thislen);

   Hmm? These don't seem right. It also has a weird code sequence:
   ...
   Did something go wrong there?

  Yes, I fixed those up after doing some allmodconfig builds.

  Here is the final patch I actually pushed to Linus:

 That fixes up the sg_mark_end() bit, but it's still calling
 sg_init_table() just a few lines further down. Is that correct?

Absolutely.  It is initially using the scatterlist for this
crypto layer call:

ret = crypto_blkcipher_decrypt_iv(&desc->desc, desc->frags,
  desc->frags, thislen);
if (ret)
return ret;

then it reinits and sets the sglist to the values the caller
wants.
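
The sequence described above (shorten the list for one call, then re-init it for the caller) can be sketched in user space as follows. This is only a model under assumptions: struct sg_entry, table_init(), mark_end(), next_entry() and walk() are illustrative names, not the kernel API, and the table size of 4 simply mirrors the sg_init_table(..., 4) calls quoted earlier in the thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for struct scatterlist (illustrative only). */
struct sg_entry {
    uintptr_t page_link;
    unsigned int length;
};

#define SG_END_BIT ((uintptr_t)0x02)

/* Models sg_init_table(): clear everything, terminate at the last slot. */
static void table_init(struct sg_entry *sgl, unsigned int nents)
{
    memset(sgl, 0, sizeof(*sgl) * nents);
    sgl[nents - 1].page_link |= SG_END_BIT;
}

/* Models the non-clobbering sg_mark_end(). */
static void mark_end(struct sg_entry *sg)
{
    sg->page_link |= SG_END_BIT;
}

/* Models sg_next() for a flat (unchained) table. */
static struct sg_entry *next_entry(struct sg_entry *sg)
{
    return (sg->page_link & SG_END_BIT) ? NULL : sg + 1;
}

/* Number of entries a consumer such as the crypto layer would visit. */
static unsigned int walk(struct sg_entry *sg)
{
    unsigned int n = 0;

    for (; sg; sg = next_entry(sg))
        n++;
    return n;
}
```

Marking entry fragno - 1 limits the walk to the entries actually filled; the following re-init restores the full table so it can be refilled with the values the caller wants.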


[PATCH 2.6.24 1/1]S2io: Fixed memory leak by freeing MSI-X local entry memories when vector allocation fails

2007-10-31 Thread Sreenivasa Honnur

- Fixed memory leak by freeing MSI-X local entry memories when vector allocation
fails in s2io_add_isr.
- Added two utility functions remove_msix_isr and remove_inta_isr to eliminate
code duplication.
- Implemented the following review comments from Jeff:
  - Removed redundant stats->mem_freed and synchronize_irq call
  - do_rem_msix_isr is renamed as remove_msix_isr
  - do_rem_inta_isr is renamed as remove_inta_isr
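
The cleanup-on-failure idea in the list above can be modelled in user space roughly like this. It is a sketch under assumptions, not the driver code: struct nic_model, request_vector(), remove_vectors() and add_isr() are hypothetical stand-ins, with a counter in place of real request_irq()/free_irq() calls.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_VECTORS 4

enum intr_type { INTR_MSI_X, INTR_INTA };

/* Hypothetical model of the per-NIC interrupt state. */
struct nic_model {
    int *entries;              /* stands in for the MSI-X entry table */
    int in_use[MAX_VECTORS];   /* 1 once a vector has been registered */
    int live_irqs;             /* currently registered handlers */
    enum intr_type intr_type;
};

/* Pretend to register vector i; fails when i == fail_at. */
static int request_vector(struct nic_model *nic, int i, int fail_at)
{
    if (i == fail_at)
        return -1;
    nic->live_irqs++;
    return 0;
}

/* Single teardown helper: release registered vectors, free the table. */
static void remove_vectors(struct nic_model *nic)
{
    int i;

    for (i = 0; i < MAX_VECTORS; i++) {
        if (nic->in_use[i]) {
            nic->live_irqs--;
            nic->in_use[i] = 0;
        }
    }
    free(nic->entries);
    nic->entries = NULL;
}

/* On a registration failure, undo everything and fall back to INTA. */
static int add_isr(struct nic_model *nic, int fail_at)
{
    int i;

    nic->entries = calloc(MAX_VECTORS, sizeof(*nic->entries));
    if (!nic->entries)
        return -1;
    nic->intr_type = INTR_MSI_X;

    for (i = 0; i < MAX_VECTORS; i++) {
        if (request_vector(nic, i, fail_at)) {
            remove_vectors(nic);
            nic->intr_type = INTR_INTA;
            break;
        }
        nic->in_use[i] = 1;
    }
    return 0;
}
```

Keeping the teardown in one helper means the error path and the normal removal path cannot drift apart, which is how the leak described in the first bullet arises.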

Signed-off-by: Sreenivasa Honnur [EMAIL PROTECTED]
Signed-off-by: Ramkrishna Vepa [EMAIL PROTECTED]
---
diff -Nurp org/drivers/net/s2io.c patch1/drivers/net/s2io.c
--- org/drivers/net/s2io.c  2007-10-30 23:31:09.0 +0530
+++ patch1/drivers/net/s2io.c   2007-10-31 04:12:00.0 +0530
@@ -84,7 +84,7 @@
 #include "s2io.h"
 #include "s2io-regs.h"
 
-#define DRV_VERSION "2.0.26.6"
+#define DRV_VERSION "2.0.26.7"
 
 /* S2io Driver name & version. */
 static char s2io_driver_name[] = "Neterion";
@@ -3775,6 +3775,40 @@ static int __devinit s2io_test_msi(struc
 
return err;
 }
+
+static void remove_msix_isr(struct s2io_nic *sp)
+{
+   int i;
+   u16 msi_control;
+
+   for (i = 0; i < MAX_REQUESTED_MSI_X; i++) {
+   if (sp->s2io_entries[i].in_use ==
+   MSIX_REGISTERED_SUCCESS) {
+   int vector = sp->entries[i].vector;
+   void *arg = sp->s2io_entries[i].arg;
+   free_irq(vector, arg);
+   }
+   }
+
+   kfree(sp->entries);
+   kfree(sp->s2io_entries);
+   sp->entries = NULL;
+   sp->s2io_entries = NULL;
+
+   pci_read_config_word(sp->pdev, 0x42, &msi_control);
+   msi_control &= 0xFFFE; /* Disable MSI */
+   pci_write_config_word(sp->pdev, 0x42, msi_control);
+
+   pci_disable_msix(sp->pdev);
+}
+
+static void remove_inta_isr(struct s2io_nic *sp)
+{
+   struct net_device *dev = sp->dev;
+
+   free_irq(sp->pdev->irq, dev);
+}
+
 /* * *
  * Functions defined below concern the OS part of the driver *
  * * */
@@ -3809,28 +3843,9 @@ static int s2io_open(struct net_device *
    int ret = s2io_enable_msi_x(sp);
 
    if (!ret) {
-   u16 msi_control;
-
    ret = s2io_test_msi(sp);
-
    /* rollback MSI-X, will re-enable during add_isr() */
-   kfree(sp->entries);
-   sp->mac_control.stats_info->sw_stat.mem_freed +=
-   (MAX_REQUESTED_MSI_X *
-   sizeof(struct msix_entry));
-   kfree(sp->s2io_entries);
-   sp->mac_control.stats_info->sw_stat.mem_freed +=
-   (MAX_REQUESTED_MSI_X *
-   sizeof(struct s2io_msix_entry));
-   sp->entries = NULL;
-   sp->s2io_entries = NULL;
-
-   pci_read_config_word(sp->pdev, 0x42, &msi_control);
-   msi_control &= 0xFFFE; /* Disable MSI */
-   pci_write_config_word(sp->pdev, 0x42, msi_control);
-
-   pci_disable_msix(sp->pdev);
-
+   remove_msix_isr(sp);
    }
    if (ret) {
 
@@ -6864,15 +6879,23 @@ static int s2io_add_isr(struct s2io_nic 
    }
    }
    if (err) {
+   remove_msix_isr(sp);
+
    DBG_PRINT(ERR_DBG,"%s:MSI-X-%d registration "
      "failed\n", dev->name, i);
-   DBG_PRINT(ERR_DBG, "Returned: %d\n", err);
-   return -1;
+   DBG_PRINT(ERR_DBG, "%s: defaulting to INTA\n",
+   dev->name);
+   sp->config.intr_type = INTA;
+   break;
    }
    sp->s2io_entries[i].in_use = MSIX_REGISTERED_SUCCESS;
    }
-   printk("MSI-X-TX %d entries enabled\n",msix_tx_cnt);
-   printk("MSI-X-RX %d entries enabled\n",msix_rx_cnt);
+   if (!err) {
+   DBG_PRINT(INFO_DBG, "MSI-X-TX %d entries enabled\n",
+   msix_tx_cnt);
+   DBG_PRINT(INFO_DBG, "MSI-X-RX %d entries enabled\n",
+   msix_rx_cnt);
+   }
    }
    if (sp->config.intr_type == INTA) {
    err = request_irq((int) sp->pdev->irq, s2io_isr, IRQF_SHARED,
@@ -6887,40 +6910,10 @@ static int s2io_add_isr(struct s2io_nic 
 }
 static void s2io_rem_isr(struct s2io_nic * sp)
 {
-   struct net_device *dev = sp->dev;
-   struct swStat *stats = &sp->mac_control.stats_info->sw_stat;
-
-   if (sp->config.intr_type ==

Re: [PATCH]: Fix networking scatterlist regressions.

2007-10-31 Thread Jens Axboe

On Wed, Oct 31 2007, David Miller wrote:
 From: Jens Axboe [EMAIL PROTECTED]
 Date: Wed, 31 Oct 2007 09:01:43 +0100

  On Wed, Oct 31 2007, David Miller wrote:
   From: Jens Axboe [EMAIL PROTECTED]
   Date: Wed, 31 Oct 2007 08:46:21 +0100

On Tue, Oct 30 2007, David Miller wrote:
 @@ -293,7 +293,7 @@ decryptor(struct scatterlist *sg, void *data)
   if (thislen == 0)
   return 0;

 - sg_mark_end(desc->frags, desc->fragno);
 + __sg_mark_end(desc->frags, desc->fragno);

   ret = crypto_blkcipher_decrypt_iv(&desc->desc, desc->frags,
 desc->frags, thislen);

Hmm? These don't seem right. It also has a weird code sequence:
...
Did something go wrong there?

   Yes, I fixed those up after doing some allmodconfig builds.

   Here is the final patch I actually pushed to Linus:

  That fixes up the sg_mark_end() bit, but it's still calling
  sg_init_table() just a few lines further down. Is that correct?

 Absolutely.  It is initially using the scatterlist for this
 crypto layer call:

   ret = crypto_blkcipher_decrypt_iv(&desc->desc, desc->frags,
 desc->frags, thislen);
   if (ret)
   return ret;

 then it reinits and sets the sglist to the values the caller
 wants.

Great, just wanted to double check that it was indeed correct!

-- 
Jens Axboe


Re: [PATCH]: Fix networking scatterlist regressions.

2007-10-31 Thread Jens Axboe

On Wed, Oct 31 2007, David Miller wrote:
 From: Jens Axboe [EMAIL PROTECTED]
 Date: Wed, 31 Oct 2007 08:46:21 +0100
 
  On Tue, Oct 30 2007, David Miller wrote:
   @@ -293,7 +293,7 @@ decryptor(struct scatterlist *sg, void *data)
 if (thislen == 0)
 return 0;

   - sg_mark_end(desc->frags, desc->fragno);
   + __sg_mark_end(desc->frags, desc->fragno);

 ret = crypto_blkcipher_decrypt_iv(&desc->desc, desc->frags,
   desc->frags, thislen);
  
  Hmm? These don't seem right. It also has a weird code sequence:
  ...
  Did something go wrong there?
 
 Yes, I fixed those up after doing some allmodconfig builds.
 
 Here is the final patch I actually pushed to Linus:

Here's the sg_mark_end() patch on top of that.
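
As a rough user-space illustration of what the patch below makes sg_mark_end() do (set the termination bit, clear a possible chain bit, leave the pointer bits alone), consider the following sketch. The names struct sg_slot, mark_end(), is_last() and is_chain() are illustrative assumptions, not kernel identifiers; only the 0x01/0x02 bit values are taken from the patch.

```c
#include <stdint.h>

/* Simplified stand-in for one scatterlist entry (illustrative only). */
struct sg_slot {
    uintptr_t page_link;
};

#define SLOT_CHAIN_BIT ((uintptr_t)0x01)  /* entry links to another table */
#define SLOT_END_BIT   ((uintptr_t)0x02)  /* entry terminates the list */

/* Set the termination bit and clear a potential chain bit. */
static void mark_end(struct sg_slot *sg)
{
    sg->page_link |= SLOT_END_BIT;
    sg->page_link &= ~SLOT_CHAIN_BIT;
}

static int is_last(const struct sg_slot *sg)
{
    return (sg->page_link & SLOT_END_BIT) != 0;
}

static int is_chain(const struct sg_slot *sg)
{
    return (sg->page_link & SLOT_CHAIN_BIT) != 0;
}
```

An entry cannot sensibly be both a chain link and the end of the list, so clearing the chain bit while marking the end keeps the two flags mutually exclusive.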

From 2f5371509d3d4d09269bf7a46868da2ac5c61d77 Mon Sep 17 00:00:00 2001
From: Jens Axboe [EMAIL PROTECTED]
Date: Wed, 31 Oct 2007 09:11:10 +0100
Subject: [PATCH] [SG] Get rid of __sg_mark_end()

sg_mark_end() overwrites the page_link information, but all users want
__sg_mark_end() behaviour where we just set the end bit. That is the most
natural way to use the sg list, since you'll fill it in and then mark the
end point.

So change sg_mark_end() to only set the termination bit. Add a sg_magic
debug check as well, and clear a chain pointer if it is set.

Signed-off-by: Jens Axboe [EMAIL PROTECTED]
---
 block/ll_rw_blk.c |2 +-
 drivers/scsi/scsi_lib.c   |2 +-
 include/linux/scatterlist.h   |   22 --
 net/core/skbuff.c |2 +-
 net/ipv4/tcp_ipv4.c   |2 +-
 net/ipv6/tcp_ipv6.c   |2 +-
 net/rxrpc/rxkad.c |2 +-
 net/sunrpc/auth_gss/gss_krb5_crypto.c |6 +++---
 8 files changed, 21 insertions(+), 19 deletions(-)

diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
index e948407..fdc0707 100644
--- a/block/ll_rw_blk.c
+++ b/block/ll_rw_blk.c
@@ -1369,7 +1369,7 @@ new_segment:
} /* segments in rq */
 
if (sg)
-   __sg_mark_end(sg);
+   sg_mark_end(sg);
 
return nsegs;
 }
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 61fdaf0..88de771 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -785,7 +785,7 @@ struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask)
     * end-of-list
     */
    if (!left)
-   sg_mark_end(sgl, this);
+   sg_mark_end(&sgl[this - 1]);
 
/*
 * don't allow subsequent mempool allocs to sleep, it would
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index d5e1876..b2116a1 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -188,21 +188,23 @@ static inline void sg_chain(struct scatterlist *prv, unsigned int prv_nents,
 
 /**
  * sg_mark_end - Mark the end of the scatterlist
- * @sgl:   Scatterlist
- * @nents: Number of entries in sgl
+ * @sg: SG entryScatterlist
  *
  * Description:
- *   Marks the last entry as the termination point for sg_next()
+ *   Marks the passed in sg entry as the termination point for the sg
+ *   table. A call to sg_next() on this entry will return NULL.
  *
  **/
-static inline void sg_mark_end(struct scatterlist *sgl, unsigned int nents)
-{
-   sgl[nents - 1].page_link = 0x02;
-}
-
-static inline void __sg_mark_end(struct scatterlist *sg)
+static inline void sg_mark_end(struct scatterlist *sg)
 {
+#ifdef CONFIG_DEBUG_SG
+   BUG_ON(sg->sg_magic != SG_MAGIC);
+#endif
+   /*
+    * Set termination bit, clear potential chain bit
+    */
    sg->page_link |= 0x02;
+   sg->page_link &= ~0x01;
 }
 
 /**
@@ -218,7 +220,7 @@ static inline void __sg_mark_end(struct scatterlist *sg)
 static inline void sg_init_table(struct scatterlist *sgl, unsigned int nents)
 {
    memset(sgl, 0, sizeof(*sgl) * nents);
-   sg_mark_end(sgl, nents);
+   sg_mark_end(&sgl[nents - 1]);
 #ifdef CONFIG_DEBUG_SG
{
unsigned int i;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 64b50ff..32d5826 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2095,7 +2095,7 @@ int skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int le
 {
    int nsg = __skb_to_sgvec(skb, sg, offset, len);
 
-   __sg_mark_end(&sg[nsg - 1]);
+   sg_mark_end(&sg[nsg - 1]);
 
return nsg;
 }
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index eec02b2..d438dfb 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1083,7 +1083,7 @@ static int tcp_v4_do_calc_md5_hash(char *md5_hash, struct tcp_md5sig_key *key,
    sg_set_buf(&sg[block++], key->key, key->keylen);
    nbytes += key->keylen;
 
-   __sg_mark_end(&sg[block - 1]);
+   sg_mark_end(&sg[block - 1]);
 
/* Now store the Hash into the packet

Re: [PATCH]: Fix networking scatterlist regressions.

2007-10-31 Thread Jens Axboe

On Wed, Oct 31 2007, David Miller wrote:
 From: Jens Axboe [EMAIL PROTECTED]
 Date: Wed, 31 Oct 2007 08:46:21 +0100
 
  On Tue, Oct 30 2007, David Miller wrote:
   @@ -293,7 +293,7 @@ decryptor(struct scatterlist *sg, void *data)
 if (thislen == 0)
 return 0;

   - sg_mark_end(desc->frags, desc->fragno);
   + __sg_mark_end(desc->frags, desc->fragno);

 ret = crypto_blkcipher_decrypt_iv(&desc->desc, desc->frags,
   desc->frags, thislen);
  
  Hmm? These don't seem right. It also has a weird code sequence:
  ...
  Did something go wrong there?
 
 Yes, I fixed those up after doing some allmodconfig builds.
 
 Here is the final patch I actually pushed to Linus:

That fixes up the sg_mark_end() bit, but it's still calling
sg_init_table() just a few lines further down. Is that correct?


-- 
Jens Axboe


Re: dev_ifname32() fails on 32-64bit calls in copy_in_user().

2007-10-31 Thread Eric W. Biederman

Benjamin Herrenschmidt [EMAIL PROTECTED] writes:

 Bug is in the new dev_ifname32:

   uifr = compat_alloc_user_space(sizeof(struct ifreq));
   if (copy_in_user(uifr, compat_ptr(arg), sizeof(struct ifreq32)));
   return -EFAULT;

 There's a stray ; after the if statement, that was obviously not
 tested :-)

Grr sorry about that, and thanks for catching this.

Eric


 This fixes it here (tested):
 
 [PATCH] Fix new dev_ifname32 returning -EFAULT

 A stray semicolon slipped in the patch that updated dev_ifname32 to
 not be inline, causing it to always return -EFAULT. This fixes it.

 Signed-off-by: Benjamin Herrenschmidt [EMAIL PROTECTED]
 ---

 Index: linux-work/fs/compat_ioctl.c
 ===
 --- linux-work.orig/fs/compat_ioctl.c 2007-10-31 13:30:42.0 +1100
 +++ linux-work/fs/compat_ioctl.c  2007-10-31 13:30:46.0 +1100
 @@ -322,7 +322,7 @@ static int dev_ifname32(unsigned int fd,
   int err;
  
   uifr = compat_alloc_user_space(sizeof(struct ifreq));
 - if (copy_in_user(uifr, compat_ptr(arg), sizeof(struct ifreq32)));
 + if (copy_in_user(uifr, compat_ptr(arg), sizeof(struct ifreq32)))
   return -EFAULT;
  
   err = sys_ioctl(fd, SIOCGIFNAME, (unsigned long)uifr);
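
For anyone wondering why a single stray semicolon makes the function fail every time, here is a small stand-alone sketch. copy_stub(), ifname_buggy() and ifname_fixed() are made-up names for illustration; copy_stub() only mimics the convention that copy_in_user() returns nonzero on failure.

```c
#include <errno.h>

/* Stand-in for copy_in_user(): returns nonzero on failure. */
static int copy_stub(int fail)
{
    return fail;
}

/* The ';' is the entire body of the if, so the return always runs. */
static int ifname_buggy(int fail)
{
    if (copy_stub(fail));
        return -EFAULT;
    return 0;
}

/* Without the semicolon the return is guarded by the condition. */
static int ifname_fixed(int fail)
{
    if (copy_stub(fail))
        return -EFAULT;
    return 0;
}
```

In the buggy variant the copy still happens, but its result is thrown away and -EFAULT is returned unconditionally. Building with extra warnings enabled (for example gcc's -Wextra, which includes -Wempty-body) usually flags this pattern.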

[PATCH v3] using mii-bitbang on different processor ports

2007-10-31 Thread Sergej Stepanov

This patch makes it possible to have the mdio and mdc pins on different
physical ports, also for CONFIG_PPC_CPM_NEW_BINDING.
To set it up in the device tree:
reg = <10d40 14 10d60 14>; // mdc: 0x10d40, mdio: 0x10d60
or
reg = <10d40 14>; // mdc and mdio have the same offset 10d40
The approach was taken from an older version.
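
A user-space sketch of the two cases described above, one shared port versus separate mdc and mdio ports, might look like the following. It is only an illustration under assumptions: struct port_model, struct bus_model, pin_mask(), bus_setup() and bus_unmap_count() are hypothetical names, and plain arrays stand in for the ioremap()ed register blocks.

```c
#include <stddef.h>
#include <stdint.h>

/* Pin 0 is the most significant bit of the 32-bit port register. */
static uint32_t pin_mask(unsigned int pin)
{
    return (uint32_t)1 << (31 - pin);
}

/* Hypothetical model of one GPIO port as used for bit-banging. */
struct port_model {
    uint32_t *dir;   /* direction register */
    uint32_t *dat;   /* data register, 4 words after dir */
    uint32_t msk;    /* bit for the pin on this port */
};

struct bus_model {
    struct port_model mdc;
    struct port_model mdio;
};

/* mdio_regs == NULL models a reg property with a single range. */
static void bus_setup(struct bus_model *b, uint32_t *mdc_regs,
                      uint32_t *mdio_regs, unsigned int mdc_pin,
                      unsigned int mdio_pin)
{
    b->mdc.dir = mdc_regs;
    b->mdc.dat = mdc_regs + 4;
    if (mdio_regs) {
        b->mdio.dir = mdio_regs;
        b->mdio.dat = mdio_regs + 4;
    } else {
        /* Same port: mdio simply aliases the mdc mapping. */
        b->mdio.dir = b->mdc.dir;
        b->mdio.dat = b->mdc.dat;
    }
    b->mdc.msk = pin_mask(mdc_pin);
    b->mdio.msk = pin_mask(mdio_pin);
}

/* How many mappings cleanup must release (avoids a double unmap). */
static int bus_unmap_count(const struct bus_model *b)
{
    return (b->mdio.dir != b->mdc.dir) ? 2 : 1;
}
```

The pointer comparison in bus_unmap_count() corresponds to the check the patch adds before iounmap(): when both pins share one mapping, it must be released only once.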

Signed-off-by: Sergej Stepanov [EMAIL PROTECTED]
--

diff --git a/drivers/net/fs_enet/mii-bitbang.c b/drivers/net/fs_enet/mii-bitbang.c
index b8e4a73..83ce0c6 100644
--- a/drivers/net/fs_enet/mii-bitbang.c
+++ b/drivers/net/fs_enet/mii-bitbang.c
@@ -29,12 +29,16 @@
 
 #include "fs_enet.h"
 
-struct bb_info {
-   struct mdiobb_ctrl ctrl;
+struct bb_port {
__be32 __iomem *dir;
__be32 __iomem *dat;
-   u32 mdio_msk;
-   u32 mdc_msk;
+   u32 msk;
+};
+
+struct bb_info {
+   struct mdiobb_ctrl ctrl;
+   struct bb_port mdc;
+   struct bb_port mdio;
 };
 
 /* FIXME: If any other users of GPIO crop up, then these will have to
@@ -62,18 +66,18 @@ static inline void mdio_dir(struct mdiobb_ctrl *ctrl, int dir)
    struct bb_info *bitbang = container_of(ctrl, struct bb_info, ctrl);
 
    if (dir)
-   bb_set(bitbang->dir, bitbang->mdio_msk);
+   bb_set(bitbang->mdio.dir, bitbang->mdio.msk);
    else
-   bb_clr(bitbang->dir, bitbang->mdio_msk);
+   bb_clr(bitbang->mdio.dir, bitbang->mdio.msk);
 
    /* Read back to flush the write. */
-   in_be32(bitbang->dir);
+   in_be32(bitbang->mdio.dir);
 }
 
 static inline int mdio_read(struct mdiobb_ctrl *ctrl)
 {
    struct bb_info *bitbang = container_of(ctrl, struct bb_info, ctrl);
-   return bb_read(bitbang->dat, bitbang->mdio_msk);
+   return bb_read(bitbang->mdio.dat, bitbang->mdio.msk);
 }
 
 static inline void mdio(struct mdiobb_ctrl *ctrl, int what)
@@ -81,12 +85,12 @@ static inline void mdio(struct mdiobb_ctrl *ctrl, int what)
    struct bb_info *bitbang = container_of(ctrl, struct bb_info, ctrl);
 
    if (what)
-   bb_set(bitbang->dat, bitbang->mdio_msk);
+   bb_set(bitbang->mdio.dat, bitbang->mdio.msk);
    else
-   bb_clr(bitbang->dat, bitbang->mdio_msk);
+   bb_clr(bitbang->mdio.dat, bitbang->mdio.msk);
 
    /* Read back to flush the write. */
-   in_be32(bitbang->dat);
+   in_be32(bitbang->mdio.dat);
 }
 
 static inline void mdc(struct mdiobb_ctrl *ctrl, int what)
@@ -94,12 +98,12 @@ static inline void mdc(struct mdiobb_ctrl *ctrl, int what)
    struct bb_info *bitbang = container_of(ctrl, struct bb_info, ctrl);
 
    if (what)
-   bb_set(bitbang->dat, bitbang->mdc_msk);
+   bb_set(bitbang->mdc.dat, bitbang->mdc.msk);
    else
-   bb_clr(bitbang->dat, bitbang->mdc_msk);
+   bb_clr(bitbang->mdc.dat, bitbang->mdc.msk);
 
    /* Read back to flush the write. */
-   in_be32(bitbang->dat);
+   in_be32(bitbang->mdc.dat);
 }
 
 static struct mdiobb_ops bb_ops = {
@@ -142,15 +146,32 @@ static int __devinit fs_mii_bitbang_init(struct mii_bus *bus,
    return -ENODEV;
    mdc_pin = *data;
 
-   bitbang->dir = ioremap(res.start, res.end - res.start + 1);
-   if (!bitbang->dir)
+   bitbang->mdc.dir = ioremap(res.start, res.end - res.start + 1);
+   if (!bitbang->mdc.dir)
    return -ENOMEM;
 
-   bitbang->dat = bitbang->dir + 4;
-   bitbang->mdio_msk = 1 << (31 - mdio_pin);
-   bitbang->mdc_msk = 1 << (31 - mdc_pin);
+   bitbang->mdc.dat = bitbang->mdc.dir + 4;
+   if (!of_address_to_resource(np, 1, &res)) {
+   if (res.end - res.start < 13)
+   goto bad_resource;
+   bitbang->mdio.dir = ioremap(res.start, res.end - res.start + 1);
+   if (!bitbang->mdio.dir)
+   goto unmap_and_exit;
+   bitbang->mdio.dat = bitbang->mdio.dir + 4;
+   } else {
+   bitbang->mdio.dir = bitbang->mdc.dir;
+   bitbang->mdio.dat = bitbang->mdc.dat;
+   }
+   bitbang->mdio.msk = 1 << (31 - mdio_pin);
+   bitbang->mdc.msk = 1 << (31 - mdc_pin);
 
    return 0;
+bad_resource:
+   iounmap(bitbang->mdc.dir);
+   return -ENODEV;
+unmap_and_exit:
+   iounmap(bitbang->mdc.dir);
+   return -ENOMEM;
 }
 
 static void __devinit add_phy(struct mii_bus *bus, struct device_node *np)
@@ -220,7 +241,9 @@ out_free_irqs:
dev_set_drvdata(&ofdev->dev, NULL);
kfree(new_bus->irq);
 out_unmap_regs:
-   iounmap(bitbang->dir);
+   if (bitbang->mdio.dir != bitbang->mdc.dir)
+   iounmap(bitbang->mdio.dir);
+   iounmap(bitbang->mdc.dir);
 out_free_bus:
kfree(new_bus);
 out_free_priv:
@@ -238,6 +261,8 @@ static int fs_enet_mdio_remove(struct of_device *ofdev)
free_mdio_bitbang(bus);
dev_set_drvdata(&ofdev->dev, NULL);
kfree(bus->irq);
-   iounmap(bitbang->dir);
+   if (bitbang->mdio.dir != bitbang->mdc.dir)
+
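
The masks in the patch above are computed as 1 << (31 - pin), which numbers pins from the most significant bit of the 32-bit port register (the usual PowerPC convention), so pin 0 lands on bit 31 in C terms. A minimal user-space sketch of that arithmetic follows; the helper names are illustrative and not part of the driver, and plain variables stand in for the ioremap()ed registers:

```c
#include <stdint.h>

/* Hypothetical helper: mask for a pin numbered MSB-first in a 32-bit port. */
static uint32_t pin_mask(unsigned int pin)
{
	return UINT32_C(1) << (31 - pin);
}

/* Stand-ins for the driver's bb_set()/bb_clr(), operating on a plain
 * variable instead of a memory-mapped register. */
static void reg_set(uint32_t *reg, uint32_t mask)
{
	*reg |= mask;
}

static void reg_clr(uint32_t *reg, uint32_t mask)
{
	*reg &= ~mask;
}
```

With separate dir/dat/msk triples for MDC and MDIO, as the patch introduces, the same helpers work whether the two pins share a port or sit on different ones.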

Re: Oops in 2.6.21-rc4, 2.6.23

2007-10-31 Thread Jarek Poplawski

On Tue, Oct 30, 2007 at 03:11:20PM +0100, Jarek Poplawski wrote:
 On Mon, Oct 29, 2007 at 01:41:47AM -0700, David Miller wrote:
 ...
  Actually, this was caused by a real bug in the SKB_WITH_OVERHEAD macro
  definition, which Herbert Xu quickly spotted and fixed.
  
  Which I hope you've found this by yourself by now.
  
 
 ...Btw, of course you have to be right, and I should find this in max.
 12 days yet, if I'm as smart as I hope. But as for now, I really can't
 see any meaningful difference between this buggy SKB_WITH_OVERHEAD
 version and 'generic' 2.6.20.

OK. At last I've found out by myself what you seemed to suggest with
such great subtlety... So, there was this other bugzilla thread...
And, accidentally of course, I might not have been quite as 100% wrong
as expected?!

Regards,
Jarek P.

Re: [PATCH 00/33] Swap over NFS -v14

2007-10-31 Thread Christoph Hellwig

On Tue, Oct 30, 2007 at 09:37:53PM -0700, David Miller wrote:
 Don't be misled.  Swapping over NFS is just a scarecrow for the
 seemingly real impetus behind these changes which is network storage
 stuff like iSCSI.

So can we please do swap over network storage only first?  All these
VM bits look conceptually sane to me, while the changes to the swap
code to support nfs are real crackpipe material.   Then again doing
that part properly by adding address_space methods for swap I/O without
the abuse might be a really good idea, especially as the way we
do swapfiles on block-based filesystems is a horrible hack already.

So please get the VM bits for swap over network blockdevices in first,
and then we can look into a complete revamp of the swapfile support
that cleans up the current mess and adds support for nfs instead of
making the mess even worse.


Re: [PATCH 2/2] NFS: handle IPv6 addresses in nfs ctl

2007-10-31 Thread Aurélien Charbon


Thank you Brian
Sorry, I did not see what you sent.

I have tested it with an IPv4 configuration. It's OK.
So Neil, Bruce, you can take this one for review.

fs/nfsd/export.c   |9 ++-
fs/nfsd/nfsctl.c   |   42 --
include/linux/sunrpc/svcauth.h |5 +
include/net/ipv6.h |   10 +++
net/sunrpc/svcauth_unix.c  |  118 +++--

5 files changed, 134 insertions(+), 50 deletions(-)

Signed-off-by: Brian Haley [EMAIL PROTECTED]
Signed-off-by: Aurelien Charbon [EMAIL PROTECTED]

---

diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
index 66d0aeb..c47ba77 100644
--- a/fs/nfsd/export.c
+++ b/fs/nfsd/export.c
@@ -35,6 +35,7 @@
#include <linux/lockd/bind.h>
#include <linux/sunrpc/msg_prot.h>
#include <linux/sunrpc/gss_api.h>
+#include <net/ipv6.h>

#define NFSDDBG_FACILITY NFSDDBG_EXPORT

@@ -1556,6 +1557,7 @@ exp_addclient(struct nfsctl_client *ncp)
{
struct auth_domain *dom;
int i, err;
+struct in6_addr in6;

/* First, consistency check. */
err = -EINVAL;
@@ -1574,9 +1576,10 @@ exp_addclient(struct nfsctl_client *ncp)
goto out_unlock;

/* Insert client into hashtable. */
-for (i = 0; i < ncp->cl_naddr; i++)
-auth_unix_add_addr(ncp->cl_addrlist[i], dom);
-
+for (i = 0; i < ncp->cl_naddr; i++) {
+ipv6_addr_set_v4mapped(ncp->cl_addrlist[i].s_addr, &in6);
+auth_unix_add_addr(&in6, dom);
+}
auth_unix_forget_old(dom);
auth_domain_put(dom);

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 77dc989..5cb5f0d 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -37,6 +37,7 @@
#include <linux/nfsd/syscall.h>

#include <asm/uaccess.h>
+#include <net/ipv6.h>

/*
 *We have a single directory with 9 nodes in it.
@@ -219,24 +220,37 @@ static ssize_t write_getfs(struct file *file, char *buf, size_t size)

{
struct nfsctl_fsparm *data;
struct sockaddr_in *sin;
+struct sockaddr_in6 *sin6;
struct auth_domain *clp;
int err = 0;
struct knfsd_fh *res;
+struct in6_addr in6;

if (size < sizeof(*data))
return -EINVAL;
data = (struct nfsctl_fsparm*)buf;
err = -EPROTONOSUPPORT;
-if (data->gd_addr.sa_family != AF_INET)
+switch (data->gd_addr.sa_family) {
+case AF_INET:
+sin = (struct sockaddr_in *)&data->gd_addr;
+ipv6_addr_set_v4mapped(sin->sin_addr.s_addr, &in6);
+break;
+case AF_INET6:
+sin6 = (struct sockaddr_in6 *)&data->gd_addr;
+ipv6_addr_copy(&in6, &sin6->sin6_addr);
+break;
+default:
goto out;
-sin = (struct sockaddr_in *)&data->gd_addr;
+}
+
if (data->gd_maxlen > NFS3_FHSIZE)
data->gd_maxlen = NFS3_FHSIZE;

res = (struct knfsd_fh*)buf;

exp_readlock();
-if (!(clp = auth_unix_lookup(sin->sin_addr)))
+
+if (!(clp = auth_unix_lookup(&in6)))
err = -EPERM;
else {
err = exp_rootfh(clp, data->gd_path, res, data->gd_maxlen);
@@ -253,25 +267,41 @@ static ssize_t write_getfd(struct file *file, char *buf, size_t size)

{
struct nfsctl_fdparm *data;
struct sockaddr_in *sin;
+struct sockaddr_in6 *sin6;
struct auth_domain *clp;
int err = 0;
struct knfsd_fh fh;
char *res;
+struct in6_addr in6;

if (size < sizeof(*data))
return -EINVAL;
data = (struct nfsctl_fdparm*)buf;
err = -EPROTONOSUPPORT;
-if (data->gd_addr.sa_family != AF_INET)
+if (data->gd_addr.sa_family != AF_INET &&
+data->gd_addr.sa_family != AF_INET6)
goto out;
err = -EINVAL;
if (data->gd_version < 2 || data->gd_version > NFSSVC_MAXVERS)
goto out;

res = buf;
-sin = (struct sockaddr_in *)&data->gd_addr;
exp_readlock();
-if (!(clp = auth_unix_lookup(sin->sin_addr)))
+
+switch (data->gd_addr.sa_family) {
+case AF_INET:
+sin = (struct sockaddr_in *)&data->gd_addr;
+ipv6_addr_set_v4mapped(sin->sin_addr.s_addr, &in6);
+break;
+case AF_INET6:
+sin6 = (struct sockaddr_in6 *)&data->gd_addr;
+ipv6_addr_copy(&in6, &sin6->sin6_addr);
+break;
+default:
+goto out;
+}
+
+if (!(clp = auth_unix_lookup(&in6)))
err = -EPERM;
else {
err = exp_rootfh(clp, data->gd_path, &fh, NFS_FHSIZE);
diff --git a/include/linux/sunrpc/svcauth.h b/include/linux/sunrpc/svcauth.h
index 22e1ef8..64ecb93 100644
--- a/include/linux/sunrpc/svcauth.h
+++ b/include/linux/sunrpc/svcauth.h
@@ -15,6 +15,7 @@
#include <linux/sunrpc/msg_prot.h>
#include <linux/sunrpc/cache.h>
#include <linux/hash.h>
+#include <net/ipv6.h>

#define SVC_CRED_NGROUPS 32
struct svc_cred {
@@ -120,10 +121,10 @@ extern void svc_auth_unregister(rpc_authflavor_t flavor);

extern struct auth_domain *unix_domain_find(char *name);
extern void auth_domain_put(struct auth_domain *item);
-extern int auth_unix_add_addr(struct in_addr addr, struct auth_domain *dom);
+extern int auth_unix_add_addr(struct in6_addr *addr, struct auth_domain *dom);
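
The patch above stores every client address as a struct in6_addr and converts IPv4 addresses into the IPv4-mapped form ::ffff:a.b.c.d before hashing them. A user-space sketch of that mapping is below; set_v4mapped() is a hypothetical stand-in for the kernel helper ipv6_addr_set_v4mapped() that the patch adds, written against the POSIX socket headers rather than kernel ones:

```c
#include <arpa/inet.h>   /* htonl() for callers */
#include <netinet/in.h>  /* struct in6_addr, IN6_IS_ADDR_V4MAPPED */
#include <stdint.h>
#include <string.h>

/* Store an IPv4 address (already in network byte order) as the
 * IPv4-mapped IPv6 address ::ffff:a.b.c.d. */
static void set_v4mapped(uint32_t v4_be, struct in6_addr *v6)
{
	memset(v6, 0, sizeof(*v6));           /* bytes 0..9 stay zero */
	v6->s6_addr[10] = 0xff;               /* bytes 10..11 are ffff */
	v6->s6_addr[11] = 0xff;
	memcpy(&v6->s6_addr[12], &v4_be, 4);  /* bytes 12..15 carry the IPv4 address */
}
```

Keeping one 128-bit key type means the lookup path needs a single hash and compare routine, with AF_INET callers paying only for the conversion.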

Re: [PATCH][RFC] Add support for the RDC R6040 Fast Ethernet controller

2007-10-31 Thread Jeff Garzik


Florian Fainelli wrote:

This patch adds support for the RDC R6040 MAC we can find in the RDC R-321x 
System-on-chips.
This driver really needs improvements especially on the NAPI part which 
probably does not
fully use the new NAPI structure.
You will need the RDC PCI identifiers if you want to test this driver which are 
the following ones :

RDC_PCI_VENDOR_ID = 0x17f3
RDC_PCI_DEVICE_ID_RDC_R6040 = 0x6040

Thank you very much in advance for your comments.

Signed-off-by: Sten Wang [EMAIL PROTECTED]
Signed-off-by: Daniel Gimpelevich [EMAIL PROTECTED]
Signed-off-by: Florian Fainelli [EMAIL PROTECTED]


Looks nice and clean to me.  Pre-merge stuff I think needs fixing:

* clean up NAPI as you describe (and delete non-NAPI code paths, unless 
there is a strong reason to keep them).


* unconditional local_irq_{enable,disable} stuff

* spin_lock_irqsave() should not be needed in interrupt handler. 
[perhaps you did this rather than put the slower locking in 
->poll_controller()]


* remove changelog from C header (git repository log is our changelog)

* handle large dev->mc_count, as you note in the C header

* use __le32 and similar data types. validate with sparse 
(Documentation/sparse.txt)


* consider using ioread{8,16,32} and iowrite{8,16,32}, if your platform 
permits.  Then switch from 'unsigned long' to special marker 'void 
__iomem *' for all I/O port addresses


* use DMA_32BIT_MASK rather than 0x in pci_set_dma_mask() call

* in r6040_init_one() call is_valid_ether_addr(), rather than 
hand-rolling the same code yourself


* you need to note carrier state when it changes, using 
netif_carrier_on() and netif_carrier_off()



Re: [PATCH 28/33] nfs: teach the NFS client how to treat PG_swapcache pages

2007-10-31 Thread Christoph Hellwig

On Tue, Oct 30, 2007 at 05:04:29PM +0100, Peter Zijlstra wrote:
 Replace all relevant occurrences of page->index and page->mapping in the NFS
 client with the new page_file_index() and page_file_mapping() functions.

As discussed personally and on the list, a strong NACK for this.  Swapcache
pages have no business at all ever coming through ->writepage(s).  If you
really want to support swap over NFS, that can only be done properly by
adding separate methods to write out and read in pages, separated from the
pagecache.  Incidentally, that would also clean up the mess we have with
swap files on normal filesystems using ->bmap and bypassing the filesystem
later on.


Re: [PATCH]: Fix networking scatterlist regressions.

2007-10-31 Thread David Miller

From: Jens Axboe [EMAIL PROTECTED]
Date: Wed, 31 Oct 2007 09:14:28 +0100

 Subject: [PATCH] [SG] Get rid of __sg_mark_end()

 sg_mark_end() overwrites the page_link information, but all users want
 __sg_mark_end() behaviour where we just set the end bit. That is the most
 natural way to use the sg list, since you'll fill it in and then mark the
 end point.

 So change sg_mark_end() to only set the termination bit. Add a sg_magic
 debug check as well, and clear a chain pointer if it is set.

 Signed-off-by: Jens Axboe [EMAIL PROTECTED]

It doesn't build.  I suspect there is something else in your tree
that is necessary for this patch to work on its own.

[EMAIL PROTECTED]:~/src/GIT/net-2.6$ patch -p1 < diff
patching file block/ll_rw_blk.c
patching file drivers/scsi/scsi_lib.c
patching file include/linux/scatterlist.h
Hunk #2 succeeded at 242 (offset 22 lines).
patching file net/core/skbuff.c
patching file net/ipv4/tcp_ipv4.c
patching file net/ipv6/tcp_ipv6.c
patching file net/rxrpc/rxkad.c
patching file net/sunrpc/auth_gss/gss_krb5_crypto.c
 ...
  CC  init/main.o
In file included from include/asm/dma-mapping.h:4,
 from include/linux/dma-mapping.h:52,
 from include/asm/sbus.h:9,
 from include/asm/dma.h:14,
 from include/linux/bootmem.h:8,
 from init/main.c:26:
include/linux/scatterlist.h: In function 'sg_init_one':
include/linux/scatterlist.h:228: error: too many arguments to function 
'sg_mark_end'
make[1]: *** [init/main.o] Error 1
make: *** [init] Error 2

[PATCH 2/2] [TCP]: Another TAGBITS - SACKED_ACKED|LOST conversion

2007-10-31 Thread Ilpo Järvinen

Similar to commit 3eec0047d9bdd, point of this is to avoid
skipping R-bit skbs.

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_input.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 4d72781..ca9590f 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2078,7 +2078,7 @@ static void tcp_update_scoreboard(struct sock *sk)
if (!tcp_skb_timedout(sk, skb))
break;
 
-   if (!(TCP_SKB_CB(skb)->sacked&TCPCB_TAGBITS)) {
+   if (!(TCP_SKB_CB(skb)->sacked & (TCPCB_SACKED_ACKED|TCPCB_LOST))) {
TCP_SKB_CB(skb)->sacked |= TCPCB_LOST;
tp->lost_out += tcp_skb_pcount(skb);
tcp_verify_retransmit_hint(tp, skb);
-- 
1.5.0.6
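
To see why narrowing the mask matters, here is a small stand-alone sketch of the two tests. The flag values are the ones I recall from include/net/tcp.h of that era and should be treated as assumptions; the function names are illustrative only:

```c
#include <stdint.h>

/* Assumed values of the scoreboard tag bits (S, R, L). */
#define TCPCB_SACKED_ACKED   0x01  /* S: covered by a SACK block */
#define TCPCB_SACKED_RETRANS 0x02  /* R: has been retransmitted  */
#define TCPCB_LOST           0x04  /* L: marked lost             */
#define TCPCB_TAGBITS        0x07  /* all of the above           */

/* Old test: only skbs with no tag bit at all get marked lost. */
static int old_test_marks_lost(uint8_t sacked)
{
	return !(sacked & TCPCB_TAGBITS);
}

/* New test: the R bit no longer shields an skb from being marked lost. */
static int new_test_marks_lost(uint8_t sacked)
{
	return !(sacked & (TCPCB_SACKED_ACKED | TCPCB_LOST));
}
```

An skb carrying only the R bit is skipped by the old test and picked up by the new one; SACKed or already-lost skbs are skipped by both.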

Re: [PATCH 1/2] [TCP]: Process DSACKs that reside within a SACK block

2007-10-31 Thread David Miller

From: Ilpo_Järvinen [EMAIL PROTECTED]
Date: Wed, 31 Oct 2007 11:48:31 +0200 (EET)

 DSACK inside another SACK block were missed if start_seq of DSACK
 was larger than SACK block's because sorting prioritizes full
 processing of the SACK block before DSACK. After SACK block
 sorting situation is like this:

  S
   D
 SS
SSS

 Because write_queue is walked in-order, when the first SACK block
 has been processed, TCP is already past the skb for which the
 DSACK arrived and we haven't taught it to backtrack (nor should
 we), so TCP just continues processing by going to the next SACK
 block after the DSACK (if any).

 Whenever such DSACK is present, do an embedded checking during
 the previous SACK block.

 If the DSACK is below snd_una, there won't be overlapping SACK
 block, and thus no problem in that case. Also if start_seq of
 the DSACK is equal to the actual block, it will be processed
 first.

 Tested this by using netem to duplicate 15% of packets, and
 by printing SACK block when found_dup_sack is true and the 
 selected skb in the dup_sack = 1 branch (if taken):

   SACK block 0: 4344-5792 (relative to snd_una 2019137317)
   SACK block 1: 4344-5792 (relative to snd_una 2019137317) 

 equal start seqnos => next_dup = 0, dup_sack = 1 won't occur...

   SACK block 0: 5792-7240 (relative to snd_una 2019214061)
   SACK block 1: 2896-7240 (relative to snd_una 2019214061)
   DSACK skb match 5792-7240 (relative to snd_una)

 ...and next_dup = 1 case (after the not shown start_seq sort),
 went to dup_sack = 1 branch.

 Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]

I will queue this bug fix up, thanks Ilpo!

And thanks for all of the testing information, it helps review
enormously.

Re: [PATCH 2/2] [TCP]: Another TAGBITS - SACKED_ACKED|LOST conversion

2007-10-31 Thread David Miller

From: Ilpo_Järvinen [EMAIL PROTECTED]
Date: Wed, 31 Oct 2007 11:49:59 +0200 (EET)

 Similar to commit 3eec0047d9bdd, point of this is to avoid
 skipping R-bit skbs.

 Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]

I'll apply this also, thanks a lot.

Re: [PATCH]: Fix networking scatterlist regressions.

2007-10-31 Thread Jens Axboe

On Wed, Oct 31 2007, David Miller wrote:
 From: Jens Axboe [EMAIL PROTECTED]
 Date: Wed, 31 Oct 2007 09:14:28 +0100

  Subject: [PATCH] [SG] Get rid of __sg_mark_end()

  sg_mark_end() overwrites the page_link information, but all users want
  __sg_mark_end() behaviour where we just set the end bit. That is the most
  natural way to use the sg list, since you'll fill it in and then mark the
  end point.

  So change sg_mark_end() to only set the termination bit. Add a sg_magic
  debug check as well, and clear a chain pointer if it is set.

  Signed-off-by: Jens Axboe [EMAIL PROTECTED]

 It doesn't build.  I suspect there is something else in your tree
 that is necessary for this patch to work on its own.

Builds here. But yes, it's on top of other patches, it was merely for
demonstration purposes that I posted it. Locally sg_init_one() uses
sg_init_table() here, it doesn't open code the init:

static inline void sg_init_one(struct scatterlist *sg, const void *buf,
   unsigned int buflen)
{
sg_init_table(sg, 1);
sg_set_buf(sg, buf, buflen);
}

-- 
Jens Axboe
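
For readers following the sg_mark_end() discussion: the idea is that the low bits of page_link are flag bits, so marking the end only needs to set one bit and must leave the stored page pointer alone. The toy model below illustrates this; the bit assignment (bit 0 = chain, bit 1 = end) is what I recall from include/linux/scatterlist.h and is an assumption here, and the toy_* names are not kernel API:

```c
#include <stdint.h>

/* Toy scatterlist entry: a page pointer with two flag bits in the low bits. */
struct toy_sg {
	uintptr_t page_link;
};

#define TOY_SG_CHAIN ((uintptr_t)0x01)  /* entry points to another sg table */
#define TOY_SG_END   ((uintptr_t)0x02)  /* entry is the last in the list    */

/* Set the termination bit and drop a chain marker if present,
 * without touching the page pointer bits. */
static void toy_sg_mark_end(struct toy_sg *sg)
{
	sg->page_link |= TOY_SG_END;
	sg->page_link &= ~TOY_SG_CHAIN;
}

static int toy_sg_is_last(const struct toy_sg *sg)
{
	return (sg->page_link & TOY_SG_END) != 0;
}

static uintptr_t toy_sg_page(const struct toy_sg *sg)
{
	return sg->page_link & ~(TOY_SG_CHAIN | TOY_SG_END);
}
```

Because the pointer bits survive, the caller can fill the table first and mark the end afterwards, which is the usage pattern the thread asks for.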


Re: [PATCH 00/33] Swap over NFS -v14

2007-10-31 Thread Peter Zijlstra

On Tue, 2007-10-30 at 21:37 -0700, David Miller wrote:
 From: Nick Piggin [EMAIL PROTECTED]
 Date: Wed, 31 Oct 2007 14:26:32 +1100

  Is it really worth all the added complexity of making swap
  over NFS files work, given that you could use a network block
  device instead?

 Don't be misled.  Swapping over NFS is just a scarecrow for the
 seemingly real impetus behind these changes which is network storage
 stuff like iSCSI.

Not quite. Yes, iSCSI is also on the 'want' list of quite a few people,
but swap over NFS on its own is also a feature in great demand.


[PATCH 1/2] [TCP]: Process DSACKs that reside within a SACK block

2007-10-31 Thread Ilpo Järvinen


DSACK inside another SACK block were missed if start_seq of DSACK
was larger than SACK block's because sorting prioritizes full
processing of the SACK block before DSACK. After SACK block
sorting situation is like this:

 S
  D
SS
   SSS

Because write_queue is walked in-order, when the first SACK block
has been processed, TCP is already past the skb for which the
DSACK arrived and we haven't taught it to backtrack (nor should
we), so TCP just continues processing by going to the next SACK
block after the DSACK (if any).

Whenever such DSACK is present, do an embedded checking during
the previous SACK block.

If the DSACK is below snd_una, there won't be overlapping SACK
block, and thus no problem in that case. Also if start_seq of
the DSACK is equal to the actual block, it will be processed
first.

Tested this by using netem to duplicate 15% of packets, and
by printing SACK block when found_dup_sack is true and the 
selected skb in the dup_sack = 1 branch (if taken):

  SACK block 0: 4344-5792 (relative to snd_una 2019137317)
  SACK block 1: 4344-5792 (relative to snd_una 2019137317) 

equal start seqnos => next_dup = 0, dup_sack = 1 won't occur...

  SACK block 0: 5792-7240 (relative to snd_una 2019214061)
  SACK block 1: 2896-7240 (relative to snd_una 2019214061)
  DSACK skb match 5792-7240 (relative to snd_una)

...and next_dup = 1 case (after the not shown start_seq sort),
went to dup_sack = 1 branch.

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_input.c |   25 ++---
 1 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 69d8c38..4d72781 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1330,12 +1330,15 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
cached_fack_count = 0;
}
 
-   for (i=0; i<num_sacks; i++, sp++) {
+   for (i = 0; i < num_sacks; i++) {
struct sk_buff *skb;
__u32 start_seq = ntohl(sp->start_seq);
__u32 end_seq = ntohl(sp->end_seq);
int fack_count;
int dup_sack = (found_dup_sack && (i == first_sack_index));
+   int next_dup = (found_dup_sack && (i+1 == first_sack_index));
+
+   sp++;
 
if (!tcp_is_sackblock_valid(tp, dup_sack, start_seq, end_seq)) {
if (dup_sack) {
@@ -1361,7 +1364,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
flag |= FLAG_DATA_LOST;
 
tcp_for_write_queue_from(skb, sk) {
-   int in_sack;
+   int in_sack = 0;
u8 sacked;
 
if (skb == tcp_send_head(sk))
@@ -1380,7 +1383,23 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
if (!before(TCP_SKB_CB(skb)->seq, end_seq))
break;
 
-   in_sack = tcp_match_skb_to_sack(sk, skb, start_seq, end_seq);
+   dup_sack = (found_dup_sack && (i == first_sack_index));
+
+   /* Due to sorting DSACK may reside within this SACK block! */
+   if (next_dup) {
+   u32 dup_start = ntohl(sp->start_seq);
+   u32 dup_end = ntohl(sp->end_seq);
+
+   if (before(TCP_SKB_CB(skb)->seq, dup_end)) {
+   in_sack = tcp_match_skb_to_sack(sk, skb, dup_start, dup_end);
+   if (in_sack > 0)
+   dup_sack = 1;
+   }
+   }
+
+   /* DSACK info lost if out-of-mem, try SACK still */
+   if (in_sack <= 0)
+   in_sack = tcp_match_skb_to_sack(sk, skb, start_seq, end_seq);
if (in_sack < 0)
break;
 
-- 
1.5.0.6
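
The case the patch handles can be stated with plain sequence-number comparisons: after sorting by start_seq, a DSACK that starts later than an enclosing SACK block ends up behind it. A sketch under that reading follows; seq_before() mirrors the kernel's wrap-safe before() comparison, while dsack_hidden_by_sort() is a hypothetical helper written only for illustration:

```c
#include <stdint.h>

/* Wrap-safe "a is before b" for 32-bit sequence numbers. */
static int seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Returns 1 if the DSACK [d_start, d_end) lies inside the SACK block
 * [s_start, s_end) and starts later, so a start_seq sort places it
 * after that block. */
static int dsack_hidden_by_sort(uint32_t d_start, uint32_t d_end,
				uint32_t s_start, uint32_t s_end)
{
	return seq_before(s_start, d_start) && !seq_before(s_end, d_end);
}
```

With the numbers from the test log in the changelog, the DSACK 5792-7240 inside the SACK block 2896-7240 is the hidden case, while the pair with equal start sequence numbers (4344-5792 twice) is not.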

Re: [RFC][BNX2X] .h files rewrite

2007-10-31 Thread Eliezer Tamir

(Sorry it took so long to answer, I've had to go over a lot of stuff to
make sure I'm giving you accurate answers.)

On Mon, 2007-10-29 at 01:39 -0700, David Miller wrote:
 From: Eliezer Tamir [EMAIL PROTECTED]
 Date: Sun, 28 Oct 2007 22:21:14 +0200

 Overall things look significantly better, thanks a lot!

 However, there is still one set of magic constants in here
 which I hope you can clear up:

  +static const struct raw_op init_ops[] = {
  +#define PRS_COMMON_START 0
  + {OP_WR, PRS_REG_INC_VALUE, 0xf},
  + {OP_WR, PRS_REG_EVENT_ID_1, 0x45},
  + {OP_WR, PRS_REG_EVENT_ID_2, 0x84},
  + {OP_WR, PRS_REG_EVENT_ID_3, 0x6},
  + {OP_WR, PRS_REG_NO_MATCH_EVENT_ID, 0x4},
  + {OP_WR, PRS_REG_CM_HDR_TYPE_0, 0x0},
  + {OP_WR, PRS_REG_CM_HDR_TYPE_1, 0x1217},
  + {OP_WR, PRS_REG_CM_HDR_TYPE_2, 0x2217},
  + {OP_WR, PRS_REG_CM_HDR_TYPE_3, 0x3217},
  + {OP_ZR, PRS_REG_CM_HDR_TYPE_4, 0x5},
  + {OP_WR, PRS_REG_CM_HDR_LOOPBACK_TYPE_1, 0x1215},
  + {OP_WR, PRS_REG_CM_HDR_LOOPBACK_TYPE_2, 0x2215},
  + {OP_WR, PRS_REG_CM_HDR_LOOPBACK_TYPE_3, 0x3215},
  + {OP_ZR, PRS_REG_CM_HDR_LOOPBACK_TYPE_4, 0x4},
  etc. etc.

These are the steps performed to initialize the chip and load the
microcode.

Each element in the array is an operation in the form:
{operation, chip address, value}

Where the operation is one of:
OP_WR - write a value to the chip.
OP_RD - read a register (this is a read with a side effect, usually a
clear on read register).
OP_SW - string write to the chip (parts of the microcode).
OP_SI - string write using indirect write registers.
OP_ZR - clear a range of memory.
OP_ZP - unzip and copy using DMAE.
OP_WB - string copy using DMAE.

(I will add these explanations to bnx2x_init.h)
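
As an illustration of the {operation, chip address, value} scheme described above, here is a toy interpreter that runs such a table against an array standing in for the chip's register space. Only two of the operations are modelled, the toy_* names are made up for this sketch, and treating the third field of the zeroing operation as a word count is an assumption:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum toy_op { TOY_OP_WR, TOY_OP_ZR };

struct toy_raw_op {
	enum toy_op op;
	uint32_t    addr;  /* word index into the fake register file    */
	uint32_t    val;   /* value for WR; number of words for ZR      */
};

/* Walk the table in order and apply each operation, ignoring entries
 * that would fall outside the fake register file. */
static void toy_run_ops(uint32_t *regs, size_t nregs,
			const struct toy_raw_op *ops, size_t nops)
{
	for (size_t i = 0; i < nops; i++) {
		const struct toy_raw_op *o = &ops[i];

		switch (o->op) {
		case TOY_OP_WR:
			if (o->addr < nregs)
				regs[o->addr] = o->val;
			break;
		case TOY_OP_ZR:
			if (o->addr < nregs && o->val <= nregs - o->addr)
				memset(&regs[o->addr], 0,
				       o->val * sizeof(*regs));
			break;
		}
	}
}
```

The appeal of the table form is that the driver carries one small loop plus data, rather than thousands of generated register-write statements.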

As you can see, there are quite a lot of them; that is because most of
them set up the microcode's internal structures.
Maybe it would have been better if the microcode did these things
itself, but because of size issues that is currently not an option.

We will add comments to all register definitions explaining as much as
possible about each register.

All of the registers that are related to the driver flow are accessed
explicitly inside the driver code in the main .c file.

Writes to memories are simply parts of the microcode.

However, almost all of the registers written to here hold numeric values,
not bit constants. The values are derived from HW definitions (and that
was the source of the extra-ugly generated code that we replaced).

In a perfect world almost all of these would have been the reset values
of the registers. 
#include <bitching_about_HW_guys.h>

The added documentation for each register in the register file will
provide all the info that I have about the values.
I feel bad about adding several hundred lines of defines that are only
used once, and do not add any real information about the values.

Your insights would be appreciated.

Thanks,
Eliezer


Re: [PATCH 03/33] mm: slub: add knowledge of reserve pages

2007-10-31 Thread Peter Zijlstra

On Wed, 2007-10-31 at 14:37 +1100, Nick Piggin wrote:
 On Wednesday 31 October 2007 03:04, Peter Zijlstra wrote:
  Restrict objects from reserve slabs (ALLOC_NO_WATERMARKS) to allocation
  contexts that are entitled to it.
 
  Care is taken to only touch the SLUB slow path.
 
  This is done to ensure reserve pages don't leak out and get consumed.
 
 I think this is generally a good idea (to prevent slab allocators
 from stealing reserve). However I naively think the implementation
 is a bit overengineered and thus has a few holes.
 
 Humour me, what was the problem with failing the slab allocation
 (actually, not fail but just call into the page allocator to do
 correct waiting  / reclaim) in the slowpath if the process fails the
 watermark checks?

Ah, we actually need slabs below the watermarks. It's just that once I
have allocated those slabs using __GFP_MEMALLOC/PF_MEMALLOC, I don't want
allocation contexts that do not have rights to those pages to walk off
with objects.

So, this generic reserve framework still uses the slab allocator to
provide certain kinds of objects (kmalloc, kmem_cache_alloc); it just
separates those that are and are not entitled to the reserves.



Re: [PATCH 05/33] mm: kmem_estimate_pages()

2007-10-31 Thread Peter Zijlstra

On Wed, 2007-10-31 at 14:43 +1100, Nick Piggin wrote:
 On Wednesday 31 October 2007 03:04, Peter Zijlstra wrote:
  Provide a method to get the upper bound on the pages needed to allocate
  a given number of objects from a given kmem_cache.
 
 
 Fair enough, but just to make it a bit easier, can you provide a
 little reason of why in this patch (or reference the patch number
 where you use it, or put it together with the patch where you use
 it, etc.).

A generic reserve framework, as seen in patch 11/23, needs to be able
to convert from an object demand (kmalloc() bytes, kmem_cache_alloc()
objects) to a page reserve.
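
As a rough illustration of that conversion (not the code from the patch, which may well add slack for partially used slabs), a naive estimate rounds the object count up to whole slabs and multiplies by the pages per slab. The function and parameter names below are hypothetical:

```c
#include <stddef.h>

/* Naive page estimate for allocating `objects` objects from a cache
 * that packs `objs_per_slab` objects into slabs of `pages_per_slab`
 * pages each. Returns 0 when there is nothing to estimate. */
static size_t estimate_pages(size_t objects, size_t objs_per_slab,
			     size_t pages_per_slab)
{
	size_t slabs;

	if (objects == 0 || objs_per_slab == 0)
		return 0;

	/* Round up: a partly filled slab still costs all of its pages. */
	slabs = (objects + objs_per_slab - 1) / objs_per_slab;
	return slabs * pages_per_slab;
}
```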




Re: [PATCH 06/33] mm: allow PF_MEMALLOC from softirq context

2007-10-31 Thread Peter Zijlstra

On Wed, 2007-10-31 at 14:51 +1100, Nick Piggin wrote:
 On Wednesday 31 October 2007 03:04, Peter Zijlstra wrote:
  Allow PF_MEMALLOC to be set in softirq context. When running softirqs from
  a borrowed context save current->flags, ksoftirqd will have its own
  task_struct.
 
 
 What's this for? Why would ksoftirqd pick up PF_MEMALLOC? (I guess
 that some networking thing must be picking it up in a subsequent patch,
 but I'm too lazy to look!)... Again, can you have more of a rationale in
 your patch headers, or ref the patch that uses it... thanks

Right, I knew I was forgetting something in these changelogs.

The network stack does quite a bit of packet processing from softirq
context. Once you start swapping over network, some of the packets want
to be processed under PF_MEMALLOC.

See patch 23/33.
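
The save and restore described in the changelog can be pictured as follows: softirq work that runs on top of whatever task was interrupted must not inherit that task's PF_MEMALLOC, and must not leave its own setting behind. This is a user-space sketch with made-up toy_* names; the flag value is what I recall for PF_MEMALLOC and should be read as an assumption, as should the exact save/clear/restore sequence:

```c
/* Assumed value of the per-task "may dip into reserves" flag. */
#define TOY_PF_MEMALLOC 0x00000800u

struct toy_task {
	unsigned int flags;
};

/* Example handler that turns the flag on while it processes packets. */
static void toy_handler_sets_memalloc(struct toy_task *t)
{
	t->flags |= TOY_PF_MEMALLOC;
}

/* Run a softirq handler in a borrowed task context: remember the task's
 * own PF_MEMALLOC bit, start the handler with it cleared, and put the
 * original bit back afterwards whatever the handler did. */
static void toy_run_softirq(struct toy_task *cur,
			    void (*handler)(struct toy_task *))
{
	unsigned int saved = cur->flags & TOY_PF_MEMALLOC;

	cur->flags &= ~TOY_PF_MEMALLOC;
	handler(cur);
	cur->flags = (cur->flags & ~TOY_PF_MEMALLOC) | saved;
}
```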



Re: [PATCH 09/33] mm: system wide ALLOC_NO_WATERMARK

2007-10-31 Thread Peter Zijlstra

On Wed, 2007-10-31 at 14:52 +1100, Nick Piggin wrote:
 On Wednesday 31 October 2007 03:04, Peter Zijlstra wrote:
  Change ALLOC_NO_WATERMARK page allocation such that the reserves are system
  wide - which they are per setup_per_zone_pages_min(), when we scrape the
  barrel, do it properly.
 
 
 IIRC it's actually not too uncommon to have allocations coming here via
 page reclaim. It's not exactly clear that you want to break mempolicies
 at this point.

Hmm, the way I see it is that mempolicies are mainly for user-space
allocations; reserve allocations are always kernel allocations. These
already break mempolicies - for example hardirq context allocations.

Also, as it stands, the reserve is spread out evenly over all
zones/nodes (excluding highmem), so by restricting ourselves to a
subset, we don't have access to the full reserve.




Re: [RFC][BNX2X] .h files rewrite

2007-10-31 Thread David Miller

From: Eliezer Tamir [EMAIL PROTECTED]
Date: Wed, 31 Oct 2007 12:14:47 +0200

 I feel bad about adding several hundred lines of defines that are only
 used once, and do not add any real information about the values.

Ok, let's skip this for now.

Re: [PATCH 00/33] Swap over NFS -v14

2007-10-31 Thread Peter Zijlstra

On Wed, 2007-10-31 at 08:50 +, Christoph Hellwig wrote:
 On Tue, Oct 30, 2007 at 09:37:53PM -0700, David Miller wrote:
  Don't be misled.  Swapping over NFS is just a scarecrow for the
  seemingly real impetus behind these changes which is network storage
  stuff like iSCSI.
 
 So can we please do swap over network storage only first?  All these
 VM bits look conceptually sane to me, while the changes to the swap
 code to support nfs are real crackpipe material.

Yeah, I know how you stand on that. I just wanted to post all this
before going off into the woods reworking it all.

 Then again doing
 that part properly by adding address_space methods for swap I/O without
 the abuse might be a really good idea, especially as the way we
 do swapfiles on block-based filesystems is a horrible hack already.

Is planned. What do you think of the proposed a_ops extension to
accomplish this? That is,

-swapfile() - is this address space willing to back swap
-swapout() - write out a page
-swapin() - read in a page

 So please get the VM bits for swap over network blockdevices in first,

Trouble with that part is that we don't have any sane network block
devices atm, NBD is utter crap, and iSCSI is too complex to be called
sane.

Maybe Evgeniy's Distributed storage thingy would work, will have a look
at that.

 and then we can look into a complete revamp of the swapfile support
 that cleans up the current mess and adds support for nfs instead of
 making the mess even worse.

Sure, concrete suggestions are always welcome. Just being told something
is utter crap only goes so far.



Re: Bonding in active-backup mode with arp monitoring on Xen

2007-10-31 Thread Tsutomu Fujii

Hi.

Jay Vosburgh wrote:
   I'm not sure if this is a solution that will work for any peer
 (some peers may not reply to an ARP with an IP source of all zeros).  At
 first glance, there doesn't seem to be much of a downside, but I'll have
 to experiment with it a bit to see if the check should be optional or
 simply removed entirely.

If there is no downside, I think it is better to remove the check.
For your information, the other modes that can use ARP monitoring send
an ARP with an IP source of all zeros.

Thanks.

--
Tsutomu Fujii

NBD was Re: [PATCH 00/33] Swap over NFS -v14

2007-10-31 Thread Pavel Machek

Hi!

  So please get the VM bits for swap over network blockdevices in first,
 
 Trouble with that part is that we don't have any sane network block
 devices atm, NBD is utter crap, and iSCSI is too complex to be called
 sane.

Hey, NBD was designed to be _simple_. And I think it works okay in
that area.. so can you elaborate on utter crap? [Ok, performance is
not great.]

Plus, I'd suggest you look at ata-over-ethernet. It is in tree
today, quite simple, but should have better performance than nbd.
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: NBD was Re: [PATCH 00/33] Swap over NFS -v14

2007-10-31 Thread Peter Zijlstra

On Wed, 2007-10-31 at 12:18 +0100, Pavel Machek wrote:
 Hi!
 
   So please get the VM bits for swap over network blockdevices in first,
  
  Trouble with that part is that we don't have any sane network block
  devices atm, NBD is utter crap, and iSCSI is too complex to be called
  sane.
 
 Hey, NBD was designed to be _simple_. And I think it works okay in
 that area.. so can you elaborate on utter crap? [Ok, performance is
 not great.]

Yeah, sorry, perhaps I was overly strong.

It doesn't work for me, because:

  - it does connection management in user-space, which makes it
    impossible to reconnect. I'd want a full kernel based client.

  - it had some plugging issues, and after talking to Jens about it
    he suggested a rewrite using ->make_request() ala AoE. [ sorry if
    I'm short on details here, it was a long time ago, and I
    forgot, maybe Jens remembers ]

 Plus, I'd suggest you look at ata-over-ethernet. It is in tree
 today, quite simple, but should have better performance than nbd.

Ah, right, I keep forgetting about that one. The only drawback to that
one is that it's raw Ethernet, and not some IP protocol.



Re: [PATCH 00/33] Swap over NFS -v14

2007-10-31 Thread Peter Zijlstra

On Wed, 2007-10-31 at 14:26 +1100, Nick Piggin wrote:
 On Wednesday 31 October 2007 03:04, Peter Zijlstra wrote:
  Hi,
 
  Another posting of the full swap over NFS series.
 
 Hi,
 
 Is it really worth all the added complexity of making swap
 over NFS files work, given that you could use a network block
 device instead?

As it stands, we don't have a usable network block device IMHO.
NFS is by far the most used and usable network storage solution out
there, anybody with half a brain knows how to set it up and use it.

 Also, have you ensured that page_file_index, page_file_mapping
 and page_offset are only ever used on anonymous pages when the
 page is locked? (otherwise PageSwapCache could change)

Good point, I hope so; both ->readpage() and ->writepage() take a locked
page. I'd have to check whether it remains locked throughout the NFS call
chain.

Then again, it might become obsolete with the extended swap a_ops.




Re: [PATCH 1/7] o80211s: Export dev_seq_{start,stop,next} symbols.

2007-10-31 Thread Johannes Berg



 These patches are being sent just to the linux-wireless mailing list. I'm just
 sending this one patch also to netdev since it is not wireless specific.

 +EXPORT_SYMBOL_GPL(dev_seq_start);

A log message that explains why you need this would be good, rather than
a generic introduction.

johannes



Re: [PATCH 06/33] mm: allow PF_MEMALLOC from softirq context

2007-10-31 Thread Nick Piggin

On Wednesday 31 October 2007 21:42, Peter Zijlstra wrote:
 On Wed, 2007-10-31 at 14:51 +1100, Nick Piggin wrote:
  On Wednesday 31 October 2007 03:04, Peter Zijlstra wrote:
   Allow PF_MEMALLOC to be set in softirq context. When running softirqs
   from a borrowed context save current->flags, ksoftirqd will have its
   own task_struct.
 
  What's this for? Why would ksoftirqd pick up PF_MEMALLOC? (I guess
  that some networking thing must be picking it up in a subsequent patch,
  but I'm too lazy to look!)... Again, can you have more of a rationale in
  your patch headers, or ref the patch that uses it... thanks

 Right, I knew I was forgetting something in these changelogs.

 The network stack does quite a bit of packet processing from softirq
 context. Once you start swapping over network, some of the packets want
 to be processed under PF_MEMALLOC.

Hmm... what about processing from interrupt context?

Re: [PATCH 00/33] Swap over NFS -v14

2007-10-31 Thread Jeff Garzik


Thoughts:

1) I absolutely agree that NFS is far more prominent and useful than any 
network block device, at the present time.



2) Nonetheless, swap over NFS is a pretty rare case.  I view this work 
as interesting, but I really don't see a huge need, for swapping over 
NBD or swapping over NFS.  I tend to think swapping to a remote resource 
starts to approach migration rather than merely swapping.  Yes, we can 
do it...  but given the lack of burning need one must examine the price.



3) You note

Swap over network has the problem that the network subsystem does not use fixed
sized allocations, but heavily relies on kmalloc(). This makes mempools
unusable.


True, but IMO there are mitigating factors that should be researched and 
taken into account:


a) To give you some net driver background/history, most mainstream net 
drivers were coded to allocate RX skbs of size 1538, under the theory 
that they would all be allocating out of the same underlying slab cache. 
 It would not be difficult to update a great many of the [non-jumbo] 
cases to create a fixed size allocation pattern.


b) Spare-time experiments and anecdotal evidence points to RX and TX skb 
recycling as a potentially valuable area of research.  If you are able 
to do something like that, then memory suddenly becomes a lot more 
bounded and predictable.



So my gut feeling is that taking a hard look at how net drivers function 
in the field should give you a lot of good ideas that approach the 
shared goal of making network memory allocations more predictable and 
bounded.
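
To illustrate (a) and (b) together, here is a minimal userspace sketch of a fixed-size, recycling RX buffer pool. The names struct rx_pool, POOL_BUFS, BUF_SIZE and the rx_pool_*() helpers are invented for this example, and 1538 is simply the RX size mentioned above; I am not claiming any existing driver works this way.

```c
#include <stddef.h>

#define POOL_BUFS 4	/* hypothetical pool depth */
#define BUF_SIZE  1538	/* RX buffer size mentioned in the text */

struct rx_pool {
	unsigned char bufs[POOL_BUFS][BUF_SIZE];
	unsigned char *free_list[POOL_BUFS];
	int nfree;
};

/* Put every buffer on the free list. */
static void rx_pool_init(struct rx_pool *p)
{
	int i;

	for (i = 0; i < POOL_BUFS; i++)
		p->free_list[i] = p->bufs[i];
	p->nfree = POOL_BUFS;
}

/* Take a buffer, or NULL when the pool is exhausted (memory stays bounded). */
static unsigned char *rx_pool_get(struct rx_pool *p)
{
	if (p->nfree == 0)
		return NULL;
	return p->free_list[--p->nfree];
}

/* Recycle a buffer instead of freeing it back to a general allocator. */
static void rx_pool_put(struct rx_pool *p, unsigned char *buf)
{
	if (p->nfree < POOL_BUFS)
		p->free_list[p->nfree++] = buf;
}
```

Because every buffer has the same size and the pool never grows, the worst-case memory use is known up front, which is the property a mempool-style reserve would need.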


Jeff



Re: [PATCH 03/33] mm: slub: add knowledge of reserve pages

2007-10-31 Thread Peter Zijlstra

On Wed, 2007-10-31 at 21:46 +1100, Nick Piggin wrote:
 On Wednesday 31 October 2007 21:42, Peter Zijlstra wrote:
  On Wed, 2007-10-31 at 14:37 +1100, Nick Piggin wrote:
   On Wednesday 31 October 2007 03:04, Peter Zijlstra wrote:
    Restrict objects from reserve slabs (ALLOC_NO_WATERMARKS) to allocation
    contexts that are entitled to it.
   
    Care is taken to only touch the SLUB slow path.
   
    This is done to ensure reserve pages don't leak out and get consumed.
  
   I think this is generally a good idea (to prevent slab allocators
   from stealing reserve). However I naively think the implementation
   is a bit overengineered and thus has a few holes.
  
   Humour me, what was the problem with failing the slab allocation
   (actually, not fail but just call into the page allocator to do
   correct waiting  / reclaim) in the slowpath if the process fails the
   watermark checks?
 
  Ah, we actually need slabs below the watermarks.
 
 Right, I'd still allow those guys to allocate slabs. Provided they
 have the right allocation context, right?
 
 
  Its just that once I 
  allocated those slabs using __GFP_MEMALLOC/PF_MEMALLOC I don't want
  allocation contexts that do not have rights to those pages to walk off
  with objects.
 
 And I'd prevent these ones from doing so.
 
 Without keeping track of reserve pages, which doesn't feel
 too clean.

The problem with that is that once a slab was allocated with the right
allocation context, anybody can get objects from these slabs.


low memory, and empty slab:

task A                          task B

kmem_cache_alloc() = NULL

                                current->flags |= PF_MEMALLOC
                                kmem_cache_alloc() = obj
                                (slab != NULL)

kmem_cache_alloc() = obj
kmem_cache_alloc() = obj
kmem_cache_alloc() = obj


And now task A, who doesn't have the right permissions, walks
away with all our reserve memory.

So we either reserve a page per object, which for 32 byte objects is a
large waste, or we stop anybody who doesn't have the right permissions
from obtaining objects. I took the latter approach.
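
A toy model of the latter approach, in userspace C. The names (struct slab_sketch, struct task_sketch, sketch_alloc()) are made up for this mail, and the real SLUB slow path is considerably more involved; the only point is that a slab remembers it came from the reserves and refuses objects to contexts that lack the right flags.

```c
#include <stdbool.h>

/* Hypothetical slab: a count of free objects plus a "reserve" mark. */
struct slab_sketch {
	int free_objects;
	bool reserve;	/* true if the slab was allocated below the watermarks */
};

/* Hypothetical task: only the PF_MEMALLOC-like entitlement is modelled. */
struct task_sketch {
	bool memalloc;
};

/* Hand out one object, unless the slab is reserve and the task is not entitled. */
static bool sketch_alloc(struct slab_sketch *s, const struct task_sketch *t)
{
	if (s->free_objects == 0)
		return false;
	if (s->reserve && !t->memalloc)
		return false;	/* task A cannot walk off with reserve objects */
	s->free_objects--;
	return true;
}
```

With a reserve slab holding three objects, task A is refused while task B gets its object, so the remaining objects stay available for whoever is entitled to them.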




Re: [PATCH 03/33] mm: slub: add knowledge of reserve pages

2007-10-31 Thread Nick Piggin

On Wednesday 31 October 2007 21:42, Peter Zijlstra wrote:
 On Wed, 2007-10-31 at 14:37 +1100, Nick Piggin wrote:
  On Wednesday 31 October 2007 03:04, Peter Zijlstra wrote:
   Restrict objects from reserve slabs (ALLOC_NO_WATERMARKS) to allocation
   contexts that are entitled to it.
  
   Care is taken to only touch the SLUB slow path.
  
   This is done to ensure reserve pages don't leak out and get consumed.
 
  I think this is generally a good idea (to prevent slab allocators
  from stealing reserve). However I naively think the implementation
  is a bit overengineered and thus has a few holes.
 
  Humour me, what was the problem with failing the slab allocation
  (actually, not fail but just call into the page allocator to do
  correct waiting  / reclaim) in the slowpath if the process fails the
  watermark checks?

 Ah, we actually need slabs below the watermarks.

Right, I'd still allow those guys to allocate slabs. Provided they
have the right allocation context, right?


 Its just that once I 
 allocated those slabs using __GFP_MEMALLOC/PF_MEMALLOC I don't want
 allocation contexts that do not have rights to those pages to walk off
 with objects.

And I'd prevent these ones from doing so.

Without keeping track of reserve pages, which doesn't feel
too clean.

[PATCH 3/8] Cleanup the allocation/freeing of the sock object

2007-10-31 Thread Pavel Emelyanov

The sock object is allocated either from the generic cache with
kmalloc(), or from the prot->slab cache.

Move this logic into an isolated set of helpers and make the
sk_alloc/sk_free look a bit nicer.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/net/core/sock.c b/net/core/sock.c
index 9c2dbfa..7c2e3db 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -870,6 +870,31 @@ static void sock_copy(struct sock *nsk, const struct sock *osk)
 #endif
 }
 
+static struct sock *sk_prot_alloc(struct proto *prot, gfp_t priority)
+{
+	struct sock *sk;
+	struct kmem_cache *slab;
+
+	slab = prot->slab;
+	if (slab != NULL)
+		sk = kmem_cache_alloc(slab, priority);
+	else
+		sk = kmalloc(prot->obj_size, priority);
+
+	return sk;
+}
+
+static void sk_prot_free(struct proto *prot, struct sock *sk)
+{
+	struct kmem_cache *slab;
+
+	slab = prot->slab;
+	if (slab != NULL)
+		kmem_cache_free(slab, sk);
+	else
+		kfree(sk);
+}
+
 /**
  * sk_alloc - All socket objects are allocated here
  * @net: the applicable net namespace
@@ -881,14 +906,9 @@ static void sock_copy(struct sock *nsk, const struct sock *osk)
 struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
 		      struct proto *prot, int zero_it)
 {
-	struct sock *sk = NULL;
-	struct kmem_cache *slab = prot->slab;
-
-	if (slab != NULL)
-		sk = kmem_cache_alloc(slab, priority);
-	else
-		sk = kmalloc(prot->obj_size, priority);
+	struct sock *sk;
 
+	sk = sk_prot_alloc(prot, priority);
 	if (sk) {
 		if (zero_it) {
 			memset(sk, 0, prot->obj_size);
@@ -911,10 +931,7 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
 	return sk;
 
 out_free:
-	if (slab != NULL)
-		kmem_cache_free(slab, sk);
-	else
-		kfree(sk);
+	sk_prot_free(prot, sk);
 	return NULL;
 }
 
@@ -940,10 +957,7 @@ void sk_free(struct sock *sk)
 
 	security_sk_free(sk);
 	put_net(sk->sk_net);
-	if (sk->sk_prot_creator->slab != NULL)
-		kmem_cache_free(sk->sk_prot_creator->slab, sk);
-	else
-		kfree(sk);
+	sk_prot_free(sk->sk_prot_creator, sk);
 	module_put(owner);
 }
 
-- 
1.5.3.4


[PATCH 4/8] Auto-zero the allocated sock object

2007-10-31 Thread Pavel Emelyanov

We have a __GFP_ZERO flag that allocates a zeroed chunk of memory.
Use it in the sk_alloc() and avoid a hand-made memset().

This is a temporary patch that will help us in the near future :)

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/net/core/sock.c b/net/core/sock.c
index 7c2e3db..21fc79b 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -908,10 +908,12 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
 {
 	struct sock *sk;
 
+	if (zero_it)
+		priority |= __GFP_ZERO;
+
 	sk = sk_prot_alloc(prot, priority);
 	if (sk) {
 		if (zero_it) {
-			memset(sk, 0, prot->obj_size);
 			sk->sk_family = family;
 			/*
 			 * See comment in struct sock definition to understand
-- 
1.5.3.4


[PATCH 5/8] Move some core sock setup into sk_prot_alloc

2007-10-31 Thread Pavel Emelyanov

The security_sk_alloc() and try_module_get() calls are part of the
object allocation, so move them to the proper place.

Note that since we no longer reset the newly allocated sock in
sk_alloc() (the memset() was removed by the previous patch), we can
safely do this.

Also fix the error path in sk_prot_alloc() - release the security
context if needed.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/net/core/sock.c b/net/core/sock.c
index 21fc79b..e7537e4 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -870,7 +870,8 @@ static void sock_copy(struct sock *nsk, const struct sock *osk)
 #endif
 }
 
-static struct sock *sk_prot_alloc(struct proto *prot, gfp_t priority)
+static struct sock *sk_prot_alloc(struct proto *prot, gfp_t priority,
+		int family)
 {
 	struct sock *sk;
 	struct kmem_cache *slab;
@@ -881,18 +882,40 @@ static struct sock *sk_prot_alloc(struct proto *prot, gfp_t priority)
 	else
 		sk = kmalloc(prot->obj_size, priority);
 
+	if (sk != NULL) {
+		if (security_sk_alloc(sk, family, priority))
+			goto out_free;
+
+		if (!try_module_get(prot->owner))
+			goto out_free_sec;
+	}
+
 	return sk;
+
+out_free_sec:
+	security_sk_free(sk);
+out_free:
+	if (slab != NULL)
+		kmem_cache_free(slab, sk);
+	else
+		kfree(sk);
+	return NULL;
 }
 
 static void sk_prot_free(struct proto *prot, struct sock *sk)
 {
 	struct kmem_cache *slab;
-
+	struct module *owner;
+
+	owner = prot->owner;
 	slab = prot->slab;
+
+	security_sk_free(sk);
 	if (slab != NULL)
 		kmem_cache_free(slab, sk);
 	else
 		kfree(sk);
+	module_put(owner);
 }
 
 /**
@@ -911,7 +934,7 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
 	if (zero_it)
 		priority |= __GFP_ZERO;
 
-	sk = sk_prot_alloc(prot, priority);
+	sk = sk_prot_alloc(prot, priority, family);
 	if (sk) {
 		if (zero_it) {
 			sk->sk_family = family;
@@ -923,24 +946,14 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
 			sock_lock_init(sk);
 			sk->sk_net = get_net(net);
 		}
-
-		if (security_sk_alloc(sk, family, priority))
-			goto out_free;
-
-		if (!try_module_get(prot->owner))
-			goto out_free;
 	}
-	return sk;
 
-out_free:
-	sk_prot_free(prot, sk);
-	return NULL;
+	return sk;
 }
 
 void sk_free(struct sock *sk)
 {
 	struct sk_filter *filter;
-	struct module *owner = sk->sk_prot_creator->owner;
 
 	if (sk->sk_destruct)
 		sk->sk_destruct(sk);
@@ -957,10 +970,8 @@ void sk_free(struct sock *sk)
 		printk(KERN_DEBUG "%s: optmem leakage (%d bytes) detected.\n",
 		       __FUNCTION__, atomic_read(&sk->sk_omem_alloc));
 
-	security_sk_free(sk);
 	put_net(sk->sk_net);
 	sk_prot_free(sk->sk_prot_creator, sk);
-	module_put(owner);
 }
 
 struct sock *sk_clone(const struct sock *sk, const gfp_t priority)
-- 
1.5.3.4


[PATCH 1/8] Move the sock_copy() from the header

2007-10-31 Thread Pavel Emelyanov

The sock_copy() call is not used outside the sock.c file,
so just move it into sock.c

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/include/net/sock.h b/include/net/sock.h
index 43fc3fa..ecad7b4 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -993,20 +993,6 @@ static inline void sock_graft(struct sock *sk, struct socket *parent)
 	write_unlock_bh(&sk->sk_callback_lock);
 }
 
-static inline void sock_copy(struct sock *nsk, const struct sock *osk)
-{
-#ifdef CONFIG_SECURITY_NETWORK
-	void *sptr = nsk->sk_security;
-#endif
-
-	memcpy(nsk, osk, osk->sk_prot->obj_size);
-	get_net(nsk->sk_net);
-#ifdef CONFIG_SECURITY_NETWORK
-	nsk->sk_security = sptr;
-	security_sk_clone(osk, nsk);
-#endif
-}
-
 extern int sock_i_uid(struct sock *sk);
 extern unsigned long sock_i_ino(struct sock *sk);
 
diff --git a/net/core/sock.c b/net/core/sock.c
index bba9949..fdacf9c 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -857,6 +857,20 @@ static inline void sock_lock_init(struct sock *sk)
 			af_family_keys + sk->sk_family);
 }
 
+static void sock_copy(struct sock *nsk, const struct sock *osk)
+{
+#ifdef CONFIG_SECURITY_NETWORK
+	void *sptr = nsk->sk_security;
+#endif
+
+	memcpy(nsk, osk, osk->sk_prot->obj_size);
+	get_net(nsk->sk_net);
+#ifdef CONFIG_SECURITY_NETWORK
+	nsk->sk_security = sptr;
+	security_sk_clone(osk, nsk);
+#endif
+}
+
 /**
  * sk_alloc - All socket objects are allocated here
  * @net: the applicable net namespace
-- 
1.5.3.4


[PATCH 0/8] Cleanup/fix the sk_alloc() call

2007-10-31 Thread Pavel Emelyanov

The sk_alloc() function suffers from two problems:
1 (major). Its error path is not clean: if the security call fails,
   the net namespace is not put; if try_module_get() fails, the
   security context is additionally not released;
2 (minor). The zero_it argument is misleading, as it doesn't just
   zero the object, but performs some extra setup. Besides, this
   argument is used in only one place: in sk_clone().

So this set fixes these problems and performs some additional
cleanup.
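
For readers who want the shape of the fix without reading every patch, here is a small userspace sketch of the unwinding pattern the series moves towards. It is only an illustration: struct held_refs, sec_alloc_sketch(), module_get_sketch(), obj_alloc_sketch() and obj_free_sketch() are hypothetical names, and the counters merely stand in for the security context and the module reference.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical counters standing in for the security context and the
 * module reference; they let a caller check that nothing leaked. */
struct held_refs {
	int security;
	int module;
};

static struct held_refs live;

static bool sec_alloc_sketch(bool fail)
{
	if (fail)
		return false;
	live.security++;
	return true;
}

static bool module_get_sketch(bool fail)
{
	if (fail)
		return false;
	live.module++;
	return true;
}

/* Acquire in order; on failure release only what was already taken. */
static void *obj_alloc_sketch(size_t size, bool fail_sec, bool fail_mod)
{
	void *obj = calloc(1, size);	/* zeroed allocation */

	if (obj == NULL)
		return NULL;
	if (!sec_alloc_sketch(fail_sec))
		goto out_free;
	if (!module_get_sketch(fail_mod))
		goto out_free_sec;
	return obj;

out_free_sec:
	live.security--;
out_free:
	free(obj);
	return NULL;
}

/* Release everything a successful allocation took. */
static void obj_free_sketch(void *obj)
{
	live.security--;
	live.module--;
	free(obj);
}
```

Each failure label undoes exactly the steps that succeeded before it, so a failing second step can no longer leave the first resource behind.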

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

[PATCH 7/8] Remove bogus zero_it argument from sk_alloc

2007-10-31 Thread Pavel Emelyanov

At this point nobody calls sk_alloc() with zero_it == 0,
so remove the unneeded checks from it.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/net/core/sock.c b/net/core/sock.c
index c032f48..77575c3 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -931,21 +931,16 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
 {
 	struct sock *sk;
 
-	if (zero_it)
-		priority |= __GFP_ZERO;
-
-	sk = sk_prot_alloc(prot, priority, family);
+	sk = sk_prot_alloc(prot, priority | __GFP_ZERO, family);
 	if (sk) {
-		if (zero_it) {
-			sk->sk_family = family;
-			/*
-			 * See comment in struct sock definition to understand
-			 * why we need sk_prot_creator -acme
-			 */
-			sk->sk_prot = sk->sk_prot_creator = prot;
-			sock_lock_init(sk);
-			sk->sk_net = get_net(net);
-		}
+		sk->sk_family = family;
+		/*
+		 * See comment in struct sock definition to understand
+		 * why we need sk_prot_creator -acme
+		 */
+		sk->sk_prot = sk->sk_prot_creator = prot;
+		sock_lock_init(sk);
+		sk->sk_net = get_net(net);
 	}
 
 	return sk;
-- 
1.5.3.4


Re: [PATCH 03/33] mm: slub: add knowledge of reserve pages

2007-10-31 Thread Peter Zijlstra

On Wed, 2007-10-31 at 22:25 +1100, Nick Piggin wrote:
 On Wednesday 31 October 2007 23:17, Peter Zijlstra wrote:
  On Wed, 2007-10-31 at 21:46 +1100, Nick Piggin wrote:
 
   And I'd prevent these ones from doing so.
  
   Without keeping track of reserve pages, which doesn't feel
   too clean.
 
  The problem with that is that once a slab was allocated with the right
  allocation context, anybody can get objects from these slabs.
 
 [snip]
 
 I understand that.
 
 
  So we either reserve a page per object, which for 32 byte objects is a
  large waste, or we stop anybody who doesn't have the right permissions
  from obtaining objects. I took the latter approach.
 
 What I'm saying is that the slab allocator slowpath should always
 just check watermarks against the current task. Instead of this
 ->reserve stuff.

So what you say is to allocate a slab every time we take the slow path,
even when we already have one?

That sounds rather sub-optimal.



[PATCH 2/8] Move the get_net() from sock_copy()

2007-10-31 Thread Pavel Emelyanov

The sock_copy() is supposed to just clone the socket. In a perfect
world it has to be just memcpy, but we have to handle the security
mark correctly. All the extra setup must be performed in the sk_clone()
call, so move the get_net() to a more appropriate place.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/net/core/sock.c b/net/core/sock.c
index fdacf9c..9c2dbfa 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -864,7 +864,6 @@ static void sock_copy(struct sock *nsk, const struct sock *osk)
 #endif
 
 	memcpy(nsk, osk, osk->sk_prot->obj_size);
-	get_net(nsk->sk_net);
 #ifdef CONFIG_SECURITY_NETWORK
 	nsk->sk_security = sptr;
 	security_sk_clone(osk, nsk);
@@ -958,6 +957,7 @@ struct sock *sk_clone(const struct sock *sk, const gfp_t priority)
 		sock_copy(newsk, sk);
 
 		/* SANITY */
+		get_net(newsk->sk_net);
 		sk_node_init(&newsk->sk_node);
 		sock_lock_init(newsk);
 		bh_lock_sock(newsk);
-- 
1.5.3.4


Re: [PATCH 03/33] mm: slub: add knowledge of reserve pages

2007-10-31 Thread Nick Piggin

On Wednesday 31 October 2007 23:17, Peter Zijlstra wrote:
 On Wed, 2007-10-31 at 21:46 +1100, Nick Piggin wrote:

  And I'd prevent these ones from doing so.
 
  Without keeping track of reserve pages, which doesn't feel
  too clean.

 The problem with that is that once a slab was allocated with the right
 allocation context, anybody can get objects from these slabs.

[snip]

I understand that.


 So we either reserve a page per object, which for 32 byte objects is a
 large waste, or we stop anybody who doesn't have the right permissions
 from obtaining objects. I took the latter approach.

What I'm saying is that the slab allocator slowpath should always
just check watermarks against the current task. Instead of this
->reserve stuff.

[PATCH 6/8] Make the sk_clone() lighter

2007-10-31 Thread Pavel Emelyanov

The sk_prot_alloc() already performs all the stuff needed by the
sk_clone(). Besides, the sk_prot_alloc() takes about half as many
arguments as the sk_alloc() does, so call the sk_prot_alloc() directly
and save a bit of stack.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/net/core/sock.c b/net/core/sock.c
index e7537e4..c032f48 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -976,8 +976,9 @@ void sk_free(struct sock *sk)
 
 struct sock *sk_clone(const struct sock *sk, const gfp_t priority)
 {
-	struct sock *newsk = sk_alloc(sk->sk_net, sk->sk_family, priority, sk->sk_prot, 0);
-
+	struct sock *newsk;
+
+	newsk = sk_prot_alloc(sk->sk_prot, priority, sk->sk_family);
 	if (newsk != NULL) {
 		struct sk_filter *filter;
 
-- 
1.5.3.4


Re: [PATCH 00/33] Swap over NFS -v14

2007-10-31 Thread Peter Zijlstra

On Wed, 2007-10-31 at 08:16 -0400, Jeff Garzik wrote:
 Thoughts:
 
 1) I absolutely agree that NFS is far more prominent and useful than any 
 network block device, at the present time.
 
 
 2) Nonetheless, swap over NFS is a pretty rare case.  I view this work 
 as interesting, but I really don't see a huge need, for swapping over 
 NBD or swapping over NFS.  I tend to think swapping to a remote resource 
 starts to approach migration rather than merely swapping.  Yes, we can 
 do it...  but given the lack of burning need one must examine the price.

There is a large corporate demand for this, which is why I'm doing this.

The typical usage scenarios are:
 - cluster/blades, where having local disks is a cost issue (maintenance
   of failures, heat, etc)
 - virtualisation, where dumping the storage on a networked storage unit
   makes for trivial migration and what not..

But please, people who want this (I'm sure some of you are reading) do
speak up. I'm just the motivated corporate drone implementing the
feature :-)

 3) You note
  Swap over network has the problem that the network subsystem does not use 
  fixed
  sized allocations, but heavily relies on kmalloc(). This makes mempools
  unusable.
 
 True, but IMO there are mitigating factors that should be researched and 
 taken into account:
 
 a) To give you some net driver background/history, most mainstream net 
 drivers were coded to allocate RX skbs of size 1538, under the theory 
 that they would all be allocating out of the same underlying slab cache. 
   It would not be difficult to update a great many of the [non-jumbo] 
 cases to create a fixed size allocation pattern.

One issue that comes to mind is how to ensure we'd still overflow the
IP-reassembly buffers. Currently those are managed on the number of
bytes present, not the number of fragments.

One of the goals of my approach was to not rewrite the network subsystem
to accommodate this feature (and I hope I succeeded).

 b) Spare-time experiments and anecdotal evidence points to RX and TX skb 
 recycling as a potentially valuable area of research.  If you are able 
 to do something like that, then memory suddenly becomes a lot more 
 bounded and predictable.
 
 
 So my gut feeling is that taking a hard look at how net drivers function 
 in the field should give you a lot of good ideas that approach the 
 shared goal of making network memory allocations more predictable and 
 bounded.

Note that being bounded only comes from dropping most packets before
tying them to a socket. That is the crucial part of the RX path: to
receive all packets from the NIC (regardless of their size) but to not pass
them on to the network stack - unless they belong to a 'special' socket
that promises undelayed processing.
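
In code, the rule is roughly the following. This is a simplified userspace sketch; rx_filter(), struct sock_sketch and the under_pressure flag are hypothetical names chosen for this mail, and the assumption that the filter only kicks in under memory pressure is mine, not a description of the functions in the series.

```c
#include <stdbool.h>
#include <stddef.h>

enum rx_verdict { RX_DELIVER, RX_DROP };

/* Hypothetical socket: only the "special" (undelayed processing) mark. */
struct sock_sketch {
	bool memalloc;
};

/* Decide what to do with a packet that has already been received. */
static enum rx_verdict rx_filter(bool under_pressure,
				 const struct sock_sketch *sk)
{
	if (!under_pressure)
		return RX_DELIVER;	/* normal operation: no filtering */
	if (sk != NULL && sk->memalloc)
		return RX_DELIVER;	/* special socket: must make progress */
	return RX_DROP;			/* everything else gives its memory back */
}
```

The packet still has to be received and looked at before the verdict, which is why the buffers for that first step are the part that needs a reserve.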

Thanks for these ideas, I'll look into them.



[PATCH 8/8] Forget the zero_it argument of sk_alloc()

2007-10-31 Thread Pavel Emelyanov

Finally, the zero_it argument can be completely removed from
the callers and from the function prototype.

Besides, fix the checkpatch.pl warnings about using the
assignments inside if-s.

This patch is rather big, and it is really a part of the previous one.
I split it off hoping to make the patches more readable. I hope
this particular split helped.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c
index 8936ed3..a005d8f 100644
--- a/drivers/net/pppoe.c
+++ b/drivers/net/pppoe.c
@@ -491,7 +491,7 @@ static int pppoe_create(struct net *net, struct socket *sock)
 	int error = -ENOMEM;
 	struct sock *sk;
 
-	sk = sk_alloc(net, PF_PPPOX, GFP_KERNEL, &pppoe_sk_proto, 1);
+	sk = sk_alloc(net, PF_PPPOX, GFP_KERNEL, &pppoe_sk_proto);
 	if (!sk)
 		goto out;
 
diff --git a/drivers/net/pppol2tp.c b/drivers/net/pppol2tp.c
index 921d4ef..f8904fd 100644
--- a/drivers/net/pppol2tp.c
+++ b/drivers/net/pppol2tp.c
@@ -1416,7 +1416,7 @@ static int pppol2tp_create(struct net *net, struct socket *sock)
 	int error = -ENOMEM;
 	struct sock *sk;
 
-	sk = sk_alloc(net, PF_PPPOX, GFP_KERNEL, &pppol2tp_sk_proto, 1);
+	sk = sk_alloc(net, PF_PPPOX, GFP_KERNEL, &pppol2tp_sk_proto);
 	if (!sk)
 		goto out;
 
diff --git a/include/net/sock.h b/include/net/sock.h
index ecad7b4..20de3fa 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -779,7 +779,7 @@ extern void FASTCALL(release_sock(struct sock *sk));
 
 extern struct sock		*sk_alloc(struct net *net, int family,
 					  gfp_t priority,
-					  struct proto *prot, int zero_it);
+					  struct proto *prot);
 extern void			sk_free(struct sock *sk);
 extern struct sock		*sk_clone(const struct sock *sk,
 					  const gfp_t priority);
diff --git a/net/appletalk/ddp.c b/net/appletalk/ddp.c
index 7c0b515..e0d37d6 100644
--- a/net/appletalk/ddp.c
+++ b/net/appletalk/ddp.c
@@ -1044,7 +1044,7 @@ static int atalk_create(struct net *net, struct socket *sock, int protocol)
 	if (sock->type != SOCK_RAW && sock->type != SOCK_DGRAM)
 		goto out;
 	rc = -ENOMEM;
-	sk = sk_alloc(net, PF_APPLETALK, GFP_KERNEL, &ddp_proto, 1);
+	sk = sk_alloc(net, PF_APPLETALK, GFP_KERNEL, &ddp_proto);
 	if (!sk)
 		goto out;
 	rc = 0;
diff --git a/net/atm/common.c b/net/atm/common.c
index e166d9e..eba09a0 100644
--- a/net/atm/common.c
+++ b/net/atm/common.c
@@ -133,7 +133,7 @@ int vcc_create(struct net *net, struct socket *sock, int protocol, int family)
 	sock->sk = NULL;
 	if (sock->type == SOCK_STREAM)
 		return -EINVAL;
-	sk = sk_alloc(net, family, GFP_KERNEL, &vcc_proto, 1);
+	sk = sk_alloc(net, family, GFP_KERNEL, &vcc_proto);
 	if (!sk)
 		return -ENOMEM;
 	sock_init_data(sock, sk);
diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index 993e5c7..8378afd 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -836,7 +836,8 @@ static int ax25_create(struct net *net, struct socket *sock, int protocol)
 		return -ESOCKTNOSUPPORT;
 	}
 
-	if ((sk = sk_alloc(net, PF_AX25, GFP_ATOMIC, &ax25_proto, 1)) == NULL)
+	sk = sk_alloc(net, PF_AX25, GFP_ATOMIC, &ax25_proto);
+	if (sk == NULL)
 		return -ENOMEM;
 
 	ax25 = sk->sk_protinfo = ax25_create_cb();
@@ -861,7 +862,8 @@ struct sock *ax25_make_new(struct sock *osk, struct ax25_dev *ax25_dev)
 	struct sock *sk;
 	ax25_cb *ax25, *oax25;
 
-	if ((sk = sk_alloc(osk->sk_net, PF_AX25, GFP_ATOMIC, osk->sk_prot, 1)) == NULL)
+	sk = sk_alloc(osk->sk_net, PF_AX25, GFP_ATOMIC, osk->sk_prot);
+	if (sk == NULL)
 		return NULL;
 
 	if ((ax25 = ax25_create_cb()) == NULL) {
diff --git a/net/bluetooth/bnep/sock.c b/net/bluetooth/bnep/sock.c
index f718965..9ebd3c6 100644
--- a/net/bluetooth/bnep/sock.c
+++ b/net/bluetooth/bnep/sock.c
@@ -213,7 +213,7 @@ static int bnep_sock_create(struct net *net, struct socket *sock, int protocol)
 	if (sock->type != SOCK_RAW)
 		return -ESOCKTNOSUPPORT;
 
-	sk = sk_alloc(net, PF_BLUETOOTH, GFP_ATOMIC, &bnep_proto, 1);
+	sk = sk_alloc(net, PF_BLUETOOTH, GFP_ATOMIC, &bnep_proto);
 	if (!sk)
 		return -ENOMEM;
 
diff --git a/net/bluetooth/cmtp/sock.c b/net/bluetooth/cmtp/sock.c
index cf700c2..783edab 100644
--- a/net/bluetooth/cmtp/sock.c
+++ b/net/bluetooth/cmtp/sock.c
@@ -204,7 +204,7 @@ static int cmtp_sock_create(struct net *net, struct socket *sock, int protocol)
 	if (sock->type != SOCK_RAW)
 		return -ESOCKTNOSUPPORT;
 
-	sk = sk_alloc(net, PF_BLUETOOTH, GFP_ATOMIC, &cmtp_proto, 1);
+	sk = sk_alloc(net, PF_BLUETOOTH, GFP_ATOMIC,

Re: [PATCH 06/33] mm: allow PF_MEMALLOC from softirq context

2007-10-31 Thread Peter Zijlstra

On Wed, 2007-10-31 at 21:49 +1100, Nick Piggin wrote:
 On Wednesday 31 October 2007 21:42, Peter Zijlstra wrote:
  On Wed, 2007-10-31 at 14:51 +1100, Nick Piggin wrote:
   On Wednesday 31 October 2007 03:04, Peter Zijlstra wrote:
    Allow PF_MEMALLOC to be set in softirq context. When running softirqs
    from a borrowed context save current->flags, ksoftirqd will have its
    own task_struct.
  
   What's this for? Why would ksoftirqd pick up PF_MEMALLOC? (I guess
   that some networking thing must be picking it up in a subsequent patch,
   but I'm too lazy to look!)... Again, can you have more of a rationale in
   your patch headers, or ref the patch that uses it... thanks
 
  Right, I knew I was forgetting something in these changelogs.
 
  The network stack does quite a bit of packet processing from softirq
  context. Once you start swapping over network, some of the packets want
  to be processed under PF_MEMALLOC.
 
 Hmm... what about processing from interrupt context?

From what I could tell that is not done, ISR just fills the skb and
sticks it on an RX queue to be further processed by the softirq.
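
One plausible reading of the changelog quoted above, as a minimal sketch: when a softirq runs in a borrowed task context, the borrowed task's PF_MEMALLOC state is saved, cleared for the handler, and restored afterwards. The struct, the handler type and the flag value below are assumptions made for illustration; this is not the patch itself.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag bit; the name mirrors the kernel's PF_MEMALLOC but the
 * value is only meaningful inside this sketch. */
#define PF_MEMALLOC 0x0800u

/* Hypothetical stand-in for the task whose context a softirq borrows. */
struct task {
	unsigned int flags;
};

typedef void (*softirq_fn)(struct task *cur);

/*
 * Run a softirq handler in a borrowed task context. The borrowed task's
 * PF_MEMALLOC state is saved and cleared first, so the handler starts
 * without reserve access, and restored afterwards, so whatever the
 * handler set does not leak into the interrupted task.
 */
void run_softirq_borrowed(struct task *cur, softirq_fn handler)
{
	unsigned int saved;

	assert(cur != NULL && handler != NULL);

	saved = cur->flags & PF_MEMALLOC;
	cur->flags &= ~PF_MEMALLOC;

	handler(cur);

	cur->flags = (cur->flags & ~PF_MEMALLOC) | saved;
}

/* Example handler: processes a swap-related packet under PF_MEMALLOC. */
void rx_swap_packet(struct task *cur)
{
	cur->flags |= PF_MEMALLOC;
	/* ... packet processing that may dip into the reserves ... */
}
```

Calling run_softirq_borrowed(&t, rx_swap_packet) leaves t.flags exactly as it was before the call, whether or not the flag was set on entry.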



Re: [PATCH 03/33] mm: slub: add knowledge of reserve pages

2007-10-31 Thread Peter Zijlstra

On Wed, 2007-10-31 at 13:54 +0100, Peter Zijlstra wrote:
 On Wed, 2007-10-31 at 22:25 +1100, Nick Piggin wrote:

  What I'm saying is that the slab allocator slowpath should always
  just check watermarks against the current task. Instead of this
  -reserve stuff.
 
 So what you say is to allocate a slab every time we take the slow path,
 even when we already have one?

BTW, a task that does not have reserve permissions will already attempt
to allocate a new slab - this is done to probe the current watermarks.
If this succeeds the reserve status is lifted.



Re: [PATCH 0/8] Cleanup/fix the sk_alloc() call

2007-10-31 Thread Arnaldo Carvalho de Melo

Em Wed, Oct 31, 2007 at 04:40:01PM +0300, Pavel Emelyanov escreveu:
 The sk_alloc() function suffers from two problems:
 1 (major). The error path is not clean in it - if the security
call fails, the net namespace is not put, if the try_module_get
fails  additionally the security context is not released;
 2 (minor). The zero_it argument is misleading, as it doesn't just 
zeroes it, but performs some extra setup. Besides this argument 
is used only in one place - in the sk_clone().
 
 So this set fixes these problems and performs some additional
 cleanup.
 
 Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

for the series:

Acked-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED]

Haven't tested, but it looks straightforward and conceptually sound,
thanks for improving the sk_prot infrastructure! :-)

Now we have just to make all the other protocols fill in the missing
sk->sk_prot-> methods (converting what is there now in socket->ops) so
that we can kill socket->ops and eliminate one level of indirection :-P

- Arnaldo
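
The error-path problem described in the quoted cover letter can be sketched as a reverse-order unwind: each failure releases everything acquired before it. The counters, the fault-injection struct and the toy_* names are hypothetical; the real sk_alloc() is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical counters standing in for the three resources named in the
 * cover letter: the net namespace reference, the security context and the
 * module reference. */
struct resources {
	int net_refs;
	int security_ctx;
	int module_refs;
};

/* Failure injection for the sketch. */
struct faults {
	bool security_fails;
	bool module_get_fails;
};

struct toy_sock {
	int unused;
};

/* Allocate a toy socket; on any failure, undo the earlier steps in
 * reverse order so nothing is leaked. */
struct toy_sock *toy_sk_alloc(struct resources *r, const struct faults *f)
{
	struct toy_sock *sk;

	assert(r != NULL && f != NULL);

	sk = malloc(sizeof(*sk));
	if (sk == NULL)
		return NULL;

	r->net_refs++;                  /* take the namespace reference */

	if (f->security_fails)
		goto out_put_net;
	r->security_ctx++;              /* security context allocated */

	if (f->module_get_fails)
		goto out_free_sec;
	r->module_refs++;               /* module pinned */

	return sk;

out_free_sec:
	r->security_ctx--;              /* release the security context */
out_put_net:
	r->net_refs--;                  /* put the namespace reference */
	free(sk);
	return NULL;
}
```

With either fault injected, all three counters are back at zero when the function returns NULL, which is the property the cover letter says the original error path lacked.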

Re: [PATCH 00/33] Swap over NFS -v14

2007-10-31 Thread Arnaldo Carvalho de Melo

Em Wed, Oct 31, 2007 at 01:56:53PM +0100, Peter Zijlstra escreveu:
 On Wed, 2007-10-31 at 08:16 -0400, Jeff Garzik wrote:
  Thoughts:
  
  1) I absolutely agree that NFS is far more prominent and useful than any 
  network block device, at the present time.
  
  
  2) Nonetheless, swap over NFS is a pretty rare case.  I view this work 
  as interesting, but I really don't see a huge need, for swapping over 
  NBD or swapping over NFS.  I tend to think swapping to a remote resource 
  starts to approach migration rather than merely swapping.  Yes, we can 
  do it...  but given the lack of burning need one must examine the price.
 
 There is a large corporate demand for this, which is why I'm doing this.
 
 The typical usage scenarios are:
  - cluster/blades, where having local disks is a cost issue (maintenance
of failures, heat, etc)
  - virtualisation, where dumping the storage on a networked storage unit
makes for trivial migration and what not..
 
 But please, people who want this (I'm sure some of you are reading) do
 speak up. I'm just the motivated corporate drone implementing the
 feature :-)

Keep it up, Dave already mentioned iSCSI, there is AoE, there are RT
sockets, you name it, the networking bits we've talked about several
times, they look OK, so I'm sorry for not going over all of them in
detail, but you have my support nevertheless.

- Arnaldo

Re: [PATCH 0/8] Cleanup/fix the sk_alloc() call

2007-10-31 Thread Pavel Emelyanov

Arnaldo Carvalho de Melo wrote:
 Em Wed, Oct 31, 2007 at 04:40:01PM +0300, Pavel Emelyanov escreveu:
 The sk_alloc() function suffers from two problems:
 1 (major). The error path is not clean in it - if the security
call fails, the net namespace is not put, if the try_module_get
fails  additionally the security context is not released;
 2 (minor). The zero_it argument is misleading, as it doesn't just 
zeroes it, but performs some extra setup. Besides this argument 
is used only in one place - in the sk_clone().

 So this set fixes these problems and performs some additional
 cleanup.

 Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]
 
 for the series:
 
 Acked-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED]

Thanks a lot :)

 Haven't tested, but it looks straightforward and conceptually sound,
 thanks for improving the sk_prot infrastructure! :-)

 Now we have just to make all the other protocols fill in the missing
 sk->sk_prot-> methods (converting what is there now in socket->ops) so
 that we can kill socket->ops and eliminate one level of indirection :-P

Do I get your idea right, that having the 'struct sock->ops' field is not
that good and the long-term TODO is to remove it (or smth similar)? Can you
please shed some more light on this, because I'm not yet very familiar with
the networking code, but I'm trying to learn it better by fixing obvious
bugs and cleaning up the code.

 - Arnaldo

Thanks,
Pavel

Re: [PATCH 00/33] Swap over NFS -v14

2007-10-31 Thread Gregory Haskins

Peter Zijlstra wrote:

 
 But please, people who want this (I'm sure some of you are reading) do
 speak up. I'm just the motivated corporate drone implementing the
 feature :-)

FWIW, I could have used a swap to network technology X like system at
my last job.  We were building a large networking switch with blades,
and the IO cards didn't have anywhere near the resources that the
control modules had (no persistent storage, small ram, etc).  We were
already doing userspace coredumps over NFS to the control cards.  It
would have been nice to swap as well.


Re: [PATCH]: Fix networking scatterlist regressions.

2007-10-31 Thread Herbert Xu

On Tue, Oct 30, 2007 at 08:40:02PM -0700, David Miller wrote:
 
 I just checked the following bug fix into net-2.6

Thanks for getting to the bottom of this Dave! I seem to have
mistaken the = for a |= in sg_mark_end :)

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: [PATCH]: Fix networking scatterlist regressions.

2007-10-31 Thread Jens Axboe

On Wed, Oct 31 2007, Herbert Xu wrote:
 On Tue, Oct 30, 2007 at 08:40:02PM -0700, David Miller wrote:
  
  I just checked the following bug fix into net-2.6
 
 Thanks for getting to the bottom of this Dave! I seem to have
 mistaken the = for a |= in sg_mark_end :)

I don't blame you, that function was definitely non-intuitive!

-- 
Jens Axboe
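
The difference between "=" and "|=" mentioned above can be shown with a simplified model in which the low bits of a word carry flags and the remaining bits carry a page pointer. The struct layout, the flag values and the toy_* names are assumptions made for illustration and are not the kernel's scatterlist code.

```c
#include <assert.h>
#include <stdint.h>

/* Low bits of page_link carry flags; the rest is the page pointer.
 * This is a simplified model, not the kernel's definition. */
#define SG_CHAIN 0x01u
#define SG_END   0x02u
#define SG_FLAGS ((uintptr_t)(SG_CHAIN | SG_END))

struct toy_sg {
	uintptr_t page_link;
};

/* Store a page pointer while keeping any flag bits already set. */
void toy_sg_set_page(struct toy_sg *sg, void *page)
{
	/* The pointer must be at least 4-byte aligned to leave room for flags. */
	assert(((uintptr_t)page & SG_FLAGS) == 0);
	sg->page_link = (sg->page_link & SG_FLAGS) | (uintptr_t)page;
}

/* Recover the page pointer by masking off the flag bits. */
void *toy_sg_page(const struct toy_sg *sg)
{
	return (void *)(sg->page_link & ~SG_FLAGS);
}

/* Assigning: the end marker replaces the stored page pointer. */
void toy_mark_end_assign(struct toy_sg *sg)
{
	sg->page_link = SG_END;
}

/* OR-ing: the end marker is added and the page pointer survives. */
void toy_mark_end_or(struct toy_sg *sg)
{
	sg->page_link |= SG_END;
}
```

With the assigning variant, marking the end after filling the table throws the page away; with the OR-ing variant, the entry can be marked at any time.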


Re: [UDP6]: Restore sk_filter optimisation

2007-10-31 Thread Mitsuru Chinen

On Mon, 29 Oct 2007 20:53:28 +0800
Herbert Xu [EMAIL PROTECTED] wrote:

 On Mon, Oct 29, 2007 at 03:33:20PM +0900, Mitsuru Chinen wrote:
  Hello Herbert,
  
  Let me ask a question about this patch.
  After this patch was applied, 2 of the protocol stack behaviors were
  changed when it receives a UDP datagram with broken checksum:
  
   1. udp6InDatagrams is incremented instead of udpInErrors
   2. In userland, recvfrom() replies an error with EAGAIN.
  recvfrom() wasn't aware of such a packet before.
  
  Are these changes intentional?
 
 It wasn't my intention if that's what you mean :)
 
 However, this would've happened with the old code anyway if
 someone had a filter attached so this isn't new.

 If it's a problem then we should just get it fixed.

As far as I tested, this doesn't happen with the old code even if
a filter is attached. However, it does happen with the new code
without a filter, and I don't see it when a filter is
attached. So, I'm afraid it's new.

By the way, could you answer Yoshifuji-san's question?
I think where we should fix the code depends on this.

On Mon, 29 Oct 2007 15:41:50 +0900 (JST)
YOSHIFUJI Hideaki / 吉藤英明 [EMAIL PROTECTED] wrote:

 And, we're not sure how much the optimization's benefit is.
 It is even worse when we are handling multicast packets.

Thank you,

Mitsuru Chinen [EMAIL PROTECTED]

af_packet.c flush_dcache_page

2007-10-31 Thread Patrick McHardy


I'm currently adding mmap support to af_netlink based on the
af_packet implementation and I'm wondering about this code in
tpacket_rcv():

    h->tp_status = status;
    smp_mb();

    {
        struct page *p_start, *p_end;
        u8 *h_end = (u8 *)h + macoff + snaplen - 1;

        p_start = virt_to_page(h);
        p_end = virt_to_page(h_end);
        while (p_start <= p_end) {
            flush_dcache_page(p_start);
            p_start++;
        }
    }

Shouldn't the flushing be done in reverse order to make sure
that the page containing tp_status is flushed last and userspace
doesn't start looking at following pages before all dcache entries
are flushed?

A related question: Documentation/cachetlb.txt mentions that
flushing also needs to be done for reading of shared+writable
mapped pages, so it seems like we also need to call flush_dcache_page 
before the tp_status check earlier in that function and packet_poll().
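
The reverse-order flush asked about above could look roughly like the sketch below: pages are walked from the last one down to the first, so the page holding the status word is flushed last. Page indices and the callback type are hypothetical stand-ins for struct page pointers and flush_dcache_page(); whether this ordering is actually required is the open question of this mail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for flush_dcache_page(); takes a page index. */
typedef void (*flush_fn)(size_t page, void *ctx);

/*
 * Flush pages first..last in descending order, so that the first page
 * (the one holding the status word) is flushed last.
 */
void flush_frame_reverse(size_t first, size_t last, flush_fn flush, void *ctx)
{
	size_t p;

	assert(flush != NULL && first <= last);

	/* p runs from last down to first, inclusive. */
	for (p = last + 1; p-- > first; )
		flush(p, ctx);
}

/* Recorder used to observe the order in which pages are flushed. */
struct flush_log {
	size_t pages[8];
	size_t count;
};

void record_flush(size_t page, void *ctx)
{
	struct flush_log *log = ctx;

	assert(log->count < 8);
	log->pages[log->count++] = page;
}
```

Passing record_flush as the callback for pages 3 to 5 records the order 5, 4, 3.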


Re: [PATCH 0/8] Cleanup/fix the sk_alloc() call

2007-10-31 Thread Arnaldo Carvalho de Melo

Em Wed, Oct 31, 2007 at 05:32:20PM +0300, Pavel Emelyanov escreveu:
 Arnaldo Carvalho de Melo wrote:
  Em Wed, Oct 31, 2007 at 04:40:01PM +0300, Pavel Emelyanov escreveu:
  The sk_alloc() function suffers from two problems:
  1 (major). The error path is not clean in it - if the security
 call fails, the net namespace is not put, if the try_module_get
 fails  additionally the security context is not released;
  2 (minor). The zero_it argument is misleading, as it doesn't just 
 zeroes it, but performs some extra setup. Besides this argument 
 is used only in one place - in the sk_clone().
 
  So this set fixes these problems and performs some additional
  cleanup.
 
  Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]
  
  for the series:
  
  Acked-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
 
 Thanks a lot :)
 
  Haven't tested, but it looks straightforward and conceptually sound,
  thanks for improving the sk_prot infrastructure! :-)
 
  Now we have just to make all the other protocols fill in the missing
  sk->sk_prot-> methods (converting what is there now in socket->ops) so
  that we can kill socket->ops and eliminate one level of indirection :-P
 
 Do I get your idea right, that having the 'struct sock->ops' field is not
 that good and the long-term TODO is to remove it (or smth similar)? Can you
 please shed some more light on this, because I'm not yet very familiar with
 the networking code, but I'm trying to learn it better by fixing obvious
 bugs and cleaning up the code.

Start here:

const struct proto_ops inet_stream_ops = {
    .family            = PF_INET,
    .owner             = THIS_MODULE,
    .release           = inet_release,
    .bind              = inet_bind,
    .connect           = inet_stream_connect,
    .socketpair        = sock_no_socketpair,
    .accept            = inet_accept,
    .getname           = inet_getname,
    .poll              = tcp_poll,
    .ioctl             = inet_ioctl,
    .listen            = inet_listen,
    .shutdown          = inet_shutdown,
    .setsockopt        = sock_common_setsockopt,
    .getsockopt        = sock_common_getsockopt,
    .sendmsg           = tcp_sendmsg,
    .recvmsg           = sock_common_recvmsg,
    .mmap              = sock_no_mmap,
    .sendpage          = tcp_sendpage,
#ifdef CONFIG_COMPAT
    .compat_setsockopt = compat_sock_common_setsockopt,
    .compat_getsockopt = compat_sock_common_getsockopt,
#endif
};

Now look at all the *_common_* stuff, for instance:

int sock_common_recvmsg(struct kiocb *iocb, struct socket *sock,
                        struct msghdr *msg, size_t size, int flags)
{
    struct sock *sk = sock->sk;
    int addr_len = 0;
    int err;

    err = sk->sk_prot->recvmsg(iocb, sk, msg, size, flags & MSG_DONTWAIT,
                               flags & ~MSG_DONTWAIT, &addr_len);
    if (err >= 0)
        msg->msg_namelen = addr_len;
    return err;
}

So if we made all protocols implement sk->sk_prot->recvmsg... got it?

And then look at the inet_* routines above, at least for LLC I was using
several unmodified.

Over the years the quality work is done on the mainstream protocols,
with the legacy ones lagging behind, so the more we share...

Anyway, look at my paper about it:

http://www.linuxsymposium.org/proceedings/reprints/Reprint-Melo-OLS2004.pdf

The DCCP paper also talks about this:

http://www.linuxinsight.com/files/ols2005/melo-reprint.pdf

- Arnaldo
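
The "one level of indirection" point can be sketched with toy types: today a call goes through the socket-level ops table to a common wrapper, which then calls the protocol-level method; if every protocol filled in the protocol-level method, the caller could reach it directly. All toy_* names and the two recv_* helpers are hypothetical and much simpler than the kernel structures shown above.

```c
#include <assert.h>
#include <stddef.h>

struct toy_sock;

/* Protocol-level methods, in the role of sk->sk_prot. */
struct toy_proto {
	int (*recvmsg)(struct toy_sock *sk, char *buf, size_t len);
};

struct toy_sock {
	const struct toy_proto *prot;
};

struct toy_socket;

/* Socket-level methods, in the role of socket->ops. */
struct toy_proto_ops {
	int (*recvmsg)(struct toy_socket *sock, char *buf, size_t len);
};

struct toy_socket {
	const struct toy_proto_ops *ops;
	struct toy_sock *sk;
};

/* Shared wrapper in the style of sock_common_recvmsg(): it only forwards. */
int toy_common_recvmsg(struct toy_socket *sock, char *buf, size_t len)
{
	assert(sock != NULL && sock->sk != NULL);
	return sock->sk->prot->recvmsg(sock->sk, buf, len);
}

/* A protocol that implements the method at the protocol level. */
int toy_dgram_recvmsg(struct toy_sock *sk, char *buf, size_t len)
{
	(void)sk;
	if (len == 0)
		return -1;
	buf[0] = 'x';
	return 1;
}

const struct toy_proto toy_dgram_proto = { .recvmsg = toy_dgram_recvmsg };
const struct toy_proto_ops toy_dgram_ops = { .recvmsg = toy_common_recvmsg };

/* Current shape: two hops (ops table, then protocol table). */
int recv_via_ops(struct toy_socket *sock, char *buf, size_t len)
{
	return sock->ops->recvmsg(sock, buf, len);
}

/* Shape after the cleanup described above: one hop to the protocol. */
int recv_direct(struct toy_socket *sock, char *buf, size_t len)
{
	return sock->sk->prot->recvmsg(sock->sk, buf, len);
}
```

Both helpers return the same result for the same socket; the direct one simply skips the forwarding wrapper.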

Re: [PATCH 00/33] Swap over NFS -v14

2007-10-31 Thread Byron Stanoszek

On Wed, 31 Oct 2007, Nick Piggin wrote:

 On Wednesday 31 October 2007 15:37, David Miller wrote:

  From: Nick Piggin [EMAIL PROTECTED]
  Date: Wed, 31 Oct 2007 14:26:32 +1100

   Is it really worth all the added complexity of making swap
   over NFS files work, given that you could use a network block
   device instead?

  Don't be misled.  Swapping over NFS is just a scarecrow for the
  seemingly real impetus behind these changes which is network storage
  stuff like iSCSI.

 Oh, I'm OK with the network reserves stuff (not the actual patch,
 which I'm not really qualified to review, but at least the idea
 of it...).

 And also I'm not as such against the idea of swap over network.

 However, specifically the change to make swapfiles work through
 the filesystem layer (ATM it goes straight to the block layer,
 modulo some initialisation stuff which uses block filesystem-
 specific calls).

 I mean, I assume that anybody trying to swap over network *today*
 has to be using a network block device anyway, so the idea of
 just being able to transparently improve that case seems better
 than adding new complexities for seemingly not much gain.

I have some embedded diskless devices that have 16 MB of RAM and 500 MB of
swap. Their root fs and swap device are both done over NBD because NFS is too
expensive in 16 MB of RAM. Any memory contention (i.e. needing memory to swap
memory over the network), however infrequent, causes the system to freeze when
about 50 MB of VM is used up. I would love to see some work done in this area.

 -Byron

--
Byron Stanoszek Ph: (330) 644-3059
Systems Programmer  Fax: (330) 644-8110
Commercial Timesharing Inc. Email: [EMAIL PROTECTED]

Re: [PATCH 00/33] Swap over NFS -v14

2007-10-31 Thread Mike Snitzer

On 10/31/07, Peter Zijlstra [EMAIL PROTECTED] wrote:
 On Wed, 2007-10-31 at 08:50 +, Christoph Hellwig wrote:
  On Tue, Oct 30, 2007 at 09:37:53PM -0700, David Miller wrote:
   Don't be misled.  Swapping over NFS is just a scarecrow for the
   seemingly real impetus behind these changes which is network storage
   stuff like iSCSI.
 
  So can we please do swap over network storage only first?  All these
  VM bits look conceptually sane to me, while the changes to the swap
  code to support nfs are real crackpipe material.

 Yeah, I know how you stand on that. I just wanted to post all this
 before going off into the woods reworking it all.
...
  So please get the VM bits for swap over network blockdevices in first,

 Trouble with that part is that we don't have any sane network block
 devices atm, NBD is utter crap, and iSCSI is too complex to be called
 sane.

 Maybe Evgeniy's Distributed storage thingy would work, will have a look
 at that.

Andrew recently asked Evgeniy if his DST was ready for merging; to
which Evgeniy basically said yes:
http://lkml.org/lkml/2007/10/27/54

It would be great if DST could be merged, thereby addressing the fact
that NBD is lacking for net-vm.  If DST were scrutinized in the
context of net-vm it should help it get the review that is needed for
merging.

Mike
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [UDP6]: Restore sk_filter optimisation

2007-10-31 Thread Herbert Xu

On Wed, Oct 31, 2007 at 11:05:45PM +0900, Mitsuru Chinen wrote:

   1. udp6InDatagrams is incremented instead of udpInErrors
   2. In userland, recvfrom() replies an error with EAGAIN.
      recvfrom() wasn't aware of such a packet before.
   
   Are these changes intentional?

 As far as I tested, this doesn't happen with the old code even if
 a filter is attached. However, it does happen with the new code
 without a filter, and I don't see it when a filter is
 attached. So, I'm afraid it's new.

Sorry, I read the patch the wrong way around :)

1) is just an accounting issue.  It shouldn't be too difficult
to fix it up.  In fact, I think udpInErrors will still be
incremented once we detect the error.

2) shouldn't be an issue because we've already solved the
problem by making poll/select do the checksum verification
before indicating that the socket is readable.

  And, we're not sure how much the optimization's benefit is.
  It is even worse when we are hand

The checksum verification is costly because we have to bring
the payload into cache.  Since filters are very rare it's
worthwhile to postpone the checksum verification for the common
case.

Also as a general rule, we want to avoid divergent behaviour
between IPv4 and IPv6.  So for changes like this we should
really modify both stacks in future rather than have each
stack do its own thing.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
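
The trade-off described above (verify the checksum early only when a filter will read the payload anyway, otherwise defer it to the reader) can be sketched as follows. The packet struct, the 8-bit toy checksum and the toy_* functions are assumptions made for illustration; they do not reproduce the UDP receive path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_pkt {
	const uint8_t *data;
	size_t len;
	uint8_t csum;        /* expected 8-bit sum, a toy checksum */
	bool csum_checked;   /* set once the payload has been verified */
};

/* Walks the whole payload: this is the expensive, cache-touching step. */
bool toy_csum_ok(struct toy_pkt *p)
{
	uint8_t sum = 0;
	size_t i;

	assert(p != NULL);
	for (i = 0; i < p->len; i++)
		sum = (uint8_t)(sum + p->data[i]);
	p->csum_checked = true;
	return sum == p->csum;
}

/*
 * Queueing decision at receive time. With a filter attached the payload
 * is about to be read anyway, so verify now; without one, queue the
 * packet unverified and leave the check to the reader.
 * Returns true if the packet should be queued.
 */
bool toy_rcv(struct toy_pkt *p, bool filter_attached)
{
	if (filter_attached)
		return toy_csum_ok(p);
	return true;
}

/* Reader side: verify lazily if nobody has done so yet. */
bool toy_read(struct toy_pkt *p)
{
	if (!p->csum_checked)
		return toy_csum_ok(p);
	return true;
}
```

In the no-filter case a corrupt packet is queued and only rejected when it is read, which is the behaviour change discussed earlier in the thread.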

Re: [PATCH][RFC] Add support for the RDC R6040 Fast Ethernet controller

2007-10-31 Thread Stephen Hemminger

On Mon, 29 Oct 2007 22:51:42 +0100
Florian Fainelli [EMAIL PROTECTED] wrote:

 This patch adds support for the RDC R6040 MAC we can find in the RDC R-321x 
 System-on-chips.
 This driver really needs improvements especially on the NAPI part which 
 probably does not
 fully use the new NAPI structure.
 You will need the RDC PCI identifiers if you want to test this driver which 
 are the following ones :
 
 RDC_PCI_VENDOR_ID = 0x17f3
 RDC_PCI_DEVICE_ID_RDC_R6040 = 0x6040
 
 Thank you very much in advance for your comments.
 
 Signed-off-by: Sten Wang [EMAIL PROTECTED]
 Signed-off-by: Daniel Gimpelevich [EMAIL PROTECTED]
 Signed-off-by: Florian Fainelli [EMAIL PROTECTED]


 BUG *** Don't call kfree() to free the network device; use free_netdev()

* Don't use uppercase for variable names (NUM_MAC_TABLE)

* Use get_random_ether_addr() rather than a hardcoded table of mac addresses.

* checkpatch complains about some extra blanks, and several lines > 80 chars.

* use ethtool stubs for check_link

* add ethtool get_settings to allow use by bonding/bridging, etc.

* this is unusual coding style:
+   do {} while ((i++ < 2048) && (inw(ioaddr + 0x04) & 0x1));

* add a blank line after declarations and before code in a function

* use of global NAPI_status should be replaced by putting it in priv

* the handling of shared IRQ is wrong.
 - need to check for status == 0 || status == 0x and return IRQ_NONE

* don't call napi_disable() with irq's disabled in r6040_close

* poll routine shouldn't call dev_kfree_skb_irq() to free Tx buffers because
   that means going through TX softirq, just call dev_kfree_skb()

* the down routine calls pci_unmap_single with wrong length when handling
   TX buffers.

* pci id table can be cleaned up:
static struct pci_device_id r6040_pci_tbl[] = {
{ PCI_DEVICE(PCI_VENDOR_ID_RDC, 0x6040) },
{ PCI_DEVICE(PCI_VENDOR_VIA, 0x3065) },
{ 0 }
};

* use netdev_priv() consistently rather than dev-priv.
   Yes they are the same now, but that will be fixed in future.

* eliminate check for dev being NULL in IRQ handler.

* reorder functions to eliminate need for forward declarations

* get rid of R6040_PCI_CMD and pci_flags field it is unused.
  
* do you really have to have the whole chip_info at all? The only usage
   seems to be to validate the pci region size.  Do you have platforms with
   busted BIOS that set it wrong or something??
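
The shared-IRQ item in the list above can be sketched as a small decision function: if the status register reads zero, or reads the pattern seen when the device is not responding, the interrupt belongs to someone else. The second value is truncated in the review as archived, so it is left as a parameter here rather than guessed; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Return values named after the kernel's irqreturn_t; defined locally. */
enum toy_irqreturn { TOY_IRQ_NONE = 0, TOY_IRQ_HANDLED = 1 };

/*
 * Decide whether an interrupt on a shared line belongs to this device.
 * status: value read from the device's interrupt status register.
 * absent: pattern a read returns when the device is not responding; the
 *         review's second value is truncated in the archive, so it is a
 *         parameter here rather than a guessed constant.
 */
enum toy_irqreturn toy_irq_claim(uint16_t status, uint16_t absent)
{
	/* A zero pattern would be indistinguishable from "no bits set". */
	assert(absent != 0);

	if (status == 0 || status == absent)
		return TOY_IRQ_NONE;     /* not ours: let other handlers run */
	return TOY_IRQ_HANDLED;
}
```

A real handler would return the "none" value before touching any other device state, so that other devices sharing the line get their turn.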



---

WARNING: no space between function name and open parenthesis '('
#1071: FILE: drivers/net/r6040.c:958:
+static int __init r6040_init (void)

WARNING: no space between function name and open parenthesis '('
#1073: FILE: drivers/net/r6040.c:960:
+   return pci_register_driver (r6040_driver);

WARNING: no space between function name and open parenthesis '('
#1077: FILE: drivers/net/r6040.c:964:
+static void __exit r6040_cleanup (void)

WARNING: no space between function name and open parenthesis '('
#1079: FILE: drivers/net/r6040.c:966:
+   pci_unregister_driver (r6040_driver);

total: 0 errors, 36 warnings, 1001 lines checked
Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Re: [PATCH 2/2] NFS: handle IPv6 addresses in nfs ctl

2007-10-31 Thread J. Bruce Fields

On Wed, Oct 31, 2007 at 10:06:18AM +0100, Aurélien Charbon wrote:
 Thank you Brian
 Sorry, I did not see what you sent.

 I have tested it with an IPv4 configuration. It's OK.
 So Neil, Bruce, you can take this one for review.

Did you miss Neil's question about the nfsctl stuff?  (Do we really need
that, or would the changes to the ip_map cache be sufficient?)--b.


 fs/nfsd/export.c   |9 ++-
 fs/nfsd/nfsctl.c   |   42 --
 include/linux/sunrpc/svcauth.h |5 +
 include/net/ipv6.h |   10 +++
 net/sunrpc/svcauth_unix.c  |  118 +++--
 5 files changed, 134 insertions(+), 50 deletions(-)

 Signed-off-by: Brian Haley [EMAIL PROTECTED]
 Signed-off-by: Aurelien Charbon [EMAIL PROTECTED]

 ---

 diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
 index 66d0aeb..c47ba77 100644
 --- a/fs/nfsd/export.c
 +++ b/fs/nfsd/export.c
 @@ -35,6 +35,7 @@
  #include <linux/lockd/bind.h>
  #include <linux/sunrpc/msg_prot.h>
  #include <linux/sunrpc/gss_api.h>
 +#include <net/ipv6.h>
  #define NFSDDBG_FACILITY    NFSDDBG_EXPORT
 @@ -1556,6 +1557,7 @@ exp_addclient(struct nfsctl_client *ncp)
  {
      struct auth_domain    *dom;
      int                   i, err;
 +    struct in6_addr       in6;
      /* First, consistency check. */
      err = -EINVAL;
 @@ -1574,9 +1576,10 @@ exp_addclient(struct nfsctl_client *ncp)
          goto out_unlock;
      /* Insert client into hashtable. */
 -    for (i = 0; i < ncp->cl_naddr; i++)
 -        auth_unix_add_addr(ncp->cl_addrlist[i], dom);
 -
 +    for (i = 0; i < ncp->cl_naddr; i++) {
 +        ipv6_addr_set_v4mapped(ncp->cl_addrlist[i].s_addr, &in6);
 +        auth_unix_add_addr(&in6, dom);
 +    }
      auth_unix_forget_old(dom);
      auth_domain_put(dom);
 diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
 index 77dc989..5cb5f0d 100644
 --- a/fs/nfsd/nfsctl.c
 +++ b/fs/nfsd/nfsctl.c
 @@ -37,6 +37,7 @@
  #include <linux/nfsd/syscall.h>
  #include <asm/uaccess.h>
 +#include <net/ipv6.h>
  /*
   *    We have a single directory with 9 nodes in it.
 @@ -219,24 +220,37 @@ static ssize_t write_getfs(struct file *file, char *buf, size_t size)
  {
      struct nfsctl_fsparm *data;
      struct sockaddr_in *sin;
 +    struct sockaddr_in6 *sin6;
      struct auth_domain *clp;
      int err = 0;
      struct knfsd_fh *res;
 +    struct in6_addr in6;
      if (size < sizeof(*data))
          return -EINVAL;
      data = (struct nfsctl_fsparm*)buf;
      err = -EPROTONOSUPPORT;
 -    if (data->gd_addr.sa_family != AF_INET)
 +    switch (data->gd_addr.sa_family) {
 +    case AF_INET:
 +        sin = (struct sockaddr_in *)&data->gd_addr;
 +        ipv6_addr_set_v4mapped(sin->sin_addr.s_addr, &in6);
 +        break;
 +    case AF_INET6:
 +        sin6 = (struct sockaddr_in6 *)&data->gd_addr;
 +        ipv6_addr_copy(&in6, &sin6->sin6_addr);
 +        break;
 +    default:
          goto out;
 -    sin = (struct sockaddr_in *)&data->gd_addr;
 +    }
 +
      if (data->gd_maxlen > NFS3_FHSIZE)
          data->gd_maxlen = NFS3_FHSIZE;
      res = (struct knfsd_fh*)buf;
      exp_readlock();
 -    if (!(clp = auth_unix_lookup(sin->sin_addr)))
 +
 +    if (!(clp = auth_unix_lookup(&in6)))
          err = -EPERM;
      else {
          err = exp_rootfh(clp, data->gd_path, res, data->gd_maxlen);
 @@ -253,25 +267,41 @@ static ssize_t write_getfd(struct file *file, char *buf, size_t size)
  {
      struct nfsctl_fdparm *data;
      struct sockaddr_in *sin;
 +    struct sockaddr_in6 *sin6;
      struct auth_domain *clp;
      int err = 0;
      struct knfsd_fh fh;
      char *res;
 +    struct in6_addr in6;
      if (size < sizeof(*data))
          return -EINVAL;
      data = (struct nfsctl_fdparm*)buf;
      err = -EPROTONOSUPPORT;
 -    if (data->gd_addr.sa_family != AF_INET)
 +    if (data->gd_addr.sa_family != AF_INET &&
 +        data->gd_addr.sa_family != AF_INET6)
          goto out;
      err = -EINVAL;
      if (data->gd_version < 2 || data->gd_version > NFSSVC_MAXVERS)
          goto out;
      res = buf;
 -    sin = (struct sockaddr_in *)&data->gd_addr;
      exp_readlock();
 -    if (!(clp = auth_unix_lookup(sin->sin_addr)))
 +
 +    switch (data->gd_addr.sa_family) {
 +    case AF_INET:
 +        sin = (struct sockaddr_in *)&data->gd_addr;
 +        ipv6_addr_set_v4mapped(sin->sin_addr.s_addr, &in6);
 +        break;
 +    case AF_INET6:
 +        sin6 = (struct sockaddr_in6 *)&data->gd_addr;
 +        ipv6_addr_copy(&in6, &sin6->sin6_addr);
 +        break;
 +    default:
 +        goto out;
 +    }
 +
 +    if (!(clp = auth_unix_lookup(&in6)))
          err = -EPERM;
      else {
          err = exp_rootfh(clp, data->gd_path, &fh, NFS_FHSIZE);
 diff --git a/include/linux/sunrpc/svcauth.h b/include/linux/sunrpc/svcauth.h
 index 22e1ef8..64ecb93 100644
 --- a/include/linux/sunrpc/svcauth.h
 +++ b/include/linux/sunrpc/svcauth.h
 @@ -15,6 +15,7 @@
  #include <linux/sunrpc/msg_prot.h>
  #include <linux/sunrpc/cache.h>
  #include <linux/hash.h>
 +#include <net/ipv6.h>
  #define SVC_CRED_NGROUPS    32
  struct svc_cred {
 @@ -120,10 +121,10 @@

Re: [PATCH 00/33] Swap over NFS -v14

2007-10-31 Thread Evgeniy Polyakov

Hi.

On Wed, Oct 31, 2007 at 10:54:02AM -0400, Mike Snitzer ([EMAIL PROTECTED]) 
wrote:
  Trouble with that part is that we don't have any sane network block
  devices atm, NBD is utter crap, and iSCSI is too complex to be called
  sane.
 
  Maybe Evgeniy's Distributed storage thingy would work, will have a look
  at that.
 
 Andrew recently asked Evgeniy if his DST was ready for merging; to
 which Evgeniy basically said yes:
 http://lkml.org/lkml/2007/10/27/54
 
 It would be great if DST could be merged, thereby addressing the fact
 that NBD is lacking for net-vm.  If DST were scrutinized in the
 context of net-vm it should help it get the review that is needed for
 merging.

By popular request I'm working on adding strong checksumming of the data
transferred, so I can not say that Andrew will want to merge this during
development phase. I expect to complete it quite soon (it is in testing
stage right now) though with new release scheduled this week. It will
also include some small features for userspace (happiness).

Memory management is not changed.

-- 
Evgeniy Polyakov

drivers/net/tlan question

2007-10-31 Thread Gabriel C

Hi,

I noticed on current git the following warning with !CONFIG_PCI :

...

drivers/net/tlan.c: In function 'TLan_probe1':
drivers/net/tlan.c:682: warning: label 'err_out' defined but not used

...

I thought a simple #ifdef was missing, but looking at TLan_probe1() I got
confused about err_out_regions

...

#ifdef CONFIG_PCI
    if (pdev) {
        rc = pci_enable_device(pdev);
        if (rc)
            return rc;

        rc = pci_request_regions(pdev, TLanSignature);
        if (rc) {
            printk(KERN_ERR "TLAN: Could not reserve IO regions\n");
            goto err_out;
        }
    }
#endif  /*  CONFIG_PCI  */

    dev = alloc_etherdev(sizeof(TLanPrivateInfo));
    if (dev == NULL) {
        printk(KERN_ERR "TLAN: Could not allocate memory for device.\n");
        rc = -ENOMEM;
        goto err_out_regions;
    }

...

...

err_out_regions:
#ifdef CONFIG_PCI
    if (pdev)
        pci_release_regions(pdev);
#endif
err_out:
    if (pdev)
        pci_disable_device(pdev);
    return rc;

...

Is it possible for 'dev' to be NULL with !CONFIG_PCI? If so, then
err_out_regions: does nothing?

Does this look right?


Regards,

Gabriel 

 

[PATCH] tlan list is subscribers-only

2007-10-31 Thread Gabriel C

...

Your mail to 'Tlan-devel' with the subject

drivers/net/tlan question

Is being held until the list moderator can review it for approval.

The reason it is being held:

Post by non-member to a members-only list

...

Signed-off-by: Gabriel Craciunescu [EMAIL PROTECTED]

---

 MAINTAINERS |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 4a26f83..6a116f3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3729,7 +3729,7 @@ S:Maintained
 TLAN NETWORK DRIVER
 P: Samuel Chessman
 M: [EMAIL PROTECTED]
-L: [EMAIL PROTECTED]
+L: [EMAIL PROTECTED] (subscribers-only)
 W: http://sourceforge.net/projects/tlan/
 S: Maintained
 

[PATCH 2/5] Relax the reference counting of init_net_ns

2007-10-31 Thread Pavel Emelyanov

When CONFIG_NET_NS is n there's no need to refcount
the initial net namespace. So relax this code by making
trivial stubs for the n case.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 5279466..1fd449a 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -51,13 +51,12 @@ static inline struct net *copy_net_ns(unsigned long flags, struct net *net_ns)
 }
 #endif
 
+#ifdef CONFIG_NET_NS
 extern void __put_net(struct net *net);
 
 static inline struct net *get_net(struct net *net)
 {
-#ifdef CONFIG_NET
    atomic_inc(&net->count);
-#endif
    return net;
 }
 
@@ -75,26 +74,44 @@ static inline struct net *maybe_get_net(struct net *net)
 
 static inline void put_net(struct net *net)
 {
-#ifdef CONFIG_NET
    if (atomic_dec_and_test(&net->count))
        __put_net(net);
-#endif
 }
 
 static inline struct net *hold_net(struct net *net)
 {
-#ifdef CONFIG_NET
    atomic_inc(&net->use_count);
-#endif
    return net;
 }
 
 static inline void release_net(struct net *net)
 {
-#ifdef CONFIG_NET
    atomic_dec(&net->use_count);
-#endif
 }
+#else
+static inline struct net *get_net(struct net *net)
+{
+   return net;
+}
+
+static inline void put_net(struct net *net)
+{
+}
+
+static inline struct net *hold_net(struct net *net)
+{
+   return net;
+}
+
+static inline void release_net(struct net *net)
+{
+}
+
+static inline struct net *maybe_get_net(struct net *net)
+{
+   return net;
+}
+#endif
 
 #define for_each_net(VAR)  \
    list_for_each_entry(VAR, &net_namespace_list, list)
-- 
1.5.3.4
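
The stub approach in the patch above can be tried out in userspace. The following is only a minimal sketch under stated assumptions: `DEMO_NET_NS` is a made-up build switch standing in for CONFIG_NET_NS, the `struct net` here is a hypothetical one-field stand-in, and a plain int replaces the kernel's atomic counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct net; not the real layout. */
struct net {
	int count;	/* plain int here; the kernel uses an atomic counter */
};

/* Build with -DDEMO_NET_NS to get the counting variant. */
#ifdef DEMO_NET_NS
static inline struct net *get_net(struct net *net)
{
	net->count++;
	return net;
}

static inline void put_net(struct net *net)
{
	assert(net->count > 0);	/* a put without a matching get is a bug */
	net->count--;
}
#else
/* Only one namespace can exist, so there is nothing to count. */
static inline struct net *get_net(struct net *net)
{
	return net;
}

static inline void put_net(struct net *net)
{
	(void)net;
}
#endif
```

Callers are written the same way in both configurations; with the switch off, the compiler can drop the calls entirely.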


[PATCH 1/5][NETNS] Make the init/exit hooks checks outside the loop

2007-10-31 Thread Pavel Emelyanov

When a new pernet something (subsys, device or operations) is
being registered, the init callback is called for each
namespace that currently exists in the system. During
unregistration, the same is done with the exit callback.

However, not every pernet something has both callbacks, yet the
check for the appropriate pointer being non-NULL is performed
inside the for_each_net() loop.

This is (at least) strange, so move the check out of the loop.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 662e6ea..4e52921 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -187,29 +187,28 @@ static int register_pernet_operations(struct list_head *list,
    struct net *net, *undo_net;
    int error;
 
-   error = 0;
    list_add_tail(&ops->list, list);
-   for_each_net(net) {
-       if (ops->init) {
+   if (ops->init) {
+       for_each_net(net) {
            error = ops->init(net);
            if (error)
                goto out_undo;
        }
    }
-out:
-   return error;
+   return 0;
 
 out_undo:
    /* If I have an error cleanup all namespaces I initialized */
    list_del(&ops->list);
-   for_each_net(undo_net) {
-       if (undo_net == net)
-           goto undone;
-       if (ops->exit)
+   if (ops->exit) {
+       for_each_net(undo_net) {
+           if (undo_net == net)
+               goto undone;
            ops->exit(undo_net);
+       }
    }
 undone:
-   goto out;
+   return error;
 }
 
 static void unregister_pernet_operations(struct pernet_operations *ops)
@@ -217,8 +216,8 @@ static void unregister_pernet_operations(struct pernet_operations *ops)
    struct net *net;
 
    list_del(&ops->list);
-   for_each_net(net)
-       if (ops->exit)
+   if (ops->exit)
+       for_each_net(net)
            ops->exit(net);
 }
 
-- 
1.5.3.4
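
The register-with-undo pattern from this patch can be sketched in userspace as follows. This is an illustration under assumptions, not the kernel code: the fixed `nets` array replaces the namespace list, `struct pernet_ops`, `register_ops`, `demo_init` and `demo_exit` are hypothetical names, and the undo walks backwards from the failing index instead of walking forwards until it reaches the failing namespace.

```c
#include <assert.h>
#include <stddef.h>

#define NR_NETS 3	/* hypothetical fixed set of "namespaces" */

struct net { int id; int inited; };

struct pernet_ops {
	int (*init)(struct net *net);	/* may be NULL */
	void (*exit)(struct net *net);	/* may be NULL */
};

static struct net nets[NR_NETS] = { {0, 0}, {1, 0}, {2, 0} };

static int demo_init(struct net *net)
{
	if (net->id == 2)
		return -1;	/* simulate a failure on the last namespace */
	net->inited = 1;
	return 0;
}

static void demo_exit(struct net *net)
{
	assert(net->inited);	/* only initialized namespaces are undone */
	net->inited = 0;
}

static int register_ops(struct pernet_ops *ops)
{
	int i, error;

	if (!ops->init)		/* checked once, not once per namespace */
		return 0;

	for (i = 0; i < NR_NETS; i++) {
		error = ops->init(&nets[i]);
		if (error)
			goto out_undo;
	}
	return 0;

out_undo:
	/* Undo only the namespaces initialized before the failure. */
	if (ops->exit) {
		while (--i >= 0)
			ops->exit(&nets[i]);
	}
	return error;
}
```

With `demo_init` failing on the third entry, the first two entries are initialized and then rolled back, and the error is returned to the caller.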


[PATCH 0/5] Make nicer CONFIG_NET_NS=n case code

2007-10-31 Thread Pavel Emelyanov

Currently we have the NET_NS config option, but the only change it
makes is to return ERR_PTR(-EINVAL) inside the cloning call, thus
leaving a bunch of dead code and making the reference counting
unneeded. This is not very good.

So clean up net_namespace.c to fix this.

I have sent a set of patches to Andrew to do a similar thing for
other namespaces, which introduces the NAMESPACES option to turn
all the namespaces off at once (to make embedded people suffer
less). So after that stuff is in, there will be some more patches
to tie all this together.

What remains to be done after this set is to make the register_pernet_xxx
stuff smaller. Currently this code weighs approximately 500 bytes,
so it is worth reducing, but I haven't found a good solution yet.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

[PATCH 4/5] Mark the setup_net as __net_init

2007-10-31 Thread Pavel Emelyanov

setup_net is called only for the init net namespace
(in the CONFIG_NET_NS=n case, of course) from an __init
function, so mark it as __net_init so that it disappears with
its caller after boot.

Yet again, in a perfect world this would be under
#ifdef CONFIG_NET_NS, but it isn't guaranteed that every
subsystem is registered *after* the init_net_ns is set
up. Once we are sure that we don't start registering
them before the init net setup, we'll be able to move
this code under the ifdef.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index d5bf8b2..a044e2d 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -25,7 +25,7 @@ EXPORT_SYMBOL_GPL(init_net);
 /*
  * setup_net runs the initializers for the network namespace object.
  */
-static int setup_net(struct net *net)
+static __net_init int setup_net(struct net *net)
 {
/* Must be called with net_mutex held */
struct pernet_operations *ops;
-- 
1.5.3.4


[PATCH 5/5] Hide the net_ns kmem cache

2007-10-31 Thread Pavel Emelyanov

This cache is only required to create new namespaces,
but we won't have them in the CONFIG_NET_NS=n case.

Hide it under the appropriate ifdef.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index a044e2d..e9f0964 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -17,8 +17,6 @@ static DEFINE_MUTEX(net_mutex);
 
 LIST_HEAD(net_namespace_list);
 
-static struct kmem_cache *net_cachep;
-
 struct net init_net;
 EXPORT_SYMBOL_GPL(init_net);
 
@@ -59,6 +57,8 @@ out_undo:
 }
 
 #ifdef CONFIG_NET_NS
+static struct kmem_cache *net_cachep;
+
 static struct net *net_alloc(void)
 {
return kmem_cache_zalloc(net_cachep, GFP_KERNEL);
@@ -167,9 +167,11 @@ static int __init net_ns_init(void)
    int err;
 
    printk(KERN_INFO "net_namespace: %zd bytes\n", sizeof(struct net));
+#ifdef CONFIG_NET_NS
    net_cachep = kmem_cache_create("net_namespace", sizeof(struct net),
                    SMP_CACHE_BYTES,
                    SLAB_PANIC, NULL);
+#endif
    mutex_lock(&net_mutex);
    err = setup_net(&init_net);
 
-- 
1.5.3.4


[PATCH 3/5] Hide the dead code in the net_namespace.c

2007-10-31 Thread Pavel Emelyanov

The namespace creation/destruction code is never called
if CONFIG_NET_NS is n, so it's OK to move it under the
appropriate ifdef.

In the n case, copy_net_ns() checks the flags and
returns -EINVAL when a new net ns is requested. In a perfect
world this stub would live in net_namespace.h, but the
function needs to know the CLONE_NEWNET value and thus
requires sched.h. On the other hand, this header is
included in almost every .c file in the networking code,
and making all this code depend on sched.h would be
suicidal.

Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED]

---

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 4e52921..d5bf8b2 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -22,65 +22,6 @@ static struct kmem_cache *net_cachep;
 struct net init_net;
 EXPORT_SYMBOL_GPL(init_net);
 
-static struct net *net_alloc(void)
-{
-   return kmem_cache_zalloc(net_cachep, GFP_KERNEL);
-}
-
-static void net_free(struct net *net)
-{
-   if (!net)
-   return;
-
-   if (unlikely(atomic_read(&net->use_count) != 0)) {
-       printk(KERN_EMERG "network namespace not free! Usage: %d\n",
-               atomic_read(&net->use_count));
-       return;
-   }
-
-   kmem_cache_free(net_cachep, net);
-}
-
-static void cleanup_net(struct work_struct *work)
-{
-   struct pernet_operations *ops;
-   struct net *net;
-
-   net = container_of(work, struct net, work);
-
-   mutex_lock(&net_mutex);
-
-   /* Don't let anyone else find us. */
-   rtnl_lock();
-   list_del(&net->list);
-   rtnl_unlock();
-
-   /* Run all of the network namespace exit methods */
-   list_for_each_entry_reverse(ops, &pernet_list, list) {
-       if (ops->exit)
-           ops->exit(net);
-   }
-
-   mutex_unlock(&net_mutex);
-
-   /* Ensure there are no outstanding rcu callbacks using this
-* network namespace.
-*/
-   rcu_barrier();
-
-   /* Finally it is safe to free my network namespace structure */
-   net_free(net);
-}
-
-
-void __put_net(struct net *net)
-{
-   /* Cleanup the network namespace in process context */
-   INIT_WORK(&net->work, cleanup_net);
-   schedule_work(&net->work);
-}
-EXPORT_SYMBOL_GPL(__put_net);
-
 /*
  * setup_net runs the initializers for the network namespace object.
  */
@@ -117,6 +58,12 @@ out_undo:
goto out;
 }
 
+#ifdef CONFIG_NET_NS
+static struct net *net_alloc(void)
+{
+   return kmem_cache_zalloc(net_cachep, GFP_KERNEL);
+}
+
 struct net *copy_net_ns(unsigned long flags, struct net *old_net)
 {
struct net *new_net = NULL;
@@ -127,10 +74,6 @@ struct net *copy_net_ns(unsigned long flags, struct net *old_net)
    if (!(flags & CLONE_NEWNET))
        return old_net;
 
-#ifndef CONFIG_NET_NS
-   return ERR_PTR(-EINVAL);
-#endif
-
err = -ENOMEM;
new_net = net_alloc();
if (!new_net)
@@ -157,6 +100,68 @@ out:
return new_net;
 }
 
+static void net_free(struct net *net)
+{
+   if (!net)
+   return;
+
+   if (unlikely(atomic_read(&net->use_count) != 0)) {
+       printk(KERN_EMERG "network namespace not free! Usage: %d\n",
+               atomic_read(&net->use_count));
+       return;
+   }
+
+   kmem_cache_free(net_cachep, net);
+}
+
+static void cleanup_net(struct work_struct *work)
+{
+   struct pernet_operations *ops;
+   struct net *net;
+
+   net = container_of(work, struct net, work);
+
+   mutex_lock(&net_mutex);
+
+   /* Don't let anyone else find us. */
+   rtnl_lock();
+   list_del(&net->list);
+   rtnl_unlock();
+
+   /* Run all of the network namespace exit methods */
+   list_for_each_entry_reverse(ops, &pernet_list, list) {
+       if (ops->exit)
+           ops->exit(net);
+   }
+
+   mutex_unlock(&net_mutex);
+
+   /* Ensure there are no outstanding rcu callbacks using this
+* network namespace.
+*/
+   rcu_barrier();
+
+   /* Finally it is safe to free my network namespace structure */
+   net_free(net);
+}
+
+void __put_net(struct net *net)
+{
+   /* Cleanup the network namespace in process context */
+   INIT_WORK(&net->work, cleanup_net);
+   schedule_work(&net->work);
+}
+EXPORT_SYMBOL_GPL(__put_net);
+
+#else
+struct net *copy_net_ns(unsigned long flags, struct net *old_net)
+{
+   if (flags & CLONE_NEWNET)
+       return ERR_PTR(-EINVAL);
+   return old_net;
+}
+#endif
+
 static int __init net_ns_init(void)
 {
int err;
-- 
1.5.3.4
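
The CONFIG_NET_NS=n stub returns an error encoded in a pointer. A minimal userspace sketch of that convention is shown below, assuming simplified re-implementations: `err_ptr`, `ptr_err`, `is_err`, `copy_net_ns_stub`, `DEMO_MAX_ERRNO` and `DEMO_CLONE_NEWNET` are hypothetical names chosen for this example (the flag value mirrors CLONE_NEWNET), and the one-field `struct net` is a placeholder.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DEMO_CLONE_NEWNET 0x40000000UL	/* mirrors CLONE_NEWNET */
#define DEMO_MAX_ERRNO    4095		/* errors live in the top 4095 addresses */

struct net { int unused; };		/* placeholder, not the kernel layout */

/* Encode a negative errno value as a pointer. */
static inline void *err_ptr(long error)
{
	assert(error < 0 && error >= -DEMO_MAX_ERRNO);
	return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True when the pointer falls in the range reserved for errors. */
static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

/* Stub for the "namespaces disabled" case: refuse new ones, share the old. */
static struct net *copy_net_ns_stub(unsigned long flags, struct net *old_net)
{
	if (flags & DEMO_CLONE_NEWNET)
		return err_ptr(-EINVAL);
	return old_net;
}
```

A caller checks the result with `is_err()` before using it, so a single return value carries either a valid namespace or the reason for failure.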


expected behavior of PF_PACKET on NETIF_F_HW_VLAN_RX device?

2007-10-31 Thread Dave Johnson


Depending on the network driver, I'm seeing different behavior if
a .1q packet is received on a PF_PACKET, SOCK_RAW, ETH_P_ALL socket.


On devices that do not use NETIF_F_HW_VLAN_RX, the packet socket gets
the complete packet with the vlan tag included, as the driver simply calls
netif_receive_skb() or equivalent.  packet_rcv() then gets the whole
thing, vlan tag included, and sends this through the socket.

vlan_skb_recv() also gets these all and will drop them because there
are no vlans configured.
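
For reference, this is roughly what a packet-socket reader has to do when the tag is left in the frame. It is only a sketch under assumptions: `parse_vlan`, `struct vlan_info` and the `DEMO_` constants are names invented for this example, and only a single 802.1Q tag is handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ETH_ALEN	6	/* bytes in a MAC address */
#define DEMO_TPID_8021Q	0x8100	/* 802.1Q tag protocol identifier */

struct vlan_info {
	int tagged;
	uint16_t vid;		/* VLAN ID, low 12 bits of the TCI */
	uint8_t prio;		/* priority, top 3 bits of the TCI */
	uint16_t ethertype;	/* encapsulated protocol */
};

static uint16_t get_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Returns 0 on success, -1 if the frame is too short. */
static int parse_vlan(const uint8_t *frame, size_t len, struct vlan_info *out)
{
	uint16_t type, tci;

	assert(frame != NULL && out != NULL);
	if (len < 2 * DEMO_ETH_ALEN + 2)
		return -1;

	type = get_be16(frame + 2 * DEMO_ETH_ALEN);
	if (type != DEMO_TPID_8021Q) {
		/* Untagged: the type field is the real ethertype. */
		out->tagged = 0;
		out->vid = 0;
		out->prio = 0;
		out->ethertype = type;
		return 0;
	}
	if (len < 2 * DEMO_ETH_ALEN + 6)
		return -1;

	tci = get_be16(frame + 2 * DEMO_ETH_ALEN + 2);
	out->tagged = 1;
	out->vid = tci & 0x0fff;
	out->prio = (uint8_t)(tci >> 13);
	out->ethertype = get_be16(frame + 2 * DEMO_ETH_ALEN + 4);
	return 0;
}
```

When the driver strips the tag before the packet socket sees the frame, a parser like this reports the frame as untagged and the VLAN ID is lost to the reader.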

Example, e100 driver gives this to tcpdump:

# ifconfig eth1 up
# tcpdump -s 2000 -e -n -i eth1
tcpdump: WARNING: eth1: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 2000 bytes
14:11:03.707178 00:0b:82:05:22:0a > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 64: vlan 101, p 0, ethertype ARP, arp who-has 192.168.101.191 tell 192.168.101.131
14:11:04.215164 00:0b:82:05:22:05 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 64: vlan 101, p 0, ethertype ARP, arp who-has 192.168.101.191 tell 192.168.101.130
14:11:04.658940 00:0b:82:05:22:0c > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 64: vlan 101, p 0, ethertype ARP, arp who-has 192.168.101.191 tell 192.168.101.135
14:11:05.706070 00:0b:82:05:22:0a > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 64: vlan 101, p 0, ethertype ARP, arp who-has 192.168.101.191 tell 192.168.101.131
14:11:05.939195 00:b0:c2:e8:d8:1c > 33:33:00:00:00:01, ethertype 802.1Q (0x8100), length 122: vlan 108, p 0, ethertype IPv6, fe80::2b0:c2ff:fee8:d81c > ff02::1: icmp6: router advertisement [class 0xe0]
14:11:07.222302 00:b0:c2:e8:d8:1c > 33:33:00:00:00:01, ethertype 802.1Q (0x8100), length 122: vlan 110, p 0, ethertype IPv6, fe80::2b0:c2ff:fee8:d81c > ff02::1: icmp6: router advertisement [class 0xe0]
14:11:08.486953 00:b0:c2:e8:d8:1c > 01:00:5e:00:00:05, ethertype 802.1Q (0x8100), length 134: vlan 110, p 0, ethertype IPv4, IP 192.168.110.20 > 224.0.0.5: OSPFv2, Hello (1), length: 80
14:11:11.528569 00:30:48:22:63:50 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 154: vlan 208, p 0, ethertype IPv4, IP 195.180.3.200.33350 > 195.180.3.255.111: UDP, length: 108
14:11:12.642762 00:0b:82:05:22:05 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 64: vlan 101, p 0, ethertype ARP, arp who-has 192.168.101.191 tell 192.168.101.130
14:11:12.642766 00:0b:82:05:22:05 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 64: vlan 101, p 0, ethertype ARP, arp who-has 192.168.101.191 tell 192.168.101.130

The packet socket gets everything including the vlan tag as I'd
expect.


But on the bnx2 driver (for example) I get 2 different behaviors:

1)

If no vlan interfaces are configured, it calls netif_receive_skb()
because there isn't a vlan group registered via
bnx2_vlan_rx_register().

# ifconfig eth1 up
# tcpdump -s 2000 -e -n -i eth1
tcpdump: WARNING: eth1: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 2000 bytes
14:21:27.170505 00:0b:82:05:22:05 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: arp who-has 192.168.101.191 tell 192.168.101.130
14:21:27.170577 00:0b:82:05:22:05 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: arp who-has 192.168.101.191 tell 192.168.101.130
14:21:27.495814 00:0b:82:05:22:0c > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: arp who-has 192.168.101.191 tell 192.168.101.135
14:21:27.495881 00:0b:82:05:22:0c > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: arp who-has 192.168.101.191 tell 192.168.101.135
14:21:28.151070 00:0b:82:05:22:05 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: arp who-has 192.168.101.191 tell 192.168.101.130
14:21:28.166780 00:b0:c2:e8:d8:1c > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), length 118: fe80::2b0:c2ff:fee8:d81c > ff02::1: icmp6: router advertisement [class 0xe0]
14:21:28.476404 00:0b:82:05:22:0c > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: arp who-has 192.168.101.191 tell 192.168.101.135
14:21:28.492099 00:b0:c2:e8:d8:1c > 01:00:5e:00:00:05, ethertype IPv4 (0x0800), length 130: IP 192.168.110.20 > 224.0.0.5: OSPFv2, Hello (1), length: 80
14:21:28.631439 00:19:b9:e7:8a:d7 > 33:33:ff:e7:8a:d7, ethertype IPv6 (0x86dd), length 78: :: > ff02::1:ffe7:8ad7: icmp6: neighbor sol: who has fd4d:5643:2886:67:219:b9ff:fee7:8ad7
14:21:28.671611 00:0b:82:05:22:0a > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: arp who-has 192.168.101.191 tell 192.168.101.131
14:21:28.671684 00:0b:82:05:22:0a > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: arp who-has 192.168.101.191 tell 192.168.101.131

The packet handed to netif_receive_skb() does not have the vlan tag on
it.  This allows all these packets to be processed by not only the
packet ptype handler, but also ip, arp, etc.  This seems very wrong,
as all vlan packets are stripped

Re: [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code

2007-10-31 Thread Eric Dumazet

On Wed, 31 Oct 2007 22:19:43 +0300
Pavel Emelyanov [EMAIL PROTECTED] wrote:

> Currently we have the NET_NS config option, but the only change it
> makes is to return ERR_PTR(-EINVAL) inside the cloning call, thus
> leaving a bunch of dead code and making the reference counting
> unneeded. This is not very good.
>
> So clean up net_namespace.c to fix this.
>
> I have sent a set of patches to Andrew to do a similar thing for
> other namespaces, which introduces the NAMESPACES option to turn
> all the namespaces off at once (to make embedded people suffer
> less). So after that stuff is in, there will be some more patches
> to tie all this together.
>
> What remains to be done after this set is to make the register_pernet_xxx
> stuff smaller. Currently this code weighs approximately 500 bytes,
> so it is worth reducing, but I haven't found a good solution yet.

Definitely wanted here. Thank you.
One more refcount operation on each socket creation/deletion was expensive.

Maybe we can add a macro to get nd_net from a struct net_device
so that every instance of

    if (dev->nd_net != &init_net)
        goto drop;

can also be optimized away if !CONFIG_NET_NS

extern inline struct net *netdev_get_ns(struct net_device *dev)
{
#ifdef CONFIG_NET_NS
    return dev->nd_net;
#else
    return &init_net;
#endif
}

...

    if (netdev_get_ns(dev) != &init_net)
        goto drop;


Re: expected behavior of PF_PACKET on NETIF_F_HW_VLAN_RX device?

2007-10-31 Thread Stephen Hemminger

On Wed, 31 Oct 2007 14:43:51 -0400
Dave Johnson [EMAIL PROTECTED] wrote:

 
> Depending on the network driver, I'm seeing different behavior if
> a .1q packet is received on a PF_PACKET, SOCK_RAW, ETH_P_ALL socket.
>
>
> On devices that do not use NETIF_F_HW_VLAN_RX, the packet socket gets
> the complete packet with the vlan tag included, as the driver simply calls
> netif_receive_skb() or equivalent.  packet_rcv() then gets the whole
> thing, vlan tag included, and sends this through the socket.
>
> vlan_skb_recv() also gets these all and will drop them because there
> are no vlans configured.
>

The VLAN acceleration grabs and hides the tag. It is a design flaw
that should be fixed, feel free to post a patch.


-- 
Stephen Hemminger [EMAIL PROTECTED]

Re: [PATCH] ehea: add kexec support

2007-10-31 Thread Christoph Raisch


Michael Ellerman [EMAIL PROTECTED] wrote on 30.10.2007 23:50:36:

> On Tue, 2007-10-30 at 09:39 +0100, Christoph Raisch wrote:
> >
> > Michael Ellerman [EMAIL PROTECTED] wrote on 28.10.2007 23:32:17:
> > Hope I didn't miss anything here...
>
> Perhaps. When we kdump the kernel does not call the reboot notifiers, so
> the code Jan-Bernd just added won't get called. So the eHEA resources
> won't be freed. When the kdump kernel tries to load the eHEA driver what
> will happen?

Good point.

If the device driver tries to allocate resources again (in the kdump
kernel) which have been allocated before (in the crashed kernel), the
hcalls will fail because from the hypervisor's view the resources are
still in use. Currently there's no method to find out the resource
handles for these HEA resources allocated by the crashed kernel within
the hypervisor...

So we have to trigger an explicit deregister in the hypervisor before
the driver is started again.
How do you recommend we should trigger this in the kdump process?
Would placing a hook into ppc_md.machine_kexec be an option?

Gruss / Regards
Christoph R.


Re: [PATCH]: Fix IP1000 config dependencies.

2007-10-31 Thread Francois Romieu

David Miller [EMAIL PROTECTED] :
[...]
> Noticed during some randconfig runs.
>
> [NET]: IP1000 driver needs MII.

It is fixed as of bbd82f956e0db6190b16a8a00d3ed5d979f488e8.

-- 
Ueimor

Re: [PATCH 10/14 v2] nes: eeprom and phy routines

2007-10-31 Thread Roland Dreier

> +/* TODO: deal with EEPROM endian issues */

This is pretty scary.  Is the driver broken on big-endian systems now?

> +/*
> +Everything you wanted to know about CRC algorithms, but were afraid to ask
> + for fear that errors in your understanding might be detected. Version  : 3.

etc etc... can all this be replaced with what's in lib/crc32.c?  (I
hope so)

 - R.

Re: [PATCH 11/14 v2] nes: OpenFabrics kernel verbs

2007-10-31 Thread Roland Dreier

> +/**
> + * nes_post_send
> + */
> +static int nes_post_send(struct ib_qp *ibqp, struct ib_send_wr *ib_wr,
> +        struct ib_send_wr **bad_wr)

> ...

> +        switch (ib_wr->opcode) {

> ...

> +                if (ib_wr->num_sge > nesdev->nesadapter->max_sge) {
> +                    err = -EINVAL;
> +                    break;
> +                }

> ...

> +            default:
> +                /* error */
> +                err = -EINVAL;
> +                break;

Looks like if you detect an error while posting a work request, you
break out of the switch statement but just continue through the while
loop going through the list of work requests.  Which doesn't seem like
it will work very well.
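
The pitfall is easy to reproduce outside the driver. The sketch below is not the nes code; `demo_post_send`, `struct demo_wr` and the opcodes are hypothetical, and "posting" is reduced to counting. It shows one way to make the loop stop at the first bad request and report it through `bad_wr`.

```c
#include <assert.h>
#include <stddef.h>

enum demo_opcode { DEMO_SEND, DEMO_WRITE, DEMO_BOGUS };

struct demo_wr {
	enum demo_opcode opcode;
	struct demo_wr *next;
};

/*
 * Posts requests until the first failure. On error, *bad_wr points at
 * the request that failed and *posted holds how many were accepted.
 */
static int demo_post_send(struct demo_wr *wr, struct demo_wr **bad_wr,
			  int *posted)
{
	int err = 0;

	assert(bad_wr != NULL && posted != NULL);
	*posted = 0;
	while (wr) {
		switch (wr->opcode) {
		case DEMO_SEND:
		case DEMO_WRITE:
			(*posted)++;
			break;
		default:
			err = -1;
			break;	/* leaves the switch only, not the loop */
		}
		if (err) {	/* so the loop needs its own exit */
			*bad_wr = wr;
			break;
		}
		wr = wr->next;
	}
	return err;
}
```

Without the `if (err)` check after the switch, the third request in a SEND, BOGUS, WRITE chain would still be posted even though the call returns an error.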

[PATCH] net: docbook fixes for netif_ functions

2007-10-31 Thread Stephen Hemminger

Documentation updates for network interfaces.

1. Add doc for netif_napi_add
2. Remove doc for unused returns from netif_rx
3. Add doc for netif_receive_skb

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- a/include/linux/netdevice.h 2007-10-31 09:16:09.0 -0700
+++ b/include/linux/netdevice.h 2007-10-31 10:02:15.0 -0700
@@ -739,6 +739,16 @@ static inline void *netdev_priv(const st
  */
 #define SET_NETDEV_DEV(net, pdev)  ((net)->dev.parent = (pdev))
 
+/**
+ * netif_napi_add - initialize a napi context
+ * @dev:  network device
+ * @napi: napi context
+ * @poll: polling function
+ * @weight: default weight
+ *
+ * netif_napi_add() must be used to initialize a napi context prior to calling
+ * *any* of the other napi related functions.
+ */
 static inline void netif_napi_add(struct net_device *dev,
  struct napi_struct *napi,
  int (*poll)(struct napi_struct *, int),
--- a/net/core/dev.c2007-10-31 09:16:09.0 -0700
+++ b/net/core/dev.c2007-10-31 10:00:39.0 -0700
@@ -1751,9 +1751,6 @@ DEFINE_PER_CPU(struct netif_rx_stats, ne
  *
  * return values:
  * NET_RX_SUCCESS  (no congestion)
- * NET_RX_CN_LOW   (low congestion)
- * NET_RX_CN_MOD   (moderate congestion)
- * NET_RX_CN_HIGH  (high congestion)
  * NET_RX_DROP (packet was dropped)
  *
  */
@@ -2001,6 +1998,21 @@ out:
 }
 #endif
 
+/**
+ * netif_receive_skb - process receive buffer from network
+ * @skb: buffer to process
+ *
+ * netif_receive_skb() is the main receive data processing function.
+ * It always succeeds. The buffer may be dropped during processing
+ * for congestion control or by the protocol layers.
+ *
+ * This function may only be called from softirq context and interrupts
+ * should be enabled.
+ *
+ * return values (usually ignored).
+ * NET_RX_SUCCESS  (no congestion)
+ * NET_RX_DROP (packet was dropped)
+ */
 int netif_receive_skb(struct sk_buff *skb)
 {
struct packet_type *ptype, *pt_prev;

[PATCH] - e1000_ethtool.c - convert macros to functions

2007-10-31 Thread Joe Perches

Convert REG_PATTERN_TEST and REG_SET_AND_CHECK macros to functions
Reduces x86 defconfig image by about 3k

compiled, untested (no hardware)

Signed-off-by: Joe Perches [EMAIL PROTECTED]

New:

$ size vmlinux
   text    data     bss     dec     hex filename
4792735  490626  606208 5889569  59de21 vmlinux

Current:

$ size vmlinux
   text    data     bss     dec     hex filename
4795759  490626  606208 5892593  59e9f1 vmlinux
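
The macro-to-function conversion can be exercised without hardware. The following is a sketch under assumptions only: `struct demo_reg`, `demo_writel`, `demo_readl` and `demo_reg_pattern_test` are invented for this example and model a register whose writable bits are given by `hw_mask`; the four test patterns are an assumption as well. As in the patch, the function returns true on failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register model: only bits set in hw_mask can be written. */
struct demo_reg {
	uint32_t value;
	uint32_t hw_mask;
};

static void demo_writel(struct demo_reg *r, uint32_t v)
{
	r->value = v & r->hw_mask;
}

static uint32_t demo_readl(const struct demo_reg *r)
{
	return r->value;
}

/* Returns true when a pattern does not read back as expected. */
static bool demo_reg_pattern_test(struct demo_reg *r, uint32_t mask,
				  uint32_t write)
{
	/* Assumed patterns: alternating bits, all zeros, all ones. */
	static const uint32_t test[] = {
		0x5A5A5A5A, 0xA5A5A5A5, 0x00000000, 0xFFFFFFFF
	};
	size_t i;

	assert(r != NULL);
	for (i = 0; i < sizeof(test) / sizeof(test[0]); i++) {
		demo_writel(r, write & test[i]);
		if (demo_readl(r) != (write & test[i] & mask))
			return true;
	}
	return false;
}
```

Because the body exists once instead of being expanded at every call site, each use costs a call instead of a copy of the loop, which is where the size reduction comes from.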

---

 drivers/net/e1000/e1000_ethtool.c |  185 +
 drivers/net/e1000/e1000_osdep.h   |   42 +
 2 files changed, 149 insertions(+), 78 deletions(-)

diff --git a/drivers/net/e1000/e1000_ethtool.c b/drivers/net/e1000/e1000_ethtool.c
index 667f18b..2627395 100644
--- a/drivers/net/e1000/e1000_ethtool.c
+++ b/drivers/net/e1000/e1000_ethtool.c
@@ -728,37 +728,45 @@ err_setup:
return err;
 }
 
-#define REG_PATTERN_TEST(R, M, W) \
-{ \
-   uint32_t pat, val; \
-   const uint32_t test[] = \
-       {0x5A5A5A5A, 0xA5A5A5A5, 0x00000000, 0xFFFFFFFF}; \
-   for (pat = 0; pat < ARRAY_SIZE(test); pat++) { \
-       E1000_WRITE_REG(&adapter->hw, R, (test[pat] & W)); \
-       val = E1000_READ_REG(&adapter->hw, R); \
-       if (val != (test[pat] & W & M)) { \
-           DPRINTK(DRV, ERR, "pattern test reg %04X failed: got " \
-                   "0x%08X expected 0x%08X\n", \
-                   E1000_##R, val, (test[pat] & W & M)); \
-           *data = (adapter->hw.mac_type < e1000_82543) ? \
-                   E1000_82542_##R : E1000_##R; \
-           return 1; \
-       } \
-   } \
+static bool reg_pattern_test(struct e1000_adapter *adapter, uint64_t *data,
+                             int reg, uint32_t mask, uint32_t write)
+{
+   static const uint32_t test[] =
+       {0x5A5A5A5A, 0xA5A5A5A5, 0x00000000, 0xFFFFFFFF};
+   uint8_t __iomem *address = adapter->hw.hw_addr + reg;
+   uint32_t read;
+   int i;
+
+   for (i = 0; i < ARRAY_SIZE(test); i++) {
+       writel(write & test[i], address);
+       read = readl(address);
+       if (read != (write & test[i] & mask)) {
+           DPRINTK(DRV, ERR, "pattern test reg %04X failed: "
+                   "got 0x%08X expected 0x%08X\n",
+                   reg, read, (write & test[i] & mask));
+           *data = reg;
+           return true;
+       }
+   }
+   return false;
 }
 
-#define REG_SET_AND_CHECK(R, M, W) \
-{ \
-   uint32_t val; \
-   E1000_WRITE_REG(&adapter->hw, R, W & M); \
-   val = E1000_READ_REG(&adapter->hw, R); \
-   if ((W & M) != (val & M)) { \
-       DPRINTK(DRV, ERR, "set/check reg %04X test failed: got 0x%08X " \
-               "expected 0x%08X\n", E1000_##R, (val & M), (W & M)); \
-       *data = (adapter->hw.mac_type < e1000_82543) ? \
-               E1000_82542_##R : E1000_##R; \
-       return 1; \
-   } \
+static bool reg_set_and_check(struct e1000_adapter *adapter, uint64_t *data,
+                              int reg, uint32_t mask, uint32_t write)
+{
+   uint8_t __iomem *address = adapter->hw.hw_addr + reg;
+   uint32_t read;
+
+   writel(write & mask, address);
+   read = readl(address);
+   if ((read & mask) != (write & mask)) {
+       DPRINTK(DRV, ERR, "set/check reg %04X test failed: "
+               "got 0x%08X expected 0x%08X\n",
+               reg, (read & mask), (write & mask));
+       *data = reg;
+       return true;
+   }
+   return false;
 }
 
 static int
@@ -800,58 +808,115 @@ e1000_reg_test(struct e1000_adapter *adapter, uint64_t *data)
    E1000_WRITE_REG(&adapter->hw, STATUS, before);
 
    if (adapter->hw.mac_type != e1000_ich8lan) {
-       REG_PATTERN_TEST(FCAL, 0xFFFFFFFF, 0xFFFFFFFF);
-

Re: [PATCH] - e1000_ethtool.c - convert macros to functions

2007-10-31 Thread Kok, Auke

Joe Perches wrote:
> Convert REG_PATTERN_TEST and REG_SET_AND_CHECK macros to functions
> Reduces x86 defconfig image by about 3k
>
> compiled, untested (no hardware)
>
> Signed-off-by: Joe Perches [EMAIL PROTECTED]
>
> New:
>
> $ size vmlinux
>    text    data     bss     dec     hex filename
> 4792735  490626  606208 5889569  59de21 vmlinux
>
> Current:
>
> $ size vmlinux
>    text    data     bss     dec     hex filename
> 4795759  490626  606208 5892593  59e9f1 vmlinux
>
> ---
>
>  drivers/net/e1000/e1000_ethtool.c |  185 +
>  drivers/net/e1000/e1000_osdep.h   |   42 +
>  2 files changed, 149 insertions(+), 78 deletions(-)
>
> diff --git a/drivers/net/e1000/e1000_ethtool.c b/drivers/net/e1000/e1000_ethtool.c
> index 667f18b..2627395 100644
> --- a/drivers/net/e1000/e1000_ethtool.c
> +++ b/drivers/net/e1000/e1000_ethtool.c
> @@ -728,37 +728,45 @@ err_setup:
>     return err;
>  }
>
> -#define REG_PATTERN_TEST(R, M, W) \
> -{ \
> -   uint32_t pat, val; \
> -   const uint32_t test[] = \
> -       {0x5A5A5A5A, 0xA5A5A5A5, 0x00000000, 0xFFFFFFFF}; \
> -   for (pat = 0; pat < ARRAY_SIZE(test); pat++) { \
> -       E1000_WRITE_REG(&adapter->hw, R, (test[pat] & W)); \
> -       val = E1000_READ_REG(&adapter->hw, R); \
> -       if (val != (test[pat] & W & M)) { \
> -           DPRINTK(DRV, ERR, "pattern test reg %04X failed: got " \
> -                   "0x%08X expected 0x%08X\n", \
> -                   E1000_##R, val, (test[pat] & W & M)); \
> -           *data = (adapter->hw.mac_type < e1000_82543) ? \
> -                   E1000_82542_##R : E1000_##R; \
> -           return 1; \
> -       } \
> -   } \
> +static bool reg_pattern_test(struct e1000_adapter *adapter, uint64_t *data,
> +                             int reg, uint32_t mask, uint32_t write)
> +{
> +   static const uint32_t test[] =
> +       {0x5A5A5A5A, 0xA5A5A5A5, 0x00000000, 0xFFFFFFFF};
> +   uint8_t __iomem *address = adapter->hw.hw_addr + reg;
> +   uint32_t read;
> +   int i;
> +
> +   for (i = 0; i < ARRAY_SIZE(test); i++) {
> +       writel(write & test[i], address);
> +       read = readl(address);
> +       if (read != (write & test[i] & mask)) {
> +           DPRINTK(DRV, ERR, "pattern test reg %04X failed: "
> +                   "got 0x%08X expected 0x%08X\n",
> +                   reg, read, (write & test[i] & mask));
> +           *data = reg;
> +           return true;
> +       }
> +   }
> +   return false;

that's not a bad idea, however see below:

>  }
>
> -#define REG_SET_AND_CHECK(R, M, W) \
> -{ \
> -   uint32_t val; \
> -   E1000_WRITE_REG(&adapter->hw, R, W & M); \
> -   val = E1000_READ_REG(&adapter->hw, R); \
> -   if ((W & M) != (val & M)) { \
> -       DPRINTK(DRV, ERR, "set/check reg %04X test failed: got 0x%08X " \
> -               "expected 0x%08X\n", E1000_##R, (val & M), (W & M)); \
> -       *data = (adapter->hw.mac_type < e1000_82543) ? \
> -               E1000_82542_##R : E1000_##R; \
> -       return 1; \
> -   } \
> +static bool reg_set_and_check(struct e1000_adapter *adapter, uint64_t *data,
> +                              int reg, uint32_t mask, uint32_t write)
> +{
> +   uint8_t __iomem *address = adapter->hw.hw_addr + reg;
> +   uint32_t read;
> +
> +   writel(write & mask, address);
> +   read = readl(address);
> +   if ((read & mask) != (write & mask)) {
> +       DPRINTK(DRV, ERR, "set/check reg %04X test failed: "
> +               "got 0x%08X expected 0x%08X\n",
> +               reg, (read & mask), (write & mask));
> +       *data = reg;
> +       return true;
> +   }
> +   return false;
>  }
>
>  static int
> @@ -800,58 +808,115 @@ e1000_reg_test(struct e1000_adapter *adapter, uint64_t *data)
>     E1000_WRITE_REG(&adapter->hw, STATUS, before);
>
>     if (adapter->hw.mac_type != e1000_ich8lan) {
 -

Re: [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code

2007-10-31 Thread Daniel Lezcano


Eric Dumazet wrote:

On Wed, 31 Oct 2007 22:19:43 +0300
Pavel Emelyanov [EMAIL PROTECTED] wrote:

Currently we have the NET_NS config option, but the only change it
makes is to return ERR_PTR(-EINVAL) inside the cloning call, thus
introducing a bunch of dead code and making the reference counting
unneeded. This is not very good.

So clean the net_namespace.c to fix this.

I have sent a set of patches to Andrew to do a similar thing for
other namespaces, which introduces the NAMESPACES option to turn
all the namespaces off at once (to make embedded people suffer
less). So after that stuff is in, there will be some more patches
to tie all this together.


What remains to be done after this set is to make the register_pernet_xxx
stuff smaller. Currently this code weighs approximately 500 bytes,
so it is worth reducing, but I haven't found a good solution yet.


Definitely wanted here. Thank you.
One more refcounting operation on each socket creation/deletion was expensive.

Maybe we can add a macro to get nd_net from a struct net_device
so that every instance of

if (dev->nd_net != &init_net)
        goto drop;

can also be optimized away if !CONFIG_NET_NS

static inline struct net *netdev_get_ns(struct net_device *dev)
{
#ifdef CONFIG_NET_NS
        return dev->nd_net;
#else
        return &init_net;
#endif
}


Or something like:

#ifdef CONFIG_NET_NS
static inline int init_net_dev(struct net_device *dev)
{
        return dev->nd_net == &init_net;
}
#else
static inline int init_net_dev(struct net_device *dev)
{
return 1;
}
#endif


By the way, this kind of test will disappear once the network namespace
work is complete and takes the different protocols into account.


Re: [PATCH]: Fix myri10ge NAPI oops warnings

2007-10-31 Thread Stephen Hemminger

On Wed, 31 Oct 2007 17:40:06 -0400
Andrew Gallatin [EMAIL PROTECTED] wrote:

 
 When testing the myri10ge driver with 2.6.24-rc1, I found
 that the machine crashed under heavy load:
 
 Unable to handle kernel paging request at 00100108 RIP:
   [803cc8dd] net_rx_action+0x11b/0x184
 
 The address corresponds to the list_move_tail() in
 netif_rx_complete():
         if (unlikely(work == weight))
                 list_move_tail(&n->poll_list, list);
 
 Eventually, I traced the crashes to calling netif_rx_complete() with
 work_done == budget.  From looking at other drivers, it appears that
 one should only call netif_rx_complete() when work_done < budget.
 
 To fix it, I changed the test in myri10ge_poll() so that it refers
 to work_done rather than looking at the rx ring status.  If
 work_done is < budget, then that implies we have no more packets to
 process. Any races will be resolved by the NIC when the write to
 irq_claim is made.
 
 In myri10ge_clean_rx_done(), if we ever exceeded our budget, it would
 report a work_done one larger than was actually done.  This is because
 the increment was done in the conditional, so work_done would be
 incremented regardless of whether the test passed or failed.
 This would lead to the WARN_ON_ONCE(work > weight); warning in
 net_rx_action triggering.  I've moved the increment of work_done
 inside the loop.  Note that this would only be a problem when we had
 exceeded our budget.
 
 Signed off by: Andrew Gallatin [EMAIL PROTECTED]
 
 Andrew Gallatin Myricom Inc
 
 

Yes, this looks right.
How could the check in netif_rx_complete be changed to catch this better?

-- 
Stephen Hemminger [EMAIL PROTECTED]

RE: [PATCH 10/14 v2] nes: eeprom and phy routines

2007-10-31 Thread Glenn Grundstrom

   +/*
   +Everything you wanted to know about CRC algorithms, but 
 were afraid to ask
   + for fear that errors in your understanding might be 
 detected. Version  : 3.
 
 etc etc... can all this be replaced with what's in lib/crc32.c?  (I
 hope so)

Replacing this code is already in the works.

Glenn.

 
  - R.
 

[PATCH]: Fix myri10ge NAPI oops warnings

2007-10-31 Thread Andrew Gallatin



When testing the myri10ge driver with 2.6.24-rc1, I found
that the machine crashed under heavy load:

Unable to handle kernel paging request at 00100108 RIP:
 [803cc8dd] net_rx_action+0x11b/0x184

The address corresponds to the list_move_tail() in
netif_rx_complete():
        if (unlikely(work == weight))
                list_move_tail(&n->poll_list, list);

Eventually, I traced the crashes to calling netif_rx_complete() with
work_done == budget.  From looking at other drivers, it appears that
one should only call netif_rx_complete() when work_done < budget.

To fix it, I changed the test in myri10ge_poll() so that it refers
to work_done rather than looking at the rx ring status.  If
work_done is < budget, then that implies we have no more packets to
process. Any races will be resolved by the NIC when the write to
irq_claim is made.

In myri10ge_clean_rx_done(), if we ever exceeded our budget, it would
report a work_done one larger than was actually done.  This is because
the increment was done in the conditional, so work_done would be
incremented regardless of whether the test passed or failed.
This would lead to the WARN_ON_ONCE(work > weight); warning in
net_rx_action triggering.  I've moved the increment of work_done
inside the loop.  Note that this would only be a problem when we had
exceeded our budget.

Signed off by: Andrew Gallatin [EMAIL PROTECTED]

Andrew Gallatin Myricom Inc


diff --git a/drivers/net/myri10ge/myri10ge.c b/drivers/net/myri10ge/myri10ge.c
index 366e62a..0f306dd 100644
--- a/drivers/net/myri10ge/myri10ge.c
+++ b/drivers/net/myri10ge/myri10ge.c
@@ -1151,7 +1151,7 @@ static inline int myri10ge_clean_rx_done
 	u16 length;
 	__wsum checksum;
 
-	while (rx_done->entry[idx].length != 0 && work_done++ < budget) {
+	while (rx_done->entry[idx].length != 0 && work_done < budget) {
 		length = ntohs(rx_done->entry[idx].length);
 		rx_done->entry[idx].length = 0;
 		checksum = csum_unfold(rx_done->entry[idx].checksum);
@@ -1167,6 +1167,7 @@ static inline int myri10ge_clean_rx_done
 		rx_bytes += rx_ok * (unsigned long)length;
 		cnt++;
 		idx = cnt & (myri10ge_max_intr_slots - 1);
+		work_done++;
 	}
 	rx_done->idx = idx;
 	rx_done->cnt = cnt;
@@ -1233,13 +1234,12 @@ static int myri10ge_poll(struct napi_str
 	struct myri10ge_priv *mgp =
 	    container_of(napi, struct myri10ge_priv, napi);
 	struct net_device *netdev = mgp->dev;
-	struct myri10ge_rx_done *rx_done = &mgp->rx_done;
 	int work_done;
 
 	/* process as many rx events as NAPI will allow */
 	work_done = myri10ge_clean_rx_done(mgp, budget);
 
-	if (rx_done->entry[rx_done->idx].length == 0 || !netif_running(netdev)) {
+	if (work_done < budget || !netif_running(netdev)) {
 		netif_rx_complete(netdev, napi);
 		put_be32(htonl(3), mgp->irq_claim);
 	}
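
The off-by-one described in the message above can be reproduced in user space. This is a minimal sketch under stated assumptions, not the driver: `struct rx_ring`, `clean_rx_old()` and `clean_rx_new()` are hypothetical names, a nonzero length marks a pending packet, and "processing" a packet is reduced to advancing the index.

```c
#include <stddef.h>

/* Hypothetical stand-in for the rx completion ring: a nonzero length
 * marks a pending packet, zero marks an empty slot. */
struct rx_ring {
    const unsigned short *len;  /* per-slot packet lengths */
    size_t idx;                 /* next slot to examine */
    size_t slots;               /* number of slots in the ring */
};

/* Old shape: work_done++ lives in the loop condition, so the final
 * evaluation that fails the budget test still bumps the counter. */
static int clean_rx_old(struct rx_ring *r, int budget)
{
    int work_done = 0;

    while (r->len[r->idx] != 0 && work_done++ < budget)
        r->idx = (r->idx + 1) % r->slots;
    return work_done;
}

/* Fixed shape: a packet is counted only after it has been processed. */
static int clean_rx_new(struct rx_ring *r, int budget)
{
    int work_done = 0;

    while (r->len[r->idx] != 0 && work_done < budget) {
        r->idx = (r->idx + 1) % r->slots;
        work_done++;
    }
    return work_done;
}
```

With six pending packets and a budget of four, both variants process four packets, but the old one reports five, which is what trips the warning in net_rx_action. When fewer packets than the budget are pending, the length test short-circuits first and both variants agree.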

Re: [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code

2007-10-31 Thread Daniel Lezcano


Pavel Emelyanov wrote:
Currently we have the NET_NS config option, but the only change it
makes is to return ERR_PTR(-EINVAL) inside the cloning call, thus
introducing a bunch of dead code and making the reference counting
unneeded. This is not very good.

So clean the net_namespace.c to fix this.

I have sent a set of patches to Andrew to do a similar thing for
other namespaces, which introduces the NAMESPACES option to turn
all the namespaces off at once (to make embedded people suffer
less). So after that stuff is in, there will be some more patches
to tie all this together.


What remains to be done after this set is to make the register_pernet_xxx
stuff smaller. Currently this code weighs approximately 500 bytes,
so it is worth reducing, but I haven't found a good solution yet.


Did you have time to check the impact of your patch on the rest of the
network namespace code that is not yet in mainline and lives in Eric's git
tree?


PS: can you also Cc emails concerning the network namespace to the
containers mailing list? Thanks.


Re: [PATCH] ucc_geth: add support for netpoll

2007-10-31 Thread Anton Vorontsov

On Mon, Oct 29, 2007 at 03:17:44PM +0300, Anton Vorontsov wrote:
[...]
  Oops.  The original patch happened to hit the Junk mail box. :(
 
 That one as well? http://lkml.org/lkml/2007/10/11/128
 
  I think
  the patch is good to merge after the cosmetic change.  I can do it in
  next pull request to Jeff.
 
 Ok, great. Thanks.

I'm wondering if you missed that email again. Maybe your mail
client/server is doing weird things with emails from @ru.mvista.com?

Thanks.

 Here it is:
 
 - - - -
 From: Anton Vorontsov [EMAIL PROTECTED]
 Subject: [PATCH] ucc_geth: add support for netpoll
 
 This patch adds netpoll support for the QE UCC Gigabit Ethernet
 driver. Tested using netconsole and KGDBoE.
 
 Signed-off-by: Anton Vorontsov [EMAIL PROTECTED]
 ---
  drivers/net/ucc_geth.c |   20 
  1 files changed, 20 insertions(+), 0 deletions(-)
 
 diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c
 index bec413b..94e78d8 100644
 --- a/drivers/net/ucc_geth.c
 +++ b/drivers/net/ucc_geth.c
 @@ -3678,6 +3678,23 @@ static irqreturn_t ucc_geth_irq_handler(int irq, void *info)
   return IRQ_HANDLED;
  }
  
 +#ifdef CONFIG_NET_POLL_CONTROLLER
 +/*
 + * Polling 'interrupt' - used by things like netconsole to send skbs
 + * without having to re-enable interrupts. It's not called while
 + * the interrupt routine is executing.
 + */
 +static void ucc_netpoll(struct net_device *dev)
 +{
 + struct ucc_geth_private *ugeth = netdev_priv(dev);
 + int irq = ugeth->ug_info->uf_info.irq;
 +
 + disable_irq(irq);
 + ucc_geth_irq_handler(irq, dev);
 + enable_irq(irq);
 +}
 +#endif /* CONFIG_NET_POLL_CONTROLLER */
 +
  /* Called when something needs to use the ethernet device */
  /* Returns 0 for success. */
  static int ucc_geth_open(struct net_device *dev)
 @@ -3963,6 +3980,9 @@ static int ucc_geth_probe(struct of_device* ofdev, const struct of_device_id *ma
  #ifdef CONFIG_UGETH_NAPI
   netif_napi_add(dev, &ugeth->napi, ucc_geth_poll, UCC_GETH_DEV_WEIGHT);
  #endif   /* CONFIG_UGETH_NAPI */
 +#ifdef CONFIG_NET_POLL_CONTROLLER
 + dev->poll_controller = ucc_netpoll;
 +#endif
   dev->stop = ucc_geth_close;
  //dev->change_mtu = ucc_geth_change_mtu;
   dev->mtu = 1500;
 -- 
 1.5.2.2

-- 
Anton Vorontsov
email: [EMAIL PROTECTED]
backup email: [EMAIL PROTECTED]
irc://irc.freenode.net/bd2

Re: [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code

2007-10-31 Thread Eric W. Biederman

Eric Dumazet [EMAIL PROTECTED] writes:


 Definitely wanted here. Thank you.
 One more refcounting on each socket creation/deletion was expensive.

Really?  Have you actually measured that?  If the overhead is
measurable and expensive we may want to look at per cpu counters or
something like that.  So far I don't have any numbers that say any
of the network namespace work inherently has any overhead.

 Maybe we can add a macro to get nd_net from a struct net_device
 so that every instance of

 if (dev->nd_net != &init_net)
         goto drop;

 can also be optimized away if !CONFIG_NET_NS

Well that extra check should be removed once we finish converting
those code paths.  So I'm not too worried.

If this becomes a big issue I can dig up my old code that
replaced struct net * with a net_t typedef and used functions
for all of the comparisons and allowed everything to be compiled
away.

The trouble was that it was just different enough that people could not
immediately understand the code.
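
To illustrate the idea of an opaque handle whose comparisons compile away, here is a hypothetical user-space sketch. It is not the old code mentioned above: `net_t`, `net_eq()`, `net_init()` and `rx_filter()` are names assumed for illustration, and `struct net` and `struct net_device` are reduced to the minimum needed.

```c
#include <stdbool.h>

#ifdef CONFIG_NET_NS
/* Namespaces enabled: the handle is a real pointer. */
struct net { int id; };
static struct net init_net;
typedef struct net *net_t;

static inline bool net_eq(net_t a, net_t b) { return a == b; }
static inline net_t net_init(void) { return &init_net; }
#else
/* Namespaces disabled: the handle carries no information, every
 * comparison is constant-true and can be folded by the compiler. */
typedef struct { char unused; } net_t;

static inline bool net_eq(net_t a, net_t b)
{
    (void)a;
    (void)b;
    return true;
}

static inline net_t net_init(void)
{
    net_t n = { 0 };
    return n;
}
#endif

struct net_device { net_t nd_net; };

/* Example caller: 0 means accept, -1 means drop. */
static int rx_filter(const struct net_device *dev)
{
    if (!net_eq(dev->nd_net, net_init()))
        return -1;
    return 0;
}
```

Callers never touch the representation of `net_t`, so the same source builds in both configurations. The cost is the one named in the message: code written this way looks unfamiliar next to plain pointer comparisons.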

Eric

[PATCH 2/4] e1000e: Disable L1 ASPM power savings for 82573 mobile variants

2007-10-31 Thread Auke Kok

L1 ASPM link (pci-e link power savings) has significant benefits
(~1W savings when link is active) but unfortunately does not work
correctly on any of the chipsets that have 82573 on mobile platforms
which causes various nuisances:
 - eeprom reads return garbage information leading to bad eeprom
   checksums
 - long ping times (up to 2 seconds)
 - complete system hangs (freeze/lockup)

A lot of T60 owners have been plagued by this, but other mobile
solutions also suffer from these symptoms.

Disabling L1 ASPM before we activate the PCI-E link fixes all of
these issues at the cost of some power consumption.

Remove a workaround RDTR adjustment that is no longer needed with
this new one.

Signed-off-by: Auke Kok [EMAIL PROTECTED]
---

 drivers/net/e1000e/82571.c  |1 -
 drivers/net/e1000e/e1000.h  |1 -
 drivers/net/e1000e/netdev.c |   30 ++
 drivers/net/e1000e/param.c  |7 ---
 4 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/drivers/net/e1000e/82571.c b/drivers/net/e1000e/82571.c
index b6401ab..45f5ee2 100644
--- a/drivers/net/e1000e/82571.c
+++ b/drivers/net/e1000e/82571.c
@@ -1343,7 +1343,6 @@ struct e1000_info e1000_82573_info = {
  | FLAG_HAS_STATS_ICR_ICT
  | FLAG_HAS_SMART_POWER_DOWN
  | FLAG_HAS_AMT
- | FLAG_HAS_ASPM
  | FLAG_HAS_ERT
  | FLAG_HAS_SWSM_ON_LOAD,
.pba= 20,
diff --git a/drivers/net/e1000e/e1000.h b/drivers/net/e1000e/e1000.h
index 473f78d..8b88c22 100644
--- a/drivers/net/e1000e/e1000.h
+++ b/drivers/net/e1000e/e1000.h
@@ -288,7 +288,6 @@ struct e1000_info {
 #define FLAG_HAS_CTRLEXT_ON_LOAD          (1 << 5)
 #define FLAG_HAS_SWSM_ON_LOAD             (1 << 6)
 #define FLAG_HAS_JUMBO_FRAMES             (1 << 7)
-#define FLAG_HAS_ASPM                     (1 << 8)
 #define FLAG_HAS_STATS_ICR_ICT            (1 << 9)
 #define FLAG_HAS_STATS_PTC_PRC            (1 << 10)
 #define FLAG_HAS_SMART_POWER_DOWN         (1 << 11)
diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index 4fd2e23..ec427e2 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -3511,6 +3511,33 @@ static int e1000_suspend(struct pci_dev *pdev, pm_message_t state)
 	return 0;
 }
 
+static void e1000e_disable_l1aspm(struct pci_dev *pdev)
+{
+	int pos;
+	u32 cap;
+	u16 val;
+
+	/*
+	 * 82573 workaround - disable L1 ASPM on mobile chipsets
+	 *
+	 * L1 ASPM on various mobile (ich7) chipsets do not behave properly
+	 * resulting in lost data or garbage information on the pci-e link
+	 * level. This could result in (false) bad EEPROM checksum errors,
+	 * long ping times (up to 2s) or even a system freeze/hang.
+	 *
+	 * Unfortunately this feature saves about 1W power consumption when
+	 * active.
+	 */
+	pos = pci_find_capability(pdev, PCI_CAP_ID_EXP);
+	pci_read_config_dword(pdev, pos + PCI_EXP_LNKCAP, &cap);
+	pci_read_config_word(pdev, pos + PCI_EXP_LNKCTL, &val);
+	if (val & 0x2) {
+		dev_warn(&pdev->dev, "Disabling L1 ASPM\n");
+		val &= ~0x2;
+		pci_write_config_word(pdev, pos + PCI_EXP_LNKCTL, val);
+	}
+}
+
 #ifdef CONFIG_PM
 static int e1000_resume(struct pci_dev *pdev)
 {
@@ -3521,6 +3548,7 @@ static int e1000_resume(struct pci_dev *pdev)
 
pci_set_power_state(pdev, PCI_D0);
pci_restore_state(pdev);
+   e1000e_disable_l1aspm(pdev);
err = pci_enable_device(pdev);
if (err) {
 		dev_err(&pdev->dev,
@@ -3621,6 +3649,7 @@ static pci_ers_result_t e1000_io_slot_reset(struct pci_dev *pdev)
 	struct e1000_adapter *adapter = netdev_priv(netdev);
 	struct e1000_hw *hw = &adapter->hw;
 
+	e1000e_disable_l1aspm(pdev);
 	if (pci_enable_device(pdev)) {
 		dev_err(&pdev->dev,
 			"Cannot re-enable PCI device after reset.\n");
@@ -3722,6 +3751,7 @@ static int __devinit e1000_probe(struct pci_dev *pdev,
u16 eeprom_data = 0;
u16 eeprom_apme_mask = E1000_EEPROM_APME;
 
+   e1000e_disable_l1aspm(pdev);
err = pci_enable_device(pdev);
if (err)
return err;
diff --git a/drivers/net/e1000e/param.c b/drivers/net/e1000e/param.c
index 3327892..df266c3 100644
--- a/drivers/net/e1000e/param.c
+++ b/drivers/net/e1000e/param.c
@@ -262,13 +262,6 @@ void __devinit e1000e_check_options(struct e1000_adapter *adapter)
 			 .max = MAX_RXDELAY } }
 		};
 
-		/* modify min and default if 82573 for slow ping w/a,
-		 * a value greater than 8 needs to be set for RDTR */
-		if (adapter->flags & FLAG_HAS_ASPM) {
-			opt.def = 32;
-
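
The core of the workaround in this patch is a read-modify-write of the PCI Express Link Control register, where bits 1:0 form the ASPM control field (bit 0 enables L0s, bit 1 enables L1). The sketch below shows only that bit manipulation in user space; `lnkctl_disable_l1()` and the two `LNKCTL_ASPM_*` macros are names assumed for illustration, and no config-space access is performed.

```c
#include <stdbool.h>
#include <stdint.h>

#define LNKCTL_ASPM_L0S 0x0001  /* Link Control bit 0: L0s entry enabled */
#define LNKCTL_ASPM_L1  0x0002  /* Link Control bit 1: L1 entry enabled */

/* Returns the Link Control value with L1 ASPM cleared and every other bit
 * preserved. *changed tells the caller whether a write-back is needed. */
static uint16_t lnkctl_disable_l1(uint16_t val, bool *changed)
{
    *changed = (val & LNKCTL_ASPM_L1) != 0;
    return (uint16_t)(val & ~LNKCTL_ASPM_L1);
}
```

Clearing only bit 1 leaves L0s and the unrelated Link Control bits untouched, which matches the patch's `val &= ~0x2` and its write-back only when the bit was set.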

Re: [Bugme-new] [Bug 9269] New: bonding module cannot enslave ethernet devices provided by sunhme

2007-10-31 Thread Andrew Morton

On Wed, 31 Oct 2007 14:35:56 -0700 (PDT)
[EMAIL PROTECTED] wrote:

 http://bugzilla.kernel.org/show_bug.cgi?id=9269
 
        Summary: bonding module cannot enslave ethernet devices provided
                 by sunhme
        Product: Drivers
        Version: 2.5
  KernelVersion: 2.6.18-3
       Platform: All
     OS/Version: Linux
           Tree: Mainline
         Status: NEW
       Severity: normal
       Priority: P1
      Component: Network
     AssignedTo: [EMAIL PROTECTED]
     ReportedBy: [EMAIL PROTECTED]
 
 
 Most recent kernel where this bug did not occur: N/A
 Distribution: Debian 4.0r1/stable (Etch)
 Hardware Environment: Sun Netra T1 105 (sparc64)
 Software Environment:
 Problem Description:
 bonding module cannot enslave ethernet devices provided by sunhme - link
 status is never reported via netif_carrier_on/netif_carrier_off which makes
 sunhme not usable as a slave to the bonding module
 
 Steps to reproduce:
 modprobe sunhme
 modprobe bonding
 ifconfig bond0 up
 ifconfig eth0 up
 # wait for link to be up
 ifenslave bond0 eth0
 # bond0 never comes up and keeps waiting for link
 

Re: [PATCH 0/5] Make nicer CONFIG_NET_NS=n case code

2007-10-31 Thread Eric Dumazet


Eric W. Biederman wrote:

Eric Dumazet [EMAIL PROTECTED] writes:



Definitely wanted here. Thank you.
One more refcounting on each socket creation/deletion was expensive.


Really?  Have you actually measured that?  If the overhead is
measurable and expensive we may want to look at per cpu counters or
something like that.  So far I don't have any numbers that say any
of the network namespace work inherently has any overhead.


It seems that on some old Opterons (two 246 for example),
if (atomic_dec_and_test(&net->count)) is rather expensive, yes :(

I am not sure per cpu counters help: I tried this and got no speedup. (This
was on net_device refcnt at that time.)


(On these machines, access through the fs/gs selector seems expensive too.)

Maybe a lazy mode could be done, i.e. only do an atomic_dec(), as done in
dev_put()?

Also, count sits in a cache line that contains mostly read and shared
fields; you might want to put it in a separate cache line on SMP, to avoid
cache line ping-pongs.
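
The cache line suggestion can be sketched with standard C11 alignment. This is an illustration only: `struct ns_like` and its fields are hypothetical, and the 64-byte line size is an assumption that varies by CPU. In the kernel the same intent is usually expressed with the `____cacheline_aligned_in_smp` annotation.

```c
#include <stddef.h>

#define CACHE_LINE 64  /* assumed cache line size in bytes */

/* Hypothetical layout: read-mostly fields first, the frequently written
 * counter pushed onto a cache line of its own. */
struct ns_like {
    /* read-mostly, shared by all CPUs */
    void *ops;
    unsigned long flags;

    /* hot counter: aligned so it starts a fresh line; as the last member,
     * the struct's tail padding keeps anything else off that line */
    _Alignas(CACHE_LINE) int count;
};
```

Writes to `count` then invalidate only the counter's own line on other CPUs, while readers of `ops` and `flags` keep their line in the shared state. The price is a larger structure.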






Maybe we can add a macro to get nd_net from a struct net_device
so that every instance of

if (dev->nd_net != &init_net)
        goto drop;

can also be optimized away if !CONFIG_NET_NS


Well that extra check should be removed once we finish converting
those code paths.  So I'm not too worried.


OK. Since the conditional test can be predicted by the CPU, it certainly
doesn't matter.




If this becomes a big issue I can dig up my old code that
replaced struct net * with a net_t typedef and used functions
for all of the comparisons and allowed everything to be compiled
away.






The trouble was that it was just different enough that people could not
immediately understand the code.





[PATCH 3/4] e1000/e1000e: Move PCI-Express device IDs over to e1000e

2007-10-31 Thread Auke Kok

e1000e will from now on support the PCI-Express adapters that
previously were supported by e1000. This support means better
performance and easier debugging from now on for both the old
PCI-X/PCI hardware and PCI-Express adapters.

This patch also moves 3 recently merged device IDs over to e1000e
that are identical to quad-port versions of already existing
dual port versions. With this last bit every former e1000 pci-e
device should work now with e1000e.

Here is a brief list of which gigabit driver to use with which
adapter:

  e1000:
82540 - 82547

  e1000e:
82571 - 82573
ich8, ich9   (82562 or 82566)
es2lan   (80003eslan)

  igb: (not yet merged, only available from e1000.sf.net)
82575

Signed-off-by: Auke Kok [EMAIL PROTECTED]
---

 drivers/net/e1000/e1000_main.c |   27 ---
 drivers/net/e1000e/82571.c |6 ++
 drivers/net/e1000e/hw.h|3 +++
 drivers/net/e1000e/netdev.c|9 +++--
 4 files changed, 12 insertions(+), 33 deletions(-)

diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 72deff0..d1b88e4 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -73,14 +73,6 @@ static struct pci_device_id e1000_pci_tbl[] = {
INTEL_E1000_ETHERNET_DEVICE(0x1026),
INTEL_E1000_ETHERNET_DEVICE(0x1027),
INTEL_E1000_ETHERNET_DEVICE(0x1028),
-   INTEL_E1000_ETHERNET_DEVICE(0x1049),
-   INTEL_E1000_ETHERNET_DEVICE(0x104A),
-   INTEL_E1000_ETHERNET_DEVICE(0x104B),
-   INTEL_E1000_ETHERNET_DEVICE(0x104C),
-   INTEL_E1000_ETHERNET_DEVICE(0x104D),
-   INTEL_E1000_ETHERNET_DEVICE(0x105E),
-   INTEL_E1000_ETHERNET_DEVICE(0x105F),
-   INTEL_E1000_ETHERNET_DEVICE(0x1060),
INTEL_E1000_ETHERNET_DEVICE(0x1075),
INTEL_E1000_ETHERNET_DEVICE(0x1076),
INTEL_E1000_ETHERNET_DEVICE(0x1077),
@@ -89,28 +81,9 @@ static struct pci_device_id e1000_pci_tbl[] = {
INTEL_E1000_ETHERNET_DEVICE(0x107A),
INTEL_E1000_ETHERNET_DEVICE(0x107B),
INTEL_E1000_ETHERNET_DEVICE(0x107C),
-   INTEL_E1000_ETHERNET_DEVICE(0x107D),
-   INTEL_E1000_ETHERNET_DEVICE(0x107E),
-   INTEL_E1000_ETHERNET_DEVICE(0x107F),
INTEL_E1000_ETHERNET_DEVICE(0x108A),
-   INTEL_E1000_ETHERNET_DEVICE(0x108B),
-   INTEL_E1000_ETHERNET_DEVICE(0x108C),
-   INTEL_E1000_ETHERNET_DEVICE(0x1096),
-   INTEL_E1000_ETHERNET_DEVICE(0x1098),
INTEL_E1000_ETHERNET_DEVICE(0x1099),
-   INTEL_E1000_ETHERNET_DEVICE(0x109A),
-   INTEL_E1000_ETHERNET_DEVICE(0x10A4),
-   INTEL_E1000_ETHERNET_DEVICE(0x10A5),
INTEL_E1000_ETHERNET_DEVICE(0x10B5),
-   INTEL_E1000_ETHERNET_DEVICE(0x10B9),
-   INTEL_E1000_ETHERNET_DEVICE(0x10BA),
-   INTEL_E1000_ETHERNET_DEVICE(0x10BB),
-   INTEL_E1000_ETHERNET_DEVICE(0x10BC),
-   INTEL_E1000_ETHERNET_DEVICE(0x10C4),
-   INTEL_E1000_ETHERNET_DEVICE(0x10C5),
-   INTEL_E1000_ETHERNET_DEVICE(0x10D5),
-   INTEL_E1000_ETHERNET_DEVICE(0x10D9),
-   INTEL_E1000_ETHERNET_DEVICE(0x10DA),
/* required last entry */
{0,}
 };
diff --git a/drivers/net/e1000e/82571.c b/drivers/net/e1000e/82571.c
index 45f5ee2..3beace5 100644
--- a/drivers/net/e1000e/82571.c
+++ b/drivers/net/e1000e/82571.c
@@ -194,6 +194,8 @@ static s32 e1000_init_mac_params_82571(struct e1000_adapter *adapter)
 		break;
 	case E1000_DEV_ID_82571EB_SERDES:
 	case E1000_DEV_ID_82572EI_SERDES:
+	case E1000_DEV_ID_82571EB_SERDES_DUAL:
+	case E1000_DEV_ID_82571EB_SERDES_QUAD:
 		hw->media_type = e1000_media_type_internal_serdes;
 		break;
 	default:
@@ -260,6 +262,7 @@ static s32 e1000_get_invariants_82571(struct e1000_adapter *adapter)
 	case E1000_DEV_ID_82571EB_QUAD_COPPER:
 	case E1000_DEV_ID_82571EB_QUAD_FIBER:
 	case E1000_DEV_ID_82571EB_QUAD_COPPER_LP:
+	case E1000_DEV_ID_82571PT_QUAD_COPPER:
 		adapter->flags |= FLAG_IS_QUAD_PORT;
 		/* mark the first port */
 		if (global_quad_port_a == 0)
@@ -285,6 +288,9 @@ static s32 e1000_get_invariants_82571(struct e1000_adapter *adapter)
 		if (adapter->flags & FLAG_IS_QUAD_PORT &&
 		    (!(adapter->flags & FLAG_IS_QUAD_PORT_A)))
 			adapter->flags &= ~FLAG_HAS_WOL;
+		/* Does not support WoL on any port */
+		if (pdev->device == E1000_DEV_ID_82571EB_SERDES_QUAD)
+			adapter->flags &= ~FLAG_HAS_WOL;
 		break;
break;
 
case e1000_82573:
diff --git a/drivers/net/e1000e/hw.h b/drivers/net/e1000e/hw.h
index 1bb2052..71f93ce 100644
--- a/drivers/net/e1000e/hw.h
+++ b/drivers/net/e1000e/hw.h
@@ -303,8 +303,11 @@ enum e1e_registers {
 #define E1000_DEV_ID_82571EB_FIBER 0x105F
 #define E1000_DEV_ID_82571EB_SERDES0x1060
 #define E1000_DEV_ID_82571EB_QUAD_COPPER
