[PATCH -mm] iseries_veth: Kill unused variable

2007-09-22 Thread Satyam Sharma

drivers/net/iseries_veth.c: In function 'veth_transmit_to_many':
drivers/net/iseries_veth.c:1174: warning: unused variable 'port'

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>

---

 drivers/net/iseries_veth.c |1 -
 1 file changed, 1 deletion(-)

diff -ruNp a/drivers/net/iseries_veth.c b/drivers/net/iseries_veth.c
--- a/drivers/net/iseries_veth.c2007-09-22 06:26:39.0 +0530
+++ b/drivers/net/iseries_veth.c2007-09-22 11:17:29.0 +0530
@@ -1171,7 +1171,6 @@ static void veth_transmit_to_many(struct
  HvLpIndexMap lpmask,
  struct net_device *dev)
 {
-   struct veth_port *port = (struct veth_port *) dev->priv;
int i, success, error;
 
success = error = 0;
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH -mm] mv643xx_eth: Remove redundant multiple initialization

2007-09-22 Thread Satyam Sharma

Of ethtool_ops->get_stats_count and ethtool_ops->get_ethtool_stats.

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>

---

 drivers/net/mv643xx_eth.c |2 --
 1 file changed, 2 deletions(-)

diff -ruNp a/drivers/net/mv643xx_eth.c b/drivers/net/mv643xx_eth.c
--- a/drivers/net/mv643xx_eth.c 2007-09-22 06:26:39.0 +0530
+++ b/drivers/net/mv643xx_eth.c 2007-09-22 13:15:41.0 +0530
@@ -2740,8 +2740,6 @@ static const struct ethtool_ops mv643xx_
.get_stats_count= mv643xx_get_stats_count,
.get_ethtool_stats  = mv643xx_get_ethtool_stats,
.get_strings= mv643xx_get_strings,
-   .get_stats_count= mv643xx_get_stats_count,
-   .get_ethtool_stats  = mv643xx_get_ethtool_stats,
.nway_reset = mv643xx_eth_nway_restart,
 };
 
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH -mm] pasemi_mac: Build fix after recent netdev stats changes

2007-09-22 Thread Satyam Sharma


On Thu, 20 Sep 2007, Satyam Sharma wrote:
> 
> BTW ppc64_defconfig didn't quite like 2.6.23-rc6-mm1 either ...
> IIRC I got build failures in:

> drivers/net/pasemi_mac.c


[PATCH -mm] pasemi_mac: Build fix after recent netdev stats changes

Unbreak the following:

drivers/net/pasemi_mac.c: In function 'pasemi_mac_clean_rx':
drivers/net/pasemi_mac.c:533: error: 'dev' undeclared (first use in this 
function)
drivers/net/pasemi_mac.c:533: error: (Each undeclared identifier is reported 
only once
drivers/net/pasemi_mac.c:533: error: for each function it appears in.)

And remove an unused static function:

drivers/net/pasemi_mac.c: At top level:
drivers/net/pasemi_mac.c:89: warning: 'read_iob_reg' defined but not used

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>

---

 drivers/net/pasemi_mac.c |9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff -ruNp a/drivers/net/pasemi_mac.c b/drivers/net/pasemi_mac.c
--- a/drivers/net/pasemi_mac.c  2007-09-22 06:26:39.0 +0530
+++ b/drivers/net/pasemi_mac.c  2007-09-22 13:03:04.0 +0530
@@ -85,11 +85,6 @@ MODULE_PARM_DESC(debug, "PA Semi MAC bit
 
 static struct pasdma_status *dma_status;
 
-static unsigned int read_iob_reg(struct pasemi_mac *mac, unsigned int reg)
-{
-   return in_le32(mac->iob_regs+reg);
-}
-
 static void write_iob_reg(struct pasemi_mac *mac, unsigned int reg,
  unsigned int val)
 {
@@ -530,8 +525,8 @@ static int pasemi_mac_clean_rx(struct pa
} else
skb->ip_summed = CHECKSUM_NONE;
 
-   dev->stats.rx_bytes += len;
-   dev->stats.rx_packets++;
+   mac->netdev->stats.rx_bytes += len;
+   mac->netdev->stats.rx_packets++;
 
skb->protocol = eth_type_trans(skb, mac->netdev);
netif_receive_skb(skb);
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.23-rc6-mm1: Build failures on ppc64_defconfig

2007-09-22 Thread Satyam Sharma


On Thu, 20 Sep 2007, Satyam Sharma wrote:
> 
> BTW ppc64_defconfig didn't quite like 2.6.23-rc6-mm1 either ...
> IIRC I got build failures in:

> drivers/net/spider_net.c

Fixing the above showed up another problem in another file of the
same driver (drivers/net/spider_net_ethtool.c)


[PATCH -mm] spider_net_ethtool: Keep up with recent netdev stats changes

Unbreak the following:

drivers/net/spider_net_ethtool.c: In function 'spider_net_get_ethtool_stats':
drivers/net/spider_net_ethtool.c:160: error: structure has no member named 
'netdev_stats'
drivers/net/spider_net_ethtool.c:161: error: structure has no member named 
'netdev_stats'
drivers/net/spider_net_ethtool.c:162: error: structure has no member named 
'netdev_stats'
drivers/net/spider_net_ethtool.c:163: error: structure has no member named 
'netdev_stats'
drivers/net/spider_net_ethtool.c:164: error: structure has no member named 
'netdev_stats'
drivers/net/spider_net_ethtool.c:165: error: structure has no member named 
'netdev_stats'
drivers/net/spider_net_ethtool.c:166: error: structure has no member named 
'netdev_stats'
make[2]: *** [drivers/net/spider_net_ethtool.o] Error 1

Also do another ARRAY_SIZE() cleanup while at it.

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>

---

 drivers/net/spider_net_ethtool.c |   18 --
 1 file changed, 8 insertions(+), 10 deletions(-)

diff -ruNp a/drivers/net/spider_net_ethtool.c b/drivers/net/spider_net_ethtool.c
--- a/drivers/net/spider_net_ethtool.c  2007-09-22 06:26:39.0 +0530
+++ b/drivers/net/spider_net_ethtool.c  2007-09-22 12:43:51.0 +0530
@@ -28,8 +28,6 @@
 #include "spider_net.h"
 
 
-#define SPIDER_NET_NUM_STATS 13
-
 static struct {
const char str[ETH_GSTRING_LEN];
 } ethtool_stats_keys[] = {
@@ -149,7 +147,7 @@ spider_net_ethtool_get_ringparam(struct 
 
 static int spider_net_get_stats_count(struct net_device *netdev)
 {
-   return SPIDER_NET_NUM_STATS;
+   return ARRAY_SIZE(ethtool_stats_keys);
 }
 
 static void spider_net_get_ethtool_stats(struct net_device *netdev,
@@ -157,13 +155,13 @@ static void spider_net_get_ethtool_stats
 {
struct spider_net_card *card = netdev->priv;
 
-   data[0] = card->netdev_stats.tx_packets;
-   data[1] = card->netdev_stats.tx_bytes;
-   data[2] = card->netdev_stats.rx_packets;
-   data[3] = card->netdev_stats.rx_bytes;
-   data[4] = card->netdev_stats.tx_errors;
-   data[5] = card->netdev_stats.tx_dropped;
-   data[6] = card->netdev_stats.rx_dropped;
+   data[0] = netdev->stats.tx_packets;
+   data[1] = netdev->stats.tx_bytes;
+   data[2] = netdev->stats.rx_packets;
+   data[3] = netdev->stats.rx_bytes;
+   data[4] = netdev->stats.tx_errors;
+   data[5] = netdev->stats.tx_dropped;
+   data[6] = netdev->stats.rx_dropped;
data[7] = card->spider_stats.rx_desc_error;
data[8] = card->spider_stats.tx_timeouts;
data[9] = card->spider_stats.alloc_rx_skb_error;
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.23-rc6-mm1: Build failures on ppc64_defconfig

2007-09-21 Thread Satyam Sharma


On Thu, 20 Sep 2007, Satyam Sharma wrote:
> 
> BTW ppc64_defconfig didn't quite like 2.6.23-rc6-mm1 either ...
> IIRC I got build failures in:

> drivers/net/spider_net.c


[PATCH -mm] spider_net: Misc build fixes after recent netdev stats changes

Unbreak the following:

drivers/net/spider_net.c: In function 'spider_net_release_tx_chain':
drivers/net/spider_net.c:818: error: 'dev' undeclared (first use in this 
function)
drivers/net/spider_net.c:818: error: (Each undeclared identifier is reported 
only once
drivers/net/spider_net.c:818: error: for each function it appears in.)
drivers/net/spider_net.c: In function 'spider_net_xmit':
drivers/net/spider_net.c:922: error: 'dev' undeclared (first use in this 
function)
drivers/net/spider_net.c: In function 'spider_net_pass_skb_up':
drivers/net/spider_net.c:1018: error: 'dev' undeclared (first use in this 
function)
drivers/net/spider_net.c: In function 'spider_net_decode_one_descr':
drivers/net/spider_net.c:1215: error: 'dev' undeclared (first use in this 
function)
make[2]: *** [drivers/net/spider_net.o] Error 1

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>

---

 drivers/net/spider_net.c |   24 +++-
 1 file changed, 11 insertions(+), 13 deletions(-)

diff -ruNp a/drivers/net/spider_net.c b/drivers/net/spider_net.c
--- a/drivers/net/spider_net.c  2007-09-22 06:26:39.0 +0530
+++ b/drivers/net/spider_net.c  2007-09-22 12:12:23.0 +0530
@@ -795,6 +795,7 @@ spider_net_set_low_watermark(struct spid
 static int
 spider_net_release_tx_chain(struct spider_net_card *card, int brutal)
 {
+   struct net_device *dev = card->netdev;
struct spider_net_descr_chain *chain = &card->tx_chain;
struct spider_net_descr *descr;
struct spider_net_hw_descr *hwdescr;
@@ -919,7 +920,7 @@ spider_net_xmit(struct sk_buff *skb, str
spider_net_release_tx_chain(card, 0);
 
if (spider_net_prepare_tx_descr(card, skb) != 0) {
-   dev->stats.tx_dropped++;
+   netdev->stats.tx_dropped++;
netif_stop_queue(netdev);
return NETDEV_TX_BUSY;
}
@@ -979,16 +980,12 @@ static void
 spider_net_pass_skb_up(struct spider_net_descr *descr,
   struct spider_net_card *card)
 {
-   struct spider_net_hw_descr *hwdescr= descr->hwdescr;
-   struct sk_buff *skb;
-   struct net_device *netdev;
-   u32 data_status, data_error;
-
-   data_status = hwdescr->data_status;
-   data_error = hwdescr->data_error;
-   netdev = card->netdev;
+   struct spider_net_hw_descr *hwdescr = descr->hwdescr;
+   struct sk_buff *skb = descr->skb;
+   struct net_device *netdev = card->netdev;
+   u32 data_status = hwdescr->data_status;
+   u32 data_error = hwdescr->data_error;
 
-   skb = descr->skb;
skb_put(skb, hwdescr->valid_size);
 
/* the card seems to add 2 bytes of junk in front
@@ -1015,8 +1012,8 @@ spider_net_pass_skb_up(struct spider_net
}
 
/* update netdevice statistics */
-   dev->stats.rx_packets++;
-   dev->stats.rx_bytes += skb->len;
+   netdev->stats.rx_packets++;
+   netdev->stats.rx_bytes += skb->len;
 
/* pass skb up to stack */
netif_receive_skb(skb);
@@ -1184,6 +1181,7 @@ static int spider_net_resync_tail_ptr(st
 static int
 spider_net_decode_one_descr(struct spider_net_card *card)
 {
+   struct net_device *dev = card->netdev;
struct spider_net_descr_chain *chain = &card->rx_chain;
struct spider_net_descr *descr = chain->tail;
struct spider_net_hw_descr *hwdescr = descr->hwdescr;
@@ -1210,7 +1208,7 @@ spider_net_decode_one_descr(struct spide
 (status == SPIDER_NET_DESCR_PROTECTION_ERROR) ||
 (status == SPIDER_NET_DESCR_FORCE_END) ) {
if (netif_msg_rx_err(card))
-   dev_err(&card->netdev->dev,
+   dev_err(&dev->dev,
   "dropping RX descriptor with state %d\n", 
status);
dev->stats.rx_dropped++;
goto bad_desc;
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.23-rc4-mm1 OOPS in forcedeth?

2007-09-17 Thread Satyam Sharma


On Mon, 17 Sep 2007, Denis V. Lunev wrote:
> Dhaval Giani wrote:
> > On Thu, Sep 13, 2007 at 11:51:33PM -0400, Andrew James Wade wrote:

> >> EIP: [] tcp_rto_min+0xb/0x15 SS:ESP 0068:c0596dec

As Vlad Yasevich mentioned, this one is already fixed in 23-rc6.

The forcedeth oops is unrelated, but multiple people have reported that
same oops now -- adding Manfred Spraul to CC. [ original thread is at:
http://lkml.org/lkml/2007/9/1/115 ]
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] Make the pr_*() family of macros in kernel.h complete

2007-09-14 Thread Satyam Sharma
> > -Original Message-
> > From: Medve Emilian-EMMEDVE1
> > Sent: Wednesday, September 12, 2007 11:40 AM
> > To: [EMAIL PROTECTED]; netdev@vger.kernel.org;
> > [EMAIL PROTECTED]; [EMAIL PROTECTED]
> > Cc: Medve Emilian-EMMEDVE1

... no maintainer explicitly copied on the original mail who'd push this in.
Little thing, but LKML is too noisy, so you need to Cc: folks explicitly ...

> > Subject: [PATCH v3] Make the pr_*() family of macros in
> > kernel.h complete
> >
> > Other/Some pr_*() macros are already defined in kernel.h, but
> > pr_err() was defined
> > multiple times in several other places
> >
> > Signed-off-by: Emil Medve <[EMAIL PROTECTED]>

I think it's a useful cleanup, patch looks good to me ...

Reviewed-by: Satyam Sharma <[EMAIL PROTECTED]>

[ Original diff is at: http://lkml.org/lkml/diff/2007/9/12/182/1 ]
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170

2007-09-06 Thread Satyam Sharma


On Thu, 6 Sep 2007, Johannes Berg wrote:

> On Thu, 2007-09-06 at 16:23 +0800, Herbert Xu wrote:
> > On Thu, Sep 06, 2007 at 10:32:33AM +0530, Satyam Sharma wrote:
> > > 
> > > > > [  382.529041]  [] dev_close+0x24/0x67
> > > > > [  382.529052]  [] ieee80211_master_stop+0x4a/0x6d 
> > > > > [mac80211]
> > 
> > This is where the bug is.  You cannot call dev_close from an
> > atomic context as i33380211_master_stop does it within spin
> > locks.
> 
> Hah, I suspected as much but didn't have a chance to look yet. I had
> plans to replace that sub_if_list with an RCU list and not require the
> lock there, but that's far off.

Unless I missed something obvious (let me know if that's the case! :-)
an RCU-protected list would suffer the same fate. list_for_each_xxx_rcu()
must be under rcu_read_lock() which == preempt_disable() ...
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH -mm 1/2] 3c59x: Fix uninitialized variable bug

2007-09-06 Thread Satyam Sharma


On Tue, 4 Sep 2007, Mark Hindley wrote:

> On Tue, Sep 04, 2007 at 02:09:47PM +0530, Satyam Sharma wrote:
> > Hi Steffen,
> > 
> > 
> > On Tue, 4 Sep 2007, Steffen Klassert wrote:
> > 
> > > On Tue, Sep 04, 2007 at 03:45:55AM +0530, Satyam Sharma wrote:
> > > > 
> > > > drivers/net/3c59x.c: In function 'vortex_up':
> > > > drivers/net/3c59x.c:1495: warning: 'err' may be used uninitialized in 
> > > > this function
> > > 
> > > This came in with the recently applied 
> > > 3c59x-check-return-of-pci_enable_device patch
> > > from Mark Hindley. I just compiled it on a PCI only machine so far, 
> > > therefore I did
> > > not notice the warning yet.
> > 
> > Hmm, the .config I built with had PCI=y as well. Probably a compiler
> > version difference -- Jeff also mentioned yesterday that some newer
> > GCC versions fail to warn about uninitialized variables cases.
> > 
> 
> Sorry, this is my bad. I have just checked: there is no warning with gcc
> 4.2 or 4.1, but 3.3 emits the warning. 

This is a GCC bug (regression, actually, as you've found out) -- no two
ways about it. Although different from the kind Jeff mentioned couple days
back -- that was about wising GCC up to false positives and /not/ emitting
warnings. But here a genuine problem was not complained about, so this is
more serious. Do you plan to open up a bug at gcc.gnu.org/bugzilla/ ?


Satyam
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170

2007-09-06 Thread Satyam Sharma


On Thu, 6 Sep 2007, Herbert Xu wrote:

> On Thu, Sep 06, 2007 at 10:32:33AM +0530, Satyam Sharma wrote:
> > 
> > > > [  382.529041]  [] dev_close+0x24/0x67
> > > > [  382.529052]  [] ieee80211_master_stop+0x4a/0x6d [mac80211]
> 
> This is where the bug is.  You cannot call dev_close from an
> atomic context as i33380211_master_stop does it within spin
> locks.

Doh, of course! I must be blind ... and wait_for_completion()'s
might_sleep() clearly didn't trigger earlier because Florian must've
had CONFIG_DEBUG_SPINLOCK_SLEEP off in his .config ...
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170

2007-09-05 Thread Satyam Sharma
Hi,


> On 02/09/07, Florian Lohoff <[EMAIL PROTECTED]> wrote:
> >
> > Hi,
> > with current git i got this when "ifconfig eth1" down. eth1 had a mac
> > address which looked really like an eth1394 ethernet although the module
> > was not loaded. Something is really broken in 2.6.23-currentgit. I always 
> > get the

-currentgit would be -rc5 right?


> > sysfs rename issues which are discussed to be an udev issue. Then i see
> > a eth1394 mac address on an interface which typically shouldn exist
> > (udev should rename the wireless to eth1) and when issueing an
> > ifconfig eth1 down i get a
> >
> > BUG: scheduling while atomic: ifconfig/0x0002/4170

Is this reproducible? Also, please send your .config.

BTW the calltrace below shows that eth1 was the wireless interface when
you tried to "down" it.


> > On the next boot i see the eth1394 mac address on the wireless interface
> > wmaster0_rename whereas eth1 is active (the wireless) and has the correct
> > ip address.

I don't think the "scheduling while atomic" bug you saw is related to
the other problem you're seeing ... these are probably a bunch of all
different issues, but I'm not sure if eth1394 is involved at all.


> > I dont get it - this all looks really messed up. udev is
> > debian sid 114-2.

True, messed up it indeed is.


> > eth0  Link encap:Ethernet  HWaddr 00:17:42:13:45:8C
> >   UP BROADCAST MULTICAST  MTU:1500  Metric:1
> >   RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> >   TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> >   collisions:0 txqueuelen:1000
> >   RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
> >   Interrupt:19
> >
> > eth1  Link encap:Ethernet  HWaddr 00:18:DE:63:F0:B3
> >   inet addr:195.71.97.208  Bcast:195.71.97.223  Mask:255.255.255.224
> >   inet6 addr: fe80::218:deff:fe63:f0b3/64 Scope:Link
> >   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> >   RX packets:2079 errors:0 dropped:0 overruns:0 frame:0
> >   TX packets:2220 errors:0 dropped:0 overruns:0 carrier:0
> >   collisions:0 txqueuelen:1000
> >   RX bytes:508959 (497.0 KiB)  TX bytes:261123 (255.0 KiB)
> >
> > wmaster0_ Link encap:UNSPEC  HWaddr 
> > 00-18-DE-63-F0-B3-30-3A-00-00-00-00-00-00-00-00
> >   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> >   RX packets:0 errors:0 dropped:0 overruns:0 frame:0
> >   TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
> >   collisions:0 txqueuelen:1000
> >   RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
> >
> >
> > [   14.300736] VFS: Mounted root (ext3 filesystem) readonly.
> > [   14.300902] Freeing unused kernel memory: 216k freed
> > [   17.618804] irda_init()
> > [   17.618817] NET: Registered protocol family 23
> > [   17.636399] ACPI: PCI Interrupt :02:00.0[A] -> GSI 16 (level, low) 
> > -> IRQ 19
> > [   17.636588] PCI: Setting latency timer of device :02:00.0 to 64
> > [   17.636619] sky2 :02:00.0: v1.17 addr 0xf000 irq 19 Yukon-EC 
> > Ultra (0xb4) rev 2
> > [   17.648081] parport_pc 00:0c: reported by Plug and Play ACPI
> > [   17.648206] parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE,EPP]
> > [   17.653652] sky2 eth0: addr 00:17:42:13:45:8c
> > [   17.680848] input: Video Bus as /class/input/input6
> > [   17.680961] ACPI: Video Device [GFX0] (multi-head: yes  rom: no  post: 
> > no)
> > [   17.757019] usbcore: registered new interface driver usbfs
> > [   17.757139] usbcore: registered new interface driver hub
> > [   17.757264] usbcore: registered new device driver usb
> > [   17.824819] Yenta: CardBus bridge found at :08:03.0 [10cf:131e]
> > [   17.824941] Yenta O2: res at 0x94/0xD4: 00/ea
> > [   17.825034] Yenta O2: enabling read prefetch/write burst
> > [   17.828363] hda: ATAPI 24X DVD-ROM DVD-R CD-R/RW drive, 2048kB Cache, 
> > UDMA(33)
> > [   17.828838] Uniform CD-ROM driver Revision: 3.20
> > [   17.891481] USB Universal Host Controller Interface driver v3.0
> > [   17.891650] ACPI: PCI Interrupt :00:1d.0[A] -> GSI 23 (level, low) 
> > -> IRQ 22
> > [   17.891840] PCI: Setting latency timer of device :00:1d.0 to 64
> > [   17.891844] uhci_hcd :00:1d.0: UHCI Host Controller
> > [   17.892155] uhci_hcd :00:1d.0: new USB bus registered, assigned bus 
> > number 1
> > [   17.892327] uhci_hcd :00:1d.0: irq 22, io base 0x1820
> > [   17.892571] usb usb1: configuration #1 chosen from 1 choice
> > [   17.892689] hub 1-0:1.0: USB hub found
> > [   17.892784] hub 1-0:1.0: 2 ports detected
> > [   17.924265] found SMC SuperIO Chip (devid=0x7a rev=00 base=0x002e): 
> > LPC47N227
> > [   17.924390] smsc_superio_flat(): fir: 0x6e8, sir: 0x2e8, dma: 03, irq: 
> > 3, mode: 0x0e
> > [   17.924526] smsc_ircc_present: can't get sir_base of 0x2e8
> > [   17.954918] Yenta: ISA IRQ mask 0x0c38, PCI irq 19
> > [   17.955009] Socket status: 3006
> > [   17.955094] Yenta: Raising subordinate bus# o

Re: kernel crashes inside MV643xx driver

2007-09-05 Thread Satyam Sharma


On Wed, 5 Sep 2007, Dale Farnsworth wrote:

> On Wed, Sep 05, 2007 at 08:24:52AM -0700, Andrew Morton wrote:
> > > On Mon, 20 Aug 2007 14:38:57 +0800 gshan <[EMAIL PROTECTED]> wrote:
> > > Hi All,
> > > 
> > > After I started the NFS server, it crashed:
> > > 
> > > <3>Badness in local_bh_enable at 
> > > /home/cli4/sandbox/main/TelicaRoot/components/mvlinux/cge/devkit/lsp/7xx/linux/kernel/softirq.c:195
> > > Badness in local_bh_enable at 
> > > /home/cli4/sandbox/main/TelicaRoot/components/mvlinux/cge/devkit/lsp/7xx/linux/kernel/softirq.c:195
> > > Call trace:
> > >  [c0005340] check_bug_trap+0xbc/0x11c
> > >  [c0005604] ProgramCheckException+0x264/0x2bc
> > >  [c0004ac4] ret_from_except_full+0x0/0x4c
> > >  [c0022ae4] local_bh_enable+0x18/0x80
> > >  [c024648c] skb_copy_bits+0x168/0x3b8
> > >  [c024db44] __skb_linearize+0x90/0x150
> > >  [c020e8a4] mv643xx_eth_start_xmit+0x4c0/0x5bc
> > >  [c025c934] qdisc_restart+0xac/0x2bc
> > >  [c024de9c] dev_queue_xmit+0x298/0x34c
> > >  [c0269814] ip_finish_output+0x140/0x2b8
> > >  [c026a3ac] ip_fragment+0x3cc/0x6e0
> > >  [c026bac8] ip_push_pending_frames+0x3dc/0x46c
> > >  [c0289ec4] udp_push_pending_frames+0x10c/0x1cc
> > >  [c028a7c4] udp_sendpage+0x104/0x188
> > >  [c0292fc8] inet_sendpage+0x90/0xb8
> > > 
> > > I searched the webs and found the similar problems:

> > > http://www.mail-archive.com/netdev@vger.kernel.org/msg05199.html

> > > http://oss.sgi.com/archives/netdev/2005-09/msg00025.html
> > > 
> > > Who knew there are fixes for the problem?
> > 
> > Well that got a tremendous response, didn't it?

I thought of replying to this post when I saw it couple weeks back, but
didn't because (1) that's an ancient kernel and (2) the first link that
was mentioned up there in the original post itself pointed to a patch to
solve it (which I verified was since applied to mainline too).


> > What do you mean by "crashed"?  The above is a warning and the system
> > should have survived.
> > 
> > Which kernel version is being used?
> 
> That is the key question.  From the pathnames, I suspect that gshan is
> using a MontaVista version.  I'm still (especially) interested, since
> MontaVista pays me.
> 
> BTW, I never received the original message on netdev or linux-kernel.
> Hmm.  Thanks to Andrew for replying.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Fix a potential NULL pointer dereference in uli526x_interrupt() in drivers/net/tulip/uli526x.c

2007-09-04 Thread Satyam Sharma


On Tue, 4 Sep 2007, Satyam Sharma wrote:

> Hi Micah,
> 
> 
> On Tue, 4 Sep 2007, Micah Gruber wrote:
> 
> > This patch fixes a potential null dereference bug where we dereference
> > dev before a null check. This patch simply moves the dereferencing after
> > the null check.
> > 
> > Signed-off-by: Micah Gruber <[EMAIL PROTECTED]>
> > ---
> > 
> > --- a/drivers/net/tulip/uli526x.c
> > +++ b/drivers/net/tulip/uli526x.c
> > @@ -663,7 +663,7 @@
> >  {
> > struct net_device *dev = dev_id;
> > struct uli526x_board_info *db = netdev_priv(dev);
> > -   unsigned long ioaddr;
> > +   unsigned long ioaddr = dev->base_addr;
> > unsigned long flags;
> >  
> > if (!dev) {
> > @@ -671,8 +671,6 @@
> > return IRQ_NONE;
> > }
> >  
> > -   ioaddr = dev->base_addr;
> > -
> > spin_lock_irqsave(&db->lock, flags);
> > outl(0, ioaddr + DCR7);
> 
> 
> [The patch is reversed.]
> 
> But it is also complete -- you've left out the "netdev_priv(dev)"
 
I meant:   incomplete

> statement _just_ before the dereference that you've pushed down.
> Coverity wasn't smart enough to tell, but that's actually a dev->priv,
> so we'll _still_ be dereferencing dev before checking it for NULL below.
> And lastly, the patch isn't quite correct. It is the (!dev) check that
> is redundant and must be removed instead :-)
> 
> I bet more such pointless checks exist all over. It makes more sense to
> remove them than unnecessarily try to solve "potential" NULL dereferences,
> that we (naturally) never saw earlier, because they're impossible. ISTR
> pointing out two similar patches a month or so back when Jeff pushed net
> driver patches upstream, IIRC ... ]
> 
> 
> Satyam
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH -mm 1/2] 3c59x: Fix uninitialized variable bug

2007-09-04 Thread Satyam Sharma
Hi Steffen,


On Tue, 4 Sep 2007, Steffen Klassert wrote:

> On Tue, Sep 04, 2007 at 03:45:55AM +0530, Satyam Sharma wrote:
> > 
> > drivers/net/3c59x.c: In function 'vortex_up':
> > drivers/net/3c59x.c:1495: warning: 'err' may be used uninitialized in this 
> > function
> 
> This came in with the recently applied 
> 3c59x-check-return-of-pci_enable_device patch
> from Mark Hindley. I just compiled it on a PCI only machine so far, therefore 
> I did
> not notice the warning yet.

Hmm, the .config I built with had PCI=y as well. Probably a compiler
version difference -- Jeff also mentioned yesterday that some newer
GCC versions fail to warn about uninitialized variables cases.


> > is a genuine bug. The function returns an uninitialized value of 'err'
> > back to the caller, which expects it to be 0 for success cases. Let's
> > fix this by explicitly initializing 'err' to zero.
> > 
> > Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>
> Acked-by: Steffen Klassert <[EMAIL PROTECTED]>


Thanks,

Satyam
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Fix a potential NULL pointer dereference in uli526x_interrupt() in drivers/net/tulip/uli526x.c

2007-09-04 Thread Satyam Sharma
Hi Micah,


On Tue, 4 Sep 2007, Micah Gruber wrote:

> This patch fixes a potential null dereference bug where we dereference
> dev before a null check. This patch simply moves the dereferencing after
> the null check.
> 
> Signed-off-by: Micah Gruber <[EMAIL PROTECTED]>
> ---
> 
> --- a/drivers/net/tulip/uli526x.c
> +++ b/drivers/net/tulip/uli526x.c
> @@ -663,7 +663,7 @@
>  {
>   struct net_device *dev = dev_id;
>   struct uli526x_board_info *db = netdev_priv(dev);
> - unsigned long ioaddr;
> + unsigned long ioaddr = dev->base_addr;
>   unsigned long flags;
>  
>   if (!dev) {
> @@ -671,8 +671,6 @@
>   return IRQ_NONE;
>   }
>  
> - ioaddr = dev->base_addr;
> -
>   spin_lock_irqsave(&db->lock, flags);
>   outl(0, ioaddr + DCR7);


[The patch is reversed.]

But it is also complete -- you've left out the "netdev_priv(dev)"
statement _just_ before the dereference that you've pushed down.
Coverity wasn't smart enough to tell, but that's actually a dev->priv,
so we'll _still_ be dereferencing dev before checking it for NULL below.
And lastly, the patch isn't quite correct. It is the (!dev) check that
is redundant and must be removed instead :-)

I bet more such pointless checks exist all over. It makes more sense to
remove them than unnecessarily try to solve "potential" NULL dereferences,
that we (naturally) never saw earlier, because they're impossible. ISTR
pointing out two similar patches a month or so back when Jeff pushed net
driver patches upstream, IIRC ... ]


Satyam
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH -mm 2/2] 3c59x MAINTAINERS

2007-09-03 Thread Satyam Sharma
Remove duplicate entry for the same driver.

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>

---

 MAINTAINERS |6 --
 1 file changed, 6 deletions(-)

--- linux-2.6.23-rc4-mm1/MAINTAINERS~fix2007-09-04 03:49:16.0 
+0530
+++ linux-2.6.23-rc4-mm1/MAINTAINERS2007-09-04 03:49:28.0 +0530
@@ -104,12 +104,6 @@ M: [EMAIL PROTECTED]
 L: netdev@vger.kernel.org
 S: Maintained
 
-3C59X NETWORK DRIVER
-P: Steffen Klassert
-M: [EMAIL PROTECTED]
-L: netdev@vger.kernel.org
-S: Maintained
-
 3CR990 NETWORK DRIVER
 P: David Dillow
 M: [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH -mm 1/2] 3c59x: Fix uninitialized variable bug

2007-09-03 Thread Satyam Sharma

drivers/net/3c59x.c: In function 'vortex_up':
drivers/net/3c59x.c:1495: warning: 'err' may be used uninitialized in this 
function

is a genuine bug. The function returns an uninitialized value of 'err'
back to the caller, which expects it to be 0 for success cases. Let's
fix this by explicitly initializing 'err' to zero.

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>

---

 drivers/net/3c59x.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- linux-2.6.23-rc4-mm1/drivers/net/3c59x.c~fix2007-09-04 
03:29:40.0 +0530
+++ linux-2.6.23-rc4-mm1/drivers/net/3c59x.c2007-09-04 03:30:08.0 
+0530
@@ -1492,7 +1492,8 @@ vortex_up(struct net_device *dev)
struct vortex_private *vp = netdev_priv(dev);
void __iomem *ioaddr = vp->ioaddr;
unsigned int config;
-   int i, mii_reg1, mii_reg5, err;
+   int i, mii_reg1, mii_reg5;
+   int err = 0;
 
if (VORTEX_PCI(vp)) {
pci_set_power_state(VORTEX_PCI(vp), PCI_D0);/* Go active */
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH -mm] sunrpc svc: Shut up bogus uninitialized variable warning

2007-09-02 Thread Satyam Sharma

net/sunrpc/svc.c: In function ‘__svc_create_thread’:
net/sunrpc/svc.c:550: warning: ‘oldmask.bits[0u]’ may be used uninitialized in 
this function

is a bogus warning, but gcc isn't smart enough to see why. We cannot just
reorganize the code in the function, because we want the set_cpus_allowed()
restore to happen only after the kernel_thread() is forked. Alas, we have
to use cpus_clear() to initialize oldmask instead to keep gcc happy.

Also add some comments to describe what's happening in the function.

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>

---

 net/sunrpc/svc.c |7 +++
 1 file changed, 7 insertions(+)

--- linux-2.6.23-rc4-mm1/net/sunrpc/svc.c~fix   2007-09-02 22:55:25.0 
+0530
+++ linux-2.6.23-rc4-mm1/net/sunrpc/svc.c   2007-09-02 23:03:45.0 
+0530
@@ -568,17 +568,24 @@ __svc_create_thread(svc_thread_fn func, 
rqstp->rq_server = serv;
rqstp->rq_pool = pool;
 
+   /* Only to prevent gcc warning */
+   cpus_clear(oldmask);
+
+   /* Temporarily change CPU affinity before forking svc thread */
if (serv->sv_nrpools > 1)
have_oldmask = svc_pool_map_set_cpumask(pool->sp_id, &oldmask);
 
+   /* Will inherit current->cpus_allowed */
error = kernel_thread((int (*)(void *)) func, rqstp, 0);
 
+   /* Restore old cpus_allowed */
if (have_oldmask)
set_cpus_allowed(current, oldmask);
 
if (error < 0)
goto out_thread;
svc_sock_update_bufs(serv);
+
error = 0;
 out:
return error;

Re: [OOPS] 2.6.23-rc5 ? network/via-rhine [was: hang with CONFIG_MCYRIXIII]

2007-09-02 Thread Satyam Sharma


On Sun, 2 Sep 2007, Mark Hindley wrote:
> 
> BUG: unable to handle kernel NULL pointer dereference at virtual address 
> 0025
> [...]
> Call Trace:
>  [] tcp_rtt_estimator+0xba/0x100
> [...]
> EIP: [] tcp_rto_min+0x8/0x12 SS:ESP 0068:c0341dec

Third report of this oops within past few hours :-)

You want the patch available at:

(very long URL)
http://git.kernel.org/?p=linux/kernel/git/davem/net-2.6.git;a=blobdiff;f=net/ipv4/tcp_input.c;h=bbad2cdb74b7c18c0ae742be241857b126f06890;hp=1ee72127462bf37ed5ef834ebc7b1cf070d5;hb=5c127c58ae9bf196d787815b1bd6b0aec5aee816;hpb=88282c6ecf0dacefda6090b0289d35e8a3cc6221
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.23-rc4-mm1 OOPS in forcedeth?

2007-09-02 Thread Satyam Sharma


On Sun, 2 Sep 2007, [EMAIL PROTECTED] wrote:
> 
> > > [EMAIL PROTECTED] wrote:
> > > > 
> > > > On this machine (Athlon 64 X2 4600, 4 GiB memory, lots of disks),
> > > > 2.6.23-rc1-mm2 runs fine. 2.6.23-rc4-mm1 reproducably dies within 
> > > > seconds of
> > > > starting
> > > > a rsync session on another PC against this machine.
> > > > 
> > > > NULL pointer dereference
> > > > code:   nv_napi_poll+0x108
> > > > trace:  net_rx_action+0xab
> > > > __do_softirq+0x74
> > > > call_softirq+0x1c
> > > > do_softirq+0x3d
> > > > irq_exit+0x85
> > > > do_IRQ+0x85
> > > > ret_from_intr+0x0
> 
> There are 4 pictures of oopses here:
> 
> http://www.xs4all.nl/~thunder7/oops_2623rc4mm1_1.jpg
> http://www.xs4all.nl/~thunder7/oops_2623rc4mm1_2.jpg
> http://www.xs4all.nl/~thunder7/oops_2623rc4mm1_3.jpg
> http://www.xs4all.nl/~thunder7/oops_2623rc4mm1_4.jpg

OK, I've been pouring over forcedeth.c and the newly introduce NAPI code,
but didn't debug this yet, so I'll at least lay out the situation so that
somebody else who's more experienced @netdev can pick up from here with
minimal time wastage.

Here's what's happening (repeatedly, reproducibly) on Jurriaan's x64 box:

(1) The following NULL dereference oops:

nv_rx_process_optimized(), inlined from nv_napi_poll(), found that
"skb" i.e. np->get_rx_ctx->skb == NULL when trying to update
skb->ip_summed.

(2) The following BUG in napi_complete():

BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));

from the nv_napi_poll()->__netif_rx_complete()->napi_complete()
callchain is triggering. IOW napi_complete() found that a NAPI
poll wasn't/shouldn't have been scheduled at all (!)

The above two problems appear to be occurring independently, AFAICT.


Satyam
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.23-rc4-mm1: unpingable box and NULL dereference at tcp_rto_min()

2007-09-01 Thread Satyam Sharma


On Sun, 2 Sep 2007, Alexey Dobriyan wrote:
> 
> Unable to handle kernel NULL pointer dereference at 0039 RIP: 
>  [] tcp_rto_min+0xc/0x20

tcp_rto_min() lacks a check-for-NULL. You want 5c127c58ae9bf196 from
the net-2.6.git tree -- so this will be gone in -rc6.

> P.S.: uh-oh, it's "[TCP] Allow minnimum RTO ..." aka 05bb1fad1cde

Yup, it came from this last commit in net-2.6 before -rc5.

[ Considering it's pretty core code (and thus the oops fairly easily
  reproducible), I initially thought this must've come from net-2.6.24.
  I suspect lot of testers might hit this, so would be wise to put that
  patch up as a hot-fix ? ]


Satyam
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH -mm] net/sched/sch_cbq.c: Shut up uninitialized variable warning

2007-09-01 Thread Satyam Sharma

net/sched/sch_cbq.c: In function 'cbq_enqueue':
net/sched/sch_cbq.c:383: warning: 'ret' may be used uninitialized in this 
function

has been verified to be a bogus case. So let's shut it up.

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>

---

 net/sched/sch_cbq.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- linux-2.6.23-rc4-mm1/net/sched/sch_cbq.c~fix2007-09-02 
06:45:08.0 +0530
+++ linux-2.6.23-rc4-mm1/net/sched/sch_cbq.c2007-09-02 06:44:37.0 
+0530
@@ -380,7 +380,7 @@ cbq_enqueue(struct sk_buff *skb, struct 
 {
struct cbq_sched_data *q = qdisc_priv(sch);
int len = skb->len;
-   int ret;
+   int uninitialized_var(ret);
struct cbq_class *cl = cbq_classify(skb, sch, &ret);
 
 #ifdef CONFIG_NET_CLS_ACT
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.23-rc4-mm1 OOPS in forcedeth?

2007-09-01 Thread Satyam Sharma
Hi Jurriaan,


> [EMAIL PROTECTED] wrote:
> > From: Andrew Morton <[EMAIL PROTECTED]>
> > Date: Fri, Aug 31, 2007 at 09:58:22PM -0700
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc4/2.6.23-rc4-mm1/
> > > 
> > On this machine (Athlon 64 X2 4600, 4 GiB memory, lots of disks),
> > 2.6.23-rc1-mm2 runs fine. 2.6.23-rc4-mm1 reproducably dies within seconds of
> > starting
> > a rsync session on another PC against this machine.
> > 
> > NULL pointer dereference
> > code:   nv_napi_poll+0x108
> > trace:  net_rx_action+0xab
> > __do_softirq+0x74
> > call_softirq+0x1c
> > do_softirq+0x3d
> > irq_exit+0x85
> > do_IRQ+0x85
> > ret_from_intr+0x0

The dmesg you posted below doesn't cover the messages from this oops
itself. As you mentioned you can reproduce this oops easily, please do so,
and post the *full* oops log (if it doesn't get logged to disk, you can
try taking digicam photo, or write down *all* the messages and post here).
I built an x86_64 kernel as per your .config, but don't see any memory
dereference at nv_napi_poll+0x108 -- could be toolchain differences.

Else, can you run:
$ gdb ./vmlinux

and then:
(gdb) l *nv_napi_poll+0x108

and send us the output?


Satyam
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.23-rc4-mm1 "no CRC" MODPOST warnings

2007-09-01 Thread Satyam Sharma


On Fri, 31 Aug 2007, Andrew Morton wrote:
> 
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc4/2.6.23-rc4-mm1/

Got these on an i386 build with CONFIG_MODVERSIONS=y ...

WARNING: "div64_64" [net/netfilter/xt_connbytes.ko] has no CRC!
WARNING: "div64_64" [net/ipv4/tcp_cubic.ko] has no CRC!

Full .config at: http://www.cse.iitk.ac.in/users/ssatyam/config-mm
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug? Realtek RTL-8169 Gigabit Ethernet and High Resolution Timers

2007-08-30 Thread Satyam Sharma
[ Adding relevant Cc:'s ]


On Thu, 30 Aug 2007, n wrote:

> I found a bug when using the Ethernet controller: Realtek Semiconductor Co.,
> Ltd. RTL-8169 Gigabit Ethernet (rev 10)  ethernet card and kernel High
> Resolution Timers (menuconfig -> Processor type and features -> High
> Resolution Timer Support )
> after about 20 kernel compiles i finnaly figured out this option was making
> the ethernet card slow
> tested kernels 2.6.22,2.6.22.5, 2.6.23-rc4 (running now)
> 
> the cpu is the pc is a p3 667 mhz so its cpu limited i guess and maxes at
> 400mbit~. (without High Resolution Timers )
> with High Resolution Timers off it would only do 50-70mbit.
> 
> r8169 (compile as module + napi)
> High Resolution Timer Support (enabled)
> tested with iperf also system is very unresponsive i cant even open a ssh
> session or type a command while the test ran
> [  5]  0.0-10.0 sec  64.5 MBytes  54.0 Mbits/sec
> 
> r8169 (compile as module + napi)
> High Resolution Timer Support (disabled)
> [  3]  0.0-10.0 sec552 MBytes463 Mbits/sec
> 
> as you can see with it disabled the speed really improved.
> but im wondering what does High Resolution Timer's do and what does it have to
> do with a pci gigabit ethernet card that would slow it down
> maybe someone else could test and see if they get the same results
> 
> this might solve alot of problems with this card since alot of distros compile
> Resolution Timer default to enabled...
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()

2007-08-24 Thread Satyam Sharma
Hi Denys,


On Fri, 24 Aug 2007, Denys Vlasenko wrote:

> On Thursday 16 August 2007 01:39, Satyam Sharma wrote:
> >
> >  static inline void wait_for_init_deassert(atomic_t *deassert)
> >  {
> > -   while (!atomic_read(deassert));
> > +   while (!atomic_read(deassert))
> > +   cpu_relax();
> > return;
> >  }
> 
> For less-than-briliant people like me, it's totally non-obvious that
> cpu_relax() is needed for correctness here, not just to make P4 happy.

This thread has been round-and-round with exactly the same discussions
:-) I had proposed few such variants to make a compiler barrier implicit
in atomic_{read,set} myself, but frankly, at least personally speaking
(now that I know better), I'm not so much in favour of implicit barriers
(compiler, memory or both) in atomic_{read,set}.

This might sound like an about-turn if you read my own postings to Nick
Piggin from a week back, but I do agree with most his opinions on the
matter now -- separation of barriers from atomic ops is actually good,
beneficial to certain code that knows what it's doing, explicit usage
of barriers stands out more clearly (most people here who deal with it
do know cpu_relax() is an explicit compiler barrier) compared to an
implicit usage in an atomic_read() or such variant ...


> IOW: "atomic_read" name quite unambiguously means "I will read
> this variable from main memory". Which is not true and creates
> potential for confusion and bugs.

I'd have to disagree here -- atomic ops are all about _atomicity_ of
memory accesses, not _making_ them happen (or visible to other CPUs)
_then and there_ itself. The latter are the job of barriers.

The behaviour (and expectations) are quite comprehensively covered in
atomic_ops.txt -- let alone atomic_{read,set}, even atomic_{inc,dec}
are permitted by archs' implementations to _not_ have any memory
barriers, for that matter. [It is unrelated that on x86 making them
SMP-safe requires the use of the LOCK prefix that also happens to be
an implicit memory barrier.]

An argument was also made about consistency of atomic_{read,set} w.r.t.
the other atomic ops -- but clearly, they are all already consistent!
All of them are atomic :-) The fact that atomic_{read,set} do _not_
require any inline asm or LOCK prefix whereas the others do, has to do
with the fact that unlike all others, atomic_{read,set} are not RMW ops
and hence guaranteed to be atomic just as they are in plain & simple C.

But if people do seem to have a mixed / confused notion of atomicity
and barriers, and if there's consensus, then as I'd said earlier, I
have no issues in going with the consensus (eg. having API variants).
Linus would be more difficult to convince, however, I suspect :-)


Satyam
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-21 Thread Satyam Sharma


On Tue, 21 Aug 2007, Chris Snook wrote:

> David Miller wrote:
> > From: Linus Torvalds <[EMAIL PROTECTED]>
> > Date: Mon, 20 Aug 2007 22:46:47 -0700 (PDT)
> > 
> > > Ie a "barrier()" is likely _cheaper_ than the code generation downside
> > > from using "volatile".
> > 
> > Assuming GCC were ever better about the code generation badness
> > with volatile that has been discussed here, I much prefer
> > we tell GCC "this memory piece changed" rather than "every
> > piece of memory has changed" which is what the barrier() does.
> > 
> > I happened to have been scanning a lot of assembler lately to
> > track down a gcc-4.2 miscompilation on sparc64, and the barriers
> > do hurt quite a bit in some places.  Instead of keeping unrelated
> > variables around cached in local registers, it reloads everything.
> 
> Moore's law is definitely working against us here.  Register counts, pipeline
> depths, core counts, and clock multipliers are all increasing in the long run.
> At some point in the future, barrier() will be universally regarded as a
> hammer too big for most purposes.

I do agree, and the important point to note is that the benefits of a
/lighter/ compiler barrier, such as what David referred to above, _can_
be had without having to do anything with the "volatile" keyword at all.
And such a primitive has already been mentioned/proposed on this thread.


But this is all tangential to the core question at hand -- whether to have
implicit (compiler, possibly "light-weight" of the kind referred above)
barrier semantics in atomic ops that do not have them, or not.

I was lately looking in the kernel for _actual_ code that uses atomic_t
and benefits from the lack of any implicit barrier, with the compiler
being free to cache the atomic_t in a register. Now that often does _not_
happen, because all other ops (implemented in asm with LOCK prefix on x86)
_must_ therefore constrain the atomic_t to memory anyway. So typically all
atomic ops code sequences end up operating on memory.

Then I did locate sched.c:select_nohz_load_balancer() -- it repeatedly
references the same atomic_t object, and the code that I saw generated
(with CC_OPTIMIZE_FOR_SIZE=y) did cache it in a register for a sequence of
instructions. It uses atomic_cmpxchg, thereby not requiring explicit
memory barriers anywhere in the code, and is an example of an atomic_t
user that is safe, and yet benefits from its memory loads/stores being
elided/coalesced by the compiler.


# at this point, %%eax holds num_online_cpus() and
# %%ebx holds cpus_weight(nohz.cpu_mask)
# the variable "cpu" is in %esi

0xc1018e1d:  cmp%eax,%ebx   # if No.A.
0xc1018e1f:  mov0xc134d900,%eax # first atomic_read()
0xc1018e24:  jne0xc1018e36
0xc1018e26:  cmp%esi,%eax   # if No.B.
0xc1018e28:  jne0xc1018e80  # returns with 0
0xc1018e2a:  movl   $0x,0xc134d900  # atomic_set(-1), and ...
0xc1018e34:  jmp0xc1018e80  # ... returns with 0
0xc1018e36:  cmp$0x,%eax# if No.C. (NOTE!)
0xc1018e39:  jne0xc1018e46
0xc1018e3b:  lock cmpxchg %esi,0xc134d900   # atomic_cmpxchg()
0xc1018e43:  inc%eax
0xc1018e44:  jmp0xc1018e48
0xc1018e46:  cmp%esi,%eax   # if No.D. (NOTE!)
0xc1018e48:  jne0xc1018e80  # if !=, default return 0 (if 
No.E.)
0xc1018e4a:  jmp0xc1018e84  # otherwise (==) returns with 1

The above is:

if (cpus_weight(nohz.cpu_mask) == num_online_cpus()) {  /* if No.A. */
if (atomic_read(&nohz.load_balancer) == cpu)/* if No.B. */
atomic_set(&nohz.load_balancer, -1);/* XXX */
return 0;
}
if (atomic_read(&nohz.load_balancer) == -1) {   /* if No.C. */
/* make me the ilb owner */
if (atomic_cmpxchg(&nohz.load_balancer, -1, cpu) == -1) /* if 
No.E. */
return 1;
} else if (atomic_read(&nohz.load_balancer) == cpu) /* if No.D. */
return 1;
...
...
return 0; /* default return from function */

As you can see, the atomic_read()'s of "if"s Nos. B, C, and D, were _all_
coalesced into a single memory reference "mov0xc134d900,%eax" at the
top of the function, and then "if"s Nos. C and D simply used the value
from %%eax itself. But that's perfectly safe, such is the logic of this
function. It uses cmpxchg _whenever_ updating the value in the memory
atomic_t and then returns appropriately. The _only_ point that a casual
reader may find racy is that marked /* XXX */ above -- atomic_read()
followed by atomic_set() with no barrier in between. But even that is ok,
because if one thread ever finds that condition to succeed, it is 100%
guaranteed no other thread on any other CPU will find _any_ condition
to be true, thereby avoiding any race in the modification of that value.

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-18 Thread Satyam Sharma


On Sat, 18 Aug 2007, Segher Boessenkool wrote:

> > > GCC manual, section 6.1, "When
  ^^
> > > is a Volatile Object Accessed?" doesn't say anything of the
  ^^^
> > > kind.
  ^

> > True, "implementation-defined" as per the C standard _is_ supposed to mean
^

> > "unspecified behaviour where each implementation documents how the choice
> > is made". So ok, probably GCC isn't "documenting" this
  

> > implementation-defined behaviour which it is supposed to, but can't really
> > fault them much for this, probably.
> 
> GCC _is_ documenting this, namely in this section 6.1.

(Again totally petty, but) Yes, but ...

> It doesn't
  ^^
> mention volatile-casted stuff.  Draw your own conclusions.
  ^^

... exactly. So that's why I said "GCC isn't documenting _this_".

Man, try _reading_ mails before replying to them ...
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-18 Thread Satyam Sharma


On Fri, 17 Aug 2007, Linus Torvalds wrote:

> On Sat, 18 Aug 2007, Satyam Sharma wrote:
> > 
> > No code does (or would do, or should do):
> > 
> > x.counter++;
> > 
> > on an "atomic_t x;" anyway.
> 
> That's just an example of a general problem.
> 
> No, you don't use "x.counter++". But you *do* use
> 
>   if (atomic_read(&x) <= 1)
> 
> and loading into a register is stupid and pointless, when you could just 
> do it as a regular memory-operand to the cmp instruction.

True, but that makes this a bad/poor code generation issue with the
compiler, not something that affects the _correctness_ of atomic ops if
"volatile" is used for that counter object (as was suggested), because
we'd always use the atomic_inc() etc primitives to do increments, which
are always (should be!) implemented to be atomic.


> And as far as the compiler is concerned, the problem is the 100% same: 
> combining operations with the volatile memop.
> 
> The fact is, a compiler that thinks that
> 
>   movl mem,reg
>   cmpl $val,reg
> 
> is any better than
> 
>   cmpl $val,mem
> 
> is just not a very good compiler.

Absolutely, this is definitely a bug report worth opening with gcc. And
what you've said to explain this previously sounds definitely correct --
seeing "volatile" for any access does appear to just scare the hell out
of gcc and makes it generate such (poor) code.


Satyam
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-18 Thread Satyam Sharma
[ LOL, you _are_ shockingly petty! ]


On Sat, 18 Aug 2007, Segher Boessenkool wrote:

> > > The documentation simply doesn't say "+m" is allowed.  The code to
> > > allow it was added for the benefit of people who do not read the
> > > documentation.  Documentation for "+m" might get added later if it
> > > is decided this [the code, not the documentation] is a sane thing
> > > to have (which isn't directly obvious).
> > 
> > Huh?
> > 
> > "If the (current) documentation doesn't match up with the (current)
> > code, then _at least one_ of them has to be (as of current) wrong."
> > 
> > I wonder how could you even try to disagree with that.
> 
> Easy.
> 
> The GCC documentation you're referring to is the user's manual.
> See the blurb on the first page:
> 
> "This manual documents how to use the GNU compilers, as well as their
> features and incompatibilities, and how to report bugs.  It corresponds
> to GCC version 4.3.0.  The internals of the GNU compilers, including
> how to port them to new targets and some information about how to write
> front ends for new languages, are documented in a separate manual."
> 
> _How to use_.  This documentation doesn't describe in minute detail
> everything the compiler does (see the source code for that -- no, it
> isn't described in the internals manual either).

Wow, now that's a nice "disclaimer". By your (poor) standards of writing
documentation, one can as well write any factually incorrect stuff that
one wants in a document once you've got such a blurb in place :-)


> If it doesn't tell you how to use "+m", and even tells you _not_ to
> use it, maybe that is what it means to say?  It doesn't mean "+m"
> doesn't actually do something.  It also doesn't mean it does what
> you think it should do.  It might do just that of course.  But treating
> writing C code as an empirical science isn't such a smart idea.

Oh, really? Considering how much is (left out of being) documented, often
one would reasonably have to experimentally see (with testcases) how the
compiler behaves for some given code. Well, at least _I_ do it often
(several others on this list do as well), and I think there's everything
smart about it rather than having to read gcc sources -- I'd be surprised
(unless you have infinite free time on your hands, which does look like
teh case actually) if someone actually prefers reading gcc sources first
to know what/how gcc does something for some given code, rather than
simply write it out, compile and look the generated code (saves time for
those who don't have an infinite amount of it).


> > And I didn't go whining about this ... you asked me. (I think I'd said
> > something to the effect of GCC docs are often wrong,
> 
> No need to guess at what you said, even if you managed to delete
> your own mail already, there are plenty of free web-based archives
> around.  You said:
> 
> > See, "volatile" C keyword, for all it's ill-definition and dodgy
> > semantics, is still at least given somewhat of a treatment in the C
> > standard (whose quality is ... ummm, sadly not always good and clear,
> > but unsurprisingly, still about 5,482 orders-of-magnitude times
> > better than GCC docs).

Try _reading_ what I said there, for a change, dude. I'd originally only
said "unless GCC's docs is yet again wrong" ... then _you_ asked me what,
after which this discussion began and I wrote the above [which I fully
agree with -- so what if I used hyperbole in my sentence (yup, that was
intended, and obviously, exaggeration), am I not even allowed to do that?
Man, you're a Nazi or what ...] I didn't go whining about on my own as
you'd had earlier suggested, until _you_ asked me.

[ Ick, I somehow managed to reply this ... this is such a ...
  *disgustingly* petty argument you made here. ]


> > which is true,
> 
> Yes, documentation of that size often has shortcomings.  No surprise
> there.  However, great effort is made to make it better documentation,
> and especially to keep it up to date; if you find any errors or
> omissions, please report them.  There are many ways how to do that,
> see the GCC homepage.
 ^^

Looks like you even get paid :-)


> > but probably you feel saying that is "not allowed" on non-gcc lists?)
> 
> [amazingly pointless stuff snipped]
> 
> > As for the "PR"
> > you're requesting me to file with GCC for this, that
> > gcc-patches@ thread did precisely that
> 
> [more amazingly pointless stuff snipped]
> 
> > and more (submitted a patch to
> > said documentation -- and no, saying "documentation might get added
> > later" is totally bogus and nonsensical -- documentation exists to
> > document current behaviour, not past).
> 
> When code like you want to write becomes a supported feature, that
> will be reflected in the user manual.  It is completely nonsensical
> to expect everything that is *not* a supported feature to be mentioned
> there.

What crap. It is _perfectly reasonable_ to expect (current) docume

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma


On Sat, 18 Aug 2007, Segher Boessenkool wrote:

> > > > > The "asm volatile" implementation does have exactly the same
> > > > > reordering guarantees as the "volatile cast" thing,
> > > > 
> > > > I don't think so.
> > > 
> > > "asm volatile" creates a side effect.
> > 
> > Yeah.
> > 
> > > Side effects aren't
> > > allowed to be reordered wrt sequence points.
> > 
> > Yeah.
> > 
> > > This is exactly
> > > the same reason as why "volatile accesses" cannot be reordered.
> > 
> > No, the code in that sub-thread I earlier pointed you at *WAS* written
> > such that there was a sequence point after all the uses of that volatile
> > access cast macro, and _therefore_ we were safe from re-ordering
> > (behaviour guaranteed by the C standard).
> 
> And exactly the same is true for the "asm" version.
> 
> > Now, one cannot fantasize that "volatile asms" are also sequence points.
> 
> Sure you can do that.  I don't though.
> 
> > In fact such an argument would be sadly mistaken, because "sequence
> > points" are defined by the C standard and it'd be horribly wrong to
> > even _try_ claiming that the C standard knows about "volatile asms".
> 
> That's nonsense.  GCC can extend the C standard any way they
> bloody well please -- witness the fact that they added an
> extra class of side effects...
> 
> > > > Read the relevant GCC documentation.
> > > 
> > > I did, yes.
> > 
> > No, you didn't read:
> > 
> > http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
> > 
> > Read the bit about the need for artificial dependencies, and the example
> > given there:
> > 
> > asm volatile("mtfsf 255,%0" : : "f" (fpenv));
> > sum = x + y;
> > 
> > The docs explicitly say the addition can be moved before the "volatile
> > asm". Hopefully, as you know, (x + y) is an C "expression" and hence
> > a "sequence point" as defined by the standard.
> 
> The _end of a full expression_ is a sequence point, not every
> expression.  And that is irrelevant here anyway.
> 
> It is perfectly fine to compute x+y any time before the
> assignment; the C compiler is allowed to compute it _after_
> the assignment even, if it could figure out how ;-)
> 
> x+y does not contain a side effect, you know.
> 
> > I know there is also stuff written about "side-effects" there which
> > _could_ give the same semantic w.r.t. sequence points as the volatile
> > access casts,
> 
> s/could/does/
> 
> > but hey, it's GCC's own documentation, you obviously can't
> > find fault with _me_ if there's wrong stuff written in there. Say that
> > to GCC ...
> 
> There's nothing wrong there.
> 
> > See, "volatile" C keyword, for all it's ill-definition and dodgy
> > semantics, is still at least given somewhat of a treatment in the C
> > standard (whose quality is ... ummm, sadly not always good and clear,
> > but unsurprisingly, still about 5,482 orders-of-magnitude times
> > better than GCC docs).
> 
> If you find any problems/shortcomings in the GCC documentation,
> please file a PR, don't go whine on some unrelated mailing lists.
> Thank you.
> 
> > Semantics of "volatile" as applies to inline
> > asm, OTOH? You're completely relying on the compiler for that ...
> 
> Yes, and?  GCC promises the behaviour it has documented.

LOTS there, which obviously isn't correct, but which I'll reply to later,
easier stuff first. (call this "handwaving" if you want, but don't worry,
I /will/ bother myself to reply)


> > > > [ of course, if the (latest) GCC documentation is *yet again*
> > > >   wrong, then alright, not much I can do about it, is there. ]
> > > 
> > > There was (and is) nothing wrong about the "+m" documentation, if
> > > that is what you are talking about.  It could be extended now, to
> > > allow "+m" -- but that takes more than just "fixing" the documentation.
> > 
> > No, there was (and is) _everything_ wrong about the "+" documentation as
> > applies to memory-constrained operands. I don't give a whit if it's
> > some workaround in their gimplifier, or the other, that makes it possible
> > to use "+m" (like the current kernel code does). The docs suggest
> > otherwise, so there's obviously a clear disconnect between the docs and
> > actual GCC behaviour.
> 
> The documentation simply doesn't say "+m" is allowed.  The code to
> allow it was added for the benefit of people who do not read the
> documentation.  Documentation for "+m" might get added later if it
> is decided this [the code, not the documentation] is a sane thing
> to have (which isn't directly obvious).

Huh?

"If the (current) documentation doesn't match up with the (current)
code, then _at least one_ of them has to be (as of current) wrong."

I wonder how could you even try to disagree with that.

And I didn't go whining about this ... you asked me. (I think I'd said
something to the effect of GCC docs are often wrong, which is true,
but probably you feel saying that is "not allowed" on non-gcc lists?)

As for the "PR" you're requesting me to file with GCC for this, that
gcc-patches@ threa

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma


On Fri, 17 Aug 2007, Nick Piggin wrote:

> Satyam Sharma wrote:
> 
> > I didn't quite understand what you said here, so I'll tell what I think:
> > 
> > * foo() is a compiler barrier if the definition of foo() is invisible to
> >  the compiler at a callsite.
> > 
> > * foo() is also a compiler barrier if the definition of foo() includes
> >  a barrier, and it is inlined at the callsite.
> > 
> > If the above is wrong, or if there's something else at play as well,
> > do let me know.
> 
> [...]
> If a function is not completely visible to the compiler (so it can't
> determine whether a barrier could be in it or not), then it must always
> assume it will contain a barrier so it always does the right thing.

Yup, that's what I'd said just a few sentences above, as you can see. I
was actually asking for "elaboration" on "how a compiler determines that
function foo() (say foo == schedule), even when it cannot see that it has
a barrier(), as you'd mentioned, is a 'sleeping' function" actually, and
whether compilers have a "notion of sleep to automatically assume a
compiler barrier whenever such a sleeping function foo() is called". But
I think you've already qualified the discussion to this kernel, so okay,
I shouldn't nit-pick anymore.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma


On Sat, 18 Aug 2007, Segher Boessenkool wrote:

> > > > > atomic_dec() writes
> > > > > to memory, so it _does_ have "volatile semantics", implicitly, as
> > > > > long as the compiler cannot optimise the atomic variable away
> > > > > completely -- any store counts as a side effect.
> > > > 
> > > > I don't think an atomic_dec() implemented as an inline "asm volatile"
> > > > or one that uses a "forget" macro would have the same re-ordering
> > > > guarantees as an atomic_dec() that uses a volatile access cast.
> > > 
> > > The "asm volatile" implementation does have exactly the same
> > > reordering guarantees as the "volatile cast" thing,
> > 
> > I don't think so.
> 
> "asm volatile" creates a side effect.

Yeah.

> Side effects aren't
> allowed to be reordered wrt sequence points.

Yeah.

> This is exactly
> the same reason as why "volatile accesses" cannot be reordered.

No, the code in that sub-thread I earlier pointed you at *WAS* written
such that there was a sequence point after all the uses of that volatile
access cast macro, and _therefore_ we were safe from re-ordering
(behaviour guaranteed by the C standard).

But you seem to be missing the simple and basic fact that:

(something_that_has_side_effects || statement)
!= something_that_is_a_sequence_point

Now, one cannot fantasize that "volatile asms" are also sequence points.
In fact such an argument would be sadly mistaken, because "sequence
points" are defined by the C standard and it'd be horribly wrong to
even _try_ claiming that the C standard knows about "volatile asms".


> > > if that is
> > > implemented by GCC in the "obvious" way.  Even a "plain" asm()
> > > will do the same.
> > 
> > Read the relevant GCC documentation.
> 
> I did, yes.

No, you didn't read:

http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html

Read the bit about the need for artificial dependencies, and the example
given there:

asm volatile("mtfsf 255,%0" : : "f" (fpenv));
sum = x + y;

The docs explicitly say the addition can be moved before the "volatile
asm". Hopefully, as you know, (x + y) is an C "expression" and hence
a "sequence point" as defined by the standard. So the "volatile asm"
should've happened before it, right? Wrong.

I know there is also stuff written about "side-effects" there which
_could_ give the same semantic w.r.t. sequence points as the volatile
access casts, but hey, it's GCC's own documentation, you obviously can't
find fault with _me_ if there's wrong stuff written in there. Say that
to GCC ...

See, "volatile" C keyword, for all it's ill-definition and dodgy
semantics, is still at least given somewhat of a treatment in the C
standard (whose quality is ... ummm, sadly not always good and clear,
but unsurprisingly, still about 5,482 orders-of-magnitude times
better than GCC docs). Semantics of "volatile" as applies to inline
asm, OTOH? You're completely relying on the compiler for that ...


> > [ of course, if the (latest) GCC documentation is *yet again*
> >   wrong, then alright, not much I can do about it, is there. ]
> 
> There was (and is) nothing wrong about the "+m" documentation, if
> that is what you are talking about.  It could be extended now, to
> allow "+m" -- but that takes more than just "fixing" the documentation.

No, there was (and is) _everything_ wrong about the "+" documentation as
applies to memory-constrained operands. I don't give a whit if it's
some workaround in their gimplifier, or the other, that makes it possible
to use "+m" (like the current kernel code does). The docs suggest
otherwise, so there's obviously a clear disconnect between the docs and
actual GCC behaviour.


[ You seem to often take issue with _amazingly_ petty and pedantic things,
  by the way :-) ]
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma


On Fri, 17 Aug 2007, Christoph Lameter wrote:

> On Fri, 17 Aug 2007, Paul E. McKenney wrote:
> 
> > On Sat, Aug 18, 2007 at 08:09:13AM +0800, Herbert Xu wrote:
> > > On Fri, Aug 17, 2007 at 04:59:12PM -0700, Paul E. McKenney wrote:
> > > >
> > > > gcc bugzilla bug #33102, for whatever that ends up being worth.  ;-)
> > > 
> > > I had totally forgotten that I'd already filed that bug more
> > > than six years ago until they just closed yours as a duplicate
> > > of mine :)
> > > 
> > > Good luck in getting it fixed!
> > 
> > Well, just got done re-opening it for the third time.  And a local
> > gcc community member advised me not to give up too easily.  But I
> > must admit that I am impressed with the speed that it was identified
> > as duplicate.
> > 
> > Should be entertaining!  ;-)
> 
> Right. ROTFL... volatile actually breaks atomic_t instead of making it 
> safe. x++ becomes a register load, increment and a register store. Without 
> volatile we can increment the memory directly.

No code does (or would do, or should do):

x.counter++;

on an "atomic_t x;" anyway.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma


On Sat, 18 Aug 2007, Segher Boessenkool wrote:

> > > > No it does not have any volatile semantics. atomic_dec() can be
> > > > reordered
> > > > at will by the compiler within the current basic unit if you do not add
> > > > a
> > > > barrier.
> > > 
> > > "volatile" has nothing to do with reordering.
> > 
> > If you're talking of "volatile" the type-qualifier keyword, then
> > http://lkml.org/lkml/2007/8/16/231 (and sub-thread below it) shows
> > otherwise.
> 
> I'm not sure what in that mail you mean, but anyway...
> 
> Yes, of course, the fact that "volatile" creates a side effect
> prevents certain things from being reordered wrt the atomic_dec();
> but the atomic_dec() has a side effect *already* so the volatile
> doesn't change anything.

That's precisely what that sub-thread (read down to the last mail
there, and not the first mail only) shows. So yes, "volatile" does
have something to do with re-ordering (as guaranteed by the C
standard).


> > > atomic_dec() writes
> > > to memory, so it _does_ have "volatile semantics", implicitly, as
> > > long as the compiler cannot optimise the atomic variable away
> > > completely -- any store counts as a side effect.
> > 
> > I don't think an atomic_dec() implemented as an inline "asm volatile"
> > or one that uses a "forget" macro would have the same re-ordering
> > guarantees as an atomic_dec() that uses a volatile access cast.
> 
> The "asm volatile" implementation does have exactly the same
> reordering guarantees as the "volatile cast" thing,

I don't think so.

> if that is
> implemented by GCC in the "obvious" way.  Even a "plain" asm()
> will do the same.

Read the relevant GCC documentation.

[ of course, if the (latest) GCC documentation is *yet again*
  wrong, then alright, not much I can do about it, is there. ]
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma


On Sat, 18 Aug 2007, Segher Boessenkool wrote:

> > #define forget(a)   __asm__ __volatile__ ("" :"=m" (a) :"m" (a))
> > 
> > [ This is exactly equivalent to using "+m" in the constraints, as recently
> >   explained on a GCC list somewhere, in response to the patch in my bitops
> >   series a few weeks back where I thought "+m" was bogus. ]
> 
> [It wasn't explained on a GCC list in response to your patch, as
> far as I can see -- if I missed it, please point me to an archived
> version of it].

http://gcc.gnu.org/ml/gcc-patches/2007-07/msg01758.html

is a follow-up in the thread on the [EMAIL PROTECTED] mailing list,
which began with:

http://gcc.gnu.org/ml/gcc-patches/2007-07/msg01677.html

that was posted by Jan Kubicka, as he quotes in that initial posting,
after I had submitted:

http://lkml.org/lkml/2007/7/23/252

which was a (wrong) patch to "rectify" what I thought was the "bogus"
"+m" constraint, as per the quoted extract from gcc docs (that was
given in that (wrong) patch's changelog).

That's when _I_ came to know how GCC interprets "+m", but probably
this has been explained on those lists multiple times. Who cares,
anyway?


> One last time: it isn't equivalent on older (but still supported
> by Linux) versions of GCC.  Current versions of GCC allow it, but
> it has no documented behaviour at all, so use it at your own risk.

True.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma


On Fri, 17 Aug 2007, Segher Boessenkool wrote:

> > > atomic_dec() already has volatile behavior everywhere, so this is
> > > semantically
> > > okay, but this code (and any like it) should be calling cpu_relax() each
> > > iteration through the loop, unless there's a compelling reason not to.
> > > I'll
> > > allow that for some hardware drivers (possibly this one) such a compelling
> > > reason may exist, but hardware-independent core subsystems probably have
> > > no
> > > excuse.
> > 
> > No it does not have any volatile semantics. atomic_dec() can be reordered
> > at will by the compiler within the current basic unit if you do not add a
> > barrier.
> 
> "volatile" has nothing to do with reordering.

If you're talking of "volatile" the type-qualifier keyword, then
http://lkml.org/lkml/2007/8/16/231 (and sub-thread below it) shows
otherwise.

> atomic_dec() writes
> to memory, so it _does_ have "volatile semantics", implicitly, as
> long as the compiler cannot optimise the atomic variable away
> completely -- any store counts as a side effect.

I don't think an atomic_dec() implemented as an inline "asm volatile"
or one that uses a "forget" macro would have the same re-ordering
guarantees as an atomic_dec() that uses a volatile access cast.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma


On Fri, 17 Aug 2007, Paul E. McKenney wrote:

> On Fri, Aug 17, 2007 at 01:09:08PM +0530, Satyam Sharma wrote:
> > 
> > On Thu, 16 Aug 2007, Paul E. McKenney wrote:
> > 
> > > On Fri, Aug 17, 2007 at 07:59:02AM +0800, Herbert Xu wrote:
> > > > 
> > > > First of all, I think this illustrates that what you want
> > > > here has nothing to do with atomic ops.  The ORDERED_WRT_IRQ
> > > > macro occurs a lot more times in your patch than atomic
> > > > reads/sets.  So *assuming* that it was necessary at all,
> > > > then having an ordered variant of the atomic_read/atomic_set
> > > > ops could do just as well.
> > > 
> > > Indeed.  If I could trust atomic_read()/atomic_set() to cause the compiler
> > > to maintain ordering, then I could just use them instead of having to
> > > create an  ORDERED_WRT_IRQ().  (Or ACCESS_ONCE(), as it is called in a
> > > different patch.)
> > 
> > +#define WHATEVER(x)(*(volatile typeof(x) *)&(x))
> > [...]
> > Also, this gives *zero* "re-ordering" guarantees that your code wants
> > as you've explained it below) -- neither w.r.t. CPU re-ordering (which
> > probably you don't care about) *nor* w.r.t. compiler re-ordering
> > (which you definitely _do_ care about).
> 
> You are correct about CPU re-ordering (and about the fact that this
> example doesn't care about it), but not about compiler re-ordering.
> 
> The compiler is prohibited from moving a volatile access across a sequence
> point.  One example of a sequence point is a statement boundary.  Because
> all of the volatile accesses in this code are separated by statement
> boundaries, a conforming compiler is prohibited from reordering them.

Yes, you're right, and I believe precisely this was discussed elsewhere
as well today.

But I'd call attention to what Herbert mentioned there. You're using
ORDERED_WRT_IRQ() on stuff that is _not_ defined to be an atomic_t at all:

* Member "completed" of struct rcu_ctrlblk is a long.
* Per-cpu variable rcu_flipctr is an array of ints.
* Members "rcu_read_lock_nesting" and "rcu_flipctr_idx" of
  struct task_struct are ints.

So are you saying you're "having to use" this volatile-access macro
because you *couldn't* declare all the above as atomic_t and thus just
expect the right thing to happen by using the atomic ops API by default,
because it lacks volatile access semantics (on x86)?

If so, then I wonder if using the volatile access cast is really the
best way to achieve (at least in terms of code clarity) the kind of
re-ordering guarantees it wants there. (there could be alternative
solutions, such as using barrier(), or that at bottom of this mail)

What I mean is this: If you switch to atomic_t, and x86 switched to
make atomic_t have "volatile" semantics by default, the statements
would be simply a string of: atomic_inc(), atomic_add(), atomic_set(),
and atomic_read() statements, and nothing in there that clearly makes
it *explicit* that the code is correct (and not buggy) simply because
of the re-ordering guarantees that the C "volatile" type-qualifier
keyword gives us as per the standard. But now we're firmly in
"subjective" territory, so you or anybody could legitimately disagree.


> > > Suppose I tried replacing the ORDERED_WRT_IRQ() calls with
> > > atomic_read() and atomic_set().  Starting with __rcu_read_lock():
> > > 
> > > o If "ORDERED_WRT_IRQ(__get_cpu_var(rcu_flipctr)[idx])++"
> > >   was ordered by the compiler after
> > >   "ORDERED_WRT_IRQ(me->rcu_read_lock_nesting) = nesting + 1", then
> > >   suppose an NMI/SMI happened after the rcu_read_lock_nesting but
> > >   before the rcu_flipctr.
> > > 
> > >   Then if there was an rcu_read_lock() in the SMI/NMI
> > >   handler (which is perfectly legal), the nested rcu_read_lock()
> > >   would believe that it could take the then-clause of the
> > >   enclosing "if" statement.  But because the rcu_flipctr per-CPU
> > >   variable had not yet been incremented, an RCU updater would
> > >   be within its rights to assume that there were no RCU reads
> > >   in progress, thus possibly yanking a data structure out from
> > >   under the reader in the SMI/NMI function.
> > > 
> > >   Fatal outcome.  Note that only one CPU is involved here
> > >   because these are all either per-CPU or per-task variables.
> > 
> > Ok, so you don't care about CPU re-ordering. Still, I should let you know
> > that your ORDERED_WRT_IRQ() -- bad name, 

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma


On Fri, 17 Aug 2007, Nick Piggin wrote:

> Satyam Sharma wrote:
> 
> > On Fri, 17 Aug 2007, Nick Piggin wrote:
> > 
> > > Because they should be thinking about them in terms of barriers, over
> > > which the compiler / CPU is not to reorder accesses or cache memory
> > > operations, rather than "special" "volatile" accesses.
> > 
> > This is obviously just a taste thing. Whether to have that forget(x)
> > barrier as something author should explicitly sprinkle appropriately
> > in appropriate places in the code by himself or use a primitive that
> > includes it itself.
> 
> That's not obviously just taste to me. Not when the primitive has many
> (perhaps, the majority) of uses that do not require said barriers. And
> this is not solely about the code generation (which, as Paul says, is
> relatively minor even on x86).

See, you do *require* people to have to repeat the same things to you!

As has been written about enough times already, and if you followed the
discussion on this thread, I am *not* proposing that atomic_read()'s
semantics be changed to have any extra barriers. What is proposed is a
different atomic_read_xxx() variant thereof, that those can use who do
want that.

Now whether to have a kind of barrier ("volatile", whatever) in the
atomic_read_xxx() itself, or whether to make the code writer himself to
explicitly write the order(x) appropriately in appropriate places in the
code _is_ a matter of taste.


> > That's definitely the point, why not. This is why "barrier()", being
> > heavy-handed, is not the best option.
> 
> That is _not_ the point [...]

Again, you're requiring me to repeat things that were already made evident
on this thread (if you follow it).

This _is_ the point, because a lot of loops out there (too many of them,
I WILL NOT bother citing file_name:line_number) end up having to use a
barrier just because they're using a loop-exit-condition that depends
on a value returned by atomic_read(). It would be good for them if they
used an atomic_read_xxx() primitive that gave these "volatility" semantics
without junking compiler optimizations for other memory references.

> because there has already been an alternative posted

Whether that alternative (explicitly using forget(x), or wrappers thereof,
such as the "order_atomic" you proposed) is better than other alternatives
(such as atomic_read_xxx() which includes the volatility behaviour in
itself) is still open, and precisely what we started discussing just one
mail back.

(The above was also mostly stuff I had to repeated for you, sadly.)

> that better conforms with Linux barrier
> API and is much more widely useful and more usable.

I don't think so.

(Now *this* _is_ the "taste-dependent matter" that I mentioned earlier.)

> If you are so worried
> about
> barrier() being too heavyweight, then you're off to a poor start by wanting to
> add a few K of kernel text by making atomic_read volatile.

Repeating myself, for the N'th time, NO, I DON'T want to make atomic_read
have "volatile" semantics.

> > > because as I also mentioned, the logical extention
> > > to Linux's barrier API to handle this is the order(x) macro. Again, not
> > > special volatile accessors.
> > 
> > Sure, that forget(x) macro _is_ proposed to be made part of the generic
> > API. Doesn't explain why not to define/use primitives that has volatility
 
> > semantics in itself, though (taste matters apart).
^
> If you follow the discussion You were thinking of a reason why the
> semantics *should* be changed or added, and I was rebutting your argument
> that it must be used when a full barrier() is too heavy (ie. by pointing
> out that order() has superior semantics anyway).

Amazing. Either you have reading comprehension problems, or else, please
try reading this thread (or at least this sub-thread) again. I don't want
_you_ blaming _me_ for having to repeat things to you all over again.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma


On Fri, 17 Aug 2007, Nick Piggin wrote:

> Satyam Sharma wrote:
> [...]
> > You think both these are equivalent in terms of "looks":
> > 
> > |
> > while (!atomic_read(&v)) {  |   while (!atomic_read_xxx(&v)) {
> > ... |   ...
> > cpu_relax_no_barrier(); |
> > cpu_relax_no_barrier();
> > order_atomic(&v);   |   }
> > }   |
> > 
> > (where order_atomic() is an atomic_t
> > specific wrapper as you mentioned below)
> > 
> > ?
> 
> I think the LHS is better if your atomic_read_xxx primitive is using the
> crazy one-sided barrier,
  ^

I'd say it's purposefully one-sided.

> because the LHS code you immediately know what
> barriers are happening, and with the RHS you have to look at the
> atomic_read_xxx
> definition.

No. As I said, the _xxx (whatever the heck you want to name it as) should
give the same heads-up that your "order_atomic" thing is supposed to give.


> If your atomic_read_xxx implementation was more intuitive, then both are
> pretty well equal. More lines != ugly code.
> 
> > [...]
> > What bugs?
> 
> You can't think for yourself? Your atomic_read_volatile contains a compiler
> barrier to the atomic variable before the load. 2 such reads from different
> locations look like this:
> 
> asm volatile("" : "+m" (v1));
> atomic_read(&v1);
> asm volatile("" : "+m" (v2));
> atomic_read(&v2);
> 
> Which implies that the load of v1 can be reordered to occur after the load
> of v2.

And how would that be a bug? (sorry, I really can't think for myself)


> > > Secondly, what sort of code would do such a thing?
> > 
> > See the nodemgr_host_thread() that does something similar, though not
> > exactly same.
> 
> I'm sorry, all this waffling about made up code which might do this and
> that is just a waste of time.

First, you could try looking at the code.

And by the way, as I've already said (why do *require* people to have to
repeat things to you?) this isn't even about only existing code.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma


On Fri, 17 Aug 2007, Nick Piggin wrote:

> Satyam Sharma wrote:
> > On Fri, 17 Aug 2007, Nick Piggin wrote:
> > > Satyam Sharma wrote:
> > > 
> > > It is very obvious. msleep calls schedule() (ie. sleeps), which is
> > > always a barrier.
> > 
> > Probably you didn't mean that, but no, schedule() is not barrier because
> > it sleeps. It's a barrier because it's invisible.
> 
> Where did I say it is a barrier because it sleeps?

Just below. What you wrote:

> It is always a barrier because, at the lowest level, schedule() (and thus
> anything that sleeps) is defined to always be a barrier.

"It is always a barrier because, at the lowest level, anything that sleeps
is defined to always be a barrier".


> Regardless of
> whatever obscure means the compiler might need to infer the barrier.
> 
> In other words, you can ignore those obscure details because schedule() is
> always going to have an explicit barrier in it.

I didn't quite understand what you said here, so I'll tell what I think:

* foo() is a compiler barrier if the definition of foo() is invisible to
  the compiler at a callsite.

* foo() is also a compiler barrier if the definition of foo() includes
  a barrier, and it is inlined at the callsite.

If the above is wrong, or if there's something else at play as well,
do let me know.

> > > The "unobvious" thing is that you wanted to know how the compiler knows
> > > a function is a barrier -- answer is that if it does not *know* it is not
> > > a barrier, it must assume it is a barrier.
> > 
> > True, that's clearly what happens here. But are you're definitely joking
> > that this is "obvious" in terms of code-clarity, right?
> 
> No. If you accept that barrier() is implemented correctly, and you know
> that sleeping is defined to be a barrier,

Curiously, that's the second time you've said "sleeping is defined to
be a (compiler) barrier". How does the compiler even know if foo() is
a function that "sleeps"? Do compilers have some notion of "sleeping"
to ensure they automatically assume a compiler barrier whenever such
a function is called? Or are you saying that the compiler can see the
barrier() inside said function ... nopes, you're saying quite the
opposite below.


> then its perfectly clear. You
> don't have to know how the compiler "knows" that some function contains
> a barrier.

I think I do, why not? Would appreciate if you could elaborate on this.


Satyam
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma


On Fri, 17 Aug 2007, Nick Piggin wrote:

> Satyam Sharma wrote:
> > [...]
> > The point is about *author expecations*. If people do expect atomic_read()
> > (or a variant thereof) to have volatile semantics, why not give them such
> > a variant?
> 
> Because they should be thinking about them in terms of barriers, over
> which the compiler / CPU is not to reorder accesses or cache memory
> operations, rather than "special" "volatile" accesses.

This is obviously just a taste thing. Whether to have that forget(x)
barrier as something author should explicitly sprinkle appropriately
in appropriate places in the code by himself or use a primitive that
includes it itself.

I'm not saying "taste matters aren't important" (they are), but I'm really
skeptical if most folks would find the former tasteful.

> > And by the way, the point is *also* about the fact that cpu_relax(), as
> > of today, implies a full memory clobber, which is not what a lot of such
> > loops want. (due to stuff mentioned elsewhere, summarized in that summary)
> 
> That's not the point,

That's definitely the point, why not. This is why "barrier()", being
heavy-handed, is not the best option.

> because as I also mentioned, the logical extention
> to Linux's barrier API to handle this is the order(x) macro. Again, not
> special volatile accessors.

Sure, that forget(x) macro _is_ proposed to be made part of the generic
API. Doesn't explain why not to define/use primitives that has volatility
semantics in itself, though (taste matters apart).
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma


On Fri, 17 Aug 2007, Paul Mackerras wrote:

> Satyam Sharma writes:
> 
> > I wonder if this'll generate smaller and better code than _both_ the
> > other atomic_read_volatile() variants. Would need to build allyesconfig
> > on lots of diff arch's etc to test the theory though.
> 
> I'm sure it would be a tiny effect.
> 
> This whole thread is arguing about effects that are quite
> insignificant.

Hmm, the fact that this thread became what it did, probably means that
most developers on this list do not mind thinking/arguing about effects
or optimizations that are otherwise "tiny". But yeah, they are tiny
nonetheless.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma


On Fri, 17 Aug 2007, Nick Piggin wrote:

> Satyam Sharma wrote:
> > 
> > On Fri, 17 Aug 2007, Nick Piggin wrote:
> 
> > > Also, why would you want to make these insane accessors for atomic_t
> > > types? Just make sure everybody knows the basics of barriers, and they
> > > can apply that knowledge to atomic_t and all other lockless memory
> > > accesses as well.
> > 
> > 
> > Code that looks like:
> > 
> > while (!atomic_read(&v)) {
> > ...
> > cpu_relax_no_barrier();
> > forget(v.counter);
> > 
> > }
> > 
> > would be uglier. Also think about code such as:
> 
> I think they would both be equally ugly,

You think both these are equivalent in terms of "looks":

|
while (!atomic_read(&v)) {  |   while (!atomic_read_xxx(&v)) {
... |   ...
cpu_relax_no_barrier(); |   cpu_relax_no_barrier();
order_atomic(&v);   |   }
}   |

(where order_atomic() is an atomic_t
specific wrapper as you mentioned below)

?

Well, taste varies, but ...

> but the atomic_read_volatile
> variant would be more prone to subtle bugs because of the weird
> implementation.

What bugs?

> And it would be more ugly than introducing an order(x) statement for
> all memory operations, and adding an order_atomic() wrapper for it
> for atomic types.

Oh, that order() / forget() macro [forget() was named such by Chuck Ebbert
earlier in this thread where he first mentioned it, btw] could definitely
be generically introduced for any memory operations.

> > a = atomic_read();
> > if (!a)
> > do_something();
> > 
> > forget();
> > a = atomic_read();
> > ... /* some code that depends on value of a, obviously */
> > 
> > forget();
> > a = atomic_read();
> > ...
> > 
> > So much explicit sprinkling of "forget()" looks ugly.
> 
> Firstly, why is it ugly? It's nice because of those nice explicit
> statements there that give us a good heads up and would have some
> comments attached to them

atomic_read_xxx (where xxx = whatever naming sounds nice to you) would
obviously also give a heads up, and could also have some comments
attached to it.

> (also, lack of the word "volatile" is always a plus).

Ok, xxx != volatile.

> Secondly, what sort of code would do such a thing?

See the nodemgr_host_thread() that does something similar, though not
exactly same.

> > on the other hand, looks neater. The "_volatile()" suffix makes it also
> > no less explicit than an explicit barrier-like macro that this primitive
> > is something "special", for code clarity purposes.
> 
> Just don't use the word volatile,

That sounds amazingly frivolous, but hey, why not. As I said, ok,
xxx != volatile.

> and have barriers both before and after the memory operation,

How could that lead to bugs? (if you can point to existing code,
but just some testcase / sample code would be fine as well).

> [...] I don't see
> the point though, when you could just have a single barrier(x)
> barrier function defined for all memory locations,

As I said, barrier() is too heavy-handed.

> rather than
> this odd thing that only works for atomics

Why would it work only for atomics? You could use that generic macro
for anything you well damn please.

> (and would have to
> be duplicated for atomic_set.

#define atomic_set_xxx for something similar. Big deal ... NOT.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma


On Fri, 17 Aug 2007, Andi Kleen wrote:

> On Friday 17 August 2007 05:42, Linus Torvalds wrote:
> > On Fri, 17 Aug 2007, Paul Mackerras wrote:
> > > I'm really surprised it's as much as a few K.  I tried it on powerpc
> > > and it only saved 40 bytes (10 instructions) for a G5 config.
> >
> > One of the things that "volatile" generally screws up is a simple
> >
> > volatile int i;
> >
> > i++;
> 
> But for atomic_t people use atomic_inc() anyways which does this correctly.
> It shouldn't really matter for atomic_t.
> 
> I'm worrying a bit that the volatile atomic_t change caused subtle code 
> breakage like these delay read loops people here pointed out.

Umm, I followed most of the thread, but which breakage is this?

> Wouldn't it be safer to just re-add the volatile to atomic_read() 
> for 2.6.23? Or alternatively make it asm(), but volatile seems more
> proven.

The problem with volatile is not just trashy code generation (which also
definitely is a major problem), but definition holes, and implementation
inconsistencies. Making it asm() is not the only other alternative to
volatile either (read another reply to this mail), but considering most
of the thread has been about people not wanting even a
atomic_read_volatile() variant, making atomic_read() itself have volatile
semantics sounds ... strange :-)


PS: http://lkml.org/lkml/2007/8/15/407 was submitted a couple days back,
any word if you saw that?

I have another one for you:


[PATCH] i386, x86_64: __const_udelay() should not be marked inline

Because it can never get inlined in any callsite (each translation unit
is compiled separately for the kernel and so the implementation of
__const_udelay() would be invisible to all other callsites). In fact it
turns out, the correctness of callsites at arch/x86_64/kernel/crash.c:97
and arch/i386/kernel/crash.c:101 explicitly _depends_ upon it not being
inlined, and also it's an exported symbol (modules may want to call
mdelay() and udelay() that often becomes __const_udelay() after some
macro-ing in various headers). So let's not mark it as "inline" either.

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>

---

 arch/i386/lib/delay.c   |2 +-
 arch/x86_64/lib/delay.c |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/i386/lib/delay.c b/arch/i386/lib/delay.c
index f6edb11..0082c99 100644
--- a/arch/i386/lib/delay.c
+++ b/arch/i386/lib/delay.c
@@ -74,7 +74,7 @@ void __delay(unsigned long loops)
delay_fn(loops);
 }
 
-inline void __const_udelay(unsigned long xloops)
+void __const_udelay(unsigned long xloops)
 {
int d0;
 
diff --git a/arch/x86_64/lib/delay.c b/arch/x86_64/lib/delay.c
index 2dbebd3..d0cd9cd 100644
--- a/arch/x86_64/lib/delay.c
+++ b/arch/x86_64/lib/delay.c
@@ -38,7 +38,7 @@ void __delay(unsigned long loops)
 }
 EXPORT_SYMBOL(__delay);
 
-inline void __const_udelay(unsigned long xloops)
+void __const_udelay(unsigned long xloops)
 {
__delay(((xloops * HZ * 
cpu_data[raw_smp_processor_id()].loops_per_jiffy) >> 32) + 1);
 }
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma


On Fri, 17 Aug 2007, Nick Piggin wrote:

> Satyam Sharma wrote:
> > 
> > On Fri, 17 Aug 2007, Nick Piggin wrote:
> 
> > > > Sure, now
> > > > that I learned of these properties I can start to audit code and insert
> > > > barriers where I believe they are needed, but this simply means that
> > > > almost all occurrences of atomic_read will get barriers (unless there
> > > > already are implicit but more or less obvious barriers like msleep).
> > > 
> > > You might find that these places that appear to need barriers are
> > > buggy for other reasons anyway. Can you point to some in-tree code
> > > we can have a look at?
> > 
> > 
> > Such code was mentioned elsewhere (query nodemgr_host_thread in cscope)
> > that managed to escape the requirement for a barrier only because of
> > some completely un-obvious compilation-unit-scope thing. But I find such
> > an non-explicit barrier quite bad taste. Stefan, do consider plunking an
> > explicit call to barrier() there.
> 
> It is very obvious. msleep calls schedule() (ie. sleeps), which is
> always a barrier.

Probably you didn't mean that, but no, schedule() is not barrier because
it sleeps. It's a barrier because it's invisible.

> The "unobvious" thing is that you wanted to know how the compiler knows
> a function is a barrier -- answer is that if it does not *know* it is not
> a barrier, it must assume it is a barrier.

True, that's clearly what happens here. But are you're definitely joking
that this is "obvious" in terms of code-clarity, right?

Just 5 minutes back you mentioned elsewhere you like seeing lots of
explicit calls to barrier() (with comments, no less, hmm? :-)
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma


On Fri, 17 Aug 2007, Nick Piggin wrote:

> Satyam Sharma wrote:
> [...]
> > Granted, the above IS buggy code. But, the stated objective is to avoid
> > heisenbugs.
^^

> Anyway, why are you making up code snippets that are buggy in other
> ways in order to support this assertion being made that lots of kernel
> code supposedly depends on volatile semantics. Just reference the
> actual code.

Because the point is *not* about existing bugs in kernel code. At some
point Chris Snook (who started this thread) did write that "If I knew
of the existing bugs in the kernel, I would be sending patches for them,
not this series" or something to that effect.

The point is about *author expecations*. If people do expect atomic_read()
(or a variant thereof) to have volatile semantics, why not give them such
a variant?

And by the way, the point is *also* about the fact that cpu_relax(), as
of today, implies a full memory clobber, which is not what a lot of such
loops want. (due to stuff mentioned elsewhere, summarized in that summary)


> > And we have driver / subsystem maintainers such as Stefan
> > coming up and admitting that often a lot of code that's written to use
> > atomic_read() does assume the read will not be elided by the compiler.
 ^

(so it's about compiler barrier expectations only, though I fully agree
that those who're using atomic_t as if it were some magic thing that lets
them write lockless code are sorrily mistaken.)

> So these are broken on i386 and x86-64?

Possibly, but the point is not about existing bugs, as mentioned above.

Some such bugs have been found nonetheless -- reminds me, can somebody
please apply http://www.gossamer-threads.com/lists/linux/kernel/810674 ?


Satyam
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma


On Fri, 17 Aug 2007, Nick Piggin wrote:

> Stefan Richter wrote:
> [...]
> Just use spinlocks if you're not absolutely clear about potential
> races and memory ordering issues -- they're pretty cheap and simple.

I fully agree with this. As Paul Mackerras mentioned elsewhere,
a lot of authors sprinkle atomic_t in code thinking they're somehow
done with *locking*. This is sad, and I wonder if it's time for a
Documentation/atomic-considered-dodgy.txt kind of document :-)


> > Sure, now
> > that I learned of these properties I can start to audit code and insert
> > barriers where I believe they are needed, but this simply means that
> > almost all occurrences of atomic_read will get barriers (unless there
> > already are implicit but more or less obvious barriers like msleep).
> 
> You might find that these places that appear to need barriers are
> buggy for other reasons anyway. Can you point to some in-tree code
> we can have a look at?

Such code was mentioned elsewhere (query nodemgr_host_thread in cscope)
that managed to escape the requirement for a barrier only because of
some completely un-obvious compilation-unit-scope thing. But I find such
an non-explicit barrier quite bad taste. Stefan, do consider plunking an
explicit call to barrier() there.


Satyam
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma


On Fri, 17 Aug 2007, Nick Piggin wrote:

> Satyam Sharma wrote:
> 
> > #define atomic_read_volatile(v) \
> > ({  \
> > forget((v)->counter);   \
> > ((v)->counter); \
> > })
> > 
> > where:
> 
> *vomit* :)

I wonder if this'll generate smaller and better code than _both_ the
other atomic_read_volatile() variants. Would need to build allyesconfig
on lots of diff arch's etc to test the theory though.


> Not only do I hate the keyword volatile, but the barrier is only a
> one-sided affair so its probable this is going to have slightly
> different allowed reorderings than a real volatile access.

True ...


> Also, why would you want to make these insane accessors for atomic_t
> types? Just make sure everybody knows the basics of barriers, and they
> can apply that knowledge to atomic_t and all other lockless memory
> accesses as well.

Code that looks like:

while (!atomic_read(&v)) {
...
cpu_relax_no_barrier();
forget(v.counter);

}

would be uglier. Also think about code such as:

a = atomic_read();
if (!a)
do_something();

forget();
a = atomic_read();
... /* some code that depends on value of a, obviously */

forget();
a = atomic_read();
...

So much explicit sprinkling of "forget()" looks ugly.

atomic_read_volatile()

on the other hand, looks neater. The "_volatile()" suffix makes it also
no less explicit than an explicit barrier-like macro that this primitive
is something "special", for code clarity purposes.


> > #define forget(a)   __asm__ __volatile__ ("" :"=m" (a) :"m" (a))
> 
> I like order(x) better, but it's not the most perfect name either.

forget(x) is just a stupid-placeholder-for-a-better-name. order(x) sounds
good but we could leave quibbling about function or macro names for later,
this thread is noisy as it is :-)
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma


On Fri, 17 Aug 2007, Paul Mackerras wrote:

> Herbert Xu writes:
> 
> > On Fri, Aug 17, 2007 at 03:09:57PM +1000, Paul Mackerras wrote:
> > > Herbert Xu writes:
> > > 
> > > > Can you find an actual atomic_read code snippet there that is
> > > > broken without the volatile modifier?
> > > 
> > > There are some in arch-specific code, for example line 1073 of
> > > arch/mips/kernel/smtc.c.  On mips, cpu_relax() is just barrier(), so
> > > the empty loop body is ok provided that atomic_read actually does the
> > > load each time around the loop.
> > 
> > A barrier() is all you need to force the compiler to reread
> > the value.
> > 
> > The people advocating volatile in this thread are talking
> > about code that doesn't use barrier()/cpu_relax().
> 
> Did you look at it?  Here it is:
> 
>   /* Someone else is initializing in parallel - let 'em finish */
>   while (atomic_read(&idle_hook_initialized) < 1000)
>   ;


Honestly, this thread is suffering from HUGE communication gaps.

What Herbert (obviously) meant there was that "this loop could've
been okay _without_ using volatile-semantics-atomic_read() also, if
only it used cpu_relax()".

That does work, because cpu_relax() is _at least_ barrier() on all
archs (on some it also emits some arch-dependent "pause" kind of
instruction).

Now, saying that "MIPS does not have such an instruction so I won't
use cpu_relax() for arch-dependent-busy-while-loops in arch/mips/"
sounds like a wrong argument, because: tomorrow, such arch's _may_
introduce such an instruction, so naturally, at that time we'd
change cpu_relax() appropriately (in reality, we would actually
*re-define* cpu_relax() and ensure that the correct version gets
pulled in depending on whether the callsite code is legacy or only
for the newer such CPUs of said arch, whatever), but loops such as
this would remain un-changed, because they never used cpu_relax()!

OTOH an argument that said the following would've made a stronger case:

"I don't want to use cpu_relax() because that's a full memory
clobber barrier() and I have loop-invariants / other variables
around in that code that I *don't* want the compiler to forget
just because it used cpu_relax(), and hence I will not use
cpu_relax() but instead make my atomic_read() itself have
"volatility" semantics. Not just that, but I will introduce a
cpu_relax_no_barrier() on MIPS, that would be a no-op #define
for now, but which may not be so forever, and continue to use
that in such busy loops."

In general, please read the thread-summary I've tried to do at:
http://lkml.org/lkml/2007/8/17/25
Feel free to continue / comment / correct stuff from there, there's
too much confusion and circular-arguments happening on this thread
otherwise.

[ I might've made an incorrect statement there about
  "volatile" w.r.t. cache on non-x86 archs, I think. ]


Satyam
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Satyam Sharma


On Thu, 16 Aug 2007, Paul E. McKenney wrote:

> On Fri, Aug 17, 2007 at 07:59:02AM +0800, Herbert Xu wrote:
> > On Thu, Aug 16, 2007 at 09:34:41AM -0700, Paul E. McKenney wrote:
> > >
> > > The compiler can also reorder non-volatile accesses.  For an example
> > > patch that cares about this, please see:
> > > 
> > >   http://lkml.org/lkml/2007/8/7/280
> > > 
> > > This patch uses an ORDERED_WRT_IRQ() in rcu_read_lock() and
> > > rcu_read_unlock() to ensure that accesses aren't reordered with respect
> > > to interrupt handlers and NMIs/SMIs running on that same CPU.
> > 
> > Good, finally we have some code to discuss (even though it's
> > not actually in the kernel yet).
> 
> There was some earlier in this thread as well.

Hmm, I never quite got what all this interrupt/NMI/SMI handling and
RCU business you mentioned earlier was all about, but now that you've
pointed to the actual code and issues with it ...


> > First of all, I think this illustrates that what you want
> > here has nothing to do with atomic ops.  The ORDERED_WRT_IRQ
> > macro occurs a lot more times in your patch than atomic
> > reads/sets.  So *assuming* that it was necessary at all,
> > then having an ordered variant of the atomic_read/atomic_set
> > ops could do just as well.
> 
> Indeed.  If I could trust atomic_read()/atomic_set() to cause the compiler
> to maintain ordering, then I could just use them instead of having to
> create an  ORDERED_WRT_IRQ().  (Or ACCESS_ONCE(), as it is called in a
> different patch.)

+#define WHATEVER(x)(*(volatile typeof(x) *)&(x))

I suppose one could want volatile access semantics for stuff that's
a bit-field too, no?

Also, this gives *zero* "re-ordering" guarantees that your code wants
as you've explained it below) -- neither w.r.t. CPU re-ordering (which
probably you don't care about) *nor* w.r.t. compiler re-ordering
(which you definitely _do_ care about).


> > However, I still don't know which atomic_read/atomic_set in
> > your patch would be broken if there were no volatile.  Could
> > you please point them out?
> 
> Suppose I tried replacing the ORDERED_WRT_IRQ() calls with
> atomic_read() and atomic_set().  Starting with __rcu_read_lock():
> 
> o If "ORDERED_WRT_IRQ(__get_cpu_var(rcu_flipctr)[idx])++"
>   was ordered by the compiler after
>   "ORDERED_WRT_IRQ(me->rcu_read_lock_nesting) = nesting + 1", then
>   suppose an NMI/SMI happened after the rcu_read_lock_nesting but
>   before the rcu_flipctr.
> 
>   Then if there was an rcu_read_lock() in the SMI/NMI
>   handler (which is perfectly legal), the nested rcu_read_lock()
>   would believe that it could take the then-clause of the
>   enclosing "if" statement.  But because the rcu_flipctr per-CPU
>   variable had not yet been incremented, an RCU updater would
>   be within its rights to assume that there were no RCU reads
>   in progress, thus possibly yanking a data structure out from
>   under the reader in the SMI/NMI function.
> 
>   Fatal outcome.  Note that only one CPU is involved here
>   because these are all either per-CPU or per-task variables.

Ok, so you don't care about CPU re-ordering. Still, I should let you know
that your ORDERED_WRT_IRQ() -- bad name, btw -- is still buggy. What you
want is a full compiler optimization barrier().

[ Your code probably works now, and emits correct code, but that's
  just because of gcc did what it did. Nothing in any standard,
  or in any documented behaviour of gcc, or anything about the real
  (or expected) semantics of "volatile" is protecting the code here. ]


> o If "ORDERED_WRT_IRQ(me->rcu_read_lock_nesting) = nesting + 1"
>   was ordered by the compiler to follow the
>   "ORDERED_WRT_IRQ(me->rcu_flipctr_idx) = idx", and an NMI/SMI
>   happened between the two, then an __rcu_read_lock() in the NMI/SMI
>   would incorrectly take the "else" clause of the enclosing "if"
>   statement.  If some other CPU flipped the rcu_ctrlblk.completed
>   in the meantime, then the __rcu_read_lock() would (correctly)
>   write the new value into rcu_flipctr_idx.
> 
>   Well and good so far.  But the problem arises in
>   __rcu_read_unlock(), which then decrements the wrong counter.
>   Depending on exactly how subsequent events played out, this could
>   result in either prematurely ending grace periods or never-ending
>   grace periods, both of which are fatal outcomes.
> 
> And the following are not needed in the current version of the
> patch, but will be in a future version that either avoids disabling
> irqs or that dispenses with the smp_read_barrier_depends() that I
> have 99% convinced myself is unneeded:
> 
> o nesting = ORDERED_WRT_IRQ(me->rcu_read_lock_nesting);
> 
> o idx = ORDERED_WRT_IRQ(rcu_ctrlblk.completed) & 0x1;
> 
> Furthermore, in that future version, irq handlers can cause the same
> mischief that SMI/NMI handlers can in this version.
> 
> Next, looking 

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Satyam Sharma


On Fri, 17 Aug 2007, Herbert Xu wrote:

> On Fri, Aug 17, 2007 at 01:43:27PM +1000, Paul Mackerras wrote:
> >
> > The cost of doing so seems to me to be well down in the noise - 44
> > bytes of extra kernel text on a ppc64 G5 config, and I don't believe
> > the extra few cycles for the occasional extra load would be measurable
> > (they should all hit in the L1 dcache).  I don't mind if x86[-64] have
> > atomic_read/set be nonvolatile and find all the missing barriers, but
> > for now on powerpc, I think that not having to find those missing
> > barriers is worth the 0.00076% increase in kernel text size.
> 
> BTW, the sort of missing barriers that triggered this thread
> aren't that subtle.  It'll result in a simple lock-up if the
> loop condition holds upon entry.  At which point it's fairly
> straightforward to find the culprit.

Not necessarily. A barrier-less buggy code such as below:

atomic_set(&v, 0);

... /* some initial code */

while (atomic_read(&v))
;

... /* code that MUST NOT be executed unless v becomes non-zero */

(where v->counter is has no volatile access semantics)

could be generated by the compiler to simply *elid* or *do away* with
the loop itself, thereby making the:

"/* code that MUST NOT be executed unless v becomes non-zero */"

to be executed even when v is zero! That is subtle indeed, and causes
no hard lockups.

Granted, the above IS buggy code. But, the stated objective is to avoid
heisenbugs. And we have driver / subsystem maintainers such as Stefan
coming up and admitting that often a lot of code that's written to use
atomic_read() does assume the read will not be elided by the compiler.

See, I agree, "volatility" semantics != what we often want. However, if
what we want is compiler barrier, for only the object under consideration,
"volatility" semantics aren't really "nonsensical" or anything.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Satyam Sharma
Hi Linus,

[ and others; I think there's a communication gap in a lot of this
  thread, and a little summary would be useful. Hence this posting. ]


On Thu, 16 Aug 2007, Linus Torvalds wrote:

> On Fri, 17 Aug 2007, Paul Mackerras wrote:
> > 
> > I'm really surprised it's as much as a few K.  I tried it on powerpc
> > and it only saved 40 bytes (10 instructions) for a G5 config.
> 
> One of the things that "volatile" generally screws up is a simple
> 
>   volatile int i;
> 
>   i++;
> 
> which a compiler will generally get horribly, horribly wrong.
> 
> [...] For no good reason, except that "volatile" just doesn't have any 
> good/clear semantics for the compiler, so most compilers will just make it 
> be "I will not touch this access in any way, shape, or form". Including 
> even trivially correct instruction optimization/combination.
> 
> This is one of the reasons why we should never use "volatile". It 
> pessimises code generation for no good reason - just because compilers 
> don't know what the heck it even means! 
> [...]
> In other words: "volatile" is a horribly horribly bad way of doing things, 
> because it generates *worse*code*, for no good reason. You just don't see 
> it on powerpc, because it's already a load-store architecture, so there is 
> no "good code" for doing direct-to-memory operations.


True, and I bet *everybody* on this list has already agreed for a very
long time that using "volatile" to type-qualify the _declaration_ of an
object itself as being horribly bad (taste-wise, code-generation-wise,
often even buggy for sitations where real CPU barriers should've been
used instead).

However, the discussion on this thread (IIRC) began with only "giving
volatility semantics" to atomic ops. Now that is different, and may not
require the use the "volatile" keyword (at least not in the declaration
of the object) itself.

Sadly, most arch's *still* do type-qualify the declaration of the
"counter" member of atomic_t as "volatile". This is probably a historic
hangover, and I suspect not yet rectified because of lethargy.

Anyway, some of the variants I can think of are:

[1]

#define atomic_read_volatile(v) \
({  \
forget((v)->counter);   \
((v)->counter); \
})

where:

#define forget(a)   __asm__ __volatile__ ("" :"=m" (a) :"m" (a))

[ This is exactly equivalent to using "+m" in the constraints, as recently
  explained on a GCC list somewhere, in response to the patch in my bitops
  series a few weeks back where I thought "+m" was bogus. ]

[2]

#define atomic_read_volatile(v) (*(volatile int *)&(v)->counter)

This is something that does work. It has reasonably good semantics
guaranteed by the C standard in conjunction with how GCC currently
behaves (and how it has behaved for all supported versions). I haven't
checked if generates much different code than the first variant above,
(it probably will generate similar code to just declaring the object
as volatile, but would still be better in terms of code-clarity and
taste, IMHO), but in any case, we should pick whichever of these variants
works for us and generates good code.

[3]

static inline int atomic_read_volatile(atomic_t *v)
{
... arch-dependent __asm__ __volatile__ stuff ...
}

I can reasonably bet this variant would often generate worse code than
at least the variant "[1]" above.


Now, why do we even require these "volatility" semantics variants?

Note, "volatility" semantics *know* / assume that it can have a meaning
_only_ as far as the compiler, so atomic_read_volatile() doesn't really
care reading stale values from the cache for certain non-x86 archs, etc.

The first argument is "safety":

Use of atomic_read() (possibly in conjunction with other atomic ops) in
a lot of code out there in the kernel *assumes* the compiler will not
optimize away those ops. (which is possible given current definitions
of atomic_set and atomic_read on archs such as x86 in present code).
An additional argument that builds on this one says that by ensuring
the compiler will not elid or coalesce these ops, we could even avoid
potential heisenbugs in the future.

However, there is a counter-argument:

As Herbert Xu has often been making the point, there is *no* bug out
there involving "atomic_read" in busy-while-loops that should not have
a compiler barrier (or cpu_relax() in fact) anyway. As for non-busy-loops,
they would invariable call schedule() at some point (possibly directly)
and thus have an "implicit" compiler barrier by virtue of calling out
a function that is not in scope of the current compilation unit (although
users in sched.c itself would probably require an explicit compiler
barrier).

The second pro-volatility-in-atomic-ops argument is performance:
(surprise!)

Using a full memory clobber compiler barrier in busy loops will disqualify
optimizations fo

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Satyam Sharma


On Thu, 16 Aug 2007, Segher Boessenkool wrote:

> > Here, I should obviously admit that the semantics of *(volatile int *)&
> > aren't any neater or well-defined in the _language standard_ at all. The
> > standard does say (verbatim) "precisely what constitutes as access to
> > object of volatile-qualified type is implementation-defined", but GCC
> > does help us out here by doing the right thing.
> 
> Where do you get that idea?

Try a testcase (experimentally verify).

> GCC manual, section 6.1, "When
> is a Volatile Object Accessed?" doesn't say anything of the
> kind.

True, "implementation-defined" as per the C standard _is_ supposed to mean
"unspecified behaviour where each implementation documents how the choice
is made". So ok, probably GCC isn't "documenting" this
implementation-defined behaviour which it is supposed to, but can't really
fault them much for this, probably.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Satyam Sharma


On Thu, 16 Aug 2007, Segher Boessenkool wrote:

> > Note that "volatile"
> > is a type-qualifier, not a type itself, so a cast of the _object_ itself
> > to a qualified-type i.e. (volatile int) would not make the access itself
> > volatile-qualified.
> 
> There is no such thing as "volatile-qualified access" defined
> anywhere; there only is the concept of a "volatile-qualified
> *object*".

Sure, "volatile-qualified access" was not some standard term I used
there. Just something to mean "an access that would make the compiler
treat the object at that memory as if it were an object with a
volatile-qualified type".

Now the second wording *IS* technically correct, but come on, it's
24 words long whereas the original one was 3 -- and hopefully anybody
reading the shorter phrase *would* have known anyway what was meant,
without having to be pedantic about it :-)


Satyam
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: drivers/infiniband/mlx/mad.c misplaced ;

2007-08-16 Thread Satyam Sharma
Hi Ilpo,


On Thu, 16 Aug 2007, Ilpo Järvinen wrote:

> 
> ...I guess those guys hunting for broken busyloops in the other thread 
> could also benefit from similar searching commands introduced in this 
> thread... ...Ccing Satyam to caught their attention too.
> 
> 
> > On Wed, Aug 15, 2007 at 05:40:11PM -0700, Joe Perches wrote:
> > > 
> > > There's more than a few of these (not inspected).
> > > 
> > > $ egrep -r --include=*.c "\bif[[:space:]]*\([^\)]*\)[[:space:]]*\;" * 
> 
> ...Hmm, I plugged in "a preprocessor" too to manage with non compliant 
> coding styles :-). Please understand that the line numbers are not an 
> exact match due to preprocessor changes:
> 
> $ for i in `find . -name '*.[ch]'`; do echo $i; indent -npro -kr -i8 -ts8 
> -sob -l8000 -ss -ncs -cp1 -nhnl -st $i | egrep -n "[[:space:]]if [(].*[)] ;$";
> done | grep -B1 "^[^.]"

Thanks, looks useful, will check with this.


Satyam

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Satyam Sharma
[ Bill tells me in private communication he gets this already, but I
  think it's more complicated than the shoddy explanation I'd made
  earlier so would wish to make this clearer in detail one last time,
  for the benefit of others listening in or reading the archives. ]


On Thu, 16 Aug 2007, Satyam Sharma wrote:

> On Thu, 16 Aug 2007, Satyam Sharma wrote:
> [...]
> > On Wed, 15 Aug 2007, Bill Fink wrote:
> > > [...]
> > > I'm curious about one minor tangential point.  Why, instead of:
> > > 
> > >   b = *(volatile int *)&a;
> > > 
> > > why can't this just be expressed as:
> > > 
> > >   b = (volatile int)a;
> > > 
> > > Isn't it the contents of a that's volatile, i.e. it's value can change
> > > invisibly to the compiler, and that's why you want to force a read from
> > > memory?  Why do you need the "*(volatile int *)&" construct?
> > 
> > "b = (volatile int)a;" doesn't help us because a cast to a qualified type
> > has the same effect as a cast to an unqualified version of that type, as
> > mentioned in 6.5.4:4 (footnote 86) of the standard. Note that "volatile"
> > is a type-qualifier, not a type itself, so a cast of the _object_ itself
> > to a qualified-type i.e. (volatile int) would not make the access itself
> > volatile-qualified.

Casts don't produce lvalues, and the cast ((volatile int)a) does not
produce the object-int-a-qualified-as-"volatile" -- in fact, the
result of the above cast is whatever is the _value_ of "int a", with
the access to that object having _already_ taken place, as per the
actual type-qualification of the object (that was originally declared
as being _non-volatile_, in fact). Hence, defining atomic_read() as:

#define atomic_read(v)  ((volatile int)((v)->counter))

would be buggy and not give "volatility" semantics at all, unless the
"counter" object itself isn't volatile-qualified already (which it
isn't).

The result of the cast itself being the _value_ of the int object, and
not the object itself (i.e., not an lvalue), is thereby independent of
type-qualification in that cast itself (it just wouldn't make any
difference), hence the "cast to a qualified type has the same effect
as a cast to an unqualified version of that type" bit in section 6.5.4:4
of the standard.


> > To serve our purposes, it is necessary for us to take the address of this
> > (non-volatile) object, cast the resulting _pointer_ to the corresponding
> > volatile-qualified pointer-type, and then dereference it. This makes that
> > particular _access_ be volatile-qualified, without the object itself being
> > such. Also note that the (dereferenced) result is also a valid lvalue and
> > hence can be used in "*(volatile int *)&a = b;" kind of construction
> > (which we use for the atomic_set case).

Dereferencing using the *(pointer-type-cast)& construct, OTOH, serves
us well:

#define atomic_read(v)  (*(volatile int *)&(v)->counter)

Firstly, note that the cast here being (volatile int *) and not
(int * volatile) qualifies the type of the _object_ being pointed to
by the pointer in question as being volatile-qualified, and not the
pointer itself (6.2.5:27 of the standard, and 6.3.2.3:2 allows us to
convert from a pointer-to-non-volatile-qualified-int to a pointer-to-
volatile-qualified-int, which suits us just fine) -- but note that
the _access_ to that address itself has not yet occurred.

_After_ specifying the memory address as containing a volatile-qualified-
int-type object, (and GCC co-operates as mentioned below), we proceed to
dereference it, which is when the _actual access_ occurs, therefore with
"volatility" semantics this time.

Interesting.


> Here, I should obviously admit that the semantics of *(volatile int *)&
> aren't any neater or well-defined in the _language standard_ at all. The
> standard does say (verbatim) "precisely what constitutes as access to
> object of volatile-qualified type is implementation-defined", but GCC
> does help us out here by doing the right thing. Accessing the non-volatile
> object there using the volatile-qualified pointer-type cast makes GCC
> treat the object stored at that memory address itself as if it were a 
> volatile object, thus making the access end up having what we're calling
> as "volatility" semantics here.
> 
> Honestly, given such confusion, and the propensity of the "volatile"
> type-qualifier keyword to be ill-defined (or at least poorly understood,
> often inconsistently implemented), I'd (again) express my opinion that it
> would be best to avoid its usage, given other alternatives do exist.


Satyam
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Satyam Sharma


On Thu, 16 Aug 2007, Satyam Sharma wrote:

> Hi Bill,
> 
> 
> On Wed, 15 Aug 2007, Bill Fink wrote:
> 
> > On Wed, 15 Aug 2007, Satyam Sharma wrote:
> > 
> > > (C)
> > > $ cat tp3.c
> > > int a;
> > > 
> > > void func(void)
> > > {
> > >   *(volatile int *)&a = 10;
> > >   *(volatile int *)&a = 20;
> > > }
> > > $ gcc -Os -S tp3.c
> > > $ cat tp3.s
> > > ...
> > > movl$10, a
> > > movl$20, a
> > > ...
> > 
> > I'm curious about one minor tangential point.  Why, instead of:
> > 
> > b = *(volatile int *)&a;
> > 
> > why can't this just be expressed as:
> > 
> > b = (volatile int)a;
> > 
> > Isn't it the contents of a that's volatile, i.e. it's value can change
> > invisibly to the compiler, and that's why you want to force a read from
> > memory?  Why do you need the "*(volatile int *)&" construct?
> 
> "b = (volatile int)a;" doesn't help us because a cast to a qualified type
> has the same effect as a cast to an unqualified version of that type, as
> mentioned in 6.5.4:4 (footnote 86) of the standard. Note that "volatile"
> is a type-qualifier, not a type itself, so a cast of the _object_ itself
> to a qualified-type i.e. (volatile int) would not make the access itself
> volatile-qualified.
> 
> To serve our purposes, it is necessary for us to take the address of this
> (non-volatile) object, cast the resulting _pointer_ to the corresponding
> volatile-qualified pointer-type, and then dereference it. This makes that
> particular _access_ be volatile-qualified, without the object itself being
> such. Also note that the (dereferenced) result is also a valid lvalue and
> hence can be used in "*(volatile int *)&a = b;" kind of construction
> (which we use for the atomic_set case).

Here, I should obviously admit that the semantics of *(volatile int *)&
aren't any neater or well-defined in the _language standard_ at all. The
standard does say (verbatim) "precisely what constitutes as access to
object of volatile-qualified type is implementation-defined", but GCC
does help us out here by doing the right thing. Accessing the non-volatile
object there using the volatile-qualified pointer-type cast makes GCC
treat the object stored at that memory address itself as if it were a 
volatile object, thus making the access end up having what we're calling
as "volatility" semantics here.

Honestly, given such confusion, and the propensity of the "volatile"
type-qualifier keyword to be ill-defined (or at least poorly understood,
often inconsistently implemented), I'd (again) express my opinion that it
would be best to avoid its usage, given other alternatives do exist.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Satyam Sharma
Hi Bill,


On Wed, 15 Aug 2007, Bill Fink wrote:

> On Wed, 15 Aug 2007, Satyam Sharma wrote:
> 
> > (C)
> > $ cat tp3.c
> > int a;
> > 
> > void func(void)
> > {
> > *(volatile int *)&a = 10;
> > *(volatile int *)&a = 20;
> > }
> > $ gcc -Os -S tp3.c
> > $ cat tp3.s
> > ...
> > movl$10, a
> > movl$20, a
> > ...
> 
> I'm curious about one minor tangential point.  Why, instead of:
> 
>   b = *(volatile int *)&a;
> 
> why can't this just be expressed as:
> 
>   b = (volatile int)a;
> 
> Isn't it the contents of a that's volatile, i.e. it's value can change
> invisibly to the compiler, and that's why you want to force a read from
> memory?  Why do you need the "*(volatile int *)&" construct?

"b = (volatile int)a;" doesn't help us because a cast to a qualified type
has the same effect as a cast to an unqualified version of that type, as
mentioned in 6.5.4:4 (footnote 86) of the standard. Note that "volatile"
is a type-qualifier, not a type itself, so a cast of the _object_ itself
to a qualified-type i.e. (volatile int) would not make the access itself
volatile-qualified.

To serve our purposes, it is necessary for us to take the address of this
(non-volatile) object, cast the resulting _pointer_ to the corresponding
volatile-qualified pointer-type, and then dereference it. This makes that
particular _access_ be volatile-qualified, without the object itself being
such. Also note that the (dereferenced) result is also a valid lvalue and
hence can be used in "*(volatile int *)&a = b;" kind of construction
(which we use for the atomic_set case).


Satyam
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Satyam Sharma


On Thu, 16 Aug 2007, Satyam Sharma wrote:

> On Thu, 16 Aug 2007, Paul Mackerras wrote:
> > Herbert Xu writes:
> > 
> > > See sk_stream_mem_schedule in net/core/stream.c:
> > > 
> > > /* Under limit. */
> > > if (atomic_read(sk->sk_prot->memory_allocated) < 
> > > sk->sk_prot->sysctl_mem[0]) {
> > > if (*sk->sk_prot->memory_pressure)
> > > *sk->sk_prot->memory_pressure = 0;
> > > return 1;
> > > }
> > > 
> > > /* Over hard limit. */
> > > if (atomic_read(sk->sk_prot->memory_allocated) > 
> > > sk->sk_prot->sysctl_mem[2]) {
> > > sk->sk_prot->enter_memory_pressure();
> > > goto suppress_allocation;
> > > }
> > > 
> > > We don't need to reload sk->sk_prot->memory_allocated here.
> > 
> > Are you sure?  How do you know some other CPU hasn't changed the value
> > in between?
> 
> I can't speak for this particular case, but there could be similar code
> examples elsewhere, where we do the atomic ops on an atomic_t object
> inside a higher-level locking scheme that would take care of the kind of
> problem you're referring to here. It would be useful for such or similar
> code if the compiler kept the value of that atomic object in a register.

We might not be using atomic_t (and ops) if we already have a higher-level
locking scheme, actually. So as Herbert mentioned, such cases might just
not care. [ Too much of this thread, too little sleep, sorry! ]

Anyway, the problem, of course, is that this conversion to a stronger /
safer-by-default behaviour doesn't happen with zero cost to performance.
Converting atomic ops to "volatile" behaviour did add ~2K to kernel text
for archs such as i386 (possibly to important codepaths) that didn't have
those semantics already so it would be constructive to actually look at
those differences and see if there were really any heisenbugs that got
rectified. Or if there were legitimate optimizations that got wrongly
disabled. Onus lies on those proposing the modifications, I'd say ;-)
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Satyam Sharma


On Thu, 16 Aug 2007, Paul Mackerras wrote:

> Herbert Xu writes:
> 
> > See sk_stream_mem_schedule in net/core/stream.c:
> > 
> > /* Under limit. */
> > if (atomic_read(sk->sk_prot->memory_allocated) < 
> > sk->sk_prot->sysctl_mem[0]) {
> > if (*sk->sk_prot->memory_pressure)
> > *sk->sk_prot->memory_pressure = 0;
> > return 1;
> > }
> > 
> > /* Over hard limit. */
> > if (atomic_read(sk->sk_prot->memory_allocated) > 
> > sk->sk_prot->sysctl_mem[2]) {
> > sk->sk_prot->enter_memory_pressure();
> > goto suppress_allocation;
> > }
> > 
> > We don't need to reload sk->sk_prot->memory_allocated here.
> 
> Are you sure?  How do you know some other CPU hasn't changed the value
> in between?

I can't speak for this particular case, but there could be similar code
examples elsewhere, where we do the atomic ops on an atomic_t object
inside a higher-level locking scheme that would take care of the kind of
problem you're referring to here. It would be useful for such or similar
code if the compiler kept the value of that atomic object in a register.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Satyam Sharma


On Wed, 15 Aug 2007, Christoph Lameter wrote:

> On Wed, 15 Aug 2007, Paul E. McKenney wrote:
> 
> > Understood.  My point is not that the impact is precisely zero, but
> > rather that the impact on optimization is much less hurtful than the
> > problems that could arise otherwise, particularly as compilers become
> > more aggressive in their optimizations.
> 
> The problems arise because barriers are not used as required. Volatile 
> has wishy washy semantics and somehow marries memory barriers with data 
> access. It is clearer to separate the two. Conceptual cleanness usually 
> translates into better code. If one really wants the volatile then lets 
> make it explicit and use
> 
>   atomic_read_volatile()

Completely agreed, again. To summarize again (had done so about ~100 mails
earlier in this thread too :-) ...

atomic_{read,set}_volatile() -- guarantees volatility also along with
atomicity (the two _are_ different concepts after all, irrespective of
whether callsites normally want one with the other or not)

atomic_{read,set}_nonvolatile() -- only guarantees atomicity, compiler
free to elid / coalesce / optimize such accesses, can keep the object
in question cached in a local register, leads to smaller text, etc.

As to which one should be the default atomic_read() is a question of
whether majority of callsites (more weightage to important / hot
codepaths, lesser to obscure callsites) want a particular behaviour.

Do we have a consensus here? (hoping against hope, probably :-)


[ This thread has gotten completely out of hand ... for my mail client
  alpine as well, it now seems. Reminds of that 1000+ GPLv3 fest :-) ]
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Satyam Sharma


On Wed, 15 Aug 2007, Segher Boessenkool wrote:

> [...]
> > BTW:
> > 
> > #define atomic_read(a)  (*(volatile int *)&(a))
> > #define atomic_set(a,i) (*(volatile int *)&(a) = (i))
> > 
> > int a;
> > 
> > void func(void)
> > {
> > int b;
> > 
> > b = atomic_read(a);
> > atomic_set(a, 20);
> > b = atomic_read(a);
> > }
> > 
> > gives:
> > 
> > func:
> > pushl   %ebp
> > movla, %eax
> > movl%esp, %ebp
> > movl$20, a
> > movla, %eax
> > popl%ebp
> > ret
> > 
> > so the first atomic_read() wasn't optimized away.
> 
> Of course.  It is executed by the abstract machine, so
> it will be executed by the actual machine.  On the other
> hand, try
> 
>   b = 0;
>   if (b)
>   b = atomic_read(a);
> 
> or similar.

Yup, obviously. Volatile accesses (or any access to volatile objects),
or even "__volatile__ asms" (which gcc normally promises never to elid)
can always be optimized for cases such as these where the compiler can
trivially determine that the code in question is not reachable.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Satyam Sharma
Hi Herbert,


On Thu, 16 Aug 2007, Herbert Xu wrote:

> On Thu, Aug 16, 2007 at 06:28:42AM +0530, Satyam Sharma wrote:
> >
> > > The udelay itself certainly should have some form of cpu_relax in it.
> > 
> > Yes, a form of barrier() must be present in mdelay() or udelay() itself
> > as you say, having it in __const_udelay() is *not* enough (superflous
> > actually, considering it is already a separate translation unit and
> > invisible to the compiler).
> 
> As long as __const_udelay does something which has the same
> effect as barrier it is enough even if it's in the same unit.

Only if __const_udelay() is inlined. But as I said, __const_udelay()
-- although marked "inline" -- will never be inlined anywhere in the
kernel in reality. It's an exported symbol, and never inlined from
modules. Even from built-in targets, the definition of __const_udelay
is invisible when gcc is compiling the compilation units of those
callsites. The compiler has no idea that that function has barriers
or not, so we're saved here _only_ by the lucky fact that
__const_udelay() is in a different compilation unit.


> As a matter of fact it does on i386 where __delay either uses
> rep_nop or asm/volatile.

__delay() can be either delay_tsc() or delay_loop() on i386.

delay_tsc() uses the rep_nop() there for it's own little busy
loop, actually. But for a call site that inlines __const_udelay()
-- if it were ever moved to a .h file and marked inline -- the
call to __delay() will _still_ be across compilation units. So,
again for this case, it does not matter if the callee function
has compiler barriers or not (it would've been a different story
if we were discussing real/CPU barriers, I think), what saves us
here is just the fact that a call is made to a function from a
different compilation unit, which is invisible to the compiler
when compiling the callsite, and hence acting as the compiler
barrier.

Regarding delay_loop(), it uses "volatile" for the "asm" which
has quite different semantics from the C language "volatile"
type-qualifier keyword and does not imply any compiler barrier
at all.


Satyam
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Satyam Sharma
[ Sorry for empty subject line in previous mail. I intended to make
  a patch so cleared it to change it, but ultimately neither made
  a patch nor restored subject line. Done that now. ]


On Thu, 16 Aug 2007, Herbert Xu wrote:

> On Thu, Aug 16, 2007 at 06:06:00AM +0530, Satyam Sharma wrote:
> > 
> > that are:
> > 
> > while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
> > mdelay(1);
> > msecs--;
> > }
> > 
> > where mdelay() becomes __const_udelay() which happens to be in another
> > translation unit (arch/i386/lib/delay.c) and hence saves this callsite
> > from being a bug :-)
> 
> The udelay itself certainly should have some form of cpu_relax in it.

Yes, a form of barrier() must be present in mdelay() or udelay() itself
as you say, having it in __const_udelay() is *not* enough (superflous
actually, considering it is already a separate translation unit and
invisible to the compiler).

However, there are no compiler barriers on the macro-definition-path
between mdelay(1) and __const_udelay(), so the only thing that saves us
from being a bug here is indeed the different-translation-unit concept.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()

2007-08-15 Thread Satyam Sharma


On Wed, 15 Aug 2007, Heiko Carstens wrote:

> [...]
> Btw.: we still have
> 
> include/asm-i386/mach-es7000/mach_wakecpu.h:  while (!atomic_read(deassert));
> include/asm-i386/mach-default/mach_wakecpu.h: while (!atomic_read(deassert));
> 
> Looks like they need to be fixed as well.


[PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()

Use cpu_relax() in the busy loops, as atomic_read() doesn't automatically
imply volatility for i386 and x86_64. x86_64 doesn't have this issue because
it open-codes the while loop in smpboot.c:smp_callin() itself that already
uses cpu_relax().

For i386, however, smpboot.c:smp_callin() calls wait_for_init_deassert()
which is buggy for mach-default and mach-es7000 cases.

[ I test-built a kernel -- smp_callin() itself got inlined in its only
  callsite, smpboot.c:start_secondary() -- and the relevant piece of
  code disassembles to the following:

0xc1019704 :mov0xc144c4c8,%eax
0xc1019709 :test   %eax,%eax
0xc101970b :je 0xc1019709 

  init_deasserted (at 0xc144c4c8) gets fetched into %eax only once and
  then we loop over the test of the stale value in the register only,
  so these look like real bugs to me. With the fix below, this becomes:

0xc1019706 :pause
0xc1019708 :cmpl   $0x0,0xc144c4c8
0xc101970f :je 0xc1019706 

  which looks nice and healthy. ]

Thanks to Heiko Carstens for noticing this.

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>

---

 include/asm-i386/mach-default/mach_wakecpu.h |3 ++-
 include/asm-i386/mach-es7000/mach_wakecpu.h  |3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/asm-i386/mach-default/mach_wakecpu.h 
b/include/asm-i386/mach-default/mach_wakecpu.h
index 673b85c..3ebb178 100644
--- a/include/asm-i386/mach-default/mach_wakecpu.h
+++ b/include/asm-i386/mach-default/mach_wakecpu.h
@@ -15,7 +15,8 @@
 
 static inline void wait_for_init_deassert(atomic_t *deassert)
 {
-   while (!atomic_read(deassert));
+   while (!atomic_read(deassert))
+   cpu_relax();
return;
 }
 
diff --git a/include/asm-i386/mach-es7000/mach_wakecpu.h 
b/include/asm-i386/mach-es7000/mach_wakecpu.h
index efc903b..84ff583 100644
--- a/include/asm-i386/mach-es7000/mach_wakecpu.h
+++ b/include/asm-i386/mach-es7000/mach_wakecpu.h
@@ -31,7 +31,8 @@ wakeup_secondary_cpu(int phys_apicid, unsigned long start_eip)
 static inline void wait_for_init_deassert(atomic_t *deassert)
 {
 #ifdef WAKE_SECONDARY_VIA_INIT
-   while (!atomic_read(deassert));
+   while (!atomic_read(deassert))
+   cpu_relax();
 #endif
return;
 }
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[no subject]

2007-08-15 Thread Satyam Sharma


On Wed, 15 Aug 2007, Segher Boessenkool wrote:

> > > > What you probably mean is that the compiler has to assume any code
> > > > it cannot currently see can do anything (insofar as allowed by the
> > > > relevant standards etc.)
> > 
> > I think this was just terminology confusion here again. Isn't "any code
> > that it cannot currently see" the same as "another compilation unit",
> 
> It is not; try  gcc -combine  or the upcoming link-time optimisation
> stuff, for example.
> 
> > and wouldn't the "compilation unit" itself expand if we ask gcc to
> > compile more than one unit at once? Or is there some more specific
> > "definition" for "compilation unit" (in gcc lingo, possibly?)
> 
> "compilation unit" is a C standard term.  It typically boils down
> to "single .c file".

As you mentioned later, "single .c file with all the other files (headers
or other .c files) that it pulls in via #include" is actually "translation
unit", both in the C standard as well as gcc docs. "Compilation unit"
doesn't seem to be nearly as standard a term, though in most places it
is indeed meant to be same as "translation unit", but with the new gcc
inter-module-analysis stuff that you referred to above, I suspect one may
reasonably want to call a "compilation unit" as all that the compiler sees
at a given instant.

BTW I did some auditing (only inside include/asm-{i386,x86_64}/ and
arch/{i386,x86_64}/) and found a couple more callsites that don't use
cpu_relax():

arch/i386/kernel/crash.c:101
arch/x86_64/kernel/crash.c:97

that are:

while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
mdelay(1);
msecs--;
}

where mdelay() becomes __const_udelay() which happens to be in another
translation unit (arch/i386/lib/delay.c) and hence saves this callsite
from being a bug :-)

Curiously, __const_udelay() is still marked as "inline" where it is
implemented in lib/delay.c which is weird, considering it won't ever
be inlined, would it? With the kernel presently being compiled one
translation unit at a time, I don't see how the implementation would
be visible to any callsite out there to be able to inline it.


Satyam
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Satyam Sharma
[ The Cc: list scares me. Should probably trim it. ]


On Wed, 15 Aug 2007, Paul E. McKenney wrote:

> On Wed, Aug 15, 2007 at 08:31:25PM +0200, Segher Boessenkool wrote:
> > >>How does the compiler know that msleep() has got barrier()s?
> > >
> > >Because msleep_interruptible() is in a separate compilation unit,
> > >the compiler has to assume that it might modify any arbitrary global.
> > 
> > No; compilation units have nothing to do with it, GCC can optimise
> > across compilation unit boundaries just fine, if you tell it to
> > compile more than one compilation unit at once.
> 
> Last I checked, the Linux kernel build system did compile each .c file
> as a separate compilation unit.
> 
> > What you probably mean is that the compiler has to assume any code
> > it cannot currently see can do anything (insofar as allowed by the
> > relevant standards etc.)

I think this was just terminology confusion here again. Isn't "any code
that it cannot currently see" the same as "another compilation unit",
and wouldn't the "compilation unit" itself expand if we ask gcc to
compile more than one unit at once? Or is there some more specific
"definition" for "compilation unit" (in gcc lingo, possibly?)
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Satyam Sharma


On Wed, 15 Aug 2007, Segher Boessenkool wrote:

> > "Volatile behaviour" itself isn't consistently defined (at least
> > definitely not consistently implemented in various gcc versions across
> > platforms),
> 
> It should be consistent across platforms; if not, file a bug please.
> 
> > but it is /expected/ to mean something like: "ensure that
> > every such access actually goes all the way to memory, and is not
> > re-ordered w.r.t. to other accesses, as far as the compiler can take
  ^
  (volatile)

(or, alternatively, "other accesses to the same volatile object" ...)

> > care of these". The last "as far as compiler can take care" disclaimer
> > comes about due to CPUs doing their own re-ordering nowadays.
> 
> You can *expect* whatever you want, but this isn't in line with
> reality at all.
> 
> volatile _does not_ prevent reordering wrt other accesses.
> [...]
> What volatile does are a) never optimise away a read (or write)
> to the object, since the data can change in ways the compiler
> cannot see; and b) never move stores to the object across a
> sequence point.  This does not mean other accesses cannot be
> reordered wrt the volatile access.
> 
> If the abstract machine would do an access to a volatile-
> qualified object, the generated machine code will do that
> access too.  But, for example, it can still be optimised
> away by the compiler, if it can prove it is allowed to.

As (now) indicated above, I had meant multiple volatile accesses to
the same object, obviously.

BTW:

#define atomic_read(a)  (*(volatile int *)&(a))
#define atomic_set(a,i) (*(volatile int *)&(a) = (i))

int a;

void func(void)
{
int b;

b = atomic_read(a);
atomic_set(a, 20);
b = atomic_read(a);
}

gives:

func:
pushl   %ebp
movla, %eax
movl%esp, %ebp
movl$20, a
movla, %eax
popl%ebp
ret

so the first atomic_read() wasn't optimized away.


> volatile _does not_ make accesses go all the way to memory.
> [...]
> If you want stuff to go all the way to memory, you need some
> architecture-specific flush sequence; to make a store globally
> visible before another store, you need mb(); before some following
> read, you need mb(); to prevent reordering you need a barrier.

Sure, which explains the "as far as the compiler can take care" bit.
Poor phrase / choice of words, probably.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Satyam Sharma


On Wed, 15 Aug 2007, Paul E. McKenney wrote:

> On Wed, Aug 15, 2007 at 10:48:28PM +0530, Satyam Sharma wrote:
> > [...]
> > Not for i386 and x86_64 -- those have atomic ops without any "volatile"
> > semantics (currently as per existing definitions).
> 
> I claim unit volumes with arm, and the majority of the architectures, but
> I cannot deny the popularity of i386 and x86_64 with many developers.  ;-)

Hmm, does arm really need that "volatile int counter;"? Hopefully RMK will
take a patch removing that "volatile" ... ;-)


> However, I am not aware of code in the kernel that would benefit
> from the compiler coalescing multiple atomic_set() and atomic_read()
> invocations, thus I don't see the downside to volatility in this case.
> Are there some performance-critical code fragments that I am missing?

I don't know, and yes, code with multiple atomic_set's and atomic_read's
getting optimized or coalesced does sound strange to start with. Anyway,
I'm not against "volatile semantics" per se. As replied elsewhere, I do
appreciate the motivation behind this series (to _avoid_ gotchas, not to
fix existing ones). Just that I'd like to avoid using "volatile", for
aforementioned reasons, especially given that there are perfectly
reasonable alternatives to achieve the same desired behaviour.


Satyam
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Satyam Sharma
Hi Paul,


On Wed, 15 Aug 2007, Paul E. McKenney wrote:

> On Wed, Aug 15, 2007 at 07:17:29PM +0530, Satyam Sharma wrote:
> > [...]
> > No, I'd actually prefer something like what Christoph Lameter suggested,
> > i.e. users (such as above) who want "volatile"-like behaviour from atomic
> > ops can use alternative functions. How about something like:
> > 
> > #define atomic_read_volatile(v) \
> > ({  \
> > forget(&(v)->counter);  \
> > ((v)->counter); \
> > })
> 
> Wouldn't the above "forget" the value, throw it away, then forget
> that it forgot it, giving non-volatile semantics?

Nope, I don't think so. I wrote the following trivial testcases:
[ See especially tp4.c and tp4.s (last example). ]

==
$ cat tp1.c # Using volatile access casts

#define atomic_read(a)  (*(volatile int *)&a)

int a;

void func(void)
{
a = 0;
while (atomic_read(a))
;
}
==
$ gcc -Os -S tp1.c; cat tp1.s

func:
pushl   %ebp
movl%esp, %ebp
movl$0, a
.L2:
movla, %eax
testl   %eax, %eax
jne .L2
popl%ebp
ret
==
$ cat tp2.c # Using nothing; gcc will optimize the whole loop away

#define forget(x)

#define atomic_read(a)  \
({  \
forget(&(a));   \
(a);\
})

int a;

void func(void)
{
a = 0;
while (atomic_read(a))
;
}
==
$ gcc -Os -S tp2.c; cat tp2.s

func:
pushl   %ebp
movl%esp, %ebp
popl%ebp
movl$0, a
ret
==
$ cat tp3.c # Using a full memory clobber barrier

#define forget(x)   asm volatile ("":::"memory")

#define atomic_read(a)  \
({  \
forget(&(a));   \
(a);\
})

int a;

void func(void)
{
a = 0;
while (atomic_read(a))
;
}
==
$ gcc -Os -S tp3.c; cat tp3.s

func:
pushl   %ebp
movl%esp, %ebp
movl$0, a
.L2:
cmpl$0, a
jne .L2
popl%ebp
ret
==
$ cat tp4.c # Using a forget(var) macro

#define forget(a)   __asm__ __volatile__ ("" :"=m" (a) :"m" (a))

#define atomic_read(a)  \
({  \
forget(a);  \
(a);\
})

int a;

void func(void)
{
a = 0;
while (atomic_read(a))
;
}
==
$ gcc -Os -S tp4.c; cat tp4.s

func:
pushl   %ebp
movl%esp, %ebp
movl$0, a
.L2:
cmpl$0, a
jne .L2
popl%ebp
ret
==


Possibly these were too trivial to expose any potential problems that you
may have been referring to, so would be helpful if you could write a more
concrete example / sample code.


> > Or possibly, implement these "volatile" atomic ops variants in inline asm
> > like the patch that Sebastian Siewior has submitted on another thread just
> > a while back.
> 
> Given that you are advocating a change (please keep in mind that
> atomic_read() and atomic_set() had volatile semantics on almost all
> platforms), care to give some example where these historical volatile
> semantics are causing a problem?
> [...]
> Can you give even one example
> where the pre-existing volatile semantics are causing enough of a problem
> to justify adding yet more atomic_*() APIs?

Will take this to the other sub-thread ...


> > Of course, if we find there are more callers in the kernel who want the
> > volatility behaviour than those who don't care, we can re-define the
> > existing ops to such variants, and re-name the existing definitions to
> > somethine else, say "atomic_read_nonvolatile" for all I care.
> 
> Do we really need another set of APIs?

Well, if there's one set of users who do care about volatile behaviour,
and another set that 

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Satyam Sharma


On Wed, 15 Aug 2007, Paul E. McKenney wrote:

> On Wed, Aug 15, 2007 at 11:33:36PM +0800, Herbert Xu wrote:
> > On Wed, Aug 15, 2007 at 07:25:16AM -0700, Paul E. McKenney wrote:
> > > 
> > > Do we really need another set of APIs?  Can you give even one example
> > > where the pre-existing volatile semantics are causing enough of a problem
> > > to justify adding yet more atomic_*() APIs?
> > 
> > Let's turn this around.  Can you give a single example where
> > the volatile semantics is needed in a legitimate way?
> 
> Sorry, but you are the one advocating for the change.

Not for i386 and x86_64 -- those have atomic ops without any "volatile"
semantics (currently as per existing definitions).
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Satyam Sharma


On Wed, 15 Aug 2007, Paul E. McKenney wrote:

> On Wed, Aug 15, 2007 at 06:09:35PM +0200, Stefan Richter wrote:
> > Herbert Xu wrote:
> > > On Wed, Aug 15, 2007 at 08:05:38PM +0530, Satyam Sharma wrote:
> > >>> I don't know if this here is affected:
> > 
> > [...something like]
> > b = atomic_read(a);
> > for (i = 0; i < 4; i++) {
> > msleep_interruptible(63);
> > c = atomic_read(a);
> > if (c != b) {
> > b = c;
> > i = 0;
> > }
> > }
> > 
> > > Nope, we're calling schedule which is a rather heavy-weight
> > > barrier.
> > 
> > How does the compiler know that msleep() has got barrier()s?
> 
> Because msleep_interruptible() is in a separate compilation unit,
> the compiler has to assume that it might modify any arbitrary global.
> In many cases, the compiler also has to assume that msleep_interruptible()
> might call back into a function in the current compilation unit, thus
> possibly modifying global static variables.

Yup, I've just verified this with a testcase. So a call to any function
outside of the current compilation unit acts as a compiler barrier. Cool.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Satyam Sharma
Hi Stefan,


On Wed, 15 Aug 2007, Stefan Richter wrote:

> On 8/15/2007 10:18 AM, Heiko Carstens wrote:
> > On Wed, Aug 15, 2007 at 02:49:03PM +0800, Herbert Xu wrote:
> >> Chris Snook <[EMAIL PROTECTED]> wrote:
> >> > 
> >> > Because atomic operations are generally used for synchronization, which 
> >> > requires 
> >> > volatile behavior.  Most such codepaths currently use an inefficient 
> >> > barrier(). 
> >> >  Some forget to and we get bugs, because people assume that 
> >> > atomic_read() 
> >> > actually reads something, and atomic_write() actually writes something.  
> >> > Worse, 
> >> > these are architecture-specific, even compiler version-specific bugs 
> >> > that are 
> >> > often difficult to track down.
> >> 
> >> I'm yet to see a single example from the current tree where
> >> this patch series is the correct solution.  So far the only
> >> example has been a buggy piece of code which has since been
> >> fixed with a cpu_relax.
> > 
> > Btw.: we still have
> > 
> > include/asm-i386/mach-es7000/mach_wakecpu.h:  while 
> > (!atomic_read(deassert));
> > include/asm-i386/mach-default/mach_wakecpu.h: while 
> > (!atomic_read(deassert));
> > 
> > Looks like they need to be fixed as well.
> 
> 
> I don't know if this here is affected:

Yes, I think it is. You're clearly expecting the read to actually happen
when you call get_hpsb_generation(). It's clearly not a busy-loop, so
cpu_relax() sounds pointless / wrong solution for this case, so I'm now
somewhat beginning to appreciate the motivation behind this series :-)

But as I said, there are ways to achieve the same goals of this series
without using "volatile".

I think I'll submit a RFC/patch or two on this myself (will also fix
the code pieces listed here).


> /* drivers/ieee1394/ieee1394_core.h */
> static inline unsigned int get_hpsb_generation(struct hpsb_host *host)
> {
>   return atomic_read(&host->generation);
> }
> 
> /* drivers/ieee1394/nodemgr.c */
> static int nodemgr_host_thread(void *__hi)
> {
>   [...]
> 
>   for (;;) {
>   [... sleep until bus reset event ...]
> 
>   /* Pause for 1/4 second in 1/16 second intervals,
>* to make sure things settle down. */
>   g = get_hpsb_generation(host);
>   for (i = 0; i < 4 ; i++) {
>   if (msleep_interruptible(63) ||
>   kthread_should_stop())
>   goto exit;

Totally unrelated, but this looks weird. IMHO you actually wanted:

msleep_interruptible(63);
if (kthread_should_stop())
goto exit;

here, didn't you? Otherwise the thread will exit even when
kthread_should_stop() != TRUE (just because it received a signal),
and it is not good for a kthread to exit on its own if it uses
kthread_should_stop() or if some other piece of kernel code could
eventually call kthread_stop(tsk) on it.

Ok, probably the thread will never receive a signal in the first
place because it's spawned off kthreadd which ignores all signals
beforehand, but still ...

[PATCH] ieee1394: Fix kthread stopping in nodemgr_host_thread

The nodemgr host thread can exit on its own even when kthread_should_stop
is not true, on receiving a signal (might never happen in practice, as
it ignores signals). But considering kthread_stop() must not be mixed with
kthreads that can exit on their own, I think changing the code like this
is clearer. This change means the thread can cut its sleep short when
receive a signal but looking at the code around, that sounds okay (and
again, it might never actually recieve a signal in practice).

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>

---

 drivers/ieee1394/nodemgr.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/ieee1394/nodemgr.c b/drivers/ieee1394/nodemgr.c
index 2ffd534..981a7da 100644
--- a/drivers/ieee1394/nodemgr.c
+++ b/drivers/ieee1394/nodemgr.c
@@ -1721,7 +1721,8 @@ static int nodemgr_host_thread(void *__hi)
 * to make sure things settle down. */
g = get_hpsb_generation(host);
for (i = 0; i < 4 ; i++) {
-   if (msleep_interruptible(63) || kthread_should_stop())
+   msleep_interruptible(63);
+   if (kthread_should_stop())
goto exit;
 
/* Now get the generation in which the node ID's we 
collect
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Satyam Sharma


On Wed, 15 Aug 2007, Stefan Richter wrote:

> Satyam Sharma wrote:
> > On Wed, 15 Aug 2007, Stefan Richter wrote:
> >> Doesn't "atomic WRT all processors" require volatility?
> > 
> > No, it definitely doesn't. Why should it?
> > 
> > "Atomic w.r.t. all processors" is just your normal, simple "atomicity"
> > for SMP systems (ensure that that object is modified / set / replaced
> > in main memory atomically) and has nothing to do with "volatile"
> > behaviour.
> > 
> > "Volatile behaviour" itself isn't consistently defined (at least
> > definitely not consistently implemented in various gcc versions across
> > platforms), but it is /expected/ to mean something like: "ensure that
> > every such access actually goes all the way to memory, and is not
> > re-ordered w.r.t. to other accesses, as far as the compiler can take
> > care of these". The last "as far as compiler can take care" disclaimer
> > comes about due to CPUs doing their own re-ordering nowadays.
> > 
> > For example (say on i386):
> 
> [...]
> 
> > In (A) the compiler optimized "a = 10;" away, but the actual store
> > of the final value "20" to "a" was still "atomic". (B) and (C) also
> > exhibit "volatile" behaviour apart from the "atomicity".
> > 
> > But as others replied, it seems some callers out there depend upon
> > atomic ops exhibiting "volatile" behaviour as well, so that answers
> > my initial question, actually. I haven't looked at the code Paul
> > pointed me at, but I wonder if that "forget(x)" macro would help
> > those cases. I'd wish to avoid the "volatile" primitive, personally.
> 
> So, looking at load instead of store, understand I correctly that in
> your opinion
> 
>   int b;
> 
>   b = atomic_read(&a);
>   if (b)
>   do_something_time_consuming();
> 
>   b = atomic_read(&a);
>   if (b)
>   do_something_more();
> 
> should be changed to explicitly forget(&a) after
> do_something_time_consuming?

No, I'd actually prefer something like what Christoph Lameter suggested,
i.e. users (such as above) who want "volatile"-like behaviour from atomic
ops can use alternative functions. How about something like:

#define atomic_read_volatile(v) \
({  \
forget(&(v)->counter);  \
((v)->counter); \
})

Or possibly, implement these "volatile" atomic ops variants in inline asm
like the patch that Sebastian Siewior has submitted on another thread just
a while back.

Of course, if we find there are more callers in the kernel who want the
volatility behaviour than those who don't care, we can re-define the
existing ops to such variants, and re-name the existing definitions to
somethine else, say "atomic_read_nonvolatile" for all I care.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Satyam Sharma


On Wed, 15 Aug 2007, Stefan Richter wrote:

> Satyam Sharma wrote:
> > [ BTW, why do we want the compiler to not optimize atomic_read()'s in
> >   the first place? Atomic ops guarantee atomicity, which has nothing
> >   to do with "volatility" -- users that expect "volatility" from
> >   atomic ops are the ones who must be fixed instead, IMHO. ]
> 
> LDD3 says on page 125:  "The following operations are defined for the
> type [atomic_t] and are guaranteed to be atomic with respect to all
> processors of an SMP computer."
> 
> Doesn't "atomic WRT all processors" require volatility?

No, it definitely doesn't. Why should it?

"Atomic w.r.t. all processors" is just your normal, simple "atomicity"
for SMP systems (ensure that that object is modified / set / replaced
in main memory atomically) and has nothing to do with "volatile"
behaviour.

"Volatile behaviour" itself isn't consistently defined (at least
definitely not consistently implemented in various gcc versions across
platforms), but it is /expected/ to mean something like: "ensure that
every such access actually goes all the way to memory, and is not
re-ordered w.r.t. to other accesses, as far as the compiler can take
care of these". The last "as far as compiler can take care" disclaimer
comes about due to CPUs doing their own re-ordering nowadays.

For example (say on i386):

(A)
$ cat tp1.c
int a;

void func(void)
{
a = 10;
a = 20;
}
$ gcc -Os -S tp1.c
$ cat tp1.s
...
movl$20, a
...

(B)
$ cat tp2.c
volatile int a;

void func(void)
{
a = 10;
a = 20;
}
$ gcc -Os -S tp2.c
$ cat tp2.s
...
movl$10, a
movl$20, a
...

(C)
$ cat tp3.c
int a;

void func(void)
{
*(volatile int *)&a = 10;
*(volatile int *)&a = 20;
}
$ gcc -Os -S tp3.c
$ cat tp3.s
...
movl$10, a
movl$20, a
...

In (A) the compiler optimized "a = 10;" away, but the actual store
of the final value "20" to "a" was still "atomic". (B) and (C) also
exhibit "volatile" behaviour apart from the "atomicity".

But as others replied, it seems some callers out there depend upon
atomic ops exhibiting "volatile" behaviour as well, so that answers
my initial question, actually. I haven't looked at the code Paul
pointed me at, but I wonder if that "forget(x)" macro would help
those cases. I'd wish to avoid the "volatile" primitive, personally.


Satyam
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-14 Thread Satyam Sharma


On Tue, 14 Aug 2007, Christoph Lameter wrote:

> On Thu, 9 Aug 2007, Chris Snook wrote:
> 
> > This patchset makes the behavior of atomic_read uniform by removing the
> > volatile keyword from all atomic_t and atomic64_t definitions that currently
> > have it, and instead explicitly casts the variable as volatile in
> > atomic_read().  This leaves little room for creative optimization by the
> > compiler, and is in keeping with the principles behind "volatile considered
> > harmful".
> 
> volatile is generally harmful even in atomic_read(). Barriers control
> visibility and AFAICT things are fine.

Frankly, I don't see the need for this series myself either. Personal
opinion (others may differ), but I consider "volatile" to be a sad /
unfortunate wart in C (numerous threads on this list and on the gcc
lists/bugzilla over the years stand testimony to this) and if we _can_
steer clear of it, then why not -- why use this ill-defined primitive
whose implementation has often differed over compiler versions and
platforms? Granted, barrier() _is_ heavy-handed in that it makes the
optimizer forget _everything_, but then somebody did post a forget()
macro on this thread itself ...

[ BTW, why do we want the compiler to not optimize atomic_read()'s in
  the first place? Atomic ops guarantee atomicity, which has nothing
  to do with "volatility" -- users that expect "volatility" from
  atomic ops are the ones who must be fixed instead, IMHO. ]
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 27/28] Introduce U16_MAX and U32_MAX

2007-08-13 Thread Satyam Sharma


On Mon, 13 Aug 2007, Satyam Sharma wrote:

> Hi David,
> 
> 
> On Fri, 10 Aug 2007, David Miller wrote:
> [...]
> > 1) The reiserfs definition is better, it is _type_ based.
> >Please use (~(__u16)0) and (~(__u32)0), respectively.
> 
> Hmm, in that case ((__u16)0x) and ((__u32)0x) are probably
> better and clearer -- as that's what u16_max and u32_max are, after all.
> 
> We do require the (~0) thing for the max int/uint/long types, but that's
> because those are types where the number-of-bits is not known when writing
> the macro definition -- but that's not case with u16 and u32, so the
> 0xff... variants are clearer, IMHO.
> 
> 
> > 2) The reiserfs definition is going to define an equivalent
> >value, so just adding an #undef and still letting reiserfs
> >override is wrong.  Why put a common define in kernel.h
> >if other headers still keep their own crufty copy too?
> 
> Because removing the (re-)definition of U32_MAX from in there in
> reiserfs_fs.h will break builds of all userspace users of U32_MAX and
> max_reiserfs_offset(), would it not? I haven't looked at any reiserfs
> userspace tools source code, so possibly none such (that use
> max_reiserfs_offset) exist, but I thought it better to be safe.
> I'll have a look at the reiserfs-utils package, just in case.

I checked with latest reiserfsprogs available from namesys.com and also
reiserfs-utils package available from Fedora, and it appears there are
no users of either U32_MAX or max_reiserfs_offset() in any userspace
reiserfs tools, so we're safe in removing the U32_MAX from
include/linux/reiserfs_fs.h.


Thanks,
Satyam
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 27/28] Introduce U16_MAX and U32_MAX

2007-08-13 Thread Satyam Sharma
Hi David,


On Fri, 10 Aug 2007, David Miller wrote:

> From: [EMAIL PROTECTED]
> Date: Fri, 10 Aug 2007 14:12:10 -0700
> 
> > From: Satyam Sharma <[EMAIL PROTECTED]>
> > 
> > ... in kernel.h and clean up home-grown macros elsewhere in the tree.
> > 
> > Leave out the one in reiserfs_fs.h as it is in the userspace-visible part
> > of that header. Still, #undef the (equivalent) kernel version there to
> > avoid seeing "redefined, previous definition was here" gcc warnings.
> > 
> > [EMAIL PROTECTED]: fix U16_MAX, U32_MAX defns]
> > Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>
> > Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
> 
> I won't apply this one, for two reasons:
> 
> 1) The reiserfs definition is better, it is _type_ based.
>Please use (~(__u16)0) and (~(__u32)0), respectively.

Hmm, in that case ((__u16)0x) and ((__u32)0x) are probably
better and clearer -- as that's what u16_max and u32_max are, after all.

We do require the (~0) thing for the max int/uint/long types, but that's
because those are types where the number-of-bits is not known when writing
the macro definition -- but that's not case with u16 and u32, so the
0xff... variants are clearer, IMHO.


> 2) The reiserfs definition is going to define an equivalent
>value, so just adding an #undef and still letting reiserfs
>override is wrong.  Why put a common define in kernel.h
>if other headers still keep their own crufty copy too?

Because removing the (re-)definition of U32_MAX from in there in
reiserfs_fs.h will break builds of all userspace users of U32_MAX and
max_reiserfs_offset(), would it not? I haven't looked at any reiserfs
userspace tools source code, so possibly none such (that use
max_reiserfs_offset) exist, but I thought it better to be safe.
I'll have a look at the reiserfs-utils package, just in case.


Thanks,
Satyam
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NETPOLL=y , NETDEVICES=n compile error ( Re: 2.6.23-rc1-mm1 )

2007-08-02 Thread Satyam Sharma


On Thu, 2 Aug 2007, Jarek Poplawski wrote:

> On Thu, Aug 02, 2007 at 04:02:21PM +0530, Satyam Sharma wrote:
> [...]
> How often "common" developer has to make such decisions in Kconfig?
> Probably no more than once per year. So, it's fair to blame anybody
> for not reading lkml to find if there are some bugs or
> recommendations before using apparently simple tool? I think there
> is usually some README for such things (maybe in Documentation/)?

Whoops, I only said that in humour, probably should've snuck in a
smiley or two. Definitely not blaming anybody. Apologies to anyone
who felt offended, sorry, nothing such was intended, I assure.

Satyam
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NETPOLL=y , NETDEVICES=n compile error ( Re: 2.6.23-rc1-mm1 )

2007-08-02 Thread Satyam Sharma
[ Read through the thread, looked at Kconfig files,
  did some tests. Adding Kconfig experts to Cc: list. ]


> On Thu, 2 Aug 2007, Sam Ravnborg wrote:
> 
> > > > 
> > > > ...
> > > > endif # NETDEVICES
> > > > 
> > > > config NETPOLL
> > > > depends on NETDEVICES
> > > > def_bool NETCONSOLE
> > > > 
> > > > config NETPOLL_TRAP
> > > > bool "Netpoll traffic trapping"
> > > > default n
> > > > depends on NETPOLL
> > > > 
> > > > config NET_POLL_CONTROLLER
> > > > def_bool NETPOLL
> > > > depends on NETPOLL


Gargh, what we're seeing here is a whole bunch of bugs, I think. First
I thought this must be one of those randconfig-producing-wrong-configs
issues, but surprisingly, running "make oldconfig" on this .config on
a fresh 2.6.23-rc1-mm1 tree didn't change anything in the .config.


Kconfig bug #1:
===

Which means, although:

*
menuconfig BAZ

if BAZ

config BAR

endif
*

is widely believed (by most folks, I've heard this on several threads,
and as written in the comment in drivers/net/Kconfig) to be equivalent to:

*
menuconfig BAZ

if BAZ
endif

config BAR
depends on BAZ
*

this is *not* enforced by "make oldconfig"! And hence, the NETPOLL &&
!NETDEVICES situation we're seeing here.

[ We could also categorize this as a bug in Kconfig's "if", fwiw. ]


Kconfig bug #2:
===

config FOO
def_bool BAR

is supposed to ensure that FOO == BAR (as Matt mentioned earlier).

However, even this is *not* enforced by "make oldconfig". And hence,
the NETPOLL && !NET_POLL_CONTROLLER situation we're seeing here.

In fact, I believe it's possible to even pass a NETCONSOLE but
!NETPOLL kind of .config through "make oldconfig" but it still won't
catch it, and build breakages *will* occur.

[ We could also categorize this as a bug in Kconfig's "def_bool", fwiw. ]

Possibly, we could also decide to just blame "randconfig" for the whole
issue, and forget about these, because I think it's highly unlikely
(though not impossible) for people with "real" .configs to hit the
problems we saw above.


KGDBOE bug #1:
==

config KGDBOE in lib/Kconfig.kgdb must also "depend on" NETDEVICES,
and select NET_POLL_CONTROLLER also.


KGDBOE bug #2:
==

config KGDBOE_NOMODULE is a sad, sad option, and must be killed. The
"if !KGDBOE_NOMODULE" in KGDBOE must be removed, and it must lose its
dependency on "m".


Satyam
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NETPOLL=y , NETDEVICES=n compile error ( Re: 2.6.23-rc1-mm1 )

2007-08-02 Thread Satyam Sharma
Hi,


On Thu, 2 Aug 2007, Sam Ravnborg wrote:

> > > 
> > > ...
> > > endif # NETDEVICES
> > > 
> > > config NETPOLL
> > > depends on NETDEVICES
> > > def_bool NETCONSOLE
> > > 
> > > config NETPOLL_TRAP
> > > bool "Netpoll traffic trapping"
> > > default n
> > > depends on NETPOLL
> > > 
> > > config NET_POLL_CONTROLLER
> > > def_bool NETPOLL
> > > depends on NETPOLL
> > > 
> > > 
> > > seems to select NET_POLL_CONTROLLER after selecting NETPOLL, but
> > > still doesn't check for NETDEVICES dependency.
> > 
> > That's odd. Adding Sam to the cc:.

I just noticed this thread, but I wonder what the fuss is all
about :-) Kconfig dependencies are easy, really -- any code that
pulls in code from elsewhere, must explicitly "depends on" it.
It is possible to use "select" as well, but could lead to breakages
as discussed to death on at least 64592 other threads on LKML already
and hence should only be used for library-like code that does not
have any dependencies itself.


> select is evil
> select will by brute force set a symbol equal to 'y' without
> visiting the dependencies.
> So abusing select you are able to select a symbol FOO even 
> if FOO depends on BAR that is not set.
> 
> In general use select only for non-visible symbols (no promts anywhere)
> and for symbols with no dependencies.
> That will limit the suefullness but on the other hand avoid the illegal
> configurations all over.

The problem with using "depends on" is that your config symbol becomes
invisible unless the dependency has already been selected.

So, there's a workaround: make the ultimate config symbol itself depend
upon the grand-dependency (excuse the nomenclature) and just "select"
the immediate-parent-dependency, i.e. the following:

CONFIG_BAZ
...

CONFIG BAR
depends on BAZ

CONFIG_FOO
depends on BAZ
select BAR

is perfectly legal, and doesn't cause any build problems. Perhaps such a
solution makes sense here as well?


> kconfig should one day warn about such things but I have not fel inclined
> to dive into the matters hoping that Roman does one day.

Yup, I've wanted to do this myself, in fact I wanted to implement an idea
I had in mind ( http://lkml.org/lkml/2007/5/16/257 ) but for some reason
I tend to stay away from stuff in scripts/ :-)


Satyam
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 -mm 9/9] netconsole: Support dynamic reconfiguration using configfs

2007-07-30 Thread Satyam Sharma


On Mon, 30 Jul 2007, Andrew Morton wrote:

> On Mon, 30 Jul 2007 08:19:10 +0530
> Satyam Sharma <[EMAIL PROTECTED]> wrote:
> 
> > +/*
> > + * Wrapper over simple_strtol (base 10) with sanity and range checking.
> > + * We return (signed) long only because we may want to return errors.
> > + * Do not use this to convert numbers that are allowed to be negative.
> > + */
> > +static long strtol10_check_range(const char *cp, long min, long max)
> > +{
> > +   long ret;
> > +   char *p = (char *) cp;
> > +
> > +   WARN_ON(min < 0);
> > +   WARN_ON(max < min);
> > +
> > +   ret = simple_strtol(p, &p, 10);
> > +
> > +   if (*p && (*p != '\n')) {
> > +   printk(KERN_ERR "netconsole: invalid input\n");
> > +   return -EINVAL;
> > +   }
> > +   if ((ret < min) || (ret > max)) {
> > +   printk(KERN_ERR "netconsole: input %ld must be between "
> > +   "%ld and %ld\n", ret, min, max);
> > +   return -EINVAL;
> > +   }
> > +
> > +   return ret;
> > +}
> 
> There's probably other code around the place which does this.  It might
> be worth making this function a general-purpose thing.

Probably, yes, I thought along similar lines.

[ BTW somewhere in the 9/9 patch I remember having to define a
  __U16_MAX as well. That could be pushed out to some generic
  or appropriate header as well, possibly. ]


Satyam
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 -mm 0/9] netconsole: Multiple targets and dynamic reconfigurability

2007-07-30 Thread Satyam Sharma


On Mon, 30 Jul 2007, Andrew Morton wrote:

> On Mon, 30 Jul 2007 08:17:41 +0530
> Satyam Sharma <[EMAIL PROTECTED]> wrote:
> 
> > [0/9] netconsole: Multiple targets and dynamic reconfigurability
> > 
> > This is v3 of the patchset
> 
> That all looks pretty reasonable, thanks.
> 
> Are we all comfortable with the attributions?  As I have it now, everything
> will go in as having been authored by yourself.  There is no formal (ie:
> understood-by-git) way of recording joint authorship, unfortunately.  But one
> can make a record of such things in the changelog.

Ok, making such a record would be good, if possible. I did see that the
"author" field for git commits can have only one entry, unfortunately,
and it turns out "Signed-off-by" and "Acked-by" have other meanings
attached that have to do with the patch-journey-from-author-to-you chain.
But I'd like these to be somehow marked as jointly authored, definitely.

> I'll queue this material up in the to-be-sent-to-davem queue.

Ok, thanks.


Satyam
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [git patches] net driver fixes

2007-07-30 Thread Satyam Sharma


On Mon, 30 Jul 2007, Jeff Garzik wrote:

> true, we should just remove the dev==NULL check

Patch below:

[PATCH] nmclan_cs: Remove bogus (dev==NULL) check in mace_interrupt()

The (dev == NULL) check in drivers/net/pcmcia/nmclan_cs.c:mace_interrupt()
handler is always false, so let's remove it.

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>

---

 drivers/net/pcmcia/nmclan_cs.c |6 --
 1 files changed, 0 insertions(+), 6 deletions(-)

diff --git a/drivers/net/pcmcia/nmclan_cs.c b/drivers/net/pcmcia/nmclan_cs.c
index 73da611..3446be9 100644
--- a/drivers/net/pcmcia/nmclan_cs.c
+++ b/drivers/net/pcmcia/nmclan_cs.c
@@ -1000,12 +1000,6 @@ static irqreturn_t mace_interrupt(int irq, void *dev_id)
   int status;
   int IntrCnt = MACE_MAX_IR_ITERATIONS;
 
-  if (dev == NULL) {
-DEBUG(2, "mace_interrupt(): irq 0x%X for unknown device.\n",
- irq);
-return IRQ_NONE;
-  }
-
   if (lp->tx_irq_disabled) {
 printk(
   (lp->tx_irq_disabled?
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [git patches] net driver fixes

2007-07-30 Thread Satyam Sharma
Hi Jeff,


On Mon, 30 Jul 2007, Jeff Garzik wrote:

>   Fix a potential NULL pointer dereference in mace_interrupt() in 
> drivers/net/pcmcia/nmclan_cs.c

This oops is _programmatically_ impossible (the only way it can occur
is if the kernel has gone bazooka already anyway ...)

[ BTW even if it were possible, we haven't fixed it still ...
  note the use of the netdev_priv() before the (dev == NULL)
  check in the function below ... ]


> diff --git a/drivers/net/pcmcia/nmclan_cs.c b/drivers/net/pcmcia/nmclan_cs.c
> index 73da611..997c2d0 100644
> --- a/drivers/net/pcmcia/nmclan_cs.c
> +++ b/drivers/net/pcmcia/nmclan_cs.c
> @@ -996,7 +996,7 @@ static irqreturn_t mace_interrupt(int irq, void *dev_id)
>  {
>struct net_device *dev = (struct net_device *) dev_id;
>mace_private *lp = netdev_priv(dev);
> -  kio_addr_t ioaddr = dev->base_addr;
> +  kio_addr_t ioaddr;
>int status;
>int IntrCnt = MACE_MAX_IR_ITERATIONS;
>  
> @@ -1006,6 +1006,8 @@ static irqreturn_t mace_interrupt(int irq, void *dev_id)
>  return IRQ_NONE;
>}
>  
> +  ioaddr = dev->base_addr;
> +
>if (lp->tx_irq_disabled) {
>  printk(
>(lp->tx_irq_disabled?

I suggest we should just get rid of the (bogus, always false) dev == NULL
check itself, instead.

Will submit patch, in a bit ...


>   Fix a potential NULL pointer dereference in write_bulk_callback() in 
> drivers/net/usb/pegasus.c

Ditto, for this. Impossible oops. And if it does occur, the kernel would
have gone irrevocably mad already ...


> diff --git a/drivers/net/usb/pegasus.c b/drivers/net/usb/pegasus.c
> index a05fd97..04cba6b 100644
> --- a/drivers/net/usb/pegasus.c
> +++ b/drivers/net/usb/pegasus.c
> @@ -768,11 +768,13 @@ done:
>  static void write_bulk_callback(struct urb *urb)
>  {
>   pegasus_t *pegasus = urb->context;
> - struct net_device *net = pegasus->net;
> + struct net_device *net;
>  
>   if (!pegasus)
>   return;
>  
> + net = pegasus->net;
> +
>   if (!netif_device_present(net) || !netif_running(net))
>   return;

Again, the (!pegasus) check is bogus (always false) and should be removed
instead. I believe the maintainer (CC'ed here) already has a patch for
exactly that.


Satyam
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [linux-usb-devel] [PATCH] USB Pegasus driver - avoid a potential NULL pointer dereference.

2007-07-30 Thread Satyam Sharma


On Mon, 30 Jul 2007, Petko Manolov wrote:

> On Sun, 29 Jul 2007, Oliver Neukum wrote:
> 
> > [...]
> > pegasus == NULL there would be a kernel bug. Silently ignoring
> > it, like the code now wants to do is bad. As the oops has never been
> > reported, I figure turning it into an explicit debugging test is overkill,
> > so removal seems to be the best option.
> 
> In the past urb->context was not guaranteed to be non-null for any
> asynchronous calls.  If this is not the case anymore then it should be removed
> from at least two more locations in the driver.
> 
> Attached you'll find the resulting patch.

Given Oliver's earlier comment, it looks okay to me. Thanks.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 -mm 7/9] netconsole: Introduce netconsole_netdev_notifier

2007-07-29 Thread Satyam Sharma
From: Satyam Sharma <[EMAIL PROTECTED]>

[7/9] netconsole: Introduce netconsole_netdev_notifier

To update fields of underlying netpoll structure at runtime on
corresponding NETDEV_CHANGEADDR or NETDEV_CHANGENAME notifications.

ioctl(SIOCSIFHWADDR or SIOCSIFNAME) could be used to change the
hardware/MAC address or name of the local interface that our netpoll
is attached to. Whenever this happens, netdev notifier chain is called
out with the NETDEV_CHANGEADDR or NETDEV_CHANGENAME event message. We
respond to that and update the local_mac or dev_name field of the struct
netpoll. This makes sense anyway, but is especially required for dynamic
netconsole because the netpoll structure's internal members become user
visible files when either sysfs or configfs are used. So this helps us
to keep up with the MAC address/name changes and keep values in struct
netpoll uptodate.

[ Note that ioctl(SIOCSIFADDR) to change IP address of interface at
  runtime is not handled (to update local_ip of netpoll) on purpose --
  some setups may set the local_ip to a private address, not necessary
  the actual IP address of the sender host, as presently allowed. ]

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>
Signed-off-by: Keiichi Kii <[EMAIL PROTECTED]>

---

 drivers/net/netconsole.c |   32 
 1 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index be15ca6..5557098 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -80,6 +80,33 @@ static struct netconsole_target default_target = {
},
 };
 
+/* Handle network interface device notifications */
+static int netconsole_netdev_event(struct notifier_block *this,
+  unsigned long event,
+  void *ptr)
+{
+   struct net_device *dev = ptr;
+   struct netconsole_target *nt = &default_target;
+
+   if (nt->np.dev == dev) {
+   switch (event) {
+   case NETDEV_CHANGEADDR:
+   memcpy(nt->np.local_mac, dev->dev_addr, ETH_ALEN);
+   break;
+
+   case NETDEV_CHANGENAME:
+   strlcpy(nt->np.dev_name, dev->name, IFNAMSIZ);
+   break;
+   }
+   }
+
+   return NOTIFY_DONE;
+}
+
+static struct notifier_block netconsole_netdev_notifier = {
+   .notifier_call  = netconsole_netdev_event,
+};
+
 static void write_msg(struct console *con, const char *msg, unsigned int len)
 {
int frag, left;
@@ -122,6 +149,10 @@ static int __init init_netconsole(void)
if (err)
goto out;
 
+   err = register_netdevice_notifier(&netconsole_netdev_notifier);
+   if (err)
+   goto out;
+
register_console(&netconsole);
printk(KERN_INFO "netconsole: network logging started\n");
 
@@ -134,6 +165,7 @@ static void __exit cleanup_netconsole(void)
struct netconsole_target *nt = &default_target;
 
unregister_console(&netconsole);
+   unregister_netdevice_notifier(&netconsole_netdev_notifier);
netpoll_cleanup(&nt->np);
 }
 
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 -mm 8/9] netconsole: Support multiple logging targets

2007-07-29 Thread Satyam Sharma
From: Satyam Sharma <[EMAIL PROTECTED]>

[8/9] netconsole: Support multiple logging targets

This patch introduces support for multiple targets, independent of
CONFIG_NETCONSOLE_DYNAMIC -- this is useful even in the default case and
(including the infrastructure introduced in previous patches) doesn't
really add too many bytes to module text. All the complexity (and size)
comes with the dynamic reconfigurability / userspace interface patch,
and so it's plausible users may want to keep this enabled but that
disabled (say to avoid a dependency on CONFIG_CONFIGFS_FS too).

Also update documentation to mention the use of ";" separator to specify
multiple logging targets in the boot/module option string.

Brief overview:

We maintain a target_list (and corresponding lock). Get rid of the static
"default_target" and introduce allocation and release functions for our
netconsole_target objects (but keeping sure to preserve previous behaviour
such as default values). During init_netconsole(), ";" is used as the
separator to identify multiple target specifications in the boot/module
option string. The target specifications are parsed and netpolls setup.
During exit, the target_list is torn down and all items released.

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>
Signed-off-by: Keiichi Kii <[EMAIL PROTECTED]>

---

 Documentation/networking/netconsole.txt |6 +
 drivers/net/netconsole.c|  171 +++---
 2 files changed, 137 insertions(+), 40 deletions(-)

diff --git a/Documentation/networking/netconsole.txt 
b/Documentation/networking/netconsole.txt
index 5962f45..1aaa738 100644
--- a/Documentation/networking/netconsole.txt
+++ b/Documentation/networking/netconsole.txt
@@ -34,6 +34,12 @@ Examples:
 
  insmod netconsole netconsole=@/,@10.0.0.2/
 
+It also supports logging to multiple remote agents by specifying
+parameters for the multiple agents separated by semicolons and the
+complete string enclosed in "quotes", thusly:
+
+ modprobe netconsole netconsole="@/,@10.0.0.2/;@/eth1,[EMAIL PROTECTED]/"
+
 Built-in netconsole starts immediately after the TCP stack is
 initialized and attempts to bring up the supplied dev at the supplied
 address.
diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index 5557098..458c4d6 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -62,44 +62,93 @@ static int __init option_setup(char *opt)
 __setup("netconsole=", option_setup);
 #endif /* MODULE */
 
+/* Linked list of all configured targets */
+static LIST_HEAD(target_list);
+
+/* This needs to be a spinlock because write_msg() cannot sleep */
+static DEFINE_SPINLOCK(target_list_lock);
+
 /**
  * struct netconsole_target - Represents a configured netconsole target.
+ * @list:  Links this target into the target_list.
  * @np:The netpoll structure for this target.
  */
 struct netconsole_target {
+   struct list_headlist;
struct netpoll  np;
 };
 
-static struct netconsole_target default_target = {
-   .np = {
-   .name   = "netconsole",
-   .dev_name   = "eth0",
-   .local_port = 6665,
-   .remote_port= ,
-   .remote_mac = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
-   },
-};
+/* Allocate new target and setup netpoll for it */
+static struct netconsole_target *alloc_target(char *target_config)
+{
+   int err = -ENOMEM;
+   struct netconsole_target *nt;
+
+   /* Allocate and initialize with defaults */
+   nt = kzalloc(sizeof(*nt), GFP_KERNEL);
+   if (!nt) {
+   printk(KERN_ERR "netconsole: failed to allocate memory\n");
+   goto fail;
+   }
+
+   nt->np.name = "netconsole";
+   strlcpy(nt->np.dev_name, "eth0", IFNAMSIZ);
+   nt->np.local_port = 6665;
+   nt->np.remote_port = ;
+   memset(nt->np.remote_mac, 0xff, ETH_ALEN);
+
+   /* Parse parameters and setup netpoll */
+   err = netpoll_parse_options(&nt->np, target_config);
+   if (err)
+   goto fail;
+
+   err = netpoll_setup(&nt->np);
+   if (err)
+   goto fail;
+
+   return nt;
+
+fail:
+   kfree(nt);
+   return ERR_PTR(err);
+}
+
+/* Cleanup netpoll for given target and free it */
+static void free_target(struct netconsole_target *nt)
+{
+   netpoll_cleanup(&nt->np);
+   kfree(nt);
+}
 
 /* Handle network interface device notifications */
 static int netconsole_netdev_event(struct notifier_block *this,
   unsigned long event,
   void *ptr)
 {
+   unsigned long flags;
+   struct netconsole_target *nt;
struct net_device *dev = ptr;
-   struct netconsole_target *nt 

[PATCH v3 -mm 9/9] netconsole: Support dynamic reconfiguration using configfs

2007-07-29 Thread Satyam Sharma
From: Satyam Sharma <[EMAIL PROTECTED]>

[9/9] netconsole: Support dynamic reconfiguration using configfs

This patch introduces support for dynamic reconfiguration (adding, removing
and/or modifying parameters of netconsole targets at runtime) using a
userspace interface exported via configfs. Documentation is also updated
accordingly.

Issues and brief design overview:

(1) Kernel-initiated creation / destruction of kernel objects is not
possible with configfs -- the lifetimes of the "config items" is managed
exclusively from userspace. But netconsole must support boot/module params
too, and these are parsed in kernel and hence netpolls must be setup from
the kernel. Joel Becker suggested to separately manage the lifetimes of
the two kinds of netconsole_target objects -- those created via configfs
mkdir(2) from userspace and those specified from the boot/module option
string. This adds complexity and some redundancy here and also means that
boot/module param-created targets are not exposed through the configfs
namespace (and hence cannot be updated / destroyed dynamically). However,
this saves us from locking / refcounting complexities that would need to
be introduced in configfs to support kernel-initiated item creation /
destroy there.

(2) In configfs, item creation takes place in the call chain of the mkdir(2)
syscall in the driver subsystem. If we used an ioctl(2) to create / destroy
objects from userspace, the special userspace program is able to fill out
the structure to be passed into the ioctl and hence specify attributes such
as local interface that are required at the time we set up the netpoll.
For configfs, this information is not available at the time of mkdir(2).
So, we keep all newly-created targets (via configfs) disabled by default.
The user is expected to set various attributes appropriately (including the
local network interface if required) and then write(2) "1" to the "enabled"
attribute. Thus, netpoll_setup() is then called on the set parameters in the
context of _this_ write(2) on the "enabled" attribute itself. This design
enables the user to reconfigure existing netconsole targets at runtime to
be attached to newly-come-up interfaces that may not have existed when
netconsole was loaded or when the targets were actually created. All this
effectively enables us to get rid of custom ioctls.

(3) Ultra-paranoid configfs attribute show() and store() operations, with
sanity and input range checking, using only safe string primitives, and
compliant with the recommendations in Documentation/filesystems/sysfs.txt.

(4) A new function netpoll_print_options() is created in the netpoll API,
that just prints out the configured parameters for a netpoll structure.
netpoll_parse_options() is modified to use that and it is also exported to
be used from netconsole.

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>
Acked-by: Keiichi Kii <[EMAIL PROTECTED]>

---

 Documentation/networking/netconsole.txt |   68 
 drivers/net/Kconfig |   10 +
 drivers/net/netconsole.c|  605 +--
 include/linux/netpoll.h |1 +
 net/core/netpoll.c  |   44 ++-
 5 files changed, 683 insertions(+), 45 deletions(-)

diff --git a/Documentation/networking/netconsole.txt 
b/Documentation/networking/netconsole.txt
index 1aaa738..3c2f2b3 100644
--- a/Documentation/networking/netconsole.txt
+++ b/Documentation/networking/netconsole.txt
@@ -3,6 +3,10 @@ started by Ingo Molnar <[EMAIL PROTECTED]>, 2001.09.17
 2.6 port and netpoll api by Matt Mackall <[EMAIL PROTECTED]>, Sep 9 2003
 
 Please send bug reports to Matt Mackall <[EMAIL PROTECTED]>
+and Satyam Sharma <[EMAIL PROTECTED]>
+
+Introduction:
+=
 
 This module logs kernel printk messages over UDP allowing debugging of
 problem where disk logging fails and serial consoles are impractical.
@@ -13,6 +17,9 @@ the specified interface as soon as possible. While this 
doesn't allow
 capture of early kernel panics, it does capture most of the boot
 process.
 
+Sender and receiver configuration:
+==
+
 It takes a string configuration parameter "netconsole" in the
 following format:
 
@@ -46,6 +53,67 @@ address.
 
 The remote host can run either 'netcat -u -l -p ' or syslogd.
 
+Dynamic reconfiguration:
+
+
+Dynamic reconfigurability is a useful addition to netconsole that enables
+remote logging targets to be dynamically added, removed, or have their
+parameters reconfigured at runtime from a configfs-based userspace interface.
+[ Note that the parameters of netconsole targets that were specified/created
+from the boot/module option are not exposed via this interface, and hence
+cannot be modified dynamically. ]
+
+To include this feature, select CONFIG_NETCONSOLE_DYNAMIC when building the
+netconsole mo

[PATCH v3 -mm 5/9] netconsole: Add some useful tips to documentation

2007-07-29 Thread Satyam Sharma
From: Satyam Sharma <[EMAIL PROTECTED]>

[5/9] netconsole: Add some useful tips to documentation

Add some useful general-purpose tips. Also suggest solution for the frequent
problem of console loglevel set too low numerically (i.e. for high priority
messages only) on the sender.

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>
Acked-by: Keiichi Kii <[EMAIL PROTECTED]>
Acked-by: Matt Mackall <[EMAIL PROTECTED]>

---

 Documentation/networking/netconsole.txt |   25 +
 1 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/Documentation/networking/netconsole.txt 
b/Documentation/networking/netconsole.txt
index 1caa6c7..5962f45 100644
--- a/Documentation/networking/netconsole.txt
+++ b/Documentation/networking/netconsole.txt
@@ -44,11 +44,36 @@ WARNING: the default target ethernet setting uses the 
broadcast
 ethernet address to send packets, which can cause increased load on
 other systems on the same ethernet segment.
 
+TIP: some LAN switches may be configured to suppress ethernet broadcasts
+so it is advised to explicitly specify the remote agents' MAC addresses
+from the config parameters passed to netconsole.
+
+TIP: to find out the MAC address of, say, 10.0.0.2, you may try using:
+
+ ping -c 1 10.0.0.2 ; /sbin/arp -n | grep 10.0.0.2
+
+TIP: in case the remote logging agent is on a separate LAN subnet than
+the sender, it is suggested to try specifying the MAC address of the
+default gateway (you may use /sbin/route -n to find it out) as the
+remote MAC address instead.
+
 NOTE: the network device (eth1 in the above case) can run any kind
 of other network traffic, netconsole is not intrusive. Netconsole
 might cause slight delays in other traffic if the volume of kernel
 messages is high, but should have no other impact.
 
+NOTE: if you find that the remote logging agent is not receiving or
+printing all messages from the sender, it is likely that you have set
+the "console_loglevel" parameter (on the sender) to only send high
+priority messages to the console. You can change this at runtime using:
+
+ dmesg -n 8
+
+or by specifying "debug" on the kernel command line at boot, to send
+all kernel messages to the console. A specific value for this parameter
+can also be set using the "loglevel" kernel boot option. See the
+dmesg(8) man page and Documentation/kernel-parameters.txt for details.
+
 Netconsole was designed to be as instantaneous as possible, to
 enable the logging of even the most critical kernel bugs. It works
 from IRQ contexts as well, and does not enable interrupts while
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 -mm 6/9] netconsole: Introduce netconsole_target

2007-07-29 Thread Satyam Sharma
From: Satyam Sharma <[EMAIL PROTECTED]>

[6/9] netconsole: Introduce netconsole_target

Introduce a wrapper structure over netpoll to represent logging targets
configured in netconsole. This will get extended with other members in
further patches.

This is done independent of the (to-be-introduced) NETCONSOLE_DYNAMIC
config option so that we're able to drastically cut down on the #ifdef
complexity of final netconsole.c. Also, struct netconsole_target would be
required for multiple targets support also, and not just dynamic
reconfigurability.

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>
Signed-off-by: Keiichi Kii <[EMAIL PROTECTED]>

---

 drivers/net/netconsole.c |   36 +---
 1 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index 75cb761..be15ca6 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -62,24 +62,35 @@ static int __init option_setup(char *opt)
 __setup("netconsole=", option_setup);
 #endif /* MODULE */
 
-static struct netpoll np = {
-   .name   = "netconsole",
-   .dev_name   = "eth0",
-   .local_port = 6665,
-   .remote_port= ,
-   .remote_mac = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
+/**
+ * struct netconsole_target - Represents a configured netconsole target.
+ * @np:The netpoll structure for this target.
+ */
+struct netconsole_target {
+   struct netpoll  np;
+};
+
+static struct netconsole_target default_target = {
+   .np = {
+   .name   = "netconsole",
+   .dev_name   = "eth0",
+   .local_port = 6665,
+   .remote_port= ,
+   .remote_mac = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
+   },
 };
 
 static void write_msg(struct console *con, const char *msg, unsigned int len)
 {
int frag, left;
unsigned long flags;
+   struct netconsole_target *nt = &default_target;
 
-   if (netif_running(np.dev)) {
+   if (netif_running(nt->np.dev)) {
local_irq_save(flags);
for (left = len; left;) {
frag = min(left, MAX_PRINT_CHUNK);
-   netpoll_send_udp(&np, msg, frag);
+   netpoll_send_udp(&nt->np, msg, frag);
msg += frag;
left -= frag;
}
@@ -96,17 +107,18 @@ static struct console netconsole = {
 static int __init init_netconsole(void)
 {
int err = 0;
+   struct netconsole_target *nt = &default_target;
 
if (!strnlen(config, MAX_PARAM_LENGTH)) {
printk(KERN_INFO "netconsole: not configured, aborting\n");
goto out;
}
 
-   err = netpoll_parse_options(&np, config);
+   err = netpoll_parse_options(&nt->np, config);
if (err)
goto out;
 
-   err = netpoll_setup(&np);
+   err = netpoll_setup(&nt->np);
if (err)
goto out;
 
@@ -119,8 +131,10 @@ out:
 
 static void __exit cleanup_netconsole(void)
 {
+   struct netconsole_target *nt = &default_target;
+
unregister_console(&netconsole);
-   netpoll_cleanup(&np);
+   netpoll_cleanup(&nt->np);
 }
 
 module_init(init_netconsole);
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 -mm 3/9] netconsole: Simplify boot/module option setup logic

2007-07-29 Thread Satyam Sharma
From: Satyam Sharma <[EMAIL PROTECTED]>

[3/9] netconsole: Simplify boot/module option setup logic

Presently, boot/module parameters are set up quite differently for the
case of built-in netconsole (__setup() -> obsolete_checksetup() ->
netpoll_parse_options() -> strlen(config) == 0 in init_netconsole())
vs modular netconsole (module_param_string() -> string copied to the
config variable -> strlen(config) != 0 init_netconsole() ->
netpoll_parse_options()).

This patch makes both of them similar by doing exactly the equivalent
of a module_param_string() in option_setup() also -- just copying the
param string passed from the kernel command line into "config" variable.
So, strlen(config) != 0 in both cases, and netpoll_parse_options() is
always called from init_netconsole(), thus making the setup logic for
both cases similar.

Now, option_setup() is only ever called / used for the built-in case,
so we put it inside a #ifndef MODULE, otherwise gcc will complain about
option_setup() being "defined but not used". Also, the "configured"
variable is redundant with this patch and hence removed.

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>
Signed-off-by: Keiichi Kii <[EMAIL PROTECTED]>
Acked-by: Matt Mackall <[EMAIL PROTECTED]>

---

 drivers/net/netconsole.c |   27 ++-
 1 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index 2c2aef1..e56aa6c 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -53,6 +53,15 @@ static char config[MAX_PARAM_LENGTH];
 module_param_string(netconsole, config, MAX_PARAM_LENGTH, 0);
 MODULE_PARM_DESC(netconsole, " [EMAIL 
PROTECTED]/[dev],[tgt-port]@/[tgt-macaddr]\n");
 
+#ifndefMODULE
+static int __init option_setup(char *opt)
+{
+   strlcpy(config, opt, MAX_PARAM_LENGTH);
+   return 1;
+}
+__setup("netconsole=", option_setup);
+#endif /* MODULE */
+
 static struct netpoll np = {
.name   = "netconsole",
.dev_name   = "eth0",
@@ -60,7 +69,6 @@ static struct netpoll np = {
.remote_port= ,
.remote_mac = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
 };
-static int configured;
 
 static void write_msg(struct console *con, const char *msg, unsigned int len)
 {
@@ -85,26 +93,19 @@ static struct console netconsole = {
.write  = write_msg,
 };
 
-static int __init option_setup(char *opt)
-{
-   configured = !netpoll_parse_options(&np, opt);
-   return 1;
-}
-
-__setup("netconsole=", option_setup);
-
 static int __init init_netconsole(void)
 {
int err = 0;
 
-   if (strnlen(config, MAX_PARAM_LENGTH))
-   option_setup(config);
-
-   if (!configured) {
+   if (!strnlen(config, MAX_PARAM_LENGTH)) {
printk(KERN_INFO "netconsole: not configured, aborting\n");
goto out;
}
 
+   err = netpoll_parse_options(&np, config);
+   if (err)
+   goto out;
+
err = netpoll_setup(&np);
if (err)
goto out;
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 -mm 4/9] netconsole: Use netif_running() in write_msg()

2007-07-29 Thread Satyam Sharma
From: Satyam Sharma <[EMAIL PROTECTED]>

[4/9] netconsole: Use netif_running() in write_msg()

Avoid unnecessarily disabling interrupts and calling netpoll_send_udp()
if the corresponding local interface is not up.

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>
Acked-by: Keiichi Kii <[EMAIL PROTECTED]>
Cc: Matt Mackall <[EMAIL PROTECTED]>

---

 drivers/net/netconsole.c |   18 +-
 1 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index e56aa6c..75cb761 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -75,16 +75,16 @@ static void write_msg(struct console *con, const char *msg, 
unsigned int len)
int frag, left;
unsigned long flags;
 
-   local_irq_save(flags);
-
-   for (left = len; left;) {
-   frag = min(left, MAX_PRINT_CHUNK);
-   netpoll_send_udp(&np, msg, frag);
-   msg += frag;
-   left -= frag;
+   if (netif_running(np.dev)) {
+   local_irq_save(flags);
+   for (left = len; left;) {
+   frag = min(left, MAX_PRINT_CHUNK);
+   netpoll_send_udp(&np, msg, frag);
+   msg += frag;
+   left -= frag;
+   }
+   local_irq_restore(flags);
}
-
-   local_irq_restore(flags);
 }
 
 static struct console netconsole = {
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 -mm 1/9] netconsole: Cleanups, codingstyle, prettyfication

2007-07-29 Thread Satyam Sharma
From: Satyam Sharma <[EMAIL PROTECTED]>

[1/9] netconsole: Cleanups, codingstyle, prettyfication

(1) Remove unwanted headers.
(2) Mark __init and __exit as appropriate.
(3) Various trivial codingstyle and prettification stuff.

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>
Signed-off-by: Keiichi Kii <[EMAIL PROTECTED]>
Acked-by: Matt Mackall <[EMAIL PROTECTED]>

---

 drivers/net/netconsole.c |   55 ++---
 1 files changed, 27 insertions(+), 28 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index 69233f6..f1c2a2d 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -35,35 +35,32 @@
  /
 
 #include 
-#include 
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
-#include 
-#include 
 #include 
 
 MODULE_AUTHOR("Maintainer: Matt Mackall <[EMAIL PROTECTED]>");
 MODULE_DESCRIPTION("Console driver for network interfaces");
 MODULE_LICENSE("GPL");
 
-static char config[256];
-module_param_string(netconsole, config, 256, 0);
+#define MAX_PARAM_LENGTH   256
+#define MAX_PRINT_CHUNK1000
+
+static char config[MAX_PARAM_LENGTH];
+module_param_string(netconsole, config, MAX_PARAM_LENGTH, 0);
 MODULE_PARM_DESC(netconsole, " [EMAIL 
PROTECTED]/[dev],[tgt-port]@/[tgt-macaddr]\n");
 
 static struct netpoll np = {
-   .name = "netconsole",
-   .dev_name = "eth0",
-   .local_port = 6665,
-   .remote_port = ,
-   .remote_mac = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
+   .name   = "netconsole",
+   .dev_name   = "eth0",
+   .local_port = 6665,
+   .remote_port= ,
+   .remote_mac = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
 };
-static int configured = 0;
-
-#define MAX_PRINT_CHUNK 1000
+static int configured;
 
 static void write_msg(struct console *con, const char *msg, unsigned int len)
 {
@@ -75,7 +72,7 @@ static void write_msg(struct console *con, const char *msg, 
unsigned int len)
 
local_irq_save(flags);
 
-   for(left = len; left; ) {
+   for (left = len; left;) {
frag = min(left, MAX_PRINT_CHUNK);
netpoll_send_udp(&np, msg, frag);
msg += frag;
@@ -86,12 +83,12 @@ static void write_msg(struct console *con, const char *msg, 
unsigned int len)
 }
 
 static struct console netconsole = {
-   .name = "netcon",
-   .flags = CON_ENABLED | CON_PRINTBUFFER,
-   .write = write_msg
+   .name   = "netcon",
+   .flags  = CON_ENABLED | CON_PRINTBUFFER,
+   .write  = write_msg,
 };
 
-static int option_setup(char *opt)
+static int __init option_setup(char *opt)
 {
configured = !netpoll_parse_options(&np, opt);
return 1;
@@ -99,28 +96,30 @@ static int option_setup(char *opt)
 
 __setup("netconsole=", option_setup);
 
-static int init_netconsole(void)
+static int __init init_netconsole(void)
 {
-   int err;
+   int err = 0;
 
-   if(strlen(config))
+   if (strnlen(config, MAX_PARAM_LENGTH))
option_setup(config);
 
-   if(!configured) {
-   printk("netconsole: not configured, aborting\n");
-   return 0;
+   if (!configured) {
+   printk(KERN_INFO "netconsole: not configured, aborting\n");
+   goto out;
}
 
err = netpoll_setup(&np);
if (err)
-   return err;
+   goto out;
 
register_console(&netconsole);
printk(KERN_INFO "netconsole: network logging started\n");
-   return 0;
+
+out:
+   return err;
 }
 
-static void cleanup_netconsole(void)
+static void __exit cleanup_netconsole(void)
 {
unregister_console(&netconsole);
netpoll_cleanup(&np);
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 -mm 2/9] netconsole: Remove bogus check

2007-07-29 Thread Satyam Sharma
From: Satyam Sharma <[EMAIL PROTECTED]>

[2/9] netconsole: Remove bogus check

The (!np.dev) check in write_msg() is bogus (always false), because:
np.dev is set by netpoll_setup(), which is called by init_netconsole()
before register_console(), so write_msg() cannot be triggered unless
netpoll_setup() successfully set np.dev. Also np.dev cannot go away
from under us, because netpoll_setup() grabs us reference on it. So
let's remove the bogus check.

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>
Acked-by: Keiichi Kii <[EMAIL PROTECTED]>
Acked-by: Matt Mackall <[EMAIL PROTECTED]>

---

 drivers/net/netconsole.c |3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index f1c2a2d..2c2aef1 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -67,9 +67,6 @@ static void write_msg(struct console *con, const char *msg, 
unsigned int len)
int frag, left;
unsigned long flags;
 
-   if (!np.dev)
-   return;
-
local_irq_save(flags);
 
for (left = len; left;) {
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 -mm 0/9] netconsole: Multiple targets and dynamic reconfigurability

2007-07-29 Thread Satyam Sharma
[0/9] netconsole: Multiple targets and dynamic reconfigurability

This is v3 of the patchset, the previous versions are available at:
http://lkml.org/lkml/2007/7/4/107
http://lkml.org/lkml/2007/7/10/78

Diffed against 2.6.23-rc1-git6 (6c8dca5d as of writing), but applies
successfully to 2.6.23-rc1-mm1 as well.

Patches 1-5 are okay-to-apply to current mainline -git, I think.

Patches 6-9 introduce the new functionality and are requested for
inclusion in -mm.

[1/9] netconsole: Cleanups, codingstyle, prettyfication
[2/9] netconsole: Remove bogus check
[3/9] netconsole: Simplify boot/module option setup logic
[4/9] netconsole: Use netif_running() in write_msg()
[5/9] netconsole: Add some useful tips to documentation
[6/9] netconsole: Introduce netconsole_target
[7/9] netconsole: Introduce netconsole_netdev_notifier
[8/9] netconsole: Support multiple logging targets
[9/9] netconsole: Support dynamic reconfiguration using configfs

Changes since v2:
=

* "enabled" must be defined outside #ifdef NETCONSOLE_DYNAMIC

* Some simple enhancements to documentation

* Drop the use of "unlikely" from a condition where I'd got the
  common case wrong

About this patchset:


What?

Support multiple remote logging targets in netconsole. Also, ability to
dynamically add or remove targets or modify parameters (IP, port, remote
MAC address) of targets at runtime, from userspace, using configfs.

Why?

>From Keiichi Kii's original post:

[...] current netconsole is not flexible. For example, if you want to change
ip address for logging agent, in the case of built-in netconsole, you can't
change config except for changing boot parameter and rebooting your system,
or in the case of module netconsole, you need to remove it and reload
with different parameters. [...] and we have been losing serial console
port with PCs and Servers.

(... and especially laptops, I would add.)


Satyam
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] sb1000: prevent a potential NULL pointer dereference in sb1000_dev_ioctl()

2007-07-29 Thread Satyam Sharma
Hi Michael,


On Sun, 29 Jul 2007, Michael Buesch wrote:

> On Sunday 29 July 2007 20:34:46 Satyam Sharma wrote:
> > (2) !(dev->flags & IFF_UP) is bogus because the functions of this ioctl
> > can (and should) be allowed even when the interface is not up and running.
> 
> Are you _sure_? This function does poke with the device hardware.
> It might return crap or even machinecheck when not initialized.
> Hardware is probably powered down, if not IFF_UP. (I don't know if that's
> the case here, though).

IFF_UP checks if the _interface_ is up -- the hardware / card could still
be powered up, but the interface down (ifconfing eth0 down or ip link set
eth0 down).

Probably what we want here is netif_device_present()? -- I think that
should return true only when the *device* itself is up (as in powered)
but the interface itself could be down ...

Let's wait for comments from the netdev people Cc:'ed here, in that case.


> >  drivers/net/sb1000.c |3 ---
> >  1 files changed, 0 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/net/sb1000.c b/drivers/net/sb1000.c
> > index 1de3eec..f60fe98 100644
> > --- a/drivers/net/sb1000.c
> > +++ b/drivers/net/sb1000.c
> > @@ -993,9 +993,6 @@ static int sb1000_dev_ioctl(struct net_device *dev, 
> > struct ifreq *ifr, int cmd)
> > unsigned int stats[5];
> > struct sb1000_private *lp = netdev_priv(dev);
> >  
> > -   if (!(dev && dev->flags & IFF_UP))
> > -   return -ENODEV;
> > -


Satyam
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] sb1000: prevent a potential NULL pointer dereference in sb1000_dev_ioctl()

2007-07-29 Thread Satyam Sharma


> On Sun, 29 Jul 2007, Domen Puncer wrote:
> 
> > On 29/07/07 00:02 +0200, Jesper Juhl wrote:
> > > Hi,
> > > 
> > > Here's a small patch, prompted by a find by the Coverity checker, 
> > > that removes a potential NULL pointer dereference from 
> > > drivers/net/sb1000.c::sb1000_dev_ioctl().
> > > The checker spotted that we do a NULL test of 'dev', yet we 
> > > dereference the pointer prior to that check.
> > > This patch simply moves the dereference after the NULL test.
> > 
> > But... it can't be called without a valid 'dev', no?
> > A quick 'grep do_ioctl net/' confirms that all calls are in
> > the form of 'dev->do_ioctl(dev, ...'.
> 
> Yup, I think so too ...
> 
> 
> > > @@ -991,11 +991,13 @@ static int sb1000_dev_ioctl(struct net_device *dev, 
> > > struct ifreq *ifr, int cmd)
> > >   short PID[4];
> > >   int ioaddr[2], status, frequency;
> > >   unsigned int stats[5];
> > > - struct sb1000_private *lp = netdev_priv(dev);
> > > + struct sb1000_private *lp;
> > >  
> > >   if (!(dev && dev->flags & IFF_UP))
> > >   return -ENODEV;
> 
> I think we could get rid of the !dev check itself. Actually, the IFF_UP
> check /also/ looks suspect to me for two reasons: (1) I remember Stephen
> Hemminger once telling me dev->flags is legacy and unsafe, and one of
> the netif_xxx() functions be used instead, and, (2) I wonder if we really
> require the interface to be up and *running* when we do this ioctl.

Updated patch below.

[PATCH] sb1000: Remove bogus checks

In net_device->do_ioctl() of the sb1000 driver (sb1000_dev_ioctl):

(1) !dev condition is always false -- this function cannot be called with
NULL net_device.
(2) !(dev->flags & IFF_UP) is bogus because the functions of this ioctl
can (and should) be allowed even when the interface is not up and running.

So let's remove these checks.

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>
Cc: Jesper Juhl <[EMAIL PROTECTED]>
Cc: Domen Puncer <[EMAIL PROTECTED]>

---

 drivers/net/sb1000.c |3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/drivers/net/sb1000.c b/drivers/net/sb1000.c
index 1de3eec..f60fe98 100644
--- a/drivers/net/sb1000.c
+++ b/drivers/net/sb1000.c
@@ -993,9 +993,6 @@ static int sb1000_dev_ioctl(struct net_device *dev, struct 
ifreq *ifr, int cmd)
unsigned int stats[5];
struct sb1000_private *lp = netdev_priv(dev);
 
-   if (!(dev && dev->flags & IFF_UP))
-   return -ENODEV;
-
ioaddr[0] = dev->base_addr;
/* mem_start holds the second I/O address */
ioaddr[1] = dev->mem_start;
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [linux-usb-devel] [PATCH] USB Pegasus driver - avoid a potential NULL pointer dereference.

2007-07-29 Thread Satyam Sharma


On Sun, 29 Jul 2007, Oliver Neukum wrote:

> Am Sonntag 29 Juli 2007 schrieb Jesper Juhl:
> > On 29/07/07, Satyam Sharma <[EMAIL PROTECTED]> wrote:
> > > Hi,
> > >
> > > On 7/29/07, Jesper Juhl <[EMAIL PROTECTED]> wrote:
> > > > Hello,
> > > >
> > > > This patch makes sure we don't dereference a NULL pointer in
> > > > drivers/net/usb/pegasus.c::write_bulk_callback() in the initial
> > > > struct net_device *net = pegasus->net; assignment.
> > > > The existing code checks if 'pegasus' is NULL and bails out if
> > > > it is, so we better not touch that pointer until after that check.
> > > > [...]
> > >
> > > Is it really possible that we're called into this function with
> > > urb->context == NULL? If not, I'd suggest let's just get rid of
> > > the check itself, instead.
> > >
> > I'm not sure. I am not very familiar with this code. I just figured
> > that moving the assignment is potentially a little safer and it is
> > certainly no worse than the current code, so that's a safe and
> > potentially benneficial change. Removing the check may be safe but I'm
> > not certain enough to make that call...
> 
> pegasus == NULL there would be a kernel bug. Silently ignoring
> it, like the code now wants to do is bad. As the oops has never been
> reported, I figure turning it into an explicit debugging test is overkill,
> so removal seems to be the best option.

Ok, thanks. Updated patch below.

[PATCH] pegasus: Remove bogus checks in urb->complete() callbacks

urb->complete() callbacks registered in drivers/net/usb/pegasus.c needlessly
check for urb->context != NULL, but that is not possible (the only way that
can be possible would be a kernel bug elsewhere, and these checks would
simply end up hiding that). So let's remove the bogus checks.

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>

---

 drivers/net/usb/pegasus.c |9 +
 1 files changed, 1 insertions(+), 8 deletions(-)

diff --git a/drivers/net/usb/pegasus.c b/drivers/net/usb/pegasus.c
index a05fd97..439ef9f 100644
--- a/drivers/net/usb/pegasus.c
+++ b/drivers/net/usb/pegasus.c
@@ -770,9 +770,6 @@ static void write_bulk_callback(struct urb *urb)
pegasus_t *pegasus = urb->context;
struct net_device *net = pegasus->net;
 
-   if (!pegasus)
-   return;
-
if (!netif_device_present(net) || !netif_running(net))
return;
 
@@ -805,13 +802,9 @@ static void write_bulk_callback(struct urb *urb)
 static void intr_callback(struct urb *urb)
 {
pegasus_t *pegasus = urb->context;
-   struct net_device *net;
+   struct net_device *net = pegasus->net;
int status;
 
-   if (!pegasus)
-   return;
-   net = pegasus->net;
-
switch (urb->status) {
case 0:
break;
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   >