Re: [PATCH 3/5] Use mutex instead of semaphore in the SCSI Tape driver

2007-07-29 Thread Satyam Sharma


On Sun, 29 Jul 2007, Matthias Kaehlcke wrote:

> The SCSI Tape driver uses a semaphore as mutex. Use the mutex API
> instead of the (binary) semaphore.
> 
> Signed-off-by: Matthias Kaehlcke <[EMAIL PROTECTED]>

Reviewed-by: Satyam Sharma <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/5] Use mutex instead of semaphore in the OnStream SCSI Tape driver

2007-07-29 Thread Satyam Sharma
Hi,


On Sun, 29 Jul 2007, Matthias Kaehlcke wrote:

> The OnStream SCSI Tape driver uses a semaphore as mutex. Use the mutex
> API instead of the (binary) semaphore.
> 
> Signed-off-by: Matthias Kaehlcke <[EMAIL PROTECTED]>
> [...]
> @@ -3298,7 +3298,7 @@ static ssize_t osst_write(struct file * filp, const 
> char __user * buf, size_t co
>   char* name = tape_name(STp);
>  
>  
> - if (down_interruptible(>lock))
> + if (mutex_lock_interruptible(>lock))
>   return (-ERESTARTSYS);

The () after return existed in the code already, but you could've felt
free to remove them :-)

> @@ -3619,7 +3619,7 @@ static ssize_t osst_read(struct file * filp, char 
> __user * buf, size_t count, lo
>   char* name  = tape_name(STp);
>  
>  
> - if (down_interruptible(>lock))
> + if (mutex_lock_interruptible(>lock))
>   return (-ERESTARTSYS);

Same here.

Reviewed-by: Satyam Sharma <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/5] Use mutex instead of semaphore in the Host AP driver

2007-07-29 Thread Satyam Sharma


On Mon, 30 Jul 2007, Michael Buesch wrote:

> On Sunday 29 July 2007 23:34, Matthias Kaehlcke wrote:
> > The Host AP driver uses a semaphore as mutex. Use the mutex API
> > instead of the (binary) semaphore.
> > 
> > Signed-off-by: Matthias Kaehlcke <[EMAIL PROTECTED]>

[ Something seems to have gone wrong with your diff / patch / script.
  There was no diff header here, which should have been. ]

> > -   res = down_interruptible(>rid_bap_sem);
> > +   res = mutex_lock_interruptible(>rid_bap_mtx);
> > if (res)
> > return res;
> >  
> > @@ -902,7 +902,7 @@ static int hfa384x_set_rid(struct net_device *dev, u16 
> > rid, void *buf, int len)
> > /* RID len in words and +1 for rec.rid */
> > rec.len = cpu_to_le16(len / 2 + len % 2 + 1);
> >  
> > -   res = down_interruptible(>rid_bap_sem);
> > +   res = mutex_lock_interruptible(>rid_bap_mtx);
> > if (res)
> > return res;
> >  
> 
> Is res returned to userspace? If yes, that's not right.

Yup, that's not right.

> On a interrupted mutex allocation you should return
> -ERESTARTSYS to userspace.

Nope, userspace must not see ERESTARTSYS (I /think/ this is the third
time I'm participating in this exact same discussion :-)

If the return would be caught by a previous in-kernel caller in the
call chain, ERESTARTSYS is okay and it could try to restart the
operation. However, if the return goes unfiltered directly to
userspace, EINTR is the correct choice.


Satyam
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH][resend] Fix a typo in Documentation/keys.txt

2007-07-29 Thread Satyam Sharma
[PATCH] Fix a typo in Documentation/keys.txt

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>
Acked-by: David Howells <[EMAIL PROTECTED]>

---

[ Previously sent on: June 9, 2007 ]

 Documentation/keys.txt |5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff -ruNp a/Documentation/keys.txt b/Documentation/keys.txt
--- a/Documentation/keys.txt2007-06-03 02:33:02.0 +0530
+++ b/Documentation/keys.txt2007-06-03 02:38:52.0 +0530
@@ -859,9 +859,8 @@ payload contents" for more information.
void unregister_key_type(struct key_type *type);
 
 
-Under some circumstances, it may be desirable to desirable to deal with a
-bundle of keys.  The facility provides access to the keyring type for managing
-such a bundle:
+Under some circumstances, it may be desirable to deal with a bundle of keys.
+The facility provides access to the keyring type for managing such a bundle:
 
struct key_type key_type_keyring;
 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v3 -mm 9/9] netconsole: Support dynamic reconfiguration using configfs

2007-07-29 Thread Satyam Sharma
From: Satyam Sharma <[EMAIL PROTECTED]>

[9/9] netconsole: Support dynamic reconfiguration using configfs

This patch introduces support for dynamic reconfiguration (adding, removing
and/or modifying parameters of netconsole targets at runtime) using a
userspace interface exported via configfs. Documentation is also updated
accordingly.

Issues and brief design overview:

(1) Kernel-initiated creation / destruction of kernel objects is not
possible with configfs -- the lifetimes of the "config items" is managed
exclusively from userspace. But netconsole must support boot/module params
too, and these are parsed in kernel and hence netpolls must be setup from
the kernel. Joel Becker suggested to separately manage the lifetimes of
the two kinds of netconsole_target objects -- those created via configfs
mkdir(2) from userspace and those specified from the boot/module option
string. This adds complexity and some redundancy here and also means that
boot/module param-created targets are not exposed through the configfs
namespace (and hence cannot be updated / destroyed dynamically). However,
this saves us from locking / refcounting complexities that would need to
be introduced in configfs to support kernel-initiated item creation /
destroy there.

(2) In configfs, item creation takes place in the call chain of the mkdir(2)
syscall in the driver subsystem. If we used an ioctl(2) to create / destroy
objects from userspace, the special userspace program is able to fill out
the structure to be passed into the ioctl and hence specify attributes such
as local interface that are required at the time we set up the netpoll.
For configfs, this information is not available at the time of mkdir(2).
So, we keep all newly-created targets (via configfs) disabled by default.
The user is expected to set various attributes appropriately (including the
local network interface if required) and then write(2) "1" to the "enabled"
attribute. Thus, netpoll_setup() is then called on the set parameters in the
context of _this_ write(2) on the "enabled" attribute itself. This design
enables the user to reconfigure existing netconsole targets at runtime to
be attached to newly-come-up interfaces that may not have existed when
netconsole was loaded or when the targets were actually created. All this
effectively enables us to get rid of custom ioctls.

(3) Ultra-paranoid configfs attribute show() and store() operations, with
sanity and input range checking, using only safe string primitives, and
compliant with the recommendations in Documentation/filesystems/sysfs.txt.

(4) A new function netpoll_print_options() is created in the netpoll API,
that just prints out the configured parameters for a netpoll structure.
netpoll_parse_options() is modified to use that and it is also exported to
be used from netconsole.

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>
Acked-by: Keiichi Kii <[EMAIL PROTECTED]>

---

 Documentation/networking/netconsole.txt |   68 
 drivers/net/Kconfig |   10 +
 drivers/net/netconsole.c|  605 +--
 include/linux/netpoll.h |1 +
 net/core/netpoll.c  |   44 ++-
 5 files changed, 683 insertions(+), 45 deletions(-)

diff --git a/Documentation/networking/netconsole.txt 
b/Documentation/networking/netconsole.txt
index 1aaa738..3c2f2b3 100644
--- a/Documentation/networking/netconsole.txt
+++ b/Documentation/networking/netconsole.txt
@@ -3,6 +3,10 @@ started by Ingo Molnar <[EMAIL PROTECTED]>, 2001.09.17
 2.6 port and netpoll api by Matt Mackall <[EMAIL PROTECTED]>, Sep 9 2003
 
 Please send bug reports to Matt Mackall <[EMAIL PROTECTED]>
+and Satyam Sharma <[EMAIL PROTECTED]>
+
+Introduction:
+=
 
 This module logs kernel printk messages over UDP allowing debugging of
 problem where disk logging fails and serial consoles are impractical.
@@ -13,6 +17,9 @@ the specified interface as soon as possible. While this 
doesn't allow
 capture of early kernel panics, it does capture most of the boot
 process.
 
+Sender and receiver configuration:
+==
+
 It takes a string configuration parameter "netconsole" in the
 following format:
 
@@ -46,6 +53,67 @@ address.
 
 The remote host can run either 'netcat -u -l -p ' or syslogd.
 
+Dynamic reconfiguration:
+
+
+Dynamic reconfigurability is a useful addition to netconsole that enables
+remote logging targets to be dynamically added, removed, or have their
+parameters reconfigured at runtime from a configfs-based userspace interface.
+[ Note that the parameters of netconsole targets that were specified/created
+from the boot/module option are not exposed via this interface, and hence
+cannot be modified dynamically. ]
+
+To include this feature, select CONFIG_NETCONSOLE_DYNAMIC when building the
+netconsole module (or kernel

[PATCH v3 -mm 6/9] netconsole: Introduce netconsole_target

2007-07-29 Thread Satyam Sharma
From: Satyam Sharma <[EMAIL PROTECTED]>

[6/9] netconsole: Introduce netconsole_target

Introduce a wrapper structure over netpoll to represent logging targets
configured in netconsole. This will get extended with other members in
further patches.

This is done independent of the (to-be-introduced) NETCONSOLE_DYNAMIC
config option so that we're able to drastically cut down on the #ifdef
complexity of final netconsole.c. Also, struct netconsole_target would be
required for multiple targets support also, and not just dynamic
reconfigurability.

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>
Signed-off-by: Keiichi Kii <[EMAIL PROTECTED]>

---

 drivers/net/netconsole.c |   36 +---
 1 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index 75cb761..be15ca6 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -62,24 +62,35 @@ static int __init option_setup(char *opt)
 __setup("netconsole=", option_setup);
 #endif /* MODULE */
 
-static struct netpoll np = {
-   .name   = "netconsole",
-   .dev_name   = "eth0",
-   .local_port = 6665,
-   .remote_port= ,
-   .remote_mac = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
+/**
+ * struct netconsole_target - Represents a configured netconsole target.
+ * @np:The netpoll structure for this target.
+ */
+struct netconsole_target {
+   struct netpoll  np;
+};
+
+static struct netconsole_target default_target = {
+   .np = {
+   .name   = "netconsole",
+   .dev_name   = "eth0",
+   .local_port = 6665,
+   .remote_port= ,
+   .remote_mac = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
+   },
 };
 
 static void write_msg(struct console *con, const char *msg, unsigned int len)
 {
int frag, left;
unsigned long flags;
+   struct netconsole_target *nt = _target;
 
-   if (netif_running(np.dev)) {
+   if (netif_running(nt->np.dev)) {
local_irq_save(flags);
for (left = len; left;) {
frag = min(left, MAX_PRINT_CHUNK);
-   netpoll_send_udp(, msg, frag);
+   netpoll_send_udp(>np, msg, frag);
msg += frag;
left -= frag;
}
@@ -96,17 +107,18 @@ static struct console netconsole = {
 static int __init init_netconsole(void)
 {
int err = 0;
+   struct netconsole_target *nt = _target;
 
if (!strnlen(config, MAX_PARAM_LENGTH)) {
printk(KERN_INFO "netconsole: not configured, aborting\n");
goto out;
}
 
-   err = netpoll_parse_options(, config);
+   err = netpoll_parse_options(>np, config);
if (err)
goto out;
 
-   err = netpoll_setup();
+   err = netpoll_setup(>np);
if (err)
goto out;
 
@@ -119,8 +131,10 @@ out:
 
 static void __exit cleanup_netconsole(void)
 {
+   struct netconsole_target *nt = _target;
+
unregister_console();
-   netpoll_cleanup();
+   netpoll_cleanup(>np);
 }
 
 module_init(init_netconsole);
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v3 -mm 7/9] netconsole: Introduce netconsole_netdev_notifier

2007-07-29 Thread Satyam Sharma
From: Satyam Sharma <[EMAIL PROTECTED]>

[7/9] netconsole: Introduce netconsole_netdev_notifier

To update fields of underlying netpoll structure at runtime on
corresponding NETDEV_CHANGEADDR or NETDEV_CHANGENAME notifications.

ioctl(SIOCSIFHWADDR or SIOCSIFNAME) could be used to change the
hardware/MAC address or name of the local interface that our netpoll
is attached to. Whenever this happens, netdev notifier chain is called
out with the NETDEV_CHANGEADDR or NETDEV_CHANGENAME event message. We
respond to that and update the local_mac or dev_name field of the struct
netpoll. This makes sense anyway, but is especially required for dynamic
netconsole because the netpoll structure's internal members become user
visible files when either sysfs or configfs are used. So this helps us
to keep up with the MAC address/name changes and keep values in struct
netpoll uptodate.

[ Note that ioctl(SIOCSIFADDR) to change IP address of interface at
  runtime is not handled (to update local_ip of netpoll) on purpose --
  some setups may set the local_ip to a private address, not necessary
  the actual IP address of the sender host, as presently allowed. ]

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>
Signed-off-by: Keiichi Kii <[EMAIL PROTECTED]>

---

 drivers/net/netconsole.c |   32 
 1 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index be15ca6..5557098 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -80,6 +80,33 @@ static struct netconsole_target default_target = {
},
 };
 
+/* Handle network interface device notifications */
+static int netconsole_netdev_event(struct notifier_block *this,
+  unsigned long event,
+  void *ptr)
+{
+   struct net_device *dev = ptr;
+   struct netconsole_target *nt = _target;
+
+   if (nt->np.dev == dev) {
+   switch (event) {
+   case NETDEV_CHANGEADDR:
+   memcpy(nt->np.local_mac, dev->dev_addr, ETH_ALEN);
+   break;
+
+   case NETDEV_CHANGENAME:
+   strlcpy(nt->np.dev_name, dev->name, IFNAMSIZ);
+   break;
+   }
+   }
+
+   return NOTIFY_DONE;
+}
+
+static struct notifier_block netconsole_netdev_notifier = {
+   .notifier_call  = netconsole_netdev_event,
+};
+
 static void write_msg(struct console *con, const char *msg, unsigned int len)
 {
int frag, left;
@@ -122,6 +149,10 @@ static int __init init_netconsole(void)
if (err)
goto out;
 
+   err = register_netdevice_notifier(_netdev_notifier);
+   if (err)
+   goto out;
+
register_console();
printk(KERN_INFO "netconsole: network logging started\n");
 
@@ -134,6 +165,7 @@ static void __exit cleanup_netconsole(void)
struct netconsole_target *nt = _target;
 
unregister_console();
+   unregister_netdevice_notifier(_netdev_notifier);
netpoll_cleanup(>np);
 }
 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v3 -mm 8/9] netconsole: Support multiple logging targets

2007-07-29 Thread Satyam Sharma
From: Satyam Sharma <[EMAIL PROTECTED]>

[8/9] netconsole: Support multiple logging targets

This patch introduces support for multiple targets, independent of
CONFIG_NETCONSOLE_DYNAMIC -- this is useful even in the default case and
(including the infrastructure introduced in previous patches) doesn't
really add too many bytes to module text. All the complexity (and size)
comes with the dynamic reconfigurability / userspace interface patch,
and so it's plausible users may want to keep this enabled but that
disabled (say to avoid a dependency on CONFIG_CONFIGFS_FS too).

Also update documentation to mention the use of ";" separator to specify
multiple logging targets in the boot/module option string.

Brief overview:

We maintain a target_list (and corresponding lock). Get rid of the static
"default_target" and introduce allocation and release functions for our
netconsole_target objects (but keeping sure to preserve previous behaviour
such as default values). During init_netconsole(), ";" is used as the
separator to identify multiple target specifications in the boot/module
option string. The target specifications are parsed and netpolls setup.
During exit, the target_list is torn down and all items released.

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>
Signed-off-by: Keiichi Kii <[EMAIL PROTECTED]>

---

 Documentation/networking/netconsole.txt |6 +
 drivers/net/netconsole.c|  171 +++---
 2 files changed, 137 insertions(+), 40 deletions(-)

diff --git a/Documentation/networking/netconsole.txt 
b/Documentation/networking/netconsole.txt
index 5962f45..1aaa738 100644
--- a/Documentation/networking/netconsole.txt
+++ b/Documentation/networking/netconsole.txt
@@ -34,6 +34,12 @@ Examples:
 
  insmod netconsole netconsole=@/,@10.0.0.2/
 
+It also supports logging to multiple remote agents by specifying
+parameters for the multiple agents separated by semicolons and the
+complete string enclosed in "quotes", thusly:
+
+ modprobe netconsole netconsole="@/,@10.0.0.2/;@/eth1,[EMAIL PROTECTED]/"
+
 Built-in netconsole starts immediately after the TCP stack is
 initialized and attempts to bring up the supplied dev at the supplied
 address.
diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index 5557098..458c4d6 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -62,44 +62,93 @@ static int __init option_setup(char *opt)
 __setup("netconsole=", option_setup);
 #endif /* MODULE */
 
+/* Linked list of all configured targets */
+static LIST_HEAD(target_list);
+
+/* This needs to be a spinlock because write_msg() cannot sleep */
+static DEFINE_SPINLOCK(target_list_lock);
+
 /**
  * struct netconsole_target - Represents a configured netconsole target.
+ * @list:  Links this target into the target_list.
  * @np:The netpoll structure for this target.
  */
 struct netconsole_target {
+   struct list_headlist;
struct netpoll  np;
 };
 
-static struct netconsole_target default_target = {
-   .np = {
-   .name   = "netconsole",
-   .dev_name   = "eth0",
-   .local_port = 6665,
-   .remote_port= ,
-   .remote_mac = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
-   },
-};
+/* Allocate new target and setup netpoll for it */
+static struct netconsole_target *alloc_target(char *target_config)
+{
+   int err = -ENOMEM;
+   struct netconsole_target *nt;
+
+   /* Allocate and initialize with defaults */
+   nt = kzalloc(sizeof(*nt), GFP_KERNEL);
+   if (!nt) {
+   printk(KERN_ERR "netconsole: failed to allocate memory\n");
+   goto fail;
+   }
+
+   nt->np.name = "netconsole";
+   strlcpy(nt->np.dev_name, "eth0", IFNAMSIZ);
+   nt->np.local_port = 6665;
+   nt->np.remote_port = ;
+   memset(nt->np.remote_mac, 0xff, ETH_ALEN);
+
+   /* Parse parameters and setup netpoll */
+   err = netpoll_parse_options(>np, target_config);
+   if (err)
+   goto fail;
+
+   err = netpoll_setup(>np);
+   if (err)
+   goto fail;
+
+   return nt;
+
+fail:
+   kfree(nt);
+   return ERR_PTR(err);
+}
+
+/* Cleanup netpoll for given target and free it */
+static void free_target(struct netconsole_target *nt)
+{
+   netpoll_cleanup(>np);
+   kfree(nt);
+}
 
 /* Handle network interface device notifications */
 static int netconsole_netdev_event(struct notifier_block *this,
   unsigned long event,
   void *ptr)
 {
+   unsigned long flags;
+   struct netconsole_target *nt;
struct net_device *dev = ptr;
-   struct netconsole_target *nt = _target;
 
-   if (nt-&

[PATCH v3 -mm 3/9] netconsole: Simplify boot/module option setup logic

2007-07-29 Thread Satyam Sharma
From: Satyam Sharma <[EMAIL PROTECTED]>

[3/9] netconsole: Simplify boot/module option setup logic

Presently, boot/module parameters are set up quite differently for the
case of built-in netconsole (__setup() -> obsolete_checksetup() ->
netpoll_parse_options() -> strlen(config) == 0 in init_netconsole())
vs modular netconsole (module_param_string() -> string copied to the
config variable -> strlen(config) != 0 init_netconsole() ->
netpoll_parse_options()).

This patch makes both of them similar by doing exactly the equivalent
of a module_param_string() in option_setup() also -- just copying the
param string passed from the kernel command line into "config" variable.
So, strlen(config) != 0 in both cases, and netpoll_parse_options() is
always called from init_netconsole(), thus making the setup logic for
both cases similar.

Now, option_setup() is only ever called / used for the built-in case,
so we put it inside a #ifndef MODULE, otherwise gcc will complain about
option_setup() being "defined but not used". Also, the "configured"
variable is redundant with this patch and hence removed.

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>
Signed-off-by: Keiichi Kii <[EMAIL PROTECTED]>
Acked-by: Matt Mackall <[EMAIL PROTECTED]>

---

 drivers/net/netconsole.c |   27 ++-
 1 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index 2c2aef1..e56aa6c 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -53,6 +53,15 @@ static char config[MAX_PARAM_LENGTH];
 module_param_string(netconsole, config, MAX_PARAM_LENGTH, 0);
 MODULE_PARM_DESC(netconsole, " [EMAIL 
PROTECTED]/[dev],[tgt-port]@/[tgt-macaddr]\n");
 
+#ifndefMODULE
+static int __init option_setup(char *opt)
+{
+   strlcpy(config, opt, MAX_PARAM_LENGTH);
+   return 1;
+}
+__setup("netconsole=", option_setup);
+#endif /* MODULE */
+
 static struct netpoll np = {
.name   = "netconsole",
.dev_name   = "eth0",
@@ -60,7 +69,6 @@ static struct netpoll np = {
.remote_port= ,
.remote_mac = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
 };
-static int configured;
 
 static void write_msg(struct console *con, const char *msg, unsigned int len)
 {
@@ -85,26 +93,19 @@ static struct console netconsole = {
.write  = write_msg,
 };
 
-static int __init option_setup(char *opt)
-{
-   configured = !netpoll_parse_options(, opt);
-   return 1;
-}
-
-__setup("netconsole=", option_setup);
-
 static int __init init_netconsole(void)
 {
int err = 0;
 
-   if (strnlen(config, MAX_PARAM_LENGTH))
-   option_setup(config);
-
-   if (!configured) {
+   if (!strnlen(config, MAX_PARAM_LENGTH)) {
printk(KERN_INFO "netconsole: not configured, aborting\n");
goto out;
}
 
+   err = netpoll_parse_options(, config);
+   if (err)
+   goto out;
+
err = netpoll_setup();
if (err)
goto out;
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v3 -mm 4/9] netconsole: Use netif_running() in write_msg()

2007-07-29 Thread Satyam Sharma
From: Satyam Sharma <[EMAIL PROTECTED]>

[4/9] netconsole: Use netif_running() in write_msg()

Avoid unnecessarily disabling interrupts and calling netpoll_send_udp()
if the corresponding local interface is not up.

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>
Acked-by: Keiichi Kii <[EMAIL PROTECTED]>
Cc: Matt Mackall <[EMAIL PROTECTED]>

---

 drivers/net/netconsole.c |   18 +-
 1 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index e56aa6c..75cb761 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -75,16 +75,16 @@ static void write_msg(struct console *con, const char *msg, 
unsigned int len)
int frag, left;
unsigned long flags;
 
-   local_irq_save(flags);
-
-   for (left = len; left;) {
-   frag = min(left, MAX_PRINT_CHUNK);
-   netpoll_send_udp(, msg, frag);
-   msg += frag;
-   left -= frag;
+   if (netif_running(np.dev)) {
+   local_irq_save(flags);
+   for (left = len; left;) {
+   frag = min(left, MAX_PRINT_CHUNK);
+   netpoll_send_udp(, msg, frag);
+   msg += frag;
+   left -= frag;
+   }
+   local_irq_restore(flags);
}
-
-   local_irq_restore(flags);
 }
 
 static struct console netconsole = {
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v3 -mm 5/9] netconsole: Add some useful tips to documentation

2007-07-29 Thread Satyam Sharma
From: Satyam Sharma <[EMAIL PROTECTED]>

[5/9] netconsole: Add some useful tips to documentation

Add some useful general-purpose tips. Also suggest solution for the frequent
problem of console loglevel set too low numerically (i.e. for high priority
messages only) on the sender.

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>
Acked-by: Keiichi Kii <[EMAIL PROTECTED]>
Acked-by: Matt Mackall <[EMAIL PROTECTED]>

---

 Documentation/networking/netconsole.txt |   25 +
 1 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/Documentation/networking/netconsole.txt 
b/Documentation/networking/netconsole.txt
index 1caa6c7..5962f45 100644
--- a/Documentation/networking/netconsole.txt
+++ b/Documentation/networking/netconsole.txt
@@ -44,11 +44,36 @@ WARNING: the default target ethernet setting uses the 
broadcast
 ethernet address to send packets, which can cause increased load on
 other systems on the same ethernet segment.
 
+TIP: some LAN switches may be configured to suppress ethernet broadcasts
+so it is advised to explicitly specify the remote agents' MAC addresses
+from the config parameters passed to netconsole.
+
+TIP: to find out the MAC address of, say, 10.0.0.2, you may try using:
+
+ ping -c 1 10.0.0.2 ; /sbin/arp -n | grep 10.0.0.2
+
+TIP: in case the remote logging agent is on a separate LAN subnet than
+the sender, it is suggested to try specifying the MAC address of the
+default gateway (you may use /sbin/route -n to find it out) as the
+remote MAC address instead.
+
 NOTE: the network device (eth1 in the above case) can run any kind
 of other network traffic, netconsole is not intrusive. Netconsole
 might cause slight delays in other traffic if the volume of kernel
 messages is high, but should have no other impact.
 
+NOTE: if you find that the remote logging agent is not receiving or
+printing all messages from the sender, it is likely that you have set
+the "console_loglevel" parameter (on the sender) to only send high
+priority messages to the console. You can change this at runtime using:
+
+ dmesg -n 8
+
+or by specifying "debug" on the kernel command line at boot, to send
+all kernel messages to the console. A specific value for this parameter
+can also be set using the "loglevel" kernel boot option. See the
+dmesg(8) man page and Documentation/kernel-parameters.txt for details.
+
 Netconsole was designed to be as instantaneous as possible, to
 enable the logging of even the most critical kernel bugs. It works
 from IRQ contexts as well, and does not enable interrupts while
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v3 -mm 2/9] netconsole: Remove bogus check

2007-07-29 Thread Satyam Sharma
From: Satyam Sharma <[EMAIL PROTECTED]>

[2/9] netconsole: Remove bogus check

The (!np.dev) check in write_msg() is bogus (always false), because:
np.dev is set by netpoll_setup(), which is called by init_netconsole()
before register_console(), so write_msg() cannot be triggered unless
netpoll_setup() successfully set np.dev. Also np.dev cannot go away
from under us, because netpoll_setup() grabs us reference on it. So
let's remove the bogus check.

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>
Acked-by: Keiichi Kii <[EMAIL PROTECTED]>
Acked-by: Matt Mackall <[EMAIL PROTECTED]>

---

 drivers/net/netconsole.c |3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index f1c2a2d..2c2aef1 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -67,9 +67,6 @@ static void write_msg(struct console *con, const char *msg, 
unsigned int len)
int frag, left;
unsigned long flags;
 
-   if (!np.dev)
-   return;
-
local_irq_save(flags);
 
for (left = len; left;) {
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v3 -mm 1/9] netconsole: Cleanups, codingstyle, prettyfication

2007-07-29 Thread Satyam Sharma
From: Satyam Sharma <[EMAIL PROTECTED]>

[1/9] netconsole: Cleanups, codingstyle, prettyfication

(1) Remove unwanted headers.
(2) Mark __init and __exit as appropriate.
(3) Various trivial codingstyle and prettification stuff.

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>
Signed-off-by: Keiichi Kii <[EMAIL PROTECTED]>
Acked-by: Matt Mackall <[EMAIL PROTECTED]>

---

 drivers/net/netconsole.c |   55 ++---
 1 files changed, 27 insertions(+), 28 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index 69233f6..f1c2a2d 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -35,35 +35,32 @@
  /
 
 #include 
-#include 
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
-#include 
-#include 
 #include 
 
 MODULE_AUTHOR("Maintainer: Matt Mackall <[EMAIL PROTECTED]>");
 MODULE_DESCRIPTION("Console driver for network interfaces");
 MODULE_LICENSE("GPL");
 
-static char config[256];
-module_param_string(netconsole, config, 256, 0);
+#define MAX_PARAM_LENGTH   256
+#define MAX_PRINT_CHUNK1000
+
+static char config[MAX_PARAM_LENGTH];
+module_param_string(netconsole, config, MAX_PARAM_LENGTH, 0);
 MODULE_PARM_DESC(netconsole, " [EMAIL 
PROTECTED]/[dev],[tgt-port]@/[tgt-macaddr]\n");
 
 static struct netpoll np = {
-   .name = "netconsole",
-   .dev_name = "eth0",
-   .local_port = 6665,
-   .remote_port = ,
-   .remote_mac = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
+   .name   = "netconsole",
+   .dev_name   = "eth0",
+   .local_port = 6665,
+   .remote_port= ,
+   .remote_mac = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
 };
-static int configured = 0;
-
-#define MAX_PRINT_CHUNK 1000
+static int configured;
 
 static void write_msg(struct console *con, const char *msg, unsigned int len)
 {
@@ -75,7 +72,7 @@ static void write_msg(struct console *con, const char *msg, 
unsigned int len)
 
local_irq_save(flags);
 
-   for(left = len; left; ) {
+   for (left = len; left;) {
frag = min(left, MAX_PRINT_CHUNK);
netpoll_send_udp(, msg, frag);
msg += frag;
@@ -86,12 +83,12 @@ static void write_msg(struct console *con, const char *msg, 
unsigned int len)
 }
 
 static struct console netconsole = {
-   .name = "netcon",
-   .flags = CON_ENABLED | CON_PRINTBUFFER,
-   .write = write_msg
+   .name   = "netcon",
+   .flags  = CON_ENABLED | CON_PRINTBUFFER,
+   .write  = write_msg,
 };
 
-static int option_setup(char *opt)
+static int __init option_setup(char *opt)
 {
configured = !netpoll_parse_options(, opt);
return 1;
@@ -99,28 +96,30 @@ static int option_setup(char *opt)
 
 __setup("netconsole=", option_setup);
 
-static int init_netconsole(void)
+static int __init init_netconsole(void)
 {
-   int err;
+   int err = 0;
 
-   if(strlen(config))
+   if (strnlen(config, MAX_PARAM_LENGTH))
option_setup(config);
 
-   if(!configured) {
-   printk("netconsole: not configured, aborting\n");
-   return 0;
+   if (!configured) {
+   printk(KERN_INFO "netconsole: not configured, aborting\n");
+   goto out;
}
 
err = netpoll_setup();
if (err)
-   return err;
+   goto out;
 
register_console();
printk(KERN_INFO "netconsole: network logging started\n");
-   return 0;
+
+out:
+   return err;
 }
 
-static void cleanup_netconsole(void)
+static void __exit cleanup_netconsole(void)
 {
unregister_console();
netpoll_cleanup();
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v3 -mm 0/9] netconsole: Multiple targets and dynamic reconfigurability

2007-07-29 Thread Satyam Sharma
[0/9] netconsole: Multiple targets and dynamic reconfigurability

This is v3 of the patchset, the previous versions are available at:
http://lkml.org/lkml/2007/7/4/107
http://lkml.org/lkml/2007/7/10/78

Diffed against 2.6.23-rc1-git6 (6c8dca5d as of writing), but applies
successfully to 2.6.23-rc1-mm1 as well.

Patches 1-5 are okay-to-apply to current mainline -git, I think.

Patches 6-9 introduce the new functionality and are requested for
inclusion in -mm.

[1/9] netconsole: Cleanups, codingstyle, prettyfication
[2/9] netconsole: Remove bogus check
[3/9] netconsole: Simplify boot/module option setup logic
[4/9] netconsole: Use netif_running() in write_msg()
[5/9] netconsole: Add some useful tips to documentation
[6/9] netconsole: Introduce netconsole_target
[7/9] netconsole: Introduce netconsole_netdev_notifier
[8/9] netconsole: Support multiple logging targets
[9/9] netconsole: Support dynamic reconfiguration using configfs

Changes since v2:
=

* "enabled" must be defined outside #ifdef NETCONSOLE_DYNAMIC

* Some simple enhancements to documentation

* Drop the use of "unlikely" from a condition where I'd got the
  common case wrong

About this patchset:


What?

Support multiple remote logging targets in netconsole. Also, ability to
dynamically add or remove targets or modify parameters (IP, port, remote
MAC address) of targets at runtime, from userspace, using configfs.

Why?

>From Keiichi Kii's original post:

[...] current netconsole is not flexible. For example, if you want to change
ip address for logging agent, in the case of built-in netconsole, you can't
change config except for changing boot parameter and rebooting your system,
or in the case of module netconsole, you need to remove it and reload
with different parameters. [...] and we have been losing serial console
port with PCs and Servers.

(... and especially laptops, I would add.)


Satyam
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Make lguest compile with CONFIG_BLOCK=n and CONFIG_NET=n

2007-07-29 Thread Satyam Sharma
On 7/30/07, Rusty Russell <[EMAIL PROTECTED]> wrote:
> [...]
> Gabriel C reports lguest doesn't compile with CONFIG_BLOCK=n.  Fix
> this by introducing a config var for the block device, which depends
> on LGUEST && BLOCK.  Do the same for the net driver, rather then
> depending gratuitously on CONFIG_NET.
> [...]
> diff -r 9e987fcabb16 drivers/lguest/Kconfig
> --- a/drivers/lguest/KconfigMon Jul 30 09:47:25 2007 +1000
> +++ b/drivers/lguest/KconfigMon Jul 30 10:03:39 2007 +1000
> @@ -1,6 +1,6 @@ config LGUEST
>  config LGUEST
> tristate "Linux hypervisor example code"
> -   depends on X86 && PARAVIRT && NET && EXPERIMENTAL && !X86_PAE
> +   depends on X86 && PARAVIRT && EXPERIMENTAL && !X86_PAE
> select LGUEST_GUEST
> select HVC_DRIVER
> ---help---
> @@ -18,3 +18,11 @@ config LGUEST_GUEST
>   The guest needs code built-in, even if the host has lguest
>   support as a module.  The drivers are tiny, so we build them
>   in too.
> +
> +config LGUEST_NET
> +   tristate
> +   depends on LGUEST_GUEST && NET

default y ?

> +
> +config LGUEST_BLOCK
> +   tristate
> +   depends on LGUEST_GUEST && BLOCK

default y ?

I /think/ the default "y" would also automatically become "m" if any of the
dependencies are modules ... probably need to test this, though.


Satyam
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sb1000: prevent a potential NULL pointer dereference in sb1000_dev_ioctl()

2007-07-29 Thread Satyam Sharma
Hi Michael,


On Sun, 29 Jul 2007, Michael Buesch wrote:

> On Sunday 29 July 2007 20:34:46 Satyam Sharma wrote:
> > (2) !(dev->flags & IFF_UP) is bogus because the functions of this ioctl
> > can (and should) be allowed even when the interface is not up and running.
> 
> Are you _sure_? This function does poke with the device hardware.
> It might return crap or even machinecheck when not initialized.
> Hardware is probably powered down, if not IFF_UP. (I don't know if that's
> the case here, though).

IFF_UP checks if the _interface_ is up -- the hardware / card could still
be powered up, but the interface down (ifconfing eth0 down or ip link set
eth0 down).

Probably what we want here is netif_device_present()? -- I think that
should return true only when the *device* itself is up (as in powered)
but the interface itself could be down ...

Let's wait for comments from the netdev people Cc:'ed here, in that case.


> >  drivers/net/sb1000.c |3 ---
> >  1 files changed, 0 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/net/sb1000.c b/drivers/net/sb1000.c
> > index 1de3eec..f60fe98 100644
> > --- a/drivers/net/sb1000.c
> > +++ b/drivers/net/sb1000.c
> > @@ -993,9 +993,6 @@ static int sb1000_dev_ioctl(struct net_device *dev, 
> > struct ifreq *ifr, int cmd)
> > unsigned int stats[5];
> > struct sb1000_private *lp = netdev_priv(dev);
> >  
> > -   if (!(dev && dev->flags & IFF_UP))
> > -   return -ENODEV;
> > -


Satyam
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ck] Re: Linus 2.6.23-rc1

2007-07-29 Thread Satyam Sharma
Hi Martin,


On Sun, 29 Jul 2007, Martin Steigerwald wrote:

> Am Sonntag 29 Juli 2007 schrieb Sam Ravnborg:
> > On Sun, Jul 29, 2007 at 12:56:28PM +0200, Martin Steigerwald wrote:
> > > Am Sonntag 29 Juli 2007 schrieb Sam Ravnborg:
> > > > > I
> > > > > actually also think that the communication between Ingo and Con
> > > > > could have been better especially when Ingo decided to write CFS
> > > > > while Con was still working hard on SD.
> > > >
> > > > You realize that Ingo posted his code for anyone to look at/comment
> > > > at about 48 hours after he started to work on CFS?
> > >
> > > Yes.
> >
> > So whats wrong then?
> > Ingo decides to do a better scheduler - to some extent inspired by
> > Con's work. And after 48 hours he publish first version that _anyone_
> > can see and comment on. Whats wrong with that?
> >
> > Did you expect some lengthy discussion before the coding phase started
> > or what?
> >
> > Just trying to understand what you are arguing about.
> 
> If I tried to rewrite a kernel subsystem - should I ever happen to dig 
> that deep into kernel matters - while I actually know that someone 
> already spent countless hours on exactly rewriting the exact same 
> subsystem, I think I would have told that other developer about it as 
> soon as I started coding on it. And if it just was a
> 
> "Hi Con,
> 
> I reconsidered the scheduling ideas again you brought to the Linux kernel 
> world. Instead of using your scheduler tough I like to try to write a new 
> one with fairness in mind, cause I think this, this and this can be 
> improved upon.
> 
> I would like to hear your ideas about that as soon as possible and would 
> like you to contribute your ideas and also code, where you see hit. You 
> can find the git / bazaar / whatever repository where I do my 
> developments at: someurl.
> 
> Regards, Ingo"

Sending out the code (and early enough, 48 hours /is/ early enough) *is*
the normal (and good) way to do exactly the communication you described
above, IMHO.

> I believe that Ingo did not meant any bad at all. I think its just the way 
> he works, he likes to have code before saying anything. But still I 
> believe before I'd go about replacing someone else code completely I 
> would inform that developer beforehand, even if it then turns out not to 
> be feasible at all. No need to anounce it to the world already, but I 
> would have informed that developer.

IMHO, what you're suggesting is: (1) not the way development normally
happens in Linux, and, (2) not the way it /should/ happen, either. If
there's something (subsystem, any code big or small) I think I can do
better or have an idea for, I /should/ be able to just hack on it a bit
and then send it out so everybody can comment on it. Why should I be
forced to dance/discuss around with some other people, when the faster
and more effective way would be just present the code/patch that I
have in my mind as an RFC?

See, Martin, in the end it's not about developer A vs developer B. It's
about making the kernel the best out there -- that's what the users want,
anyway. Yes, I fully understand (and have said so earlier myself) that
there's a very important "social" aspect to development on such projects,
but it's best if developer B understands that this is the way things do
(and should) happen and adapt to it. [ It's not like I've been around for
long, however, but the above is still my opinion, fwiw. ]

Satyam
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sb1000: prevent a potential NULL pointer dereference in sb1000_dev_ioctl()

2007-07-29 Thread Satyam Sharma


> On Sun, 29 Jul 2007, Domen Puncer wrote:
> 
> > On 29/07/07 00:02 +0200, Jesper Juhl wrote:
> > > Hi,
> > > 
> > > Here's a small patch, prompted by a find by the Coverity checker, 
> > > that removes a potential NULL pointer dereference from 
> > > drivers/net/sb1000.c::sb1000_dev_ioctl().
> > > The checker spotted that we do a NULL test of 'dev', yet we 
> > > dereference the pointer prior to that check.
> > > This patch simply moves the dereference after the NULL test.
> > 
> > But... it can't be called without a valid 'dev', no?
> > A quick 'grep do_ioctl net/' confirms that all calls are in
> > the form of 'dev->do_ioctl(dev, ...'.
> 
> Yup, I think so too ...
> 
> 
> > > @@ -991,11 +991,13 @@ static int sb1000_dev_ioctl(struct net_device *dev, 
> > > struct ifreq *ifr, int cmd)
> > >   short PID[4];
> > >   int ioaddr[2], status, frequency;
> > >   unsigned int stats[5];
> > > - struct sb1000_private *lp = netdev_priv(dev);
> > > + struct sb1000_private *lp;
> > >  
> > >   if (!(dev && dev->flags & IFF_UP))
> > >   return -ENODEV;
> 
> I think we could get rid of the !dev check itself. Actually, the IFF_UP
> check /also/ looks suspect to me for two reasons: (1) I remember Stephen
> Hemminger once telling me dev->flags is legacy and unsafe, and one of
> the netif_xxx() functions be used instead, and, (2) I wonder if we really
> require the interface to be up and *running* when we do this ioctl.

Updated patch below.

[PATCH] sb1000: Remove bogus checks

In net_device->do_ioctl() of the sb1000 driver (sb1000_dev_ioctl):

(1) !dev condition is always false -- this function cannot be called with
NULL net_device.
(2) !(dev->flags & IFF_UP) is bogus because the functions of this ioctl
can (and should) be allowed even when the interface is not up and running.

So let's remove these checks.

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>
Cc: Jesper Juhl <[EMAIL PROTECTED]>
Cc: Domen Puncer <[EMAIL PROTECTED]>

---

 drivers/net/sb1000.c |3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/drivers/net/sb1000.c b/drivers/net/sb1000.c
index 1de3eec..f60fe98 100644
--- a/drivers/net/sb1000.c
+++ b/drivers/net/sb1000.c
@@ -993,9 +993,6 @@ static int sb1000_dev_ioctl(struct net_device *dev, struct 
ifreq *ifr, int cmd)
unsigned int stats[5];
struct sb1000_private *lp = netdev_priv(dev);
 
-   if (!(dev && dev->flags & IFF_UP))
-   return -ENODEV;
-
ioaddr[0] = dev->base_addr;
/* mem_start holds the second I/O address */
ioaddr[1] = dev->mem_start;
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-usb-devel] [PATCH] USB Pegasus driver - avoid a potential NULL pointer dereference.

2007-07-29 Thread Satyam Sharma


On Sun, 29 Jul 2007, Oliver Neukum wrote:

> Am Sonntag 29 Juli 2007 schrieb Jesper Juhl:
> > On 29/07/07, Satyam Sharma <[EMAIL PROTECTED]> wrote:
> > > Hi,
> > >
> > > On 7/29/07, Jesper Juhl <[EMAIL PROTECTED]> wrote:
> > > > Hello,
> > > >
> > > > This patch makes sure we don't dereference a NULL pointer in
> > > > drivers/net/usb/pegasus.c::write_bulk_callback() in the initial
> > > > struct net_device *net = pegasus->net; assignment.
> > > > The existing code checks if 'pegasus' is NULL and bails out if
> > > > it is, so we better not touch that pointer until after that check.
> > > > [...]
> > >
> > > Is it really possible that we're called into this function with
> > > urb->context == NULL? If not, I'd suggest let's just get rid of
> > > the check itself, instead.
> > >
> > I'm not sure. I am not very familiar with this code. I just figured
> > that moving the assignment is potentially a little safer and it is
> > certainly no worse than the current code, so that's a safe and
> > potentially benneficial change. Removing the check may be safe but I'm
> > not certain enough to make that call...
> 
> pegasus == NULL there would be a kernel bug. Silently ignoring
> it, like the code now wants to do is bad. As the oops has never been
> reported, I figure turning it into an explicit debugging test is overkill,
> so removal seems to be the best option.

Ok, thanks. Updated patch below.

[PATCH] pegasus: Remove bogus checks in urb->complete() callbacks

urb->complete() callbacks registered in drivers/net/usb/pegasus.c needlessly
check for urb->context != NULL, but that is not possible (the only way that
can be possible would be a kernel bug elsewhere, and these checks would
simply end up hiding that). So let's remove the bogus checks.

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>

---

 drivers/net/usb/pegasus.c |9 +
 1 files changed, 1 insertions(+), 8 deletions(-)

diff --git a/drivers/net/usb/pegasus.c b/drivers/net/usb/pegasus.c
index a05fd97..439ef9f 100644
--- a/drivers/net/usb/pegasus.c
+++ b/drivers/net/usb/pegasus.c
@@ -770,9 +770,6 @@ static void write_bulk_callback(struct urb *urb)
pegasus_t *pegasus = urb->context;
struct net_device *net = pegasus->net;
 
-   if (!pegasus)
-   return;
-
if (!netif_device_present(net) || !netif_running(net))
return;
 
@@ -805,13 +802,9 @@ static void write_bulk_callback(struct urb *urb)
 static void intr_callback(struct urb *urb)
 {
pegasus_t *pegasus = urb->context;
-   struct net_device *net;
+   struct net_device *net = pegasus->net;
int status;
 
-   if (!pegasus)
-   return;
-   net = pegasus->net;
-
switch (urb->status) {
case 0:
break;
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-usb-devel] [PATCH] USB Pegasus driver - avoid a potential NULL pointer dereference.

2007-07-29 Thread Satyam Sharma


On Sun, 29 Jul 2007, Oliver Neukum wrote:

 Am Sonntag 29 Juli 2007 schrieb Jesper Juhl:
  On 29/07/07, Satyam Sharma [EMAIL PROTECTED] wrote:
   Hi,
  
   On 7/29/07, Jesper Juhl [EMAIL PROTECTED] wrote:
Hello,
   
This patch makes sure we don't dereference a NULL pointer in
drivers/net/usb/pegasus.c::write_bulk_callback() in the initial
struct net_device *net = pegasus-net; assignment.
The existing code checks if 'pegasus' is NULL and bails out if
it is, so we better not touch that pointer until after that check.
[...]
  
   Is it really possible that we're called into this function with
   urb-context == NULL? If not, I'd suggest let's just get rid of
   the check itself, instead.
  
  I'm not sure. I am not very familiar with this code. I just figured
  that moving the assignment is potentially a little safer and it is
  certainly no worse than the current code, so that's a safe and
  potentially benneficial change. Removing the check may be safe but I'm
  not certain enough to make that call...
 
 pegasus == NULL there would be a kernel bug. Silently ignoring
 it, like the code now wants to do is bad. As the oops has never been
 reported, I figure turning it into an explicit debugging test is overkill,
 so removal seems to be the best option.

Ok, thanks. Updated patch below.

[PATCH] pegasus: Remove bogus checks in urb-complete() callbacks

urb-complete() callbacks registered in drivers/net/usb/pegasus.c needlessly
check for urb-context != NULL, but that is not possible (the only way that
can be possible would be a kernel bug elsewhere, and these checks would
simply end up hiding that). So let's remove the bogus checks.

Signed-off-by: Satyam Sharma [EMAIL PROTECTED]

---

 drivers/net/usb/pegasus.c |9 +
 1 files changed, 1 insertions(+), 8 deletions(-)

diff --git a/drivers/net/usb/pegasus.c b/drivers/net/usb/pegasus.c
index a05fd97..439ef9f 100644
--- a/drivers/net/usb/pegasus.c
+++ b/drivers/net/usb/pegasus.c
@@ -770,9 +770,6 @@ static void write_bulk_callback(struct urb *urb)
pegasus_t *pegasus = urb-context;
struct net_device *net = pegasus-net;
 
-   if (!pegasus)
-   return;
-
if (!netif_device_present(net) || !netif_running(net))
return;
 
@@ -805,13 +802,9 @@ static void write_bulk_callback(struct urb *urb)
 static void intr_callback(struct urb *urb)
 {
pegasus_t *pegasus = urb-context;
-   struct net_device *net;
+   struct net_device *net = pegasus-net;
int status;
 
-   if (!pegasus)
-   return;
-   net = pegasus-net;
-
switch (urb-status) {
case 0:
break;
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sb1000: prevent a potential NULL pointer dereference in sb1000_dev_ioctl()

2007-07-29 Thread Satyam Sharma


 On Sun, 29 Jul 2007, Domen Puncer wrote:
 
  On 29/07/07 00:02 +0200, Jesper Juhl wrote:
   Hi,
   
   Here's a small patch, prompted by a find by the Coverity checker, 
   that removes a potential NULL pointer dereference from 
   drivers/net/sb1000.c::sb1000_dev_ioctl().
   The checker spotted that we do a NULL test of 'dev', yet we 
   dereference the pointer prior to that check.
   This patch simply moves the dereference after the NULL test.
  
  But... it can't be called without a valid 'dev', no?
  A quick 'grep do_ioctl net/' confirms that all calls are in
  the form of 'dev-do_ioctl(dev, ...'.
 
 Yup, I think so too ...
 
 
   @@ -991,11 +991,13 @@ static int sb1000_dev_ioctl(struct net_device *dev, 
   struct ifreq *ifr, int cmd)
 short PID[4];
 int ioaddr[2], status, frequency;
 unsigned int stats[5];
   - struct sb1000_private *lp = netdev_priv(dev);
   + struct sb1000_private *lp;

 if (!(dev  dev-flags  IFF_UP))
 return -ENODEV;
 
 I think we could get rid of the !dev check itself. Actually, the IFF_UP
 check /also/ looks suspect to me for two reasons: (1) I remember Stephen
 Hemminger once telling me dev-flags is legacy and unsafe, and one of
 the netif_xxx() functions be used instead, and, (2) I wonder if we really
 require the interface to be up and *running* when we do this ioctl.

Updated patch below.

[PATCH] sb1000: Remove bogus checks

In net_device-do_ioctl() of the sb1000 driver (sb1000_dev_ioctl):

(1) !dev condition is always false -- this function cannot be called with
NULL net_device.
(2) !(dev-flags  IFF_UP) is bogus because the functions of this ioctl
can (and should) be allowed even when the interface is not up and running.

So let's remove these checks.

Signed-off-by: Satyam Sharma [EMAIL PROTECTED]
Cc: Jesper Juhl [EMAIL PROTECTED]
Cc: Domen Puncer [EMAIL PROTECTED]

---

 drivers/net/sb1000.c |3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/drivers/net/sb1000.c b/drivers/net/sb1000.c
index 1de3eec..f60fe98 100644
--- a/drivers/net/sb1000.c
+++ b/drivers/net/sb1000.c
@@ -993,9 +993,6 @@ static int sb1000_dev_ioctl(struct net_device *dev, struct 
ifreq *ifr, int cmd)
unsigned int stats[5];
struct sb1000_private *lp = netdev_priv(dev);
 
-   if (!(dev  dev-flags  IFF_UP))
-   return -ENODEV;
-
ioaddr[0] = dev-base_addr;
/* mem_start holds the second I/O address */
ioaddr[1] = dev-mem_start;
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ck] Re: Linus 2.6.23-rc1

2007-07-29 Thread Satyam Sharma
Hi Martin,


On Sun, 29 Jul 2007, Martin Steigerwald wrote:

 Am Sonntag 29 Juli 2007 schrieb Sam Ravnborg:
  On Sun, Jul 29, 2007 at 12:56:28PM +0200, Martin Steigerwald wrote:
   Am Sonntag 29 Juli 2007 schrieb Sam Ravnborg:
 I
 actually also think that the communication between Ingo and Con
 could have been better especially when Ingo decided to write CFS
 while Con was still working hard on SD.
   
You realize that Ingo posted his code for anyone to look at/comment
at about 48 hours after he started to work on CFS?
  
   Yes.
 
  So whats wrong then?
  Ingo decides to do a better scheduler - to some extent inspired by
  Con's work. And after 48 hours he publish first version that _anyone_
  can see and comment on. Whats wrong with that?
 
  Did you expect some lengthy discussion before the coding phase started
  or what?
 
  Just trying to understand what you are arguing about.
 
 If I tried to rewrite a kernel subsystem - should I ever happen to dig 
 that deep into kernel matters - while I actually know that someone 
 already spent countless hours on exactly rewriting the exact same 
 subsystem, I think I would have told that other developer about it as 
 soon as I started coding on it. And if it just was a
 
 Hi Con,
 
 I reconsidered the scheduling ideas again you brought to the Linux kernel 
 world. Instead of using your scheduler tough I like to try to write a new 
 one with fairness in mind, cause I think this, this and this can be 
 improved upon.
 
 I would like to hear your ideas about that as soon as possible and would 
 like you to contribute your ideas and also code, where you see hit. You 
 can find the git / bazaar / whatever repository where I do my 
 developments at: someurl.
 
 Regards, Ingo

Sending out the code (and early enough, 48 hours /is/ early enough) *is*
the normal (and good) way to do exactly the communication you described
above, IMHO.

 I believe that Ingo did not meant any bad at all. I think its just the way 
 he works, he likes to have code before saying anything. But still I 
 believe before I'd go about replacing someone else code completely I 
 would inform that developer beforehand, even if it then turns out not to 
 be feasible at all. No need to anounce it to the world already, but I 
 would have informed that developer.

IMHO, what you're suggesting is: (1) not the way development normally
happens in Linux, and, (2) not the way it /should/ happen, either. If
there's something (subsystem, any code big or small) I think I can do
better or have an idea for, I /should/ be able to just hack on it a bit
and then send it out so everybody can comment on it. Why should I be
forced to dance/discuss around with some other people, when the faster
and more effective way would be just present the code/patch that I
have in my mind as an RFC?

See, Martin, in the end it's not about developer A vs developer B. It's
about making the kernel the best out there -- that's what the users want,
anyway. Yes, I fully understand (and have said so earlier myself) that
there's a very important social aspect to development on such projects,
but it's best if developer B understands that this is the way things do
(and should) happen and adapt to it. [ It's not like I've been around for
long, however, but the above is still my opinion, fwiw. ]

Satyam
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sb1000: prevent a potential NULL pointer dereference in sb1000_dev_ioctl()

2007-07-29 Thread Satyam Sharma
Hi Michael,


On Sun, 29 Jul 2007, Michael Buesch wrote:

 On Sunday 29 July 2007 20:34:46 Satyam Sharma wrote:
  (2) !(dev-flags  IFF_UP) is bogus because the functions of this ioctl
  can (and should) be allowed even when the interface is not up and running.
 
 Are you _sure_? This function does poke with the device hardware.
 It might return crap or even machinecheck when not initialized.
 Hardware is probably powered down, if not IFF_UP. (I don't know if that's
 the case here, though).

IFF_UP checks if the _interface_ is up -- the hardware / card could still
be powered up, but the interface down (ifconfing eth0 down or ip link set
eth0 down).

Probably what we want here is netif_device_present()? -- I think that
should return true only when the *device* itself is up (as in powered)
but the interface itself could be down ...

Let's wait for comments from the netdev people Cc:'ed here, in that case.


   drivers/net/sb1000.c |3 ---
   1 files changed, 0 insertions(+), 3 deletions(-)
  
  diff --git a/drivers/net/sb1000.c b/drivers/net/sb1000.c
  index 1de3eec..f60fe98 100644
  --- a/drivers/net/sb1000.c
  +++ b/drivers/net/sb1000.c
  @@ -993,9 +993,6 @@ static int sb1000_dev_ioctl(struct net_device *dev, 
  struct ifreq *ifr, int cmd)
  unsigned int stats[5];
  struct sb1000_private *lp = netdev_priv(dev);
   
  -   if (!(dev  dev-flags  IFF_UP))
  -   return -ENODEV;
  -


Satyam
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Make lguest compile with CONFIG_BLOCK=n and CONFIG_NET=n

2007-07-29 Thread Satyam Sharma
On 7/30/07, Rusty Russell [EMAIL PROTECTED] wrote:
 [...]
 Gabriel C reports lguest doesn't compile with CONFIG_BLOCK=n.  Fix
 this by introducing a config var for the block device, which depends
 on LGUEST  BLOCK.  Do the same for the net driver, rather then
 depending gratuitously on CONFIG_NET.
 [...]
 diff -r 9e987fcabb16 drivers/lguest/Kconfig
 --- a/drivers/lguest/KconfigMon Jul 30 09:47:25 2007 +1000
 +++ b/drivers/lguest/KconfigMon Jul 30 10:03:39 2007 +1000
 @@ -1,6 +1,6 @@ config LGUEST
  config LGUEST
 tristate Linux hypervisor example code
 -   depends on X86  PARAVIRT  NET  EXPERIMENTAL  !X86_PAE
 +   depends on X86  PARAVIRT  EXPERIMENTAL  !X86_PAE
 select LGUEST_GUEST
 select HVC_DRIVER
 ---help---
 @@ -18,3 +18,11 @@ config LGUEST_GUEST
   The guest needs code built-in, even if the host has lguest
   support as a module.  The drivers are tiny, so we build them
   in too.
 +
 +config LGUEST_NET
 +   tristate
 +   depends on LGUEST_GUEST  NET

default y ?

 +
 +config LGUEST_BLOCK
 +   tristate
 +   depends on LGUEST_GUEST  BLOCK

default y ?

I /think/ the default y would also automatically become m if any of the
dependencies are modules ... probably need to test this, though.


Satyam
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v3 -mm 0/9] netconsole: Multiple targets and dynamic reconfigurability

2007-07-29 Thread Satyam Sharma
[0/9] netconsole: Multiple targets and dynamic reconfigurability

This is v3 of the patchset, the previous versions are available at:
http://lkml.org/lkml/2007/7/4/107
http://lkml.org/lkml/2007/7/10/78

Diffed against 2.6.23-rc1-git6 (6c8dca5d as of writing), but applies
successfully to 2.6.23-rc1-mm1 as well.

Patches 1-5 are okay-to-apply to current mainline -git, I think.

Patches 6-9 introduce the new functionality and are requested for
inclusion in -mm.

[1/9] netconsole: Cleanups, codingstyle, prettyfication
[2/9] netconsole: Remove bogus check
[3/9] netconsole: Simplify boot/module option setup logic
[4/9] netconsole: Use netif_running() in write_msg()
[5/9] netconsole: Add some useful tips to documentation
[6/9] netconsole: Introduce netconsole_target
[7/9] netconsole: Introduce netconsole_netdev_notifier
[8/9] netconsole: Support multiple logging targets
[9/9] netconsole: Support dynamic reconfiguration using configfs

Changes since v2:
=

* enabled must be defined outside #ifdef NETCONSOLE_DYNAMIC

* Some simple enhancements to documentation

* Drop the use of unlikely from a condition where I'd got the
  common case wrong

About this patchset:


What?

Support multiple remote logging targets in netconsole. Also, ability to
dynamically add or remove targets or modify parameters (IP, port, remote
MAC address) of targets at runtime, from userspace, using configfs.

Why?

From Keiichi Kii's original post:

[...] current netconsole is not flexible. For example, if you want to change
ip address for logging agent, in the case of built-in netconsole, you can't
change config except for changing boot parameter and rebooting your system,
or in the case of module netconsole, you need to remove it and reload
with different parameters. [...] and we have been losing serial console
port with PCs and Servers.

(... and especially laptops, I would add.)


Satyam
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v3 -mm 1/9] netconsole: Cleanups, codingstyle, prettyfication

2007-07-29 Thread Satyam Sharma
From: Satyam Sharma [EMAIL PROTECTED]

[1/9] netconsole: Cleanups, codingstyle, prettyfication

(1) Remove unwanted headers.
(2) Mark __init and __exit as appropriate.
(3) Various trivial codingstyle and prettification stuff.

Signed-off-by: Satyam Sharma [EMAIL PROTECTED]
Signed-off-by: Keiichi Kii [EMAIL PROTECTED]
Acked-by: Matt Mackall [EMAIL PROTECTED]

---

 drivers/net/netconsole.c |   55 ++---
 1 files changed, 27 insertions(+), 28 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index 69233f6..f1c2a2d 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -35,35 +35,32 @@
  /
 
 #include linux/mm.h
-#include linux/tty.h
 #include linux/init.h
 #include linux/module.h
 #include linux/console.h
-#include linux/tty_driver.h
 #include linux/moduleparam.h
 #include linux/string.h
-#include linux/sysrq.h
-#include linux/smp.h
 #include linux/netpoll.h
 
 MODULE_AUTHOR(Maintainer: Matt Mackall [EMAIL PROTECTED]);
 MODULE_DESCRIPTION(Console driver for network interfaces);
 MODULE_LICENSE(GPL);
 
-static char config[256];
-module_param_string(netconsole, config, 256, 0);
+#define MAX_PARAM_LENGTH   256
+#define MAX_PRINT_CHUNK1000
+
+static char config[MAX_PARAM_LENGTH];
+module_param_string(netconsole, config, MAX_PARAM_LENGTH, 0);
 MODULE_PARM_DESC(netconsole,  [EMAIL 
PROTECTED]/[dev],[tgt-port]@tgt-ip/[tgt-macaddr]\n);
 
 static struct netpoll np = {
-   .name = netconsole,
-   .dev_name = eth0,
-   .local_port = 6665,
-   .remote_port = ,
-   .remote_mac = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
+   .name   = netconsole,
+   .dev_name   = eth0,
+   .local_port = 6665,
+   .remote_port= ,
+   .remote_mac = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
 };
-static int configured = 0;
-
-#define MAX_PRINT_CHUNK 1000
+static int configured;
 
 static void write_msg(struct console *con, const char *msg, unsigned int len)
 {
@@ -75,7 +72,7 @@ static void write_msg(struct console *con, const char *msg, 
unsigned int len)
 
local_irq_save(flags);
 
-   for(left = len; left; ) {
+   for (left = len; left;) {
frag = min(left, MAX_PRINT_CHUNK);
netpoll_send_udp(np, msg, frag);
msg += frag;
@@ -86,12 +83,12 @@ static void write_msg(struct console *con, const char *msg, 
unsigned int len)
 }
 
 static struct console netconsole = {
-   .name = netcon,
-   .flags = CON_ENABLED | CON_PRINTBUFFER,
-   .write = write_msg
+   .name   = netcon,
+   .flags  = CON_ENABLED | CON_PRINTBUFFER,
+   .write  = write_msg,
 };
 
-static int option_setup(char *opt)
+static int __init option_setup(char *opt)
 {
configured = !netpoll_parse_options(np, opt);
return 1;
@@ -99,28 +96,30 @@ static int option_setup(char *opt)
 
 __setup(netconsole=, option_setup);
 
-static int init_netconsole(void)
+static int __init init_netconsole(void)
 {
-   int err;
+   int err = 0;
 
-   if(strlen(config))
+   if (strnlen(config, MAX_PARAM_LENGTH))
option_setup(config);
 
-   if(!configured) {
-   printk(netconsole: not configured, aborting\n);
-   return 0;
+   if (!configured) {
+   printk(KERN_INFO netconsole: not configured, aborting\n);
+   goto out;
}
 
err = netpoll_setup(np);
if (err)
-   return err;
+   goto out;
 
register_console(netconsole);
printk(KERN_INFO netconsole: network logging started\n);
-   return 0;
+
+out:
+   return err;
 }
 
-static void cleanup_netconsole(void)
+static void __exit cleanup_netconsole(void)
 {
unregister_console(netconsole);
netpoll_cleanup(np);
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v3 -mm 2/9] netconsole: Remove bogus check

2007-07-29 Thread Satyam Sharma
From: Satyam Sharma [EMAIL PROTECTED]

[2/9] netconsole: Remove bogus check

The (!np.dev) check in write_msg() is bogus (always false), because:
np.dev is set by netpoll_setup(), which is called by init_netconsole()
before register_console(), so write_msg() cannot be triggered unless
netpoll_setup() successfully set np.dev. Also np.dev cannot go away
from under us, because netpoll_setup() grabs us reference on it. So
let's remove the bogus check.

Signed-off-by: Satyam Sharma [EMAIL PROTECTED]
Acked-by: Keiichi Kii [EMAIL PROTECTED]
Acked-by: Matt Mackall [EMAIL PROTECTED]

---

 drivers/net/netconsole.c |3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index f1c2a2d..2c2aef1 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -67,9 +67,6 @@ static void write_msg(struct console *con, const char *msg, 
unsigned int len)
int frag, left;
unsigned long flags;
 
-   if (!np.dev)
-   return;
-
local_irq_save(flags);
 
for (left = len; left;) {
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v3 -mm 3/9] netconsole: Simplify boot/module option setup logic

2007-07-29 Thread Satyam Sharma
From: Satyam Sharma [EMAIL PROTECTED]

[3/9] netconsole: Simplify boot/module option setup logic

Presently, boot/module parameters are set up quite differently for the
case of built-in netconsole (__setup() - obsolete_checksetup() -
netpoll_parse_options() - strlen(config) == 0 in init_netconsole())
vs modular netconsole (module_param_string() - string copied to the
config variable - strlen(config) != 0 init_netconsole() -
netpoll_parse_options()).

This patch makes both of them similar by doing exactly the equivalent
of a module_param_string() in option_setup() also -- just copying the
param string passed from the kernel command line into config variable.
So, strlen(config) != 0 in both cases, and netpoll_parse_options() is
always called from init_netconsole(), thus making the setup logic for
both cases similar.

Now, option_setup() is only ever called / used for the built-in case,
so we put it inside a #ifndef MODULE, otherwise gcc will complain about
option_setup() being defined but not used. Also, the configured
variable is redundant with this patch and hence removed.

Signed-off-by: Satyam Sharma [EMAIL PROTECTED]
Signed-off-by: Keiichi Kii [EMAIL PROTECTED]
Acked-by: Matt Mackall [EMAIL PROTECTED]

---

 drivers/net/netconsole.c |   27 ++-
 1 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index 2c2aef1..e56aa6c 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -53,6 +53,15 @@ static char config[MAX_PARAM_LENGTH];
 module_param_string(netconsole, config, MAX_PARAM_LENGTH, 0);
 MODULE_PARM_DESC(netconsole,  [EMAIL 
PROTECTED]/[dev],[tgt-port]@tgt-ip/[tgt-macaddr]\n);
 
+#ifndefMODULE
+static int __init option_setup(char *opt)
+{
+   strlcpy(config, opt, MAX_PARAM_LENGTH);
+   return 1;
+}
+__setup(netconsole=, option_setup);
+#endif /* MODULE */
+
 static struct netpoll np = {
.name   = netconsole,
.dev_name   = eth0,
@@ -60,7 +69,6 @@ static struct netpoll np = {
.remote_port= ,
.remote_mac = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
 };
-static int configured;
 
 static void write_msg(struct console *con, const char *msg, unsigned int len)
 {
@@ -85,26 +93,19 @@ static struct console netconsole = {
.write  = write_msg,
 };
 
-static int __init option_setup(char *opt)
-{
-   configured = !netpoll_parse_options(np, opt);
-   return 1;
-}
-
-__setup(netconsole=, option_setup);
-
 static int __init init_netconsole(void)
 {
int err = 0;
 
-   if (strnlen(config, MAX_PARAM_LENGTH))
-   option_setup(config);
-
-   if (!configured) {
+   if (!strnlen(config, MAX_PARAM_LENGTH)) {
printk(KERN_INFO netconsole: not configured, aborting\n);
goto out;
}
 
+   err = netpoll_parse_options(np, config);
+   if (err)
+   goto out;
+
err = netpoll_setup(np);
if (err)
goto out;
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v3 -mm 4/9] netconsole: Use netif_running() in write_msg()

2007-07-29 Thread Satyam Sharma
From: Satyam Sharma [EMAIL PROTECTED]

[4/9] netconsole: Use netif_running() in write_msg()

Avoid unnecessarily disabling interrupts and calling netpoll_send_udp()
if the corresponding local interface is not up.

Signed-off-by: Satyam Sharma [EMAIL PROTECTED]
Acked-by: Keiichi Kii [EMAIL PROTECTED]
Cc: Matt Mackall [EMAIL PROTECTED]

---

 drivers/net/netconsole.c |   18 +-
 1 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index e56aa6c..75cb761 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -75,16 +75,16 @@ static void write_msg(struct console *con, const char *msg, 
unsigned int len)
int frag, left;
unsigned long flags;
 
-   local_irq_save(flags);
-
-   for (left = len; left;) {
-   frag = min(left, MAX_PRINT_CHUNK);
-   netpoll_send_udp(np, msg, frag);
-   msg += frag;
-   left -= frag;
+   if (netif_running(np.dev)) {
+   local_irq_save(flags);
+   for (left = len; left;) {
+   frag = min(left, MAX_PRINT_CHUNK);
+   netpoll_send_udp(np, msg, frag);
+   msg += frag;
+   left -= frag;
+   }
+   local_irq_restore(flags);
}
-
-   local_irq_restore(flags);
 }
 
 static struct console netconsole = {
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v3 -mm 5/9] netconsole: Add some useful tips to documentation

2007-07-29 Thread Satyam Sharma
From: Satyam Sharma [EMAIL PROTECTED]

[5/9] netconsole: Add some useful tips to documentation

Add some useful general-purpose tips. Also suggest solution for the frequent
problem of console loglevel set too low numerically (i.e. for high priority
messages only) on the sender.

Signed-off-by: Satyam Sharma [EMAIL PROTECTED]
Acked-by: Keiichi Kii [EMAIL PROTECTED]
Acked-by: Matt Mackall [EMAIL PROTECTED]

---

 Documentation/networking/netconsole.txt |   25 +
 1 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/Documentation/networking/netconsole.txt 
b/Documentation/networking/netconsole.txt
index 1caa6c7..5962f45 100644
--- a/Documentation/networking/netconsole.txt
+++ b/Documentation/networking/netconsole.txt
@@ -44,11 +44,36 @@ WARNING: the default target ethernet setting uses the 
broadcast
 ethernet address to send packets, which can cause increased load on
 other systems on the same ethernet segment.
 
+TIP: some LAN switches may be configured to suppress ethernet broadcasts
+so it is advised to explicitly specify the remote agents' MAC addresses
+from the config parameters passed to netconsole.
+
+TIP: to find out the MAC address of, say, 10.0.0.2, you may try using:
+
+ ping -c 1 10.0.0.2 ; /sbin/arp -n | grep 10.0.0.2
+
+TIP: in case the remote logging agent is on a separate LAN subnet than
+the sender, it is suggested to try specifying the MAC address of the
+default gateway (you may use /sbin/route -n to find it out) as the
+remote MAC address instead.
+
 NOTE: the network device (eth1 in the above case) can run any kind
 of other network traffic, netconsole is not intrusive. Netconsole
 might cause slight delays in other traffic if the volume of kernel
 messages is high, but should have no other impact.
 
+NOTE: if you find that the remote logging agent is not receiving or
+printing all messages from the sender, it is likely that you have set
+the console_loglevel parameter (on the sender) to only send high
+priority messages to the console. You can change this at runtime using:
+
+ dmesg -n 8
+
+or by specifying debug on the kernel command line at boot, to send
+all kernel messages to the console. A specific value for this parameter
+can also be set using the loglevel kernel boot option. See the
+dmesg(8) man page and Documentation/kernel-parameters.txt for details.
+
 Netconsole was designed to be as instantaneous as possible, to
 enable the logging of even the most critical kernel bugs. It works
 from IRQ contexts as well, and does not enable interrupts while
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v3 -mm 6/9] netconsole: Introduce netconsole_target

2007-07-29 Thread Satyam Sharma
From: Satyam Sharma [EMAIL PROTECTED]

[6/9] netconsole: Introduce netconsole_target

Introduce a wrapper structure over netpoll to represent logging targets
configured in netconsole. This will get extended with other members in
further patches.

This is done independent of the (to-be-introduced) NETCONSOLE_DYNAMIC
config option so that we're able to drastically cut down on the #ifdef
complexity of final netconsole.c. Also, struct netconsole_target would be
required for multiple targets support also, and not just dynamic
reconfigurability.

Signed-off-by: Satyam Sharma [EMAIL PROTECTED]
Signed-off-by: Keiichi Kii [EMAIL PROTECTED]

---

 drivers/net/netconsole.c |   36 +---
 1 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index 75cb761..be15ca6 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -62,24 +62,35 @@ static int __init option_setup(char *opt)
 __setup(netconsole=, option_setup);
 #endif /* MODULE */
 
-static struct netpoll np = {
-   .name   = netconsole,
-   .dev_name   = eth0,
-   .local_port = 6665,
-   .remote_port= ,
-   .remote_mac = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
+/**
+ * struct netconsole_target - Represents a configured netconsole target.
+ * @np:The netpoll structure for this target.
+ */
+struct netconsole_target {
+   struct netpoll  np;
+};
+
+static struct netconsole_target default_target = {
+   .np = {
+   .name   = netconsole,
+   .dev_name   = eth0,
+   .local_port = 6665,
+   .remote_port= ,
+   .remote_mac = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
+   },
 };
 
 static void write_msg(struct console *con, const char *msg, unsigned int len)
 {
int frag, left;
unsigned long flags;
+   struct netconsole_target *nt = default_target;
 
-   if (netif_running(np.dev)) {
+   if (netif_running(nt-np.dev)) {
local_irq_save(flags);
for (left = len; left;) {
frag = min(left, MAX_PRINT_CHUNK);
-   netpoll_send_udp(np, msg, frag);
+   netpoll_send_udp(nt-np, msg, frag);
msg += frag;
left -= frag;
}
@@ -96,17 +107,18 @@ static struct console netconsole = {
 static int __init init_netconsole(void)
 {
int err = 0;
+   struct netconsole_target *nt = default_target;
 
if (!strnlen(config, MAX_PARAM_LENGTH)) {
printk(KERN_INFO netconsole: not configured, aborting\n);
goto out;
}
 
-   err = netpoll_parse_options(np, config);
+   err = netpoll_parse_options(nt-np, config);
if (err)
goto out;
 
-   err = netpoll_setup(np);
+   err = netpoll_setup(nt-np);
if (err)
goto out;
 
@@ -119,8 +131,10 @@ out:
 
 static void __exit cleanup_netconsole(void)
 {
+   struct netconsole_target *nt = default_target;
+
unregister_console(netconsole);
-   netpoll_cleanup(np);
+   netpoll_cleanup(nt-np);
 }
 
 module_init(init_netconsole);
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v3 -mm 7/9] netconsole: Introduce netconsole_netdev_notifier

2007-07-29 Thread Satyam Sharma
From: Satyam Sharma [EMAIL PROTECTED]

[7/9] netconsole: Introduce netconsole_netdev_notifier

To update fields of underlying netpoll structure at runtime on
corresponding NETDEV_CHANGEADDR or NETDEV_CHANGENAME notifications.

ioctl(SIOCSIFHWADDR or SIOCSIFNAME) could be used to change the
hardware/MAC address or name of the local interface that our netpoll
is attached to. Whenever this happens, netdev notifier chain is called
out with the NETDEV_CHANGEADDR or NETDEV_CHANGENAME event message. We
respond to that and update the local_mac or dev_name field of the struct
netpoll. This makes sense anyway, but is especially required for dynamic
netconsole because the netpoll structure's internal members become user
visible files when either sysfs or configfs are used. So this helps us
to keep up with the MAC address/name changes and keep values in struct
netpoll uptodate.

[ Note that ioctl(SIOCSIFADDR) to change IP address of interface at
  runtime is not handled (to update local_ip of netpoll) on purpose --
  some setups may set the local_ip to a private address, not necessary
  the actual IP address of the sender host, as presently allowed. ]

Signed-off-by: Satyam Sharma [EMAIL PROTECTED]
Signed-off-by: Keiichi Kii [EMAIL PROTECTED]

---

 drivers/net/netconsole.c |   32 
 1 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index be15ca6..5557098 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -80,6 +80,33 @@ static struct netconsole_target default_target = {
},
 };
 
+/* Handle network interface device notifications */
+static int netconsole_netdev_event(struct notifier_block *this,
+  unsigned long event,
+  void *ptr)
+{
+   struct net_device *dev = ptr;
+   struct netconsole_target *nt = default_target;
+
+   if (nt-np.dev == dev) {
+   switch (event) {
+   case NETDEV_CHANGEADDR:
+   memcpy(nt-np.local_mac, dev-dev_addr, ETH_ALEN);
+   break;
+
+   case NETDEV_CHANGENAME:
+   strlcpy(nt-np.dev_name, dev-name, IFNAMSIZ);
+   break;
+   }
+   }
+
+   return NOTIFY_DONE;
+}
+
+static struct notifier_block netconsole_netdev_notifier = {
+   .notifier_call  = netconsole_netdev_event,
+};
+
 static void write_msg(struct console *con, const char *msg, unsigned int len)
 {
int frag, left;
@@ -122,6 +149,10 @@ static int __init init_netconsole(void)
if (err)
goto out;
 
+   err = register_netdevice_notifier(netconsole_netdev_notifier);
+   if (err)
+   goto out;
+
register_console(netconsole);
printk(KERN_INFO netconsole: network logging started\n);
 
@@ -134,6 +165,7 @@ static void __exit cleanup_netconsole(void)
struct netconsole_target *nt = default_target;
 
unregister_console(netconsole);
+   unregister_netdevice_notifier(netconsole_netdev_notifier);
netpoll_cleanup(nt-np);
 }
 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v3 -mm 8/9] netconsole: Support multiple logging targets

2007-07-29 Thread Satyam Sharma
From: Satyam Sharma [EMAIL PROTECTED]

[8/9] netconsole: Support multiple logging targets

This patch introduces support for multiple targets, independent of
CONFIG_NETCONSOLE_DYNAMIC -- this is useful even in the default case and
(including the infrastructure introduced in previous patches) doesn't
really add too many bytes to module text. All the complexity (and size)
comes with the dynamic reconfigurability / userspace interface patch,
and so it's plausible users may want to keep this enabled but that
disabled (say to avoid a dependency on CONFIG_CONFIGFS_FS too).

Also update documentation to mention the use of ; separator to specify
multiple logging targets in the boot/module option string.

Brief overview:

We maintain a target_list (and corresponding lock). Get rid of the static
default_target and introduce allocation and release functions for our
netconsole_target objects (but keeping sure to preserve previous behaviour
such as default values). During init_netconsole(), ; is used as the
separator to identify multiple target specifications in the boot/module
option string. The target specifications are parsed and netpolls setup.
During exit, the target_list is torn down and all items released.

Signed-off-by: Satyam Sharma [EMAIL PROTECTED]
Signed-off-by: Keiichi Kii [EMAIL PROTECTED]

---

 Documentation/networking/netconsole.txt |6 +
 drivers/net/netconsole.c|  171 +++---
 2 files changed, 137 insertions(+), 40 deletions(-)

diff --git a/Documentation/networking/netconsole.txt 
b/Documentation/networking/netconsole.txt
index 5962f45..1aaa738 100644
--- a/Documentation/networking/netconsole.txt
+++ b/Documentation/networking/netconsole.txt
@@ -34,6 +34,12 @@ Examples:
 
  insmod netconsole netconsole=@/,@10.0.0.2/
 
+It also supports logging to multiple remote agents by specifying
+parameters for the multiple agents separated by semicolons and the
+complete string enclosed in quotes, thusly:
+
+ modprobe netconsole netconsole=@/,@10.0.0.2/;@/eth1,[EMAIL PROTECTED]/
+
 Built-in netconsole starts immediately after the TCP stack is
 initialized and attempts to bring up the supplied dev at the supplied
 address.
diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index 5557098..458c4d6 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -62,44 +62,93 @@ static int __init option_setup(char *opt)
 __setup(netconsole=, option_setup);
 #endif /* MODULE */
 
+/* Linked list of all configured targets */
+static LIST_HEAD(target_list);
+
+/* This needs to be a spinlock because write_msg() cannot sleep */
+static DEFINE_SPINLOCK(target_list_lock);
+
 /**
  * struct netconsole_target - Represents a configured netconsole target.
+ * @list:  Links this target into the target_list.
  * @np:The netpoll structure for this target.
  */
 struct netconsole_target {
+   struct list_headlist;
struct netpoll  np;
 };
 
-static struct netconsole_target default_target = {
-   .np = {
-   .name   = netconsole,
-   .dev_name   = eth0,
-   .local_port = 6665,
-   .remote_port= ,
-   .remote_mac = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
-   },
-};
+/* Allocate new target and setup netpoll for it */
+static struct netconsole_target *alloc_target(char *target_config)
+{
+   int err = -ENOMEM;
+   struct netconsole_target *nt;
+
+   /* Allocate and initialize with defaults */
+   nt = kzalloc(sizeof(*nt), GFP_KERNEL);
+   if (!nt) {
+   printk(KERN_ERR netconsole: failed to allocate memory\n);
+   goto fail;
+   }
+
+   nt-np.name = netconsole;
+   strlcpy(nt-np.dev_name, eth0, IFNAMSIZ);
+   nt-np.local_port = 6665;
+   nt-np.remote_port = ;
+   memset(nt-np.remote_mac, 0xff, ETH_ALEN);
+
+   /* Parse parameters and setup netpoll */
+   err = netpoll_parse_options(nt-np, target_config);
+   if (err)
+   goto fail;
+
+   err = netpoll_setup(nt-np);
+   if (err)
+   goto fail;
+
+   return nt;
+
+fail:
+   kfree(nt);
+   return ERR_PTR(err);
+}
+
+/* Cleanup netpoll for given target and free it */
+static void free_target(struct netconsole_target *nt)
+{
+   netpoll_cleanup(nt-np);
+   kfree(nt);
+}
 
 /* Handle network interface device notifications */
 static int netconsole_netdev_event(struct notifier_block *this,
   unsigned long event,
   void *ptr)
 {
+   unsigned long flags;
+   struct netconsole_target *nt;
struct net_device *dev = ptr;
-   struct netconsole_target *nt = default_target;
 
-   if (nt-np.dev == dev) {
-   switch (event) {
-   case NETDEV_CHANGEADDR:
-   memcpy(nt-np.local_mac, dev-dev_addr, ETH_ALEN

[PATCH v3 -mm 9/9] netconsole: Support dynamic reconfiguration using configfs

2007-07-29 Thread Satyam Sharma
From: Satyam Sharma [EMAIL PROTECTED]

[9/9] netconsole: Support dynamic reconfiguration using configfs

This patch introduces support for dynamic reconfiguration (adding, removing
and/or modifying parameters of netconsole targets at runtime) using a
userspace interface exported via configfs. Documentation is also updated
accordingly.

Issues and brief design overview:

(1) Kernel-initiated creation / destruction of kernel objects is not
possible with configfs -- the lifetimes of the config items is managed
exclusively from userspace. But netconsole must support boot/module params
too, and these are parsed in kernel and hence netpolls must be setup from
the kernel. Joel Becker suggested to separately manage the lifetimes of
the two kinds of netconsole_target objects -- those created via configfs
mkdir(2) from userspace and those specified from the boot/module option
string. This adds complexity and some redundancy here and also means that
boot/module param-created targets are not exposed through the configfs
namespace (and hence cannot be updated / destroyed dynamically). However,
this saves us from locking / refcounting complexities that would need to
be introduced in configfs to support kernel-initiated item creation /
destroy there.

(2) In configfs, item creation takes place in the call chain of the mkdir(2)
syscall in the driver subsystem. If we used an ioctl(2) to create / destroy
objects from userspace, the special userspace program is able to fill out
the structure to be passed into the ioctl and hence specify attributes such
as local interface that are required at the time we set up the netpoll.
For configfs, this information is not available at the time of mkdir(2).
So, we keep all newly-created targets (via configfs) disabled by default.
The user is expected to set various attributes appropriately (including the
local network interface if required) and then write(2) 1 to the enabled
attribute. Thus, netpoll_setup() is then called on the set parameters in the
context of _this_ write(2) on the enabled attribute itself. This design
enables the user to reconfigure existing netconsole targets at runtime to
be attached to newly-come-up interfaces that may not have existed when
netconsole was loaded or when the targets were actually created. All this
effectively enables us to get rid of custom ioctls.

(3) Ultra-paranoid configfs attribute show() and store() operations, with
sanity and input range checking, using only safe string primitives, and
compliant with the recommendations in Documentation/filesystems/sysfs.txt.

(4) A new function netpoll_print_options() is created in the netpoll API,
that just prints out the configured parameters for a netpoll structure.
netpoll_parse_options() is modified to use that and it is also exported to
be used from netconsole.

Signed-off-by: Satyam Sharma [EMAIL PROTECTED]
Acked-by: Keiichi Kii [EMAIL PROTECTED]

---

 Documentation/networking/netconsole.txt |   68 
 drivers/net/Kconfig |   10 +
 drivers/net/netconsole.c|  605 +--
 include/linux/netpoll.h |1 +
 net/core/netpoll.c  |   44 ++-
 5 files changed, 683 insertions(+), 45 deletions(-)

diff --git a/Documentation/networking/netconsole.txt 
b/Documentation/networking/netconsole.txt
index 1aaa738..3c2f2b3 100644
--- a/Documentation/networking/netconsole.txt
+++ b/Documentation/networking/netconsole.txt
@@ -3,6 +3,10 @@ started by Ingo Molnar [EMAIL PROTECTED], 2001.09.17
 2.6 port and netpoll api by Matt Mackall [EMAIL PROTECTED], Sep 9 2003
 
 Please send bug reports to Matt Mackall [EMAIL PROTECTED]
+and Satyam Sharma [EMAIL PROTECTED]
+
+Introduction:
+=
 
 This module logs kernel printk messages over UDP allowing debugging of
 problem where disk logging fails and serial consoles are impractical.
@@ -13,6 +17,9 @@ the specified interface as soon as possible. While this 
doesn't allow
 capture of early kernel panics, it does capture most of the boot
 process.
 
+Sender and receiver configuration:
+==
+
 It takes a string configuration parameter netconsole in the
 following format:
 
@@ -46,6 +53,67 @@ address.
 
 The remote host can run either 'netcat -u -l -p port' or syslogd.
 
+Dynamic reconfiguration:
+
+
+Dynamic reconfigurability is a useful addition to netconsole that enables
+remote logging targets to be dynamically added, removed, or have their
+parameters reconfigured at runtime from a configfs-based userspace interface.
+[ Note that the parameters of netconsole targets that were specified/created
+from the boot/module option are not exposed via this interface, and hence
+cannot be modified dynamically. ]
+
+To include this feature, select CONFIG_NETCONSOLE_DYNAMIC when building the
+netconsole module (or kernel, if netconsole is built-in).
+
+Some examples follow (where configfs is mounted at the /sys/kernel/config

[PATCH][resend] Fix a typo in Documentation/keys.txt

2007-07-29 Thread Satyam Sharma
[PATCH] Fix a typo in Documentation/keys.txt

Signed-off-by: Satyam Sharma [EMAIL PROTECTED]
Acked-by: David Howells [EMAIL PROTECTED]

---

[ Previously sent on: June 9, 2007 ]

 Documentation/keys.txt |5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff -ruNp a/Documentation/keys.txt b/Documentation/keys.txt
--- a/Documentation/keys.txt2007-06-03 02:33:02.0 +0530
+++ b/Documentation/keys.txt2007-06-03 02:38:52.0 +0530
@@ -859,9 +859,8 @@ payload contents for more information.
void unregister_key_type(struct key_type *type);
 
 
-Under some circumstances, it may be desirable to desirable to deal with a
-bundle of keys.  The facility provides access to the keyring type for managing
-such a bundle:
+Under some circumstances, it may be desirable to deal with a bundle of keys.
+The facility provides access to the keyring type for managing such a bundle:
 
struct key_type key_type_keyring;
 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/5] Use mutex instead of semaphore in the Host AP driver

2007-07-29 Thread Satyam Sharma


On Mon, 30 Jul 2007, Michael Buesch wrote:

 On Sunday 29 July 2007 23:34, Matthias Kaehlcke wrote:
  The Host AP driver uses a semaphore as mutex. Use the mutex API
  instead of the (binary) semaphore.
  
  Signed-off-by: Matthias Kaehlcke [EMAIL PROTECTED]

[ Something seems to have gone wrong with your diff / patch / script.
  There was no diff header here, which should have been. ]

  -   res = down_interruptible(local-rid_bap_sem);
  +   res = mutex_lock_interruptible(local-rid_bap_mtx);
  if (res)
  return res;
   
  @@ -902,7 +902,7 @@ static int hfa384x_set_rid(struct net_device *dev, u16 
  rid, void *buf, int len)
  /* RID len in words and +1 for rec.rid */
  rec.len = cpu_to_le16(len / 2 + len % 2 + 1);
   
  -   res = down_interruptible(local-rid_bap_sem);
  +   res = mutex_lock_interruptible(local-rid_bap_mtx);
  if (res)
  return res;
   
 
 Is res returned to userspace? If yes, that's not right.

Yup, that's not right.

 On a interrupted mutex allocation you should return
 -ERESTARTSYS to userspace.

Nope, userspace must not see ERESTARTSYS (I /think/ this is the third
time I'm participating in this exact same discussion :-)

If the return would be caught by a previous in-kernel caller in the
call chain, ERESTARTSYS is okay and it could try to restart the
operation. However, if the return goes unfiltered directly to
userspace, EINTR is the correct choice.


Satyam
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/5] Use mutex instead of semaphore in the OnStream SCSI Tape driver

2007-07-29 Thread Satyam Sharma
Hi,


On Sun, 29 Jul 2007, Matthias Kaehlcke wrote:

 The OnStream SCSI Tape driver uses a semaphore as mutex. Use the mutex
 API instead of the (binary) semaphore.
 
 Signed-off-by: Matthias Kaehlcke [EMAIL PROTECTED]
 [...]
 @@ -3298,7 +3298,7 @@ static ssize_t osst_write(struct file * filp, const 
 char __user * buf, size_t co
   char* name = tape_name(STp);
  
  
 - if (down_interruptible(STp-lock))
 + if (mutex_lock_interruptible(STp-lock))
   return (-ERESTARTSYS);

The () after return existed in the code already, but you could've felt
free to remove them :-)

 @@ -3619,7 +3619,7 @@ static ssize_t osst_read(struct file * filp, char 
 __user * buf, size_t count, lo
   char* name  = tape_name(STp);
  
  
 - if (down_interruptible(STp-lock))
 + if (mutex_lock_interruptible(STp-lock))
   return (-ERESTARTSYS);

Same here.

Reviewed-by: Satyam Sharma [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/5] Use mutex instead of semaphore in the SCSI Tape driver

2007-07-29 Thread Satyam Sharma


On Sun, 29 Jul 2007, Matthias Kaehlcke wrote:

 The SCSI Tape driver uses a semaphore as mutex. Use the mutex API
 instead of the (binary) semaphore.
 
 Signed-off-by: Matthias Kaehlcke [EMAIL PROTECTED]

Reviewed-by: Satyam Sharma [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/5] Use mutex instead of semaphore in the Host AP driver

2007-07-29 Thread Satyam Sharma
Whoops ...


On Mon, 30 Jul 2007, Satyam Sharma wrote:

 On Mon, 30 Jul 2007, Michael Buesch wrote:
 
  On Sunday 29 July 2007 23:34, Matthias Kaehlcke wrote:
   The Host AP driver uses a semaphore as mutex. Use the mutex API
   instead of the (binary) semaphore.
   
   Signed-off-by: Matthias Kaehlcke [EMAIL PROTECTED]
 
 [ Something seems to have gone wrong with your diff / patch / script.
   There was no diff header here, which should have been. ]
 
   - res = down_interruptible(local-rid_bap_sem);
   + res = mutex_lock_interruptible(local-rid_bap_mtx);
 if (res)
 return res;

   @@ -902,7 +902,7 @@ static int hfa384x_set_rid(struct net_device *dev, 
   u16 rid, void *buf, int len)
 /* RID len in words and +1 for rec.rid */
 rec.len = cpu_to_le16(len / 2 + len % 2 + 1);

   - res = down_interruptible(local-rid_bap_sem);
   + res = mutex_lock_interruptible(local-rid_bap_mtx);
 if (res)
 return res;

  
  Is res returned to userspace? If yes, that's not right.
 
 Yup, that's not right.
 
  On a interrupted mutex allocation you should return
  -ERESTARTSYS to userspace.
 
 Nope, userspace must not see ERESTARTSYS (I /think/ this is the third
 time I'm participating in this exact same discussion :-)
 
 If the return would be caught by a previous in-kernel caller in the
 call chain, ERESTARTSYS is okay and it could try to restart the
 operation. However, if the return goes unfiltered directly to
 userspace, EINTR is the correct choice.

Eek, and because mutex_lock_interruptible() *does* return -EINTR when
interrupted by a signal, and I noticed that hfa384x_set_rid() could be
called from an ioctl(2) codepath with no intermediate caller trying to
restart it, so the code is perfectly alright, actually.

But the patch doesn't have the diff header anyway, so Matthias would
probably have to resend in any case :-)


Satyam
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/5] Use mutex instead of semaphore in ISDN subsystem common functions

2007-07-29 Thread Satyam Sharma


On Sun, 29 Jul 2007, Matthias Kaehlcke wrote:

 The ISDN subsystem common functions use a semaphore as mutex. Use the
 mutex API instead of the (binary) semaphore.
 
 Signed-off-by: Matthias Kaehlcke [EMAIL PROTECTED]

Reviewed-by: Satyam Sharma [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 5/5] Use mutex instead of semaphore in the DVB frontend tuning interface

2007-07-29 Thread Satyam Sharma


On Sun, 29 Jul 2007, Matthias Kaehlcke wrote:

 The DVB frontend tuning interface uses a semaphore as mutex. Use the
 mutex API instead of the (binary) semaphore.
 
 Signed-off-by: Matthias Kaehlcke [EMAIL PROTECTED]

Reviewed-by: Satyam Sharma [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: LinuxPPS spinlocks

2007-07-29 Thread Satyam Sharma
Hi Rodolfo,


On Sun, 29 Jul 2007, Rodolfo Giometti wrote:

 On Sat, Jul 28, 2007 at 02:17:24AM +0530, Satyam Sharma wrote:
  
  I only glanced through the code, so could be wrong, but I noticed that
  the only global / shared data you have in there is a global pps_source
  array of pps_s structs. That's accessed / modified from the various
  syscalls introduced in the API exported to userspace, as well as the
  register/unregister/pps_event API exported to in-kernel client subsystems,
  yes? So it looks like you need to introduce proper locking for it, simply
  type-qualifying it as volatile is not enough.
  
  However, I think you've introduced two locks for it. The syscalls (that
  run in process context, obviously) seem to use a pps_mutex and
  pps_event() seems to be using the pps_lock spinlock (because that
  gets executed from interrupt context) -- and from the looks of it, the
  register/unregister functions are using /both/ the mutex and spinlock (!)
 
 This is right.
 
  This isn't quite right, (in fact there's nothing to protect pps_event from
  racing against a syscall), so you should use *only* the spinlock for
  synchronization -- the spin_lock_irqsave/restore() variants, in fact.
 
 We can't use the spin_lock_irqsave/restore() variants since PPS
 sources cannot manage IRQ enable/disable. For instance, the serial
 source doesn't manage IRQs directly but just uses it to record PPS
 events. The serial driver manages the IRQ enable/disable, not the PPS
 source which only uses the IRQ handler to records events.

Hmm? I still don't see why you can't introduce spin_lock_irqsave/restore()
in pps_event() around the access to pps_source.


 About using both mutex and spinlock I did it since (I think) I should
 protect syscalls from each others and from pps_register/unregister(),
 and pps_event() against pps_register/unregister().

Nopes, it's not about protecting code from each other, you're needlessly
complicating things. Locking is pretty simple, really -- any shared data,
that can be concurrently accessed by multiple threads (or from interrupts)
must be protected with a lock. Note that *data* is protected by a lock,
and not code that handles it (well, this is the kind of behaviour most
cases need, at least, including yours).

So here we're introducing the lock to protect *pps_source*, and not keep
*threads* of execution from stepping over each other. So, simply, just
ensure you grab the lock whenever you want to start accessing the shared
data, and release it when you're done.

The _irqsave/restore() variants are required because (say) one of the
syscalls executing in process context grabs the spinlock. Then, before it
has released it, it gets interrupted and pps_event() begins executing.
Now pps_event() also wants to grab the lock, but the syscall already
has it, so will continue spinning and deadlock!


  [ Also, have you considered making pps_source a list and not an array?
  It'll help you lose a whole lot of MAX_SOURCES, pps_is_allocated, etc
  kind of gymnastics in there, and you _can_ return a pointer to the
  corresponding pps source struct from the register() function to the 
  in-kernel
  users, so that way you get to retain the O(1) access to the corresponding
  source when a client calls into pps_event(), similar to how you're using the
  array index presently. ]
  
  I also noticed code like (from pps_event):
  
  +   /* Try to grab the lock, if not we prefere loose the event... */
  +   if (!spin_trylock(pps_lock))
  +   return;
  
  which looks worrisome and unnecessary. That spinlock looks to be of
  fine enough granularity to me, do you think there'd be any contention
  on it? I /think/ you can simply make that a spin_lock().
 
 This is due the fact I cannot manage IRQ enable/disable.

What I meant is that you could make it a proper spin_lock() --
or spin_lock_irqsave(), actually -- instead of the _trylock_ variant
that it currently is.

I think you're unnecessarily worrying about contention here -- you can
have multiple locks (one for the list, and separate ones for your sources)
if you're really worrying about contention -- or probably rwlocks. But
really, rwlocks would end up being *slower* than spinlocks, unless the
contention is really heavy and it helps to keep multiple readers in the
critical section. But frankly, with at max a few (I'd expect generally
one) PPS sources ever to be connected / registered with teh system, and
just one-pulse-per-second, I don't see why any contention is ever gonna
happen.


Satyam
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: LinuxPPS spinlocks

2007-07-29 Thread Satyam Sharma
Hi,


On Sun, 29 Jul 2007, Rodolfo Giometti wrote:

 On Sat, Jul 28, 2007 at 05:11:17AM +0530, Satyam Sharma wrote:
 
   [ Also, have you considered making pps_source a list and not an array?
   It'll help you lose a whole lot of MAX_SOURCES, pps_is_allocated, etc
   kind of gymnastics in there, and you _can_ return a pointer to the
   corresponding pps source struct from the register() function to the 
   in-kernel
   users, so that way you get to retain the O(1) access to the corresponding
   source when a client calls into pps_event(), similar to how you're using 
   the
   array index presently. ]
  
  I think the above would be sane and safe -- your driver has pretty simple
  lifetime rules, and sources are only created / destroyed from within 
  kernel,
  as and when clients call pps_register_source() and pps_unregister_source().
  So pps_event() can be called on a given source only between the
  corresponding register() and unregister() -- which means register() can
  return us a reference/pointer on the source after allocating / adding it to
  the list (instead of the fixed array index as it presently is), which 
  remains
  valid for the entire duration of the source, till unregister() is called, 
  after
  which we can't be calling pps_event() on the same source anyway.
 
 Ok. I see. I'll study the problem but I think this is can be done
 later, now I think is better having a working code. :)

Fair enough, but I think the code could become a trifle simpler/easier
after the conversion, so probably greater chances of getting merged :-)


  Ok, I've looked through (most of) the RFC and code now, and am only
  commenting on a design-level for now. Anyway, I didn't like the way
  you've significantly drifted from the RFC in several ways:
  
  1. The RFC mandates no such userspace interface / syscall as the
  time_pps_cmd() that you've implemented -- it looks, smells, and feels
  like an ioctl, in fact that's what it is for practical purposes. I'm 
  confused
  as to why didn't you just go ahead and implement the special-file-and-
  file-descriptor based approach as advocated / mandated there.
 
 This is because is not always true that a PPS source is connected with
 a char (or other) device.

But that's alright -- see, as I said, you're confusing between the
special device that represents the *PPS source* itself, with the port
or device that it uses to *physically* connect to the PC.

In the RFC, when they say that the userspace app must open(2) the PPS
source (as they have illustrated in the example too), they mean that
it open(2)'s the special device/file associated with the PPS source,
and *not* the /dev/lpXXX or /dev/ttySXXX that it might be connected
through physically.

So they mean something like /dev/pps0, /dev/pps1 etc instead. Of course,
no such special device exists on a Linux box already, but that's fine
and obvious! *You* are supposed to create / instantiate that when a
pps_register_source() is done from some in-kernel subsystem.


 People whose designed RFC didn't think about
 systems where the PPS signal is connected with a CPU's GPIO and the
 O.S. doesn't provide any char device to refere with!

As I said, it's not the char device for the physical interface itself
being discussed there. That could be parport, uart, some arbit GPIO pin
whatever. But whenever the corresponding kernel subsystem does a
register_source(), you could create the /dev/ppsXXX device ...

 In the common desktop PCs the GPS antenna is connected with the serial
 line and the PPS source is attached to the serial CD, but in the
 embedded systems this is _not_ true. GPS antennae may still be
 connected with serial line but the PPS signal is usually connected
 with a GPIO pin.
 
 In this scenario you cannot use the serial file descriptor to manage
 the PPS signal since it cannot goes to the serial port.

See above.

  [ You've implemented the (optional, as per RFC) time_pps_findsource
  operation in the kernel using the above pseudo-ioctl, but that wasn't
  necessary -- as the RFC itself illustrates, it's something that can easily
  be done (in fact should be done) completely in userspace itself. ]
 
 I used pseudo-ioctl interface since it allows me to easily extend PPS
 support with special, and uncommon, commands.

Hmm, but that's a non-standard, not-mandated-by-RFC syscall. I don't see
how you can get this merged, really :-)

  * At the time of pps_register_source()  -- called by an in-kernel client
  subsystem that creates a PPS source -- allocate a pps source, generate
  an identifier for it, instantiate a special file -- the RFC does not mention
  whether a char or block device, but char device (I noticed an example
  in the RFC where they've used /dev/ppsXX as a possible path) would be
  proper for this. Finally add it to the list of sources. This returns a
  reference/pointer on that source back to the in-kernel client, which then
  passes *that* to pps_event(), similar to how you're presently using

Re: LinuxPPS spinlocks

2007-07-29 Thread Satyam Sharma
Hi Rodolfo,


On Sun, 29 Jul 2007, Rodolfo Giometti wrote:

 On Sat, Jul 28, 2007 at 05:11:17AM +0530, Satyam Sharma wrote:
 
  Take the race between the time_pps_setparams() syscall and a concurrent
  pps_event() from an interrupt for instance. From sys_time_pps_setparams,
  the parameters for an existing source are not modified / set atomically,
  which means a pps_event() called on the same source in between will see
  invalid parameters ... and bad things will happen.
 
 I think this should be a good solution... :)
 
 diff --git a/drivers/pps/kapi.c b/drivers/pps/kapi.c
 index 08de71d..f0c42ec 100644
 --- a/drivers/pps/kapi.c
 +++ b/drivers/pps/kapi.c
 @@ -29,12 +29,6 @@
  #include linux/pps.h
  
  /*
 - * Local variables
 - */
 -
 -static spinlock_t pps_lock = SPIN_LOCK_UNLOCKED;
 -
 -/*
   * Local functions
   */
  
 diff --git a/drivers/pps/pps.c b/drivers/pps/pps.c
 index 9176c01..91b7e4d 100644
 --- a/drivers/pps/pps.c
 +++ b/drivers/pps/pps.c
 @@ -35,6 +35,7 @@
  
  struct pps_s pps_source[PPS_MAX_SOURCES];
  DEFINE_MUTEX(pps_mutex);
 +spinlock_t pps_lock = SPIN_LOCK_UNLOCKED;
  
  /*
   * Misc functions
 @@ -227,6 +228,8 @@ asmlinkage long sys_time_pps_setparams(int source,
 goto sys_time_pps_setparams_exit;
 }
  
 +   spin_lock(pps_lock);
 +
 /* Save the new parameters */
 ret = copy_from_user(pps_source[source].params, params,
 sizeof(struct pps_kparams));
 @@ -245,6 +248,8 @@ asmlinkage long sys_time_pps_setparams(int source,
 pps_source[source].params.mode |= PPS_CANWAIT;
 pps_source[source].params.api_version = PPS_API_VERS;
  
 +   spin_unlock(pps_lock);
 +
  sys_time_pps_setparams_exit:
 mutex_unlock(pps_mutex);


Nopes, this isn't quite correct/safe. I suggest you should read:

http://www.kernel.org/pub/linux/kernel/people/rusty/kernel-locking/


Satyam
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sb1000: prevent a potential NULL pointer dereference in sb1000_dev_ioctl()

2007-07-28 Thread Satyam Sharma


On Sun, 29 Jul 2007, Domen Puncer wrote:

> On 29/07/07 00:02 +0200, Jesper Juhl wrote:
> > Hi,
> > 
> > Here's a small patch, prompted by a find by the Coverity checker, 
> > that removes a potential NULL pointer dereference from 
> > drivers/net/sb1000.c::sb1000_dev_ioctl().
> > The checker spotted that we do a NULL test of 'dev', yet we 
> > dereference the pointer prior to that check.
> > This patch simply moves the dereference after the NULL test.
> 
> But... it can't be called without a valid 'dev', no?
> A quick 'grep do_ioctl net/' confirms that all calls are in
> the form of 'dev->do_ioctl(dev, ...'.

Yup, I think so too ...


> > @@ -991,11 +991,13 @@ static int sb1000_dev_ioctl(struct net_device *dev, 
> > struct ifreq *ifr, int cmd)
> > short PID[4];
> > int ioaddr[2], status, frequency;
> > unsigned int stats[5];
> > -   struct sb1000_private *lp = netdev_priv(dev);
> > +   struct sb1000_private *lp;
> >  
> > if (!(dev && dev->flags & IFF_UP))
> > return -ENODEV;

I think we could get rid of the !dev check itself. Actually, the IFF_UP
check /also/ looks suspect to me for two reasons: (1) I remember Stephen
Hemminger once telling me dev->flags is legacy and unsafe, and one of
the netif_xxx() functions be used instead, and, (2) I wonder if we really
require the interface to be up and *running* when we do this ioctl.


Satyam
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] USB Pegasus driver - avoid a potential NULL pointer dereference.

2007-07-28 Thread Satyam Sharma
Hi,

On 7/29/07, Jesper Juhl <[EMAIL PROTECTED]> wrote:
> Hello,
>
> This patch makes sure we don't dereference a NULL pointer in
> drivers/net/usb/pegasus.c::write_bulk_callback() in the initial
> struct net_device *net = pegasus->net; assignment.
> The existing code checks if 'pegasus' is NULL and bails out if
> it is, so we better not touch that pointer until after that check.
> [...]
> diff --git a/drivers/net/usb/pegasus.c b/drivers/net/usb/pegasus.c
> index a05fd97..04cba6b 100644
> --- a/drivers/net/usb/pegasus.c
> +++ b/drivers/net/usb/pegasus.c
> @@ -768,11 +768,13 @@ done:
>  static void write_bulk_callback(struct urb *urb)
>  {
> pegasus_t *pegasus = urb->context;
> -   struct net_device *net = pegasus->net;
> +   struct net_device *net;
>
> if (!pegasus)
> return;
>
> +   net = pegasus->net;
> +
> if (!netif_device_present(net) || !netif_running(net))
> return;

Is it really possible that we're called into this function with
urb->context == NULL? If not, I'd suggest let's just get rid of
the check itself, instead.

Satyam
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] USB Pegasus driver - avoid a potential NULL pointer dereference.

2007-07-28 Thread Satyam Sharma
Hi,

On 7/29/07, Jesper Juhl [EMAIL PROTECTED] wrote:
 Hello,

 This patch makes sure we don't dereference a NULL pointer in
 drivers/net/usb/pegasus.c::write_bulk_callback() in the initial
 struct net_device *net = pegasus-net; assignment.
 The existing code checks if 'pegasus' is NULL and bails out if
 it is, so we better not touch that pointer until after that check.
 [...]
 diff --git a/drivers/net/usb/pegasus.c b/drivers/net/usb/pegasus.c
 index a05fd97..04cba6b 100644
 --- a/drivers/net/usb/pegasus.c
 +++ b/drivers/net/usb/pegasus.c
 @@ -768,11 +768,13 @@ done:
  static void write_bulk_callback(struct urb *urb)
  {
 pegasus_t *pegasus = urb-context;
 -   struct net_device *net = pegasus-net;
 +   struct net_device *net;

 if (!pegasus)
 return;

 +   net = pegasus-net;
 +
 if (!netif_device_present(net) || !netif_running(net))
 return;

Is it really possible that we're called into this function with
urb-context == NULL? If not, I'd suggest let's just get rid of
the check itself, instead.

Satyam
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sb1000: prevent a potential NULL pointer dereference in sb1000_dev_ioctl()

2007-07-28 Thread Satyam Sharma


On Sun, 29 Jul 2007, Domen Puncer wrote:

 On 29/07/07 00:02 +0200, Jesper Juhl wrote:
  Hi,
  
  Here's a small patch, prompted by a find by the Coverity checker, 
  that removes a potential NULL pointer dereference from 
  drivers/net/sb1000.c::sb1000_dev_ioctl().
  The checker spotted that we do a NULL test of 'dev', yet we 
  dereference the pointer prior to that check.
  This patch simply moves the dereference after the NULL test.
 
 But... it can't be called without a valid 'dev', no?
 A quick 'grep do_ioctl net/' confirms that all calls are in
 the form of 'dev-do_ioctl(dev, ...'.

Yup, I think so too ...


  @@ -991,11 +991,13 @@ static int sb1000_dev_ioctl(struct net_device *dev, 
  struct ifreq *ifr, int cmd)
  short PID[4];
  int ioaddr[2], status, frequency;
  unsigned int stats[5];
  -   struct sb1000_private *lp = netdev_priv(dev);
  +   struct sb1000_private *lp;
   
  if (!(dev  dev-flags  IFF_UP))
  return -ENODEV;

I think we could get rid of the !dev check itself. Actually, the IFF_UP
check /also/ looks suspect to me for two reasons: (1) I remember Stephen
Hemminger once telling me dev-flags is legacy and unsafe, and one of
the netif_xxx() functions be used instead, and, (2) I wonder if we really
require the interface to be up and *running* when we do this ioctl.


Satyam
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: LinuxPPS & spinlocks

2007-07-27 Thread Satyam Sharma
Hi,

On 7/28/07, Satyam Sharma <[EMAIL PROTECTED]> wrote:
> Hi Rodolfo,
>
> On 7/28/07, Rodolfo Giometti <[EMAIL PROTECTED]> wrote:
> > On Fri, Jul 27, 2007 at 01:40:14PM -0600, Chris Friesen wrote:
> > >
> > > My point is that the lock should be used to protect specific data. Thus, 
> > > it
> > > would be more correct to say, "spinlock foo is taken because
> > > pps_register_source() accesses variable bar".
> > >
> > > That way, if someone else wants to access "bar", they know that they need
> > > to take lock "foo".
> >
> > Ah, ok! I see. :)
>
> I only glanced through the code, so could be wrong, but I noticed that
> the only global / shared data you have in there is a global "pps_source"
> array of pps_s structs. That's accessed / modified from the various
> syscalls introduced in the API exported to userspace, as well as the
> register/unregister/pps_event API exported to in-kernel client subsystems,
> yes? So it looks like you need to introduce proper locking for it, simply
> type-qualifying it as "volatile" is not enough.
>
> However, I think you've introduced two locks for it. The syscalls (that
> run in process context, obviously) seem to use a pps_mutex and
> pps_event() seems to be using the pps_lock spinlock (because that
> gets executed from interrupt context) -- and from the looks of it, the
> register/unregister functions are using /both/ the mutex and spinlock (!)
>
> This isn't quite right, (in fact there's nothing to protect pps_event from
> racing against a syscall), so you should use *only* the spinlock for
> synchronization -- the spin_lock_irqsave/restore() variants, in fact.

Take the race between the time_pps_setparams() syscall and a concurrent
pps_event() from an interrupt for instance. From sys_time_pps_setparams,
the parameters for an existing source are not modified / set atomically,
which means a pps_event() called on the same source in between will see
invalid parameters ... and bad things will happen.

> [ Also, have you considered making pps_source a list and not an array?
> It'll help you lose a whole lot of MAX_SOURCES, pps_is_allocated, etc
> kind of gymnastics in there, and you _can_ return a pointer to the
> corresponding pps source struct from the register() function to the in-kernel
> users, so that way you get to retain the O(1) access to the corresponding
> source when a client calls into pps_event(), similar to how you're using the
> array index presently. ]

I think the above would be sane and safe -- your driver has pretty simple
lifetime rules, and "sources" are only created / destroyed from within kernel,
as and when clients call pps_register_source() and pps_unregister_source().
So pps_event() can be called on a given source only between the
corresponding register() and unregister() -- which means register() can
return us a reference/pointer on the source after allocating / adding it to
the list (instead of the fixed array index as it presently is), which remains
valid for the entire duration of the source, till unregister() is called, after
which we can't be calling pps_event() on the same source anyway.

> I also noticed code like (from pps_event):
>
> +   /* Try to grab the lock, if not we prefere loose the event... */
> +   if (!spin_trylock(_lock))
> +   return;
>
> which looks worrisome and unnecessary. That spinlock looks to be of
> fine enough granularity to me, do you think there'd be any contention
> on it? I /think/ you can simply make that a spin_lock().
>
> Overall the code looks simple / straightforward enough to me (except for
> the parport / uart stuff that I have no clue about), and I'll also read up on
> the relevant RFC for this and would hopefully try and give you a more
> meaningful review over the weekend.

Ok, I've looked through (most of) the RFC and code now, and am only
commenting on a design-level for now. Anyway, I didn't like the way
you've significantly drifted from the RFC in several ways:

1. The RFC mandates no such userspace interface / syscall as the
time_pps_cmd() that you've implemented -- it looks, smells, and feels
like an ioctl, in fact that's what it is for practical purposes. I'm confused
as to why didn't you just go ahead and implement the special-file-and-
file-descriptor based approach as advocated / mandated there.

[ You've implemented the (optional, as per RFC) time_pps_findsource
operation in the kernel using the above "pseudo-ioctl", but that wasn't
necessary -- as the RFC itself illustrates, it's something that can easily
be done (in fact should be done) completely in userspace itself. ]

2. If you fix the above two issues, you'll notice that you don't need to
short-circuit the (RFC-mandated) tim

Re: LinuxPPS & spinlocks

2007-07-27 Thread Satyam Sharma
Hi Rodolfo,

On 7/28/07, Rodolfo Giometti <[EMAIL PROTECTED]> wrote:
> On Fri, Jul 27, 2007 at 01:40:14PM -0600, Chris Friesen wrote:
> >
> > My point is that the lock should be used to protect specific data. Thus, it
> > would be more correct to say, "spinlock foo is taken because
> > pps_register_source() accesses variable bar".
> >
> > That way, if someone else wants to access "bar", they know that they need
> > to take lock "foo".
>
> Ah, ok! I see. :)

I only glanced through the code, so could be wrong, but I noticed that
the only global / shared data you have in there is a global "pps_source"
array of pps_s structs. That's accessed / modified from the various
syscalls introduced in the API exported to userspace, as well as the
register/unregister/pps_event API exported to in-kernel client subsystems,
yes? So it looks like you need to introduce proper locking for it, simply
type-qualifying it as "volatile" is not enough.

However, I think you've introduced two locks for it. The syscalls (that
run in process context, obviously) seem to use a pps_mutex and
pps_event() seems to be using the pps_lock spinlock (because that
gets executed from interrupt context) -- and from the looks of it, the
register/unregister functions are using /both/ the mutex and spinlock (!)

This isn't quite right, (in fact there's nothing to protect pps_event from
racing against a syscall), so you should use *only* the spinlock for
synchronization -- the spin_lock_irqsave/restore() variants, in fact.

[ Also, have you considered making pps_source a list and not an array?
It'll help you lose a whole lot of MAX_SOURCES, pps_is_allocated, etc
kind of gymnastics in there, and you _can_ return a pointer to the
corresponding pps source struct from the register() function to the in-kernel
users, so that way you get to retain the O(1) access to the corresponding
source when a client calls into pps_event(), similar to how you're using the
array index presently. ]

I also noticed code like (from pps_event):

+   /* Try to grab the lock, if not we prefere loose the event... */
+   if (!spin_trylock(_lock))
+   return;

which looks worrisome and unnecessary. That spinlock looks to be of
fine enough granularity to me, do you think there'd be any contention
on it? I /think/ you can simply make that a spin_lock().

Overall the code looks simple / straightforward enough to me (except for
the parport / uart stuff that I have no clue about), and I'll also read up on
the relevant RFC for this and would hopefully try and give you a more
meaningful review over the weekend.

Thanks,
Satyam
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-27 Thread Satyam Sharma
On 7/27/07, Alan Cox <[EMAIL PROTECTED]> wrote:
> > Maybe I should resurrect it & send it out...

Hmm, something that hooks in not only at do_IRQ time (as the present
in-mainline stackoverflow check thing)?

> > (FWIW I think I recall that the warning itself sometimes tipped the
> > scales enough on 4k stacks to bring the box down)
>
> You can always switch stack for the printk and it probably should panic
> at that point and give a trace then die as that is what we are trying to
> prove does not occur

Yes, only yesterday I saw exactly this happening DEBUG_STACKOVERFLOW
when doing a udf -> pktcdvd -> cdrom -> ide_cd thing. It's one of those
reproducible will-crash-4k-stacks tests, especially if you have debug stuff
enabled in your build that would make on-stack structures (where such
exist on the codepath) a bit heavier.

Admittedly, what seems to have happened is a bit pathological:

[  481.836378] cdrom: entering cdrom_count_tracks
[  481.844266] BUG: sleeping function called from invalid context at
include/asm/semaphore.h:98
[  481.844434] do_IRQ: stack overflow: 164
[  481.844540]  [] show_trace_log_lvl+0x19/0x2e
[  481.844707]  [] show_trace+0x12/0x14
[  481.844867]  [] dump_stack+0x14/0x16
[  481.845027]  [] do_IRQ+0x7b/0xe1
[  481.845186]  [] common_interrupt+0x2e/0x34
[  481.845348]  [] printk+0x1b/0x1d
[  481.845507]  [] __might_sleep+0x81/0xdc
[  481.845668]  [] __reacquire_kernel_lock+0x2d/0x4f
[  481.845833]  [] schedule+0x78a/0x7a4
[  481.845996]  [] wait_for_completion+0x72/0x97
[  481.846160]  [] ide_do_drive_cmd+0xeb/0x109
[  481.846324]  [] cdrom_queue_packet_command+0x40/0xc5 [ide_cd]
[  481.846503]  [] ide_cdrom_packet+0x86/0xa4 [ide_cd]
[  481.846669]  [] cdrom_get_disc_info+0x48/0x87 [cdrom]
[  481.846839]  [] cdrom_get_last_written+0x2a/0xfe [cdrom]
[  481.847009]  [] cdrom_read_toc+0x39d/0x3f3 [ide_cd]
[  481.847231]  [] ide_cdrom_audio_ioctl+0x130/0x1ce [ide_cd]
[  481.847414]  [] cdrom_count_tracks+0x5c/0x126 [cdrom]
[  481.847583]  [] cdrom_open+0x147/0x79c [cdrom]
[  481.847748]  [] idecd_open+0x75/0x8a [ide_cd]
[  481.847912]  [] do_open+0x1d1/0x284
[  481.848079]  [] __blkdev_get+0x73/0x7e
[  481.848242]  [] blkdev_get+0x15/0x17
[  481.848411]  [] pkt_open+0x99/0xc6e [pktcdvd]
[  481.848583]  [] do_open+0x96/0x284
[  481.848745]  [] __blkdev_get+0x73/0x7e
[  481.848910]  [] blkdev_get+0x15/0x17

(... the trace cut off there, and then the box froze hard, no sysrq ...)

The mount(2) hit the wait_for_completion() in ide_do_drive_cmd(),
little stack was left at this point. But then I have no idea why the
__reacquire_kernel_lock() from schedule() gave a might_sleep() there,
the code in sched.c and kernel_lock.c looks obviously correct -- the
down(_sem) only happens with both irqs and preemption on.

Anyway, the second line of printk() in __might_sleep (the one that
tells us in_atomic() and irqs_disabled()) was about to be printed when
an interrupt decided to join the fun. do_IRQ() comes in, with debug
stackoverflows on, it notices that only 164 bytes worth of stack is left
and decides to dump_stack ... and while we were doing just that,
we died. (this was 2.6.23-rc1-mm1)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: NFSv4 poops itself

2007-07-27 Thread Satyam Sharma
On 7/27/07, Marc Dietrich <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> Am Friday 27 July 2007 14:53 schrieben Sie:
> > Background:
> >
> > Server: x86-64 dual core Intel, kernel 2.6.23-rc1 (my home fileserver)
> >   Exporting NFS/NFSv4 mounts.  Client count: 1  Uptime: 4 days
> >
> > Client: x86-64 dual core Intel, kernel 2.6.23-rc1 (my main workstation)
> >   NFS mount setup:
> >   pretzel:/ on /g type nfs4 (rw,noatime,proto=tcp,addr=10.10.10.1)
> >   Uptime: 4 days
> >
> >   Home directory mounted via NFSv4.
> >
> > Problem:
> >
> > My workstation has been happily talking to my file server for several
> > days without incident.  An hour ago, my numeric keypad stopping working
> > (unrelated problem... USB or X bug?).  The solution to the keypad
> > problem is usually to log out of X and log back in, or worse case, reboot.
> >
> > So, I log out, and log back in.  At first, a few shell windows open and
> > successfully initialize themselves (read bash profile over NFS, etc.)
> > Then, as more shell windows open, things start hanging.  I can easily
> > switch to console and ssh to the fileserver, so it is clear this is an
> > NFS hang.
> >
> > No adverse messages at all on the client.
> >
> > On the server, I see NFSv4 spamming dmesg with hundreds of thousands of
> > messages:
> >
> > Jul 27 08:20:53 pretzel kernel: NFSD: preprocess_seqid_op: old stateid!
> > Jul 27 08:21:24 pretzel last message repeated 167966 times
> > Jul 27 08:21:55 pretzel last message repeated 173628 times
> > Jul 27 08:21:55 pretzel kernel: NFSD: preprocess_seqid_op: old stateid!
> > Jul 27 08:22:26 pretzel last message repeated 171286 times
> > Jul 27 08:23:27 pretzel last message repeated 344461 times
> > Jul 27 08:23:30 pretzel last message repeated 18656 times
> >
> > I rebooted the client, the problem disappeared, and everything is happy
> > again...  but clearly NFSv4 shat itself.  And now I am worried this will
> > happen again.
> >
> > In all my quite-heavy use of NFSv4 I've never seen this behavior before,
> >   so I would call this a regression.
> >
> > I always run vanilla linux-2.6.git self-built kernels on both client and
> > server.
>
> me too, my server has 2.6.18-? (openSUSE 10.2). On the client
> (2.6.23-rc1-mm1), I also see (shortly before the hang)
>
> Jul 26 13:09:19 fb07-iapwap2 kernel: =
> Jul 26 13:09:19 fb07-iapwap2 kernel: [ INFO: inconsistent lock state ]
> Jul 26 13:09:19 fb07-iapwap2 kernel: 2.6.23-rc1-mm1 #1
> Jul 26 13:09:19 fb07-iapwap2 kernel: -
> Jul 26 13:09:19 fb07-iapwap2 kernel: inconsistent {softirq-on-W} ->
> {in-softirq-W} usage.
> Jul 26 13:09:19 fb07-iapwap2 kernel: hald/3873 [HC0[0]:SC1[1]:HE1:SE0] takes:
> Jul 26 13:09:19 fb07-iapwap2 kernel:  (rpc_credcache_lock){-+..}, at:
> [] _atomic_dec_and_lock+0x16/0x60
> Jul 26 13:09:19 fb07-iapwap2 kernel: {softirq-on-W} state was registered at:
> Jul 26 13:09:19 fb07-iapwap2 kernel:   [] mark_lock+0x77/0x630
> Jul 26 13:09:19 fb07-iapwap2 kernel:   [] add_lock_to_list+0x44/0xc0
> Jul 26 13:09:19 fb07-iapwap2 kernel:   []
> __lock_acquire+0x65f/0x1020
> Jul 26 13:09:19 fb07-iapwap2 kernel:   [] mark_held_locks+0x5e/0x80
> Jul 26 13:09:19 fb07-iapwap2 kernel:   [] local_bh_enable+0x7d/0x130
> Jul 26 13:09:19 fb07-iapwap2 kernel:   [] lock_acquire+0x5f/0x80
> Jul 26 13:09:19 fb07-iapwap2 kernel:   []
> _atomic_dec_and_lock+0x16/0x60
> Jul 26 13:09:19 fb07-iapwap2 kernel:   [] _spin_lock+0x2a/0x40
> Jul 26 13:09:19 fb07-iapwap2 kernel:   []
> _atomic_dec_and_lock+0x16/0x60
> Jul 26 13:09:19 fb07-iapwap2 kernel:   []
> _atomic_dec_and_lock+0x16/0x60
> Jul 26 13:09:19 fb07-iapwap2 kernel:   [] _spin_lock+0x2a/0x40
> Jul 26 13:09:19 fb07-iapwap2 kernel:   [] put_rpccred+0x60/0x110
> [sunrpc]
> Jul 26 13:09:19 fb07-iapwap2 kernel:   []
> rpcauth_unbindcred+0x20/0x60 [sunrpc]
> Jul 26 13:09:19 fb07-iapwap2 kernel:   [] rpc_put_task+0x44/0xb0
> [sunrpc]
> Jul 26 13:09:19 fb07-iapwap2 kernel:   [] rpc_call_sync+0x2d/0x40
> [sunrpc]
> Jul 26 13:09:19 fb07-iapwap2 kernel:   [] rpcb_register+0x10d/0x1c0
> [sunrpc]
> Jul 26 13:09:19 fb07-iapwap2 kernel:   [] svc_register+0x8f/0x160
> [sunrpc]
> Jul 26 13:09:19 fb07-iapwap2 kernel:   []
> trace_hardirqs_on+0xc5/0x170
> Jul 26 13:09:19 fb07-iapwap2 kernel:   [] __svc_create+0x1d5/0x200
> [sunrpc]
> Jul 26 13:09:19 fb07-iapwap2 kernel:   [] svc_create+0x10/0x20
> [sunrpc]
> Jul 26 13:09:19 fb07-iapwap2 kernel:   [] nfs_callback_up+0x82/0x120
> [nfs]
> Jul 26 13:09:19 fb07-iapwap2 kernel:   [] nfs_get_client+0x17d/0x390
> [nfs]
> Jul 26 13:09:19 fb07-iapwap2 kernel:   [] kmem_cache_alloc+0x79/0xd0
> Jul 26 13:09:19 fb07-iapwap2 kernel:   []
> trace_hardirqs_on+0xc5/0x170
> Jul 26 13:09:19 fb07-iapwap2 kernel:   [] nfs4_set_client+0x34/0x1a0
> [nfs]
> Jul 26 13:09:19 fb07-iapwap2 kernel:   []
> nfs4_create_server+0x61/0x3c0 [nfs]
> Jul 26 13:09:19 fb07-iapwap2 kernel:   [] __kmalloc+0xdf/0x120
> Jul 26 13:09:19 fb07-iapwap2 kernel:   []
> 

Re: NFSv4 poops itself

2007-07-27 Thread Satyam Sharma
On 7/27/07, Marc Dietrich [EMAIL PROTECTED] wrote:

 Hi,

 Am Friday 27 July 2007 14:53 schrieben Sie:
  Background:
 
  Server: x86-64 dual core Intel, kernel 2.6.23-rc1 (my home fileserver)
Exporting NFS/NFSv4 mounts.  Client count: 1  Uptime: 4 days
 
  Client: x86-64 dual core Intel, kernel 2.6.23-rc1 (my main workstation)
NFS mount setup:
pretzel:/ on /g type nfs4 (rw,noatime,proto=tcp,addr=10.10.10.1)
Uptime: 4 days
 
Home directory mounted via NFSv4.
 
  Problem:
 
  My workstation has been happily talking to my file server for several
  days without incident.  An hour ago, my numeric keypad stopping working
  (unrelated problem... USB or X bug?).  The solution to the keypad
  problem is usually to log out of X and log back in, or worse case, reboot.
 
  So, I log out, and log back in.  At first, a few shell windows open and
  successfully initialize themselves (read bash profile over NFS, etc.)
  Then, as more shell windows open, things start hanging.  I can easily
  switch to console and ssh to the fileserver, so it is clear this is an
  NFS hang.
 
  No adverse messages at all on the client.
 
  On the server, I see NFSv4 spamming dmesg with hundreds of thousands of
  messages:
 
  Jul 27 08:20:53 pretzel kernel: NFSD: preprocess_seqid_op: old stateid!
  Jul 27 08:21:24 pretzel last message repeated 167966 times
  Jul 27 08:21:55 pretzel last message repeated 173628 times
  Jul 27 08:21:55 pretzel kernel: NFSD: preprocess_seqid_op: old stateid!
  Jul 27 08:22:26 pretzel last message repeated 171286 times
  Jul 27 08:23:27 pretzel last message repeated 344461 times
  Jul 27 08:23:30 pretzel last message repeated 18656 times
 
  I rebooted the client, the problem disappeared, and everything is happy
  again...  but clearly NFSv4 shat itself.  And now I am worried this will
  happen again.
 
  In all my quite-heavy use of NFSv4 I've never seen this behavior before,
so I would call this a regression.
 
  I always run vanilla linux-2.6.git self-built kernels on both client and
  server.

 me too, my server has 2.6.18-? (openSUSE 10.2). On the client
 (2.6.23-rc1-mm1), I also see (shortly before the hang)

 Jul 26 13:09:19 fb07-iapwap2 kernel: =
 Jul 26 13:09:19 fb07-iapwap2 kernel: [ INFO: inconsistent lock state ]
 Jul 26 13:09:19 fb07-iapwap2 kernel: 2.6.23-rc1-mm1 #1
 Jul 26 13:09:19 fb07-iapwap2 kernel: -
 Jul 26 13:09:19 fb07-iapwap2 kernel: inconsistent {softirq-on-W} -
 {in-softirq-W} usage.
 Jul 26 13:09:19 fb07-iapwap2 kernel: hald/3873 [HC0[0]:SC1[1]:HE1:SE0] takes:
 Jul 26 13:09:19 fb07-iapwap2 kernel:  (rpc_credcache_lock){-+..}, at:
 [c01dc166] _atomic_dec_and_lock+0x16/0x60
 Jul 26 13:09:19 fb07-iapwap2 kernel: {softirq-on-W} state was registered at:
 Jul 26 13:09:19 fb07-iapwap2 kernel:   [c013d4f7] mark_lock+0x77/0x630
 Jul 26 13:09:19 fb07-iapwap2 kernel:   [c013c094] add_lock_to_list+0x44/0xc0
 Jul 26 13:09:19 fb07-iapwap2 kernel:   [c013e8af]
 __lock_acquire+0x65f/0x1020
 Jul 26 13:09:19 fb07-iapwap2 kernel:   [c013db0e] mark_held_locks+0x5e/0x80
 Jul 26 13:09:19 fb07-iapwap2 kernel:   [c012448d] local_bh_enable+0x7d/0x130
 Jul 26 13:09:19 fb07-iapwap2 kernel:   [c013f2cf] lock_acquire+0x5f/0x80
 Jul 26 13:09:19 fb07-iapwap2 kernel:   [c01dc166]
 _atomic_dec_and_lock+0x16/0x60
 Jul 26 13:09:19 fb07-iapwap2 kernel:   [c02d156a] _spin_lock+0x2a/0x40
 Jul 26 13:09:19 fb07-iapwap2 kernel:   [c01dc166]
 _atomic_dec_and_lock+0x16/0x60
 Jul 26 13:09:19 fb07-iapwap2 kernel:   [c01dc166]
 _atomic_dec_and_lock+0x16/0x60
 Jul 26 13:09:19 fb07-iapwap2 kernel:   [c02d156a] _spin_lock+0x2a/0x40
 Jul 26 13:09:19 fb07-iapwap2 kernel:   [dcecf770] put_rpccred+0x60/0x110
 [sunrpc]
 Jul 26 13:09:19 fb07-iapwap2 kernel:   [dcecf840]
 rpcauth_unbindcred+0x20/0x60 [sunrpc]
 Jul 26 13:09:19 fb07-iapwap2 kernel:   [dcece1f4] rpc_put_task+0x44/0xb0
 [sunrpc]
 Jul 26 13:09:19 fb07-iapwap2 kernel:   [dcec8ffd] rpc_call_sync+0x2d/0x40
 [sunrpc]
 Jul 26 13:09:19 fb07-iapwap2 kernel:   [dced680d] rpcb_register+0x10d/0x1c0
 [sunrpc]
 Jul 26 13:09:19 fb07-iapwap2 kernel:   [dced06ef] svc_register+0x8f/0x160
 [sunrpc]
 Jul 26 13:09:19 fb07-iapwap2 kernel:   [c013dca5]
 trace_hardirqs_on+0xc5/0x170
 Jul 26 13:09:19 fb07-iapwap2 kernel:   [dced0fc5] __svc_create+0x1d5/0x200
 [sunrpc]
 Jul 26 13:09:19 fb07-iapwap2 kernel:   [dced1160] svc_create+0x10/0x20
 [sunrpc]
 Jul 26 13:09:19 fb07-iapwap2 kernel:   [dcf69602] nfs_callback_up+0x82/0x120
 [nfs]
 Jul 26 13:09:19 fb07-iapwap2 kernel:   [dcf43efd] nfs_get_client+0x17d/0x390
 [nfs]
 Jul 26 13:09:19 fb07-iapwap2 kernel:   [c0175f59] kmem_cache_alloc+0x79/0xd0
 Jul 26 13:09:19 fb07-iapwap2 kernel:   [c013dca5]
 trace_hardirqs_on+0xc5/0x170
 Jul 26 13:09:19 fb07-iapwap2 kernel:   [dcf44144] nfs4_set_client+0x34/0x1a0
 [nfs]
 Jul 26 13:09:19 fb07-iapwap2 kernel:   [dcf448d1]
 nfs4_create_server+0x61/0x3c0 [nfs]
 Jul 26 13:09:19 fb07-iapwap2 kernel:   [c017657f] 

Re: [PATCH][RFC] 4K stacks default, not a debug thing any more...?

2007-07-27 Thread Satyam Sharma
On 7/27/07, Alan Cox [EMAIL PROTECTED] wrote:
  Maybe I should resurrect it  send it out...

Hmm, something that hooks in not only at do_IRQ time (as the present
in-mainline stackoverflow check thing)?

  (FWIW I think I recall that the warning itself sometimes tipped the
  scales enough on 4k stacks to bring the box down)

 You can always switch stack for the printk and it probably should panic
 at that point and give a trace then die as that is what we are trying to
 prove does not occur

Yes, only yesterday I saw exactly this happening DEBUG_STACKOVERFLOW
when doing a udf - pktcdvd - cdrom - ide_cd thing. It's one of those
reproducible will-crash-4k-stacks tests, especially if you have debug stuff
enabled in your build that would make on-stack structures (where such
exist on the codepath) a bit heavier.

Admittedly, what seems to have happened is a bit pathological:

[  481.836378] cdrom: entering cdrom_count_tracks
[  481.844266] BUG: sleeping function called from invalid context at
include/asm/semaphore.h:98
[  481.844434] do_IRQ: stack overflow: 164
[  481.844540]  [c0405cfe] show_trace_log_lvl+0x19/0x2e
[  481.844707]  [c0405dfe] show_trace+0x12/0x14
[  481.844867]  [c0405e14] dump_stack+0x14/0x16
[  481.845027]  [c0406ff6] do_IRQ+0x7b/0xe1
[  481.845186]  [c040583e] common_interrupt+0x2e/0x34
[  481.845348]  [c042b8e7] printk+0x1b/0x1d
[  481.845507]  [c0422c05] __might_sleep+0x81/0xdc
[  481.845668]  [c066d869] __reacquire_kernel_lock+0x2d/0x4f
[  481.845833]  [c066b09b] schedule+0x78a/0x7a4
[  481.845996]  [c066b538] wait_for_completion+0x72/0x97
[  481.846160]  [c05937a6] ide_do_drive_cmd+0xeb/0x109
[  481.846324]  [f89172a2] cdrom_queue_packet_command+0x40/0xc5 [ide_cd]
[  481.846503]  [f89175b7] ide_cdrom_packet+0x86/0xa4 [ide_cd]
[  481.846669]  [f8854dc1] cdrom_get_disc_info+0x48/0x87 [cdrom]
[  481.846839]  [f8854ec6] cdrom_get_last_written+0x2a/0xfe [cdrom]
[  481.847009]  [f891831b] cdrom_read_toc+0x39d/0x3f3 [ide_cd]
[  481.847231]  [f8918e7e] ide_cdrom_audio_ioctl+0x130/0x1ce [ide_cd]
[  481.847414]  [f8854123] cdrom_count_tracks+0x5c/0x126 [cdrom]
[  481.847583]  [f8855688] cdrom_open+0x147/0x79c [cdrom]
[  481.847748]  [f891799a] idecd_open+0x75/0x8a [ide_cd]
[  481.847912]  [c04aac0e] do_open+0x1d1/0x284
[  481.848079]  [c04aad89] __blkdev_get+0x73/0x7e
[  481.848242]  [c04aada9] blkdev_get+0x15/0x17
[  481.848411]  [f8b34b6b] pkt_open+0x99/0xc6e [pktcdvd]
[  481.848583]  [c04aaad3] do_open+0x96/0x284
[  481.848745]  [c04aad89] __blkdev_get+0x73/0x7e
[  481.848910]  [c04aada9] blkdev_get+0x15/0x17

(... the trace cut off there, and then the box froze hard, no sysrq ...)

The mount(2) hit the wait_for_completion() in ide_do_drive_cmd(),
little stack was left at this point. But then I have no idea why the
__reacquire_kernel_lock() from schedule() gave a might_sleep() there,
the code in sched.c and kernel_lock.c looks obviously correct -- the
down(kernel_sem) only happens with both irqs and preemption on.

Anyway, the second line of printk() in __might_sleep (the one that
tells us in_atomic() and irqs_disabled()) was about to be printed when
an interrupt decided to join the fun. do_IRQ() comes in, with debug
stackoverflows on, it notices that only 164 bytes worth of stack is left
and decides to dump_stack ... and while we were doing just that,
we died. (this was 2.6.23-rc1-mm1)
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: LinuxPPS spinlocks

2007-07-27 Thread Satyam Sharma
Hi Rodolfo,

On 7/28/07, Rodolfo Giometti [EMAIL PROTECTED] wrote:
 On Fri, Jul 27, 2007 at 01:40:14PM -0600, Chris Friesen wrote:
 
  My point is that the lock should be used to protect specific data. Thus, it
  would be more correct to say, spinlock foo is taken because
  pps_register_source() accesses variable bar.
 
  That way, if someone else wants to access bar, they know that they need
  to take lock foo.

 Ah, ok! I see. :)

I only glanced through the code, so could be wrong, but I noticed that
the only global / shared data you have in there is a global pps_source
array of pps_s structs. That's accessed / modified from the various
syscalls introduced in the API exported to userspace, as well as the
register/unregister/pps_event API exported to in-kernel client subsystems,
yes? So it looks like you need to introduce proper locking for it, simply
type-qualifying it as volatile is not enough.

However, I think you've introduced two locks for it. The syscalls (that
run in process context, obviously) seem to use a pps_mutex and
pps_event() seems to be using the pps_lock spinlock (because that
gets executed from interrupt context) -- and from the looks of it, the
register/unregister functions are using /both/ the mutex and spinlock (!)

This isn't quite right, (in fact there's nothing to protect pps_event from
racing against a syscall), so you should use *only* the spinlock for
synchronization -- the spin_lock_irqsave/restore() variants, in fact.

[ Also, have you considered making pps_source a list and not an array?
It'll help you lose a whole lot of MAX_SOURCES, pps_is_allocated, etc
kind of gymnastics in there, and you _can_ return a pointer to the
corresponding pps source struct from the register() function to the in-kernel
users, so that way you get to retain the O(1) access to the corresponding
source when a client calls into pps_event(), similar to how you're using the
array index presently. ]

I also noticed code like (from pps_event):

+   /* Try to grab the lock, if not we prefere loose the event... */
+   if (!spin_trylock(pps_lock))
+   return;

which looks worrisome and unnecessary. That spinlock looks to be of
fine enough granularity to me, do you think there'd be any contention
on it? I /think/ you can simply make that a spin_lock().

Overall the code looks simple / straightforward enough to me (except for
the parport / uart stuff that I have no clue about), and I'll also read up on
the relevant RFC for this and would hopefully try and give you a more
meaningful review over the weekend.

Thanks,
Satyam
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: LinuxPPS spinlocks

2007-07-27 Thread Satyam Sharma
Hi,

On 7/28/07, Satyam Sharma [EMAIL PROTECTED] wrote:
 Hi Rodolfo,

 On 7/28/07, Rodolfo Giometti [EMAIL PROTECTED] wrote:
  On Fri, Jul 27, 2007 at 01:40:14PM -0600, Chris Friesen wrote:
  
   My point is that the lock should be used to protect specific data. Thus, 
   it
   would be more correct to say, spinlock foo is taken because
   pps_register_source() accesses variable bar.
  
   That way, if someone else wants to access bar, they know that they need
   to take lock foo.
 
  Ah, ok! I see. :)

 I only glanced through the code, so could be wrong, but I noticed that
 the only global / shared data you have in there is a global pps_source
 array of pps_s structs. That's accessed / modified from the various
 syscalls introduced in the API exported to userspace, as well as the
 register/unregister/pps_event API exported to in-kernel client subsystems,
 yes? So it looks like you need to introduce proper locking for it, simply
 type-qualifying it as volatile is not enough.

 However, I think you've introduced two locks for it. The syscalls (that
 run in process context, obviously) seem to use a pps_mutex and
 pps_event() seems to be using the pps_lock spinlock (because that
 gets executed from interrupt context) -- and from the looks of it, the
 register/unregister functions are using /both/ the mutex and spinlock (!)

 This isn't quite right, (in fact there's nothing to protect pps_event from
 racing against a syscall), so you should use *only* the spinlock for
 synchronization -- the spin_lock_irqsave/restore() variants, in fact.

Take the race between the time_pps_setparams() syscall and a concurrent
pps_event() from an interrupt for instance. From sys_time_pps_setparams,
the parameters for an existing source are not modified / set atomically,
which means a pps_event() called on the same source in between will see
invalid parameters ... and bad things will happen.

 [ Also, have you considered making pps_source a list and not an array?
 It'll help you lose a whole lot of MAX_SOURCES, pps_is_allocated, etc
 kind of gymnastics in there, and you _can_ return a pointer to the
 corresponding pps source struct from the register() function to the in-kernel
 users, so that way you get to retain the O(1) access to the corresponding
 source when a client calls into pps_event(), similar to how you're using the
 array index presently. ]

I think the above would be sane and safe -- your driver has pretty simple
lifetime rules, and sources are only created / destroyed from within kernel,
as and when clients call pps_register_source() and pps_unregister_source().
So pps_event() can be called on a given source only between the
corresponding register() and unregister() -- which means register() can
return us a reference/pointer on the source after allocating / adding it to
the list (instead of the fixed array index as it presently is), which remains
valid for the entire duration of the source, till unregister() is called, after
which we can't be calling pps_event() on the same source anyway.

 I also noticed code like (from pps_event):

 +   /* Try to grab the lock, if not we prefere loose the event... */
 +   if (!spin_trylock(pps_lock))
 +   return;

 which looks worrisome and unnecessary. That spinlock looks to be of
 fine enough granularity to me, do you think there'd be any contention
 on it? I /think/ you can simply make that a spin_lock().

 Overall the code looks simple / straightforward enough to me (except for
 the parport / uart stuff that I have no clue about), and I'll also read up on
 the relevant RFC for this and would hopefully try and give you a more
 meaningful review over the weekend.

Ok, I've looked through (most of) the RFC and code now, and am only
commenting on a design-level for now. Anyway, I didn't like the way
you've significantly drifted from the RFC in several ways:

1. The RFC mandates no such userspace interface / syscall as the
time_pps_cmd() that you've implemented -- it looks, smells, and feels
like an ioctl, in fact that's what it is for practical purposes. I'm confused
as to why didn't you just go ahead and implement the special-file-and-
file-descriptor based approach as advocated / mandated there.

[ You've implemented the (optional, as per RFC) time_pps_findsource
operation in the kernel using the above pseudo-ioctl, but that wasn't
necessary -- as the RFC itself illustrates, it's something that can easily
be done (in fact should be done) completely in userspace itself. ]

2. If you fix the above two issues, you'll notice that you don't need to
short-circuit the (RFC-mandated) time_pps_create/destroy(handle)
syscalls in the userspace header/library anymore, as you presently are.

Here's how I'd go about desiging/implementing this:

* At the time of pps_register_source()  -- called by an in-kernel client
subsystem that creates a PPS source -- allocate a pps source, generate
an identifier for it, instantiate a special file -- the RFC does not mention
whether

Re: [PATCH 4/7] eCryptfs: Comments for some structs

2007-07-25 Thread Satyam Sharma

Trivial nits ...

On 7/26/07, Michael Halcrow <[EMAIL PROTECTED]> wrote:

[...]
+/**
+ * ecryptfs_global_auth_tok structs refer to authentication token keys
+ * in the user keyring that apply to newly created files. A list of
+ * these objects hangs off of the mount_crypt_stat struct for any
+ * given eCryptfs mount. This struct maintains a reference to both the
+ * key contents and the key itself so that the key can be put on
+ * unmount.
+ */


/** is used to annotate kernel-doc style comments, which this
one isn't -- IIRC, kernel-doc doesn't like this (?)


 struct ecryptfs_global_auth_tok {
 #define ECRYPTFS_AUTH_TOK_INVALID 0x0001
u32 flags;
-   struct list_head mount_crypt_stat_list;
-   struct key *global_auth_tok_key;
-   struct ecryptfs_auth_tok *global_auth_tok;
-   unsigned char sig[ECRYPTFS_SIG_SIZE_HEX + 1];



+   struct list_head mount_crypt_stat_list; /* Default auth_tok list for
+* the mount_crypt_stat */
+   struct key *global_auth_tok_key; /* The key from the user's keyring for
+ * the sig */


Tsk. You could consider using kernel-doc style itself to comment the
structure -- this stuff goes up there and doesn't look icky.


+   struct ecryptfs_auth_tok *global_auth_tok; /* The key contents */
+   unsigned char sig[ECRYPTFS_SIG_SIZE_HEX + 1]; /* The key identifier */
 };

+/**
+ * Typically, eCryptfs will use the same ciphers repeatedly throughout
+ * the course of its operations. In order to avoid unnecessarily
+ * destroying and initializing the same cipher repeatedly, eCryptfs
+ * keeps a list of crypto API contexts around to use when needed.
+ */


Again, you could consider using kernel-doc style comments here.


 struct ecryptfs_key_tfm {
struct crypto_blkcipher *key_tfm;
size_t key_size;
struct mutex key_tfm_mutex;
-   struct list_head key_tfm_list;
+   struct list_head key_tfm_list; /* The module's tfm list */
unsigned char cipher_name[ECRYPTFS_MAX_CIPHER_NAME_SIZE + 1];
 };



Satyam
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH -mm] dma: INTEL_IOATDMA build fix

2007-07-25 Thread Satyam Sharma
Make CONFIG_INTEL_IOATDMA select CONFIG_DCA because it uses code
exported from said dependency:

# CONFIG_DCA is not set
CONFIG_INTEL_IOATDMA=m

ERROR: "alloc_dca_provider" [drivers/dma/ioatdma.ko] undefined!
ERROR: "register_dca_provider" [drivers/dma/ioatdma.ko] undefined!
ERROR: "unregister_dca_provider" [drivers/dma/ioatdma.ko] undefined!
ERROR: "free_dca_provider" [drivers/dma/ioatdma.ko] undefined!
make[1]: *** [__modpost] Error 1

"select" seems ok because CONFIG_DCA looks library-like and doesn't itself
depend upon anything else.

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>

---

 drivers/dma/Kconfig |1 +
 1 files changed, 1 insertion(+)

diff -ruNp a/drivers/dma/Kconfig b/drivers/dma/Kconfig
--- a/drivers/dma/Kconfig   2007-07-26 03:31:47.0 +0530
+++ b/drivers/dma/Kconfig   2007-07-26 03:32:14.0 +0530
@@ -28,6 +28,7 @@ comment "DMA Devices"
 config INTEL_IOATDMA
tristate "Intel I/OAT DMA support"
depends on DMA_ENGINE && PCI
+   select DCA
default m
---help---
  Enable support for the Intel(R) I/OAT DMA engine.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ck] Re: -mm merge plans for 2.6.23

2007-07-25 Thread Satyam Sharma

Hi Ingo,

[ Going off-topic, nothing related to swap/prefetch/etc. Just getting
a hang of how development goes on here ... ]

On 7/25/07, Ingo Molnar <[EMAIL PROTECTED]> wrote:


* Rene Herman <[EMAIL PROTECTED]> wrote:

> Nick Piggin is the person to convince it seems and if I've read things
> right (I only stepped into this thing at the updatedb mention, so
> maybe I haven't) his main question is _why_ the hell it helps
> updatedb. [...]

btw., i'd like to make this clear: if you want stuff to go upstream, do
not concentrate on 'convincing the maintainer'.


It's not so easy or clear-cut, see below.


Instead concentrate on understanding the _problem_,


Of course -- that's a given.


concentrate on
making sure that both you and the maintainer understands the problem
correctly,


This itself may require some "convincing" to do. What if the maintainer
just doesn't recognize the problem? Note that the development model
here is more about the "social" thing than purely a "technical" thing.
People do handwave, possibly due to innocent misunderstandings,
possibly without. Often it's just a case of seeing different reasons behind
the "problematic behaviour". Or it could be a case of all of the above.


possibly write some testcase that clearly exposes it, and


Oh yes -- that'll be helpful, but definitely not necessarily a prerequisite
for all issues, and then you can't even expect everybody to write or
test/benchmark with testcases. (oh, btw, this is assuming you do find
consensus on a testcase)


help the maintainer debug the problem.


Umm ... well. Should this "dance-with-the-maintainer" and all be really
necessary? What you're saying is easy if a "bug" is simple and objective,
with mathematically few (probably just one) possible correct solutions.
Often (most often, in fact) it's a subjective issue -- could be about APIs,
high level design, tradeoffs, even little implementation nits ... with one
person wanting to do it one way, another thinks there's something hacky
or "band-aidy" about it and a more beautiful/elegant solution exists elsewhere.
I think there's a similar deadlock here (?)


_Optionally_, if you find joy in
it, you are also free to write a proposed solution for that problem


Oh yes. But why "optionally"? This is *precisely* what the spirit of
development in such open / distributed projects is ... unless Linux
wants to die the same, slow, ivory-towered, miserable death that
*BSD have.


and
submit it to the maintainer.


Umm, ok ... pretty unlikely Linus or Andrew would take patches for any
kernel subsystem (that isn't obvious/trivial) from anybody just like that,
so you do need to Cc: the ones they trust (maintainer) to ensure they
review/ack your work and pick it up.


But a "here is a solution, take it or leave it" approach,


Agreed. That's definitely not the way to go.


before having
communicated the problem to the maintainer


Umm, well this could depend from problem-to-problem.


and before having debugged
the problem


Again, agreed -- but people can plausibly see different root causes for
the same symptoms -- and different solutions.


is the wrong way around. It might still work out fine if the
solution is correct (especially if the patch is small and obvious), but
if there are any non-trivial tradeoffs involved, or if nontrivial amount
of code is involved, you might see your patch at the end of a really
long (and constantly growing) waiting list of patches.


That's the whole point. For non-trivial / non-obvious / subjective issues,
the "process" you laid out above could itself become a problem ...

Satyam
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Print utsname on Oops on all architectures

2007-07-25 Thread Satyam Sharma

On 7/25/07, Andrew Morton <[EMAIL PROTECTED]> wrote:

On Thu, 5 Jul 2007 18:52:27 -0700 (PDT) Joshua Wise <[EMAIL PROTECTED]> wrote:

> Background:
>  This patch is a follow-on to "Info dump on Oops or panic()" [1].
>
>  On some architectures, the kernel printed some information on the running
>  kernel, but not on all architectures. The information printed was generally
>  the version and build number, but it was not located in a consistant place,
>  and some architectures did not print it at all.
>
> Description:
>  This patch uses the already-existing die_chain to print utsname information
>  on Oops. This patch also removes the architecture-specific utsname
>  printers. To avoid crashing the system further (and hence not printing the
>  Oops) in the case where the system is so hopelessly smashed that utsname
>  might be destroyed, we vsprintf the utsname data into a static buffer
>  first, and then just print that on crash.
>
> Testing:
>  I wrote a module that does a *(int*)0 = 0; and observed that I got my
>  utsname data printed.
>
> Potential impact:
>  This adds another line to the Oops output, causing the first few lines to
>  potentially scroll off the screen. This also adds a few more pointer
>  dereferences in the Oops path, because it adds to the die_chain notifier
>  chain, reducing the likelihood that the Oops will be printed if there is
>  very bad memory corruption.

There are strange happenings due to this patch on i386:

Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
hdc: max request size: 128KiB
hdc: 156355584 sectors (80054 MB) w/1819KiB Cache, CHS=65535/16/63, UDMA(33)
hdc: cache flushes supported
 hdc:<0>Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
 hdc1 hdc2 hdc3 hdc4 <<0>Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 
2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
 hdc5<0>Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
 hdc6 >
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
initcall 0xc052a060: idedisk_init+0x0/0x10() returned 0.
initcall 0xc052a060 ran for 38 msecs: idedisk_init+0x0/0x10()
Calling initcall 0xc052a070: ide_cdrom_init+0x0/0x10()
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
initcall 0xc052a070: ide_cdrom_init+0x0/0x10() returned 0.
initcall 0xc052a070 ran for 3 msecs: ide_cdrom_init+0x0/0x10()
Calling initcall 0xc052a080: idetape_init+0x0/0x90()
initcall 0xc052a080: idetape_init+0x0/0x90() returned 0.
initcall 0xc052a080 ran for 0 msecs: idetape_init+0x0/0x90()
Calling initcall 0xc052a110: idefloppy_init+0x0/0x20()
ide-floppy driver 0.99.newide

Re: [DRIVER SUBMISSION] DRBD wants to go mainline

2007-07-25 Thread Satyam Sharma

On 7/25/07, Lars Ellenberg <[EMAIL PROTECTED]> wrote:

On Wed, Jul 25, 2007 at 04:41:53AM +0530, Satyam Sharma wrote:
> [...]
>
> But where does the "send" come into the picture over here -- a send
> won't block forever, so I don't foresee any issues whatsoever w.r.t.
> kthreads conversion for that. [ BTW I hope you're *not* using any
> signals-based interface for your kernel thread _at all_. Kthreads
> disallow (ignore) all signals by default, as they should, and you really
> shouldn't need to write any logic to handle or do-certain-things-on-seeing
> a signal in a well designed kernel thread. ]
>
> >and the sending
> >latency is crucial to performance, while the recv
> >will not timeout for the next few seconds.
>
> Again, I don't see what sending latency has to do with a kernel_thread
> to kthread conversion. Or with signals, for that matter. Anyway, as
> Kyle Moffett mentioned elsewhere, you could probably look at other
> examples (say cifs_demultiplexer_thread() in fs/cifs/connect.c).

the basic problem, and what we use signals for, is:

it is waiting in recv, waiting for the peer to say something.
but I want it to stop recv, and go send something "right now".


That's ... weird. Most (all?) communication between any two parties
would follow a protocol where someone recv's stuff, does something
with it, and sends it back ... what would you send "right now" if you
didn't receive anything?


I don't want to have two threads for that.


I really think you should -- you clearly should. From the above, it does
appear that you're mixing in multiple kinds of stuff into a single thread,
and thus mucking up the entire design (and implementation).


yes we have timeo in place, anyways: we need to detect a failed peer
node in time. we even aim for "sub-second failover" sometimes (which is
not exactly feasible; but failover times of 15 seconds and less are
requirement for useable HA-iSCSI deployments).
but that does not cut it, timeo is seconds.
you don't want seconds latency for IO operations.


Ok, that's a reasonable goal.


so I signal it, it breaks out of recv, then sends, and goes back to recv.

in-kernel epoll would probably solve this.
I don't know how to do that properly, though.


Hmm, probably I don't understand what you're doing, how you're doing etc.
Will wait for the design & implementation docs.

Thanks,
Satyam
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [DRIVER SUBMISSION] DRBD wants to go mainline

2007-07-25 Thread Satyam Sharma

On 7/25/07, Lars Ellenberg [EMAIL PROTECTED] wrote:

On Wed, Jul 25, 2007 at 04:41:53AM +0530, Satyam Sharma wrote:
 [...]

 But where does the send come into the picture over here -- a send
 won't block forever, so I don't foresee any issues whatsoever w.r.t.
 kthreads conversion for that. [ BTW I hope you're *not* using any
 signals-based interface for your kernel thread _at all_. Kthreads
 disallow (ignore) all signals by default, as they should, and you really
 shouldn't need to write any logic to handle or do-certain-things-on-seeing
 a signal in a well designed kernel thread. ]

 and the sending
 latency is crucial to performance, while the recv
 will not timeout for the next few seconds.

 Again, I don't see what sending latency has to do with a kernel_thread
 to kthread conversion. Or with signals, for that matter. Anyway, as
 Kyle Moffett mentioned elsewhere, you could probably look at other
 examples (say cifs_demultiplexer_thread() in fs/cifs/connect.c).

the basic problem, and what we use signals for, is:

it is waiting in recv, waiting for the peer to say something.
but I want it to stop recv, and go send something right now.


That's ... weird. Most (all?) communication between any two parties
would follow a protocol where someone recv's stuff, does something
with it, and sends it back ... what would you send right now if you
didn't receive anything?


I don't want to have two threads for that.


I really think you should -- you clearly should. From the above, it does
appear that you're mixing in multiple kinds of stuff into a single thread,
and thus mucking up the entire design (and implementation).


yes we have timeo in place, anyways: we need to detect a failed peer
node in time. we even aim for sub-second failover sometimes (which is
not exactly feasible; but failover times of 15 seconds and less are
requirement for useable HA-iSCSI deployments).
but that does not cut it, timeo is seconds.
you don't want seconds latency for IO operations.


Ok, that's a reasonable goal.


so I signal it, it breaks out of recv, then sends, and goes back to recv.

in-kernel epoll would probably solve this.
I don't know how to do that properly, though.


Hmm, probably I don't understand what you're doing, how you're doing etc.
Will wait for the design  implementation docs.

Thanks,
Satyam
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Print utsname on Oops on all architectures

2007-07-25 Thread Satyam Sharma

On 7/25/07, Andrew Morton [EMAIL PROTECTED] wrote:

On Thu, 5 Jul 2007 18:52:27 -0700 (PDT) Joshua Wise [EMAIL PROTECTED] wrote:

 Background:
  This patch is a follow-on to Info dump on Oops or panic() [1].

  On some architectures, the kernel printed some information on the running
  kernel, but not on all architectures. The information printed was generally
  the version and build number, but it was not located in a consistant place,
  and some architectures did not print it at all.

 Description:
  This patch uses the already-existing die_chain to print utsname information
  on Oops. This patch also removes the architecture-specific utsname
  printers. To avoid crashing the system further (and hence not printing the
  Oops) in the case where the system is so hopelessly smashed that utsname
  might be destroyed, we vsprintf the utsname data into a static buffer
  first, and then just print that on crash.

 Testing:
  I wrote a module that does a *(int*)0 = 0; and observed that I got my
  utsname data printed.

 Potential impact:
  This adds another line to the Oops output, causing the first few lines to
  potentially scroll off the screen. This also adds a few more pointer
  dereferences in the Oops path, because it adds to the die_chain notifier
  chain, reducing the likelihood that the Oops will be printed if there is
  very bad memory corruption.

There are strange happenings due to this patch on i386:

Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
hdc: max request size: 128KiB
hdc: 156355584 sectors (80054 MB) w/1819KiB Cache, CHS=65535/16/63, UDMA(33)
hdc: cache flushes supported
 hdc:0Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
 hdc1 hdc2 hdc3 hdc4 0Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 
2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
 hdc50Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
 hdc6 
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
initcall 0xc052a060: idedisk_init+0x0/0x10() returned 0.
initcall 0xc052a060 ran for 38 msecs: idedisk_init+0x0/0x10()
Calling initcall 0xc052a070: ide_cdrom_init+0x0/0x10()
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686
initcall 0xc052a070: ide_cdrom_init+0x0/0x10() returned 0.
initcall 0xc052a070 ran for 3 msecs: ide_cdrom_init+0x0/0x10()
Calling initcall 0xc052a080: idetape_init+0x0/0x90()
initcall 0xc052a080: idetape_init+0x0/0x90() returned 0.
initcall 0xc052a080 ran for 0 msecs: idetape_init+0x0/0x90()
Calling initcall 0xc052a110: idefloppy_init+0x0/0x20()
ide-floppy driver 0.99.newide
initcall 0xc052a110: 

Re: [ck] Re: -mm merge plans for 2.6.23

2007-07-25 Thread Satyam Sharma

Hi Ingo,

[ Going off-topic, nothing related to swap/prefetch/etc. Just getting
a hang of how development goes on here ... ]

On 7/25/07, Ingo Molnar [EMAIL PROTECTED] wrote:


* Rene Herman [EMAIL PROTECTED] wrote:

 Nick Piggin is the person to convince it seems and if I've read things
 right (I only stepped into this thing at the updatedb mention, so
 maybe I haven't) his main question is _why_ the hell it helps
 updatedb. [...]

btw., i'd like to make this clear: if you want stuff to go upstream, do
not concentrate on 'convincing the maintainer'.


It's not so easy or clear-cut, see below.


Instead concentrate on understanding the _problem_,


Of course -- that's a given.


concentrate on
making sure that both you and the maintainer understands the problem
correctly,


This itself may require some convincing to do. What if the maintainer
just doesn't recognize the problem? Note that the development model
here is more about the social thing than purely a technical thing.
People do handwave, possibly due to innocent misunderstandings,
possibly without. Often it's just a case of seeing different reasons behind
the problematic behaviour. Or it could be a case of all of the above.


possibly write some testcase that clearly exposes it, and


Oh yes -- that'll be helpful, but definitely not necessarily a prerequisite
for all issues, and then you can't even expect everybody to write or
test/benchmark with testcases. (oh, btw, this is assuming you do find
consensus on a testcase)


help the maintainer debug the problem.


Umm ... well. Should this dance-with-the-maintainer and all be really
necessary? What you're saying is easy if a bug is simple and objective,
with mathematically few (probably just one) possible correct solutions.
Often (most often, in fact) it's a subjective issue -- could be about APIs,
high level design, tradeoffs, even little implementation nits ... with one
person wanting to do it one way, another thinks there's something hacky
or band-aidy about it and a more beautiful/elegant solution exists elsewhere.
I think there's a similar deadlock here (?)


_Optionally_, if you find joy in
it, you are also free to write a proposed solution for that problem


Oh yes. But why optionally? This is *precisely* what the spirit of
development in such open / distributed projects is ... unless Linux
wants to die the same, slow, ivory-towered, miserable death that
*BSD have.


and
submit it to the maintainer.


Umm, ok ... pretty unlikely Linus or Andrew would take patches for any
kernel subsystem (that isn't obvious/trivial) from anybody just like that,
so you do need to Cc: the ones they trust (maintainer) to ensure they
review/ack your work and pick it up.


But a here is a solution, take it or leave it approach,


Agreed. That's definitely not the way to go.


before having
communicated the problem to the maintainer


Umm, well this could depend from problem-to-problem.


and before having debugged
the problem


Again, agreed -- but people can plausibly see different root causes for
the same symptoms -- and different solutions.


is the wrong way around. It might still work out fine if the
solution is correct (especially if the patch is small and obvious), but
if there are any non-trivial tradeoffs involved, or if nontrivial amount
of code is involved, you might see your patch at the end of a really
long (and constantly growing) waiting list of patches.


That's the whole point. For non-trivial / non-obvious / subjective issues,
the process you laid out above could itself become a problem ...

Satyam
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/7] eCryptfs: Comments for some structs

2007-07-25 Thread Satyam Sharma

Trivial nits ...

On 7/26/07, Michael Halcrow [EMAIL PROTECTED] wrote:

[...]
+/**
+ * ecryptfs_global_auth_tok structs refer to authentication token keys
+ * in the user keyring that apply to newly created files. A list of
+ * these objects hangs off of the mount_crypt_stat struct for any
+ * given eCryptfs mount. This struct maintains a reference to both the
+ * key contents and the key itself so that the key can be put on
+ * unmount.
+ */


/** is used to annotate kernel-doc style comments, which this
one isn't -- IIRC, kernel-doc doesn't like this (?)


 struct ecryptfs_global_auth_tok {
 #define ECRYPTFS_AUTH_TOK_INVALID 0x0001
u32 flags;
-   struct list_head mount_crypt_stat_list;
-   struct key *global_auth_tok_key;
-   struct ecryptfs_auth_tok *global_auth_tok;
-   unsigned char sig[ECRYPTFS_SIG_SIZE_HEX + 1];



+   struct list_head mount_crypt_stat_list; /* Default auth_tok list for
+* the mount_crypt_stat */
+   struct key *global_auth_tok_key; /* The key from the user's keyring for
+ * the sig */


Tsk. You could consider using kernel-doc style itself to comment the
structure -- this stuff goes up there and doesn't look icky.


+   struct ecryptfs_auth_tok *global_auth_tok; /* The key contents */
+   unsigned char sig[ECRYPTFS_SIG_SIZE_HEX + 1]; /* The key identifier */
 };

+/**
+ * Typically, eCryptfs will use the same ciphers repeatedly throughout
+ * the course of its operations. In order to avoid unnecessarily
+ * destroying and initializing the same cipher repeatedly, eCryptfs
+ * keeps a list of crypto API contexts around to use when needed.
+ */


Again, you could consider using kernel-doc style comments here.


 struct ecryptfs_key_tfm {
struct crypto_blkcipher *key_tfm;
size_t key_size;
struct mutex key_tfm_mutex;
-   struct list_head key_tfm_list;
+   struct list_head key_tfm_list; /* The module's tfm list */
unsigned char cipher_name[ECRYPTFS_MAX_CIPHER_NAME_SIZE + 1];
 };



Satyam
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH -mm] dma: INTEL_IOATDMA build fix

2007-07-25 Thread Satyam Sharma
Make CONFIG_INTEL_IOATDMA select CONFIG_DCA because it uses code
exported from said dependency:

# CONFIG_DCA is not set
CONFIG_INTEL_IOATDMA=m

ERROR: alloc_dca_provider [drivers/dma/ioatdma.ko] undefined!
ERROR: register_dca_provider [drivers/dma/ioatdma.ko] undefined!
ERROR: unregister_dca_provider [drivers/dma/ioatdma.ko] undefined!
ERROR: free_dca_provider [drivers/dma/ioatdma.ko] undefined!
make[1]: *** [__modpost] Error 1

select seems ok because CONFIG_DCA looks library-like and doesn't itself
depend upon anything else.

Signed-off-by: Satyam Sharma [EMAIL PROTECTED]

---

 drivers/dma/Kconfig |1 +
 1 files changed, 1 insertion(+)

diff -ruNp a/drivers/dma/Kconfig b/drivers/dma/Kconfig
--- a/drivers/dma/Kconfig   2007-07-26 03:31:47.0 +0530
+++ b/drivers/dma/Kconfig   2007-07-26 03:32:14.0 +0530
@@ -28,6 +28,7 @@ comment DMA Devices
 config INTEL_IOATDMA
tristate Intel I/OAT DMA support
depends on DMA_ENGINE  PCI
+   select DCA
default m
---help---
  Enable support for the Intel(R) I/OAT DMA engine.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: crash with 2.6.22.1 crash:ll_rw_blk.c blk_remove_plug()

2007-07-24 Thread Satyam Sharma

On 7/23/07, Jens Axboe <[EMAIL PROTECTED]> wrote:

On Sun, Jul 22 2007, Satyam Sharma wrote:
> Hi Walter,
>
> Thanks for reporting this.
>
> On 7/22/07, walter harms <[EMAIL PROTECTED]> wrote:
>> hello all,
>> on my asus notebook tm620 there is a crash with 2.6.22 and 2.6.21
>
> Did this happen when you were resuming from a suspend-to-ram/disk?
> [ I ask because I see swsusp in the trace below, linux-pm added to Cc: ]
>
>> 
>> Using IPI Shortcut mode
>> WARNING: at block/ll_rw_blk.c:1575 blk_remove_plug()
>>  [] blk_remove_plug+0x36/0x5a
>>  [] __generic_unplug_device+0x14/0x1f
>>  [] __make_request+0x39b/0x49c
>>  [] generic_make_request+0x228/0x255
>>  [] submit_bio+0xa5/0xac
>>  [] mempool_alloc+0x37/0xae
>>  [] submit+0xc2/0x11d
>>  [] bio_read_page+0x24/0x27
>>  [] swsusp_check+0x4f/0xaf
>>  [] software_resume+0x5f/0x108
>>  [] kernel_init+0xb0/0x212
>>  [] ret_from_fork+0x6/0x1c
>>  [] kernel_init+0x0/0x212
>>  [] kernel_init+0x0/0x212
>>  [] kernel_thread_helper+0x7/0x10
>>  ===
>
> Surprising, that's a WARN_ON(!irqs_disabled()) but IRQs are disabled
> alright on that codepath. OTOH, __make_request() is heavily goto-driven,
> uses the non-save/restore variants of spin_lock_irq, and does not even
> balance locks / unlocks for some error paths ... gaah.

__make_request() must be called from process context, hence
spin_lock_irq() is perfectly already and the fastest way to go. And of
course the locking is balanced! So please save your 'gaah's for code
you actually took the time to try and understand.


You're right, I didn't really look at that code for long (it even explicitly
comments about what's going with the locking in there!) sorry about
that.

[ Off-topic: BTW does every call to __make_request() end up in
blk_remove_plug()? Since you're explicitly making the assumption
that it *must* be called from process context (and hence the use of
the non-save/restore variants), you could consider putting a
WARN_ON(irqs_disabled()) over there, and perhaps a WARN_ON
(!spin_is_locked(queue_lock)) in blk_remove_plug() instead, and
other such similar functions that currently have the !irqs_disabled
check. This way you'd effectively cover _both_ the assertions,
and in appropriate places -- just a suggestion. ]


But it does look like unbalanced irq disable/enable calls. I'd guess in
the suspend/resume path. Obviously something more esoteric, since this
is the first such report for 2.6.22, so like some not-very-used driver
for instance.


Now that I do look at the codepath, it does seem surprising irqs were
not disabled there. There are a bunch of calls to _other_ functions
between the spin_lock_irq and the blk_remove_plug via
__generic_unplug_device that would also have complained about
!irqs_disabled.

Walter, does this happen reproducibly?

Satyam
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [DRIVER SUBMISSION] DRBD wants to go mainline

2007-07-24 Thread Satyam Sharma

Hi Lars,


On 7/24/07, Lars Ellenberg <[EMAIL PROTECTED]> wrote:

On Mon, Jul 23, 2007 at 07:10:58PM +0530, Satyam Sharma wrote:
> On 7/23/07, Lars Ellenberg <[EMAIL PROTECTED]> wrote:
> >On Sun, Jul 22, 2007 at 09:32:02PM -0400, Kyle Moffett wrote:
> >[...]
> >> Don't use signals between kernel threads, use proper primitives like
> >> notifiers and waitqueues, which means you should also probably switch
> >away
> >> from kernel_thread() to the kthread_*() APIs.  Also you should fix this
> >> FIXME or remove it if it no longer applies:-D.
> >
> >right.
> >but how to I tell a network thread in tcp_recvmsg to stop early,
> >without using signals?
>
> Yup, kthreads API cannot handle (properly stop) kernel threads that want
> to sleep on possibly-blocking-forever-till-signalled-functions such as
> tcp_recvmsg or skb_recv_datagram etc etc.
>
> There are two workarounds:
> 1. Use sk_rcvtimeo and related while-continue logic
> 2. force_sig(SIGKILL) to your kernel thread just before kthread_stop
>   (note that you don't need to allow / write code to handle / etc signals
>   in your kthread code -- force_sig will work automatically)

this is not only at stop time.
for example our "drbd_asender" thread
does receive as well as send,


That's normal -- in fact it would've been surprising if your kthread only
did recvs but no sends!

But where does the "send" come into the picture over here -- a send
won't block forever, so I don't foresee any issues whatsoever w.r.t.
kthreads conversion for that. [ BTW I hope you're *not* using any
signals-based interface for your kernel thread _at all_. Kthreads
disallow (ignore) all signals by default, as they should, and you really
shouldn't need to write any logic to handle or do-certain-things-on-seeing
a signal in a well designed kernel thread. ]


and the sending
latency is crucial to performance, while the recv
will not timeout for the next few seconds.


Again, I don't see what sending latency has to do with a kernel_thread
to kthread conversion. Or with signals, for that matter. Anyway, as
Kyle Moffett mentioned elsewhere, you could probably look at other
examples (say cifs_demultiplexer_thread() in fs/cifs/connect.c).

[ I didn't really want to give that example, because I get a nervous
breakdown when looking at that code myself, and would actively
like to save other fellow developers from a similar fate. To know
what I'm talking about, set your xterm to display 40 rows, and
then look at the line numbers 3139-3218 in that file, especially
3190-3212. Yes, what you see there is a map of Sulawesi [1]
subliminally hidden in Linux kernel code :-) ]

Anyway, cifs_demultiplexer_thread() is just your normal kthread that:

(1) Ignores all signals
(2) Calls perma-blocking-till-signalled functions such as tcp_recvmsg
   (via kernel_recvmsg)
(3) Calls send-to-socket kind of functions

Hence, it could get into trouble when the umount(2) code wants to stop
it with kthread_stop() and it happens to be blocked in tcp_recvmsg()
with noblock = 0 (hence sk_rcvtimeo == MAX_SCHEDULE_TIMEOUT), thus
would handle the wake_up_process() internally, and not break out, hence
not check kthread_should_stop() which it should -- all this ensuring that
the kthread never gets killed, kthread_stop() hangs, and the umount(2)
from userspace never returns ...

But they've solved it as follows (as I suggested earlier):

(1) First, set sock->sk_rcvtimeo to some "magical value" in your code
   that sets up the socket params after socket->proto_ops->connect().
   See ipv4_connect(), f.e. in CIFS they've set it up to 7 seconds. But
   that's arbitrarily chosen -- this'll ensure your tcp_recvmsg() isn't
   perma-blocking in the first place, but will unblock/return every 7 secs,
   and thus get a chance to check kthread_should_stop().

(2) From the code that wants to kill/stop the kthread (module exit, or
   umount(2) most probably), just ensure you make a call to force_sig()
   before kthread_stop() on that kthread -- see cifs_umount() in the
   same file. This'll ensure that even if the kthread is currently sleeping
   in tcp_recvmsg(), it'll be signalled to break out from there, and thus
   check kthread_should_stop().

(3) Note that not a single line of code needs to be written extra in the
   kthread itself for this to work -- nothing to allow / handle signals ...

Just this, should be enough for a smooth conversion to kthreads, IMHO.



> >> +/* THINK maybe we actually want to use the default "event/%s" worker
> >threads
> >> + * or similar in linux 2.6, which uses per cpu data and threads.
> >> + *
> >> + * To be general, this might need a spin_lock member.
> >> + * For now, please use the mdev->req_lock to protect list_head,
> >> + * see drbd_queue_work below.
> >>

Re: [PATCH 8/8] i386: bitops: smp_mb__{before, after}_clear_bit() definitions

2007-07-24 Thread Satyam Sharma

On Tue, 24 Jul 2007, Linus Torvalds wrote:

> On Tue, 24 Jul 2007, Satyam Sharma wrote:
> > 
> > Looks like when you said "CPU memory barrier extends to all memory
> > references" you were probably referring to a _given_ CPU ... yes,
> > that statement is correct in that case.
> 
> No. CPU memory barriers extend to all CPU's. End of discussion.
> 
> It's not about "that cacheline". The whole *point* of a CPU memory barrier 
> is that it's about independent memory accesses.
> 
> Yes, for a memory barrier to be effective, all CPU's involved in the 
> transaction have to have the barriers - the same way a lock needs to be 
> taken by everybody in order for it to make sense - but the point is, CPU 
> barriers are about *global* behaviour, not local ones.
> 
> So there's a *huge* difference between
> 
>   clear_bit(x,y);
> 
> and
> 
>   clear_bit(x,y);
>   smp_mb__before_after_clear_bit();
> 
> and it has absolutely nothing to do with the particular cacheline that "y" 
> is in, it's about the *global* memory ordering.
> 
> Any write you do after that "smp_mb__before_after_clear_bit()" will be 
> guaranteed to be visible to _other_ CPU's *after* they have seen the bit 
> being cleared. Yes, those other CPU's need to have a read barrier between 
> reading the bit and reading some other thign, but the point is, this hass 
> *nothing* to do with cache coherency, and the particular cache line that 
> "y" is in.
> 
> And no, "smp_mb__before/after_clear_bit()" must *not* be just an empty "do 
> {} while (0)". It needs to be a compiler barrier even when it has no 
> actual CPU meaning, unless clear_bit() itself is guaranteed to be a 
> compiler barrier (which it isn't, although the "volatile" on the asm in 
> practice makes it something *close* to that).
> 
> Why? Think of the sequence like this:
> 
>   clear_bit(x,y);
>   smp_mb__after_clear_bit();
>   other_variable = 10;
> 
> the whole *point* of this sequence is that if another CPU does
> 
>   x = other_variable;
>   smp_rmb();
>   bit = test_bit(x,y)
> 
> then if it sees "x" being 10, then the bit *has* to be clear.
> 
> And this is why the compiler barrier in "smp_mb__after_clear_bit()" needs 
> to be a compiler barrier:
> 
>  - it doesn't matter for the action of the "clear_bit()" itself: that one 
>is locked, and on x86 it thus also happens to be a serializing 
>instruction, and the cache coherency and lock obviously means that the 
>bit clearing *itself* is safe!
> 
>  - but it *does* matter for the compiler scheduling. If the compiler were 
>to decide that "y" and "other_variable" are totally independent, it 
>might otherwise decide to move the "other_variable = 10" assignment to 
>*before* the clear_bit(), which would make the whole code pointless!
> 
> See? We have two totally independent issues:
> 
>  - the CPU itself can re-order the visibility of accesses. x86 doesn't do 
>this very much, and doesn't do it at all across a locked instruction, 
>but it's still a real issue, even if it tends to be much easier to see 
>on other architectures.
> 
>  - the compiler doesn't care about rules of "locked instruction" at all, 
>because it has no clue. It has *different* rules about how it can 
>re-order instructions and accesses, and maybe the "asm volatile" will 
>guarantee that the compiler won't re-order things around the 
>clear_bit(), and maybe it won't. But making it a compiler barrier (by 
>using the "memory clobber" thing, *guarantees* that gcc cannot reorder 
>memory writes or reads.
> 
> See? Two different - and _totally_ independent - levels of ordering, and 
> we need to make sure that both are valid.

Ok, thanks much (David and Nick too!) for taking the time to explain this
out clearly. It does look amazingly obvious now that I see it, with
callers using bitops to implement locking schemes who would completely
break otherwise -- in fact 6 and 8 in this series look amazingly obtuse
now ...


Satyam
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 8/8] i386: bitops: smp_mb__{before, after}_clear_bit() definitions

2007-07-24 Thread Satyam Sharma
On Tue, 24 Jul 2007, Nick Piggin wrote:

> > > For the purpose of this discussion (Linux memory
> > > barrier semantics, on WB memory), it is true of CPU
> > > and compiler barriers.
> > 
> > On later Intel processors, if the memory address range being referenced
> > (and say written to) by the (locked) instruction is in the cache of a
> > CPU, then it will not assert the LOCK signal on the system bus --
> > thus not assume exclusive use of shared memory. So other CPUs are free
> > to modify (other) memory at that point. Cache coherency will still
> > ensure _that_ (locked) memory area is still updated atomically, though.
> 
> The system bus does not need to be serialised because the CPU already
> holds the cacheline in exclusive state. That *is* the cache coherency
> protocol.
> 
> The memory ordering is enforced by the CPU effectively preventing
> speculative loads to pass the locked instruction and ensuring all
> stores reach the cache before it is executed. (I say effectively
> because the CPU might do clever tricks without you knowing).

Looks like when you said "CPU memory barrier extends to all memory
references" you were probably referring to a _given_ CPU ... yes,
that statement is correct in that case.

> > > Are you saying that it is OK for the store to var to
> > > be reordered below the clear_bit? If not, what are you
> > > saying?
> > 
> > 
> > I might be making a radical turn-around here, but all of a
> > sudden I think it's actually a good idea to put a complete
> > memory clobber in set_bit/clear_bit and friends themselves,
> > and leave the "__" variants as they are.
> 
> Why?

Well, why not. Callers who don't want/need any guarantees whatsoever
can still use __foo() -- for others, it makes sense to just use
foo() and get *both* the compiler and CPU barrier semantics -- I think
that's the behaviour most callers would want anyway.


Satyam
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 8/8] i386: bitops: smp_mb__{before, after}_clear_bit() definitions

2007-07-24 Thread Satyam Sharma
On Tue, 24 Jul 2007, Nick Piggin wrote:

> > > Satyam Sharma wrote:
> 
> > > > Consider this (the above two functions exist
> > only for clear_bit(),
> > > > the atomic variant, as you already know), the
> > _only_ memory reference
> > > > we care about is that of the address of the
> > passed bit-string:
> > > 
> > > No. Memory barriers explicitly extend to all
> > memory references.
> > 
> > [ Compiler barrier, you mean, that's not true of CPU
> > barriers. ]
> 
> For the purpose of this discussion (Linux memory
> barrier semantics, on WB memory), it is true of CPU
> and compiler barriers.

On later Intel processors, if the memory address range being referenced
(and say written to) by the (locked) instruction is in the cache of a
CPU, then it will not assert the LOCK signal on the system bus --
thus not assume exclusive use of shared memory. So other CPUs are free
to modify (other) memory at that point. Cache coherency will still
ensure _that_ (locked) memory area is still updated atomically, though.

> Obviously because we want some kind of ordering
> guarantee at a given point. All the CPU barriers
> in the world are useless if the compiler can reorder
> access over them.

Yes ...

> > As for a compiler barrier, the asm there already
> > guarantees the compiler
> > will not optimize references to _that_ address
> 
> One or both of us still fails to understand the other.

... I think the culprit is me ...

> bit_spin_lock(LOCK_NR, );
> var++;
> /* this is bit_spin_unlock(LOCK_NR, ); */
> smp_mb__before_clear_bit();
> clear_bit(LOCK_NR, );

Yup, David has laid this out clearly, as well.

> Are you saying that it is OK for the store to var to
> be reordered below the clear_bit? If not, what are you
> saying?

I might be making a radical turn-around here, but all of a
sudden I think it's actually a good idea to put a complete
memory clobber in set_bit/clear_bit and friends themselves,
and leave the "__" variants as they are.


Satyam
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 6/8] i386: bitops: Don't mark memory as clobbered unnecessarily

2007-07-24 Thread Satyam Sharma
Hi David,

On Tue, 24 Jul 2007, David Howells wrote:

> Satyam Sharma <[EMAIL PROTECTED]> wrote:
> 
> > OTOH, as per Linus' review it seems we can drop the "memory" clobber
> > and specify the output operand for the extended asm as "+m". But I
> > must admit I didn't quite understand that at all.
> 
> As I understand it, the "+m" indicates to the compiler a restriction on the
> ordering of things that access that particular memory location, whereas a
> "memory" indicates a restriction on the orderings of all accesses to memory -
> precisely what you need to produce a lock.

Ok, thanks -- I didn't know gcc's behaviour w.r.t. "+m" at all, but in my
defense I'd add all this was quite poorly/wrongly documented in the docs.

> There are a number of things that use test_and_set_bit() and co to implement a
> lock or other synchronisation.  This means that they must exhibit LOCK-class
> barrier effects or better.  LOCK-class barrier effects mean, more or less,
> that all memory accesses issued before the lock must happen before all memory
> accesses issued after the lock.  But it most happen at both CPU-level and
> compiler-level.  The "memory" constraint instructs the compiler in this
> regard.

Yes, thanks for laying this out so clearly, again. So combined with what
you explained above, I think I now fully understand why most of this
series was incorrect ...

> Remember also that this is gcc black magic and so some of it has had to be
> worked out empirically - possibly after sacrificing a goat under a full moon.

:-)


Satyam
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 8/8] i386: bitops: smp_mb__{before, after}_clear_bit() definitions

2007-07-24 Thread Satyam Sharma
On Tue, 24 Jul 2007, Nick Piggin wrote:

> Satyam Sharma wrote:
> > On Tue, 24 Jul 2007, Nick Piggin wrote:
> > > Satyam Sharma wrote:
> > > [...]
> > > > So let's make these proper no-ops, because that's exactly what we
> > > > require
> > > > these to be on the i386 platform.
> > > 
> > > No. clear_bit is not a compiler barrier on i386,
> > 
> > Obvious.
> > 
> > > thus smp_mb__before/after
> > > must be.

> > Not so obvious. Why do we require these to be a full compiler barrier
> > is precisely the question I raised here.

> > Consider this (the above two functions exist only for clear_bit(),
> > the atomic variant, as you already know), the _only_ memory reference
> > we care about is that of the address of the passed bit-string:
> 
> No. Memory barriers explicitly extend to all memory references.

[ Compiler barrier, you mean, that's not true of CPU barriers. ]

In any case, I know that, obviously. I asked "why" not "what" :-) i.e.
why should we care about other addresses / why do we want to extend
the compiler barrier to all memory references -- but Jeremy seems to
have answered that ...

> > (1) The compiler must not optimize / elid it (i.e. we need to disallow
> > compiler optimization for that reference) -- but we've already taken
> > care of that with the __asm__ __volatile__ and the constraints on
> > the memory "addr" operand there, and,
> > (2) For the i386, it also includes an implicit memory (CPU) barrier
> > already.
> 
> Repeating what has been said before: A CPU memory barrier is not a
> compiler barrier or vice versa. Seeing as we are talking about
> the compiler barrier, it is irrelevant as to whether or not the
> assembly includes a CPU barrier.

I think it is quite relevant, in fact. From Documentation/atomic_ops.txt,
smp_mb__{before,after}_clear_bit(), as the name itself suggests, must
be _CPU barriers_ for those arch's that don't have an implicit
_CPU barrier_ in the clear_bit() itself [ which i386 does have already ].

As for a compiler barrier, the asm there already guarantees the compiler
will not optimize references to _that_ address, but there could still be
the memset()/set{clear}_bit() interspersing pitfalls for example, so yeah
the memory clobber would probably protect us there.


Satyam
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 6/8] i386: bitops: Don't mark memory as clobbered unnecessarily

2007-07-24 Thread Satyam Sharma
On Tue, 24 Jul 2007, Nick Piggin wrote:

> > > [...]
> > > 
> > > __test_and_change_bit is one that you could remove the memory clobber
> > > from.
> > 
> > Yes, for the atomic versions we don't care if we're asking gcc to
> > generate trashy code (even though I'd have wanted to only disallow
> > problematic optimizations -- ones involving the passed bit-string
> > address -- there, and allow other memory references to be optimized
> > as and how the compiler feels like it) because the atomic variants
> > are slow anyway and we probably want to be extra-safe there.
> > 
> > But for the non-atomic variants, it does make sense to remove the
> > memory clobber (and the unneeded __asm__ __volatile__ that another
> > patch did -- for the non-atomic variants, again).
> 
> No. It has nothing to do with atomicity and all to do with ordering.

The memory clobber, or the volatile asm? There's more than one variable
here ... but still, I don't think either affects _ordering_ in any
_direct_ way.


> For example test_bit, clear_bit, set_bit, etc are all atomic but none
> place any restrictions on ordering.

In that case we need to update comments in include/asm-i386/bitops.h


> __test_and_change_bit has no restriction on ordering, so as long as
> the correct operands are clobbered, a "memory" clobber to enforce a
> compiler barrier is not needed.

But why even for the other operations? Consider (current code of)
test_and_set_bit():

static inline int test_and_set_bit(int nr, volatile unsigned long * addr)
{
int oldbit;
__asm__ __volatile__( LOCK_PREFIX
"btsl %2,%1\n\tsbbl %0,%0"
:"=r" (oldbit),"+m" (ADDR)
:"Ir" (nr) : "memory");

return oldbit;
}

The only memory reference in there is to the passed address, it will
be modified, yes, but that's been made obvious to gcc in the asm
already. So why are we marking all of memory as clobbered, is the
question. (I just read Jeremy's latest reply, but I don't see how
or why the memory clobber helps that case either -- does a memory
clobber affect how gcc would order / reorder code?)


> > OTOH, as per Linus' review it seems we can drop the "memory" clobber
> > and specify the output operand for the extended asm as "+m". But I
> > must admit I didn't quite understand that at all.
> 
> Yes, but that is for cases where "memory" is being used to say that
> some otherwise unspecified memory is actually being changed, rather
> than to provide a compiler barrier as is the case for test_and_set_bit
> and co.

Again, as I said above.

> > [ I should probably start reading gcc sources, the docs are said to
> >   be insufficient/out-of-date, as per the reviews of the patches. ]
> 
> You should read Documentation/atomic_ops.txt and memory-barriers.txt,
> which are quite useful and should be uptodate.

I have, of course. Will probably read again, thanks.


Satyam
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 8/8] i386: bitops: smp_mb__{before, after}_clear_bit() definitions

2007-07-24 Thread Satyam Sharma
On Tue, 24 Jul 2007, Nick Piggin wrote:

> Satyam Sharma wrote:
> > From: Satyam Sharma <[EMAIL PROTECTED]>
> > 
> > [8/8] i386: bitops: smp_mb__{before, after}_clear_bit() definitions
> > 
> > > From Documentation/atomic_ops.txt, those archs that require explicit
> > memory barriers around clear_bit() must also implement these two interfaces.
> > However, for i386, clear_bit() is a strict, locked, atomic and
> > un-reorderable operation and includes an implicit memory barrier already.
> > 
> > But these two functions have been wrongly defined as "barrier()" which is
> > a pointless _compiler optimization_ barrier, and only serves to make gcc
> > not do legitimate optimizations that it could have otherwise done.
> > 
> > So let's make these proper no-ops, because that's exactly what we require
> > these to be on the i386 platform.
> 
> No. clear_bit is not a compiler barrier on i386,

Obvious.

> thus smp_mb__before/after
> must be.

Not so obvious. Why do we require these to be a full compiler barrier
is precisely the question I raised here.

Consider this (the above two functions exist only for clear_bit(),
the atomic variant, as you already know), the _only_ memory reference
we care about is that of the address of the passed bit-string:

(1) The compiler must not optimize / elid it (i.e. we need to disallow
compiler optimization for that reference) -- but we've already taken
care of that with the __asm__ __volatile__ and the constraints on
the memory "addr" operand there, and,
(2) For the i386, it also includes an implicit memory (CPU) barrier
already.

So I /think/ it makes sense to let the compiler optimize _other_ memory
references across the call to clear_bit(). There's a difference. I think
we'd be safe even if we do this, because the synchronization in callers
must be based upon the _passed bit-string_, otherwise _they_ are the
ones who're buggy.

[ However, elsewhere Jeremy Fitzhardinge mentioned the case of
  some callers, for instance, doing a memset() on an alias of
  the same bit-string. But again, I think that is dodgy/buggy/
  extremely border-line usage on the caller's side itself ...
  *unless* the caller is doing that inside a higher-level lock
  anyway, in which case he wouldn't be needing to use the
  locked variants either ... ]

[ For those interested, I've been looking at the code generated
  for the test kernel I built with these patches, and I don't
  really see anything gcc did that it shouldn't have -- but ok,
  that doesn't mean other versions/toolchains for other setups
  won't. Also, the test box has been up all night, but I'm only
  running Firefox on it anyway, and don't really know how to
  verify if I've introduced any correctness issues / bugs. ]


Satyam
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 6/8] i386: bitops: Don't mark memory as clobbered unnecessarily

2007-07-24 Thread Satyam Sharma
On Tue, 24 Jul 2007, Nick Piggin wrote:

> Satyam Sharma wrote:
> > From: Satyam Sharma <[EMAIL PROTECTED]>
> > 
> > [6/8] i386: bitops: Don't mark memory as clobbered unnecessarily
> > 
> > The goal is to let gcc generate good, beautiful, optimized code.
> > 
> > But test_and_set_bit, test_and_clear_bit, __test_and_change_bit,
> > and test_and_change_bit unnecessarily mark all of memory as clobbered,
> > thereby preventing gcc from doing perfectly valid optimizations.
> > 
> > The case of __test_and_change_bit() is particularly surprising, given
> > that it's a variant where we don't make any guarantees at all.
> 
> __test_and_change_bit is one that you could remove the memory clobber
> from.

Yes, for the atomic versions we don't care if we're asking gcc to
generate trashy code (even though I'd have wanted to only disallow
problematic optimizations -- ones involving the passed bit-string
address -- there, and allow other memory references to be optimized
as and how the compiler feels like it) because the atomic variants
are slow anyway and we probably want to be extra-safe there.

But for the non-atomic variants, it does make sense to remove the
memory clobber (and the unneeded __asm__ __volatile__ that another
patch did -- for the non-atomic variants, again).

OTOH, as per Linus' review it seems we can drop the "memory" clobber
and specify the output operand for the extended asm as "+m". But I
must admit I didn't quite understand that at all.

[ I should probably start reading gcc sources, the docs are said to
  be insufficient/out-of-date, as per the reviews of the patches. ]


Satyam
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/8] i386: bitops: Kill volatile-casting of memory addresses

2007-07-24 Thread Satyam Sharma
On Tue, 24 Jul 2007, Nick Piggin wrote:

> Linus Torvalds wrote:
> > 
> > On Mon, 23 Jul 2007, Satyam Sharma wrote:
> > 
> > 
> > > [4/8] i386: bitops: Kill volatile-casting of memory addresses
> > 
> > 
> > This is wrong.
> > 
> > The "const volatile" is so that you can pass an arbitrary pointer. The only
> > kind of abritraty pointer is "const volatile".
> > 
> > In other words, the "volatile" has nothing at all to do with whether the
> > memory is volatile or not (the same way "const" has nothing to do with it:
> > it's purely a C type *safety* issue, exactly the same way "const" is a type
> > safety issue.
> > 
> > A "const" on a pointer doesn't mean that the thing it points to cannot
> > change. When you pass a source pointer to "strlen()", it doesn't have to be
> > constant. But "strlen()" takes a "const" pointer, because it work son
> > constant pointers *too*.
> > 
> > Same deal here.
> > 
> > Admittedly this may be mostly historic, but regardless - the "volatiles" are
> > right.
> > 
> > Using volatile on *data* is generally considered incorrect and bad taste,
> > but using it in situations like this potentially makes sense.
> > 
> > Of course, if we remove all "volatiles" in data in the kernel (with the
> > possible exception of "jiffies"), we can then remove them from function
> > declarations too, but it should be done in that order.
> 
> Well, regardless, it still forces the function to treat the pointer
> target as volatile, won't it? It definitely prevents valid optimisations
> that would be useful for me in mm/page_alloc.c where page flags are
> being set up or torn down or checked with non-atomic bitops.

Yes, and yes. But I think what he meant there is that we'd need to
audit the kernel for all users of set_bit and friends and see if callers
actually pass in any _data_ that _is_ volatile. So we have to kill them
there first, and then in the function declarations here. I think I'll put
that on my long-term todo list, but see below.

> Anyway by type safety, do you mean it will stop the compiler from
> warning if a pointer to a volatile is passed to the bitop?

The compiler would start warning for all those cases (passing in
a pointer to volatile data, when the bitops have lost the volatile
casting from their function declarations), actually. Something like
"passing argument discards qualifiers from pointer type" ... but
considering I didn't see *any* of those warnings after these patches,
I'm confused as to what exactly Linus meant here ... and what exactly
do we need to do "kill the volatiles".

> If so, then
> why don't we just kill all the volatiles out of here and fix any
> warnings that comeup? I doubt there would be many, and of those, some
> might show up real synchronisation problems.

Yes, but see above.

> OK, not the i386 functions as much because they are all in asm anwyay,
> but in general (btw. why does i386 or any architecture define their own
> non-atomic bitops? If the version in asm-generic/bitops/non-atomic.h
> is not good enough then surely it is a bug in gcc or that file?)



Satyam
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/8] i386: bitops: Kill volatile-casting of memory addresses

2007-07-24 Thread Satyam Sharma
On Tue, 24 Jul 2007, Nick Piggin wrote:

 Linus Torvalds wrote:
  
  On Mon, 23 Jul 2007, Satyam Sharma wrote:
  
  
   [4/8] i386: bitops: Kill volatile-casting of memory addresses
  
  
  This is wrong.
  
  The const volatile is so that you can pass an arbitrary pointer. The only
  kind of abritraty pointer is const volatile.
  
  In other words, the volatile has nothing at all to do with whether the
  memory is volatile or not (the same way const has nothing to do with it:
  it's purely a C type *safety* issue, exactly the same way const is a type
  safety issue.
  
  A const on a pointer doesn't mean that the thing it points to cannot
  change. When you pass a source pointer to strlen(), it doesn't have to be
  constant. But strlen() takes a const pointer, because it work son
  constant pointers *too*.
  
  Same deal here.
  
  Admittedly this may be mostly historic, but regardless - the volatiles are
  right.
  
  Using volatile on *data* is generally considered incorrect and bad taste,
  but using it in situations like this potentially makes sense.
  
  Of course, if we remove all volatiles in data in the kernel (with the
  possible exception of jiffies), we can then remove them from function
  declarations too, but it should be done in that order.
 
 Well, regardless, it still forces the function to treat the pointer
 target as volatile, won't it? It definitely prevents valid optimisations
 that would be useful for me in mm/page_alloc.c where page flags are
 being set up or torn down or checked with non-atomic bitops.

Yes, and yes. But I think what he meant there is that we'd need to
audit the kernel for all users of set_bit and friends and see if callers
actually pass in any _data_ that _is_ volatile. So we have to kill them
there first, and then in the function declarations here. I think I'll put
that on my long-term todo list, but see below.

 Anyway by type safety, do you mean it will stop the compiler from
 warning if a pointer to a volatile is passed to the bitop?

The compiler would start warning for all those cases (passing in
a pointer to volatile data, when the bitops have lost the volatile
casting from their function declarations), actually. Something like
passing argument discards qualifiers from pointer type ... but
considering I didn't see *any* of those warnings after these patches,
I'm confused as to what exactly Linus meant here ... and what exactly
do we need to do kill the volatiles.

 If so, then
 why don't we just kill all the volatiles out of here and fix any
 warnings that comeup? I doubt there would be many, and of those, some
 might show up real synchronisation problems.

Yes, but see above.

 OK, not the i386 functions as much because they are all in asm anwyay,
 but in general (btw. why does i386 or any architecture define their own
 non-atomic bitops? If the version in asm-generic/bitops/non-atomic.h
 is not good enough then surely it is a bug in gcc or that file?)



Satyam
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 6/8] i386: bitops: Don't mark memory as clobbered unnecessarily

2007-07-24 Thread Satyam Sharma
On Tue, 24 Jul 2007, Nick Piggin wrote:

 Satyam Sharma wrote:
  From: Satyam Sharma [EMAIL PROTECTED]
  
  [6/8] i386: bitops: Don't mark memory as clobbered unnecessarily
  
  The goal is to let gcc generate good, beautiful, optimized code.
  
  But test_and_set_bit, test_and_clear_bit, __test_and_change_bit,
  and test_and_change_bit unnecessarily mark all of memory as clobbered,
  thereby preventing gcc from doing perfectly valid optimizations.
  
  The case of __test_and_change_bit() is particularly surprising, given
  that it's a variant where we don't make any guarantees at all.
 
 __test_and_change_bit is one that you could remove the memory clobber
 from.

Yes, for the atomic versions we don't care if we're asking gcc to
generate trashy code (even though I'd have wanted to only disallow
problematic optimizations -- ones involving the passed bit-string
address -- there, and allow other memory references to be optimized
as and how the compiler feels like it) because the atomic variants
are slow anyway and we probably want to be extra-safe there.

But for the non-atomic variants, it does make sense to remove the
memory clobber (and the unneeded __asm__ __volatile__ that another
patch did -- for the non-atomic variants, again).

OTOH, as per Linus' review it seems we can drop the memory clobber
and specify the output operand for the extended asm as +m. But I
must admit I didn't quite understand that at all.

[ I should probably start reading gcc sources, the docs are said to
  be insufficient/out-of-date, as per the reviews of the patches. ]


Satyam
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 8/8] i386: bitops: smp_mb__{before, after}_clear_bit() definitions

2007-07-24 Thread Satyam Sharma
On Tue, 24 Jul 2007, Nick Piggin wrote:

 Satyam Sharma wrote:
  From: Satyam Sharma [EMAIL PROTECTED]
  
  [8/8] i386: bitops: smp_mb__{before, after}_clear_bit() definitions
  
   From Documentation/atomic_ops.txt, those archs that require explicit
  memory barriers around clear_bit() must also implement these two interfaces.
  However, for i386, clear_bit() is a strict, locked, atomic and
  un-reorderable operation and includes an implicit memory barrier already.
  
  But these two functions have been wrongly defined as barrier() which is
  a pointless _compiler optimization_ barrier, and only serves to make gcc
  not do legitimate optimizations that it could have otherwise done.
  
  So let's make these proper no-ops, because that's exactly what we require
  these to be on the i386 platform.
 
 No. clear_bit is not a compiler barrier on i386,

Obvious.

 thus smp_mb__before/after
 must be.

Not so obvious. Why do we require these to be a full compiler barrier
is precisely the question I raised here.

Consider this (the above two functions exist only for clear_bit(),
the atomic variant, as you already know), the _only_ memory reference
we care about is that of the address of the passed bit-string:

(1) The compiler must not optimize / elid it (i.e. we need to disallow
compiler optimization for that reference) -- but we've already taken
care of that with the __asm__ __volatile__ and the constraints on
the memory addr operand there, and,
(2) For the i386, it also includes an implicit memory (CPU) barrier
already.

So I /think/ it makes sense to let the compiler optimize _other_ memory
references across the call to clear_bit(). There's a difference. I think
we'd be safe even if we do this, because the synchronization in callers
must be based upon the _passed bit-string_, otherwise _they_ are the
ones who're buggy.

[ However, elsewhere Jeremy Fitzhardinge mentioned the case of
  some callers, for instance, doing a memset() on an alias of
  the same bit-string. But again, I think that is dodgy/buggy/
  extremely border-line usage on the caller's side itself ...
  *unless* the caller is doing that inside a higher-level lock
  anyway, in which case he wouldn't be needing to use the
  locked variants either ... ]

[ For those interested, I've been looking at the code generated
  for the test kernel I built with these patches, and I don't
  really see anything gcc did that it shouldn't have -- but ok,
  that doesn't mean other versions/toolchains for other setups
  won't. Also, the test box has been up all night, but I'm only
  running Firefox on it anyway, and don't really know how to
  verify if I've introduced any correctness issues / bugs. ]


Satyam
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 6/8] i386: bitops: Don't mark memory as clobbered unnecessarily

2007-07-24 Thread Satyam Sharma
On Tue, 24 Jul 2007, Nick Piggin wrote:

   [...]
   
   __test_and_change_bit is one that you could remove the memory clobber
   from.
  
  Yes, for the atomic versions we don't care if we're asking gcc to
  generate trashy code (even though I'd have wanted to only disallow
  problematic optimizations -- ones involving the passed bit-string
  address -- there, and allow other memory references to be optimized
  as and how the compiler feels like it) because the atomic variants
  are slow anyway and we probably want to be extra-safe there.
  
  But for the non-atomic variants, it does make sense to remove the
  memory clobber (and the unneeded __asm__ __volatile__ that another
  patch did -- for the non-atomic variants, again).
 
 No. It has nothing to do with atomicity and all to do with ordering.

The memory clobber, or the volatile asm? There's more than one variable
here ... but still, I don't think either affects _ordering_ in any
_direct_ way.


 For example test_bit, clear_bit, set_bit, etc are all atomic but none
 place any restrictions on ordering.

In that case we need to update comments in include/asm-i386/bitops.h


 __test_and_change_bit has no restriction on ordering, so as long as
 the correct operands are clobbered, a memory clobber to enforce a
 compiler barrier is not needed.

But why even for the other operations? Consider (current code of)
test_and_set_bit():

static inline int test_and_set_bit(int nr, volatile unsigned long * addr)
{
int oldbit;
__asm__ __volatile__( LOCK_PREFIX
btsl %2,%1\n\tsbbl %0,%0
:=r (oldbit),+m (ADDR)
:Ir (nr) : memory);

return oldbit;
}

The only memory reference in there is to the passed address, it will
be modified, yes, but that's been made obvious to gcc in the asm
already. So why are we marking all of memory as clobbered, is the
question. (I just read Jeremy's latest reply, but I don't see how
or why the memory clobber helps that case either -- does a memory
clobber affect how gcc would order / reorder code?)


  OTOH, as per Linus' review it seems we can drop the memory clobber
  and specify the output operand for the extended asm as +m. But I
  must admit I didn't quite understand that at all.
 
 Yes, but that is for cases where memory is being used to say that
 some otherwise unspecified memory is actually being changed, rather
 than to provide a compiler barrier as is the case for test_and_set_bit
 and co.

Again, as I said above.

  [ I should probably start reading gcc sources, the docs are said to
be insufficient/out-of-date, as per the reviews of the patches. ]
 
 You should read Documentation/atomic_ops.txt and memory-barriers.txt,
 which are quite useful and should be uptodate.

I have, of course. Will probably read again, thanks.


Satyam
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 8/8] i386: bitops: smp_mb__{before, after}_clear_bit() definitions

2007-07-24 Thread Satyam Sharma
On Tue, 24 Jul 2007, Nick Piggin wrote:

 Satyam Sharma wrote:
  On Tue, 24 Jul 2007, Nick Piggin wrote:
   Satyam Sharma wrote:
   [...]
So let's make these proper no-ops, because that's exactly what we
require
these to be on the i386 platform.
   
   No. clear_bit is not a compiler barrier on i386,
  
  Obvious.
  
   thus smp_mb__before/after
   must be.

  Not so obvious. Why do we require these to be a full compiler barrier
  is precisely the question I raised here.

  Consider this (the above two functions exist only for clear_bit(),
  the atomic variant, as you already know), the _only_ memory reference
  we care about is that of the address of the passed bit-string:
 
 No. Memory barriers explicitly extend to all memory references.

[ Compiler barrier, you mean, that's not true of CPU barriers. ]

In any case, I know that, obviously. I asked why not what :-) i.e.
why should we care about other addresses / why do we want to extend
the compiler barrier to all memory references -- but Jeremy seems to
have answered that ...

  (1) The compiler must not optimize / elid it (i.e. we need to disallow
  compiler optimization for that reference) -- but we've already taken
  care of that with the __asm__ __volatile__ and the constraints on
  the memory addr operand there, and,
  (2) For the i386, it also includes an implicit memory (CPU) barrier
  already.
 
 Repeating what has been said before: A CPU memory barrier is not a
 compiler barrier or vice versa. Seeing as we are talking about
 the compiler barrier, it is irrelevant as to whether or not the
 assembly includes a CPU barrier.

I think it is quite relevant, in fact. From Documentation/atomic_ops.txt,
smp_mb__{before,after}_clear_bit(), as the name itself suggests, must
be _CPU barriers_ for those arch's that don't have an implicit
_CPU barrier_ in the clear_bit() itself [ which i386 does have already ].

As for a compiler barrier, the asm there already guarantees the compiler
will not optimize references to _that_ address, but there could still be
the memset()/set{clear}_bit() interspersing pitfalls for example, so yeah
the memory clobber would probably protect us there.


Satyam
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 6/8] i386: bitops: Don't mark memory as clobbered unnecessarily

2007-07-24 Thread Satyam Sharma
Hi David,

On Tue, 24 Jul 2007, David Howells wrote:

 Satyam Sharma [EMAIL PROTECTED] wrote:
 
  OTOH, as per Linus' review it seems we can drop the memory clobber
  and specify the output operand for the extended asm as +m. But I
  must admit I didn't quite understand that at all.
 
 As I understand it, the +m indicates to the compiler a restriction on the
 ordering of things that access that particular memory location, whereas a
 memory indicates a restriction on the orderings of all accesses to memory -
 precisely what you need to produce a lock.

Ok, thanks -- I didn't know gcc's behaviour w.r.t. +m at all, but in my
defense I'd add all this was quite poorly/wrongly documented in the docs.

 There are a number of things that use test_and_set_bit() and co to implement a
 lock or other synchronisation.  This means that they must exhibit LOCK-class
 barrier effects or better.  LOCK-class barrier effects mean, more or less,
 that all memory accesses issued before the lock must happen before all memory
 accesses issued after the lock.  But it most happen at both CPU-level and
 compiler-level.  The memory constraint instructs the compiler in this
 regard.

Yes, thanks for laying this out so clearly, again. So combined with what
you explained above, I think I now fully understand why most of this
series was incorrect ...

 Remember also that this is gcc black magic and so some of it has had to be
 worked out empirically - possibly after sacrificing a goat under a full moon.

:-)


Satyam
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 8/8] i386: bitops: smp_mb__{before, after}_clear_bit() definitions

2007-07-24 Thread Satyam Sharma
On Tue, 24 Jul 2007, Nick Piggin wrote:

   Satyam Sharma wrote:
 
Consider this (the above two functions exist
  only for clear_bit(),
the atomic variant, as you already know), the
  _only_ memory reference
we care about is that of the address of the
  passed bit-string:
   
   No. Memory barriers explicitly extend to all
  memory references.
  
  [ Compiler barrier, you mean, that's not true of CPU
  barriers. ]
 
 For the purpose of this discussion (Linux memory
 barrier semantics, on WB memory), it is true of CPU
 and compiler barriers.

On later Intel processors, if the memory address range being referenced
(and say written to) by the (locked) instruction is in the cache of a
CPU, then it will not assert the LOCK signal on the system bus --
thus not assume exclusive use of shared memory. So other CPUs are free
to modify (other) memory at that point. Cache coherency will still
ensure _that_ (locked) memory area is still updated atomically, though.

 Obviously because we want some kind of ordering
 guarantee at a given point. All the CPU barriers
 in the world are useless if the compiler can reorder
 access over them.

Yes ...

  As for a compiler barrier, the asm there already
  guarantees the compiler
  will not optimize references to _that_ address
 
 One or both of us still fails to understand the other.

... I think the culprit is me ...

 bit_spin_lock(LOCK_NR, word);
 var++;
 /* this is bit_spin_unlock(LOCK_NR, word); */
 smp_mb__before_clear_bit();
 clear_bit(LOCK_NR, word);

Yup, David has laid this out clearly, as well.

 Are you saying that it is OK for the store to var to
 be reordered below the clear_bit? If not, what are you
 saying?

I might be making a radical turn-around here, but all of a
sudden I think it's actually a good idea to put a complete
memory clobber in set_bit/clear_bit and friends themselves,
and leave the __ variants as they are.


Satyam
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 8/8] i386: bitops: smp_mb__{before, after}_clear_bit() definitions

2007-07-24 Thread Satyam Sharma
On Tue, 24 Jul 2007, Nick Piggin wrote:

   For the purpose of this discussion (Linux memory
   barrier semantics, on WB memory), it is true of CPU
   and compiler barriers.
  
  On later Intel processors, if the memory address range being referenced
  (and say written to) by the (locked) instruction is in the cache of a
  CPU, then it will not assert the LOCK signal on the system bus --
  thus not assume exclusive use of shared memory. So other CPUs are free
  to modify (other) memory at that point. Cache coherency will still
  ensure _that_ (locked) memory area is still updated atomically, though.
 
 The system bus does not need to be serialised because the CPU already
 holds the cacheline in exclusive state. That *is* the cache coherency
 protocol.
 
 The memory ordering is enforced by the CPU effectively preventing
 speculative loads to pass the locked instruction and ensuring all
 stores reach the cache before it is executed. (I say effectively
 because the CPU might do clever tricks without you knowing).

Looks like when you said CPU memory barrier extends to all memory
references you were probably referring to a _given_ CPU ... yes,
that statement is correct in that case.

   Are you saying that it is OK for the store to var to
   be reordered below the clear_bit? If not, what are you
   saying?
  
  
  I might be making a radical turn-around here, but all of a
  sudden I think it's actually a good idea to put a complete
  memory clobber in set_bit/clear_bit and friends themselves,
  and leave the __ variants as they are.
 
 Why?

Well, why not. Callers who don't want/need any guarantees whatsoever
can still use __foo() -- for others, it makes sense to just use
foo() and get *both* the compiler and CPU barrier semantics -- I think
that's the behaviour most callers would want anyway.


Satyam
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 8/8] i386: bitops: smp_mb__{before, after}_clear_bit() definitions

2007-07-24 Thread Satyam Sharma

On Tue, 24 Jul 2007, Linus Torvalds wrote:

 On Tue, 24 Jul 2007, Satyam Sharma wrote:
  
  Looks like when you said CPU memory barrier extends to all memory
  references you were probably referring to a _given_ CPU ... yes,
  that statement is correct in that case.
 
 No. CPU memory barriers extend to all CPU's. End of discussion.
 
 It's not about that cacheline. The whole *point* of a CPU memory barrier 
 is that it's about independent memory accesses.
 
 Yes, for a memory barrier to be effective, all CPU's involved in the 
 transaction have to have the barriers - the same way a lock needs to be 
 taken by everybody in order for it to make sense - but the point is, CPU 
 barriers are about *global* behaviour, not local ones.
 
 So there's a *huge* difference between
 
   clear_bit(x,y);
 
 and
 
   clear_bit(x,y);
   smp_mb__before_after_clear_bit();
 
 and it has absolutely nothing to do with the particular cacheline that y 
 is in, it's about the *global* memory ordering.
 
 Any write you do after that smp_mb__before_after_clear_bit() will be 
 guaranteed to be visible to _other_ CPU's *after* they have seen the bit 
 being cleared. Yes, those other CPU's need to have a read barrier between 
 reading the bit and reading some other thign, but the point is, this hass 
 *nothing* to do with cache coherency, and the particular cache line that 
 y is in.
 
 And no, smp_mb__before/after_clear_bit() must *not* be just an empty do 
 {} while (0). It needs to be a compiler barrier even when it has no 
 actual CPU meaning, unless clear_bit() itself is guaranteed to be a 
 compiler barrier (which it isn't, although the volatile on the asm in 
 practice makes it something *close* to that).
 
 Why? Think of the sequence like this:
 
   clear_bit(x,y);
   smp_mb__after_clear_bit();
   other_variable = 10;
 
 the whole *point* of this sequence is that if another CPU does
 
   x = other_variable;
   smp_rmb();
   bit = test_bit(x,y)
 
 then if it sees x being 10, then the bit *has* to be clear.
 
 And this is why the compiler barrier in smp_mb__after_clear_bit() needs 
 to be a compiler barrier:
 
  - it doesn't matter for the action of the clear_bit() itself: that one 
is locked, and on x86 it thus also happens to be a serializing 
instruction, and the cache coherency and lock obviously means that the 
bit clearing *itself* is safe!
 
  - but it *does* matter for the compiler scheduling. If the compiler were 
to decide that y and other_variable are totally independent, it 
might otherwise decide to move the other_variable = 10 assignment to 
*before* the clear_bit(), which would make the whole code pointless!
 
 See? We have two totally independent issues:
 
  - the CPU itself can re-order the visibility of accesses. x86 doesn't do 
this very much, and doesn't do it at all across a locked instruction, 
but it's still a real issue, even if it tends to be much easier to see 
on other architectures.
 
  - the compiler doesn't care about rules of locked instruction at all, 
because it has no clue. It has *different* rules about how it can 
re-order instructions and accesses, and maybe the asm volatile will 
guarantee that the compiler won't re-order things around the 
clear_bit(), and maybe it won't. But making it a compiler barrier (by 
using the memory clobber thing, *guarantees* that gcc cannot reorder 
memory writes or reads.
 
 See? Two different - and _totally_ independent - levels of ordering, and 
 we need to make sure that both are valid.

Ok, thanks much (David and Nick too!) for taking the time to explain this
out clearly. It does look amazingly obvious now that I see it, with
callers using bitops to implement locking schemes who would completely
break otherwise -- in fact 6 and 8 in this series look amazingly obtuse
now ...


Satyam
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [DRIVER SUBMISSION] DRBD wants to go mainline

2007-07-24 Thread Satyam Sharma

Hi Lars,


On 7/24/07, Lars Ellenberg [EMAIL PROTECTED] wrote:

On Mon, Jul 23, 2007 at 07:10:58PM +0530, Satyam Sharma wrote:
 On 7/23/07, Lars Ellenberg [EMAIL PROTECTED] wrote:
 On Sun, Jul 22, 2007 at 09:32:02PM -0400, Kyle Moffett wrote:
 [...]
  Don't use signals between kernel threads, use proper primitives like
  notifiers and waitqueues, which means you should also probably switch
 away
  from kernel_thread() to the kthread_*() APIs.  Also you should fix this
  FIXME or remove it if it no longer applies:-D.
 
 right.
 but how to I tell a network thread in tcp_recvmsg to stop early,
 without using signals?

 Yup, kthreads API cannot handle (properly stop) kernel threads that want
 to sleep on possibly-blocking-forever-till-signalled-functions such as
 tcp_recvmsg or skb_recv_datagram etc etc.

 There are two workarounds:
 1. Use sk_rcvtimeo and related while-continue logic
 2. force_sig(SIGKILL) to your kernel thread just before kthread_stop
   (note that you don't need to allow / write code to handle / etc signals
   in your kthread code -- force_sig will work automatically)

this is not only at stop time.
for example our drbd_asender thread
does receive as well as send,


That's normal -- in fact it would've been surprising if your kthread only
did recvs but no sends!

But where does the send come into the picture over here -- a send
won't block forever, so I don't foresee any issues whatsoever w.r.t.
kthreads conversion for that. [ BTW I hope you're *not* using any
signals-based interface for your kernel thread _at all_. Kthreads
disallow (ignore) all signals by default, as they should, and you really
shouldn't need to write any logic to handle or do-certain-things-on-seeing
a signal in a well designed kernel thread. ]


and the sending
latency is crucial to performance, while the recv
will not timeout for the next few seconds.


Again, I don't see what sending latency has to do with a kernel_thread
to kthread conversion. Or with signals, for that matter. Anyway, as
Kyle Moffett mentioned elsewhere, you could probably look at other
examples (say cifs_demultiplexer_thread() in fs/cifs/connect.c).

[ I didn't really want to give that example, because I get a nervous
breakdown when looking at that code myself, and would actively
like to save other fellow developers from a similar fate. To know
what I'm talking about, set your xterm to display 40 rows, and
then look at the line numbers 3139-3218 in that file, especially
3190-3212. Yes, what you see there is a map of Sulawesi [1]
subliminally hidden in Linux kernel code :-) ]

Anyway, cifs_demultiplexer_thread() is just your normal kthread that:

(1) Ignores all signals
(2) Calls perma-blocking-till-signalled functions such as tcp_recvmsg
   (via kernel_recvmsg)
(3) Calls send-to-socket kind of functions

Hence, it could get into trouble when the umount(2) code wants to stop
it with kthread_stop() and it happens to be blocked in tcp_recvmsg()
with noblock = 0 (hence sk_rcvtimeo == MAX_SCHEDULE_TIMEOUT), thus
would handle the wake_up_process() internally, and not break out, hence
not check kthread_should_stop() which it should -- all this ensuring that
the kthread never gets killed, kthread_stop() hangs, and the umount(2)
from userspace never returns ...

But they've solved it as follows (as I suggested earlier):

(1) First, set sock-sk_rcvtimeo to some magical value in your code
   that sets up the socket params after socket-proto_ops-connect().
   See ipv4_connect(), f.e. in CIFS they've set it up to 7 seconds. But
   that's arbitrarily chosen -- this'll ensure your tcp_recvmsg() isn't
   perma-blocking in the first place, but will unblock/return every 7 secs,
   and thus get a chance to check kthread_should_stop().

(2) From the code that wants to kill/stop the kthread (module exit, or
   umount(2) most probably), just ensure you make a call to force_sig()
   before kthread_stop() on that kthread -- see cifs_umount() in the
   same file. This'll ensure that even if the kthread is currently sleeping
   in tcp_recvmsg(), it'll be signalled to break out from there, and thus
   check kthread_should_stop().

(3) Note that not a single line of code needs to be written extra in the
   kthread itself for this to work -- nothing to allow / handle signals ...

Just this, should be enough for a smooth conversion to kthreads, IMHO.



  +/* THINK maybe we actually want to use the default event/%s worker
 threads
  + * or similar in linux 2.6, which uses per cpu data and threads.
  + *
  + * To be general, this might need a spin_lock member.
  + * For now, please use the mdev-req_lock to protect list_head,
  + * see drbd_queue_work below.
  + */
  +struct drbd_work_queue {
  +   struct list_head q;
  +   struct semaphore s; /* producers up it, worker down()s it */
  +   spinlock_t q_lock;  /* to protect the list. */
  +};
 
  Umm, how about fixing this to actually use proper workqueues or something
  instead of this open-coded mess?
 
 unlikely

Re: crash with 2.6.22.1 crash:ll_rw_blk.c blk_remove_plug()

2007-07-24 Thread Satyam Sharma

On 7/23/07, Jens Axboe [EMAIL PROTECTED] wrote:

On Sun, Jul 22 2007, Satyam Sharma wrote:
 Hi Walter,

 Thanks for reporting this.

 On 7/22/07, walter harms [EMAIL PROTECTED] wrote:
 hello all,
 on my asus notebook tm620 there is a crash with 2.6.22 and 2.6.21

 Did this happen when you were resuming from a suspend-to-ram/disk?
 [ I ask because I see swsusp in the trace below, linux-pm added to Cc: ]

 
 Using IPI Shortcut mode
 WARNING: at block/ll_rw_blk.c:1575 blk_remove_plug()
  [c01ac87e] blk_remove_plug+0x36/0x5a
  [c01ac8b6] __generic_unplug_device+0x14/0x1f
  [c01ad587] __make_request+0x39b/0x49c
  [c01abc8c] generic_make_request+0x228/0x255
  [c01adb54] submit_bio+0xa5/0xac
  [c013e233] mempool_alloc+0x37/0xae
  [c01314dc] submit+0xc2/0x11d
  [c0131585] bio_read_page+0x24/0x27
  [c013188b] swsusp_check+0x4f/0xaf
  [c012f6c2] software_resume+0x5f/0x108
  [c037867e] kernel_init+0xb0/0x212
  [c0103a16] ret_from_fork+0x6/0x1c
  [c03785ce] kernel_init+0x0/0x212
  [c03785ce] kernel_init+0x0/0x212
  [c010465b] kernel_thread_helper+0x7/0x10
  ===

 Surprising, that's a WARN_ON(!irqs_disabled()) but IRQs are disabled
 alright on that codepath. OTOH, __make_request() is heavily goto-driven,
 uses the non-save/restore variants of spin_lock_irq, and does not even
 balance locks / unlocks for some error paths ... gaah.

__make_request() must be called from process context, hence
spin_lock_irq() is perfectly already and the fastest way to go. And of
course the locking is balanced! So please save your 'gaah's for code
you actually took the time to try and understand.


You're right, I didn't really look at that code for long (it even explicitly
comments about what's going with the locking in there!) sorry about
that.

[ Off-topic: BTW does every call to __make_request() end up in
blk_remove_plug()? Since you're explicitly making the assumption
that it *must* be called from process context (and hence the use of
the non-save/restore variants), you could consider putting a
WARN_ON(irqs_disabled()) over there, and perhaps a WARN_ON
(!spin_is_locked(queue_lock)) in blk_remove_plug() instead, and
other such similar functions that currently have the !irqs_disabled
check. This way you'd effectively cover _both_ the assertions,
and in appropriate places -- just a suggestion. ]


But it does look like unbalanced irq disable/enable calls. I'd guess in
the suspend/resume path. Obviously something more esoteric, since this
is the first such report for 2.6.22, so like some not-very-used driver
for instance.


Now that I do look at the codepath, it does seem surprising irqs were
not disabled there. There are a bunch of calls to _other_ functions
between the spin_lock_irq and the blk_remove_plug via
__generic_unplug_device that would also have complained about
!irqs_disabled.

Walter, does this happen reproducibly?

Satyam
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/8] i386: bitops: Rectify bogus "Ir" constraints

2007-07-23 Thread Satyam Sharma
On Mon, 23 Jul 2007, H. Peter Anvin wrote:

> Linus Torvalds wrote:
> > 
> > On Mon, 23 Jul 2007, Satyam Sharma wrote:
> >> * The "I" constraint modifier is applicable only to immediate-value 
> >> operands,
> >>   and combining it with "r" is bogus.
> > 
> > This is wrong too.
> > 
> > The whole point of a "Ir" modifier is to say that the instruction takes 
> > *either* an "I" or an "r".
> > 
> > Andrew - the ones I've looked at were all wrong. Please don't take this 
> > series.
> > 
> 
> Incidentally, I just noticed the x86-64 bitops have "dIr" as their
> constraint set.  "d" would normally be redundant with "r", and as far as
> I know, gcc doesn't prefer one over the other without having "?" or "!"
> as part of the constraint, so is is "d" a stray or is there some meaning
> behind it?


Yup, I had noticed that myself. We would need to use "J" if we want
to limit the offsets to 0..63, but "d" sounds weird / stray indeed,
unless there's yet another undocumented/wrongly-documented mystery
behind this one too ... (I'd like to know even if that's the case,
obviously).

Satyam
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/8] i386: bitops: Rectify bogus "Ir" constraints

2007-07-23 Thread Satyam Sharma
On Mon, 23 Jul 2007, Linus Torvalds wrote:

> On Mon, 23 Jul 2007, Satyam Sharma wrote:
> >
> > * The "I" constraint modifier is applicable only to immediate-value 
> > operands,
> >   and combining it with "r" is bogus.
> 
> This is wrong too.
> 
> The whole point of a "Ir" modifier is to say that the instruction takes 
> *either* an "I" or an "r".

Yup, sorry about this one, Andi pointed this out earlier. But the "I"
must still go I think, for the third reason in that changelog -- it
unnecessarily limits the bit offset to 0..31, but (at least from the
comment up front in that file) we do allow arbitrarily large @nr (upto
255, of course, these instructions won't take anything greater than that).

> Andrew - the ones I've looked at were all wrong. Please don't take this 
> series.

I think I'll rescind the series anyway, a lot of patches turned out to
be wrong -- some due to mis-reading / incorrect gcc docs, others due to
other reasons ... this was just something I did thinking of as a cleanup
anyway, so I don't intend to push or correct this or anything.

Satyam
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 7/8] i386: bitops: Kill needless usage of __asm__ __volatile__

2007-07-23 Thread Satyam Sharma
On Mon, 23 Jul 2007, Jeremy Fitzhardinge wrote:

> I'm not quite sure what your point is.

Could be a case of terminology confusion ...

> The paragraph you quoted is
> pretty explicit in saying that volatile doesn't prevent an "asm
> volatile" from being interspersed with other code, and the example just
> before that is explicit in talking about how to use dependencies to
> control the ordering of asm volatiles with respect to surrounding code.

Yes, that was the (2).

> In fact nothing in that section precludes asm volatiles from being
> reordered with respect to each other either; you just have to make sure
> your dependency constraints are all correct.

The (3) as I had originally written / meant was that multiple
instructions in a volatile asm would not get _individually_
interspersed with the rest of the code i.e. be emitted out
_consecutively_. I don't think we need any such guarantees for
the non-atomic variants of those operations, so it's good to
let the compiler have a free hand with what it wants to do,
and optimize/combine multiple bitops as necessary / possible,
which was the original intention.

Satyam
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/8] i386: bitops: Rectify bogus "+m" constraints

2007-07-23 Thread Satyam Sharma
On Mon, 23 Jul 2007, Andi Kleen wrote:

> On Monday 23 July 2007 18:05:43 Satyam Sharma wrote:
> > From: Satyam Sharma <[EMAIL PROTECTED]>
> > 
> > [3/8] i386: bitops: Rectify bogus "+m" constraints
> > 
> > From the gcc manual:
> > 
> >   Extended asm supports input-output or read-write operands. Use the
> >   constraint character `+' to indicate such an operand and list it with
> >   the output operands. You should only use read-write operands when the
> >   constraints for the operand (or the operand in which only some of the
> >   bits are to be changed) allow a register.
> > 
> > So, using the "+" constraint modifier for memory, like "+m" is bogus.
> > We must simply specify "=m" which handles the case correctly.
> 
> I checked with Honza (cc'ed) and he stated that the + are really needed
> at least in newer gcc.


That extract is from the latest (4.2.1) manual, but they could've
forgotten to update the documentation, of course.

But even then, I'm not sure they could ever make it "really needed",
that'll be a step that would break all existing code, otherwise! :-)

Satyam
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 6/8] i386: bitops: Don't mark memory as clobbered unnecessarily

2007-07-23 Thread Satyam Sharma
Hi,

On Mon, 23 Jul 2007, Andi Kleen wrote:

> 
> > Yes, but _that_ address (of the bit-string) is protected already -- by the
> > implicit memory barrier due to the LOCK prefix.
> 
> Compiler barrier != CPU barrier.

Exactly, but the actual _synchronization_ in all users of the bitops API
should (should, at least, otherwise the bugs lie in the callers) depend
upon the _bit-string_ whose address is passed to us. That could be some
flags/lock/etc in some caller, whatever, but all the synchronization in
those users would be based upon _that_ -- and we're handling it alright
with the full CPU barrier for _that_ address. Also note that in all the
atomic/unatomic variants, we always constrain the passed pointer to
memory and for the atomic/locked variants at least, I don't really see
how compiler optimizations would get us into trouble (again, for the
_passed bit-string memory address_).

> The memory clobber is a compiler barrier that prevents its global optimizer
> from moving memory references.
> [...]
> > We shouldn't really be 
> > caring about any other memory addresses, so it doesn't affect the
> > correctness of the bitops API at all.
> 
> The problem is the relationship to other operations.


... I don't want to get argumentative, but IMHO, we don't really need
to care about other (other than the passed memory address) references.
The problem with marking all of memory clobbered is that we're
disallowing legitimate optimizations for other memory references.

> The CPU memory ordering guarantees are completely 
> independent from this.

Of course.

> This is not theoretic and we have had bugs because of this.

Those could've been due to buggy users, not necessarily the bitops.

Satyam
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 7/8] i386: bitops: Kill needless usage of __asm__ __volatile__

2007-07-23 Thread Satyam Sharma
Hi Jeremy,


On Mon, 23 Jul 2007, Jeremy Fitzhardinge wrote:

> Satyam Sharma wrote:
> > From: Satyam Sharma <[EMAIL PROTECTED]>
> >
> > [7/8] i386: bitops: Kill needless usage of __asm__ __volatile__
> >
> > Another oddity I noticed in this file. The semantics of __volatile__
> > when used to qualify inline __asm__ are that the compiler will not
> > (1) elid, or, (2) reorder, or, (3) intersperse, our inline asm with
> > the rest of the generated code.
> >   
> 
> "asm volatile" does not mean that at all.  It only guarantees (1),


Actually, you're probably right about (2), but (3)?

>From the gcc manual:



Similarly, you can't expect a sequence of volatile asm instructions to
remain perfectly consecutive. If you want consecutive output, use a
single asm. Also GCC will perform some optimizations across a volatile
asm instruction, GCC does not "forget everything" when it encounters a
volatile asm instruction the way some other compilers do.



I'm reading "Similarly, you can't expect a sequence of volatile asm
instructions to remain perfectly consecutive" to mean they're talking
about something like:

asm volatile(...);
asm volatile(...);
asm volatile(...);

But "If you want consecutive output, use a single asm" probably means:

asm volatile(... multiple instructions here ...);

would actually ensure the code written in there would not be
interspersed ... at least that's how I read it.

[ BTW "Also GCC will perform some optimizations across a volatile
asm instruction ..." is exactly the kind of optimizations we must
allow the compiler to do, but that's not related to point at hand. ]


> and
> only then if the asm is ever reachable.

Yup, of course.

Satyam
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 7/8] i386: bitops: Kill needless usage of __asm__ __volatile__

2007-07-23 Thread Satyam Sharma
Hi,

On Mon, 23 Jul 2007, Andi Kleen wrote:

> On Monday 23 July 2007 18:06:03 Satyam Sharma wrote:
> > From: Satyam Sharma <[EMAIL PROTECTED]>
> > 
> > [7/8] i386: bitops: Kill needless usage of __asm__ __volatile__
> > 
> > Another oddity I noticed in this file. The semantics of __volatile__
> > when used to qualify inline __asm__ are that the compiler will not
> > (1) elid, or, (2) reorder, or, (3) intersperse, our inline asm with
> > the rest of the generated code.
> > 
> > However, we do not want these guarantees in the unlocked variants of the
> > bitops functions. 
> 
> I thought so too and did a similar transformation while moving
> some string functions out of line. After that recent misadventure
> I would be very very careful with this.

Ah, ok, so this must be the case we'd stumbled upon recently on the
other thread. I hadn't got your fix for this, so didn't know.

> Overall I'm sorry to say, but the risk:gain ratio of this
> patch is imho totally out of whack.

The patch does look correct to me, we're only killing the use of
__volatile__ from places that don't require it (the guarantee-less
variants). Without losing it, I really don't see how the compiler
can ever combine multiple bitops, which does sound beneficial when
the caller has already implemented higher-level locking around
the usage of these operations.

Satyam
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 6/8] i386: bitops: Don't mark memory as clobbered unnecessarily

2007-07-23 Thread Satyam Sharma
Hi Andi,


On Mon, 23 Jul 2007, Andi Kleen wrote:

> On Monday 23 July 2007 18:05:58 Satyam Sharma wrote:
> > From: Satyam Sharma <[EMAIL PROTECTED]>
> > 
> > [6/8] i386: bitops: Don't mark memory as clobbered unnecessarily
> > 
> > The goal is to let gcc generate good, beautiful, optimized code.
> 
> 
> The first goal is correct code.
> 
> The reason for the memory barrier is to prevent other memory references
> from being moved over the atomic reference.
> 
> e.g. when a bit is used to communicate with another CPU this might be 
> dangerous.


Yes, but _that_ address (of the bit-string) is protected already -- by the
implicit memory barrier due to the LOCK prefix. We shouldn't really be
caring about any other memory addresses, so it doesn't affect the
correctness of the bitops API at all.

[ and at least the __test_and_change_bit() case is definitely okay. ]

Satyam
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/8] i386: bitops: Rectify bogus "Ir" constraints

2007-07-23 Thread Satyam Sharma
Hi Andi,


On Mon, 23 Jul 2007, Andi Kleen wrote:

> On Monday 23 July 2007 18:05:38 Satyam Sharma wrote:
> > From: Satyam Sharma <[EMAIL PROTECTED]>
> > 
> > [2/8] i386: bitops: Rectify bogus "Ir" constraints
> > 
> > The "I" constraint (on the i386 platform) is used to restrict constants to
> > the 0..31 range, for use with instructions that must deal with bit numbers.
> 
> It means I or r, not I modified by r. This means either a immediate constant
> 0..31 or a register, which is correct.
> 
> % cat t18.c 
> 
> f()
> {
> asm("xxx %0" :: "rI" (10));
> asm("yyy %0" :: "rI" (100));
> }
> % gcc -O2 -S t18.c
> % cat t18.s
> ...
> f:
> .LFB2:
> #APP
> xxx $10
> #NO_APP
> movl$100, %eax
> #APP
> yyy %eax
> #NO_APP
> ret
> .LFE2:
> ...


Whoa, thanks for explaining that to me -- I didn't know, obviously. I had
just written a test program that used "Ir" with an automatic variable
defined in the inline function (as is the case with these bitops) and
observed that even when I gave > 32 values, it would still work -- hence
my conclusion.

However, the patch still stands, does it not? [ I will modify the
changelog, obviously. ] The thing is that we don't want to limit
@nr to <= 31 in the first place, or am I wrong again? :-)

Thanks,
Satyam
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 7/8] i386: bitops: Kill needless usage of __asm__ __volatile__

2007-07-23 Thread Satyam Sharma
From: Satyam Sharma <[EMAIL PROTECTED]>

[7/8] i386: bitops: Kill needless usage of __asm__ __volatile__

Another oddity I noticed in this file. The semantics of __volatile__
when used to qualify inline __asm__ are that the compiler will not
(1) elid, or, (2) reorder, or, (3) intersperse, our inline asm with
the rest of the generated code.

However, we do not want these guarantees in the unlocked variants of the
bitops functions. Also, note that test_bit() is *always* an unlocked
operation on i386 -- the i386 instruction set disallows the use of the LOCK
prefix with the "btl" instruction anyway, and the CPU will throw an invalid
opcode exception otherwise. Hence, every caller of variable_test_bit() must
have all the required locking implemented at a higher-level anyway and this
operation would necessarily be guarantee-less.

So let's remove the __volatile__ qualification of the inline asm in the
variable_test_bit() function also -- and let's give the compiler a chance
to optimize / combine multiple bitops, if possible.

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>
Cc: David Howells <[EMAIL PROTECTED]>
Cc: Nick Piggin <[EMAIL PROTECTED]>

---

 include/asm-i386/bitops.h |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/asm-i386/bitops.h b/include/asm-i386/bitops.h
index f37b8a2..4f1fda5 100644
--- a/include/asm-i386/bitops.h
+++ b/include/asm-i386/bitops.h
@@ -96,7 +96,7 @@ static inline void clear_bit(int nr, unsigned long *addr)
  */
 static inline void __clear_bit(int nr, unsigned long *addr)
 {
-   __asm__ __volatile__(
+   __asm__(
"btrl %1,%0"
:"=m" (*addr)
:"r" (nr));
@@ -123,7 +123,7 @@ static inline void __clear_bit(int nr, unsigned long *addr)
  */
 static inline void __change_bit(int nr, unsigned long *addr)
 {
-   __asm__ __volatile__(
+   __asm__(
"btcl %1,%0"
:"=m" (*addr)
:"r" (nr));
@@ -251,7 +251,7 @@ static inline int __test_and_change_bit(int nr, unsigned 
long *addr)
 {
int oldbit;
 
-   __asm__ __volatile__(
+   __asm__(
"btcl %2,%1\n\tsbbl %0,%0"
:"=r" (oldbit),"=m" (*addr)
:"r" (nr));
@@ -297,7 +297,7 @@ static inline int variable_test_bit(int nr, const unsigned 
long *addr)
 {
int oldbit;
 
-   __asm__ __volatile__(
+   __asm__(
"btl %2,%1\n\tsbbl %0,%0"
:"=r" (oldbit)
:"m" (*addr),"r" (nr));
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 8/8] i386: bitops: smp_mb__{before, after}_clear_bit() definitions

2007-07-23 Thread Satyam Sharma
From: Satyam Sharma <[EMAIL PROTECTED]>

[8/8] i386: bitops: smp_mb__{before, after}_clear_bit() definitions

>From Documentation/atomic_ops.txt, those archs that require explicit
memory barriers around clear_bit() must also implement these two interfaces.
However, for i386, clear_bit() is a strict, locked, atomic and
un-reorderable operation and includes an implicit memory barrier already.

But these two functions have been wrongly defined as "barrier()" which is
a pointless _compiler optimization_ barrier, and only serves to make gcc
not do legitimate optimizations that it could have otherwise done.

So let's make these proper no-ops, because that's exactly what we require
these to be on the i386 platform.

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>
Cc: David Howells <[EMAIL PROTECTED]>
Cc: Nick Piggin <[EMAIL PROTECTED]>

---

[ A similar optimization needs to be done in the atomic.h also.
  Will submit that patch shortly. ]

 include/asm-i386/bitops.h |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/asm-i386/bitops.h b/include/asm-i386/bitops.h
index 4f1fda5..42999eb 100644
--- a/include/asm-i386/bitops.h
+++ b/include/asm-i386/bitops.h
@@ -106,8 +106,8 @@ static inline void __clear_bit(int nr, unsigned long *addr)
  * Bit operations are already serializing on x86.
  * These must still be defined here for API completeness.
  */
-#define smp_mb__before_clear_bit() barrier()
-#define smp_mb__after_clear_bit()  barrier()
+#define smp_mb__before_clear_bit() do {} while (0)
+#define smp_mb__after_clear_bit()  do {} while (0)
 
 /**
  * __change_bit - Toggle a bit in memory
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 5/8] i386: bitops: Contain warnings fallout from the death of volatiles

2007-07-23 Thread Satyam Sharma
From: Satyam Sharma <[EMAIL PROTECTED]>

[5/8] i386: bitops: Contain warnings fallout from the death of volatiles

The wrappers below included from all over tree re-used "volatile" just
because the bitops used them. With them killed, almost every file ends
up crying about:

warning: passing argument 2 of 'set_bit' discards qualifiers from
pointer target type

Silence these bogus warnings by killing volatile casts a second time.

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>
Cc: David Howells <[EMAIL PROTECTED]>
Cc: Nick Piggin <[EMAIL PROTECTED]>

---

[ Cscope tells us there are other such wrappers that didn't show up
  in my test .config -- I'll update them shortly.

  Also, I should probably merge this patch with the previous one.
  Otherwise git-bisecters who hit the window between these two
  patches will be flooded with bogus warnings and would probably
  want to kill me :-) ]

 include/linux/cpumask.h  |4 ++--
 include/linux/nodemask.h |4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 23f5514..49f6ed4 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -89,13 +89,13 @@ typedef struct { DECLARE_BITMAP(bits, NR_CPUS); } cpumask_t;
 extern cpumask_t _unused_cpumask_arg_;
 
 #define cpu_set(cpu, dst) __cpu_set((cpu), &(dst))
-static inline void __cpu_set(int cpu, volatile cpumask_t *dstp)
+static inline void __cpu_set(int cpu, cpumask_t *dstp)
 {
set_bit(cpu, dstp->bits);
 }
 
 #define cpu_clear(cpu, dst) __cpu_clear((cpu), &(dst))
-static inline void __cpu_clear(int cpu, volatile cpumask_t *dstp)
+static inline void __cpu_clear(int cpu, cpumask_t *dstp)
 {
clear_bit(cpu, dstp->bits);
 }
diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
index 52c54a5..81ba056 100644
--- a/include/linux/nodemask.h
+++ b/include/linux/nodemask.h
@@ -89,13 +89,13 @@ typedef struct { DECLARE_BITMAP(bits, MAX_NUMNODES); } 
nodemask_t;
 extern nodemask_t _unused_nodemask_arg_;
 
 #define node_set(node, dst) __node_set((node), &(dst))
-static inline void __node_set(int node, volatile nodemask_t *dstp)
+static inline void __node_set(int node, nodemask_t *dstp)
 {
set_bit(node, dstp->bits);
 }
 
 #define node_clear(node, dst) __node_clear((node), &(dst))
-static inline void __node_clear(int node, volatile nodemask_t *dstp)
+static inline void __node_clear(int node, nodemask_t *dstp)
 {
clear_bit(node, dstp->bits);
 }
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


<    2   3   4   5   6   7   8   9   10   11   >