Re: [RFC] extending splice for copy offloading

2013-09-27 Thread Miklos Szeredi
On Fri, Sep 27, 2013 at 10:50 PM, Zach Brown  wrote:
>> Also, I don't get the first option above at all.  The argument is that
>> it's safer to have more copies?  How much safety does another copy on
>> the same disk really give you?  Do systems that do dedup provide
>> interfaces to turn it off per-file?

I don't find the safety argument very compelling either.  There are
real semantic differences, however: ENOSPC on a write to an
(apparently) already allocated block.  That could be a bit
unexpected.  Do we need a fallocate extension to deal with shared
blocks?
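
To make the ENOSPC concern concrete, here is a minimal userspace sketch
(not from the thread). It preallocates a range and then overwrites it.
The helper name prealloc_then_write() is made up for illustration. On a
filesystem without shared blocks the overwrite is normally expected to
succeed once the preallocation has; with shared (copy-on-write) blocks
the pwrite() step may still return ENOSPC, which is the surprise
described above.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: preallocate [0, len) in fd, then overwrite it.
 * Returns 0 on success, or the error number of the step that failed.
 */
static int prealloc_then_write(int fd, const char *buf, size_t len)
{
	/* posix_fallocate() returns the error number directly, not via errno. */
	int err = posix_fallocate(fd, 0, (off_t)len);
	size_t done = 0;

	if (err)
		return err;

	while (done < len) {
		ssize_t n = pwrite(fd, buf + done, len - done, (off_t)done);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			/*
			 * If the blocks are shared with another file, ENOSPC
			 * can show up here even though the range was
			 * allocated above.
			 */
			return errno;
		}
		if (n == 0)
			return EIO;	/* no progress; give up */
		done += (size_t)n;
	}
	return 0;
}
```

A caller that relies on preallocation to rule out ENOSPC would have to
treat a nonzero return from the write loop as possible on such
filesystems.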

Thanks,
Miklos
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] carl9170: fix leaks at failure path in carl9170_usb_probe()

2013-09-27 Thread Alexey Khoroshilov

On 28.09.2013 00:17, Fabio Estevam wrote:
> On Sat, Sep 28, 2013 at 12:51 AM, Alexey Khoroshilov wrote:
>> -   return request_firmware_nowait(THIS_MODULE, 1, CARL9170FW_NAME,
>> +   err = request_firmware_nowait(THIS_MODULE, 1, CARL9170FW_NAME,
>>      &ar->udev->dev, GFP_KERNEL, ar, carl9170_usb_firmware_step2);
>> +   if (err) {
>> +   usb_put_dev(udev);
>> +   usb_put_dev(udev);
>
> You are doing the same free twice.

Yes, because the reference was taken twice.

> I guess you meant to also free: usb_put_dev(ar->udev)

udev and ar->udev are equal, so technically the patch is correct.

I agree that there is some inconsistency, but I would prefer to fix it
at the usb_get_dev() side, with a comment explaining the reason for the
double get.
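
The point about balancing gets and puts can be illustrated with a small
userspace model. This is only a sketch: toy_dev, toy_get(), toy_put()
and toy_probe() are hypothetical stand-ins, not the USB core API. It
shows that when a probe routine takes two references to the same
object, the failure path has to drop two, whichever pointer is used to
name the object.

```c
#include <stddef.h>

/* Hypothetical refcounted object; not the real struct usb_device. */
struct toy_dev {
	int refs;
	int released;	/* set once the last reference is dropped */
};

struct toy_priv {
	struct toy_dev *udev;
};

static struct toy_dev *toy_get(struct toy_dev *d)
{
	if (d)
		d->refs++;
	return d;
}

static void toy_put(struct toy_dev *d)
{
	if (d && --d->refs == 0)
		d->released = 1;
}

/* Takes two references; on failure both are dropped again. */
static int toy_probe(struct toy_dev *udev, struct toy_priv *priv, int fail)
{
	toy_get(udev);			/* first reference */
	priv->udev = toy_get(udev);	/* second one, kept in the private data */

	if (fail) {
		toy_put(udev);
		toy_put(priv->udev);	/* same object as udev */
		priv->udev = NULL;
		return -1;
	}
	return 0;
}
```

Dropping only one reference on the failure path would leave the count
one too high, so the object would never be released.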


--
Alexey



Re: [PATCH 16/22] dm: Refactor for new bio cloning/splitting

2013-09-27 Thread Mike Snitzer
On Wed, Aug 07 2013 at  5:54pm -0400,
Kent Overstreet  wrote:

> We need to convert the dm code to the new bvec_iter primitives which
> respect bi_bvec_done; they also allow us to drastically simplify dm's
> bio splitting code.
> 
> Also kill bio_sector_offset(), dm was the only user and it doesn't make
> much sense anymore.
> 
> Signed-off-by: Kent Overstreet 
> Cc: Jens Axboe 
> Cc: Alasdair Kergon 
> Cc: dm-de...@redhat.com
> ---
>  drivers/md/dm.c | 170 
> ++--
>  fs/bio.c|  38 
>  include/linux/bio.h |   1 -
>  3 files changed, 18 insertions(+), 191 deletions(-)
> 
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 5544af7..696269d 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1050,7 +1050,6 @@ struct clone_info {



>  /*
>   * Creates a bio that consists of range of complete bvecs.
>   */
>  static void clone_bio(struct dm_target_io *tio, struct bio *bio,
> -   sector_t sector, unsigned short idx,
> -   unsigned short bv_count, unsigned len)
> +   sector_t sector, unsigned len)
>  {
>   struct bio *clone = &tio->clone;
> - unsigned trim = 0;
>  
>   __bio_clone(clone, bio);
> - bio_setup_sector(clone, sector, len);
> - bio_setup_bv(clone, idx, bv_count);
>  
> - if (idx != bio->bi_iter.bi_idx ||
> - clone->bi_iter.bi_size < bio->bi_iter.bi_size)
> - trim = 1;
> - clone_bio_integrity(bio, clone, idx, len, 0, trim);
> + if (bio_integrity(bio))
> + bio_integrity_clone(clone, bio, GFP_NOIO);
> +
> + bio_advance(clone, (sector - clone->bi_iter.bi_sector) << 9);
> + bio->bi_iter.bi_size = len << 9;
> +
> + if (bio_integrity(bio))
> + bio_integrity_trim(clone, 0, len);
>  }
>  
>  static struct dm_target_io *alloc_tio(struct clone_info *ci,
> @@ -1182,10 +1137,7 @@ static int __send_empty_flush(struct clone_info *ci)
>  }
>  
>  static void __clone_and_map_data_bio(struct clone_info *ci, struct dm_target *ti,
> -  sector_t sector, int nr_iovecs,
> -  unsigned short idx, unsigned short bv_count,
> -  unsigned offset, unsigned len,
> -  unsigned split_bvec)
> +  sector_t sector, unsigned len)
>  {
>   struct bio *bio = ci->bio;
>   struct dm_target_io *tio;
> @@ -1199,11 +1151,8 @@ static void __clone_and_map_data_bio(struct clone_info *ci, struct dm_target *ti
>   num_target_bios = ti->num_write_bios(ti, bio);
>  
>   for (target_bio_nr = 0; target_bio_nr < num_target_bios; target_bio_nr++) {
> - tio = alloc_tio(ci, ti, nr_iovecs, target_bio_nr);
> - if (split_bvec)
> - clone_split_bio(tio, bio, sector, idx, offset, len);
> - else
> - clone_bio(tio, bio, sector, idx, bv_count, len);
> + tio = alloc_tio(ci, ti, 0, target_bio_nr);
> + clone_bio(tio, bio, sector, len);
>   __map_bio(tio);
>   }
>  }

Hey Kent,

I haven't been able to pinpoint the issue yet, but using your for-jens
branch, if I create a dm-thin volume with this lvm command:
lvcreate -L20G -V20G -T vg/pool --name thinlv

and try to format /dev/vg/thinlv with XFS the kernel warns and then
hangs with the following:

WARNING: CPU: 0 PID: 11789 at include/linux/bio.h:202 bio_advance+0xd0/0xe0()
Attempted to advance past end of bvec iter
Modules linked in: dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio
libcrc32c skd(O) ebtable_nat ebtables xt_CHECKSUM iptable_mangle bridge autofs4
target_core_iblock target_core_file target_core_pscsi target_core_mod configfs
bnx2fc fcoe libfcoe 8021q libfc garp stp llc scsi_transport_fc scsi_tgt sunrpc
cpufreq_ondemand ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter
ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack
ip6table_filter ip6_tables bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi
cxgb3 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dm_mirror
dm_region_hash dm_log vhost_net macvtap macvlan vhost tun kvm_intel kvm
iTCO_wdt iTCO_vendor_support microcode i2c_i801 lpc_ich mfd_core igb
i2c_algo_bit i2c_core i7core_edac edac_core ixgbe dca ptp pps_core mdio dm_mod
ses enclosure sg acpi_cpufreq freq_table ext4 jbd2 mbcache sr_mod cdrom
pata_acpi ata_generic ata_piix sd_mod crc_t10dif crct10dif_common megaraid_sas
CPU: 0 PID: 11789 Comm: mkfs.xfs Tainted: GW  O 3.12.0-rc2.snitm+ #74
Hardware name: FUJITSU PRIMERGY RX300 S6 /D2619, BIOS 6.00 Rev. 1.10.2619.N1 05/24/2011
 00ca 8803313156a8 8151e8e8 00ca
 8803313156f8 8803313156e8 8104c23c 8803
 8802dd524220 0400 8802ddfb9680 
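
For readers unfamiliar with the warning above, the following userspace
sketch models what "advance past end of bvec iter" means. It is an
illustration only. The names seg, seg_iter and seg_iter_advance() are
made up, and unlike the kernel helper this model refuses the request
instead of just warning. The iterator tracks the bytes remaining, the
current segment and the offset into it; advancing by more than the
remaining size would walk off the end of the segment array.

```c
#include <stdio.h>

struct seg {
	unsigned len;		/* length of one segment in bytes */
};

struct seg_iter {
	unsigned size;		/* bytes remaining in the whole request */
	unsigned idx;		/* index of the current segment */
	unsigned done;		/* bytes already consumed in that segment */
};

/* Returns 0 on success, -1 if asked to advance past the end. */
static int seg_iter_advance(const struct seg *segs, struct seg_iter *it,
			    unsigned bytes)
{
	if (bytes > it->size) {
		fprintf(stderr, "attempted to advance past end of iterator\n");
		return -1;
	}

	while (bytes) {
		unsigned left = segs[it->idx].len - it->done;
		unsigned step = bytes < left ? bytes : left;

		bytes -= step;
		it->size -= step;
		it->done += step;

		/* Segment fully consumed: move on to the next one. */
		if (it->done == segs[it->idx].len) {
			it->done = 0;
			it->idx++;
		}
	}
	return 0;
}
```

In this model, an iterator whose size is smaller than the requested
advance trips the check, which is the same class of condition the
WARNING above reports.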

[PATCH] driver-core: remove struct bus_type.drv_attrs

2013-09-27 Thread Greg Kroah-Hartman
From: Greg Kroah-Hartman 

Now that all in-kernel users of bus_type.drv_attrs have been converted
to use drv_groups instead, the drv_attrs field, and logic surrounding
it, can be removed.

Signed-off-by: Greg Kroah-Hartman 
---
 drivers/base/bus.c |   40 ++--
 include/linux/device.h |2 --
 2 files changed, 2 insertions(+), 40 deletions(-)

--- a/drivers/base/bus.c
+++ b/drivers/base/bus.c
@@ -591,37 +591,6 @@ void bus_remove_device(struct device *de
bus_put(dev->bus);
 }
 
-static int driver_add_attrs(struct bus_type *bus, struct device_driver *drv)
-{
-   int error = 0;
-   int i;
-
-   if (bus->drv_attrs) {
-   for (i = 0; bus->drv_attrs[i].attr.name; i++) {
-   error = driver_create_file(drv, &bus->drv_attrs[i]);
-   if (error)
-   goto err;
-   }
-   }
-done:
-   return error;
-err:
-   while (--i >= 0)
-   driver_remove_file(drv, &bus->drv_attrs[i]);
-   goto done;
-}
-
-static void driver_remove_attrs(struct bus_type *bus,
-   struct device_driver *drv)
-{
-   int i;
-
-   if (bus->drv_attrs) {
-   for (i = 0; bus->drv_attrs[i].attr.name; i++)
-   driver_remove_file(drv, &bus->drv_attrs[i]);
-   }
-}
-
 static int __must_check add_bind_files(struct device_driver *drv)
 {
int ret;
@@ -720,16 +689,12 @@ int bus_add_driver(struct device_driver
printk(KERN_ERR "%s: uevent attr (%s) failed\n",
__func__, drv->name);
}
-   error = driver_add_attrs(bus, drv);
+   error = driver_add_groups(drv, bus->drv_groups);
if (error) {
/* How the hell do we get out of this pickle? Give up */
-   printk(KERN_ERR "%s: driver_add_attrs(%s) failed\n",
-   __func__, drv->name);
-   }
-   error = driver_add_groups(drv, bus->drv_groups);
-   if (error)
printk(KERN_ERR "%s: driver_create_groups(%s) failed\n",
__func__, drv->name);
+   }
 
if (!drv->suppress_bind_attrs) {
error = add_bind_files(drv);
@@ -766,7 +731,6 @@ void bus_remove_driver(struct device_dri
 
if (!drv->suppress_bind_attrs)
remove_bind_files(drv);
-   driver_remove_attrs(drv->bus, drv);
driver_remove_groups(drv, drv->bus->drv_groups);
driver_remove_file(drv, &driver_attr_uevent);
klist_remove(&drv->p->knode_bus);
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -64,7 +64,6 @@ extern void bus_remove_file(struct bus_t
  * @dev_name:  Used for subsystems to enumerate devices like ("foo%u", 
dev->id).
  * @dev_root:  Default device to use as the parent.
  * @dev_attrs: Default attributes of the devices on the bus.
- * @drv_attrs: Default attributes of the device drivers on the bus.
  * @bus_groups:Default attributes of the bus.
  * @dev_groups:Default attributes of the devices on the bus.
  * @drv_groups: Default attributes of the device drivers on the bus.
@@ -106,7 +105,6 @@ struct bus_type {
const char  *dev_name;
struct device   *dev_root;
struct device_attribute *dev_attrs; /* use dev_groups instead */
-   struct driver_attribute *drv_attrs; /* use drv_groups instead */
const struct attribute_group **bus_groups;
const struct attribute_group **dev_groups;
const struct attribute_group **drv_groups;
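
The removed driver_add_attrs() helper follows a common pattern: create
entries from a NULL-terminated array in order and, if one fails, undo
the ones already created in reverse. A minimal userspace sketch of that
pattern is below. toy_attr, toy_create() and toy_remove() are
hypothetical names, and the "fails if the name starts with '!'" rule
exists only to make the failure path easy to exercise.

```c
#include <stddef.h>

struct toy_attr {
	const char *name;	/* NULL marks the end of the array */
	int created;
};

/* Hypothetical creation hook: fails for names starting with '!'. */
static int toy_create(struct toy_attr *a)
{
	if (a->name[0] == '!')
		return -1;
	a->created = 1;
	return 0;
}

static void toy_remove(struct toy_attr *a)
{
	a->created = 0;
}

/* Create every attribute, or none of them. */
static int toy_add_attrs(struct toy_attr *attrs)
{
	int error = 0;
	int i;

	if (!attrs)
		return 0;

	for (i = 0; attrs[i].name; i++) {
		error = toy_create(&attrs[i]);
		if (error)
			break;
	}

	/* On failure, i indexes the entry that failed; undo the earlier ones. */
	if (error)
		while (--i >= 0)
			toy_remove(&attrs[i]);

	return error;
}
```

Attribute groups move this bookkeeping into one shared helper, so
per-bus loops like the one deleted by the patch are no longer needed.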


[PATCH] driver-core: remove struct bus_type.bus_attrs

2013-09-27 Thread Greg Kroah-Hartman
From: Greg Kroah-Hartman 

Now that all in-kernel users of bus_type.bus_attrs have been converted
to use bus_groups instead, the bus_attrs field, and logic surrounding
it, can be removed.

Signed-off-by: Greg Kroah-Hartman 

---
 drivers/base/bus.c |   42 --
 include/linux/device.h |2 --
 2 files changed, 44 deletions(-)

--- a/drivers/base/bus.c
+++ b/drivers/base/bus.c
@@ -846,42 +846,6 @@ struct bus_type *find_bus(char *name)
 }
 #endif  /*  0  */
 
-
-/**
- * bus_add_attrs - Add default attributes for this bus.
- * @bus: Bus that has just been registered.
- */
-
-static int bus_add_attrs(struct bus_type *bus)
-{
-   int error = 0;
-   int i;
-
-   if (bus->bus_attrs) {
-   for (i = 0; bus->bus_attrs[i].attr.name; i++) {
-   error = bus_create_file(bus, &bus->bus_attrs[i]);
-   if (error)
-   goto err;
-   }
-   }
-done:
-   return error;
-err:
-   while (--i >= 0)
-   bus_remove_file(bus, &bus->bus_attrs[i]);
-   goto done;
-}
-
-static void bus_remove_attrs(struct bus_type *bus)
-{
-   int i;
-
-   if (bus->bus_attrs) {
-   for (i = 0; bus->bus_attrs[i].attr.name; i++)
-   bus_remove_file(bus, &bus->bus_attrs[i]);
-   }
-}
-
 static int bus_add_groups(struct bus_type *bus,
  const struct attribute_group **groups)
 {
@@ -983,9 +947,6 @@ int bus_register(struct bus_type *bus)
if (retval)
goto bus_probe_files_fail;
 
-   retval = bus_add_attrs(bus);
-   if (retval)
-   goto bus_attrs_fail;
retval = bus_add_groups(bus, bus->bus_groups);
if (retval)
goto bus_groups_fail;
@@ -994,8 +955,6 @@ int bus_register(struct bus_type *bus)
return 0;
 
 bus_groups_fail:
-   bus_remove_attrs(bus);
-bus_attrs_fail:
remove_probe_files(bus);
 bus_probe_files_fail:
kset_unregister(bus->p->drivers_kset);
@@ -1024,7 +983,6 @@ void bus_unregister(struct bus_type *bus
pr_debug("bus: '%s': unregistering\n", bus->name);
if (bus->dev_root)
device_unregister(bus->dev_root);
-   bus_remove_attrs(bus);
bus_remove_groups(bus, bus->bus_groups);
remove_probe_files(bus);
kset_unregister(bus->p->drivers_kset);
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -63,7 +63,6 @@ extern void bus_remove_file(struct bus_t
  * @name:  The name of the bus.
  * @dev_name:  Used for subsystems to enumerate devices like ("foo%u", 
dev->id).
  * @dev_root:  Default device to use as the parent.
- * @bus_attrs: Default attributes of the bus.
  * @dev_attrs: Default attributes of the devices on the bus.
  * @drv_attrs: Default attributes of the device drivers on the bus.
  * @bus_groups:Default attributes of the bus.
@@ -106,7 +105,6 @@ struct bus_type {
const char  *name;
const char  *dev_name;
struct device   *dev_root;
-   struct bus_attribute*bus_attrs; /* use bus_groups instead */
struct device_attribute *dev_attrs; /* use dev_groups instead */
struct driver_attribute *drv_attrs; /* use drv_groups instead */
const struct attribute_group **bus_groups;


Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file

2013-09-27 Thread Jason Low
On Fri, Sep 27, 2013 at 7:19 PM, Paul E. McKenney
 wrote:
> On Fri, Sep 27, 2013 at 04:54:06PM -0700, Jason Low wrote:
>> On Fri, Sep 27, 2013 at 4:01 PM, Paul E. McKenney
>>  wrote:
>> > Yep.  The previous lock holder's smp_wmb() won't keep either the compiler
>> > or the CPU from reordering things for the new lock holder.  They could for
>> > example reorder the critical section to precede the node->locked check,
>> > which would be very bad.
>>
>> Paul, Tim, Longman,
>>
>> How would you like the proposed changes below?
>
> Could you point me at what this applies to?  I can find flaws looking
> at random pieces, given a little luck, but at some point I need to look
> at the whole thing.  ;-)

Sure. Here is a link to the patch we are trying to modify:
https://lkml.org/lkml/2013/9/25/532

Also, below is what the mcs_spin_lock() and mcs_spin_unlock()
functions would look like after applying the proposed changes.

static noinline
void mcs_spin_lock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
{
struct mcs_spin_node *prev;

/* Init node */
node->locked = 0;
node->next   = NULL;

prev = xchg(lock, node);
if (likely(prev == NULL)) {
/* Lock acquired. No need to set node->locked since it
won't be used */
return;
}
ACCESS_ONCE(prev->next) = node;
/* Wait until the lock holder passes the lock down */
while (!ACCESS_ONCE(node->locked))
arch_mutex_cpu_relax();
smp_mb();
}

static void mcs_spin_unlock(struct mcs_spin_node **lock, struct
mcs_spin_node *node)
{
struct mcs_spin_node *next = ACCESS_ONCE(node->next);

if (likely(!next)) {
/*
 * Release the lock by setting it to NULL
 */
if (cmpxchg(lock, node, NULL) == node)
return;
/* Wait until the next pointer is set */
while (!(next = ACCESS_ONCE(node->next)))
arch_mutex_cpu_relax();
}
smp_wmb();
ACCESS_ONCE(next->locked) = 1;
}
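
For comparison, here is how the same queue lock might look in portable
C11 atomics, where the ordering is attached to the individual loads and
stores instead of separate smp_mb()/smp_wmb() calls. This is a sketch
under my own assumptions, not the proposed kernel code: the names
mcs_node, mcs_lock_t, mcs_lock() and mcs_unlock() are chosen here, and
the empty spin loops stand in for arch_mutex_cpu_relax().

```c
#include <stdatomic.h>
#include <stddef.h>

struct mcs_node {
	_Atomic(struct mcs_node *) next;
	atomic_int locked;	/* set to 1 when the lock is handed to us */
};

/* Tail of the waiter queue; NULL when the lock is free. */
typedef _Atomic(struct mcs_node *) mcs_lock_t;

static void mcs_lock(mcs_lock_t *lock, struct mcs_node *node)
{
	struct mcs_node *prev;

	atomic_store_explicit(&node->next, NULL, memory_order_relaxed);
	atomic_store_explicit(&node->locked, 0, memory_order_relaxed);

	/* Become the new tail; acquire pairs with the release in unlock. */
	prev = atomic_exchange_explicit(lock, node, memory_order_acq_rel);
	if (prev == NULL)
		return;		/* uncontended: lock acquired */

	/* Link behind the previous tail so it can find us at unlock time. */
	atomic_store_explicit(&prev->next, node, memory_order_release);

	/* Acquire load: the critical section cannot move above this. */
	while (!atomic_load_explicit(&node->locked, memory_order_acquire))
		;
}

static void mcs_unlock(mcs_lock_t *lock, struct mcs_node *node)
{
	struct mcs_node *next =
		atomic_load_explicit(&node->next, memory_order_acquire);

	if (next == NULL) {
		struct mcs_node *expected = node;

		/* No visible successor: try to mark the lock free. */
		if (atomic_compare_exchange_strong_explicit(lock, &expected,
				NULL, memory_order_release,
				memory_order_relaxed))
			return;

		/* A successor is enqueueing; wait for it to link itself. */
		while ((next = atomic_load_explicit(&node->next,
				memory_order_acquire)) == NULL)
			;
	}

	/* Release store: the critical section cannot move below this. */
	atomic_store_explicit(&next->locked, 1, memory_order_release);
}
```

Each thread passes its own node, typically on its stack, and must use
the same node for the matching unlock.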


[PATCH] tps65090-charger: Use "IS_ENABLED(CONFIG_OF)" for DT code.

2013-09-27 Thread Manish Badarkhe
Instead of "#if defined(CONFIG_OF)", use "IS_ENABLED(CONFIG_OF)"
for DT code to avoid ifdeffery in the code.
Also, arrange header files alphabetically.

Signed-off-by: Manish Badarkhe 
---
:100644 100644 bdd7b9b... 8b9c406... M  drivers/power/tps65090-charger.c
 drivers/power/tps65090-charger.c |   19 +--
 1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/drivers/power/tps65090-charger.c b/drivers/power/tps65090-charger.c
index bdd7b9b..8b9c406 100644
--- a/drivers/power/tps65090-charger.c
+++ b/drivers/power/tps65090-charger.c
@@ -15,15 +15,17 @@
  * You should have received a copy of the GNU General Public License
  * along with this program.  If not, see .
  */
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
-#include 
-#include 
+#include 
 #include 
 #include 
+#include 
+
 #include 
 
 #define TPS65090_REG_INTR_STS  0x00
@@ -185,10 +187,6 @@ static irqreturn_t tps65090_charger_isr(int irq, void 
*dev_id)
return IRQ_HANDLED;
 }
 
-#if defined(CONFIG_OF)
-
-#include 
-
 static struct tps65090_platform_data *
tps65090_parse_dt_charger_data(struct platform_device *pdev)
 {
@@ -210,13 +208,6 @@ static struct tps65090_platform_data *
return pdata;
 
 }
-#else
-static struct tps65090_platform_data *
-   tps65090_parse_dt_charger_data(struct platform_device *pdev)
-{
-   return NULL;
-}
-#endif
 
 static int tps65090_charger_probe(struct platform_device *pdev)
 {
@@ -228,7 +219,7 @@ static int tps65090_charger_probe(struct platform_device 
*pdev)
 
pdata = dev_get_platdata(pdev->dev.parent);
 
-   if (!pdata && pdev->dev.of_node)
+   if (IS_ENABLED(CONFIG_OF) && !pdata && pdev->dev.of_node)
pdata = tps65090_parse_dt_charger_data(pdev);
 
if (!pdata) {
-- 
1.7.10.4
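
As background on why a plain C "if" can replace the #ifdef, the sketch
below models the preprocessor trick behind an IS_ENABLED()-style test
in userspace. The TOY_ names are hypothetical and this is a simplified
variant, not the kernel's macro (which, among other things, also
handles options built as modules). The macro expands to the constant 1
or 0, so the guarded call is always parsed and type-checked and the
compiler can drop the dead branch.

```c
/* Hypothetical config switches: FOO is on, BAR is left undefined. */
#define TOY_CONFIG_FOO 1

/* If the option expands to 1, this placeholder injects an extra "0,". */
#define TOY_PLACEHOLDER_1 0,
#define TOY_SECOND(ignored, val, ...) val

#define TOY_IS_DEFINED(x)  TOY_IS_DEFINED_(x)
#define TOY_IS_DEFINED_(v) TOY_IS_DEFINED__(TOY_PLACEHOLDER_##v)
/*
 * Defined:   TOY_SECOND(0, 1, 0, 0)      -> 1
 * Undefined: TOY_SECOND(junk 1, 0, 0)    -> 0
 */
#define TOY_IS_DEFINED__(arg1_or_junk) TOY_SECOND(arg1_or_junk 1, 0, 0)

/* Always compiled, even when the option is off. */
static int toy_parse_dt(void)
{
	return 42;
}

static int toy_probe(int have_pdata, int have_of_node)
{
	if (TOY_IS_DEFINED(TOY_CONFIG_FOO) && !have_pdata && have_of_node)
		return toy_parse_dt();
	return -1;
}
```

The trade-off is that everything referenced inside the branch has to
exist in both configurations, which is why the patch also drops the
stub version of the parse function.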



Re: [PATCH] carl9170: fix leaks at failure path in carl9170_usb_probe()

2013-09-27 Thread Fabio Estevam
On Sat, Sep 28, 2013 at 12:51 AM, Alexey Khoroshilov
 wrote:

> -   return request_firmware_nowait(THIS_MODULE, 1, CARL9170FW_NAME,
> +   err = request_firmware_nowait(THIS_MODULE, 1, CARL9170FW_NAME,
>     &ar->udev->dev, GFP_KERNEL, ar, carl9170_usb_firmware_step2);
> +   if (err) {
> +   usb_put_dev(udev);
> +   usb_put_dev(udev);

You are doing the same free twice.

I guess you meant to also free: usb_put_dev(ar->udev)


Re: [PATCH] HID: roccat: Fix "cannot create duplicate filename" problems

2013-09-27 Thread Greg KH
On Sat, Sep 28, 2013 at 05:57:46AM +0200, Stefan Achatz wrote:
> Fixing some wrong macro stringification/concatenation.
> 
> Cc: Greg Kroah-Hartman 
> Signed-off-by: Stefan Achatz 
> ---
>  drivers/hid/hid-roccat-kone.c |2 +-
>  drivers/hid/hid-roccat-koneplus.c |4 ++--
>  drivers/hid/hid-roccat-kovaplus.c |4 ++--
>  drivers/hid/hid-roccat-pyra.c |4 ++--
>  4 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/hid/hid-roccat-kone.c b/drivers/hid/hid-roccat-kone.c
> index 602c188..6101816 100644
> --- a/drivers/hid/hid-roccat-kone.c
> +++ b/drivers/hid/hid-roccat-kone.c
> @@ -382,7 +382,7 @@ static ssize_t kone_sysfs_write_profilex(struct file *fp,
>  }
>  #define PROFILE_ATTR(number) \
>  static struct bin_attribute bin_attr_profile##number = { \
> - .attr = { .name = "profile##number", .mode = 0660 },\
> + .attr = { .name = "profile" #number, .mode = 0660 },\

Ugh, very sorry about that, I hate macros in C at times :)

Jiri, can you take this through your tree, or I can if you want, it
needs to go in for 3.12-final.

If yours, feel free to add:

Acked-by: Greg Kroah-Hartman 

Stefan, thanks for fixing my bugs, it's much appreciated.

greg k-h
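
For anyone puzzled by the one-character difference in the quoted hunk,
here is a tiny standalone illustration. The macro names are made up for
this example. Inside a string literal the preprocessor does no pasting,
so every expansion of the broken form yields the same name, which
matches the "duplicate filename" symptom in the subject. The fixed form
stringifies the argument with # and relies on adjacent string literals
being concatenated.

```c
#include <string.h>

/* Broken: "##number" is just text inside the literal, nothing is pasted. */
#define BROKEN_NAME(number) "profile##number"

/* Fixed: #number becomes a string literal and the pieces are joined. */
#define FIXED_NAME(number) "profile" #number
#define FIXED_SETTINGS_NAME(number) "profile" #number "_settings"
```

With these definitions BROKEN_NAME(1) and BROKEN_NAME(2) are the same
string, while FIXED_NAME(1) and FIXED_NAME(2) differ.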


[PATCH] HID: roccat: Fix "cannot create duplicate filename" problems

2013-09-27 Thread Stefan Achatz
Fixing some wrong macro stringification/concatenation.

Cc: Greg Kroah-Hartman 
Signed-off-by: Stefan Achatz 
---
 drivers/hid/hid-roccat-kone.c |2 +-
 drivers/hid/hid-roccat-koneplus.c |4 ++--
 drivers/hid/hid-roccat-kovaplus.c |4 ++--
 drivers/hid/hid-roccat-pyra.c |4 ++--
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/hid/hid-roccat-kone.c b/drivers/hid/hid-roccat-kone.c
index 602c188..6101816 100644
--- a/drivers/hid/hid-roccat-kone.c
+++ b/drivers/hid/hid-roccat-kone.c
@@ -382,7 +382,7 @@ static ssize_t kone_sysfs_write_profilex(struct file *fp,
 }
 #define PROFILE_ATTR(number)   \
 static struct bin_attribute bin_attr_profile##number = {   \
-   .attr = { .name = "profile##number", .mode = 0660 },\
+   .attr = { .name = "profile" #number, .mode = 0660 },\
.size = sizeof(struct kone_profile),\
.read = kone_sysfs_read_profilex,   \
.write = kone_sysfs_write_profilex, \
diff --git a/drivers/hid/hid-roccat-koneplus.c b/drivers/hid/hid-roccat-koneplus.c
index 5ddf605..5e99fcd 100644
--- a/drivers/hid/hid-roccat-koneplus.c
+++ b/drivers/hid/hid-roccat-koneplus.c
@@ -229,13 +229,13 @@ static ssize_t koneplus_sysfs_read_profilex_buttons(struct file *fp,
 
 #define PROFILE_ATTR(number)   \
 static struct bin_attribute bin_attr_profile##number##_settings = {\
-   .attr = { .name = "profile##number##_settings", .mode = 0440 }, \
+   .attr = { .name = "profile" #number "_settings", .mode = 0440 }, \
.size = KONEPLUS_SIZE_PROFILE_SETTINGS, \
.read = koneplus_sysfs_read_profilex_settings,  \
.private = _numbers[number-1],  \
 }; \
 static struct bin_attribute bin_attr_profile##number##_buttons = { \
-   .attr = { .name = "profile##number##_buttons", .mode = 0440 },  \
+   .attr = { .name = "profile" #number "_buttons", .mode = 0440 }, \
.size = KONEPLUS_SIZE_PROFILE_BUTTONS,  \
.read = koneplus_sysfs_read_profilex_buttons,   \
.private = _numbers[number-1],  \
diff --git a/drivers/hid/hid-roccat-kovaplus.c b/drivers/hid/hid-roccat-kovaplus.c
index 515bc03..0c8e1ef 100644
--- a/drivers/hid/hid-roccat-kovaplus.c
+++ b/drivers/hid/hid-roccat-kovaplus.c
@@ -257,13 +257,13 @@ static ssize_t kovaplus_sysfs_read_profilex_buttons(struct file *fp,
 
 #define PROFILE_ATTR(number)   \
 static struct bin_attribute bin_attr_profile##number##_settings = {\
-   .attr = { .name = "profile##number##_settings", .mode = 0440 }, \
+   .attr = { .name = "profile" #number "_settings", .mode = 0440 }, \
.size = KOVAPLUS_SIZE_PROFILE_SETTINGS, \
.read = kovaplus_sysfs_read_profilex_settings,  \
.private = _numbers[number-1],  \
 }; \
 static struct bin_attribute bin_attr_profile##number##_buttons = { \
-   .attr = { .name = "profile##number##_buttons", .mode = 0440 },  \
+   .attr = { .name = "profile" #number "_buttons", .mode = 0440 }, \
.size = KOVAPLUS_SIZE_PROFILE_BUTTONS,  \
.read = kovaplus_sysfs_read_profilex_buttons,   \
.private = _numbers[number-1],  \
diff --git a/drivers/hid/hid-roccat-pyra.c b/drivers/hid/hid-roccat-pyra.c
index 5a6dbbe..1a07e07 100644
--- a/drivers/hid/hid-roccat-pyra.c
+++ b/drivers/hid/hid-roccat-pyra.c
@@ -225,13 +225,13 @@ static ssize_t pyra_sysfs_read_profilex_buttons(struct file *fp,
 
 #define PROFILE_ATTR(number)   \
 static struct bin_attribute bin_attr_profile##number##_settings = {\
-   .attr = { .name = "profile##number##_settings", .mode = 0440 }, \
+   .attr = { .name = "profile" #number "_settings", .mode = 0440 }, \
.size = PYRA_SIZE_PROFILE_SETTINGS, \
.read = pyra_sysfs_read_profilex_settings,  \
.private = _numbers[number-1],  \
 }; \
 static struct bin_attribute bin_attr_profile##number##_buttons = { \
-   .attr = { .name = "profile##number##_buttons", .mode = 0440 },  \
+   .attr = { .name = "profile" #number "_buttons", .mode = 0440 }, \
.size = PYRA_SIZE_PROFILE_BUTTONS,  \
.read = pyra_sysfs_read_profilex_buttons,   \
.private = _numbers[number-1],  

[PATCH] carl9170: fix leaks at failure path in carl9170_usb_probe()

2013-09-27 Thread Alexey Khoroshilov
carl9170_usb_probe() does not handle request_firmware_nowait() failure,
which leads to several leaks in this case.
The patch adds all required deallocations.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov 
---
 drivers/net/wireless/ath/carl9170/usb.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/carl9170/usb.c 
b/drivers/net/wireless/ath/carl9170/usb.c
index 307bc0d..3c76de1 100644
--- a/drivers/net/wireless/ath/carl9170/usb.c
+++ b/drivers/net/wireless/ath/carl9170/usb.c
@@ -1076,8 +1076,14 @@ static int carl9170_usb_probe(struct usb_interface *intf,
 
carl9170_set_state(ar, CARL9170_STOPPED);
 
-   return request_firmware_nowait(THIS_MODULE, 1, CARL9170FW_NAME,
+   err = request_firmware_nowait(THIS_MODULE, 1, CARL9170FW_NAME,
    &ar->udev->dev, GFP_KERNEL, ar, carl9170_usb_firmware_step2);
+   if (err) {
+   usb_put_dev(udev);
+   usb_put_dev(udev);
+   carl9170_free(ar);
+   }
+   return err;
 }
 
 static void carl9170_usb_disconnect(struct usb_interface *intf)
-- 
1.8.1.2



Re: [PATCH v4 1/4] ARM: dts: am335x-bone: add CD for mmc1

2013-09-27 Thread Jason Kridner
On Thu, Sep 12, 2013 at 2:35 PM, Koen Kooi  wrote:
> From: Alexander Holler 
>
> This enables the use of MMC cards even when no card was inserted at boot.
>
> Signed-off-by: Alexander Holler 
> Signed-off-by: Koen Kooi 

Acked-by: Jason Kridner 

> ---
>  arch/arm/boot/dts/am335x-bone-common.dtsi | 14 ++
>  arch/arm/boot/dts/am335x-bone.dts |  1 -
>  2 files changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm/boot/dts/am335x-bone-common.dtsi 
> b/arch/arm/boot/dts/am335x-bone-common.dtsi
> index 2f66ded..0d95d54 100644
> --- a/arch/arm/boot/dts/am335x-bone-common.dtsi
> +++ b/arch/arm/boot/dts/am335x-bone-common.dtsi
> @@ -107,6 +107,12 @@
> 0x14c (PIN_INPUT_PULLDOWN | MUX_MODE7)
> >;
> };
> +
> +   mmc1_pins: pinmux_mmc1_pins {
> +   pinctrl-single,pins = <
> +   0x160 (PIN_INPUT | MUX_MODE7) /* GPIO0_6 */
> +   >;
> +   };
> };
>
> ocp {
> @@ -260,3 +266,11 @@
> pinctrl-0 = <_mdio_default>;
> pinctrl-1 = <_mdio_sleep>;
>  };
> +
> +&mmc1 {
> +   status = "okay";
> +   pinctrl-names = "default";
> +   pinctrl-0 = <&mmc1_pins>;
> +   cd-gpios = < 6 GPIO_ACTIVE_HIGH>;
> +   cd-inverted;
> +};
> diff --git a/arch/arm/boot/dts/am335x-bone.dts 
> b/arch/arm/boot/dts/am335x-bone.dts
> index d5f43fe..0d63348 100644
> --- a/arch/arm/boot/dts/am335x-bone.dts
> +++ b/arch/arm/boot/dts/am335x-bone.dts
> @@ -17,6 +17,5 @@
>  };
>
>   {
> -   status = "okay";
> vmmc-supply = <_reg>;
>  };
> --
> 1.8.2.1
>
>
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


Re: Please revert 928bea964827d7824b548c1f8e06eccbbc4d0d7d

2013-09-27 Thread Benjamin Herrenschmidt
On Fri, 2013-09-27 at 16:44 -0700, Yinghai Lu wrote:

> > Thus the port driver bails out before calling pci_set_master(). The fix
> > is to call pci_set_master() unconditionally. However, that led me to
> > find a few interesting oddities in that port driver code:
> 
> can we partially revert that change? aka we should check get_port
> at first...
> 
> like attached.

In the meantime, can you properly submit the other one with the warning
to Linus ? It will make things more robust overall...

Also, please read my other comments. I think we are treading on very
fragile ground with that whole business of potentially disabling bridges
in the pcieport driver ...

Cheers,
Ben.




Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file

2013-09-27 Thread Waiman Long

On 09/27/2013 02:09 PM, Tim Chen wrote:
> On Fri, 2013-09-27 at 08:29 -0700, Paul E. McKenney wrote:
>> On Wed, Sep 25, 2013 at 03:10:49PM -0700, Tim Chen wrote:
>>> We will need the MCS lock code for doing optimistic spinning for rwsem.
>>> Extracting the MCS code from mutex.c and put into its own file allow us
>>> to reuse this code easily for rwsem.
>>>
>>> Signed-off-by: Tim Chen
>>> Signed-off-by: Davidlohr Bueso
>>> ---
>>>   include/linux/mcslock.h |   58 +++
>>>   kernel/mutex.c  |   58 +-
>>>   2 files changed, 65 insertions(+), 51 deletions(-)
>>>   create mode 100644 include/linux/mcslock.h
>>>
>>> diff --git a/include/linux/mcslock.h b/include/linux/mcslock.h
>>> new file mode 100644
>>> index 000..20fd3f0
>>> --- /dev/null
>>> +++ b/include/linux/mcslock.h
>>> @@ -0,0 +1,58 @@
>>> +/*
>>> + * MCS lock defines
>>> + *
>>> + * This file contains the main data structure and API definitions of MCS lock.
>>> + */
>>> +#ifndef __LINUX_MCSLOCK_H
>>> +#define __LINUX_MCSLOCK_H
>>> +
>>> +struct mcs_spin_node {
>>> +   struct mcs_spin_node *next;
>>> +   int   locked;   /* 1 if lock acquired */
>>> +};
>>> +
>>> +/*
>>> + * We don't inline mcs_spin_lock() so that perf can correctly account for the
>>> + * time spent in this lock function.
>>> + */
>>> +static noinline
>>> +void mcs_spin_lock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
>>> +{
>>> +   struct mcs_spin_node *prev;
>>> +
>>> +   /* Init node */
>>> +   node->locked = 0;
>>> +   node->next   = NULL;
>>> +
>>> +   prev = xchg(lock, node);
>>> +   if (likely(prev == NULL)) {
>>> +   /* Lock acquired */
>>> +   node->locked = 1;
>>> +   return;
>>> +   }
>>> +   ACCESS_ONCE(prev->next) = node;
>>> +   smp_wmb();
>>> +   /* Wait until the lock holder passes the lock down */
>>> +   while (!ACCESS_ONCE(node->locked))
>>> +   arch_mutex_cpu_relax();
>>> +}
>>> +
>>> +static void mcs_spin_unlock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
>>> +{
>>> +   struct mcs_spin_node *next = ACCESS_ONCE(node->next);
>>> +
>>> +   if (likely(!next)) {
>>> +   /*
>>> +* Release the lock by setting it to NULL
>>> +*/
>>> +   if (cmpxchg(lock, node, NULL) == node)
>>> +   return;
>>> +   /* Wait until the next pointer is set */
>>> +   while (!(next = ACCESS_ONCE(node->next)))
>>> +   arch_mutex_cpu_relax();
>>> +   }
>>> +   ACCESS_ONCE(next->locked) = 1;
>>> +   smp_wmb();
>>
>> Shouldn't the memory barrier precede the "ACCESS_ONCE(next->locked) = 1;"?
>> Maybe in an "else" clause of the prior "if" statement, given that the
>> cmpxchg() does it otherwise.
>>
>> Otherwise, in the case where the "if" condition is false, the critical
>> section could bleed out past the unlock.
>
> Yes, I agree with you that the smp_wmb should be moved before
> ACCESS_ONCE to prevent critical section from bleeding.  Copying Waiman
> who is the original author of the mcs code to see if he has any comments
> on things we may have missed.
>
> Tim

As a more general lock/unlock mechanism, I also agree that we should
move smp_wmb() before ACCESS_ONCE(). For the mutex case, it is used as a
queuing mechanism rather than guarding a critical section, so it doesn't
really matter.
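
The ordering requirement discussed here can be shown in isolation with
a small C11 sketch. It is an analogy, not the kernel code: the mailbox
type and function names are made up, the "ready" flag plays the role of
next->locked, and the payload stands for whatever the critical section
wrote. The ordering has to sit on the flag store, before the flag
becomes visible, so that the earlier writes cannot be seen after it.

```c
#include <stdatomic.h>

struct mailbox {
	int payload;		/* plain data, published through the flag */
	atomic_int ready;	/* 0 = empty, 1 = payload is valid */
};

static void mailbox_publish(struct mailbox *m, int value)
{
	m->payload = value;
	/* Release: the payload write cannot be reordered after this store. */
	atomic_store_explicit(&m->ready, 1, memory_order_release);
}

/* Returns 1 and fills *out if a value was published, 0 otherwise. */
static int mailbox_try_consume(struct mailbox *m, int *out)
{
	/* Acquire: the payload read cannot be reordered before this load. */
	if (!atomic_load_explicit(&m->ready, memory_order_acquire))
		return 0;
	*out = m->payload;
	return 1;
}
```

Putting a barrier after the flag store, as in the quoted patch, orders
the flag against later writes only, so it does nothing for the payload.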


Regards,
Longman


Re: [block:for-next 5/6] drivers/block/skd_main.c:441:3: error: implicit declaration of function 'readq'

2013-09-27 Thread Jens Axboe
On Fri, Sep 27 2013, Jens Axboe wrote:
> On 09/27/2013 05:26 AM, Akhil Bhansali wrote:
> > Hello Jens,
> > 
> > There are a few improvements suggested by the community for the driver:
> > 1. Replacement of custom macros for debug printing. (DPRINTK / VPRINTK).
> > 2. Reformatting of "skd_request_fn" which is too long.
> > 
> > Kindly let us know if you see any other changes required before it can
> > be moved to the mainline kernel.
> 
> Did you forget to attach the patch?

Sorry, I misread that initially. The major eye sore right now is the
PRINTK stuff. The refactor of the request_fn would be a plus. Start with
those two, and I'll take a look and see if we need to do more right now.

-- 
Jens Axboe



Re: [PATCH 1/1] rsxx: Kernel Panic caused by mapping Discards.

2013-09-27 Thread Jens Axboe
On Fri, Sep 27 2013, Philip J. Kelleher wrote:
> From: Philip J Kelleher 
> 
> This fixes a kernel panic introduced by commit
> 8d26750143341831bc312f61c5ed141eeb75b8d0, where discards
> are getting mapped through the pci_map_page function call.
> 
> The driver will now verify that a DMA is not a
> discard before issuing the pci_map_page function call.
> 
> Also, we are updating the driver version.

Thanks, applied.

-- 
Jens Axboe



Re: [PATCH v2 0/5] mm: migrate zbud pages

2013-09-27 Thread Bob Liu


On 09/28/2013 06:00 AM, Seth Jennings wrote:
> On Fri, Sep 27, 2013 at 12:16:37PM +0200, Tomasz Stanislawski wrote:
>> On 09/25/2013 11:57 PM, Seth Jennings wrote:
>>> On Wed, Sep 25, 2013 at 07:09:50PM +0200, Tomasz Stanislawski wrote:
>>>>> I just had an idea this afternoon to potentially kill both these birds with one
>>>>> stone: Replace the rbtree in zswap with an address_space.
>>>>>
>>>>> Each swap type would have its own page_tree to organize the compressed objects
>>>>> by type and offset (radix tree is more suited for this anyway) and a_ops that
>>>>> could be called by shrink_page_list() (writepage) or the migration code
>>>>> (migratepage).
>>>>>
>>>>> Then zbud pages could be put on the normal LRU list, maybe at the beginning of
>>>>> the inactive LRU so they would live for another cycle through the list, then be
>>>>> reclaimed in the normal way with the mapping->a_ops->writepage() pointing to a
>>>>> zswap_writepage() function that would decompress the pages and call
>>>>> __swap_writepage() on them.
>>>>>
>>>>> This might actually do away with the explicit pool size too as the compressed
>>>>> pool pages wouldn't be outside the control of the MM anymore.
>>>>>
>>>>> I'm just starting to explore this but I think it has promise.
>>>>>
>>>>> Seth
>>>>>
>>>>
>>>> Hi Seth,
>>>> There is a problem with the proposed idea.
>>>> The radix tree used in 'struct address_space' is a part of
>>>> a bigger data structure.
>>>> The radix tree is used to translate an offset to a page.
>>>> That is ok for zswap. But struct page has a field named 'index'.
>>>> The MM assumes that this index is an offset in radix tree
>>>> where one can find the page. A lot is done by MM to sustain
>>>> this consistency.
>>>
>>> Yes, this is how it is for page cache pages.  However, the MM is able to
>>> work differently with anonymous pages.  In the case of an anonymous
>>> page, the mapping field points to an anon_vma struct, or, if ksm is
>>> enabled and dedup'ing the page, a private ksm tracking structure.  If
>>> the anonymous page is fully unmapped and resides only in the swap cache,
>>> the page mapping is NULL.  So there is precedent for the fields to mean
>>> other things.
>>
>> Hi Seth,
>> You are right that page->mapping is NULL for pages in swap_cache but
>> page_mapping() is not NULL in such a case. The mapping is taken from
>> struct address_space swapper_spaces[]. It is still an address space,
>> and it should preserve constraints for struct address_space.
>> The same happens for page->index and page_index().
>>
>>>
>>> The question is how to mark and identify zbud pages among the other page
>>> types that will be on the LRU.  There are many ways.  The question is
>>> what is the best and most acceptable way.
>>>
>>
>> If you consider hacking I have some idea how address_space could be utilized
>> for ZBUD.
>> One solution would be using tags in a radix tree. Every entry in a radix tree
>> can have a few bits assigned to it. Currently 3 bits are supported:
>>
>> From include/linux/fs.h
>> #define PAGECACHE_TAG_DIRTY  0
>> #define PAGECACHE_TAG_WRITEBACK  1
>> #define PAGECACHE_TAG_TOWRITE  2
>>
>> You could add a new bit or utilize one of the existing ones.
>>
>> The other idea is to use a trick from RB trees and scatter-gather lists.
>> I mean using the last bits of pointers to keep some metadata.
>> Values of 'struct page *' variables are aligned to a pointer alignment, which is
>> 4 for 32-bit CPUs and 8 for 64-bit ones (not sure). This means that one
>> could use the last bit of a page pointer in a radix tree to track if a swap entry
>> refers to a lower or a higher part of a ZBUD page.
>> I think it is serious hacking/obfuscation but it may work with a minimal
>> amount of changes to MM. Adding only (x&~3) while extracting the page pointer is
>> probably enough.
>>
>> What do you think about this idea?
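
The pointer-tagging trick described above can be sketched in plain user-space C.
This is only an illustration under stated assumptions: "struct page" here is a
hypothetical stand-in, and the helper names (zbud_pack, zbud_page,
zbud_is_upper) are invented for the example and are not part of zbud or zswap.
It relies on the pointed-to object being at least 4-byte aligned, so the two low
bits of the pointer are free to carry metadata.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct page; only its
 * alignment (at least 4 bytes) matters for this sketch. */
struct page {
	unsigned long flags;
};

#define ZBUD_TAG_MASK  ((uintptr_t)3)  /* low two bits, as in the (x & ~3) mask */
#define ZBUD_TAG_UPPER ((uintptr_t)1)  /* set: entry refers to the upper buddy */

/* Fold the "upper buddy" flag into the low bit of the page pointer. */
static void *zbud_pack(struct page *page, int upper)
{
	uintptr_t raw = (uintptr_t)page;

	assert((raw & ZBUD_TAG_MASK) == 0);  /* alignment leaves the bits free */
	return (void *)(raw | (upper ? ZBUD_TAG_UPPER : 0));
}

/* Strip the tag bits to recover the real pointer. */
static struct page *zbud_page(void *entry)
{
	return (struct page *)((uintptr_t)entry & ~ZBUD_TAG_MASK);
}

/* Report which half of the page the entry refers to. */
static int zbud_is_upper(void *entry)
{
	return (int)((uintptr_t)entry & ZBUD_TAG_UPPER);
}
```

Every reader of such a slot has to go through the masking helper; a raw
dereference of a tagged entry would fault or read the wrong address, which is
the obfuscation cost mentioned above.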
> 
> I think it is a good one.
> 
> I have to say that when I first came up with the idea, I was thinking
> the address space would be at the zswap layer and the radix slots would
> hold zbud handles, not struct page pointers.
> 
> However, as I have discovered today, this is problematic when it comes
> to reclaim and migration and serializing access.
> 
> I wanted to do as much as possible in the zswap layer since anything
> done in the zbud layer would need to be duplicated in any other future
> allocator that zswap wanted to support.
> 
> Unfortunately, zbud abstracts away the struct page and that visibility
> is needed to properly do what we are talking about.
> 
> So maybe it is inevitable that this will need to be in the zbud code
> with the radix tree slots pointing to struct pages after all.
> 

But in this way, zswap_frontswap_load() can't find zswap_entry. We still
need the rbtree in current zswap.

> I like the idea of masking the bit into the struct page pointer to
> indicate which buddy maps to the offset.
> 

I have no idea why 

Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file

2013-09-27 Thread Paul E. McKenney
On Fri, Sep 27, 2013 at 04:54:06PM -0700, Jason Low wrote:
> On Fri, Sep 27, 2013 at 4:01 PM, Paul E. McKenney
>  wrote:
> > Yep.  The previous lock holder's smp_wmb() won't keep either the compiler
> > or the CPU from reordering things for the new lock holder.  They could for
> > example reorder the critical section to precede the node->locked check,
> > which would be very bad.
> 
> Paul, Tim, Longman,
> 
> How would you like the proposed changes below?

Could you point me at what this applies to?  I can find flaws looking
at random pieces, given a little luck, but at some point I need to look
at the whole thing.  ;-)

Thanx, Paul

> ---
> Subject: [PATCH] MCS: optimizations and barrier corrections
> 
> Delete the node->locked = 1 assignment if the lock is free as it won't be used.
> 
> Delete the smp_wmb() in mcs_spin_lock() and add a full memory barrier at the
> end of the mcs_spin_lock() function. As Paul McKenney suggested, "you do need a
> full memory barrier here in order to ensure that you see the effects of the
> previous lock holder's critical section." And in the mcs_spin_unlock(), move the
> memory barrier so that it is before the "ACCESS_ONCE(next->locked) = 1;".
> 
> Signed-off-by: Jason Low 
> Signed-off-by: Paul E. McKenney 
> Signed-off-by: Tim Chen 
> ---
>  include/linux/mcslock.h |7 +++
>  1 files changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/mcslock.h b/include/linux/mcslock.h
> index 20fd3f0..edd57d2 100644
> --- a/include/linux/mcslock.h
> +++ b/include/linux/mcslock.h
> @@ -26,15 +26,14 @@ void mcs_spin_lock(struct mcs_spin_node **lock,
> struct mcs_spin_node *node)
> 
> prev = xchg(lock, node);
> if (likely(prev == NULL)) {
> -   /* Lock acquired */
> -   node->locked = 1;
> +   /* Lock acquired. No need to set node->locked since it
> won't be used */
> return;
> }
> ACCESS_ONCE(prev->next) = node;
> -   smp_wmb();
> /* Wait until the lock holder passes the lock down */
> while (!ACCESS_ONCE(node->locked))
> arch_mutex_cpu_relax();
> +   smp_mb();
>  }
> 
>  static void mcs_spin_unlock(struct mcs_spin_node **lock, struct
> mcs_spin_node *node)
> @@ -51,8 +50,8 @@ static void mcs_spin_unlock(struct mcs_spin_node
> **lock, struct mcs_spin_node *n
> while (!(next = ACCESS_ONCE(node->next)))
> arch_mutex_cpu_relax();
> }
> -   ACCESS_ONCE(next->locked) = 1;
> smp_wmb();
> +   ACCESS_ONCE(next->locked) = 1;
>  }
> 
>  #endif
> -- 
> 1.7.1
> 
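
For readers following the barrier placement, here is a user-space sketch of the
same lock shape written with C11 atomics instead of the kernel primitives. It is
not the kernel code: the names (mcs_node, mcs_lock, mcs_unlock) are made up for
the example, and acquire/release orderings stand in for the smp_mb()/smp_wmb()
calls discussed in the thread. The point it illustrates is that the handoff
store in unlock must be ordered after the critical section, and the waiter's
load of "locked" must be ordered before its own critical section.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct mcs_node {
	_Atomic(struct mcs_node *) next;
	atomic_int locked;
};

typedef _Atomic(struct mcs_node *) mcs_lock_t;

static void mcs_lock(mcs_lock_t *lock, struct mcs_node *node)
{
	struct mcs_node *prev;

	assert(node != NULL);
	atomic_store_explicit(&node->next, NULL, memory_order_relaxed);
	atomic_store_explicit(&node->locked, 0, memory_order_relaxed);

	/* Become the new tail; acq_rel orders us after the previous tail. */
	prev = atomic_exchange_explicit(lock, node, memory_order_acq_rel);
	if (prev == NULL)
		return;  /* lock was free; node->locked is never read */

	/* Link in behind the previous tail so it can find us at unlock. */
	atomic_store_explicit(&prev->next, node, memory_order_release);

	/* Acquire load: the critical section cannot move above this check. */
	while (!atomic_load_explicit(&node->locked, memory_order_acquire))
		;
}

static void mcs_unlock(mcs_lock_t *lock, struct mcs_node *node)
{
	struct mcs_node *next =
		atomic_load_explicit(&node->next, memory_order_acquire);

	if (next == NULL) {
		struct mcs_node *expected = node;

		/* No successor visible: try to reset the tail to NULL. */
		if (atomic_compare_exchange_strong_explicit(
			    lock, &expected, (struct mcs_node *)NULL,
			    memory_order_release, memory_order_relaxed))
			return;

		/* A successor is mid-enqueue; wait until it links itself. */
		while ((next = atomic_load_explicit(&node->next,
						    memory_order_acquire)) == NULL)
			;
	}

	/* Release store: the critical section is visible before the handoff. */
	atomic_store_explicit(&next->locked, 1, memory_order_release);
}
```

In this formulation the "barrier before the store" in unlock and the "barrier
after the spin" in lock are expressed as a release/acquire pair on the same
variable, which is why neither side can let the critical section leak.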



[PATCH] genirq: Avoid NULL OOPS in irq handling

2013-09-27 Thread Huacai Chen
Some devices (e.g. serial port) set up an irq handler at dev open and free
it at dev close. So, sometimes there is no irqaction for a specific
irq. But some buggy devices may send irqs at any time. This patch avoids
the NULL OOPS when irqaction isn't registered.

Signed-off-by: Huacai Chen 
---
 kernel/irq/handle.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/kernel/irq/handle.c b/kernel/irq/handle.c
index 131ca17..1c78e69 100644
--- a/kernel/irq/handle.c
+++ b/kernel/irq/handle.c
@@ -135,6 +135,9 @@ handle_irq_event_percpu(struct irq_desc *desc, struct irqaction *action)
irqreturn_t retval = IRQ_NONE;
unsigned int flags = 0, irq = desc->irq_data.irq;
 
+   if (!action)
+   goto out;
+
do {
irqreturn_t res;
 
@@ -174,6 +177,7 @@ handle_irq_event_percpu(struct irq_desc *desc, struct irqaction *action)
 
add_interrupt_randomness(irq, flags);
 
+out:
if (!noirqdebug)
note_interrupt(irq, desc, retval);
return retval;
-- 
1.7.7.3
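
The reason the guard matters is that a do/while loop dereferences its first
element unconditionally. A reduced user-space model of that shape is sketched
below; the struct layout, run_actions() and count_handler() are simplified,
hypothetical stand-ins and not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

#define IRQ_NONE    0
#define IRQ_HANDLED 1

/* Simplified stand-in for the kernel's struct irqaction. */
struct irqaction {
	int (*handler)(int irq, void *dev_id);
	void *dev_id;
	struct irqaction *next;
};

/* Example handler: counts how often it was invoked. */
static int count_handler(int irq, void *dev_id)
{
	(void)irq;
	++*(int *)dev_id;
	return IRQ_HANDLED;
}

/* Walk the action chain; an empty chain is reported as IRQ_NONE. */
static int run_actions(int irq, struct irqaction *action)
{
	int retval = IRQ_NONE;

	/* Without this check the do/while below would dereference NULL. */
	if (!action)
		goto out;

	do {
		assert(action->handler != NULL);
		retval |= action->handler(irq, action->dev_id);
		action = action->next;
	} while (action);
out:
	return retval;
}
```

Returning IRQ_NONE for the empty chain keeps the caller's spurious-interrupt
accounting path reachable, which matches the placement of the "out" label in the
patch.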



Re: [PATCH v1 2/7] arm64: introduce interfaces to hotpatch kernel and module code

2013-09-27 Thread Sandeepa Prabhu
On 27 September 2013 21:11, Jiang Liu  wrote:
> On 09/25/2013 10:35 PM, Sandeepa Prabhu wrote:
>> On 25 September 2013 16:14, Jiang Liu  wrote:
>>> From: Jiang Liu 
>>>
>>> Introduce aarch64_insn_patch_text() and __aarch64_insn_patch_text()
>>> to patch kernel and module code.
>>>
>>> Function aarch64_insn_patch_text() is a heavy version which may use
>>> stop_machine() to serialize all online CPUs, and function
>>> __aarch64_insn_patch_text() is a light version without explicit
>>> serialization.
>> Hi Jiang,
>>
>> I have written kprobes support for aarch64, and need both
>> functionalities (lightweight and stop_machine() versions).
>> I would like to rebase kprobes on these APIs; however, slight changes
>> would be required in the stop_machine case, which I explain
>> below.
>> [Though kprobes cannot share the instruction encode support of jump labels,
>> as decoding & simulation are quite different for kprobes/uprobes and
>> based around single stepping]
>>
>>>
>>> Signed-off-by: Jiang Liu 
>>> Cc: Jiang Liu 
>>> ---
>>>  arch/arm64/include/asm/insn.h |  2 ++
>>>  arch/arm64/kernel/insn.c  | 64 
>>> +++
>>>  2 files changed, 66 insertions(+)
>>>
>>> diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
>>> index e7d1bc8..0ea7193 100644
>>> --- a/arch/arm64/include/asm/insn.h
>>> +++ b/arch/arm64/include/asm/insn.h
>>> @@ -49,5 +49,7 @@ __AARCH64_INSN_FUNCS(nop, 0x, 0xD503201F)
>>>  enum aarch64_insn_class aarch64_get_insn_class(u32 insn);
>>>
>>>  bool aarch64_insn_hotpatch_safe(u32 old_insn, u32 new_insn);
>>> +int aarch64_insn_patch_text(void *addr, u32 *insns, int cnt);
>>> +int __aarch64_insn_patch_text(void *addr, u32 *insns, int cnt);
>>>
>>>  #endif /* _ASM_ARM64_INSN_H */
>>> diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
>>> index 8541c3a..50facfc 100644
>>> --- a/arch/arm64/kernel/insn.c
>>> +++ b/arch/arm64/kernel/insn.c
>>> @@ -15,6 +15,8 @@
>>>   * along with this program.  If not, see .
>>>   */
>>>  #include 
>>> +#include 
>>> +#include 
>>>  #include 
>>>
>>>  static int aarch64_insn_cls[] = {
>>> @@ -69,3 +71,65 @@ bool __kprobes aarch64_insn_hotpatch_safe(u32 old_insn, u32 new_insn)
>>> return __aarch64_insn_hotpatch_safe(old_insn) &&
>>>__aarch64_insn_hotpatch_safe(new_insn);
>>>  }
>>> +
>>> +struct aarch64_insn_patch {
>>> +   void *text_addr;
>>> +   u32 *new_insns;
>>> +   int insn_cnt;
>>> +};
>>> +
>>> +int __kprobes __aarch64_insn_patch_text(void *addr, u32 *insns, int cnt)
>>> +{
>>> +   int i;
>>> +   u32 *tp = addr;
>>> +
>>> +   /* instructions must be word aligned */
>>> +   if (cnt <= 0 || ((uintptr_t)addr & 0x3))
>>> +   return -EINVAL;
>>> +
>>> +   for (i = 0; i < cnt; i++)
>>> +   tp[i] = insns[i];
>>> +
>>> +   flush_icache_range((uintptr_t)tp, (uintptr_t)tp + cnt * sizeof(u32));
>>> +
>>> +   return 0;
>>> +}
>> Looks fine, but do you need to check for CPU big endian mode here? (I
>> think swab32() is needed if EL1 is in big-endian mode)
> Hi Sandeepa,
> Thanks for the reminder; we do need to take care of the data access
> endianness issue here. I will fix it in the next version.
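
The endianness concern comes from the fact that A64 instructions are always
stored little-endian in memory, while data stores follow the current data
endianness. A user-space sketch of an endian-independent write is shown below.
The function name patch_text_le() is hypothetical, -1 stands in for -EINVAL, and
the cache maintenance step is left out entirely; in the kernel a conversion
helper such as cpu_to_le32() would be the more natural tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Write cnt 32-bit instruction words to addr in little-endian byte order,
 * regardless of the byte order of the CPU running this code.
 * Returns 0 on success, -1 (standing in for -EINVAL) on bad arguments.
 */
static int patch_text_le(void *addr, const uint32_t *insns, int cnt)
{
	uint8_t *dst = addr;
	int i;

	assert(addr != NULL && insns != NULL);

	/* instructions must be word aligned */
	if (cnt <= 0 || ((uintptr_t)addr & 0x3))
		return -1;

	for (i = 0; i < cnt; i++) {
		uint32_t insn = insns[i];

		/* Emit the least significant byte first. */
		dst[4 * i + 0] = (uint8_t)(insn & 0xff);
		dst[4 * i + 1] = (uint8_t)((insn >> 8) & 0xff);
		dst[4 * i + 2] = (uint8_t)((insn >> 16) & 0xff);
		dst[4 * i + 3] = (uint8_t)((insn >> 24) & 0xff);
	}
	return 0;
}
```

Using the NOP encoding 0xD503201F from the patch as input, the bytes in memory
come out as 1F 20 03 D5 on both little- and big-endian hosts.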
>
>>
>>> +
>>> +static int __kprobes aarch64_insn_patch_text_cb(void *arg)
>>> +{
>>> +   struct aarch64_insn_patch *pp = arg;
>>> +
>>> +   return __aarch64_insn_patch_text(pp->text_addr, pp->new_insns,
>>> +pp->insn_cnt);
>>> +}
>>> +
>>> +int __kprobes aarch64_insn_patch_text(void *addr, u32 *insns, int cnt)
>>> +{
>>> +   int ret;
>>> +   bool safe = false;
>>> +
>>> +   /* instructions must be word aligned */
>>> +   if (cnt <= 0 || ((uintptr_t)addr & 0x3))
>>> +   return -EINVAL;
>>> +
>>> +   if (cnt == 1)
>>> +   safe = aarch64_insn_hotpatch_safe(*(u32 *)addr, insns[0]);
>>> +
>>> +   if (safe) {
>>> +   ret = __aarch64_insn_patch_text(addr, insns, cnt);
>>> +   } else {
>>
>> Can you move the code below this into a separate API that just applies the
>> patch with stop_machine? And then a wrapper function for jump label
>> specific handling that checks for aarch64_insn_hotpatch_safe() ?
>> Also, it will be good to move the patching code out of insn.c to
>> patch.c (refer to arch/arm/kernel/patch.c).
>>
>> Please refer to attached file (my current implementation) to make
>> sense of exactly what kprobes would need (ignore the big-endian part
>> for now). I think splitting the code should be straight-forward and we
>> can avoid two different implementations. Please let me know if this
>> can be done, I will rebase my patches above your next version.
>
> After reading the attached file, I feel the current implementation of
> aarch64_insn_patch_text() should satisfy kprobe's requirements too.
>
> The extra optimization of aarch64_insn_hotpatch_safe() should work
> for 

[PATCH v5 1/4] Move Intel SNB device ids from sb_edac to pci_ids.h

2013-09-27 Thread Andy Lutomirski
The i2c_imc driver will use two of them, and moving only part of
the list seems messier.

Cc: Mauro Carvalho Chehab 
Cc: Rui Wang 
Signed-off-by: Andy Lutomirski 
---
 drivers/edac/sb_edac.c  | 30 --
 include/linux/pci_ids.h | 15 +++
 2 files changed, 15 insertions(+), 30 deletions(-)

diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index e04462b..4fac6f5 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -52,36 +52,6 @@ static int probed;
 #define GET_BITFIELD(v, lo, hi)\
(((v) & ((1ULL << ((hi) - (lo) + 1)) - 1) << (lo)) >> (lo))
 
-/*
- * sbridge Memory Controller Registers
- */
-
-/*
- * FIXME: For now, let's order by device function, as it makes
- * easier for driver's development process. This table should be
- * moved to pci_id.h when submitted upstream
- */
-#define PCI_DEVICE_ID_INTEL_SBRIDGE_SAD0   0x3cf4  /* 12.6 */
-#define PCI_DEVICE_ID_INTEL_SBRIDGE_SAD1   0x3cf6  /* 12.7 */
-#define PCI_DEVICE_ID_INTEL_SBRIDGE_BR 0x3cf5  /* 13.6 */
-#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_HA0    0x3ca0  /* 14.0 */
-#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TA 0x3ca8  /* 15.0 */
-#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_RAS    0x3c71  /* 15.1 */
-#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TAD0   0x3caa  /* 15.2 */
-#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TAD1   0x3cab  /* 15.3 */
-#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TAD2   0x3cac  /* 15.4 */
-#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TAD3   0x3cad  /* 15.5 */
-#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_DDRIO  0x3cb8  /* 17.0 */
-
-   /*
-* Currently, unused, but will be needed in the future
-* implementations, as they hold the error counters
-*/
-#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_ERR0   0x3c72  /* 16.2 */
-#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_ERR1   0x3c73  /* 16.3 */
-#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_ERR2   0x3c76  /* 16.6 */
-#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_ERR3   0x3c77  /* 16.7 */
-
 /* Devices 12 Function 6, Offsets 0x80 to 0xcc */
 static const u32 dram_rule[] = {
0x80, 0x88, 0x90, 0x98, 0xa0,
diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index d1fe5d0..fb8932b 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -2811,7 +2811,22 @@
 #define PCI_DEVICE_ID_INTEL_UNC_R2PCIE 0x3c43
 #define PCI_DEVICE_ID_INTEL_UNC_R3QPI0 0x3c44
 #define PCI_DEVICE_ID_INTEL_UNC_R3QPI1 0x3c45
+#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_RAS    0x3c71  /* 15.1 */
+#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_ERR0   0x3c72  /* 16.2 */
+#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_ERR1   0x3c73  /* 16.3 */
+#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_ERR2   0x3c76  /* 16.6 */
+#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_ERR3   0x3c77  /* 16.7 */
+#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_HA0    0x3ca0  /* 14.0 */
+#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TA 0x3ca8  /* 15.0 */
+#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TAD0   0x3caa  /* 15.2 */
+#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TAD1   0x3cab  /* 15.3 */
+#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TAD2   0x3cac  /* 15.4 */
+#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TAD3   0x3cad  /* 15.5 */
+#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_DDRIO  0x3cb8  /* 17.0 */
 #define PCI_DEVICE_ID_INTEL_JAKETOWN_UBOX  0x3ce0
+#define PCI_DEVICE_ID_INTEL_SBRIDGE_SAD0   0x3cf4  /* 12.6 */
+#define PCI_DEVICE_ID_INTEL_SBRIDGE_BR 0x3cf5  /* 13.6 */
+#define PCI_DEVICE_ID_INTEL_SBRIDGE_SAD1   0x3cf6  /* 12.7 */
 #define PCI_DEVICE_ID_INTEL_IOAT_SNB   0x402f
 #define PCI_DEVICE_ID_INTEL_5100_16    0x65f0
 #define PCI_DEVICE_ID_INTEL_5100_19    0x65f3
-- 
1.8.3.1



[PATCH v5 2/4] sb_edac: Claim a different PCI device

2013-09-27 Thread Andy Lutomirski
sb_edac controls a large number of different PCI functions.  Rather
than registering as a normal PCI driver for all of them, it
registers for just one so that it gets probed and, at probe time, it
looks for all the others.

Coincidentally, the device it registers for also contains the SMBUS
registers, so the PCI core will refuse to probe both sb_edac and an
iMC SMBUS driver.  The drivers don't actually conflict, so just
change sb_edac's device table to probe a different device.

An alternative fix would be to merge the two drivers, but sb_edac
will also refuse to load on non-ECC systems, whereas the i2c_imc
is still useful without ECC.

The only user-visible change should be that sb_edac appears to bind
a different device.

Cc: Mauro Carvalho Chehab 
Cc: Rui Wang 
Signed-off-by: Andy Lutomirski 
---
 drivers/edac/sb_edac.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index 4fac6f5..4f30aa7 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -338,7 +338,7 @@ static const struct pci_id_table pci_dev_descr_sbridge_table[] = {
  * pci_device_id   table for which devices we are looking for
  */
 static DEFINE_PCI_DEVICE_TABLE(sbridge_pci_tbl) = {
-   {PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TA)},
+   {PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_HA0)},
{0,}    /* 0 terminated list. */
 };
 
-- 
1.8.3.1



[PATCH v5 4/4] i2c, i2c_imc: Add DIMM bus code

2013-09-27 Thread Andy Lutomirski
Add i2c_scan_dimm_bus to declare that a particular i2c_adapter
contains DIMMs.  This will probe (and autoload modules!) for useful
SMBUS devices that live on DIMMs.  i2c_imc calls it.

As more SMBUS-addressable DIMM components become supported, this
code can be extended to probe for them.

Signed-off-by: Andy Lutomirski 
---
 drivers/i2c/busses/Kconfig|  4 ++
 drivers/i2c/busses/Makefile   |  4 ++
 drivers/i2c/busses/dimm-bus.c | 97 +++
 drivers/i2c/busses/i2c-imc.c  |  3 ++
 include/linux/i2c/dimm-bus.h  | 24 +++
 5 files changed, 132 insertions(+)
 create mode 100644 drivers/i2c/busses/dimm-bus.c
 create mode 100644 include/linux/i2c/dimm-bus.h

diff --git a/drivers/i2c/busses/Kconfig b/drivers/i2c/busses/Kconfig
index 3709540..3eca819 100644
--- a/drivers/i2c/busses/Kconfig
+++ b/drivers/i2c/busses/Kconfig
@@ -134,6 +134,10 @@ config I2C_ISMT
  This driver can also be built as a module.  If so, the module will be
  called i2c-ismt.
 
+config I2C_DIMM_BUS
+   tristate
+   default n
+
 config I2C_IMC
tristate "Intel iMC (LGA 2011) SMBus Controller"
depends on PCI && X86
diff --git a/drivers/i2c/busses/Makefile b/drivers/i2c/busses/Makefile
index d37340a..d4c0e02 100644
--- a/drivers/i2c/busses/Makefile
+++ b/drivers/i2c/busses/Makefile
@@ -25,6 +25,10 @@ obj-$(CONFIG_I2C_SIS96X) += i2c-sis96x.o
 obj-$(CONFIG_I2C_VIA)  += i2c-via.o
 obj-$(CONFIG_I2C_VIAPRO)   += i2c-viapro.o
 
+# DIMM busses
+obj-$(CONFIG_I2C_DIMM_BUS) += dimm-bus.o
+obj-$(CONFIG_I2C_IMC)  += i2c-imc.o
+
 # Mac SMBus host controller drivers
 obj-$(CONFIG_I2C_HYDRA)    += i2c-hydra.o
 obj-$(CONFIG_I2C_POWERMAC) += i2c-powermac.o
diff --git a/drivers/i2c/busses/dimm-bus.c b/drivers/i2c/busses/dimm-bus.c
new file mode 100644
index 000..0968428
--- /dev/null
+++ b/drivers/i2c/busses/dimm-bus.c
@@ -0,0 +1,97 @@
+/*
+ * Copyright (c) 2013 Andrew Lutomirski 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+static bool probe_addr(struct i2c_adapter *adapter, int addr)
+{
+   /*
+* So far, all known devices that live on DIMMs can be safely
+* and reliably detected by trying to read a byte at address
+* zero.  (The exception is the SPD write protection control,
+* which can't be probed and requires special hardware and/or
+* quick writes to access, and has no driver.)
+*/
+   union i2c_smbus_data dummy;
+   return i2c_smbus_xfer(adapter, addr, 0, I2C_SMBUS_READ, 0,
+ I2C_SMBUS_BYTE_DATA, &dummy) >= 0;
+}
+
+/**
+ * i2c_scan_dimm_bus() - Scans an SMBUS segment known to contain DIMMs
+ * @adapter: The SMBUS adapter to scan
+ *
+ * This function tells the DIMM-bus code that the adapter is known to
+ * contain DIMMs.  i2c_scan_dimm_bus will probe for devices known to
+ * live on DIMMs.
+ *
+ * Do NOT call this function on general-purpose system SMBUS segments
+ * unless you know that the only things on the bus are DIMMs.
+ * Otherwise it is very likely to mis-identify other things on the
+ * bus.
+ *
+ * Callers are advised not to set adapter->class = I2C_CLASS_SPD.
+ */
+void i2c_scan_dimm_bus(struct i2c_adapter *adapter)
+{
+   struct i2c_board_info info = {};
+   int slot;
+
+   /*
+* We probe with "read byte data".  If any DIMM SMBUS driver can't
+* support that access type, this function should be updated.
+*/
+   if (WARN_ON(!i2c_check_functionality(adapter,
+ I2C_FUNC_SMBUS_READ_BYTE_DATA)))
+   return;
+
+   /*
+* Addresses on DIMMs use the three low bits to identify the slot
+* and the four high bits to identify the device type.  Known
+* devices are:
+*
+*  - 0x50 - 0x57: SPD (Serial Presence Detect) EEPROM
+*  - 0x30 - 0x37: SPD WP control -- not worth trying to probe
+*  - 0x18 - 0x1f: TSOD (Temperature Sensor on DIMM)
+*
+* There may be more some day.
+*/
+   for (slot = 0; slot < 8; slot++) {
+   /* If there's no SPD, then assume there's no DIMM here. */
+   if (!probe_addr(adapter, 0x50 | slot))
+   continue;
+
+   strcpy(info.type, "spd");
+   
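
The addressing scheme described in the comment above (three low bits select the
slot, four high bits select the device type) can be illustrated with a small
stand-alone decoder. The enum and helper names below are hypothetical and exist
only for this sketch; the address ranges are the ones listed in the patch
comment.

```c
#include <assert.h>

/* Device classes named in the comment; the enum itself is illustrative. */
enum dimm_dev_type {
	DIMM_DEV_UNKNOWN,
	DIMM_DEV_TSOD,    /* 0x18 - 0x1f: Temperature Sensor on DIMM */
	DIMM_DEV_SPD_WP,  /* 0x30 - 0x37: SPD write-protect control */
	DIMM_DEV_SPD      /* 0x50 - 0x57: SPD EEPROM */
};

/* Classify a 7-bit SMBus address by its four high bits. */
static enum dimm_dev_type dimm_addr_type(unsigned int addr)
{
	assert(addr < 0x80);  /* 7-bit addresses only */

	switch (addr & 0x78) {
	case 0x18:
		return DIMM_DEV_TSOD;
	case 0x30:
		return DIMM_DEV_SPD_WP;
	case 0x50:
		return DIMM_DEV_SPD;
	default:
		return DIMM_DEV_UNKNOWN;
	}
}

/* The three low bits identify the DIMM slot. */
static unsigned int dimm_addr_slot(unsigned int addr)
{
	return addr & 0x7;
}

/* Build the address of a device class (given its base) for one slot. */
static unsigned int dimm_addr(unsigned int type_base, unsigned int slot)
{
	return type_base | (slot & 0x7);
}
```

For example, address 0x52 decodes to the SPD EEPROM in slot 2, and the TSOD for
slot 3 lives at 0x18 | 3 = 0x1b, which is the same "base | slot" computation the
probing loop performs with 0x50.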

[PATCH v5 3/4] i2c_imc: New driver for Intel's iMC, found on LGA2011 chips

2013-09-27 Thread Andy Lutomirski
Sandy Bridge Xeon and Extreme chips have integrated memory controllers
with (rather limited) onboard SMBUS masters.  This driver gives access
to the bus.

Signed-off-by: Andy Lutomirski 
---
 drivers/i2c/busses/Kconfig   |  15 ++
 drivers/i2c/busses/Makefile  |   1 +
 drivers/i2c/busses/i2c-imc.c | 559 +++
 3 files changed, 575 insertions(+)
 create mode 100644 drivers/i2c/busses/i2c-imc.c

diff --git a/drivers/i2c/busses/Kconfig b/drivers/i2c/busses/Kconfig
index dc6dea6..3709540 100644
--- a/drivers/i2c/busses/Kconfig
+++ b/drivers/i2c/busses/Kconfig
@@ -134,6 +134,21 @@ config I2C_ISMT
  This driver can also be built as a module.  If so, the module will be
  called i2c-ismt.
 
+config I2C_IMC
+   tristate "Intel iMC (LGA 2011) SMBus Controller"
+   depends on PCI && X86
+   select I2C_DIMM_BUS
+   help
+ If you say yes to this option, support will be included for the Intel
+ Integrated Memory Controller SMBus host controller interface.  This
+ controller is found on LGA 2011 Xeons and Core i7 Extremes.
+
+ It is possible, although unlikely, that the use of this driver will
+ interfere with your platform's RAM thermal management.
+
+ This driver can also be built as a module.  If so, the module will be
+ called i2c-imc.
+
 config I2C_PIIX4
tristate "Intel PIIX4 and compatible (ATI/AMD/Serverworks/Broadcom/SMSC)"
depends on PCI
diff --git a/drivers/i2c/busses/Makefile b/drivers/i2c/busses/Makefile
index d00997f..d37340a 100644
--- a/drivers/i2c/busses/Makefile
+++ b/drivers/i2c/busses/Makefile
@@ -15,6 +15,7 @@ obj-$(CONFIG_I2C_AMD8111) += i2c-amd8111.o
 obj-$(CONFIG_I2C_I801) += i2c-i801.o
 obj-$(CONFIG_I2C_ISCH) += i2c-isch.o
 obj-$(CONFIG_I2C_ISMT) += i2c-ismt.o
+obj-$(CONFIG_I2C_IMC)  += i2c-imc.o
 obj-$(CONFIG_I2C_NFORCE2)  += i2c-nforce2.o
 obj-$(CONFIG_I2C_NFORCE2_S4985)    += i2c-nforce2-s4985.o
 obj-$(CONFIG_I2C_PIIX4)    += i2c-piix4.o
diff --git a/drivers/i2c/busses/i2c-imc.c b/drivers/i2c/busses/i2c-imc.c
new file mode 100644
index 000..c846077
--- /dev/null
+++ b/drivers/i2c/busses/i2c-imc.c
@@ -0,0 +1,559 @@
+/*
+ * Copyright (c) 2013 Andrew Lutomirski 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * The datasheet can be found here, for example:
+ * http://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-1600-2600-vol-2-datasheet.pdf
+ *
+ * There seem to be quite a few bugs or spec errors, though:
+ *
+ *  - A successful transaction sets WOD and RDO.
+ *
+ *  - The docs for TSOD_POLL_EN make no sense (see imc_channel_claim).
+ *
+ *  - Erratum BT109, which says:
+ *
+ *  The processor may not complete SMBus (System Management Bus)
+ *  transactions targeting the TSOD (Temperature Sensor On DIMM)
+ *  when Package C-States are enabled. Due to this erratum, if the
+ *  processor transitions into a Package C-State while an SMBus
+ *  transaction with the TSOD is in process, the processor will
+ *  suspend receipt of the transaction. The transaction completes
+ *  while the processor is in a Package C-State.  Upon exiting
+ *  Package C-State, the processor will attempt to resume the
+ *  SMBus transaction, detect a protocol violation, and log an
+ *  error.
+ *
+ *   The description notwithstanding, I've seen difficult-to-reproduce
+ *   issues when the system goes completely idle (so package C-states can
+ *   be entered) while software-initiated SMBUS transactions are in
+ *   progress.
+ */
+
+/* Register offsets (in PCI configuration space) */
+#define SMBSTAT(i) (0x180 + 0x10*i)
+#define SMBCMD(i)  (0x184 + 0x10*i)
+#define SMBCNTL(i) (0x188 + 0x10*i)
+#define SMB_TSOD_POLL_RATE_CNTR(i) (0x18C + 0x10*i)
+#define SMB_TSOD_POLL_RATE (0x1A8)
+
+/* SMBSTAT fields */
+#define SMBSTAT_RDO        (1U << 31)  /* Read Data Valid */
+#define SMBSTAT_WOD        (1U << 30)  /* Write Operation Done */
+#define SMBSTAT_SBE        (1U << 29)  /* SMBus Error */
+#define SMBSTAT_SMB_BUSY   (1U << 28)  /* SMBus Busy State */
+/* 26:24 is the last automatically polled TSOD 
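
How the per-channel register offsets and status bits fit together can be shown
with a short stand-alone sketch. The register and bit names are taken from the
patch; parenthesizing the macro argument is a change made here for the sketch,
and the smbstat_busy() and smbstat_error() helpers are hypothetical and not part
of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Register offsets per channel; (i) is parenthesized in this sketch so
 * that expression arguments expand correctly. */
#define SMBSTAT(i)  (0x180 + 0x10 * (i))
#define SMBCMD(i)   (0x184 + 0x10 * (i))
#define SMBCNTL(i)  (0x188 + 0x10 * (i))

/* SMBSTAT fields */
#define SMBSTAT_RDO       (1U << 31)  /* Read Data Valid */
#define SMBSTAT_WOD       (1U << 30)  /* Write Operation Done */
#define SMBSTAT_SBE       (1U << 29)  /* SMBus Error */
#define SMBSTAT_SMB_BUSY  (1U << 28)  /* SMBus Busy State */

/* Hypothetical helper: is the controller still busy? */
static int smbstat_busy(uint32_t stat)
{
	return (stat & SMBSTAT_SMB_BUSY) != 0;
}

/* Hypothetical helper: did the last transaction report an error? */
static int smbstat_error(uint32_t stat)
{
	return (stat & SMBSTAT_SBE) != 0;
}

/* Sanity check of the layout: each channel's block is 0x10 bytes wide. */
static void smb_layout_check(void)
{
	assert(SMBSTAT(1) - SMBSTAT(0) == 0x10);
	assert(SMBCMD(0) - SMBSTAT(0) == 4);
}
```

With an unparenthesized argument, SMBCMD(0 + 1) would expand to
0x184 + 0x10*0 + 1 and yield 0x185 instead of 0x194, which is why the sketch
wraps the argument.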

Re: linux-next: build failure after merge of the final tree (char-misc tree related)

2013-09-27 Thread Greg KH
On Fri, Sep 27, 2013 at 06:48:04PM +1000, Stephen Rothwell wrote:
> Hi all,
> 
> After merging the final tree, today's linux-next build (powerpc
> allyesconfig) failed like this:
> 
> drivers/misc/mic/card/mic_x100.c: In function 'mic_init':
> drivers/misc/mic/card/mic_x100.c:215:9: error: implicit declaration of function 'cpu_data' [-Werror=implicit-function-declaration]
>   struct cpuinfo_x86 *c = &cpu_data(0);
>  ^



Should now be fixed, thanks.

greg k-h


Re: linux-next: build failure after merge of the final tree (tty tree related)

2013-09-27 Thread Greg KH
On Fri, Sep 27, 2013 at 06:21:17PM +1000, Stephen Rothwell wrote:
> Hi all,
> 
> After merging the final tree, today's linux-next build (powerpc
> allyesconfig) failed like this:
> 
> drivers/tty/serial/8250/8250_dw.c: In function 'dw8250_probe':
> drivers/tty/serial/8250/8250_dw.c:341:3: error: too many arguments to function 'dw8250_probe_acpi'
>err = dw8250_probe_acpi(, data);
>^
> drivers/tty/serial/8250/8250_dw.c:281:19: note: declared here
>  static inline int dw8250_probe_acpi(struct uart_8250_port *up)
>^
> 
> Caused by commit fe95855539fd ("serial: 8250_dw: don't limit DMA support
> to ACPI") from the tty tree.
> 
> I have reverted that commit for today (and commit 7fb8c56c7fa0 ("serial:
> 8250_dw: provide a filter for DMA channel detection") that depends on it).

Should now be fixed, thanks.

greg k-h


Re: linux-next: build failure after merge of the char-misc tree

2013-09-27 Thread Greg KH
On Fri, Sep 27, 2013 at 05:10:29PM +1000, Stephen Rothwell wrote:
> Hi all,
> 
> After merging the char-misc tree, today's linux-next build (x86_64
> allmodconfig) failed like this:
> 
> drivers/misc/mic/host/mic_main.c: In function 'mic_probe':
> drivers/misc/mic/host/mic_main.c:320:3: error: too many arguments to function 
> 'sysfs_get_dirent'
>NULL, "state");
>^
> In file included from include/linux/kobject.h:21:0,
>  from include/linux/module.h:16,
>  from drivers/misc/mic/host/mic_main.c:26:
> include/linux/sysfs.h:465:1: note: declared here
>  sysfs_get_dirent(struct sysfs_dirent *parent_sd, const unsigned char *name)
>  ^
> 
> Caused by commit 3a6a9201897c ("Intel MIC Host Driver, card OS state
> management") interacting with commit 388975cccaaf ("sysfs: clean up
> sysfs_get_dirent()") from the driver-core tree.
> 
> I added this merge fix patch:

That looks correct, thanks, I'll carry this along to handle the merge
issue when it all goes to Linus.

greg k-h


[PATCH v5 0/4] iMC SMBUS and DIMM bus probing

2013-09-27 Thread Andy Lutomirski
Intel LGA2011 machines have dedicated SMBUS controllers for DIMM
sockets.  Because they're dedicated, they can be safely and accurately
probed, since all devices on them are known to be attached to DIMMs.
The devices found are:
 - SPD EEPROMs
 - TSODs (Temperature Sensor on DIMMs)
 - Other interesting things, with drivers hopefully to follow...

This patch series adds a simple generic layer for probing for DIMMs over
SMBUS and an i2c bus driver for the iMC controller found on Intel
LGA2011 chips.

It now uses only modern infrastructure -- new-style I2C probing and the
at24 (instead of eeprom) driver.

I've tested this on a Core i7 Extreme and on a Xeon E5 server.

Patches 1-3 are useful even without patch 4.  I'm still hoping for feedback
on patch 4.

Changes from v4:
 - Added the sb_edac changes -- i2c_imc and sb_edac can now coexist
 - Added some paranoid race detection.
 - The driver now confirms its ability to claim software control of the SMBUS
   master.  This prevents unpleasant problems on systems that enable CLTT
   (closed loop thermal throttling).
 - Reordered the patches so that the DIMM bus code is last.

Changes from v3:
 - Dropped redundant "tsod" driver
 - Dropped eeprom modalias
 - Switched to probing for the "eeprom" and "jc42"

Andy Lutomirski (4):
  Move Intel SNB device ids from sb_edac to pci_ids.h
  sb_edac: Claim a different PCI device
  i2c_imc: New driver for Intel's iMC, found on LGA2011 chips
  i2c, i2c_imc: Add DIMM bus code

 drivers/edac/sb_edac.c|  32 +--
 drivers/i2c/busses/Kconfig|  19 ++
 drivers/i2c/busses/Makefile   |   5 +
 drivers/i2c/busses/dimm-bus.c |  97 
 drivers/i2c/busses/i2c-imc.c  | 562 ++
 include/linux/i2c/dimm-bus.h  |  24 ++
 include/linux/pci_ids.h   |  15 ++
 7 files changed, 723 insertions(+), 31 deletions(-)
 create mode 100644 drivers/i2c/busses/dimm-bus.c
 create mode 100644 drivers/i2c/busses/i2c-imc.c
 create mode 100644 include/linux/i2c/dimm-bus.h

-- 
1.8.3.1



Re: [PATCH] rwsem: reduce spinlock contention in wakeup code path

2013-09-27 Thread Waiman Long

On 09/27/2013 03:32 PM, Peter Hurley wrote:

On 09/27/2013 03:00 PM, Waiman Long wrote:

With the 3.12-rc2 kernel, there is sizable spinlock contention on
the rwsem wakeup code path when running AIM7's high_systime workload
on a 8-socket 80-core DL980 (HT off) as reported by perf:

   7.64%   reaim  [kernel.kallsyms]   [k] _raw_spin_lock_irqsave
  |--41.77%-- rwsem_wake
   1.61%   reaim  [kernel.kallsyms]   [k] _raw_spin_lock_irq
  |--92.37%-- rwsem_down_write_failed

That was 4.7% of recorded CPU cycles.

On a large NUMA machine, it is entirely possible that a fairly large
number of threads are queuing up in the ticket spinlock queue to do
the wakeup operation. In fact, only one will be needed.  This patch
tries to reduce spinlock contention by doing just that.

A new wakeup field is added to the rwsem structure. This field is
set on entry to rwsem_wake() and __rwsem_do_wake() to mark that a
thread is pending to do the wakeup call. It is cleared on exit from
those functions.

By checking if the wakeup flag is set, a thread can exit rwsem_wake()
immediately if another thread is pending to do the wakeup instead of
waiting to get the spinlock and find out that nothing needs to be done.


This will leave readers stranded if a former writer is in __rwsem_do_wake()
to wake up the readers and another writer steals the lock: before the
former writer exits (without having woken up the readers), the
lock-stealing writer drops the lock, sees that the wakeup flag is set, and
so doesn't bother to wake the readers.

Regards,
Peter Hurley



Yes, you are right. That can be a problem. Thanks for pointing this out.
The workloads that I used don't seem to exercise the readers. I will
modify the patch to add code to handle this failure case by resetting the
wakeup flag, pushing it out and then retrying one more time to get the
read lock. I think that should address the problem.


Regards,
Longman


Re: [PATCH v12 0/7] PHY framework

2013-09-27 Thread Greg KH
On Fri, Sep 27, 2013 at 11:53:24AM +0530, Kishon Vijay Abraham I wrote:
> Added a generic PHY framework that provides a set of APIs for the PHY drivers
> to create/destroy a PHY and APIs for the PHY users to obtain a reference to
> the PHY with or without using phandle.
> 
> This framework will be of use only to devices that use an external PHY (PHY
> functionality is not embedded within the controller).
> 
> The intention of creating this framework is to bring the phy drivers spread
> all over the Linux kernel to drivers/phy to increase code re-use and to
> increase code maintainability.
> 
> The suggestion to make PHY a bus was not implemented because PHY devices can
> be part of another bus, and attaching the same device to multiple buses leads
> to bad design.
> 
> If the PHY driver has to send notification on connect/disconnect, the PHY
> driver should make use of the extcon framework. Using this subsystem
> to use extcon framework will have to be analysed.
> 
> You can find this patch series @
> git://git.kernel.org/pub/scm/linux/kernel/git/kishon/linux-phy.git testing
> 
> I'll create a new branch *next* once this patch series is finalized. All the
> PHY driver development that depends on PHY framework can be based on this
> branch.
> 
> Did USB enumeration testing in panda and beagle after applying [1] (needed for
> non-dt)

All now applied to my usb-next branch.  Thanks for redoing this many
times and sticking with it.

greg k-h


[PATCH 1/1] rsxx: Kernel Panic caused by mapping Discards.

2013-09-27 Thread Philip J. Kelleher
From: Philip J Kelleher 

This fixes a kernel panic injected by commit id
8d26750143341831bc312f61c5ed141eeb75b8d0 where discards
are getting mapped through the pci_map_page function call.

The driver will now start verifying that a dma is not a
discard before issuing the pci_map_page function call.

Also, we are updating the driver version.

Signed-off-by: Philip J Kelleher 
---

diff -uprN -X linux-2.6.32-420.el6-vanilla/Documentation/dontdiff 
linux-2.6.32-420.el6-vanilla/drivers/block/rsxx/dma.c 
linux-2.6.32-420.el6/drivers/block/rsxx/dma.c
--- linux-2.6.32-420.el6-vanilla/drivers/block/rsxx/dma.c   2013-09-23 
09:15:30.0 -0500
+++ linux-2.6.32-420.el6/drivers/block/rsxx/dma.c   2013-09-26 
15:12:16.471465749 -0500
@@ -434,26 +434,29 @@ static void rsxx_issue_dmas(struct rsxx_
continue;
}
 
-   if (dma->cmd == HW_CMD_BLK_WRITE)
-   dir = PCI_DMA_TODEVICE;
-   else
-   dir = PCI_DMA_FROMDEVICE;
+   if (dma->cmd != HW_CMD_BLK_DISCARD) {
+   if (dma->cmd == HW_CMD_BLK_WRITE)
+   dir = PCI_DMA_TODEVICE;
+   else
+   dir = PCI_DMA_FROMDEVICE;
 
-   /*
-* The function pci_map_page is placed here because we can
-* only, by design, issue up to 255 commands to the hardware
-* at one time per DMA channel. So the maximum amount of mapped
-* memory would be 255 * 4 channels * 4096 Bytes which is less
-* than 2GB, the limit of a x8 Non-HWWD PCIe slot. This way the
-* pci_map_page function should never fail because of a
-* lack of mappable memory.
-*/
-   dma->dma_addr = pci_map_page(ctrl->card->dev, dma->page,
-dma->pg_off, dma->sub_page.cnt << 9, dir);
-   if (pci_dma_mapping_error(ctrl->card->dev, dma->dma_addr)) {
-   push_tracker(ctrl->trackers, tag);
-   rsxx_complete_dma(ctrl, dma, DMA_CANCELLED);
-   continue;
+   /*
+* The function pci_map_page is placed here because we
+* can only, by design, issue up to 255 commands to the
+* hardware at one time per DMA channel. So the maximum
+* amount of mapped memory would be 255 * 4 channels *
+* 4096 Bytes which is less than 2GB, the limit of a x8
+* Non-HWWD PCIe slot. This way the pci_map_page
+* function should never fail because of a lack of
+* mappable memory.
+*/
+   dma->dma_addr = pci_map_page(ctrl->card->dev, dma->page,
+   dma->pg_off, dma->sub_page.cnt << 9, dir);
+   if (pci_dma_mapping_error(ctrl->card->dev, dma->dma_addr)) {
+   push_tracker(ctrl->trackers, tag);
+   rsxx_complete_dma(ctrl, dma, DMA_CANCELLED);
+   continue;
+   }
}
 
set_tracker_dma(ctrl->trackers, tag, dma);
diff -uprN -X linux-2.6.32-420.el6-vanilla/Documentation/dontdiff 
linux-2.6.32-420.el6-vanilla/drivers/block/rsxx/rsxx_priv.h 
linux-2.6.32-420.el6/drivers/block/rsxx/rsxx_priv.h
--- linux-2.6.32-420.el6-vanilla/drivers/block/rsxx/rsxx_priv.h 2013-09-23 
09:15:30.0 -0500
+++ linux-2.6.32-420.el6/drivers/block/rsxx/rsxx_priv.h 2013-09-26 
15:24:14.185459296 -0500
@@ -52,7 +52,7 @@ struct proc_cmd;
 #define RS70_PCI_REV_SUPPORTED 4
 
 #define DRIVER_NAME "rsxx"
-#define DRIVER_VERSION "4.0"
+#define DRIVER_VERSION "4.0.1.2498"
 
 /* Block size is 4096 */
 #define RSXX_HW_BLK_SHIFT  12
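
The bound quoted in the driver comment above (255 commands per channel, 4 channels, 4096 bytes each) can be checked with a few lines of arithmetic. This is only a sketch of the numbers stated in that comment; the helper names max_mapped_bytes and sectors_to_bytes are made up for illustration and are not part of the rsxx driver.

```c
#include <stdint.h>

/*
 * Worst-case amount of memory mapped at once, using the figures from the
 * driver comment: commands per channel * channels * bytes per command.
 * Hypothetical helper, not driver code.
 */
static uint64_t max_mapped_bytes(uint64_t cmds_per_channel,
                                 uint64_t channels,
                                 uint64_t bytes_per_cmd)
{
    return cmds_per_channel * channels * bytes_per_cmd;
}

/*
 * The patch maps "dma->sub_page.cnt << 9" bytes, i.e. it treats the count
 * as a number of 512-byte units. Hypothetical helper mirroring that shift.
 */
static uint64_t sectors_to_bytes(uint64_t cnt)
{
    return cnt << 9;
}
```

With the values from the comment, max_mapped_bytes(255, 4, 4096) is 4,177,920 bytes (just under 4 MiB), far below the 2GB limit the comment mentions, and sectors_to_bytes(8) gives one 4096-byte block.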



[PATCH 0/1] rsxx: driver bug fix

2013-09-27 Thread Philip J. Kelleher
The incoming patch is for the rsxx driver (drivers/block/rsxx)

This patch fixes a kernel panic injected by patch id
8d26750143341831bc312f61c5ed141eeb75b8d0 where discards
are getting mapped through the pci_map_page function call.

This patch relies on git commit 8d26750143341831bc312f61c5ed141eeb75b8d0.

Regards,
- Philip Kelleher




Re: [PATCH] kernel: replace strict_strto*() with kstrto*()

2013-09-27 Thread Bruno Wolff III

On Fri, Sep 27, 2013 at 13:18:20 -0700,
  Andrew Morton  wrote:

On Fri, 27 Sep 2013 19:53:53 +0200 Jean Delvare  wrote:


Andrew,

On Fri, 27 Sep 2013 09:50:39 -0600, Bjorn Helgaas wrote:
> There's some indication that this change might have broken handling of
> signed types.  See
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2013-September/111758.html
> and https://bugzilla.kernel.org/show_bug.cgi?id=61811.

It seems this is hurting more users than I would have expected, and
people are spending significant amounts of time to figure out what the
root cause to their problem is. May I suggest that my fix should find
its way to Linus' tree rather sooner than later?



Done.  I had to send that one by hand as I'm not at my desk...


I just did a successful test of the patch from the mailing list. So it at 
least solved the problem in my case.



Re: [PATCHv4 02/10] mm: convert mm->nr_ptes to atomic_t

2013-09-27 Thread Johannes Weiner
On Sat, Sep 28, 2013 at 01:24:51AM +0300, Kirill A. Shutemov wrote:
> Cody P Schafer wrote:
> > On 09/27/2013 06:16 AM, Kirill A. Shutemov wrote:
> > > With split page table lock for PMD level we can't hold
> > > mm->page_table_lock while updating nr_ptes.
> > >
> > > Let's convert it to atomic_t to avoid races.
> > >
> > 
> > > ---
> > 
> > > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > > index 84e0c56e1e..99f19e850d 100644
> > > --- a/include/linux/mm_types.h
> > > +++ b/include/linux/mm_types.h
> > > @@ -339,6 +339,7 @@ struct mm_struct {
> > >   pgd_t * pgd;
> > >   atomic_t mm_users;  /* How many users with user space? */
> > >   atomic_t mm_count;  /* How many references to "struct mm_struct" (users count as 1) */
> > > + atomic_t nr_ptes;   /* Page table pages */
> > >   int map_count;  /* number of VMAs */
> > >
> > >   spinlock_t page_table_lock; /* Protects page tables and some counters */
> > > @@ -360,7 +361,6 @@ struct mm_struct {
> > >   unsigned long exec_vm;  /* VM_EXEC & ~VM_WRITE */
> > >   unsigned long stack_vm; /* VM_GROWSUP/DOWN */
> > >   unsigned long def_flags;
> > > - unsigned long nr_ptes;  /* Page table pages */
> > >   unsigned long start_code, end_code, start_data, end_data;
> > >   unsigned long start_brk, brk, start_stack;
> > >   unsigned long arg_start, arg_end, env_start, env_end;
> > 
> > Will 32bits always be enough here? Should atomic_long_t be used instead?
> 
> Good question!
> 
> On x86_64 we need one table to cover 2M (512 entries by 4k, 21 bits) of
> virtual address space. Total size of virtual memory which can be covered
> by 31-bit (32 - sign) nr_ptes is 52 bits (31 + 21).
> 
> Currently, on x86_64 with 4-level page tables we can use at most 48 bits of
> virtual address space (only half of it available for userspace), so we are
> pretty safe here.
> 
> Although, it can be a potential problem, if (when) x86_64 will implement
> 5-level page tables -- 57-bits of virtual address space.
> 
> Any thoughts?

I'd just go with atomic_long_t to avoid having to worry about this in
the first place.  It's been ulong forever and I'm not aware of struct
mm_struct size being an urgent issue.  Cutting this type in half and
adding overflow checks adds more problems than it solves.
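
The arithmetic in the quoted estimate can be reproduced with a short sketch. The two helper names below are invented for illustration; the inputs (512 entries per table, 4k pages, a 32-bit signed counter) are the figures given in the quoted text.

```c
/*
 * Bits of virtual address space covered by one last-level page table:
 * log2(entries per table) + log2(page size).
 * Hypothetical helper for illustration only.
 */
static unsigned table_coverage_bits(unsigned entries_log2, unsigned page_shift)
{
    return entries_log2 + page_shift;
}

/*
 * Bits of virtual address space that a signed counter of counter_bits bits
 * can account for when it counts page-table pages: one bit is lost to the
 * sign, the rest is multiplied by the coverage of a single table.
 */
static unsigned counter_coverage_bits(unsigned counter_bits,
                                      unsigned per_table_bits)
{
    return (counter_bits - 1) + per_table_bits;
}
```

For x86_64, table_coverage_bits(9, 12) is 21 and counter_coverage_bits(32, 21) is 52, which is above the 48 bits of 4-level paging but below the 57 bits mentioned for 5-level paging. A 64-bit counter gives 63 + 21 = 84 bits, which is why atomic_long_t avoids the question on 64-bit.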


[PATCH] staging/olpc: fix dependencies to fix build errors

2013-09-27 Thread Randy Dunlap
From: Randy Dunlap 

Fix build errors when GPIO_CS5535=m and FB_OLPC_DCON=y
by preventing that kconfig combination.

These build errors are caused by having a kconfig bool symbol
(FB_OLPC_DCON_1) that depends on a tristate symbol (GPIO_CS5535),
but when the tristate symbol is =m, the bool symbol is =y.

drivers/built-in.o: In function `dcon_read_status_xo_1':
olpc_dcon_xo_1.c:(.text+0x359531): undefined reference to `cs5535_gpio_set'
drivers/built-in.o: In function `dcon_wiggle_xo_1':
olpc_dcon_xo_1.c:(.text+0x35959f): undefined reference to `cs5535_gpio_set'
olpc_dcon_xo_1.c:(.text+0x359610): undefined reference to `cs5535_gpio_clear'
drivers/built-in.o:olpc_dcon_xo_1.c:(.text+0x3596a1): more undefined references to `cs5535_gpio_clear' follow
drivers/built-in.o: In function `dcon_wiggle_xo_1':
olpc_dcon_xo_1.c:(.text+0x359708): undefined reference to `cs5535_gpio_set'
drivers/built-in.o: In function `dcon_init_xo_1':
olpc_dcon_xo_1.c:(.text+0x35989b): undefined reference to `cs5535_gpio_clear'
olpc_dcon_xo_1.c:(.text+0x3598b5): undefined reference to `cs5535_gpio_isset'
olpc_dcon_xo_1.c:(.text+0x359963): undefined reference to `cs5535_gpio_setup_event'
olpc_dcon_xo_1.c:(.text+0x359980): undefined reference to `cs5535_gpio_set_irq'
olpc_dcon_xo_1.c:(.text+0x359a36): undefined reference to `cs5535_gpio_set'

Signed-off-by: Randy Dunlap 
Cc: Daniel Drake 
Cc: Jens Frederich 
---
This build problem has been around for a long time and is not
specific to mmotm.

 drivers/staging/olpc_dcon/Kconfig |1 +
 1 file changed, 1 insertion(+)

--- mmotm-2013-0926-1615.orig/drivers/staging/olpc_dcon/Kconfig
+++ mmotm-2013-0926-1615/drivers/staging/olpc_dcon/Kconfig
@@ -1,6 +1,7 @@
 config FB_OLPC_DCON
tristate "One Laptop Per Child Display CONtroller support"
depends on OLPC && FB
+   depends on (GPIO_CS5535 || GPIO_CS5535=n)
select I2C
select BACKLIGHT_CLASS_DEVICE
---help---
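
The effect of the added "depends on (GPIO_CS5535 || GPIO_CS5535=n)" line can be modelled with Kconfig's tristate rules, where n < m < y, "A || B" evaluates to the larger of the two values, and a comparison such as "A=n" evaluates to y or n. The sketch below is only a model of that expression; the enum and function names are made up for illustration and are not part of the kconfig tool.

```c
/* Tristate values ordered as Kconfig orders them: n < m < y. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/* Kconfig "A || B": the result is the larger of the two values. */
static enum tristate tri_or(enum tristate a, enum tristate b)
{
    return a > b ? a : b;
}

/* Kconfig "A=B": y when the values are equal, n otherwise. */
static enum tristate tri_eq(enum tristate a, enum tristate b)
{
    return a == b ? TRI_Y : TRI_N;
}

/*
 * Upper limit that "depends on (GPIO_CS5535 || GPIO_CS5535=n)" places on
 * FB_OLPC_DCON for a given GPIO_CS5535 setting (hypothetical helper).
 */
static enum tristate dcon_limit(enum tristate gpio_cs5535)
{
    return tri_or(gpio_cs5535, tri_eq(gpio_cs5535, TRI_N));
}
```

With GPIO_CS5535=m the limit is m, so FB_OLPC_DCON can no longer be built in while the GPIO driver is a module; with GPIO_CS5535=n or =y the limit stays at y and nothing changes.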


Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file

2013-09-27 Thread Davidlohr Bueso
On Fri, 2013-09-27 at 16:54 -0700, Jason Low wrote:
> On Fri, Sep 27, 2013 at 4:01 PM, Paul E. McKenney
>  wrote:
> > Yep.  The previous lock holder's smp_wmb() won't keep either the compiler
> > or the CPU from reordering things for the new lock holder.  They could for
> > example reorder the critical section to precede the node->locked check,
> > which would be very bad.
> 
> Paul, Tim, Longman,
> 
> How would you like the proposed changes below?
> 
> ---
> Subject: [PATCH] MCS: optimizations and barrier corrections

We *really* need to comment those barriers - explicitly that is :)

> 
> Delete the node->locked = 1 assignment if the lock is free as it won't be
> used.
> 
> Delete the smp_wmb() in mcs_spin_lock() and add a full memory barrier at the
> end of the mcs_spin_lock() function. As Paul McKenney suggested, "you do need
> a full memory barrier here in order to ensure that you see the effects of the
> previous lock holder's critical section." And in the mcs_spin_unlock(), move
> the memory barrier so that it is before the "ACCESS_ONCE(next->locked) = 1;".
> 
> Signed-off-by: Jason Low 
> Signed-off-by: Paul E. McKenney 
> Signed-off-by: Tim Chen 
> ---
>  include/linux/mcslock.h |7 +++
>  1 files changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/mcslock.h b/include/linux/mcslock.h
> index 20fd3f0..edd57d2 100644
> --- a/include/linux/mcslock.h
> +++ b/include/linux/mcslock.h
> @@ -26,15 +26,14 @@ void mcs_spin_lock(struct mcs_spin_node **lock,
> struct mcs_spin_node *node)
> 
> prev = xchg(lock, node);
> if (likely(prev == NULL)) {
> -   /* Lock acquired */
> -   node->locked = 1;
> +   /* Lock acquired. No need to set node->locked since it
> won't be used */

Then we need to explain/comment on the relationship between this
situation and locked being set in mspin_unlock(), passing the lock
holder down the list.

> return;
> }
> ACCESS_ONCE(prev->next) = node;
> -   smp_wmb();
> /* Wait until the lock holder passes the lock down */
> while (!ACCESS_ONCE(node->locked))
> arch_mutex_cpu_relax();
> +   smp_mb();
>  }
> 
>  static void mcs_spin_unlock(struct mcs_spin_node **lock, struct
> mcs_spin_node *node)
> @@ -51,8 +50,8 @@ static void mcs_spin_unlock(struct mcs_spin_node
> **lock, struct mcs_spin_node *n
> while (!(next = ACCESS_ONCE(node->next)))
> arch_mutex_cpu_relax();
> }
> -   ACCESS_ONCE(next->locked) = 1;
> smp_wmb();
> +   ACCESS_ONCE(next->locked) = 1;
>  }
> 
>  #endif




Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file

2013-09-27 Thread Jason Low
On Fri, Sep 27, 2013 at 4:01 PM, Paul E. McKenney
 wrote:
> Yep.  The previous lock holder's smp_wmb() won't keep either the compiler
> or the CPU from reordering things for the new lock holder.  They could for
> example reorder the critical section to precede the node->locked check,
> which would be very bad.

Paul, Tim, Longman,

How would you like the proposed changes below?

---
Subject: [PATCH] MCS: optimizations and barrier corrections

Delete the node->locked = 1 assignment if the lock is free as it won't be used.

Delete the smp_wmb() in mcs_spin_lock() and add a full memory barrier at the
end of the mcs_spin_lock() function. As Paul McKenney suggested, "you do need a
full memory barrier here in order to ensure that you see the effects of the
previous lock holder's critical section." And in the mcs_spin_unlock(), move the
memory barrier so that it is before the "ACCESS_ONCE(next->locked) = 1;".

Signed-off-by: Jason Low 
Signed-off-by: Paul E. McKenney 
Signed-off-by: Tim Chen 
---
 include/linux/mcslock.h |7 +++
 1 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/include/linux/mcslock.h b/include/linux/mcslock.h
index 20fd3f0..edd57d2 100644
--- a/include/linux/mcslock.h
+++ b/include/linux/mcslock.h
@@ -26,15 +26,14 @@ void mcs_spin_lock(struct mcs_spin_node **lock,
struct mcs_spin_node *node)

prev = xchg(lock, node);
if (likely(prev == NULL)) {
-   /* Lock acquired */
-   node->locked = 1;
+   /* Lock acquired. No need to set node->locked since it
won't be used */
return;
}
ACCESS_ONCE(prev->next) = node;
-   smp_wmb();
/* Wait until the lock holder passes the lock down */
while (!ACCESS_ONCE(node->locked))
arch_mutex_cpu_relax();
+   smp_mb();
 }

 static void mcs_spin_unlock(struct mcs_spin_node **lock, struct
mcs_spin_node *node)
@@ -51,8 +50,8 @@ static void mcs_spin_unlock(struct mcs_spin_node
**lock, struct mcs_spin_node *n
while (!(next = ACCESS_ONCE(node->next)))
arch_mutex_cpu_relax();
}
-   ACCESS_ONCE(next->locked) = 1;
smp_wmb();
+   ACCESS_ONCE(next->locked) = 1;
 }

 #endif
-- 
1.7.1
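
The ordering the patch aims for can be illustrated in userspace with C11 atomics. This is a sketch under stated assumptions, not the kernel code: a seq_cst fence stands in for smp_mb(), a release store stands in for "smp_wmb(); ACCESS_ONCE(next->locked) = 1;", and the compare-and-exchange path in mcs_unlock() is the textbook MCS unlock, since that part of the function is not shown in the diff. The names mcs_node, mcs_lock and mcs_unlock are chosen for this example.

```c
#include <stdatomic.h>
#include <stddef.h>

struct mcs_node {
    _Atomic(struct mcs_node *) next;
    atomic_int locked;
};

static void mcs_lock(_Atomic(struct mcs_node *) *lock, struct mcs_node *node)
{
    struct mcs_node *prev;

    atomic_store_explicit(&node->next, (struct mcs_node *)NULL,
                          memory_order_relaxed);
    atomic_store_explicit(&node->locked, 0, memory_order_relaxed);

    /* Queue ourselves at the tail; the old tail is our predecessor. */
    prev = atomic_exchange_explicit(lock, node, memory_order_acq_rel);
    if (prev == NULL)
        return; /* Lock was free; node->locked is never read. */

    atomic_store_explicit(&prev->next, node, memory_order_release);

    /* Wait until the lock holder passes the lock down. */
    while (!atomic_load_explicit(&node->locked, memory_order_relaxed))
        ;

    /*
     * Full fence after the spin (stand-in for smp_mb()): the critical
     * section cannot be reordered before the node->locked check.
     */
    atomic_thread_fence(memory_order_seq_cst);
}

static void mcs_unlock(_Atomic(struct mcs_node *) *lock, struct mcs_node *node)
{
    struct mcs_node *next =
        atomic_load_explicit(&node->next, memory_order_acquire);

    if (next == NULL) {
        struct mcs_node *expected = node;

        /* No successor visible: try to release the lock outright. */
        if (atomic_compare_exchange_strong_explicit(
                lock, &expected, (struct mcs_node *)NULL,
                memory_order_acq_rel, memory_order_acquire))
            return;

        /* A successor is enqueueing; wait for it to link itself in. */
        while ((next = atomic_load_explicit(&node->next,
                                            memory_order_acquire)) == NULL)
            ;
    }

    /*
     * Release store: everything done in the critical section is ordered
     * before the successor can observe locked == 1.
     */
    atomic_store_explicit(&next->locked, 1, memory_order_release);
}
```

Each thread passes its own mcs_node (typically on its stack) and the shared tail pointer starts out as NULL. The spin loops here busy-wait without a relax hint, which the kernel version provides via arch_mutex_cpu_relax().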


Re: [PATCHv3] x86: EFI stub support for large memory maps

2013-09-27 Thread Linn Crosetto
On Thu, Sep 26, 2013 at 12:34:00PM +0100, Matt Fleming wrote:
> > I might add the following to your merge for semantic reasons:
> > 
> > diff --git a/arch/x86/boot/compressed/eboot.c 
> > b/arch/x86/boot/compressed/eboot.c
> > index 04b228d..a7677ba 100644
> > --- a/arch/x86/boot/compressed/eboot.c
> > +++ b/arch/x86/boot/compressed/eboot.c
> > @@ -730,6 +730,8 @@ get_map:
> > boot_params->alt_mem_k = 32 * 1024;
> >  
> > status = setup_e820(boot_params, e820ext, e820ext_size);
> > +   if (status != EFI_SUCCESS)
> > +   return status;
> >  
> > return EFI_SUCCESS;
> 
> Aha, nice catch! Though if setup_e820() fails we should be jumping to
> the 'free_mem_map' label so we don't leak the memory map, like so,
> 
> diff --git a/arch/x86/boot/compressed/eboot.c 
> b/arch/x86/boot/compressed/eboot.c
> index 04b228d..602950b 100644
> --- a/arch/x86/boot/compressed/eboot.c
> +++ b/arch/x86/boot/compressed/eboot.c
> @@ -730,8 +730,8 @@ get_map:
>   boot_params->alt_mem_k = 32 * 1024;
>  
>   status = setup_e820(boot_params, e820ext, e820ext_size);
> -
> - return EFI_SUCCESS;
> + if (status == EFI_SUCCESS)
> + return status;
>  
>  free_mem_map:
>   efi_call_phys1(sys_table->boottime->free_pool, mem_map);

Given that we have already successfully called exit_boot_services, can we still
make this call to free_pool?


Re: Please revert 928bea964827d7824b548c1f8e06eccbbc4d0d7d

2013-09-27 Thread Yinghai Lu
[+ Rafael]

On Fri, Sep 27, 2013 at 4:19 PM, Benjamin Herrenschmidt
 wrote:
> On Fri, 2013-09-27 at 15:56 -0700, Yinghai Lu wrote:
>
>> ok, please use the attached one instead if you are ok with it. It will print
>> some warning about drivers skipping pci_set_master, so we can catch more
>> problems with drivers.
>
> Except that the message is pretty cryptic :-) Especially since the
> driver causing the message to be printed is not the one that did
> the mistake in the first place, it's the next one coming up that
> trips the warning.
>
> In any case, the root cause is indeed the PCIe port driver:
>
> We don't have ACPI, so pcie_port_platform_notify() isn't implemented,
> and pcie_ports_auto is true, so we end up with capabilities set to 0.

in
| commit fe31e69740eddc7316071ed5165fed6703c8cd12
| Author: Rafael J. Wysocki 
| Date:   Sun Dec 19 15:57:16 2010 +0100
|
|PCI/PCIe: Clear Root PME Status bits early during system resume
|
|I noticed that PCI Express PMEs don't work on my Toshiba Portege R500
|after the system has been woken up from a sleep state by a PME
|(through Wake-on-LAN).  After some investigation it turned out that
|the BIOS didn't clear the Root PME Status bit in the root port that
|received the wakeup PME and since the Requester ID was also set in
|the port's Root Status register, any subsequent PMEs didn't trigger
|interrupts.
|
|This problem can be avoided by clearing the Root PME Status bits in
|all PCI Express root ports during early resume.  For this purpose,
|add an early resume routine to the PCIe port driver and make this
|driver be always registered, even if pci_ports_disable is set (in
|which case the driver's only function is to provide the early
|resume callback).
|
|
|@@ -349,15 +349,18 @@ int pcie_port_device_register(struct pci_dev *dev)
|int status, capabilities, i, nr_service;
|int irqs[PCIE_PORT_DEVICE_MAXSERVICES];
|
|-   /* Get and check PCI Express port services */
|-   capabilities = get_port_device_capability(dev);
|-   if (!capabilities)
|-   return -ENODEV;
|-
|/* Enable PCI Express port device */
|status = pci_enable_device(dev);
|if (status)
|return status;
|+
|+   /* Get and check PCI Express port services */
|+   capabilities = get_port_device_capability(dev);
|+   if (!capabilities) {
|+   pcie_no_aspm();
|+   return 0;
|+   }
|+
|pci_set_master(dev);
|/*
| * Initialize service irqs. Don't use service devices that

>
> Thus the port driver bails out before calling pci_set_master(). The fix
> is to call pci_set_master() unconditionally. However, that led me to
> find a few interesting oddities in that port driver code:

Can we partially revert that change? That is, we should check get_port
first...

like attached.

Thanks

Yinghai


fix_pci_set_master_port_pcie.patch
Description: Binary data


Re: [PATCH] checkpatch: Make the memory barrier test noisier

2013-09-27 Thread Oliver Neukum
On Fri, 2013-09-27 at 16:50 +0200, Peter Zijlstra wrote:
> On Fri, Sep 27, 2013 at 07:34:55AM -0700, Joe Perches wrote:
> > That would make it seem as if all barriers are SMP no?
> 
> I would think any memory barrier is ordering against someone else; if
> not smp then a device/hardware -- like for instance the hardware page
> table walker.
> 
> Barriers are fundamentally about order; and order only makes sense if
> there's more than 1 party to the game.

But not necessarily more than one kind of party. It is perfectly
possible to have a barrier against other threads running the same
function.

Regards
Oliver




Re: [PATCH 2/4] ipc,shm: prevent race with rmid in shmat(2)

2013-09-27 Thread Davidlohr Bueso
Hi Manfred,

On Fri, 2013-09-27 at 07:45 +0200, Manfred Spraul wrote:
> Hi Davidlohr,
> 
> On 09/16/2013 05:04 AM, Davidlohr Bueso wrote:
> > This fixes a race in shmat() between finding the msq and
> > actually attaching the segment, as another thread can delete shmid
> > underneath us if we are preempted before acquiring the kern_ipc_perm.lock.
> According to the man page, Linux supports attaching to deleted shm 
> segments:
> 
> http://linux.die.net/man/2/shmat
> >
> > On Linux, it is possible to attach a shared memory segment even if it 
> > is already marked to be deleted. However, POSIX.1-2001 does not 
> > specify this behavior and many other implementations do not support it.
> >

Good catch!

> Does your patch change that?

Yes, it should and furthermore it affects the following property:

 shm_nattch is decremented by one.  If it becomes 0 and the segment is
marked for deletion, the segment is deleted.



> Unfortunately, I have neither any experience with ipc/shm nor any test 
> cases.
> 
> And:
> As far as I can see it's not a problem if we are attaching to a deleted 
> segment: shm_close cleans up everything.

Agreed, please disregard this patch.

Thanks,
Davidlohr
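
The behaviour quoted from the man page can be observed from userspace with a small test. This is a minimal sketch, assuming a Linux system with System V shared memory available; the function name attach_after_rmid is made up for this example. It keeps one attachment alive so that the segment still exists after IPC_RMID, then tries a second shmat() on the same id.

```c
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/*
 * Returns 1 if a second shmat() succeeds after the segment has been marked
 * for deletion, 0 if it fails, and -1 if the setup itself failed.
 */
static int attach_after_rmid(void)
{
    int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    void *first, *second;
    int ok;

    if (id < 0)
        return -1;

    /* First attach keeps shm_nattch above zero. */
    first = shmat(id, NULL, 0);
    if (first == (void *)-1) {
        shmctl(id, IPC_RMID, NULL);
        return -1;
    }

    /* Mark the segment for deletion while it is still attached. */
    if (shmctl(id, IPC_RMID, NULL) < 0) {
        shmdt(first);
        return -1;
    }

    /* On Linux this attach is expected to succeed. */
    second = shmat(id, NULL, 0);
    ok = (second != (void *)-1);
    if (ok)
        shmdt(second);

    /* Last detach: shm_nattch drops to 0 and the segment is destroyed. */
    shmdt(first);
    return ok;
}
```

On other systems the second attach may fail, as the quoted man page text notes that POSIX.1-2001 does not specify this behaviour.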



[PATCH] perf, tools: Fix sorting for 64bit entries

2013-09-27 Thread Andi Kleen
From: Andi Kleen 

Some of the node comparisons in hist.c dropped the upper
32bit by using an int variable to store the compare
result. This broke various 64bit fields, causing
incorrect collapsing (found for the TSX transaction field)

Just use int64_t always.

Signed-off-by: Andi Kleen 
---
 tools/perf/util/hist.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 4714a72..4832b59 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -350,7 +350,7 @@ static struct hist_entry *add_hist_entry(struct hists *hists,
struct rb_node **p;
struct rb_node *parent = NULL;
struct hist_entry *he;
-   int cmp;
+   int64_t cmp;
 
p = &hists->entries_in->rb_node;
 
@@ -887,7 +887,7 @@ static struct hist_entry *hists__add_dummy_entry(struct hists *hists,
struct rb_node **p;
struct rb_node *parent = NULL;
struct hist_entry *he;
-   int cmp;
+   int64_t cmp;
 
if (sort__need_collapse)
root = &hists->entries_collapsed;
-- 
1.8.3.1
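
The failure mode described in the commit message can be reproduced outside perf. The sketch below is not the hist.c code: cmp_u64 is a made-up comparator that returns a 64-bit difference, and the two helpers differ only in the width of the variable holding the result. Converting an out-of-range int64_t to int is implementation-defined in C; on the usual GCC and Clang targets it keeps the low 32 bits, which is the case assumed here.

```c
#include <stdint.h>

/* Comparator in the style of a 64-bit sort key: signed difference. */
static int64_t cmp_u64(uint64_t l, uint64_t r)
{
    return (int64_t)(l - r);
}

/* Stores the result in an int, as the code did before the patch. */
static int is_equal_narrow(uint64_t l, uint64_t r)
{
    int cmp = (int)cmp_u64(l, r); /* upper 32 bits are dropped */

    return cmp == 0;
}

/* Stores the result in an int64_t, as the patch does. */
static int is_equal_wide(uint64_t l, uint64_t r)
{
    int64_t cmp = cmp_u64(l, r);

    return cmp == 0;
}
```

Two keys that differ only above bit 31, for example 1ULL << 32 and 0, compare as equal in is_equal_narrow() and as different in is_equal_wide(); the former is the incorrect collapsing the commit message refers to.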



Re: applesmc oops in 3.10/3.11

2013-09-27 Thread Guenter Roeck

On 09/27/2013 11:03 AM, Chris Murphy wrote:


On Sep 27, 2013, at 11:59 AM, Guenter Roeck  wrote:


On Fri, Sep 27, 2013 at 11:41:42AM -0600, Chris Murphy wrote:


On Sep 27, 2013, at 11:12 AM, Guenter Roeck  wrote:


On Fri, Sep 27, 2013 at 12:21:04PM -0400, Josh Boyer wrote:

On Thu, Sep 26, 2013 at 2:34 AM, Henrik Rydberg  wrote:

This suggests that initialization may be attempted more than once. The key cache
is allocated only once, but the number of keys is read for each attempt.

No idea if that can happen, but if the number of keys can increase after
the first initialization attempt you would have an explanation for the crash.


Good idea, and easy enough to test with the patch below.


Should we apply this patch even though it may not solve the specific problem ?


Yes, why not - it certainly won't hurt. I am running it right now, so
it is at least run-tested.


Again, not sure if the key count can change, but the current code is at the very
least inconsistent, as it keeps reading the key count without updating or
verifying the cache size.


Yes - I agree that the error state is far-fetched, but it is hard to
see any other logical explanation. There is of course always the
possibility that the problem is somewhere else completely.

Proper patch attached.

Thanks,
Henrik

---

 From dedefba9167913c46e1896ce0624e68ffe95d532 Mon Sep 17 00:00:00 2001
From: Henrik Rydberg 
Date: Thu, 26 Sep 2013 08:33:16 +0200
Subject: [PATCH] hwmon: (applesmc) Check key count before proceeding

After reports from Chris and Josh Boyer of a rare crash in applesmc,
Guenter pointed at the initialization problem fixed below. The patch
has not been verified to fix the crash, but should be applied
regardless.

Reported-by: 
Suggested-by: Guenter Roeck 
Signed-off-by: Henrik Rydberg 
---
drivers/hwmon/applesmc.c | 11 ++-
1 file changed, 10 insertions(+), 1 deletion(-)


Thanks for the quick reply.  I'll get this rolled into our kernels soon.


I sent a pull request to Linus, so you should be able to pull it from
the upstream kernel shortly. Would be great to get feedback if the patch
solves the problem (or doesn't).


I'll start running it when it appears in koji. It's very transient, maybe one 
oops per week with lots of (other) testing. I'm not even sure if it happens on 
warm or cold boots or both.


When you do, can you possibly trigger an event based on the warning added
with the patch ? This might help us to identify if the problem fixed
with the patch actually happens.


I don't understand the question. I'm uncertain how to trigger, and also what 
event.



The patch includes a new warning message.

pr_warn("key count changed from %d to %d\n",
s->key_count, count);

It would be great if there would be a means to detect if this message is seen
in a kernel log, because it would show that the potential crash condition
fixed with the patch was actually encountered. This would help us to determine
if we actually fixed the problem or not.

Of course, we'll know it wasn't fixed if the system still crashes.

Thanks,
Guenter
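
As a rough user-space sketch of the inconsistency discussed in this thread (a cache sized once while the key count is re-read on every attempt), the check could look something like the following. All names here (struct smc_registers, struct key_entry, smc_init_cache()) are made up for the example, and the body of the real patch is not shown in this thread, so dropping and reallocating the cache on a mismatch is an assumption about one reasonable way to handle it, not a description of what the driver does.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical cache entry; the real driver's layout is not shown here. */
struct key_entry {
	char name[5];
};

/* Hypothetical driver state: the count the cache was sized for, plus
 * the cache itself (NULL until the first successful init). */
struct smc_registers {
	unsigned int key_count;
	struct key_entry *cache;
};

/*
 * Called on every initialization attempt with the freshly read key
 * count (assumed non-zero). If a cache already exists but was sized
 * for a different count, warn and drop it so that it is reallocated
 * with the new size. Returns 0 on success, -1 on allocation failure.
 */
int smc_init_cache(struct smc_registers *s, unsigned int count)
{
	if (s->cache && s->key_count != count) {
		fprintf(stderr, "key count changed from %u to %u\n",
			s->key_count, count);
		free(s->cache);
		s->cache = NULL;
	}
	s->key_count = count;

	if (!s->cache) {
		s->cache = calloc(count, sizeof(*s->cache));
		if (!s->cache)
			return -1;
	}
	return 0;
}
```

In this sketch a second attempt with the same count reuses the existing cache, while an attempt with a different count emits the warning and resizes, so the count and the cache size can never disagree.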



[GIT PULL] ACPI and power management fixes for v3.12-rc3

2013-09-27 Thread Rafael J. Wysocki
Hi Linus,

Please pull from the git repository at

  git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git pm+acpi-3.12-rc3

to receive ACPI and power management fixes for v3.12-rc3 with
top-most commit dcc7bc3f3d91bbd5c15409a92317c2c24449a285

  Merge branch 'pm-cpufreq-fixes'

on top of commit 4a10c2ac2f368583138b774ca41fac4207911983

  Linux 3.12-rc2

These fix one recent cpufreq regression, a few older bugs that may
harm users and a kerneldoc typo.

Specifics:

 1) After recent locking changes in the cpufreq core it is possible
to trigger BUG_ON(!policy) in lock_policy_rwsem_read() if
cpufreq_get() is called before registering a cpufreq driver.
Fix from Viresh Kumar.

 2) If intel_pstate has been loaded already, it doesn't make sense
to do anything in acpi_cpufreq_init() and moreover doing something
in there in that case may be harmful, so make that function return
immediately if another cpufreq driver is already present.  From
Yinghai Lu.

 3) The ACPI IPMI driver sometimes attempts to acquire a mutex from
interrupt context, which can be avoided by replacing that mutex
with a spinlock.  From Lv Zheng.

 4) A NULL pointer may be dereferenced by the exynos5440 cpufreq
driver if a memory allocation made by it fails.  Fix from
Sachin Kamat.

 5) Hanjun Guo's commit fixes a typo in the kerneldoc comment
documenting acpi_bus_unregister_driver().

Thanks!


---

Hanjun Guo (1):
  ACPI / scan: fix typo in comments of acpi_bus_unregister_driver()

Lv Zheng (1):
  ACPI / IPMI: Fix atomic context requirement of ipmi_msg_handler()

Sachin Kamat (1):
  cpufreq: exynos5440: Fix potential NULL pointer dereference

Viresh Kumar (1):
  cpufreq: check cpufreq driver is valid and cpufreq isn't disabled in cpufreq_get()

Yinghai Lu (1):
  acpi-cpufreq: skip loading acpi_cpufreq after intel_pstate

---

 drivers/acpi/acpi_ipmi.c             |   24 ++--
 drivers/acpi/scan.c                  |    2 +-
 drivers/cpufreq/acpi-cpufreq.c       |    4 
 drivers/cpufreq/cpufreq.c            |    3 +++
 drivers/cpufreq/exynos5440-cpufreq.c |    2 +-
 5 files changed, 23 insertions(+), 12 deletions(-)



Re: Please revert 928bea964827d7824b548c1f8e06eccbbc4d0d7d

2013-09-27 Thread Benjamin Herrenschmidt
On Fri, 2013-09-27 at 15:56 -0700, Yinghai Lu wrote:

> OK, please check whether you are fine with the attached one instead. It will
> print a warning about drivers skipping pci_set_master, so we can catch more
> driver problems.

Except that the message is pretty cryptic :-) Especially since the
driver causing the message to be printed is not the one that did
the mistake in the first place, it's the next one coming up that
trips the warning.

In any case, the root cause is indeed the PCIe port driver:

We don't have ACPI, so pcie_port_platform_notify() isn't implemented,
and pcie_ports_auto is true, so we end up with capabilities set to 0.

Thus the port driver bails out before calling pci_set_master(). The fix
is to call pci_set_master() unconditionally. However, that led me to
find a few interesting oddities in that port driver code:

 - If capabilities is 0, it returns after enabling the device and
doesn't disable it. But if it fails for any other reason later on (such
as failing to enable interrupts), it *will* disable the device. This is
inconsistent. In fact, if it had disabled the device as a result of the
0 capabilities, the problem would also not have happened (the subsequent
enable_bridge done by the e1000e driver would have done the right
thing). I've tested "fixing" that instead of moving the set_master call
and it fixes my problem as well.

 - In get_port_device_capability(), all capabilities are gated by a
combination of the bit in cap_mask and a corresponding HW check of
the presence of the relevant PCIe capability or similar... except
for the VC one, which is solely read from the HW capability. That means
that the platform pcie_port_platform_notify() has no ability to prevent
the VC capability (so if I have a broken bridge that advertises it but
my platform wants to block it, it can't).

 - I am quite nervous with the PCIe port driver disabling bridges. I
understand the intent but what if that bridge has some system device
behind it ? Something you don't even know about (used by ACPI, behind an
ISA bridge for example ?).

I think disabling bridges is a VERY risky proposition at all times
(including during PM) and I don't think the port driver should do it.

Maybe a more robust approach would be to detect one way or another that
the bridge was already enabled and only disable it if it wasn't or
something along those lines (i.e., restore it to the state it was in)...

This is not my problem right now of course (in my case the bridge was
disabled to begin with) but I have a long experience with system stuff
hiding behind bridges and the code as it is written makes me a bit
nervous.

Cheers,
Ben.
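
To make the two suggestions above concrete (set bus mastering before any early return, and on failure put the bridge back the way it was found rather than blindly disabling it), here is a toy user-space model. It is only a sketch: struct bridge_model, struct bridge_saved, port_probe() and port_undo() are invented for the example and do not correspond to the real port driver or to any PCI core API.

```c
/* Toy model of the two bridge bits discussed above. */
struct bridge_model {
	int enabled;
	int bus_master;
};

/* State captured before probe touches the bridge. */
struct bridge_saved {
	int enabled;
	int bus_master;
};

/*
 * Hypothetical probe: remember the prior state, enable the bridge and
 * set bus mastering unconditionally, and only then look at the
 * capability mask. A mask of 0 means there are no services to
 * register, but the bridge is still left fully usable.
 */
int port_probe(struct bridge_model *b, unsigned int capabilities,
	       struct bridge_saved *saved)
{
	saved->enabled = b->enabled;
	saved->bus_master = b->bus_master;

	b->enabled = 1;
	b->bus_master = 1;

	if (!capabilities)
		return 0;

	/* Service registration would go here in a real driver. */
	return 0;
}

/* Hypothetical failure/remove path: restore the state found at probe
 * time instead of disabling the bridge outright. */
void port_undo(struct bridge_model *b, const struct bridge_saved *saved)
{
	b->enabled = saved->enabled;
	b->bus_master = saved->bus_master;
}
```

In this model a bridge that firmware had already enabled stays enabled after port_undo(), which is the property the "restore it to the state it was in" idea is after.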



Re: [PATCH v5 5/6] x86, acpi, crash, kdump: Do reserve_crashkernel() after SRAT is parsed

2013-09-27 Thread Toshi Kani
On Wed, 2013-09-25 at 02:34 +0800, Zhang Yanfei wrote:
> From: Tang Chen 
> 
> Memory reserved for crashkernel could be large. So we should not allocate
> this memory bottom up from the end of kernel image.
> 
> When SRAT is parsed, we will be able to know which memory is hotpluggable,
> and we can avoid allocating this memory for the kernel. So reorder
> reserve_crashkernel() after SRAT is parsed.
> 
> Acked-by: Tejun Heo 
> Signed-off-by: Tang Chen 
> Reviewed-by: Zhang Yanfei 
> ---
>  arch/x86/kernel/setup.c |8 ++--
>  1 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index f0de629..36cfce3 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -1120,8 +1120,6 @@ void __init setup_arch(char **cmdline_p)
>   acpi_initrd_override((void *)initrd_start, initrd_end - initrd_start);
>  #endif
>  
> - reserve_crashkernel();
> -
>   vsmp_init();
>  
>   io_delay_init();
> @@ -1136,6 +1134,12 @@ void __init setup_arch(char **cmdline_p)
>   initmem_init();
>   memblock_find_dma_reserve();
>  
> + /*
> +  * Reserve memory for crash kernel after SRAT is parsed so that it
> +  * won't consume hotpluggable memory.
> +  */
> + reserve_crashkernel();

Out of curiosity, is there any particular reason why it is moved after
memblock_find_dma_reserve(), not initmem_init()?

Thanks,
-Toshi



Re: [stable] Re: Dependency bug in the uvcvideo Kconfig

2013-09-27 Thread Greg KH
On Thu, Sep 19, 2013 at 04:00:53PM -0700, Randy Dunlap wrote:
> On 09/19/13 13:17, Randy Dunlap wrote:
> > On 09/18/13 20:44, Jeff P. Zacher wrote:
> >>  
> >>
> >> You are correct that the problem shown in the forum was in 3.5.4.
> >> However, I am having either the same or a similar problem in 3.10.7.
> >> Here is the broken config file, saved as .config-bad
> >>
> > 
> > The failing kernel config file is attached.
> 
> For Linux 3.10.x:
> 
> 
> This is already fixed in mainline but patches need to be backported.
> Specifically these 2 commits (in this order):
> 
> 
> commit a0f9354b1a319cb29c331bfd2e5a15d7f9b87fa4
> Author: Randy Dunlap 
> Date:   Wed May 8 17:28:13 2013 -0300
> 
> [media] media/usb: fix kconfig dependencies
> 
> and
> 
> commit 5077ac3b8108007f4a2b4589f2d373cf55453206
> Author: Mauro Carvalho Chehab 
> Date:   Wed May 22 11:25:52 2013 -0300
> 
> Properly handle tristate dependencies on USB/PCI menus
> 

Applied, thanks.

greg k-h


Re: Linux 3.12-rc2 - MIPS regression

2013-09-27 Thread Aaro Koskinen
Hi,

3.12-rc2 breaks the boot (BUG: scheduling while atomic, see logs below)
on Lemote Mini-PC (MIPS). According to git bisect, this is caused by:

ff522058bd717506b2fa066fa564657f2b86477e is the first bad commit
commit ff522058bd717506b2fa066fa564657f2b86477e
Author: Ralf Baechle 
Date:   Tue Sep 17 12:44:31 2013 +0200

MIPS: Fix accessing to per-cpu data when flushing the cache

Reverting the commit from v3.12-rc2 makes the board boot fine.

Here's the console log from the problem situation (vanilla 3.12-rc2),
it seems the logging is never stopping, so just the first few screenfuls:

[0.00] Linux version 3.12.0-rc2-lemote-los.git-5318619-dirty 
(aaro@blackmetal) (gcc version 4.8.1 (GCC) ) #1 PREEMPT Sat Sep 28 01:52:37 
EEST 2013
[0.00] busclock=6600, cpuclock=79778, memsize=256, 
highmemsize=256
[0.00] bootconsole [early0] enabled
[0.00] CPU revision is: 6303 (ICT Loongson-2)
[0.00] FPU revision is: 0501
[0.00] Checking for the multiply/shift bug... no.
[0.00] Checking for the daddiu bug... no.
[0.00] Determined physical RAM map:
[0.00]  memory: 1000 @  (usable)
[0.00]  memory: 3000 @ 1000 (reserved)
[0.00]  memory: 1000 @ 9000 (usable)
[0.00]  memory: 1000 @ 8000 (reserved)
[0.00] Initrd not found or empty - disabling initrd
[0.00] Zone ranges:
[0.00]   Normal   [mem 0x-0x9fff]
[0.00] Movable zone start for each node
[0.00] Early memory node ranges
[0.00]   node   0: [mem 0x-0x3fff]
[0.00]   node   0: [mem 0x8000-0x9fff]
[0.00] Reserving 0MB of memory at 0MB for crashkernel
[0.00] Primary instruction cache 64kB, VIPT, direct mapped, linesize 32 
bytes.
[0.00] Primary data cache 64kB, 4-way, VIPT, no aliases, linesize 32 
bytes
[0.00] Unified secondary cache 512kB 4-way, linesize 32 bytes.
[0.00] Built 1 zonelists in Zone order, mobility grouping on.  Total 
pages: 97968
[0.00] Kernel command line: console=tty console=ttyS0,115200
[0.00] PID hash table entries: 4096 (order: 1, 32768 bytes)
[0.00] Dentry cache hash table entries: 262144 (order: 7, 2097152 bytes)
[0.00] Inode-cache hash table entries: 131072 (order: 6, 1048576 bytes)
[0.00] Memory: 501904K/1572864K available (4091K kernel code, 264K 
rwdata, 884K rodata, 9824K init, 195K bss, 1070960K reserved)
[0.00] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[0.00] Preemptible hierarchical RCU implementation.
[0.00] NR_IRQS:128
[0.00] Console: colour dummy device 80x25
[0.00] console [tty0] enabled
[0.008000] Calibrating delay loop... 528.38 BogoMIPS (lpj=1056768)
[0.044000] pid_max: default: 32768 minimum: 301
[0.048000] Mount-cache hash table entries: 1024
[0.052000] Checking for the daddi bug... no.
[0.056000] devtmpfs: initialized
[0.06] NET: Registered protocol family 16
[0.08] bio: create slab  at 0
[0.088000] SCSI subsystem initialized
[0.092000] usbcore: registered new interface driver usbfs
[0.096000] usbcore: registered new interface driver hub
[0.10] usbcore: registered new device driver usb
[0.104000] PCI host bridge to bus :00
[0.108000] pci_bus :00: root bus resource [mem 0x4000-0x7fff]
[0.112000] pci_bus :00: root bus resource [io  0x4000-0x]
[0.116000] pci_bus :00: No busn resource found for root bus, will use 
[bus 00-ff]
[0.124000] pci :00:08.0: BAR 0: assigned [mem 0x4000-0x4fff 
pref]
[0.128000] pci :00:08.0: BAR 1: assigned [mem 0x5000-0x5003]
[0.132000] pci :00:06.0: BAR 6: assigned [mem 0x5004-0x5005 
pref]
[0.136000] pci :00:08.0: BAR 6: assigned [mem 0x5006-0x5006 
pref]
[0.14] pci :00:0e.4: BAR 0: assigned [mem 0x5007-0x50070fff]
[0.144000] pci :00:0e.5: BAR 0: assigned [mem 0x50071000-0x50071fff]
[0.148000] pci :00:06.0: BAR 0: assigned [io  0x4000-0x40ff]
[0.152000] pci :00:06.0: BAR 1: assigned [mem 0x50072000-0x500720ff]
[0.156000] pci :00:0e.0: BAR 1: assigned [io  0x4400-0x44ff]
[0.16] pci :00:08.0: BAR 2: assigned [io  0x4800-0x487f]
[0.164000] pci :00:0e.0: BAR 4: assigned [io  0x4880-0x48ff]
[0.168000] pci :00:0e.3: BAR 0: assigned [io  0x4c00-0x4c7f]
[0.172000] pci :00:0e.0: BAR 2: assigned [io  0x4c80-0x4cbf]
[0.176000] pci :00:0e.0: BAR 5: assigned [io  0x4cc0-0x4cdf]
[0.18] pci :00:0e.2: BAR 4: assigned [io  0x4ce0-0x4cef]
[0.184000] pci :00:0e.0: BAR 0: assigned [io  0x4cf0-0x4cf7]
[0.188000] slot: 6, pin: 1, irq: 36
[0.192000] slot: 8, pin: 1, irq: 38
[0.196000] Switched to clocksource mfgpt
[

Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file

2013-09-27 Thread Paul E. McKenney
On Fri, Sep 27, 2013 at 03:46:45PM -0700, Tim Chen wrote:
> On Fri, 2013-09-27 at 13:38 -0700, Paul E. McKenney wrote:
> > On Fri, Sep 27, 2013 at 12:38:53PM -0700, Tim Chen wrote:
> > > On Fri, 2013-09-27 at 08:29 -0700, Paul E. McKenney wrote:
> > > > On Wed, Sep 25, 2013 at 03:10:49PM -0700, Tim Chen wrote:
> > > > > We will need the MCS lock code for doing optimistic spinning for rwsem.
> > > > > Extracting the MCS code from mutex.c and put into its own file allow us
> > > > > to reuse this code easily for rwsem.
> > > > > 
> > > > > Signed-off-by: Tim Chen 
> > > > > Signed-off-by: Davidlohr Bueso 
> > > > > ---
> > > > >  include/linux/mcslock.h |   58 +++
> > > > >  kernel/mutex.c          |   58 +-
> > > > >  2 files changed, 65 insertions(+), 51 deletions(-)
> > > > >  create mode 100644 include/linux/mcslock.h
> > > > > 
> > > > > diff --git a/include/linux/mcslock.h b/include/linux/mcslock.h
> > > > > new file mode 100644
> > > > > index 000..20fd3f0
> > > > > --- /dev/null
> > > > > +++ b/include/linux/mcslock.h
> > > > > @@ -0,0 +1,58 @@
> > > > > +/*
> > > > > + * MCS lock defines
> > > > > + *
> > > > > + * This file contains the main data structure and API definitions of MCS lock.
> > > > > + */
> > > > > +#ifndef __LINUX_MCSLOCK_H
> > > > > +#define __LINUX_MCSLOCK_H
> > > > > +
> > > > > +struct mcs_spin_node {
> > > > > + struct mcs_spin_node *next;
> > > > > + int   locked;   /* 1 if lock acquired */
> > > > > +};
> > > > > +
> > > > > +/*
> > > > > + * We don't inline mcs_spin_lock() so that perf can correctly account for the
> > > > > + * time spent in this lock function.
> > > > > + */
> > > > > +static noinline
> > > > > +void mcs_spin_lock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
> > > > > +{
> > > > > + struct mcs_spin_node *prev;
> > > > > +
> > > > > + /* Init node */
> > > > > + node->locked = 0;
> > > > > + node->next   = NULL;
> > > > > +
> > > > > + prev = xchg(lock, node);
> > > > > + if (likely(prev == NULL)) {
> > > > > + /* Lock acquired */
> > > > > + node->locked = 1;
> > > > > + return;
> > > > > + }
> > > > > + ACCESS_ONCE(prev->next) = node;
> > > > > + smp_wmb();
> > > 
> > > BTW, is the above memory barrier necessary?  It seems like the xchg
> > > instruction already provided a memory barrier.
> > > 
> > > Now if we made the changes that Jason suggested:
> > > 
> > > 
> > > /* Init node */
> > > -   node->locked = 0;
> > > node->next   = NULL;
> > > 
> > > prev = xchg(lock, node);
> > > if (likely(prev == NULL)) {
> > > /* Lock acquired */
> > > -   node->locked = 1;
> > > return;
> > > }
> > > +   node->locked = 0;
> > > ACCESS_ONCE(prev->next) = node;
> > > smp_wmb();
> > > 
> > > We are probably still okay as other cpus do not read the value of
> > > node->locked, which is a local variable.
> > 
> > I don't immediately see the need for the smp_wmb() in either case.
> 
> 
> Thinking a bit more, the following could happen in Jason's 
> initial patch proposal.  In this case variable "prev" referenced 
> by CPU1 points to "node" referenced by CPU2  
> 
>   CPU 1 (calling lock)                CPU 2 (calling unlock)
>   ACCESS_ONCE(prev->next) = node
>                                       *next = ACCESS_ONCE(node->next);
>                                       ACCESS_ONCE(next->locked) = 1;
>   node->locked = 0;
> 
> Then we will be spinning forever on CPU1 as we overwrite the lock passed
> from CPU2 before we check it.  The original code, which assigns
> "node->locked = 0" before xchg, does not have this issue.
> Doing the following change of moving smp_wmb immediately
> after node->locked assignment (suggested by Jason)
> 
>   node->locked = 0;
>   smp_wmb();
>   ACCESS_ONCE(prev->next) = node;
> 
> could avoid the problem, but will need closer scrutiny to see if
> there are other pitfalls if wmb happens before
>   
>   ACCESS_ONCE(prev->next) = node;

I could believe that an smp_wmb() might be needed before the
"ACCESS_ONCE(prev->next) = node;", just not after.

> > > > > + /* Wait until the lock holder passes the lock down */
> > > > > + while (!ACCESS_ONCE(node->locked))
> > > > > + arch_mutex_cpu_relax();
> > 
> > However, you do need a full memory barrier here in order to ensure that
> > you see the effects of the previous lock holder's critical section.
> 
> Is it necessary to add a memory barrier after acquiring
> the lock if the previous lock holder executes smp_wmb before passing
> the lock?

Yep.  The previous lock holder's smp_wmb() won't keep either the compiler
or the CPU from reordering things for the new 

Re: Please revert 928bea964827d7824b548c1f8e06eccbbc4d0d7d

2013-09-27 Thread Yinghai Lu
On Fri, Sep 27, 2013 at 3:38 PM, Benjamin Herrenschmidt
 wrote:
> On Fri, 2013-09-27 at 14:54 -0700, Yinghai Lu wrote:
>> On Fri, Sep 27, 2013 at 2:46 PM, Benjamin Herrenschmidt
>>  wrote:
>>
>> > Wouldn't it be better to simply have pci_enable_device() always set bus
>> > master on a bridge? I don't see any case where it makes sense to have
>> > an enabled bridge without the master bit set on it...
>>
>> Do you mean attached?
>
> So this patch works and fixes the problem. I think it makes the whole
> thing more robust and should be applied.

good.

>
> I still don't know why the bridge doesn't get enabled properly without
> it yet, tracking it down (the machine in question takes a LONG time to
> reboot :-)

OK, please check whether you are fine with the attached one instead. It will
print a warning about drivers skipping pci_set_master, so we can catch more
driver problems.

Thanks

Yinghai


pci_set_master_again_v2.patch
Description: Binary data


Re: [PATCH 1/3] thermal: bcm281xx: Add thermal driver

2013-09-27 Thread Wendy Ng


On 9/25/2013 12:26 PM, Matt Porter wrote:

On Mon, Sep 23, 2013 at 10:51:36AM -0700, Wendy Ng wrote:

This adds the support for reading out temperature from Broadcom bcm281xx
SoCs.

Signed-off-by: Wendy Ng 
Reviewed-by: Markus Mayer 
Reviewed-by: Christian Daudt 
---
  .../bindings/thermal/bcm-kona-thermal.txt  |   18 +++
  drivers/thermal/Kconfig                    |   10 ++
  drivers/thermal/Makefile                   |    1 +
  drivers/thermal/bcm_thermal.c              |  170 
  4 files changed, 199 insertions(+)
  create mode 100644 Documentation/devicetree/bindings/thermal/bcm-kona-thermal.txt
  create mode 100644 drivers/thermal/bcm_thermal.c

diff --git a/Documentation/devicetree/bindings/thermal/bcm-kona-thermal.txt b/Documentation/devicetree/bindings/thermal/bcm-kona-thermal.txt
new file mode 100644
index 000..acca99e
--- /dev/null
+++ b/Documentation/devicetree/bindings/thermal/bcm-kona-thermal.txt
@@ -0,0 +1,18 @@
+* Broadcom Kona Thermal Management Unit
+
+This version is for the BCM281xx family of SoCs.
+
+Required properties:
+- compatible : "brcm,bcm11351-thermal", "brcm,kona-thermal"
+- reg : Address range of the thermal register
+- thermal-name: this entry must be specified and it will be passed into
+thermal_zone_device_register().  This name will also be reported under Hwmon
+sysfs 'name' attribute.
+
+Example:
+   thermal@34008000 {
+   compatible = "brcm,bcm11351-thermal", "brcm,kona-thermal";
+   reg = <0x34008000 0x0024>;
+   thermal-name = "bcm_kona_therm";
+   status = "disabled";
+   };
diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
index dbfc390..7f823f0 100644
--- a/drivers/thermal/Kconfig
+++ b/drivers/thermal/Kconfig
@@ -134,6 +134,16 @@ config KIRKWOOD_THERMAL
  Support for the Kirkwood thermal sensor driver into the Linux thermal
  framework. Only kirkwood 88F6282 and 88F6283 have this sensor.
  
+config BCM_THERMAL

+   tristate "Temperature sensor on Broadcom BCM281xx family of SoCs"
+   depends on ARCH_BCM

Hi Wendy,

I just noticed that, depending on acceptance, this could collide with
Christian's ARCH_BCM->ARCH_BCM_MOBILE rename that is expected to go into
3.13. I'm not sure if this series is targeted for 3.13 (due to the discussion
about Eduardo's subsystem changes that impact it). If it is, you might
want to rebase on Christian's rename series
http://www.spinics.net/lists/arm-kernel/msg274963.html and use
ARCH_BCM_MOBILE here.

-Matt

Hi Matt,

I have uploaded v2 of this thermal driver patch series to make it
compatible with Christian's ARCH_BCM->ARCH_BCM_MOBILE changes.

Thanks for pointing this out!


+   default y
+   help
+ If you say yes here you get support for TMU (Thermal Management
+ Unit) on Broadcom BCM281xx family of SoCs. This provides thermal
+ monitoring of CPU clusters, graphics, and SoC glue, but does not
+ include monitoring of charger temperature.




--
Best regards,
-Wendy




Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file

2013-09-27 Thread Tim Chen
On Fri, 2013-09-27 at 13:38 -0700, Paul E. McKenney wrote:
> On Fri, Sep 27, 2013 at 12:38:53PM -0700, Tim Chen wrote:
> > On Fri, 2013-09-27 at 08:29 -0700, Paul E. McKenney wrote:
> > > On Wed, Sep 25, 2013 at 03:10:49PM -0700, Tim Chen wrote:
> > > > We will need the MCS lock code for doing optimistic spinning for rwsem.
> > > > Extracting the MCS code from mutex.c and put into its own file allow us
> > > > to reuse this code easily for rwsem.
> > > > 
> > > > Signed-off-by: Tim Chen 
> > > > Signed-off-by: Davidlohr Bueso 
> > > > ---
> > > >  include/linux/mcslock.h |   58 +++
> > > >  kernel/mutex.c          |   58 +-
> > > >  2 files changed, 65 insertions(+), 51 deletions(-)
> > > >  create mode 100644 include/linux/mcslock.h
> > > > 
> > > > diff --git a/include/linux/mcslock.h b/include/linux/mcslock.h
> > > > new file mode 100644
> > > > index 000..20fd3f0
> > > > --- /dev/null
> > > > +++ b/include/linux/mcslock.h
> > > > @@ -0,0 +1,58 @@
> > > > +/*
> > > > + * MCS lock defines
> > > > + *
> > > > + * This file contains the main data structure and API definitions of MCS lock.
> > > > + */
> > > > +#ifndef __LINUX_MCSLOCK_H
> > > > +#define __LINUX_MCSLOCK_H
> > > > +
> > > > +struct mcs_spin_node {
> > > > +   struct mcs_spin_node *next;
> > > > +   int   locked;   /* 1 if lock acquired */
> > > > +};
> > > > +
> > > > +/*
> > > > + * We don't inline mcs_spin_lock() so that perf can correctly account for the
> > > > + * time spent in this lock function.
> > > > + */
> > > > +static noinline
> > > > +void mcs_spin_lock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
> > > > +{
> > > > +   struct mcs_spin_node *prev;
> > > > +
> > > > +   /* Init node */
> > > > +   node->locked = 0;
> > > > +   node->next   = NULL;
> > > > +
> > > > +   prev = xchg(lock, node);
> > > > +   if (likely(prev == NULL)) {
> > > > +   /* Lock acquired */
> > > > +   node->locked = 1;
> > > > +   return;
> > > > +   }
> > > > +   ACCESS_ONCE(prev->next) = node;
> > > > +   smp_wmb();
> > 
> > BTW, is the above memory barrier necessary?  It seems like the xchg
> > instruction already provided a memory barrier.
> > 
> > Now if we made the changes that Jason suggested:
> > 
> > 
> > /* Init node */
> > -   node->locked = 0;
> > node->next   = NULL;
> > 
> > prev = xchg(lock, node);
> > if (likely(prev == NULL)) {
> > /* Lock acquired */
> > -   node->locked = 1;
> > return;
> > }
> > +   node->locked = 0;
> > ACCESS_ONCE(prev->next) = node;
> > smp_wmb();
> > 
> > We are probably still okay as other cpus do not read the value of
> > node->locked, which is a local variable.
> 
> I don't immediately see the need for the smp_wmb() in either case.


Thinking a bit more, the following could happen in Jason's 
initial patch proposal.  In this case variable "prev" referenced 
by CPU1 points to "node" referenced by CPU2  

	CPU 1 (calling lock)                CPU 2 (calling unlock)
	ACCESS_ONCE(prev->next) = node
	                                    *next = ACCESS_ONCE(node->next);
	                                    ACCESS_ONCE(next->locked) = 1;
	node->locked = 0;

Then we will be spinning forever on CPU1 as we overwrite the lock passed
from CPU2 before we check it.  The original code, which assigns
"node->locked = 0" before xchg, does not have this issue.
Doing the following change of moving smp_wmb immediately
after node->locked assignment (suggested by Jason)

node->locked = 0;
smp_wmb();
ACCESS_ONCE(prev->next) = node;

could avoid the problem, but will need closer scrutiny to see if
there are other pitfalls if wmb happens before

ACCESS_ONCE(prev->next) = node;


> > 
> > > > +   /* Wait until the lock holder passes the lock down */
> > > > +   while (!ACCESS_ONCE(node->locked))
> > > > +   arch_mutex_cpu_relax();
> 
> However, you do need a full memory barrier here in order to ensure that
> you see the effects of the previous lock holder's critical section.

Is it necessary to add a memory barrier after acquiring
the lock if the previous lock holder executes smp_wmb before passing
the lock?

Thanks.

Tim
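
For readers following the ordering argument, here is a small user-space sketch of an MCS-style queue lock. It is written with C11 atomics instead of the kernel's xchg()/ACCESS_ONCE()/smp_wmb(), so it is an analogue under stated assumptions and not the patch under review; the names struct mcs_node, mcs_lock() and mcs_unlock() are chosen for the example. The node is initialised before it is published through prev->next (a release store), and the spin on node->locked uses acquire loads so that the previous holder's critical section is visible once the lock is handed over.

```c
#include <stdatomic.h>
#include <stddef.h>

struct mcs_node {
	_Atomic(struct mcs_node *) next;
	atomic_int locked;	/* set to 1 by the previous holder */
};

void mcs_lock(_Atomic(struct mcs_node *) *lock, struct mcs_node *node)
{
	struct mcs_node *prev;

	/* Initialise the node while it is still private to this thread. */
	atomic_store_explicit(&node->locked, 0, memory_order_relaxed);
	atomic_store_explicit(&node->next, NULL, memory_order_relaxed);

	/* Swap ourselves in as the new queue tail. */
	prev = atomic_exchange_explicit(lock, node, memory_order_acq_rel);
	if (prev == NULL)
		return;		/* queue was empty: lock acquired */

	/* Publish the node only after its fields are initialised, so the
	 * unlocker's write of locked = 1 cannot be overwritten by us. */
	atomic_store_explicit(&prev->next, node, memory_order_release);

	/* Wait for the hand-off; acquire pairs with the release store in
	 * mcs_unlock(). */
	while (!atomic_load_explicit(&node->locked, memory_order_acquire))
		;
}

void mcs_unlock(_Atomic(struct mcs_node *) *lock, struct mcs_node *node)
{
	struct mcs_node *next =
		atomic_load_explicit(&node->next, memory_order_acquire);

	if (next == NULL) {
		struct mcs_node *expected = node;

		/* No known successor: try to mark the queue empty. */
		if (atomic_compare_exchange_strong_explicit(lock, &expected,
				NULL, memory_order_release,
				memory_order_relaxed))
			return;

		/* A successor is enqueuing; wait until it links itself. */
		while ((next = atomic_load_explicit(&node->next,
				memory_order_acquire)) == NULL)
			;
	}

	/* Pass the lock down to the successor. */
	atomic_store_explicit(&next->locked, 1, memory_order_release);
}
```

The design point the sketch tries to show is the one raised above: the write of locked = 0 has to be ordered before the node becomes reachable through prev->next, and the waiter needs acquire semantics when it observes locked becoming non-zero. Whether the kernel code needs a full barrier at that point, as suggested in the thread, is a separate question that this user-space analogue does not settle.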




Re: [PATCH v5 3/6] x86/mm: Factor out of top-down direct mapping setup

2013-09-27 Thread Toshi Kani
On Wed, 2013-09-25 at 02:29 +0800, Zhang Yanfei wrote:
> From: Tang Chen 
> 
> This patch creates a new function memory_map_top_down to
> factor out of the top-down direct memory mapping pagetable
> setup. This is also a preparation for the following patch,
> which will introduce the bottom-up memory mapping. That said,
> we will put the two ways of pagetable setup into separate
> functions, and choose to use which way in init_mem_mapping,
> which makes the code more clear.
> 
> Signed-off-by: Tang Chen 
> Signed-off-by: Zhang Yanfei 

Acked-by: Toshi Kani 

Thanks,
-Toshi




[PATCH v2 3/3] ARM: bcm281xx: Add thermal driver to device tree.

2013-09-27 Thread Wendy Ng
This patch adds the device tree node for Broadcom bcm281xx SoCs thermal
driver.

Signed-off-by: Wendy Ng 
Reviewed-by: Markus Mayer 
Reviewed-by: Christian Daudt 
---
 arch/arm/boot/dts/bcm11351-brt.dts |    4 +++-
 arch/arm/boot/dts/bcm11351.dtsi    |    6 ++
 arch/arm/boot/dts/bcm28155-ap.dts  |    4 
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/arm/boot/dts/bcm11351-brt.dts b/arch/arm/boot/dts/bcm11351-brt.dts
index 9d36eb4..0771b6b 100644
--- a/arch/arm/boot/dts/bcm11351-brt.dts
+++ b/arch/arm/boot/dts/bcm11351-brt.dts
@@ -43,5 +43,7 @@
status = "okay";
};
 
-
+   thermal@34008000 {
+   status = "okay";
+   };
 };
diff --git a/arch/arm/boot/dts/bcm11351.dtsi b/arch/arm/boot/dts/bcm11351.dtsi
index 05a5aab..aa13353 100644
--- a/arch/arm/boot/dts/bcm11351.dtsi
+++ b/arch/arm/boot/dts/bcm11351.dtsi
@@ -96,4 +96,10 @@
status = "disabled";
};
 
+   thermal@34008000 {
+   compatible = "brcm,bcm11351-thermal", "brcm,kona-thermal";
+   reg = <0x34008000 0x0024>;
+   thermal-name = "bcm_kona_therm";
+   status = "disabled";
+   };
 };
diff --git a/arch/arm/boot/dts/bcm28155-ap.dts b/arch/arm/boot/dts/bcm28155-ap.dts
index 96ae67a..a39aa47 100644
--- a/arch/arm/boot/dts/bcm28155-ap.dts
+++ b/arch/arm/boot/dts/bcm28155-ap.dts
@@ -42,4 +42,8 @@
max-frequency = <4800>;
status = "okay";
};
+
+   thermal@34008000 {
+   status = "okay";
+   };
 };
-- 
1.7.9.5




[PATCH v2 1/3] thermal: bcm281xx: Add thermal driver

2013-09-27 Thread Wendy Ng
This adds the support for reading out temperature from Broadcom bcm281xx
SoCs.

Signed-off-by: Wendy Ng 
Reviewed-by: Markus Mayer 
Reviewed-by: Christian Daudt 
---
 .../bindings/thermal/bcm-kona-thermal.txt  |   18 +++
 drivers/thermal/Kconfig                    |   10 ++
 drivers/thermal/Makefile                   |    1 +
 drivers/thermal/bcm_thermal.c              |  170 
 4 files changed, 199 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/thermal/bcm-kona-thermal.txt
 create mode 100644 drivers/thermal/bcm_thermal.c

diff --git a/Documentation/devicetree/bindings/thermal/bcm-kona-thermal.txt b/Documentation/devicetree/bindings/thermal/bcm-kona-thermal.txt
new file mode 100644
index 000..acca99e
--- /dev/null
+++ b/Documentation/devicetree/bindings/thermal/bcm-kona-thermal.txt
@@ -0,0 +1,18 @@
+* Broadcom Kona Thermal Management Unit
+
+This version is for the BCM281xx family of SoCs.
+
+Required properties:
+- compatible : "brcm,bcm11351-thermal", "brcm,kona-thermal"
+- reg : Address range of the thermal register
+- thermal-name: this entry must be specified and it will be passed into
+thermal_zone_device_register().  This name will also be reported under Hwmon
+sysfs 'name' attribute.
+
+Example:
+   thermal@34008000 {
+   compatible = "brcm,bcm11351-thermal", "brcm,kona-thermal";
+   reg = <0x34008000 0x0024>;
+   thermal-name = "bcm_kona_therm";
+   status = "disabled";
+   };
diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
index dbfc390..6a5341c 100644
--- a/drivers/thermal/Kconfig
+++ b/drivers/thermal/Kconfig
@@ -134,6 +134,16 @@ config KIRKWOOD_THERMAL
  Support for the Kirkwood thermal sensor driver into the Linux thermal
  framework. Only kirkwood 88F6282 and 88F6283 have this sensor.
 
+config BCM_THERMAL
+   tristate "Temperature sensor on Broadcom BCM281xx family of SoCs"
+   depends on ARCH_BCM_MOBILE
+   default y
+   help
+ If you say yes here you get support for TMU (Thermal Management
+ Unit) on Broadcom BCM281xx family of SoCs. This provides thermal
+ monitoring of CPU clusters, graphics, and SoC glue, but does not
+ include monitoring of charger temperature.
+
 config DOVE_THERMAL
tristate "Temperature sensor on Marvell Dove SoCs"
depends on ARCH_DOVE
diff --git a/drivers/thermal/Makefile b/drivers/thermal/Makefile
index 584b363..3ea8c1c 100644
--- a/drivers/thermal/Makefile
+++ b/drivers/thermal/Makefile
@@ -21,6 +21,7 @@ obj-$(CONFIG_SPEAR_THERMAL)   += spear_thermal.o
 obj-$(CONFIG_RCAR_THERMAL) += rcar_thermal.o
 obj-$(CONFIG_KIRKWOOD_THERMAL)  += kirkwood_thermal.o
 obj-y  += samsung/
+obj-$(CONFIG_BCM_THERMAL)   += bcm_thermal.o
 obj-$(CONFIG_DOVE_THERMAL) += dove_thermal.o
 obj-$(CONFIG_DB8500_THERMAL)   += db8500_thermal.o
 obj-$(CONFIG_ARMADA_THERMAL)   += armada_thermal.o
diff --git a/drivers/thermal/bcm_thermal.c b/drivers/thermal/bcm_thermal.c
new file mode 100644
index 000..131d3c4
--- /dev/null
+++ b/drivers/thermal/bcm_thermal.c
@@ -0,0 +1,170 @@
+/*
+ * Copyright 2013 Broadcom Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2,
+ * as published by the Free Software Foundation (the "GPL").
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * A copy of the GPL is available at http://www.broadcom.com/licenses/GPLv2.php,
+ * or by writing to the Free Software Foundation, Inc.,
+ * 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
+ */
+
+/**
+*  Broadcom Thermal Management Unit - bcm_tmu
+*/
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* From TMON Register Database */
+#define TMON_TEMP_VAL_OFFSET   0x001c
+#define TMON_TEMP_VAL_TEMP_VAL_SHIFT   0
+#define TMON_TEMP_VAL_TEMP_VAL_MASK    0x03ff
+
+/* Broadcom Thermal Zone Device Structure */
+struct bcm_thermal_zone_priv {
+   char name[THERMAL_NAME_LENGTH];
+   void __iomem *base;
+};
+
+/* Temperature conversion function for TMON block */
+static long raw_to_mcelsius(u32 raw)
+{
+   /*
+* According to Broadcom internal Analog Module Specification
+* the formula for converting TMON block output to temperature in
+* degree Celsius is:
+*  T = 428 - (0.561 * raw)
+* Note: the valid operating range for the TMON block is -40C to 125C
+*/
+   return 428000 - (561 * (long)raw);
+}
+
+/* Get temperature callback function for thermal zone */
+static int bcm_get_temp(struct thermal_zone_device *thermal,
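
The raw_to_mcelsius() conversion above can be sanity-checked outside the kernel. The following is a minimal user-space sketch, assuming the 10-bit field layout given by the TMON defines in the patch; tmon_field() is a hypothetical helper added for illustration and is not part of the driver.

```c
#include <stdint.h>

/* Field layout copied from the TMON defines in the patch. */
#define TMON_TEMP_VAL_TEMP_VAL_SHIFT	0
#define TMON_TEMP_VAL_TEMP_VAL_MASK	0x03ff

/* Hypothetical helper: extract the temperature field from a register word. */
static uint32_t tmon_field(uint32_t reg)
{
	return (reg & TMON_TEMP_VAL_TEMP_VAL_MASK) >> TMON_TEMP_VAL_TEMP_VAL_SHIFT;
}

/* T = 428 - 0.561 * raw in degrees Celsius, scaled by 1000 to millicelsius. */
static long raw_to_mcelsius(uint32_t raw)
{
	return 428000L - 561L * (long)raw;
}
```

With this formula a raw value of 540 lands near the upper end of the documented range and 834 near the lower end, which is a quick way to check that the sign of the slope is right.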

[PATCH v2 2/3] ARM: bcm281xx: Turn on Thermal and HWMON drivers.

2013-09-27 Thread Wendy Ng
This enables the thermal and HWMON drivers for Broadcom bcm281xx SoCs.

Signed-off-by: Wendy Ng 
Reviewed-by: Markus Mayer 
Reviewed-by: Christian Daudt 
---
 arch/arm/configs/bcm_defconfig |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm/configs/bcm_defconfig b/arch/arm/configs/bcm_defconfig
index 6e49310..1474661 100644
--- a/arch/arm/configs/bcm_defconfig
+++ b/arch/arm/configs/bcm_defconfig
@@ -83,7 +83,8 @@ CONFIG_SERIAL_8250_DW=y
 CONFIG_HW_RANDOM=y
 CONFIG_I2C=y
 CONFIG_I2C_CHARDEV=y
-# CONFIG_HWMON is not set
+CONFIG_THERMAL=y
+CONFIG_HWMON=y
 CONFIG_VIDEO_OUTPUT_CONTROL=y
 CONFIG_FB=y
 CONFIG_BACKLIGHT_LCD_SUPPORT=y
-- 
1.7.9.5




[PATCH v2 0/3] thermal: bcm281xx: Add thermal driver

2013-09-27 Thread Wendy Ng
This patch series adds the support of the thermal driver for Broadcom
bcm281xx family of SoCs.

The change for this version of the patch is to work with the
ARCH_BCM->ARCH_BCM_MOBILE renaming series as shown in the link below:
http://www.spinics.net/lists/arm-kernel/msg274963.html

Wendy Ng (3):
  thermal: bcm281xx: Add thermal driver
  ARM: bcm281xx: Turn on Thermal and HWMON drivers.
  ARM: bcm281xx: Add thermal driver to device tree.

 .../bindings/thermal/bcm-kona-thermal.txt  |   18 +++
 arch/arm/boot/dts/bcm11351-brt.dts         |    4 +-
 arch/arm/boot/dts/bcm11351.dtsi            |    6 +
 arch/arm/boot/dts/bcm28155-ap.dts          |    4 +
 arch/arm/configs/bcm_defconfig             |    3 +-
 drivers/thermal/Kconfig                    |   10 ++
 drivers/thermal/Makefile                   |    1 +
 drivers/thermal/bcm_thermal.c              |  170
 8 files changed, 214 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/thermal/bcm-kona-thermal.txt
 create mode 100644 drivers/thermal/bcm_thermal.c

-- 
1.7.9.5




Re: Please revert 928bea964827d7824b548c1f8e06eccbbc4d0d7d

2013-09-27 Thread Benjamin Herrenschmidt
On Fri, 2013-09-27 at 14:54 -0700, Yinghai Lu wrote:
> On Fri, Sep 27, 2013 at 2:46 PM, Benjamin Herrenschmidt
>  wrote:
> 
> > Wouldn't it be better to simply have pci_enable_device() always set bus
> > master on a bridge? I don't see any case where it makes sense to have
> > an enabled bridge without the master bit set on it...
> 
> Do you mean attached?

So this patch works and fixes the problem. I think it makes the whole
thing more robust and should be applied.

I still don't know why the bridge doesn't get enabled properly without
it yet; I'm still tracking that down (the machine in question takes a LONG
time to reboot :-)

Cheers,
Ben.




Re: [PATCH v5 2/6] memblock: Introduce bottom-up allocation mode

2013-09-27 Thread Toshi Kani
On Wed, 2013-09-25 at 02:27 +0800, Zhang Yanfei wrote:
> From: Tang Chen 
> 
> The Linux kernel cannot migrate pages used by the kernel. As a result, kernel
> pages cannot be hot-removed. So we cannot allocate hotpluggable memory for
> the kernel.
> 
> ACPI SRAT (System Resource Affinity Table) contains the memory hotplug info.
> But before SRAT is parsed, memblock has already started to allocate memory
> for the kernel. So we need to prevent memblock from doing this.
> 
> In a memory hotplug system, any numa node the kernel resides in should
> be unhotpluggable. And for a modern server, each node could have at least
> 16GB memory. So memory around the kernel image is highly likely unhotpluggable.
> 
> So the basic idea is: Allocate memory from the end of the kernel image and
> to the higher memory. Since memory allocation before SRAT is parsed won't
> be too much, it could highly likely be in the same node with kernel image.
> 
> The current memblock can only allocate memory top-down. So this patch introduces
> a new bottom-up allocation mode to allocate memory bottom-up. And later
> when we use this allocation direction to allocate memory, we will limit
> the start address above the kernel.
> 
> Signed-off-by: Tang Chen 
> Signed-off-by: Zhang Yanfei 

 :

>  /**
> + * __memblock_find_range - find free area utility
> + * @start: start of candidate range
> + * @end: end of candidate range, can be %MEMBLOCK_ALLOC_{ANYWHERE|ACCESSIBLE}
> + * @size: size of free area to find
> + * @align: alignment of free area to find
> + * @nid: nid of the free area to find, %MAX_NUMNODES for any node
> + *
> + * Utility called from memblock_find_in_range_node(), find free area bottom-up.
> + *
> + * RETURNS:
> + * Found address on success, 0 on failure.
> + */
> +static phys_addr_t __init_memblock
> +__memblock_find_range(phys_addr_t start, phys_addr_t end, phys_addr_t size,

Similarly, how about naming this function
__memblock_find_range_bottom_up()?


> +   phys_addr_t align, int nid)
> +{
> + phys_addr_t this_start, this_end, cand;
> + u64 i;
> +
> + for_each_free_mem_range(i, nid, &this_start, &this_end, NULL) {
> + this_start = clamp(this_start, start, end);
> + this_end = clamp(this_end, start, end);
> +
> + cand = round_up(this_start, align);
> + if (cand < this_end && this_end - cand >= size)
> + return cand;
> + }
> +
> + return 0;
> +}
> +
> +/**
>   * __memblock_find_range_rev - find free area utility, in reverse order
>   * @start: start of candidate range
>   * @end: end of candidate range, can be %MEMBLOCK_ALLOC_{ANYWHERE|ACCESSIBLE}
> @@ -93,7 +128,7 @@ static long __init_memblock memblock_overlaps_region(struct memblock_type *type,
>   * Utility called from memblock_find_in_range_node(), find free area top-down.
>   *
>   * RETURNS:
> - * Found address on success, %0 on failure.
> + * Found address on success, 0 on failure.
>   */
>  static phys_addr_t __init_memblock
>  __memblock_find_range_rev(phys_addr_t start, phys_addr_t end,
> @@ -127,13 +162,24 @@ __memblock_find_range_rev(phys_addr_t start, phys_addr_t end,
>   *
>   * Find @size free area aligned to @align in the specified range and node.
>   *
> + * When allocation direction is bottom-up, the @start should be greater
> + * than the end of the kernel image. Otherwise, it will be trimmed. The
> + * reason is that we want the bottom-up allocation just near the kernel
> + * image so it is highly likely that the allocated memory and the kernel
> + * will reside in the same node.
> + *
> + * If bottom-up allocation failed, will try to allocate memory top-down.
> + *
>   * RETURNS:
> - * Found address on success, %0 on failure.
> + * Found address on success, 0 on failure.
>   */
>  phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t start,
>   phys_addr_t end, phys_addr_t size,
>   phys_addr_t align, int nid)
>  {
> + int ret;
> + phys_addr_t kernel_end;
> +
>   /* pump up @end */
>   if (end == MEMBLOCK_ALLOC_ACCESSIBLE)
>   end = memblock.current_limit;
> @@ -141,6 +187,37 @@ phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t start,
>   /* avoid allocating the first page */
>   start = max_t(phys_addr_t, start, PAGE_SIZE);
>   end = max(start, end);
> + kernel_end = __pa_symbol(_end);

Please address the issue in __pa_symbol() that Andrew pointed out.

Thanks,
-Toshi
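
The bottom-up search quoted above can be tried in user space. This is a minimal sketch, assuming the free ranges are given as a sorted array; struct free_range, clamp_addr(), round_up_pow2() and find_range_bottom_up() are hypothetical stand-ins for for_each_free_mem_range(), clamp() and round_up(), not kernel API.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;

/* Hypothetical stand-in for one entry yielded by for_each_free_mem_range(). */
struct free_range {
	phys_addr_t start;	/* inclusive */
	phys_addr_t end;	/* exclusive */
};

static phys_addr_t clamp_addr(phys_addr_t v, phys_addr_t lo, phys_addr_t hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/* Round x up to a multiple of align; align must be a power of two. */
static phys_addr_t round_up_pow2(phys_addr_t x, phys_addr_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/* Walk the free ranges from low to high and return the first fit, 0 if none. */
static phys_addr_t find_range_bottom_up(const struct free_range *ranges,
					size_t nr, phys_addr_t start,
					phys_addr_t end, phys_addr_t size,
					phys_addr_t align)
{
	for (size_t i = 0; i < nr; i++) {
		phys_addr_t this_start = clamp_addr(ranges[i].start, start, end);
		phys_addr_t this_end = clamp_addr(ranges[i].end, start, end);
		phys_addr_t cand = round_up_pow2(this_start, align);

		if (cand < this_end && this_end - cand >= size)
			return cand;
	}
	return 0;
}
```

Passing the end of the kernel image as start gives the behaviour the changelog describes: the first fit is the lowest suitable address above the kernel.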



Re: [PATCH v5 1/6] memblock: Factor out of top-down allocation

2013-09-27 Thread Toshi Kani
On Wed, 2013-09-25 at 02:25 +0800, Zhang Yanfei wrote:
> From: Tang Chen 
> 
> This patch creates a new function __memblock_find_range_rev
> to factor out of top-down allocation from memblock_find_in_range_node.
> This is a preparation because we will introduce a new bottom-up
> allocation mode in the following patch.
> 
> Acked-by: Tejun Heo 
> Signed-off-by: Tang Chen 
> Signed-off-by: Zhang Yanfei 

Acked-by: Toshi Kani 

A minor comment below...

> ---
>  mm/memblock.c |   47 ++-
>  1 files changed, 34 insertions(+), 13 deletions(-)
> 
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 0ac412a..3d80c74 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -83,33 +83,25 @@ static long __init_memblock memblock_overlaps_region(struct memblock_type *type,
>  }
>  
>  /**
> - * memblock_find_in_range_node - find free area in given range and node
> + * __memblock_find_range_rev - find free area utility, in reverse order
>   * @start: start of candidate range
>   * @end: end of candidate range, can be %MEMBLOCK_ALLOC_{ANYWHERE|ACCESSIBLE}
>   * @size: size of free area to find
>   * @align: alignment of free area to find
>   * @nid: nid of the free area to find, %MAX_NUMNODES for any node
>   *
> - * Find @size free area aligned to @align in the specified range and node.
> + * Utility called from memblock_find_in_range_node(), find free area top-down.
>   *
>   * RETURNS:
>   * Found address on success, %0 on failure.
>   */
> -phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t start,
> - phys_addr_t end, phys_addr_t size,
> - phys_addr_t align, int nid)
> +static phys_addr_t __init_memblock
> +__memblock_find_range_rev(phys_addr_t start, phys_addr_t end,

Since we are now using the terms "top down" and "bottom up"
consistently, how about naming this function
__memblock_find_range_top_down()?
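
For comparison, a top-down search can be sketched in user space as follows. The loop body of the factored-out function is not quoted in this message, so the logic here is an assumption about how a reverse first-fit search typically works; struct free_range and the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;

/* Hypothetical stand-in for one free memory range, sorted by address. */
struct free_range {
	phys_addr_t start;	/* inclusive */
	phys_addr_t end;	/* exclusive */
};

static phys_addr_t clamp_addr(phys_addr_t v, phys_addr_t lo, phys_addr_t hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/* Round x down to a multiple of align; align must be a power of two. */
static phys_addr_t round_down_pow2(phys_addr_t x, phys_addr_t align)
{
	return x & ~(align - 1);
}

/* Walk the free ranges from high to low and return the highest fit, 0 if none. */
static phys_addr_t find_range_top_down(const struct free_range *ranges,
				       size_t nr, phys_addr_t start,
				       phys_addr_t end, phys_addr_t size,
				       phys_addr_t align)
{
	for (size_t i = nr; i-- > 0; ) {
		phys_addr_t this_start = clamp_addr(ranges[i].start, start, end);
		phys_addr_t this_end = clamp_addr(ranges[i].end, start, end);
		phys_addr_t cand;

		if (this_end < size)
			continue;	/* avoid underflow in the subtraction */

		cand = round_down_pow2(this_end - size, align);
		if (cand >= this_start)
			return cand;
	}
	return 0;
}
```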

Thanks,
-Toshi



Re: Please revert 928bea964827d7824b548c1f8e06eccbbc4d0d7d

2013-09-27 Thread Benjamin Herrenschmidt
On Fri, 2013-09-27 at 10:44 -0700, Yinghai Lu wrote:
> |/* Get and check PCI Express port services */
> |capabilities = get_port_device_capability(dev);
> |if (!capabilities)
> |return 0;
> |
> |pci_set_master(dev);
> 
> so how come that pci_set_master is not called for powerpc?
> 
> Can you send out lspci -vvxxx with current linus-tree and v3.11?

Ah good point. It should have ... except that there are a number of ways
for get_port_device_capability() to fail and that should *not* leave the
bridge without the bus master capability set.

However I don't think that's what's happening here. I'll have to dig
more, will get back to you.

Cheers,
Ben.




Re: [GIT PULL] Devicetree fixes for 3.12

2013-09-27 Thread Rob Herring
On Tue, Sep 24, 2013 at 10:14 PM, Linus Torvalds
 wrote:
> On Tue, Sep 24, 2013 at 7:53 PM, Rob Herring  wrote:
>>
>> are available in the git repository at:
>>
>>   git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git
>
> Hmm. Did you mean for me to pull the "devicetree-fixes" tag?

Yes, please.

> If so, please say so very explicitly. I don't want to have to search
> for these things..
>
> IOW, if that's what you meant, that line should have said
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git tags/devicetree-fixes
>
> instead.

That's strange. The tag description was there, but somehow the tag
name wasn't. Sorry about that.

Rob


Re: [PATCH v2 0/5] mm: migrate zbud pages

2013-09-27 Thread Seth Jennings
On Fri, Sep 27, 2013 at 12:16:37PM +0200, Tomasz Stanislawski wrote:
> On 09/25/2013 11:57 PM, Seth Jennings wrote:
> > On Wed, Sep 25, 2013 at 07:09:50PM +0200, Tomasz Stanislawski wrote:
> >>> I just had an idea this afternoon to potentially kill both these birds
> >>> with one stone: Replace the rbtree in zswap with an address_space.
> >>>
> >>> Each swap type would have its own page_tree to organize the compressed
> >>> objects by type and offset (radix tree is more suited for this anyway)
> >>> and a_ops that could be called by shrink_page_list() (writepage) or the
> >>> migration code (migratepage).
> >>>
> >>> Then zbud pages could be put on the normal LRU list, maybe at the
> >>> beginning of the inactive LRU so they would live for another cycle
> >>> through the list, then be reclaimed in the normal way with the
> >>> mapping->a_ops->writepage() pointing to a zswap_writepage() function
> >>> that would decompress the pages and call __swap_writepage() on them.
> >>>
> >>> This might actually do away with the explicit pool size too as the
> >>> compressed pool pages wouldn't be outside the control of the MM anymore.
> >>>
> >>> I'm just starting to explore this but I think it has promise.
> >>>
> >>> Seth
> >>>
> >>
> >> Hi Seth,
> >> There is a problem with the proposed idea.
> >> The radix tree used by 'struct address_space' is a part of
> >> a bigger data structure.
> >> The radix tree is used to translate an offset to a page.
> >> That is ok for zswap. But struct page has a field named 'index'.
> >> The MM assumes that this index is an offset in radix tree
> >> where one can find the page. A lot is done by MM to sustain
> >> this consistency.
> > 
> > Yes, this is how it is for page cache pages.  However, the MM is able to
> > work differently with anonymous pages.  In the case of an anonymous
> > page, the mapping field points to an anon_vma struct, or, if ksm is
> > enabled and dedup'ing the page, a private ksm tracking structure.  If
> > the anonymous page is fully unmapped and resides only in the swap cache,
> > the page mapping is NULL.  So there is precedent for the fields to mean
> > other things.
> 
> Hi Seth,
> You are right that page->mapping is NULL for pages in swap_cache but
> page_mapping() is not NULL in such a case. The mapping is taken from
> struct address_space swapper_spaces[]. It is still an address space,
> and it should preserve constraints for struct address_space.
> The same happens for page->index and page_index().
> 
> > 
> > The question is how to mark and identify zbud pages among the other page
> > types that will be on the LRU.  There are many ways.  The question is
> > what is the best and most acceptable way.
> > 
> 
> If you consider hacking, I have some ideas on how address_space could be
> utilized for ZBUD.
> One solution would be using tags in a radix tree. Every entry in a radix tree
> can have a few bits assigned to it. Currently 3 bits are supported:
> 
> From include/linux/fs.h
> #define PAGECACHE_TAG_DIRTY      0
> #define PAGECACHE_TAG_WRITEBACK  1
> #define PAGECACHE_TAG_TOWRITE    2
> 
> You could add a new bit or utilize one of the existing ones.
> 
> The other idea is to use a trick from RB trees and scatter-gather lists.
> I mean using the last bits of pointers to keep some metadata.
> Values of 'struct page *' variables are aligned to a pointer alignment, which
> is 4 for 32-bit CPUs and 8 for 64-bit ones (not sure). This means that one
> could use the last bit of the page pointer in a radix tree to track if a swap
> entry refers to a lower or a higher part of a ZBUD page.
> I think it is serious hacking/obfuscation but it may work with the minimal
> amount of changes to MM. Adding only (x&~3) while extracting the page pointer
> is probably enough.
> 
> What do you think about this idea?

I think it is a good one.

I have to say that when I first came up with the idea, I was thinking
the address space would be at the zswap layer and the radix slots would
hold zbud handles, not struct page pointers.

However, as I have discovered today, this is problematic when it comes
to reclaim and migration and serializing access.

I wanted to do as much as possible in the zswap layer since anything
done in the zbud layer would need to be duplicated in any other future
allocator that zswap wanted to support.

Unfortunately, zbud abstracts away the struct page and that visibility
is needed to properly do what we are talking about.

So maybe it is inevitable that this will need to be in the zbud code
with the radix tree slots pointing to struct pages after all.

I like the idea of masking the bit into the struct page pointer to
indicate which buddy maps to the offset.
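
The pointer-tagging trick can be pictured with a small user-space sketch. It assumes the stored pointers are at least 4-byte aligned, so the two low bits are free; struct page_stub, enum buddy and the helper names are hypothetical and only illustrate the (x & ~3) masking mentioned in the quoted text.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct page; the long member gives >= 4-byte alignment. */
struct page_stub {
	long flags;
};

enum buddy {
	BUDDY_FIRST = 0,
	BUDDY_LAST = 1
};

#define ENTRY_TAG_MASK	((uintptr_t)3)

/* Fold the buddy selector into the low bit of the slot value. */
static void *tag_entry(struct page_stub *page, enum buddy which)
{
	return (void *)((uintptr_t)page | (uintptr_t)which);
}

/* Recover the real pointer by clearing the low bits, i.e. (x & ~3). */
static struct page_stub *entry_to_page(void *entry)
{
	return (struct page_stub *)((uintptr_t)entry & ~ENTRY_TAG_MASK);
}

/* Recover which buddy the slot refers to. */
static enum buddy entry_to_buddy(void *entry)
{
	return (enum buddy)((uintptr_t)entry & 1);
}
```

Two slots at different offsets can then hold different entries that resolve to the same page, which is the twist for migration mentioned in this message.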

There is a twist here in that, unlike a normal page cache tree, we can
have two offsets pointing at different buddies in the same frame
which means we'll have to do some custom stuff for migration.

The rabbit hole I was going down today has come to an 

Re: Please revert 928bea964827d7824b548c1f8e06eccbbc4d0d7d

2013-09-27 Thread Benjamin Herrenschmidt
On Fri, 2013-09-27 at 14:54 -0700, Yinghai Lu wrote:
> On Fri, Sep 27, 2013 at 2:46 PM, Benjamin Herrenschmidt
>  wrote:
> 
> > Wouldn't it be better to simply have pci_enable_device() always set bus
> > master on a bridge? I don't see any case where it makes sense to have
> > an enabled bridge without the master bit set on it...
> 
> Do you mean attached?

That's an option. I was thinking making pci_enable_device() itself
enable bus master on a bridge but yes, your approach should work.

I'm digging a bit more to figure out what went wrong in the
pcie port driver since that's interesting in its own right and I'll then
test your patch which I think is a more robust approach.

Cheers,
Ben.




Re: Please revert 928bea964827d7824b548c1f8e06eccbbc4d0d7d

2013-09-27 Thread Yinghai Lu
On Fri, Sep 27, 2013 at 2:46 PM, Benjamin Herrenschmidt
 wrote:

> Wouldn't it be better to simply have pci_enable_device() always set bus
> master on a bridge? I don't see any case where it makes sense to have
> an enabled bridge without the master bit set on it...

Do you mean attached?


pci_set_master_again.patch
Description: Binary data


Re: [Xen-devel] [PATCH] xen/hvc-console: Make it work with HVM guests.

2013-09-27 Thread Julien Grall

On 09/27/2013 10:25 PM, Konrad Rzeszutek Wilk wrote:


@@ -641,7 +641,20 @@ struct console xenboot_console = {

  void xen_raw_console_write(const char *str)
  {
-   dom0_write_console(0, str, strlen(str));
+   ssize_t len = strlen(str);
+   int rc = 0;
+
+   if (xen_domain()) {
+   dom0_write_console(0, str, len);
+   if (rc == -ENOSYS && xen_hvm_domain())
+   goto outb_print;
+
+   } else if (xen_cpuid_base()) {
+   int i;
+outb_print:
+   for (i = 0; i < len; i++)
+   outb(str[i], 0xe9);
+   }
  }


xen_cpuid_base and outb(0xe9) is x86 specific and won't compile on ARM.
You need to add ifdef around.
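
A user-space model of the intended fallback may help here. Note that in the quoted hunk rc is initialised to 0 and the return value of dom0_write_console() is not stored, so the -ENOSYS check cannot fire as written. The sketch below stores the result and treats a missing port writer as "no port I/O on this architecture"; struct raw_console_ops and the stub functions are hypothetical, and in the kernel the port path would presumably sit under an x86-only #ifdef instead.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the hypervisor console and the 0xe9 debug port. */
struct raw_console_ops {
	int (*hv_write)(const char *str, size_t len);	/* < 0 on error */
	void (*port_write)(char c);	/* NULL where no port fallback exists */
	int is_hvm;
};

static int raw_console_write(const struct raw_console_ops *ops, const char *str)
{
	size_t len = strlen(str);
	int rc = ops->hv_write(str, len);	/* keep the result so the check is live */

	if (rc == -ENOSYS && ops->is_hvm && ops->port_write) {
		for (size_t i = 0; i < len; i++)
			ops->port_write(str[i]);
		return (int)len;
	}
	return rc;
}

/* Test stubs: a hypervisor call that always fails and a port that records bytes. */
static char captured[64];
static size_t captured_len;

static int stub_hv_enosys(const char *str, size_t len)
{
	(void)str;
	(void)len;
	return -ENOSYS;
}

static void stub_port_write(char c)
{
	if (captured_len < sizeof(captured))
		captured[captured_len++] = c;
}
```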

--
Julien Grall


Re: [PATCH] rwsem: reduce spinlock contention in wakeup code path

2013-09-27 Thread Tim Chen
On Fri, 2013-09-27 at 12:39 -0700, Davidlohr Bueso wrote:
> On Fri, 2013-09-27 at 12:28 -0700, Linus Torvalds wrote:
> > On Fri, Sep 27, 2013 at 12:00 PM, Waiman Long  wrote:
> > >
> > > On a large NUMA machine, it is entirely possible that a fairly large
> > > number of threads are queuing up in the ticket spinlock queue to do
> > > the wakeup operation. In fact, only one will be needed.  This patch
> > > tries to reduce spinlock contention by doing just that.
> > >
> > > A new wakeup field is added to the rwsem structure. This field is
> > > set on entry to rwsem_wake() and __rwsem_do_wake() to mark that a
> > > thread is pending to do the wakeup call. It is cleared on exit from
> > > those functions.
> > 
> > Ok, this is *much* simpler than adding the new MCS spinlock, so I'm
> > wondering what the performance difference between the two are.
> 
> Both approaches should be complementary. The idea of optimistic spinning
> in rwsems is to avoid putting the writer on the wait queue -
> reducing contention and giving a greater chance for the rwsem
> to get acquired. Waiman's approach is once the blocking actually occurs,
> and at this point I'm not sure how this will affect writer stealing
> logic.
> 

I agree with the view that the two approaches are complementary 
to each other.  They address different bottleneck areas in the
rwsem. Here are the performance numbers for the exim workload
compared to a vanilla kernel.

Waiman's patch:       +2.0%
Alex+Tim's patchset:  +4.8%
Waiman+Alex+Tim:      +5.3%
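
The wakeup flag described in the quoted patch summary can be pictured with C11 atomics. This is only a sketch of the "one waker is enough" idea; the patch itself is not quoted here, so struct sem_stub and the function names are hypothetical, and a real implementation must also ensure that a waiter queued while the flag is set is not missed.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the rwsem with the new field. */
struct sem_stub {
	atomic_bool wakeup;	/* true while some thread is doing the wakeup */
};

static void sem_stub_init(struct sem_stub *sem)
{
	atomic_init(&sem->wakeup, false);
}

/* Returns true for the single thread that should go on to take the lock. */
static bool wake_begin(struct sem_stub *sem)
{
	bool expected = false;

	return atomic_compare_exchange_strong(&sem->wakeup, &expected, true);
}

/* Cleared on exit so that a later wakeup can proceed. */
static void wake_end(struct sem_stub *sem)
{
	atomic_store(&sem->wakeup, false);
}
```

Threads for which wake_begin() returns false skip the spinlock entirely, which is where the contention saving would come from.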

Tim



Re: Please revert 928bea964827d7824b548c1f8e06eccbbc4d0d7d

2013-09-27 Thread Benjamin Herrenschmidt
On Fri, 2013-09-27 at 10:10 -0700, Linus Torvalds wrote:
> > So i would like to use the first way that you suggest : call pci_set_master
> > PCIe port driver.
> 
> So I have to say, that if we can fix this with just adding a single
> new pci_set_master() call, we should do that before we decide to
> revert.
> 
> If other, bigger issues then come up, we can decide to revert. But if
> there's a one-liner fix, let's just do that first, ok?
> 
> Mind sending a patch?

Wouldn't it be better to simply have pci_enable_device() always set bus
master on a bridge? I don't see any case where it makes sense to have
an enabled bridge without the master bit set on it...
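
As a rough model of that suggestion, the sketch below sets the bus master bit whenever a bridge is enabled. It is a user-space illustration only: struct pci_dev_stub and enable_device_stub() are hypothetical, and only the command register bit values are taken from the PCI specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Command register bits as defined by the PCI specification. */
#define PCI_COMMAND_IO		0x1
#define PCI_COMMAND_MEMORY	0x2
#define PCI_COMMAND_MASTER	0x4

/* Hypothetical stand-in for a PCI device. */
struct pci_dev_stub {
	bool is_bridge;
	uint16_t command;
};

/* Enable decoding, and for bridges also enable bus mastering unconditionally. */
static void enable_device_stub(struct pci_dev_stub *dev)
{
	dev->command |= PCI_COMMAND_IO | PCI_COMMAND_MEMORY;
	if (dev->is_bridge)
		dev->command |= PCI_COMMAND_MASTER;
}
```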

Cheers,
Ben.




Re: [PATCH v4] powerpc 8xx: Fixing issue with CONFIG_PIN_TLB

2013-09-27 Thread Scott Wood
On Tue, 2013-09-24 at 10:18 +0200, Christophe Leroy wrote:
> Activating CONFIG_PIN_TLB is supposed to pin the IMMR and the first three
> 8Mbytes pages. But the setting of MD_CTR to a pinnable entry was missing
> before the pinning of the third 8Mb page. As the index is decremented modulo 28
> (MD_RSV4D is set) after every DTLB update, the third 8Mbytes page was
> not pinned.

The examples you showed weren't quite modulo 28, more like "if (x >= 28)
x -= 4".  I'll fix up the changelog on applying, to read something like
"As the index is decremented to a value within the first 28 entries
(MD_RSV4D is set)...".
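
The difference between the two descriptions is easy to see numerically. The sketch below is based only on the wording in this message, not on the hardware manual; both function names are hypothetical.

```c
/* Index wrap as the original changelog words it: modulo 28. */
static unsigned int idx_modulo_28(unsigned int x)
{
	return x % 28;
}

/* Index wrap as described in this reply: values of 28 and above drop by 4. */
static unsigned int idx_as_described(unsigned int x)
{
	return x >= 28 ? x - 4 : x;
}
```

For an index of 30 the first gives 2 and the second gives 26; both land within the first 28 entries, which is all the reworded changelog claims.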

-Scott





Re: About atags_proc buffer size

2013-09-27 Thread Geert Uytterhoeven
On Fri, Sep 27, 2013 at 10:47 PM, Russell King - ARM Linux
 wrote:
> On Fri, Sep 27, 2013 at 10:25:45PM +0200, Vojtech Bocek wrote:
>> I want to ask something about atags_proc.c implementation. Currently,
>> it uses a buffer to temporarily store atags. The buffer size is set to
>> 1.5kb for some reason, but as far as I know, atag list's size is not
>> limited in any way. I've got a device (HTC One) which uses about 12kb
>> of tags, that means it panics during boot if CONFIG_ATAGS_PROC is
>> enabled, because the buffer contains only part of the tag list without
>> an end tag.
>
> The tags are supposed to be a short-lived structure which gets used to
> pass barest minimum of details to the kernel, and the data stored in
> them almost certainly gets overwritten before the kernel's memory
> allocators are up and running.
>
> So, we need to statically allocate some space to save these things -
> it can't be done dynamically.
>
> The problem is this: for the vast majority of platforms, they pass no
> more than 1.5kB (lower case b is *bits* not *bytes*) to the kernel in
> their tagged list.  Having a static allocation of 12k would be wasteful
> for the majority of users.

It's __initdata memory, right?
So it would waste this 12 KiB only temporarily.
Still, it enlarges the kernel image by 12 KiB, as there's no such thing
as __initbss.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


[PATCH] xen/hvc-console: Make it work with HVM guests.

2013-09-27 Thread Konrad Rzeszutek Wilk
Signed-off-by: Konrad Rzeszutek Wilk 
---
 drivers/tty/hvc/hvc_xen.c |   17 +++--
 1 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c
index e61c36c..513a79b 100644
--- a/drivers/tty/hvc/hvc_xen.c
+++ b/drivers/tty/hvc/hvc_xen.c
@@ -183,7 +183,7 @@ static int dom0_write_console(uint32_t vtermno, const char *str, int len)
 {
int rc = HYPERVISOR_console_io(CONSOLEIO_write, len, (char *)str);
if (rc < 0)
-   return 0;
+   return rc;
 
return len;
 }
@@ -641,7 +641,20 @@ struct console xenboot_console = {
 
 void xen_raw_console_write(const char *str)
 {
-   dom0_write_console(0, str, strlen(str));
+   ssize_t len = strlen(str);
+   int rc = 0;
+
+   if (xen_domain()) {
+   dom0_write_console(0, str, len);
+   if (rc == -ENOSYS && xen_hvm_domain())
+   goto outb_print;
+
+   } else if (xen_cpuid_base()) {
+   int i;
+outb_print:
+   for (i = 0; i < len; i++)
+   outb(str[i], 0xe9);
+   }
 }
 
 void xen_raw_printk(const char *fmt, ...)
-- 
1.7.7.6



Re: About atags_proc buffer size

2013-09-27 Thread Vojtech Bocek
Okay then, I suppose there is no nice way around that, or at least none
that I can find. I'll just make that initial buffer 12kb big on my kernel
for that device and be done with it.

Anyway, thanks for the information and help, it is much appreciated.

On 27.9.2013 23:15, Russell King - ARM Linux wrote:

> On Fri, Sep 27, 2013 at 11:09:13PM +0200, Vojtech Bocek wrote:
>> It only needs to survive until init_atags_procfs is called, because it is
>> copied to another buffer for the procfs entry. Can I be sure it survives
>> until then? My guess is that it is likely to survive, but not certain.
>>
>> I suppose it is possible to somehow protect that bit of RAM until it is
>> read by init_atags_procfs, but I wonder if you even want to do that in
>> mainline - if the majority of devices don't use such big tag lists, then
>> it is probably that device vendor's problem. I've met this problem on two
>> devices so far though, both of them Android phones; one is the HTC One
>> (that is the MSM APQ8064 SoC) and I unfortunately can't remember the first
>> one - I dismissed it as the usual Android kernel mess.
> 
> We've been through several early allocators - particularly one which
> allocates from the bottom of memory upwards, which would overwrite the
> ATAGs long before init_atags_procfs gets called.
> 
> If we rely on the behaviour of the current early allocator not to
> touch that, and it changes in the future, that's going to be rather
> too fragile.




Re: About atags_proc buffer size

2013-09-27 Thread Russell King - ARM Linux
On Fri, Sep 27, 2013 at 11:09:13PM +0200, Vojtech Bocek wrote:
> It only needs to survive until init_atags_procfs is called, because it is
> copied to another buffer for the procfs entry. Can I be sure it survives
> until then? My guess is that it is likely to survive, but not certain.
> 
> I suppose it is possible to somehow protect that bit of RAM until it is
> read by init_atags_procfs, but I wonder if you even want to do that in
> mainline - if the majority of devices don't use such big tag lists, then
> it is probably that device vendor's problem. I've met this problem on two
> devices so far though, both of them Android phones; one is the HTC One
> (that is the MSM APQ8064 SoC) and I unfortunately can't remember the first
> one - I dismissed it as the usual Android kernel mess.

We've been through several early allocators - particularly one which
allocates from the bottom of memory upwards, which would overwrite the
ATAGs long before init_atags_procfs gets called.

If we rely on the behaviour of the current early allocator not to
touch that, and it changes in the future, that's going to be rather
too fragile.


Re: Crash of 3.12-rc2 BUG: unable to handle kernel NULL pointer dereference

2013-09-27 Thread Russell King - ARM Linux
On Fri, Sep 27, 2013 at 10:04:44AM -0600, Bjorn Helgaas wrote:
> [+cc Thomas, Russell]

Someone is doing something quite bad in the kernel, and as yet I've not
figured out a way to track it down.

The issue is this: someone is kfree'ing a kobject before its release
function has been called, and the memory is being re-used.  The problem
is that when the last reference has been dropped with the debug enabled,
the kobject is linked into the timer lists for the delayed work.  When
the timer lists get run, they're found to be corrupted.

The obvious solution to this is to move the delayed work out of the
kobject into a separately allocated structure.  That would work if
x86 didn't register kobjects very early in boot, before the memory
allocators were up and running.

Frankly, I've no idea how to solve this.  So I regard x86 as just being
difficult and broken.  :)

If anyone has any ideas, then I'm all ears.
http://www.annhuey.com/ed-pix/fa_i-pix/I%27m-All-Ears.jpg


Re: About atags_proc buffer size

2013-09-27 Thread Vojtech Bocek
It only needs to survive until init_atags_procfs is called, because it is
copied to another buffer for the procfs entry. Can I be sure it survives
until then? My guess is that it is likely to survive, but not certain.

I suppose it is possible to somehow protect that bit of RAM until it is
read by init_atags_procfs, but I wonder if you even want to do that in
mainline - if the majority of devices don't use such big tag lists, then
it is probably that device vendor's problem. I've met this problem on two
devices so far though, both of them Android phones; one is the HTC One
(that is the MSM APQ8064 SoC) and I unfortunately can't remember the first
one - I dismissed it as the usual Android kernel mess.

On 27.9.2013 22:47, Russell King - ARM Linux wrote:

> On Fri, Sep 27, 2013 at 10:25:45PM +0200, Vojtech Bocek wrote:
>> I want to ask something about atags_proc.c implementation. Currently,
>> it uses a buffer to temporarily store atags. The buffer size is set to
>> 1.5kb for some reason, but as far as I know, atag list's size is not
>> limited in any way. I've got a device (HTC One) which uses about 12kb
>> of tags, that means it panics during boot if CONFIG_ATAGS_PROC is
>> enabled, because the buffer contains only part of the tag list without
>> an end tag.
> 
> The tags are supposed to be a short-lived structure which gets used to
> pass the barest minimum of details to the kernel, and the data stored in
> them almost certainly gets overwritten before the kernel's memory
> allocators are up and running.
> 
> So, we need to statically allocate some space to save these things -
> it can't be done dynamically.
> 
> The problem is this: for the vast majority of platforms, they pass no
> more than 1.5kB (lower case b is *bits* not *bytes*) to the kernel in
> their tagged list.  Having a static allocation of 12k would be wasteful
> for the majority of users.
> 
>> I don't know much about the way ARM boot process works, but I tried to
>> store just the pointer to atag list, and it works fine (quick patch
>> attached). Do atags get erased later in boot process on some platforms,
>> is that the reason why buffer had to be used?
> 
> This may appear to work, but check it after you've been running for
> some time and have exercised the memory systems.  You'll probably find
> that your tags have vanished!




Re: [PATCH v2] misc: (at24) move header to linux/platform_data/

2013-09-27 Thread Wolfram Sang
On Fri, Sep 27, 2013 at 03:06:28PM -0400, Vivien Didelot wrote:
> This patch moves the at24.h header from include/linux/i2c to
> include/linux/platform_data and updates existing support accordingly.
> 
> It also fixes the following checkpatch warning:
> 
> WARNING: please, no space before tabs
> #436: FILE: include/linux/platform_data/at24.h:31:
> + * ^Iu8 *mac_addr = ethernet_pdata->mac_addr;$
> 
> Signed-off-by: Vivien Didelot 

Applied to for-next, thanks!





Re: pull request: wireless 2013-09-27

2013-09-27 Thread David Miller
From: "John W. Linville" 
Date: Fri, 27 Sep 2013 14:05:49 -0400

> Please pull this batch of fixes intended for the 3.12 stream!

Pulled, thanks a lot John.



Re: [PATCH 1/2] gpio/omap: maintain GPIO and IRQ usage separately

2013-09-27 Thread Tony Lindgren
* Javier Martinez Canillas  [130924 17:45]:
> The GPIO OMAP controller pins can be used as IRQ and GPIO
> independently, so it is necessary to keep track of GPIO pin and
> IRQ line usage separately to make sure that the bank will
> always be enabled while being used.
> 
> Also move the gpio_is_input() definition in preparation for the
> next patch, which sets up the controller's irq_chip driver when
> a caller requests an interrupt line.
> 
> Signed-off-by: Javier Martinez Canillas 

Thanks, both of these look good to me for fixing
the regression:

Acked-by: Tony Lindgren 


Re: [PATCHv4 02/10] mm: convert mm->nr_ptes to atomic_t

2013-09-27 Thread Dave Hansen
On 09/27/2013 01:46 PM, Cody P Schafer wrote:
> On 09/27/2013 06:16 AM, Kirill A. Shutemov wrote:
>> @@ -339,6 +339,7 @@ struct mm_struct {
>>   pgd_t * pgd;
>>   atomic_t mm_users;/* How many users with user space? */
>>   atomic_t mm_count;/* How many references to "struct
>> mm_struct" (users count as 1) */
>> +atomic_t nr_ptes;/* Page table pages */
>>   int map_count;/* number of VMAs */
...
> 
> Will 32bits always be enough here? Should atomic_long_t be used instead?

There are 48 bits of virtual address space on x86 today.  12 bits of
that is the address inside the page, so we've at *most* 2^36 pages.  2^9
(512) pages are mapped by a pte page, so that means the page tables only
hold 2^27 pte pages in a single process.

We've got 31 bits of usable space in the atomic_t, so that definitely
works _today_.  If the virtual address space ever gets bigger, we might
have problems, though.

In practice, though, we steal a big chunk of that virtual address space
for the kernel, and that doesn't get accounted in mm->nr_ptes, so we've
got a _bit_ more wiggle room than just 4 bits.  Also, anybody that's
mapping >4 petabytes of memory with 4k ptes is just off their rocker.

I'm also not sure what the virtual address limits are for the more
obscure architectures, so I guess it's also possible they'll hit this.
I guess it wouldn't hurt to stick an overflow check in there for VM
debugging purposes.
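
The arithmetic above can be checked with a short stand-alone C sketch. The constants are the x86-64 figures quoted in this message (48-bit virtual addresses, 4 KiB pages, 512 PTEs per page-table page); the helper names are made up for illustration and are not kernel APIs.

```c
#include <assert.h>

/* Figures quoted in the message above; they are x86-64 specific. */
#define VA_BITS            48  /* virtual address bits */
#define PAGE_OFFSET_BITS   12  /* 4 KiB pages */
#define PTES_PER_PAGE_BITS  9  /* 512 PTEs per page-table page */
#define ATOMIC_T_BITS      31  /* non-sign bits of an int-sized atomic_t */

/* Hypothetical helper: log2 of the largest possible number of PTE pages. */
static unsigned int pte_pages_log2(unsigned int va_bits)
{
	assert(va_bits >= PAGE_OFFSET_BITS + PTES_PER_PAGE_BITS);
	return va_bits - PAGE_OFFSET_BITS - PTES_PER_PAGE_BITS;
}

/* Hypothetical helper: spare bits in the counter; negative means it could overflow. */
static int counter_headroom_bits(unsigned int va_bits)
{
	return ATOMIC_T_BITS - (int)pte_pages_log2(va_bits);
}
```

With VA_BITS this yields 2^27 PTE pages and 4 spare bits, matching the numbers above. Filling all 31 bits would take 2^31 PTE pages, which map 2^31 * 2^9 * 2^12 = 2^52 bytes, i.e. the 4 petabytes mentioned.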


Re: About atags_proc buffer size

2013-09-27 Thread Russell King - ARM Linux
On Fri, Sep 27, 2013 at 10:25:45PM +0200, Vojtech Bocek wrote:
> I want to ask something about atags_proc.c implementation. Currently,
> it uses a buffer to temporarily store atags. The buffer size is set to
> 1.5kb for some reason, but as far as I know, atag list's size is not
> limited in any way. I've got a device (HTC One) which uses about 12kb
> of tags, that means it panics during boot if CONFIG_ATAGS_PROC is
> enabled, because the buffer contains only part of the tag list without
> an end tag.

The tags are supposed to be a short-lived structure which gets used to
pass the barest minimum of details to the kernel, and the data stored in
them almost certainly gets overwritten before the kernel's memory
allocators are up and running.

So, we need to statically allocate some space to save these things -
it can't be done dynamically.

The problem is this: for the vast majority of platforms, they pass no
more than 1.5kB (lower case b is *bits* not *bytes*) to the kernel in
their tagged list.  Having a static allocation of 12k would be wasteful
for the majority of users.

> I don't know much about the way ARM boot process works, but I tried to
> store just the pointer to atag list, and it works fine (quick patch
> attached). Do atags get erased later in boot process on some platforms,
> is that the reason why buffer had to be used?

This may appear to work, but check it after you've been running for
some time and have exercised the memory systems.  You'll probably find
that your tags have vanished!
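
To illustrate the failure mode described in the quoted report (a fixed-size copy that ends up without an end tag), here is a minimal user-space sketch of a bounded copy. It is not the atags_proc.c code. It assumes each tag starts with a two-word header whose first word is the tag length in 32-bit words, and that a zero length marks the end of the list; the function name and the idea of refusing an oversized list are illustrative assumptions only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TAG_HEADER_WORDS 2 /* assumed layout: each tag starts with { size, tag } */

/*
 * Hypothetical helper: copy a tag list into a fixed-size buffer only if the
 * whole list, terminator included, fits. Returns the number of bytes copied,
 * or 0 if the list is malformed or too long for the buffer.
 */
static size_t save_tags_bounded(const uint32_t *tags, uint32_t *buf,
				size_t buf_words)
{
	size_t off = 0;

	assert(tags != NULL && buf != NULL);

	for (;;) {
		uint32_t size;

		if (buf_words - off < TAG_HEADER_WORDS)
			return 0;	/* no room left for even a header */

		size = tags[off];
		if (size == 0) {	/* terminator: count its header and stop */
			off += TAG_HEADER_WORDS;
			break;
		}
		if (size < TAG_HEADER_WORDS || size > buf_words - off)
			return 0;	/* malformed tag, or it would overrun buf */
		off += size;
	}

	memcpy(buf, tags, off * sizeof(uint32_t));
	return off * sizeof(uint32_t);
}
```

A caller that gets 0 back knows the saved copy would be unusable and can skip creating the procfs entry instead of walking off the end of the buffer later.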


Re: [RFC] extending splice for copy offloading

2013-09-27 Thread Zach Brown

> > >Sure.  So we'd have:
> > >
> > >- no flag default that forbids knowingly copying with shared references
> > >   so that it will be used by default by people who feel strongly about
> > >   their assumptions about independent write durability.
> > >
> > >- a flag that allows shared references for people who would otherwise
> > >   use the file system shared reference ioctls (ocfs2 reflink, btrfs
> > >   clone) but would like it to also do server-side read/write copies
> > >   over nfs without additional intervention.
> > >
> > >- a flag that requires shared references for callers who don't want
> > >   giant copies to take forever if they aren't instant.  (The qemu guys
> > >   asked for this at Plumbers.)
> 
> Why not implement only the last flag as the first step?  It seems
> like the simplest one.  So I think that would mean:
> 
>   - no worrying about cancelling, etc.
>   - apps should be told to pass the entire range at once (normally
> the whole file).
>   - The NFS server probably shouldn't do the internal copy loop by
> default.
> 
> We can't prevent some storage system from implementing a high-latency
> copy operation, but we can refuse to provide them any help (providing no
> progress reports or easy way to cancel) and then they can deal with the
> complaints from their users.

I can see where you're going with that, yeah.

It'd make less sense as a splice extension, then, perhaps.  It'd be more
like a generic entry point for the existing ioctls.  Maybe even just
defining the semantics of a common ioctl.
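
For reference, the three behaviours listed in the quoted proposal can be modelled as a small decision table. This is only a sketch of the semantics under discussion: the enum values and the function are hypothetical names, not an existing or proposed kernel interface.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three behaviours in the quoted list. */
enum copy_mode {
	COPY_INDEPENDENT,	/* default: never knowingly share blocks */
	COPY_MAY_SHARE,		/* share if the file system can, else copy data */
	COPY_MUST_SHARE,	/* share or fail; never fall back to a slow copy */
};

enum copy_action { ACTION_COPY_DATA, ACTION_SHARE_BLOCKS, ACTION_FAIL };

/* Decide what a copy request would do, given what the file system supports. */
static enum copy_action choose_action(enum copy_mode mode, bool fs_can_share)
{
	switch (mode) {
	case COPY_INDEPENDENT:
		return ACTION_COPY_DATA;
	case COPY_MAY_SHARE:
		return fs_can_share ? ACTION_SHARE_BLOCKS : ACTION_COPY_DATA;
	case COPY_MUST_SHARE:
		return fs_can_share ? ACTION_SHARE_BLOCKS : ACTION_FAIL;
	}
	assert(!"unknown copy mode");
	return ACTION_FAIL;
}
```

Implementing only the last mode first, as suggested in the quote, corresponds to keeping just the COPY_MUST_SHARE row: the call either completes as a shared-reference operation or fails immediately.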

Hmm.

> Also, I don't get the first option above at all.  The argument is that
> it's safer to have more copies?  How much safety does another copy on
> the same disk really give you?  Do systems that do dedup provide
> interfaces to turn it off per-file?

Yeah, got me.  It's certainly nonsense on a lot of FTL logging
implementations (which are making their way into SMR drives in the
future).

> But I understand that Zach's tired of the woodshedding and I could live
> with the above I guess

No, it's fine.  At least people are expressing some interest in the
interface!  That's a marked improvement over the state of things in the
past.

- z


Re: [PATCH] arch: tile: re-use kbasename() helper

2013-09-27 Thread Chris Metcalf
On 9/27/2013 4:26 AM, Andy Shevchenko wrote:
> kbasename() returns the filename part of a pathname.
>
> Signed-off-by: Andy Shevchenko 
> ---
>  arch/tile/kernel/stack.c | 12 +---
>  1 file changed, 5 insertions(+), 7 deletions(-)

Thanks!  Taken into the tile tree.

-- 
Chris Metcalf, Tilera Corp.
http://www.tilera.com



Re: [PATCHv4 02/10] mm: convert mm->nr_ptes to atomic_t

2013-09-27 Thread Cody P Schafer

On 09/27/2013 06:16 AM, Kirill A. Shutemov wrote:

With split page table lock for PMD level we can't hold
mm->page_table_lock while updating nr_ptes.

Let's convert it to atomic_t to avoid races.

---

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 84e0c56e1e..99f19e850d 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -339,6 +339,7 @@ struct mm_struct {
pgd_t * pgd;
atomic_t mm_users;  /* How many users with user space? */
atomic_t mm_count;  /* How many references to "struct mm_struct" (users count as 1) */
+   atomic_t nr_ptes;   /* Page table pages */
int map_count;  /* number of VMAs */

spinlock_t page_table_lock; /* Protects page tables and some counters */
@@ -360,7 +361,6 @@ struct mm_struct {
unsigned long exec_vm;  /* VM_EXEC & ~VM_WRITE */
unsigned long stack_vm; /* VM_GROWSUP/DOWN */
unsigned long def_flags;
-   unsigned long nr_ptes;  /* Page table pages */
unsigned long start_code, end_code, start_data, end_data;
unsigned long start_brk, brk, start_stack;
unsigned long arg_start, arg_end, env_start, env_end;


Will 32bits always be enough here? Should atomic_long_t be used instead?



Re: [PATCH 1/3] spi: Add a spi_w8r16be() helper

2013-09-27 Thread Guenter Roeck
On Fri, Sep 27, 2013 at 08:22:33PM +0100, Mark Brown wrote:
> On Fri, Sep 27, 2013 at 08:46:56PM +0200, Lars-Peter Clausen wrote:
> 
> > According to the documentation of spi_w8r16() it is a feature.
> 
> > * The number is returned in wire-order, which is at least sometimes
> > * big-endian.
> 
> Indeed.  I don't think that's terribly well thought through though,
> especially not now we have annotations for endianness (as you noticed!).
> 
> > There seem to be at least two users though which assume that the result is
> > in native endianness drivers/hwmon/ads7871.c and drivers/mfd/stmpe-spi.c
> 
> Yeah, I saw.  The ads7871 is just going to break when run on the
> opposite endianness to the one it was (hopefully) tested on since it
> doesn't make any effort I saw to cope with endianness.  Looking at the
> history it's not terribly obvious which that was but it'd be surprising
> to see a little endian register...
> 
It might make sense to convert the ads7871 driver to an iio driver;
it doesn't look like the chip is commonly used for hardware monitoring.

At least if someone finds the time to do it ;).
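
As a side note on the endianness problem in the quoted discussion, here is a minimal portable sketch of the difference between decoding a big-endian wire-order value and treating the same two bytes as a native 16-bit word. The helper names are hypothetical and this is plain user-space C, not the SPI core's code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode two wire-order (MSB first) bytes, on any host. */
static uint16_t wire_be16_to_host(const uint8_t wire[2])
{
	assert(wire != NULL);
	return (uint16_t)((wire[0] << 8) | wire[1]);
}

/*
 * What a driver effectively gets when it treats the wire-order bytes as a
 * native word: the result depends on the byte order of the CPU running it.
 */
static uint16_t wire_as_native(const uint8_t wire[2])
{
	uint16_t v;

	assert(wire != NULL);
	memcpy(&v, wire, sizeof(v));
	return v;
}
```

For the bytes 0x12 0x34, the first helper gives 0x1234 everywhere, while the second gives 0x1234 on a big-endian host and 0x3412 on a little-endian one, which is the kind of breakage described for drivers that assume native endianness.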

Guenter


Re: [PATCH] hotplug: Optimize {get,put}_online_cpus()

2013-09-27 Thread Peter Zijlstra
On Fri, Sep 27, 2013 at 08:15:32PM +0200, Oleg Nesterov wrote:
> On 09/26, Peter Zijlstra wrote:
> >
> > But if the readers does see BLOCK it will not be an active reader no
> > more; and thus the writer doesn't need to observe and wait for it.
> 
> I meant they both can block, but please ignore. Today I simply can't
> understand what I was thinking about yesterday.

I think we all know that state all too well ;-)

> I tried hard to find any hole in this version but failed, I believe it
> is correct.

Yay!

> But, could you help me to understand some details?

I'll try, but I'm not too bright atm myself :-)

> > +void __get_online_cpus(void)
> > +{
> > +again:
> > +   /* See __srcu_read_lock() */
> > +   __this_cpu_inc(__cpuhp_refcount);
> > +   smp_mb(); /* A matches B, E */
> > +   __this_cpu_inc(cpuhp_seq);
> > +
> > +   if (unlikely(__cpuhp_state == readers_block)) {
> 
> Note that there is no barrier() after inc(seq) and __cpuhp_state
> check, this inc() can be "postponed" till ...
> 
> > +void __put_online_cpus(void)
> >  {
> > +   /* See __srcu_read_unlock() */
> > +   smp_mb(); /* C matches D */
> 
> ... this mb() in __put_online_cpus().
> 
> And this is fine! The question is, perhaps it would be more "natural"
> and understandable to shift this_cpu_inc(cpuhp_seq) into
> __put_online_cpus().

Possibly; I never got further than that the required order is:

  ref++
  MB
  seq++
  MB
  ref--

It doesn't matter if the seq++ is in the lock or unlock primitive. I
never considered one place more natural than the other.
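
A minimal user-space sketch of that ordering, using C11 atomics, might look like the following. It is not the patch: an array stands in for per-CPU variables, a sequentially consistent fence stands in for smp_mb(), the readers_block state is left out, and the final re-check of the sequence sum is an assumption modelled on the SRCU-style check referred to in this thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_SLOTS 4 /* stand-in for per-CPU storage */

static atomic_uint refcount[NR_SLOTS]; /* models __cpuhp_refcount */
static atomic_uint seq[NR_SLOTS];      /* models cpuhp_seq */

static void full_barrier(void)
{
	atomic_thread_fence(memory_order_seq_cst); /* stand-in for smp_mb() */
}

static void reader_enter(int slot)
{
	assert(slot >= 0 && slot < NR_SLOTS);
	atomic_fetch_add_explicit(&refcount[slot], 1, memory_order_relaxed); /* ref++ */
	full_barrier();                                                      /* MB */
	atomic_fetch_add_explicit(&seq[slot], 1, memory_order_relaxed);      /* seq++ */
}

static void reader_exit(int slot)
{
	assert(slot >= 0 && slot < NR_SLOTS);
	full_barrier();                                                      /* MB */
	atomic_fetch_sub_explicit(&refcount[slot], 1, memory_order_relaxed); /* ref-- */
}

/* Sum one counter over all slots; unsigned wrap-around is intentional. */
static unsigned int sum_slots(atomic_uint *v)
{
	unsigned int total = 0;

	for (int i = 0; i < NR_SLOTS; i++)
		total += atomic_load_explicit(&v[i], memory_order_relaxed);
	return total;
}

/* Writer side: true only if no reader was active across the whole check. */
static bool readers_idle(void)
{
	unsigned int snap = sum_slots(seq);

	full_barrier();
	if (sum_slots(refcount) != 0)
		return false;
	full_barrier();
	return sum_slots(seq) == snap; /* assumed re-check, as in SRCU */
}
```

A reader may enter on one slot and exit on another (modelling migration); the per-slot counts then go "negative" but the wrapped unsigned sum still comes out as zero once every reader has left.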

> We need to ensure 2 things:
> 
> 1. The reader should notice state = BLOCK or the writer should see
>inc(__cpuhp_refcount). This is guaranteed by 2 mb's in
>__get_online_cpus() and in cpu_hotplug_begin().
> 
>We do not care if the writer misses some inc(__cpuhp_refcount)
>in per_cpu_sum(__cpuhp_refcount), that reader(s) should notice
>state = readers_block (and inc(cpuhp_seq) can't help anyway).

Agreed.

> 2. If the writer sees the result of this_cpu_dec(__cpuhp_refcount)
>from __put_online_cpus() (note that the writer can miss the
>corresponding inc() if it was done on another CPU, so this dec()
>can lead to sum() == 0), it should also notice the change in cpuhp_seq.
> 
>Fortunately, this can only happen if the reader migrates, in
>this case schedule() provides a barrier, the writer can't miss
>the change in cpuhp_seq.

Again, agreed; this is also the message of the second comment in
cpuhp_readers_active_check() by Paul.

> IOW. Unless I missed something, cpuhp_seq is actually needed to
> serialize __put_online_cpus()->this_cpu_dec(__cpuhp_refcount) and
> /* D matches C */ in cpuhp_readers_active_check(), and this
> is not immediately clear if you look at __get_online_cpus().
> 
> I do not suggest to change this code, but please tell me if my
> understanding is not correct.

I think you're entirely right.

> > +static bool cpuhp_readers_active_check(void)
> >  {
> > +   unsigned int seq = per_cpu_sum(cpuhp_seq);
> > +
> > +   smp_mb(); /* B matches A */
> > +
> > +   /*
> > +* In other words, if we see __get_online_cpus() cpuhp_seq increment,
> > +* we are guaranteed to also see its __cpuhp_refcount increment.
> > +*/
> >  
> > +   if (per_cpu_sum(__cpuhp_refcount) != 0)
> > +   return false;
> >  
> > +   smp_mb(); /* D matches C */
> 
> It seems that both barriers could be smp_rmb() ? I am not sure the comments
> from srcu_readers_active_idx_check() can explain mb(), note that
> __srcu_read_lock() always succeeds unlike get_online_cpus().

I see what you mean; cpuhp_readers_active_check() is all purely reads;
there are no writes to order.

Paul; is there any argument for the MB here as opposed to RMB; and if
not should we change both these and SRCU?

> >  void cpu_hotplug_done(void)
> >  {
> > +   /* Signal the writer is done, no fast path yet. */
> > +   __cpuhp_state = readers_slow;
> > +   wake_up_all(&cpuhp_readers);
> > +
> > +   /*
> > +* The wait_event()/wake_up_all() prevents the race where the readers
> > +* are delayed between fetching __cpuhp_state and blocking.
> > +*/
> > +
> > +   /* See percpu_up_write(); readers will no longer attempt to block. */
> > +   synchronize_sched();
> > +
> > +   /* Let 'em rip */
> > +   __cpuhp_state = readers_fast;
> > +   current->cpuhp_ref--;
> > +
> > +   /*
> > +* Wait for any pending readers to be running. This ensures readers
> > +* after writer and avoids writers starving readers.
> > +*/
> > +   wait_event(cpuhp_writer, !atomic_read(&cpuhp_waitcount));
> >  }
> 
> OK, to some degree I can understand "avoids writers starving readers"
> part (although the next writer should do synchronize_sched() first),
> but could you explain "ensures readers after writer" ?

Suppose reader A sees state == BLOCK and goes to sleep; our writer B
does cpu_hotplug_done() and wakes all pending readers. If for some
reason A doesn't schedule to inc ref until B again executes

Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file

2013-09-27 Thread Paul E. McKenney
On Fri, Sep 27, 2013 at 12:38:53PM -0700, Tim Chen wrote:
> On Fri, 2013-09-27 at 08:29 -0700, Paul E. McKenney wrote:
> > On Wed, Sep 25, 2013 at 03:10:49PM -0700, Tim Chen wrote:
> > > We will need the MCS lock code for doing optimistic spinning for rwsem.
> > > Extracting the MCS code from mutex.c and putting it into its own file
> > > allows us to reuse this code easily for rwsem.
> > > 
> > > Signed-off-by: Tim Chen 
> > > Signed-off-by: Davidlohr Bueso 
> > > ---
> > >  include/linux/mcslock.h |   58 +++
> > >  kernel/mutex.c  |   58 +-
> > >  2 files changed, 65 insertions(+), 51 deletions(-)
> > >  create mode 100644 include/linux/mcslock.h
> > > 
> > > diff --git a/include/linux/mcslock.h b/include/linux/mcslock.h
> > > new file mode 100644
> > > index 000..20fd3f0
> > > --- /dev/null
> > > +++ b/include/linux/mcslock.h
> > > @@ -0,0 +1,58 @@
> > > +/*
> > > + * MCS lock defines
> > > + *
> > > + * This file contains the main data structure and API definitions of MCS lock.
> > > + */
> > > +#ifndef __LINUX_MCSLOCK_H
> > > +#define __LINUX_MCSLOCK_H
> > > +
> > > +struct mcs_spin_node {
> > > + struct mcs_spin_node *next;
> > > + int   locked;   /* 1 if lock acquired */
> > > +};
> > > +
> > > +/*
> > > + * We don't inline mcs_spin_lock() so that perf can correctly account for the
> > > + * time spent in this lock function.
> > > + */
> > > +static noinline
> > > +void mcs_spin_lock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
> > > +{
> > > + struct mcs_spin_node *prev;
> > > +
> > > + /* Init node */
> > > + node->locked = 0;
> > > + node->next   = NULL;
> > > +
> > > + prev = xchg(lock, node);
> > > + if (likely(prev == NULL)) {
> > > + /* Lock acquired */
> > > + node->locked = 1;
> > > + return;
> > > + }
> > > + ACCESS_ONCE(prev->next) = node;
> > > + smp_wmb();
> 
> BTW, is the above memory barrier necessary?  It seems like the xchg
> instruction already provided a memory barrier.
> 
> Now if we made the changes that Jason suggested:
> 
> 
> /* Init node */
> -   node->locked = 0;
> node->next   = NULL;
> 
> prev = xchg(lock, node);
> if (likely(prev == NULL)) {
> /* Lock acquired */
> -   node->locked = 1;
> return;
> }
> +   node->locked = 0;
> ACCESS_ONCE(prev->next) = node;
> smp_wmb();
> 
> We are probably still okay as other cpus do not read the value of
> node->locked, which is a local variable.

I don't immediately see the need for the smp_wmb() in either case.

> Tim
> 
> > > + /* Wait until the lock holder passes the lock down */
> > > + while (!ACCESS_ONCE(node->locked))
> > > + arch_mutex_cpu_relax();

However, you do need a full memory barrier here in order to ensure that
you see the effects of the previous lock holder's critical section.

Thanx, Paul

> > > +}
> > > +
> > > +static void mcs_spin_unlock(struct mcs_spin_node **lock, struct mcs_spin_node *node)
> > > +{
> > > + struct mcs_spin_node *next = ACCESS_ONCE(node->next);
> > > +
> > > + if (likely(!next)) {
> > > + /*
> > > +  * Release the lock by setting it to NULL
> > > +  */
> > > + if (cmpxchg(lock, node, NULL) == node)
> > > + return;
> > > + /* Wait until the next pointer is set */
> > > + while (!(next = ACCESS_ONCE(node->next)))
> > > + arch_mutex_cpu_relax();
> > > + }
> > > + ACCESS_ONCE(next->locked) = 1;
> > > + smp_wmb();
> > 
> > Shouldn't the memory barrier precede the "ACCESS_ONCE(next->locked) = 1;"?
> > Maybe in an "else" clause of the prior "if" statement, given that the
> > cmpxchg() does it otherwise.
> > 
> > Otherwise, in the case where the "if" condition is false, the critical
> > section could bleed out past the unlock.
> > 
> > Thanx, Paul
> > 
> 
> 
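
To make the barrier placement discussed in this thread concrete, here is a minimal user-space MCS lock sketch written with C11 atomics. It is an illustration, not the kernel patch: it uses acquire/release orderings where the thread talks about smp_wmb() and full barriers (that substitution is mine), and the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct mcs_node {
	_Atomic(struct mcs_node *) next;
	atomic_int locked;	/* 1 once the lock has been handed to this node */
};

typedef _Atomic(struct mcs_node *) mcs_lock_t;

static void mcs_lock(mcs_lock_t *lock, struct mcs_node *node)
{
	struct mcs_node *prev;

	assert(lock != NULL && node != NULL);
	atomic_store_explicit(&node->next, NULL, memory_order_relaxed);
	atomic_store_explicit(&node->locked, 0, memory_order_relaxed);

	/* Fully ordered exchange, playing the role of xchg() in the patch. */
	prev = atomic_exchange_explicit(lock, node, memory_order_seq_cst);
	if (prev == NULL)
		return;		/* uncontended: the lock is ours */

	/* Queue behind the previous tail so its owner can find us. */
	atomic_store_explicit(&prev->next, node, memory_order_release);

	/*
	 * Acquire pairs with the release store in mcs_unlock(), so the
	 * previous holder's critical section is visible once we proceed.
	 */
	while (!atomic_load_explicit(&node->locked, memory_order_acquire))
		;		/* spin */
}

static void mcs_unlock(mcs_lock_t *lock, struct mcs_node *node)
{
	struct mcs_node *next;

	assert(lock != NULL && node != NULL);
	next = atomic_load_explicit(&node->next, memory_order_acquire);
	if (next == NULL) {
		struct mcs_node *expected = node;

		/* No successor visible: try to release by clearing the tail. */
		if (atomic_compare_exchange_strong_explicit(lock, &expected, NULL,
							    memory_order_seq_cst,
							    memory_order_relaxed))
			return;

		/* A successor is enqueueing; wait for it to link itself in. */
		while ((next = atomic_load_explicit(&node->next,
						    memory_order_acquire)) == NULL)
			;	/* spin */
	}

	/*
	 * The release ordering takes effect before the hand-off becomes
	 * visible, so the critical section cannot leak past the unlock.
	 */
	atomic_store_explicit(&next->locked, 1, memory_order_release);
}
```

In this sketch the ordering is attached to the hand-off store itself rather than issued as a separate barrier after it, which is one way of expressing the point that the barrier has to come before the store to next->locked.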



[PATCH] slub: Proper kmemleak tracking if CONFIG_SLUB_DEBUG disabled

2013-09-27 Thread Frank Rowand
From: Roman Bobniev 

When kmemleak checking is enabled and CONFIG_SLUB_DEBUG is
disabled, the kmemleak code for small block allocation is
disabled.  This results in false kmemleak errors when memory
is freed.

Move the kmemleak code for small block allocation out from
under CONFIG_SLUB_DEBUG.

Signed-off-by: Roman Bobniev 
Signed-off-by: Frank Rowand 
---
 mm/slub.c |6   3 + 3 - 0 !
 1 file changed, 3 insertions(+), 3 deletions(-)

Index: b/mm/slub.c
===
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -947,13 +947,10 @@ static inline void slab_post_alloc_hook(
 {
flags &= gfp_allowed_mask;
kmemcheck_slab_alloc(s, flags, object, slab_ksize(s));
-   kmemleak_alloc_recursive(object, s->object_size, 1, s->flags, flags);
 }
 
 static inline void slab_free_hook(struct kmem_cache *s, void *x)
 {
-   kmemleak_free_recursive(x, s->flags);
-
/*
 * Trouble is that we may no longer disable interupts in the fast path
 * So in order to make the debug calls that expect irqs to be
@@ -2418,6 +2415,8 @@ redo:
memset(object, 0, s->object_size);
 
slab_post_alloc_hook(s, gfpflags, object);
+   kmemleak_alloc_recursive(object, s->object_size, 1, s->flags,
+gfpflags & gfp_allowed_mask);
 
return object;
 }
@@ -2614,6 +2613,7 @@ static __always_inline void slab_free(st
struct kmem_cache_cpu *c;
unsigned long tid;
 
+   kmemleak_free_recursive(x, s->flags);
slab_free_hook(s, x);
 
 redo:


Re: [RFC v2] gpio/omap: auto-setup a GPIO when used as an IRQ

2013-09-27 Thread Aaro Koskinen
Hi,

On Fri, Sep 27, 2013 at 09:35:33AM +0200, Javier Martinez Canillas wrote:
> I cc'ed Aaro Koskinen and Paul Walmsley now, who seem to have OMAP1
> platforms to test. Could you please test [1] and [2] on an OMAP1 board?

[...]

> [1]: https://patchwork.kernel.org/patch/2937351/
> [2]: https://patchwork.kernel.org/patch/2937371/

Tested-by: Aaro Koskinen 

I applied these patches on top of 3.12-rc2 and tested them on Nokia
770 (OMAP1, Touchscreen & Retu powerbutton GPIO IRQs) and N800 (OMAP2,
Retu powerbutton). Seems to work fine on both boards.

A.


Re: Fix apparent cut-n-paste mistake in Dell reboot workaround.

2013-09-27 Thread H. Peter Anvin
On 09/27/2013 01:19 PM, Dave Jones wrote:
> 
> Either that or 'bios'.  The question I have is, of those marked 'bios',
> does =pci work too ?  If we knew that was true, I'd probably say yes.
> 

Who knows.  reboot=bios used to only work on 32 bits until very recently.

-hpa




x86: sort reboot DMI quirks by vendor.

2013-09-27 Thread Dave Jones
Grouping them by vendor should make it easier to spot duplicates.

Signed-off-by: Dave Jones 

diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index e643e74..12521b2 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -136,244 +136,255 @@ static int __init set_kbd_reboot(const struct dmi_system_id *d)
  * This is a single dmi_table handling all reboot quirks.
  */
 static struct dmi_system_id __initdata reboot_dmi_table[] = {
-   {   /* Handle problems with rebooting on Dell E520's */
-   .callback = set_bios_reboot,
-   .ident = "Dell E520",
+   /* Acer */
+   {   /* Handle reboot issue on Acer Aspire one */
+   .callback = set_kbd_reboot,
+   .ident = "Acer Aspire One A110",
.matches = {
-   DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
-   DMI_MATCH(DMI_PRODUCT_NAME, "Dell DM061"),
+   DMI_MATCH(DMI_SYS_VENDOR, "Acer"),
+   DMI_MATCH(DMI_PRODUCT_NAME, "AOA110"),
},
},
-   {   /* Handle problems with rebooting on Dell 1300's */
+
+   /* ASUS */
+   {   /* Handle problems with rebooting on ASUS P4S800 */
.callback = set_bios_reboot,
-   .ident = "Dell PowerEdge 1300",
+   .ident = "ASUS P4S800",
.matches = {
-   DMI_MATCH(DMI_SYS_VENDOR, "Dell Computer Corporation"),
-   DMI_MATCH(DMI_PRODUCT_NAME, "PowerEdge 1300/"),
+   DMI_MATCH(DMI_BOARD_VENDOR, "ASUSTeK Computer INC."),
+   DMI_MATCH(DMI_BOARD_NAME, "P4S800"),
},
},
-   {   /* Handle problems with rebooting on Dell 300's */
-   .callback = set_bios_reboot,
-   .ident = "Dell PowerEdge 300",
+
+   /* Apple */
+   {   /* Handle problems with rebooting on Apple MacBook5 */
+   .callback = set_pci_reboot,
+   .ident = "Apple MacBook5",
.matches = {
-   DMI_MATCH(DMI_SYS_VENDOR, "Dell Computer Corporation"),
-   DMI_MATCH(DMI_PRODUCT_NAME, "PowerEdge 300/"),
+   DMI_MATCH(DMI_SYS_VENDOR, "Apple Inc."),
+   DMI_MATCH(DMI_PRODUCT_NAME, "MacBook5"),
},
},
-   {   /* Handle problems with rebooting on Dell Optiplex 745's SFF */
-   .callback = set_bios_reboot,
-   .ident = "Dell OptiPlex 745",
+   {   /* Handle problems with rebooting on Apple MacBookPro5 */
+   .callback = set_pci_reboot,
+   .ident = "Apple MacBookPro5",
.matches = {
-   DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
-   DMI_MATCH(DMI_PRODUCT_NAME, "OptiPlex 745"),
+   DMI_MATCH(DMI_SYS_VENDOR, "Apple Inc."),
+   DMI_MATCH(DMI_PRODUCT_NAME, "MacBookPro5"),
},
},
-   {   /* Handle problems with rebooting on Dell Optiplex 745's DFF */
-   .callback = set_bios_reboot,
-   .ident = "Dell OptiPlex 745",
+   {   /* Handle problems with rebooting on Apple Macmini3,1 */
+   .callback = set_pci_reboot,
+   .ident = "Apple Macmini3,1",
.matches = {
-   DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
-   DMI_MATCH(DMI_PRODUCT_NAME, "OptiPlex 745"),
-   DMI_MATCH(DMI_BOARD_NAME, "0MM599"),
+   DMI_MATCH(DMI_SYS_VENDOR, "Apple Inc."),
+   DMI_MATCH(DMI_PRODUCT_NAME, "Macmini3,1"),
},
},
-   {   /* Handle problems with rebooting on Dell Optiplex 745 with 0KW626 */
-   .callback = set_bios_reboot,
-   .ident = "Dell OptiPlex 745",
+   {   /* Handle problems with rebooting on the iMac9,1. */
+   .callback = set_pci_reboot,
+   .ident = "Apple iMac9,1",
.matches = {
-   DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
-   DMI_MATCH(DMI_PRODUCT_NAME, "OptiPlex 745"),
-   DMI_MATCH(DMI_BOARD_NAME, "0KW626"),
+   DMI_MATCH(DMI_SYS_VENDOR, "Apple Inc."),
+   DMI_MATCH(DMI_PRODUCT_NAME, "iMac9,1"),
},
},
-   {   /* Handle problems with rebooting on Dell Optiplex 330 with 0KP561 */
+
+   /* Dell */
+   {   /* Handle problems with rebooting on Dell DXP061 */
.callback = set_bios_reboot,
-   .ident = "Dell OptiPlex 330",
+   .ident = "Dell DXP061",
.matches = {
DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
-   DMI_MATCH(DMI_PRODUCT_NAME, "OptiPlex 330"),
- 

About atags_proc buffer size

2013-09-27 Thread Vojtech Bocek
Hello,
I want to ask something about the atags_proc.c implementation. Currently,
it uses a buffer to temporarily store the atags. The buffer size is set to
1.5 kB (BOOT_PARAMS_SIZE is 1536 bytes) for some reason, but as far as I
know, the size of the atag list is not limited in any way. I've got a
device (HTC One) which uses about 12 kB of tags, so it panics during boot
if CONFIG_ATAGS_PROC is enabled, because the buffer contains only part of
the tag list, without an end tag.
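
To illustrate why a truncated copy is a problem, here is a minimal
userspace sketch (not the kernel code) of walking a tag list with a bound.
It assumes the usual ATAG layout, where each tag starts with its size in
32-bit words followed by a tag ID, and the list ends with a tag whose size
is 0. The name atags_size() and the word-array representation are
hypothetical, chosen only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the size in bytes of a tag list, including
 * the 2-word terminating header, or 0 if no terminator is found within
 * the first nwords 32-bit words (i.e. the copy was truncated).
 *
 * Layout assumed per tag: words[i] = size in words, words[i + 1] = tag ID.
 */
static size_t atags_size(const uint32_t *words, size_t nwords)
{
    size_t i = 0;

    while (i + 2 <= nwords) {
        uint32_t size = words[i];

        if (size == 0)              /* terminating tag reached */
            return (i + 2) * sizeof(uint32_t);
        if (size < 2)               /* smaller than its own header */
            return 0;
        if (size > nwords - i)      /* tag runs past the end of the buffer */
            return 0;
        i += size;
    }
    return 0;                       /* ran out of buffer before the end tag */
}
```

With a bound like this, a list that does not fit in the buffer is reported
as an error instead of being walked past the end of the copy.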

Why is only 1.5kb used? Is it some specific size the device should not
exceed?

I don't know much about how the ARM boot process works, but I tried
storing just the pointer to the atag list, and it works fine (quick patch
attached). Do the atags get erased later in the boot process on some
platforms? Is that the reason why a buffer had to be used?
This requires modifications in kexec-tools, because it uses that 1.5 kB
buffer, too. Again, here's my version of such a modification [1]. If this
is okay, I can create a proper patch and submit it.

Yours,
Vojtech Bocek

[1]: https://github.com/Tasssadar/kexec-tools/commit/c6844e1ddb13a6b60cfefcb01c3843da97d6174c

---

diff --git a/arch/arm/kernel/atags_proc.c b/arch/arm/kernel/atags_proc.c
index c7ff807..12cc483 100644
--- a/arch/arm/kernel/atags_proc.c
+++ b/arch/arm/kernel/atags_proc.c
@@ -21,12 +21,11 @@ static const struct file_operations atags_fops = {
.llseek = default_llseek,
 };
 
-#define BOOT_PARAMS_SIZE 1536
-static char __initdata atags_copy[BOOT_PARAMS_SIZE];
+static const struct tag* __initdata atags_copy;
 
 void __init save_atags(const struct tag *tags)
 {
-   memcpy(atags_copy, tags, sizeof(atags_copy));
+   atags_copy = tags;
 }
 
 static int __init init_atags_procfs(void)
@@ -40,7 +39,7 @@ static int __init init_atags_procfs(void)
struct buffer *b;
size_t size;
 
-   if (tag->hdr.tag != ATAG_CORE) {
+   if (!atags_copy || tag->hdr.tag != ATAG_CORE) {
printk(KERN_INFO "No ATAGs?");
return -EINVAL;
}
@@ -49,7 +48,7 @@ static int __init init_atags_procfs(void)
;
 
/* include the terminating ATAG_NONE */
-   size = (char *)tag - atags_copy + sizeof(struct tag_header);
+   size = (char *)tag - (char *)atags_copy + sizeof(struct tag_header);
 
WARN_ON(tag->hdr.tag != ATAG_NONE);
 


[RFC][PATCH v2] efivars,efi-pstore: Hold off deletion of sysfs entry until the scan is completed

2013-09-27 Thread Seiji Aguchi
Changes from v1
 - Rebase to 3.12-rc2

Currently, when mounting the pstore file system, the read callback of the
efi_pstore driver runs multiple times, as below.

- In the first read callback, scan efivar_sysfs_list from the head and pass
  the kmsg buffer of an entry to the upper pstore layer.
- In the second read callback, rescan efivar_sysfs_list from that entry and
  pass another kmsg buffer to it.
- Repeat the scan and pass until the end of efivar_sysfs_list.

In this process, an entry is read across multiple read function calls.
To avoid a race between the read and erasure, the whole process above is
protected by a spinlock, taken in open() and released in close().

At the same time, kmemdup() is called during this process to pass the
buffer to the pstore filesystem, and that causes the lockdep warning
shown below.

To make the read callback runnable without taking the spinlock, this patch
holds off the deletion of a sysfs entry if it happens while the entry is
being scanned via efi_pstore, and deletes it after the scan is completed.

To implement this, the patch introduces two flags, scanning and deleting,
in efivar_entry.
Also, __efivar_entry_get() is removed because it was used only in efi_pstore.
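
The deferred-deletion idea can be sketched in plain C as follows. This is
only a single-threaded userspace illustration under my own assumptions, not
the code from this patch: struct entry, entry_add(), entry_delete(),
scan_begin() and scan_end() are hypothetical names, and in a real driver
the flag updates would be done under the list lock while the slow read
itself runs with the lock dropped.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical list entry carrying the two flags described above. */
struct entry {
    struct entry *next;
    int id;
    bool scanning;   /* a reader is currently using this entry */
    bool deleting;   /* deletion was requested while scanning */
};

struct entry_list {
    struct entry *head;
};

/* Allocate an entry and push it at the head; returns NULL on failure. */
static struct entry *entry_add(struct entry_list *l, int id)
{
    struct entry *e = calloc(1, sizeof(*e));

    if (!e)
        return NULL;
    e->id = id;
    e->next = l->head;
    l->head = e;
    return e;
}

/* Remove e from the list if it is present; does not free it. */
static void list_unlink(struct entry_list *l, struct entry *e)
{
    struct entry **pp = &l->head;

    while (*pp && *pp != e)
        pp = &(*pp)->next;
    if (*pp)
        *pp = e->next;
}

/* Returns true if deleted now, false if deletion was deferred. */
static bool entry_delete(struct entry_list *l, struct entry *e)
{
    if (e->scanning) {
        e->deleting = true;      /* the scanner will finish the job */
        return false;
    }
    list_unlink(l, e);
    free(e);
    return true;
}

static void scan_begin(struct entry *e)
{
    e->scanning = true;
}

/* Returns true if a deferred deletion was carried out. */
static bool scan_end(struct entry_list *l, struct entry *e)
{
    e->scanning = false;
    if (!e->deleting)
        return false;
    list_unlink(l, e);
    free(e);
    return true;
}
```

The point of the two flags is that the reader never has an entry freed
underneath it, yet it does not need to hold a spinlock across an
allocation.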

[1.143710] [ cut here ]
[1.144058] WARNING: CPU: 1 PID: 1 at kernel/lockdep.c:2740 lockdep_trace_alloc+0x104/0x110()
[1.144058] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags))
[1.144058] Modules linked in:

[1.144058] CPU: 1 PID: 1 Comm: systemd Not tainted 3.11.0-rc5 #2
[1.144058]  0009 8800797e9ae0 816614a5 8800797e9b28
[1.144058]  8800797e9b18 8105510d 0080 0046
[1.144058]  00d0 03af 81ccd0c0 8800797e9b78
[1.144058] Call Trace:
[1.144058]  [] dump_stack+0x54/0x74
[1.144058]  [] warn_slowpath_common+0x7d/0xa0
[1.144058]  [] warn_slowpath_fmt+0x4c/0x50
[1.144058]  [] ? vsscanf+0x57f/0x7b0
[1.144058]  [] lockdep_trace_alloc+0x104/0x110
[1.144058]  [] __kmalloc_track_caller+0x50/0x280
[1.144058]  [] ? efi_pstore_read_func.part.1+0x12b/0x170
[1.144058]  [] kmemdup+0x20/0x50
[1.144058]  [] efi_pstore_read_func.part.1+0x12b/0x170
[1.144058]  [] ? efi_pstore_read_func.part.1+0x170/0x170
[1.144058]  [] efi_pstore_read_func+0xb4/0xe0
[1.144058]  [] __efivar_entry_iter+0xfb/0x120
[1.144058]  [] efi_pstore_read+0x3f/0x50
[1.144058]  [] pstore_get_records+0x9a/0x150
[1.158207]  [] ? selinux_d_instantiate+0x1c/0x20
[1.158207]  [] ? parse_options+0x80/0x80
[1.158207]  [] pstore_fill_super+0xa5/0xc0
[1.158207]  [] mount_single+0xa2/0xd0
[1.158207]  [] pstore_mount+0x18/0x20
[1.158207]  [] mount_fs+0x39/0x1b0
[1.158207]  [] ? __alloc_percpu+0x10/0x20
[1.158207]  [] vfs_kern_mount+0x63/0xf0
[1.158207]  [] do_mount+0x23e/0xa20
[1.158207]  [] ? strndup_user+0x4b/0xf0
[1.158207]  [] SyS_mount+0x83/0xc0
[1.158207]  [] system_call_fastpath+0x16/0x1b
[1.158207] ---[ end trace 61981bc62de9f6f4 ]---

Signed-off-by: Seiji Aguchi 
---
 drivers/firmware/efi/efi-pstore.c | 145 +++---
 drivers/firmware/efi/efivars.c    |   3 +-
 drivers/firmware/efi/vars.c       |  39 +++---
 include/linux/efi.h               |   4 +-
 4 files changed, 151 insertions(+), 40 deletions(-)

diff --git a/drivers/firmware/efi/efi-pstore.c b/drivers/firmware/efi/efi-pstore.c
index 5002d50..53001a5 100644
--- a/drivers/firmware/efi/efi-pstore.c
+++ b/drivers/firmware/efi/efi-pstore.c
@@ -18,14 +18,12 @@ module_param_named(pstore_disable, efivars_pstore_disable, bool, 0644);
 
 static int efi_pstore_open(struct pstore_info *psi)
 {
-   efivar_entry_iter_begin();
psi->data = NULL;
return 0;
 }
 
 static int efi_pstore_close(struct pstore_info *psi)
 {
-   efivar_entry_iter_end();
psi->data = NULL;
return 0;
 }
@@ -39,6 +37,23 @@ struct pstore_read_data {
char **buf;
 };
 
+/**
+ * efi_pstore_read_func
+ * @entry: reading entry
+ * @data:  data of the entry
+ *
+ * This function runs in non-atomic context.
+ *
+ * It also returns the size of the NVRAM entry logged via efi_pstore_write().
+ * pstore behaves in accordance with the returned value, as below.
+ *
+ * size > 0: Got data of an entry logged via efi_pstore_write() successfully,
+ *   and pstore filesystem will continue reading subsequent entries.
+ * size == 0: Entry was not logged via efi_pstore_write(),
+ *and efi_pstore driver will continue reading subsequent entries.
+ * size < 0: Failed to get data of entry logging via efi_pstore_write(),
+ *   and pstore will stop reading entry.
+ */
 static int efi_pstore_read_func(struct efivar_entry *entry, void *data)
 {
efi_guid_t vendor = LINUX_EFI_CRASH_GUID;
@@ -88,8 +103,9 @@ static int efi_pstore_read_func(struct efivar_entry *entry, void *data)
return 0;
 
entry->var.DataSize = 1024;
-   __efivar_entry_get(entry, 

Re: Fix apparent cut-n-paste mistake in Dell reboot workaround.

2013-09-27 Thread Dave Jones
On Fri, Sep 27, 2013 at 03:16:49PM -0500, H. Peter Anvin wrote:
 > On 09/24/2013 07:13 PM, Dave Jones wrote:
 > > This seems to have been copied from the Optiplex 990 entry above, but someone
 > > forgot to change the ident text.
 > > 
 > > Signed-off-by: Dave Jones 
 > > 
 > > diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
 > > index f6835d8..ea2aaca 100644
 > > --- a/arch/x86/kernel/reboot.c
 > > +++ b/arch/x86/kernel/reboot.c
 > > @@ -360,7 +360,7 @@ static struct dmi_system_id __initdata reboot_dmi_table[] = {
 > >},
 > >{   /* Handle problems with rebooting on the Precision M6600. */
 > >.callback = set_pci_reboot,
 > > -  .ident = "Dell OptiPlex 990",
 > > +  .ident = "Dell Precision M6600",
 > >.matches = {
 > >DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
 > >DMI_MATCH(DMI_PRODUCT_NAME, "Precision M6600"),
 > > 
 > 
 > It really is starting to feel like *ALL* Dell machines need reboot=pci?

Either that or 'bios'.  The question I have is, of those marked 'bios', does
=pci work too?  If we knew that was true, I'd probably say yes.

Dave



Re: [PATCH] kernel: replace strict_strto*() with kstrto*()

2013-09-27 Thread Andrew Morton
On Fri, 27 Sep 2013 19:53:53 +0200 Jean Delvare  wrote:

> Andrew,
> 
> On Fri, 27 Sep 2013 09:50:39 -0600, Bjorn Helgaas wrote:
> > There's some indication that this change might have broken handling of
> > signed types.  See
> > https://lists.ozlabs.org/pipermail/linuxppc-dev/2013-September/111758.html
> > and https://bugzilla.kernel.org/show_bug.cgi?id=61811.
> 
> It seems this is hurting more users than I would have expected, and
> people are spending significant amounts of time to figure out what the
> root cause to their problem is. May I suggest that my fix should find
> its way to Linus' tree rather sooner than later?
> 

Done.  I had to send that one by hand as I'm not at my desk...


Re: Fix apparent cut-n-paste mistake in Dell reboot workaround.

2013-09-27 Thread H. Peter Anvin
On 09/24/2013 07:13 PM, Dave Jones wrote:
> This seems to have been copied from the Optiplex 990 entry above, but someone
> forgot to change the ident text.
> 
> Signed-off-by: Dave Jones 
> 
> diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
> index f6835d8..ea2aaca 100644
> --- a/arch/x86/kernel/reboot.c
> +++ b/arch/x86/kernel/reboot.c
> @@ -360,7 +360,7 @@ static struct dmi_system_id __initdata reboot_dmi_table[] = {
>   },
>   {   /* Handle problems with rebooting on the Precision M6600. */
>   .callback = set_pci_reboot,
> - .ident = "Dell OptiPlex 990",
> + .ident = "Dell Precision M6600",
>   .matches = {
>   DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
>   DMI_MATCH(DMI_PRODUCT_NAME, "Precision M6600"),
> 

It really is starting to feel like *ALL* Dell machines need reboot=pci?

-hpa



Re: [PATCH v6 5/6] MCS Lock: Restructure the MCS lock defines and locking code into its own file

2013-09-27 Thread Jason Low
On Fri, Sep 27, 2013 at 12:38 PM, Tim Chen  wrote:

> BTW, is the above memory barrier necessary?  It seems like the xchg
> instruction already provided a memory barrier.
>
> Now if we made the changes that Jason suggested:
>
>
> /* Init node */
> -   node->locked = 0;
> node->next   = NULL;
>
> prev = xchg(lock, node);
> if (likely(prev == NULL)) {
> /* Lock acquired */
> -   node->locked = 1;
> return;
> }
> +   node->locked = 0;
> ACCESS_ONCE(prev->next) = node;
> smp_wmb();
>
> We are probably still okay as other cpus do not read the value of
> node->locked, which is a local variable.

Similarly, I was wondering if we should also move smp_wmb() so that it
is before the ACCESS_ONCE(prev->next) = node and after the
node->locked = 0. Would we want to guarantee that the node->locked
gets set before it is added to the linked list where a previous thread
calling mcs_spin_unlock() would potentially modify node->locked?
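
To make the ordering question concrete, here is a rough userspace sketch of
an MCS-style lock written with C11 atomics rather than the kernel
primitives from the patch. It is my own illustration, not the code under
review: the names mcs_lock(), mcs_unlock() and struct mcs_node are
hypothetical, and a release store stands in for "barrier, then publish the
node". The initialisation of node->locked is placed before the store that
makes the node reachable from prev->next, which is the ordering being asked
about.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct mcs_node {
    _Atomic(struct mcs_node *) next;
    atomic_bool locked;
};

typedef _Atomic(struct mcs_node *) mcs_lock_t;

static void mcs_lock(mcs_lock_t *lock, struct mcs_node *node)
{
    struct mcs_node *prev;

    atomic_store_explicit(&node->next, NULL, memory_order_relaxed);

    prev = atomic_exchange_explicit(lock, node, memory_order_acq_rel);
    if (prev == NULL)
        return;                     /* uncontended: lock acquired */

    /* Initialise the flag before the node becomes visible to prev. */
    atomic_store_explicit(&node->locked, false, memory_order_relaxed);

    /*
     * Release store: the write to node->locked above is ordered before
     * the previous owner can find this node through prev->next.
     */
    atomic_store_explicit(&prev->next, node, memory_order_release);

    while (!atomic_load_explicit(&node->locked, memory_order_acquire))
        ;                           /* spin until the lock is handed over */
}

static void mcs_unlock(mcs_lock_t *lock, struct mcs_node *node)
{
    struct mcs_node *next =
        atomic_load_explicit(&node->next, memory_order_acquire);

    if (next == NULL) {
        struct mcs_node *expected = node;

        /* No successor yet: try to mark the lock as free. */
        if (atomic_compare_exchange_strong_explicit(lock, &expected, NULL,
                                                    memory_order_acq_rel,
                                                    memory_order_acquire))
            return;

        /* A successor is enqueueing; wait until it links itself in. */
        while ((next = atomic_load_explicit(&node->next,
                                            memory_order_acquire)) == NULL)
            ;
    }

    /* Hand the lock to the successor. */
    atomic_store_explicit(&next->locked, true, memory_order_release);
}
```

In this sketch the unlocker only writes next->locked after an acquire load
of node->next, so its store of true cannot be overwritten by the waiter's
late initialisation to false.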


Re: [PATCH] tile: use a more conservative __my_cpu_offset in CONFIG_PREEMPT

2013-09-27 Thread Chris Metcalf
On 9/26/2013 1:57 PM, Will Deacon wrote:
> Hi Chris,
>
> On Thu, Sep 26, 2013 at 06:24:53PM +0100, Chris Metcalf wrote:
>> [...]
>> +static inline unsigned long __my_cpu_offset(void)
>> +{
>> +unsigned long tp;
>> +register unsigned long *sp asm("sp");
>> +asm("move %0, tp" : "=r" (tp) : "m" (*sp));
>> +return tp;
>> +}
> Hehe, nice to see this hack working out for you too. One thing to check is
> whether you have any funky addressing modes (things like writeback or
> post-increment), since the "m" constraint can bite you if you don't actually
> use it in the asm.

Well, we do have post increments, though I don't see why this is a problem here.
We define a target specific constraint "U" that excludes post-increments, but
again I don't see why "m" would cause trouble here.  What was your experience?

-- 
Chris Metcalf, Tilera Corp.
http://www.tilera.com



[PATCH] clocksource: arm_arch_timer: Use clocksource for suspend timekeeping

2013-09-27 Thread Stephen Boyd
The ARM architected timers keep counting during suspend so we can
mark this clocksource with the CLOCK_SOURCE_SUSPEND_NONSTOP flag.
This flag will indicate that this clocksource can be used for
calculating suspend time and injecting sleep time into the
timekeeping core. This should be more accurate than using an
external RTC or architecture specific persistent clock.
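
As a rough illustration of the arithmetic involved (a sketch under my own
assumptions, not the timekeeping core's code), the time spent in suspend
can be derived from two reads of a counter that keeps running, using the
usual mult/shift scaling. The helper name suspend_ns() and the parameter
values in the usage note are hypothetical; the 56-bit mask matches the
CLOCKSOURCE_MASK(56) used by this clocksource.

```c
#include <stdint.h>

/* 56-bit counter mask, as in CLOCKSOURCE_MASK(56). */
#define COUNTER_MASK_56 ((UINT64_C(1) << 56) - 1)

/*
 * Hypothetical helper: nanoseconds elapsed between two counter reads.
 * Masking the difference handles a single wrap of the 56-bit counter.
 * Note: delta * mult can overflow 64 bits for very long intervals; this
 * sketch ignores that case.
 */
static uint64_t suspend_ns(uint64_t before, uint64_t after,
                           uint32_t mult, uint32_t shift)
{
    uint64_t delta = (after - before) & COUNTER_MASK_56;

    return (delta * mult) >> shift;
}
```

For example, with a hypothetical 1 MHz counter (1000 ns per cycle),
mult = 1000 << 10 and shift = 10 turn a delta of 1000 cycles into
1000000 ns.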

Cc: Mark Rutland 
Signed-off-by: Stephen Boyd 
---
 drivers/clocksource/arm_arch_timer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index fbd9ccd..ce98d5e 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -389,7 +389,7 @@ static struct clocksource clocksource_counter = {
.rating = 400,
.read   = arch_counter_read,
.mask   = CLOCKSOURCE_MASK(56),
-   .flags  = CLOCK_SOURCE_IS_CONTINUOUS,
+   .flags  = CLOCK_SOURCE_IS_CONTINUOUS | CLOCK_SOURCE_SUSPEND_NONSTOP,
 };
 
 static struct cyclecounter cyclecounter = {
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation


