date:20140527

[RFC] Bluetooth: Keep master role when SCO or eSCO is active

2014-05-27 Thread Kiran Kumar Raparthy

From: "hyungseoung.yoo" 

Preserve the master role when SCO or eSCO is active,
as this improves compatibility with many
headset and chipset combinations.

This is one of a number of patches from the Android AOSP
common.git tree, which is used on almost all Android devices.
It looks like it would improve compatibility with
a lot of headsets, so I wanted to submit it for review to see
if it should go upstream.

Cc: Marcel Holtmann  (maintainer:BLUETOOTH SUBSYSTEM)
Cc: Gustavo Padovan  (maintainer:BLUETOOTH SUBSYSTEM)
Cc: Johan Hedberg  (maintainer:BLUETOOTH SUBSYSTEM)
Cc: "David S. Miller"  (maintainer:NETWORKING [GENERAL])
Cc: linux-blueto...@vger.kernel.org (open list:BLUETOOTH SUBSYSTEM)
Cc: net...@vger.kernel.org (open list:NETWORKING [GENERAL])
Cc: linux-kernel@vger.kernel.org (open list)
Cc: Android Kernel Team 
Cc: John Stultz 
Signed-off-by: hyungseoung.yoo 
Signed-off-by: Jaikumar Ganesh 
[kiran: Added context to commit message]
Signed-off-by: Kiran Raparthy 
---
 net/bluetooth/hci_event.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
index 15010a2..6f944d5 100644
--- a/net/bluetooth/hci_event.c
+++ b/net/bluetooth/hci_event.c
@@ -1915,6 +1915,15 @@ unlock:
 	hci_conn_check_pending(hdev);
 }
 
+static inline bool is_sco_active(struct hci_dev *hdev)
+{
+	if (hci_conn_hash_lookup_state(hdev, SCO_LINK, BT_CONNECTED) ||
+	    (hci_conn_hash_lookup_state(hdev, ESCO_LINK,
+					BT_CONNECTED)))
+		return true;
+	return false;
+}
+
 static void hci_conn_request_evt(struct hci_dev *hdev, struct sk_buff *skb)
 {
 	struct hci_ev_conn_request *ev = (void *) skb->data;
@@ -1961,7 +1970,8 @@ static void hci_conn_request_evt(struct hci_dev *hdev, struct sk_buff *skb)
 
bacpy(, >bdaddr);
 
-	if (lmp_rswitch_capable(hdev) && (mask & HCI_LM_MASTER))
+	if (lmp_rswitch_capable(hdev) && ((mask & HCI_LM_MASTER)
+	    || is_sco_active(hdev)))
 		cp.role = 0x00; /* Become master */
 	else
 		cp.role = 0x01; /* Remain slave */
-- 
1.8.2.1
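The role decision the patch changes can be checked in isolation. Below is a minimal userspace model, not kernel code: the hypothetical `struct conn_req_state` uses plain booleans to stand in for `lmp_rswitch_capable(hdev)`, `mask & HCI_LM_MASTER` and the two `hci_conn_hash_lookup_state()` lookups. Only the 0x00/0x01 role values come from the patch.

```c
#include <stdbool.h>

#define ROLE_MASTER 0x00 /* value the patch writes as "Become master" */
#define ROLE_SLAVE  0x01 /* value the patch writes as "Remain slave" */

/* Hypothetical stand-in for the state hci_conn_request_evt() consults. */
struct conn_req_state {
	bool rswitch_capable;  /* models lmp_rswitch_capable(hdev) */
	bool link_mode_master; /* models (mask & HCI_LM_MASTER) */
	bool sco_connected;    /* an SCO link is in BT_CONNECTED */
	bool esco_connected;   /* an eSCO link is in BT_CONNECTED */
};

/* Mirrors is_sco_active(): true if any SCO or eSCO link is connected. */
static bool sim_is_sco_active(const struct conn_req_state *s)
{
	return s->sco_connected || s->esco_connected;
}

/* Role requested when accepting an incoming connection, after the patch. */
static unsigned char accept_role(const struct conn_req_state *s)
{
	if (s->rswitch_capable && (s->link_mode_master || sim_is_sco_active(s)))
		return ROLE_MASTER;
	return ROLE_SLAVE;
}
```

With an eSCO link up and HCI_LM_MASTER clear, the model returns 0x00 where the unpatched condition returned 0x01. Without role-switch support it still returns 0x01. The patch's is_sco_active() could also be written as a single `return a || b;`, which is equivalent to its if/return true/return false form.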

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH net-next V6 1/2] cpumask: Utility function to set n'th cpu - local cpu first

2014-05-27 Thread Or Gerlitz

On Tue, May 27, 2014 at 10:24 PM, David Miller  wrote:

> I would like someone who cares about these cpumask interfaces to provide
> a review.

Understood. Still, the git log of that file doesn't yield much: only
one or two commits per year in 2011, 2012 and 2013. So, any concrete
suggestion?

Or.

xfs: possible deadlock warning

2014-05-27 Thread Gu Zheng

Hi all,
When running the latest Linus' tree, the following possible deadlock warning occurs.

[  140.949000] ==
[  140.949000] [ INFO: possible circular locking dependency detected ]
[  140.949000] 3.15.0-rc7+ #93 Not tainted
[  140.949000] ---
[  140.949000] qemu-kvm/5056 is trying to acquire lock:
[  140.949000]  (>lock){+.+.+.}, at: [] inode_doinit_with_dentry+0xa5/0x640
[  140.949000] 
[  140.949000] but task is already holding lock:
[  140.949000]  (>mmap_sem){++}, at: [] vm_mmap_pgoff+0x6f/0xc0
[  140.949000] 
[  140.949000] which lock already depends on the new lock.
[  140.949000] 
[  140.949000] 
[  140.949000] the existing dependency chain (in reverse order) is:
[  140.949000] 
[  140.949000] -> #2 (>mmap_sem){++}:
[  140.949000][] __lock_acquire+0xadc/0x12f0
[  140.949000][] lock_acquire+0xa2/0x130
[  140.949000][] might_fault+0x8c/0xb0
[  140.949000][] filldir+0x91/0x120
[  140.949000][] xfs_dir2_block_getdents+0x1e8/0x250 [xfs]
[  140.949000][] xfs_readdir+0xda/0x120 [xfs]
[  140.949000][] xfs_file_readdir+0x2b/0x40 [xfs]
[  140.949000][] iterate_dir+0xa8/0xe0
[  140.949000][] SyS_getdents+0x8a/0x120
[  140.949000][] system_call_fastpath+0x16/0x1b
[  140.949000] 
[  140.949000] -> #1 (_dir_ilock_class){.+}:
[  140.949000][] __lock_acquire+0xadc/0x12f0
[  140.949000][] lock_acquire+0xa2/0x130
[  140.949000][] down_read_nested+0x57/0xa0
[  140.949000][] xfs_ilock+0xf2/0x120 [xfs]
[  140.949000][] xfs_ilock_attr_map_shared+0x34/0x40 [xfs]
[  140.949000][] xfs_attr_get+0x79/0xb0 [xfs]
[  140.949000][] xfs_xattr_get+0x37/0x50 [xfs]
[  140.949000][] generic_getxattr+0x4f/0x70
[  140.949000][] inode_doinit_with_dentry+0x150/0x640
[  140.949000][] sb_finish_set_opts+0xd8/0x270
[  140.949000][] selinux_set_mnt_opts+0x28f/0x5e0
[  140.949000][] superblock_doinit+0x68/0xd0
[  140.949000][] delayed_superblock_init+0x10/0x20
[  140.949000][] iterate_supers+0xb2/0x110
[  140.949000][] selinux_complete_init+0x33/0x40
[  140.949000][] security_load_policy+0xf4/0x600
[  140.949000][] sel_write_load+0xac/0x750
[  140.949000][] vfs_write+0xbd/0x1f0
[  140.949000][] SyS_write+0x49/0xb0
[  140.949000][] system_call_fastpath+0x16/0x1b
[  140.949000] 
[  140.949000] -> #0 (>lock){+.+.+.}:
[  140.949000][] check_prevs_add+0x951/0x970
[  140.949000][] __lock_acquire+0xadc/0x12f0
[  140.949000][] lock_acquire+0xa2/0x130
[  140.949000][] mutex_lock_nested+0x78/0x4f0
[  140.949000][] inode_doinit_with_dentry+0xa5/0x640
[  140.949000][] selinux_d_instantiate+0x1c/0x20
[  140.949000][] security_d_instantiate+0x1b/0x30
[  140.949000][] d_instantiate+0x50/0x70
[  140.95][] __shmem_file_setup+0xe0/0x1d0
[  140.95][] shmem_zero_setup+0x28/0x70
[  140.95][] mmap_region+0x543/0x5a0
[  140.95][] do_mmap_pgoff+0x301/0x3d0
[  140.95][] vm_mmap_pgoff+0x90/0xc0
[  140.95][] vm_mmap+0x2d/0x40
[  140.95][] kvm_arch_prepare_memory_region+0x47/0x60 [kvm]
[  140.95][] __kvm_set_memory_region+0x1ff/0x770 [kvm]
[  140.95][] kvm_set_memory_region+0x2d/0x50 [kvm]
[  140.95][] vmx_set_tss_addr+0x4a/0x190 [kvm_intel]
[  140.95][] kvm_arch_vm_ioctl+0x9c0/0xb80 [kvm]
[  140.95][] kvm_vm_ioctl+0x8e/0x730 [kvm]
[  140.95][] do_vfs_ioctl+0x300/0x520
[  140.95][] SyS_ioctl+0x81/0xa0
[  140.95][] system_call_fastpath+0x16/0x1b
[  140.95] 
[  140.95] other info that might help us debug this:
[  140.95] 
[  140.95] Chain exists of:
[  140.95]   >lock --> _dir_ilock_class --> >mmap_sem
[  140.95] 
[  140.95]  Possible unsafe locking scenario:
[  140.95] 
[  140.95]        CPU0                    CPU1
[  140.95]        ----                    ----
[  140.95]   lock(>mmap_sem);
[  140.95]                                lock(_dir_ilock_class);
[  140.95]                                lock(>mmap_sem);
[  140.95]   lock(>lock);
[  140.95] 
[  140.95]  *** DEADLOCK ***
[  140.95] 
[  140.95] 2 locks held by qemu-kvm/5056:
[  140.95]  #0:  (>slots_lock){+.+.+.}, at: [] kvm_set_memory_region+0x22/0x50 [kvm]
[  140.95]  #1:  (>mmap_sem){++}, at: [] vm_mmap_pgoff+0x6f/0xc0
[  140.95] 
[  140.95] stack backtrace:
[  140.95] CPU: 76 PID: 5056 Comm: qemu-kvm Not tainted 3.15.0-rc7+ #93
[  140.95] Hardware name: FUJITSU PRIMEQUEST2800E/SB, BIOS PRIMEQUEST 2000 Series BIOS Version 01.48 05/07/2014
[  140.95]  823925a0 880830ba7750 81638c00
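The report describes a cycle in the lock-ordering graph. The existing chain orders the lock taken in inode_doinit_with_dentry() before the XFS directory ilock class (stack #1), and that class before mmap_sem (stack #2). qemu-kvm now wants the first lock while already holding mmap_sem (stack #0). The sketch below is a simplified userspace model of that kind of check, a reachability search over recorded "acquired while holding" edges. It is not lockdep's implementation, and the three class labels are hypothetical names for the locks in the trace.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical labels for the three lock classes in the report. */
enum lock_class { LC_INODE_SEC, LC_DIR_ILOCK, LC_MMAP_SEM, NR_LOCK_CLASSES };

static const char *const lock_names[NR_LOCK_CLASSES] = {
	"inode lock (inode_doinit_with_dentry)",
	"xfs dir ilock class",
	"mmap_sem",
};

/* after[a][b] is true once b has been acquired while a was held. */
static bool after[NR_LOCK_CLASSES][NR_LOCK_CLASSES];

/* Depth-first search: can 'to' be reached from 'from' via recorded edges? */
static bool reaches(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int n = 0; n < NR_LOCK_CLASSES; n++)
		if (after[from][n] && !seen[n] && reaches(n, to, seen))
			return true;
	return false;
}

/*
 * Record "acquire next while holding prev". If next already orders
 * (directly or transitively) before prev, the new edge would close a
 * cycle, so report it and do not record it.
 */
static bool add_dependency(int prev, int next)
{
	bool seen[NR_LOCK_CLASSES] = { false };

	if (reaches(next, prev, seen)) {
		printf("possible circular locking: %s held, acquiring %s\n",
		       lock_names[prev], lock_names[next]);
		return false;
	}
	after[prev][next] = true;
	return true;
}
```

Replaying stacks #1 and #2 records two edges, and the third call (the #0 path from vm_mmap_pgoff() through shmem setup into inode_doinit_with_dentry()) is rejected. As in the real report, this flags a possible deadlock, not one that has actually happened.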

Re: [PATCH v4 00/16] PCI/iommu: Fix DMA alias problems

2014-05-27 Thread Pat Erley


On 05/22/2014 06:07 PM, Alex Williamson wrote:

For testing, this version can be found in my git tree:

git://github.com/awilliam/linux-vfio.git dma-alias-v4

Please report any issues.

v4:
  - Change dma_func_alias to dma_alias_devfn, holding a single
devfn to alias, thereby supporting aliases to the wrong slot.
The DMA alias iterator is easily changed, but IOMMU grouping
requires significant rework.  This is now done in IOMMU code
rather than PCI code.

  - AMD-Vi - try to incorporate IVRS aliases dynamically into
PCI alias quirks to make sure that our grouping remains the
same.  Potentially this could end up reporting BIOS aliases
that we can add to our list of quirks.

v3:
  - Found several instances where I had PCI_SLOT when I meant
PCI_FUNC.  Thanks to Andrew for spotting this.  This should
fix the problem he was having with Ricoh quirks.  We also
pruned down the func0 quirks to only those that we know are
needed.  We can always add them back later.

  - Found a case in intel-iommu of using dev_is_pci() where I
really wanted !dev_is_pci().  Fixed.

v2:
  - Several new Marvell controllers added to quirks.  There's been
a lot of success reported with this series in
https://bugzilla.kernel.org/show_bug.cgi?id=42679

  - Add quirk for ASMedia and Tundra PCIe-to-PCI bridges that do
not expose a PCIe capability.  These have been shown to use
the standard PCIe-to-PCI bridge requester ID.

  - Fix copy/paste duplicate Ricoh quirk ID

  - Fixed AMD IOMMU for the "ghost" function case where the DMA
alias is for an absent device.  The iommu rlookup table and
data fields need to be initialized.

  - Fixed Intel interrupt remapping: I wasn't passing the target
bus number, only the alias bus number.

These patches are split across PCI and IOMMU, but I've front-loaded
all of the PCI infrastructure so that the first 7 patches can be
applied to PCI-core, the IOMMU maintainers can pick up their patches,
then we can finish with dead code removal.  Bjorn might also be
willing to carry the IOMMU changes if the maintainers want to ack
them.

Original description:

This series attempts to fix a couple of issues we've had outstanding in
the PCI/IOMMU code for a while.  The first issue is with devices that
use the wrong requester ID for DMA transactions.  We already have a
sort of half-baked attempt to fix this for several Ricoh devices, but
the fix only helps them be useful through IOMMU groups, not the
general DMA case.  There are also several Marvell devices which use
a different wrong requester ID and don't even fit into the DMA
source idea.  This series creates a DMA alias iterator that will
step through each possible alias of a device, allowing IOMMUs to
insert mappings for both the device and its aliases.
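The alias-iterator idea can be sketched with a callback that runs once per requester ID a device may use, so that an IOMMU driver can install a mapping for each. This is a simplified userspace model, not the series' code. `struct sim_dev`, its fields, `sim_for_each_dma_alias()` and `collect_alias()` are hypothetical. The model also assumes every PCIe-to-PCI bridge substitutes its own requester ID. Real bridges do not all behave the same way, which is why the series needs quirks.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified device model; not the kernel's struct pci_dev. */
struct sim_dev {
	uint8_t bus;
	uint8_t devfn;
	int has_dma_alias;          /* quirk: DMA also issued with another devfn */
	uint8_t dma_alias_devfn;    /* that devfn, on the same bus */
	int is_pcie_to_pci_bridge;  /* bridge takes ownership of transactions */
	struct sim_dev *upstream;   /* parent bridge, NULL on the root bus */
};

/* Requester ID: bus number in the high byte, devfn in the low byte. */
static uint16_t sim_rid(uint8_t bus, uint8_t devfn)
{
	return (uint16_t)((bus << 8) | devfn);
}

typedef int (*sim_alias_fn)(uint16_t alias, void *data);

/*
 * Call fn for the device's own requester ID, its quirked alias (if any) and
 * each upstream PCIe-to-PCI bridge. A non-zero return from fn stops the walk.
 */
static int sim_for_each_dma_alias(const struct sim_dev *dev, sim_alias_fn fn,
				  void *data)
{
	int ret = fn(sim_rid(dev->bus, dev->devfn), data);

	if (ret)
		return ret;
	if (dev->has_dma_alias) {
		ret = fn(sim_rid(dev->bus, dev->dma_alias_devfn), data);
		if (ret)
			return ret;
	}
	for (const struct sim_dev *up = dev->upstream; up; up = up->upstream) {
		if (!up->is_pcie_to_pci_bridge)
			continue;
		ret = fn(sim_rid(up->bus, up->devfn), data);
		if (ret)
			return ret;
	}
	return 0;
}

/* Example callback: collect every alias, as an IOMMU driver might map each. */
struct alias_list {
	uint16_t ids[8];
	size_t n;
};

static int collect_alias(uint16_t alias, void *data)
{
	struct alias_list *l = data;

	if (l->n == sizeof(l->ids) / sizeof(l->ids[0]))
		return -1; /* out of room: stop iterating */
	l->ids[l->n++] = alias;
	return 0;
}
```

For a device at 02:04.0 (devfn 0x20) with a quirked alias at devfn 0x21, sitting behind a PCIe-to-PCI bridge at 01:00.0, the callback sees 0x0220, 0x0221 and 0x0100 in that order.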

Hand-in-hand with this is our broken pci_find_upstream_pcie_bridge()
function, which is known to blow up when it finds itself suddenly at
a PCIe device without crossing a PCIe-to-PCI bridge (as identified by
the PCIe capability).  It also likes to make the invalid assumption
that a PCIe device never has its requester ID masked by any upstream
bus.  We can fix this using the above new DMA alias iterator, since
that's effectively what this function was meant to do.

Finally, with all these helpers, it makes sense to consolidate code
for determining IOMMU groups.  The first step in finding the root
of a group is finding the final upstream DMA alias for the device,
then applying additional ACS rules and incorporating device specific
aliases.  As this is all common to PCI, create a single implementation
and remove piles of code from the individual IOMMU drivers.

This series allows devices like the Marvell 88SE9123 to finally work
on Linux with either AMD-Vi or VT-d enabled on the box.  I've
collected device IDs from various bugs to support as many SKUs of
these devices as possible, but I'm sure there are others that I've
missed.

This should also enable motherboards with an onboard ASMedia
ASM1083/1085 PCIe-to-PCI bridge to work with VT-d enabled.  I've
acquired an adapter board with this chip, but it actually exposes
a PCIe capability, unlike most of the onboard controllers.  Therefore
I expect this series will fix the WARN_ON currently hit during boot,
but there's a 50/50 chance whether the device behaves like a PCI
bridge or a PCIe bridge with regard to the requester ID that it uses
to take ownership of the transaction.  If it turns out to use the
PCIe bridge model, I expect we can quirk it using a dev_flags bit
to identify a PCI bridge that takes ownership as if it was a PCIe
bridge.

Please test and provide feedback.  I expect IOMMU group topology
should not change from this series, but if a case is found where it
does, please share.  Also, if there are additional quirks we need
to add, please either file new or add to the existing bugs.  Thanks,

Alex

---

Alex Williamson (16):
   PCI: Add DMA alias iterator
   PCI: define

Re: [PATCH] block: mq flush: fix race between IPI handler and mq flush worker

2014-05-27 Thread Christoph Hellwig

On Tue, May 27, 2014 at 08:31:18PM -0600, Jens Axboe wrote:
> Christoph, I'll just run a few tests and then queue it up in the morning. 
> Can you send a properly signed-off patch with a commit message as well? I 
> was writing one up, but I still need the signed-off-by.

Attached.

From 125823de325211c3e96dad884b0d1a52ec04947d Mon Sep 17 00:00:00 2001
From: Christoph Hellwig 
Date: Wed, 21 May 2014 19:37:11 +0200
Subject: blk-mq: add helper to insert requests from irq context

Both the cache flush state machine and the SCSI midlayer want to submit
requests from irq context, and the current per-request requeue_work
unfortunately causes corruption due to sharing with the csd field for
flushes.  Replace them with a per-request_queue list of requests to
be requeued.

Based on an earlier test by Ming Lei.

Signed-off-by: Christoph Hellwig 
Reported-by: Ming Lei 
Tested-by: Ming Lei 
---
 block/blk-flush.c  |   16 +++-
 block/blk-mq.c |   64 +++-
 include/linux/blk-mq.h |3 +++
 include/linux/blkdev.h |5 +++-
 4 files changed, 74 insertions(+), 14 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index ec7a224..ef608b3 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -130,21 +130,13 @@ static void blk_flush_restore_request(struct request *rq)
 	blk_clear_rq_complete(rq);
 }
 
-static void mq_flush_run(struct work_struct *work)
-{
-	struct request *rq;
-
-	rq = container_of(work, struct request, requeue_work);
-
-	memset(>csd, 0, sizeof(rq->csd));
-	blk_mq_insert_request(rq, false, true, false);
-}
-
 static bool blk_flush_queue_rq(struct request *rq, bool add_front)
 {
 	if (rq->q->mq_ops) {
-		INIT_WORK(>requeue_work, mq_flush_run);
-		kblockd_schedule_work(>requeue_work);
+		struct request_queue *q = rq->q;
+
+		blk_mq_add_to_requeue_list(rq, add_front);
+		blk_mq_kick_requeue_list(q);
 		return false;
 	} else {
 		if (add_front)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 62082c5..0457010 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -510,10 +510,68 @@ void blk_mq_requeue_request(struct request *rq)
 	blk_clear_rq_complete(rq);
 
 	BUG_ON(blk_queued_rq(rq));
-	blk_mq_insert_request(rq, true, true, false);
+	blk_mq_add_to_requeue_list(rq, true);
 }
 EXPORT_SYMBOL(blk_mq_requeue_request);
 
+static void blk_mq_requeue_work(struct work_struct *work)
+{
+	struct request_queue *q =
+		container_of(work, struct request_queue, requeue_work);
+	LIST_HEAD(rq_list);
+	struct request *rq, *next;
+	unsigned long flags;
+
+	spin_lock_irqsave(>requeue_lock, flags);
+	list_splice_init(>requeue_list, _list);
+	spin_unlock_irqrestore(>requeue_lock, flags);
+
+	list_for_each_entry_safe(rq, next, _list, queuelist) {
+		if (!(rq->cmd_flags & REQ_SOFTBARRIER))
+			continue;
+
+		rq->cmd_flags &= ~REQ_SOFTBARRIER;
+		list_del_init(>queuelist);
+		blk_mq_insert_request(rq, true, false, false);
+	}
+
+	while (!list_empty(_list)) {
+		rq = list_entry(rq_list.next, struct request, queuelist);
+		list_del_init(>queuelist);
+		blk_mq_insert_request(rq, false, false, false);
+	}
+
+	blk_mq_run_queues(q, false);
+}
+
+void blk_mq_add_to_requeue_list(struct request *rq, bool at_head)
+{
+	struct request_queue *q = rq->q;
+	unsigned long flags;
+
+	/*
+	 * We abuse this flag that is otherwise used by the I/O scheduler to
+	 * request head insertion from the workqueue.
+	 */
+	BUG_ON(rq->cmd_flags & REQ_SOFTBARRIER);
+
+	spin_lock_irqsave(>requeue_lock, flags);
+	if (at_head) {
+		rq->cmd_flags |= REQ_SOFTBARRIER;
+		list_add(>queuelist, >requeue_list);
+	} else {
+		list_add_tail(>queuelist, >requeue_list);
+	}
+	spin_unlock_irqrestore(>requeue_lock, flags);
+}
+EXPORT_SYMBOL(blk_mq_add_to_requeue_list);
+
+void blk_mq_kick_requeue_list(struct request_queue *q)
+{
+	kblockd_schedule_work(>requeue_work);
+}
+EXPORT_SYMBOL(blk_mq_kick_requeue_list);
+
 struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag)
 {
 	return tags->rqs[tag];
@@ -1777,6 +1835,10 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
 
 	q->sg_reserved_size = INT_MAX;
 
+	INIT_WORK(>requeue_work, blk_mq_requeue_work);
+	INIT_LIST_HEAD(>requeue_list);
+	spin_lock_init(>requeue_lock);
+
 	if (q->nr_hw_queues > 1)
 		blk_queue_make_request(q, blk_mq_make_request);
 	else
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index f76bb18..81bb7f1 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -173,6 +173,9 @@ void __blk_mq_end_io(struct request *rq, int error);
 
 void blk_mq_requeue_request(struct request *rq);
 
+void blk_mq_add_to_requeue_list(struct request *rq, bool at_head);
+void blk_mq_kick_requeue_list(struct request_queue *q);
+
 void blk_mq_complete_request(struct request *rq);
 
 void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index b0104ba..e90e169 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@
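The requeue scheme in this patch can be modelled in a few lines. Producers append to a per-queue list under q->requeue_lock. A work item later takes the whole list and hands head-flagged requests (marked with REQ_SOFTBARRIER) on before the rest. This is a single-threaded sketch, not the patch: `struct requeue_list`, `struct sim_rq` and both functions are hypothetical, the lock appears only in comments, and a fixed array replaces the kernel's list_head. It does not reproduce the exact order within the head-inserted group.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_REQS 16

/* Hypothetical request: an id plus the "insert at head" marker that the
 * patch encodes with REQ_SOFTBARRIER. */
struct sim_rq {
	int id;
	bool at_head;
};

/* Hypothetical per-queue requeue list (the patch uses q->requeue_list). */
struct requeue_list {
	struct sim_rq rqs[MAX_REQS];
	size_t n;
};

/* Producer side: in the kernel this may run in irq context, so it only
 * queues the request (under the requeue lock, omitted here). */
static bool add_to_requeue_list(struct requeue_list *l, int id, bool at_head)
{
	if (l->n == MAX_REQS)
		return false;
	l->rqs[l->n].id = id;
	l->rqs[l->n].at_head = at_head;
	l->n++;
	return true;
}

/* Consumer side, like the work item: take the whole list at once (the patch
 * splices it under the lock), then pass on head-flagged requests first and
 * the remaining ones afterwards. Returns how many ids were written to out. */
static size_t drain_requeue_list(struct requeue_list *l, int *out)
{
	struct requeue_list local = *l;
	size_t k = 0;

	l->n = 0;
	for (size_t i = 0; i < local.n; i++)
		if (local.rqs[i].at_head)
			out[k++] = local.rqs[i].id;
	for (size_t i = 0; i < local.n; i++)
		if (!local.rqs[i].at_head)
			out[k++] = local.rqs[i].id;
	return k;
}
```

The point of the split is that the irq-context path only touches a list under a spinlock. The actual insertion runs later from kblockd, via blk_mq_kick_requeue_list(). The old per-request requeue_work did this work directly, and the commit message says it collided with the csd field for flushes.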

Re: blk-mq: refactor request allocation

2014-05-27 Thread Christoph Hellwig

On Tue, May 27, 2014 at 02:58:08PM -0600, Jens Axboe wrote:
> On 05/27/2014 12:59 PM, Christoph Hellwig wrote:
> > This series streamlines the request allocation path.
> > 
> 
> Series looks innocuous enough to me, but it's about a 1.5% performance
> drop here with an actual device. These tests are very stable, anything
> over ~0.1% is definitely outside of noise. I repeated and rebooted a few
> times and tested both, it's persistent. No smoking guns in the profile.

Can you do a bisect to narrow it down to one of the patches?

Re: [PATCH 1/2] memory-failure: Send right signal code to correct thread

2014-05-27 Thread Tony Luck

I'm exploring options to see what writers of threaded applications might 
want/need. I'm very doubtful that they would really want "broadcast to all 
threads". What if there are hundreds or thousands of threads? We send the 
signals from the context of the thread that hit the error. But that might take 
a while. Meanwhile any of those threads that were already scheduled on other 
CPUs are back running again. So there are big races even if we broadcast.

Sent from my iPhone

> On May 27, 2014, at 17:15, Naoya Horiguchi  wrote:
> 
> On Tue, May 27, 2014 at 03:53:55PM -0700, Tony Luck wrote:
>>> - make sure that every thread in a recovery aware application should have
>>>   a SIGBUS handler, inside which
>>>   * code for SIGBUS(BUS_MCEERR_AR) is enabled for every thread
>>>   * code for SIGBUS(BUS_MCEERR_AO) is enabled only for a dedicated thread
>> 
>> But how does the kernel know which is the special thread that
>> should see the "AO" signal?  Broadcasting the signal to all
>> threads seems to be just as likely to cause problems to
>> an application as the h/w broadcasting MCE to all processors.
> 
> I thought that the kernel doesn't have to know which thread is the
> special one if the AO signal is broadcast to all threads, because
> in that case the special thread always gets the AO signal.
> 
> The reported problem happens only when the application sets the PF_MCE_EARLY
> flag, and such an application is surely recovery aware, so we can assume that
> its authors implement a SIGBUS handler for all threads. Then all threads other
> than the special one can intentionally ignore the AO signal. This avoids the
> default behavior for SIGBUS ("kill all threads", as Kamil said in the previous
> email).
> 
> And I hope that the downside of signal broadcasting is smaller than MCE
> broadcasting because the range of broadcasting is limited to a process group,
> not to the whole system.
> 
> # I don't intend to rule out other possibilities like adding another prctl
> # flag, so if you have a patch, that would be great.
> 
> Thanks,
> Naoya Horiguchi
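Naoya's proposal can be sketched from the application side. Signal dispositions are per process, not per thread, so "every thread has a SIGBUS handler" in practice means one handler that checks which thread it is running in. This is a minimal sketch under stated assumptions. The thread-local flag `is_ao_thread` is hypothetical and would be set only by the dedicated thread. The `classify_sigbus()` helper and its enum are also hypothetical. The fatal branch simply exits, where a real recovery-aware application would perform its own recovery.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical: set to 1 only in the thread that handles AO notifications. */
static _Thread_local int is_ao_thread;

/* Set by the dedicated thread's handler when an AO notification arrives. */
static volatile sig_atomic_t ao_pending;

enum sigbus_action { SIGBUS_IGNORE, SIGBUS_DEFER, SIGBUS_FATAL };

/* Pure policy helper, so it can be checked without a real memory error:
 * AO matters only to the dedicated thread; AR (and any ordinary bus error)
 * concerns the thread that faulted. */
static enum sigbus_action classify_sigbus(int si_code, int ao_thread)
{
	if (si_code == BUS_MCEERR_AO)
		return ao_thread ? SIGBUS_DEFER : SIGBUS_IGNORE;
	return SIGBUS_FATAL;
}

static void sigbus_handler(int sig, siginfo_t *si, void *ctx)
{
	(void)sig;
	(void)ctx;
	switch (classify_sigbus(si->si_code, is_ao_thread)) {
	case SIGBUS_IGNORE:
		return; /* non-dedicated threads deliberately ignore AO */
	case SIGBUS_DEFER:
		ao_pending = 1; /* dedicated thread deals with it later */
		return;
	case SIGBUS_FATAL:
		break;
	}
	/* Returning here would retry the faulting access. A recovery-aware
	 * application would handle BUS_MCEERR_AR instead (e.g. siglongjmp()
	 * past the access); this placeholder just exits. */
	_exit(1);
}

/* One process-wide disposition, shared by all threads. */
static int install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}
```

This also illustrates Tony's concern: the kernel still has to deliver the AO signal to every thread for the dedicated one to see it, and each delivery happens at a different time.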

Re: [Patch V3 19/37] x86, irq: introduce mechanisms to support dynamically allocate IRQ for IOAPIC

2014-05-27 Thread Jiang Liu

Hi Thomas,
Thanks for your comments. Please refer to inline
comments below.

On 2014/5/28 3:58, Thomas Gleixner wrote:
> Jiang,
> 
> On Tue, 27 May 2014, Jiang Liu wrote:
> 
>> +static int alloc_irq_from_domain(struct irq_domain *domain, u32 gsi, int pin)
>>  {
>> +	int irq = -1;
>> +
>> +	if (gsi >= arch_dynirq_lower_bound(0)) {
>> +		irq = irq_create_mapping(domain, pin);
>> +	} else if (gsi < NR_IRQS_LEGACY) {
>> +		if (!ioapic_identity_map)
>> +			irq = irq_create_mapping(domain, pin);
>> +		else if (irq_domain_associate(domain, gsi, pin) == 0)
>> +			irq = gsi;
>> +	} else if (irq_create_strict_mappings(domain, gsi, pin, 1) == 0) {
>> +		irq = gsi;
>> +	}
> 
> So you have these cases covered here:
> 
> 1) The ACPI case of secondary ioapics. You only have the strict 1:1
>    mapping for the first ioapic
> 
> 2) The gsi < NR_IRQS_LEGACY case where you have two options:
> 
>    a) Let the core create a random virq number if ioapic_identity_map
>       is 0
> 
>       ioapic_identity_map is only set by SFI and devicetree
> 
>       So in all other cases we fall into that code path for all
>       legacy interrupts. So how is that supposed to work, let's say, for
>       i8042 which has hardcoded irq 1 and 12?
> 
>       irq_create_mapping(1)
>
>         hint = 1 % nr_irqs; --> 1
>         virq = irq_alloc_desc_from(hint, of_node_to_nid(domain->of_node));
> 
>         This returns something >= 16, because the irq descriptors
>         for 0-15 (LEGACY) are allocated already.
> 
>       The pin association works, but how is the i8042 driver supposed
>       to figure out that it should request the virq >=16 which was
>       created instead of the hardcoded 1 ?
This is used to work around special non-ISA interrupts with GSI below
NR_IRQS_LEGACY. The original code for the special case is:
	/*
	 * Provide an identity mapping of gsi == irq except on truly
	 * weird platforms that have non isa irqs in the first 16 gsis.
	 */
	return gsi >= NR_IRQS_LEGACY ? gsi : gsi_top + gsi;

We have one path to handle ISA IRQs before calling
alloc_irq_from_domain() as below:
	if (idx >= 0 && test_bit(mp_irqs[idx].srcbus, mp_bus_not_pci))
		return mp_irqs[idx].srcbusirq;

>   
>    b) Associate the gsi and the pin
> 
>       This only works because the virqs are already allocated at boot
>       time unconditionally due to arch_probe_nr_irqs() returning
>       NR_IRQS_LEGACY. So irq_domain_associate() works.
>       Undocumented works by chance behaviour.
Yes. It's a good suggestion to enhance legacy_pic to make this
code clearer.

> 
> 3) The case where gsi < arch_dynirq_lower_bound()
> 
>    You create a strict mapping here, fine.
> 
> This is confusing at best.
> 
> First of all, we should use legacy_pic->nr_legacy_irqs instead of
> NR_IRQS_LEGACY all over the place.
> 
> mshyperv, ce4100 and intel-mid use the null_legacy_pic which has
> nr_legacy_irqs = 0 and everything else uses the real pic which has
> nr_legacy_irqs = NR_IRQS_LEGACY. So why do we even bother to allocate
> and deal with NR_IRQS_LEGACY in the cases where we have no legacy?
I'm not sure whether it works with ce4100, so I used NR_IRQS_LEGACY
instead of legacy_pic->nr_legacy_irqs for safety. I will try to refine
it in the next version.

> 
> ce4100 is an oddball though. The ioapic is registered way before the
> interrupt subsystem is initialized and I have a hard time
> understanding that comment:
> 
> /* We can't set this earlier, because we need to calibrate the timer */
> legacy_pic = _legacy_pic;
I haven't figured out the story behind the comment yet:(

> 
> The timer calibration happens after the interrupts are set up. I
> assume it's check_timer() which wants that, but we know exactly how
> the ce4100 works, so we might be able to avoid that whole "testing"
> stuff. Sebastian, any input on this?
> 
> If it turns out that ce4100 needs the initial real legacy pic for some
> magic reason we still can be clever by extending the legacy pic data
> structure to tell us about that change, i.e. instead of using
> legacy_pic->nr_legacy_irqs having a field "nr_allocated_irqs", which
> is set to NR_IRQS_LEGACY for the real pic and to 0 for the null_pic
> and let ce4100 set that field to NR_IRQS_LEGACY before switching the
> legacy_pic over to the null implementation.
Good suggestion, will try this way.

> But what's really disgusting is the magic ioapic_identity_map and the
> extra ACPI specific ioapic_dynirq_base hackery.
> 
> Why do we need strict mappings in the non ACPI case for all ioapic
> pins? What's so different about ACPI? Or is this just to avoid
> breaking the existing SFI/devicetree stuff. If that's the reason I'm
> fine with it, but ...
It's to avoid breaking the SFI/intel_mid code. intel_mid assumes that the
IRQ number equals the pin number and uses pci_dev->irq to store both the
IRQ number and the pin number.
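Thomas's case analysis of alloc_irq_from_domain() can be restated as a pure function. Doing so makes the effect of his suggestion easy to see: take the number of legacy IRQs from the legacy PIC (0 for null_legacy_pic) instead of hard-coding NR_IRQS_LEGACY. This is a sketch, not the patch. The enum, `classify_gsi()` and the numbers in the example are hypothetical, and the mapping calls are only named in comments.

```c
#include <stdbool.h>

/* Hypothetical labels for the branches of alloc_irq_from_domain(). */
enum irq_alloc_path {
	PATH_DYNAMIC,         /* gsi >= dynirq lower bound: irq_create_mapping() */
	PATH_LEGACY_DYNAMIC,  /* legacy gsi, no identity map: irq_create_mapping() */
	PATH_LEGACY_IDENTITY, /* legacy gsi, identity map: irq_domain_associate() */
	PATH_STRICT,          /* otherwise: irq_create_strict_mappings(), irq == gsi */
};

/*
 * nr_legacy_irqs is a parameter here, standing in for
 * legacy_pic->nr_legacy_irqs rather than the NR_IRQS_LEGACY constant.
 */
static enum irq_alloc_path classify_gsi(unsigned int gsi,
					unsigned int dynirq_lower_bound,
					unsigned int nr_legacy_irqs,
					bool identity_map)
{
	if (gsi >= dynirq_lower_bound)
		return PATH_DYNAMIC;
	if (gsi < nr_legacy_irqs)
		return identity_map ? PATH_LEGACY_IDENTITY : PATH_LEGACY_DYNAMIC;
	return PATH_STRICT;
}
```

With 16 legacy IRQs and no identity map, GSI 1 (the i8042 example) lands in PATH_LEGACY_DYNAMIC, the case Thomas questions. With a null legacy PIC (0 legacy IRQs), the same GSI falls through to a strict 1:1 mapping.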

Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-05-27 Thread Michael Kerrisk (man-pages)

On 05/27/2014 10:30 PM, Arnaldo Carvalho de Melo wrote:
> On Tue, May 27, 2014 at 09:28:37PM +0200, Michael Kerrisk (man-pages) wrote:
>> On Tue, May 27, 2014 at 9:21 PM, Arnaldo Carvalho de Melo
>>  wrote:
>>> On Tue, May 27, 2014 at 06:35:17PM +0200, Michael Kerrisk (man-pages) wrote:
 On 05/26/2014 11:17 PM, Arnaldo Carvalho de Melo wrote:
> Can you try the attached patch on top of the first one?
>>>
 Patches on patches is a way to make your testers work unnecessarily
 harder. Also, it means that anyone else who was interested in this
>>>
>>> It was meant to highlight the changes with regard to the previous patch,
>>> i.e. to make things easier for reviewing.
>>
>> (I don't think that works...)
> 
> Let's try both then: attached is the updated patch, and this is the
> diff to the last combined one:

What tree does this apply to? I tried applying it to 3.15-rc7, but a piece
was rejected, and the fix was not obvious.

Cheers,

Michael


drivers/net/tun.c.rej

--- drivers/net/tun.c
+++ drivers/net/tun.c
@@ -1343,7 +1343,7 @@
 
/* Read frames from queue */
skb = __skb_recv_datagram(tfile->socket.sk, noblock ? MSG_DONTWAIT : 0,
- , , );
+ , , , timeop);
if (skb) {
ret = tun_put_user(tun, tfile, skb, iv, len);
kfree_skb(skb);

Re: [PATCH] notify block layer when using temporary change to cache_type

2014-05-27 Thread Vaughan Cao

On 05/28/2014 12:18 AM, James Bottomley wrote:
> On Tue, 2014-05-27 at 19:39 +0800, Vaughan Cao wrote:
>> This is a fix for commit:
>>   39c60a0948cc06139e2fbfe084f83cb7e7deae3b sd: fix array cache flushing bug causing performance problems
>> We must notify the block layer via q->flush_flags after temporarily changing
>> the cache_type to write through.
>> Otherwise, SYNCHRONIZE CACHE commands will still be generated.
>>
>> Signed-off-by: Vaughan Cao 
>> ---
>>  drivers/scsi/sd.c | 12 
>>  1 file changed, 12 insertions(+)
>>
>> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
>> index 6146b9d..366e48b 100644
>> --- a/drivers/scsi/sd.c
>> +++ b/drivers/scsi/sd.c
>> @@ -144,6 +144,7 @@ sd_store_cache_type(struct device *dev, struct device_attribute *attr,
> This is a weird function name.  The name in the vanilla kernel is
> cache_type_store().
Sorry, this patch was created against the Oracle UEK kernel. I only checked
that it applies cleanly to the vanilla kernel and didn't notice that they use
a different device attribute function name.

>>  	struct scsi_sense_hdr sshdr;
>>  	static const char temp[] = "temporary ";
>>  	int len;
>> +	unsigned flush;
>>  
>>  	if (sdp->type != TYPE_DISK)
>>  		/* no cache control on RBC devices; theoretically they
>> @@ -174,6 +175,17 @@ sd_store_cache_type(struct device *dev, struct device_attribute *attr,
>>  	if (sdkp->cache_override) {
>>  		sdkp->WCE = wce;
>>  		sdkp->RCD = rcd;
>> +
>> +		/* set flush_flags to notify the block layer */
>> +		flush = 0;
>> +		if (sdkp->WCE) {
>> +			flush |= REQ_FLUSH;
>> +			if (sdkp->DPOFUA)
>> +				flush |= REQ_FUA;
>> +		}
>> +
>> +		blk_queue_flush(sdkp->disk->queue, flush);
>> +
> Is there a reason you cut and paste from sd_revalidate_disk() instead of
> just calling it directly?
No, I just wanted to keep the modification as small as possible. Having
checked the actions of sd_revalidate_disk() again, it seems harmless to call
it from here. Also, the actual actions in sd_read_cache_type() are skipped
if cache_override != 0, so I suppose your original plan was to call
sd_revalidate_disk() from here.

However, that approach may not be acceptable after further code review.
Changing sdkp->WCE and sdkp->RCD causes the real parameters of the
underlying device to be *lost*. In sd_shutdown() and sd_suspend_common(), we
need them to call sd_sync_cache() if necessary, and I don't think that call
should be dropped just to solve the performance issue. Nor can we change the
mode just by setting q->flush_flags while leaving sdkp->WCE and RCD
untouched, because cache_type_show() needs sdkp->WCE and RCD to present the
temporary config to userspace.

static void sd_shutdown(struct device *dev)
{
	...
	if (sdkp->WCE && sdkp->media_present) {
		sd_printk(KERN_NOTICE, sdkp, "Synchronizing SCSI cache\n");
		sd_sync_cache(sdkp);
	}

static int sd_suspend_common(struct device *dev, bool ignore_stop_errors)
{
	...
	if (sdkp->WCE && sdkp->media_present) {
		sd_printk(KERN_NOTICE, sdkp, "Synchronizing SCSI cache\n");
		ret = sd_sync_cache(sdkp);

So, it seems that new fields like realWCE and realRCD in scsi_disk are needed
to save that configuration. What's your opinion?

Vaughan
>
> James
>
>
>
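Vaughan's proposal keeps the device's real cache settings next to the user-visible ones, and it can be sketched in a few lines. This is a userspace model under stated assumptions. `struct sim_disk` and the helpers are hypothetical, and the flag values are placeholders rather than the kernel's REQ_FLUSH/REQ_FUA. Only the flush computation is taken from the quoted patch, and the realWCE/realRCD names follow the suggestion above.

```c
#include <stdbool.h>

/* Placeholder flag values, not the kernel's REQ_FLUSH / REQ_FUA. */
#define SIM_REQ_FLUSH 0x1u
#define SIM_REQ_FUA   0x2u

/* Hypothetical disk state: WCE/RCD are what cache_type shows and what the
 * block layer is told; realWCE/realRCD keep what the device reported. */
struct sim_disk {
	bool WCE, RCD;
	bool realWCE, realRCD;
	bool DPOFUA;
};

/* Same computation as the flush setup in the quoted patch. */
static unsigned int flush_flags(const struct sim_disk *d)
{
	unsigned int flush = 0;

	if (d->WCE) {
		flush |= SIM_REQ_FLUSH;
		if (d->DPOFUA)
			flush |= SIM_REQ_FUA;
	}
	return flush;
}

/* A "temporary" cache_type write changes only the visible settings. */
static void set_temporary_cache_type(struct sim_disk *d, bool wce, bool rcd)
{
	d->WCE = wce;
	d->RCD = rcd;
}

/* Shutdown/suspend decide on syncing from the device's real setting. */
static bool needs_sync_on_shutdown(const struct sim_disk *d, bool media_present)
{
	return d->realWCE && media_present;
}
```

After a temporary write-through override, the block layer sees no flush flags. Shutdown still syncs because the real WCE bit is preserved, while cache_type_show() can keep reporting the temporary mode from WCE/RCD.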

[PATCH] extcon: Reorder the sequence of extcon device driver alphabetically

2014-05-27 Thread Chanwoo Choi

This patch reorders the extcon device drivers alphabetically
to improve readability.

Signed-off-by: Chanwoo Choi 
---
 drivers/extcon/Kconfig  | 28 ++--
 drivers/extcon/Makefile |  6 +++---
 2 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/drivers/extcon/Kconfig b/drivers/extcon/Kconfig
index aebde48..9125eba 100644
--- a/drivers/extcon/Kconfig
+++ b/drivers/extcon/Kconfig
@@ -14,6 +14,20 @@ if EXTCON
 
 comment "Extcon Device Drivers"
 
+config EXTCON_ADC_JACK
+	tristate "ADC Jack extcon support"
+	depends on IIO
+	help
+	  Say Y here to enable extcon device driver based on ADC values.
+
+config EXTCON_ARIZONA
+	tristate "Wolfson Arizona EXTCON support"
+	depends on MFD_ARIZONA && INPUT && SND_SOC
+	help
+	  Say Y here to enable support for external accessory detection
+	  with Wolfson Arizona devices. These are audio CODECs with
+	  advanced audio accessory detection support.
+
 config EXTCON_GPIO
 	tristate "GPIO extcon support"
 	depends on GPIOLIB
@@ -21,12 +35,6 @@ config EXTCON_GPIO
 	  Say Y here to enable GPIO based extcon support. Note that GPIO
 	  extcon supports single state per extcon instance.
 
-config EXTCON_ADC_JACK
-	tristate "ADC Jack extcon support"
-	depends on IIO
-	help
-	  Say Y here to enable extcon device driver based on ADC values.
-
 config EXTCON_MAX14577
 	tristate "MAX14577/77836 EXTCON Support"
 	depends on MFD_MAX14577
@@ -55,14 +63,6 @@ config EXTCON_MAX8997
 	  Maxim MAX8997 PMIC. The MAX8997 MUIC is a USB port accessory
 	  detector and switch.
 
-config EXTCON_ARIZONA
-	tristate "Wolfson Arizona EXTCON support"
-	depends on MFD_ARIZONA && INPUT && SND_SOC
-	help
-	  Say Y here to enable support for external accessory detection
-	  with Wolfson Arizona devices. These are audio CODECs with
-	  advanced audio accessory detection support.
-
 config EXTCON_PALMAS
 	tristate "Palmas USB EXTCON support"
 	depends on MFD_PALMAS
diff --git a/drivers/extcon/Makefile b/drivers/extcon/Makefile
index bf7861e..e48abc6 100644
--- a/drivers/extcon/Makefile
+++ b/drivers/extcon/Makefile
@@ -1,12 +1,12 @@
-#
+
 # Makefile for external connector class (extcon) devices
 #
 
 obj-$(CONFIG_EXTCON)   += extcon-class.o
-obj-$(CONFIG_EXTCON_GPIO)  += extcon-gpio.o
 obj-$(CONFIG_EXTCON_ADC_JACK)  += extcon-adc-jack.o
+obj-$(CONFIG_EXTCON_ARIZONA)   += extcon-arizona.o
+obj-$(CONFIG_EXTCON_GPIO)  += extcon-gpio.o
 obj-$(CONFIG_EXTCON_MAX14577)  += extcon-max14577.o
 obj-$(CONFIG_EXTCON_MAX77693)  += extcon-max77693.o
 obj-$(CONFIG_EXTCON_MAX8997)   += extcon-max8997.o
-obj-$(CONFIG_EXTCON_ARIZONA)   += extcon-arizona.o
 obj-$(CONFIG_EXTCON_PALMAS)    += extcon-palmas.o
-- 
1.8.0

RE: [PATCH v11 2/3] clk: exynos5410: register clocks using common clock framework

2014-05-27 Thread Kukjin Kim

Mike Turquette wrote:
> 
> Quoting Tarek Dakhran (2014-05-25 20:23:32)
> > The EXYNOS5410 clocks are statically listed and registered
> > using the Samsung specific common clock helper functions.
> >
> > Signed-off-by: Tarek Dakhran 
> > Signed-off-by: Vyacheslav Tyrtov 
> > ---
> >  .../devicetree/bindings/clock/exynos5410-clock.txt |   45 +
> >  drivers/clk/samsung/Makefile   |1 +
> >  drivers/clk/samsung/clk-exynos5410.c   |  209
> 
> >  include/dt-bindings/clock/exynos5410.h |   33 
> >  4 files changed, 288 insertions(+)
> >  create mode 100644 Documentation/devicetree/bindings/clock/exynos5410-clock.txt
> >  create mode 100644 drivers/clk/samsung/clk-exynos5410.c
> >  create mode 100644 include/dt-bindings/clock/exynos5410.h
> >
> > diff --git a/Documentation/devicetree/bindings/clock/exynos5410-clock.txt b/Documentation/devicetree/bindings/clock/exynos5410-clock.txt
> > new file mode 100644
> > index 000..aeab635
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/clock/exynos5410-clock.txt
> > @@ -0,0 +1,45 @@
> > +* Samsung Exynos5410 Clock Controller
> > +
> > +The Exynos5410 clock controller generates and supplies clock to various
> > +controllers within the Exynos5410 SoC.
> > +
> > +Required Properties:
> > +
> > +- compatible: should be "samsung,exynos5410-clock"
> > +
> > +- reg: physical base address of the controller and length of memory mapped
> > +  region.
> > +
> > +- #clock-cells: should be 1.
> > +
> > +All available clocks are defined as preprocessor macros in
> > +dt-bindings/clock/exynos5410.h header and can be used in device
> > +tree sources.
> > +
> > +External clock:
> > +
> > +There is clock that is generated outside the SoC. It
> > +is expected that it is defined using standard clock bindings
> > +with following clock-output-name:
> > +
> > + - "fin_pll" - PLL input clock from XXTI
> 
> Does fin_pll feed into the exynos5410-clock controller? If so, should
> the example clock-controller node below have a clocks and clock-names
> property?
> 
Well, it is a fixed clock generated outside the SoC, so maybe those
properties are not required?

BTW, I've applied this series with Tomasz Figa's Reviewed-by tag and sent it
to arm-soc today, so if you have any concerns about this, please let me know
immediately.

> Otherwise patch looks good.
> 

Thanks,
Kukjin

[PATCH 2/6] powerpc, powernv, CPU hotplug: Put offline CPUs in Fast-Sleep instead of Nap

2014-05-27 Thread Preeti U Murthy

From: Srivatsa S. Bhat 

Offline CPUs are put into fast sleep if that idle state is discovered in the
device tree. This is to gain maximum power savings in the offline state.
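Below is a minimal userspace sketch of the flag handling in this patch. It assumes the `ibm,cpu-idle-state-flags` words have already been read into host-endian `uint32_t` values (the kernel code casts `prop->value` directly). `fold_idle_flags()` and `pick_offline_idle()` are illustrative names, not kernel functions. The constants are copied from the `processor.h` hunks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Constants copied from the processor.h hunks of this series. */
#define IDLE_INST_NAP   0x0001     /* DT flag: nap instruction can be used */
#define IDLE_INST_SLEEP 0x0002     /* DT flag: sleep instruction can be used */
#define IDLE_USE_NAP    (1UL << 0)
#define IDLE_USE_SLEEP  (1UL << 1)

/* Which instruction the offline loop would issue (illustrative). */
enum offline_idle { OFFLINE_NAP, OFFLINE_SLEEP };

/*
 * Fold the per-state DT flag words into one bitmask, mirroring the
 * loop in pnv_probe_idle_states().
 */
static unsigned int fold_idle_flags(const uint32_t *flags, size_t n)
{
	unsigned int supported = 0;

	for (size_t i = 0; i < n; i++) {
		if (flags[i] & IDLE_INST_NAP)
			supported |= IDLE_USE_NAP;
		if (flags[i] & IDLE_INST_SLEEP)
			supported |= IDLE_USE_SLEEP;
	}
	return supported;
}

/* Mirror of the sleep-vs-nap choice in pnv_smp_cpu_kill_self(). */
static enum offline_idle pick_offline_idle(unsigned int supported)
{
	return (supported & IDLE_USE_SLEEP) ? OFFLINE_SLEEP : OFFLINE_NAP;
}
```

If the firmware advertises no sleep-capable state, the sketch falls back to nap, which matches the `else power7_nap()` branch in the smp.c hunk.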

Signed-off-by: Srivatsa S. Bhat 
[ Changelog added by  ]
Signed-off-by: Preeti U Murthy 
---

 arch/powerpc/include/asm/processor.h |8 +
 arch/powerpc/kernel/idle.c   |   52 ++
 arch/powerpc/platforms/powernv/smp.c |   12 +++-
 3 files changed, 71 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index d922e5c..c5256db 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -449,6 +449,14 @@ static inline unsigned long get_clean_sp(unsigned long sp, int is_32)
 #define IDLE_INST_NAP  0x0001 /* nap instruction can be used */
 #define IDLE_INST_SLEEP 0x0002 /* sleep instruction can be used */
 
+/* Flags to indicate which of the CPU idle states are available for use */
+
+#define IDLE_USE_NAP   (1UL << 0)
+#define IDLE_USE_SLEEP (1UL << 1)
+
+extern unsigned int supported_cpuidle_states;
+extern unsigned int pnv_get_supported_cpuidle_states(void);
+
 extern unsigned long cpuidle_disable;
 enum idle_boot_override {IDLE_NO_OVERRIDE = 0, IDLE_POWERSAVE_OFF};
 
diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
index d7216c9..e51d574 100644
--- a/arch/powerpc/kernel/idle.c
+++ b/arch/powerpc/kernel/idle.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -32,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 
 
 unsigned long cpuidle_disable = IDLE_NO_OVERRIDE;
@@ -79,6 +81,56 @@ void arch_cpu_idle(void)
ppc64_runlatch_on();
 }
 
+#ifdef CONFIG_PPC_POWERNV
+
+unsigned int supported_cpuidle_states = 0;
+
+unsigned int pnv_get_supported_cpuidle_states(void)
+{
+   return supported_cpuidle_states;
+}
+
+static int __init pnv_probe_idle_states(void)
+{
+   struct device_node *power_mgt;
+   struct property *prop;
+   int dt_idle_states;
+   u32 *flags;
+   int i;
+
+   if (!firmware_has_feature(FW_FEATURE_OPALv3))
+   return 0;
+
+   power_mgt = of_find_node_by_path("/ibm,opal/power-mgt");
+   if (!power_mgt) {
+   pr_warn("opal: PowerMgmt Node not found\n");
+   return 0;
+   }
+
+   prop = of_find_property(power_mgt, "ibm,cpu-idle-state-flags", NULL);
+   if (!prop) {
+   pr_warn("DT-PowerMgmt: missing ibm,cpu-idle-state-flags\n");
+   return 0;
+   }
+
+   dt_idle_states = prop->length / sizeof(u32);
+   flags = (u32 *) prop->value;
+
+   for (i = 0; i < dt_idle_states; i++) {
+   if (flags[i] & IDLE_INST_NAP)
+   supported_cpuidle_states |= IDLE_USE_NAP;
+
+   if (flags[i] & IDLE_INST_SLEEP)
+   supported_cpuidle_states |= IDLE_USE_SLEEP;
+   }
+
+   return 0;
+}
+
+__initcall(pnv_probe_idle_states);
+#endif
+
+
 int powersave_nap;
 
 #ifdef CONFIG_SYSCTL
diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c
index bf5fcd4..fc83006 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "powernv.h"
 
@@ -142,6 +143,7 @@ static int pnv_smp_cpu_disable(void)
 static void pnv_smp_cpu_kill_self(void)
 {
unsigned int cpu;
+   unsigned long idle_states;
 
/* Standard hot unplug procedure */
local_irq_disable();
@@ -152,13 +154,21 @@ static void pnv_smp_cpu_kill_self(void)
generic_set_cpu_dead(cpu);
smp_wmb();
 
+   idle_states = pnv_get_supported_cpuidle_states();
+
/* We don't want to take decrementer interrupts while we are offline,
 * so clear LPCR:PECE1. We keep PECE2 enabled.
 */
mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) & ~(u64)LPCR_PECE1);
while (!generic_check_cpu_restart(cpu)) {
ppc64_runlatch_off();
-   power7_nap();
+
+   /* If sleep is supported, go to sleep, instead of nap */
+   if (idle_states & IDLE_USE_SLEEP)
+   power7_sleep();
+   else
+   power7_nap();
+
ppc64_runlatch_on();
if (!generic_check_cpu_restart(cpu)) {
DBG("CPU%d Unexpected exit while offline !\n", cpu);

Re: [PATCH] extcon: palmas: Make of_device_id array const

2014-05-27 Thread Chanwoo Choi

On 05/23/2014 06:03 PM, Krzysztof Kozlowski wrote:
> Array of struct of_device_id may be const as expected by
> of_match_table field.
> 
> Signed-off-by: Krzysztof Kozlowski 
> Cc: Graeme Gregory 
> ---
>  drivers/extcon/extcon-palmas.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/extcon/extcon-palmas.c b/drivers/extcon/extcon-palmas.c
> index ddff2b72f0a8..eb59fe564fd1 100644
> --- a/drivers/extcon/extcon-palmas.c
> +++ b/drivers/extcon/extcon-palmas.c
> @@ -273,7 +273,7 @@ static int palmas_usb_resume(struct device *dev)
>  
>  static SIMPLE_DEV_PM_OPS(palmas_pm_ops, palmas_usb_suspend, palmas_usb_resume);
>  
> -static struct of_device_id of_palmas_match_tbl[] = {
> +static const struct of_device_id of_palmas_match_tbl[] = {
>   { .compatible = "ti,palmas-usb", },
>   { .compatible = "ti,palmas-usb-vid", },
>   { .compatible = "ti,twl6035-usb", },
> 

Applied.

Thanks,
Chanwoo Choi

[PATCH 5/6] KVM: PPC: Book3S HV: Put KVM standby hwthreads to fast-sleep instead of nap

2014-05-27 Thread Preeti U Murthy

From: Srivatsa S. Bhat 

Now that support for the fast sleep idle state is present, allow
the KVM standby threads to go into fast sleep if the platform supports
it. This will give us maximum power savings when an entire core is idle.
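The new `kvm_do_idle` entry point hard-codes `andi. r4,r4,2` in place of `IDLE_USE_SLEEP`. Below is a C rendering of that dispatch, together with a compile-time check that could keep the literal and the macro in sync. `KVM_IDLE_SLEEP_IMM` and `kvm_pick_idle()` are hypothetical names used only for illustration. The check relies on C11 `_Static_assert`.

```c
#include <assert.h>

#define IDLE_USE_NAP   (1UL << 0)  /* from the processor.h hunk */
#define IDLE_USE_SLEEP (1UL << 1)

/*
 * Hypothetical name for the literal used by "andi. r4,r4,2" in
 * book3s_hv_rmhandlers.S. This fails to build if the two drift apart.
 */
#define KVM_IDLE_SLEEP_IMM 2
_Static_assert(KVM_IDLE_SLEEP_IMM == IDLE_USE_SLEEP,
	       "asm immediate must match IDLE_USE_SLEEP");

enum kvm_idle_target { KVM_DO_NAP, KVM_DO_FASTSLEEP };

/* C equivalent of the kvm_do_idle branch added in this patch. */
static enum kvm_idle_target kvm_pick_idle(unsigned int supported_cpuidle_states)
{
	/* andi. r4,r4,2 ; bne kvm_do_fastsleep */
	if (supported_cpuidle_states & KVM_IDLE_SLEEP_IMM)
		return KVM_DO_FASTSLEEP;
	/* otherwise fall through to kvm_do_nap */
	return KVM_DO_NAP;
}
```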

Signed-off-by: Srivatsa S. Bhat 
[ Changelog added by  ]
Signed-off-by: Preeti U Murthy 
---

 arch/powerpc/kvm/book3s_hv_rmhandlers.S |   73 ---
 1 file changed, 65 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 43aa806..69244cc 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -207,7 +207,7 @@ kvmppc_primary_no_guest:
li  r3, 1
stb r3, HSTATE_HWTHREAD_REQ(r13)
 
-   b   kvm_do_nap
+   b   kvm_do_idle
 
 kvm_novcpu_wakeup:
ld  r1, HSTATE_HOST_R1(r13)
@@ -247,7 +247,7 @@ kvm_novcpu_exit:
b   hdec_soon
 
 /*
- * We come in here when wakened from nap mode.
+ * We come in here when wakened from nap or fast-sleep mode.
  * Relocation is off and most register values are lost.
  * r13 points to the PACA.
  */
@@ -303,7 +303,7 @@ kvm_start_guest:
 
bl  kvmppc_hv_entry
 
-   /* Back from the guest, go back to nap */
+   /* Back from the guest, go back to nap or fastsleep */
/* Clear our vcpu pointer so we don't come back in early */
li  r0, 0
std r0, HSTATE_KVM_VCPU(r13)
@@ -314,7 +314,7 @@ kvm_start_guest:
 */
lwsync
 
-   /* increment the nap count and then go to nap mode */
+   /* increment the nap count and then go to nap or fast-sleep mode */
ld  r4, HSTATE_KVM_VCORE(r13)
addi    r4, r4, VCORE_NAP_COUNT
 51:lwarx   r3, 0, r4
@@ -325,6 +325,24 @@ kvm_start_guest:
 kvm_no_guest:
li  r0, KVM_HWTHREAD_IN_NAP
stb r0, HSTATE_HWTHREAD_STATE(r13)
+
+kvm_do_idle:
+   /*
+* if (supported_cpuidle_states & IDLE_USE_SLEEP)
+*  kvm_do_fastsleep();
+* else
+*  kvm_do_nap();
+*/
+   LOAD_REG_ADDRBASE(r3,supported_cpuidle_states)
+   lwz r4,ADDROFF(supported_cpuidle_states)(r3)
+   /*
+* andi. r4,r4,IDLE_USE_SLEEP. Replacing IDLE_USE_SLEEP
+* with the immediate value since it is a 32 bit instruction
+* and the operand needs to fit into this.
+*/
+   andi.   r4,r4,2
+   bne kvm_do_fastsleep
+
 kvm_do_nap:
/* Clear the runlatch bit before napping */
mfspr   r2, SPRN_CTRLF
@@ -339,6 +357,18 @@ kvm_do_nap:
IDLE_STATE_ENTER_SEQ_HV(PPC_NAP)
/* No return */
 
+kvm_do_fastsleep:
+   li  r3, LPCR_PECE0
+   mfspr   r4, SPRN_LPCR
+   /* Don't set LPCR_PECE1 since we want to wakeup only on an external
+* interrupt, and not on a decrementer interrupt.
+*/
+   rlwimi  r4, r3, 0, LPCR_PECE0
+   mtspr   SPRN_LPCR, r4
+   isync
+   IDLE_STATE_ENTER_SEQ_HV(PPC_SLEEP)
+   /* No return */
+
 
 /**
  **
@@ -2016,8 +2046,8 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_206)
bl  kvmppc_save_fp
 
/*
-* Take a nap until a decrementer or external or doobell interrupt
-* occurs, with PECE1, PECE0 and PECEDP set in LPCR. Also clear the
+* Go to fastsleep until an external or doorbell interrupt
+* occurs, with PECE0 and PECEDP set in LPCR. Also clear the
 * runlatch bit before napping.
 */
mfspr   r2, SPRN_CTRLF
@@ -2026,6 +2056,22 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_206)
 
li  r0,1
stb r0,HSTATE_HWTHREAD_REQ(r13)
+   /*
+* if (supported_cpuidle_states & IDLE_USE_SLEEP)
+*  PPC_SLEEP;
+* else
+*  PPC_NAP;
+*/
+   LOAD_REG_ADDRBASE(r3,supported_cpuidle_states)
+   lwz r4,ADDROFF(supported_cpuidle_states)(r3)
+   /*
+* andi. r4,r4,IDLE_USE_SLEEP. Replacing IDLE_USE_SLEEP
+* with the immediate value since it is a 32 bit instruction
+* and the operand needs to fit into this.
+*/
+   andi.   r4,r4,2
+   bne 35f
+
mfspr   r5,SPRN_LPCR
ori r5,r5,LPCR_PECE0 | LPCR_PECE1
 BEGIN_FTR_SECTION
@@ -2037,6 +2083,17 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
IDLE_STATE_ENTER_SEQ_HV(PPC_NAP)
/* No return */
 
+35:mfspr   r5,SPRN_LPCR
+   ori r5,r5,LPCR_PECE0
+BEGIN_FTR_SECTION
+   oris    r5,r5,LPCR_PECEDP@h
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
+   mtspr   SPRN_LPCR,r5
+   isync
+   li  r0, 0
+   IDLE_STATE_ENTER_SEQ_HV(PPC_SLEEP)
+   /* No return */
+
 33:mr  r4, r3
li  r3, 0
li

[PATCH 4/6] KVM: PPC: Book3S HV: Consolidate the idle-state enter sequence in KVM

2014-05-27 Thread Preeti U Murthy

From: Srivatsa S. Bhat 

Now that support for the fast sleep idle state is present, the KVM
standby threads can be put into fast sleep when they are idle or have
no guest to run. Today they enter nap in these scenarios. The purpose
is to gain maximum power savings in the KVM case as well, when an
entire CPU core is idle.

As a precursor, consolidate the code common across all idle states.

Signed-off-by: Srivatsa S. Bhat 
[ Changelog added by  ]
Signed-off-by: Preeti U Murthy 
---

 arch/powerpc/kvm/book3s_hv_rmhandlers.S |   30 --
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index b031f93..43aa806 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -40,6 +40,17 @@
 #define NAPPING_CEDE   1
 #define NAPPING_NOVCPU 2
 
+#define IDLE_STATE_ENTER_SEQ_HV(IDLE_INST) \
+   /* Magic NAP/SLEEP/WINKLE mode enter sequence */\
+   std r0, HSTATE_SCRATCH0(r13);   \
+   ptesync;\
+   ld  r0, HSTATE_SCRATCH0(r13);   \
+1: cmpd    r0, r0; \
+   bne 1b; \
+   IDLE_INST;  \
+   b   .
+
+
 /*
  * Call kvmppc_hv_entry in real mode.
  * Must be called with interrupts hard-disabled.
@@ -325,13 +336,9 @@ kvm_do_nap:
rlwimi  r4, r3, 0, LPCR_PECE0 | LPCR_PECE1
mtspr   SPRN_LPCR, r4
isync
-   std r0, HSTATE_SCRATCH0(r13)
-   ptesync
-   ld  r0, HSTATE_SCRATCH0(r13)
-1: cmpd    r0, r0
-   bne 1b
-   nap
-   b   .
+   IDLE_STATE_ENTER_SEQ_HV(PPC_NAP)
+   /* No return */
+
 
 /**
  **
@@ -2027,13 +2034,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
mtspr   SPRN_LPCR,r5
isync
li  r0, 0
-   std r0, HSTATE_SCRATCH0(r13)
-   ptesync
-   ld  r0, HSTATE_SCRATCH0(r13)
-1: cmpd    r0, r0
-   bne 1b
-   nap
-   b   .
+   IDLE_STATE_ENTER_SEQ_HV(PPC_NAP)
+   /* No return */
 
 33:mr  r4, r3
li  r3, 0

[PATCH 6/6] ppc, book3s: Go back to same idle state after handling machine check interrupt

2014-05-27 Thread Preeti U Murthy

From: Srivatsa S. Bhat 

Now that fast sleep is supported, threads could have been woken up from
fast sleep by a machine check interrupt. Hence add code to allow threads
to go back to the idle state they woke up from after handling the
interrupt. Today they always go back to nap.
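The wake-state decode in the hunk below can be modelled in C. This sketch assumes the field is `(SRR1 >> 16) & 3`, which is what `rlwinm. r11,r12,47-31,30,31` extracts, and it only names the branch targets. Value 1 goes to the existing label `3` path, which the hunk does not show. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Where the early machine-check handler goes next, per the 6/6 hunk. */
enum mce_resume {
	MCE_NOT_IDLE,      /* 0: "beq 4f", thread was not in power saving mode */
	MCE_NO_LOSS_PATH,  /* 1: "blt cr1,3f", existing no-state-loss path */
	MCE_REENTER_NAP,   /* 2: supervisor state loss, r3 = 0 -> nap */
	MCE_REENTER_SLEEP, /* 3: hypervisor state loss, r3 = 1 -> sleep */
};

/*
 * "rlwinm. r11,r12,47-31,30,31" rotates the low word of SRR1 left by 16
 * and keeps the two least significant bits, i.e. (srr1 >> 16) & 3.
 */
static unsigned int srr1_wake_state(uint64_t srr1)
{
	return (unsigned int)((srr1 >> 16) & 3);
}

static enum mce_resume mce_resume_action(uint64_t srr1)
{
	unsigned int state = srr1_wake_state(srr1);

	if (state == 0)
		return MCE_NOT_IDLE;
	if (state < 2)
		return MCE_NO_LOSS_PATH;
	/* bgt cr1,8f selects sleep; equal to 2 selects nap */
	return state == 2 ? MCE_REENTER_NAP : MCE_REENTER_SLEEP;
}
```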

Signed-off-by: Srivatsa S. Bhat 
[ Changelog added by  ]
Signed-off-by: Preeti U Murthy 
---

 arch/powerpc/kernel/exceptions-64s.S |   21 +++--
 arch/powerpc/kernel/idle_power7.S|2 +-
 2 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index b4bf464..94cee3c 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1396,15 +1396,16 @@ machine_check_handle_early:
 * of the following is true:
 * a. thread wasn't in power saving mode
 * b. thread was in power saving mode with no state loss or
-*supervisor state loss
+*supervisor state loss or hypervisor state loss (fastsleep)
 *
-* Go back to nap again if (b) is true.
+* Go back to nap or fastsleep again if (b) is true.
 */
rlwinm. r11,r12,47-31,30,31 /* Was it in power saving mode? */
beq 4f  /* No, it wasn;t */
-   /* Thread was in power saving mode. Go back to nap again. */
-   cmpwi   r11,2
-   bne 3f
+   /* Thread was in power saving mode. Go back to the same state again. */
+   cmpwi   cr1,r11,2
+   blt cr1,3f
+7:
/* Supervisor state loss */
li  r0,1
stb r0,PACA_NAPSTATELOST(r13)
@@ -1412,7 +1413,15 @@ machine_check_handle_early:
MACHINE_CHECK_HANDLER_WINDUP
GET_PACA(r13)
ld  r1,PACAR1(r13)
-   b   .power7_enter_nap_mode
+   /* We need to pass the idle state in r3: 0 -> nap, 1 -> sleep */
+   bgt cr1,8f
+   li  r3,0
+   b   .power7_enter_idle
+   /* No return */
+
+8: li  r3,1 /* Pass 1 in r3 to request sleep in power7_enter_idle */
+   b   .power7_enter_idle
+   /* No return */
 4:
 #endif
/*
diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S
index c3ab869..e13e21b 100644
--- a/arch/powerpc/kernel/idle_power7.S
+++ b/arch/powerpc/kernel/idle_power7.S
@@ -95,7 +95,7 @@ _GLOBAL(power7_powersave_common)
std r9,_MSR(r1)
std r1,PACAR1(r13)
 
-_GLOBAL(power7_enter_nap_mode)
+_GLOBAL(power7_enter_idle)
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
/* Tell KVM we're napping */
li  r4,KVM_HWTHREAD_IN_NAP

[PATCH 3/6] KVM: PPC: Book3S HV: Enable CPUs to run guest after waking up from fast-sleep

2014-05-27 Thread Preeti U Murthy

From: Srivatsa S. Bhat 

When guests have to be launched, the offline secondary threads are
woken up to run them. Today these threads wake up from nap and check
whether they have to run guests. Now that the offline secondary threads
can enter fastsleep, add this check to the fastsleep wakeup path as well.
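The reordered system-reset path behaves roughly like the C function below. This is a sketch with hypothetical names. `kvm_hwthread_req` stands in for the `HSTATE_HWTHREAD_REQ` byte, and `wake_state` stands in for the two-bit SRR1 wake field. The state-0 case (label `9`) relies on lines the hunk elides, so treat that branch as an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Destinations of the system-reset wakeup code after this patch. */
enum wake_target {
	WAKE_KVM_START_GUEST, /* hwthread_req set: b kvm_start_guest */
	WAKE_NOT_FROM_IDLE,   /* assumed: beq 9f in the elided lines */
	WAKE_NOLOSS,          /* b .power7_wakeup_noloss */
	WAKE_LOSS,            /* beq cr1,2f -> .power7_wakeup_loss */
	WAKE_TB_LOSS,         /* bgt cr1,8f -> .power7_wakeup_tb_loss */
};

static enum wake_target wakeup_dispatch(unsigned int wake_state,
					bool kvm_hwthread_req)
{
	/* The KVM check now runs first, so fast-sleep wakeups reach it too. */
	if (kvm_hwthread_req)
		return WAKE_KVM_START_GUEST;
	if (wake_state == 0)
		return WAKE_NOT_FROM_IDLE;
	if (wake_state > 2)
		return WAKE_TB_LOSS;
	if (wake_state == 2)
		return WAKE_LOSS;
	return WAKE_NOLOSS;
}
```

Before the patch, the `bgt cr1,8f` branch was taken ahead of the KVM check. As a result, a thread waking from fast sleep went straight to `.power7_wakeup_tb_loss` and never looked at `hwthread_req`.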

Signed-off-by: Srivatsa S. Bhat 
[ Changelog added by  ]
Signed-off-by: Preeti U Murthy 
---

 arch/powerpc/kernel/exceptions-64s.S |   30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 3afd391..b4bf464 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -100,6 +100,19 @@ system_reset_pSeries:
SET_SCRATCH0(r13)
 #ifdef CONFIG_PPC_P7_NAP
 BEGIN_FTR_SECTION
+
+   GET_PACA(r13)
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+   li  r0,KVM_HWTHREAD_IN_KERNEL
+   stb r0,HSTATE_HWTHREAD_STATE(r13)
+   /* Order setting hwthread_state vs. testing hwthread_req */
+   sync
+   lbz r0,HSTATE_HWTHREAD_REQ(r13)
+   cmpwi   r0,0
+   beq 1f
+   b   kvm_start_guest
+1:
+#endif
/* Running native on arch 2.06 or later, check if we are
 * waking up from nap. We only handle no state loss and
 * supervisor state loss. We do -not- handle hypervisor
@@ -116,28 +129,15 @@ BEGIN_FTR_SECTION
 * OPAL v3 based powernv platforms have new idle states
 * which fall in this catagory.
 */
-   bgt cr1,8f
GET_PACA(r13)
-
-#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
-   li  r0,KVM_HWTHREAD_IN_KERNEL
-   stb r0,HSTATE_HWTHREAD_STATE(r13)
-   /* Order setting hwthread_state vs. testing hwthread_req */
-   sync
-   lbz r0,HSTATE_HWTHREAD_REQ(r13)
-   cmpwi   r0,0
-   beq 1f
-   b   kvm_start_guest
-1:
-#endif
+   bgt cr1,8f
 
beq cr1,2f
b   .power7_wakeup_noloss
 2: b   .power7_wakeup_loss
 
/* Fast Sleep wakeup on PowerNV */
-8: GET_PACA(r13)
-   b   .power7_wakeup_tb_loss
+8: b   .power7_wakeup_tb_loss
 
 9:
 END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)

Re: [PATCH V2] extcon: arizona: support inverted jack detect switch

2014-05-27 Thread Chanwoo Choi

On 05/23/2014 08:54 PM, Richard Fitzgerald wrote:
> Add config option for inverted jack detect switch that
> opens when jack is inserted.
> 
> Signed-off-by: Richard Fitzgerald 
> ---
>  drivers/extcon/extcon-arizona.c   |   34 ++
>  include/linux/mfd/arizona/pdata.h |3 +++
>  2 files changed, 29 insertions(+), 8 deletions(-)
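Only the diffstat is quoted here, so the exact field added to `pdata.h` is not visible. The sketch below assumes a boolean platform-data flag, called `jd_inverted` purely for illustration, and shows what the inversion amounts to:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical platform data; the real field is whatever pdata.h adds. */
struct jack_pdata {
	bool jd_inverted; /* switch opens (instead of closes) on insertion */
};

/* Translate the raw jack-detect switch state into "jack present". */
static bool jack_present(const struct jack_pdata *pdata, bool switch_closed)
{
	return pdata->jd_inverted ? !switch_closed : switch_closed;
}
```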

Applied.

Thanks,
Chanwoo Choi

[PATCH 1/6] powernv, cpuidle: Move the flags used for idle state discovery to powernv core

2014-05-27 Thread Preeti U Murthy

From: Srivatsa S. Bhat 

These flags will be used by the cpuidle driver as well as in the CPU
offline path. Offline CPUs should be put into fastsleep if that idle state
is discovered, so as to gain maximum power savings in the offline state.
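Both consumers decode the same per-state flag words. On the cpuidle side, a reduced model of the loop in `powernv_add_idle_states()` might look like the sketch below. `struct idle_state_name` and `add_idle_states()` are simplified stand-ins for the driver's real structures. The bounds check is an addition of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IDLE_INST_NAP           0x0001 /* shared flag, from processor.h */
#define IDLE_INST_SLEEP         0x0002
#define MAX_POWERNV_IDLE_STATES 8

/* Simplified stand-in for the driver's cpuidle_state entries. */
struct idle_state_name {
	char name[16];
};

/* Add a "Nap" and/or "FastSleep" entry per DT state; return the count. */
static int add_idle_states(const uint32_t *flags, size_t n,
			   struct idle_state_name *states)
{
	int nr = 0;

	for (size_t i = 0; i < n; i++) {
		if ((flags[i] & IDLE_INST_NAP) && nr < MAX_POWERNV_IDLE_STATES)
			strcpy(states[nr++].name, "Nap");
		if ((flags[i] & IDLE_INST_SLEEP) && nr < MAX_POWERNV_IDLE_STATES)
			strcpy(states[nr++].name, "FastSleep");
	}
	return nr;
}
```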

Signed-off-by: Srivatsa S. Bhat 
[ Changelog added by  ]
Signed-off-by: Preeti U Murthy 
---

 arch/powerpc/include/asm/processor.h |4 
 drivers/cpuidle/cpuidle-powernv.c|7 +++
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index d660dc3..d922e5c 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -445,6 +445,10 @@ static inline unsigned long get_clean_sp(unsigned long sp, int is_32)
 }
 #endif
 
+/* Support for 'nap' and 'sleep' instructions, as discovered from the DT */
+#define IDLE_INST_NAP  0x0001 /* nap instruction can be used */
+#define IDLE_INST_SLEEP 0x0002 /* sleep instruction can be used */
+
 extern unsigned long cpuidle_disable;
 enum idle_boot_override {IDLE_NO_OVERRIDE = 0, IDLE_POWERSAVE_OFF};
 
diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index 719f6fb..5d4f9e8 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -17,12 +17,11 @@
 #include 
 #include 
 #include 
+#include 
 
 /* Flags and constants used in PowerNV platform */
 
 #define MAX_POWERNV_IDLE_STATES 8
-#define IDLE_USE_INST_NAP  0x0001 /* Use nap instruction */
-#define IDLE_USE_INST_SLEEP 0x0002 /* Use sleep instruction */
 
 struct cpuidle_driver powernv_idle_driver = {
.name = "powernv_idle",
@@ -187,7 +186,7 @@ static int powernv_add_idle_states(void)
 
for (i = 0; i < dt_idle_states; i++) {
 
-   if (flags[i] & IDLE_USE_INST_NAP) {
+   if (flags[i] & IDLE_INST_NAP) {
/* Add NAP state */
strcpy(powernv_states[nr_idle_states].name, "Nap");
strcpy(powernv_states[nr_idle_states].desc, "Nap");
@@ -198,7 +197,7 @@ static int powernv_add_idle_states(void)
nr_idle_states++;
}
 
-   if (flags[i] & IDLE_USE_INST_SLEEP) {
+   if (flags[i] & IDLE_INST_SLEEP) {
/* Add FASTSLEEP state */
strcpy(powernv_states[nr_idle_states].name, "FastSleep");
strcpy(powernv_states[nr_idle_states].desc, "FastSleep");

[PATCH 0/6] ppc, kvm, cpuidle: Allow offline and kvm standby threads to enter fastsleep

2014-05-27 Thread Preeti U Murthy

Fast sleep is a deep idle state on Power8. Support for the state was
added in commit 0d94873011. Today, idle threads in the host can
potentially be put into fast sleep. But when we launch guests using KVM,
the secondary threads are required to be offline, and offline threads
are put into nap. Likewise, when secondary threads are woken up to run
guests and eventually go idle, or when the guest is killed, they enter
nap. So when the entire core goes idle in either scenario, the power
savings we can obtain are limited to those of napping the CPUs. This
patchset allows the threads to enter fast sleep in both cases.
---

Srivatsa S. Bhat (6):
  powernv, cpuidle: Move the flags used for idle state discovery to powernv core
  powerpc, powernv, CPU hotplug: Put offline CPUs in Fast-Sleep instead of Nap
  KVM: PPC: Book3S HV: Enable CPUs to run guest after waking up from fast-sleep
  KVM: PPC: Book3S HV: Consolidate the idle-state enter sequence in KVM
  KVM: PPC: Book3S HV: Put KVM standby hwthreads to fast-sleep instead of nap
  ppc,book3s: Go back to same idle state after handling machine check interrupt


 arch/powerpc/include/asm/processor.h|   12 
 arch/powerpc/kernel/exceptions-64s.S|   51 +--
 arch/powerpc/kernel/idle.c  |   52 
 arch/powerpc/kernel/idle_power7.S   |2 -
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |  103 ---
 arch/powerpc/platforms/powernv/smp.c|   12 +++-
 drivers/cpuidle/cpuidle-powernv.c   |7 +-
 7 files changed, 190 insertions(+), 49 deletions(-)

RE: [PATCH v6 0/6] add cpuidle support for Exynos5420

2014-05-27 Thread Kukjin Kim

Chander Kashyap wrote:
> 
> On 26 May 2014 15:59, Tomasz Figa  wrote:
> > Hi Chander,
> >
> > On 16.05.2014 10:03, Chander Kashyap wrote:
> >> Exynos5420 is a big-little Soc from Samsung. It has 4 A15 and 4 A7 cores.
> >>
> >> This patchset adds cpuidle support for Exynos5420 SoC based on
> >> generic big.little cpuidle driver.
> >>
> >> Tested on SMDK5420.
> >>
> >> This patch set depends on:
> >>   1. [PATCH 0/5] MCPM backend for Exynos5420
> >>  http://www.spinics.net/lists/arm-kernel/msg331100.html
> >> Changelog is in respective patches.
> >> Chander Kashyap (5):
> >>   driver: cpuidle-big-little: add of_device_id structure
> >>   arm: exynos: add generic function to calculate cpu number
> >>   cpuidle: config: Add ARCH_EXYNOS entry to select cpuidle-big-little driver
> >>   driver: cpuidle: cpuidle-big-little: init driver for Exynos5420
> >>   exynos: cpuidle: do not allow cpuidle registration for Exynos5420
> >>   mcpm: exynos: populate suspend and powered_up callbacks
> >>
> >>  arch/arm/mach-exynos/exynos.c|4 +++-
> >>  arch/arm/mach-exynos/mcpm-exynos.c   |   36 ++
> >>  arch/arm/mach-exynos/regs-pmu.h  |9 +
> >>  drivers/cpuidle/Kconfig.arm  |2 +-
> >>  drivers/cpuidle/cpuidle-big_little.c |   12 +++-
> >>  5 files changed, 60 insertions(+), 3 deletions(-)
> >>
> >
> > For the whole series,
> >
> > Reviewed-by: Tomasz Figa 
> 
> Thanks Tomasz.
> 
> Dear Kukjin,
> Can you take these patches?
> >
I took a quick look at this series and it looks good to me, but I need
an ack from the cpuidle maintainers, Rafael or Daniel.

Thanks,
Kukjin

Re: [PATCH v6 0/6] add cpuidle support for Exynos5420

2014-05-27 Thread Chander Kashyap

On 26 May 2014 15:59, Tomasz Figa  wrote:
> Hi Chander,
>
> On 16.05.2014 10:03, Chander Kashyap wrote:
>> Exynos5420 is a big-little Soc from Samsung. It has 4 A15 and 4 A7 cores.
>>
>> This patchset adds cpuidle support for Exynos5420 SoC based on
>> generic big.little cpuidle driver.
>>
>> Tested on SMDK5420.
>>
>> This patch set depends on:
>>   1. [PATCH 0/5] MCPM backend for Exynos5420
>>  http://www.spinics.net/lists/arm-kernel/msg331100.html
>> Changelog is in respective patches.
>> Chander Kashyap (5):
>>   driver: cpuidle-big-little: add of_device_id structure
>>   arm: exynos: add generic function to calculate cpu number
>>   cpuidle: config: Add ARCH_EXYNOS entry to select cpuidle-big-little driver
>>   driver: cpuidle: cpuidle-big-little: init driver for Exynos5420
>>   exynos: cpuidle: do not allow cpuidle registration for Exynos5420
>>   mcpm: exynos: populate suspend and powered_up callbacks
>>
>>  arch/arm/mach-exynos/exynos.c|4 +++-
>>  arch/arm/mach-exynos/mcpm-exynos.c   |   36 ++
>>  arch/arm/mach-exynos/regs-pmu.h  |9 +
>>  drivers/cpuidle/Kconfig.arm  |2 +-
>>  drivers/cpuidle/cpuidle-big_little.c |   12 +++-
>>  5 files changed, 60 insertions(+), 3 deletions(-)
>>
>
> For the whole series,
>
> Reviewed-by: Tomasz Figa 

Thanks Tomasz.

Dear Kukjin,
Can you take these patches?
>
> Best regards,
> Tomasz



-- 
with warm regards,
Chander Kashyap

Re: [PATCHv5 2/4] mailbox: Introduce framework for mailbox

2014-05-27 Thread Jassi Brar

On Wed, May 21, 2014 at 10:57 PM, Mark Brown  wrote:
> On Thu, May 15, 2014 at 11:41:00AM +0530, Jassi Brar wrote:
>> Introduce common framework for client/protocol drivers and
>> controller drivers of Inter-Processor-Communication (IPC).
>
> This looks pretty nice, though I do have a few *very* small nits beyond
> those Arnd had.
>
>> + if (chan->cl->tx_block && chan->active_req) {
>> + int ret;
>> + init_completion(>tx_complete);
>
> reinit_completion().
>
>> + if (!cl->tx_tout) /* wait for ever */
>> + cl->tx_tout = msecs_to_jiffies(360);
>> + else
>> + cl->tx_tout = msecs_to_jiffies(cl->tx_tout);
>
> Is the default wait for ever the best timeout - I'm not sure it's best
> from a defensiveness point of view.  It should be fine either way,
> it's just a matter of taste.
>
The client wants the call to be blocking. Out of 'zero', 'infinity'
and some 'valid' delay, 'infinity' makes more sense than zero or some
other value that might happen to be valid for a particular platform. I
take 1hr as 'infinity', though I am open to better suggestions. Maybe
add a WARN()?
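Below is a sketch of the default under discussion. It uses a hypothetical constant that encodes the stated one hour (3,600,000 ms) and a hypothetical helper name. In the framework, the resolved value would then be passed through `msecs_to_jiffies()`:

```c
#include <assert.h>

/* Hypothetical: "infinity" approximated as one hour, in milliseconds. */
#define MBOX_TX_TOUT_FOREVER_MS 3600000UL

/*
 * Resolve a blocking client's requested TX timeout in milliseconds:
 * 0 means "wait for ever"; any other value is used as given.
 */
static unsigned long mbox_resolve_tx_tout_ms(unsigned long requested_ms)
{
	return requested_ms ? requested_ms : MBOX_TX_TOUT_FOREVER_MS;
}
```

Keeping the fallback in one named constant would make it easy to add the suggested WARN() or change the value later.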


>> + ret = chan->mbox->ops->startup(chan);
>> + if (ret) {
>> + pr_err("Unable to startup the chan\n");
>
> Perhaps print the error codes?  Might be helpful to users.
>
OK.


BTW, I have not converted Highbank's PL320 and OMAP's controller and
client drivers. I believe Highbank's can't be converted to DT now and
Suman would want to convert the OMAP himself.

Also, maybe mailbox patches could be upstreamed via, say, arm-soc tree?

Regards,
Jassi

Re: [RFC/PATCH] ksm: add vma size threshold parameter

2014-05-27 Thread Hugh Dickins

On Tue, 27 May 2014, Vitaly Wool wrote:

> Hi,
> 
> I have recently been poking around saving memory on low-RAM Android devices,
> basically
> following the Google KSM+ZRAM guidelines for KitKat and measuring the
> gain/performance.
> While getting quite some RAM savings indeed (in the range of 10k-20k pages)
> we noticed
> that kswapd used a lot of CPU cycles most of the time, and that iowait times
> reported
> by e. g. top were sometimes off the reasonable limits (up to 40%). From what
> I could see,
> the reason for that behavior at least in part is that KSM has to traverse
> really long
> VMA lists.
> 
> Android userspace should be held somewhat responsible for that since it
> "advises" KSM all
> MAP_PRIVATE|MAP_ANONYMOUS mmap'ed pages are mergeable while this seems to be
> exhaustive
> and not quite following the kernel KSM Documentation piece saying:
> "Applications should be considerate in their use of MADV_MERGEABLE,
> restricting its use to areas likely to benefit.  KSM's scans may use a lot
> of processing power: some installations will disable KSM for that reason."
> 
> As a mitigation to this, we suggest an additional parameter to be added to
> KSM
> sysfs-exported ones. It will allow for bypassing small VM areas advertised as
> mergeable
> and only add bigger ones to KSM lists, keeping the default behavior intact.
> 
> The RFC/patch code may then look like this:
> 
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 68710e8..069f6b0 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -232,6 +232,10 @@ static int ksm_nr_node_ids = 1;
>  #define ksm_nr_node_ids  1
>  #endif
>  
> +/* Threshold for minimal VMA size to consider */
> +static unsigned long ksm_vma_size_threshold = 4096;
> +
> +
>  #define KSM_RUN_STOP 0
>  #define KSM_RUN_MERGE 1
>  #define KSM_RUN_UNMERGE  2
> @@ -1757,6 +1761,9 @@ int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
>   return 0;
>  #endif
>  
> + if (end - start < ksm_vma_size_threshold)
> + return 0;
> +
>   if (!test_bit(MMF_VM_MERGEABLE, >flags)) {
>   err = __ksm_enter(mm);
>   if (err)
> @@ -2240,6 +2247,29 @@ static ssize_t merge_across_nodes_store(struct kobject *kobj,
>  KSM_ATTR(merge_across_nodes);
>  #endif
>  
> +static ssize_t vma_size_threshold_show(struct kobject *kobj,
> + struct kobj_attribute *attr, char *buf)
> +{
> + return sprintf(buf, "%lu\n", ksm_vma_size_threshold);
> +}
> +
> +static ssize_t vma_size_threshold_store(struct kobject *kobj,
> + struct kobj_attribute *attr,
> + const char *buf, size_t count)
> +{
> + int err;
> + unsigned long thresh;
> +
> + err = strict_strtoul(buf, 10, );
> + if (err || thresh > UINT_MAX)
> + return -EINVAL;
> +
> + ksm_vma_size_threshold = thresh;
> +
> + return count;
> +}
> +KSM_ATTR(vma_size_threshold);
> +
>  static ssize_t pages_shared_show(struct kobject *kobj,
>struct kobj_attribute *attr, char *buf)
>  {
> @@ -2297,6 +2327,7 @@ static struct attribute *ksm_attrs[] = {
>  #ifdef CONFIG_NUMA
>   _across_nodes_attr.attr,
>  #endif
> + _size_threshold_attr.attr,
>   NULL,
>  };
> 
> With our (narrow) use case, setting vma_size_threshold to 65536 significantly
> decreases the iowait time and the CPU idle load, while the KSM gain decreases
> only slightly (by 5-15%).
> 
> Any comments will be greatly appreciated,

It's interesting, even amusing, but I think the emphasis has to be on
your "(narrow) use case".

I can't see any particular per-vma overhead in KSM's scan; and what
little per-vma overhead there is (find_vma, vma->vm_next) includes
the non-mergeable vmas along with the mergeable ones.

And I don't think it's a universal rule of nature that small vmas are
less likely to contain identical pages than large ones - beyond, of
course, the obvious fact that small vmas are likely to contain fewer
pages than large ones, so to that degree less likely to have merge hits.

But you see a 'significantly'/'slightly' effect beyond that: any theory why?

I think it's just a feature of your narrow use case, and the adjustment
for it best made in userspace (or hacked into your own kernel if you
wish); but I cannot at present see the case for doing this in an
upstream kernel.

Hugh
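For reference, the behavior the RFC proposes can be modeled in userspace: the sysfs store parses a base-10 threshold, and the madvise path silently skips ranges smaller than it. The model_* names are hypothetical; this sketches the proposal only and says nothing about whether the heuristic is worthwhile, which is the point in dispute.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Model of the proposed knob; 4096 mirrors the default in the RFC. */
static unsigned long model_vma_size_threshold = 4096;

/*
 * Userspace model of vma_size_threshold_store(): accept a base-10
 * number with an optional trailing newline (as written by echo) and
 * reject anything else, or values above UINT_MAX, with -EINVAL.
 */
static int model_threshold_store(const char *buf)
{
    char *end;
    unsigned long thresh;

    if (buf[0] < '0' || buf[0] > '9')   /* no sign, no leading space */
        return -EINVAL;
    errno = 0;
    thresh = strtoul(buf, &end, 10);
    if (errno != 0)
        return -EINVAL;                 /* overflow */
    if (*end == '\n')
        end++;
    if (*end != '\0' || thresh > UINT_MAX)
        return -EINVAL;
    model_vma_size_threshold = thresh;
    return 0;
}

/*
 * Model of the check the RFC adds to ksm_madvise(): a range smaller
 * than the threshold is skipped (not registered as mergeable).
 */
static int model_range_registered(unsigned long start, unsigned long end)
{
    return end - start >= model_vma_size_threshold;
}
```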

Re: [PATCH] pci: Save and restore VFs as a part of a reset

2014-05-27 Thread Alex Williamson

On Tue, 2014-05-27 at 19:19 -0600, Bjorn Helgaas wrote:
> [+cc Alex, Don]
> 
> On Tue, May 27, 2014 at 5:53 PM, Alexander Duyck
>  wrote:
> > On 05/27/2014 03:22 PM, Bjorn Helgaas wrote:
> >> On Mon, May 05, 2014 at 02:25:17PM -0700, Alexander Duyck wrote:
> >>> This fixes an issue I found in which triggering a reset via the PCI sysfs
> >>> reset while SR-IOV was enabled would leave the VFs in a state in which the
> >>> BME and MSI-X enable bits were all cleared.
> >>>
> >>> To correct that I have added code so that the VF state is saved and
> >>> restored as a part of the PF save and restore state functions.  By doing
> >>> this the VF state is restored as well as the IOV state allowing the VFs
> >>> to resume function following a reset.
> >>>
> >>> Signed-off-by: Alexander Duyck 
> >>> ---
> >>>  drivers/pci/iov.c |   48 ++--
> >>>  drivers/pci/pci.c |2 ++
> >>>  drivers/pci/pci.h |5 +
> >>>  3 files changed, 53 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> >>> index de7a747..645ed71 100644
> >>> --- a/drivers/pci/iov.c
> >>> +++ b/drivers/pci/iov.c
> >>> @@ -521,13 +521,57 @@ resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev, int resno)
> >>>  }
> >>>
> >>>  /**
> >>> + * pci_save_iov_state - Save the state of the VF configurations
> >>> + * @dev: the PCI device
> >>> + */
> >>> +int pci_save_iov_state(struct pci_dev *dev)
> >>> +{
> >>> +struct pci_dev *vfdev = NULL;
> >>> +unsigned short dev_id;
> >>> +
> >>> +/* only search if we are a PF */
> >>> +if (!dev->is_physfn)
> >>> +return 0;
> >>> +
> >>> +/* retrieve VF device ID */
> >>> +pci_read_config_word(dev, dev->sriov->pos + PCI_SRIOV_VF_DID, _id);
> ...
> 
> >>> +/* loop through all the VFs and save their state information */
> >>> +while ((vfdev = pci_get_device(dev->vendor, dev_id, vfdev))) {
> >>> +if (vfdev->is_virtfn && (vfdev->physfn == dev)) {
> >>> +int err = pci_save_state(vfdev);
> >>
> >> It makes me uneasy to operate on another device (we're resetting A, and
> >> here we save state for B).  I know B is dependent on A, since B is a VF
> >> related to PF A, but what synchronization is there to serialize this
> >> against any other save/restore operations that may be in progress by B's
> >> driver or by a sysfs operation on B?
> >
> > I don't believe there is any synchronization mechanism in place
> > currently.  I can look into that as well.  Odds are we probably need to
> > have the VFs check the parent lock before they take any independent action.
> 
> It's just the whole question of how we manage the single "saved-state"
> area.  Right now, I think almost all use of it is under control of the
> driver that owns the device, in suspend/resume methods.  The
> exceptions are the PM suspend/freeze/etc. routines in
> pci/pci-driver.c, which I assume prevent the driver from running and
> are therefore safe, and the reset path.  I don't know how the

It makes me a little uneasy too: what happens to a transaction headed
to/from the VF while the PF is in a reset state?  I suspect nothing
good.  OTOH, the reset interface and a good bit of pci-sysfs have
always been at-your-own-risk interfaces, and this restores some bits that
might get us closer to it being survivable.

We do have a way for drivers to get a long-term save state that they can
keep on their own, pci_save_state(); pci_store_saved_state() along with
pci_load_saved_state(); pci_restore_state().  Both KVM and VFIO use this
for assigning a device so we can attempt to re-load the pre-assigned
saved state.
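The long-term save/restore sequence mentioned above can be pictured with a small userspace model: one per-device saved-state area, plus a private copy the driver keeps so that a later save by someone else does not lose its snapshot. The structures and model_* functions are hypothetical stand-ins that only loosely mirror pci_save_state(), pci_store_saved_state(), pci_load_saved_state() and pci_restore_state(); the 16-word "config space" is illustrative.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_CFG_WORDS 16      /* hypothetical tiny config space */

struct model_pci_dev {
    unsigned int cfg[MODEL_CFG_WORDS];    /* "live" registers */
    unsigned int saved[MODEL_CFG_WORDS];  /* single per-device save area */
    bool state_saved;
};

struct model_saved_state {
    unsigned int cfg[MODEL_CFG_WORDS];
};

/* Loosely like pci_save_state(): snapshot live registers. */
static void model_save_state(struct model_pci_dev *d)
{
    memcpy(d->saved, d->cfg, sizeof(d->saved));
    d->state_saved = true;
}

/* Loosely like pci_store_saved_state(): a copy the driver owns. */
static struct model_saved_state *
model_store_saved_state(const struct model_pci_dev *d)
{
    struct model_saved_state *s;

    if (!d->state_saved)
        return NULL;
    s = malloc(sizeof(*s));
    if (s)
        memcpy(s->cfg, d->saved, sizeof(s->cfg));
    return s;
}

/* Loosely like pci_load_saved_state(): put the copy back in the area. */
static void model_load_saved_state(struct model_pci_dev *d,
                                   const struct model_saved_state *s)
{
    memcpy(d->saved, s->cfg, sizeof(d->saved));
    d->state_saved = true;
}

/* Loosely like pci_restore_state(): write the area back, once. */
static void model_restore_state(struct model_pci_dev *d)
{
    if (!d->state_saved)
        return;
    memcpy(d->cfg, d->saved, sizeof(d->cfg));
    d->state_saved = false;
}
```

In the usage below, word 1 stands in for the command register (0x6 = memory space + bus master enable); a second save after the "reset" clobbers the shared area, and the driver's private copy is what brings the bits back.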

> >> Is there anything in the reset path that pays attention to whether
> >> resetting this PF will clobber VFs?  Do we care whether those VFs are in
> >> use?  I assume they might be in use by guests?
> >
> > The problem I found was that the sysfs reset call doesn't bother to
> > check with the PF driver at all.  It just clobbers the PF and any VFs on
> > it without talking to the PF driver.
> 
> There is Keith Busch's recent patch:
> http://git.kernel.org/cgit/linux/kernel/git/helgaas/pci.git/commit/?h=pci/hotplug=3ebe7f9f7e4a4fd1f6461ecd01ff2961317a483a
> .  I dunno if that's useful to you or not.
> 
> And I'm not sure there's actually a requirement to *have* a PF driver.
>  Obviously there has to be a way to enable the VFs, but once they're
> enabled, it might be possible to keep using them via VF drivers even
> without a PF driver in the picture.
> 
> Maybe resetting the PF should just fail if there's an active VF.  If
> you need to reset the PF, you'd have to unbind the VFs first.

The use case is certainly questionable; personally, I'm not going to
expect VFs to continue working after the PF is reset.  Driver binding
gets complicated, especially when KVM doesn't actually bind devices to
use them.  Hopefully we'll get that out of the tree some day
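Bjorn's alternative, failing a PF reset while any of its VFs is active, could look roughly like this userspace model. It reuses the patch's VF filter (is_virtfn and physfn == PF); treating "bound to a driver" as "active" is an assumption, and since KVM device assignment does not bind a driver, it is an imperfect proxy. All names are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for a PF and its VFs. */
struct model_fn {
    bool is_virtfn;
    const struct model_fn *physfn;   /* owning PF, for VFs */
    bool driver_bound;               /* "active" in this model */
};

/*
 * Model of the suggested policy: refuse to reset a PF while any of its
 * VFs is bound to a driver. Returns 0 if the reset may proceed, or
 * -EBUSY otherwise.
 */
static int model_pf_reset_allowed(const struct model_fn *pf,
                                  const struct model_fn *devs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        const struct model_fn *f = &devs[i];

        /* same filter as the patch: a VF whose physfn is this PF */
        if (f->is_virtfn && f->physfn == pf && f->driver_bound)
            return -EBUSY;
    }
    return 0;
}
```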

Re: [PATCH 1/4] lib/debugobjects.c: convert printk to pr_foo()

2014-05-27 Thread Josh Triplett

On Tue, May 27, 2014 at 04:25:54PM +0200, Fabian Frederick wrote:
> On Sat, 24 May 2014 20:40:43 -0700
> Josh Triplett  wrote:
> 
> > On Sun, May 25, 2014 at 05:18:36AM +0200, Fabian Frederick wrote:
> > > On Sat, 24 May 2014 14:53:22 -0700
> > > Josh Triplett  wrote:
> > > 
> > > > On Sat, May 24, 2014 at 03:06:08PM +0200, Fabian Frederick wrote:
> > > > > Convert all except KERN_DEBUG
> > > > 
> > > > Why not KERN_DEBUG?
> > > printk(KERN_DEBUG ...) can't be converted to pr_debug() the same way as
> > > other printk() calls.
> > 
> > True, but I don't see any obvious reason why that prevents you from
> > converting them.  More importantly, though, you should explain the reason
> > in the changelog.
> 
>   There's no documentation for this in mainline yet, but there is in linux-next:
> commit 8ce2658fc31bb7
> 
>   I can submit one more patch with the pr_debug conversion, using #define DEBUG
> or compiling with -DDEBUG.

Definitely keep it as a separate patch, but I don't think you should
define DEBUG by default; just let people turn it on if desired, and
leave it out otherwise.  There's only a single KERN_DEBUG printk in this
driver, and it seems fine to just convert to pr_debug so it gets left
out of normal builds.

- Josh Triplett
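The point about pr_debug() being left out of normal builds can be illustrated with a userspace model of the pattern: without DEBUG the call evaluates to 0 and prints nothing, yet the format string and arguments are still type-checked. The macro, the function and the message text are hypothetical; the kernel's pr_debug() can additionally be controlled through dynamic debug, which this model ignores.

```c
#include <stdio.h>

#ifdef DEBUG
#define model_pr_debug(...) printf(__VA_ARGS__)
#else
/* Never executed, but the compiler still checks format and arguments. */
#define model_pr_debug(...) (0 ? printf(__VA_ARGS__) : 0)
#endif

/* Hypothetical call site standing in for the single KERN_DEBUG printk. */
static int model_report(int nr_objects)
{
    return model_pr_debug("debugobjects: %d objects tracked\n", nr_objects);
}
```

Building with -DDEBUG turns the call back into a real printf(), which is the "let people turn it on if desired" behavior described above.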

Re: [PATCH] swap: Avoid scanning invalidated region for cheap seek

2014-05-27 Thread Hugh Dickins

On Mon, 26 May 2014, Chen Yucong wrote:

> For cheap seek, when we scan the region between si->lowest_bit
> and scan_base, if scan_base is greater than si->highest_bit, the
> scan operation between si->highest_bit and scan_base is
> unnecessary.
> 
> This patch avoids scanning the invalidated region for
> cheap seek.
> 
> Signed-off-by: Chen Yucong 

I was going to suggest that you are adding a little code to a common
path, in order to optimize a very unlikely case: which does not seem
worthwhile to me.

But digging a little deeper, I think you have hit upon something more
interesting (though still in no need of your patch): it looks to me
like that is not even a common path, but dead code.

Shaohua, am I missing something, or does all SWP_SOLIDSTATE "seek is
cheap" now go your si->cluster_info scan_swap_map_try_ssd_cluster()
route?  So that the "last_in_cluster < scan_base" loop in the body
of scan_swap_map() is just redundant, and should have been deleted?

Hugh

> ---
>  mm/swapfile.c |5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index bf8..7f0f27e 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -489,6 +489,7 @@ static unsigned long scan_swap_map(struct swap_info_struct *si,
>  {
>   unsigned long offset;
>   unsigned long scan_base;
> + unsigned long upper_bound;
>   unsigned long last_in_cluster = 0;
>   int latency_ration = LATENCY_LIMIT;
>  
> @@ -551,9 +552,11 @@ static unsigned long scan_swap_map(struct swap_info_struct *si,
>  
>   offset = si->lowest_bit;
>   last_in_cluster = offset + SWAPFILE_CLUSTER - 1;
> + upper_bound = (scan_base <= si->highest_bit) ?
> + scan_base : (si->highest_bit + 1);
>  
>   /* Locate the first empty (unaligned) cluster */
> - for (; last_in_cluster < scan_base; offset++) {
> + for (; last_in_cluster < upper_bound; offset++) {
>   if (si->swap_map[offset])
>   last_in_cluster = offset + SWAPFILE_CLUSTER;
>   else if (offset == last_in_cluster) {
> -- 
> 1.7.10.4
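The clamp the patch introduces can be modeled in userspace together with a simplified version of the "first empty cluster" loop, so that the search never reads past highest_bit even when scan_base is larger. MODEL_CLUSTER is a hypothetical small cluster size (the kernel's SWAPFILE_CLUSTER is much larger), and the model ignores locking and latency_ration; it illustrates the patch, not Hugh's observation that the loop may be dead code.

```c
/* Hypothetical small cluster size; SWAPFILE_CLUSTER is much larger. */
#define MODEL_CLUSTER 4UL

/* The patch's clamp: never search past highest_bit. */
static unsigned long model_upper_bound(unsigned long scan_base,
                                       unsigned long highest_bit)
{
    return scan_base <= highest_bit ? scan_base : highest_bit + 1;
}

/*
 * Simplified model of the "first empty (unaligned) cluster" loop using
 * the clamped bound. map[i] != 0 means slot i is in use; map must hold
 * at least highest_bit + 1 entries. Returns the first offset of
 * MODEL_CLUSTER consecutive free slots, or -1 if none is found.
 */
static long model_find_free_cluster(const unsigned char *map,
                                    unsigned long lowest_bit,
                                    unsigned long highest_bit,
                                    unsigned long scan_base)
{
    unsigned long bound = model_upper_bound(scan_base, highest_bit);
    unsigned long offset = lowest_bit;
    unsigned long last_in_cluster = offset + MODEL_CLUSTER - 1;

    for (; last_in_cluster < bound; offset++) {
        if (map[offset])                    /* busy: restart after it */
            last_in_cluster = offset + MODEL_CLUSTER;
        else if (offset == last_in_cluster) /* whole window free */
            return (long)(offset - (MODEL_CLUSTER - 1));
    }
    return -1;
}
```

Because offset never exceeds last_in_cluster, and last_in_cluster stays below the clamped bound inside the loop, every map[offset] access stays within [lowest_bit, highest_bit].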

[PATCH 1/1] ARM: ux500: Staticize ux500_soc_attr

2014-05-27 Thread Sachin Kamat

'ux500_soc_attr' is local to this file. While at it, also make it
const to match the argument list of device_create_file().

Signed-off-by: Sachin Kamat 
---
 arch/arm/mach-ux500/cpu.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/mach-ux500/cpu.c b/arch/arm/mach-ux500/cpu.c
index db16b5a04ad5..dbb2970ee7da 100644
--- a/arch/arm/mach-ux500/cpu.c
+++ b/arch/arm/mach-ux500/cpu.c
@@ -125,7 +125,7 @@ static void __init soc_info_populate(struct soc_device_attribute *soc_dev_attr,
soc_dev_attr->revision = ux500_get_revision();
 }
 
-struct device_attribute ux500_soc_attr =
+static const struct device_attribute ux500_soc_attr =
__ATTR(process,  S_IRUGO, ux500_get_process,  NULL);
 
 struct device * __init ux500_soc_device_init(const char *soc_id)
-- 
1.7.9.5


Re: [PATCH v4 1/5] devicetree: bindings: document Broadcom CPU enable method

2014-05-27 Thread Alex Elder

On 05/27/2014 06:49 AM, Lorenzo Pieralisi wrote:
> On Tue, May 20, 2014 at 06:43:46PM +0100, Alex Elder wrote:
>> Broadcom mobile SoCs use a ROM-implemented holding pen for
>> controlled boot of secondary cores.  A special register is
>> used to communicate to the ROM that a secondary core should
>> start executing kernel code.  This enable method is currently
>> used for members of the bcm281xx and bcm21664 SoC families.
>>
>> The use of an enable method also allows the SMP operation vector to
>> be assigned as a result of device tree content for these SoCs.
>>
>> Signed-off-by: Alex Elder 
> 
> This is getting out of control; it is absolutely ghastly. I wonder how
> I can manage to keep cpus.txt updated if everyone with a boot method
> du jour adds it to cpus.txt, and honestly in this specific case it is
> even hard to understand why.

OK, in this message I'll focus on the particulars of this
proposed binding.

> Can't it be done with bindings for the relevant register address space
> (regmap?), with platform code just calling the register driver to set up
> the jump address? It is platform-specific code anyway; there is no way
> you can make this generic.

I want to clarify what you're after here.

My aim is to add SMP support for a class of Broadcom SMP
machines.  To do so, I'm told I need to use the technique
of assigning the SMP operations vector as a result of
identifying an enable method in the DT.

For 32-bit ARM, there are no generic "enable-method" values.
(I did attempt to create one for "spin-table" but that was
rejected by Russell King.)  For the machines I'm trying to
enable, secondary CPUs start out spinning in a ROM-based
holding pen, and there is no need for a kernel-based one.

However, like a spin-table/holding pen enable method, a
memory location is required for coordination between the
boot CPU running kernel code and secondary CPUs running ROM
code.  My proposal specifies it using a special numeric
property value named "secondary-boot-reg" in the "cpus"
node in the DT.

And as I understand it, the issue you have relates to how
this memory location is specified.

You suggest regmap.  I'm using a single 32-bit register,
only at very early boot time, and thereafter access to
it is meaningless.  It seems like overkill if it's only
used for this purpose.  I could hide the register values
in the code, but with the exception of that, the code I'm
using is generic (in the context of this class of Broadcom
machine).  I could specify the register differently somehow,
in a different node, or with a different property.

The bottom line here is I'm not sure whether I understand
what you're suggesting, or perhaps why what you suggest is
preferable.  I'm very open to suggestions; I just need it
laid out in a bit more detail in order to respond directly.

Thanks.

-Alex

> I really do not see the point in cluttering cpus.txt with this stuff; it
> is a platform-specific hack, and does not belong in generic bindings in my
> opinion.
> 
> Thanks,
> Lorenzo
> 
>> ---
>>  Documentation/devicetree/bindings/arm/cpus.txt | 12 
>>  1 file changed, 12 insertions(+)
>>
>> diff --git a/Documentation/devicetree/bindings/arm/cpus.txt b/Documentation/devicetree/bindings/arm/cpus.txt
>> index 333f4ae..c6a2411 100644
>> --- a/Documentation/devicetree/bindings/arm/cpus.txt
>> +++ b/Documentation/devicetree/bindings/arm/cpus.txt
>> @@ -185,6 +185,7 @@ nodes to be present and contain the properties described below.
>>  "qcom,gcc-msm8660"
>>  "qcom,kpss-acc-v1"
>>  "qcom,kpss-acc-v2"
>> +"brcm,bcm11351-cpu-method"
>>  
>>  - cpu-release-addr
>>  Usage: required for systems that have an "enable-method"
>> @@ -209,6 +210,17 @@ nodes to be present and contain the properties described below.
>>  Value type: 
>>  Definition: Specifies the ACC[2] node associated with this CPU.
>>  
>> +- secondary-boot-reg
>> +Usage:
>> +Required for systems that have an "enable-method"
>> +property value of "brcm,bcm11351-cpu-method".
>> +Value type: 
>> +Definition:
>> +Specifies the physical address of the register used to
>> +request that the ROM holding pen code release a secondary
>> +CPU.  The value written to the register is formed by
>> +encoding the target CPU id into the low bits of the
>> +physical start address it should jump to.
>>  
>>  Example 1 (dual-cluster big.LITTLE system 32-bit):
>>  
>> -- 
>> 1.9.1
>>
>>
> 
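To make the "secondary-boot-reg" encoding concrete, here is a userspace sketch that packs a CPU id into the low bits of a jump address. The 2-bit field width and the requirement that those address bits be zero are assumptions for illustration; the binding text does not say how many low bits are used.

```c
#include <stdint.h>

/* Assumed width of the CPU id field in the low address bits. */
#define MODEL_CPU_ID_BITS 2u
#define MODEL_CPU_ID_MASK ((1u << MODEL_CPU_ID_BITS) - 1u)

/*
 * Encode "jump here, this CPU" into one register value. Returns 0 and
 * stores the value in *out, or -1 if the address is not suitably
 * aligned or the CPU id does not fit in the assumed field.
 */
static int model_encode_boot_value(uint32_t jump_addr, unsigned int cpu_id,
                                   uint32_t *out)
{
    if (jump_addr & MODEL_CPU_ID_MASK)   /* low bits must be free */
        return -1;
    if (cpu_id > MODEL_CPU_ID_MASK)      /* id must fit in the field */
        return -1;
    *out = jump_addr | cpu_id;
    return 0;
}
```

For example, jump address 0x80008000 with CPU 1 encodes to 0x80008001 under these assumptions.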


Re: [PATCH v3] PCI: Introduce new device binding path using pci_dev.driver_override

2014-05-27 Thread Greg KH

On Tue, May 27, 2014 at 09:07:42PM -0600, Bjorn Helgaas wrote:
> On Tue, May 20, 2014 at 08:53:21AM -0600, Alex Williamson wrote:
> > The driver_override field allows us to specify the driver for a device
> > rather than relying on the driver to provide a positive match of the
> > device.  This shortcuts the existing process of looking up the vendor
> > and device ID, adding them to the driver new_id, binding the device,
> > then removing the ID, but it also provides a couple advantages.
> > 
> > First, the above existing process allows the driver to bind to any
> > device matching the new_id for the window where it's enabled.  This is
> > often not desired, such as the case of trying to bind a single device
> > to a meta driver like pci-stub or vfio-pci.  Using driver_override we
> > can do this deterministically using:
> > 
> > echo pci-stub > /sys/bus/pci/devices/:03:00.0/driver_override
> > echo :03:00.0 > /sys/bus/pci/devices/:03:00.0/driver/unbind
> > echo :03:00.0 > /sys/bus/pci/drivers_probe
> > 
> > Previously we could not invoke drivers_probe after adding a device
> > to new_id for a driver as we get non-deterministic behavior whether
> > the driver we intend or the standard driver will claim the device.
> > Now it becomes a deterministic process, only the driver matching
> > driver_override will probe the device.
> > 
> > To return the device to the standard driver, we simply clear the
> > driver_override and reprobe the device:
> > 
> > echo > /sys/bus/pci/devices/:03:00.0/driver_override
> > echo :03:00.0 > /sys/bus/pci/devices/:03:00.0/driver/unbind
> > echo :03:00.0 > /sys/bus/pci/drivers_probe
> > 
> > Another advantage to this approach is that we can specify a driver
> > override to force a specific binding or prevent any binding.  For
> > instance when an IOMMU group is exposed to userspace through VFIO
> > we require that all devices within that group are owned by VFIO.
> > However, devices can be hot-added into an IOMMU group, in which case
> > we want to prevent the device from binding to any driver (override
> > driver = "none") or perhaps have it automatically bind to vfio-pci.
> > With driver_override it's a simple matter for this field to be set
> > internally when the device is first discovered to prevent driver
> > matches.
> > 
> > Signed-off-by: Alex Williamson 
> > Cc: Greg Kroah-Hartman 
> 
> Greg, are you going to weigh in on this?  It does seem to solve some real
> problems.  ISTR you had an opinion once, but I don't know your current
> thoughts.

Give me a few more days, still digging through my patch backlog...

thanks,

greg k-h
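The matching rule in the quoted commit message can be sketched as a userspace model: a write stores the string with its trailing newline removed, an empty write clears the override, and while an override is set only a driver with that exact name matches. The structs, IDs and driver names are hypothetical stand-ins, not the kernel's pci_dev or pci_driver.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-ins for a PCI device and a driver. */
struct model_dev {
    char *driver_override;          /* NULL: no override set */
    unsigned short vendor, device;
};

struct model_drv {
    const char *name;
    unsigned short vendor, device;  /* one-entry ID table */
};

/*
 * Model of a write to driver_override: keep a private copy, cut it at
 * the first newline (echo appends one), and treat an empty result as
 * clearing the override. Returns 0, or -1 if allocation fails.
 */
static int model_override_store(struct model_dev *dev, const char *buf)
{
    size_t len = strlen(buf);
    char *copy = malloc(len + 1);
    char *nl;

    if (!copy)
        return -1;
    memcpy(copy, buf, len + 1);
    nl = strchr(copy, '\n');
    if (nl)
        *nl = '\0';
    free(dev->driver_override);
    if (copy[0] == '\0') {
        free(copy);
        copy = NULL;
    }
    dev->driver_override = copy;
    return 0;
}

/* Model of matching: when an override is set, it is the only rule. */
static bool model_match(const struct model_dev *dev,
                        const struct model_drv *drv)
{
    if (dev->driver_override)
        return strcmp(dev->driver_override, drv->name) == 0;
    return dev->vendor == drv->vendor && dev->device == drv->device;
}
```

Writing a name that no driver uses (such as "none") leaves the device unmatched, which is the "prevent any binding" case from the commit message.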

[git pull] Please pull powerpc.git merge branch

2014-05-27 Thread Benjamin Herrenschmidt

Hi Linus !

Here's a pair of powerpc fixes for 3.15 which are also going to stable.

One's a fix for building with newer binutils (the problem currently only
affects the BookE kernels but the affected macro might come back into
use on BookS platforms at any time). Unfortunately, the binutils maintainer
made a backward-incompatible change to a construct that we use, so we have
to add a Makefile check.

The other one is a fix for CPUs getting stuck in kexec when running single
threaded. Since we routinely use kexec on power (including in our newer
bootloaders), I deemed that important enough.

Cheers,
Ben.

The following changes since commit 8050936caf125fbe54111ba5e696b68a360556ba:

  powerpc: irq work racing with timer interrupt can result in timer interrupt hang (2014-05-12 14:29:28 +1000)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git merge

for you to fetch changes up to 011e4b02f1da156ac7fea28a9da878f3c23af739:

  powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode (2014-05-28 13:24:26 +1000)


Guenter Roeck (1):
  powerpc: Fix 64 bit builds with binutils 2.24

Srivatsa S. Bhat (1):
  powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode

 arch/powerpc/Makefile  | 4 +++-
 arch/powerpc/include/asm/ppc_asm.h | 7 ++-
 arch/powerpc/kernel/machine_kexec_64.c | 2 +-
 kernel/kexec.c | 8 
 4 files changed, 18 insertions(+), 3 deletions(-)



Re: [PATCH] hv: use correct order when freeing monitor_pages

2014-05-27 Thread Jason Wang

On 05/28/2014 01:16 AM, Radim Krčmář wrote:
> We try to free two pages when only one has been allocated.
> The cleanup path is rarely taken, so I haven't found any trace that would
> fit, but I hope that free_pages_prepare() does catch it.
>
> Cc: sta...@vger.kernel.org
> Signed-off-by: Radim Krčmář 
> ---
>  Cc'd stable because the worst-case looks hard to debug.
>  Btw, can the module not be unloaded after we successfully connect?
>
>  drivers/hv/connection.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
> index 7f10c15..e84f452 100644
> --- a/drivers/hv/connection.c
> +++ b/drivers/hv/connection.c
> @@ -224,8 +224,8 @@ cleanup:
>   vmbus_connection.int_page = NULL;
>   }
>  
> - free_pages((unsigned long)vmbus_connection.monitor_pages[0], 1);
> - free_pages((unsigned long)vmbus_connection.monitor_pages[1], 1);
> + free_pages((unsigned long)vmbus_connection.monitor_pages[0], 0);
> + free_pages((unsigned long)vmbus_connection.monitor_pages[1], 0);
>   vmbus_connection.monitor_pages[0] = NULL;
>   vmbus_connection.monitor_pages[1] = NULL;
>  

Acked-by: Jason Wang 
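The arithmetic behind the fix: the second argument of free_pages() is an allocation order, and an order-n block spans 2^n pages, so passing 1 releases two pages where, per the patch description, only one was allocated. A tiny model, assuming 4 KiB pages:

```c
#include <stddef.h>

/* Assumption: 4 KiB pages, as on x86. */
#define MODEL_PAGE_SIZE ((size_t)4096)

/* A page-allocator block of order n covers 2^n contiguous pages. */
static size_t model_order_pages(unsigned int order)
{
    return (size_t)1 << order;
}

static size_t model_order_bytes(unsigned int order)
{
    return MODEL_PAGE_SIZE * model_order_pages(order);
}
```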

Re: fs/dcache.c - BUG: soft lockup - CPU#5 stuck for 22s! [systemd-udevd:1667]

2014-05-27 Thread Al Viro

On Tue, May 27, 2014 at 10:04:09AM +0300, Mika Westerberg wrote:
> On Tue, May 27, 2014 at 05:00:26AM +0100, Al Viro wrote:
> > On Tue, May 27, 2014 at 04:14:15AM +0100, Al Viro wrote:
> > 
> > > As a matter of fact, let's try this instead - retry the same sucker
> > > immediately in case if trylocks fail.  Comments?
> > 
> > Better yet, let's take "move back to shrink list" into dentry_kill()
> > itself.  Then we get consistent locking rules for dentry_kill() and
> > instead of unlock_on_failure we simply pass it NULL or the shrink
> > list to put the sucker back.  Mika, could you test this one and see
> > if it fixes that livelock?  The difference in behaviour is that in
> > case of trylock failure we hit that sucker again without letting
> > it ride all the way around the list, same as we do for other dentry_kill()
> > callers.
> 
> I tried this patch and unfortunately it still results in the same sort of
> livelock. I've attached the dmesg.
> 
> I also tried the serialization patch from Linus and it seemed to fix the
> problem. After several rounds of USB memory stick plug/unplug I haven't
> seen a single "soft lockup" warning in dmesg.
> 
> I'm able to reproduce the problem pretty easily, so if you have
> something else to try I'm more than happy to give it a try.

Could you try this and post the resulting log?  I'd really like to understand
what's going on there - are we really hitting trylock failures there and what
dentries are involved.
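The retry contract the patch in this message gives dentry_kill() (on a failed trylock, drop d_lock and hand the same dentry back so the caller retries it immediately) can be modeled single-threaded, with a plain flag standing in for each spinlock. All names are hypothetical, the shrink-list re-add is left out, and max_tries exists only so the model terminates. It shows that an immediate retry keeps spinning for as long as the other lock stays held; it does not diagnose the reported livelock.

```c
#include <stdbool.h>
#include <stddef.h>

/* Single-threaded model: a plain flag stands in for a spinlock. */
struct model_lock {
    bool held;
};

static void model_spin_lock(struct model_lock *l)
{
    l->held = true;            /* single-threaded: never contended */
}

static bool model_spin_trylock(struct model_lock *l)
{
    if (l->held)
        return false;
    l->held = true;
    return true;
}

static void model_spin_unlock(struct model_lock *l)
{
    l->held = false;
}

struct model_dentry {
    struct model_lock d_lock;
    struct model_lock *i_lock;   /* NULL models a negative dentry */
    bool killed;
};

/*
 * Called with d_lock held. On trylock failure, drop d_lock and return
 * the same dentry so the caller retries it; otherwise "kill" it,
 * release both locks and return NULL (the real function may return
 * the parent; the model has none).
 */
static struct model_dentry *model_dentry_kill(struct model_dentry *d)
{
    if (d->i_lock && !model_spin_trylock(d->i_lock)) {
        model_spin_unlock(&d->d_lock);
        return d;                /* try again with same dentry */
    }
    d->killed = true;
    if (d->i_lock)
        model_spin_unlock(d->i_lock);
    model_spin_unlock(&d->d_lock);
    return NULL;
}

/* The "goto again" retry, capped by max_tries; returns attempts used. */
static int model_shrink_one(struct model_dentry *d, int max_tries)
{
    int tries = 0;

    while (tries < max_tries) {
        tries++;
        model_spin_lock(&d->d_lock);
        if (model_dentry_kill(d) == NULL)
            break;
    }
    return tries;
}
```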

diff --git a/fs/dcache.c b/fs/dcache.c
index 42ae01e..75f56a6 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -38,6 +38,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "internal.h"
 #include "mount.h"
 
@@ -448,7 +449,7 @@ EXPORT_SYMBOL(d_drop);
  * Returns dentry requiring refcount drop, or NULL if we're done.
  */
 static struct dentry *
-dentry_kill(struct dentry *dentry, int unlock_on_failure)
+dentry_kill(struct dentry *dentry, struct list_head *shrink_list)
__releases(dentry->d_lock)
 {
struct inode *inode;
@@ -464,10 +465,10 @@ dentry_kill(struct dentry *dentry, int unlock_on_failure)
inode = dentry->d_inode;
if (inode && !spin_trylock(>i_lock)) {
 relock:
-   if (unlock_on_failure) {
-   spin_unlock(>d_lock);
-   cpu_relax();
-   }
+   if (shrink_list)
+   d_shrink_add(dentry, shrink_list);
+   spin_unlock(>d_lock);
+   cpu_relax();
return dentry; /* try again with same dentry */
}
if (!IS_ROOT(dentry))
@@ -542,6 +543,14 @@ out:
  * on the compiler to always get this right (gcc generally doesn't).
  * Real recursion would eat up our stack space.
  */
+static inline void dump(const char *s, struct dentry *dentry)
+{
+   if (unlikely(dentry->d_sb->s_magic == SYSFS_MAGIC)) {
+   printk(KERN_ERR "%s[%pd4]; CPU %d PID %d [%s]\n",
+   s, dentry, smp_processor_id(),
+   task_pid_nr(current), current->comm);
+   }
+}
 
 /*
  * dput - release a dentry
@@ -579,7 +588,9 @@ repeat:
return;
 
 kill_it:
-   dentry = dentry_kill(dentry, 1);
+   if (dentry->d_inode)
+   dump("dput", dentry);
+   dentry = dentry_kill(dentry, NULL);
if (dentry)
goto repeat;
 }
@@ -798,6 +809,7 @@ static void shrink_dentry_list(struct list_head *list)
 
while (!list_empty(list)) {
dentry = list_entry(list->prev, struct dentry, d_lru);
+again:
spin_lock(>d_lock);
/*
 * The dispose list is isolated and dentries are not accounted
@@ -815,22 +827,19 @@ static void shrink_dentry_list(struct list_head *list)
continue;
}
 
-   parent = dentry_kill(dentry, 0);
+   dump("shrink", dentry);
+   parent = dentry_kill(dentry, list);
/*
 * If dentry_kill returns NULL, we have nothing more to do.
 */
if (!parent)
continue;
 
+/* if trylocks have failed; just do it again */
if (unlikely(parent == dentry)) {
-   /*
-* trylocks have failed and d_lock has been held the
-* whole time, so it could not have been added to any
-* other lists. Just add it back to the shrink list.
-*/
-   d_shrink_add(dentry, list);
-   spin_unlock(>d_lock);
-   continue;
+   if (dentry->d_sb->s_magic == SYSFS_MAGIC)
+   printk(KERN_ERR "A");
+   goto again;
}
/*
 * We need to prune ancestors too. This is necessary to prevent
@@ -839,8 +848,10 @@ static void

Re: [PATCH v3] PCI: Introduce new device binding path using pci_dev.driver_override

2014-05-27 Thread Bjorn Helgaas

On Tue, May 20, 2014 at 08:53:21AM -0600, Alex Williamson wrote:
> The driver_override field allows us to specify the driver for a device
> rather than relying on the driver to provide a positive match of the
> device.  This shortcuts the existing process of looking up the vendor
> and device ID, adding them to the driver new_id, binding the device,
> then removing the ID, but it also provides a couple advantages.
> 
> First, the above existing process allows the driver to bind to any
> device matching the new_id for the window where it's enabled.  This is
> often not desired, such as the case of trying to bind a single device
> to a meta driver like pci-stub or vfio-pci.  Using driver_override we
> can do this deterministically using:
> 
> echo pci-stub > /sys/bus/pci/devices/:03:00.0/driver_override
> echo :03:00.0 > /sys/bus/pci/devices/:03:00.0/driver/unbind
> echo :03:00.0 > /sys/bus/pci/drivers_probe
> 
> Previously we could not invoke drivers_probe after adding a device
> to new_id for a driver as we get non-deterministic behavior whether
> the driver we intend or the standard driver will claim the device.
> Now it becomes a deterministic process, only the driver matching
> driver_override will probe the device.
> 
> To return the device to the standard driver, we simply clear the
> driver_override and reprobe the device:
> 
> echo > /sys/bus/pci/devices/:03:00.0/driver_override
> echo :03:00.0 > /sys/bus/pci/devices/:03:00.0/driver/unbind
> echo :03:00.0 > /sys/bus/pci/drivers_probe
> 
> Another advantage to this approach is that we can specify a driver
> override to force a specific binding or prevent any binding.  For
> instance when an IOMMU group is exposed to userspace through VFIO
> we require that all devices within that group are owned by VFIO.
> However, devices can be hot-added into an IOMMU group, in which case
> we want to prevent the device from binding to any driver (override
> driver = "none") or perhaps have it automatically bind to vfio-pci.
> With driver_override it's a simple matter for this field to be set
> internally when the device is first discovered to prevent driver
> matches.
> 
> Signed-off-by: Alex Williamson 
> Cc: Greg Kroah-Hartman 

Greg, are you going to weigh in on this?  It does seem to solve some real
problems.  ISTR you had an opinion once, but I don't know your current
thoughts.

Bjorn
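The matching rule described in the patch can be modelled in a few lines of userspace C. This is a minimal sketch under stated assumptions, not the code from the patch: `struct model_device`, `struct model_driver` and `model_match()` are hypothetical stand-ins for the PCI core structures, the ID table is reduced to a zero-terminated list of device IDs, and it assumes a matching override is sufficient on its own, since meta drivers like pci-stub typically carry no ID for the target device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins for the PCI core structures. */
struct model_driver {
	const char *name;
	const unsigned short *ids;	/* device IDs, 0-terminated; may be NULL */
};

struct model_device {
	unsigned short device_id;
	const char *driver_override;	/* NULL when no override is set */
};

/* Return true if drv is allowed to bind to dev. */
static bool model_match(const struct model_device *dev,
			const struct model_driver *drv)
{
	const unsigned short *id;

	assert(dev != NULL && drv != NULL);

	/* An override restricts binding to the driver with exactly that name. */
	if (dev->driver_override)
		return strcmp(dev->driver_override, drv->name) == 0;

	/* No override: normal ID-table matching. */
	for (id = drv->ids; id && *id; id++)
		if (*id == dev->device_id)
			return true;
	return false;
}
```

With this rule an override of "none" needs no special handling: it simply never matches, as long as no driver is registered under that name.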

> ---
> 
> v3: kfree() override buffer on device release, noted by Alex Graf
> 
> v2: Use strchr() as suggested by Guenter Roeck and adopted by the
> platform driver version of this same interface.
> 
>  Documentation/ABI/testing/sysfs-bus-pci |   21 
>  drivers/pci/pci-driver.c                |   25 +--
>  drivers/pci/pci-sysfs.c                 |   40 +++
>  drivers/pci/probe.c                     |    1 +
>  include/linux/pci.h                     |    1 +
>  5 files changed, 85 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
> index a3c5a66..898ddc4 100644
> --- a/Documentation/ABI/testing/sysfs-bus-pci
> +++ b/Documentation/ABI/testing/sysfs-bus-pci
> @@ -250,3 +250,24 @@ Description:
>   valid.  For example, writing a 2 to this file when sriov_numvfs
>   is not 0 and not 2 already will return an error. Writing a 10
>   when the value of sriov_totalvfs is 8 will return an error.
> +
> +What:		/sys/bus/pci/devices/.../driver_override
> +Date:		April 2014
> +Contact:	Alex Williamson 
> +Description:
> + This file allows the driver for a device to be specified which
> + will override standard static and dynamic ID matching.  When
> + specified, only a driver with a name matching the value written
> + to driver_override will have an opportunity to bind to the
> + device.  The override is specified by writing a string to the
> + driver_override file (echo pci-stub > driver_override) and
> + may be cleared with an empty string (echo > driver_override).
> + This returns the device to standard matching rules binding.
> + Writing to driver_override does not automatically unbind the
> + device from its current driver or make any attempt to
> + automatically load the specified driver.  If no driver with a
> + matching name is currently loaded in the kernel, the device
> + will not bind to any driver.  This also allows devices to
> + opt-out of driver binding using a driver_override name such as
> + "none".  Only a single driver may be specified in the override,
> + there is no support for parsing delimiters.
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index d911e0c..4393c12 100644
> --- a/drivers/pci/pci-driver.c
> +++
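The v2 changelog mentions strchr(), and the ABI text says an empty write clears the override. A hedged userspace sketch of that store path follows; `model_override_store()` is a hypothetical helper that uses malloc()/free() where the kernel would use its own allocators, and it only illustrates truncating at the first newline (which `echo` appends) and treating an empty string as "clear".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical model of writing to driver_override.
 * Returns 0 on success, -1 if memory cannot be allocated.
 */
static int model_override_store(char **override, const char *buf, size_t count)
{
	char *copy, *nl;

	assert(override != NULL && buf != NULL);

	copy = malloc(count + 1);
	if (!copy)
		return -1;
	memcpy(copy, buf, count);
	copy[count] = '\0';

	/* Cut at the first newline, as added by "echo". */
	nl = strchr(copy, '\n');
	if (nl)
		*nl = '\0';

	free(*override);
	if (copy[0] == '\0') {
		/* Empty string: clear the override, back to normal matching. */
		free(copy);
		*override = NULL;
	} else {
		*override = copy;
	}
	return 0;
}
```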

Re: [PATCH v3 1/4] mfd: intel_soc_pmic: Core driver

2014-05-27 Thread Zhu, Lejun



On 5/27/2014 11:35 PM, Lee Jones wrote:
>> This patch provides the common code for the intel_soc_pmic MFD driver, such 
>> as read/write register and set up IRQ.
(...)
>> +/*
>> +* Set and clear multiple bits of a PMIC register
>> +*/
>> +int intel_soc_pmic_update(int reg, u8 val, u8 mask)
>> +{
>> +int ret;
>> +
>> +mutex_lock(_lock);
>> +
>> +if (!pmic)
>> +ret = -EIO;
>> +else
>> +ret = regmap_update_bits(pmic->regmap, reg, mask, val);
>> +
>> +mutex_unlock(_lock);
>> +
>> +return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(intel_soc_pmic_update);
> 
> I'm really not a fan of all these pointless aggregation call-backs.  I
> see them as unnecessary overhead.  Just use the regmap API directly.

OK. I'll remove these wrappers from the MFD driver, since it seems no one
likes them.

I'll fix the patch set as you suggested and resubmit. Thank you for
reviewing this.

Best Regards
Lejun
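Using the regmap API directly, as suggested, gives callers regmap_update_bits(map, reg, mask, val) semantics: bits set in `mask` take their value from `val`, and all other bits are preserved. Note that the wrapper quoted above takes (reg, val, mask), the reverse order, which is easy to trip over when converting callers. The userspace model below only illustrates the read-modify-write; the register array and `model_update_bits()` are hypothetical and perform no locking or bus I/O.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NUM_REGS 16	/* hypothetical register file size */

/* Hypothetical in-memory registers standing in for the PMIC. */
static uint8_t model_regs[MODEL_NUM_REGS];

/* Bits set in mask are taken from val; all other bits keep their value. */
static void model_update_bits(unsigned int reg, uint8_t mask, uint8_t val)
{
	assert(reg < MODEL_NUM_REGS);
	model_regs[reg] = (uint8_t)((model_regs[reg] & ~mask) | (val & mask));
}
```

For example, with register 3 holding 0xF0, updating mask 0x0F with value 0x05 leaves the high nibble alone and yields 0xF5.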
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] pci: hotplug: cpqphp_ctrl.c: Fix for possible null pointer dereference

2014-05-27 Thread Bjorn Helgaas

On Sun, May 18, 2014 at 06:02:57PM +0200, Rickard Strandqvist wrote:
> There is otherwise a risk of a possible null pointer dereference.
> 
> Was largely found by using a static code analysis program called cppcheck.
> 
> Signed-off-by: Rickard Strandqvist 

Applied to pci/hotplug for v3.16, thanks!

> ---
>  drivers/pci/hotplug/cpqphp_ctrl.c |3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/hotplug/cpqphp_ctrl.c b/drivers/pci/hotplug/cpqphp_ctrl.c
> index 11845b7..a319d07 100644
> --- a/drivers/pci/hotplug/cpqphp_ctrl.c
> +++ b/drivers/pci/hotplug/cpqphp_ctrl.c
> @@ -709,7 +709,8 @@ static struct pci_resource *get_max_resource(struct pci_resource **head, u32 siz
>   temp = temp->next;
>   }
>  
> - temp->next = max->next;
> + if(temp)
> + temp->next = max->next;
>   }
>  
>   max->next = NULL;
> -- 
> 1.7.10.4
> 
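The fix guards the unlink step of a singly linked list walk: if the node is not reachable from the head, the search loop stops with `temp == NULL`. A generic sketch of the same pattern follows; `struct node` and `detach()` are hypothetical and are not the cpqphp types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node standing in for struct pci_resource. */
struct node {
	struct node *next;
	unsigned int size;
};

/*
 * Detach target from the list at *head and return it. If target is not
 * reachable from *head, the walk ends at NULL and the list is untouched.
 */
static struct node *detach(struct node **head, struct node *target)
{
	struct node *prev;

	assert(head != NULL && target != NULL);
	if (*head == target) {
		*head = target->next;
	} else {
		prev = *head;
		while (prev && prev->next != target)
			prev = prev->next;
		if (prev)	/* the same kind of guard the patch adds */
			prev->next = target->next;
	}
	target->next = NULL;
	return target;
}
```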

Re: [RFC PATCH 0/3] RAS: Correctable Errors Collector thing

2014-05-27 Thread Chen Yucong


> From: Borislav Petkov 
> 
> Hi all,
> 
> this is something Tony and I have been working on behind the curtains
> recently. Here it is in RFC form; it passes quick testing in kvm. Let
> me send it out before I start hammering on it on a real machine.
> 
> More in-depth info about what it is and what it does is in patch 1/3.
> 
> As always, comments and suggestions are most welcome.
> 
> Thanks.

What's the point of this patch set?
My understanding is that if a specific page frame accumulates a certain
number (COUNT_MASK) of corrected DRAM ECC errors, we can assume that the
page frame is so ill that it should be isolated as soon as possible.

The question is: memory_failure cannot be used to isolate a page frame
that is being used by the kernel, because it just poisons the page and
reports it as IGNORED. memory_failure is mostly used for handling AR/AO
type errors related to page frames that userspace tasks are currently
using.

Although the affected page frame is very ill, it is not dead and can
still work. However, memory_failure may kill the userspace tasks,
especially for page frames that hold dynamic data rather than
file-backed (file/swap) data.

So I do not think it is a good idea to use memory_failure directly
in this patch set.

thx!
cyc
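To make the counting idea concrete, one compact approach packs the PFN and a small counter into a single 64-bit element, with the counter in the low bits and COUNT_MASK as its saturation value. The sketch below is a hypothetical userspace model (linear search, fixed capacity, no decay or eviction). It is not the data structure from patch 1/3, and it only decides *when* a page crosses the threshold, which is exactly where the question above about what to call next comes in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: low COUNT_BITS bits hold the count, the rest the PFN. */
#define COUNT_BITS 4
#define COUNT_MASK ((1ULL << COUNT_BITS) - 1)
#define MAX_ELEMS  8	/* hypothetical fixed capacity */

static uint64_t elems[MAX_ELEMS];
static unsigned int nr_elems;

/*
 * Record one corrected error for pfn. Returns true once the page has
 * accumulated COUNT_MASK errors, i.e. when a policy would act on it.
 */
static bool record_ce(uint64_t pfn)
{
	unsigned int i;

	for (i = 0; i < nr_elems; i++) {
		if ((elems[i] >> COUNT_BITS) != pfn)
			continue;
		if ((elems[i] & COUNT_MASK) < COUNT_MASK)
			elems[i]++;	/* saturate at COUNT_MASK */
		return (elems[i] & COUNT_MASK) == COUNT_MASK;
	}

	/* New page: a real collector would evict or decay old entries here. */
	assert(nr_elems < MAX_ELEMS);
	elems[nr_elems] = (pfn << COUNT_BITS) | 1;
	return (elems[nr_elems++] & COUNT_MASK) == COUNT_MASK;
}
```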



MIGRATE_RESERVE pages in show_mem function problems

2014-05-27 Thread Wang, Yalin

Hi  

I found that show_mem reports the wrong migrate type for
MIGRATE_RESERVE pages:

Normal: 1582*4kB (UEMC) 1317*8kB (UEMC) 1020*16kB (UEMC) 450*32kB (UEMC) 206*64kB (UEMC) 40*128kB (UM) 10*256kB (UM) 10*512kB (UM) 1*1024kB (M) 0*2048kB 0*4096kB = 74592kB

Some pages should be marked (R), but they are changed into MIGRATE_MOVABLE or
MIGRATE_UNMOVABLE in the free_area list, which is misleading when debugging.
I made a patch for this:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5dba293..6ef8ebe 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1198,7 +1198,8 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
list_add_tail(>lru, list);
if (IS_ENABLED(CONFIG_CMA)) {
mt = get_pageblock_migratetype(page);
-   if (!is_migrate_cma(mt) && !is_migrate_isolate(mt))
+   if (!is_migrate_cma(mt) && !is_migrate_isolate(mt)
+   && mt != MIGRATE_RESERVE)
mt = migratetype;
}
set_freepage_migratetype(page, mt);
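The hunk changes which migratetype gets recorded on a page that rmqueue_bulk() hands out. The decision can be written as a small pure function. This is a hypothetical userspace model of only the `IS_ENABLED(CONFIG_CMA)` branch shown in the hunk, using an illustrative subset of migrate types whose names and order are not the kernel's.

```c
#include <assert.h>

/* Illustrative subset of migrate types; names and order are hypothetical. */
enum model_mt { MT_UNMOVABLE, MT_MOVABLE, MT_RESERVE, MT_CMA, MT_ISOLATE };

/*
 * Type recorded on a page with the proposed change: CMA, isolated and
 * (newly) reserve pageblocks keep their own type, anything else is
 * labelled with the type the caller asked for.
 */
static enum model_mt freepage_mt(enum model_mt pageblock_mt,
				 enum model_mt requested)
{
	assert(pageblock_mt <= MT_ISOLATE);

	if (pageblock_mt != MT_CMA && pageblock_mt != MT_ISOLATE &&
	    pageblock_mt != MT_RESERVE)
		return requested;
	return pageblock_mt;
}
```

Without the `MT_RESERVE` test, a reserve pageblock handed out for a movable request would be labelled `MT_MOVABLE`, which is the relabelling the report describes.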


It seems to work OK. I am curious: is this a bug, or is it designed like this
for some reason?

Thanks 


<6>[  250.751554] lowmem_reserve[]: 0 0 0
<6>[  250.751606] Normal: 1582*4kB (UEMC) 1317*8kB (UEMC) 1020*16kB (UEMC) 450*32kB (UEMC) 206*64kB (UEMC) 40*128kB (UM) 10*256kB (UM) 10*512kB (UM) 1*1024kB (M) 0*2048kB 0*4096kB = 74592kB
<6>[  250.751848] HighMem: 167*4kB (UC) 3*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 692kB
<6>[  250.752020] 62596 total pagecache pages
<6>[  250.752046] 0 pages in swap cache
<6>[  250.752074] Swap cache stats: add 0, delete 0, find 0/0




Sony Mobile Communications
Tel: My Number +18610323092
yalin.w...@sonymobile.com  
sonymobile.com




Re: [PATCH 0/8] Enable dma driver for MIC X100 Coprocessors.

2014-05-27 Thread Sudeep Dutt

On Tue, 2014-05-27 at 14:14 -0700, Greg Kroah-Hartman wrote:
> On Wed, May 07, 2014 at 08:10:57PM -0700, Sudeep Dutt wrote:
> > On Thu, 2014-04-24 at 11:10 -0700, Siva Krishna Yerramreddy wrote:
> > > On Mon, 2014-04-14 at 13:14 -0700, Siva Yerramreddy wrote:
> > > > I am sending all these patches to char-misc because there is a dependency
> > > > between the patches for dma driver and other drivers.
> > > > 
> > > Greg, any feedback on the patches?
> > 
> > Hi Greg,
> > The primary author of this patch series Siva is no longer with Intel so
> > we will be taking ownership of addressing review feedback.
> 
> Care to resend these with an author email address that will not bounce?
> I don't like taking code from people with invalid email addresses...
> 

Sure, I have resent the patch series. Please take a look.

Thanks,
Sudeep Dutt


[PATCH char-misc-next 1/8] misc: mic: Add mic bus and dma driver documentation

2014-05-27 Thread Sudeep Dutt

From: Siva Yerramreddy 

Add an overview of the MIC bus and DMA driver.

Reviewed-by: Ashutosh Dixit 
Reviewed-by: Nikhil Rao 
Reviewed-by: Sudeep Dutt 
Signed-off-by: Siva Yerramreddy 
---
 Documentation/mic/mic_overview.txt | 67 +++---
 1 file changed, 41 insertions(+), 26 deletions(-)

diff --git a/Documentation/mic/mic_overview.txt b/Documentation/mic/mic_overview.txt
index b419292..77c5418 100644
--- a/Documentation/mic/mic_overview.txt
+++ b/Documentation/mic/mic_overview.txt
@@ -17,35 +17,50 @@ for applications. A key benefit of our solution is that it leverages
 the standard virtio framework for network, disk and console devices,
 though in our case the virtio framework is used across a PCIe bus.
 
+MIC PCIe card has a dma controller with 8 channels. These channels are
+shared between the host s/w and the card s/w. 0 to 3 are used by host
+and 4 to 7 by card. As the dma device doesn't show up as PCIe device,
+a virtual bus called mic bus is created and virtual dma devices are
+created on it by the host/card drivers. On host the channels are private
+and used only by the host driver to transfer data for the virtio devices.
+
 Here is a block diagram of the various components described above. The
 virtio backends are situated on the host rather than the card given better
 single threaded performance for the host compared to MIC, the ability of
 the host to initiate DMA's to/from the card using the MIC DMA engine and
 the fact that the virtio block storage backend can only be on the host.
 
-  |
-   +--+   | +--+
-   | Card OS  |   | | Host OS  |
-   +--+   | +--+
-  |
-+---+ ++ +--+ | +-+  ++ ++
-| Virtio| |Virtio  | |Virtio| | |Virtio   |  |Virtio  | |Virtio  |
-| Net   | |Console | |Block | | |Net  |  |Console | |Block   |
-| Driver| |Driver  | |Driver| | |backend  |  |backend | |backend |
-+---+ ++ +--+ | +-+  ++ ++
-| | | |  || |
-| | | |User  || |
-| | | |--||-|---
-+---+ |Kernel +--+
-  |   |   | Virtio over PCIe IOCTLs  |
-  |   |   +--+
-  +--+|   |
-  |Intel MIC ||+---+
-  |Card Driver   |||Intel MIC  |
-  +--+||Host Driver|
-  |   |+---+
-  |   |   |
- +-+
- | |
- |PCIe Bus |
- +-+
+  |
+   +--+   | +--+
+   | Card OS  |   | | Host OS  |
+   +--+   | +--+
+  |
++---+ ++ +--+ | +-+  ++ ++
+| Virtio| |Virtio  | |Virtio| | |Virtio   |  |Virtio  | |Virtio  |
+| Net   | |Console | |Block | | |Net  |  |Console | |Block   |
+| Driver| |Driver  | |Driver| | |backend  |  |backend | |backend |
++---+ ++ +--+ | +-+  ++ ++
+| | | |  || |
+| | | |User  || |
+| | | |--||-|---
++---+ |Kernel +--+
+  |   |   | Virtio over PCIe IOCTLs  |
+  |   |   +--+
++---+ |   |   |  +---+
+| MIC DMA   | |   |   |  | MIC DMA   |
+| Driver| |   |   |  | Driver|
++---+ |   |   |  +---+
+  |   |   |   ||
++---+ |   |   |  ++
+|MIC virtual Bus| |   |   |  |MIC virtual Bus |
++---+ |   |   |  ++
+  |   |   |   |  |
+  |   +--+

[PATCH char-misc-next 0/8] Enable dma driver for MIC X100 Coprocessors

2014-05-27 Thread Sudeep Dutt

These patches are being sent to char-misc because there is a dependency
between the patches for dma driver and other drivers.

Description:

This set of patches adds support for the MIC X100 DMA driver.
The MIC PCIe card has a DMA controller with 8 channels. These channels are
shared between the host software and the card software: channels 0 to 3 are
used by the host and 4 to 7 by the card. As the DMA device doesn't show up
as a PCIe device, a virtual bus called the MIC bus is created, and virtual
DMA devices are created on it by the host/card drivers. On the host, the channels are private
and used only by the host driver to transfer data for the virtio devices.
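The fixed channel split described above can be captured with two trivial helpers. The macro and function names are hypothetical (the real driver's identifiers may differ); the sketch only encodes the ownership rule stated in this cover letter: channels 0 to 3 for the host, 4 to 7 for the card.

```c
#include <assert.h>
#include <stdbool.h>

/* Values from the description; the macro names are hypothetical. */
#define MODEL_DMA_NR_CHANNELS 8
#define MODEL_DMA_HOST_FIRST  0	/* host owns channels 0..3 */
#define MODEL_DMA_CARD_FIRST  4	/* card owns channels 4..7 */

/* Does channel ch belong to the host side? */
static bool model_chan_is_host(int ch)
{
	assert(ch >= 0 && ch < MODEL_DMA_NR_CHANNELS);
	return ch < MODEL_DMA_CARD_FIRST;
}

/* First channel index a given side may use. */
static int model_first_chan(bool host)
{
	return host ? MODEL_DMA_HOST_FIRST : MODEL_DMA_CARD_FIRST;
}
```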

Here is a higher level block diagram.
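                              |
   +----------+               |             +----------+
   | Card OS  |               |             | Host OS  |
   +----------+               |             +----------+
                              |
+-------+ +--------+ +------+ | +---------+  +--------+ +--------+
| Virtio| |Virtio  | |Virtio| | |Virtio   |  |Virtio  | |Virtio  |
| Net   | |Console | |Block | | |Net      |  |Console | |Block   |
| Driver| |Driver  | |Driver| | |backend  |  |backend | |backend |
+-------+ +--------+ +------+ | +---------+  +--------+ +--------+
    |         |         |     |         |        |          |
    |         |         |     |User     |        |          |
    |         |         |     |---------|--------|----------|------
    +-------------------+     |Kernel +--------------------------+
                  |           |       | Virtio over PCIe IOCTLs  |
                  |           |       +--------------------------+
+-----------+     |           |                   |  +-----------+
| MIC DMA   |     |           |                   |  | MIC DMA   |
| Driver    |     |           |                   |  | Driver    |
+-----------+     |           |                   |  +-----------+
      |           |           |                   |        |
+---------------+ |           |                   |  +----------------+
|MIC virtual Bus| |           |                   |  |MIC virtual Bus |
+---------------+ |           |                   |  +----------------+
      |           |           |                   |                 |
      |   +--------------+    |           +---------------+         |
      |   |Intel MIC     |    |           |Intel MIC      |         |
      +---|Card Driver   |    |           |Host Driver    |         |
          +--------------+    |           +---------------+---------+
                  |           |                   |
          +-------------------------------------------------+
          |                                                 |
          |                    PCIe Bus                     |
          +-------------------------------------------------+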

The following series of patches are partitioned as follows:

Patch 1: Add mic bus and dma driver documentation.
 Author: Siva Yerramreddy
Patch 2: Add a bus driver for virtual MIC devices.
 Authors: Siva Yerramreddy, Sudeep Dutt
Patch 3: MIC X100 DMA Driver.
 Author: Siva Yerramreddy
Patch 4: Add threaded irq support in host driver.
 This is needed as the dma driver uses threaded irq.
 Author: Siva Yerramreddy
Patch 5: Use dma to transfer data between MIC and host.
 Authors: Siva Yerramreddy, Ashutosh Dixit
Patch 6: Add threaded irq support in mic_request_card_irq.
 This is needed as the dma driver uses threaded irq.
 Author: Siva Yerramreddy
Patch 7: Add dma device on mic bus.
 Author: Siva Yerramreddy
Patch 8: Modify the mpss script to load/unload mic_x100_dma.ko.
 Author: Siva Yerramreddy

The patches have been compiled/validated against v3.15-rc3. Tested using
dmatest module with module parameter "threads_per_chan=60". These patches
have also been scanned by Fengguang Wu's 0-day infrastructure and no
issues have been reported.

Thanks to Dan Williams, Vinod Koul, Jon Mason, Dave Jiang for the initial
review.

Siva Yerramreddy (8):
  misc: mic: Add mic bus and dma driver documentation
  misc: mic: add a bus driver for virtual MIC devices
  dma: MIC X100 DMA Driver
  misc: mic: add threaded irq support in host driver
  misc: mic: add dma support in host driver
  misc: mic: add threaded irq support in card driver
  misc: mic: add dma support in card driver
  misc: mic: add support for loading/unloading dma driver

 Documentation/mic/mic_overview.txt |  67 ++--
 Documentation/mic/mpssd/mpss   |  14 +-
 drivers/dma/Kconfig|  19 +
 drivers/dma/Makefile   |   1 +
 drivers/dma/mic_x100_dma.c | 774

[PATCH char-misc-next 2/8] misc: mic: add a bus driver for virtual MIC devices

2014-05-27 Thread Sudeep Dutt

From: Siva Yerramreddy 

This MIC virtual bus driver takes the responsibility of creating all
the virtual devices connected to the PCIe device on the host and the
platform device on the card. The MIC bus hardware operations provide
a way to abstract certain hardware details from the base physical devices.
Examples of devices added on the MIC virtual bus include host DMA and card DMA.
This abstraction enables using a common DMA driver on host and card.

Reviewed-by: Ashutosh Dixit 
Reviewed-by: Nikhil Rao 
Signed-off-by: Sudeep Dutt 
Signed-off-by: Siva Yerramreddy 
---
 drivers/misc/mic/Kconfig   |  17 
 drivers/misc/mic/Makefile  |   1 +
 drivers/misc/mic/bus/Makefile  |   5 ++
 drivers/misc/mic/bus/mic_bus.c | 188 +
 include/linux/mic_bus.h| 148 
 5 files changed, 359 insertions(+)
 create mode 100644 drivers/misc/mic/bus/Makefile
 create mode 100644 drivers/misc/mic/bus/mic_bus.c
 create mode 100644 include/linux/mic_bus.h

diff --git a/drivers/misc/mic/Kconfig b/drivers/misc/mic/Kconfig
index 462a5b1..ee1d2ac 100644
--- a/drivers/misc/mic/Kconfig
+++ b/drivers/misc/mic/Kconfig
@@ -1,3 +1,20 @@
+comment "Intel MIC Bus Driver"
+
+config INTEL_MIC_BUS
+   tristate "Intel MIC Bus Driver"
+   depends on 64BIT && PCI && X86 && X86_DEV_DMA_OPS
+   help
+ This option is selected by any driver which registers a
+ device or driver on the MIC Bus, such as CONFIG_INTEL_MIC_HOST,
+ CONFIG_INTEL_MIC_CARD, CONFIG_INTEL_MIC_X100_DMA etc.
+
+ If you are building a host/card kernel with an Intel MIC device
+ then say M (recommended) or Y, else say N. If unsure say N.
+
+ More information about the Intel MIC family as well as the Linux
+ OS and tools for MIC to use with this driver are available from
+ .
+
 comment "Intel MIC Host Driver"
 
 config INTEL_MIC_HOST
diff --git a/drivers/misc/mic/Makefile b/drivers/misc/mic/Makefile
index 05b34d6..e9bf148 100644
--- a/drivers/misc/mic/Makefile
+++ b/drivers/misc/mic/Makefile
@@ -4,3 +4,4 @@
 #
 obj-$(CONFIG_INTEL_MIC_HOST) += host/
 obj-$(CONFIG_INTEL_MIC_CARD) += card/
+obj-$(CONFIG_INTEL_MIC_BUS) += bus/
diff --git a/drivers/misc/mic/bus/Makefile b/drivers/misc/mic/bus/Makefile
new file mode 100644
index 000..d85c7f2
--- /dev/null
+++ b/drivers/misc/mic/bus/Makefile
@@ -0,0 +1,5 @@
+#
+# Makefile - Intel MIC Linux driver.
+# Copyright(c) 2014, Intel Corporation.
+#
+obj-$(CONFIG_INTEL_MIC_BUS) += mic_bus.o
diff --git a/drivers/misc/mic/bus/mic_bus.c b/drivers/misc/mic/bus/mic_bus.c
new file mode 100644
index 000..39253b5
--- /dev/null
+++ b/drivers/misc/mic/bus/mic_bus.c
@@ -0,0 +1,188 @@
+/*
+ * Intel MIC Platform Software Stack (MPSS)
+ *
+ * Copyright(c) 2014 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Intel MIC Bus driver.
+ *
+ * This implementation is very similar to the virtio bus driver
+ * implementation @ drivers/virtio/virtio.c
+ */
+#include 
+#include 
+#include 
+#include 
+
+/* Unique numbering for mbus devices. */
+static DEFINE_IDA(mbus_index_ida);
+
+static ssize_t device_show(struct device *d,
+  struct device_attribute *attr, char *buf)
+{
+   struct mbus_device *dev = dev_to_mbus(d);
+   return sprintf(buf, "0x%04x\n", dev->id.device);
+}
+static DEVICE_ATTR_RO(device);
+
+static ssize_t vendor_show(struct device *d,
+  struct device_attribute *attr, char *buf)
+{
+   struct mbus_device *dev = dev_to_mbus(d);
+   return sprintf(buf, "0x%04x\n", dev->id.vendor);
+}
+static DEVICE_ATTR_RO(vendor);
+
+static ssize_t modalias_show(struct device *d,
+struct device_attribute *attr, char *buf)
+{
+   struct mbus_device *dev = dev_to_mbus(d);
+   return sprintf(buf, "mbus:d%08Xv%08X\n",
+  dev->id.device, dev->id.vendor);
+}
+static DEVICE_ATTR_RO(modalias);
+
+static struct attribute *mbus_dev_attrs[] = {
+   _attr_device.attr,
+   _attr_vendor.attr,
+   _attr_modalias.attr,
+   NULL,
+};
+ATTRIBUTE_GROUPS(mbus_dev);
+
+static inline int mbus_id_match(const struct mbus_device *dev,
+ const struct mbus_device_id *id)
+{
+   if (id->device != dev->id.device && id->device != MBUS_DEV_ANY_ID)
+   return 0;
+
+   return
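The match function is cut off above. Since the header comment says the bus mirrors drivers/virtio/virtio.c, the remaining logic presumably follows the virtio pattern: the device ID must match unless the table entry uses the wildcard, and the vendor ID likewise. The following is a hedged userspace model of that rule plus the modalias format from modalias_show() above; `MODEL_ANY_ID` and its value are assumptions standing in for MBUS_DEV_ANY_ID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumed wildcard value; stands in for MBUS_DEV_ANY_ID. */
#define MODEL_ANY_ID 0xffffffffu

struct model_id {
	unsigned int device;
	unsigned int vendor;
};

/* Device ID must match unless wildcarded; vendor ID likewise. */
static bool model_id_match(const struct model_id *dev,
			   const struct model_id *id)
{
	if (id->device != dev->device && id->device != MODEL_ANY_ID)
		return false;
	return id->vendor == MODEL_ANY_ID || id->vendor == dev->vendor;
}

/* Same format string as modalias_show() in the patch. */
static int model_modalias(char *buf, size_t len, const struct model_id *dev)
{
	assert(buf != NULL && dev != NULL);
	return snprintf(buf, len, "mbus:d%08Xv%08X\n", dev->device, dev->vendor);
}
```

For a device with ID 1 and vendor 0x8086, the modalias would read `mbus:d00000001v00008086`.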

[PATCH char-misc-next 4/8] misc: mic: add threaded irq support in host driver

2014-05-27 Thread Sudeep Dutt

From: Siva Yerramreddy 

Convert mic_request_irq to mic_request_threaded_irq to support threaded
irq for virtual devices on mic bus.

Reviewed-by: Ashutosh Dixit 
Reviewed-by: Nikhil Rao 
Reviewed-by: Sudeep Dutt 
Signed-off-by: Siva Yerramreddy 
---
 drivers/misc/mic/host/mic_intr.c   | 116 ++---
 drivers/misc/mic/host/mic_intr.h   |  18 --
 drivers/misc/mic/host/mic_main.c   |   5 +-
 drivers/misc/mic/host/mic_virtio.c |   6 +-
 4 files changed, 90 insertions(+), 55 deletions(-)

diff --git a/drivers/misc/mic/host/mic_intr.c b/drivers/misc/mic/host/mic_intr.c
index dbc5afd..e53b150 100644
--- a/drivers/misc/mic/host/mic_intr.c
+++ b/drivers/misc/mic/host/mic_intr.c
@@ -24,28 +24,29 @@
 #include "../common/mic_dev.h"
 #include "mic_device.h"
 
-/*
- * mic_invoke_callback - Invoke callback functions registered for
- * the corresponding source id.
- *
- * @mdev: pointer to the mic_device instance
- * @idx: The interrupt source id.
- *
- * Returns none.
- */
-static inline void mic_invoke_callback(struct mic_device *mdev, int idx)
+static irqreturn_t mic_thread_fn(int irq, void *dev)
 {
+   struct mic_device *mdev = dev;
+   struct mic_intr_info *intr_info = mdev->intr_info;
+   struct mic_irq_info *irq_info = >irq_info;
struct mic_intr_cb *intr_cb;
struct pci_dev *pdev = container_of(mdev->sdev->parent,
-   struct pci_dev, dev);
+   struct pci_dev, dev);
+   int i;
 
-   spin_lock(>irq_info.mic_intr_lock);
-   list_for_each_entry(intr_cb, >irq_info.cb_list[idx], list)
-   if (intr_cb->func)
-   intr_cb->func(pdev->irq, intr_cb->data);
-   spin_unlock(>irq_info.mic_intr_lock);
+   spin_lock(_info->mic_thread_lock);
+   for (i = intr_info->intr_start_idx[MIC_INTR_DB];
+   i < intr_info->intr_len[MIC_INTR_DB]; i++)
+   if (test_and_clear_bit(i, _info->mask)) {
+   list_for_each_entry(intr_cb, _info->cb_list[i],
+   list)
+   if (intr_cb->thread_fn)
+   intr_cb->thread_fn(pdev->irq,
+intr_cb->data);
+   }
+   spin_unlock(_info->mic_thread_lock);
+   return IRQ_HANDLED;
 }
-
 /**
  * mic_interrupt - Generic interrupt handler for
  * MSI and INTx based interrupts.
@@ -53,7 +54,11 @@ static inline void mic_invoke_callback(struct mic_device *mdev, int idx)
 static irqreturn_t mic_interrupt(int irq, void *dev)
 {
struct mic_device *mdev = dev;
-   struct mic_intr_info *info = mdev->intr_info;
+   struct mic_intr_info *intr_info = mdev->intr_info;
+   struct mic_irq_info *irq_info = >irq_info;
+   struct mic_intr_cb *intr_cb;
+   struct pci_dev *pdev = container_of(mdev->sdev->parent,
+   struct pci_dev, dev);
u32 mask;
int i;
 
@@ -61,12 +66,19 @@ static irqreturn_t mic_interrupt(int irq, void *dev)
if (!mask)
return IRQ_NONE;
 
-   for (i = info->intr_start_idx[MIC_INTR_DB];
-   i < info->intr_len[MIC_INTR_DB]; i++)
-   if (mask & BIT(i))
-   mic_invoke_callback(mdev, i);
-
-   return IRQ_HANDLED;
+   spin_lock(_info->mic_intr_lock);
+   for (i = intr_info->intr_start_idx[MIC_INTR_DB];
+   i < intr_info->intr_len[MIC_INTR_DB]; i++)
+   if (mask & BIT(i)) {
+   list_for_each_entry(intr_cb, _info->cb_list[i],
+   list)
+   if (intr_cb->handler)
+   intr_cb->handler(pdev->irq,
+intr_cb->data);
+   set_bit(i, _info->mask);
+   }
+   spin_unlock(_info->mic_intr_lock);
+   return IRQ_WAKE_THREAD;
 }
 
 /* Return the interrupt offset from the index. Index is 0 based. */
@@ -99,14 +111,15 @@ static struct msix_entry *mic_get_available_vector(struct mic_device *mdev)
  *
  * @mdev: pointer to the mic_device instance
  * @idx: The source id to be registered.
- * @func: The function to be called when the source id receives
+ * @handler: The function to be called when the source id receives
  * the interrupt.
+ * @thread_fn: thread fn. corresponding to the handler
  * @data: Private data of the requester.
  * Return the callback structure that was registered or an
  * appropriate error on failure.
  */
 static struct mic_intr_cb *mic_register_intr_callback(struct mic_device *mdev,
-   u8 idx, irqreturn_t (*func) (int irq, void *dev),
+   u8 idx, irq_handler_t handler, irq_handler_t thread_fn,
void *data)
 {
struct mic_intr_cb *intr_cb;
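The new split works as follows: mic_interrupt() runs the fast handlers, records each doorbell in irq_info->mask and returns IRQ_WAKE_THREAD; mic_thread_fn() later clears those bits and runs the thread_fn callbacks. A userspace model of that hand-off follows. It swaps in C11 atomics and a single atomic_exchange() for the patch's per-bit set_bit()/test_and_clear_bit(), and all names and return codes are hypothetical, only loosely mirroring irqreturn_t.

```c
#include <assert.h>
#include <stdatomic.h>

#define MODEL_NR_SOURCES 32
static_assert(MODEL_NR_SOURCES <= sizeof(unsigned int) * 8,
	      "pending mask must fit in an unsigned int");

/* Hypothetical return codes, loosely modelled on irqreturn_t. */
enum model_irqreturn { MODEL_IRQ_NONE, MODEL_IRQ_HANDLED, MODEL_IRQ_WAKE_THREAD };

static atomic_uint pending;				/* sources latched by the hard half */
static unsigned int thread_runs[MODEL_NR_SOURCES];	/* thread callbacks per source */

/* Hard-IRQ half: latch which sources fired and request the thread. */
static enum model_irqreturn model_hardirq(unsigned int status)
{
	if (!status)
		return MODEL_IRQ_NONE;
	atomic_fetch_or(&pending, status);
	return MODEL_IRQ_WAKE_THREAD;
}

/* Thread half: take and clear all latched bits at once, then service each. */
static enum model_irqreturn model_thread(void)
{
	unsigned int bits = atomic_exchange(&pending, 0);
	unsigned int i;

	for (i = 0; i < MODEL_NR_SOURCES; i++)
		if (bits & (1u << i))
			thread_runs[i]++;
	return MODEL_IRQ_HANDLED;
}
```

Because the bits are latched, two doorbells that fire before the thread runs are serviced in a single thread pass.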

[PATCH char-misc-next 6/8] misc: mic: add threaded irq support in card driver

2014-05-27 Thread Sudeep Dutt

From: Siva Yerramreddy 

Add threaded irq support in mic_request_card_irq which will be used
for virtual devices added on mic bus.

Reviewed-by: Ashutosh Dixit 
Reviewed-by: Nikhil Rao 
Reviewed-by: Sudeep Dutt 
Signed-off-by: Siva Yerramreddy 
---
 drivers/misc/mic/card/mic_device.c | 21 +++--
 drivers/misc/mic/card/mic_device.h |  5 +++--
 drivers/misc/mic/card/mic_virtio.c |  4 ++--
 3 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/drivers/misc/mic/card/mic_device.c b/drivers/misc/mic/card/mic_device.c
index d0980ff..ff485b7 100644
--- a/drivers/misc/mic/card/mic_device.c
+++ b/drivers/misc/mic/card/mic_device.c
@@ -83,8 +83,8 @@ static int mic_shutdown_init(void)
int shutdown_db;
 
shutdown_db = mic_next_card_db();
-   shutdown_cookie = mic_request_card_irq(mic_shutdown_isr,
-   "Shutdown", mdrv, shutdown_db);
+   shutdown_cookie = mic_request_card_irq(mic_shutdown_isr, NULL,
+  "Shutdown", mdrv, shutdown_db);
if (IS_ERR(shutdown_cookie))
rc = PTR_ERR(shutdown_cookie);
else
@@ -136,7 +136,8 @@ static void mic_dp_uninit(void)
 /**
  * mic_request_card_irq - request an irq.
  *
- * @func: The callback function that handles the interrupt.
+ * @handler: interrupt handler passed to request_threaded_irq.
+ * @thread_fn: thread fn. passed to request_threaded_irq.
  * @name: The ASCII name of the callee requesting the irq.
  * @data: private data that is returned back when calling the
  * function handler.
@@ -149,17 +150,17 @@ static void mic_dp_uninit(void)
  * error code.
  *
  */
-struct mic_irq *mic_request_card_irq(irqreturn_t (*func)(int irq, void *data),
-   const char *name, void *data, int index)
+struct mic_irq *mic_request_card_irq(irq_handler_t handler,
+   irq_handler_t thread_fn, const char *name, void *data, int index)
 {
int rc = 0;
unsigned long cookie;
struct mic_driver *mdrv = g_drv;
 
-   rc  = request_irq(mic_db_to_irq(mdrv, index), func,
-   0, name, data);
+   rc  = request_threaded_irq(mic_db_to_irq(mdrv, index), handler,
+  thread_fn, 0, name, data);
if (rc) {
-   dev_err(mdrv->dev, "request_irq failed rc = %d\n", rc);
+   dev_err(mdrv->dev, "request_threaded_irq failed rc = %d\n", rc);
goto err;
}
mdrv->irq_info.irq_usage_count[index]++;
@@ -172,9 +173,9 @@ err:
 /**
  * mic_free_card_irq - free irq.
  *
- * @cookie: cookie obtained during a successful call to mic_request_irq
+ * @cookie: cookie obtained during a successful call to mic_request_threaded_irq
  * @data: private data specified by the calling function during the
- * mic_request_irq
+ * mic_request_threaded_irq
  *
  * returns: none.
  */
diff --git a/drivers/misc/mic/card/mic_device.h b/drivers/misc/mic/card/mic_device.h
index 306f502..e12a0c2 100644
--- a/drivers/misc/mic/card/mic_device.h
+++ b/drivers/misc/mic/card/mic_device.h
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /**
  * struct mic_intr_info - Contains h/w specific interrupt sources info
@@ -116,8 +117,8 @@ mic_mmio_write(struct mic_mw *mw, u32 val, u32 offset)
 int mic_driver_init(struct mic_driver *mdrv);
 void mic_driver_uninit(struct mic_driver *mdrv);
 int mic_next_card_db(void);
-struct mic_irq *mic_request_card_irq(irqreturn_t (*func)(int irq, void *data),
-   const char *name, void *data, int intr_src);
+struct mic_irq *mic_request_card_irq(irq_handler_t handler,
+   irq_handler_t thread_fn, const char *name, void *data, int intr_src);
 void mic_free_card_irq(struct mic_irq *cookie, void *data);
 u32 mic_read_spad(struct mic_device *mdev, unsigned int idx);
 void mic_send_intr(struct mic_device *mdev, int doorbell);
diff --git a/drivers/misc/mic/card/mic_virtio.c b/drivers/misc/mic/card/mic_virtio.c
index 653799b..8cdbc68 100644
--- a/drivers/misc/mic/card/mic_virtio.c
+++ b/drivers/misc/mic/card/mic_virtio.c
@@ -417,7 +417,7 @@ static int mic_add_device(struct mic_device_desc __iomem *d,
 
virtio_db = mic_next_card_db();
mvdev->virtio_cookie = mic_request_card_irq(mic_virtio_intr_handler,
-   "virtio intr", mvdev, virtio_db);
+   NULL, "virtio intr", mvdev, virtio_db);
if (IS_ERR(mvdev->virtio_cookie)) {
ret = PTR_ERR(mvdev->virtio_cookie);
goto kfree;
@@ -606,7 +606,7 @@ int mic_devices_init(struct mic_driver *mdrv)
mic_scan_devices(mdrv, !REMOVE_DEVICES);
 
config_db = mic_next_card_db();
-   virtio_config_cookie = mic_request_card_irq(mic_extint_handler,
+   virtio_config_cookie = mic_request_card_irq(mic_extint_handler, NULL,
"virtio_config_intr", mdrv, config_db);
if (IS_ERR(virtio_config_cookie)) {
rc = PTR_ERR(virtio_config_cookie);
-- 
1.8.2.1
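With the new signature, a caller that passes NULL as thread_fn keeps the old single-handler behaviour, which is why the existing shutdown and virtio callers in this patch simply gain a NULL argument. The dispatch rule can be modelled as below. This is a simplified, synchronous sketch with hypothetical names; the kernel runs thread_fn in a separate kernel thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes, loosely modelled on irqreturn_t. */
enum model_ret { MODEL_NONE, MODEL_HANDLED, MODEL_WAKE_THREAD };
typedef enum model_ret (*model_handler_t)(int irq, void *data);

/*
 * Service one interrupt: run the primary handler and, only if it asks
 * for it, the threaded handler. Runs synchronously, unlike the kernel.
 */
static enum model_ret model_service(int irq, model_handler_t handler,
				    model_handler_t thread_fn, void *data)
{
	enum model_ret ret;

	assert(handler != NULL);
	ret = handler(irq, data);
	if (ret == MODEL_WAKE_THREAD) {
		/* In this model, asking for a thread that was never supplied is a bug. */
		assert(thread_fn != NULL);
		ret = thread_fn(irq, data);
	}
	return ret;
}

/* Example callbacks: each one counts its invocations through data. */
static enum model_ret count_and_finish(int irq, void *data)
{
	(void)irq;
	++*(int *)data;
	return MODEL_HANDLED;
}

static enum model_ret count_and_defer(int irq, void *data)
{
	(void)irq;
	++*(int *)data;
	return MODEL_WAKE_THREAD;
}
```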


[PATCH char-misc-next 5/8] misc: mic: add dma support in host driver

2014-05-27 Thread Sudeep Dutt

From: Siva Yerramreddy 

This patch adds a DMA device on the MIC virtual bus and uses this dmaengine
to transfer data for virtio devices.

Reviewed-by: Nikhil Rao 
Signed-off-by: Sudeep Dutt 
Signed-off-by: Ashutosh Dixit 
Signed-off-by: Siva Yerramreddy 
---
 drivers/misc/mic/Kconfig   |   2 +-
 drivers/misc/mic/host/mic_boot.c   |  78 +++-
 drivers/misc/mic/host/mic_device.h |  24 +
 drivers/misc/mic/host/mic_intr.h   |   3 +-
 drivers/misc/mic/host/mic_virtio.c | 179 +
 drivers/misc/mic/host/mic_virtio.h |  21 -
 drivers/misc/mic/host/mic_x100.c   |   8 ++
 7 files changed, 274 insertions(+), 41 deletions(-)

diff --git a/drivers/misc/mic/Kconfig b/drivers/misc/mic/Kconfig
index ee1d2ac..bf76313 100644
--- a/drivers/misc/mic/Kconfig
+++ b/drivers/misc/mic/Kconfig
@@ -19,7 +19,7 @@ comment "Intel MIC Host Driver"
 
 config INTEL_MIC_HOST
tristate "Intel MIC Host Driver"
-   depends on 64BIT && PCI && X86
+   depends on 64BIT && PCI && X86 && INTEL_MIC_BUS
select VHOST_RING
help
  This enables Host Driver support for the Intel Many Integrated
diff --git a/drivers/misc/mic/host/mic_boot.c b/drivers/misc/mic/host/mic_boot.c
index b75c6b5..b462177 100644
--- a/drivers/misc/mic/host/mic_boot.c
+++ b/drivers/misc/mic/host/mic_boot.c
@@ -23,11 +23,66 @@
 #include 
 
 #include 
+#include 
 #include "../common/mic_dev.h"
 #include "mic_device.h"
 #include "mic_smpt.h"
 #include "mic_virtio.h"
 
+static inline struct mic_device *mbdev_to_mdev(struct mbus_device *mbdev)
+{
+   return dev_get_drvdata(mbdev->dev.parent);
+}
+
+static dma_addr_t mic_dma_map_page(struct device *dev, struct page *page,
+   unsigned long offset, size_t size, enum dma_data_direction dir,
+   struct dma_attrs *attrs)
+{
+   void *va = phys_to_virt(page_to_phys(page)) + offset;
+   struct mic_device *mdev = mbdev_to_mdev(dev_get_drvdata(dev));
+
+   return mic_map_single(mdev, va, size);
+}
+
+static void mic_dma_unmap_page(struct device *dev, dma_addr_t dma_addr,
+   size_t size, enum dma_data_direction dir, struct dma_attrs *attrs)
+{
+   struct mic_device *mdev = mbdev_to_mdev(dev_get_drvdata(dev));
+   mic_unmap_single(mdev, dma_addr, size);
+}
+
+static struct dma_map_ops mic_dma_ops = {
+   .map_page = mic_dma_map_page,
+   .unmap_page = mic_dma_unmap_page,
+};
+
+static struct mic_irq *_mic_request_threaded_irq(struct mbus_device *mbdev,
+   irq_handler_t handler, irq_handler_t thread_fn,
+   const char *name, void *data, int intr_src)
+{
+   return mic_request_threaded_irq(mbdev_to_mdev(mbdev), handler,
+   thread_fn, name, data,
+   intr_src, MIC_INTR_DMA);
+}
+
+static void _mic_free_irq(struct mbus_device *mbdev,
+   struct mic_irq *cookie, void *data)
+{
+   mic_free_irq(mbdev_to_mdev(mbdev), cookie, data);
+}
+
+static void _mic_ack_interrupt(struct mbus_device *mbdev, int num)
+{
+   struct mic_device *mdev = mbdev_to_mdev(mbdev);
+   mdev->ops->intr_workarounds(mdev);
+}
+
+static struct mbus_hw_ops mbus_hw_ops = {
+   .request_threaded_irq = _mic_request_threaded_irq,
+   .free_irq = _mic_free_irq,
+   .ack_interrupt = _mic_ack_interrupt,
+};
+
 /**
  * mic_reset - Reset the MIC device.
  * @mdev: pointer to mic_device instance
@@ -95,9 +150,20 @@ retry:
 */
goto retry;
}
-   rc = mdev->ops->load_mic_fw(mdev, buf);
+   rc = mbus_add_device(&mdev->dma_mbdev, mdev->sdev->parent,
+MBUS_DEV_DMA_HOST, &mic_dma_ops, &mbus_hw_ops,
+mdev->mmio.va);
if (rc)
goto unlock_ret;
+
+   mdev->dma_ch = mic_request_dma_chan(mdev);
+   if (!mdev->dma_ch) {
+   rc = -ENXIO;
+   goto dma_remove;
+   }
+   rc = mdev->ops->load_mic_fw(mdev, buf);
+   if (rc)
+   goto dma_release;
mic_smpt_restore(mdev);
mic_intr_restore(mdev);
mdev->intr_ops->enable_interrupts(mdev);
@@ -105,6 +171,11 @@ retry:
mdev->ops->write_spad(mdev, MIC_DPHI_SPAD, mdev->dp_dma_addr >> 32);
mdev->ops->send_firmware_intr(mdev);
mic_set_state(mdev, MIC_ONLINE);
+   goto unlock_ret;
+dma_release:
+   dma_release_channel(mdev->dma_ch);
+dma_remove:
+   mbus_remove_device(&mdev->dma_mbdev);
 unlock_ret:
mutex_unlock(&mdev->mic_mutex);
return rc;
@@ -122,6 +193,11 @@ void mic_stop(struct mic_device *mdev, bool force)
mutex_lock(&mdev->mic_mutex);
if (MIC_OFFLINE != mdev->state || force) {
mic_virtio_reset_devices(mdev);
+   if (mdev->dma_ch) {
+   dma_release_channel(mdev->dma_ch);
+   mdev->dma_ch = NULL;
+   }
+   mbus_remove_device(&mdev->dma_mbdev);

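The mic_start() change above acquires resources in a fixed order (mbus device, then DMA channel, then firmware load) and unwinds them in reverse through the dma_release and dma_remove labels. The following is a minimal userspace sketch of that goto-unwind idiom. The step functions (sketch_add_bus_dev(), sketch_request_chan(), sketch_load_fw()) are hypothetical stand-ins for mbus_add_device(), mic_request_dma_chan() and load_mic_fw(); the sketch only records which cleanup steps run for each injected failure.

```c
#include <assert.h>
#include <string.h>

/* Records the steps taken, so the unwind order can be checked. */
static char log_buf[128];

static void note(const char *s)
{
	strcat(log_buf, s);
	strcat(log_buf, " ");
}

/* Hypothetical steps; fail_at selects which one fails (0 = none). */
static int sketch_add_bus_dev(int fail_at)
{
	if (fail_at == 1)
		return -1;
	note("add");
	return 0;
}

static int sketch_request_chan(int fail_at)
{
	if (fail_at == 2)
		return -1;
	note("chan");
	return 0;
}

static int sketch_load_fw(int fail_at)
{
	if (fail_at == 3)
		return -1;
	note("fw");
	return 0;
}

static int sketch_start(int fail_at)
{
	int rc;

	log_buf[0] = '\0';
	rc = sketch_add_bus_dev(fail_at);
	if (rc)
		goto unlock_ret;	/* nothing acquired yet */
	rc = sketch_request_chan(fail_at);
	if (rc)
		goto dma_remove;	/* undo only the bus device */
	rc = sketch_load_fw(fail_at);
	if (rc)
		goto dma_release;	/* undo channel, then bus device */
	note("online");
	goto unlock_ret;		/* success path skips the cleanup labels */
dma_release:
	note("release");
dma_remove:
	note("remove");
unlock_ret:
	return rc;
}
```

Each label falls through to the next, so a later failure automatically undoes every earlier step in reverse order.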
[PATCH char-misc-next 3/8] dma: MIC X100 DMA Driver

2014-05-27 Thread Sudeep Dutt

From: Siva Yerramreddy 

This patch implements the DMA Engine API for the DMA controller on MIC X100
coprocessors. The DMA hardware is shared between host and card software:
channels 0 to 3 are used by the host and channels 4 to 7 by the card.
Since the DMA device does not show up as a PCIe device, a virtual bus called
the MIC bus is created and virtual devices are added on that bus to follow the
device model. The allowed DMA transfer directions are host to card, card to
host, and card to card.

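As a rough illustration of the channel split and transfer directions described above, the sketch below encodes them as two predicates. The constant and function names are hypothetical and do not come from mic_x100_dma.h; only the 4/4 split and the set of allowed directions are taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical constants reflecting the split described above. */
#define SKETCH_NUM_CHANS       8
#define SKETCH_HOST_FIRST_CHAN 0
#define SKETCH_CARD_FIRST_CHAN 4
#define SKETCH_CHANS_PER_SIDE  4

enum sketch_side { SKETCH_HOST, SKETCH_CARD };

/* True if channel @ch belongs to @side (host: 0-3, card: 4-7). */
static bool sketch_chan_owned_by(int ch, enum sketch_side side)
{
	int first = side == SKETCH_HOST ? SKETCH_HOST_FIRST_CHAN
					: SKETCH_CARD_FIRST_CHAN;

	return ch >= 0 && ch < SKETCH_NUM_CHANS &&
	       ch >= first && ch < first + SKETCH_CHANS_PER_SIDE;
}

/* Host-to-host is the only direction not listed as allowed. */
static bool sketch_dir_allowed(enum sketch_side src, enum sketch_side dst)
{
	return !(src == SKETCH_HOST && dst == SKETCH_HOST);
}
```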
Reviewed-by: Ashutosh Dixit 
Reviewed-by: Nikhil Rao 
Reviewed-by: Sudeep Dutt 
Signed-off-by: Siva Yerramreddy 
---
 drivers/dma/Kconfig|  19 ++
 drivers/dma/Makefile   |   1 +
 drivers/dma/mic_x100_dma.c | 774 +
 drivers/dma/mic_x100_dma.h | 286 +
 4 files changed, 1080 insertions(+)
 create mode 100644 drivers/dma/mic_x100_dma.c
 create mode 100644 drivers/dma/mic_x100_dma.h

diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index 5c58638..39b66a8 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -33,6 +33,25 @@ if DMADEVICES
 
 comment "DMA Devices"
 
+config INTEL_MIC_X100_DMA
+   tristate "Intel MIC X100 DMA Driver"
+   depends on 64BIT && X86 && INTEL_MIC_BUS
+   select DMAENGINE
+   default N
+   help
+ This enables DMA support for the Intel Many Integrated Core
+ (MIC) family of PCIe form factor coprocessor X100 devices that
+ run a 64 bit Linux OS. This driver will be used by both MIC
+ host and card drivers.
+
+ If you are building host kernel with a MIC device or a card
+ kernel for a MIC device, then say M (recommended) or Y, else
+ say N. If unsure say N.
+
+ More information about the Intel MIC family as well as the Linux
+ OS and tools for MIC to use with this driver are available from
+ .
+
 config INTEL_MID_DMAC
tristate "Intel MID DMA support for Peripheral DMA controllers"
depends on PCI && X86
diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile
index 5150c82..c933022 100644
--- a/drivers/dma/Makefile
+++ b/drivers/dma/Makefile
@@ -46,3 +46,4 @@ obj-$(CONFIG_K3_DMA) += k3dma.o
 obj-$(CONFIG_MOXART_DMA) += moxart-dma.o
 obj-$(CONFIG_FSL_EDMA) += fsl-edma.o
 obj-$(CONFIG_QCOM_BAM_DMA) += qcom_bam_dma.o
+obj-$(CONFIG_INTEL_MIC_X100_DMA) += mic_x100_dma.o
diff --git a/drivers/dma/mic_x100_dma.c b/drivers/dma/mic_x100_dma.c
new file mode 100644
index 000..6aec4df
--- /dev/null
+++ b/drivers/dma/mic_x100_dma.c
@@ -0,0 +1,774 @@
+/*
+ * Intel MIC Platform Software Stack (MPSS)
+ *
+ * Copyright(c) 2014 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Intel MIC X100 DMA Driver.
+ *
+ * Adapted from IOAT dma driver.
+ */
+#include 
+#include 
+#include 
+
+#include "mic_x100_dma.h"
+
+#define MIC_DMA_MAX_XFER_SIZE_CARD  (1 * 1024 * 1024 -\
+  MIC_DMA_ALIGN_BYTES)
+#define MIC_DMA_MAX_XFER_SIZE_HOST  (1 * 1024 * 1024 >> 1)
+#define MIC_DMA_DESC_TYPE_SHIFT 60
+#define MIC_DMA_MEMCPY_LEN_SHIFT 46
+#define MIC_DMA_STAT_INTR_SHIFT 59
+
+/* high-water mark for pushing dma descriptors */
+static int mic_dma_pending_level = 4;
+
+/* Status descriptor is used to write a 64 bit value to a memory location */
+enum mic_dma_desc_format_type {
+   MIC_DMA_MEMCPY = 1,
+   MIC_DMA_STATUS,
+};
+
+static inline u32 mic_dma_hw_ring_inc(u32 val)
+{
+   return (val + 1) % MIC_DMA_DESC_RX_SIZE;
+}
+
+static inline u32 mic_dma_hw_ring_dec(u32 val)
+{
+   return val ? val - 1 : MIC_DMA_DESC_RX_SIZE - 1;
+}
+
+static inline void mic_dma_hw_ring_inc_head(struct mic_dma_chan *ch)
+{
+   ch->head = mic_dma_hw_ring_inc(ch->head);
+}
+
+/* Prepare a memcpy desc */
+static inline void mic_dma_memcpy_desc(struct mic_dma_desc *desc,
+   dma_addr_t src_phys, dma_addr_t dst_phys, u64 size)
+{
+   u64 qw0, qw1;
+
+   qw0 = src_phys;
+   qw0 |= (size >> MIC_DMA_ALIGN_SHIFT) << MIC_DMA_MEMCPY_LEN_SHIFT;
+   qw1 = MIC_DMA_MEMCPY;
+   qw1 <<= MIC_DMA_DESC_TYPE_SHIFT;
+   qw1 |= dst_phys;
+   desc->qw0 = qw0;
+   desc->qw1 = qw1;
+}
+
+/* Prepare a status desc. with @data to be written at @dst_phys */
+static inline void mic_dma_prep_status_desc(struct mic_dma_desc *desc, u64 data,
+   dma_addr_t dst_phys, bool generate_intr)
+{
+   u64 qw0, qw1;
+
+   qw0 = data;
+   qw1 = (u64)

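The helpers mic_dma_memcpy_desc() and mic_dma_hw_ring_inc()/mic_dma_hw_ring_dec() above pack a memcpy descriptor into two 64-bit words and step a ring index with wrap-around. A minimal userspace sketch of the same bit layout follows. MIC_DMA_ALIGN_SHIFT and MIC_DMA_DESC_RX_SIZE are defined in mic_x100_dma.h, which is not shown here, so the values 6 and 8 below are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Shifts copied from mic_x100_dma.c. */
#define SKETCH_DESC_TYPE_SHIFT  60
#define SKETCH_MEMCPY_LEN_SHIFT 46
/* Assumed value: the real MIC_DMA_ALIGN_SHIFT is in mic_x100_dma.h. */
#define SKETCH_ALIGN_SHIFT      6
#define SKETCH_MEMCPY_TYPE      1ULL	/* MIC_DMA_MEMCPY */
/* Hypothetical ring size; the real MIC_DMA_DESC_RX_SIZE is in the header. */
#define SKETCH_RING_SIZE        8u

struct sketch_desc {
	uint64_t qw0;	/* source address | length (in aligned units) */
	uint64_t qw1;	/* descriptor type | destination address */
};

/*
 * Same packing as mic_dma_memcpy_desc(). It only works if src stays below
 * bit 46 and dst below bit 60; otherwise the address would overlap the
 * length or type field.
 */
static void sketch_memcpy_desc(struct sketch_desc *d, uint64_t src,
			       uint64_t dst, uint64_t size)
{
	d->qw0 = src | ((size >> SKETCH_ALIGN_SHIFT) << SKETCH_MEMCPY_LEN_SHIFT);
	d->qw1 = (SKETCH_MEMCPY_TYPE << SKETCH_DESC_TYPE_SHIFT) | dst;
}

/* Ring index helpers with wrap-around, as in mic_dma_hw_ring_inc/dec(). */
static uint32_t sketch_ring_inc(uint32_t v)
{
	return (v + 1) % SKETCH_RING_SIZE;
}

static uint32_t sketch_ring_dec(uint32_t v)
{
	return v ? v - 1 : SKETCH_RING_SIZE - 1;
}
```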
[PATCH char-misc-next 8/8] misc: mic: add support for loading/unloading dma driver

2014-05-27 Thread Sudeep Dutt

From: Siva Yerramreddy 

Load the DMA driver with modprobe on start and remove it on unload.

Signed-off-by: Siva Yerramreddy 
---
 Documentation/mic/mpssd/mpss | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/Documentation/mic/mpssd/mpss b/Documentation/mic/mpssd/mpss
index 3136c68..cacbdb0 100755
--- a/Documentation/mic/mpssd/mpss
+++ b/Documentation/mic/mpssd/mpss
@@ -48,18 +48,18 @@ start()
fi
 
echo -e $"Starting MPSS Stack"
-   echo -e $"Loading MIC_HOST Module"
+   echo -e $"Loading MIC_X100_DMA & MIC_HOST Modules"
 
-   # Ensure the driver is loaded
-   if [ ! -d "$sysfs" ]; then
-   modprobe mic_host
+   for f in "mic_host" "mic_x100_dma"
+   do
+   modprobe $f
RETVAL=$?
if [ $RETVAL -ne 0 ]; then
failure
echo
return $RETVAL
fi
-   fi
+   done
 
# Start the daemon
echo -n $"Starting MPSSD "
@@ -170,8 +170,8 @@ unload()
stop
 
sleep 5
-   echo -n $"Removing MIC_HOST Module: "
-   modprobe -r mic_host
+   echo -n $"Removing MIC_HOST & MIC_X100_DMA Modules: "
+   modprobe -r mic_host mic_x100_dma
RETVAL=$?
[ $RETVAL -ne 0 ] && failure || success
echo
-- 
1.8.2.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH char-misc-next 7/8] misc: mic: add dma support in card driver

2014-05-27 Thread Sudeep Dutt

From: Siva Yerramreddy 

This patch adds a DMA device on the MIC virtual bus.

Reviewed-by: Nikhil Rao 
Signed-off-by: Sudeep Dutt 
Signed-off-by: Siva Yerramreddy 
Signed-off-by: Ashutosh Dixit 
---
 drivers/misc/mic/Kconfig   |  2 +-
 drivers/misc/mic/card/mic_device.h |  3 +++
 drivers/misc/mic/card/mic_x100.c   | 52 +-
 3 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/mic/Kconfig b/drivers/misc/mic/Kconfig
index bf76313..cc4eef0 100644
--- a/drivers/misc/mic/Kconfig
+++ b/drivers/misc/mic/Kconfig
@@ -39,7 +39,7 @@ comment "Intel MIC Card Driver"
 
 config INTEL_MIC_CARD
tristate "Intel MIC Card Driver"
-   depends on 64BIT && X86
+   depends on 64BIT && X86 && INTEL_MIC_BUS
select VIRTIO
help
  This enables card driver support for the Intel Many Integrated
diff --git a/drivers/misc/mic/card/mic_device.h b/drivers/misc/mic/card/mic_device.h
index e12a0c2..8d735ba 100644
--- a/drivers/misc/mic/card/mic_device.h
+++ b/drivers/misc/mic/card/mic_device.h
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /**
  * struct mic_intr_info - Contains h/w specific interrupt sources info
@@ -71,6 +72,7 @@ struct mic_device {
  * @hotplug_work: Hot plug work for adding/removing virtio devices.
  * @irq_info: The OS specific irq information
  * @intr_info: H/W specific interrupt information.
+ * @dma_mbdev: dma device on the MIC virtual bus.
  */
 struct mic_driver {
char name[20];
@@ -81,6 +83,7 @@ struct mic_driver {
struct work_struct hotplug_work;
struct mic_irq_info irq_info;
struct mic_intr_info intr_info;
+   struct mbus_device dma_mbdev;
 };
 
 /**
diff --git a/drivers/misc/mic/card/mic_x100.c b/drivers/misc/mic/card/mic_x100.c
index 2868945..85066cf 100644
--- a/drivers/misc/mic/card/mic_x100.c
+++ b/drivers/misc/mic/card/mic_x100.c
@@ -148,6 +148,46 @@ void mic_card_unmap(struct mic_device *mdev, void __iomem *addr)
iounmap(addr);
 }
 
+static inline struct mic_driver *mbdev_to_mdrv(struct mbus_device *mbdev)
+{
+   return dev_get_drvdata(mbdev->dev.parent);
+}
+
+static struct mic_irq *_mic_request_threaded_irq(struct mbus_device *mbdev,
+   irq_handler_t handler, irq_handler_t thread_fn,
+   const char *name, void *data, int intr_src)
+{
+   int rc = 0;
+   unsigned int irq = intr_src;
+   unsigned long cookie = irq;
+
+   rc  = request_threaded_irq(irq, handler, thread_fn, 0, name, data);
+   if (rc) {
+   dev_err(mbdev_to_mdrv(mbdev)->dev,
+   "request_threaded_irq failed rc = %d\n", rc);
+   return ERR_PTR(rc);
+   }
+   return (struct mic_irq *)cookie;
+}
+
+static void _mic_free_irq(struct mbus_device *mbdev,
+   struct mic_irq *cookie, void *data)
+{
+   unsigned long irq = (unsigned long)cookie;
+   free_irq(irq, data);
+}
+
+static void _mic_ack_interrupt(struct mbus_device *mbdev, int num)
+{
+   mic_ack_interrupt(&mbdev_to_mdrv(mbdev)->mdev);
+}
+
+static struct mbus_hw_ops mbus_hw_ops = {
+   .request_threaded_irq = _mic_request_threaded_irq,
+   .free_irq = _mic_free_irq,
+   .ack_interrupt = _mic_ack_interrupt,
+};
+
 static int __init mic_probe(struct platform_device *pdev)
 {
struct mic_driver *mdrv = _drv;
@@ -166,13 +206,22 @@ static int __init mic_probe(struct platform_device *pdev)
goto done;
}
mic_hw_intr_init(mdrv);
+   platform_set_drvdata(pdev, mdrv);
+   rc = mbus_add_device(&mdrv->dma_mbdev, mdrv->dev, MBUS_DEV_DMA_MIC,
+NULL, &mbus_hw_ops, mdrv->mdev.mmio.va);
+   if (rc) {
+   dev_err(&pdev->dev, "mbus_add_device failed rc %d\n", rc);
+   goto iounmap;
+   }
rc = mic_driver_init(mdrv);
if (rc) {
dev_err(&pdev->dev, "mic_driver_init failed rc %d\n", rc);
-   goto iounmap;
+   goto remove_dma;
}
 done:
return rc;
+remove_dma:
+   mbus_remove_device(&mdrv->dma_mbdev);
 iounmap:
iounmap(mdev->mmio.va);
return rc;
@@ -184,6 +233,7 @@ static int mic_remove(struct platform_device *pdev)
struct mic_device *mdev = &mdrv->mdev;
 
mic_driver_uninit(mdrv);
+   mbus_remove_device(&mdrv->dma_mbdev);
iounmap(mdev->mmio.va);
return 0;
 }
-- 
1.8.2.1

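In the card-side _mic_request_threaded_irq() above, the opaque struct mic_irq * cookie is not a real object: on success it carries the Linux IRQ number cast to a pointer, and on failure it is an ERR_PTR(). The sketch below reproduces that round trip in userspace with hypothetical stand-ins modeled on the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() (the kernel's MAX_ERRNO is 4095). It assumes a compiler such as GCC or Clang, where integer/pointer casts behave as they do on Linux.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

struct sketch_irq;	/* opaque cookie type, like struct mic_irq */

/* Userspace stand-ins modeled on ERR_PTR(), IS_ERR() and PTR_ERR(). */
static void *sketch_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static bool sketch_is_err(const void *p)
{
	/* Error pointers occupy the top SKETCH_MAX_ERRNO addresses. */
	return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

static long sketch_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* Mirrors _mic_request_threaded_irq(): rc != 0 simulates a failed request. */
static struct sketch_irq *sketch_request(unsigned int irq, int rc)
{
	if (rc)
		return sketch_err_ptr(rc);
	return (struct sketch_irq *)(uintptr_t)irq;
}

/* Mirrors _mic_free_irq(): recover the IRQ number from the cookie. */
static unsigned int sketch_cookie_to_irq(const struct sketch_irq *cookie)
{
	return (unsigned int)(uintptr_t)cookie;
}
```

One consequence of this encoding is that IRQ 0 yields a NULL cookie, so callers have to test with IS_ERR() rather than checking for NULL.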

Re: [PATCH 1/3] staging: comedi: addi_apci_1564: add a subdevice for Change-of-State interrupt support

2014-05-27 Thread Chase Southwood

On Tue, May 27, 2014 at 11:34 AM, Ian Abbott  wrote:
> On 2014-05-24 23:24, Chase Southwood wrote:
>>
>> This board supports an interrupt that can be generated by an AND/OR
>> combination of 16 of the input channels.
>>
>> Create a separate subdevice to handle this interrupt.
>>
>> In doing this, this patch moves the apci1564_di_config() operation from
>> the digital input subdevice to this new subdevice, and also renames it to
>> make it more apparent that it is the config operation for the COS
>> interrupt.
>>
>> Signed-off-by: Chase Southwood 
>> Cc: Ian Abbott 
>> Cc: H Hartley Sweeten 
>> ---
>>   .../staging/comedi/drivers/addi-data/hwdrv_apci1564.c  |  8 
>>   drivers/staging/comedi/drivers/addi_apci_1564.c| 18 --
>>   2 files changed, 20 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/staging/comedi/drivers/addi-data/hwdrv_apci1564.c b/drivers/staging/comedi/drivers/addi-data/hwdrv_apci1564.c
>> index 0ba5385..a38ccf9 100644
>> --- a/drivers/staging/comedi/drivers/addi-data/hwdrv_apci1564.c
>> +++ b/drivers/staging/comedi/drivers/addi-data/hwdrv_apci1564.c
>> @@ -101,10 +101,10 @@ static unsigned int ui_InterruptData, ui_Type;
>>* data[2] Interrupt mask for the mode 1
>>* data[3] Interrupt mask for the mode 2
>>*/
>> -static int apci1564_di_config(struct comedi_device *dev,
>> - struct comedi_subdevice *s,
>> - struct comedi_insn *insn,
>> - unsigned int *data)
>> +static int apci1564_cos_insn_config(struct comedi_device *dev,
>> +   struct comedi_subdevice *s,
>> +   struct comedi_insn *insn,
>> +   unsigned int *data)
>
>
> Since the original insn_config routine for the "DI" subdevice was quite
> "bespoke" shall we say, I don't think it's worth adopting it "as is" for the
> "COS" subdevice.  Better just to remove it until it can be implemented
> properly.

Yeah, I had a feeling about that.  I'll just rip it out and respin
this patch series to do that function properly.

>
>
>>   {
>> struct addi_private *devpriv = dev->private;
>>
>> diff --git a/drivers/staging/comedi/drivers/addi_apci_1564.c b/drivers/staging/comedi/drivers/addi_apci_1564.c
>> index 13d9962..6af1e4c 100644
>> --- a/drivers/staging/comedi/drivers/addi_apci_1564.c
>> +++ b/drivers/staging/comedi/drivers/addi_apci_1564.c
>> @@ -105,7 +105,7 @@ static int apci1564_auto_attach(struct comedi_device *dev,
>> dev->irq = pcidev->irq;
>> }
>>
>> -   ret = comedi_alloc_subdevices(dev, 3);
>> +   ret = comedi_alloc_subdevices(dev, 4);
>> if (ret)
>> return ret;
>>
>> @@ -117,7 +117,6 @@ static int apci1564_auto_attach(struct comedi_device *dev,
>> s->maxdata = 1;
>> s->len_chanlist = 32;
>> s->range_table = &range_digital;
>> -   s->insn_config = apci1564_di_config;
>> s->insn_bits = apci1564_di_insn_bits;
>>
>> /*  Allocate and Initialise DO Subdevice Structures */
>> @@ -144,6 +143,21 @@ static int apci1564_auto_attach(struct comedi_device *dev,
>> s->insn_read = apci1564_timer_read;
>> s->insn_config = apci1564_timer_config;
>>
>> +   /* Change-Of-State (COS) interrupt subdevice */
>> +   s = &dev->subdevices[3];
>> +   if (dev->irq) {
>> +   dev->read_subdev = s;
>> +   s->type = COMEDI_SUBD_DI;
>> +   s->subdev_flags = SDF_READABLE | SDF_CMD_READ;
>> +   s->n_chan = 1;
>> +   s->maxdata = 1;
>> +   s->len_chanlist = 1;
>> +   s->range_table = &range_digital;
>> +   s->insn_config = apci1564_cos_insn_config;
>
>
> It would be nice to have an 'insn_bits' routine, even if the routine just
> gives back a dummy data value for now.
>

I think in the next version of this patch, I will include a dummy
'insn_bits' function (so there will be no gap between introduction of
this subdevice and introduction of the insn_bits routine), and then I
will extend the patchset to include all of the other necessary
functions for the COS functionality in a later patch.

>
>> +   } else {
>> +   s->type = COMEDI_SUBD_UNUSED;
>> +   }
>> +
>> return 0;
>>   }
>>
>>
>

Thanks as always for the review, I'll get a new patchset out as soon as I can.
Chase.

>
> --
> -=( Ian Abbott @ MEV Ltd.E-mail: )=-
> -=( Tel: +44 (0)161 477 1898   FAX: +44 (0)161 718 3587 )=-

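The board feature discussed in the addi_apci_1564 thread above, an interrupt generated by an AND/OR combination of 16 input channels, can be illustrated with a small predicate. This is a hypothetical sketch and not the APCI-1564 register interface: it assumes an event fires when any (OR) or all (AND) of the enabled channels changed between two samples of the 16 inputs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum sketch_cos_logic { SKETCH_COS_OR, SKETCH_COS_AND };

/*
 * Illustrative only: decide whether a change-of-state event fires, given
 * the previous and current 16-bit input words and a mask of enabled
 * channels.
 */
static bool sketch_cos_fires(uint16_t prev, uint16_t cur, uint16_t mask,
			     enum sketch_cos_logic logic)
{
	uint16_t changed = (uint16_t)((prev ^ cur) & mask);

	if (mask == 0)
		return false;		/* no channels enabled */
	if (logic == SKETCH_COS_OR)
		return changed != 0;	/* any enabled channel changed */
	return changed == mask;		/* every enabled channel changed */
}
```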
Re: [PATCH] block: mq flush: fix race between IPI handler and mq flush worker

2014-05-27 Thread Jens Axboe


On 2014-05-27 20:26, Jens Axboe wrote:

On 2014-05-27 19:34, Ming Lei wrote:

On Wed, May 28, 2014 at 3:35 AM, Jens Axboe  wrote:

On 05/27/2014 01:21 PM, Christoph Hellwig wrote:

On Tue, May 27, 2014 at 01:17:40PM -0600, Jens Axboe wrote:

But I think you sent the old one again, not the new variant :-)


Oh well, next try:


This looks good to me. Was trying to think of ways to reduce that to one
list iteration, but I think it's cleaner to just retain the two separate
ones.

Reusing REQ_SOFTBARRIER is fine as well, not used in blk-mq otherwise.

Let me know when you have runtime verified it.


It looks like writes over ext4 (especially sync writes) now survive
with Christoph's patch, thanks Christoph.

Reported-and-tested-by: Ming Lei 


Great! I'll queue it up here too, then.


Christoph, I'll just run a few tests and then queue it up in the 
morning. Can you send a properly signed-off patch with a commit message 
as well? I was writing one up, but I still need the signed-off-by.



--
Jens Axboe


Re: [PATCH] block: mq flush: fix race between IPI handler and mq flush worker

2014-05-27 Thread Jens Axboe


On 2014-05-27 19:34, Ming Lei wrote:

On Wed, May 28, 2014 at 3:35 AM, Jens Axboe  wrote:

On 05/27/2014 01:21 PM, Christoph Hellwig wrote:

On Tue, May 27, 2014 at 01:17:40PM -0600, Jens Axboe wrote:

But I think you sent the old one again, not the new variant :-)


Oh well, next try:


This looks good to me. Was trying to think of ways to reduce that to one
list iteration, but I think it's cleaner to just retain the two separate
ones.

Reusing REQ_SOFTBARRIER is fine as well, not used in blk-mq otherwise.

Let me know when you have runtime verified it.


It looks like writes over ext4 (especially sync writes) now survive
with Christoph's patch, thanks Christoph.

Reported-and-tested-by: Ming Lei 


Great! I'll queue it up here too, then.

--
Jens Axboe


Re: [PATCH v11 2/3] clk: exynos5410: register clocks using common clock framework

2014-05-27 Thread Tarek Dakhran

Hi Mike,

On Wed, May 28, 2014 at 4:41 AM, Mike Turquette  wrote:
> Quoting Tarek Dakhran (2014-05-25 20:23:32)
>> The EXYNOS5410 clocks are statically listed and registered
>> using the Samsung specific common clock helper functions.
>>
>> Signed-off-by: Tarek Dakhran 
>> Signed-off-by: Vyacheslav Tyrtov 
>> ---
>>  .../devicetree/bindings/clock/exynos5410-clock.txt |   45 +
>>  drivers/clk/samsung/Makefile   |1 +
>>  drivers/clk/samsung/clk-exynos5410.c   |  209
>>  include/dt-bindings/clock/exynos5410.h |   33 
>>  4 files changed, 288 insertions(+)
>>  create mode 100644 Documentation/devicetree/bindings/clock/exynos5410-clock.txt
>>  create mode 100644 drivers/clk/samsung/clk-exynos5410.c
>>  create mode 100644 include/dt-bindings/clock/exynos5410.h
>>
>> diff --git a/Documentation/devicetree/bindings/clock/exynos5410-clock.txt b/Documentation/devicetree/bindings/clock/exynos5410-clock.txt
>> new file mode 100644
>> index 000..aeab635
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/clock/exynos5410-clock.txt
>> @@ -0,0 +1,45 @@
>> +* Samsung Exynos5410 Clock Controller
>> +
>> +The Exynos5410 clock controller generates and supplies clock to various
>> +controllers within the Exynos5410 SoC.
>> +
>> +Required Properties:
>> +
>> +- compatible: should be "samsung,exynos5410-clock"
>> +
>> +- reg: physical base address of the controller and length of memory mapped
>> +  region.
>> +
>> +- #clock-cells: should be 1.
>> +
>> +All available clocks are defined as preprocessor macros in
>> +dt-bindings/clock/exynos5410.h header and can be used in device
>> +tree sources.
>> +
>> +External clock:
>> +
>> +There is a clock that is generated outside the SoC. It
>> +is expected to be defined using standard clock bindings
>> +with the following clock-output-name:
>> +
>> + - "fin_pll" - PLL input clock from XXTI
>
> Does fin_pll feed into the exynos5410-clock controller? If so, should
> the example clock-controller node below have a clocks and clock-names
> property?

fin_pll does not feed into the exynos5410-clock controller; it feeds into the MCT.

> diff --git a/arch/arm/boot/dts/exynos5410.dtsi b/arch/arm/boot/dts/exynos5410.dtsi
> new file mode 100644
> index 000..3839c26
> --- /dev/null
> +++ b/arch/arm/boot/dts/exynos5410.dtsi
> @@ -0,0 +1,206 @@
> +/*
> + * SAMSUNG EXYNOS5410 SoC device tree source
> + *
> + * Copyright (c) 2013 Samsung Electronics Co., Ltd.
> + * http://www.samsung.com
> + *
> + * SAMSUNG EXYNOS5410 SoC device nodes are listed in this file.
> + * EXYNOS5410 based board files can include this file and provide
> + * values for board specific bindings.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include "skeleton.dtsi"
> +#include 
> +
> +/ {
> +   compatible = "samsung,exynos5410", "samsung,exynos5";
> +   interrupt-parent = <>;
[snip]
> +   mct: mct@101C {
> +   compatible = "samsung,exynos4210-mct";
> +   reg = <0x101C 0xB00>;
> +   interrupt-parent = <&interrupt_map>;
> +   interrupts = <0>, <1>, <2>, <3>,
> +   <4>, <5>, <6>, <7>,
> +   <8>, <9>, <10>, <11>;
> +   clocks = <_pll>, < CLK_MCT>;
> +   clock-names = "fin_pll", "mct";
> +
> +   interrupt_map: interrupt-map {
> +   #interrupt-cells = <1>;
> +   #address-cells = <0>;
> +   #size-cells = <0>;
> +   interrupt-map = <0  23 3>,
> +   <1  23 4>,
> +   <2  25 2>,
> +   <3  25 3>,
> +   <4  0 120 0>,
> +   <5  0 121 0>,
> +   <6  0 122 0>,
> +   <7  0 123 0>,
> +   <8  0 128 0>,
> +   <9  0 129 0>,
> +   <10  0 130 0>,
> +   <11  0 131 0>;
> +   };
> +   };
> +

That's why I documented fin_pll. Should I add an MCT binding example to the
documentation too?

Best regards,
 Tarek

Re: [PATCH] clk: divider: Fix overflow in clk_divider_bestdiv

2014-05-27 Thread Mike Turquette

Quoting Tomasz Figa (2014-05-07 09:24:10)
> Commit c686078 ("clk: divider: Add round to closest divider") introduced
> a helper function to check whether a given divisor is the best one instead
> of checking directly. However, because int was used instead of unsigned long
> for passing calculated rates to this function, an overflow could occur in
> certain cases, for example when trying to obtain the maximum possible
> clock rate by calling clk_round_rate(..., UINT_MAX).
> 
> This patch fixes this issue by changing the type of the rate, now and best
> arguments of the function to unsigned long, which is the type that
> should be used for clock rates.
> 
> Signed-off-by: Tomasz Figa 

Sorry for the long wait. This one flew under the radar. Applied to
clk-next.

Regards,
Mike

> ---
>  drivers/clk/clk-divider.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/clk/clk-divider.c b/drivers/clk/clk-divider.c
> index c572945..e0b360a 100644
> --- a/drivers/clk/clk-divider.c
> +++ b/drivers/clk/clk-divider.c
> @@ -232,7 +232,7 @@ static int _div_round(struct clk_divider *divider, unsigned long parent_rate,
>  }
>  
>  static bool _is_best_div(struct clk_divider *divider,
> -   int rate, int now, int best)
> +   unsigned long rate, unsigned long now, unsigned long best)
>  {
> if (divider->flags & CLK_DIVIDER_ROUND_CLOSEST)
> return abs(rate - now) < abs(rate - best);
> -- 
> 1.9.2
> 
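To make the overflow in the clk-divider patch above concrete, the sketch below models only the CLK_DIVIDER_ROUND_CLOSEST comparison with both parameter types. Converting UINT_MAX to int is implementation-defined in C; GCC and Clang wrap it to -1, which makes the int version prefer the farther candidate. The rate_dist() helper is this sketch's own way of computing an unsigned distance and is not the code from the patch, which keeps abs().

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Before the fix: rates squeezed through int parameters. */
static bool is_best_div_int(int rate, int now, int best)
{
	return abs(rate - now) < abs(rate - best);
}

/* Distance between two unsigned rates without unsigned wrap-around. */
static unsigned long rate_dist(unsigned long a, unsigned long b)
{
	return a > b ? a - b : b - a;
}

/* After the fix: rates stay unsigned long end to end. */
static bool is_best_div_ulong(unsigned long rate, unsigned long now,
			      unsigned long best)
{
	return rate_dist(rate, now) < rate_dist(rate, best);
}
```

With rate = UINT_MAX, a candidate of 2 GHz is closer than 1 GHz; the unsigned long version agrees, while the int version (seeing rate as -1) picks 1 GHz.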

Re: [PATCH] export efi.flags to sysfs

2014-05-27 Thread Dave Young

On 05/27/14 at 09:34am, Vivek Goyal wrote:
> On Mon, May 26, 2014 at 04:39:35PM +0800, Dave Young wrote:
> > 
> > For efi=old_map and any old_map quirks like SGI UV in current
> > tree kexec/kdump will fail because it depends on the new 1:1 mapping.
> > 
> > Thus export the mapping method to sysfs so kexec tools can switch
> > to original way to boot.
> > 
> > Since we have efi.flags for all efi facilities so let's just export the
> > efi.flags itself, it maybe useful for other arches and use cases.
> > 
> 
> Does it require any documentation in Documentation/ABI/..

Yes, it's necessary. I will add it in the next version.

I'm still discussing this with Matt; exporting efi.flags does not seem like a
good idea because the flags are an internal interface.

I should probably export only an 'old_map' file instead.

> 
> Vivek
> 
> > Signed-off-by: Dave Young 
> > ---
> >  drivers/firmware/efi/efi.c |3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > Index: linux-2.6/drivers/firmware/efi/efi.c
> > ===
> > --- linux-2.6.orig/drivers/firmware/efi/efi.c
> > +++ linux-2.6/drivers/firmware/efi/efi.c
> > @@ -86,16 +86,19 @@ static ssize_t name##_show(struct kobjec
> >  EFI_ATTR_SHOW(fw_vendor);
> >  EFI_ATTR_SHOW(runtime);
> >  EFI_ATTR_SHOW(config_table);
> > +EFI_ATTR_SHOW(flags);
> >  
> >  static struct kobj_attribute efi_attr_fw_vendor = __ATTR_RO(fw_vendor);
> >  static struct kobj_attribute efi_attr_runtime = __ATTR_RO(runtime);
> >  static struct kobj_attribute efi_attr_config_table = __ATTR_RO(config_table);
> > +static struct kobj_attribute efi_attr_flags = __ATTR_RO(flags);
> >  
> >  static struct attribute *efi_subsys_attrs[] = {
> > &efi_attr_systab.attr,
> > &efi_attr_fw_vendor.attr,
> > &efi_attr_runtime.attr,
> > &efi_attr_config_table.attr,
> > +   &efi_attr_flags.attr,
> > NULL,
> >  };
> >  

Re: [PATCH] export efi.flags to sysfs

2014-05-27 Thread Dave Young

On 05/27/14 at 02:36pm, Fleming, Matt wrote:
> On 27 May 2014 04:00, Dave Young  wrote:
> > On 05/26/14 at 04:39pm, Dave Young wrote:
> >>
> >> For efi=old_map and any old_map quirks like SGI UV in current
> >> tree kexec/kdump will fail because it depends on the new 1:1 mapping.
> >>
> >> Thus export the mapping method to sysfs so kexec tools can switch
> >> to original way to boot.
> >>
> >> Since we have efi.flags for all efi facilities so let's just export the
> >> efi.flags itself, it maybe useful for other arches and use cases.
> >
> > Rethinking this issue: exporting flags would expose the efi facility
> > macros to userspace. Matt, what's your opinion? It might be better to export
> > only a file 'old_map' containing '0|1'.
> 
> Exporting efi.flags is a non-starter. Those flags are part of an
> internal interface and I'm not prepared to turn them into a userspace
> ABI that we can never, ever change without a massive amount of pain.

Agreed; it's not good to turn them into an external interface.

> 
> I've only vaguely been following along with the other thread, so please
> summarise everything again in your patch. Particularly, I need answers
> to the following questions,
> 
>  - Are you trying to fix a kexec/kdump regression?

In a sense, yes, it is a regression.
Before the 1:1 mapping, kexec/kdump worked with 'noefi' plus the
acpi_rsdp= kernel cmdline. kexec-tools does not fill efi_info in
boot_params, so the kexec kernel simply boots as if 'noefi' were given.

Now that we have the 1:1 mapping, kexec-tools boots with EFI enabled, but SGI UV
still uses the old mapping, and that causes the problem.

So kexec-tools needs to know whether old_map is in use so that it can switch to
the right boot method when booting via EFI.

>  - Does SGI UV work with kexec + UEFI at all?

It worked previously, without enabling EFI in boot_params.

> 
> The 1:1 mapping was required to make kexec + EFI work in the first
> instance. If a machine implements the EFI 1:1 mapping, kexec should
> work. If it doesn't implement the 1:1 mapping, then it's probably not
> going to work, right?
> 
> The crux of the question: are you trying to fix a regression?
> 
> If not, then we just need to get SGI UV working with the EFI 1:1
> mapping. No?

See my explanation above.

Thanks
Dave


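kexec-tools could consume the single-value 'old_map' attribute proposed in the EFI thread above roughly as follows. This is a sketch only: the path /sys/firmware/efi/old_map is hypothetical, since the thread had not settled on an ABI, and the sketch simply reports a missing or malformed file as an error.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical path for the proposed '0|1' attribute; not an existing ABI. */
#define SKETCH_OLD_MAP_PATH "/sys/firmware/efi/old_map"

/*
 * Parse the contents of a '0|1' attribute. Returns 0 or 1, or -1 if the
 * text is not exactly one of those digits, optionally newline-terminated.
 */
static int sketch_parse_bool_attr(const char *buf)
{
	if ((buf[0] != '0' && buf[0] != '1') ||
	    (buf[1] != '\0' && !(buf[1] == '\n' && buf[2] == '\0')))
		return -1;
	return buf[0] - '0';
}

/* Returns 1 if old_map is in use, 0 if not, -1 if absent or malformed. */
static int sketch_efi_old_map(const char *path)
{
	char buf[8] = "";
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f))
		buf[0] = '\0';
	fclose(f);
	return sketch_parse_bool_attr(buf);
}
```

A caller would invoke sketch_efi_old_map(SKETCH_OLD_MAP_PATH) and pick the boot method based on the result.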
linux-next: build failure after merge of the nfsd tree

2014-05-27 Thread Stephen Rothwell

Hi Bruce,

After merging the nfsd tree, today's linux-next build (x86_64 allmodconfig)
failed like this:


fs/nfsd/nfs4xdr.c: In function 'nfsd4_encode_security_label':
fs/nfsd/nfs4xdr.c:1945:15: error: 'pp' undeclared (first use in this function)
  __be32 *p = *pp;
   ^

Caused by commit 8ea0abf0a992 ("nfsd4: use xdr_reserve_space in
attribute encoding"). "This is a cosmetic change for now; no change in
behavior" :-(  They are the ones you have to be very careful of ...

I have used the nfsd tree from next-20140523 for today.
-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au



Re: [RFC PATCH 2/5] clk: Introduce 'clk_round_rate_nearest()'

2014-05-27 Thread Mike Turquette

Quoting Rafael J. Wysocki (2014-05-26 04:22:32)
> On Monday, May 26, 2014 11:59:09 AM Viresh Kumar wrote:
> > On 23 May 2014 21:44, Sören Brinkmann  wrote:
> > > Viresh: Could you imagine something similar for cpufreq? You suggested
> > > migrating to Hz resolution. I guess that would ideally mean to follow
> > > the CCF to a 64-bit type for frequencies and increasing the resolution.
> > > I have a messy patch migrating cpufreq and OPP to Hz and unsigned long
> > > that works on Zynq. But cpufreq has so many users that it would become
> > > quite an undertaking.
> > > And we'd need some new/amended OPP DT binding.
> > 
> > If we are going to migrate to Hz from KHz, I think we must consider the
> > 64 bit stuff right now, otherwise it will bite us later.
> > 
> > @Rafael: What do you think?
> 
> I agree as far as the 64-bit thing goes, but is switching to Hz really
> necessary?

Rafael,

Why should CPUfreq migrate to 64-bit if not switching to Hz? CPU clock
rates are specified as KHz in CPUfreq via an unsigned int. On 32-bit
systems that comes out to a max of 4.29THz (terahertz!)!

Or maybe you meant, "I agree that the clock framework should switch to
the 64-bit thing"?

Personally I'd like to see the clock framework and cpufreq get on the
same page (data type) for specifying clock rates, and the clock
framework really should not use a granularity like kHz. In fact we have
some fractional rates like 13.25Hz ...

Thanks,
Mike

> 
> Rafael
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 03/17] phy: ti-pipe3: add external clock support for PCIe PHY

2014-05-27 Thread Mike Turquette

Quoting Nishanth Menon (2014-05-15 05:33:13)
> On 05/15/2014 07:18 AM, Kishon Vijay Abraham I wrote:
> > Hi,
> > 
> > On Thursday 15 May 2014 05:42 PM, Nishanth Menon wrote:
> >> On Thu, May 15, 2014 at 6:59 AM, Kishon Vijay Abraham I  
> >> wrote:
> >>> Hi Nishant,
> >>>
> >>> On Thursday 15 May 2014 05:16 PM, Nishanth Menon wrote:
>  On Thu, May 15, 2014 at 4:25 AM, Roger Quadros  wrote:
> > On 05/15/2014 12:15 PM, Kishon Vijay Abraham I wrote:
> >> Hi Nishanth,
> >>
> >> On Wednesday 14 May 2014 09:04 PM, Nishanth Menon wrote:
> >>> On Wed, May 14, 2014 at 10:19 AM, Kishon Vijay Abraham I 
> >>>  wrote:
>  Hi Roger,
> 
>  On Wednesday 14 May 2014 06:46 PM, Roger Quadros wrote:
> > Hi Kishon,
> >
> > On 05/06/2014 04:33 PM, Kishon Vijay Abraham I wrote:
> >> APLL used by PCIE phy can either use external clock as input or 
> >> the clock
> >> from DPLL. Added support for the APLL to use external clock as 
> >> input here.
> >>
> >> Cc: Rajendra Nayak 
> >> Cc: Tero Kristo 
> >> Cc: Paul Walmsley 
> >> Signed-off-by: Kishon Vijay Abraham I 
> >> ---
> >>  Documentation/devicetree/bindings/phy/ti-phy.txt |4 ++
> >>  drivers/phy/phy-ti-pipe3.c   |   75 
> >> ++
> >>  2 files changed, 52 insertions(+), 27 deletions(-)
> >>
> >> diff --git a/Documentation/devicetree/bindings/phy/ti-phy.txt 
> >> b/Documentation/devicetree/bindings/phy/ti-phy.txt
> >> index bc9afb5..d50f8ee 100644
> >> --- a/Documentation/devicetree/bindings/phy/ti-phy.txt
> >> +++ b/Documentation/devicetree/bindings/phy/ti-phy.txt
> >> @@ -76,6 +76,10 @@ Required properties:
> >> * "dpll_ref_m2" - external dpll ref clk
> >> * "phy-div" - divider for apll
> >> * "div-clk" - apll clock
> >> +   * "apll_mux" - mux for pcie apll
> >> +   * "refclk_ext" - external reference clock for pcie apll
> >> + - ti,ext-clk: To specifiy if PCIE apll should use external 
> >> clock. Applicable
> >> +   only to PCIE PHY.
> >
> > Instead of specifying both clock sources "dpll_ref_clock", 
> > "refclk_ext" and then specifying a 3rd control option "ti,ext-clk" 
> > to select one of the 2 sources, why can't the DT just supply one 
> > clock source, i.e. the one that is being used in the board 
> > instance? The driver should then just configure the clock rate that 
> > is needed at that node. Shouldn't the clock framework automatically 
> > take care of muxing and parent rates?
> 
>  Want the dt to have all the clocks used by the controller. 
>  "ti,ext-clk" should
>  go in the board dt file (suggested by Nishanth).
>  The point is at some point later if some one wants to change the 
>  clock source,
>  it should be a simple enabling "ti,ext-clk" flag instead of finding 
>  the clock
>  phandle etc..
> >>>
> >>> Wonder if that is implicit by the presence of  "refclk_ext" in the
> >>> clocks provided?
> >>
> >> IMO the presence of "refclk_ext" is useless unless the board indicates 
> >> it
> >> provides the clock source.
> >>
> >> refclk_ext holds phandle for *fixed-clock*, so irrespective of whether 
> >> the
> >> board provides a clock or not, it can have that handle for configuring 
> >> in PRCM.
> >> However if the board does not provide the clock source, configuring 
> >> refclk_ext
> >> in PRCM is useless.
> >
> > I think what Nishant meant is that if "refclk_ext" is provided it means 
> > that the driver
> > should use that over "dpll_ref_clock" so no need of a separate 
> > "ti,ext-clk" flag.
> 
>  yes, thank you for clarifying - it is indeed redundant to have
>  "ti,ext-clk", and apologies for being a little obscure in the comment.
> >>>
> >>> Irrespective of whether external reference clock is used or not, all DRA7
> >>> (apll) has an input for external reference clock (and also a PRCM 
> >>> register for
> >>> programming it) and it has to be specified in dt no?
> >>
> >> Why is that a binding for ti-phy? That is a problem for the APLL clock
> >> driver (selecting its own source). PHY properties should describe
> >> itself -> let the bindings of the APLL describe itself. Please don't
> >> mix the two up.
> > 
> > The apll clock node is like this
> > 
> > apll_pcie_in_clk_mux: apll_pcie_in_clk_mux@4ae06118 {
> > compatible = "mux-clock";
> > clocks = <_pcie_ref_m2ldo_ck>, <_acs_clk_ck>;
> > #clock-cells = <0>;
> > reg = <0x4a00821c 0x4>;
> > bit-mask = <0x80>;
> > };
> > 
> > The external reference clock is denoted by

Re: balance storm

2014-05-27 Thread Mike Galbraith

On Wed, 2014-05-28 at 09:04 +0800, Libo Chen wrote:

> oh yes, no tsc only hpet in my box.

Making poor E5-2658 box a crippled wreck.

-Mike


Re: [PATCH 0/3] Shrinkers and proportional reclaim

2014-05-27 Thread Dave Chinner

On Tue, May 27, 2014 at 04:19:12PM -0700, Hugh Dickins wrote:
> On Wed, 28 May 2014, Konstantin Khlebnikov wrote:
> > On Wed, May 28, 2014 at 1:17 AM, Hugh Dickins  wrote:
> > > On Tue, 27 May 2014, Dave Chinner wrote:
> > >> On Mon, May 26, 2014 at 02:44:29PM -0700, Hugh Dickins wrote:
> > >> >
> > >> > [PATCH 4/3] fs/superblock: Avoid counting without __GFP_FS
> > >> >
> > >> > Don't waste time counting objects in super_cache_count() if no 
> > >> > __GFP_FS:
> > >> > super_cache_scan() would only back out with SHRINK_STOP in that case.
> > >> >
> > >> > Signed-off-by: Hugh Dickins 
> > >>
> > >> While you might think that's a good thing, it's not.  The act of
> > >> shrinking is kept separate from the accounting of how much shrinking
> > >> needs to take place.  The amount of work the shrinker can't do due
> > >> to the reclaim context is deferred until the shrinker is called in a
> > >> context where it can do work (eg. kswapd)
> > >>
> > >> Hence not accounting for work that can't be done immediately will
> > >> adversely impact the balance of the system under memory intensive
> > >> filesystem workloads. In these workloads, almost all allocations are
> > >> done in the GFP_NOFS or GFP_NOIO contexts, so not deferring the work
> > >> will effectively stop superblock cache reclaim entirely.
> > >
> > > Thanks for filling me in on that.  At first I misunderstood you,
> > > and went off looking in the wrong direction.  Now I see what you're
> > > referring to: the quantity that shrink_slab_node() accumulates in
> > > and withdraws from shrinker->nr_deferred[nid].
> > 
> > Maybe the shrinker could accumulate the fraction nr_pages_scanned / lru_pages
> > instead of the exact amount of required work? The count of shrinkable objects
> > might be calculated later, when the shrinker is called from a suitable context
> > and can actually do something.
> 
> Good idea, probably a worthwhile optimization to think through further.
> (Though experience says that Dave will explain how that can never work.)

Heh. :)

Two things, neither are show-stoppers but would need to be handled
in some way.

First: it would remove a lot of the policy flexibility from the
shrinker implementations that we currently have. i.e. the "work to
do" policy is currently set by the shrinker, not by the shrinker
infrastructure. The shrinker infrastructure only determines whether
it can be done immediately or whether it should be deferred.

e.g. there are shrinkers that don't do work unless they are
over certain thresholds. For these shrinkers, they need to have the
work calculated by the callout as they may decide nothing
can/should/needs to be done, and that decision may have nothing to
do with the current reclaim context. You can't really do this
without a callout to determine the cache size.

The other problem I see with deferring the ratio of work rather
than the actual work is that it doesn't take into account the fact
that the cache sizes might be changing in a different way to memory
pressure. i.e. a sudden increase in cache size just before deferred
reclaim occurred would cause much more reclaim than the current
code, even though the cache wasn't contributing to the original
deferred memory pressure.
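A worked example of that second point, under simplified assumptions: suppose the VM scanned 100 of 1000 LRU pages while the cache held 10000 objects, and the cache grew to 50000 objects before a context that could do the work came along. Deferring the computed count carries 1000 objects forward; deferring only the 100/1000 ratio and applying it later asks for 5000. The helper below is a hypothetical simplification that leaves out the seeks weighting the real code applies.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified scan target for one shrinker: scan the same fraction of the
 * cache that the VM scanned from the page LRUs (seeks weighting omitted). */
static uint64_t scan_target(uint64_t cache_objs, uint64_t scanned,
			    uint64_t lru_pages)
{
	assert(lru_pages > 0);
	return cache_objs * scanned / lru_pages;
}
```

Exact deferral evaluates scan_target() with the cache size at the time of the pressure (10000 objects, giving 1000); ratio deferral effectively evaluates it with the cache size at reclaim time (50000 objects, giving 5000), even though the extra 40000 objects played no part in the original pressure.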

This will lead to bursty/peaky reclaim behaviour because we then
can't distinguish a large instantaneous change in memory pressure
from "wind up" caused by lots of small increments of deferred work.
We specifically damp the second case:

/*
 * We need to avoid excessive windup on filesystem shrinkers
 * due to large numbers of GFP_NOFS allocations causing the
 * shrinkers to return -1 all the time. This results in a large
 * nr being built up so when a shrink that can do some work
 * comes along it empties the entire cache due to nr >>>
 * freeable. This is bad for sustaining a working set in
 * memory.
 *
 * Hence only allow the shrinker to scan the entire cache when
 * a large delta change is calculated directly.
 */

Hence we'd need a different mechanism to prevent such deferred work
wind up from occurring. We can probably do better than the current
SWAG if we design a new algorithm that has this damping built in.
The current algorithm is all based around the "seek penalty"
reinstantiating a reclaimed object has, and that simply does not
match for many shrinker users now as they aren't spinning disk
based. Hence I think we really need to look at improving the entire
shrinker "work" algorithm rather than just tinkering around the
edges...
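For readers unfamiliar with the damping that the quoted comment describes, here is a hypothetical user-space model of it. It is assumed to mirror the clamping in shrink_slab_node() of this era (cap the total at half the cache unless the directly computed delta is at least a quarter of it, and never exceed twice the cache), but it is not that code: the struct and function names are invented, and the real function also applies the seeks weighting and scans in batches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one shrinker's deferred-work bookkeeping. */
struct shrinker_model {
	uint64_t nr_deferred;	/* work carried over from callers that could not scan */
};

static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * delta:    new work computed for this call from current memory pressure
 * freeable: objects currently in the cache
 * can_scan: false models a GFP_NOFS caller whose work must be deferred
 * Returns the number of objects this call would scan.
 */
static uint64_t shrink_model(struct shrinker_model *s, uint64_t delta,
			     uint64_t freeable, bool can_scan)
{
	uint64_t total = s->nr_deferred + delta;

	/* Damp windup: unless this call's own delta is large, deferred work
	 * alone may not push the scan past half the cache. */
	if (delta < freeable / 4)
		total = min_u64(total, freeable / 2);

	/* Never scan more than twice the cache in one go. */
	total = min_u64(total, freeable * 2);
	assert(total <= freeable * 2);

	if (!can_scan) {
		s->nr_deferred = total;	/* hand everything to a later caller */
		return 0;
	}
	s->nr_deferred = 0;
	return min_u64(total, freeable);	/* cannot free more than exists */
}
```

With 100 deferred calls each contributing 10 objects against a 1000-object cache, the deferred total stops at 500, so the first caller that can scan takes half the cache instead of all of it. A single call whose own delta is large (say 900) is not damped.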

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com

Re: [PATCH] pci: rcar host needs OF

2014-05-27 Thread Jingoo Han

On Wednesday, May 28, 2014 7:54 AM, Bjorn Helgaas wrote:
> On Thu, May 08, 2014 at 04:56:25PM +0200, Arnd Bergmann wrote:
> > The pci-rcar driver is enabled for compile tests, and this has
> > now shown that the driver cannot build without CONFIG_OF,
> > following the inclusion of f8f2fe7355fb "PCI: rcar: Use new OF
> > interrupt mapping when possible":
> >
> > drivers/built-in.o: In function `rcar_pci_map_irq':
> > :(.text+0x1cc7c): undefined reference to `of_irq_parse_and_map_pci'
> >
> > Signed-off-by: Arnd Bergmann 
> > Cc: Bjorn Helgaas 
> > Cc: Magnus Damm 
> > Cc: linux-...@vger.kernel.org
> > Cc: linux...@vger.kernel.org
> 
> If I understand correctly, this patch was superseded by this one:
> 
> "[PATCH] of/irq: provide int of_irq_parse_and_map_pci wrapper"
> 
> and you aren't expecting me to do anything.  Let me know if otherwise.

Yes, right. I checked that the build error was resolved by the
above-mentioned patch.

Best regards,
Jingoo Han

> 
> > ---
> >  drivers/pci/host/Kconfig | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/pci/host/Kconfig b/drivers/pci/host/Kconfig
> > index fbbef0b..4675f47 100644
> > --- a/drivers/pci/host/Kconfig
> > +++ b/drivers/pci/host/Kconfig
> > @@ -27,7 +27,7 @@ config PCI_TEGRA
> >
> >  config PCI_RCAR_GEN2
> > bool "Renesas R-Car Gen2 Internal PCI controller"
> > -   depends on ARCH_SHMOBILE || (ARM && COMPILE_TEST)
> > +   depends on ARCH_SHMOBILE || (ARM && OF && COMPILE_TEST)
> > help
> >   Say Y here if you want internal PCI support on R-Car Gen2 SoC.
> >   There are 3 internal PCI controllers available with a single
> > --
> > 1.8.3.2
> >


Re: [PATCH] block: mq flush: fix race between IPI handler and mq flush worker

2014-05-27 Thread Ming Lei

On Wed, May 28, 2014 at 3:35 AM, Jens Axboe  wrote:
> On 05/27/2014 01:21 PM, Christoph Hellwig wrote:
>> On Tue, May 27, 2014 at 01:17:40PM -0600, Jens Axboe wrote:
>>> But I think you sent the old one again, not the new variant :-)
>>
>> Oh well, next try:
>
> This looks good to me. Was trying to think of ways to reduce that to one
> list iteration, but I think it's cleaner to just retain the two separate
> ones.
>
> Reusing REQ_SOFTBARRIER is fine as well, not used in blk-mq otherwise.
>
> Let me know when you have runtime verified it.

It looks like writes over ext4 (especially sync writes) now survive
with Christoph's patch, thanks Christoph.

Reported-and-tested-by: Ming Lei 


Thanks,
-- 
Ming Lei

[PATCH] MAINTAINERS: Add me as the get_maintainer.pl maintainer

2014-05-27 Thread Joe Perches

Might as well be the get_maintainer maintainer...

Signed-off-by: Joe Perches 
---
 MAINTAINERS | 5 +
 1 file changed, 5 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index c0d1e36..98604ee 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3900,6 +3900,11 @@ L:   k...@vger.kernel.org
 S: Supported
 F: drivers/uio/uio_pci_generic.c
 
+GET_MAINTAINER SCRIPT
+M: Joe Perches 
+S: Maintained
+F: scripts/get_maintainer.pl
+
 GFS2 FILE SYSTEM
 M: Steven Whitehouse 
 L: cluster-de...@redhat.com



Re: [PATCH] MAINTAINERS: add AT91 Clock Support entry

2014-05-27 Thread Mike Turquette

Quoting Boris BREZILLON (2014-05-27 04:39:28)
> Signed-off-by: Boris BREZILLON 

Applied to clk-next.

Regards,
Mike

> ---
>  MAINTAINERS | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 1066264..40c5580 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -808,6 +808,11 @@ F: arch/arm/boot/dts/at91*.dtsi
>  F: arch/arm/boot/dts/sama*.dts
>  F: arch/arm/boot/dts/sama*.dtsi
>  
> +ARM/ATMEL AT91 Clock Support
> +M: Boris Brezillon 
> +S: Maintained
> +F: drivers/clk/at91
> +
>  ARM/CALXEDA HIGHBANK ARCHITECTURE
>  M: Rob Herring 
>  L: linux-arm-ker...@lists.infradead.org (moderated for non-subscribers)
> -- 
> 1.8.3.2
> 

Re: [PATCH] pci: Save and restore VFs as a part of a reset

2014-05-27 Thread Bjorn Helgaas

[+cc Alex, Don]

On Tue, May 27, 2014 at 5:53 PM, Alexander Duyck
 wrote:
> On 05/27/2014 03:22 PM, Bjorn Helgaas wrote:
>> On Mon, May 05, 2014 at 02:25:17PM -0700, Alexander Duyck wrote:
>>> This fixes an issue I found in which triggering a reset via the PCI sysfs
>>> reset while SR-IOV was enabled would leave the VFs in a state in which the
>>> BME and MSI-X enable bits were all cleared.
>>>
>>> To correct that I have added code so that the VF state is saved and restored
>>> as a part of the PF save and restore state functions.  By doing this the VF
>>> state is restored as well as the IOV state allowing the VFs to resume 
>>> function
>>> following a reset.
>>>
>>> Signed-off-by: Alexander Duyck 
>>> ---
>>>  drivers/pci/iov.c |   48 ++--
>>>  drivers/pci/pci.c |2 ++
>>>  drivers/pci/pci.h |5 +
>>>  3 files changed, 53 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
>>> index de7a747..645ed71 100644
>>> --- a/drivers/pci/iov.c
>>> +++ b/drivers/pci/iov.c
>>> @@ -521,13 +521,57 @@ resource_size_t pci_sriov_resource_alignment(struct 
>>> pci_dev *dev, int resno)
>>>  }
>>>
>>>  /**
>>> + * pci_save_iov_state - Save the state of the VF configurations
>>> + * @dev: the PCI device
>>> + */
>>> +int pci_save_iov_state(struct pci_dev *dev)
>>> +{
>>> +struct pci_dev *vfdev = NULL;
>>> +unsigned short dev_id;
>>> +
>>> +/* only search if we are a PF */
>>> +if (!dev->is_physfn)
>>> +return 0;
>>> +
>>> +/* retrieve VF device ID */
>>> +pci_read_config_word(dev, dev->sriov->pos + PCI_SRIOV_VF_DID, &dev_id);
...

>>> +/* loop through all the VFs and save their state information */
>>> +while ((vfdev = pci_get_device(dev->vendor, dev_id, vfdev))) {
>>> +if (vfdev->is_virtfn && (vfdev->physfn == dev)) {
>>> +int err = pci_save_state(vfdev);
>>
>> It makes me uneasy to operate on another device (we're resetting A, and
>> here we save state for B).  I know B is dependent on A, since B is a VF
>> related to PF A, but what synchronization is there to serialize this
>> against any other save/restore operations that may be in progress by B's
>> driver or by a sysfs operation on B?
>
> I don't believe there is any synchronization mechanism in place
> currently.  I can look into that as well.  Odds are we probably need to
> have the VFs check the parent lock before they take any independent action.

It's just the whole question of how we manage the single "saved-state"
area.  Right now, I think almost all use of it is under control of the
driver that owns the device, in suspend/resume methods.  The
exceptions are the PM suspend/freeze/etc. routines in
pci/pci-driver.c, which I assume prevent the driver from running and
are therefore safe, and the reset path.  I don't know how the

>> Is there anything in the reset path that pays attention to whether
>> resetting this PF will clobber VFs?  Do we care whether those VFs are in
>> use?  I assume they might be in use by guests?
>
> The problem I found was that the sysfs reset call doesn't bother to
> check with the PF driver at all.  It just clobbers the PF and any VFs on
> it without talking to the PF driver.

There is Keith Busch's recent patch:
http://git.kernel.org/cgit/linux/kernel/git/helgaas/pci.git/commit/?h=pci/hotplug=3ebe7f9f7e4a4fd1f6461ecd01ff2961317a483a
.  I dunno if that's useful to you or not.

And I'm not sure there's actually a requirement to *have* a PF driver.
 Obviously there has to be a way to enable the VFs, but once they're
enabled, it might be possible to keep using them via VF drivers even
without a PF driver in the picture.

Maybe resetting the PF should just fail if there's an active VF.  If
you need to reset the PF, you'd have to unbind the VFs first.

>>> +if (err)
>>> +return err;
>>> +}
>>> +}
>>
>> pci_get_device() acquires a reference on each device it returns, so this
>> strategy would require a pci_dev_put().
>
> Yes, if I am not mistaken the pci_dev_put is called as a part of
> pci_get_dev_by_id which is what pci_get_device ends up being.

Oh, yeah, you're right.  I forgot about that.  Since you call it in a
loop until you get NULL back, you're OK.  It's only when you stop
before you get NULL that you have to deal with the extra reference.
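The reference-count contract described here can be modelled in a few lines of user-space C. This is a hypothetical model, not pci_get_device() itself: get_next() drops the reference on the device passed in and takes one on the device it returns, so a loop that runs until NULL holds nothing at the end, while a search that returns early hands its caller one reference to drop (the role pci_dev_put() plays in the kernel).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted device list. */
struct dev_model {
	int id;
	int refcount;
};

static struct dev_model devs[3] = { { 10, 0 }, { 11, 0 }, { 12, 0 } };

/* Drop the reference held on 'from' (if any) and take a reference on
 * the device returned. */
static struct dev_model *get_next(struct dev_model *from)
{
	size_t next = 0;

	if (from) {
		assert(from->refcount > 0);	/* caller must hold a reference */
		from->refcount--;
		next = (size_t)(from - devs) + 1;
	}
	if (next >= sizeof(devs) / sizeof(devs[0]))
		return NULL;			/* end of list: nothing left held */
	devs[next].refcount++;
	return &devs[next];
}

static void put_dev(struct dev_model *d)
{
	if (d)
		d->refcount--;
}

/* Stops early on a match, so the caller inherits the reference taken on
 * the returned device and must drop it with put_dev(). */
static struct dev_model *find_id(int id)
{
	struct dev_model *d = NULL;

	while ((d = get_next(d)) != NULL)
		if (d->id == id)
			return d;
	return NULL;
}
```

Iterating to NULL leaves every refcount at zero; find_id(11) returns with devs[1] holding one reference until put_dev() is called.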

>> But I'm not really keen on pci_get_device() in the first place.  It works
>> by iterating over all PCI devices in the system, which seems like a
>> sledgehammer approach.  It *is* widely used, but mostly in quirk-type code
>> from which I avert my eyes.
>>
>> Maybe you could do something based on pci_walk_bus()?  If you did that, I
>> think the PCI_SRIOV_VF_DID would become superfluous.
>>
>
> I can look into that, I'm not familiar with the interface.  I'll have to
> see what the relationship is between the PF and VF in terms of busses as
> I don't recall it off of the

Re: [PATCH] hv: use correct order when freeing monitor_pages

2014-05-27 Thread Amos Kong

On Tue, May 27, 2014 at 07:16:20PM +0200, Radim Krčmář wrote:
> We try to free two pages when only one has been allocated.
> Cleanup path is unlikely, so I haven't found any trace that would fit,
> but I hope that free_pages_prepare() does catch it.
> 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Radim Krčmář 
> ---
>  Cc'd stable because the worst-case looks hard to debug.
>  Btw. the module can't get unloaded after we successfully connect?
> 
>  drivers/hv/connection.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
> index 7f10c15..e84f452 100644
> --- a/drivers/hv/connection.c
> +++ b/drivers/hv/connection.c
> @@ -224,8 +224,8 @@ cleanup:
>   vmbus_connection.int_page = NULL;
>   }
>  
> - free_pages((unsigned long)vmbus_connection.monitor_pages[0], 1);
> - free_pages((unsigned long)vmbus_connection.monitor_pages[1], 1);
> + free_pages((unsigned long)vmbus_connection.monitor_pages[0], 0);
> + free_pages((unsigned long)vmbus_connection.monitor_pages[1], 0);


The allocation order is 0 (2^0 = 1 page):

  vmbus_connection.monitor_pages[0] = (void *)__get_free_pages((GFP_KERNEL|__GFP_ZERO), 0);
  vmbus_connection.monitor_pages[1] = (void *)__get_free_pages((GFP_KERNEL|__GFP_ZERO), 0);
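The order passed to free_pages() is expected to match the one given to __get_free_pages(), because it sets how many pages (2^order) are released. The hypothetical helper below just spells out that arithmetic for the two orders involved in this fix.

```c
#include <assert.h>
#include <stddef.h>

/* Pages covered by a buddy-allocator block of the given order: 2^order. */
static size_t pages_for_order(unsigned int order)
{
	assert(order < 8 * sizeof(size_t));	/* keep the shift defined */
	return (size_t)1 << order;
}
```

Freeing an order-0 allocation with order 1 asks the allocator to release pages_for_order(1) = 2 pages where only pages_for_order(0) = 1 was handed out.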

Looks good.

Reviewed-by: Amos Kong 

>   vmbus_connection.monitor_pages[0] = NULL;
>   vmbus_connection.monitor_pages[1] = NULL;
>  
> -- 
> 1.9.3
> 

-- 
Amos.

Re: [PATCH v2] clk: bcm/kona: implement determine_rate()

2014-05-27 Thread Mike Turquette

Quoting Alex Elder (2014-05-27 09:56:56)
> Implement the clk->determine_rate method for Broadcom Kona peripheral
> clocks.  This allows a peripheral clock to be re-parented in order to
> satisfy a rate change request.  This takes the place of the previous
> kona_peri_clk_round_rate() functionality, though that function remains
> because it is used by the new one.
> 
> The parent clock that allows the peripheral clock to produce a rate
> closest to the one requested is the one selected, though the current
> parent is used by default.
> 
> Signed-off-by: Alex Elder 

Applied to clk-next.

Regards,
Mike

> ---
> v2: Added WARN_ON_ONCE() call as suggested.
> 
> This patch is based on Mike Turquette's current "clk-next" branch.
> 42dd880 Merge branch 'clk-fixes' into clk-next
> 
> It is available here:
> http://git.linaro.org/landing-teams/working/broadcom/kernel.git
> Branch review/bcm-determine-rate-v2
> 
>  drivers/clk/bcm/clk-kona.c | 54 
> +-
>  1 file changed, 53 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/clk/bcm/clk-kona.c b/drivers/clk/bcm/clk-kona.c
> index d603c4e..95af2e6 100644
> --- a/drivers/clk/bcm/clk-kona.c
> +++ b/drivers/clk/bcm/clk-kona.c
> @@ -1031,6 +1031,58 @@ static long kona_peri_clk_round_rate(struct clk_hw 
> *hw, unsigned long rate,
> rate ? rate : 1, *parent_rate, NULL);
>  }
>  
> +static long kona_peri_clk_determine_rate(struct clk_hw *hw, unsigned long rate,
> +   unsigned long *best_parent_rate, struct clk **best_parent)
> +{
> +   struct kona_clk *bcm_clk = to_kona_clk(hw);
> +   struct clk *clk = hw->clk;
> +   struct clk *current_parent;
> +   unsigned long parent_rate;
> +   unsigned long best_delta;
> +   unsigned long best_rate;
> +   u32 parent_count;
> +   u32 which;
> +
> +   /*
> +* If there is no other parent to choose, use the current one.
> +* Note:  We don't honor (or use) CLK_SET_RATE_NO_REPARENT.
> +*/
> +   WARN_ON_ONCE(bcm_clk->init_data.flags & CLK_SET_RATE_NO_REPARENT);
> +   parent_count = (u32)bcm_clk->init_data.num_parents;
> +   if (parent_count < 2)
> +   return kona_peri_clk_round_rate(hw, rate, best_parent_rate);
> +
> +   /* Unless we can do better, stick with current parent */
> +   current_parent = clk_get_parent(clk);
> +   parent_rate = __clk_get_rate(current_parent);
> +   best_rate = kona_peri_clk_round_rate(hw, rate, &parent_rate);
> +   best_delta = abs(best_rate - rate);
> +
> +   /* Check whether any other parent clock can produce a better result */
> +   for (which = 0; which < parent_count; which++) {
> +   struct clk *parent = clk_get_parent_by_index(clk, which);
> +   unsigned long delta;
> +   unsigned long other_rate;
> +
> +   BUG_ON(!parent);
> +   if (parent == current_parent)
> +   continue;
> +
> +   /* We don't support CLK_SET_RATE_PARENT */
> +   parent_rate = __clk_get_rate(parent);
> +   other_rate = kona_peri_clk_round_rate(hw, rate, &parent_rate);
> +   delta = abs(other_rate - rate);
> +   if (delta < best_delta) {
> +   best_delta = delta;
> +   best_rate = other_rate;
> +   *best_parent = parent;
> +   *best_parent_rate = parent_rate;
> +   }
> +   }
> +
> +   return best_rate;
> +}
> +
>  static int kona_peri_clk_set_parent(struct clk_hw *hw, u8 index)
>  {
> struct kona_clk *bcm_clk = to_kona_clk(hw);
> @@ -1135,7 +1187,7 @@ struct clk_ops kona_peri_clk_ops = {
> .disable = kona_peri_clk_disable,
> .is_enabled = kona_peri_clk_is_enabled,
> .recalc_rate = kona_peri_clk_recalc_rate,
> -   .round_rate = kona_peri_clk_round_rate,
> +   .determine_rate = kona_peri_clk_determine_rate,
> .set_parent = kona_peri_clk_set_parent,
> .get_parent = kona_peri_clk_get_parent,
> .set_rate = kona_peri_clk_set_rate,
> -- 
> 1.9.1
> 
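The selection rule in the patch description above (try each parent, keep whichever yields the rate closest to the request, and stay with the current parent unless another is strictly better) can be sketched in user-space C. The divider-based rounding, the maximum divider of 16 and the parent rates are hypothetical stand-ins for kona_peri_clk_round_rate() and real Kona clock data.

```c
#include <assert.h>
#include <stddef.h>

/* Distance between two rates without relying on signed arithmetic. */
static unsigned long rate_delta(unsigned long a, unsigned long b)
{
	return a > b ? a - b : b - a;
}

/* Hypothetical rounding rule: divide the parent by an integer 1..max_div
 * and keep the result closest to the requested rate. */
static unsigned long round_rate_div(unsigned long rate, unsigned long parent,
				    unsigned long max_div)
{
	unsigned long best = parent;

	for (unsigned long div = 2; div <= max_div; div++) {
		unsigned long r = parent / div;

		if (rate_delta(r, rate) < rate_delta(best, rate))
			best = r;
	}
	return best;
}

/* Start from the current parent and switch only to a parent whose rounded
 * rate is strictly closer; ties stay with the current parent. */
static unsigned long pick_parent(unsigned long rate, const unsigned long *parents,
				 size_t n, size_t current, size_t *best_idx)
{
	unsigned long best_rate, best_delta;

	assert(n > 0 && current < n);
	best_rate = round_rate_div(rate, parents[current], 16);
	best_delta = rate_delta(best_rate, rate);
	*best_idx = current;

	for (size_t i = 0; i < n; i++) {
		unsigned long r, d;

		if (i == current)
			continue;
		r = round_rate_div(rate, parents[i], 16);
		d = rate_delta(r, rate);
		if (d < best_delta) {
			best_delta = d;
			best_rate = r;
			*best_idx = i;
		}
	}
	return best_rate;
}
```

With hypothetical parents of 26 MHz (current) and 312 MHz, a 100 MHz request moves to the 312 MHz parent (312/3 = 104 MHz), while a 13 MHz request stays on the 26 MHz parent (26/2 = 13 MHz exactly). rate_delta() compares the two unsigned values before subtracting, so no signed conversion is involved.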

[PATCH] perf: fix 'make help' message error

2014-05-27 Thread Jianyu Zhan

Currently the 'make help' message has this hint:

   use "make prefix= " to install to a particular
   path like make prefix=/usr/local install install-doc

But this is misleading: when I specify "prefix=/usr/local", it is not
respected at all. Instead, what takes effect is the "DESTDIR" variable.
In this case "DESTDIR" has an empty value, so the actual install
directory falls back to $HOME, not '/usr/local'.

Specifying "DESTDIR=/usr/local" will work as desired.

This patch fixes the help message.

Signed-off-by: Jianyu Zhan 
---
 tools/perf/Makefile.perf | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index 895edd3..37c5f90 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -784,8 +784,8 @@ help:
@echo ''
@echo 'Perf install targets:'
@echo '  NOTE: documentation build requires asciidoc, xmlto packages to be installed'
-   @echo '  HINT: use "make prefix= " to install to a particular'
-   @echo 'path like make prefix=/usr/local install install-doc'
+   @echo '  HINT: use "make DESTDIR= " to install to a particular'
+   @echo 'path like "make DESTDIR=/usr/local install install-doc"'
@echo '  install- install compiled binaries'
@echo '  install-doc- install *all* documentation'
@echo '  install-man- install manpage documentation'
-- 
2.0.0-rc3


Re: [PATCH] Staging: rtl8192u: r8180_93cx6.c & r8192U_wx.c fix checkpatch.pl errors and warnings

2014-05-27 Thread DaeSeok Youn

Hi,

2014-05-28 5:28 GMT+09:00 Chaitanya Hazarey :
> Hey DaeSeok,
>
> I fixed all of them except -
>
> this "if" condition like below:
> if (5 == wrqu->encoding.length || 13 == wrqu->encoding.length)
>   mask = 0x00;
> and this should be outside "for" loop
>
> Can you please elaborate on this?
origin code:
if(wrqu->encoding.length!=0){

for(i=0 ; i<4 ; i++){
hwkey[i] |=  key[4*i+0]
if(i==1&&(4*i+1)==wrqu->encoding.length) mask=0x00;
if(i==3&&(4*i+1)==wrqu->encoding.length) mask=0x00;
hwkey[i] |= (key[4*i+1])<<8;
hwkey[i] |= (key[4*i+2])<<16;
hwkey[i] |= (key[4*i+3])<<24;
}

I think the "i == 1 && (4*i+1)" check is the same as comparing the length
against 5, so it could be written as "if (wrqu->encoding.length == 5)".
Once the mask value is set to zero, it doesn't need to be set again.
(I don't know what the "5" means :-) )
It is just my opinion.
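The arithmetic behind this suggestion can be checked mechanically: 4*i+1 is 5 for i == 1 and 13 for i == 3, so the two checks only fire for key lengths 5 and 13 (very likely the 5-byte/40-bit and 13-byte/104-bit WEP key sizes). The sketch below only compares the conditions; whether the check can also be hoisted out of the loop depends on how mask is applied to each key byte, which the quoted excerpt does not fully show.

```c
#include <assert.h>
#include <stdbool.h>

/* The two mask-clearing checks as written in the original loop. */
static bool clears_mask_orig(int i, int len)
{
	assert(i >= 0 && i < 4);	/* loop index range in the quoted code */
	return (i == 1 && (4 * i + 1) == len) || (i == 3 && (4 * i + 1) == len);
}

/* Same checks with 4*i+1 evaluated: 5 when i == 1, 13 when i == 3. */
static bool clears_mask_simplified(int i, int len)
{
	assert(i >= 0 && i < 4);
	return (i == 1 && len == 5) || (i == 3 && len == 13);
}
```

Both functions agree for every loop index and key length, so the simplified form only changes readability, not which iteration clears the mask.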

regards,
Daeseok Youn
>
> Thanks,
>
> Chaitanya
>
> On Mon, May 26, 2014 at 11:56 PM, DaeSeok Youn  wrote:
>> Hi,
>>
>> 2014-05-27 14:43 GMT+09:00 Chaitanya Hazarey :
>>> Fixed the following:
>>>
>>> ERROR: do not use C99 // comments
>>> ERROR: else should follow close brace '}'
>>> ERROR: need consistent spacing around '*' (ctx:WxV)
>>> ERROR: need consistent spacing around '|' (ctx:VxW)
>>> ERROR: space prohibited after that open parenthesis '('
>>> ERROR: space prohibited before that '--' (ctx:WxO)
>>> ERROR: space prohibited before that close parenthesis ')'
>>> ERROR: space required after that ',' (ctx:VxV)
>>> ERROR: space required after that ';' (ctx:VxV)
>>> ERROR: space required after that close brace '}'
>>> ERROR: space required before the open brace '{'
>>> ERROR: space required before the open parenthesis '('
>>> ERROR: spaces required around that '!=' (ctx:VxV)
>>> ERROR: spaces required around that '!=' (ctx:WxV)
>>> ERROR: spaces required around that '&&' (ctx:VxV)
>>> ERROR: spaces required around that '<' (ctx:VxV)
>>> ERROR: spaces required around that '=' (ctx:VxV)
>>> ERROR: spaces required around that '=' (ctx:VxW)
>>> ERROR: spaces required around that '=' (ctx:WxV)
>>> ERROR: spaces required around that '==' (ctx:VxV)
>>> ERROR: that open brace { should be on the previous line
>>> ERROR: trailing statements should be on next line
>>> WARNING: Missing a blank line after declarations
>>> WARNING: missing space after struct definition
>>> WARNING: please, no spaces at the start of a line
>>> WARNING: suspect code indent for conditional statements (16, 16)
>>
>> Please break this up into smaller patches.
>> And you need to resend this patch.
>>>
>>> Signed-off-by: Chaitanya Hazarey 
>>> ---
>>>  drivers/staging/rtl8192u/r8180_93cx6.c |   58 ++---
>>>  drivers/staging/rtl8192u/r8192U_wx.c   |  373 
>>> +---
>>>  2 files changed, 223 insertions(+), 208 deletions(-)
>>>
>>> diff --git a/drivers/staging/rtl8192u/r8180_93cx6.c 
>>> b/drivers/staging/rtl8192u/r8180_93cx6.c
>>> index cd06054..7a0051e 100644
>>> --- a/drivers/staging/rtl8192u/r8180_93cx6.c
>>> +++ b/drivers/staging/rtl8192u/r8180_93cx6.c
>>> @@ -53,7 +53,7 @@ static void eprom_ck_cycle(struct net_device *dev)
>>>  }
>>>
>>>
>>> -static void eprom_w(struct net_device *dev,short bit)
>>> +static void eprom_w(struct net_device *dev, short bit)
>>>  {
>>> u8 cmdreg;
>>>
>>> @@ -86,7 +86,7 @@ static void eprom_send_bits_string(struct net_device 
>>> *dev, short b[], int len)
>>>  {
>>> int i;
>>>
>>> -   for(i=0; i>> +   for (i = 0 ; i < len ; i++) {
>> I think there is no need to add a space between "i = 0" and ";".
>> just like below:
>> for (i = 0; i < len; i++) {
>>
>>> eprom_w(dev, b[i]);
>>> eprom_ck_cycle(dev);
>>> }
>>> @@ -96,50 +96,50 @@ static void eprom_send_bits_string(struct net_device 
>>> *dev, short b[], int len)
>>>  u32 eprom_read(struct net_device *dev, u32 addr)
>>>  {
>>> struct r8192_priv *priv = ieee80211_priv(dev);
>>> -   short read_cmd[]={1,1,0};
>>> +   short read_cmd[] = {1, 1, 0};
>>> short addr_str[8];
>>> int i;
>>> int addr_len;
>>> u32 ret;
>>>
>>> -   ret=0;
>>> -   //enable EPROM programming
>>> +   ret = 0;
>>> +   /* enable EPROM programming */
>>> write_nic_byte_E(dev, EPROM_CMD,
>>>(EPROM_CMD_PROGRAM<>> force_pci_posting(dev);
>>> udelay(EPROM_DELAY);
>>>
>>> -   if (priv->epromtype==EPROM_93c56){
>>> -   addr_str[7]=addr & 1;
>>> -   addr_str[6]=addr & (1<<1);
>>> -   addr_str[5]=addr & (1<<2);
>>> -   addr_str[4]=addr & (1<<3);
>>> -   addr_str[3]=addr & (1<<4);
>>> -   addr_str[2]=addr & (1<<5);
>>> -   addr_str[1]=addr & (1<<6);
>>> -   addr_str[0]=addr & (1<<7);
>>> -   addr_len=8;
>>> -
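The unrolled addr_str[7] = addr & 1; ... addr_str[0] = addr & (1<<7); assignments in the quoted hunk could also be written as a loop in checkpatch-clean style. This is a minimal sketch, assuming the same MSB-first layout the 93c56 branch uses. fill_addr_bits() is a hypothetical helper name and is not part of the driver:

```c
#include <assert.h>

/*
 * Hypothetical helper: store the address bits MSB-first, so that
 * addr_str[len - 1] holds bit 0 and addr_str[0] holds bit (len - 1).
 * Like the original code, each entry keeps the masked value
 * (addr & (1 << bit)), which is nonzero when the bit is set.
 */
static void fill_addr_bits(short addr_str[], unsigned int addr, int len)
{
	int i;

	for (i = 0; i < len; i++)
		addr_str[len - 1 - i] = addr & (1 << i);
}
```

With len = 8 this produces the same eight entries as the unrolled assignments for EPROM_93c56.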

Re: [PATCH] spi: Set cs-gpios to output direction

2014-05-27 Thread Stephen Boyd

On 05/24/14 04:54, Mark Brown wrote:
> On Fri, May 23, 2014 at 05:57:34PM -0700, Stephen Boyd wrote:
>> Some gpios used for cs-gpios may not be configured for output by
>> default. In these cases gpio_set_value() won't have any effect
>> and so the chip select line won't toggle. Request the cs-gpios
>> and set them to output direction once we know if the chip select
>> is default high or default low.
> Currently the SPI framework is expecting that the controller driver will
> own the GPIOs so it's not requesting them at all - starting to request
> them in the core without warning is likely to lead to double requests
> which doesn't seem like the best idea ever.  The driver has to
> understand that there are GPIO chip selects since it needs to figure out
> what to do with any underlying hardware chip selects that it can't stop
> toggling (there may be none or it may be directable into space with
> pinmux but we can't rely on that).  

Ok. My SPI controller is relying on the pinctrl framework to request
these gpios and I didn't have that configured in DT.
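The default level Stephen mentions follows from the device's SPI mode. An active-low chip select should idle high, and an SPI_CS_HIGH device should idle low. Below is a userspace sketch of that decision, loosely modelled on the polarity handling in the SPI core's spi_set_cs(). SPI_CS_HIGH is redefined here with the kernel flag's value (0x04), and cs_gpio_level() and cs_gpio_initial_level() are hypothetical helpers:

```c
#include <assert.h>
#include <stdbool.h>

#define SPI_CS_HIGH 0x04	/* same bit value as the kernel's SPI mode flag */

/*
 * GPIO level that asserts (enable == true) or deasserts the chip select.
 * Active-low is the default; SPI_CS_HIGH inverts the polarity.
 */
static int cs_gpio_level(unsigned int mode, bool enable)
{
	if (mode & SPI_CS_HIGH)
		enable = !enable;
	return !enable;
}

/* Level to program when switching the GPIO to output: chip select deasserted. */
static int cs_gpio_initial_level(unsigned int mode)
{
	return cs_gpio_level(mode, false);
}
```

In a driver, that value would be passed as the initial output value, for example to gpio_direction_output(), once the device's mode is known.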

>
>> I wonder if we should request the gpios when the master controller
>> probes or when a spi device is added? We only know what the default
>> value should be when the spi device is added. On the other hand,
>> we should probably fail probe if the gpio controller isn't ready when
>> the spi master controller probes.
> Right, plus the fact that each driver has to open code the requesting,
> probe deferral handling and so on.  It's not super awesome, the whole
> area around GPIO chip select handling needs a bit of a scorched-earth
> refactoring.
>
> Ideally we'd be able to error out only the device using an individual
> GPIO rather than the whole controller if a GPIO isn't there for some
> reason, so doing it at device time would be nicer, but my recollection is
> that this won't play nicely with deferred probe. It's a while since I
> looked, so I may be misremembering.

Yes. There would need to be some hook into the SPI core from the driver
core that notifies it of any new driver probes. Then we could try to get
any pending cs-gpios again and then add the device that uses that chip
select.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: balance storm

2014-05-27 Thread Libo Chen

On 2014/5/28 4:53, Thomas Gleixner wrote:
> On Tue, 27 May 2014, Libo Chen wrote:
>> On 2014/5/27 17:55, Mike Galbraith wrote:
>>> On Tue, 2014-05-27 at 15:56 +0800, Libo Chen wrote: 
> On 2014/5/26 22:19, Mike Galbraith wrote:
>>> On Mon, 2014-05-26 at 20:16 +0800, Libo Chen wrote: 
> On 2014/5/26 13:11, Mike Galbraith wrote:
>>>
>>> Your synthetic test is the absolute worst case scenario.  There has to
>>> be work between wakeups for select_idle_sibling() to have any chance
>>> whatsoever of turning in a win.  At 0 work, it becomes 100% overhead.
>
> It's not synthetic; it is a real problem in our product. Under no load, it
> wastes a lot of CPU time.
>>>
>>> What happens in your product if you apply the commit I pointed out?
>
> Under no load, CPU usage is up to 60%, but the same apps cost 10% on
> SUSE SP1. The apps use a lot of timers.
>>> Something is rotten.  3.14-rt contains that commit, I ran your test with
>>> 256 threads on 64 core box, saw ~4%.
>>>
>>> Putting master/nopreempt config on box and doing the same test, box is
>>> chewing up truckloads of CPU, but not from migrations. 
>>>
>>> perf top -g --sort=symbol
>> in my box:
>>
>> perf top -g --sort=symbol
>>
>> Events: 3K cycles
>>  73.27%  [k] read_hpet
> 
> Why is that machine using read_hpet() ?
> 
> Please provide the output of 
> 
> # dmesg | grep -i tsc
> 

Euler:/home # dmesg  | grep -i tsc
[0.00] Fast TSC calibration using PIT
[0.226921] TSC synchronization [CPU#0 -> CPU#1]:
[0.227142] Measured 1053728 cycles TSC warp between CPUs, turning off TSC clock.
[0.008000] Marking TSC unstable due to check_tsc_sync_source failed

> and
> 
> # cat /sys/devices/system/clocksource/clocksource0/available_clocksource

hpet acpi_pm

> 
> and
> 
> # cat /sys/devices/system/clocksource/clocksource0/current_clocksource

hpet
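Both sysfs files queried above contain space-separated clocksource names. Below is a small userspace sketch that checks whether a given clocksource, such as tsc, appears in the contents of available_clocksource. Reading the file itself is left out, and has_clocksource() is a hypothetical helper:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if 'name' appears as a whole whitespace-separated token in 'list'. */
static bool has_clocksource(const char *list, const char *name)
{
	size_t n = strlen(name);
	const char *p = list;

	while (*p) {
		/* skip separators (spaces, tabs, trailing newline) */
		while (*p == ' ' || *p == '\t' || *p == '\n')
			p++;
		const char *start = p;
		while (*p && *p != ' ' && *p != '\t' && *p != '\n')
			p++;
		if ((size_t)(p - start) == n && strncmp(start, name, n) == 0)
			return true;
	}
	return false;
}
```

On the box above, the list is "hpet acpi_pm". tsc is absent because the TSC was marked unstable, as the dmesg output shows.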

> 
> Thanks,
> 
>   tglx



Re: balance storm

2014-05-27 Thread Libo Chen

On 2014/5/27 21:20, Mike Galbraith wrote:
> On Tue, 2014-05-27 at 20:50 +0800, Libo Chen wrote:
> 
>> in my box:
>>
>> perf top -g --sort=symbol
>>
>> Events: 3K cycles
>>  73.27%  [k] read_hpet
>>   4.30%  [k] _raw_spin_lock_irqsave
>>   1.88%  [k] __schedule
>>   1.00%  [k] idle_cpu
>>   0.91%  [k] native_write_msr_safe
>>   0.68%  [k] select_task_rq_fair
>>   0.51%  [k] module_get_kallsym
>>   0.49%  [.] sem_post
>>   0.44%  [.] main
>>   0.41%  [k] menu_select
>>   0.39%  [k] _raw_spin_lock
>>   0.38%  [k] __switch_to
>>   0.33%  [k] _raw_spin_lock_irq
>>   0.32%  [k] format_decode
>>   0.29%  [.] usleep
>>   0.28%  [.] symbols__insert
>>   0.27%  [k] tick_nohz_stop_sched_tick
>>   0.27%  [k] update_stats_wait_end
>>   0.26%  [k] apic_timer_interrupt
>>   0.25%  [k] enqueue_entity
>>   0.25%  [k] sched_clock_local
>>   0.24%  [k] _raw_spin_unlock_irqrestore
>>   0.24%  [k] select_idle_sibling
> 
> read_hpet?  Are you booting box notsc or something?  Migration cost is
> the least of your worries.

Oh yes, there is no TSC, only HPET, in my box. I don't know why read_hpet is hot.
But when I bind the tasks to individual CPUs, the cost declines rapidly, yet perf
still shows read_hpet as hot.

after bind

Events: 561K cycles
 64.18%  [kernel]  [k] read_hpet
  5.51%  usleep    [.] main
  2.71%  [kernel]  [k] __schedule
  1.82%  [kernel]  [k] _raw_spin_lock_irqsave
  1.56%  libc-2.11.3.so  [.] usleep
  1.07%  [kernel]  [k] apic_timer_interrupt
  0.89%  libc-2.11.3.so  [.] __GI___libc_nanosleep
  0.82%  [kernel]  [k] native_write_msr_safe
  0.82%  [kernel]  [k] ktime_get
  0.71%  [kernel]  [k] trace_hardirqs_off
  0.63%  [kernel]  [k] __switch_to
  0.60%  [kernel]  [k] _raw_spin_unlock_irqrestore
  0.47%  [kernel]  [k] menu_select
  0.46%  [kernel]  [k] _raw_spin_lock
  0.45%  [kernel]  [k] enqueue_entity
  0.45%  [kernel]  [k] sched_clock_local
  0.43%  [kernel]  [k] try_to_wake_up
  0.42%  [kernel]  [k] hrtimer_nanosleep
  0.36%  [kernel]  [k] do_nanosleep
  0.35%  [kernel]  [k] _raw_spin_lock_irq
  0.34%  [kernel]  [k] rb_insert_color
  0.29%  [kernel]  [k] update_curr
  0.29%  [kernel]  [k] native_sched_clock
  0.28%  [kernel]  [k] hrtimer_interrupt
  0.28%  [kernel]  [k] rcu_idle_exit_common
  0.27%  [kernel]  [k] hrtimer_init
  0.27%  [kernel]  [k] __hrtimer_start_range_ns
  0.26%  [kernel]  [k] __rb_erase_color
  0.26%  [kernel]  [k] lock_hrtimer_base
  0.25%  [kernel]  [k] trace_hardirqs_on
  0.23%  [kernel]  [k] rcu_idle_enter_common
  0.23%  [kernel]  [k] cpuidle_idle_call
  0.23%  [kernel]  [k] finish_task_switch
  0.22%  [kernel]  [k] set_next_entity
  0.22%  [kernel]  [k] cpuacct_charge
  0.22%  [kernel]  [k] pick_next_task_fair
  0.21%  [kernel]  [k] sys_nanosleep
  0.20%  [kernel]  [k] rb_next
  0.20%  [kernel]  [k] start_critical_timings
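On Linux, binding a thread to a single CPU is typically done with sched_setaffinity(), or with pthread_setaffinity_np() for a specific thread. The thread does not say how the test actually bound its tasks. The following is only a minimal Linux/glibc sketch that pins the calling thread to the first CPU in its current mask. pin_to_first_allowed_cpu() is a hypothetical helper:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/*
 * Pin the calling thread to the lowest-numbered CPU it is currently
 * allowed to run on. Returns that CPU number, or -1 on error.
 */
static int pin_to_first_allowed_cpu(void)
{
	cpu_set_t allowed, one;
	int cpu;

	if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
		return -1;

	for (cpu = 0; cpu < CPU_SETSIZE; cpu++) {
		if (!CPU_ISSET(cpu, &allowed))
			continue;
		CPU_ZERO(&one);
		CPU_SET(cpu, &one);
		/* pid 0 means "the calling thread" */
		if (sched_setaffinity(0, sizeof(one), &one) != 0)
			return -1;
		return cpu;
	}
	return -1;
}
```

Pinning reduces migrations, but it does not change the clocksource. That matches the profile above, where read_hpet stays on top after binding.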
> 
> -Mike
> 
> 
> 



[f2fs-dev] [PATCH v2] f2fs: avoid overflow when large directory feature is enabled

2014-05-27 Thread Chao Yu

When the large directory feature is enabled, there is one case that could cause
an overflow in dir_buckets(), as follows:
special case: level + dir_level >= 32 and level < MAX_DIR_HASH_DEPTH / 2.

Here we define MAX_DIR_BUCKETS to limit the return value when the condition
could trigger potential overflow.
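A concrete example: with MAX_DIR_HASH_DEPTH = 63, the old check only compared level against 31 (MAX_DIR_HASH_DEPTH / 2). As a result, level + dir_level could reach 32 or more, and 1 << (level + dir_level) would shift past the width of a 32-bit int, which is undefined behaviour in C. Below is a userspace sketch of the patched logic, copying the constants from the hunks below. The name dir_buckets_fixed() is only for illustration:

```c
#include <assert.h>

#define MAX_DIR_HASH_DEPTH	63
/* Largest bucket count per level: 1 << 30 */
#define MAX_DIR_BUCKETS		(1 << ((MAX_DIR_HASH_DEPTH / 2) - 1))

/* Patched dir_buckets(): never shifts by more than 30 bits. */
static unsigned int dir_buckets_fixed(unsigned int level, int dir_level)
{
	if (level + dir_level < MAX_DIR_HASH_DEPTH / 2)
		return 1 << (level + dir_level);
	else
		return MAX_DIR_BUCKETS;
}
```

For example, level 20 with dir_level 15 passed the old test (20 < 31) and computed 1 << 35. The patched check caps the result at MAX_DIR_BUCKETS.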

Changes from V1
 o modify the description of the calculation in f2fs.txt, as suggested by Changman Lee.

Suggested-by: Changman Lee 
Signed-off-by: Chao Yu 
---
 Documentation/filesystems/f2fs.txt |8 
 fs/f2fs/dir.c  |4 ++--
 include/linux/f2fs_fs.h|3 +++
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/Documentation/filesystems/f2fs.txt b/Documentation/filesystems/f2fs.txt
index 25311e11..51afba1 100644
--- a/Documentation/filesystems/f2fs.txt
+++ b/Documentation/filesystems/f2fs.txt
@@ -461,11 +461,11 @@ The number of blocks and buckets are determined by,
   # of blocks in level #n = |
 `- 4, Otherwise
 
- ,- 2^ (n + dir_level),
-|if n < MAX_DIR_HASH_DEPTH / 2,
+ ,- 2^(n + dir_level),
+|if n + dir_level < MAX_DIR_HASH_DEPTH / 2,
   # of buckets in level #n = |
- `- 2^((MAX_DIR_HASH_DEPTH / 2 + dir_level) - 1),
- Otherwise
+ `- 2^((MAX_DIR_HASH_DEPTH / 2) - 1),
+ Otherwise
 
 When F2FS finds a file name in a directory, at first a hash value of the file
 name is calculated. Then, F2FS scans the hash table in level #0 to find the
diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index c3f1485..966acb0 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -23,10 +23,10 @@ static unsigned long dir_blocks(struct inode *inode)
 
 static unsigned int dir_buckets(unsigned int level, int dir_level)
 {
-   if (level < MAX_DIR_HASH_DEPTH / 2)
+   if (level + dir_level < MAX_DIR_HASH_DEPTH / 2)
return 1 << (level + dir_level);
else
-   return 1 << ((MAX_DIR_HASH_DEPTH / 2 + dir_level) - 1);
+   return MAX_DIR_BUCKETS;
 }
 
 static unsigned int bucket_blocks(unsigned int level)
diff --git a/include/linux/f2fs_fs.h b/include/linux/f2fs_fs.h
index 8c03f71..ba6f312 100644
--- a/include/linux/f2fs_fs.h
+++ b/include/linux/f2fs_fs.h
@@ -394,6 +394,9 @@ typedef __le32  f2fs_hash_t;
 /* MAX level for dir lookup */
 #define MAX_DIR_HASH_DEPTH 63
 
+/* MAX buckets in one level of dir */
+#define MAX_DIR_BUCKETS    (1 << ((MAX_DIR_HASH_DEPTH / 2) - 1))
+
 #define SIZE_OF_DIR_ENTRY  11  /* by byte */
 #define SIZE_OF_DENTRY_BITMAP  ((NR_DENTRY_IN_BLOCK + BITS_PER_BYTE - 1) / \
BITS_PER_BYTE)
-- 
1.7.9.5



Re: [PATCH RESEND v2 1/4] mfd: intel_soc_pmic: Core driver

2014-05-27 Thread Zhu, Lejun


On 5/27/2014 7:20 PM, Mark Brown wrote:
> On Tue, May 27, 2014 at 08:48:58AM +0800, Zhu, Lejun wrote:
>> On 5/26/2014 10:51 PM, Mark Brown wrote:
> 
 We created these names to hide the implementation of how read/write is
 done from other platform specific patches interacting with this driver.
 So when we change the implementation, e.g. from I2C read/write to
 regmap, we don't have to touch all these patches.
> 
>>> This sort of HAL is frowned upon in the upstream kernel.
> 
>> We want to do what other MFD drivers have been doing, and make it easier for
>> the callers. A couple of similar examples are intel_msic_reg_read() and
>> lp3943_read_byte(). We want to do the same with intel_soc_pmic_readb(),
>> and I don't think it's too odd.
> 
> The odd and problematic bit is the global variable part of things -
> these wrappers are usually just doing lookup of the underlying I/O
> handle in the struct for the device and can be implemented as static
> inlines in the header.
> 

Oh I see. Sorry I missed your point. So you are saying "int
intel_soc_pmic_readb(int reg)" is bad, but if I have:

int intel_soc_pmic_readb(struct intel_soc_pmic *pmic, int reg)
{
	int ret;
	unsigned int val;

	ret = regmap_read(pmic->regmap, reg, &val);
	if (!ret)
		ret = val;

	return ret;
}

And have the caller (device or core) look up *pmic and pass it in, would
this be OK?

Best Regards
Lejun
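Mark's suggestion amounts to this: keep the I/O handle in the per-device structure, and make the accessor a thin static inline that takes that structure. Below is a userspace model of the shape Lejun proposes. The fake_regmap type and fake_regmap_read() stand in for the real regmap and regmap_read(), and the model assumes only a regmap member in struct intel_soc_pmic:

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for a regmap: a flat array of 8-bit registers. */
struct fake_regmap {
	unsigned char regs[256];
};

/* Mimics regmap_read(): 0 on success, negative errno on failure. */
static int fake_regmap_read(struct fake_regmap *map, unsigned int reg,
			    unsigned int *val)
{
	if (reg >= sizeof(map->regs))
		return -EINVAL;
	*val = map->regs[reg];
	return 0;
}

/* Per-device state: the I/O handle lives here, not in a global. */
struct intel_soc_pmic {
	struct fake_regmap *regmap;
};

/* Accessor in the proposed style: register value, or negative errno. */
static inline int intel_soc_pmic_readb(struct intel_soc_pmic *pmic, int reg)
{
	unsigned int val;
	int ret;

	ret = fake_regmap_read(pmic->regmap, reg, &val);
	if (!ret)
		ret = val;

	return ret;
}
```

Because the accessor depends only on its arguments, it can live in a header as a static inline, and no global is needed.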

Re: [PATCH v2] gpio: Add support for Intel SoC PMIC (Crystal Cove)

2014-05-27 Thread Zhu, Lejun



On 5/27/2014 5:11 PM, Linus Walleij wrote:
> On Wed, May 21, 2014 at 7:22 AM, Zhu, Lejun  wrote:
> 
>> Devices based on Intel SoC products such as Baytrail have a Power
>> Management IC. In the PMIC there are subsystems for voltage regulation,
>> A/D conversion, GPIO and PWMs. The PMIC in Baytrail-T platform is called
>> Crystal Cove.
>>
>> This patch adds support for the GPIO function in Crystal Cove.
>>
>> v2:
>> - Use IRQ chip helper to provide irqdomain.
>> - Implement .remove and can now build as a module.
>> - Various fixes for unreadable or ugly code pieces.
> 
> This is much improved! I still have comments though.
> 
>> +#define GPIO_TO_CTL(gpio, dir) \
>> +   ((gpio < 8 ? GPIO0P0CTL ## dir : GPIO1P0CTL ## dir) + (gpio % 8))
> 
> This is unreadable. Use an explicit static inline function instead.
> 
>> +static void crystalcove_update_irq_type(int gpio, int type)
>> +{
>> +   u8 ctli = GPIO_TO_CTL(gpio, I);
>> +
>> +   type &= IRQ_TYPE_EDGE_BOTH;
> 
> You silently ignore all other type configurations?
> 
>> +   intel_soc_pmic_clearb(ctli, CTLI_INTCNT_BE);
>> +
>> +   if (type == IRQ_TYPE_EDGE_BOTH)
>> +   intel_soc_pmic_setb(ctli, CTLI_INTCNT_BE);
>> +   else if (type == IRQ_TYPE_EDGE_RISING)
>> +   intel_soc_pmic_setb(ctli, CTLI_INTCNT_PE);
>> +   else if (type & IRQ_TYPE_EDGE_FALLING)
>> +   intel_soc_pmic_setb(ctli, CTLI_INTCNT_NE);
>> +}
> 
> I would prefer a switch(type) {} construction with a default:
> that warns.
> 
> (...)
>> +static int crystalcove_irq_type(struct irq_data *data, unsigned type)
>> +{
>> +   struct gpio_chip *gc = irq_data_get_irq_chip_data(data);
>> +   struct crystalcove_gpio *cg =
>> +   container_of(gc, struct crystalcove_gpio, chip);
> 
> I would create a static inline at the top of the file instead of
> using container_of() explicitly everywhere:
> 
> static inline struct crystalcove_gpio *to_cg(struct gpio_chip *gc)
> {
> return container_of(gc, struct crystalcove_gpio, chip);
> }
> 
> Then just use:
> 
> struct crystalcove_gpio *cg = to_cg(gc);
> 
> Everywhere. Or if you only want the cg in some case (like this?)
> 
> struct crystalcove_gpio *cg = to_cg(irq_data_get_irq_chip_data(data));
> 
>> +   cg->trigger_type = type;
>> +   cg->update |= UPDATE_TYPE;
>> +
>> +   return 0;
>> +}
> (...)
> 
>> +   gpiochip_irqchip_add(>chip, _irqchip, 0,
>> +handle_simple_irq, IRQ_TYPE_NONE);
> 
> Really nice. Thanks for doing this!
> 
> Yours,
> Linus Walleij
> 

Thank you. I will return -EINVAL for unsupported IRQ types and fix all
the coding style problems you listed here in the next version.
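Below is a sketch of the switch-based version Linus asks for, rejecting unsupported types with -EINVAL as proposed above. It is a userspace model. The IRQ_TYPE_* values match the kernel's definitions. The CTLI_INTCNT_* encodings and the crystalcove_intcnt() helper are hypothetical placeholders for the driver's real register bits:

```c
#include <assert.h>
#include <errno.h>

/* Same values as the kernel's IRQ_TYPE_* constants. */
#define IRQ_TYPE_NONE		0x0
#define IRQ_TYPE_EDGE_RISING	0x1
#define IRQ_TYPE_EDGE_FALLING	0x2
#define IRQ_TYPE_EDGE_BOTH	(IRQ_TYPE_EDGE_FALLING | IRQ_TYPE_EDGE_RISING)
#define IRQ_TYPE_LEVEL_HIGH	0x4

/* Hypothetical interrupt-control field encodings (BE = NE | PE). */
#define CTLI_INTCNT_DIS		(0 << 1)
#define CTLI_INTCNT_NE		(1 << 1)
#define CTLI_INTCNT_PE		(2 << 1)
#define CTLI_INTCNT_BE		(3 << 1)

/*
 * Map an IRQ trigger type to the interrupt-control bits, or return
 * -EINVAL for trigger types the hardware cannot do.
 */
static int crystalcove_intcnt(unsigned int type, unsigned int *bits)
{
	switch (type) {
	case IRQ_TYPE_NONE:
		*bits = CTLI_INTCNT_DIS;
		return 0;
	case IRQ_TYPE_EDGE_BOTH:
		*bits = CTLI_INTCNT_BE;
		return 0;
	case IRQ_TYPE_EDGE_RISING:
		*bits = CTLI_INTCNT_PE;
		return 0;
	case IRQ_TYPE_EDGE_FALLING:
		*bits = CTLI_INTCNT_NE;
		return 0;
	default:
		return -EINVAL;
	}
}
```

In the driver, crystalcove_irq_type() could call a helper like this and return its error. The IRQ core would then refuse the unsupported trigger instead of silently ignoring it.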

Best Regards
Lejun


Re: [PATCH 11/16] byteorder: provide a linux/byteorder.h with {be,le}_to_cpu() and cpu_to_{be,le}() macros

2014-05-27 Thread Joe Perches

On Tue, 2014-05-27 at 17:22 -0700, Cody P Schafer wrote:
> Rather than manually specifying the size of the integer to be converted, key
> off of the type size. This reduces duplicate size info and the occurrence of
> certain types of bugs (using the wrong-sized conversion).
[]
> diff --git a/include/linux/byteorder.h b/include/linux/byteorder.h
[]
> @@ -0,0 +1,34 @@
> +#ifndef LINUX_BYTEORDER_H_
> +#define LINUX_BYTEORDER_H_
> +
> +#include 
> +
> +#define be_to_cpu(v) \
> + __builtin_choose_expr(sizeof(v) == sizeof(uint8_t) , v, \
> + __builtin_choose_expr(sizeof(v) == sizeof(uint16_t), be16_to_cpu(v), \
> + __builtin_choose_expr(sizeof(v) == sizeof(uint32_t), be32_to_cpu(v), \
> + __builtin_choose_expr(sizeof(v) == sizeof(uint64_t), be64_to_cpu(v), \
> + (void)0

It's probably better to use BUILD_BUG() instead of these (void)0 fallbacks.

> +
> +#define le_to_cpu(v) \
> + __builtin_choose_expr(sizeof(v) == sizeof(uint8_t) , v, \
> + __builtin_choose_expr(sizeof(v) == sizeof(uint16_t), le16_to_cpu(v), \
> + __builtin_choose_expr(sizeof(v) == sizeof(uint32_t), le32_to_cpu(v), \
> + __builtin_choose_expr(sizeof(v) == sizeof(uint64_t), le64_to_cpu(v), \
> + (void)0
> +
> +#define cpu_to_le(v) \
> + __builtin_choose_expr(sizeof(v) == sizeof(uint8_t) , v, \
> + __builtin_choose_expr(sizeof(v) == sizeof(uint16_t), cpu_to_le16(v), \
> + __builtin_choose_expr(sizeof(v) == sizeof(uint32_t), cpu_to_le32(v), \
> + __builtin_choose_expr(sizeof(v) == sizeof(uint64_t), cpu_to_le64(v), \
> + (void)0
> +
> +#define cpu_to_be(v) \
> + __builtin_choose_expr(sizeof(v) == sizeof(uint8_t) , v, \
> + __builtin_choose_expr(sizeof(v) == sizeof(uint16_t), cpu_to_be16(v), \
> + __builtin_choose_expr(sizeof(v) == sizeof(uint32_t), cpu_to_be32(v), \
> + __builtin_choose_expr(sizeof(v) == sizeof(uint64_t), cpu_to_be64(v), \
> + (void)0
> +
> +#endif
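One way to follow Joe's suggestion is to make the final fallback fail the build instead of evaluating to (void)0. Below is a userspace sketch for GCC. It assumes __builtin_bswap16/32/64 can stand in for the kernel's beNN_to_cpu() helpers on a little-endian host. be_to_cpu_bad_size() is a hypothetical, deliberately undefined function tagged with GCC's error attribute, which is roughly how the kernel's compile-time assertion machinery reports failures:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Big-endian to CPU order for each width (byte swap only on little-endian hosts). */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define be16_to_host(x)	__builtin_bswap16(x)
#define be32_to_host(x)	__builtin_bswap32(x)
#define be64_to_host(x)	__builtin_bswap64(x)
#else
#define be16_to_host(x)	((uint16_t)(x))
#define be32_to_host(x)	((uint32_t)(x))
#define be64_to_host(x)	((uint64_t)(x))
#endif

/*
 * Never defined. If the fallback branch below is ever selected, GCC
 * reports this error at compile time instead of silently yielding (void)0.
 */
extern int be_to_cpu_bad_size(void)
	__attribute__((error("be_to_cpu: unsupported operand size")));

#define be_to_cpu(v)							\
	__builtin_choose_expr(sizeof(v) == 1, (v),			\
	__builtin_choose_expr(sizeof(v) == 2, be16_to_host(v),		\
	__builtin_choose_expr(sizeof(v) == 4, be32_to_host(v),		\
	__builtin_choose_expr(sizeof(v) == 8, be64_to_host(v),		\
	be_to_cpu_bad_size()))))

/* Example use: decode a 32-bit big-endian field from a byte buffer. */
static uint32_t read_be32(const unsigned char *p)
{
	uint32_t raw;

	memcpy(&raw, p, sizeof(raw));
	return be_to_cpu(raw);
}
```

The unselected __builtin_choose_expr operands generate no code, so the error fires only when no size matches.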




Re: [PATCH v2] tracing: Don't account for cpu idle time with irqsoff tracers

2014-05-27 Thread Steven Rostedt

On Tue, 2014-05-27 at 17:11 -0700, Stephen Boyd wrote:

> cpuidle_enter_state() calls ktime_get() which on lockdep enabled builds
> calls seqcount_lockdep_reader_access() which calls local_irq_save() that

seqcount_lockdep_reader_access()?? Ug, I wonder if that should call
raw_local_irq_save/restore() as it's a lockdep helper to begin with. If
it's wrong then it's the lockdep infrastructure that broke, not the core
kernel.

Peter?

-- Steve


> then turns on the tracer again. Perhaps the problem is that irqsoff
> tracer is triggered even when we aren't transitioning between irqs on
> and irqs off? What about this patch? I assume there is a reason that
> this is wrong, but I don't know what it is.
> 



Re: [PATCH v11 2/3] clk: exynos5410: register clocks using common clock framework

2014-05-27 Thread Mike Turquette

Quoting Tarek Dakhran (2014-05-25 20:23:32)
> The EXYNOS5410 clocks are statically listed and registered
> using the Samsung specific common clock helper functions.
> 
> Signed-off-by: Tarek Dakhran 
> Signed-off-by: Vyacheslav Tyrtov 
> ---
>  .../devicetree/bindings/clock/exynos5410-clock.txt |   45 +
>  drivers/clk/samsung/Makefile   |1 +
>  drivers/clk/samsung/clk-exynos5410.c   |  209 
> 
>  include/dt-bindings/clock/exynos5410.h |   33 
>  4 files changed, 288 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/clock/exynos5410-clock.txt
>  create mode 100644 drivers/clk/samsung/clk-exynos5410.c
>  create mode 100644 include/dt-bindings/clock/exynos5410.h
> 
> diff --git a/Documentation/devicetree/bindings/clock/exynos5410-clock.txt b/Documentation/devicetree/bindings/clock/exynos5410-clock.txt
> new file mode 100644
> index 000..aeab635
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/clock/exynos5410-clock.txt
> @@ -0,0 +1,45 @@
> +* Samsung Exynos5410 Clock Controller
> +
> +The Exynos5410 clock controller generates and supplies clock to various
> +controllers within the Exynos5410 SoC.
> +
> +Required Properties:
> +
> +- compatible: should be "samsung,exynos5410-clock"
> +
> +- reg: physical base address of the controller and length of memory mapped
> +  region.
> +
> +- #clock-cells: should be 1.
> +
> +All available clocks are defined as preprocessor macros in
> +dt-bindings/clock/exynos5410.h header and can be used in device
> +tree sources.
> +
> +External clock:
> +
> +There is one clock that is generated outside the SoC. It
> +is expected to be defined using the standard clock bindings
> +with the following clock-output-name:
> +
> + - "fin_pll" - PLL input clock from XXTI

Does fin_pll feed into the exynos5410-clock controller? If so, should
the example clock-controller node below have a clocks and clock-names
property?

Otherwise patch looks good.

Regards,
Mike

> +
> +Example 1: An example of a clock controller node is listed below.
> +
> +   clock: clock-controller@0x1001 {
> +   compatible = "samsung,exynos5410-clock";
> +   reg = <0x1001 0x3>;
> +   #clock-cells = <1>;
> +   };
> +
> +Example 2: UART controller node that consumes the clock generated by the clock
> +  controller. Refer to the standard clock bindings for information
> +  about 'clocks' and 'clock-names' property.
> +
> +   serial@12C2 {
> +   compatible = "samsung,exynos4210-uart";
> +   reg = <0x12C0 0x100>;
> +   interrupts = <0 51 0>;
> +   clocks = <&clock CLK_UART0>, <&clock CLK_SCLK_UART0>;
> +   clock-names = "uart", "clk_uart_baud0";
> +   };
> diff --git a/drivers/clk/samsung/Makefile b/drivers/clk/samsung/Makefile
> index 25646c6..69e8177 100644
> --- a/drivers/clk/samsung/Makefile
> +++ b/drivers/clk/samsung/Makefile
> @@ -7,6 +7,7 @@ obj-$(CONFIG_SOC_EXYNOS3250)+= clk-exynos3250.o
>  obj-$(CONFIG_ARCH_EXYNOS4) += clk-exynos4.o
>  obj-$(CONFIG_SOC_EXYNOS5250)   += clk-exynos5250.o
>  obj-$(CONFIG_SOC_EXYNOS5260)   += clk-exynos5260.o
> +obj-$(CONFIG_SOC_EXYNOS5410)   += clk-exynos5410.o
>  obj-$(CONFIG_SOC_EXYNOS5420)   += clk-exynos5420.o
>  obj-$(CONFIG_SOC_EXYNOS5440)   += clk-exynos5440.o
>  obj-$(CONFIG_ARCH_EXYNOS)  += clk-exynos-audss.o
> diff --git a/drivers/clk/samsung/clk-exynos5410.c b/drivers/clk/samsung/clk-exynos5410.c
> new file mode 100644
> index 000..c9505ab
> --- /dev/null
> +++ b/drivers/clk/samsung/clk-exynos5410.c
> @@ -0,0 +1,209 @@
> +/*
> + * Copyright (c) 2013 Samsung Electronics Co., Ltd.
> + * Author: Tarek Dakhran 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * Common Clock Framework support for Exynos5410 SoC.
> +*/
> +
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "clk.h"
> +
> +#define APLL_LOCK   0x0
> +#define APLL_CON0   0x100
> +#define CPLL_LOCK   0x10020
> +#define CPLL_CON0   0x10120
> +#define MPLL_LOCK   0x4000
> +#define MPLL_CON0   0x4100
> +#define BPLL_LOCK   0x20010
> +#define BPLL_CON0   0x20110
> +#define KPLL_LOCK   0x28000
> +#define KPLL_CON0   0x28100
> +
> +#define SRC_CPU    0x200
> +#define DIV_CPU0   0x500
> +#define SRC_CPERI1 0x4204
> +#define DIV_TOP0   0x10510
> +#define DIV_TOP1   0x10514
> +#define DIV_FSYS1  0x1054c
> +#define DIV_FSYS2  0x10550
> +#define DIV_PERIC0 0x10558
> +#define SRC_TOP0   0x10210
> +#define SRC_TOP1

Re: [PATCH 6/6] mm/zpool: prevent zbud/zsmalloc from unloading when used

2014-05-27 Thread Dan Streetman

On Tue, May 27, 2014 at 6:40 PM, Seth Jennings  wrote:
> On Sat, May 24, 2014 at 03:06:09PM -0400, Dan Streetman wrote:
>> Add try_module_get() to pool creation functions for zbud and zsmalloc,
>> and module_put() to pool destruction functions, since they now can be
>> modules used via zpool.  Without usage counting, they could be unloaded
>> while pool(s) were active, resulting in an oops.
>
> I like the idea here, but what about doing this in the zpool layer? For
> me, it is kinda weird for a module to be taking a ref on itself.  Maybe
> this is accepted practice.  Is there precedent for this?

It's done in some places already:
git grep try_module_get\(THIS_MODULE | wc -l
83

but it definitely could be done in zpool, and since other users of
zbud/zsmalloc would be calling their functions directly, instead of
indirectly by driver registration, I believe the module dependency
there would prevent zbud/zsmalloc unloading while a using module was
still loaded (if I understand module usage counting correctly).

>
> What about having the zbud/zsmalloc drivers pass their module pointers
> to zpool_register_driver() as an additional field in struct zpool_driver
> and have zpool take the reference?  Since zpool is the one in trouble if
> the driver is unloaded.

Yep this seems to be the other common way of doing it, with a ->owner
field in the registered struct.  Either way is fine with me, and zpool
definitely is the one in trouble if its driver is unloaded.  I'll
update for v4 of this patch set.
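The ->owner approach can be modelled in a few lines. This userspace sketch only illustrates the reference-counting shape. struct module, try_module_get() and module_put() here are simplified stand-ins for the kernel's versions. The zpool_driver and zpool_create_pool() layout is an assumption about how v4 might look, not the posted code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Simplified stand-in for the kernel's struct module refcounting. */
struct module {
	int refcnt;
	bool unloading;
};

static bool try_module_get(struct module *mod)
{
	if (mod->unloading)
		return false;
	mod->refcnt++;
	return true;
}

static void module_put(struct module *mod)
{
	mod->refcnt--;
}

/* Hypothetical driver descriptor: the backend names its owning module. */
struct zpool_driver {
	const char *type;
	struct module *owner;
};

struct zpool {
	struct zpool_driver *driver;
};

/* zpool, not the backend, pins the backend module for the pool's lifetime. */
static struct zpool *zpool_create_pool(struct zpool_driver *driver)
{
	struct zpool *pool;

	if (!try_module_get(driver->owner))
		return NULL;

	pool = malloc(sizeof(*pool));
	if (!pool) {
		module_put(driver->owner);
		return NULL;
	}
	pool->driver = driver;
	return pool;
}

static void zpool_destroy_pool(struct zpool *pool)
{
	module_put(pool->driver->owner);
	free(pool);
}
```

With this shape, zbud and zsmalloc only fill in .owner, and the module never takes a reference on itself.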

>
> Seth
>
>>
>> Signed-off-by: Dan Streetman 
>> Cc: Seth Jennings 
>> Cc: Minchan Kim 
>> Cc: Nitin Gupta 
>> Cc: Weijie Yang 
>> ---
>>
>> New for this patch set.
>>
>>  mm/zbud.c | 5 +
>>  mm/zsmalloc.c | 5 +
>>  2 files changed, 10 insertions(+)
>>
>> diff --git a/mm/zbud.c b/mm/zbud.c
>> index 8a72cb1..2b3689c 100644
>> --- a/mm/zbud.c
>> +++ b/mm/zbud.c
>> @@ -282,6 +282,10 @@ struct zbud_pool *zbud_create_pool(gfp_t gfp, struct zbud_ops *ops)
>>   pool = kmalloc(sizeof(struct zbud_pool), GFP_KERNEL);
>>   if (!pool)
>>   return NULL;
>> + if (!try_module_get(THIS_MODULE)) {
>> + kfree(pool);
>> + return NULL;
>> + }
>>   spin_lock_init(&pool->lock);
>>   for_each_unbuddied_list(i, 0)
>>   INIT_LIST_HEAD(&pool->unbuddied[i]);
>> @@ -302,6 +306,7 @@ struct zbud_pool *zbud_create_pool(gfp_t gfp, struct zbud_ops *ops)
>>  void zbud_destroy_pool(struct zbud_pool *pool)
>>  {
>>   kfree(pool);
>> + module_put(THIS_MODULE);
>>  }
>>
>>  /**
>> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
>> index 07c3130..2cc2647 100644
>> --- a/mm/zsmalloc.c
>> +++ b/mm/zsmalloc.c
>> @@ -946,6 +946,10 @@ struct zs_pool *zs_create_pool(gfp_t flags)
>>   pool = kzalloc(ovhd_size, GFP_KERNEL);
>>   if (!pool)
>>   return NULL;
>> + if (!try_module_get(THIS_MODULE)) {
>> + kfree(pool);
>> + return NULL;
>> + }
>>
>>   for (i = 0; i < ZS_SIZE_CLASSES; i++) {
>>   int size;
>> @@ -985,6 +989,7 @@ void zs_destroy_pool(struct zs_pool *pool)
>>   }
>>   }
>>   kfree(pool);
>> + module_put(THIS_MODULE);
>>  }
>>  EXPORT_SYMBOL_GPL(zs_destroy_pool);
>>
>> --
>> 1.8.3.1
>>

Re: [PATCH v2 0/5] Smart Card(SC) interface, TI USIM & NxP SC phy driver

2014-05-27 Thread Greg KH

On Fri, May 16, 2014 at 09:14:34AM +0530, Satish Patel wrote:
> 
> On 1/30/2014 6:35 PM, Greg KH wrote:
> >On Thu, Jan 30, 2014 at 11:22:48AM +0530, Satish Patel wrote:
> >>On 1/20/2014 10:03 AM, Satish Patel wrote:
> >>>Changes from v1:
> >>>* RFC(v1) comments are fixed
> >>>
> >>>** removed "gpio_to_irq" as GPIO controller process  cell from DT and
> >>>give it to DT node
> >>>** comments on documentation
> >>>** few other comments on null checks are resolved
> >>>
> >>>* BWT timing configuration is added to ti-usim driver
> >>>
> >>>v1 cover letter link#
> >>>https://lkml.org/lkml/2014/1/6/250
> >>>
> >>>Satish Patel (5):
> >>>   sc_phy:SmartCard(SC) PHY interface to SC controller
> >>>   misc: tda8026: Add NXP TDA8026 PHY driver
> >>>   char: ti-usim: Add driver for USIM module on AM43xx
> >>>   ARM: dts: AM43xx: DT entries added for ti-usim
> >>>   ARM: dts: AM43xx-epos-evm: DT entries  for ti-usim and phy
> >>>
> >>>  Documentation/devicetree/bindings/misc/tda8026.txt |   19 +
> >>>  .../devicetree/bindings/ti-usim/ti-usim.txt|   31 +
> >>>  Documentation/sc_phy.txt   |  171 ++
> >>>  arch/arm/boot/dts/am4372.dtsi  |   10 +
> >>>  arch/arm/boot/dts/am43x-epos-evm.dts   |   43 +
> >>>  drivers/char/Kconfig   |7 +
> >>>  drivers/char/Makefile  |1 +
> >>>  drivers/char/ti-usim-hw.h  |  863 +
> >>>  drivers/char/ti-usim.c | 1859 
> >>> 
> >>>  drivers/misc/Kconfig   |7 +
> >>>  drivers/misc/Makefile  |1 +
> >>>  drivers/misc/tda8026.c | 1255 +
> >>>  include/linux/sc_phy.h |  132 ++
> >>>  include/linux/ti-usim.h|   98 +
> >>>  14 files changed, 4497 insertions(+), 0 deletions(-)
> >>>  create mode 100644 Documentation/devicetree/bindings/misc/tda8026.txt
> >>>  create mode 100644 Documentation/devicetree/bindings/ti-usim/ti-usim.txt
> >>>  create mode 100644 Documentation/sc_phy.txt
> >>>  create mode 100644 drivers/char/ti-usim-hw.h
> >>>  create mode 100644 drivers/char/ti-usim.c
> >>>  create mode 100644 drivers/misc/tda8026.c
> >>>  create mode 100644 include/linux/sc_phy.h
> >>>  create mode 100644 include/linux/ti-usim.h
> >>Any comments on this patch series?
> >>
> >>If not,
> >>can you accept these patches for the next merge window?
> >It's the middle of this merge window, and I can't accept any patches
> >until after 3.14-rc1 is out, at which point I'll start to work on my
> >patch backlog.
> Will these be considered for the next submission? Or do you want me to start
> the review cycle one more time?

I don't have them in my queue, so please resend.

Re: [PATCH v1] of/irq: do irq resolution in platform_get_irq_byname()

2014-05-27 Thread Rob Herring

On Tue, May 27, 2014 at 3:23 PM, Kevin Hilman  wrote:
> On Fri, May 23, 2014 at 1:03 AM, Grant Likely  wrote:
>> On Tue, 20 May 2014 13:42:02 +0300, Grygorii Strashko 
>>  wrote:
>>> The commit 9ec36cafe43bf835f8f29273597a5b0cbc8267ef
>>> "of/irq: do irq resolution in platform_get_irq" from Rob Herring -
>>> moves resolving of the interrupt resources in platform_get_irq().
>>> But this solution isn't complete because platform_get_irq_byname()
>>> needs to be modified the same way.
>>>
>>> Hence, fix it by adding interrupt resolution code at the
>>> platform_get_irq_byname() function too.
>>>
>>> Cc: Russell King 
>>> Cc: Rob Herring 
>>> Cc: Tony Lindgren 
>>> Cc: Grant Likely 
>>> Cc: Thierry Reding 
>>>
>>> Signed-off-by: Grygorii Strashko 
>>
>> Applied, Thanks.
>
> As of next-20140526, the ST u8500 Snowball board has been failing boot
> in linux-next, and was bisected down to this patch (commit
> ad69674e73a1 in -next).   Full boot failure attached.
>
> I have not dug any deeper, but can confirm that next-20140526 with
> this patch reverted boots again on the snowball board.

There's a patch on the list which fixes it. The problem is that the stmmac
driver was expecting only one error code.

Rob
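The failure mode Rob describes is a caller matching one specific error code. A caller can avoid it by treating any negative return from platform_get_irq_byname() as an error value. Below is a userspace sketch of that pattern for an optional interrupt. get_optional_irq() is hypothetical. EPROBE_DEFER is redefined here because it is a kernel-internal errno (517) that userspace headers do not provide:

```c
#include <assert.h>
#include <errno.h>

#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517	/* kernel-internal value, redefined for this model */
#endif

/*
 * Resolve an optional IRQ from a lookup result.
 * - ret >= 0: a valid IRQ number, use it.
 * - -EPROBE_DEFER: propagate, so probing can be retried later.
 * - any other negative value: the IRQ is unavailable, use the fallback.
 */
static int get_optional_irq(int ret, int fallback)
{
	if (ret >= 0)
		return ret;
	if (ret == -EPROBE_DEFER)
		return ret;
	return fallback;
}
```

Whether a given driver should fall back or fail on the other errors is its own design decision. The point is only not to special-case a single errno value.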

Re: [char-misc-next 3/3] mei: add WPT second mei interface

2014-05-27 Thread Greg KH

On Tue, May 27, 2014 at 11:47:44PM +, Winkler, Tomas wrote:
> 
> 
> > -Original Message-
> > From: Greg KH [mailto:gre...@linuxfoundation.org]
> > Sent: Wednesday, May 28, 2014 01:07
> > To: Winkler, Tomas
> > Cc: a...@arndb.de; linux-kernel@vger.kernel.org; Usyskin, Alexander
> > Subject: Re: [char-misc-next 3/3] mei: add WPT second mei interface
> > 
> > On Tue, May 27, 2014 at 09:42:19PM +, Winkler, Tomas wrote:
> > > > > +/* PCH devices MEI 2 interface */
> > > > > +const struct mei_cfg mei_me_pch_2_cfg = {
> > > > > + MEI_CFG_PCH_HFS,
> > > > > + .mei_id = 1
> > > >
> > > > That's going to be a recipe for disaster.  Have the MEI core allocate
> > > > the id numbers as things are registered, don't have the individual
> > > > drivers create their id.
> > >
> > > I don't think we can ensure the enumeration order.
> > 
> > You should not be relying on the order to get anything right.
> > 
> > > This is a per-device, not a per-driver, configuration structure.
> > > Each PCI device is actually just another head of one MEI device, but the
> > > heads are not equal, so the name/id matters.
> > > Yes, I assume it looks odd at first glance; anyhow, we are open to any
> > > reasonable suggestions.
> > 
> > Just dynamically allocate the numbers like all other subsystems do?
> >
> > Then userspace can open the device nodes it cares about, it should be
> > able to somehow tell what device is what, right?  If not, you
> > are doing something wrong with the interface as you can't rely on minor
> > numbers.
> > 
> 
> The only way for user space to distinguish among interfaces is the same way
> the driver does: by looking at the device id.

What's with the lack of line-wrapping?

Anyway, why does userspace care?  If it wants to look at the device id,
then just look at the symlink to the device id of the device node.  Then
you don't care what the device node is named / numbered.
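A minimal userspace sketch of that approach using POSIX readlink(): resolve the node's sysfs `device` symlink and key off its target instead of the minor number. A path such as `/sys/class/mei/mei0/device` is an assumption about where such a link would live, and `device_id_from_link()` is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L	/* readlink(), symlink() */
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Resolve a sysfs "device" symlink and copy the last component of its
 * target into out.  For a PCI function that component is typically the
 * PCI address, e.g. "0000:00:16.0".  Returns 0 on success, -1 on error.
 */
static int device_id_from_link(const char *link_path, char *out, size_t len)
{
	char target[4096];
	ssize_t n = readlink(link_path, target, sizeof(target) - 1);
	const char *base;

	if (n < 0)
		return -1;
	target[n] = '\0';	/* readlink() does not NUL-terminate */

	base = strrchr(target, '/');
	base = base ? base + 1 : target;
	if (strlen(base) + 1 > len)
		return -1;
	memcpy(out, base, strlen(base) + 1);
	return 0;
}
```

Whatever minor number the node gets, the link target stays tied to the physical function, which is what a udev rule or a tool would match on.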

> The assignment of the device name/id is pretty much static in the HW.
> Now the same device id database from the driver has to be copied to
> the udev rules or equivalent and kept in sync.

What?  No.  Why would you want to do that?  Why does it matter what
order these devices are connected?  Are they talked to in different
ways?  Are they different "classes" of devices?  What are they and why
aren't they just "interchangeable" as far as userspace cares?

> Most importantly, all the legacy user space just opens /dev/mei, so I
> wanted to leave /dev/mei (not /dev/mei0) always assigned to the
> first mei head, so as not to break existing user space.

That's fine, I have no objection to that.

> Can you be a bit more specific about why we cannot depend on minor
> numbers and default naming rules? What is the scenario that will
> break?

Why is the kernel picking a name for these with a specific minor number
in it?  What makes this different from a random tty device where you
don't care what the minor number is, you just care about where is it
connected so you look at the other things "around" it (pci path, vendor
id, etc.) and create a deterministic name from that with a symlink, if you
care, with udev / mdev.
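The alternative suggested earlier in the thread, having the core allocate numbers as devices register, can be sketched as lowest-free-ID allocation. This is a userspace illustration with hypothetical names (`alloc_minor()`, `free_minor()`, `MAX_MINORS`); in-kernel code would use the IDA/IDR helpers rather than a hand-rolled table.

```c
#include <stdbool.h>

#define MAX_MINORS 32	/* hypothetical limit for the sketch */

/* One flag per minor number; true means "in use". */
static bool minor_used[MAX_MINORS];

/* Hand out the lowest free minor at registration time, or -1 if full. */
static int alloc_minor(void)
{
	for (int i = 0; i < MAX_MINORS; i++) {
		if (!minor_used[i]) {
			minor_used[i] = true;
			return i;
		}
	}
	return -1;
}

/* Release a minor when its device goes away so it can be reused. */
static void free_minor(int minor)
{
	if (minor >= 0 && minor < MAX_MINORS)
		minor_used[minor] = false;
}
```

The number a device receives depends only on registration order, which is exactly why userspace should not attach meaning to it.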

thanks,

greg k-h

[PATCH] perf, tools: Support spark lines in perf stat v4

2014-05-27 Thread Andi Kleen

From: Andi Kleen 

perf stat -rX prints the stddev for multiple measurements.
Just looking at the stddev to judge the quality of the data
is a bit dangerous. The simplest sanity check is to just look
at a simple plot. This patch adds a sparkline to the end
of the measurements to make it simple to judge the data.

The sparkline uses only UTF-8 characters, so it should be readable
in all modern tools and terminals.

The sparkline spans the range between the minimum and maximum of the data,
so it's mainly an indicator of variance. To keep the code
simple and the output not too wide, only the first
8 values are printed. If there are more values, it appends '..'.

The code is inspired by Zach Holman's spark shell script.
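A minimal sketch of the scaling described above, assuming linear mapping of each value onto eight UTF-8 block glyphs between the minimum and maximum of the drawn values. It is not the patch's spark.c (whose body is not shown here) and will not necessarily reproduce the glyphs in the example output; `sparkline()` and `SPARK_MAX_VALUES` are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical limit mirroring "only the first 8 values are printed". */
#define SPARK_MAX_VALUES 8

/* UTF-8 block elements U+2581..U+2588, lowest to highest (3 bytes each). */
static const char *const spark_ticks[8] = {
	"▁", "▂", "▃", "▄", "▅", "▆", "▇", "█"
};

/*
 * Write a sparkline for vals[0..n-1] into buf.  Only the first
 * SPARK_MAX_VALUES values are drawn, each scaled linearly between the
 * minimum and maximum of the drawn values; ".." is appended when values
 * were dropped.  buf must hold at least 3 * SPARK_MAX_VALUES + 3 bytes.
 */
static void sparkline(const unsigned long *vals, int n, char *buf, size_t len)
{
	int shown = n < SPARK_MAX_VALUES ? n : SPARK_MAX_VALUES;
	unsigned long min, max;
	size_t pos = 0;
	int i;

	assert(len >= 3 * SPARK_MAX_VALUES + 3);
	buf[0] = '\0';
	if (shown <= 0)
		return;

	min = max = vals[0];
	for (i = 1; i < shown; i++) {
		if (vals[i] < min)
			min = vals[i];
		if (vals[i] > max)
			max = vals[i];
	}

	for (i = 0; i < shown; i++) {
		int idx = 0;	/* all-equal data maps to the lowest tick */

		if (max > min)
			idx = (int)((double)(vals[i] - min) * 7.0 /
				    (double)(max - min));
		memcpy(buf + pos, spark_ticks[idx], 3);
		pos += 3;
	}
	if (n > SPARK_MAX_VALUES) {
		memcpy(buf + pos, "..", 2);
		pos += 2;
	}
	buf[pos] = '\0';
}
```

Because the scale is relative to the drawn minimum and maximum, even tiny absolute differences fill the full glyph range, which is why the line is mainly an indicator of variance rather than magnitude.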

Example output (view in a non-proportional font):

 Performance counter stats for 'true' (10 runs):

  0.175672  task-clock (msec) #0.555 CPUs utilized  
  ( +-  1.77% ) █▄▁▁..
 0  context-switches  #0.000 K/sec
 0  cpu-migrations#0.000 K/sec
   114  page-faults   #0.647 M/sec  
  ( +-  0.14% ) ▁█▁▁..
   520,798  cycles#2.965 GHz
  ( +-  1.75% ) █▄▁▁..
   433,525  instructions  #0.83  insns per cycle
  ( +-  0.28% ) ▅▇▅▄▇█▁▆..
83,012  branches  #  472.537 M/sec  
  ( +-  0.31% ) ▅▇▆▄▇█▁▆..
 3,157  branch-misses #3.80% of all branches
  ( +-  2.55% ) ▇█▃▅▁▃▁▂..

   0.000316660 seconds time elapsed 
 ( +-  1.78% ) █▅▁▁..

As you can see, even the simplest run shows quite interesting
patterns. The time sparkline suggests it would also be useful to have an option
to throw the first measurement away.

Known issues:
- Makes the perf stat output wider. This could be adjusted by shrinking
some whitespace. Not done so far.
- No output for -A/--per-socket/--per-core with -rX. That code path
is missing the basic noise detection code. Once that is added there,
sparklines could be shown too.

v2: Avoid printing spark lines for normal CSV case (Jiri)
v3: LONG->ULONG, random changes
v4: Add some missing changes from the forked v2: check that the value is not
zero instead of that all values are the same. Update documentation. Remove
the n variable. Remove #pragma once.
Signed-off-by: Andi Kleen 
---
 tools/perf/Documentation/perf-stat.txt |  4 
 tools/perf/Makefile.perf   |  1 +
 tools/perf/builtin-stat.c  | 12 
 tools/perf/util/spark.c| 28 
 tools/perf/util/spark.h|  4 
 tools/perf/util/stat.c | 34 ++
 tools/perf/util/stat.h | 10 ++
 7 files changed, 93 insertions(+)
 create mode 100644 tools/perf/util/spark.c
 create mode 100644 tools/perf/util/spark.h

diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 29ee857..840c1db 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -53,6 +53,10 @@ OPTIONS
 -r::
 --repeat=::
repeat command and print average + stddev (max: 100). 0 means forever.
+   In addition it prints a spark line (when not in CSV mode), which visualizes the
+   variance between the minimum and maximum of the measurements. This allows a simple sanity
+   check of the measurements. Only the first 8 values are printed; when more are available,
+   '..' is appended.
 
 -B::
 --big-num::
diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index 7257e7e..432d099 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -359,6 +359,7 @@ LIB_OBJS += $(OUTPUT)util/trace-event-scripting.o
 LIB_OBJS += $(OUTPUT)util/trace-event.o
 LIB_OBJS += $(OUTPUT)util/svghelper.o
 LIB_OBJS += $(OUTPUT)util/sort.o
+LIB_OBJS += $(OUTPUT)util/spark.o
 LIB_OBJS += $(OUTPUT)util/hist.o
 LIB_OBJS += $(OUTPUT)util/probe-event.o
 LIB_OBJS += $(OUTPUT)util/util.o
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 65a151e..cb0f7c5 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1176,6 +1176,9 @@ static void print_aggr(char *prefix)
if (run != ena)
fprintf(output, "  (%.2f%%)",
100.0 * run / ena);
+
+   fputc(' ', output);
+   print_stat_spark(output, counter->priv);
}
fputc('\n', output);
}
@@ -1229,6 +1232,9 @@ static void print_counter_aggr(struct perf_evsel *counter, char *prefix)
return;
}
 
+   fputc(' ', output);
+   print_stat_spark(output, counter->priv);
+
if (scaled) {

[PATCH 14/16] perf: add PMU_EVENT_ATTR_STRING() helper

2014-05-27 Thread Cody P Schafer

Helper for constructing static struct perf_pmu_events_attr instances.

CC: Sukadev Bhattiprolu 
Signed-off-by: Cody P Schafer 
---
 include/linux/perf_event.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 6c1d6dd..1313171 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -876,6 +876,13 @@ static struct perf_pmu_events_attr _var = {		\
.id   =  _id,   \
 };
 
+#define PMU_EVENT_ATTR_STRING(_name, _var, _value) \
+static struct perf_pmu_events_attr _var = {\
+   .attr = __ATTR(_name, 0444, perf_event_sysfs_show, NULL),   \
+   .event_str = _value,\
+};
+
+
 #define PMU_FORMAT_ATTR(_name, _format)					\
 static ssize_t \
 _name##_show(struct device *dev,   \
-- 
1.9.3
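To show what such a helper buys, here is a userspace mock of the same pattern: a macro that emits a static struct filled with designated initializers, so each string event becomes a one-line declaration. `struct mock_events_attr`, `mock_sysfs_show()` and `MOCK_EVENT_ATTR_STRING()` are hypothetical stand-ins for `struct perf_pmu_events_attr`, `perf_event_sysfs_show()` and `PMU_EVENT_ATTR_STRING()`; the sketch assumes the show callback prints `event_str` followed by a newline, and the event string "event=0x3c" is made up.

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical mock of a sysfs event attribute carrying a string payload. */
struct mock_events_attr {
	const char *name;	/* sysfs file name */
	ssize_t (*show)(const struct mock_events_attr *attr, char *buf,
			size_t len);
	const char *event_str;	/* text that tools/perf would parse */
};

/* Print the event string followed by a newline, like a sysfs file read. */
static ssize_t mock_sysfs_show(const struct mock_events_attr *attr,
			       char *buf, size_t len)
{
	return snprintf(buf, len, "%s\n", attr->event_str);
}

/* Same shape as the helper: name, variable, and string value. */
#define MOCK_EVENT_ATTR_STRING(_name, _var, _value)	\
static struct mock_events_attr _var = {			\
	.name = #_name,					\
	.show = mock_sysfs_show,			\
	.event_str = _value,				\
}

MOCK_EVENT_ATTR_STRING(cycles, mock_attr_cycles, "event=0x3c");
```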


[PATCH 15/16] powerpc/perf/{hv-gpci,hv-common}: generate requests with counters annotated

2014-05-27 Thread Cody P Schafer

This adds (in req-gen/) a framework for defining gpci counter requests.
It uses macro magic similar to ftrace.

Also convert the existing hv-gpci request structures and enum values to
use the new framework (and adjust old users of the structs and enum
values to cope with changes in naming).

In exchange for this macro disaster, we get autogenerated event listing
for GPCI in sysfs, build time field offset checking, and zero
duplication of information about GPCI requests.
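The "zero duplication" and "build time field offset checking" points rest on the X-macro idea: one list of fields expanded several times. The sketch below is a simplified, single-file illustration of that idea, not the req-gen/ headers themselves; `DTBP_FIELDS`, `FIELD_TYPE_*`, `struct dtbp_request` and `struct field_desc` are hypothetical, and only the offsets and names come from the first five fields of the dispatch_timebase_by_processor request in this patch.

```c
#include <stddef.h>
#include <stdint.h>

/* One list drives everything: X(offset, bytes, name, is_counter). */
#define DTBP_FIELDS(X)						\
	X(0x0, 8, processor_time_in_timebase_cycles, 1)		\
	X(0x8, 4, hw_processor_id, 0)				\
	X(0xC, 2, owning_part_id, 0)				\
	X(0xE, 1, processor_state, 0)				\
	X(0xF, 1, version, 0)

/*
 * @bytes is token-pasted onto FIELD_TYPE_ below, which shows one way a
 * "decimal numeral token" rule can arise: a hex spelling or an expression
 * would not produce one of these names.
 */
#define FIELD_TYPE_1 uint8_t
#define FIELD_TYPE_2 uint16_t
#define FIELD_TYPE_4 uint32_t
#define FIELD_TYPE_8 uint64_t

/* Expansion 1: the packed request layout (GCC/Clang attribute). */
#define AS_MEMBER(off, bytes, name, cnt) FIELD_TYPE_##bytes name;
struct dtbp_request {
	DTBP_FIELDS(AS_MEMBER)
} __attribute__((packed));

/* Expansion 2: compile-time check that each member is at its offset. */
#define AS_OFFSET_CHECK(off, bytes, name, cnt)				\
	_Static_assert(offsetof(struct dtbp_request, name) == (off),	\
		       #name " is not at offset " #off);
DTBP_FIELDS(AS_OFFSET_CHECK)

/* Expansion 3: a descriptor table, e.g. for generating sysfs events. */
struct field_desc {
	const char *name;
	unsigned int offset, bytes;
	int is_counter;
};
#define AS_DESC(off, bytes, name, cnt) { #name, (off), (bytes), (cnt) },
static const struct field_desc dtbp_fields[] = { DTBP_FIELDS(AS_DESC) };
```

If a field's declared offset disagrees with the layout the compiler actually produces, the build fails instead of the hcall silently reading the wrong bytes.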

CC: Sukadev Bhattiprolu 
Signed-off-by: Cody P Schafer 
---
 arch/powerpc/perf/hv-common.c  |  10 +-
 arch/powerpc/perf/hv-gpci-requests.h   |  79 +++
 arch/powerpc/perf/hv-gpci.c|   8 ++
 arch/powerpc/perf/hv-gpci.h|  37 +++
 arch/powerpc/perf/req-gen/_begin.h |  13 +++
 arch/powerpc/perf/req-gen/_clear.h |   5 +
 arch/powerpc/perf/req-gen/_end.h   |   4 +
 arch/powerpc/perf/req-gen/_request-begin.h |  15 +++
 arch/powerpc/perf/req-gen/_request-end.h   |   8 ++
 arch/powerpc/perf/req-gen/perf.h   | 155 +
 10 files changed, 304 insertions(+), 30 deletions(-)
 create mode 100644 arch/powerpc/perf/hv-gpci-requests.h
 create mode 100644 arch/powerpc/perf/req-gen/_begin.h
 create mode 100644 arch/powerpc/perf/req-gen/_clear.h
 create mode 100644 arch/powerpc/perf/req-gen/_end.h
 create mode 100644 arch/powerpc/perf/req-gen/_request-begin.h
 create mode 100644 arch/powerpc/perf/req-gen/_request-end.h
 create mode 100644 arch/powerpc/perf/req-gen/perf.h

diff --git a/arch/powerpc/perf/hv-common.c b/arch/powerpc/perf/hv-common.c
index 47e02b3..7dce8f10 100644
--- a/arch/powerpc/perf/hv-common.c
+++ b/arch/powerpc/perf/hv-common.c
@@ -9,13 +9,13 @@ unsigned long hv_perf_caps_get(struct hv_perf_caps *caps)
unsigned long r;
struct p {
struct hv_get_perf_counter_info_params params;
-   struct cv_system_performance_capabilities caps;
+   struct hv_gpci_system_performance_capabilities caps;
} __packed __aligned(sizeof(uint64_t));
 
struct p arg = {
.params = {
.counter_request = cpu_to_be32(
-   CIR_SYSTEM_PERFORMANCE_CAPABILITIES),
+   HV_GPCI_system_performance_capabilities),
.starting_index = cpu_to_be32(-1),
.counter_info_version_in = 0,
}
@@ -31,9 +31,9 @@ unsigned long hv_perf_caps_get(struct hv_perf_caps *caps)
 
caps->version = arg.params.counter_info_version_out;
caps->collect_privileged = !!arg.caps.perf_collect_privileged;
-   caps->ga = !!(arg.caps.capability_mask & CV_CM_GA);
-   caps->expanded = !!(arg.caps.capability_mask & CV_CM_EXPANDED);
-   caps->lab = !!(arg.caps.capability_mask & CV_CM_LAB);
+   caps->ga = !!(arg.caps.capability_mask & HV_GPCI_CM_GA);
+   caps->expanded = !!(arg.caps.capability_mask & HV_GPCI_CM_EXPANDED);
+   caps->lab = !!(arg.caps.capability_mask & HV_GPCI_CM_LAB);
 
return r;
 }
diff --git a/arch/powerpc/perf/hv-gpci-requests.h b/arch/powerpc/perf/hv-gpci-requests.h
new file mode 100644
index 000..0dfc4d9
--- /dev/null
+++ b/arch/powerpc/perf/hv-gpci-requests.h
@@ -0,0 +1,79 @@
+
+#include "req-gen/_begin.h"
+
+/*
+ * Based on the document "getPerfCountInfo v1.07"
+ */
+
+/* this needs to be -1 encoded in hex suitable for parsing by tools/perf. */
+#define M1 0x
+
+/*
+ * #define REQUEST_NAME counter_request_name
+ * #define REQUEST_NUM r_num
+ * #define REQUEST_IDX_KIND starting_index_kind
+ * #include I(REQUEST_BEGIN)
+ * REQUEST(
+ * __field(...)
+ * __field(...)
+ * __array(...)
+ * __count(...)
+ * )
+ * #include I(REQUEST_END)
+ *
+ * - starting_index_kind is one of:
+ *   M1: must be -1
+ *   chip_id: hardware chip id or -1 for current hw chip
+ *   phys_processor_idx:
+ *
+ * __count(offset, bytes, name):
+ * a counter that should be exposed via perf
+ * __field(offset, bytes, name)
+ * a normal field
+ * __array(offset, bytes, name)
+ * an array of bytes
+ *
+ *
+ * @bytes for __count, and __field _must_ be a numeral token
+ * in decimal, not an expression and not in hex.
+ *
+ *
+ * TODO:
+ * - expose secondary index (if any counter ever uses it, only 0xA0
+ *   appears to use it right now, and it doesn't have any counters)
+ * - embed versioning info
+ * - include counter descriptions
+ */
+#define REQUEST_NAME dispatch_timebase_by_processor
+#define REQUEST_NUM 0x10
+#define REQUEST_IDX_KIND phys_processor_idx
+#include I(REQUEST_BEGIN)
+REQUEST(__count(0, 8,  processor_time_in_timebase_cycles)
+   __field(0x8,4,  hw_processor_id)
+   __field(0xC,2,  owning_part_id)
+   __field(0xE,1,  processor_state)
+   __field(0xF,1,  version)
+   __field(0x10,   4,

[PATCH 16/16] powerpc/perf/hv-gpci: add the remaining gpci requests

2014-05-27 Thread Cody P Schafer

Add the remaining gpci requests that contain counters suitable for use
by perf. Omit those that don't contain any counters (but note their
omission).

CC: Sukadev Bhattiprolu 
Signed-off-by: Cody P Schafer 
---
 arch/powerpc/perf/hv-gpci-requests.h | 179 +++
 1 file changed, 179 insertions(+)

diff --git a/arch/powerpc/perf/hv-gpci-requests.h b/arch/powerpc/perf/hv-gpci-requests.h
index 0dfc4d9..af3b73c 100644
--- a/arch/powerpc/perf/hv-gpci-requests.h
+++ b/arch/powerpc/perf/hv-gpci-requests.h
@@ -65,6 +65,33 @@ REQUEST(__count(0,   8,  processor_time_in_timebase_cycles)
 )
 #include I(REQUEST_END)
 
+#define REQUEST_NAME entitled_capped_uncapped_donated_idle_timebase_by_partition
+#define REQUEST_NUM 0x20
+#define REQUEST_IDX_KIND sibling_part_id
+#include I(REQUEST_BEGIN)
+REQUEST(__field(0, 8,  partition_id)
+   __count(0x8,8,  entitled_cycles)
+   __count(0x10,   8,  consumed_capped_cycles)
+   __count(0x18,   8,  consumed_uncapped_cycles)
+   __count(0x20,   8,  cycles_donated)
+   __count(0x28,   8,  purr_idle_cycles)
+)
+#include I(REQUEST_END)
+
+/*
+ * Not available for counter_info_version >= 0x8, use
+ * run_instruction_cycles_by_partition(0x100) instead.
+ */
+#define REQUEST_NAME run_instructions_run_cycles_by_partition
+#define REQUEST_NUM 0x30
+#define REQUEST_IDX_KIND sibling_part_id
+#include I(REQUEST_BEGIN)
+REQUEST(__field(0, 8,  partition_id)
+   __count(0x8,8,  instructions_completed)
+   __count(0x10,   8,  cycles)
+)
+#include I(REQUEST_END)
+
 #define REQUEST_NAME system_performance_capabilities
 #define REQUEST_NUM 0x40
 #define REQUEST_IDX_KIND M1
@@ -75,5 +102,157 @@ REQUEST(__field(0, 1,  perf_collect_privileged)
 )
 #include I(REQUEST_END)
 
+#define REQUEST_NAME processor_bus_utilization_abc_links
+#define REQUEST_NUM 0x50
+#define REQUEST_IDX_KIND hw_chip_id
+#include I(REQUEST_BEGIN)
+REQUEST(__field(0, 4,  hw_chip_id)
+   __array(0x4,0xC,reserved1)
+   __count(0x10,   8,  total_link_cycles)
+   __count(0x18,   8,  idle_cycles_for_a_link)
+   __count(0x20,   8,  idle_cycles_for_b_link)
+   __count(0x28,   8,  idle_cycles_for_c_link)
+   __array(0x30,   0x20,   reserved2)
+)
+#include I(REQUEST_END)
+
+#define REQUEST_NAME processor_bus_utilization_wxyz_links
+#define REQUEST_NUM 0x60
+#define REQUEST_IDX_KIND hw_chip_id
+#include I(REQUEST_BEGIN)
+REQUEST(__field(0, 4,  hw_chip_id)
+   __array(0x4,0xC,reserved1)
+   __count(0x10,   8,  total_link_cycles)
+   __count(0x18,   8,  idle_cycles_for_w_link)
+   __count(0x20,   8,  idle_cycles_for_x_link)
+   __count(0x28,   8,  idle_cycles_for_y_link)
+   __count(0x30,   8,  idle_cycles_for_z_link)
+   __array(0x38,   0x28,   reserved2)
+)
+#include I(REQUEST_END)
+
+#define REQUEST_NAME processor_bus_utilization_gx_links
+#define REQUEST_NUM 0x70
+#define REQUEST_IDX_KIND hw_chip_id
+#include I(REQUEST_BEGIN)
+REQUEST(__field(0, 4,  hw_chip_id)
+   __array(0x4,0xC,reserved1)
+   __count(0x10,   8,  gx0_in_address_cycles)
+   __count(0x18,   8,  gx0_in_data_cycles)
+   __count(0x20,   8,  gx0_in_retries)
+   __count(0x28,   8,  gx0_in_bus_cycles)
+   __count(0x30,   8,  gx0_in_cycles_total)
+   __count(0x38,   8,  gx0_out_address_cycles)
+   __count(0x40,   8,  gx0_out_data_cycles)
+   __count(0x48,   8,  gx0_out_retries)
+   __count(0x50,   8,  gx0_out_bus_cycles)
+   __count(0x58,   8,  gx0_out_cycles_total)
+   __count(0x60,   8,  gx1_in_address_cycles)
+   __count(0x68,   8,  gx1_in_data_cycles)
+   __count(0x70,   8,  gx1_in_retries)
+   __count(0x78,   8,  gx1_in_bus_cycles)
+   __count(0x80,   8,  gx1_in_cycles_total)
+   __count(0x88,   8,  gx1_out_address_cycles)
+   __count(0x90,   8,  gx1_out_data_cycles)
+   __count(0x98,   8,  gx1_out_retries)
+   __count(0xA0,   8,  gx1_out_bus_cycles)
+   __count(0xA8,   8,  gx1_out_cycles_total)
+)
+#include I(REQUEST_END)
+
+#define REQUEST_NAME processor_bus_utilization_mc_links
+#define REQUEST_NUM 0x80
+#define REQUEST_IDX_KIND hw_chip_id
+#include I(REQUEST_BEGIN)
+REQUEST(__field(0, 4,  hw_chip_id)
+   __array(0x4,0xC,reserved1)
+   __count(0x10,   8,  mc0_frames)
+   __count(0x18,   8,  mc0_reads)
+   __count(0x20,   8,  mc0_writes)
+   __count(0x28,   8,  mc0_total_cycles)
+   __count(0x30,   8,  mc1_frames)
+   __count(0x38,   8,  mc1_reads)
+   __count(0x40,   8,  mc1_writes)
+   __count(0x48,   8,  mc1_total_cycles)
+)
+#include I(REQUEST_END)
+
+/* Processor_config (0x90) skipped, no counters */
+/* Current_processor_frequency (0x91) skipped, no counters */
+
+#define

Re: [PATCH] arm: Set hardirq tracing to on when idling

2014-05-27 Thread Corey Minyard

On 05/27/2014 02:27 PM, Arnd Bergmann wrote:
> On Tuesday 27 May 2014 11:53:59 Stephen Boyd wrote:
>> On 05/27/14 11:49, Arnd Bergmann wrote:
>>> You also commented in that thread about stop_critical_timings()/
>>> start_critical_timings(). Corey, can you look at that, too? I
>>> think it's designed to avoid the issue you are seeing but
>>> for some reason doesn't.
>> I sent a patch last week to "solve" this problem. I'm not sure if it's
>> right but it works for me.
>>
>> https://lkml.org/lkml/2014/5/19/607
> I think that one was also wrong, as the intention of the existing
> stop_critical_timings() function is already to do the same that
> Corey's patch does, i.e. stop the trace before we go to idle as
> if we were turning IRQs on.
>
> Corey, does it work for you if you replace the new trace_hardirqs_on()
> you added with time_hardirqs_on() or stop_critical_timings()?

Well, more information on this.  It turns out that the generic idle loop
calls stop_critical_timings() and start_critical_timings(), so
arch_cpu_idle() shouldn't have to.

However, the idle loop calls rcu_idle_enter() after it calls
stop_critical_timings(), and that is resetting the critical timing, it
appears.  It's disabling/enabling interrupts in rcu_idle_enter().  If I
switch the order of the rcu_idle and critical timing calls, the issue
goes away.

Stephen's patch does not seem to be necessary for my issue. I tried with
the patch applied, too.  It doesn't seem to hurt, at least.  It did not
fix the problem by itself, though.

-corey

[PATCH 12/16] powerpc/perf/hv-24x7: parse catalog and populate sysfs with events

2014-05-27 Thread Cody P Schafer

Retrieves and parses the 24x7 catalog on POWER systems that supply it
(right now, only POWER8). Events are exposed via sysfs in the standard
fashion, and are all parameterized.
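The sysfs event names carry a domain suffix and a domain-specific index parameter, both generated from the DOMAIN() list in hv-24x7-domains.h. A single-file sketch of that expansion follows. The list macro `HV_24X7_DOMAINS` and the `enum hv_perf_domain` definition are assumptions (the patch uses the `HV_PERF_DOMAIN_` prefix in its case labels, but its enum is not shown here); the table entries are copied from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Inline copy of the DOMAIN(name, num, index_kind, is_physical) list.
 * The patch keeps it in its own header and #includes it per expansion;
 * a list macro taking the expander as an argument is a one-file form.
 */
#define HV_24X7_DOMAINS(DOMAIN)					\
	DOMAIN(PHYSICAL_CHIP, 0x01, chip, true)			\
	DOMAIN(PHYSICAL_CORE, 0x02, core, true)			\
	DOMAIN(VIRTUAL_PROCESSOR_HOME_CORE, 0x03, vcpu, false)	\
	DOMAIN(VIRTUAL_PROCESSOR_HOME_CHIP, 0x04, vcpu, false)	\
	DOMAIN(VIRTUAL_PROCESSOR_HOME_NODE, 0x05, vcpu, false)	\
	DOMAIN(VIRTUAL_PROCESSOR_REMOTE_NODE, 0x06, vcpu, false)

/* Expansion 1: enum constants (this enum definition is assumed). */
#define AS_ENUM(n, v, x, c) HV_PERF_DOMAIN_##n = (v),
enum hv_perf_domain { HV_24X7_DOMAINS(AS_ENUM) };
#undef AS_ENUM

/* Expansion 2: suffix appended to event names, e.g. "__PHYSICAL_CHIP". */
static const char *event_domain_suffix(unsigned domain)
{
	switch (domain) {
#define AS_SUFFIX(n, v, x, c) case HV_PERF_DOMAIN_##n: return "__" #n;
	HV_24X7_DOMAINS(AS_SUFFIX)
#undef AS_SUFFIX
	default:
		return NULL;	/* the patch WARNs and returns a placeholder */
	}
}

/* Expansion 3: name of the index parameter, e.g. "chip" or "vcpu". */
static const char *domain_to_index_string(unsigned domain)
{
	switch (domain) {
#define AS_INDEX(n, v, x, c) case HV_PERF_DOMAIN_##n: return #x;
	HV_24X7_DOMAINS(AS_INDEX)
#undef AS_INDEX
	default:
		return NULL;
	}
}

/* Expansion 4: whether the domain is physical rather than virtual. */
static bool is_physical_domain(unsigned domain)
{
	switch (domain) {
#define AS_PHYS(n, v, x, c) case HV_PERF_DOMAIN_##n: return c;
	HV_24X7_DOMAINS(AS_PHYS)
#undef AS_PHYS
	default:
		return false;
	}
}
```

Adding a domain then means adding one line to the list; every switch picks it up without further edits.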

The catalog is (at the moment) only parsed at boot. It needs re-parsing
when some hypervisor events occur. At that point we'll also need to
prevent old events from continuing to function (perhaps via a counter
passed in through spare space in the config values?).
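Whenever the catalog is parsed, the variable-length tail of each event record has to be walked. Going by the comments on `struct hv_24x7_event_data`, each `__be16` length appears to count its own two bytes plus the text that follows (e.g. `event_name[event_name_len - 2]`). A bounds-checked sketch under that assumption; `get_be16()` and `next_lp_string()` are hypothetical helpers.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 16-bit value from a possibly unaligned buffer. */
static uint16_t get_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Decode one length-prefixed string starting at buf[off], where the
 * __be16 length includes its own two bytes.  On success, points *str at
 * the text, stores its length in *str_len, and returns the offset of the
 * next length field.  Returns 0 if the record is truncated or the length
 * is smaller than the prefix itself.
 */
static size_t next_lp_string(const uint8_t *buf, size_t buf_len, size_t off,
			     const uint8_t **str, size_t *str_len)
{
	uint16_t len;

	if (off + 2 > buf_len)
		return 0;
	len = get_be16(buf + off);
	if (len < 2 || off + len > buf_len)
		return 0;
	*str = buf + off + 2;
	*str_len = len - 2u;
	return off + len;
}
```

Starting at the offset of `event_name_len`, three successive calls would yield the name, the description and the detailed description; any zero return means the firmware-supplied record should be rejected rather than trusted.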

CC: Sukadev Bhattiprolu 
Signed-off-by: Cody P Schafer 
---
 arch/powerpc/perf/hv-24x7-catalog.h |  25 ++
 arch/powerpc/perf/hv-24x7-domains.h |  19 +
 arch/powerpc/perf/hv-24x7.c | 760 +++-
 arch/powerpc/perf/hv-24x7.h |  12 +-
 4 files changed, 804 insertions(+), 12 deletions(-)
 create mode 100644 arch/powerpc/perf/hv-24x7-domains.h

diff --git a/arch/powerpc/perf/hv-24x7-catalog.h b/arch/powerpc/perf/hv-24x7-catalog.h
index 21b19dd..69e2e1f 100644
--- a/arch/powerpc/perf/hv-24x7-catalog.h
+++ b/arch/powerpc/perf/hv-24x7-catalog.h
@@ -30,4 +30,29 @@ struct hv_24x7_catalog_page_0 {
__u8 reserved6[2];
 } __packed;
 
+struct hv_24x7_event_data {
+   __be16 length; /* in bytes, must be a multiple of 16 */
+   __u8 reserved1[2];
+   __u8 domain; /* Chip = 1, Core = 2 */
+   __u8 reserved2[1];
+   __be16 event_group_record_offs; /* in bytes, must be 8 byte aligned */
+   __be16 event_group_record_len; /* in bytes */
+
+   /* in bytes, offset from event_group_record */
+   __be16 event_counter_offs;
+
+   /* verified_state, unverified_state, caveat_state, broken_state, ... */
+   __be32 flags;
+
+   __be16 primary_group_ix;
+   __be16 group_count;
+   __be16 event_name_len;
+   __u8 remainder[];
+   /* __u8 event_name[event_name_len - 2]; */
+   /* __be16 event_description_len; */
+   /* __u8 event_desc[event_description_len - 2]; */
+   /* __be16 detailed_desc_len; */
+   /* __u8 detailed_desc[detailed_desc_len - 2]; */
+} __packed;
+
 #endif
diff --git a/arch/powerpc/perf/hv-24x7-domains.h b/arch/powerpc/perf/hv-24x7-domains.h
new file mode 100644
index 000..9c5c862
--- /dev/null
+++ b/arch/powerpc/perf/hv-24x7-domains.h
@@ -0,0 +1,19 @@
+
+/*
+ * DOMAIN(name, num, index_kind, is_physical)
+ *
+ * @name: an all caps token, suitable for use in generating an enum member and
+ *appending to an event name in sysfs.
+ * @num: the number corresponding to the domain as given in documentation. We
+ *   assume the catalog domain and the hcall domain have the same numbering
+ *   (so far they do), but this may need to be changed in the future.
+ * @index_kind: a stringifiable token describing the meaning of the index within the
+ *  given domain. Must fit the parsing rules of the perf sysfs api.
+ * @is_physical: true if the domain is physical, false otherwise (if virtual).
+ */
+DOMAIN(PHYSICAL_CHIP, 0x01, chip, true)
+DOMAIN(PHYSICAL_CORE, 0x02, core, true)
+DOMAIN(VIRTUAL_PROCESSOR_HOME_CORE, 0x03, vcpu, false)
+DOMAIN(VIRTUAL_PROCESSOR_HOME_CHIP, 0x04, vcpu, false)
+DOMAIN(VIRTUAL_PROCESSOR_HOME_NODE, 0x05, vcpu, false)
+DOMAIN(VIRTUAL_PROCESSOR_REMOTE_NODE, 0x06, vcpu, false)
diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index 9a7a830..c9b7c55 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -1,3 +1,4 @@
+#define DEBUG 1
 /*
  * Hypervisor supplied "24x7" performance counter support
  *
@@ -12,9 +13,13 @@
 
 #define pr_fmt(fmt) "hv-24x7: " fmt
 
+#include 
 #include 
+#include 
 #include 
 #include 
+#include 
+
 #include 
 #include 
 #include 
@@ -23,6 +28,66 @@
 #include "hv-24x7-catalog.h"
 #include "hv-common.h"
 
+static const char *domain_to_index_string(unsigned domain)
+{
+   switch (domain) {
+#define DOMAIN(n, v, x, c) \
+   case HV_PERF_DOMAIN_##n:\
+   return #x;
+#include "hv-24x7-domains.h"
+#undef DOMAIN
+   default:
+   WARN(1, "unknown domain %d\n", domain);
+   return "UNKNOWN_DOMAIN_INDEX_STRING";
+   }
+}
+
+static const char *event_domain_suffix(unsigned domain)
+{
+   switch (domain) {
+#define DOMAIN(n, v, x, c) \
+   case HV_PERF_DOMAIN_##n:\
+   return "__" #n;
+#include "hv-24x7-domains.h"
+#undef DOMAIN
+   default:
+   WARN(1, "unknown domain %d\n", domain);
+   return "__UNKNOWN_DOMAIN_SUFFIX";
+   }
+}
+
+static bool domain_is_valid(unsigned domain)
+{
+   switch (domain) {
+#define DOMAIN(n, v, x, c) \
+   case HV_PERF_DOMAIN_##n:\
+   /* fall through */
+#include "hv-24x7-domains.h"
+#undef DOMAIN
+   return true;
+   default:
+   return false;
+   }
+}
+
+static bool is_physical_domain(unsigned domain)
+{
+   switch (domain) {
+#define DOMAIN(n, v, x, c) \
+   case HV_PERF_DOMAIN_##n:\
+

[PATCH 11/16] byteorder: provide a linux/byteorder.h with {be,le}_to_cpu() and cpu_to_{be,le}() macros

2014-05-27 Thread Cody P Schafer

Rather than manually specifying the size of the integer to be converted, key
off of the type size. This reduces duplicated size information and the occurrence
of certain types of bugs (using the wrong-sized conversion).

CC: Sukadev Bhattiprolu 
Signed-off-by: Cody P Schafer 
---
 include/linux/byteorder.h | 34 ++
 1 file changed, 34 insertions(+)
 create mode 100644 include/linux/byteorder.h

diff --git a/include/linux/byteorder.h b/include/linux/byteorder.h
new file mode 100644
index 000..c7ab8da
--- /dev/null
+++ b/include/linux/byteorder.h
@@ -0,0 +1,34 @@
+#ifndef LINUX_BYTEORDER_H_
+#define LINUX_BYTEORDER_H_
+
+#include 
+
+#define be_to_cpu(v) \
+   __builtin_choose_expr(sizeof(v) == sizeof(uint8_t) , v, \
+   __builtin_choose_expr(sizeof(v) == sizeof(uint16_t), be16_to_cpu(v), \
+   __builtin_choose_expr(sizeof(v) == sizeof(uint32_t), be32_to_cpu(v), \
+   __builtin_choose_expr(sizeof(v) == sizeof(uint64_t), be64_to_cpu(v), \
+   (void)0))))
+
+#define le_to_cpu(v) \
+   __builtin_choose_expr(sizeof(v) == sizeof(uint8_t) , v, \
+   __builtin_choose_expr(sizeof(v) == sizeof(uint16_t), le16_to_cpu(v), \
+   __builtin_choose_expr(sizeof(v) == sizeof(uint32_t), le32_to_cpu(v), \
+   __builtin_choose_expr(sizeof(v) == sizeof(uint64_t), le64_to_cpu(v), \
+   (void)0))))
+
+#define cpu_to_le(v) \
+   __builtin_choose_expr(sizeof(v) == sizeof(uint8_t) , v, \
+   __builtin_choose_expr(sizeof(v) == sizeof(uint16_t), cpu_to_le16(v), \
+   __builtin_choose_expr(sizeof(v) == sizeof(uint32_t), cpu_to_le32(v), \
+   __builtin_choose_expr(sizeof(v) == sizeof(uint64_t), cpu_to_le64(v), \
+   (void)0))))
+
+#define cpu_to_be(v) \
+   __builtin_choose_expr(sizeof(v) == sizeof(uint8_t) , v, \
+   __builtin_choose_expr(sizeof(v) == sizeof(uint16_t), cpu_to_be16(v), \
+   __builtin_choose_expr(sizeof(v) == sizeof(uint32_t), cpu_to_be32(v), \
+   __builtin_choose_expr(sizeof(v) == sizeof(uint64_t), cpu_to_be64(v), \
+   (void)0))))
+
+#endif
-- 
1.9.3
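A userspace sketch of the same size-dispatch idea, using glibc's `<endian.h>` converters in place of the kernel's `be16_to_cpu()` family; `be_to_cpu_any()` is a hypothetical name. Each level opens one `__builtin_choose_expr(` (a GCC/Clang builtin), so the final `(void)0` needs one closing parenthesis per level.

```c
#define _DEFAULT_SOURCE 1	/* for be16toh()/htobe16() in glibc */
#include <endian.h>
#include <stdint.h>

/*
 * Pick the converter from sizeof(v) at compile time.  Only the chosen
 * branch is evaluated, so v is evaluated once.  Unsupported sizes yield
 * (void)0, so using the result as a value then fails to compile.
 */
#define be_to_cpu_any(v)						\
	__builtin_choose_expr(sizeof(v) == sizeof(uint8_t), (v),	\
	__builtin_choose_expr(sizeof(v) == sizeof(uint16_t), be16toh(v), \
	__builtin_choose_expr(sizeof(v) == sizeof(uint32_t), be32toh(v), \
	__builtin_choose_expr(sizeof(v) == sizeof(uint64_t), be64toh(v), \
	(void)0))))
```

The result type follows the chosen branch, so a `uint16_t` field comes back as `uint16_t` without the caller restating its width.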


[PATCH 13/16] powerpc/perf/hv-24x7: Documentation for new sysfs entries which expose descriptions

2014-05-27 Thread Cody P Schafer

CC: Sukadev Bhattiprolu 
Signed-off-by: Cody P Schafer 
---
 .../testing/sysfs-bus-event_source-devices-hv_24x7 | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7 b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
index e78ee79..5b501d7 100644
--- a/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
+++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
@@ -21,3 +21,25 @@ Contact: Cody P Schafer 
 Description:
Exposes the "version" field of the 24x7 catalog. This is also
extractable from the provided binary "catalog" sysfs entry.
+
+What:  /sys/bus/event_source/devices/hv_24x7/event_descs/
+Date:  February 2014
+Contact:   Cody P Schafer 
+Description:
+   Provides the description of a particular event as provided by
+   the firmware. If firmware does not provide a description, no
+   file will be created.
+
+   Note that the event-name lacks the domain suffix appended for
+   events in the events/ dir.
+
+What:  /sys/bus/event_source/devices/hv_24x7/event_long_descs/
+Date:  February 2014
+Contact:   Cody P Schafer 
+Description:
+   Provides the "long" description of a particular event as
+   provided by the firmware. If firmware does not provide a
+   description, no file will be created.
+
+   Note that the event-name lacks the domain suffix appended for
+   events in the events/ dir.
-- 
1.9.3


Re: [PATCH] perf, tools: Support spark lines in perf stat v3

2014-05-27 Thread Andi Kleen

> google says this pragma got obsolete.. any reason for using this?

google is wrong.

It's standard, e.g. on MacOS, and IMHO simpler and nicer than
the usual ifdef. But I removed it.

-Andi

[PATCH 07/16] tools/perf: support parsing parameterized events

2014-05-27 Thread Cody P Schafer

Enable event specification like:

pmu/event_name,param1=0x1,param2=0x4/

Assuming that

/sys/bus/event_source/devices/pmu/events/event_name

contains something like

bar=param2,foo=1,baz=param1
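The substitution this enables can be modeled in a few lines. The sketch keeps the user-supplied terms in a separate array for clarity, whereas the patch searches the same head_terms list; `struct term` and `resolve_term()` are hypothetical simplifications of `struct parse_events_term` and `pmu_resolve_param_term()`. With the example above, bar resolves to 0x4, foo stays 1 and baz resolves to 0x1.

```c
#include <stdint.h>
#include <string.h>

/* Simplified term: "name=<number>" or "name=<symbol>". */
struct term {
	const char *name;
	const char *sym;	/* non-NULL for a symbolic (string) value */
	uint64_t num;		/* used when sym == NULL */
	int used;		/* set once consumed as a parameter value */
};

/*
 * Resolve one term's value.  Numeric terms are returned directly; a
 * symbolic value is looked up among the user-supplied numeric terms,
 * and the matching term is marked as used so it is not applied again
 * as an ordinary format term.  Returns 0 on success, -1 if unresolved.
 */
static int resolve_term(const struct term *t, struct term *user,
			size_t n_user, uint64_t *value)
{
	size_t i;

	if (!t->sym) {
		*value = t->num;
		return 0;
	}
	for (i = 0; i < n_user; i++) {
		if (!user[i].sym && strcmp(user[i].name, t->sym) == 0) {
			user[i].used = 1;
			*value = user[i].num;
			return 0;
		}
	}
	return -1;
}
```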

CC: Sukadev Bhattiprolu 
Signed-off-by: Cody P Schafer 
---
 tools/perf/util/parse-events.h |  1 +
 tools/perf/util/pmu.c  | 55 ++
 2 files changed, 46 insertions(+), 10 deletions(-)

diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index f1cb4c4..1147e87 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -60,6 +60,7 @@ struct parse_events_term {
int type_val;
int type_term;
struct list_head list;
+   bool used;
 };
 
 struct parse_events_evlist {
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 906ae40..db53fac 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -504,27 +504,57 @@ static __u64 pmu_format_value(unsigned long *format, 
__u64 value)
 }
 
 /*
+ * Term is a string term, and might be a param-term. Try to look up its value
+ * in the remaining terms.
+ * - We have a term like "base-or-format-term=param-term",
+ * - We need to find the value supplied for "param-term" (with param-term named
+ *   in a config string) later on in the term list.
+ */
+static int pmu_resolve_param_term(struct parse_events_term *term,
+ struct list_head *head_terms,
+ __u64 *value)
+{
+   struct parse_events_term *t;
+
+   list_for_each_entry(t, head_terms, list)
+   if (t->type_val == PARSE_EVENTS__TERM_TYPE_NUM) {
+   if (!strcmp(t->config, term->val.str)) {
+   t->used = true;
+   *value = t->val.num;
+   return 0;
+   }
+   }
+
+   return -1;
+}
+
+/*
  * Setup one of config[12] attr members based on the
  * user input data - term parameter.
  */
 static int pmu_config_term(struct list_head *formats,
   struct perf_event_attr *attr,
-  struct parse_events_term *term)
+  struct parse_events_term *term,
+  struct list_head *head_terms)
 {
struct perf_pmu_format *format;
__u64 *vp;
+   __u64 val;
+
+   /*
+* If this is a parameter we've already used for parameterized-eval,
+* skip it in normal eval.
+*/
+   if (term->used)
+   return 0;
 
/*
-* Support only for hardcoded and numnerial terms.
 * Hardcoded terms should be already in, so nothing
 * to be done for them.
 */
if (parse_events__is_hardcoded_term(term))
return 0;
 
-   if (term->type_val != PARSE_EVENTS__TERM_TYPE_NUM)
-   return -EINVAL;
-
format = pmu_find_format(formats, term->config);
if (!format)
return -EINVAL;
@@ -544,11 +574,16 @@ static int pmu_config_term(struct list_head *formats,
}
 
/*
-* XXX If we ever decide to go with string values for
-* non-hardcoded terms, here's the place to translate
-* them into value.
+* Either directly use a numeric term, or try to translate string terms
+* using event parameters.
 */
-   *vp |= pmu_format_value(format->bits, term->val.num);
+   if (term->type_val == PARSE_EVENTS__TERM_TYPE_NUM)
+   val = term->val.num;
+   else
+   if (pmu_resolve_param_term(term, head_terms, &val))
+   return -EINVAL;
+
+   *vp |= pmu_format_value(format->bits, val);
return 0;
 }
 
@@ -559,7 +594,7 @@ int perf_pmu__config_terms(struct list_head *formats,
struct parse_events_term *term;
 
list_for_each_entry(term, head_terms, list)
-   if (pmu_config_term(formats, attr, term))
+   if (pmu_config_term(formats, attr, term, head_terms))
return -EINVAL;
 
return 0;
-- 
1.9.3
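
The parameter lookup that pmu_config_term() now performs can be modeled in isolation. What follows is a userspace sketch only, not tools/perf code. `struct term`, `resolve_param()` and `config_value()` are hypothetical simplifications of `struct parse_events_term`, pmu_resolve_param_term() and the value selection in pmu_config_term(). A numeric term supplies its value directly. A string term names a parameter, which must appear as a numeric term elsewhere in the list. That numeric term is then marked used, so the normal pass skips it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct parse_events_term. */
struct term {
	const char *config;   /* term name, e.g. "starting_index" */
	bool is_num;          /* numeric value vs. string (parameter name) */
	uint64_t num;         /* valid when is_num */
	const char *str;      /* valid when !is_num: the parameter name */
	bool used;            /* consumed as a parameter value */
};

/*
 * Find a numeric term whose name matches the parameter named by @param
 * and return its value through @value. The matching term is marked used.
 */
static int resolve_param(const struct term *param, struct term *terms,
			 size_t n, uint64_t *value)
{
	for (size_t i = 0; i < n; i++) {
		if (terms[i].is_num && strcmp(terms[i].config, param->str) == 0) {
			terms[i].used = true;
			*value = terms[i].num;
			return 0;
		}
	}
	return -1; /* parameter was not supplied by the user */
}

/* Pick the value for one term: numeric directly, string via parameters. */
static int config_value(const struct term *t, struct term *terms, size_t n,
			uint64_t *value)
{
	if (t->is_num) {
		*value = t->num;
		return 0;
	}
	return resolve_param(t, terms, n, value);
}
```

Consider a description containing `starting_index=phys_cpu` combined with a user-supplied `phys_cpu=0x2`. In that case, config_value() yields 2 for starting_index. If no `phys_cpu` term is given, it returns -1, which corresponds to the -EINVAL path in the patch.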

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 09/16] tools/perf: document parameterized events and note symbolically formed events

2014-05-27 Thread Cody P Schafer

CC: Sukadev Bhattiprolu 
Signed-off-by: Cody P Schafer 
---
 tools/perf/Documentation/perf-list.txt   | 13 +
 tools/perf/Documentation/perf-record.txt |  5 +
 2 files changed, 18 insertions(+)

diff --git a/tools/perf/Documentation/perf-list.txt b/tools/perf/Documentation/perf-list.txt
index 6fce6a6..626818b 100644
--- a/tools/perf/Documentation/perf-list.txt
+++ b/tools/perf/Documentation/perf-list.txt
@@ -89,6 +89,19 @@ raw encoding of 0x1A8 can be used:
 You should refer to the processor specific documentation for getting these
 details. Some of them are referenced in the SEE ALSO section below.
 
+PARAMETERIZED EVENTS
+
+
+Some pmu events listed by 'perf-list' will be displayed with '?' in them. For
+example:
+
+  hv_gpci/dtbp_ptitc,phys_processor_idx=?/
+
+This means that when provided as an event, a value for phys_processor_idx must
+also be supplied. For example:
+
+  perf stat -e 'hv_gpci/dtbp_ptitc,phys_processor_idx=0x2/' ...
+
 OPTIONS
 ---
 
diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index c71b0f3..c005180 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -33,6 +33,11 @@ OPTIONS
 - a raw PMU event (eventsel+umask) in the form of rNNN where NNN is a
  hexadecimal event descriptor.
 
+   - a symbolically formed PMU event like 'pmu/value1=0x3,value2/' where
+ 'value1' and 'value2' are defined as formats in
+ /sys/bus/event_source/devices/pmu/format/* OR are one of 'config',
+ 'config1', 'config2'.
+
 - a hardware breakpoint event in the form of '\mem:addr[:access]'
   where addr is the address in memory you want to break in.
   Access is the memory access type (read, write, execute) it can
-- 
1.9.3
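
The symbolic form reduces to a comma-separated list of `name[=value]` terms. Below is a minimal userspace sketch of splitting such a list. It is not perf's own event parser, and `struct kv` and `parse_terms()` are hypothetical. It assumes the convention documented for the sysfs event files, where a bare term means a value of 1. An unfilled `?` placeholder, as printed by perf list, is rejected as non-numeric.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TERM_NAME_MAX 32

/* Hypothetical result type: one parsed "name[=value]" term. */
struct kv {
	char name[TERM_NAME_MAX];
	uint64_t value;
};

/*
 * Split the text between the PMU slashes (e.g. "value1=0x3,value2")
 * into name/value pairs. A bare name is treated as "=1". Returns the
 * number of terms parsed, or -1 on malformed input or if @out is full.
 */
static int parse_terms(const char *s, struct kv *out, size_t max)
{
	size_t n = 0;

	while (*s) {
		size_t len = strcspn(s, "=,");

		if (len == 0 || len >= TERM_NAME_MAX || n == max)
			return -1;
		memcpy(out[n].name, s, len);
		out[n].name[len] = '\0';
		s += len;

		out[n].value = 1; /* bare term: implied value of 1 */
		if (*s == '=') {
			char *end;

			s++;
			/* Base 0 accepts the "0x" prefix used in event strings. */
			out[n].value = strtoull(s, &end, 0);
			if (end == s || (*end != ',' && *end != '\0'))
				return -1; /* missing or non-numeric value, e.g. "?" */
			s = end;
		}
		n++;
		if (*s == ',')
			s++;
	}
	return (int)n;
}
```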

[PATCH 05/16] perf Documentation: add event parameters

2014-05-27 Thread Cody P Schafer

Event parameters are a basic way for partial events to be specified in
sysfs with per-event names given to the fields that need to be filled in
when using a particular event.

It is intended for supporting cases where the single 'cpu' parameter is
insufficient. For example, POWER 8 has events for physical
sockets/cores/cpus that are accessible from within virtual machines. To
keep using the single 'cpu' parameter we'd need to perform a mapping
between Linux's cpus and the physical machine's cpus (in this case
Linux is running under a hypervisor). This isn't possible because
bindings between our cpus and physical cpus may not be fixed, and we
probably won't have a "cpu" on each physical cpu.
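
The documentation hunk of this patch states that every non-numerical value in an event description is a parameter. As a rough illustration of that rule, the sketch below extracts parameter names from a description string. It is a userspace example under that assumption only. `find_event_params()` and `PARAM_NAME_MAX` are hypothetical, and this is not the tools/perf implementation.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define PARAM_NAME_MAX 32

/*
 * Scan a description such as "domain=0x1,offset=0x8,starting_index=phys_cpu"
 * and copy out every value that is not a number: those name event
 * parameters the user must fill in. Returns the count, or -1 if @max is
 * too small or a parameter name is too long.
 */
static int find_event_params(const char *desc, char names[][PARAM_NAME_MAX],
			     size_t max)
{
	size_t n = 0;

	while (*desc) {
		size_t term_len = strcspn(desc, ",");
		const char *eq = memchr(desc, '=', term_len);

		if (eq) { /* bare terms (no '=') are never parameters */
			const char *val = eq + 1;
			size_t val_len = term_len - (size_t)(val - desc);
			char *end;

			(void)strtoull(val, &end, 0);
			/* Non-numeric if parsing stopped before the value ended. */
			if (val_len > 0 && end != val + val_len) {
				if (n == max || val_len >= PARAM_NAME_MAX)
					return -1;
				memcpy(names[n], val, val_len);
				names[n][val_len] = '\0';
				n++;
			}
		}
		desc += term_len;
		if (*desc == ',')
			desc++;
	}
	return (int)n;
}
```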

CC: Sukadev Bhattiprolu 
Signed-off-by: Cody P Schafer 
---
 Documentation/ABI/testing/sysfs-bus-event_source-devices-events | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-events b/Documentation/ABI/testing/sysfs-bus-event_source-devices-events
index 20979f8..c1f9850 100644
--- a/Documentation/ABI/testing/sysfs-bus-event_source-devices-events
+++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-events
@@ -52,12 +52,18 @@ Description:Per-pmu performance monitoring events specific to the running syste
event=0x2abc
event=0x423,inv,cmask=0x3
domain=0x1,offset=0x8,starting_index=0x
+   domain=0x1,offset=0x8,starting_index=phys_cpu
 
Each of the assignments indicates a value to be assigned to a
particular set of bits (as defined by the format file
corresponding to the ) in the perf_event structure passed
to the perf_open syscall.
 
+   In the case of the last example, a value replacing "phys_cpu"
+   would need to be provided by the user selecting the particular
+   event. This is referred to as "event parameterization". All
+   non-numerical values indicate an event parameter.
+
 What: /sys/bus/event_source/devices//events/.unit
 Date: 2014/02/24
 Contact:   Linux kernel mailing list 
-- 
1.9.3

[PATCH 08/16] tools/perf: extend format_alias() to include event parameters

2014-05-27 Thread Cody P Schafer

This causes `perf list pmu` to show parameters for parameterized events
as follows:

  pmu/event_name,param1=?,param2=?/ [Kernel PMU event]

An example:

  hv_gpci/dispatch_timebase_by_processor_processor_time_in_timebase_cycles,phys_processor_idx=?/ [Kernel PMU event]
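
The formatting step can be tried outside perf by using a plain array of parameter names in place of the alias term list. This sketch uses a hypothetical `format_event()` helper rather than the patch's format_alias(). It clamps each append to the space remaining, so a truncated result stays NUL-terminated and no pointer is formed past the end of the buffer.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "pmu/event,param1=?,param2=?/" into buf (size len, len > 0).
 * If buf is too small the output is truncated but always NUL-terminated.
 * Returns buf.
 */
static char *format_event(char *buf, size_t len, const char *pmu,
			  const char *event, const char *const *params,
			  size_t nparams)
{
	size_t used;
	int n;

	n = snprintf(buf, len, "%s/%s", pmu, event);
	if (n < 0)
		return buf;
	used = (size_t)n;

	/* Append one ",name=?" per parameter while space remains. */
	for (size_t i = 0; i < nparams && used < len; i++) {
		n = snprintf(buf + used, len - used, ",%s=?", params[i]);
		if (n < 0)
			return buf;
		used += (size_t)n;
	}

	/* Closing slash, only if at least one byte is left. */
	if (used < len)
		snprintf(buf + used, len - used, "/");
	return buf;
}
```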

CC: Sukadev Bhattiprolu 
Signed-off-by: Cody P Schafer 
---
 tools/perf/util/pmu.c | 26 +-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index db53fac..7b8d067 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -741,10 +741,33 @@ void perf_pmu__set_format(unsigned long *bits, long from, long to)
set_bit(b, bits);
 }
 
+static int sub_non_neg(int a, int b)
+{
+   if (b > a)
+   return 0;
+   return a - b;
+}
+
 static char *format_alias(char *buf, int len, struct perf_pmu *pmu,
  struct perf_pmu_alias *alias)
 {
-   snprintf(buf, len, "%s/%s/", pmu->name, alias->name);
+   struct parse_events_term *term;
+   int used = snprintf(buf, len, "%s/%s", pmu->name, alias->name);
+
+   list_for_each_entry(term, &alias->terms, list)
+   if (term->type_val == PARSE_EVENTS__TERM_TYPE_STR)
+   used += snprintf(buf + used, sub_non_neg(len, used),
+   ",%s=?", term->val.str);
+
+   if (sub_non_neg(len, used) > 0) {
+   buf[used] = '/';
+   used++;
+   }
+   if (sub_non_neg(len, used) > 0) {
+   buf[used] = '\0';
+   used++;
+   } else
+   buf[len - 1] = '\0';
return buf;
 }
 
@@ -795,6 +818,7 @@ void print_pmu_events(const char *event_glob, bool name_only)
if (is_cpu && !name_only)
aliases[j] = format_alias_or(buf, sizeof(buf),
  pmu, alias);
+
aliases[j] = strdup(aliases[j]);
j++;
}
-- 
1.9.3

[PATCH 10/16] perf: provide sysfs_show for struct perf_pmu_events_attr

2014-05-27 Thread Cody P Schafer

(struct perf_pmu_events_attr) is defined in include/linux/perf_event.h,
but the only "show" for it is in x86 and contains x86 specific stuff.

Make a generic one for those of us who are just using the event_str.
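
The generic show routine relies on the kernel's container_of() idiom. The sysfs core passes in a pointer to the embedded `struct device_attribute`, and the routine steps back from it to the enclosing `struct perf_pmu_events_attr`. Below is a userspace sketch of that idiom. The `fake_*` types are hypothetical stand-ins. The simplified container_of() here omits the kernel macro's type checking, and it uses a bounded snprintf() where the patch uses sprintf() into the sysfs page.

```c
#include <stddef.h>
#include <stdio.h>

/* Userspace equivalent of the kernel's container_of(), without type checks. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for struct device_attribute / perf_pmu_events_attr. */
struct fake_device_attribute {
	const char *name;
};

struct fake_pmu_events_attr {
	struct fake_device_attribute attr;   /* embedded, as in the kernel struct */
	unsigned long long id;
	const char *event_str;
};

/* Generic show: recover the outer struct and print its event string. */
static int fake_event_show(struct fake_device_attribute *attr, char *page,
			   size_t size)
{
	struct fake_pmu_events_attr *pmu_attr =
		container_of(attr, struct fake_pmu_events_attr, attr);

	return snprintf(page, size, "%s\n", pmu_attr->event_str);
}
```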

CC: Sukadev Bhattiprolu 
Signed-off-by: Cody P Schafer 
---
 include/linux/perf_event.h | 3 +++
 kernel/events/core.c   | 8 
 2 files changed, 11 insertions(+)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 3356abc..6c1d6dd 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -867,6 +867,9 @@ struct perf_pmu_events_attr {
const char *event_str;
 };
 
+ssize_t perf_event_sysfs_show(struct device *dev, struct device_attribute *attr,
+ char *page);
+
 #define PMU_EVENT_ATTR(_name, _var, _id, _show)			\
 static struct perf_pmu_events_attr _var = {\
.attr = __ATTR(_name, 0444, _show, NULL),   \
diff --git a/kernel/events/core.c b/kernel/events/core.c
index f83a71a..6830e21 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7971,6 +7971,14 @@ void __init perf_event_init(void)
 != 1024);
 }
 
+ssize_t perf_event_sysfs_show(struct device *dev, struct device_attribute *attr,
+ char *page)
+{
+   struct perf_pmu_events_attr *pmu_attr =
+   container_of(attr, struct perf_pmu_events_attr, attr);
+   return sprintf(page, "%s\n", pmu_attr->event_str);
+}
+
 static int __init perf_event_sysfs_init(void)
 {
struct pmu *pmu;
-- 
1.9.3

[PATCH 06/16] tools/perf: annotate list_head with type info

2014-05-27 Thread Cody P Schafer

CC: Sukadev Bhattiprolu 
Signed-off-by: Cody P Schafer 
---
 tools/perf/util/pmu.c | 4 ++--
 tools/perf/util/pmu.h | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 00a7dcb..906ae40 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -14,8 +14,8 @@
 
 struct perf_pmu_alias {
char *name;
-   struct list_head terms;
-   struct list_head list;
+   struct list_head terms; /* HEAD struct parse_events_term -> list */
+   struct list_head list;  /* ELEM */
char unit[UNIT_MAX_LEN+1];
double scale;
 };
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 8b64125..4a85230 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -17,9 +17,9 @@ struct perf_pmu {
char *name;
__u32 type;
struct cpu_map *cpus;
-   struct list_head format;
-   struct list_head aliases;
-   struct list_head list;
+   struct list_head format;  /* HEAD struct perf_pmu_format -> list */
+   struct list_head aliases; /* HEAD struct perf_pmu_alias -> list */
+   struct list_head list;/* ELEM */
 };
 
 struct perf_pmu *perf_pmu__find(const char *name);
-- 
1.9.3

[PATCH 02/16] powerpc/perf/hv-24x7: use kmem_cache instead of aligned stack allocations

2014-05-27 Thread Cody P Schafer

Ian pointed out the use of __aligned(4096) caused rather large stack
consumption in single_24x7_request(), so use the kmem_cache
hv_page_cache (which we've already got set up for other allocations)
instead of allocating locally.
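
Here is a userspace analogue of the change. It assumes C11 `aligned_alloc()` as a stand-in for `kmem_cache_alloc()` on a cache of 4096-byte objects, and `static_assert` in place of BUILD_BUG_ON(). The `fake_*` structures and the hypervisor call are hypothetical. The point is the shape of the code. Page-aligned buffers come from the heap rather than the stack, and the error paths release them in the reverse order of allocation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BUF_SIZE 4096

/* Hypothetical request/result layouts; the real hv-24x7 structs differ. */
struct fake_request { uint32_t version; uint32_t num_requests; };
struct fake_result  { uint64_t value; };

/* Compile-time check that each struct fits in one buffer (cf. BUILD_BUG_ON). */
static_assert(sizeof(struct fake_request) <= BUF_SIZE, "request too large");
static_assert(sizeof(struct fake_result) <= BUF_SIZE, "result too large");

/* Stand-in for the hypervisor call: fills in a result, returns 0 on success. */
static long fake_hcall(const struct fake_request *req, struct fake_result *res)
{
	res->value = req->num_requests * 42u;
	return 0;
}

/* Heap-allocated, page-aligned buffers with goto-based unwinding. */
static long single_request(uint64_t *out)
{
	long ret = -1; /* stand-in for -ENOMEM */
	struct fake_request *req;
	struct fake_result *res;

	req = aligned_alloc(BUF_SIZE, BUF_SIZE);
	if (!req)
		goto out;
	res = aligned_alloc(BUF_SIZE, BUF_SIZE);
	if (!res)
		goto out_req;
	memset(res, 0, BUF_SIZE); /* mirrors kmem_cache_zalloc() */

	*req = (struct fake_request){ .version = 1, .num_requests = 1 };
	ret = fake_hcall(req, res);
	if (ret == 0)
		*out = res->value;

	free(res);
out_req:
	free(req);
out:
	return ret;
}
```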

CC: Sukadev Bhattiprolu 
Reported-by: Ian Munsie 
Signed-off-by: Cody P Schafer 
---
 arch/powerpc/perf/hv-24x7.c | 52 -
 1 file changed, 37 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index e0766b8..9a7a830 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -294,7 +294,7 @@ static unsigned long single_24x7_request(u8 domain, u32 offset, u16 ix,
 u16 lpar, u64 *res,
 bool success_expected)
 {
-   unsigned long ret;
+   unsigned long ret = -ENOMEM;
 
/*
 * request_buffer and result_buffer are not required to be 4k aligned,
@@ -304,7 +304,27 @@ static unsigned long single_24x7_request(u8 domain, u32 offset, u16 ix,
struct reqb {
struct hv_24x7_request_buffer buf;
struct hv_24x7_request req;
-   } __packed __aligned(4096) request_buffer = {
+   } __packed *request_buffer;
+   struct resb {
+   struct hv_24x7_data_result_buffer buf;
+   struct hv_24x7_result res;
+   struct hv_24x7_result_element elem;
+   __be64 result;
+   } __packed *result_buffer;
+
+   BUILD_BUG_ON(sizeof(*request_buffer) > 4096);
+   BUILD_BUG_ON(sizeof(*result_buffer) > 4096);
+
+   request_buffer = kmem_cache_alloc(hv_page_cache, GFP_USER);
+
+   if (!request_buffer)
+   goto out_reqb;
+
+   result_buffer = kmem_cache_zalloc(hv_page_cache, GFP_USER);
+   if (!result_buffer)
+   goto out_resb;
+
+   *request_buffer = (struct reqb) {
.buf = {
.interface_version = HV_24X7_IF_VERSION_CURRENT,
.num_requests = 1,
@@ -320,28 +340,30 @@ static unsigned long single_24x7_request(u8 domain, u32 offset, u16 ix,
}
};
 
-   struct resb {
-   struct hv_24x7_data_result_buffer buf;
-   struct hv_24x7_result res;
-   struct hv_24x7_result_element elem;
-   __be64 result;
-   } __packed __aligned(4096) result_buffer = {};
-
ret = plpar_hcall_norets(H_GET_24X7_DATA,
-   virt_to_phys(&request_buffer), sizeof(request_buffer),
-   virt_to_phys(&result_buffer),  sizeof(result_buffer));
+   virt_to_phys(request_buffer), sizeof(*request_buffer),
+   virt_to_phys(result_buffer),  sizeof(*result_buffer));
 
if (ret) {
if (success_expected)
pr_err_ratelimited("hcall failed: %d %#x %#x %d => 0x%lx (%ld) detail=0x%x failing ix=%x\n",
domain, offset, ix, lpar,
ret, ret,
-   result_buffer.buf.detailed_rc,
-   result_buffer.buf.failing_request_ix);
-   return ret;
+   result_buffer->buf.detailed_rc,
+   result_buffer->buf.failing_request_ix);
+   goto out_hcall;
}
 
-   *res = be64_to_cpu(result_buffer.result);
+   *res = be64_to_cpu(result_buffer->result);
+   kfree(result_buffer);
+   kfree(request_buffer);
+   return ret;
+
+out_hcall:
+   kfree(result_buffer);
+out_resb:
+   kfree(request_buffer);
+out_reqb:
return ret;
 }
 
-- 
1.9.3

[PATCH 03/16] perf Documentation: sysfs events/ interfaces

2014-05-27 Thread Cody P Schafer

Add documentation for the , .scale, and .unit
files in sysfs.

.scale and .unit were undocumented.
 was previously documented only for specific powerpc pmu events.

CC: Sukadev Bhattiprolu 
Signed-off-by: Cody P Schafer 
---
 .../testing/sysfs-bus-event_source-devices-events  | 60 ++
 1 file changed, 60 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-events b/Documentation/ABI/testing/sysfs-bus-event_source-devices-events
index 7b40a3c..a5226f0 100644
--- a/Documentation/ABI/testing/sysfs-bus-event_source-devices-events
+++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-events
@@ -599,3 +599,63 @@ Description:   POWER-systems specific performance monitoring events
Further, multiple terms like 'event=0x' can be specified
and separated with comma. All available terms are defined in
the /sys/bus/event_source/devices//format file.
+
+What: /sys/bus/event_source/devices//events/
+Date: 2014/02/24
+Contact:   Linux kernel mailing list 
+Description:   Per-pmu performance monitoring events specific to the running system
+
+   Each file (except for some of those with a '.' in them, '.unit'
+   and '.scale') in the 'events' directory describes a single
+   performance monitoring event supported by the . The name
+   of the file is the name of the event.
+
+   File contents:
+
+   [=][,[=]]...
+
+   Where  is one of the terms listed under
+   /sys/bus/event_source/devices//format/ and  is
+   a number in base-16 format with a '0x' prefix (lowercase only).
+   If a  is specified alone (without an assigned value), it
+   is implied that 0x1 is assigned to that .
+
+   Examples (each of these lines would be in a separate file):
+
+   event=0x2abc
+   event=0x423,inv,cmask=0x3
+   domain=0x1,offset=0x8,starting_index=0x
+
+   Each of the assignments indicates a value to be assigned to a
+   particular set of bits (as defined by the format file
+   corresponding to the ) in the perf_event structure passed
+   to the perf_open syscall.
+
+What: /sys/bus/event_source/devices//events/.unit
+Date: 2014/02/24
+Contact:   Linux kernel mailing list 
+Description:   Perf event units
+
+   A string specifying the English plural numerical unit that
+   (once multiplied by .scale) represents.
+
+   Example:
+
+   Joules
+
+What: /sys/bus/event_source/devices//events/.scale
+Date: 2014/02/24
+Contact:   Linux kernel mailing list 
+Description:   Perf event scaling factors
+
+   A string representing a floating point value expressed in
+   scientific notation to be multiplied by the event count
+   received from the kernel to match the unit specified in the
+   .unit file.
+
+   Example:
+
+   2.3283064365386962890625e-10
+
+   This is provided to avoid performing floating point arithmetic
+   in the kernel.
-- 
1.9.3
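
Here is a worked example of the .scale convention. The sample factor 2.3283064365386962890625e-10 is exactly 2^-32 (1/4294967296). A raw count of 4294967296 therefore scales to 1.0 in the event's unit, for example 1 Joule if the .unit file reads "Joules". The sketch below shows the userspace side of that multiplication using a hypothetical `scale_count()` helper. It is not the tools/perf implementation. Tolerating a trailing newline is an assumption about how the sysfs file is read.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Convert a raw event count into the unit named by the event's .unit file,
 * using the text read from its .scale file. Returns 0 on success, or -1 if
 * the scale text is not a complete floating-point number.
 */
static int scale_count(uint64_t raw, const char *scale_str, double *out)
{
	char *end;
	double scale = strtod(scale_str, &end);

	if (end == scale_str)
		return -1;          /* no number at all */
	if (*end == '\n')
		end++;              /* allow a trailing newline */
	if (*end != '\0')
		return -1;          /* trailing garbage */
	*out = (double)raw * scale;
	return 0;
}
```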
