On Thu, Mar 26, 2015 at 04:42:07PM +1100, Gavin Shan wrote:
There are two equivalent sets of PE state constants, defined in
arch/powerpc/include/asm/eeh.h and include/uapi/linux/vfio.h.
Though the names are different, their corresponding values are
exactly the same. The former is used by EEH core
Cedric Le Goater c...@fr.ibm.com writes:
The sensor service in OPAL only handles one FSP request at a time and
returns OPAL_BUSY if one is already in progress. The lock covers this case
but we could also remove it and return EBUSY to the driver or even retry the
call. That might be dangerous
On Thu, 2015-26-03 at 10:46:56 UTC, Philippe Bergheaud wrote:
Fix the attribute name of the configuration record class ID.
Signed-off-by: Philippe Bergheaud fe...@linux.vnet.ibm.com
---
Documentation/ABI/testing/sysfs-class-cxl | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
On (03/26/15 08:05), Benjamin Herrenschmidt wrote:
PowerPC folks, what do you think?
I'll give it another look today.
Cheers,
Ben.
Hi Ben,
did you have a chance to look at this?
--Sowmini
RTC interrupt uses IRQ11 on T2080QDS.
Signed-off-by: Shengzhou Liu shengzhou@freescale.com
---
arch/powerpc/boot/dts/t208xqds.dtsi | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/boot/dts/t208xqds.dtsi
b/arch/powerpc/boot/dts/t208xqds.dtsi
index
On 26/03/2015 19:55, Ingo Molnar wrote:
* Laurent Dufour lduf...@linux.vnet.ibm.com wrote:
+{
+unsigned long vdso_end, vdso_start;
+
+if (!mm->context.vdso_base)
+return;
+vdso_start = mm->context.vdso_base;
+
+#ifdef CONFIG_PPC64
+/* Calling is_32bit_task()
Add support for EON en25s64 spi device.
Signed-off-by: Shengzhou Liu shengzhou@freescale.com
---
drivers/mtd/spi-nor/spi-nor.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/mtd/spi-nor/spi-nor.c b/drivers/mtd/spi-nor/spi-nor.c
index 0f8ec3c..f8acef7 100644
---
On 03/27/2015 11:36 AM, Benjamin Herrenschmidt wrote:
On Fri, 2015-03-27 at 20:59 +1100, Michael Ellerman wrote:
Can you put it in opal.h and give it a better name, maybe
opal_error_code() ?
Do we want it to be inlined all the time ? Feels more like something we
should have in opal.c
On Fri, 2015-03-27 at 20:59 +1100, Michael Ellerman wrote:
Can you put it in opal.h and give it a better name, maybe
opal_error_code() ?
Do we want it to be inlined all the time ? Feels more like something we
should have in opal.c
Also we only want to call it when we forward the error code
On 03/27/2015 10:59 AM, Michael Ellerman wrote:
On Thu, 2015-26-03 at 16:04:45 UTC, Cédric Le Goater wrote:
OPAL has its own list of return codes. The patch provides a translation
of such codes in errnos for the opal_sensor_read call.
Signed-off-by: Cédric Le Goater
By default we enable CONFIG_I2C_MUX and CONFIG_I2C_MUX_PCA954x,
which are needed on T2080QDS, T4240QDS, B4860QDS, etc.
Signed-off-by: Shengzhou Liu shengzhou@freescale.com
---
against 'next' branch of
git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux.git
On Thu, 2015-26-03 at 16:04:45 UTC, Cédric Le Goater wrote:
OPAL has its own list of return codes. The patch provides a translation
of such codes in errnos for the opal_sensor_read call.
Signed-off-by: Cédric Le Goater c...@fr.ibm.com
---
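A minimal sketch of the kind of translation described above, not the patch itself. The helper name opal_error_code() is the one suggested in the review thread; the numeric OPAL values, the subset of codes shown, and the -EIO fallback are assumptions made for illustration only.

```c
#include <assert.h>
#include <errno.h>

/*
 * Illustrative subset of OPAL return codes. The numeric values are
 * assumptions for this sketch; check the OPAL API headers for the
 * authoritative list.
 */
enum {
	OPAL_SUCCESS		=  0,
	OPAL_PARAMETER		= -1,
	OPAL_BUSY		= -2,
	OPAL_HARDWARE		= -6,
	OPAL_UNSUPPORTED	= -7,
	OPAL_PERMISSION		= -8,
	OPAL_NO_MEM		= -9,
};

/* Success has to stay zero so callers can keep testing "if (rc)". */
static_assert(OPAL_SUCCESS == 0, "OPAL_SUCCESS must map to 0");

/* Translate an OPAL return code into 0 or a negative errno. */
static int opal_error_code(int rc)
{
	switch (rc) {
	case OPAL_SUCCESS:	return 0;
	case OPAL_PARAMETER:	return -EINVAL;
	case OPAL_BUSY:		return -EBUSY;
	case OPAL_HARDWARE:	return -EIO;
	case OPAL_UNSUPPORTED:	return -EIO;
	case OPAL_PERMISSION:	return -EPERM;
	case OPAL_NO_MEM:	return -ENOMEM;
	default:		return -EIO;	/* assumed fallback for unknown codes */
	}
}
```

A caller would forward the translated value only on the error path, e.g. "return opal_error_code(rc);" after a failed firmware call.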
On Tue, 2015-13-01 at 10:22:34 UTC, Anshuman Khandual wrote:
This patch adds a test case for the system wide DSCR default
value, which when changed through its sysfs interface must
be visible to all threads reading DSCR either through the
privilege state SPR or the problem state SPR. The DSCR
Michael Ellerman m...@ellerman.id.au writes:
The powernv code has some conditional support for running on bare metal
machines that have no OPAL firmware, but provide RTAS.
No released machines ever supported that, and even in the lab it was
just a transitional hack in the days when OPAL was
I can't apply the patch. There seem to be whitespace problems. Please fix
the patch or your mail sending.
Sorry for the delayed response. It's my bad, as I didn't run it through
checkpatch.
I will send a fresh patch.
Thanks
Amit.
---
drivers/i2c/busses/i2c-mpc.c | 3 ++-
1 file
This makes use of the it_page_size from the iommu_table struct
as page size can differ.
This replaces missing IOMMU_PAGE_SHIFT macro in commented debug code
as recently introduced IOMMU_PAGE_XXX macros do not include
IOMMU_PAGE_SHIFT.
Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
This moves page pinning (get_user_pages_fast()/put_page()) code out of
the platform IOMMU code and puts it in the VFIO IOMMU driver, where it
belongs, as the platform code does not deal with page pinning.
This makes iommu_take_ownership()/iommu_release_ownership() deal with
the IOMMU table bitmap
This clears the TCE table when a container is being closed as this is
a good thing to leave the table clean before passing the ownership
back to the host kernel.
Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
drivers/vfio/vfio_iommu_spapr_tce.c | 14 +++---
1 file changed, 11
This checks that the TCE table page size is not bigger than the size of
the page we just pinned and whose physical address we are about to put into
the table.
Otherwise the hardware gets unwanted access to physical memory between
the end of the actual page and the end of the aligned up TCE page.
Since
This enables sPAPR defined feature called Dynamic DMA windows (DDW).
Each Partitionable Endpoint (IOMMU group) has an address range on a PCI bus
where devices are allowed to do DMA. These ranges are called DMA windows.
By default, there is a single DMA window, 1 or 2GB big, mapped at zero
on a
This is a part of moving DMA window programming to an iommu_ops
callback.
This is a mechanical patch.
Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
arch/powerpc/platforms/powernv/pci-ioda.c | 85 ---
1 file changed, 56 insertions(+), 29 deletions(-)
diff
This adds missing locks in iommu_take_ownership()/
iommu_release_ownership().
This marks all pages busy in iommu_table::it_map in order to catch
errors if there is an attempt to use this table while ownership over it
is taken.
This only clears TCE content if there is no page marked busy in
This extends iommu_table_group_ops by a set of callbacks to support
dynamic DMA windows management.
query() returns IOMMU capabilities such as default DMA window address and
supported number of DMA windows and TCE table levels.
create_table() creates a TCE table with specific parameters.
it
At the moment DMA map/unmap requests are handled irrespective of
the container's state. This allows the user space to pin memory which
it might not be allowed to pin.
This adds checks to MAP/UNMAP that the container is enabled, otherwise
-EPERM is returned.
Signed-off-by: Alexey Kardashevskiy
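The check described above can be sketched in a few lines. This is only an illustration of the "refuse MAP/UNMAP unless the container is enabled" rule; struct container and the helper names are hypothetical and are not the VFIO driver's actual types.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's per-container state. */
struct container {
	bool enabled;		/* set once the container has been enabled */
	unsigned long mapped;	/* pages currently mapped through it */
};

/* Map npages; refused with -EPERM while the container is disabled. */
static int container_map(struct container *c, unsigned long npages)
{
	assert(c != NULL);
	if (!c->enabled)
		return -EPERM;
	c->mapped += npages;
	return 0;
}

/* Unmap npages; same enabled check, plus a sanity check on the count. */
static int container_unmap(struct container *c, unsigned long npages)
{
	assert(c != NULL);
	if (!c->enabled)
		return -EPERM;
	if (npages > c->mapped)
		return -EINVAL;
	c->mapped -= npages;
	return 0;
}
```

Doing the check first means a disabled container never gets as far as pinning memory.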
This adds create/remove window ioctls to create and remove DMA windows.
sPAPR defines a Dynamic DMA windows capability which allows
para-virtualized guests to create additional DMA windows on a PCI bus.
The existing linux kernels use this new window to map the entire guest
memory and switch to the
This is a pretty mechanical patch to make next patches simpler.
New tce_iommu_unuse_page() helper does put_page() now but it might skip
that after the memory registering patch is applied.
As we are here, this removes unnecessary checks for a value returned
by pfn_to_page() as it cannot possibly
This moves locked pages accounting to helpers.
Later they will be reused for Dynamic DMA windows (DDW).
This reworks debug messages to show the current value and the limit.
This stores the locked pages number in the container so when unlocking
the iommu table pointer won't be needed. This does
We are adding support for DMA memory pre-registration to be used in
conjunction with VFIO. The idea is that the userspace which is going to
run a guest may want to pre-register a user space memory region so
it all gets pinned once and never goes away. Having this done,
a hypervisor will not have
This adds a way for the IOMMU user to know how much a new table will
use so it can be accounted in the locked_vm limit before allocation
happens.
This stores the allocated table size in pnv_pci_ioda2_create_table()
so the locked_vm counter can be updated correctly when a table is
being disposed.
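As a rough illustration of the accounting described above, the following sketch computes the size of a single-level TCE table and charges it against a locked-pages limit before anything is allocated. The 8-byte entry size, the 4K accounting page, and both helper names are assumptions made for this example; they are not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define TCE_ENTRY_BYTES	8ULL	/* assumption: one 64-bit entry per IOMMU page */
#define SYS_PAGE_SHIFT	12	/* assumption: 4K system pages for accounting */

/* Bytes needed for a single-level table covering window_size bytes. */
static uint64_t tce_table_bytes(uint64_t window_size, unsigned int page_shift)
{
	assert(page_shift < 64);
	return (window_size >> page_shift) * TCE_ENTRY_BYTES;
}

/*
 * Charge a table of the given size against the limit, in system pages.
 * Fails with -ENOMEM and leaves the counter untouched if it does not fit.
 */
static int account_locked_vm(uint64_t *locked_pages, uint64_t limit_pages,
			     uint64_t bytes)
{
	uint64_t pages = (bytes + (1ULL << SYS_PAGE_SHIFT) - 1) >> SYS_PAGE_SHIFT;

	if (*locked_pages + pages > limit_pages)
		return -ENOMEM;
	*locked_pages += pages;
	return 0;
}
```

Under these assumptions a 2GB window with 4K IOMMU pages needs 524288 entries, i.e. a 4MB table, which is 1024 system pages to account. Keeping the computed size around also makes it possible to subtract the same amount when the table is disposed.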
On 27.02.2015 03:05, Scott Wood wrote:
On Thu, 2015-02-26 at 14:31 +0100, Sebastian Andrzej Siewior wrote:
On 02/26/2015 02:02 PM, Paolo Bonzini wrote:
On 24/02/2015 00:27, Scott Wood wrote:
This isn't a host PIC driver. It's guest PIC emulation, some of which
is indeed not suitable for a
In order to support memory pre-registration, we need a way to track
the use of every registered memory region and only allow unregistration
if a region is not in use anymore. So we need a way to tell which
region the just-cleared TCE came from.
This adds a userspace view of the TCE table into
The iommu_free_table helper releases the memory it is using (the TCE table and
@it_map) and releases the iommu_table struct as well. We might not want
the very last step as we store iommu_table in parent structures.
Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
OPAL has its own list of return codes. The patch provides a translation
of such codes in errnos for the opal_sensor_read call, and possibly
others if needed.
Signed-off-by: Cédric Le Goater c...@fr.ibm.com
---
Changes since v2 :
- renamed and moved the routine to opal.[ch]
- changed default
This changes a few functions to receive an iommu_table_group pointer
rather than PE as they are going to be a part of upcoming
iommu_table_group_ops callback set.
Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
arch/powerpc/platforms/powernv/pci-ioda.c | 13 -
1 file changed, 8
At the moment writing a new TCE value to the IOMMU table fails with EBUSY
if there is a valid entry already. However, the PAPR specification allows
the guest to write a new TCE value without clearing it first.
Another problem this patch is addressing is the use of pool locks for
external IOMMU users such
The existing implementation accounts the whole DMA window in
the locked_vm counter. This is going to be worse with multiple
containers and huge DMA windows. Also, real-time accounting would require
additional tracking of accounted pages due to the page size difference -
IOMMU uses 4K pages and
On Fri, Mar 27, 2015 at 05:38:30PM +0800, Shengzhou Liu wrote:
Add support for EON en25s64 spi device.
Signed-off-by: Shengzhou Liu shengzhou@freescale.com
---
drivers/mtd/spi-nor/spi-nor.c | 1 +
1 file changed, 1 insertion(+)
This is a MTD driver, not a SPI driver - you need to send
Normally a bitmap from the iommu_table is used to track what TCE entry
is in use. Since we are going to use iommu_table without its locks and
do xchg() instead, it becomes essential not to put bits which are not
implied in the direction flag.
This adds iommu_direction_to_tce_perm() (its
At the moment the iommu_table struct has a set_bypass() which enables/
disables DMA bypass on IODA2 PHB. This is exposed to POWERPC IOMMU code
which calls this callback when external IOMMU users such as VFIO are
about to get over a PHB.
The set_bypass() callback is not really an iommu_table
TCE tables might get too big in case of 4K IOMMU pages and DDW enabled
on huge guests (hundreds of GB of RAM) so the kernel might be unable to
allocate a contiguous chunk of physical memory to store the TCE table.
To address this, POWER8 CPU (actually, IODA2) supports multi-level TCE tables,
up to
Currently, when a sensor value is read, the kernel calls OPAL, which in
turn builds a message for the FSP, and waits for a message back.
The new device tree for OPAL sensors [1] adds new sensors that can be
read synchronously (core temperatures for instance) and that don't need
to wait for a
The pnv_pci_ioda_tce_invalidate() helper invalidates TCE cache. It is
supposed to be called on IODA1/2 and not called on p5ioc2. It receives
start and end host addresses of TCE table. This approach makes it possible
to get pnv_pci_ioda_tce_invalidate() unintentionally called on p5ioc2.
Another
This is a part of moving TCE table allocation into an iommu_ops
callback to support multiple IOMMU groups per one VFIO container.
This enforces the window size to be a power of two.
This is a pretty mechanical patch.
Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
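The power-of-two rule mentioned above is easy to check with the usual bit trick. The sketch below is only an illustration; check_window_size() and the extra "at least one IOMMU page" condition are assumptions, not code from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* True if n has exactly one bit set. */
static bool is_pow2(uint64_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Validate a requested DMA window size. Returns 0 if acceptable,
 * -EINVAL otherwise. The minimum of one IOMMU page is an assumption
 * added for this example.
 */
static int check_window_size(uint64_t window_size, unsigned int page_shift)
{
	assert(page_shift < 64);
	if (!is_pow2(window_size))
		return -EINVAL;
	if (window_size < (1ULL << page_shift))
		return -EINVAL;
	return 0;
}
```

With this check a 2GB window (1 << 31) passes, while a 3GB request is rejected.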
At the moment only one group per container is supported.
POWER8 CPUs have a more flexible design and allow having 2 TCE tables per
IOMMU group so we can relax this limitation and support multiple groups
per container.
This adds TCE table descriptors to a container and uses iommu_table_group_ops
to
This replaces multiple calls of kzalloc_node() with a new
iommu_table_alloc() helper. Right now it calls kzalloc_node() but
later it will be modified to allocate a iommu_table_group struct with
a single iommu_table in it.
Later the helper will allocate a iommu_table_group struct which embeds
the
This moves iommu_table creation to the beginning. This is a mechanical
patch.
Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
arch/powerpc/platforms/powernv/pci-ioda.c | 34 ---
1 file changed, 18 insertions(+), 16 deletions(-)
diff --git
This adds an iommu_table_ops struct and puts a pointer to it into
the iommu_table struct. This moves tce_build/tce_free/tce_get/tce_flush
callbacks from ppc_md to the new struct where they really belong.
This adds the requirement for @it_ops to be initialized before calling
iommu_init_table() to
The opal sensor mutex protects the opal_sensor_read call which
can return an OPAL_BUSY code on IBM Power systems if a previous
request is in progress.
This can be handled at user level with a retry.
Signed-off-by: Cédric Le Goater c...@fr.ibm.com
---
Changes since v2 :
- removed a goto label
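A user-level retry of the kind mentioned above could look roughly like the sketch below. Everything here is hypothetical: the callback type, read_with_retry(), and the fake_sensor() stand-in exist only to show the loop; a real caller would pass a function that reads the sensor and reports busy as -EBUSY.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Operation to retry: returns 0 on success or a negative errno. */
typedef int (*sensor_op_t)(void *ctx, long *value);

/* Call op up to max_tries times, sleeping delay_ms between busy attempts. */
static int read_with_retry(sensor_op_t op, void *ctx, long *value,
			   int max_tries, long delay_ms)
{
	struct timespec delay = {
		.tv_sec  = delay_ms / 1000,
		.tv_nsec = (delay_ms % 1000) * 1000000L,
	};
	int rc = -EINVAL;	/* returned if max_tries <= 0 */

	assert(op != NULL);
	for (int i = 0; i < max_tries; i++) {
		rc = op(ctx, value);
		if (rc != -EBUSY)
			break;			/* success or a non-retryable error */
		if (i + 1 < max_tries)
			nanosleep(&delay, NULL);
	}
	return rc;
}

/* Stand-in for a real sensor read: busy for the first *ctx calls, then 42. */
static int fake_sensor(void *ctx, long *value)
{
	int *busy_left = ctx;

	if (*busy_left > 0) {
		(*busy_left)--;
		return -EBUSY;
	}
	*value = 42;
	return 0;
}
```

Bounding the number of attempts keeps a stuck request from blocking the caller forever; the last -EBUSY is simply passed back.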
Modern IBM POWERPC systems support multiple (currently two) TCE tables
per IOMMU group (a.k.a. PE). This adds an iommu_table_group container
for TCE tables. Right now just one table is supported.
Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
Documentation/vfio.txt |
This replaces iommu_take_ownership()/iommu_release_ownership() calls
with the callback calls and it is up to the platform code to call
iommu_take_ownership()/iommu_release_ownership() if needed.
Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
arch/powerpc/include/asm/iommu.h| 4 +--
This is to make extended ownership and multiple groups support patches
simpler for review.
This is a mechanical patch.
Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
drivers/vfio/vfio_iommu_spapr_tce.c | 38 ++---
1 file changed, 23 insertions(+), 15
Before the IOMMU user (VFIO) would take control over the IOMMU table
belonging to a specific IOMMU group. This approach did not allow sharing
tables between IOMMU groups attached to the same container.
This introduces a new IOMMU ownership flavour when the user can not
just control the existing
On 03/27/2015 12:28 PM, Nishanth Aravamudan wrote:
@@ -2585,7 +2585,7 @@ static bool pfmemalloc_watermark_ok(pg_data_t *pgdat)
for (i = 0; i <= ZONE_NORMAL; i++) {
zone = &pgdat->node_zones[i];
- if (!populated_zone(zone))
+ if
During suspend/migration operation we must wait for the VASI state reported
by the hypervisor to become Suspending prior to making the ibm,suspend-me
RTAS call. Calling routines to rtas_ibm_suspend_me() pass a vasi_state variable
that exposes the VASI state to the caller. This is unnecessary as the
Based upon 675becce15 (mm: vmscan: do not throttle based on pfmemalloc
reserves if node has no ZONE_NORMAL) from Mel.
We have a system with the following topology:
(0) root @ br30p03: /root
# numactl -H
available: 3 nodes (0,2-3)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
[ Sorry, typo'd anton's address ]
On 27.03.2015 [12:28:50 -0700], Nishanth Aravamudan wrote:
Based upon 675becce15 (mm: vmscan: do not throttle based on pfmemalloc
reserves if node has no ZONE_NORMAL) from Mel.
We have a system with the following topology:
(0) root @ br30p03: /root
#
On Fri, Mar 27, 2015 at 3:28 PM, Nishanth Aravamudan
n...@linux.vnet.ibm.com wrote:
Based upon 675becce15 (mm: vmscan: do not throttle based on pfmemalloc
reserves if node has no ZONE_NORMAL) from Mel.
We have a system with the following topology:
(0) root @ br30p03: /root
# numactl -H
On Fri, 2015-03-27 at 10:45 +1100, Michael Ellerman wrote:
On Thu, 2015-03-26 at 10:31 -0500, Emil Medve wrote:
Hello Kumar,
On 03/26/2015 10:18 AM, Kumar Gala wrote:
Why no commit message with what issue this change was trying to fix?
A while back, when I attempted to remove
crickets.
How do we make progress in this area?
(a) can we assume Andi's json format is acceptable? We would like
to know this so we don't have to reformat our data more than once.
(b) Would an acceptable interim resolution to the 'download area'
problem be to take Andi's perf: Add support for
On 27.03.2015 [13:17:59 -0700], Dave Hansen wrote:
On 03/27/2015 12:28 PM, Nishanth Aravamudan wrote:
@@ -2585,7 +2585,7 @@ static bool pfmemalloc_watermark_ok(pg_data_t *pgdat)
for (i = 0; i <= ZONE_NORMAL; i++) {
zone = &pgdat->node_zones[i];
- if
We currently have a special syscall for switching endianness. This is
syscall number 0x1ebe, which is handled explicitly in the 64-bit syscall
exception entry.
That has a few problems, firstly the syscall number is outside of the
usual range, which confuses various tools. For example strace
On Thu, 2015-03-26 at 11:54 +0530, Anshuman Khandual wrote:
On 03/26/2015 06:06 AM, Michael Ellerman wrote:
On Wed, 2015-03-25 at 17:02 +0530, Anshuman Khandual wrote:
On 03/25/2015 10:58 AM, Michael Ellerman wrote:
On Wed, 2015-03-18 at 16:04 +1100, Michael Ellerman wrote:
On Tue,
Thanks for supporting the JSON format too.
(c) If not, given we don't know how to get us out of the current
status quo, can this patchseries still be applied, given the
original complaint was the size of our events-list.h (whereas
The Intel core event lists are far larger even
(and will grow
This adds a test of the switch_endian() syscall we added in the previous
commit.
We test it by calling the endian switch syscall, and then executing some
code in the other endian to check everything went as expected. That code
checks that the registers we expect to be maintained are. If the endian switch