Re: [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions

2018-12-08 Thread Christoph Hellwig
On Sat, Dec 08, 2018 at 12:47:30PM -0500, Jerome Glisse wrote:
> Most of the users of GUP are well behaved (everything under drivers/gpu,
> the Mellanox driver and many others), ie they abide by the mmu notifier
> invalidation callbacks. There are a handful of device drivers that thought
> they could just do GUP and ignore the mmu notifier part, and those are the
> ones being problematic. So to me it feels like bystanders are being shot
> for no good reason.

get_user_pages is used by every single direct I/O, and while the race
windows in that case are small they very much exist.
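
For reference, the placeholder being discussed amounts to a trivial
wrapper for now (a sketch based on the patch under review, not the
final API):

/*
 * Sketch of the proposed placeholder: GUP callers release pages via
 * put_user_page() instead of put_page(), so the tracking can later
 * be changed in a single place.
 */
static inline void put_user_page(struct page *page)
{
	put_page(page);
}

static inline void put_user_pages(struct page **pages, unsigned long npages)
{
	unsigned long i;

	for (i = 0; i < npages; i++)
		put_user_page(pages[i]);
}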


Re: [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions

2018-12-08 Thread Christoph Hellwig
On Sat, Dec 08, 2018 at 10:09:26AM -0800, Dan Williams wrote:
> Another fix that may put pressure 'struct page' is resolving the
> untenable situation of dax being incompatible with reflink, i.e.
> reflink currently requires page-cache pages. Dave has talked about
> silently establishing page-cache entries when a dax-page is cow'd for
> reflink, but I wonder if we could go the other way and introduce the
> mechanism of a page belonging to multiple mappings simultaneously and
> managed by the filesystem.

FYI, I had a prototype for DAX + reflink that didn't require
the page cache, although it badly reimplemented parts of it.  But
that was a long time ago, before we started requiring struct page
for the DAX memory.


Re: use generic DMA mapping code in powerpc V4

2018-12-08 Thread Christoph Hellwig
On Sun, Dec 02, 2018 at 05:11:02PM +1100, Benjamin Herrenschmidt wrote:
> Talking of which ... Christoph, not sure if we can do something about
> this at the DMA API level or keep hacks but some adapters such as the
> nVidia GPUs have a HW hack we can use to work around their limitations
> in that case.
> 
> They have a register that can program a fixed value for the top bits
> that they don't support.
> 
> This works fine for any linear mapping with an offset, provided they
> can program the offset in that register and they have enough DMA range
> to cover all memory from that offset.
> 
> I can probably get the info about this from them so we can exploit it
> in nouveau.

I think we can expose the direct mapping offset if people care enough,
we just have to be very careful designing the API.  I'll happily leave
that to those that actually want to use it, but I'll gladly help
reviewing it.
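
As a rough illustration (hypothetical names and offset, not a proposed
API), a device-side register that fixes the unsupported top address
bits turns the direct mapping into a simple linear offset:

/*
 * Hypothetical sketch: EXAMPLE_DMA_OFFSET stands in for whatever value
 * the device programs into its top-bits register.  Every bus address
 * is then just the CPU physical address plus that fixed offset.
 */
#define EXAMPLE_DMA_OFFSET	0x8000000000ULL

static dma_addr_t example_phys_to_dma(phys_addr_t paddr)
{
	return (dma_addr_t)paddr + EXAMPLE_DMA_OFFSET;
}

The API design question is then how a driver discovers that offset and
verifies that the device's DMA range covers all memory above it.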


Re: [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions

2018-12-08 Thread Christoph Hellwig
On Sat, Dec 08, 2018 at 11:33:53AM -0500, Jerome Glisse wrote:
> Patchset to use HMM inside nouveau have already been posted, some
> of the bits have already made upstream and more are line up for
> next merge window.

Even with that it is a relatively fringe feature compared to making
something like get_user_pages(), which is literally used everywhere,
actually work properly.

So I think we need to kick out HMM here and just find another place for
it to store data.

And just to make clear that I'm not picking just on this - the same is
true, to a slightly smaller extent, for the pgmap..


Re: [PATCH v2 3/8] dma-debug: Refactor dma_debug_entry allocation

2018-12-06 Thread Christoph Hellwig
On Thu, Dec 06, 2018 at 06:10:47PM +, Robin Murphy wrote:
> AFAICS the tmp list wasn't about locking as much as meaning that if 
> kzalloc() failed at any point, we can free the partial allocation and back 
> out without disturbing free_entries at all - that still makes sense to me 
> up until patch #8 where we embrace the "never free anything" paradigm and 
> rip out the final traces.
>
> That said, maybe I should just drop the refactoring of 
> dma_debug_resize_entries() now that I'm deleting it as part of the same 
> series anyway - then I guess I squash what's left of this patch into #4 and 
> bring forward some of the simplification from #8 to start with. Would that 
> be more agreeable?

Yes, I just noticed all this goes away toward the end anyway.  We can
either keep it as is, or just drop the intermediate step if that is
easy enough for you.
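
The pattern being discussed, for reference (a sketch of the idiom, not
the dma-debug code itself): building on a local list means a mid-loop
kzalloc() failure can be unwound without touching free_entries.

static int alloc_entries(struct list_head *free_entries, int num)
{
	LIST_HEAD(tmp);
	struct dma_debug_entry *entry, *next;
	int i;

	for (i = 0; i < num; i++) {
		entry = kzalloc(sizeof(*entry), GFP_KERNEL);
		if (!entry)
			goto out_err;
		list_add_tail(&entry->list, &tmp);
	}
	/* all allocations succeeded, publish them in one go */
	list_splice(&tmp, free_entries);
	return 0;

out_err:
	/* back out the partial allocation */
	list_for_each_entry_safe(entry, next, &tmp, list) {
		list_del(&entry->list);
		kfree(entry);
	}
	return -ENOMEM;
}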


Re: [PATCH 2/3] kbuild: generate asm-generic wrappers if mandatory headers are missing

2018-12-06 Thread Christoph Hellwig
On Wed, Dec 05, 2018 at 08:28:05PM +0900, Masahiro Yamada wrote:
> Some time ago, Sam pointed out a certain degree of overlap between
> generic-y and mandatory-y. (https://lkml.org/lkml/2017/7/10/121)
> 
> I tweaked the meaning of mandatory-y a bit; now it defines the minimum
> set of ASM headers that all architectures must have.
> 
> If an arch does not have a specific implementation of a mandatory
> header, Kbuild will let it fall back to the asm-generic one by
> automatically generating a wrapper. This will allow us to drop lots
> of redundant generic-y defines.
> 
> Previously, "mandatory" was used in the context of UAPI, but I guess
> this can be extended to kernel space ASM headers.

How useful is it to keep the generic-y behavior around at all vs making
everything useful mandatory?


Re: [PATCH 2/2] arch: switch the default on ARCH_HAS_SG_CHAIN

2018-12-06 Thread Christoph Hellwig
I've picked this up for the dma-mapping for-next tree.


[PATCH 1/5] swiotlb: remove SWIOTLB_MAP_ERROR

2018-12-04 Thread Christoph Hellwig
We can use DMA_MAPPING_ERROR instead, which already maps to the same
value.

Signed-off-by: Christoph Hellwig 
---
 drivers/xen/swiotlb-xen.c | 4 ++--
 include/linux/swiotlb.h   | 3 ---
 kernel/dma/swiotlb.c  | 4 ++--
 3 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 6dc969d5ea2f..833e80b46eb2 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -403,7 +403,7 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
 
map = swiotlb_tbl_map_single(dev, start_dma_addr, phys, size, dir,
 attrs);
-   if (map == SWIOTLB_MAP_ERROR)
+   if (map == DMA_MAPPING_ERROR)
return DMA_MAPPING_ERROR;
 
dev_addr = xen_phys_to_bus(map);
@@ -572,7 +572,7 @@ xen_swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl,
 sg_phys(sg),
 sg->length,
 dir, attrs);
-   if (map == SWIOTLB_MAP_ERROR) {
+   if (map == DMA_MAPPING_ERROR) {
dev_warn(hwdev, "swiotlb buffer is full\n");
/* Don't panic here, we expect map_sg users
   to do proper error handling. */
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index a387b59640a4..14aec0b70dd9 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -46,9 +46,6 @@ enum dma_sync_target {
SYNC_FOR_DEVICE = 1,
 };
 
-/* define the last possible byte of physical address space as a mapping error */
-#define SWIOTLB_MAP_ERROR (~(phys_addr_t)0x0)
-
 extern phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
  dma_addr_t tbl_dma_addr,
  phys_addr_t phys, size_t size,
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index ff1ce81bb623..19ba8e473d71 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -526,7 +526,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
spin_unlock_irqrestore(&io_tlb_lock, flags);
if (!(attrs & DMA_ATTR_NO_WARN) && printk_ratelimit())
dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes)\n", size);
-   return SWIOTLB_MAP_ERROR;
+   return DMA_MAPPING_ERROR;
 found:
spin_unlock_irqrestore(&io_tlb_lock, flags);
 
@@ -637,7 +637,7 @@ static dma_addr_t swiotlb_bounce_page(struct device *dev, phys_addr_t *phys,
/* Oh well, have to allocate and map a bounce buffer. */
*phys = swiotlb_tbl_map_single(dev, __phys_to_dma(dev, io_tlb_start),
*phys, size, dir, attrs);
-   if (*phys == SWIOTLB_MAP_ERROR)
+   if (*phys == DMA_MAPPING_ERROR)
return DMA_MAPPING_ERROR;
 
/* Ensure that the address returned is DMA'ble */
-- 
2.19.1
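
The substitution is value-preserving: both sentinels are the all-ones
pattern. A quick userspace check, assuming 64-bit phys_addr_t and
dma_addr_t as on the configs this touches:

#include <assert.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;	/* assumption: 64-bit on these configs */
typedef uint64_t dma_addr_t;

#define SWIOTLB_MAP_ERROR	(~(phys_addr_t)0x0)
#define DMA_MAPPING_ERROR	(~(dma_addr_t)0)

int main(void)
{
	assert((dma_addr_t)SWIOTLB_MAP_ERROR == DMA_MAPPING_ERROR);
	return 0;
}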



[PATCH 3/5] dma-direct: improve addressability error reporting

2018-12-04 Thread Christoph Hellwig
Only report a DMA addressability problem once to avoid spewing the
kernel log with repeated messages.  Also provide a stack trace to make
it easy to find the actual caller that caused the problem.

Last but not least move the actual check into the fast path and only
leave the error reporting in a helper.

Signed-off-by: Christoph Hellwig 
---
 kernel/dma/direct.c | 36 +++-
 1 file changed, 15 insertions(+), 21 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 308f88a750c8..edb24f94ea1e 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -30,27 +30,16 @@ static inline bool force_dma_unencrypted(void)
return sev_active();
 }
 
-static bool
-check_addr(struct device *dev, dma_addr_t dma_addr, size_t size,
-   const char *caller)
+static void report_addr(struct device *dev, dma_addr_t dma_addr, size_t size)
 {
-   if (unlikely(dev && !dma_capable(dev, dma_addr, size))) {
-   if (!dev->dma_mask) {
-   dev_err(dev,
-   "%s: call on device without dma_mask\n",
-   caller);
-   return false;
-   }
-
-   if (*dev->dma_mask >= DMA_BIT_MASK(32) || dev->bus_dma_mask) {
-   dev_err(dev,
-   "%s: overflow %pad+%zu of device mask %llx bus mask %llx\n",
-   caller, &dma_addr, size,
-   *dev->dma_mask, dev->bus_dma_mask);
-   }
-   return false;
+   if (!dev->dma_mask) {
+   dev_err_once(dev, "DMA map on device without dma_mask\n");
+   } else if (*dev->dma_mask >= DMA_BIT_MASK(32) || dev->bus_dma_mask) {
+   dev_err_once(dev,
+   "overflow %pad+%zu of DMA mask %llx bus mask %llx\n",
+   &dma_addr, size, *dev->dma_mask, dev->bus_dma_mask);
}
-   return true;
+   WARN_ON_ONCE(1);
 }
 
 static inline dma_addr_t phys_to_dma_direct(struct device *dev,
@@ -288,8 +277,10 @@ dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
phys_addr_t phys = page_to_phys(page) + offset;
dma_addr_t dma_addr = phys_to_dma(dev, phys);
 
-   if (!check_addr(dev, dma_addr, size, __func__))
+   if (unlikely(dev && !dma_capable(dev, dma_addr, size))) {
+   report_addr(dev, dma_addr, size);
return DMA_MAPPING_ERROR;
+   }
 
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
dma_direct_sync_single_for_device(dev, dma_addr, size, dir);
@@ -306,8 +297,11 @@ int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
BUG_ON(!sg_page(sg));
 
sg_dma_address(sg) = phys_to_dma(dev, sg_phys(sg));
-   if (!check_addr(dev, sg_dma_address(sg), sg->length, __func__))
+   if (unlikely(dev && !dma_capable(dev, sg_dma_address(sg),
+   sg->length))) {
+   report_addr(dev, sg_dma_address(sg), sg->length);
return 0;
+   }
sg_dma_len(sg) = sg->length;
}
 
-- 
2.19.1
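
On the caller side nothing changes: the mapping routines still return
the DMA_MAPPING_ERROR sentinel, checked as usual (error path sketch
only):

	dma_addr_t addr;

	addr = dma_map_page(dev, page, 0, size, DMA_TO_DEVICE);
	if (dma_mapping_error(dev, addr))
		return -ENOMEM;	/* report_addr() has already warned once */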



[PATCH 5/5] dma-direct: merge swiotlb_dma_ops into the dma_direct code

2018-12-04 Thread Christoph Hellwig
While the dma-direct code is (relatively) clean and simple we actually
have to use the swiotlb ops for the mapping on many architectures due
to devices with addressing limits.  Instead of keeping two
implementations around this commit allows the dma-direct
implementation to call the swiotlb bounce buffering functions and
thus share the guts of the mapping implementation.  This also
simplifies the dma-mapping setup on a few architectures where we
don't have to differentiate which implementation to use.

Signed-off-by: Christoph Hellwig 
---
 arch/arm64/mm/dma-mapping.c  |   2 +-
 arch/ia64/hp/common/hwsw_iommu.c |   2 +-
 arch/ia64/hp/common/sba_iommu.c  |   6 +-
 arch/ia64/kernel/dma-mapping.c   |   2 +-
 arch/mips/include/asm/dma-mapping.h  |   2 -
 arch/powerpc/kernel/dma-swiotlb.c|  16 +-
 arch/riscv/include/asm/dma-mapping.h |  15 --
 arch/x86/kernel/pci-swiotlb.c|   4 +-
 arch/x86/mm/mem_encrypt.c|   7 -
 arch/x86/pci/sta2x11-fixup.c |   1 -
 include/linux/dma-direct.h   |  12 ++
 include/linux/swiotlb.h  |  68 +++-
 kernel/dma/direct.c  | 113 +
 kernel/dma/swiotlb.c | 227 ++-
 14 files changed, 144 insertions(+), 333 deletions(-)
 delete mode 100644 arch/riscv/include/asm/dma-mapping.h

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 4c0f498069e8..e4effbb243b1 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -549,7 +549,7 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
const struct iommu_ops *iommu, bool coherent)
 {
if (!dev->dma_ops)
-   dev->dma_ops = &swiotlb_dma_ops;
+   dev->dma_ops = &dma_direct_ops;
 
dev->dma_coherent = coherent;
__iommu_setup_dma_ops(dev, dma_base, size, iommu);
diff --git a/arch/ia64/hp/common/hwsw_iommu.c b/arch/ia64/hp/common/hwsw_iommu.c
index 58969039bed2..f40ca499b246 100644
--- a/arch/ia64/hp/common/hwsw_iommu.c
+++ b/arch/ia64/hp/common/hwsw_iommu.c
@@ -38,7 +38,7 @@ static inline int use_swiotlb(struct device *dev)
 const struct dma_map_ops *hwsw_dma_get_ops(struct device *dev)
 {
if (use_swiotlb(dev))
-   return &swiotlb_dma_ops;
+   return &dma_direct_ops;
	return &sba_dma_ops;
 }
 EXPORT_SYMBOL(hwsw_dma_get_ops);
diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c
index 0d21c0b5b23d..5ee74820a0f6 100644
--- a/arch/ia64/hp/common/sba_iommu.c
+++ b/arch/ia64/hp/common/sba_iommu.c
@@ -2065,8 +2065,6 @@ static int __init acpi_sba_ioc_init_acpi(void)
 /* This has to run before acpi_scan_init(). */
 arch_initcall(acpi_sba_ioc_init_acpi);
 
-extern const struct dma_map_ops swiotlb_dma_ops;
-
 static int __init
 sba_init(void)
 {
@@ -2080,7 +2078,7 @@ sba_init(void)
 * a successful kdump kernel boot is to use the swiotlb.
 */
if (is_kdump_kernel()) {
-   dma_ops = &swiotlb_dma_ops;
+   dma_ops = &dma_direct_ops;
if (swiotlb_late_init_with_default_size(64 * (1<<20)) != 0)
panic("Unable to initialize software I/O TLB:"
  " Try machvec=dig boot option");
@@ -2102,7 +2100,7 @@ sba_init(void)
 * If we didn't find something sba_iommu can claim, we
 * need to setup the swiotlb and switch to the dig machvec.
 */
-   dma_ops = &swiotlb_dma_ops;
+   dma_ops = &dma_direct_ops;
if (swiotlb_late_init_with_default_size(64 * (1<<20)) != 0)
panic("Unable to find SBA IOMMU or initialize "
  "software I/O TLB: Try machvec=dig boot option");
diff --git a/arch/ia64/kernel/dma-mapping.c b/arch/ia64/kernel/dma-mapping.c
index 36dd6aa6d759..80cd3e1ea95a 100644
--- a/arch/ia64/kernel/dma-mapping.c
+++ b/arch/ia64/kernel/dma-mapping.c
@@ -36,7 +36,7 @@ long arch_dma_coherent_to_pfn(struct device *dev, void *cpu_addr,
 
 void __init swiotlb_dma_init(void)
 {
-   dma_ops = &swiotlb_dma_ops;
+   dma_ops = &dma_direct_ops;
swiotlb_init(1);
 }
 #endif
diff --git a/arch/mips/include/asm/dma-mapping.h b/arch/mips/include/asm/dma-mapping.h
index b4c477eb46ce..69f914667f3e 100644
--- a/arch/mips/include/asm/dma-mapping.h
+++ b/arch/mips/include/asm/dma-mapping.h
@@ -10,8 +10,6 @@ static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
 {
 #if defined(CONFIG_MACH_JAZZ)
return &jazz_dma_ops;
-#elif defined(CONFIG_SWIOTLB)
-   return &swiotlb_dma_ops;
 #else
return &dma_direct_ops;
 #endif
diff --git a/arch/powerpc/kernel/dma-swiotlb.c b/arch/powerpc/kernel/dma-swiotlb.c
index 3d8df2cf8be9..21a869fb9734 100644
--- a/arch/powerpc/kernel/dma-swiotlb.c
+++ b/arch/powerpc/kernel/dma-swiotlb.c
@@ -50,15 +50,15 @@ const struct dma_map_ops powerpc

merge swiotlb support into dma_direct_ops

2018-12-04 Thread Christoph Hellwig
Hi Konrad and others,

can you review this series?  It merges the swiotlb support into the
DMA direct ops so that we don't have to duplicate the dma mapping logic
in multiple places.

Note that this is based on the dma_mapping_error series for which I'd
still like to collect a few more reviews, so pulling the git tree might
be easiest for testing.

The git tree is available here:

git://git.infradead.org/users/hch/misc.git swiotlb-merge

Gitweb:


http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/swiotlb-merge


Re: [PATCH v3, RFC] iommu/io-pgtable-arm-v7s: Use page_frag to request DMA32 memory

2018-12-04 Thread Christoph Hellwig
On Tue, Dec 04, 2018 at 04:23:00PM +0800, Nicolas Boichat wrote:
> IOMMUs using ARMv7 short-descriptor format require page tables
> (level 1 and 2) to be allocated within the first 4GB of RAM, even
> on 64-bit systems.
> 
> For level 1/2 tables, ensure GFP_DMA32 is used if CONFIG_ZONE_DMA32
> is defined (e.g. on arm64 platforms).
> 
> For level 2 tables (1 KB), we use page_frag to allocate these pages,
> as we cannot directly use kmalloc (no slab cache for GFP_DMA32) or
> kmem_cache (mm/ code treats GFP_DMA32 as an invalid flag).
> 
> One downside is that we only free the allocated page if all the
> 4 fragments (4 IOMMU L2 tables) are freed, but given that we
> usually only allocate a limited number of IOMMU L2 tables, this
> should not have too much impact on memory usage: In the absolute
> worst case (4096 L2 page tables, each on their own 4K page),
> we would use 16 MB of memory for 4 MB of L2 tables.

I think this needs to be documented in the code.  That is, move
the explanation above into a comment in the code.
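
For reference, the allocation pattern in question, roughly (a sketch of
the approach, not the patch itself; the cache name is illustrative):

/*
 * 1 KB L2 tables carved out of GFP_DMA32 pages via page_frag; the
 * cache keeps the backing page alive until all four fragments on it
 * have been freed again.
 */
static struct page_frag_cache l2_frag_cache;

static void *alloc_l2_table(gfp_t gfp)
{
	return page_frag_alloc(&l2_frag_cache, SZ_1K, gfp | GFP_DMA32);
}

static void free_l2_table(void *table)
{
	page_frag_free(table);
}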


Re: [PATCH] dma-debug: Kconfig for PREALLOC_DMA_DEBUG_ENTRIES

2018-12-03 Thread Christoph Hellwig
On Mon, Dec 03, 2018 at 11:56:11AM +, John Garry wrote:
> On 01/12/2018 16:36, Christoph Hellwig wrote:
>> On Fri, Nov 30, 2018 at 07:39:50PM +, Robin Murphy wrote:
>>> I was assuming the point was to also add something like
>>>
>>> default 131072 if HNS_ENET
>>>
>>> so that DMA debug doesn't require too much thought from the user. If they
>>> still have to notice the overflow message and empirically figure out a
>>> value that does work, rebuilding the kernel each time is far less
>>> convenient than simply adding "dma_debug_entries=..." to their kernel
>>> command line and rebooting, which they can do today. If they do already
>>> know up-front that the default will need overriding and what the
>>> appropriate value is, then the command line still seems seems just as
>>> convenient.
>>
>> I'm not so fond of random drivers changing the defaults.  My idea
>> was rather to have the config option so that the defconfig files for
>> the Hisilicon SOCs with this hardware could select a larger number
>> without making a total mess of the kernel configuration.
>>
>> If we really have to we could do different defaults, but I'd still
>> much rather do this on a arch/platform basis than specific drivers.
>
> As I understand, some drivers could even use much more than this (131072), 
> to such a point where I can't imagine that we would want to set an arch 
> default to support them. For this HNS_ENET case, it is arm64 specific so it 
> would be an arch defconfig.

But I'm not sure we could always do the right thing for everyone.

I think we might be better off trying to just dynamically allocate
entries when we run out of them instead of coming up with a perfect
number.
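
I.e. something along these lines (sketch only; alloc_entry_batch() is
an illustrative name, not an existing dma-debug helper):

/* When the free list runs dry, top it up with another batch instead
 * of disabling dma-debug.  Locking and statistics omitted. */
#define DMA_DEBUG_DYNAMIC_ENTRIES 256	/* illustrative batch size */

static struct dma_debug_entry *dma_entry_alloc_dynamic(void)
{
	if (list_empty(&free_entries) &&
	    alloc_entry_batch(DMA_DEBUG_DYNAMIC_ENTRIES) != 0)
		return NULL;
	return __dma_entry_alloc();
}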


Re: XFS patches for stable

2018-12-02 Thread Christoph Hellwig
As someone who has done xfs stable backports for a while I really don't
think the autoselection is helpful at all.  Someone who is vaguely
familiar with the code needs to manually select the commits and QA them,
which takes a fair amount of time, but that manual help is what it
takes for the backports to work ok.

Fortunately, I think we are about ready to have a new xfs stable
maintainer lined up, if everything works out.


Re: [PATCH v2] fs: fix lost error code in dio_complete

2018-11-30 Thread Christoph Hellwig
Al, Jens,

can someone pick this up, please?

On Fri, Nov 30, 2018 at 10:02:22AM +0100, Maximilian Heyne wrote:
> On 11/8/18 7:58 PM, Maximilian Heyne wrote:
>> commit e259221763a40403d5bb232209998e8c45804ab8 ("fs: simplify the
>> generic_write_sync prototype") reworked callers of generic_write_sync(),
>> and ended up dropping the error return for the directio path. Prior to
>> that commit, in dio_complete(), an error would be bubbled up the stack,
>> but after that commit, errors passed on to dio_complete were eaten up.
>>
>> This was reported on the list earlier, and a fix was proposed in
>> https://lore.kernel.org/lkml/20160921141539.ga17...@infradead.org/, but
>> never followed up with.  We recently hit this bug in our testing where
>> fencing io errors, which were previously erroring out with EIO, were
>> being returned as success operations after this commit.
> I just wanted to follow up on this. Has anyone picked up this patch?
---end quoted text---


Re: [PATCH 04/13] blkcg: introduce common blkg association logic

2018-11-30 Thread Christoph Hellwig
>  EXPORT_SYMBOL_GPL(bio_associate_blkcg);
>  
>  /**
> - * bio_associate_blkg - associate a bio with the a blkg
> + * bio_has_queue - required check for blkg association
> + * @bio: target bio
> + *
> + * A blkg represents the relationship between a blkcg and a request_queue.
> + * If there is no request_queue, there is no blkg and therefore nothing to
> + * associate with.
> + */
> +static inline bool bio_has_queue(struct bio *bio)
> +{
> + return bio->bi_disk && bio->bi_disk->queue;
> +}

How do you ever see a bio without a queue?  We can't even do I/O in
that case.


Re: [PATCH v3 1/1] drm: msm: Replace dma_map_sg with dma_sync_sg*

2018-11-29 Thread Christoph Hellwig
On Thu, Nov 29, 2018 at 09:24:17AM -0800, Tomasz Figa wrote:
> Whether the cache maintenance operation needs to actually do anything
> or not is a function of `dev`. We can have some devices that are
> coherent with CPU caches, and some that are not, on the same system.

Yes, but that part is not decided by these low-level helpers but their
callers in the DMA code (or maybe IOMMU code as well in the future).

> There is also the use case of using CMA with device-specific pools of
> memory reusable by the system when not used by the device and those
> would have to somehow get the pool to allocate from, but I wonder if
> struct device is the right way to pass such information. I'd see the
> pool given explicitly like cma_alloc(struct cma_pool *, size, flags)
> and perhaps a wrapper cma_alloc_default(size, flags) that is just a
> simple macro calling cma_alloc(_pool_default, size, flags).

Yes, the cma APIs have quite a few warts that need addressing.
I have a few of those things on my todo list, but help is always
welcome.


Re: [LKP] [block] 9d037ad707: BUG:unable_to_handle_kernel

2018-11-29 Thread Christoph Hellwig
On Thu, Nov 29, 2018 at 05:20:31PM +0800, kernel test robot wrote:
> FYI, we noticed the following commit (built with gcc-7):
> 
> commit: 9d037ad707ed6069fbea4e38e6ee37e027b13f1d ("block: remove 
> req->timeout_list")
> https://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git mq-perf

This looks very odd.  How could we introduce a new BUG with the removal
of an unused structure member?


Re: use generic DMA mapping code in powerpc V4

2018-11-29 Thread Christoph Hellwig
On Wed, Nov 28, 2018 at 10:05:19PM +1100, Michael Ellerman wrote:
> Is the plan that you take these via the dma-mapping tree or that they go
> via powerpc?

In principle either way is fine with me.  If it goes through the powerpc
tree we might run into a few minor conflicts with the dma-mapping tree
depending on how some of the current discussions go.


Re: [PATCH v6 3/6] m68k: coldfire: Add clk_get_optional() function

2018-11-29 Thread Christoph Hellwig
On Thu, Nov 29, 2018 at 09:54:37PM +1000, Greg Ungerer wrote:
> Hi Phil,
> 
> On 17/11/18 12:59 am, Phil Edworthy wrote:
> > clk_get_optional() returns NULL if not found instead of -ENOENT,
> > otherwise the behaviour is the same as clk_get().
> > 
> > Signed-off-by: Phil Edworthy 
> 
> Acked-by: Greg Ungerer 
> 
> Looks good. Do you want me to take this in the m68knommu git tree?
> Or is the whole series going through some other tree?

Any chance we could just get coldfire moved over to the common clock
framework?


Re: [PATCH v3 1/1] drm: msm: Replace dma_map_sg with dma_sync_sg*

2018-11-29 Thread Christoph Hellwig
On Thu, Nov 29, 2018 at 07:33:15PM +0530, Vivek Gautam wrote:
> dma_map_sg() expects a DMA domain. However, the drm devices
> have been traditionally using unmanaged iommu domain which
> is non-dma type. Using dma mapping APIs with that domain is bad.
> 
> Replace dma_map_sg() calls with dma_sync_sg_for_device{|cpu}()
> to do the cache maintenance.

As I told you before:  hell no.  If you spent the slightest amount of
time actually trying to understand what you are doing here you'd know
this can't work.  Just turn on dma debugging and this will blow up in
your face.

Either you use the DMA API properly, that is you use it to map and
to sync, or you don't use it at all.  Mix and match between iommu
APIs and DMA APIs is simply not possible.
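
Concretely, consistent DMA API usage pairs the map with the sync and
the unmap (sketch):

	/* Map once through the DMA API ... */
	int nents = dma_map_sg(dev, sgt->sgl, sgt->orig_nents,
			       DMA_BIDIRECTIONAL);
	if (!nents)
		return -ENOMEM;

	/* ... sync around CPU access through the same API ... */
	dma_sync_sg_for_cpu(dev, sgt->sgl, sgt->orig_nents,
			    DMA_BIDIRECTIONAL);
	/* CPU touches the buffer here */
	dma_sync_sg_for_device(dev, sgt->sgl, sgt->orig_nents,
			       DMA_BIDIRECTIONAL);

	/* ... and unmap through it as well, never via raw IOMMU calls. */
	dma_unmap_sg(dev, sgt->sgl, sgt->orig_nents, DMA_BIDIRECTIONAL);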


Re: [PATCH] sis5513: fix potential use after free

2018-11-27 Thread Christoph Hellwig
On Wed, Nov 28, 2018 at 08:55:26AM +0800, Pan Bian wrote:
> The function sis_find_family drops the lpc_bridge reference via
> pci_dev_put; however, after that, the field lpc_bridge->revision is
> read. This may result
> in a use after free bug. The patch moves the put operation after the
> condition check.
> 
> Signed-off-by: Pan Bian 

Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH] ata: read ->revision before dropping pci_device reference

2018-11-27 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH] nios2: remove unneeded HAS_DMA define

2018-11-26 Thread Christoph Hellwig
On Mon, Nov 26, 2018 at 04:42:10PM +0900, Masahiro Yamada wrote:
> kernel/dma/Kconfig globally defines HAS_DMA as follows:
> 
>   config HAS_DMA
>   bool
>   depends on !NO_DMA
>   default y
> 
> Signed-off-by: Masahiro Yamada 

Looks good, thanks!

Reviewed-by: Christoph Hellwig 


Re: [PATCH] pcmcia: remove per-arch PCMCIA config entry

2018-11-26 Thread Christoph Hellwig
On Mon, Nov 26, 2018 at 05:15:41PM +0900, Masahiro Yamada wrote:
> Now that all architectures include drivers/pcmcia/Kconfig where
> the PCMCIA config is defined, the PCMCIA config entries in per-arch
> Kconfig files are redundant.
> 
> Signed-off-by: Masahiro Yamada 
> ---
> 
> I will queue this up to my kbuild tree
> along with Christoph's clean-up patch set.

Thanks a lot, not sure how I missed those.

Reviewed-by: Christoph Hellwig 


Re: linux-next: build warning after merge of the btrfs-kdave tree

2018-11-25 Thread Christoph Hellwig
On Mon, Nov 26, 2018 at 11:06:29AM +1100, Stephen Rothwell wrote:
> Introduced by commit
> 
>   cf8cddd38bab3 ("btrfs: don't abuse REQ_OP_* flags for btrfs_map_block")
> 
> exposed by my new use of -Wimplicit-fallthrough
> 
> I am not sure why this has only turned up now (as opposed to earlier
> today).

It looks like something only in linux-next moved the code around a bit.

Either way the fallthrough looks fine, it will just need a little
/*FALLTHRU*/ annotation.
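
For reference, the kind of annotation that satisfies
-Wimplicit-fallthrough (generic example, not the btrfs hunk; the called
functions are illustrative):

	switch (op) {
	case REQ_OP_WRITE:
		prepare_write(bio);
		/*FALLTHRU*/
	case REQ_OP_READ:
		submit(bio);
		break;
	default:
		BUG();
	}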


Re: [PATCH] s390: Remove obsolete bust_spinlock() implementation

2018-11-22 Thread Christoph Hellwig
Please remove the weak attribute from the generic implementation as
well now that the last override is gone.



[PATCH 13/24] alpha: remove the mapping_error dma_map_ops method

2018-11-22 Thread Christoph Hellwig
Return DMA_MAPPING_ERROR instead of 0 on a dma mapping failure and let
the core dma-mapping code handle the rest.

Signed-off-by: Christoph Hellwig 
---
 arch/alpha/kernel/pci_iommu.c | 14 --
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/arch/alpha/kernel/pci_iommu.c b/arch/alpha/kernel/pci_iommu.c
index 46e08e0d9181..e1716e0d92fd 100644
--- a/arch/alpha/kernel/pci_iommu.c
+++ b/arch/alpha/kernel/pci_iommu.c
@@ -291,7 +291,7 @@ pci_map_single_1(struct pci_dev *pdev, void *cpu_addr, size_t size,
   use direct_map above, it now must be considered an error. */
if (! alpha_mv.mv_pci_tbi) {
printk_once(KERN_WARNING "pci_map_single: no HW sg\n");
-   return 0;
+   return DMA_MAPPING_ERROR;
}
 
arena = hose->sg_pci;
@@ -307,7 +307,7 @@ pci_map_single_1(struct pci_dev *pdev, void *cpu_addr, size_t size,
if (dma_ofs < 0) {
printk(KERN_WARNING "pci_map_single failed: "
   "could not allocate dma page tables\n");
-   return 0;
+   return DMA_MAPPING_ERROR;
}
 
paddr &= PAGE_MASK;
@@ -455,7 +455,7 @@ static void *alpha_pci_alloc_coherent(struct device *dev, size_t size,
memset(cpu_addr, 0, size);
 
*dma_addrp = pci_map_single_1(pdev, cpu_addr, size, 0);
-   if (*dma_addrp == 0) {
+   if (*dma_addrp == DMA_MAPPING_ERROR) {
free_pages((unsigned long)cpu_addr, order);
if (alpha_mv.mv_pci_tbi || (gfp & GFP_DMA))
return NULL;
@@ -671,7 +671,7 @@ static int alpha_pci_map_sg(struct device *dev, struct scatterlist *sg,
sg->dma_address
  = pci_map_single_1(pdev, SG_ENT_VIRT_ADDRESS(sg),
 sg->length, dac_allowed);
-   return sg->dma_address != 0;
+   return sg->dma_address != DMA_MAPPING_ERROR;
}
 
start = sg;
@@ -935,11 +935,6 @@ iommu_unbind(struct pci_iommu_arena *arena, long pg_start, long pg_count)
return 0;
 }
 
-static int alpha_pci_mapping_error(struct device *dev, dma_addr_t dma_addr)
-{
-   return dma_addr == 0;
-}
-
 const struct dma_map_ops alpha_pci_ops = {
.alloc  = alpha_pci_alloc_coherent,
.free   = alpha_pci_free_coherent,
@@ -947,7 +942,6 @@ const struct dma_map_ops alpha_pci_ops = {
.unmap_page = alpha_pci_unmap_page,
.map_sg = alpha_pci_map_sg,
.unmap_sg   = alpha_pci_unmap_sg,
-   .mapping_error  = alpha_pci_mapping_error,
.dma_supported  = alpha_pci_supported,
 };
 EXPORT_SYMBOL(alpha_pci_ops);
-- 
2.19.1



[PATCH 23/24] xen-swiotlb: remove the mapping_error dma_map_ops method

2018-11-22 Thread Christoph Hellwig
Return DMA_MAPPING_ERROR instead of 0 on a dma mapping failure and let
the core dma-mapping code handle the rest.

Signed-off-by: Christoph Hellwig 
---
 drivers/xen/swiotlb-xen.c | 12 ++--
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 2a7f545bd0b5..6dc969d5ea2f 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -53,8 +53,6 @@
  * API.
  */
 
-#define XEN_SWIOTLB_ERROR_CODE (~(dma_addr_t)0x0)
-
 static char *xen_io_tlb_start, *xen_io_tlb_end;
 static unsigned long xen_io_tlb_nslabs;
 /*
@@ -406,7 +404,7 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
map = swiotlb_tbl_map_single(dev, start_dma_addr, phys, size, dir,
 attrs);
if (map == SWIOTLB_MAP_ERROR)
-   return XEN_SWIOTLB_ERROR_CODE;
+   return DMA_MAPPING_ERROR;
 
dev_addr = xen_phys_to_bus(map);
xen_dma_map_page(dev, pfn_to_page(map >> PAGE_SHIFT),
@@ -421,7 +419,7 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
attrs |= DMA_ATTR_SKIP_CPU_SYNC;
swiotlb_tbl_unmap_single(dev, map, size, dir, attrs);
 
-   return XEN_SWIOTLB_ERROR_CODE;
+   return DMA_MAPPING_ERROR;
 }
 
 /*
@@ -700,11 +698,6 @@ xen_swiotlb_get_sgtable(struct device *dev, struct sg_table *sgt,
return dma_common_get_sgtable(dev, sgt, cpu_addr, handle, size, attrs);
 }
 
-static int xen_swiotlb_mapping_error(struct device *dev, dma_addr_t dma_addr)
-{
-   return dma_addr == XEN_SWIOTLB_ERROR_CODE;
-}
-
 const struct dma_map_ops xen_swiotlb_dma_ops = {
.alloc = xen_swiotlb_alloc_coherent,
.free = xen_swiotlb_free_coherent,
@@ -719,5 +712,4 @@ const struct dma_map_ops xen_swiotlb_dma_ops = {
.dma_supported = xen_swiotlb_dma_supported,
.mmap = xen_swiotlb_dma_mmap,
.get_sgtable = xen_swiotlb_get_sgtable,
-   .mapping_error  = xen_swiotlb_mapping_error,
 };
-- 
2.19.1



[PATCH 21/24] iommu/vt-d: remove the mapping_error dma_map_ops method

2018-11-22 Thread Christoph Hellwig
Return DMA_MAPPING_ERROR instead of 0 on a dma mapping failure and let
the core dma-mapping code handle the rest.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/intel-iommu.c | 12 +++-
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 3b8a7acac7a1..a5704155932d 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3617,7 +3617,7 @@ static dma_addr_t __intel_map_page(struct device *dev, struct page *page,
 
domain = get_valid_domain_for_dev(dev);
if (!domain)
-   return 0;
+   return DMA_MAPPING_ERROR;
 
iommu = domain_get_iommu(domain);
size = aligned_nrpages(paddr, size);
@@ -3655,7 +3655,7 @@ static dma_addr_t __intel_map_page(struct device *dev, struct page *page,
free_iova_fast(&domain->iovad, iova_pfn, dma_to_mm_pfn(size));
pr_err("Device %s request: %zx@%llx dir %d --- failed\n",
dev_name(dev), size, (unsigned long long)paddr, dir);
-   return 0;
+   return DMA_MAPPING_ERROR;
 }
 
 static dma_addr_t intel_map_page(struct device *dev, struct page *page,
@@ -3756,7 +3756,7 @@ static void *intel_alloc_coherent(struct device *dev, size_t size,
 
*dma_handle = __intel_map_page(dev, page, 0, size, DMA_BIDIRECTIONAL,
   dev->coherent_dma_mask);
-   if (*dma_handle)
+   if (*dma_handle != DMA_MAPPING_ERROR)
return page_address(page);
if (!dma_release_from_contiguous(dev, page, size >> PAGE_SHIFT))
__free_pages(page, order);
@@ -3865,11 +3865,6 @@ static int intel_map_sg(struct device *dev, struct scatterlist *sglist, int nele
return nelems;
 }
 
-static int intel_mapping_error(struct device *dev, dma_addr_t dma_addr)
-{
-   return !dma_addr;
-}
-
 static const struct dma_map_ops intel_dma_ops = {
.alloc = intel_alloc_coherent,
.free = intel_free_coherent,
@@ -3877,7 +3872,6 @@ static const struct dma_map_ops intel_dma_ops = {
.unmap_sg = intel_unmap_sg,
.map_page = intel_map_page,
.unmap_page = intel_unmap_page,
-   .mapping_error = intel_mapping_error,
.dma_supported = dma_direct_supported,
 };
 
-- 
2.19.1



Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables

2018-11-22 Thread Christoph Hellwig
On Wed, Nov 21, 2018 at 10:26:26PM +, Robin Murphy wrote:
> TBH, if this DMA32 stuff is going to be contentious we could possibly just
> rip out the offending kmem_cache - it seemed like good practice for the
> use-case, but provided kzalloc(SZ_1K, gfp | GFP_DMA32) can be relied upon to
> give the same 1KB alignment and chance of succeeding as the equivalent
> kmem_cache_alloc(), then we could quite easily make do with that instead.

Neither is there slab support for kmalloc, nor do kmalloc allocations
have useful alignment apparently (at least if you use slub debug).

But I do agree with the sentiment of not wanting to spread GFP_DMA32
further into the slab allocator.

I think you want a simple genalloc allocator for this rather special
use case.


Re: [PATCH 2/6] lib/lzo: enable 64-bit CTZ on Arm

2018-11-21 Thread Christoph Hellwig
>  #define LZO_USE_CTZ32	1
>  #elif defined(__i386__) || defined(__powerpc__)
>  #define LZO_USE_CTZ32	1
> -#elif defined(__arm__) && (__LINUX_ARM_ARCH__ >= 5)
> +#elif defined(__arm__)
> +#if (__LINUX_ARM_ARCH__ >= 5)
>  #define LZO_USE_CTZ32	1
>  #endif
> +#if (__LINUX_ARM_ARCH__ >= 6) && (CONFIG_THUMB2_KERNEL)
> +#define LZO_USE_CTZ64	1

All of this really needs to be driven by Kconfig symbols that
the architecture selects instead of magic arch ifdefs here.
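
I.e. roughly (the CONFIG symbol name here is hypothetical, picked for
illustration):

/* The architecture selects CONFIG_ARCH_HAS_FAST_CTZ64 in its Kconfig;
 * the LZO code then only has to test one symbol instead of keying off
 * per-arch ifdef combinations. */
#ifdef CONFIG_ARCH_HAS_FAST_CTZ64
#define LZO_USE_CTZ64	1
#endif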


Re: [PATCH V11 03/19] block: introduce bio_for_each_bvec()

2018-11-21 Thread Christoph Hellwig
On Wed, Nov 21, 2018 at 05:10:25PM +0100, Christoph Hellwig wrote:
> No - I think we can always use the code without any segment in
> bvec_iter_advance.  Because bvec_iter_advance only operates on the
> iterator, the generation of an actual single-page or multi-page
> bvec is left to the caller using the bvec_iter_bvec or segment_iter_bvec
> helpers.  The only difference is how many bytes you can move the
> iterator forward in a single loop iteration - so if you pass in
> PAGE_SIZE as the max_seg_len you just will have to loop more often
> for a large enough byte count, but not actually do anything different.

FYI, this patch reverts the max_seg_len related changes back to where
we are in mainline, and as expected everything works fine for me:

diff --git a/include/linux/bio.h b/include/linux/bio.h
index e5b975fa0558..926550ce2d21 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -137,24 +137,18 @@ static inline bool bio_full(struct bio *bio)
for (i = 0, iter_all.idx = 0; iter_all.idx < (bio)->bi_vcnt; 
iter_all.idx++)\
bvec_for_each_segment(bvl, &((bio)->bi_io_vec[iter_all.idx]), 
i, iter_all)
 
-static inline void __bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
- unsigned bytes, unsigned max_seg_len)
+static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
+   unsigned bytes)
 {
iter->bi_sector += bytes >> 9;
 
if (bio_no_advance_iter(bio))
iter->bi_size -= bytes;
else
-   __bvec_iter_advance(bio->bi_io_vec, iter, bytes, max_seg_len);
+   bvec_iter_advance(bio->bi_io_vec, iter, bytes);
/* TODO: It is reasonable to complete bio with error here. */
 }
 
-static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
-   unsigned bytes)
-{
-   __bio_advance_iter(bio, iter, bytes, PAGE_SIZE);
-}
-
 #define __bio_for_each_segment(bvl, bio, iter, start)  \
for (iter = (start);\
 (iter).bi_size &&  \
@@ -168,7 +162,7 @@ static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
for (iter = (start);\
 (iter).bi_size &&  \
((bvl = bio_iter_mp_iovec((bio), (iter))), 1);  \
-__bio_advance_iter((bio), &(iter), (bvl).bv_len, BVEC_MAX_LEN))
+bio_advance_iter((bio), &(iter), (bvl).bv_len))
 
 /* returns one real segment(multi-page bvec) each time */
 #define bio_for_each_bvec(bvl, bio, iter)  \
diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index cab36d838ed0..138b4007b8f2 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -25,8 +25,6 @@
 #include 
 #include 
 
-#define BVEC_MAX_LEN  ((unsigned int)-1)
-
 /*
  * was unsigned short, but we might as well be ready for > 64kB I/O pages
  */
@@ -102,8 +100,8 @@ struct bvec_iter_all {
.bv_offset  = segment_iter_offset((bvec), (iter)),  \
 })
 
-static inline bool __bvec_iter_advance(const struct bio_vec *bv,
-   struct bvec_iter *iter, unsigned bytes, unsigned max_seg_len)
+static inline bool bvec_iter_advance(const struct bio_vec *bv,
+   struct bvec_iter *iter, unsigned bytes)
 {
if (WARN_ONCE(bytes > iter->bi_size,
 "Attempted to advance past end of bvec iter\n")) {
@@ -112,18 +110,12 @@ static inline bool __bvec_iter_advance(const struct bio_vec *bv,
}
 
while (bytes) {
-   unsigned segment_len = segment_iter_len(bv, *iter);
-
-   if (max_seg_len < BVEC_MAX_LEN)
-   segment_len = min_t(unsigned, segment_len,
-   max_seg_len -
-   bvec_iter_offset(bv, *iter));
+   unsigned iter_len = bvec_iter_len(bv, *iter);
+   unsigned len = min(bytes, iter_len);
 
-   segment_len = min(bytes, segment_len);
-
-   bytes -= segment_len;
-   iter->bi_size -= segment_len;
-   iter->bi_bvec_done += segment_len;
+   bytes -= len;
+   iter->bi_size -= len;
+   iter->bi_bvec_done += len;
 
if (iter->bi_bvec_done == __bvec_iter_bvec(bv, *iter)->bv_len) {
iter->bi_bvec_done = 0;
@@ -157,13 +149,6 @@ static inline bool bvec_iter_rewind(const struct bio_vec *bv,
return true;
 }
 
-static inline bool bvec_iter_advance(const struct bio_vec *bv,
-struct bvec_iter *iter,
-u

Re: [PATCH V10 09/19] block: introduce bio_bvecs()

2018-11-20 Thread Christoph Hellwig
On Mon, Nov 19, 2018 at 04:49:27PM -0800, Sagi Grimberg wrote:
>
>> The only user in your final tree seems to be the loop driver, and
>> even that one only uses the helper for read/write bios.
>>
>> I think something like this would be much simpler in the end:
>
> The recently submitted nvme-tcp host driver should also be a user
> of this. Does it make sense to keep it as a helper then?

I did take a brief look at the code, and I really don't understand
why the heck it even deals with bios to start with.  Like all the
other nvme transports it is a blk-mq driver and should iterate
over segments in a request and more or less ignore bios.  Something
is horribly wrong in the design.
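
I.e., the natural iteration for a blk-mq driver is over the request,
not the bios (sketch; send_segment() is an illustrative stand-in for
the transport's send path):

	struct req_iterator iter;
	struct bio_vec bvec;

	rq_for_each_segment(bvec, rq, iter)
		send_segment(page_address(bvec.bv_page) + bvec.bv_offset,
			     bvec.bv_len);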


Re: [PATCH 06/10] swiotlb: use swiotlb_map_page in swiotlb_map_sg_attrs

2018-11-20 Thread Christoph Hellwig
On Mon, Nov 19, 2018 at 03:22:13PM -0800, John Stultz wrote:
> > +   sg->dma_address = dma_addr;
> > sg_dma_len(sg) = sg->length;
> > }
> 
> I know Robin has already replied with more detailed info, but just to
> close the loop as I'm finally home, applying this patch didn't seem to
> help with the IO hangs I'm seeing w/ HiKey960.

If Robin's observation is right this should fix the problem for you:

diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index bd73e7a91410..1833f0c1fba0 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -5,7 +5,7 @@
 #include 
 #include 
 
-#define DIRECT_MAPPING_ERROR   0
+#define DIRECT_MAPPING_ERROR   (~(dma_addr_t)0x0)
 
 #ifdef CONFIG_ARCH_HAS_PHYS_TO_DMA
 #include 


Re: [PATCH 2/2] arch: switch the default on ARCH_HAS_SG_CHAIN

2018-11-19 Thread Christoph Hellwig
On Fri, Nov 16, 2018 at 08:52:14AM -0800, Palmer Dabbelt wrote:
> As far as I can tell, it looks like m68k, mips, and powerpc mention an 
> IOMMU in their ports, don't set ARCH_HAS_SG_CHAIN, and with this patch set 
> won't set ARCH_NO_SG_CHAIN.

m68k has no iommu, and no operations that operate on a scatterlist.

mips has a trivial iommu driver (jazzdma), but I wrote the current
instance of it, and it is fine.

powerpc has various iommu, but actually enables ARCH_HAS_SG_CHAIN
unconditionally.

> The issue is that I'm not sure how to 
> determine what constitutes a horrible legacy IOMMU, at least with respect 
> to not being able to use scatterlist chaining.

It basically means someone is iterating using manual pointer arithmetic
over a multi-element scatterlist instead of using the sg_next and
for_each_sg helpers.
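
I.e., chained scatterlists have to be walked with the helpers (sketch;
setup_hw_entry() is an illustrative stand-in for the driver's code):

	struct scatterlist *sg;
	int i;

	/* Correct: follows chain links via sg_next() internally. */
	for_each_sg(sgl, sg, nents, i)
		setup_hw_entry(i, sg_dma_address(sg), sg_dma_len(sg));

	/* Broken with chaining: &sgl[i] walks straight off the end of
	 * the first array segment instead of following the chain. */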


Re: [RFC] remove the ->mapping_error method from dma_map_ops

2018-11-19 Thread Christoph Hellwig
On Fri, Nov 09, 2018 at 03:12:34PM -0800, David Miller wrote:
> But patch #2 on the other hand, not so much.
> 
> I hate seeing values returned by reference, it adds cost especially
> on cpus where all argments and return values fit in registers (we end
> up forcing a stack slot and memory references).
> 
> And we don't need it here.
> 
> DMA addresses are like pointers, and therefore we can return errors and
> valid success values in the same dma_addr_t just fine.  PTR_ERR() --> 
> DMA_ERR(),
> IS_PTR_ERR() --> IS_DMA_ERR, etc.

In the end this is an inline function, so with a decently smart
compiler the generated code shouldn't change too much.  The big problem
that prompted me to come up with this patch is that not handling failure
from dma_map* in a swiotlb setup can lead to grave data corruption, and
we have no easy way to force error checking on these return values.

I've added a few of the usual static typechecking suspects to see if
they have a better idea on how to mark the return value of
dma_map_single/pages in a way that we get warnings if
dma_mapping_error isn't called on them.
But I can't really think of a good way.
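
For the record, the PTR_ERR()-style encoding suggested above would look
roughly like this (hypothetical macros mirroring include/linux/err.h,
not a merged API):

/* Encode small negative errnos at the very top of the dma_addr_t
 * space, exactly like ERR_PTR()/IS_ERR() do for pointers. */
#define DMA_ERR(err)	((dma_addr_t)(long)(err))
#define IS_DMA_ERR(a)	((a) >= (dma_addr_t)-MAX_ERRNO)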


Re: [PATCH 1/2] dma-mapping: remove ->mapping_error

2018-11-19 Thread Christoph Hellwig
On Fri, Nov 09, 2018 at 02:41:18PM +, Robin Murphy wrote:
>> -
>>   #define CMD_SET_TYPE(cmd, t) ((cmd)->data[1] |= ((t) << 28))
>> #define LOOP_TIMEOUT 10
>> @@ -2339,7 +2337,7 @@ static dma_addr_t __map_single(struct device *dev,
>>  paddr &= PAGE_MASK;
>>  address = dma_ops_alloc_iova(dev, dma_dom, pages, dma_mask);
>> -if (address == AMD_IOMMU_MAPPING_ERROR)
>> +if (address == DMA_MAPPING_ERROR)
>
> This for one is clearly broken, because the IOVA allocator still returns 0 
> on failure here...

Indeed.  And that shows how the original code was making a mess of these
different constants..

> I very much agree with the concept, but I think the way to go about it is 
> to convert the implementations which need it to the standardised 
> *_MAPPING_ERROR value one-by-one, and only then then do the big sweep to 
> remove them all. That has more of a chance of getting worthwhile review and 
> testing from the respective relevant parties (I'll confess I came looking 
> for this bug specifically, since I happened to recall amd_iommu having a 
> tricky implicit reliance on the old DMA_ERROR_CODE being 0 on x86).

I'll see if I can split this out somehow, but I'm not sure it is going
to be all that much more readable..

> In terms of really minimising the error-checking overhead it's a bit of a 
> shame that DMA_MAPPING_ERROR = 0 doesn't seem viable as the thing to 
> standardise on, since that has advantages at the micro-optimisation level 
> for many ISAs - fixing up the legacy IOMMU code doesn't seem 
> insurmountable, but I suspect there may well be non-IOMMU platforms where 
> DMA to physical address 0 is a thing :(

Yes, that is what I'm more worried about.

> (yeah, I know saving a couple of instructions and potential register 
> allocations is down in the noise when we're already going from an indirect 
> call to an inline comparison; I'm mostly just thinking out loud there)

The nice bit of standardizing the value is that we get rid of an
indirect call, which generally is much more of a problem at the
micro-architecture level.
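
With a standardized value the check collapses into a plain inline
comparison, along these lines (sketch, close to but not necessarily
identical with what gets merged):

static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
{
	return dma_addr == DMA_MAPPING_ERROR;
}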


Re: nvme: allow ANA support to be independent of native multipathing

2018-11-19 Thread Christoph Hellwig
On Fri, Nov 16, 2018 at 02:28:02PM -0500, Mike Snitzer wrote:
> You rejected the idea of allowing fine-grained control over whether
> native NVMe multipathing is enabled or not on a per-namespace basis.
> All we have is the coarse-grained nvme_core.multipath=N knob.  Now
> you're forecasting removing even that.  Please don't do that.

The whole point is that this hook was intended as a band aid for the
hypothetical pre-existing setups.  Not ever for new development.

> Please, PLEASE take v2 of this patch.. please? ;)

See the previous mail for the plan ahead.   I'm sick and tired of you
trying to sneak your new developments back in.


Re: [PATCH V10 08/19] btrfs: move bio_pages_all() to btrfs

2018-11-19 Thread Christoph Hellwig
On Mon, Nov 19, 2018 at 04:19:24PM +0800, Ming Lei wrote:
> On Fri, Nov 16, 2018 at 02:38:45PM +0100, Christoph Hellwig wrote:
> > On Thu, Nov 15, 2018 at 04:52:55PM +0800, Ming Lei wrote:
> > > BTRFS is the only user of this helper, so move this helper into
> > > BTRFS, and implement it via bio_for_each_segment_all(), since
> > > bio->bi_vcnt may not equal to number of pages after multipage bvec
> > > is enabled.
> > 
> > btrfs only uses the value to check if it is larger than 1.  No amount
> > of multipage bio merging should ever make bi_vcnt go from 0 to 1 or
> > vice versa.
> 
> Could you explain a bit why?
> 
> Suppose 2 physically continuous pages are added to this bio, .bi_vcnt
> can be 1 in case of multi-page bvec, but it is 2 in case of single-page
> bvec.

True, I did think of 0 vs 1.

The magic here in btrfs still doesn't make much sense.  The comments
down in btrfs_check_repairable talk about sectors, so it might be a
leftover from the blocksize == PAGE_SIZE days (assuming those patches
actually got merged).  I'd like to have the btrfs folks chime in,
but in the end we should probably check if the bio was larger than
a single sector here.


Re: [PATCH 1/4] PCI / ACPI: Identify external PCI devices

2018-11-16 Thread Christoph Hellwig
On Thu, Nov 15, 2018 at 09:10:26PM +0200, Mika Westerberg wrote:
> FireWire is kind of different but there are connectors such as
> ExpressCard and NVMe (over U.2 connector) which carry PCIe and are
> relatively easy to access without need for a screwdriver. AFAIK some
> eGPUs are also using some other proprietary (non-TBT) connector that
> carries PCIe.

U.2 is a data center internal form factor with hot plug capability. If
you enable an iommu for that by default you will make a lot of people
very unhappy.

More importantly NVMe is now used for the current/next generation
Compact Flash and SD cards, which contain full PCIe gen 3 links.


Re: [PATCH v2 1/2] pci: prevent sk hynix nvme from entering D3

2018-11-15 Thread Christoph Hellwig
On Thu, Nov 15, 2018 at 11:30:15AM -0600, Bjorn Helgaas wrote:
> 
> But I guess you have to do this anyway just to add the vendor/device
> ID to the driver, so maybe this isn't a big deal to you.  If you can
> do a quirk like this in the driver, it would be invisible to me and I
> wouldn't care.  I just don't want to deal with ongoing tweaks like
> this in the PCI core :)

No, NVMe is a spec with a class code, and a specification that is
vendor independent.  NVMe devices declare individual features based
on common fields.

APST is an optional feature with all kinds of parameters, but there
is absolutely no language anywhere in the NVMe spec saying that a host
should not put the device into D3 if APST is supported, and such
behavior is also rather counterintuitive.  If SK Hynix thinks this
is sensible behavior they should bring it up in the NVMe technical
working group.  I've pinged a contact there to see what this whole
story is about.


[PATCH 8/9] rapidio: consolidate RAPIDIO config entry in drivers/rapidio

2018-11-15 Thread Christoph Hellwig
There is no good reason to duplicate the RAPIDIO menu in various
architectures.  Instead provide a selectable HAVE_RAPIDIO symbol
that indicates native availability of RAPIDIO support, and then handle
the rest in drivers/rapidio.  This also means we now provide support
for PCI(e) to RapidIO bridges for every architecture instead of a
limited subset.

Signed-off-by: Christoph Hellwig 
Acked-by: Thomas Gleixner 
---
 arch/mips/Kconfig   | 15 +--
 arch/powerpc/Kconfig| 14 +-
 arch/powerpc/platforms/85xx/Kconfig |  8 
 arch/powerpc/platforms/86xx/Kconfig |  4 ++--
 arch/x86/Kconfig|  9 -
 drivers/Kconfig |  1 +
 drivers/rapidio/Kconfig | 11 +++
 7 files changed, 20 insertions(+), 42 deletions(-)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 3912250ff813..67fbd4952ff4 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -892,7 +892,7 @@ config CAVIUM_OCTEON_SOC
bool "Cavium Networks Octeon SoC based boards"
select CEVT_R4K
select ARCH_HAS_PHYS_TO_DMA
-   select HAS_RAPIDIO
+   select HAVE_RAPIDIO
select PHYS_ADDR_T_64BIT
select SYS_SUPPORTS_64BIT_KERNEL
select SYS_SUPPORTS_BIG_ENDIAN
@@ -3107,19 +3107,6 @@ config ZONE_DMA
 config ZONE_DMA32
bool
 
-config HAS_RAPIDIO
-   bool
-   default n
-
-config RAPIDIO
-   tristate "RapidIO support"
-   depends on HAS_RAPIDIO || PCI
-   help
- If you say Y here, the kernel will include drivers and
- infrastructure code to support RapidIO interconnect devices.
-
-source "drivers/rapidio/Kconfig"
-
 endmenu
 
 config TRAD_SIGNALS
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index cc8435d87949..f2f70cc2bd44 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -939,26 +939,14 @@ config PCI_8260
select PPC_INDIRECT_PCI
default y
 
-config HAS_RAPIDIO
-   bool
-
-config RAPIDIO
-   tristate "RapidIO support"
-   depends on HAS_RAPIDIO || PCI
-   help
- If you say Y here, the kernel will include drivers and
- infrastructure code to support RapidIO interconnect devices.
-
 config FSL_RIO
bool "Freescale Embedded SRIO Controller support"
-   depends on RAPIDIO = y && HAS_RAPIDIO
+   depends on RAPIDIO = y && HAVE_RAPIDIO
default "n"
---help---
  Include support for RapidIO controller on Freescale embedded
  processors (MPC8548, MPC8641, etc).
 
-source "drivers/rapidio/Kconfig"
-
 endmenu
 
 config NONSTATIC_KERNEL
diff --git a/arch/powerpc/platforms/85xx/Kconfig b/arch/powerpc/platforms/85xx/Kconfig
index ba0ea84ce578..d1af0ee2f8c8 100644
--- a/arch/powerpc/platforms/85xx/Kconfig
+++ b/arch/powerpc/platforms/85xx/Kconfig
@@ -66,7 +66,7 @@ config MPC85xx_CDS
bool "Freescale MPC85xx CDS"
select DEFAULT_UIMAGE
select PPC_I8259
-   select HAS_RAPIDIO
+   select HAVE_RAPIDIO
help
  This option enables support for the MPC85xx CDS board
 
@@ -74,7 +74,7 @@ config MPC85xx_MDS
bool "Freescale MPC85xx MDS"
select DEFAULT_UIMAGE
select PHYLIB if NETDEVICES
-   select HAS_RAPIDIO
+   select HAVE_RAPIDIO
select SWIOTLB
help
  This option enables support for the MPC85xx MDS board
@@ -219,7 +219,7 @@ config PPA8548
help
  This option enables support for the Prodrive PPA8548 board.
select DEFAULT_UIMAGE
-   select HAS_RAPIDIO
+   select HAVE_RAPIDIO
 
 config GE_IMP3A
bool "GE Intelligent Platforms IMP3A"
@@ -277,7 +277,7 @@ config CORENET_GENERIC
select SWIOTLB
select GPIOLIB
select GPIO_MPC8XXX
-   select HAS_RAPIDIO
+   select HAVE_RAPIDIO
select PPC_EPAPR_HV_PIC
help
  This option enables support for the FSL CoreNet based boards.
diff --git a/arch/powerpc/platforms/86xx/Kconfig b/arch/powerpc/platforms/86xx/Kconfig
index a4fa31a40502..413837a63242 100644
--- a/arch/powerpc/platforms/86xx/Kconfig
+++ b/arch/powerpc/platforms/86xx/Kconfig
@@ -15,7 +15,7 @@ config MPC8641_HPCN
select PPC_I8259
select DEFAULT_UIMAGE
select FSL_ULI1575 if PCI
-   select HAS_RAPIDIO
+   select HAVE_RAPIDIO
select SWIOTLB
help
  This option enables support for the MPC8641 HPCN board.
@@ -57,7 +57,7 @@ config GEF_SBC610
select MMIO_NVRAM
select GPIOLIB
select GE_FPGA
-   select HAS_RAPIDIO
+   select HAVE_RAPIDIO
help
  This option enables support for the GE SBC610.
 
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 659d59d7f033..4c8052a7c3f9 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2811,15 +2811,6 @@ config AMD_NB
d

[PATCH 7/9] pcmcia: allow PCMCIA support independent of the architecture

2018-11-15 Thread Christoph Hellwig
There is nothing architecture specific in the PCMCIA core, so allow
building it everywhere.  The actual host controllers will depend on ISA,
PCI or a specific SOC.

Signed-off-by: Christoph Hellwig 
Acked-by: Dominik Brodowski 
Acked-by: Thomas Gleixner 
---
 arch/alpha/Kconfig |  2 --
 arch/arm/Kconfig   |  2 --
 arch/ia64/Kconfig  | 10 --
 arch/m68k/Kconfig.bus  |  2 --
 arch/mips/Kconfig  |  2 --
 arch/powerpc/Kconfig   |  2 --
 arch/sh/Kconfig|  2 --
 arch/sparc/Kconfig |  2 --
 arch/unicore32/Kconfig |  6 --
 arch/x86/Kconfig   |  2 --
 arch/xtensa/Kconfig|  2 --
 drivers/Kconfig|  1 +
 drivers/parisc/Kconfig |  2 --
 drivers/pcmcia/Kconfig |  1 +
 14 files changed, 2 insertions(+), 36 deletions(-)

diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
index 1f679508bc34..0ff180ab2a42 100644
--- a/arch/alpha/Kconfig
+++ b/arch/alpha/Kconfig
@@ -669,8 +669,6 @@ config HZ
 
 source "drivers/eisa/Kconfig"
 
-source "drivers/pcmcia/Kconfig"
-
 config SRM_ENV
tristate "SRM environment through procfs"
depends on PROC_FS
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 73d0f5e9feb7..7b1dfaec030e 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1225,8 +1225,6 @@ config PCI_HOST_ITE8152
default y
select DMABOUNCE
 
-source "drivers/pcmcia/Kconfig"
-
 endmenu
 
 menu "Kernel Features"
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 8f18d90c933d..887e7bfd7055 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -542,16 +542,6 @@ endif
 
 endmenu
 
-if !IA64_HP_SIM
-
-menu "Bus options (PCI, PCMCIA)"
-
-source "drivers/pcmcia/Kconfig"
-
-endmenu
-
-endif
-
 source "arch/ia64/hp/sim/Kconfig"
 
 config MSPEC
diff --git a/arch/m68k/Kconfig.bus b/arch/m68k/Kconfig.bus
index 8cb0604b195b..9d0a3a23d50e 100644
--- a/arch/m68k/Kconfig.bus
+++ b/arch/m68k/Kconfig.bus
@@ -68,6 +68,4 @@ if !MMU
 config ISA_DMA_API
 def_bool !M5272
 
-source "drivers/pcmcia/Kconfig"
-
 endif
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 151a4aaf0610..3912250ff813 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -3107,8 +3107,6 @@ config ZONE_DMA
 config ZONE_DMA32
bool
 
-source "drivers/pcmcia/Kconfig"
-
 config HAS_RAPIDIO
bool
default n
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index cbdcd1c0b1e0..cc8435d87949 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -939,8 +939,6 @@ config PCI_8260
select PPC_INDIRECT_PCI
default y
 
-source "drivers/pcmcia/Kconfig"
-
 config HAS_RAPIDIO
bool
 
diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index 8a3c292ae906..44a45a37a3c4 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -855,8 +855,6 @@ config MAPLE
 Dreamcast with a serial line terminal or a remote network
 connection.
 
-source "drivers/pcmcia/Kconfig"
-
 endmenu
 
 menu "Power management options (EXPERIMENTAL)"
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index 20417b8b12a5..daee2c73b6c5 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -503,8 +503,6 @@ config SPARC_GRPCI2
help
  Say Y here to include the GRPCI2 Host Bridge Driver.
 
-source "drivers/pcmcia/Kconfig"
-
 config SUN_OPENPROMFS
tristate "Openprom tree appears in /proc/openprom"
help
diff --git a/arch/unicore32/Kconfig b/arch/unicore32/Kconfig
index 4658859c6aee..96ac6cc6ab2a 100644
--- a/arch/unicore32/Kconfig
+++ b/arch/unicore32/Kconfig
@@ -117,12 +117,6 @@ config UNICORE_FPU_F64
 
 endmenu
 
-menu "Bus support"
-
-source "drivers/pcmcia/Kconfig"
-
-endmenu
-
 menu "Kernel Features"
 
 source "kernel/Kconfig.hz"
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 953db09165c2..659d59d7f033 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2811,8 +2811,6 @@ config AMD_NB
def_bool y
depends on CPU_SUP_AMD && PCI
 
-source "drivers/pcmcia/Kconfig"
-
 config RAPIDIO
tristate "RapidIO support"
depends on PCI
diff --git a/arch/xtensa/Kconfig b/arch/xtensa/Kconfig
index 2865a556163a..322b7391de89 100644
--- a/arch/xtensa/Kconfig
+++ b/arch/xtensa/Kconfig
@@ -512,8 +512,6 @@ config FORCE_MAX_ZONEORDER
  This config option is actually maximum order plus one. For example,
  a value of 11 means that the largest free memory block is 2^10 pages.
 
-source "drivers/pcmcia/Kconfig"
-
 config PLATFORM_WANT_DEFAULT_MEM
def_bool n
 
diff --git a/drivers/Kconfig b/drivers/Kconfig
index 059573823387..58ee88c36cf5 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -5,6 +5,7 @@ menu "Device Drivers"
 
 source "drivers/amba/Kconfig"
 source "drivers/pci/Kconfig"
+source "drivers/pcmcia/K

[PATCH 4/9] PCI: consolidate PCI config entry in drivers/pci

2018-11-15 Thread Christoph Hellwig
There is no good reason to duplicate the PCI menu in every architecture.
Instead provide a selectable HAVE_PCI symbol that indicates availability
of PCI support, and a FORCE_PCI symbol to force PCI on, and handle the
rest in drivers/pci.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Palmer Dabbelt 
Acked-by: Max Filippov 
Acked-by: Thomas Gleixner 
Acked-by: Bjorn Helgaas 
---
 arch/alpha/Kconfig | 15 +---
 arch/arc/Kconfig   | 20 --
 arch/arc/plat-axs10x/Kconfig   |  2 +-
 arch/arc/plat-hsdk/Kconfig |  2 +-
 arch/arm/Kconfig   | 25 +++-
 arch/arm/mach-alpine/Kconfig   |  2 +-
 arch/arm/mach-footbridge/Kconfig   |  8 ++--
 arch/arm/mach-ixp4xx/Kconfig   | 22 +--
 arch/arm/mach-ks8695/Kconfig   | 10 ++---
 arch/arm/mach-mv78xx0/Kconfig  |  2 +-
 arch/arm/mach-mvebu/Kconfig|  2 +-
 arch/arm/mach-orion5x/Kconfig  |  2 +-
 arch/arm/mach-pxa/Kconfig  |  2 +-
 arch/arm/mach-sa1100/Kconfig   |  2 +-
 arch/arm64/Kconfig | 14 +--
 arch/hexagon/Kconfig   |  3 --
 arch/ia64/Kconfig  | 10 +
 arch/m68k/Kconfig.bus  | 11 --
 arch/m68k/Kconfig.cpu  |  1 +
 arch/microblaze/Kconfig|  6 +--
 arch/mips/Kconfig  | 44 --
 arch/mips/alchemy/Kconfig  |  6 +--
 arch/mips/ath25/Kconfig|  3 +-
 arch/mips/ath79/Kconfig|  8 ++--
 arch/mips/bcm63xx/Kconfig  | 14 +++
 arch/mips/lantiq/Kconfig   |  2 +-
 arch/mips/loongson64/Kconfig   |  7 ++--
 arch/mips/pmcs-msp71xx/Kconfig | 10 ++---
 arch/mips/ralink/Kconfig   |  8 ++--
 arch/mips/sibyte/Kconfig   | 10 ++---
 arch/mips/txx9/Kconfig |  8 ++--
 arch/mips/vr41xx/Kconfig   |  8 ++--
 arch/parisc/Kconfig|  1 +
 arch/powerpc/Kconfig   | 20 +-
 arch/powerpc/platforms/40x/Kconfig | 10 ++---
 arch/powerpc/platforms/44x/Kconfig | 32 
 arch/powerpc/platforms/512x/Kconfig|  2 +-
 arch/powerpc/platforms/52xx/Kconfig|  2 +-
 arch/powerpc/platforms/83xx/Kconfig|  2 +-
 arch/powerpc/platforms/85xx/Kconfig|  2 +-
 arch/powerpc/platforms/86xx/Kconfig|  4 +-
 arch/powerpc/platforms/Kconfig |  2 +-
 arch/powerpc/platforms/Kconfig.cputype |  4 +-
 arch/powerpc/platforms/amigaone/Kconfig|  2 +-
 arch/powerpc/platforms/cell/Kconfig|  2 +-
 arch/powerpc/platforms/chrp/Kconfig|  2 +-
 arch/powerpc/platforms/embedded6xx/Kconfig |  4 +-
 arch/powerpc/platforms/maple/Kconfig   |  2 +-
 arch/powerpc/platforms/pasemi/Kconfig  |  2 +-
 arch/powerpc/platforms/powermac/Kconfig|  2 +-
 arch/powerpc/platforms/powernv/Kconfig |  2 +-
 arch/powerpc/platforms/ps3/Kconfig |  2 +-
 arch/powerpc/platforms/pseries/Kconfig |  2 +-
 arch/riscv/Kconfig | 18 +
 arch/s390/Kconfig  | 19 +++---
 arch/sh/Kconfig| 19 ++
 arch/sh/boards/Kconfig | 30 +++
 arch/sparc/Kconfig | 15 +---
 arch/um/Kconfig|  3 --
 arch/unicore32/Kconfig | 11 +-
 arch/x86/Kconfig   | 12 +-
 arch/x86/configs/i386_defconfig|  1 +
 arch/x86/configs/x86_64_defconfig  |  1 +
 arch/xtensa/Kconfig| 16 +---
 arch/xtensa/configs/common_defconfig   |  1 +
 arch/xtensa/configs/iss_defconfig  |  1 -
 drivers/Kconfig|  4 ++
 drivers/parisc/Kconfig | 11 --
 drivers/pci/Kconfig| 18 +
 drivers/pci/endpoint/Kconfig   |  2 +-
 70 files changed, 195 insertions(+), 379 deletions(-)

diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
index 65f6d0bf69d4..ef6ea8171994 100644
--- a/arch/alpha/Kconfig
+++ b/arch/alpha/Kconfig
@@ -6,7 +6,7 @@ config ALPHA
select ARCH_MIGHT_HAVE_PC_SERIO
select ARCH_NO_PREEMPT
select ARCH_USE_CMPXCHG_LOCKREF
-   select PCI if !ALPHA_JENSEN
+   select FORCE_PCI if !ALPHA_JENSEN
select HAVE_AOUT
select HAVE_IDE
select HAVE_OPROFILE
@@ -16,6 +16,7 @@ config ALPHA
select NEED_SG_DMA_LENGTH
select VIRT_TO_BUS
select GENERIC_IRQ_PROBE
+   select GENERIC_PCI_IOMAP if PCI
select AUTO_IRQ_AFFINITY if SMP
select GENERIC_IRQ_SHOW
select ARCH_WANT_IPC_PARSE_VERSION
@@ -320,17 +321,6

Re: [PATCH] xfs: Remove noinline from #define STATIC

2018-11-15 Thread Christoph Hellwig
On Tue, Nov 13, 2018 at 04:26:51PM +1100, Dave Chinner wrote:
> That's just as bad as removing them all, if not worse.
> 
> If you are writing new code or reworking existing code, then we'll
> consider the usage of STATIC/static in the context of that work.
> Otherwise, we leave it alone.
> 
> If it ain't broke, don't fix it. And it sure as hell isn't broken
> right now. We've got more than enough bugs to fix without having to
> deal with drive-by bikeshed painting...

Agreed.  That being said I think we should aim to add manual
noinline annotations to those functions where we really care while we
go through the code instead of the weird STATIC that just seems to
cause this kind of confusion.

And if Joe or someone else can come up with a few patches where removing
the noinline (aka STATIC) makes a huge difference for a small number
of functions we should consider it as well.
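
For the archives, a minimal sketch of such a manual annotation (the
function name and body are made up; only the placement of the keyword
matters):

#include <linux/compiler.h>
#include <linux/types.h>
#include <linux/errno.h>

/* cold error path that we explicitly want kept out of line, so it
 * cannot bloat its fast-path callers */
static noinline int xfs_report_corruption(void *buf, size_t len)
{
	/* ... dump the buffer, log the error ... */
	return -EIO;
}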


Re: Re-run features-refresh.sh

2018-11-15 Thread Christoph Hellwig
FYI, we really should kill ARCH_SG_CHAIN in its current form.

See my series here, which could use a review or two:

https://lkml.org/lkml/2018/11/9/958


Re: [PATCH v3 3/3] MIPS: SiByte: Enable swiotlb for SWARM, LittleSur and BigSur

2018-11-14 Thread Christoph Hellwig
> + select SWIOTLB if ARCH_DMA_ADDR_T_64BIT && PCI
>  
>  config SIBYTE_SENTOSA
>   bool "Sibyte BCM91250E-Sentosa"
> @@ -826,6 +828,7 @@ config SIBYTE_BIGSUR
>   select SYS_SUPPORTS_HIGHMEM
>   select SYS_SUPPORTS_LITTLE_ENDIAN
>   select ZONE_DMA32 if 64BIT
> + select SWIOTLB if ARCH_DMA_ADDR_T_64BIT && PCI
>  
>  config SNI_RM
>   bool "SNI RM200/300/400"
> Index: linux-20181104-swarm64-eb/arch/mips/sibyte/common/Makefile
> ===
> --- linux-20181104-swarm64-eb.orig/arch/mips/sibyte/common/Makefile
> +++ linux-20181104-swarm64-eb/arch/mips/sibyte/common/Makefile
> @@ -1,4 +1,5 @@
>  obj-y := cfe.o
> +obj-$(CONFIG_SWIOTLB)+= dma.o
>  obj-$(CONFIG_SIBYTE_BUS_WATCHER) += bus_watcher.o
>  obj-$(CONFIG_SIBYTE_CFE_CONSOLE) += cfe_console.o
>  obj-$(CONFIG_SIBYTE_TBPROF)  += sb_tbprof.o
> Index: linux-20181104-swarm64-eb/arch/mips/sibyte/common/dma.c
> ===
> --- /dev/null
> +++ linux-20181104-swarm64-eb/arch/mips/sibyte/common/dma.c
> @@ -0,0 +1,19 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/*
> + *   DMA support for Broadcom SiByte platforms.
> + *
> + *   Copyright (c) 2018  Maciej W. Rozycki
> + *
> + *   This program is free software; you can redistribute it and/or
> + *   modify it under the terms of the GNU General Public License
> + *   as published by the Free Software Foundation; either version
> + *   2 of the License, or (at your option) any later version.
> + */

Ok, so you have the SPDX tag, but also the duplicate license text,
which is what confused me earlier.

This should just be:

// SPDX-License-Identifier: GPL-2.0+
/*
 *  DMA support for Broadcom SiByte platforms.
 *
 *  Copyright (c) 2018  Maciej W. Rozycki
 */

With that looks good to me:

Reviewed-by: Christoph Hellwig 


Re: [PATCH v3 2/3] MIPS: SiByte: Enable ZONE_DMA32 for LittleSur

2018-11-14 Thread Christoph Hellwig
On Tue, Nov 13, 2018 at 10:42:37PM +, Maciej W. Rozycki wrote:
> The LittleSur board is marked for high memory support and therefore 
> clearly must provide a way to have enough memory installed for some to 
> be present outside the low 4GiB physical address range.  With the memory 
> map of the BCM1250 SOC it has been built around it means over 1GiB of 
> actual DRAM, as only the first 1GiB is mapped in the low 4GiB physical 
> address range[1].
> 
> Complement commit cce335ae47e2 ("[MIPS] 64-bit Sibyte kernels need 
> DMA32.") then and also enable ZONE_DMA32 for LittleSur.

Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH v3 1/3] MIPS: SiByte: Set 32-bit bus mask for BCM1250 PCI

2018-11-14 Thread Christoph Hellwig
Looks fine,

Reviewed-by: Christoph Hellwig 


Re: [RFC PATCH 01/13] arm: Fix mutual exclusion in arch_gettimeoffset

2018-11-12 Thread Christoph Hellwig
On Mon, Nov 12, 2018 at 03:12:39PM +1100, Finn Thain wrote:
> Implementations of arch_gettimeoffset are generally not re-entrant
> and assume that interrupts have been disabled. Unfortunately this
> pre-condition got broken in v2.6.32.
> 
> Fixes: 5cfc8ee0bb51 ("ARM: convert arm to arch_gettimeoffset()")
> Signed-off-by: Finn Thain 

This looks like a sensible fix for -stable backporting.  But with m68k
converted over it might be time for these two arm platforms to either
be converted to the clocksource API or to be dropped..
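
To sketch what such a clocksource conversion looks like (all names, the
register offset and the MMIO base are illustrative only, not taken from
either platform):

#include <linux/clocksource.h>
#include <linux/io.h>
#include <linux/init.h>

static void __iomem *foo_timer_base;	/* mapped earlier by the platform */

static u64 foo_timer_read(struct clocksource *cs)
{
	/* free-running up-counter; no locking or IRQ state required */
	return readl_relaxed(foo_timer_base + 0x10);
}

static struct clocksource foo_clocksource = {
	.name	= "foo-timer",
	.rating	= 200,
	.read	= foo_timer_read,
	.mask	= CLOCKSOURCE_MASK(32),
	.flags	= CLOCK_SOURCE_IS_CONTINUOUS,
};

static int __init foo_clocksource_init(unsigned long rate)
{
	return clocksource_register_hz(&foo_clocksource, rate);
}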


Re: [PATCH -next] sysv: return 'err' instead of 0 in __sysv_write_inode

2018-11-10 Thread Christoph Hellwig
Looks fine:

Reviewed-by: Christoph Hellwig 

Al, can you queue it up?


Re: [PATCH 2/4] irqchip: sifive-plic: More flexible plic_irq_toggle()

2018-11-09 Thread Christoph Hellwig
> -
>  struct plic_handler {
>   boolpresent;
> - int ctxid;
>   void __iomem*hart_base;
>   raw_spinlock_t  enable_lock;
>   void __iomem*enable_base;
>  };
> -static DEFINE_PER_CPU(struct plic_handler, plic_handlers);

This does not match the changelog at all.


Re: [PATCH 1/4] irqchip: sifive-plic: Pre-compute context hart base and enable base

2018-11-09 Thread Christoph Hellwig
On Mon, Oct 22, 2018 at 05:15:14PM +0530, Anup Patel wrote:
> This patch does following optimizations:
> 1. Pre-compute hart base for each context handler
> 2. Pre-compute enable base for each context handler

Why?

> 3. Have enable lock for each context handler instead
> of global plic_toggle_lock

Why?  Also even if you want this it should be a separate patch.

>  #define PRIORITY_BASE0
> -#define PRIORITY_PER_ID  4
> +#define PRIORITY_PER_ID  4

Also please drop the random whitespace changes.


Re: RFC: userspace exception fixups

2018-11-08 Thread Christoph Hellwig
On Thu, Nov 08, 2018 at 12:05:42PM -0800, Andy Lutomirski wrote:
> This whole thing is a mess.  I'm starting to think that the cleanest
> solution would be to provide a way to just tell the kernel that
> certain RIP values have exception fixups.

The by far cleanest solution would be to say that SGX is such a mess
that we are not going to support it at all.  It's not like it is a
must-have feature to start with.


Re: [PATCH] jffs2: implement mount option to configure endianness

2018-11-08 Thread Christoph Hellwig
On Wed, Nov 07, 2018 at 06:41:47PM +, Joakim Tjernlund wrote:
> > Certainly not. I'm not sure which architectures do have Spectre V2
> > mitigations which make indirect branches expensive now... perhaps there is
> > no intersection with the cases where we really care about JFFS2 being
> > CPU-bound?
> 
> All that passing of a new extra arg and extra if .. elif .. does cost too.

I know of no current microarchitecture where an indirect call is cheaper
than a simple branch, which can be very well predicted.
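
To spell the two dispatch styles out (a self-contained sketch, not the
actual jffs2 code; assumes a little-endian host for the swab):

#include <stdint.h>

struct sb {
	int big_endian;			/* per-mount flag: branch style */
	uint32_t (*to_cpu32)(uint32_t);	/* ops table: indirect-call style */
};

static uint32_t swap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00) |
	       ((v << 8) & 0x00ff0000) | (v << 24);
}

/* a conditional branch: cheap, and very well predicted when one
 * endianness dominates at runtime */
static uint32_t read32_branch(const struct sb *c, uint32_t raw)
{
	return c->big_endian ? swap32(raw) : raw;
}

/* an indirect call: retpoline-style Spectre v2 mitigations make this
 * noticeably more expensive than the branch above */
static uint32_t read32_indirect(const struct sb *c, uint32_t raw)
{
	return c->to_cpu32(raw);
}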


Re: [PATCH v2] nvme: create 'paths' entries for hidden controllers

2018-11-08 Thread Christoph Hellwig
On Thu, Nov 01, 2018 at 08:29:55PM -0300, Thadeu Lima de Souza Cascardo wrote:
> When using initramfs-tools with only the necessary dependencies to mount
> the root filesystem, it will fail to include nvme drivers for a root on a
> multipath nvme. That happens because the slaves relationship is not
> present.
> 
> As discussed in [1], using slaves will break lsblk, because the slaves are
> hidden from userspace, that is, they have no real block device, just an
> entry under sysfs.

I wonder if the better way would be to unhide the slaves, but always
have a claim on them, so that others can't really use them?  While the
hiding idea seemed very neat it seems to cause a fair amount of problems
after all.


Re: [PATCH 1/2] pci: prevent sk hynix nvme from entering D3

2018-11-07 Thread Christoph Hellwig
On Tue, Nov 06, 2018 at 06:11:31PM +0800, AceLan Kao wrote:
> Agree, this is not a good fix for Sk hynix nvme, so Dell is still pushing
> Sk hynix to fix it from firmware.
> But before the firmware is ready, this is still an issue that needs to be fixed 
> in
> kernel side, and the new firmware may not be applied on the old nvme
> modules.
> This list won't keep growing, and we'll keep an eye on the new firmware and
> co-work with engineers from Sky hynix to make sure this issue won't happen
> again.

We generally quirk for grave issues that make devices unusable.

To me working around a D3 mode that consumes more power than D0
does not quite fit that bill.


Re: [PATCH 2/2] MIPS: SiByte: Enable swiotlb for SWARM and BigSur

2018-11-06 Thread Christoph Hellwig
Do we really need a new source file just to call swiotlb_init?  And
if we do that file should have a SPDX header these days.

Otherwise looks fine:

Reviewed-by: Christoph Hellwig 


Re: [PATCH 1/2] MIPS: SiByte: Set 32-bit bus mask for BCM1250 PCI

2018-11-06 Thread Christoph Hellwig
> Index: linux-20181104-swarm64-eb/arch/mips/pci/fixup-sb1250.c
> ===
> --- linux-20181104-swarm64-eb.orig/arch/mips/pci/fixup-sb1250.c
> +++ linux-20181104-swarm64-eb/arch/mips/pci/fixup-sb1250.c
> @@ -1,6 +1,7 @@
>  /*
>   *   Copyright (C) 2004, 2006  MIPS Technologies, Inc.  All rights reserved.
>   *   Author: Maciej W. Rozycki 
> + *   Copyright (C) 2018  Maciej W. Rozycki
>   *
>   *   This program is free software; you can redistribute it and/or
>   *   modify it under the terms of the GNU General Public License
> @@ -8,6 +9,7 @@
>   *   2 of the License, or (at your option) any later version.
>   */
>  
> +#include <linux/dma-mapping.h>
>  #include <linux/pci.h>
>  
>  /*
> @@ -22,6 +24,52 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_SI
>   quirk_sb1250_pci);
>  
>  /*
> + * The BCM1250, etc. PCI host bridge does not support DAC on its 32-bit
> + * bus, so we set the bus's DMA mask accordingly.  However the HT link
> + * down the artificial PCI-HT bridge supports 40-bit addressing and the
> + * SP1011 HT-PCI bridge downstream supports both DAC and a 64-bit bus
> + * width, so we record the PCI-HT bridge's secondary and subordinate bus
> + * numbers and do not set the mask for devices present in the inclusive
> + * range of those.
> + */
> +struct sb1250_bus_dma_mask_exclude {
> + bool set;
> + unsigned char start;
> + unsigned char end;
> +};
> +
> +static int sb1250_bus_dma_mask(struct pci_dev *dev, void *data)
> +{
> + struct sb1250_bus_dma_mask_exclude *exclude = data;
> +
> + if (!exclude->set && (dev->vendor == PCI_VENDOR_ID_SIBYTE &&
> +   dev->device == PCI_DEVICE_ID_BCM1250_HT)) {
> + exclude->start = dev->subordinate->number;
> + exclude->end = pci_bus_max_busnr(dev->subordinate);
> + exclude->set = true;
> + dev_dbg(&dev->dev, "not disabling DAC for [bus %02x-%02x]",
> + exclude->start, exclude->end);
> + } else if (!exclude->set ||
> +(exclude->set && (dev->bus->number < exclude->start ||
> +  dev->bus->number > exclude->end))) {
> +     dev_dbg(&dev->dev, "disabling DAC for device");
> + dev->dev.bus_dma_mask = DMA_BIT_MASK(32);
> + } else {
> + dev_dbg(&dev->dev, "not disabling DAC for device");
> + }
> + return 0;

Hmm, these conditions look very hard to read to me.  Wouldn't this
have the same effect?

	if (exclude->set)
		return 0;

	if (dev->vendor == PCI_VENDOR_ID_SIBYTE &&
	    dev->device == PCI_DEVICE_ID_BCM1250_HT) {
		exclude->start = dev->subordinate->number;
		exclude->end = pci_bus_max_busnr(dev->subordinate);
		exclude->set = true;
		dev_dbg(&dev->dev, "not disabling DAC for [bus %02x-%02x]",
			exclude->start, exclude->end);
		return 0;
	}

	if (dev->bus->number < exclude->start ||
	    dev->bus->number > exclude->end) {
		dev_dbg(&dev->dev, "disabling DAC for device");
		dev->dev.bus_dma_mask = DMA_BIT_MASK(32);
		return 0;
	}

	dev_dbg(&dev->dev, "not disabling DAC for device");
	return 0;

Otherwise this looks fine to me:

Reviewed-by: Christoph Hellwig 


Re: [PATCH 2/2] nvme: add quirk to not call disable function when suspending

2018-11-05 Thread Christoph Hellwig
On Tue, Nov 06, 2018 at 10:04:02AM +0800, AceLan Kao wrote:
> Call nvme_dev_disable() function leads to the power consumption goes
> up to 2.2 Watt during suspend-to-idle, and from SK hynix FE, they
> suggest us to use its own APST feature to do the power management during
> s2idle.
> After D3 is disabled and nvme_dev_disable() is not called while
> suspending, the power consumption drops to 0.77 Watt during s2idle.

Same comment as previous one.  Get the firmware fixed please - this
doesn't prevent the drive from working, but it is a grave implementation
bug in it that needs to get fixed.


Re: [PATCH 1/2] pci: prevent sk hynix nvme from entering D3

2018-11-05 Thread Christoph Hellwig
On Tue, Nov 06, 2018 at 10:04:01AM +0800, AceLan Kao wrote:
> It leads to the power consumption rising to 2.2W during s2idle, while
> it consumes less than 1W during long idle if put SK hynix nvme to D3
> and then enter s2idle.
> From SK hynix FE, MS Windows doesn't put nvme to D3, and uses its own
> APST feature to do the power management.
> To leverage its APST feature during s2idle, we can't disable nvme
> device while suspending, too.
> 
> BTW, preventing it from entering D3 will increase the power consumption around
> 0.13W ~ 0.15W during short/long idle, and the power consumption during
> s2idle becomes 0.77W.

Please get this fixed in firmware.  And yes, depending on which Windows
driver you use, Windows might be doing very aggressive runtime D3, which
is part of Windows modern standby, so it'll trip up there eventually 
as well.


Re: [RFC 0/2] RISC-V: A proposal to add vendor-specific code

2018-11-05 Thread Christoph Hellwig
On Mon, Nov 05, 2018 at 02:51:33PM +0100, Arnd Bergmann wrote:
> With the stricter policy you suggest, we'd lose the ability to support
> some extensions that might be common:
> 
> - an extension for user space that adds new registers that must be
>   saved and restored on a task switch, e.g. FPU, DSP or NPU
>   instructions. ARM supports several incompatible extensions like
>   that in one kernel, and this is really ugly, but I suspect RISC-V
>   will already need the same thing to support all combinations of
>   standard extensions, so from a practical perspective it's not
>   much different for custom extension, aside from the question
>   how far you want to go to discourage custom extensions by
>   requiring users to patch their kernels.

Palmer already explained that this is supposed to be handled by the
XS bit + SBI calls.  I'm personally not totally sold on the SBI call
and standard ways to save the state in the instruction set, similar
to modern x86 might be a better option, but that is something the
privileged spec working group will have to decide.

> - A crypto instruction for a cipher that is used in the kernel
>   for speeding up network or block data encryption.
>   This would typically be a standalone loadable module, so
>   the impact of allowing custom extensions in addition to
>   standard ones is minimal.

And that is a prime example for something that should never be vendor
specific.  If an instruction set extension is useful for something
entirely generic it should be standardized in a working group as an
extension.


Re: [RFC 0/2] RISC-V: A proposal to add vendor-specific code

2018-11-05 Thread Christoph Hellwig
On Mon, Nov 05, 2018 at 09:39:29PM +0200, Nick Kossifidis wrote:
> a) By directly modifying your custom CSRs, it means that we will need
> compiler support in order to compile a kernel with your code in it. This
> will break CI systems and will introduce various issues on testing and
> reviewing your code. In general if we need custom toolchains to compile
> the kernel, that may be unavailable (vendors will not always open source
> their compiler support), we won't be able to maintain a decent level of
> code quality in the tree. How can the maintainer push your code on the
> repository if he/she can't even perform a basic compilation test ?

And that (besides avoiding the wild growth of extensions) is the major
reason why accepting vendor specific CSRs or instructions is a no-go.


Re: [REGRESSION] OCTEON MMC driver failure with v4.19

2018-11-05 Thread Christoph Hellwig
On Tue, Nov 06, 2018 at 12:06:32AM +0200, Aaro Koskinen wrote:
> With the below change, the MMC card probe seems to work with v4.19. But it
> feels a bit hackish, don't you think... Is there some obvious simple
> fix that I'm missing? Any comments?

Please just use dma_coerce_mask_and_coherent in the platform drivers
instead.
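
I.e. something along these lines in the probe routine (a sketch only;
the probe function name and the 64-bit mask are illustrative):

#include <linux/dma-mapping.h>
#include <linux/platform_device.h>

static int octeon_mmc_probe(struct platform_device *pdev)
{
	int ret;

	/*
	 * Platform devices often arrive without dev->dma_mask wired up;
	 * this helper forcibly points dma_mask at coherent_dma_mask and
	 * then sets both masks in one go.
	 */
	ret = dma_coerce_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
	if (ret)
		return ret;

	/* ... rest of the probe ... */
	return 0;
}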


Re: [PATCH] x86/pci: Annotate 'pci_root_ops' with __ro_after_init

2018-11-05 Thread Christoph Hellwig
On Mon, Nov 05, 2018 at 12:17:35PM -0800, Zubin Mithra wrote:
> When CONFIG_X86_INTEL_MID is set pci_root_ops is written to inside
> intel_mid_pci_init(which is marked __init) and not modified after. This
> makes pci_root_ops a suitable candidate for annotating as
> __ro_after_init.

Yikes.  pci_root_ops really should be const to start with, and we
should use a separate struct for MID.  Having structures that contain
function pointers not marked const is generally a bad idea to start with.
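
Roughly this pattern, that is (a sketch; the MID accessor names are
invented, the point is the const qualifier plus a separate instance
instead of patching the shared structure at runtime):

static const struct pci_ops pci_root_ops = {
	.read	= pci_read,
	.write	= pci_write,
};

/* hypothetical separate instance for Intel MID */
static const struct pci_ops intel_mid_pci_root_ops = {
	.read	= intel_mid_pci_read,
	.write	= intel_mid_pci_write,
};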


Re: [RFC 0/2] RISC-V: A proposal to add vendor-specific code

2018-11-05 Thread Christoph Hellwig
On Mon, Nov 05, 2018 at 09:52:52AM +0100, Arnd Bergmann wrote:
> > I fundamentally disagree with this… and think it should be the contrary.
> >
> > 1. The kernel shall support no vendor specific instructions whatsoever,
> > period.
> 
> I think what was meant above is
> 
> 1. If a vendor extension requires kernel support, that support
> must be able to be built into a kernel image without breaking support
> for CPUs that do not have that extension, to allow building a single
> kernel image that works on all CPUs.

No.  This literally means no vendor extensions involving instructions
or CSRs in the kernel.  They are fine for userspace, or for the M-mode
code including implementation of the SBI, but not for the kernel.


Re: [RFC 0/2] RISC-V: A proposal to add vendor-specific code

2018-11-04 Thread Christoph Hellwig
On Mon, Nov 05, 2018 at 02:58:07PM +0800, Vincent Chen wrote:
> Many thanks for kinds of comments. I quickly synthesize the comments and
> list them as below.
> 1. The kernel image shall include all vendor-specific code.

I fundamentally disagree with this… and think it should be the contrary.

1. The kernel shall support no vendor specific instructions whatsoever,
period.


Re: [RFC 0/2] RISC-V: A proposal to add vendor-specific code

2018-10-31 Thread Christoph Hellwig
On Wed, Oct 31, 2018 at 04:46:10PM +0530, Anup Patel wrote:
> I agree that we need a place for vendor-specific ISA extensions and
> having vendor-specific directories is also good.

The only sensible answer is that we should not allow vendor specific
extensions in the kernel at all.  We need to standardize cache flushing
and we need to do it soon and not introduce horrible bandaids because
the foundation did not do its work.


Re: [PATCH] fs: fix lost error code in dio_complete

2018-10-30 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH 2/2] x86/ldt: Unmap PTEs for the slow before freeing LDT

2018-10-24 Thread Christoph Hellwig
The subject line does not parse..


[PATCH] parisc: remove the dead ccio-rm-dma driver

2018-10-22 Thread Christoph Hellwig
This driver has never been wired up during the whole life of the Linux
git tree, and has severely bitrotted.

Signed-off-by: Christoph Hellwig 
---
 drivers/parisc/Makefile  |   3 -
 drivers/parisc/ccio-rm-dma.c | 202 ---
 2 files changed, 205 deletions(-)
 delete mode 100644 drivers/parisc/ccio-rm-dma.c

diff --git a/drivers/parisc/Makefile b/drivers/parisc/Makefile
index 3cd5e6cb8478..99fa6a89e0b9 100644
--- a/drivers/parisc/Makefile
+++ b/drivers/parisc/Makefile
@@ -8,9 +8,6 @@
 obj-$(CONFIG_IOSAPIC)  += iosapic.o
 obj-$(CONFIG_IOMMU_SBA)+= sba_iommu.o
 obj-$(CONFIG_PCI_LBA)  += lba_pci.o
-
-# Only use one of them: ccio-rm-dma is for PCX-W systems *only*
-# obj-$(CONFIG_IOMMU_CCIO) += ccio-rm-dma.o
 obj-$(CONFIG_IOMMU_CCIO)   += ccio-dma.o
 
 obj-$(CONFIG_GSC)  += gsc.o
diff --git a/drivers/parisc/ccio-rm-dma.c b/drivers/parisc/ccio-rm-dma.c
deleted file mode 100644
index df7932af48b7..
--- a/drivers/parisc/ccio-rm-dma.c
+++ /dev/null
@@ -1,202 +0,0 @@
-/*
- * ccio-rm-dma.c:
- * DMA management routines for first generation cache-coherent machines.
- * "Real Mode" operation refers to U2/Uturn chip operation. The chip
- *  can perform coherency checks w/o using the I/O MMU. That's all we
- *  need until support for more than 4GB phys mem is needed.
- * 
- * This is the trivial case - basically what x86 does.
- *
- * Drawbacks of using Real Mode are:
- * o outbound DMA is slower since one isn't using the prefetching
- *   U2 can do for outbound DMA.
- * o Ability to do scatter/gather in HW is also lost.
- *  o only known to work with PCX-W processor. (eg C360)
- *(PCX-U/U+ are not coherent with U2 in real mode.)
- *
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- *
- * Original version/author:
- *  CVSROOT=:pserver:anonymous@198.186.203.37:/cvsroot/linux-parisc
- *  cvs -z3 co linux/arch/parisc/kernel/dma-rm.c
- *
- * (C) Copyright 2000 Philipp Rumpf 
- *
- *
- * Adopted for The Puffin Group's parisc-linux port by Grant Grundler.
- * (C) Copyright 2000 Grant Grundler 
- * 
- */
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include 
-
-#include 
-#include 
-#include 
-
-/* Only chose "ccio" since that's what HP-UX calls it
-** Make it easier for folks to migrate from one to the other :^)
-*/
-#define MODULE_NAME "ccio"
-
-#define U2_IOA_RUNWAY 0x580
-#define U2_BC_GSC 0x501
-#define UTURN_IOA_RUNWAY 0x581
-#define UTURN_BC_GSC 0x502
-
-#define IS_U2(id) ( \
-(((id)->hw_type == HPHW_IOA) && ((id)->hversion == U2_IOA_RUNWAY)) || \
-(((id)->hw_type == HPHW_BCPORT) && ((id)->hversion == U2_BC_GSC))  \
-)
-
-#define IS_UTURN(id) ( \
-(((id)->hw_type == HPHW_IOA) && ((id)->hversion == UTURN_IOA_RUNWAY)) || \
-(((id)->hw_type == HPHW_BCPORT) && ((id)->hversion == UTURN_BC_GSC))  \
-)
-
-static int ccio_dma_supported( struct pci_dev *dev, u64 mask)
-{
-   if (dev == NULL) {
-   printk(KERN_ERR MODULE_NAME ": EISA/ISA/et al not supported\n");
-   BUG();
-   return(0);
-   }
-
-   /* only support 32-bit devices (ie PCI/GSC) */
-   return((int) (mask >= 0xffffffffUL));
-}
-
-
-static void *ccio_alloc_consistent(struct pci_dev *dev, size_t size,
-dma_addr_t *handle)
-{
-   void *ret;
-   
-   ret = (void *)__get_free_pages(GFP_ATOMIC, get_order(size));
-
-   if (ret != NULL) {
-   memset(ret, 0, size);
-   *handle = virt_to_phys(ret);
-   }
-   return ret;
-}
-   
-static void ccio_free_consistent(struct pci_dev *dev, size_t size,
-  void *vaddr, dma_addr_t handle)
-{
-   free_pages((unsigned long)vaddr, get_order(size));
-}
-
-static dma_addr_t ccio_map_single(struct pci_dev *dev, void *ptr, size_t size,
- int direction)
-{
-   return virt_to_phys(ptr);
-}
-
-static void ccio_unmap_single(struct pci_dev *dev, dma_addr_t dma_addr,
-   size_t size, int direction)
-{
-   /* Nothing to do */
-}
-
-
-static int ccio_map_sg(struct pci_dev *dev, struct scatterlist *sglist, int 
nents, int direction)
-{
-   int tmp = nents;
-
-/* KISS: map each buffer separately. */
-   while (nents) {
-   sg_dma_address(sglist) = ccio_map_single(dev, sglist->address, 
sglist->length, direction);
-   sg_dma_len(sglist) = sglist->length;
-   nents--;
-   sglist++;
-   }
-
-   return tmp;
-}
-
-
-static void ccio_unmap_s

[tip:x86/urgent] x86/swiotlb: Enable swiotlb for > 4GiG RAM on 32-bit kernels

2018-10-18 Thread tip-bot for Christoph Hellwig
Commit-ID:  485734f3fc77c1eb77ffe138c027b9a4bf0178f3
Gitweb: https://git.kernel.org/tip/485734f3fc77c1eb77ffe138c027b9a4bf0178f3
Author: Christoph Hellwig 
AuthorDate: Sun, 14 Oct 2018 09:52:08 +0200
Committer:  Ingo Molnar 
CommitDate: Fri, 19 Oct 2018 07:49:32 +0200

x86/swiotlb: Enable swiotlb for > 4GiG RAM on 32-bit kernels

We already build the swiotlb code for 32-bit kernels with PAE support,
but the code to actually use swiotlb has only been enabled for 64-bit
kernels for an unknown reason.

Before Linux v4.18 we papered over this fact because the networking code,
the SCSI layer and some random block drivers implemented their own
bounce buffering scheme.

[ mingo: Changelog fixes. ]

Fixes: 21e07dba9fb1 ("scsi: reduce use of block bounce buffers")
Fixes: ab74cfebafa3 ("net: remove the PCI_DMA_BUS_IS_PHYS check in 
illegal_highdma")
Reported-by: Matthew Whitehead 
Signed-off-by: Christoph Hellwig 
Signed-off-by: Thomas Gleixner 
Tested-by: Matthew Whitehead 
Cc: konrad.w...@oracle.com
Cc: io...@lists.linux-foundation.org
Cc: sta...@vger.kernel.org
Link: https://lkml.kernel.org/r/20181014075208.2715-1-...@lst.de
Signed-off-by: Ingo Molnar 
---
 arch/x86/kernel/pci-swiotlb.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
index 661583662430..71c0b01d93b1 100644
--- a/arch/x86/kernel/pci-swiotlb.c
+++ b/arch/x86/kernel/pci-swiotlb.c
@@ -42,10 +42,8 @@ IOMMU_INIT_FINISH(pci_swiotlb_detect_override,
 int __init pci_swiotlb_detect_4gb(void)
 {
/* don't initialize swiotlb if iommu=off (no_iommu=1) */
-#ifdef CONFIG_X86_64
if (!no_iommu && max_possible_pfn > MAX_DMA32_PFN)
swiotlb = 1;
-#endif
 
/*
 * If SME is active then swiotlb will be set to 1 so that bounce


[PATCH] mtd: rawnand: r852: use generic DMA API

2018-10-18 Thread Christoph Hellwig
Use the generic DMA API instead of the legacy PCI DMA API.

Signed-off-by: Christoph Hellwig 
---
 drivers/mtd/nand/raw/r852.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/mtd/nand/raw/r852.c b/drivers/mtd/nand/raw/r852.c
index 39be65b35ac25..35f0b343cf908 100644
--- a/drivers/mtd/nand/raw/r852.c
+++ b/drivers/mtd/nand/raw/r852.c
@@ -151,8 +151,9 @@ static void r852_dma_done(struct r852_device *dev, int 
error)
dev->dma_stage = 0;
 
if (dev->phys_dma_addr && dev->phys_dma_addr != dev->phys_bounce_buffer)
-   pci_unmap_single(dev->pci_dev, dev->phys_dma_addr, R852_DMA_LEN,
-   dev->dma_dir ? PCI_DMA_FROMDEVICE : PCI_DMA_TODEVICE);
+   dma_unmap_single(&dev->pci_dev->dev, dev->phys_dma_addr,
+   R852_DMA_LEN,
+   dev->dma_dir ? DMA_FROM_DEVICE : DMA_TO_DEVICE);
 }
 
 /*
@@ -197,11 +198,10 @@ static void r852_do_dma(struct r852_device *dev, uint8_t 
*buf, int do_read)
bounce = 1;
 
if (!bounce) {
-   dev->phys_dma_addr = pci_map_single(dev->pci_dev, (void *)buf,
+   dev->phys_dma_addr = dma_map_single(&dev->pci_dev->dev, buf,
R852_DMA_LEN,
-   (do_read ? PCI_DMA_FROMDEVICE : PCI_DMA_TODEVICE));
-
-   if (pci_dma_mapping_error(dev->pci_dev, dev->phys_dma_addr))
+   do_read ? DMA_FROM_DEVICE : DMA_TO_DEVICE);
+   if (dma_mapping_error(&dev->pci_dev->dev, dev->phys_dma_addr))
bounce = 1;
}
 
@@ -835,7 +835,7 @@ static int  r852_probe(struct pci_dev *pci_dev, const 
struct pci_device_id *id)
 
pci_set_master(pci_dev);
 
-   error = pci_set_dma_mask(pci_dev, DMA_BIT_MASK(32));
+   error = dma_set_mask(&pci_dev->dev, DMA_BIT_MASK(32));
if (error)
goto error2;
 
@@ -885,8 +885,8 @@ static int  r852_probe(struct pci_dev *pci_dev, const 
struct pci_device_id *id)
dev->pci_dev = pci_dev;
pci_set_drvdata(pci_dev, dev);
 
-   dev->bounce_buffer = pci_alloc_consistent(pci_dev, R852_DMA_LEN,
-   &dev->phys_bounce_buffer);
+   dev->bounce_buffer = dma_alloc_coherent(&pci_dev->dev, R852_DMA_LEN,
+   &dev->phys_bounce_buffer, GFP_KERNEL);
 
if (!dev->bounce_buffer)
goto error6;
@@ -946,8 +946,8 @@ static int  r852_probe(struct pci_dev *pci_dev, const 
struct pci_device_id *id)
 error8:
pci_iounmap(pci_dev, dev->mmio);
 error7:
-   pci_free_consistent(pci_dev, R852_DMA_LEN,
-   dev->bounce_buffer, dev->phys_bounce_buffer);
+   dma_free_coherent(&pci_dev->dev, R852_DMA_LEN, dev->bounce_buffer,
+ dev->phys_bounce_buffer);
 error6:
kfree(dev);
 error5:
@@ -980,8 +980,8 @@ static void r852_remove(struct pci_dev *pci_dev)
/* Cleanup */
kfree(dev->tmp_buffer);
pci_iounmap(pci_dev, dev->mmio);
-   pci_free_consistent(pci_dev, R852_DMA_LEN,
-   dev->bounce_buffer, dev->phys_bounce_buffer);
+   dma_free_coherent(&pci_dev->dev, R852_DMA_LEN, dev->bounce_buffer,
+ dev->phys_bounce_buffer);
 
kfree(dev->chip);
kfree(dev);
-- 
2.19.1



Re: [PATCH] ata: add Buddha PATA controller driver

2018-10-18 Thread Christoph Hellwig
> +static int __init pata_buddha_init_one(void)
> +{
> + struct zorro_dev *z = NULL;
> +
> + while ((z = zorro_find_device(ZORRO_WILDCARD, z))) {

I'm not really an m68k expert, but shouldn't this implement a
struct zorro_driver instead?  (or maybe two of them as xsurf
seems sufficiently different).


Re: [PATCH v2 1/2] mm: Add an F_SEAL_FS_WRITE seal to memfd

2018-10-17 Thread Christoph Hellwig
On Wed, Oct 17, 2018 at 08:44:01AM -0700, Daniel Colascione wrote:
> > Even if no one changes these specific flags we still need a lock due
> > to rmw cycles on the field.  For example fadvise can set or clear
> > FMODE_RANDOM.  It seems to use file->f_lock for synchronization.
> 
> Compare-and-exchange will suffice, right?

Only if all users use the compare and exchange, and right now they
don't.


Re: [PATCH v2 1/2] mm: Add an F_SEAL_FS_WRITE seal to memfd

2018-10-17 Thread Christoph Hellwig
On Wed, Oct 17, 2018 at 03:39:58AM -0700, Joel Fernandes wrote:
> > > This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
> > > To support the usecase, this patch adds a new F_SEAL_FS_WRITE seal which
> > > prevents any future mmap and write syscalls from succeeding while
> > > keeping the existing mmap active. The following program shows the seal
> > > working in action:
> > 
> > Where does the FS come from?  I'd rather expect this to be implemented
> > as a 'force' style flag that applies the seal even if the otherwise
> > required precondition is not met.
> 
> The "FS" was meant to convey that the seal is preventing writes at the VFS
> layer itself, for example vfs_write checks FMODE_WRITE and does not proceed,
> it instead returns an error if the flag is not set. I could not find a better
> name for it, I could call it F_SEAL_VFS_WRITE if you prefer?

I don't think there is anything VFS or FS about that - at best that
is an implementation detail.

Either do something like the force flag I suggested in the last mail,
or give it a name that matches the intention, e.g F_SEAL_FUTURE_WRITE.

> I could make it such that this seal would not be allowed unless F_SEAL_SHRINK
> and F_SEAL_GROW are either previously set, or they are passed along with this
> seal. Would that make more sense to you?

Yes.

> > >  static int memfd_add_seals(struct file *file, unsigned int seals)
> > >  {
> > > @@ -219,6 +220,9 @@ static int memfd_add_seals(struct file *file, 
> > > unsigned int seals)
> > >   }
> > >   }
> > >  
> > > + if ((seals & F_SEAL_FS_WRITE) && !(*file_seals & F_SEAL_FS_WRITE))
> > > + file->f_mode &= ~(FMODE_WRITE | FMODE_PWRITE);
> > > +
> > 
> > This seems to lack any synchronization for f_mode.
> 
> The f_mode is set when the struct file is first created and then memfd sets
> additional flags in memfd_create. Then later we are changing it here at the
> time of setting the seal. I do not see any possibility of a race since it is
> impossible to set the seal before memfd_create returns. Could you provide
> more details about what kind of synchronization is needed and what is the
> race condition scenario you were thinking of?

Even if no one changes these specific flags we still need a lock due
to rmw cycles on the field.  For example fadvise can set or clear
FMODE_RANDOM.  It seems to use file->f_lock for synchronization.
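
For the archives, the fadvise-style locking for the f_mode update in
the patch would look roughly like this (the helper name is made up):

#include <linux/fs.h>
#include <linux/spinlock.h>

/* f_mode updates are read-modify-write cycles, so take file->f_lock
 * just like mm/fadvise.c does for FMODE_RANDOM */
static void memfd_drop_write_mode(struct file *file)
{
	spin_lock(&file->f_lock);
	file->f_mode &= ~(FMODE_WRITE | FMODE_PWRITE);
	spin_unlock(&file->f_lock);
}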


Re: [PATCH v2 1/2] mm: Add an F_SEAL_FS_WRITE seal to memfd

2018-10-17 Thread Christoph Hellwig
On Tue, Oct 09, 2018 at 03:20:41PM -0700, Joel Fernandes (Google) wrote:
> One of the main usecases Android has is the ability to create a region
> and mmap it as writeable, then drop its protection for "future" writes
> while keeping the existing already mmap'ed writeable-region active.

s/drop/add/ ?

Otherwise this doesn't make much sense to me.

> This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
> To support the usecase, this patch adds a new F_SEAL_FS_WRITE seal which
> prevents any future mmap and write syscalls from succeeding while
> keeping the existing mmap active. The following program shows the seal
> working in action:

Where does the FS come from?  I'd rather expect this to be implemented
as a 'force' style flag that applies the seal even if the otherwise
required precondition is not met.

> Note: This seal will also prevent growing and shrinking of the memfd.
> This is not something we do in Android so it does not affect us, however
> I have mentioned this behavior of the seal in the manpage.

This seems odd, as that is otherwise split into the F_SEAL_SHRINK /
F_SEAL_GROW flags.

>  static int memfd_add_seals(struct file *file, unsigned int seals)
>  {
> @@ -219,6 +220,9 @@ static int memfd_add_seals(struct file *file, unsigned 
> int seals)
>   }
>   }
>  
> + if ((seals & F_SEAL_FS_WRITE) && !(*file_seals & F_SEAL_FS_WRITE))
> + file->f_mode &= ~(FMODE_WRITE | FMODE_PWRITE);
> +

This seems to lack any synchronization for f_mode.


Re: rdma-core doesn't install driver.h, broke libibscif

2018-10-17 Thread Christoph Hellwig
On Mon, Oct 15, 2018 at 05:53:44PM +, Woodruff, Robert J wrote:
> James Harvey wrote,
> 
> >Short: Is libibscif dead, and should OS repositories remove it?
> 
> Libibscif is for an old product,  Intel's KNC, that is no longer sold and the 
> S/W is frozen. The open source libibscif is no longer maintained. 
> It has already been removed from the latest community OFED distribution. The 
> individual package on the OFA downloads is for archive purposed only.

Does this mean we can kill the kernel scif code as well?  It is a
pretty horrible codebase that gets in the way a lot (and hasn't really
seen updates since the initial merge).


Re: [PATCH] xfs: Fix error code in 'xfs_ioc_getbmap()'

2018-10-17 Thread Christoph Hellwig
On Wed, Oct 17, 2018 at 08:21:38AM +0200, Christophe JAILLET wrote:
> In this function, once 'buf' has been allocated, we unconditionally
> return 0.
> However, 'error' is set to some error codes in several error handling
> paths.
> Before commit 232b51948b99 ("xfs: simplify the xfs_getbmap interface")
> this was not an issue because all error paths were returning directly,
> but now that some cleanup at the end may be needed, we must propagate the
> error code.
> 
> Fixes: 232b51948b99 ("xfs: simplify the xfs_getbmap interface")
> Signed-off-by: Christophe JAILLET 

Looks good:

Reviewed-by: Christoph Hellwig 


Re: [PATCH 00/11] staging: gasket: fixes

2018-10-15 Thread Christoph Hellwig
On Sun, Oct 14, 2018 at 09:59:16PM -0700, Todd Poynor wrote:
> From: Todd Poynor 
> 
> Various fixes for gasket/apex drivers.

I still can't find any explanation in the staging tree or in your
comments what gasket even is.


Re: [PATCH 1/2] LICENSES: Add note to CDDL-1.0 license that it should not be used

2018-10-15 Thread Christoph Hellwig
On Wed, Oct 10, 2018 at 03:23:07PM +0200, Thomas Gleixner wrote:
> I'd rather replace the 'GPL-2.0' with something like '$GPL-COMPATIBLE-ID'
> and say explicitly in the prose text that CDDL can only be used with OR
> and not with AND.

Maybe we should take the whole thing to a higher level, and split
the other folder into

legacy/
 - for licenses that exist but should never be used for new code

and

dual/
 - for licenses that should only be used for dual licensing with
   something from preferred/


[tip:x86/urgent] x86/swiotlb: Enable swiotlb for > 4GiG ram on 32-bit kernels

2018-10-15 Thread tip-bot for Christoph Hellwig
Commit-ID:  6f3bc8028570e4c326030e8795dbcd57c561b723
Gitweb: https://git.kernel.org/tip/6f3bc8028570e4c326030e8795dbcd57c561b723
Author: Christoph Hellwig 
AuthorDate: Sun, 14 Oct 2018 09:52:08 +0200
Committer:  Thomas Gleixner 
CommitDate: Mon, 15 Oct 2018 10:55:03 +0200

x86/swiotlb: Enable swiotlb for > 4GiG ram on 32-bit kernels

We already build the swiotlb code for 32-bit kernels with PAE support,
but the code to actually use swiotlb has only been enabled for 64-bit
kernels for an unknown reason.

Before Linux 4.18 we papered over this fact because the networking code,
the scsi layer and some random block drivers implemented their own
bounce buffering scheme.

Fixes: 21e07dba9fb1 ("scsi: reduce use of block bounce buffers")
Fixes: ab74cfebafa3 ("net: remove the PCI_DMA_BUS_IS_PHYS check in 
illegal_highdma")
Reported-by: Matthew Whitehead 
Signed-off-by: Christoph Hellwig 
Signed-off-by: Thomas Gleixner 
Tested-by: Matthew Whitehead 
Cc: konrad.w...@oracle.com
Cc: io...@lists.linux-foundation.org
Cc: sta...@vger.kernel.org
Link: https://lkml.kernel.org/r/20181014075208.2715-1-...@lst.de
---
 arch/x86/kernel/pci-swiotlb.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
index 661583662430..71c0b01d93b1 100644
--- a/arch/x86/kernel/pci-swiotlb.c
+++ b/arch/x86/kernel/pci-swiotlb.c
@@ -42,10 +42,8 @@ IOMMU_INIT_FINISH(pci_swiotlb_detect_override,
 int __init pci_swiotlb_detect_4gb(void)
 {
/* don't initialize swiotlb if iommu=off (no_iommu=1) */
-#ifdef CONFIG_X86_64
if (!no_iommu && max_possible_pfn > MAX_DMA32_PFN)
swiotlb = 1;
-#endif
 
/*
 * If SME is active then swiotlb will be set to 1 so that bounce


[PATCH] xtensa: remove ZONE_DMA

2018-10-14 Thread Christoph Hellwig
ZONE_DMA is intended for magic < 32-bit pools (usually ISA DMA), which
isn't required on xtensa.  Move all the non-highmem memory into
ZONE_NORMAL instead to match other architectures.

Signed-off-by: Christoph Hellwig 
---
 arch/xtensa/Kconfig   | 3 ---
 arch/xtensa/mm/init.c | 2 +-
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/arch/xtensa/Kconfig b/arch/xtensa/Kconfig
index 516694937b7a..9a7c654a7654 100644
--- a/arch/xtensa/Kconfig
+++ b/arch/xtensa/Kconfig
@@ -1,7 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0
-config ZONE_DMA
-   def_bool y
-
 config XTENSA
def_bool y
select ARCH_HAS_SYNC_DMA_FOR_CPU
diff --git a/arch/xtensa/mm/init.c b/arch/xtensa/mm/init.c
index 34aead7dcb48..b385e6b73065 100644
--- a/arch/xtensa/mm/init.c
+++ b/arch/xtensa/mm/init.c
@@ -71,7 +71,7 @@ void __init zones_init(void)
 {
/* All pages are DMA-able, so we put them all in the DMA zone. */
unsigned long zones_size[MAX_NR_ZONES] = {
-   [ZONE_DMA] = max_low_pfn - ARCH_PFN_OFFSET,
+   [ZONE_NORMAL] = max_low_pfn - ARCH_PFN_OFFSET,
 #ifdef CONFIG_HIGHMEM
[ZONE_HIGHMEM] = max_pfn - max_low_pfn,
 #endif
-- 
2.19.1



Re: [PATCH v2 3/6] esp_scsi: Grant disconnect privilege for untagged commands

2018-10-14 Thread Christoph Hellwig
> + *p++ = IDENTIFY(lp && (tp->flags & ESP_TGT_DISCONNECT), lun);

I think lp should always be non-NULL here.


[tip:x86/urgent] x86/swiotlb: Enable swiotlb for > 4GiG ram on 32-bit kernels

2018-10-14 Thread tip-bot for Christoph Hellwig
Commit-ID:  ab555321e4ddc7f6f17f6c80dfaad228883a8e7c
Gitweb: https://git.kernel.org/tip/ab555321e4ddc7f6f17f6c80dfaad228883a8e7c
Author: Christoph Hellwig 
AuthorDate: Sun, 14 Oct 2018 09:52:08 +0200
Committer:  Thomas Gleixner 
CommitDate: Sun, 14 Oct 2018 11:11:23 +0200

x86/swiotlb: Enable swiotlb for > 4GiG ram on 32-bit kernels

We already build the swiotlb code for 32-bit kernels with PAE support,
but the code to actually use swiotlb has only been enabled for 64-bit
kernels for an unknown reason.

Before Linux 4.18 we papered over this fact because the networking code,
the scsi layer and some random block drivers implemented their own
bounce buffering scheme.

Fixes: 21e07dba9fb1 ("scsi: reduce use of block bounce buffers")
Fixes: ab74cfebafa3 ("net: remove the PCI_DMA_BUS_IS_PHYS check in 
illegal_highdma")
Reported-by: tedheadster 
Tested-by: tedheadster 
Signed-off-by: Christoph Hellwig 
Cc: konrad.w...@oracle.com
Cc: io...@lists.linux-foundation.org
Cc: sta...@vger.kernel.org
Link: https://lkml.kernel.org/r/20181014075208.2715-1-...@lst.de

---
 arch/x86/kernel/pci-swiotlb.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
index 661583662430..71c0b01d93b1 100644
--- a/arch/x86/kernel/pci-swiotlb.c
+++ b/arch/x86/kernel/pci-swiotlb.c
@@ -42,10 +42,8 @@ IOMMU_INIT_FINISH(pci_swiotlb_detect_override,
 int __init pci_swiotlb_detect_4gb(void)
 {
/* don't initialize swiotlb if iommu=off (no_iommu=1) */
-#ifdef CONFIG_X86_64
if (!no_iommu && max_possible_pfn > MAX_DMA32_PFN)
swiotlb = 1;
-#endif
 
/*
 * If SME is active then swiotlb will be set to 1 so that bounce


Re: [PATCH 4/6] mm: introduce page->dma_pinned_flags, _count

2018-10-13 Thread Christoph Hellwig
On Sat, Oct 13, 2018 at 12:34:12AM -0700, John Hubbard wrote:
> In patch 6/6, pin_page_for_dma(), which is called at the end of 
> get_user_pages(),
> unceremoniously rips the pages out of the LRU, as a prerequisite to using
> either of the page->dma_pinned_* fields. 
> 
> The idea is that LRU is not especially useful for this situation anyway,
> so we'll just make it one or the other: either a page is dma-pinned, and
> just hanging out doing RDMA most likely (and LRU is less meaningful during 
> that
> time), or it's possibly on an LRU list.

Have you done any benchmarking what this does to direct I/O performance,
especially for small I/O directly to a (fast) block device?


Re: [PATCH 5/5] RISC-V: Implement sparsemem

2018-10-11 Thread Christoph Hellwig
> +/*
> + * Log2 of the upper bound of the size of a struct page. Used for sizing
> + * the vmemmap region only, does not affect actual memory footprint.
> + * We don't use sizeof(struct page) directly since taking its size here
> + * requires its definition to be available at this point in the inclusion
> + * chain, and it may not be a power of 2 in the first place.
> + */
> +#define STRUCT_PAGE_MAX_SHIFT6

I know this is copied from arm64, but wouldn't this be a good time
to move this next to the struct page definition?

Also this:

arch/arm64/mm/init.c:   BUILD_BUG_ON(sizeof(struct page) > (1 << 
STRUCT_PAGE_MAX_SHIFT));

should move to common code (or would have to be duplicated for riscv)

> +#define VMEMMAP_SIZE (UL(1) << (CONFIG_VA_BITS - PAGE_SHIFT - 1 + \
> +STRUCT_PAGE_MAX_SHIFT))

Might be more readable with a another define, and without abuse of the
horrible UL macro:

#define VMEMMAP_SHIFT \
(CONFIG_VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT)
#define VMEMMAP_SIZE	(1UL << VMEMMAP_SHIFT)

> +#define VMEMMAP_END  (VMALLOC_START - 1)
> +#define VMEMMAP_START(VMALLOC_START - VMEMMAP_SIZE)
> +
> +#define vmemmap  ((struct page *)VMEMMAP_START)

This could also use some comments..

> @@ -0,0 +1,11 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef __ASM_SPARSEMEM_H
> +#define __ASM_SPARSEMEM_H
> +
> +#ifdef CONFIG_SPARSEMEM
> +#define MAX_PHYSMEM_BITS CONFIG_PA_BITS
> +#define SECTION_SIZE_BITS	30
> +#endif
> +
> +#endif

For potentially wide-spanning ifdefs like inclusion headers it always
is nice to have a comment with the symbol on the endif line.
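
Concretely, applied to the header quoted above:

/* SPDX-License-Identifier: GPL-2.0 */

#ifndef __ASM_SPARSEMEM_H
#define __ASM_SPARSEMEM_H

#ifdef CONFIG_SPARSEMEM
#define MAX_PHYSMEM_BITS	CONFIG_PA_BITS
#define SECTION_SIZE_BITS	30
#endif /* CONFIG_SPARSEMEM */

#endif /* __ASM_SPARSEMEM_H */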


Re: [PATCH 1/5] mm/sparse: add common helper to mark all memblocks present

2018-10-11 Thread Christoph Hellwig
> + for_each_memblock(memory, reg) {
> + int nid = memblock_get_region_node(reg);
> +
> + memory_present(nid, memblock_region_memory_base_pfn(reg),
> +memblock_region_memory_end_pfn(reg));

Any reason you have a local variable for the node id while you happily
get away without one for the others?  Why not simply:

for_each_memblock(memory, reg) {
memory_present(memblock_get_region_node(reg),
   memblock_region_memory_base_pfn(reg),
   memblock_region_memory_end_pfn(reg));
}


Re: [PATCH 0/7] hisi_sas: Misc bugfixes and an optimisation patch

2018-10-11 Thread Christoph Hellwig
On Thu, Oct 11, 2018 at 10:59:11AM +0100, John Garry wrote:
> 
> > blk-mq tags are always per-host (which has actually caused problems for
> > ATA, which is now using its own per-device tags).
> > 
> 
> So, for example, if Scsi_host.can_queue = 2048 and Scsi_host.nr_hw_queues =
> 16, then rq tags are still in range [0, 2048) for that HBA, i.e. invariant
> on queue count?

Yes, if can_queue is 2048 you will get tags from 0..2047.

IFF your device needs different tags for different queues it can use
the blk_mq_unique_tag helper to generate a unique global tag.

But unless you actually have multiple hardware queues that latter part
is rather irrelevant to start with.
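
A sketch of how that looks from a LLDD (the helper name and the debug
print are illustrative only):

#include <linux/blk-mq.h>
#include <linux/printk.h>

/* build a host-wide unique tag even though blk-mq hands out
 * per-hw-queue tag values */
static u32 my_sas_build_iptt(struct request *rq)
{
	u32 unique = blk_mq_unique_tag(rq);

	/* both halves can be recovered again at completion time */
	pr_debug("hwq %u, per-queue tag %u\n",
		 blk_mq_unique_tag_to_hwq(unique),
		 blk_mq_unique_tag_to_tag(unique));

	return unique;
}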


Re: [PATCH 0/7] hisi_sas: Misc bugfixes and an optimisation patch

2018-10-11 Thread Christoph Hellwig
On Wed, Oct 10, 2018 at 09:58:25PM -0400, Martin K. Petersen wrote:
> > This is because the IPTT index must be a unique value per HBA. However,
> > if we switched to SCSI MQ, the block layer tag becomes unique per queue,
> > and not per HBA.
> 
> That doesn't sound right.

blk-mq tags are always per-host (which has actually caused problems for
ATA, which is now using its own per-device tags).


Re: [RFC 4/4] gpio: sifive: Add GPIO driver for SiFive SoCs

2018-10-10 Thread Christoph Hellwig
On Wed, Oct 10, 2018 at 03:01:29PM +0200, Andreas Schwab wrote:
> On Okt 09 2018, Atish Patra  wrote:
> 
> > +static void sifive_set_ie(struct sifive_gpio *chip, unsigned int offset)
> > +{
> > +   unsigned long flags;
> > +   unsigned int trigger;
> > +
> +   raw_spin_lock_irqsave(&chip->lock, flags);
> > +   trigger = (chip->enabled & BIT(offset)) ? chip->trigger[offset] : 0;
> 
> This should use test_bit instead.

Given that this apparently needs the spinlock for atomicity with more than
just the bitmap, test_bit would be rather pointless.


Re: [RFC 2/4] pwm: sifive: Add a driver for SiFive SoC PWM

2018-10-10 Thread Christoph Hellwig
Thanks for getting these drivers submitted upstream!

I don't really know anything about PWM, so just some random nitpicking
below..

> + iowrite32(frac, pwm->regs + REG_PWMCMP0 + (dev->hwpwm * SIZE_PWMCMP));

* already has a higher precedence than +, so no need for the inner
braces.

> + duty = ioread32(pwm->regs + REG_PWMCMP0 + (dev->hwpwm * SIZE_PWMCMP));

Same here.

> + /* (1 << (16+scale)) * 10^9/rate = real_period */
> + unsigned long scalePow = (pwm->approx_period * (u64)rate) / 1000000000;

no camel case, please.

> + int scale = ilog2(scalePow) - 16;
> +
> + scale = clamp(scale, 0, 0xf);

Why not:

int scale = clamp(ilog2(scale_pow) - 16, 0, 0xf);

> +static int sifive_pwm_clock_notifier(struct notifier_block *nb,
> +  unsigned long event, void *data)
> +{
> + struct clk_notifier_data *ndata = data;
> + struct sifive_pwm_device *pwm = container_of(nb,
> +  struct sifive_pwm_device,
> +  notifier);

I don't think there are any guidelines, but I always prefer to just move
the whole container_of onto a new line:

struct sifive_pwm_device *pwm =
container_of(nb, struct sifive_pwm_device, notifier);

> +static struct platform_driver sifive_pwm_driver = {
> + .probe = sifive_pwm_probe,
> + .remove = sifive_pwm_remove,
> + .driver = {
> + .name = "pwm-sifivem",
> + .of_match_table = of_match_ptr(sifive_pwm_of_match),
> + },
> +};

What about using tabs to align this a little more nicely?

static struct platform_driver sifive_pwm_driver = {
.probe  = sifive_pwm_probe,
.remove = sifive_pwm_remove,
.driver = {
.name   = "pwm-sifivem",
.of_match_table = of_match_ptr(sifive_pwm_of_match),
},
};


[PATCH 1/7] fore200e: simplify fore200e_bus usage

2018-10-09 Thread Christoph Hellwig
There is no need to have a global array of the ops, instead PCI and sbus
can have their own instances assigned in *_probe.  Also switch to C99
initializers.

Signed-off-by: Christoph Hellwig 
---
 drivers/atm/fore200e.c | 121 +++--
 1 file changed, 56 insertions(+), 65 deletions(-)

diff --git a/drivers/atm/fore200e.c b/drivers/atm/fore200e.c
index 99a38115b0a8..008bd8541c61 100644
--- a/drivers/atm/fore200e.c
+++ b/drivers/atm/fore200e.c
@@ -106,7 +106,6 @@
 
 
 static const struct atmdev_ops   fore200e_ops;
-static const struct fore200e_bus fore200e_bus[];
 
 static LIST_HEAD(fore200e_boards);
 
@@ -664,9 +663,31 @@ fore200e_pca_proc_read(struct fore200e* fore200e, char 
*page)
   pci_dev->bus->number, PCI_SLOT(pci_dev->devfn), 
PCI_FUNC(pci_dev->devfn));
 }
 
+static const struct fore200e_bus fore200e_pci_ops = {
+   .model_name = "PCA-200E",
+   .proc_name  = "pca200e",
+   .descr_alignment= 32,
+   .buffer_alignment   = 4,
+   .status_alignment   = 32,
+   .read   = fore200e_pca_read,
+   .write  = fore200e_pca_write,
+   .dma_map= fore200e_pca_dma_map,
+   .dma_unmap  = fore200e_pca_dma_unmap,
+   .dma_sync_for_cpu   = fore200e_pca_dma_sync_for_cpu,
+   .dma_sync_for_device= fore200e_pca_dma_sync_for_device,
+   .dma_chunk_alloc= fore200e_pca_dma_chunk_alloc,
+   .dma_chunk_free = fore200e_pca_dma_chunk_free,
+   .configure  = fore200e_pca_configure,
+   .map= fore200e_pca_map,
+   .reset  = fore200e_pca_reset,
+   .prom_read  = fore200e_pca_prom_read,
+   .unmap  = fore200e_pca_unmap,
+   .irq_check  = fore200e_pca_irq_check,
+   .irq_ack= fore200e_pca_irq_ack,
+   .proc_read  = fore200e_pca_proc_read,
+};
 #endif /* CONFIG_PCI */
 
-
 #ifdef CONFIG_SBUS
 
 static u32 fore200e_sba_read(volatile u32 __iomem *addr)
@@ -855,8 +876,32 @@ static int fore200e_sba_proc_read(struct fore200e 
*fore200e, char *page)
return sprintf(page, "   SBUS slot/device:\t\t%d/'%s'\n",
   (regs ? regs->which_io : 0), op->dev.of_node->name);
 }
-#endif /* CONFIG_SBUS */
 
+static const struct fore200e_bus fore200e_sbus_ops = {
+   .model_name = "SBA-200E",
+   .proc_name  = "sba200e",
+   .descr_alignment= 32,
+   .buffer_alignment   = 64,
+   .status_alignment   = 32,
+   .read   = fore200e_sba_read,
+   .write  = fore200e_sba_write,
+   .dma_map= fore200e_sba_dma_map,
+   .dma_unmap  = fore200e_sba_dma_unmap,
+   .dma_sync_for_cpu   = fore200e_sba_dma_sync_for_cpu,
+   .dma_sync_for_device= fore200e_sba_dma_sync_for_device,
+   .dma_chunk_alloc= fore200e_sba_dma_chunk_alloc,
+   .dma_chunk_free = fore200e_sba_dma_chunk_free,
+   .configure  = fore200e_sba_configure,
+   .map= fore200e_sba_map,
+   .reset  = fore200e_sba_reset,
+   .prom_read  = fore200e_sba_prom_read,
+   .unmap  = fore200e_sba_unmap,
+   .irq_enable = fore200e_sba_irq_enable,
+   .irq_check  = fore200e_sba_irq_check,
+   .irq_ack= fore200e_sba_irq_ack,
+   .proc_read  = fore200e_sba_proc_read,
+};
+#endif /* CONFIG_SBUS */
 
 static void
 fore200e_tx_irq(struct fore200e* fore200e)
@@ -2631,7 +2676,6 @@ static const struct of_device_id fore200e_sba_match[];
 static int fore200e_sba_probe(struct platform_device *op)
 {
const struct of_device_id *match;
-   const struct fore200e_bus *bus;
struct fore200e *fore200e;
static int index = 0;
int err;
@@ -2639,18 +2683,17 @@ static int fore200e_sba_probe(struct platform_device 
*op)
match = of_match_device(fore200e_sba_match, &op->dev);
if (!match)
return -EINVAL;
-   bus = match->data;
 
fore200e = kzalloc(sizeof(struct fore200e), GFP_KERNEL);
if (!fore200e)
return -ENOMEM;
 
-   fore200e->bus = bus;
+   fore200e->bus = &fore200e_sbus_ops;
fore200e->bus_dev = op;
fore200e->irq = op->archdata.irqs[0];
fore200e->phys_base = op->resource[0].start;
 
-   sprintf(fore200e->name, "%s-%d", bus->model_name, index);
+   sprintf(fore200e->name, "SBA-200E-%d", index);
 
err = fore200e_init(fore200e, &op->dev);
if (err < 0) {
@@ -2678,7 +2721,6 @@ static int fore200e_sba_remove(struct platform_device *op)
 static const st

fore200e DMA cleanups and fixes

2018-10-09 Thread Christoph Hellwig
The fore200e driver came up during some dma-related audits, so
here is the fallout.  Compile tested (x86 & sparc) only.


[PATCH 02/33] powerpc/dma: remove the unused ARCH_HAS_DMA_MMAP_COHERENT define

2018-10-09 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
Acked-by: Benjamin Herrenschmidt 
---
 arch/powerpc/include/asm/dma-mapping.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/powerpc/include/asm/dma-mapping.h 
b/arch/powerpc/include/asm/dma-mapping.h
index 8fa394520af6..f2a4a7142b1e 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -112,7 +112,5 @@ extern int dma_set_mask(struct device *dev, u64 dma_mask);
 
 extern u64 __dma_get_required_mask(struct device *dev);
 
-#define ARCH_HAS_DMA_MMAP_COHERENT
-
 #endif /* __KERNEL__ */
 #endif /* _ASM_DMA_MAPPING_H */
-- 
2.19.0



use generic DMA mapping code in powerpc V3

2018-10-09 Thread Christoph Hellwig
Hi all,

this series switches the powerpc port to use the generic swiotlb and
noncoherent dma ops, and to use more generic code for the coherent
direct mapping, as well as removing a lot of dead code.

The changes since v1 are too big to list and v2 was not posted in public.

As this series is very large and depends on the dma-mapping tree I've
also published a git tree:

git://git.infradead.org/users/hch/misc.git powerpc-dma.3

Gitweb:


http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/powerpc-dma.3


[PATCH 14/33] powerpc/dart: use the generic iommu bypass code

2018-10-09 Thread Christoph Hellwig
Use the generic iommu bypass code instead of overriding set_dma_mask.
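
For readers who have not seen the generic code: the controller now only
has to answer a yes/no question, and the common code picks the mapping
path.  A rough standalone sketch of that shape (names and types are
simplified and illustrative, not the actual powerpc implementation):

#include <stdbool.h>
#include <stdint.h>

typedef uint64_t u64;

struct dev_sketch {
	bool on_u4_pcie;	/* stand-in for dart_device_on_pcie() */
};

struct controller_ops_sketch {
	/* platform hook: may this device bypass the iommu for @mask? */
	bool (*iommu_bypass_supported)(struct dev_sketch *dev, u64 mask);
};

/*
 * Generic side: a single decision point replaces per-platform
 * dma_set_mask overrides like the dart_dma_set_mask() removed below.
 */
static bool can_bypass(const struct controller_ops_sketch *ops,
		       struct dev_sketch *dev, u64 mask)
{
	return ops->iommu_bypass_supported &&
	       ops->iommu_bypass_supported(dev, mask);
}

/* platform side, mirroring the iommu_bypass_supported_dart added below */
static bool dart_bypass_sketch(struct dev_sketch *dev, u64 mask)
{
	return dev->on_u4_pcie;
}

int main(void)
{
	struct controller_ops_sketch ops = {
		.iommu_bypass_supported = dart_bypass_sketch,
	};
	struct dev_sketch dev = { .on_u4_pcie = true };

	return can_bypass(&ops, &dev, ~0ULL) ? 0 : 1;
}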

Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/sysdev/dart_iommu.c | 45 +++-
 1 file changed, 15 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/sysdev/dart_iommu.c b/arch/powerpc/sysdev/dart_iommu.c
index ce5dd2048f57..e7d1645a2d2e 100644
--- a/arch/powerpc/sysdev/dart_iommu.c
+++ b/arch/powerpc/sysdev/dart_iommu.c
@@ -360,13 +360,6 @@ static void iommu_table_dart_setup(void)
set_bit(iommu_table_dart.it_size - 1, iommu_table_dart.it_map);
 }
 
-static void pci_dma_dev_setup_dart(struct pci_dev *dev)
-{
-   if (dart_is_u4)
-   set_dma_offset(&dev->dev, DART_U4_BYPASS_BASE);
-   set_iommu_table_base(&dev->dev, &iommu_table_dart);
-}
-
 static void pci_dma_bus_setup_dart(struct pci_bus *bus)
 {
if (!iommu_table_dart_inited) {
@@ -390,27 +383,16 @@ static bool dart_device_on_pcie(struct device *dev)
return false;
 }
 
-static int dart_dma_set_mask(struct device *dev, u64 dma_mask)
+static void pci_dma_dev_setup_dart(struct pci_dev *dev)
 {
-   if (!dev->dma_mask || !dma_supported(dev, dma_mask))
-   return -EIO;
-
-   /* U4 supports a DART bypass, we use it for 64-bit capable
-* devices to improve performances. However, that only works
-* for devices connected to U4 own PCIe interface, not bridged
-* through hypertransport. We need the device to support at
-* least 40 bits of addresses.
-*/
-   if (dart_device_on_pcie(dev) && dma_mask >= DMA_BIT_MASK(40)) {
-   dev_info(dev, "Using 64-bit DMA iommu bypass\n");
-   set_dma_ops(dev, &dma_nommu_ops);
-   } else {
-   dev_info(dev, "Using 32-bit DMA via iommu\n");
-   set_dma_ops(dev, &dma_iommu_ops);
-   }
+   if (dart_is_u4 && dart_device_on_pcie(&dev->dev))
+   set_dma_offset(&dev->dev, DART_U4_BYPASS_BASE);
+   set_iommu_table_base(&dev->dev, &iommu_table_dart);
+}
 
-   *dev->dma_mask = dma_mask;
-   return 0;
+static bool iommu_bypass_supported_dart(struct pci_dev *dev, u64 mask)
+{
+   return dart_is_u4 && dart_device_on_pcie(&dev->dev);
 }
 
 void __init iommu_init_early_dart(struct pci_controller_ops *controller_ops)
@@ -430,12 +412,15 @@ void __init iommu_init_early_dart(struct pci_controller_ops *controller_ops)
if (dart_init(dn) != 0)
return;
 
-   /* Setup bypass if supported */
-   if (dart_is_u4)
-   ppc_md.dma_set_mask = dart_dma_set_mask;
-
+   /*
+* U4 supports a DART bypass, we use it for 64-bit capable devices to
+* improve performance.  However, that only works for devices connected
+* to the U4 own PCIe interface, not bridged through hypertransport.
+* We need the device to support at least 40 bits of addresses.
+*/
controller_ops->dma_dev_setup = pci_dma_dev_setup_dart;
controller_ops->dma_bus_setup = pci_dma_bus_setup_dart;
+   controller_ops->iommu_bypass_supported = iommu_bypass_supported_dart;
 
/* Setup pci_dma ops */
	set_pci_dma_ops(&dma_iommu_ops);
-- 
2.19.0



[PATCH 13/33] powerpc/dart: remove dead cleanup code in iommu_init_early_dart

2018-10-09 Thread Christoph Hellwig
If dart_init failed we didn't have a chance to set up dma or controller
ops yet, so there is no point in resetting them.
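
The control-flow change, reduced to its shape (a sketch with
illustrative names, not the real functions):

#include <stdbool.h>

static bool hw_init_failed(void) { return false; }
static void install_ops(void) { }

static void init_early_sketch(void)
{
	if (hw_init_failed())
		return;		/* nothing installed yet, nothing to undo */
	install_ops();		/* ops are only set up past this point */
}

int main(void)
{
	init_early_sketch();
	return 0;
}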

Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/sysdev/dart_iommu.c | 11 +--
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/arch/powerpc/sysdev/dart_iommu.c b/arch/powerpc/sysdev/dart_iommu.c
index 5ca3e22d0512..ce5dd2048f57 100644
--- a/arch/powerpc/sysdev/dart_iommu.c
+++ b/arch/powerpc/sysdev/dart_iommu.c
@@ -428,7 +428,7 @@ void __init iommu_init_early_dart(struct pci_controller_ops *controller_ops)
 
/* Initialize the DART HW */
if (dart_init(dn) != 0)
-   goto bail;
+   return;
 
/* Setup bypass if supported */
if (dart_is_u4)
@@ -439,15 +439,6 @@ void __init iommu_init_early_dart(struct pci_controller_ops *controller_ops)
 
/* Setup pci_dma ops */
	set_pci_dma_ops(&dma_iommu_ops);
-   return;
-
- bail:
-   /* If init failed, use direct iommu and null setup functions */
-   controller_ops->dma_dev_setup = NULL;
-   controller_ops->dma_bus_setup = NULL;
-
-   /* Setup pci_dma ops */
-   set_pci_dma_ops(&dma_nommu_ops);
 }
 
 #ifdef CONFIG_PM
-- 
2.19.0



[PATCH 20/33] powerpc/dma: remove the iommu fallback for coherent allocations

2018-10-09 Thread Christoph Hellwig
All iommu capable platforms now always use the iommu code with the
internal bypass, so there is no need for this magic anymore.
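
To illustrate what goes away: the removed code below bolted an iommu
fallback onto the nommu (direct) ops for devices whose coherent mask
the direct mapping could not satisfy.  With the bypass handled inside
the iommu ops, that decision lives in one place.  A simplified,
standalone sketch of the resulting shape (illustrative names, not the
actual powerpc code):

#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t u64;

struct dev_sketch {
	u64 coherent_dma_mask;
};

/* stand-in for "can the direct mapping reach this mask?" */
static bool direct_supported(u64 mask)
{
	return mask >= 0xffffffffULL;
}

/* stand-ins for the real direct and iommu allocators */
static void *direct_alloc(size_t size) { return malloc(size); }
static void *iommu_alloc(size_t size) { return malloc(size); }

/* the iommu ops decide internally: bypass if possible, translate if not */
static void *coherent_alloc_sketch(struct dev_sketch *dev, size_t size)
{
	if (direct_supported(dev->coherent_dma_mask))
		return direct_alloc(size);	/* bypass, no translation */
	return iommu_alloc(size);		/* via the iommu table */
}

int main(void)
{
	struct dev_sketch dev = { .coherent_dma_mask = ~0ULL };
	void *p = coherent_alloc_sketch(&dev, 64);

	free(p);
	return 0;
}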

Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/Kconfig  |  4 ---
 arch/powerpc/kernel/dma.c | 68 ++-
 2 files changed, 2 insertions(+), 70 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 06996df07cad..7097019d8907 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -119,9 +119,6 @@ config GENERIC_HWEIGHT
bool
default y
 
-config ARCH_HAS_DMA_SET_COHERENT_MASK
-bool
-
 config PPC
bool
default y
@@ -129,7 +126,6 @@ config PPC
# Please keep this list sorted alphabetically.
#
select ARCH_HAS_DEVMEM_IS_ALLOWED
-   select ARCH_HAS_DMA_SET_COHERENT_MASK
select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_FORTIFY_SOURCE
select ARCH_HAS_GCOV_PROFILE_ALL
diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index 78ef4076e15c..bef91b8ad064 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -114,51 +114,6 @@ void __dma_nommu_free_coherent(struct device *dev, size_t size,
 }
 #endif /* !CONFIG_NOT_COHERENT_CACHE */
 
-static void *dma_nommu_alloc_coherent(struct device *dev, size_t size,
-  dma_addr_t *dma_handle, gfp_t flag,
-  unsigned long attrs)
-{
-   struct iommu_table *iommu;
-
-   /* The coherent mask may be smaller than the real mask, check if
-* we can really use the direct ops
-*/
-   if (dma_nommu_dma_supported(dev, dev->coherent_dma_mask))
-   return __dma_nommu_alloc_coherent(dev, size, dma_handle,
-  flag, attrs);
-
-   /* Ok we can't ... do we have an iommu ? If not, fail */
-   iommu = get_iommu_table_base(dev);
-   if (!iommu)
-   return NULL;
-
-   /* Try to use the iommu */
-   return iommu_alloc_coherent(dev, iommu, size, dma_handle,
-   dev->coherent_dma_mask, flag,
-   dev_to_node(dev));
-}
-
-static void dma_nommu_free_coherent(struct device *dev, size_t size,
-void *vaddr, dma_addr_t dma_handle,
-unsigned long attrs)
-{
-   struct iommu_table *iommu;
-
-   /* See comments in dma_nommu_alloc_coherent() */
-   if (dma_nommu_dma_supported(dev, dev->coherent_dma_mask))
-   return __dma_nommu_free_coherent(dev, size, vaddr, dma_handle,
- attrs);
-   /* Maybe we used an iommu ... */
-   iommu = get_iommu_table_base(dev);
-
-   /* If we hit that we should have never allocated in the first
-* place so how come we are freeing ?
-*/
-   if (WARN_ON(!iommu))
-   return;
-   iommu_free_coherent(iommu, size, vaddr, dma_handle);
-}
-
 int dma_nommu_mmap_coherent(struct device *dev, struct vm_area_struct *vma,
 void *cpu_addr, dma_addr_t handle, size_t size,
 unsigned long attrs)
@@ -228,8 +183,8 @@ static inline void dma_nommu_sync_single(struct device *dev,
 #endif
 
 const struct dma_map_ops dma_nommu_ops = {
-   .alloc  = dma_nommu_alloc_coherent,
-   .free   = dma_nommu_free_coherent,
+   .alloc  = __dma_nommu_alloc_coherent,
+   .free   = __dma_nommu_free_coherent,
.mmap   = dma_nommu_mmap_coherent,
.map_sg = dma_nommu_map_sg,
.dma_supported  = dma_nommu_dma_supported,
@@ -243,25 +198,6 @@ const struct dma_map_ops dma_nommu_ops = {
 };
 EXPORT_SYMBOL(dma_nommu_ops);
 
-int dma_set_coherent_mask(struct device *dev, u64 mask)
-{
-   if (!dma_supported(dev, mask)) {
-   /*
-* We need to special case the direct DMA ops which can
-* support a fallback for coherent allocations. There
-* is no dma_op->set_coherent_mask() so we have to do
-* things the hard way:
-*/
-   if (get_dma_ops(dev) != &dma_nommu_ops ||
-   get_iommu_table_base(dev) == NULL ||
-   !dma_iommu_dma_supported(dev, mask))
-   return -EIO;
-   }
-   dev->coherent_dma_mask = mask;
-   return 0;
-}
-EXPORT_SYMBOL(dma_set_coherent_mask);
-
 int dma_set_mask(struct device *dev, u64 dma_mask)
 {
if (ppc_md.dma_set_mask)
-- 
2.19.0


