Re: [PATCH RFC PKS/PMEM 22/58] fs/f2fs: Utilize new kmap_thread()

2020-10-10 Thread James Bottomley
On Fri, 2020-10-09 at 14:34 -0700, Eric Biggers wrote:
> On Fri, Oct 09, 2020 at 12:49:57PM -0700, ira.we...@intel.com wrote:
> > From: Ira Weiny 
> > 
> > The kmap() calls in this FS are localized to a single thread.  To
> > avoid the over head of global PKRS updates use the new
> > kmap_thread() call.
> > 
> > Cc: Jaegeuk Kim 
> > Cc: Chao Yu 
> > Signed-off-by: Ira Weiny 
> > ---
> >  fs/f2fs/f2fs.h | 8 
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > index d9e52a7f3702..ff72a45a577e 100644
> > --- a/fs/f2fs/f2fs.h
> > +++ b/fs/f2fs/f2fs.h
> > @@ -2410,12 +2410,12 @@ static inline struct page
> > *f2fs_pagecache_get_page(
> >  
> >  static inline void f2fs_copy_page(struct page *src, struct page
> > *dst)
> >  {
> > -   char *src_kaddr = kmap(src);
> > -   char *dst_kaddr = kmap(dst);
> > +   char *src_kaddr = kmap_thread(src);
> > +   char *dst_kaddr = kmap_thread(dst);
> >  
> > memcpy(dst_kaddr, src_kaddr, PAGE_SIZE);
> > -   kunmap(dst);
> > -   kunmap(src);
> > +   kunmap_thread(dst);
> > +   kunmap_thread(src);
> >  }
> 
> Wouldn't it make more sense to switch cases like this to
> kmap_atomic()?
> The pages are only mapped to do a memcpy(), then they're immediately
> unmapped.

On a VIPT/VIVT architecture, this is horrendously wasteful.  You're
taking something that was mapped at colour c_src mapping it to a new
address src_kaddr, which is likely a different colour and necessitates
flushing the original c_src, then you copy it to dst_kaddr, which is
also likely a different colour from c_dst, so dst_kaddr has to be
flushed on kunmap and c_dst has to be invalidated on kmap.  What we
should have is an architectural primitive for doing this, something
like kmemcopy_arch(dst, src).  PIPT architectures can implement it as
the above (possibly losing kmap if they don't need it) but VIPT/VIVT
architectures can set up a correctly coloured mapping so they can
simply copy from c_src to c_dst without any need to flush and the data
arrives cache hot at c_dst.

James




Re: [PATCH 0/8] scsi: convert tasklets to use new tasklet_setup()

2020-08-17 Thread James Bottomley
On Mon, 2020-08-17 at 12:28 -0700, Kees Cook wrote:
> On Mon, Aug 17, 2020 at 07:41:58AM -0700, James Bottomley wrote:
> > On Mon, 2020-08-17 at 14:24 +0530, Allen Pais wrote:
> > > From: Allen Pais 
> > > 
> > > Commit 12cc923f1ccc ("tasklet: Introduce new initialization
> > > API")' introduced a new tasklet initialization API. This series
> > > converts all the scsi drivers to use the new tasklet_setup() API
> > 
> > I've got to say I agree with Jens, this was a silly obfuscation:
> > 
> > +#define from_tasklet(var, callback_tasklet, tasklet_fieldname) \
> > +   container_of(callback_tasklet, typeof(*var),
> > tasklet_fieldname)
> > 
> > Just use container_of directly since we all understand what it
> > does.
> 
> But then the lines get really long, wrapped, etc.

I really don't think that's a problem but if you want to add a new
generic container_of that does typeof instead of insisting on the type,
I'd be sort of OK with that ... provided you don't gratuitously alter
the argument order.

The thing I object to is that this encourages everyone to roll their
own unnecessary container_of type macros in spite of the fact that it's
function is wholly generic.  It's fine if you're eliminating one of the
arguments, or actually making the macro specific to the type, but in
this case you're not, you're making a completely generic macro where
the name is the only thing that's specific to this case.

>  This is what the timer_struct conversion did too (added a
> container_of wrapper), so I think it makes sense here too.

I didn't see that one to object to it ...

James



Re: [PATCH 0/8] scsi: convert tasklets to use new tasklet_setup()

2020-08-17 Thread James Bottomley
On Mon, 2020-08-17 at 14:24 +0530, Allen Pais wrote:
> From: Allen Pais 
> 
> Commit 12cc923f1ccc ("tasklet: Introduce new initialization API")'
> introduced a new tasklet initialization API. This series converts 
> all the scsi drivers to use the new tasklet_setup() API

I've got to say I agree with Jens, this was a silly obfuscation:

+#define from_tasklet(var, callback_tasklet, tasklet_fieldname) \
+   container_of(callback_tasklet, typeof(*var), tasklet_fieldname)

Just use container_of directly since we all understand what it does.

James



Re: [PATCH] scsi: target/sbp: remove firewire SBP target driver

2020-06-16 Thread James Bottomley
On Tue, 2020-06-16 at 14:13 +, Johannes Thumshirn wrote:
> On 16/06/2020 16:09, Bart Van Assche wrote:
> > On 2020-06-16 02:42, Finn Thain wrote:
> > > Martin said, "I'd appreciate a patch to remove it"
> > > 
> > > And Bart said, "do you want to keep this driver in the kernel
> > > tree?"
> > > 
> > > AFAICT both comments are quite ambiguous. I don't see an
> > > actionable request, just an expression of interest from people
> > > doing their jobs.
> > > 
> > > Note well: there is no pay check associated with having a
> > > MAINTAINERS file 
> > > entry.
> > 
> > Hi Finn,
> > 
> > As far as I know the sbp driver only has had one user ever and that
> > user is no longer user the sbp driver. So why to keep it in the
> > kernel tree? Restoring a kernel driver can be easy - the first step
> > is a "git revert".
> 
> Why not move the driver to drivers/staging for 2 or 3 kernel releases
> and if noone steps up, delete it?

Because that's pretty much the worst of all worlds: If the driver is
simply going orphaned it can stay where it is to avoid confusion.  If
it's being removed, it's better to remove it from where it is because
that makes the patch to restore it easy to find.

Chris, the thing is this: if this driver has just one user on a stable
distro who complains about its removal six months to two years from
now, Linus will descend on us from a great height (which won't matter
to you, since you'll be long gone).  This makes everyone very wary of
outright removal.  If you're really, really sure it has no users, it
can be deleted, but if there's the slightest chance it has just one, it
should get orphaned.

James



Re: [PATCH v2] ima: add a new CONFIG for loading arch-specific policies

2020-03-04 Thread James Bottomley
On Wed, 2020-03-04 at 07:35 -0500, Mimi Zohar wrote:
> On Tue, 2020-03-03 at 23:43 -0800, James Bottomley wrote:
> > On Tue, 2020-03-03 at 21:33 -0500, Nayna Jain wrote:
> > > diff --git a/security/integrity/ima/Kconfig
> > > b/security/integrity/ima/Kconfig
> > > index 3f3ee4e2eb0d..d17972aa413a 100644
> > > --- a/security/integrity/ima/Kconfig
> > > +++ b/security/integrity/ima/Kconfig
> > > @@ -327,3 +327,12 @@ config IMA_QUEUE_EARLY_BOOT_KEYS
> > >   depends on IMA_MEASURE_ASYMMETRIC_KEYS
> > >   depends on SYSTEM_TRUSTED_KEYRING
> > >   default y
> > > +
> > > +config IMA_SECURE_AND_OR_TRUSTED_BOOT
> > > + bool
> > > + depends on IMA
> > > + depends on IMA_ARCH_POLICY
> > > + default n
> > 
> > You can't do this: a symbol designed to be selected can't depend on
> > other symbols because Kconfig doesn't see the dependencies during
> > select.  We even have a doc for this now:
> > 
> > Documentation/kbuild/Kconfig.select-break
> 
> The document is discussing a circular dependency, where C selects B.
>  IMA_SECURE_AND_OR_TRUSTED_BOOT is not selecting anything, but is
> being selected.  All of the Kconfig's are now dependent on
> IMA_ARCH_POLICY being enabled before selecting
> IMA_SECURE_AND_OR_TRUSTED_BOOT.
> 
> As Ard pointed out, both IMA and IMA_ARCH_POLICY are not needed, as
> IMA_ARCH_POLICY is already dependent on IMA.

Then removing them is fine, if they're not necessary ... you just can't
 select a symbol with dependencies because the two Kconfig mechanisms
don't mix.

James



Re: [PATCH v2] ima: add a new CONFIG for loading arch-specific policies

2020-03-04 Thread James Bottomley
On Tue, 2020-03-03 at 21:33 -0500, Nayna Jain wrote:
> Every time a new architecture defines the IMA architecture specific
> functions - arch_ima_get_secureboot() and arch_ima_get_policy(), the
> IMA
> include file needs to be updated. To avoid this "noise", this patch
> defines a new IMA Kconfig IMA_SECURE_AND_OR_TRUSTED_BOOT option,
> allowing
> the different architectures to select it.
> 
> Suggested-by: Linus Torvalds 
> Signed-off-by: Nayna Jain 
> Cc: Ard Biesheuvel 
> Cc: Philipp Rudo 
> Cc: Michael Ellerman 
> ---
> v2:
> * Fixed the issue identified by Mimi. Thanks Mimi, Ard, Heiko and
> Michael for
> discussing the fix.
> 
>  arch/powerpc/Kconfig   | 1 +
>  arch/s390/Kconfig  | 1 +
>  arch/x86/Kconfig   | 1 +
>  include/linux/ima.h| 3 +--
>  security/integrity/ima/Kconfig | 9 +
>  5 files changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 497b7d0b2d7e..a5cfde432983 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -979,6 +979,7 @@ config PPC_SECURE_BOOT
>   bool
>   depends on PPC_POWERNV
>   depends on IMA_ARCH_POLICY
> + select IMA_SECURE_AND_OR_TRUSTED_BOOT
>   help
> Systems with firmware secure boot enabled need to define
> security
> policies to extend secure boot to the OS. This config
> allows a user
> diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
> index 8abe77536d9d..4a502fbcb800 100644
> --- a/arch/s390/Kconfig
> +++ b/arch/s390/Kconfig
> @@ -195,6 +195,7 @@ config S390
>   select ARCH_HAS_FORCE_DMA_UNENCRYPTED
>   select SWIOTLB
>   select GENERIC_ALLOCATOR
> + select IMA_SECURE_AND_OR_TRUSTED_BOOT if IMA_ARCH_POLICY
>  
>  
>  config SCHED_OMIT_FRAME_POINTER
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index beea77046f9b..7f5bfaf0cbd2 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -230,6 +230,7 @@ config X86
>   select VIRT_TO_BUS
>   select X86_FEATURE_NAMESif PROC_FS
>   select PROC_PID_ARCH_STATUS if PROC_FS
> + select IMA_SECURE_AND_OR_TRUSTED_BOOT   if EFI &&
> IMA_ARCH_POLICY
>  
>  config INSTRUCTION_DECODER
>   def_bool y
> diff --git a/include/linux/ima.h b/include/linux/ima.h
> index 1659217e9b60..aefe758f4466 100644
> --- a/include/linux/ima.h
> +++ b/include/linux/ima.h
> @@ -30,8 +30,7 @@ extern void ima_kexec_cmdline(const void *buf, int
> size);
>  extern void ima_add_kexec_buffer(struct kimage *image);
>  #endif
>  
> -#if (defined(CONFIG_X86) && defined(CONFIG_EFI)) ||
> defined(CONFIG_S390) \
> - || defined(CONFIG_PPC_SECURE_BOOT)
> +#ifdef CONFIG_IMA_SECURE_AND_OR_TRUSTED_BOOT
>  extern bool arch_ima_get_secureboot(void);
>  extern const char * const *arch_get_ima_policy(void);
>  #else
> diff --git a/security/integrity/ima/Kconfig
> b/security/integrity/ima/Kconfig
> index 3f3ee4e2eb0d..d17972aa413a 100644
> --- a/security/integrity/ima/Kconfig
> +++ b/security/integrity/ima/Kconfig
> @@ -327,3 +327,12 @@ config IMA_QUEUE_EARLY_BOOT_KEYS
>   depends on IMA_MEASURE_ASYMMETRIC_KEYS
>   depends on SYSTEM_TRUSTED_KEYRING
>   default y
> +
> +config IMA_SECURE_AND_OR_TRUSTED_BOOT
> + bool
> + depends on IMA
> + depends on IMA_ARCH_POLICY
> + default n

You can't do this: a symbol designed to be selected can't depend on
other symbols because Kconfig doesn't see the dependencies during
select.  We even have a doc for this now:

Documentation/kbuild/Kconfig.select-break

The only way to get this to work would be to have the long name symbol
select both IMA and IMA_ARCH_POLICY, which doesn't seem to be what you
want either.

Looking at what you're trying to do, I think making the symbol
independent of IMA and IMA_ARCH_POLICY is the correct thing, then
enforce the dependencies inside the outer #ifdef, but I haven't looked
deeply at the code.

James



Re: [PATCH 7/8] parisc: don't set ARCH_NO_COHERENT_DMA_MMAP

2019-08-15 Thread James Bottomley
On Thu, 2019-08-08 at 19:00 +0300, Christoph Hellwig wrote:
> parisc is the only architecture that sets ARCH_NO_COHERENT_DMA_MMAP
> when an MMU is enabled.  AFAIK this is because parisc CPUs use VIVT
> caches,

We're actually VIPT but the same principle applies.

>  which means exporting normally cachable memory to userspace is
> relatively dangrous due to cache aliasing.
> 
> But normally cachable memory is only allocated by dma_alloc_coherent
> on parisc when using the sba_iommu or ccio_iommu drivers, so just
> remove the .mmap implementation for them so that we don't have to set
> ARCH_NO_COHERENT_DMA_MMAP, which I plan to get rid of.

So I don't think this is quite right.  We have three architectural
variants essentially (hidden behind about 12 cpu types):

   1. pa70xx: These can't turn off page caching, so they were the non
  coherent problem case
   2. pa71xx: These can manufacture coherent memory simply by turning off
  the cache on a per page basis
   3. pa8xxx: these have a full cache flush coherence mechanism.

(I might have this slightly wrong: I vaguely remember the pa71xxlc
variants have some weird cache quirks for DMA as well)

So I think pa70xx we can't mmap.  pa71xx we can provided we mark the
page as uncached ... which should already have happened in the allocate
and pa8xxx which can always mmap dma memory without any special tricks.

James



Re: [PATCH v2 00/29] y2038: add time64 syscalls

2019-01-18 Thread James Bottomley
On Fri, 2019-01-18 at 11:57 -0500, Dennis Clarke wrote:
> On 1/18/19 11:18 AM, Arnd Bergmann wrote:
> > This is a minor update of the patches I posted last week, I
> > would like to add this into linux-next now, but would still do
> > changes if there are concerns about the contents. The first
> > version did not see a lot of replies, which could mean that
> > either everyone is happy with it, or that it was largely ignored.
> > 
> > See also the article at https://lwn.net/Articles/776435/.
> 
> I would be happy to read "Approaching the kernel year-2038 end game"
> however it is behind a pay wall.  Perhaps it may be best to just
> host interesting articles about open source idea elsewhere.

Hey, this is an unfair characterization: lwn.net operates an early
access subscription model, so you can't read the above for 14 days
after publication without paying for an lwn.net subscription, but by
the time these patches are upstream there will be no paywall because it
will expire on 24 January and that URL will then be readable by all. 
That makes LWN.net a nice, reliable resource for us while still
supporting some business model to keep it going.

James



Re: [PATCH v8 01/25] scsi/atari_scsi: Don't select CONFIG_NVRAM

2019-01-02 Thread James Bottomley
On Sun, 2018-12-30 at 18:50 +0100, LEROY Christophe wrote:
> Arnd Bergmann  a écrit :
> > On Sat, Dec 29, 2018 at 3:51 AM Michael Schmitz
> >  wrote:
[...]
> > > (On second thought - I don't want to speculate whether there's
> > > weird compiler options that could result in the
> > > nvram_check_checksum and nvram_read_bytes symbols to still be
> > > referenced in the final link, even though
> > > IS_BUILTIN(CONFIG_NVRAM) always evaluates to false. Best
> > > leave this as-is.)
> > 
> > As far as I know, it's totally reliable with the supported
> > compilers  (gcc-4.6+). In the older compilers (e.g. 4.1), there was
> > a corner case, where it could have failed to eliminate a function
> > that was only referenced through a pointer from a discarded
> > variable, but a plain IS_ENABLED() check like the one here
> > was still ok, and lots of code relies on that.
> > 
> > Other than that, I agree either way is totally fine here, so no
> > objections to using the #ifdef.
> 
> As far as I know, kernel codying style promotes the use of  
> IS_ENABLED() etc. instead of #ifdefs when possible.

It's a preference, as with a lot of coding style stuff, which we leave
up to the maintainer.

That said, as has been pointed out, the current #ifdef has a failing
corner case when both are modular (because the code should then be
included).  The runtime macro that correctly expresses this is
IS_REACHABLE(CONFIG_NVRAM).

James



Re: [PATCH 0/4] char-TPM: Adjustments for ten function implementations

2017-10-18 Thread James Bottomley
On Wed, 2017-10-18 at 18:10 +0300, Jarkko Sakkinen wrote:
> On Tue, Oct 17, 2017 at 08:57:13AM -0700, James Bottomley wrote:
> > 
> > On Tue, 2017-10-17 at 11:25 +0200, SF Markus Elfring wrote:
> > > 
> > > > 
> > > > 
> > > > Fixes is only for bug fixes.  These don't fix any bugs.
> > > 
> > > How do you distinguish these in questionable source code
> > > from other error categories or software weaknesses?
> > 
> > A style change is one that doesn't change the effect of the
> > execution.
> >  These don't actually even change the assembly, so there's
> > programmatic
> > proof they're not fixing anything.
> > 
> > Bug means potentially user visible fault.  In any bug fix commit
> > you
> > should document the fault and its effects on users so those
> > backporting
> > can decide if they care or not.
> > 
> > James
> 
> OK, I'll adjust my definition of a bug :-)

Subsystems are free to define bugs in any reasonable way.  However,
there are two things to note here:

   1. The style guide is just that, a guide; it's not hard and fast rules.
   That means that violations aren't bugs in the usual sense.
   However, new code should mostly follow it and if it doesn't, there
  should be a good reason to go against the guide which should be
  explained in the change log.
   2. The coding style evolves, so older drivers usually don't conform.
   Classifying coding style issues as bugs leads to tons of patches
  "fixing" older drivers, some of which actually end up breaking the
  drivers in subtle ways which take ages to be found (at least that's
  what we've seen in SCSI).

James



Re: [PATCH 0/4] char-TPM: Adjustments for ten function implementations

2017-10-17 Thread James Bottomley
On Tue, 2017-10-17 at 11:25 +0200, SF Markus Elfring wrote:
> > 
> > Fixes is only for bug fixes.  These don't fix any bugs.
> 
> How do you distinguish these in questionable source code
> from other error categories or software weaknesses?

A style change is one that doesn't change the effect of the execution.
 These don't actually even change the assembly, so there's programmatic
proof they're not fixing anything.

Bug means potentially user visible fault.  In any bug fix commit you
should document the fault and its effects on users so those backporting
can decide if they care or not.

James



Re: [PATCH 08/29] drivers, md: convert mddev.active from atomic_t to refcount_t

2017-03-14 Thread James Bottomley
On Tue, 2017-03-14 at 12:29 +, Reshetova, Elena wrote:
> > Elena Reshetova  writes:
> > 
> > > refcount_t type and corresponding API should be
> > > used instead of atomic_t when the variable is used as
> > > a reference counter. This allows to avoid accidental
> > > refcounter overflows that might lead to use-after-free
> > > situations.
> > > 
> > > Signed-off-by: Elena Reshetova 
> > > Signed-off-by: Hans Liljestrand 
> > > Signed-off-by: Kees Cook 
> > > Signed-off-by: David Windsor 
> > > ---
> > >  drivers/md/md.c | 6 +++---
> > >  drivers/md/md.h | 3 ++-
> > >  2 files changed, 5 insertions(+), 4 deletions(-)
> > 
> > When booting linux-next (specifically 5be4921c9958ec) I'm seeing
> > the
> > backtrace below. I suspect this patch is just exposing an existing
> > issue?
> 
> Yes, we have actually been following this issue in the another
> thread. 
> It looks like the object is re-used somehow, but I can't quite
> understand how just by reading the code. 
> This was what I put into the previous thread:
> 
> "The log below indicates that you are using your refcounter in a bit
> weird way in mddev_find(). 
> However, I can't find the place (just by reading the code) where you
> would increment refcounter from zero (vs. setting it to one).
> It looks like you either iterate over existing nodes (and increment
> their counters, which should be >= 1 at the time of increment) or
> create a new node, but then mddev_init() sets the counter to 1. "
> 
> If you can help to understand what is going on with the object
> creation/destruction, would be appreciated!
> 
> Also Shaohua Li stopped this patch coming from his tree since the 
> issue was caught at that time, so we are not going to merge this 
> until we figure it out. 

Asking on the correct list (dm-devel) would have got you the easy
answer:  The refcount behind mddev->active is a genuine atomic.  It has
refcount properties but only if the array fails to initialise (in that
case, final put kills it).  Once it's added to the system as a gendisk,
it cannot be freed until md_free().  Thus its ->active count can go to
zero (when it becomes inactive; usually because of an unmount). On a
simple allocation regardless of outcome, the last executed statement in
md_alloc is mddev_put(): that destroys the device if we didn't manage
to create it or returns 0 and adds an inactive device to the system
which the user can get with mddev_find().

James




Re: [PATCH v2 3/7] ibmvscsi: Replace magic values in set_adpater_info() with defines

2016-02-12 Thread James Bottomley
On Wed, 2016-02-10 at 19:32 -0600, Tyrel Datwyler wrote:
> Add defines for mad version and mad os_type, and replace the magic
> numbers in set_adapter_info() accordingly.
> 
> Signed-off-by: Tyrel Datwyler 
> ---

Is there some reason you didn't carry the review tag over from this:

http://mid.gmane.org/20160204084459.gw27...@c203.arch.suse.de

?

James

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 3/7] ibmvscsi: Replace magic values in set_adpater_info() with defines

2016-02-12 Thread James Bottomley
On Fri, 2016-02-12 at 08:51 -0800, Tyrel Datwyler wrote:
> On 02/12/2016 08:43 AM, James Bottomley wrote:
> > On Wed, 2016-02-10 at 19:32 -0600, Tyrel Datwyler wrote:
> > > Add defines for mad version and mad os_type, and replace the
> > > magic
> > > numbers in set_adapter_info() accordingly.
> > > 
> > > Signed-off-by: Tyrel Datwyler <tyr...@linux.vnet.ibm.com>
> > > ---
> > 
> > Is there some reason you didn't carry the review tag over from
> > this:
> > 
> > http://mid.gmane.org/20160204084459.gw27...@c203.arch.suse.de
> > 
> > ?
> > 
> > James
> 
> The patch is slightly changed from v1. A define for AIX os type was
> added as mentioned in the cover letter v2 changes, and I moved the
> defines to the mad_adapter_info_data structure around the fields they 
> apply.

OK we need a re-review then ...

James


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 01/30] cxlflash: Fix to avoid invalid port_sel value

2015-09-16 Thread James Bottomley
Could you please add a cover letter (a 0/30) and thread your patches
from that?  For large patch series, it really does make following
everything a lot easier for me (and most other people who use a threaded
mail reader).

Thanks,

James


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 29/31] parisc: handle page-less SG entries

2015-08-13 Thread James Bottomley
On Thu, 2015-08-13 at 20:30 -0700, Dan Williams wrote:
 On Thu, Aug 13, 2015 at 7:31 AM, Christoph Hellwig h...@lst.de wrote:
  On Wed, Aug 12, 2015 at 09:01:02AM -0700, Linus Torvalds wrote:
  I'm assuming that anybody who wants to use the page-less
  scatter-gather lists always does so on memory that isn't actually
  virtually mapped at all, or only does so on sane architectures that
  are cache coherent at a physical level, but I'd like that assumption
  *documented* somewhere.
 
  It's temporarily mapped by kmap-like helpers.  That code isn't in
  this series. The most recent version of it is here:
 
  https://git.kernel.org/cgit/linux/kernel/git/djbw/nvdimm.git/commit/?h=pfnid=de8237c99fdb4352be2193f3a7610e902b9bb2f0
 
  note that it's not doing the cache flushing it would have to do yet, but
  it's also only enabled for x86 at the moment.
 
 For virtually tagged caches I assume we would temporarily map with
 kmap_atomic_pfn_t(), similar to how drm_clflush_pages() implements
 powerpc support.  However with DAX we could end up with multiple
 virtual aliases for a page-less pfn.

At least on some PA architectures, you have to be very careful.
Improperly managed, multiple aliases will cause the system to crash
(actually a machine check in the cache chequerboard). For the most
temperamental systems, we need the cache line flushed and the alias
mapping ejected from the TLB cache before we access the same page at an
inequivalent alias.

James


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: RFC: prepare for struct scatterlist entries without page backing

2015-08-12 Thread James Bottomley
On Wed, 2015-08-12 at 09:05 +0200, Christoph Hellwig wrote:
 Dan Williams started to look into addressing I/O to and from
 Persistent Memory in his series from June:
 
   http://thread.gmane.org/gmane.linux.kernel.cross-arch/27944
 
 I've started looking into DMA mapping of these SGLs specifically instead
 of the map_pfn method in there.  In addition to supporting NVDIMM backed
 I/O I also suspect this would be highly useful for media drivers that
 go through nasty hoops to be able to DMA from/to their ioremapped regions,
 with vb2_dc_get_userptr in drivers/media/v4l2-core/videobuf2-dma-contig.c
 being a prime example for the unsafe hacks currently used.
 
 It turns out most DMA mapping implementation can handle SGLs without
 page structures with some fairly simple mechanical work.  Most of it
 is just about consistently using sg_phys.  For implementations that
 need to flush caches we need a new helper that skips these cache
 flushes if a entry doesn't have a kernel virtual address.
 
 However the ccio (parisc) and sba_iommu (parisc  ia64) IOMMUs seem
 to be operate mostly on virtual addresses.  It's a fairly odd concept
 that I don't fully grasp, so I'll need some help with those if we want
 to bring this forward.

I can explain that.  I think this doesn't apply to ia64 because it's
cache is PIPT, but on parisc, we have a VIPT cache.

On normal physically indexed architectures, when the iommu sees a DMA
transfer to/from physical memory, it also notifies the CPU to flush the
internal CPU caches of those lines.  This is usually an interlocking
step of the transfer to make sure the page is coherent before transfer
to/from the device (it's why the ia32 for instance is a coherent
architecture).  Because the system is physically indexed, there's no
need to worry about aliases.

On Virtually Indexed systems, like parisc, there is an aliasing problem.
The CCIO iommu unit (and all other iommu systems on parisc) have what's
called a local coherence index (LCI).  You program it as part of the
IOMMU page table and it tells the system which Virtual line in the cache
to flush as part of the IO transaction, thus still ensuring cache
coherence.  That's why we have to know the virtual as well as physical
addresses for the page.  The problem we have in Linux is that we have
two virtual addresses, which are often incoherent aliases: the user
virtual address and a kernel virtual address but we can only make the
page coherent with a single alias (only one LCI).  The way I/O on Linux
currently works is that get_user_pages actually flushes the user virtual
address, so that's expected to be coherent, so the address we program
into the VCI is the kernel virtual address.  Usually nothing in the
kernel has ever touched the page, so there's nothing to flush, but we do
it just in case.

In theory, for these non kernel page backed SG entries, we can make the
process more efficient by not flushing in gup and instead programming
the user virtual address into the local coherence index.  However,
simply zeroing the LCI will also work (except that poor VI zero line
will get flushed repeatedly, so it's probably best to pick a known
untouched line in the kernel).

James


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] mpt2sas: Fall back to 64 bit coherent mask if 64 bit DMA / 32 bit coherent mask not supported

2015-05-13 Thread James Bottomley
On Wed, 2015-05-13 at 15:31 +0200, Arnd Bergmann wrote:
 On Wednesday 13 May 2015 08:23:41 Brian King wrote:
  On 05/13/2015 03:10 AM, Arnd Bergmann wrote:
   On Tuesday 12 May 2015 18:24:43 Brian King wrote:
  
   Commit 5fb1bf8aaa832e1e9ca3198de7bbecb8eff7db9c broke 64 bit DMA for 
   mpt2sas on Power.
   That commit changed the sequence for setting up the DMA and coherent DMA 
   masks so
   that during initialization the driver requests a 64 bit DMA mask and a 
   32 bit consistent
   DMA mask, then later requests a 64 bit consistent DMA mask. The Power 
   architecture does
   not currently support this, which results in always falling back to a 32 
   bit DMA window,
   which has a negative impact on performance. Tweak this algorithm 
   slightly so that
   if requesting a 32 bit consistent mask fails after we've successfully 
   set a 64 bit
   DMA mask, just try to get a 64 bit consistent mask. This should preserve 
   existing
   behavior on platforms that support mixed mask setting and restore 
   previous functionality
   to those that do not.
  
   Signed-off-by: Brian King brk...@linux.vnet.ibm.com
   
   I believe the way the API is designed, it should guarantee that after 
   dma_set_mask()
   succeeds for a device, dma_set_coherent_mask() with the same mask will 
   also succeed.
   
   Could you just call dma_set_mask_and_coherent() here to avoid that 
   complex logic?
  
  I don't think that would work. The mpt2sas driver wants to set the dma mask 
  to 64bits
  but set the coherent mask to 32 bits, then my change is to set the coherent 
  mask to
  64bits if setting it to 32bit fails. I'm not seeing how using 
  dma_set_mask_and_coherent
  would simplify anything here.
 
 My question was more about why the driver would ask for a 32-bit coherent
 mask to start with. Is this a limitation in the mpt2sas device that can
 only have descriptors at low addresses, or is it trying to work around some
 bug in a particular host system?

Can't speak for mp2sas, but it's a standard pattern for a lot of script
based (internal microcode) devices.  The microcode often gets extended
for external 64 bit addressing just by adding extra descriptor types, so
the device itself can do DMA to/from the full 64 bit range.  However,
the actual bit width of the microcode processor is usually left at 32
bits.  This means that if microcode scripts are executed out of system
memory, they have to be located in the lower 32 bits of physical memory
otherwise the internal microcode pointer references wrap (the aic79xx
has exactly this problem).  Hence they need a 64 bit mask but a 32 bit
coherent mask (coherent memory is where the scripts are located).

James


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH V2 1/2] mm: Update generic gup implementation to handle hugepage directory

2014-10-24 Thread James Bottomley
On Fri, 2014-10-24 at 10:40 +1100, Benjamin Herrenschmidt wrote:
 On Thu, 2014-10-23 at 18:40 -0400, David Miller wrote:
  Hey guys, was looking over the generic GUP while working on a sparc64
  issue and I noticed that you guys do speculative page gets, and after
  talking with Johannes Weiner (CC:'d) about this we don't see how it
  could be necessary.
  
  If interrupts are disabled during the page table scan (which they
  are), no IPI tlb flushes can arrive.  Therefore any removal from the
  page tables is guarded by interrupts being re-enabled.  And as a
  result, page counts of pages we see in the page tables must always
  have a count  0.
  
  x86 does direct atomic_add() on page-_count because of this
  invariant and I would rather see the generic version do this too.
 
 This is of course only true of archs who use IPIs for TLB flushes, so if
 we are going down the path of not being speculative, powerpc would have
 to go back to doing its own since our broadcast TLB flush means we
 aren't protected (we are only protected vs. the page tables themselves
 being freed since we do that via sched RCU).
 
 AFAIK, ARM also broadcasts TLB flushes...

Parisc does this.  As soon as one CPU issues a TLB purge, it's broadcast
to all the CPUs on the inter-CPU bus.  The next instruction isn't
executed until they respond.

But this is only for our CPU TLB.  There's no other external
consequence, so removal from the page tables isn't effected by this TLB
flush, therefore the theory on which Dave bases the change to
atomic_add() should work for us (of course, atomic_add is lock add
unlock on our CPU, so it's not going to be of much benefit).

James

 Another option would be to make the generic code use something defined
 by the arch to decide whether to use speculative get or
 not. I like the idea of keeping the bulk of that code generic...
 
 Cheers,
 Ben.
 
  --
  To unsubscribe, send a message with 'unsubscribe linux-mm' in
  the body to majord...@kvack.org.  For more info on Linux MM,
  see: http://www.linux-mm.org/ .
  Don't email: a href=mailto:d...@kvack.org; em...@kvack.org /a
 
 
 --
 To unsubscribe, send a message with 'unsubscribe linux-mm' in
 the body to majord...@kvack.org.  For more info on Linux MM,
 see: http://www.linux-mm.org/ .
 Don't email: a href=mailto:d...@kvack.org; em...@kvack.org /a
 


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: bit fields data tearing

2014-09-10 Thread James Bottomley
On Tue, 2014-09-09 at 06:40 -0400, Peter Hurley wrote:
 On 09/08/2014 10:56 PM, James Bottomley wrote:
  On Mon, 2014-09-08 at 19:30 -0400, Peter Hurley wrote:
  On 09/08/2014 01:50 AM, James Bottomley wrote:
  But additionally, even if gcc combines adjacent writes _that are part
  of the program flow_ then I believe the situation is no worse than
  would otherwise exist.
 
  For instance, given the following:
 
  struct x {
   spinlock_t lock;
   long a;
   byte b;
   byte c;
  };
 
  void locked_store_b(struct x *p)
  {
   spin_lock(p-lock);
   p-b = 1;
   spin_unlock(p-lock);
   p-c = 2;
  }
 
  Granted, the author probably expects ordered writes of
   STORE B
   STORE C
  but that's not guaranteed because there is no memory barrier
  ordering B before C.
 
  Yes, there is: loads and stores may not migrate into or out of critical
  sections.
 
  That's a common misconception.
 
  The processor is free to re-order this to:
 
 STORE C
 STORE B
 UNLOCK
 
  That's because the unlock() only guarantees that:
 
  Stores before the unlock in program order are guaranteed to complete
  before the unlock completes. Stores after the unlock _may_ complete
  before the unlock completes.
 
  My point was that even if compiler barriers had the same semantics
  as memory barriers, the situation would be no worse. That is, code
  that is sensitive to memory barriers (like the example I gave above)
  would merely have the same fragility with one-way compiler barriers
  (with respect to the compiler combining writes).
 
  That's what I meant by no worse than would otherwise exist.
  
  Actually, that's not correct.  This is actually deja vu with me on the
  other side of the argument.  When we first did spinlocks on PA, I argued
  as you did: lock only a barrier for code after and unlock for code
  before.  The failing case is that you can have a critical section which
  performs an atomically required operation and a following unit which
  depends on it being performed.  If you begin the following unit before
  the atomic requirement, you may end up losing.  It turns out this kind
  of pattern is inherent in a lot of mail box device drivers: you need to
  set up the mailbox atomically then poke it.  Setup is usually atomic,
  deciding which mailbox to prime and actually poking it is in the
  following unit.  Priming often involves an I/O bus transaction and if
  you poke before priming, you get a misfire.
 
 Take it up with the man because this was discussed extensively last
 year and it was decided that unlocks would not be full barriers.
 Thus the changes to memory-barriers.txt that explicitly note this
 and the addition of smp_mb__after_unlock_lock() (for two different
 locks; an unlock followed by a lock on the same lock is a full barrier).
 
 Code that expects ordered writes after an unlock needs to explicitly
 add the memory barrier.

I don't really care what ARM does; spin locks are full barriers on
architectures that need them.  The driver problem we had that detected
our semi permeable spinlocks was an LSI 53c875 which is enterprise class
PCI, so presumably not relevant to ARM anyway.

James


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: bit fields data tearing

2014-09-08 Thread James Bottomley
On Mon, 2014-09-08 at 18:52 +0100, One Thousand Gnomes wrote:
 On Fri, 05 Sep 2014 08:41:52 -0700
 H. Peter Anvin h...@zytor.com wrote:
 
  On 09/05/2014 08:31 AM, Peter Hurley wrote:
   
   Which is a bit ironic because I remember when Digital had a team
   working on emulating native x86 apps on Alpha/NT.
   
  
  Right, because the x86 architecture was obsolete and would never scale...
 
 Talking about not scaling can anyone explain how a you need to use
 set_bit() and friends bug report scaled into a hundred message plus
 discussion about ambiguous properties of processors (and nobody has
 audited all the embedded platforms we support yet, or the weirder ARMs)
 and a propsal to remove Alpha support.
 
 Wouldn't it be *much* simpler to do what I suggested in the first place
 and use the existing intended for purpose, deliberately put there,
 functions for atomic bitops, because they are fast on sane processors and
 they work on everything else.
 
 I think the whole removing Alpha EV5 support is basically bonkers. Just
 use set_bit in the tty layer. Alpha will continue to work as well as it
 always has done and you won't design out support for any future processor
 that turns out not to do byte aligned stores.

Seconded.  We implement via hashed spinlocks on PA ... but hey, we're
not the fastest architecture anyway and semantically it just works.

James


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: bit fields data tearing

2014-09-08 Thread James Bottomley
On Mon, 2014-09-08 at 11:12 -0700, H. Peter Anvin wrote:
 On 09/07/2014 10:56 PM, James Bottomley wrote:
  On Sun, 2014-09-07 at 16:39 -0700, H. Peter Anvin wrote:
  How many PARISC systems do we have that actually do real work on Linux?
  
  I'd be very surprised if this problem didn't exist on all alignment
  requiring architectures, like PPC and Sparc as well.  I know it would be
  very convenient if all the world were an x86 ... but it would also be
  very boring as well.
 
 I wouldn't be so sure about that.  That is a pretty aggressive
 relaxation of ordering that PARISC has enacted here, kind of like the
 Alpha we don't need no stinking byte accesses.

Um, I think you need to re-read the thread; that's not what I said at
all. It's even written lower down: PA can't do atomic bit sets (no
atomic RMW except the ldcw operation) it can do atomic writes to
fundamental sizes (byte, short, int, long) provided gcc emits the
correct primitive.  The original question was whether atomicity
required native bus width access, which we currently assume, so there's
no extant problem.

  The rules for coping with it are well known and a relaxation of what we
  currently do in the kernel, so I don't see what the actual problem is.
  
  In the context of this thread, PA can't do atomic bit sets (no atomic
  RMW except the ldcw operation) it can do atomic writes to fundamental
  sizes (byte, short, int, long) provided gcc emits the correct primitive
  (there are lots of gotchas in this, but that's not an architectural
  problem).  These atomicity guarantees depend on the underlying storage
  and are respected for main memory but not for any other type of bus.
 
 So I'm not trying to push the all the world is an x86... certainly not
 given that x86 has abnormally strict ordering rules and so is itself an
 outlier.  What I *don't* really want to have to deal with is going
 through more than causal effort to accommodate outliers which no longer
 have any real value -- we have too much work to do.

What accommodation?  I was just explaining the architectural atomicity
rules on PA.  How they're bus dependent (which isn't a PA restriction)
and how the compiler can mess them up.  None of the current PA atomicity
rules conflicts with anything we do today ... the danger is that moving
to implicit atomicity gives us more scope for compiler problems ... but
that's not PA specific either; any architecture requiring alignment has
this problem.

James


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: bit fields data tearing

2014-09-08 Thread James Bottomley
On Mon, 2014-09-08 at 16:45 -0400, Chris Metcalf wrote:
 On 9/8/2014 1:50 AM, James Bottomley wrote:
  Actual alignment is pretty irrelevant.  That's why all architectures
  which require alignment also have to implement misaligned traps ... this
  is a fundamental requirement of the networking code, for instance.
 
 Can you clarify what you think the requirement is?  The tile architecture
 doesn't support misaligned load/store in general, but we do support it for
 userspace (using a nifty JIT approach with a direct-map hash table kept
 in userspace), and also for get_user/put_user.  But that's it, and,
 the networking subsystem works fine for us.

This was years ago (possibly decades).  We had to implement in-kernel
unaligned traps for the networking layer because it could access short
and int fields that weren't of the correct alignment when processing
packets.  It that's all corrected now, we wouldn't really notice (except
a bit of a speed up since an unaligned trap effectively places the
broken out instructions into the bit stream).

James


 Occasionally we report bugs for driver code that doesn't use the
 get_unaligned_xxx() macros and friends, and our fixes are generally taken
 upstream.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: bit fields data tearing

2014-09-08 Thread James Bottomley
On Mon, 2014-09-08 at 19:30 -0400, Peter Hurley wrote:
 On 09/08/2014 01:50 AM, James Bottomley wrote:
  Two things: I think that gcc has given up on combining adjacent writes,
  perhaps because unaligned writes on some arches are prohibitive, so
  whatever minor optimization was believed to be gained was quickly lost,
  multi-fold. (Although this might not be the reason since one would
  expect that gcc would know the alignment of the promoted store).
  
  Um, gcc assumes architecturally correct alignment; that's why it pads
  structures.  Only when accessing packed structures will it use the
  lowest unit load/store.
  
  if you have a struct { char a, char b }; and load first a then b with a
  constant gcc will obligingly optimise to a short store.
 
 Um, please re-look at the test above. The exact test you describe is
 coded above and compiled with gcc 4.6.3 cross-compiler for parisc using
 the kernel compiler options.
 
 In the generated code, please note the _absence_ of a combined write
 to two adjacent byte stores.

So one version of gcc doesn't do it.  Others do because I've been
surprised seeing it in assembly.

  But additionally, even if gcc combines adjacent writes _that are part
  of the program flow_ then I believe the situation is no worse than
  would otherwise exist.
 
  For instance, given the following:
 
  struct x {
 spinlock_t lock;
 long a;
 byte b;
 byte c;
  };
 
  void locked_store_b(struct x *p)
  {
 spin_lock(p-lock);
 p-b = 1;
 spin_unlock(p-lock);
 p-c = 2;
  }
 
  Granted, the author probably expects ordered writes of
 STORE B
 STORE C
  but that's not guaranteed because there is no memory barrier
  ordering B before C.
  
  Yes, there is: loads and stores may not migrate into or out of critical
  sections.
 
 That's a common misconception.
 
 The processor is free to re-order this to:
 
   STORE C
   STORE B
   UNLOCK
 
 That's because the unlock() only guarantees that:
 
 Stores before the unlock in program order are guaranteed to complete
 before the unlock completes. Stores after the unlock _may_ complete
 before the unlock completes.
 
 My point was that even if compiler barriers had the same semantics
 as memory barriers, the situation would be no worse. That is, code
 that is sensitive to memory barriers (like the example I gave above)
 would merely have the same fragility with one-way compiler barriers
 (with respect to the compiler combining writes).
 
 That's what I meant by no worse than would otherwise exist.

Actually, that's not correct.  This is actually deja vu with me on the
other side of the argument.  When we first did spinlocks on PA, I argued
as you did: lock only a barrier for code after and unlock for code
before.  The failing case is that you can have a critical section which
performs an atomically required operation and a following unit which
depends on it being performed.  If you begin the following unit before
the atomic requirement, you may end up losing.  It turns out this kind
of pattern is inherent in a lot of mail box device drivers: you need to
set up the mailbox atomically then poke it.  Setup is usually atomic,
deciding which mailbox to prime and actually poking it is in the
following unit.  Priming often involves an I/O bus transaction and if
you poke before priming, you get a misfire.

  I see no problem with gcc write-combining in the absence of
  memory barriers (as long as alignment is still respected,
  ie., the size-promoted write is still naturally aligned). The
  combined write is still atomic.
  
  Actual alignment is pretty irrelevant.  That's why all architectures
  which require alignment also have to implement misaligned traps ... this
  is a fundamental requirement of the networking code, for instance.
  
  The main problem is gcc thinking there's a misalignment (or a packed
  structure).  That causes splitting of loads and stores and that destroys
  atomicity.
 
 Yeah, the extra requirement I added is basically nonsense, since the
 only issue is what instructions the compiler is emitting. So if compiler
 thinks the alignment is natural and combines the writes -- ok. If the
 compiler thinks the alignment is off and doesn't combine the writes --
 also ok.

Yes, I think I can agree that the only real problem is gcc thinking the
store or load needs splitting.

James


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: bit fields data tearing

2014-09-07 Thread James Bottomley
On Sun, 2014-09-07 at 09:21 -0700, Paul E. McKenney wrote:
 On Sat, Sep 06, 2014 at 10:07:22PM -0700, James Bottomley wrote:
  On Thu, 2014-09-04 at 21:06 -0700, Paul E. McKenney wrote:
   On Thu, Sep 04, 2014 at 10:47:24PM -0400, Peter Hurley wrote:
Hi James,

On 09/04/2014 10:11 PM, James Bottomley wrote:
 On Thu, 2014-09-04 at 17:17 -0700, Paul E. McKenney wrote:
 +And there are anti-guarantees:
 +
 + (*) These guarantees do not apply to bitfields, because compilers 
 often
 + generate code to modify these using non-atomic 
 read-modify-write
 + sequences.  Do not attempt to use bitfields to synchronize 
 parallel
 + algorithms.
 +
 + (*) Even in cases where bitfields are protected by locks, all 
 fields
 + in a given bitfield must be protected by one lock.  If two 
 fields
 + in a given bitfield are protected by different locks, the 
 compiler's
 + non-atomic read-modify-write sequences can cause an update to 
 one
 + field to corrupt the value of an adjacent field.
 +
 + (*) These guarantees apply only to properly aligned and sized 
 scalar
 + variables.  Properly sized currently means int and long,
 + because some CPU families do not support loads and stores of
 + other sizes.  (Some CPU families is currently believed to
 + be only Alpha 21064.  If this is actually the case, a different
 + non-guarantee is likely to be formulated.)
 
 This is a bit unclear.  Presumably you're talking about definiteness 
 of
 the outcome (as in what's seen after multiple stores to the same
 variable).

No, the last conditions refers to adjacent byte stores from different
cpu contexts (either interrupt or SMP).

 The guarantees are only for natural width on Parisc as well,
 so you would get a mess if you did byte stores to adjacent memory
 locations.

For a simple test like:

struct x {
long a;
char b;
char c;
char d;
char e;
};

void store_bc(struct x *p) {
p-b = 1;
p-c = 2;
}

on parisc, gcc generates separate byte stores

void store_bc(struct x *p) {
   0:   34 1c 00 02 ldi 1,ret0
   4:   0f 5c 12 08 stb ret0,4(r26)
   8:   34 1c 00 04 ldi 2,ret0
   c:   e8 40 c0 00 bv r0(rp)
  10:   0f 5c 12 0a stb ret0,5(r26)

which appears to confirm that on parisc adjacent byte data
is safe from corruption by concurrent cpu updates; that is,

CPU 0| CPU 1
 |
p-b = 1 | p-c = 2
 |

will result in p-b == 1  p-c == 2 (assume both values
were 0 before the call to store_bc()).
   
   What Peter said.  I would ask for suggestions for better wording, but
   I would much rather be able to say that single-byte reads and writes
   are atomic and that aligned-short reads and writes are also atomic.
   
   Thus far, it looks like we lose only very old Alpha systems, so unless
   I hear otherwise, I update my patch to outlaw these very old systems.
  
  This isn't universally true according to the architecture manual.  The
  PARISC CPU can make byte to long word stores atomic against the memory
  bus but not against the I/O bus for instance.  Atomicity is a property
  of the underlying substrate, not of the CPU.  Implying that atomicity is
  a CPU property is incorrect.
 
 OK, fair point.
 
 But are there in-use-for-Linux PARISC memory fabrics (for normal memory,
 not I/O) that do not support single-byte and double-byte stores?

For aligned access, I believe that's always the case for the memory bus
(on both 32 and 64 bit systems).  However, it only applies to machine
instruction loads and stores of the same width..  If you mix the widths
on the loads and stores, all bets are off.  That means you have to
beware of the gcc penchant for coalescing loads and stores: if it sees
two adjacent byte stores it can coalesce them into a short store
instead ... that screws up the atomicity guarantees.

James


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: bit fields data tearing

2014-09-07 Thread James Bottomley
On Sun, 2014-09-07 at 16:41 -0400, Peter Hurley wrote:
 On 09/07/2014 03:04 PM, James Bottomley wrote:
  On Sun, 2014-09-07 at 09:21 -0700, Paul E. McKenney wrote:
  On Sat, Sep 06, 2014 at 10:07:22PM -0700, James Bottomley wrote:
  On Thu, 2014-09-04 at 21:06 -0700, Paul E. McKenney wrote:
  On Thu, Sep 04, 2014 at 10:47:24PM -0400, Peter Hurley wrote:
  Hi James,
 
  On 09/04/2014 10:11 PM, James Bottomley wrote:
  On Thu, 2014-09-04 at 17:17 -0700, Paul E. McKenney wrote:
  +And there are anti-guarantees:
  +
  + (*) These guarantees do not apply to bitfields, because compilers 
  often
  + generate code to modify these using non-atomic read-modify-write
  + sequences.  Do not attempt to use bitfields to synchronize 
  parallel
  + algorithms.
  +
  + (*) Even in cases where bitfields are protected by locks, all fields
  + in a given bitfield must be protected by one lock.  If two 
  fields
  + in a given bitfield are protected by different locks, the 
  compiler's
  + non-atomic read-modify-write sequences can cause an update to 
  one
  + field to corrupt the value of an adjacent field.
  +
  + (*) These guarantees apply only to properly aligned and sized scalar
  + variables.  Properly sized currently means int and long,
  + because some CPU families do not support loads and stores of
  + other sizes.  (Some CPU families is currently believed to
  + be only Alpha 21064.  If this is actually the case, a different
  + non-guarantee is likely to be formulated.)
 
  This is a bit unclear.  Presumably you're talking about definiteness of
  the outcome (as in what's seen after multiple stores to the same
  variable).
 
  No, the last conditions refers to adjacent byte stores from different
  cpu contexts (either interrupt or SMP).
 
  The guarantees are only for natural width on Parisc as well,
  so you would get a mess if you did byte stores to adjacent memory
  locations.
 
  For a simple test like:
 
  struct x {
  long a;
  char b;
  char c;
  char d;
  char e;
  };
 
  void store_bc(struct x *p) {
  p-b = 1;
  p-c = 2;
  }
 
  on parisc, gcc generates separate byte stores
 
  void store_bc(struct x *p) {
 0:   34 1c 00 02 ldi 1,ret0
 4:   0f 5c 12 08 stb ret0,4(r26)
 8:   34 1c 00 04 ldi 2,ret0
 c:   e8 40 c0 00 bv r0(rp)
10:   0f 5c 12 0a stb ret0,5(r26)
 
  which appears to confirm that on parisc adjacent byte data
  is safe from corruption by concurrent cpu updates; that is,
 
  CPU 0| CPU 1
   |
  p-b = 1 | p-c = 2
   |
 
  will result in p-b == 1  p-c == 2 (assume both values
  were 0 before the call to store_bc()).
 
  What Peter said.  I would ask for suggestions for better wording, but
  I would much rather be able to say that single-byte reads and writes
  are atomic and that aligned-short reads and writes are also atomic.
 
  Thus far, it looks like we lose only very old Alpha systems, so unless
  I hear otherwise, I update my patch to outlaw these very old systems.
 
  This isn't universally true according to the architecture manual.  The
  PARISC CPU can make byte to long word stores atomic against the memory
  bus but not against the I/O bus for instance.  Atomicity is a property
  of the underlying substrate, not of the CPU.  Implying that atomicity is
  a CPU property is incorrect.
 
 To go back to this briefly, while it's true that atomicity is not
 a CPU property, atomicity is not possible if the CPU is not cooperating.

You mean if it doesn't have the bus logic to emit?  Like ia32 mostly
can't do 64 bit writes ... sure.

  OK, fair point.
 
  But are there in-use-for-Linux PARISC memory fabrics (for normal memory,
  not I/O) that do not support single-byte and double-byte stores?
  
  For aligned access, I believe that's always the case for the memory bus
  (on both 32 and 64 bit systems).  However, it only applies to machine
  instruction loads and stores of the same width..  If you mix the widths
  on the loads and stores, all bets are off.  That means you have to
  beware of the gcc penchant for coalescing loads and stores: if it sees
  two adjacent byte stores it can coalesce them into a short store
  instead ... that screws up the atomicity guarantees.
 
 Two things: I think that gcc has given up on combining adjacent writes,
 perhaps because unaligned writes on some arches are prohibitive, so
 whatever minor optimization was believed to be gained was quickly lost,
 multi-fold. (Although this might not be the reason since one would
 expect that gcc would know the alignment of the promoted store).

Um, gcc assumes architecturally correct alignment; that's why it pads
structures.  Only when accessing packed structures will it use the
lowest unit load/store.

if you have a struct { char a, char b }; and load first a then b with a
constant gcc will obligingly optimise

Re: bit fields data tearing

2014-09-07 Thread James Bottomley
On Sun, 2014-09-07 at 16:39 -0700, H. Peter Anvin wrote:
 How many PARISC systems do we have that actually do real work on Linux?

I'd be very surprised if this problem didn't exist on all alignment
requiring architectures, like PPC and Sparc as well.  I know it would be
very convenient if all the world were an x86 ... but it would also be
very boring as well.

The rules for coping with it are well known and a relaxation of what we
currently do in the kernel, so I don't see what the actual problem is.

In the context of this thread, PA can't do atomic bit sets (no atomic
RMW except the ldcw operation) it can do atomic writes to fundamental
sizes (byte, short, int, long) provided gcc emits the correct primitive
(there are lots of gotchas in this, but that's not an architectural
problem).  These atomicity guarantees depend on the underlying storage
and are respected for main memory but not for any other type of bus.

James


 On September 7, 2014 4:36:55 PM PDT, Paul E. McKenney 
 paul...@linux.vnet.ibm.com wrote:
 On Sun, Sep 07, 2014 at 04:17:30PM -0700, H. Peter Anvin wrote:
  I'm confused why storing 0x0102 would be a problem.  I think gcc does
 that even on other cpus.
  
  More atomicity can't hurt, can it?
 
 I must defer to James for any additional details on why PARISC systems
 don't provide atomicity for partially overlapping stores.  ;-)
 
  Thanx, Paul
 
  On September 7, 2014 4:00:19 PM PDT, Paul E. McKenney
 paul...@linux.vnet.ibm.com wrote:
  On Sun, Sep 07, 2014 at 12:04:47PM -0700, James Bottomley wrote:
   On Sun, 2014-09-07 at 09:21 -0700, Paul E. McKenney wrote:
On Sat, Sep 06, 2014 at 10:07:22PM -0700, James Bottomley wrote:
 On Thu, 2014-09-04 at 21:06 -0700, Paul E. McKenney wrote:
  On Thu, Sep 04, 2014 at 10:47:24PM -0400, Peter Hurley
 wrote:
   Hi James,
   
   On 09/04/2014 10:11 PM, James Bottomley wrote:
On Thu, 2014-09-04 at 17:17 -0700, Paul E. McKenney
 wrote:
+And there are anti-guarantees:
+
+ (*) These guarantees do not apply to bitfields,
 because
  compilers often
+ generate code to modify these using non-atomic
  read-modify-write
+ sequences.  Do not attempt to use bitfields to
  synchronize parallel
+ algorithms.
+
+ (*) Even in cases where bitfields are protected by
  locks, all fields
+ in a given bitfield must be protected by one
 lock. 
  If two fields
+ in a given bitfield are protected by different
  locks, the compiler's
+ non-atomic read-modify-write sequences can cause
 an
  update to one
+ field to corrupt the value of an adjacent field.
+
+ (*) These guarantees apply only to properly aligned
 and
  sized scalar
+ variables.  Properly sized currently means
 int
  and long,
+ because some CPU families do not support loads
 and
  stores of
+ other sizes.  (Some CPU families is currently
  believed to
+ be only Alpha 21064.  If this is actually the
 case,
  a different
+ non-guarantee is likely to be formulated.)

This is a bit unclear.  Presumably you're talking about
  definiteness of
the outcome (as in what's seen after multiple stores to
 the
  same
variable).
   
   No, the last conditions refers to adjacent byte stores
 from
  different
   cpu contexts (either interrupt or SMP).
   
The guarantees are only for natural width on Parisc as
  well,
so you would get a mess if you did byte stores to
 adjacent
  memory
locations.
   
   For a simple test like:
   
   struct x {
   long a;
   char b;
   char c;
   char d;
   char e;
   };
   
   void store_bc(struct x *p) {
   p-b = 1;
   p-c = 2;
   }
   
   on parisc, gcc generates separate byte stores
   
   void store_bc(struct x *p) {
  0:   34 1c 00 02 ldi 1,ret0
  4:   0f 5c 12 08 stb ret0,4(r26)
  8:   34 1c 00 04 ldi 2,ret0
  c:   e8 40 c0 00 bv r0(rp)
 10:   0f 5c 12 0a stb ret0,5(r26)
   
   which appears to confirm that on parisc adjacent byte data
   is safe from corruption by concurrent cpu updates; that
 is,
   
   CPU 0| CPU 1
|
   p-b = 1 | p-c = 2
|
   
   will result in p-b == 1  p-c == 2 (assume both values
   were 0 before the call to store_bc()).
  
  What Peter said.  I would ask for suggestions for better
  wording, but
  I would much rather be able to say that single-byte reads
 and
  writes
  are atomic and that aligned-short reads and writes are also
  atomic.
  
  Thus far, it looks like we lose only very

Re: bit fields data tearing

2014-09-06 Thread James Bottomley
On Thu, 2014-09-04 at 21:06 -0700, Paul E. McKenney wrote:
 On Thu, Sep 04, 2014 at 10:47:24PM -0400, Peter Hurley wrote:
  Hi James,
  
  On 09/04/2014 10:11 PM, James Bottomley wrote:
   On Thu, 2014-09-04 at 17:17 -0700, Paul E. McKenney wrote:
   +And there are anti-guarantees:
   +
   + (*) These guarantees do not apply to bitfields, because compilers often
   + generate code to modify these using non-atomic read-modify-write
   + sequences.  Do not attempt to use bitfields to synchronize parallel
   + algorithms.
   +
   + (*) Even in cases where bitfields are protected by locks, all fields
   + in a given bitfield must be protected by one lock.  If two fields
   + in a given bitfield are protected by different locks, the 
   compiler's
   + non-atomic read-modify-write sequences can cause an update to one
   + field to corrupt the value of an adjacent field.
   +
   + (*) These guarantees apply only to properly aligned and sized scalar
   + variables.  Properly sized currently means int and long,
   + because some CPU families do not support loads and stores of
   + other sizes.  (Some CPU families is currently believed to
   + be only Alpha 21064.  If this is actually the case, a different
   + non-guarantee is likely to be formulated.)
   
   This is a bit unclear.  Presumably you're talking about definiteness of
   the outcome (as in what's seen after multiple stores to the same
   variable).
  
  No, the last conditions refers to adjacent byte stores from different
  cpu contexts (either interrupt or SMP).
  
   The guarantees are only for natural width on Parisc as well,
   so you would get a mess if you did byte stores to adjacent memory
   locations.
  
  For a simple test like:
  
  struct x {
  long a;
  char b;
  char c;
  char d;
  char e;
  };
  
  void store_bc(struct x *p) {
  p-b = 1;
  p-c = 2;
  }
  
  on parisc, gcc generates separate byte stores
  
  void store_bc(struct x *p) {
 0:   34 1c 00 02 ldi 1,ret0
 4:   0f 5c 12 08 stb ret0,4(r26)
 8:   34 1c 00 04 ldi 2,ret0
 c:   e8 40 c0 00 bv r0(rp)
10:   0f 5c 12 0a stb ret0,5(r26)
  
  which appears to confirm that on parisc adjacent byte data
  is safe from corruption by concurrent cpu updates; that is,
  
  CPU 0| CPU 1
   |
  p-b = 1 | p-c = 2
   |
  
  will result in p-b == 1  p-c == 2 (assume both values
  were 0 before the call to store_bc()).
 
 What Peter said.  I would ask for suggestions for better wording, but
 I would much rather be able to say that single-byte reads and writes
 are atomic and that aligned-short reads and writes are also atomic.
 
 Thus far, it looks like we lose only very old Alpha systems, so unless
 I hear otherwise, I update my patch to outlaw these very old systems.

This isn't universally true according to the architecture manual.  The
PARISC CPU can make byte to long word stores atomic against the memory
bus but not against the I/O bus for instance.  Atomicity is a property
of the underlying substrate, not of the CPU.  Implying that atomicity is
a CPU property is incorrect.

James


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: bit fields data tearing

2014-09-04 Thread James Bottomley
On Thu, 2014-09-04 at 17:17 -0700, Paul E. McKenney wrote:
 +And there are anti-guarantees:
 +
 + (*) These guarantees do not apply to bitfields, because compilers often
 + generate code to modify these using non-atomic read-modify-write
 + sequences.  Do not attempt to use bitfields to synchronize parallel
 + algorithms.
 +
 + (*) Even in cases where bitfields are protected by locks, all fields
 + in a given bitfield must be protected by one lock.  If two fields
 + in a given bitfield are protected by different locks, the compiler's
 + non-atomic read-modify-write sequences can cause an update to one
 + field to corrupt the value of an adjacent field.
 +
 + (*) These guarantees apply only to properly aligned and sized scalar
 + variables.  Properly sized currently means int and long,
 + because some CPU families do not support loads and stores of
 + other sizes.  (Some CPU families is currently believed to
 + be only Alpha 21064.  If this is actually the case, a different
 + non-guarantee is likely to be formulated.)

This is a bit unclear.  Presumably you're talking about definiteness of
the outcome (as in what's seen after multiple stores to the same
variable).  The guarantees are only for natural width on Parisc as well,
so you would get a mess if you did byte stores to adjacent memory
locations.  But multiple 32 bit stores guarantees to see one of the
stored values as the final outcome.

James


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RESEND][PATCH 1/2] lib/scatterlist: Make ARCH_HAS_SG_CHAIN an actual Kconfig

2014-03-22 Thread James Bottomley
On Sat, 2014-03-22 at 11:13 -0700, Laura Abbott wrote:
 Rather than have architectures #define ARCH_HAS_SG_CHAIN in an architecture
 specific scatterlist.h, make it a proper Kconfig option and use that
 instead. At same time, remove the header files are are now mostly
 useless and just include asm-generic/scatterlist.h.

Well, the transformation looks fine.  Perhaps part of the reason for the
lack of response is that there's no compelling reason in the change log
above for doing this.  The usual reason for eliminating ARCH_HAS is that
it's hiding something that would be better expressed a different way
(that's actually intuitive to grep) or that it's expressing something
that should be configurable.  Neither of these reasons apply in this
case, because SG_CHAIN definitely is a property of the architecture not
the config space and it's not really hiding anything.

Perhaps now might be the time to ask which are the remaining
architectures that cannot do SG chaining and then we can fix them and
pull the whole thing out.

James


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RESEND][PATCH 1/2] lib/scatterlist: Make ARCH_HAS_SG_CHAIN an actual Kconfig

2014-03-22 Thread James Bottomley
On Sat, 2014-03-22 at 22:23 +, Russell King - ARM Linux wrote:
 On Sat, Mar 22, 2014 at 02:31:21PM -0700, James Bottomley wrote:
  Perhaps now might be the time to ask which are the remaining
  architectures that cannot do SG chaining and then we can fix them and
  pull the whole thing out.
 
 Not quite.  You're making the assumption that we can be sure that all
 the scatterlist users on an architecture have been converted - that's
 simply not true on ARM.

No I'm not, I said now might be the time to ask which are the remaining
architectures that cannot do SG chaining I think it's time to list them
so we know what work remains.  I know we've got a bunch in parisc (all
of our iommu code in driver/parisc - about 5 different ones - are
unconverted).  However, the conversion is pretty simple; it's mostly
replacing sglist++ with sglist=sg_next(sglist)

   We have some which have, and some which still
 have not been audited.
 
 The cases that get us here would be old platform DMA code which walks
 scatterlists handed to it from drivers - stuff like
 arch/arm/mach-rpc/dma.c (which probably can cope), and drivers/scsi/arm/*
 (which definitely can't because of their SCSI pointers save/restore
 handling message.)  I know that's one case where SG_CHAIN definitely
 isn't supported on ARM.
 
 So, we had decided not to enable it, but this means that new stuff
 isn't benefitting from this.  I've recently asked arm-soc to enable
 it for the modern multi-platform builds, because modern stuff really
 be written with correct SG chaining in mind.

OK, so lets see what the actual effort is.

James



___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RESEND][PATCH 1/2] lib/scatterlist: Make ARCH_HAS_SG_CHAIN an actual Kconfig

2014-03-22 Thread James Bottomley
On Sat, 2014-03-22 at 22:52 +, Russell King - ARM Linux wrote:
 On Sat, Mar 22, 2014 at 03:37:40PM -0700, James Bottomley wrote:
  On Sat, 2014-03-22 at 22:23 +, Russell King - ARM Linux wrote:
   On Sat, Mar 22, 2014 at 02:31:21PM -0700, James Bottomley wrote:
Perhaps now might be the time to ask which are the remaining
architectures that cannot do SG chaining and then we can fix them and
pull the whole thing out.
   
   Not quite.  You're making the assumption that we can be sure that all
   the scatterlist users on an architecture have been converted - that's
   simply not true on ARM.
  
  No I'm not, I said now might be the time to ask which are the remaining
  architectures that cannot do SG chaining
 
 And I'm disagreeing with that statement which implies that it's something
 that is an architecture wide property for any particular architecture.
 
 Right now in mainline, if ARM has IOMMU support enabled, then SG_CHAIN
 support will also be enabled.  I've a patch out of tree which I've been
 using for years which enables SG_CHAIN for a particular SoC (Dove).
 Otherwise, it doesn't have support for SG_CHAIN.
 
 PARISC on the other hand (as you list) has no support to enable SG_CHAIN
 under any circumstances.
 
 Where we're disagreeing is whether this is something that is always-on or
 always-off for any particular architecture.

Actually, I don't disagree with that.  PA used to share sb_iommu with
ia64 (it's the same chipset for the HP versions), but we can't now
because ia64 is chained and we're not and there's no way to say chained
for this platform but not for these other more legacy ones.  If you have
a proposal for this, I'd be interested, so I don't have to do an all or
nothing conversion, but the config option isn't it because our platform
configuration is runtime determined (we usually select every driver and
let the actual one be chosen at runtime from the config table).

James


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH RESEND 1/1] arch Kconfig: remove references to IRQ_PER_CPU

2013-02-04 Thread James Bottomley
On Mon, 2013-02-04 at 10:09 +, James Hogan wrote:
 The IRQ_PER_CPU Kconfig symbol was removed in the following commit:
 
 Commit 6a58fb3bad099076f36f0f30f44507bc3275cdb6 (genirq: Remove
 CONFIG_IRQ_PER_CPU) merged in v2.6.39-rc1.
 
 But IRQ_PER_CPU wasn't removed from any of the architecture Kconfig
 files where it was defined or selected. It's completely unused so remove
 the remaining references.
 
 Signed-off-by: James Hogan james.ho...@imgtec.com
 Cc: Thomas Gleixner t...@linutronix.de
 Cc: Mike Frysinger vap...@gentoo.org
 Cc: Fenghua Yu fenghua...@intel.com
 Cc: Ralf Baechle r...@linux-mips.org
 Cc: James E.J. Bottomley j...@parisc-linux.org
 Cc: Helge Deller del...@gmx.de
 Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
 Cc: Paul Mackerras pau...@samba.org
 Cc: Paul Mundt let...@linux-sh.org
 Acked-by: Tony Luck tony.l...@intel.com
 Acked-by: Richard Kuo r...@codeaurora.org

For what it's worth ACK, but I don't really think you need it since the
patch is trivial and obviously correct.

 
 Does anybody want to pick this patch up?

I see Thomas already has.  Thanks, by the way, for not doing this as one
patch per architecture ...

James


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 1/8] mm: use vm_unmapped_area() on parisc architecture

2013-01-24 Thread James Bottomley
On Wed, 2013-01-23 at 17:29 -0800, Michel Lespinasse wrote:
 Update the parisc arch_get_unmapped_area function to make use of
 vm_unmapped_area() instead of implementing a brute force search.
 
 Signed-off-by: Michel Lespinasse wal...@google.com
 Acked-by: Rik van Riel r...@redhat.com
 
 ---
  arch/parisc/kernel/sys_parisc.c |   46 ++
  1 files changed, 17 insertions(+), 29 deletions(-)
 
 diff --git a/arch/parisc/kernel/sys_parisc.c b/arch/parisc/kernel/sys_parisc.c
 index f76c10863c62..6ab138088076 100644
 --- a/arch/parisc/kernel/sys_parisc.c
 +++ b/arch/parisc/kernel/sys_parisc.c
 @@ -35,18 +35,15 @@
  
  static unsigned long get_unshared_area(unsigned long addr, unsigned long len)
  {
 - struct vm_area_struct *vma;
 + struct vm_unmapped_area_info info;
  
 - addr = PAGE_ALIGN(addr);
 -
 - for (vma = find_vma(current-mm, addr); ; vma = vma-vm_next) {
 - /* At this point:  (!vma || addr  vma-vm_end). */
 - if (TASK_SIZE - len  addr)
 - return -ENOMEM;
 - if (!vma || addr + len = vma-vm_start)
 - return addr;
 - addr = vma-vm_end;
 - }
 + info.flags = 0;
 + info.length = len;
 + info.low_limit = PAGE_ALIGN(addr);
 + info.high_limit = TASK_SIZE;
 + info.align_mask = 0;
 + info.align_offset = 0;
 + return vm_unmapped_area(info);
  }
  
  #define DCACHE_ALIGN(addr) (((addr) + (SHMLBA - 1)) ~ (SHMLBA - 1))

This macro is now redundant and can be removed.

 @@ -63,30 +60,21 @@ static unsigned long get_unshared_area(unsigned long 
 addr, unsigned long len)
   */
  static int get_offset(struct address_space *mapping)
  {
 - int offset = (unsigned long) mapping  (PAGE_SHIFT - 8);
 - return offset  0x3FF000;
 + return (unsigned long) mapping  8;

I'm not sure I agree with this shift (but I think the original was wrong
as well so the comment probably needs updating.)  Trying to derive
entropy from the mapping address is always nasty.  Mostly they're
embedded in the inode, so the right shift should be something like
log2(sizeof(inode)) + 1 and since the inode size is usually somewhere
between 512 and 1024 bytes, that comes out to 10 I think.

  }
  
  static unsigned long get_shared_area(struct address_space *mapping,
   unsigned long addr, unsigned long len, unsigned long pgoff)
  {
 - struct vm_area_struct *vma;
 - int offset = mapping ? get_offset(mapping) : 0;
 -
 - offset = (offset + (pgoff  PAGE_SHIFT))  0x3FF000;
 + struct vm_unmapped_area_info info;
  
 - addr = DCACHE_ALIGN(addr - offset) + offset;
 -
 - for (vma = find_vma(current-mm, addr); ; vma = vma-vm_next) {
 - /* At this point:  (!vma || addr  vma-vm_end). */
 - if (TASK_SIZE - len  addr)
 - return -ENOMEM;
 - if (!vma || addr + len = vma-vm_start)
 - return addr;
 - addr = DCACHE_ALIGN(vma-vm_end - offset) + offset;
 - if (addr  vma-vm_end) /* handle wraparound */
 - return -ENOMEM;
 - }
 + info.flags = 0;
 + info.length = len;
 + info.low_limit = PAGE_ALIGN(addr);
 + info.high_limit = TASK_SIZE;
 + info.align_mask = PAGE_MASK  (SHMLBA - 1);
 + info.align_offset = (get_offset(mapping) + pgoff)  PAGE_SHIFT;
 + return vm_unmapped_area(info);
  }
  
  unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr,

Other than that, I think this will work, but I'd like to test it.

James


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH] scsi/ibmvscsi: /sys/class/scsi_host/hostX/config doesn't show any information

2012-07-27 Thread James Bottomley
On Fri, 2012-07-27 at 15:19 +1000, Benjamin Herrenschmidt wrote:
 On Wed, 2012-07-18 at 18:49 +0200, o...@aepfle.de wrote:
  From: Linda Xie lx...@us.ibm.com
 
 James, can I assume you're picking up those two ?

If they get acked by the maintiners ...

James


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: Build regressions/improvements in v3.4-rc7

2012-05-16 Thread James Bottomley
On Wed, 2012-05-16 at 10:47 +0200, Geert Uytterhoeven wrote:
 On Wed, May 16, 2012 at 10:30 AM, Geert Uytterhoeven
 ge...@linux-m68k.org wrote:
   + lib/mpi/generic_mpih-mul1.c: error: inconsistent operand
 constraints in an 'asm':  = 50:70
   + lib/mpi/generic_mpih-mul2.c: error: inconsistent operand
 constraints in an 'asm':  = 49:70
   + lib/mpi/generic_mpih-mul3.c: error: inconsistent operand
 constraints in an 'asm':  = 49:70
   + lib/mpi/mpih-div.c: error: inconsistent operand constraints in an
 'asm':  = 135:122, 135:371, 97:122, 106:121, 106:370, 97:371
 
 parisc-allmodconfig

Wow, lib/mpi/ is a complete horror: it's full of hand crafted asm code.
The error in this case appears to be that umul_ppm() is implemented as
an xmpyu instruction.  That's a floating point instruction.  We
deliberately compile the kernel with floating point disabled because we
don't want to save and restore the floating point register file on each
context switch, hence the operand constraints are unsatisfiable.

It appears to be completely untested on non-x86 and to have been
imported via the security tree ... what are we supposed to do with this?
I thought the general principle was that asm code was really supposed to
be confined to the arch directories?

James


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: linux-next: build failure in Linus' tree

2011-08-29 Thread James Bottomley
On Tue, 2011-08-30 at 08:32 +1000, Stephen Rothwell wrote:
 Hi Linus,
 
 On Mon, 29 Aug 2011 10:44:51 +1000 Stephen Rothwell s...@canb.auug.org.au 
 wrote:
 
  After merging the fixes tree, today's linux-next build (powerpc
  ppc64_defconfig) failed like this:
  
  arch/powerpc/kernel/built-in.o: In function `.sys_call_table':
  (.text+0xbd00): undefined reference to `.sys_nfsservctl'
  arch/powerpc/kernel/built-in.o: In function `.sys_call_table':
  (.text+0xbd08): undefined reference to `.compat_sys_nfsservctl'
  
  Caused by commit f5b940997397 (All Arch: remove linkage for
  sys_nfsservctl system call) which also missed parisc.
  
  I will apply this patch for today:
 
 Will you please appply this?  (repeated for ease of inclusion)
 
 From: Stephen Rothwell s...@canb.auug.org.au
 Date: Mon, 29 Aug 2011 10:38:57 +1000
 Subject: [PATCH] remove remaining references to nfsservctl
 
 These were missed in commit f5b940997397 All Arch: remove linkage
 for sys_nfsservctl system call due to them having no sys_ prefix
 (presumably).
 
 Cc: NeilBrown ne...@suse.de
 Cc: linuxppc-dev@lists.ozlabs.org
 Cc: linux-par...@vger.kernel.org
 Signed-off-by: Stephen Rothwell s...@canb.auug.org.au

Thanks for finding this ... definitely acked by me if necessary.

James


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 1/3] mpt2sas: remove the use of writeq, since writeq is not atomic

2011-05-18 Thread James Bottomley
On Thu, 2011-05-19 at 13:08 +0900, Hitoshi Mitake wrote:
 On Thu, May 19, 2011 at 04:11, Moore, Eric eric.mo...@lsi.com wrote:
  On Wednesday, May 18, 2011 12:31 PM Milton Miller wrote:
  Ingo I would propose the following commits added in 2.6.29 be reverted.
  I think the current concensus is drivers must know if the writeq is
  not atomic so they can provide their own locking or other workaround.
 
 
 
  Exactly.
 
 
 The original motivation of preparing common readq/writeq is that
 letting each driver
 have their own readq/writeq is bad for maintenance of source code.
 
 But if you really dislike them, there might be two solutions:
 
 1. changing the name of readq/writeq to readq_nonatomic/writeq_nonatomic

This is fine, but not really very useful

 2. adding new C file to somewhere and defining spinlock for them.
 With spin_lock_irqsave() and spin_unlock_irqrestore() on the spinlock,
 readq/writeq can be atomic.

This can't really be done generically.  There are several considerations
to do with hardware requirements.  I can see some hw requiring a
specific write order (I think this applies more to read order, though).

The specific mpt2sas problem is that if we write a 64 bit register non
atomically, we can't allow any interleaving writes for any other region
on the chip, otherwise the HW will take the write as complete in the 64
bit register and latch the wrong value.  The only way to achieve that
given the semantics of writeq is a global static spinlock.

 How do you think about them? If you cannot agree with the above two
 solutions, I'll agree with reverting them.

Having x86 roll its own never made any sense, so I think they need
reverting anyway.  This is a driver/platform bus problem not an
architecture problem.  The assumption we can make is that the platform
CPU can write atomically at its chip width.  We *may* be able to make
the assumption that the bus controller can translate an atomic chip
width transaction to a single atomic bus transaction; I think that
assumption holds true for at least PCI and on the parisc legacy busses,
so if we can agree on semantics, this should be a global define
somewhere.  If there are problems with the bus assumption, we'll likely
need some type of opt-in (or just not bother).

James


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 1/3] mpt2sas: remove the use of writeq, since writeq is not atomic

2011-05-17 Thread James Bottomley
On Tue, 2011-05-17 at 22:15 -0600, Matthew Wilcox wrote:
 On Wed, May 18, 2011 at 09:37:08AM +0530, Desai, Kashyap wrote:
  On Wed, 2011-05-04 at 17:23 +0530, Kashyap, Desai wrote:
   The following code seems to be there in 
   /usr/src/linux/arch/x86/include/asm/io.h.
   This is not going to work.
   
   static inline void writeq(__u64 val, volatile void __iomem *addr)
   {
   writel(val, addr);
   writel(val  32, addr+4);
   }
   
   So with this code turned on in the kernel, there is going to be race 
   condition 
   where multiple cpus can be writing to the request descriptor at the same 
   time.
   
   Meaning this could happen:
   (A) CPU A doest 32bit write
   (B) CPU B does 32 bit write
   (C) CPU A does 32 bit write
   (D) CPU B does 32 bit write
   
   We need the 64 bit completed in one access pci memory write, else spin 
   lock is required.
   Since it's going to be difficult to know which writeq was implemented in 
   the kernel, 
   the driver is going to have to always acquire a spin lock each time we do 
   64bit write.
   
   Cc: sta...@kernle.org
   Signed-off-by: Kashyap Desai kashyap.de...@lsi.com
   ---
   diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.c 
   b/drivers/scsi/mpt2sas/mpt2sas_base.c
   index efa0255..5778334 100644
   --- a/drivers/scsi/mpt2sas/mpt2sas_base.c
   +++ b/drivers/scsi/mpt2sas/mpt2sas_base.c
   @@ -1558,7 +1558,6 @@ mpt2sas_base_free_smid(struct MPT2SAS_ADAPTER *ioc, 
   u16 smid)
 * care of 32 bit environment where its not quarenteed to send the 
   entire word
 * in one transfer.
 */
   -#ifndef writeq
  
  Why not make this #ifndef CONFIG_64BIT?  You know that all 64 bit
  systems have writeq implemented correctly; you suspect 32 bit systems
  don't.
  
  James
  
  James, This issue was observed on PPC64 system. So what you have suggested 
  will not solve this issue.
  If we are sure that writeq() is atomic across all architecture, we can use 
  it safely. As we have seen issue on ppc64, we are not confident to use
  writeq call.
 
 So have you told the powerpc people that they have a broken writeq?

I'm just in the process of finding them now on IRC so I can demand an
explanation: this is a really serious API problem because writeq is
supposed to be atomic on 64 bit.

 And why do you obfuscate your report by talking about i386 when it's
 really about powerpc64?

James


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH] atomic: add *_dec_not_zero

2011-05-04 Thread James Bottomley
On Wed, 2011-05-04 at 00:44 -0400, Mike Frysinger wrote:
 On Tue, May 3, 2011 at 17:30, Sven Eckelmann wrote:
  Introduce an *_dec_not_zero operation.  Make this a special case of
  *_add_unless because batman-adv uses atomic_dec_not_zero in different
  places like re-broadcast queue or aggregation queue management. There
  are other non-final patches which may also want to use this macro.
 
  Cc: uclinux-dist-de...@blackfin.uclinux.org
 
  --- a/arch/blackfin/include/asm/atomic.h
  +++ b/arch/blackfin/include/asm/atomic.h
  @@ -103,6 +103,7 @@ static inline int atomic_test_mask(int mask, atomic_t 
  *v)
 c != (u);   \
   })
   #define atomic_inc_not_zero(v) atomic_add_unless((v), 1, 0)
  +#define atomic_dec_not_zero(v) atomic_add_unless((v), -1, 0)
 
   /*
   * atomic_inc_and_test - increment and test
 
 no opinion on the actual idea, but for the Blackfin pieces:
 Acked-by: Mike Frysinger vap...@gentoo.org

This goes for parisc as well.

Acked-by: James Bottomley james.bottom...@hansenpartnership.com

James


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 00/34] Make kernel build deterministic

2011-04-05 Thread James Bottomley
On Tue, 2011-04-05 at 08:49 -0700, Greg KH wrote:
 On Tue, Apr 05, 2011 at 04:58:47PM +0200, Michal Marek wrote:
  
  Hi,
  
  this series makes it possible to build bit-identical kernel image and
  modules from identical sources. Of course the build is already
  deterministic in terms of behavior of the code, but the various
  timestamps embedded in the object files make it hard to compare two
  builds, for instance to verify that a makefile cleanup didn't
  accidentally change something. A prime example is /proc/config.gz, which
  has both a timestamp in the gzip header and a timestamp in the payload
  data. With this series applied, a script like this will produce
  identical kernels each time:
 
 Very nice stuff.  Do you want to take the individual patches through one
 of your trees, or do you mind if the subsystem maintainers take them
 through theirs?

I'm happy for this to go through a single tree.

James


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH] [SCSI] mpt fusion: Fix 32 bit platforms with 64 bit resources.

2009-12-10 Thread James Bottomley
On Wed, 2009-12-02 at 17:51 -0800, Pravin Bathija wrote:
 Powerpc 44x uses 36 bit real address while the real address defined
 in MPT Fusion driver is of type 32 bit. This causes ioremap to fail and 
 driver
 fails to initialize. This fix changes the data types representing the real
 address from unsigned long 32-bit types to resource_size_t which is 
 64-bit. The
 driver has been tested, the disks get discovered correctly and can do IO.
 
 Signed-off-by: Pravin Bathija pbath...@amcc.com
 Acked-by: Feng Kan f...@amcc.com
 Acked-by: Fushen Chen fc...@amcc.com
 Acked-by: Loc Ho l...@amcc.com
 Acked-by: Tirumala Reddy Marri tma...@amcc.com
 Acked-by: Victor Gallardo vgalla...@amcc.com
 ---
  drivers/message/fusion/mptbase.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)
 
 diff --git a/drivers/message/fusion/mptbase.c 
 b/drivers/message/fusion/mptbase.c
 index 5d496a9..9f14a60 100644
 --- a/drivers/message/fusion/mptbase.c
 +++ b/drivers/message/fusion/mptbase.c
 @@ -1511,7 +1511,7 @@ mpt_mapresources(MPT_ADAPTER *ioc)
  {
   u8  __iomem *mem;
   int  ii;
 - unsigned longmem_phys;
 + resource_size_t  mem_phys;

You never actually compiled this, did you?

drivers/message/fusion/mptbase.c: In function 'mpt_mapresources':
drivers/message/fusion/mptbase.c:1680: warning: format '%lx' expects type 'long 
unsigned int', but argument 4 has type 'resource_size_t'

I'll just fold the fix in

James

---

diff --git a/drivers/message/fusion/mptbase.c b/drivers/message/fusion/mptbase.c
index 162923f..85bc6a6 100644
--- a/drivers/message/fusion/mptbase.c
+++ b/drivers/message/fusion/mptbase.c
@@ -1677,8 +1677,8 @@ mpt_mapresources(MPT_ADAPTER *ioc)
return -EINVAL;
}
ioc-memmap = mem;
-   dinitprintk(ioc, printk(MYIOC_s_INFO_FMT mem = %p, mem_phys = %lx\n,
-   ioc-name, mem, mem_phys));
+   dinitprintk(ioc, printk(MYIOC_s_INFO_FMT mem = %p, mem_phys = %llx\n,
+   ioc-name, mem, (unsigned long long)mem_phys));
 
ioc-mem_phys = mem_phys;
ioc-chip = (SYSIF_REGS __iomem *)mem;


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH] [SCSI] mpt fusion: Fix 32 bit platforms with 64 bit resources.

2009-11-05 Thread James Bottomley
On Thu, 2009-11-05 at 08:43 -0500, Josh Boyer wrote:
 On Tue, Sep 15, 2009 at 03:25:55PM -0700, pbath...@amcc.com wrote:
 From: Pravin Bathija pbath...@amcc.com
 
 Powerpc 44x uses 36 bit real address while the real address defined
 in MPT Fusion driver is of type 32 bit. This causes ioremap to fail and 
 driver
 fails to initialize. This fix changes the data types representing the real
 address from unsigned long 32-bit types to phys_addr_t which is 64-bit. The
 driver has been tested, the disks get discovered correctly and can do IO. 
 Also,
 replaced phys_addr_t with resource_size_t as suggested by Ben.
 
 Signed-off-by: Pravin Bathija pbath...@amcc.com
 Acked-by: Feng Kan f...@amcc.com
 Acked-by: Prodyut Hazarika phazar...@amcc.com
 Acked-by: Loc Ho l...@amcc.com
 Acked-by: Tirumala Reddy Marri tma...@amcc.com
 Acked-by: Victor Gallardo vgalla...@amcc.com
 
 Is this patch included in the scsi tree at all?  I can't seem to find it in
 linux-next and I know it's not in the powerpc tree.  Are there further changes
 needed, or has it simply been missed?

What was the feedback from LSI ... I haven't seen any here?

 josh
 
 
 ---
  drivers/message/fusion/mptbase.c |   34 +-
  drivers/message/fusion/mptbase.h |5 +++--
  2 files changed, 28 insertions(+), 11 deletions(-)
 
 diff --git a/drivers/message/fusion/mptbase.c 
 b/drivers/message/fusion/mptbase.c
 index 5d496a9..e296f2e 100644
 --- a/drivers/message/fusion/mptbase.c
 +++ b/drivers/message/fusion/mptbase.c
 @@ -1510,11 +1510,12 @@ static int
  mpt_mapresources(MPT_ADAPTER *ioc)
  {
  u8  __iomem *mem;
 +u8  __iomem *port;
  int  ii;
 -unsigned longmem_phys;
 -unsigned longport;
 -u32  msize;
 -u32  psize;
 +resource_size_t  mem_phys;
 +resource_size_t  port_phys;
 +resource_size_t  msize;
 +resource_size_t  psize;
  u8   revision;
  int  r = -ENODEV;
  struct pci_dev *pdev;
 @@ -1552,13 +1553,13 @@ mpt_mapresources(MPT_ADAPTER *ioc)
  }
 
  mem_phys = msize = 0;
 -port = psize = 0;
 +port_phys = psize = 0;
  for (ii = 0; ii  DEVICE_COUNT_RESOURCE; ii++) {
  if (pci_resource_flags(pdev, ii)  PCI_BASE_ADDRESS_SPACE_IO) {
  if (psize)
  continue;
  /* Get I/O space! */
 -port = pci_resource_start(pdev, ii);
 +port_phys = pci_resource_start(pdev, ii);
  psize = pci_resource_len(pdev, ii);
  } else {
  if (msize)
 @@ -1580,14 +1581,23 @@ mpt_mapresources(MPT_ADAPTER *ioc)
  return -EINVAL;
  }
  ioc-memmap = mem;
 -dinitprintk(ioc, printk(MYIOC_s_INFO_FMT mem = %p, mem_phys = %lx\n,
 -ioc-name, mem, mem_phys));
 +dinitprintk(ioc, printk(MYIOC_s_INFO_FMT mem = %p, mem_phys = %llx\n,
 +ioc-name, mem, (u64)mem_phys));
 
  ioc-mem_phys = mem_phys;
  ioc-chip = (SYSIF_REGS __iomem *)mem;
 
  /* Save Port IO values in case we need to do downloadboot */
 -ioc-pio_mem_phys = port;
 +port = ioremap(port_phys, psize);
 +if (port == NULL) {
 +printk(MYIOC_s_ERR_FMT  : ERROR - Unable to map adapter
 + port !\n, ioc-name);
 +return -EINVAL;

So this looks problematic on a few platforms ... what happens to
platforms that have no IO space?  They automatically fail here and it
looks like the adapter never attaches.

James


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Boot failure on the powerstation with 2.6.30 latest

2009-06-22 Thread James Bottomley
2.6.30-rc8 worked fine ... unless this is a known problem, I suppose I
can begin bisecting.

The boot log of the hang is:

Please wait, loading kernel...
   Elf64 kernel loaded...
Loading ramdisk...
ramdisk loaded at 0250, size: 8280 Kbytes
OF stdout device is: /ht/i...@8/ser...@2f8
Preparing to boot Linux version 2.6.30 (j...@claymoor) (gcc version 4.3.3 
(Debian 4.3.3-10) ) #1 SMP Mon Jun 22 09:59:35 CDT 2009
command line: root=/dev/sda3 ro console=ttyS0,19200n1 
memory layout at init:
  alloc_bottom : 02d16000
  alloc_top: 3000
  alloc_top_hi : 8000
  rmo_top  : 3000
  ram_top  : 8000
instantiating rtas at 0x2fff5000... done
boot cpu hw idx 
starting cpu hw idx 0001... done
starting cpu hw idx 0002... done
starting cpu hw idx 0003... done
copying OF device tree...
Building dt strings...
Building dt structure...
Device tree strings 0x03117000 - 0x03117640
Device tree struct  0x03118000 - 0x0311b000
Calling quiesce...
returning from prom_init

So it looks like some type of early boot failure or handoff in head_64

James


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: Boot failure on the powerstation with 2.6.30 latest

2009-06-22 Thread James Bottomley
On Mon, 2009-06-22 at 11:11 -0500, Brian King wrote:
 James,
 
 I was running into a similar hang on one of my Power boxes as well.
 Reverting c868d550115b9ccc0027c67265b9520790f05601 allowed by system
 to boot. It looks like that patch injected a bug where we can end up
 waiting on an uninitialized mutex:
 
 [c09f3c30] c052c7dc .mutex_lock+0x34/0x50
 [c09f3cb0] c008b190 .get_online_cpus+0x3c/0x74
 [c09f3d40] c0146cd0 .kmem_cache_create+0xcc/0x548
 [c09f3e50] c0032ae0 .pgtable_cache_init+0x28/0x6c
 [c09f3ee0] c0780960 .start_kernel+0x1ec/0x520
 [c09f3f90] c00083d8 .start_here_common+0x1c/0x44
 
 The mutex gets initialized in cpu_hotplug_init, which doesn't get called until
 after pgtable_cache_init.

Actually, no, reverting that one doesn't fix it.

A full run of git bisect turns up this commit as the culprit; I'll make
a fuss on lkml:

83b519e8b9572c319c8e0c615ee5dd7272856090 is first bad commit
commit 83b519e8b9572c319c8e0c615ee5dd7272856090
Author: Pekka Enberg penb...@cs.helsinki.fi
Date:   Wed Jun 10 19:40:04 2009 +0300

slab: setup allocators earlier in the boot sequence

James


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: ipr boot failure caused by MSI (2.6.30-rc1+)

2009-05-22 Thread James Bottomley
On Thu, 2009-05-21 at 14:51 -0500, James Bottomley wrote:
 On Thu, 2009-05-21 at 13:47 -0500, Brian King wrote:
  cc'ing linuxppc-dev...
  
  -Brian
  
  
  James Bottomley wrote:
   Kernels after 2.6.30-rc1 stopped booting on my powerstation.  The ipr
   just times out and refuses to probe devices.  If I let it drop into the
   initramfs system, this is what the interrupts shows:
   
   (initramfs) cat /proc/interrupts 
  CPU0   CPU1   CPU2   CPU3   
16: 20 10 13 11   MPIC  Level 
   pata_amd
20:  0  0  0  0   MPIC  Level 
   ohci_hcd:usb1, ohci_hcd:usb2
21:  0  0  0  0  MPIC-U3MSI Edge  ipr
68: 37 37 48 37   MPIC  Edge  
   serial
   251: 10 71 69 72   MPIC  Edge  
   ipi call function
   252:   1555   1779   1372   1155   MPIC  Edge  
   ipi reschedule
   253:  0  0  0  0   MPIC  Edge  
   ipi call function single
   254:  0  0  0  0   MPIC  Edge  
   ipi debugger
   BAD:416
   
   So you see the IPR is the only device not receiving them.
   
   I can fix the boot hang by reverting
   
   commit 5a9ef25b14d39b8413364df12cb8d9bb7a673a32
   Author: Wayne Boyer way...@linux.vnet.ibm.com
   Date:   Fri Jan 23 09:17:35 2009 -0800
   
   [SCSI] ipr: add MSI support
   
   The system in question is:
   
   SYSTEM INFORMATION
Processor  = PowerPC,970MP @ 2500 MHz
I/O Bridge = U4 (4.4)
SMP Size   = 4 (#0 #1 #2 #3)
Boot-Date  = 2009-04-21 17:13:36
Memory = 2 GB of RAM @ 666 MHz
Board Type = Bimini (7047191/000/1)
MFG Date   = 1608
Part No.   = 10N8748 
FRU No.= 10N7182 
FRU Serial = YL30W8106038
UUID   = 
Flashside  = 1 (temporary)
Version= HEAD
Build Date = 12-04-2008 16:13
 
 OK, so as an update, I booted to the initrd and inserted the network
 modules, which are also MSI enabled and this is what I get:
 
 (initramfs) cat /proc/interrupts 
CPU0   CPU1   CPU2   CPU3   
  16: 14 11 11 18   MPIC  Level 
 pata_amd
  20:  0  0  0  0   MPIC  Level 
 ohci_hcd:usb1, ohci_hcd:usb2
  21:  0  0  0  0  MPIC-U3MSI Edge  ipr
  22:  1  0  1  0  MPIC-U3MSI Edge  eth0
  23:  0  2  1  0  MPIC-U3MSI Edge  eth1
  68:193166113177   MPIC  Edge  serial
 251: 16 65 71 70   MPIC  Edge  ipi 
 call function
 252:   1574   1804   1346   1289   MPIC  Edge  ipi 
 reschedule
 253:  0  0  0  0   MPIC  Edge  ipi 
 call function single
 254:  0  0  0  0   MPIC  Edge  ipi 
 debugger
 BAD:   1866
 
 So clearly the MSI interrupts to the network cards are working and it
 looks like just a local problem with the ipr rather than a platform
 problem with MSI.

I saw the quirk fix for this go by:

http://ozlabs.org/pipermail/linuxppc-dev/2009-May/072436.html

Is there an easy way to trigger an interrupt on this device?  Preferably
in ipr_probe_ioa() so we can at least print out if the interrupts are
misrouted and fall back from MSI to normal using the PCI infrastructure?

James


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: ipr boot failure caused by MSI (2.6.30-rc1+)

2009-05-21 Thread James Bottomley
On Thu, 2009-05-21 at 13:47 -0500, Brian King wrote:
 cc'ing linuxppc-dev...
 
 -Brian
 
 
 James Bottomley wrote:
  Kernels after 2.6.30-rc1 stopped booting on my powerstation.  The ipr
  just times out and refuses to probe devices.  If I let it drop into the
  initramfs system, this is what the interrupts shows:
  
  (initramfs) cat /proc/interrupts 
 CPU0   CPU1   CPU2   CPU3   
   16: 20 10 13 11   MPIC  Level 
  pata_amd
   20:  0  0  0  0   MPIC  Level 
  ohci_hcd:usb1, ohci_hcd:usb2
   21:  0  0  0  0  MPIC-U3MSI Edge  ipr
   68: 37 37 48 37   MPIC  Edge  
  serial
  251: 10 71 69 72   MPIC  Edge  ipi 
  call function
  252:   1555   1779   1372   1155   MPIC  Edge  ipi 
  reschedule
  253:  0  0  0  0   MPIC  Edge  ipi 
  call function single
  254:  0  0  0  0   MPIC  Edge  ipi 
  debugger
  BAD:416
  
  So you see the IPR is the only device not receiving them.
  
  I can fix the boot hang by reverting
  
  commit 5a9ef25b14d39b8413364df12cb8d9bb7a673a32
  Author: Wayne Boyer way...@linux.vnet.ibm.com
  Date:   Fri Jan 23 09:17:35 2009 -0800
  
  [SCSI] ipr: add MSI support
  
  The system in question is:
  
  SYSTEM INFORMATION
   Processor  = PowerPC,970MP @ 2500 MHz
   I/O Bridge = U4 (4.4)
   SMP Size   = 4 (#0 #1 #2 #3)
   Boot-Date  = 2009-04-21 17:13:36
   Memory = 2 GB of RAM @ 666 MHz
   Board Type = Bimini (7047191/000/1)
   MFG Date   = 1608
   Part No.   = 10N8748 
   FRU No.= 10N7182 
   FRU Serial = YL30W8106038
   UUID   = 
   Flashside  = 1 (temporary)
   Version= HEAD
   Build Date = 12-04-2008 16:13

OK, so as an update, I booted to the initrd and inserted the network
modules, which are also MSI enabled and this is what I get:

(initramfs) cat /proc/interrupts 
   CPU0   CPU1   CPU2   CPU3   
 16: 14 11 11 18   MPIC  Level pata_amd
 20:  0  0  0  0   MPIC  Level 
ohci_hcd:usb1, ohci_hcd:usb2
 21:  0  0  0  0  MPIC-U3MSI Edge  ipr
 22:  1  0  1  0  MPIC-U3MSI Edge  eth0
 23:  0  2  1  0  MPIC-U3MSI Edge  eth1
 68:193166113177   MPIC  Edge  serial
251: 16 65 71 70   MPIC  Edge  ipi call 
function
252:   1574   1804   1346   1289   MPIC  Edge  ipi 
reschedule
253:  0  0  0  0   MPIC  Edge  ipi call 
function single
254:  0  0  0  0   MPIC  Edge  ipi 
debugger
BAD:   1866

So clearly the MSI interrupts to the network cards are working and it
looks like just a local problem with the ipr rather than a platform
problem with MSI.

James


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: Next March 25: Boot failure on powerpc [recursive locking detected]

2009-03-26 Thread James Bottomley
On Thu, 2009-03-26 at 12:04 +0530, Sachin Sant wrote:
 Sachin Sant wrote:
  Today's next failed to boot on a powerpc box
  (Power6 blade IBM,7998-61X) with following recursive locking message.
 
  =
  [ INFO: possible recursive locking detected ]
  2.6.29-next-20090325 #1
 After bisecting the failure seems to be because of the following
 patch from James ( block: move SCSI timeout check into block )
 
 http://patchwork.kernel.org/patch/8017/
 
 If i back out the above mentioned patch, the machine boots fine
 without any problems.

Yes, that patch already got dropped for other reasons:

http://marc.info/?t=12374077372

I'm going to see if I can redo it in a better way, since moving this
type of timeout checking from scsi to block is a useful generalisation.

James


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [patch] powerpc/ps3: Use hard coded values for LV1 device type

2009-02-08 Thread James Bottomley
On Mon, 2009-02-09 at 11:11 +1100, Benjamin Herrenschmidt wrote:
 On Sun, 2009-02-08 at 22:29 +1100, Michael Ellerman wrote:
  On Fri, 2009-02-06 at 18:42 -0800, Geoff Levand wrote:
   Change the PS3 platform code to use hard coded numbers for its
   LV1 device types.
   
   The PS3 platform code was incorrectly using some scsi block
   constants for the device type returned from the LV1 hypervisor.
   
   Fixes build errors like these when CONFIG_BLOCK=n:
   
 In file included from include/scsi/scsi.h:12,
  from arch/powerpc/platforms/ps3/platform.h:25,
  from arch/powerpc/platforms/ps3/setup.c:36:
 include/scsi/scsi_cmnd.h:27:25: warning: BLK_MAX_CDB is not defined
 include/scsi/scsi_cmnd.h:28:3: error: #error MAX_COMMAND_SIZE can not 
   be bigger than BLK_MAX_CDB
 
 Adding Jens and James on CC since I think a proper fix lies in blkdev.h
 or scsi*.h

And cc'd linux-scsi

 So basically, the whole of blkdev.h is inside a big ifdef
 CONFIG_BLOCK... which means that scsi_cmnd.h can't build which in turn
 makes scsi.h fail.

Well, look at it from our point of view; it's impossible to build SCSI
without block, so a little interdependence is easy to get.

 The PS3 platform code wants to use some of the standard SCSI types from
 there though, as they are part of the hypervisor ABI. (And in fact it
 can be argued that non-block devices using SCSI do exist, such as
 scanners, no ?)
 
 Any reason other than pre-historical to have blkdev.h shielded like
 that ?

Actually, I think the fix lies in scsi.h ... we can make that into a
nicely independent protocol header file.  Your current woes come because
it pulls in scsi_cmnd.h ... perhaps just getting rid of this will fix
it.

Can the rest of linux-scsi verify that the fix below doesn't break
something else?

I found one cockup: block/cmd-filter.c is apparently not including
linuc/blkdev.h directly but via scsi/scsi.h ... I fixed this up.

 Cheers,
 Ben.
 
   Signed-off-by: Geoff Levand geoffrey.lev...@am.sony.com
   ---
   Ben,
   
   Please send upstream for 2.6.29.
   
   -Geoff
   
arch/powerpc/platforms/ps3/platform.h |8 +++-
1 file changed, 3 insertions(+), 5 deletions(-)
   
   --- a/arch/powerpc/platforms/ps3/platform.h
   +++ b/arch/powerpc/platforms/ps3/platform.h
   @@ -22,8 +22,6 @@
#define _PS3_PLATFORM_H

#include linux/rtc.h
   -#include scsi/scsi.h
   -
#include asm/ps3.h

/* htab */
   @@ -83,12 +81,12 @@ enum ps3_bus_type {
};

enum ps3_dev_type {
   - PS3_DEV_TYPE_STOR_DISK = TYPE_DISK, /* 0 */
   + PS3_DEV_TYPE_STOR_DISK = 0, /* TYPE_DISK */
 PS3_DEV_TYPE_SB_GELIC = 3,
 PS3_DEV_TYPE_SB_USB = 4,
   - PS3_DEV_TYPE_STOR_ROM = TYPE_ROM,   /* 5 */
   + PS3_DEV_TYPE_STOR_ROM = 5, /* TYPE_ROM */
 PS3_DEV_TYPE_SB_GPIO = 6,
   - PS3_DEV_TYPE_STOR_FLASH = TYPE_RBC, /* 14 */
   + PS3_DEV_TYPE_STOR_FLASH = 14, /* TYPE_RBC */
  
  This looks like you're just papering over the bug, by hardcoding the
  same values that are in the scsi header. Or are they really independent,
  in which case I'd say the comments are confusing.
  
  cheers

James

---

diff --git a/block/cmd-filter.c b/block/cmd-filter.c
index 504b275..572bbc2 100644
--- a/block/cmd-filter.c
+++ b/block/cmd-filter.c
@@ -22,6 +22,7 @@
 #include linux/spinlock.h
 #include linux/capability.h
 #include linux/bitops.h
+#include linux/blkdev.h
 
 #include scsi/scsi.h
 #include linux/cdrom.h
diff --git a/include/scsi/scsi.h b/include/scsi/scsi.h
index 80d7f60..084478e 100644
--- a/include/scsi/scsi.h
+++ b/include/scsi/scsi.h
@@ -9,7 +9,8 @@
 #define _SCSI_SCSI_H
 
 #include linux/types.h
-#include scsi/scsi_cmnd.h
+
+struct scsi_cmnd;
 
 /*
  * The maximum number of SG segments that we will put inside a
@@ -439,22 +440,6 @@ static inline int scsi_is_wlun(unsigned int lun)
 #define host_byte(result)   (((result)  16)  0xff)
 #define driver_byte(result) (((result)  24)  0xff)
 
-static inline void set_msg_byte(struct scsi_cmnd *cmd, char status)
-{
-   cmd-result |= status  8;
-}
-
-static inline void set_host_byte(struct scsi_cmnd *cmd, char status)
-{
-   cmd-result |= status  16;
-}
-
-static inline void set_driver_byte(struct scsi_cmnd *cmd, char status)
-{
-   cmd-result |= status  24;
-}
-
-
 #define sense_class(sense)  (((sense)  4)  0x7)
 #define sense_error(sense)  ((sense)  0xf)
 #define sense_valid(sense)  ((sense)  0x80);
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index 855bf95..43b50d3 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -291,4 +291,19 @@ static inline struct scsi_data_buffer *scsi_prot(struct 
scsi_cmnd *cmd)
 #define scsi_for_each_prot_sg(cmd, sg, nseg, __i)  \
for_each_sg(scsi_prot_sglist(cmd), sg, nseg, __i)
 
+static inline void set_msg_byte(struct scsi_cmnd *cmd, char status)
+{
+   cmd-result |= status  8;
+}
+
+static inline void set_host_byte(struct scsi_cmnd *cmd, 

RE: [PATCH] Correct printk %pF to work on all architectures

2008-09-09 Thread James Bottomley
OK, so could we get this in to -rc5 please?  It's a bug fix for parisc
since we're currently printing rubbish.

James

---
From: James Bottomley [EMAIL PROTECTED]
Date: Wed, 3 Sep 2008 20:43:36 -0500
Subject: lib: Correct printk %pF to work on all architectures

It was introduced by

commit 0fe1ef24f7bd0020f29ffe287dfdb9ead33ca0b2
Author: Linus Torvalds [EMAIL PROTECTED]
Date:   Sun Jul 6 16:43:12 2008 -0700

vsprintf: add support for '%pS' and '%pF' pointer formats


However, the current way its coded doesn't work on parisc64.  For two
reasons:  1) parisc isn't in the #ifdef and 2) parisc has a different
format for function descriptors

Make dereference_function_descriptor() more accommodating by allowing
architecture overrides.  I put the three overrides (for parisc64, ppc64
and ia64) in arch/kernel/module.c because that's where the kernel
internal linker which knows how to deal with function descriptors sits.

Signed-off-by: James Bottomley [EMAIL PROTECTED]
Acked-by: Benjamin Herrenschmidt [EMAIL PROTECTED]
Acked-by: Tony Luck [EMAIL PROTECTED]
---
 arch/ia64/include/asm/sections.h|3 +++
 arch/ia64/kernel/module.c   |   12 
 arch/parisc/kernel/module.c |   14 ++
 arch/powerpc/include/asm/sections.h |3 +++
 arch/powerpc/kernel/module_64.c |   13 -
 include/asm-generic/sections.h  |6 ++
 include/asm-parisc/sections.h   |5 +
 lib/vsprintf.c  |   11 +--
 8 files changed, 56 insertions(+), 11 deletions(-)

diff --git a/arch/ia64/include/asm/sections.h b/arch/ia64/include/asm/sections.h
index 7286e4a..a7acad2 100644
--- a/arch/ia64/include/asm/sections.h
+++ b/arch/ia64/include/asm/sections.h
@@ -21,5 +21,8 @@ extern char __start_gate_brl_fsys_bubble_down_patchlist[], 
__end_gate_brl_fsys_b
 extern char __start_unwind[], __end_unwind[];
 extern char __start_ivt_text[], __end_ivt_text[];
 
+#undef dereference_function_descriptor
+void *dereference_function_descriptor(void *);
+
 #endif /* _ASM_IA64_SECTIONS_H */
 
diff --git a/arch/ia64/kernel/module.c b/arch/ia64/kernel/module.c
index 29aad34..545626f 100644
--- a/arch/ia64/kernel/module.c
+++ b/arch/ia64/kernel/module.c
@@ -31,9 +31,11 @@
 #include linux/elf.h
 #include linux/moduleloader.h
 #include linux/string.h
+#include linux/uaccess.h
 #include linux/vmalloc.h
 
 #include asm/patch.h
+#include asm/sections.h
 #include asm/unaligned.h
 
 #define ARCH_MODULE_DEBUG 0
@@ -941,3 +943,13 @@ module_arch_cleanup (struct module *mod)
if (mod-arch.core_unw_table)
unw_remove_unwind_table(mod-arch.core_unw_table);
 }
+
+void *dereference_function_descriptor(void *ptr)
+{
+   struct fdesc *desc = ptr;
+   void *p;
+
+   if (!probe_kernel_address(desc-ip, p))
+   ptr = p;
+   return ptr;
+}
diff --git a/arch/parisc/kernel/module.c b/arch/parisc/kernel/module.c
index fdacdd4..44138c3 100644
--- a/arch/parisc/kernel/module.c
+++ b/arch/parisc/kernel/module.c
@@ -47,7 +47,9 @@
 #include linux/string.h
 #include linux/kernel.h
 #include linux/bug.h
+#include linux/uaccess.h
 
+#include asm/sections.h
 #include asm/unwind.h
 
 #if 0
@@ -860,3 +862,15 @@ void module_arch_cleanup(struct module *mod)
deregister_unwind_table(mod);
module_bug_cleanup(mod);
 }
+
+#ifdef CONFIG_64BIT
+void *dereference_function_descriptor(void *ptr)
+{
+   Elf64_Fdesc *desc = ptr;
+   void *p;
+
+   if (!probe_kernel_address(desc-addr, p))
+   ptr = p;
+   return ptr;
+}
+#endif
diff --git a/arch/powerpc/include/asm/sections.h 
b/arch/powerpc/include/asm/sections.h
index 916018e..7710e9e 100644
--- a/arch/powerpc/include/asm/sections.h
+++ b/arch/powerpc/include/asm/sections.h
@@ -16,6 +16,9 @@ static inline int in_kernel_text(unsigned long addr)
return 0;
 }
 
+#undef dereference_function_descriptor
+void *dereference_function_descriptor(void *);
+
 #endif
 
 #endif /* __KERNEL__ */
diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index ee6a298..ad79de2 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -21,8 +21,9 @@
 #include linux/err.h
 #include linux/vmalloc.h
 #include linux/bug.h
+#include linux/uaccess.h
 #include asm/module.h
-#include asm/uaccess.h
+#include asm/sections.h
 #include asm/firmware.h
 #include asm/code-patching.h
 #include linux/sort.h
@@ -451,3 +452,13 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
 
return 0;
 }
+
+void *dereference_function_descriptor(void *ptr)
+{
+   struct ppc64_opd_entry *desc = ptr;
+   void *p;
+
+   if (!probe_kernel_address(desc-funcaddr, p))
+   ptr = p;
+   return ptr;
+}
diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h
index 8feeae1..79a7ff9 100644
--- a/include/asm-generic/sections.h
+++ b/include/asm-generic/sections.h
@@ -14,4 +14,10 @@ extern char __kprobes_text_start

[PATCH] fix compile failure with non modular builds

2008-09-09 Thread James Bottomley
commit deac93df26b20cf8438339b5935b5f5643bc30c9
Author: James Bottomley [EMAIL PROTECTED]
Date:   Wed Sep 3 20:43:36 2008 -0500

lib: Correct printk %pF to work on all architectures

Broke the non modular builds by moving an essential function into
modules.c.  Fix this by moving it out again and into asm/sections.h as
an inline.  To do this, the definition of struct ppc64_opd_entry has
been lifted out of modules.c and put in asm/elf.h where it belongs.

Signed-off-by: James Bottomley [EMAIL PROTECTED]

---
diff --git a/arch/powerpc/include/asm/elf.h b/arch/powerpc/include/asm/elf.h
index 80d1f39..64c6ee2 100644
--- a/arch/powerpc/include/asm/elf.h
+++ b/arch/powerpc/include/asm/elf.h
@@ -409,6 +409,13 @@ do {   
\
 /* Keep this the last entry.  */
 #define R_PPC64_NUM107
 
+/* There's actually a third entry here, but it's unused */
+struct ppc64_opd_entry
+{
+   unsigned long funcaddr;
+   unsigned long r2;
+};
+
 #ifdef  __KERNEL__
 
 #ifdef CONFIG_SPU_BASE
diff --git a/arch/powerpc/include/asm/sections.h 
b/arch/powerpc/include/asm/sections.h
index 7710e9e..07956f3 100644
--- a/arch/powerpc/include/asm/sections.h
+++ b/arch/powerpc/include/asm/sections.h
@@ -2,6 +2,8 @@
 #define _ASM_POWERPC_SECTIONS_H
 #ifdef __KERNEL__
 
+#include linux/elf.h
+#include linux/uaccess.h
 #include asm-generic/sections.h
 
 #ifdef __powerpc64__
@@ -17,7 +19,15 @@ static inline int in_kernel_text(unsigned long addr)
 }
 
 #undef dereference_function_descriptor
-void *dereference_function_descriptor(void *);
+static inline void *dereference_function_descriptor(void *ptr)
+{
+   struct ppc64_opd_entry *desc = ptr;
+   void *p;
+
+   if (!probe_kernel_address(desc-funcaddr, p))
+   ptr = p;
+   return ptr;
+}
 
 #endif
 
diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index ad79de2..1af2377 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -21,9 +21,7 @@
 #include linux/err.h
 #include linux/vmalloc.h
 #include linux/bug.h
-#include linux/uaccess.h
 #include asm/module.h
-#include asm/sections.h
 #include asm/firmware.h
 #include asm/code-patching.h
 #include linux/sort.h
@@ -43,13 +41,6 @@
 #define DEBUGP(fmt , ...)
 #endif
 
-/* There's actually a third entry here, but it's unused */
-struct ppc64_opd_entry
-{
-   unsigned long funcaddr;
-   unsigned long r2;
-};
-
 /* Like PPC32, we need little trampolines to do  24-bit jumps (into
the kernel itself).  But on PPC64, these need to be used for every
jump, actually, to reset r2 (TOC+0x8000). */
@@ -452,13 +443,3 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
 
return 0;
 }
-
-void *dereference_function_descriptor(void *ptr)
-{
-   struct ppc64_opd_entry *desc = ptr;
-   void *p;
-
-   if (!probe_kernel_address(desc-funcaddr, p))
-   ptr = p;
-   return ptr;
-}


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] fix compile failure with non modular builds

2008-09-09 Thread James Bottomley
On Wed, 2008-09-10 at 10:09 +1000, Benjamin Herrenschmidt wrote:
 On Tue, 2008-09-09 at 19:04 -0500, James Bottomley wrote:
  commit deac93df26b20cf8438339b5935b5f5643bc30c9
  Author: James Bottomley [EMAIL PROTECTED]
  Date:   Wed Sep 3 20:43:36 2008 -0500
  
  lib: Correct printk %pF to work on all architectures
  
  Broke the non modular builds by moving an essential function into
  modules.c.  Fix this by moving it out again and into asm/sections.h as
  an inline.  To do this, the definition of struct ppc64_opd_entry has
  been lifted out of modules.c and put in asm/elf.h where it belongs.
  
  Signed-off-by: James Bottomley [EMAIL PROTECTED]
 
 Ouch. Ack.

Actually, this is for your powerpc tree, so we actually run it through
some extended config testing this time ...

James


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


[PATCH] Correct printk %pF to work on all architectures

2008-09-03 Thread James Bottomley
It was introduced by 

commit 0fe1ef24f7bd0020f29ffe287dfdb9ead33ca0b2
Author: Linus Torvalds [EMAIL PROTECTED]
Date:   Sun Jul 6 16:43:12 2008 -0700

vsprintf: add support for '%pS' and '%pF' pointer formats


However, the current way its coded doesn't work on parisc64.  For two
reasons:  1) parisc isn't in the #ifdef and 2) parisc has a different
format for function descriptors

Make dereference_function_descriptor() more accommodating by allowing
architecture overrides.  I put the three overrides (for parisc64, ppc64
and ia64) in arch/kernel/module.c because that's where the kernel
internal linker which knows how to deal with function descriptors sits.

Signed-off-by: James Bottomley [EMAIL PROTECTED]
---
 arch/ia64/kernel/module.c   |9 +
 arch/parisc/kernel/module.c |   13 +
 arch/powerpc/kernel/module_64.c |9 +
 include/linux/kernel.h  |3 +++
 lib/vsprintf.c  |   12 ++--
 5 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/arch/ia64/kernel/module.c b/arch/ia64/kernel/module.c
index 29aad34..596a862 100644
--- a/arch/ia64/kernel/module.c
+++ b/arch/ia64/kernel/module.c
@@ -31,6 +31,7 @@
 #include linux/elf.h
 #include linux/moduleloader.h
 #include linux/string.h
+#include linux/uaccess.h
 #include linux/vmalloc.h
 
 #include asm/patch.h
@@ -941,3 +942,11 @@ module_arch_cleanup (struct module *mod)
if (mod-arch.core_unw_table)
unw_remove_unwind_table(mod-arch.core_unw_table);
 }
+
+void *dereference_function_descriptor(void *ptr)
+{
+   void *p = NULL;
+   if (!probe_kernel_address(ptr, p))
+   ptr = p;
+   return p;
+}
diff --git a/arch/parisc/kernel/module.c b/arch/parisc/kernel/module.c
index fdacdd4..4ad80a5 100644
--- a/arch/parisc/kernel/module.c
+++ b/arch/parisc/kernel/module.c
@@ -47,6 +47,7 @@
 #include linux/string.h
 #include linux/kernel.h
 #include linux/bug.h
+#include linux/uaccess.h
 
 #include asm/unwind.h
 
@@ -860,3 +861,15 @@ void module_arch_cleanup(struct module *mod)
deregister_unwind_table(mod);
module_bug_cleanup(mod);
 }
+
+#ifdef CONFIG_64BIT
+void *dereference_function_descriptor(void *ptr)
+{
+   Elf64_Fdesc *desc = ptr;
+   void *p = NULL;
+
+   if (!probe_kernel_address(desc-addr, p))
+   ptr = p;
+   return p;
+}
+#endif
diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index ee6a298..60e9749 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -21,6 +21,7 @@
 #include linux/err.h
 #include linux/vmalloc.h
 #include linux/bug.h
+#include linux/uaccess.h
 #include asm/module.h
 #include asm/uaccess.h
 #include asm/firmware.h
@@ -451,3 +452,11 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
 
return 0;
 }
+
+void *dereference_function_descriptor(void *ptr)
+{
+   void *p = NULL;
+   if (!probe_kernel_address(ptr, p))
+   ptr = p;
+   return p;
+}
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 2651f80..8ff19b3 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -189,6 +189,9 @@ extern int __kernel_text_address(unsigned long addr);
 extern int kernel_text_address(unsigned long addr);
 struct pid;
 extern struct pid *session_of_pgrp(struct pid *pgrp);
+/* function descriptor handling (if any) */
+extern void *dereference_function_descriptor(void *ptr);
+
 
 #ifdef CONFIG_PRINTK
 asmlinkage int vprintk(const char *fmt, va_list args)
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index d8d1d11..cffcd95 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -513,13 +513,13 @@ static char *string(char *buf, char *end, char *s, int 
field_width, int precisio
return buf;
 }
 
-static inline void *dereference_function_descriptor(void *ptr)
+/*
+ * Some architectures need special handling for pointers
+ * to functions, which are done via function descriptors
+ * Do a non weak override of this function for them
+ */
+void __weak *dereference_function_descriptor(void *ptr)
 {
-#if defined(CONFIG_IA64) || defined(CONFIG_PPC64)
-   void *p;
-   if (!probe_kernel_address(ptr, p))
-   ptr = p;
-#endif
return ptr;
 }
 
-- 
1.5.6.5



___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] Correct printk %pF to work on all architectures

2008-09-03 Thread James Bottomley
On Wed, 2008-09-03 at 14:22 -0700, Linus Torvalds wrote:
 
 On Wed, 3 Sep 2008, James Bottomley wrote:
  
  Make dereference_function_descriptor() more accommodating by allowing
  architecture overrides.
 
 Don't do it like this.
 
 We don't want some stupid useless weak function that is empty on all sane 
 platforms.
 
 Just do
 
   .. declare or create an inline 'parisc_function_descriptor()' ..
 
   #define dereference_function_descriptor(p) parisc_function_descriptor(p)
 
 in some arch header file. And then use
 
   #ifndef dereference_function_descriptor
   #define dereference_function_descriptor(p) (p)
   #endif
 
 in the generic code, so that sane architectures don't need to do anything 
 at all.

Is that finally final?  because the last time I tried to do the above
for a voyager override I was told weak functions were the preferred
method ...

Anyway, it's easy to do (if a slightly larger diff) ... I have to move
the prototype from include/kernel.h to include/module.h because I need
an assured asm/xxx include before it to get the override.

It was also pointed out that I should be returning the passed in ptr,
not NULL on failure and that the return should be ptr not p.

James

---

The current way its coded doesn't work on parisc64.  For two
reasons:  1) parisc isn't in the #ifdef and 2) parisc has a different
format for function descriptors

Make dereference_function_descriptor() more accommodating by allowing
architecture overrides.

Signed-off-by: James Bottomley [EMAIL PROTECTED]
---
 arch/ia64/include/asm/module.h|4 
 arch/ia64/kernel/module.c |9 +
 arch/parisc/kernel/module.c   |   13 +
 arch/powerpc/include/asm/module.h |6 ++
 arch/powerpc/kernel/module_64.c   |9 +
 include/asm-parisc/module.h   |6 ++
 include/linux/module.h|5 +
 lib/vsprintf.c|   10 --
 8 files changed, 52 insertions(+), 10 deletions(-)

diff --git a/arch/ia64/include/asm/module.h b/arch/ia64/include/asm/module.h
index d2da61e..d0328be 100644
--- a/arch/ia64/include/asm/module.h
+++ b/arch/ia64/include/asm/module.h
@@ -33,4 +33,8 @@ struct mod_arch_specific {
 
 #define ARCH_SHF_SMALL SHF_IA_64_SHORT
 
+void *ia64_dereference_function_descriptor(void *);
+#define dereference_function_descriptor(p) \
+   ia64_dereference_function_descriptor(p)
+
 #endif /* _ASM_IA64_MODULE_H */
diff --git a/arch/ia64/kernel/module.c b/arch/ia64/kernel/module.c
index 29aad34..4aca326 100644
--- a/arch/ia64/kernel/module.c
+++ b/arch/ia64/kernel/module.c
@@ -31,6 +31,7 @@
 #include linux/elf.h
 #include linux/moduleloader.h
 #include linux/string.h
+#include linux/uaccess.h
 #include linux/vmalloc.h
 
 #include asm/patch.h
@@ -941,3 +942,11 @@ module_arch_cleanup (struct module *mod)
if (mod-arch.core_unw_table)
unw_remove_unwind_table(mod-arch.core_unw_table);
 }
+
+void *ia64_dereference_function_descriptor(void *ptr)
+{
+   void *p;
+   if (!probe_kernel_address(ptr, p))
+   ptr = p;
+   return ptr;
+}
diff --git a/arch/parisc/kernel/module.c b/arch/parisc/kernel/module.c
index fdacdd4..6ec3b07 100644
--- a/arch/parisc/kernel/module.c
+++ b/arch/parisc/kernel/module.c
@@ -47,6 +47,7 @@
 #include linux/string.h
 #include linux/kernel.h
 #include linux/bug.h
+#include linux/uaccess.h
 
 #include asm/unwind.h
 
@@ -860,3 +861,15 @@ void module_arch_cleanup(struct module *mod)
deregister_unwind_table(mod);
module_bug_cleanup(mod);
 }
+
+#ifdef CONFIG_64BIT
+void *parisc_dereference_function_descriptor(void *ptr)
+{
+   Elf64_Fdesc *desc = ptr;
+   void *p;
+
+   if (!probe_kernel_address(desc-addr, p))
+   ptr = p;
+   return ptr;
+}
+#endif
diff --git a/arch/powerpc/include/asm/module.h 
b/arch/powerpc/include/asm/module.h
index e5f14b1..a861b2c 100644
--- a/arch/powerpc/include/asm/module.h
+++ b/arch/powerpc/include/asm/module.h
@@ -73,5 +73,11 @@ struct exception_table_entry;
 void sort_ex_table(struct exception_table_entry *start,
   struct exception_table_entry *finish);
 
+#ifdef __powerpc64__
+void *powerpc64_dereference_function_descriptor(void *);
+#define dereference_function_descriptor(p) \
+   powerpc64_dereference_function_descriptor(p)
+#endif
+
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_MODULE_H */
diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index ee6a298..e814c2a 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -21,6 +21,7 @@
 #include linux/err.h
 #include linux/vmalloc.h
 #include linux/bug.h
+#include linux/uaccess.h
 #include asm/module.h
 #include asm/uaccess.h
 #include asm/firmware.h
@@ -451,3 +452,11 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
 
return 0;
 }
+
+void *powerpc64_dereference_function_descriptor(void *ptr)
+{
+   void *p;
+   if (!probe_kernel_address

Re: [PATCH] Correct printk %pF to work on all architectures

2008-09-03 Thread James Bottomley
On Wed, 2008-09-03 at 15:54 -0700, Linus Torvalds wrote:
  Anyway, it's easy to do (if a slightly larger diff) ... I have to move
  the prototype from include/kernel.h to include/module.h because I need
  an assured asm/xxx include before it to get the override.
 
 I don't really see what this has to do with module.h, though.
 
 Why do this in linux/module.h?  Why not just do it in lib/vsptintf.c 
 which is the only place that cares? None of this needs to pollute the 
 generic header files that simply don't care.

You want me to pull the elf header files into lib/vsprintf.c and have
something like

static inline void *dereference_function_descritpor(void *ptr)
{
#if defined(CONFIG_IA64) || defined(CONFIG_PPC64)
void *p;
if (!probe_kernel_address(ptr, p))
ptr = p;
#elif defined(CONFIG_PARISC)  defined(CONFIG_64BITS)
Elf64_Fptr *desc = ptr;
void *p;
if (!probe_kernel_address(desc-addr, p))
ptr = p;
#endif
...

?

Because it just looks rather tacky ...

James


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] Correct printk %pF to work on all architectures

2008-09-03 Thread James Bottomley
On Wed, 2008-09-03 at 16:15 -0700, Linus Torvalds wrote:
 
 On Wed, 3 Sep 2008, James Bottomley wrote:
  
  You want me to pull the elf header files into lib/vsprintf.c and have
  something like
 
 No.
 
 I want you to stop polluting linux/module.h with total and utter crap.
 
 Please tell my WHY the hell you have
 
   diff --git a/include/linux/module.h b/include/linux/module.h
   index 68e0955..a549f89 100644
   --- a/include/linux/module.h
   +++ b/include/linux/module.h
   @@ -76,6 +76,11 @@ void sort_extable(struct exception_table_entry 
 *start,
 struct exception_table_entry *finish);
void sort_main_extable(void);

   +/* function descriptor handling (if any) */
   +#ifndef dereference_function_descriptor
   +#define dereference_function_descriptor(p) (p)
   +#endif
   +
#ifdef MODULE
#define MODULE_GENERIC_TABLE(gtype,name)   \
extern const struct gtype##_id __mod_##gtype##_table   \
 
 in your patch? What the _hell_ does that have to do with module.h?
 
 Why the heck don't you just have that in the ONLY place that cares, namely 
 lib/vfprintf.c?

Oh ... because Arjan has a patch to export
dereference_function_descriptor.  I suppose I could make him do the
heavy lifting, but it seemed sensible to make it easy for him (and me)
by putting it in a header.

http://marc.info/?l=linux-kernelm=121976793429869

It's already in Ingo's tree, so it will be upstream soon.

 In other words, WHY did you do something as stupid and totally 
 unexplainable as
 
   diff --git a/lib/vsprintf.c b/lib/vsprintf.c
   index d8d1d11..0c47629 100644
   --- a/lib/vsprintf.c
   +++ b/lib/vsprintf.c
   @@ -513,16 +513,6 @@ static char *string(char *buf, char *end, char *s, 
 int field_width, int precisio
   return buf;
}

   -static inline void *dereference_function_descriptor(void *ptr)
   -{
   -#if defined(CONFIG_IA64) || defined(CONFIG_PPC64)
   -   void *p;
   -   if (!probe_kernel_address(ptr, p))
   -   ptr = p;
   -#endif
   -   return ptr;
   -}
   -
 
 when that thing _used_ to be in a place where it made sense? Why didn't 
 you just change that already existing code to use a #ifdef instead?
 
 Why do you move that to linux/module.h? It makes no sense.
 
 And why do you put the arch-specific defines in asm/module.h? That makes 
 no sense either. WHY?
 
 As far as I can tell, the _only_ reason you happened to pick 
 linux/module.h was literally that it is the first of our header files 
 that lib/vsprintf.c includes. And quite frankly, that makes no sense. 
 
 That's about as sensible as putting it into linux/string.h. 
 
 Put those things in some _sane_ place. That means:
 
  - default non-implementation in lib/vsprintf.c, since there is no point 
in putting it anywhere else, when it's not used anywhere else.
 
  - arch-specific implementations can go into some more sensible asm header 
file that is more relevant. Maybe asm/processor.h?
 
 IOW, I'm complaining about your totally senseless and apparently random 
 choice of location.

Because arch/../kernel/module.c is where we put the in-kernel linker
which uses the function descriptors and already processes them.

The original patch put the prototype in kernel.h, but kernel.h doesn't
include too many files before it, so if I'm putting the arch
implementation in module.c, it makes sense to me to put the headers in
linux/module.h and asm/module.h being the natural pair belonging to
arch/kernel/module.c

So if I use asm/processor.h, you want me to add that to linux/kernel.h
and move the prototypes back in there?

James


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: linux-next: scsi tree build failure

2008-07-07 Thread James Bottomley
Correct cc's added

On Mon, 2008-07-07 at 22:25 +1000, Stephen Rothwell wrote:
 Hi James,
 
 Today's linux-next build (powerpc ppc64_defconfig) failed like this:
 
 drivers/scsi/ibmvscsi/ibmvscsi.c: In function 'map_sg_data':
 drivers/scsi/ibmvscsi/ibmvscsi.c:431: error: 'FW_FEATURE_CMO' undeclared 
 (first use in this function)
 drivers/scsi/ibmvscsi/ibmvscsi.c:431: error: (Each undeclared identifier is 
 reported only once
 drivers/scsi/ibmvscsi/ibmvscsi.c:431: error: for each function it appears in.)
 drivers/scsi/ibmvscsi/ibmvscsi.c: In function 'ibmvscsi_queuecommand':
 drivers/scsi/ibmvscsi/ibmvscsi.c:750: error: 'FW_FEATURE_CMO' undeclared 
 (first use in this function)
 drivers/scsi/ibmvscsi/ibmvscsi.c: In function 'send_mad_adapter_info':
 drivers/scsi/ibmvscsi/ibmvscsi.c:864: error: 'FW_FEATURE_CMO' undeclared 
 (first use in this function)
 drivers/scsi/ibmvscsi/ibmvscsi.c: In function 'ibmvscsi_do_host_config':
 drivers/scsi/ibmvscsi/ibmvscsi.c:1412: error: 'FW_FEATURE_CMO' undeclared 
 (first use in this function)
 drivers/scsi/ibmvscsi/ibmvscsi.c: At top level:
 drivers/scsi/ibmvscsi/ibmvscsi.c:1769: error: unknown field 
 'get_io_entitlement' specified in initializer
 drivers/scsi/ibmvscsi/ibmvscsi.c:1769: warning: missing braces around 
 initializer
 drivers/scsi/ibmvscsi/ibmvscsi.c:1769: warning: (near initialization for 
 'ibmvscsi_driver.driver')
 drivers/scsi/ibmvscsi/ibmvscsi.c:1769: warning: initialization from 
 incompatible pointer type
 
 Caused because commit 341b56db6804040aa9559e913865108424e3b18b ([SCSI]
 ibmvscsi: driver enablement for CMO), which was 15/16 in a series, has
 been merged before any of the other patches in the series.  I have
 reverted that commit.

Do I detect the fact that IBM sent a patch for a SCSI driver for which
the core features weren't yet enabled (and which I couldn't check, not
having a ppc build system)?

James


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: linux-next: scsi tree build failure

2008-07-07 Thread James Bottomley
On Tue, 2008-07-08 at 08:05 +1000, Benjamin Herrenschmidt wrote:
 On Mon, 2008-07-07 at 09:39 -0500, James Bottomley wrote:
   Caused because commit 341b56db6804040aa9559e913865108424e3b18b
  ([SCSI]
   ibmvscsi: driver enablement for CMO), which was 15/16 in a series,
  has
   been merged before any of the other patches in the series.  I have
   reverted that commit.
  
  Do I detect the fact that IBM sent a patch for a SCSI driver for which
  the core features weren't yet enabled (and which I couldn't check, not
  having a ppc build system)?
 
 I suspect the sender (Robert) was asking for comments/review :-) In any
 case, Robert, next time make it explicit on a patch CCed to a separate
 list from the rest of the serie that it isn't to be merged without
 dependencies.
 
 James, if you ack it, I'll put it in powerpc.git.

I thought it was all done and dusted (which is why I put it in
scsi-misc).  However, now Brian King seems to have some late arriving
comments.

James


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: MMIO and gcc re-ordering issue

2008-06-10 Thread James Bottomley
On Tue, 2008-06-10 at 10:41 -0700, Jesse Barnes wrote:
 On Monday, June 09, 2008 11:56 pm Nick Piggin wrote:
  So that still doesn't tell us what *minimum* level of ordering we should
  provide in the cross platform readl/writel API. Some relatively sane
  suggestions would be:
 
  - as strong as x86. guaranteed not to break drivers that work on x86,
but slower on some archs. To me, this is most pleasing. It is much
much easier to notice something is going a little slower and to work
out how to use weaker ordering there, than it is to debug some
once-in-a-bluemoon breakage caused by just the right architecture,
driver, etc. It totally frees up the driver writer from thinking
about barriers, provided they get the locking right.
 
  - ordered WRT other IO accessors, constrained within spinlocks, but not
cacheable memory. This is what powerpc does now. It's a little faster
for them, and probably covers the vast majority of drivers, but there
are real possibilities to get it wrong (trivial example: using bit
locks or mutexes or any kind of open coded locking or lockless
synchronisation can break).
 
  - (less sane) same as above, but not ordered WRT spinlocks. This is what
ia64 (sn2) does. From a purist POV, it is a little less arbitrary than
powerpc, but in practice, it will break a lot more drivers than powerpc.
 
  I was kind of joking about taking control of this issue :) But seriously,
  it needs a decision to be made. I vote for #1. My rationale: I'm still
  finding relatively major (well, found maybe 4 or 5 in the last couple of
  years) bugs in the mm subsystem due to memory ordering problems. This is
  apparently one of the most well reviewed and tested bit of code in the
  kernel by people who know all about memory ordering. Not to mention that
  mm/ does not have to worry about IO ordering at all. Then apparently
  driver are the least reviewed and tested. Connect dots.
 
  Now that doesn't leave waker ordering architectures lumped with slow old
  x86 semantics. Think of it as giving them the benefit of sharing x86
  development and testing :) We can then formalise the relaxed __ accessors
  to be more complete (ie. +/- byteswapping). I'd also propose to add
  io_rmb/io_wmb/io_mb that order io/io access, to help architectures like
  sn2 where the io/cacheable barrier is pretty expensive.
 
  Any comments?
 
 FWIW that approach sounds pretty good to me.  Arches that suffer from 
 performance penalties can still add lower level primitives and port selected 
 drivers over, so really they won't be losing much.  AFAICT though drivers 
 will still have to worry about regular memory ordering issues; they'll just 
 be safe from I/O related ones. :)  Still, the simplification is probably 
 worth it.

me too.  That's the whole basis for readX_relaxed() and its cohorts: we
make our weirdest machines (like altix) conform to the x86 norm.  Then
where it really kills us we introduce additional semantics to selected
drivers that enable us to recover I/O speed on the abnormal platforms.

About the only problem we've had is that architectures aren't very good
at co-ordinating for their additional accessors so we tend to get a
forest of strange ones growing up, which appear only in a few drivers
(i.e. the ones that need the speed ups) and which have no well
documented meaning.

James


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: MMIO and gcc re-ordering issue

2008-05-29 Thread James Bottomley
On Thu, 2008-05-29 at 10:47 -0400, Jes Sorensen wrote:
  Roland == Roland Dreier [EMAIL PROTECTED] writes:
 
  This is a different issue. We deal with it on powerpc by having
  writel set a per-cpu flag and spin_unlock() test it, and do the
  barrier if needed there.
 
 Roland Cool... I assume you do this for mutex_unlock() etc?
 
 Roland Is there any reason why ia64 can't do this too so we can kill
 Roland mmiowb and save everyone a lot of hassle?  (mips, sh and frv
 Roland have non-empty mmiowb() definitions too but I'd guess that
 Roland these are all bugs based on misunderstandings of the mmiowb()
 Roland semantics...)
 
 Hi Roland,
 
 Thats not going to solve the problem on Altix. On Altix the issue is
 that there can be multiple paths through the NUMA fabric from cpuX to
 PCI bridge Y. 
 
 Consider this uber-cooltm ascii art - NR is my abbrevation for NUMA
 router:
 
 --- ---
 |cpu X| |cpu Y|
 --- ---
  |   \  /|
  |\/ |
  |/\ |
  |   /  \|
  -  --
  |NR 1| |NR 2|
  -- --
   \ /
\   /
 ---
 | PCI |
 ---
 
 The problem is that your two writel's, despite being both issued on
 cpu X, due to the spin lock, in your example, can end up with the
 first one going through NR 1 and the second one going through NR 2. If
 there's contention on NR 1, the write going via NR 2 may hit the PCI
 bridge prior to the one going via NR 1.
 
 Of course, the bigger the system, the worse the problem
 
 The only way to guarantee ordering in the above setup, is to either
 make writel() fully ordered or adding the mmiowb()'s inbetween the two
 writel's. On Altix you have to go and read from the PCI brige to
 ensure all writes to it have been flushed, which is also what mmiowb()
 is doing. If writel() was to guarantee this ordering, it would make
 every writel() call extremely expensive :-(

So if a read from the bridge achieves the same effect, can't we just put
one after the writes within the spinlock (an unrelaxed one).  That way
this whole sequence will look like a well understood PCI posting flush
rather than have to muck around with little understood (at least by most
driver writers) io barriers?

James


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: MMIO and gcc re-ordering issue

2008-05-27 Thread James Bottomley
On Tue, 2008-05-27 at 08:50 -0700, Roland Dreier wrote:
  Though it's my understanding that at least ia64 does require the
   explicit barriers anyway, so we are still in a dodgy situation here
   where it's not clear what drivers should do and we end up with
   possibly excessive barriers on powerpc where I end up with both
   the wmb/rmb/mb that were added for ia64 -and- the ones I have in
   readl/writel to make them look synchronous... Not nice.
 
 ia64 is a disaster with a slightly different ordering problem -- the
 mmiowb() issue.  I know Ben knows far too much about this, but for big
 SGI boxes, you sometimes need mmiowb() to avoid problems with driver
 code that does totally sane stuff like
 
   spin_lock(mmio_lock);
   writel(val1, reg1);
   writel(val2, reg2);
   spin_unlock(mmio_lock);
 
 If that snippet is called on two CPUs at the same time, then the device
 might see a sequence like
 
   CPU1 -- write reg1
   CPU2 -- write reg1
   CPU1 -- write reg2
   CPU2 -- write reg2
 
 in spite of the fact that everything is totally ordered on the CPUs by
 the spin lock.
 
 The reason this is such a disaster is because the code looks right,
 makes sense, and works fine on 99.99% of all systems out there.  So I
 would bet that 99% of our drivers have missing mmiowb() bugs -- no one
 has plugged the hardware into an Altix box and cared enough to stress
 test it.
 
 However for the issue at hand, my expectation as a driver writer is that
 readl()/writel() are ordered with respect to MMIO operations, but not
 necessarily with respect to normal writes to coherent CPU memory.  And
 I've included explicit wmb()s in code I've written like
 drivers/infiniband/hw/mthca.

Actually, this specifically should not be.  The need for mmiowb on altix
is because it explicitly violates some of the PCI rules that would
otherwise impede performance.   The compromise is that readX on altix
contains the needed dma flush but there's a variant operator,
readX_relaxed that doesn't (for drivers that know what they're doing).
The altix critical drivers have all been converted to use the relaxed
form for performance, and the unconverted ones should all operate just
fine (albeit potentially more slowly).

You can see all of this in include/asm-ia64/sn/io.h

It is confusing to me that sn_dma_flush() and sn_mmiowb() have different
implementations, but I think both fix the spinlock problem you allude to
by ensuring the DMA operation is completed before the CPU instruction is
executed.

James


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: MMIO and gcc re-ordering issue

2008-05-27 Thread James Bottomley
On Tue, 2008-05-27 at 10:38 -0700, Roland Dreier wrote:
  Actually, this specifically should not be.  The need for mmiowb on altix
   is because it explicitly violates some of the PCI rules that would
   otherwise impede performance.   The compromise is that readX on altix
   contains the needed dma flush but there's a variant operator,
   readX_relaxed that doesn't (for drivers that know what they're doing).
   The altix critical drivers have all been converted to use the relaxed
   form for performance, and the unconverted ones should all operate just
   fine (albeit potentially more slowly).
 
 Is this a recent change?  Because as of October 2007, 76d7cc03
 (IB/mthca: Use mmiowb() to avoid firmware commands getting jumbled up)
 was needed.  But this was involving writel() (__raw_writel() actually,
 looking at the code), not readl().  But writel_relaxed() doesn't exist
 (and doesn't make sense).

Um, OK, you've said write twice now ... I was assuming you meant read.
Even on an x86, writes are posted, so there's no way a spin lock could
serialise a write without an intervening read to flush the posting
(that's why only reads have a relaxed version on altix).  Or is there
something else I'm missing?

James


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: CONFIG_BLK_DEV_BSG=n

2007-09-14 Thread James Bottomley
On Fri, 2007-09-14 at 12:50 -0700, Medve Emilian-EMMEDVE1 wrote:
 Which solution would you be more comfortable with?

The one which is currently in -mm is this one:

http://git.kernel.org/?p=linux/kernel/git/jejb/scsi-misc-2.6.git;a=commit;h=49892223f7d3a2333ef9e6cbdd526676e1fc517a

James


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [patch 5/6] ps3: BD/DVD/CD-ROM Storage Driver

2007-07-16 Thread James Bottomley
On Mon, 2007-07-16 at 14:16 +0200, Jens Axboe wrote:
 On Mon, Jul 16 2007, Geert Uytterhoeven wrote:
  On Fri, 13 Jul 2007, Geert Uytterhoeven wrote:
   Ah, that explains it. flush_dcache_page() is used in some drivers.
   I'll update my patches. Thanks for the comments!
  
  Does this look OK?
- Replaced KM_USER0 by KM_IRQ0 (all routines are either called from an
  interrupt handler, from .request_fn (ps3disk), or from .queuecommand
  (ps3rom))
 
 That looks good.
 
- Add a call to flush_kernel_dcache_page() in routines that write to 
  buffers
 
 Hmm, I would have thought a flush_dcache_page() would be more
 appropriate, the backing could be page cache pages.

No ... that was the point of flush_kernel_dcache_page().  The page in
question is page cache backed and contains user mappings.  However, the
block layer has already done a flush_dcache_page() in get_user_pages()
and the user shouldn't be touching memory under I/O (unless they want
self induced aliasing problems) so we're free to assume all the user
cachelines are purged, hence all we have to do is flush the kernel alias
to bring the page up to date and make the users see it correctly.

James


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [patch 5/6] ps3: BD/DVD/CD-ROM Storage Driver

2007-07-13 Thread James Bottomley
On Fri, 2007-07-13 at 17:10 +0200, Geert Uytterhoeven wrote:
 On Fri, 13 Jul 2007, Arnd Bergmann wrote:
  On Friday 13 July 2007, James Bottomley wrote:
IC.

 - flush_kernel_dcache_page() is a no-op on ppc64
  (ARCH_HAS_FLUSH_KERNEL_DCACHE_PAGE is defined on parisc only).

 - For reference, drivers/scsi/ipr.c (another ppc64 driver) just uses a 
plain
  kmap/memcpy/kunmap sequence

So what should I do?
   
   Ask someone who knows the architecture ... Anton, Paulus or Benh ... I'm
   fairly certain PPC is VIPT and will need some kind of alias
   resolution ... perhaps its associative enough not to let the aliases be
   a problem.
  
  I'm pretty sure that no ppc64 machine needs alias resolution in the kernel,
  although some are VIPT. Last time we discussed this, Segher explained it
  to me, but I don't remember which way Cell does it. IIRC, it automatically
  flushes cache lines that are accessed through aliases.
 
 Thanks for confirming!
 
  It's probably a good idea to have the flush_kernel_dcache_page() in there
  anyway, if only to serve as an example for people that copy it into
  architecture-independent drivers, same as what we do for the
  k{,un}map_atomic() that is also not required on ppc64.
 
 Now my next question: why should I add it, if currently no single driver in
 mainline calls flush_kernel_dcache_page()? 
 
 `git grep' finds it in the following files only:
 Documentation/cachetlb.txt
 arch/parisc/kernel/cache.c
 arch/parisc/kernel/pacache.S
 include/asm-parisc/cacheflush.h
 include/linux/highmem.h

It's a recent addition to the API ... the previous way of doing it was
flush_dcache_page() but that's expensive and flushes all the user
aliases.  Since you know in this case there are no current user aliases
and you only need to flush the kernel alias, flush_kernel_dcache_page()
was introduced as the cheaper alternative.

James


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev