Re: [PATCH v5 1/3] locking/rwsem: Remove arch specific rwsem files

2019-03-22 Thread Waiman Long
On 03/22/2019 01:01 PM, Linus Torvalds wrote:
> On Fri, Mar 22, 2019 at 7:30 AM Waiman Long  wrote:
>>  19 files changed, 133 insertions(+), 930 deletions(-)
> Lovely. And it all looks sane to me.
>
> So ack.
>
> The only comment I have is about __down_read_trylock(), which probably
> isn't critical enough to actually care about, but:
>
>> +static inline int __down_read_trylock(struct rw_semaphore *sem)
>> +{
>> +   long tmp;
>> +
>> +   while ((tmp = atomic_long_read(>count)) >= 0) {
>> +   if (tmp == atomic_long_cmpxchg_acquire(>count, tmp,
>> +  tmp + RWSEM_ACTIVE_READ_BIAS)) {
>> +   return 1;
>> +   }
>> +   }
>> +   return 0;
>> +}
> So this seems to
>
>  (a) read the line early (the whole cacheline in shared state issue)
>
>  (b) read the line again unnecessarily in the while loop
>
> Now, (a) might be explained by "well, maybe we do trylock even with
> existing readers", although I continue to think that the case we
> should optimize for is simply the uncontended one, where we don't even
> have multiple readers.
>
> But (b) just seems silly.
>
> So I wonder if it shouldn't just be
>
> long tmp = 0;
>
> do {
> long new = atomic_long_cmpxchg_acquire(>count, tmp,
> tmp + RWSEM_ACTIVE_READ_BIAS);
> if (likely(new == tmp))
> return 1;
>tmp = new;
> } while (tmp >= 0);
> return 0;
>
> which would seem simpler and solve both issues. Hmm?
>
> But honestly, I didn't check what our uses of down_read_trylock() look
> like. We have more of them than I expected, and I _think_ the normal
> case is the "nobody else holds the lock", but that's just a gut
> feeling.
>
> Some of them _might_ be performance-critical. There's the one on
> mmap_sem in the fault handling path, for example. And yes, I'd expect
> the normal case to very much be "no other readers or writers" for that
> one.
>
> NOTE! The above code snippet is absolutely untested, and might be
> completely wrong. Take it as a "something like this" rather than
> anything else.
>
>Linus

As you have noticed already, this patch is just for moving code around
without changing it. I optimize __down_read_trylock() in patch 3.

Cheers,
Longman



Re: [PATCH v5 2/3] locking/rwsem: Remove rwsem-spinlock.c & use rwsem-xadd.c for all archs

2019-03-22 Thread Linus Torvalds
On Fri, Mar 22, 2019 at 7:30 AM Waiman Long  wrote:
>
> For simplication, we are going to remove rwsem-spinlock.c and make all
> architectures use a single implementation of rwsem - rwsem-xadd.c.

Ack.

   Linus


Re: [PATCH v5 3/3] locking/rwsem: Optimize down_read_trylock()

2019-03-22 Thread Linus Torvalds
On Fri, Mar 22, 2019 at 7:30 AM Waiman Long  wrote:
>
> Modify __down_read_trylock() to optimize for an unlocked rwsem and make
> it generate slightly better code.

Oh, that should teach me to read all patches in the series before
starting to comment on them.

So ignore my comment on #1.

Linus


[PATCH v5 1/3] locking/rwsem: Remove arch specific rwsem files

2019-03-22 Thread Waiman Long
As the generic rwsem-xadd code is using the appropriate acquire and
release versions of the atomic operations, the arch specific rwsem.h
files will not be that much faster than the generic code as long as the
atomic functions are properly implemented. So we can remove those arch
specific rwsem.h and stop building asm/rwsem.h to reduce maintenance
effort.

Currently, only x86, alpha and ia64 have implemented architecture
specific fast paths. I don't have access to alpha and ia64 systems for
testing, but they are legacy systems that are not likely to be updated
to the latest kernel anyway.

By using a rwsem microbenchmark, the total locking rates on a 4-socket
56-core 112-thread x86-64 system before and after the patch were as
follows (mixed means equal # of read and write locks):

  Before Patch  After Patch
   # of Threads  wlock   rlock   mixed wlock   rlock   mixed
     -   -   - -   -   -
129,201  30,143  29,45828,615  30,172  29,201
2 6,807  13,299   1,171 7,725  15,025   1,804
4 6,504  12,755   1,520 7,127  14,286   1,345
8 6,762  13,412 764 6,826  13,652 726
   16 6,693  15,408 662 6,599  15,938 626
   32 6,145  15,286 496 5,549  15,487 511
   64 5,812  15,495  60 5,858  15,572  60

There were some run-to-run variations for the multi-thread tests. For
x86-64, using the generic C code fast path seems to be a little bit
faster than the assembly version with low lock contention.  Looking at
the assembly version of the fast paths, there are assembly to/from C
code wrappers that save and restore all the callee-clobbered registers
(7 registers on x86-64). The assembly generated from the generic C
code doesn't need to do that. That may explain the slight performance
gain here.

The generic asm rwsem.h can also be merged into kernel/locking/rwsem.h
with no code change as no other code other than those under
kernel/locking needs to access the internal rwsem macros and functions.

Signed-off-by: Waiman Long 
---
 MAINTAINERS |   1 -
 arch/alpha/include/asm/rwsem.h  | 211 
 arch/arm/include/asm/Kbuild |   1 -
 arch/arm64/include/asm/Kbuild   |   1 -
 arch/hexagon/include/asm/Kbuild |   1 -
 arch/ia64/include/asm/rwsem.h   | 172 ---
 arch/powerpc/include/asm/Kbuild |   1 -
 arch/s390/include/asm/Kbuild|   1 -
 arch/sh/include/asm/Kbuild  |   1 -
 arch/sparc/include/asm/Kbuild   |   1 -
 arch/x86/include/asm/rwsem.h| 237 
 arch/x86/lib/Makefile   |   1 -
 arch/x86/lib/rwsem.S| 156 -
 arch/x86/um/Makefile|   1 -
 arch/xtensa/include/asm/Kbuild  |   1 -
 include/asm-generic/rwsem.h | 140 ---
 include/linux/rwsem.h   |   4 +-
 kernel/locking/percpu-rwsem.c   |   2 +
 kernel/locking/rwsem.h  | 130 ++
 19 files changed, 133 insertions(+), 930 deletions(-)
 delete mode 100644 arch/alpha/include/asm/rwsem.h
 delete mode 100644 arch/ia64/include/asm/rwsem.h
 delete mode 100644 arch/x86/include/asm/rwsem.h
 delete mode 100644 arch/x86/lib/rwsem.S
 delete mode 100644 include/asm-generic/rwsem.h

diff --git a/MAINTAINERS b/MAINTAINERS
index e17ebf70b548..6bfd5a94c08e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9089,7 +9089,6 @@ F:arch/*/include/asm/spinlock*.h
 F: include/linux/rwlock*.h
 F: include/linux/mutex*.h
 F: include/linux/rwsem*.h
-F: arch/*/include/asm/rwsem.h
 F: include/linux/seqlock.h
 F: lib/locking*.[ch]
 F: kernel/locking/
diff --git a/arch/alpha/include/asm/rwsem.h b/arch/alpha/include/asm/rwsem.h
deleted file mode 100644
index cf8fc8f9a2ed..
--- a/arch/alpha/include/asm/rwsem.h
+++ /dev/null
@@ -1,211 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _ALPHA_RWSEM_H
-#define _ALPHA_RWSEM_H
-
-/*
- * Written by Ivan Kokshaysky , 2001.
- * Based on asm-alpha/semaphore.h and asm-i386/rwsem.h
- */
-
-#ifndef _LINUX_RWSEM_H
-#error "please don't include asm/rwsem.h directly, use linux/rwsem.h instead"
-#endif
-
-#ifdef __KERNEL__
-
-#include 
-
-#define RWSEM_UNLOCKED_VALUE   0xL
-#define RWSEM_ACTIVE_BIAS  0x0001L
-#define RWSEM_ACTIVE_MASK  0xL
-#define RWSEM_WAITING_BIAS (-0x0001L)
-#define RWSEM_ACTIVE_READ_BIAS RWSEM_ACTIVE_BIAS
-#define RWSEM_ACTIVE_WRITE_BIAS(RWSEM_WAITING_BIAS + 
RWSEM_ACTIVE_BIAS)
-
-static inline int ___down_read(struct rw_semaphore *sem)
-{
-   long oldcount;
-#ifndefCONFIG_SMP
-   oldcount = sem->count.counter;
-   sem->count.counter += RWSEM_ACTIVE_READ_BIAS;
-#else
-   long temp;
-   __asm__ __volatile__(
-   "1: ldq_l 

[PATCH v5 3/3] locking/rwsem: Optimize down_read_trylock()

2019-03-22 Thread Waiman Long
Modify __down_read_trylock() to optimize for an unlocked rwsem and make
it generate slightly better code.

Before this patch, down_read_trylock:

   0x <+0>: callq  0x5 
   0x0005 <+5>: jmp0x18 
   0x0007 <+7>: lea0x1(%rdx),%rcx
   0x000b <+11>:mov%rdx,%rax
   0x000e <+14>:lock cmpxchg %rcx,(%rdi)
   0x0013 <+19>:cmp%rax,%rdx
   0x0016 <+22>:je 0x23 
   0x0018 <+24>:mov(%rdi),%rdx
   0x001b <+27>:test   %rdx,%rdx
   0x001e <+30>:jns0x7 
   0x0020 <+32>:xor%eax,%eax
   0x0022 <+34>:retq
   0x0023 <+35>:mov%gs:0x0,%rax
   0x002c <+44>:or $0x3,%rax
   0x0030 <+48>:mov%rax,0x20(%rdi)
   0x0034 <+52>:mov$0x1,%eax
   0x0039 <+57>:retq

After patch, down_read_trylock:

   0x <+0>: callq  0x5 
   0x0005 <+5>: xor%eax,%eax
   0x0007 <+7>: lea0x1(%rax),%rdx
   0x000b <+11>:lock cmpxchg %rdx,(%rdi)
   0x0010 <+16>:jne0x29 
   0x0012 <+18>:mov%gs:0x0,%rax
   0x001b <+27>:or $0x3,%rax
   0x001f <+31>:mov%rax,0x20(%rdi)
   0x0023 <+35>:mov$0x1,%eax
   0x0028 <+40>:retq
   0x0029 <+41>:test   %rax,%rax
   0x002c <+44>:jns0x7 
   0x002e <+46>:xor%eax,%eax
   0x0030 <+48>:retq

By using a rwsem microbenchmark, the down_read_trylock() rate (with a
load of 10 to lengthen the lock critical section) on a x86-64 system
before and after the patch were:

 Before PatchAfter Patch
   # of Threads rlock   rlock
    -   -
1   14,496  14,716
28,644   8,453
46,799   6,983
85,664   7,190

On a ARM64 system, the performance results were:

 Before PatchAfter Patch
   # of Threads rlock   rlock
    -   -
1   23,676  24,488
27,697   9,502
44,945   3,440
82,641   1,603

For the uncontended case (1 thread), the new down_read_trylock() is a
little bit faster. For the contended cases, the new down_read_trylock()
perform pretty well in x86-64, but performance degrades at high
contention level on ARM64.

Suggested-by: Linus Torvalds 
Signed-off-by: Waiman Long 
---
 kernel/locking/rwsem.h | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
index 45ee00236e03..1f5775aa6a1d 100644
--- a/kernel/locking/rwsem.h
+++ b/kernel/locking/rwsem.h
@@ -174,14 +174,17 @@ static inline int __down_read_killable(struct 
rw_semaphore *sem)
 
 static inline int __down_read_trylock(struct rw_semaphore *sem)
 {
-   long tmp;
+   /*
+* Optimize for the case when the rwsem is not locked at all.
+*/
+   long tmp = RWSEM_UNLOCKED_VALUE;
 
-   while ((tmp = atomic_long_read(>count)) >= 0) {
-   if (tmp == atomic_long_cmpxchg_acquire(>count, tmp,
-  tmp + RWSEM_ACTIVE_READ_BIAS)) {
+   do {
+   if (atomic_long_try_cmpxchg_acquire(>count, ,
+   tmp + RWSEM_ACTIVE_READ_BIAS)) {
return 1;
}
-   }
+   } while (tmp >= 0);
return 0;
 }
 
-- 
2.18.1



[PATCH v5 2/3] locking/rwsem: Remove rwsem-spinlock.c & use rwsem-xadd.c for all archs

2019-03-22 Thread Waiman Long
Currently, we have two different implementation of rwsem:
 1) CONFIG_RWSEM_GENERIC_SPINLOCK (rwsem-spinlock.c)
 2) CONFIG_RWSEM_XCHGADD_ALGORITHM (rwsem-xadd.c)

As we are going to use a single generic implementation for rwsem-xadd.c
and no architecture-specific code will be needed, there is no point
in keeping two different implementations of rwsem. In most cases, the
performance of rwsem-spinlock.c will be worse. It also doesn't get all
the performance tuning and optimizations that had been implemented in
rwsem-xadd.c over the years.

For simplication, we are going to remove rwsem-spinlock.c and make all
architectures use a single implementation of rwsem - rwsem-xadd.c.

All references to RWSEM_GENERIC_SPINLOCK and RWSEM_XCHGADD_ALGORITHM
in the code are removed.

Suggested-by: Peter Zijlstra 
Signed-off-by: Waiman Long 
---
 arch/alpha/Kconfig  |   7 -
 arch/arc/Kconfig|   3 -
 arch/arm/Kconfig|   4 -
 arch/arm64/Kconfig  |   3 -
 arch/c6x/Kconfig|   3 -
 arch/csky/Kconfig   |   3 -
 arch/h8300/Kconfig  |   3 -
 arch/hexagon/Kconfig|   6 -
 arch/ia64/Kconfig   |   4 -
 arch/m68k/Kconfig   |   7 -
 arch/microblaze/Kconfig |   6 -
 arch/mips/Kconfig   |   7 -
 arch/nds32/Kconfig  |   3 -
 arch/nios2/Kconfig  |   3 -
 arch/openrisc/Kconfig   |   6 -
 arch/parisc/Kconfig |   6 -
 arch/powerpc/Kconfig|   7 -
 arch/riscv/Kconfig  |   3 -
 arch/s390/Kconfig   |   6 -
 arch/sh/Kconfig |   6 -
 arch/sparc/Kconfig  |   8 -
 arch/unicore32/Kconfig  |   6 -
 arch/x86/Kconfig|   3 -
 arch/x86/um/Kconfig |   6 -
 arch/xtensa/Kconfig |   3 -
 include/linux/rwsem-spinlock.h  |  47 -
 include/linux/rwsem.h   |   5 -
 kernel/Kconfig.locks|   2 +-
 kernel/locking/Makefile |   4 +-
 kernel/locking/rwsem-spinlock.c | 339 
 kernel/locking/rwsem.h  |   3 -
 31 files changed, 2 insertions(+), 520 deletions(-)
 delete mode 100644 include/linux/rwsem-spinlock.h
 delete mode 100644 kernel/locking/rwsem-spinlock.c

diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
index 584a6e114853..27c871227eee 100644
--- a/arch/alpha/Kconfig
+++ b/arch/alpha/Kconfig
@@ -49,13 +49,6 @@ config MMU
bool
default y
 
-config RWSEM_GENERIC_SPINLOCK
-   bool
-
-config RWSEM_XCHGADD_ALGORITHM
-   bool
-   default y
-
 config ARCH_HAS_ILOG2_U32
bool
default n
diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index c781e45d1d99..23e063df5d2c 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -63,9 +63,6 @@ config SCHED_OMIT_FRAME_POINTER
 config GENERIC_CSUM
def_bool y
 
-config RWSEM_GENERIC_SPINLOCK
-   def_bool y
-
 config ARCH_DISCONTIGMEM_ENABLE
def_bool n
 
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 054ead960f98..c11c61093c6c 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -178,10 +178,6 @@ config TRACE_IRQFLAGS_SUPPORT
bool
default !CPU_V7M
 
-config RWSEM_XCHGADD_ALGORITHM
-   bool
-   default y
-
 config ARCH_HAS_ILOG2_U32
bool
 
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7e34b9eba5de..c62b9db2b5e8 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -237,9 +237,6 @@ config LOCKDEP_SUPPORT
 config TRACE_IRQFLAGS_SUPPORT
def_bool y
 
-config RWSEM_XCHGADD_ALGORITHM
-   def_bool y
-
 config GENERIC_BUG
def_bool y
depends on BUG
diff --git a/arch/c6x/Kconfig b/arch/c6x/Kconfig
index e5cd3c5f8399..ed92b5840c0a 100644
--- a/arch/c6x/Kconfig
+++ b/arch/c6x/Kconfig
@@ -27,9 +27,6 @@ config MMU
 config FPU
def_bool n
 
-config RWSEM_GENERIC_SPINLOCK
-   def_bool y
-
 config GENERIC_CALIBRATE_DELAY
def_bool y
 
diff --git a/arch/csky/Kconfig b/arch/csky/Kconfig
index 725a115759c9..6555d1781132 100644
--- a/arch/csky/Kconfig
+++ b/arch/csky/Kconfig
@@ -92,9 +92,6 @@ config GENERIC_HWEIGHT
 config MMU
def_bool y
 
-config RWSEM_GENERIC_SPINLOCK
-   def_bool y
-
 config STACKTRACE_SUPPORT
def_bool y
 
diff --git a/arch/h8300/Kconfig b/arch/h8300/Kconfig
index c071da34e081..61c01db6c292 100644
--- a/arch/h8300/Kconfig
+++ b/arch/h8300/Kconfig
@@ -27,9 +27,6 @@ config H8300
 config CPU_BIG_ENDIAN
def_bool y
 
-config RWSEM_GENERIC_SPINLOCK
-   def_bool y
-
 config GENERIC_HWEIGHT
def_bool y
 
diff --git a/arch/hexagon/Kconfig b/arch/hexagon/Kconfig
index ac441680dcc0..3e54a53208d5 100644
--- a/arch/hexagon/Kconfig
+++ b/arch/hexagon/Kconfig
@@ -65,12 +65,6 @@ config GENERIC_CSUM
 config GENERIC_IRQ_PROBE
def_bool y
 
-config RWSEM_GENERIC_SPINLOCK
-   def_bool n
-
-config RWSEM_XCHGADD_ALGORITHM
-   def_bool y
-
 config GENERIC_HWEIGHT
 

[PATCH v5 0/3] locking/rwsem: Rwsem rearchitecture part 0

2019-03-22 Thread Waiman Long
v5:
 - Rebase to the latest v5.1 tree and fix conflicts in 
   arch/{xtensa,s390}/include/asm/Kbuild.

v4:
 - Remove rwsem-spinlock.c and make all archs use rwsem-xadd.c.

v3:
 - Optimize __down_read_trylock() for the uncontended case as suggested
   by Linus.

v2:
 - Add patch 2 to optimize __down_read_trylock() as suggested by PeterZ.
 - Update performance test data in patch 1.

The goal of this patchset is to remove the architecture specific files
for rwsem-xadd to make it easer to add enhancements in the later rwsem
patches. It also removes the legacy rwsem-spinlock.c file and make all
the architectures use one single implementation of rwsem - rwsem-xadd.c.

Waiman Long (3):
  locking/rwsem: Remove arch specific rwsem files
  locking/rwsem: Remove rwsem-spinlock.c & use rwsem-xadd.c for all
archs
  locking/rwsem: Optimize down_read_trylock()

 MAINTAINERS |   1 -
 arch/alpha/Kconfig  |   7 -
 arch/alpha/include/asm/rwsem.h  | 211 
 arch/arc/Kconfig|   3 -
 arch/arm/Kconfig|   4 -
 arch/arm/include/asm/Kbuild |   1 -
 arch/arm64/Kconfig  |   3 -
 arch/arm64/include/asm/Kbuild   |   1 -
 arch/c6x/Kconfig|   3 -
 arch/csky/Kconfig   |   3 -
 arch/h8300/Kconfig  |   3 -
 arch/hexagon/Kconfig|   6 -
 arch/hexagon/include/asm/Kbuild |   1 -
 arch/ia64/Kconfig   |   4 -
 arch/ia64/include/asm/rwsem.h   | 172 
 arch/m68k/Kconfig   |   7 -
 arch/microblaze/Kconfig |   6 -
 arch/mips/Kconfig   |   7 -
 arch/nds32/Kconfig  |   3 -
 arch/nios2/Kconfig  |   3 -
 arch/openrisc/Kconfig   |   6 -
 arch/parisc/Kconfig |   6 -
 arch/powerpc/Kconfig|   7 -
 arch/powerpc/include/asm/Kbuild |   1 -
 arch/riscv/Kconfig  |   3 -
 arch/s390/Kconfig   |   6 -
 arch/s390/include/asm/Kbuild|   1 -
 arch/sh/Kconfig |   6 -
 arch/sh/include/asm/Kbuild  |   1 -
 arch/sparc/Kconfig  |   8 -
 arch/sparc/include/asm/Kbuild   |   1 -
 arch/unicore32/Kconfig  |   6 -
 arch/x86/Kconfig|   3 -
 arch/x86/include/asm/rwsem.h| 237 --
 arch/x86/lib/Makefile   |   1 -
 arch/x86/lib/rwsem.S| 156 ---
 arch/x86/um/Kconfig |   6 -
 arch/x86/um/Makefile|   1 -
 arch/xtensa/Kconfig |   3 -
 arch/xtensa/include/asm/Kbuild  |   1 -
 include/asm-generic/rwsem.h | 140 -
 include/linux/rwsem-spinlock.h  |  47 -
 include/linux/rwsem.h   |   9 +-
 kernel/Kconfig.locks|   2 +-
 kernel/locking/Makefile |   4 +-
 kernel/locking/percpu-rwsem.c   |   2 +
 kernel/locking/rwsem-spinlock.c | 339 
 kernel/locking/rwsem.h  | 130 
 48 files changed, 135 insertions(+), 1447 deletions(-)
 delete mode 100644 arch/alpha/include/asm/rwsem.h
 delete mode 100644 arch/ia64/include/asm/rwsem.h
 delete mode 100644 arch/x86/include/asm/rwsem.h
 delete mode 100644 arch/x86/lib/rwsem.S
 delete mode 100644 include/asm-generic/rwsem.h
 delete mode 100644 include/linux/rwsem-spinlock.h
 delete mode 100644 kernel/locking/rwsem-spinlock.c

-- 
2.18.1



PRIVATE...

2019-03-21 Thread daniele
I have a business Proposal that will be of benefit to the both of us.Kindly 
contact me on mrmichealwu...@yahoo.com.hk should this be of interest to you.



Re: [PATCH v2 13/13] syscall_get_arch: add "struct task_struct *" argument

2019-03-20 Thread Paul Moore
On Sun, Mar 17, 2019 at 7:30 PM Dmitry V. Levin  wrote:
>
> This argument is required to extend the generic ptrace API with
> PTRACE_GET_SYSCALL_INFO request: syscall_get_arch() is going
> to be called from ptrace_request() along with syscall_get_nr(),
> syscall_get_arguments(), syscall_get_error(), and
> syscall_get_return_value() functions with a tracee as their argument.
>
> The primary intent is that the triple (audit_arch, syscall_nr, arg1..arg6)
> should describe what system call is being called and what its arguments
> are.
>
> Reverts: 5e937a9ae913 ("syscall_get_arch: remove useless function arguments")
> Reverts: 1002d94d3076 ("syscall.h: fix doc text for syscall_get_arch()")
> Reviewed-by: Andy Lutomirski  # for x86
> Reviewed-by: Palmer Dabbelt 
> Acked-by: Paul Moore 
> Acked-by: Paul Burton  # MIPS parts
> Acked-by: Michael Ellerman  (powerpc)
> Acked-by: Kees Cook  # seccomp parts
> Acked-by: Mark Salter  # for the c6x bit
> Cc: Elvira Khabirova 
> Cc: Eugene Syromyatnikov 
> Cc: Oleg Nesterov 
> Cc: x...@kernel.org
> Cc: linux-alpha@vger.kernel.org
> Cc: linux-snps-...@lists.infradead.org
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-c6x-...@linux-c6x.org
> Cc: uclinux-h8-de...@lists.sourceforge.jp
> Cc: linux-hexa...@vger.kernel.org
> Cc: linux-i...@vger.kernel.org
> Cc: linux-m...@lists.linux-m68k.org
> Cc: linux-m...@vger.kernel.org
> Cc: nios2-...@lists.rocketboards.org
> Cc: openr...@lists.librecores.org
> Cc: linux-par...@vger.kernel.org
> Cc: linuxppc-...@lists.ozlabs.org
> Cc: linux-ri...@lists.infradead.org
> Cc: linux-s...@vger.kernel.org
> Cc: linux...@vger.kernel.org
> Cc: sparcli...@vger.kernel.org
> Cc: linux...@lists.infradead.org
> Cc: linux-xte...@linux-xtensa.org
> Cc: linux-a...@vger.kernel.org
> Cc: linux-au...@redhat.com
> Signed-off-by: Dmitry V. Levin 
> ---
>
> Notes:
> v2: unchanged
>
>  arch/alpha/include/asm/syscall.h  |  2 +-
>  arch/arc/include/asm/syscall.h|  2 +-
>  arch/arm/include/asm/syscall.h|  2 +-
>  arch/arm64/include/asm/syscall.h  |  4 ++--
>  arch/c6x/include/asm/syscall.h|  2 +-
>  arch/csky/include/asm/syscall.h   |  2 +-
>  arch/h8300/include/asm/syscall.h  |  2 +-
>  arch/hexagon/include/asm/syscall.h|  2 +-
>  arch/ia64/include/asm/syscall.h   |  2 +-
>  arch/m68k/include/asm/syscall.h   |  2 +-
>  arch/microblaze/include/asm/syscall.h |  2 +-
>  arch/mips/include/asm/syscall.h   |  6 +++---
>  arch/mips/kernel/ptrace.c |  2 +-
>  arch/nds32/include/asm/syscall.h  |  2 +-
>  arch/nios2/include/asm/syscall.h  |  2 +-
>  arch/openrisc/include/asm/syscall.h   |  2 +-
>  arch/parisc/include/asm/syscall.h |  4 ++--
>  arch/powerpc/include/asm/syscall.h| 10 --
>  arch/riscv/include/asm/syscall.h  |  2 +-
>  arch/s390/include/asm/syscall.h   |  4 ++--
>  arch/sh/include/asm/syscall_32.h  |  2 +-
>  arch/sh/include/asm/syscall_64.h  |  2 +-
>  arch/sparc/include/asm/syscall.h  |  5 +++--
>  arch/unicore32/include/asm/syscall.h  |  2 +-
>  arch/x86/include/asm/syscall.h|  8 +---
>  arch/x86/um/asm/syscall.h |  2 +-
>  arch/xtensa/include/asm/syscall.h |  2 +-
>  include/asm-generic/syscall.h |  5 +++--
>  kernel/auditsc.c  |  4 ++--
>  kernel/seccomp.c  |  4 ++--
>  30 files changed, 52 insertions(+), 42 deletions(-)

Merged into audit/next, thanks everyone.

-- 
paul moore
www.paul-moore.com


Re: [PATCH] y2038: fix socket.h header inclusion

2019-03-18 Thread Florian Weimer
* Arnd Bergmann:

> Ok, so not '__fds_bits'.
>
> Is '__kernel_fds_bits' ok? I would prefer to keep at least the
> name __kernel_ namespace that we have for typedefs and the
> occasional struct tag.

glibc should be okay with that.  We use __kernel_ in the math
libraries for something completely different, but those files do not
(or should not) include UAPI headers, and in any case, the set of such
identifiers is really small.


Re: [PATCH] y2038: fix socket.h header inclusion

2019-03-18 Thread Arnd Bergmann
On Mon, Mar 18, 2019 at 2:12 PM Florian Weimer  wrote:
> > On Mon, Mar 18, 2019 at 10:25 AM Florian Weimer  wrote:
> >>
> >> * Arnd Bergmann:
> >>
> >> > Should we just remove __kernel_fd_set from the exported headers and
> >> > define the internal fd_set directly in include/linux/types.h? (Adding the
> >> > folks from the old thread to Cc).
> >>
> >> The type is used in the sanitizers, but incorrectly.  They assume that
> >> FD_SETSIZE is always 1024.  (The existence of __kernel_fd_set is
> >> itself somewhat questionable because it leads to such bugs.)
> >> Moving around the type could cause a build failure in the sanitizers, but 
> >> I'm
> >> not entirely clear how the UAPI headers are included there.
> >
> > It looks like sanitizer_platform_limits_posix.cc includes
> > linux/posix_types.h to ensure that __kernel_fd_set is the same
> > size as __sanitizer___kernel_fd_set, and then it uses the
> > latter afterwards.
> >
> > What I don't see here is what kind of operation is actually done
> > on the data, I only see a cast to void.
>
> I think it is used to assert that the select family of system calls
> writes to the 1024 bits for each of the passed pointers.

Yes, that is what I expected to see in libsanitizer, I just couldn't
find any code that actually does this check.

> Which is not actually true—the write size is controlled by the
> file descriptor count argument.

Yes, of course. In fact, I see multiple possible problems that

- kernel reading uninitialized data if 'FD_ZERO()' was
  used with a shorter size than the count argument.
- kernel writing beyond the fd_set data on stack
  when the declaration had a shorter size than the count
  argument.

Each one could happen either because __FD_SETSIZE
is smaller than 'count', or because kernel and user space
disagree on the element size (32 vs 64 bit on x32).

> > If libsanitizer actually does
> > anything interesting here, we should definitely fix it to use the
> > correct size, especially since this is actually something that
> > can trigger a buffer overflow in subtle ways when used carelessly.
> > See for example [1], which we still have not addressed
>
> The footnote is missing.

Sorry, I meant [1] https://patchwork.kernel.org/patch/10245053/

> > For this specific use (and probably others like it), renaming the
> > fds_bits member to __kernel_fds_bits or something like that
> > would keep user space still compiling. That would only break
> > if someone was using __kernel_fd_set, and actually doing
> > bit operations on it. glibc uses '__fds_bits' unless __USE_XOPEN
> > is set, so maybe we should use use that name unconditionally.
>
> Please use something that is more obviously Linux-specific.

Ok, so not '__fds_bits'.

Is '__kernel_fds_bits' ok? I would prefer to keep at least the
name __kernel_ namespace that we have for typedefs and the
occasional struct tag.

Arnd


Re: [PATCH] y2038: fix socket.h header inclusion

2019-03-18 Thread Florian Weimer
* Arnd Bergmann:

> On Mon, Mar 18, 2019 at 10:25 AM Florian Weimer  wrote:
>>
>> * Arnd Bergmann:
>>
>> > Should we just remove __kernel_fd_set from the exported headers and
>> > define the internal fd_set directly in include/linux/types.h? (Adding the
>> > folks from the old thread to Cc).
>>
>> The type is used in the sanitizers, but incorrectly.  They assume that
>> FD_SETSIZE is always 1024.  (The existence of __kernel_fd_set is
>> itself somewhat questionable because it leads to such bugs.)
>> Moving around the type could cause a build failure in the sanitizers, but I'm
>> not entirely clear how the UAPI headers are included there.
>
> It looks like sanitizer_platform_limits_posix.cc includes
> linux/posix_types.h to ensure that __kernel_fd_set is the same
> size as __sanitizer___kernel_fd_set, and then it uses the
> latter afterwards.
>
> What I don't see here is what kind of operation is actually done
> on the data, I only see a cast to void.

I think it is used to assert that the select family of system calls
writes to the 1024 bits for each of the passed pointers.  Which is not
actually true—the write size is controlled by the file descriptor
count argument.

> If libsanitizer actually does
> anything interesting here, we should definitely fix it to use the
> correct size, especially since this is actually something that
> can trigger a buffer overflow in subtle ways when used carelessly.
> See for example [1], which we still have not addressed

The footnote is missing.

> For this specific use (and probably others like it), renaming the
> fds_bits member to __kernel_fds_bits or something like that
> would keep user space still compiling. That would only break
> if someone was using __kernel_fd_set, and actually doing
> bit operations on it. glibc uses '__fds_bits' unless __USE_XOPEN
> is set, so maybe we should use use that name unconditionally.

Please use something that is more obviously Linux-specific.


Re: [PATCH] y2038: fix socket.h header inclusion

2019-03-18 Thread Florian Weimer
* Arnd Bergmann:

> Should we just remove __kernel_fd_set from the exported headers and
> define the internal fd_set directly in include/linux/types.h? (Adding the
> folks from the old thread to Cc).

The type is used in the sanitizers, but incorrectly.  They assume that
FD_SETSIZE is always 1024.  (The existence of __kernel_fd_set is
itself somewhat questionable because it leads to such bugs.)  Moving
around the type could cause a build failure in the sanitizers, but I'm
not entirely clear how the UAPI headers are included there.

Otherwise, I couldn't find any uses.


Re: [PATCH] y2038: fix socket.h header inclusion

2019-03-18 Thread Arnd Bergmann
On Sun, Mar 17, 2019 at 7:20 PM Deepa Dinamani  wrote:
> On Fri, Mar 15, 2019 at 2:20 PM Florian Weimer  wrote:
> > > On Thu, Mar 14, 2019 at 7:41 PM Florian Weimer  wrote:
> > >> > diff --git a/arch/alpha/include/uapi/asm/socket.h
> > >> > b/arch/alpha/include/uapi/asm/socket.h
> > >> > index 0d0fddb7e738..976e89b116e5 100644
> > >> > --- a/arch/alpha/include/uapi/asm/socket.h
> > >> > +++ b/arch/alpha/include/uapi/asm/socket.h
> > >> > @@ -2,8 +2,8 @@
> > >> >  #ifndef _UAPI_ASM_SOCKET_H
> > >> >  #define _UAPI_ASM_SOCKET_H
> > >> >
> > >> > +#include 
> > >> >  #include 
> > >> > -#include 
> > >>
> > >> This breaks POSIX conformance in glibc because the
> > >>  header is not namespace clean.  It contains the
> > >> identifiers fds_bits and val:
> > >>
> > >> unsigned long fds_bits[__FD_SETSIZE / (8 * sizeof(long))];
> > >>
> > >> int val[2];
> > >
> > > What is problematic about the struct members here? I had thought that
> > > only the struct names have to be in a namespace to be usable here,
> > > but not the members.
> >
> > According POSIX, a user can do this:
> >
> > #define fds_bits 1024
> >
> > before including the  header file.  Similarly for val.
> >
> > Since glibc pulls in  indirectly, the result is a parse
> > error, even though the programmer did nothing wrong (fds_bits is not
> > an identifier used by POSIX, nor is it in the implementation
> > namespace, ans  is a POSIX header).

Ok, I see. Thanks for the explanation!

> > > We could use asm/posix_types.h instead of linux/posix_types.h,
> > > would that address your concern?
> >
> > It should fix the fds_bits case, I think.  But
> >  still uses val, so that part of the issue
> > remains.
>
> Would moving kernel namespace types(__kernel prefix) to a different
> header file(kernel_types.h?) and then including this from
> linux/posix_types.h.
> And, for socket.h just including kernel_types.h make sense?

I fear we have considered linux/posix_types.h to be something that
can be included anywhere for a long time, so it may be better to
ensure that this is actually the case, and avoid the problem with those
two structures but leave the rest untouched.

I think we can move  __kernel_fsid_t into include/uapi/asm-generic/statfs.h,
which is the only thing that needs it anyway. We have two definitions of
it today, the non-generic one being for mips32, but incidentally there was
a patch the other day to remove that and use the generic one instead.

With that done, we can change asm/socket.h to just use asm/posix_types.h.

I would still prefer to solve the problem for linux/posix_types.h as well,
but I'm not sure even how __kernel_fd_set  is used today in
user space, if at all. Commit 8ded2bbc1845 ("posix_types.h: Cleanup
stale __NFDBITS and related definitions") removed most of the fd_set
definition after a long discussion [1], and since then it has been
basically impossible to use 'struct fd_set'  from the kernel in a
meaningful way without including the libc headers or duplicating
them.

Should we just remove __kernel_fd_set from the exported headers and
define the internal fd_set directly in include/linux/types.h? (Adding the
folks from the old thread to Cc).

  Arnd

[1] https://lore.kernel.org/lkml/20120724181209.ga10...@zod.bos.redhat.com/t/


[PATCH v2 13/13] syscall_get_arch: add "struct task_struct *" argument

2019-03-17 Thread Dmitry V. Levin
This argument is required to extend the generic ptrace API with
PTRACE_GET_SYSCALL_INFO request: syscall_get_arch() is going
to be called from ptrace_request() along with syscall_get_nr(),
syscall_get_arguments(), syscall_get_error(), and
syscall_get_return_value() functions with a tracee as their argument.

The primary intent is that the triple (audit_arch, syscall_nr, arg1..arg6)
should describe what system call is being called and what its arguments
are.

Reverts: 5e937a9ae913 ("syscall_get_arch: remove useless function arguments")
Reverts: 1002d94d3076 ("syscall.h: fix doc text for syscall_get_arch()")
Reviewed-by: Andy Lutomirski  # for x86
Reviewed-by: Palmer Dabbelt 
Acked-by: Paul Moore 
Acked-by: Paul Burton  # MIPS parts
Acked-by: Michael Ellerman  (powerpc)
Acked-by: Kees Cook  # seccomp parts
Acked-by: Mark Salter  # for the c6x bit
Cc: Elvira Khabirova 
Cc: Eugene Syromyatnikov 
Cc: Oleg Nesterov 
Cc: x...@kernel.org
Cc: linux-alpha@vger.kernel.org
Cc: linux-snps-...@lists.infradead.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-c6x-...@linux-c6x.org
Cc: uclinux-h8-de...@lists.sourceforge.jp
Cc: linux-hexa...@vger.kernel.org
Cc: linux-i...@vger.kernel.org
Cc: linux-m...@lists.linux-m68k.org
Cc: linux-m...@vger.kernel.org
Cc: nios2-...@lists.rocketboards.org
Cc: openr...@lists.librecores.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: linux-ri...@lists.infradead.org
Cc: linux-s...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: sparcli...@vger.kernel.org
Cc: linux...@lists.infradead.org
Cc: linux-xte...@linux-xtensa.org
Cc: linux-a...@vger.kernel.org
Cc: linux-au...@redhat.com
Signed-off-by: Dmitry V. Levin 
---

Notes:
v2: unchanged

 arch/alpha/include/asm/syscall.h  |  2 +-
 arch/arc/include/asm/syscall.h|  2 +-
 arch/arm/include/asm/syscall.h|  2 +-
 arch/arm64/include/asm/syscall.h  |  4 ++--
 arch/c6x/include/asm/syscall.h|  2 +-
 arch/csky/include/asm/syscall.h   |  2 +-
 arch/h8300/include/asm/syscall.h  |  2 +-
 arch/hexagon/include/asm/syscall.h|  2 +-
 arch/ia64/include/asm/syscall.h   |  2 +-
 arch/m68k/include/asm/syscall.h   |  2 +-
 arch/microblaze/include/asm/syscall.h |  2 +-
 arch/mips/include/asm/syscall.h   |  6 +++---
 arch/mips/kernel/ptrace.c |  2 +-
 arch/nds32/include/asm/syscall.h  |  2 +-
 arch/nios2/include/asm/syscall.h  |  2 +-
 arch/openrisc/include/asm/syscall.h   |  2 +-
 arch/parisc/include/asm/syscall.h |  4 ++--
 arch/powerpc/include/asm/syscall.h| 10 --
 arch/riscv/include/asm/syscall.h  |  2 +-
 arch/s390/include/asm/syscall.h   |  4 ++--
 arch/sh/include/asm/syscall_32.h  |  2 +-
 arch/sh/include/asm/syscall_64.h  |  2 +-
 arch/sparc/include/asm/syscall.h  |  5 +++--
 arch/unicore32/include/asm/syscall.h  |  2 +-
 arch/x86/include/asm/syscall.h|  8 +---
 arch/x86/um/asm/syscall.h |  2 +-
 arch/xtensa/include/asm/syscall.h |  2 +-
 include/asm-generic/syscall.h |  5 +++--
 kernel/auditsc.c  |  4 ++--
 kernel/seccomp.c  |  4 ++--
 30 files changed, 52 insertions(+), 42 deletions(-)

diff --git a/arch/alpha/include/asm/syscall.h b/arch/alpha/include/asm/syscall.h
index d73a6fcb519c..11c688c1d7ec 100644
--- a/arch/alpha/include/asm/syscall.h
+++ b/arch/alpha/include/asm/syscall.h
@@ -4,7 +4,7 @@
 
 #include 
 
-static inline int syscall_get_arch(void)
+static inline int syscall_get_arch(struct task_struct *task)
 {
return AUDIT_ARCH_ALPHA;
 }
diff --git a/arch/arc/include/asm/syscall.h b/arch/arc/include/asm/syscall.h
index c7fc4c0c3bcb..caf2697ef5b7 100644
--- a/arch/arc/include/asm/syscall.h
+++ b/arch/arc/include/asm/syscall.h
@@ -70,7 +70,7 @@ syscall_get_arguments(struct task_struct *task, struct 
pt_regs *regs,
 }
 
 static inline int
-syscall_get_arch(void)
+syscall_get_arch(struct task_struct *task)
 {
return IS_ENABLED(CONFIG_ISA_ARCOMPACT)
? (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN)
diff --git a/arch/arm/include/asm/syscall.h b/arch/arm/include/asm/syscall.h
index 06dea6bce293..3940ceac0bdc 100644
--- a/arch/arm/include/asm/syscall.h
+++ b/arch/arm/include/asm/syscall.h
@@ -104,7 +104,7 @@ static inline void syscall_set_arguments(struct task_struct 
*task,
memcpy(>ARM_r0 + i, args, n * sizeof(args[0]));
 }
 
-static inline int syscall_get_arch(void)
+static inline int syscall_get_arch(struct task_struct *task)
 {
/* ARM tasks don't change audit architectures on the fly. */
return AUDIT_ARCH_ARM;
diff --git a/arch/arm64/include/asm/syscall.h b/arch/arm64/include/asm/syscall.h
index ad8be16a39c9..1870df03f774 100644
--- a/arch/arm64/include/asm/syscall.h
+++ b/arch/arm64/include/asm/syscall.h
@@ -117,9 +117,9 @@ static inline void syscall_set_arguments(struct task_struct 
*task,
  * We don't care about endianness (__AUDIT_ARCH_LE bit) here because
  * AArch64 has the same system 

Re: [PATCH] y2038: fix socket.h header inclusion

2019-03-17 Thread Deepa Dinamani
On Fri, Mar 15, 2019 at 2:20 PM Florian Weimer  wrote:
>
> * Arnd Bergmann:
>
> > On Thu, Mar 14, 2019 at 7:41 PM Florian Weimer  wrote:
> >>
> >> * Arnd Bergmann:
> >>
> >> > diff --git a/arch/alpha/include/uapi/asm/socket.h
> >> > b/arch/alpha/include/uapi/asm/socket.h
> >> > index 0d0fddb7e738..976e89b116e5 100644
> >> > --- a/arch/alpha/include/uapi/asm/socket.h
> >> > +++ b/arch/alpha/include/uapi/asm/socket.h
> >> > @@ -2,8 +2,8 @@
> >> >  #ifndef _UAPI_ASM_SOCKET_H
> >> >  #define _UAPI_ASM_SOCKET_H
> >> >
> >> > +#include 
> >> >  #include 
> >> > -#include 
> >>
> >> This breaks POSIX conformance in glibc because the
> >>  header is not namespace clean.  It contains the
> >> identifiers fds_bits and val:
> >>
> >> unsigned long fds_bits[__FD_SETSIZE / (8 * sizeof(long))];
> >>
> >> int val[2];
> >
> > What is problematic about the struct members here? I had thought that
> > only the struct names have to be in a namespace to be usable here,
> > but not the members.
>
> According POSIX, a user can do this:
>
> #define fds_bits 1024
>
> before including the  header file.  Similarly for val.
>
> Since glibc pulls in  indirectly, the result is a parse
> error, even though the programmer did nothing wrong (fds_bits is not
> an identifier used by POSIX, nor is it in the implementation
> namespace, ans  is a POSIX header).
>
> > We could use asm/posix_types.h instead of linux/posix_types.h,
> > would that address your concern?
>
> It should fix the fds_bits case, I think.  But
>  still uses val, so that part of the issue
> remains.

Would moving kernel namespace types(__kernel prefix) to a different
header file(kernel_types.h?) and then including this from
linux/posix_types.h.
And, for socket.h just including kernel_types.h make sense?

-Deepa


Re: [PATCH] y2038: fix socket.h header inclusion

2019-03-15 Thread Florian Weimer
* Arnd Bergmann:

> On Thu, Mar 14, 2019 at 7:41 PM Florian Weimer  wrote:
>>
>> * Arnd Bergmann:
>>
>> > diff --git a/arch/alpha/include/uapi/asm/socket.h
>> > b/arch/alpha/include/uapi/asm/socket.h
>> > index 0d0fddb7e738..976e89b116e5 100644
>> > --- a/arch/alpha/include/uapi/asm/socket.h
>> > +++ b/arch/alpha/include/uapi/asm/socket.h
>> > @@ -2,8 +2,8 @@
>> >  #ifndef _UAPI_ASM_SOCKET_H
>> >  #define _UAPI_ASM_SOCKET_H
>> >
>> > +#include 
>> >  #include 
>> > -#include 
>>
>> This breaks POSIX conformance in glibc because the
>>  header is not namespace clean.  It contains the
>> identifiers fds_bits and val:
>>
>> unsigned long fds_bits[__FD_SETSIZE / (8 * sizeof(long))];
>>
>> int val[2];
>
> What is problematic about the struct members here? I had thought that
> only the struct names have to be in a namespace to be usable here,
> but not the members.

According POSIX, a user can do this:

#define fds_bits 1024

before including the  header file.  Similarly for val.

Since glibc pulls in  indirectly, the result is a parse
error, even though the programmer did nothing wrong (fds_bits is not
an identifier used by POSIX, nor is it in the implementation
namespace, ans  is a POSIX header).

> We could use asm/posix_types.h instead of linux/posix_types.h,
> would that address your concern?

It should fix the fds_bits case, I think.  But
 still uses val, so that part of the issue
remains.


Re: [PATCH] y2038: fix socket.h header inclusion

2019-03-15 Thread Arnd Bergmann
On Thu, Mar 14, 2019 at 7:41 PM Florian Weimer  wrote:
>
> * Arnd Bergmann:
>
> > diff --git a/arch/alpha/include/uapi/asm/socket.h 
> > b/arch/alpha/include/uapi/asm/socket.h
> > index 0d0fddb7e738..976e89b116e5 100644
> > --- a/arch/alpha/include/uapi/asm/socket.h
> > +++ b/arch/alpha/include/uapi/asm/socket.h
> > @@ -2,8 +2,8 @@
> >  #ifndef _UAPI_ASM_SOCKET_H
> >  #define _UAPI_ASM_SOCKET_H
> >
> > +#include 
> >  #include 
> > -#include 
>
> This breaks POSIX conformance in glibc because the
>  header is not namespace clean.  It contains the
> identifiers fds_bits and val:
>
> unsigned long fds_bits[__FD_SETSIZE / (8 * sizeof(long))];
>
> int val[2];

What is problematic about the struct members here? I had thought that
only the struct names have to be in a namespace to be usable here,
but not the members.

The only part that might be problematic is

#undef __FD_SETSIZE
#define __FD_SETSIZE1024

but we already get that from a number of other inclusions of
linux/posix_types.h. Is this what you mean?

> We could duplicate some of the SO_* constants for POSIX mode in glibc,
> but it would be nice to avoid that.
>
> Is there a different way of fixing this on the kernel side that avoids
> including ?

We could use asm/posix_types.h instead of linux/posix_types.h,
would that address your concern?

   Arnd


HI

2019-03-15 Thread CAPTAIN JAMES WILLIAMS




--
I am Captain JAMES WILLIAMS from united states, but I am currently in 
Syria for peace keeping mission. I am the commanding officer of the 
third Battalion soldier regime.


Please forgive my manners I am not good when it comes to the Internet 
because that is not really my field.


I came across your email confidently when I was searching for a 
reliable person and I suddenly got your email address and I guess it's 
for a good purpose.


Here in Syria we are not allowed to go out that makes it very bored for 
me so I just think I need a friend to talk to outside to keep me going.


I would like to know more about you as a friend.

Respectfully,

James williams
--



Re: [PATCH] x86: Deprecate a.out support

2019-03-14 Thread Miguel Ojeda
On Tue, Mar 12, 2019 at 9:46 AM Geert Uytterhoeven  wrote:
>
> Yeah, the alphas on the server side, powering AltaVista, are also long
> gone...

As usual with these things, people can still use older Linux releases
for a very long time. If they really need it (e.g. commercially), they
have the reference code and can bring it up to speed again relatively
easily. It is not as if they have no way to submit it back again.

Cheers,
Miguel


Re: [PATCH] x86: Deprecate a.out support

2019-03-12 Thread Geert Uytterhoeven
On Mon, Mar 11, 2019 at 10:46 PM Linus Torvalds
 wrote:
> On Mon, Mar 11, 2019 at 2:34 PM Arnd Bergmann  wrote:
> > The main historic use case I've heard of was running Netscape
> > Navigator on Alpha Linux, before there was an open source version.
> > Doing this today to connect to the open internet is probably
> > a bit pointless, but there may be other use cases.
>
> The _really_ main version was that I decided to make my life easier
> for the initial alpha port by trying to run basic (tested) OSF/1
> binaries directly.
>
> Netscape may have been one of the binaries people actually ended up
> using, but it's probably not a reason any more, since the internet has
> moved past that anyway.

Yeah, the alphas on the server side, powering AltaVista, are also long
gone...

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [PATCH] x86: Deprecate a.out support

2019-03-12 Thread Michael Cree
On Mon, Mar 11, 2019 at 03:11:55PM -0700, Matt Turner wrote:
> On Mon, Mar 11, 2019 at 2:34 PM Arnd Bergmann  wrote:
> > On Mon, Mar 11, 2019 at 8:47 PM Måns Rullgård  wrote:
> > > Linus Torvalds  writes:
> > > > On Mon, Mar 11, 2019 at 11:08 AM Måns Rullgård  wrote:
> > > > We don't have any specific support for ECOFF.
> > > >
> > > > I _think_. Again, it's been years and years.

I agree. I personally have never run any OSF/1 executables on
Linux Alpha and have no interest in doing so.

> > The main historic use case I've heard of was running Netscape
> > Navigator on Alpha Linux, before there was an open source version.
> > Doing this today to connect to the open internet is probably
> > a bit pointless, but there may be other use cases.
> 
> The best use case I know of is to run their C compiler. Måns sent
> patches in fact to make it work.
> 
> There is a Linux version of the same compiler but I have a vague
> memory that it's broken in various ways that the Tru64 version is
> not.

The last time I tried the Compaq C compiler for Alpha-Linux it still
worked, well, that is, the compiler worked, but the library header
files are broken and haven't worked with glibc for a long time.  So
it is only useful as a free-standing compiler.

In the past it also produced better code than gcc, but gcc is now
so vastly improved w.r.t. optimisation and compliance to more recent
standards, that I would be surprised if there is any real use for
the Compaq compiler.

Cheers,
Michael.


Re: [PATCH] x86: Deprecate a.out support

2019-03-11 Thread Måns Rullgård
Linus Torvalds  writes:

> On Mon, Mar 11, 2019 at 2:34 PM Arnd Bergmann  wrote:
>>
>> The main historic use case I've heard of was running Netscape
>> Navigator on Alpha Linux, before there was an open source version.
>> Doing this today to connect to the open internet is probably
>> a bit pointless, but there may be other use cases.
>
> The _really_ main version was that I decided to make my life easier
> for the initial alpha port by trying to run basic (tested) OSF/1
> binaries directly.
>
> Netscape may have been one of the binaries people actually ended up
> using, but it's probably not a reason any more, since the internet has
> moved past that anyway.
>
>> Looking at the system call table in the kernel
>> (arch/alpha/kernel/syscalls/syscall.tbl), we seem to support a
>> specific subset that was required for a set of applications, and
>> not much more.
>
> Yeah, it never supported arbitrary binaries, particularly since
> there's often lots of other issues too with running things like that
> (ie filesystem layout etc). It worked for normal fairly well behaved
> stuff, but wasn't ever a full OSF/1 emulation environment.
>
> I _suspect_ nobody actually runs any OSF/1 binaries any more, but it
> would obviously be good to verify that. Your argument that timeval
> handling was broken _may_ be an indication of that (or may just mean
> very few apps care).

Does it count if I fire up an Alpha and run a few OSF/1 binaries right
now? :-)

-- 
Måns Rullgård


Re: [PATCH] x86: Deprecate a.out support

2019-03-11 Thread Matt Turner
On Mon, Mar 11, 2019 at 2:34 PM Arnd Bergmann  wrote:
>
> On Mon, Mar 11, 2019 at 8:47 PM Måns Rullgård  wrote:
> > Linus Torvalds  writes:
> > > On Mon, Mar 11, 2019 at 11:08 AM Måns Rullgård  wrote:
> > >>
> > >> The latest version I have is 5.1, and that uses ECOFF.
> > >
> > > ECOFF _is_ a.out as far as Linux is concerned.
> > >
> > > So Linux basically treats ECOFF as "regular a.out with just some
> > > header extensions".
> > >
> > > We don't have any specific support for ECOFF.
> > >
> > > I _think_. Again, it's been years and years.
> >
> > Right, which is why killing a.out entirely would have the unfortunate
> > effect of also removing the OSF/1 compatibility on Alpha.
> >
> > If we are to support Alpha as an architecture at all, it makes sense to
> > support the things people actually use it for.
> >
> > Now, personally I can live without it.  I just don't like to see
> > features removed without due consideration.
>
> The main historic use case I've heard of was running Netscape
> Navigator on Alpha Linux, before there was an open source version.
> Doing this today to connect to the open internet is probably
> a bit pointless, but there may be other use cases.

The best use case I know of is to run their C compiler. Måns sent
patches in fact to make it work.

There is a Linux version of the same compiler but I have a vague
memory that it's broken in various ways that the Tru64 version is not.


Re: [PATCH] x86: Deprecate a.out support

2019-03-11 Thread Måns Rullgård
Arnd Bergmann  writes:

> On Mon, Mar 11, 2019 at 8:47 PM Måns Rullgård  wrote:
>> Linus Torvalds  writes:
>> > On Mon, Mar 11, 2019 at 11:08 AM Måns Rullgård  wrote:
>> >>
>> >> The latest version I have is 5.1, and that uses ECOFF.
>> >
>> > ECOFF _is_ a.out as far as Linux is concerned.
>> >
>> > So Linux basically treats ECOFF as "regular a.out with just some
>> > header extensions".
>> >
>> > We don't have any specific support for ECOFF.
>> >
>> > I _think_. Again, it's been years and years.
>>
>> Right, which is why killing a.out entirely would have the unfortunate
>> effect of also removing the OSF/1 compatibility on Alpha.
>>
>> If we are to support Alpha as an architecture at all, it makes sense to
>> support the things people actually use it for.
>>
>> Now, personally I can live without it.  I just don't like to see
>> features removed without due consideration.
>
> The main historic use case I've heard of was running Netscape
> Navigator on Alpha Linux, before there was an open source version.
> Doing this today to connect to the open internet is probably
> a bit pointless, but there may be other use cases.

Once upon a time, I used it to run Matlab.

> Looking at the system call table in the kernel
> (arch/alpha/kernel/syscalls/syscall.tbl), we seem to support a
> specific subset that was required for a set of applications, and
> not much more. Old system calls (osf_old_open, osf_execve,
> osf_old_sigaction) are listed but not implemented, and the same
> is true for most of the later calls (osf_fuser, osf_sigsendset,
> osf_waitid, osf_signal, ...), just the ones in the middle are there.
> This would also indicate that it never really worked as a
> general-purpose emulation layer but was only there for a specific
> set of applications.

It works for many applications, though I did have to add a few syscalls
myself (yes, I sent patches).

> Another data point I have is that osf1 emulation was broken
> between linux-4.13 and linux-4.16 without anyone noticing, see
> 47669fb6b595 ("alpha: osf_sys.c: fix put_tv32 regression").

That's interesting, but it doesn't mean nobody is using it.  I tend to
run the LTS branches and switch to a new one only once it seems to have
settled a bit, so when 4.16 was released, I was probably still running
4.9.  I don't think I'm the only one using this strategy.

-- 
Måns Rullgård


Re: [PATCH] x86: Deprecate a.out support

2019-03-11 Thread Linus Torvalds
On Mon, Mar 11, 2019 at 2:34 PM Arnd Bergmann  wrote:
>
> The main historic use case I've heard of was running Netscape
> Navigator on Alpha Linux, before there was an open source version.
> Doing this today to connect to the open internet is probably
> a bit pointless, but there may be other use cases.

The _really_ main version was that I decided to make my life easier
for the initial alpha port by trying to run basic (tested) OSF/1
binaries directly.

Netscape may have been one of the binaries people actually ended up
using, but it's probably not a reason any more, since the internet has
moved past that anyway.

> Looking at the system call table in the kernel
> (arch/alpha/kernel/syscalls/syscall.tbl), we seem to support a
> specific subset that was required for a set of applications, and
> not much more.

Yeah, it never supported arbitrary binaries, particularly since
there's often lots of other issues too with running things like that
(ie filesystem layout etc). It worked for normal fairly well behaved
stuff, but wasn't ever a full OSF/1 emulation environment.

I _suspect_ nobody actually runs any OSF/1 binaries any more, but it
would obviously be good to verify that. Your argument that timeval
handling was broken _may_ be an indication of that (or may just mean
very few apps care).

I think we should try the a.out removal and see if anybody notices.

Linus


Re: [PATCH] x86: Deprecate a.out support

2019-03-11 Thread Arnd Bergmann
On Mon, Mar 11, 2019 at 8:47 PM Måns Rullgård  wrote:
> Linus Torvalds  writes:
> > On Mon, Mar 11, 2019 at 11:08 AM Måns Rullgård  wrote:
> >>
> >> The latest version I have is 5.1, and that uses ECOFF.
> >
> > ECOFF _is_ a.out as far as Linux is concerned.
> >
> > So Linux basically treats ECOFF as "regular a.out with just some
> > header extensions".
> >
> > We don't have any specific support for ECOFF.
> >
> > I _think_. Again, it's been years and years.
>
> Right, which is why killing a.out entirely would have the unfortunate
> effect of also removing the OSF/1 compatibility on Alpha.
>
> If we are to support Alpha as an architecture at all, it makes sense to
> support the things people actually use it for.
>
> Now, personally I can live without it.  I just don't like to see
> features removed without due consideration.

The main historic use case I've heard of was running Netscape
Navigator on Alpha Linux, before there was an open source version.
Doing this today to connect to the open internet is probably
a bit pointless, but there may be other use cases.

Looking at the system call table in the kernel
(arch/alpha/kernel/syscalls/syscall.tbl), we seem to support a
specific subset that was required for a set of applications, and
not much more. Old system calls (osf_old_open, osf_execve,
osf_old_sigaction) are listed but not implemented, and the same
is true for most of the later calls (osf_fuser, osf_sigsendset,
osf_waitid, osf_signal, ...), just the ones in the middle are there.
This would also indicate that it never really worked as a
general-purpose emulation layer but was only there for a specific
set of applications.

Another data point I have is that osf1 emulation was broken
between linux-4.13 and linux-4.16 without anyone noticing, see
47669fb6b595 ("alpha: osf_sys.c: fix put_tv32 regression").

  Arnd


Re: [PATCH] x86: Deprecate a.out support

2019-03-11 Thread Matt Turner
On Mon, Mar 11, 2019 at 12:47 PM Måns Rullgård  wrote:
>
> Linus Torvalds  writes:
>
> > On Mon, Mar 11, 2019 at 11:08 AM Måns Rullgård  wrote:
> >>
> >> The latest version I have is 5.1, and that uses ECOFF.
> >
> > ECOFF _is_ a.out as far as Linux is concerned.
> >
> > So Linux basically treats ECOFF as "regular a.out with just some
> > header extensions".
> >
> > We don't have any specific support for ECOFF.
> >
> > I _think_. Again, it's been years and years.
>
> Right, which is why killing a.out entirely would have the unfortunate
> effect of also removing the OSF/1 compatibility on Alpha.
>
> If we are to support Alpha as an architecture at all, it makes sense to
> support the things people actually use it for.

I agree. I was not aware that a.out was effectively the same as ECOFF.


Re: [PATCH] x86: Deprecate a.out support

2019-03-11 Thread Måns Rullgård
Linus Torvalds  writes:

> On Mon, Mar 11, 2019 at 11:08 AM Måns Rullgård  wrote:
>>
>> The latest version I have is 5.1, and that uses ECOFF.
>
> ECOFF _is_ a.out as far as Linux is concerned.
>
> So Linux basically treats ECOFF as "regular a.out with just some
> header extensions".
>
> We don't have any specific support for ECOFF.
>
> I _think_. Again, it's been years and years.

Right, which is why killing a.out entirely would have the unfortunate
effect of also removing the OSF/1 compatibility on Alpha.

If we are to support Alpha as an architecture at all, it makes sense to
support the things people actually use it for.

Now, personally I can live without it.  I just don't like to see
features removed without due consideration.

-- 
Måns Rullgård


Re: [PATCH] x86: Deprecate a.out support

2019-03-11 Thread Linus Torvalds
On Mon, Mar 11, 2019 at 11:08 AM Måns Rullgård  wrote:
>
> The latest version I have is 5.1, and that uses ECOFF.

ECOFF _is_ a.out as far as Linux is concerned.

So Linux basically treats ECOFF as "regular a.out with just some
header extensions".

We don't have any specific support for ECOFF.

I _think_. Again, it's been years and years.

   Linus


Re: [PATCH] x86: Deprecate a.out support

2019-03-11 Thread Matt Turner
On Mon, Mar 11, 2019 at 9:26 AM Måns Rullgård  wrote:
>
> Linus Torvalds  writes:
>
> > On Sun, Mar 10, 2019 at 2:37 PM Matt Turner  wrote:
> >>
> >> I'm not aware of a reason to keep a.out support on alpha.
> >
> > Hmm. I was looking at removing a.out support entirely, but it's
> > actually fairly incestuous on alpha.
> >
> > For example, arch/alpha/boot/tools/objstrip.c very much has some a.out
> > support in it. Maybe it can just be removed entirely.
> >
> > There's also an a.out.h include in arch/alpha/kernel/binfmt_loader.c.
> >
> > Finally, note that CONFIG_OSF4_COMPAT also no longer makes sense
> > without a.out support.
>
> Anyone running an Alpha machine likely also has some old OSF/1 binaries
> they may wish to use.  It would be a shame to remove this feature, IMO.

Tru64 5.1 uses ECOFF binaries, I believe. Do you know when OSF/1 /
Digital UNIX / Tru64 switched from a.out to ECOFF?


Re: [PATCH] x86: Deprecate a.out support

2019-03-11 Thread Måns Rullgård
Linus Torvalds  writes:

> On Mon, Mar 11, 2019 at 9:26 AM Måns Rullgård  wrote:
>>
>> Anyone running an Alpha machine likely also has some old OSF/1 binaries
>> they may wish to use.  It would be a shame to remove this feature, IMO.
>
> If that's the case then we'd have to keep a.out alive for alpha, since
> that's the OSF/1 binary format (at least the only one we support - I'm
> not sure if later versions of OSF/1 ended up getting ELF).

The latest version I have is 5.1, and that uses ECOFF.

> Which I guess we could do, but the question is whether people really
> do have OSF/1 binaries. It was really useful early on as a source of
> known-good binaries to test with, but I'm not convinced it's still in
> use.
>
> It's not like there were OSF/1 binaries that we didn't havce access to
> natively (well, there _were_ special ones that didn't have open source
> versions, but most of them required more system-side support than
> Linux ever implemented, afaik).

I don't have any specific examples, but I can well imagine people
keeping an Alpha machine for no other reason than the ability to run
some (old) application only available (to them) for OSF/1.  Running them
on Linux rather than Tru64 brings the advantage of being a modern system
in other regards.

For anything open source, there's little reason to keep the Alpha at
all.

-- 
Måns Rullgård


Re: [PATCH] x86: Deprecate a.out support

2019-03-11 Thread Linus Torvalds
On Mon, Mar 11, 2019 at 9:26 AM Måns Rullgård  wrote:
>
> Anyone running an Alpha machine likely also has some old OSF/1 binaries
> they may wish to use.  It would be a shame to remove this feature, IMO.

If that's the case then we'd have to keep a.out alive for alpha, since
that's the OSF/1 binary format (at least the only one we support - I'm
not sure if later versions of OSF/1 ended up getting ELF).

Which I guess we could do, but the question is whether people really
do have OSF/1 binaries. It was really useful early on as a source of
known-good binaries to test with, but I'm not convinced it's still in
use.

It's not like there were OSF/1 binaries that we didn't havce access to
natively (well, there _were_ special ones that didn't have open source
versions, but most of them required more system-side support than
Linux ever implemented, afaik).

   Linus


Re: [PATCH] x86: Deprecate a.out support

2019-03-11 Thread Måns Rullgård
Linus Torvalds  writes:

> On Sun, Mar 10, 2019 at 2:37 PM Matt Turner  wrote:
>>
>> I'm not aware of a reason to keep a.out support on alpha.
>
> Hmm. I was looking at removing a.out support entirely, but it's
> actually fairly incestuous on alpha.
>
> For example, arch/alpha/boot/tools/objstrip.c very much has some a.out
> support in it. Maybe it can just be removed entirely.
>
> There's also an a.out.h include in arch/alpha/kernel/binfmt_loader.c.
>
> Finally, note that CONFIG_OSF4_COMPAT also no longer makes sense
> without a.out support.

Anyone running an Alpha machine likely also has some old OSF/1 binaries
they may wish to use.  It would be a shame to remove this feature, IMO.

-- 
Måns Rullgård


[PATCH] y2038: fix socket.h header inclusion

2019-03-11 Thread Arnd Bergmann
Referencing the __kernel_long_t type caused some user space applications
to stop compiling when they had not already included linux/posix_types.h,
e.g.

s/multicast.c -o ext/sockets/multicast.lo
In file included from /builddir/build/BUILD/php-7.3.3/main/php.h:468,
 from /builddir/build/BUILD/php-7.3.3/ext/sockets/sockets.c:27:
/builddir/build/BUILD/php-7.3.3/ext/sockets/sockets.c: In function 
'zm_startup_sockets':
/builddir/build/BUILD/php-7.3.3/ext/sockets/sockets.c:776:40: error: 
'__kernel_long_t' undeclared (first use in this function)
  776 |  REGISTER_LONG_CONSTANT("SO_SNDTIMEO", SO_SNDTIMEO, CONST_CS | 
CONST_PERSISTENT);

It is safe to include that header here, since it only contains kernel
internal types that do not conflict with other user space types.

It's still possible that some related build failures remain, but those
are likely to be for code that is not already y2038 safe.

Reported-by: Laura Abbott 
Fixes: a9beb86ae6e5 ("sock: Add SO_RCVTIMEO_NEW and SO_SNDTIMEO_NEW")
Signed-off-by: Arnd Bergmann 
---
 arch/alpha/include/uapi/asm/socket.h  | 2 +-
 arch/mips/include/uapi/asm/socket.h   | 2 +-
 arch/parisc/include/uapi/asm/socket.h | 2 +-
 arch/sparc/include/uapi/asm/socket.h  | 2 +-
 include/uapi/asm-generic/socket.h | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/socket.h 
b/arch/alpha/include/uapi/asm/socket.h
index 0d0fddb7e738..976e89b116e5 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -2,8 +2,8 @@
 #ifndef _UAPI_ASM_SOCKET_H
 #define _UAPI_ASM_SOCKET_H
 
+#include 
 #include 
-#include 
 
 /* For setsockopt(2) */
 /*
diff --git a/arch/mips/include/uapi/asm/socket.h 
b/arch/mips/include/uapi/asm/socket.h
index eb9f33f8a8b3..d41765cfbc6e 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -10,8 +10,8 @@
 #ifndef _UAPI_ASM_SOCKET_H
 #define _UAPI_ASM_SOCKET_H
 
+#include 
 #include 
-#include 
 
 /*
  * For setsockopt(2)
diff --git a/arch/parisc/include/uapi/asm/socket.h 
b/arch/parisc/include/uapi/asm/socket.h
index 16e428f03526..66c5dd245ac7 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -2,8 +2,8 @@
 #ifndef _UAPI_ASM_SOCKET_H
 #define _UAPI_ASM_SOCKET_H
 
+#include 
 #include 
-#include 
 
 /* For setsockopt(2) */
 #define SOL_SOCKET 0x
diff --git a/arch/sparc/include/uapi/asm/socket.h 
b/arch/sparc/include/uapi/asm/socket.h
index 88fe4f978aca..9265a9eece15 100644
--- a/arch/sparc/include/uapi/asm/socket.h
+++ b/arch/sparc/include/uapi/asm/socket.h
@@ -2,8 +2,8 @@
 #ifndef _ASM_SOCKET_H
 #define _ASM_SOCKET_H
 
+#include 
 #include 
-#include 
 
 /* For setsockopt(2) */
 #define SOL_SOCKET 0x
diff --git a/include/uapi/asm-generic/socket.h 
b/include/uapi/asm-generic/socket.h
index c8b430cb6dc4..8c1391c89171 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -2,8 +2,8 @@
 #ifndef __ASM_GENERIC_SOCKET_H
 #define __ASM_GENERIC_SOCKET_H
 
+#include 
 #include 
-#include 
 
 /* For setsockopt(2) */
 #define SOL_SOCKET 1
-- 
2.20.0



[PATCH] arch: alpha: Kconfig: pedantic formatting

2019-03-11 Thread Enrico Weigelt, metux IT consult
Formatting of Kconfig files doesn't look so pretty, so let the
Great White Handkerchief come around and clean it up.

Signed-off-by: Enrico Weigelt, metux IT consult 
---
 arch/alpha/Kconfig | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
index 584a6e1..ca70946 100644
--- a/arch/alpha/Kconfig
+++ b/arch/alpha/Kconfig
@@ -550,7 +550,7 @@ config NR_CPUS
default "4" if !ALPHA_GENERIC && !ALPHA_MARVEL
help
  MARVEL support can handle a maximum of 32 CPUs, all the others
-  with working support have a maximum of 4 CPUs.
+ with working support have a maximum of 4 CPUs.
 
 config ARCH_DISCONTIGMEM_ENABLE
bool "Discontiguous Memory Support"
@@ -662,7 +662,7 @@ choice
 endchoice
 
 config HZ
-   int 
+   int
default 32 if HZ_32
default 64 if HZ_64
default 128 if HZ_128
-- 
1.9.1



Re: [PATCH] x86: Deprecate a.out support

2019-03-11 Thread Arnd Bergmann
On Sun, Mar 10, 2019 at 11:46 PM Linus Torvalds
 wrote:
>
> On Sun, Mar 10, 2019 at 2:37 PM Matt Turner  wrote:
> >
> > I'm not aware of a reason to keep a.out support on alpha.
>
> Hmm. I was looking at removing a.out support entirely, but it's
> actually fairly incestuous on alpha.
>
> For example, arch/alpha/boot/tools/objstrip.c very much has some a.out
> support in it. Maybe it can just be removed entirely.
>
> There's also an a.out.h include in arch/alpha/kernel/binfmt_loader.c.
>
> Finally, note that CONFIG_OSF4_COMPAT also no longer makes sense
> without a.out support.
>
> So this attached patch does not compile on alpha, but it's been many
> many years since I had an alpha to test with, so I'm stuck.
>
> Matt, can you fill in the details and complete this patch?

I wonder if we could remove the osf time32 compat code as well,
this was one of the areas that kept causing problems with the y2038
rework. (I think it's all good now, but it's never been tested as far as I
can tell).

For some syscalls (e.g. brk, mmap, getxuid, ...) we definitely need to
keep the osf1 version, since it is the only supported ABI.

I just looked up some really old source trees and found that glibc-2.1
was the first release to use 64-bit time_t the way we do it today,
as implemented in [1], so all Debian and SuSE releases for alpha
had it, but any ELF binaries built on Red Hat Linux 4.x and 5.x
(released 1996 through 1998) or earlier would use 32-bit time_t
osf1 syscalls. Red Hat 2.x and 3.x were a.out based on alpha.

  Arnd

[1] https://repo.or.cz/glibc/history.git/commitdiff/64819b5c3a94e81e4


Re: [PATCH] x86: Deprecate a.out support

2019-03-11 Thread John Paul Adrian Glaubitz
Hi Linus!

On 3/11/19 7:40 AM, Linus Torvalds wrote:
> So this attached patch does not compile on alpha, but it's been many
> many years since I had an alpha to test with, so I'm stuck.

Michael Cree (CC'ed) has several Alpha servers running which are also used
for building Debian packages and testing kernels.

I also have four AlphaStations (233, 433au and XP-1000) on which I could
test any patch. But I'm currently in Japan until Friday next week, so
I don't have any means to access the machines.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaub...@debian.org
`. `'   Freie Universitaet Berlin - glaub...@physik.fu-berlin.de
  `-GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913


Re: [PATCH] x86: Deprecate a.out support

2019-03-10 Thread Al Viro
On Sun, Mar 10, 2019 at 03:40:20PM -0700, Linus Torvalds wrote:
>  SYSCALL_DEFINE3(osf_readv, unsigned long, fd,
>   const struct iovec __user *, vector, unsigned long, count)
>  {
> -#ifdef CONFIG_OSF4_COMPAT
> - if (unlikely(personality(current->personality) == PER_OSF4))
> - if (osf_fix_iov_len(vector, count))
> - return -EFAULT;
> -#endif
> -
>   return sys_readv(fd, vector, count);
>  }
>  
>  SYSCALL_DEFINE3(osf_writev, unsigned long, fd,
>   const struct iovec __user *, vector, unsigned long, count)
>  {
> -#ifdef CONFIG_OSF4_COMPAT
> - if (unlikely(personality(current->personality) == PER_OSF4))
> - if (osf_fix_iov_len(vector, count))
> - return -EFAULT;
> -#endif
>   return sys_writev(fd, vector, count);
>  }

Might as well kill those two off, while we are at it - just use 
sys_readv/sys_writev
in arch/alpha/kernel/syscalls/syscall.tbl and be done with that...


Re: [PATCH] x86: Deprecate a.out support

2019-03-10 Thread Linus Torvalds
On Sun, Mar 10, 2019 at 2:37 PM Matt Turner  wrote:
>
> I'm not aware of a reason to keep a.out support on alpha.

Hmm. I was looking at removing a.out support entirely, but it's
actually fairly incestuous on alpha.

For example, arch/alpha/boot/tools/objstrip.c very much has some a.out
support in it. Maybe it can just be removed entirely.

There's also an a.out.h include in arch/alpha/kernel/binfmt_loader.c.

Finally, note that CONFIG_OSF4_COMPAT also no longer makes sense
without a.out support.

So this attached patch does not compile on alpha, but it's been many
many years since I had an alpha to test with, so I'm stuck.

Matt, can you fill in the details and complete this patch?

  Linus
 arch/alpha/Kconfig  |   1 -
 arch/alpha/kernel/osf_sys.c |  30 
 arch/m68k/Kconfig   |   1 -
 arch/x86/Kconfig|   7 -
 arch/x86/ia32/Makefile  |   2 -
 arch/x86/ia32/ia32_aout.c   | 330 --
 fs/Kconfig.binfmt   |  33 -
 fs/Makefile |   1 -
 fs/binfmt_aout.c| 343 
 include/linux/a.out.h   |  18 ---
 10 files changed, 766 deletions(-)

diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
index 584a6e114853..9b9770b45f36 100644
--- a/arch/alpha/Kconfig
+++ b/arch/alpha/Kconfig
@@ -10,7 +10,6 @@ config ALPHA
 	select FORCE_PCI if !ALPHA_JENSEN
 	select PCI_DOMAINS if PCI
 	select PCI_SYSCALL if PCI
-	select HAVE_AOUT
 	select HAVE_IDE
 	select HAVE_OPROFILE
 	select HAVE_PCSPKR_PLATFORM
diff --git a/arch/alpha/kernel/osf_sys.c b/arch/alpha/kernel/osf_sys.c
index bf497b8b0ec6..09a0746c9681 100644
--- a/arch/alpha/kernel/osf_sys.c
+++ b/arch/alpha/kernel/osf_sys.c
@@ -1342,45 +1342,15 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
 	return addr;
 }
 
-#ifdef CONFIG_OSF4_COMPAT
-/* Clear top 32 bits of iov_len in the user's buffer for
-   compatibility with old versions of OSF/1 where iov_len
-   was defined as int. */
-static int
-osf_fix_iov_len(const struct iovec __user *iov, unsigned long count)
-{
-	unsigned long i;
-
-	for (i = 0 ; i < count ; i++) {
-		int __user *iov_len_high = (int __user *)[i].iov_len + 1;
-
-		if (put_user(0, iov_len_high))
-			return -EFAULT;
-	}
-	return 0;
-}
-#endif
-
 SYSCALL_DEFINE3(osf_readv, unsigned long, fd,
 		const struct iovec __user *, vector, unsigned long, count)
 {
-#ifdef CONFIG_OSF4_COMPAT
-	if (unlikely(personality(current->personality) == PER_OSF4))
-		if (osf_fix_iov_len(vector, count))
-			return -EFAULT;
-#endif
-
 	return sys_readv(fd, vector, count);
 }
 
 SYSCALL_DEFINE3(osf_writev, unsigned long, fd,
 		const struct iovec __user *, vector, unsigned long, count)
 {
-#ifdef CONFIG_OSF4_COMPAT
-	if (unlikely(personality(current->personality) == PER_OSF4))
-		if (osf_fix_iov_len(vector, count))
-			return -EFAULT;
-#endif
 	return sys_writev(fd, vector, count);
 }
 
diff --git a/arch/m68k/Kconfig b/arch/m68k/Kconfig
index b54206408f91..65d263c60669 100644
--- a/arch/m68k/Kconfig
+++ b/arch/m68k/Kconfig
@@ -8,7 +8,6 @@ config M68K
 	select ARCH_NO_COHERENT_DMA_MMAP if !MMU
 	select ARCH_NO_PREEMPT if !COLDFIRE
 	select HAVE_IDE
-	select HAVE_AOUT if MMU
 	select HAVE_DEBUG_BUGVERBOSE
 	select GENERIC_IRQ_SHOW
 	select GENERIC_ATOMIC64
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c1f9b3cf437c..4a9438e4fba6 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2836,13 +2836,6 @@ config IA32_EMULATION
 	  64-bit kernel. You should likely turn this on, unless you're
 	  100% sure that you don't have any 32-bit programs left.
 
-config IA32_AOUT
-	tristate "IA32 a.out support"
-	depends on IA32_EMULATION
-	depends on BROKEN
-	---help---
-	  Support old a.out binaries in the 32bit emulation.
-
 config X86_X32
 	bool "x32 ABI for 64-bit mode"
 	depends on X86_64
diff --git a/arch/x86/ia32/Makefile b/arch/x86/ia32/Makefile
index cd4339bae066..b98fedaa7642 100644
--- a/arch/x86/ia32/Makefile
+++ b/arch/x86/ia32/Makefile
@@ -4,7 +4,5 @@
 
 obj-$(CONFIG_IA32_EMULATION) := sys_ia32.o ia32_signal.o
 
-obj-$(CONFIG_IA32_AOUT) += ia32_aout.o
-
 audit-class-$(CONFIG_AUDIT) := audit.o
 obj-$(CONFIG_IA32_EMULATION) += $(audit-class-y)
diff --git a/arch/x86/ia32/ia32_aout.c b/arch/x86/ia32/ia32_aout.c
deleted file mode 100644
index 3c135084e1eb..
--- a/arch/x86/ia32/ia32_aout.c
+++ /dev/null
@@ -1,330 +0,0 @@
-/*
- *  a.out loader for x86-64
- *
- *  Copyright (C) 1991, 1992, 1996  Linus Torvalds
- *  Hacked together by Andi Kleen
- */
-
-#include 
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#undef WARN_OLD
-
-static int load_aout_binary(struct linux_binprm *);
-static int load_aout_library(struct file *);
-
-static struct linux_binfmt 

Re: [PATCH] x86: Deprecate a.out support

2019-03-10 Thread Matt Turner
On Tue, Mar 5, 2019 at 11:04 AM Borislav Petkov  wrote:
>
> On Tue, Mar 05, 2019 at 07:11:38PM +0100, Borislav Petkov wrote:
> > I guess you could Cc arch maintainers with the a.out-core.h removal
> > patch to see if anyone screams.
>
> And they're like two for which we need confirmation:
>
> $ git ls-files | grep a.out-core.h
> arch/alpha/include/asm/a.out-core.h
> arch/m68k/include/asm/a.out-core.h
> arch/um/include/asm/a.out-core.h
> arch/x86/include/asm/a.out-core.h
>
> um and x86 are clear.
>
> Adding alpha and m68k MLs to Cc.

I'm not aware of a reason to keep a.out support on alpha.


Re: [PATCH] x86: Deprecate a.out support

2019-03-06 Thread Geert Uytterhoeven
Hi Borislav,

On Tue, Mar 5, 2019 at 8:04 PM Borislav Petkov  wrote:
> On Tue, Mar 05, 2019 at 07:11:38PM +0100, Borislav Petkov wrote:
> > I guess you could Cc arch maintainers with the a.out-core.h removal
> > patch to see if anyone screams.
>
> And they're like two for which we need confirmation:
>
> $ git ls-files | grep a.out-core.h
> arch/alpha/include/asm/a.out-core.h
> arch/m68k/include/asm/a.out-core.h
> arch/um/include/asm/a.out-core.h
> arch/x86/include/asm/a.out-core.h
>
> um and x86 are clear.
>
> Adding alpha and m68k MLs to Cc.

Thanks!

The oldest binaries I still have lying around (an ext2 ramdisk image,
still used from time to time) are ELF, from just after the a.out to ELF
transition on m68k.

I think it's safe to assume no one still runs a.out binaries on m68k.

Gr{oetje,eeting}s,

Geert


--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [PATCH] x86: Deprecate a.out support

2019-03-05 Thread Borislav Petkov
On Tue, Mar 05, 2019 at 07:11:38PM +0100, Borislav Petkov wrote:
> I guess you could Cc arch maintainers with the a.out-core.h removal
> patch to see if anyone screams.

And they're like two for which we need confirmation:

$ git ls-files | grep a.out-core.h
arch/alpha/include/asm/a.out-core.h
arch/m68k/include/asm/a.out-core.h
arch/um/include/asm/a.out-core.h
arch/x86/include/asm/a.out-core.h

um and x86 are clear.

Adding alpha and m68k MLs to Cc.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH] add delay between port write and port read

2019-03-01 Thread Maciej W. Rozycki
On Fri, 1 Mar 2019, Linus Torvalds wrote:

> > Without that, using __raw_xyz() to copy between RAM and
> > buffers in PCI memory space is broken, as you said, but the
> > assumption would be broken on certain older machines that
> > do a hardware endian swap by swizzling the address lines rather
> > than swapping bytes on the data bus.
> 
> Honestly, there's no way we can ever fix that.

 Well, (: the good news is we actually have it covered in the MIPS port 
already, with the interfaces I described in the other e-mail.  We have had 
it for some 15 years now, ever since we got the ATA PIO stuff covered.

 So all that we need is to agree upon names so that everyone does not come 
up with their own ones, write them down somewhere, and let individual port 
maintainers sort it out as they deem necessary.  This way we have a 
solution available, but don't have to do anything.  Win-win.

> Because no new CPU's will ever be designed where the byte order isn't
> little-endian, and nobody will ever make those broken "we'll do random
> things in hardware to worfk around the fact that we're doing crazy
> things" machines.

 Even better then: the trasformation will be trivial, i.e. pass-through, 
so all the new CPU stuff will merely alias all the accessor variants to 
each other.  Which means we can put all that as a bunch of #defines in 
include/asm-generic/io.h and forget about it.

  Maciej


Re: [PATCH] add delay between port write and port read

2019-03-01 Thread Linus Torvalds
On Fri, Mar 1, 2019 at 1:52 PM Arnd Bergmann  wrote:
>
> Without that, using __raw_xyz() to copy between RAM and
> buffers in PCI memory space is broken, as you said, but the
> assumption would be broken on certain older machines that
> do a hardware endian swap by swizzling the address lines rather
> than swapping bytes on the data bus.

Honestly, there's no way we can ever fix that.

The fact is, __raw_readl/writel simply isn't portable. If a driver
uses them, the driver HAS TO KNOW what the situation is on the
hardware it runs on.

Because the "byte order" of the end result will basically be random on
some machines.

And you know what? We absolutely do not care. Not one whit.

Because no new CPU's will ever be designed where the byte order isn't
little-endian, and nobody will ever make those broken "we'll do random
things in hardware to worfk around the fact that we're doing crazy
things" machines.

So it's all legacy anyway, and it's absolutely not worth
over-designing or over-thinking this issue.

If you have a random SoC that does insane things, that random SoC may

 (a) need its own drivers that know exactly how it works

 (b) not be able to use some drivers for IP that the SoC has, but the
SoC designers screwed up and did some random crazy byte lane swizzling

and that's all perfectly fine.  We don't care. That SoC gets to do its
own drivers.

And 99% of truly generic drivers won't have any of these issues,
because they don't use __raw_readl/writel to begin with.

So don't make this any stranger than it is. Just accept that by
design, and by definition, __raw_readl/writel is a bit non-portable,
and won't work for everything.

The corollary of that is that __raw_readl/writel() may also then be
used _for_ that mis-designed piece of garbage SoC to implement the
special driver needed for just that SoC.

But as mentioned, this simply isn't a problem in practice. Sure,
people can do insane things if they really want to. Maybe some "stable
genius" decides to do a big-endian RISC-V core. We don't care, and we
don't make everybody else have to worry about that kind of insanity.

   Linus


Re: [PATCH] add delay between port write and port read

2019-03-01 Thread Maciej W. Rozycki
On Fri, 1 Mar 2019, Arnd Bergmann wrote:

> > I think the whole point of __raw_xyz() is that it's the lowest level
> > model. It gives you relaxed ordering (together with the ioremap
> > model), and it gives you straight-through behavior.
> >
> > And yes, any driver using them needs to be aware of the byte ordering,
> > which may or may not be the same as regular memory, and may or may not
> > be the same as other devices.
> >
> > So __raw_xyz() is very much for low-level drivers that know what they
> > are doing. Caveat user.
> >
> > "If it breaks, you get to keep both pieces"
> 
> I agree in principle, but I think we already have a lot of precedence
> for __raw_xyz() being relied on having a specific behavior in
> architecture independent drivers, and I think it makes sense for
> architectures to provide that.
> 
> Specifically, I think we need __raw_xyz() to do the same as xyz()
> on all little-endian kernels regarding byte ordering (not barriers), and
> I would expect it to provide the same ordering and addressing
> as swabX(xyz()) on big-endian kernels.
> 
> Without that, using __raw_xyz() to copy between RAM and
> buffers in PCI memory space is broken, as you said, but the
> assumption would be broken on certain older machines that
> do a hardware endian swap by swizzling the address lines rather
> than swapping bytes on the data bus.

 Ah, swizzling!  Thanks for reminding me of that.  The swizzling machines 
will be those that do bit (rather than byte) lane matching in hardware and 
consequently produce a numeric value of data that is consistent between 
the CPU and a device when accessed in bus-width quantities.  These do 
indeed require address adjustment if you access a narrower quantity.

> The best idea I have for working around this is to never rely
> on __raw_xyz() to not do byte swapping in platform specific
> drivers with CPU-endian MMIO space, but to have a platform
> specific set of wrappers around the normal I/O functions, and
> make __raw_xyz() just do whatever we expect them to do on
> PCI devices.

 For the record in the MIPS port we have the following accessors:

1. `readX', `writeX', `inX', `outX': data passed in the CPU endianness, 
   ordered WRT MMIO and (for reads) WRT memory; suitable for device CSR 
   accesses,

2. `__mem_readX', `__mem_writeX', `__mem_inX', `__mem_outX': data passed 
   in the memory endianness, ordered WRT MMIO and (for reads) WRT memory, 
   suitable for data transfers between memory and a device, e.g. PIO ATA,

3. `__relaxed_readX', `__relaxed_writeX': data passed in the CPU 
   endianness, ordered WRT MMIO only; aliased to `*_relaxed',

4. `__raw_readX', `__raw_writeX': data passed straight-through with no 
   transformation, ordered WRT MMIO and (for reads) WRT memory.

For correct operation of generic drivers we need at least #1 and #2, and 
we need #4 for platform drivers (which will know they're on a local bus); 
#3 is an optimisation only that is nice having, but drivers would do with 
#1 instead.  Note however that depending on the wiring of a particular 
big-endian machine we may have to swizzle addresses for #1, #2, #3, and 
then byte-swap either #1 and #3, or #2.

 See arch/mips/include/asm/*/mangle-port.h for all the mess.

 It actually took quite a while back in early 2000s to get PIO ATA to work 
correctly on big-endian MIPS machines, because people did not understand 
the subtlety between the CPU and the memory endianness, and consequently 
that different accessors are required for the 16-bit ATA data register; 
you just can't get 16-bit ATA data transfers and 16-bit CSR accesses made 
elsewhere right with the use of a single accessor only, we need two.

 I haven't looked at what other ports did in this area, but we do need at 
least #1 and #2, strongly-ordered, kernel-wide, and then we can discuss 
what the semantics of `__raw_xyz' is supposed to be, but at least some 
ports will have to retain the semantics of #4, strongly-ordered, under 
whatever name.  I think using a uniform one would be advantageous though, 
for obvious reasons.

 NB the old IDE driver had special local provisions for PIO, which wired 
#2 accessors on MIPS by including a platform specific header that provided 
IDE-specific names and I suppose libata carried that over or it wouldn't 
work.  But I think that instead of having such driver-specific hacks that 
require port maintainers' attention on a case-by-case basis we really 
ought to provide #2 semantics kernel-wide.

  Maciej


Re: [PATCH] add delay between port write and port read

2019-03-01 Thread Arnd Bergmann
On Fri, Mar 1, 2019 at 8:19 PM Linus Torvalds
 wrote:
>
> On Fri, Mar 1, 2019 at 11:13 AM Maciej W. Rozycki  
> wrote:
> >
> >  What do we do WRT straight-through vs byte-swapping properties of these
> > accessors?
>
> I think the whole point of __raw_xyz() is that it's the lowest level
> model. It gives you relaxed ordering (together with the ioremap
> model), and it gives you straight-through behavior.
>
> And yes, any driver using them needs to be aware of the byte ordering,
> which may or may not be the same as regular memory, and may or may not
> be the same as other devices.
>
> So __raw_xyz() is very much for low-level drivers that know what they
> are doing. Caveat user.
>
> "If it breaks, you get to keep both pieces"

I agree in principle, but I think we already have a lot of precedence
for __raw_xyz() being relied on having a specific behavior in
architecture independent drivers, and I think it makes sense for
architectures to provide that.

Specifically, I think we need __raw_xyz() to do the same as xyz()
on all little-endian kernels regarding byte ordering (not barriers), and
I would expect it to provide the same ordering and addressing
as swabX(xyz()) on big-endian kernels.

Without that, using __raw_xyz() to copy between RAM and
buffers in PCI memory space is broken, as you said, but the
assumption would be broken on certain older machines that
do a hardware endian swap by swizzling the address lines rather
than swapping bytes on the data bus.

The best idea I have for working around this is to never rely
on __raw_xyz() to not do byte swapping in platform specific
drivers with CPU-endian MMIO space, but to have a platform
specific set of wrappers around the normal I/O functions, and
make __raw_xyz() just do whatever we expect them to do on
PCI devices.

   Arnd


Re: [PATCH] add delay between port write and port read

2019-03-01 Thread Maciej W. Rozycki
On Fri, 1 Mar 2019, Linus Torvalds wrote:

> Because quite often you don't want any extra byte ordering because
> you've moving things around anyway (ie you're copying from the device
> to memory or similar, and switching to little-endian in between would
> just mean that you have to switch back for the memory write anyway).

 Good point!

  Maciej


Re: [PATCH] add delay between port write and port read

2019-03-01 Thread Linus Torvalds
On Fri, Mar 1, 2019 at 11:13 AM Maciej W. Rozycki  wrote:
>
>  What do we do WRT straight-through vs byte-swapping properties of these
> accessors?

I think the whole point of __raw_xyz() is that it's the lowest level
model. It gives you relaxed ordering (together with the ioremap
model), and it gives you straight-through behavior.

And yes, any driver using them needs to be aware of the byte ordering,
which may or may not be the same as regular memory, and may or may not
be the same as other devices.

So __raw_xyz() is very much for low-level drivers that know what they
are doing. Caveat user.

"If it breaks, you get to keep both pieces"

 Linus


Re: [PATCH] add delay between port write and port read

2019-03-01 Thread Maciej W. Rozycki
On Fri, 1 Mar 2019, Linus Torvalds wrote:

> So that would seem what an architecture implementation should _aim_
> for: having various "ioremap_xyz()" for setting the
> PCIe/system/whatever controller level ordering, and then using the
> "__raw_xyz()" accessors for unordered CPU accesses.

 What do we do WRT straight-through vs byte-swapping properties of these 
accessors?

 For the record: we do need to have straight-through accessors for 
endianness-agnostic peripherals.  This is the case for example with SOC 
devices that have plenty of I/O onchip that are always accessed natively 
regardless of the endianness of the external bus.

 E.g. with the Broadcom SiByte BCM1250 SOC we have a whole lot of devices 
onchip (including a pair of MIPS64 CPU cores), and then PCI and HT 
endpoints, and the SOC's endianness is configurable at reset.  The 
endianness selected only applies to the external bus, affecting external 
data lanes: in the big-endian case there's a choice between matching byte 
lanes or bit lanes by using one of the two MMIO physical address spaces 
that apply either of these policies; in the little-endian case access is 
pass-through of course.  Consequently we have to use straight-through 
accessors for onchip devices and byte-swapped ones for PCI/HT devices.

  Maciej


Re: [PATCH] add delay between port write and port read

2019-03-01 Thread Linus Torvalds
On Fri, Mar 1, 2019 at 10:03 AM Maciej W. Rozycki  wrote:
>
>  Well, `__raw_*' accessors are never byte-swapped, not at least with the
> MIPS port, making them a tad cumbersome for a driver that has no interest
> in paying attention to any endianness mismatch between the CPU bus and the
> device's peripheral bus.

Well, the people who want ultimate performance and not worry about
access ordering almost always _also_ want to handle byte ordering
manually.

Because quite often you don't want any extra byte ordering because
you've moving things around anyway (ie you're copying from the device
to memory or similar, and switching to little-endian in between would
just mean that you have to switch back for the memory write anyway).

Does it make things more complicated for a driver when it has to care
about byte ordering and not just say "PCI is always little-endian"?
Yes. But if you want simple, you shouldn't be doing the unordered
accesses.

   Linus


Re: [PATCH] add delay between port write and port read

2019-03-01 Thread Maciej W. Rozycki
Hi Will,

> >  FAOD, I think this assumption/requirement only applies to the plain 
> > accessors (`inX', `readX', `ioreadX', etc.).
> 
> It's also a requirement for the *_relaxed accessors, and there are drivers
> that rely on this being the case.

 Well, from the reading of memory-barriers.txt, be it as it stands, or 
with your rewording applied, I take it the `*_relaxed' accessors do not 
guarantee ordering WRT locking or DMA and hence a trailing barrier in 
`readX_relaxed' is not necessary, and so we don't need a trailing `mb' 
with the Alpha port either (but we do need a leading `mb' there, as well 
as with `writeX_relaxed').

> >  For performance reasons we may decide sometime to opt in for accessors 
> > that do not suffer from the requirement to be strongly ordered WRT each 
> > other, for the benefit to architectures that are not strongly ordered with 
> > MMIO and that suffer a lot from serialising accesses that do not really 
> > care, e.g. where you need to load a bunch of device registers or maybe 
> > even device RAM in any order before making a serialised final request to 
> > accept the values loaded.
> 
> I'd expect accesses to device RAM to use something like ioremap_wc() if
> possible. In that case, the ordering of accesses is weakened by the
> underlying memory type in the page tables, but we're not yet at the point
> where we've figured out the portable semantics in this case. I plan to
> look at that once we've nailed normal ioremap()!

 Some CPU hardware and certainly many if not most MIPS implementations do 
not give such a finegrained control over mapping.  I think all the MIPS 
CPUs I dealt with only gave the choice between a cached (coherent or not) 
and an uncached mapping, with the ordering being weak in all cases.  So 
the only control over ordering was with explicit barrier instructions.

> >  That piece of hardware is however rather peculiar and not an example of 
> > the most common design seen nowadays and I am not sure if the extra 
> > maintenance burden across all the ports for any additional accessors would 
> > be outweighed by the benefit for the weakly ordered MMIO architectures 
> > (where an execution stall can indeed count in hundreds of clock cycles per 
> > barrier inserted) combined with the appreciation (i.e. actual use) level 
> > from driver writers who do not necessarily grok all that weak ordering 
> > business.
> 
> If you use ioremap_wc() for device RAM and __raw_readX() for bunching up
> register accesses, then I don't think we need to add anything new, do we?

 Well, `__raw_*' accessors are never byte-swapped, not at least with the 
MIPS port, making them a tad cumbersome for a driver that has no interest 
in paying attention to any endianness mismatch between the CPU bus and the 
device's peripheral bus.

 Granted this does not matter for this driver as it only has a chance to 
be used with DECstation (MIPS), DEC 3000 AXP (Alpha) and VAXstation 4000 
(VAX) hardware, all of which are little-endian throughout (and not all of 
which we have support for upstream).  Still I'd consider it a hack as 
obviously we do have drivers for options living on say PCI, which is 
little-endian, that we want to use with processors configured for the big 
endianness with their FSB.

  Maciej


Re: [PATCH] add delay between port write and port read

2019-03-01 Thread Linus Torvalds
On Fri, Mar 1, 2019 at 9:33 AM Will Deacon  wrote:
>
> I'd expect accesses to device RAM to use something like ioremap_wc() if
> possible. In that case, the ordering of accesses is weakened by the
> underlying memory type in the page tables, but we're not yet at the point
> where we've figured out the portable semantics in this case. I plan to
> look at that once we've nailed normal ioremap()!

The case that matters most from a performance standpoint is
traditionally stupid framebuffer accesses, and the fb layer has
basically standardized this:

  #define fb_writel __raw_writel

together with various fb drivers then using "wmb()" etc for ordering
for the non-framebuffer effects.

So that would seem what an architecture implementation should _aim_
for: having various "ioremap_xyz()" for setting the
PCIe/system/whatever controller level ordering, and then using the
"__raw_xyz()" accessors for unordered CPU accesses.

Linus


Re: [PATCH] add delay between port write and port read

2019-03-01 Thread Will Deacon
Hi Maciej,

On Wed, Feb 27, 2019 at 06:49:57PM +, Maciej W. Rozycki wrote:
> On Tue, 26 Feb 2019, Will Deacon wrote:
> 
> > > If they are the same device (just different data ports), I'd
> > > *definitely* expect them to be ordered.
> > > 
> > > We have tons of code that depends on that. Almost every driver out
> > > there, in fact.
> > > 
> > > So we need the mb() on alpha to guarantee the access ordering on the
> > > CPU side, and then PCI itself ends up guaranteeing that accesses to
> > > the same device will remain ordered outside the CPU.
> > > 
> > > Agreed?
> > 
> > Yup, agreed. I'd consider all those ports to be the same endpoint, so we're
> > good.
> 
>  FAOD, I think this assumption/requirement only applies to the plain 
> accessors (`inX', `readX', `ioreadX', etc.).

It's also a requirement for the *_relaxed accessors, and there are drivers
that rely on this being the case.

>  For performance reasons we may decide sometime to opt in for accessors 
> that do not suffer from the requirement to be strongly ordered WRT each 
> other, for the benefit to architectures that are not strongly ordered with 
> MMIO and that suffer a lot from serialising accesses that do not really 
> care, e.g. where you need to load a bunch of device registers or maybe 
> even device RAM in any order before making a serialised final request to 
> accept the values loaded.

I'd expect accesses to device RAM to use something like ioremap_wc() if
possible. In that case, the ordering of accesses is weakened by the
underlying memory type in the page tables, but we're not yet at the point
where we've figured out the portable semantics in this case. I plan to
look at that once we've nailed normal ioremap()!

>  I made provisions for that with a driver I recently added with commit 
> 61414f5ec983 ("FDDI: defza: Add support for DEC FDDIcontroller 700 
> TURBOchannel adapter"), where locally defined accessor macros suffixed 
> with `_o' and `_u' denote accesses that have to be strongly ordered and 
> can be weakly ordered respectively WRT each other.
> 
>  Right now they all expand to the respective `_relaxed' accessors (with a 
> lone `dma_rmb' inserted appropriately; yes, the device does DMA one way 
> only, and the other one is PIO with a lot of MMIO traffic to board RAM 
> that would benefit from omitting barriers), however they can be replaced 
> with references to truly unordered accessors if we ever have them.
> 
>  That piece of hardware is however rather peculiar and not an example of 
> the most common design seen nowadays and I am not sure if the extra 
> maintenance burden across all the ports for any additional accessors would 
> be outweighed by the benefit for the weakly ordered MMIO architectures 
> (where an execution stall can indeed count in hundreds of clock cycles per 
> barrier inserted) combined with the appreciation (i.e. actual use) level 
> from driver writers who do not necessarily grok all that weak ordering 
> business.

If you use ioremap_wc() for device RAM and __raw_readX() for bunching up
register accesses, then I don't think we need to add anything new, do we?

Will


Re: [PATCH] add delay between port write and port read

2019-02-27 Thread Linus Torvalds
On Wed, Feb 27, 2019 at 10:54 AM Maciej W. Rozycki  wrote:
>
>  For that reason though we don't have the trailing barrier in the
> `readX_relaxed' accessors in the MIPS port.

Makes sense. And alpha should probably follow suit.

 Linus


Re: [PATCH] add delay between port write and port read

2019-02-27 Thread Maciej W. Rozycki
On Wed, 27 Feb 2019, Linus Torvalds wrote:

> > Should "writeb_relaxed" on Alpha also use the barrier?
> 
> I think they should.  Only the double-underscore (__raw_xyz()) ones
> are entirely unordered, the relaxed ones are just unordered wrt
> regular memory and DMA.

 For that reason though we don't have the trailing barrier in the 
`readX_relaxed' accessors in the MIPS port.

  Maciej


Re: [PATCH] add delay between port write and port read

2019-02-27 Thread Maciej W. Rozycki
On Tue, 26 Feb 2019, Will Deacon wrote:

> > If they are the same device (just different data ports), I'd
> > *definitely* expect them to be ordered.
> > 
> > We have tons of code that depends on that. Almost every driver out
> > there, in fact.
> > 
> > So we need the mb() on alpha to guarantee the access ordering on the
> > CPU side, and then PCI itself ends up guaranteeing that accesses to
> > the same device will remain ordered outside the CPU.
> > 
> > Agreed?
> 
> Yup, agreed. I'd consider all those ports to be the same endpoint, so we're
> good.

 FAOD, I think this assumption/requirement only applies to the plain 
accessors (`inX', `readX', `ioreadX', etc.).

 For performance reasons we may decide sometime to opt in for accessors 
that do not suffer from the requirement to be strongly ordered WRT each 
other, for the benefit to architectures that are not strongly ordered with 
MMIO and that suffer a lot from serialising accesses that do not really 
care, e.g. where you need to load a bunch of device registers or maybe 
even device RAM in any order before making a serialised final request to 
accept the values loaded.

 I made provisions for that with a driver I recently added with commit 
61414f5ec983 ("FDDI: defza: Add support for DEC FDDIcontroller 700 
TURBOchannel adapter"), where locally defined accessor macros suffixed 
with `_o' and `_u' denote accesses that have to be strongly ordered and 
can be weakly ordered respectively WRT each other.

 Right now they all expand to the respective `_relaxed' accessors (with a 
lone `dma_rmb' inserted appropriately; yes, the device does DMA one way 
only, and the other one is PIO with a lot of MMIO traffic to board RAM 
that would benefit from omitting barriers), however they can be replaced 
with references to truly unordered accessors if we ever have them.

 That piece of hardware is however rather peculiar and not an example of 
the most common design seen nowadays and I am not sure if the extra 
maintenance burden across all the ports for any additional accessors would 
be outweighed by the benefit for the weakly ordered MMIO architectures 
(where an execution stall can indeed count in hundreds of clock cycles per 
barrier inserted) combined with the appreciation (i.e. actual use) level 
from driver writers who do not necessarily grok all that weak ordering 
business.

  Maciej


Re: [PATCH] add delay between port write and port read

2019-02-27 Thread Maciej W. Rozycki
On Wed, 27 Feb 2019, Linus Torvalds wrote:

> > I suppose you might need the mb() before *and* after the I/O access in the
> > read case. The idea with readX()/ioreadX() is that you should be able to
> > do something like:
> 
> Yeah, that sounds reasonable.
> 
> You might relax the barrier after the readX() to just a rmb(), which
> might make the performance impact slightly less noticeable and might
> be sufficient in practice. But I guess once you do IO, it's not like
> the CPU barrier will be the limiting case.

 FWIW in the MIPS port we've had it as `rmb' since Sinan's commit 
a1cc7034e33d ("MIPS: io: Add barrier after register read in readX()"), but 
then we expand this macro to a hardware SYNC instruction anyway (rather 
than SYNC_RMB or at most SYNC_MB), which is stronger these days even as it 
has been at one point redefined in the architecture as a completion rather 
than an ordering barrier.

 NB MIPS also has SYNC_ACQUIRE and SYNC_RELEASE too, which are assymetric 
load <= load/store and load/store <= store ordering barriers respectively, 
obviously meant for locking.

> So maybe just a full mb() and see if anybody notices. Better to have
> working code than random failures.

 For Alpha it doesn't matter anyway as it doesn't have a separate read 
barrier.  Although I'd prefer to have the expected semantics recorded even 
if the underlying implementation is the same, as otherwise it gets even 
more confusing to people than it already is.

  Maciej


Re: [PATCH] add delay between port write and port read

2019-02-27 Thread Maciej W. Rozycki
On Wed, 27 Feb 2019, Sinan Kaya wrote:

> What we missed is the fact that alpha reorders accesses across two
> register accesses. This is guaranteed in other architectures.

 Not for MIPS either.

  Maciej


Re: [PATCH] add delay between port write and port read

2019-02-27 Thread Linus Torvalds
On Wed, Feb 27, 2019 at 9:26 AM Will Deacon  wrote:
>
> I suppose you might need the mb() before *and* after the I/O access in the
> read case. The idea with readX()/ioreadX() is that you should be able to
> do something like:

Yeah, that sounds reasonable.

You might relax the barrier after the readX() to just a rmb(), which
might make the performance impact slightly less noticeable and might
be sufficient in practice. But I guess once you do IO, it's not like
the CPU barrier will be the limiting case.

That said, excessively weakly ordered CPU's are horrible for a reason,
and the barrier model was broken and made them even more so.  And
alpha is a posted boy for "don't let hw people design a memory odering
just to make it easy for them".

So maybe just a full mb() and see if anybody notices. Better to have
working code than random failures.

  Linus


Re: [PATCH] add delay between port write and port read

2019-02-27 Thread Linus Torvalds
On Wed, Feb 27, 2019 at 9:05 AM Mikulas Patocka  wrote:
>
> Should "writeb_relaxed" on Alpha also use the barrier?

I think they should.  Only the double-underscore (__raw_xyz()) ones
are entirely unordered, the relaxed ones are just unordered wrt
regular memory and DMA.

That said, I doubt anything much cares, and I'm not sure alpha has
ever done that.

 Linus


Re: [PATCH] add delay between port write and port read

2019-02-27 Thread Will Deacon
On Wed, Feb 27, 2019 at 12:12:51PM -0500, Mikulas Patocka wrote:
> 
> 
> On Tue, 26 Feb 2019, Linus Torvalds wrote:
> 
> > Does anybody see any worries with the "just move the mb() earlier in
> > the ioread functions" model?
> > 
> > Linus
> 
> It used to be like that and it worked.
> 
> Then, commits cd0e00c106722eca40b38ebf11cf134c01901086 and 
> 92d7223a74235054f2aa7227d207d9c57f84dca0 came.
> 
> These commits claim that they changed the code to be consistent with the 
> specification (now we have barrier after ioread and before iowrite). But 
> they broke serial port and RTC on my Alpha machine.

I suppose you might need the mb() before *and* after the I/O access in the
read case. The idea with readX()/ioreadX() is that you should be able to
do something like:

status = ioread8(STATUS_REG);
if (status & RX_DMA_COMPLETE)
memcpy(mem_buf, dma_buf, size);

and rely on the DMA'd data being read out of the buffer.

Will


Re: [PATCH] add delay between port write and port read

2019-02-27 Thread Sinan Kaya

On 2/27/2019 12:12 PM, Mikulas Patocka wrote:

It used to be like that and it worked.

Then, commits cd0e00c106722eca40b38ebf11cf134c01901086 and
92d7223a74235054f2aa7227d207d9c57f84dca0 came.

These commits claim that they changed the code to be consistent with the
specification (now we have barrier after ioread and before iowrite). But
they broke serial port and RTC on my Alpha machine.


A barrier before write is needed to ensure that memory operations done
are visible to the hardware before you send a write command.

Barrier after read is needed to ensure that if you read a memory
location after register read, memory contents are coherent.

What we missed is the fact that alpha reorders accesses across two
register accesses. This is guaranteed in other architectures.

To satisfy this, you need something like

barrier
register write
barrier
register read
barrier



Re: [PATCH] add delay between port write and port read

2019-02-27 Thread Mikulas Patocka



On Tue, 26 Feb 2019, Linus Torvalds wrote:

> Does anybody see any worries with the "just move the mb() earlier in
> the ioread functions" model?
> 
> Linus

It used to be like that and it worked.

Then, commits cd0e00c106722eca40b38ebf11cf134c01901086 and 
92d7223a74235054f2aa7227d207d9c57f84dca0 came.

These commits claim that they changed the code to be consistent with the 
specification (now we have barrier after ioread and before iowrite). But 
they broke serial port and RTC on my Alpha machine.

Mikulas


[PATCH v3 01/34] alpha: mm: Add p?d_large() definitions

2019-02-27 Thread Steven Price
walk_page_range() is going to be allowed to walk page tables other than
those of user space. For this it needs to know when it has reached a
'leaf' entry in the page tables. This information will be provided by the
p?d_large() functions/macros.

For alpha, we don't support huge pages, so add stubs returning 0.

CC: linux-alpha@vger.kernel.org
CC: Richard Henderson 
CC: Ivan Kokshaysky 
CC: Matt Turner 
Signed-off-by: Steven Price 
---
 arch/alpha/include/asm/pgtable.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/alpha/include/asm/pgtable.h b/arch/alpha/include/asm/pgtable.h
index 89c2032f9960..e5726d3a3200 100644
--- a/arch/alpha/include/asm/pgtable.h
+++ b/arch/alpha/include/asm/pgtable.h
@@ -254,11 +254,13 @@ extern inline void pte_clear(struct mm_struct *mm, 
unsigned long addr, pte_t *pt
 extern inline int pmd_none(pmd_t pmd)  { return !pmd_val(pmd); }
 extern inline int pmd_bad(pmd_t pmd)   { return (pmd_val(pmd) & 
~_PFN_MASK) != _PAGE_TABLE; }
 extern inline int pmd_present(pmd_t pmd)   { return pmd_val(pmd) & 
_PAGE_VALID; }
+extern inline int pmd_large(pmd_t pmd) { return 0; }
 extern inline void pmd_clear(pmd_t * pmdp) { pmd_val(*pmdp) = 0; }
 
 extern inline int pgd_none(pgd_t pgd)  { return !pgd_val(pgd); }
 extern inline int pgd_bad(pgd_t pgd)   { return (pgd_val(pgd) & 
~_PFN_MASK) != _PAGE_TABLE; }
 extern inline int pgd_present(pgd_t pgd)   { return pgd_val(pgd) & 
_PAGE_VALID; }
+extern inline int pgd_large(pgd_t pgd) { return 0; }
 extern inline void pgd_clear(pgd_t * pgdp) { pgd_val(*pgdp) = 0; }
 
 /*
-- 
2.20.1



Re: [PATCH] add delay between port write and port read

2019-02-27 Thread Mikulas Patocka



On Tue, 26 Feb 2019, Linus Torvalds wrote:

> On Tue, Feb 26, 2019 at 10:38 AM Will Deacon  wrote:
> >
> > That makes sense to me for this Alpha-specific case, but in general I
> > don't think we require that I/O accesses to different endpoints are
> > ordered with respect to each other, which was implied by one of Maciej's
> > examples. e.g.
> >
> > writeb(0x42, _mmio_reg);
> > readb(_mmio_reg);
> 
> If they are the same device (just different data ports), I'd
> *definitely* expect them to be ordered.
> 
> We have tons of code that depends on that. Almost every driver out
> there, in fact.
> 
> So we need the mb() on alpha to guarantee the access ordering on the
> CPU side, and then PCI itself ends up guaranteeing that accesses to
> the same device will remain ordered outside the CPU.
> 
> Agreed?
> 
> Linus

Should "writeb_relaxed" on Alpha also use the barrier?

The understanding is that readX_relaxed and writeX_relaxed are ordered 
with other accesses to the same PCI device. If Alpha doesn't gaurantee 
that, they should use barriers too.

Mikulas


Re: [PATCH] add delay between port write and port read

2019-02-26 Thread Will Deacon
On Tue, Feb 26, 2019 at 10:43:24AM -0800, Linus Torvalds wrote:
> On Tue, Feb 26, 2019 at 10:38 AM Will Deacon  wrote:
> >
> > That makes sense to me for this Alpha-specific case, but in general I
> > don't think we require that I/O accesses to different endpoints are
> > ordered with respect to each other, which was implied by one of Maciej's
> > examples. e.g.
> >
> > writeb(0x42, _mmio_reg);
> > readb(_mmio_reg);
> 
> If they are the same device (just different data ports), I'd
> *definitely* expect them to be ordered.
> 
> We have tons of code that depends on that. Almost every driver out
> there, in fact.
> 
> So we need the mb() on alpha to guarantee the access ordering on the
> CPU side, and then PCI itself ends up guaranteeing that accesses to
> the same device will remain ordered outside the CPU.
> 
> Agreed?

Yup, agreed. I'd consider all those ports to be the same endpoint, so we're
good.

Will


Re: [PATCH] add delay between port write and port read

2019-02-26 Thread Linus Torvalds
On Tue, Feb 26, 2019 at 10:38 AM Will Deacon  wrote:
>
> That makes sense to me for this Alpha-specific case, but in general I
> don't think we require that I/O accesses to different endpoints are
> ordered with respect to each other, which was implied by one of Maciej's
> examples. e.g.
>
> writeb(0x42, _mmio_reg);
> readb(_mmio_reg);

If they are the same device (just different data ports), I'd
*definitely* expect them to be ordered.

We have tons of code that depends on that. Almost every driver out
there, in fact.

So we need the mb() on alpha to guarantee the access ordering on the
CPU side, and then PCI itself ends up guaranteeing that accesses to
the same device will remain ordered outside the CPU.

Agreed?

Linus


Re: [PATCH] add delay between port write and port read

2019-02-26 Thread Will Deacon
On Tue, Feb 26, 2019 at 10:12:03AM -0800, Linus Torvalds wrote:
> On Tue, Feb 26, 2019 at 9:52 AM Maciej W. Rozycki  
> wrote:
> >
> >  The thing is taking for example `ioreadX' and `iowriteX' we have `mb'
> > before a write and `mb' after a read.  So if we do say (addresses are
> > arbitrary for illustration only):
> >
> > iowrite8(0, 1);
> > ioread8(2);
> >
> > then these effectively expand to chipset-specific:
> >
> > mb();
> > foo_iowrite8(0, 1);
> > foo_ioread8(2);
> > mb();
> 
> Yeah, looking at the current alpha io.c file, I'd suggest that all IO
> operations just do the "mb()" _before_ doing the op.
> 
> That way all IO ops are at least ordered wrt each other, any any IO
> ops are ordered wrt any memory writes before it (in particular the
> "people wrote to memory, and then did a IO write to start DMA" case).
> 
> Also, since "spin_unlock()" on alpha has a memory barrier before
> releasing the lock, it means that IO ends up being ordered wrt locks
> too.
> 
> Does it mean that a "ioread()" can then be delayed until after later
> non-IO memory operations? Yes. But that seems harmless
> 
> That said, I didn't spend a _ton_ of time thinking about this and
> looking at the code, but it does seem like the simplest model really
> is: "just always do a mb() before any IO ops". I think the "order IO
> wrt each other, and wrt any _preceding_ memory traffic (and all
> locking) is the important part, and that would seem to guarantee it.
> 
> Obviously there can then be other re-ordering at the IO layer level
> (ie posted writes and IO being re-ordered across different devices
> etc), but that's universal and not alpha-specific.
> 
> Does anybody see any worries with the "just move the mb() earlier in
> the ioread functions" model?

That makes sense to me for this Alpha-specific case, but in general I
don't think we require that I/O accesses to different endpoints are
ordered with respect to each other, which was implied by one of Maciej's
examples. e.g.

writeb(0x42, _mmio_reg);
readb(_mmio_reg);

are not ordered with respect to each other (and I think if you wanted
this ordering in general you're going to need more than just fences).

I've been trying to write this up in memory-barriers.txt, so if anybody
has comments or concerns I'm keen to hear them, especially if there's
a feeling that we should be providing stronger guarantees for port I/O:

http://lkml.kernel.org/r/20190222175525.1198-1-will.dea...@arm.com

Cheers,

Will


Re: [PATCH] add delay between port write and port read

2019-02-26 Thread Linus Torvalds
On Tue, Feb 26, 2019 at 9:52 AM Maciej W. Rozycki  wrote:
>
>  The thing is taking for example `ioreadX' and `iowriteX' we have `mb'
> before a write and `mb' after a read.  So if we do say (addresses are
> arbitrary for illustration only):
>
> iowrite8(0, 1);
> ioread8(2);
>
> then these effectively expand to chipset-specific:
>
> mb();
> foo_iowrite8(0, 1);
> foo_ioread8(2);
> mb();

Yeah, looking at the current alpha io.c file, I'd suggest that all IO
operations just do the "mb()" _before_ doing the op.

That way all IO ops are at least ordered wrt each other, any any IO
ops are ordered wrt any memory writes before it (in particular the
"people wrote to memory, and then did a IO write to start DMA" case).

Also, since "spin_unlock()" on alpha has a memory barrier before
releasing the lock, it means that IO ends up being ordered wrt locks
too.

Does it mean that a "ioread()" can then be delayed until after later
non-IO memory operations? Yes. But that seems harmless

That said, I didn't spend a _ton_ of time thinking about this and
looking at the code, but it does seem like the simplest model really
is: "just always do a mb() before any IO ops". I think the "order IO
wrt each other, and wrt any _preceding_ memory traffic (and all
locking) is the important part, and that would seem to guarantee it.

Obviously there can then be other re-ordering at the IO layer level
(ie posted writes and IO being re-ordered across different devices
etc), but that's universal and not alpha-specific.

Does anybody see any worries with the "just move the mb() earlier in
the ioread functions" model?

Linus


Re: [PATCH] add delay between port write and port read

2019-02-26 Thread Maciej W. Rozycki
Linus,

 I have added you to this discussion in hope you can clear any uncertainty 
there might be about MMIO ordering properties of the Alpha; a question for 
you is towards the end of this e-mail.

On Thu, 21 Feb 2019, Mikulas Patocka wrote:

> Do you think that we should fix this by identifying hardware that needs 
> the delays and adding the delays there?

 First of all we need to correctly identify what is really going on there.  

 And I've had a look at our arch/alpha/kernel/io.c and 
arch/alpha/include/asm/io.h as they stand now and AFAICT what they have is 
completely broken and if it works anywhere, then by pure luck only, e.g. 
because external bus accesses complete quickly enough for code execution 
not to progress to a subsequent external bus access.

 The thing is taking for example `ioreadX' and `iowriteX' we have `mb' 
before a write and `mb' after a read.  So if we do say (addresses are 
arbitrary for illustration only):

iowrite8(0, 1);
ioread8(2);

then these effectively expand to chipset-specific:

mb();
foo_iowrite8(0, 1);
foo_ioread8(2);
mb();

Which means `foo_iowrite8' and `foo_ioread8' are unordered WRT each other 
as there is no barrier between them.  Worse yet for:

iowrite8(0, 1);
ioread8(1);

the `ioread8' operation may not even appear externally, as the CPU may 
peek the value in the CPU's write buffer while the write is still pending.  
A similar consideration can be done for `readX' and `writeX', as well as a 
mixture between the two kinds of these accesses.

 We do need to add a preceding `mb' to all the `*readX' accessors to 
satisfy the strong ordering requirement WRT any preceding `*writeX' 
access.  We need that leading `mb' in `*_relaxed' accessors as well.  And 
we need to alias `mmiowb' to `wmb' (unless it's removed as I've seen 
with Will's recent patches).

 NB, the 82378ZB SIO adds a minimum of 5 ISA clocks between back-to-back 
accesses from PCI to ISA, which AFAIU should be enough for the PC87332 
Super I/O.  More can be configured if required, but I doubt that is 
needed, because it wasn't for 20+ years.

> In my opinion, adding mb() to the port accessing functions is safer - it 
> is 6 line patch.

 The amount of code does not matter as long as it is wrong.  We need to 
get this right.

> Reading all the hardware manuals is time consuming and hardly anyone will 
> do it for 25 years old hardware.

 I do this regularly and I'm fine doing it this time too, also for the 
value of the education gained.  I'll see if I can experiment with my 
hardware too, but regrettably that'll have to wait a couple of months at 
the very least (due to Brexit :( -- I have left my Alpha in the UK and the 
very last weekend I literally headed off to stay away from all of the 
upcoming mess).

 Meanwhile here is what I have tracked down in the architecture manual[1]:

"Access to local I/O space does not cause any implicit read/write 
ordering; explicit barrier instructions must be used to ensure any desired 
ordering.  Barrier instructions must be used:

* After updating a memory-resident data structure and before writing a 
  local I/O space location to notify the device of the updates.

* Between multiple consecutive direct accesses to local I/O space, e.g.  
  device control registers, if those accesses are expected to be ordered
  at the device.

Again, note that implementations may cache not only memory-resident data
structures, but also local I/O space locations."

-- which in my opinion makes it clear that we need these barriers that 
were removed with commit 92d7223a7423 ("alpha: io: reorder barriers to 
guarantee writeX() and iowriteX() ordering #2"), in addition to ones added 
there (the `*_relaxed' accessors don't need the latter ones though).

 NB for some reason the whole chapter on I/O has been reduced in later 
revisions of the architecure manual[2][3] (as well as the shortened 
version of the book called the architecture handbook) to a single-page 
overview lacking any details.  I think that does not make the architecture 
ordered any more strongly WRT MMIO though; instead it makes it even less 
specified.

 Linus, can you confirm or contradict my conclusions taken from the 
manual in the context of their later removal?

 You did the Alpha port back in mid-1990s (BTW, Jon 'maddog' spoke very 
fondly about that effort at FOSDEM earlier this month) and certainly had 
to know these details to complete it and you worked with people directly 
involved with the design of the Alpha at the time it was being developed, 
and given the somewhat terse formal specification you could therefore be 
the only person around I know of who can give a definite answer.  I hope 
you still remember the bits.

 Questions, thoughts?

References:

[1] Richard L. Sites, ed., "Alpha Architecture Reference Manual", 
Digital Press, 1992, Chapter 8, "Input/Output (I)", Section 8.2.1 
"Read/Write Ordering", 


Re: [PATCH] add delay between port write and port read

2019-02-22 Thread Will Deacon
On Wed, Feb 20, 2019 at 05:52:18PM +0100, Arnd Bergmann wrote:
> On Wed, Feb 20, 2019 at 4:38 PM Maciej W. Rozycki  
> wrote:
> > On Wed, 20 Feb 2019, Mikulas Patocka wrote:
> > > >  Well, actually `iowriteX' is generic and for MMIO you have `writeX'.
> > > > See lib/iomap.c, the comment at the top and e.g. `iowrite8' there for an
> > > > actual proof.  Alpha may have an oddball implementation, but there you 
> > > > go.
> > > > Drivers will assume they can do `iowriteX' to any kind of I/O resource,
> > > > and ordering must be respected as per Documentation/memory-barriers.txt.
> > >
> > > So, do you think that the barrier whould be added to iowriteX and slow
> > > down every MMIO access?
> >
> >  We need it either for `outX' and `iowriteX' calls operating on port I/O
> > resources, or for neither of them, both at a time, to ensure the required
> > consistency between the two interfaces.  If that badly affects MMIO (and
> > is not required there; please remind me what the exact justification to
> > use `mb' here is, as it's not entirely clear to me from the commit
> > message; `mb' is a barrier and not means for a delay), then we need to
> > find a away for `iowriteX' to tell port I/O and MMIO accesses apart and
> > only apply the barrier for the former kind.
> 
> Will Deacon is in the process of sanitizing our documentation for this,
> and this part is still under discussion.

This thread seems to be confusing barriers with delays in places, but fwiw,
I agree with everything Maciej has said. I'm about to post a new version of
my memory-barriers.txt updates, so I'll cc people on that patch and we can
discuss the ordering side of things over there.

Will


ATM VISA CARD

2019-02-22 Thread ATM VISA
ATTENTION:

WE RECEIVE YOUR CONTENT OF YOUR EMAIL FORM THIS DHL MASTER CARD
OFFICES FUND OF $USD.8 MILLION  AFTER THE BOARD OF DIRECTORS
MEETING,WE DECIDED TO ISSUE YOU AN ATM VISA CARD  VALUED AT  8
MILLION DOLLAR.

THIS IS TO  BRING TO YOUR NOTICE THAT YOUR VALUED SUM OF 8 MILLION
DOLLAR HAS BEING TODAY CREDITED INTO ATM VISA CARD AND WE HAVE HANDLE
IT TO  OUR REMITANCE  DEPARTMENT  TO SEND IT TO YOU TODAY IN YOUR
FAVOR.WITH THE  VISA CARD YOU CAN BE ABLE TO MAKE WITHDRAW OF
$10,000.00 UNITED STATE DOLLARS  DAILIES AS PROGRAMMED UNTIL YOU
WITHDRAW YOUR TOTAL SUM IN ANY ATM MACHINE. WE  ARE SENDING YOU THE
SCAN COPY OF YOUR ATM VISA CARD WHICH WE HAVE REGISTERED IN OUR
SYSTEM FOR PAYMENT RECORD AS SOON AS WE RECIEVE YOUR INFORMATION AND
YOUR HOME ADDRESS YOUR ATM VISA CARD WE BE SEND TO YOU THROUGH DHL
COURIER OFFICE

WE HAVE RECEIVED A SIGNAL FROM THE SWISS WORLD BANK TO  INFECT YOUR
TRANSFER TO YOU WITHIN ONE WEEK, WE HAVE JUST FINISH OUR ANNUAL
MEETING WITH THE CENTRAL BANK OF BURKINA FASO (BCEAO) AND THE END OF
THE  MEETING WE HAVE CONCLUDED TO MAKE YOUR PAYMENT TO YOU THROUGH OUR
ATM VISA  CARD. AND YOUR VALUE SUM HAS BE CREDITED INTO YOUR ATM VISA
ACCOUNT. WHICH  YOU WILL USE TO WITHDRAW YOUR FUND IN ANY PART OF THE
WORLD, WE HAVE ISSUE AND  CREDIT YOUR ATM CARD IN YOUR NAME TODAY .THE
CARD WILL BE INSURE AND SEND TO  YOU THROUGH DHL COURIER SERVICE .VIEW
YOUR ATTACHED FILE FOR THE COPY OF YOUR  ATM CARD.THIS IS THE PAYMENT
DEPARTMENT  EMAIL ID FOR YOU TO CONTACT US

  (visadepartmen...@gmail.com)
YOU SHOULD FORWARD TO THIS FINANCIAL INSTITUTION YOUR RESIDENTIAL
ADDRESS AND CALL THE  BANK  DIRECTOR WITH THIS NUMBER :+226-79 15 93
02. CONGRATULATION  NOTICE: IF YOU FIND ANYTHING THAT YOU DON’T
UNDERSTAND PLEASE KINDLY INFORM  THIS FINANCIAL INSTITUTION FOR URGENT
 ASSISTANCE.YOU ARE ADVISE TO SEND YOUR INFORMATION DOWN IMMEDATILY
WITH YOUR PHONE NUMBER WHEN THEY ARRIVEL IN YOUR COUNTRY THEY WE CALL
YOU.
EMAIL;(visadepartmen...@gmail.com )

DIRECTOR NAME MUSA KAMAL


Re: [PATCH v4 3/3] locking/rwsem: Optimize down_read_trylock()

2019-02-21 Thread Waiman Long
On 02/21/2019 09:14 AM, Will Deacon wrote:
> On Wed, Feb 13, 2019 at 05:00:17PM -0500, Waiman Long wrote:
>> Modify __down_read_trylock() to optimize for an unlocked rwsem and make
>> it generate slightly better code.
>>
>> Before this patch, down_read_trylock:
>>
>>0x <+0>: callq  0x5 
>>0x0005 <+5>: jmp0x18 
>>0x0007 <+7>: lea0x1(%rdx),%rcx
>>0x000b <+11>:mov%rdx,%rax
>>0x000e <+14>:lock cmpxchg %rcx,(%rdi)
>>0x0013 <+19>:cmp%rax,%rdx
>>0x0016 <+22>:je 0x23 
>>0x0018 <+24>:mov(%rdi),%rdx
>>0x001b <+27>:test   %rdx,%rdx
>>0x001e <+30>:jns0x7 
>>0x0020 <+32>:xor%eax,%eax
>>0x0022 <+34>:retq
>>0x0023 <+35>:mov%gs:0x0,%rax
>>0x002c <+44>:or $0x3,%rax
>>0x0030 <+48>:mov%rax,0x20(%rdi)
>>0x0034 <+52>:mov$0x1,%eax
>>0x0039 <+57>:retq
>>
>> After patch, down_read_trylock:
>>
>>0x <+0>:  callq  0x5 
>>0x0005 <+5>:  xor%eax,%eax
>>0x0007 <+7>:  lea0x1(%rax),%rdx
>>0x000b <+11>: lock cmpxchg %rdx,(%rdi)
>>0x0010 <+16>: jne0x29 
>>0x0012 <+18>: mov%gs:0x0,%rax
>>0x001b <+27>: or $0x3,%rax
>>0x001f <+31>: mov%rax,0x20(%rdi)
>>0x0023 <+35>: mov$0x1,%eax
>>0x0028 <+40>: retq
>>0x0029 <+41>: test   %rax,%rax
>>0x002c <+44>: jns0x7 
>>0x002e <+46>: xor%eax,%eax
>>0x0030 <+48>: retq
>>
>> By using a rwsem microbenchmark, the down_read_trylock() rate (with a
>> load of 10 to lengthen the lock critical section) on a x86-64 system
>> before and after the patch were:
>>
>>  Before PatchAfter Patch
>># of Threads rlock   rlock
>> -   -
>> 1   14,496  14,716
>> 28,644   8,453
>>  46,799   6,983
>>  85,664   7,190
>>
>> On a ARM64 system, the performance results were:
>>
>>  Before PatchAfter Patch
>># of Threads rlock   rlock
>> -   -
>> 1   23,676  24,488
>> 27,697   9,502
>> 44,945   3,440
>> 82,641   1,603
>>
>> For the uncontended case (1 thread), the new down_read_trylock() is a
>> little bit faster. For the contended cases, the new down_read_trylock()
>> perform pretty well in x86-64, but performance degrades at high
>> contention level on ARM64.
>>
>> Suggested-by: Linus Torvalds 
>> Signed-off-by: Waiman Long 
>> ---
>>  kernel/locking/rwsem.h | 13 -
>>  1 file changed, 8 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
>> index 45ee002..1f5775a 100644
>> --- a/kernel/locking/rwsem.h
>> +++ b/kernel/locking/rwsem.h
>> @@ -174,14 +174,17 @@ static inline int __down_read_killable(struct 
>> rw_semaphore *sem)
>>  
>>  static inline int __down_read_trylock(struct rw_semaphore *sem)
>>  {
>> -long tmp;
>> +/*
>> + * Optimize for the case when the rwsem is not locked at all.
>> + */
>> +long tmp = RWSEM_UNLOCKED_VALUE;
>>  
>> -while ((tmp = atomic_long_read(>count)) >= 0) {
>> -if (tmp == atomic_long_cmpxchg_acquire(>count, tmp,
>> -   tmp + RWSEM_ACTIVE_READ_BIAS)) {
>> +do {
>> +if (atomic_long_try_cmpxchg_acquire(>count, ,
>> +tmp + RWSEM_ACTIVE_READ_BIAS)) {
>>  return 1;
>>  }
>> -}
>> +} while (tmp >= 0);
> Nit: but I guess that should be RWSEM_UNLOCKED_VALUE instead of 0.
>
> Will

Using RWSEM_UNLOCKED_VALUE may be better. Anyway, it is not a big deal
as I am going to change this again in a later patch.

Thanks,
Longman

RWSEM_UNLOCKED_VALUE



Re: [PATCH] add delay between port write and port read

2019-02-21 Thread Mikulas Patocka
Do you think that we should fix this by identifying hardware that needs 
the delays and adding the delays there?

In my opinion, adding mb() to the port accessing functions is safer - it 
is 6 line patch.

Reading all the hardware manuals is time consuming and hardly anyone will 
do it for 25 years old hardware.

Mikulas



On Wed, 20 Feb 2019, Maciej W. Rozycki wrote:

> On Wed, 20 Feb 2019, Mikulas Patocka wrote:
> 
> > > Will Deacon is in the process of sanitizing our documentation for this,
> > > and this part is still under discussion.
> > > 
> > > I personally don't see a problem with having different barrier
> > > semantics between outb() and iowrite8(), the reasoning being that
> > > any caller of iowriteX() would already have to be careful about
> > > posted writes when iowriteX is backed by ioremapped memory
> > > space rather than port I/O.
> > > 
> > > I think the more important question we have to figure out here
> > > however is why exactly the barrier is needed for alpha, as I still
> > > don't fully understand the issue:
> > 
> > Port delays were common on MS-DOS with the ISA bus. On the ISA bus, the 
> > device cannot reject a transaction, so the programmer has to wait until 
> > the device is ready before talking to it.
> > 
> > My hypothesis is that the engineers who built this Alpha Avanti machine 
> > simply bought some crappy serial line and real-time clock logic from some 
> > ISA bus vendor. And so it requires delays. The PCI stuff on the same 
> > machine doesn't require any extra delays.
> 
>  What you call crap was in the day, i.e. back in 1994, a state-of-the art 
> ISA super I/O chip (the NS87332) and a standard RTC chip (the BQ3287).  
> There was no LPC in existence yet when the Avanti line was built, the 
> south bridge used was the i82378 SIO, additionally wiring two actual ISA 
> slots.
> 
>  Datasheets fo all these pieces should be available; I'm fairly sure I 
> have at least some of them myself.  Also full engineering documentation is 
> available for the Avanti.  So it can all be verified quite easily.
> 
>  If any of these devices need actual ISA access delays, then `outX_p' and 
> `inX_p' accessors should be used by the respective drivers, and we need to 
> implement these accordingly; that would be no different to a standard x86 
> PC with say an ISA serial port (I still have such hardware BTW).
> 
>  BTW, I have recently got an Avanti system myself, an AS/250 I believe, 
> although for an unknown reason placed in an AS/300 enclosure.  The main 
> board of the two systems is identical and the only difference is the 
> maximum amount of RAM supported anyway.  I have't wired it properly yet 
> though, so I can't do any verification ATM, but I will try sometime later.
> 
> > > a) maybe the barrier here is only needed to provide the
> > >non-posted PCI write behavior required by outb(), in that
> > >case I would suggest adding the barriers to outX() but
> > >perhaps not iowriteX()
> > 
> > Except for serial line and real-time-clock, the machine works OK. So 
> > there's no need to slow down regural PCI accesses that use iowriteX.
> 
>  If actual delays are required rather than strong ordering only, then the 
> respective drivers need to be fixed, however platform code still has to 
> enforce the ordering semantics we put into the respective internal 
> interfaces.
> 
> > > b) or the barriers are in fact needed for /any/ I/O operation
> > >on alpha, to ensure that a store is ordered with regard to
> > >a following load. If this is the case, we need the barriers
> > >in all three families: outX(), writeX() and iowriteX().
> > 
> > My understanding is that multiple accesses to the same PCI space are 
> > ordered.
> 
>  I believe they are not; there's not such thing as "a PCI space" from the 
> CPU's point of view, its just another MMIO location, and it's the CPU that 
> is weakly ordered, not the chipset.
> 
>  I'll come back with the relevant citations from the architecture manual, 
> ISTR I quoted them before already.
> 
>   Maciej
> 


Re: [PATCH v4 3/3] locking/rwsem: Optimize down_read_trylock()

2019-02-21 Thread Will Deacon
On Wed, Feb 13, 2019 at 05:00:17PM -0500, Waiman Long wrote:
> Modify __down_read_trylock() to optimize for an unlocked rwsem and make
> it generate slightly better code.
> 
> Before this patch, down_read_trylock:
> 
>0x <+0>: callq  0x5 
>0x0005 <+5>: jmp0x18 
>0x0007 <+7>: lea0x1(%rdx),%rcx
>0x000b <+11>:mov%rdx,%rax
>0x000e <+14>:lock cmpxchg %rcx,(%rdi)
>0x0013 <+19>:cmp%rax,%rdx
>0x0016 <+22>:je 0x23 
>0x0018 <+24>:mov(%rdi),%rdx
>0x001b <+27>:test   %rdx,%rdx
>0x001e <+30>:jns0x7 
>0x0020 <+32>:xor%eax,%eax
>0x0022 <+34>:retq
>0x0023 <+35>:mov%gs:0x0,%rax
>0x002c <+44>:or $0x3,%rax
>0x0030 <+48>:mov%rax,0x20(%rdi)
>0x0034 <+52>:mov$0x1,%eax
>0x0039 <+57>:retq
> 
> After patch, down_read_trylock:
> 
>0x <+0>:   callq  0x5 
>0x0005 <+5>:   xor%eax,%eax
>0x0007 <+7>:   lea0x1(%rax),%rdx
>0x000b <+11>:  lock cmpxchg %rdx,(%rdi)
>0x0010 <+16>:  jne0x29 
>0x0012 <+18>:  mov%gs:0x0,%rax
>0x001b <+27>:  or $0x3,%rax
>0x001f <+31>:  mov%rax,0x20(%rdi)
>0x0023 <+35>:  mov$0x1,%eax
>0x0028 <+40>:  retq
>0x0029 <+41>:  test   %rax,%rax
>0x002c <+44>:  jns0x7 
>0x002e <+46>:  xor%eax,%eax
>0x0030 <+48>:  retq
> 
> By using a rwsem microbenchmark, the down_read_trylock() rate (with a
> load of 10 to lengthen the lock critical section) on a x86-64 system
> before and after the patch were:
> 
>  Before PatchAfter Patch
># of Threads rlock   rlock
> -   -
> 1   14,496  14,716
> 28,644   8,453
>   46,799   6,983
>   85,664   7,190
> 
> On a ARM64 system, the performance results were:
> 
>  Before PatchAfter Patch
># of Threads rlock   rlock
> -   -
> 1   23,676  24,488
> 27,697   9,502
> 44,945   3,440
> 82,641   1,603
> 
> For the uncontended case (1 thread), the new down_read_trylock() is a
> little bit faster. For the contended cases, the new down_read_trylock()
> perform pretty well in x86-64, but performance degrades at high
> contention level on ARM64.
> 
> Suggested-by: Linus Torvalds 
> Signed-off-by: Waiman Long 
> ---
>  kernel/locking/rwsem.h | 13 -
>  1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
> index 45ee002..1f5775a 100644
> --- a/kernel/locking/rwsem.h
> +++ b/kernel/locking/rwsem.h
> @@ -174,14 +174,17 @@ static inline int __down_read_killable(struct 
> rw_semaphore *sem)
>  
>  static inline int __down_read_trylock(struct rw_semaphore *sem)
>  {
> - long tmp;
> + /*
> +  * Optimize for the case when the rwsem is not locked at all.
> +  */
> + long tmp = RWSEM_UNLOCKED_VALUE;
>  
> - while ((tmp = atomic_long_read(>count)) >= 0) {
> - if (tmp == atomic_long_cmpxchg_acquire(>count, tmp,
> -tmp + RWSEM_ACTIVE_READ_BIAS)) {
> + do {
> + if (atomic_long_try_cmpxchg_acquire(>count, ,
> + tmp + RWSEM_ACTIVE_READ_BIAS)) {
>   return 1;
>   }
> - }
> + } while (tmp >= 0);

Nit: but I guess that should be RWSEM_UNLOCKED_VALUE instead of 0.

Will


Re: ext4 corruption on alpha with 4.20.0-09062-gd8372ba8ce28

2019-02-21 Thread Jan Kara
On Thu 21-02-19 01:23:50, Meelis Roos wrote:
> > > First, I found out that both the problematic alphas had memory compaction 
> > > and
> > > page migration and bounce buffers turned on, and working alphas had them 
> > > off.
> > > 
> > > Next, turing off these options makes the problematic alphas work.
> > 
> > OK, thanks for testing! Can you narrow down whether the problem is due to
> > CONFIG_BOUNCE or CONFIG_MIGRATION + CONFIG_COMPACTION? These are two
> > completely different things so knowing where to look will help. Thanks!
> 
> Tested both.
> 
> Just CONFIG_MIGRATION + CONFIG_COMPACTION breaks the alpha.
> Just CONFIG_BOUNCE has no effect in 5 tries.

OK, so page migration is problematic. Thanks for confirmation!

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: ext4 corruption on alpha with 4.20.0-09062-gd8372ba8ce28

2019-02-20 Thread Meelis Roos

First, I found out that both the problematic alphas had memory compaction and
page migration and bounce buffers turned on, and working alphas had them off.

Next, turing off these options makes the problematic alphas work.


OK, thanks for testing! Can you narrow down whether the problem is due to
CONFIG_BOUNCE or CONFIG_MIGRATION + CONFIG_COMPACTION? These are two
completely different things so knowing where to look will help. Thanks!


Tested both.

Just CONFIG_MIGRATION + CONFIG_COMPACTION breaks the alpha.
Just CONFIG_BOUNCE has no effect in 5 tries.

--
Meelis Roos


Re: [PATCH] add delay between port write and port read

2019-02-20 Thread Maciej W. Rozycki
On Wed, 20 Feb 2019, Mikulas Patocka wrote:

> > Will Deacon is in the process of sanitizing our documentation for this,
> > and this part is still under discussion.
> > 
> > I personally don't see a problem with having different barrier
> > semantics between outb() and iowrite8(), the reasoning being that
> > any caller of iowriteX() would already have to be careful about
> > posted writes when iowriteX is backed by ioremapped memory
> > space rather than port I/O.
> > 
> > I think the more important question we have to figure out here
> > however is why exactly the barrier is needed for alpha, as I still
> > don't fully understand the issue:
> 
> Port delays were common on MS-DOS with the ISA bus. On the ISA bus, the 
> device cannot reject a transaction, so the programmer has to wait until 
> the device is ready before talking to it.
> 
> My hypothesis is that the engineers who built this Alpha Avanti machine 
> simply bought some crappy serial line and real-time clock logic from some 
> ISA bus vendor. And so it requires delays. The PCI stuff on the same 
> machine doesn't require any extra delays.

 What you call crap was in the day, i.e. back in 1994, a state-of-the art 
ISA super I/O chip (the NS87332) and a standard RTC chip (the BQ3287).  
There was no LPC in existence yet when the Avanti line was built, the 
south bridge used was the i82378 SIO, additionally wiring two actual ISA 
slots.

 Datasheets fo all these pieces should be available; I'm fairly sure I 
have at least some of them myself.  Also full engineering documentation is 
available for the Avanti.  So it can all be verified quite easily.

 If any of these devices need actual ISA access delays, then `outX_p' and 
`inX_p' accessors should be used by the respective drivers, and we need to 
implement these accordingly; that would be no different to a standard x86 
PC with say an ISA serial port (I still have such hardware BTW).

 BTW, I have recently got an Avanti system myself, an AS/250 I believe, 
although for an unknown reason placed in an AS/300 enclosure.  The main 
board of the two systems is identical and the only difference is the 
maximum amount of RAM supported anyway.  I have't wired it properly yet 
though, so I can't do any verification ATM, but I will try sometime later.

> > a) maybe the barrier here is only needed to provide the
> >non-posted PCI write behavior required by outb(), in that
> >case I would suggest adding the barriers to outX() but
> >perhaps not iowriteX()
> 
> Except for serial line and real-time-clock, the machine works OK. So 
> there's no need to slow down regural PCI accesses that use iowriteX.

 If actual delays are required rather than strong ordering only, then the 
respective drivers need to be fixed, however platform code still has to 
enforce the ordering semantics we put into the respective internal 
interfaces.

> > b) or the barriers are in fact needed for /any/ I/O operation
> >on alpha, to ensure that a store is ordered with regard to
> >a following load. If this is the case, we need the barriers
> >in all three families: outX(), writeX() and iowriteX().
> 
> My understanding is that multiple accesses to the same PCI space are 
> ordered.

 I believe they are not; there's not such thing as "a PCI space" from the 
CPU's point of view, its just another MMIO location, and it's the CPU that 
is weakly ordered, not the chipset.

 I'll come back with the relevant citations from the architecture manual, 
ISTR I quoted them before already.

  Maciej


Re: [PATCH] add delay between port write and port read

2019-02-20 Thread Mikulas Patocka



On Wed, 20 Feb 2019, Arnd Bergmann wrote:

> On Wed, Feb 20, 2019 at 4:38 PM Maciej W. Rozycki  
> wrote:
> > On Wed, 20 Feb 2019, Mikulas Patocka wrote:
> > > >  Well, actually `iowriteX' is generic and for MMIO you have `writeX'.
> > > > See lib/iomap.c, the comment at the top and e.g. `iowrite8' there for an
> > > > actual proof.  Alpha may have an oddball implementation, but there you 
> > > > go.
> > > > Drivers will assume they can do `iowriteX' to any kind of I/O resource,
> > > > and ordering must be respected as per Documentation/memory-barriers.txt.
> > >
> > > So, do you think that the barrier whould be added to iowriteX and slow
> > > down every MMIO access?
> >
> >  We need it either for `outX' and `iowriteX' calls operating on port I/O
> > resources, or for neither of them, both at a time, to ensure the required
> > consistency between the two interfaces.  If that badly affects MMIO (and
> > is not required there; please remind me what the exact justification to
> > use `mb' here is, as it's not entirely clear to me from the commit
> > message; `mb' is a barrier and not means for a delay), then we need to
> > find a away for `iowriteX' to tell port I/O and MMIO accesses apart and
> > only apply the barrier for the former kind.
> 
> Will Deacon is in the process of sanitizing our documentation for this,
> and this part is still under discussion.
> 
> I personally don't see a problem with having different barrier
> semantics between outb() and iowrite8(), the reasoning being that
> any caller of iowriteX() would already have to be careful about
> posted writes when iowriteX is backed by ioremapped memory
> space rather than port I/O.
> 
> I think the more important question we have to figure out here
> however is why exactly the barrier is needed for alpha, as I still
> don't fully understand the issue:

Port delays were common on MS-DOS with the ISA bus. On the ISA bus, the 
device cannot reject a transaction, so the programmer has to wait until 
the device is ready before talking to it.

My hypothesis is that the engineers who built this Alpha Avanti machine 
simply bought some crappy serial line and real-time clock logic from some 
ISA bus vendor. And so it requires delays. The PCI stuff on the same 
machine doesn't require any extra delays.

> a) maybe the barrier here is only needed to provide the
>non-posted PCI write behavior required by outb(), in that
>case I would suggest adding the barriers to outX() but
>perhaps not iowriteX()

Except for serial line and real-time-clock, the machine works OK. So 
there's no need to slow down regural PCI accesses that use iowriteX.

> b) or the barriers are in fact needed for /any/ I/O operation
>on alpha, to ensure that a store is ordered with regard to
>a following load. If this is the case, we need the barriers
>in all three families: outX(), writeX() and iowriteX().
> 
>   Arnd

My understanding is that multiple accesses to the same PCI space are 
ordered.

Mikulas


Re: [PATCH] add delay between port write and port read

2019-02-20 Thread Maciej W. Rozycki
On Wed, 20 Feb 2019, Arnd Bergmann wrote:

> I personally don't see a problem with having different barrier
> semantics between outb() and iowrite8(), the reasoning being that
> any caller of iowriteX() would already have to be careful about
> posted writes when iowriteX is backed by ioremapped memory
> space rather than port I/O.

 The problem is architectures (such as certainly MIPS) may imply posted 
writes for accesses that ultimately end up as port I/O (most architectures 
obviously have no real port I/O, and it all boils down to accessing an 
MMIO address range used to issue I/O access bus cycles to ISA or PCI), due 
to write buffering within the CPU itself.  This has to be dealt with by 
explicits barrier instructions inserted in port-specific I/O accessors so 
that generic driver code has a consistent view regardless of the 
peculiarities of the underlying CPU architecture.

> b) or the barriers are in fact needed for /any/ I/O operation
>on alpha, to ensure that a store is ordered with regard to
>a following load. If this is the case, we need the barriers
>in all three families: outX(), writeX() and iowriteX().

 I do believe Alpha is weakly ordered, at least architecturally, when it 
comes to any I/O (uncached) operations; I'll double-check with the 
architecture manual, as I recall the wording there was peculiar.

 NB I did a bunch of fixes for the MIPS port somewhat recently:

8b656253a7a4 MIPS: Provide actually relaxed MMIO accessors
3d474dacae72 MIPS: Enforce strong ordering for MMIO accessors
a711d43cbbaa MIPS: Correct `mmiowb' barrier for `wbflush' platforms
4ae0452bddca MIPS: Define MMIO ordering barriers

as it was similarly broken in this regard.

  Maciej


Re: [PATCH] add delay between port write and port read

2019-02-20 Thread Mikulas Patocka



On Tue, 19 Feb 2019, Maciej W. Rozycki wrote:

> On Tue, 19 Feb 2019, Mikulas Patocka wrote:
> 
> > >  As I say, shouldn't the barrier be in `iowriteX' instead?  I don't 
> > > suppose these are allowed to be weakly ordered.
> > > 
> > >   Maciej
> > 
> > iowriteX is for memory-mapped I/O, out[bwl] is for I/O ports.
> 
>  Well, actually `iowriteX' is generic and for MMIO you have `writeX'.  
> See lib/iomap.c, the comment at the top and e.g. `iowrite8' there for an 
> actual proof.  Alpha may have an oddball implementation, but there you go. 
> Drivers will assume they can do `iowriteX' to any kind of I/O resource, 
> and ordering must be respected as per Documentation/memory-barriers.txt.
> 
>   Maciej

So, do you think that the barrier whould be added to iowriteX and slow 
down every MMIO access?

Mikulas


Re: [PATCH v4 1/3] locking/rwsem: Remove arch specific rwsem files

2019-02-20 Thread Will Deacon
On Wed, Feb 13, 2019 at 05:00:15PM -0500, Waiman Long wrote:
> As the generic rwsem-xadd code is using the appropriate acquire and
> release versions of the atomic operations, the arch specific rwsem.h
> files will not be that much faster than the generic code as long as the
> atomic functions are properly implemented. So we can remove those arch
> specific rwsem.h and stop building asm/rwsem.h to reduce maintenance
> effort.
> 
> Currently, only x86, alpha and ia64 have implemented architecture
> specific fast paths. I don't have access to alpha and ia64 systems for
> testing, but they are legacy systems that are not likely to be updated
> to the latest kernel anyway.
> 
> By using a rwsem microbenchmark, the total locking rates on a 4-socket
> 56-core 112-thread x86-64 system before and after the patch were as
> follows (mixed means equal # of read and write locks):
> 
>   Before Patch  After Patch
># of Threads  wlock   rlock   mixed wlock   rlock   mixed
>  -   -   - -   -   -
> 129,201  30,143  29,45828,615  30,172  29,201
> 2 6,807  13,299   1,171 7,725  15,025   1,804
> 4 6,504  12,755   1,520 7,127  14,286   1,345
> 8 6,762  13,412 764 6,826  13,652 726
>16 6,693  15,408 662 6,599  15,938 626
>32 6,145  15,286 496 5,549  15,487 511
>64 5,812  15,495  60 5,858  15,572  60
> 
> There were some run-to-run variations for the multi-thread tests. For
> x86-64, using the generic C code fast path seems to be a little bit
> faster than the assembly version with low lock contention.  Looking at
> the assembly version of the fast paths, there are assembly to/from C
> code wrappers that save and restore all the callee-clobbered registers
> (7 registers on x86-64). The assembly generated from the generic C
> code doesn't need to do that. That may explain the slight performance
> gain here.
> 
> The generic asm rwsem.h can also be merged into kernel/locking/rwsem.h
> with no code change as no other code other than those under
> kernel/locking needs to access the internal rwsem macros and functions.
> 
> Signed-off-by: Waiman Long 
> ---
>  MAINTAINERS |   1 -
>  arch/alpha/include/asm/rwsem.h  | 211 ---
>  arch/arm/include/asm/Kbuild |   1 -
>  arch/arm64/include/asm/Kbuild   |   1 -
>  arch/hexagon/include/asm/Kbuild |   1 -
>  arch/ia64/include/asm/rwsem.h   | 172 -
>  arch/powerpc/include/asm/Kbuild |   1 -
>  arch/s390/include/asm/Kbuild|   1 -
>  arch/sh/include/asm/Kbuild  |   1 -
>  arch/sparc/include/asm/Kbuild   |   1 -
>  arch/x86/include/asm/rwsem.h| 237 
> 
>  arch/x86/lib/Makefile   |   1 -
>  arch/x86/lib/rwsem.S| 156 --
>  arch/x86/um/Makefile|   1 -
>  arch/xtensa/include/asm/Kbuild  |   1 -
>  include/asm-generic/rwsem.h | 140 
>  include/linux/rwsem.h   |   4 +-
>  kernel/locking/percpu-rwsem.c   |   2 +
>  kernel/locking/rwsem.h  | 130 ++
>  19 files changed, 133 insertions(+), 930 deletions(-)
>  delete mode 100644 arch/alpha/include/asm/rwsem.h
>  delete mode 100644 arch/ia64/include/asm/rwsem.h
>  delete mode 100644 arch/x86/include/asm/rwsem.h
>  delete mode 100644 arch/x86/lib/rwsem.S
>  delete mode 100644 include/asm-generic/rwsem.h

Looks like a nice cleanup, thanks:

Acked-by: Will Deacon 

Will


Re: [PATCH v4 2/3] locking/rwsem: Remove rwsem-spinlock.c & use rwsem-xadd.c for all archs

2019-02-20 Thread Will Deacon
On Wed, Feb 13, 2019 at 05:00:16PM -0500, Waiman Long wrote:
> Currently, we have two different implementation of rwsem:
>  1) CONFIG_RWSEM_GENERIC_SPINLOCK (rwsem-spinlock.c)
>  2) CONFIG_RWSEM_XCHGADD_ALGORITHM (rwsem-xadd.c)
> 
> As we are going to use a single generic implementation for rwsem-xadd.c
> and no architecture-specific code will be needed, there is no point
> in keeping two different implementations of rwsem. In most cases, the
> performance of rwsem-spinlock.c will be worse. It also doesn't get all
> the performance tuning and optimizations that had been implemented in
> rwsem-xadd.c over the years.
> 
> For simplication, we are going to remove rwsem-spinlock.c and make all
> architectures use a single implementation of rwsem - rwsem-xadd.c.
> 
> All references to RWSEM_GENERIC_SPINLOCK and RWSEM_XCHGADD_ALGORITHM
> in the code are removed.
> 
> Suggested-by: Peter Zijlstra 
> Signed-off-by: Waiman Long 
> ---
>  arch/alpha/Kconfig  |   7 -
>  arch/arc/Kconfig|   3 -
>  arch/arm/Kconfig|   4 -
>  arch/arm64/Kconfig  |   3 -
>  arch/c6x/Kconfig|   3 -
>  arch/csky/Kconfig   |   3 -
>  arch/h8300/Kconfig  |   3 -
>  arch/hexagon/Kconfig|   6 -
>  arch/ia64/Kconfig   |   4 -
>  arch/m68k/Kconfig   |   7 -
>  arch/microblaze/Kconfig |   6 -
>  arch/mips/Kconfig   |   7 -
>  arch/nds32/Kconfig  |   3 -
>  arch/nios2/Kconfig  |   3 -
>  arch/openrisc/Kconfig   |   6 -
>  arch/parisc/Kconfig |   6 -
>  arch/powerpc/Kconfig|   7 -
>  arch/riscv/Kconfig  |   3 -
>  arch/s390/Kconfig   |   6 -
>  arch/sh/Kconfig |   6 -
>  arch/sparc/Kconfig  |   8 -
>  arch/unicore32/Kconfig  |   6 -
>  arch/x86/Kconfig|   3 -
>  arch/x86/um/Kconfig |   6 -
>  arch/xtensa/Kconfig |   3 -
>  include/linux/rwsem-spinlock.h  |  47 --
>  include/linux/rwsem.h   |   5 -
>  kernel/Kconfig.locks|   2 +-
>  kernel/locking/Makefile |   4 +-
>  kernel/locking/rwsem-spinlock.c | 339 
> 
>  kernel/locking/rwsem.h  |   3 -
>  31 files changed, 2 insertions(+), 520 deletions(-)
>  delete mode 100644 include/linux/rwsem-spinlock.h
>  delete mode 100644 kernel/locking/rwsem-spinlock.c

Another nice cleanup, and it looks like the optimistic spinning in the xadd
implementation is predicated on ARCH_SUPPORTS_ATOMIC_RMW, so we're all good
there too.

Acked-by: Will Deacon 

Will


Re: ext4 corruption on alpha with 4.20.0-09062-gd8372ba8ce28

2019-02-20 Thread Jan Kara
On Wed 20-02-19 08:31:05, Meelis Roos wrote:
> > Could
> > https://lore.kernel.org/linux-mm/20190219123212.29838-1-lar...@axis.com/T/#u
> > be relevant?
> 
> Tried it, still broken.

OK, I didn't put too much hope into this patch as you see filesystem
metadata corruption so icache/dcache coherency issues seemed unlikely.
Still good that you've tried so that we are sure.

> I wrote:
> 
> > But my kernel config had memory compaction (that turned on page migration) 
> > and
> > bounce buffers. I do not remember why I found them necessary but I will try
> > without them.
> 
> First, I found out that both the problematic alphas had memory compaction and
> page migration and bounce buffers turned on, and working alphas had them off.
> 
> Next, turing off these options makes the problematic alphas work.

OK, thanks for testing! Can you narrow down whether the problem is due to
CONFIG_BOUNCE or CONFIG_MIGRATION + CONFIG_COMPACTION? These are two
completely different things so knowing where to look will help. Thanks!

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: ext4 corruption on alpha with 4.20.0-09062-gd8372ba8ce28

2019-02-19 Thread Meelis Roos

Could
https://lore.kernel.org/linux-mm/20190219123212.29838-1-lar...@axis.com/T/#u
be relevant?


Tried it, still broken.

I wrote:


But my kernel config had memory compaction (that turned on page migration) and
bounce buffers. I do not remember why I found them necessary but I will try
without them. 


First, I found out that both the problematic alphas had memory compaction and
page migration and bounce buffers turned on, and working alphas had them off.

Next, turing off these options makes the problematic alphas work.

--
Meelis Roos 


Re: [PATCH] add delay between port write and port read

2019-02-19 Thread Maciej W. Rozycki
On Tue, 19 Feb 2019, Mikulas Patocka wrote:

> >  As I say, shouldn't the barrier be in `iowriteX' instead?  I don't 
> > suppose these are allowed to be weakly ordered.
> > 
> >   Maciej
> 
> iowriteX is for memory-mapped I/O, out[bwl] is for I/O ports.

 Well, actually `iowriteX' is generic and for MMIO you have `writeX'.  
See lib/iomap.c, the comment at the top and e.g. `iowrite8' there for an 
actual proof.  Alpha may have an oddball implementation, but there you go. 
Drivers will assume they can do `iowriteX' to any kind of I/O resource, 
and ordering must be respected as per Documentation/memory-barriers.txt.

  Maciej


Re: [PATCH] add delay between port write and port read

2019-02-19 Thread Mikulas Patocka



On Tue, 19 Feb 2019, Arnd Bergmann wrote:

> On Tue, Feb 19, 2019 at 2:44 PM Mikulas Patocka  wrote:
> > On Tue, 19 Feb 2019, Mikulas Patocka wrote:
> >
> > > The patches cd0e00c106722eca40b38ebf11cf134c01901086 and
> > > 92d7223a74235054f2aa7227d207d9c57f84dca0 fix a theoretical issue where the
> > > code didn't follow the specification. Unfortunatelly, they also reduce
> > > timing when port write is followed by a port read.
> > >
> > > These reduced timing cause hang on boot on the Avanti platform when
> > > probing serial ports. This patch adds memory barrier after the outb, outw,
> > > outl functions, so that there is delay between port write and subsequent
> > > port read - just like before.
> > >
> > > Fixes: cd0e00c10672 ("alpha: io: reorder barriers to guarantee writeX() 
> > > and iowriteX() ordering")
> > > Cc: sta...@vger.kernel.org# v4.17+
> >
> > you can also add:
> >
> > Tested-by: Mikulas Patocka 
> 
> Acked-by: Arnd Bergmann 
> 
> but I notice you are missing Signed-off-by.

I forgot about it. You made that patch, so there should be your signed-off 
too.

Signed-off-by: Mikulas Patocka 

> We clearly need this patch, but I assumed the alpha maintainers would pick
> it up, not me. I merged the original changes since they were 
> cross-architecture,
> but I don't normally take patches for a particular architecture through the
> asm-generic tree (or the soc tree for that matter).
> 
>   Arnd
> 


Re: [PATCH] add delay between port write and port read

2019-02-19 Thread Mikulas Patocka



On Tue, 19 Feb 2019, Maciej W. Rozycki wrote:

> On Tue, 19 Feb 2019, Arnd Bergmann wrote:
> 
> > We clearly need this patch, but I assumed the alpha maintainers would pick
> > it up, not me. I merged the original changes since they were 
> > cross-architecture,
> > but I don't normally take patches for a particular architecture through the
> > asm-generic tree (or the soc tree for that matter).
> 
>  As I say, shouldn't the barrier be in `iowriteX' instead?  I don't 
> suppose these are allowed to be weakly ordered.
> 
>   Maciej

iowriteX is for memory-mapped I/O, out[bwl] is for I/O ports.

If someone finds a problem with memory-mapped device, he can add the 
barier there.

Mikulas


Re: [PATCH] add delay between port write and port read

2019-02-19 Thread Maciej W. Rozycki
On Tue, 19 Feb 2019, Arnd Bergmann wrote:

> We clearly need this patch, but I assumed the alpha maintainers would pick
> it up, not me. I merged the original changes since they were 
> cross-architecture,
> but I don't normally take patches for a particular architecture through the
> asm-generic tree (or the soc tree for that matter).

 As I say, shouldn't the barrier be in `iowriteX' instead?  I don't 
suppose these are allowed to be weakly ordered.

  Maciej


Re: ext4 corruption on alpha with 4.20.0-09062-gd8372ba8ce28

2019-02-19 Thread Matthew Wilcox
On Tue, Feb 19, 2019 at 02:20:26PM +0100, Jan Kara wrote:
> Thanks for information. Yeah, that makes somewhat more sense. Can you ever
> see the failure if you disable CONFIG_TRANSPARENT_HUGEPAGE? Because your
> findings still seem to indicate that there' some problem with page
> migration and Alpha (added MM list to CC).

Could
https://lore.kernel.org/linux-mm/20190219123212.29838-1-lar...@axis.com/T/#u
be relevant?


Re: [PATCH] add delay between port write and port read

2019-02-19 Thread Arnd Bergmann
On Tue, Feb 19, 2019 at 2:44 PM Mikulas Patocka  wrote:
> On Tue, 19 Feb 2019, Mikulas Patocka wrote:
>
> > The patches cd0e00c106722eca40b38ebf11cf134c01901086 and
> > 92d7223a74235054f2aa7227d207d9c57f84dca0 fix a theoretical issue where the
> > code didn't follow the specification. Unfortunatelly, they also reduce
> > timing when port write is followed by a port read.
> >
> > These reduced timing cause hang on boot on the Avanti platform when
> > probing serial ports. This patch adds memory barrier after the outb, outw,
> > outl functions, so that there is delay between port write and subsequent
> > port read - just like before.
> >
> > Fixes: cd0e00c10672 ("alpha: io: reorder barriers to guarantee writeX() and 
> > iowriteX() ordering")
> > Cc: sta...@vger.kernel.org# v4.17+
>
> you can also add:
>
> Tested-by: Mikulas Patocka 

Acked-by: Arnd Bergmann 

but I notice you are missing Signed-off-by.

We clearly need this patch, but I assumed the alpha maintainers would pick
it up, not me. I merged the original changes since they were cross-architecture,
but I don't normally take patches for a particular architecture through the
asm-generic tree (or the soc tree for that matter).

  Arnd


[PATCH] add delay between port write and port read

2019-02-19 Thread Mikulas Patocka
The patches cd0e00c106722eca40b38ebf11cf134c01901086 and
92d7223a74235054f2aa7227d207d9c57f84dca0 fix a theoretical issue where the
code didn't follow the specification. Unfortunatelly, they also reduce
timing when port write is followed by a port read.

These reduced timing cause hang on boot on the Avanti platform when
probing serial ports. This patch adds memory barrier after the outb, outw,
outl functions, so that there is delay between port write and subsequent
port read - just like before.

Fixes: cd0e00c10672 ("alpha: io: reorder barriers to guarantee writeX() and 
iowriteX() ordering")
Cc: sta...@vger.kernel.org  # v4.17+

---
 arch/alpha/kernel/io.c |6 ++
 1 file changed, 6 insertions(+)

Index: linux-stable/arch/alpha/kernel/io.c
===
--- linux-stable.orig/arch/alpha/kernel/io.c2019-02-19 14:06:03.0 
+0100
+++ linux-stable/arch/alpha/kernel/io.c 2019-02-19 14:12:29.0 +0100
@@ -78,16 +78,19 @@ u32 inl(unsigned long port)
 void outb(u8 b, unsigned long port)
 {
iowrite8(b, ioport_map(port, 1));
+   mb();
 }
 
 void outw(u16 b, unsigned long port)
 {
iowrite16(b, ioport_map(port, 2));
+   mb();
 }
 
 void outl(u32 b, unsigned long port)
 {
iowrite32(b, ioport_map(port, 4));
+   mb();
 }
 
 EXPORT_SYMBOL(inb);
@@ -336,6 +339,7 @@ void iowrite8_rep(void __iomem *port, co
 void outsb(unsigned long port, const void *src, unsigned long count)
 {
iowrite8_rep(ioport_map(port, 1), src, count);
+   mb();
 }
 
 EXPORT_SYMBOL(iowrite8_rep);
@@ -376,6 +380,7 @@ void iowrite16_rep(void __iomem *port, c
 void outsw(unsigned long port, const void *src, unsigned long count)
 {
iowrite16_rep(ioport_map(port, 2), src, count);
+   mb();
 }
 
 EXPORT_SYMBOL(iowrite16_rep);
@@ -408,6 +413,7 @@ void iowrite32_rep(void __iomem *port, c
 void outsl(unsigned long port, const void *src, unsigned long count)
 {
iowrite32_rep(ioport_map(port, 4), src, count);
+   mb();
 }
 
 EXPORT_SYMBOL(iowrite32_rep);


Re: Alpha Avanti broken by 9ce8654323d69273b4977f76f11c9e2d345ab130

2019-02-19 Thread Mikulas Patocka



On Tue, 21 Aug 2018, Arnd Bergmann wrote:

> On Mon, Aug 20, 2018 at 11:42 PM Mikulas Patocka  wrote:
> > On Mon, 20 Aug 2018, Arnd Bergmann wrote:
> >
> > > On Mon, Aug 20, 2018 at 4:17 PM Mikulas Patocka  
> > > wrote:
> > > > On Sun, 19 Aug 2018, ok...@codeaurora.org wrote:
> > > >
> > > > > +my new email
> > > > >
> > > > > On 2018-08-18 19:03, Arnd Bergmann wrote:
> > > > > > On Sat, Aug 18, 2018 at 12:05 AM Maciej W. Rozycki 
> > > > > > 
> > >
> > > > > I think we need to identify the driver that is failing.
> > > >
> > > > It also may be some timing issue.
> > > >
> > > > I observed that not every kernel with the patch
> > > > 92d7223a74235054f2aa7227d207d9c57f84dca0 fails, some of them get stuck
> > > > only at boot, some get stuck only at shutdown, some not at all. Although
> > > > all the kernels with this patch reverted work.
> > > >
> > > > So the patch may have uncovered some timing problem somewhere.
> > > >
> > > > x86 has the function io_delay that injects delays between I/O accesses 
> > > > for
> > > > hardware that needs it - does alpha have something like this?
> > >
> > > The I/O delay would be very low on my list of possible root causes
> > > for this, hardly any hardware at all relies on it, and all uses I see
> > > are related to outb(), which you've already shown not to be the problem
> > > with my test patch.
> > >
> > The lockup happens somewhere in the function autoconfig in
> > drivers/tty/serial/8250/8250_port.c, but I don't know where exactly
> > because serial console doesn't work while the port is being probed.
> >
> > When I use console on a graphics card, the lockup doesn't happen.
> 
> Ok, this does strongly suggest that it is the outb() operation that I
> suspected after all, I just sent you a wrong patch to test, failing
> to realize that alpha has two implementations of outb, and that the
> extern one is the one that gets used in a defconfig build.
> 
> Could you try again with this patch added in? (Sorry for the whitespace
> damage, you'll have to apply it by hand). Presumably a wmb()
> is sufficient here, but I'm trying to play safe here by restoring the
> barrier that was part of outb() before it broke.
> 
>Arnd

Hi

Will you commit this patch?

The avanti platform is still broken in the kernel 5.0 and I tested that 
this patch fixes it.

Mikulas


> diff --git a/arch/alpha/kernel/io.c b/arch/alpha/kernel/io.c
> index c025a3e5e357..604237fa821f 100644
> --- a/arch/alpha/kernel/io.c
> +++ b/arch/alpha/kernel/io.c
> @@ -78,16 +78,19 @@ u32 inl(unsigned long port)
>  void outb(u8 b, unsigned long port)
>  {
> iowrite8(b, ioport_map(port, 1));
> +   mb();
>  }
> 
>  void outw(u16 b, unsigned long port)
>  {
> iowrite16(b, ioport_map(port, 2));
> +   mb();
>  }
> 
>  void outl(u32 b, unsigned long port)
>  {
> iowrite32(b, ioport_map(port, 4));
> +   mb();
>  }
> 
>  EXPORT_SYMBOL(inb);
> @@ -336,6 +339,7 @@ void iowrite8_rep(void __iomem *port, const void
> *xsrc, unsigned long count)
>  void outsb(unsigned long port, const void *src, unsigned long count)
>  {
> iowrite8_rep(ioport_map(port, 1), src, count);
> +   mb();
>  }
> 
>  EXPORT_SYMBOL(iowrite8_rep);
> @@ -376,6 +380,7 @@ void iowrite16_rep(void __iomem *port, const void
> *src, unsigned long count)
>  void outsw(unsigned long port, const void *src, unsigned long count)
>  {
> iowrite16_rep(ioport_map(port, 2), src, count);
> +   mb();
>  }
> 
>  EXPORT_SYMBOL(iowrite16_rep);
> @@ -408,6 +413,7 @@ void iowrite32_rep(void __iomem *port, const void
> *src, unsigned long count)
>  void outsl(unsigned long port, const void *src, unsigned long count)
>  {
> iowrite32_rep(ioport_map(port, 4), src, count);
> +   mb();
>  }
> 
>  EXPORT_SYMBOL(iowrite32_rep);
> 


Re: ext4 corruption on alpha with 4.20.0-09062-gd8372ba8ce28

2019-02-19 Thread Jan Kara
On Tue 19-02-19 14:17:09, Meelis Roos wrote:
> > > > > The result of the bisection is
> > > > > [88dbcbb3a4847f5e6dfeae952d3105497700c128] blkdev: avoid migration 
> > > > > stalls for blkdev pages
> > > > > 
> > > > > Is that result relevant for the problem or should I continue 
> > > > > bisecting between 4.20.0 and the so far first bad commit?
> > > > 
> > > > Can you try reverting the commit and see if it makes the problem go 
> > > > away?
> > > 
> > > Tried reverting it on top of 5.0.0-rc6-00153-g5ded5871030e and it seems
> > > to make the kernel work - emerge --sync succeeded.
> There is more to it.
> 
> After running 5.0.0-rc6-00153-g5ded5871030e-dirty (with the revert of
> that patch) successfully for Gentoo update, I upgraded the kernel to
> 5.0.0-rc7-00011-gb5372fe5dc84-dirty (todays git + revert of this patch)
> and it broke on rsync again:
> 
> RepoStorageException: command exited with status -6: rsync -a --link-dest 
> /usr/portage --exclude=/distfiles --exclude=/local --exclude=/lost+found 
> --exclude=/packages --exclude /.tmp-unverified-download-quarantine 
> /usr/portage/ /usr/portage/.tmp-unverified-download-quarantine/
> 
> Nothing in dmesg.
> 
> This means the real root reason is somewhere deeper and reverting this
> commit just made it less likely to happen.

Thanks for information. Yeah, that makes somewhat more sense. Can you ever
see the failure if you disable CONFIG_TRANSPARENT_HUGEPAGE? Because your
findings still seem to indicate that there' some problem with page
migration and Alpha (added MM list to CC).

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: ext4 corruption on alpha with 4.20.0-09062-gd8372ba8ce28

2019-02-19 Thread Meelis Roos

The result of the bisection is
[88dbcbb3a4847f5e6dfeae952d3105497700c128] blkdev: avoid migration stalls for 
blkdev pages

Is that result relevant for the problem or should I continue bisecting between 
4.20.0 and the so far first bad commit?


Can you try reverting the commit and see if it makes the problem go away?


Tried reverting it on top of 5.0.0-rc6-00153-g5ded5871030e and it seems
to make the kernel work - emerge --sync succeeded.

There is more to it.

After running 5.0.0-rc6-00153-g5ded5871030e-dirty (with the revert of that 
patch)
successfully for Gentoo update, I upgraded the kernel to
5.0.0-rc7-00011-gb5372fe5dc84-dirty (todays git + revert of this patch) and it 
broke on rsync again:

RepoStorageException: command exited with status -6: rsync -a --link-dest 
/usr/portage --exclude=/distfiles --exclude=/local --exclude=/lost+found 
--exclude=/packages --exclude /.tmp-unverified-download-quarantine 
/usr/portage/ /usr/portage/.tmp-unverified-download-quarantine/

Nothing in dmesg.

This means the real root reason is somewhere deeper and reverting this commit 
just made
it less likely to happen.

--
Meelis Roos 


Re: [PATCH v4 0/3] locking/rwsem: Rwsem rearchitecture part 0

2019-02-18 Thread Will Deacon
On Fri, Feb 15, 2019 at 01:58:34PM -0500, Waiman Long wrote:
> On 02/15/2019 01:40 PM, Will Deacon wrote:
> > On Thu, Feb 14, 2019 at 11:37:15AM +0100, Peter Zijlstra wrote:
> >> On Wed, Feb 13, 2019 at 05:00:14PM -0500, Waiman Long wrote:
> >>> v4:
> >>>  - Remove rwsem-spinlock.c and make all archs use rwsem-xadd.c.
> >>>
> >>> v3:
> >>>  - Optimize __down_read_trylock() for the uncontended case as suggested
> >>>by Linus.
> >>>
> >>> v2:
> >>>  - Add patch 2 to optimize __down_read_trylock() as suggested by PeterZ.
> >>>  - Update performance test data in patch 1.
> >>>
> >>> The goal of this patchset is to remove the architecture specific files
> >>> for rwsem-xadd to make it easer to add enhancements in the later rwsem
> >>> patches. It also removes the legacy rwsem-spinlock.c file and make all
> >>> the architectures use one single implementation of rwsem - rwsem-xadd.c.
> >>>
> >>> Waiman Long (3):
> >>>   locking/rwsem: Remove arch specific rwsem files
> >>>   locking/rwsem: Remove rwsem-spinlock.c & use rwsem-xadd.c for all
> >>> archs
> >>>   locking/rwsem: Optimize down_read_trylock()
> >> Acked-by: Peter Zijlstra (Intel) 
> >>
> >> with the caveat that I'm happy to exchange patch 3 back to my earlier
> >> suggestion in case Will expesses concerns wrt the ARM64 performance of
> >> Linus' suggestion.
> > Right, the current proposal doesn't work well for us, unfortunately. Which
> > was your earlier suggestion?
> >
> > Will
> 
> In my posting yesterday, I showed that most of the trylocks done were
> actually uncontended. Assuming that pattern hold for the most of the
> workloads, it will not that bad after all.

That's fair enough; if you're going to sit in a tight trylock() loop like the
benchmark does, then you're much better off just calling lock() if you care
at all about scalability.

Will


[PATCH 4.19 70/85] alpha: fix page fault handling for r16-r18 targets

2019-02-18 Thread Greg Kroah-Hartman
4.19-stable review patch.  If anyone has any objections, please let me know.

--

From: Sergei Trofimovich 

commit 491af60ffb848b59e82f7c9145833222e0bf27a5 upstream.

Fix page fault handling code to fixup r16-r18 registers.
Before the patch code had off-by-two registers bug.
This bug caused overwriting of ps,pc,gp registers instead
of fixing intended r16,r17,r18 (see `struct pt_regs`).

More details:

Initially Dmitry noticed a kernel bug as a failure
on strace test suite. Test passes unmapped userspace
pointer to io_submit:

```c
#include 
#include 
#include 
#include 
int main(void)
{
unsigned long ctx = 0;
if (syscall(__NR_io_setup, 1, ))
err(1, "io_setup");
const size_t page_size = sysconf(_SC_PAGESIZE);
const size_t size = page_size * 2;
void *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (MAP_FAILED == ptr)
err(1, "mmap(%zu)", size);
if (munmap(ptr, size))
err(1, "munmap");
syscall(__NR_io_submit, ctx, 1, ptr + page_size);
syscall(__NR_io_destroy, ctx);
return 0;
}
```

Running this test causes kernel to crash when handling page fault:

```
Unable to handle kernel paging request at virtual address 9468
CPU 3
aio(26027): Oops 0
pc = []  ra = []  ps = Not 
tainted
pc is at sys_io_submit+0x108/0x200
ra is at sys_io_submit+0x6c/0x200
v0 = fc00c58e6300  t0 = fff2  t1 = 0225e000
t2 = fc01f159fef8  t3 = fc0001009640  t4 = fce0f6e0
t5 = 020001002e9e  t6 = 4c41564e49452031  t7 = fc01f159c000
s0 = 0002  s1 = 0225e000  s2 = 
s3 =   s4 =   s5 = fff2
s6 = fc00c58e6300
a0 = fc00c58e6300  a1 =   a2 = 0225e000
a3 = 021ac260  a4 = 021ac1e8  a5 = 0001
t8 = 0008  t9 = 00011f8bce30  t10= 021ac440
t11=   pv = fc6fd320  at = 
gp =   sp = 265fd174
Disabling lock debugging due to kernel taint
Trace:
[] entSys+0xa4/0xc0
```

Here `gp` has invalid value. `gp is s overwritten by a fixup for the
following page fault handler in `io_submit` syscall handler:

```
__se_sys_io_submit
...
ldq a1,0(t1)
bne t0,4280 <__se_sys_io_submit+0x180>
```

After a page fault `t0` should contain -EFALUT and `a1` is 0.
Instead `gp` was overwritten in place of `a1`.

This happens due to a off-by-two bug in `dpf_reg()` for `r16-r18`
(aka `a0-a2`).

I think the bug went unnoticed for a long time as `gp` is one
of scratch registers. Any kernel function call would re-calculate `gp`.

Dmitry tracked down the bug origin back to 2.1.32 kernel version
where trap_a{0,1,2} fields were inserted into struct pt_regs.
And even before that `dpf_reg()` contained off-by-one error.

Cc: Richard Henderson 
Cc: Ivan Kokshaysky 
Cc: linux-alpha@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Reported-and-reviewed-by: "Dmitry V. Levin" 
Cc: sta...@vger.kernel.org # v2.1.32+
Bug: https://bugs.gentoo.org/672040
Signed-off-by: Sergei Trofimovich 
Signed-off-by: Matt Turner 
Signed-off-by: Greg Kroah-Hartman 

---
 arch/alpha/mm/fault.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/alpha/mm/fault.c
+++ b/arch/alpha/mm/fault.c
@@ -78,7 +78,7 @@ __load_new_mm_context(struct mm_struct *
 /* Macro for exception fixup code to access integer registers.  */
 #define dpf_reg(r) \
(((unsigned long *)regs)[(r) <= 8 ? (r) : (r) <= 15 ? (r)-16 :  \
-(r) <= 18 ? (r)+8 : (r)-10])
+(r) <= 18 ? (r)+10 : (r)-10])
 
 asmlinkage void
 do_page_fault(unsigned long address, unsigned long mmcsr,




[PATCH 4.14 50/62] alpha: fix page fault handling for r16-r18 targets

2019-02-18 Thread Greg Kroah-Hartman
4.14-stable review patch.  If anyone has any objections, please let me know.

--

From: Sergei Trofimovich 

commit 491af60ffb848b59e82f7c9145833222e0bf27a5 upstream.

Fix page fault handling code to fixup r16-r18 registers.
Before the patch code had off-by-two registers bug.
This bug caused overwriting of ps,pc,gp registers instead
of fixing intended r16,r17,r18 (see `struct pt_regs`).

More details:

Initially Dmitry noticed a kernel bug as a failure
on strace test suite. Test passes unmapped userspace
pointer to io_submit:

```c
#include 
#include 
#include 
#include 
int main(void)
{
unsigned long ctx = 0;
if (syscall(__NR_io_setup, 1, ))
err(1, "io_setup");
const size_t page_size = sysconf(_SC_PAGESIZE);
const size_t size = page_size * 2;
void *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (MAP_FAILED == ptr)
err(1, "mmap(%zu)", size);
if (munmap(ptr, size))
err(1, "munmap");
syscall(__NR_io_submit, ctx, 1, ptr + page_size);
syscall(__NR_io_destroy, ctx);
return 0;
}
```

Running this test causes kernel to crash when handling page fault:

```
Unable to handle kernel paging request at virtual address 9468
CPU 3
aio(26027): Oops 0
pc = []  ra = []  ps = Not 
tainted
pc is at sys_io_submit+0x108/0x200
ra is at sys_io_submit+0x6c/0x200
v0 = fc00c58e6300  t0 = fff2  t1 = 0225e000
t2 = fc01f159fef8  t3 = fc0001009640  t4 = fce0f6e0
t5 = 020001002e9e  t6 = 4c41564e49452031  t7 = fc01f159c000
s0 = 0002  s1 = 0225e000  s2 = 
s3 =   s4 =   s5 = fff2
s6 = fc00c58e6300
a0 = fc00c58e6300  a1 =   a2 = 0225e000
a3 = 021ac260  a4 = 021ac1e8  a5 = 0001
t8 = 0008  t9 = 00011f8bce30  t10= 021ac440
t11=   pv = fc6fd320  at = 
gp =   sp = 265fd174
Disabling lock debugging due to kernel taint
Trace:
[] entSys+0xa4/0xc0
```

Here `gp` has invalid value. `gp is s overwritten by a fixup for the
following page fault handler in `io_submit` syscall handler:

```
__se_sys_io_submit
...
ldq a1,0(t1)
bne t0,4280 <__se_sys_io_submit+0x180>
```

After a page fault `t0` should contain -EFALUT and `a1` is 0.
Instead `gp` was overwritten in place of `a1`.

This happens due to a off-by-two bug in `dpf_reg()` for `r16-r18`
(aka `a0-a2`).

I think the bug went unnoticed for a long time as `gp` is one
of scratch registers. Any kernel function call would re-calculate `gp`.

Dmitry tracked down the bug origin back to 2.1.32 kernel version
where trap_a{0,1,2} fields were inserted into struct pt_regs.
And even before that `dpf_reg()` contained off-by-one error.

Cc: Richard Henderson 
Cc: Ivan Kokshaysky 
Cc: linux-alpha@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Reported-and-reviewed-by: "Dmitry V. Levin" 
Cc: sta...@vger.kernel.org # v2.1.32+
Bug: https://bugs.gentoo.org/672040
Signed-off-by: Sergei Trofimovich 
Signed-off-by: Matt Turner 
Signed-off-by: Greg Kroah-Hartman 

---
 arch/alpha/mm/fault.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/alpha/mm/fault.c
+++ b/arch/alpha/mm/fault.c
@@ -78,7 +78,7 @@ __load_new_mm_context(struct mm_struct *
 /* Macro for exception fixup code to access integer registers.  */
 #define dpf_reg(r) \
(((unsigned long *)regs)[(r) <= 8 ? (r) : (r) <= 15 ? (r)-16 :  \
-(r) <= 18 ? (r)+8 : (r)-10])
+(r) <= 18 ? (r)+10 : (r)-10])
 
 asmlinkage void
 do_page_fault(unsigned long address, unsigned long mmcsr,




[PATCH 4.9 44/58] alpha: fix page fault handling for r16-r18 targets

2019-02-18 Thread Greg Kroah-Hartman
4.9-stable review patch.  If anyone has any objections, please let me know.

--

From: Sergei Trofimovich 

commit 491af60ffb848b59e82f7c9145833222e0bf27a5 upstream.

Fix page fault handling code to fixup r16-r18 registers.
Before the patch code had off-by-two registers bug.
This bug caused overwriting of ps,pc,gp registers instead
of fixing intended r16,r17,r18 (see `struct pt_regs`).

More details:

Initially Dmitry noticed a kernel bug as a failure
on strace test suite. Test passes unmapped userspace
pointer to io_submit:

```c
#include 
#include 
#include 
#include 
int main(void)
{
unsigned long ctx = 0;
if (syscall(__NR_io_setup, 1, ))
err(1, "io_setup");
const size_t page_size = sysconf(_SC_PAGESIZE);
const size_t size = page_size * 2;
void *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (MAP_FAILED == ptr)
err(1, "mmap(%zu)", size);
if (munmap(ptr, size))
err(1, "munmap");
syscall(__NR_io_submit, ctx, 1, ptr + page_size);
syscall(__NR_io_destroy, ctx);
return 0;
}
```

Running this test causes kernel to crash when handling page fault:

```
Unable to handle kernel paging request at virtual address 9468
CPU 3
aio(26027): Oops 0
pc = []  ra = []  ps = Not 
tainted
pc is at sys_io_submit+0x108/0x200
ra is at sys_io_submit+0x6c/0x200
v0 = fc00c58e6300  t0 = fff2  t1 = 0225e000
t2 = fc01f159fef8  t3 = fc0001009640  t4 = fce0f6e0
t5 = 020001002e9e  t6 = 4c41564e49452031  t7 = fc01f159c000
s0 = 0002  s1 = 0225e000  s2 = 
s3 =   s4 =   s5 = fff2
s6 = fc00c58e6300
a0 = fc00c58e6300  a1 =   a2 = 0225e000
a3 = 021ac260  a4 = 021ac1e8  a5 = 0001
t8 = 0008  t9 = 00011f8bce30  t10= 021ac440
t11=   pv = fc6fd320  at = 
gp =   sp = 265fd174
Disabling lock debugging due to kernel taint
Trace:
[] entSys+0xa4/0xc0
```

Here `gp` has invalid value. `gp is s overwritten by a fixup for the
following page fault handler in `io_submit` syscall handler:

```
__se_sys_io_submit
...
ldq a1,0(t1)
bne t0,4280 <__se_sys_io_submit+0x180>
```

After a page fault `t0` should contain -EFALUT and `a1` is 0.
Instead `gp` was overwritten in place of `a1`.

This happens due to a off-by-two bug in `dpf_reg()` for `r16-r18`
(aka `a0-a2`).

I think the bug went unnoticed for a long time as `gp` is one
of scratch registers. Any kernel function call would re-calculate `gp`.

Dmitry tracked down the bug origin back to 2.1.32 kernel version
where trap_a{0,1,2} fields were inserted into struct pt_regs.
And even before that `dpf_reg()` contained off-by-one error.

Cc: Richard Henderson 
Cc: Ivan Kokshaysky 
Cc: linux-alpha@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Reported-and-reviewed-by: "Dmitry V. Levin" 
Cc: sta...@vger.kernel.org # v2.1.32+
Bug: https://bugs.gentoo.org/672040
Signed-off-by: Sergei Trofimovich 
Signed-off-by: Matt Turner 
Signed-off-by: Greg Kroah-Hartman 

---
 arch/alpha/mm/fault.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/alpha/mm/fault.c
+++ b/arch/alpha/mm/fault.c
@@ -77,7 +77,7 @@ __load_new_mm_context(struct mm_struct *
 /* Macro for exception fixup code to access integer registers.  */
 #define dpf_reg(r) \
(((unsigned long *)regs)[(r) <= 8 ? (r) : (r) <= 15 ? (r)-16 :  \
-(r) <= 18 ? (r)+8 : (r)-10])
+(r) <= 18 ? (r)+10 : (r)-10])
 
 asmlinkage void
 do_page_fault(unsigned long address, unsigned long mmcsr,




<    3   4   5   6   7   8   9   10   11   12   >