RE: [PATCH v4 0/7] lib/lzo: performance improvements

2018-12-06 Thread Matt Sealey
Markus,

> Request 2 - add COPY16; *NOT* acked by me
> 
>   [PATCH 2/8] lib/lzo: clean-up by introducing COPY16
> 
> is still not correct because of possible overlapping copies. I'll
> address this on the weekend.

Can you give a concrete example of why

{
COPY8(op, ip);
COPY8(op+8,ip+8);
ip+=16;
op+=16;
}

or

{ 
COPY8(op, ip);
ip+=8;
op+=8;
COPY8(op, ip);
ip+=8;
op+=8;
}

vs.

#define COPY16(dst,src) COPY8(dst,src); COPY8(dst+8,src+8)

{
COPY16(op, ip);
ip+=16;
op+=16;
}

.. causes "overlapping copies"?

COPY8 was only ever used in pairs as above, and the second method
broke compiler optimizers since the interleaved pointer increments add
an artificial barrier between the two copies. The only real difference
was that decompress had the increments spread out while compress did
not. If we need to fix that, then that's a good reason, but your
reasoning continues to elude me.

I can refactor the patch to align the second method with the first
and make compress and decompress get the same codegen - which is
functionally identical to the COPY16 patch, but in your opinion that
would seem to be the whole problem..
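
(For reference, a minimal sketch of the hygienic form of the macro -
purely illustrative, not necessarily the exact patch text; COPY8 is
assumed to be the existing 8-byte unaligned copy helper in
lib/lzo/lzodefs.h:)

    /* Sketch only: the do/while wrapper and extra parentheses make the
     * pair behave as a single statement. */
    #define COPY16(dst, src)                     \
            do {                                 \
                    COPY8(dst, src);             \
                    COPY8((dst) + 8, (src) + 8); \
            } while (0)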

I'll see what you've got after the weekend ;D

Ta
Matt Sealey


RE: [PATCH 2/6] lib/lzo: enable 64-bit CTZ on Arm

2018-11-21 Thread Matt Sealey
Hi Christoph,

> >  #define LZO_USE_CTZ32  1
> >  #elif defined(__i386__) || defined(__powerpc__)
> >  #define LZO_USE_CTZ32  1
> > -#elif defined(__arm__) && (__LINUX_ARM_ARCH__ >= 5)
> > +#elif defined(__arm__)
> > +#if (__LINUX_ARM_ARCH__ >= 5)
> >  #define LZO_USE_CTZ32  1
> >  #endif
> > +#if (__LINUX_ARM_ARCH__ >= 6) && (CONFIG_THUMB2_KERNEL)
> > +#define LZO_USE_CTZ64  1
> 
> All of this really needs to be driven by Kconfig symbols that
> the architecture selects instead of magic arch ifdefs here.

TL;DR summary: Low hanging fruit.

TL explanation of the thought process:

Rather than using arch ifdefs, it seems we can do __builtin_ctzll()
and __builtin_clzll() on all architectures with no measurable impact
to codegen - libgcc implements them as __builtin_ctz/clz everywhere
I've looked, so gcc at least is going to use that code for the
builtins. On 32-bit Arm it introduces a tiny bit of extra register
pressure in the instruction sequence but it's not measurable.

That is to say that the difference between architectures or
architecture versions isn't the availability of a count-leading-zeros
optimization or the builtin or an efficient algorithm, but the code
preceding it to collect a 64-bit quantity to count.

That is a whole other cleanup.
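
A minimal userspace sketch of that point (illustrative only - the
names and the use of memcpy for the unaligned loads are my
assumptions, not the kernel code):

    #include <stdint.h>
    #include <string.h>

    /* How many of the first 8 bytes of a and b match: XOR two 64-bit
     * loads and count zero bits of the difference. Little-endian wants
     * a count-trailing-zeros, big-endian a count-leading-zeros - the
     * builtin is never the hard part, forming the 64-bit quantity is. */
    static unsigned int matching_bytes(const unsigned char *a,
                                       const unsigned char *b)
    {
            uint64_t va, vb, diff;

            memcpy(&va, a, sizeof(va));
            memcpy(&vb, b, sizeof(vb));
            diff = va ^ vb;
            if (!diff)
                    return 8;
    #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
            return __builtin_ctzll(diff) / 8;
    #else
            return __builtin_clzll(diff) / 8;
    #endif
    }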

Longer:

Just to further the point, though, if we rationalize the use of
builtins to use the 64-bit versions, then we may as well use the
kernel version -- __ffs64 is __builtin_clzll in asm-generic with
the right magic arch Kconfig, and if not, arches provide exactly
the same code as LZO's fallback when LZO_USE_CxZnn isn't defined.

Even the distinction between little and big endian is moot, since

__builtin_clzll(cpu_to_be32(n)) 

amounts to precisely the same disassembly on both arm and arm64
as __builtin_ctzll(n), so we can just go the whole hog, replace
the 32-bit loads and xor with 64-bit ones, and do:

__ffs64(cpu_to_be32(n))

which greatly reduces code complexity in the main LZO code by
removing the distinction.

It turns out the compiler knows what it's doing, and where it
doesn't, the arch maintainers do.

Therein lies the rub; the source gets smaller and we lose the
arch/endian distinction but the efficiency of the preceding 64-bit
load and exclusive-or needs investigation. We still have to
figure out per architecture what the most efficient element size
is or if there's a hideous xor((u64) n) implementation.

I simply don't have confidence in anything except that the magic
arch ifdefs work to improve performance without changing too much,
and the mess it introduces here belies the performance improvement
it provides. Versus the true scope of the code cleanup to do it
right, it's a very concise and reliable and within-style improvement.

Would we agree to let it in for now on the basis that we thought
about it but I side with MFXJO's urgency?

Ta!
Matt Sealey 
Arm Partner Enablement Group



RE: [PATCH 0/6] lib/lzo: performance improvements

2018-11-21 Thread Matt Sealey
Hi Markus.

> Hi Dave,
> 
> thanks for your patch set. Just some initial comments:
> 
> 
> I think the three patches
> 
>   [PATCH 2/6] lib/lzo: enable 64-bit CTZ on Arm
>   [PATCH 3/6] lib/lzo: 64-bit CTZ on Arm aarch64
>   [PATCH 4/6] lib/lzo: fast 8-byte copy on arm64
> 
> should be applied in any case - could you please make an extra
> pull request out of these and try to get them merged as fast
> as possible. Thanks.
> 
> 
> The first patch
> 
>   [PATCH 1/6] lib/lzo: clean-up by introducing COPY16
> 
> does not really affect the resulting code at the moment, but please
> note that in one case the actual copy unit is not allowed to
> be greater 8 bytes (which might be implied by the name "COPY16").
> So this needs more work like an extra COPY16_BY_8() macro.

I agree in principle but not in practice ;)

The original intent of those patches was to avail ourselves of
known architectural features, but also to give the compiler a helping
hand, since LZO isn't supposed to be that knowledgeable about the
underlying processor implementation. The end result, I think, is that
giving an architecture a discrete 16-byte movement is more beneficial
than dictating how things are copied underneath it.

What came out is that there is a difference in codegen in compress
vs. decompress literally because of a difference in coding style:

decompress:

COPY8(op, ip);
op += 8;
ip += 8;
COPY8(op, ip);
op += 8;
ip += 8;

compress:

COPY8(op, ip);
COPY8(op+8, ip+8);
ip += 16;
op += 16;

Compilers do very well with the latter, and terribly with the former.
Once I aligned them to the same sequence I noticed the only time
LZO ever does 8-byte copies is in 16-byte chunks, and this aligned
with my expectation that on arm64 LDP/STP with writeback gets it done
in two instructions. I want to go to there!

COPY16 being defined as COPY8,COPY8 is a low hanging fruit that
does improve codegen on arm64 slightly (because it cleans the
decompression code up) and performance even more slightly as a
result. 

Just for elucidation, it goes from (decompress)

add x0, x0, 16  // op, op,
ldr w2, [x1]//, MEM[base: ii_17, offset: 0B]
add x1, x1, 16  // ii, ii,
cmp x0, x3  // op, _490
str w2, [x0, -16]   // _89, MEM[base: op_8, offset: 0B]
ldr w2, [x1, -12]   //, MEM[base: ii_17, offset: 4B]
str w2, [x0, -12]   // _105, MEM[base: op_8, offset: 4B]
ldr w2, [x1, -8]//, MEM[base: ii_17, offset: 8B]
str w2, [x0, -8]// _145, MEM[base: op_8, offset: 8B]
ldr w2, [x1, -4]//, MEM[base: ii_17, offset: 12B]
str w2, [x0, -4]// _156, MEM[base: op_8, offset: 12B]

To this (same code as compress):

add x0, x0, 16  // op, op,
ldr x2, [x1]// _153, MEM[base: ii_17, offset: 0B]
add x1, x1, 16  // ii, ii,
cmp x0, x3  // op, _418
str x2, [x0, -16]   // _153, MEM[base: op_8, offset: 0B]
ldr x2, [x1, -8]// _181, MEM[base: ii_17, offset: 8B]
str x2, [x0, -8]// _181, MEM[base: op_8, offset: 8B]

But it's going to want to be this way:

ldp x2, x3, [x1], 16// _182, MEM[base: ii_17, offset: 0B]
stp x2, x3, [x0], 16// _182, MEM[base: op_8, offset: 0B]

Which is a COPY16_BY_8 but only because of the architecture rules
on element sizes.. COPY16_BY_8 doesn't fit the idiom we're
trying to create.

For arm64 the easiest way to get LDP/STP with writeback just
happens to be allowing compilers to lift memcpy(o,i,16) which
is always going to be alignment-safe, but this doesn't hold true
on arm32 or elsewhere and we've not done enough testing on other
arches for that concept to bear fruit, yet.
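
(A sketch of what I mean for arm64 - an assumption on my part, not the
posted patch: let the compiler see a fixed-size memcpy and lift it into
LDP/STP itself.)

    #include <string.h>

    /* Illustrative only: a 16-byte copy expressed as memcpy so the
     * compiler is free to emit LDP/STP (or whatever the target prefers)
     * rather than being told to do two 8-byte element copies. */
    #define COPY16(dst, src)    memcpy(dst, src, 16)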

I'm confused by the 'one case the actual copy unit is not allowed to
be greater 8 bytes' statement: this is not evidenced by the loop
structure. COPY8 is only ever implemented in pairs in the kernel -
can you point out where this might not be the case in the future
(and why they couldn't still use COPY8 in that case)?

Ta!
Matt Sealey 
Arm Partner Enablement Group



Re: framebuffer corruption due to overlapping stp instructions on arm64

2018-08-03 Thread Matt Sealey
On 3 August 2018 at 13:25, Mikulas Patocka  wrote:
>
>
> On Fri, 3 Aug 2018, Ard Biesheuvel wrote:
>
>> Are we still talking about overlapping unaligned accesses here? Or do
>> you see other failures as well?
>
> Yes - it is caused by overlapping unaligned accesses inside memcpy. When I
> put "dmb sy" between the overlapping accesses in
> glibc/sysdeps/aarch64/memcpy.S, this program doesn't detect any memory
> corruption.

It is a symptom of generating reorderable accesses inside memcpy. It's nothing
to do with alignment, per se (see below). A dmb sy just hides the symptoms.

What we're talking about here - yes, Ard, within certain amounts of
reason - is that you cannot use PCI BAR memory as 'Normal' - certainly
never cacheable memory, but Normal NC isn't good either. That is to say,
your CPU cannot post writes or reads towards PCI memory spaces unless it
is dealing with them as Device memory or a very strictly controlled use
of Normal Non-Cacheable.

I understand why the rest of the world likes to mark stuff as
'writecombine', but that's an x86-ism, not an Arm memory type.

There is potential for accesses to the same slave from different masters
(or just different AXI IDs - most cores rotate over 8 or 16 or so of them
for Normal memory) to be reordered. PCIe has no idea what the source was;
it will just accept transactions in the order it receives them, and it is
strictly defined to manage incoming AXI or ACE transactions (and
barriers..) in a way that does not violate the PCIe memory model - the
worst case is deadlocks, the best case is you see some very strange
behavior.

In any case the original ordering of two Normal-NC transactions may not
make it to the PCIe bridge in the first place, which is probably why a
DMB resolves it - it will force the core to issue them in order, and
unless there is some hyper-complex multi-pathing going on they will
likely stay ordered. If you MUST preserve the order between two Normal
memory accesses, a barrier is required. The same is true of any
re-orderable device access.

>> > I tried to run it on system RAM mapped with the NC attribute and I didn't
>> > get any corruption - that suggests the the bug may be in the PCIE
>> > subsystem.

Pure fluke.

I'll give a simple explanation. The Arm Architecture defines single-copy
and multi-copy atomic transactions. You can treat 'single-copy' to mean
that that transaction cannot be made partial, or reordered within itself,
i.e. it must modify memory (if it is a store) in a single swift effort
and any future reads from that memory must return the FULL result of
that write.

Multi-copy means it can be resized and reordered a bit. Will Deacon is
going to crucify me for simplifying it, but.. let's proceed with a poor
example:

STR X0,[X1] on a 32-bit bus cannot ever be single-copy atomic, because
you cannot write 64 bits of data on a 32-bit bus in a single, unbreakable
transaction. This is because from one bus cycle to the next, one half of
the transaction will be in a different place. Your interconnect will have
latched and buffered 32 bits and the CPU is holding the other.

STP X0, X1, [X2] on a 64-bit bus can be single-copy atomic with respect
to the element size. But it is on the whole multi-copy atomic - that is
to say that it can provide a single transaction with multiple elements
which are transmitted, and those elements could be messed with on the
way down the pipe.

On a 128-bit bus, you might expect it to be single-copy atomic because
the entire transaction can be fit into one single data beat, but *it is
most definitely not* according to the architecture. The data from X0 and
X1 may be required to be stored at *X2 and *(X2+8), but the architecture
doesn't care which one is written first. Neither does AMBA.

STP is only ever guaranteed to be single-copy atomic with regards to the
element size (which is the X register in question). If you swap the data
around and do STP X1, X0, [X2] you may see a different result depending
on how the processor decides to pull data from the register file and in
what order. Users of the old 32-bit ARM STM instruction will recall that
it writes the register list in incrementing order, lowest register number
to lowest address - so what is the solution for STP? Do you expect X0 to
be emitted on the bus first, or the data to be stored in *X2 first?

It's neither!

That means you can do an STP on one processor and an LDR of one of the 64-bit
words on another processor, and you may be able to see

a) None of the STP transaction
b) X2 is written with the value in X0, but X2+8 is not holding the value in X1
c) b, only reversed
d) What you expect

And this can change depending on the resizers and bridges and QoS and
paths between a master interface and a slave interface. Although a truly
single-copy atomic transaction going through a downsizer to smaller than
the transaction size is a broken system design, it may be allowable if
the downsizer hazards addresses to the granularity of the larger bus 

Re: framebuffer corruption due to overlapping stp instructions on arm64

2018-08-02 Thread Matt Sealey
The easiest explanation for this would be that the memory isn’t mapped
correctly. You can’t use PCIe memory spaces with anything other than
Device-nGnRE or stricter mappings. That’s just differences between the AMBA
and PCIe (posted/unposted) memory models.

Normal memory (cacheable or uncacheable, which Linux tends to call “memory”
and “writecombine” respectively) is not a good idea.

There are two options: make sure Links maps its framebuffer as Device
memory, or the driver does, or both - and make sure that only aligned
accesses happen (otherwise you'll just get a synchronous exception) and
that there isn't a Normal memory alias.
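
(A hedged sketch of the two mappings from the driver side - the function
names are mine, but on arm64 pci_iomap()/ioremap() give Device-nGnRE
while ioremap_wc() gives Normal Non-Cacheable, which is what Linux calls
"writecombine":)

    #include <linux/pci.h>
    #include <linux/io.h>

    /* Device-nGnRE: no merging or reordering - the safe choice for a
     * PCIe BAR */
    static void __iomem *map_bar_as_device(struct pci_dev *pdev, int bar)
    {
            return pci_iomap(pdev, bar, 0);
    }

    /* Normal NC ("writecombine"): reorderable and mergeable - the
     * problematic case being discussed here */
    static void __iomem *map_bar_as_wc(struct pci_dev *pdev, int bar)
    {
            return ioremap_wc(pci_resource_start(pdev, bar),
                              pci_resource_len(pdev, bar));
    }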

Alternatively, tell the PCIe driver that the framebuffer is in system
memory - you can map it however you like, with a performance hit if you
start to use GPU acceleration but a significant performance boost from
the PoV of the CPU. Only memory accessed from the PCIe master interface
(i.e. reads and writes generated by the card itself - telling the GPU to
pull from system memory or other DMA) can be in Normal memory, and this
allows PCIe to be cache coherent with the right interconnect. The slave
port on a PCIe root complex (i.e. CPU writes) can't be used with Normal,
or reorderable, memory, and therefore your 2GB of graphics memory is
going to be slow from the point of view of the CPU.

To find the correct mapping you’ll need to know just how cache coherent the
PCIe RC is...

Ta,
Matt

On Thu, Aug 2, 2018 at 14:31 Mikulas Patocka  wrote:

> Hi
>
> I tried to use a PCIe graphics card on the MacchiatoBIN board and I hit a
> strange problem.
>
> When I use the links browser in graphics mode on the framebuffer, I get
> occasional pixel corruption. Links does memcpy, memset and 4-byte writes
> on the framebuffer - nothing else.
>
> I found out that the pixel corruption is caused by overlapping unaligned
> stp instructions inside memcpy. In order to avoid branching, the arm64
> memcpy implementation may write the same destination twice with different
> alignment. If I put "dmb sy" between the overlapping stp instructions, the
> pixel corruption goes away.
>
> This seems like a hardware bug. Is it a known errata? Do you have any
> workarounds for it?
>
> I tried AMD card (HD 6350) and NVidia (NVS 285) and both exhibit the same
> corruption. OpenGL doesn't work (it results in artifacts on the AMD card
> and lock-up on the NVidia card), but it's quite expected if even simple
> writing to the framebuffer doesn't work.
>
> Mikulas
>


RE: [RFC PATCH 6/8] dts: coresight: Clean up the device tree graph bindings

2018-06-14 Thread Matt Sealey
Hi Rob,

To take this in a somewhat different direction...

> > No, the above comment is about using "unit" ( if it is a standard
> > property for specifying something specific to hardware) instead of
> > "coresight,hwid". I would prefer to stick to the DT graph bindings,
> > because :
>
> "unit" is not a standard property and I don't like it either.

Fair enough.

> If you have "in-ports" and "out-ports" nodes, then that gives you
> direction and you can use reg for the "hardware port".
>
> in-ports {
>   port@0 {
> reg = <0>;
> endpoint {...};
>   };
>   port@1 {
> reg = <1>;
> endpoint {...};
>   };
> };
> out-ports {
>   port@0 {
> reg = <0>;
> endpoint {...};
>   };
> };
>
> I'll need to check, but dtc may need an update to not warn about this.

If the requirement that the unit-address match the low #address-cells of
reg is now being enforced by a validation tool and is screwing us over
here, we can't really change the tool; that's an IEEE 1275-1994
requirement we'd be hacking around.

Here's a thought - why don't we define the ports in each component in a
separate graph? Describing the ATB topology as a bus with a graph structure
and then having the components simply reference the port nodes would make
it much easier to design.

Part of the problem is trying to describe an ATB bus via a component on
an APB bus. You lose all the niceties of describing a bus with bus
subcomponents. Let's take the ATB bus outside.. by describing the ingress
and egress of the component you have no idea where in the graph you are
at the point of entry (you could start parsing at a funnel, which means
travelling around looking for other units that match other compatibles).

If the CoreSight subsystem could have a graph pre-made of the full
topology, then the components can point to how they wrap parts of
that topology. You can also do away with ridiculous non-programmable
funnels and replicator nodes, since the trace graph would encapsulate
that information without having to instantiate a stub driver for it.

That doesn't solve 'unit' or 'unit-address' or 'reg' or anything but
it would keep us in graph bindings and abstract the ATB topology from
the components. Other fun stuff; describing bridge components and bus
widths within the graph could give abilities to describe bandwidths
over each trace path.

In any case, what happens when you have to decide what your funnel port
priorities are? You'll have to define a new property for that, and it
won't be 'reg'.

Trying not to add anything to the graph bindings is one thing, but trying
to shoehorn it in - either by doing that or by inventing wholly
device-specific properties - is just as bad.

Perhaps we can use the path@unit-address:arguments <-- parameters here to
add extra information about the port. 'my-args' would pull it out in Forth,
I don't know how FDT code deals with it (but there should be something,
since we use it for console definition in /chosen). They're entirely
specific to the node in question, but not used in path matching.

Why can't we decide on a graph-binding update that allows specifying
something simple as a demultiplexer output in a way the driver can
understand and map to the hardware?

> If DT bindings can be reused for ACPI, that's fine

However in my opinion not the preferred way to do it..

> any DT bindings to be accepted simply because they match ACPI
> bindings.

I'm sure ACPI is capable of a much classier way of describing things,
but I have some fear that it's just OF bindings stuffed in a _DSD.. DT
and ACPI have the responsibility of shouldering the burden of describing
hardware. Any attempt to make some commonality just to promote brevity
of Linux code is totally out of scope..

You can always abstract whatever definition into whatever the Linux
drivers need at the time. That's kind of the point. Locking in to 'how
it's currently coded' is stifling when you realize the first attempt
didn't actually do what it was intended to do.

Ta,
Matt


RE: [RFC PATCH 6/8] dts: coresight: Clean up the device tree graph bindings

2018-06-13 Thread Matt Sealey


> -Original Message-
> From: Mathieu Poirier 
>
> > So, if the suggestion is to use an existing property "unit", I am fine
> > with it, if people agree to it.
>
> If we're going to have something sharply different than ACPI I prefer
> Rob's idea.

What are you trying to say about being sharply different than ACPI?

Ta,
Matt




RE: [RFC PATCH 6/8] dts: coresight: Clean up the device tree graph bindings

2018-06-13 Thread Matt Sealey
Hi Suzuki,

> > Why not use “unit”?
> >
> > I believe we had this discussion years ago about numbering serial ports
> > and sdhci (i.e. how do you know it’s UART0 or UART1 from just the address?
> > Some SoC’s don’t address sequentially *or* in a forward direction) - I
> > believe it’s not exactly codified in ePAPR, not am I sure where it may be
> > otherwise, but it exists.
>
> We have different situation here. We need to know *the port number* as
> understood by the hardware, so that we can enable *the specific* port for
> a given path.

For the purposes of abstraction, each port will have the property of having
a node which is pointed to by other nodes, and in the case of a true ATB
endpoint, no other nodes behind it.

It doesn't matter what the HW numbers it as, as long as the driver can
derive it from whatever you put in the DT. So for a funnel (which is ~8
ports muxed into one output):

   f1p0: port {
  unit = <0>;
  endpoint = <>;
   };
   f1p1: port {
  unit = <4>;
  endpoint = <>;
   };
   f1out: port {
  endpoint = <>;
   };

"unit" here is specific to the driver's understanding of ports within its
own cycle of the graph. For a replicator you can invert the logic - input
ports don't need a unit, but the two outputs are filtered in CoreSight not
by leg but by transiting ATB ID, in groups of 16 IDs. In that case maybe
you would want to describe all 8 possible units on each leg with the first
ID it would filter? Or just list tuples of filter IDs 

Who cares, really, as long as the driver knows what it means.

You don't need to namespace every property.

> As I mentioned above, we need the hardware numbers to enable the
> "specific" port.

Okay, and how is this not able to be prescribed in a binding for
"arm,coresight-funnel" that says:

"input ports are numbered from 0 to N where N is the maximum input port
number. This number is identified with the "unit" property, which directly
corresponds to the bit position in the funnel Ctrl_Reg register, and the
bit position multiplied by 3 for each 3-bit priority in the funnel
Priority_Ctrl_Reg, with N having a maximum of the defined register bitfield
DEVID[PORTCOUNT], minus one, for that component"

Or a replicator:

"output ports are numbered per the CoreSight ATB Replicator specification,
unit corresponding to the IDFILTERn register controlling ID filters for
that leg, with a maximum of the defined register bitfield DEVID[PORTNUM],
minus one"

One could clarify it, even, with labels for readability ("label" definitely
is a well defined if also completely arbitrary property).
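
(A hedged sketch of how little code reading such a property needs - the
function name is mine, of_property_read_u32() is the standard helper:)

    #include <linux/of.h>

    /* Read the proposed "unit" property from a port/endpoint node; an
     * absent property simply leaves 'unit' at 0 here. */
    static u32 coresight_get_hw_port(const struct device_node *ep)
    {
            u32 unit = 0;

            of_property_read_u32(ep, "unit", &unit);
            return unit;
    }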

..

> static void funnel_enable_hw(struct funnel_drvdata *drvdata, int port)
> {
>  u32 functl;
>
>  CS_UNLOCK(drvdata->base);
>
>  functl = readl_relaxed(drvdata->base + FUNNEL_FUNCTL);
>  functl &= ~FUNNEL_HOLDTIME_MASK;
>  functl |= FUNNEL_HOLDTIME;
>  functl |= (1 << port);
>  writel_relaxed(functl, drvdata->base + FUNNEL_FUNCTL);
>  writel_relaxed(drvdata->priority, drvdata->base + FUNNEL_PRICTL);
>
>  CS_LOCK(drvdata->base);
> }
>
> No we don't need to parse it in both ways, up and down. Btw, the trace
> paths are not statically created. They are done at runtime, as configured
> by the user.

You do realize this isn't how the hardware works, correct?

Trace paths are fixed; they may diverge with different configurations, but
the full CoreSight topology (all funnels, replicators and intermediary
components) is entirely unchangeable.

The DT should provide the information to build a reference directed
acyclic graph of the entire topology (or of the entire reasonably
programmable topology where at all possible) - if a user wants to trace
from ETM_0 then they only have particular paths to particular sinks; for
instance ETM_0 and ETF_0 may be on their own path, so you cannot just
"configure as a user" a path from ETM_1 to ETF_0 since there isn't one.

Walking said graphs with the knowledge that CoreSight specifically
disallows loopbacks in ATB topology is a basic computer science problem -
literally a matter of topological sorting. But let's build the graph once
and traverse it - don't build it partially each time or try to rebuild it
to cross-check every time. The paths are wires in the design; let's not
pretend to the user that there is any configurability in that, or try to
encode that in the DT.
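
(To illustrate "literally a matter of topological sorting" - a
self-contained sketch over a made-up topology; nothing CoreSight-specific
about the code itself:)

    #include <stdio.h>

    /* Kahn's algorithm over a tiny fixed ATB-like graph:
     * sources -> funnel -> ETF -> replicator -> sinks.
     * Component names and the adjacency matrix are invented. */
    enum { ETM0, ETM1, FUNNEL, ETF, REPLICATOR, ETR, TPIU, NCOMP };

    static const char *name[NCOMP] = {
            "ETM0", "ETM1", "FUNNEL", "ETF", "REPLICATOR", "ETR", "TPIU"
    };

    /* edge[a][b] != 0 means trace flows from a to b */
    static const int edge[NCOMP][NCOMP] = {
            [ETM0]       = { [FUNNEL] = 1 },
            [ETM1]       = { [FUNNEL] = 1 },
            [FUNNEL]     = { [ETF] = 1 },
            [ETF]        = { [REPLICATOR] = 1 },
            [REPLICATOR] = { [ETR] = 1, [TPIU] = 1 },
    };

    int main(void)
    {
            int indegree[NCOMP] = { 0 };
            int queue[NCOMP], head = 0, tail = 0, done = 0;

            for (int a = 0; a < NCOMP; a++)
                    for (int b = 0; b < NCOMP; b++)
                            indegree[b] += edge[a][b];

            /* start from components nothing feeds into: the sources */
            for (int a = 0; a < NCOMP; a++)
                    if (!indegree[a])
                            queue[tail++] = a;

            while (head < tail) {
                    int a = queue[head++];

                    printf("%s\n", name[a]);
                    done++;
                    for (int b = 0; b < NCOMP; b++)
                            if (edge[a][b] && --indegree[b] == 0)
                                    queue[tail++] = b;
            }

            if (done != NCOMP)
                    fprintf(stderr, "loop in the graph - not valid ATB\n");
            return 0;
    }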

> Coming back to your suggestion of "unit", what does it imply ?

Whatever the driver likes. For uart and mmc, it was just a spurious number
but it could be applied as the end of, say, ttyS or mmcblkp3 or used
in any other driver-specific manner. The number you put in is up to you,
but the valid numbers would be in the binding for that particular device.

> Its too generic a term for something as concrete as a port number.

Is it?

Why would you need a whole other property type to encode a u32 that
describes an arbitrary number specific to that hardware device?

Ta,
Matt



Re: [RFC PATCH 6/8] dts: coresight: Clean up the device tree graph bindings

2018-06-13 Thread Matt Sealey
Suzuki,

Why not use “unit”?

I believe we had this discussion years ago about numbering serial ports and 
sdhci (i.e. how do you know it's UART0 or UART1 from just the address? Some 
SoCs don't address sequentially *or* in a forward direction) - I believe it's 
not exactly codified in ePAPR, nor am I sure where else it may be, but it 
exists.

I agree with Rob on the slave-mode nonsense, this is an SPI controller concept 
weirdly stuffed into a directed graph which implicitly tells you the data 
direction - it’s a rooted tree (just like DT!).

For the case of a funnel, each device supplying trace should feed into an 
input node - numbered with a unit - and all those nodes should point to the 
output node as endpoints. Describing the hardware as a black box is probably 
less of a good idea than showing that it's a funnel or replicator by showing 
the internal paths. You wouldn't need to "number" ports with a unit except 
where the HW needs to differentiate between them, and you don't need reg or a 
node address to do it.

If you really need to parse full graphs in both directions (find root, find 
leaf) then could we simply introduce properties which list the phandles of all 
uplink sources, as linked lists point to the list head?

This gives a way to validate that the graph starts and ends the way we expect, 
and also allows every port to be associated with being a required path between 
any two devices without parsing the *whole* graph (although you still need to 
do that to find the route to sinks).
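
(A sketch of the parsing side of that idea - "uplink-sources" is a
hypothetical property name, not an existing binding, and the helper is
illustrative only:)

    #include <linux/of.h>

    /* Follow a hypothetical "uplink-sources" phandle list from a port
     * node towards the head of the graph; consumes the caller's
     * reference on the node it was given. */
    static struct device_node *walk_to_head(struct device_node *np)
    {
            struct device_node *up;

            while ((up = of_parse_phandle(np, "uplink-sources", 0))) {
                    of_node_put(np);
                    np = up;
            }
            return np;      /* no more uplinks: this is the list head */
    }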

Ta,
Matt

Sent from my iPhone

> On Jun 13, 2018, at 04:45, Suzuki K Poulose  wrote:
>
> Hi Rob,
>
>> On 12/06/18 21:48, Rob Herring wrote:
>>> On Fri, Jun 01, 2018 at 02:16:05PM +0100, Suzuki K Poulose wrote:
>>> The coresight drivers relied on default bindings for graph
>>> in DT, while reusing the "reg" field of the "ports" to indicate
>>> the actual hardware port number for the connections. However,
>>> with the rules getting stricter w.r.t to the address mismatch
>>> with the label, it is no longer possible to use the port address
>>> field for the hardware port number. Hence, we add an explicit
>>> property to denote the hardware port number, "coresight,hwid"
>>> which must be specified for each "endpoint".
>>>
>>> Cc: Mathieu Poirier 
>>> Cc: Sudeep Holla 
>>> Cc: Rob Herring 
>>> Signed-off-by: Suzuki K Poulose 
>>> ---
>>>  .../devicetree/bindings/arm/coresight.txt  | 26 +---
>>>  drivers/hwtracing/coresight/of_coresight.c | 46 
>>> --
>>>  2 files changed, 54 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/Documentation/devicetree/bindings/arm/coresight.txt 
>>> b/Documentation/devicetree/bindings/arm/coresight.txt
>>> index bd36e40..385581a 100644
>>> --- a/Documentation/devicetree/bindings/arm/coresight.txt
>>> +++ b/Documentation/devicetree/bindings/arm/coresight.txt
>>> @@ -104,7 +104,11 @@ properties to uniquely identify the connection details.
>>>  "slave-mode"
>>> * Hardware Port number at the component:
>>> - -  The hardware port number is assumed to be the address of the 
>>> "port" component.
>>> +   - (Obsolete) The hardware port number is assumed to be the address of 
>>> the "port" component.
>>> +   - Each "endpoint" must define the hardware port of the local end of the
>>> + connection using the following property:
>>> +"coresight,hwid" - 32bit integer, hardware port number at the local 
>>> end.
>> "coresight" is not a vendor and properties are in the form
>> [,].
>
> OK. The issue here is that a coresight component could be an Arm IP or
> a custom partner IP. So, the vendor could be either arm or the partner id.
> However, this property is kind of a generic one for the Coresight family,
> which is why we opted for "coresight". What is the guideline for such
> cases ?
>
> Or in other words I see the following possible options :
>
> 1) coresight,hwid- coresight generic
> 2) arm,coresight-hwid- arm vendor, however the device could be from any 
> vendor.
> 3) hwid- Generic
> 4) none of the above, something completely different.
>
> What do you recommend from the above ?
>
>>> +
>>>  Example:
>>> @@ -120,6 +124,7 @@ Example:
>>>  etb_in_port: endpoint@0 {
>> There shouldn't be a unit address here because there is no reg property.
>>>  slave-mode;
>>>  remote-endpoint = <_out_port0>;
>>> +coresight,hwid = <0>;
>> It doesn't make sense for these to be in the endpoint. If you had
>> multiple endpoints, then you would have to duplicate it. "ports" are
>> a single data stream. "endpoints" are connections to that stream. So if
>> you have a muxed (input) or fanout/1-to-many (output) connection, then
>> you have multiple endpoints.
>
> We do have many-to-1 input (e.g, funnels) and 1-to-many outputs
> (e.g replicators). However, we have (so far) used only one endpoint per
> port.
>
> Also we could potentially have multiple data streams 

RE: [PATCH] drivers/perf: arm-ccn: stop spamming dmesg in event_init

2018-05-17 Thread Matt Sealey
Hi all,

> > I don't have any dog in this, but maybe if providing information to the
> > users is so essential to having a pleasant user experience, then
> > rethinking the whole way these messages are funneled is necessary
> > because the kernel log + dmesg is by no means appropriate. Take a look
> > at what the networking maintainers recently did with netlink extended
> > ack. You used to just get an "unsupported" error, and now you know
> > exactly what is wrong when extack is available. It seems to me like
> > something like this is what you might want here since you want to have
> > perf be as user friendly as possible.
>
> Thanks, Florian.

Florian, I'd love to know if you mean "implement netlink extended ack in
perf" or "do something idiomatic for perf"?

Let us assume we are semantically challenged over here. I'm going to
proceed from the latter.

> Acme & other perf people, do you foresee a problem adding netlink
> extended ack support to the perf subsystem for extended error message
> communication?
>
> If not, is struct perf_event_attr amenable to having error reporting
> bit(s) added?

I did have a think about this when Kim mentioned it in passing, and my
reasoning was that a serialized error record similar to ACPI APEI BERT/ERST
tables (without the NVRAM abstraction) would fit the need.

As soon as I wrote down my thoughts I realized it scratched more itches than
just something useful for perf.

It would conceptually be a buffer passed from userspace with the syscall,
which would be populated in the kernel with an identifying header, each
record denoting its own length. The syscall fills the buffer with records
of particular format for the syscall, and the errno returned to the
application is then the state of the error recording buffer - for instance,
if the kernel ran out of space in the buffer before reporting all the errors
it could (-ENOBUFS or -EAGAIN), if it stopped recording errors on something
that requires being contained and returned at that point (-EFAULT), and so
on (anyone who's got RAS on the brain will see where I'm coming from).

I have a distinct dislike of filling the kernel with const strings, so the
records would be strictly machine-readable and contain information about
the error for the record and the source. If userspace needs to print a
string then it can look up unique identifiers (UUIDv1, give or take, to
remove the need for any authority on numbering) in a database - be that
plaintext, gettext.po, bdb, sqlite, xml, json, C structure embedded in the
tool - one for each error source. That keeps strings, translations, string
formatting entirely outside the kernel, and keeps records from being freeform
typo-laden strings.
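To make that concrete, here's a rough sketch of what the buffer header and a
record might look like - every name, field and size below is made up purely
for illustration, it is not a proposed ABI:

#include <linux/types.h>

/* Illustrative layout only - not a proposed ABI. Records carry no
 * strings; userspace resolves the identifiers against its own database. */
struct syscall_err_buf_hdr {
        __u32   signature;      /* buffer format/version marker */
        __u32   buf_len;        /* total size supplied by userspace */
        __u32   bytes_used;     /* filled in by the kernel */
        __u32   flags;          /* e.g. "records were dropped" */
};

struct syscall_err_rec {
        __u8    source_id[16];  /* which subsystem produced the record */
        __u8    error_id[16];   /* which error, within that source */
        __u32   rec_len;        /* length of this record incl. payload */
        __u32   severity;
        __u64   payload[];      /* source-specific machine-readable data */
};

Userspace walks the buffer record by record using rec_len and looks the two
identifiers up in whatever database it likes to produce a formatted,
translated message.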

That'd give some generic Producer code in the kernel, and imply a companion
Consumer library in userspace (with said database backend), which could also
be responsible for logging the binary records somewhere for future reference
(perhaps bounded by capabilities or container privileges).

Pretty much every syscall that has problems returning 'just an errno'
could benefit from such a system, the only impediment I can see to it is
that it's adding a new subsystem to the kernel to produce these records,
and any syscall that needs it would have to gain either an extra parameter,
or attribute setting addition (like setsockopts) or to shoehorn the buffer
pointer into an existing parameter structure like perf_event_attr. That
would have to be locked down to a consistent method that would be
recommended (like adding a new syscall interface, not everyone has an
attribute interface or parameter structure) although there's no stopping
kernel devs adding in a way for legacy applications to easily receive
the same information through existing ABI.

In the case of perf_event_attr there is space enough to mark a bit to say
that errors could be reported in a buffer, but not enough in the reserved
space that exists to store a pointer for 64-bit systems (or 128-bit ones..)
without increasing its size. But, besides that, it would have the benefit
of simply being serialized with the syscall, and not a supplemental,
potentially non-thread-safe errno/strerror-like kernel-side implementation,
nor extra syscalls to retrieve information or arbitrary formatted or
unformatted strings.

Netlink extended ack could benefit from it simply by having been passed
an nlattr pointing to the buffer and recording extended error information
in it - the extended ack structure can report the status of the buffer
and use the cookie field to reproduce some information if necessary (which
extends the RAS error record concept further when needed).

Thoughts? Does anyone have any objections to a RAS-like error reporting
system for system calls, or any ideas on things that would benefit from
it above and beyond perf? We could always audit all the system calls and
their behaviors so we could do some worked examples but anyone who's got
a good candidate outside perf is welcome to suggest it.

Ta,

Re: [PATCH 1/2] dt-bindings: Documentation for qcom,llcc

2018-02-08 Thread Matt Sealey
Hiya,

On 25 January 2018 at 17:55, Channagoud Kadabi  wrote:
> Documentation for last level cache controller device tree bindings,
> client bindings usage examples.

[snippety snip]

> +- llcc-bank-off:
> +   Usage: required
> +   Value Type: 
> +   Definition: Offsets of llcc banks from llcc base address starting from
> +   LLCC bank0.
> +
> +- llcc-broadcast-off:
> +   Usage: required
> +   Value Type: 
> +   Definition: Offset of broadcast register from LLCC bank0 address.

What's with the redundant namespacing?

Have we not, as a community, realised that we do not need to namespace
properties which are only present under a single binding or node, or
even those that aren't? This mess started with the regulator bindings
and it's never stopped. What are we at now, 25 years of device trees,
10 years of FDT on Arm?

Notwithstanding the complete waste of rodata in the kernel image for
matching (and the increased time to compare), why would a "bank-offset"
for your node be any different from a common property for any other
node?

And if you need to describe register offsets... why aren't you able to
use the reg property?

Ta,
Matt


Re: [RFC PATCH 0/5] Add smp booting support for Qualcomm ARMv8 SoCs

2015-04-16 Thread Matt Sealey
Hi Rob,

On Thu, Apr 16, 2015 at 12:17 PM, Rob Clark  wrote:
> On Thu, Apr 16, 2015 at 11:21 AM, Catalin Marinas
>  wrote:
>
>> But I'm definitely going to discourage companies like Qualcomm
>> deliberately ignoring the existing booting protocols while trying to get
>> their code upstream. This patch series is posted by Qualcomm without
>> providing any technical reason on why they don't want to/couldn't use
>> PSCI (well, I guess there is no technical reason but they may not care
>> much about mainline either).
>
> Sure.. just trying to make sure the wrong people don't end up being
> the ones that suffer.  I would assume/expect that it is at least
> possible for qcom to change firmware/bootloader for their dev boards
> and future devices and whatnot, whether they grumble about it or not.
> But I guess most of what the general public has are devices w/ signed
> fw, which is why "go fix your firmware" is an option that sets off
> alarm bells for me.
>
> I guess the first device where this will matter to me and a large
> group of community folks would be the dragonboard 410c..  *hopefully*
> it does not require signed firmware or at least qcom could make
> available signed firmware which supports psci..

For development boards, one would hope there is a way to sign your own firmware.
You can't expect - even for a phone SoC - that the development boards require
entering some kind of Faustian contract for development of low-level
software. What if
someone wants to develop a platform that doesn't require signing?

That said most of these dev boards have completely mangled JTAG anyway, and
I know Inforce (and Hardkernel, and so on) love their barely-ever-updated custom
firmware binaries, so..

The best thing would be to pick up one of those boards and port a PSCI firmware
to it (ATF or your own..) and just embarrass the SoC vendor by having better
mainline power management support (implemented by 10 lines in a device tree)
with the complicated code hidden away behind the scenes there, like it
should have
been done in the first place..

Ta.
Matt Sealey 


Re: [PATCH v3 1/3] Documentation: arm: add UEFI support documentation

2013-12-06 Thread Matt Sealey
On Wed, Dec 4, 2013 at 4:44 PM, Matthew Garrett  wrote:
> On Wed, Dec 04, 2013 at 03:06:47PM -0600, Matt Sealey wrote:
>
>> there's no guarantee that the kernel hasn't been decompressed over
>> some important UEFI feature or some memory hasn't been trashed. You
>> can't make that guarantee because by entering the plain zImage, you
>> forfeited that information.
>
> The stub is responsible for ensuring that the compressed kernel is
> loaded at a suitable address. Take a look at efi_relocate_kernel().

My objection is that the suitable address is based on a restriction that
booting from UEFI doesn't have, and ignores information UEFI provides that
makes the kernel features in head.S (both of them) easier to get around.
The kernel doesn't need to be within a particular range of the start
of memory, nor do the device tree or ramdisk need to be in a
particular place. What the code before efi_relocate_kernel() does is
allocate a maximum-sized buffer to safely decompress into, which is just
a gross way to do it, then crosses its fingers based on the way it
has historically worked - while you might want to assume that the
decompression process is quite well defined and reliable, I keep
seeing patches come in that stop it from doing weird, unsavory behavior
- for example decompressing over its own page table.

The decompressor - and the kernel head it jumps to after decompression
- *guess* all the information UEFI could have provided and completely
regenerate the environment for the decompressor itself (stacks, hacky
memory allocations, cache on, off, on, off, on... fudging locations of
page tables, zreladdr fixup, low level debug message output, in
context of UEFI - reimplementation of memcpy, memset). It forfeits a
more controlled and lean boot process to capitulate to a historical
legacy. Since you're taking over the decompressor head.S anyway, why
not take control of the decompression process?

It sets up a page table location the hard way (as above.. also patched
recently not to decompress over its own page table). It doesn't need
to relocate itself past the end of the decompressed image. It doesn't
need to set up the C environment - UEFI did that for it. It makes
assumptions about the stack and hacks memory allocations for the
decompression.. it turns the cache on, decompresses, then turns it off
again... you can just walk through the code under the EFI stub in
compressed/head.S and see all this can just fall away.

There's one immediate advantage too, if it's actually implemented and
working, which is that for kernel images that are compressed using the
standard UEFI compression method no actual decompression code needs to
be added to the stub, and the functionality gets the exact length of
the required decompression buffer.. that doesn't reduce flexibility in
kernel compression as long as there is still the possibility of adding
additional compression code to the stub.

The second immediate advantage is that the EFI stub/decompressor can
actually verify that the *decompressed* image meets Secure Boot
requirements.
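Just to illustrate the buffer-sizing point - this isn't from any posted patch,
and the type and GUID names are EDK2-style assumptions - the stub could size
and place the destination buffer roughly like this:

/* Sketch only: size the decompressed image with the UEFI Decompress
 * protocol's GetInfo() and let firmware place the buffer, rather than
 * guessing a maximum-sized buffer. EDK2-style names assumed. */
EFI_STATUS size_and_place_kernel(EFI_BOOT_SERVICES *bs,
                                 VOID *source, UINT32 source_size,
                                 EFI_PHYSICAL_ADDRESS *dest, UINT32 *dest_size)
{
        EFI_DECOMPRESS_PROTOCOL *dp;
        UINT32 scratch_size;
        EFI_STATUS status;

        status = bs->LocateProtocol(&gEfiDecompressProtocolGuid, NULL, (VOID **)&dp);
        if (EFI_ERROR(status))
                return status;

        /* Reports the exact decompressed size for this particular image. */
        status = dp->GetInfo(dp, source, source_size, dest_size, &scratch_size);
        if (EFI_ERROR(status))
                return status;

        /* Firmware picks a safe location; nothing gets trampled. */
        return bs->AllocatePages(AllocateAnyPages, EfiLoaderData,
                                 EFI_SIZE_TO_PAGES(*dest_size), dest);
}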

Once you get past the decompressor and into the kernel proper head.S,
creating the page tables (again) and turning the MMU on, pv table
patching.. if you still had the information around, that gets simpler
too.

Grant suggested I should propose some patches; sure, if I'm not otherwise busy.

Maybe the Linaro guys can recommend a platform (real or emulated) that
would be best to test it on with the available UEFI?

>> Most of the guessing is ideally not required to be a guess at all, the
>> restrictions are purely to deal with the lack of trust for the
>> bootloader environment. Why can't we trust UEFI? Or at least hold it
>> to a higher standard. If someone ships a broken UEFI, they screw a
>> feature or have a horrible bug and ship it, laud the fact Linux
>> doesn't boot on it and the fact that it's their fault - over their
>> head. It actually works these days, Linux actually has "market share,"
>> companies really go out of their way to rescue their "image" and
>> resolve the situation when someone blogs about a serious UEFI bug on
>> their $1300 laptops, or even $300 tablets.
>
> Yeah, that hasn't actually worked out too well for us.

Aside from teething problems caused by a rush to market ;)

For the "ARM server market" rather than the "get the cheapest
tablet/ultrabook out of the door that runs Windows 8/RT" I am sure
this is going to get to be VERY important for vendors to take into
account. Imagine if Dell shipped a *server* where Linux would brick it
out of the box just for setting a variable.. however, if it works the
day they ship the server, and Linux gains better support for UEFI
booting which breaks the server in question, that's our fault for not
doing it in the right way in the first place, and Dell can be just as
angry at us as we would be at them. Vendors won't test code that
doesn't exist for obvious reasons.

This is what I was trying to get at about them not updating their
firmware support for the more firmware

Re: [PATCH v3 1/3] Documentation: arm: add UEFI support documentation

2013-12-04 Thread Matt Sealey
On Mon, Dec 2, 2013 at 3:07 PM, Leif Lindholm  wrote:
> On Mon, Dec 02, 2013 at 01:51:22PM -0600, Matt Sealey wrote:
>> Here's where I think this whole thing falls down as being the weirdest
>> possible implementation of this. It defies logic to put this
>> information in the device tree /chosen node while also attempting to
>> boot the kernel using an EFI stub; the stub is going to have this
>> information because it is going to have the pointer to the system
>> System Table (since it was called by StartImage()). Why not stash the
>> System Table pointer somewhere safe in the stub?
>
> We do. In the DT.

Hang on... see way below about "reinventing the wheel"

>> The information in the device tree is all accessible from Boot
>> Services and as long as the System Table isn't being thrown away (my
>> suggestion would be.. stuff it in r2, and set r1 = "EFI\0" then work
>> with arch/arm/kernel/head{-common,}.S code to do the right thing)
>
> You left out the bit of redefining the kernel boot protocol to permit
> calling it with caches, MMU and interrupts enabled - also known as
> before ExitBootServices().

And that's a horrible idea because of what?

What's evident here is there could be two major ways to generate an
image that boots from a UEFI implementation;

* one whereby UEFI is jostled or coerced by second stage bootloader to
load a plain zImage and you lose all information about UEFI except in
the event that that information is preserved in the device tree by the
firmware
* one whereby a 'stock' UEFI is used and it boots only on UEFI because
it is in a format very reasonably only capable of being booted by UEFI
and, subordinately,
 - one where that plain zImage got glued to an EFI stub just like the
decompressor is glued to the Image
 - one where the kernel needs to be built with support for UEFI and
that somewhat changes the boot path

By the time we get half-way through arm/kernel/head.S the cache and
MMU have been turned off and on and off again by the decompressor, and
after a large amount of guesswork and arbitrary restriction-based
implementation, there's no guarantee that the kernel hasn't been
decompressed over some important UEFI feature or some memory hasn't
been trashed. You can't make that guarantee because by entering the
plain zImage, you forfeited that information. This is at worst case
going to be lots of blank screens and blinking serial console prompts
and little more than frustration..

Most of the guessing is ideally not required to be a guess at all, the
restrictions are purely to deal with the lack of trust for the
bootloader environment. Why can't we trust UEFI? Or at least hold it
to a higher standard. If someone ships a broken UEFI, they screw a
feature or have a horrible bug and ship it, laud the fact Linux
doesn't boot on it and the fact that it's their fault - over their
head. It actually works these days, Linux actually has "market share,"
companies really go out of their way to rescue their "image" and
resolve the situation when someone blogs about a serious UEFI bug on
their $1300 laptops, or even $300 tablets.

> Which is what we are going to implement anyway in order to permit
> firmware to supply DT hardware description in the same way as with
> ACPI. Yes, we could pass the system table pointer directly - but that
> doesn't get us the memory map.

Boot Services gives you the ability to get the memory map.. and the
kinds of things that live in those spots in the memory map. It's at
least a better guess than "I am located at a specific place and can
infer from linker data and masking off the bottom bits that there's
probably this amount of RAM that starts at this location or
thereabouts". It at least gives the ability to 'allocate' memory to
put the page table instead of having a firmware call walk all over it,
or having the kernel walk over some parts of firmware, or even not
have to do anything except link in a decompressor (eh, sure, it means
duplicating decompressor code in some cases, but I also don't think
it's a sane requirement to include the entire decompression suite in
the kernel proper if it only gets used once at early boot).
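Purely as an illustration (simplified error handling, types straight from the
UEFI spec - this isn't from the patches under discussion), grabbing the map
from Boot Services is roughly:

/* Sketch only: fetch the UEFI memory map via Boot Services. */
EFI_STATUS fetch_memory_map(EFI_BOOT_SERVICES *bs, EFI_MEMORY_DESCRIPTOR **map,
                            UINTN *map_size, UINTN *map_key,
                            UINTN *desc_size, UINT32 *desc_ver)
{
        EFI_STATUS status;

        *map = NULL;
        *map_size = 0;

        /* First call just reports the required size in *map_size. */
        status = bs->GetMemoryMap(map_size, *map, map_key, desc_size, desc_ver);
        if (status != EFI_BUFFER_TOO_SMALL)
                return status;

        /* Allocating the buffer can itself grow the map, so pad it. */
        *map_size += 2 * *desc_size;
        status = bs->AllocatePool(EfiLoaderData, *map_size, (VOID **)map);
        if (EFI_ERROR(status))
                return status;

        return bs->GetMemoryMap(map_size, *map, map_key, desc_size, desc_ver);
}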

> I prefer to see it as a way to not reinvent things that do not need
> reinventing, while not adding more special-case code to the kernel.

Isn't putting the System Table pointer in the DT specifically
reinventing the UEFI boot process?

Booting from UEFI is a special case in itself.. the EFI stub here is
putting a round block in a square hole.

There are two much, much better solutions: put the round block in a
round hole. Put a square block in that square hole. We could do so
much better than gluing the round block into the square hole.

>> What that meant is nobody bothered to implement working, re-entrant,
>> re-locatable firmware to a great degree. This ended up being a
>> self-fulfilling prophecy of "don't trust the bootloader" and "get rid
>> of it as soon as we can," which essentially meant Linux never took
>> advantage of the resources available. In OF's case

Re: [PATCH v3 1/3] Documentation: arm: add UEFI support documentation

2013-12-02 Thread Matt Sealey
> +UEFI kernel support on ARM
> +==
> +UEFI kernel support on the ARM architectures (arm and arm64) is only 
> available
> +when boot is performed through the stub.
> +
> +The stub populates the FDT /chosen node with (and the kernel scans for) the
> +following parameters:
> +
> +Name  | Size   | Description
> +
> +linux,uefi-system-table   | 64-bit | Physical address of the UEFI System 
> Table.
> +
> +linux,uefi-mmap-start | 64-bit | Physical address of the UEFI memory map,
> +  || populated by the UEFI GetMemoryMap() 
> call.
> +
> +linux,uefi-mmap-size  | 32-bit | Size in bytes of the UEFI memory map
> +  || pointed to in previous entry.
> +
> +linux,uefi-mmap-desc-size | 32-bit | Size in bytes of each entry in the UEFI
> +  || memory map.
> +
> +linux,uefi-mmap-desc-ver  | 32-bit | Version of the mmap descriptor format.
> +
> +linux,uefi-stub-kern-ver  | string | Copy of linux_banner from build.
> +

This flies in the face of actually using #address-cells and
#size-cells to define how big addresses and sizes are. You're limited
here by the root node definitions... that's the spec.

Here's where I think this whole thing falls down as being the weirdest
possible implementation of this. It defies logic to put this
information in the device tree /chosen node while also attempting to
boot the kernel using an EFI stub; the stub is going to have this
information because it is going to have the pointer to the system
System Table (since it was called by StartImage()). Why not stash the
System Table pointer somewhere safe in the stub?

The information in the device tree is all accessible from Boot
Services and as long as the System Table isn't being thrown away (my
suggestion would be.. stuff it in r2, and set r1 = "EFI\0" then work
with arch/arm/kernel/head{-common,}.S code to do the right thing)

It seems like the advantages of booting from UEFI and having all this
information and API around are being thrown away very early, and
picked up when it's no longer relevant to gain access to the very
minimal runtime services. What's missing is a UUID for a "Device Tree
Blob" in the Configuration Table, so you can very easily go grab that
information from the firmware.
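If such a UUID existed the lookup would be trivial - note the GUID below is
just a placeholder since nothing is actually defined, and CompareMem() is the
EDK2-style helper:

/* Sketch only: find a device tree blob in the EFI Configuration Table.
 * The GUID here is a placeholder, not a real assignment. */
static EFI_GUID dtb_table_guid = {
        0x00000000, 0x0000, 0x0000,
        { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }
};

VOID *find_dtb(EFI_SYSTEM_TABLE *sys_table)
{
        UINTN i;

        for (i = 0; i < sys_table->NumberOfTableEntries; i++) {
                EFI_CONFIGURATION_TABLE *t = &sys_table->ConfigurationTable[i];

                if (CompareMem(&t->VendorGuid, &dtb_table_guid, sizeof(EFI_GUID)) == 0)
                        return t->VendorTable;  /* pointer to the blob */
        }
        return NULL;
}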

As implemented, these patches employ a very long-winded and complex
method of recovering UEFI after throwing the system table pointer away
early in boot, and then implement an EFI calling convention which
isn't strictly necessary according to the UEFI spec - the question is,
is this a workaround for SetVirtualAddressMap() not actually doing the
right thing on ARM UEFI implementations? If you can't guarantee that
most of the calls from Boot Services or Runtime Services are going to
allow this, then having any UEFI support in the kernel at all seems
rather weird.

What I'm worried about is that this is basically a hack to try and
shoehorn an existing UEFI implementation to an existing Linux boot
method - and implements it in a way nobody is ever going to be able to
justify improving. Part of the reason the OpenFirmware CIF got thrown
away early in SPARC/PowerPC boot (after "flattening" the device tree
using the CIF calls to parse it out) was because you had to disable
the MMU, caches, interrupts etc. which caused all kinds of slow
firmware code to be all kinds of extra-slow.

What that meant is nobody bothered to implement working, re-entrant,
re-locatable firmware to a great degree. This ended up being a
self-fulfilling prophecy of "don't trust the bootloader" and "get rid
of it as soon as we can," which essentially meant Linux never took
advantage of the resources available. In OF's case, the CIF sucked by
specification. In UEFI's case here, it's been implemented in Linux in
such a way that guarantees poor-performing firmware code with huge
penalties to call them, which isn't even required by UEFI if the
earlier boot code did the right things in the first place.

Ta,
Matt Sealey 


Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2013-11-08 Thread Matt Sealey
On Fri, Nov 8, 2013 at 5:00 PM, Stephen Boyd  wrote:
> If we're running on a v7 ARM CPU, detect if the CPU supports the
> sdiv/udiv instructions and replace the signed and unsigned
> division library functions with an sdiv/udiv instruction.
>
> Running the perf messaging benchmark in pipe mode
>
>  $ perf bench sched messaging -p
>
> shows a modest improvement on my v7 CPU.
>
> before:
> (5.060 + 5.960 + 5.971 + 5.643 + 6.029 + 5.665 + 6.050 + 5.870 + 6.117 + 
> 5.683) / 10 = 5.805
>
> after:
> (4.884 + 5.549 + 5.749 + 6.001 + 5.460 + 5.103 + 5.956 + 6.112 + 5.468 + 
> 5.093) / 10 = 5.538
>
> (5.805 - 5.538) / 5.805 = 4.6%

Even with the change to the output constraint suggested by Mans, you
get absolutely identical benchmark results? There's a lot of variance
in any case..

BTW has there been any evaluation of the penalty for the extra
branching, or the performance hit for the ARMv7-without-division
cases?

Ta,
Matt Sealey 


Re: ARM audit, seccomp, etc are broken wrt OABI syscalls

2013-11-06 Thread Matt Sealey
On Tue, Nov 5, 2013 at 6:14 PM, Kees Cook  wrote:
>
> Alternatively, CONFIG_SECCOMP_FILTER could depend on
> !CONFIG_OABI_COMPAT. That seems like the least work, given the desire
> to kill OABI in the real world. (Though I would note that at least
> Ubuntu's ARM kernels build with CONFIG_OABI_COMPAT; Chrome OS does
> not.)

I think CONFIG_OABI_COMPAT probably leaked in from the original
configurations of the kernel taken from Debian.

There were several big decisions they made (build for ARMv5 soft
float, then switch to ARMv7 softfp, then switch to ARMv7 hardfp, then
switch to using THUMB2 kernels) which would have just broken OABI
binaries at every step of the way since they had some subtle
implications in kernel configuration (note: Ubuntu have never, ever
enabled FPA emulation in the kernel, and all Debian's OABI userspace
is built for FPA, for example. A good chunk of the original Debian arm
port probably would just pitch a SIGILL if you ran it under an Ubuntu
kernel).

I would ignore anyone who enables it in a distribution, since they
probably weren't intending to enable it in the first place, and never
noticed the.. what.. 3-4KiB it adds to the kernel?

Matt Sealey 


Re: [PATCH 3/4] ARM: dts: i.MX53: dts for Voipac x53-dmm-668 module

2013-10-24 Thread Matt Sealey
On Thu, Oct 24, 2013 at 2:12 PM, Rostislav Lisovy  wrote:
> Dear Shawn;
> Thank you for your comments.
> Should I also add Voipac to
> Documentation/devicetree/bindings/vendor-prefixes.txt?

I would agree with that, but why is your chosen prefix "vp" instead of
"voipac" anyway?

-- 
Matt Sealey 


Re: [PATCH 1/3] ARM: Introduce atomic MMIO clear/set

2013-08-20 Thread Matt Sealey
On Mon, Aug 19, 2013 at 11:59 AM, Ezequiel Garcia
 wrote:
> On Mon, Aug 12, 2013 at 07:29:42PM +0100, Will Deacon wrote:
>
>> This means that you don't have ordering guarantees between the two accesses
>> outside of the CPU, potentially giving you:
>>
>>   spin_lock(&__io_lock);
>>   spin_unlock(&__io_lock);
>>   writel((readl(reg) & ~clear) | set, reg);
>>
>> which is probably not what you want.
>>
>> I suggest adding an iowmb after the writel if you really need this ordering
>> to be enforced (but this may have a significant performance impact,
>> depending on your SoC).
>
> I don't want to argue with you, given I have zero knowledge about this
> ordering issue. However let me ask you a question.
>
> In arch/arm/include/asm/spinlock.h I'm seeing this comment:
>
> ""ARMv6 ticket-based spin-locking.
> A memory barrier is required after we get a lock, and before we
> release it, because V6 CPUs are assumed to have weakly ordered
> memory.""
>
> and also:
>
> static inline void arch_spin_unlock(arch_spinlock_t *lock)
> {
> smp_mb();
> lock->tickets.owner++;
> dsb_sev();
> }
>
> So, knowing this atomic API should work for every ARMv{N}, and not being very
> sure what the call to dsb_sev() does. Would you care to explain how the above
> is *not* enough to guarantee a memory barrier before the spin unlocking?

arch_spin_[un]lock as an API is not guaranteed to use a barrier before
or after doing anything, even if this particular implementation does.

dsb_sev() is an SMP helper which does a synchronization barrier and
then sends events to other CPUs which informs them of the unlock. If
the other CPUs were in WFE state waiting on that spinlock, they can
now thunder in and attempt to lock it themselves. It's not really
relevant to the discussion as arch_spin_unlock is not guaranteed to
perform a barrier before returning.

On some other architecture there may be ISA additions which make
locking barrier-less, or on a specific implementation of an ARM
architecture SoC whereby there may be a bank of "hardware spinlocks"
available.

So, in this sense, you shouldn't rely on implementation-specific
behaviors of a function. If you need to be sure C follows B follows A,
insert a barrier yourself. Don't expect A to barrier for you just
because you saw some source code that does it today, as tomorrow it
may be different. It's not an optimization, just a potential source of
new bugs if the implementation changes underneath you later.
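
To make that concrete, here is a minimal sketch (my wording, not the
patch under review) of doing the read-modify-write with an explicit
barrier, so the ordering does not depend on what arch_spin_unlock()
happens to do on a given implementation; wmb() here stands in for the
I/O write barrier Will suggested:

#include <linux/io.h>
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(__io_lock);

/* sketch only: modify a register atomically and order the MMIO write
 * explicitly instead of relying on the unlock path to do it for us */
static void atomic_io_clear_set(void __iomem *reg, u32 clear, u32 set)
{
	unsigned long flags;

	spin_lock_irqsave(&__io_lock, flags);
	writel((readl(reg) & ~clear) | set, reg);
	wmb();	/* make the write visible before we drop the lock */
	spin_unlock_irqrestore(&__io_lock, flags);
}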

--
Matt


Re: [Ksummit-2013-discuss] [ATTEND] [ARM ATTEND] kernel data bloat and how to avoid it

2013-08-02 Thread Matt Sealey
On Fri, Aug 2, 2013 at 3:13 AM, Tony Lindgren  wrote:
> * Mel Gorman  [130731 08:28]:
>> On Wed, Jul 31, 2013 at 12:38:03AM -0700, Tony Lindgren wrote:
>> > Hi all,
>> >
>> > Probably the biggest kernel data bloat issue is in the ARM land, but
>> > it also seems that it's becoming a Linux generic issue too, so I
>> > guess it could be discussed in either context.
>> >
>>
>> Would scripts/bloat-o-meter highlight where the growth problems are?
>
> Well to some extent yes, the board/SoC/driver specific options are
> often behind Kconfig options. So if you want to limit the set of
> supported SoCs and drivers for the kernel you can optimize it out.
>
> The bloat-o-meter won't help for things like checking that a device
> tree binding really describes the hardware, and is not just pointing
> to a table of defined registers in the device driver.

Specifically naming and shaming: arch/arm/mach-imx/clk-*.c is the kind
of bloat where the names of clocks, their bit definitions etc. are
extremely specific to the chip but can also change per-board, and so
should be in the device tree. It would be possible to list and parse
all these clocks in about 1/3rd of the code of one of those files and
have it work on every one of the SoCs they cover, if they're
adequately described in the device tree. That also gives an opportunity
to stop changing the clock binding for each SoC to add new "numbers"
for "clocks we forgot", and to allow registration of ONLY the
board-specific clocks, on a per-board basis (if you don't use the TVE
or SATA, why define a clock for it just to turn it off?).
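
As a rough sketch of my own - the "clocks" container node and the
compatible dispatch below are illustrative, not an existing binding -
the generic parse would have roughly this shape, and the same code
would cover every one of those SoCs:

#include <linux/clk-provider.h>
#include <linux/of.h>

static void __init board_clocks_init(void)
{
	struct device_node *clocks, *np;

	clocks = of_find_node_by_name(NULL, "clocks");
	if (!clocks)
		return;

	/* children with status = "disabled" are skipped here */
	for_each_available_child_of_node(clocks, np) {
		if (of_device_is_compatible(np, "fixed-clock"))
			of_fixed_clk_setup(np);
		/* ...dispatch gates/muxes/dividers on their compatibles,
		 * read reg/bit offsets from properties, and register
		 * them with the common clock framework */
	}
}

The per-SoC data then lives in the tree and the kernel keeps only the
logic.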

There's not a lot else going on with regards to "too much data in the
kernel that shouldn't be in the kernel" to my mind that won't
eventually get taken out once the bindings are in the DT. Most other platforms
are pretty sane in this regard - what is left is legacy stuff for
boards that don't have DTs and methods are in place to boot similar
boards with DT-only. As above, Mike Turquette and the OMAP guys are
doing something very similar here.

What you'd suggest/need is a review by maintainers of each platform
they support for any static data they are maintaining that can either
be excised by moving it to the device tree as boot
configuration/description data, if that data is not basically constant
anyway, or just by getting rid of rare and unused boards. The PowerPC
guys had a great cull when they moved from arch-ppc to arch-powerpc
and decided no non-DT boards allowed. Anything that didn't get ported
got deleted when arch-ppc went away. The arm arch kernel tree doesn't
have the luxury of a hard expiration in place..

~

BTW Mike, the solution is big device trees with clock data. The kernel
*should* be pure logic, not description and data tables (data tables
are exactly why Device Trees exist!) except where they are a
fundamental constant. Clock trees are a *PER BOARD WORLD* even if they
use the same SoC.

I would - and have since the dawn of time - advocate doing it the way
you're doing it (250 clocks in the tree!!! GANBATTE!!!) simply because
the "logic" behind mapping clocks to numbers is obtuse and needs to be
reproduced in very limited ways - adding a clock that was previously
undefined but of a common, already-used structure (like a gate or mux
that had no previous driver to use it) means modifying the kernel AND
the device tree (and update the binding!). In this case, they are
doing it wrong and I keep seeing patches to add 2 or 3 new clocks per
merge window, and absolutely NONE of them are the JTAG clock, I keep
having to add that in to every kernel I test. It shouldn't be a
compile-time thing to boot, it should be a "which DT do I pick, the
one with JTAG or the one without?" issue. We should ONLY have to
change the device tree - individual clock type bindings would be
fixed, boards would have to define their clocks. Since any external
clocks and peripherals have to be validated by board designers anyway
(for stability), everyone will know exactly what every board will need
per peripheral and be able to define their own clock trees with fixed
parents if need be. Vendors can provide suitable reference trees that
match the full, clumsy configuration of an entire SoC for board
bringup and we should be enabling (as in DT status="okay" property)
the clocks that make sense on that board.

There's a major advantage as above - if you do not use the clock on
your board, you do not have to specify it. It may be left on by the
bootloader and drain power and cause noise and screw parenting for
muxes, but then that's an issue for the bootloader dev guys to
resolve.

The reason consensus doesn't flow that way is because "it is a lot of
work to do this in device tree" but someone wrote out those 5 or 6
board files already and so far the clock subsystem has been rewritten
like 4 times in the meantime. None of Freescale's BSPs (for example)
use the same clock subsystem as they update to new kernels.

I *will* change the i.MX stuff one day if I ever find the time. I
started already but I

Re: [PATCH v4 2/6] misc: sram: add ability to mark sram sections as reserved

2013-08-01 Thread Matt Sealey
IEEE1275 (Open Firmware) specification has details on it in the MMU
(section 3.6.5), Memory (section 3.7.6) node specifications and the
glossary defines how the property should look. How you find that
document is.. up to you (I have a PDF of it I got hold of around 2004,
it used to be linked at www.openfirmware.org but I've not checked). I
think it's missing from the devicetree.org spec since it's already very
well defined and there's no reason to reinvent (or redefine) the
wheel.

Essentially it follows the same format as "reg" - phys.hi phys.lo
length - which define regions which are ACTUALLY available to the
system. The intent is that it would be used to define the memory as
going from an address for a certain size, but mark the sections of it
that the device tree user could use (the further intent being that you would want
the OS to know that there is, for example, 4GiB of memory there and
set up any mappings to suit, but the last 256MiB is where you've put
some important data and you would rather it didn't just allocate it to
something else..).

Linux doesn't actually parse and use the information in the available
property (or care about "existing" property in an MMU node since it
replaces page tables super early and has different ideas about mapping
than most bootloaders), but other operating systems possibly do. It
would not be a significant hardship for Linux to actually parse the
property if present and use that information after a bit of sanity
checking rather than just trusting reg. And in that sense, it would
not be a significant hardship to use existing specifications to define
nodes that describe memory, since if there IS parsing code available,
it will be re-used, which is good all round.
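
As a rough sketch of how little code that parsing needs (my own
illustration, assuming a single address cell and a single size cell,
and reusing the driver's existing gen_pool - none of the names below
are an existing helper):

static int __init sram_add_available(struct device_node *np,
				     struct gen_pool *pool,
				     void __iomem *virt_base,
				     phys_addr_t phys_base)
{
	const __be32 *prop;
	int len, i;

	prop = of_get_property(np, "available", &len);
	if (!prop)
		return 0;	/* no property: the whole SRAM is usable */

	/* pairs of <offset size>, the same shape as "reg" */
	for (i = 0; i < len / (int)sizeof(__be32); i += 2) {
		u32 start = be32_to_cpup(prop + i);
		u32 size = be32_to_cpup(prop + i + 1);

		gen_pool_add_virt(pool, (unsigned long)virt_base + start,
				  phys_base + start, size, -1);
	}

	return 0;
}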

-- 
Matt

On Thu, Aug 1, 2013 at 11:35 AM, Heiko Stübner  wrote:
> Am Montag, 29. Juli 2013, 23:39:45 schrieb Matt Sealey:
>> On Mon, Jul 29, 2013 at 9:02 AM, Philipp Zabel 
> wrote:
>> > Hi Heiko,
>> >
>> > Am Montag, den 29.07.2013, 15:12 +0200 schrieb Heiko Stübner:
>> >> Some SoCs need parts of their sram for special purposes. So while being
>> >> part of the peripheral, it should not be part of the genpool
>> >> controlling the sram.
>> >>
>> >> Therefore add an option mmio-sram-reserved to keep arbitrary portions of
>> >> the sram from being part of the pool.
>> >>
>> >> Suggested-by: Rob Herring 
>> >> Signed-off-by: Heiko Stuebner 
>> >> Tested-by: Ulrich Prinz 
>> >> ---
>> >> Philipp: I didn't carry the ack, because the loop changed significantly
>> >> again. So if it looks ok, could you re-ack it please?
>> >
>> > I'd prefer the first loop to contain the magic and produce a list of
>> > useable chunks, instead of a list of reserved blocks. The second loop
>> > could then iterate over the array and just call gen_pool_add_virt
>> > repeatedly.
>> >
>> > regards
>> > Philipp
>>
>> Agreed, however specifying chunks of memory should probably match the
>> format of the standard memory@ node "available" property - mostly
>> because it would be the same syntax and definition as defining any
>> other chunk of memory, as OpenFirmware and device trees have been
>> doing since the dark ages. In this case, why not re-use the
>> "available" property name instead of creating a new one? Standard OF
>> memory parsing code is then free for you to use to pull the chunks
>> out.
>
> Sound interesting, but could you give me a pointer to some sort of docs about
> this "available" property in memory nodes?
>
> Documentation/devicetree/booting-without-of.txt seems to be the only file
> describing the memory node at all but only required properties, and not any
> "available" property.
>
> I've found a document  on power.org describing the memory node, but also not
> mentioning any "available" property.
>
> And devicetree.org does not seem to handle the memory node at all.
>
>
> Thanks for any hints
> Heiko


Re: [PATCH v4 2/6] misc: sram: add ability to mark sram sections as reserved

2013-07-29 Thread Matt Sealey
On Mon, Jul 29, 2013 at 9:02 AM, Philipp Zabel  wrote:
> Hi Heiko,
>
> Am Montag, den 29.07.2013, 15:12 +0200 schrieb Heiko Stübner:
>> Some SoCs need parts of their sram for special purposes. So while being part
>> of the peripheral, it should not be part of the genpool controlling the sram.
>>
>> Therefore add an option mmio-sram-reserved to keep arbitrary portions of the
>> sram from being part of the pool.
>>
>> Suggested-by: Rob Herring 
>> Signed-off-by: Heiko Stuebner 
>> Tested-by: Ulrich Prinz 
>> ---
>> Philipp: I didn't carry the ack, because the loop changed significantly 
>> again.
>> So if it looks ok, could you re-ack it please?
>
> I'd prefer the first loop to contain the magic and produce a list of
> useable chunks, instead of a list of reserved blocks. The second loop
> could then iterate over the array and just call gen_pool_add_virt
> repeatedly.
>
> regards
> Philipp

Agreed, however specifying chunks of memory should probably match the
format of the standard memory@ node "available" property - mostly
because it would be the same syntax and definition as defining any
other chunk of memory, as OpenFirmware and device trees have been
doing since the dark ages. In this case, why not re-use the
"available" property name instead of creating a new one? Standard OF
memory parsing code is then free for you to use to pull the chunks
out.

-- 
Matt Sealey 


Re: [PATCH V3 1/3] dts: change Marvell prefix to 'marvell'

2013-07-10 Thread Matt Sealey
On Tue, Jul 9, 2013 at 7:49 AM, Jason Cooper  wrote:
> Neil,
>
> I agree with the need to change, however, this has been in the binding
> documentation since v3.5.  I wish we had caught this when we decided
> against using stock ticker symbols (not all stock markets use
> alphabetical abbreviated names, not all companies are listed on any
> stock exchange).

Who decided that?

You can't just "stop using stock ticker symbols" - FDT is inherently
based on the original OpenFirmware device tree, and breaking any
existing bindings that came from real OpenFirmware solutions, where
using stock ticker symbols was entirely appropriate (although, these
days, not useful), is counter-productive.

If Marvell had originally had mrvl as their ticker, and used this in
OF DTs (and it is..), then mrvl it stays. Adding new devices with the
marvell prefix is, in this case, wrong: you should keep using the old
one. A good example of this is Freescale. Nobody is
saying everyone should move to "freescale,imx-this" or
"freescale,vybrid-that" - it's fsl and it stays fsl for
backwards/forwards compatibility because things exist already.

Any new companies can have a long, descriptive name; a privilege of
being late to the party, you might say :)

Having an odd mix of mrvl and marvell, or moving wholesale to marvell,
is just completely obtuse. Whether or not they had that stock ticker,
or will still have it in the future, it is how they're already defined,
and you can't in good conscience change the binding after devices ship
with it.

> To do this properly, the drivers are going to have to be compatible with
> the old and the new names, and the binding docs updated to reflect the
> legacy name and the preferred name.

Properly would be as above. You can stop using stock tickers for new
company names, but anything that has been defined in a device tree
before has to stay that way, and all the manufacturer prefixes to
device names should be the same. What you're proposing is purely
driver bloat and an increase in the size of the kernel.

--
Matt Sealey 


Re: [PATCH RFC 3/3] clk: dt: binding for basic divider clock

2013-06-04 Thread Matt Sealey
On Tue, Jun 4, 2013 at 2:22 PM, Mike Turquette  wrote:
> Quoting Matt Sealey (2013-06-04 10:39:53)
>> On Tue, Jun 4, 2013 at 12:11 PM, Stephen Boyd  wrote:
>> > On 06/03/13 10:53, Mike Turquette wrote:
>> >> +Required properties:
>> >> +- compatible : shall be "divider-clock".
>> >> +- #clock-cells : from common clock binding; shall be set to 0.
>> >> +- clocks : link to phandle of parent clock
>> >> +- reg : base address for register controlling adjustable divider
>> >> +- mask : arbitrary bitmask for programming the adjustable divider
>> >> +
>> >> +Optional properties:
>> >> +- clock-output-names : From common clock binding.
>> >> +- table : array of integer pairs defining divisors & bitfield values
>> >> +- shift : number of bits to shift the mask, defaults to 0 if not present
>> >> +- index_one : valid divisor programming starts at 1, not zero
>> >> +- index_power_of_two : valid divisor programming must be a power of two
>> >> +- allow_zero : implies index_one, and programming zero results in
>> >> +  divide-by-one
>> >
>> > It's preferred that property names have dashes instead of underscores.
>>
>> I think I have a suggestion or two here..
>>
>> I mailed one to Mike earlier, I have had some mailing list problems
>> which I think are solved now..
>>
>> If you provide a mask for programming the divider, surely the "shift"
>> property is therefore redundant. Unless some device has a completely
>> strange way of doing things and uses two seperated bitfields for
>> programming in the same register, the shift value is simply a function
>> of ffs() on the mask, isn't it? It is nice to put the information
>> specifically in the device tree, but where a shift is not specified
>> it'd probably be a good idea to go about deriving that value that way
>> anyway, right, and if we're doing that.. why specify it?
>
> Shift is optional.  Also I think the integer values in a DTS are
> 32-bits, so if some day someone had a clock driver that needed to write
> to a 64-bit register then shift might be useful there, combined with a
> 32-bit mask.  Or perhaps dtc will have a 64-bit value for that case?  I
> don't know.
>
> I certainly do see your point about only using the mask, and the
> original clock basic types did this a while back before the shift+width
> style came into vogue.
>
> If other reviewers also want to use a pure 32-bit mask and no shift then
> I'll drop it in the next round.

Hmm.. that's a very good point...

You can't have a 64-bit OF property unless you're on a 64-bit platform
or do some really weird workarounds for it (multiple cells etc.).
That's how the OF spec was laid out, it sucks.. but it's the standard
and the DT seems to stay compatible with the general concept. A
supplemental binding to support clock dividers living in 64-bit
registers may be a better way.
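
The multiple-cells workaround would look roughly like this (a sketch;
the "reg64" property name and np are just placeholders for the
example):

	u32 cells[2];
	u64 reg64;

	/* read a <hi lo> pair of cells and combine them by hand */
	if (!of_property_read_u32_array(np, "reg64", cells, 2))
		reg64 = ((u64)cells[0] << 32) | cells[1];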

You'd need a shift for the mask if you could only do a 32-bit property
(32-bit platform with a 4-byte cell size) and a shift for the bits
that would go into the mask as well as a width property for the
register size to bound the shift..  Too complicated for now. I would
say stick with 32-bit until this magical 64-bit divider register
appears, and then do something slightly different :)

>> Also it may be much simpler to simply define the default clock setup
>> as being an integer divider that follows the form 2^n - I am not sure
>> I can think off the top of my head of any clock dividers doing the
>> rounds that have divide-by-non-power-of-two and if they exist out
>> there (please correct me) they would be the rare ones. I think
>> properties that modify behavior should err on the side of being rarely
>> used and to define unusual behavior, not confirm the status quo.
>
> Please review the current use of CLK_DIVIDER_POWER_OF_TWO in the kernel.

You're right, I think I got it backwards or was actually thinking of
allow_zero (as in, 0x0 divider = divide by 1)? i.MX definitely has
more 0x0=div-by-1 than any other kind..

>> In any case, once you figure out that, rather than specifying a full
>> table of <0 1>, <1 2>, <2 4> and creating a big list, you could just
>> give the lower and upper limits and let the driver figure out what is
>> in between with two values. If there were gaps, a full table would be
>> necessary. The allow zero property might be better served by the very
>> same range property.
>
> You have described how it already works.  The table is only there for
> when a common formula fails to capture the possible divisor
> combinations.  For the default the driver starts the index at 0 and goes
> up to mask/width, for index_one the driver does the same but starts at
> 1, not 0, for power of two ... well you get the picture.  The table is
> only of use when these common divider schemes are insufficient to
> determine the divisors.

Right, but in the case of index_one and allow_zero, one property for
ranges basically covers the case where a slight modification to the
formula used would be required, and allows specifying the first valid
divider and the last valid divider which might come in handy when you
have a mask of a certain size (say, 6

Re: [PATCH RFC 3/3] clk: dt: binding for basic divider clock

2013-06-04 Thread Matt Sealey
On Tue, Jun 4, 2013 at 12:11 PM, Stephen Boyd  wrote:
> On 06/03/13 10:53, Mike Turquette wrote:
>> +Required properties:
>> +- compatible : shall be "divider-clock".
>> +- #clock-cells : from common clock binding; shall be set to 0.
>> +- clocks : link to phandle of parent clock
>> +- reg : base address for register controlling adjustable divider
>> +- mask : arbitrary bitmask for programming the adjustable divider
>> +
>> +Optional properties:
>> +- clock-output-names : From common clock binding.
>> +- table : array of integer pairs defining divisors & bitfield values
>> +- shift : number of bits to shift the mask, defaults to 0 if not present
>> +- index_one : valid divisor programming starts at 1, not zero
>> +- index_power_of_two : valid divisor programming must be a power of two
>> +- allow_zero : implies index_one, and programming zero results in
>> +  divide-by-one
>
> It's preferred that property names have dashes instead of underscores.

I think I have a suggestion or two here..

I mailed one to Mike earlier, I have had some mailing list problems
which I think are solved now..

If you provide a mask for programming the divider, surely the "shift"
property is therefore redundant. Unless some device has a completely
strange way of doing things and uses two separate bitfields for
programming in the same register, the shift value is simply a function
of ffs() on the mask, isn't it? It is nice to put the information
specifically in the device tree, but where a shift is not specified
it'd probably be a good idea to go about deriving that value that way
anyway, right, and if we're doing that.. why specify it?
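
i.e. something like this in the setup path (a sketch, assuming the
field is one contiguous run of bits):

/* derive shift and width from a contiguous bitmask, so the binding
 * only needs the mask itself */
static u8 divider_shift_from_mask(u32 mask)
{
	return ffs(mask) - 1;		/* position of the lowest set bit */
}

static u8 divider_width_from_mask(u32 mask)
{
	return fls(mask) - ffs(mask) + 1;	/* contiguous field width */
}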

Also it may be much simpler to simply define the default clock setup
as being an integer divider that follows the form 2^n - I am not sure
I can think off the top of my head of any clock dividers doing the
rounds that have divide-by-non-power-of-two and if they exist out
there (please correct me) they would be the rare ones. I think
properties that modify behavior should err on the side of being rarely
used and to define unusual behavior, not confirm the status quo.

Just to be sure, though, someone would need to go visit a bunch of
current clock handlers already implemented or read a lot of docs to
see how they're expecting their dividers. Maybe some of the Samsung,
Qualcomm, TI, Freescale guys can comment on what is the most common
case inside their SoCs.

In any case, once you figure out that, rather than specifying a full
table of <0 1>, <1 2>, <2 4> and creating a big list, you could just
give the lower and upper limits and let the driver figure out what is
in between with two values. If there were gaps, a full table would be
necessary. The allow zero property might be better served by the very
same range property.
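
For reference, the table case maps directly onto the existing struct
clk_div_table in the common clock framework; a sketch using the
<0 1>, <1 2>, <2 4> values from above:

#include <linux/clk-provider.h>

/* register value -> divisor pairs, terminated by a zero entry */
static const struct clk_div_table example_div_table[] = {
	{ .val = 0, .div = 1 },
	{ .val = 1, .div = 2 },
	{ .val = 2, .div = 4 },
	{ /* sentinel */ }
};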

I would also say.. bit-mask rather than mask, divider-table rather
than table, divider-range (modifying the idea above). While we're
inside a particular namespace in the binding I think naming properties
with their intent (i.e. what table is it?) is good behavior.

--
Matt Sealey 


Re: [RFC PATCH v3 1/2] ARM: kernel: update cpuinfo to print SoC model name

2013-01-30 Thread Matt Sealey
On Wed, Jan 30, 2013 at 1:07 PM, Nicolas Pitre  wrote:
> On Wed, 30 Jan 2013, Ruslan Bilovol wrote:
>
>> Currently, reading /proc/cpuinfo provides userspace with CPU ID of
>> the CPU carrying out the read from the file.
>> Userspace using this information may decide what module
>> to load or how to configure some specific (and processor-depended)
>> settings or so.
>> However, since really different SoCs can share same ARM core,
>> this information currently is not so useful.
>> For example, TI OMAP4460 and OMAP4470 SoCs show the same
>> information in the /proc/cpuinfo whereas they are different.
>> Since in most cases ARM CPU is a part of some system on a chip (SoC),
>> the "cpuinfo" file looks like exactly that place, where this
>> information have to be displayed.
>>
>> So added new line "SoC name" in the "cpuinfo" output for system
>> on a chip name. It is placed between CPU information and machine
>> information, so the file structure looks gracefully (CPU-SoC-Hardware)
>>
>> Example:
>>
>> / # cat proc/cpuinfo
>> [...]
>> CPU variant : 0x2
>> CPU part: 0xc09
>> CPU revision: 10
>>
>> SoC name: OMAP4470
>>
>> Hardware: OMAP4 Blaze Tablet
>
> Please remove that extra blank line between "SoC name" and "Hardware".
> The blank line after "CPU revision" is fine.
>
> Also, please rename this to "System name".  Not all systems are "on
> chip".  By using "System name" this is more universally useful.

I can't agree with "System name"; it is confusing in common
terminology, since it has about the same definition as the current
"Hardware" line.

If we're just printing out the name of the device surrounding the CPU
- be it a Northbridge/Southbridge combination or SoC packaging -
"Chipset" might be a better name for it.

-- 
Matt Sealey 
Product Development Analyst, Genesi USA, Inc.


Re: [PATCH] crypto: fix FTBFS with ARM SHA1-asm and THUMB2_KERNEL

2013-01-22 Thread Matt Sealey
On Tue, Jan 22, 2013 at 1:50 AM, Jussi Kivilinna
 wrote:
>
>> There is 'tcrypt' module in crypto/ for quick benchmarking. 'modprobe
>> tcrypt mode=500 sec=1' tests AES in various cipher-modes, using different
>> buffer sizes and outputs results to kernel log.
>>
>
> Actually mode=200 might be better, as mode=500 is for asynchronous
> implementations and might use hardware crypto if such device/module is
> available.

Okeydokey I'll start running some tests..

-- 
Matt Sealey 
Product Development Analyst, Genesi USA, Inc.


Re: One of these things (CONFIG_HZ) is not like the others..

2013-01-21 Thread Matt Sealey
Matt Sealey 
Product Development Analyst, Genesi USA, Inc.


On Mon, Jan 21, 2013 at 7:31 PM, John Stultz  wrote:
> On 01/21/2013 05:06 PM, Matt Sealey wrote:
>>
>> On Mon, Jan 21, 2013 at 6:51 PM, John Stultz 
>> wrote:
>>>
>>> On 01/21/2013 02:54 PM, Matt Sealey wrote:
>>>>
>>>> On Mon, Jan 21, 2013 at 4:36 PM, John Stultz 
>>>> wrote:
>>>>>
>>>>> On 01/21/2013 01:14 PM, Matt Sealey wrote:
>>>
>>> As far as jiffies rating, from jiffies.c:
>>>  .rating= 1, /* lowest valid rating*/
>>>
>>> So I'm not sure what you mean by "the debug on the kernel log is telling
>>> me
>>> it has a higher resolution".
>>
>> Oh, it is just if I actually don't run setup_sched_clock on my
>> platform, it gives a little message (with #define DEBUG 1 in
>> sched_clock.c) about who setup the last sched_clock. Since you only
>> get one chance, and I was fiddling with setup_sched_clock being probed
>> from multiple possible timers from device tree (i.MX3 has a crapload
>> of valid timers, which one you use right now is basically forced by
>> the not-quite-fully-DT-only code and some funky iomap tricks).
>>
>> And what I got was, if I use the real hardware timer, it runs at 66MHz
>> and says it has 15ns resolution and wraps every 500 seconds or so. The
>> jiffies timer says it's 750MHz, with a 2ns resolution.. you get the
>> drift. The generic reporting of how "good" the sched_clock source is
>> kind of glosses over the quality rating of the clock source and at
>> first glance (if you're not paying that much attention), it is a
>> little bit misleading..
>
>
> I've got no clue on this. sched_clock is arch specific, and while ARM does
> use clocksources for sched_clock, what you're seeing is a detail of the ARM
> implementation and not the clocksource code (one complication is that
> clocksources rating values are for the requirements of timekeeping, which
> are different then the requirements for sched_clock - so the confusion is
> understandable).
>
>
>
>>> Yes, in the case I was remembering, the 60HZ was driven by the electrical
>>> line.
>>
>> While I have your attention, what would be the minimum "good" speed to
>> run the sched_clock or delay timer implementation from? My rudimentary
>> scribblings in my notebook give me a value of "don't bother" with less
>> than 10KHz based on HZ=100, so I'm wondering if a direct 32.768KHz
>> clock would do (i.MX osc clock input if I can supply it to one of the
>> above myriad timers) since this would be low-power compared to a 66MHz
>> one (by a couple mA anyway). I also have a bunch of questions about
>> the delay timer requirements.. I might mail you personally.. or would
>> you prefer on-list?
>
> So there are probably other folks who could better comment on sched_clock()
> or the delay timer (I'm guessing the delay() implementation is what you mean
> by that) design trade-offs.

I'm specifically talking about if I do

static struct delay_timer imx_gpt_delay_timer = {
.read_current_timer = imx_gpt_read_current_timer,
};

and then something like:

imx_gpt_delay_timer.freq = clk_get_rate(clk_per);
register_current_timer_delay(&imx_gpt_delay_timer);
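
For reference, the read accessor is basically just a raw read of the
GPT free-running counter - a minimal sketch, where gpt_base and the
register offset are illustrative rather than lifted from the i.MX
reference manual:

static void __iomem *gpt_base;	/* GPT registers, ioremapped at init */

static unsigned long imx_gpt_read_current_timer(void)
{
	/* raw free-running counter value; the delay core scales it
	 * using the .freq registered above */
	return readl_relaxed(gpt_base + 0x24);
}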

In the sense that now (as of kernel 3.7, iirc) I have the ability to
have the delay implementation use this awesome fast accessor (which
has nothing to do with a 'clocksource' in the subsystem sense) to get
at my (here at least) 66.5MHz counter. It can count up or down - i.MX
has both - but I dunno if you can use a down counter for delay_timer,
or if that's preferred, or what.. there are no examples of it, though
it seems to work. That said, I can't imagine what an immediately
visible and not totally random effect of doing it "wrong" would be -
maybe delays return instantly, which could be very hard or impossible
to ever notice compared to, say, not being able to browse the
internet on the target device.. it might pop up as some
randomly-not-resetting platform device or so, though.

And I can also put sched_clock on a completely different timer. Does
that make any sense at all? I wouldn't know, it's not documented.
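
For the record, the registration I'm talking about is roughly this on
a 3.7-era kernel - a sketch only; the second timer instance, its
register offset and the clock name here are illustrative:

static u32 imx_gpt2_sched_clock_read(void)
{
	/* raw 32-bit free-running counter of a second GPT instance;
	 * the sched_clock core extends and scales this for us */
	return readl_relaxed(gpt2_base + 0x24);
}

	setup_sched_clock(imx_gpt2_sched_clock_read, 32,
			  clk_get_rate(gpt2_per_clk));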

And if I wanted to I could register 8 more timers. That seems rather
excessive, but the ability to use those extra 8 as clock outputs from
the SoC or otherwise directly use comparators is useful to some
people, does Linux in general really give a damn about having 8 timers
of the same quality being available when most systems barely have two
clocksources anyway (on x86, tsc and hpet - on ARM I guess twd and
some SoC-specific timer). I dunno how many people might actually want
to define in a d

Re: One of these things (CONFIG_HZ) is not like the others..

2013-01-21 Thread Matt Sealey
On Mon, Jan 21, 2013 at 7:18 PM, Russell King - ARM Linux
 wrote:
> On Mon, Jan 21, 2013 at 07:06:59PM -0600, Matt Sealey wrote:
>> On Mon, Jan 21, 2013 at 6:51 PM, John Stultz  wrote:
>> > On 01/21/2013 02:54 PM, Matt Sealey wrote:
>
> sched_clock() has nothing to do with time keeping, and that
> HZ/NO_HZ/HRTIMERS don't affect it (when it isn't being derived from
> jiffies).
>
> Now, sched_clock() is there to give the scheduler a _fast_ to access,
> higher resolution clock than is available from other sources, so that
> there's ways of accurately measuring the amount of time processes run
> for,

That depends on what you meant by timekeeping, right?

I'm really not concerned about the wallclock time, more about the
accuracy of the scheduler clock (tick?), preemption, accurate delays
(i.e. if I msleep(10) does it delay for 10ms or for 40ms because my
delay timer is inaccurate? I'd rather it was better but closer to
10ms), and whether the scheduler (the thing that tells my userspace
whether firefox is running now, or totem, or any other task) is using
the correct high resolution periodic, oneshot, repeatable (however it
repeats) timers *properly* given that this magic config item is
missing on ARM.

That magic config item being CONFIG_SCHED_HRTICK which is referenced a
bunch in kernel/sched/*.[ch] but *ONLY* defined as a Kconfig item in
kernel/Kconfig.hz.

Do we need to copy that Kconfig item out to arch/arm/Kconfig, that's
the question?

>  and other such measurements - and it uses that to determine how
> to schedule a particular task and when to preempt it.
>
> Not providing it means you get those measurements at HZ-based resolution,
> which is suboptimal for tasks which run often for sub-HZ periods (which
> can end up accumulating zero run time.)

Okay, and John said earlier:

John Stultz:
> So I'm actually not super familiar with SCHED_HRTICK details, but from my
> brief skim of it it looks like its useful for turning off the periodic timer
> tick, and allowing the scheduler tick to be triggered by an hrtimer itself
> (There's a number of these interesting inversions that go on in switching to
> HRT mode - for instance, standard timer ticks are switched to being hrtimer
> events themselves).
>
> This likely has the benefit of time-accurate preemption (well, long term, as
> if the timer granularity isn't matching you could be delayed up to a tick -
> but it wouldn't drift).
>
> I'm guessing Thomas would probably know best what the potential issues would
> be from running ((CONFIG_HRTIMER  || CONFIG_NO_HZ) && !CONFIG_SCHED_HRTICK).

If SCHED_HRTICK isn't enabled but setup_sched_clock has been given an
accessor for a real, hardware, fast, high resolution counter that
meets all the needs of sched_clock, what's going on? If it's enabled,
what extra is it doing that, say, my_plat_read_sched_clock doesn't?

--
Matt Sealey 
Product Development Analyst, Genesi USA, Inc.


Re: One of these things (CONFIG_HZ) is not like the others..

2013-01-21 Thread Matt Sealey
On Mon, Jan 21, 2013 at 6:51 PM, John Stultz  wrote:
> On 01/21/2013 02:54 PM, Matt Sealey wrote:
>>
>> On Mon, Jan 21, 2013 at 4:36 PM, John Stultz 
>> wrote:
>>>
>>> On 01/21/2013 01:14 PM, Matt Sealey wrote:
>
> As far as jiffies rating, from jiffies.c:
> .rating = 1, /* lowest valid rating */
>
> So I'm not sure what you mean by "the debug on the kernel log is telling me
> it has a higher resolution".

Oh, it is just if I actually don't run setup_sched_clock on my
platform, it gives a little message (with #define DEBUG 1 in
sched_clock.c) about who setup the last sched_clock. Since you only
get one chance, and I was fiddling with setup_sched_clock being probed
from multiple possible timers from device tree (i.MX3 has a crapload
of valid timers, which one you use right now is basically forced by
the not-quite-fully-DT-only code and some funky iomap tricks).

And what I got was, if I use the real hardware timer, it runs at 66MHz
and says it has 15ns resolution and wraps every 500 seconds or so. The
jiffies timer says it's 750MHz, with a 2ns resolution.. you get the
drift. The generic reporting of how "good" the sched_clock source is
kind of glosses over the quality rating of the clock source and at
first glance (if you're not paying that much attention), it is a
little bit misleading..

> Yes, in the case I was remembering, the 60HZ was driven by the electrical
> line.

While I have your attention, what would be the minimum "good" speed to
run the sched_clock or delay timer implementation from? My rudimentary
scribblings in my notebook give me a value of "don't bother" with less
than 10KHz based on HZ=100, so I'm wondering if a direct 32.768KHz
clock would do (i.MX osc clock input if I can supply it to one of the
above myriad timers) since this would be low-power compared to a 66MHz
one (by a couple mA anyway). I also have a bunch of questions about
the delay timer requirements.. I might mail you personally.. or would
you prefer on-list?
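
(The scribblings, roughly: at HZ=100 a tick is 10ms, so a 10KHz
counter gives about 100 counts per tick, i.e. ~1% granularity within
a tick; a 32.768KHz clock is ~30.5us per count, roughly 327 counts
per tick, which seems like it could still be usable for delays even
if it's coarse for sched_clock.)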

-- 
Matt Sealey 
Product Development Analyst, Genesi USA, Inc.


Re: [PATCH] crypto: fix FTBFS with ARM SHA1-asm and THUMB2_KERNEL

2013-01-21 Thread Matt Sealey
On Mon, Jan 21, 2013 at 4:46 PM, Nicolas Pitre  wrote:
> On Mon, 21 Jan 2013, Matt Sealey wrote:
>
>> The optimized assembler SHA1 code for ARM does not conform to Thumb2
>> register usage requirements, so it cannot be built when the kernel is
>> configured with THUMB2_KERNEL.
>>
>> Fix the FTBFS for now by preventing misconfigurations of the kernel.
>>
>> Signed-off-by: Matt Sealey 
>
> A .arm directive at the top of the assembly code would be a better
> "fix", as that wouldn't reduce functionality.

If I recall, doing that last time for ssi-fiq.S was the wrong solution
and it was suggested that proper configuration (on top of possibly rewriting
the assembly) was better than hacking around in the assembly..

> Yet, I'd invite you to have a look at commit 638591cd7b in linux-next.

I took a peek, and invite you to ignore my patch. I only tracked the
top of Linus' tree..

That said, it seems nobody benchmarked this on anything other than
IXP425 or KS8695 to see if it's markedly faster than the (moderately
recently updated) C-code implementation, beyond what's mentioned in
the logs for the initial commit. It seems like a rather specific
optimization for a rather specific use case on rather specific
processors (and therefore a small test base), probably meant for a
very specific product line somewhere. Whether you get any benefit
from enabling this config item on any other ARM platform is up for
debate, isn't it?

If it *is* in fact much faster everywhere, and it works in any ARM or
THUMB2 configuration, there's a case to be built for it being the
default ARM implementation for AES and SHA1..

This question is to the implementor/committer (Dave McCullough), how
exactly did you measure the benchmark and can we reproduce it on some
other ARM box?

If it's long and laborious and not so important to test the IPsec
tunnel use-case, what would be the simplest possible benchmark to see
if the C vs. assembly version is faster for a particular ARM device? I
can get hold of pretty much any Cortex-A8 or Cortex-A9 that matters, I
have access to a Chromebook for A15, and maybe an i.MX27 or i.MX35 and
a couple Marvell boards (ARMv6) if I set my mind to it... that much
testing implies we find a pretty concise benchmark though with a
fairly common kernel version we can spread around (i.MX, OMAP and the
Chromebook, I can handle, the rest I'm a little wary of bothering to
spend too much time on). I think that could cover a good swath of
not-ARMv5 use cases from lower speeds to quad core monsters.. but I
might stick to i.MX to start with..

--
Matt Sealey 
Product Development Analyst, Genesi USA, Inc.


Re: One of these things (CONFIG_HZ) is not like the others..

2013-01-21 Thread Matt Sealey
On Mon, Jan 21, 2013 at 6:09 PM, Matt Sealey  wrote:

[LAKML: about lack of SCHED_HRTICK because we don't use
kernel/Kconfig.hz on ARM]

> kind of leaves ARM in the doghouse.. who knows what weirdo scheduler
> reactions are related to it not being enabled. Maybe when it is, HZ
> *would* need to be allowed to be bumped when using this code path?

Or conversely maybe this is exactly why the Samsung maintainers
decided they need HZ=200, because SCHED_HRTICK isn't being enabled and
they're experiencing some multimedia lag because of it?

-- 
Matt Sealey 
Product Development Analyst, Genesi USA, Inc.


Re: One of these things (CONFIG_HZ) is not like the others..

2013-01-21 Thread Matt Sealey
Okay so the final resolution of this is;

1) That the arch/arm/Kconfig HZ block is suffering from some cruft

I think we could all be fairly confident that Exynos4 or S5P does not
require HZ=200 - in theory, it has no timer restrictions like
EBSA110's (the docs I have show a perfectly capable 32-bit timer with
a double-digit MHz input clock; these are multimedia-class SoCs, so
it'd be seriously f**ked up if they didn't).

But while some of the entries on this line may be cargo-cult
programming, the original addition on top of EBSA110 *may* be one of
your "unreported" responsiveness issues.

We could just let some Samsung employees complain when Android 6.x
starts to get laggy with a 3.8 kernel because we forced their HZ=100.

What I would do is predicate a fixed, obvious default on
ARCH_MULTIPLATFORM so that it would get the benefit of a default HZ
that you agree with. It wouldn't CHANGE anything, but it makes it look
less funky, since the non-multiplatform settings would be somewhere
else (it either needs more comments or an if - either way - otherwise
it's potentially confusing);

if ARCH_MULTIPLATFORM
config HZ
   int
   default 100
else
   # old config HZ block here
endif

2) We need to add config SCHED_HRTICK as a copy and paste from
kernel/Kconfig.hz since.. well, I still don't understand exactly what
the true effect would be, but I assume since Arnd is concerned and
John's explanation rings true that it really should be enabled on ARM
systems with the exact same dependencies as kernel/Kconfig.hz.

Or not.. I see it as an oddity until I understand if we really care
about it, but the code seems to be fairly important to the scheduler
and also enabled by default almost everywhere else, which means only
people with really freakish SMP architectures with no ability to use
GENERIC_SMP_HELPERS have ever run these code paths besides ARM. That
kind of leaves ARM in the doghouse.. who knows what weirdo scheduler
reactions are related to it not being enabled. Maybe when it is, HZ
*would* need to be allowed to be bumped when using this code path?
Matt Sealey 
Product Development Analyst, Genesi USA, Inc.


On Mon, Jan 21, 2013 at 5:49 PM, Russell King - ARM Linux
 wrote:
> On Mon, Jan 21, 2013 at 05:23:33PM -0600, Matt Sealey wrote:
>> On Mon, Jan 21, 2013 at 4:42 PM, Russell King - ARM Linux
>>  wrote:
>> > On Mon, Jan 21, 2013 at 04:20:14PM -0600, Matt Sealey wrote:
>> >> I am sorry it sounded if I was being high and mighty about not being
>> >> able to select my own HZ (or being forced by Exynos to be 200 or by
>> >> not being able to test an Exynos board, forced to default to 100). My
>> >> real "grievance" here is we got a configuration item for the scheduler
>> >> which is being left out of ARM configurations which *can* use high
>> >> resolution timers, but I don't know if this is a real problem or not,
>> >> hence asking about it, and that HZ=100 is the ARM default whether we
>> >> might be able to select that or not.. which seems low.
>> >
>> > Well, I have a versatile platform here.  It's the intelligence behind
>> > the power control system for booting the boards on the nightly tests
>> > (currently disabled because I'm waiting for my main server to lock up
>> > again, and I need to use one of the serial ports for that.)
>> >
>> > The point is, it talks via I2C to a load of power monitors to read
>> > samples out.  It does this at sub-100Hz intervals.  Yet the kernel is
>> > built with HZ=100.  NO_HZ=y and highres timers are enabled... works
>> > fine.
>> >
>> > So, no, HZ=100 is not a limit in that scenario.  With NO_HZ=y and
>> > highres timers, it all works with epoll() - you get the interval that
>> > you're after.  I've verified this with calls to gettimeofday() and
>> > the POSIX clocks.
>>
>> Okay.
>>
>> So, can you read this (it's short):
>>
>> http://ck.kolivas.org/patches/bfs/bfs-configuration-faq.txt
>>
>> And please tell me if he's batshit crazy and I should completely
>> ignore any scheduler discussion that isn't ARM-specific, or maybe..
>> and I can almost guarantee this, he doesn't have an ARM platform so
>> he's just delightfully ill-informed about anything but his quad-core
>> x86?
>
> Well... my x86 laptop is... HZ=1000, NO_HZ, HIGH_RES enabled, ondemand...
> doesn't really fit into any of those categories given there.  I'd suggest
> that what's given there is a suggestion/opinion based on behaviours
> observed on x86 platforms.
>
> Whether it's appropriate for other architectures is not really a proven
> point - is it worth running ARM at 1000Hz when the load from running at
> 100Hz is measurable as a defini

Re: One of these things (CONFIG_HZ) is not like the others..

2013-01-21 Thread Matt Sealey
On Mon, Jan 21, 2013 at 5:13 PM, Russell King - ARM Linux
 wrote:
> On Mon, Jan 21, 2013 at 04:54:31PM -0600, Matt Sealey wrote:
>> Hmm, I think it might be appreciated for people looking at this stuff
>> (same as I stumbled into it) for a little comment on WHY the default
>> is 200. That way you don't wonder even if you know why EBSA110 has a
>> HZ=200 default, why Exynos is lumped in there too (to reduce the
>> number of interrupts firing?
>
> Err, _reduce_ ?
>
> Can you please explain why changing HZ from 100 to 200 is a reduction?

We were talking about HZ=1000 at the time, sorry...

>> Maybe the Exynos timer interrupt is kind
>> of a horrid core NMI kind of thing and it's desirable for it not to be
>> every millisecond,
>
> Huh?  HZ=100 is centisecond intervals...

See above..

> I think your understanding is just way off.  That default is
> there because that is the _architecture_ _default_ and there _has_ to
> be a default.  No, including kernel/Kconfig.hz won't give us any kind
> of non-specified default because, as I've already said in one of my
> other mails, you can't supplement Kconfig symbol definitions by
> declaring it multiple times.

Okay, so the real



>> I know where the 60Hz clocksource might come from, the old Amiga
>> platforms have one based on the PSU frequency (50Hz in Europe, 60Hz
>> US/Japan). Even a 60Hz clocksource is useful though (on the Amiga, at
>> least, it is precisely the vsync clock for synchronizing your display
>> output on TV-out, which makes it completely useful for the framebuffer
>> driver), but.. you just won't expect to assign it as sched_clock or
>> your delay timer. And if anyone does I'd expect they'd know full well
>> it'd not run so well.
>
> Except in the UK where it'd be 50Hz for the TV out.  (Lengthy irrelevant
> explanation why this is so for UK cut.)

Read again: "50Hz in Europe".

Australia too. I'm British and I used to have more EU-manufactured
Amigas than I knew what to do with.. so.. just like your NTP story, I
definitely know this already.

>> From that description, we are booting with standard HZ on ARM, and the
>> core sched_clock (as in we can call setup_sched_clock)
>> and/or/both/optionally using a real delay_timer switch to HRT mode if
>> we have the right equipment available in the kernel and at runtime on
>> the SoC.. but the process scheduler isn't compiled with the means to
>> actually take advantage of us being in HRT mode?
>
> Don't mix sched_clock() into this; it has nothing to do with HZ at all.
> You're confusing your apples with your oranges.

Okay..

>> A simple BUILD_BUG_ON and a BUG_ON right after each other in the
>> appropriate clocksource driver solves that.. if there's an insistence
>> on having at least some rope, we can put them in a field and tell them
>> they have to use the moon to actually hang themselves...
>
> No it doesn't - it introduces a whole load of new ways to make the
> kernel build or boot fail for pointless reasons - more failures, more
> regressions.
>
> No thank you.

But it would effectively stop users drinking the kool-aid.. if you
set your HZ to something stupid, you don't even get a kernel to
build, and you certainly don't get to boot past the first 40 lines of
boot messages.. I think most people would rather have a build error,
or an unmistakable, unmissable runtime warning, than a subtle and
almost imperceptible skew in NTP synchronization, to use your example.

-- 
Matt Sealey 
Product Development Analyst, Genesi USA, Inc.


Re: One of these things (CONFIG_HZ) is not like the others..

2013-01-21 Thread Matt Sealey
On Mon, Jan 21, 2013 at 4:42 PM, Russell King - ARM Linux
 wrote:
> On Mon, Jan 21, 2013 at 04:20:14PM -0600, Matt Sealey wrote:
>> I am sorry it sounded if I was being high and mighty about not being
>> able to select my own HZ (or being forced by Exynos to be 200 or by
>> not being able to test an Exynos board, forced to default to 100). My
>> real "grievance" here is we got a configuration item for the scheduler
>> which is being left out of ARM configurations which *can* use high
>> resolution timers, but I don't know if this is a real problem or not,
>> hence asking about it, and that HZ=100 is the ARM default whether we
>> might be able to select that or not.. which seems low.
>
> Well, I have a versatile platform here.  It's the intelligence behind
> the power control system for booting the boards on the nightly tests
> (currently disabled because I'm waiting for my main server to lock up
> again, and I need to use one of the serial ports for that.)
>
> The point is, it talks via I2C to a load of power monitors to read
> samples out.  It does this at sub-100Hz intervals.  Yet the kernel is
> built with HZ=100.  NO_HZ=y and highres timers are enabled... works
> fine.
>
> So, no, HZ=100 is not a limit in that scenario.  With NO_HZ=y and
> highres timers, it all works with epoll() - you get the interval that
> you're after.  I've verified this with calls to gettimeofday() and
> the POSIX clocks.

Okay.

So, can you read this (it's short):

http://ck.kolivas.org/patches/bfs/bfs-configuration-faq.txt

And please tell me if he's batshit crazy and I should completely
ignore any scheduler discussion that isn't ARM-specific, or maybe..
and I can almost guarantee this, he doesn't have an ARM platform so
he's just delightfully ill-informed about anything but his quad-core
x86?

>> HZ=250 is the "current" kernel default if you don't touch anything, it
>> seems, apologies for thinking it was HZ=100.
>
> Actually, it always used to be 100Hz on everything, including x86.
> It got upped when there were interactivity issues... which haven't
> been reported on ARM - so why change something that we know works and
> everyone is happy with?

I don't know. I guess this is why I included Ingo and Peter as they
seem to be responsible for core HZ-related things; why have HZ=250 on
x86 when CONFIG_NO_HZ and HZ=100 would work just as effectively? Isn't
CONFIG_NO_HZ the default on x86 and PPC and.. pretty much everything
else?

I know Con K. has been accused many times of peddling snake-oil... but
he has pretty graphs and benchmarks that kind of bear him out on most
things even if the results do not get his work upstream. I can't fault
the statistical significance of his results.. but even a placebo
effect can be graphed, correlation is not causation, etc, etc. - I
don't know if anything real filters down into the documentation
though.

>> And that is too high for
>> EBSA110 and a couple of other boards, especially where HZ must equal
>> some exact divisor being pumped right into some timer unit.
>
> EBSA110 can do 250Hz, but it'll mean manually recalculating the timer
> arithmetic - because it's not a "reloading" counter - software has to
> manually reload it, and you have to take account of how far it's
> rolled over to get anything close to a regular interrupt rate which
> NTP is happy with.  And believe me, it used to be one of two main NTP
> broadcasting servers on my network, so I know it works.

A-ha...

>> Anyway, a patch for ARM could perhaps end up like this:
>>
>> ~~
>> if ARCH_MULTIPLATFORM
>> source kernel/Kconfig.hz
>> else
>> HZ
>> default 100
>> endif
>>
>> HZ
>> default 200 if ARCH_EBSA110 || ARCH_ETC_ETC || ARCH_UND_SO_WEITER
>> # any previous platform definitions where *really* required here.
>> # but not default 100 since it would override kernel/Kconfig.hz every time
>
> That doesn't work - if you define the same symbol twice, one definition
> takes priority over the other (I don't remember which way it works).
> They don't accumulate.

Well I did some testing.. a couple days of poking around, and they
don't need to accumulate.

> Because... it simply doesn't work like that.  Try it and check to see
> what Kconfig produces.

I did test it.. whatever you define last, sticks, and it's down to the
order they're parsed in the tree - luckily, arch/arm/Kconfig is
sourced first, which sources the mach/plat stuff way down at  the
bottom. As long as you have your "default" set somewhere, any further
default just has to be sourced or added later in *one* of the
Kconfigs, same as building any C file with "gcc -E" and spitting it
out.

Someone, at the end of it all, has to set some

Re: One of these things (CONFIG_HZ) is not like the others..

2013-01-21 Thread Matt Sealey
On Mon, Jan 21, 2013 at 4:45 PM, Russell King - ARM Linux
 wrote:
> On Mon, Jan 21, 2013 at 10:30:07PM +, Arnd Bergmann wrote:
>> On Monday 21 January 2013, Matt Sealey wrote:
>> > So is that a bug in that it is not available to ARM right now, a bug
>> > in that it would be impossible for anyone on ARM to have ever tested
>> > this code, or a bug in that it should NEVER be enabled for ARM for
>> > some reason? John? Ingo? :)
>> >
>>
>> I think it's a bug that it's not available. That does not look intentional.
>
> What's a bug?  kernel/Kconfig.hz not being available?  No, it's
> intentional.  (See my replies).

The bug I saw as real is that CONFIG_SCHED_HRTICK is defined only in
kernel/Kconfig.hz (and used in kernel/sched only) - so if we want that
functionality enabled we will also have to opencode it in
arch/arm/Kconfig. Everyone else, by virtue of using kernel/Kconfig.hz,
gets this config item enabled for free if they have hrtimers or
generic smp helpers.. if I understood what John just said, this means
on ARM, since we don't use kernel/Kconfig.hz and we don't also define
an item for CONFIG_SCHED_HRTICK, the process scheduler is completely
oblivious that we're running in HRT mode?

The thing I don't know is real is if that really matters one bit..

-- 
Matt Sealey 
Product Development Analyst, Genesi USA, Inc.


Re: One of these things (CONFIG_HZ) is not like the others..

2013-01-21 Thread Matt Sealey
On Mon, Jan 21, 2013 at 4:36 PM, John Stultz  wrote:
> On 01/21/2013 01:14 PM, Matt Sealey wrote:
>>
>> On Mon, Jan 21, 2013 at 3:00 PM, John Stultz 
>> wrote:
>>>
>>> On 01/21/2013 12:41 PM, Arnd Bergmann wrote:
>>>>
>>>> Right. It's pretty clear that the above logic does not work
>>>> with multiplatform.  Maybe we should just make ARCH_MULTIPLATFORM
>>>> select NO_HZ to make the question much less interesting.
>>>
>>> Although, even with NO_HZ, we still have some sense of HZ.
>>
>> In this case, no matter whether CONFIG_HZ=1000 or CONFIG_HZ=250 (for
>> example) combined with CONFIG_NO_HZ and less than e.g. 250 things
>> happening per second will wake up "exactly" the same number of times?
>
> Ideally, if both systems are completely idle, they may see similar number of
> actual interrupts.
>
> But when the cpus are running processes, the HZ=1000 system will see more
> frequent interrupts, since the timer/scheduler interrupt will jump in 4
> times more frequently.

Understood..

>> CONFIG_HZ=1000 with CONFIG_NO_HZ would be an effective, all-round
>> solution here, then, and CONFIG_HZ=100 should be a reasonable default
>> (as it is anyway with an otherwise-unconfigured kernel on any other
>> platform) for !CONFIG_NO_HZ.
>
> Eeehhh... I'm not sure this is follows.

Okay, I'm happy to be wrong on this...

>> As above, or "not select anything at all" since HZ=100 if you don't
>> touch anything, right?
>
> Well, Russell brought up a case that doesn't handle this. If a system
> *can't* do HZ=100, but can do HZ=200.
>
> Though there are hacks, of course, that might get around this (skip every
> other interrupt at 200HZ).

Hmm, I think people looking at this stuff (the same way I stumbled
into it) would appreciate a little comment on WHY the default is 200.
That way, even if you know why EBSA110 has an HZ=200 default, you
don't have to wonder why Exynos is lumped in there too (to reduce the
number of interrupts firing? maybe the Exynos timer interrupt is some
horrid core-NMI kind of thing and it's desirable for it not to fire
every millisecond, or maybe it has the same restrictions as EBSA110 -
but where would anyone go to find out this information?)

>> If someone picks HZ=1000 and their platform can't support it, then
>> that's their own damn problem (don't touch things you don't
>> understand, right? ;)
>
> Well, ideally with kconfig we try to add proper dependencies so impossible
> options aren't left to the user.
> HZ is a common enough knob to turn on most systems, I don't know if leaving
> the user rope to hang himself is a great idea.

I think then the default 100 at the end of the arch/arm/Kconfig is
saying "you are not allowed to know that such a thing as rope even
exists," when in fact what we should be doing is just making sure they
can't swing it over the rafters.. am I taking the analogy too far? :)

>> My question really has to be is CONFIG_SCHED_HRTICK useful, what
>> exactly is it going to do on ARM here since nobody can ever have
>> enabled it? Is it going to keel over and explode if nobody registers a
>> non-jiffies sched_clock (since the jiffies clock is technically
>> reporting itself as a ridiculously high resolution clocksource..)?
>
> ??? Not following this at all.  jiffies is the *MOST* coarse resolution
> clocksource there is (at least that I'm aware of.. I recall someone wanting
> to do a 60Hz clocksource, but I don't think that ever happened).

Is that based on its clocksource rating (probably worse than a real
hrtimer) or its reported resolution? Because on i.MX51, if I force it
to use the jiffies clock, the debug on the kernel log is telling me it
has a higher resolution (it TELLS me that it ticks "as fast" as the
CPU frequency and wraps less than my real timer).

I know where the 60Hz clocksource might come from, the old Amiga
platforms have one based on the PSU frequency (50Hz in Europe, 60Hz
US/Japan). Even a 60Hz clocksource is useful though (on the Amiga, at
least, it is precisely the vsync clock for synchronizing your display
output on TV-out, which makes it completely useful for the framebuffer
driver), but.. you just won't expect to assign it as sched_clock or
your delay timer. And if anyone does I'd expect they'd know full well
it'd not run so well.

>> Or is this one of those things that if your platform doesn't have a
>> real high resolution timer, you shouldn't enable HRTIMERS and
>> therefore not enable SCHED_HRTICK as a result? That affects
>> ARCH_MULTIPLATFORM here. Is the solution as simple as
>> ARCH_MULTIPLATFORM compliant platforms kind of have to have a high
>> resolution timer? Documentation to that effec

Re: One of these things (CONFIG_HZ) is not like the others..

2013-01-21 Thread Matt Sealey
to 100
on its own, then that "default 100" is overly restrictive and we
could remove it, allowing each {mach,plat}-*/Kconfig owner to
investigate and find the correct HZ value and implement an override or
selection, or just allow free configuration?

As far as I can tell AT91 and SHMOBILE only supply defaults because HZ
*must* meet some exact timer divisor (OMAP says "Kernel internal timer
frequency should be a divisor of 32768") in which case their timer
drivers should not be so stupid and instead round down to the nearest
acceptable timer divisor or WARN_ON if the compile-time values are
unacceptable at runtime before anyone sees any freakish behavior. Is
it a hard requirement for the ARM architecture that a woefully
mis-configured kernel MUST boot completely to userspace?
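
(To spell out the OMAP comment: 32768 is 2^15, so HZ has to be a
power of two - 128, 256 and so on - for the tick to divide evenly;
32768 / 100 = 327.68, so HZ=100 on a 32kHz timer either drifts or
needs a refined-jiffies style correction. That's my reading of it,
anyway.)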

--
Matt Sealey 
Product Development Analyst, Genesi USA, Inc.


Re: One of these things (CONFIG_HZ) is not like the others..

2013-01-21 Thread Matt Sealey
On Mon, Jan 21, 2013 at 3:00 PM, John Stultz  wrote:
> On 01/21/2013 12:41 PM, Arnd Bergmann wrote:
>>
>> Right. It's pretty clear that the above logic does not work
>> with multiplatform.  Maybe we should just make ARCH_MULTIPLATFORM
>> select NO_HZ to make the question much less interesting.
>
> Although, even with NO_HZ, we still have some sense of HZ.

I wonder if you can confirm my understanding of this by the way? The
way I think this works is;

CONFIG_HZ on its own defines the rate at which the kernel wakes up
from sleeping on the job, and checks for current or expired timer
events such that it can do things like schedule_work (as in
workqueues) or perform scheduler (as in processes/tasks) operations.

CONFIG_NO_HZ turns on logic which effectively only wakes up at a
*maximum* of CONFIG_HZ times per second, but otherwise will go to
sleep and stay that way if no events actually happened (so, we rely on
a timer interrupt popping up).

In this case, no matter whether CONFIG_HZ=1000 or CONFIG_HZ=250 (for
example) combined with CONFIG_NO_HZ and less than e.g. 250 things
happening per second will wake up "exactly" the same number of times?

CONFIG_HZ=1000 with CONFIG_NO_HZ would be an effective, all-round
solution here, then, and CONFIG_HZ=100 should be a reasonable default
(as it is anyway with an otherwise-unconfigured kernel on any other
platform) for !CONFIG_NO_HZ.

I have to admit, the only reason I noticed the above is because I was
reading one of CK's BFS logs and reading it makes it seem like the
above is the case, but I have no idea if he thinks BFS makes that the
case or if the current CFQ scheduler makes that the case, or if this
is simply.. the case.. (can you see this is kind of confusing to me as
this is basically not written anywhere except maybe an LWN article
from 2008 I read up on? :)

>> Regarding the defaults, I would suggest putting them into all the
>> defaults into the defconfig files and removing the other hardcoding
>> otherwise. Ben Dooks and Russell are probably the best to know
>> what triggered the 200 HZ for s3c24xx and for ebsa110. My guess
>> is that the other samsung ones are the result of cargo cult
>> programming.
>>
>> at91 and omap set the HZ value to something that is derived
>> from their hardware timer, but we have also forever had logic
>> to calculate the exact time when that does not match. This code
>> has very recently been moved into the new register_refined_jiffies()
>> function. John can probably tell is if this solves all the problems
>> for these platforms.
>
>
> Yea, as far as timekeeping is concerned, we shouldn't be HZ dependent (and
> the register_refined_jiffies is really only necessary if you're not
> expecting a proper clocksource to eventually be registered), assuming the
> hardware can do something close to the HZ value requested.
>
> So I'd probably want to hear about what history caused the specific 200 HZ
> selections, as I suspect there's actual hardware limitations there. So if
> you can not get actual timer ticks any faster then 200 HZ on that hardware,
> setting HZ higher could cause some jiffies related timer trouble (ie: if the
> kernel thinks HZ is 1000 but the hardware can only do 200, that's a
> different problem then if the hardware actually can only do 999.8 HZ). So
> things like timer-wheel timeouts may not happen when they should.
>
> I suspect the best approach for multi-arch in those cases may be to select
> HZ=100

As above, or "not select anything at all" since HZ=100 if you don't
touch anything, right?

If someone picks HZ=1000 and their platform can't support it, then
that's their own damn problem (don't touch things you don't
understand, right? ;)

> and use HRT to allow more modern systems to have finer-grained
> timers.

My question really has to be is CONFIG_SCHED_HRTICK useful, what
exactly is it going to do on ARM here since nobody can ever have
enabled it? Is it going to keel over and explode if nobody registers a
non-jiffies sched_clock (since the jiffies clock is technically
reporting itself as a ridiculously high resolution clocksource..)?

Or is this one of those things that if your platform doesn't have a
real high resolution timer, you shouldn't enable HRTIMERS and
therefore not enable SCHED_HRTICK as a result? That affects
ARCH_MULTIPLATFORM here. Is the solution as simple as
ARCH_MULTIPLATFORM compliant platforms kind of have to have a high
resolution timer? Documentation to that effect?

-- 
Matt Sealey 
Product Development Analyst, Genesi USA, Inc.


Re: One of these things (CONFIG_HZ) is not like the others..

2013-01-21 Thread Matt Sealey
On Mon, Jan 21, 2013 at 2:41 PM, Arnd Bergmann  wrote:
> On Monday 21 January 2013, Matt Sealey wrote:
>>
>> ARM seems to be the only "major" platform not using the
>> kernel/Kconfig.hz definitions, instead rolling it's own and setting
>> what could be described as both reasonable and unreasonable defaults
>> for platforms. If we're going wholesale for multiplatform on ARM then
>> having CONFIG_HZ be selected dependent on platform options seems
>> rather curious since building a kernel for Exynos, OMAP or so will
>> force the default to a value which is not truly desired by the
>> maintainers.
>
> Agreed 100%.
>
> (adding John Stultz to Cc, he's the local time expert)

Hi, John! Welcome to the fray :)

>> config HZ
>> int
>> default 200 if ARCH_EBSA110 || ARCH_S3C24XX || ARCH_S5P64X0 || \
>> ARCH_S5PV210 || ARCH_EXYNOS4
>> default OMAP_32K_TIMER_HZ if ARCH_OMAP && OMAP_32K_TIMER
>> default AT91_TIMER_HZ if ARCH_AT91
>> default SHMOBILE_TIMER_HZ if ARCH_SHMOBILE
>> default 100
>>

[snip]

>> Either way, if I boot a kernel on i.MX6, CONFIG_HZ depends on the
>> other ARM platforms I also want to boot on it.. this is not exactly
>> multiplatform compliant, right?
>
> Right. It's pretty clear that the above logic does not work
> with multiplatform.  Maybe we should just make ARCH_MULTIPLATFORM
> select NO_HZ to make the question much less interesting.
>
> Regarding the defaults, I would suggest putting them into all the
> defaults into the defconfig files and removing the other hardcoding
> otherwise. Ben Dooks and Russell are probably the best to know
> what triggered the 200 HZ for s3c24xx and for ebsa110. My guess
> is that the other samsung ones are the result of cargo cult
> programming.
>
> at91 and omap set the HZ value to something that is derived
> from their hardware timer, but we have also forever had logic
> to calculate the exact time when that does not match. This code
> has very recently been moved into the new register_refined_jiffies()
> function. John can probably tell is if this solves all the problems
> for these platforms.

I would be very interested. My plan would be then (providing John
responds in the affirmative) to basically submit a patch to remove the
8 lines pasted above and source kernel/Kconfig.hz instead. I'm doing
this now on a local kernel tree and I can't see any real problem with
it.

It would then be up to the above-mentioned maintainers to decide if
they are part of the cargo cult and don't need it or refine their
board files to match the New World Order of using Kconfig.hz. The
unconfigured kernel default is 100 anyway, which is lower than all
the above default settings, so I would technically be causing a regression
on those platforms... do I want to be responsible for that? Probably
not, but as I said, it's not affecting (in fact, it may be
*improving*) the platforms I care about.

>> Additionally, using kernel/Kconfig.hz is a predicate for enabling
>> (forced enabling, even) CONFIG_SCHED_HRTICK which is defined nowhere
>> else. I don't know how many ARM systems here benefit from this, if
>> there is a benefit, or what this really means.. if you really have a
>> high resolution timer (and hrtimers enabled) that would assist the
>> scheduler this way, is it supposed to make a big difference to the way
>> the scheduler works for the better or worse? Is this actually
>> overridden by ARM sched_clock handling or so? Shouldn't there be a
>> help entry or some documentation for what this option does? I have
>> CC'd the scheduler maintainers because I'd really like to know what I
>> am doing here before I venture into putting patches out which could
>> potentially rip open spacetime and have us all sucked in..
>
> Yes, that sounds like yet another bug.

So is that a bug in that it is not available to ARM right now, a bug
in that it would be impossible for anyone on ARM to have ever tested
this code, or a bug in that it should NEVER be enabled for ARM for
some reason? John? Ingo? :)

-- 
Matt Sealey 
Product Development Analyst, Genesi USA, Inc.


One of these things (CONFIG_HZ) is not like the others..

2013-01-21 Thread Matt Sealey
y
the scheduler works for the better or worse? Is this actually
overridden by ARM sched_clock handling or so? Shouldn't there be a
help entry or some documentation for what this option does? I have
CC'd the scheduler maintainers because I'd really like to know what I
am doing here before I venture into putting patches out which could
potentially rip open spacetime and have us all sucked in..

And I guess I have one more question before I do attempt to open that
tear: what really is the effect of CONFIG_HZ vs. CONFIG_NO_HZ vs. ARM
sched_clock, of using the new hooks to register a real timer as the
ARM delay_timer, and is CONFIG_SCHED_HRTICK a good or bad idea? I
have patches I can modify for upstream that add device tree support
and probing of the i.MX highres clocksources (GPT and EPIT), plus
registration of sched_clock and delay timer implementations based on
these clocks. The code compiles and seems to work, but the ACTUAL
effect of these (and the fundamental requirements for the clocks
being used) seems to be information only in the minds of the people
who wrote the code. It's not that obvious to me what the true effect
of using a non-architected ARM timer for at least the delay_timer is,
and when I use cpufreq on i.MX5 I see some really odd lpj values and
very strange re-calibrations popping out (with a constant timer rate,
lpj goes down.. when using the delay_timer implementation, shouldn't
lpj still be relative to the timer rate and NOT the cpu frequency?).
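
(Back-of-envelope, and this is my assumption rather than something I
have confirmed in the code: with a registered delay timer I'd expect
lpj to settle at roughly timer_freq / HZ - e.g. 66.5MHz / 100 =
665000 - and stay put across cpufreq changes, since udelay should now
be counting timer ticks rather than CPU loops.)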

Apologies for the insane number of questions here, but fully
appreciative of any answers,

--
Matt Sealey 
Product Development Analyst, Genesi USA, Inc.


[PATCH] mc13892: sanity check num_regulators parsed vs. registered

2013-01-21 Thread Matt Sealey
Imagine a situation where a device tree has a few regulators in an
appropriate node:

regulators {
sw1 {
..
};

vvideo {
..
};

:

vfake {
..
};

vtypo {
..
};
};

In the above example, the node name "vfake" is an attempt to match a
regulator name inside the driver which just so happens to not exist. The
node name "vtypo" represents an accidental typographical error in a
regulator name which may have been introduced to a device tree.

In these cases, the number of regulators the mc13892 driver thinks it has
does not match the number of regulators it parsed and registered. Since
it will go over this array based on this number, it will actually
re-register regulator "0" (which happens to be SW1) over and over
again until it reaches the number, resulting in messages on the kernel
log such as these:

SW1: at 1100 mV
VVIDEO: at 2775mV
:
SW1: at 1100 mV
SW1: at 1100 mV

.. up to that number of "mismatched" regulators. Nobody using DT can/will
consume these regulators, so it should not be possible for it to cause any
real regulator problems or driver breakages, but it is an easy thing to
miss in a kernel log and is an immediate indication of a problem with the
device tree authoring.

This patch effectively sanity checks the number of counted children of
the regulators node vs. the number that actually matched driver names,
and sets the appropriate num_regulators value. It also gives a little
warning for device tree authors that they MAY have screwed something up,
such that this patch does not hide the device tree authoring problem.

Signed-off-by: Matt Sealey 
Tested-by: Steev Klimaszewski 
Cc: Shawn Guo 
Cc: Fabio Estevam 
---
 drivers/regulator/mc13892-regulator.c  |   39 ++--
 drivers/regulator/mc13xxx-regulator-core.c |   10 +--
 drivers/regulator/mc13xxx.h|4 +--
 3 files changed, 47 insertions(+), 6 deletions(-)

diff --git a/drivers/regulator/mc13892-regulator.c b/drivers/regulator/mc13892-regulator.c
index e4f2197..7c86040 100644
--- a/drivers/regulator/mc13892-regulator.c
+++ b/drivers/regulator/mc13892-regulator.c
@@ -535,15 +535,18 @@ static int mc13892_regulator_probe(struct platform_device *pdev)
struct mc13xxx_regulator_init_data *mc13xxx_data;
struct regulator_config config = { };
int i, ret;
-   int num_regulators = 0;
+   int num_regulators = 0, num_parsed;
u32 val;
 
num_regulators = mc13xxx_get_num_regulators_dt(pdev);
+
if (num_regulators <= 0 && pdata)
num_regulators = pdata->num_regulators;
if (num_regulators <= 0)
return -EINVAL;
 
+   num_parsed = num_regulators;
+
priv = devm_kzalloc(&pdev->dev, sizeof(*priv) +
num_regulators * sizeof(priv->regulators[0]),
GFP_KERNEL);
@@ -586,7 +589,39 @@ static int mc13892_regulator_probe(struct platform_device *pdev)
= mc13892_vcam_get_mode;
 
mc13xxx_data = mc13xxx_parse_regulators_dt(pdev, mc13892_regulators,
-   ARRAY_SIZE(mc13892_regulators));
+   ARRAY_SIZE(mc13892_regulators),
+   &num_parsed);
+
+   /*
+* Perform a little sanity check on the regulator tree - if we found
+* a number of regulators from mc13xxx_get_num_regulators_dt and
+* then parsed a smaller number in mc13xxx_parse_regulators_dt then
+* there is a regulator defined in the regulators node which has
+* not matched any usable regulator in the driver. In this case,
+* there is one missing and what will happen is the first regulator
+* will get registered again.
+*
+* Fix this by basically making our number of registerable regulators
+* equal to the number of regulators we parsed. We are allocating
+* too much memory for priv, but this is unavoidable at this point.
+*
+* As an example of how this can happen, try making a typo in your
+* regulators node (vviohi {} instead of viohi {}) so that the name
+* does not match..
+*
+* The check will basically pass for platform data (non-DT) because
+* mc13xxx_parse_regulators_dt for !CONFIG_OF will not touch num_parsed.
+*
+*/
+   if (num_parsed != num_regulators) {
+   dev_warn(&pdev->dev,
+   "parsed %d != regulators %d - check your device tree!\n",
+   num_parsed, num_regulators);
+
+   num_regulators = num_parsed;
+   priv->num_regulators = num_regulators;
+   }
+
for (i = 0; i < num_regulators; i++) {
struct regulator_init_data *init_data;
st

[PATCH] mc13892-regulator: correct/refine handling of the SWxHI bit

2013-01-21 Thread Matt Sealey
MC13892 PMIC supports a "HI" bit for 3 of its 4 buck switcher outputs,
which enables a higher set of voltage ranges.

Despite a comment in the code ('sw regulators need special care due to the
"hi" bit'), it actually does not take special care since it does not modify
its use of the selector table index when this bit is set, giving us very
odd behavior when setting a high voltage on supported switchers or listing
current voltages. Net effect is in best case the kernel and sysfs report
lower voltages than are actually set in hardware (1300mV instead of 1800mV
for example) and in the worst case setting a voltage (e.g. 1800mV) will cause
an undervoltage condition (e.g. 1300mV).

Correct the behavior, taking into account SW1 doesn't support the HI bit,
and as such we need to ignore it.

While we are modifying these functions, fix and optimize the following;

* set_voltage_sel callback was using .reg instead of .vsel_reg - since
  they were set to the same value it actually didn't break anything but
  it would be semantically incorrect to use .reg in this case. We now use
  .vsel_reg and be consistent.
* vsel_shift is always 0 for every SWx regulator, and constantly shifting
  and masking off the bottom few bits is time consuming and makes the
  code very hard to read - optimize this out.
* get_voltage_sel uses the variable "val" and set_voltage_sel uses the
  variable "selector" (and reg_value). Introduce the variable "selector"
  to get_voltage_sel such that it makes more sense and allow some leaner
  code in light of the modifications in this patch. Add better exposure
  to the debug print so the register value AND the selector are printed as
  this will adequately show the HI bit in the register.
* correct a comment in probe which is doing a version check. Magic
values are awful, but for this one instance a comment does just as
  good a job as something symbolic.
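
For illustration: the 1800mV case above sits exactly 20 entries
(20 x 25mV = 500mV) above 1300mV in the SWx voltage table, so when
the HI bit is set but ignored, every reported or programmed selector
ends up 20 entries too low - hence the MC13892_SWxHI_SEL_OFFSET of 20
introduced below. (The arithmetic here is derived from the table
spacing, not quoted from the datasheet.)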

Signed-off-by: Matt Sealey 
Tested-by: Steev Klimaszewski 
Cc: Shawn Guo 
Cc: Fabio Estevam 
---
 drivers/regulator/mc13892-regulator.c |   72 +
 1 file changed, 56 insertions(+), 16 deletions(-)

diff --git a/drivers/regulator/mc13892-regulator.c b/drivers/regulator/mc13892-regulator.c
index 0d84b1f..169457b 100644
--- a/drivers/regulator/mc13892-regulator.c
+++ b/drivers/regulator/mc13892-regulator.c
@@ -164,6 +164,14 @@ static const unsigned int mc13892_sw1[] = {
1350000, 1375000
 };
 
+/*
+ * Note: this table is used to derive SWxVSEL by index into
+ * the array. Offset the values by the index of 1100000uV
+ * to get the actual register value for that voltage selector
+ * if the HI bit is to be set as well.
+ */
+#define MC13892_SWxHI_SEL_OFFSET   20
+
 static const unsigned int mc13892_sw[] = {
600000, 625000, 650000, 675000, 700000, 725000,
750000, 775000, 800000, 825000, 850000, 875000,
@@ -239,7 +247,6 @@ static const unsigned int mc13892_pwgtdrv[] = {
 };
 
 static struct regulator_ops mc13892_gpo_regulator_ops;
-/* sw regulators need special care due to the "hi bit" */
 static struct regulator_ops mc13892_sw_regulator_ops;
 
 
@@ -396,7 +403,7 @@ static int mc13892_sw_regulator_get_voltage_sel(struct regulator_dev *rdev)
 {
struct mc13xxx_regulator_priv *priv = rdev_get_drvdata(rdev);
int ret, id = rdev_get_id(rdev);
-   unsigned int val;
+   unsigned int val, selector;
 
dev_dbg(rdev_get_dev(rdev), "%s id: %d\n", __func__, id);
 
@@ -407,12 +414,28 @@ static int mc13892_sw_regulator_get_voltage_sel(struct regulator_dev *rdev)
if (ret)
return ret;
 
-   val = (val & mc13892_regulators[id].vsel_mask)
-   >> mc13892_regulators[id].vsel_shift;
+   /*
+* Figure out if the HI bit is set inside the switcher mode register
+* since this means the selector value we return is at a different
+* offset into the selector table.
+*
+* According to the MC13892 documentation note 59 (Table 47) the SW1
+* buck switcher does not support output range programming therefore
+* the HI bit must always remain 0. So do not do anything strange if
+* our register is MC13892_SWITCHERS0.
+*/
+
+   selector = val & mc13892_regulators[id].vsel_mask;
+
+   if ((mc13892_regulators[id].vsel_reg != MC13892_SWITCHERS0) &&
+   (val & MC13892_SWITCHERS0_SWxHI)) {
+   selector += MC13892_SWxHI_SEL_OFFSET;
+   }
 
-   dev_dbg(rdev_get_dev(rdev), "%s id: %d val: %d\n", __func__, id, val);
+   dev_dbg(rdev_get_dev(rdev), "%s id: %d val: 0x%08x selector: %d\n",
+   __func__, id, val, selector);
 
-   return val;
+   return selector;
 }
 
 static int mc13892_sw_regulator_set_voltage_sel(struct regulator_dev *rdev,
@@ -425,18 +448,35 @@ static int mc13892_sw_re

[PATCH] crypto: fix FTBFS with ARM SHA1-asm and THUMB2_KERNEL

2013-01-21 Thread Matt Sealey
The optimized assembler SHA1 code for ARM does not conform to Thumb2
register usage requirements, so it cannot be built when the kernel is
configured with THUMB2_KERNEL.

Fix the FTBFS for now by preventing misconfigurations of the kernel.

Signed-off-by: Matt Sealey 
---
 crypto/Kconfig |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 4641d95..304d60b 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -472,7 +472,7 @@ config CRYPTO_SHA1_SPARC64
 
 config CRYPTO_SHA1_ARM
tristate "SHA1 digest algorithm (ARM-asm)"
-   depends on ARM
+   depends on ARM && !THUMB2_KERNEL
select CRYPTO_SHA1
select CRYPTO_HASH
help
-- 
1.7.10.4



Re: Compilation problem with drivers/staging/zsmalloc when !SMP on ARM

2013-01-21 Thread Matt Sealey
On Fri, Jan 18, 2013 at 10:46 PM, Konrad Rzeszutek Wilk
 wrote:
> On Fri, Jan 18, 2013 at 07:11:32PM -0600, Matt Sealey wrote:
>> On Fri, Jan 18, 2013 at 3:08 PM, Russell King - ARM Linux
>>
>> I'm gonna put this out to the maintainers (Konrad, and Seth since he
>> committed it) that if this code is buggy it gets taken back out, even
>> if it makes zsmalloc "slow" on ARM, for the following reasons:
>
> Just to make sure I understand, you mean don't use page table
> mapping but instead use copying?

Yes, just back out the USE_PGTABLE_MAPPING code. But as I just replied
to Minchan, maybe there is a better way.

The only real problem here apart from the non-per-cpu usage Russell
describes (which does not affect UP systems anyway) is that without
CONFIG_SMP we have a FTBFS.

However, I am sure you agree that enabling the pagetable mapping
optimization only on "high end" systems and leaving the low end on
the slow path is an odd fix. It *is* odd.
disabling the code on !CONFIG_SMP is just *hiding* a misuse of a
non-exported mm function...

The FTBFS showed the problem, I don't want the fix to be to hide it,
which is why I brought it up.

>> * It's buggy on SMP as Russell describes above
>> * It might not be buggy on UP (opposite to Russell's description above
>> as the restrictions he states do not exist), but that would imply an
>> export for a really core internal MM function nobody should be using
>> anyway
>> * By that assessment, using that core internal MM function on SMP is
>> also bad voodoo that zsmalloc should not be doing
>
>  'local_tlb_flush' is bad voodoo?

See previous mail to Minchan; local_tlb_flush_kernel_range calls
cpu_tlb.flush_kernel_range on SMP, but a direct function call
("glue(_TLB, flush_kernel_range)" which resolves to
v7wbi_flush_kernel_range etc. etc.) without CONFIG_SMP. That direct
function call it resolves to is not an export and Russell just said he
won't accept a patch to export it.
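
For anyone following along, the dispatch looks roughly like this - this
is paraphrased from memory of arch/arm/include/asm/tlbflush.h (~3.8), so
treat the exact macro names as approximate rather than authoritative:

#ifdef MULTI_TLB
/* the SMP config discussed here goes indirect through the cpu_tlb ops
 * struct, which a module such as zsmalloc can link against */
#define __cpu_flush_kern_tlb_range      cpu_tlb.flush_kern_range
#else
/* the UP build takes a direct call which the glue macro turns into e.g.
 * v7wbi_flush_kern_tlb_range; that symbol carries no EXPORT_SYMBOL,
 * hence the link failure when zsmalloc is built as a module */
#define __cpu_flush_kern_tlb_range      __glue(_TLB, _flush_kern_tlb_range)
#endif

#define local_flush_tlb_kernel_range(s, e)   __cpu_flush_kern_tlb_range(s, e)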

> If you have an ARM server that you would be willing to part with I would
> be thrilled to look at it.



>> diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c
>> b/drivers/staging/zsmalloc/zsmalloc-main.c
>> index 09a9d35..ecf75fb 100644
> > --- a/drivers/staging/zsmalloc/zsmalloc-main.c
>> +++ b/drivers/staging/zsmalloc/zsmalloc-main.c
>> @@ -228,7 +228,7 @@ struct zs_pool {
>>   * mapping rather than copying
>>   * for object mapping.
>>  */
>> -#if defined(CONFIG_ARM)
>> +#if defined(CONFIG_ARM) && defined(CONFIG_SMP)
>>  #define USE_PGTABLE_MAPPING
>>  #endif
>>
>> .. such that it even compiles in both "guess" configurations, the
>> slower Cortex-A8 600MHz single core system gets to use the slow copy
>> path and the dual-core 1GHz+ Cortex-A9 (with twice the RAM..) gets to
>> use the fast mapping path. Essentially all the patch does is "improve
>> performance" on the fastest, best-configured, large-amounts-of-RAM,
>> lots-of-CPU-performance ARM systems (OMAP4+, Snapdragon, Exynos4+,
>> marvell armada, i.MX6..) while introducing the problems Russell
>> describes, and leave performance exactly the same and potentially far
>> more stable on the slower, memory-limited ARM machines.
>
> Any ideas on how to detect that?

Nope, sorry. It would rely on knowing the precise configuration *and*
the user intent of using zram which is far beyond the scope of zram or
zsmalloc itself. It could be a VM, a phone with only 256MB RAM and
slow storage, it could be a device with 2TB RAM but no fast local
storage.. whether using it as a compressed block device or a swap
device that is compressed, you can never know inside the driver what
the real use is, except via the kernel build-time config.

All we can know in zram, at build time, is whether we're configuring
for SMP, UP-on-SMP, or UP (!SMP) and which code path makes more sense
to build (ideally, SMP and UP would run the pagetable mapping code
alike). If we can make the pagetable mapping code compile on UP
without the above patch being required, and it has the same effect,
then this would be the best solution. Then the code needs to be fixed
for proper operation on SMP anyway.

If there are still a bunch of export problems with an alternate method
of flushing the tlb for a range of kernel memory exposed by trying a
different way around, this is just proving an issue here that the ARM
guys disagree that things that can be built as modules should be doing
such things, or that the cpu_tlb.flush_kernel_range vs.
v7wbi_tlb_flush_kernel_range export thing is confusing as crap at the
very least in that the CPU topology model the kernel lives by and is
compiled for at build time causes build breakages if you don't want
(or have) your driver to be knowledgeable of the differences :)

>> can build for SMP and UP and get the same code.. with less bugs.
>
> I get that you want to have this fixed right now.

Somewhat. It is not urgent, since I fixed it for now in my tree, I
am just worried that if that fix goes upstream it hides a real, more
important issue here.

I am mostly only concerned about whether zram

Re: Compilation problem with drivers/staging/zsmalloc when !SMP on ARM

2013-01-21 Thread Matt Sealey
Hi Minchan,

On Sun, Jan 20, 2013 at 11:55 PM, Minchan Kim  wrote:
> On Fri, Jan 18, 2013 at 11:46:02PM -0500, Konrad Rzeszutek Wilk wrote:
>> On Fri, Jan 18, 2013 at 07:11:32PM -0600, Matt Sealey wrote:
>> >
>> > diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c
>> > b/drivers/staging/zsmalloc/zsmalloc-main.c
>> > index 09a9d35..ecf75fb 100644
>>   > --- a/drivers/staging/zsmalloc/zsmalloc-main.c
>> > +++ b/drivers/staging/zsmalloc/zsmalloc-main.c
>> > @@ -228,7 +228,7 @@ struct zs_pool {
>> >   * mapping rather than copying
>> >   * for object mapping.
>> >  */
>> > -#if defined(CONFIG_ARM)
>> > +#if defined(CONFIG_ARM) && defined(CONFIG_SMP)
>> >  #define USE_PGTABLE_MAPPING
>
> I don't get it. How to prevent the problem Russel described?
> The problem is that other CPU can prefetch _speculatively_ under us.

It prevents no problems, but if that isn't there, kernels build
without SMP support (i.e. specifically uniprocessor kernels) will fail
at the linker stage.

That's not desirable.

We have 3 problems here, this solves the first of them, and creates
the third. The second is constant regardless..

1) zsmalloc will not build on ARM without CONFIG_SMP because on UP
local_tlb_flush_kern_range resolves to a function that needs an export
which the SMP build does not require

Basically, with CONFIG_SMP (and CONFIG_UP_ON_SMP),
local_tlb_flush_kern_range is calling functions by dereference from
the per-cpu global cpu_tlb structure.

On UP (!CONFIG_SMP), it is calling functions directly (in my case,
v7wbi_local_tlb_flush_kern_range or whatever, but on v6, v5, v4 ARM
processor kernel builds it may be different) which need to be exported
outside of the MM core.

If this difference is going to stick around - Russell is refusing here
to export that/those direct functions - then the optimized vm mapping
code simply should never be allowed to run on non-SMP systems to keep
it building for everyone.

The patch above is simply a build fix for !CONFIG_SMP in this case to
force it to use the slow path for systems where the above missing
export problem will cause the linker failure.

2) the optimized vm mapping isn't per-cpu aware as per Russell's
arguments. I'll let you guys discuss that as I have no idea what the
real implications are for SMP systems (and my current testing is only
on a non-SMP CPU, I will have to go grab a couple boards from the lab
for SMP)

3) it somewhat defeats the purpose of the optimization if UP systems
(which tend to have less memory and might benefit from things like
zsmalloc/zram more) cannot use it, but SMP systems which tend to have
more memory (unless we're talking about a frugal configuration of a
virtual machine, perhaps). Given the myriad use cases of zram that is
not a pervasive or persuasive argument, I know, but it does seem
slightly backwards.

> If I don't miss something, we could have 2 choice.
>
> 1) use flush_tlb_kernel_range instead of local_flush_tlb_kernel_range
> Or
> 2) use only memory copy
>
> I guess everybody want 2 because it makes code very simple and maintainable.
> Even, rencently Joonsoo sent optimize patch.
> Look at https://lkml.org/lkml/2013/1/16/68 so zram/zcache effect caused by 2
> would be minimized.
>
> But please give me the time and I will borrow quad-core embedded target board
> and test 1 on the phone with Seth's benchmark.

In the meantime please take into account building a non-SMP kernel for
this board and testing that; if there is a way to do the flush without
using the particular function which uses the particular export that
Russell will not export, then that would be better. Maybe for
!CONFIG_SMP using flush_tlb_kernel_range is doing the exact same job
and the real patch is not to disable the optimization with
!CONFIG_SMP, but to additionally #if defined(CONFIG_SMP) around
local_flush_tlb_kernel_range and alternatively for UP use
flush_tlb_kernel_range which.. probably.. doesn't use that contentious
export?
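
To make that concrete, the alternative would look something like this
inside zsmalloc's flush path (untested sketch, not a patch - and whether
the UP definition of flush_tlb_kernel_range really avoids the unexported
symbol is exactly the open question above):

#ifdef CONFIG_SMP
	/* per-cpu local flush, as the optimization does today */
	local_flush_tlb_kernel_range(start, end);
#else
	/* UP: let the generic kernel-range flush do the same job */
	flush_tlb_kernel_range(start, end);
#endif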

This is far beyond the level I want to be digging around in the Linux
kernel so I am not comfortable even trying that to find out.

-- 
Matt Sealey 
Product Development Analyst, Genesi USA, Inc.

[PATCH] mc13892-regulator: correct/refine handling of the SWxHI bit

2013-01-21 Thread Matt Sealey
MC13892 PMIC supports a HI bit for 3 of its 4 buck switcher outputs,
which enables a higher set of voltage ranges.

Despite a comment in the code ('sw regulators need special care due to the
hi bit'), it actually does not take special care since it does not modify
its use of the selector table index when this bit is set, giving us very
odd behavior when setting a high voltage on supported switchers or listing
current voltages. Net effect is in best case the kernel and sysfs report
lower voltages than are actually set in hardware (1300mV instead of 1800mV
for example) and in the worst case setting a voltage (e.g. 1800mV) will cause
an undervoltage condition (e.g. 1300mV).
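
To spell out where those example numbers come from (worked example only,
using the 25mV-per-step selector table and the 20-entry offset added by
this patch):

/*
 * With SWxHI set and SWxVSEL == 28 the hardware is really at
 * mc13892_sw[28 + MC13892_SWxHI_SEL_OFFSET] == mc13892_sw[48] == 1800000uV,
 * but the old get_voltage_sel() ignored SWxHI and returned selector 28,
 * which the regulator core maps to mc13892_sw[28] == 1300000uV - the
 * 1800mV-reported-as-1300mV case described above.
 */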

Correct the behavior, taking into account SW1 doesn't support the HI bit,
and as such we need to ignore it.

While we are modifying these functions, fix and optimize the following;

* set_voltage_sel callback was using .reg instead of .vsel_reg - since
  they were set to the same value it actually didn't break anything but
  it would be semantically incorrect to use .reg in this case. We now use
  .vsel_reg and be consistent.
* vsel_shift is always 0 for every SWx regulator, and constantly shifting
  and masking off the bottom few bits is time consuming and makes the
  code very hard to read - optimize this out.
* get_voltage_sel uses the variable val and set_voltage_sel uses the
  variable selector (and reg_value). Introduce the variable selector
  to get_voltage_sel such that it makes more sense and allow some leaner
  code in light of the modifications in this patch. Add better exposure
  to the debug print so the register value AND the selector are printed as
  this will adequately show the HI bit in the register.
* correct a comment in probe which is doing a version check. Magic
  values are awful but for this one instance, a comment does just as
  good a job as something symbolic.

Signed-off-by: Matt Sealey m...@genesi-usa.com
Tested-by: Steev Klimaszewski st...@genesi-usa.com
Cc: Shawn Guo shawn@linaro.org
Cc: Fabio Estevam fabio.este...@freescale.com
---
 drivers/regulator/mc13892-regulator.c |   72 +
 1 file changed, 56 insertions(+), 16 deletions(-)

diff --git a/drivers/regulator/mc13892-regulator.c 
b/drivers/regulator/mc13892-regulator.c
index 0d84b1f..169457b 100644
--- a/drivers/regulator/mc13892-regulator.c
+++ b/drivers/regulator/mc13892-regulator.c
@@ -164,6 +164,14 @@ static const unsigned int mc13892_sw1[] = {
	1350000, 1375000
 };
 
+/*
+ * Note: this table is used to derive SWxVSEL by index into
+ * the array. Offset the values by the index of 1100000uV
+ * to get the actual register value for that voltage selector
+ * if the HI bit is to be set as well.
+ */
+#define MC13892_SWxHI_SEL_OFFSET   20
+
 static const unsigned int mc13892_sw[] = {
	600000,   625000,  650000,  675000,  700000,  725000,
	750000,   775000,  800000,  825000,  850000,  875000,
@@ -239,7 +247,6 @@ static const unsigned int mc13892_pwgtdrv[] = {
 };
 
 static struct regulator_ops mc13892_gpo_regulator_ops;
-/* sw regulators need special care due to the hi bit */
 static struct regulator_ops mc13892_sw_regulator_ops;
 
 
@@ -396,7 +403,7 @@ static int mc13892_sw_regulator_get_voltage_sel(struct 
regulator_dev *rdev)
 {
struct mc13xxx_regulator_priv *priv = rdev_get_drvdata(rdev);
int ret, id = rdev_get_id(rdev);
-   unsigned int val;
+   unsigned int val, selector;
 
dev_dbg(rdev_get_dev(rdev), "%s id: %d\n", __func__, id);
 
@@ -407,12 +414,28 @@ static int mc13892_sw_regulator_get_voltage_sel(struct 
regulator_dev *rdev)
if (ret)
return ret;
 
-   val = (val & mc13892_regulators[id].vsel_mask)
-   >> mc13892_regulators[id].vsel_shift;
+   /*
+* Figure out if the HI bit is set inside the switcher mode register
+* since this means the selector value we return is at a different
+* offset into the selector table.
+*
+* According to the MC13892 documentation note 59 (Table 47) the SW1
+* buck switcher does not support output range programming therefore
+* the HI bit must always remain 0. So do not do anything strange if
+* our register is MC13892_SWITCHERS0.
+*/
+
+   selector = val & mc13892_regulators[id].vsel_mask;
+
+   if ((mc13892_regulators[id].vsel_reg != MC13892_SWITCHERS0) &&
+   (val & MC13892_SWITCHERS0_SWxHI)) {
+   selector += MC13892_SWxHI_SEL_OFFSET;
+   }
 
-   dev_dbg(rdev_get_dev(rdev), "%s id: %d val: %d\n", __func__, id, val);
+   dev_dbg(rdev_get_dev(rdev), "%s id: %d val: 0x%08x selector: %d\n",
+   __func__, id, val, selector);
 
-   return val;
+   return selector;
 }
 
 static int mc13892_sw_regulator_set_voltage_sel(struct regulator_dev *rdev,
@@ -425,18 +448,35 @@ static int mc13892_sw_regulator_set_voltage_sel(struct 
regulator_dev *rdev,
 
volt

[PATCH] mc13892: sanity check num_regulators parsed vs. registered

2013-01-21 Thread Matt Sealey
Imagine a situation where a device tree has a few regulators in an
appropriate node:

regulators {
sw1 {
..
};

vvideo {
..
};

:

vfake {
..
};

vtypo {
..
};
};

In the above example, the node name vfake is an attempt to match a
regulator name inside the driver which just so happens to not exist. The
node name vtypo represents an accidental typographical error in a
regulator name which may have been introduced to a device tree.

In these cases, the number of regulators the mc13892 driver thinks it has
does not match the number of regulators it parsed and registered. Since
it will go over this array based on this number, it will actually
re-register regulator 0 (which happens to be SW1) over and over
again until it reaches the number, resulting in messages on the kernel
log such as these:

SW1: at 1100 mV
VVIDEO: at 2775mV
:
SW1: at 1100 mV
SW1: at 1100 mV

.. up to that number of mismatched regulators. Nobody using DT can/will
consume these regulators, so it should not be possible for it to cause any
real regulator problems or driver breakages, but it is an easy thing to
miss in a kernel log and is an immediate indication of a problem with the
device tree authoring.
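
The mechanism, as far as I can tell (rough sketch of the probe loop, not
the literal driver code - register_one() is a stand-in and the zeroed
entries are my reading of why regulator 0 is the one that repeats):

	/* the DT parse only fills the entries it matched; the rest of the
	 * kzalloc'd array keeps id == 0, which indexes SW1 */
	for (i = 0; i < num_regulators; i++)
		register_one(&mc13892_regulators[mc13xxx_data[i].id]);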

This patch effectively sanity checks the number of counted children of
the regulators node vs. the number that actually matched driver names,
and sets the appropriate num_regulators value. It also gives a little
warning for device tree authors that they MAY have screwed something up,
such that this patch does not hide the device tree authoring problem.

Signed-off-by: Matt Sealey m...@genesi-usa.com
Tested-by: Steev Klimaszewski st...@genesi-usa.com
Cc: Shawn Guo shawn@linaro.org
Cc: Fabio Estevam fabio.este...@freescale.com
---
 drivers/regulator/mc13892-regulator.c  |   39 ++--
 drivers/regulator/mc13xxx-regulator-core.c |   10 +--
 drivers/regulator/mc13xxx.h|4 +--
 3 files changed, 47 insertions(+), 6 deletions(-)

diff --git a/drivers/regulator/mc13892-regulator.c 
b/drivers/regulator/mc13892-regulator.c
index e4f2197..7c86040 100644
--- a/drivers/regulator/mc13892-regulator.c
+++ b/drivers/regulator/mc13892-regulator.c
@@ -535,15 +535,18 @@ static int mc13892_regulator_probe(struct platform_device 
*pdev)
struct mc13xxx_regulator_init_data *mc13xxx_data;
struct regulator_config config = { };
int i, ret;
-   int num_regulators = 0;
+   int num_regulators = 0, num_parsed;
u32 val;
 
num_regulators = mc13xxx_get_num_regulators_dt(pdev);
+
	if (num_regulators <= 0 && pdata)
		num_regulators = pdata->num_regulators;
	if (num_regulators <= 0)
return -EINVAL;
 
+   num_parsed = num_regulators;
+
	priv = devm_kzalloc(&pdev->dev, sizeof(*priv) +
		num_regulators * sizeof(priv->regulators[0]),
GFP_KERNEL);
@@ -586,7 +589,39 @@ static int mc13892_regulator_probe(struct platform_device 
*pdev)
= mc13892_vcam_get_mode;
 
mc13xxx_data = mc13xxx_parse_regulators_dt(pdev, mc13892_regulators,
-   ARRAY_SIZE(mc13892_regulators));
+   ARRAY_SIZE(mc13892_regulators),
+   &num_parsed);
+
+   /*
+* Perform a little sanity check on the regulator tree - if we found
+* a number of regulators from mc13xxx_get_num_regulators_dt and
+* then parsed a smaller number in mc13xxx_parse_regulators_dt then
+* there is a regulator defined in the regulators node which has
+* not matched any usable regulator in the driver. In this case,
+* there is one missing and what will happen is the first regulator
+* will get registered again.
+*
+* Fix this by basically making our number of registerable regulators
+* equal to the number of regulators we parsed. We are allocating
+* too much memory for priv, but this is unavoidable at this point.
+*
+* As an example of how this can happen, try making a typo in your
+* regulators node (vviohi {} instead of viohi {}) so that the name
+* does not match..
+*
+* The check will basically pass for platform data (non-DT) because
+* mc13xxx_parse_regulators_dt for !CONFIG_OF will not touch num_parsed.
+*
+*/
+   if (num_parsed != num_regulators) {
+   dev_warn(&pdev->dev,
+   "parsed %d != regulators %d - check your device tree!\n",
+   num_parsed, num_regulators);
+
+   num_regulators = num_parsed;
+   priv->num_regulators = num_regulators;
+   }
+
	for (i = 0; i < num_regulators; i++) {
struct regulator_init_data *init_data;
struct

One of these things (CONFIG_HZ) is not like the others..

2013-01-21 Thread Matt Sealey
 there be a
help entry or some documentation for what this option does? I have
CC'd the scheduler maintainers because I'd really like to know what I
am doing here before I venture into putting patches out which could
potentially rip open spacetime and have us all sucked in..

And I guess I have one more question before I do attempt to open that
tear, what really is the effect of CONFIG_HZ vs. CONFIG_NO_HZ vs. ARM
sched_clock, and usage of the new hooks to register a real timer as
ARM delay_timer? I have patches I can modify for upstream that add
both device tree implementation and probing of i.MX highres
clocksources (GPT and EPIT) and registration of sched_clock and delay
timer implementations based on these clocks, but while the code
compiles and seems to work, the ACTUAL effect of these (and the
fundamental requirements for the clocks being used) seems to be
information only in the minds of the people who wrote the code. It's
not that obvious to me what the true effect of using a non-architected
ARM core timer for at least the delay_timer is, and I have some really
odd lpj values and very strange re-calibrations popping out (with
constant rate for the timer, lpj goes down.. when using the
delay_timer implementation, shouldn't lpj be still relative to the
timer rate and NOT cpu frequency?) when using cpufreq on i.MX5 when I
do it, and whether CONFIG_SCHED_HRTICK is a good or bad idea..
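
For context, the two registrations I am talking about look roughly like
this - the names, the counter offset and the 66MHz rate are placeholders
and the API shapes are from memory of the ~3.8 ARM code, so do not read
this as my actual patch:

static void __iomem *gpt_base;	/* hypothetical, mapped during timer init */

static u32 imx_gpt_sched_clock_read(void)
{
	return readl_relaxed(gpt_base + 0x24);	/* placeholder counter register */
}

static unsigned long imx_gpt_read_current_timer(void)
{
	return readl_relaxed(gpt_base + 0x24);
}

static struct delay_timer imx_gpt_delay_timer = {
	.read_current_timer	= imx_gpt_read_current_timer,
	.freq			= 66000000,	/* placeholder GPT input clock */
};

	/* in the timer init code: */
	setup_sched_clock(imx_gpt_sched_clock_read, 32, 66000000);
	register_current_timer_delay(&imx_gpt_delay_timer);

(My understanding is that registering the delay_timer derives lpj from
.freq / HZ rather than the calibration loop, which is why I would expect
it not to track CPU frequency - hence my confusion about the values I am
seeing.)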

Apologies for the insane number of questions here, but fully
appreciative of any answers,

--
Matt Sealey m...@genesi-usa.com
Product Development Analyst, Genesi USA, Inc.


Re: One of these things (CONFIG_HZ) is not like the others..

2013-01-21 Thread Matt Sealey
On Mon, Jan 21, 2013 at 2:41 PM, Arnd Bergmann a...@arndb.de wrote:
 On Monday 21 January 2013, Matt Sealey wrote:

 ARM seems to be the only major platform not using the
 kernel/Kconfig.hz definitions, instead rolling it's own and setting
 what could be described as both reasonable and unreasonable defaults
 for platforms. If we're going wholesale for multiplatform on ARM then
 having CONFIG_HZ be selected dependent on platform options seems
 rather curious since building a kernel for Exynos, OMAP or so will
 force the default to a value which is not truly desired by the
 maintainers.

 Agreed 100%.

 (adding John Stultz to Cc, he's the local time expert)

Hi, John! Welcome to the fray :)

 config HZ
 int
 default 200 if ARCH_EBSA110 || ARCH_S3C24XX || ARCH_S5P64X0 || \
 ARCH_S5PV210 || ARCH_EXYNOS4
 default OMAP_32K_TIMER_HZ if ARCH_OMAP && OMAP_32K_TIMER
 default AT91_TIMER_HZ if ARCH_AT91
 default SHMOBILE_TIMER_HZ if ARCH_SHMOBILE
 default 100


[snip]

 Either way, if I boot a kernel on i.MX6, CONFIG_HZ depends on the
 other ARM platforms I also want to boot on it.. this is not exactly
 multiplatform compliant, right?

 Right. It's pretty clear that the above logic does not work
 with multiplatform.  Maybe we should just make ARCH_MULTIPLATFORM
 select NO_HZ to make the question much less interesting.

 Regarding the defaults, I would suggest putting them into all the
 defaults into the defconfig files and removing the other hardcoding
 otherwise. Ben Dooks and Russell are probably the best to know
 what triggered the 200 HZ for s3c24xx and for ebsa110. My guess
 is that the other samsung ones are the result of cargo cult
 programming.

 at91 and omap set the HZ value to something that is derived
 from their hardware timer, but we have also forever had logic
 to calculate the exact time when that does not match. This code
 has very recently been moved into the new register_refined_jiffies()
 function. John can probably tell is if this solves all the problems
 for these platforms.

I would be very interested. My plan would be then (providing John
responds in the affirmative) to basically submit a patch to remove the
8 lines pasted above and source kernel/Kconfig.hz instead. I'm doing
this now on a local kernel tree and I can't see any real problem with
it.

It would then be up to the above-mentioned maintainers to decide if
they are part of the cargo cult and don't need it or refine their
board files to match the New World Order of using Kconfig.hz. The
unconfigured kernel default is 100 anyway which is lower than all the
above default setting, so I would technically be causing a regression
on those platforms... do I want to be responsible for that? Probably
not, but as I said, it's not affecting (in fact, it may be
*improving*) the platforms I care about.

 Additionally, using kernel/Kconfig.hz is a predicate for enabling
 (forced enabling, even) CONFIG_SCHED_HRTICK which is defined nowhere
 else. I don't know how many ARM systems here benefit from this, if
 there is a benefit, or what this really means.. if you really have a
 high resolution timer (and hrtimers enabled) that would assist the
 scheduler this way, is it supposed to make a big difference to the way
 the scheduler works for the better or worse? Is this actually
 overridden by ARM sched_clock handling or so? Shouldn't there be a
 help entry or some documentation for what this option does? I have
 CC'd the scheduler maintainers because I'd really like to know what I
 am doing here before I venture into putting patches out which could
 potentially rip open spacetime and have us all sucked in..

 Yes, that sounds like yet another bug.

So is that a bug in that it is not available to ARM right now, a bug
in that it would be impossible for anyone on ARM to have ever tested
this code, or a bug in that it should NEVER be enabled for ARM for
some reason? John? Ingo? :)

-- 
Matt Sealey m...@genesi-usa.com
Product Development Analyst, Genesi USA, Inc.


Re: One of these things (CONFIG_HZ) is not like the others..

2013-01-21 Thread Matt Sealey
On Mon, Jan 21, 2013 at 3:00 PM, John Stultz john.stu...@linaro.org wrote:
 On 01/21/2013 12:41 PM, Arnd Bergmann wrote:

 Right. It's pretty clear that the above logic does not work
 with multiplatform.  Maybe we should just make ARCH_MULTIPLATFORM
 select NO_HZ to make the question much less interesting.

 Although, even with NO_HZ, we still have some sense of HZ.

I wonder if you can confirm my understanding of this by the way? The
way I think this works is;

CONFIG_HZ on it's own defines the rate at which the kernel wakes up
from sleeping on the job, and checks for current or expired timer
events such that it can do things like schedule_work (as in
workqueues) or perform scheduler (as in processes/tasks) operations.

CONFIG_NO_HZ turns on logic which effectively only wakes up at a
*maximum* of CONFIG_HZ times per second, but otherwise will go to
sleep and stay that way if no events actually happened (so, we rely on
a timer interrupt popping up).

In this case, no matter whether CONFIG_HZ=1000 or CONFIG_HZ=250 (for
example) combined with CONFIG_NO_HZ and less than e.g. 250 things
happening per second will wake up exactly the same number of times?

CONFIG_HZ=1000 with CONFIG_NO_HZ would be an effective, all-round
solution here, then, and CONFIG_HZ=100 should be a reasonable default
(as it is anyway with an otherwise-unconfigured kernel on any other
platform) for !CONFIG_NO_HZ.
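
(The one place I understand HZ to still matter even with CONFIG_NO_HZ is
anything that works in jiffies - illustration only:)

	/* classic timer_list/schedule_timeout() users are quantized to
	 * jiffies, so HZ bounds their granularity even when NO_HZ stops
	 * the periodic tick while idle; only hrtimers escape this */
	unsigned long t = msecs_to_jiffies(4);
	/* HZ=100  -> t == 1 jiffy, the timeout fires within ~10ms     */
	/* HZ=1000 -> t == 4 jiffies, roughly the 4ms actually asked for */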

I have to admit, the only reason I noticed the above is because I was
reading one of CK's BFS logs and reading it makes it seem like the
above is the case, but I have no idea if he thinks BFS makes that the
case or if the current CFQ scheduler makes that the case, or if this
is simply.. the case.. (can you see this is kind of confusing to me as
this is basically not written anywhere except maybe an LWN article
from 2008 I read up on? :)

 Regarding the defaults, I would suggest putting them into all the
 defaults into the defconfig files and removing the other hardcoding
 otherwise. Ben Dooks and Russell are probably the best to know
 what triggered the 200 HZ for s3c24xx and for ebsa110. My guess
 is that the other samsung ones are the result of cargo cult
 programming.

 at91 and omap set the HZ value to something that is derived
 from their hardware timer, but we have also forever had logic
 to calculate the exact time when that does not match. This code
 has very recently been moved into the new register_refined_jiffies()
 function. John can probably tell is if this solves all the problems
 for these platforms.


 Yea, as far as timekeeping is concerned, we shouldn't be HZ dependent (and
 the register_refined_jiffies is really only necessary if you're not
 expecting a proper clocksource to eventually be registered), assuming the
 hardware can do something close to the HZ value requested.

 So I'd probably want to hear about what history caused the specific 200 HZ
 selections, as I suspect there's actual hardware limitations there. So if
 you can not get actual timer ticks any faster then 200 HZ on that hardware,
 setting HZ higher could cause some jiffies related timer trouble (ie: if the
 kernel thinks HZ is 1000 but the hardware can only do 200, that's a
 different problem then if the hardware actually can only do 999.8 HZ). So
 things like timer-wheel timeouts may not happen when they should.

 I suspect the best approach for multi-arch in those cases may be to select
 HZ=100

As above, or not select anything at all since HZ=100 if you don't
touch anything, right?

If someone picks HZ=1000 and their platform can't support it, then
that's their own damn problem (don't touch things you don't
understand, right? ;)

 and use HRT to allow more modern systems to have finer-grained
 timers.

My question really has to be is CONFIG_SCHED_HRTICK useful, what
exactly is it going to do on ARM here since nobody can ever have
enabled it? Is it going to keel over and explode if nobody registers a
non-jiffies sched_clock (since the jiffies clock is technically
reporting itself as a ridiculously high resolution clocksource..)?

Or is this one of those things that if your platform doesn't have a
real high resolution timer, you shouldn't enable HRTIMERS and
therefore not enable SCHED_HRTICK as a result? That affects
ARCH_MULTIPLATFORM here. Is the solution as simple as
ARCH_MULTIPLATFORM compliant platforms kind of have to have a high
resolution timer? Documentation to that effect?

-- 
Matt Sealey m...@genesi-usa.com
Product Development Analyst, Genesi USA, Inc.


Re: One of these things (CONFIG_HZ) is not like the others..

2013-01-21 Thread Matt Sealey
 and find the correct HZ value and implement an override or
selection, or just allow free configuration?

As far as I can tell AT91 and SHMOBILE only supply defaults because HZ
*must* meet some exact timer divisor (OMAP says Kernel internal timer
frequency should be a divisor of 32768) in which case their timer
drivers should not be so stupid and instead round down to the nearest
acceptable timer divisor or WARN_ON if the compile-time values are
unacceptable at runtime before anyone sees any freakish behavior. Is
it a hard requirement for the ARM architecture that a woefully
mis-configured kernel MUST boot completely to userspace?
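
The WARN_ON half of that could be as simple as this in the affected timer
driver (sketch only, with 32768 standing in for the OMAP 32kHz source):

	if (32768 % HZ)
		pr_warn("HZ=%d does not divide 32768, expect drift\n", HZ);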

--
Matt Sealey m...@genesi-usa.com
Product Development Analyst, Genesi USA, Inc.


Re: One of these things (CONFIG_HZ) is not like the others..

2013-01-21 Thread Matt Sealey
On Mon, Jan 21, 2013 at 4:36 PM, John Stultz john.stu...@linaro.org wrote:
 On 01/21/2013 01:14 PM, Matt Sealey wrote:

 On Mon, Jan 21, 2013 at 3:00 PM, John Stultz john.stu...@linaro.org
 wrote:

 On 01/21/2013 12:41 PM, Arnd Bergmann wrote:

 Right. It's pretty clear that the above logic does not work
 with multiplatform.  Maybe we should just make ARCH_MULTIPLATFORM
 select NO_HZ to make the question much less interesting.

 Although, even with NO_HZ, we still have some sense of HZ.

 In this case, no matter whether CONFIG_HZ=1000 or CONFIG_HZ=250 (for
 example) combined with CONFIG_NO_HZ and less than e.g. 250 things
 happening per second will wake up exactly the same number of times?

 Ideally, if both systems are completely idle, they may see similar number of
 actual interrupts.

 But when the cpus are running processes, the HZ=1000 system will see more
 frequent interrupts, since the timer/scheduler interrupt will jump in 4
 times more frequently.

Understood..

 CONFIG_HZ=1000 with CONFIG_NO_HZ would be an effective, all-round
 solution here, then, and CONFIG_HZ=100 should be a reasonable default
 (as it is anyway with an otherwise-unconfigured kernel on any other
 platform) for !CONFIG_NO_HZ.

 Eeehhh... I'm not sure this is follows.

Okay, I'm happy to be wrong on this...

 As above, or not select anything at all since HZ=100 if you don't
 touch anything, right?

 Well, Russell brought up a case that doesn't handle this. If a system
 *can't* do HZ=100, but can do HZ=200.

 Though there are hacks, of course, that might get around this (skip every
 other interrupt at 200HZ).

Hmm, I think it might be appreciated for people looking at this stuff
(same as I stumbled into it) for a little comment on WHY the default
is 200. That way you don't wonder even if you know why EBSA110 has a
HZ=200 default, why Exynos is lumped in there too (to reduce the
number of interrupts firing? Maybe the Exynos timer interrupt is kind
of a horrid core NMI kind of thing and it's desirable for it not to be
every millisecond, or maybe it has the same restrictions as EBSA110,
but where would anyone go to find out this information?)

 If someone picks HZ=1000 and their platform can't support it, then
 that's their own damn problem (don't touch things you don't
 understand, right? ;)

 Well, ideally with kconfig we try to add proper dependencies so impossible
 options aren't left to the user.
 HZ is a common enough knob to turn on most systems, I don't know if leaving
 the user rope to hang himself is a great idea.

I think then the default 100 at the end of the arch/arm/Kconfig is
saying you are not allowed to know that such a thing as rope even
exists, when in fact what we should be doing is just making sure they
can't swing it over the rafters.. am I taking the analogy too far? :)

 My question really has to be is CONFIG_SCHED_HRTICK useful, what
 exactly is it going to do on ARM here since nobody can ever have
 enabled it? Is it going to keel over and explode if nobody registers a
 non-jiffies sched_clock (since the jiffies clock is technically
 reporting itself as a ridiculously high resolution clocksource..)?

 ??? Not following this at all.  jiffies is the *MOST* coarse resolution
 clocksource there is (at least that I'm aware of.. I recall someone wanting
 to do a 60Hz clocksource, but I don't think that ever happened).

Is that based on its clocksource rating (probably worse than a real
hrtimer) or its reported resolution? Because on i.MX51 if I force it
to use the jiffies clock the debug on the kernel log is telling me it
has a higher resolution (it TELLS me that it ticks as fast as the
CPU frequency and wraps less than my real timer).

I know where the 60Hz clocksource might come from, the old Amiga
platforms have one based on the PSU frequency (50Hz in Europe, 60Hz
US/Japan). Even a 60Hz clocksource is useful though (on the Amiga, at
least, it is precisely the vsync clock for synchronizing your display
output on TV-out, which makes it completely useful for the framebuffer
driver), but.. you just won't expect to assign it as sched_clock or
your delay timer. And if anyone does I'd expect they'd know full well
it'd not run so well.

 Or is this one of those things that if your platform doesn't have a
 real high resolution timer, you shouldn't enable HRTIMERS and
 therefore not enable SCHED_HRTICK as a result? That affects
 ARCH_MULTIPLATFORM here. Is the solution as simple as
 ARCH_MULTIPLATFORM compliant platforms kind of have to have a high
 resolution timer? Documentation to that effect?

 SO HRITMERS was designed to be be build time enabled, while still giving you
 a functioning system if it was booted on a system that didn't support
 clockevents.  We boot with standard HZ, and only switch over to HRT mode if
 we have a proper clocksource and clockevent driver.

Okay. I'm still a little confused as to what SCHED_HRTICK actually
makes a difference to, though.

From that description, we are booting

Re: One of these things (CONFIG_HZ) is not like the others..

2013-01-21 Thread Matt Sealey
On Mon, Jan 21, 2013 at 4:45 PM, Russell King - ARM Linux
li...@arm.linux.org.uk wrote:
 On Mon, Jan 21, 2013 at 10:30:07PM +, Arnd Bergmann wrote:
 On Monday 21 January 2013, Matt Sealey wrote:
  So is that a bug in that it is not available to ARM right now, a bug
  in that it would be impossible for anyone on ARM to have ever tested
  this code, or a bug in that it should NEVER be enabled for ARM for
  some reason? John? Ingo? :)
 

 I think it's a bug that it's not available. That does not look intentional.

 What's a bug?  kernel/Kconfig.hz not being available?  No, it's
 intentional.  (See my replies).

The bug I saw as real is that CONFIG_SCHED_HRTICK is defined only in
kernel/Kconfig.hz (and used in kernel/sched only) - so if we want that
functionality enabled we will also have to opencode it in
arch/arm/Kconfig. Everyone else, by virtue of using kernel/Kconfig.hz,
gets this config item enabled for free if they have hrtimers or
generic smp helpers.. if I understood what John just said, this means
on ARM, since we don't use kernel/Kconfig.hz and we don't also define
an item for CONFIG_SCHED_HRTICK, the process scheduler is completely
oblivious that we're running in HRT mode?

The thing I don't know is real is if that really matters one bit..

-- 
Matt Sealey m...@genesi-usa.com
Product Development Analyst, Genesi USA, Inc.


Re: One of these things (CONFIG_HZ) is not like the others..

2013-01-21 Thread Matt Sealey
On Mon, Jan 21, 2013 at 4:42 PM, Russell King - ARM Linux
li...@arm.linux.org.uk wrote:
 On Mon, Jan 21, 2013 at 04:20:14PM -0600, Matt Sealey wrote:
 I am sorry it sounded if I was being high and mighty about not being
 able to select my own HZ (or being forced by Exynos to be 200 or by
 not being able to test an Exynos board, forced to default to 100). My
 real grievance here is we got a configuration item for the scheduler
 which is being left out of ARM configurations which *can* use high
 resolution timers, but I don't know if this is a real problem or not,
 hence asking about it, and that HZ=100 is the ARM default whether we
 might be able to select that or not.. which seems low.

 Well, I have a versatile platform here.  It's the inteligence behind
 the power control system for booting the boards on the nightly tests
 (currently disabled because I'm waiting for my main server to lock up
 again, and I need to use one of the serial ports for that.)

 The point is, it talks via I2C to a load of power monitors to read
 samples out.  It does this at sub-100Hz intervals.  Yet the kernel is
 built with HZ=100.  NO_HZ=y and highres timers are enabled... works
 fine.

 So, no, HZ=100 is not a limit in that scenario.  With NO_HZ=y and
 highres timers, it all works with epoll() - you get the interval that
 you're after.  I've verified this with calls to gettimeofday() and
 the POSIX clocks.

Okay.

So, can you read this (it's short):

http://ck.kolivas.org/patches/bfs/bfs-configuration-faq.txt

And please tell me if he's batshit crazy and I should completely
ignore any scheduler discussion that isn't ARM-specific, or maybe..
and I can almost guarantee this, he doesn't have an ARM platform so
he's just delightfully ill-informed about anything but his quad-core
x86?

 HZ=250 is the current kernel default if you don't touch anything, it
 seems, apologies for thinking it was HZ=100.

 Actually, it always used to be 100Hz on everything, including x86.
 It got upped when there were interactivity issues... which haven't
 been reported on ARM - so why change something that we know works and
 everyone is happy with?

I don't know. I guess this is why I included Ingo and Peter as they
seem to be responsible for core HZ-related things; why have HZ=250 on
x86 when CONFIG_NO_HZ and HZ=100 would work just as effectively? Isn't
CONFIG_NO_HZ the default on x86 and PPC and.. pretty much everything
else?

I know Con K. has been accused many times of peddling snake-oil... but
he has pretty graphs and benchmarks that kind of bear him out on most
things even if the results do not get his work upstream. I can't fault
the statistical significance of his results.. but even a placebo
effect can be graphed, correlation is not causation, etc, etc. - I
don't know if anything real filters down into the documentation
though.

 And that is too high for
 EBSA110 and a couple of other boards, especially where HZ must equal
 some exact divisor being pumped right into some timer unit.

 EBSA110 can do 250Hz, but it'll mean manually recalculating the timer
 arithmetic - because it's not a reloading counter - software has to
 manually reload it, and you have to take account of how far it's
 rolled over to get anything close to a regular interrupt rate which
 NTP is happy with.  And believe me, it used to be one of two main NTP
 broadcasting servers on my network, so I know it works.

A-ha...

 Anyway, a patch for ARM could perhaps end up like this:

 ~~
 if ARCH_MULTIPLATFORM
 source kernel/Kconfig.hz
 else
 HZ
 default 100
 endif

 HZ
 default 200 if ARCH_EBSA110 || ARCH_ETC_ETC || ARCH_UND_SO_WEITER
 # any previous platform definitions where *really* required here.
 # but not default 100 since it would override kernel/Kconfig.hz every 
 time

 That doesn't work - if you define the same symbol twice, one definition
 takes priority over the other (I don't remember which way it works).
 They don't accumulate.

Well I did some testing.. a couple days of poking around, and they
don't need to accumulate.

 Because... it simply doesn't work like that.  Try it and check to see
 what Kconfig produces.

I did test it.. whatever you define last, sticks, and it's down to the
order they're parsed in the tree - luckily, arch/arm/Kconfig is
sourced first, which sources the mach/plat stuff way down at  the
bottom. As long as you have your default set somewhere, any further
default just has to be sourced or added later in *one* of the
Kconfigs, same as building any C file with gcc -E and spitting it
out.

Someone, at the end of it all, has to set some default, and as long as
the one you want is the last one, everything is shiny.

 We know this, because our FRAME_POINTER config overrides the generic
 one - not partially, but totally and utterly in every way.

But for something as simple as CONFIG_HZ getting a value.. it works
okay. If Kconfig.hz sets CONFIG_HZ=250 because CONFIG_HZ_250 is
default yes, and it CONFIG_HZ defaults to 250 if it's set

Re: One of these things (CONFIG_HZ) is not like the others..

2013-01-21 Thread Matt Sealey
On Mon, Jan 21, 2013 at 5:13 PM, Russell King - ARM Linux
li...@arm.linux.org.uk wrote:
 On Mon, Jan 21, 2013 at 04:54:31PM -0600, Matt Sealey wrote:
 Hmm, I think it might be appreciated for people looking at this stuff
 (same as I stumbled into it) for a little comment on WHY the default
 is 200. That way you don't wonder even if you know why EBSA110 has a
 HZ=200 default, why Exynos is lumped in there too (to reduce the
 number of interrupts firing?

 Err, _reduce_ ?

 Can you please explain why changing HZ from 100 to 200 is a reduction?

We were talking about HZ=1000 at the time, sorry...

 Maybe the Exynos timer interrupt is kind
 of a horrid core NMI kind of thing and it's desirable for it not to be
 every millisecond,

 Huh?  HZ=100 is centisecond intervals...

See above..

 I think you're understanding is just wy off.  That default is
 there because that is the _architecture_ _default_ and there _has_ to
 be a default.  No, including kernel/Kconfig.hz won't give us any kind
 of non-specified default because, as I've already said in one of my
 other mails, you can't supplement Kconfig symbol definitions by
 declaring it multiple times.

Okay, so the real



 I know where the 60Hz clocksource might come from, the old Amiga
 platforms have one based on the PSU frequency (50Hz in Europe, 60Hz
 US/Japan). Even a 60Hz clocksource is useful though (on the Amiga, at
 least, it is precisely the vsync clock for synchronizing your display
 output on TV-out, which makes it completely useful for the framebuffer
 driver), but.. you just won't expect to assign it as sched_clock or
 your delay timer. And if anyone does I'd expect they'd know full well
 it'd not run so well.

 Except in the UK where it'd be 50Hz for the TV out.  (Lengthy irrelevant
 explanation why this is so for UK cut.)

Read again: 50Hz in Europe.

Australia too. I'm British and I used to have more EU-manufactured
Amigas than I knew what to do with.. so.. just like your NTP story, I
definitely know this already.

 From that description, we are booting with standard HZ on ARM, and the
 core sched_clock (as in we can call setup_sched_clock)
 and/or/both/optionally using a real delay_timer switch to HRT mode if
 we have the right equipment available in the kernel and at runtime on
 the SoC.. but the process scheduler isn't compiled with the means to
 actually take advantage of us being in HRT mode?

 Don't mix sched_clock() into this; it has nothing to do with HZ at all.
 You're confusing your apples with your oranges.

Okay..

 A simple BUILD_BUG_ON and a BUG_ON right after each other in the
 appropriate clocksource driver solves that.. if there's an insistence
 on having at least some rope, we can put them in a field and tell them
 they have to use the moon to actually hang themselves...

 No it doesn't - it introduces a whole load of new ways to make the
 kernel build or boot fail for pointless reasons - more failures, more
 regressions.

 No thank you.

But it would effectively stop users drinking kool-aid.. if you set
your HZ to something stupid, you don't even get a kernel to build, and
certainly don't get to boot past the first 40 lines of boot messages..
I think most people would rather a build error, or a runtime
unmistakable, unmissable warning than a subtle and almost
imperceptible skew in NTP synchronization, to use your example.

-- 
Matt Sealey m...@genesi-usa.com
Product Development Analyst, Genesi USA, Inc.


Re: One of these things (CONFIG_HZ) is not like the others..

2013-01-21 Thread Matt Sealey
Okay so the final resolution of this is;

1) That the arch/arm/Kconfig HZ block is suffering from some cruft

I think we could all be fairly confident that Exynos4 or S5P does not
require HZ=200 - in theory, it has no such timer restrictions like
EBSA110 (the docs I have show a perfectly capable 32-bit timer with a
double-digits MHz input clock, since these are multimedia-class SoCs
it'd be seriously f**ked up if they didn't).

But while some of the entries on this line may be cargo-cult
programming, the original addition on top of EBSA110 *may* be one of
your unreported responsiveness issues.

We could just let some Samsung employees complain when Android 6.x
starts to get laggy with a 3.8 kernel because we forced their HZ=100.

What I would do is predicate a fixed, obvious default on
ARCH_MULTIPLATFORM so that it would get the benefit of a default HZ
that you agree with. It wouldn't CHANGE anything, but it makes it look
less funky, since the non-multiplatform settings would be somewhere
else (it either needs more comments or an if - either way - otherwise
it's potentially confusing);

if ARCH_MULTIPLATFORM
config HZ
   int
   default 100
else
   # old config HZ block here
endif

2) We need to add config SCHED_HRTICK as a copy and paste from
kernel/Kconfig.hz since.. well, I still don't understand exactly what
the true effect would be, but I assume since Arnd is concerned and
John's explanation rings true that it really should be enabled on ARM
systems with the exact same dependencies as kernel/Kconfig.hz.

Or not.. I see it as an oddity until I understand if we really care
about it, but the code seems to be fairly important to the scheduler
and also enabled by default almost everywhere else, which means only
people with really freakish SMP architectures with no ability to use
GENERIC_SMP_HELPERS have ever run these code paths besides ARM. That
kind of leaves ARM in the doghouse.. who knows what weirdo scheduler
reactions are related to it not being enabled. Maybe when it is, HZ
*would* need to be allowed to be bumped when using this code path?
Matt Sealey m...@genesi-usa.com
Product Development Analyst, Genesi USA, Inc.


On Mon, Jan 21, 2013 at 5:49 PM, Russell King - ARM Linux
li...@arm.linux.org.uk wrote:
 On Mon, Jan 21, 2013 at 05:23:33PM -0600, Matt Sealey wrote:
 On Mon, Jan 21, 2013 at 4:42 PM, Russell King - ARM Linux
 li...@arm.linux.org.uk wrote:
  On Mon, Jan 21, 2013 at 04:20:14PM -0600, Matt Sealey wrote:
  I am sorry it sounded if I was being high and mighty about not being
  able to select my own HZ (or being forced by Exynos to be 200 or by
  not being able to test an Exynos board, forced to default to 100). My
  real grievance here is we got a configuration item for the scheduler
  which is being left out of ARM configurations which *can* use high
  resolution timers, but I don't know if this is a real problem or not,
  hence asking about it, and that HZ=100 is the ARM default whether we
  might be able to select that or not.. which seems low.
 
  Well, I have a versatile platform here.  It's the inteligence behind
  the power control system for booting the boards on the nightly tests
  (currently disabled because I'm waiting for my main server to lock up
  again, and I need to use one of the serial ports for that.)
 
  The point is, it talks via I2C to a load of power monitors to read
  samples out.  It does this at sub-100Hz intervals.  Yet the kernel is
  built with HZ=100.  NO_HZ=y and highres timers are enabled... works
  fine.
 
  So, no, HZ=100 is not a limit in that scenario.  With NO_HZ=y and
  highres timers, it all works with epoll() - you get the interval that
  you're after.  I've verified this with calls to gettimeofday() and
  the POSIX clocks.

 Okay.

 So, can you read this (it's short):

 http://ck.kolivas.org/patches/bfs/bfs-configuration-faq.txt

 And please tell me if he's batshit crazy and I should completely
 ignore any scheduler discussion that isn't ARM-specific, or maybe..
 and I can almost guarantee this, he doesn't have an ARM platform so
 he's just delightfully ill-informed about anything but his quad-core
 x86?

 Well... my x86 laptop is... HZ=1000, NO_HZ, HIGH_RES enabled, ondemand...
 doesn't really fit into any of those categories given there.  I'd suggest
 that what's given there is a suggestion/opinion based on behaviours
 observed on x86 platforms.

 Whether it's appropriate for other architectures is not really a proven
 point - is it worth running ARM at 1000Hz when the load from running at
 100Hz is measurable as a definite error in loops_per_jiffy calibration?
 Remember - the load from the interrupt handler at 1000Hz is 10x the load
 at 100Hz.

 Do you want to spend more cycles per second on the possibly multi-layer
 IRQ servicing and timer servicing?

 And what about the interrupt latency issue that we've hit several times
 already with devices taking longer than 10ms to service their peripherals
 because the driver doesn't make use of delayed

Re: One of these things (CONFIG_HZ) is not like the others..

2013-01-21 Thread Matt Sealey
On Mon, Jan 21, 2013 at 6:09 PM, Matt Sealey m...@genesi-usa.com wrote:

[LAKML: about lack of SCHED_HRTICK because we don't use
kernel/Kconfig.hz on ARM)]

 kind of leaves ARM in the doghouse.. who knows what weirdo scheduler
 reactions are related to it not being enabled. Maybe when it is, HZ
 *would* need to be allowed to be bumped when using this code path?

Or conversely maybe this is exactly why the Samsung maintainers
decided they need HZ=200, because SCHED_HRTICK isn't being enabled and
they're experiencing some multimedia lag because of it?

-- 
Matt Sealey m...@genesi-usa.com
Product Development Analyst, Genesi USA, Inc.


Re: [PATCH] crypto: fix FTBFS with ARM SHA1-asm and THUMB2_KERNEL

2013-01-21 Thread Matt Sealey
On Mon, Jan 21, 2013 at 4:46 PM, Nicolas Pitre n...@fluxnic.net wrote:
 On Mon, 21 Jan 2013, Matt Sealey wrote:

 The optimized assembler SHA1 code for ARM does not conform to Thumb2
 register usage requirements, so it cannot be built when the kernel is
 configured with THUMB2_KERNEL.

 Fix the FTBFS for now by preventing misconfigurations of the kernel.

 Signed-off-by: Matt Sealey m...@genesi-usa.com

 A .arm directive at the top of the assembly code would be a better
 fix, as that wouldn't reduce functionality.

If I recall, doing that last time for ssi-fiq.S was the wrong solution,
and it was suggested that proper configuration (on top of possibly
rewriting the assembly) was better than hacking around in the assembly..

 Yet, I'd invite you to have a look at commit 638591cd7b in linux-next.

I took a peek, and invite you to ignore my patch. I only tracked the
top of Linus' tree..

That said, it seems nobody benchmarked this on anything other
than IXP425 or KS8695 to see if it's markedly faster than the
(moderately recently updated) C-code implementation outside of the
platforms mentioned in the logs for the initial commit? It seems like rather a
specific optimization for a rather specific use case for rather
specific processors (and therefore a small test base) probably meant
for a very specific product line somewhere. Whether you get any
benefit in enabling this config item or not for any other ARM platform
is up for debate, isn't it?

If it *is* in fact much faster everywhere, and it works in any ARM or
THUMB2 configuration, there's a case to be built for it being the
default ARM implementation for AES and SHA1..

This question is to the implementor/committer (Dave McCullough), how
exactly did you measure the benchmark and can we reproduce it on some
other ARM box?

If it's long and laborious and not so important to test the IPsec
tunnel use-case, what would be the simplest possible benchmark to see
if the C vs. assembly version is faster for a particular ARM device? I
can get hold of pretty much any Cortex-A8 or Cortex-A9 that matters, I
have access to a Chromebook for A15, and maybe an i.MX27 or i.MX35 and
a couple Marvell boards (ARMv6) if I set my mind to it... that much
testing implies we find a pretty concise benchmark though with a
fairly common kernel version we can spread around (i.MX, OMAP and the
Chromebook, I can handle, the rest I'm a little wary of bothering to
spend too much time on). I think that could cover a good swath of
not-ARMv5 use cases from lower speeds to quad core monsters.. but I
might stick to i.MX to start with..
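
For what it's worth, the simplest benchmark I can think of that avoids
the IPsec tunnel setup entirely is hashing a big buffer through the
kernel's own sha1 from userspace via AF_ALG - whichever implementation
(C or assembly) the kernel selected is what gets exercised. A rough
sketch, assuming CONFIG_CRYPTO_USER_API_HASH is enabled; the buffer
size and iteration count are arbitrary and there is no error handling:

#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/if_alg.h>

#ifndef AF_ALG
#define AF_ALG 38
#endif

int main(void)
{
        struct sockaddr_alg sa = {
                .salg_family = AF_ALG,
                .salg_type   = "hash",
                .salg_name   = "sha1",
        };
        static char buf[64 * 1024];     /* arbitrary 64KiB test buffer */
        unsigned char digest[20];
        struct timespec t0, t1;
        int i, iterations = 4096;       /* arbitrary */
        double secs;

        int tfmfd = socket(AF_ALG, SOCK_SEQPACKET, 0);
        bind(tfmfd, (struct sockaddr *)&sa, sizeof(sa));
        int opfd = accept(tfmfd, NULL, 0);

        memset(buf, 0xa5, sizeof(buf));
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (i = 0; i < iterations; i++) {
                write(opfd, buf, sizeof(buf));  /* one full sha1 per write */
                read(opfd, digest, sizeof(digest));
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("sha1: %.1f MB/s\n",
               (double)iterations * sizeof(buf) / (1024 * 1024) / secs);
        return 0;
}

Run that once with the assembly implementation built in and once with
only the C version, and compare the MB/s figures.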

--
Matt Sealey m...@genesi-usa.com
Product Development Analyst, Genesi USA, Inc.


Re: One of these things (CONFIG_HZ) is not like the others..

2013-01-21 Thread Matt Sealey
On Mon, Jan 21, 2013 at 6:51 PM, John Stultz john.stu...@linaro.org wrote:
 On 01/21/2013 02:54 PM, Matt Sealey wrote:

 On Mon, Jan 21, 2013 at 4:36 PM, John Stultz john.stu...@linaro.org
 wrote:

 On 01/21/2013 01:14 PM, Matt Sealey wrote:

 As far as jiffies rating, from jiffies.c:
 .rating= 1, /* lowest valid rating*/

 So I'm not sure what you mean by the debug on the kernel log is telling me
 it has a higher resolution.

Oh, it is just if I actually don't run setup_sched_clock on my
platform, it gives a little message (with #define DEBUG 1 in
sched_clock.c) about who set up the last sched_clock. Since you only
get one chance, and I was fiddling with setup_sched_clock being probed
from multiple possible timers from device tree (i.MX3 has a crapload
of valid timers, which one you use right now is basically forced by
the not-quite-fully-DT-only code and some funky iomap tricks).

And what I got was, if I use the real hardware timer, it runs at 66MHz
and says it has 15ns resolution and wraps every 500 seconds or so. The
jiffies timer says it's 750MHz, with a 2ns resolution.. you get the
drift. The generic reporting of how good the sched_clock source is
kind of glosses over the quality rating of the clock source and at
first glance (if you're not paying that much attention), it is a
little bit misleading..
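
For reference, what I mean by running setup_sched_clock on my platform
is nothing more exotic than the below (3.7/3.8-era ARM API; the base
pointer, register offset and function names are invented here for
illustration, the real i.MX code is messier):

#include <linux/init.h>
#include <linux/io.h>
#include <asm/sched_clock.h>

static void __iomem *timer_base;        /* iomapped elsewhere */

static u32 notrace imx_read_sched_clock(void)
{
        return readl_relaxed(timer_base + 0x24); /* free-running up-counter */
}

static void __init imx_sched_clock_init(unsigned long rate)
{
        /* only the first caller ever wins, hence "you only get one chance" */
        setup_sched_clock(imx_read_sched_clock, 32, rate);
}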

 Yes, in the case I was remembering, the 60HZ was driven by the electrical
 line.

While I have your attention, what would be the minimum good speed to
run the sched_clock or delay timer implementation from? My rudimentary
scribblings in my notebook give me a value of don't bother with less
than 10KHz based on HZ=100, so I'm wondering if a direct 32.768KHz
clock would do (i.MX osc clock input if I can supply it to one of the
above myriad timers) since this would be low-power compared to a 66MHz
one (by a couple mA anyway). I also have a bunch of questions about
the delay timer requirements.. I might mail you personally.. or would
you prefer on-list?

-- 
Matt Sealey m...@genesi-usa.com
Product Development Analyst, Genesi USA, Inc.


Re: One of these things (CONFIG_HZ) is not like the others..

2013-01-21 Thread Matt Sealey
On Mon, Jan 21, 2013 at 7:18 PM, Russell King - ARM Linux
li...@arm.linux.org.uk wrote:
 On Mon, Jan 21, 2013 at 07:06:59PM -0600, Matt Sealey wrote:
 On Mon, Jan 21, 2013 at 6:51 PM, John Stultz john.stu...@linaro.org wrote:
  On 01/21/2013 02:54 PM, Matt Sealey wrote:

 sched_clock() has nothing to do with time keeping, and that
 HZ/NO_HZ/HRTIMERS don't affect it (when it isn't being derived from
 jiffies).

 Now, sched_clock() is there to give the scheduler a _fast_ to access,
 higher resolution clock than is available from other sources, so that
 there's ways of accurately measuring the amount of time processes run
 for,

That depends on what you meant by timekeeping, right?

I'm really not concerned about the wallclock time, more about the
accuracy of the scheduler clock (tick?), preemption, accurate delays
(i.e. if I msleep(10) does it delay for 10ms or for 40ms because my
delay timer is inaccurate? I'd rather it was better but closer to
10ms), and whether the scheduler (the thing that tells my userspace
whether firefox is running now, or totem, or any other task) is using
the correct high resolution periodic, oneshot, repeatable (however it
repeats) timers *properly* given that this magic config item is
missing on ARM.
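
(By checking the delay accuracy I mean nothing fancier than this kind
of thing in a throwaway test module - a rough sketch, nothing
i.MX-specific about it:

#include <linux/delay.h>
#include <linux/ktime.h>
#include <linux/printk.h>

static void check_msleep_accuracy(void)
{
        ktime_t t0 = ktime_get();

        msleep(10);
        pr_info("msleep(10) actually slept %lld us\n",
                (long long)ktime_us_delta(ktime_get(), t0));
}

and then seeing whether the number comes back nearer 10000 or 40000.)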

That magic config item being CONFIG_SCHED_HRTICK which is referenced a
bunch in kernel/sched/*.[ch] but *ONLY* defined as a Kconfig item in
kernel/Kconfig.hz.

Do we need to copy that Kconfig item out to arch/arm/Kconfig, that's
the question?

  and other such measurements - and it uses that to determine how
 to schedule a particular task and when to preempt it.

 Not providing it means you get those measurements at HZ-based resolution,
 which is suboptimal for tasks which run often for sub-HZ periods (which
 can end up accumulating zero run time.)

Okay, and John said earlier:

John Stultz:
 So I'm actually not super familiar with SCHED_HRTICK details, but from my
 brief skim of it it looks like its useful for turning off the periodic timer
 tick, and allowing the scheduler tick to be triggered by an hrtimer itself
 (There's a number of these interesting inversions that go on in switching to
 HRT mode - for instance, standard timer ticks are switched to being hrtimer
 events themselves).

 This likely has the benefit of time-accurate preemption (well, long term, as
 if the timer granularity isn't matching you could be delayed up to a tick -
 but it wouldn't drift).

 I'm guessing Thomas would probably know best what the potential issues would
 be from running ((CONFIG_HRTIMER  || CONFIG_NO_HZ)  !CONFIG_SCHED_HRTICK).

If SCHED_HRTICK isn't enabled but setup_sched_clock has been given an
accessor for a real, hardware, fast, high resolution counter that
meets all the needs of sched_clock, what's going on? If it's enabled,
what extra is it doing that, say, my_plat_read_sched_clock doesn't?

--
Matt Sealey m...@genesi-usa.com
Product Development Analyst, Genesi USA, Inc.


Re: One of these things (CONFIG_HZ) is not like the others..

2013-01-21 Thread Matt Sealey
Matt Sealey m...@genesi-usa.com
Product Development Analyst, Genesi USA, Inc.


On Mon, Jan 21, 2013 at 7:31 PM, John Stultz john.stu...@linaro.org wrote:
 On 01/21/2013 05:06 PM, Matt Sealey wrote:

 On Mon, Jan 21, 2013 at 6:51 PM, John Stultz john.stu...@linaro.org
 wrote:

 On 01/21/2013 02:54 PM, Matt Sealey wrote:

 On Mon, Jan 21, 2013 at 4:36 PM, John Stultz john.stu...@linaro.org
 wrote:

 On 01/21/2013 01:14 PM, Matt Sealey wrote:

 As far as jiffies rating, from jiffies.c:
  .rating= 1, /* lowest valid rating*/

 So I'm not sure what you mean by the debug on the kernel log is telling
 me
 it has a higher resolution.

 Oh, it is just if I actually don't run setup_sched_clock on my
 platform, it gives a little message (with #define DEBUG 1 in
 sched_clock.c) about who set up the last sched_clock. Since you only
 get one chance, and I was fiddling with setup_sched_clock being probed
 from multiple possible timers from device tree (i.MX3 has a crapload
 of valid timers, which one you use right now is basically forced by
 the not-quite-fully-DT-only code and some funky iomap tricks).

 And what I got was, if I use the real hardware timer, it runs at 66MHz
 and says it has 15ns resolution and wraps every 500 seconds or so. The
 jiffies timer says it's 750MHz, with a 2ns resolution.. you get the
 drift. The generic reporting of how good the sched_clock source is
 kind of glosses over the quality rating of the clock source and at
 first glance (if you're not paying that much attention), it is a
 little bit misleading..


 I've got no clue on this. sched_clock is arch specific, and while ARM does
 use clocksources for sched_clock, what you're seeing is a detail of the ARM
 implementation and not the clocksource code (one complication is that
 clocksources rating values are for the requirements of timekeeping, which
 are different then the requirements for sched_clock - so the confusion is
 understandable).



 Yes, in the case I was remembering, the 60HZ was driven by the electrical
 line.

 While I have your attention, what would be the minimum good speed to
 run the sched_clock or delay timer implementation from? My rudimentary
 scribblings in my notebook give me a value of don't bother with less
 than 10KHz based on HZ=100, so I'm wondering if a direct 32.768KHz
 clock would do (i.MX osc clock input if I can supply it to one of the
 above myriad timers) since this would be low-power compared to a 66MHz
 one (by a couple mA anyway). I also have a bunch of questions about
 the delay timer requirements.. I might mail you personally.. or would
 you prefer on-list?

 So there are probably other folks who could better comment on sched_clock()
 or the delay timer (I'm guessing the delay() implementation is what you mean
 by that) design trade-offs.

I'm specifically talking about if I do

static struct delay_timer imx_gpt_delay_timer = {
.read_current_timer = imx_gpt_read_current_timer,
};

and then something like:

imx_gpt_delay_timer.freq = clk_get_rate(clk_per);
register_current_timer_delay(&imx_gpt_delay_timer);

In the sense that now (as of kernel 3.7 iirc), I have an ability to
have the delay implementation use this awesome fast accessor (which is
nothing to do with a 'clocksource' as in the subsystem..) to get to my
(here at least) 66.5MHz counter (up or down, i.MX has both, but I
dunno if you can use a down counter for delay_timer, or if that's
preferred, or what.. there are no examples of it.. but it seems to
work.. that said I can't imagine what would be an immediately visible
and not totally random effect of doing it wrong, maybe that delays
are instantly returned, that could be very hard or impossible to ever
notice compared to not being able to browse the internet on the target
device.. it might pop up on some randomly-not-resetting platform
device or so, though..)
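
(For completeness, the accessor that struct points at is just a raw
counter read - the iomapped base and the 0x24 offset below are from
memory and may well be wrong, so treat them as placeholders.)

static void __iomem *gpt_base;  /* iomapped GPT, set up elsewhere */

static unsigned long imx_gpt_read_current_timer(void)
{
        /* free-running up-counter; delay_timer just wants raw ticks */
        return readl_relaxed(gpt_base + 0x24);
}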

And I can also put sched_clock on a completely different timer. Does
that make any sense at all? I wouldn't know, it's not documented.

And if I wanted to I could register 8 more timers. That seems rather
excessive, but the ability to use those extra 8 as clock outputs from
the SoC or otherwise directly use comparators is useful to some
people; does Linux in general really give a damn about having 8 timers
of the same quality being available when most systems barely have two
clocksources anyway (on x86, tsc and hpet - on ARM I guess twd and
some SoC-specific timer)? I dunno how many people might actually want
to define them in a device tree, but I figure every single one is not a
bad thing, and which ones end up as sched_clock, delay_timer or just
plain registered clocksources, or not registered as a clocksource and
accessed as some kind of comparator through some kooky ioctl API, is
something you would also configure...
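
To be concrete, registering one of those spare timers as a plain
clocksource is basically a one-liner with the mmio helper - the name,
rating, base and offset below are all arbitrary placeholders:

#include <linux/clocksource.h>

static int __init imx_extra_clocksource_init(void __iomem *base,
                                             unsigned long rate)
{
        /* expose another free-running 32-bit up-counter as a clocksource */
        return clocksource_mmio_init(base + 0x24, "imx-gpt2", rate,
                                     200, 32, clocksource_mmio_readl_up);
}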

 But again, someone more familiar with the scheduler and driver requirements
 would probably be more informational.

Okay. I assume that's a combination of Russell and Thomas..

-- 
Matt Sealey

Re: Compilation problem with drivers/staging/zsmalloc when !SMP on ARM

2013-01-18 Thread Matt Sealey
On Fri, Jan 18, 2013 at 3:08 PM, Russell King - ARM Linux
li...@arm.linux.org.uk wrote:
> On Fri, Jan 18, 2013 at 02:24:15PM -0600, Matt Sealey wrote:
>> Hello all,
>>
>> I wonder if anyone can shed some light on this linking problem I have
>> right now. If I configure my kernel without SMP support (it is a very
>> lean config for i.MX51 with device tree support only) I hit this error
>> on linking:
>
> Yes, I looked at this, and I've decided that I will _not_ fix this export,
> neither will I accept a patch to add an export.

Understood..

> As far as I can see, this code is buggy in a SMP environment.  There's
> apparently no guarantee that:
>
> 1. the mapping will be created on a particular CPU.
> 2. the mapping will then be used only on this specific CPU.
> 3. no guarantee that another CPU won't speculatively prefetch from this
>region.
> 4. when the mapping is torn down, no guarantee that it's the same CPU that
>used the mapping.
>
> So, the use of the local TLB flush leaves all the other CPUs potentially
> containing TLB entries for this mapping.

I'm gonna put this out to the maintainers (Konrad, and Seth since he
committed it) that if this code is buggy it gets taken back out, even
if it makes zsmalloc "slow" on ARM, for the following reasons:

* It's buggy on SMP as Russell describes above
* It might not be buggy on UP (unlike in Russell's description above,
since the restrictions he states do not apply there), but that would
imply an export for a really core internal MM function nobody should
be using anyway
* By that assessment, using that core internal MM function on SMP is
also bad voodoo that zsmalloc should not be doing

It also either smacks of a lack of comprehensive testing or defiance
of logic that nobody ever built the code without CONFIG_SMP, which
means it was only tested on a bunch of SMP ARM systems (I'm guessing..
Pandaboard? :) or UP systems with SMP/SMP_ON_UP enabled (to expand on
that guess, maybe Beagleboard in some multiplatform Beagle/Panda
hybrid kernel). I am sure I was reading the mailing lists when that
patch was discussed, coded and committed, and my guess is correct. In
this case, what we have here anyway is code which, when PROPERLY
configured like so..

diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c
b/drivers/staging/zsmalloc/zsmalloc-main.c
index 09a9d35..ecf75fb 100644
--- a/drivers/staging/zsmalloc/zsmalloc-main.c
+++ b/drivers/staging/zsmalloc/zsmalloc-main.c
@@ -228,7 +228,7 @@ struct zs_pool {
  * mapping rather than copying
  * for object mapping.
 */
-#if defined(CONFIG_ARM)
+#if defined(CONFIG_ARM) && defined(CONFIG_SMP)
 #define USE_PGTABLE_MAPPING
 #endif

.. such that it even compiles in both "guess" configurations: the
slower Cortex-A8 600MHz single core system gets to use the slow copy
path and the dual-core 1GHz+ Cortex-A9 (with twice the RAM..) gets to
use the fast mapping path. Essentially all the patch does is "improve
performance" on the fastest, best-configured, large-amounts-of-RAM,
lots-of-CPU-performance ARM systems (OMAP4+, Snapdragon, Exynos4+,
marvell armada, i.MX6..) while introducing the problems Russell
describes, and leave performance exactly the same and potentially far
more stable on the slower, memory-limited ARM machines.

Given the purpose of zsmalloc, zram, zcache etc. this somewhat defies
logic. If it's not making the memory-limited, slow ARM systems run
better, what's the point?

So in summary I suggest "we" (Greg? or is it Seth's responsibility?)
should just back out that whole USE_PGTABLE_MAPPING chunk of code
introduced with f553646. Then Russell can carry on randconfig'ing and I
can build for SMP and UP and get the same code.. with fewer bugs.

I am sure zsmalloc/zram/zcache2 are not so awful at the end of the day
despite the churn in staging.. but the amount of time I just spent
today with my brain on fire because of cross-referencing mm code for a
linker error, when all I wanted was a non-SMP kernel, I feel Greg's
hurt a little bit.

--
Matt Sealey 
Product Development Analyst, Genesi USA, Inc.


Compilation problem with drivers/staging/zsmalloc when !SMP on ARM

2013-01-18 Thread Matt Sealey
Hello all,

I wonder if anyone can shed some light on this linking problem I have
right now. If I configure my kernel without SMP support (it is a very
lean config for i.MX51 with device tree support only) I hit this error
on linking:

  MODPOST 216 modules
  LZMAarch/arm/boot/compressed/piggy.lzma
  AS  arch/arm/boot/compressed/lib1funcs.o
ERROR: "v7wbi_flush_kern_tlb_range"
[drivers/staging/zsmalloc/zsmalloc.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2
make: *** Waiting for unfinished jobs

I am wondering if someone who is far better versed in the mm code can
tell me exactly what would end up calling this function and why it
would only be in the kernel at all if SMP was enabled - all I can see
right now is that it is PROBABLY something resulting from this code
and comment inside the zsmalloc driver:

/*
 * By default, zsmalloc uses a copy-based object mapping method to access
 * allocations that span two pages. However, if a particular architecture
 * 1) Implements local_flush_tlb_kernel_range() and 2) Performs VM mapping
 * faster than copying, then it should be added here so that
 * USE_PGTABLE_MAPPING is defined. This causes zsmalloc to use page table
 * mapping rather than copying
 * for object mapping.
*/
#if defined(CONFIG_ARM)
#define USE_PGTABLE_MAPPING
#endif

And of course, once USE_PGTABLE_MAPPING is enabled,
local_flush_tlb_kernel_range being used is the actual culprit here.

But the question is, for the ARM guys, shouldn't
local_flush_tlb_kernel_range actually be defined in the kernel build,
even without SMP, even if it would be architecturally a no-op on UP
systems, and then CONFIG_SMP_ON_UP would catch this case?

If not, then the fix is obvious, the check inside zsmalloc for
CONFIG_ARM should be fixed to check specifically for
local_flush_tlb_kernel_range definition as well, or for CONFIG_SMP as
well, or the non-presence of CONFIG_SMP_ON_UP or something?

The build cases I have tested are basically CONFIG_SMP +
CONFIG_SMP_ON_UP (since it's a single-core Cortex-A8) and without
CONFIG_SMP (since it's a Cortex-A8..) using imx_v7_defconfig and
imx_v6_v7_defconfig alike and making the appropriate adjustments.
Trying to compile it with CONFIG_SMP without CONFIG_SMP_ON_UP makes no
sense on my target system - but essentially it only links with
CONFIG_SMP.

I got no clue what I am looking at in arch/arm/mm and related, so I am
unsure precisely how I should proceed in patching it with the intent
it goes upstream.. or what the real implication of this kind of memory
management actually is on SMP vs. UP systems, but the intended
functionality of local_flush_tlb_kernel_range seems like it should
exist even on UP, to me.

--
Matt Sealey 
Product Development Analyst, Genesi USA, Inc.


[PATCH] ehci-mxc: remove Efika MX-specific CHRGVBUS hack

2012-12-23 Thread Matt Sealey
Since Efika MX platform support (pre-devicetree) was removed from the tree
this code no longer has any possibility of running and clutters up the
driver which is being replaced by the chipidea host in the future anyway.

Signed-off-by: Matt Sealey 
Tested-by: Steev Klimazewski 
CC: Sascha Hauer 
CC: Alan Stern 

---
 drivers/usb/host/ehci-mxc.c |   20 
 1 file changed, 20 deletions(-)

diff --git a/drivers/usb/host/ehci-mxc.c b/drivers/usb/host/ehci-mxc.c
index 8e58a5f..b62c3a8 100644
--- a/drivers/usb/host/ehci-mxc.c
+++ b/drivers/usb/host/ehci-mxc.c
@@ -102,7 +102,6 @@ static int ehci_mxc_drv_probe(struct platform_device *pdev)
struct usb_hcd *hcd;
struct resource *res;
int irq, ret;
-   unsigned int flags;
struct ehci_mxc_priv *priv;
struct device *dev = &pdev->dev;
struct ehci_hcd *ehci;
@@ -213,25 +212,6 @@ static int ehci_mxc_drv_probe(struct platform_device *pdev)
if (ret)
goto err_add;
 
-   if (pdata->otg) {
-   /*
-* efikamx and efikasb have some hardware bug which is
-* preventing usb to work unless CHRGVBUS is set.
-* It's in violation of USB specs
-*/
-   if (machine_is_mx51_efikamx() || machine_is_mx51_efikasb()) {
-   flags = usb_phy_io_read(pdata->otg,
-   ULPI_OTG_CTRL);
-   flags |= ULPI_OTG_CTRL_CHRGVBUS;
-   ret = usb_phy_io_write(pdata->otg, flags,
-   ULPI_OTG_CTRL);
-   if (ret) {
-   dev_err(dev, "unable to set CHRVBUS\n");
-   goto err_add;
-   }
-   }
-   }
-
return 0;
 
 err_add:
-- 
1.7.10.4


