Re: PROBLEM: 5.8-rc7 no video output with nouveau on NV36 (regression)

2020-07-28 Thread Nick Bowler
On 2020-07-29, Dave Airlie  wrote:
> On Wed, 29 Jul 2020 at 15:05, Nick Bowler  wrote:
>>
>> Hi,
>>
>> After installing Linux 5.8-rc7 I seem to get no video output on my
>> NV36 card once the nouveau module is loaded.  The display (connected
>> to the digital output) simply reports "No Signal".
>>
>> I bisected to the following commit, and reverting this commit on
>> top of 5.8-rc7 appears to correct the issue.
>
> Can you test the drm fixes pull I just sent to Linus
>
> https://patchwork.freedesktop.org/patch/381225/

Yes, pulling this seems to fix things.

Thanks,
  Nick


PROBLEM: 5.8-rc7 no video output with nouveau on NV36 (regression)

2020-07-28 Thread Nick Bowler
Hi,

After installing Linux 5.8-rc7 I seem to get no video output on my
NV36 card once the nouveau module is loaded.  The display (connected
to the digital output) simply reports "No Signal".

I bisected to the following commit, and reverting this commit on
top of 5.8-rc7 appears to correct the issue.

  fa4f4c213f5f7807360c41f2501a3031a9940f3a is the first bad commit
  commit fa4f4c213f5f7807360c41f2501a3031a9940f3a
  Author: James Jones 
  Date:   Mon Feb 10 15:15:55 2020 -0800
  
  drm/nouveau/kms: Support NVIDIA format modifiers
  
  Allow setting the block layout of a nouveau FB
  object using DRM format modifiers.  When
  specified, the format modifier block layout and
  kind overrides the GEM buffer's implicit layout
  and kind.  The specified format modifier is
  validated against the list of modifiers supported
  by the target display hardware.
  
  v2: Used Tesla family instead of NV50 chipset compare
  v4: Do not cache kind, tile_mode in nouveau_framebuffer
  v5: Resolved against nouveau_framebuffer cleanup
  
  Signed-off-by: James Jones 
  Signed-off-by: Ben Skeggs 
  
   drivers/gpu/drm/nouveau/dispnv50/wndw.c   | 20 ---
   drivers/gpu/drm/nouveau/nouveau_display.c | 89 ++-
   drivers/gpu/drm/nouveau/nouveau_display.h |  4 ++
   3 files changed, 104 insertions(+), 9 deletions(-)

The dmesg output from loading the driver is identical except several
lines are missing in the non-working case, which I have marked with
"XXX" below:

  [  168.222926] PCI Interrupt Link [LNKE] enabled at IRQ 16
  [  168.223199] nouveau :01:00.0: vgaarb: deactivate vga console
  [  168.224379] Console: switching to colour dummy device 80x25
  [  168.224612] nouveau :01:00.0: NVIDIA NV36 (436200a1)
  [  168.324779] nouveau :01:00.0: bios: version 04.36.20.21.00
  [  168.325646] agpgart-amd64 :00:00.0: AGP 3.0 bridge
  [  168.325657] agpgart: modprobe tried to set rate=x12. Setting to AGP3 x8 mode.
  [  168.325662] agpgart-amd64 :00:00.0: putting AGP V3 device into 8x mode
  [  168.325679] nouveau :01:00.0: putting AGP V3 device into 8x mode
  [  168.325908] agpgart-amd64 :00:00.0: AGP 3.0 bridge
  [  168.325914] agpgart: modprobe tried to set rate=x12. Setting to AGP3 x8 mode.
  [  168.325918] agpgart-amd64 :00:00.0: putting AGP V3 device into 8x mode
  [  168.325933] nouveau :01:00.0: putting AGP V3 device into 8x mode
  [  168.325990] nouveau :01:00.0: tmr: unknown input clock freq
  [  168.326732] nouveau :01:00.0: fb: 256 MiB DDR1
  [  168.328174] [TTM] Zone  kernel: Available graphics memory: 1022540 KiB
  [  168.328175] [TTM] Initializing pool allocator
  [  168.328181] [TTM] Initializing DMA pool allocator
  [  168.328200] nouveau :01:00.0: DRM: VRAM: 255 MiB
  [  168.328201] nouveau :01:00.0: DRM: GART: 128 MiB
  [  168.328204] nouveau :01:00.0: DRM: BMP version 5.40
  [  168.328208] nouveau :01:00.0: DRM: DCB version 2.2
  [  168.328210] nouveau :01:00.0: DRM: DCB outp 00: 01000300 9c40
  [  168.328214] nouveau :01:00.0: DRM: DCB outp 01: 02010310 9c40
  [  168.328215] nouveau :01:00.0: DRM: DCB outp 02: 04000302 
  [  168.328217] nouveau :01:00.0: DRM: DCB outp 03: 02020321 0303
  [  168.328495] nouveau :01:00.0: DRM: Loading NV17 power sequencing microcode
  [  168.329691] nouveau :01:00.0: DRM: MM: using M2MF for buffer copies
  [  168.330258] nouveau :01:00.0: DRM: Saving VGA fonts
  [  168.389460] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
  [  168.391250] nouveau :01:00.0: DRM: Setting dpms mode 3 on TV encoder (output 3)
  XXX [  168.487647] nouveau :01:00.0: DRM: allocated 1920x1080 fb: 0x9000, bo ff426de1
  XXX [  168.491835] fbcon: nouveaudrmfb (fb0) is primary device
  XXX [  168.608512] nouveau :01:00.0: DRM: 0xE4FB: Parsing digital output script table
  XXX [  168.662451] Console: switching to colour frame buffer device 240x67
  XXX [  168.755987] nouveau :01:00.0: fb0: nouveaudrmfb frame buffer device
  [  168.763736] [drm] Initialized nouveau 1.3.1 20120801 for :01:00.0 on minor 0
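
A comparison like the one above can be scripted. The following is a minimal sketch, assuming both dmesg captures are available as lists of lines; the helper names and the sample lines are illustrative, not part of any existing tool. It strips the leading timestamps, which differ from boot to boot, and reports the messages that appear only in the working log.

```python
import re

# Matches a leading kernel timestamp such as "[  168.222926] ".
TIMESTAMP = re.compile(r"^\s*\[\s*\d+\.\d+\]\s*")


def strip_timestamp(line):
    """Return a dmesg line without its leading timestamp."""
    return TIMESTAMP.sub("", line.rstrip("\n"))


def missing_lines(good_log, bad_log):
    """List messages found in good_log but not in bad_log, in log order."""
    seen_in_bad = {strip_timestamp(line) for line in bad_log}
    return [msg for msg in map(strip_timestamp, good_log)
            if msg not in seen_in_bad]


# Hypothetical excerpts; the timestamps in bad_log are made up.
good_log = [
    "[  168.330258] nouveau :01:00.0: DRM: Saving VGA fonts",
    "[  168.491835] fbcon: nouveaudrmfb (fb0) is primary device",
    "[  168.763736] [drm] Initialized nouveau 1.3.1 20120801 for :01:00.0 on minor 0",
]
bad_log = [
    "[   95.100001] nouveau :01:00.0: DRM: Saving VGA fonts",
    "[   95.500002] [drm] Initialized nouveau 1.3.1 20120801 for :01:00.0 on minor 0",
]

for message in missing_lines(good_log, bad_log):
    print("XXX", message)
```

Only the numeric timestamp is removed, so tags such as "[drm]" stay part of the compared message.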

Let me know if you need any more info.

Cheers,
  Nick


Re: [PATCH] Re: PROBLEM: cryptsetup fails to unlock drive in 5.8-rc6 (regression)

2020-07-27 Thread Nick Bowler
On 2020-07-27, Al Viro  wrote:
> On Mon, Jul 27, 2020 at 05:05:54PM +0100, Al Viro wrote:
>> On Thu, Jul 23, 2020 at 11:51:01AM -0400, Nick Bowler wrote:
>> > After installing Linux 5.8-rc6, it seems cryptsetup can no longer
>> > open LUKS volumes.  Regardless of the entered passphrase (correct
>> > or otherwise), the result is a very unhelpful "Keyslot open failed."
>> > message.
[...]
> Oh, fuck...  Please see if the following fixes your reproducer; the braino
> is, of course, that instead of fetching ucmsg->cmsg_len into ucmlen we read
> the entire thing into cmsg.  Other uses of ucmlen had been replaced with
> cmsg.cmsg_len; this one was missed.
>
> Signed-off-by: Al Viro 
> ---
> diff --git a/net/compat.c b/net/compat.c
> index 5e3041a2c37d..434838bef5f8 100644
> --- a/net/compat.c
> +++ b/net/compat.c
> @@ -202,7 +202,7 @@ int cmsghdr_from_user_compat_to_kern(struct msghdr *kmsg, struct sock *sk,
>
>   /* Advance. */
>   kcmsg = (struct cmsghdr *)((char *)kcmsg + tmp);
> - ucmsg = cmsg_compat_nxthdr(kmsg, ucmsg, ucmlen);
> + ucmsg = cmsg_compat_nxthdr(kmsg, ucmsg, cmsg.cmsg_len);
>   }
>
>   /*

This patch appears to resolve the problem when applied on top of 5.8-rc7.
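
To illustrate why advancing by the wrong length breaks the walk over control messages, here is a minimal userspace sketch in Python. It is not the kernel code: the 12-byte header layout, the 4-byte alignment and the `stale_len` parameter are assumptions chosen to mimic a compat control buffer shaped like the one in the failing sendmsg() call.

```python
import struct

# Assumed compat (32-bit) header: u32 cmsg_len, s32 cmsg_level, s32 cmsg_type.
COMPAT_HDR = struct.Struct("<Iii")
SOL_ALG = 279     # <linux/socket.h>
ALG_SET_IV = 2    # <linux/if_alg.h>
ALG_SET_OP = 3    # <linux/if_alg.h>


def compat_align(n):
    """Round n up to the 4-byte alignment assumed for compat messages."""
    return (n + 3) & ~3


def pack_cmsg(level, ctype, payload):
    """Build one control message: header followed by its payload."""
    return COMPAT_HDR.pack(COMPAT_HDR.size + len(payload), level, ctype) + payload


def walk(control, stale_len=None):
    """Return [(cmsg_len, level, type), ...] for a control buffer.

    With stale_len=None the walk advances by each header's own cmsg_len.
    Any other (positive) value is used for every step instead, standing in
    for a length variable that is never refreshed from the current header.
    """
    if stale_len is not None and stale_len <= 0:
        raise ValueError("stale_len must be positive")
    found, offset = [], 0
    while offset + COMPAT_HDR.size <= len(control):
        cmsg_len, level, ctype = COMPAT_HDR.unpack_from(control, offset)
        if cmsg_len < COMPAT_HDR.size or offset + cmsg_len > len(control):
            raise ValueError(f"bad cmsg_len {cmsg_len} at offset {offset}")
        found.append((cmsg_len, level, ctype))
        step = cmsg_len if stale_len is None else stale_len
        offset += compat_align(step)
    return found


# Two messages with cmsg_len 16 and 32, 48 bytes in total (payloads are dummies).
control = (pack_cmsg(SOL_ALG, ALG_SET_OP, struct.pack("<I", 1))
           + pack_cmsg(SOL_ALG, ALG_SET_IV, struct.pack("<I", 16) + bytes(16)))

print(walk(control))
```

Calling `walk(control, stale_len=8)` lands in the middle of the first message, reads a nonsensical length and raises `ValueError`, which is loosely analogous to the EINVAL seen from the kernel.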

Thanks,
  Nick


Re: PROBLEM: cryptsetup fails to unlock drive in 5.8-rc6 (regression)

2020-07-27 Thread Nick Bowler
On 2020-07-27, Al Viro  wrote:
> On Thu, Jul 23, 2020 at 11:51:01AM -0400, Nick Bowler wrote:
>> Hi,
>>
>> After installing Linux 5.8-rc6, it seems cryptsetup can no longer
>> open LUKS volumes.  Regardless of the entered passphrase (correct
>> or otherwise), the result is a very unhelpful "Keyslot open failed."
>> message.
>>
>> On the kernels which fail, I also noticed that the cryptsetup
>> benchmark command appears to not be able to determine that any
>> ciphers are available (output at end of message), possibly for
>> the same reason.
>>
>> Bisected to the following commit, which suggests a problem specific
>> to compat userspace (this is amd64 kernel).  I tested both ia32 and
>> x32 userspace to confirm the problem.  Reverting this commit on top
>> of 5.8-rc6 resolves the issue.
>>
>> Looking at strace output the failing syscall appears to be:
>>
>>   sendmsg(8, {msg_name=NULL, msg_namelen=0,
>>   msg_iov=[{iov_base=..., iov_len=512}], msg_iovlen=1,
>>   msg_control=[{cmsg_len=16, cmsg_level=SOL_ALG,
>>   cmsg_type=0x3}, {cmsg_len=32, cmsg_level=SOL_ALG,
>>   cmsg_type=0x2}], msg_controllen=48, msg_flags=0}, 0)
>>   = -1 EINVAL (Invalid argument)
>
> Huh?  Just in case - could you verify that on the kernel with that
> commit reverted the same sendmsg() succeeds?

Seems so; with commit 547ce4cfb34c reverted on top of 5.8-rc6 there is
no such error in the strace output.  This particular syscall seems
to be succeeding:

  sendmsg(8, {msg_name=NULL, msg_namelen=0,
  msg_iov=[{iov_base=..., iov_len=512}], msg_iovlen=1,
  msg_control=[{cmsg_len=16, cmsg_level=SOL_ALG,
  cmsg_type=0x3}, {cmsg_len=32, cmsg_level=SOL_ALG,
  cmsg_type=0x2}], msg_controllen=48, msg_flags=0}, 0) = 512

Cheers,
  Nick


PROBLEM: cryptsetup fails to unlock drive in 5.8-rc6 (regression)

2020-07-23 Thread Nick Bowler
Hi,

After installing Linux 5.8-rc6, it seems cryptsetup can no longer
open LUKS volumes.  Regardless of the entered passphrase (correct
or otherwise), the result is a very unhelpful "Keyslot open failed."
message.

On the kernels which fail, I also noticed that the cryptsetup
benchmark command appears to not be able to determine that any
ciphers are available (output at end of message), possibly for
the same reason.

Bisected to the following commit, which suggests a problem specific
to compat userspace (this is amd64 kernel).  I tested both ia32 and
x32 userspace to confirm the problem.  Reverting this commit on top
of 5.8-rc6 resolves the issue.

Looking at strace output the failing syscall appears to be:

  sendmsg(8, {msg_name=NULL, msg_namelen=0, 
 msg_iov=[{iov_base=..., iov_len=512}], msg_iovlen=1,
 msg_control=[{cmsg_len=16, cmsg_level=SOL_ALG,
 cmsg_type=0x3}, {cmsg_len=32, cmsg_level=SOL_ALG,
 cmsg_type=0x2}], msg_controllen=48, msg_flags=0}, 0)
 = -1 EINVAL (Invalid argument)

where fd 8 is the descriptor received after "accept" from the AF_ALG
socket bound to the skcipher algorithm.
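
The lengths in that strace output are consistent with a 32-bit control message layout. The arithmetic can be checked with a small sketch; the header sizes and alignments below are the usual Linux values for 32-bit and 64-bit userspace, and the payload sizes (a 4-byte operation code, and a 4-byte IV length plus a 16-byte IV) are an inference from the cmsg_len values rather than something the trace states.

```python
def cmsg_len(payload, hdr_size):
    """Length stored in the header: header plus payload, without padding."""
    return hdr_size + payload


def cmsg_space(payload, hdr_size, align):
    """Bytes one message occupies in the control buffer, padding included."""
    n = hdr_size + payload
    return (n + align - 1) & ~(align - 1)


# Assumed layouts: a 32-bit cmsghdr is 12 bytes with 4-byte alignment,
# a native 64-bit cmsghdr is 16 bytes with 8-byte alignment.
COMPAT = {"hdr_size": 12, "align": 4}
NATIVE64 = {"hdr_size": 16, "align": 8}

# Inferred payload sizes: operation code, then IV length + 16-byte IV.
PAYLOADS = [4, 20]


def controllen(layout):
    """Total msg_controllen for all payloads under the given layout."""
    return sum(cmsg_space(p, **layout) for p in PAYLOADS)


print([cmsg_len(p, COMPAT["hdr_size"]) for p in PAYLOADS], controllen(COMPAT))
print([cmsg_len(p, NATIVE64["hdr_size"]) for p in PAYLOADS], controllen(NATIVE64))
```

Under these assumptions only the 32-bit layout gives cmsg_len values of 16 and 32 with msg_controllen=48; the same payloads in a native 64-bit layout would give 20, 36 and 64.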

  547ce4cfb34cdecfa0ee19c29a5510329a7ac802 is the first bad commit
  commit 547ce4cfb34cdecfa0ee19c29a5510329a7ac802
  Author: Al Viro 
  Date:   Sun May 31 02:06:55 2020 +0100

  switch cmsghdr_from_user_compat_to_kern() to copy_from_user()

  no point getting compat_cmsghdr field-by-field

  Signed-off-by: Al Viro 
  Signed-off-by: David S. Miller 

   net/compat.c | 15 ---
   1 file changed, 8 insertions(+), 7 deletions(-)

  # cryptsetup open /dev/nvme0n1p2 test
  Enter passphrase for /dev/nvme0n1p2:
  Keyslot open failed.
  
  # cryptsetup benchmark
  # Tests are approximate using memory only (no storage IO).
  PBKDF2-sha1   362077 iterations per second for 256-bit key
  PBKDF2-sha256 503155 iterations per second for 256-bit key
  PBKDF2-sha512 396586 iterations per second for 256-bit key
  PBKDF2-ripemd160  283398 iterations per second for 256-bit key
  PBKDF2-whirlpool  159649 iterations per second for 256-bit key
  argon2i   4 iterations, 111601 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
  argon2id  4 iterations, 112215 memory, 4 parallel threads (CPUs) for 256-bit key (requested 2000 ms time)
  # Algorithm |  Key | Encryption | Decryption
  aes-cbc       128b          N/A          N/A
  serpent-cbc   128b          N/A          N/A
  twofish-cbc   128b          N/A          N/A
  aes-cbc       256b          N/A          N/A
  serpent-cbc   256b          N/A          N/A
  twofish-cbc   256b          N/A          N/A
  aes-xts       256b          N/A          N/A
  serpent-xts   256b          N/A          N/A
  twofish-xts   256b          N/A          N/A
  aes-xts       512b          N/A          N/A
  serpent-xts   512b          N/A          N/A
  twofish-xts   512b          N/A          N/A

Cheers,
-- 
Nick Bowler


Re: PROBLEM: nfs? crash in Linux 5.3 (possible regression)

2019-10-09 Thread Nick Bowler
On 9/20/19, Nick Bowler  wrote:
> I hit this oops on Linux 5.3 yesterday.  The crash itself occurred while
> compiling Linux (source and build dirs on NFS).  Afterwards, the system
> remained mostly alive but my NFS mounts became very busted with lots
> (but not all) I/O operations appearing to hang forever.

Well that took a long time but I completed the bisection.  I had to
disable CONFIG_JUMP_LABEL because this option was apparently causing
almost every kernel between 5.2 and 5.3-rc1 to be unbootable for some
reason which made successful bisection impossible.

The first commit which exhibits the oops described is 218e6424e711
("keys: Garbage collect keys for which the domain has been removed").

The immediately preceding 8 commits are "skipped" because they crash
quite differently from the issue seen in Linux 5.3.  After 218e6424e711
the observed crash is always exactly the same.

However the crash in the "skipped" kernels occurs under similar
circumstances with significantly more catastrophic results: the system
becoming completely non-responsive (I think basically every user process
is killed), nothing is saved, making it difficult to obtain logs.  I can
try netconsole or something if it would help.

# only skipped commits left to test
# possible first bad commit:
[218e6424e711ceee31eeba93212fed8ee92d6a11] keys: Garbage collect keys
for which the domain has been removed
# possible first bad commit:
[3b6e4de05e9ee2e2f94e4a3fe14d945e2418d9a8] keys: Include target
namespace in match criteria
# possible first bad commit:
[0f44e4d976f96c6439da0d6717238efa4b91196e] keys: Move the user and
user-session keyrings to the user_namespace
# possible first bad commit:
[b206f281d0ee14969878469816a69db22d5838e8] keys: Namespace keyring
names
# possible first bad commit:
[dcf49dbc8077e278ddd1bc7298abc781496e8a08] keys: Add a 'recurse' flag
for keyring searches
# possible first bad commit:
[355ef8e15885020da88f5ba2d85ce42b1d01f537] keys: Cache the hash value
to avoid lots of recalculation
# possible first bad commit:
[f771fde82051976a6fc0fd570f8b86de4a92124b] keys: Simplify key
description management
# possible first bad commit:
[3b8c4a08a471d56ecaaca939c972fdf5b8255629] keys: Kill off
request_key_async{,_with_auxdata}
# possible first bad commit:
[7743c48e54ee9be9c799cbf3b8e3e9f2b8d19e72] keys: Cache result of
request_key*() temporarily in task_struct
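
The "only skipped commits left to test" outcome can be reproduced with a simplified model. This sketch assumes a linear history and a plain midpoint choice, which is not how git actually selects commits around skipped ones; the commit numbers and the `verdict` function are made up for illustration.

```python
def bisect_with_skips(commits, verdict):
    """Return the commits that may be the first bad one.

    commits is ordered oldest to newest; commits[0] is known good and
    commits[-1] is known bad. verdict(c) returns "good", "bad" or "skip".
    """
    lo, hi = 0, len(commits) - 1   # newest known good, oldest known bad
    skipped = set()
    while True:
        untested = [i for i in range(lo + 1, hi) if i not in skipped]
        if not untested:
            break
        mid = untested[len(untested) // 2]
        result = verdict(commits[mid])
        if result == "good":
            lo = mid
        elif result == "bad":
            hi = mid
        else:
            skipped.add(mid)
    # Everything after the newest good commit, up to the oldest bad one.
    return commits[lo + 1:hi + 1]


def verdict(commit):
    """Hypothetical history: 0-9 good, 10-17 untestable, 18 onwards bad."""
    if commit <= 9:
        return "good"
    if commit <= 17:
        return "skip"
    return "bad"


candidates = bisect_with_skips(list(range(20)), verdict)
print(len(candidates), candidates)
```

With eight untestable commits directly before the first testable bad one, the model leaves nine candidates, the same shape as the listing above.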

Thanks,
  Nick

> [  796.050025] BUG: kernel NULL pointer dereference, address: 0014
> [  796.051280] #PF: supervisor read access in kernel mode
> [  796.053063] #PF: error_code(0x) - not-present page
> [  796.054636] PGD 0 P4D 0
> [  796.055688] Oops:  [#1] PREEMPT SMP
> [  796.056768] CPU: 2 PID: 190 Comm: kworker/2:2 Tainted: GW   
> 5.3.0 #6
> [  796.057953] Hardware name: To Be Filled By O.E.M. To Be Filled By 
> O.E.M./B450 Gaming-ITX/ac, BIOS P3.30 05/17/2019
> [  796.059329] Workqueue: events key_garbage_collector
> [  796.060623] RIP: 0010:keyring_gc_check_iterator+0x27/0x30
> [  796.061845] Code: 44 00 00 48 83 e7 fc b8 01 00 00 00 f6 87 80 00 00 00 21 
> 75 19 48 8b 57 58 48 39 16 7c 05 48 85 d2 7f 0b 48 8b 87 a0 00 00 00 <0f> b6 
> 40 14 c3 0f 1f 40 00 48 83 e7 fc e9 27 eb ff ff 0f 1f 80 00
> [  796.064638] RSP: 0018:b40fc0757df8 EFLAGS: 00010282
> [  796.066058] RAX:  RBX: a14338caed80 RCX: 
> b40fc0757e40
> [  796.067531] RDX: a1433ae85558 RSI: b40fc0757e40 RDI: 
> a1433ae85500
> [  796.069014] RBP: b40fc0757e40 R08:  R09: 
> 000f
> [  796.070513] R10: 8080808080808080 R11: 0001 R12: 
> a4cd6180
> [  796.072025] R13: a14338caee10 R14: a14338caedf0 R15: 
> a1433ffeff00
> [  796.073567] FS:  () GS:a1434048() 
> knlGS:
> [  796.075171] CS:  0010 DS:  ES:  CR0: 80050033
> [  796.076785] CR2: 0014 CR3: 000747ce6000 CR4: 
> 003406e0
> [  796.078445] Call Trace:
> [  796.080091]  assoc_array_subtree_iterate+0x55/0x100
> [  796.081770]  keyring_gc+0x3f/0x80
> [  796.083447]  key_garbage_collector+0x330/0x3d0
> [  796.085155]  process_one_work+0x1cb/0x320
> [  796.086869]  worker_thread+0x28/0x3c0
> [  796.088603]  ? process_one_work+0x320/0x320
> [  796.090335]  kthread+0x106/0x120
> [  796.092053]  ? kthread_create_on_node+0x40/0x40
> [  796.093810]  ret_from_fork+0x1f/0x30
> [  796.095569] Modules linked in: sha1_ssse3 sha1_generic cbc cts 
> rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace ext4 crc16 mbcache jbd2 
> iwlmvm mac80211 libarc4 amdgpu iwlwifi snd_hda_codec_realtek 
> snd_hda_codec_generic kvm_amd gpu_sched kvm snd_hda_codec_hdmi drm_kms_helper 
> irqbypass k10temp syscopyarea sysfillrect sysimgblt fb_sys_fops video ttm 
> cfg80211

Re: PROBLEM: nfs? crash in Linux 5.3 (possible regression)

2019-09-24 Thread Nick Bowler
On 9/20/19, Nick Bowler  wrote:
> On 9/20/19, Trond Myklebust  wrote:
>> On Fri, 2019-09-20 at 14:23 -0400, Nick Bowler wrote:
>>> Not sure how reproducible this is.  Since I've never seen a crash
>>> like this before it may be a regression compared to, say, Linux 4.19
>>> but I am not certain because this particular machine is brand new so
>>> I don't have experience with older kernels on it...
>
> So it actually seems pretty reliably reproducible, 4 attempts to compile
> Linux on Linux 5.3 and all four crash the same way, although there's
> definitely some randomness here...
>
> On the other hand, I cannot reproduce if I install Linux 5.2 so it does
> seem like a regression in 5.3.  I will see how well bisecting goes...

Not well I guess?  I'm not sure what it's doing but for some reason git
bisect does not seem to be converging towards any solution.  I run for
several hours before marking 'good' and it only seems to be reducing the
'# of revisions left to test' by a tiny amount, not the ~half like
I'd expect so bleh...

Any ideas what could be wrong?  Hints on how to reproduce faster?

git bisect start
# bad: [4d856f72c10ecb060868ed10ff1b1453943fc6c8] Linux 5.3
git bisect bad 4d856f72c10ecb060868ed10ff1b1453943fc6c8
# good: [0ecfebd2b52404ae0c54a878c872bb93363ada36] Linux 5.2
git bisect good 0ecfebd2b52404ae0c54a878c872bb93363ada36
# skip: [c236b6dd48dcf2ae6ed14b9068830eccc3e181e6] Merge tag
'keys-request-20190626' of
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
git bisect skip c236b6dd48dcf2ae6ed14b9068830eccc3e181e6
# skip: [028db3e290f15ac509084c0fc3b9d021f668f877] Revert "Merge tag
'keys-acl-20190703' of
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs"
git bisect skip 028db3e290f15ac509084c0fc3b9d021f668f877
# bad: [002c5f73c508f7df5681bda339831c27f3c1aef4] KVM: x86/mmu:
Reintroduce fast invalidate/zap for flushing memslot
git bisect bad 002c5f73c508f7df5681bda339831c27f3c1aef4
# bad: [d3464ccd105b42f87302572ee1f097e6e0b432c6] Merge tag
'dmaengine-fix-5.3' of git://git.infradead.org/users/vkoul/slave-dma
git bisect bad d3464ccd105b42f87302572ee1f097e6e0b432c6
# bad: [0445971000375859008414f87e7c72fa0d809cf8] Merge tag
'mmc-v5.3-rc7' of
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
git bisect bad 0445971000375859008414f87e7c72fa0d809cf8
# bad: [046ddeed0461b5d270470c253cbb321103d048b6] KVM: Check
preempted_in_kernel for involuntary preemption
git bisect bad 046ddeed0461b5d270470c253cbb321103d048b6
# skip: [8f6ccf6159aed1f04c6d179f61f6fb2691261e84] Merge tag
'clone3-v5.3' of
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
git bisect skip 8f6ccf6159aed1f04c6d179f61f6fb2691261e84
# good: [d84f6269ce24eb4c468e246b24fc0fdce34ab6f6] crypto: ccree -
check that cryptocell reset completed
git bisect good d84f6269ce24eb4c468e246b24fc0fdce34ab6f6
# good: [5f92229d184b80712a8b94d098318960171ae749] ASoC: mxs:
mxs-sgtl5000: don't select unnecessary Platform
git bisect good 5f92229d184b80712a8b94d098318960171ae749
# good: [a2928d28643e3c064ff41397281d20c445525032] r8169: use paged
versions of phylib MDIO access functions
git bisect good a2928d28643e3c064ff41397281d20c445525032
# good: [1b2fc358ddfb1b0915922e441182cda7043f5116] perf tools: Add
missing util.h to pick up 'page_size' variable
git bisect good 1b2fc358ddfb1b0915922e441182cda7043f5116
# good: [a011b49f4ed7813777a15da12a426ab939c58f14] net/mlx5e: Consider
XSK in XDP MTU limit calculation
git bisect good a011b49f4ed7813777a15da12a426ab939c58f14
# good: [89a237aa84c7047cafba99f5dc81983ed0c40704] staging: kpc2000:
Use '%llx' for printing 'long long int' type
git bisect good 89a237aa84c7047cafba99f5dc81983ed0c40704
# good: [60e8523e2ea18dc0c0cea69d6c1d69a065019062] ocxl: Allow
contexts to be attached with a NULL mm
git bisect good 60e8523e2ea18dc0c0cea69d6c1d69a065019062
# good: [452181936931f0f08923aba5e04e1e9ef58c389f] afs: Trace afs_server usage
git bisect good 452181936931f0f08923aba5e04e1e9ef58c389f
# good: [78ff751f8e6a9446e9fb26b2bff0b8d3f8974cbd] scsi: mac_scsi: Fix
pseudo DMA implementation, take 2
git bisect good 78ff751f8e6a9446e9fb26b2bff0b8d3f8974cbd
# good: [5315f9d40191f98abcd3164e632a8a8f737b1cf0] rtlwifi: remove
redundant assignment to variable badworden
git bisect good 5315f9d40191f98abcd3164e632a8a8f737b1cf0
# good: [65dc5416d4e02d80ce140078c7c1f4e6c8400396] Merge tag
'batadv-next-for-davem-20190627v2' of
git://git.open-mesh.org/linux-merge
git bisect good 65dc5416d4e02d80ce140078c7c1f4e6c8400396
# skip: [0248a8be6d21dad72b9ce80a7565cf13c11509d8] Merge tag
'gfs2-for-5.3' of
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
git bisect skip 0248a8be6d21dad72b9ce80a7565cf13c11509d8
# good: [aa0bfcd939c30617385ffa28682c062d78050eba] 
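
One possible explanation for the slow convergence, offered as an assumption and not confirmed anywhere in this thread: in a merge-heavy history, marking a commit good only rules out that commit's ancestors. If the tested commit sits on a short topic branch, few candidates are removed. The sketch below uses a made-up history of four three-commit topic branches merged one after another.

```python
def ancestors(graph, tip):
    """Return tip plus every commit reachable through parent links."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph[commit])
    return seen


def remaining(graph, bad, goods):
    """Commits still suspect: ancestors of bad that no good commit covers."""
    suspects = ancestors(graph, bad)
    for good in goods:
        suspects -= ancestors(graph, good)
    return suspects


def build_history(branches=4, length=3):
    """Hypothetical history: topic branches forked from 'base', merged in turn."""
    graph = {"base": []}
    tip = "base"
    for b in range(branches):
        prev = "base"
        for i in range(length):
            name = f"topic{b}-{i}"
            graph[name] = [prev]
            prev = name
        merge = f"merge{b}"
        graph[merge] = [tip, prev]   # parents: mainline tip, topic tip
        tip = merge
    return graph, tip


graph, bad = build_history()
print(len(remaining(graph, bad, ["base"])))              # all suspects
print(len(remaining(graph, bad, ["base", "topic0-2"])))  # good on a topic tip
print(len(remaining(graph, bad, ["base", "merge1"])))    # good on a merge
```

In this toy history a good result on a topic branch tip removes 3 of 16 suspects, while a good result on the second merge removes half of them.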

Re: PROBLEM: nfs? crash in Linux 5.3 (possible regression)

2019-09-20 Thread Nick Bowler
On 9/20/19, Trond Myklebust  wrote:
> On Fri, 2019-09-20 at 14:23 -0400, Nick Bowler wrote:
>> Not sure how reproducible this is.  Since I've never seen a crash
>> like this before it may be a regression compared to, say, Linux 4.19
>> but I am not certain because this particular machine is brand new so
>> I don't have experience with older kernels on it...

So it actually seems pretty reliably reproducible, 4 attempts to compile
Linux on Linux 5.3 and all four crash the same way, although there's
definitely some randomness here...

On the other hand, I cannot reproduce if I install Linux 5.2 so it does
seem like a regression in 5.3.  I will see how well bisecting goes...

>> [  796.050025] BUG: kernel NULL pointer dereference, address:
>> 0014
>> [  796.051280] #PF: supervisor read access in kernel mode
>> [  796.053063] #PF: error_code(0x) - not-present page
>> [  796.054636] PGD 0 P4D 0
>> [  796.055688] Oops:  [#1] PREEMPT SMP
>> [  796.056768] CPU: 2 PID: 190 Comm: kworker/2:2 Tainted: GW
>>   5.3.0 #6
>> [  796.057953] Hardware name: To Be Filled By O.E.M. To Be Filled By
>> O.E.M./B450 Gaming-ITX/ac, BIOS P3.30 05/17/2019
>> [  796.059329] Workqueue: events key_garbage_collector
>> [  796.060623] RIP: 0010:keyring_gc_check_iterator+0x27/0x30
>
> That would be the keyring garbage collector, not NFS.
>
> Cced keyri...@vger.kernel.org
>
>
>> [  796.061845] Code: 44 00 00 48 83 e7 fc b8 01 00 00 00 f6 87 80 00
>> 00 00 21 75 19 48 8b 57 58 48 39 16 7c 05 48 85 d2 7f 0b 48 8b 87 a0
>> 00 00 00 <0f> b6 40 14 c3 0f 1f 40 00 48 83 e7 fc e9 27 eb ff ff 0f
>> 1f
>> 80 00
>> [  796.064638] RSP: 0018:b40fc0757df8 EFLAGS: 00010282
>> [  796.066058] RAX:  RBX: a14338caed80 RCX:
>> b40fc0757e40
>> [  796.067531] RDX: a1433ae85558 RSI: b40fc0757e40 RDI:
>> a1433ae85500
>> [  796.069014] RBP: b40fc0757e40 R08:  R09:
>> 000f
>> [  796.070513] R10: 8080808080808080 R11: 0001 R12:
>> a4cd6180
>> [  796.072025] R13: a14338caee10 R14: a14338caedf0 R15:
>> a1433ffeff00
>> [  796.073567] FS:  () GS:a1434048()
>> knlGS:
>> [  796.075171] CS:  0010 DS:  ES:  CR0: 80050033
>> [  796.076785] CR2: 0014 CR3: 000747ce6000 CR4:
>> 003406e0
>> [  796.078445] Call Trace:
>> [  796.080091]  assoc_array_subtree_iterate+0x55/0x100
>> [  796.081770]  keyring_gc+0x3f/0x80
>> [  796.083447]  key_garbage_collector+0x330/0x3d0
>> [  796.085155]  process_one_work+0x1cb/0x320
>> [  796.086869]  worker_thread+0x28/0x3c0
>> [  796.088603]  ? process_one_work+0x320/0x320
>> [  796.090335]  kthread+0x106/0x120
>> [  796.092053]  ? kthread_create_on_node+0x40/0x40
>> [  796.093810]  ret_from_fork+0x1f/0x30
>> [  796.095569] Modules linked in: sha1_ssse3 sha1_generic cbc cts
>> rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace ext4 crc16 mbcache
>> jbd2 iwlmvm mac80211 libarc4 amdgpu iwlwifi snd_hda_codec_realtek
>> snd_hda_codec_generic kvm_amd gpu_sched kvm snd_hda_codec_hdmi
>> drm_kms_helper irqbypass k10temp syscopyarea sysfillrect sysimgblt
>> fb_sys_fops video ttm cfg80211 snd_hda_intel snd_hda_codec drm
>> snd_hwdep rfkill snd_hda_core backlight snd_pcm evdev snd_timer snd
>> soundcore efivarfs dm_crypt hid_generic igb hwmon i2c_algo_bit sr_mod
>> cdrom sunrpc dm_mod
>> [  796.104033] CR2: 0014
>> [  796.106304] ---[ end trace 695aee10f9202347 ]---
>> [  796.108585] RIP: 0010:keyring_gc_check_iterator+0x27/0x30
>> [  796.110894] Code: 44 00 00 48 83 e7 fc b8 01 00 00 00 f6 87 80 00
>> 00 00 21 75 19 48 8b 57 58 48 39 16 7c 05 48 85 d2 7f 0b 48 8b 87 a0
>> 00 00 00 <0f> b6 40 14 c3 0f 1f 40 00 48 83 e7 fc e9 27 eb ff ff 0f
>> 1f
>> 80 00
>> [  796.115773] RSP: 0018:b40fc0757df8 EFLAGS: 00010282
>> [  796.118209] RAX:  RBX: a14338caed80 RCX:
>> b40fc0757e40
>> [  796.120683] RDX: a1433ae85558 RSI: b40fc0757e40 RDI:
>> a1433ae85500
>> [  796.123176] RBP: b40fc0757e40 R08:  R09:
>> 000f
>> [  796.125668] R10: 8080808080808080 R11: 0001 R12:
>> a4cd6180
>> [  796.128104] R13: a14338caee10 R14: a14338caedf0 R15:
>> a1433ffeff00
>> [  796.130493] FS:  () GS:a1434048()
>> knlGS:
>> [  796.132923] CS:  0010 DS:  ES:  CR0: 80050033
>> [  796.135266] CR2: 0014 CR3: 000747ce6000 CR4:
>> 003406e0
>


PROBLEM: nfs? crash in Linux 5.3 (possible regression)

2019-09-20 Thread Nick Bowler
Hi all,

I hit this oops on Linux 5.3 yesterday.  The crash itself occurred while
compiling Linux (source and build dirs on NFS).  Afterwards, the system
remained mostly alive but my NFS mounts became very busted with lots
(but not all) I/O operations appearing to hang forever.

Not sure how reproducible this is.  Since I've never seen a crash
like this before it may be a regression compared to, say, Linux 4.19
but I am not certain because this particular machine is brand new so
I don't have experience with older kernels on it...

Full dmesg is attached (gzipped).

Let me know if you need any more info.

[  796.050025] BUG: kernel NULL pointer dereference, address: 0014
[  796.051280] #PF: supervisor read access in kernel mode
[  796.053063] #PF: error_code(0x) - not-present page
[  796.054636] PGD 0 P4D 0
[  796.055688] Oops:  [#1] PREEMPT SMP
[  796.056768] CPU: 2 PID: 190 Comm: kworker/2:2 Tainted: GW
  5.3.0 #6
[  796.057953] Hardware name: To Be Filled By O.E.M. To Be Filled By
O.E.M./B450 Gaming-ITX/ac, BIOS P3.30 05/17/2019
[  796.059329] Workqueue: events key_garbage_collector
[  796.060623] RIP: 0010:keyring_gc_check_iterator+0x27/0x30
[  796.061845] Code: 44 00 00 48 83 e7 fc b8 01 00 00 00 f6 87 80 00
00 00 21 75 19 48 8b 57 58 48 39 16 7c 05 48 85 d2 7f 0b 48 8b 87 a0
00 00 00 <0f> b6 40 14 c3 0f 1f 40 00 48 83 e7 fc e9 27 eb ff ff 0f 1f
80 00
[  796.064638] RSP: 0018:b40fc0757df8 EFLAGS: 00010282
[  796.066058] RAX:  RBX: a14338caed80 RCX: b40fc0757e40
[  796.067531] RDX: a1433ae85558 RSI: b40fc0757e40 RDI: a1433ae85500
[  796.069014] RBP: b40fc0757e40 R08:  R09: 000f
[  796.070513] R10: 8080808080808080 R11: 0001 R12: a4cd6180
[  796.072025] R13: a14338caee10 R14: a14338caedf0 R15: a1433ffeff00
[  796.073567] FS:  () GS:a1434048()
knlGS:
[  796.075171] CS:  0010 DS:  ES:  CR0: 80050033
[  796.076785] CR2: 0014 CR3: 000747ce6000 CR4: 003406e0
[  796.078445] Call Trace:
[  796.080091]  assoc_array_subtree_iterate+0x55/0x100
[  796.081770]  keyring_gc+0x3f/0x80
[  796.083447]  key_garbage_collector+0x330/0x3d0
[  796.085155]  process_one_work+0x1cb/0x320
[  796.086869]  worker_thread+0x28/0x3c0
[  796.088603]  ? process_one_work+0x320/0x320
[  796.090335]  kthread+0x106/0x120
[  796.092053]  ? kthread_create_on_node+0x40/0x40
[  796.093810]  ret_from_fork+0x1f/0x30
[  796.095569] Modules linked in: sha1_ssse3 sha1_generic cbc cts
rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace ext4 crc16 mbcache
jbd2 iwlmvm mac80211 libarc4 amdgpu iwlwifi snd_hda_codec_realtek
snd_hda_codec_generic kvm_amd gpu_sched kvm snd_hda_codec_hdmi
drm_kms_helper irqbypass k10temp syscopyarea sysfillrect sysimgblt
fb_sys_fops video ttm cfg80211 snd_hda_intel snd_hda_codec drm
snd_hwdep rfkill snd_hda_core backlight snd_pcm evdev snd_timer snd
soundcore efivarfs dm_crypt hid_generic igb hwmon i2c_algo_bit sr_mod
cdrom sunrpc dm_mod
[  796.104033] CR2: 0014
[  796.106304] ---[ end trace 695aee10f9202347 ]---
[  796.108585] RIP: 0010:keyring_gc_check_iterator+0x27/0x30
[  796.110894] Code: 44 00 00 48 83 e7 fc b8 01 00 00 00 f6 87 80 00
00 00 21 75 19 48 8b 57 58 48 39 16 7c 05 48 85 d2 7f 0b 48 8b 87 a0
00 00 00 <0f> b6 40 14 c3 0f 1f 40 00 48 83 e7 fc e9 27 eb ff ff 0f 1f
80 00
[  796.115773] RSP: 0018:b40fc0757df8 EFLAGS: 00010282
[  796.118209] RAX:  RBX: a14338caed80 RCX: b40fc0757e40
[  796.120683] RDX: a1433ae85558 RSI: b40fc0757e40 RDI: a1433ae85500
[  796.123176] RBP: b40fc0757e40 R08:  R09: 000f
[  796.125668] R10: 8080808080808080 R11: 0001 R12: a4cd6180
[  796.128104] R13: a14338caee10 R14: a14338caedf0 R15: a1433ffeff00
[  796.130493] FS:  () GS:a1434048()
knlGS:
[  796.132923] CS:  0010 DS:  ES:  CR0: 80050033
[  796.135266] CR2: 0014 CR3: 000747ce6000 CR4: 003406e0
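
The "Code:" line marks the first byte of the faulting instruction with angle brackets. As a minimal sketch, the Python below splits such a line at the marker and decodes just the one instruction form that appears here (opcode 0f b6 with an 8-bit displacement). It is not a general disassembler, and the function names are illustrative.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def split_code_line(hex_dump):
    """Split oops code bytes into (bytes before the fault, bytes from it on)."""
    tokens = hex_dump.split()
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            tokens[i] = tok[1:-1]
            raw = bytes(int(t, 16) for t in tokens)
            return raw[:i], raw[i:]
    raise ValueError("no <..> marker found")


def decode_movzx_disp8(insn):
    """Decode only 'movzx r32, byte [r64 + disp8]' without a REX prefix."""
    if len(insn) < 4 or insn[0:2] != b"\x0f\xb6":
        raise ValueError("not a 0f b6 movzx")
    modrm = insn[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:   # rm == 4 would require a SIB byte
        raise ValueError("unsupported addressing form")
    disp = int.from_bytes(insn[3:4], "little", signed=True)
    return REG32[reg], REG64[rm], disp


# Tail of the Code: line from the oops above.
before, at_fault = split_code_line("48 8b 87 a0 00 00 00 <0f> b6 40 14 c3")
dest, base, disp = decode_movzx_disp8(at_fault)
print(f"movzx {dest}, byte [{base}+{disp:#x}]")
```

The decoded instruction reads one byte at offset 0x14 from RAX. The RAX value is blank in this archived copy of the dump, so a NULL pointer in RAX is an inference from the reported fault address, not something shown directly.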

Thanks,
  Nick


atlas-crash.log.gz
Description: GNU Zip compressed data


Re: PROBLEM: oops spew with Linux 5.1.5 (NFS regression?)

2019-06-03 Thread Nick Bowler
On 2019-06-03, Nick Bowler  wrote:
> On 2019-05-29, Olga Kornievskaia  wrote:
>> On Wed, May 29, 2019 at 1:14 PM Trond Myklebust wrote:
>>> OK, I think this is the same problem that Olga was seeing (Cced), and
>>> it looks like I missed the use-after-free issue when the server returns
>>> a credential error when she asked.
>>
>> I think this is actually different than what I encountered for the
>> umount case but the trigger is the same -- failing validation.
>>
>> I tried to reproduce Nick's oops on 5.2-rc but haven't been able to
>> (but I'm not confident I produced the right trigger conditions. will
>> try 5.1).
>
> OK, I think I found something that triggers this fault.  This happens
> when certain local users try to stat a file or directory on an nfs
> mount.  Presumably these UIDs do not have appropriate permissions on
> the server but I'm not sure exactly (I do not control the server).
>
> I can reproduce the oops with a command like this:
>
>   # su -s/bin/sh -c 'stat /path/to/nfs/file' problematic_user
>
> which oopses every time (and SIGKILLs the stat command).  (I have not yet
> rebooted since the original report or tried with Trond's patch applied.
> I will do that next, and also try 5.1.6).

OK, armed with this reproducer I can confirm that the issue is still
present in 5.1.6, and that applying Trond's patch on top of 5.1.6
appears to fix the problem.

Thanks,
  Nick


Re: PROBLEM: oops spew with Linux 5.1.5 (NFS regression?)

2019-06-03 Thread Nick Bowler
On 2019-05-29, Olga Kornievskaia  wrote:
> On Wed, May 29, 2019 at 1:14 PM Trond Myklebust 
> wrote:
>>
>> On Wed, 2019-05-29 at 11:10 -0400, Nick Bowler wrote:
>> > Hi,
>> >
>> > I upgraded to Linux 5.1.5 on one machine yesterday, and this morning
>> > I happened to notice a large number of backtraces in the log.  It appears
>> > that the system oopsed 62 times over a period of about 5 minutes,
>> > producing about half a megabyte of log messages, after which the
>> > messages stopped.  No idea what action (if any) triggered these.
>> >
>> > However, other than the noise in the logs there is nothing obviously
>> > broken, but I thought I should report the spews anyway.  I was
>> > running 5.0.9 previously and have not seen any similar errors.  The
>> > first couple spews are appended.  All 64 faults look very similar
>> > to these ones, with the same faulting address and the same
>> > rpc_check_timeout function at the top of the backtrace.
>>
>> OK, I think this is the same problem that Olga was seeing (Cced), and
>> it looks like I missed the use-after-free issue when the server returns
>> a credential error when she asked.
>
> I think this is actually different than what I encountered for the
> umount case but the trigger is the same -- failing validation.
>
> I tried to reproduce Nick's oops on 5.2-rc but haven't been able to
> (but I'm not confident I produced the right trigger conditions. will
> try 5.1).

OK, I think I found something that triggers this fault.  This happens
when certain local users try to stat a file or directory on an nfs
mount.  Presumably these UIDs do not have appropriate permissions on
the server but I'm not sure exactly (I do not control the server).

I can reproduce the oops with a command like this:

  # su -s/bin/sh -c 'stat /path/to/nfs/file' problematic_user

which oopses every time (and SIGKILLs the stat command).  (I have not yet
rebooted since the original report or tried with Trond's patch applied.
I will do that next, and also try 5.1.6).

Cheers,
  Nick


PROBLEM: oops spew with Linux 5.1.5 (NFS regression?)

2019-05-29 Thread Nick Bowler
2019-05-28T16:00:30-04:00 emergent kernel: CS:  0010 DS:  ES:  CR0: 80050033
2019-05-28T16:00:30-04:00 emergent kernel: CR2: 0098 CR3: 54043000 CR4: 000406e0
[ and on and on and on ... ]

-- 
Nick Bowler


Re: Linux 4.14.63

2018-08-16 Thread Nick Bowler
Hi Greg,

On 2018-08-16, Greg KH  wrote:
> I'm announcing the release of the 4.14.63 kernel.
>
> All users of the 4.14 kernel series must upgrade.

This fails to build:

  CC      drivers/rtc/rtc-cmos.o
  In file included from /scratch_space/linux/drivers/rtc/rtc-cmos.c:45:0:
  /scratch_space/linux/arch/x86/include/asm/i8259.h: In function ‘inb_pic’:
  /scratch_space/linux/arch/x86/include/asm/i8259.h:33:24: error: implicit declaration of function ‘inb’ [-Werror=implicit-function-declaration]
    unsigned char value = inb(port);
                          ^~~
  /scratch_space/linux/arch/x86/include/asm/i8259.h: In function ‘outb_pic’:
  /scratch_space/linux/arch/x86/include/asm/i8259.h:46:2: error: implicit declaration of function ‘outb’ [-Werror=implicit-function-declaration]
    outb(value, port);
    ^~~~

4.14.62 builds OK.

#including  in i8259.h fixes it.

Cheers,
  Nick


Re: PROBLEM: Asus C201 video mode problems on HDMI hotplug (regression)

2017-12-04 Thread Nick Bowler
On 2017-12-04 13:33 -0500, Nick Bowler wrote:
> On 2017-12-04 10:04 +, Jose Abreu wrote:
> > Hmmm, my first thought was that audio is being configured first
> > because of the phy lock wait time, I've seen this happening before.
> >
> > Let's try this:
> > - Disable all alsa clients (e.g. pulseaudio, ...) so that no one
> > tries to configure audio.
> > - Plug out/in the cable until the issue appears
> > - When the issue appears use aplay to play audio through the HDMI
> > output
> > - Repeat several times with different audio rates and with no
> > resample (you can use the plughw interface in aplay).
> 
> OK, I will give it a try later this evening.

Using the above sequence on unpatched 4.15-rc1 it seems there is no
sound when starting audio output after the pink bar is visible.

However, I am not confident in the results here: restarting aplay
with different sample rates (or even restarting with the same rate)
causes some weird effects on my setup, so I want to check the test
setup with some different source devices.

Cheers,
  Nick


Re: PROBLEM: Asus C201 video mode problems on HDMI hotplug (regression)

2017-12-04 Thread Nick Bowler
On 2017-12-04 21:34 +0200, Laurent Pinchart wrote:
> On Monday, 4 December 2017 21:30:01 EET Nick Bowler wrote:
> > On 2017-12-04 21:06 +0200, Laurent Pinchart wrote:
> > > As you reported that the PLL lock failure message is not printed, the
> > > failure can only come from either the extra delay introduced by the
> > > above loop, or from reading the HDMI_PHY_STAT0 register.
> > > 
> > > How many iterations of the for loop execute before the condition
> > > becomes true?
> > 
> > Judging from the log posted elsethread (where I added extra printouts),
> > it seems to consistently become true on the second iteration.
> > 
> > I will try to rule out read side effects by replacing the polling loop
> > with an unconditional delay.
> 
> You're reading my mind :-)

I did this test by applying the following patch on 4.15-rc1, and the
problem remains.  So it appears the delay is responsible somehow.

Cheers,
  Nick

diff --git a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
index bf14214fa464..4aec4d5c130e 100644
--- a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
+++ b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
@@ -1101,8 +1101,6 @@ static void dw_hdmi_phy_power_off(struct dw_hdmi *hdmi)
 static int dw_hdmi_phy_power_on(struct dw_hdmi *hdmi)
 {
 	const struct dw_hdmi_phy_data *phy = hdmi->phy.data;
-	unsigned int i;
-	u8 val;
 
 	if (phy->gen == 1) {
 		dw_hdmi_phy_enable_powerdown(hdmi, false);
@@ -1116,21 +1114,7 @@ static int dw_hdmi_phy_power_on(struct dw_hdmi *hdmi)
 	dw_hdmi_phy_gen2_txpwron(hdmi, 1);
 	dw_hdmi_phy_gen2_pddq(hdmi, 0);
 
-	/* Wait for PHY PLL lock */
-	for (i = 0; i < 5; ++i) {
-		val = hdmi_readb(hdmi, HDMI_PHY_STAT0) & HDMI_PHY_TX_PHY_LOCK;
-		if (val)
-			break;
-
-		usleep_range(1000, 2000);
-	}
-
-	if (!val) {
-		dev_err(hdmi->dev, "PHY PLL failed to lock\n");
-		return -ETIMEDOUT;
-	}
-
-	dev_dbg(hdmi->dev, "PHY PLL locked %u iterations\n", i);
+	usleep_range(1000, 2000);
 	return 0;
 }



Re: PROBLEM: Asus C201 video mode problems on HDMI hotplug (regression)

2017-12-04 Thread Nick Bowler
Hi,

On 2017-12-04 21:06 +0200, Laurent Pinchart wrote:
> As you reported that the PLL lock failure message is not printed, the
> failure can only come from either the extra delay introduced by the
> above loop, or from reading the HDMI_PHY_STAT0 register.
> 
> How many iterations of the for loop execute before the condition
> becomes true?

Judging from the log posted elsethread (where I added extra printouts),
it seems to consistently become true on the second iteration.

I will try to rule out read side effects by replacing the polling loop
with an unconditional delay.

Cheers,
-- 
Nick Bowler


Re: PROBLEM: Asus C201 video mode problems on HDMI hotplug (regression)

2017-12-04 Thread Nick Bowler
On 2017-12-04 10:04 +, Jose Abreu wrote:
> On 03-12-2017 05:20, Nick Bowler wrote:
> > I brought the original test equipment back to the setup so I can
> > see the video and pink bar again.  The symptoms remain the same
> > (unexpected size, pink bar, and no audio).
> >
>
> Can you tell me which test equipment you are using?

I am using an XRGB-Mini Framemeister as the sink device for testing
which isn't "real" test equipment but does show details of the video
and audio mode (which the monitor I have does not do).

The normal setup of this laptop has a small "audio splitter" as
the sink, which among other things includes HDMI input and output
ports and an S/PDIF audio output.

These devices can be connected together in various ways and the
results seem to be consistent in any configuration.

> > It is very consistent: pink bar <=> no audio.
> >
> > My suspicion is that the audio problem is just the wrong video mode
> > on the sink side messing things up, but I have no way of confirming
> > that (that I know of).
>
> Hmmm, my first thought was that audio is being configured first
> because of the phy lock wait time, I've seen this happening before.
>
> Let's try this:
> - Disable all alsa clients (e.g. pulseaudio, ...) so that no one
> tries to configure audio.
> - Plug out/in the cable until the issue appears
> - When the issue appears use aplay to play audio through the HDMI
> output
> - Repeat several times with different audio rates and with no
> resample (you can use the plughw interface in aplay).

OK, I will give it a try later this evening.

Cheers,
  Nick


Re: PROBLEM: Asus C201 video mode problems on HDMI hotplug (regression)

2017-12-02 Thread Nick Bowler
Hi Jose,

On 2017-12-02 17:11 +, Jose Abreu wrote:
> On 01-12-2017 00:11, Nick Bowler wrote:
> > Another data point... the following patch appears sufficient to
> > restore working behaviour.
[...]
> I don't think you can do this. The phy pll lock check is
> recommended and can indicate hw failure. Can you please check if
> this untested, uncompiled patch makes it work correctly ?

Your patch changes things.  With this applied on top of 4.15-rc1
it is failing 100% of the time instead of only half of the time.

I brought the original test equipment back to the setup so I can
see the video and pink bar again.  The symptoms remain the same
(unexpected size, pink bar, and no audio).

PS: your patch seems to have been line wrapped which made it a
bit annoying to apply.

> -->8---
> diff --git a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
> b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
> index bf14214..456fc54 100644
> --- a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
> +++ b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
> @@ -1669,7 +1669,7 @@ static void
> hdmi_disable_overflow_interrupts(struct dw_hdmi *hdmi)
>  
>  static int dw_hdmi_setup(struct dw_hdmi *hdmi, struct
> drm_display_mode *mode)
>  {
> -   int ret;
> +   int ret, vsync_len = mode->vsync_end - mode->vsync_start;
>  
> hdmi_disable_overflow_interrupts(hdmi);
>  
> @@ -1722,6 +1722,14 @@ static int dw_hdmi_setup(struct dw_hdmi
> *hdmi, struct drm_display_mode *mode)
> return ret;
> hdmi->phy.enabled = true;
>  
> +   /* Reset all clock domains */
> +   hdmi_writeb(hdmi, 0x00, HDMI_MC_SWRSTZ);
> +
> +   /* Rewrite vsync register to latch previous written values */
> +   if (mode->flags & DRM_MODE_FLAG_INTERLACE)
> +   vsync_len /= 2;
> +   hdmi_writeb(hdmi, vsync_len, HDMI_FC_VSYNCINWIDTH);
> +
> /* HDMI Initialization Step B.3 */
> dw_hdmi_enable_video_path(hdmi);
>  
> -->8---
>
> I would expect this patch to end your wrong image issue but the
> audio part may be a different problem.

It is very consistent: pink bar <=> no audio.

My suspicion is that the audio problem is just the wrong video mode
on the sink side messing things up, but I have no way of confirming
that (that I know of).

Thanks,
  Nick


Re: PROBLEM: Asus C201 video mode problems on HDMI hotplug (regression)

2017-11-30 Thread Nick Bowler
Hi,

On 2017-11-27 22:30 -0500, Nick Bowler wrote:
> A note about the test setup: I had to remove the test equipment so I
> no longer have any information about the video mode from the sink side
> (like in the photos).  Thus, with the current setup, I am using the
> presence or absence of audio to determine whether the issue is present
> or not.
> 
> The test procedure is: boot up, start music, then hotplug the hdmi four
> times.  If sound is heard after all four connections, PASS; otherwise FAIL.
> 
>  - I retested on 4.15-rc1 to confirm that the issue is still present (it is).
> 
>  - I applied the functional revert from earlier on top of 4.15-rc1 and the
>    problem is fixed.
> 
>  - Returning to 4.15-rc1 and applying [Laurent Pinchart's] patch --
>    the issue is present again (no change in behaviour compared to
>    4.15-rc1).

Another data point... the following patch appears sufficient to restore
working behaviour.

Cheers,
  Nick

---
 drivers/gpu/drm/bridge/synopsys/dw-hdmi.c | 17 -
 1 file changed, 17 deletions(-)

diff --git a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
index bf14214fa464..3118fbd8433d 100644
--- a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
+++ b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
@@ -1101,8 +1101,6 @@ static void dw_hdmi_phy_power_off(struct dw_hdmi *hdmi)
 static int dw_hdmi_phy_power_on(struct dw_hdmi *hdmi)
 {
 	const struct dw_hdmi_phy_data *phy = hdmi->phy.data;
-	unsigned int i;
-	u8 val;
 
 	if (phy->gen == 1) {
 		dw_hdmi_phy_enable_powerdown(hdmi, false);
@@ -1116,21 +1114,6 @@ static int dw_hdmi_phy_power_on(struct dw_hdmi *hdmi)
 	dw_hdmi_phy_gen2_txpwron(hdmi, 1);
 	dw_hdmi_phy_gen2_pddq(hdmi, 0);
 
-	/* Wait for PHY PLL lock */
-	for (i = 0; i < 5; ++i) {
-		val = hdmi_readb(hdmi, HDMI_PHY_STAT0) & HDMI_PHY_TX_PHY_LOCK;
-		if (val)
-			break;
-
-		usleep_range(1000, 2000);
-	}
-
-	if (!val) {
-		dev_err(hdmi->dev, "PHY PLL failed to lock\n");
-		return -ETIMEDOUT;
-	}
-
-	dev_dbg(hdmi->dev, "PHY PLL locked %u iterations\n", i);
 	return 0;
 }
 
-- 
2.13.6


Re: PROBLEM: Asus C201 video mode problems on HDMI hotplug (regression)

2017-11-27 Thread Nick Bowler
On 2017-11-27 11:00 +0200, Laurent Pinchart wrote:
> On Monday, 27 November 2017 06:05:03 EET Archit Taneja wrote:
> > On 2017-11-05 11:41 -0500, Nick Bowler wrote:
[...]
> > > Bisection implicates the following commit:
> > > 
> > > 181e0ef092a4952aa523c5b9cb21394cf43bcd46 is the first bad commit
> > > commit 181e0ef092a4952aa523c5b9cb21394cf43bcd46
> > > Author: Laurent Pinchart 
> > > Date:   Mon Mar 6 01:35:57 2017 +0200
> > > 
> > >  drm: bridge: dw-hdmi: Fix the PHY power up sequence
[...]
> > 
> > The two main things the commit below does are to a) correctly wait on the
> > TX_PHY_LOCK bit to be asserted and b) use usleep_range() instead of
> > udelay().
> 
> Another difference is that the PWDN and TMDS signals, in theory needed for 
> Gen1 PHYs only, are not set anymore for Gen2 PHYs. Nick, could you test the 
> following change to see if it makes a difference ?

I do not notice any difference with this change applied on top of Linux
4.15-rc1.

A note about the test setup: I had to remove the test equipment so I
no longer have any information about the video mode from the sink side
(like in the photos).  Thus, with the current setup, I am using the
presence or absence of audio to determine whether the issue is present
or not.

The test procedure is: boot up, start music, then hotplug the hdmi four
times.  If sound is heard after all four connections, PASS; otherwise FAIL.

 - I retested on 4.15-rc1 to confirm that the issue is still present (it is).

 - I applied the functional revert from earlier on top of 4.15-rc1 and the
   problem is fixed.

 - Returning to 4.15-rc1 and applying this next patch -- the issue is
   present again (no change in behaviour compared to 4.15-rc1).

> diff --git a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
> index b172139502d6..1c18ff1bf24a 100644
> --- a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
> +++ b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
> @@ -1104,14 +1104,14 @@ static int dw_hdmi_phy_power_on(struct dw_hdmi *hdmi)
>   unsigned int i;
>   u8 val;
>  
> - if (phy->gen == 1) {
> - dw_hdmi_phy_enable_powerdown(hdmi, false);
> + dw_hdmi_phy_enable_powerdown(hdmi, false);
>  
> - /* Toggle TMDS enable. */
> - dw_hdmi_phy_enable_tmds(hdmi, 0);
> - dw_hdmi_phy_enable_tmds(hdmi, 1);
> + /* Toggle TMDS enable. */
> + dw_hdmi_phy_enable_tmds(hdmi, 0);
> + dw_hdmi_phy_enable_tmds(hdmi, 1);
> +
> + if (phy->gen == 1)
>   return 0;
> - }
>  
>   dw_hdmi_phy_gen2_txpwron(hdmi, 1);
>   dw_hdmi_phy_gen2_pddq(hdmi, 0);
> 
> > I don't see (b) being a problem. About (a), it's possible that the bit above
> > is interpreted differently on a rockchip SoC versus a renesas chip. Could
> > you print the value of HDMI_PHY_STAT0 that's read back?
[...]
> > As an experiment, could you forcefully return 0 instead of -ETIMEDOUT and
> > see if things return back to normal?

I did both of these tests at once by applying the below patch on top of
4.15-rc1.  There is no change in behaviour compared to 4.15-rc1 (except
for the added printouts).

With this, every time after inserting the cable the following is printed:

  [  128.002965] dwhdmi-rockchip ff98.hdmi: 0: HDMI_PHY_STAT0: f2
  [  128.004614] dwhdmi-rockchip ff98.hdmi: 1: HDMI_PHY_STAT0: f3
  [  128.013752] dwhdmi-rockchip ff98.hdmi: 0: HDMI_PHY_STAT0: f2
  [  128.015605] dwhdmi-rockchip ff98.hdmi: 1: HDMI_PHY_STAT0: f3

And there is no difference in output between working and non-working
cases.  I've attached the full log; I manually logged extra messages
to give context from the test procedure:

  "hdmi (not) working" - after bootup or connecting the cable
                         (indicating test pass/fail)
  "hdmi disconnect"    - after unplugging the cable.

diff --git a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
index bf14214fa464..0358f6020fb4 100644
--- a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
+++ b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
@@ -1118,7 +1118,11 @@ static int dw_hdmi_phy_power_on(struct dw_hdmi *hdmi)
 
 	/* Wait for PHY PLL lock */
 	for (i = 0; i < 5; ++i) {
-		val = hdmi_readb(hdmi, HDMI_PHY_STAT0) & HDMI_PHY_TX_PHY_LOCK;
+		val = hdmi_readb(hdmi, HDMI_PHY_STAT0);
+
+		dev_info(hdmi->dev, "%u: HDMI_PHY_STAT0: %.2hhx\n", i, val);
+
+		val &= HDMI_PHY_TX_PHY_LOCK;
 		if (val)
 			break;
 
@@ -1127,7 +1131,7 @@ static int dw_hdmi_phy_power_on(struct dw_hdmi *hdmi)
 
 	if (!val) {
 		dev_err(hdmi->dev, "PHY PLL failed to lock\n");
-		return -ETIMEDOUT;
+		return 0;
 	}
 
 	dev_dbg(hdmi->dev, "PHY PLL locked %u iterations\n", i);
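
For reference, the f2 and f3 readings above can be compared bit by bit.
The sketch below is only an illustration: the mask values are assumed
from the dw-hdmi register header rather than taken from this thread, and
the helper names are made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Bit masks for HDMI_PHY_STAT0.  These values are an assumption based on
 * the dw-hdmi register header; they are not stated in this thread.
 */
#define PHY_TX_PHY_LOCK  0x01u  /* PLL lock indication */
#define PHY_HPD          0x02u  /* hot plug detect */
#define PHY_RX_SENSE_ALL 0xf0u  /* RX_SENSE0..RX_SENSE3 */

/* True if the (assumed) lock bit is set in a raw HDMI_PHY_STAT0 value. */
static bool phy_pll_locked(uint8_t stat0)
{
	return (stat0 & PHY_TX_PHY_LOCK) != 0;
}

/* Bits that differ between two raw HDMI_PHY_STAT0 readings. */
static uint8_t phy_stat0_changed(uint8_t before, uint8_t after)
{
	return before ^ after;
}
```

With the logged values, phy_pll_locked(0xf2) is false and
phy_pll_locked(0xf3) is true, and the two readings differ only in the
assumed lock bit, which is consistent with the loop exiting on its
second iteration.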

Let me know if there's anything else I should try.

Thanks,
  Nick


aidos-hdmi-stat0.log.gz
Description: Binary data


Re: PROBLEM: Asus C201 video mode problems on HDMI hotplug (regression)

2017-11-27 Thread Nick Bowler
Hi,

On 11/27/17, Laurent Pinchart  wrote:
> The driver should print a "PHY PLL failed to lock" error message to the
> kernel log in that case. Nick, does that happen on your system ?

I will try to test the other things later today, but after bootup there
were no messages whatsoever printed to the kernel log during the test
procedure.

Cheers,
  Nick


Re: PROBLEM: Asus C201 video mode problems on HDMI hotplug (regression)

2017-11-15 Thread Nick Bowler
Hi,

Any ideas on this issue?  Are there any additional tests I can perform
to help debug this?

On 2017-11-05 11:41 -0500, Nick Bowler wrote:
> I completed bisecting this issue.  See below.
> 
> On 2017-11-02, Nick Bowler  wrote:
> > ~50% of the time after a hotplug, there is a vertical pink bar on the
> > left of the display area and audio is not working at all.  According to
> > the sink device the display size is 1282x720 which seems pretty wrong
> > (normal and working situation is 1280x720).
> >
> > I posted photos of non-working versus working states here:
> >
> >   https://imgur.com/a/qhAZG
> >
> > Unplugging and plugging the cable again will correct the issue (it seems
> > to, for the most part, alternate between working and not-working states,
> > although not always).  It always works on power up with the cable initially
> > connected.
> >
> > This is a regression from 4.11, where hotplug works perfectly every time.
> 
> Bisection implicates the following commit:
> 
> 181e0ef092a4952aa523c5b9cb21394cf43bcd46 is the first bad commit
> commit 181e0ef092a4952aa523c5b9cb21394cf43bcd46
> Author: Laurent Pinchart 
> Date:   Mon Mar 6 01:35:57 2017 +0200
> 
> drm: bridge: dw-hdmi: Fix the PHY power up sequence
> 
> When powering the PHY up we need to wait for the PLL to lock. This is
> done by polling the TX_PHY_LOCK bit in the HDMI_PHY_STAT0 register
> (interrupt-based wait could be implemented as well but is likely
> overkill). The bit is asserted when the PLL locks, but the current code
> incorrectly waits for the bit to be deasserted. Fix it, and while at it,
> replace the udelay() with a sleep as the code never runs in
> non-sleepable context.
> 
> To be consistent with the power down implementation move the poll loop
> to the power off function.
> 
> Signed-off-by: Laurent Pinchart 
> 
> Tested-by: Neil Armstrong 
> Reviewed-by: Jose Abreu 
> Signed-off-by: Archit Taneja 
> Link: 
> http://patchwork.freedesktop.org/patch/msgid/20170305233557.11945-1-laurent.pinchart+rene...@ideasonboard.com
> 
> :040000 040000 0defad9d1a61c0355f49c679b18eebae2c4b9495 5d260e6db25d6abc1211d61ec3405be99e693a23 M	drivers
> 
> This commit does not revert cleanly, but on top of latest master (which has
> the problem) I manually changed the relevant code back to its original state
> and the problem is fixed, like this:
> 
> diff --git a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
> b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
> index bf14214fa464..6618aac95a51 100644
> --- a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
> +++ b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
> @@ -1100,37 +1100,34 @@ static void dw_hdmi_phy_power_off(struct dw_hdmi 
> *hdmi)
> 
>  static int dw_hdmi_phy_power_on(struct dw_hdmi *hdmi)
>  {
> - const struct dw_hdmi_phy_data *phy = hdmi->phy.data;
> - unsigned int i;
> - u8 val;
> + u8 val, msec;
> 
> - if (phy->gen == 1) {
> - dw_hdmi_phy_enable_powerdown(hdmi, false);
> + dw_hdmi_phy_enable_powerdown(hdmi, false);
> 
> - /* Toggle TMDS enable. */
> - dw_hdmi_phy_enable_tmds(hdmi, 0);
> - dw_hdmi_phy_enable_tmds(hdmi, 1);
> - return 0;
> - }
> + /* toggle TMDS enable */
> + dw_hdmi_phy_enable_tmds(hdmi, 0);
> + dw_hdmi_phy_enable_tmds(hdmi, 1);
> 
> + /* gen2 tx power on */
>   dw_hdmi_phy_gen2_txpwron(hdmi, 1);
>   dw_hdmi_phy_gen2_pddq(hdmi, 0);
> 
>   /* Wait for PHY PLL lock */
> - for (i = 0; i < 5; ++i) {
> + msec = 5;
> + do {
>   val = hdmi_readb(hdmi, HDMI_PHY_STAT0) & HDMI_PHY_TX_PHY_LOCK;
> - if (val)
> + if (!val)
>   break;
> 
> - usleep_range(1000, 2000);
> - }
> + if (msec == 0) {
> + dev_err(hdmi->dev, "PHY PLL not locked\n");
> + return -ETIMEDOUT;
> + }
> 
> - if (!val) {
> - dev_err(hdmi->dev, "PHY PLL failed to lock\n");
> - return -ETIMEDOUT;
> - }
> + udelay(1000);
> + msec--;
> + } while (1);
> 
> - dev_dbg(hdmi->dev, "PHY PLL locked %u iterations\n", i);
>   return 0;
>  }
> 

Thanks,
  Nick


Re: PROBLEM: Asus C201 video mode problems on HDMI hotplug (regression)

2017-11-05 Thread Nick Bowler
Hi,

I completed bisecting this issue.  See below.

On 2017-11-02, Nick Bowler  wrote:
> ~50% of the time after a hotplug, there is a vertical pink bar on the
> left of the display area and audio is not working at all.  According to
> the sink device the display size is 1282x720 which seems pretty wrong
> (normal and working situation is 1280x720).
>
> I posted photos of non-working versus working states here:
>
>   https://imgur.com/a/qhAZG
>
> Unplugging and plugging the cable again will correct the issue (it seems
> to, for the most part, alternate between working and not-working states,
> although not always).  It always works on power up with the cable initially
> connected.
>
> This is a regression from 4.11, where hotplug works perfectly every time.

Bisection implicates the following commit:

181e0ef092a4952aa523c5b9cb21394cf43bcd46 is the first bad commit
commit 181e0ef092a4952aa523c5b9cb21394cf43bcd46
Author: Laurent Pinchart 
Date:   Mon Mar 6 01:35:57 2017 +0200

drm: bridge: dw-hdmi: Fix the PHY power up sequence

When powering the PHY up we need to wait for the PLL to lock. This is
done by polling the TX_PHY_LOCK bit in the HDMI_PHY_STAT0 register
(interrupt-based wait could be implemented as well but is likely
overkill). The bit is asserted when the PLL locks, but the current code
incorrectly waits for the bit to be deasserted. Fix it, and while at it,
replace the udelay() with a sleep as the code never runs in
non-sleepable context.

To be consistent with the power down implementation move the poll loop
to the power off function.

Signed-off-by: Laurent Pinchart 
Tested-by: Neil Armstrong 
Reviewed-by: Jose Abreu 
Signed-off-by: Archit Taneja 
Link: 
http://patchwork.freedesktop.org/patch/msgid/20170305233557.11945-1-laurent.pinchart+rene...@ideasonboard.com

:040000 040000 0defad9d1a61c0355f49c679b18eebae2c4b9495 5d260e6db25d6abc1211d61ec3405be99e693a23 M	drivers

This commit does not revert cleanly, but on top of latest master (which has
the problem) I manually changed the relevant code back to its original state
and the problem is fixed, like this:

diff --git a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
index bf14214fa464..6618aac95a51 100644
--- a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
+++ b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
@@ -1100,37 +1100,34 @@ static void dw_hdmi_phy_power_off(struct dw_hdmi *hdmi)
 
 static int dw_hdmi_phy_power_on(struct dw_hdmi *hdmi)
 {
-	const struct dw_hdmi_phy_data *phy = hdmi->phy.data;
-	unsigned int i;
-	u8 val;
+	u8 val, msec;
 
-	if (phy->gen == 1) {
-		dw_hdmi_phy_enable_powerdown(hdmi, false);
+	dw_hdmi_phy_enable_powerdown(hdmi, false);
 
-		/* Toggle TMDS enable. */
-		dw_hdmi_phy_enable_tmds(hdmi, 0);
-		dw_hdmi_phy_enable_tmds(hdmi, 1);
-		return 0;
-	}
+	/* toggle TMDS enable */
+	dw_hdmi_phy_enable_tmds(hdmi, 0);
+	dw_hdmi_phy_enable_tmds(hdmi, 1);
 
+	/* gen2 tx power on */
 	dw_hdmi_phy_gen2_txpwron(hdmi, 1);
 	dw_hdmi_phy_gen2_pddq(hdmi, 0);
 
 	/* Wait for PHY PLL lock */
-	for (i = 0; i < 5; ++i) {
+	msec = 5;
+	do {
 		val = hdmi_readb(hdmi, HDMI_PHY_STAT0) & HDMI_PHY_TX_PHY_LOCK;
-		if (val)
+		if (!val)
 			break;
 
-		usleep_range(1000, 2000);
-	}
+		if (msec == 0) {
+			dev_err(hdmi->dev, "PHY PLL not locked\n");
+			return -ETIMEDOUT;
+		}
 
-	if (!val) {
-		dev_err(hdmi->dev, "PHY PLL failed to lock\n");
-		return -ETIMEDOUT;
-	}
+		udelay(1000);
+		msec--;
+	} while (1);
 
-	dev_dbg(hdmi->dev, "PHY PLL locked %u iterations\n", i);
 	return 0;
 }
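
To make the difference between the two loop shapes concrete, here is a
small userspace sketch (not driver code).  It assumes the lock bit is
bit 0 of HDMI_PHY_STAT0 and replays a caller-supplied list of register
readings instead of touching hardware; the function names are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define PHY_TX_PHY_LOCK 0x01u  /* assumed lock bit in HDMI_PHY_STAT0 */
#define MAX_POLLS 5

/*
 * Loop shape from the bisected commit: stop at the first reading with
 * the lock bit set, sleeping once after every reading without it.
 * Returns the number of sleeps taken, or -1 if no reading showed lock.
 */
static int sleeps_waiting_for_set(const uint8_t *readings, size_t n)
{
	size_t i;

	for (i = 0; i < n && i < MAX_POLLS; ++i) {
		if (readings[i] & PHY_TX_PHY_LOCK)
			return (int)i;
	}
	return -1;
}

/*
 * Loop shape restored by the manual revert: stop at the first reading
 * with the lock bit clear, delaying once after every reading with it
 * set.  Returns the number of delays taken, or -1 on timeout.
 */
static int delays_waiting_for_clear(const uint8_t *readings, size_t n)
{
	int msec = MAX_POLLS;
	size_t i;

	for (i = 0; i < n; ++i) {
		if (!(readings[i] & PHY_TX_PHY_LOCK))
			return MAX_POLLS - msec;
		if (msec == 0)
			return -1;
		msec--;
	}
	return -1;
}
```

Fed a sequence such as { 0xf2, 0xf3 } (lock bit clear, then set), the
first function reports one sleep while the second reports none.  On
hardware that behaves like that, the newer loop would add a 1-2 ms sleep
where the older code returned immediately.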

Cheers,
  Nick


PROBLEM: Asus C201 video mode problems on HDMI hotplug (regression)

2017-11-01 Thread Nick Bowler
Hi,

On my Asus C201 laptop (rk3288) the HDMI output has been behaving weirdly
after a Linux upgrade.

~50% of the time after a hotplug, there is a vertical pink bar on the
left of the display area and audio is not working at all.  According to
the sink device the display size is 1282x720 which seems pretty wrong
(normal and working situation is 1280x720).

I posted photos of non-working versus working states here:

  https://imgur.com/a/qhAZG

Unplugging and plugging the cable again will correct the issue (it seems
to, for the most part, alternate between working and not-working states,
although not always).  It always works on power up with the cable initially
connected.

This is a regression from 4.11, where hotplug works perfectly every time.

I attached dmesg output (gzipped) from 4.14-rc7 (I booted up and
re-plugged the cable twice, which triggered the non-working state and then
returned to the working state -- although it seems no messages are logged
from these hotplugs).

I am working to bisect this, which might take a while.  Partial progress
follows...

Let me know of anything else I should try.

Thanks,
  Nick

git bisect start
# bad: [0b07194bb55ed836c2cc7c22e866b87a14681984] Linux 4.14-rc7
git bisect bad 0b07194bb55ed836c2cc7c22e866b87a14681984
# bad: [fa394784e74b918f44fca1e6a1f826cf818350d2] Linux 4.12.14
git bisect bad fa394784e74b918f44fca1e6a1f826cf818350d2
# good: [bd1a9eb6a755e1cb342725a11242251d2bfad567] Linux 4.11.12
git bisect good bd1a9eb6a755e1cb342725a11242251d2bfad567
# good: [a351e9b9fc24e982ec2f0e76379a49826036da12] Linux 4.11
git bisect good a351e9b9fc24e982ec2f0e76379a49826036da12
# bad: [8f28472a739e8e39adc6e64ee5b460df039f0e4f] Merge tag 'usb-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
git bisect bad 8f28472a739e8e39adc6e64ee5b460df039f0e4f


aidos-4.14-rc7.log.gz
Description: GNU Zip compressed data


Re: PROBLEM: Intel HDMI output busticated on 4.4 (regression)

2016-02-09 Thread Nick Bowler
On 2/9/16, Ville Syrjälä  wrote:
> BTW I'm not at all convinced about the current live status bit defines
> we have for g4x. Supposedly someone tested them and found that they
> don't match the spec, but IIRC when I tried them on one g4x machine
> here, they did match the spec (well, at least for the ports present
> on that particular board).
>
> So something like this may or may not help:
>
> diff --git a/drivers/gpu/drm/i915/i915_reg.h
> b/drivers/gpu/drm/i915/i915_reg.h
> index 188ad5de020f..80c08016e522 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -3302,9 +3302,9 @@ enum skl_disp_power_wells {
>   * Please check the detailed lore in the commit message for for
> experimental
>   * evidence.
>   */
> -#define   PORTD_HOTPLUG_LIVE_STATUS_G4X  (1 << 29)
> +#define   PORTD_HOTPLUG_LIVE_STATUS_G4X  (1 << 27)
>  #define   PORTC_HOTPLUG_LIVE_STATUS_G4X  (1 << 28)
> -#define   PORTB_HOTPLUG_LIVE_STATUS_G4X  (1 << 27)
> +#define   PORTB_HOTPLUG_LIVE_STATUS_G4X  (1 << 29)
>  /* VLV DP/HDMI bits again match Bspec */
>  #define   PORTD_HOTPLUG_LIVE_STATUS_VLV  (1 << 27)
>  #define   PORTC_HOTPLUG_LIVE_STATUS_VLV  (1 << 28)

Well, I applied this on 4.5-rc3 and it seems to fix things.
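
The effect of swapping the two defines can be shown with a short sketch.
The bit positions come from the patch above; the status word and the
helper name are hypothetical, and which port the HDMI connector on this
board uses is not stated here.

```c
#include <stdint.h>

/* Bit assignments before the patch (the removed lines). */
#define OLD_PORTB_LIVE (UINT32_C(1) << 27)
#define OLD_PORTD_LIVE (UINT32_C(1) << 29)

/* Bit assignments after the patch (the added lines). */
#define NEW_PORTB_LIVE (UINT32_C(1) << 29)
#define NEW_PORTD_LIVE (UINT32_C(1) << 27)

/* Port C is bit 28 in both versions. */
#define PORTC_LIVE (UINT32_C(1) << 28)

/*
 * Report which port a raw live-status word marks as connected, given
 * the bits to use for ports B and D.  Returns 'B', 'C', 'D', or '-'
 * when none of the three bits is set.
 */
static char live_port(uint32_t status, uint32_t b_bit, uint32_t d_bit)
{
	if (status & b_bit)
		return 'B';
	if (status & PORTC_LIVE)
		return 'C';
	if (status & d_bit)
		return 'D';
	return '-';
}
```

A status word with only bit 29 set is reported as port D under the old
defines and as port B under the patched ones.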

Thanks,
  Nick


Re: PROBLEM: Intel HDMI output busticated on 4.4 (regression)

2016-02-09 Thread Nick Bowler
On 1/28/16, Nick Bowler  wrote:
> On 2016-01-21, Nick Bowler  wrote:
>> On 2016-01-21, Jindal, Sonika  wrote:
>>> On 1/21/2016 8:59 AM, Nick Bowler wrote:
>>>> On 1/20/16, Nick Bowler  wrote:
>>>>> On 2016-01-20, Jindal, Sonika  wrote:
>> [...]
>>>>>> Does the same system work with any other monitor?
>>>>> I'll see if I can find another to try.
>>>> I tried another monitor, and the same problem occurs.
>>> Which make are these monitors?
>>
>>  - LG Flatron W2253V
>>  - Dell E228WFPc
>>
>>> Do you have any system other than G45?
>>
>> Nothing else with Linux 4.4, unfortunately.
>
> Anything else you want me to try?
>
> This issue is still present in 4.5-rc1.

Ping?

HDMI is still broken on my system in 4.5-rc3.

Cheers,
  Nick


Re: PROBLEM: Intel HDMI output busticated on 4.4 (regression)

2016-01-27 Thread Nick Bowler
On 2016-01-21, Nick Bowler  wrote:
> On 2016-01-21, Jindal, Sonika  wrote:
>> On 1/21/2016 8:59 AM, Nick Bowler wrote:
>>> On 1/20/16, Nick Bowler  wrote:
>>>> On 2016-01-20, Jindal, Sonika  wrote:
> [...]
>>>>> Does the same system work with any other monitor?
>>>> I'll see if I can find another to try.
>>> I tried another monitor, and the same problem occurs.
>> Which make are these monitors?
>
>  - LG Flatron W2253V
>  - Dell E228WFPc
>
>> Do you have any system other than G45?
>
> Nothing else with Linux 4.4, unfortunately.

Anything else you want me to try?

This issue is still present in 4.5-rc1.

Cheers,
  Nick


Re: PROBLEM: Intel HDMI output busticated on 4.4 (regression)

2016-01-21 Thread Nick Bowler
On 2016-01-21, Jindal, Sonika  wrote:
> On 1/21/2016 8:59 AM, Nick Bowler wrote:
>> On 1/20/16, Nick Bowler  wrote:
>>> On 2016-01-20, Jindal, Sonika  wrote:
[...]
>>>> Does the same system work with any other monitor?
>>> I'll see if I can find another to try.
>> I tried another monitor, and the same problem occurs.
> Which make are these monitors?

 - LG Flatron W2253V
 - Dell E228WFPc

> Do you have any system other than G45?

Nothing else with Linux 4.4, unfortunately.

Thanks,
  Nick


Re: PROBLEM: Intel HDMI output busticated on 4.4 (regression)

2016-01-20 Thread Nick Bowler
On 1/20/16, Nick Bowler  wrote:
> Hi,
>
> On 2016-01-20, Jindal, Sonika  wrote:
>> Can you please check if you have the following patch:
>> "commit 3d8acd1f667b45c531401c8f0c2033072e32a05d
>> Author: Gary Wang 
>> Date:   Wed Dec 23 16:11:35 2015 +0800
>>
>> drm/i915: increase the tries for HDMI hotplug live status checking"
>
> Yes, that patch seems to be present in 4.4.
>
>> Does the same system work with any other monitor?
>
> I'll see if I can find another to try.

I tried another monitor, and the same problem occurs.


Re: PROBLEM: Intel HDMI output busticated on 4.4 (regression)

2016-01-20 Thread Nick Bowler
Hi,

On 2016-01-20, Jindal, Sonika  wrote:
> Can you please check if you have the following patch:
> "commit 3d8acd1f667b45c531401c8f0c2033072e32a05d
> Author: Gary Wang 
> Date:   Wed Dec 23 16:11:35 2015 +0800
>
> drm/i915: increase the tries for HDMI hotplug live status checking"

Yes, that patch seems to be present in 4.4.

> Does the same system work with any other monitor?

I'll see if I can find another to try.

Thanks,
  Nick


Re: PROBLEM: Intel VGA output busticated on 4.3-rc2 (regression)

2015-10-07 Thread Nick Bowler
On 10/7/15, Ville Syrjälä  wrote:
> On Wed, Oct 07, 2015 at 10:29:22AM -0400, Nick Bowler wrote:
>> On 10/7/15, Ville Syrjälä  wrote:
>> > On Tue, Oct 06, 2015 at 11:42:33AM -0400, Nick Bowler wrote:
>> >> On 9/24/15, Nick Bowler  wrote:
>> >> > Testing out 4.3-rc2, first thing I notice is that the VGA output is
>> >> > not working.  Specifically, the display is continuously powering on
>> >> > and off -- at no point is any image visible on the screen (I am
>> >> > expecting to see the console output).  The display connected to the
>> >> > HDMI output is working fine.
>> [...]
> I've attached two potential patches that might help. Can you give a try
> to just patch 1, and if that alone doesn't help then both patches
> together?

Patch #1: no change.
Patch #1+#2: this works.

Regards,
  Nick
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: PROBLEM: Intel VGA output busticated on 4.3-rc2 (regression)

2015-10-07 Thread Nick Bowler
On 10/7/15, Nick Bowler  wrote:
> On 10/7/15, Ville Syrjälä  wrote:
>> On Wed, Oct 07, 2015 at 10:29:22AM -0400, Nick Bowler wrote:
>>> On 10/7/15, Ville Syrjälä  wrote:
>>> > On Tue, Oct 06, 2015 at 11:42:33AM -0400, Nick Bowler wrote:
>>> >> On 9/24/15, Nick Bowler  wrote:
>>> >> > Testing out 4.3-rc2, first thing I notice is that the VGA output is
>>> >> > not working.  Specifically, the display is continuously powering on
>>> >> > and off -- at no point is any image visible on the screen (I am
>>> >> > expecting to see the console output).  The display connected to the
>>> >> > HDMI output is working fine.
>>> [...]
>> Hmm. You said VGA has the problem, but HDMI does not. Was the problem
>> happening even when you have both displays enabled at the same time, or
>> just when VGA was enabled alone?
>
> When I boot with HDMI cable disconnected, there is no change in behaviour
> for the VGA output.

Clarification: normally both displays are connected.  So the original issue is
that only one of the two displays is working.


Re: PROBLEM: Intel VGA output busticated on 4.3-rc2 (regression)

2015-10-07 Thread Nick Bowler
On 10/7/15, Ville Syrjälä  wrote:
> On Wed, Oct 07, 2015 at 10:29:22AM -0400, Nick Bowler wrote:
>> On 10/7/15, Ville Syrjälä  wrote:
>> > On Tue, Oct 06, 2015 at 11:42:33AM -0400, Nick Bowler wrote:
>> >> On 9/24/15, Nick Bowler  wrote:
>> >> > Testing out 4.3-rc2, first thing I notice is that the VGA output is
>> >> > not working.  Specifically, the display is continuously powering on
>> >> > and off -- at no point is any image visible on the screen (I am
>> >> > expecting to see the console output).  The display connected to the
>> >> > HDMI output is working fine.
>> [...]
> Hmm. You said VGA has the problem, but HDMI does not. Was the problem
> happening even when you have both displays enabled at the same time, or
> just when VGA was enabled alone?

When I boot with HDMI cable disconnected, there is no change in behaviour
for the VGA output.

> I've attached two potential patches that might help. Can you try
> just patch 1, and if that alone doesn't help, then both patches
> together?

I will try.

Cheers,
  Nick


Re: PROBLEM: Intel VGA output busticated on 4.3-rc2 (regression)

2015-10-07 Thread Nick Bowler
On 10/7/15, Ville Syrjälä  wrote:
> On Tue, Oct 06, 2015 at 11:42:33AM -0400, Nick Bowler wrote:
>> On 9/24/15, Nick Bowler  wrote:
>> > Testing out 4.3-rc2, first thing I notice is that the VGA output is
>> > not working.  Specifically, the display is continuously powering on
>> > and off -- at no point is any image visible on the screen (I am
>> > expecting to see the console output).  The display connected to the
>> > HDMI output is working fine.
[...]
>>   b8afb9113c519a8bd742f7df8c424b0af69a75cd is the first bad commit
>>   commit b8afb9113c519a8bd742f7df8c424b0af69a75cd
>>   Author: Ville Syrjälä 
>>   Date:   Mon Jun 29 15:25:48 2015 +0300
>>
>>   drm/i915: Keep GMCH DPLL VGA mode always disabled
[...]
> @@ -1790,13 +1790,13 @@ static void i9xx_disable_pll(struct intel_crtc *crtc)
>  	/* Make sure the pipe isn't still relying on us */
>  	assert_pipe_disabled(dev_priv, pipe);
>
> -	I915_WRITE(DPLL(pipe), 0);
> +	I915_WRITE(DPLL(pipe), DPLL_VGA_MODE_DIS);
>  	POSTING_READ(DPLL(pipe));
>  }
>
>
> That hunk is the only relevant part for your machine. Can you try to revert
> just that manually?
>
> But I'm really surprised that would have any effect since we only used
> to enable "VGA mode" when the DPLL is off. And when the DPLL is off,
> there's nothing on the screen anyway.

Nevertheless, manually reverting just that hunk seems to fix it.

Thanks,
  Nick


Re: PROBLEM: Intel VGA output busticated on 4.3-rc2 (regression)

2015-10-06 Thread Nick Bowler
Hi,

This issue is still present in 4.3-rc4.

On 9/24/15, Nick Bowler  wrote:
> Testing out 4.3-rc2, first thing I notice is that the VGA output is
> not working.  Specifically, the display is continuously powering on
> and off -- at no point is any image visible on the screen (I am expecting
> to see the console output).  The display connected to the HDMI output is
> working fine.
>
> Linux 4.2 did not suffer from this problem.
>
> In dmesg I see the following messages, which I do not see on a working
> kernel.  Full dmesg from 4.3-rc2 is attached (gzipped).
>
>   [0.115339] [drm:drm_calc_timestamping_constants] *ERROR* crtc 21: Can't calculate constants, dotclock = 0!
>   [0.117582] [drm:intel_opregion_init] *ERROR* No ACPI video bus found
>
> This is an older machine with Intel G45 graphics.

I was able to identify the commit which fixed my boot crashes, so I
cherry-picked 80aa93128653 ("drm/i915: disable_shared_pll doesn't
work on pre-gen5") on top of all otherwise untestable commits.  This
allowed bisection to proceed:

  b8afb9113c519a8bd742f7df8c424b0af69a75cd is the first bad commit
  commit b8afb9113c519a8bd742f7df8c424b0af69a75cd
  Author: Ville Syrjälä 
  Date:   Mon Jun 29 15:25:48 2015 +0300

  drm/i915: Keep GMCH DPLL VGA mode always disabled

  We disable the DPLL VGA mode when enabling the DPLL, but we enable it
  again when disabling the DPLL. Having VGA mode enabled even in unused
  DPLLs can cause problems for CHV, so it seems wiser to always keep it
  disabled. And let's just do that on all GMCH platforms to keep things
  as similar as possible between them.

  Signed-off-by: Ville Syrjälä 
  Reviewed-by: Sivakumar Thulasimani 
  Signed-off-by: Daniel Vetter 

  :040000 040000 7797d596e73ecf75723375028decd25fbe332ee0 9f90a92eec483919853d68563bbb09a71a305532 M  drivers

Unfortunately it does not revert cleanly on master.

Regards,
  Nick


PROBLEM: Intel VGA output busticated on 4.3-rc2 (regression)

2015-09-24 Thread Nick Bowler
Hi,

Testing out 4.3-rc2, first thing I notice is that the VGA output is
not working.  Specifically, the display is continuously powering on
and off -- at no point is any image visible on the screen (I am expecting
to see the console output).  The display connected to the HDMI output is
working fine.

Linux 4.2 did not suffer from this problem.

In dmesg I see the following messages, which I do not see on a working
kernel.  Full dmesg from 4.3-rc2 is attached (gzipped).

  [0.115339] [drm:drm_calc_timestamping_constants] *ERROR* crtc 21: Can't calculate constants, dotclock = 0!
  [0.117582] [drm:intel_opregion_init] *ERROR* No ACPI video bus found

This is an older machine with Intel G45 graphics.

Unfortunately bisection is proving difficult, because the commits it
wants me to test have a different problem: neither display comes up at
all (both remain in standby).  Partial results follow.

Thanks,
  Nick

  git bisect start 'drivers/gpu/drm/i915'
  # bad: [1f93e4a96c9109378204c147b3eec0d0e8100fde] Linux 4.3-rc2
  git bisect bad 1f93e4a96c9109378204c147b3eec0d0e8100fde
  # good: [64291f7db5bd8150a74ad2036f1037e6a0428df2] Linux 4.2
  git bisect good 64291f7db5bd8150a74ad2036f1037e6a0428df2
  # skip: [a7a6c498927ea42c9a3b26e0caa5c854a980d58c] drm/i915: POSTING_READ() in intel_set_memory_cxsr()
  git bisect skip a7a6c498927ea42c9a3b26e0caa5c854a980d58c
  # skip: [031b698a77a70a6c394568034437b5486a44e868] drm/i915: Unconditionally do fb tracking invalidate in set_domain
  git bisect skip 031b698a77a70a6c394568034437b5486a44e868
  # skip: [adeca76d8e2b34b5c739a36f4191aed63080da40] drm/i915: Simplify i915_gem_execbuffer_retire_commands() parameters
  git bisect skip adeca76d8e2b34b5c739a36f4191aed63080da40
  # good: [369712e89404089fa559235bb1ee8fc40d976e6b] drm/i915: reduce duplicate conditions in i9xx_hpd_irq_handler
  git bisect good 369712e89404089fa559235bb1ee8fc40d976e6b
  # skip: [6eb1a6817246f1a67de4d6959a84d09efead5329] drm/i915: Read wm values from hardware at init on CHV
  git bisect skip 6eb1a6817246f1a67de4d6959a84d09efead5329
  # bad: [d14e7b6d1d8747826cb900db852351c550e00fdd] drm/i915: Check DP link status on long hpd too
  git bisect bad d14e7b6d1d8747826cb900db852351c550e00fdd
  # skip: [fe36f55d4d4447679923fc74564786ae423ca4bd] drm/i915/gtt: Cleanup page directory encoding
  git bisect skip fe36f55d4d4447679923fc74564786ae423ca4bd
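
Skipped commits narrow the search less than good/bad verdicts do, which is why the log above yields only partial results.  A minimal sketch of that effect in C, assuming a linear history with a single regression point; the names demo_bisect, demo_verdict and demo_range are made up for illustration, and this is not how git itself chooses which commit to test next:

```c
#include <assert.h>
#include <stddef.h>

/* Verdict for one commit in a linear history (hypothetical model). */
enum demo_verdict { DEMO_GOOD, DEMO_BAD, DEMO_UNTESTABLE };

/* Tightest known bracket around the regression, as array indices. */
struct demo_range {
	size_t last_good;
	size_t first_bad;
};

/*
 * commits[0] must be good and commits[n-1] must be bad.  Testable commits
 * in between are assumed good before the regression and bad from it onward.
 */
static struct demo_range demo_bisect(const enum demo_verdict *commits, size_t n)
{
	struct demo_range r;

	assert(n >= 2 && commits[0] == DEMO_GOOD && commits[n - 1] == DEMO_BAD);
	r.last_good = 0;
	r.first_bad = n - 1;

	while (r.first_bad - r.last_good > 1) {
		size_t span = r.first_bad - r.last_good;
		size_t mid = r.last_good + span / 2;
		size_t probe = mid;
		size_t step;
		int found = 0;

		/* Look outward from the midpoint for a testable commit
		 * strictly inside the current bracket. */
		for (step = 0; step < span; step++) {
			if (mid + step < r.first_bad &&
			    commits[mid + step] != DEMO_UNTESTABLE) {
				probe = mid + step;
				found = 1;
				break;
			}
			if (mid >= r.last_good + 1 + step &&
			    commits[mid - step] != DEMO_UNTESTABLE) {
				probe = mid - step;
				found = 1;
				break;
			}
		}
		if (!found)
			break;	/* only untestable commits remain inside */

		if (commits[probe] == DEMO_BAD)
			r.first_bad = probe;
		else
			r.last_good = probe;
	}
	return r;
}
```

With the verdicts {good, good, untestable, bad, bad} this sketch stops at last_good = 1 and first_bad = 3: the untestable commit in between cannot be ruled in or out, so the result is a range rather than a single first bad commit.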


i915-novga-dmesg.log.gz
Description: GNU Zip compressed data


Re: [v2.6.34-stable 004/213] crypto: ghash - Avoid null pointer dereference if no key is set

2014-02-05 Thread Nick Bowler
On 2014-02-05 14:59 -0500, Paul Gortmaker wrote:
> From: Nick Bowler 
> 
>---
> This is a commit scheduled for the next v2.6.34 longterm release.
> http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
> If you see a problem with using this for longterm, please comment.
>---
> 
> commit 7ed47b7d142ec99ad6880bbbec51e9f12b3af74c upstream.
> 
> The ghash_update function passes a pointer to gf128mul_4k_lle which will
> be NULL if ghash_setkey is not called or if the most recent call to
> ghash_setkey failed to allocate memory.  This causes an oops.  Fix this
> up by returning an error code in the null case.
> 
> This is trivially triggered from unprivileged userspace through the
> AF_ALG interface by simply writing to the socket without setting a key.

After all this time, I see this patch still manages to find its way,
occasionally, into the patch queue for older -stable. :)

It should be harmless to apply, but this patch doesn't actually fix
any real problem on kernels previous to 2.6.38 because the AF_ALG
userspace interface does not exist in these kernels.
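
For reference, the guard described in the quoted commit message can be sketched in user-space C roughly as follows.  This is an illustration only: the demo_* types and functions are hypothetical stand-ins rather than the kernel's ghash code, and the choice of -ENOKEY as the error code is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifndef ENOKEY
#define ENOKEY 126	/* Linux value, used only if the libc lacks it */
#endif

/* Hypothetical stand-in for the precomputed key table. */
struct demo_key_table {
	unsigned char h[16];
};

/* Hypothetical stand-in for the transform context. */
struct demo_ghash_ctx {
	struct demo_key_table *table;	/* stays NULL until setkey succeeds */
};

/* Install a 16-byte key into caller-provided storage. */
static int demo_ghash_setkey(struct demo_ghash_ctx *ctx,
			     struct demo_key_table *storage,
			     const unsigned char *key, size_t keylen)
{
	size_t i;

	assert(ctx != NULL && storage != NULL && key != NULL);
	if (keylen != sizeof(storage->h))
		return -EINVAL;
	for (i = 0; i < keylen; i++)
		storage->h[i] = key[i];
	ctx->table = storage;
	return 0;
}

/* Refuse to process data when no key has been set, instead of
 * dereferencing a NULL table pointer. */
static int demo_ghash_update(struct demo_ghash_ctx *ctx,
			     const unsigned char *src, size_t len)
{
	assert(ctx != NULL);
	if (!ctx->table)
		return -ENOKEY;
	/* The real transform would consume src here; omitted in this sketch. */
	(void)src;
	(void)len;
	return 0;
}
```

Calling demo_ghash_update() on a freshly zeroed context returns the error; after a successful demo_ghash_setkey() the same call returns 0.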

Cheers,
-- 
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)


PROBLEM: NFS crash in Linux 3.10.2

2013-07-22 Thread Nick Bowler
Hi folks,

I tried booting 3.10.2 today, and hit the following NFS crash a few
seconds after logging in.  On a subsequent boot, I was not able to
crash the kernel again after several minutes of usage.  This machine
has user home directories NFS-mounted.

I did not have any crashes with 3.9, so this may be a regression.  But
since I was not able to reliably reproduce the issue, it would be hard
to bisect.

Full log attached (gzipped).

  [   64.217241] BUG: unable to handle kernel NULL pointer dereference at 0008
  [   64.217330] IP: [] nlmclnt_setlockargs+0x50/0xca [lockd]
  [   64.217403] PGD 0 
  [   64.217416] Oops:  [#1] PREEMPT SMP 
  [   64.217454] Modules linked in: nfsv3 nfs_acl nfs bridge stp llc it87 hwmon_vid coretemp hwmon autofs4 nfsd exportfs lockd sunrpc ipv6 iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_timer snd soundcore usb_storage sr_mod cdrom loop tun acpi_cpufreq mperf arc4 rt2800pci eeprom_93cx6 rt2x00pci rt2800lib crc_ccitt rt2x00mmio rt2x00lib mac80211 cfg80211 e1000e ptp pps_core
  [   64.218124] CPU: 0 PID: 2803 Comm: zsh Not tainted 3.10.2 #318
  [   64.218124] Hardware name: Acer Aspire X3810/WG43M, BIOS P01-A0 04/03/2009
  [   64.218124] task: 880133b8ad40 ti: 88012f4bc000 task.ti: 88012f4bc000
  [   64.218124] RIP: 0010:[]  [] nlmclnt_setlockargs+0x50/0xca [lockd]
  [   64.218124] RSP: 0018:88012f4bdc48  EFLAGS: 00010286
  [   64.218124] RAX: 880133b8ad40 RBX: 88012f695800 RCX: 
  [   64.218124] RDX:  RSI: 004a RDI: 88012f695b54
  [   64.218124] RBP: 88012f4bdc58 R08: 88012f695800 R09: 7fff
  [   64.218124] R10: 88013a903b10 R11: 88013a903b00 R12: 88012f4bdd58
  [   64.218124] R13: 8801302df9c8 R14: 8801302df800 R15: 0007
  [   64.218124] FS:  () GS:88013fc0() knlGS:
  [   64.218124] CS:  0010 DS:  ES:  CR0: 80050033
  [   64.218124] CR2: 0008 CR3: 0160b000 CR4: 000407f0
  [   64.218124] DR0:  DR1:  DR2: 
  [   64.218124] DR3:  DR6: 0ff0 DR7: 0400
  [   64.218124] Stack:
  [   64.218124]  88012f4bdd58 88012f695800 88012f4bdcd8 a02de094
  [   64.218124]  88012f4bdc88  88012f4cb400 810b9e04
  [   64.218124]  88013fc14460 000368a8 8801302df9b8 88013a903b00
  [   64.218124] Call Trace:
  [   64.218124]  [] nlmclnt_proc+0x1e6/0x5f5 [lockd]
  [   64.218124]  [] ? kfree+0x8d/0xf0
  [   64.218124]  [] nfs3_proc_lock+0x1c/0x1e [nfsv3]
  [   64.218124]  [] do_unlk+0x88/0xa4 [nfs]
  [   64.218124]  [] nfs_flock+0x61/0x6a [nfs]
  [   64.218124]  [] locks_remove_flock+0x99/0x10e
  [   64.218124]  [] __fput+0xb4/0x1d4
  [   64.218124]  [] fput+0x9/0xb
  [   64.218124]  [] task_work_run+0x7e/0x94
  [   64.218124]  [] do_exit+0x38b/0x8a2
  [   64.218124]  [] ? __set_task_blocked+0x61/0x68
  [   64.218124]  [] ? fput+0x13/0xbf
  [   64.218124]  [] do_group_exit+0x71/0x99
  [   64.218124]  [] SyS_exit_group+0x12/0x12
  [   64.218124]  [] system_call_fastpath+0x16/0x1b
  [   64.218124] Code: 00 00 65 48 8b 04 25 40 b8 00 00 48 8b 72 20 48 81 ee 70 01 00 00 f3 a4 48 8d bb 54 03 00 00 be 4a 00 00 00 48 8b 90 50 05 00 00 <48> 8b 52 08 48 89 bb d0 00 00 00 48 83 c2 45 48 89 53 38 48 8b 
  [   64.218124] RIP  [] nlmclnt_setlockargs+0x50/0xca [lockd]
  [   64.218124]  RSP 
  [   64.218124] CR2: 0008
  [   64.236645] ---[ end trace 2fe8ddfc44039798 ]---
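
A small fault address such as the CR2 value in the log above is the usual signature of reading a structure member through a NULL pointer: the address is just the member's offset.  The marked bytes in the Code: line (<48> 8b 52 08) decode on x86-64 to a load from 8(%rdx), and RDX shows up blank in this copy of the log, which would be consistent with it having been zero.  A minimal C sketch of the arithmetic, with a hypothetical structure layout (demo_namespaces is not a kernel type):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: two pointer-sized members, so the second one
 * sits at offset sizeof(void *). */
struct demo_namespaces {
	void *first;
	void *second;
};

/* Address that would be read for base->member; nothing is dereferenced. */
static uintptr_t demo_access_address(uintptr_t base, size_t member_offset)
{
	return base + member_offset;
}

/* Nonzero when a fault address is small enough to be NULL plus an
 * offset within a structure of the given size. */
static int demo_looks_like_null_member(uintptr_t fault_addr, size_t struct_size)
{
	assert(struct_size > 0);
	return fault_addr < struct_size;
}
```

On a 64-bit build, demo_access_address(0, offsetof(struct demo_namespaces, second)) evaluates to 8, the same shape as the fault address reported in the oops.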

Thanks,
-- 
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)


crash.log.gz
Description: Binary data


Re: Black screen with nouveau in 3.8.x (regression)

2013-03-25 Thread Nick Bowler
Ping?

On 2013-03-07 10:06 -0500, Nick Bowler wrote:
> Yesterday I upgraded one of my machines to 3.8.2 from 3.6.6.  This
> machine has an old NV36 AGP board.  With the new kernel, as soon as
> nouveau takes over the console the display connected via DVI goes dark
> (the monitor goes into standby mode).  The display connected via VGA
> continues to work fine.
> 
> Starting Xorg does not correct the problem.  Nouveau seems to know
> that the display is connected:
> 
>   % cat /sys/class/drm/card0-DVI-I-1/status
>   connected
> 
> I don't see anything unusual in the log either (full log attached):
> 
>   % dmesg -t | grep -iE 'drm|nouveau'
>   [drm] Initialized drm 1.1.0 20060810
>   nouveau  [  DEVICE][:01:00.0] BOOT0  : 0x436200a1
>   nouveau  [  DEVICE][:01:00.0] Chipset: NV36 (NV36)
>   nouveau  [  DEVICE][:01:00.0] Family : NV30
>   nouveau  [   VBIOS][:01:00.0] checking PRAMIN for image...
>   nouveau  [   VBIOS][:01:00.0] ... appears to be valid
>   nouveau  [   VBIOS][:01:00.0] using image from PRAMIN
>   nouveau  [   VBIOS][:01:00.0] BMP version 5.28
>   nouveau  [   VBIOS][:01:00.0] version 04.36.20.21
>   nouveau W[  PTIMER][:01:00.0] unknown input clock freq
>   nouveau  [ PFB][:01:00.0] RAM type: DDR1
>   nouveau  [ PFB][:01:00.0] RAM size: 256 MiB
>   nouveau :01:00.0: putting AGP V3 device into 8x mode
>   nouveau  [ DRM] VRAM: 255 MiB
>   nouveau  [ DRM] GART: 64 MiB
>   nouveau  [ DRM] BMP BIOS found
>   nouveau  [ DRM] BMP version 5.40
>   nouveau  [ DRM] Bios version 04.36.20.21
>   nouveau  [ DRM] DCB version 2.2
>   nouveau  [ DRM] DCB outp 00: 01000300 9c40
>   nouveau  [ DRM] DCB outp 01: 02010310 9c40
>   nouveau  [ DRM] DCB outp 02: 04000302 
>   nouveau  [ DRM] DCB outp 03: 02020321 0303
>   nouveau  [ DRM] Loading NV17 power sequencing microcode
>   nouveau  [ DRM] Saving VGA fonts
>   [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
>   [drm] No driver support for vblank timestamp query.
>   nouveau  [ DRM] 0xE51A: Parsing digital output script table
>   nouveau  [ DRM] 0 available performance level(s)
>   nouveau  [ DRM] c: core 425MHz memory 501MHz voltage 1350mV
>   nouveau  [ DRM] MM: using M2MF for buffer copies
>   nouveau  [ DRM] Setting dpms mode 3 on vga encoder (output 0)
>   nouveau  [ DRM] Setting dpms mode 3 on vga encoder (output 1)
>   nouveau  [ DRM] Setting dpms mode 3 on tmds encoder (output 2)
>   nouveau  [ DRM] Setting dpms mode 3 on TV encoder (output 3)
>   nouveau  [ DRM] allocated 1280x1024 fb: 0x9000, bo 88007b6ae000
>   fbcon: nouveaufb (fb0) is primary device
>   nouveau  [ DRM] 0xE51A: Parsing digital output script table
>   nouveau  [ DRM] Setting dpms mode 0 on tmds encoder (output 2)
>   nouveau  [ DRM] Output DVI-I-1 is running on CRTC 0 using output C
>   nouveau  [ DRM] Setting dpms mode 0 on vga encoder (output 1)
>   nouveau  [ DRM] Output VGA-1 is running on CRTC 1 using output B
>   fb0: nouveaufb frame buffer device
>   drm: registered panic notifier
>   [drm] Initialized nouveau 1.1.0 20120801 for :01:00.0 on minor 0
> 
> I started a bisection... here's the first steps so far.  I will try to
> finish the procedure over the next couple days but I'm reporting this
> now in case someone needs me to get some other info.
[...]

On 2013-03-08 09:28 -0500, Nick Bowler wrote:
> I carried this on a bit further, but it seems that most of the remaining
> commits bisect wants to test do not compile, so there is a huge number
> of skipped commits.  Not exactly a lot of fun...
> 
> git bisect start 'drivers/gpu/drm'
> # good: [3820288942d1c1524c3ee85cbf503fee1533cfc3] Linux 3.6.6
> git bisect good 3820288942d1c1524c3ee85cbf503fee1533cfc3
> # bad: [19b00d2dc9bedf0856e366cb7b9c7733ded659e4] Linux 3.8.2
> git bisect bad 19b00d2dc9bedf0856e366cb7b9c7733ded659e4
> # good: [a0d271cbfed1dd50278c6b06bead3d00ba0a88f9] Linux 3.6
> git bisect good a0d271cbfed1dd50278c6b06bead3d00ba0a88f9
> # bad: [daed2dbb7ea4d179e472396ce46377fe758d5faf] drm/i915: use the CPU and PCH transcoders on lpt_pch_enable
> git bisect bad daed2dbb7ea4d179e472396ce46377fe758d5faf
> # good: [df86b5765a48d5f557489577652bd6df145b0e1b] drm/savage: re-add busmaster enable, regression fix
> git bisect good df86b5765a48d5f557489577652bd6df145b0e1b
> # bad: [39df01cd6ce9f6dd755ace0030e2bebe75da7727] Merge branch 'drm-fixes-3.7' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
> git bisect bad 39df01cd6ce9f6dd755ace0030e2bebe75da7727
> # bad: [b9f10852fcb1f09369d931dcbfbaad

Re: Black screen with nouveau in 3.8.x (regression)

2013-03-08 Thread Nick Bowler
On 2013-03-07 10:06 -0500, Nick Bowler wrote:
> I started a bisection... here's the first steps so far.  I will try to
> finish the procedure over the next couple days but I'm reporting this
> now in case someone needs me to get some other info.

I carried this on a bit further, but it seems that most of the remaining
commits bisect wants to test do not compile, so there is a huge number
of skipped commits.  Not exactly a lot of fun...

git bisect start 'drivers/gpu/drm'
# good: [3820288942d1c1524c3ee85cbf503fee1533cfc3] Linux 3.6.6
git bisect good 3820288942d1c1524c3ee85cbf503fee1533cfc3
# bad: [19b00d2dc9bedf0856e366cb7b9c7733ded659e4] Linux 3.8.2
git bisect bad 19b00d2dc9bedf0856e366cb7b9c7733ded659e4
# good: [a0d271cbfed1dd50278c6b06bead3d00ba0a88f9] Linux 3.6
git bisect good a0d271cbfed1dd50278c6b06bead3d00ba0a88f9
# bad: [daed2dbb7ea4d179e472396ce46377fe758d5faf] drm/i915: use the CPU and PCH transcoders on lpt_pch_enable
git bisect bad daed2dbb7ea4d179e472396ce46377fe758d5faf
# good: [df86b5765a48d5f557489577652bd6df145b0e1b] drm/savage: re-add busmaster enable, regression fix
git bisect good df86b5765a48d5f557489577652bd6df145b0e1b
# bad: [39df01cd6ce9f6dd755ace0030e2bebe75da7727] Merge branch 'drm-fixes-3.7' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
git bisect bad 39df01cd6ce9f6dd755ace0030e2bebe75da7727
# bad: [b9f10852fcb1f09369d931dcbfbaad89ad1da4ad] drm/nv98/crypt: fix fuc build with latest envyas
git bisect bad b9f10852fcb1f09369d931dcbfbaad89ad1da4ad
# bad: [b10f20d590aa040e4028c04a70a27b9ad6650ba8] drm/nvc0-/gr: remove reset-after-grctx-construction hack
git bisect bad b10f20d590aa040e4028c04a70a27b9ad6650ba8
# skip: [18c9b959fd8ea6f3602efbedad788f53e305e6f1] drm/nouveau/gpuobj: create wrapper functions for mapping gpuobj into vm/bar
git bisect skip 18c9b959fd8ea6f3602efbedad788f53e305e6f1
# skip: [092599da308bf56b96c849ecdd315b8a1a13ca52] drm/nv50/instmem: remove use of nouveau_gpuobj_new_fake()
git bisect skip 092599da308bf56b96c849ecdd315b8a1a13ca52
# skip: [4196faa8623264b79279a06fd186654c959f2767] drm/nouveau/i2c: port to subdev interfaces
git bisect skip 4196faa8623264b79279a06fd186654c959f2767
# skip: [9da226f698c01b268b9172050df4150f269a7613] drm/nvc0/fifo: handle bar1 control regs much like fifo/nve0
git bisect skip 9da226f698c01b268b9172050df4150f269a7613
# skip: [8aceb7de47ea2491abc1a577dc875b19e9947a54] drm/nouveau/clk: implement stub clock subdev
git bisect skip 8aceb7de47ea2491abc1a577dc875b19e9947a54
# skip: [70ee6f1cd6911098ddd4c11ee21b69dbe51fb3f9] drm/nv04-nv40/fifo: remove use of nouveau_gpuobj_new_fake()
git bisect skip 70ee6f1cd6911098ddd4c11ee21b69dbe51fb3f9
# skip: [f589be88caf32501a734e531180d5df5d6089ef3] drm/nouveau/pageflip: kick flip handling out of engsw and into fence
git bisect skip f589be88caf32501a734e531180d5df5d6089ef3
# skip: [73a60c0d218a292f8ef29d3467726ff26ed366fc] drm/nouveau/gpuobj: remove flags for vm-mappings
git bisect skip 73a60c0d218a292f8ef29d3467726ff26ed366fc
# skip: [70790f4f819875e8f390871fd15bbbf823f28e1b] drm/nouveau/clock: pull in the implementation from all over the place
git bisect skip 70790f4f819875e8f390871fd15bbbf823f28e1b
# skip: [5787640db6ae722aeadb394d480c7ca21b603e34] drm/nv04-nv40/instmem: remove use of nouveau_gpuobj_new_fake()
git bisect skip 5787640db6ae722aeadb394d480c7ca21b603e34
# skip: [cb75d97e9c77743ecfcc43375be135a55a4d9b25] drm/nouveau: implement devinit subdev, and new init table parser
git bisect skip cb75d97e9c77743ecfcc43375be135a55a4d9b25
# skip: [8a9b889e668a5bc2f4031015fe4893005c43403d] drm/nouveau: remove last use of nouveau_gpuobj_new_fake()
git bisect skip 8a9b889e668a5bc2f4031015fe4893005c43403d
# skip: [a73c5c526a8a39b2e61709c753d44be597c9a4c0] drm/nvc0-nve0/graph: rename dev to priv, no code changes
git bisect skip a73c5c526a8a39b2e61709c753d44be597c9a4c0
# good: [d6ba6d215a538a58f0f0026f0961b0b9125e8042] drm/nvc0/fence: restore pre-suspend fence buffer context on resume
git bisect good d6ba6d215a538a58f0f0026f0961b0b9125e8042
# skip: [7d9115dee978e8540734c456c925d71a37752b8d] drm/nouveau/mc: port to subdev interfaces
git bisect skip 7d9115dee978e8540734c456c925d71a37752b8d
# good: [3a92d37e4099054fe187b485a9d27c439c10eca7] drm/nouveau/gem: use bo.offset rather than mm_node.start
git bisect good 3a92d37e4099054fe187b485a9d27c439c10eca7
# skip: [5a5c7432bbbd2e318dff107b4ff960ab543a7cef] drm/nouveau/timer: port to subdev interfaces
git bisect skip 5a5c7432bbbd2e318dff107b4ff960ab543a7cef
# bad: [77145f1cbdf8d28b46ff8070ca749bad821e0774] drm/nouveau: port remainder of drm code, and rip out compat layer
git bisect bad 77145f1cbdf8d28b46ff8070ca749bad821e0774
# skip: [51a3d3425663698a79e8a9d01998a8a32ddee13b] drm/nouveau/backlight: remove dependence on nouveau_drv.h
git bisect skip 51a3d3425663698a79e8a9d01998a8a32ddee13b

Cheers,
-- 
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)

Black screen with nouveau in 3.8.x (regression)

2013-03-07 Thread Nick Bowler
Hi folks,

Yesterday I upgraded one of my machines to 3.8.2 from 3.6.6.  This
machine has an old NV36 AGP board.  With the new kernel, as soon as
nouveau takes over the console the display connected via DVI goes dark
(the monitor goes into standby mode).  The display connected via VGA
continues to work fine.

Starting Xorg does not correct the problem.  Nouveau seems to know
that the display is connected:

  % cat /sys/class/drm/card0-DVI-I-1/status
  connected

I don't see anything unusual in the log either (full log attached):

  % dmesg -t | grep -iE 'drm|nouveau'
  [drm] Initialized drm 1.1.0 20060810
  nouveau  [  DEVICE][:01:00.0] BOOT0  : 0x436200a1
  nouveau  [  DEVICE][:01:00.0] Chipset: NV36 (NV36)
  nouveau  [  DEVICE][:01:00.0] Family : NV30
  nouveau  [   VBIOS][:01:00.0] checking PRAMIN for image...
  nouveau  [   VBIOS][:01:00.0] ... appears to be valid
  nouveau  [   VBIOS][:01:00.0] using image from PRAMIN
  nouveau  [   VBIOS][:01:00.0] BMP version 5.28
  nouveau  [   VBIOS][:01:00.0] version 04.36.20.21
  nouveau W[  PTIMER][:01:00.0] unknown input clock freq
  nouveau  [ PFB][:01:00.0] RAM type: DDR1
  nouveau  [ PFB][:01:00.0] RAM size: 256 MiB
  nouveau :01:00.0: putting AGP V3 device into 8x mode
  nouveau  [ DRM] VRAM: 255 MiB
  nouveau  [ DRM] GART: 64 MiB
  nouveau  [ DRM] BMP BIOS found
  nouveau  [ DRM] BMP version 5.40
  nouveau  [ DRM] Bios version 04.36.20.21
  nouveau  [ DRM] DCB version 2.2
  nouveau  [ DRM] DCB outp 00: 01000300 9c40
  nouveau  [ DRM] DCB outp 01: 02010310 9c40
  nouveau  [ DRM] DCB outp 02: 04000302 
  nouveau  [ DRM] DCB outp 03: 02020321 0303
  nouveau  [ DRM] Loading NV17 power sequencing microcode
  nouveau  [ DRM] Saving VGA fonts
  [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
  [drm] No driver support for vblank timestamp query.
  nouveau  [ DRM] 0xE51A: Parsing digital output script table
  nouveau  [ DRM] 0 available performance level(s)
  nouveau  [ DRM] c: core 425MHz memory 501MHz voltage 1350mV
  nouveau  [ DRM] MM: using M2MF for buffer copies
  nouveau  [ DRM] Setting dpms mode 3 on vga encoder (output 0)
  nouveau  [ DRM] Setting dpms mode 3 on vga encoder (output 1)
  nouveau  [ DRM] Setting dpms mode 3 on tmds encoder (output 2)
  nouveau  [ DRM] Setting dpms mode 3 on TV encoder (output 3)
  nouveau  [ DRM] allocated 1280x1024 fb: 0x9000, bo 88007b6ae000
  fbcon: nouveaufb (fb0) is primary device
  nouveau  [ DRM] 0xE51A: Parsing digital output script table
  nouveau  [ DRM] Setting dpms mode 0 on tmds encoder (output 2)
  nouveau  [ DRM] Output DVI-I-1 is running on CRTC 0 using output C
  nouveau  [ DRM] Setting dpms mode 0 on vga encoder (output 1)
  nouveau  [ DRM] Output VGA-1 is running on CRTC 1 using output B
  fb0: nouveaufb frame buffer device
  drm: registered panic notifier
  [drm] Initialized nouveau 1.1.0 20120801 for :01:00.0 on minor 0

I started a bisection... here's the first steps so far.  I will try to
finish the procedure over the next couple days but I'm reporting this
now in case someone needs me to get some other info.

  git bisect start 'drivers/gpu/drm'
  # good: [3820288942d1c1524c3ee85cbf503fee1533cfc3] Linux 3.6.6
  git bisect good 3820288942d1c1524c3ee85cbf503fee1533cfc3
  # bad: [19b00d2dc9bedf0856e366cb7b9c7733ded659e4] Linux 3.8.2
  git bisect bad 19b00d2dc9bedf0856e366cb7b9c7733ded659e4
  # good: [a0d271cbfed1dd50278c6b06bead3d00ba0a88f9] Linux 3.6
  git bisect good a0d271cbfed1dd50278c6b06bead3d00ba0a88f9
  # bad: [daed2dbb7ea4d179e472396ce46377fe758d5faf] drm/i915: use the CPU and PCH transcoders on lpt_pch_enable
  git bisect bad daed2dbb7ea4d179e472396ce46377fe758d5faf
  # good: [df86b5765a48d5f557489577652bd6df145b0e1b] drm/savage: re-add busmaster enable, regression fix
  git bisect good df86b5765a48d5f557489577652bd6df145b0e1b
  # bad: [39df01cd6ce9f6dd755ace0030e2bebe75da7727] Merge branch 'drm-fixes-3.7' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
  git bisect bad 39df01cd6ce9f6dd755ace0030e2bebe75da7727

Thanks,
-- 
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)


nouveau-black-lcd.log.xz
Description: application/xz


Re: [v2.6.34-stable 71/77] crypto: ghash - Avoid null pointer dereference if no key is set

2013-01-08 Thread Nick Bowler
On 2013-01-08 18:35 -0500, Paul Gortmaker wrote:
> From: Nick Bowler 
> 
>---
> This is a commit scheduled for the next v2.6.34 longterm release.
> http://git.kernel.org/?p=linux/kernel/git/paulg/longterm-queue-2.6.34.git
> If you see a problem with using this for longterm, please comment.
>---
> 
> commit 7ed47b7d142ec99ad6880bbbec51e9f12b3af74c upstream.
> 
> The ghash_update function passes a pointer to gf128mul_4k_lle which will
> be NULL if ghash_setkey is not called or if the most recent call to
> ghash_setkey failed to allocate memory.  This causes an oops.  Fix this
> up by returning an error code in the null case.
> 
> This is trivially triggered from unprivileged userspace through the
> AF_ALG interface by simply writing to the socket without setting a key.

I haven't been following 2.6.34-longterm development, but unless
you've also backported the AF_ALG userspace interface from 2.6.38,
this sequence can only be triggered by kernel code.  So while this
patch shouldn't break anything, it isn't really necessary.

Cheers,
-- 
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)


Re: NFSv4 WARNING: at linux/fs/inode.c:280 drop_nlink+0x23/0x44()

2012-12-13 Thread Nick Bowler
On 2012-08-18 13:57 -0400, Nick Bowler wrote:
> I just noticed the following WARNING in my logs.  Looking through the
> older logs, I see quite a few of these going (at least) all the way back
> to 3.3.x days.  The process which triggers the warning always seems to
> be the same (icecat).
> 
> This is on a Linux 3.5.2 NFSv4 client machine using sec=krb5.  Other
> than the log noise, there seem to be no adverse effects.

FWIW, I'm still seeing this on 3.7.0; this time on a machine with NFSv3
mounts and sec=sys.  Still apparently caused by firefox/icecat.

> [ cut here ]
> WARNING: at /home/nbowler/misc/linux/fs/inode.c:280 drop_nlink+0x23/0x44()
> Hardware name: System Product Name
> Modules linked in: sha1_ssse3 sha1_generic hmac aes_x86_64 aes_generic cbc cts rpcsec_gss_krb5 nfs lockd auth_rpcgss nfs_acl sunrpc ipv6 nls_iso8859_1 nls_cp437 vfat fat w83627ehf hwmon_vid snd_pcm_oss snd_mixer_oss acpi_cpufreq mperf i915 drm_kms_helper drm snd_hda_codec_hdmi snd_hda_codec_realtek arc4 ath9k mac80211 ath9k_common ath9k_hw intel_agp i2c_algo_bit intel_gtt snd_hda_intel snd_hda_codec snd_pcm snd_timer coretemp agpgart hwmon kvm_intel snd ath cfg80211 soundcore snd_page_alloc i2c_i801 r8169 mii psmouse kvm evdev video sg
> Pid: 1995, comm: icecat Not tainted 3.5.2 #46
> Call Trace:
>  [] warn_slowpath_common+0x80/0x98
>  [] warn_slowpath_null+0x15/0x17
>  [] drop_nlink+0x23/0x44
>  [] nfs_dentry_iput+0x35/0x4d [nfs]
>  [] dentry_kill+0x149/0x171
>  [] dput+0xed/0xfe
>  [] fput+0x1a5/0x1bd
>  [] filp_close+0x6b/0x76
>  [] sys_close+0x92/0xd4
>  [] system_call_fastpath+0x16/0x1b
> ---[ end trace b160c7dc08b4910c ]---
> 
> Please let me know if you need any more info,
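
The trace above ends in drop_nlink(), which suggests the warning fires when a link count that is already zero gets decremented again.  A minimal user-space sketch of that kind of check, under that assumption; demo_inode and demo_drop_nlink are hypothetical names, and leaving the count at zero instead of letting it wrap is a choice made for this sketch only:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for an inode with a link count. */
struct demo_inode {
	unsigned int nlink;
	unsigned int warnings;	/* how many times the check has fired */
};

/* Decrement the link count, warning instead of underflowing when the
 * count is already zero. */
static void demo_drop_nlink(struct demo_inode *inode)
{
	assert(inode != NULL);
	if (inode->nlink == 0) {
		inode->warnings++;
		fprintf(stderr, "warning: nlink already 0 in demo_drop_nlink\n");
		return;
	}
	inode->nlink--;
}
```

Dropping the last link of a one-link inode is silent; a second drop on the same inode triggers the warning, which matches the "log noise but no other adverse effects" pattern described above.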
-- 
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)


Re: [PATCH] scatterlist: don't BUG when we can trivially return a proper error.

2012-11-14 Thread Nick Bowler
On 2012-11-14 13:05 -0800, Andrew Morton wrote:
> On Thu,  1 Nov 2012 15:03:00 -0400
> Nick Bowler  wrote:
> 
> > There is absolutely no reason to crash the kernel when we have a
> > perfectly good return value already available to use for conveying
> > failure status.
> 
> Yes, I suppose that's true.  I don't see a case for BUGging the kernel
> here.
[...]
> > -   BUG_ON(nents > max_ents);
> > +   if (WARN_ON_ONCE(nents > max_ents))
> > +           return -E2BIG;
> >  #endif
> 
> OK, pet peeve: if this E2BIG gets returned to userspace, our poor user
> will look it up and see "Argument list too long; used when the
> arguments passed to a new program being executed with one of the exec
> functions occupy too much memory space".  He then gets to spend half a
> day reviewing his code's exec() callsites!
> 
> See?  Although the error's name sounds like a nice match to the
> internal state, it isn't really a match at all and our use of it is
> misleading.
> 
> Unfortunately there is no EKERNELSCREWEDUP,

Well, maybe we should add it! :P

> so we usually use EINVAL.

Fair enough.  I will prepare v2.  But perhaps EOPNOTSUPP would be a
better fit?
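
To make the pet peeve above concrete, here is a minimal userspace sketch
(not kernel code) of reporting a failure through the return value and of
what a user sees when that value is decoded.  The names check_nents() and
describe_error() are hypothetical, and the use of EINVAL is an assumption
that simply follows the suggestion quoted above.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical stand-in for the bounds check in __sg_alloc_table():
 * report the problem through the return value instead of crashing.
 */
int check_nents(unsigned int nents, unsigned int max_ents)
{
	if (nents > max_ents)
		return -EINVAL;	/* assumption: EINVAL, as suggested in the thread */
	return 0;
}

/* Text a user would get when looking up a negative, kernel-style error code. */
const char *describe_error(int ret)
{
	return strerror(-ret);
}
```

With a typical libc, describe_error(-E2BIG) yields the "Argument list too
long" text that sends the user off to review exec() callsites, while
describe_error(check_nents(5, 4)) describes an invalid argument instead.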

Thanks,
-- 
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)


Re: Does anyone use CONFIG_TINY_PREEMPT_RCU?

2012-11-13 Thread Nick Bowler
On 2012-11-13 14:25 -0800, Paul E. McKenney wrote:
> On Tue, Nov 13, 2012 at 04:47:20PM -0500, Nick Bowler wrote:
> > On 2012-11-13 13:19 -0800, Paul E. McKenney wrote:
> > > On Tue, Nov 13, 2012 at 12:56:54PM -0500, Nick Bowler wrote:
> > > > On 2012-11-13 09:08 -0800, Paul E. McKenney wrote:
> > > > > Suppose that TREE_PREEMPT_RCU was available for !SMP && PREEMPT builds.
> > > > > Would that work for you?
> > > > 
> > > > To be honest I don't really know what the difference is, other than what
> > > > the help text says, which is:
> > > > 
> > > >   [TINY_PREEMPT_RCU] greatly reduces the memory footprint of RCU.
> > > >   
> > > > "Greatly reduced memory footprint" sounds pretty useful...
> > > 
> > > OK, so from your viewpoint, the only possible benefit is smaller
> > > memory?
> > 
> > Well, I have no idea.  If I was given the choice between TREE_PREEMPT_RCU
> > and TINY_PREEMPT_RCU, absent any information not in the description of
> > these options, I would choose TINY.  The description suggests that the
> > memory savings come at the expense of SMP support, which sounds like a
> > great tradeoff to make for a UP system.
> > 
> > > How much memory does your device have, if I may ask?
> > 
> > It's a (pretty old!) desktop.  I recently had to upgrade it to two
> > gigabytes due to unbearable thrashing with only one...
> 
> If you have two gigabytes (or even one gigabyte), you won't notice the
> few kilobytes of difference between TINY_PREEMPT_RCU and TREE_PREEMPT_RCU.

Well then TINY_PREEMPT_RCU doesn't sound all that useful for me!
Perhaps the help text could be improved... such as changing the words
"greatly reduced" to "marginally reduced" as a first step?

Is there no significant cache impact due to the larger implementation?
I don't really have the time or expertise to do measurements in this
regard, but if TREE_PREEMPT_RCU was actually a selectable option I could
at least choose it to see if anything explodes horribly...

Cheers,
-- 
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)


Re: Does anyone use CONFIG_TINY_PREEMPT_RCU?

2012-11-13 Thread Nick Bowler
On 2012-11-13 13:19 -0800, Paul E. McKenney wrote:
> On Tue, Nov 13, 2012 at 12:56:54PM -0500, Nick Bowler wrote:
> > On 2012-11-13 09:08 -0800, Paul E. McKenney wrote:
> > > Suppose that TREE_PREEMPT_RCU was available for !SMP && PREEMPT builds.
> > > Would that work for you?
> > 
> > To be honest I don't really know what the difference is, other than what
> > the help text says, which is:
> > 
> >   [TINY_PREEMPT_RCU] greatly reduces the memory footprint of RCU.
> >   
> > "Greatly reduced memory footprint" sounds pretty useful...
> 
> OK, so from your viewpoint, the only possible benefit is smaller
> memory?

Well, I have no idea.  If I was given the choice between TREE_PREEMPT_RCU
and TINY_PREEMPT_RCU, absent any information not in the description of
these options, I would choose TINY.  The description suggests that the
memory savings come at the expense of SMP support, which sounds like a
great tradeoff to make for a UP system.

> How much memory does your device have, if I may ask?

It's a (pretty old!) desktop.  I recently had to upgrade it to two
gigabytes due to unbearable thrashing with only one...

Cheers,
-- 
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)


Re: Does anyone use CONFIG_TINY_PREEMPT_RCU?

2012-11-13 Thread Nick Bowler
On 2012-11-13 09:08 -0800, Paul E. McKenney wrote:
> On Tue, Nov 13, 2012 at 09:46:20AM -0500, Nick Bowler wrote:
> > On 2012-11-12 16:49 -0800, Paul E. McKenney wrote:
> > > Hello!
> > > 
> > > I know of people using TINY_RCU, TREE_RCU, and TREE_PREEMPT_RCU, but I
> > > have not heard of anyone using TINY_PREEMPT_RCU for whom TREE_PREEMPT_RCU
> > > was not a viable option (in contrast, the people running Linux on
> > > tiny-memory systems typically use TINY_RCU).  Of course, if no one
> > > really needs it, the proper thing to do is to remove it.
> > > 
> > > So, if you need TINY_PREEMPT_RCU, please let me know.  Otherwise, I will
> > > remove it, probably in the 3.9 timeframe.
> > 
> > Yes, I use TINY_PREEMPT_RCU on my UP machines.  It is, in fact, the only
> > option.
> 
> Suppose that TREE_PREEMPT_RCU was available for !SMP && PREEMPT builds.
> Would that work for you?

To be honest I don't really know what the difference is, other than what
the help text says, which is:

  [TINY_PREEMPT_RCU] greatly reduces the memory footprint of RCU.
  
"Greatly reduced memory footprint" sounds pretty useful...

As a side note, I wonder why any of these RCU implementations are
user-selectable options in the first place?  It looks like you will
only ever have one choice, since the dependencies all seem mutually
exclusive:

  TREE_RCU depends on !PREEMPT &&  SMP
  TREE_PREEMPT_RCU depends on  PREEMPT &&  SMP
  TINY_RCU depends on !PREEMPT && !SMP
  TINY_PREEMPT_RCU depends on  PREEMPT && !SMP
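
The mutual exclusivity can be checked mechanically.  The following is a
minimal sketch that encodes the four dependency lines above as a table
and counts how many options are selectable for a given PREEMPT/SMP
combination; the struct and function names are hypothetical and exist
only for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* One row per RCU implementation, mirroring the dependency list above. */
struct rcu_option {
	const char *name;
	bool needs_preempt;
	bool needs_smp;
};

static const struct rcu_option rcu_options[] = {
	{ "TREE_RCU",         false, true  },
	{ "TREE_PREEMPT_RCU", true,  true  },
	{ "TINY_RCU",         false, false },
	{ "TINY_PREEMPT_RCU", true,  false },
};

/*
 * Count the options whose dependencies match the given configuration.
 * If 'last' is non-NULL, it is set to the last matching entry.
 */
size_t count_selectable(bool preempt, bool smp, const struct rcu_option **last)
{
	size_t i, n = 0;

	for (i = 0; i < sizeof(rcu_options) / sizeof(rcu_options[0]); i++) {
		if (rcu_options[i].needs_preempt == preempt &&
		    rcu_options[i].needs_smp == smp) {
			n++;
			if (last)
				*last = &rcu_options[i];
		}
	}
	return n;
}
```

For every one of the four PREEMPT/SMP combinations the count is exactly
one, so the "choice" never actually offers more than a single entry.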

Cheers,
-- 
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)


Re: Does anyone use CONFIG_TINY_PREEMPT_RCU?

2012-11-13 Thread Nick Bowler
On 2012-11-12 16:49 -0800, Paul E. McKenney wrote:
> Hello!
> 
> I know of people using TINY_RCU, TREE_RCU, and TREE_PREEMPT_RCU, but I
> have not heard of anyone using TINY_PREEMPT_RCU for whom TREE_PREEMPT_RCU
> was not a viable option (in contrast, the people running Linux on
> tiny-memory systems typically use TINY_RCU).  Of course, if no one
> really needs it, the proper thing to do is to remove it.
> 
> So, if you need TINY_PREEMPT_RCU, please let me know.  Otherwise, I will
> remove it, probably in the 3.9 timeframe.

Yes, I use TINY_PREEMPT_RCU on my UP machines.  It is, in fact, the only
option.

Cheers,
-- 
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)


[PATCH v3] ARM: zynq: Allow UART1 to be used as DEBUG_LL console.

2012-11-05 Thread Nick Bowler
The main UART on the Xilinx ZC702 board is UART1, located at address
e0001000.  Add a Kconfig option to select this device as the low-level
debugging port.  This allows the really early boot printouts to reach
the USB serial adaptor on this board.

For consistency's sake, add a choice entry for UART0 even though it is
the default if UART1 is not selected.

Signed-off-by: Nick Bowler 
Tested-by: Josh Cartwright 
---
Sorry all for the phenomenal delay in sending this out.  Josh, I kept
your Tested-By since this version is Obviously Equivalent™ to v2...

v2: rebase on newest patch series, signoff.
v3: squash in style tweaks suggested by Michal Simek.

 arch/arm/Kconfig.debug |   17 +
 arch/arm/mach-zynq/common.c|6 +++---
 arch/arm/mach-zynq/include/mach/zynq_soc.h |   16 +++-
 3 files changed, 31 insertions(+), 8 deletions(-)

diff --git a/arch/arm/Kconfig.debug b/arch/arm/Kconfig.debug
index b0f3857b3a4c..7754d51f2b19 100644
--- a/arch/arm/Kconfig.debug
+++ b/arch/arm/Kconfig.debug
@@ -132,6 +132,23 @@ choice
  their output to UART1 serial port on DaVinci TNETV107X
  devices.
 
+   config DEBUG_ZYNQ_UART0
+   bool "Kernel low-level debugging on Xilinx Zynq using UART0"
+   depends on ARCH_ZYNQ
+   help
+ Say Y here if you want the debug print routines to direct
+ their output to UART0 on the Zynq platform.
+
+   config DEBUG_ZYNQ_UART1
+   bool "Kernel low-level debugging on Xilinx Zynq using UART1"
+   depends on ARCH_ZYNQ
+   help
+ Say Y here if you want the debug print routines to direct
+ their output to UART1 on the Zynq platform.
+
+ If you have a ZC702 board and want early boot messages to
+ appear on the USB serial adaptor, select this option.
+
config DEBUG_DC21285_PORT
bool "Kernel low-level debugging messages via footbridge serial port"
depends on FOOTBRIDGE
diff --git a/arch/arm/mach-zynq/common.c b/arch/arm/mach-zynq/common.c
index ba8d14f78d4d..93b91059faab 100644
--- a/arch/arm/mach-zynq/common.c
+++ b/arch/arm/mach-zynq/common.c
@@ -84,9 +84,9 @@ static struct map_desc io_desc[] __initdata = {
 
 #ifdef CONFIG_DEBUG_LL
{
-   .virtual= UART0_VIRT,
-   .pfn= __phys_to_pfn(UART0_PHYS),
-   .length = UART0_SIZE,
+   .virtual= LL_UART_VADDR,
+   .pfn= __phys_to_pfn(LL_UART_PADDR),
+   .length = UART_SIZE,
.type   = MT_DEVICE,
},
 #endif
diff --git a/arch/arm/mach-zynq/include/mach/zynq_soc.h b/arch/arm/mach-zynq/include/mach/zynq_soc.h
index 1b8bf0ecbcb0..5ebbd8e6 100644
--- a/arch/arm/mach-zynq/include/mach/zynq_soc.h
+++ b/arch/arm/mach-zynq/include/mach/zynq_soc.h
@@ -25,8 +25,9 @@
  * address that is known to work.
  */
 #define UART0_PHYS 0xE000
-#define UART0_SIZE SZ_4K
-#define UART0_VIRT 0xF0001000
+#define UART1_PHYS 0xE0001000
+#define UART_SIZE  SZ_4K
+#define UART_VIRT  0xF0001000
 
 #define TTC0_PHYS  0xF8001000
 #define TTC0_SIZE  SZ_4K
@@ -36,12 +37,17 @@
 #define SCU_PERIPH_SIZE SZ_8K
 #define SCU_PERIPH_VIRT (TTC0_VIRT - SCU_PERIPH_SIZE)
 
+#if IS_ENABLED(CONFIG_DEBUG_ZYNQ_UART1)
+# define LL_UART_PADDR UART1_PHYS
+#else
+# define LL_UART_PADDR UART0_PHYS
+#endif
+
+#define LL_UART_VADDR  UART_VIRT
+
 /* The following are intended for the devices that are mapped early */
 
 #define TTC0_BASE  IOMEM(TTC0_VIRT)
 #define SCU_PERIPH_BASE IOMEM(SCU_PERIPH_VIRT)
 
-#define LL_UART_PADDR  UART0_PHYS
-#define LL_UART_VADDR  UART0_VIRT
-
 #endif
-- 
1.7.8.6



[PATCH] scatterlist: don't BUG when we can trivially return a proper error.

2012-11-01 Thread Nick Bowler
There is absolutely no reason to crash the kernel when we have a
perfectly good return value already available to use for conveying
failure status.

Let's return an error code instead of crashing the kernel: that sounds
like a much better plan.

Signed-off-by: Nick Bowler 
---
 lib/scatterlist.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index 3675452b23ca..11ecaf000696 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -248,7 +248,8 @@ int __sg_alloc_table(struct sg_table *table, unsigned int nents,
unsigned int left;
 
 #ifndef ARCH_HAS_SG_CHAIN
-   BUG_ON(nents > max_ents);
+   if (WARN_ON_ONCE(nents > max_ents))
+           return -E2BIG;
 #endif
 
memset(table, 0, sizeof(*table));
-- 
1.7.8.6



Re: [PATCH v2] ARM: zynq: Allow UART1 to be used as DEBUG_LL console.

2012-11-01 Thread Nick Bowler
On 2012-10-30 12:27 +0100, Michal Simek wrote:
> On 10/29/2012 07:19 PM, Nick Bowler wrote:
> > +#if IS_ENABLED(CONFIG_DEBUG_ZYNQ_UART1)
> > +#  define LL_UART_PADDR UART1_PHYS
> > +#  define LL_UART_VADDR UART_VIRT
> > +#else
> > +#  define LL_UART_PADDR UART0_PHYS
> > +#  define LL_UART_VADDR UART_VIRT
> > +#endif
> 
> Probably no reason to setup LL_UART_VADDR on two lines.
> It is enough to set it up once.
> 
> MINOR: It is just my personal preference to use different coding style.
> 
> #if IS_ENABLED(CONFIG_DEBUG_ZYNQ_UART1)
> # define LL_UART_PADDR UART1_PHYS
> #else
> # define LL_UART_PADDR UART0_PHYS
> #endif
> 
> #define LL_UART_VADDR UART_VIRT

I have no strong feeling either way, so I will send v3 with these
changes.

Thanks,
-- 
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)


[PATCH v2] ARM: zynq: Allow UART1 to be used as DEBUG_LL console.

2012-10-29 Thread Nick Bowler
The main UART on the Xilinx ZC702 board is UART1, located at address
e0001000.  Add a Kconfig option to select this device as the low-level
debugging port.  This allows the really early boot printouts to reach
the USB serial adaptor on this board.

For consistency's sake, add a choice entry for UART0 even though it is
the default if UART1 is not selected.

Signed-off-by: Nick Bowler 
Tested-by: Josh Cartwright 
---
v2: rebase on newest patch series, signoff.

This should apply cleanly on top of Josh Cartwright's v5 "zynq subarch
cleanups" series.

 arch/arm/Kconfig.debug |   17 +
 arch/arm/mach-zynq/common.c|6 +++---
 arch/arm/mach-zynq/include/mach/zynq_soc.h |   16 +++-
 3 files changed, 31 insertions(+), 8 deletions(-)

diff --git a/arch/arm/Kconfig.debug b/arch/arm/Kconfig.debug
index b0f3857b3a4c..7754d51f2b19 100644
--- a/arch/arm/Kconfig.debug
+++ b/arch/arm/Kconfig.debug
@@ -132,6 +132,23 @@ choice
  their output to UART1 serial port on DaVinci TNETV107X
  devices.
 
+   config DEBUG_ZYNQ_UART0
+   bool "Kernel low-level debugging on Xilinx Zynq using UART0"
+   depends on ARCH_ZYNQ
+   help
+ Say Y here if you want the debug print routines to direct
+ their output to UART0 on the Zynq platform.
+
+   config DEBUG_ZYNQ_UART1
+   bool "Kernel low-level debugging on Xilinx Zynq using UART1"
+   depends on ARCH_ZYNQ
+   help
+ Say Y here if you want the debug print routines to direct
+ their output to UART1 on the Zynq platform.
+
+ If you have a ZC702 board and want early boot messages to
+ appear on the USB serial adaptor, select this option.
+
config DEBUG_DC21285_PORT
bool "Kernel low-level debugging messages via footbridge serial port"
depends on FOOTBRIDGE
diff --git a/arch/arm/mach-zynq/common.c b/arch/arm/mach-zynq/common.c
index ba8d14f78d4d..93b91059faab 100644
--- a/arch/arm/mach-zynq/common.c
+++ b/arch/arm/mach-zynq/common.c
@@ -84,9 +84,9 @@ static struct map_desc io_desc[] __initdata = {
 
 #ifdef CONFIG_DEBUG_LL
{
-   .virtual= UART0_VIRT,
-   .pfn= __phys_to_pfn(UART0_PHYS),
-   .length = UART0_SIZE,
+   .virtual= LL_UART_VADDR,
+   .pfn= __phys_to_pfn(LL_UART_PADDR),
+   .length = UART_SIZE,
.type   = MT_DEVICE,
},
 #endif
diff --git a/arch/arm/mach-zynq/include/mach/zynq_soc.h b/arch/arm/mach-zynq/include/mach/zynq_soc.h
index 1b8bf0ecbcb0..7f4f38bcada9 100644
--- a/arch/arm/mach-zynq/include/mach/zynq_soc.h
+++ b/arch/arm/mach-zynq/include/mach/zynq_soc.h
@@ -25,8 +25,9 @@
  * address that is known to work.
  */
 #define UART0_PHYS 0xE000
-#define UART0_SIZE SZ_4K
-#define UART0_VIRT 0xF0001000
+#define UART1_PHYS 0xE0001000
+#define UART_SIZE  SZ_4K
+#define UART_VIRT  0xF0001000
 
 #define TTC0_PHYS  0xF8001000
 #define TTC0_SIZE  SZ_4K
@@ -36,12 +37,17 @@
 #define SCU_PERIPH_SIZE SZ_8K
 #define SCU_PERIPH_VIRT (TTC0_VIRT - SCU_PERIPH_SIZE)
 
+#if IS_ENABLED(CONFIG_DEBUG_ZYNQ_UART1)
+#  define LL_UART_PADDR UART1_PHYS
+#  define LL_UART_VADDR UART_VIRT
+#else
+#  define LL_UART_PADDR UART0_PHYS
+#  define LL_UART_VADDR UART_VIRT
+#endif
+
 /* The following are intended for the devices that are mapped early */
 
 #define TTC0_BASE  IOMEM(TTC0_VIRT)
 #define SCU_PERIPH_BASE IOMEM(SCU_PERIPH_VIRT)
 
-#define LL_UART_PADDR  UART0_PHYS
-#define LL_UART_VADDR  UART0_VIRT
-
 #endif
-- 
1.7.8.6



Re: [PATCH] ARM: zynq: Allow UART1 to be used as DEBUG_LL console.

2012-10-29 Thread Nick Bowler
On 2012-10-29 10:56 -0600, Josh Cartwright wrote:
> On Thu, Oct 25, 2012 at 06:47:34PM -0400, Nick Bowler wrote:
> > The main UART on the Xilinx ZC702 board is UART1, located at address
> > e0001000.  Add a Kconfig option to select this device as the low-level
> > debugging port.  This allows the really early boot printouts to reach
> > the USB serial adaptor on this board.
> > 
> > For consistency's sake, add a choice entry for UART0 even though it is
> > the default if UART1 is not selected.
> > 
> > As there are currently known issues related to the UART virtual
> > mappings, this is KNOWN BROKEN, not to be merged yet!
> > 
> > Not-Yet-Signed-off-by: Nick Bowler 
> 
> Tested-by: Josh Cartwright 
> 
> Now that v5 of the initial zynq cleanup patchset is queued up for
> merging (with a workaround for the uart mapping problem), what would it
> take for you to sign off on this patch?

Great, I've tested this on top of the other 4 and the boot console is
working now.  I will resend the patch with my signoff.

(I wonder if UART0 has similar address problems on the ZC702 if it is
selected as the boot console.  This is harder to test: AFAIK it's not
connected to anything on this board by default, although it could in
principle be connected to something in the PL.  We can cross that bridge
if and when we get to it, I guess.)

> There is some trivial merging that has to be done to get it to apply
> cleanly on v5.  See a rebased version below.

Yup, looks good.

Cheers,
-- 
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)


[PATCH] ARM: zynq: Allow UART1 to be used as DEBUG_LL console.

2012-10-25 Thread Nick Bowler
The main UART on the Xilinx ZC702 board is UART1, located at address
e0001000.  Add a Kconfig option to select this device as the low-level
debugging port.  This allows the really early boot printouts to reach
the USB serial adaptor on this board.

For consistency's sake, add a choice entry for UART0 even though it is
the default if UART1 is not selected.

As there are currently known issues related to the UART virtual
mappings, this is KNOWN BROKEN, not to be merged yet!

Not-Yet-Signed-off-by: Nick Bowler 
---
 arch/arm/Kconfig.debug |   17 +
 arch/arm/mach-zynq/common.c|6 +++---
 arch/arm/mach-zynq/include/mach/zynq_soc.h |   18 --
 3 files changed, 32 insertions(+), 9 deletions(-)

diff --git a/arch/arm/Kconfig.debug b/arch/arm/Kconfig.debug
index b0f3857b3a4c..7754d51f2b19 100644
--- a/arch/arm/Kconfig.debug
+++ b/arch/arm/Kconfig.debug
@@ -132,6 +132,23 @@ choice
  their output to UART1 serial port on DaVinci TNETV107X
  devices.
 
+   config DEBUG_ZYNQ_UART0
+   bool "Kernel low-level debugging on Xilinx Zynq using UART0"
+   depends on ARCH_ZYNQ
+   help
+ Say Y here if you want the debug print routines to direct
+ their output to UART0 on the Zynq platform.
+
+   config DEBUG_ZYNQ_UART1
+   bool "Kernel low-level debugging on Xilinx Zynq using UART1"
+   depends on ARCH_ZYNQ
+   help
+ Say Y here if you want the debug print routines to direct
+ their output to UART1 on the Zynq platform.
+
+ If you have a ZC702 board and want early boot messages to
+ appear on the USB serial adaptor, select this option.
+
config DEBUG_DC21285_PORT
bool "Kernel low-level debugging messages via footbridge serial port"
depends on FOOTBRIDGE
diff --git a/arch/arm/mach-zynq/common.c b/arch/arm/mach-zynq/common.c
index ba8d14f78d4d..93b91059faab 100644
--- a/arch/arm/mach-zynq/common.c
+++ b/arch/arm/mach-zynq/common.c
@@ -84,9 +84,9 @@ static struct map_desc io_desc[] __initdata = {
 
 #ifdef CONFIG_DEBUG_LL
{
-   .virtual= UART0_VIRT,
-   .pfn= __phys_to_pfn(UART0_PHYS),
-   .length = UART0_SIZE,
+   .virtual= LL_UART_VADDR,
+   .pfn= __phys_to_pfn(LL_UART_PADDR),
+   .length = UART_SIZE,
.type   = MT_DEVICE,
},
 #endif
diff --git a/arch/arm/mach-zynq/include/mach/zynq_soc.h b/arch/arm/mach-zynq/include/mach/zynq_soc.h
index c6b9b67bf7c7..cab72bfd183c 100644
--- a/arch/arm/mach-zynq/include/mach/zynq_soc.h
+++ b/arch/arm/mach-zynq/include/mach/zynq_soc.h
@@ -23,23 +23,29 @@
  * vmalloc region
  */
 #define UART0_PHYS 0xE000
-#define UART0_SIZE SZ_4K
-#define UART0_VIRT (VMALLOC_END - UART0_SIZE)
+#define UART1_PHYS 0xE0001000
+#define UART_SIZE  SZ_4K
+#define UART_VIRT  (VMALLOC_END - UART_SIZE)
 
 #define TTC0_PHYS  0xF8001000
 #define TTC0_SIZE  SZ_4K
-#define TTC0_VIRT  (UART0_VIRT - TTC0_SIZE)
+#define TTC0_VIRT  (UART_VIRT - TTC0_SIZE)
 
 #define SCU_PERIPH_PHYS 0xF8F0
 #define SCU_PERIPH_SIZE SZ_8K
 #define SCU_PERIPH_VIRT (TTC0_VIRT - SCU_PERIPH_SIZE)
 
+#if IS_ENABLED(CONFIG_DEBUG_ZYNQ_UART1)
+#  define LL_UART_PADDR UART1_PHYS
+#  define LL_UART_VADDR UART_VIRT
+#else
+#  define LL_UART_PADDR UART0_PHYS
+#  define LL_UART_VADDR UART_VIRT
+#endif
+
 /* The following are intended for the devices that are mapped early */
 
 #define TTC0_BASE  IOMEM(TTC0_VIRT)
 #define SCU_PERIPH_BASE IOMEM(SCU_PERIPH_VIRT)
 
-#define LL_UART_PADDR  UART0_PHYS
-#define LL_UART_VADDR  UART0_VIRT
-
 #endif
-- 
1.7.8.6



Re: [PATCH v4 5/5] zynq: move static peripheral mappings

2012-10-25 Thread Nick Bowler
On 2012-10-25 16:29 -0500, Josh Cartwright wrote:
> On Thu, Oct 25, 2012 at 04:17:01PM -0400, Nick Bowler wrote:
> > Did you test this on any real hardware?  I can't get the ZC702 to work
> > with the UART mapped at this address (this ends up being mapped at
> > 0xFEFFF000), although I can't for the life of me figure out why the
> > virtual address even matters.  Note that for the ZC702, the physical
> > address of the "main" UART is 0xE0001000.
> 
> Ugh, not yet;  My testing has been on a qemu model.  I also
> unfortunately neglected to mention I am carrying a qemu patch that
> forces RX_EN/TX_EN of the uarts out of reset.  There is an (incomplete)
> thread on qemu-devel discussing whose responsibility it really is to
> enable the uarts:
> 
>   http://lists.gnu.org/archive/html/qemu-devel/2012-10/msg03779.html
> 
> Clearly, though, if you are seeing the "Uncompressing Linux..."
> messages, then the uart is enabled, so I don't think that's the problem.

Yes, the uart is presumably enabled by u-boot.

> >   "Works": all printouts make it to the console
> >   "Fails": no printouts make it to the console after decompression
> >   "Truncated": the first few lines of output do not make it to the
> >       console, but after that it "Works".  The first line
> >       successfully printed is always
> >         "Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 260096"
> 
> Odd, I'm wondering if the uart gets into a weird state, and some bits get
> knocked loose at console_initcall() time, when the console driver comes
> up (Assuming CONFIG_SERIAL_XILINX_PS_UART)?

While I am using that driver, it is not initialized until relatively
late in the boot process.  If I were to guess, I would guess that,
except for when it "Works", the really really early printk stuff isn't
actually hitting the uart at all.  The "Fails" case would then be due to
the stray writes crashing the board, and the "Truncated" case due to the
stray writes being (ostensibly) benign.

But I really have no way right now to test this hypothesis, since I
can't print anything in the failing case.

> > And here are the addresses I tested:
> > 
> >   Address       Result
> >   -----------------------
> >   0xf000        Truncated
> >   0xf0001000    Works
[...]
> >   0xfefff000    Fails
> > 
> > Judging by the list, the console seems to only work properly if the
> > defined virtual address is Fxxx1000 and xxx is not too big...
> 
> Very odd.  Do you mind sending out your patch allowing the selection of
> the secondary uart for DEBUG_LL?

I will follow up with the version that applies on top of your series in
a moment.  I'm confident that the UART works on the ZC702 when mapped at
0xf0001000, since I've been running with that since I first got my hands
on one of these boards.

But you don't need any patch to do the same tests I was doing above:
you can just change UART0_PHYS to 0xe0001000 and then set UART0_VIRT
accordingly (you may need to move the TTC/SCU mappings depending where
you put the UART, of course).

Cheers,
-- 
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)


Re: [PATCH v4 5/5] zynq: move static peripheral mappings

2012-10-25 Thread Nick Bowler
Hi Josh,

On 2012-10-24 15:04 -0500, Josh Cartwright wrote:
> Shifting them up into the vmalloc region prevents the following warning,
> when booting a zynq qemu target with more than 512mb of RAM:
[...]
> -/* For now, all mappings are flat (physical = virtual)
> +/* Static peripheral mappings are mapped at the top of the
> + * vmalloc region
>   */
> -#define UART0_PHYS   0xE000
> -#define UART0_VIRT   UART0_PHYS
> +#define UART0_PHYS   0xE000
> +#define UART0_SIZE   SZ_4K
> +#define UART0_VIRT   (VMALLOC_END - UART0_SIZE)

Did you test this on any real hardware?  I can't get the ZC702 to work
with the UART mapped at this address (this ends up being mapped at
0xFEFFF000), although I can't for the life of me figure out why the
virtual address even matters.  Note that for the ZC702, the physical
address of the "main" UART is 0xE0001000.
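
For reference, the arithmetic behind that address can be sketched as
follows.  This assumes VMALLOC_END is 0xff000000 for this configuration
(an assumption; the quoted patch only shows the macro, not its value),
and the function name is hypothetical.

```c
#include <stdint.h>

#define SZ_4K			0x00001000u
#define ASSUMED_VMALLOC_END	0xff000000u	/* assumption, see text above */

/*
 * Virtual address of a static mapping of 'size' bytes placed directly
 * below 'top', as in "#define UART0_VIRT (VMALLOC_END - UART0_SIZE)".
 */
uint32_t static_map_below(uint32_t top, uint32_t size)
{
	return top - size;
}
```

Under that assumption, static_map_below(ASSUMED_VMALLOC_END, SZ_4K)
evaluates to 0xfefff000, which matches the address mentioned above.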

All I end up seeing is "Uncompressing Linux... done, booting the
kernel." with no further messages.  With the UART mapped at
0xF0001000, all printouts make it to the console.  I tried a couple
different virtual addresses and I'm surprised at the results, since
the behaviour seems to vary wildly.  I saw three behaviours depending
only on the virtual address of the static mapping; all results are 100%
reproducible:

   "Works": all printouts make it to the console
   "Fails": no printouts make it to the console after decompression
   "Truncated": the first few lines of output do not make it to the
      console, but after that it "Works".  The first line
      successfully printed is always
        "Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 260096"

And here are the addresses I tested:

  Address       Result
  -----------------------
  0xf000        Truncated
  0xf0001000    Works
  0xf0007000    Truncated
  0xf0008000    Fails
  0xf0009000    Fails
  0xf000e000    Truncated
  0xf000f000    Fails
  0xf800        Truncated
  0xf8001000    Works
  0xfef0        Truncated
  0xfef01000    Works
  0xfef08000    Fails
  0xfef0f000    Fails
  0xfeff        Fails
  0xfeff1000    Fails
  0xfeffe000    Fails
  0xfefff000    Fails

Judging by the list, the console seems to only work properly if the
defined virtual address is Fxxx1000 and xxx is not too big...
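
The "Fxxx1000" observation can be written down as a simple predicate.
This is only a sketch of the pattern as I read it from the list (top
nibble F, low 16 bits equal to 0x1000); the function name is
hypothetical, and the pattern is evidently not sufficient on its own,
since at least one matching address in the list still fails.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if a 32-bit virtual address has the form Fxxx1000: the top
 * nibble is 0xF and the low 16 bits are exactly 0x1000.
 */
bool matches_fxxx1000(uint32_t vaddr)
{
	return (vaddr >> 28) == 0xFu && (vaddr & 0xFFFFu) == 0x1000u;
}
```

All three addresses marked "Works" satisfy this predicate, while e.g.
0xf0007000 and 0xfefff000 do not.  0xfeff1000 satisfies it too but
fails, which is the "xxx is not too big" caveat.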

Confused,
-- 
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)


Re: [PATCH v2 2/4] zynq: move static peripheral mappings

2012-10-24 Thread Nick Bowler
On 2012-10-23 18:42 -0500, Josh Cartwright wrote:
> On Tue, Oct 23, 2012 at 04:27:03PM -0400, Nick Bowler wrote:
> > Just FYI, I sent a patch to fix the same bug a while back
> >
> >   https://patchwork.kernel.org/patch/1156361/
> >
> > together with other patches to fix early printk on the ZC702 serial
> > console.  Admittedly, I dropped the ball on these as other issues
> > came up so I was away from the Zynq for a while.
> >
> > However, I'm now getting back on the Zynq and have a bunch of patches to
> > make it all work on the ZC702 board.  I've respun the ZC702 early boot
> > fixes against newer git but they're obviously going to conflict with
> > this series.  Should I resend them anyway?
> 
> If you have other fixes for the zc702, that'd be great.  Most of my
> testing has been in a qemu model; I haven't had a chance to try getting
> the zc702 booting yet.
> 
> The first stumbling block is that it looks like the secondary uart is
> the primary uart on the zc702.

Yes, that is indeed the case, and was what I tried to address with my
earlier patches.

> > I also have a DT binding for the TTC driver, I can send that.
> 
> That'd be great!

OK, I will respin and test this stuff on top of your v4 series and send
them out.

Cheers,
-- 
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)


Re: [PATCH v3 3/5] zynq: remove use of CLKDEV_LOOKUP

2012-10-24 Thread Nick Bowler
On 2012-10-23 19:34 -0500, Josh Cartwright wrote:
> The Zynq support in mainline does not (yet) make use of any of the
> generic clk or clk lookup functionality.  Remove what is upstream for
> now, until the out-of-tree implementation is in suitable form for
> merging.
> 
> An important side effect of this patch is that it allows the building of
> a Zynq kernel without running into unresolved symbol problems:
> 
>   drivers/built-in.o: In function `amba_get_enable_pclk':
>   clkdev.c:(.text+0x444): undefined reference to `clk_enable'

For the record, I think this was introduced by commit 56a34b03ff427
("ARM: versatile: Make plat-versatile clock optional") which forgot to
select PLAT_VERSATILE_CLOCK on Zynq.  This is not all that surprising,
because the fact that Zynq "uses" PLAT_VERSATILE is secretly hidden in
the Makefile.

Nevertheless, the only feature from versatile that Zynq needed was the
clock support, so this patch should *also* delete the secret use of
plat-versatile by removing this line from arch/arm/Makefile:

  plat-$(CONFIG_ARCH_ZYNQ)  += versatile

> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index cce4f8d..de70d99 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -946,7 +946,6 @@ config ARCH_ZYNQ
>   bool "Xilinx Zynq ARM Cortex A9 Platform"
>   select ARM_AMBA
>   select ARM_GIC
> - select CLKDEV_LOOKUP
>   select CPU_V7
>   select GENERIC_CLOCKEVENTS
>   select ICST

I'd prefer if we just added "select COMMON_CLK" instead of removing this
so we don't have to re-add this later, but I guess it doesn't really
matter either way.

Cheers,
-- 
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)


Re: OOPS after deleting file on ext4 filesystem

2012-10-24 Thread Nick Bowler
On 2012-10-23 23:08 -0400, Theodore Ts'o wrote:
> On Tue, Oct 23, 2012 at 08:50:22PM -0400, Nick Bowler wrote:
> > I just saw an ext4 oops on one of my machines after a couple months of
> > uptime, on Linux 3.5.2.  I doubt I will be able to reproduce the problem
> > easily so I'm just posting this in case anyone can tell what's going on.
> 
> Fixed in v3.5.3 or later kernels.  (Commit 2cd45bebc56a)

Good to know.

Thanks,
-- 
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)


Re: [PATCH v2 2/4] zynq: move static peripheral mappings

2012-10-23 Thread Nick Bowler
On 2012-10-23 15:53 -0500, Josh Cartwright wrote:
> On Tue, Oct 23, 2012 at 03:09:23PM -0500, Rob Herring wrote:
> > On 10/23/2012 09:50 AM, Arnd Bergmann wrote:
> > > On Monday 22 October 2012, Josh Cartwright wrote:
> > >> -#define SCU_PERIPH_PHYS 0xF8F0
> > >> -#define SCU_PERIPH_VIRT SCU_PERIPH_PHYS
> > >> +#define SCU_PERIPH_PHYS 0xF8F0
> > >> +#define SCU_PERIPH_SIZE SZ_8K
> > >> +#define SCU_PERIPH_VIRT (PL310_L2CC_VIRT - SCU_PERIPH_SIZE)
> > >
> > > And your patch 3 already obsoletes this mapping.
> >
> > Actually, it's probably still needed. The smp platform code typically
> > reads the number of cores from the SCU and the mapping has to be in
> > place before ioremap is up. I don't think there is an architected way to
> > get the number of cores, but it would be nice to avoid this early SCU
> > access. We could also mandate getting the core count from DT instead.
> >
> > Also, the physical address can be read with this on A9's:
> >
> > asm("mrc p15, 4, %0, c15, c0, 0" : "=r" (base));
> 
> For the sake of the zynq cleanups, I think it may still make sense to
> remove the SCU peripheral mappings for now.  By the time we're ready to
> push in SMP support for zynq, maybe we can tackle the problem of how to
> solve the SCU mapping problem generically.

Then the static mapping can be removed if and when we "solve the SCU
mapping problem generically".  There's no point in removing it until
then since it doesn't cause any actual problems, does it?
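
For reference, the early SCU access Rob mentions boils down to decoding one
register.  The following is only a rough sketch, assuming the Cortex-A9
MPCore layout in which the SCU Configuration Register sits at offset 0x04
and bits [1:0] hold the CPU count minus one.  The helper name and the
offset macro are made up for illustration and are not taken from the Zynq
tree:

```c
#include <assert.h>	/* only needed for assert()-style checks by callers */

/* Assumed Cortex-A9 MPCore layout: SCU Configuration Register at +0x04. */
#define SCU_CONFIG_OFFSET	0x04u

/* Hypothetical helper: decode the core count from a raw SCU_CONFIG value. */
static unsigned int scu_cores_from_config(unsigned int config)
{
	/* Bits [1:0] hold the number of CPUs minus one. */
	return (config & 0x3u) + 1u;
}
```

The register has to be readable before ioremap is available, which is why
the static mapping matters for SMP bring-up.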

Cheers,
-- 
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)


Re: [PATCH v2 2/4] zynq: move static peripheral mappings

2012-10-23 Thread Nick Bowler
On 2012-10-23 11:26 -0500, Josh Cartwright wrote:
> On Tue, Oct 23, 2012 at 02:50:11PM +, Arnd Bergmann wrote:
> > On Monday 22 October 2012, Josh Cartwright wrote:
> > > Shifting them up into the vmalloc region prevents the following warning,
> > > when booting a zynq qemu target with more than 512mb of RAM:
> > > 
> > >   BUG: mapping for 0xe000 at 0xe000 out of vmalloc space
> > > 
> > > In addition, it allows for reuse of these mappings when the proper
> > > drivers issue requests via ioremap().
> > > 
> > > Signed-off-by: Josh Cartwright 
> > 
> > This looks like a bug fix that should be backported to older kernels,
> > so it would be good to add 'Cc: sta...@vger.kernel.org' below your
> > Signed-off-by.
> 
> Will-do, thanks.

Just FYI, I sent a patch to fix the same bug a while back

  https://patchwork.kernel.org/patch/1156361/

together with other patches to fix early printk on the ZC702 serial
console.  Admittedly, I dropped the ball on these as other issues
came up so I was away from the Zynq for a while.

However, I'm now getting back on the Zynq and have a bunch of patches to
make it all work on the ZC702 board.  I've respun the ZC702 early boot
fixes against newer git but they're obviously going to conflict with
this series.  Should I resend them anyway?

> > > -#define TTC0_PHYS0xF8001000
> > > -#define TTC0_VIRTTTC0_PHYS
> > > +#define TTC0_PHYS0xF8001000
> > > +#define TTC0_SIZESZ_4K
> > > +#define TTC0_VIRT(UART0_VIRT - TTC0_SIZE)
> > 
> > It's quite likely that this does not have to be a fixed mapping
> > any more. Just have a look at how drivers/clocksource/dw_apb_timer_of.c
> > calls of_iomap() to get the address.
> 
> Yes, this is already on my list of plans.  The in-tree TTC driver
> unfortunately doesn't yet support device tree bindings.  Are you
> comfortable waiting on the DT-ification of the TTC in a follow-up
> patchset?

I also have a DT binding for the TTC driver; I can send that.

Cheers,
-- 
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)


OOPS after deleting file on ext4 filesystem

2012-10-23 Thread Nick Bowler
Hi folks,

I just saw an ext4 oops on one of my machines after a couple months of
uptime, on Linux 3.5.2.  I doubt I will be able to reproduce the problem
easily so I'm just posting this in case anyone can tell what's going on.

Going by the timing, and the call trace, it is presumably related to the
fact that I had rm'd a ~12G file shortly before the last log entry.  The
filesystem is aged somewhat and close to full (hence why I was deleting
the file in the first place).  However, I'm not certain of the *exact*
timeline because I didn't notice that the system had crashed until the
next day.

In case it matters, fs recovery after resetting the box resulted in
hundreds of messages like:

   EXT4-fs (md127): ext4_orphan_cleanup: deleting unreferenced inode 12058658

I took a photo of the oops text that was on screen and posted it here:

  http://i.imgur.com/7DfIP.jpg

For convenience (and the benefit of list archives), I've transcribed the
oops, but I could have easily fat-fingered something so only the photo
is authoritative.

BUG: unable to handle kernel NULL pointer dereference at 0028
IP: [] ext4_ext_remove_space+0x725/0x9db [ext4]
PGD 1043f067 PUD 1078f067 PMD 0
Oops:  [#1] PREEMPT
CPU 0
Modules linked in: nls_iso8859_1 nls_cp437 vfat fat usb_storage nls_utf8 isofs 
it87 hwmon_vid sha1_generic hmac aes_generic cbc cts crypto_blkcipher cryptomgr 
aead nfs nfsd exportfs lockd bridge stp ipv6 llc iptable_filter iptable_nat 
nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack ip_tables x_tables ext2 
snd_pcm_oss snd_mixer_oss snd_emu10k1_synth snd_emux_synth snd_seq_midi_emul 
snd_seq_virmidi snd_seq_midi_event snd_seq rpcsec_gss_krb5 auth_rpcgss sunrpc 
tun raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor 
async_tx sg firewire_sbp2 loop snd_emu10k1 sr_mod snd_hwdep snd_util_mem 
snd_ac97_codec ac9_bus snd_rawmidi snd_seq_device snd_pcm snd_page_alloc 
snd_timer snd ftdi_sio cdrom epic100 firewire_ohci emu10k1_gp firewire_core 
gameport crc_itu_t soundcore forcedeth k8temp usbserial mii powernow_k8 floppy 
pata_amd mperf evdev i2c_nforce2 ext4 crc16 jbd2 crypto_hash crypto_algapi 
crypto mbcache raid1 md_mod

Pid: 13628, comm: rm Not tainted 3.5.2 #107 ASUSTek Computer Inc. 
K8N-E-Deluxe/'K8N-E-Deluxe'
RIP: 0010:[]  [] 
ext4_ext_remove_space+0x725/0x9db [ext4]
RSP: 0018:8800105cfca8  EFLAGS: 00010246
RAX:  RBX: 88006d5db5f0 RCX: 0002
RDX: 0001 RSI: 0001 RDI: 073f313e
RBP: 8800105cfd88 R08: 073f313e R09: 03e8
R10: 1600 R11: 88007b930180 R12: 8800588e4d68
R13: 88004d002000 R14: 88006d5db5c0 R15: 
FS:  7f1e400b8700() GS:81623000() knlGS:f757a6c0
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 0028 CR3: 7c6ec000 CR4: 07f0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process rm (pid: 13628, threadinfo 8800105ce00, task 8800718e7230)
Stack:
 88000133bde0 8800588e4d68 8800219d2300 880017ebf3d8
 8800105cfd50 0003 04549000 8800fff5
 105cfd50 88006d5db648 8800588e4cc8 080080006a88
Call Trace:
 [] ext4_ext_truncate+0xcd/0x173 [ext4]
 [] ? ext4_mark_inode_dirty+0x13e/0x168 [ext4]
 [] ext4_truncate+0x46/0x51 [ext4]
 [] ext4_evict_inode+0x276/0x363 [ext4]
 [] ? ext4_da_writepages+0x423/0x423 [ext4]
 [] evict+0xb6/0x182
 [] iput+0x1fb/0x203
 [] do_unlinkat+0x10b/0x161
 [] sys_unlinkat+0x24/0x26
 [] system_call_fastpath+0x1a/0x1f
Code: e1 ff ff 48 63 5d bc 48 6b db 30 48 03 5d b0 e9 f1 00 00 00 48 63 55 bc 
48 6b da 30 48 03 5d b0 48 83 7b 20 00 75 0c 48 8b 43 28 <48> 8b 40 28 48 89 43 
20 48 8b 43 18 48 85 c0 75 1f 48 8b 43 20
RIP  [] ext4_ext_remove_space+0x725/0x9db [ext4]
 RSP 
CR2: 0028

Thanks,
-- 
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)


Re: Linux 3.6

2012-10-09 Thread Nick Bowler
On 2012-10-04 23:30 +0200, Stefan Richter wrote:
> On Oct 04 Nick Bowler wrote:
> > On 2012-10-04 09:14 -0700, Kees Cook wrote:
> > > On Thu, Oct 04, 2012 at 12:03:54PM -0400, Nick Bowler wrote:
> > > > On 2012-10-04 08:49 -0700, Kees Cook wrote:
> > > > > FWIW, there should have been an audit message about it in dmesg.
[...]
> > > >   # dmesg
> > > >   (no output)
> > > 
> > > Well that's sad. :( Two situations I can think of for that:
> > > - the kernel wasn't build with CONFIG_AUDIT
> > 
> > Indeed, I do not have this option enabled.  Why would I have it?  The
> > description says it's for SELinux, which I do not use.
> 
> It says it is /among else/ for SELinux.  Another user appears to be
> ConsoleKit, which wants CONFIG_AUDITSYSCALL, which depends on CONFIG_AUDIT.

Indeed, you are correct that the help text does imply that there are
(potentially) other users besides SELinux, although it does not say what
they are.  Regardless, the point is that I have no idea why I would have
this optional feature enabled, as I still don't even know what it does
because the help text doesn't actually say.  I even found a website,
http://people.redhat.com/sgrubb/audit/, which seems to be related to
this feature but even here I cannot find one sentence explaining what
the feature is.

Well, from this thread I now know that this feature enables, at least
in some cases, printk messages when your previously-working scripts are
broken by a kernel update.

Cheers,
-- 
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)



Re: Linux 3.6

2012-10-04 Thread Nick Bowler
On 2012-10-04 09:14 -0700, Kees Cook wrote:
> On Thu, Oct 04, 2012 at 12:03:54PM -0400, Nick Bowler wrote:
> > On 2012-10-04 08:49 -0700, Kees Cook wrote:
> > > On Thu, Oct 04, 2012 at 09:35:04AM -0400, Nick Bowler wrote:
[...]
> > > > The thing that bothers me most about all this is that it's basically
> > > > impossible to see why things are failing without digging through the git
> > > > tree or posting to the mailing list (or recalling earlier mailing list
> > > > discussions about the restriction, as I vaguely do now).  You just
> > > > suddenly get "permission denied" errors when all the permissions
> > > > involved look fine.  As far as I know, the owner, group and mode of
> > > > symlinks have always been completely meaningless.  Upgrade to 3.6, and
> > > > they're suddenly meaningful in extremely non-obvious ways.
> > > 
> > > FWIW, there should have been an audit message about it in dmesg.
> > 
> > There were zero messages in the kernel log.
> > 
> >   # dmesg -C
> >   # cd /tmp
> >   # mkdir testdir
> >   # ln -s testdir testlink
> >   # chown -h nobody testlink
> >   # cd testlink
> >   cd: permission denied: testlink
> >   # dmesg
> >   (no output)
> 
> Well that's sad. :( Two situations I can think of for that:
> - the kernel wasn't build with CONFIG_AUDIT

Indeed, I do not have this option enabled.  Why would I have it?  The
description says it's for SELinux, which I do not use.

Cheers,
-- 
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)



Re: Linux 3.6

2012-10-04 Thread Nick Bowler
On 2012-10-04 08:49 -0700, Kees Cook wrote:
> On Thu, Oct 04, 2012 at 09:35:04AM -0400, Nick Bowler wrote:
> > On 2012-10-03 13:54 -0700, Linus Torvalds wrote:
> > > On Wed, Oct 3, 2012 at 1:49 PM, Kees Cook  wrote:
> > > > I think the benefits of this being on by default outweigh glitches
> > > > like this. Based on Nick's email, it looks like a directory tree of his
> > > > own creation.
> > > 
> > > I agree that *one* report like this doesn't necessarily mean that we
> > > need to turn it off, if Nick is happy to just fix up his script and
> > > it's all local.
> > > 
> > > However, I suspect we'll see more. And once that happens, we're not
> > > going to keep a default that breaks peoples old scripts, and we're
> > > going to have to rely on distributions (or users) explicitly setting
> > > it.
> > 
> > Yes, it is a directory of my own creation, intended as a place for users
> > (read: me) to stick stuff on the local disk as opposed to on NFS.  It's
> > pretty trivial for me to fixup everything to not need this symlink
> > anymore (and I suspect it is the only offender); I just created the
> > symlink in the first place so that I wouldn't have to change anything
> > else.
> > 
> > (While on /this/ machine I created the directory, I have used shared lab
> > machines with a similar setup).
> > 
> > The thing that bothers me most about all this is that it's basically
> > impossible to see why things are failing without digging through the git
> > tree or posting to the mailing list (or recalling earlier mailing list
> > discussions about the restriction, as I vaguely do now).  You just
> > suddenly get "permission denied" errors when all the permissions
> > involved look fine.  As far as I know, the owner, group and mode of
> > symlinks have always been completely meaningless.  Upgrade to 3.6, and
> > they're suddenly meaningful in extremely non-obvious ways.
> 
> FWIW, there should have been an audit message about it in dmesg.

There were zero messages in the kernel log.

  # dmesg -C
  # cd /tmp
  # mkdir testdir
  # ln -s testdir testlink
  # chown -h nobody testlink
  # cd testlink
  cd: permission denied: testlink
  # dmesg
  (no output)
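
For what it's worth, the reproducer above matches my understanding of the
symlink restriction that 3.6 enables by default (the fs.protected_symlinks
sysctl).  Below is a minimal userspace sketch of that rule as I understand
it; the function name, the plain unsigned types and the mode constants are
all assumptions made for illustration, and this is not the kernel code:

```c
#include <assert.h>	/* only needed for assert()-style checks by callers */
#include <stdbool.h>

/* Octal mode bits, spelled out to avoid depending on <sys/stat.h>. */
#define MODE_STICKY		01000u
#define MODE_OTHER_WRITE	00002u

/*
 * Hypothetical predicate: may a process with uid `follower` follow a
 * symlink owned by `link_owner` that lives in a directory owned by
 * `dir_owner` with permission bits `dir_mode`?
 */
static bool may_follow_sketch(unsigned int follower, unsigned int link_owner,
			      unsigned int dir_owner, unsigned int dir_mode)
{
	const unsigned int sticky_ww = MODE_STICKY | MODE_OTHER_WRITE;

	/* Allowed if the follower owns the link. */
	if (follower == link_owner)
		return true;
	/* Allowed unless the directory is both sticky and world-writable. */
	if ((dir_mode & sticky_ww) != sticky_ww)
		return true;
	/* Allowed if the directory owner also owns the link. */
	if (dir_owner == link_owner)
		return true;
	return false;
}
```

With /tmp at mode 1777 owned by root, and the link chowned to nobody, root
fails all three checks, which would explain the "permission denied".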

Cheers,
-- 
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)



Re: Linux 3.6

2012-10-04 Thread Nick Bowler
On 2012-10-03 13:54 -0700, Linus Torvalds wrote:
> On Wed, Oct 3, 2012 at 1:49 PM, Kees Cook  wrote:
> > I think the benefits of this being on by default outweigh glitches
> > like this. Based on Nick's email, it looks like a directory tree of his
> > own creation.
> 
> I agree that *one* report like this doesn't necessarily mean that we
> need to turn it off, if Nick is happy to just fix up his script and
> it's all local.
> 
> However, I suspect we'll see more. And once that happens, we're not
> going to keep a default that breaks peoples old scripts, and we're
> going to have to rely on distributions (or users) explicitly setting
> it.

Yes, it is a directory of my own creation, intended as a place for users
(read: me) to stick stuff on the local disk as opposed to on NFS.  It's
pretty trivial for me to fixup everything to not need this symlink
anymore (and I suspect it is the only offender); I just created the
symlink in the first place so that I wouldn't have to change anything
else.

(While on /this/ machine I created the directory, I have used shared lab
machines with a similar setup).

The thing that bothers me most about all this is that it's basically
impossible to see why things are failing without digging through the git
tree or posting to the mailing list (or recalling earlier mailing list
discussions about the restriction, as I vaguely do now).  You just
suddenly get "permission denied" errors when all the permissions
involved look fine.  As far as I know, the owner, group and mode of
symlinks have always been completely meaningless.  Upgrade to 3.6, and
they're suddenly meaningful in extremely non-obvious ways.

Cheers,
-- 
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)



Re: Linux 3.6

2012-10-03 Thread Nick Bowler
On 2012-09-30 17:38 -0700, Linus Torvalds wrote:
> So here it is, 3.6 final. Sure, I'd have been happier with even fewer
> changes, but that just never happens. And holding off the release
> until people get too bored to send me the small stuff just makes the
> next merge window more painful.

Just upgraded to 3.6 from 3.5, and now some of my kernel build scripts
are throwing "permission denied" errors.  Apparently symlinks are
broken somehow?

  # id
  uid=0(root) gid=0(root) groups=0(root),1(bin),2(daemon),3(sys),4(adm),6(disk),10(wheel),11(floppy),20(dialout),26(tape),27(video)

  # ls -l /scratch_space/linux
  drwxr-xr-x 24 nbowler eng 4096 2012-10-03 13:41 /scratch_space/linux

  # readlink /scratch_space/linux-2.6
  linux

  # cd /scratch_space/linux
  # pwd
  /scratch_space/linux

  # cd /scratch_space/linux-2.6
  cd: permission denied: /scratch_space/linux-2.6

WTF?  3.5 is fine.  I will try to bisect this later, but I figured I'd
throw this out there now in case anyone has any ideas...

Cheers,
-- 
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)



Re: [PATCH 1/7] string: introduce helper to get base file name from given path

2012-10-03 Thread Nick Bowler
On 2012-10-02 11:12 -0700, Greg KH wrote:
> On Tue, Oct 02, 2012 at 08:52:05PM +0300, Andy Shevchenko wrote:
> > On Tue, Oct 2, 2012 at 8:34 PM, Greg KH  wrote:
> > > On Tue, Oct 02, 2012 at 06:00:54PM +0300, Andy Shevchenko wrote:
[...]
> > >> +/**
> > >> + * kbasename - return the last part of a pathname.
> > >> + *
> > >> + * @path: path to extract the filename from.
> > >> + */
> > >> +static inline const char *kbasename(const char *path)
> > >> +{
> > >> + const char *tail = strrchr(path, '/');
> > >> + return tail ? tail + 1 : path;
> > >
> > > What happens if '/' is the last thing in the string?  You will then
> > > point to an empty string, which I don't think all callers of this
> > > function is assuming going to work properly (hint, the USB caller will
> > > not...)
> > Thanks for pointing to that. I think it's a usb specific case, so, I
> > assume your comment related to that patch.
> 
> Well, if you want your kbasename() function to work like the basename(3)
> function, you need to properly handle a trailing '/' character.

Specifically, POSIX basename trims trailing '/' characters, so

  char foo[] = "a/string/with/trailing/slashes///";
  basename(foo);

results in a string that compares equal to "slashes".  This implies that
it must either modify the provided string or copy it somewhere else
(POSIX admits either behaviour).

On the other hand, GNU basename does not trim trailing '/' characters
and returns the empty string in this case.  It's truly unfortunate that
glibc contains two different functions called basename, but regardless,
the behaviour of the function in this proposal is certainly not
unprecedented.
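
To make the difference concrete, here is a minimal sketch of the two
behaviours.  Both function names are hypothetical.  The first uses the same
logic as the proposed kbasename(); the second is a simplified POSIX-style
variant that trims in place and deliberately ignores the NULL and empty
string cases (for which POSIX basename returns "."):

```c
#include <assert.h>
#include <string.h>

/* Same logic as the proposed kbasename(): no trimming of trailing '/'. */
static const char *basename_notrim(const char *path)
{
	const char *tail = strrchr(path, '/');

	return tail ? tail + 1 : path;
}

/* Simplified POSIX-style variant: trims trailing slashes in place. */
static char *basename_trim(char *path)
{
	size_t len;
	char *tail;

	assert(path != NULL);
	len = strlen(path);
	/* Drop trailing '/' characters, but keep a lone "/" intact. */
	while (len > 1 && path[len - 1] == '/')
		path[--len] = '\0';
	tail = strrchr(path, '/');
	/* If nothing follows the last slash, the path is "/"; return it. */
	if (tail && tail[1] != '\0')
		return tail + 1;
	return path;
}
```

Given the string from the example above, basename_notrim() yields the empty
string while basename_trim() yields "slashes", at the cost of writing to
the caller's buffer.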

Cheers,
-- 
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)



Re: PROBLEM: kernel BUG at fs/nfs/idmap.c:684!

2012-09-17 Thread Nick Bowler
Hello Bryan,

On 2012-09-17 09:02 -0400, Bryan Schumaker wrote:
> Two questions:
> 1) How sure are you that you're using v3?  Make sure you're using the
> 'vers=3' mount option.  I'm not sure which version of nfs-utils Debian
> uses, but recent versions set v4 as the default.

I looked in /proc/mounts and all of them had vers=3.  But...

> 2) Can you try Linux 3.5.4?  Patches for the idmapper issue have
> already been accepted.

...indeed, Linux 3.5.4 seems to be working.  Moreover, if I look at
/proc/mounts now I do in fact see a single line with vers=4.  I guess
the v4 mount is what crashed so it didn't show up in the list.

Thanks,
-- 
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)



PROBLEM: kernel BUG at fs/nfs/idmap.c:684!

2012-09-14 Thread Nick Bowler
Hi folks,

I just upgraded an NFSv3 client machine to Linux 3.5.3 and am seeing the
following BUG.  It occurs reproducibly a short time after the first
login to the machine after boot (within one minute).  There are a lot
of nfs4-looking functions in the backtrace, which is weird as there are
absolutely no NFSv4 mounts.  The system still mostly works, although
tab completion in my shell seems to either not complete anything or
hang forever, which could be related as there are NFS-mounted
directories in my PATH.

Curiously, there are other machines on the same network running this
same kernel version that do *not* have this problem.  A couple unique
things about the crashing machine that immediately come to mind...

 - Userspace is running Debian stable (so tends to be pretty old).
 - Its onboard network is quite different from our other machines
   (it has a Marvell chipset using the sky2 driver).

Please let me know if you need any more info.

[ cut here ]
kernel BUG at fs/nfs/idmap.c:684!
invalid opcode:  [#1] PREEMPT SMP 
CPU 0 
Modules linked in: ah4 xfrm4_mode_transport nfs lockd auth_rpcgss nfs_acl 
sunrpc autofs4 acpi_cpufreq mperf deflate zlib_deflate ctr aes_x86_64 
aes_generic des_generic cbc sha512_generic sha256_generic sha1_ssse3 
sha1_generic md5 hmac crypto_null af_key xfrm_algo ipv6 loop 
snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_pcm snd_seq snd_timer 
snd_seq_device snd soundcore snd_page_alloc i2c_i801 skge coretemp hwmon 
lpc_ich evdev mfd_core sky2

Pid: 2450, comm: mount.nfs Not tainted 3.5.3 #15 LENOVO 0841A5U/LENOVO
RIP: 0010:[]  [] 
nfs_idmap_legacy_upcall+0x10c/0x15e [nfs]
RSP: 0018:8800755c53f8  EFLAGS: 00010286
RAX: 0015 RBX: 880078d4ba80 RCX: 
RDX: 0080 RSI: 880076e40ed9 RDI: 880078d4ba97
RBP: 8800755c5448 R08: 880078d4ba82 R09: 0015
R10: 88007f40b000 R11: 8800755c5288 R12: 880078dfe880
R13: 880076dc1f40 R14: 880078dfebc0 R15: 880078d4b780
FS:  7fb0bc00d700() GS:88007f40() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 00dd9000 CR3: 755b8000 CR4: 000407f0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process mount.nfs (pid: 2450, threadinfo 8800755c4000, task 
88007b838000)
Stack:
 0031396364396565 81364e2a 880076e40ec4 880076e40ed9
 81625be0 880078d4b780 880076dc1f40 880078d4bb40
 880076c16600 880076c166c0 8800755c54e8 8115bb02
Call Trace:
 [] ? kmemleak_alloc+0x21/0x3e
 [] request_key_and_link+0x306/0x389
 [] ? create_object+0x27e/0x290
 [] request_key_with_auxdata+0x1b/0x4c
 [] nfs_idmap_request_key+0xcd/0x187 [nfs]
 [] nfs_idmap_get_key+0x7a/0x99 [nfs]
 [] nfs_idmap_lookup_id+0x23/0x52 [nfs]
 [] nfs_map_group_to_gid+0x53/0x5a [nfs]
 [] decode_getfattr_attrs+0x591/0xa0c [nfs]
 [] ? __switch_to+0x2d/0x355
 [] T.1600+0x78/0xab [nfs]
 [] decode_getfattr+0xe/0x10 [nfs]
 [] nfs4_xdr_dec_lookup_root+0x54/0x5d [nfs]
 [] ? rpc_queue_empty+0x29/0x29 [sunrpc]
 [] ? nfs4_xdr_dec_link+0xa9/0xa9 [nfs]
 [] rpcauth_unwrap_resp+0x56/0x61 [sunrpc]
 [] ? rpc_queue_empty+0x29/0x29 [sunrpc]
 [] ? nfs4_xdr_dec_link+0xa9/0xa9 [nfs]
 [] call_decode+0x2c3/0x31b [sunrpc]
 [] __rpc_execute+0x51/0x179 [sunrpc]
 [] ? wake_up_bit+0x20/0x25
 [] rpc_execute+0x27/0x2b [sunrpc]
 [] rpc_run_task+0x79/0x81 [sunrpc]
 [] rpc_call_sync+0x3f/0x60 [sunrpc]
 [] _nfs4_call_sync+0xe/0x10 [nfs]
 [] _nfs4_lookup_root+0x9a/0xa8 [nfs]
 [] nfs4_lookup_root+0x37/0x62 [nfs]
 [] nfs4_proc_get_rootfh+0x23/0x95 [nfs]
 [] nfs4_get_rootfh+0x36/0xb6 [nfs]
 [] ? kmemleak_alloc+0x21/0x3e
 [] ? kmem_cache_alloc+0xc9/0xd8
 [] nfs4_server_common_setup+0x57/0xc7 [nfs]
 [] nfs4_create_server+0x1d7/0x202 [nfs]
 [] ? kmemleak_alloc_percpu+0x63/0x94
 [] nfs4_remote_mount+0x36/0x5f [nfs]
 [] mount_fs+0x6b/0x14f
 [] ? __alloc_percpu+0xb/0xd
 [] vfs_kern_mount+0x66/0xdf
 [] nfs_do_root_mount+0x96/0xb5 [nfs]
 [] nfs_fs_mount+0x7bf/0x8f8 [nfs]
 [] ? nfs_fill_super+0xc4/0xc4 [nfs]
 [] ? nfs_request_mount+0x1b2/0x1b2 [nfs]
 [] mount_fs+0x6b/0x14f
 [] ? __alloc_percpu+0xb/0xd
 [] vfs_kern_mount+0x66/0xdf
 [] do_kern_mount+0x48/0xd8
 [] do_mount+0x718/0x77b
 [] sys_mount+0x83/0xbd
 [] system_call_fastpath+0x16/0x1b
Code: b3 84 00 00 00 48 8d 7d c0 c6 43 01 00 e8 59 1e fb e0 85 c0 49 89 5c 24 
10 49 c7 44 24 18 8c 00 00 00 78 1e 49 83 7e 08 00 74 04 <0f> 0b eb fe 49 8b 3e 
4d 89 6e 08 4c 89 e6 e8 1f 27 fa ff 85 c0 
RIP  [] nfs_idmap_legacy_upcall+0x10c/0x15e [nfs]
 RSP 
---[ end trace ee7d4fa42e626e1f ]---
-- 
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)


Re: [PATCH v2] lib: gcd: prevent possible div by 0

2012-09-12 Thread Nick Bowler
On 2012-09-12 12:36 -0700, Andrew Morton wrote:
> On Wed, 12 Sep 2012 21:20:30 +0200
> Davidlohr Bueso  wrote:
> > On Wed, 2012-09-12 at 12:10 -0700, Andrew Morton wrote:
> > > On Mon, 10 Sep 2012 16:35:19 +0200
> > > Davidlohr Bueso  wrote:
> > > 
> > > > Account for all properties when a and/or b are 0:
> > > > gcd(0, 0) = 0
> > > > gcd(a, 0) = a
> > > > gcd(0, b) = b
[...]
> I'm scratching my head a bit at the patch though.  What does gcd(0, 13)
> mean?  That 0 can be divided by 13 zero times, which is an integer
> result?  I wonder why any non-buggy code would do that

The number-theoretical gcd(a, b) on the integers, leaving aside the case
where a and b are both 0, is defined as the greatest integer which
divides both a and b.

An integer x divides y if there exists an integer M such that x*M
equals y.

Observe that all integers divide zero (since we can set M to 0, and
x*0 = 0 for any x).  So it's easy to see that the result of gcd(x, 0)
and gcd(0, x) must be |x|.

The case of gcd(0, 0) is tricky.  Clearly, as all integers divide zero,
none of these can be the greatest one.  So this is normally treated as a
special case, defined to be 0 by convention, as this makes the use of
gcd "nicer" in other areas of mathematics.
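
As a rough illustration (not the patch under discussion), Euclid's
algorithm can be arranged so that all three zero cases fall out without
ever dividing by zero.  The name gcd_sketch is made up for this example:

```c
#include <assert.h>	/* only needed for assert()-style checks by callers */

/*
 * Euclid's algorithm arranged so that a zero argument never reaches the
 * modulo operator as a divisor.
 */
static unsigned long gcd_sketch(unsigned long a, unsigned long b)
{
	while (b != 0) {
		unsigned long r = a % b;	/* b is nonzero here */

		a = b;
		b = r;
	}
	/* Covers gcd(a, 0) = a, gcd(0, b) = b and gcd(0, 0) = 0. */
	return a;
}
```

The loop only evaluates a % b when b is nonzero, so gcd_sketch(0, 13)
returns 13 after one iteration and gcd_sketch(0, 0) returns 0 without
entering the loop.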

Cheers,
-- 
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)



NFSv4 WARNING: at linux/fs/inode.c:280 drop_nlink+0x23/0x44()

2012-08-18 Thread Nick Bowler
Hi folks,

I just noticed the following WARNING in my logs.  Looking through the
older logs, I see quite a few of these going (at least) all the way back
to 3.3.x days.  The process which triggers the warning always seems to
be the same (icecat).

This is on a Linux 3.5.2 NFSv4 client machine using sec=krb5.  Other
than the log noise, there seem to be no adverse effects.

[ cut here ]
WARNING: at /home/nbowler/misc/linux/fs/inode.c:280 drop_nlink+0x23/0x44()
Hardware name: System Product Name
Modules linked in: sha1_ssse3 sha1_generic hmac aes_x86_64 aes_generic cbc cts 
rpcsec_gss_krb5 nfs lockd auth_rpcgss nfs_acl sunrpc ipv6 nls_iso8859_1 
nls_cp437 vfat fat w83627ehf hwmon_vid snd_pcm_oss snd_mixer_oss acpi_cpufreq 
mperf i915 drm_kms_helper drm snd_hda_codec_hdmi snd_hda_codec_realtek arc4 
ath9k mac80211 ath9k_common ath9k_hw intel_agp i2c_algo_bit intel_gtt 
snd_hda_intel snd_hda_codec snd_pcm snd_timer coretemp agpgart hwmon kvm_intel 
snd ath cfg80211 soundcore snd_page_alloc i2c_i801 r8169 mii psmouse kvm evdev 
video sg
Pid: 1995, comm: icecat Not tainted 3.5.2 #46
Call Trace:
 [] warn_slowpath_common+0x80/0x98
 [] warn_slowpath_null+0x15/0x17
 [] drop_nlink+0x23/0x44
 [] nfs_dentry_iput+0x35/0x4d [nfs]
 [] dentry_kill+0x149/0x171
 [] dput+0xed/0xfe
 [] fput+0x1a5/0x1bd
 [] filp_close+0x6b/0x76
 [] sys_close+0x92/0xd4
 [] system_call_fastpath+0x16/0x1b
---[ end trace b160c7dc08b4910c ]---

Please let me know if you need any more info,
-- 
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)



Re: [ 01/37] Remove easily user-triggerable BUG from generic_setlease

2012-07-18 Thread Nick Bowler
Hi Greg,

On 2012-07-17 17:14 -0700, Greg KH wrote:
> Argh, I give up, I just can't get message threading working properly
> these days.  These are the 3.4 patches, in response to the 3.0 patches,
> and it took 2 tries to send them out due to formail doing wierd things.
> 
> Sorry about this people, I'll just give up and use 'git send-email' from
> now on, as it doesn't cause as many problems as my old scripts seem to.

In addition to the threading problems, patch 23/23 from the 3.0-stable
series seems to have never made it to the list.  All the rest are there.

Cheers,
-- 
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)



[PATCH 2/2] drivers/rtc/rtc-pcf8563.c: add device tree support.

2012-07-18 Thread Nick Bowler
Set the of_match_table for this driver so that devices can be described
in the device tree.

Signed-off-by: Nick Bowler 
---
 drivers/rtc/rtc-pcf8563.c |   10 ++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/drivers/rtc/rtc-pcf8563.c b/drivers/rtc/rtc-pcf8563.c
index 24a9d6a..c2fe426 100644
--- a/drivers/rtc/rtc-pcf8563.c
+++ b/drivers/rtc/rtc-pcf8563.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define DRV_VERSION "0.4.3"
 
@@ -285,10 +286,19 @@ static const struct i2c_device_id pcf8563_id[] = {
 };
 MODULE_DEVICE_TABLE(i2c, pcf8563_id);
 
+#ifdef CONFIG_OF
+static const struct of_device_id pcf8563_of_match[] __devinitconst = {
+	{ .compatible = "nxp,pcf8563" },
+	{}
+};
+MODULE_DEVICE_TABLE(of, pcf8563_of_match);
+#endif
+
 static struct i2c_driver pcf8563_driver = {
 	.driver = {
 		.name   = "rtc-pcf8563",
 		.owner  = THIS_MODULE,
+		.of_match_table = of_match_ptr(pcf8563_of_match),
 	},
 	.probe  = pcf8563_probe,
 	.remove = pcf8563_remove,
-- 
1.7.8.6




[PATCH 1/2] drivers/rtc/rtc-pcf8563.c: set owner field in driver struct.

2012-07-18 Thread Nick Bowler
The owner member is supposed to be set to the module implementing the
device driver, i.e., THIS_MODULE.  This enables the appropriate module
link in sysfs.

Signed-off-by: Nick Bowler 
---
 drivers/rtc/rtc-pcf8563.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/rtc/rtc-pcf8563.c b/drivers/rtc/rtc-pcf8563.c
index 97a3284..24a9d6a 100644
--- a/drivers/rtc/rtc-pcf8563.c
+++ b/drivers/rtc/rtc-pcf8563.c
@@ -288,6 +288,7 @@ MODULE_DEVICE_TABLE(i2c, pcf8563_id);
 static struct i2c_driver pcf8563_driver = {
 	.driver = {
 		.name   = "rtc-pcf8563",
+		.owner  = THIS_MODULE,
 	},
 	.probe  = pcf8563_probe,
 	.remove = pcf8563_remove,
-- 
1.7.8.6




Re: Patch for Apani Nortel VPN Client to build against kernel 2.6.22 help/review

2007-08-22 Thread Nick Bowler
I haven't read too deeply into the patch, but here are some quick comments.
Most apply throughout:

> +#if (LINUX_VERSION_CODE >= 0x020616)
KERNEL_VERSION(2,6,22) is much more readable than 0x020616.
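
To illustrate the point, here is a minimal user-space sketch.  The macro
is copied in the shape the kernel's generated <linux/version.h> had at
the time, purely so the snippet builds outside a kernel tree; the helper
at_least_2_6_22() is a made-up name for illustration.

```c
#include <assert.h>

/* Same shape as the macro in the kernel's generated <linux/version.h> of
 * that era; repeated here only so the sketch compiles in user space. */
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Compile-time check that the magic number and the macro agree:
 * 2 -> 0x02, 6 -> 0x06, 22 -> 0x16. */
static_assert(KERNEL_VERSION(2, 6, 22) == 0x020616,
              "0x020616 encodes 2.6.22");

/* Hypothetical helper: nonzero if `code` is at least 2.6.22. */
static int at_least_2_6_22(unsigned long code)
{
	return code >= KERNEL_VERSION(2, 6, 22);
}
```

In the patch itself the check would simply read
#if (LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,22)).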

> +  return (struct iphdr*) skb->network_header;
Should be return ip_hdr(skb);

> +skb->network_header = skb->data;
Should be skb_reset_network_header(skb);

> +iph = skb->network_header.iph;
Probably meant skb->nh.iph.

> +  if ( iph->protocol == 17 )/* if UDP, */
(snip)
> +  if ( iph->protocol == 6 ) /* if TCP, */
Could use IPPROTO_UDP and IPPROTO_TCP instead of 17 and 6, respectively.
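
A small user-space sketch of the same idea, assuming the POSIX header
<netinet/in.h> (kernel code would take the constants from <linux/in.h>
instead); proto_name() is a hypothetical helper, not something from the
patch.

```c
#include <netinet/in.h>

/* Hypothetical helper: map an IPv4 protocol number to a label using the
 * named constants rather than bare 17 and 6. */
static const char *proto_name(unsigned char protocol)
{
	switch (protocol) {
	case IPPROTO_UDP:	/* 17 */
		return "UDP";
	case IPPROTO_TCP:	/* 6 */
		return "TCP";
	default:
		return "other";
	}
}
```

With the named constants the trailing "if UDP" and "if TCP" comments in
the patch become unnecessary.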

Instead of littering the code with #if blah #else blah, you could also simply
provide implementations for the 2.6.22 functions #if LINUX_VERSION_CODE <
KERNEL_VERSION(2,6,22).
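
Roughly, such a compat layer could look like the sketch below.  This is
only an illustration under stated assumptions: struct sk_buff and struct
iphdr are tiny mocks so that the snippet builds in user space,
LINUX_VERSION_CODE is forced to 2.6.21, and packet_protocol() is a
hypothetical caller.  In the real module the structures and the version
code come from the kernel headers.

```c
#include <stddef.h>

/* --- Stand-ins so the sketch builds outside a kernel tree (assumptions) --- */
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))
#ifndef LINUX_VERSION_CODE
#define LINUX_VERSION_CODE KERNEL_VERSION(2, 6, 21)	/* pretend: old kernel */
#endif

struct iphdr { unsigned char protocol; };	/* mock, not the real layout */
struct sk_buff {				/* mock of the pre-2.6.22 shape */
	unsigned char *data;
	union { struct iphdr *iph; unsigned char *raw; } nh;
};

/* --- The compat layer: provide the 2.6.22 names on older kernels --- */
#if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 22)
static inline struct iphdr *ip_hdr(const struct sk_buff *skb)
{
	return skb->nh.iph;
}

static inline void skb_reset_network_header(struct sk_buff *skb)
{
	skb->nh.raw = skb->data;
}
#endif

/* Hypothetical caller: written once against the 2.6.22 names, with no
 * version checks at the point of use. */
static int packet_protocol(struct sk_buff *skb)
{
	skb_reset_network_header(skb);
	return ip_hdr(skb)->protocol;
}
```

The version check then lives in one place, and the rest of the driver
only ever calls ip_hdr() and skb_reset_network_header().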

-- 
Nick Bowler, Elliptic Semiconductor (http://www.ellipticsemi.com/)



[PATCH] AH4: Update IPv4 options handling to conform to RFC 4302.

2007-08-21 Thread Nick Bowler
In testing our ESP/AH offload hardware, I discovered an issue with how AH
handles mutable fields in IPv4.  RFC 4302 (AH) states the following on the
subject:

    For IPv4, the entire option is viewed as a unit; so even
    though the type and length fields within most options are immutable
    in transit, if an option is classified as mutable, the entire option
    is zeroed for ICV computation purposes.

The current implementation does not zero the type and length fields, resulting
in authentication failures when communicating with hosts that do (e.g. FreeBSD).
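
For illustration, the intended behaviour can be sketched in user space
as follows.  This is a simplified stand-in and not the kernel function:
it works on a bare options buffer, omits the source-route handling that
copies the final destination address, and the names
clear_mutable_options(), is_immutable() and the OPT_* constants are
invented for the example.

```c
#include <string.h>

/* IPv4 option type values. */
#define OPT_END   0	/* End of Option List */
#define OPT_NOOP  1	/* No Operation */
#define OPT_SEC   130	/* Security */
#define OPT_ESEC  133	/* Extended Security */
#define OPT_CIPSO 134	/* Commercial Security */
#define OPT_RA    148	/* Router Alert */
#define OPT_SDB   149	/* Sender Directed Multi-Destination Delivery */

/* Options that are left untouched for ICV computation. */
static int is_immutable(unsigned char type)
{
	switch (type) {
	case OPT_SEC: case OPT_ESEC: case OPT_CIPSO: case OPT_RA: case OPT_SDB:
		return 1;
	default:
		return 0;
	}
}

/* Zero every mutable option as a whole unit, including its type and
 * length bytes.  Returns 0 on success, -1 on a malformed option list. */
static int clear_mutable_options(unsigned char *opt, int len)
{
	while (len > 0) {
		int optlen;

		if (*opt == OPT_END)
			return 0;
		if (*opt == OPT_NOOP) {		/* single-byte option */
			opt++;
			len--;
			continue;
		}
		if (len < 2)			/* no room for a length byte */
			return -1;
		optlen = opt[1];
		if (optlen < 2 || optlen > len)
			return -1;
		if (!is_immutable(*opt))
			memset(opt, 0, optlen);	/* whole option, not opt + 2 */
		opt += optlen;
		len -= optlen;
	}
	return 0;
}
```

Given a Router Alert option followed by a Record Route option, the
former is left as-is while all bytes of the latter become zero.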

I have tested record route and timestamp options (ping -R and ping -T) on a
small network involving Windows XP, FreeBSD 6.2, and Linux hosts, with one
router.  In the presence of these options, the FreeBSD and Linux hosts (with
the patch or with the hardware) can communicate.  The Windows XP host simply
fails to accept these packets with or without the patch.

I have also been trying to test source routing options (using traceroute -g),
but haven't had much luck getting this option to work *without* AH, let alone
with.

Signed-off-by: Nick Bowler <[EMAIL PROTECTED]>
---
 net/ipv4/ah4.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/ah4.c b/net/ipv4/ah4.c
index 7a23e59..39f6211 100644
--- a/net/ipv4/ah4.c
+++ b/net/ipv4/ah4.c
@@ -46,7 +46,7 @@ static int ip_clear_mutable_options(struct iphdr *iph, __be32 *daddr)
 			memcpy(daddr, optptr+optlen-4, 4);
 			/* Fall through */
 		default:
-			memset(optptr+2, 0, optlen-2);
+			memset(optptr, 0, optlen);
 		}
 		l -= optlen;
 		optptr += optlen;
-- 
1.5.2.2

-- 
Nick Bowler, Elliptic Semiconductor (http://www.ellipticsemi.com/)
