Re: [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore

2020-12-04 Thread Bernd Edlinger
On 12/4/20 9:10 PM, Linus Torvalds wrote:
> On Fri, Dec 4, 2020 at 11:35 AM Eric W. Biederman  
> wrote:
>>
>> From a deadlock perspective the change is strictly better than what we
>> have today.  The readers will no longer block on each other.
> 
> No, agreed, it's better regardless.
> 
>> For the specific case that syzbot reported it is readers who were
>> blocking on each other so that specific case if fixed.
> 
> So the thing is, a reader can still block another reader if a writer
> comes in between them. Which is why I was thinking that the same
> deadlock could happen if somebody does an execve at just the right
> point.
> 
>> On the write side of exec_update_lock we have:
>>
>> cred_guard_mutex -> exec_update_lock
>>
>> Which means that to get an ABBA deadlock cred_guard_mutex would need to
>> be involved
> 
> No, see above: you can get a deadlock with
> 
>  - first reader gets exec_update_lock
> 
>  - writer on exec_update_lock blocks on first reader (this is exec)
> 
>  - second reader of exec_update_lock now blocks on the writer.
> 
> So if that second reader holds something that the first one wants to
> get (or is the same thread as the first one), you have a deadlock: the
> first reader will never make any progress, will never release the
> lock, and the writer will never get it, and the second reader will
> forever wait for the writer that is ahead of it in the queue.
> 
> cred_guard_mutex is immaterial, it's not involved in the deadlock.
> Yes, the writer holds it, but it's not relevant for anything else.
> 
> And that deadlock looks very much like what syzcaller detected, in
> exactly that scenario:
> 
>   Chain exists of:
> >exec_update_mutex --> sb_writers#4 --> >lock
> 
>Possible unsafe locking scenario:
> 
>  CPU0CPU1
>  
> lock(>lock);
>  lock(sb_writers#4);
>  lock(>lock);
> lock(>exec_update_mutex);
> 
>*** DEADLOCK ***
> 
> No?  The only thing that is missing is that writer that causes the
> exec_update_mutex readers to block each other - exactly like they did
> when it was a mutex.
> 
> But I may be missing something entirely obvious that keeps this from 
> happening.
> 

I think you convinced me:

>perf_event_open  (exec_update_mutex -> ovl_i_mutex)
>chown(ovl_i_mutex   -> sb_writes)
>sendfile (sb_writes -> p->lock)
>  by reading from a proc file and writing to overlayfs
>proc_pid_syscall (p->lock   -> exec_update_mutex)


We need just 5 Philosophers (A-E):

Philosopher A calls proc_pid_syscall, takes p->lock, and falls asleep for a 
while now.

Philosphper B calls sendfile, takes sb_writes, and has to wait on p->lock.

Philosopher C calls chown, takes ovl_i_mutes, and has to wait on sb_writes.

Philosopher D calls perf_event_open, takes down_read(exec_mutex_lock), and has 
to wait on ovl_i_mutex.

Philosopher E calls exec, and wants to take down_write(exec_mutex_lock), has to 
wait for Philosopher D.

Now Philosopher A wakes up from his sleep, and wants to take 
down_read(exec_mutex_lock), but due
to fairness reasons, he has to wait until Philosopher E gets and releases his 
write lock again.

If Philosopher A is now blocked, we have a deadlock, right?


Bernd.



>  Linus
> 


Re: [PATCH 3/3] exec: Transform exec_update_mutex into a rw_semaphore

2020-12-04 Thread Bernd Edlinger
Hi Eric,

I think I remembered from a previous discussion about this topic,
that it was unclear if the rw_semaphores are working the same
in RT-Linux.  Will this fix work in RT as well?

On 12/3/20 9:12 PM, Eric W. Biederman wrote:
> --- a/kernel/kcmp.c
> +++ b/kernel/kcmp.c
> @@ -70,25 +70,25 @@ get_file_raw_ptr(struct task_struct *task, unsigned int 
> idx)
>   return file;
>  }
>  
> -static void kcmp_unlock(struct mutex *m1, struct mutex *m2)
> +static void kcmp_unlock(struct rw_semaphore *l1, struct rw_semaphore *l2)
>  {
> - if (likely(m2 != m1))
> - mutex_unlock(m2);
> - mutex_unlock(m1);
> + if (likely(l2 != l1))

is this still necessary ?

> + up_read(l2);
> + up_read(l1);
>  }
>  
> -static int kcmp_lock(struct mutex *m1, struct mutex *m2)
> +static int kcmp_lock(struct rw_semaphore *l1, struct rw_semaphore *l2)
>  {
>   int err;
>  
> - if (m2 > m1)
> - swap(m1, m2);
> + if (l2 > l1)
> + swap(l1, l2);

and this is probably also no longer necessary?


>  
> - err = mutex_lock_killable(m1);
> - if (!err && likely(m1 != m2)) {
> - err = mutex_lock_killable_nested(m2, SINGLE_DEPTH_NESTING);
> + err = down_read_killable(l1);
> + if (!err && likely(l1 != l2)) {

and this can now be unconditionally, right?

> + err = down_read_killable_nested(l2, SINGLE_DEPTH_NESTING);
>   if (err)
> - mutex_unlock(m1);
> + up_read(l1);
>   }
>  
>   return err;
> @@ -156,8 +156,8 @@ SYSCALL_DEFINE5(kcmp, pid_t, pid1, pid_t, pid2, int, type,
>   /*
>* One should have enough rights to inspect task details.
>*/
> - ret = kcmp_lock(>signal->exec_update_mutex,
> - >signal->exec_update_mutex);
> + ret = kcmp_lock(>signal->exec_update_lock,
> + >signal->exec_update_lock);
>   if (ret)
>   goto err;
>   if (!ptrace_may_access(task1, PTRACE_MODE_READ_REALCREDS) ||
> @@ -212,8 +212,8 @@ SYSCALL_DEFINE5(kcmp, pid_t, pid1, pid_t, pid2, int, type,
>   }
>  
>  err_unlock:
> - kcmp_unlock(>signal->exec_update_mutex,
> - >signal->exec_update_mutex);
> + kcmp_unlock(>signal->exec_update_lock,
> + >signal->exec_update_lock);
>  err:
>   put_task_struct(task1);
>   put_task_struct(task2);


Thanks
Bernd.


Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

2020-05-01 Thread Bernd Edlinger
On 4/30/20 6:40 PM, Linus Torvalds wrote:
> On Thu, Apr 30, 2020 at 7:29 AM Bernd Edlinger
>  wrote:
>>
>> Ah, now I see, that was of course not the intended effect,
>> but that is not where the pseudo-deadlock happens at all,
>> would returning -RESTARTNOINTR in this function make this
>> patch acceptable, it will not have an effect on the test case?
> 
> So that was why I suggested doing it all with a helper function, and
> also doing that
> 
> set_thread_flag(TIF_SIGPENDING);
> 
> because without going through the "check-for-signals" code at return
> to user space, -ERESTARTNOINTR doesn't actually _do_ any restart.
> 
> However, the more I looked at it, the less I actually liked that hack.
> 
> Part of it is simply because it can cause the exact same problem that
> ptrace() does (at least in theory). And even if you don't get the
> livelock thing, you can get the "use 100% CPU time" thing, because if
> that case ever triggers, and we re-try, it will generally just _keep_
> on triggering (think "execve is waiting for a zombie, nobody is
> reaping it").
> 
> IOW, restarting doesn't really fix the problem, or guarantee any
> forward progress.
> 

Right, if it is a real time process it will result in priority-inversion.
Correct.

If it is a virus checker it will be real time priority and it will be
very aggressive ;-) I can feel its aggressiveness already :-) shiver...

And this little zombie-process will paralyze it immediately, nice try.

You see what I mean?

> So I'd have been ok with your "unsafe_exec_flag" if
> 
>  (a) it had been done in one place with a helper function.
> 
>  (b) it would _only_ trigger for ptrace (and perhaps seccomp).
> 
> but I don't think it works for that write() case.
> 
> That said, I'm not 100% convinced that that write() case really even
> needs that cred_guard_mutex (renamed or not).
> 
> Maybe we can introduce a new mutex just against concurrent ptrace
> (which is what at least the _comment_ says_ that
> security_setprocattr() wants - I didn't check the actual low-level
> security code).
> 
> So maybe that proc_pid_attr_write() case could be done some other way 
> entirely.
> 
> Th emore we go through all this, the more I really think that Oleg's
> patch to just delay the waiting for things until after dropping the
> mutex in execve() is the way to go.
> 
> Is it a "simple" and small patch? No. But it really addresses the core
> issue, without introducing new odd rules or special cases, or making a
> lock that doesn't reliably work as a lock.
> 

Hmm.  I think I can agree, that this problem deserves to be solved
really slowly.

Oleg where was your last patch, does it still work or does it
need to be re-based?

And I almost forgot about Eric, are you still with us?


Thanks
Bernd.


Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

2020-04-30 Thread Bernd Edlinger
Hi Linus,

On 4/30/20 3:47 PM, Linus Torvalds wrote:
> On Thu, Apr 30, 2020 at 6:39 AM Bernd Edlinger
>  wrote:
>>
>> Excuse me, but what in my /proc folder there is no attr/something
>> is there a procfs equivalent of pthread_attach ?
>>
>> What exactly is "attr/something" ?
> 
> Anything that uses that proc_pid_attr_write().
> 
> Which you should have realized, since you wrote the patch that changed
> that function to return -EAGAIN.
> 

Ah, now I see, that was of course not the intended effect,
but that is not where the pseudo-deadlock happens at all,
would returning -RESTARTNOINTR in this function make this
patch acceptable, it will not have an effect on the test case?


Bernd.

> That's
> 
> /proc//attr/{current,exec,fscreate,keycreate,prev,sockcreate}
> 
> and some smack files.
> 
> Your patch definitely made them return -EINVAL if they happen in that
> execve() black hole, instead of waiting for the execve() to just
> complete and then just work.
> 
> Dropping a lock really is broken. It';s broken even if you then set a
> flag saying "I dropped the lock, now you can't use it".
> 
>   Linus
> 


Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

2020-04-30 Thread Bernd Edlinger
On 4/30/20 4:16 AM, Linus Torvalds wrote:
> On Wed, Apr 29, 2020 at 5:00 PM Jann Horn  wrote:
>>
>> Wouldn't you end up livelocked in the scenario that currently deadlocks?
> 
> The test case that we already know is broken, and any fix will have to
> change anyway?
> 

The purpose of the test case was only to test the behaviour of my
later patch.  The test case _must_ be adjusted to the follow-up
patch, I have no problem with that.  Anybody may change the test case
when we know how to fix the API.  I did just not anticipate that Eric
would only apply 14 of 16 patches = 87.5% of the patch series. Now that
causes some tension, but it is not really a problem IMHO.

> Let's just say that I don't care in the least.
> 
> But Bernd's patch as-is breaks a test-case that currently *works*,
> namely something as simple as
> 
>   echo xyz > /proc//attr/something
> 

Excuse me, but what in my /proc folder there is no attr/something
is there a procfs equivalent of pthread_attach ?

What exactly is "attr/something" ?

> and honestly, breaking something that _works_ and may be used in
> reality, in orderf to make a known buggy user testcase work?
> 
> Because no, "write()" returning -EAGAIN isn't ok.
> 

write can return -EAGAIN if the file is non-blocking, it is
never the case for a disk file, but for a NFS that is not at all
clear, depends on a mount option, and once I had a deadlock in
one of my test systems after OOM-stress, but I never was able
to reproduce, the umount deadlocked, then I was not able to
reboot, could be an alpha-particle of course, who knows...


Hmmm.. maybe a stupid idea:

We could keep the old deadlock-capable API,
and add a new _flag_ somewhere to the PTHREAD_ATTACH call,
that _enables_ the non-blocking behavior, how about that.



Thanks
Bernd.

> Linus
> 


Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

2020-04-29 Thread Bernd Edlinger
On 4/30/20 1:59 AM, Jann Horn wrote:
> On Thu, Apr 30, 2020 at 1:22 AM Linus Torvalds
>  wrote:
>> On Wed, Apr 29, 2020 at 3:38 PM Linus Torvalds
>>  wrote:
>>>
>>> If you do it properly, with a helper function instead of repeating
>>> that fragile nasty thing, maybe it will look better to me.

I added the BIG FAT WARNNIG comments as a mitigation for that.
Did you like those comments?

>>
>> Side note: if it has a special helper function for the "get lock,
>> repeat if it was invalid", you can do a better job than return
>> -EAGAIN.
>>
>> In particular, you can do this
>>
>> set_thread_flag(TIF_SIGPENDING);
>> return -RESTARTNOINTR;
>>
>> which will actually restart the system call. So a ptrace() user (or
>> somebody doing a "write()" to /proc//attr/xyz, wouldn't even see
>> the impossible EAGAIN error.
> 
> Wouldn't you end up livelocked in the scenario that currently deadlocks? Like:
> 
>  - tracer attaches to thread A
>  - thread B goes into execve, blocks on waiting for A's death
>  - tracer tries to attach to B and hits the -EAGAIN
> 
> If we make the PTRACE_ATTACH call restart, the tracer will just end up
> looping without ever resolving the deadlock. If we want to get through
> this cleanly with this approach, userspace needs to either
> deprioritize the "I want to attach to pid X" and go back into its
> eventloop, or to just treat -EAGAIN as a fatal error and give up
> trying to attach to that task.
> 

Yes, exactly, the point is the caller is expected to call wait in that
scenario, otherwise the -EAGAIN just repeats forever, that is an API
change, yes, but something unavoidable, and the patch tries hard to
limit it to cases where the live-lock or pseudo-dead-lock is unavoidable
anyway.


Bernd.


Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

2020-04-29 Thread Bernd Edlinger
On 4/29/20 9:26 PM, Jann Horn wrote:
> On Wed, Apr 29, 2020 at 9:23 PM Bernd Edlinger
>  wrote:
>> On 4/29/20 7:58 PM, Linus Torvalds wrote:
>>> On Tue, Apr 28, 2020 at 4:36 PM Jann Horn  wrote:
>>>>
>>>> On Wed, Apr 29, 2020 at 12:14 AM Linus Torvalds
>>>>  wrote:
>>>>>
>>>>>  - we move check_unsafe_exec() down. As far as I can tell, there's no
>>>>> reason it's that early - the flags it sets aren't actually used until
>>>>> when we actually do that final set_creds..
>>>>
>>>> Right, we should be able to do that stuff quite a bit later than it 
>>>> happens now.
>>>
>>> Actually, looking at it, this looks painful for multiple reasons.
>>>
>>> The LSM_UNSAFE_xyz flags are used by security_bprm_set_creds(), which
>>> when I traced it through, happened much earlier than I thought. Making
>>> things worse, it's done by prepare_binprm(), which also potentially
>>> gets called from random points by the low-level binfmt handlers too.
>>>
>>> And we also have that odd "fs->in_exec" flag, which is used by thread
>>> cloning and io_uring, and I'm not sure what the exact semantics are.
>>>
>>> I'm _almost_ inclined to say that we should just abort the execve()
>>> entirely if somebody tries to attach in the middle.
>>>
>>> IOW, get rid of the locking, and replace it all just with a sequence
>>> count. Make execve() abort if the sequence count has changed between
>>> loading the original creds, and having installed the new creds.
>>>
>>> You can ptrace _over_ an execve, and you can ptrace _after_ an
>>> execve(), but trying to attach just as we execve() would just cause
>>> the execve() to fail.
>>>
>>> We could maybe make it conditional on the credentials actually having
>>> changed at all (set another flag in bprm_fill_uid()). So it would only
>>> fail for the suid exec case.
>>>
>>> Because honestly, trying to ptrace in the middle of a suid execve()
>>> sounds like an attack, not a useful thing.
>>>
>>
>> I think the use case where a program attaches and detaches many
>> processes at a high rate, is either an attack or a very aggressive
>> virus checker, fixing a bug that prevents an attack is not a good
>> idea, but fixing a bug that would otherwise break a virus checker
>> would be a good thing.
>>
>> By the way, all other attempts to fix it look much more dangerous
>> than my initially proposed patch, you know the one you hated, but
>> it does work and does not look overly complicated either.
>>
>> What was the reason why that cannot be done this way?
> 
> I'm not sure which patch you're talking about - I assume you don't
> mean 
> <https://lore.kernel.org/lkml/am6pr03mb5170b06f3a2b75efb98d071ae4...@am6pr03mb5170.eurprd03.prod.outlook.com/>?
> 

No, I meant:

[PATCH v7 15/16] exec: Fix dead-lock in de_thread with ptrace_attach
https://marc.info/?l=linux-kernel=158559277631548=2

and

[PATCH v6 16/16] doc: Update documentation of ->exec_*_mutex
https://marc.info/?l=linux-kernel=158559277631548=2


I think that was the latest version, but this had several iterations already.

Thanks
Bernd.


Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

2020-04-29 Thread Bernd Edlinger
On 4/29/20 7:58 PM, Linus Torvalds wrote:
> On Tue, Apr 28, 2020 at 4:36 PM Jann Horn  wrote:
>>
>> On Wed, Apr 29, 2020 at 12:14 AM Linus Torvalds
>>  wrote:
>>>
>>>  - we move check_unsafe_exec() down. As far as I can tell, there's no
>>> reason it's that early - the flags it sets aren't actually used until
>>> when we actually do that final set_creds..
>>
>> Right, we should be able to do that stuff quite a bit later than it happens 
>> now.
> 
> Actually, looking at it, this looks painful for multiple reasons.
> 
> The LSM_UNSAFE_xyz flags are used by security_bprm_set_creds(), which
> when I traced it through, happened much earlier than I thought. Making
> things worse, it's done by prepare_binprm(), which also potentially
> gets called from random points by the low-level binfmt handlers too.
> 
> And we also have that odd "fs->in_exec" flag, which is used by thread
> cloning and io_uring, and I'm not sure what the exact semantics are.
> 
> I'm _almost_ inclined to say that we should just abort the execve()
> entirely if somebody tries to attach in the middle.
> 
> IOW, get rid of the locking, and replace it all just with a sequence
> count. Make execve() abort if the sequence count has changed between
> loading the original creds, and having installed the new creds.
> 
> You can ptrace _over_ an execve, and you can ptrace _after_ an
> execve(), but trying to attach just as we execve() would just cause
> the execve() to fail.
> 
> We could maybe make it conditional on the credentials actually having
> changed at all (set another flag in bprm_fill_uid()). So it would only
> fail for the suid exec case.
> 
> Because honestly, trying to ptrace in the middle of a suid execve()
> sounds like an attack, not a useful thing.
> 

I think the use case where a program attaches and detaches many
processes at a high rate, is either an attack or a very aggressive
virus checker, fixing a bug that prevents an attack is not a good
idea, but fixing a bug that would otherwise break a virus checker
would be a good thing.

By the way, all other attempts to fix it look much more dangerous
than my initially proposed patch, you know the one you hated, but
it does work and does not look overly complicated either.

What was the reason why that cannot be done this way?


Thanks,
Bernd.



Re: [PATCHv5] random: Make /dev/random wait for input_pool initializedy

2019-02-22 Thread Bernd Edlinger
On 2/22/19 12:18 AM, Theodore Y. Ts'o wrote:
> The whole premise of reading from /dev/random is that it should only
> allow reads up to available estimated entropy.  I'm assuming here that
> sane users of /dev/random will be reading in chunks of at least 128
> bits, reading smaller amounts for, say, a 32-bit SPECK key, is not
> someone who is paranoid enough to want to use /dev/random is likely to
> want to do.  So what I was doing was to simply prevent any reads from
> /dev/random until it had accumulated 128 bits worth of entropy.  If
> the user is reading 128 bits in order to generate a 128-bit session
> key, this won't actually slow down /dev/random any more that it does
> today.
> 
> It will slow down someone who just wants to read a byte from
> /dev/random immediately after boot --- but as far as I'm concerned,
> that's crazy, so I don't really care about optimizing it.  Your
> suggestion of simply not allowing any reads until the CRNG is
> initialized, and then injecting 128-bits into the blocking pool would
> also work, but it wouldn't speed up the use case of "the user is
> trying to read 128 bits from /dev/random".  It only speeds up "read 1
> byte from /dev/random".
> 
> Personally, I would generally be simply tell users, "use getrandom(2)
> and be happy", and I consider /dev/random to be a legacy interface.
> It's just that there are some crazy customers who seem to believe that
> /dev/random is required for FIPS compliance.
> 

Sure.

> So optimizing for users who want to read vast amount of data from
> /dev/random is a non-goal as far as I am concerned.  In particular,
> seeding the CRNG and keeping it properly reseeded is higher priority
> as far as I'm concerned.  If that slows down /dev/random a bit,
> /dev/random is *always* going to be slow.
> 

There are rumors that reading one byte from /dev/random in the rcS
script makes /dev/urandom properly seeded.  That sounds logical,
but does not work as expected due to a bug that I am trying to fix.

>>> -   struct entropy_store *other = _pool;
>>> -
>>> -   if (other->entropy_count <=
>>> -   3 * other->poolinfo->poolfracbits / 4) {
>>> -   schedule_work(>push_work);
>>> -   r->entropy_total = 0;
>>> -   }
>>> -   }
>>> +   if (!work_pending(>push_work) &&
>>> +   (ENTROPY_BITS(r) > 6 * r->poolinfo->poolbytes) &&
>>> +   (ENTROPY_BITS(other) <= 6 * other->poolinfo->poolbytes))
>>> +   schedule_work(>push_work);
>>
>> push_to_pool will transfer chunks of random_read_wakeup_bits.
>>
>> I think push_to_pool should also match this change.
> 
> I was trying to keep the size of this patch as small as practical,
> since the primary goal was to improve the security of the bits
> returned when reading the a 128 bit of randomness immediately after
> boot.
> 

I definitely share this desire with you.

But there are also issues with the CRNG initialization, it is very important
to me not to make those worse.

>> I like it that this path is controllable via random_write_wakeup_bits,
>> that would be lost with this change.
> 
> Very few people actually use these knobs, and in fact I regret making
> them available, since changing these to insane values can impact the
> security properties of /dev/random.  I don't actually see a good
> reason why a user *would* want to adjust the behavior of this code
> path, and it makes it much simpler to reason about how this code path
> works if we don't make it controllable by the user.
> 
>>> @@ -1553,6 +1548,11 @@ static ssize_t extract_entropy_user(struct 
>>> entropy_store *r, void __user *buf,
>>> int large_request = (nbytes > 256);
>>>  
>>> trace_extract_entropy_user(r->name, nbytes, ENTROPY_BITS(r), _RET_IP_);
>>> +   if (!r->initialized && r->pull) {
>>> +   xfer_secondary_pool(r, ENTROPY_BITS(r->pull)/8);
>>> +   if (!r->initialized)
>>> +   return 0;
>>
>> Did you want to do that in push_to_pool?
> 
> No, this was deliberate.  The point here is that if the blocking pool
> is not initialized (which is defined as having accumulated 128 bits of
> entropy once), we refuse to return any entropy at all.
> 

Somehow the entropy is not returned, but still transferred from the input_pool
to the blocking_pool, which prevents the initialization of the CRNG.
And finally the blocking_pool is initialized, but not the CRNG.

See my tests below.


>> The second part of the _random_read does not match this change:
>>
>> wait_event_interruptible(random_read_wait,
>> ENTROPY_BITS(_pool) >=
>> random_read_wakeup_bits);
>> if (signal_pending(current))
>> return -ERESTARTSYS;
>>
>>
>> and will go into a busy loop, when
>> ENTROPY_BITS(_pool) >= random_read_wakeup_bits, right?
> 
> No, because what's being tested is the 

Re: [PATCHv5] random: Make /dev/random wait for input_pool initializedy

2019-02-21 Thread Bernd Edlinger
Hi Theodore,

On 2/21/19 1:32 AM, Theodore Y. Ts'o wrote:
> Hi Brend,
> 
> I've been looking at your patch, and as far as the core part of what
> you're patching, I think this patch below is a conceptually simpler
> way of fixing the original core problem.  Please take a look and let
> me know what you think.
> 
> There are some other auxiliary things that your patch is trying to do
> where I'm not sure what you're trying to doing and I'm not at all sure
> it's correct.  Those things should really get separated out as a
> separate patches.
> 

Yes, that is true.  It just got more each time I looked at the code.
Especially the jiffies wrap-around issue is an independent thing.

>> Another minor tweak this patch makes, is when the primary
>> CRNG is periodically reseeded, we reserve twice the amount
>> of read_wakeup_threshold for fairness reasons, to keep
>> /dev/random readable when it is not accessed by user mode.
> 
> I'm not sure what you're trying to do/fix here.
> 

It is the select function that is not working.

What I try to fix is that you can select on /dev/random for
readable, and that it guarantees that at least one process
will be able to read at least one byte out of the device,
and if that is not done, it should stay in that state.
Additionally if the first byte is read out of /dev/random
or /dev/random is selected for readable, that should
ensure that /dev/urandom is also fully initialized.

Additionally I want to achieve that the CRNG is initialized
as fast as possible and once that is done, the /dev/random
is working as soon as possible.  And that the re-seeding
of the CRNG is not creating race conditions which invalidate
the readability guarantee from the select.

Currently on a low entropy system the entroy_available does
count from 0 to 128, where CRNG initializes then again
from 0 to 128..256, where CRNG reseeds then again
from 0 to 128..256, where CRNG reseeds then again...
so each time the entropy_available is in the range 0..63
select indicates !ready and then again ready, so it toggles
all the time.  That is what I wanted to fix mainly.





> - Ted
> 
> From 9275727af22bc08958bad45d604038aa8e9b6766 Mon Sep 17 00:00:00 2001
> From: Theodore Ts'o 
> Date: Wed, 20 Feb 2019 16:06:38 -0500
> Subject: [PATCH] random: only read from /dev/random after its pool has 
> received 128 bits
> 
> Immediately after boot, we allow reads from /dev/random before its
> entropy pool has been fully initialized.  Fix this so that we don't
> allow this until the blocking pool has received 128 bits.
> 

My observation, with a previous attempt (v3) of my patch is that a system
where only interrupt randomness is available it takes typically 2 minutes
to accomplish the initial seeding of the CRNG, if from there one has to
wait until there is 128 bit entropy in the blocking pool it will take
much too long.  The patch was actually buggy and did only ensure 80 bits
of entropy in the blocking pool but even that did take 6.5 minutes, which
felt to me like an absolutely unacceptable waiting time.

Therefore I would like to keep that to 2-3 minutes that the CRNG takes
for it's initialization.  From there I thought, even if the entroy
in the input_pool drops to zero, the information content however is still
there, and after adding the next 64 bits if raw entropy, it would be fine
to extract 8 bytes form the input_pool and feed that to the
blocking pool, that would make 6 bytes of output, which should be
completely unpredictable, the approach taken in my v5 patch.
If that is not good enough, maybe extract 128 bits from the urandom
and inject that into the blocking pool without incrementing
the blocking_pool's entropy count.

> We do this by repurposing the initialized flag in the entropy pool
> struct, and use the initialized flag in the blocking pool to indicate
> whether it is safe to pull from the blocking pool.
> 
> To do this, we needed to rework when we decide to push entropy from the
> input pool to the blocking pool, since the initialized flag for the
> input pool was used for this purpose.  To simplify things, we no
> longer use the initialized flag for that purpose, nor do we use the
> entropy_total field any more.
> 
> Signed-off-by: Theodore Ts'o 
> ---
>  drivers/char/random.c | 44 +--
>  include/trace/events/random.h | 13 ---
>  2 files changed, 27 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 47ac7cd20fb1..e247c45b2772 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -470,7 +470,6 @@ struct entropy_store {
>   unsigned short add_ptr;
>   unsigned short input_rotate;
>   int entropy_count;
> - int entropy_total;
>   unsigned int initialized:1;
>   unsigned int last_data_init:1;
>   __u8 last_data[EXTRACT_SIZE];
> @@ -643,7 +642,7 @@ static void process_random_ready_list(void)
>   */
>  static void 

[PATCHv5] random: Make /dev/random wait for input_pool initialized

2019-02-19 Thread Bernd Edlinger
Reading from /dev/random may return data while the getrandom
syscall is still blocking.

Those bytes are not yet cryptographically secure.

The first byte from /dev/random can have as little
as 8 bits entropy estimation.  Once a read blocks, it will
block until /proc/sys/kernel/random/read_wakeup_threshold
bits are available, which is usually 64 bits, but can be
configured as low as 8 bits.  A select will wake up when
at least read_wakeup_threshold bits are available.
Also when constantly reading bytes out of /dev/random
it will prevent the crng init done event forever.

Furthermore previously the input_pool got initialized when
more than 128 bits of raw entropy are accumulated, but that
is only credited 80 bits of pool entroy.  Therefore if
random_write_wakeup_bits is configured lower than 80 bits,
the code path that sends entropy to the blocking_pool can
get activated, before the CRNG is initialized and delays
the initialization of the CRNG until the blocking_pool is
filled to 75%.

Fixed by making read and select on /dev/random wait until
the input_pool is initialized which means that more than
128 bits of entropy have been fed into the input_pool.
This is guaranteed to happen after the CRNG is ready.
So after the first byte is readable from /dev/random
also /dev/urandom is guaranteed to be fully initialized.

Another minor tweak this patch makes, is when the primary
CRNG is periodically reseeded, we reserve twice the amount
of read_wakeup_threshold for fairness reasons, to keep
/dev/random readable when it is not accessed by user mode.

And finally, when the system runs for long times the jiffies
may roll over, but crng_global_init_time is not updated
when reseed every 5 minutes happens.  This could cause
CRNG reseeding all the time, making the input_pool run
low on entropy which would make /dev/random block
unexpectedly.

Signed-off-by: Bernd Edlinger 
---
v4 makes the /dev/random block until the input_pool has
reached 128 bits of entropy at least once.  Now make
everything depend on input_pool.initialized.
Additionally fixed a potential issue with jiffies roll
over in the CRNG reseeding logic, which would cause the
cgng_global_init_time to be in the future, causing
reseeding to happen all the time.
Finally added a fairness reserve to keep the /dev/random
in the readable state (select can check non-destructively),
when nothing but CRNG reseeding happens.

v5 adds one "&& crng_ready()" to fix a race condition.
---
 drivers/char/random.c | 22 +++---
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index bd449ad..652e024 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -645,6 +645,7 @@ static void process_random_ready_list(void)
 static void credit_entropy_bits(struct entropy_store *r, int nbits)
 {
int entropy_count, orig;
+   int entropy_bits;
const int pool_size = r->poolinfo->poolfracbits;
int nfrac = nbits << ENTROPY_SHIFT;
 
@@ -702,8 +703,9 @@ static void credit_entropy_bits(struct entropy_store *r, 
int nbits)
if (cmpxchg(>entropy_count, orig, entropy_count) != orig)
goto retry;
 
+   entropy_bits = entropy_count >> ENTROPY_SHIFT;
r->entropy_total += nbits;
-   if (!r->initialized && r->entropy_total > 128) {
+   if (!r->initialized && entropy_bits >= 128) {
r->initialized = 1;
r->entropy_total = 0;
}
@@ -713,8 +715,6 @@ static void credit_entropy_bits(struct entropy_store *r, 
int nbits)
  r->entropy_total, _RET_IP_);
 
if (r == _pool) {
-   int entropy_bits = entropy_count >> ENTROPY_SHIFT;
-
if (crng_init < 2 && entropy_bits >= 128) {
crng_reseed(_crng, r);
entropy_bits = r->entropy_count >> ENTROPY_SHIFT;
@@ -722,10 +722,12 @@ static void credit_entropy_bits(struct entropy_store *r, 
int nbits)
 
/* should we wake readers? */
if (entropy_bits >= random_read_wakeup_bits &&
+   r->initialized &&
wq_has_sleeper(_read_wait)) {
wake_up_interruptible(_read_wait);
kill_fasync(, SIGIO, POLL_IN);
}
+
/* If the input pool is getting full, send some
 * entropy to the blocking pool until it is 75% full.
 */
@@ -917,9 +919,11 @@ static void crng_reseed(struct crng_state *crng, struct 
entropy_store *r)
} buf;
 
if (r) {
-   num = extract_entropy(r, , 32, 16, 0);
+   num = extract_entropy(r, , 32, 16, crng_ready()
+ ? random_read_wakeup_bits / 4 : 0);
if (num == 0)
   

Re: [PATCHv4] random: Make /dev/random wait for input_pool initialized

2019-02-18 Thread Bernd Edlinger
> @@ -1826,7 +1830,9 @@ _random_read(int nonblock, char __user *buf, size_t 
> nbytes)
> 
> nbytes = min_t(size_t, nbytes, SEC_XFER_SIZE);
> while (1) {
> -   n = extract_entropy_user(_pool, buf, nbytes);
> +   n = input_pool.initialized
> +   ? extract_entropy_user(_pool, buf, nbytes)

Aehm, sorry, now I see this creates a race condition with this code here, since
this the crng_reseed here also tries to read from the input_pool,
but input_pool.initialized is already true:

if (crng_init < 2 && entropy_bits >= 128) {
crng_reseed(_crng, r);
entropy_bits = r->entropy_count >> ENTROPY_SHIFT;


I was able to get a system in this behavior by running 3 instances of
#include 
#include 
#include 

int main()
{
  int f = open("/dev/random", O_NDELAY);
  if (f<0) return 1;
  for(;;)
  {
unsigned char buf[16];
int x = read(f, buf, sizeof(buf));
if (x>=0)
{
  int i;

  printf("read %d bytes: ", x);
  for (i=0; i +   n = input_pool.initialized && crng_ready()
> +   ? extract_entropy_user(_pool, buf, nbytes)


Thanks (for your patience :-)
Bernd.


[PATCHv4] random: Make /dev/random wait for input_pool initialized

2019-02-17 Thread Bernd Edlinger
Reading from /dev/random may return data while the getrandom
syscall is still blocking.

Those bytes are not yet cryptographically secure.

The first byte from /dev/random can have as little
as 8 bits entropy estimation.  Once a read blocks, it will
block until /proc/sys/kernel/random/read_wakeup_threshold
bits are available, which is usually 64 bits, but can be
configured as low as 8 bits.  A select will wake up when
at least read_wakeup_threshold bits are available.
Also when constantly reading bytes out of /dev/random
it will prevent the crng init done event forever.

Furthermore previously the input_pool got initialized when
more than 128 bits of raw entropy are accumulated, but that
is only credited 80 bits of pool entroy.  Therefore if
random_write_wakeup_bits is configured lower than 80 bits,
the code path that sends entropy to the blocking_pool can
get activated, before the CRNG is initialized and delays
the initialization of the CRNG until the blocking_pool is
filled to 75%.

Fixed by making read and select on /dev/random wait until
the input_pool is initialized which means that more than
128 bits of entropy have been fed into the input_pool.
This is guaranteed to happen after the CRNG is ready.
So after the first byte is readable from /dev/random
also /dev/urandom is guaranteed to be fully initialized.

Another minor tweak this patch makes, is when the primary
CRNG is periodically reseeded, we reserve twice the amount
of read_wakeup_threshold for fairness reasons, to keep
/dev/random readable when it is not accessed by user mode.

And finally, when the system runs for long times the jiffies
may roll over, but crng_global_init_time is not updated
when reseed every 5 minutes happens.  This could cause
CRNG reseeding all the time, making the input_pool run
low on entropy which would make /dev/random block
unexpectedly.

Signed-off-by: Bernd Edlinger 
---
v4 makes the /dev/random block until the input_pool has
reached 128 bits of entropy at least once.  Now make
everything depend on input_pool.initialized.
Additionally fixed a potential issue with jiffies roll
over in the CRNG reseeding logic, which would cause the
cgng_global_init_time to be in the future, causing
reseeding to happen all the time.
Finally added a fairness reserve to keep the /dev/random
in the readable state (select can check non-destructively),
when nothing but CRNG reseeding happens.
---
 drivers/char/random.c | 22 +++---
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index bd449ad..97cde01 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -645,6 +645,7 @@ static void process_random_ready_list(void)
 static void credit_entropy_bits(struct entropy_store *r, int nbits)
 {
int entropy_count, orig;
+   int entropy_bits;
const int pool_size = r->poolinfo->poolfracbits;
int nfrac = nbits << ENTROPY_SHIFT;
 
@@ -702,8 +703,9 @@ static void credit_entropy_bits(struct entropy_store *r, 
int nbits)
if (cmpxchg(>entropy_count, orig, entropy_count) != orig)
goto retry;
 
+   entropy_bits = entropy_count >> ENTROPY_SHIFT;
r->entropy_total += nbits;
-   if (!r->initialized && r->entropy_total > 128) {
+   if (!r->initialized && entropy_bits >= 128) {
r->initialized = 1;
r->entropy_total = 0;
}
@@ -713,8 +715,6 @@ static void credit_entropy_bits(struct entropy_store *r, 
int nbits)
  r->entropy_total, _RET_IP_);
 
if (r == _pool) {
-   int entropy_bits = entropy_count >> ENTROPY_SHIFT;
-
if (crng_init < 2 && entropy_bits >= 128) {
crng_reseed(_crng, r);
entropy_bits = r->entropy_count >> ENTROPY_SHIFT;
@@ -722,10 +722,12 @@ static void credit_entropy_bits(struct entropy_store *r, 
int nbits)
 
/* should we wake readers? */
if (entropy_bits >= random_read_wakeup_bits &&
+   r->initialized &&
wq_has_sleeper(_read_wait)) {
wake_up_interruptible(_read_wait);
kill_fasync(, SIGIO, POLL_IN);
}
+
/* If the input pool is getting full, send some
 * entropy to the blocking pool until it is 75% full.
 */
@@ -917,9 +919,11 @@ static void crng_reseed(struct crng_state *crng, struct 
entropy_store *r)
} buf;
 
if (r) {
-   num = extract_entropy(r, , 32, 16, 0);
+   num = extract_entropy(r, , 32, 16, crng_ready()
+ ? random_read_wakeup_bits / 4 : 0);
if (num == 0)
return;
+   crng_global_init_time = jiffies;
} else {
  

Re: [PATCHv3] random: Make /dev/random wait for crng_ready

2019-02-17 Thread Bernd Edlinger
On 2/17/19 9:44 AM, Bernd Edlinger wrote:
>  
> + if (crng_ready() && !blocking_pool.initialized &&

After some more debugging I realize that blocking_pool.initialized
is true after 128 bits of input entropy, but that is only 80 bits
credited, due to the asymptotic 3/4 crediting formula.

I see that will also enable the code path below:

if (entropy_bits > random_write_wakeup_bits &&
r->initialized &&
r->entropy_total >= 2*random_read_wakeup_bits) {
struct entropy_store *other = _pool;

if (other->entropy_count <=
3 * other->poolinfo->poolfracbits / 4) {
schedule_work(>push_work);
r->entropy_total = 0;
}

when random_write_wakeup_bits is below 80, and random_read_wakeup_bits
is also smallish.  This depletes the input_pool in favor of the
blocking pool, while we are actually waiting for the input_pool to
reach 128 bits security strength, in order to seed the CRNG.

I am testing a new version and will post it later today.

Sorry for all the back-and forth.


Bernd.


[PATCHv3] random: Make /dev/random wait for crng_ready

2019-02-17 Thread Bernd Edlinger
Reading from /dev/random may return data while the getrandom
syscall is still blocking.

Those bytes are not yet cryptographically secure.

The first byte from /dev/random can have as little
as 8 bits entropy estimation.  Once a read blocks, it will
block until /proc/sys/kernel/random/read_wakeup_threshold
bits are available, which is usually 64 bits, but can be
configured as low as 8 bits.  A select will wake up when
at least read_wakeup_threshold bits are available.
Also when constantly reading bytes out of /dev/random
it will prevent the crng init done event forever.

Fixed by making read and select on /dev/random wait until
the crng is fully initialized and the blocking_pool is
also initialized which means that more than 128 bits of
entopy have been accumulated in the blocking_pool.

Signed-off-by: Bernd Edlinger 
---
The v3 version waits much longer than the v2 version,
since first 128 bits from the input_pool go into the
CRNG, the next 64 bits are only accounted 3/4 = 48 bits
in the blocking_pool, so we need in total 192 bits from
the input_pool until the blocking_pool is initialized
And another 64 bits until a select on /dev/random wakes up.

On a system with only interrupt_randomness, this
takes 128+192+64=384 random bits, about 6.5 minutes
from boot until /dev/random is readable.

Maybe this is taking too long, after the CRNG ready?

After the input_pool had already been initialized,
I wonder if feeding the next 64 bits from the input pool
to the empty blocking_pool could already be considered 
to be good enough to derive the first random byte from
the blocking_pool?

Thanks
Bernd.
---
 drivers/char/random.c | 22 +-
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 38c6d1a..666102d 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -476,6 +476,7 @@ struct entropy_store {
__u8 last_data[EXTRACT_SIZE];
 };
 
+static void _xfer_secondary_pool(struct entropy_store *r, size_t nbytes);
 static ssize_t extract_entropy(struct entropy_store *r, void *buf,
   size_t nbytes, int min, int rsvd);
 static ssize_t _extract_entropy(struct entropy_store *r, void *buf,
@@ -719,8 +720,16 @@ static void credit_entropy_bits(struct entropy_store *r, 
int nbits)
entropy_bits = r->entropy_count >> ENTROPY_SHIFT;
}
 
+   if (crng_ready() && !blocking_pool.initialized &&
+   entropy_bits >= random_read_wakeup_bits) {
+   _xfer_secondary_pool(_pool, entropy_bits / 8);
+   r->entropy_total = 0;
+   entropy_bits = r->entropy_count >> ENTROPY_SHIFT;
+   }
+
/* should we wake readers? */
-   if (entropy_bits >= random_read_wakeup_bits &&
+   if (blocking_pool.initialized &&
+   entropy_bits >= random_read_wakeup_bits &&
wq_has_sleeper(_read_wait)) {
wake_up_interruptible(_read_wait);
kill_fasync(, SIGIO, POLL_IN);
@@ -1317,7 +1326,6 @@ void add_disk_randomness(struct gendisk *disk)
  * from the primary pool to the secondary extraction pool. We make
  * sure we pull enough for a 'catastrophic reseed'.
  */
-static void _xfer_secondary_pool(struct entropy_store *r, size_t nbytes);
 static void xfer_secondary_pool(struct entropy_store *r, size_t nbytes)
 {
if (!r->pull ||
@@ -1661,7 +1669,7 @@ void get_random_bytes(void *buf, int nbytes)
  */
 int wait_for_random_bytes(void)
 {
-   if (likely(crng_ready()))
+   if (crng_ready())
return 0;
return wait_event_interruptible(crng_init_wait, crng_ready());
 }
@@ -1851,7 +1859,9 @@ void rand_initialize_disk(struct gendisk *disk)
 
nbytes = min_t(size_t, nbytes, SEC_XFER_SIZE);
while (1) {
-   n = extract_entropy_user(_pool, buf, nbytes);
+   n = blocking_pool.initialized
+   ? extract_entropy_user(_pool, buf, nbytes)
+   : 0;
if (n < 0)
return n;
trace_random_read(n*8, (nbytes-n)*8,
@@ -1865,6 +1875,7 @@ void rand_initialize_disk(struct gendisk *disk)
return -EAGAIN;
 
wait_event_interruptible(random_read_wait,
+   blocking_pool.initialized &&
ENTROPY_BITS(_pool) >=
random_read_wakeup_bits);
if (signal_pending(current))
@@ -1909,7 +1920,8 @@ void rand_initialize_disk(struct gendisk *disk)
poll_wait(file, _read_wait, wait);
poll_wait(file, _write_wait, wait);
mask = 0;
-   if (ENTROPY_BITS(_pool) >= random_read_wakeup_bits)
+   if (blocking_pool.initialized &&am

Re: [PATCHv2] random: Make /dev/random wait for crng_ready

2019-02-16 Thread Bernd Edlinger
On 2/16/19 7:23 PM, Theodore Y. Ts'o wrote:
> 
> This really isn't a correct way to fix things; since the blocking_pool
> used for /dev/random and the CRNG state are different things, and are
> fed by different sources of entropy.
> 

It is interesting that random_poll does only look at input_pool.
Could it be that the existing entropy in blocking_pool also matters?

My observation is that _random_read extracts always the requested
number of bytes from the input_pool by calling extract_entropy_user.
By not calling extract_entropy_user it is straight forward to ensure
that the entropy in the input_pool can accumulate until the primary_crng
is being initialized.

> What we should do is to have a separate flag which indicates that the
> blocking_pool has been adequately initialized, and set it only when
> the entropy count in the blocking pool is at least 128 bits.  When get
> woken up by the reader lock, we would transfer entropy from the input
> pool to the blocking pool, and if the pool is not yet initializedm,
> and the entropy count is less than 128 bits, we wait until it is.
> 

What happens for me, is that single byte reads out of /dev/random are able
to pull so much entropy out of the input_pool that the input_pool
never reaches 128 bit and therefore crng_ready will never be true.

Therefore I want to delay the blocking_pool until the CRNG is fully
initialized.  I hope you agree about that.

When that is done, there are two ways how entropy is transferred
from the input_pool to the blocking_pool, just to confirm I have
not overlooked something:

/* If the input pool is getting full, send some
 * entropy to the blocking pool until it is 75% full.
 */
if (entropy_bits > random_write_wakeup_bits &&
r->initialized &&
r->entropy_total >= 2*random_read_wakeup_bits) {
struct entropy_store *other = _pool;

if (other->entropy_count <=
3 * other->poolinfo->poolfracbits / 4) {
schedule_work(>push_work);
r->entropy_total = 0;
}
}

this rarely happens, before crng_ready because it requires more than
128 bits of entropy, at least in my setup.

The other path is:
_random_read->extract_entropy_user->xfer_secondary_pool

which pulls at least to random_read_wakeup_bits / 8
from the input_pool to the blocking_pool, 
and the same amount of entropy is extracted again
from the blocking_pool so the blocking_pool's entropy_count
effectively stays at zero, when this path is taken.

That you don't like, right?

You propose to disable the second path until the first path
has pulled 128 bits into the blocking_pool, right?


Thanks
Bernd.


[PATCHv2] random: Make /dev/random wait for crng_ready

2019-02-15 Thread Bernd Edlinger
Reading from /dev/random may return data while the getrandom
syscall is still blocking.

Those bytes are not yet cryptographically secure.

The first byte from /dev/random can have as little
as 8 bits entropy estimation.  Once a read blocks, it will
block until /proc/sys/kernel/random/read_wakeup_threshold
bits are available, which is usually 64 bits, but can be
configured as low as 8 bits.  A select will wake up when
at least read_wakeup_threshold bits are available.
Also when constantly reading bytes out of /dev/random
it will prevent the crng init done event forever.

Fixed by making read and select on /dev/random wait until
the crng is fully initialized.

Signed-off-by: Bernd Edlinger 
---
v2: Fixed one white space in the code.
Also added some more details about the problem
to the commit message.
---
 drivers/char/random.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index bd449ad..c8f16f0 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -721,7 +721,7 @@ static void credit_entropy_bits(struct entropy_store *r, 
int nbits)
}
 
/* should we wake readers? */
-   if (entropy_bits >= random_read_wakeup_bits &&
+   if (crng_ready() && entropy_bits >= random_read_wakeup_bits &&
wq_has_sleeper(_read_wait)) {
wake_up_interruptible(_read_wait);
kill_fasync(, SIGIO, POLL_IN);
@@ -1826,7 +1826,9 @@ _random_read(int nonblock, char __user *buf, size_t 
nbytes)
 
nbytes = min_t(size_t, nbytes, SEC_XFER_SIZE);
while (1) {
-   n = extract_entropy_user(_pool, buf, nbytes);
+   n = crng_ready()
+   ? extract_entropy_user(_pool, buf, nbytes)
+   : 0;
if (n < 0)
return n;
trace_random_read(n*8, (nbytes-n)*8,
@@ -1840,6 +1842,7 @@ _random_read(int nonblock, char __user *buf, size_t 
nbytes)
return -EAGAIN;
 
wait_event_interruptible(random_read_wait,
+   crng_ready() &&
ENTROPY_BITS(_pool) >=
random_read_wakeup_bits);
if (signal_pending(current))
@@ -1884,7 +1887,8 @@ random_poll(struct file *file, poll_table * wait)
poll_wait(file, _read_wait, wait);
poll_wait(file, _write_wait, wait);
mask = 0;
-   if (ENTROPY_BITS(_pool) >= random_read_wakeup_bits)
+   if (crng_ready() &&
+   ENTROPY_BITS(_pool) >= random_read_wakeup_bits)
mask |= EPOLLIN | EPOLLRDNORM;
if (ENTROPY_BITS(_pool) < random_write_wakeup_bits)
mask |= EPOLLOUT | EPOLLWRNORM;
-- 
2.7.4


Re: [PATCH] random: Make /dev/random wait for crng_ready

2019-02-14 Thread Bernd Edlinger
On 2/15/19 2:47 AM, Bernd Edlinger wrote:
>   while (1) {
> - n = extract_entropy_user(_pool, buf, nbytes);
> + n = crng_ready()
> + ?  extract_entropy_user(_pool, buf, nbytes)
> + : 0;

Aehm, the whitespace after ? does not align with :, didn't see that before.

I wonder if it would be better style to use
 if (crng_ready())
  n = extract...;
 else
  n = 0;


Bernd.



[PATCH] random: Make /dev/random wait for crng_ready

2019-02-14 Thread Bernd Edlinger
Reading from /dev/random may return data while the getrandom
syscall is still blocking.

Those bytes are not yet cryptographically secure.

Make read and select for reading on /dev/random wait until
the crng is fully initialized.

Signed-off-by: Bernd Edlinger 
---
 drivers/char/random.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 38c6d1a..ddbc0e2 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -720,7 +720,7 @@ static void credit_entropy_bits(struct entropy_store *r, 
int nbits)
}
 
/* should we wake readers? */
-   if (entropy_bits >= random_read_wakeup_bits &&
+   if (crng_ready() && entropy_bits >= random_read_wakeup_bits &&
wq_has_sleeper(_read_wait)) {
wake_up_interruptible(_read_wait);
kill_fasync(, SIGIO, POLL_IN);
@@ -1851,7 +1851,9 @@ void rand_initialize_disk(struct gendisk *disk)
 
nbytes = min_t(size_t, nbytes, SEC_XFER_SIZE);
while (1) {
-   n = extract_entropy_user(_pool, buf, nbytes);
+   n = crng_ready()
+   ?  extract_entropy_user(_pool, buf, nbytes)
+   : 0;
if (n < 0)
return n;
trace_random_read(n*8, (nbytes-n)*8,
@@ -1865,6 +1867,7 @@ void rand_initialize_disk(struct gendisk *disk)
return -EAGAIN;
 
wait_event_interruptible(random_read_wait,
+   crng_ready() &&
ENTROPY_BITS(_pool) >=
random_read_wakeup_bits);
if (signal_pending(current))
@@ -1909,7 +1912,8 @@ void rand_initialize_disk(struct gendisk *disk)
poll_wait(file, _read_wait, wait);
poll_wait(file, _write_wait, wait);
mask = 0;
-   if (ENTROPY_BITS(_pool) >= random_read_wakeup_bits)
+   if (crng_ready() &&
+   ENTROPY_BITS(_pool) >= random_read_wakeup_bits)
mask |= EPOLLIN | EPOLLRDNORM;
if (ENTROPY_BITS(_pool) < random_write_wakeup_bits)
mask |= EPOLLOUT | EPOLLWRNORM;
-- 
1.9.1


Re: [RFC][PATCH] Update -Wattribute-alias for gcc9

2019-01-28 Thread Bernd Edlinger
On 1/25/19 1:24 PM, Bernd Edlinger wrote:
> On 1/25/19 12:39 PM, Miguel Ojeda wrote:
>> On Fri, Jan 25, 2019 at 11:58 AM Arnd Bergmann  wrote:
>>>
>>> On Fri, Jan 25, 2019 at 11:43 AM Laura Abbott  wrote:
>>>>
>>>> Commit bee20031772a ("disable -Wattribute-alias warning for
>>>> SYSCALL_DEFINEx()") disabled -Wattribute-alias with gcc8.
>>>> gcc9 changed the format of -Wattribute-alias to take a parameter.
>>>> This doesn't quite match with the existing disabling mechanism
>>>> so update for gcc9 to match with the default (-Wattribute-alias=1).
>>>>
>>>> Signed-off-by: Laura Abbott 
>>>> ---
>>>> This is RFC because it feels ugly. I went ahead and did the obvious fixup
>>>> but it's worth discussing if we're going to end up with an explosion or
>>>> if there's a better way to handle this in one macro.
>>>
>>> Bernd Edlinger has sent a patch to gcc for this:
>>> https://gcc.gnu.org/ml/gcc-patches/2019-01/msg01120.html
>>>
>>> and Miguel Ojeda said he wanted to send a patch for it to the
>>> kernel as well, not sure if he wanted to take a different
>>> approach there, so adding both to Cc here.
>>
>> Thanks Arnd (I was working with Martin on the expanded
>> -Wmissing-attribute warnings, not on this, but thanks nevertheless :).
>>
>> Martin/Bernd: from the GCC mailing list I am not sure if we should
>> expect the old behavior to be maintained or not.
>>
> 
> I believe it is not intentional to break the old syntax of the
> pragma.  There will be new -Wattribute-alias=1 and -Wattribute-alias=2
> and -Wattribute-alias is easy to retain as an alias for -Wattribute-alias=1.
> That is what my patch will do.
> 

Okay, I committed the -Wattribute-alias patch to gcc trunk, now
as https://gcc.gnu.org/viewcvs/gcc?view=revision=268336 .
So there will be no need for a workaround on your side.

Also fixed a few false positive -Waddress-of-packed-member warnings with
https://gcc.gnu.org/viewcvs/gcc?view=revision=268118 and
https://gcc.gnu.org/viewcvs/gcc?view=revision=268337 .

However there remain a lot of warnings from -Waddress-of-packed-member,
that look more or less valid, has anybody an idea how to handle
these?


Bernd.


Re: [RFC][PATCH] Update -Wattribute-alias for gcc9

2019-01-25 Thread Bernd Edlinger
On 1/25/19 12:39 PM, Miguel Ojeda wrote:
> On Fri, Jan 25, 2019 at 11:58 AM Arnd Bergmann  wrote:
>>
>> On Fri, Jan 25, 2019 at 11:43 AM Laura Abbott  wrote:
>>>
>>> Commit bee20031772a ("disable -Wattribute-alias warning for
>>> SYSCALL_DEFINEx()") disabled -Wattribute-alias with gcc8.
>>> gcc9 changed the format of -Wattribute-alias to take a parameter.
>>> This doesn't quite match with the existing disabling mechanism
>>> so update for gcc9 to match with the default (-Wattribute-alias=1).
>>>
>>> Signed-off-by: Laura Abbott 
>>> ---
>>> This is RFC because it feels ugly. I went ahead and did the obvious fixup
>>> but it's worth discussing if we're going to end up with an explosion or
>>> if there's a better way to handle this in one macro.
>>
>> Bernd Edlinger has sent a patch to gcc for this:
>> https://gcc.gnu.org/ml/gcc-patches/2019-01/msg01120.html
>>
>> and Miguel Ojeda said he wanted to send a patch for it to the
>> kernel as well, not sure if he wanted to take a different
>> approach there, so adding both to Cc here.
> 
> Thanks Arnd (I was working with Martin on the expanded
> -Wmissing-attribute warnings, not on this, but thanks nevertheless :).
> 
> Martin/Bernd: from the GCC mailing list I am not sure if we should
> expect the old behavior to be maintained or not.
> 

I believe it is not intentional to break the old syntax of the
pragma.  There will be new -Wattribute-alias=1 and -Wattribute-alias=2
and -Wattribute-alias is easy to retain as an alias for -Wattribute-alias=1.
That is what my patch will do.


Bernd.


[PATCH] rt61pci: Work around a firmware bug with shared keys

2019-01-15 Thread Bernd Edlinger
Apparently the rt2x61 firmware fails temporarily to decode
broadcast packets if the shared keys are not assigned
in the "correct" sequence. At the same time unicast
packets work fine, since they are encrypted with the
pairwise key.

At least with WPA2 CCMP mode the shared keys are
set in the following sequence: keyidx=1, 2, 1, 2.
After a while only keyidx 2 gets decrypted, and
keyidx 1 is ignored, probably because there is never
a keyidx 3.

Symptoms are arping -b works for 10 minutes, since
keyidx=2 is used for broadcast, and then it stops
working for 10 minutes, because keyidx=1 is used.
That failure mode repeats forever.

Note, the firmware does not even know which keyidx
corresponds to which hw_key_idx so the firmware is
trying to be smarter than the driver, which is bound
to fail.

As workaround the function rt61pci_config_shared_key
requests software decryption of the shared keys,
by returning EOPNOTSUPP. However, pairwise keys are
still handled by hardware which works just fine.

Signed-off-by: Bernd Edlinger 
---
 drivers/net/wireless/ralink/rt2x00/rt61pci.c | 93 ++--
 1 file changed, 4 insertions(+), 89 deletions(-)

diff --git a/drivers/net/wireless/ralink/rt2x00/rt61pci.c 
b/drivers/net/wireless/ralink/rt2x00/rt61pci.c
index 4c5de8f..52b9fc4 100644
--- a/drivers/net/wireless/ralink/rt2x00/rt61pci.c
+++ b/drivers/net/wireless/ralink/rt2x00/rt61pci.c
@@ -321,97 +321,12 @@ static int rt61pci_config_shared_key(struct rt2x00_dev 
*rt2x00dev,
 struct rt2x00lib_crypto *crypto,
 struct ieee80211_key_conf *key)
 {
-   struct hw_key_entry key_entry;
-   struct rt2x00_field32 field;
-   u32 mask;
-   u32 reg;
-
-   if (crypto->cmd == SET_KEY) {
-   /*
-* rt2x00lib can't determine the correct free
-* key_idx for shared keys. We have 1 register
-* with key valid bits. The goal is simple, read
-* the register, if that is full we have no slots
-* left.
-* Note that each BSS is allowed to have up to 4
-* shared keys, so put a mask over the allowed
-* entries.
-*/
-   mask = (0xf << crypto->bssidx);
-
-   reg = rt2x00mmio_register_read(rt2x00dev, SEC_CSR0);
-   reg &= mask;
-
-   if (reg && reg == mask)
-   return -ENOSPC;
-
-   key->hw_key_idx += reg ? ffz(reg) : 0;
-
-   /*
-* Upload key to hardware
-*/
-   memcpy(key_entry.key, crypto->key,
-  sizeof(key_entry.key));
-   memcpy(key_entry.tx_mic, crypto->tx_mic,
-  sizeof(key_entry.tx_mic));
-   memcpy(key_entry.rx_mic, crypto->rx_mic,
-  sizeof(key_entry.rx_mic));
-
-   reg = SHARED_KEY_ENTRY(key->hw_key_idx);
-   rt2x00mmio_register_multiwrite(rt2x00dev, reg,
-  _entry, sizeof(key_entry));
-
-   /*
-* The cipher types are stored over 2 registers.
-* bssidx 0 and 1 keys are stored in SEC_CSR1 and
-* bssidx 1 and 2 keys are stored in SEC_CSR5.
-* Using the correct defines correctly will cause overhead,
-* so just calculate the correct offset.
-*/
-   if (key->hw_key_idx < 8) {
-   field.bit_offset = (3 * key->hw_key_idx);
-   field.bit_mask = 0x7 << field.bit_offset;
-
-   reg = rt2x00mmio_register_read(rt2x00dev, SEC_CSR1);
-   rt2x00_set_field32(, field, crypto->cipher);
-   rt2x00mmio_register_write(rt2x00dev, SEC_CSR1, reg);
-   } else {
-   field.bit_offset = (3 * (key->hw_key_idx - 8));
-   field.bit_mask = 0x7 << field.bit_offset;
-
-   reg = rt2x00mmio_register_read(rt2x00dev, SEC_CSR5);
-   rt2x00_set_field32(, field, crypto->cipher);
-   rt2x00mmio_register_write(rt2x00dev, SEC_CSR5, reg);
-   }
-
-   /*
-* The driver does not support the IV/EIV generation
-* in hardware. However it doesn't support the IV/EIV
-* inside the ieee80211 frame either, but requires it
-* to be provided separately for the descriptor.
-* rt2x00lib will cut the IV/EIV data out of all frames
-* given to us by mac80211, but we must tell mac80211
-* to generate the IV/EIV data.
-*/
-   key->flags |= IEEE80211_KEY_FLAG_GENERATE_IV;
-   }
-
/*
-

[PATCH v4 4/4] rtlwifi: Don't clear num_rx_inperiod too early

2019-01-08 Thread Bernd Edlinger
This patch moves the clearing of rtlpriv->link_info.num_rx_inperiod in
rtl_watchdog_wq_callback a few lines down.

This is necessary since it is still used in the "AP off" detection
code block. Moved clearing of rtlpriv->link_info.num_rx_inperiod
as well for consistency.

Signed-off-by: Bernd Edlinger 
---
v2: Improved patch description.

v3: Make the title fit in one line.

v4: Try to fix the line endings the message body.
---
 drivers/net/wireless/realtek/rtlwifi/base.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtlwifi/base.c 
b/drivers/net/wireless/realtek/rtlwifi/base.c
index ef9b502..7aa68fe 100644
--- a/drivers/net/wireless/realtek/rtlwifi/base.c
+++ b/drivers/net/wireless/realtek/rtlwifi/base.c
@@ -2172,8 +2172,6 @@ void rtl_watchdog_wq_callback(void *data)
;
}
 
-   rtlpriv->link_info.num_rx_inperiod = 0;
-   rtlpriv->link_info.num_tx_inperiod = 0;
for (tid = 0; tid <= 7; tid++)
rtlpriv->link_info.tidtx_inperiod[tid] = 0;
 
@@ -2236,6 +2234,8 @@ void rtl_watchdog_wq_callback(void *data)
rtlpriv->btcoexist.btc_info.in_4way = false;
}
 
+   rtlpriv->link_info.num_rx_inperiod = 0;
+   rtlpriv->link_info.num_tx_inperiod = 0;
rtlpriv->link_info.bcn_rx_inperiod = 0;
 
/* <6> scan list */
-- 
1.9.1


[PATCH v4 3/4] rtl8723ae: Re-introduce the adaptive rate control

2019-01-08 Thread Bernd Edlinger
This re-introduces the function rtl8723e_dm_refresh_rate_adaptive_mask.

This function was present in a previous version of the code base,
it works just fine for me -- as long as it is not using stale data.

Unlike the original version of this function it avoids using
dm.undec_sm_pwdb when no beacon was received.

Fixed a style nit in rtl8723e_dm_init_rate_adaptive_mask.

Signed-off-by: Bernd Edlinger 
---
v2: Improve patch description.

v3: Improve patch description, mention that dm.undec_sm_pwdb
is not used when no beacon received. 
Make the title fit in one line.

v4: Try to fix the line endings the message body.
---
 .../net/wireless/realtek/rtlwifi/rtl8723ae/dm.c| 87 +-
 1 file changed, 85 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c 
b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c
index 902b944..acfd54c 100644
--- a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c
+++ b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c
@@ -673,7 +673,7 @@ void rtl8723e_dm_check_txpower_tracking(struct ieee80211_hw 
*hw)
 void rtl8723e_dm_init_rate_adaptive_mask(struct ieee80211_hw *hw)
 {
struct rtl_priv *rtlpriv = rtl_priv(hw);
-   struct rate_adaptive *p_ra = &(rtlpriv->ra);
+   struct rate_adaptive *p_ra = >ra;
 
p_ra->ratr_state = DM_RATR_STA_INIT;
p_ra->pre_ratr_state = DM_RATR_STA_INIT;
@@ -685,6 +685,89 @@ void rtl8723e_dm_init_rate_adaptive_mask(struct 
ieee80211_hw *hw)
 
 }
 
+void rtl8723e_dm_refresh_rate_adaptive_mask(struct ieee80211_hw *hw)
+{
+   struct rtl_priv *rtlpriv = rtl_priv(hw);
+   struct rtl_hal *rtlhal = rtl_hal(rtl_priv(hw));
+   struct rtl_mac *mac = rtl_mac(rtl_priv(hw));
+   struct rate_adaptive *p_ra = >ra;
+   u32 low_rssithresh_for_ra, high_rssithresh_for_ra;
+   struct ieee80211_sta *sta = NULL;
+
+   if (is_hal_stop(rtlhal)) {
+   RT_TRACE(rtlpriv, COMP_RATE, DBG_LOUD,
+" driver is going to unload\n");
+   return;
+   }
+
+   if (!rtlpriv->dm.useramask) {
+   RT_TRACE(rtlpriv, COMP_RATE, DBG_LOUD,
+" driver does not control rate adaptive mask\n");
+   return;
+   }
+
+   if (mac->link_state == MAC80211_LINKED &&
+   mac->opmode == NL80211_IFTYPE_STATION) {
+   switch (p_ra->pre_ratr_state) {
+   case DM_RATR_STA_HIGH:
+   high_rssithresh_for_ra = 50;
+   low_rssithresh_for_ra = 20;
+   break;
+   case DM_RATR_STA_MIDDLE:
+   high_rssithresh_for_ra = 55;
+   low_rssithresh_for_ra = 20;
+   break;
+   case DM_RATR_STA_LOW:
+   high_rssithresh_for_ra = 60;
+   low_rssithresh_for_ra = 25;
+   break;
+   default:
+   high_rssithresh_for_ra = 50;
+   low_rssithresh_for_ra = 20;
+   break;
+   }
+
+   if (rtlpriv->link_info.bcn_rx_inperiod == 0)
+   switch (p_ra->pre_ratr_state) {
+   case DM_RATR_STA_HIGH:
+   default:
+   p_ra->ratr_state = DM_RATR_STA_MIDDLE;
+   break;
+   case DM_RATR_STA_MIDDLE:
+   case DM_RATR_STA_LOW:
+   p_ra->ratr_state = DM_RATR_STA_LOW;
+   break;
+   }
+   else if (rtlpriv->dm.undec_sm_pwdb > high_rssithresh_for_ra)
+   p_ra->ratr_state = DM_RATR_STA_HIGH;
+   else if (rtlpriv->dm.undec_sm_pwdb > low_rssithresh_for_ra)
+   p_ra->ratr_state = DM_RATR_STA_MIDDLE;
+   else
+   p_ra->ratr_state = DM_RATR_STA_LOW;
+
+   if (p_ra->pre_ratr_state != p_ra->ratr_state) {
+   RT_TRACE(rtlpriv, COMP_RATE, DBG_LOUD,
+"RSSI = %ld\n",
+rtlpriv->dm.undec_sm_pwdb);
+   RT_TRACE(rtlpriv, COMP_RATE, DBG_LOUD,
+"RSSI_LEVEL = %d\n", p_ra->ratr_state);
+   RT_TRACE(rtlpriv, COMP_RATE, DBG_LOUD,
+"PreState = %d, CurState = %d\n",
+p_ra->pre_ratr_state, p_ra->ratr_state);
+
+   rcu_read_lock();
+   sta = rtl_find_sta(hw, mac->bssid);
+   if (sta)
+   rtlpriv->cfg->ops->update_rate_tbl(hw, sta,

[PATCH v4 2/4] rtl8723ae: Dont use old data for input gain control

2019-01-08 Thread Bernd Edlinger
When no beacon was received, the value in dm.undec_sm_pwdb is most
likely out of date and should not be used to adjust the input path.
Assume instead that the signal level is low.

Fix the state machine in rtl8723e_dm_cck_packet_detection_thresh
which did not clear pre_cck_fa_state when changing cur_cck_pd_state
from CCK_PD_STAGE_LOWRSSI/CCK_FA_STAGE_LOW to CCK_PD_STAGE_HIGHRSSI
and back again to CCK_PD_STAGE_LOWRSSI/CCK_FA_STAGE_LOW, the register
RCCK0_CCA not written to 0x83 on the second change.

Explicitly initialize pre_cck_fa_state/cur_cck_fa_state in
rtl_dm_diginit.

Signed-off-by: Bernd Edlinger 
---
v2: Improved patch description.

v3: Make the title fit in one line.

v4: Try to fix the line endings the message body.
---
 drivers/net/wireless/realtek/rtlwifi/core.c | 2 ++
 drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c | 8 
 2 files changed, 10 insertions(+)

diff --git a/drivers/net/wireless/realtek/rtlwifi/core.c 
b/drivers/net/wireless/realtek/rtlwifi/core.c
index 4bf7967..ce23339 100644
--- a/drivers/net/wireless/realtek/rtlwifi/core.c
+++ b/drivers/net/wireless/realtek/rtlwifi/core.c
@@ -1957,5 +1957,7 @@ void rtl_dm_diginit(struct ieee80211_hw *hw, u32 
cur_igvalue)
dm_digtable->bt30_cur_igi = 0x32;
dm_digtable->pre_cck_pd_state = CCK_PD_STAGE_MAX;
dm_digtable->cur_cck_pd_state = CCK_PD_STAGE_LOWRSSI;
+   dm_digtable->pre_cck_fa_state = 0;
+   dm_digtable->cur_cck_fa_state = 0;
 }
 EXPORT_SYMBOL(rtl_dm_diginit);
diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c 
b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c
index 42a6fba..902b944 100644
--- a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c
+++ b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c
@@ -151,8 +151,14 @@ static u8 rtl8723e_dm_initial_gain_min_pwdb(struct 
ieee80211_hw *hw)
 {
struct rtl_priv *rtlpriv = rtl_priv(hw);
struct dig_t *dm_digtable = >dm_digtable;
+   struct rtl_mac *mac = rtl_mac(rtlpriv);
long rssi_val_min = 0;
 
+   if (mac->link_state == MAC80211_LINKED &&
+   mac->opmode == NL80211_IFTYPE_STATION &&
+   rtlpriv->link_info.bcn_rx_inperiod == 0)
+   return 0;
+
if ((dm_digtable->curmultista_cstate == DIG_MULTISTA_CONNECT) &&
(dm_digtable->cursta_cstate == DIG_STA_CONNECT)) {
if (rtlpriv->dm.entry_min_undec_sm_pwdb != 0)
@@ -417,6 +423,8 @@ static void rtl8723e_dm_cck_packet_detection_thresh(struct 
ieee80211_hw *hw)
} else {
rtl_set_bbreg(hw, RCCK0_CCA, MASKBYTE2, 0xcd);
rtl_set_bbreg(hw, RCCK0_SYSTEM, MASKBYTE1, 0x47);
+   dm_digtable->pre_cck_fa_state = 0;
+   dm_digtable->cur_cck_fa_state = 0;
 
}
dm_digtable->pre_cck_pd_state = dm_digtable->cur_cck_pd_state;
-- 
1.9.1


[PATCH v4 1/4] rtl8723ae: Take the FW LPS mode handling out

2019-01-08 Thread Bernd Edlinger
This appears to trigger a firmware bug and causes severe
problems with rtl8723ae PCI devices.

When the power save mode is activated for longer periods
of time the firmware stops to receive any packets.

This problem was exposed by commit 873ffe154ae0 ("rtlwifi:
Fix logic error in enter/exit power-save mode").

Previously the power save mode was only active rarely and
only for a short time so that the problem was not noticeable.

Signed-off-by: Bernd Edlinger 
---
v2: Adjust the defaults of swlps and fwlps module
parameters to match the firmware capabilities instead of removing
the whole code, so it can be easily re-activated once a firmware
update is available.

v3: Make the title fit in one line.

v4: Try to fix the line endings the message body.
---
 drivers/net/wireless/realtek/rtlwifi/rtl8723ae/sw.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/sw.c 
b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/sw.c
index 07b82700d1..3103151 100644
--- a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/sw.c
+++ b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/sw.c
@@ -266,8 +266,8 @@ static bool is_fw_header(struct rtlwifi_firmware_header 
*hdr)
 static struct rtl_mod_params rtl8723e_mod_params = {
.sw_crypto = false,
.inactiveps = true,
-   .swctrl_lps = false,
-   .fwctrl_lps = true,
+   .swctrl_lps = true,
+   .fwctrl_lps = false,
.aspm_support = 1,
.debug_level = 0,
.debug_mask = 0,
@@ -395,8 +395,8 @@ static bool is_fw_header(struct rtlwifi_firmware_header 
*hdr)
   bool, 0444);
 MODULE_PARM_DESC(swenc, "Set to 1 for software crypto (default 0)\n");
 MODULE_PARM_DESC(ips, "Set to 0 to not use link power save (default 1)\n");
-MODULE_PARM_DESC(swlps, "Set to 1 to use SW control power save (default 0)\n");
-MODULE_PARM_DESC(fwlps, "Set to 1 to use FW control power save (default 1)\n");
+MODULE_PARM_DESC(swlps, "Set to 1 to use SW control power save (default 1)\n");
+MODULE_PARM_DESC(fwlps, "Set to 1 to use FW control power save (default 0)\n");
 MODULE_PARM_DESC(msi, "Set to 1 to use MSI interrupts mode (default 0)\n");
 MODULE_PARM_DESC(aspm, "Set to 1 to enable ASPM (default 1)\n");
 MODULE_PARM_DESC(debug_level, "Set debug level (0-5) (default 0)");
-- 
1.9.1


[PATCH v4 0/4] rtlwifi: Fix issues with rtl8723ae

2019-01-08 Thread Bernd Edlinger
Currently the rtl8723ae driver is broken (since v4.7).

Connection to AP is lost very often, especially when
the signal level is not very good.

The main issue is the power save mode is basically
not working, and seems to trigger a firmware bug.
So I had to take out the FW LPS mode handling.

While debugging the driver I found a couple related issues,
for instance that the signal level in dm.undec_sm_pwdb
is no longer accurate (may be even much too high) when no more
packets are received, and it increases the likelihood to receive
something if the input gain is set to maximum.

The patch was tested with the rtl8723ae PCI card in my laptop
against a FRITZ!Box 7590 AP -- the WiFi connection works now
very reliable for me.

ChangeLog:

v2: Adjusts the defaults of swlps and fwlps module
parameters to match the firmware capabilities instead of removing
the whole code, so it can be easily re-activated once a firmware
update is available.

v3: Make the title of each patch fit in one line.

v4: Try to fix the line endings the message body.
Which is an exchange server issue.
The patch does not change at all.

Bernd Edlinger (4):
  rtl8723ae: Take the FW LPS mode handling out
  rtl8723ae: Dont use old data for input gain control
  rtl8723ae: Re-introduce the adaptive rate control
  rtlwifi: Don't clear num_rx_inperiod too early 

 drivers/net/wireless/realtek/rtlwifi/base.c|  4 +-
 drivers/net/wireless/realtek/rtlwifi/core.c|  2 +
 .../net/wireless/realtek/rtlwifi/rtl8723ae/dm.c| 95 +-
 .../net/wireless/realtek/rtlwifi/rtl8723ae/sw.c|  8 +-
 4 files changed, 101 insertions(+), 8 deletions(-)

-- 
1.9.1


Re: [PATCH v3 1/4] rtl8723ae: Take the FW LPS mode handling out

2019-01-08 Thread Bernd Edlinger
On 1/8/19 6:27 PM, Kalle Valo wrote:
> Bernd Edlinger  writes:
> 
>> This appears to trigger a firmware bug and causes severe
>> problems with rtl8723ae PCI devices.
>>
>> When the power save mode is activated for longer periods
>> of time the firmware stops to receive any packets.
>>
>> This problem was exposed by commit 873ffe154ae0 ("rtlwifi:
>> Fix logic error in enter/exit power-save mode").
>>
>> Previously the power save mode was only active rarely and
>> only for a short time so that the problem was not noticeable.
>>
>> Signed-off-by: Bernd Edlinger 
>> ---
>> v2: Adjust the defaults of swlps and fwlps module
>> parameters to match the firmware capabilities instead of removing
>> the whole code, so it can be easily re-activated once a firmware
>> update is available.
>>
>> v3: Make the title fit in one line.
> 
> hotmail.de made me suspicious and indeed my Spidey sense was right:
> 
> Applying: rtl8723ae: Take the FW LPS mode handling out
> Using index info to reconstruct a base tree...
> .git/rebase-apply/patch:22: trailing whitespace.
> .swctrl_lps = true,
> .git/rebase-apply/patch:23: trailing whitespace.
> .fwctrl_lps = false,
> .git/rebase-apply/patch:33: trailing whitespace.
> MODULE_PARM_DESC(swlps, "Set to 1 to use SW control power save (default 
> 1)\n");
> .git/rebase-apply/patch:34: trailing whitespace.
> MODULE_PARM_DESC(fwlps, "Set to 1 to use FW control power save (default 
> 0)\n");
> error: patch failed: drivers/net/wireless/realtek/rtlwifi/rtl8723ae/sw.c:266
> error: drivers/net/wireless/realtek/rtlwifi/rtl8723ae/sw.c: patch does not 
> apply
> error: Did you hand edit your patch?
> It does not apply to blobs recorded in its index.
> Patch failed at 0001 rtl8723ae: Take the FW LPS mode handling out
> 
> Did you use Outlook to send this patch or what? Anyway, it's strongly
> suggested to use git to submit the patches (and not use Outlook servers
> either as they are notorious in breaking our patches):
> 

No, I did use firefox with the settings which I found at
https://www.kernel.org/doc/html/latest/process/email-clients.html#email-clients

I think all that is wrong with the message body is it is base64 encoded, and 
all the
lines end with CRLF, and it seems the CR is the trailing whitespace.

I think the Character Encoding/UTF-8 does not work right with
Exchange Servers.

What Character Encoding would git send-email use?  UTF-8?

So I will try to send the patches again, with 7-bit encoded message bodies.
Actually it will be quoted-printable, but I do not use any umlauts.


So for the records, I did open about:config in the firefox preferences,
and change:

mailnews.send_default_charset from UTF-8 to ISO-8859-1

that I have already done before, but it was not sufficient:

mailnews.send_plaintext_flowed from true to false
mailnews.wraplength from 72 to 0


Thanks
Bernd.


[PATCH v3 4/4] rtlwifi: Don't clear num_rx_inperiod too early

2019-01-05 Thread Bernd Edlinger
This patch moves the clearing of rtlpriv->link_info.num_rx_inperiod in
rtl_watchdog_wq_callback a few lines down.

This is necessary since it is still used in the "AP off" detection
code block. Moved clearing of rtlpriv->link_info.num_rx_inperiod
as well for consistency.

Signed-off-by: Bernd Edlinger 
---
v2: Improved patch description.

v3: Make the title fit in one line.
---
 drivers/net/wireless/realtek/rtlwifi/base.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtlwifi/base.c 
b/drivers/net/wireless/realtek/rtlwifi/base.c
index ef9b502..7aa68fe 100644
--- a/drivers/net/wireless/realtek/rtlwifi/base.c
+++ b/drivers/net/wireless/realtek/rtlwifi/base.c
@@ -2172,8 +2172,6 @@ void rtl_watchdog_wq_callback(void *data)
;
}
 
-   rtlpriv->link_info.num_rx_inperiod = 0;
-   rtlpriv->link_info.num_tx_inperiod = 0;
for (tid = 0; tid <= 7; tid++)
rtlpriv->link_info.tidtx_inperiod[tid] = 0;
 
@@ -2236,6 +2234,8 @@ void rtl_watchdog_wq_callback(void *data)
rtlpriv->btcoexist.btc_info.in_4way = false;
}
 
+   rtlpriv->link_info.num_rx_inperiod = 0;
+   rtlpriv->link_info.num_tx_inperiod = 0;
rtlpriv->link_info.bcn_rx_inperiod = 0;
 
/* <6> scan list */
-- 
1.9.1


[PATCH v3 3/4] rtl8723ae: Re-introduce the adaptive rate control

2019-01-05 Thread Bernd Edlinger
This re-introduces the function rtl8723e_dm_refresh_rate_adaptive_mask.

This function was present in a previous version of the code base,
it works just fine for me -- as long as it is not using stale data.

Unlike the original version of this function it avoids using
dm.undec_sm_pwdb when no beacon was received.

Fixed a style nit in rtl8723e_dm_init_rate_adaptive_mask.

Signed-off-by: Bernd Edlinger 
---
v2: Improve patch description.

v3: Improve patch description, mention that dm.undec_sm_pwdb
is not used when no beacon received. 
Make the title fit in one line.
---
 .../net/wireless/realtek/rtlwifi/rtl8723ae/dm.c| 87 +-
 1 file changed, 85 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c 
b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c
index 902b944..acfd54c 100644
--- a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c
+++ b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c
@@ -673,7 +673,7 @@ void rtl8723e_dm_check_txpower_tracking(struct ieee80211_hw 
*hw)
 void rtl8723e_dm_init_rate_adaptive_mask(struct ieee80211_hw *hw)
 {
struct rtl_priv *rtlpriv = rtl_priv(hw);
-   struct rate_adaptive *p_ra = &(rtlpriv->ra);
+   struct rate_adaptive *p_ra = >ra;
 
p_ra->ratr_state = DM_RATR_STA_INIT;
p_ra->pre_ratr_state = DM_RATR_STA_INIT;
@@ -685,6 +685,89 @@ void rtl8723e_dm_init_rate_adaptive_mask(struct 
ieee80211_hw *hw)
 
 }
 
+void rtl8723e_dm_refresh_rate_adaptive_mask(struct ieee80211_hw *hw)
+{
+   struct rtl_priv *rtlpriv = rtl_priv(hw);
+   struct rtl_hal *rtlhal = rtl_hal(rtl_priv(hw));
+   struct rtl_mac *mac = rtl_mac(rtl_priv(hw));
+   struct rate_adaptive *p_ra = >ra;
+   u32 low_rssithresh_for_ra, high_rssithresh_for_ra;
+   struct ieee80211_sta *sta = NULL;
+
+   if (is_hal_stop(rtlhal)) {
+   RT_TRACE(rtlpriv, COMP_RATE, DBG_LOUD,
+" driver is going to unload\n");
+   return;
+   }
+
+   if (!rtlpriv->dm.useramask) {
+   RT_TRACE(rtlpriv, COMP_RATE, DBG_LOUD,
+" driver does not control rate adaptive mask\n");
+   return;
+   }
+
+   if (mac->link_state == MAC80211_LINKED &&
+   mac->opmode == NL80211_IFTYPE_STATION) {
+   switch (p_ra->pre_ratr_state) {
+   case DM_RATR_STA_HIGH:
+   high_rssithresh_for_ra = 50;
+   low_rssithresh_for_ra = 20;
+   break;
+   case DM_RATR_STA_MIDDLE:
+   high_rssithresh_for_ra = 55;
+   low_rssithresh_for_ra = 20;
+   break;
+   case DM_RATR_STA_LOW:
+   high_rssithresh_for_ra = 60;
+   low_rssithresh_for_ra = 25;
+   break;
+   default:
+   high_rssithresh_for_ra = 50;
+   low_rssithresh_for_ra = 20;
+   break;
+   }
+
+   if (rtlpriv->link_info.bcn_rx_inperiod == 0)
+   switch (p_ra->pre_ratr_state) {
+   case DM_RATR_STA_HIGH:
+   default:
+   p_ra->ratr_state = DM_RATR_STA_MIDDLE;
+   break;
+   case DM_RATR_STA_MIDDLE:
+   case DM_RATR_STA_LOW:
+   p_ra->ratr_state = DM_RATR_STA_LOW;
+   break;
+   }
+   else if (rtlpriv->dm.undec_sm_pwdb > high_rssithresh_for_ra)
+   p_ra->ratr_state = DM_RATR_STA_HIGH;
+   else if (rtlpriv->dm.undec_sm_pwdb > low_rssithresh_for_ra)
+   p_ra->ratr_state = DM_RATR_STA_MIDDLE;
+   else
+   p_ra->ratr_state = DM_RATR_STA_LOW;
+
+   if (p_ra->pre_ratr_state != p_ra->ratr_state) {
+   RT_TRACE(rtlpriv, COMP_RATE, DBG_LOUD,
+"RSSI = %ld\n",
+rtlpriv->dm.undec_sm_pwdb);
+   RT_TRACE(rtlpriv, COMP_RATE, DBG_LOUD,
+"RSSI_LEVEL = %d\n", p_ra->ratr_state);
+   RT_TRACE(rtlpriv, COMP_RATE, DBG_LOUD,
+"PreState = %d, CurState = %d\n",
+p_ra->pre_ratr_state, p_ra->ratr_state);
+
+   rcu_read_lock();
+   sta = rtl_find_sta(hw, mac->bssid);
+   if (sta)
+   rtlpriv->cfg->ops->update_rate_tbl(hw, sta,
+  p_ra->ratr_st

[PATCH v3 2/4] rtl8723ae: Dont use old data for input gain control

2019-01-05 Thread Bernd Edlinger
When no beacon was received, the value in dm.undec_sm_pwdb is most
likely out of date and should not be used to adjust the input path.
Assume instead that the signal level is low.

Fix the state machine in rtl8723e_dm_cck_packet_detection_thresh
which did not clear pre_cck_fa_state when changing cur_cck_pd_state
from CCK_PD_STAGE_LOWRSSI/CCK_FA_STAGE_LOW to CCK_PD_STAGE_HIGHRSSI
and back again to CCK_PD_STAGE_LOWRSSI/CCK_FA_STAGE_LOW, the register
RCCK0_CCA not written to 0x83 on the second change.

Explicitly initialize pre_cck_fa_state/cur_cck_fa_state in
rtl_dm_diginit.

Signed-off-by: Bernd Edlinger 
---
v2: Improved patch description.

v3: Make the title fit in one line.
---
 drivers/net/wireless/realtek/rtlwifi/core.c | 2 ++
 drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c | 8 
 2 files changed, 10 insertions(+)

diff --git a/drivers/net/wireless/realtek/rtlwifi/core.c 
b/drivers/net/wireless/realtek/rtlwifi/core.c
index 4bf7967..ce23339 100644
--- a/drivers/net/wireless/realtek/rtlwifi/core.c
+++ b/drivers/net/wireless/realtek/rtlwifi/core.c
@@ -1957,5 +1957,7 @@ void rtl_dm_diginit(struct ieee80211_hw *hw, u32 
cur_igvalue)
dm_digtable->bt30_cur_igi = 0x32;
dm_digtable->pre_cck_pd_state = CCK_PD_STAGE_MAX;
dm_digtable->cur_cck_pd_state = CCK_PD_STAGE_LOWRSSI;
+   dm_digtable->pre_cck_fa_state = 0;
+   dm_digtable->cur_cck_fa_state = 0;
 }
 EXPORT_SYMBOL(rtl_dm_diginit);
diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c 
b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c
index 42a6fba..902b944 100644
--- a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c
+++ b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c
@@ -151,8 +151,14 @@ static u8 rtl8723e_dm_initial_gain_min_pwdb(struct 
ieee80211_hw *hw)
 {
struct rtl_priv *rtlpriv = rtl_priv(hw);
struct dig_t *dm_digtable = >dm_digtable;
+   struct rtl_mac *mac = rtl_mac(rtlpriv);
long rssi_val_min = 0;
 
+   if (mac->link_state == MAC80211_LINKED &&
+   mac->opmode == NL80211_IFTYPE_STATION &&
+   rtlpriv->link_info.bcn_rx_inperiod == 0)
+   return 0;
+
if ((dm_digtable->curmultista_cstate == DIG_MULTISTA_CONNECT) &&
(dm_digtable->cursta_cstate == DIG_STA_CONNECT)) {
if (rtlpriv->dm.entry_min_undec_sm_pwdb != 0)
@@ -417,6 +423,8 @@ static void rtl8723e_dm_cck_packet_detection_thresh(struct 
ieee80211_hw *hw)
} else {
rtl_set_bbreg(hw, RCCK0_CCA, MASKBYTE2, 0xcd);
rtl_set_bbreg(hw, RCCK0_SYSTEM, MASKBYTE1, 0x47);
+   dm_digtable->pre_cck_fa_state = 0;
+   dm_digtable->cur_cck_fa_state = 0;
 
}
dm_digtable->pre_cck_pd_state = dm_digtable->cur_cck_pd_state;
-- 
1.9.1


[PATCH v3 1/4] rtl8723ae: Take the FW LPS mode handling out

2019-01-05 Thread Bernd Edlinger
This appears to trigger a firmware bug and causes severe
problems with rtl8723ae PCI devices.

When the power save mode is activated for longer periods
of time the firmware stops to receive any packets.

This problem was exposed by commit 873ffe154ae0 ("rtlwifi:
Fix logic error in enter/exit power-save mode").

Previously the power save mode was only active rarely and
only for a short time so that the problem was not noticeable.

Signed-off-by: Bernd Edlinger 
---
v2: Adjust the defaults of swlps and fwlps module
parameters to match the firmware capabilities instead of removing
the whole code, so it can be easily re-activated once a firmware
update is available.

v3: Make the title fit in one line.
---
 drivers/net/wireless/realtek/rtlwifi/rtl8723ae/sw.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/sw.c 
b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/sw.c
index 07b82700d1..3103151 100644
--- a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/sw.c
+++ b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/sw.c
@@ -266,8 +266,8 @@ static bool is_fw_header(struct rtlwifi_firmware_header 
*hdr)
 static struct rtl_mod_params rtl8723e_mod_params = {
.sw_crypto = false,
.inactiveps = true,
-   .swctrl_lps = false,
-   .fwctrl_lps = true,
+   .swctrl_lps = true,
+   .fwctrl_lps = false,
.aspm_support = 1,
.debug_level = 0,
.debug_mask = 0,
@@ -395,8 +395,8 @@ static bool is_fw_header(struct rtlwifi_firmware_header 
*hdr)
   bool, 0444);
 MODULE_PARM_DESC(swenc, "Set to 1 for software crypto (default 0)\n");
 MODULE_PARM_DESC(ips, "Set to 0 to not use link power save (default 1)\n");
-MODULE_PARM_DESC(swlps, "Set to 1 to use SW control power save (default 0)\n");
-MODULE_PARM_DESC(fwlps, "Set to 1 to use FW control power save (default 1)\n");
+MODULE_PARM_DESC(swlps, "Set to 1 to use SW control power save (default 1)\n");
+MODULE_PARM_DESC(fwlps, "Set to 1 to use FW control power save (default 0)\n");
 MODULE_PARM_DESC(msi, "Set to 1 to use MSI interrupts mode (default 0)\n");
 MODULE_PARM_DESC(aspm, "Set to 1 to enable ASPM (default 1)\n");
 MODULE_PARM_DESC(debug_level, "Set debug level (0-5) (default 0)");
-- 
1.9.1


[PATCH v3 0/4] rtlwifi: Fix issues with rtl8723ae

2019-01-05 Thread Bernd Edlinger
Currently the rtl8723ae driver is broken (since v4.7).

Connection to AP is lost very often, especially when
the signal level is not very good.

The main issue is the power save mode is basically
not working, and seems to trigger a firmware bug.
So I had to take out the FW LPS mode handling.

While debugging the driver I found a couple related issues,
for instance that the signal level in dm.undec_sm_pwdb
is no longer accurate (may be even much too high) when no more
packets are received, and it increases the likelihood to receive
something if the input gain is set to maximum.

The patch was tested with the rtl8723ae PCI card in my laptop
against a FRITZ!Box 7590 AP -- the WiFi connection works now
very reliable for me.

ChangeLog:

v2: Adjusts the defaults of swlps and fwlps module
parameters to match the firmware capabilities instead of removing
the whole code, so it can be easily re-activated once a firmware
update is available.

v3: Make the title of each patch fit in one line.


Bernd Edlinger (4):
  rtl8723ae: Take the FW LPS mode handling out
  rtl8723ae: Dont use old data for input gain control
  rtl8723ae: Re-introduce the adaptive rate control
  rtlwifi: Don't clear num_rx_inperiod too early 

 drivers/net/wireless/realtek/rtlwifi/base.c|  4 +-
 drivers/net/wireless/realtek/rtlwifi/core.c|  2 +
 .../net/wireless/realtek/rtlwifi/rtl8723ae/dm.c| 95 +-
 .../net/wireless/realtek/rtlwifi/rtl8723ae/sw.c|  8 +-
 4 files changed, 101 insertions(+), 8 deletions(-)

-- 
1.9.1


[PATCH v2 3/4] rtlwifi: rtl8723ae: Re-introduce

2019-01-05 Thread Bernd Edlinger
 rtl8723e_dm_refresh_rate_adaptive_mask

This function was present in a previous version of the code base,
it works just fine for me -- as long as it is not using stale data.

Fixed a style nit in rtl8723e_dm_init_rate_adaptive_mask.

Signed-off-by: Bernd Edlinger 
---
 .../net/wireless/realtek/rtlwifi/rtl8723ae/dm.c| 87 +-
 1 file changed, 85 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c 
b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c
index 902b944..acfd54c 100644
--- a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c
+++ b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c
@@ -673,7 +673,7 @@ void rtl8723e_dm_check_txpower_tracking(struct ieee80211_hw 
*hw)
 void rtl8723e_dm_init_rate_adaptive_mask(struct ieee80211_hw *hw)
 {
struct rtl_priv *rtlpriv = rtl_priv(hw);
-   struct rate_adaptive *p_ra = &(rtlpriv->ra);
+   struct rate_adaptive *p_ra = >ra;
 
p_ra->ratr_state = DM_RATR_STA_INIT;
p_ra->pre_ratr_state = DM_RATR_STA_INIT;
@@ -685,6 +685,89 @@ void rtl8723e_dm_init_rate_adaptive_mask(struct 
ieee80211_hw *hw)
 
 }
 
+void rtl8723e_dm_refresh_rate_adaptive_mask(struct ieee80211_hw *hw)
+{
+   struct rtl_priv *rtlpriv = rtl_priv(hw);
+   struct rtl_hal *rtlhal = rtl_hal(rtl_priv(hw));
+   struct rtl_mac *mac = rtl_mac(rtl_priv(hw));
+   struct rate_adaptive *p_ra = >ra;
+   u32 low_rssithresh_for_ra, high_rssithresh_for_ra;
+   struct ieee80211_sta *sta = NULL;
+
+   if (is_hal_stop(rtlhal)) {
+   RT_TRACE(rtlpriv, COMP_RATE, DBG_LOUD,
+" driver is going to unload\n");
+   return;
+   }
+
+   if (!rtlpriv->dm.useramask) {
+   RT_TRACE(rtlpriv, COMP_RATE, DBG_LOUD,
+" driver does not control rate adaptive mask\n");
+   return;
+   }
+
+   if (mac->link_state == MAC80211_LINKED &&
+   mac->opmode == NL80211_IFTYPE_STATION) {
+   switch (p_ra->pre_ratr_state) {
+   case DM_RATR_STA_HIGH:
+   high_rssithresh_for_ra = 50;
+   low_rssithresh_for_ra = 20;
+   break;
+   case DM_RATR_STA_MIDDLE:
+   high_rssithresh_for_ra = 55;
+   low_rssithresh_for_ra = 20;
+   break;
+   case DM_RATR_STA_LOW:
+   high_rssithresh_for_ra = 60;
+   low_rssithresh_for_ra = 25;
+   break;
+   default:
+   high_rssithresh_for_ra = 50;
+   low_rssithresh_for_ra = 20;
+   break;
+   }
+
+   if (rtlpriv->link_info.bcn_rx_inperiod == 0)
+   switch (p_ra->pre_ratr_state) {
+   case DM_RATR_STA_HIGH:
+   default:
+   p_ra->ratr_state = DM_RATR_STA_MIDDLE;
+   break;
+   case DM_RATR_STA_MIDDLE:
+   case DM_RATR_STA_LOW:
+   p_ra->ratr_state = DM_RATR_STA_LOW;
+   break;
+   }
+   else if (rtlpriv->dm.undec_sm_pwdb > high_rssithresh_for_ra)
+   p_ra->ratr_state = DM_RATR_STA_HIGH;
+   else if (rtlpriv->dm.undec_sm_pwdb > low_rssithresh_for_ra)
+   p_ra->ratr_state = DM_RATR_STA_MIDDLE;
+   else
+   p_ra->ratr_state = DM_RATR_STA_LOW;
+
+   if (p_ra->pre_ratr_state != p_ra->ratr_state) {
+   RT_TRACE(rtlpriv, COMP_RATE, DBG_LOUD,
+"RSSI = %ld\n",
+rtlpriv->dm.undec_sm_pwdb);
+   RT_TRACE(rtlpriv, COMP_RATE, DBG_LOUD,
+"RSSI_LEVEL = %d\n", p_ra->ratr_state);
+   RT_TRACE(rtlpriv, COMP_RATE, DBG_LOUD,
+"PreState = %d, CurState = %d\n",
+p_ra->pre_ratr_state, p_ra->ratr_state);
+
+   rcu_read_lock();
+   sta = rtl_find_sta(hw, mac->bssid);
+   if (sta)
+   rtlpriv->cfg->ops->update_rate_tbl(hw, sta,
+  p_ra->ratr_state,
+ true);
+   rcu_read_unlock();
+
+   p_ra->pre_ratr_state = p_ra->ratr_state;
+   }
+   }
+}
+
 void rtl8723e_dm_rf_saving(struct

[PATCH v2 4/4] rtlwifi: Move the clearing of

2019-01-05 Thread Bernd Edlinger
 rtlpriv->link_info.num_rx_inperiod in rtl_watchdog_wq_callback a few lines
 down

This is necessary since it is still used in the "AP off" detection
code block. Moved clearing of rtlpriv->link_info.num_rx_inperiod
as well for consistency.

Signed-off-by: Bernd Edlinger 
---
 drivers/net/wireless/realtek/rtlwifi/base.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtlwifi/base.c 
b/drivers/net/wireless/realtek/rtlwifi/base.c
index ef9b502..7aa68fe 100644
--- a/drivers/net/wireless/realtek/rtlwifi/base.c
+++ b/drivers/net/wireless/realtek/rtlwifi/base.c
@@ -2172,8 +2172,6 @@ void rtl_watchdog_wq_callback(void *data)
;
}
 
-   rtlpriv->link_info.num_rx_inperiod = 0;
-   rtlpriv->link_info.num_tx_inperiod = 0;
for (tid = 0; tid <= 7; tid++)
rtlpriv->link_info.tidtx_inperiod[tid] = 0;
 
@@ -2236,6 +2234,8 @@ void rtl_watchdog_wq_callback(void *data)
rtlpriv->btcoexist.btc_info.in_4way = false;
}
 
+   rtlpriv->link_info.num_rx_inperiod = 0;
+   rtlpriv->link_info.num_tx_inperiod = 0;
rtlpriv->link_info.bcn_rx_inperiod = 0;
 
/* <6> scan list */
-- 
1.9.1


[PATCH v2 2/4] rtlwifi: rtl8723ae: Don't use dm.undec_sm_pwdb for input

2019-01-05 Thread Bernd Edlinger
 gain control when no beacon was received in the connected state

When no beacon was received, the value in dm.undec_sm_pwdb is most
likely out of date and should not be used to adjust the input path.
Assume instead that the signal level is low.

Fix the state machine in rtl8723e_dm_cck_packet_detection_thresh
which did not clear pre_cck_fa_state when changing cur_cck_pd_state
from CCK_PD_STAGE_LOWRSSI/CCK_FA_STAGE_LOW to CCK_PD_STAGE_HIGHRSSI
and back again to CCK_PD_STAGE_LOWRSSI/CCK_FA_STAGE_LOW, the register
RCCK0_CCA not written to 0x83 on the second change.

Explicitly initialize pre_cck_fa_state/cur_cck_fa_state in
rtl_dm_diginit.

Signed-off-by: Bernd Edlinger 
---
 drivers/net/wireless/realtek/rtlwifi/core.c | 2 ++
 drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c | 8 
 2 files changed, 10 insertions(+)

diff --git a/drivers/net/wireless/realtek/rtlwifi/core.c 
b/drivers/net/wireless/realtek/rtlwifi/core.c
index 4bf7967..ce23339 100644
--- a/drivers/net/wireless/realtek/rtlwifi/core.c
+++ b/drivers/net/wireless/realtek/rtlwifi/core.c
@@ -1957,5 +1957,7 @@ void rtl_dm_diginit(struct ieee80211_hw *hw, u32 
cur_igvalue)
dm_digtable->bt30_cur_igi = 0x32;
dm_digtable->pre_cck_pd_state = CCK_PD_STAGE_MAX;
dm_digtable->cur_cck_pd_state = CCK_PD_STAGE_LOWRSSI;
+   dm_digtable->pre_cck_fa_state = 0;
+   dm_digtable->cur_cck_fa_state = 0;
 }
 EXPORT_SYMBOL(rtl_dm_diginit);
diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c 
b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c
index 42a6fba..902b944 100644
--- a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c
+++ b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c
@@ -151,8 +151,14 @@ static u8 rtl8723e_dm_initial_gain_min_pwdb(struct 
ieee80211_hw *hw)
 {
struct rtl_priv *rtlpriv = rtl_priv(hw);
struct dig_t *dm_digtable = >dm_digtable;
+   struct rtl_mac *mac = rtl_mac(rtlpriv);
long rssi_val_min = 0;
 
+   if (mac->link_state == MAC80211_LINKED &&
+   mac->opmode == NL80211_IFTYPE_STATION &&
+   rtlpriv->link_info.bcn_rx_inperiod == 0)
+   return 0;
+
if ((dm_digtable->curmultista_cstate == DIG_MULTISTA_CONNECT) &&
(dm_digtable->cursta_cstate == DIG_STA_CONNECT)) {
if (rtlpriv->dm.entry_min_undec_sm_pwdb != 0)
@@ -417,6 +423,8 @@ static void rtl8723e_dm_cck_packet_detection_thresh(struct 
ieee80211_hw *hw)
} else {
rtl_set_bbreg(hw, RCCK0_CCA, MASKBYTE2, 0xcd);
rtl_set_bbreg(hw, RCCK0_SYSTEM, MASKBYTE1, 0x47);
+   dm_digtable->pre_cck_fa_state = 0;
+   dm_digtable->cur_cck_fa_state = 0;
 
}
dm_digtable->pre_cck_pd_state = dm_digtable->cur_cck_pd_state;
-- 
1.9.1


[PATCH v2 1/4] rtlwifi: rtl8723ae: Take the FW LPS mode handling out

2019-01-05 Thread Bernd Edlinger
This appears to trigger a firmware bug and causes severe
problems with rtl8723ae PCI devices.

When the power save mode is activated for longer periods
of time the firmware stops to receive any packets.

This problem was exposed by commit 873ffe154ae0 ("rtlwifi:
Fix logic error in enter/exit power-save mode").

Previously the power save mode was only active rarely and
only for a short time so that the problem was not noticeable.

Signed-off-by: Bernd Edlinger 
---
 drivers/net/wireless/realtek/rtlwifi/rtl8723ae/sw.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/sw.c 
b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/sw.c
index 07b82700d1..3103151 100644
--- a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/sw.c
+++ b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/sw.c
@@ -266,8 +266,8 @@ static bool is_fw_header(struct rtlwifi_firmware_header 
*hdr)
 static struct rtl_mod_params rtl8723e_mod_params = {
.sw_crypto = false,
.inactiveps = true,
-   .swctrl_lps = false,
-   .fwctrl_lps = true,
+   .swctrl_lps = true,
+   .fwctrl_lps = false,
.aspm_support = 1,
.debug_level = 0,
.debug_mask = 0,
@@ -395,8 +395,8 @@ static bool is_fw_header(struct rtlwifi_firmware_header 
*hdr)
   bool, 0444);
 MODULE_PARM_DESC(swenc, "Set to 1 for software crypto (default 0)\n");
 MODULE_PARM_DESC(ips, "Set to 0 to not use link power save (default 1)\n");
-MODULE_PARM_DESC(swlps, "Set to 1 to use SW control power save (default 0)\n");
-MODULE_PARM_DESC(fwlps, "Set to 1 to use FW control power save (default 1)\n");
+MODULE_PARM_DESC(swlps, "Set to 1 to use SW control power save (default 1)\n");
+MODULE_PARM_DESC(fwlps, "Set to 1 to use FW control power save (default 0)\n");
 MODULE_PARM_DESC(msi, "Set to 1 to use MSI interrupts mode (default 0)\n");
 MODULE_PARM_DESC(aspm, "Set to 1 to enable ASPM (default 1)\n");
 MODULE_PARM_DESC(debug_level, "Set debug level (0-5) (default 0)");
-- 
1.9.1


[PATCH v2 0/4] rtlwifi: Fix issues with rtl8723ae

2019-01-05 Thread Bernd Edlinger
Currently the rtl8723ae driver is broken (since v4.7).

Connection to AP is lost very often, especially when
the signal level is not very good.

The main issue is the power save mode is basically
not working, and seems to trigger a firmware bug.
So I had to take out the FW LPS mode handling.

While debugging the driver I found a couple related issues,
for instance that the signal level in dm.undec_sm_pwdb
is no longer accurate (may be even much too high) when no more
packets are received, and it increases the likelihood to receive
something if the input gain is set to maximum.

The patch was tested with the rtl8723ae PCI card in my laptop
against a FRITZ!Box 7590 AP -- the WiFi connection works now
very reliable for me.

V2 of the patch adjusts the defaults of swlps and fwlps module
parameters to match the firmware capabilities instead of removing
the whole code, so it can be easily re-activated once a firmware
update is available.


Bernd Edlinger (4):
  rtlwifi: rtl8723ae: Take the FW LPS mode handling out
  rtlwifi: rtl8723ae: Don't use dm.undec_sm_pwdb for input gain control 
   when no beacon was received in the connected state
  rtlwifi: rtl8723ae: Re-introduce
rtl8723e_dm_refresh_rate_adaptive_mask
  rtlwifi: Move the clearing of rtlpriv->link_info.num_rx_inperiod in   
 rtl_watchdog_wq_callback a few lines down

 drivers/net/wireless/realtek/rtlwifi/base.c|  4 +-
 drivers/net/wireless/realtek/rtlwifi/core.c|  2 +
 .../net/wireless/realtek/rtlwifi/rtl8723ae/dm.c| 95 +-
 .../net/wireless/realtek/rtlwifi/rtl8723ae/sw.c|  8 +-
 4 files changed, 101 insertions(+), 8 deletions(-)

-- 
1.9.1


Re: [PATCH 1/4] rtlwifi: rtl8723ae: Take the FW LPS mode handling out

2019-01-05 Thread Bernd Edlinger
On 1/5/19 5:13 PM, Larry Finger wrote:
>> but this works:
>>
>> modprobe rtl8723ae debug_mask=0x debug_level=5 swlps=1 fwlps=0
> 
> Yes, I think that is a better thing to do now. If and when Realtek finds a 
> firmware bug, and when the new firmware is readily available, then there will 
> not be a lot of code to reinstall.
> 

Okay, my assumption of how the firmware bug "works" is this:

Once the firmware enters power save mode, it will deliver exactly one (or maybe 
two)
Rx interrupts. If one of those triggers the driver to leave the power save mode 
again,
the firmware continues to work.
If those are only beacons, they won't leave the power save mode.  Then the 
firmware
will usually not recover.
Since prior to commit 873ffe154ae0 ("rtlwifi: Fix logic error in enter/exit 
power-save mode")
the power save mode was only activated when there _is_ busy traffic, the next
packet did usually wake the firmware, rarely it did freeze however.
Other things like changing the cck_packet_detection_threshold or 
refresh_rate_adaptive_mask
can also kick the firmware back to life.

Hope this helps to track down the root cause of this bug.


Thanks
Bernd.


Re: [PATCH 1/4] rtlwifi: rtl8723ae: Take the FW LPS mode handling out

2019-01-05 Thread Bernd Edlinger
On 1/5/19 3:44 AM, Larry Finger wrote:
> On 1/4/19 6:48 AM, Bernd Edlinger wrote:
>> This appears to trigger a firmware bug and causes severe
>> problems with rtl8723ae PCI devices.
>>
>> When the power save mode is activated for longer periods
>> of time the firmware stops to receive any packets.
>>
>> This problem was exposed by commit 873ffe154ae0 ("rtlwifi:
>> Fix logic error in enter/exit power-save mode").
>>
>> Previously the power save mode was only active rarely and
>> only for a short time so that the problem was not noticeable.
>>
>> Signed-off-by: Bernd Edlinger 
>> ---
> 
> While the Realtek firmware group has a chance to look for a bug, I would like 
> you to perform a couple of tests on the original code.
> 
> The driver has three module parameters that affect power save. The 'modinfo 
> rtl8723ae' command lists them as
> 
> parm:   ips:Set to 0 to not use link power save (default 1) (bool)
> parm:   swlps:Set to 1 to use SW control power save (default 0) (bool)
> parm:   fwlps:Set to 1 to use FW control power save (default 1) (bool)
> 
> If you were to load rtl8723ae with 'ips=0', does it still fail?
> If you were to load the driver with 'swlps=1 fwlps=0', does it still fail?
> 

this does not work:

modprobe rtl8723ae debug_mask=0x debug_level=5 ips=0

tail -f /var/log/syslog|grep "AP off" 
Jan  5 11:42:06 w-ed kernel: [ 7267.229713] rtlwifi: :<0> AP off for 2 s
Jan  5 11:42:08 w-ed kernel: [ 7269.276761] rtlwifi: :<0> AP off for 4 s
Jan  5 11:42:10 w-ed kernel: [ 7271.323758] rtlwifi: :<0> AP off for 6 s
Jan  5 11:42:12 w-ed kernel: [ 7273.370759] rtlwifi: :<0> AP off for 8 s
Jan  5 11:42:14 w-ed kernel: [ 7275.417753] rtlwifi: :<0> AP off for 10 s
Jan  5 11:42:14 w-ed kernel: [ 7275.417754] rtlwifi: AP off, try to reconnect 
now
Jan  5 11:42:28 w-ed kernel: [ 7289.746676] rtlwifi: :<0> AP off for 2 s
Jan  5 11:42:40 w-ed kernel: [ 7302.028327] rtlwifi: :<0> AP off for 2 s
Jan  5 11:42:43 w-ed kernel: [ 7304.075327] rtlwifi: :<0> AP off for 4 s
Jan  5 11:42:45 w-ed kernel: [ 7306.122330] rtlwifi: :<0> AP off for 6 s
Jan  5 11:42:47 w-ed kernel: [ 7308.169292] rtlwifi: :<0> AP off for 8 s
Jan  5 11:42:49 w-ed kernel: [ 7310.216236] rtlwifi: :<0> AP off for 10 s
Jan  5 11:42:49 w-ed kernel: [ 7310.216238] rtlwifi: AP off, try to reconnect 
now
Jan  5 11:43:05 w-ed kernel: [ 7326.59] rtlwifi: :<0> AP off for 2 s
Jan  5 11:43:07 w-ed kernel: [ 7328.639076] rtlwifi: :<0> AP off for 4 s
Jan  5 11:43:09 w-ed kernel: [ 7330.686220] rtlwifi: :<0> AP off for 6 s
Jan  5 11:43:11 w-ed kernel: [ 7332.733078] rtlwifi: :<0> AP off for 8 s
Jan  5 11:43:13 w-ed kernel: [ 7334.779988] rtlwifi: :<0> AP off for 10 s
Jan  5 11:43:13 w-ed kernel: [ 7334.779989] rtlwifi: AP off, try to reconnect 
now
Jan  5 11:43:28 w-ed kernel: [ 7349.108839] rtlwifi: :<0> AP off for 2 s
Jan  5 11:43:30 w-ed kernel: [ 7351.155837] rtlwifi: :<0> AP off for 4 s
Jan  5 11:43:32 w-ed kernel: [ 7353.202838] rtlwifi: :<0> AP off for 6 s
Jan  5 11:43:42 w-ed kernel: [ 7363.437779] rtlwifi: :<0> AP off for 2 s
Jan  5 11:43:46 w-ed kernel: [ 7367.531622] rtlwifi: :<0> AP off for 2 s
Jan  5 11:43:48 w-ed kernel: [ 7369.578597] rtlwifi: :<0> AP off for 4 s
Jan  5 11:43:50 w-ed kernel: [ 7371.625694] rtlwifi: :<0> AP off for 6 s
Jan  5 11:43:52 w-ed kernel: [ 7373.672691] rtlwifi: :<0> AP off for 8 s
Jan  5 11:43:54 w-ed kernel: [ 7375.719690] rtlwifi: :<0> AP off for 10 s
Jan  5 11:43:54 w-ed kernel: [ 7375.719691] rtlwifi: AP off, try to reconnect 
now
Jan  5 11:44:09 w-ed kernel: [ 7390.048406] rtlwifi: :<0> AP off for 2 s
Jan  5 11:44:11 w-ed kernel: [ 7392.095678] rtlwifi: :<0> AP off for 4 s
Jan  5 11:44:13 w-ed kernel: [ 7394.142349] rtlwifi: :<0> AP off for 6 s
Jan  5 11:44:15 w-ed kernel: [ 7396.189352] rtlwifi: :<0> AP off for 8 s
Jan  5 11:44:17 w-ed kernel: [ 7398.236352] rtlwifi: :<0> AP off for 10 s
Jan  5 11:44:17 w-ed kernel: [ 7398.236353] rtlwifi: AP off, try to reconnect 
now
Jan  5 11:44:31 w-ed kernel: [ 7412.565079] rtlwifi: :<0> AP off for 2 s
Jan  5 11:44:33 w-ed kernel: [ 7414.612167] rtlwifi: :<0> AP off for 4 s
Jan  5 11:44:35 w-ed kernel: [ 7416.659101] rtlwifi: :<0> AP off for 6 s
Jan  5 11:44:37 w-ed kernel: [ 7418.706035] rtlwifi: :<0> AP off for 8 s
Jan  5 11:44:39 w-ed kernel: [ 7420.753100] rtlwifi: :<0> AP off for 10 s
Jan  5 11:44:39 w-ed kernel: [ 7420.753101] rtlwifi: AP off, try to reconnect 
now
Jan  5 11:44:54 w-ed kernel: [ 7435.081860] rtlwifi: :<0> AP off for 2 s
Jan  5 11:44:56 w-ed kernel: [ 7437.128857] rtlwifi: :<0> AP off for 4 s
Jan  5 11:45:08 w-ed kernel: [ 7449.410653] rtlwifi: :<0> AP off for 2 

[PATCH 1/4] rtlwifi: rtl8723ae: Take the FW LPS mode handling out

2019-01-04 Thread Bernd Edlinger
This appears to trigger a firmware bug and causes severe
problems with rtl8723ae PCI devices.

When the power save mode is activated for longer periods
of time the firmware stops to receive any packets.

This problem was exposed by commit 873ffe154ae0 ("rtlwifi:
Fix logic error in enter/exit power-save mode").

Previously the power save mode was only active rarely and
only for a short time so that the problem was not noticeable.

Signed-off-by: Bernd Edlinger 
---
  .../net/wireless/realtek/rtlwifi/rtl8723ae/fw.c| 20 
  .../net/wireless/realtek/rtlwifi/rtl8723ae/fw.h|  1 -
  .../net/wireless/realtek/rtlwifi/rtl8723ae/hw.c| 56 +-
  3 files changed, 1 insertion(+), 76 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/fw.c 
b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/fw.c
index bf9859f..77833fb 100644
--- a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/fw.c
+++ b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/fw.c
@@ -268,26 +268,6 @@ void rtl8723e_fill_h2c_cmd(struct ieee80211_hw *hw,
   (u8 *)_cmdbuf);
  }
  
-void rtl8723e_set_fw_pwrmode_cmd(struct ieee80211_hw *hw, u8 mode)
-{
-   struct rtl_priv *rtlpriv = rtl_priv(hw);
-   u8 u1_h2c_set_pwrmode[3] = { 0 };
-   struct rtl_ps_ctl *ppsc = rtl_psc(rtl_priv(hw));
-
-   RT_TRACE(rtlpriv, COMP_POWER, DBG_LOUD, "FW LPS mode = %d\n", mode);
-
-   SET_H2CCMD_PWRMODE_PARM_MODE(u1_h2c_set_pwrmode, mode);
-   SET_H2CCMD_PWRMODE_PARM_SMART_PS(u1_h2c_set_pwrmode,
-   (rtlpriv->mac80211.p2p) ? ppsc->smart_ps : 1);
-   SET_H2CCMD_PWRMODE_PARM_BCN_PASS_TIME(u1_h2c_set_pwrmode,
- ppsc->reg_max_lps_awakeintvl);
-
-   RT_PRINT_DATA(rtlpriv, COMP_CMD, DBG_DMESG,
- "rtl8723e_set_fw_rsvdpagepkt(): u1_h2c_set_pwrmode\n",
- u1_h2c_set_pwrmode, 3);
-   rtl8723e_fill_h2c_cmd(hw, H2C_SETPWRMODE, 3, u1_h2c_set_pwrmode);
-}
-
  #define BEACON_PG 0 /* ->1 */
  #define PSPOLL_PG 2
  #define NULL_PG   3
diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/fw.h 
b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/fw.h
index 2e668fc..8618b82 100644
--- a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/fw.h
+++ b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/fw.h
@@ -55,7 +55,6 @@
  
  void rtl8723e_fill_h2c_cmd(struct ieee80211_hw *hw, u8 element_id,
   u32 cmd_len, u8 *p_cmdbuffer);
-void rtl8723e_set_fw_pwrmode_cmd(struct ieee80211_hw *hw, u8 mode);
  void rtl8723e_set_fw_rsvdpagepkt(struct ieee80211_hw *hw, bool b_dl_finished);
  void rtl8723e_set_fw_joinbss_report_cmd(struct ieee80211_hw *hw, u8 mstatus);
  void rtl8723e_set_p2p_ps_offload_cmd(struct ieee80211_hw *hw, u8 
p2p_ps_state);
diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/hw.c 
b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/hw.c
index f783e4a..f0eb356 100644
--- a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/hw.c
+++ b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/hw.c
@@ -408,29 +408,7 @@ void rtl8723e_set_hw_reg(struct ieee80211_hw *hw, u8 
variable, u8 *val)
case HW_VAR_WPA_CONFIG:
rtl_write_byte(rtlpriv, REG_SECCFG, *((u8 *)val));
break;
-   case HW_VAR_SET_RPWM:{
-   u8 rpwm_val;
-
-   rpwm_val = rtl_read_byte(rtlpriv, REG_PCIE_HRPWM);
-   udelay(1);
-
-   if (rpwm_val & BIT(7)) {
-   rtl_write_byte(rtlpriv, REG_PCIE_HRPWM,
-  (*(u8 *)val));
-   } else {
-   rtl_write_byte(rtlpriv, REG_PCIE_HRPWM,
-  ((*(u8 *)val) | BIT(7)));
-   }
-
-   break;
-   }
case HW_VAR_H2C_FW_PWRMODE:{
-   u8 psmode = (*(u8 *)val);
-
-   if (psmode != FW_PS_ACTIVE_MODE)
-   rtl8723e_dm_rf_saving(hw, true);
-
-   rtl8723e_set_fw_pwrmode_cmd(hw, (*(u8 *)val));
break;
}
case HW_VAR_FW_PSMODE_STATUS:
@@ -513,39 +491,7 @@ void rtl8723e_set_hw_reg(struct ieee80211_hw *hw, u8 
variable, u8 *val)
break;
}
case HW_VAR_FW_LPS_ACTION:{
-   bool b_enter_fwlps = *((bool *)val);
-   u8 rpwm_val, fw_pwrmode;
-   bool fw_current_inps;
-
-   if (b_enter_fwlps) {
-   rpwm_val = 0x02;/* RF off */
-   fw_current_inps = true;
-   rtlpriv->cfg->ops->set_hw_reg(hw,
-   

[PATCH 2/4] rtlwifi: rtl8723ae: Don't use dm.undec_sm_pwdb for input

2019-01-04 Thread Bernd Edlinger
  gain control when no beacon was received in the connected state.

This avoids sporadic "Connection to AP ... lost" errors.

Signed-off-by: Bernd Edlinger 
---
  drivers/net/wireless/realtek/rtlwifi/core.c | 2 ++
  drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c | 8 
  2 files changed, 10 insertions(+)

diff --git a/drivers/net/wireless/realtek/rtlwifi/core.c 
b/drivers/net/wireless/realtek/rtlwifi/core.c
index 4bf7967..ce23339 100644
--- a/drivers/net/wireless/realtek/rtlwifi/core.c
+++ b/drivers/net/wireless/realtek/rtlwifi/core.c
@@ -1957,5 +1957,7 @@ void rtl_dm_diginit(struct ieee80211_hw *hw, u32 
cur_igvalue)
dm_digtable->bt30_cur_igi = 0x32;
dm_digtable->pre_cck_pd_state = CCK_PD_STAGE_MAX;
dm_digtable->cur_cck_pd_state = CCK_PD_STAGE_LOWRSSI;
+   dm_digtable->pre_cck_fa_state = 0;
+   dm_digtable->cur_cck_fa_state = 0;
  }
  EXPORT_SYMBOL(rtl_dm_diginit);
diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c 
b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c
index 42a6fba..902b944 100644
--- a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c
+++ b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c
@@ -151,8 +151,14 @@ static u8 rtl8723e_dm_initial_gain_min_pwdb(struct 
ieee80211_hw *hw)
  {
struct rtl_priv *rtlpriv = rtl_priv(hw);
struct dig_t *dm_digtable = >dm_digtable;
+   struct rtl_mac *mac = rtl_mac(rtlpriv);
long rssi_val_min = 0;
  
+   if (mac->link_state == MAC80211_LINKED &&
+   mac->opmode == NL80211_IFTYPE_STATION &&
+   rtlpriv->link_info.bcn_rx_inperiod == 0)
+   return 0;
+
if ((dm_digtable->curmultista_cstate == DIG_MULTISTA_CONNECT) &&
(dm_digtable->cursta_cstate == DIG_STA_CONNECT)) {
if (rtlpriv->dm.entry_min_undec_sm_pwdb != 0)
@@ -417,6 +423,8 @@ static void rtl8723e_dm_cck_packet_detection_thresh(struct 
ieee80211_hw *hw)
} else {
rtl_set_bbreg(hw, RCCK0_CCA, MASKBYTE2, 0xcd);
rtl_set_bbreg(hw, RCCK0_SYSTEM, MASKBYTE1, 0x47);
+   dm_digtable->pre_cck_fa_state = 0;
+   dm_digtable->cur_cck_fa_state = 0;
  
}
dm_digtable->pre_cck_pd_state = dm_digtable->cur_cck_pd_state;
-- 
1.9.1


[PATCH 4/4] rtlwifi: Move the clearing of

2019-01-04 Thread Bernd Edlinger
  rtlpriv->link_info.num_rx_inperiod in rtl_watchdog_wq_callback a few lines
  down.

This is necessary since it is still used in the "AP off" detection
code block. Moved clearing of rtlpriv->link_info.num_rx_inperiod
as well for consistency.

Signed-off-by: Bernd Edlinger 
---
  drivers/net/wireless/realtek/rtlwifi/base.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtlwifi/base.c 
b/drivers/net/wireless/realtek/rtlwifi/base.c
index ef9b502..7aa68fe 100644
--- a/drivers/net/wireless/realtek/rtlwifi/base.c
+++ b/drivers/net/wireless/realtek/rtlwifi/base.c
@@ -2172,8 +2172,6 @@ void rtl_watchdog_wq_callback(void *data)
;
}
  
-   rtlpriv->link_info.num_rx_inperiod = 0;
-   rtlpriv->link_info.num_tx_inperiod = 0;
for (tid = 0; tid <= 7; tid++)
rtlpriv->link_info.tidtx_inperiod[tid] = 0;
  
@@ -2236,6 +2234,8 @@ void rtl_watchdog_wq_callback(void *data)
rtlpriv->btcoexist.btc_info.in_4way = false;
}
  
+   rtlpriv->link_info.num_rx_inperiod = 0;
+   rtlpriv->link_info.num_tx_inperiod = 0;
rtlpriv->link_info.bcn_rx_inperiod = 0;
  
/* <6> scan list */
-- 
1.9.1


[PATCH 0/4] rtlwifi: Fix issues with rtl8723ae

2019-01-04 Thread Bernd Edlinger
Currently the rtl8723ae driver is broken (since v4.7).

Connection to AP is lost very often, especially when
the signal level is not very good.

The main issue is the power save mode is basically
not working, and seems to trigger a firmware bug.
So I had to remove that code.

While debugging the driver I found a couple related issues,
for instance that the signal level in dm.undec_sm_pwdb
is no longer accurate (may be even much too high) when no more
packets are received, and it increases the likelihood to receive
something if the input gain is set to maximum.

The patch was tested with the rtl8723ae PCI card in my laptop
against a FRITZ!Box 7590 AP -- the WiFi connection works now
very reliable for me.


Bernd Edlinger (4):
   rtlwifi: rtl8723ae: Take the FW LPS mode handling out
   rtlwifi: rtl8723ae: Don't use dm.undec_sm_pwdb for input gain control
when no beacon was received in the connected state.
   rtlwifi: rtl8723ae: Re-introduce
 rtl8723e_dm_refresh_rate_adaptive_mask
   rtlwifi: Move the clearing of rtlpriv->link_info.num_rx_inperiod in
  rtl_watchdog_wq_callback a few lines down.

  drivers/net/wireless/realtek/rtlwifi/base.c|  4 +-
  drivers/net/wireless/realtek/rtlwifi/core.c|  2 +
  .../net/wireless/realtek/rtlwifi/rtl8723ae/dm.c| 95 +-
  .../net/wireless/realtek/rtlwifi/rtl8723ae/fw.c| 20 -
  .../net/wireless/realtek/rtlwifi/rtl8723ae/fw.h|  1 -
  .../net/wireless/realtek/rtlwifi/rtl8723ae/hw.c| 56 +
  6 files changed, 98 insertions(+), 80 deletions(-)

-- 
1.9.1


[PATCH 3/4] rtlwifi: rtl8723ae: Re-introduce

2019-01-04 Thread Bernd Edlinger
  rtl8723e_dm_refresh_rate_adaptive_mask

This function was present in a previous version of the code base,
it works just fine for me -- as long as it is not using stale data.

Fixed a style nit in rtl8723e_dm_init_rate_adaptive_mask.

Signed-off-by: Bernd Edlinger 
---
  .../net/wireless/realtek/rtlwifi/rtl8723ae/dm.c| 87 +-
  1 file changed, 85 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c 
b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c
index 902b944..acfd54c 100644
--- a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c
+++ b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c
@@ -673,7 +673,7 @@ void rtl8723e_dm_check_txpower_tracking(struct ieee80211_hw 
*hw)
  void rtl8723e_dm_init_rate_adaptive_mask(struct ieee80211_hw *hw)
  {
struct rtl_priv *rtlpriv = rtl_priv(hw);
-   struct rate_adaptive *p_ra = &(rtlpriv->ra);
+   struct rate_adaptive *p_ra = >ra;
  
p_ra->ratr_state = DM_RATR_STA_INIT;
p_ra->pre_ratr_state = DM_RATR_STA_INIT;
@@ -685,6 +685,89 @@ void rtl8723e_dm_init_rate_adaptive_mask(struct 
ieee80211_hw *hw)
  
  }
  
+void rtl8723e_dm_refresh_rate_adaptive_mask(struct ieee80211_hw *hw)
+{
+   struct rtl_priv *rtlpriv = rtl_priv(hw);
+   struct rtl_hal *rtlhal = rtl_hal(rtl_priv(hw));
+   struct rtl_mac *mac = rtl_mac(rtl_priv(hw));
+   struct rate_adaptive *p_ra = >ra;
+   u32 low_rssithresh_for_ra, high_rssithresh_for_ra;
+   struct ieee80211_sta *sta = NULL;
+
+   if (is_hal_stop(rtlhal)) {
+   RT_TRACE(rtlpriv, COMP_RATE, DBG_LOUD,
+" driver is going to unload\n");
+   return;
+   }
+
+   if (!rtlpriv->dm.useramask) {
+   RT_TRACE(rtlpriv, COMP_RATE, DBG_LOUD,
+" driver does not control rate adaptive mask\n");
+   return;
+   }
+
+   if (mac->link_state == MAC80211_LINKED &&
+   mac->opmode == NL80211_IFTYPE_STATION) {
+   switch (p_ra->pre_ratr_state) {
+   case DM_RATR_STA_HIGH:
+   high_rssithresh_for_ra = 50;
+   low_rssithresh_for_ra = 20;
+   break;
+   case DM_RATR_STA_MIDDLE:
+   high_rssithresh_for_ra = 55;
+   low_rssithresh_for_ra = 20;
+   break;
+   case DM_RATR_STA_LOW:
+   high_rssithresh_for_ra = 60;
+   low_rssithresh_for_ra = 25;
+   break;
+   default:
+   high_rssithresh_for_ra = 50;
+   low_rssithresh_for_ra = 20;
+   break;
+   }
+
+   if (rtlpriv->link_info.bcn_rx_inperiod == 0)
+   switch (p_ra->pre_ratr_state) {
+   case DM_RATR_STA_HIGH:
+   default:
+   p_ra->ratr_state = DM_RATR_STA_MIDDLE;
+   break;
+   case DM_RATR_STA_MIDDLE:
+   case DM_RATR_STA_LOW:
+   p_ra->ratr_state = DM_RATR_STA_LOW;
+   break;
+   }
+   else if (rtlpriv->dm.undec_sm_pwdb > high_rssithresh_for_ra)
+   p_ra->ratr_state = DM_RATR_STA_HIGH;
+   else if (rtlpriv->dm.undec_sm_pwdb > low_rssithresh_for_ra)
+   p_ra->ratr_state = DM_RATR_STA_MIDDLE;
+   else
+   p_ra->ratr_state = DM_RATR_STA_LOW;
+
+   if (p_ra->pre_ratr_state != p_ra->ratr_state) {
+   RT_TRACE(rtlpriv, COMP_RATE, DBG_LOUD,
+"RSSI = %ld\n",
+rtlpriv->dm.undec_sm_pwdb);
+   RT_TRACE(rtlpriv, COMP_RATE, DBG_LOUD,
+"RSSI_LEVEL = %d\n", p_ra->ratr_state);
+   RT_TRACE(rtlpriv, COMP_RATE, DBG_LOUD,
+"PreState = %d, CurState = %d\n",
+p_ra->pre_ratr_state, p_ra->ratr_state);
+
+   rcu_read_lock();
+   sta = rtl_find_sta(hw, mac->bssid);
+   if (sta)
+   rtlpriv->cfg->ops->update_rate_tbl(hw, sta,
+  p_ra->ratr_state,
+ true);
+   rcu_read_unlock();
+
+   p_ra->pre_ratr_state = p_ra->ratr_state;
+   }
+   }
+}
+
  void rtl8723e_dm_rf_saving(struct 

Re: [PATCHv3] Fix range checks in kernfs_get_target_path

2018-08-26 Thread Bernd Edlinger
Ping...
Sorry, I had actually completely forgotten about this one.

On 07/07/18 19:52, Bernd Edlinger wrote:
> The terminating NUL byte is only there because the buffer is
> allocated with kzalloc(PAGE_SIZE, GFP_KERNEL), but since the
> range-check is off-by-one, and PAGE_SIZE==PATH_MAX, the
> returned string may not be zero-terminated if it is exactly
> PATH_MAX characters long.  Furthermore also the initial loop
> may theoretically exceed PATH_MAX and cause a fault.
> 
> Signed-off-by: Bernd Edlinger 
> ---
>   fs/kernfs/symlink.c | 5 -
>   1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/kernfs/symlink.c b/fs/kernfs/symlink.c
> index 08ccabd..774bada 100644
> --- a/fs/kernfs/symlink.c
> +++ b/fs/kernfs/symlink.c
> @@ -63,6 +63,9 @@ static int kernfs_get_target_path(struct kernfs_node
>   if (base == kn)
>   break;
> 
> +    if ((s - path) + 3 >= PATH_MAX)
> +    return -ENAMETOOLONG;
> +
>   strcpy(s, "../");
>   s += 3;
>   base = base->parent;
> @@ -79,7 +82,7 @@ static int kernfs_get_target_path(struct kernfs_node
>   if (len < 2)
>   return -EINVAL;
>   len--;
> -    if ((s - path) + len > PATH_MAX)
> +    if ((s - path) + len >= PATH_MAX)
>   return -ENAMETOOLONG;
> 
>   /* reverse fillup of target string from target to base */


Re: [PATCHv3] Fix range checks in kernfs_get_target_path

2018-08-26 Thread Bernd Edlinger
Ping...
Sorry, I had actually completely forgotten about this one.

On 07/07/18 19:52, Bernd Edlinger wrote:
> The terminating NUL byte is only there because the buffer is
> allocated with kzalloc(PAGE_SIZE, GFP_KERNEL), but since the
> range-check is off-by-one, and PAGE_SIZE==PATH_MAX, the
> returned string may not be zero-terminated if it is exactly
> PATH_MAX characters long.  Furthermore also the initial loop
> may theoretically exceed PATH_MAX and cause a fault.
> 
> Signed-off-by: Bernd Edlinger 
> ---
>   fs/kernfs/symlink.c | 5 -
>   1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/kernfs/symlink.c b/fs/kernfs/symlink.c
> index 08ccabd..774bada 100644
> --- a/fs/kernfs/symlink.c
> +++ b/fs/kernfs/symlink.c
> @@ -63,6 +63,9 @@ static int kernfs_get_target_path(struct kernfs_node
>   if (base == kn)
>   break;
> 
> +    if ((s - path) + 3 >= PATH_MAX)
> +    return -ENAMETOOLONG;
> +
>   strcpy(s, "../");
>   s += 3;
>   base = base->parent;
> @@ -79,7 +82,7 @@ static int kernfs_get_target_path(struct kernfs_node
>   if (len < 2)
>   return -EINVAL;
>   len--;
> -    if ((s - path) + len > PATH_MAX)
> +    if ((s - path) + len >= PATH_MAX)
>   return -ENAMETOOLONG;
> 
>   /* reverse fillup of target string from target to base */


Re: [PATCH] Fix the oldconfig target rule

2018-07-08 Thread Bernd Edlinger
On 07/08/18 12:41, Masahiro Yamada wrote:
> 2018-07-08 17:09 GMT+09:00 Bernd Edlinger :
>> On 07/08/18 08:31, Bernd Edlinger wrote:
>>> The next make after an oldconfig reads in the outdated
>>> include/config/auto.conf which can kill the make before
>>> it is able to call the syncconfig target.
>>>
>>> $ make defconfig
>>> *** Default configuration is based on 'x86_64_defconfig'
>>> $ make
>>> scripts/kconfig/conf  --syncconfig Kconfig
>>> Makefile:936: *** "Cannot generate ORC metadata for CONFIG_UNWINDER_ORC=y,
>>> please install libelf-dev, libelf-devel or elfutils-libelf-devel".  Stop.
>>>
>>> $ sed -i s/CONFIG_UNWINDER_ORC=y// .configs
>>> $ make oldconfig
>>> Choose kernel unwinder
>>>> 1. ORC unwinder (UNWINDER_ORC) (NEW)
>>> 2. Frame pointer unwinder (UNWINDER_FRAME_POINTER)
>>> choice[1-2?]: 2
>>> $ make
>>> Makefile:936: *** "Cannot generate ORC metadata for CONFIG_UNWINDER_ORC=y,
>>> please install libelf-dev, libelf-devel or elfutils-libelf-devel".  Stop.
>>>
>>> With this patch oldconfig and similar targets like
>>> menuconfig call syncconfig to avoid this failure.
>>>
>>> Signed-off-by: Bernd Edlinger 
>>> ---
>>>scripts/kconfig/Makefile | 1 +
>>>1 file changed, 1 insertion(+)
>>>
>>> diff --git a/scripts/kconfig/Makefile b/scripts/kconfig/Makefile
>>> index a3ac2c9..ef34275 100644
>>> --- a/scripts/kconfig/Makefile
>>> +++ b/scripts/kconfig/Makefile
>>> @@ -62,6 +62,7 @@ PHONY += $(simple-targets)
>>>
>>>$(simple-targets): $(obj)/conf
>>>$< $(silent) --$@ $(Kconfig)
>>> +$(Q)$(MAKE) -f $(srctree)/Makefile syncconfig
>>
>> On second thought, removing the auto.conf might be sufficient:
>> $(Q)rm -f include/config/auto.conf include/config/auto.conf.cmd
>>
>> Would you prefer that?
> 
> No.
> 
> My suggestion is this:
> https://patchwork.kernel.org/patch/10513065/
> 
> 

Ok, thanks a lot.

I tried your patch and it seems reasonable me.
(nit: In the change log entry s/RETPORINE/RETPOLINE/)

However the warning is gone as well, which was probably done
on purpose to push people to install the elf library.

I wonder in general if the test using HOSTCC misses
somehow the possibility that there may be a CROSS_COMPLILE ?


Bernd.


Re: [PATCH] Fix the oldconfig target rule

2018-07-08 Thread Bernd Edlinger
On 07/08/18 12:41, Masahiro Yamada wrote:
> 2018-07-08 17:09 GMT+09:00 Bernd Edlinger :
>> On 07/08/18 08:31, Bernd Edlinger wrote:
>>> The next make after an oldconfig reads in the outdated
>>> include/config/auto.conf which can kill the make before
>>> it is able to call the syncconfig target.
>>>
>>> $ make defconfig
>>> *** Default configuration is based on 'x86_64_defconfig'
>>> $ make
>>> scripts/kconfig/conf  --syncconfig Kconfig
>>> Makefile:936: *** "Cannot generate ORC metadata for CONFIG_UNWINDER_ORC=y,
>>> please install libelf-dev, libelf-devel or elfutils-libelf-devel".  Stop.
>>>
>>> $ sed -i s/CONFIG_UNWINDER_ORC=y// .configs
>>> $ make oldconfig
>>> Choose kernel unwinder
>>>> 1. ORC unwinder (UNWINDER_ORC) (NEW)
>>> 2. Frame pointer unwinder (UNWINDER_FRAME_POINTER)
>>> choice[1-2?]: 2
>>> $ make
>>> Makefile:936: *** "Cannot generate ORC metadata for CONFIG_UNWINDER_ORC=y,
>>> please install libelf-dev, libelf-devel or elfutils-libelf-devel".  Stop.
>>>
>>> With this patch oldconfig and similar targets like
>>> menuconfig call syncconfig to avoid this failure.
>>>
>>> Signed-off-by: Bernd Edlinger 
>>> ---
>>>scripts/kconfig/Makefile | 1 +
>>>1 file changed, 1 insertion(+)
>>>
>>> diff --git a/scripts/kconfig/Makefile b/scripts/kconfig/Makefile
>>> index a3ac2c9..ef34275 100644
>>> --- a/scripts/kconfig/Makefile
>>> +++ b/scripts/kconfig/Makefile
>>> @@ -62,6 +62,7 @@ PHONY += $(simple-targets)
>>>
>>>$(simple-targets): $(obj)/conf
>>>$< $(silent) --$@ $(Kconfig)
>>> +$(Q)$(MAKE) -f $(srctree)/Makefile syncconfig
>>
>> On second thought, removing the auto.conf might be sufficient:
>> $(Q)rm -f include/config/auto.conf include/config/auto.conf.cmd
>>
>> Would you prefer that?
> 
> No.
> 
> My suggestion is this:
> https://patchwork.kernel.org/patch/10513065/
> 
> 

Ok, thanks a lot.

I tried your patch and it seems reasonable me.
(nit: In the change log entry s/RETPORINE/RETPOLINE/)

However the warning is gone as well, which was probably done
on purpose to push people to install the elf library.

I wonder in general if the test using HOSTCC misses
somehow the possibility that there may be a CROSS_COMPLILE ?


Bernd.


Re: [PATCH] Fix the oldconfig target rule

2018-07-08 Thread Bernd Edlinger
On 07/08/18 08:31, Bernd Edlinger wrote:
> The next make after an oldconfig reads in the outdated
> include/config/auto.conf which can kill the make before
> it is able to call the syncconfig target.
> 
> $ make defconfig
> *** Default configuration is based on 'x86_64_defconfig'
> $ make
> scripts/kconfig/conf  --syncconfig Kconfig
> Makefile:936: *** "Cannot generate ORC metadata for CONFIG_UNWINDER_ORC=y,
> please install libelf-dev, libelf-devel or elfutils-libelf-devel".  Stop.
> 
> $ sed -i s/CONFIG_UNWINDER_ORC=y// .configs
> $ make oldconfig
> Choose kernel unwinder
>> 1. ORC unwinder (UNWINDER_ORC) (NEW)
>    2. Frame pointer unwinder (UNWINDER_FRAME_POINTER)
> choice[1-2?]: 2
> $ make
> Makefile:936: *** "Cannot generate ORC metadata for CONFIG_UNWINDER_ORC=y,
> please install libelf-dev, libelf-devel or elfutils-libelf-devel".  Stop.
> 
> With this patch oldconfig and similar targets like
> menuconfig call syncconfig to avoid this failure.
> 
> Signed-off-by: Bernd Edlinger 
> ---
>   scripts/kconfig/Makefile | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/scripts/kconfig/Makefile b/scripts/kconfig/Makefile
> index a3ac2c9..ef34275 100644
> --- a/scripts/kconfig/Makefile
> +++ b/scripts/kconfig/Makefile
> @@ -62,6 +62,7 @@ PHONY += $(simple-targets)
> 
>   $(simple-targets): $(obj)/conf
>   $< $(silent) --$@ $(Kconfig)
> +    $(Q)$(MAKE) -f $(srctree)/Makefile syncconfig

On second thought, removing the auto.conf might be sufficient:
$(Q)rm -f include/config/auto.conf include/config/auto.conf.cmd

Would you prefer that?


> 
>   PHONY += oldnoconfig silentoldconfig savedefconfig defconfig
> 


Re: [PATCH] Fix the oldconfig target rule

2018-07-08 Thread Bernd Edlinger
On 07/08/18 08:31, Bernd Edlinger wrote:
> The next make after an oldconfig reads in the outdated
> include/config/auto.conf which can kill the make before
> it is able to call the syncconfig target.
> 
> $ make defconfig
> *** Default configuration is based on 'x86_64_defconfig'
> $ make
> scripts/kconfig/conf  --syncconfig Kconfig
> Makefile:936: *** "Cannot generate ORC metadata for CONFIG_UNWINDER_ORC=y,
> please install libelf-dev, libelf-devel or elfutils-libelf-devel".  Stop.
> 
> $ sed -i s/CONFIG_UNWINDER_ORC=y// .configs
> $ make oldconfig
> Choose kernel unwinder
>> 1. ORC unwinder (UNWINDER_ORC) (NEW)
>    2. Frame pointer unwinder (UNWINDER_FRAME_POINTER)
> choice[1-2?]: 2
> $ make
> Makefile:936: *** "Cannot generate ORC metadata for CONFIG_UNWINDER_ORC=y,
> please install libelf-dev, libelf-devel or elfutils-libelf-devel".  Stop.
> 
> With this patch oldconfig and similar targets like
> menuconfig call syncconfig to avoid this failure.
> 
> Signed-off-by: Bernd Edlinger 
> ---
>   scripts/kconfig/Makefile | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/scripts/kconfig/Makefile b/scripts/kconfig/Makefile
> index a3ac2c9..ef34275 100644
> --- a/scripts/kconfig/Makefile
> +++ b/scripts/kconfig/Makefile
> @@ -62,6 +62,7 @@ PHONY += $(simple-targets)
> 
>   $(simple-targets): $(obj)/conf
>   $< $(silent) --$@ $(Kconfig)
> +    $(Q)$(MAKE) -f $(srctree)/Makefile syncconfig

On second thought, removing the auto.conf might be sufficient:
$(Q)rm -f include/config/auto.conf include/config/auto.conf.cmd

Would you prefer that?


> 
>   PHONY += oldnoconfig silentoldconfig savedefconfig defconfig
> 


[PATCH] Fix the oldconfig target rule

2018-07-08 Thread Bernd Edlinger
The next make after an oldconfig reads in the outdated
include/config/auto.conf which can kill the make before
it is able to call the syncconfig target.

$ make defconfig
*** Default configuration is based on 'x86_64_defconfig'
$ make
scripts/kconfig/conf  --syncconfig Kconfig
Makefile:936: *** "Cannot generate ORC metadata for CONFIG_UNWINDER_ORC=y,
please install libelf-dev, libelf-devel or elfutils-libelf-devel".  Stop.

$ sed -i s/CONFIG_UNWINDER_ORC=y// .configs
$ make oldconfig
Choose kernel unwinder
> 1. ORC unwinder (UNWINDER_ORC) (NEW)
   2. Frame pointer unwinder (UNWINDER_FRAME_POINTER)
choice[1-2?]: 2
$ make
Makefile:936: *** "Cannot generate ORC metadata for CONFIG_UNWINDER_ORC=y,
please install libelf-dev, libelf-devel or elfutils-libelf-devel".  Stop.

With this patch oldconfig and similar targets like
menuconfig call syncconfig to avoid this failure.

Signed-off-by: Bernd Edlinger 
---
  scripts/kconfig/Makefile | 1 +
  1 file changed, 1 insertion(+)

diff --git a/scripts/kconfig/Makefile b/scripts/kconfig/Makefile
index a3ac2c9..ef34275 100644
--- a/scripts/kconfig/Makefile
+++ b/scripts/kconfig/Makefile
@@ -62,6 +62,7 @@ PHONY += $(simple-targets)
  
  $(simple-targets): $(obj)/conf
$< $(silent) --$@ $(Kconfig)
+   $(Q)$(MAKE) -f $(srctree)/Makefile syncconfig
  
  PHONY += oldnoconfig silentoldconfig savedefconfig defconfig
  
-- 
1.9.1


[PATCH] Fix the oldconfig target rule

2018-07-08 Thread Bernd Edlinger
The next make after an oldconfig reads in the outdated
include/config/auto.conf which can kill the make before
it is able to call the syncconfig target.

$ make defconfig
*** Default configuration is based on 'x86_64_defconfig'
$ make
scripts/kconfig/conf  --syncconfig Kconfig
Makefile:936: *** "Cannot generate ORC metadata for CONFIG_UNWINDER_ORC=y,
please install libelf-dev, libelf-devel or elfutils-libelf-devel".  Stop.

$ sed -i s/CONFIG_UNWINDER_ORC=y// .configs
$ make oldconfig
Choose kernel unwinder
> 1. ORC unwinder (UNWINDER_ORC) (NEW)
   2. Frame pointer unwinder (UNWINDER_FRAME_POINTER)
choice[1-2?]: 2
$ make
Makefile:936: *** "Cannot generate ORC metadata for CONFIG_UNWINDER_ORC=y,
please install libelf-dev, libelf-devel or elfutils-libelf-devel".  Stop.

With this patch oldconfig and similar targets like
menuconfig call syncconfig to avoid this failure.

Signed-off-by: Bernd Edlinger 
---
  scripts/kconfig/Makefile | 1 +
  1 file changed, 1 insertion(+)

diff --git a/scripts/kconfig/Makefile b/scripts/kconfig/Makefile
index a3ac2c9..ef34275 100644
--- a/scripts/kconfig/Makefile
+++ b/scripts/kconfig/Makefile
@@ -62,6 +62,7 @@ PHONY += $(simple-targets)
  
  $(simple-targets): $(obj)/conf
$< $(silent) --$@ $(Kconfig)
+   $(Q)$(MAKE) -f $(srctree)/Makefile syncconfig
  
  PHONY += oldnoconfig silentoldconfig savedefconfig defconfig
  
-- 
1.9.1


[PATCHv3] Fix range checks in kernfs_get_target_path

2018-07-07 Thread Bernd Edlinger
The terminating NUL byte is only there because the buffer is
allocated with kzalloc(PAGE_SIZE, GFP_KERNEL), but since the
range-check is off-by-one, and PAGE_SIZE==PATH_MAX, the
returned string may not be zero-terminated if it is exactly
PATH_MAX characters long.  Furthermore also the initial loop
may theoretically exceed PATH_MAX and cause a fault.

Signed-off-by: Bernd Edlinger 
---
  fs/kernfs/symlink.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/kernfs/symlink.c b/fs/kernfs/symlink.c
index 08ccabd..774bada 100644
--- a/fs/kernfs/symlink.c
+++ b/fs/kernfs/symlink.c
@@ -63,6 +63,9 @@ static int kernfs_get_target_path(struct kernfs_node
if (base == kn)
break;
  
+   if ((s - path) + 3 >= PATH_MAX)
+   return -ENAMETOOLONG;
+
strcpy(s, "../");
s += 3;
base = base->parent;
@@ -79,7 +82,7 @@ static int kernfs_get_target_path(struct kernfs_node
if (len < 2)
return -EINVAL;
len--;
-   if ((s - path) + len > PATH_MAX)
+   if ((s - path) + len >= PATH_MAX)
return -ENAMETOOLONG;
  
/* reverse fillup of target string from target to base */
-- 
1.9.1


[PATCHv3] Fix range checks in kernfs_get_target_path

2018-07-07 Thread Bernd Edlinger
The terminating NUL byte is only there because the buffer is
allocated with kzalloc(PAGE_SIZE, GFP_KERNEL), but since the
range-check is off-by-one, and PAGE_SIZE==PATH_MAX, the
returned string may not be zero-terminated if it is exactly
PATH_MAX characters long.  Furthermore also the initial loop
may theoretically exceed PATH_MAX and cause a fault.

Signed-off-by: Bernd Edlinger 
---
  fs/kernfs/symlink.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/kernfs/symlink.c b/fs/kernfs/symlink.c
index 08ccabd..774bada 100644
--- a/fs/kernfs/symlink.c
+++ b/fs/kernfs/symlink.c
@@ -63,6 +63,9 @@ static int kernfs_get_target_path(struct kernfs_node
if (base == kn)
break;
  
+   if ((s - path) + 3 >= PATH_MAX)
+   return -ENAMETOOLONG;
+
strcpy(s, "../");
s += 3;
base = base->parent;
@@ -79,7 +82,7 @@ static int kernfs_get_target_path(struct kernfs_node
if (len < 2)
return -EINVAL;
len--;
-   if ((s - path) + len > PATH_MAX)
+   if ((s - path) + len >= PATH_MAX)
return -ENAMETOOLONG;
  
/* reverse fillup of target string from target to base */
-- 
1.9.1


Re: [PATCHv2] Fix range checks in kernfs_get_target_path

2018-07-07 Thread Bernd Edlinger
On 07/07/18 16:01, Greg Kroah-Hartman wrote:
> On Sat, Jul 07, 2018 at 09:41:03AM +0000, Bernd Edlinger wrote:
>> The strncpy causes a warning [-Wstringop-truncation] here,
>> which indicates that it never appends a NUL byte to the path.
>> The NUL byte is only there because the buffer is allocated
>> with kzalloc(PAGE_SIZE, GFP_KERNEL), but since the range-check
>> is also off-by-one, and PAGE_SIZE==PATH_MAX the returned string
>> will not be zero-terminated if it is exactly PATH_MAX characters.
>> Furthermore also the initial loop may theoretically exceed PATH_MAX
>> and cause a fault.
>>
>> Signed-off-by: Bernd Edlinger 
>> ---
>>fs/kernfs/symlink.c | 10 +++---
>>1 file changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/kernfs/symlink.c b/fs/kernfs/symlink.c
>> index 08ccabd..c8b7d44a 100644
>> --- a/fs/kernfs/symlink.c
>> +++ b/fs/kernfs/symlink.c
>> @@ -63,7 +63,10 @@ static int kernfs_get_target_path(struct kernfs_node
>>  if (base == kn)
>>  break;
>>
>> -strcpy(s, "../");
>> +if ((s - path) + 3 >= PATH_MAX)
>> +return -ENAMETOOLONG;
>> +
>> +memcpy(s, "../", 3);
>>  s += 3;
>>  base = base->parent;
>>  }
>> @@ -79,16 +82,17 @@ static int kernfs_get_target_path(struct kernfs_node
>>  if (len < 2)
>>  return -EINVAL;
>>  len--;
>> -if ((s - path) + len > PATH_MAX)
>> +if ((s - path) + len >= PATH_MAX)
>>  return -ENAMETOOLONG;
>>
>>  /* reverse fillup of target string from target to base */
>>  kn = target;
>> +s[len] = '\0';
>>  while (kn->parent && kn != base) {
>>  int slen = strlen(kn->name);
>>
>>  len -= slen;
>> -strncpy(s + len, kn->name, slen);
>> +memcpy(s + len, kn->name, slen);
>>  if (len)
>>  s[--len] = '/';
>>
> 
> This last memcpy replacement has already been applied to my tree, from a
> patch from soeone else, so are you sure all of the other changes are
> also really needed?  Why the extra \0 termination of a string that is
> already terminated?
> 

I did only a code review, but the range checks look really dangerously
wrong.

The string is only zero-terminated because it is allocated in
kernfs_iop_get_link with body = kzalloc(PAGE_SIZE, GFP_KERNEL);

I would recommend to explicitly place a termination in the
buffer, and not rely on the way how the buffer is allocated.

> And why is the first memcpy replacement needed?  gcc doesn't say
> anything about that, does it?
> 

No, that is more or less for efficiency reasons, since it is writing
the NUL which is always overwritten with something different in the
next step.  If the loop is executed zero times, there result is not
explicitly zero-terminated either, so using strcpy is somehow misleading.

Well, I would say personal taste, if the loop below constructs
the string with memcpy the loop above can do the same.

If you prefer the strcpy in the first loop, I have no strong
preference here.


Thanks
Bernd.

> thanks,
> 
> greg k-h
> 


Re: [PATCHv2] Fix range checks in kernfs_get_target_path

2018-07-07 Thread Bernd Edlinger
On 07/07/18 16:01, Greg Kroah-Hartman wrote:
> On Sat, Jul 07, 2018 at 09:41:03AM +0000, Bernd Edlinger wrote:
>> The strncpy causes a warning [-Wstringop-truncation] here,
>> which indicates that it never appends a NUL byte to the path.
>> The NUL byte is only there because the buffer is allocated
>> with kzalloc(PAGE_SIZE, GFP_KERNEL), but since the range-check
>> is also off-by-one, and PAGE_SIZE==PATH_MAX the returned string
>> will not be zero-terminated if it is exactly PATH_MAX characters.
>> Furthermore also the initial loop may theoretically exceed PATH_MAX
>> and cause a fault.
>>
>> Signed-off-by: Bernd Edlinger 
>> ---
>>fs/kernfs/symlink.c | 10 +++---
>>1 file changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/kernfs/symlink.c b/fs/kernfs/symlink.c
>> index 08ccabd..c8b7d44a 100644
>> --- a/fs/kernfs/symlink.c
>> +++ b/fs/kernfs/symlink.c
>> @@ -63,7 +63,10 @@ static int kernfs_get_target_path(struct kernfs_node
>>  if (base == kn)
>>  break;
>>
>> -strcpy(s, "../");
>> +if ((s - path) + 3 >= PATH_MAX)
>> +return -ENAMETOOLONG;
>> +
>> +memcpy(s, "../", 3);
>>  s += 3;
>>  base = base->parent;
>>  }
>> @@ -79,16 +82,17 @@ static int kernfs_get_target_path(struct kernfs_node
>>  if (len < 2)
>>  return -EINVAL;
>>  len--;
>> -if ((s - path) + len > PATH_MAX)
>> +if ((s - path) + len >= PATH_MAX)
>>  return -ENAMETOOLONG;
>>
>>  /* reverse fillup of target string from target to base */
>>  kn = target;
>> +s[len] = '\0';
>>  while (kn->parent && kn != base) {
>>  int slen = strlen(kn->name);
>>
>>  len -= slen;
>> -strncpy(s + len, kn->name, slen);
>> +memcpy(s + len, kn->name, slen);
>>  if (len)
>>  s[--len] = '/';
>>
> 
> This last memcpy replacement has already been applied to my tree, from a
> patch from soeone else, so are you sure all of the other changes are
> also really needed?  Why the extra \0 termination of a string that is
> already terminated?
> 

I did only a code review, but the range checks look really dangerously
wrong.

The string is only zero-terminated because it is allocated in
kernfs_iop_get_link with body = kzalloc(PAGE_SIZE, GFP_KERNEL);

I would recommend to explicitly place a termination in the
buffer, and not rely on the way how the buffer is allocated.

> And why is the first memcpy replacement needed?  gcc doesn't say
> anything about that, does it?
> 

No, that is more or less for efficiency reasons, since it is writing
the NUL which is always overwritten with something different in the
next step.  If the loop is executed zero times, there result is not
explicitly zero-terminated either, so using strcpy is somehow misleading.

Well, I would say personal taste, if the loop below constructs
the string with memcpy the loop above can do the same.

If you prefer the strcpy in the first loop, I have no strong
preference here.


Thanks
Bernd.

> thanks,
> 
> greg k-h
> 


[PATCHv2] Fix range checks in kernfs_get_target_path

2018-07-07 Thread Bernd Edlinger
The strncpy causes a warning [-Wstringop-truncation] here,
which indicates that it never appends a NUL byte to the path.
The NUL byte is only there because the buffer is allocated
with kzalloc(PAGE_SIZE, GFP_KERNEL), but since the range-check
is also off-by-one, and PAGE_SIZE==PATH_MAX the returned string
will not be zero-terminated if it is exactly PATH_MAX characters.
Furthermore also the initial loop may theoretically exceed PATH_MAX
and cause a fault.

Signed-off-by: Bernd Edlinger 
---
  fs/kernfs/symlink.c | 10 +++---
  1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/kernfs/symlink.c b/fs/kernfs/symlink.c
index 08ccabd..c8b7d44a 100644
--- a/fs/kernfs/symlink.c
+++ b/fs/kernfs/symlink.c
@@ -63,7 +63,10 @@ static int kernfs_get_target_path(struct kernfs_node
if (base == kn)
break;

-   strcpy(s, "../");
+   if ((s - path) + 3 >= PATH_MAX)
+   return -ENAMETOOLONG;
+
+   memcpy(s, "../", 3);
s += 3;
base = base->parent;
}
@@ -79,16 +82,17 @@ static int kernfs_get_target_path(struct kernfs_node
if (len < 2)
return -EINVAL;
len--;
-   if ((s - path) + len > PATH_MAX)
+   if ((s - path) + len >= PATH_MAX)
return -ENAMETOOLONG;

/* reverse fillup of target string from target to base */
kn = target;
+   s[len] = '\0';
while (kn->parent && kn != base) {
int slen = strlen(kn->name);

len -= slen;
-   strncpy(s + len, kn->name, slen);
+   memcpy(s + len, kn->name, slen);
if (len)
s[--len] = '/';

-- 
1.9.1


[PATCHv2] Fix range checks in kernfs_get_target_path

2018-07-07 Thread Bernd Edlinger
The strncpy causes a warning [-Wstringop-truncation] here,
which indicates that it never appends a NUL byte to the path.
The NUL byte is only there because the buffer is allocated
with kzalloc(PAGE_SIZE, GFP_KERNEL), but since the range-check
is also off-by-one, and PAGE_SIZE==PATH_MAX the returned string
will not be zero-terminated if it is exactly PATH_MAX characters.
Furthermore also the initial loop may theoretically exceed PATH_MAX
and cause a fault.

Signed-off-by: Bernd Edlinger 
---
  fs/kernfs/symlink.c | 10 +++---
  1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/kernfs/symlink.c b/fs/kernfs/symlink.c
index 08ccabd..c8b7d44a 100644
--- a/fs/kernfs/symlink.c
+++ b/fs/kernfs/symlink.c
@@ -63,7 +63,10 @@ static int kernfs_get_target_path(struct kernfs_node
if (base == kn)
break;

-   strcpy(s, "../");
+   if ((s - path) + 3 >= PATH_MAX)
+   return -ENAMETOOLONG;
+
+   memcpy(s, "../", 3);
s += 3;
base = base->parent;
}
@@ -79,16 +82,17 @@ static int kernfs_get_target_path(struct kernfs_node
if (len < 2)
return -EINVAL;
len--;
-   if ((s - path) + len > PATH_MAX)
+   if ((s - path) + len >= PATH_MAX)
return -ENAMETOOLONG;

/* reverse fillup of target string from target to base */
kn = target;
+   s[len] = '\0';
while (kn->parent && kn != base) {
int slen = strlen(kn->name);

len -= slen;
-   strncpy(s + len, kn->name, slen);
+   memcpy(s + len, kn->name, slen);
if (len)
s[--len] = '/';

-- 
1.9.1


[PATCH] Fix range checks in kernfs_get_target_path

2018-07-06 Thread Bernd Edlinger
The strncpy causes a warning [-Wstringop-truncation] here,
which indicates that it never appends a NUL byte to the path.
The NUL byte is only there because the buffer is allocated
with kzalloc(PAGE_SIZE, GFP_KERNEL), but since the range-check
is also off-by-one, and PAGE_SIZE==PATH_MAX the returned string
will not be zero-terminated if it is exactly PATH_MAX characters.
Furthermore also the initial loop may theoretically exceed PATH_MAX
and cause a fault.

Signed-off-by: Bernd Edlinger 
---
  fs/kernfs/symlink.c | 10 +++---
  1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/kernfs/symlink.c b/fs/kernfs/symlink.c
index 08ccabd..c8b7d44a 100644
--- a/fs/kernfs/symlink.c
+++ b/fs/kernfs/symlink.c
@@ -63,7 +63,10 @@ static int kernfs_get_target_path(struct kernfs_node 
*parent,
if (base == kn)
break;

-   strcpy(s, "../");
+   if ((s - path) + 3 >= PATH_MAX)
+   return -ENAMETOOLONG;
+
+   memcpy(s, "../", 3);
s += 3;
base = base->parent;
}
@@ -79,16 +82,17 @@ static int kernfs_get_target_path(struct kernfs_node 
*parent,
if (len < 2)
return -EINVAL;
len--;
-   if ((s - path) + len > PATH_MAX)
+   if ((s - path) + len >= PATH_MAX)
return -ENAMETOOLONG;

/* reverse fillup of target string from target to base */
kn = target;
+   s[len] = '\0';
while (kn->parent && kn != base) {
int slen = strlen(kn->name);

len -= slen;
-   strncpy(s + len, kn->name, slen);
+   memcpy(s + len, kn->name, slen);
if (len)
s[--len] = '/';

-- 
1.9.1


[PATCH] Fix range checks in kernfs_get_target_path

2018-07-06 Thread Bernd Edlinger
The strncpy causes a warning [-Wstringop-truncation] here,
which indicates that it never appends a NUL byte to the path.
The NUL byte is only there because the buffer is allocated
with kzalloc(PAGE_SIZE, GFP_KERNEL), but since the range-check
is also off-by-one, and PAGE_SIZE==PATH_MAX the returned string
will not be zero-terminated if it is exactly PATH_MAX characters.
Furthermore also the initial loop may theoretically exceed PATH_MAX
and cause a fault.

Signed-off-by: Bernd Edlinger 
---
  fs/kernfs/symlink.c | 10 +++---
  1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/kernfs/symlink.c b/fs/kernfs/symlink.c
index 08ccabd..c8b7d44a 100644
--- a/fs/kernfs/symlink.c
+++ b/fs/kernfs/symlink.c
@@ -63,7 +63,10 @@ static int kernfs_get_target_path(struct kernfs_node 
*parent,
if (base == kn)
break;

-   strcpy(s, "../");
+   if ((s - path) + 3 >= PATH_MAX)
+   return -ENAMETOOLONG;
+
+   memcpy(s, "../", 3);
s += 3;
base = base->parent;
}
@@ -79,16 +82,17 @@ static int kernfs_get_target_path(struct kernfs_node 
*parent,
if (len < 2)
return -EINVAL;
len--;
-   if ((s - path) + len > PATH_MAX)
+   if ((s - path) + len >= PATH_MAX)
return -ENAMETOOLONG;

/* reverse fillup of target string from target to base */
kn = target;
+   s[len] = '\0';
while (kn->parent && kn != base) {
int slen = strlen(kn->name);

len -= slen;
-   strncpy(s + len, kn->name, slen);
+   memcpy(s + len, kn->name, slen);
if (len)
s[--len] = '/';

-- 
1.9.1


[PATCH] Add a configure option ARM_ERRATA_WFE_BROKEN

2018-06-09 Thread Bernd Edlinger
This works around problems with Altera Cyclone5 Devices. See
https://bugzilla.kernel.org/show_bug.cgi?id=111951

I just wanted to let you know that we use this patch since more than
two years now and the problem never repeated for us.

There is one possibly related observation with
CONFIG_PL310_ERRATA_769419 which was always enabled, although it
should be unnecessary according to the comment in
arch/arm/mm/cache-l2x0.c:
  * 769419: PL310 R0P0->R3P1, fixed R3P2.
  *  Affects: store buffer
  *  store buffer is not automatically drained.

This erratum should not be needed as we use R3P3, but when this
was disabled a kind of dead-lock happened after some time.

It might be connected to this problem since ERRATA_769419 injects
a wmb in the arch_cpu_idle_enter which is one of the last actions
before the wfi instruction which happens in the idle function.

So the current working theory is the cache might freeze before
all data is written out.

Signed-off-by: Bernd Edlinger 
---
  arch/arm/Kconfig| 7 +++
  arch/arm/include/asm/spinlock.h | 7 +++
  2 files changed, 14 insertions(+)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index a7f8e7f..f96af3b 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1215,6 +1215,13 @@ config ARM_ERRATA_852423
  config option from the A12 erratum due to the way errata are checked
  for and handled.

+config ARM_ERRATA_WFE_BROKEN
+   bool "ARM errata: wfe broken"
+   depends on CPU_V7 && SMP
+   help
+ This option disables the wfe instruction for raw_spin_locks.
+ At least some CycloneV SoC devices wait much too long in wfe.
+
  endmenu

  source "arch/arm/common/Kconfig"
diff --git a/arch/arm/include/asm/spinlock.h 
b/arch/arm/include/asm/spinlock.h
index 099c78f..7efc431 100644
--- a/arch/arm/include/asm/spinlock.h
+++ b/arch/arm/include/asm/spinlock.h
@@ -10,6 +10,12 @@
  #include 
  #include 

+#ifdef CONFIG_ARM_ERRATA_WFE_BROKEN
+#undef wfe
+#define wfe()
+#define WFE(x)
+#define dsb_sev()
+#else
  /*
   * sev and wfe are ARMv6K extensions.  Uniprocessor ARMv6 may not have 
the K
   * extensions, so when running on UP, we have to patch these 
instructions away.
@@ -44,6 +50,7 @@ static inline void dsb_sev(void)
dsb(ishst);
__asm__(SEV);
  }
+#endif

  /*
   * ARMv6 ticket-based spin-locking.
-- 
2.7.4


[PATCH] Add a configure option ARM_ERRATA_WFE_BROKEN

2018-06-09 Thread Bernd Edlinger
This works around problems with Altera Cyclone5 Devices. See
https://bugzilla.kernel.org/show_bug.cgi?id=111951

I just wanted to let you know that we use this patch since more than
two years now and the problem never repeated for us.

There is one possibly related observation with
CONFIG_PL310_ERRATA_769419 which was always enabled, although it
should be unnecessary according to the comment in
arch/arm/mm/cache-l2x0.c:
  * 769419: PL310 R0P0->R3P1, fixed R3P2.
  *  Affects: store buffer
  *  store buffer is not automatically drained.

This erratum should not be needed as we use R3P3, but when this
was disabled a kind of dead-lock happened after some time.

It might be connected to this problem since ERRATA_769419 injects
a wmb in the arch_cpu_idle_enter which is one of the last actions
before the wfi instruction which happens in the idle function.

So the current working theory is the cache might freeze before
all data is written out.

Signed-off-by: Bernd Edlinger 
---
  arch/arm/Kconfig| 7 +++
  arch/arm/include/asm/spinlock.h | 7 +++
  2 files changed, 14 insertions(+)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index a7f8e7f..f96af3b 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1215,6 +1215,13 @@ config ARM_ERRATA_852423
  config option from the A12 erratum due to the way errata are checked
  for and handled.

+config ARM_ERRATA_WFE_BROKEN
+   bool "ARM errata: wfe broken"
+   depends on CPU_V7 && SMP
+   help
+ This option disables the wfe instruction for raw_spin_locks.
+ At least some CycloneV SoC devices wait much too long in wfe.
+
  endmenu

  source "arch/arm/common/Kconfig"
diff --git a/arch/arm/include/asm/spinlock.h 
b/arch/arm/include/asm/spinlock.h
index 099c78f..7efc431 100644
--- a/arch/arm/include/asm/spinlock.h
+++ b/arch/arm/include/asm/spinlock.h
@@ -10,6 +10,12 @@
  #include 
  #include 

+#ifdef CONFIG_ARM_ERRATA_WFE_BROKEN
+#undef wfe
+#define wfe()
+#define WFE(x)
+#define dsb_sev()
+#else
  /*
   * sev and wfe are ARMv6K extensions.  Uniprocessor ARMv6 may not have 
the K
   * extensions, so when running on UP, we have to patch these 
instructions away.
@@ -44,6 +50,7 @@ static inline void dsb_sev(void)
dsb(ishst);
__asm__(SEV);
  }
+#endif

  /*
   * ARMv6 ticket-based spin-locking.
-- 
2.7.4


[PATCH] stmmac: Don't access tx_q->dirty_tx before netif_tx_lock

2017-10-21 Thread Bernd Edlinger
This is the possible reason for different hard to reproduce
problems on my ARMv7-SMP test system.

The symptoms are in recent kernels imprecise external aborts,
and in older kernels various kinds of network stalls and
unexpected page allocation failures.

My testing indicates that the trouble started between v4.5 and v4.6
and prevails up to v4.14.

Using the dirty_tx before acquiring the spin lock is clearly
wrong and was first introduced with v4.6.

Fixes: e3ad57c96715 ("stmmac: review RX/TX ring management")

Signed-off-by: Bernd Edlinger <bernd.edlin...@hotmail.de>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 1763e48..c8e280fa 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1800,12 +1800,13 @@ static void stmmac_tx_clean(struct stmmac_priv *priv, 
u32 queue)
 {
struct stmmac_tx_queue *tx_q = >tx_queue[queue];
unsigned int bytes_compl = 0, pkts_compl = 0;
-   unsigned int entry = tx_q->dirty_tx;
+   unsigned int entry;
 
netif_tx_lock(priv->dev);
 
priv->xstats.tx_clean++;
 
+   entry = tx_q->dirty_tx;
while (entry != tx_q->cur_tx) {
struct sk_buff *skb = tx_q->tx_skbuff[entry];
struct dma_desc *p;
-- 
1.9.1


[PATCH] stmmac: Don't access tx_q->dirty_tx before netif_tx_lock

2017-10-21 Thread Bernd Edlinger
This is the possible reason for different hard to reproduce
problems on my ARMv7-SMP test system.

The symptoms are in recent kernels imprecise external aborts,
and in older kernels various kinds of network stalls and
unexpected page allocation failures.

My testing indicates that the trouble started between v4.5 and v4.6
and prevails up to v4.14.

Using the dirty_tx before acquiring the spin lock is clearly
wrong and was first introduced with v4.6.

Fixes: e3ad57c96715 ("stmmac: review RX/TX ring management")

Signed-off-by: Bernd Edlinger 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 1763e48..c8e280fa 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1800,12 +1800,13 @@ static void stmmac_tx_clean(struct stmmac_priv *priv, 
u32 queue)
 {
struct stmmac_tx_queue *tx_q = >tx_queue[queue];
unsigned int bytes_compl = 0, pkts_compl = 0;
-   unsigned int entry = tx_q->dirty_tx;
+   unsigned int entry;
 
netif_tx_lock(priv->dev);
 
priv->xstats.tx_clean++;
 
+   entry = tx_q->dirty_tx;
while (entry != tx_q->cur_tx) {
struct sk_buff *skb = tx_q->tx_skbuff[entry];
struct dma_desc *p;
-- 
1.9.1