[389-devel] 389 DS nightly 2019-11-20 - 95% PASS

2019-11-19 Thread vashirov
https://fedorapeople.org/groups/389ds/ci/nightly/2019/11/20/report-389-ds-base-1.4.1.9-1.fc31.x86_64.html


[389-devel] Re: Performance discussion

2019-11-19 Thread William Brown


> On 14 Nov 2019, at 22:33, Ludwig Krispenz  wrote:
> 
> 
> On 11/14/2019 12:17 PM, William Brown wrote:
>> 
>>> On 14 Nov 2019, at 19:06, Ludwig Krispenz  wrote:
>>> 
>>> 
>>> On 11/14/2019 09:29 AM, William Brown wrote:
> On 14 Nov 2019, at 18:22, Ludwig Krispenz  wrote:
> 
> Hi William,
> 
> before further thinking about this, I need some clarification, or maybe I
> just missed this. When you talk about 1..16 threads, do you mean worker
> threads?
 Server worker threads. ldclt is set to use only 10 client threads - which 
 makes it surprising that we see a decline when workers > 10 (one would 
 assume throughput should stabilise).
 
> Or concurrent client connection threads in ldclt/rsearch/ - how many 
> concurrent connections do you have and how does varying this number 
> change results ?
 I will add more tests to this to allow varying the ldclt numbers.
>>> ok, and I assume that you are using a version with nunc-stans removed; 
>>> could you please also verify the effect of turbo mode on/off?
>> Correct, I'm using git master. Yes, I'll check that also. I plan to add 
>> permutations like this to the test harness so it's easier for us to repeat 
>> in the future when we make changes.
>> 
>> I also need to find a way to wire in perf/stap so we can generate 
>> flamegraphs from each test run for later analysis.
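
(As a sketch of that wiring in Python - perf's `record -g` call-graph mode is
standard, but the pid handling and timing here are assumptions, and stap could
be wired in similarly:)

    import subprocess

    def record_profile(pid, seconds=30, out="perf.data"):
        # Sample the server process with call graphs for the duration of a
        # test run; perf.data can later be folded into a flamegraph with
        # stackcollapse-perf.pl and flamegraph.pl.
        subprocess.run(["perf", "record", "-g", "-p", str(pid),
                        "-o", out, "--", "sleep", str(seconds)],
                       check=True)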
>> 
>> Thanks for the great ideas :)
> Thanks, and one more idea ;-)
> Can you separate the client and the server onto two different machines? I've 
> seen ldclt and other clients impact CPU usage a lot. There will be some 
> network overhead, but this should be OK (and more realistic)

That was the original goal, but I can't separate it (yet) because we restart to 
change settings ... 

I'm not sure what's the best way to do it - maybe have the tests act as a 
generator, and then you have to run ldclt from a separate machine? Not sure, 
really ... I need to think about what it should look like.
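
(For concreteness, a sketch of the kind of ldclt run being varied, driven from
Python; the flags follow ldclt's documented usage, but the host, suffix, and
ranges here are assumptions:)

    import subprocess

    # 10 client threads (-n 10) doing randomised uid equality searches; the
    # XXXX in the filter is substituted from the -r/-R random range.
    subprocess.run(["ldclt", "-h", "server.example.com", "-p", "389",
                    "-b", "ou=people,dc=example,dc=com",
                    "-f", "uid=userXXXX", "-e", "esearch,random",
                    "-r0", "-R5999", "-n", "10"],
                   check=True)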

I know Viktor did some work on pytest over multiple hosts, so perhaps that 
could help coordinate here too? I think they were also speaking about Ansible 
... maybe he should comment if he has ideas.



>> 
> Regards,
> Ludwig
> 
> On 11/14/2019 03:34 AM, William Brown wrote:
>> Hi all,
>> 
>> After our catch-up, we were discussing performance matters. I decided to 
>> start on this while waiting for some of my tickets to be reviewed, and to 
>> see what's going on.
>> 
>> These tests were carried out on a virtual machine configured with access 
>> to 6 CPUs for the "search 6" (s6) runs, and 12 CPUs for "search 12" (s12). 
>> Both configurations had access to 8GB of RAM.
>> 
>> The host hardware is a 2.2GHz i7 with 6 cores (12 threads) and 32GB of 
>> RAM, with NVMe storage.
>> 
>> The rows are the VM CPUs available, and the columns are the number of 
>> threads in nsslapd-threadnumber. No other variables were changed. The 
>> database has 6000 users and 4000 groups. The instance was restarted 
>> before each test. The search was a randomised uid equality test returning 
>> a single result. I added the 6- and 12-thread columns to match the VM and 
>> host specs rather than just the traditional power-of-two sequence.
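
(A minimal sketch of that randomised single-result query in python-ldap; the
base DN, credentials, and uid naming scheme are assumptions:)

    import random
    import ldap  # python-ldap

    conn = ldap.initialize("ldap://localhost:389")
    conn.simple_bind_s("cn=Directory Manager", "password")  # test credentials
    uid = "user%d" % random.randrange(6000)  # 6000 users in the test database
    results = conn.search_s("ou=people,dc=example,dc=com",
                            ldap.SCOPE_SUBTREE, "(uid=%s)" % uid)
    assert len(results) == 1  # equality search returning a single entry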
>> 
>> I've attached a screenshot of the results, but I have some initial 
>> thoughts on this. What's interesting is our initial 1-thread performance 
>> and how steeply it ramps up towards 4 threads. Even so, it's not a linear 
>> increase: per thread on s6 we go from ~3800 to ~2500 ops per second, and a 
>> similar ratio exists in s12. What is stark is that after t4 we immediately 
>> see a per-thread *decline* despite the greater amount of available compute 
>> resources. This indicates that poor locking and thread coordination are 
>> causing a rapid decline in performance. This was true on both s6 and s12. 
>> The decline intensifies rapidly once we exceed the CPUs available on the 
>> host (s6 between t6 and t12), but throughput still declines even when we 
>> do have the hardware threads available in s12.
>> 
>> I will perform some testing between the t1 and t6 configurations to see if 
>> I can isolate which functions are growing in time consumption.
>> 
>> For now, an early recommendation is that we alter our default CPU 
>> autotuning. Currently we use a curve which starts at 16 threads for 1 to 4 
>> cores, then tapers until 512 cores maps to 512 threads - however, at 
>> almost every point on this curve the autotuned thread count is greater 
>> than the core count. The graph would indicate that this decision only 
>> hurts our performance rather than improving it. I suggest we change our 
>> thread autotuning to a 1:1 ratio of threads to cores to prevent 
>> over-contention on lock resources.
>> 
>> Thanks,
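
(To illustrate the proposed policy - this is not the server's actual C
autotuning code - a minimal 1:1 sketch in Python, with the 512-thread cap
taken from the curve described above:)

    import os

    def autotune_worker_threads(max_threads=512):
        # One worker thread per available core, capped: the suggested 1:1
        # policy, rather than the current 16-thread-minimum curve.
        return min(os.cpu_count() or 1, max_threads)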

[389-devel] Please review gssapi test on suse support 50730

2019-11-19 Thread William Brown
https://pagure.io/389-ds-base/pull-request/50730


--
Sincerely,

William


[389-devel] Re: Couple of troubles around using dsconf

2019-11-19 Thread William Brown


> On 19 Nov 2019, at 22:36, Matus Honek  wrote:
> 
> Hello folks,
> 
> Context: My setup is a running dscontainer with exported /data. While
> developing (outside of the container) I am trying to run `dsconf
> ldapi://%2fpath%2fto%2fdscontainers%2fsocket security get`.
> 
> Issue 1: I get an IndexError exception:
>   File "/home/mhonek/src/ds/up/src/lib389/lib389/_mapped_object.py", line 158, in display
> How do we handle the fact that we can get no results to display, and fix it
> correctly so that nothing else eventually blows up? I don't know...

Raising an issue seems like the first step for dealing with this, especially 
with the stack trace attached.

> 
> Issue 2: Tracing back, I find that I autobound as non-root (non-zero UID).
> Understandable, but still unexpected. So I tried to override this by
> providing `-D` and `-w` explicitly to dsconf. No change, still
> autobinding. It turns out autobind takes preference over simple bind in
> DirSrv.open; this comes from [implementation].
> Possible solution: instead of `elif can_autobind(): ... else:
> simple_bind` do `elif self.binddn is not None: ... elif
> can_autobind(): ...`. Worked for me. Would this blow up some use-case?
> I don't know...

Autobind is super important for LDAPI - you have to prefer autobind or things 
go wildly wrong. But that whole section of DirSrv.open is extremely cursed and 
should never have been structured like that. This is the bed we have made, 
though, so we have to lie in it now.

A major drawback I have been suffering from here is captured in your statement 
- would this blow up some use-case? I have no idea. It's Python, which is 
basically Schrödinger's language: the behaviour is unknown unless it's 
observed running.

I'd suggest we open an issue here and then think about how we could solve it, 
but it won't be easy because all of the `def open()` code needs a rethink - we 
need a better approach to determining how we want to present authentication 
credentials.

> 
> Sub-issue 2a: Given I was able to autobind as a non-root UID, the
> wording of the log message [autobind-log] seems wrong. The word "root"
> probably shouldn't be there?

Sure, just open an issue and fix it? (This is kind of why I want PRs to not 
need an issue: to encourage quicker small fixes like this rather than a lot of 
admin overhead in the process).

> 
> Somewhat troubling 1: At the time open runs the autobind branch in
> DirSrv.open [autobind], the value of `self.bindpw` is literally
> "password", even though no `-D` or `-w` was provided on the command
> line for dsconf. I believe there are some reasons (besides "because the
> code is written that way") why this is so, but I would like to be
> enlightened here.

lib389 didn't always have a topologies module. Previously each instance would 
be set up manually in each test with ds.allocate(); ds.create(); ds.bind() ... 
That meant a lot of "testing" defaults ended up in DirSrv; it became a 
kitchen sink. It was only later that we started to really modularise it out 
with DSLdapObject, topologies and others. That shifted a lot of the defaults 
out, but many fragments of that legacy still exist in DirSrv because it has 
an extremely fragile design. 

Anyway, this default can probably be removed; it shouldn't be needed, and it 
could confuse `def open()`.

In general, I've been trying to push toward local_simple_allocate and 
remote_simple_allocate, but I never got traction on it. The whole way we set 
up DirSrv as a type is really messy :( ... Actually, the whole DirSrv type is 
a small microcosm of evil ... 
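
(A rough sketch of that allocation style; these methods exist in lib389, but
the exact signatures here are from memory and should be treated as
assumptions:)

    from lib389 import DirSrv

    ds = DirSrv()
    # Bind to an existing remote instance without the legacy allocate()
    # property soup.
    ds.remote_simple_allocate("ldap://ds.example.com:389",
                              binddn="cn=Directory Manager",
                              password="directory manager password")
    ds.open()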

Honestly, I would *love* a solid two weeks in a room with you, Simon, Mark, 
and Viktor, where we just clean up and replace DirSrv as a type. That would be 
amazing.

> 
> [implementation] https://pagure.io/389-ds-base/c/e07e489
> [autobind-log] 
> https://pagure.io/389-ds-base/blob/6d70cbe/f/src/lib389/lib389/__init__.py#_1063
> [autobind] 
> https://pagure.io/389-ds-base/blob/6d70cbe/f/src/lib389/lib389/__init__.py#_1060
> 
> Please share your ideas on where I went wrong and what we could go ahead with.

Computers were a mistake; that was where we went wrong :) 

But when you see a problem, you talk about it (which you did), then we fix it. 
:) 

> 
> Thanks,
> Matus
> 
> -- 
> Matúš Honěk
> Software Engineer
> Red Hat Czech

—
Sincerely,

William Brown

Senior Software Engineer, 389 Directory Server
SUSE Labs

[389-devel] Couple of troubles around using dsconf

2019-11-19 Thread Matus Honek
Hello folks,

Context: My setup is a running dscontainer with exported /data. While
developing (outside of the container) I am trying to run `dsconf
ldapi://%2fpath%2fto%2fdscontainers%2fsocket security get`.
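
(The socket path in that URL is percent-encoded; a small sketch of building
such a URL, with the path itself a placeholder:)

    from urllib.parse import quote

    socket_path = "/path/to/dscontainers/socket"
    url = "ldapi://" + quote(socket_path, safe="")
    # -> 'ldapi://%2Fpath%2Fto%2Fdscontainers%2Fsocket'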

Issue 1: I get an IndexError exception:
  File "/home/mhonek/src/ds/up/src/lib389/lib389/_mapped_object.py", line 158, in display
How do we handle the fact that we can get no results to display, and fix it
correctly so that nothing else eventually blows up? I don't know...
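
(Not the actual lib389 code, but a sketch of the defensive shape a fix could
take - check for an empty result before indexing:)

    import ldap

    def display_first(conn, base, filt):
        results = conn.search_s(base, ldap.SCOPE_SUBTREE, filt)
        if not results:
            # Surface a clear error instead of an IndexError
            raise ValueError("no entries matched %s" % filt)
        return results[0]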

Issue 2: Tracing back, I find that I autobound as non-root (non-zero UID).
Understandable, but still unexpected. So I tried to override this by
providing `-D` and `-w` explicitly to dsconf. No change, still
autobinding. It turns out autobind takes preference over simple bind in
DirSrv.open; this comes from [implementation].
Possible solution: instead of `elif can_autobind(): ... else:
simple_bind` do `elif self.binddn is not None: ... elif
can_autobind(): ...`. Worked for me. Would this blow up some use-case?
I don't know...
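
(Paraphrasing that proposal as control flow - not the real DirSrv.open body;
`can_autobind` and `do_autobind` are stand-in names:)

    # inside open(), roughly:
    if self.binddn is not None:
        # Explicit -D/-w wins over LDAPI autobind
        conn.simple_bind_s(self.binddn, self.bindpw)
    elif can_autobind():
        do_autobind(conn)
    else:
        conn.simple_bind_s("", "")  # anonymous fallback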

Sub-issue 2a: Given I was able to autobind as a non-root UID, the wording of
the log message [autobind-log] seems wrong. The word "root" probably
shouldn't be there?

Somewhat troubling 1: At the time open runs the autobind branch in
DirSrv.open [autobind], the value of `self.bindpw` is literally
"password", even though no `-D` or `-w` was provided on the command
line for dsconf. I believe there are some reasons (besides "because the
code is written that way") why this is so, but I would like to be
enlightened here.

[implementation] https://pagure.io/389-ds-base/c/e07e489
[autobind-log] 
https://pagure.io/389-ds-base/blob/6d70cbe/f/src/lib389/lib389/__init__.py#_1063
[autobind] 
https://pagure.io/389-ds-base/blob/6d70cbe/f/src/lib389/lib389/__init__.py#_1060

Please share your ideas on where I went wrong and what we could go ahead with.

Thanks,
Matus

-- 
Matúš Honěk
Software Engineer
Red Hat Czech