[Freeipa-users] Re: Number of concurrent connections are decreased by replication.

2024-01-05 Thread Thierry Bordaz via FreeIPA-users

Hi Jaehwan,

Why is the number of established connections (to the server) a concern?

The vast majority of the connections are client connections. Replication 
connections, especially in a ring topology, would account for a small 
fraction of them. The added hosts generate replication traffic, over the 
replication connections, and would put some CPU load on the destination 
server. ATM I do not see how it would impact the capacity of the 
destination server to accept new connections. The response time of the 
destination server may increase (because of replicated updates); could 
that impact clients opening new connections?
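
To see how connections split between replication and client binds, the per-connection entries under cn=monitor can be classified by bind DN. A minimal sketch; the connection-line layout and the replication bind DN below are illustrative, not taken from the original post:

```shell
# Classify cn=monitor "connection:" lines by bind DN. Sample lines are
# hypothetical; on a live server they would come from e.g.:
#   ldapsearch -D "cn=Directory Manager" -W -b cn=monitor connection
classify() {
  awk '
    /^connection:/ {
      total++
      if ($0 ~ /replication manager/) repl++
    }
    END { printf "total=%d repl=%d client=%d\n", total, repl, total - repl }
  '
}

classify <<'EOF'
connection: 64:20240105:5:4:-:cn=replication manager,cn=config:0
connection: 65:20240105:12:11:-:uid=user1,cn=users,cn=accounts,dc=example,dc=com:0
connection: 66:20240105:3:2:-:uid=user2,cn=users,cn=accounts,dc=example,dc=com:0
EOF
# prints: total=3 repl=1 client=2
```

If the replication count stays small while the client count drops, that would point away from the replication connections themselves.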


By the way, what version of 389-ds are you running?

best regards
thierry

On 1/5/24 04:38, Jaehwan Kim via FreeIPA-users wrote:

Hello.

I recently encountered a problem where the number of concurrent connections 
decreases on our FreeIPA servers.

[Architecture - replication topology]
My replication topology, which is circular (ring-shaped), consists of 13 FreeIPA 
servers.
These 13 servers are grouped into 3 clusters, with 5, 4, and 4 members 
respectively.
NLBs (network load balancers) are assigned to each cluster to distribute client 
requests for IPA login, Kerberos authentication, and LDAP connections.
Therefore the 3 NLBs have 5, 4, and 4 FreeIPA servers in their NLB backend pools, 
respectively.

This architecture has worked successfully for 2 years, but recently I encountered a 
problem where 867 host-add operations per hour to one cluster result in a drop in 
the number of concurrent connections for all clusters.
The command to get the number of concurrent connections is
dsconf -D "cn=Directory Manager" ldap://server.example.com monitor server | 
grep currentconnections:
This command shows about 2K connections on each server.
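
A sketch for correlating the drop with the host-add bursts: sample currentconnections periodically and print the change between samples. The dsconf pipeline in the comment is the command from the post; the `deltas` helper and the literal numbers are illustrative:

```shell
# Print the change in currentconnections between successive samples.
# On a live server the numbers would come from:
#   dsconf -D "cn=Directory Manager" ldap://server.example.com monitor server \
#     | awk -F': ' '/^currentconnections:/ {print $2}'
deltas() {
  awk 'NR > 1 { print $1 - prev } { prev = $1 }'
}

# Illustrative samples taken at a fixed interval; a host-add burst that
# coincides with a large negative delta would support the replication theory.
printf '%s\n' 2000 2010 1500 1490 | deltas
# prints: 10, then -510, then -10 (one per line)
```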

I also found that this symptom doesn't happen on servers to which the replication 
info isn't transferred, even though they are in the same replication topology 
ring.
Hence, I guess that the drop in the number of concurrent connections is related 
to replication.

I tried to tune parameters like
dtablesize = 65535,
repl-release-timeout = 120,
nsslapd-threadnumber = automatic thread tuning,
db and entry cache auto-sizing (nsslapd-cache-autosize = 80),
but without success.

I would like to ask for help solving this, if possible.

Thank you.
JHK
--
___
FreeIPA-users mailing list -- freeipa-users@lists.fedorahosted.org
To unsubscribe send an email to freeipa-users-le...@lists.fedorahosted.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedorahosted.org/archives/list/freeipa-users@lists.fedorahosted.org
Do not reply to spam, report it: 
https://pagure.io/fedora-infrastructure/new_issue



[Freeipa-users] Re: After "writeback to ldap failed" -- silent total freeipa failure / deadlock.

2023-08-09 Thread Thierry Bordaz via FreeIPA-users


On 8/9/23 21:13, Harry G Coin wrote:



On 8/9/23 12:05, Thierry Bordaz wrote:


On 8/9/23 18:55, Harry G Coin wrote:
Thierry asked for a recap summary below, so forgive the 'top post'.  
Here it is:


4.9.10 default install on two systems, call them primary (with 
kasp.db) and secondary, but otherwise multi-master; 1g link between 
them; modest/old cpu, drives, 5G memory; with dns/dnssec and adtrust 
(aimed at local samba share support only).  Unremarkable initial 
install.  Normal operations, GUI, etc.


A python program using the ldap2 backend on Primary starts loading a 
few dozen default domains with A / AAAA and associated PTR 
records.   It first does dns find/show to check for existence, and 
if absent adds the domain/subdomain, missing A / AAAA, assoc PTR 
etc.    Extensive traffic in the logs to do with dnssec, notifies 
being sent back and forth between primary and secondary by bind9 
(which you'd think already had the info in ldap, so why 'notify' via 
bind really?), serial numbers going up, dnssec updates.  Every now 
and then the program checks whether dnssec keys need rotating or if 
new zones appear, but that's fairly infrequent and seems unrelated.


After not more than a few minutes of adding records, "writeback to 
ldap failed" will appear in Primary's log.   There will be nothing 
in any log indicating anything else amiss; 'systemctl 
is-system-running' reports 'running'.  Login attempts on the GUI 
fail 'for an unknown reason'; named/bind9 queries for A/AAAA seem to 
work.  Anything that calls ns-slapd times out or hangs waiting 
forever.  CPU usage near 0.



Did you get a pstack (ns-slapd) at that time ?


Yes, posted 8/8/23 and again now:

The threads 25 and 33 are in fatal deadlock. Alexander suggested, as a 
workaround, to disable retroCL trimming. After you disabled retroCL 
trimming and restarted the instance are you still seeing this kind of 
deadlock ?


best regards
thierry


[root@registry2 ~]# pstack 1405 > ns-slapd3.log
[root@registry2 ~]# more ns-slapd3.log
Thread 33 (Thread 0x7f66366f5700 (LWP 2654)):
#0  0x7f6639b6d455 in pthread_rwlock_wrlock () at 
target:/lib64/libpthread.so.0
#1  0x7f6628e9d380 in map_wrlock () at 
target:/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#2  0x7f6628e8d393 in backend_shr_post_delete_cb.part () at 
target:/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#3  0x7f6628e8d508 in backend_shr_betxn_post_delete_cb () at 
target:/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#4  0x7f663d7bec79 in plugin_call_func (list=0x7f66324c8200, 
operation=operation@entry=563, pb=pb@entry=0x7f65fbdffcc0, 
call_one=call_one@entry=0) at ldap/servers/slapd/plugin.c:2032
#5  0x7f663d7beec4 in plugin_call_list (pb=0x7f65fbdffcc0, 
operation=563, list=<optimized out>) at ldap/servers/slapd/plugin.c:1973
#6  0x7f663d7beec4 in plugin_call_plugins 
(pb=pb@entry=0x7f65fbdffcc0, whichfunction=whichfunction@entry=563) at 
ldap/servers/slapd/plugin.c:442
#7  0x7f662ae6ac83 in ldbm_back_delete (pb=0x7f65fbdffcc0) at 
ldap/servers/slapd/back-ldbm/ldbm_delete.c:1289
#8  0x7f663d7696ac in op_shared_delete 
(pb=pb@entry=0x7f65fbdffcc0) at ldap/servers/slapd/delete.c:338
#9  0x7f663d7698bd in delete_internal_pb 
(pb=pb@entry=0x7f65fbdffcc0) at ldap/servers/slapd/delete.c:209
#10 0x7f663d769b3b in slapi_delete_internal_pb 
(pb=pb@entry=0x7f65fbdffcc0) at ldap/servers/slapd/delete.c:151
#11 0x7f66294c4fde in delete_changerecord (cnum=cnum@entry=27941) 
at ldap/servers/plugins/retrocl/retrocl_trim.c:89
#12 0x7f66294c51a1 in trim_changelog () at 
ldap/servers/plugins/retrocl/retrocl_trim.c:290
#13 0x7f66294c51a1 in changelog_trim_thread_fn (arg=<optimized out>) at ldap/servers/plugins/retrocl/retrocl_trim.c:333

#14 0x7f663a1cd968 in _pt_root () at target:/lib64/libnspr4.so
#15 0x7f6639b681ca in start_thread () at 
target:/lib64/libpthread.so.0

#16 0x7f663be12e73 in clone () at target:/lib64/libc.so.6
Thread 32 (Thread 0x7f65f61fc700 (LWP 1438)):
#0  0x7f6639b6d022 in pthread_rwlock_rdlock () at 
target:/lib64/libpthread.so.0
#1  0x7f6628e9d242 in map_rdlock () at 
target:/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#2  0x7f6628e88298 in backend_bind_cb () at 
target:/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#3  0x7f663d7bec79 in plugin_call_func (list=0x7f66324c7300, 
operation=operation@entry=401, pb=pb@entry=0x7f6600e07580, 
call_one=call_one@entry=0) at ldap/servers/slapd/plugin.c:2032
#4  0x7f663d7beec4 in plugin_call_list (pb=0x7f6600e07580, 
operation=401, list=<optimized out>) at ldap/servers/slapd/plugin.c:1973
#5  0x7f663d7beec4 in plugin_call_plugins 
(pb=pb@entry=0x7f6600e07580, whichfunction=whichfunction@entry=401) at 
ldap/servers/slapd/plugin.c:442
#6  0x55a716413add in ids_sasl_check_bind 
(pb=pb@entry=0x7f6600e07580) at ldap/servers/slapd/saslbind.c:1205
#7  0x55a7163fbd27 in do_bind (pb=pb@entry=0x7f6600e07580) at 
ldap/servers/slapd/bind.c:367
#8  0x55a716401835 in 

[Freeipa-users] Re: After "writeback to ldap failed" -- silent total freeipa failure / deadlock.

2023-08-09 Thread Thierry Bordaz via FreeIPA-users


On 8/9/23 18:55, Harry G Coin wrote:
Thierry asked for a recap summary below, so forgive the 'top post'.  
Here it is:


4.9.10 default install on two systems, call them primary (with kasp.db) 
and secondary, but otherwise multi-master; 1g link between them; 
modest/old cpu, drives, 5G memory; with dns/dnssec and adtrust (aimed 
at local samba share support only).  Unremarkable initial install.  
Normal operations, GUI, etc.


A python program using the ldap2 backend on Primary starts loading a 
few dozen default domains with A / AAAA and associated PTR records.   
It first does dns find/show to check for existence, and if absent adds 
the domain/subdomain, missing A / AAAA, assoc PTR etc.    Extensive 
traffic in the logs to do with dnssec, notifies being sent back and 
forth between primary and secondary by bind9 (which you'd think 
already had the info in ldap, so why 'notify' via bind really?), serial 
numbers going up, dnssec updates.  Every now and then the program 
checks whether dnssec keys need rotating or if new zones appear, but 
that's fairly infrequent and seems unrelated.


After not more than a few minutes of adding records, "writeback to 
ldap failed" will appear in Primary's log.   There will be nothing in any 
log indicating anything else amiss; 'systemctl is-system-running' 
reports 'running'.  Login attempts on the GUI fail 'for an unknown 
reason'; named/bind9 queries for A/AAAA seem to work.  Anything that 
calls ns-slapd times out or hangs waiting forever.  CPU usage near 0.



Did you get a pstack (ns-slapd) at that time ?




'systemctl restart ipa' and or reboot restores operations-- HOWEVER 
there will be at least a 10 minute wait with ns-slapd at 100% CPU 
until the reboot process forcibly kills it.


I guess most ns-slapd workers (threads running the requests) have been 
stopped, but no idea which one is eating CPU. A 'top -H' and pstack 
would help.




Upgrading to 4.9.11 caused the 'writeback to ldap failed' message to 
move to Secondary, not Primary.   Same further consequences.


Alexander's dsconf notion changed the appearance, it broke dnssec 
updates with an LDAP timeout error message.


There is nothing whatever remarkable about this two-node setup. I 
suspect that test environments using the latest processors and all-nvme 
storage are just too performant to manifest it, or the test 
environments don't have dnssec enabled and don't add a few thousand 
records to a few dozen subdomains.


I need some way forward; it's dead in the water now.   Presently my 
'plan', such as it is, is to move the freeipa VMs to faster systems with 
more memory and 10gb interconnects in hopes of not hitting this, but 
of course this is one of those 'sword hanging over everyone's head by 
a thread', 'don't breathe on it wrong or you'll die' situations that 
needs an answer before trust can come back.


I appreciate the focus!


On 8/9/23 11:24, Thierry Bordaz wrote:


On 8/9/23 17:15, Harry G Coin wrote:


On 8/9/23 01:00, Alexander Bokovoy wrote:

On Аўт, 08 жні 2023, Harry G Coin wrote:
Thanks for your help.  Details below. The problem 'moved' in, I 
hope, a diagnostically useful way, but the system remains broken.


On 8/8/23 08:54, Alexander Bokovoy wrote:

On Аўт, 08 жні 2023, Harry G Coin wrote:


On 8/8/23 02:43, Alexander Bokovoy wrote:

pstack $(pgrep ns-slapd) > ns-slapd.log
Tried an upgrade from 4.9.10 to 4.9.11; the "writeback to ldap 
failed" error moved from the primary instance (on which the dns 
records were being added) to the replica, which hung in the same 
fashion.   Here's the log you asked for from attempting 
'systemctl restart dirsrv@...'; it just hangs at 100% cpu for 
about 10 minutes.


Thank you. Are you using schema compat for some legacy clients?



This is a fresh install of 4.9.10 about a week ago, upgraded to 
4.9.11 yesterday, just two freeipa instances and no appreciable 
user load, using the install defaults. The 'in house' system then 
starts loading lots of dns records via the python ldap2 interface 
on the first of two systems installed, the replica produced what 
you see in this post. There is no 'private' information involved 
of any sort, it's supposed to field DNS calls from the public but 
was so unreliable I had to implement unbound on other servers, so 
all freeipa does is IXFR to unbound for the heavy load.  I suppose 
there may be <16 other in-house lab systems, maybe 2 or 3 with any 
activity, that use it for dns.   The only other clue is these are 
running on VMs in older servers and have no other software 
packages installed other than freeipa and what freeipa needs to 
run, and the in-house program that loads the dns.


Just to exclude potential problems with schema compat, it can be
disabled if you are not using it.


How?  The installs just use all the defaults, other than enabling 
dnssec and PTR records for all A/AAAA.


I'm officially in 'desperation mode' as not being able to populate 
DNS in freeipa reduces everyone to pencil and paper and 

[Freeipa-users] Re: After "writeback to ldap failed" -- silent total freeipa failure / deadlock.

2023-08-09 Thread Thierry Bordaz via FreeIPA-users


On 8/9/23 17:15, Harry G Coin wrote:


On 8/9/23 01:00, Alexander Bokovoy wrote:

On Аўт, 08 жні 2023, Harry G Coin wrote:
Thanks for your help.  Details below. The problem 'moved' in, I hope, 
a diagnostically useful way, but the system remains broken.


On 8/8/23 08:54, Alexander Bokovoy wrote:

On Аўт, 08 жні 2023, Harry G Coin wrote:


On 8/8/23 02:43, Alexander Bokovoy wrote:

pstack $(pgrep ns-slapd) > ns-slapd.log
Tried an upgrade from 4.9.10 to 4.9.11; the "writeback to ldap 
failed" error moved from the primary instance (on which the dns 
records were being added) to the replica, which hung in the same 
fashion.   Here's the log you asked for from attempting 'systemctl 
restart dirsrv@...'; it just hangs at 100% cpu for about 10 minutes.


Thank you. Are you using schema compat for some legacy clients?



This is a fresh install of 4.9.10 about a week ago, upgraded to 
4.9.11 yesterday, just two freeipa instances and no appreciable user 
load, using the install defaults.  The 'in house' system then starts 
loading lots of dns records via the python ldap2 interface on the 
first of two systems installed, the replica produced what you see in 
this post.   There is no 'private' information involved of any sort, 
it's supposed to field DNS calls from the public but was so 
unreliable I had to implement unbound on other servers, so all 
freeipa does is IXFR to unbound for the heavy load.  I suppose there 
may be <16 other in-house lab systems, maybe 2 or 3 with any 
activity, that use it for dns.   The only other clue is these are 
running on VMs in older servers and have no other software packages 
installed other than freeipa and what freeipa needs to run, and the 
in-house program that loads the dns.


Just to exclude potential problems with schema compat, it can be
disabled if you are not using it.


How?  The installs just use all the defaults, other than enabling 
dnssec and PTR records for all A/AAAA.


I'm officially in 'desperation mode', as not being able to populate DNS 
in freeipa reduces everyone to pencil and paper and coffee with full 
project stoppage until it's fixed or at least 'worked around'.   So 
anything that 'might help' can be sacrificed so at least 'something' 
works 'somewhat'.   If old AD needs to be 'broken' or 'off' but mostly 
the rest of it 'works sort of', then how do I do it?


Really this can't be hard to reproduce: it's just two instances with a 
1G link between them, each with a pair of old rusty hard drives in an 
lvm mirror using a COW file system, dnssec on, and one of them loading 
lots of dns with reverse pointers for each A/AAAA, with maybe 200 to 
600 PTR records per *arpa and maybe 10-200 records per subdomain, 
maybe 200 domains total.    A couple of python for loops and hey presto, 
you'll see freeipa lock up without notice in your lab as well.  I just 
can't imagine that causing these race conditions to appear, when the 
only important load is DNS adds/finds/shows, should be difficult.


I appreciate the help, and have become officially fearful about 
freeipa.  Maybe it's seldom used extensively for DNS and so my use 
case is an outlier?   Why are so few seeing this?  It's a fully 
default package install, no custom changes to the OS, freeipa, other 
packages.   I don't get it.


Thanks for any leads or help!



Hi Harry,


I agree with Mark, nothing suspicious on Thread 30; it is flushing its txn.
The discussion is quite long; do you mind re-explaining the current 
symptoms?

Is it hanging during updates? Consuming CPU?
Could you run top -H -p <pid> -n 5 -d 3

If it is hanging, could you run 'db_stat -CA -h /dev/shm/slapd-<instance>/ -N'
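
Once top -H identifies the hot thread, its PID column is the LWP that appears in the pstack thread headers, so the two outputs can be joined. A small sketch; the dump format follows the pstack excerpts in this thread, and the helper name is made up:

```shell
# Print the frames of the thread whose LWP (the PID column in `top -H`)
# matches, from a saved pstack dump.
lwp_stack() {  # usage: lwp_stack <lwp> <pstack-file>
  awk -v lwp="$1" '
    /^Thread / { show = (index($0, "LWP " lwp ")") > 0) }
    show
  ' "$2"
}

# e.g.: lwp_stack 2654 ns-slapd3.log   # frames of the thread with LWP 2654
```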


regards
thierry






I don't think it is about named per se; it is a bit of an unfortunate
interop inside ns-slapd between different plugins. bind-dyndb-ldap
relies on the syncrepl extension, whose implementation in ns-slapd
uses the retro changelog content. The retro changelog plugin triggers some
updates that cause the schema compatibility plugin to lock itself up,
depending on the order of updates that the retro changelog captures. We
fixed that in the slapi-nis package some time ago and it *should* be
ignoring the retro changelog changes, but somehow they still propagate
into it. There are a few places in ns-slapd which were addressed just
recently, and those updates might help (out later this year in RHEL).
Disabling schema compat would be the best.
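
A sketch of what disabling schema compat can look like, assuming the standard slapi-nis plugin entry under cn=plugins,cn=config (verify the DN exists on your instance before applying, and restart dirsrv afterwards):

```shell
# Emit the LDIF that turns the Schema Compatibility plugin off; pipe it to
# ldapmodify. The plugin DN is the usual slapi-nis location, shown here as
# an assumption rather than taken from this thread.
compat_disable_ldif() {
  printf '%s\n' \
    'dn: cn=Schema Compatibility,cn=plugins,cn=config' \
    'changetype: modify' \
    'replace: nsslapd-pluginEnabled' \
    'nsslapd-pluginEnabled: off'
}

# compat_disable_ldif | ldapmodify -D "cn=Directory Manager" -W
# systemctl restart dirsrv@YOUR-INSTANCE   # hypothetical instance name
```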

What's worse, every reboot attempt waits the full '9 min 29 secs' 
before systemd forcibly terminates ns-slapd to finish the 'stop job'.


That's why I'm so troubled by all this, it's not like there is any 
interference from anything other than what freeipa puts out there, 
and it just locks with a message that gives no indication of what to 
do about it, with nothing in any logs and 'systemctl 
is-system-running' reports 'running'.


You could easily replicate this:  imagine a simple validation test 
that sets up two freeipa nodes, turns on dnssec, creates some 
domains, then adds A 

[Freeipa-users] Re: How to check the number of read/write locks on /usr/sbin/ns-slapd process?

2022-09-06 Thread Thierry Bordaz via FreeIPA-users

Hi Kathy,

The procedure to diagnose hangs looks nice. My understanding is that it 
assumes that in a deadlock situation, the more threads we have waiting on a 
resource, the more probable a hang/deadlock is. Now, because of the 
dynamics of the server itself (the configuration, the type of 
requests, the monitoring of rwlocks vs. simple locks, the fact that 
gdb is prone to stopping threads on locks), this script may fail to detect a 
hang/deadlock.


ATM I see no specific 8.6 enhancements that could explain why the script 
is failing now.


If the server is no longer responsive to a client request (keepalive 
req), I suggest you collect pstacks, 'top -H -p `pidof ns-slapd`' and 
'db_stat -N -CA'. If it is eating CPU, you may look at the activity of 
the thread consuming CPU. If it is not, it is possibly a deadlock between 
backend/database accesses.


Best regards
thierry

On 8/30/22 11:20 PM, Kathy Zhu via FreeIPA-users wrote:

Hi Team,

We used the following to get the number of rwlocks on the /usr/sbin/ns-slapd 
process in CentOS 7.9 to catch deadlocks:


PID=`pidof ns-slapd`

gdb -ex 'set confirm off' -ex 'set pagination off' -ex 'thread apply 
all bt full' -ex 'quit' /usr/sbin/ns-slapd $PID |& grep '^#0.*lock' | 
grep pthread_rwlock | sort -u
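
The extracted lines can also be counted from a saved 'thread apply all bt' dump instead of a live gdb run; a sketch (note it counts matching frames, whereas the original pipeline deduplicates them with sort -u, and the file name is only an example):

```shell
# Count threads whose top frame (#0) is parked in a pthread_rwlock_* call,
# from a saved backtrace dump.
count_rwlock_waiters() {  # usage: count_rwlock_waiters <bt-dump-file>
  grep '^#0.*lock' "$1" | grep -c pthread_rwlock
}

# e.g.: count_rwlock_waiters ns-slapd-bt.log
```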



That helped us to detect ns-slapd hang caused by deadlocks.


After migrating to Red Hat 8.6, we had a lot of hangs (dirsrv is 
running but not responding) and could not find out why. We used the same 
method as above; however, we are not able to catch anything. I wonder if 
there is a different way to count the rwlocks in Red Hat 8.6?



We realize that there are multiple possible causes of hangs; however, we 
would like to rule out the possibility of a deadlock.



The OS and packages:


Red Hat Enterprise Linux release 8.6 (Ootpa)

ipa-server.x86_64 4.9.8-7.module+el8.6.0+14337+19b76db2 
@rhel-8-for-x86_64-appstream-rpms


slapi-nis-0.56.6-4.module+el8.6.0+12936+736896b2.x86_64

389-ds-base-libs-1.4.3.28-6.module+el8.6.0+14129+983ceada.x86_64

389-ds-base-1.4.3.28-6.module+el8.6.0+14129+983ceada.x86_64



Many thanks.


Kathy.





[Freeipa-users] Re: Need help with confusing query results

2022-02-09 Thread Thierry Bordaz via FreeIPA-users

Hi Edward,

thank you so much for diving down to the root cause. I opened 
https://github.com/389ds/389-ds-base/issues/5158 to track that issue.


regards
thierry

On 2/9/22 1:29 AM, Edward Valley via FreeIPA-users wrote:

Hi,

Finally, I made a bash script that:

1. Receives as arguments a 'base' and a 'filter' (like the fix-up task)
2. Searches for incomplete entries (no entryUUID attribute)
3. Patches the dirsrv schema (99user.ldif) to make the entryUUID attribute 
mutable (removes NO-USER-MODIFICATION)
4. Restarts the dirsrv instance service
5. Generates and sets an entryUUID for every incomplete entry found matching 
the filter
6. Restores the dirsrv schema
7. Restarts the dirsrv instance service

Changes are immediately replicated and everything works like it should.
Like I said before, new entries have an entryUUID attribute generated 
automatically; that was never a problem.
I can share the script if anyone is interested.
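
A minimal sketch of steps 3 and 5 above; the schema path, helper names, and LDIF shape are assumptions, not Edward's actual script:

```shell
# Hypothetical instance path; adjust to your slapd-<instance> directory.
SCHEMA=/etc/dirsrv/slapd-EXAMPLE/schema/99user.ldif

make_mutable() {
  # step 3: drop NO-USER-MODIFICATION so entryUUID can be written
  sed -i 's/ NO-USER-MODIFICATION//' "$SCHEMA"
}

uuid_ldif() {  # step 5: emit a modify LDIF adding a generated entryUUID
  printf 'dn: %s\nchangetype: modify\nadd: entryUUID\nentryUUID: %s\n\n' \
    "$1" "$(uuidgen 2>/dev/null || cat /proc/sys/kernel/random/uuid)"
}

# After restarting dirsrv:
#   uuid_ldif "uid=user1,cn=users,cn=accounts,dc=..." \
#     | ldapmodify -h localhost -D "cn=Directory Manager" -W
```

Restoring the schema afterwards (steps 6 and 7) would simply reverse the sed edit and restart the instance again.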

Thank you all for your work and time.


[Freeipa-users] Re: Need help with confusing query results

2022-02-01 Thread Thierry Bordaz via FreeIPA-users


On 2/1/22 6:50 AM, Edward Valley via FreeIPA-users wrote:

Hi Thierry,

Do you want the output of:
ldapsearch -LLL -h localhost -x -D "cn=Directory Manager" -w "..." \
 -b "cn=users,cn=accounts,dc=..." '(uid=user1)' '*'

Or are you talking about something else?


Hi,

yes, that is the exact command. You may change it to collect more 
internal data by requesting the 'nscpentrywsi' attribute rather than '*'.


ldapsearch -LLL -h localhost -x -D "cn=Directory Manager" -w "..." \
-b "cn=users,cn=accounts,dc=..." '(uid=user1)' nscpentrywsi

regards
thierry



Thanks


[Freeipa-users] Re: Need help with confusing query results

2022-01-31 Thread Thierry Bordaz via FreeIPA-users

Hi Edward,

It looks like the fixup task stops upon the first error. I do not know if 
it is intentional or a bug. The error is possibly related to schema 
checking; could you send the ldif format of the entry 'uid=user1, 
cn=users,...'?


regards
thierry


On 1/29/22 11:36 PM, Edward Valley via FreeIPA-users wrote:

Hi Thierry,

Manually creating the task makes it run, but not with the expected result:

DATE_NOW="$(date +%s)"
ldapmodify -h localhost -D "cn=Directory Manager" -w "..." -a <<EOF
dn: cn=entryuuid_fixup_${DATE_NOW},cn=entryuuid task,cn=tasks,cn=config
objectClass: top
objectClass: extensibleObject
basedn: dc=...
cn: entryuuid_fixup_${DATE_NOW}
EOF

[...] - task_handler -> fixup failed -> uid=user1,cn=users,cn=accounts,dc=... Operation
[...] - INFO - plugins/entryuuid/src/lib.rs:182 - task_handler -> fixup complete, success!

It simply stops when attempting to change the first user matching the filter.
If the filter directly points to a user that already has an entryUUID 
attribute, a success message is printed.

The error is maybe not related to the plugin, but I don't have any replication 
problem.
It isn't clear to me.

Thanks


[Freeipa-users] Re: Need help with confusing query results

2022-01-28 Thread Thierry Bordaz via FreeIPA-users

Hi Edward,

I think you may try to create the task manually

ldapmodify -D "cn=directory manager" -w ... -a <<!
dn: cn=entryuuid_fixup_<date>,cn=entryuuid task,cn=tasks,cn=config
objectClass: top
objectClass: extensibleObject
basedn: <basedn>
cn: entryuuid_fixup_<date>
!

If you want to fix up only specific entries you may add the following 
attribute to the task entry:


filter: <ldap filter>

regards
thierry

On 1/28/22 5:35 PM, Edward Valley via FreeIPA-users wrote:

Hi,
Thanks for the tip.
Any workaround in the mean time?
I couldn't find one.
Thanks


[Freeipa-users] Re: Need help with confusing query results

2022-01-25 Thread Thierry Bordaz via FreeIPA-users

Hi Edward,

would you run 'dsconf localhost config get nsslapd-ignore-virtual-attrs' 
and check its value. It should be 'on'.


Would you retry the same search after  setting it to 'off'  ?

thanks
thierry

On 1/24/22 10:16 PM, Edward Valley via FreeIPA-users wrote:

This is the version installed:
389-ds-base-1.4.3.23-12.module+el8.5.0+722+e2a0b219.x86_64

Thanks


[Freeipa-users] Re: sudorules attribute "entryuuid" not allowed

2021-11-23 Thread Thierry Bordaz via FreeIPA-users

Hi Kees,

Indeed this problem may have arisen because in intermediate CentOS 
builds (without the #4872 fix) we delivered a wrong attribute definition.


ATM we need to get the 'entryuuid' definition on CentOS 7.
I guess it is not present there. You may check with 'ldapsearch -D "DM" 
-b "cn=schema" -o ldif-wrap=no -LLL attributetypes | grep -i entryuuid'.


I see two options:

 * Do a dummy update of the schema (add a dummy attributetype) on
   CentOS 8, so that it contains a recent nsschemaCSN. Then at the next
   replication session, the new definition will be learned by CentOS 7.
 * Stop the CentOS 7 instance, copy the content of 03entryuuid.ldif into
   the 99user.ldif of the instance, and start the instance.
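
A sketch of the second option, assuming the stock 389-ds schema locations; only the attribute definitions are appended so the target file keeps a single 'dn: cn=schema' header:

```shell
# Append only the attributeTypes definitions from the packaged schema file
# to the instance's user schema, keeping the target LDIF valid.
append_schema() {  # usage: append_schema <src-ldif> <dest-99user.ldif>
  grep -i '^attributeTypes' "$1" >> "$2"
}

# With the CentOS 7 instance stopped (instance name is hypothetical):
#   systemctl stop dirsrv@EXAMPLE
#   append_schema /usr/share/dirsrv/schema/03entryuuid.ldif \
#     /etc/dirsrv/slapd-EXAMPLE/schema/99user.ldif
#   systemctl start dirsrv@EXAMPLE
```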

regards
thierry

On 11/23/21 4:12 PM, Kees Bakker wrote:

Hi Thierry,

It was not sufficient to modify 03entryuuid.ldif. I'm still getting 
the attribute "entryuuid" not allowed error on the Centos 7 system.


Do I need to disable the entryUUID plugin? If so, how do I do that?
-- Kees

On 23-11-2021 10:29, Thierry Bordaz wrote:

Hi Kees,

The missing fix #4872 is pretty small [1]. The initial definition of 
entryuuid required a syntax/MR that was not available in previous 
versions, so it broke schema replication in a mixed topology.


An easy workaround is to stop the 1.4.3.23 instance, edit 
/usr/share/dirsrv/schema/03entryuuid.ldif on the 1.4.3.23 installations, 
and restart the server. A dummy update on 1.4.3.23 will trigger the 
replication of the schema definition of 'entryuuid', and the CentOS 7 
instance will then be able to manage the entryuuid attribute.


Regards
thierry


[1] 
https://github.com/389ds/389-ds-base/commit/bce941ec3cdf77eaf4bc3ea744f1df6e5bfd9d38


On 11/23/21 10:17 AM, Kees Bakker via FreeIPA-users wrote:
So, I have 1.4.3.23. A change was made in 1.4.3.26 (commit 
f370a281b8, Issue 4872).

The latest in Centos 8 Stream is 1.4.3.23-10

That leaves me with the following questions.

1. What do I need to do to disable the entryUUID plugin?
2. What do I need to do to fix the current LDAP conflict?
3. Do I really need 389-ds-base 1.4.3.26 or later (if I manage to 
disable the entryUUID plugin)?

-- Kees

On 22-11-2021 20:04, Kees Bakker via FreeIPA-users wrote:

On Centos 7

389-ds-base-snmp-1.3.9.1-13.el7_7.x86_64
389-ds-base-libs-1.3.9.1-13.el7_7.x86_64
389-ds-base-1.3.9.1-13.el7_7.x86_64
389-ds-base-debuginfo-1.3.9.1-13.el7_7.x86_64

On Centos 8 Stream

389-ds-base-1.4.3.23-7.module_el8.5.0+889+90e0384f.x86_64
python3-lib389-1.4.3.23-7.module_el8.5.0+889+90e0384f.noarch
389-ds-base-libs-1.4.3.23-7.module_el8.5.0+889+90e0384f.x86_64
-- Kees

On 22-11-2021 18:39, Florence Blanc-Renaud wrote:

Hi,

the error looks similar to 
https://github.com/389ds/389-ds-base/issues/4872.
The CentOS 8 Streams master probably has a version of 389ds that 
doesn't contain the fix, and has entryuuid plugin enabled (that 
generates an entryuuid attribute). The schema failed to be 
replicated to the CentOS 7 server, and the entryuuid attribute 
present in the entry causes replication issues.


Which versions are installed on the other replicas? You may have 
to disable the entryuuid plugin or update 389ds.

flo


On Mon, Nov 22, 2021 at 3:30 PM Kees Bakker via FreeIPA-users wrote:


Hi,

On my Centos 7 master there was this error message

[19/Nov/2021:11:16:11.863597190 +0100] - ERR -
oc_check_allowed_sv - Entry

"ipaUniqueID=b2211c08-4921-11ec-974b-509a4c9d3b10,cn=sudorules,cn=sudo,dc=example,dc=com"
-- attribute "entryuuid" not allowed
[19/Nov/2021:11:16:26.331298112 +0100] - ERR -
oc_check_allowed_sv - Entry

"ipaUniqueID=b2211c08-4921-11ec-974b-509a4c9d3b10,cn=sudorules,cn=sudo,dc=example,dc=com"
-- attribute "entryuuid" not allowed
[19/Nov/2021:11:16:45.264647201 +0100] - ERR -
oc_check_allowed_sv - Entry

"ipaUniqueID=b2211c08-4921-11ec-974b-509a4c9d3b10,cn=sudorules,cn=sudo,dc=example,dc=com"
-- attribute "entryuuid" not allowed

The sudorule was added via the web GUI on a CentOS 8 Stream master.

The replication more or less succeeded, besides this error
message. However,
* checkipaconsistency reports "LDAP Conflicts" (the Centos 7
master has count 1, the other masters have count 0)
* ipa-healthcheck reports an error too

[
   {
 "source": "ipahealthcheck.ds.replication",
 "kw": {
   "msg": "Replication conflict",
   "glue": false,
   "conflict": "Schema violation",
   "key":

"ipaUniqueID=b2211c08-4921-11ec-974b-509a4c9d3b10,cn=sudorules,cn=sudo,dc=ghs,dc=nl"
 },
 "uuid": "01d364fc-e48e-44bd-9ea8-63db1e800788",
 "duration": "0.001689",
 "when": "20211122070012Z",
 "check": "ReplicationConflictCheck",
 "result": "ERROR"
   }
]

Any advice on how to get rid of the error messages would be
greatly appreciated.
-- 
Kees


[Freeipa-users] Re: 389ds on latest CentOS 8 Steam - broken update ?! - undefined symbol

2021-11-17 Thread Thierry Bordaz via FreeIPA-users

Hi Lejeczek,

It looks like https://bugzilla.redhat.com/show_bug.cgi?id=2023056.

You may work around that issue with 
https://bugzilla.redhat.com/show_bug.cgi?id=2023056#c3. We are still 
looking for the proper way to fix it.


regards
thierry

On 11/17/21 2:16 PM, lejeczek via FreeIPA-users wrote:

Hi guys.

I've just gotten some rpm updates and now 'dirsrv' fails with:

> $ journalctl -lf -o cat -u dirsrv@PRIV-MINE.service
dirsrv@PRIV-MINE.service: Failed with result 'exit-code'.
Failed to start 389 Directory Server PRIV-MINE..
Starting 389 Directory Server PRIV-MINE
[17/Nov/2021:11:09:00.635419084 +] - ERR - symload_report_error - 
Netscape Portable Runtime error -5975: 
/usr/lib64/dirsrv/plugins/libpwdstorage-plugin.so: undefined symbol: 
gost_yescrypt_pwd_storage_scheme_init
[17/Nov/2021:11:09:00.658674244 +] - ERR - symload_report_error - 
Could not load symbol "gost_yescrypt_pwd_storage_scheme_init" from 
"libpwdstorage-plugin" for plugin GOST_YESCRYPT
[17/Nov/2021:11:09:00.662676168 +] - ERR - slapd_bootstrap_config 
- The plugin entry [cn=GOST_YESCRYPT,cn=Password Storage 
Schemes,cn=plugins,cn=config] in the configfile 
/etc/dirsrv/slapd-PRIV-MINE/dse.ldif was invalid. Failed to load 
plugin's init function.
[17/Nov/2021:11:09:00.676121018 +] - EMERG - main - The 
configuration files in directory /etc/dirsrv/slapd-PRIV-MINE could not 
be read or were not found.  Please refer to the error log or output 
for more information.
dirsrv@PRIV-MINE.service: Main process exited, code=exited, 
status=1/FAILURE

dirsrv@PRIV-MINE.service: Failed with result 'exit-code'.

affected RPMs:
389-ds-base-1.4.3.23-10.module_el8.5.0+946+51aba098.x86_64 Tue 16 Nov 
2021 22:20:36 GMT
389-ds-base-libs-1.4.3.23-10.module_el8.5.0+946+51aba098.x86_64 Tue 16 
Nov 2021 22:20:35 GMT


simple:
-> $ dnf downgrade 389-ds-base*
seems like a "fix"

L.
___
FreeIPA-users mailing list -- freeipa-users@lists.fedorahosted.org
To unsubscribe send an email to 
freeipa-users-le...@lists.fedorahosted.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/

List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedorahosted.org/archives/list/freeipa-users@lists.fedorahosted.org
Do not reply to spam on the list, report it: 
https://pagure.io/fedora-infrastructure



[Freeipa-users] Re: IPA slapd parameter tuning

2021-09-17 Thread Thierry Bordaz via FreeIPA-users


On 9/17/21 12:26 AM, Kathy Zhu via FreeIPA-users wrote:

Hi Mark,

If it helps, this is the same ipa server which I posted in subject 
"ipa_check_consistency alerts and ERR - slapd_poll - Timed out" 
yesterday.



Hi Kathy,

The slapd_poll message is likely not related to the DB_PANIC. slapd_poll 
here means that the server was not able to send a result (LDAP client 
not reading?) for longer than nsslapd-ioblocktimeout.

The DB panic is a fatal error of the database that requires a restart of 
the server. The restart will trigger a DB recovery. It is difficult to 
know the root cause of the DB panic. According to the initial message, 
the DB was running deadlock resolution during the panic; you may try 
giving priority to updates (setting nsslapd-db-deadlock-policy: 6), 
which can significantly reduce DB deadlocks.
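A minimal sketch of that setting with ldapmodify follows; the config entry DN shown is the one used by the ldbm/BDB backend on 1.3.x/1.4.x, so verify it on your version before applying:

```ldif
# Prefer writers during BDB deadlock resolution (DB_LOCK_MINWRITE = 6).
dn: cn=config,cn=ldbm database,cn=plugins,cn=config
changetype: modify
replace: nsslapd-db-deadlock-policy
nsslapd-db-deadlock-policy: 6
```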


regards
thierry


Thanks.

Kathy.

On Thu, Sep 16, 2021 at 2:57 PM Kathy Zhu wrote:

Thanks, Mark, for your reply.

The following repeats in /var/log/dirsrv/slapd-EXAMPLE-COM/errors:
...

[16/Sep/2021:08:34:27.880349688 -0700] - CRIT -
deadlock_threadmain - Serious Error---Failed in deadlock detect
(aborted at 0x0), err=-30973 (BDB0087 DB_RUNRECOVERY: Fatal error,
run database recovery)

[16/Sep/2021:08:34:27.980810867 -0700] - ERR - libdb - BDB0060
PANIC: fatal region error detected; run recovery

[16/Sep/2021:08:34:27.981036823 -0700] - CRIT -
deadlock_threadmain - Serious Error---Failed in deadlock detect
(aborted at 0x0), err=-30973 (BDB0087 DB_RUNRECOVERY: Fatal error,
run database recovery)

[16/Sep/2021:08:34:28.031642976 -0700] - ERR - libdb - BDB0060
PANIC: fatal region error detected; run recovery

[16/Sep/2021:08:34:28.031856673 -0700] - ERR - trickle_threadmain
- Serious Error---Failed to trickle, err=-30973 (BDB0087
DB_RUNRECOVERY: Fatal error, run database recovery)

[16/Sep/2021:08:34:28.081390783 -0700] - ERR - libdb - BDB0060
PANIC: fatal region error detected; run recovery

[16/Sep/2021:08:34:28.081634618 -0700] - CRIT -
deadlock_threadmain - Serious Error---Failed in deadlock detect
(aborted at 0x0), err=-30973 (BDB0087 DB_RUNRECOVERY: Fatal error,
run database recovery)

[16/Sep/2021:08:34:28.181946001 -0700] - ERR - libdb - BDB0060
PANIC: fatal region error detected; run recovery

[16/Sep/2021:08:34:28.182160603 -0700] - CRIT -
deadlock_threadmain - Serious Error---Failed in deadlock detect
(aborted at 0x0), err=-30973 (BDB0087 DB_RUNRECOVERY: Fatal error,
run database recovery)

[16/Sep/2021:08:34:28.282366716 -0700] - ERR - libdb - BDB0060
PANIC: fatal region error detected; run recovery

[16/Sep/2021:08:34:28.282650113 -0700] - ERR - trickle_threadmain
- Serious Error---Failed to trickle, err=-30973 (BDB0087
DB_RUNRECOVERY: Fatal error, run database recovery)

[16/Sep/2021:08:34:28.283083329 -0700] - ERR - libdb - BDB0060
PANIC: fatal region error detected; run recovery

...

Thanks!

Kathy.


On Thu, Sep 16, 2021 at 2:38 PM Mark Reynolds wrote:


On 9/16/21 5:20 PM, Kathy Zhu via FreeIPA-users wrote:

Hi List,

One of my ipa server's database had issue and left many log
entries like the following in messages and slapd errors log:

Sep 16 08:34:28 ipa0 ns-slapd:
[16/Sep/2021:08:34:28.886632992 -0700] - ERR - libdb -
BDB0060 PANIC: fatal region error detected; run recovery

Sep 16 08:34:29 ipa0 ns-slapd:
[16/Sep/2021:08:34:28.987593487 -0700] - ERR - libdb -
BDB0060 PANIC: fatal region error detected; run recovery

Sep 16 08:34:29 ipa0 ns-slapd:
[16/Sep/2021:08:34:29.035181321 -0700] - ERR - libdb -
BDB0060 PANIC: fatal region error detected; run recovery


Is there anything else in the error log around these
messages?  This is kind of a generic error, and increasing the
DN cache is not a guarantee it will resolve this.


Restarting ipa fixed the issue. I googled for the root cause and
found the verified solution, https://access.redhat.com/solutions/3098131,
which is to increase nsslapd-dncachememsize to a reasonable value
(>150MB). This sounds easy; however, all slapd cache parameters are
related. The Red Hat Directory Server performance tuning guide explains
a bit:

https://access.redhat.com/documentation/en-us/red_hat_directory_server/10/html/performance_tuning_guide/memoryusage

However, I wonder if there is a better guide.


Not really :-)  There is a RHDS 11 version, but I think the
performance tuning part is the same as RHDS 10.


Mark



Thanks.

Kathy.


[Freeipa-users] Re: permission on ldap subtree

2021-07-07 Thread Thierry Bordaz via FreeIPA-users

Hi,

The client application did a search request with a filter testing the 
'objectclass' attribute. The connection was unbound (anonymous), so the 
server was looking for an aci granting anonymous access 
(userdn = "ldap:///anyone") to 'objectclass' on the entry cn=oradev1. 
As no such aci exists, the entry was skipped.


Is it expected to allow anonymous requests? If yes, then you may add 
'objectclass' to the target definition of the anonymous aci.
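If anonymous access is intended, a sketch of such an aci could look like this (the entry DN and acl name below are illustrative, not taken from the original post; adapt them to your actual layout):

```ldif
# Grant anonymous read/search/compare on objectClass under cn=OracleContext.
# Entry DN and acl name are examples only.
dn: cn=OracleContext,dc=ipadev,dc=example,dc=com
changetype: modify
add: aci
aci: (targetattr = "objectClass")(version 3.0; acl "Anonymous read of
  objectClass"; allow (read, search, compare) userdn = "ldap:///anyone";)
```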


best regards
thierry



On 7/7/21 9:36 AM, iulian roman via FreeIPA-users wrote:

After enabling the debug , in the logs I see access denied:

[07/Jul/2021:09:27:58.612128660 +0200] - DEBUG - NSACLPlugin - print_access_control_summary - 
conn=11 op=1 (main): Deny search on 
entry(cn=oradev1,cn=oraclecontext,dc=ipadev,dc=example,dc=com).attr(objectClass) to anonymous: no 
aci matched the subject by aci(22): aciname= "Admin can manage any entry", 
acidn="dc=ipadev,dc=example,dc=com"

I do not know if I need to add some extra filters in the permission, or 
what the permission rule should look like. I do not know either if it is 
case sensitive or not (although in the query and LDAP I have 
cn=OracleContext, in the logs I see cn=oraclecontext), therefore I am a 
bit confused here.
Any help would be really appreciated.
___
FreeIPA-users mailing list -- freeipa-users@lists.fedorahosted.org
To unsubscribe send an email to freeipa-users-le...@lists.fedorahosted.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedorahosted.org/archives/list/freeipa-users@lists.fedorahosted.org
Do not reply to spam on the list, report it: 
https://pagure.io/fedora-infrastructure



[Freeipa-users] Re: Consumer failed to replay change Operations error (1)

2021-06-17 Thread Thierry Bordaz via FreeIPA-users

Hello Alfred,

If it is an IPA deployment, I doubt that you hit [1], because it only 
applies to read-only replicas (hub/consumer). Also, this bug is fixed in 
the version you are running.


The consumer (redactedauth0003.redacted.com) fails to apply a replicated 
MOD targeting the admin group. It is not clear whether the failure 
occurs at the changelog update or the RUV update. It looks like a 
permanent failure, so you may enable the replication debug log in case 
it gives more details on why it is failing.
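As a sketch, the replication debug log can be enabled by raising the error log level (8192 is the replication debugging level in 389-ds); remember to set it back afterwards, as it is very verbose:

```ldif
# Turn on replication debug logging (level 8192); revert when done.
dn: cn=config
changetype: modify
replace: nsslapd-errorlog-level
nsslapd-errorlog-level: 8192
```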


regards
thierry



[1] https://bugzilla.redhat.com/show_bug.cgi?id=1574602 



On 6/17/21 11:12 AM, Florence Renaud via FreeIPA-users wrote:
Forwarding to 389-us...@lists.fedoraproject.org as they may have more input.


On Wed, Jun 16, 2021 at 11:31 PM Alfred Victor via FreeIPA-users wrote:


Hi FreeIPA,

We have some replication messages in our slapd errors log which
look very like the ones discussed here:

https://bugzilla.redhat.com/show_bug.cgi?id=1574602


I took a look and we do have the MemberOf plugin, but our version
of 389-ds is newer:

389-ds-base-1.3.10.2-10.el7_9.x86_64

Hoping someone might have a suggestion for what we might do to get
rid of these log messages, or what the root cause and impact may be.
They've been going on since at least a couple of weeks ago:

[15/Jun/2021:18:57:26.362094959 -0500] - WARN -
NSMMReplicationPlugin - repl5_inc_update_from_op_result -
agmt="cn=redactedauth0001.redacted.com-to-redactedauth0003.redacted.com
"
(redactedauth0003:389): Consumer failed to replay change
(uniqueid d5896001-39a111eb-8868efc8-91dc0b98, CSN
60c93bc200040025): Operations error (1). Will retry later.




I looked for this same uniqueid (they are ALL the same uniqueID) and 
found this, which is interesting and references a specific cn and 
"optype":


[03/Jun/2021:15:45:43.332068775 -0500] - ERR -
NSMMReplicationPlugin - write_changelog_and_ruv - Can't add a
change for cn=admin,cn=groups,cn=accounts,dc=redacted,dc=com
(uniqid: d5896001-39a111eb-8868efc8-91dc0b98, optype: 8) to
changelog csn 60b93f9300520023



Alfred

___
FreeIPA-users mailing list -- freeipa-users@lists.fedorahosted.org

To unsubscribe send an email to
freeipa-users-le...@lists.fedorahosted.org

Fedora Code of Conduct:
https://docs.fedoraproject.org/en-US/project/code-of-conduct/

List Guidelines:
https://fedoraproject.org/wiki/Mailing_list_guidelines

List Archives:

https://lists.fedorahosted.org/archives/list/freeipa-users@lists.fedorahosted.org


Do not reply to spam on the list, report it:
https://pagure.io/fedora-infrastructure



___
FreeIPA-users mailing list -- freeipa-users@lists.fedorahosted.org
To unsubscribe send an email to freeipa-users-le...@lists.fedorahosted.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedorahosted.org/archives/list/freeipa-users@lists.fedorahosted.org
Do not reply to spam on the list, report it: 
https://pagure.io/fedora-infrastructure


[Freeipa-users] Re: dirsrv hangs soon after reboot

2021-05-13 Thread Thierry Bordaz via FreeIPA-users


On 5/12/21 8:41 PM, Kees Bakker wrote:

On 12-05-2021 19:44, Thierry Bordaz wrote:

On 5/12/21 4:55 PM, Kees Bakker wrote:

Hi Thierry,

Just to be clear, changelogmaxage was changed to -1 by me after the
upgrade and I've
confirmed it is now set to -1.

The reason for me to change the value was because of the deadlock.
Apparently, it did not make much of a difference. It still gets into a
deadlock
with the value -1.



Did you set nsslapd-changelogmaxage=-1 in the retroCL config entry ?


I've used this input file for ldapmodify

[root@linge ~]# cat change-nsslapd-changelogmaxage.txt
dn: cn=Retro Changelog Plugin,cn=plugins,cn=config
changetype: modify
replace: nsslapd-changelogmaxage
nsslapd-changelogmaxage: -1

After that

[root@linge ~]# ldapsearch -H 
ldapi://%2fvar%2frun%2fslapd-GHS-NL.socket -LLL -o ldif-wrap=no -b 
'cn=Retro Changelog Plugin,cn=plugins,cn=config'

SASL/EXTERNAL authentication started
SASL username: gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth
SASL SSF: 0
dn: cn=Retro Changelog Plugin,cn=plugins,cn=config
cn: Retro Changelog Plugin
nsslapd-attribute: nsuniqueid:targetUniqueId
nsslapd-changelogmaxage: -1
nsslapd-include-suffix: cn=dns,dc=ghs,dc=nl
nsslapd-plugin-depends-on-named: Class of Service
nsslapd-plugin-depends-on-type: database
nsslapd-pluginDescription: Retrocl Plugin
nsslapd-pluginEnabled: on
nsslapd-pluginId: retrocl
nsslapd-pluginInitfunc: retrocl_plugin_init
nsslapd-pluginPath: libretrocl-plugin
nsslapd-pluginType: object
nsslapd-pluginVendor: 389 Project
nsslapd-pluginVersion: 1.3.10.2
nsslapd-pluginbetxn: on
nsslapd-pluginprecedence: 25
objectClass: top
objectClass: nsSlapdPlugin
objectClass: extensibleObject



You did the right setting. I do not understand why thread 2 is trimming 
records with that setting.










BTW. I've sent the whole stacktrace directly to you to avoid the 4000+
lines to
this mailing list.

Here is the stack trace of one of the threads. The one that hangs in
trim_changelog



Something weird is that the same hang occurs
https://bugzilla.redhat.com/show_bug.cgi?id=1751295.
If I am correct it is between thread 14th and 2nd.

As you are running with a fixed version (> slapi-nis-0.56.5), that means
the fix is incomplete. It would require to debug a dump core to know why
the fix fails.


[root@linge ~]# rpm -qa slapi\*
slapi-nis-0.56.5-3.el7_9.x86_64

So, that is not > 0.56.5, but it is the newest available for CentOS7. No?


Reading the 'fixed version' in the bug, it says it is fixed starting
with 0.56.5.

I do not know if it matches the CentOS 7 versions.



At the moment I killed it and restarted. No deadlock yet, but the 
system is extremely slow. And it is just ns-slapd using CPU cycles.



Could you give some detail on what is slow? (response time, replication 
time, ...)

If it is eating CPU, I would recommend 'top -H -p  -b'. If it is 
always the same thread(s) eating the CPU, a pstack will tell you what 
they are doing. If it is various different threads, it is likely due to 
the requests being processed.
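As an illustration of that workflow, here is a minimal sketch that picks the busiest thread out of top-style output; the sample lines and column layout below are assumptions for the demo, and on a real system you would feed it `top -H -p <pid> -b -n 1` and then look up that LWP in the pstack output:

```shell
# Sample "LWP user pr ni %CPU command" lines (layout assumed for illustration);
# field 5 plays the role of the %CPU column here.
sample='2151 dirsrv 20 0 99.9 ns-slapd
2152 dirsrv 20 0 1.0 ns-slapd
2153 dirsrv 20 0 0.3 ns-slapd'

# Sort numerically (descending) on the %CPU field and keep the top LWP.
hot_lwp=$(printf '%s\n' "$sample" | sort -k5,5 -nr | head -n 1 | awk '{print $1}')
echo "busiest LWP: $hot_lwp"
```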






regards
thierry




Thread 2 (Thread 0x7f96ede68700 (LWP 2151)):
#0  0x7f96eaf3939e in pthread_rwlock_wrlock () at
/lib64/libpthread.so.0
#1  0x7f96da9d281f in map_wrlock () at
/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#2  0x7f96da9c1e58 in backend_shr_delete_cb.part.5 () at
/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#3  0x7f96da9c1fd1 in backend_shr_betxn_post_delete_cb () at
/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#4  0x7f96ed7ec688 in plugin_call_func (list=0x55cb03cdd8c0,
operation=operation@entry=563, pb=pb@entry=0x55cb4c670fc0,
call_one=call_one@entry=0) at ldap/servers/slapd/plugin.c:2028
    n = 
    func = 0x7f96da9c1f70 
    rc = 
    return_value = 0
    count = 2
#5  0x7f96ed7ec943 in plugin_call_list (pb=0x55cb4c670fc0,
operation=563, list=) at 
ldap/servers/slapd/plugin.c:1972

    p = 0x55cb03cb
    locked = 
    plugin_list_number = 21
    rc = 0
    do_op = 
#6  0x7f96ed7ec943 in plugin_call_plugins
(pb=pb@entry=0x55cb4c670fc0, whichfunction=whichfunction@entry=563) at
ldap/servers/slapd/plugin.c:442
    p = 0x55cb03cb
    locked = 
    plugin_list_number = 21
    rc = 0
    do_op = 
#7  0x7f96dc990def in ldbm_back_delete (pb=0x55cb4c670fc0) at
ldap/servers/slapd/back-ldbm/ldbm_delete.c:1267
    be = 0x55cb03c31040
    inst = 0x55cb03a8c680
    li = 0x55cb03a017c0
    e = 0x55cb15499a40
    tombstone = 0x0
    original_tombstone = 0x0
    tmptombstone = 0x0
    dn = 0x55cb1742a600 "changenumber=891343,cn=changelog"
    txn = {back_txn_txn = 0x55cb2b7b6dc0}
    parent_txn = 0x0
    retval = 0
    msg = 
    errbuf = 0x0
    retry_count = 
    disk_full = 0
    parent_found = 
    ruv_c_init = 0
    parent_modify_c = {old_entry = 0x55cb159982a0, new_entry =
0x55cb19e03030, smods = 0x55cb47da1ae0, attr_encrypt = 1}
    ruv_c 

[Freeipa-users] Re: dirsrv hangs soon after reboot

2021-05-12 Thread Thierry Bordaz via FreeIPA-users


On 5/12/21 4:55 PM, Kees Bakker wrote:

Hi Thierry,

Just to be clear, changelogmaxage was changed to -1 by me after the 
upgrade and I've confirmed it is now set to -1.

The reason for me to change the value was because of the deadlock.
Apparently, it did not make much of a difference. It still gets into a 
deadlock with the value -1.



Did you set nsslapd-changelogmaxage=-1 in the retroCL config entry ?




BTW. I've sent the whole stack trace directly to you, to avoid sending 
4000+ lines to this mailing list.

Here is the stack trace of one of the threads, the one that hangs in 
trim_changelog:



Something weird is that the same hang occurs 
https://bugzilla.redhat.com/show_bug.cgi?id=1751295.

If I am correct it is between thread 14th and 2nd.

As you are running with a fixed version (> slapi-nis-0.56.5), that means 
the fix is incomplete. It would require to debug a dump core to know why 
the fix fails.


regards
thierry




Thread 2 (Thread 0x7f96ede68700 (LWP 2151)):
#0  0x7f96eaf3939e in pthread_rwlock_wrlock () at 
/lib64/libpthread.so.0
#1  0x7f96da9d281f in map_wrlock () at 
/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#2  0x7f96da9c1e58 in backend_shr_delete_cb.part.5 () at 
/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#3  0x7f96da9c1fd1 in backend_shr_betxn_post_delete_cb () at 
/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#4  0x7f96ed7ec688 in plugin_call_func (list=0x55cb03cdd8c0, 
operation=operation@entry=563, pb=pb@entry=0x55cb4c670fc0, 
call_one=call_one@entry=0) at ldap/servers/slapd/plugin.c:2028

    n = 
    func = 0x7f96da9c1f70 
    rc = 
    return_value = 0
    count = 2
#5  0x7f96ed7ec943 in plugin_call_list (pb=0x55cb4c670fc0, 
operation=563, list=) at ldap/servers/slapd/plugin.c:1972

    p = 0x55cb03cb
    locked = 
    plugin_list_number = 21
    rc = 0
    do_op = 
#6  0x7f96ed7ec943 in plugin_call_plugins 
(pb=pb@entry=0x55cb4c670fc0, whichfunction=whichfunction@entry=563) at 
ldap/servers/slapd/plugin.c:442

    p = 0x55cb03cb
    locked = 
    plugin_list_number = 21
    rc = 0
    do_op = 
#7  0x7f96dc990def in ldbm_back_delete (pb=0x55cb4c670fc0) at 
ldap/servers/slapd/back-ldbm/ldbm_delete.c:1267

    be = 0x55cb03c31040
    inst = 0x55cb03a8c680
    li = 0x55cb03a017c0
    e = 0x55cb15499a40
    tombstone = 0x0
    original_tombstone = 0x0
    tmptombstone = 0x0
    dn = 0x55cb1742a600 "changenumber=891343,cn=changelog"
    txn = {back_txn_txn = 0x55cb2b7b6dc0}
    parent_txn = 0x0
    retval = 0
    msg = 
    errbuf = 0x0
    retry_count = 
    disk_full = 0
    parent_found = 
    ruv_c_init = 0
    parent_modify_c = {old_entry = 0x55cb159982a0, new_entry = 
0x55cb19e03030, smods = 0x55cb47da1ae0, attr_encrypt = 1}
    ruv_c = {old_entry = 0x0, new_entry = 0x0, smods = 0x0, 
attr_encrypt = 0}

    rc = 0
    ldap_result_code = 0
    ldap_result_message = 0x0
    sdnp = 0x55cb5fd93bc0
    e_uniqueid = 0x0
    nscpEntrySDN = {flag = 0 '\000', udn = 0x0, dn = 0x0, ndn = 
0x0, ndn_len = 0}

    operation = 0x55cb5ef86000
    opcsn = 0x0
    is_fixup_operation = 0
    is_ruv = 0
    is_replicated_operation = 0
    is_tombstone_entry = 
    delete_tombstone_entry = 0
    create_tombstone_entry = 0
    addr = 0x55cb5ef86100
    addordel_flags = 38
    entryusn_str = 0x0
    orig_entry = 0x0
    parentsdn = {flag = 2 '\002', udn = 0x0, dn = 0x55cb516c3580 
"cn=changelog", ndn = 0x0, ndn_len = 12}

    opreturn = 0
    free_delete_existing_entry = 1
    not_an_error = 0
    parent_switched = 0
    myrc = 0
    conn_id = 0
    tombstone_csn = 
    deletion_csn_str = 
"\247\001\000\000\000\000\000\000\000T\334YHi\365\357\300\017g", 


    op_id = 0
    ep_id = 
    tomb_ep_id = 0
    result_sent = 0
    pb_conn = 0x0
    parent_op = 1
    parent_time = {tv_sec = 183860, tv_nsec = 41743941}
#8  0x7f96ed79d3bb in op_shared_delete 
(pb=pb@entry=0x55cb4c670fc0) at ldap/servers/slapd/delete.c:324

    rc = 0
    rawdn = 0x55cb5b28de00 "changenumber=891343, cn=changelog"
    dn = 
    be = 0x55cb03c31040
    internal_op = 32
    sdn = 0x55cb5fd93bc0
    operation = 0x55cb5ef86000
    referral = 0x0
    ecopy = 0x0
    errorbuf = "\000\066\062\062Z\nmodifyTime", '\000' times>, "\071\063\066\062\062Z\nnsUniqueI", '\000' , 

[Freeipa-users] Re: dirsrv hangs soon after reboot

2021-05-12 Thread Thierry Bordaz via FreeIPA-users

Hi Kees,

Is changelogmaxage=-1 after the upgrade?

Would you send a full pstack when it hangs? If the pthread_rwlock_wrlock 
is in trim_changelog, then you may be hitting another flavor of [1] 
(with no known reason).


regards
thierry

On 5/12/21 2:40 PM, Kees Bakker wrote:

Sorry to revive an old thread. I'm getting deadlocks again. See below

On 20-04-2020 15:16, thierry bordaz wrote:

[...]This is a known bug [1].
With the same bug there are two deadlock scenario but only one is 
fixed (for example in  slapi-nis-0.56.4-1 [2]).

A fix for the second one is under tests.

At the moment I would recommend the workaround [3]. The drawback is a 
growth of the retroCL database but unless you have a very high rate 
of update it should not be a concern.


[1] https://bugzilla.redhat.com/show_bug.cgi?id=1751295
[2] https://koji.fedoraproject.org/koji/buildinfo?buildID=1457771
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1751295#c5

best regards
thierry



I followed the recommendation to set nsslapd-changelogmaxage to -1. 
The system has been running successfully for a year.

Recently I upgraded all packages in this CentOS 7 system. Ever since 
that moment the server is quite unusable.

[root@linge ~]# gdb -ex 'set confirm off' -ex 'set pagination off' -ex 
'thread apply all bt full' -ex 'quit' /usr/sbin/ns-slapd `pidof 
ns-slapd` |& grep '^#0.*lock'
#0  0x7f96eaf3939e in pthread_rwlock_wrlock () at 
/lib64/libpthread.so.0
#0  0x7f96eaf39184 in pthread_rwlock_rdlock () at 
/lib64/libpthread.so.0
#0  0x7f96eaf39184 in pthread_rwlock_rdlock () at 
/lib64/libpthread.so.0
#0  0x7f96eaf39184 in pthread_rwlock_rdlock () at 
/lib64/libpthread.so.0
#0  0x7f96eaf39184 in pthread_rwlock_rdlock () at 
/lib64/libpthread.so.0
#0  0x7f96eaf39184 in pthread_rwlock_rdlock () at 
/lib64/libpthread.so.0
#0  0x7f96eaf39184 in pthread_rwlock_rdlock () at 
/lib64/libpthread.so.0
#0  0x7f96eaf39184 in pthread_rwlock_rdlock () at 
/lib64/libpthread.so.0
#0  0x7f96eaf39184 in pthread_rwlock_rdlock () at 
/lib64/libpthread.so.0
#0  0x7f96eaf39184 in pthread_rwlock_rdlock () at 
/lib64/libpthread.so.0
#0  0x7f96eaf39184 in pthread_rwlock_rdlock () at 
/lib64/libpthread.so.0
#0  0x7f96eaf39184 in pthread_rwlock_rdlock () at 
/lib64/libpthread.so.0
#0  0x7f96eaf39184 in pthread_rwlock_rdlock () at 
/lib64/libpthread.so.0
#0  0x7f96eaf39184 in pthread_rwlock_rdlock () at 
/lib64/libpthread.so.0
#0  0x7f96eaf39184 in pthread_rwlock_rdlock () at 
/lib64/libpthread.so.0
#0  0x7f96eaf39184 in pthread_rwlock_rdlock () at 
/lib64/libpthread.so.0
#0  0x7f96eaf39184 in pthread_rwlock_rdlock () at 
/lib64/libpthread.so.0
#0  0x7f96eaf39184 in pthread_rwlock_rdlock () at 
/lib64/libpthread.so.0
#0  0x7f96eaf39184 in pthread_rwlock_rdlock () at 
/lib64/libpthread.so.0
#0  0x7f96eaf39184 in pthread_rwlock_rdlock () at 
/lib64/libpthread.so.0
#0  0x7f96eaf39184 in pthread_rwlock_rdlock () at 
/lib64/libpthread.so.0
#0  0x7f96eaf39184 in pthread_rwlock_rdlock () at 
/lib64/libpthread.so.0
#0  0x7f96eaf39184 in pthread_rwlock_rdlock () at 
/lib64/libpthread.so.0
#0  0x7f96eaf3939e in pthread_rwlock_wrlock () at 
/lib64/libpthread.so.0


[root@linge ~]# rpm -qa 389\*
389-ds-base-libs-1.3.10.2-10.el7_9.x86_64
389-ds-base-1.3.10.2-10.el7_9.x86_64
389-ds-base-debuginfo-1.3.10.2-10.el7_9.x86_64

[root@linge ~]# rpm -qa slapi\*
slapi-nis-0.56.5-3.el7_9.x86_64

[root@linge ~]# rpm -qa centos-release
centos-release-7-9.2009.1.el7.centos.x86_64

Are there any new hints to avoid the deadlock?

___
FreeIPA-users mailing list -- freeipa-users@lists.fedorahosted.org
To unsubscribe send an email to freeipa-users-le...@lists.fedorahosted.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedorahosted.org/archives/list/freeipa-users@lists.fedorahosted.org
Do not reply to spam on the list, report it: 
https://pagure.io/fedora-infrastructure


[Freeipa-users] Re: Replication issue with CSN generator

2020-04-22 Thread thierry bordaz via FreeIPA-users

Hi Morgan,

Sure. The most immediate and safest action is to do

dn: cn=config
changetype: modify
replace: nsslapd-ignore-time-skew
nsslapd-ignore-time-skew: on


Apply this on all servers in the topology (no restart needed), then 
monitor whether replication is catching up.
NTP issues are indeed the likely root cause of your time skew, but there 
is no easy way to prove it.


best regards
thierry



On 4/22/20 3:16 PM, Morgan Marodin via FreeIPA-users wrote:

Hi.

I don't have access to the Red Hat portal :(
Are there similar articles in a public forum?

Anyway ... could I stop ipa-server, change the value of 
nsslapd-ignore-time-skew in /etc/dirsrv/slapd-IPA-MYDOMAIN-COM/dse.ldif, 
and start the server again?

Or is it more complicated to change the configuration?

VMs are local, but the cluster where the 1st server is running is 
affected by NTP problems ...
For this reason I want to remove the first master and install another 
replica in the new cluster.


Thanks, bye.
Morgan

On Wed, 22 Apr 2020 at 11:33, thierry bordaz via 
FreeIPA-users <freeipa-users@lists.fedorahosted.org> wrote:


Hi,

CSN generator time skew is a pending issue still under investigation.

At the moment the way your CSN generator is messed up does not look
fatal. You can allow replication to continue by setting
nsslapd-ignore-time-skew on all servers.
(https://access.redhat.com/solutions/1162703)

If that does not allow replication to continue, there is a recovery
procedure, but I would recommend trying ignore-time-skew first
(https://access.redhat.com/solutions/3543811)

NTP tuning or specific VM types are suspected to contribute to time
skew. What type of VMs are you using (local, or cloud such as AWS)?

best regards
thierry

On 4/21/20 5:42 PM, Morgan Marodin via FreeIPA-users wrote:

Hi.

In my environment I have two IPA servers, replicating with each other.
They are both 7.6 OS systems; the ipa-server RPM version is
4.6.4-10.0.1.el7_6.2.x86_64.

The first server installed was srv01 (many years ago), then I
installed the replica on srv02 (about a year after the 1st node).
When I had a single server I also set up a trust with my corporate
Active Directory.
VMs are running in 2 different hypervisor clusters.

Now the replication doesn't work. In the log files I have this error:
[16/Apr/2020:12:25:36.856632697 +0200] - ERR -
csngen_adjust_time - Adjustment limit exceeded; value - 23221226,
limit - 86400
[16/Apr/2020:12:25:36.857909222 +0200] - ERR -
NSMMReplicationPlugin - repl5_inc_run -
agmt="cn=meTosrv01.ipa.mydomain.com
<http://meTosrv01.ipa.mydomain.com>" (srv01:389): Fatal error -
too much time skew between replicas!
[16/Apr/2020:12:25:36.862233147 +0200] - ERR -
NSMMReplicationPlugin - repl5_inc_run -
agmt="cn=meTosrv01.ipa.mydomain.com
<http://meTosrv01.ipa.mydomain.com>" (srv01:389): Incremental
update failed and requires administrator action

I tried to force the replication, but the limit-exceeded problem
doesn't allow the sync.
I know that the problem is that the CSN generator has become grossly
skewed.
Using the external script readNsState.py I found that there was
an offset of about a month, so ... I waited for a month and
then the issue disappeared.
But now the offset is about 9 months ... I can't wait that long :)

[root@srv01 scripts]# ./readNsState.py
/etc/dirsrv/slapd-IPA-MYDOMAIN-COM/dse.ldif
nsState is BACCN/xfAHbiBAAABCgNdQ==
Little Endian
For replica
cn=replica,cn=dc\3Dipa\2Cdc\3Dmydomain\2Cdc\3Dcom,cn=mapping
tree,cn=con
  fmtstr=[H6x3QH6x]
  size=40
  len of nsstate is 40
  CSN generator state:
    Replica ID    : 4
    Sampled Time  : 1610364802
    Gen as csn    : 5ffc3782299650004
    Time as str   : Mon Jan 11 12:33:22 2021
    Local Offset  : 320118
    Remote Offset : 10244
    Seq. num      : 29965
    System time   : Tue Apr 21 15:03:45 2020
    Diff in sec.  : -22890577
    Day:sec diff  : -265:5423

nsState is YAADLZheXSgTAA==
Little Endian
For replica cn=replica,cn=o\3Dipaca,cn=mapping tree,cn=config
  fmtstr=[H6x3QH6x]
  size=40
  len of nsstate is 40
  CSN generator state:
    Replica ID    : 96
    Sampled Time  : 1587031299
    Gen as csn    : 5e982d0300190096
    Time as str   : Thu Apr 16 12:01:39 2020
    Local Offset  : 0
    Remote Offset : 10333
    Seq. num      : 19
    System time   : Tue Apr 21 15:03:45 2020
    Diff in sec.  : 442926
    Day:sec diff  : 5:10926

[root@srv02 scripts]# ./readNsState.py
/etc/dirsrv/slapd-IPA-MYDOMAIN-COM/dse.ldif
nsState is AwBU7p5esVNiAQ==
Little Endian
Fo

[Freeipa-users] Re: Replication issue with CSN generator

2020-04-22 Thread thierry bordaz via FreeIPA-users

Hi,

CSN generator time skew is a pending issue still under investigation.

At the moment the way your CSN generator is messed up does not look 
fatal. You can allow replication to continue by setting 
nsslapd-ignore-time-skew on all servers. 
(https://access.redhat.com/solutions/1162703)


If that does not allow replication to continue, there is a recovery 
procedure, but I would recommend trying ignore-time-skew first 
(https://access.redhat.com/solutions/3543811)


NTP tuning or specific VM types are suspected to contribute to time skew. 
What type of VMs are you using (local, or cloud such as AWS)?


best regards
thierry

On 4/21/20 5:42 PM, Morgan Marodin via FreeIPA-users wrote:

Hi.

In my environment I have two IPA servers, replicating with each other.
They are both 7.6 OS systems; the ipa-server RPM version is 
4.6.4-10.0.1.el7_6.2.x86_64.


The first server installed was srv01 (many years ago), then I 
installed the replica on srv02 (about a year after the 1st node).
When I had a single server I also set up a trust with my corporate Active 
Directory.

VMs are running in 2 different hypervisor clusters.

Now the replication doesn't work. In the log files I have this error:
[16/Apr/2020:12:25:36.856632697 +0200] - ERR - csngen_adjust_time - 
Adjustment limit exceeded; value - 23221226, limit - 86400
[16/Apr/2020:12:25:36.857909222 +0200] - ERR - NSMMReplicationPlugin - 
repl5_inc_run - agmt="cn=meTosrv01.ipa.mydomain.com 
" (srv01:389): Fatal error - too 
much time skew between replicas!
[16/Apr/2020:12:25:36.862233147 +0200] - ERR - NSMMReplicationPlugin - 
repl5_inc_run - agmt="cn=meTosrv01.ipa.mydomain.com 
" (srv01:389): Incremental update 
failed and requires administrator action


I tried to force the replication, but the limit-exceeded problem doesn't 
allow the sync.

I know that the problem is that the CSN generator has become grossly skewed.
Using the external script readNsState.py I found that there was an 
offset of about a month, so ... I waited for a month and then 
the issue disappeared.

But now the offset is about 9 months ... I can't wait that long :)

[root@srv01 scripts]# ./readNsState.py 
/etc/dirsrv/slapd-IPA-MYDOMAIN-COM/dse.ldif

nsState is BACCN/xfAHbiBAAABCgNdQ==
Little Endian
For replica 
cn=replica,cn=dc\3Dipa\2Cdc\3Dmydomain\2Cdc\3Dcom,cn=mapping tree,cn=con

  fmtstr=[H6x3QH6x]
  size=40
  len of nsstate is 40
  CSN generator state:
    Replica ID    : 4
    Sampled Time  : 1610364802
    Gen as csn    : 5ffc3782299650004
    Time as str   : Mon Jan 11 12:33:22 2021
    Local Offset  : 320118
    Remote Offset : 10244
    Seq. num      : 29965
    System time   : Tue Apr 21 15:03:45 2020
    Diff in sec.  : -22890577
    Day:sec diff  : -265:5423

nsState is YAADLZheXSgTAA==
Little Endian
For replica cn=replica,cn=o\3Dipaca,cn=mapping tree,cn=config
  fmtstr=[H6x3QH6x]
  size=40
  len of nsstate is 40
  CSN generator state:
    Replica ID    : 96
    Sampled Time  : 1587031299
    Gen as csn    : 5e982d0300190096
    Time as str   : Thu Apr 16 12:01:39 2020
    Local Offset  : 0
    Remote Offset : 10333
    Seq. num      : 19
    System time   : Tue Apr 21 15:03:45 2020
    Diff in sec.  : 442926
    Day:sec diff  : 5:10926

[root@srv02 scripts]# ./readNsState.py 
/etc/dirsrv/slapd-IPA-MYDOMAIN-COM/dse.ldif

nsState is AwBU7p5esVNiAQ==
Little Endian
For replica 
cn=replica,cn=dc\3Dipa\2Cdc\3Dmydomain\2Cdc\3Dcom,cn=mapping tree,cn=con

  fmtstr=[H6x3QH6x]
  size=40
  len of nsstate is 40
  CSN generator state:
    Replica ID    : 3
    Sampled Time  : 1587474004
    Gen as csn    : 5e9eee540003
    Time as str   : Tue Apr 21 15:00:04 2020
    Local Offset  : 0
    Remote Offset : 23221169
    Seq. num      : 0
    System time   : Tue Apr 21 15:02:38 2020
    Diff in sec.  : 154
    Day:sec diff  : 0:154

nsState is YQAuLZheAEUB7SYSAA==
Little Endian
For replica cn=replica,cn=o\3Dipaca,cn=mapping tree,cn=config
  fmtstr=[H6x3QH6x]
  size=40
  len of nsstate is 40
  CSN generator state:
    Replica ID    : 97
    Sampled Time  : 1587031342
    Gen as csn    : 5e982d2e00180097
    Time as str   : Thu Apr 16 12:02:22 2020
    Local Offset  : 325
    Remote Offset : 9965
    Seq. num      : 18
    System time   : Tue Apr 21 15:02:38 2020
    Diff in sec.  : 442816
    Day:sec diff  : 5:10816

As you can see, on the 1st node the "Time as str" is Jan 11 of 2021.
With the timedatectl command I see that both VMs use the same time zone 
and the clock is correct.
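The nsState blobs quoted above can be decoded with a short script. This is a sketch: the "<H6x3QH6x" little-endian layout and size=40 come from the fmtstr/size lines in the output, but the field order (sampled time, local offset, remote offset) is my assumption based on the order readNsState.py prints them, and the packed bytes below are synthetic, built from srv01's quoted figures rather than the real nsState value:

```python
import struct

# Layout taken from the fmtstr=[H6x3QH6x] / size=40 lines above;
# field order is assumed from readNsState.py's printed output.
FMT = "<H6x3QH6x"
assert struct.calcsize(FMT) == 40

def decode_nsstate(raw: bytes) -> dict:
    rid, sampled, local_off, remote_off, seq = struct.unpack(FMT, raw)
    return {"replica_id": rid, "sampled_time": sampled,
            "local_offset": local_off, "remote_offset": remote_off,
            "seq_num": seq}

# Synthetic blob packed with srv01's quoted values (not the real nsState).
raw = struct.pack(FMT, 4, 1610364802, 320118, 10244, 29965)
state = decode_nsstate(raw)

# "Diff in sec." is the wall clock minus the sampled CSN time; the CSN
# generator refuses to adjust by more than 86400 s (one day).
system_time = 1610364802 - 22890577   # epoch of "Tue Apr 21 15:03:45 2020"
diff = system_time - state["sampled_time"]
assert diff == -22890577
assert abs(diff) > 86400              # why csngen_adjust_time gives up
```

This reproduces the "Diff in sec. : -22890577" line for srv01 and shows why the error log reports "Adjustment limit exceeded ... limit - 86400".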


I found this old article to fix my issue:
https://www.redhat.com/archives/freeipa-users/2014-February/msg7.html

But ... I had the same issue in the past, always on the 1st server. 
So, in my mind, I don't want to try that fix.
I have a new hypervisor cluster, so I would prefer to reinstall the 

[Freeipa-users] Re: dirsrv hangs soon after reboot

2020-04-20 Thread thierry bordaz via FreeIPA-users



On 4/20/20 3:35 PM, Kees Bakker wrote:

On 20-04-2020 15:16, thierry bordaz wrote:

On 4/20/20 3:02 PM, Kees Bakker wrote:

On 20-04-2020 14:51, Rob Crittenden wrote:

Kees Bakker via FreeIPA-users wrote:

On 20-04-2020 09:58, Kees Bakker via FreeIPA-users wrote:

On 20-04-2020 09:09, Florence Blanc-Renaud wrote:

On 4/20/20 8:28 AM, Kees Bakker via FreeIPA-users wrote:

Hey,

I'm looking for advice on how to analyse/debug this.

On one of the masters the dirsrv is unresponsive. It runs, but every
attempt to connect to it hangs.

The command "systemctl status" does not show anything alarming

● dirsrv@EXAMPLE-COM.service - 389 Directory Server EXAMPLE-COM.
  Loaded: loaded (/usr/lib/systemd/system/dirsrv@.service; enabled; vendor 
preset: disabled)
  Active: active (running) since vr 2020-04-17 13:46:25 CEST; 1h 33min ago
     Process: 3123 ExecStartPre=/usr/sbin/ds_systemd_ask_password_acl 
/etc/dirsrv/slapd-%i/dse.ldif (code=exited, status=0/SUCCESS)
    Main PID: 3134 (ns-slapd)
  Status: "slapd started: Ready to process requests"
  CGroup: /system.slice/system-dirsrv.slice/dirsrv@EXAMPLE-COM.service
  └─3134 /usr/sbin/ns-slapd -D /etc/dirsrv/slapd-EXAMPLE-COM -i 
/var/run/dirsrv/slapd-EXAMPLE-COM.pid

apr 17 15:13:54 linge.example.com ns-slapd[3134]: GSSAPI client step 1
apr 17 15:13:54 linge.example.com ns-slapd[3134]: GSSAPI client step 1
apr 17 15:13:54 linge.example.com ns-slapd[3134]: GSSAPI client step 1
apr 17 15:13:54 linge.example.com ns-slapd[3134]: GSSAPI client step 1
apr 17 15:13:54 linge.example.com ns-slapd[3134]: GSSAPI client step 2
apr 17 15:18:54 linge.example.com ns-slapd[3134]: GSSAPI client step 1
apr 17 15:18:54 linge.example.com ns-slapd[3134]: GSSAPI client step 1
apr 17 15:18:55 linge.example.com ns-slapd[3134]: GSSAPI client step 1
apr 17 15:18:55 linge.example.com ns-slapd[3134]: GSSAPI client step 1
apr 17 15:18:55 linge.example.com ns-slapd[3134]: GSSAPI client step 2

However, an ldapsearch command hangs forever

[root@rotte ~]# ldapsearch -H ldaps://linge.example.com -D 
uid=keesbtest,cn=users,cn=accounts,dc=example,dc=com -W -LLL -o ldif-wrap=no -b 
cn=users,cn=accounts,dc=example,dc=com 
'(&(objectClass=person)(memberOf=cn=admins,cn=groups,cn=accounts,dc=example,dc=com))'
 uid
Enter LDAP Password:

Even if I use the socket (ldapi://%2fvar%2frun%2fslapd-EXAMPLE-COM.socket) the 
ldapsearch
command hangs.

"ipactl status" hangs

"kinit" hangs



Hi,
you can start by having a look at dirsrv error log in
/var/log/dirsrv-slapd-YOUR_DOMAIN/errors, and the journal.

The FAQ page of 389 also explains a few troubleshooting steps:
http://www.port389.org/docs/389ds/FAQ/faq.html#Troubleshooting

I did exactly that, looked at the "errors" log, but there was no clue, at least
not for me. Strangely enough it kept running for a few hours and then it
was hanging again.

I tried the command "ipactl restart", but that was hanging forever.
However "systemctl restart dirsrv@MY-DOMAIN" was able to restart
it after several minutes. Meanwhile the ns-slapd process was using 100%
CPU.

Another remark I want to make: every ldap connection (ldapsearch, whatever)
hangs forever. No timeout, nothing.

When it rains, it pours, they say. There is another master with the same 
symptom.
I'm getting nervous now.

Thanks for the Troubleshooting link. I'll have to dive into the deep, I guess.

Could it be a deadlock?

[root@linge ~]# grep -a1 '^#0.*lock' slapd-stacktrace.1587374239.txt
Thread 23 (Thread 0x7ff8ff265700 (LWP 14474)):
#0  0x7ff929430144 in pthread_rwlock_rdlock () at /lib64/libpthread.so.0
#1  0x7ff9190cc49c in map_rdlock () at 
/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
--
Thread 7 (Thread 0x7ff8f7255700 (LWP 14490)):
#0  0x7ff929430144 in pthread_rwlock_rdlock () at /lib64/libpthread.so.0
#1  0x7ff9190cc49c in map_rdlock () at 
/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
--
Thread 4 (Thread 0x7ff8f5a52700 (LWP 14493)):
#0  0x7ff929430144 in pthread_rwlock_rdlock () at /lib64/libpthread.so.0
#1  0x7ff9190cc49c in map_rdlock () at 
/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
--
Thread 2 (Thread 0x7ff92c355700 (LWP 15679)):
#0  0x7ff92943035e in pthread_rwlock_wrlock () at /lib64/libpthread.so.0
#1  0x7ff9190b6639 in backend_be_pre_write_cb () at 
/usr/lib64/dirsrv/plugins/schemacompat-plugin.so

Without debuginfo, the trace of these threads looks like this:

Thread 23 (Thread 0x7ff8ff265700 (LWP 14474)):
#0  0x7ff929430144 in pthread_rwlock_rdlock () at /lib64/libpthread.so.0
#1  0x7ff9190cc49c in map_rdlock () at 
/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#2  0x7ff9190b8745 in backend_search_cb () at 
/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#3  0x7ff92bcd9028 in plugin_call_func () at /usr/lib64/dirsrv/libslapd.so.0
#4  0x7ff92bcd92e3 in plugin_call_plugins () at 
/usr/lib64/dirsrv/libslapd.so.0
#5  0x7ff92bccc0d7 in op_shared_search () at /usr/lib64/dirsrv/libslapd.so.0
#6  

[Freeipa-users] Re: dirsrv hangs soon after reboot

2020-04-20 Thread thierry bordaz via FreeIPA-users



On 4/20/20 3:02 PM, Kees Bakker wrote:

On 20-04-2020 14:51, Rob Crittenden wrote:


Kees Bakker via FreeIPA-users wrote:

On 20-04-2020 09:58, Kees Bakker via FreeIPA-users wrote:

On 20-04-2020 09:09, Florence Blanc-Renaud wrote:

On 4/20/20 8:28 AM, Kees Bakker via FreeIPA-users wrote:

Hey,

I'm looking for advice on how to analyse/debug this.

On one of the masters the dirsrv is unresponsive. It runs, but every
attempt to connect to it hangs.

The command "systemctl status" does not show anything alarming

● dirsrv@EXAMPLE-COM.service - 389 Directory Server EXAMPLE-COM.
 Loaded: loaded (/usr/lib/systemd/system/dirsrv@.service; enabled; vendor 
preset: disabled)
 Active: active (running) since vr 2020-04-17 13:46:25 CEST; 1h 33min ago
Process: 3123 ExecStartPre=/usr/sbin/ds_systemd_ask_password_acl 
/etc/dirsrv/slapd-%i/dse.ldif (code=exited, status=0/SUCCESS)
   Main PID: 3134 (ns-slapd)
 Status: "slapd started: Ready to process requests"
 CGroup: /system.slice/system-dirsrv.slice/dirsrv@EXAMPLE-COM.service
 └─3134 /usr/sbin/ns-slapd -D /etc/dirsrv/slapd-EXAMPLE-COM -i 
/var/run/dirsrv/slapd-EXAMPLE-COM.pid

apr 17 15:13:54 linge.example.com ns-slapd[3134]: GSSAPI client step 1
apr 17 15:13:54 linge.example.com ns-slapd[3134]: GSSAPI client step 1
apr 17 15:13:54 linge.example.com ns-slapd[3134]: GSSAPI client step 1
apr 17 15:13:54 linge.example.com ns-slapd[3134]: GSSAPI client step 1
apr 17 15:13:54 linge.example.com ns-slapd[3134]: GSSAPI client step 2
apr 17 15:18:54 linge.example.com ns-slapd[3134]: GSSAPI client step 1
apr 17 15:18:54 linge.example.com ns-slapd[3134]: GSSAPI client step 1
apr 17 15:18:55 linge.example.com ns-slapd[3134]: GSSAPI client step 1
apr 17 15:18:55 linge.example.com ns-slapd[3134]: GSSAPI client step 1
apr 17 15:18:55 linge.example.com ns-slapd[3134]: GSSAPI client step 2

However, an ldapsearch command hangs forever

[root@rotte ~]# ldapsearch -H ldaps://linge.example.com -D 
uid=keesbtest,cn=users,cn=accounts,dc=example,dc=com -W -LLL -o ldif-wrap=no -b 
cn=users,cn=accounts,dc=example,dc=com 
'(&(objectClass=person)(memberOf=cn=admins,cn=groups,cn=accounts,dc=example,dc=com))'
 uid
Enter LDAP Password:

Even if I use the socket (ldapi://%2fvar%2frun%2fslapd-EXAMPLE-COM.socket) the 
ldapsearch
command hangs.

"ipactl status" hangs

"kinit" hangs



Hi,
you can start by having a look at dirsrv error log in
/var/log/dirsrv-slapd-YOUR_DOMAIN/errors, and the journal.

The FAQ page of 389 also explains a few troubleshooting steps:
http://www.port389.org/docs/389ds/FAQ/faq.html#Troubleshooting

I did exactly that, looked at the "errors" log, but there was no clue, at least
not for me. Strangely enough it kept running for a few hours and then it
was hanging again.

I tried the command "ipactl restart", but that was hanging forever.
However "systemctl restart dirsrv@MY-DOMAIN" was able to restart
it after several minutes. Meanwhile the ns-slapd process was using 100%
CPU.

Another remark I want to make: every ldap connection (ldapsearch, whatever)
hangs forever. No timeout, nothing.

When it rains, it pours, they say. There is another master with the same 
symptom.
I'm getting nervous now.

Thanks for the Troubleshooting link. I'll have to dive into the deep, I guess.

Could it be a deadlock?

[root@linge ~]# grep -a1 '^#0.*lock' slapd-stacktrace.1587374239.txt
Thread 23 (Thread 0x7ff8ff265700 (LWP 14474)):
#0  0x7ff929430144 in pthread_rwlock_rdlock () at /lib64/libpthread.so.0
#1  0x7ff9190cc49c in map_rdlock () at 
/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
--
Thread 7 (Thread 0x7ff8f7255700 (LWP 14490)):
#0  0x7ff929430144 in pthread_rwlock_rdlock () at /lib64/libpthread.so.0
#1  0x7ff9190cc49c in map_rdlock () at 
/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
--
Thread 4 (Thread 0x7ff8f5a52700 (LWP 14493)):
#0  0x7ff929430144 in pthread_rwlock_rdlock () at /lib64/libpthread.so.0
#1  0x7ff9190cc49c in map_rdlock () at 
/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
--
Thread 2 (Thread 0x7ff92c355700 (LWP 15679)):
#0  0x7ff92943035e in pthread_rwlock_wrlock () at /lib64/libpthread.so.0
#1  0x7ff9190b6639 in backend_be_pre_write_cb () at 
/usr/lib64/dirsrv/plugins/schemacompat-plugin.so

Without debuginfo, the trace of these threads looks like this:

Thread 23 (Thread 0x7ff8ff265700 (LWP 14474)):
#0  0x7ff929430144 in pthread_rwlock_rdlock () at /lib64/libpthread.so.0
#1  0x7ff9190cc49c in map_rdlock () at 
/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#2  0x7ff9190b8745 in backend_search_cb () at 
/usr/lib64/dirsrv/plugins/schemacompat-plugin.so
#3  0x7ff92bcd9028 in plugin_call_func () at /usr/lib64/dirsrv/libslapd.so.0
#4  0x7ff92bcd92e3 in plugin_call_plugins () at 
/usr/lib64/dirsrv/libslapd.so.0
#5  0x7ff92bccc0d7 in op_shared_search () at /usr/lib64/dirsrv/libslapd.so.0
#6  0x562454427bbe in do_search ()
#7  0x56245441595a in 

[Freeipa-users] Re: setup_pr_read_pds - Not listening for new connections - too many fds open

2020-03-17 Thread thierry bordaz via FreeIPA-users



On 3/17/20 12:14 PM, Lukasz Jaworski via FreeIPA-users wrote:

Hi,

nsslapd-conntablesize = 1024 - I’ve changed it on one server to 2028
nsslapd-reservedescriptors: 64 - I don’t know whether to increase this value?
currentconnections: 960

open fds (after changing conntablesize):
find /proc/23515/fd | wc -l
1043

on the bad server:
currentconnections: 958 (bad, but no errors at this moment)
find /proc/172473/fd|wc -l
1028

It looks like changing nsslapd-conntablesize fixed my problems.


Great !
Indeed, nsslapd-maxdescriptors caps the connection table in case 
conntablesize is set too high.


thierry


Best regards,
Ender





On 17 Mar 2020, at 09:49, thierry bordaz via FreeIPA-users 
 wrote:

Hi,

At startup DS creates a connection table with a fixed size.
The message "setup_pr_read_pds - Not listening for new connections - too many fds 
open" means that the number of established connections exhausted the table limit.

What are the values of nsslapd-conntablesize and nsslapd-reservedescriptors ?
How many established connections (logconv on access logs or SRCH cn=monitor) ?

regards
thierry

On 3/17/20 9:35 AM, Lukasz Jaworski via FreeIPA-users wrote:

Hi,
I've upgraded a freeipa 4.6.x environment on Fedora 27 to 4.8.4 on Fedora 31.
- remove old replica
- install fedora 31
- connect as new replica...

now:
389-ds-base-1.4.2.8-3.fc31.x86_64
freeipa-server-4.8.4-2.fc31.x86_64

after that, I have many errors:
setup_pr_read_pds - Not listening for new connections - too many fds open

It looks like an fd limit of 1024.
I've checked:

nsslapd-maxdescriptors:
ldapsearch -xLLL -b "cn=config" -D 'cn=Directory Manager' -W cn=config 
nsslapd-maxdescriptors
Enter LDAP Password:
dn: cn=config
nsslapd-maxdescriptors: 524288

/proc/limits:
cat /proc/2164872/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             515206               515206               processes
Max open files            524288               524288               files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       515206               515206               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us


dirsrv log:
[17/Mar/2020:09:12:18.119324801 +0100] - INFO - main - Setting the maximum file 
descriptor limit to: 524288

find /proc/2164872/fd | wc -l
1037

It looks like 1024 is the connection limit.

Any idea what I've done wrong?

Best regards,
Ender - Lukasz Jaworski





—
Łukasz Jaworski










[Freeipa-users] Re: setup_pr_read_pds - Not listening for new connections - too many fds open

2020-03-17 Thread thierry bordaz via FreeIPA-users

Hi,

At startup DS creates a connection table with a fixed size.
The message "setup_pr_read_pds - Not listening for new connections - too 
many fds open" means that the number of established connections 
exhausted the table limit.
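The mechanism can be illustrated with a minimal fixed-size slot table. This is purely an illustrative sketch (the class and names are made up, not the 389-ds implementation): once every slot in the table is taken, new connections are refused even though the process-wide fd limit is still far away.

```python
class ConnTable:
    """Illustrative fixed-size connection table (not the 389-ds code):
    once every slot is taken, new connections are refused even though the
    process-wide fd limit (nsslapd-maxdescriptors) is much higher."""

    def __init__(self, size: int):
        self.slots = [None] * size

    def accept(self, fd: int):
        for i, slot in enumerate(self.slots):
            if slot is None:
                self.slots[i] = fd
                return i          # slot index assigned to this connection
        return None               # table exhausted: stop listening

table = ConnTable(size=4)
assert all(table.accept(fd) is not None for fd in range(100, 104))
assert table.accept(200) is None  # "too many fds open" despite free fds
```

This is why raising the fd limit alone does not help: the table size is fixed at startup, so the table has to be made larger (or connections closed) for new ones to be accepted.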


What are the values of nsslapd-conntablesize and 
nsslapd-reservedescriptors ?
How many established connections (logconv on access logs or SRCH 
cn=monitor) ?


regards
thierry

On 3/17/20 9:35 AM, Lukasz Jaworski via FreeIPA-users wrote:

Hi,
I've upgraded a freeipa 4.6.x environment on Fedora 27 to 4.8.4 on 
Fedora 31.

- remove old replica
- install fedora 31
- connect as new replica...

now:
389-ds-base-1.4.2.8-3.fc31.x86_64
freeipa-server-4.8.4-2.fc31.x86_64

after that, I have many errors:
setup_pr_read_pds - Not listening for new connections - too many fds open

It looks like an fd limit of 1024.
I've checked:

nsslapd-maxdescriptors:
ldapsearch -xLLL -b "cn=config" -D 'cn=Directory Manager' -W cn=config 
nsslapd-maxdescriptors

Enter LDAP Password:
dn: cn=config
nsslapd-maxdescriptors: 524288

/proc/limits:
cat /proc/2164872/limits
Limit                     Soft Limit           Hard Limit       Units
Max cpu time              unlimited            unlimited      seconds
Max file size             unlimited            unlimited      bytes
Max data size             unlimited            unlimited      bytes
Max stack size            8388608              unlimited      bytes
Max core file size        unlimited            unlimited      bytes
Max resident set          unlimited            unlimited      bytes
Max processes             515206               515206       processes
Max open files            524288               524288       files
Max locked memory         65536                65536      bytes
Max address space         unlimited            unlimited      bytes
Max file locks            unlimited            unlimited      locks
Max pending signals       515206               515206       signals
Max msgqueue size         819200               819200       bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited      us


dirsrv log:
[17/Mar/2020:09:12:18.119324801 +0100] - INFO - main - Setting the 
maximum file descriptor limit to: 524288


find /proc/2164872/fd | wc -l
1037

It looks like 1024 is the connection limit.

Any idea what I've done wrong?

Best regards,
Ender - Lukasz Jaworski






[Freeipa-users] Re: LDAP Server stop to response after a period of time

2020-03-13 Thread thierry bordaz via FreeIPA-users

Hi Lays,

Unfortunately the fix for 1751295 may be incomplete. It prevents the 
deadlock in one condition (for be_write callbacks) but not for betxn_write 
callbacks.

I will look deeper into it to confirm this.

At the moment I can only recommend the workaround 
https://bugzilla.redhat.com/show_bug.cgi?id=1751295#c5


best regards
thierry

On 3/13/20 7:55 AM, Lays Dragon via FreeIPA-users wrote:

Hey guys, unfortunately it looks like the deadlock happened again; here are 
the slapi-nis version, access log, and stacktrace.
The server still seems to lock up easily after the replica connection recovers.
server1 slapi-nis version:

best regards
Lays
```

[@ipa1 ~]$ yum info  slapi-nis
Last metadata expiration check: 0:00:01 ago on Fri 13 Mar 2020 02:33:52 PM CST.
Installed Packages
Name : slapi-nis
Version  : 0.56.4
Release  : 1.fc31
Architecture : x86_64
Size : 459 k

```
server2  slapi-nis version
```
[@ipa2 ~]$ yum info  slapi-nis
Last metadata expiration check: 0:15:38 ago on Fri 13 Mar 2020 06:16:54 AM UTC.
Installed Packages
Name : slapi-nis
Version  : 0.56.4
Release  : 1.fc31
Architecture : x86_64
Size : 459 k
```
server1 access log
```
[13/Mar/2020:10:11:25.145984895 +0800] conn=7 op=6244 RESULT err=0 tag=101 
nentries=1 etime=0.000295266
[13/Mar/2020:10:11:25.146011399 +0800] conn=7 op=6245 SRCH base="cn=,cn=kerberos," 
scope=0 filter="(objectClass=krbticketpolicyaux)" attrs="krbMaxTicketLife krbMaxRenewableAge 
krbTicketFlags krbAuthIndMaxTicketLife krbAuthIndMaxRenewableAge"
[13/Mar/2020:10:11:25.146117860 +0800] conn=7 op=6245 RESULT err=0 tag=101 
nentries=1 etime=0.000121085
[13/Mar/2020:10:11:25.146159101 +0800] conn=7 op=6246 SRCH base="" scope=2 
filter="(&(|(objectClass=krbprincipalaux)(objectClass=krbprincipal))(krbPrincipalName=host/rancher2.@))"
 attrs="krbPrincipalName krbCanonicalName krbUPEnabled krbPrincipalKey krbTicketPolicyReference krbPrincipalExpiration 
krbPasswordExpiration krbPwdPolicyReference krbPrincipalType krbPwdHistory krbLastPwdChange krbPrincipalAliases 
krbLastSuccessfulAuth krbLastFailedAuth krbLoginFailedCount krbPrincipalAuthInd krbExtraData krbLastAdminUnlock krbObjectReferences 
krbTicketFlags krbMaxTicketLife krbMaxRenewableAge nsAccountLock passwordHistory ipaKrbAuthzData ipaUserAuthType 
ipatokenRadiusConfigLink krbAuthIndMaxTicke..."
[13/Mar/2020:10:11:25.146369953 +0800] conn=7 op=6246 RESULT err=0 tag=101 
nentries=1 etime=0.000242570
[13/Mar/2020:10:11:25.146397438 +0800] conn=7 op=6247 SRCH base="cn=,cn=kerberos," 
scope=0 filter="(objectClass=krbticketpolicyaux)" attrs="krbMaxTicketLife krbMaxRenewableAge 
krbTicketFlags krbAuthIndMaxTicketLife krbAuthIndMaxRenewableAge"
[13/Mar/2020:10:11:25.146733736 +0800] conn=7 op=6247 RESULT err=0 tag=101 
nentries=1 etime=0.000350288
[13/Mar/2020:10:11:25.246955040 +0800] conn=2454 op=1 BIND dn="" method=sasl 
version=3 mech=GSSAPI
[13/Mar/2020:10:11:25.954814904 +0800] conn=2455 fd=158 slot=158 connection from 
.150 to .165
[13/Mar/2020:10:11:31.155950133 +0800] conn=2456 fd=159 slot=159 connection from 
.153 to .165
[13/Mar/2020:10:11:35.097209559 +0800] conn=2457 fd=160 slot=160 connection from 
.151 to .165
[13/Mar/2020:10:11:35.380751894 +0800] conn=2458 fd=161 slot=161 connection from 
.154 to .165
[13/Mar/2020:10:11:41.386182526 +0800] conn=2459 fd=162 slot=162 connection from 
.152 to .165
[13/Mar/2020:10:11:47.388327986 +0800] conn=2460 fd=163 slot=163 connection from 
.154 to .165
[13/Mar/2020:10:11:53.111784187 +0800] conn=2461 fd=164 slot=164 connection from 
.151 to .165
[13/Mar/2020:10:11:53.391606194 +0800] conn=2462 fd=165 slot=165 connection from 
.154 to .165
[13/Mar/2020:10:11:53.599860902 +0800] conn=2463 fd=166 slot=166 connection from 
.150 to .165
```

server1 stacktrace
```
GNU gdb (GDB) Fedora 8.3.50.20190824-30.fc31
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
 .

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/sbin/ns-slapd...
Reading symbols from .gnu_debugdata for /usr/sbin/ns-slapd...
(No debugging symbols found in .gnu_debugdata for /usr/sbin/ns-slapd)
Attaching to program: /usr/sbin/ns-slapd, process 674
[New LWP 757]
[New LWP 758]
[New LWP 759]
[New LWP 760]
[New LWP 761]
[New LWP 763]
[New LWP 764]
[New LWP 765]
[New LWP 766]
[New LWP 767]
[New LWP 768]
[New LWP 769]
[New LWP 770]
[New LWP 771]
[New LWP 772]
[New LWP 773]
[New LWP 774]
[New 

[Freeipa-users] Re: LDAP Server stop to response after a period of time

2020-03-10 Thread thierry bordaz via FreeIPA-users

Hello,

The deadlock you hit is a known issue 
(https://bugzilla.redhat.com/show_bug.cgi?id=1751295) fixed in slapi-nis 
0.56.4. What versions of Fedora and the slapi-nis package are you running?


Note that a workaround exists: 
https://bugzilla.redhat.com/show_bug.cgi?id=1751295#c5. The changelog will 
grow a bit with this workaround but that should remain acceptable. Later, 
once the fix is applied and the workaround is turned off, trimming will occur 
again, though the changelog database file will likely not shrink immediately.


best regards
thierry

On 3/10/20 11:19 AM, Lays Dragon via FreeIPA-users wrote:

Hi I found a document might related to this.
https://www.freeipa.org/page/V4_slapi_nis_locking
___
FreeIPA-users mailing list -- freeipa-users@lists.fedorahosted.org
To unsubscribe send an email to freeipa-users-le...@lists.fedorahosted.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedorahosted.org/archives/list/freeipa-users@lists.fedorahosted.org



[Freeipa-users] Re: ipa-replica-install latest failure attempt:

2019-11-19 Thread thierry bordaz via FreeIPA-users



On 11/18/19 11:24 PM, Rob Crittenden wrote:

Auerbach, Steven via FreeIPA-users wrote:

Executed ipa-replica-prepare on an RHEL 6.9 server running ipa-server
3.0.0.1_51  (name : ipa01)

Yum installed ipa-server, ipa-server-dns, bind-dyndb-ldap on the target
Linux 7.6 server (name: ipa04)

Copied the file to the target server to which ipa-server 4.6.5-11.0.1 is
installed (ipa04)

Copied the file :/usr/share/ipa/copy-schema-to-ca.py from ipa v4.6
server to the ipa v3.0 server and executed it successfully.

Edited the /etc/resolv.con on ipa04 to include ipa01. Did not reboot.

Executed ipa-replica-install --setup-dns --forwarder=8.8.8.8 --setup-ca
/var/lib/ipa/replica-info-ipa04.fbog.local.gpg (on ipa04)


2019-11-16T16:23:24Z DEBUG The ipa-replica-install command failed,
exception: NotFound: wait_for_entry timeout on
ldap://ipa01.fbog.local:389 for
krbprincipalname=HTTP/ipa04.fbog.local@FBOG.LOCAL,cn=services,cn=accounts,dc=fbog,dc=local

2019-11-16T16:23:24Z ERROR wait_for_entry timeout on
ldap://ipa01.fbog.local:389 for
krbprincipalname=HTTP/ipa04.fbog.local@FBOG.LOCAL,cn=services,cn=accounts,dc=fbog,dc=local

  


Not sure where to go from here.  Did I leave out some declaration or
specification on the initial command?

The problem isn't in the command invocation, replication is just slow
enough for some reason that the new principal(s) weren't replicated to
the existing master.

I seem to recall a 389-ds option to mitigate this but I can't remember
it off the top of my head (or maybe it isn't applicable for an RHEL 6
master). cc'ing someone who would know.

rob


It is difficult to be sure without all the logs (ipa-replica-install, DS 
logs) and config.
Off the top of my head, I recall an old bug where the replication agreement 
replica->master was failing to bind because the master had not looked up the 
updated bind group.


Rob, is it the bug you were thinking of ?

If it is this bug, you may try setting nsds5ReplicaBindDNGroupCheckInterval:

ldapmodify -h  -p 389 -D "cn=directory manager" -W
dn:  cn=replica,cn=, cn=mapping tree,cn=config
changetype: modify
replace: nsds5ReplicaBindDNGroupCheckInterval
nsds5ReplicaBindDNGroupCheckInterval: 3

This modification does not require restart.
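A minimal sketch of applying the modification above: the change is staged as a reviewable LDIF file before it is fed to ldapmodify. The suffix, file name, and host are placeholders, not values from this thread.

```shell
# Sketch: stage the tuning as an LDIF file so it can be reviewed before it is
# applied with ldapmodify. SUFFIX is a placeholder for the escaped backend
# suffix as it appears under cn=mapping tree,cn=config.
SUFFIX='dc\=example\,dc\=com'
cat > bindgroup-interval.ldif <<EOF
dn: cn=replica,cn=${SUFFIX},cn=mapping tree,cn=config
changetype: modify
replace: nsds5ReplicaBindDNGroupCheckInterval
nsds5ReplicaBindDNGroupCheckInterval: 3
EOF
# Apply it (prompts for the Directory Manager password):
#   ldapmodify -h <host> -p 389 -D "cn=directory manager" -W -f bindgroup-interval.ldif
cat bindgroup-interval.ldif
```

As noted above, the change takes effect without a restart.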

best regards
thierry


[Freeipa-users] Re: Broken ipa replica

2019-04-11 Thread thierry bordaz via FreeIPA-users

Hi Giulio,

During the new IPA server installation (idc01), the server idc02 sends 
all its entries (total update), one after the other.
The entries are sent idc02->idc01 over a SASL-encrypted connection. I 
suspect that one of the entries sent by idc02 is large (a static group?) 
and its encrypted size exceeds the default limit set on idc01 (2 MB). I 
think your solution is the right one.


If you have big static groups, do you know how large the biggest ones are?

According to the logged error, it looks to me that the most important one to 
tune was nsslapd-maxsasliosize.

Possibly the IPA installer could increase this value to manage large groups.
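One rough way to answer the group-size question is to scan an LDIF export (e.g. from db2ldif) in awk's paragraph mode and report the largest entry. This is a sketch: the demo data below stands in for a real export file, and the file name is an assumption.

```shell
# Sketch: report the largest entry (in bytes) in an LDIF export, to see
# whether any single entry approaches the 2 MB default SASL I/O limit.
# export.ldif is demo data here; point the awk at your real db2ldif output.
printf 'dn: cn=small,dc=example\ncn: small\n\ndn: cn=biggroup,dc=example\nmember: uid=a\nmember: uid=b\nmember: uid=c\n\n' > export.ldif
awk 'BEGIN { RS = ""; max = 0 }   # RS="" = paragraph mode: one record per entry
     /^dn:/ { n = length($0); if (n > max) { max = n; dn = $2 } }
     END    { printf "largest entry: %s (%d bytes)\n", dn, max }' export.ldif
```

Note the SASL-encrypted size on the wire is somewhat larger than the raw entry, so treat the result as a lower bound.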

best regards
thierry

On 4/11/19 10:36 AM, Giulio Casella wrote:

Hi Thierry, Rob, Flo,

unfortunately I have no failure log anymore (after a couple of
reinstallations they get lost). Anyway I'll try to reconstruct some
information to help you investigate further. The behaviour was:

1. the IPA replication started, coming rapidly to "[28/41]: setting up
initial replication".

2. Near the end of replication, after about 20 secs, the process aborted
with a message:
[ldap://idc02.my.dom.ain:389] reports: Update failed! Status: [Error
(-11) connection error: Unknown connection error (-11) - Total update
aborted]

idc02 is the working IPA/389-ds server.

on idc01 (the wannabe-replica) I found (in dirsrv error log):

(idc01:389): Received error -1 (Can't contact LDAP server):  for total
update operation

and somewhere else in the same file on idc01 a message similar to:

SASL encrypted packet length exceeds maximum allowed limit

3. At the time of crash I noticed (via a tcpdump session) some "TCP zero
window" message in the capture, sent by idc01 to idc02

4. After that the 389-ds server on idc01 was up, but many other IPA
parts were not (that's why I say the IPA replica setup crashed, no try
to rollback was made). And the working server was up, but somehow
"dirt", with some replica update vector (RUV) still pointing to idc01.

5. The solution was to pass "--dirsrv-config-file=custom.ldif" to
ipa-replica-install, with custom.ldif containing:

dn: cn=config
changetype: modify
replace: nsslapd-maxsasliosize
nsslapd-maxsasliosize: 4194304
-
replace: nsslapd-sasl-max-buffer-size
nsslapd-sasl-max-buffer-size: 4194304

(original value was 2097152 for both configuration variables).

This makes me think that "TCP zero window" was only a consequence, not a
cause. After this tweak everything worked like a charm.

A couple of consideration:

1. I think you can reproduce the wrong behaviour by doing the exact opposite
of what I did, i.e. decreasing those two values. I don't know exactly by how
much.

2. Maybe ipa-replica-install should try to catch this situation, output
something more explanatory, and possibly try to rollback.


I'm sorry I've no real log to post, but I hope this helps anyway.

Thank you and regards,
Giulio




Il 10/04/2019 17:44, thierry bordaz ha scritto:


On 4/10/19 4:59 PM, Rob Crittenden wrote:

Giulio Casella via FreeIPA-users wrote:

Hi,
I managed to fix it!
The solution was to increase a couple of parameters in ldap config. I
passed "--dirsrv-config-file=custom.ldif" to ipa-replica-install, with
custom.ldif containing:

dn: cn=config
changetype: modify
replace: nsslapd-maxsasliosize
nsslapd-maxsasliosize: 4194304
-
replace: nsslapd-sasl-max-buffer-size
nsslapd-sasl-max-buffer-size: 4194304

In brief I doubled the sasl buffer size, because I noticed a log message
saying "SASL encrypted packet length exceeds maximum
allowed limit".

But the behaviour of ipa-replica-install was quite strange, it crashed,
and in a packet capture session I noticed the presence of some "TCP zero
window" packets sent from wannabe-replica to existing ipa server.
Maybe developers want to try to catch that error and revert the
operation, just like is done with other kind of errors.

Maybe one of the 389-ds devs have an idea. They're probably going to
want to see logs and what your definition of crash is.

rob

TCP zero window makes me think of a client not reading fast enough.
Is it transient/recoverable or not ?

Rob is right: if a problem is detected at the 389-ds level, access/errors
logs are appreciated, and also the ipa-replica-install backtrace from when it
crashed.

regards
thierry

Ciao,
g


Il 01/04/2019 15:28, Giulio Casella via FreeIPA-users ha scritto:

Hi,
I'm still stuck on this, I tried to delete every reference to the old
server, with ipa commands ("ipa-replica-manage clean-ruv") and directly
in ldap (as reported in https://access.redhat.com/solutions/136993).

If I try to "ipa-replica-manage list-ruv" on idc02 I get:

Replica Update Vectors:
  idc02.my.dom.ain:389: 5
Certificate Server Replica Update Vectors:
  idc02.my.dom.ain:389: 91

(same result looking directly into ldap)

Is it correct? Does a server have a replica reference to itself?

I also tried to instantiate a new server, idc03.my.dom.ain, never known
before (fresh centos install, ipa-client-install, ipa-replica-install).

[Freeipa-users] Re: Broken ipa replica

2019-04-10 Thread thierry bordaz via FreeIPA-users



On 4/10/19 4:59 PM, Rob Crittenden wrote:

Giulio Casella via FreeIPA-users wrote:

Hi,
I managed to fix it!
The solution was to increase a couple of parameters in ldap config. I
passed "--dirsrv-config-file=custom.ldif" to ipa-replica-install, with
custom.ldif containing:

dn: cn=config
changetype: modify
replace: nsslapd-maxsasliosize
nsslapd-maxsasliosize: 4194304
-
replace: nsslapd-sasl-max-buffer-size
nsslapd-sasl-max-buffer-size: 4194304

In brief I doubled the sasl buffer size, because I noticed a log message
saying "SASL encrypted packet length exceeds maximum
allowed limit".

But the behaviour of ipa-replica-install was quite strange, it crashed,
and in a packet capture session I noticed the presence of some "TCP zero
window" packets sent from wannabe-replica to existing ipa server.
Maybe developers want to try to catch that error and revert the
operation, just like is done with other kind of errors.

Maybe one of the 389-ds devs have an idea. They're probably going to
want to see logs and what your definition of crash is.

rob

TCP zero window makes me think of a client not reading fast enough.
Is it transient/recoverable or not ?

Rob is right: if a problem is detected at the 389-ds level, access/errors 
logs are appreciated, and also the ipa-replica-install backtrace from when it 
crashed.

regards
thierry



Ciao,
g


Il 01/04/2019 15:28, Giulio Casella via FreeIPA-users ha scritto:

Hi,
I'm still stuck on this, I tried to delete every reference to the old
server, with ipa commands ("ipa-replica-manage clean-ruv") and directly
in ldap (as reported in https://access.redhat.com/solutions/136993).

If I try to "ipa-replica-manage list-ruv" on idc02 I get:

Replica Update Vectors:
 idc02.my.dom.ain:389: 5
Certificate Server Replica Update Vectors:
 idc02.my.dom.ain:389: 91

(same result looking directly into ldap)

Is it correct? Does a server have a replica reference to itself?

I also tried to instantiate a new server, idc03.my.dom.ain, never known
before (fresh centos install, ipa-client-install, ipa-replica-install).
The setup (surprisingly to me) failed (details below).

At this point I suspect the problem is on idc02 (the only working
server), unrelated to previous server idc01.

For completeness this is what I did:

. Fresh install of a CentOS 7 box, updated, installed ipa software (name
idc03.my.dom.ain)
. ipa-client-install --principal admin --domain=my.dom.ain
--realm=MY.DOM.AIN --force-join
. ipa-replica-install --setup-dns --no-forwarders --setup-ca

Last command failed (in "[28/41]: setting up initial replication"), and
in /var/log/ipareplica-install.log of idc03 I read:

[...]
2019-03-28T09:30:48Z DEBUG   [28/41]: setting up initial replication
2019-03-28T09:30:48Z DEBUG retrieving schema for SchemaCache
url=ldapi://%2fvar%2frun%2fslapd-MY-DOM-AIN.socket
conn=
2019-03-28T09:30:48Z DEBUG Destroyed connection
context.ldap2_140424739228880
2019-03-28T09:30:48Z DEBUG Starting external process
2019-03-28T09:30:48Z DEBUG args=/bin/systemctl --system daemon-reload
2019-03-28T09:30:48Z DEBUG Process finished, return code=0
2019-03-28T09:30:48Z DEBUG stdout=
2019-03-28T09:30:48Z DEBUG stderr=
2019-03-28T09:30:48Z DEBUG Starting external process
2019-03-28T09:30:48Z DEBUG args=/bin/systemctl restart
dirsrv@MY-DOM-AIN.service
2019-03-28T09:30:54Z DEBUG Process finished, return code=0
2019-03-28T09:30:54Z DEBUG stdout=
2019-03-28T09:30:54Z DEBUG stderr=
2019-03-28T09:30:54Z DEBUG Restart of dirsrv@MY-DOM-AIN.service complete
2019-03-28T09:30:54Z DEBUG Created connection context.ldap2_140424739228880
2019-03-28T09:30:55Z DEBUG Fetching nsDS5ReplicaId from master [attempt 1/5]
2019-03-28T09:30:55Z DEBUG retrieving schema for SchemaCache
url=ldap://idc02.my.dom.ain:389 conn=
2019-03-28T09:30:55Z DEBUG Successfully updated nsDS5ReplicaId.
2019-03-28T09:30:55Z DEBUG Add or update replica config
cn=replica,cn=dc\=my\,dc\=dom\,dc\=ain,cn=mapping tree,cn=config
2019-03-28T09:30:55Z DEBUG Added replica config
cn=replica,cn=dc\=my\,dc\=dom\,dc\=ain,cn=mapping tree,cn=config
2019-03-28T09:30:55Z DEBUG Add or update replica config
cn=replica,cn=dc\=my\,dc\=dom\,dc\=ain,cn=mapping tree,cn=config
2019-03-28T09:30:55Z DEBUG No update to
cn=replica,cn=dc\=my\,dc\=dom\,dc\=ain,cn=mapping tree,cn=config necessary
2019-03-28T09:30:55Z DEBUG Waiting for replication
(ldap://idc02.my.dom.ain:389)
cn=meToidc03.my.dom.ain,cn=replica,cn=dc\=my\,dc\=dom\,dc\=ain,cn=mapping 
tree,cn=config
(objectclass=*)
2019-03-28T09:30:55Z DEBUG Entry found
[LDAPEntry(ipapython.dn.DN('cn=meToidc03.my.dom.ain,cn=replica,cn=dc\=my\,dc\=dom\,dc\=ain,cn=mapping
tree,cn=config'), {u'nsds5replicaLastInitStart': ['1970010100Z'],
u'nsds5replicaUpdateInProgress': ['FALSE'], u'cn':
['meToidc03.my.dom.ain'], u'objectClass': ['nsds5replicationagreement',
'top'], u'nsds5replicaLastUpdateEnd': ['1970010100Z'],
u'nsDS5ReplicaRoot': ['dc=my,dc=dom,dc=ain'], u'nsDS5ReplicaHost':
['idc03.my.dom.ain'], u'nsds5replicaLastUpdateStatus': 

[Freeipa-users] Re: Failed to start 389 Directory Server

2019-02-07 Thread thierry bordaz via FreeIPA-users

Hi,

The IPA messages are from Jan 28th (failing ipa backup) while the 
restart failure is from Feb 2nd. Nothing in the DS error logs from Jan 28th?


The first message, "Detected Disorderly Shutdown", means that DS stopped 
abruptly (crash, assert, ...).
So at restart it runs a recovery of the database. Usually this works fine, 
but here the recovery failed: "libdb: BDB1546 unable to join the 
environment".


You may check if there is a captured DS core file. Also, would you 
provide ls -lR /var/lib/dirsrv/slapd-/db ?
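A small sketch for collecting both pieces of requested evidence into files that can be attached to a reply. The instance name is a placeholder, and coredumpctl availability is an assumption (systemd-based hosts).

```shell
# Sketch: gather the requested diagnostics. INST is a placeholder for your
# DS instance name; errors are captured into the files rather than aborting.
INST=EXAMPLE-COM
ls -lR "/var/lib/dirsrv/slapd-${INST}/db" > db-listing.txt 2>&1 || true
coredumpctl list ns-slapd > cores.txt 2>&1 || true
wc -c db-listing.txt cores.txt
```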


best regards
thierry

On 02/06/2019 10:43 AM, Florence Blanc-Renaud via FreeIPA-users wrote:

On 2/3/19 8:08 AM, Zarko D via FreeIPA-users wrote:
Hi there, this is ipa-server-4.4.0-12.0.1 with 
389-ds-base-1.3.5.10-11 and suddenly daily backup has started to fail 
with messages:


2019-01-28T04:10:04Z INFO Backing up ipaca in REALM-COM to LDIF
2019-01-28T04:10:04Z INFO Waiting for LDIF to finish
2019-01-28T04:10:05Z DEBUG   File 
"/usr/lib/python2.7/site-packages/ipapython/admintool.py", line 171, 
in execute

 return_value = self.run()
   File 
"/usr/lib/python2.7/site-packages/ipaserver/install/ipa_backup.py", 
line 300, in run

 self.db2ldif(instance, 'ipaca', online=options.online)
   File 
"/usr/lib/python2.7/site-packages/ipaserver/install/ipa_backup.py", 
line 425, in db2ldif

 shutil.move(ldiffile, os.path.join(self.dir, ldifname))
   File "/usr/lib64/python2.7/shutil.py", line 301, in move
 copy2(src, real_dst)
   File "/usr/lib64/python2.7/shutil.py", line 130, in copy2
 copyfile(src, dst)
   File "/usr/lib64/python2.7/shutil.py", line 82, in copyfile
 with open(src, 'rb') as fsrc:
2019-01-28T04:10:05Z DEBUG The ipa-backup command failed, exception: 
IOError: [Errno 2] No such file or directory: u'/var/

lib/dirsrv/slapd-REALM-COM/ldif/REALM-COM-ipaca.ldif'
2019-01-28T04:10:05Z ERROR [Errno 2] No such file or directory: 
u'/var/lib/dirsrv/slapd-REALM-COM/ldif/REALM-COM-ipaca.ldif'
2019-01-28T04:10:05Z ERROR The ipa-backup command failed. See 
/var/log/ipabackup.log for more information


And service start fails with messages:

[02/Feb/2019:22:47:37.889779410 -0800] 389-Directory/1.3.5.10 
B2016.309.1527 starting up
[02/Feb/2019:22:47:37.906422534 -0800] default_mr_indexer_create: 
warning - plugin [caseIgnoreIA5Match] does not handle caseExactIA5Match
[02/Feb/2019:22:47:37.921288555 -0800] WARNING: userRoot: entry cache 
size 10485760 B is less than db size 16932864 B; We recommend to 
increase the entry cache size nsslapd-cachememsize.
[02/Feb/2019:22:47:37.921943984 -0800] WARNING: ipaca: entry cache 
size 10485760 B is less than db size 1757741056 B; We recommend to 
increase the entry cache size nsslapd-cachememsize.
[02/Feb/2019:22:47:37.922701343 -0800] WARNING: changelog: entry 
cache size 2097152 B is less than db size 82935808 B; We recommend to 
increase the entry cache size nsslapd-cachememsize.
[02/Feb/2019:22:47:37.925215059 -0800] Detected Disorderly Shutdown 
last time Directory Server was running, recovering database.
[02/Feb/2019:22:47:37.926177620 -0800] libdb: BDB1546 unable to join 
the environment



thanks in advance for any help, Zarko


Hi,
You may get more help from 389-users mailing list, which I CC'ed.
flo








[Freeipa-users] Re: ipa user-mod --rename failed

2018-06-21 Thread thierry bordaz via FreeIPA-users

Hi Harald,

Sorry to be back late.

The MODRDN failed

[20/Jun/2018:12:16:26.438676865 +0200] conn=2464250 fd=417 slot=417 
connection from 172.19.96.3 to 172.19.96.3
[20/Jun/2018:12:16:26.20018 +0200] conn=2464250 op=0 BIND dn="" 
method=sasl version=3 mech=GSS-SPNEGO
[20/Jun/2018:12:16:26.449637703 +0200] conn=2464250 op=0 RESULT err=0 
tag=97 nentries=0 etime=0 
dn="uid=admin,cn=users,cn=accounts,dc=example,dc=de"
[20/Jun/2018:12:16:26.451161509 +0200] conn=2464250 op=1 SRCH 
base="cn=ipaconfig,cn=etc,dc=example,dc=de" scope=0 
filter="(objectClass=*)" attrs=ALL
[20/Jun/2018:12:16:26.451753066 +0200] conn=2464250 op=1 RESULT err=0 
tag=101 nentries=1 etime=0
[20/Jun/2018:12:16:26.452751904 +0200] conn=2464250 op=2 SRCH 
base="uid=bobs,cn=users,cn=accounts,dc=example,dc=de" scope=0 
filter="(objectClass=*)" attrs="distinguishedName"
[20/Jun/2018:12:16:26.452983629 +0200] conn=2464250 op=2 RESULT err=0 
tag=101 nentries=1 etime=0
[20/Jun/2018:12:16:26.453499505 +0200] conn=2464250 op=3 SRCH 
base="uid=bobs,cn=users,cn=accounts,dc=example,dc=de" scope=0 
filter="(objectClass=*)" attrs="krbPrincipalName krbCanonicalName"
[20/Jun/2018:12:16:26.453742775 +0200] conn=2464250 op=3 RESULT err=0 
tag=101 nentries=1 etime=0
[20/Jun/2018:12:16:26.456729268 +0200] conn=2464250 op=4 MODRDN 
dn="uid=bobs,cn=users,cn=accounts,dc=example,dc=de" newrdn="uid=bobk" 
newsuperior="(null)"
[20/Jun/2018:12:16:31.890761679 +0200] conn=2464250 op=4 RESULT err=1 
tag=109 nentries=0 etime=5 csn=5b2a297c00090004

[20/Jun/2018:12:16:31.892091985 +0200] conn=2464250 op=5 UNBIND
[20/Jun/2018:12:16:31.892112732 +0200] conn=2464250 op=5 fd=417 closed - U1


Meanwhile, quite "intensive" read activity was occurring around the changelog:

[20/Jun/2018:12:16:31.885644563 +0200] - ERR - ldbm_back_modrdn - 
SLAPI_PLUGIN_BE_TXN_POST_MODRDN_FN plugin returned error but did not set 
SLAPI_RESULT_CODE
[20/Jun/2018:12:16:31.890841336 +0200] - ERR - 
agmt="cn=meToipabak.ac.example.de" (ipabak:389) - clcache_load_buffer - 
Can't locate CSN 5b2a297c00050004 in the changelog (DB rc=-30988). 
If replication stops, the consumer may need to be reinitialized.
[20/Jun/2018:12:16:31.895854088 +0200] - ERR - 
agmt="cn=meToipa2.example.de" (ipa2:389) - clcache_load_buffer - Can't 
locate CSN 5b2a297c00050004 in the changelog (DB rc=-30988). If 
replication stops, the consumer may need to be reinitialized.
[20/Jun/2018:12:16:31.899133027 +0200] - ERR - 
agmt="cn=meToipa3.example.de" (ipa3:389) - clcache_load_buffer - Can't 
locate CSN 5b2a297c00050004 in the changelog (DB rc=-30988). If 
replication stops, the consumer may need to be reinitialized.
[20/Jun/2018:12:16:31.910014989 +0200] - ERR - 
agmt="cn=meToipa4.example.de" (ipa4:389) - clcache_load_buffer - Can't 
locate CSN 5b2a297c00050004 in the changelog (DB rc=-30988). If 
replication stops, the consumer may need to be reinitialized.
[20/Jun/2018:12:16:31.918932212 +0200] - ERR - 
agmt="cn=meToipabak.ac.example.de" (ipabak:389) - clcache_load_buffer - 
Can't locate CSN 5b2a297c00050004 in the changelog (DB rc=-30988). 
If replication stops, the consumer may need to be reinitialized.
[20/Jun/2018:12:16:31.91449 +0200] - ERR - 
agmt="cn=meToipa2.example.de" (ipa2:389) - clcache_load_buffer - Can't 
locate CSN 5b2a297c00050004 in the changelog (DB rc=-30988). If 
replication stops, the consumer may need to be reinitialized.
[20/Jun/2018:12:16:31.934273432 +0200] - ERR - 
agmt="cn=meToipa3.example.de" (ipa3:389) - clcache_load_buffer - Can't 
locate CSN 5b2a297c00050004 in the changelog (DB rc=-30988). If 
replication stops, the consumer may need to be reinitialized.
[20/Jun/2018:12:16:31.942328998 +0200] - ERR - 
agmt="cn=meToipa4.example.de" (ipa4:389) - clcache_load_buffer - Can't 
locate CSN 5b2a297c00050004 in the changelog (DB rc=-30988). If 
replication stops, the consumer may need to be reinitialized.


There is not enough detail to confirm, but my feeling is that the MODRDN 
(write) failed to update the changelog because many replication agreements 
(reads) were competing with it. It retried several times without success, so 
the full txn was aborted.


I think this can be mitigated with an appropriate deadlock policy 
(nsslapd-db-deadlock-policy: 6, for example).
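A sketch of how that policy change could be staged. Two assumptions to verify against the 389-ds configuration reference for your version: the attribute lives on the ldbm database plugin entry, and value 6 maps to BDB's DB_LOCK_MINWRITE (abort the lock holder with the fewest write locks, i.e. prefer aborting readers).

```shell
# Sketch: stage the deadlock-policy change as LDIF. The target DN and the
# meaning of value 6 (DB_LOCK_MINWRITE) are assumptions; check the 389-ds
# configuration reference for your version before applying.
cat > deadlock-policy.ldif <<'EOF'
dn: cn=config,cn=ldbm database,cn=plugins,cn=config
changetype: modify
replace: nsslapd-db-deadlock-policy
nsslapd-db-deadlock-policy: 6
EOF
# Apply: ldapmodify -D "cn=directory manager" -W -f deadlock-policy.ldif
cat deadlock-policy.ldif
```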


Now, it broke the index, and that is really unexpected (even after a 
db_deadlock). It is worth trying to reproduce.


thanks for your help

best regards
thierry

On 06/20/2018 08:14 PM, Harald Dunkel via FreeIPA-users wrote:

Hi Thierry,

On 6/20/18 6:02 PM, thierry bordaz via FreeIPA-users wrote:

Hi Harald,

I wonder if the error on ipa1 could be part of the problem

[20/Jun/2018:12:16:31.885644563 +0200] - ERR - ldbm_back_modrdn - 
SLAPI_PLUGIN_BE_TXN_POST_MODRDN_FN plugin returned error but did not

[Freeipa-users] Re: ipa user-mod --rename failed

2018-06-20 Thread thierry bordaz via FreeIPA-users

Hi Harald,

I wonder if the error on ipa1 could be part of the problem

[20/Jun/2018:12:16:31.885644563 +0200] - ERR - ldbm_back_modrdn - 
SLAPI_PLUGIN_BE_TXN_POST_MODRDN_FN plugin returned error but did not set 
SLAPI_RESULT_CODE

The MODRDN failed; that would explain why 'uid=bobs' remained in the 
index (and is findable via search).

But this does not explain how the RDN and the entry itself were changed.

Could you provide the access logs (ipa1) around that time ?

best regards
thierry

On 06/20/2018 04:34 PM, Harald Dunkel via FreeIPA-users wrote:

Hi Thierry,

On 6/20/18 3:31 PM, thierry bordaz via FreeIPA-users wrote:

Hi Harald,

anything noticeable in the error logs when the problem occurred ? 
(DB_DEADLOCK)




I found something in the slapd error log files on the bad replicas
(attached).

Other replicas show tons of lines like

:
[16/Jun/2018:20:48:14.959827920 +0200] - ERR - DSRetroclPlugin - 
delete_changerecord: could not delete change record 4028228 (rc: 32)
[16/Jun/2018:20:48:14.962389856 +0200] - ERR - DSRetroclPlugin - 
delete_changerecord: could not delete change record 4028229 (rc: 32)
[16/Jun/2018:20:48:14.971465364 +0200] - ERR - DSRetroclPlugin - 
delete_changerecord: could not delete change record 4028230 (rc: 32)
[16/Jun/2018:20:48:14.979659148 +0200] - ERR - DSRetroclPlugin - 
delete_changerecord: could not delete change record 4028231 (rc: 32)
[16/Jun/2018:20:48:14.988140501 +0200] - ERR - DSRetroclPlugin - 
delete_changerecord: could not delete change record 4028232 (rc: 32)
[16/Jun/2018:20:48:14.992190747 +0200] - ERR - DSRetroclPlugin - 
delete_changerecord: could not delete change record 4028233 (rc: 32)
[16/Jun/2018:20:48:15.92668 +0200] - ERR - DSRetroclPlugin - 
delete_changerecord: could not delete change record 4028234 (rc: 32)
[16/Jun/2018:20:48:15.008352154 +0200] - ERR - DSRetroclPlugin - 
delete_changerecord: could not delete change record 4028235 (rc: 32)

:

some of them are months old, but we got "real" problems just today
(at about 12:20).


Any idea?

Regards
Harri






[Freeipa-users] Re: ipa user-mod --rename failed

2018-06-20 Thread thierry bordaz via FreeIPA-users

Hi Harald,

anything noticeable in the error logs when the problem occurred ? 
(DB_DEADLOCK)


best regards
thierry


On 06/20/2018 02:56 PM, Harald Dunkel via FreeIPA-users wrote:

Hi folks,

something got corrupted in my ldap database (again). After running

% ipa user-mod --rename=bobk bobs

I get

% getent passwd bobs
% getent passwd bobk
%

The UID became unusable. (Highly painful, because this user is cut off
from email.) This is what I see:

% ipa user-find bobs
--
1 user matched
--
  User login: bobk
  First name: Bob
  Last name: S
  Home directory: /home/bobs
  Login shell: /bin/bash
  Principal alias: b...@example.de
  Email address: b...@example.de
  UID: 1032
  GID: 100
  Account disabled: False

Number of entries returned 1


% ipa user-find bobk
---
0 users matched
---

Number of entries returned 0


% ipa user-find --login bobk
---
0 users matched
---

Number of entries returned 0


% ipa user-find --login bobs
---
0 users matched
---

Number of entries returned 0


Neither login name is found. Using LDAP, some data is still
available:

% ldapsearch -LLL -Y GSSAPI -b cn=users,cn=accounts,dc=example,dc=de 
'(uid=bobs)'


dn: uid=bobk,cn=users,cn=accounts,dc=example,dc=de
gecos: Bob S
displayName: Bob S
krbPrincipalName: b...@example.de
mepManagedEntry: cn=bobk,cn=groups,cn=accounts,dc=example,dc=de
memberOf: cn=ipausers,cn=groups,cn=accounts,dc=example,dc=de
memberOf: cn=projects,cn=groups,cn=accounts,dc=example,dc=de
memberOf: cn=develop,cn=groups,cn=accounts,dc=example,dc=de
uid: bobk
krbLastSuccessfulAuth: 20180607201703Z
krbLoginFailedCount: 0
krbLastFailedAuth: 20180606135524Z
ipaUniqueID: 35292e46-ad70-11e5-8123-0016cc46e69a
givenName: Bob
mail: b...@example.de
homeDirectory: /home/bobs
sn: S
gidNumber: 100
initials: JS
uidNumber: 1032
loginShell: /bin/bash
objectClass: ipaobject
objectClass: person
objectClass: top
objectClass: ipasshuser
objectClass: inetorgperson
objectClass: organizationalperson
objectClass: krbticketpolicyaux
objectClass: krbprincipalaux
objectClass: inetuser
objectClass: posixaccount
objectClass: ipaSshGroupOfPubKeys
objectClass: mepOriginEntry
cn: Bob S
krbLastPwdChange: 20160104091328Z
krbPasswordExpiration: 20400825091328Z
krbExtraData:: AAK4N4pWanNjaHVsdGVAQUlYSUdPLkRFAA==
krbLastAdminUnlock: 20160314150305Z


% ldapsearch -LLL -Y GSSAPI -b 
cn=users,cn=accounts,dc=example,dc=de '(uid=bobk)'

%

Using jxplorer I see the entry for "bobk" (on 2 replicas), but if I try to
look inside I get an error popup "unable to perform read operation". On the
other 4 replicas I see "bobs" (no problem here).


WTH? How can I clean up this mess?
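A hypothetical sketch, for illustration only: if the goal were simply to move the entry back to its previous RDN so that the DN and the login agree again, the plain LDAP rename would be a modrdn like the one below. With replicas disagreeing as described above, the replication state likely needs repair first, so treat this as a sketch of the operation, not as advice for this specific broken state.

```ldif
# Hypothetical modrdn sketch: move the entry back to uid=bobs.
# Assumes a healthy, fully replicated directory, not the diverged
# state described in this thread.
dn: uid=bobk,cn=users,cn=accounts,dc=example,dc=de
changetype: modrdn
newrdn: uid=bobs
deleteoldrdn: 1
```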

Every helpful comment is highly appreciated
Harri
___
FreeIPA-users mailing list -- freeipa-users@lists.fedorahosted.org
To unsubscribe send an email to 
freeipa-users-le...@lists.fedorahosted.org

Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/freeipa-users@lists.fedorahosted.org/message/UB477YJDVHK4242T54KHH65MCZONLCJF/



[Freeipa-users] Re: Problems setting up replica on Raspberry Pi 3B (ARM)

2018-05-17 Thread thierry bordaz via FreeIPA-users



On 05/16/2018 10:03 PM, Jonathan Vaughn wrote:
I've been just using the packages from Fedora. I can build it 
potentially but I don't have a cross build environment set up at the 
moment. From experience I'd want to do that first because building 
anything on the Pi usually takes ages.


I'd been "redacting" the hostnames but I'll stop bothering since it 
looks like we're getting far enough into the weeds now that the 
difference in string lengths after "redacting" might actually be a red 
herring.


(gdb) p *agmt
$1 = {hostname = 0x1ef9be0 "ipa-12.creatuity.internal", port = 389, 
transport_flags = 0, binddn = 0x1e8f650 "", creds = 0x1e8f7a0, 
bindmethod = 3, replarea = 0x1ef9480,
  frac_attrs = 0x1ef99c0, frac_attrs_total = 0x1ef9a40, 
frac_attr_total_defined = 1, schedule = 0x1a7f0c0, auto_initialize = 
502, dn = 0x1ef8d00, rdn = 0x1ef8c20,
  long_name = 0x1a7f100 "agmt=\"cn=meToipa-12.creatuity.internal\" 
(ipa-12:5)", protocol = 0x19c2930, changecounters = 0x186d180, 
num_changecounters = 0,
  max_changecounters = 256, last_update_start_time = 1526500697, 
last_update_end_time = 1526500697,
  last_update_status = "Error (0) Replica acquired successfully: 
Incremental update succeeded", '\000' , 
update_in_progress = 0, is_enabled = 1,
  last_init_start_time = 0, last_init_end_time = 0, last_init_status = 
'\000' , lock = 0x1ee3740, consumerRUV = 0x1f14e50,
  consumerSchemaCSN = 0x317c520, consumerRID = 4, tmpConsumerRID = 0, 
timeout = 120, stop_in_progress = 0, busywaittime = 0, pausetime = 0, 
priv = 0x0,
  attrs_to_strip = 0x1ef9ba0, agreement_type = 0, protocol_timeout = 
0x1e8f5f0, maxcsn = 0x0, flowControlWindow = 1000, flowControlPause = 
2000, ignoreMissingChange = 0,

  attr_lock = 0x1ef9c20, WaitForAsyncResults = 100}
(gdb) p *agmt->replarea
$2 = {flag = 15 '\017', udn = 0x1efce80 "dc=creatuity,dc=internal", dn 
= 0x1ef9460 "dc=creatuity,dc=internal", ndn = 0x1ef8ec0 
"dc=creatuity,dc=internal", ndn_len = 24}

(gdb) p *agmt->rdn
$3 = {flag = 0 '\000', rdn = 0x19c2840 
"cn=meToipa-12.creatuity.internal", rdns = 0x0, butcheredupto = -1, 
nrdn = 0x0, all_rdns = 0x0, all_nrdns = 0x0}


[root@ipa-11 ~]# grep -r PRId64 /usr/include/*
/usr/include/inttypes.h:# define PRId64  __PRI64_PREFIX "d"
[root@ipa-11 ~]# grep -r PRIu16 /usr/include/*
/usr/include/inttypes.h:# define PRIu16         "u"



On Wed, May 16, 2018 at 2:55 PM, Mark Reynolds wrote:




On 05/16/2018 03:43 PM, Jonathan Vaughn wrote:

The installed version of 389* is 1.3.7.10-1.fc27 for armv7hl,
which appears to be the latest available version.


Perhaps something is off with the inttypes on Raspberry.  Are you
building this yourself on Raspberry? Can we make code changes and
compile/install them?

Before we do that though, in gdb can you run these commands in the
same gdb frame:

(gdb) p *agmt->replarea
(gdb) p *agmt->rdn


Then do:

# grep -r PRId64 /usr/include/*
# grep -r PRIu16 /usr/include/*


So if you can compile the source, then change this line in
ldap/servers/plugins/replication/repl5_agmt.c:3036, but don't do
this yet until you get me the info I just requested.

From:

    agmt->maxcsn = slapi_ch_smprintf("%s;%s;%s;%"
PRId64 ";%" PRIu16 ";%s", slapi_sdn_get_dn(agmt->replarea),
slapi_rdn_get_value_by_ref(slapi_rdn_get_rdn(agmt->rdn)),
agmt->hostname,
agmt->port, agmt->consumerRID, maxcsn);

To:

    agmt->maxcsn =
slapi_ch_smprintf("%s;%s;%s;%ld;%d;%s",
slapi_sdn_get_dn(agmt->replarea),
slapi_rdn_get_value_by_ref(slapi_rdn_get_rdn(agmt->rdn)),
agmt->hostname,
(long)agmt->port, (int)agmt->consumerRID, maxcsn);


Thanks,
Mark



On Wed, May 16, 2018 at 2:38 PM, Jonathan Vaughn wrote:




The agreement structure looks valid to me; it should not lead to a crash.

What looks weird to me is the order of the arguments of cvt_s.
It is called as: rv = cvt_s(ss, u.s, width, prec, flags);
But the crashing thread shows them in the opposite order: flags, prec, width, str, ss.

The other frames do not show this change of order.
Also, 'str=4' would be a meaningful value for 'prec=4'.

From a debug perspective, the only thing I can imagine is disassembling the
last two frames to confirm the parameters.

What are the nspr rpms?




(gdb) up
#1  0xb6926b40 in cvt_s (flags=0, prec=,
width=0, str=0x4 , ss=)
    at ../../.././nspr/pr/src/io/prprf.c:374
374             slen = strlen(str);
(gdb) up
#2  dosprintf (ss=ss@entry=0x9e06e4bc, fmt=0xb34b0df2 "",
fmt@entry=0xb34da770  "\360\317p\002", ap=...) at
../../.././nspr/pr/src/io/prprf.c:1018
1018                rv = cvt_s(ss, u.s, width, prec, flags);
(gdb) up
#3  0xb6926c8c in PR_vsmprintf (fmt=fmt@entry=0xb34da770
 "\360\317p\002", ap=..., ap@entry=...) at
   

[Freeipa-users] Re: Problems setting up replica on Raspberry Pi 3B (ARM)

2018-05-15 Thread thierry bordaz via FreeIPA-users

Hi Jonathan,

This problem looks new to me and has something specific to your environment.
I think the best approach is to continue to debug on your system if you 
have the possibility to do so.


From strace we can see that DS started smoothly (created its pid file 
then notified systemd it was running fine). According to the pstack 
nunc-stans was running and was able to accept network events even if it 
appears it detected no incoming connection.
So the server should have been ready to serve for some time (more than a
minute); then it crashed, with one thread likely dereferencing a wrong
pointer.


Could you attach a debugger once the server is started and wait for the
SIGSEGV to occur, then confirm the crashing thread's backtrace?
If it is confirmed, I am afraid this is a stack corruption and valgrind 
could help 
(http://www.port389.org/docs/389ds/FAQ/faq.html#debugging-memory-growthinvalid-access-with-valgrind).


best regards
thierry

On 05/14/2018 10:20 PM, Jonathan Vaughn wrote:
Here's a strace from before it dies. Most of the elapsed time appears to be
spent waiting on some futex call near the end. When that call finally
"returns" (from the lack of strace output for the duration of the call, I
assume it didn't actually return but hit SIGSEGV inside it), strace prints
' = ?' for the futex and then immediately reports SIGSEGV. So maybe the
problem is that futex call, which may mean the problem is not directly in
389DS / FreeIPA itself?




15:13:31.626587 (+     0.000630) listen(8, 128) = 0 <0.68>
15:13:31.626857 (+     0.000235) listen(9, 128) = 0 <0.48>
15:13:31.627111 (+     0.000251) clock_gettime(CLOCK_MONOTONIC, 
{tv_sec=1464932, tv_nsec=41120614}) = 0 <0.85>
15:13:31.627457 (+     0.000356) clock_gettime(CLOCK_REALTIME, 
{tv_sec=1526328811, tv_nsec=627560772}) = 0 <0.43>
15:13:31.631233 (+     0.003839) clock_gettime(CLOCK_MONOTONIC, 
{tv_sec=1464932, tv_nsec=45286796}) = 0 <0.77>
15:13:31.631720 (+     0.000427) clock_gettime(CLOCK_MONOTONIC, 
{tv_sec=1464932, tv_nsec=45661430}) = 0 <0.42>
15:13:31.631955 (+     0.000220) clock_gettime(CLOCK_REALTIME, 
{tv_sec=1526328811, tv_nsec=632049036}) = 0 <0.47>
15:13:31.635669 (+     0.003785) clock_gettime(CLOCK_MONOTONIC, 
{tv_sec=1464932, tv_nsec=49725840}) = 0 <0.000146>

15:13:31.636484 (+     0.000784) write(16, "a", 1) = 1 <0.000118>
15:13:31.636855 (+     0.000341) sched_yield() = 0 <0.000252>
15:13:31.637322 (+     0.000470) futex(0x1cb57a0, FUTEX_WAKE_PRIVATE, 
1) = 1 <0.88>

15:13:31.637897 (+     0.000610) write(16, "a", 1) = 1 <0.000221>
15:13:31.638394 (+     0.000467) sched_yield() = 0 <0.47>
15:13:31.638619 (+     0.000202) futex(0x1cb5710, FUTEX_WAKE_PRIVATE, 
1) = 1 <0.65>
15:13:31.638908 (+     0.000298) openat(AT_FDCWD, 
"/var/run/dirsrv/slapd-COMPANY-INTERNAL.pid", 
O_WRONLY|O_CREAT|O_TRUNC, 0666) = 33 <0.000831>

15:13:31.640260 (+     0.001387) getpid() = 32353 <0.77>
15:13:31.640558 (+     0.000256) fstat64(33, {st_mode=S_IFREG|0644, 
st_size=0, ...}) = 0 <0.000119>

15:13:31.641106 (+     0.000556) write(33, "32353\n", 6) = 6 <0.000127>
15:13:31.641472 (+     0.000362) close(33) = 0 <0.000519>
15:13:31.642216 (+     0.000758) 
chmod("/var/run/dirsrv/slapd-COMPANY-INTERNAL.pid", 0644) = 0 <0.000152>
15:13:31.642900 (+     0.000679) clock_gettime(CLOCK_REALTIME, 
{tv_sec=1526328811, tv_nsec=643020294}) = 0 <0.56>
15:13:31.643495 (+     0.000590) write(2, 
"[14/May/2018:15:13:31.643020294 "..., 134) = 134 <0.002697>
15:13:31.646515 (+     0.003052) clock_gettime(CLOCK_REALTIME, 
{tv_sec=1526328811, tv_nsec=646694394}) = 0 <0.75>
15:13:31.646892 (+     0.000337) write(4, 
"[14/May/2018:15:13:31.646694394 "..., 134) = 134 <0.000522>

15:13:31.647841 (+     0.000973) fsync(4) = 0 <0.005967>
15:13:31.654425 (+     0.006617) clock_gettime(CLOCK_REALTIME, 
{tv_sec=1526328811, tv_nsec=654598946}) = 0 <0.000253>
15:13:31.655137 (+     0.000717) write(2, 
"[14/May/2018:15:13:31.654598946 "..., 136) = 136 <0.002427>
15:13:31.658312 (+     0.003165) clock_gettime(CLOCK_REALTIME, 
{tv_sec=1526328811, tv_nsec=658486117}) = 0 <0.000251>
15:13:31.659032 (+     0.000682) write(4, 
"[14/May/2018:15:13:31.658486117 "..., 136) = 136 <0.000346>

15:13:31.659623 (+     0.000595) fsync(4) = 0 <0.003311>
15:13:31.663230 (+     0.003642) getpid() = 32353 <0.45>
15:13:31.663732 (+     0.000454) socket(AF_UNIX, 
SOCK_DGRAM|SOCK_CLOEXEC, 0) = 33 <0.000296>
15:13:31.664760 (+     0.001048) getsockopt(33, SOL_SOCKET, SO_SNDBUF, 
[163840], [4]) = 0 <0.000108>
15:13:31.665141 (+     0.000386) setsockopt(33, SOL_SOCKET, 
SO_SNDBUFFORCE, [8388608], 4) = -1 EPERM (Operation not permitted) 
<0.51>
15:13:31.665500 (+     0.000334) setsockopt(33, SOL_SOCKET, SO_SNDBUF, 
[8388608], 4) = 0 <0.000229>
15:13:31.665973 (+     0.000468) sendmsg(33, 
{msg_name={sa_family=AF_UNIX, sun_path="/run/systemd/notify"}, 
msg_namelen=21, msg_iov=[{iov_base="READY=1\nSTATUS=slapd started: 
Re"..., iov_len=69}], msg_iovlen=1, 

[Freeipa-users] Re: Problems setting up replica on Raspberry Pi 3B (ARM)

2018-05-14 Thread thierry bordaz via FreeIPA-users

Hi Jonathan,

This is weird, as the crashing thread's stack looks truncated (did you
copy/paste all of it?)


Thread 1 (Thread 0x9e13c280 (LWP 17245)):
#0  0xb67bbf2e in strlen () at /lib/libc.so.6
#1  0xb6a06b40 in dosprintf () at /lib/libnspr4.so
#2  0x in None ()

Did you install 389-ds-base-debuginfo ?
How did you get that backtrace? From a core dump, or pstack? Can you
attach a debugger before the crash occurs?


It looks like it crashed soon after startup; could it be related to a broken
dse.ldif? A dse.ldif.OK should exist; is it possible to try to start
with it?


best regards
thierry

On 05/12/2018 01:22 AM, Jonathan Vaughn via FreeIPA-users wrote:
Not sure if it makes a difference... I was looking into this again and 
realized I had a bunch of messages from gdb telling me to install more 
debuginfo. I've done that now, here it is again freshly run through gdb


GNU gdb (GDB) Fedora 8.0.1-36.fc27
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 


This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "armv7hl-redhat-linux-gnueabi".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/sbin/ns-slapd...Reading symbols from 
/usr/lib/debug/usr/sbin/ns-slapd-1.3.7.10-1.fc27.arm.debug...done.

done.
...

Thread 1 (Thread 0x9e13c280 (LWP 17245)):
#0  0xb67bbf2e in strlen () at /lib/libc.so.6
#1  0xb6a06b40 in dosprintf () at /lib/libnspr4.so
#2  0x in None ()



On Tue, May 8, 2018 at 7:52 AM, Rob Crittenden
> wrote:

Jonathan Vaughn via FreeIPA-users wrote:

Still trying to figure this out. It looks like slapd is
dying, I thought it was still running for some reason.

slapd is dying to segfault. strace of it happening doesn't
seem to reveal much:


A stack trace would very much help trying to track down the cause.


http://directory.fedoraproject.org/docs/389ds/FAQ/faq.html#debugging-crashes



rob


18:32:41.543717 (+     0.000801) openat(AT_FDCWD,
"/var/run/dirsrv/slapd-COMPANY-INTERNAL.pid",
O_WRONLY|O_CREAT|O_TRUNC, 0666) = 32
18:32:41.544907 (+     0.001195) getpid() = 16014
18:32:41.545269 (+     0.000329) fstat64(32,
{st_mode=S_IFREG|0644, st_size=0, ...}) = 0
18:32:41.545799 (+     0.000536) write(32, "16014\n", 6) = 6
18:32:41.546603 (+     0.000818) close(32) = 0
18:32:41.547061 (+     0.000448)
chmod("/var/run/dirsrv/slapd-COMPANY-INTERNAL.pid", 0644) = 0
18:32:41.547741 (+     0.000676)
clock_gettime(CLOCK_REALTIME, {tv_sec=1525735961,
tv_nsec=548030641}) = 0
18:32:41.548324 (+     0.000587) write(2,
"[07/May/2018:18:32:41.548030641 "..., 134) = 134
18:32:41.551096 (+     0.002840)
clock_gettime(CLOCK_REALTIME, {tv_sec=1525735961,
tv_nsec=551287555}) = 0
18:32:41.551568 (+     0.000406) write(4,
"[07/May/2018:18:32:41.551287555 "..., 134) = 134
18:32:41.552360 (+     0.000811) fsync(4) = 0
18:32:41.558499 (+     0.006170)
clock_gettime(CLOCK_REALTIME, {tv_sec=1525735961,
tv_nsec=558678099}) = 0
18:32:41.558901 (+     0.000350) write(2,
"[07/May/2018:18:32:41.558678099 "..., 136) = 136
18:32:41.561537 (+     0.002680)
clock_gettime(CLOCK_REALTIME, {tv_sec=1525735961,
tv_nsec=561718659}) = 0
18:32:41.562357 (+     0.000793) write(4,
"[07/May/2018:18:32:41.561718659 "..., 136) = 136
18:32:41.563293 (+     0.001148) fsync(4) = 0
18:32:41.566928 (+     0.003452) getpid() = 16014
18:32:41.567712 (+     0.000752) socket(AF_UNIX,
SOCK_DGRAM|SOCK_CLOEXEC, 0) = 32
18:32:41.568628 (+     0.000912) getsockopt(32,
SOL_SOCKET, SO_SNDBUF, [163840], [4]) = 0
18:32:41.568972 (+     0.000319) setsockopt(32,
SOL_SOCKET, SO_SNDBUFFORCE, [8388608], 4) = -1 EPERM
(Operation not permitted)
18:32:41.569548 (+     0.000589) setsockopt(32,
SOL_SOCKET, SO_SNDBUF, [8388608], 4) = 0
18:32:41.570064 (+     0.000513) sendmsg(32,

[Freeipa-users] Re: Problem on dirsrv when updating from 4.5.0 (RHEL 7.4) to 4.5.4 (RHEL 7.5)

2018-05-03 Thread thierry bordaz via FreeIPA-users

Hi Soler,

Thanks for the information.
So indexing is hanging because the SC (schema-compat) cache_init is running;
the SC cache_init is hanging because SSSD is not started; and SSSD is not
started, possibly because indexing prevents it from getting read access to
the backend ("Backend is offline", TBC).


Would an option be to disable SC during the upgrade phase? Rob?
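If disabling SC (the slapi-nis schema-compat plugin) were the chosen route, the toggle could in principle be an ldapmodify along these lines; the plugin DN below is an assumption, not confirmed in this thread, so verify it on the instance first, and restart dirsrv around the change:

```ldif
# Hypothetical sketch: disable the Schema Compatibility plugin for the
# duration of the upgrade, then set the value back to "on" afterwards.
# The DN is assumed; confirm it under cn=plugins,cn=config first.
dn: cn=Schema Compatibility,cn=plugins,cn=config
changetype: modify
replace: nsslapd-pluginEnabled
nsslapd-pluginEnabled: off
```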

regards
thierry

On 05/03/2018 11:35 AM, SOLER SANGUESA Miguel wrote:

Hello Thierry,

The version is: slapi-nis-0.56.0-8.el7.x86_64

And the errors are:
[02/May/2018:13:04:30.089731032 +0200] - ERR - schema-compat-plugin - group 
"xxx...@ipa.example.org" does not exist because SSSD is offline.
[02/May/2018:13:04:30.093169411 +0200] - ERR - schema-compat-plugin - waiting 
for SSSD to become online...

The SSSD service is up, but since the ipa services are not (all) running, the
ipa domain is down. The sssd logs at startup show:
sssd[be[ipa.example.org]][823]: Backend is offline

and logs sssd_nss.log & sssd_sudo.log have hundreds of:
  [sss_dp_get_reply] (0x0010): The Data Provider returned an error 
[org.freedesktop.sssd.Error.DataProvider.Offline]

Thanks & Regards.

-Original Message-
From: thierry bordaz 
Sent: Thursday, May 03, 2018 11:25
To: SOLER SANGUESA Miguel ; Rob Crittenden ; 
FreeIPA users list 
Cc: Ludwig Krispenz 
Subject: Re: [Freeipa-users] Re: Problem on dirsrv when updating from 4.5.0 
(RHEL 7.4) to 4.5.4 (RHEL 7.5)

me again ...

If it exists some logs (/var/log/dirsrv/slapd-/errors) during the hanging 
period it could also indicate the reason what Schema compat was waiting for (SSSD ?)

On 05/03/2018 10:38 AM, SOLER SANGUESA Miguel wrote:

hello,

Yesterday my ssh console closed the connection, so I had to start
"ipa-server-upgrade" again, but the result is more or less the same:
# ipa-server-upgrade
Upgrading IPA:. Estimated time: 1 minute 30 seconds
[1/10]: stopping directory server
[2/10]: saving configuration
[3/10]: disabling listeners
[4/10]: enabling DS global lock
[5/10]: starting directory server
[6/10]: updating schema
[7/10]: upgrading server

But now, the lines that are repeated on the access log are:
[03/May/2018:10:33:27.969889221 +0200] conn=6 op=79094 SRCH 
base="cn=indextask_l_137445500911864330_4055,cn=index,cn=tasks,cn=config" scope=0 
filter="(objectClass=*)" attrs="nstaskstatus nstaskexitcode"
[03/May/2018:10:33:27.970146545 +0200] conn=6 op=79094 RESULT err=0
tag=101 nentries=1 etime=1.687740

And it is on the same state than the other:
# ldapsearch -H ldapi://%2fvar%2frun%2fslapd-IPA-EXAMPLE-ORG.socket -b
cn=indextask_l_137445500911864330_4055,cn=index,cn=tasks,cn=config -s
base SASL/EXTERNAL authentication started SASL username:
gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth
SASL SSF: 0
# extended LDIF
#
# LDAPv3
# base

[Freeipa-users] Re: Problem on dirsrv when updating from 4.5.0 (RHEL 7.4) to 4.5.4 (RHEL 7.5)

2018-05-03 Thread thierry bordaz via FreeIPA-users



On 05/03/2018 10:38 AM, SOLER SANGUESA Miguel wrote:

hello,

Yesterday my ssh console closed the connection, so I had to start
"ipa-server-upgrade" again, but the result is more or less the same:
# ipa-server-upgrade
Upgrading IPA:. Estimated time: 1 minute 30 seconds
   [1/10]: stopping directory server
   [2/10]: saving configuration
   [3/10]: disabling listeners
   [4/10]: enabling DS global lock
   [5/10]: starting directory server
   [6/10]: updating schema
   [7/10]: upgrading server

But now, the lines that are repeated on the access log are:
[03/May/2018:10:33:27.969889221 +0200] conn=6 op=79094 SRCH 
base="cn=indextask_l_137445500911864330_4055,cn=index,cn=tasks,cn=config" scope=0 
filter="(objectClass=*)" attrs="nstaskstatus nstaskexitcode"
[03/May/2018:10:33:27.970146545 +0200] conn=6 op=79094 RESULT err=0 tag=101 
nentries=1 etime=1.687740

And it is on the same state than the other:
# ldapsearch -H ldapi://%2fvar%2frun%2fslapd-IPA-EXAMPLE-ORG.socket -b 
cn=indextask_l_137445500911864330_4055,cn=index,cn=tasks,cn=config -s base
SASL/EXTERNAL authentication started
SASL username: gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth
SASL SSF: 0
# extended LDIF
#
# LDAPv3
# base 

[Freeipa-users] Re: Problem on dirsrv when updating from 4.5.0 (RHEL 7.4) to 4.5.4 (RHEL 7.5)

2018-05-03 Thread thierry bordaz via FreeIPA-users

Hi,

During an indexing task we should see the periodic progression of the
indexing in the task status.
Maybe the indexing is hanging somewhere. When the problem occurs, could
you provide a pstack of the dirsrv server?


best regards
thierry

On 05/02/2018 10:27 PM, Rob Crittenden wrote:

SOLER SANGUESA Miguel via FreeIPA-users wrote:

Hello,

This is the output of the command (seems that is not complete):

# ldapsearch -H ldapi://%2fvar%2frun%2fslapd-IPA-EXAMPLE-ORG.socket 
-b 
cn=indextask_description_137444551994158920_5958,cn=index,cn=tasks,cn=config 
-s base

SASL/EXTERNAL authentication started
SASL username: gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth
SASL SSF: 0
# extended LDIF
#
# LDAPv3
# base 

[Freeipa-users] Re: ipa-replica-manage: unable to decode: {replica 7} 58809c7c000300070000 58809c7c000300070000

2018-03-12 Thread thierry bordaz via FreeIPA-users

Hi Harald,

What version of DS are you running ?
We have a reproducer (not systematic) for versions before 
https://bugzilla.redhat.com/show_bug.cgi?id=1516309 but we have not 
reproduced it since then, you may need to upgrade.


best regards
thierry


On 03/12/2018 05:10 PM, Ludwig Krispenz wrote:

Hi,

to get rid of this ruv entry with replicaid 7 you could try to run the
cleanallruv task directly. On any server (and only on one) run

ldapmodify ... -D "cn=directory manager"

dn: cn=clean 7, cn=cleanallruv, cn=tasks, cn=config
changetype: add
objectclass: extensibleObject
replica-base-dn: 
replica-id: 7
replica-force-cleaning: yes

But I would like to understand how you got into this state; we have seen
this occasionally, but have no reproducer. Unfortunately the csn for
replicaid 7 is from Jan 19th, 2017 11:01:16, so you will probably not
remember.




On 03/12/2018 03:55 PM, Harald Dunkel via FreeIPA-users wrote:

Hi folks,

somehow my ipa servers became out of sync. ipa4 has an additional host
entry, not known on the others. While examining this I stumbled over the
following:


[root@ipa0 ~]# ipa-replica-manage clean-dangling-ruv

unable to decode: {replica 7} 58809c7c00030007 58809c7c00030007
unable to decode: {replica 7} 58809c7c00030007 58809c7c00030007
unable to decode: {replica 7} 58809c7c00030007 58809c7c00030007
unable to decode: {replica 7} 58809c7c00030007 58809c7c00030007
unable to decode: {replica 7} 58809c7c00030007 58809c7c00030007
unable to decode: {replica 7} 58809c7c00030007 58809c7c00030007
These RUVs are dangling and will be removed:
Host: ipabak.ac.example.de
    RUVs:
    id: 11, hostname: ipabak.ac.example.de
    CS-RUVs:
Host: ipa1.example.de
    RUVs:
    id: 11, hostname: ipabak.ac.example.de
    CS-RUVs:
Host: ipa0.example.de
    RUVs:
    id: 11, hostname: ipabak.ac.example.de
    CS-RUVs:
Host: ipa3.example.de
    RUVs:
    id: 11, hostname: ipabak.ac.example.de
    CS-RUVs:
Host: ipa4.example.de
    RUVs:
    id: 11, hostname: ipabak.ac.example.de
    CS-RUVs:
Host: ipa2.example.de
    RUVs:
    id: 11, hostname: ipabak.ac.example.de
    CS-RUVs:
Proceed with cleaning? [no]: yes
unable to decode: {replica 7} 58809c7c00030007 58809c7c00030007
Clean the Replication Update Vector for ipabak.ac.example.de:389
Background task created to clean replication data. This may take a 
while.

This may be safely interrupted with Ctrl+C
Cleanup task created

[root@ipa0 ~]# ipa-replica-manage clean-dangling-ruv

unable to decode: {replica 7} 58809c7c00030007 58809c7c00030007
unable to decode: {replica 7} 58809c7c00030007 58809c7c00030007
unable to decode: {replica 7} 58809c7c00030007 58809c7c00030007
unable to decode: {replica 7} 58809c7c00030007 58809c7c00030007
unable to decode: {replica 7} 58809c7c00030007 58809c7c00030007
unable to decode: {replica 7} 58809c7c00030007 58809c7c00030007
No dangling RUVs found

[root@ipa0 ~]# ipa-replica-manage list-ruv

unable to decode: {replica 7} 58809c7c00030007 58809c7c00030007
Replica Update Vectors:
    ipa0.example.de:389: 12
    ipa2.example.de:389: 5
    ipa1.example.de:389: 4
    ipa4.example.de:389: 8
    ipa3.example.de:389: 6
    ipabak.ac.example.de:389: 13
Certificate Server Replica Update Vectors:
    ipa0.example.de:389: 1095
    ipa2.example.de:389: 97
    ipa1.example.de:389: 96
    ipabak.ac.example.de:389: 1090

The ruvs are the same on all 6 hosts (AFAICS), so I wonder how I could
fix this?


Every helpful comment is highly appreciated.
Harri


--
Red Hat GmbH,http://www.de.redhat.com/, Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Michael Cunningham, Michael O'Neill, Eric 
Shander




[Freeipa-users] Re: Replication failed after ipa-server-upgrade

2017-11-29 Thread thierry bordaz via FreeIPA-users



On 11/29/2017 10:53 PM, Rob Crittenden wrote:

skrawczenko--- via FreeIPA-users wrote:

i'm checking with
ldapsearch -Y GSSAPI -b cn=,cn=replicas,cn=ipa,cn=etc,dc=

and there's just

dn: ...
cn: 
objectClass: ipaConfigObject
objectClass: nsContainer
objectClass: top

right after ldapmodify

[root@idm0 ~]# ipa-replica-manage list
unexpected error: u'ipaconfigstring'

as if something is not letting the attribute be added, or is removing it
immediately.

That is sure curious.

Thierry, do you know if the topology plugin would mess with this kind of
entry?

rob

Hi,

   topology plugin should not interfere with the update of "cn=,cn=replicas,cn=ipa,cn=etc,dc="
   It catches only updates to replica agreements (under cn=config),
   segments (cn=topology), hosts (cn=masters), and domain level (cn=domain
   level).

   About the successful update not being taken into account: do you have
   the portion of the access/error logs where the update is done?
   Would you retry

   ldapsearch -D "cn=directory manager" -W -b "cn=,cn=replicas,cn=ipa,cn=etc,dc=" nscpentrywsi

   best regards
   thierry






[Freeipa-users] Re: Failed Upgrade?

2017-08-10 Thread thierry bordaz via FreeIPA-users



On 08/09/2017 09:30 PM, Ian Harding via FreeIPA-users wrote:


On 8/9/17 3:05 AM, thierry bordaz wrote:


Hi Ian,

Thanks for having gather those data.

#
# So pkidbuser entries have a same (old) userCertificate likely
generated during install
# But only freeipa-sea has a new one created on freeipa-sea
around Jun 8th 2017 05:54:16
# This recent certificate is identified by 5938e68800010429
#
[root@freeipa-sea ianh]# ldapsearch -LLL -D 'cn=directory
manager' -W -b "uid=pkidbuser,ou=people,o=ipaca" nscpentrywsi
dn: uid=pkidbuser,ou=people,o=ipaca
...
nscpentrywsi: userCertificate::
MIIDczCCAlugAwIBAgIBBDANBgkqhkiG9w0BAQsFADA0MR 
nscpentrywsi: userCertificate;vucsn-5938e68800010429::
MIIDbjCCAlagAwIBAgI 

[root@seattlenfs ianh]# ldapsearch -LLL -D 'cn=directory manager'
-W -b uid=pkidbuser,ou=people,o=ipaca nscpentrywsi
dn: uid=pkidbuser,ou=people,o=ipaca
nscpentrywsi: userCertificate::
MIIDczCCAlugAwIBAgIBBDANBgkqhkiG9w0BAQsFADA0MR 



#
# why 5938e68800010429 value was not propagated to seattlenfs ?
# The most recent update (from freeipa-sea) that was replicated
to seattlenfs
# is  1 year old (57be804300070429 - Aug 25th 2016 05:21:07).
# In addition seattlenfs received direct update (last one was in
Jan 1017) that were not
# replicated to freeipa-sea
#
# The two servers have diverged because they cannot replicate to
# each other, because they were not correctly initialized.
# They have different "replicageneration" (57c291d90429
vs 55c8f3ae0060)
#
# It is looking like freeipa-sea was created one or two years ago
# and used to initialize seattlenfs.
# But later freeipa-sea was recreated.


That's about right.


#
[root@freeipa-sea ianh]# ldapsearch -LLL -D 'cn=directory
manager' -W -b "o=ipaca"

"(&(objectclass=nstombstone)(nsUniqueId=---))"
Enter LDAP Password:
dn: cn=replica,cn=o\3Dipaca,cn=mapping tree,cn=config
nsds50ruv: {replicageneration} 57c291d90429
nsds50ruv: {replica 1065 ldap://freeipa-sea.bpt.rocks:389}
57f840bf0429 598a1c410429
nsds50ruv: {replica 1290 ldap://seattlenfs.bpt.rocks:389}
nsruvReplicaLastModified: {replica 1065
ldap://freeipa-sea.bpt.rocks:389} 598a1c16
nsruvReplicaLastModified: {replica 1290
ldap://seattlenfs.bpt.rocks:389} 

[root@seattlenfs ianh]# ldapsearch -LLL -D 'cn=directory manager'
-W -b "o=ipaca"

"(&(objectclass=nstombstone)(nsUniqueId=---))"
dn: cn=replica,cn=o\3Dipaca,cn=mapping tree,cn=config
nsds50ruv: {replicageneration} 55c8f3ae0060
nsds50ruv: {replica 1065 ldap://freeipa-sea.bpt.rocks:389}
57b103d40429 57be804300070429
nsds50ruv: {replica 1290 ldap://seattlenfs.bpt.rocks:389}
57be804c050a 58723615050a
nsruvReplicaLastModified: {replica 1290
ldap://seattlenfs.bpt.rocks:389} 
nsruvReplicaLastModified: {replica 1065
ldap://freeipa-sea.bpt.rocks:389} 



In conclusion:
From a replication point of view, the two instances cannot communicate.
One solution would be to identify which instance is the good one, the one
you want to keep, and reinit the second one from that reference.


What exactly does reinit mean?  I have run

ipa-replica-manage re-initialize --from freeipa-sea.bpt.rocks


This triggers the reinit of the 'dc=ipadomain,dc=com' suffix but not of the
o=ipaca suffix. To reinit o=ipaca you may use

ipa-csreplica-manage re-initialize --from freeipa-sea.bpt.rocks
Note that it will overwrite all data of ipaca on seattlenfs with the 
data from freeipa-sea. This is why it is important to know which server 
is the valid one.


On freeipa-sea and seattlenfs you may check the ipaca replica agreements 
status before/after reinit. I would guess they are currently reporting 
failures between these two replicas.


ldapsearch -LLL -D 'cn=directory manager' -W -b 
"cn=replica,cn=o\3Dipaca,cn=mapping tree,cn=config" 
"objectClass=nsds5replicationagreement" nsds5replicaLastUpdateStatus




several times over the years when replication has stopped.

Replication actually is working, as far as user and machine accounts 
and attributes are concerned anyway, and has been for a while.


There is a zombie server freeipa-dal.bpt.rocks that I can't get rid 
of... it shows up in the GUI on the Topology page but generates a 
Server not found error.  I don't know if that's related.


Zombie servers have been discussed a lot on this mailing list. You may
get rid of them with the list-ruv/clean-ruv subcommands. Replication usually
manages to work fine even if some zombie entries exist, but it is good
practice to clean them.


regards
thierry


On 08/08/2017 10:33 PM, Ian Harding via FreeIPA-users wrote:


On 8/7/17 1:44 AM, thierry bordaz wrote:




On 08/07/2017 

[Freeipa-users] Re: Failed Upgrade?

2017-08-08 Thread thierry bordaz via FreeIPA-users



On 08/07/2017 09:22 AM, Florence Blanc-Renaud via FreeIPA-users wrote:

On 08/04/2017 11:02 PM, Ian Harding via FreeIPA-users wrote:

On 8/4/17 2:16 AM, Florence Blanc-Renaud wrote:


On 08/03/2017 11:13 PM, Ian Harding via FreeIPA-users wrote:

On 08/03/2017 12:28 AM, Florence Blanc-Renaud wrote:

On 08/02/2017 11:51 PM, Ian Harding via FreeIPA-users wrote:

On 08/02/2017 12:11 AM, Florence Blanc-Renaud wrote:

On 08/02/2017 01:43 AM, Ian Harding wrote:

On 08/01/2017 12:03 PM, Rob Crittenden wrote:

Ian Harding wrote:

On 08/01/2017 07:39 AM, Florence Blanc-Renaud wrote:

On 08/01/2017 03:11 PM, Ian Harding wrote:

On 08/01/2017 01:48 AM, Florence Blanc-Renaud wrote:

On 08/01/2017 01:32 AM, Ian Harding via FreeIPA-users wrote:



On 07/31/2017 11:34 AM, Rob Crittenden wrote:

Ian Harding via FreeIPA-users wrote:
I had an unexpected restart of an IPA server that had apparently had
updates run but had not been restarted. ipactl says pki-tomcatd would
not start.

Strangely, the actual service appears to be running:



dogtag is an application within tomcat, so tomcat can run without
dogtag running.

We need to see more of the dogtag debug log to see what is going on.




It looks like an authentication problem...

[28/Jul/2017:10:08:47][localhost-startStop-1]: SSL handshake happened
Could not connect to LDAP server host seattlenfs.bpt.rocks port 636
Error netscape.ldap.LDAPException: Authentication failed (49)




Hi,

dogtag stores its internal data in the LDAP server and needs to 
establish a secure LDAP connection. You can check how this connection 
is configured in /etc/pki/pki-tomcat/ca/CS.cfg; look for these lines:

internaldb.ldapauth.authtype=SslClientAuth
internaldb.ldapauth.bindDN=cn=Directory Manager
internaldb.ldapauth.bindPWPrompt=internaldb
internaldb.ldapauth.clientCertNickname=subsystemCert cert-pki-ca

internaldb.ldapconn.host=vm-...
internaldb.ldapconn.port=636
internaldb.ldapconn.secureConn

authtype can be SslClientAuth (authentication with an SSL 
certificate) or BasicAuth (authentication with a bind DN and a 
password stored in /var/lib/pki/pki-tomcat/conf/password.conf).


You can use this information to manually check the credentials. For 
instance, with SslClientAuth:

export LDAPTLS_CACERTDIR=/etc/pki/pki-tomcat/alias
export LDAPTLS_CERT='subsystemCert cert-pki-ca'

ldapsearch -H ldaps://`hostname`:636 -b "" -s base -Y EXTERNAL
(provide the password from /etc/pki/pki-tomcat/alias/pwdfile.txt)




I found this:

internaldb.ldapauth.authtype=SslClientAuth
internaldb.ldapauth.bindDN=uid=pkidbuser,ou=people,o=ipaca
internaldb.ldapauth.bindPWPrompt=internaldb
internaldb.ldapauth.clientCertNickname=subsystemCert cert-pki-ca

internaldb.ldapconn.cloneReplicationPort=389
...

and when I try the ldapsearch I am presented with a prompt to provide 
a pin/password:

Please enter pin, password, or pass phrase for security token 'ldap(0)':


but there is no password file...


Hi,

you are right, in 4.4 there is no pwdfile.txt and the password can be 
found in /var/lib/pki/pki-tomcat/conf/password.conf (with the tag 
internal=...)

Can you check whether the password with the tag internal=... allows 
reading the keys from the NSS db?
certutil -K -d /etc/pki/pki-tomcat/alias
(provide password)


That works...

# certutil -K -d /etc/pki/pki-tomcat/alias
certutil: Checking token "NSS Certificate DB" in slot "NSS User Private Key and Certificate Services"
Enter Password or Pin for "NSS Certificate DB":
< 0> rsa 0f327e760a7eecdcf6973f5dc57ca5367c592d64 (orphan)
< 1> rsa b12580c7c696cfcd8aefc9405a7a870b24b7b96a   NSS Certificate DB:auditSigningCert cert-pki-ca
< 2> rsa 881b7254c40fa40bc50681bcc8d37bb3eb49937e caSigningCert cert-pki-ca
< 3> rsa fa9a255a1d15585ac28064c0f4986e416bc48403   NSS Certificate DB:ocspSigningCert cert-pki-ca
< 4> rsa 3fb609d0f7d72c2d325d6a2dc16577a7f7e5a01f Server-Cert cert-pki-ca
< 5> rsa 1e9479a9556af9339bb5e4552ccbd381d3c38856   NSS Certificate DB:subsystemCert cert-pki-ca

But this doesn't (with the same password from password.conf):

# ldapsearch -H ldaps://`hostname`:636 -b "" -s base -Y EXTERNAL
Please enter pin, password, or pass phrase for security token 'ldap(0)':
SASL/EXTERNAL authentication started
ldap_sasl_interactive_bind_s: Invalid credentials (49)

That password is getting me somewhere though, since if I put in a 
nonsense or incorrect password it just prompts over and over.


Let's step back a second. You upgraded from what to what?


There wasn't much of a change... I just assumed someone ran yum 
upgrade and didn't restart, then the power outage... it looks like 
not much of a version change though.


# grep ipa /var/log/yum.log
Jan 08 04:45:32 Installed: ipa-common-4.4.0-14.el7.centos.1.1.noarch
Jan 08 04:45:32 Installed: ipa-client-common-4.4.0-14.el7.centos.1.1.noarch
Jan 08 04:46:06 Updated: libipa_hbac-1.14.0-43.el7_3.4.x86_64
Jan 08 04:46:07 Updated: python-libipa_hbac-1.14.0-43.el7_3.4.x86_64
Jan 08 04:46:08 Installed: 

[Freeipa-users] Re: very slow remove users process

2017-06-09 Thread thierry bordaz via FreeIPA-users

Hi,

Just for the record, this issue of slow user-del will be tracked in 
https://pagure.io/389-ds-base/issue/49286


regards
thierry
On 05/31/2017 03:45 PM, thierry bordaz via FreeIPA-users wrote:



On 05/31/2017 03:30 PM, Rob Crittenden wrote:

thierry bordaz via FreeIPA-users wrote:

Hi Adrian,

deleting a user triggers several LDAP requests. In case the performance
hit comes from slow DS requests, you would look in the DS access log
(/var/log/dirsrv/slapd-/access) for the set of triggered requests.
In each response, etime indicates the time spent on that request. Can
you isolate any specific requests that are very long?
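To make the etime hunt concrete, here is a small sketch that filters RESULT lines by etime. The sample log lines, the file path, and the 1-second threshold are invented for illustration, and the etime field format differs across 389-ds versions (older releases log whole seconds, newer ones fractional):

```shell
# Build a tiny fake access log (real logs live under /var/log/dirsrv/).
cat > /tmp/sample-access.log <<'EOF'
[31/May/2017:00:52:01 +0200] conn=12 op=3 RESULT err=0 tag=103 nentries=0 etime=0
[31/May/2017:00:52:18 +0200] conn=12 op=4 RESULT err=0 tag=107 nentries=1 etime=16
EOF

# Print every RESULT line whose etime is 1 second or more.
awk '/ RESULT / {
  for (i = 1; i <= NF; i++)
    if ($i ~ /^etime=/) {
      split($i, kv, "=")
      if (kv[2] + 0 >= 1) print
    }
}' /tmp/sample-access.log
```

Running this against the sample prints only the 16-second operation; on a real log, the conn= and op= fields on the printed lines let you search backwards for the matching SRCH/DEL request.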

My guess would be it is slow in removing the user from ipausers. I'm not
sure how that shows in the logs as it is likely done by a plugin.


I agree it is the most likely expensive request.
It may depend on how many entries refer to the deleted user, but 
16 sec is much higher than expected.




rob


regards
thierry

On 05/31/2017 12:52 AM, Adrian HY via FreeIPA-users wrote:

Hi folks, I have a freeipa group with 3 users to delete. The
process is very very slow.  For example:

# time ipa -v user-del vvv

-
Deleted user "vvv"
-

real0m16.913s
user0m0.814s
sys 0m0.084s

The hardware parameters are normal. The hard drive is SSD.

Regards.


___
FreeIPA-users mailing list -- freeipa-users@lists.fedorahosted.org
To unsubscribe send an email to 
freeipa-users-le...@lists.fedorahosted.org





[Freeipa-users] Re: Fwd: matching rule errors?

2017-05-24 Thread thierry bordaz via FreeIPA-users


Hello Zak,

In fact 'dc' is IA5String (i.e. ASCII) (1.3.6.1.4.1.1466.115.121.1.26) 
and so can be matched with the caseIgnoreIA5Match and 
caseIgnoreIA5SubstringsMatch matching rules.
Directory String (i.e. UTF-8) (1.3.6.1.4.1.1466.115.121.1.15) cannot. 
It should however work if the 'dc' value only contains the ASCII character set.


Would you check in your DS schema 
(/etc/dirsrv/slapd-/schema/00core.ldif and 99user.ldif) what 
the syntax for 'dc' is? It should be IA5String.
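As a quick sanity check that a dc value stays within IA5String's ASCII range, something like the following works. The value "example" is a placeholder, not taken from the thread; substitute your actual suffix component:

```shell
# Print a verdict on whether a dc value is pure printable ASCII.
# "example" is a placeholder value for illustration.
dc_value="example"
if printf '%s' "$dc_value" | LC_ALL=C grep -q '[^ -~]'; then
  echo "dc contains non-ASCII bytes"
else
  echo "dc is ASCII-only"
fi
```

The grep class `[^ -~]` matches any byte outside the printable ASCII range (space through tilde) under the C locale, so a match means the value would not be valid IA5String data.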


regards
thierry


On 05/23/2017 08:05 PM, Zak Wolfinger via FreeIPA-users wrote:

Running FreeIPA Version 4.2.0

Seeing a lot of these in the slapd error log:

the EQUALITY matching rule [caseIgnoreIA5Match] is not compatible with the syntax [1.3.6.1.4.1.1466.115.121.1.15] for the attribute [dc]
the SUBSTR matching rule [caseIgnoreIA5SubstringsMatch] is not compatible with the syntax [1.3.6.1.4.1.1466.115.121.1.15] for the attribute [dc]


Any clue as to what this means and specifically how to fix it?

Cheers,
*Zak Wolfinger*

Infrastructure Engineer  |  Emma®
zak.wolfin...@myemma.com
800.595.4401 or 615.292.5888 x197
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at www.myemma.com





