Re: Some principals not replicating

2018-06-15 Thread Viktor Dukhovni



> On Jun 15, 2018, at 5:31 PM, Adam Lewenberg  wrote:
> 
> PROBLEM: Some of the principals will not replicate.

Well, updates to the principal are not replicating...

> If I go on the master and change the password of one of these problematic
> principals, I see this in the replica's logs:

That's a "modify" not a "create" and modify requires the object
to already be there.  The iprop log is "sparse", recording only
the modified data when doing "modify", so the principal can't
be created just from the latest "modify" record.
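
A toy model of the point above, as a minimal Python sketch. The record layout, the "create"/"modify" labels, and the dict-backed database are assumptions made for illustration; this is not Heimdal's log format or API. It only shows why a sparse "modify" record cannot bring a missing principal into existence on a replica.

```python
from dataclasses import dataclass, field


class MissingEntryError(Exception):
    """Raised when a modify record targets a principal the replica lacks."""


@dataclass
class LogRecord:
    version: int
    op: str          # "create" or "modify" (labels chosen for this sketch)
    principal: str
    changed: dict = field(default_factory=dict)  # only the fields the record carries


def replay(db: dict, record: LogRecord) -> None:
    """Apply one log record to a replica database held as a plain dict."""
    if record.op == "create":
        # A create record carries the whole entry, so it can add the principal.
        db[record.principal] = dict(record.changed)
    elif record.op == "modify":
        # A modify record is sparse: it can only patch an entry that exists.
        if record.principal not in db:
            raise MissingEntryError(
                f"entry {record.version}: no such principal {record.principal!r}"
            )
        db[record.principal].update(record.changed)
    else:
        raise ValueError(f"unknown operation {record.op!r}")
```

Replaying a "modify" for a principal the replica never received raises the error and leaves the database untouched; the same record applied after a "create" simply merges the changed fields.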

> QUESTION: What could be a reason for this principal not to replicate?

You need to stop the slaves, blow away their database and logs,
and replicate the full database from scratch.

-- 
Viktor.



Re: Some principals not replicating

2018-06-15 Thread Adam Lewenberg




On 6/15/2018 6:21 PM, Viktor Dukhovni wrote:
>> On Jun 15, 2018, at 6:29 PM, Adam Lewenberg  wrote:
>>
>> This (or something much like it) appears in the initial replication on three
>> separate 1.5.2 slaves:
>
> You *really* should upgrade the slaves as soon as possible,
> however:
>
>> 2018-06-15T17:45:12 ipropd-slave started at version: 0
>> 2018-06-15T17:45:12 receive complete database
>> 2018-06-15T17:45:47 receive complete database, version 114134
>
> The master has a complete database snapshot whose version is
> contiguous with the log.  Therefore, instead of sending you
> the full database, you're getting the snapshot + incremental
> logs after that:
>
>> 2018-06-15T17:46:44 replaying entry 114135
>> 2018-06-15T17:46:44 replaying entry 114136
>> 2018-06-15T17:46:44 replaying entry 114137
>> 2018-06-15T17:46:44 replaying entry 114138
>
> ...
>
> However, there's something amiss with the snapshot
> or logs.  Better to delete the snapshot on the
> master and let it generate a new one, then resync
> the slaves.  Or there's something wrong with the
> iprop code on the slaves, in any case a truly
> complete snapshot stands a better chance.


Thanks for your quick reply.

When you say "delete the snapshot on the master and let it generate a 
new one" I assume you meant "iprop-log truncate --reset", yes?


Anyway, I did that. All the slaves re-synced and now the "bad" 
principals are showing up on the slaves.


Thanks!

Adam Lewenberg









Re: Some principals not replicating

2018-06-15 Thread Viktor Dukhovni



> On Jun 15, 2018, at 6:29 PM, Adam Lewenberg  wrote:
> 
> This (or something much like it) appears in the initial replication on three 
> separate 1.5.2 slaves:

You *really* should upgrade the slaves as soon as possible,
however:

> 2018-06-15T17:45:12 ipropd-slave started at version: 0
> 2018-06-15T17:45:12 receive complete database
> 2018-06-15T17:45:47 receive complete database, version 114134

The master has a complete database snapshot whose version is
contiguous with the log.  Therefore, instead of sending you
the full database, you're getting the snapshot + incremental
logs after that:

> 2018-06-15T17:46:44 replaying entry 114135
> 2018-06-15T17:46:44 replaying entry 114136
> 2018-06-15T17:46:44 replaying entry 114137
> 2018-06-15T17:46:44 replaying entry 114138

...

However, there's something amiss with the snapshot
or logs.  Better to delete the snapshot on the
master and let it generate a new one, then resync
the slaves.  Or there's something wrong with the
iprop code on the slaves, in any case a truly
complete snapshot stands a better chance.
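
To make the "snapshot + incremental logs" idea concrete, here is a minimal Python sketch. The function name and the use of bare integer versions are assumptions for illustration, not how ipropd is implemented. It picks the log entries a replica would replay on top of a snapshot and refuses to continue if the log is not contiguous with the snapshot version.

```python
def entries_after_snapshot(snapshot_version: int, log_versions: list[int]) -> list[int]:
    """Return the log versions a replica would replay on top of a snapshot.

    Raises ValueError if the log does not continue directly from the snapshot.
    """
    # Entries at or below the snapshot version are already covered by it.
    pending = sorted(v for v in log_versions if v > snapshot_version)
    expected = snapshot_version + 1
    for version in pending:
        if version != expected:
            raise ValueError(f"gap in log: expected {expected}, found {version}")
        expected += 1
    return pending
```

With a snapshot at version 114134 and a log holding 114133 through 114136, only 114135 and 114136 are returned for replay. A log that starts at 114136 would be rejected, which is the case where a full resync is the safer choice.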

-- 
Viktor.



Re: Some principals not replicating

2018-06-15 Thread Adam Lewenberg

I think I was not clear in my original post. Let me clarify.

I have a master KDC running Heimdal 7.1. In its database is a principal 
called "fprefect" which, as far as I can tell, acts like a normal 
principal. I can do "get fprefect" and the output looks normal. If I 
point to this master and do a "kinit fprefect" I get a TGT.


However, if I bring up a new slave KDC (no database, no transaction log) 
that points to this master, the KDC _appears_ to get the entire database 
from the master, except that the principal "fprefect" is missing. This 
happens if the slave KDC runs 7.1 or if it runs 1.5.2. (There are some 
strange messages in the iprop log on the 1.5.2 slave; see my original 
e-mail for details.)


I don't know how this principal got into this strange state on the 
master, and I don't know how to reproduce this issue.


It makes me think that the database on the master is corrupted in some 
subtle way.


I am hoping that someone can tell me some way to query or examine the 
database on the master to get some information that might throw some 
light on why this particular principal behaves this way.


Adam Lewenberg


On 6/15/2018 3:29 PM, Adam Lewenberg wrote:
> On 6/15/2018 3:04 PM, Viktor Dukhovni wrote:
>>> On Jun 15, 2018, at 5:31 PM, Adam Lewenberg  wrote:
>>>
>>> PROBLEM: Some of the principals will not replicate.
>>
>> Well, updates to the principal are not replicating...
>>
>>> If I go on the master and change the password of one of these
>>> problematic principals, I see this in the replica's logs:
>>
>> That's a "modify" not a "create" and modify requires the object
>> to already be there.  The iprop log is "sparse", recording only
>> the modified data when doing "modify", so the principal can't
>> be created just from the latest "modify" record.
>>
>>> QUESTION: What could be a reason for this principal not to replicate?
>>
>> You need to stop the slaves, blow away their database and logs,
>> and replicate the full database from scratch.
>
> I did this. On three different slaves. The problematic principals do not
> appear in the slave's database. To be clear: even after initial
> replication (starting from nothing on the slave) some of the principals
> do not appear in the slave's database.
>
> This (or something much like it) appears in the initial replication on
> three separate 1.5.2 slaves:
>
> 2018-06-15T17:45:12 ipropd-slave started at version: 0
> 2018-06-15T17:45:12 receive complete database
> 2018-06-15T17:45:47 receive complete database, version 114134
> 2018-06-15T17:46:44 replaying entry 114135
> 2018-06-15T17:46:44 replaying entry 114136
> 2018-06-15T17:46:44 replaying entry 114137
> 2018-06-15T17:46:44 replaying entry 114138
>
> ... many lines like this until ...
>
> 2018-06-15T17:46:45 replaying entry 131686
> 2018-06-15T17:46:45 replaying entry 131687
> 2018-06-15T17:46:45 replaying entry 131688
> 2018-06-15T17:46:45 replaying entry 131689
> 2018-06-15T17:46:45 Ignoring command 8
> 2018-06-15T17:50:03 replaying entry 131690
> 2018-06-15T17:50:03 Ignoring command 8
> 2018-06-15T17:50:03 replaying entry 131691
> 2018-06-15T17:50:03 Ignoring command 8
> 2018-06-15T17:50:03 replaying entry 131692
> 2018-06-15T17:50:03 Ignoring command 8
> 2018-06-15T17:51:03 replaying entry 131693
> 2018-06-15T17:51:03 Ignoring command 8
> 2018-06-15T17:56:52 replaying entry 131694
> 2018-06-15T17:56:52 Ignoring command 8
> 2018-06-15T18:00:03 replaying entry 131695
> 2018-06-15T18:00:03 Ignoring command 8
>
> ... more lines much like this until ...
>
> 2018-06-15T20:16:57 Ignoring command 8
> 2018-06-15T20:18:53 replaying entry 131814
> 2018-06-15T20:18:53 kadm5_log_replay: 131814. Lost entry entry, Database out of sync ?: No such entry in the database (36150275)
> 2018-06-15T20:18:53 Ignoring command 8
> 2018-06-15T20:19:23 Ignoring command 8
> 2018-06-15T20:20:02 replaying entry 131815






Re: Some principals not replicating

2018-06-15 Thread Greg Hudson

On 06/15/2018 06:29 PM, Adam Lewenberg wrote:
> I did this. On three different slaves. The problematic principals do not
> appear in the slave's database. To be clear: even after initial
> replication (starting from nothing on the slave) some of the principals
> do not appear in the slave's database.


What database type is the master KDC using?  If you dump the master DB 
and look for one of the principals which is missing in the other 
databases, is it present in the dump file?
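
One way to run that check, as a rough Python sketch. It assumes the dump is a text file with one entry per line and the principal name (optionally followed by `@REALM`) as the first whitespace-separated field. That layout is an assumption, so check it against the actual dump before trusting the result. The function name is hypothetical.

```python
def find_principal(dump_lines, name):
    """Return dump lines whose first field is `name` or `name@<realm>`."""
    hits = []
    for line in dump_lines:
        fields = line.split(None, 1)
        if not fields:
            continue  # skip blank lines
        first = fields[0]
        # Match the bare name or the name qualified with a realm, but not
        # other principals that merely share the prefix (e.g. name/admin).
        if first == name or first.startswith(name + "@"):
            hits.append(line.rstrip("\n"))
    return hits
```

Calling `find_principal(open(path), "fprefect")` on the master's dump returns the matching lines. An empty list would suggest the entry never made it into the dump.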


Re: Some principals not replicating

2018-06-15 Thread Adam Lewenberg




On 6/15/2018 3:04 PM, Viktor Dukhovni wrote:
>> On Jun 15, 2018, at 5:31 PM, Adam Lewenberg  wrote:
>>
>> PROBLEM: Some of the principals will not replicate.
>
> Well, updates to the principal are not replicating...
>
>> If I go on the master and change the password of one of these problematic
>> principals, I see this in the replica's logs:
>
> That's a "modify" not a "create" and modify requires the object
> to already be there.  The iprop log is "sparse", recording only
> the modified data when doing "modify", so the principal can't
> be created just from the latest "modify" record.
>
>> QUESTION: What could be a reason for this principal not to replicate?
>
> You need to stop the slaves, blow away their database and logs,
> and replicate the full database from scratch.


I did this. On three different slaves. The problematic principals do not 
appear in the slave's database. To be clear: even after initial 
replication (starting from nothing on the slave) some of the principals 
do not appear in the slave's database.


This (or something much like it) appears in the initial replication on 
three separate 1.5.2 slaves:


2018-06-15T17:45:12 ipropd-slave started at version: 0
2018-06-15T17:45:12 receive complete database
2018-06-15T17:45:47 receive complete database, version 114134
2018-06-15T17:46:44 replaying entry 114135
2018-06-15T17:46:44 replaying entry 114136
2018-06-15T17:46:44 replaying entry 114137
2018-06-15T17:46:44 replaying entry 114138

... many lines like this until ...

2018-06-15T17:46:45 replaying entry 131686
2018-06-15T17:46:45 replaying entry 131687
2018-06-15T17:46:45 replaying entry 131688
2018-06-15T17:46:45 replaying entry 131689
2018-06-15T17:46:45 Ignoring command 8
2018-06-15T17:50:03 replaying entry 131690
2018-06-15T17:50:03 Ignoring command 8
2018-06-15T17:50:03 replaying entry 131691
2018-06-15T17:50:03 Ignoring command 8
2018-06-15T17:50:03 replaying entry 131692
2018-06-15T17:50:03 Ignoring command 8
2018-06-15T17:51:03 replaying entry 131693
2018-06-15T17:51:03 Ignoring command 8
2018-06-15T17:56:52 replaying entry 131694
2018-06-15T17:56:52 Ignoring command 8
2018-06-15T18:00:03 replaying entry 131695
2018-06-15T18:00:03 Ignoring command 8

... more lines much like this until ...

2018-06-15T20:16:57 Ignoring command 8
2018-06-15T20:18:53 replaying entry 131814
2018-06-15T20:18:53 kadm5_log_replay: 131814. Lost entry entry, Database out of sync ?: No such entry in the database (36150275)
2018-06-15T20:18:53 Ignoring command 8
2018-06-15T20:19:23 Ignoring command 8
2018-06-15T20:20:02 replaying entry 131815
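
For a log this long, a small script can pull out the interesting parts. Below is a minimal Python sketch that tallies the three kinds of lines shown above: "replaying entry", "Ignoring command", and "kadm5_log_replay" errors. The regular expressions are based only on the lines in this excerpt and assume each log message sits on a single unwrapped line. The function name is made up for illustration.

```python
import re
from collections import Counter

# Patterns derived from the excerpt above; other message types are skipped.
REPLAY = re.compile(r"^(\S+) replaying entry (\d+)$")
IGNORED = re.compile(r"^(\S+) Ignoring command (\d+)$")
ERROR = re.compile(r"^(\S+) kadm5_log_replay: (\d+)\. (.*)$")


def summarize(lines):
    """Return (replayed versions, ignored-command counts, replay errors)."""
    replayed = []
    ignored = Counter()
    errors = []
    for raw in lines:
        line = raw.strip()
        if m := REPLAY.match(line):
            replayed.append(int(m.group(2)))
        elif m := IGNORED.match(line):
            ignored[int(m.group(2))] += 1
        elif m := ERROR.match(line):
            # Keep the entry number together with the error text.
            errors.append((int(m.group(2)), m.group(3)))
    return replayed, ignored, errors
```

Feeding the replica's log file to `summarize` gives the list of replayed versions, how often each command number was ignored, and which entries failed to replay, such as 131814 here.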