mmap MAP_FIXED, Linux 4.17

2018-06-14 Thread Howard Chu
There's a new mmap option MAP_FIXED_NOREPLACE added in the Linux 4.17 kernel. 
This will check if the requested mapping overlaps any existing maps and fail 
if so, allowing fixed address mappings to be used more safely. Funny to see 
since we already worked around this danger in LMDB a long time ago.


https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a4ff8e8620d3f4f50ac4b41e8067b7d395056843

As usual, adding more stuff to the kernel even though userland solutions 
already existed.
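
A minimal sketch of how the new flag might be used, assuming Linux and glibc headers. map_at_noreplace is a hypothetical helper, not part of LMDB or libc. The fallback #define uses the value from the generic Linux headers, which is an assumption that does not hold on every architecture. Kernels older than 4.17 do not recognize the flag and treat the address as a plain hint, so the returned address is checked as well.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_FIXED_NOREPLACE
/* Assumption: generic value from the Linux uapi headers, for older libcs. */
#define MAP_FIXED_NOREPLACE 0x100000
#endif

/* Hypothetical helper: map len bytes of anonymous memory at exactly addr,
 * refusing to replace any mapping that already covers that range.
 * Returns addr on success, NULL if the range could not be obtained. */
static void *map_at_noreplace(void *addr, size_t len)
{
    void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE, -1, 0);

    if (p == MAP_FAILED)
        return NULL;        /* on 4.17+ an overlap is reported as EEXIST */
    if (p != addr) {
        /* An older kernel ignored the flag and placed the mapping elsewhere. */
        munmap(p, len);
        return NULL;
    }
    return p;
}
```

Calling map_at_noreplace() on a range that is already mapped should return NULL on both old and new kernels, which is the property a userland workaround has to provide by other means.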


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/



Re: [LMDB] how does reader thread read meta page without locking and avoid data race at the same time?

2018-06-14 Thread Chuntao HONG
I was not sure a txn could be copied. Write txns track all iterators so that
they can close them at the end of the txn, but I was not sure whether read
txns have anything similar. Now I know I can just copy the txn. Thanks. :-)

Does the "seeing wrong data" scenario also apply to different threads in the
same process? As I understand it, the writer overwrites the old meta page
regardless of whether some reader is trying to read it. So there is a chance
that a reader thread gets suspended in the middle of mdb_txn_renew0(), after
it has picked the newest txn_id as Tn but before the memcpy. The writer
thread then comes in, executes two write transactions, and so updates the
same meta page that holds Tn. If the reader wakes up while the writer is
updating that meta page, it can see partially updated data. What problems
could that cause? Should we be worried about it (could it cause a crash?),
or will the reader just read the contents of Tn+2?
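
To make the scenario concrete, here is a minimal single-threaded model of two alternating meta slots. It is only a sketch under stated assumptions: meta_t, env_t, pick_newest, commit and slot_still_valid are hypothetical names, not LMDB's, and the model says nothing about memory ordering or torn reads. It only shows which slot each commit overwrites.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the meta page and the environment. */
typedef struct { uint64_t txnid; uint64_t root; } meta_t;
typedef struct { meta_t metas[2]; } env_t;

/* A reader starts from whichever slot carries the higher txnid. */
static const meta_t *pick_newest(const env_t *env)
{
    return &env->metas[env->metas[0].txnid < env->metas[1].txnid];
}

/* A commit writes txnid newest+1 into slot (txnid & 1), which is the
 * slot currently holding the older of the two metas. */
static void commit(env_t *env, uint64_t new_root)
{
    uint64_t next = pick_newest(env)->txnid + 1;
    meta_t *m = &env->metas[next & 1];

    m->root = new_root;
    m->txnid = next;
}

/* Does the slot selected by (txnid & 1) still describe this reader's txn? */
static int slot_still_valid(const env_t *env, uint64_t reader_txnid)
{
    return env->metas[reader_txnid & 1].txnid == reader_txnid;
}
```

In this model a reader that picked Tn keeps a valid slot after one commit, and loses it after the second commit, because Tn and Tn+2 map to the same slot.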

On Fri, Jun 15, 2018 at 1:35 AM, Howard Chu  wrote:

> Chuntao HONG wrote:
>
>> Background:
>>
>> I am trying to modify the LMDB code so we can have multiple threads
>> reading from the same snapshot (same txn_id) at the same time. I am trying
>> to do this in a "fork txn" way. Basically I have a master thread with a
>> read txn, and then I try to create txns in the slave threads with the same
>> txn_id. So I modified the mdb_txn_renew0() function and provide it with the
>> txn_id the master thread is holding. With that I hope the slave
>> transactions can read the same meta page because we pick the meta pages with
>>
>> meta = env->me_metas[txn->mt_txnid & 1];
>>
>> But then I realized I might be doing it wrong. There are only two meta
>> pages used in LMDB. So what if there had been two write transactions
>> committed after the master thread held its transaction, i.e. the master
>> thread has txn_id==N and current txn_id==N+2? That means the meta page was
>> over-written and the slave thread may read different data from the meta
>> page than the master.
>>
>
> Why would you bother doing this? Just copy the master's txn structure.
>
>> Then the question in the header popped into my mind. When reader threads
>> are created, they copy the meta-db infos with a memcpy like this:
>>
>> memcpy(txn->mt_dbs, meta->mm_dbs, CORE_DBS * sizeof(MDB_db));
>>
>> But if the meta page was written in the middle of the memcpy, we can get
>> corrupted data. I am sure there is some code that prevents this data race
>> from happening, since we have been using LMDB with multiple threads for
>> quite a while. Could someone point me to the code that prevents the data
>> race from happening?
>>
>
> There is no data race. Readers are always reading the newer meta page,
> writers only overwrite the older meta page. As noted in the Caveats, if you
> suspend a process while it's opening a read transaction, it can see the
> wrong data.
>
> --
>   -- Howard Chu
>   CTO, Symas Corp.   http://www.symas.com
>   Director, Highland Sun http://highlandsun.com/hyc/
>   Chief Architect, OpenLDAP  http://www.openldap.org/project/
>


Re: Config questions for back-ldap, back-meta, and back-asyncmeta

2018-06-14 Thread Michael Ströder
On 06/14/2018 11:58 PM, Howard Chu wrote:
> Michael Ströder wrote:
>> On 06/14/2018 10:44 PM, Howard Chu wrote:
>>> Quanah Gibson-Mount wrote:
>>>> idle-timeout -> The man page says takes an integer, but is defined as
>>>> a string.  However, I think the man page for this parameter is
>>>> incorrect, and in fact it takes a possible string as defined in the
>>>> back-meta/async manual pages for this same parameter. (I.e, it can
>>>> have a format of something like 1d15h5s)
>>>
>>> I don't see this. The man page says "". It looks correct to me.
>>
>> Wouldn't it be better to consequently convert time strings such as
>> 1d15h5s to integer seconds during migration of static config to dynamic
>> config? IMO for LDAP-on-the-wire those values should always be integer
>> representing seconds (or milli-seconds if needed).
>>
>> I mean back-config content is meant to be machine-processable and
>> cleaner syntax would reduce unneeded complexity.
> 
> The complexity is already there, and obviously somebody thought it was
> desirable for these things to be human-readable. We're doing them no
> favors by converting to straight integers.

It's not only about the complexity in OpenLDAP software itself. All
3rd-party components which want to make use of back-config for automated
configuration have to deal with it. And that's not going to happen.

Ciao, Michael.





Re: Config questions for back-ldap, back-meta, and back-asyncmeta

2018-06-14 Thread Howard Chu

Michael Ströder wrote:
> On 06/14/2018 10:44 PM, Howard Chu wrote:
>> Quanah Gibson-Mount wrote:
>>> idle-timeout -> The man page says takes an integer, but is defined as
>>> a string.  However, I think the man page for this parameter is
>>> incorrect, and in fact it takes a possible string as defined in the
>>> back-meta/async manual pages for this same parameter. (I.e, it can
>>> have a format of something like 1d15h5s)
>>
>> I don't see this. The man page says "". It looks correct to me.
>
> Wouldn't it be better to consequently convert time strings such as
> 1d15h5s to integer seconds during migration of static config to dynamic
> config? IMO for LDAP-on-the-wire those values should always be integer
> representing seconds (or milli-seconds if needed).
>
> I mean back-config content is meant to be machine-processable and
> cleaner syntax would reduce unneeded complexity.


The complexity is already there, and obviously somebody thought it was 
desirable for these things to be human-readable. We're doing them no favors by 
converting to straight integers.


Of course, I would have preferred something like DD+HH:MM:SS instead of this 
XdYhZs format, but no, we're not going to change the expected input syntax 
after the fact.
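
For readers unfamiliar with the XdYhZs form, here is a minimal sketch of rendering integer seconds in that style. format_duration is a hypothetical helper written for illustration. It is not lutil_unparse_time and makes no claim to match its exact output.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write secs into buf as e.g. "1d15h5s".
 * Zero components are omitted, except that 0 is written as "0s".
 * Returns 0 on success, -1 if buf is too small. */
static int format_duration(unsigned long secs, char *buf, size_t buflen)
{
    static const unsigned long scale[] = { 86400, 3600, 60, 1 };
    static const char unit[] = "dhms";
    size_t used = 0;
    int i, w;

    if (buflen == 0)
        return -1;
    buf[0] = '\0';
    for (i = 0; i < 4; i++) {
        unsigned long n = secs / scale[i];

        secs %= scale[i];
        if (n == 0 && !(i == 3 && used == 0))
            continue;   /* skip empty components, but keep a lone "0s" */
        w = snprintf(buf + used, buflen - used, "%lu%c", n, unit[i]);
        if (w < 0 || (size_t)w >= buflen - used)
            return -1;
        used += (size_t)w;
    }
    return 0;
}
```

With this sketch, 140405 seconds is rendered as 1d15h5s, the example value used in this thread.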


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/



Re: Config questions for back-ldap, back-meta, and back-asyncmeta

2018-06-14 Thread Michael Ströder
On 06/14/2018 10:44 PM, Howard Chu wrote:
> Quanah Gibson-Mount wrote:
>> idle-timeout -> The man page says takes an integer, but is defined as
>> a string.  However, I think the man page for this parameter is
>> incorrect, and in fact it takes a possible string as defined in the
>> back-meta/async manual pages for this same parameter. (I.e, it can
>> have a format of something like 1d15h5s)
> 
> I don't see this. The man page says "". It looks correct to me.

Wouldn't it be better to consequently convert time strings such as
1d15h5s to integer seconds during migration of static config to dynamic
config? IMO for LDAP-on-the-wire those values should always be integer
representing seconds (or milli-seconds if needed).

I mean back-config content is meant to be machine-processable and
cleaner syntax would reduce unneeded complexity.
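
As an illustration of the conversion being proposed, here is a minimal sketch that turns a string such as 1d15h5s into integer seconds. parse_duration is a hypothetical standalone helper, not the lutil_parse_time function used by slapd. It assumes the units d, h, m and s, treats a bare trailing number as seconds, and does not check for overflow or unit ordering.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper: parse "1d15h5s"-style input into seconds.
 * Returns 0 on success, -1 on malformed input. */
static int parse_duration(const char *in, unsigned long *out)
{
    unsigned long total = 0;
    const char *s = in;

    if (*s == '\0')
        return -1;
    while (*s != '\0') {
        char *end;
        unsigned long n, scale;

        /* strtoul would accept signs and leading spaces; reject them. */
        if (!isdigit((unsigned char)*s))
            return -1;
        n = strtoul(s, &end, 10);
        switch (*end) {
        case 'd':  scale = 86400; end++; break;
        case 'h':  scale = 3600;  end++; break;
        case 'm':  scale = 60;    end++; break;
        case 's':  scale = 1;     end++; break;
        case '\0': scale = 1;     break;   /* bare number: seconds */
        default:   return -1;
        }
        total += n * scale;
        s = end;
    }
    *out = total;
    return 0;
}
```

For the example value in this thread, 1d15h5s works out to 86400 + 15 * 3600 + 5 = 140405 seconds.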

Ciao, Michael.





Re: Config questions for back-ldap, back-meta, and back-asyncmeta

2018-06-14 Thread Howard Chu

Quanah Gibson-Mount wrote:
> There are three options between back-ldap, back-meta, and back-asyncmeta that
> seem to have an incorrect definition for cn=config and/or a documentation bug.
>
> For back-ldap:
>
> idle-timeout -> The man page says it takes an integer, but it is defined as a
> string.  However, I think the man page for this parameter is incorrect, and in
> fact it takes a possible string as defined in the back-meta/async manual pages
> for this same parameter. (I.e., it can have a format of something like 1d15h5s)

I don't see this. The man page says "". It looks correct to me.

> For back-ldap, back-meta, and back-asyncmeta:
>
> network-timeout -> This takes an integer, but is defined as a string.  The
> back-ldap, back-meta, and back-asyncmeta man pages say it uses the same
> format as idle-timeout, but the function that parses the value does not agree
> with that assertion.  It appears to accept only an integer.

Looks to me like it uses lutil_parse_time, same as idle-timeout.
But in back-meta network-timeout is displayed as an integer, while
idle-timeout uses lutil_unparse_time. network-timeout probably should be using
unparse_time as well.

> For back-meta and back-asyncmeta:
>
> bind-timeout -> This is clearly described in the man page as taking an
> integer value, but it is defined as a string.  Any objection to me changing it
> to be an integer type?

I guess that's OK.

> Thanks!
>
> --Quanah
>
> --
>
> Quanah Gibson-Mount
> Product Architect
> Symas Corporation
> Packaged, certified, and supported LDAP solutions powered by OpenLDAP:







--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/



Re: Deprecated options for slapd-(ldap|meta|asyncmeta)

2018-06-14 Thread Howard Chu

Quanah Gibson-Mount wrote:
> There are several deprecated/obsolete options in the ldap, meta, and async
> backends.  Is there any reason not to remove these in git master?  I.e., is
> there a particular reason to keep them for the OpenLDAP 2.5 release? They're
> clearly documented as obsolete in the 2.4 docs:

Go ahead and remove them for 2.5.

> Example:
>
>    acl-passwd
>           Formerly known as the bindpw, it is the password used with the
>           above acl-authcDN directive.  This directive is obsoleted by the
>           credentials arg of acl-bind when bindmethod=simple, and will be
>           dismissed in the future.




--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/



Re: Issue with mdb_cursor_put

2018-06-14 Thread Howard Chu

Brüns, Stefan wrote:

Hi,

I see a similar problem as reported in October 2017 for mdb_cursor_del, i.e.
pages ending up twice on the dirty list.

Backtrace:
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x761c8da1 in __GI_abort () at abort.c:79
#2  0x75a31052 in mdb_assert_fail (env=0x5579d6e0,
expr_txt=expr_txt@entry=0x75a3294f "rc == 0",
func=func@entry=0x75a33268 <__func__.7016> "mdb_page_dirty",
line=line@entry=2127, file=0x75a32930 "mdb.c") at mdb.c:1542
#3  0x75a25fe5 in mdb_page_dirty (txn=0x5579eaa0, mp=) at mdb.c:2127
#4  0x75a2720b in mdb_page_alloc (num=num@entry=1,
mp=mp@entry=0x7fffd110, mc=) at mdb.c:2308
#5  0x75a29114 in mdb_page_new (mc=mc@entry=0x7fffd2f0,
flags=flags@entry=4, num=1, mp=mp@entry=0x7fffd170) at mdb.c:7147
#6  0x75a29519 in mdb_node_add (mc=mc@entry=0x7fffd2f0,
indx=, key=key@entry=0x7fffd6c0, data=0x7fffd6d0,
pgno=pgno@entry=0, flags=0) at mdb.c:7289
#7  0x75a2ca59 in mdb_cursor_put (mc=0x7fffd2f0,
key=0x7fffd6c0, data=0x7fffd6d0, flags=) at mdb.c:6916
#8  0x75a2ee4b in mdb_put (txn=0x5579eaa0, dbi=3,
key=key@entry=0x7fffd6c0, data=data@entry=0x7fffd6d0,
flags=flags@entry=0) at mdb.c:8991
---

I have tracked down the moment where the duplicate pgno is added to the list:
---
#0  mdb_page_alloc (num=num@entry=6, mp=mp@entry=0x7fffd110, mc=) at mdb.c:2277
#1  0x75a29114 in mdb_page_new (mc=mc@entry=0x7fffd2f0,
flags=flags@entry=4, num=6, mp=mp@entry=0x7fffd170) at mdb.c:7147
#2  0x75a29519 in mdb_node_add (mc=mc@entry=0x7fffd2f0,
indx=, key=key@entry=0x7fffd6c0, data=0x7fffd6d0,
pgno=pgno@entry=0, flags=0) at mdb.c:7289
#3  0x75a2ca59 in mdb_cursor_put (mc=0x7fffd2f0,
key=0x7fffd6c0, data=0x7fffd6d0, flags=) at mdb.c:6916
#4  0x75a2ee4b in mdb_put (txn=0x5579eaa0, dbi=3,
key=key@entry=0x7fffd6c0, data=data@entry=0x7fffd6d0,
flags=flags@entry=0) at mdb.c:8991
---

(gdb) p idl[0]
$22 = 47

(gdb) x/18g  &idl[ 30 ]
0x7fbff2d1ef20: 20234   20230
0x7fbff2d1ef30: 20229   20228
0x7fbff2d1ef40: 19230   19228
0x7fbff2d1ef50: 17181   17180
0x7fbff2d1ef60: 15736   15736   <- double entry
0x7fbff2d1ef70: 15274   8470
0x7fbff2d1ef80: 8438    7176
0x7fbff2d1ef90: 7174    4758
0x7fbff2d1efa0: 4719    1213

So the double page number is already stored in the freelist in the database,
i.e. the database itself is corrupt.

The database was initially created with lmdb 0.9.17, currently I am using
0.9.22.

Any idea how to deal with this issue?


Can you list the steps to reproduce the issue?

--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/



Issue with mdb_cursor_put

2018-06-14 Thread Brüns, Stefan
Hi,

I see a similar problem as reported in October 2017 for mdb_cursor_del, i.e. 
pages ending up twice on the dirty list.

Backtrace:
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x761c8da1 in __GI_abort () at abort.c:79
#2  0x75a31052 in mdb_assert_fail (env=0x5579d6e0, 
expr_txt=expr_txt@entry=0x75a3294f "rc == 0", 
func=func@entry=0x75a33268 <__func__.7016> "mdb_page_dirty", 
line=line@entry=2127, file=0x75a32930 "mdb.c") at mdb.c:1542
#3  0x75a25fe5 in mdb_page_dirty (txn=0x5579eaa0, mp=) at mdb.c:2127
#4  0x75a2720b in mdb_page_alloc (num=num@entry=1, 
mp=mp@entry=0x7fffd110, mc=) at mdb.c:2308
#5  0x75a29114 in mdb_page_new (mc=mc@entry=0x7fffd2f0, 
flags=flags@entry=4, num=1, mp=mp@entry=0x7fffd170) at mdb.c:7147
#6  0x75a29519 in mdb_node_add (mc=mc@entry=0x7fffd2f0, 
indx=, key=key@entry=0x7fffd6c0, data=0x7fffd6d0, 
pgno=pgno@entry=0, flags=0) at mdb.c:7289
#7  0x75a2ca59 in mdb_cursor_put (mc=0x7fffd2f0, 
key=0x7fffd6c0, data=0x7fffd6d0, flags=) at mdb.c:6916
#8  0x75a2ee4b in mdb_put (txn=0x5579eaa0, dbi=3, 
key=key@entry=0x7fffd6c0, data=data@entry=0x7fffd6d0, 
flags=flags@entry=0) at mdb.c:8991
---

I have tracked down the moment where the duplicate pgno is added to the list:
---
#0  mdb_page_alloc (num=num@entry=6, mp=mp@entry=0x7fffd110, mc=) at mdb.c:2277
#1  0x75a29114 in mdb_page_new (mc=mc@entry=0x7fffd2f0, 
flags=flags@entry=4, num=6, mp=mp@entry=0x7fffd170) at mdb.c:7147
#2  0x75a29519 in mdb_node_add (mc=mc@entry=0x7fffd2f0, 
indx=, key=key@entry=0x7fffd6c0, data=0x7fffd6d0, 
pgno=pgno@entry=0, flags=0) at mdb.c:7289
#3  0x75a2ca59 in mdb_cursor_put (mc=0x7fffd2f0, 
key=0x7fffd6c0, data=0x7fffd6d0, flags=) at mdb.c:6916
#4  0x75a2ee4b in mdb_put (txn=0x5579eaa0, dbi=3, 
key=key@entry=0x7fffd6c0, data=data@entry=0x7fffd6d0, 
flags=flags@entry=0) at mdb.c:8991
---

(gdb) p idl[0]
$22 = 47

(gdb) x/18g  &idl[ 30 ]
0x7fbff2d1ef20: 20234   20230
0x7fbff2d1ef30: 20229   20228
0x7fbff2d1ef40: 19230   19228
0x7fbff2d1ef50: 17181   17180
0x7fbff2d1ef60: 15736   15736   <- double entry
0x7fbff2d1ef70: 15274   8470
0x7fbff2d1ef80: 8438    7176
0x7fbff2d1ef90: 7174    4758
0x7fbff2d1efa0: 4719    1213

So the double page number is already stored in the freelist in the database, 
i.e. the database itself is corrupt.
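
A check like the one done by hand in gdb above can be sketched in a few lines of C. This assumes the layout visible in the dump: element 0 holds the number of entries and the entries that follow are sorted in descending order, so a duplicate always shows up as two equal neighbours. idl_find_duplicate is a hypothetical helper, not an LMDB function.

```c
#include <stddef.h>

/* Hypothetical helper: idl[0] is the entry count, idl[1..idl[0]] are page
 * numbers in descending order. Returns the index of the first entry that
 * equals its successor, or 0 if there is no duplicate (index 0 holds the
 * count, so it is never a valid hit). */
static size_t idl_find_duplicate(const size_t *idl)
{
    size_t i;

    for (i = 1; i < idl[0]; i++) {
        if (idl[i] == idl[i + 1])
            return i;
    }
    return 0;
}
```

Run over the list shown above, such a scan would stop at the first of the two 15736 entries.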

The database was initially created with lmdb 0.9.17, currently I am using 
0.9.22.

Any idea how to deal with this issue?

Kind regards, Stefan


-- 
Stefan Brüns  /  Bergstraße 21  /  52062 Aachen
home: +49 241 53809034 mobile: +49 151 50412019



Re: [LMDB] how does reader thread read meta page without locking and avoid data race at the same time?

2018-06-14 Thread Howard Chu

Chuntao HONG wrote:
> Background:
>
> I am trying to modify the LMDB code so we can have multiple threads reading
> from the same snapshot (same txn_id) at the same time. I am trying to do this
> in a "fork txn" way. Basically I have a master thread with a read txn, and
> then I try to create txns in the slave threads with the same txn_id. So I
> modified the mdb_txn_renew0() function and provide it with the txn_id the
> master thread is holding. With that I hope the slave transactions can read the
> same meta page because we pick the meta pages with
>
> meta = env->me_metas[txn->mt_txnid & 1];
>
> But then I realized I might be doing it wrong. There are only two meta pages
> used in LMDB. So what if there had been two write transactions committed after
> the master thread held its transaction, i.e. the master thread has txn_id==N
> and current txn_id==N+2? That means the meta page was over-written and the
> slave thread may read different data from the meta page than the master.

Why would you bother doing this? Just copy the master's txn structure.

> Then the question in the header popped into my mind. When reader threads are
> created, they copy the meta-db infos with a memcpy like this:
>
> memcpy(txn->mt_dbs, meta->mm_dbs, CORE_DBS * sizeof(MDB_db));
>
> But if the meta page was written in the middle of the memcpy, we can get
> corrupted data. I am sure there is some code that prevents this data race from
> happening, since we have been using LMDB with multiple threads for quite a
> while. Could someone point me to the code that prevents the data race from
> happening?

There is no data race. Readers are always reading the newer meta page, writers
only overwrite the older meta page. As noted in the Caveats, if you suspend a
process while it's opening a read transaction, it can see the wrong data.


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/



[LMDB] how does reader thread read meta page without locking and avoid data race at the same time?

2018-06-14 Thread Chuntao HONG
Background:

I am trying to modify the LMDB code so we can have multiple threads reading
from the same snapshot (same txn_id) at the same time. I am trying to do
this in a "fork txn" way. Basically I have a master thread with a read txn,
and then I try to create txns in the slave threads with the same txn_id. So
I modified the mdb_txn_renew0() function and provide it with the txn_id the
master thread is holding. With that I hope the slave transactions can read
the same meta page because we pick the meta pages with

meta = env->me_metas[txn->mt_txnid & 1];

But then I realized I might be doing it wrong. There are only two meta
pages used in LMDB. So what if there had been two write transactions
committed after the master thread held its transaction, i.e. the master
thread has txn_id==N and current txn_id==N+2? That means the meta page was
over-written and the slave thread may read different data from the meta
page than the master.

Then the question in the header popped into my mind. When reader threads
are created, they copy the meta-db infos with a memcpy like this:

memcpy(txn->mt_dbs, meta->mm_dbs, CORE_DBS * sizeof(MDB_db));

But if the meta page was written in the middle of the memcpy, we can get
corrupted data. I am sure there is some code that prevents this data race
from happening, since we have been using LMDB with multiple threads for
quite a while. Could someone point me to the code that prevents the data
race from happening?

cheers,
chuntao