Re: LMDB and multiple processes

Brian G. Merrell Sat, 07 Jun 2014 18:07:26 -0700

Thanks again, Howard, for the help early on.  I'm into the development
of my real application now, and I'm able to leverage the docs much
better.


With my application, I have built a (~650MB) database with (9)
sub-databases.  I'll have a writer application to do updates, which
should be pretty easy.  There are a couple of things I want to make
sure that I'm doing correctly for the reader:

My application is a web service written in Go, using Go's net/http
package, which creates a new "goroutine" for each incoming request.
goroutines run concurrently, but may be multiplexed onto a single OS
thread.  So, I will be using the MDB_NOTLS flag when opening the
environment.  Then--from what I can gather--it seems like I will need
to allocate a pool of read-only transactions if I want to avoid
allocating new transactions for each HTTP request (is that right?).
Something like the following:


/* Test this to figure out how many are needed to never run out in practice */
N_READERS = 512

txn = env.BeginTxn(nil, MDB_RDONLY)  // mdb_txn_begin: parent=nil, flags=MDB_RDONLY
for each dbname in dbnames {
    txn.DBIOpen(dbname, 0)  // mdb_dbi_open: name=dbname, flags=0
}
txn.Commit()

for i = 0; i < N_READERS; i++ {
    txn = env.BeginTxn(nil, MDB_RDONLY)
    txnPool.Add(txn)
}


Then, for each HTTP request, I would pull a txn out of the pool, use
it (for multiple sequential queries for a given HTTP request), reset
it, renew it, and put it back in the pool.
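
A minimal sketch of such a pool, assuming a binding that exposes reset and renew on a read-only transaction handle. The names readTxn, fakeTxn, txnPool, get and put are hypothetical and are not the gomdb API; fakeTxn only exists so the sketch runs without LMDB. It differs from the description above in one respect: it renews on checkout and resets on return, so handles sitting idle in the pool do not hold a snapshot. Separately, LMDB's default reader table has 126 slots, so a pool of 512 handles would need mdb_env_set_maxreaders called before the environment is opened.

```go
package main

import (
	"errors"
	"fmt"
)

// readTxn is a hypothetical stand-in for a binding's read-only transaction
// handle; only the two calls the pool relies on are modeled.
type readTxn interface {
	Reset()       // release the snapshot, keep the handle (cf. mdb_txn_reset)
	Renew() error // take a fresh snapshot on the same handle (cf. mdb_txn_renew)
}

// fakeTxn lets the sketch run without LMDB. It only tracks whether the
// handle currently holds a snapshot.
type fakeTxn struct{ live bool }

func (t *fakeTxn) Reset() { t.live = false }

func (t *fakeTxn) Renew() error {
	if t.live {
		return errors.New("renew called on a live txn")
	}
	t.live = true
	return nil
}

// txnPool hands out handles through a buffered channel, so it can be
// shared between goroutines without extra locking.
type txnPool struct{ idle chan readTxn }

// newTxnPool creates n handles with begin and parks them in the reset state.
func newTxnPool(n int, begin func() readTxn) *txnPool {
	p := &txnPool{idle: make(chan readTxn, n)}
	for i := 0; i < n; i++ {
		t := begin()
		t.Reset()
		p.idle <- t
	}
	return p
}

// get blocks until a handle is free, then renews it for the caller.
func (p *txnPool) get() (readTxn, error) {
	t := <-p.idle
	if err := t.Renew(); err != nil {
		p.idle <- t // hand the handle back before reporting the failure
		return nil, err
	}
	return t, nil
}

// put resets the handle and returns it to the pool.
func (p *txnPool) put(t readTxn) {
	t.Reset()
	p.idle <- t
}

func main() {
	pool := newTxnPool(4, func() readTxn { return &fakeTxn{live: true} })

	txn, err := pool.get()
	if err != nil {
		panic(err)
	}
	// ... run all of this request's queries against txn here ...
	pool.put(txn)

	fmt.Println("idle handles:", len(pool.idle))
}
```

In an HTTP handler, the put call would typically be deferred right after a successful get, so the handle goes back to the pool even if a query fails.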

I've got a proof of concept working with the above strategy, but does
this all sound sane?

        Thanks,
        Brian

P.S. Sorry about the previous non-plaintext e-mails sent to the list.
Somehow my e-mail client reverted to silly mode.

On Wed, Jun 4, 2014 at 2:43 PM, Brian G. Merrell <[email protected]> wrote:
> On Wed, Jun 4, 2014 at 1:04 PM, Howard Chu <[email protected]> wrote:
>>
>> Brian G. Merrell wrote:
>>>
>>> On Wed, Jun 4, 2014 at 10:22 AM, Howard Chu <[email protected]> wrote:
>>>
>>>     Brian G. Merrell wrote:
>>>
>>>         Hi all,
>>>
>>>         First, I'm having trouble finding resources to answer a question
>>>         like this myself, so please forgive me if I've missed something.
>>>
>>>
>>>     http://symas.com/mdb/doc/
>>>
>>>
>>> Thanks.  I did see and skim the API portion of the docs before asking, but I
>>> was just having trouble knowing how the pieces fit together to solve a 
>>> problem.
>>
>>
>> Skimming isn't going to cut it.
>
> Fair enough, I probably gave up prematurely.  Blame my inferior
> intellect, but with zero other context into LMDB, I was having trouble
> getting a holistic view of LMDB from the docs.  The information
> you've shared, though, has made the docs much more approachable.  For
> whatever it's worth, I plan to write something up with my findings
> that will hopefully help someone.
>
>>
>>
>>>     Your reader process should be using read transactions.
>>
>>
>>> OK, I interpret this as meaning that I need to pass the MDB_RDONLY flag to
>>> mdb_txn_begin.  Is that correct?
>>
>>
>> Yes.
>>
>>
>>>     In the actual LMDB API read transactions can be reused by their creating
>>>     thread, so they are zero-cost after the first time. I don't know if any
>>>     of the other language wrappers leverage this fact.
>>
>>
>>> This helps a lot.  I will investigate what the case is with gomdb.
>>>
>>>
>>>     Opening a DBI only needs to be done once per process. Opening per
>>>     transaction would be stupid, like reopening a file handle on every
>>>     request.
>>>
>>>
>>> I suspected so.  The fact that mdb_dbi_open takes a transaction had me
>>> confused a bit, because I thought I would need to pass in the new
>>> transaction every time I got a transaction from mdb_txn_begin.
>>
>>
>> mdb_dbi_open takes a txn because it needs one if you're creating a DB for 
>> the first time. I.e., it must write metadata for the DB into the 
>> environment, and all writes to MDB must be inside a txn. But once that txn 
>> is committed, the DBI itself lives on until mdb_dbi_close. This is all 
>> already explained in the doc for mdb_dbi_open; if you hadn't skimmed you 
>> would have seen it already.
>>
>> Most of this is only a concern when you're using named subDBs. The default 
>> unnamed DB always exists, so its DBI is always valid anyway.
>
> I will probably use named subDBs for my real application (instead of 9
> separate databases like I do in LevelDB), so thanks for sharing.
>
>>
>>
>>> I've refactored the reader to look like this:
>>>
>>>
>>> env = NewEnv()
>>> env.Open("/tmp/foo", 0, 0664)
>>> txn = BeginTxn(nil, mdb.RDONLY) // parent txn is the nil arg
>>> dbi = txn.DBIOpen(nil, 0)
>>> txn.Abort()
>>
>>
>> You want mdb_txn_reset() here, not abort. Abort frees/destroys the txn 
>> handle so it cannot be reused.
>>
>>
>>> while {
>>>       txn = BeginTxn(nil, mdb.RDONLY) // parent txn is the nil arg
>>
>> and here you want mdb_txn_renew(), to reuse the txn handle instead of 
>> creating a new one.
>
> Ahah!  Thank you.  I had tried this before, but because I had used the
> txn.Abort() above, things did not go well.  Now my benchmark times are
> back to what I would expect.  I.e., they are comparable to the
> performance I was seeing when I had all transaction code outside of
> the loop (but wasn't seeing the data being updated after running my
> writer process).
>
>>
>>
>>>       for i = 0; i < n_entries; i++ {
>>>           key = sprintf("Key-%d", i)
>>>           val = txn.Get(dbi, key)
>>>           print("%s: %s", key, val)
>>>       }
>>>       txn.Commit()
>>
>> and you want mdb_txn_reset() here too, not commit. Commit also 
>> frees/destroys the txn handle.
>>
>>>       sleep(5)
>>> }
>>
>>
>> You can abort or commit the txn during your process teardown phase to 
>> dispose of it.
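
The handle lifecycle described in this exchange (begin once, reset, then renew and reset on every pass, abort at teardown) can be modeled as a small state machine. This is a toy model, not LMDB and not the gomdb API: handle, begin, reset, renew, abort and must are hypothetical names, and the model only enforces the ordering rules stated above.

```go
package main

import (
	"errors"
	"fmt"
)

type state int

const (
	live   state = iota // holds a snapshot; reads are allowed
	parked              // after reset: handle kept, snapshot released
	freed               // after abort or commit: handle destroyed
)

var (
	errFreed = errors.New("handle already freed")
	errLive  = errors.New("renew needs a reset handle")
)

// handle is a toy model of a read-only transaction handle.
type handle struct{ st state }

// begin models creating the handle once at startup.
func begin() *handle { return &handle{st: live} }

// reset releases the snapshot but keeps the handle for reuse.
func (h *handle) reset() error {
	if h.st == freed {
		return errFreed
	}
	h.st = parked
	return nil
}

// renew takes a fresh snapshot on a handle that was reset earlier.
func (h *handle) renew() error {
	switch h.st {
	case freed:
		return errFreed
	case live:
		return errLive
	}
	h.st = live
	return nil
}

// abort destroys the handle; it cannot be renewed afterwards.
func (h *handle) abort() { h.st = freed }

// must stops the sketch on an unexpected error.
func must(err error) {
	if err != nil {
		panic(err)
	}
}

func main() {
	h := begin() // once, e.g. while opening the DBI
	must(h.reset())

	for i := 0; i < 3; i++ {
		must(h.renew()) // start of each pass: fresh snapshot, same handle
		// ... reads would go here ...
		must(h.reset()) // end of each pass: release the snapshot
	}

	h.abort() // teardown
	fmt.Println("renew after abort:", h.renew())
}
```

The last line reproduces the failure mode from earlier in the thread: once the handle has been aborted or committed, renewing it is an error.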
>>
>>
>>> env.DBIClose(dbi)
>>>
>>>
>>> Now, I guess the big question is whether BeginTxn inside the loop is zero-cost.
>>>
>>> Thanks for the tips so far Howard; it has been very helpful.
>>
>>
>> --
>>   -- Howard Chu
>>   CTO, Symas Corp.           http://www.symas.com
>>   Director, Highland Sun     http://highlandsun.com/hyc/
>>   Chief Architect, OpenLDAP  http://www.openldap.org/project/
