Re: py-lmdb often gets segmentation faults

2023-01-30 Thread Roger Marsh
On Sat, 28 Jan 2023 21:11:06 - (UTC)
Stuart Henderson wrote:

> Thanks for the detailed report, thinking about what information
> might be useful, and including it directly in your email.
> 
> On 2023-01-28, Roger Marsh wrote:
> > Package lmdb-0.9.29 was installed; and py-lmdb, as a --user install, from 
> > PyPi so lmdb can be used from Python3.9.  
> 
> py-lmdb bundles its own copy of LMDB and uses it by default, or you
> can set extra flags to use the system version.
> 
> Normally the bundled one is ok, but the LMDB library assumes an OS with
> coherent file and mmap access (often referred to as 'unified buffer
> cache') which OpenBSD lacks. The LMDB port (and OpenLDAP) are patched to
> force MDB_WRITEMAP so all io goes through mmap. This has some drawbacks
> (and some advantages in certain circumstances) but as far as we know
> it's the only way we can run it. Obviously the copy bundled in py-lmdb
> doesn't have this patch.
> 
> "Use a writeable memory map unless MDB_RDONLY is set. This is
>   faster and uses fewer mallocs, but loses protection from application
>   bugs like wild pointer writes and other bad updates into the database.
>   Incompatible with nested transactions. Do not mix processes with and
>   without MDB_WRITEMAP on the same environment. This can defeat durability
>   (mdb_env_sync etc)."
> 
> (If I understand correctly, LMDB normally writes through file access,
> reads through mmap).
> 
> > Behaviour seemed erratic, including segmentation faults, but for a
> > while it seemed possible to avoid these by using alternative methods
> > supported by py-lmdb.  
> 
> Yes this sounds about the expected behaviour for the UBC problem.
> 
> You can try the port I've just posted to ports@ which uses the system
> lmdb library. https://marc.info/?l=openbsd-ports&m=167493896320807&w=2
> Alternatively set the flags from MAKE_ENV in that port while building
> py-lmdb manually.
> 
> 

Thanks for the explanation and the port.



Re: py-lmdb often gets segmentation faults

2023-01-28 Thread Stuart Henderson
Thanks for the detailed report, thinking about what information
might be useful, and including it directly in your email.

On 2023-01-28, Roger Marsh wrote:
> Package lmdb-0.9.29 was installed; and py-lmdb, as a --user install, from 
> PyPi so lmdb can be used from Python3.9.

py-lmdb bundles its own copy of LMDB and uses it by default, or you
can set extra flags to use the system version.

Normally the bundled one is ok, but the LMDB library assumes an OS with
coherent file and mmap access (often referred to as 'unified buffer
cache') which OpenBSD lacks. The LMDB port (and OpenLDAP) are patched to
force MDB_WRITEMAP so all io goes through mmap. This has some drawbacks
(and some advantages in certain circumstances) but as far as we know
it's the only way we can run it. Obviously the copy bundled in py-lmdb
doesn't have this patch.

"Use a writeable memory map unless MDB_RDONLY is set. This is
  faster and uses fewer mallocs, but loses protection from application
  bugs like wild pointer writes and other bad updates into the database.
  Incompatible with nested transactions. Do not mix processes with and
  without MDB_WRITEMAP on the same environment. This can defeat durability
  (mdb_env_sync etc)."

(If I understand correctly, LMDB normally writes through file access,
reads through mmap).
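
For readers following along, here is a minimal sketch of how the flag
described above could be requested from Python. py-lmdb exposes MDB_WRITEMAP
as the writemap keyword of lmdb.open(). The helper names env_options and
open_address_book, and the platform check, are assumptions made for
illustration; nothing in this thread confirms that setting the flag this way
is enough to make the bundled library behave on OpenBSD.

```python
import sys


def env_options(platform=sys.platform):
    """Return keyword arguments for lmdb.open() (hypothetical helper)."""
    options = {"max_dbs": 10}
    if platform.startswith("openbsd"):
        # Assumption: imitate the OpenBSD port's patch by asking for
        # MDB_WRITEMAP, so that writes also go through the memory map.
        options["writemap"] = True
    return options


def open_address_book(path="/tmp/address-book.lmdb"):
    """Open the example environment with the options chosen above."""
    import lmdb  # third-party; imported lazily so the helper above runs without it

    return lmdb.open(path, **env_options())
```

As the quoted LMDB documentation warns, every process that opens the same
environment should use the same writemap setting.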

> Behaviour seemed erratic, including segmentation faults, but for a
> while it seemed possible to avoid these by using alternative methods
> supported by py-lmdb.

Yes this sounds about the expected behaviour for the UBC problem.

You can try the port I've just posted to ports@ which uses the system
lmdb library. https://marc.info/?l=openbsd-ports&m=167493896320807&w=2
Alternatively set the flags from MAKE_ENV in that port while building
py-lmdb manually.
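
A rough sketch of what a manual build against the system library might look
like, driven from Python. The variable names LMDB_FORCE_SYSTEM,
LMDB_INCLUDEDIR and LMDB_LIBDIR are the ones py-lmdb's setup.py reads as far
as I know; that the port's MAKE_ENV sets exactly these, and that /usr/local is
the right prefix, are assumptions, so check the port's Makefile first. The
function names are hypothetical.

```python
import os
import subprocess
import sys


def system_lmdb_build_env(prefix="/usr/local", base=None):
    """Return a copy of the environment with py-lmdb's build switches set."""
    env = dict(os.environ if base is None else base)
    # Assumption: these are the variables py-lmdb's setup.py consults.
    env["LMDB_FORCE_SYSTEM"] = "1"  # link against the installed liblmdb
    env["LMDB_INCLUDEDIR"] = os.path.join(prefix, "include")
    env["LMDB_LIBDIR"] = os.path.join(prefix, "lib")
    return env


def install_against_system_lmdb():
    """Build py-lmdb from source as a --user install; returns pip's exit code."""
    cmd = [sys.executable, "-m", "pip", "install", "--user",
           "--no-binary", "lmdb", "lmdb"]
    return subprocess.run(cmd, env=system_lmdb_build_env(), check=False).returncode
```

The --no-binary option is there so that pip compiles the extension locally
instead of picking up a prebuilt wheel, which would carry the bundled library.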




py-lmdb often gets segmentation faults

2023-01-28 Thread Roger Marsh
Package lmdb-0.9.29 was installed, and py-lmdb as a --user install from PyPI,
so that lmdb can be used from Python 3.9.

Behaviour seemed erratic, including segmentation faults, but for a while it 
seemed possible to avoid these by using alternative methods supported by 
py-lmdb.

Eventually I got stuck and downloaded the project from 
github.com/jnwatson/py-lmdb/ to try running the address-book example and modify 
it toward something like the problem case at which I was stuck.

Running it gave a segmentation fault immediately, but after repeatedly deleting
the *.sem siblings of /tmp/address-book.lmdb and retrying, I eventually got a
run that did not give a segmentation fault.
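
For anyone wanting to repeat this kind of retry loop without watching the
shell, a small sketch follows. It runs a script several times and records
whether each run exited normally or was killed by a signal. The names
describe_exit and run_example are hypothetical, and the sketch relies on the
POSIX behaviour of Python's subprocess module, which reports death by signal
as a negative return code.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a short label."""
    if returncode < 0:
        # On POSIX a negative return code is minus the fatal signal number.
        return signal.Signals(-returncode).name
    return "exit %d" % returncode


def run_example(script, attempts=5):
    """Run the script repeatedly and collect one label per attempt."""
    results = []
    for _ in range(attempts):
        proc = subprocess.run([sys.executable, script], capture_output=True)
        results.append(describe_exit(proc.returncode))
    return results
```

Calling run_example('py-lmdb-py-lmdb_1.4.0/examples/address-book.py') would
then give a list of labels, one per run, that can be compared across systems.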

The first script below shows this and includes a list of installed packages and 
dmesg output.

The second script below is from an equivalent FreeBSD session and shows the 
address-book example giving the expected output.

I was assuming py-lmdb should just work, but I notice FreeBSD has a port and
OpenBSD does not, so maybe that assumption is wrong?


Script started on Sat Jan 28 12:59:27 2023
opendev$ uname -a
OpenBSD opendev.home 7.2 GENERIC.MP#758 amd64
opendev$ ls /tmp
sndio  vi.recover
opendev$ python3.9 py-lmdb-py-lmdb_1.4.0/examples/address-book.py
Segmentation fault (core dumped) 
opendev$ ls /tmp
2192baec8c73f403a857cb5e31a384a8481d96ab965510ec63ad71da9798c522.sem
address-book.lmdb
eebfd9cc55a9dc2ab5b13550e78544804307f93944815b7760405d5dfeac1b0f.sem
sndio
vi.recover
opendev$ python3.9 py-lmdb-py-lmdb_1.4.0/examples/address-book.py
Segmentation fault (core dumped) 
opendev$ ls /tmp/*.sem
/tmp/2192baec8c73f403a857cb5e31a384a8481d96ab965510ec63ad71da9798c522.sem
/tmp/eebfd9cc55a9dc2ab5b13550e78544804307f93944815b7760405d5dfeac1b0f.sem
opendev$ rm /tmp/*.sem 
opendev$ ls /tmp/*.sem 
opendev$ python3.9 py-lmdb-py-lmdb_1.4.0/examples/address-book.py 
DB: home

DB: business

Updating number for dentist
Segmentation fault (core dumped) 
opendev$ 
opendev$ python3.9 py-lmdb-py-lmdb_1.4.0/examples/address-book.py 
opendev$ rm /tmp/*.sem

opendev$ 
opendev$ rm /tmp/*.sem 
opendev$ python3.9 py-lmdb-py-lmdb_1.4.0/examples/address-book.py 
DB: home

DB: business

Updating number for dentist
Deleting number for hospital

Home DB is now:
   b'dentist' b'01231'

Boss telephone number: b'0123151232'

Deleting all numbers from business DB:
Adding number for recruiter to business DB
Business DB is now:
   b'recruiter' b'04123125324'

opendev$ ls /tmp
address-book.lmdb   sndio   vi.recover
opendev$ 
opendev$ 
opendev$ ls /tmp 
opendev$ python3.9 py-lmdb-py-lmdb_1.4.0/examples/address-book.py 
DB: home
   b'recruiter' b'04123125324'

DB: business

Updating number for dentist
Deleting number for hospital

Home DB is now:
   b'dentist' b'01231'
   b'recruiter' b'04123125324'

Boss telephone number: None

Deleting all numbers from business DB:
Adding number for recruiter to business DB
Business DB is now:
   b'recruiter' b'04123125324'

opendev$ 
opendev$ python3.9 py-lmdb-py-lmdb_1.4.0/examples/address-book.py 
opendev$ ls /tmp  

address-book.lmdb   sndio   vi.recover
opendev$ 
opendev$ cat py-lmdb-py-lmdb_1.4.0/examples/address-book.py

import lmdb

# Open (and create if necessary) our database environment. Must specify
# max_dbs=... since we're opening subdbs.
env = lmdb.open('/tmp/address-book.lmdb', max_dbs=10)

# Now create subdbs for home and business addresses.
home_db = env.open_db(b'home')
business_db = env.open_db(b'business')


# Add some telephone numbers to each DB:
with env.begin(write=True) as txn:
    txn.put(b'mum', b'012345678', db=home_db)
    txn.put(b'dad', b'011232211', db=home_db)
    txn.put(b'dentist', b'044415121', db=home_db)
    txn.put(b'hospital', b'078126321', db=home_db)

    txn.put(b'vendor', b'0917465628', db=business_db)
    txn.put(b'customer', b'0553211232', db=business_db)
    txn.put(b'coworker', b'0147652935', db=business_db)
    txn.put(b'boss', b'0123151232', db=business_db)
    txn.put(b'manager', b'0644810485', db=business_db)


# Iterate each DB to show the keys are sorted:
with env.begin() as txn:
    for name, db in ('home', home_db), ('business', business_db):
        print('DB:', name)
        for key, value in txn.cursor(db=db):
            print('  ', key, value)
        print()


# Now let's update some phone numbers. We can specify the default subdb when
# starting the transaction, rather than pass it in every time:
with env.begin(write=True, db=home_db) as txn:
    print('Updating number for dentist')
    txn.put(b'dentist', b'01231')

    print('Deleting number for hospital')
    txn.delete(b'hospital')
    print()