[issue39408] Add support for SQLCipher
Sebastian Noack added the comment: Yes, I could use LD_LIBRARY_PATH (after copying /usr/lib/libsqlcipher.so.0 to /some/folder/libsqlite3.so), or alternatively LD_PRELOAD, and the sqlite3 stdlib module will just work as-is with SQLCipher. The latter is in fact what I'm doing at the moment, but this is quite a hack, and it's not portable to macOS or Windows. Alternatively, I could fork the sqlite3 stdlib module, have it built against SQLCipher, and redistribute it. But I'd rather not go there. That's why I'd love to see built-in support for SQLCipher in upstream Python, and as it is a drop-in replacement for SQLite3 which the stdlib already comes with bindings for, it seems to be a fairly small change on your end. -- ___ Python tracker <https://bugs.python.org/issue39408> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
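For anyone trying the LD_PRELOAD approach described above, here is a minimal sketch of how one might check which library the sqlite3 module actually ended up using. It assumes two things that are not stated in the message: that plain SQLite silently ignores pragmas it does not know, and that SQLCipher answers `PRAGMA cipher_version`. The helper name `sqlcipher_version` is made up for illustration.

```python
import sqlite3


def sqlcipher_version(conn):
    """Return the SQLCipher version string, or None on plain SQLite.

    Assumption: plain SQLite ignores the unknown pragma and returns no rows.
    """
    row = conn.execute("PRAGMA cipher_version").fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
version = sqlcipher_version(conn)
if version is None:
    print("plain SQLite", sqlite3.sqlite_version)
else:
    print("SQLCipher", version)
conn.close()
```

On a stock Python build this should report plain SQLite; with SQLCipher preloaded it should report a cipher version instead.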
[issue39408] Add support for SQLCipher
Sebastian Noack added the comment:

Well, the stdlib already depends on a third-party library here, i.e. SQLite3. SQLCipher is a drop-in replacement for SQLite3 that adds support for encrypted databases. In order to use SQLCipher, I'd have to build the sqlite3 module against SQLCipher (instead of SQLite3). As it's a drop-in replacement, no further changes are required (unless, rather than exposing the SQLCipher bindings as a separate module, we want to enable it through an argument to sqlite3.connect).

--
Python tracker <https://bugs.python.org/issue39408>
[issue39408] Add support for SQLCipher
New submission from Sebastian Noack:

SQLCipher is industry-standard technology for managing and encrypting SQLite databases. It has been implemented as a fork of SQLite3, so the sqlite3 corelib module would build as-is against it. But rather than a fork (of this module), I'd rather see integration of SQLCipher in upstream Python. I'm happy to volunteer if these changes have any chance of landing.

By just adding 2 lines to the cpython repository (and changing ~10 lines), I could make SQLCipher (based on the current sqlite3 module) available as a separate module (e.g. sqlcipher or sqlite3.cipher). However, IMO the ideal interface would be sqlite3.connect(..., sqlcipher=True).

Any thoughts?

--
messages: 360373
nosy: Sebastian.Noack
priority: normal
severity: normal
status: open
title: Add support for SQLCipher

Python tracker <https://bugs.python.org/issue39408>
[issue30297] Recursive starmap causes Segmentation fault
Sebastian Noack added the comment:

Thanks for your response, both of you. Everything you said makes sense.

Just for the record, I wouldn't necessarily expect 200k nested iterators to work. Even if it could be made to work, I guess it would use way too much memory. But a RuntimeError would be much preferable to a crash.

For the code above, the fix would be to immediately convert the iterator returned by starmap() to a list. But in the end, regardless of this additional operation, it didn't perform well, so I tossed that code and used OpenSSL's PBKDF2 implementation through the ctypes module.

Still, I'm somewhat concerned that code like this will cause an unexpected crash that cannot be handled, depending on runtime variables. Could this perhaps even constitute a security vulnerability? It seems to be a buffer overflow, after all.

--
Python tracker <http://bugs.python.org/issue30297>
[issue30297] Recursive starmap causes Segmentation fault
Sebastian Noack added the comment:

I just noticed that the segfault can also be reproduced with Python 2 [1]. So please ignore what I said before about this not being the case.

While it is debatable whether using a lazily evaluated object with so many levels of recursion is a good idea in the first place, causing the interpreter to crash with a segfault still seems concerning to me.

[1]: https://github.com/mitsuhiko/python-pbkdf2/issues/2

--
Python tracker <http://bugs.python.org/issue30297>
[issue30297] Recursive starmap causes Segmentation fault
New submission from Sebastian Noack:

If I run the following code (on Python 3.5.3, Linux), the interpreter crashes with a segfault:

    # Imports and the _pack_int helper were not part of the posted snippet.
    import hashlib
    import hmac
    from itertools import starmap
    from operator import xor
    from struct import Struct

    _pack_int = Struct('>I').pack

    def pbkdf2_bin(data, salt, iterations=1000, keylen=24, hashfunc=None):
        hashfunc = hashfunc or hashlib.sha1
        mac = hmac.new(data, None, hashfunc)
        def _pseudorandom(x, mac=mac):
            h = mac.copy()
            h.update(x)
            return h.digest()
        buf = []
        for block in range(1, -(-keylen // mac.digest_size) + 1):
            rv = u = _pseudorandom(salt + _pack_int(block))
            for i in range(iterations - 1):
                u = _pseudorandom(u)
                # rv becomes a starmap wrapping the previous rv (lazy)
                rv = starmap(xor, zip(rv, u))
            buf.extend(rv)
        return bytes(buf[:keylen])

    pbkdf2_bin(b'1234567890', b'1234567890', 20, 32)

I was able to track it down to the line buf.extend(rv), which apparently is causing the segfault. Note that rv is a lazily evaluated starmap. I also get a segfault if I evaluate it by other means (e.g. by passing it to the list constructor). However, if I evaluate it immediately by wrapping the starmap constructor with the list constructor, the code works as expected. But I haven't yet been able to isolate the issue further.

FWIW, the Python 2 version [1] of this code works just fine without forcing immediate evaluation of the starmap.

Note that the code posted, except for the bits I changed in order to make it compatible with Python 3, is under the copyright of Armin Ronacher, who published it under the BSD license.

[1]: https://github.com/mitsuhiko/python-pbkdf2

--
messages: 293192
nosy: Sebastian.Noack
priority: normal
severity: normal
status: open
title: Recursive starmap causes Segmentation fault
type: crash
versions: Python 3.5

Python tracker <http://bugs.python.org/issue30297>
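The workaround mentioned in the report (wrapping the starmap constructor with the list constructor) might look like the sketch below. The function name `pbkdf2_eager` and the `_pack_int` definition are assumptions for illustration; the result is compared against `hashlib.pbkdf2_hmac` from the standard library as a sanity check, which is not something the report itself does.

```python
import hashlib
import hmac
import struct
from itertools import starmap
from operator import xor

# Assumed helper: big-endian 32-bit block index, as PBKDF2 requires.
_pack_int = struct.Struct(">I").pack


def pbkdf2_eager(data, salt, iterations=1000, keylen=24, hashfunc=None):
    hashfunc = hashfunc or hashlib.sha1
    mac = hmac.new(data, None, hashfunc)

    def _pseudorandom(x):
        h = mac.copy()
        h.update(x)
        return h.digest()

    buf = []
    for block in range(1, -(-keylen // mac.digest_size) + 1):
        rv = u = _pseudorandom(salt + _pack_int(block))
        for _ in range(iterations - 1):
            u = _pseudorandom(u)
            # list() evaluates each XOR step right away, so no chain of
            # nested starmap objects builds up across iterations.
            rv = list(starmap(xor, zip(rv, u)))
        buf.extend(rv)
    return bytes(buf[:keylen])


derived = pbkdf2_eager(b"1234567890", b"1234567890", 2000, 32)
reference = hashlib.pbkdf2_hmac("sha1", b"1234567890", b"1234567890", 2000, 32)
print(derived == reference)
```

Because every intermediate result is a plain list, the nesting depth stays constant no matter how many iterations are requested.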
[issue24527] The MimeTypes class cannot ignore global files per instance
New submission from Sebastian Noack:

In order to prevent the mimetypes module from considering global files and registry entries, you have to call mimetypes.init([]). However, this enforces that behavior globally, and only works if the module wasn't initialized yet. There is a similar argument in the mimetypes.MimeTypes() constructor, but the files passed there are considered in addition to the global ones. There is no way to prevent an individual MimeTypes instance from considering global files.

An "ignore_global_types" option would be trivial to add to the MimeTypes constructor, and would be extremely useful.

--
components: Library (Lib)
messages: 245930
nosy: Sebastian Noack
priority: normal
severity: normal
status: open
title: The MimeTypes class cannot ignore global files per instance
type: behavior

Python tracker <http://bugs.python.org/issue24527>
[issue8800] add threading.RWLock
Sebastian Noack added the comment:

@Kristján: Uhh, that is a huge amount of code, more than twice as much (not counting tests) as my implementation, to accomplish the same thing. And it seems that not much code is shared between the threading and multiprocessing implementations. And for what? Ah right, to make the API suck as much as the Windows API does. Please tell me more about good coding practice. ;)

--
Python tracker <http://bugs.python.org/issue8800>
[issue8800] add threading.RWLock
Sebastian Noack added the comment:

Exactly, with my implementation "the lock acquired first will be granted first". There is no way that either shared or exclusive locks can starve, and therefore it should satisfy all use cases. Since you can only share simple data structures like integers across processes, I also found that this seems to be the only policy (apart from ignoring the acquisition order altogether) that can be implemented for multiprocessing.

I have also looked at the seqlock algorithm, which seems to be great for use cases where the exclusive lock is acquired rather rarely and where your "reader" code is in fact read-only and can therefore be repeated. But in any other case a seqlock would break your code. However, the algorithm is ultra simple and can't be implemented as a lock-like object anyway. You could implement it as a context manager, but that would hide the fact that the "reader" code will be repeated. So if you find that a seqlock is what you need for your specific use case, you can just use the algorithm like below:

    lock = multiprocessing.Lock()
    count = multiprocessing.Value('i', 0)

    def do_read():
        while True:
            start = count.value
            if start % 2:
                continue  # odd counter: a write is in progress
            data = ...
            if count.value != start:
                continue  # a write happened meanwhile, repeat the read
            return data

    def do_write(data):
        with lock:
            count.value += 1
            # write data
            count.value += 1

I have also experimented with implementing a shared/exclusive lock on top of a pipe and UNIX file locks (https://gist.github.com/3818148). However, it works only on Unix and only with processes (not threads). It also turned out that UNIX file locks don't implement an acquisition order, so exclusive locks can starve, which renders it useless for most use cases.

--
Python tracker <http://bugs.python.org/issue8800>
[issue8800] add threading.RWLock
Sebastian Noack added the comment:

Thanks, but as I already said, there are a lot of implementations of shared/exclusive locks that can be acquired from different threads. But we need one that works with threading as well as with multiprocessing.

And by the way, POSIX is the standard for implementing UNIX-like systems, not an industry standard for implementing anything, including high-level languages like Python.

--
Python tracker <http://bugs.python.org/issue8800>
[issue8800] add threading.RWLock
Sebastian Noack added the comment: I would love to see how other people would implement a shared/exclusive lock that can be acquired from different processes. However it really seems that nobody did it before. If you know a reference implementation I would be more than happy. There are plenty of implementations for threading only, but they won't work with multiprocessing, due to the limitations in the ways you can share data between processes. -- ___ Python tracker <http://bugs.python.org/issue8800> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue8800] add threading.RWLock
Sebastian Noack added the comment:

> If you want to argue it this way, I counter that the attributes
> "shared" and "exclusive" apply to the type of "access to the
> protected object" you are talking about, and yet, the name suggest
> that they are attributes of the lock itself.

A lock's sole purpose is to synchronize access to a protected object or context. So naming a lock after its type of protection absolutely makes sense. Those names are also not supposed to be attributes of the lock. Rather, two locks (a shared and an exclusive lock) should be created, which might be returned as a namedtuple for convenience.

> In that sense, "reader lock" and "writer lock", describe attributes
> of the user of the lock, and the verbs "readlock" and "writelock"
> describe the operation being requested.

The user of the lock isn't necessarily a reader or writer. This is just one of many possible use cases. For example, in a server application a shared/exclusive lock might be used to protect a connection to the client. Every time a thread wants to use the connection, a shared lock must be acquired, and when a thread wants to shut down the connection, the exclusive lock must be acquired, in order to ensure that it doesn't interrupt any thread still processing a request for that connection. In that case you clearly wouldn't call the users reader and writer.

> The patch looks like it was produced using git rather than hg, so
> perhaps Rietveld got confused by this. In that case it is a bug
> in Rietveld that it produced a partial review instead of producing
> no review.

Yes, I have imported the Python 3.3.0 tree into a local git repository and created the patch that way. Since patches generated with git can still be applied with the 'patch' program, I hope that isn't a problem.

> Although using namedtuple is probably a good idea, I don't think it
> really adds much flexibility. This example could just as easily be
> written
>
>     selock = ShrdExclLock()
>
>     Thread(target=reader, args=(selock.shared,)).start()
>     Thread(target=writer, args=(selock.exclusive,)).start()

Yes, that is true, but in some cases it is more convenient to be able to unpack the shared/exclusive lock into two variables with a one-liner. And defining a namedtuple doesn't require any extra code compared to defining a class that holds both locks. In fact, it needs less code. However, the flexibility comes from having two lock objects, no matter how they are accessed, instead of (as suggested by Kristján) a single lock object that just provides proxies for use with the with statement.

> I also think it is time to drop the "writer preference" model, since
> it just adds complexity with doubtful benefits. Sebastian's model
> also does that.

I have implemented the simplest possible acquisition order: the lock acquired first will be granted first. Without that (or a more advanced policy), in applications with concurrent threads/processes that heavily use the shared lock, the exclusive lock can never be acquired, because there is always a shared lock held, and before it is released the next shared lock is acquired.

--
Python tracker <http://bugs.python.org/issue8800>
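To make the "lock acquired first will be granted first" policy more concrete, here is a minimal threading-only sketch. It is not the code from the attached patches. It is a ticket-based illustration built on `threading.Condition`, without non-blocking mode or multiprocessing support, and every name in it (`fair_shared_exclusive_lock`, `SharedExclusivePair`, `_State`, `_Side`) is hypothetical.

```python
import threading
from collections import namedtuple

SharedExclusivePair = namedtuple("SharedExclusivePair", "shared exclusive")


class _State:
    """Bookkeeping shared by both halves of the pair."""

    def __init__(self):
        self.cond = threading.Condition()
        self.next_ticket = 0     # ticket handed to the next arriving request
        self.now_serving = 0     # ticket that is allowed to acquire next
        self.shared_count = 0    # number of current shared holders
        self.exclusive = False   # True while the exclusive lock is held


class _Side:
    """One half of the pair; both halves expose the same lock-like API."""

    def __init__(self, state, exclusive):
        self._state = state
        self._exclusive = exclusive

    def acquire(self):
        s = self._state
        with s.cond:
            ticket = s.next_ticket
            s.next_ticket += 1
            # Wait for our turn, then for conflicting holders to leave.
            s.cond.wait_for(lambda: s.now_serving == ticket
                            and not s.exclusive
                            and not (self._exclusive and s.shared_count))
            if self._exclusive:
                s.exclusive = True
            else:
                s.shared_count += 1
            s.now_serving += 1   # let the next ticket try
            s.cond.notify_all()
        return True

    def release(self):
        s = self._state
        with s.cond:
            if self._exclusive:
                s.exclusive = False
            else:
                s.shared_count -= 1
            s.cond.notify_all()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, exc_type, exc_value, tb):
        self.release()


def fair_shared_exclusive_lock():
    state = _State()
    return SharedExclusivePair(_Side(state, False), _Side(state, True))


shared, exclusive = fair_shared_exclusive_lock()
with shared, shared:   # several shared holders at the same time
    pass
with exclusive:        # a single exclusive holder
    pass
```

Because a waiting exclusive request holds the "now serving" position until it is granted, shared requests that arrive later queue up behind it, which is what prevents the starvation described above.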
[issue8800] add threading.RWLock
Sebastian Noack added the comment:

@richard: I'm sorry, but both of my patches contain changes to 'Lib/threading.py' and can be applied on top of Python 3.3.0. So can you explain what you mean by missing changes to threading.py?

--
Python tracker <http://bugs.python.org/issue8800>
[issue8800] add threading.RWLock
Sebastian Noack added the comment:

Yes, you could also look at the shared/exclusive lock as one lock with different states. But this approach is not more common: have a look at Java's ReadWriteLock [1] for example, which works just like my patch does, except that a factory is returned instead of a tuple. Nor does it provide any of the benefits I have mentioned before (same API as Lock and RLock, better compatibility with existing code and the with statement, ability to pass the shared or exclusive lock around separately).

But maybe we could satisfy everybody by following Richard's and Antoine's suggestion of returning a named tuple. Then you could use the ShrdExclLock both ways:

    # use a single object
    lock = ShrdExclLock()
    with lock.shared:
        ...
    with lock.exclusive:
        ...

    # unpack the object into two variables and pass them around separately
    shrd_lock, excl_lock = ShrdExclLock()
    Thread(target=reader, args=(shrd_lock,)).start()
    Thread(target=writer, args=(excl_lock,)).start()

The majority of us seem to prefer the terms shared and exclusive. However, I can't deny that the terms read and write are more common, even though there are also notable examples where the terms shared and exclusive are used [2] [3]. But let us ignore what others call it for now, and get to the origin of both sets of terms, in order to figure out which fits best into Python:

    shared/exclusive -> abstract description of what it is
    read/write       -> best known use case

The reason why I prefer the terms shared and exclusive is that they are more distinct and less likely to be misunderstood. Also, naming a generic implementation after a specific use case is bad API design, and I don't know of any other case where that was done in the Python core library.

[1] http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/locks/ReadWriteLock.html
[2] http://www.postgresql.org/docs/9.2/static/explicit-locking.html
[3] http://www.unix.com/man-page/freebsd/9/SX/

--
Python tracker <http://bugs.python.org/issue8800>
[issue8800] add threading.RWLock
Sebastian Noack added the comment:

I was just waiting for a comment pointing out that my patch comes without tests. :) Note that we are still discussing the implementation and this patch is just a proof of concept. Since the way it is implemented and the API it provides could still change, it's quite pointless to write tests until we have at least agreed on the API.

I have uploaded a new patch. It is now implemented more like the Barrier is. The common code is shared in the threading module, and the shared/exclusive lock objects can now be pickled. I have also fixed a bug related to acquiring locks in non-blocking mode.

The code still uses c_uint, but ctypes (and multiprocessing.sharedctypes) is only imported when ShrdExclLock is called, so it is just a lazy dependency now. The reason why I am using ctypes instead of Python integers for threading and a BufferWrapper for multiprocessing (as the Barrier does) is that 2 of the 4 counters need to be continuously incremented, and c_uint has the nice feature that it can overflow, in contrast to Python integers and integers in arrays. That way the implementation is also simpler, and it seems that there isn't much difference under the hood between using BufferWrapper() and RawValue().

A shared/exclusive lock isn't one lock but two locks, which are synchronized but must be acquired separately. It is similar to a pipe, which isn't one file, but one file connected to another file that reads whatever you have written into the first file. So it isn't strange to create two lock objects, just as it isn't strange that os.pipe() returns two file descriptors.

Also, having a separate lock object for the shared and the exclusive lock, each providing the same API (as Lock and RLock), gives you huge flexibility. You can acquire both locks using the with statement or pass them around separately. So, for example, when you have a function, thread or child process that should only be able to acquire either the shared or the exclusive lock, you don't have to pass both locks. That also means that existing code that expects a lock-like object will be compatible with both the shared and the exclusive lock.

--
Added file: http://bugs.python.org/file27363/Added-ShrdExclLock-to-threading-and-multiprocessing-2.patch

Python tracker <http://bugs.python.org/issue8800>
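The remark about c_uint being able to overflow can be illustrated with a short sketch. It only shows the wraparound behaviour of `ctypes.c_uint` and how the difference between two wrapping counters stays meaningful. It is not taken from the patch, and the helper name `counter_distance` is made up.

```python
import ctypes

# Largest value a c_uint can hold on this platform (usually 2**32 - 1).
UINT_MAX = 2 ** (8 * ctypes.sizeof(ctypes.c_uint)) - 1

counter = ctypes.c_uint(UINT_MAX)
counter.value += 1          # wraps around to 0 instead of raising or growing
print(counter.value)


def counter_distance(later, earlier):
    """Steps from `earlier` to `later` for two wrapping counters."""
    # ctypes does no overflow checking, so the difference is reduced
    # modulo UINT_MAX + 1 when it is stored in a c_uint.
    return ctypes.c_uint(later - earlier).value


# A counter that has wrapped past zero is still "ahead" by 3 steps.
print(counter_distance(2, UINT_MAX))
```

A plain Python integer would simply keep growing, so a counter that is incremented forever never needs special handling there, but it also cannot live in a fixed-size shared memory slot the way a c_uint can.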
[issue8800] add threading.RWLock
Sebastian Noack added the comment: I've added a new patch, that implements a shared/exclusive lock as described in my comments above, for the threading and multiprocessing module. -- Added file: http://bugs.python.org/file27350/Added-ShrdExclLock-to-threading-and-multiprocessing.patch ___ Python tracker <http://bugs.python.org/issue8800> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue8800] add threading.RWLock
Sebastian Noack added the comment:

Using a lock as a context manager is the same as calling lock.acquire(blocking=True), and it will in fact block while waiting for another thread to release the lock. In your code, the internal lock is indeed held only for a very short period of time while acquiring or releasing a shared or exclusive lock, but it might add up to a notable amount of time depending on how many concurrent threads are using the same RWLock and how slow/busy your computer is.

But what made me reconsider my point are the following facts:

1. When you acquire a shared (read) lock in non-blocking mode and False is returned, you assume that another thread is holding an exclusive (write) lock. But that isn't necessarily the case if it also returns False when the internal lock is held by another thread, for example in order to acquire or release another shared (read) lock.

2. The internal lock must also be acquired in order to release a shared/exclusive lock. And the 'release' method (at least if implemented as for Lock and RLock) doesn't have a 'blocking' argument anyway.

For these reasons, I think it is OK to block while waiting for the internal lock, even if the shared/exclusive lock was acquired in non-blocking mode. At least it seems to lead to fewer unexpected side effects than returning False when the internal lock is held.

--
Python tracker <http://bugs.python.org/issue8800>
[issue8800] add threading.RWLock
Sebastian Noack added the comment:

I would love to see a reader/writer lock implementation shipped with Python's threading (and multiprocessing) module. But I have some issues with the patch:

1. I would avoid the terms 'read' and 'write', as those terms refer to just one of many use cases. Better and more generic names would be shared and exclusive lock.

2. If we add a new synchronization primitive to the threading module, we really should add it to the multiprocessing module as well, for consistency and to keep switching between threading and multiprocessing as easy as it is right now.

3. The methods rdlock() and wrlock() might block even if you call them with blocking=False. That's because they acquire the internal lock in a blocking fashion before they would return False.

4. As Antoine already pointed out, it is a bad idea to make acquiring the exclusive (write) lock the default behavior. That clearly violates the Zen of Python, since explicit is better than implicit.

5. The issue above only arises from the idea that the RWLock should provide the same API as the Lock and RLock primitives, so that everywhere a lock primitive is expected, you can pass either a Lock, RLock or RWLock. That is actually a good idea, but in that case you should explicitly specify whether to pass the shared (read) or the exclusive (write) lock.

Both issues 4. and 5. only arise from the idea that a shared/exclusive lock should be implemented as a single class. But having two different lock primitives, one for the shared lock and one for the exclusive lock, and a function returning a pair of those, would be much more flexible, pythonic and compatible with existing lock primitives.

    def ShrdExclLock():
        class _ShrdLock(object):
            def acquire(self, blocking=True):
                ...
            def release(self):
                ...
            def __enter__(self):
                self.acquire()
                return self
            def __exit__(self, exc_type, exc_value, tb):
                self.release()

        class _ExclLock(object):
            def acquire(self, blocking=True):
                ...
            def release(self):
                ...
            def __enter__(self):
                self.acquire()
                return self
            def __exit__(self, exc_type, exc_value, tb):
                self.release()

        return _ShrdLock(), _ExclLock()

    # create a shared/exclusive lock
    shrd_lock, excl_lock = ShrdExclLock()

--
nosy: +Sebastian.Noack

Python tracker <http://bugs.python.org/issue8800>