Re: Spawned Background Process Knows the Exit of Client Process?
Hi Ashutosh,

Thank you for your answer. For the first point, as you suggested, we will
migrate to the table AM sooner or later. For the second point, your
description is exactly correct (an independent process to access the
storage engine). We can use multiple threads to overcome the performance
issue.

The problem is that the storage engine knows nothing about PostgreSQL's
data types: it has to obtain PostgreSQL's comparator function to compare
two keys, and otherwise it falls back to memcmp(). To obtain the
comparator, we have to tie the independent process to a specific database
so that it can access the catalog (relcache). Unfortunately, the process
can no longer be database-independent once it has changed its state by
calling BackgroundWorkerInitializeConnection. Our design therefore evolved
into spawning multiple processes for accessing the different tables
created by the storage engine. As a result, we have to release these
spawned processes once the backend process switches databases or
terminates. Currently we set an inactivity timer in order to release the
resources. I am wondering whether there is a more elegant way to achieve
this goal?

Best,
Shichao

On Mon, 18 May 2020 at 08:37, Ashutosh Bapat wrote:
> On Fri, May 15, 2020 at 11:53 PM Shichao Jin wrote:
> >
> > Hi Postgres Hackers,
> >
> > I am wondering is there any elegant way for self-spawned background
> > process (forked by us) to get notified when the regular
> > client-connected process exit from the current database (switch db or
> > even terminate)?
> >
> > The background is that we are integrating a thread-model based storage
> > engine into Postgres via foreign data wrapper.
>
> PostgreSQL now supports a pluggable storage API. Have you considered
> using that instead of FDW?
>
> > The engine is not allowed to have multiple processes to access it. So
> > we have to spawn a background process to access the engine, while the
> > client process can communicate with the spawned process via shared
> > memory. In order to let the engine recognize the data type in
> > Postgres, the spawned process has to access catalog such as relcache,
> > and it must connect to the target database via
> > BackgroundWorkerInitializeConnection to get the info. Unfortunately,
> > it is not possible to switch databases for background process. So it
> > has to get notified when client process switches db or terminate, then
> > we can correspondingly close the spawned process. Please advise us if
> > there are alternative approaches.
>
> There can be multiple backends accessing different databases. But from
> your description it looks like there is only one background process
> that will access the storage engine and it will be shared by multiple
> backends which may be connected to different databases. If that's
> correct, you will need to make that background process independent of
> any database and just access the storage. That looks less performant
> though. Maybe you can elaborate more about your use case.
>
> --
> Best Wishes,
> Ashutosh Bapat
Spawned Background Process Knows the Exit of Client Process?
Hi Postgres Hackers,

I am wondering whether there is any elegant way for a self-spawned
background process (forked by us) to get notified when a regular
client-connected process exits from the current database (switches
databases or even terminates)?

The background is that we are integrating a thread-model based storage
engine into Postgres via a foreign data wrapper. The engine does not allow
multiple processes to access it, so we have to spawn a background process
to access the engine, while the client process communicates with the
spawned process via shared memory. In order to let the engine recognize
the data types in Postgres, the spawned process has to access catalogs
such as the relcache, and it must connect to the target database via
BackgroundWorkerInitializeConnection to get that information.
Unfortunately, it is not possible for a background process to switch
databases. So it has to get notified when the client process switches
databases or terminates, so that we can close the spawned process
accordingly. Please advise us if there are alternative approaches.

Best,
Shichao
Re: Memory-comparable Serialization of Data Types
Thank you both for your feedback. Yes, as Peter indicated, we indeed use
that technique for comparisons in the index, and we will now try passing
the comparator to the storage engine, as Alvaro suggested.

Best,
Shichao

On Tue, 11 Feb 2020 at 17:16, Peter Geoghegan wrote:
> On Tue, Feb 11, 2020 at 1:40 PM Alvaro Herrera wrote:
> > I think adding that would be too much of a burden, both for the
> > project itself as for third-party type definitions; I think we'd
> > rather rely on calling the BTORDER_PROC btree support function for
> > the type.
>
> An operator class would still need to provide a BTORDER_PROC. What I
> describe would be an optional capability. This is something that I
> have referred to as key normalization in the past:
>
> https://wiki.postgresql.org/wiki/Key_normalization
>
> I think that it would only make sense as an enabler of multiple
> optimizations -- not just the memcmp()/strcmp() thing. A common
> strcmp()'able binary string format can be used in many different ways.
> Note that this has nothing to do with changing the representation used
> by the vast majority of all tuples -- just the pivot tuples, which are
> mostly located in internal pages. They only make up less than 1% of
> all pages in almost all cases.
>
> I intend to prototype this technique within the next year. It's
> possible that it isn't worth the trouble, but there is only one way to
> find out. I might just work on the "abbreviated keys in internal
> pages" thing, for example. Though you really need some kind of prefix
> compression to make that effective.
>
> --
> Peter Geoghegan

--
Shichao Jin
PhD Student at University of Waterloo, Canada
e-mail: jsc0...@gmail.com
homepage: http://sites.google.com/site/csshichaojin/
Re: Memory-comparable Serialization of Data Types
Yes, this is exactly what I mean.

On Tue, 11 Feb 2020 at 15:01, Peter Geoghegan wrote:
> On Tue, Feb 11, 2020 at 11:53 AM Shichao Jin wrote:
> > We are currently integrating LSM-tree based storage engine RocksDB
> > into Postgres. I am wondering is there any function that serialize
> > data types in memory-comparable format, similar to MySQL and MariaDB.
> > With that kind of function, we can directly store the serialized
> > format in the storage engine and compare them in the storage engine
> > level instead of deserializing data and comparing in the upper level.
>
> Do you mean a format that can perform index comparisons using a
> memcmp() rather than per-datatype comparison code?
>
> --
> Peter Geoghegan
Memory-comparable Serialization of Data Types
Hi Postgres Developers,

We are currently integrating the LSM-tree based storage engine RocksDB
into Postgres. I am wondering whether there is any function that
serializes data types in a memory-comparable format, similar to MySQL and
MariaDB. With that kind of function, we could store the serialized format
directly in the storage engine and compare keys at the storage engine
level instead of deserializing the data and comparing at the upper level.
I know PostgreSQL is moving towards supporting pluggable storage engines,
so I think this feature would be particularly useful.

Best,
Shichao