Re: Spawned Background Process Knows the Exit of Client Process?

2020-05-18 Thread Shichao Jin
Hi Ashutosh,

Thank you for your answer.

For the first point, as you suggested, we will migrate to table AM sooner
or later.

For the second point, your description is exactly correct (an independent
process to access the storage engine). We can have multiple threads to
overcome the performance issue.

The problem is that the storage engine knows nothing about PostgreSQL data
types, so it has to obtain PG's comparator function to compare two keys.
Otherwise, the storage engine falls back to "memcmp". To get the comparison
function, the independent process has to attach to a specific database so
that it can access the catalog (relcache). Unfortunately, once the process
has called BackgroundWorkerInitializeConnection, it is bound to that
database and can no longer be independent. Our design therefore evolved to
spawn multiple processes for accessing different tables created by the
storage engine. As a result, we have to release these spawned processes
once the backend process switches databases or terminates. Currently, we
can set an inactivity timer to release the resources. Is there a more
elegant way to achieve this goal?
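
One general Unix technique for this kind of notification is to watch a pipe
for end-of-file. The sketch below is not PostgreSQL API. It assumes the
spawned process is a direct fork() child that inherits the read end of a
pipe created by the client backend, which would not hold for a worker
started by the postmaster. The name owner_has_exited is hypothetical. The
kernel closes the backend's write end when the backend exits for any
reason, so the read end then reports EOF.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper. read_fd is the read end of a pipe whose only write
 * end is held by the owning (client backend) process; the spawned process
 * must close its own inherited copy of the write end after fork().
 * Nothing is ever written to the pipe: it exists only to signal liveness.
 *
 * Returns 1 if the owner has gone away (EOF on the pipe), 0 if it still
 * appears to be alive after waiting up to timeout_ms, and -1 on error
 * (including an interrupted poll).
 */
int
owner_has_exited(int read_fd, int timeout_ms)
{
    struct pollfd pfd;
    char          byte;
    ssize_t       n;
    int           rc;

    assert(read_fd >= 0);

    pfd.fd = read_fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;              /* poll failed or was interrupted */
    if (rc == 0)
        return 0;               /* timed out: write end is still open */
    if (pfd.revents & POLLNVAL)
        return -1;              /* read_fd is not an open descriptor */

    /* Readable or hung up: a zero-length read means every writer closed. */
    n = read(read_fd, &byte, 1);
    if (n == 0)
        return 1;
    return (n < 0) ? -1 : 0;
}
```

The spawned process could call this from its main loop with a short
timeout and shut itself down when it returns 1, instead of relying only on
an inactivity timer.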

Best,
Shichao

On Mon, 18 May 2020 at 08:37, Ashutosh Bapat wrote:

> On Fri, May 15, 2020 at 11:53 PM Shichao Jin wrote:
> >
> > Hi Postgres Hackers,
> >
> > I am wondering is there any elegant way for self-spawned background
> > process (forked by us) to get notified when the regular client-connected
> > process exit from the current database (switch db or even terminate)?
> >
> > The background is that we are integrating a thread-model based storage
> > engine into Postgres via foreign data wrapper.
>
> PostgreSQL now supports a pluggable storage API. Have you considered
> using that instead of FDW?
>
> > The engine is not allowed to have multiple processes to access it. So we
> > have to spawn a background process to access the engine, while the client
> > process can communicate with the spawned process via shared memory. In
> > order to let the engine recognize the data type in Postgres, the spawned
> > process has to access catalog such as relcache, and It must connect to the
> > target database via BackgroundWorkerInitializeConnection to get the info.
> > Unfortunately, it is not possible to switch databases for background
> > process. So it has to get notified when client process switches db or
> > terminate, then we can correspondingly close the spawned process. Please
> > advise us if there are alternative approaches.
>
> There can be multiple backends accessing different databases. But from
> your description it looks like there is only one background process
> that will access the storage engine, and it will be shared by multiple
> backends which may be connected to different databases. If that's
> correct, you will need to make that background process independent of
> any database and have it just access storage. That looks less
> performant, though. Maybe you can elaborate more on your use case.
>
> --
> Best Wishes,
> Ashutosh Bapat
>


Spawned Background Process Knows the Exit of Client Process?

2020-05-15 Thread Shichao Jin
Hi Postgres Hackers,

I am wondering whether there is an elegant way for a self-spawned
background process (forked by us) to get notified when the regular
client-connected process exits from the current database (switches db or
even terminates).

The background is that we are integrating a thread-model-based storage
engine into Postgres via a foreign data wrapper. The engine does not allow
multiple processes to access it, so we have to spawn a background process
to access the engine, while the client process communicates with the
spawned process via shared memory. To let the engine recognize Postgres
data types, the spawned process has to access catalog data such as the
relcache, and it must connect to the target database via
BackgroundWorkerInitializeConnection to get that information.
Unfortunately, a background process cannot switch databases. So it has to
be notified when the client process switches db or terminates, so that we
can close the spawned process accordingly. Please advise us if there are
alternative approaches.

Best,
Shichao


Re: Memory-comparable Serialization of Data Types

2020-02-12 Thread Shichao Jin
Thank you both for your feedback. Yes, as Peter indicated, we indeed use
that technique for comparisons in the index, and we will now try passing a
comparator to the storage engine, as Alvaro suggested.
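
A minimal sketch of what passing a comparator to the engine could look
like, assuming a hypothetical C-style engine interface. The names
key_comparator, engine_table, and engine_compare_keys are invented for
illustration and belong to neither PostgreSQL nor any real storage engine.
The engine calls a type-aware callback when one is registered and falls
back to bytewise comparison otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical callback type: returns <0, 0 or >0, like memcmp(). */
typedef int (*key_comparator) (const void *a, size_t alen,
                               const void *b, size_t blen, void *arg);

/* Fallback ordering: bytewise, with the shorter key first on a tie. */
static int
bytewise_compare(const void *a, size_t alen,
                 const void *b, size_t blen, void *arg)
{
    size_t minlen = (alen < blen) ? alen : blen;
    int    c = memcmp(a, b, minlen);

    (void) arg;                 /* unused */
    if (c != 0)
        return c;
    return (alen > blen) - (alen < blen);
}

/* Type-aware ordering for keys that hold a native int32_t. */
static int
int32_compare(const void *a, size_t alen,
              const void *b, size_t blen, void *arg)
{
    int32_t x, y;

    (void) arg;                 /* unused */
    assert(alen == sizeof x && blen == sizeof y);
    memcpy(&x, a, sizeof x);
    memcpy(&y, b, sizeof y);
    return (x > y) - (x < y);
}

/* Hypothetical per-table state kept by the engine. */
typedef struct
{
    key_comparator cmp;         /* NULL means "use bytewise_compare" */
    void          *cmp_arg;     /* opaque state handed back to cmp */
} engine_table;

static int
engine_compare_keys(const engine_table *t,
                    const void *a, size_t alen,
                    const void *b, size_t blen)
{
    key_comparator cmp = t->cmp ? t->cmp : bytewise_compare;

    return cmp(a, alen, b, blen, t->cmp_arg);
}
```

With int32_compare registered, -1 sorts before 1. With the bytewise
fallback, the raw bytes of -1 sort after those of 1, which is why the
engine needs the type's own comparison function.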

Best,
Shichao




On Tue, 11 Feb 2020 at 17:16, Peter Geoghegan wrote:

> On Tue, Feb 11, 2020 at 1:40 PM Alvaro Herrera wrote:
> > I think adding that would be too much of a burden, both for the project
> > itself and for third-party type definitions; I think we'd rather rely on
> > calling the BTORDER_PROC btree support function for the type.
>
> An operator class would still need to provide a BTORDER_PROC. What I
> describe would be an optional capability. This is something that I
> have referred to as key normalization in the past:
>
> https://wiki.postgresql.org/wiki/Key_normalization
>
> I think that it would only make sense as an enabler of multiple
> optimizations -- not just the memcmp()/strcmp() thing. A common
> strcmp()'able binary string format can be used in many different ways.
> Note that this has nothing to do with changing the representation used
> by the vast majority of all tuples -- just the pivot tuples, which are
> mostly located in internal pages. They only make up less than 1% of
> all pages in almost all cases.
>
> I intend to prototype this technique within the next year. It's
> possible that it isn't worth the trouble, but there is only one way to
> find out. I might just work on the "abbreviated keys in internal
> pages" thing, for example. Though you really need some kind of prefix
> compression to make that effective.
>
> --
> Peter Geoghegan
>


-- 
Shichao Jin
PhD Student at University of Waterloo, Canada
e-mail: jsc0...@gmail.com
homepage: http://sites.google.com/site/csshichaojin/


Re: Memory-comparable Serialization of Data Types

2020-02-11 Thread Shichao Jin
Yes, this is exactly what I mean.

On Tue, 11 Feb 2020 at 15:01, Peter Geoghegan wrote:

> On Tue, Feb 11, 2020 at 11:53 AM Shichao Jin wrote:
> > We are currently integrating LSM-tree based storage engine RocksDB into
> > Postgres. I am wondering is there any function that serialize data types in
> > memory-comparable format, similar to MySQL and MariaDB. With that kind of
> > function, we can directly store the serialized format in the storage engine
> > and compare them in the storage engine level instead of deserializing data
> > and comparing in the upper level.
>
> Do you mean a format that can perform index comparisons using
> memcmp() rather than per-datatype comparison code?
>
>
>
> --
> Peter Geoghegan
>


Memory-comparable Serialization of Data Types

2020-02-11 Thread Shichao Jin
Hi Postgres Developers,

We are currently integrating the LSM-tree-based storage engine RocksDB into
Postgres. I am wondering whether there is any function that serializes data
types in a memory-comparable format, similar to MySQL and MariaDB. With
that kind of function, we could store the serialized format directly in the
storage engine and compare keys at the storage engine level instead of
deserializing the data and comparing at the upper level. I know PostgreSQL
is moving towards supporting pluggable storage engines, so I think this
feature would be particularly useful.
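
To illustrate what "memory-comparable" means here, the following is a
minimal sketch for one type, a signed 32-bit integer. The function names
are hypothetical, and this is neither a PostgreSQL function nor the exact
format used by MySQL or MariaDB. The idea is to flip the sign bit and store
the bytes big-endian, so that memcmp() on the encoded keys gives the same
order as comparing the integers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Encode v into 4 bytes whose memcmp() order matches numeric order. */
static void
encode_int32_memcmp(int32_t v, unsigned char out[4])
{
    uint32_t u;

    assert(out != NULL);

    /* Flipping the sign bit maps INT32_MIN..INT32_MAX onto 0..UINT32_MAX. */
    u = (uint32_t) v ^ UINT32_C(0x80000000);

    /* Big-endian layout: the most significant byte is compared first. */
    out[0] = (unsigned char) (u >> 24);
    out[1] = (unsigned char) (u >> 16);
    out[2] = (unsigned char) (u >> 8);
    out[3] = (unsigned char) u;
}

/* Inverse of encode_int32_memcmp(). */
static int32_t
decode_int32_memcmp(const unsigned char in[4])
{
    uint32_t u;
    int32_t  v;

    assert(in != NULL);

    u = ((uint32_t) in[0] << 24) | ((uint32_t) in[1] << 16) |
        ((uint32_t) in[2] << 8) | (uint32_t) in[3];
    u ^= UINT32_C(0x80000000);

    /* Copy the bit pattern back into the signed type. */
    memcpy(&v, &u, sizeof v);
    return v;
}
```

For example, -5 encodes to the bytes 7f ff ff fb and 3 encodes to
80 00 00 03, so the encoded -5 sorts first under memcmp(). Variable-length
and locale-dependent types such as text need considerably more work than
this fixed-width case.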

Best,
Shichao