Re: Make COPY extendable in order to support Parquet and other formats

Aleksander Alekseev Tue, 21 Jun 2022 02:56:49 -0700

Hi Ashutosh,

> An extension just for COPY to/from parquet looks limited in
> functionality. Shouldn't this be viewed as an FDW or Table AM support
> for parquet or other formats? Of course the latter is much larger in
> scope compared to the first one. But there may already be efforts
> underway
> https://www.postgresql.org/about/news/parquet-s3-fdw-01-was-newly-released-2179/


Many thanks for sharing your thoughts on this!

We are using parquet_fdw [2] but this is a read-only FDW.

What users typically need is to dump their data as fast as possible in
a given format and then either upload it to the cloud as historical
data or transfer it to another system (Spark, etc.). The data can be
accessed later, if needed, as read-only data.

Note that when accessing the historical data with parquet_fdw you
basically have zero ingestion time.

Another possible use case is transferring data to PostgreSQL from
another source. Here the requirements are similar - the data should be
dumped as fast as possible from the source, transferred over the
network and imported as fast as possible.

In other words, personally I'm unaware of use cases where somebody
needs a complete read/write FDW or TableAM implementation for formats
like Parquet, ORC, etc. Also, to my knowledge, these formats are not
particularly optimized for this.

[2]: https://github.com/adjust/parquet_fdw

-- 
Best regards,
Aleksander Alekseev
