On 2017/08/18 22:41, David Fetter wrote:
On Fri, Aug 18, 2017 at 05:10:29PM +0900, Etsuro Fujita wrote:
On 2017/08/17 23:48, David Fetter wrote:
On Thu, Aug 17, 2017 at 05:27:05PM +0900, Etsuro Fujita wrote:
On 2017/07/11 6:56, Robert Haas wrote:
On Thu, Jun 29, 2017 at 6:20 AM, Etsuro Fujita
<fujita.ets...@lab.ntt.co.jp> wrote:
So, I dropped the COPY part.

Ouch.  I think we should try to figure out how the COPY part will be
handled before we commit to a design.

I spent some time on this.  To handle that, I'd like to propose doing
something similar to \copy (frontend copy): submit a COPY query "COPY ...
>FROM STDIN" to the remote server and route data from a file to the remote
server.  For that, I'd like to add new FDW APIs called during CopyFrom that
allow us to copy to foreign tables:

* BeginForeignCopyIn: this would be called after creating a ResultRelInfo
for the target table (or each leaf partition of the target partitioned
table) if it's a foreign table, and perform any initialization needed before
the remote copy can start.  In the postgres_fdw case, I think this function
would be a good place to send "COPY ... FROM STDIN" to the remote server.

* ExecForeignCopyInOneRow: this would be called instead of heap_insert if
the target is a foreign table, and route the tuple read from the file by
NextCopyFrom to the remote server.  In the postgres_fdw case, I think this
function would convert the tuple to text format for portability, and then
send the data to the remote server using PQputCopyData.

* EndForeignCopyIn: this would be called at the bottom of CopyFrom, and
release resources such as connections to the remote server.  In the
postgres_fdw case, this function would do PQputCopyEnd to terminate data
transfer.

These primitives look good.  I know it seems unlikely at first
blush, but do we know of bulk load APIs for non-PostgreSQL data
stores that this would be unable to serve?

Maybe I'm missing something, but I think these would allow the FDW
to do the remote copy the way it would like; in
ExecForeignCopyInOneRow, for example, the FDW could buffer tuples in
a buffer memory and transmit the buffered data to the remote server
if the data size exceeds a threshold.  The naming is not so good?
Suggestions are welcome.

The naming seems reasonable.

I was trying to figure out whether this fits well enough with the bulk
load APIs for databases other than PostgreSQL.  I'm guessing it's
"well enough" based on checking MySQL, Oracle, and MS SQL Server.

Good to know.

Best regards,
Etsuro Fujita



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to