Andrew Dunstan wrote:
> Florian G. Pflug wrote:
>>> Would it be possible to determine, when the copy is starting, that this
>>> case holds, and not use the parallel parsing idea in those cases?
>>
>> In theory, yes. In practice, I don't want to be the one who has to
>> answer to an angry user who just suffered a major drop in COPY
>> performance after adding an ENUM column to his table.
>
> I have yet to be convinced that this is even theoretically a good path to
> follow. Any sufficiently large table could probably be partitioned, and
> then we could use the parallelism that is being discussed for pg_restore
> without any modification to the backend at all. Similar tricks could be
> played by an external bulk loader for third-party data sources.
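The client-driven approach described above can be sketched roughly as follows. This is a minimal illustration, not pg_restore's actual logic: the partition names, the `load_partition` stub, and the row count standing in for a real COPY are all assumptions.

```python
# Sketch of client-side parallel loading: one concurrent COPY-like job
# per partition, with no backend changes. A real loader would issue
# "COPY <partition> FROM STDIN" over its own libpq connection; here a
# row count stands in for the actual COPY call.
from concurrent.futures import ThreadPoolExecutor

def load_partition(partition, rows):
    # Stand-in for the COPY call: report how many rows this worker
    # would have streamed into the given partition.
    return partition, len(rows)

def parallel_load(partitioned_data, workers=4):
    # Each worker loads one partition concurrently; the backend only
    # ever sees ordinary serial COPY commands on separate connections.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(load_partition, part, rows)
                   for part, rows in partitioned_data.items()]
        return dict(f.result() for f in futures)
```

Since COPY is I/O-bound from the client's perspective, even a thread pool over separate connections is enough to keep several backends busy at once.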
That assumes that some specific bulk loader like pg_restore, pgloader,
or similar is used to perform the load. Plain libpq users would either
need to duplicate the logic these loaders contain, or wouldn't be able
to take advantage of fast loads.
Plus, I'd see this as a kind of testbed for gently introducing
parallelism into postgres backends (especially thinking about sorting
here). CPUs gain more and more cores, so in the long run I fear that we
will have to find ways to utilize more than one of those to execute a
single query.
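As one hedged illustration of the sorting case: a single logical sort can be split into per-core runs that are merged afterwards. The chunk sizing and the use of a process pool below are assumptions made for the sketch, not a proposal for how a backend would actually implement a parallel sort.

```python
# Sketch of a single logical sort spread across cores: chunks are
# sorted independently (potentially on separate CPUs) and the sorted
# runs are merged, much as an external sort merges its runs.
from heapq import merge

def chunked(seq, n):
    # Split seq into n roughly equal runs.
    size = max(1, (len(seq) + n - 1) // n)
    return [seq[i:i + size] for i in range(0, len(seq), size)]

def parallel_sort(seq, n=4, map_fn=map):
    # map_fn can be a process pool's .map to sort runs on separate
    # cores; the builtin map gives the same result serially.
    runs = map_fn(sorted, chunked(seq, n))
    return list(merge(*runs))

if __name__ == "__main__":
    from concurrent.futures import ProcessPoolExecutor
    with ProcessPoolExecutor() as pool:
        print(parallel_sort([5, 3, 8, 1, 9, 2, 7], n=2, map_fn=pool.map))
        # prints [1, 2, 3, 5, 7, 8, 9]
```

The merge step is sequential here; the point is only that the expensive part (sorting the runs) parallelises cleanly once the input is split.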
But of course the architectural details need to be sorted out before any
credible judgement about the feasibility of this idea can be made...
regards, Florian Pflug