Re: [HACKERS] COPY enhancements

Greg Smith Sat, 12 Sep 2009 01:22:29 -0700

On Fri, 11 Sep 2009, Josh Berkus wrote:

I've been thinking about it, and can't come up with a really strong case
for wanting a user-defined table if we settle the issue of having a
strong key for pg_copy_errors.  Do you have one?

No, but I'd think that if the user table was only allowed to be the exact same format as the system one it wouldn't be that hard to implement--once the COPY syntax is expanded at least. I'm reminded of how Oracle EXPLAIN PLANs get logged into the PLAN_TABLE by default, but you can specify "INTO table" to put them somewhere else. You'd basically be doing the same thing but with a different destination relation.

After some thought, I think that Andrew's feature *is* generally
applicable, if done as IGNORE COLUMN COUNT (or, more likely,
column_count=ignore).  I can think of a lot of data sets where column
count is jagged and you want to do ELT instead of ETL.

Exactly, the ELT approach gives you so many more options for cleaning up the data that I think it would be used more if it weren't so hard to do in Postgres right now.
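
To make the jagged-column case concrete, here is a rough sketch in Python of what ignoring the column count could mean for a load: short rows are padded with NULLs and extra trailing fields are dropped, so every row reaches the staging table and the cleanup happens afterwards. This only illustrates the idea and is not code from any COPY patch; the names normalize_row and load_jagged are made up for the example, and padding with None is an assumption about the desired semantics.

```python
import csv
import io


def normalize_row(fields, ncols):
    """Force a parsed row to exactly ncols fields.

    Extra trailing fields are dropped; missing ones become None,
    standing in for SQL NULL. (Assumed semantics, for illustration.)
    """
    padded = list(fields[:ncols])
    padded.extend([None] * (ncols - len(padded)))
    return padded


def load_jagged(text, ncols):
    """Parse CSV text and return rows that all have ncols fields."""
    reader = csv.reader(io.StringIO(text))
    return [normalize_row(row, ncols) for row in reader]


# Three input rows: one too long, one too short, one exactly right.
raw = "1,alice,extra\n2\n3,carol\n"
rows = load_jagged(raw, 2)
```

Every input row survives the load, which is the point of ELT: the decision about what to do with the overrun or the missing value is made later, in SQL, against the staging table.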

As opposed to Tom, Peter and Heikki vetoing things because the feature gain doesn't justify the maintenance burden? That's your real choice. Adding a framework for manageable syntax extensions means that we can be more liberal about what we justify as an extension.

I think you're not addressing the distinction I was trying to make. The work to make the *syntax* for COPY easier to extend is an unfortunate requirement for all these new bits; no arguments from me that using GUCs for everything is just too painful.

What I was suggesting is that the first set of useful features required for what you're calling the ELT load path is both small and well understood. An implementation of the stuff I see a constant need for could get banged out so fast that trying to completely generalize it on the first pass has a questionable return.

While complicated, COPY is a pretty walled off command of around 3500 lines of code, and the hackery required here is pretty small. For example, it turns out we do already have the code to get it to ignore column overruns here, and it's all of 50 new lines--much of which is shared with code that does other error ignoring bits too. It's easy to make a case for a grand future extensibility cleanup here, but it's really not necessary to provide a significant benefit here for the cases I mentioned. And I would guess the maintenance burden of a more general solution has to be higher than a simple implementation of the feature list I gave in my last message.

In short: there's a presumption that adding any error-ignoring code would require significant contortions. I don't think that's really true though, and would like to keep open the possibility of accepting some simple but useful ad-hoc features in this area, even if they don't solve every possible problem in this space just yet.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
