On Fri, 11 Sep 2009, Josh Berkus wrote:

I've been thinking about it, and can't come up with a really strong case
for wanting a user-defined table if we settle the issue of having a
strong key for pg_copy_errors.  Do you have one?

No, but I'd think that if the user table was only allowed to be the exact same format as the system one it wouldn't be that hard to implement--once the COPY syntax is expanded at least. I'm reminded of how Oracle EXPLAIN PLANs get logged into the PLAN_TABLE by default, but you can specify "INTO table" to put them somewhere else. You'd basically be doing the same thing but with a different destination relation.

After some thought, I think that Andrew's feature *is* generally
applicable, if done as IGNORE COLUMN COUNT (or, more likely,
column_count=ignore).  I can think of a lot of data sets where column
count is jagged and you want to do ELT instead of ETL.

Exactly. The ELT approach gives you so many more options for cleaning up the data that I think it would be used more if it weren't so hard to do in Postgres right now.
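
To make the jagged-column case concrete, here is a minimal Python sketch of the behavior being discussed: every row gets staged regardless of its column count, and the odd ones are noted on the side for later cleanup. This is not the COPY patch and not PostgreSQL code; the function name load_jagged, the shape of the error records, and the choice to pad short rows with None are all assumptions made for illustration.

```python
import csv
import io


def load_jagged(text, expected_cols, delimiter=","):
    """Stage every row, normalizing jagged ones instead of rejecting them.

    Hypothetical helper for illustration only; it mimics the idea of
    "ignore column count" plus an error log, not any real COPY option.
    """
    staged = []  # rows normalized to exactly expected_cols fields
    errors = []  # (line_number, field_count, raw_fields) for jagged rows
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for line_no, fields in enumerate(reader, start=1):
        if not fields:
            continue  # csv.reader yields [] for a blank line; skip it
        if len(fields) != expected_cols:
            # Remember the original row so it can be inspected after loading.
            errors.append((line_no, len(fields), list(fields)))
        # Pad short rows with None and drop any overrun fields.
        row = (fields + [None] * expected_cols)[:expected_cols]
        staged.append(row)
    return staged, errors


if __name__ == "__main__":
    sample = "1,a,x\n2,b\n3,c,y,extra\n"
    rows, problems = load_jagged(sample, expected_cols=3)
    for row in rows:
        print(row)
    for problem in problems:
        print("jagged:", problem)
```

The point of the ELT ordering is that nothing is thrown away at load time: the staged rows are usable immediately, and the error list keeps enough of the original input to repair or discard rows afterwards.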

As opposed to Tom, Peter and Heikki vetoing things because the feature gain doesn't justify the maintenance burden? That's your real choice. Adding a framework for manageable syntax extensions means that we can be more liberal about what we justify as an extension.

I don't think you're addressing the distinction I was trying to make. The work to make the *syntax* for COPY easier to extend is an unfortunate requirement for all these new bits; no arguments from me that using GUCs for everything is just too painful.

What I was suggesting is that the first set of useful features required for what you're calling the ELT load path is both small and well understood. An implementation of the stuff I see a constant need for could get banged out so fast that trying to completely generalize it on the first pass has a questionable return.

While complicated, COPY is a pretty walled-off command of around 3500 lines of code, and the hackery required here is pretty small. For example, it turns out we already have the code to get it to ignore column overruns, and it's all of 50 new lines--much of which is shared with code that does other error-ignoring bits too. It's easy to make a case for a grand future extensibility cleanup, but that cleanup really isn't necessary to provide a significant benefit for the cases I mentioned. And I would guess the maintenance burden of a more general solution has to be higher than that of a simple implementation of the feature list I gave in my last message.

In short: there's a presumption that adding any error-ignoring code would require significant contortions. I don't think that's really true though, and would like to keep open the possibility of accepting some simple but useful ad-hoc features in this area, even if they don't solve every possible problem in this space just yet.

--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD
